Commit Graph

416 Commits

Author SHA1 Message Date
Kazu Hirata 902cbcd59e Use llvm::is_contained where appropriate (NFC)
Summary:
This patch replaces std::find with llvm::is_contained where
appropriate.

Reviewers: efriedma, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84489
2020-07-27 10:20:44 -07:00
Simon Pilgrim 0128b9505c Revert rG5dd566b7c7b78bd- "PassManager.h - remove unnecessary Function.h/Module.h includes. NFCI."
This reverts commit 5dd566b7c7.

Causing some buildbot failures that I'm not seeing on MSVC builds.
2020-07-24 13:02:33 +01:00
Simon Pilgrim 5dd566b7c7 PassManager.h - remove unnecessary Function.h/Module.h includes. NFCI.
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.

This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
2020-07-24 12:40:50 +01:00
Yuanfang Chen ca1e69a675 [NFC] remove unused includes of SelectionDAGISel.h 2020-07-20 10:43:29 -07:00
Simon Pilgrim 017e5c949b MCFixup.h - remove unnecessary MCExpr.h include. NFCI.
Move the include down to files that actually depend on MCExpr definitions.

Also exposes an implicit dependency on MCContext in AVRAsmBackend.h
2020-07-20 15:17:19 +01:00
Yonghong Song 0e347c0ff0 BPF: generate .rodata BTF datasec for certain initialized local var's
Currently, BTF datasec type for .rodata is generated only if there are
user-defined readonly global variables which have debuginfo generated.

Certain readonly global variables may be generated from initialized
local variables. For example,
  void foo(const void *);
  int test() {
    const struct {
      unsigned a[4];
      char b;
    } val = { .a = {2, 3, 4, 5}, .b = 6 };
    foo(&val);
    return 0;
  }

The clang will create a private linkage const global to store
the initialized value:
  @__const.test.val = private unnamed_addr constant %struct.anon
      { [4 x i32] [i32 2, i32 3, i32 4, i32 5], i8 6 }, align 4

This global variable eventually is put in .rodata ELF section.

If there is .rodata ELF section, libbpf expects a BTF .rodata
datasec as well even though it may be empty meaning there are no
global readonly variables with proper debuginfo. Martin reported
a bug where without this empty BTF .rodata datasec, the bpftool
gen will exit with an error.

This patch fixed the issue by generating .rodata BTF datasec
if there exists local var intial data which will result in
.rodata ELF section.

Differential Revision: https://reviews.llvm.org/D84002
2020-07-17 09:45:57 -07:00
Logan Smith a19461d9e1 [NFC] Add 'override' keyword where missing in include/ and lib/.
This fixes warnings raised by Clang's new -Wsuggest-override, in preparation for enabling that warning in the LLVM build. This patch also removes the virtual keyword where redundant, but only in places where doing so improves consistency within a given file. It also removes a couple unnecessary virtual destructor declarations in derived classes where the destructor inherited from the base class is already virtual.

Differential Revision: https://reviews.llvm.org/D83709
2020-07-14 09:47:29 -07:00
Yonghong Song 152a9fef1b BPF: permit .maps section variables with typedef type
Currently, llvm when see a global variable in .maps section,
it ensures its type must be a struct type. Then pointee
will be further evaluated for the structure members.
In normal cases, the pointee type will be skipped.

Although this is what current all bpf programs are doing,
but it is a little bit restrictive. For example, it is legitimate
for users to have:
typedef struct { int key_size; int value_size; } __map_t;
__map_t map __attribute__((section(".maps")));

This patch lifts this restriction and typedef of
a struct type is also allowed for .maps section variables.
To avoid create unnecessary fixup entries when traversal
started with typedef/struct type, the new implementation
first traverse all map struct members and then traverse
the typedef/struct type. This way, in internal BTFDebug
implementation, no fixup entries are generated.

Two new unit tests are added for typedef and const
struct in .maps section. Also tested with kernel bpf selftests.

Differential Revision: https://reviews.llvm.org/D83638
2020-07-12 09:42:25 -07:00
Yonghong Song 3eacfdc72f [BPF] Fix a BTF gen bug related to a pointer struct member
Currently, BTF generation stops at pointer struct members
if the pointee type is a struct. This is to avoid bloating
generated BTF size. The following is the process to
correctly record types for these pointee struct types.
  - During type traversal stage, when a struct member, which
    is a pointer to another struct, is encountered,
    the pointee struct type, keyed with its name, is
    remembered in a Fixup map.
  - Later, when all type traversal is done, the Fixup map
    is scanned, based on struct name matching, to either
    resolve as pointing to a real already generated type
    or as a forward declaration.

Andrii discovered a bug if the struct member pointee struct
is anonymous. In this case, a struct with empty name is
recorded in Fixup map, and later it happens another anonymous
struct with empty name is defined in BTF. So wrong type
resolution happens.

To fix the problem, if the struct member pointee struct
is anonymous, pointee struct type will be generated in
stead of being put in Fixup map.

Differential Revision: https://reviews.llvm.org/D82976
2020-07-01 09:55:01 -07:00
Guillaume Chatelet 0f9d623b63 [Alignment][NFC] Use Align for BPFAbstractMemberAccess::RecordAlignment
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82962
2020-07-01 16:23:52 +00:00
Yonghong Song 7f6bc84a97 [BPF] Fix a bug for __builtin_preserve_field_info() processing
Andrii discovered a problem where a simple case similar to below
will generate wrong relocation kind:
  enum { FIELD_EXISTENCE = 2, };
  struct s1 { int a1; };
  int test() {
    struct s1 *v = 0;
    return __builtin_preserve_field_info(v[0], FIELD_EXISTENCE);
  }
The expected relocation kind should be FIELD_EXISTENCE, but
recorded reloc kind in the final object file is FIELD_BYTE_OFFSET,
which is incorrect.

This exposed a bug in generating access strings from intrinsics.
The current access string generation has two steps:
  step 1: find the base struct/union type,
  step 2: traverse members in the base type.

The current implementation relies on at lease one member access
in step 2 to get the correct relocation kind, which is true
in typical cases. But if there is no member accesses, the current
implementation falls to the default info kind FIELD_BYTE_OFFSET.
This is incorrect, we should still record the reloc kind
based on the user input. This patch fixed this issue by properly
recording the reloc kind in such cases.

Differential Revision: https://reviews.llvm.org/D82932
2020-06-30 23:45:37 -07:00
Guillaume Chatelet c1cd61e02a [Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemcpy to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82849
2020-06-30 13:12:31 +00:00
Yonghong Song 89648eb16d [BPF] fix a bug for BTF pointee type pruning
In BTF, pointee type pruning is used to reduce cluttering
too many unused types into prog BTF. For example,
   struct task_struct {
      ...
      struct mm_struct *mm;
      ...
   }
If bpf program does not access members of "struct mm_struct",
there is no need to bring types for "struct mm_struct" to BTF.

This patch fixed a bug where an incorrect pruning happened.
The test case like below:
    struct t;
    typedef struct t _t;
    struct s1 { _t *c; };
    int test1(struct s1 *arg) { ... }

    struct t { int a; int b; };
    struct s2 { _t c; }
    int test2(struct s2 *arg) { ... }

After processing test1(), among others, BPF backend generates BTF types for
    "struct s1", "_t" and a placeholder for "struct t".
Note that "struct t" is not really generated. If later a direct access
to "struct t" member happened, "struct t" BTF type will be generated
properly.

During processing test2(), when processing member type "_t c",
BPF backend sees type "_t" already generated, so returned.
This caused the problem that "struct t" BTF type is never generated and
eventually causing incorrect type definition for "struct s2".

To fix the issue, during DebugInfo type traversal, even if a
typedef/const/volatile/restrict derived type has been recorded in BTF,
if it is not a type pruning candidate, type traversal of its base type continues.

Differential Revision: https://reviews.llvm.org/D82041
2020-06-17 15:13:46 -07:00
Benjamin Kramer df9a51dab3 Remove global std::strings. NFCI. 2020-06-17 14:29:42 +02:00
Yonghong Song 4db1878158 [BPF] fix incorrect type in BPFISelDAGToDAG readonly load optimization
In BPF Instruction Selection DAGToDAG transformation phase,
BPF backend had an optimization to turn load from readonly data
section to direct load of the values. This phase is implemented
before libbpf has readonly section support and before alu32
is supported.

This phase however may generate incorrect type when alu32 is
enabled. The following is an example,
  -bash-4.4$ cat ~/tmp2/t.c
  struct t {
    unsigned char a;
    unsigned char b;
    unsigned char c;
  };
  extern void foo(void *);
  int test() {
    struct t v = {
      .b = 2,
    };
    foo(&v);
    return 0;
  }

The compiler will turn local variable "v" into a readonly section.
During instruction selection phase, the compiler generates two
loads from readonly section, one 2 byte load or 1 byte load, e.g., for 2 loads,
  t8: i32,ch = load<(dereferenceable load 2 from `i8* getelementptr inbounds
       (%struct.t, %struct.t* @__const.test.v, i64 0, i32 0)`, align 1),
       anyext from i16> t3, GlobalAddress:i64<%struct.t* @__const.test.v> 0, undef:i64
  t9: ch = store<(store 2 into %ir.v1.sub1), trunc to i16> t3, t8,
    FrameIndex:i64<0>, undef:i64

BPF backend changed t8 to i64 = Constant<2> and eventually the generated machine IR:
  t10: i64 = MOV_ri TargetConstant:i64<2>
  t40: i32 = SLL_ri_32 t10, TargetConstant:i32<8>
  t41: i32 = OR_ri_32 t40, TargetConstant:i64<0>
  t9: ch = STH32<Mem:(store 2 into %ir.v1.sub1)> t41, TargetFrameIndex:i64<0>,
      TargetConstant:i64<0>, t3

Note that t10 in the above is not correct. The type should be i32 and instruction
should be MOV_ri_32. The reason for incorrect insn selection is BPF insn selection
generated an i64 constant instead of an i32 constant as specified in the original
load instruction. Such incorrect insn sequence eventually caused the following
fatal error when a COPY insn tries to copy a 64bit register to a 32bit subregister.
  Impossible reg-to-reg copy
  UNREACHABLE executed at ../lib/Target/BPF/BPFInstrInfo.cpp:42!

This patch fixed the issue by using the load result type instead of always i64
when doing readonly load optimization.

Differential Revision: https://reviews.llvm.org/D81630
2020-06-11 19:31:06 -07:00
Yonghong Song 3659559cf3 [BPF] Remove unnecessary MOV_32_64 instructions
Commit 13f6c81c5d ("[BPF] simplify zero extension
with MOV_32_64") tried to use MOV_32_64 instructions
instead of lshift/rshift instructions for zero extension.
This has the benefit to remove the number of instructions
and may help verifier too.

But the same commit also removed the old MOV_32_64
pruning as it deems unsafe as MOV_32_64 does have the
side effect, zeroing out the top 32bit in the register.
This caused the following failure in kernel selftest
test_cls_redirect.o. In linux kernel, we have
     struct __sk_buff {
        __u32 data;
        __u32 data_end;
     };
The compiler will generate 32bit load for __sk_buff->data
and __sk_buff->data_end. But kernel verifier will actually
loads an address (64bit address on 64bit kernel) to the
result register. In this particular example, the explicit zext
was not optimized away and destroyed top 32bit
address and the verifier rejected the program :
     w2 = *(u32 *)(r1 + 76)
     ...
     r2 = w2  /* MOV_32_64: this will clear top 32bit */

Currently, if the load and the zext are next to each other, the
instruction pattern match can actually capture this to
avoid MOV_32_64, e.g., in BPFInstrInfo.td, we have
  def : Pat<(i64 (zextloadi32 ADDRri:$src)),
            (SUBREG_TO_REG (i64 0), (LDW32 ADDRri:$src), sub_32)>;

However, if they are not next to each other, LDW32 and
MOV_32_64 are generated, which may cause the above mentioned
problem.

BPF Backend already tried to optimize away pattern
   mov_32_64 + lshift + rshift

Commit 13f6c81c5d may generate mov_32_64 not followed by shifts.
This patch added optimization for only mov_32_64 too.

Differential Revision: https://reviews.llvm.org/D81048
2020-06-03 08:14:54 -07:00
John Fastabend 13f6c81c5d [BPF] simplify zero extension with MOV_32_64
The current pattern matching for zext results in the following code snippet
being produced,

  w1 = w0
  r1 <<= 32
  r1 >>= 32

Because BPF implementations require zero extension on 32bit loads this
both adds a few extra unneeded instructions but also makes it a bit
harder for the verifier to track the r1 register bounds. For example in
this verifier trace we see at the end of the snippet R2 offset is unknown.
However, if we track this correctly we see w1 should have the same bounds
as r8. R8 smax is less than U32 max value so a zero extend load should keep
the same value. Adding a max value of 800 (R8=inv(id=0,smax_value=800)) to
an off=0, as seen in R7 should create a max offset of 800. However at the
end of the snippet we note the R2 max offset is 0xffffFFFF.

  R0=inv(id=0,smax_value=800)
  R1_w=inv(id=0,umax_value=2147483647,var_off=(0x0; 0x7fffffff))
  R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
  R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
  R9=inv800 R10=fp0 fp-8=mmmm????
 58: (1c) w9 -= w8
 59: (bc) w1 = w8
 60: (67) r1 <<= 32
 61: (77) r1 >>= 32
 62: (bf) r2 = r7
 63: (0f) r2 += r1
 64: (bf) r1 = r6
 65: (bc) w3 = w9
 66: (b7) r4 = 0
 67: (85) call bpf_get_stack#67
  R0=inv(id=0,smax_value=800)
  R1_w=ctx(id=0,off=0,imm=0)
  R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=4294967295,var_off=(0x0; 0xffffffff))
  R3_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
  R4_w=inv0 R6=ctx(id=0,off=0,imm=0)
  R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
  R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
  R9_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
  R10=fp0 fp-8=mmmm????

After this patch R1 bounds are not smashed by the <<=32 >>=32 shift and we
get correct bounds on R2 umax_value=800.

Further it reduces 3 insns to 1.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Differential Revision: https://reviews.llvm.org/D73985
2020-05-27 11:26:39 -07:00
Yonghong Song eec758825d [BPF] fix an asan issue when disassemble an illegal instruction
Commit 8e8f1bd75a ("[BPF] Return fail if disassembled insn registers
out of range") tried to fix a segfault when an illegal instruction
is decoded. A test case is added to emulate such an illegal instruction.

The llvm buildbot reported an asan issue with this test case.
  ERROR: AddressSanitizer: global-buffer-overflow on address ...
  decodeMemoryOpValue(llvm::MCInst&, unsigned int, ...)
  llvm::MCDisassembler::DecodeStatus llvm::decodeToMCInst<unsigned long>(...)
  llvm::MCDisassembler::DecodeStatus llvm::decodeInstruction<unsigned long>(...)
  in (anonymous namespace)::BPFDisassembler::getInstruction(...)
  ...

Basically, the fix in Commit 8e8f1bd75a is too later to prevent
the asan. The fix in this patch moved the register number check earlier
during decodeInstruction(). It will return fail for decodeInstruction()
if the register number is out of range.

Note that DecodeGPRRegisterClass() and DecodeGPR32RegisterClass()
already have register number checking, so here we only check
decodeMemoryOpValue().
2020-05-18 22:33:34 -07:00
Yonghong Song 8e8f1bd75a [BPF] Return fail if disassembled insn registers out of range
Daniel reported a llvm-objdump segfault like below:
  $ llvm-objdump -D bpf_xdp.o
  ...
  0000000000000000 <.strtab>:
       0:       00 63 69 6c 69 75 6d 5f <unknown>
       1:       6c 62 36 5f 61 66 66 69 w2 <<= w6
  ...
  (llvm-objdump: lib/Target/BPF/BPFGenAsmWriter.inc:1087: static const char*
   llvm::BPFInstPrinter::getRegisterName(unsigned int): Assertion
   `RegNo && RegNo < 25 && "Invalid register number!"' failed.
   Stack dump:
   0.      Program arguments: llvm-objdump -D bpf_xdp.o
    ...
    abort
    ...
    llvm::BPFInstPrinter::getRegisterName(unsigned int)
    llvm::BPFInstPrinter::printMemOperand(llvm::MCInst const*,
                          int, llvm::raw_ostream&, char const*)
    llvm::BPFInstPrinter::printInstruction(llvm::MCInst const*,
                          unsigned long, llvm::raw_ostream&)
    llvm::BPFInstPrinter::printInst(llvm::MCInst const*,
                          unsigned long, llvm::StringRef, llvm::MCSubtargetInfo const&,
                          llvm::raw_ostream&)
   ...

Basically, since -D enables disassembly for all sections, .strtab is also disassembled,
but some strings are decoded as legal instructions but with illegal register numbers.
When llvm-objdump tries to print register name for these illegal register numbers,
assertion and segfault happens.

The patch fixed the issue by returning fail for a disassembled insn if
that insn contains a reg operand with illegal reg number.
The insn will be printed as "<unknown>" instead of causing an assertion.
2020-05-18 18:53:23 -07:00
Yonghong Song ddff9799d2 [BPF] Prevent disassembly segfault for NOP insn
For a simple program like below:
  -bash-4.4$ cat t.c
  int test() {
    asm volatile("r0 = r0" ::);
    return 0;
  }
compiled with
  clang -target bpf -O2 -c t.c
the following llvm-objdump command will segfault.
  llvm-objdump -d t.o

  0:       bf 00 00 00 00 00 00 00 nop
  llvm-objdump: ../include/llvm/ADT/SmallVector.h:180
  ...
  Assertion `idx < size()' failed
  ...
  abort
  ...
  llvm::BPFInstPrinter::printOperand
  llvm::BPFInstPrinter::printInstruction
  ...

The reason is both NOP and MOV_rr (r0 = r0) having the same encoding.
The disassembly getInstruction() decodes to be a NOP instruciton but
during printInstruction() the same encoding is interpreted as
a MOV_rr instruction. Such a mismatcch caused the segfault.

The fix is to make NOP instruction as CodeGen only so disassembler
will skip NOP insn for disassembling.

Note that instruction "r0 = r0" should not appear in non inline
asm codes since BPF Machine Instruction Peephole optimization will
remove it.

Differential Revision: https://reviews.llvm.org/D80156
2020-05-18 17:40:18 -07:00
Yonghong Song 6b01b46538 [BPF] preserve debuginfo types for builtin __builtin__btf_type_id()
The builtin function
  u32 btf_type_id = __builtin_btf_type_id(param, 0)
can help preserve type info for the following use case:
  extern void foo(..., void *data, int size);
  int test(...) {
    struct t { int a; int b; int c; } d;
    d.a = ...; d.b = ...; d.c = ...;
    foo(..., &d, sizeof(d));
  }

The function "foo" in the above only see raw data and does not
know what type of the data is. In certain cases, e.g., logging,
the additional type information will help pretty print.

This patch handles the builtin in BPF backend. It includes
an IR pass to translate the IR intrinsic to a load of
a global variable which carries the metadata, and an MI
pass to remove the intermediate load of the global variable.
Finally, in AsmPrinter pass, proper instruction are generated.

In the above example, the second argument for __builtin_btf_type_id()
is 0, which means a relocation for local adjustment,
i.e., w.r.t. bpf program BTF change,  will be generated.
The value 1 for the second argument means
a relocation for remote adjustment, e.g., against vmlinux.

Differential Revision: https://reviews.llvm.org/D74572
2020-05-15 08:00:44 -07:00
Craig Topper a58b62b4a2 [IR] Replace all uses of CallBase::getCalledValue() with getCalledOperand().
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().

I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.

Differential Revision: https://reviews.llvm.org/D78882
2020-04-27 22:17:03 -07:00
Simon Pilgrim fa6b68a404 BPFMCTargetDesc.h - remove unused raw_ostream forward declaration. NFC. 2020-04-22 18:26:50 +01:00
Simon Pilgrim 54b3f91d20 [BPF] Remove unused forward declarations. NFC. 2020-04-22 15:07:18 +01:00
Shengchen Kan 8bb059ab63 [MC][Bugfix] Remove redundant parameter for relaxInstruction
Summary:
Before this patch, `relaxInstruction` takes three arguments, the first
argument refers to the instruction before relaxation and the third
argument is the output instruction after relaxation. There are two quite
strange things:
  1) The first argument's type is `const MCInst &`, the third
  argument's type is `MCInst &`, but they may be aliased to the same
  variable
  2) The backends of ARM, AMDGPU, RISC-V, Hexagon assume that the third
  argument is a fresh uninitialized `MCInst` even if `relaxInstruction`
  may be called like `relaxInstruction(Relaxed, STI, Relaxed)` in a
  loop.

In this patch, we drop the thrid argument, and let `relaxInstruction`
directly modify the given instruction. Also, this patch fixes the bug https://bugs.llvm.org/show_bug.cgi?id=45580, which is introduced by D77851, and
breaks the assumption of ARM, AMDGPU, RISC-V, Hexagon.

Reviewers: Razer6, MaskRay, jyknight, asb, luismarques, enderby, rtaylor, colinl, bcain

Reviewed By: Razer6, MaskRay, bcain

Subscribers: bcain, nickdesaulniers, nathanchance, wuzish, annita.zhang, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, tpr, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78364
2020-04-21 11:06:55 +08:00
Yonghong Song 3cb7e7bf95 BPF: fix a CORE optimization bug
For the test case in this patch like below
  struct t { int a; } __attribute__((preserve_access_index));
  int foo(void *);
  int test(struct t *arg) {
      long param[1];
      param[0] = (long)&arg->a;
      return foo(param);
  }

The IR right before BPF SimplifyPatchable phase:
  %1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
  %2:gpr = LDD killed %1:gpr, 0
  %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
  STD killed %3:gpr, %stack.0.param, 0
After SimplifyPatchable phase, the incorrect IR is generated:
  %1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
  %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr
  CORE_MEM killed %3:gpr, 306, %0:gpr, @"llvm.t:0:0$0:0"

Note that CORE_MEM pseudo op is introduced to encode
memory operations related to CORE. In the above, we intend
to check whether we have a store like
   *(%3:gpr + 0) = ...
and if this is the case, we could replace it with
   *(%0:gpr + @"llvm.t:0:0$0:0"_ = ...

Unfortunately, in the above, IR for the store is
   *(%stack.0.param + 0) = %3:gpr
and transformation should not happen.

Note that we won't have problem if the actual CORE
dereference (arg->a) happens.

This patch fixed the problem by skip CORE optimization if
the use of ADD_rr result is not the base address of the store
operation.

Differential Revision: https://reviews.llvm.org/D78466
2020-04-20 19:54:51 -07:00
Simon Pilgrim cbd790a443 DebugHandlerBase.h - reduce MachineInstr.h include to DebugLoc.h include.
We were only including MachineInstr.h for DebugLoc.h. This exposes an implicit include dependency in BTFDebug.h where I've had to add the MachineInstr.h include.
2020-04-19 11:14:01 +01:00
LemonBoy aad3d578da [DebugInfo] Change DIEnumerator payload type from int64_t to APInt
This allows the representation of arbitrarily large enumeration values.
See https://lists.llvm.org/pipermail/llvm-dev/2017-December/119475.html for context.

Reviewed By: andrewrk, aprantl, MaskRay

Differential Revision: https://reviews.llvm.org/D62475
2020-04-18 12:49:31 -07:00
Fangrui Song 7d1ff446b6 [MC] Rename MCSection*::getSectionName() to getName(). NFC
A pending change will merge MCSection*::getName() to MCSection::getName().
2020-04-15 16:48:14 -07:00
Fangrui Song d2e5157c1f [MC] Add UseIntegratedAssembler = false. NFC 2020-04-11 10:13:49 -07:00
Simon Pilgrim a88cc20456 ProfileSummaryInfo.h - remove unnecessary includes. NFC
Remove a number of includes that aren't necessary (nor are we relying on the remaining includes to provide the declarations), we just needed a llvm::Instruction forward declaration.

This exposed a couple of source files that were implicitly replying on the includes for their use of llvm::SmallSet or std::set, requiring local includes to be added there instead.
2020-04-10 16:25:48 +01:00
Eli Friedman 565b56a72c [NFC] Clean up uses of LoadInst constructor. 2020-04-07 16:28:53 -07:00
Yonghong Song ced0d1f42b [BPF] support 128bit int explicitly in layout spec
Currently, bpf does not specify 128bit alignment in its
layout spec. So for a structure like
  struct ipv6_key_t {
    unsigned pid;
    unsigned __int128 saddr;
    unsigned short lport;
  };
clang will generate IR type
  %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] }
Additional padding is to ensure later IR->MIR can generate correct
stack layout with target layout spec.

But it is common practice for a tracing program to be
first compiled with target flag (e.g., x86_64 or aarch64) through
clang to generate IR and then go through llc to generate bpf
byte code. Tracing program often refers to kernel internal
data structures which needs to be compiled with non-bpf target.

But such a compilation model may cause a problem on aarch64.
The bcc issue https://github.com/iovisor/bcc/issues/2827
reported such a problem.

For the above structure, since aarch64 has "i128:128" in its
layout string, the generated IR will have
  %struct.ipv6_key_t = type { i32, i128, i16 }

Since bpf does not have "i128:128" in its spec string,
the selectionDAG assumes alignment 8 for i128 and
computes the stack storage size for the above is 32 bytes,
which leads incorrect code later.

The x86_64 does not have this issue as it does not have
"i128:128" in its layout spec as it does permits i128 to
be alignmented at 8 bytes at stack. Its IR type looks like
  %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] }

The fix here is add i128 support in layout spec, the same as
aarch64. The only downside is we may have less optimal stack
allocation in certain cases since we require 16byte alignment
for i128 instead of 8. But this is probably fine as i128 is
not used widely and in most cases users should already
have proper alignment.

Differential Revision: https://reviews.llvm.org/D76587
2020-03-28 11:46:29 -07:00
Fangrui Song 692e0c9648 [MC] Add MCStreamer::emitInt{8,16,32,64}
Similar to AsmPrinter::emitInt{8,16,32,64}.
2020-02-29 09:40:21 -08:00
Fangrui Song 774971030d [MCStreamer] De-capitalize EmitValue EmitIntValue{,InHex} 2020-02-14 23:08:40 -08:00
Fangrui Song 6d2d589b06 [MC] De-capitalize another set of MCStreamer::Emit* functions
Emit{ValueTo,Code}Alignment Emit{DTP,TP,GP}* EmitSymbolValue etc
2020-02-14 19:26:52 -08:00
Fangrui Song a55daa1461 [MC] De-capitalize some MCStreamer::Emit* functions 2020-02-14 19:11:53 -08:00
Fangrui Song bcd24b2d43 [AsmPrinter][MCStreamer] De-capitalize EmitInstruction and EmitCFI* 2020-02-13 22:08:55 -08:00
Fangrui Song 0bc77a0f0d [AsmPrinter] De-capitalize some AsmPrinter::Emit* functions
Similar to rL328848.
2020-02-13 13:38:33 -08:00
Yonghong Song 61bd33e37b [BPF] explicit warning of not supporting dynamic stack allocation
Currently, BPF does not support dynamic static allocation.
For a program like below:
  extern void bar(int *);
  void foo(int n) {
    int a[n];
    bar(a);
  }

The current error message looks like:
  unimplemented operand
  UNREACHABLE executed at /.../llvm/lib/Target/BPF/BPFISelLowering.cpp:199!

Let us make error message explicit so it will be clear to the user
what is the problem. With this patch, the error message looks like:
  fatal error: error in backend: Unsupported dynamic stack allocation
  ...

Differential Revision: https://reviews.llvm.org/D74521
2020-02-12 20:43:06 -08:00
Yonghong Song 29bc5dd194 [BPF] implement isTruncateFree and isZExtFree in BPFTargetLowering
Currently, isTruncateFree() and isZExtFree() callbacks return false
as they are not implemented in BPF backend. This may cause suboptimal
code generation. For example, if the load in the context of zero extension
has more than one use, the pattern zextload{i8,i16,i32} will
not be generated. Rather, the load will be matched first and
then the result is zero extended.

For example, in the test together with this commit, we have
   I1: %0 = load i32, i32* %data_end1, align 4, !tbaa !2
   I2: %conv = zext i32 %0 to i64
   ...
   I3: %2 = load i32, i32* %data, align 4, !tbaa !7
   I4: %conv2 = zext i32 %2 to i64
   ...
   I5: %4 = trunc i64 %sub.ptr.lhs.cast to i32
   I6: %conv13 = sub i32 %4, %2
   ...

The I1 and I2 will match to one zextloadi32 DAG node, where SUBREG_TO_REG is
used to convert a 32bit register to 64bit one. During code generation,
SUBREG_TO_REG is a noop.

The %2 in I3 is used in both I4 and I6. If isTruncateFree() is false,
the current implementation will generate a SLL_ri and SRL_ri
for the zext part during lowering.

This patch implement isTruncateFree() in the BPF backend, so for the
above example, I3 and I4 will generate a zextloadi32 DAG node with
SUBREG_TO_REG is generated during lowering to Machine IR.

isZExtFree() is also implemented as it should help code gen as well.

This patch also enables the change in https://reviews.llvm.org/D73985
since it won't kick in generates MOV_32_64 machine instruction.

Differential Revision: https://reviews.llvm.org/D74101
2020-02-11 09:59:19 -08:00
Eric Astor 8d5bf0422b [ms] [llvm-ml] Add support for attempted register parsing
Summary:
Add a new method (tryParseRegister) that attempts to parse a register specification.

MASM allows the use of IFDEF <register>, as well as IFDEF <symbol>. To accommodate this, we make it possible to check whether a register specification can be parsed at the current location, without failing the entire parse if it can't.

Reviewers: thakis

Reviewed By: thakis

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73486
2020-02-11 10:45:33 -05:00
Yonghong Song d96c1bbaa0 [BPF] disable ReduceLoadWidth during SelectionDag phase
The compiler may transform the following code
  ctx = ctx + reloc_offset
  ... (*(u32 *)ctx) & 0x8000 ...
to
  ctx = ctx + reloc_offset
  ... (*(u8 *)(ctx + 1)) & 0x80 ...
where reloc_offset will be replaced with a constant during
AsmPrinter phase.

The above transformed code will be rejected the kernel verifier
as it does not allow
  *(type *)((ctx + non_zero_offset1) + non_zero_offset2)
style access pattern.

It is hard at SelectionDag phase to identify whether a load
is related to context or not. Sometime, interprocedure analysis
may be needed. So let us simply prevent such optimization
from happening.

Differential Revision: https://reviews.llvm.org/D73997
2020-02-04 18:37:43 -08:00
Yonghong Song 6d07802d63 [BPF] handle typedef of struct/union for CO-RE relocations
Linux commit
  1cf5b23988 (diff-289313b9fec99c6f0acfea19d9cfd949)
uses "#pragma clang attribute push (__attribute__((preserve_access_index)),
      apply_to = record)"
to apply CO-RE relocations to all records including the following pattern:
  #pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)
  typedef struct {
    int a;
  } __t;
  #pragma clang attribute pop
  int test(__t *arg) { return arg->a; }

The current approach to use struct/union type in the relocation record will
result in an anonymous struct, which make later type matching difficult
in bpf loader. In fact, current BPF backend will fail the above program
with assertion:
  clang: ../lib/Target/BPF/BPFAbstractMemberAccess.cpp:796: ...
     Assertion `TypeName.size()' failed.

clang will change to use the type of the base of the member access
which will preserve the typedef modifier for the
preserve_{struct,union}_access_index intrinsics in the above example.
Here we adjust BPF backend to accept that the debuginfo
type metadata may be 'typedef' and handle them properly.

Differential Revision: https://reviews.llvm.org/D73902
2020-02-04 08:53:03 -08:00
Guillaume Chatelet b8144c0536 [NFC] Encapsulate MemOp logic
Summary:
This patch simply introduces functions instead of directly accessing the fields.
This helps introducing additional check logic. A second patch will add simplifying functions.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73945
2020-02-04 10:36:26 +01:00
Simon Moll 5c8ba508b2 [NFC] unsigned->Register in storeRegTo/loadRegFromStack
Summary:
This patch makes progress on the 'unsigned -> Register' rewrite for
`TargetInstrInfo::loadRegFromStack` and `TII::storeRegToStack`.

Reviewers: arsenm, craig.topper, uweigand, jpienaar, atanasyan, venkatra, robertlytton, dylanmckay, t.p.northover, kparzysz, tstellar, k-ishizaka

Reviewed By: arsenm

Subscribers: wuzish, merge_guards_bot, jyknight, sdardis, nemanjai, jvesely, wdng, nhaehnle, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73870
2020-02-03 14:22:16 +01:00
Guillaume Chatelet 3c89b75f23 [NFC] Introduce a type to model memory operation
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73785
2020-01-31 17:29:01 +01:00
Yonghong Song 795bbb3662 [BPF] fix a bug in BPFMISimplifyPatchable pass with -O0
The recommended optimization level for BPF programs
is O2 since (1). BPF is running inside the kernel and
linux kernel won't work at -O0 level, and (2). Verifier
is not able to handle O0 code properly, e.g., potential
large stack size and a lot of spills.

But we should keep -O0 at least compiling.
This patch fixed a bug in BPFMISimplifyPatchable phase
where with -O0, a segmentation fault will happen for a
simple program like:
  int test(int a, int b) { return a + b; }

A test case is added to capture such a case.

Differential Revision: https://reviews.llvm.org/D73681
2020-01-30 08:28:39 -08:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Tom Stellard 0dbcb36394 CMake: Make most target symbols hidden by default
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.

A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.

This patch reduces the number of public symbols in libLLVM.so by about
25%.  This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so

One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.

Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278

Reviewers: chandlerc, beanz, mgorny, rnk, hans

Reviewed By: rnk, hans

Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D54439
2020-01-14 19:46:52 -08:00
Fangrui Song 6fdd6a7b3f [Disassembler] Delete the VStream parameter of MCDisassembler::getInstruction()
The argument is llvm::null() everywhere except llvm::errs() in
llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no
target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.

If we ever have the needs to add verbose log to disassemblers, we can
record log with a member function, instead of passing it around as an
argument.
2020-01-11 13:34:52 -08:00
Yonghong Song fbb64aa698 [BPF] extend BTF_KIND_FUNC to cover global, static and extern funcs
Previously extern function is added as BTF_KIND_VAR. This does not work
well with existing BTF infrastructure as function expected to use
BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO.

This patch added extern function to BTF_KIND_FUNC. The two bits 0:1
of btf_type.info are used to indicate what kind of function it is:
  0: static
  1: global
  2: extern

Differential Revision: https://reviews.llvm.org/D71638
2020-01-10 09:06:31 -08:00
Fangrui Song 3d87d0b925 [MC] Add parameter `Address` to MCInstrPrinter::printInstruction
Follow-up of D72172.

Reviewed By: jhenderson, rnk

Differential Revision: https://reviews.llvm.org/D72180
2020-01-06 20:44:14 -08:00
Fangrui Song aa708763d3 [MC] Add parameter `Address` to MCInstPrinter::printInst
printInst prints a branch/call instruction as `b offset` (there are many
variants on various targets) instead of `b address`.

It is a convention to use address instead of offset in most external
symbolizers/disassemblers. This difference makes `llvm-objdump -d`
output unsatisfactory.

Add `uint64_t Address` to printInst(), so that it can pass the argument to
printInstruction(). `raw_ostream &OS` is moved to the last to be
consistent with other print* methods.

The next step is to pass `Address` to printInstruction() (generated by
tablegen from the instruction set description). We can gradually migrate
targets to print addresses instead of offsets.

In any case, downstream projects which don't know `Address` can pass 0 as
the argument.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72172
2020-01-06 20:42:22 -08:00
Yonghong Song ffd57408ef [BPF] Enable relocation location for load/store/shifts
Previous btf field relocation is always at assignment like
   r1 = 4
which is converted from an ld_imm64 instruction.

This patch did an optimization such that relocation
instruction might be load/store/shift. Specically, the
following insns may also have relocation, except BPF_MOV:
  LDB, LDH, LDW, LDD, STB, STH, STW, STD,
  LDB32, LDH32, LDW32, STB32, STH32, STW32,
  SLL, SRL, SRA

To accomplish this, a few BPF target specific
codegen only instructions are invented. They
are generated at backend BPF SimplifyPatchable phase,
which is at early llc phase when SSA form is available.
The new codegen only instructions will be converted to
real proper instructions at the codegen and BTF emission stage.

Note that, as revealed by a few tests, this optimization might
be actual generating more relocations:
Scenario 1:
  if (...) {
    ... __builtin_preserve_field_info(arg->b2, 0) ...
  } else {
    ... __builtin_preserve_field_info(arg->b2, 0) ...
  }
  Compiler could do CSE to only have one relocation. But if both
  of the above is translated into codegen internal instructions,
  the compiler will not be able to do that.
Scenario 2:
  offset = ... __builtin_preserve_field_info(arg->b2, 0) ...
  ...
  ...  offset ...
  ...  offset ...
  ...  offset ...
  For whatever reason, the compiler might be temporarily do copy
  propagation of the righthand of "offset" assignment like
  ...  __builtin_preserve_field_info(arg->b2, 0) ...
  ...  __builtin_preserve_field_info(arg->b2, 0) ...
  and CSE will be able to deduplicate later.
  But if these intrinsics are converted to BPF pseudo instructions,
  they will not be able to get deduplicated.

I do not expect we have big instruction count difference.
It may actually reduce instruction count since now relocation
is in deeper insn dependency chain.
For example, for test offset-reloc-fieldinfo-2.ll, this patch
generates 7 instead of 6 relocations for non-alu32 mode, but it
actually reduced instruction count from 29 to 26.

Differential Revision: https://reviews.llvm.org/D71790
2019-12-26 09:07:39 -08:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
Yonghong Song 7d0e8930ed [BPF] put not-section-attribute externs into BTF ".extern" data section
Currently for extern variables with section attribute, those
BTF_KIND_VARs will not be placed in any DataSec. This is
inconvenient as any other generated BTF_KIND_VAR belongs to
one DataSec. This patch put these extern variables into
".extern" section so bpf loader can have a consistent
processing mechanism for all data sections and variables.
2019-12-10 11:45:17 -08:00
Yonghong Song 4448125007 [BPF] Support to emit debugInfo for extern variables
extern variable usage in BPF is different from traditional
pure user space application. Recent discussion in linux bpf
mailing list has two use cases where debug info types are
required to use extern variables:
  - extern types are required to have a suitable interface
    in libbpf (bpf loader) to provide kernel config parameters
    to bpf programs.
    https://lore.kernel.org/bpf/CAEf4BzYCNo5GeVGMhp3fhysQ=_axAf=23PtwaZs-yAyafmXC9g@mail.gmail.com/T/#t
  - extern types are required so kernel bpf verifier can
    verify program which uses external functions more precisely.
    This will make later link with actual external function no
    need to reverify.
    https://lore.kernel.org/bpf/87eez4odqp.fsf@toke.dk/T/#m8d5c3e87ffe7f2764e02d722cb0d8cbc136880ed

This patch added bpf support to consume such info into BTF,
which can then be used by bpf loader. Function processFuncPrototypes()
only adds extern function definitions into BTF. The functions
with actual definition have been added to BTF in some other places.

Differential Revision: https://reviews.llvm.org/D70697
2019-12-09 21:53:29 -08:00
Yonghong Song 5ea611daf9 [BPF] Support weak global variables for BTF
Generate types for global variables with "weak" attribute.
Keep allocation scope the same for both weak and non-weak
globals as ELF symbol table can determine whether a global
symbol is weak or not.

Differential Revision: https://reviews.llvm.org/D71162
2019-12-07 08:58:19 -08:00
Yonghong Song 6db023b99b [BPF] add "llvm." prefix to BPF internally created globals
Currently, BPF backend creates some global variables with name like
  <type_name>:<reloc_type>:<patch_imm>$<access_str>
to carry certain information to BPF backend.

With direct clang compilation, the following code in
   llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
is triggered and the above globals are emitted to the ELF file.
(clang enabled this as opt flag -faddrsig is on by default.)
   if (TM.Options.EmitAddrsig) {
    // Emit address-significance attributes for all globals.
    OutStreamer->EmitAddrsig();
    for (const GlobalValue &GV : M.global_values())
      if (!GV.use_empty() && !GV.isThreadLocal() &&
          !GV.hasDLLImportStorageClass() && !GV.getName().startswith("llvm.") &&
          !GV.hasAtLeastLocalUnnamedAddr())
        OutStreamer->EmitAddrsigSym(getSymbol(&GV));
  }
...
 10162: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND tcp_sock:0:2048$0:117
 10163: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND tcp_sock:0:2112$0:126:0
 10164: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND tcp_sock:1:8$0:31:6
...
While in llc, those globals are not emited since EmitAddrsig
default option is false for llc. The llc flag "-addrsig" can be used to
enable the above code.

This patch added "llvm." prefix to these internal globals so that
they can be ignored in the above codes and possible other
places.

Differential Revision: https://reviews.llvm.org/D70703
2019-11-25 21:34:46 -08:00
Yonghong Song 9e6aa81588 [BPF] Fix a recursion bug in BPF Peephole ZEXT optimization
Commit a0841dfe85 ("[BPF] Fix a bug in peephole optimization")
fixed a bug in peephole optimization. Recursion is introduced
to handle COPY and PHI instructions.

Unfortunately, multiple PHI instructions may form a cycle
and this will cause infinite recursion, eventual segfault.
For Commit a0841dfe85, I indeed tried a few loops to ensure
that I won't see the recursion, but I did not try with
complex control flows, which, as demonstrated with the test case
in this patch, may introduce PHI cycles.

This patch fixed the issue by introducing a set to remember
visited PHI instructions. This way, cycles can be properly
detected and handled.

Differential Revision: https://reviews.llvm.org/D70586
2019-11-22 08:05:43 -08:00
Tom Stellard ab411801b8 [cmake] Explicitly mark libraries defined in lib/ as "Component Libraries"
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO.  I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so.  Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraires:

1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so.  This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.

With this change, llvm_map_components_to_libnames(LIB_NAMES, "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.

2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set.  This is only true because libraries
defined lib lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.

I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:

- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON

Reviewers: beanz, smeenai, compnerd, phosek

Reviewed By: beanz

Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70179
2019-11-21 10:48:08 -08:00
Yonghong Song a0841dfe85 [BPF] Fix a bug in peephole optimization
One of current peephole optimiations is to remove SLL/SRL if
the sub register has been zero extended. This phase has two bugs
and one limitations.

First, for the physical subregister used in pseudo insn COPY
like below, it permits incorrect optimization.
    %0:gpr32 = COPY $w0
    ...
    %4:gpr = MOV_32_64 %0:gpr32
    %5:gpr = SLL_ri %4:gpr(tied-def 0), 32
    %6:gpr = SRA_ri %5:gpr(tied-def 0), 32
The $w0 could be from the return value of a previous function call
and its upper 32-bit value might contain some non-zero values.
The same applies to function arguments.

Second, the current code may permits removing SLL/SRA like below:
    %0:gpr32 = COPY $w0
    %1:gpr32 = COPY %0:gpr32
    ...
    %4:gpr = MOV_32_64 %1:gpr32
    %5:gpr = SLL_ri %4:gpr(tied-def 0), 32
    %6:gpr = SRA_ri %5:gpr(tied-def 0), 32
The reason is that it did not follow def-use chain to skip all
intermediate 32bit-to-32bit COPY instructions.

The current implementation is also very conservative for PHI
instructions. If any PHI insn component is another PHI or COPY insn,
it will just permit SLL/SRA.

This patch fixed the issue as follows:
 - During def/use chain traversal, if any physical register is read,
   SLL/SRA will be preserved as these physical registers are mostly
   from function return values or current function arguments.
 - Recursively visit all COPY and PHI instructions.
2019-11-20 15:19:59 -08:00
Simon Pilgrim f784ad8ff3 Fix uninitialized variable warning. NFCI. 2019-11-14 14:21:17 +00:00
Yonghong Song 166cdc0281 [BPF] generate BTF_KIND_VARs for all non-static globals
Enable to generate BTF_KIND_VARs for non-static
default-section globals which is not allowed previously.
Modified the existing test case to accommodate the new change.

Also removed unused linkage enum members VAR_GLOBAL_TENTATIVE and
VAR_GLOBAL_EXTERNAL.

Differential Revision: https://reviews.llvm.org/D70145
2019-11-12 14:34:08 -08:00
Matt Arsenault e6c9a9af39 Use MCRegister in copyPhysReg 2019-11-11 14:42:33 +05:30
Yonghong Song 6b8baf3062 [BPF] turn on -mattr=+alu32 for cpu version v3 and later
-mattr=+alu32 has shown good performance vs. without this attribute.
Based on discussion at
  https://lore.kernel.org/bpf/1ec37838-966f-ec0b-5223-ca9b6eb0860d@fb.com/T/#t
cpu version v3 should support -mattr=+alu32.
This patch enabled alu32 if cpu version is v3, either specified by user
or probed by the llvm.

Differential Revision: https://reviews.llvm.org/D69957
2019-11-07 22:08:46 -08:00
Yonghong Song 9f34447f3f [BPF] fix a use after free bug
Commit fff2721286 ("[BPF] Fix CO-RE bugs with bitfields")
fixed CO-RE handling bitfield issues. But the implementation
introduced a use after free bug. The "Base" of the intrinsic
might be freed so later on accessing the Type of "Base"
might access the freed memory. The failed test case,
  CodeGen/BPF/CORE/offset-reloc-middle-chain.ll
is exactly used to test such a case.

Similarly to previous attempt to remember Metadata etc,
remember "Base" pointee Alignment in advance to avoid
such use after free bug.
2019-11-04 22:20:23 -08:00
Yonghong Song fff2721286 [BPF] Fix CO-RE bugs with bitfields
bitfield handling is not robust with current implementation.
I have seen two issues as described below.

Issue 1:
  struct s {
    long long f1;
    char f2;
    char b1:1;
  } *p;
  The current approach will generate an access bit size
  56 (from b1 to the end of structure) which will be
  rejected as it is not power of 2.

Issue 2:
  struct s {
    char f1;
    char b1:3;
    char b2:5;
    char b3:6:
    char b4:2;
    char f2;
  };
  The LLVM will group 4 bitfields together with 2 bytes. But
  loading 2 bytes is not correct as it violates alignment
  requirement. Note that sometimes, LLVM breaks a large
  bitfield groups into multiple groups, but not in this case.

To resolve the above two issues, this patch takes a
different approach. The alignment for the structure is used
to construct the offset of the bitfield access. The bitfield
incurred memory access is an aligned memory access with alignment/size
equal to the alignment of the structure.
This also simplified the code.

This may not be the optimal memory access in terms of memory access
width. But this should be okay since extracting the bitfield value
will have the same amount of work regardless of what kind of
memory access width.

Differential Revision: https://reviews.llvm.org/D69837
2019-11-04 20:08:05 -08:00
Yonghong Song c430533771 [BPF] fix a bug in __builtin_preserve_field_info() with FIELD_BYTE_SIZE
During deriving proper bitfield access FIELD_BYTE_SIZE,
function Member->getStorageOffsetInBits() is used to
get llvm IR type storage offset in bits so that
the byte size can permit aligned loads/stores with previously
derived FIELD_BYTE_OFFSET.

The function should only be used with bitfield members and it will
assert if ASSERT is turned on during cmake build.
  Constant *getStorageOffsetInBits() const {
    assert(getTag() == dwarf::DW_TAG_member && isBitField());
    if (auto *C = cast_or_null<ConstantAsMetadata>(getExtraData()))
      return C->getValue();
    return nullptr;
  }

This patch fixed the issue by using Member->isBitField()
directly and a test case is added to cover this missing case.
This issue is discovered when running Andrii's linux kernel CO-RE
tests.

Differential Revision: https://reviews.llvm.org/D69761
2019-11-03 08:18:28 -08:00
Yonghong Song a27c998c00 [BPF] fix a CO-RE issue with -mattr=+alu32
Ilya Leoshkevich (<iii@linux.ibm.com>) reported an issue that
with -mattr=+alu32 CO-RE has a segfault in BPF MISimplifyPatchable
pass.

The pattern will be transformed by MISimplifyPatchable
pass looks like below:
  r5 = ld_imm64 @"b:0:0$0:0"
  r2 = ldw r5, 0
  ... r2 ... // use r2
The pass will remove the intermediate 'ldw' instruction
and replacing all r2 with r5 likes below:
  r5 = ld_imm64 @"b:0:0$0:0"
  ... r5 ... // use r5
Later, the ld_imm64 insn will be replaced with
  r5 = <patched immediate>
for field relocation purpose.

With -mattr=+alu32, the input code may become
  r5 = ld_imm64 @"b:0:0$0:0"
  w2 = ldw32 r5, 0
  ... w2 ... // use w2
Replacing "w2" with "r5" is incorrect and will
trigger compiler internal errors.

To fix the problem, if the register class of ldw* dest
register is sub_32, we just replace the original ldw*
register with:
  w2 = w5
Directly replacing all uses of w2 with in-place
constructed w5 for the use operand seems not working in all cases.

The latest kernel will have -mattr=+alu32 on by default,
so added this flag to all CORE tests.
Tested with latest kernel bpf-next branch as well with this patch.

Differential Revision: https://reviews.llvm.org/D69438
2019-10-25 14:27:25 -07:00
Mirko Brkusanin 4b63ca1379 [Mips] Use appropriate private label prefix based on Mips ABI
MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64
regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo
we can find out Mips ABI and pick appropriate prefix.

Tags: #llvm, #clang, #lldb

Differential Revision: https://reviews.llvm.org/D66795
2019-10-23 12:24:35 +02:00
Yonghong Song ee881197b0 [BPF] fix indirect call assembly code
Currently, for indirect call, the assembly code printed out as
  callx <imm>
This is not right, it should be
  callx <reg>

Fixed the issue with proper format.

Differential Revision: https://reviews.llvm.org/D69229

llvm-svn: 375386
2019-10-21 03:22:03 +00:00
Reid Kleckner 904cd3e06b Prune a LegacyDivergenceAnalysis and MachineLoopInfo include each
Now X86ISelLowering doesn't depend on many IR analyses.

llvm-svn: 375320
2019-10-19 01:31:09 +00:00
Guillaume Chatelet 882c43d703 [Alignment][NFC] Use Align for TargetFrameLowering/Subtarget
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68993

llvm-svn: 375084
2019-10-17 07:49:39 +00:00
Jiong Wang ec51851026 bpf: fix wrong truncation elimination when there is back-edge/loop
Currently, BPF backend is doing truncation elimination. If one truncation
is performed on a value defined by narrow loads, then it could be redundant
given BPF loads zero extend the destination register implicitly.

When the definition of the truncated value is a merging value (PHI node)
that could come from different code paths, then checks need to be done on
all possible code paths.

Above described optimization was introduced as r306685, however it doesn't
work when there is back-edge, for example when loop is used inside BPF
code.

For example for the following code, a zero-extended value should be stored
into b[i], but the "and reg, 0xffff" is wrongly eliminated which then
generates corrupted data.

void cal1(unsigned short *a, unsigned long *b, unsigned int k)
{
  unsigned short e;

  e = *a;
  for (unsigned int i = 0; i < k; i++) {
    b[i] = e;
    e = ~e;
  }
}

The reason is r306685 was trying to do the PHI node checks inside isel
DAG2DAG phase, and the checks are done on MachineInstr. This is actually
wrong, because MachineInstr is being built during isel phase and the
associated information is not completed yet. A quick search shows none
target other than BPF is access MachineInstr info during isel phase.

For an PHI node, when you reached it during isel phase, it may have all
predecessors linked, but not successors. It seems successors are linked to
PHI node only when doing SelectionDAGISel::FinishBasicBlock and this
happens later than PreprocessISelDAG hook.

Previously, BPF program doesn't allow loop, there is probably the reason
why this bug was not exposed.

This patch therefore fixes the bug by the following approach:
 - The existing truncation elimination code and the associated
   "load_to_vreg_" records are removed.
 - Instead, implement truncation elimination using MachineSSA pass, this
   is where all information are built, and keep the pass together with other
   similar peephole optimizations inside BPFMIPeephole.cpp. Redundant move
   elimination logic is updated accordingly.
 - Unit testcase included + no compilation errors for kernel BPF selftest.

Patch Review
===
Patch was sent to and reviewed by BPF community at:

  https://lore.kernel.org/bpf

Reported-by: David Beckett <david.beckett@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 375007
2019-10-16 15:27:59 +00:00
Kadir Cetinkaya dd37a26f6d Fix assertions disabled builds after rL374367
llvm-svn: 374372
2019-10-10 16:04:45 +00:00
Yonghong Song d46a6a9e68 [BPF] Remove relocation for patchable externs
Previously, patchable extern relocations are introduced to patch
external variables used for multi versioning in
compile once, run everywhere use case. The load instruction
will be converted into a move with an patchable immediate
which can be changed by bpf loader on the host.

The kernel verifier has evolved and is able to load
and propagate constant values, so compiler relocation
becomes unnecessary. This patch removed codes related to this.

Differential Revision: https://reviews.llvm.org/D68760

llvm-svn: 374367
2019-10-10 15:33:09 +00:00
Yonghong Song 05e46979d2 [BPF] do compile-once run-everywhere relocation for bitfields
A bpf specific clang intrinsic is introduced:
   u32 __builtin_preserve_field_info(member_access, info_kind)
Depending on info_kind, different information will
be returned to the program. A relocation is also
recorded for this builtin so that bpf loader can
patch the instruction on the target host.
This clang intrinsic is used to get certain information
to facilitate struct/union member relocations.

The offset relocation is extended by 4 bytes to
include relocation kind.
Currently supported relocation kinds are
 enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
 };
for __builtin_preserve_field_info. The old
access offset relocation is covered by
    FIELD_BYTE_OFFSET = 0.

An example:
struct s {
    int a;
    int b1:9;
    int b2:4;
};
enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
};

void bpf_probe_read(void *, unsigned, const void *);
int field_read(struct s *arg) {
  unsigned long long ull = 0;
  unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET);
  unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE);
 #ifdef USE_PROBE_READ
  bpf_probe_read(&ull, size, (const void *)arg + offset);
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  lshift = lshift + (size << 3) - 64;
 #endif
 #else
  switch(size) {
  case 1:
    ull = *(unsigned char *)((void *)arg + offset); break;
  case 2:
    ull = *(unsigned short *)((void *)arg + offset); break;
  case 4:
    ull = *(unsigned int *)((void *)arg + offset); break;
  case 8:
    ull = *(unsigned long long *)((void *)arg + offset); break;
  }
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #endif
  ull <<= lshift;
  if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS))
    return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
  return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
}

There is a minor overhead for bpf_probe_read() on big endian.

The code and relocation generated for field_read where bpf_probe_read() is
used to access argument data on little endian mode:
        r3 = r1
        r1 = 0
        r1 = 4  <=== relocation (FIELD_BYTE_OFFSET)
        r3 += r1
        r1 = r10
        r1 += -8
        r2 = 4  <=== relocation (FIELD_BYTE_SIZE)
        call bpf_probe_read
        r2 = 51 <=== relocation (FIELD_LSHIFT_U64)
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r2
        r2 = 60 <=== relocation (FIELD_RSHIFT_U64)
        r0 = r1
        r0 >>= r2
        r3 = 1  <=== relocation (FIELD_SIGNEDNESS)
        if r3 == 0 goto LBB0_2
        r1 s>>= r2
        r0 = r1
LBB0_2:
        exit

Compare to the above code between relocations FIELD_LSHIFT_U64 and
FIELD_LSHIFT_U64, the code with big endian mode has four more
instructions.
        r1 = 41   <=== relocation (FIELD_LSHIFT_U64)
        r6 += r1
        r6 += -64
        r6 <<= 32
        r6 >>= 32
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r6
        r2 = 60   <=== relocation (FIELD_RSHIFT_U64)

The code and relocation generated when using direct load.
        r2 = 0
        r3 = 4
        r4 = 4
        if r4 s> 3 goto LBB0_3
        if r4 == 1 goto LBB0_5
        if r4 == 2 goto LBB0_6
        goto LBB0_9
LBB0_6:                                 # %sw.bb1
        r1 += r3
        r2 = *(u16 *)(r1 + 0)
        goto LBB0_9
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
        if r4 == 8 goto LBB0_8
        goto LBB0_9
LBB0_8:                                 # %sw.bb9
        r1 += r3
        r2 = *(u64 *)(r1 + 0)
        goto LBB0_9
LBB0_5:                                 # %sw.bb
        r1 += r3
        r2 = *(u8 *)(r1 + 0)
        goto LBB0_9
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51
        r2 <<= r1
        r1 = 60
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

Considering verifier is able to do limited constant
propogation following branches. The following is the
code actually traversed.
        r2 = 0
        r3 = 4   <=== relocation
        r4 = 4   <=== relocation
        if r4 s> 3 goto LBB0_3
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51   <=== relocation
        r2 <<= r1
        r1 = 60   <=== relocation
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

For native load case, the load size is calculated to be the
same as the size of load width LLVM otherwise used to load
the value which is then used to extract the bitfield value.

Differential Revision: https://reviews.llvm.org/D67980

llvm-svn: 374099
2019-10-08 18:23:17 +00:00
Jordan Rose fdaa742174 Second attempt to add iterator_range::empty()
Doing this makes MSVC complain that `empty(someRange)` could refer to
either C++17's std::empty or LLVM's llvm::empty, which previously we
avoided via SFINAE because std::empty is defined in terms of an empty
member rather than begin and end. So, switch callers over to the new
method as it is added.

https://reviews.llvm.org/D68439

llvm-svn: 373935
2019-10-07 18:14:24 +00:00
Yonghong Song 02ac75092d [BPF] Handle offset reloc endpoint ending in the middle of chain properly
During studying support for bitfield, I found an issue for
an example like the one in test offset-reloc-middle-chain.ll.
  struct t1 { int c; };
  struct s1 { struct t1 b; };
  struct r1 { struct s1 a; };
  #define _(x) __builtin_preserve_access_index(x)
  void test1(void *p1, void *p2, void *p3);
  void test(struct r1 *arg) {
    struct s1 *ps = _(&arg->a);
    struct t1 *pt = _(&arg->a.b);
    int *pi = _(&arg->a.b.c);
    test1(ps, pt, pi);
  }

The IR looks like:
  %0 = llvm.preserve.struct.access(base, ...)
  %1 = llvm.preserve.struct.access(%0, ...)
  %2 = llvm.preserve.struct.access(%1, ...)
  using %0, %1 and %2

In this case, we need to generate three relocatiions
corresponding to chains: (%0), (%0, %1) and (%0, %1, %2).
After collecting all the chains, the current implementation
process each chain (in a map) with code generation sequentially.
For example, after (%0) is processed, the code may look like:
  %0 = base + special_global_variable
  // llvm.preserve.struct.access(base, ...) is delisted
  // from the instruction stream.
  %1 = llvm.preserve.struct.access(%0, ...)
  %2 = llvm.preserve.struct.access(%1, ...)
  using %0, %1 and %2

When processing chain (%0, %1), the current implementation
tries to visit intrinsic llvm.preserve.struct.access(base, ...)
to get some of its properties and this caused segfault.

This patch fixed the issue by remembering all necessary
information (kind, metadata, access_index, base) during
analysis phase, so in code generation phase there is
no need to examine the intrinsic call instructions.
This also simplifies the code.

Differential Revision: https://reviews.llvm.org/D68389

llvm-svn: 373621
2019-10-03 16:30:29 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
Simon Pilgrim 93c8951147 [BPF] Remove unused variables. NFCI.
Fixes a dyn_cast<> null dereference warning.

llvm-svn: 372958
2019-09-26 10:55:57 +00:00
Yonghong Song 1487bf6c82 [BPF] Generate array dimension size properly for zero-size elements
Currently, if an array element type size is 0, the number of
array elements will be set to 0, regardless of what user
specified. This implementation is done in the beginning where
BTF is mostly used to calculate the member offset.

For example,
  struct s {};
  struct s1 {
        int b;
        struct s a[2];
  };
  struct s1 s1;
The BTF will have struct "s1" member "a" with element count 0.

Now BTF types are used for compile-once and run-everywhere
relocations and we need more precise type representation
for type comparison. Andrii reported the issue as there
are differences between original structure and BTF-generated
structure.

This patch made the change to correctly assign "2"
as the number elements of member "a".
Some dead codes related to ElemSize compuation are also removed.

Differential Revision: https://reviews.llvm.org/D67979

llvm-svn: 372785
2019-09-24 22:38:43 +00:00
Yonghong Song c68ee0ce70 [BPF] Permit all user instructed offset relocatiions
Currently, not all user specified relocations
(with clang intrinsic __builtin_preserve_access_index())
will turn into relocations.

In the current implementation, a __builtin_preserve_access_index()
chain is turned into relocation only if the result of the clang
intrinsic is used in a function call or a nonzero offset computation
of getelementptr. For all other cases, the relocatiion request
is ignored and the __builtin_preserve_access_index() is turned
into regular getelementptr instructions.
The main reason is to mimic bpf_probe_read() requirement.

But there are other use cases where relocatable offset is
generated but not used for bpf_probe_read(). This patch
relaxed previous constraints when to generate relocations.
Now, all user __builtin_preserve_access_index() will have
relocations generated.

Differential Revision: https://reviews.llvm.org/D67688

llvm-svn: 372198
2019-09-18 03:49:07 +00:00
Guillaume Chatelet ad1cea0dda [Alignment][NFC] Use Align with TargetLowering::setPrefFunctionAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67267

llvm-svn: 371212
2019-09-06 15:03:49 +00:00
Guillaume Chatelet 4fc3ad9e13 [Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67229

llvm-svn: 371200
2019-09-06 12:48:34 +00:00
Guillaume Chatelet aff45e4b23 [LLVM][Alignment] Make functions using log of alignment explicit
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:

 - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
 - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
 - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,

Reviewers: lattner, thegameg, courbet

Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65945

llvm-svn: 371045
2019-09-05 10:00:22 +00:00
Sam Clegg 90b6bb75e8 [MC] Minor cleanup to MCFixup::Kind handling. NFC.
Prefer `MCFixupKind` where possible and add getTargetKind() to
convert to `unsigned` when needed rather than scattering cast
operators around the place.

Differential Revision: https://reviews.llvm.org/D59890

llvm-svn: 369720
2019-08-23 01:00:55 +00:00
Daniel Sanders 0c47611131 Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Bill Wendling ebc2cf9c27 Use "isa" since the variable isn't used.
llvm-svn: 367985
2019-08-06 07:27:26 +00:00
Yonghong Song 37d24a696b [BPF] Handling type conversions correctly for CO-RE
With newly added debuginfo type
metadata for preserve_array_access_index() intrinsic,
this patch did the following two things:
 (1). checking validity before adding a new access index
      to the access chain.
 (2). calculating access byte offset in IR phase
      BPFAbstractMemberAccess instead of when BTF is emitted.

For (1), the metadata provided by all preserve_*_access_index()
intrinsics are used to check whether the to-be-added type
is a proper struct/union member or array element.

For (2), with all available metadata, calculating access byte
offset becomes easier in BPFAbstractMemberAccess IR phase.
This enables us to remove the unnecessary complexity in
BTFDebug.cpp.

New tests are added for
  . user explicit casting to array/structure/union
  . global variable (or its dereference) as the source of base
  . multi demensional arrays
  . array access given a base pointer
  . cases where we won't generate relocation if we cannot find
    type name.

Differential Revision: https://reviews.llvm.org/D65618

llvm-svn: 367735
2019-08-02 23:16:44 +00:00
Daniel Sanders 2bea69bf65 Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
llvm-svn: 367633
2019-08-01 23:27:28 +00:00
Yonghong Song 329abf2939 [BPF] fix typedef issue for offset relocation
Currently, the CO-RE offset relocation does not work
if any struct/union member or array element is a typedef.
For example,
  typedef const int arr_t[7];
  struct input {
      arr_t a;
  };
  func(...) {
       struct input *in = ...;
       ... __builtin_preserve_access_index(&in->a[1]) ...
  }
The BPF backend calculated default offset is 0 while
4 is the correct answer. Similar issues exist for struct/union
typedef's.

When getting struct/union member or array element type,
we should trace down to the type by skipping typedef
and qualifiers const/volatile as this is what clang did
to generate getelementptr instructions.
(const/volatile member type qualifiers are already
ignored by clang.)

This patch fixed this issue, for each access index,
skipping typedef and const/volatile/restrict BTF types.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D65259

llvm-svn: 367062
2019-07-25 21:47:27 +00:00
Yonghong Song d8efec97be [BPF] fix CO-RE incorrect index access string
Currently, we expect the CO-RE offset relocation records
a string encoding the original getelementptr access index,
so kernel bpf loader can decode it correctly.

For example,
  struct s { int a; int b; };
  struct t { int c; int d; };
  #define _(x) (__builtin_preserve_access_index(x))
  int get_value(const void *addr1, const void *addr2);
  int test(struct s *arg1, struct t *arg2) {
    return get_value(_(&arg1->b), _(&arg2->d));
  }

We expect two offset relocations:
  reloc 1: type s, access index 0, 1
  reloc 2: type t, access index 0, 1

Two globals are created to retain access indexes for the
above two relocations with global variable names.
The first global has a name "0:1:". Unfortunately,
the second global has the name "0:1:.1" as the llvm
internals automatically add suffix ".1" to a global
with the same name. Later on, the BPF peels the last
character and record "0:1" and "0:1:." in the
relocation table.

This is not desirable. BPF backend could use the global
variable suffix knowledge to generate correct access str.
This patch rather took an approach not relying on
that knowledge. It generates "s:0:1:" and "t:0:1:" to
avoid global variable suffixes and later on generate
correct index access string "0:1" for both records.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D65258

llvm-svn: 367030
2019-07-25 16:01:26 +00:00
Yonghong Song a1b2a27a38 [BPF] Fix a typo in the file name
Fixed the file name from BPFAbstrctMemberAccess.cpp to
BPFAbstractMemberAccess.cpp.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 365532
2019-07-09 18:35:46 +00:00
Yonghong Song d3d88d08b5 [BPF] Support for compile once and run everywhere
Introduction
============

This patch added intial support for bpf program compile once
and run everywhere (CO-RE).

The main motivation is for bpf program which depends on
kernel headers which may vary between different kernel versions.
The initial discussion can be found at https://lwn.net/Articles/773198/.

Currently, bpf program accesses kernel internal data structure
through bpf_probe_read() helper. The idea is to capture the
kernel data structure to be accessed through bpf_probe_read()
and relocate them on different kernel versions.

On each host, right before bpf program load, the bpfloader
will look at the types of the native linux through vmlinux BTF,
calculates proper access offset and patch the instruction.

To accommodate this, three intrinsic functions
   preserve_{array,union,struct}_access_index
are introduced which in clang will preserve the base pointer,
struct/union/array access_index and struct/union debuginfo type
information. Later, bpf IR pass can reconstruct the whole gep
access chains without looking at gep itself.

This patch did the following:
  . An IR pass is added to convert preserve_*_access_index to
    global variable who name encodes the getelementptr
    access pattern. The global variable has metadata
    attached to describe the corresponding struct/union
    debuginfo type.
  . An SimplifyPatchable MachineInstruction pass is added
    to remove unnecessary loads.
  . The BTF output pass is enhanced to generate relocation
    records located in .BTF.ext section.

Typical CO-RE also needs support of global variables which can
be assigned to different values to different hosts. For example,
kernel version can be used to guard different versions of codes.
This patch added the support for patchable externals as well.

Example
=======

The following is an example.

  struct pt_regs {
    long arg1;
    long arg2;
  };
  struct sk_buff {
    int i;
    struct net_device *dev;
  };

  #define _(x) (__builtin_preserve_access_index(x))
  static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) =
          (void *) 4;
  extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version;
  int bpf_prog(struct pt_regs *ctx) {
    struct net_device *dev = 0;

    // ctx->arg* does not need bpf_probe_read
    if (__kernel_version >= 41608)
      bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev));
    else
      bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev));
    return dev != 0;
  }

In the above, we want to translate the third argument of
bpf_probe_read() as relocations.

  -bash-4.4$ clang -target bpf -O2 -g -S trace.c

The compiler will generate two new subsections in .BTF.ext,
OffsetReloc and ExternReloc.
OffsetReloc is to record the structure member offset operations,
and ExternalReloc is to record the external globals where
only u8, u16, u32 and u64 are supported.

   BPFOffsetReloc Size
   struct SecLOffsetReloc for ELF section #1
   A number of struct BPFOffsetReloc for ELF section #1
   struct SecOffsetReloc for ELF section #2
   A number of struct BPFOffsetReloc for ELF section #2
   ...
   BPFExternReloc Size
   struct SecExternReloc for ELF section #1
   A number of struct BPFExternReloc for ELF section #1
   struct SecExternReloc for ELF section #2
   A number of struct BPFExternReloc for ELF section #2

  struct BPFOffsetReloc {
    uint32_t InsnOffset;    ///< Byte offset in this section
    uint32_t TypeID;        ///< TypeID for the relocation
    uint32_t OffsetNameOff; ///< The string to traverse types
  };

  struct BPFExternReloc {
    uint32_t InsnOffset;    ///< Byte offset in this section
    uint32_t ExternNameOff; ///< The string for external variable
  };

Note that only externs with attribute section ".BPF.patchable_externs"
are considered for Extern Reloc which will be patched by bpf loader
right before the load.

For the above test case, two offset records and one extern record
will be generated:
  OffsetReloc records:
        .long   .Ltmp12                 # Insn Offset
        .long   7                       # TypeId
        .long   242                     # Type Decode String
        .long   .Ltmp18                 # Insn Offset
        .long   7                       # TypeId
        .long   242                     # Type Decode String

  ExternReloc record:
        .long   .Ltmp5                  # Insn Offset
        .long   165                     # External Variable

  In string table:
        .ascii  "0:1"                   # string offset=242
        .ascii  "__kernel_version"      # string offset=165

The default member offset can be calculated as
    the 2nd member offset (0 representing the 1st member) of struct "sk_buff".

The asm code:
    .Ltmp5:
    .Ltmp6:
            r2 = 0
            r3 = 41608
    .Ltmp7:
    .Ltmp8:
            .loc    1 18 9 is_stmt 0        # t.c:18:9
    .Ltmp9:
            if r3 > r2 goto LBB0_2
    .Ltmp10:
    .Ltmp11:
            .loc    1 0 9                   # t.c:0:9
    .Ltmp12:
            r2 = 8
    .Ltmp13:
            .loc    1 19 66 is_stmt 1       # t.c:19:66
    .Ltmp14:
    .Ltmp15:
            r3 = *(u64 *)(r1 + 0)
            goto LBB0_3
    .Ltmp16:
    .Ltmp17:
    LBB0_2:
            .loc    1 0 66 is_stmt 0        # t.c:0:66
    .Ltmp18:
            r2 = 8
            .loc    1 21 66 is_stmt 1       # t.c:21:66
    .Ltmp19:
            r3 = *(u64 *)(r1 + 8)
    .Ltmp20:
    .Ltmp21:
    LBB0_3:
            .loc    1 0 66 is_stmt 0        # t.c:0:66
            r3 += r2
            r1 = r10
    .Ltmp22:
    .Ltmp23:
    .Ltmp24:
            r1 += -8
            r2 = 8
            call 4

For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number
8 is the structure offset based on the current BTF.
Loader needs to adjust it if it changes on the host.

For instruction .Ltmp5, "r2 = 0", the external variable
got a default value 0, loader needs to supply an appropriate
value for the particular host.

Compiling to generate object code and disassemble:
   0000000000000000 bpf_prog:
           0:       b7 02 00 00 00 00 00 00         r2 = 0
           1:       7b 2a f8 ff 00 00 00 00         *(u64 *)(r10 - 8) = r2
           2:       b7 02 00 00 00 00 00 00         r2 = 0
           3:       b7 03 00 00 88 a2 00 00         r3 = 41608
           4:       2d 23 03 00 00 00 00 00         if r3 > r2 goto +3 <LBB0_2>
           5:       b7 02 00 00 08 00 00 00         r2 = 8
           6:       79 13 00 00 00 00 00 00         r3 = *(u64 *)(r1 + 0)
           7:       05 00 02 00 00 00 00 00         goto +2 <LBB0_3>

    0000000000000040 LBB0_2:
           8:       b7 02 00 00 08 00 00 00         r2 = 8
           9:       79 13 08 00 00 00 00 00         r3 = *(u64 *)(r1 + 8)

    0000000000000050 LBB0_3:
          10:       0f 23 00 00 00 00 00 00         r3 += r2
          11:       bf a1 00 00 00 00 00 00         r1 = r10
          12:       07 01 00 00 f8 ff ff ff         r1 += -8
          13:       b7 02 00 00 08 00 00 00         r2 = 8
          14:       85 00 00 00 04 00 00 00         call 4

Instructions #2, #5 and #8 need relocation resoutions from the loader.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D61524

llvm-svn: 365503
2019-07-09 15:28:41 +00:00
Matt Arsenault e3a676e9ad CodeGen: Introduce a class for registers
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().

llvm-svn: 364191
2019-06-24 15:50:29 +00:00
Tom Stellard 4b0b26199b Revert CMake: Make most target symbols hidden by default
This reverts r362990 (git commit 374571301d)

This was causing linker warnings on Darwin:

ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)'
from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol
'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&),
std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)'
means the weak symbol cannot be overridden at runtime. This was likely caused by different translation
units being compiled with different visibility settings.

llvm-svn: 363028
2019-06-11 03:21:13 +00:00