Commit Graph

230 Commits

Reid Kleckner 4dc0b1ac60 Fix clang -Wimplicit-fallthrough warnings across llvm, NFC
This patch should not introduce any behavior changes. It consists
mostly of one of two changes:
1. Replacing fall through comments with the LLVM_FALLTHROUGH macro
2. Inserting 'break' before falling through into a case block consisting
   of only 'break'.
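
As a rough standalone sketch of the two patterns (not code from the patch;
LLVM_FALLTHROUGH here is a local stand-in that expands to [[fallthrough]],
matching what llvm/Support/Compiler.h provides on C++17 compilers):

  #ifndef LLVM_FALLTHROUGH
  #define LLVM_FALLTHROUGH [[fallthrough]]   // local stand-in for the LLVM macro
  #endif

  int classify(int Kind) {
    switch (Kind) {
    case 0:
      // was: // FALL THROUGH
      LLVM_FALLTHROUGH;   // change 1: annotate deliberate fall-through with the macro
    case 1:
      return 1;
    case 2:
      Kind = 3;
      break;              // change 2: inserted 'break' instead of falling into 'default: break;'
    default:
      break;
    }
    return Kind;
  }

  int main() { return classify(2) == 3 ? 0 : 1; }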

We were already using this warning with GCC, but its warning behaves
slightly differently. In this patch, the following differences are
relevant:
1. GCC recognizes comments that say "fall through" as annotations, clang
   doesn't
2. GCC doesn't warn on "case N: foo(); default: break;", clang does
3. GCC doesn't warn when the case contains a switch, but falls through
   the outer case.

I will enable the warning separately in a follow-up patch so that it can
be cleanly reverted if necessary.

Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu

Differential Revision: https://reviews.llvm.org/D53950

llvm-svn: 345882
2018-11-01 19:54:45 +00:00
Jonas Paulsson faad1b3056 [TargetRegisterInfo] Remove temporary hook enableMultipleCopyHints()
Finally all targets are enabling multiple regalloc hints, so the hook to
disable this can now be removed.

NFC.

Review: Simon Pilgrim
https://reviews.llvm.org/D52316

llvm-svn: 343851
2018-10-05 14:23:11 +00:00
Fangrui Song 10a2162588 Use unique_ptr to hold AsmInfo,MRI,MII,STI
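As a minimal standalone sketch of the ownership pattern the title describes
(stand-in types and a hypothetical holder struct, not the real MC classes or
the class touched by this patch):

  #include <memory>

  struct MCAsmInfo {};         // stand-ins for the real MC classes
  struct MCRegisterInfo {};
  struct MCInstrInfo {};
  struct MCSubtargetInfo {};

  struct ToolContext {         // hypothetical holder
    // previously raw pointers freed by hand; unique_ptr ties their
    // lifetime to the holder object
    std::unique_ptr<const MCAsmInfo> AsmInfo;
    std::unique_ptr<const MCRegisterInfo> MRI;
    std::unique_ptr<const MCInstrInfo> MII;
    std::unique_ptr<const MCSubtargetInfo> STI;
  };

  int main() {
    ToolContext Ctx;
    Ctx.AsmInfo = std::make_unique<const MCAsmInfo>();   // no manual delete needed
    return 0;
  }
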
Reviewers: pcc, dblaikie

Reviewed By: dblaikie

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52389

llvm-svn: 342945
2018-09-25 06:19:31 +00:00
Yonghong Song 150ca5143b bpf: check illegal usage of XADD insn return value
Currently, BPF has XADD (locked add) insn support and the
asm looks like:
  lock *(u32 *)(r1 + 0) += r2
  lock *(u64 *)(r1 + 0) += r2
The instruction itself does not have a return value.

At the source code level, users often use
  __sync_fetch_and_add()
which eventually translates to XADD. The return value of
__sync_fetch_and_add() is supposed to be the old value
in the xadd memory location. Since BPF::XADD insn does not
support such a return value, this patch added a PreEmit
phase to check such a usage. If such an illegal usage
pattern is detected, a fatal error will be reported like
  line 4: Invalid usage of the XADD return value
if compiled with -g, or
  Invalid usage of the XADD return value
if compiled without -g.
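
A minimal source-level sketch of the accepted and rejected patterns
(illustrative only; the variable and function names are made up):

  long counter;

  long bad(void) {
    // the old value returned by the atomic add is used -> the new check reports an error
    return __sync_fetch_and_add(&counter, 1);
  }

  void ok(void) {
    // return value ignored -> lowers to a plain XADD and is accepted
    __sync_fetch_and_add(&counter, 1);
  }

  int main(void) { ok(); return 0; }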

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 342692
2018-09-20 22:24:27 +00:00
Yonghong Song 5b476c5a9f [bpf] Symbol sizes and types in object file
Clang-compiled object files currently don't include the symbol sizes and
types.  Some tools however need that information.  For example, ctfconvert
uses that information to generate FreeBSD's CTF representation from ELF
files.
With this patch, symbol sizes and types are included in object files.

Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Reported-by: Yutaro Hayakawa <yhayakawa3720@gmail.com>
llvm-svn: 342556
2018-09-19 16:04:13 +00:00
Benjamin Kramer 27c769d28a [Target] Untangle disassemblers
Disassemblers cannot depend on main target headers. The same is true for
MCTargetDesc, but there's a lot more cleanup needed for that.

llvm-svn: 341822
2018-09-10 12:53:46 +00:00
Yonghong Song 48883142de bpf: fix an assertion in BPFAsmBackend applyFixup()
Fix bug https://bugs.llvm.org/show_bug.cgi?id=38643

In BPFAsmBackend applyFixup(), there is an assertion for FixedValue to be 0.
This may not be true, especially at optimization level 0.
For example, in the above bug, for the following two
static variables:
  @bpf_map_lookup_elem = internal global i8* (i8*, i8*)*
      inttoptr (i64 1 to i8* (i8*, i8*)*), align 8
  @bpf_map_update_elem = internal global i32 (i8*, i8*, i8*, i64)*
      inttoptr (i64 2 to i32 (i8*, i8*, i8*, i64)*), align 8
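
At the C level these correspond to the usual BPF helper-pointer declarations;
a sketch consistent with the IR above (not taken from the original bug report):

  static void *(*bpf_map_lookup_elem)(void *, void *) =
      (void *(*)(void *, void *))1;
  static int (*bpf_map_update_elem)(void *, void *, void *, unsigned long long) =
      (int (*)(void *, void *, void *, unsigned long long))2;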

The static variable @bpf_map_update_elem will have a symbol
offset of 8 and a FK_SecRel_8 with FixupValue 8 will cause
the assertion if llvm is built with -DLLVM_ENABLE_ASSERTIONS=ON.

The above relocations will not exist if the program is compiled
with optimization level -O1 and above as the compiler optimizes
those static variables away. In the below error message, -O2
is suggested as this is the common practice.

Note that FixedValue = 0 in applyFixup() does exist and is valid,
e.g., for the global variable my_map in the above bug. The bpf
loader will process them properly for map_id's before loading
the program into the kernel.

The static variables, which are not optimized away by compiler,
may have FK_SecRel_8 relocation with non-zero FixedValue.

The patch removes the offending assertion and issues a hard error
as below if the FixedValue in applyFixup() is not 0.
  $ llc -march=bpf -filetype=obj fixup.ll
  LLVM ERROR: Unsupported relocation: try to compile with -O2 or above,
      or check your static variable usage

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 340455
2018-08-22 21:21:03 +00:00
Yonghong Song 04ccfda075 bpf: add missing RegState to notify MachineInstr verifier necessary register usage
Errors like the following are reported by:

  http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/11261

  *** Bad machine code: Explicit definition marked as use ***
  - function:    cal_align1
  - basic block: %bb.0 entry (0x47edd98)
  - instruction: LDB $r3, $r2, 0
  - operand 0:   $r3

This is because RegState info was missing for ScratchReg inside
expandMEMCPY. This gave the MachineInstr verifier incomplete register
usage information, so it complains: there could be a potential code-gen
issue if the flagged MachineInstr were used somewhere that register usage
information matters, even though the memcpy expansion is not such a case,
as it happens at the last stage of the IR optimization pipeline.

We should always specify register usage information that the compiler
cannot deduce automatically whenever we add a hardware register manually.

Reported-by: Builder llvm-clang-x86_64-expensive-checks-win Build #11261
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 338134
2018-07-27 16:58:52 +00:00
Yonghong Song 71d81e5c8f bpf: new option -bpf-expand-memcpy-in-order to expand memcpy in order
Some BPF JIT backends would want to optimize memcpy in their own
architecture specific way.

However, at the moment, there is no way for JIT backends to see memcpy
semantics in a reliable way. This is because the LLVM BPF backend expands
memcpy into load/store sequences and may then schedule them apart from
each other. As a result, BPF JIT backends inside the kernel can't reliably
recognize memcpy semantics by peepholing the BPF sequence.

This patch introduces a new expansion infrastructure for the memcpy intrinsic.

To get a stable in-order load/store sequence from memcpy, we first lower
memcpy into a BPF::MEMCPY node, which is then expanded into in-order
load/store sequences in the expandPostRAPseudo pass, after instruction
scheduling. This way, kernel JIT backends can reliably recognize memcpy
by scanning the BPF sequence.

This new memcpy expand infrastructure is gated by a new option:

  -bpf-expand-memcpy-in-order

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 337977
2018-07-25 22:40:02 +00:00
Peter Smith 57f661bd7d [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred; to prevent code-generation errors, applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.

Differential Revision: https://reviews.llvm.org/D44928

llvm-svn: 334078
2018-06-06 09:40:06 +00:00
Amaury Sechet 8467411dad Set ADDE/ADDC/SUBE/SUBC to expand by default
Summary:
They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while.

Targets that use these opcodes are changed in order to ensure their behavior doesn't change.

Reviewers: efriedma, craig.topper, dblaikie, bkramer

Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D47422

llvm-svn: 333748
2018-06-01 13:21:33 +00:00
Peter Collingbourne dcd7d6c331 MC: Separate creating a generic object writer from creating a target object writer. NFCI.
With this we gain a little flexibility in how the generic object
writer is created.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47045

llvm-svn: 332868
2018-05-21 19:20:29 +00:00
Peter Collingbourne 571a3301ae MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI.
To make this work I needed to add an endianness field to MCAsmBackend
so that writeNopData() implementations know which endianness to use.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47035

llvm-svn: 332857
2018-05-21 17:57:19 +00:00
Peter Collingbourne e3f652973e Support: Simplify endian stream interface. NFCI.
Provide some free functions to reduce verbosity of endian-writing
a single value, and replace the endianness template parameter with
a field.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47032

llvm-svn: 332757
2018-05-18 19:46:24 +00:00
Peter Collingbourne f7b81db715 MC: Change the streamer ctors to take an object writer instead of a stream. NFCI.
The idea is that a client that wants split dwarf would create a
specific kind of object writer that creates two files, and use it to
create the streamer.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47050

llvm-svn: 332749
2018-05-18 18:26:45 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually change DOCS as the regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Shiva Chen 801bf7ebbe [DebugInfo] Examine all uses of isDebugValue() for debug instructions.
Because we create a new kind of debug instruction, DBG_LABEL, we need to
examine all passes which use isDebugValue() to decide whether a
MachineInstr is a debug instruction or not. When removing debug
instructions, we should remove both DBG_VALUE and DBG_LABEL. So, I
created a new function, isDebugInstr(), in MachineInstr to check whether
a MachineInstr is a debug instruction.

This patch has no new test case. I have run the regression tests and there
is no difference.

Differential Revision: https://reviews.llvm.org/D45342

Patch by Hsiangkai Wang.

llvm-svn: 331844
2018-05-09 02:42:00 +00:00
Nico Weber 5d53aed419 Consistently sort add_subdirectory calls in lib/Target/*/CMakeLists.txt
llvm-svn: 330584
2018-04-23 12:49:34 +00:00
Yonghong Song 149d4d3730 bpf: signal error instead of silent drop for certain invalid asm insn
Currently, an invalid asm insn, either in an asm file or in inline asm,
might be silently dropped. This patch fixes two places where this may
happen by signaling the error so the user knows what went wrong.

The following is an example to demonstrate error messages:

    -bash-4.2$ cat t.c
    int test(void *ctx) {
    #if defined(NO_ERROR)
      asm volatile("r0 = *(u16 *)skb[%0]" : : "i"(2));
    #elif defined(ERROR_1)
      asm volatile("r20 = *(u16 *)skb[%0]" : : "i"(2));
    #elif defined(ERROR_2)
      asm volatile("r0 = *(u16 *)(r1 + ?)" : :);
    #endif
      return 0;
    }
    -bash-4.2$ cat run.sh
    for macro in NO_ERROR ERROR_1 ERROR_2; do
      echo "===== compile for macro" $macro
      clang -D${macro} -O2 -target bpf -emit-llvm -S t.c
      echo "==llc=="
      llc -march=bpf -filetype=obj t.ll
    done
    -bash-4.2$ ./run.sh
    ===== compile for macro NO_ERROR
    ==llc==
    ===== compile for macro ERROR_1
    ==llc==
    <inline asm>:1:2: error: invalid register/token name
            r20 = *(u16 *)skb[2]
            ^
    note: !srcloc = 135
    ===== compile for macro ERROR_2
    ==llc==
    <inline asm>:1:21: error: unexpected token
            r0 = *(u16 *)(r1 + ?)
                               ^
    note: !srcloc = 210
    -bash-4.2$

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 329849
2018-04-11 20:24:52 +00:00
Nico Weber 1cbd096914 Sort tablegen calls in lib/Target/*/CMakeLists.
Makes it easier to see mistakes such as the one fixed in r329178 and makes
the different target CMakeLists more consistent.

Also remove some stale-looking comments from the Nios2 target cmakefile.

No intended behavior change.

llvm-svn: 329181
2018-04-04 12:37:44 +00:00
Yonghong Song d3b522f519 bpf: fix incorrect SELECT_CC lowering
Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC")
intended to improve code quality for certain jmp conditions. The
commit, however, has a couple of issues:
  (1). In the code, just swapping the operands is not enough; the
       condition code CC should also be swapped, otherwise incorrect
       code will be generated.
  (2). The condition code swap should be subject to getHasJmpExt().
       If getHasJmpExt() is false, certain condition codes are not
       supported and the swap may generate incorrect code.
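
Point (1) is the usual rule that swapping the operands of a comparison also
requires swapping the condition (e.g. '<' becomes '>'); a trivial standalone
illustration, not BPF-specific:

  #include <cassert>

  int main() {
    int a = 1, b = 2;
    assert((a < b) == (b > a));   // swap operands and condition together: same result
    assert((a < b) != (b < a));   // swapping only the operands changes the result
    return 0;
  }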

The original goal of this patch is to optimize jmp operations
which do not have JmpExt turned on. If JmpExt is on, better code
can be generated. For example, the test select_ri.ll is introduced
to demonstrate the optimization; the same result can be achieved
with the -mcpu=v2 flag.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 329043
2018-04-03 03:56:37 +00:00
Fangrui Song 956ee79795 Fix a bunch of typos. NFC
llvm-svn: 328907
2018-03-30 22:22:31 +00:00
Craig Topper 2fa1436206 [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.

The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there are only a few primitive types, we can just print them as strings directly.

Differential Revision: https://reviews.llvm.org/D45017

llvm-svn: 328806
2018-03-29 17:21:10 +00:00
David Blaikie b3f471a4bd Remove some unused includes to fix layering.
llvm-svn: 328745
2018-03-29 00:29:45 +00:00
David Blaikie 36a0f226b1 Fix layering by moving ValueTypes.h from CodeGen to IR
ValueTypes.h is implemented in IR already.

llvm-svn: 328397
2018-03-23 23:58:31 +00:00
Yonghong Song 82bf8bcb4f bpf: Enhance debug information for peephole optimization passes
Add more debug information for peephole optimization passes.

These are only enabled in debug builds and can help with analyzing why
some optimization opportunities were missed.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327371
2018-03-13 06:47:07 +00:00
Yonghong Song e91802f336 bpf: New post-RA peephole optimization pass to eliminate bad RA codegen
This new pass eliminates identical moves:

  MOV rA, rA

This is particularly likely to happen when sub-register support is
enabled. The special type-cast insn MOV_32_64 involves different
register classes on src (i32) and dst (i64), and RA could generate
useless instructions because of this.

This pass could also serve as the base for further post-RA optimizations.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327370
2018-03-13 06:47:06 +00:00
Yonghong Song 80b882ecc5 bpf: Don't expand BSWAP on i32, promote it
Currently, there is no ALU32 bswap support in eBPF ISA.

BSWAP on i32 was set to EXPAND, which would need about eight instructions
for a single BSWAP.

It is more efficient to promote it to i64 and then do BSWAP on i64.
For eBPF programs, most of the promotions are zero extensions, which are
likely to be eliminated later by peephole optimizations.
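
A rough user-level illustration of why the promotion is sound (a hedged
sketch using compiler builtins; the backend itself works on SelectionDAG
nodes, not builtins):

  #include <cassert>
  #include <cstdint>

  int main() {
    uint32_t x = 0xAABBCCDDu;
    // promote to 64 bits, byte-swap there, then keep the interesting half:
    uint64_t viaI64 = __builtin_bswap64((uint64_t)x) >> 32;
    assert(viaI64 == __builtin_bswap32(x));   // matches a native 32-bit bswap
    return 0;
  }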

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327369
2018-03-13 06:47:05 +00:00
Yonghong Song 1d28a759d9 bpf: Support subregister definition check on PHI node
This patch relaxes the subregister definition check on PHI nodes.
Previously, we just canceled the optimization when the definition was a
PHI node, while actually we could further check the definitions of the
incoming values of the PHI node.

This helps catch more elimination opportunities.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327368
2018-03-13 06:47:04 +00:00
Yonghong Song c88bcdec43 bpf: Extends zero extension elimination beyond comparison instructions
The current zero extension elimination was restricted to operands of
comparison. It actually could be extended to more cases.

For example:

  int *inc_p (int *p, unsigned a)
  {
    return p + a;
  }

'a' will be promoted to i64 during addition, and the zero extension could
be eliminated as well.

For the elimination optimization, it should be much better to start
recognizing the candidate sequence from the SRL instruction instead of J*
instructions.

This patch makes it a generic zero extension elimination pass instead of
one restricted to comparisons.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327367
2018-03-13 06:47:03 +00:00
Yonghong Song 905d13c123 bpf: J*_RR should check both operands
There is a mistake in the current code: we "break" out of the optimization
when the first operand of J*_RR doesn't qualify for the elimination. This
caused some elimination opportunities to be missed, for example the one in
the testcase.

The code should just fall through to handle the second operand.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327366
2018-03-13 06:47:02 +00:00
Yonghong Song 89e47ac671 bpf: Tighten subregister definition check
The current subregister definition check stops after the MOV_32_64
instruction.

This means we assume all of the following instruction sequences
are safe to eliminate:

  MOV_32_64 rB, wA
  SLL_ri    rB, rB, 32
  SRL_ri    rB, rB, 32

However, this is *not* true. The source subregister wA of MOV_32_64 could
come from an implicit truncation of a 64-bit register, in which case the
high bits of the 64-bit register are not zeroed, so we can't eliminate the
above sequence.

For example, for i32_val, we shouldn't do the elimination:

  long long bar ();

  int foo (int b, int c)
  {
    unsigned int i32_val = (unsigned int) bar();

    if (i32_val < 10)
      return b;
    else
      return c;
  }

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327365
2018-03-13 06:47:00 +00:00
Yonghong Song 03e1c8b8f9 bpf: introduce -mattr=dwarfris to disable DwarfUsesRelocationsAcrossSections
Commit e4507fb8c94b ("bpf: disable DwarfUsesRelocationsAcrossSections")
disables MCAsmInfo DwarfUsesRelocationsAcrossSections unconditionally
so that dwarf will not use cross section (between dwarf and symbol table)
relocations. This new debug format enables pahole to dump structures
correctly as libdwarves.so does not have BPF backend support yet.

This new debug format, however, breaks bcc (https://github.com/iovisor/bcc)
source debug output, as llvm's in-memory Dwarf support has some issues
handling it. More specifically, with DwarfUsesRelocationsAcrossSections
disabled, the JIT compiler does not generate .debug_abbrev, and Dwarf
DIE (debug info entry) processing is not happy about this.

This patch introduces a new flag -mattr=dwarfris
(dwarf relocation in section) to disable DwarfUsesRelocationsAcrossSections.
DwarfUsesRelocationsAcrossSections is true by default.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 326505
2018-03-01 23:04:59 +00:00
Yonghong Song 60fed1fef0 bpf: New optimization pass for eliminating unnecessary i32 promotions
This pass performs peephole optimizations to clean up ugly code sequences
at the MachineInstruction layer.

Currently, the only optimization in this pass is to eliminate type promotion
sequences used for zero extending 32-bit subregisters to 64-bit registers.

If the compiler can prove the zero-extended source comes from a 32-bit
subregister, then it is safe to erase the promotion sequence, because the
upper half of the underlying 64-bit register was already zeroed implicitly.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325991
2018-02-23 23:49:32 +00:00
Yonghong Song ae961bb061 bpf: New decoder namespace for 32-bit subregister load/store
When -mattr=+alu32 is passed to the disassembler, use the decoder namespace
for 32-bit subregisters.

This is to disassemble load and store instructions in the preferred B format
as described in the previous commit:

      w = *(u8 *) (r + off) // BPF_LDX | BPF_B
      w = *(u16 *)(r + off) // BPF_LDX | BPF_H
      w = *(u32 *)(r + off) // BPF_LDX | BPF_W

      *(u8 *) (r + off) = w // BPF_STX | BPF_B
      *(u16 *)(r + off) = w // BPF_STX | BPF_H
      *(u32 *)(r + off) = w // BPF_STX | BPF_W

NOTE: all other instructions should still use the default decoder
      namespace.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325990
2018-02-23 23:49:31 +00:00
Yonghong Song ca31c3bb3f bpf: Enable 32-bit subregister support for -mattr=+alu32
After all those preparation patches, we can now enable 32-bit subregister
support once -mattr=+alu32 is specified.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325989
2018-02-23 23:49:30 +00:00
Yonghong Song fcd1e0f625 bpf: Support 32-bit subregister in various InstrInfo hooks
This patch supports 32-bit subregisters in three InstrInfo hooks, i.e.
copyPhysReg, loadRegFromStackSlot and storeRegToStackSlot.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325988
2018-02-23 23:49:29 +00:00
Yonghong Song b1a52bd756 bpf: New instruction patterns for 32-bit subregister load and store
The instruction mapping between eBPF/arm64/x86_64 are:

         eBPF              arm64        x86_64
LD1   BPF_LDX | BPF_B       ldrb        movzbl
LD2   BPF_LDX | BPF_H       ldrh        movzwl
LD4   BPF_LDX | BPF_W       ldr         movl

movzbl/movzwl/movl on x86_64 accept a 32-bit sub-register, for example %eax;
the same goes for ldrb/ldrh on arm64, which accept a 32-bit "w" register. In
fact these instructions only accept sub-registers. There is no point in
having LD1/2/4 (unsigned) for a 64-bit register, because on these arches the
upper 32 bits are guaranteed to be zeroed by hardware or the VM, so loading
into the smallest available register class is the best choice for maintaining
type information.

For eBPF we should adopt the same philosophy and change the current
format (A):

  r = *(u8 *) (r + off) // BPF_LDX | BPF_B
  r = *(u16 *)(r + off) // BPF_LDX | BPF_H
  r = *(u32 *)(r + off) // BPF_LDX | BPF_W

  *(u8 *) (r + off) = r // BPF_STX | BPF_B
  *(u16 *)(r + off) = r // BPF_STX | BPF_H
  *(u32 *)(r + off) = r // BPF_STX | BPF_W

into B:

  w = *(u8 *) (r + off) // BPF_LDX | BPF_B
  w = *(u16 *)(r + off) // BPF_LDX | BPF_H
  w = *(u32 *)(r + off) // BPF_LDX | BPF_W

  *(u8 *) (r + off) = w // BPF_STX | BPF_B
  *(u16 *)(r + off) = w // BPF_STX | BPF_H
  *(u32 *)(r + off) = w // BPF_STX | BPF_W

There is no change to the encoding or to how these instructions are
interpreted; everything is as it was: load the specified length, write it
into the low bits of the register, then zero all remaining high bits.

The only change is their associated register class and how the compiler
views them.

Format A still needs to be kept, because the eBPF LLVM backend doesn't
support sub-registers by default, but once 32-bit subregister support is
enabled, it should use format B.

This patch implements this, together with all the necessary extended-load
and truncated-store patterns.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325987
2018-02-23 23:49:28 +00:00
Yonghong Song 63cf273f55 bpf: Support i32 in getScalarShiftAmountTy method
The getScalarShiftAmountTy method should be implemented for the eBPF
backend to make sure the shift amount still gets the correct type once
32-bit subregister support is enabled.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325986
2018-02-23 23:49:26 +00:00
Yonghong Song 59fc805c7e bpf: Support condition comparison on i32
We need to support condition comparison on i32. All these comparisons are
supposed to be combined into BPF_J* instructions which only support i64.

For ISD::BR_CC we need to promote it to i64 first, then do custom lowering.

For ISD::SET_CC, just expand to SELECT_CC like what's been done for i64.

For ISD::SELECT_CC, we also want to do custom lowering for i32. However,
once 32-bit subregister support is enabled, it is possible that the
comparison operands are i32 while the selected values are i64, or the
comparison operands are i64 while the selected values are i32. We need to
define extra instruction patterns and support them in the custom
instruction inserter.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325985
2018-02-23 23:49:25 +00:00
Yonghong Song 219156cff0 bpf: Handle i32 for ALU operations without ISA support
There is no eBPF ISA support for BSWAP, ROTR, ROTL, SREM, SDIVREM, MULHU,
ADDC, ADDE etc. on i32.

They can be emulated by other basic BPF_ALU operations, so we set their
lowering action to be the same as for i64.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325984
2018-02-23 23:49:24 +00:00
Yonghong Song 07a7a41753 bpf: New calling convention for 32-bit subregisters
This patch adds new calling conventions to allow GPR32RegClass as a valid
register class for arguments and return types.

The new calling convention will only be chosen when -mattr=+alu32 is
specified.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325983
2018-02-23 23:49:23 +00:00
Yonghong Song 42389377d8 bpf: New target attribute "alu32" for 32-bit subregister support
This new attribute controls the enablement of 32-bit subregister support
in the eBPF backend.

The interface is named "alu32" because we particularly want to enable the
generation of BPF_ALU32 instructions by enabling subregister support.

This attribute could be used in the following format with llc:

  llc -mtriple=bpf -mattr=[+|-]alu32

It is disabled by default.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325982
2018-02-23 23:49:22 +00:00
Yonghong Song 0252f35362 bpf: Define instruction patterns for extensions and truncations between i32 to i64
For transformations between i32 and i64, if it is an explicit sign extension:
  - first cast the operand to i64
  - then use SLL + SRA to finish the extension.

If it is an explicit zero extension:
  - first cast the operand to i64
  - then use SLL + SRL to finish the extension.

If it is an explicit any-extension:
  - just refer to the 64-bit register.

If it is an explicit truncation:
  - just refer to the 32-bit subregister.
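
As a rough standalone illustration of the shift-based sequences (plain C++
arithmetic on a host, not the backend patterns themselves):

  #include <cassert>
  #include <cstdint>

  int main() {
    uint32_t w = 0x80000001u;
    int64_t  sext = (int64_t)((uint64_t)w << 32) >> 32;   // SLL + SRA: sign extension
    uint64_t zext = ((uint64_t)w << 32) >> 32;            // SLL + SRL: zero extension
    assert(sext == (int64_t)0xFFFFFFFF80000001ULL);
    assert(zext == 0x80000001ULL);
    return 0;
  }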

NOTE: Some of the zero extension sequences might be unnecessary; they will be
removed by a peephole pass at the MachineInstruction layer.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325981
2018-02-23 23:49:21 +00:00
Yonghong Song 3a564a8f6e bpf: Tighten the immediate predication for 32-bit alu instructions
These 32-bit ALU insn patterns, which take an immediate as one operand, were
initially added to enable AsmParser support, and the AsmMatcher uses the
"ins" and "outs" fields to deduce the operand constraints.

However, the instruction selector doesn't work the same way as the
AsmMatcher. The selector uses the "pattern" field, for which we were not
setting the predicate for immediate operands correctly.

Without this patch, i32 would effectively mean all i32 operands are valid,
both imm and gpr, while these patterns should allow imm only.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325980
2018-02-23 23:49:19 +00:00
Yonghong Song ec84e2f1b0 bpf: Use markSuperRegs to mark reserved registers
markSuperRegs is the canonical helper function used to mark reserved
registers. It marks any overlapping sub-registers automatically.

Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 325979
2018-02-23 23:49:18 +00:00
Yonghong Song 9fdd139b41 bpf: disable DwarfUsesRelocationsAcrossSections
pahole does not work properly with the BPF backend:

  -bash-4.2$ cat test.c
  struct test_t {
    int a;
    int b;
  };
  int test(struct test_t *s) {
    return s->a;
  }
  -bash-4.2$ clang -g -O2 -target bpf -c test.c
  -bash-4.2$ pahole test.o
  struct clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) {
          clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) clang version 7.0.0 (trunk 325446) (llvm/trunk 325464); /*     0     4 */
          clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) clang version 7.0.0 (trunk 325446) (llvm/trunk 325464); /*     4     4 */

          /* size: 8, cachelines: 1, members: 2 */
          /* last cacheline: 8 bytes */
  };
  -bash-4.2$

The reason is that the BPF backend is not yet implemented in the elfutils
backends
  https://github.com/threatstack/elfutils/tree/master/backends
and pahole depends on elfutils for dwarf parsing and relocation resolution.

More specifically, the unsupported relocations in .debug_info for type/member
names against the symbol table caused the incorrect result above. The
following is the raw .rel.debug_info for the above example:
  Hex dump of section '.rel.debug_info':
    0x00000000 06000000 00000000 0a000000 0b000000 ................
    0x00000010 0c000000 00000000 0a000000 01000000 ................
    0x00000020 12000000 00000000 0a000000 02000000 ................
    0x00000030 16000000 00000000 0a000000 0e000000 ................
    0x00000040 1a000000 00000000 0a000000 03000000 ................
               ----------------- -------- --------
                reloc location     type   symtab index

  Hex dump of section '.debug_info':
    0x00000000 7b000000 04000000 00000801 00000000 {...............
    0x00000010 0c000000 00000000 00000000 00000000 ................
    0x00000020 00000000 00001000 00000200 00000000 ................

Based on "type", the proper value will be extracted from symbol table
and filled in .debug_info so later on .debug_info can be properly
resolved against debug strings.

There are two ways to fix this problem. One is to fix elfutils by adding
BPF support, which is desirable but could take a long time and won't work
with already deployed pahole binaries. As a short-term workaround, we can
disable dwarf cross-section relocation, which specifically avoids the
debug_info/symbol table cross relocation. This should help any dwarf-related
tool which has not implemented BPF-specific relocations yet.

Now .rel.debug_info does not have any relocations against the symbol table,
and .debug_info itself contains the necessary relocation information.
  Hex dump of section '.debug_info':
    0x00000000 7b000000 04000000 00000801 00000000 {...............
    0x00000010 0c003700 00000000 00003e00 00000000 ..7.......>.....
    0x00000020 00000000 00001000 00000200 00000000 ................
  Location 0xc has 0, 0x12 has 0x37, and 0x1a has 0x3e in place, which
  will be used in relocation resolution. Here, the values 0, 0x37 and 0x3e
  are offsets into the .debug_str section.
Please note the difference between two above .debug_info dumps.

With the fix, pahole works properly with BPF backend:
  -bash-4.2$ clang -O2 -g -target bpf -c test.c
  -bash-4.2$ pahole test.o
  struct test_t {
          int                        a;                    /*     0     4 */
          int                        b;                    /*     4     4 */

          /* size: 8, cachelines: 1, members: 2 */
          /* last cacheline: 8 bytes */
  };

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325735
2018-02-21 22:59:14 +00:00
Jonas Paulsson 891789c299 [BPF] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Yonghong Song
llvm-svn: 325457
2018-02-18 10:09:54 +00:00
Yonghong Song 920df52a93 bpf: fix a bug in dag2dag optimization for loads from readonly section
The reference '&' is missing in the function parameter. If there are
back-to-back optimizations on the DAG node list like the ones below:
  t29: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8*)+12](dereferenceable), zext from i32> t3, t43, undef:i64
  t34: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8*)](dereferenceable), zext from i32> t3, t41, undef:i64
The bug will trigger a segfault for the added test case remove_truncate_5.ll:
  LLVMSymbolizer: error reading file: No such file or directory
  #0 0x000000000241c4d9 (llc+0x241c4d9)
  #1 0x000000000241c56a (llc+0x241c56a)
  #2 0x000000000241aa50 (llc+0x241aa50)
  ...
  #22 0x0000000000fd5edf (llc+0xfd5edf)
  #23 0x00007f0fe03bec05 __libc_start_main (/lib64/libc.so.6+0x21c05)
  #24 0x0000000000fd3e69 (llc+0xfd3e69)
  ...
  Segmentation fault
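
As a generic standalone illustration of this class of bug (not the actual
BPF code): dropping the '&' copies the argument, so the callee's updates are
silently lost and later code sees stale state:

  #include <cassert>
  #include <vector>

  void addByValue(std::vector<int> Nodes) { Nodes.push_back(1); }   // '&' missing: mutates a copy
  void addByRef(std::vector<int> &Nodes)  { Nodes.push_back(1); }   // mutates the caller's list

  int main() {
    std::vector<int> Nodes;
    addByValue(Nodes);
    assert(Nodes.empty());        // the update was lost
    addByRef(Nodes);
    assert(Nodes.size() == 1);    // the update is visible
    return 0;
  }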

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325267
2018-02-15 17:06:45 +00:00
Yonghong Song f2075aef68 bpf: Improve expanding logic in LowerSELECT_CC
LowerSELECT_CC is not generating the optimal Select_Ri pattern at the moment.
It is not guaranteed to place the ConstantNode on the RHS, which can miss
matching Select_Ri.

A new testcase is added to the existing select_ri.ll; there is also an
existing case in cmp.ll which is improved to use Select_Ri after this patch,
and it is adjusted accordingly.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 324560
2018-02-08 04:37:49 +00:00