X86TargetLowering::EmitLoweredSelect presently detects sequences of CMOV pseudo
instructions without accounting for debug intrinsics. This leads to different
codegen with and without option -g, if a DBG_VALUE instruction lands in the
middle of several lowered selects.
Work around this by skipping over debug instructions when looking for CMOV
sequences, and sinking those debug insts into the EmitLoweredSelect sunk block.
This might slightly shift where variables appear in the instruction sequence,
but won't re-order assignments.
Differential Revision: https://reviews.llvm.org/D58672
llvm-svn: 355307
The isScaledConstantInRange function takes upper and lower bounds which are
checked after dividing by the scale, so the bounds checks for half, single and
double precision should all be the same. Previously, we had wrong bounds checks
for half precision, so selected an immediate the instructions can't actually
represent.
Differential revision: https://reviews.llvm.org/D58822
llvm-svn: 355305
Summary:
Before when we implemented the first EH proposal, 'catch <tag>'
instruction may not catch an exception so there were multiple EH pads an
exception can unwind to. That means a BB could have multiple EH pad
successors.
Now after we switched to the new proposal, every 'catch' instruction
catches an exception, and there is only one catchpad per catchswitch, so
we at most have one EH pad successor, making `ThrowUnwindDest` map in
`WasmEHInfo` unnecessary.
Keeping `ThrowUnwindDest` map in `WasmEHInfo` has its own problems,
because other optimization passes can split a BB that contains possibly
throwing calls (previously invokes), and we have to update the map every
time that happens, which is not easy for common CodeGen passes.
This also correctly updates successor info in LateEHPrepare when we add
a rethrow instruction.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58486
llvm-svn: 355296
We were using VPBLENDW for v2i64 and VBLENDPD for v4i64. VPBLENDD has better throughput than VPBLENDW on some CPUs so it makes sense to use it when possible. VBLENDPD will probably become VBLENDD during execution domain fixing, but we might as well use integer in isel while we can.
This should work around some issues with the domain fixing pass prefering PBLENDW when we start with PBLENDW. There may still be some v8i16 cases that could use PBLENDD.
llvm-svn: 355281
Summary:
This prevents crashes in instruction selection when these operations
are used. The tests check that the scalar version of the instruction
is used where applicable, although some expansions do not use the
scalar version.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58859
llvm-svn: 355261
Summary:
This extends the variety of pattern that can generate a SHLD instead of using two shifts.
This fixes a regression that would be introduced by D57367 or D33587
Reviewers: RKSimon, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57389
llvm-svn: 355260
The new addressing mode added for the v8.2A FP16 instructions uses bit 8 of the
immediate to encode the sign of the offset, like the other FP loads/stores, so
need to be treated the same way.
Differential revision: https://reviews.llvm.org/D58816
llvm-svn: 355201
This function was not checking for the condition code variants which are
undefined if either input is NaN, so we were missing selection of the VSEL
instruction in some cases when using -fno-honor-nans or -ffast-math.
Differential revision: https://reviews.llvm.org/D58812
llvm-svn: 355199
There was a time when we couldn't dump target-specific flags such as
arm-sbrel etc, so the tests didn't check for them. We can now be more
specific in our tests.
llvm-svn: 355189
These were not recognized as potential atomics by memory legalizer.
The test was working not because legalizer did a right thing, but
because it has skipped all these instructions. When I have fixed
DS desciption test started to fail because region address has
changed from 4 to 2 a while ago.
Differential Revision: https://reviews.llvm.org/D58802
llvm-svn: 355179
Unsigned mul high for MIPS32 is selected into two PseudoInstructions:
PseudoMULTu and PseudoMFHI that use accumulator register class ACC64 for
some of its operands. Registers in this class have appropriate hi and lo
register as subregisters: $lo0 and $hi0 are subregisters of $ac0 etc.
mul instruction implicit-defs $lo0 and $hi0 according to MipsInstrInfo.td.
In functions where mul and PseudoMULTu are present fastRegisterAllocator
will "run out of registers during register allocation" because
'calcSpillCost' for $ac0 will return spillImpossible because subregisters
$lo0 and $hi0 of $ac0 are reserved by mul instruction above. A solution is
to mark implicit-defs of $lo0 and $hi0 as dead in mul instruction.
Differential Revision: https://reviews.llvm.org/D58715
llvm-svn: 355178
In certain cases, the first non-frame-setup instruction in a function is
a branch. For example, it could be a cbz on an argument. Make sure we
correctly allocate the UnwindHelp, and find an appropriate register to
use to initialize it.
Fixes https://bugs.llvm.org/show_bug.cgi?id=40184
Differential Revision: https://reviews.llvm.org/D58752
llvm-svn: 355136
This is another step towards ensuring that we produce the optimal code for reductions,
but there are other potential benefits as seen in the tests diffs:
1. Memory loads may get scalarized resulting in more efficient code.
2. Memory stores may get scalarized resulting in more efficient code.
3. Complex ops like fdiv/sqrt get scalarized which may be faster instructions depending on uarch.
4. Even simple ops like addss/subss/mulss/roundss may result in faster operation/less frequency throttling when scalarized depending on uarch.
The TODO comment suggests 1 or more follow-ups for opcodes that can currently result in regressions.
Differential Revision: https://reviews.llvm.org/D58282
llvm-svn: 355130
Support sub-register code-gen for XADD is like supporting any other Load
and Store patterns.
No new instruction is introduced.
lock *(u32 *)(r1 + 0) += w2
has exactly the same underlying insn as:
lock *(u32 *)(r1 + 0) += r2
BPF_W width modifier has guaranteed they behave the same at runtime. This
patch merely teaches BPF back-end that BPF_W width modifier could work
GPR32 register class and that's all needed for sub-register code-gen
support for XADD.
test/CodeGen/BPF/xadd.ll updated to include sub-register code-gen tests.
A new testcase test/CodeGen/BPF/xadd_legal.ll is added to make sure the
legal case could pass on all code-gen modes. It could also test dead Def
check on GPR32. If there is no proper handling like what has been done
inside BPFMIChecking.cpp:hasLivingDefs, then this testcase will fail.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 355126
Summary:
In the clang UI, replaces -mthread-model posix with -matomics as the
source of truth on threading. In the backend, replaces
-thread-model=posix with the atomics target feature, which is now
collected on the WebAssemblyTargetMachine along with all other used
features. These collected features will also be used to emit the
target features section in the future.
The default configuration for the backend is thread-model=posix and no
atomics, which was previously an invalid configuration. This change
makes the default valid because the thread model is ignored.
A side effect of this change is that objects are never emitted with
passive segments. It will instead be up to the linker to decide
whether sections should be active or passive based on whether atomics
are used in the final link.
Reviewers: aheejin, sbc100, dschuff
Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish, steven_wu, dexonsmith, rupprecht, jfb, jdoerfert, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D58742
llvm-svn: 355112
This extends the existing support for shufflevector to handle cases like
<2 x float>, which we can implement by concating the vectors and using a TBL1.
Differential Revision: https://reviews.llvm.org/D58684
llvm-svn: 355104
Move the stdu instruction in the prologue and epilogue.
This should provide a small performance boost in functions that are able to do
this. I've kept this change rather conservative at the moment and functions
with frame pointers or base pointers will not try to move the stack pointer
update.
Differential Revision: https://reviews.llvm.org/D42590
llvm-svn: 355085
Summary:
Turns out the test was not correct, I had to adjust the test to work. I
also added CHECK-LABELs for better error messages from FileCheck while
I'm here.
Reviewers: jsji
Subscribers: nemanjai, eraman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58614
llvm-svn: 355079
Add the same level of support as for ARM mode (i.e. still no TLS
support).
In most cases, it is sufficient to replace the opcodes with the
t2-equivalent, but there are some idiosyncrasies that I decided to
preserve because I don't understand the full implications:
* For ARM we use LDRi12 to load from constant pools, but for Thumb we
use t2LDRpci (I'm not sure if the ideal would be to use t2LDRi12 for
Thumb as well, or to use LDRcp for ARM).
* For Thumb we don't have an equivalent for MOV|LDRLIT_ga_pcrel_ldr, so
we have to generate MOV|LDRLIT_ga_pcrel plus a load from GOT.
The tests are in separate files because they're hard enough to read even
without doubling the number of checks.
llvm-svn: 355077
At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc..).
This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success.
To make sure this patch itself is NFC, it is build on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll)..
Previously landed
D57593 Fix a bug in the definition of isUnordered on MachineMemOperand
D57596 [CodeGen] Be conservative about atomic accesses as for volatile
D57802 Be conservative about unordered accesses for the moment
rL353959: [Tests] First batch of cornercase tests for unordered atomics.
rL353966: [Tests] RMW folding tests w/unordered atomic operations.
rL353972: [Tests] More unordered atomic lowering tests.
rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface
rL354740: [Hexagon, SystemZ] Be super conservative about atomics
rL354800: [Lanai] Be super conservative about atomics
rL354845: [ARM] Be super conservative about atomics
Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.)
Differential Revision: https://reviews.llvm.org/D57601
llvm-svn: 355025
Currently, the LLVM will print an error like
Unsupported relocation: try to compile with -O2 or above,
or check your static variable usage
if user defines more than one static variables in a single
ELF section (e.g., .bss or .data).
There is ongoing effort to support static and global
variables in libbpf and kernel. This patch removed the
assertion so user programs with static variables won't
fail compilation.
The static variable in-section offset is written to
the "imm" field of the corresponding to-be-relocated
bpf instruction. Below is an example to show how the
application (e.g., libbpf) can relate variable to relocations.
-bash-4.4$ cat g1.c
static volatile long a = 2;
static volatile int b = 3;
int test() { return a + b; }
-bash-4.4$ clang -target bpf -O2 -c g1.c
-bash-4.4$ llvm-readelf -r g1.o
Relocation section '.rel.text' at offset 0x158 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name
0000000000000000 0000000400000001 R_BPF_64_64 0000000000000000 .data
0000000000000018 0000000400000001 R_BPF_64_64 0000000000000000 .data
-bash-4.4$ llvm-readelf -s g1.o
Symbol table '.symtab' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS g1.c
2: 0000000000000000 8 OBJECT LOCAL DEFAULT 4 a
3: 0000000000000008 4 OBJECT LOCAL DEFAULT 4 b
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 64 FUNC GLOBAL DEFAULT 2 test
-bash-4.4$ llvm-objdump -d g1.o
g1.o: file format ELF64-BPF
Disassembly of section .text:
0000000000000000 test:
0: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
2: 79 11 00 00 00 00 00 00 r1 = *(u64 *)(r1 + 0)
3: 18 02 00 00 08 00 00 00 00 00 00 00 00 00 00 00 r2 = 8 ll
5: 61 20 00 00 00 00 00 00 r0 = *(u32 *)(r2 + 0)
6: 0f 10 00 00 00 00 00 00 r0 += r1
7: 95 00 00 00 00 00 00 00 exit
-bash-4.4$
. from symbol table, static variable "a" is in section #4, offset 0.
. from symbol table, static variable "b" is in section #4, offset 8.
. the first relocation is against symbol #4:
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
and in-section offset 0 (see llvm-objdump result)
. the second relocation is against symbol #4:
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
and in-section offset 8 (see llvm-objdump result)
. therefore, the first relocation is for variable "a", and
the second relocation is for variable "b".
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 354954
Summary:
When creating `ScopeTops` info for `try` ~ `catch` ~ `end_try`, we
should create not only `end_try` -> `try` mapping but also `catch` ->
`try` mapping as well. If this is not created, `block` and `end_block`
markers later added may span across an existing `catch`, resulting in
the incorrect code like:
```
try
block --| (X)
catch |
end_block --|
end_try
```
Reviewers: dschuff
Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58605
llvm-svn: 354945
Summary:
This removes unnecessary instructions after TRY marker placement. There
are two cases:
- `end`/`end_block` can be removed if they overlap with `try`/`end_try`
and they have the same return types.
- `br` right before `catch` that branches to after `end_try` can be
deleted.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58591
llvm-svn: 354939