Commit Graph

28126 Commits

Author SHA1 Message Date
Jessica Paquette 607774c960 Recommit "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT"
After r355865, we should be able to safely select G_EXTRACT_VECTOR_ELT without
running into any problematic intrinsics.

Also add a fix for lane copies, which don't support index 0.

llvm-svn: 355871
2019-03-11 22:18:01 +00:00
Alex Bradbury 4d20cc21c7 [RISCV] Do a sign-extension in a compare-and-swap of 32 bit in RV64A
AtomicCmpSwapWithSuccess is legalised into an AtomicCmpSwap plus a comparison.
This requires an extension of the value which, by default, is a
zero-extension. When we later lower AtomicCmpSwap into a PseudoCmpXchg32 and then expanded in
RISCVExpandPseudoInsts.cpp, the lr.w instruction does a sign-extension.

This mismatch of extensions causes the comparison to fail when the compared
value is negative. This change overrides TargetLowering::getExtendForAtomicOps
for RISC-V so it does a sign-extension instead.

Differential Revision: https://reviews.llvm.org/D58829
Patch by Ferran Pallarès Roca.

llvm-svn: 355869
2019-03-11 21:41:22 +00:00
Jessica Paquette 42d16501e6 [GlobalISel][AArch64] Always fall back on aarch64.neon.addp.*
Overloaded intrinsics aren't necessarily safe for instruction selection. One
such intrinsic is aarch64.neon.addp.*.

This is a temporary workaround to ensure that we always fall back on that
intrinsic. Eventually this will be replaced with a proper solution.

https://bugs.llvm.org/show_bug.cgi?id=40968

Differential Revision: https://reviews.llvm.org/D59062

llvm-svn: 355865
2019-03-11 20:51:17 +00:00
Nikita Popov aa7cfa75f9 [SDAG][AArch64] Legalize VECREDUCE
Fixes https://bugs.llvm.org/show_bug.cgi?id=36796.

Implement basic legalizations (PromoteIntRes, PromoteIntOp,
ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes.
There are more legalizations missing (esp float legalizations),
but there's no way to test them right now, so I'm not adding them.

This also includes a few more changes to make this work somewhat
reasonably:

 * Add support for expanding VECREDUCE in SDAG. Usually
   experimental.vector.reduce is expanded prior to codegen, but if the
   target does have native vector reduce, it may of course still be
   necessary to expand due to legalization issues. This uses a shuffle
   reduction if possible, followed by a naive scalar reduction.
 * Allow the result type of integer VECREDUCE to be larger than the
   vector element type. For example we need to be able to reduce a v8i8
   into an (nominally) i32 result type on AArch64.
 * Use the vector operand type rather than the scalar result type to
   determine the action, so we can control exactly which vector types are
   supported. Also change the legalize vector op code to handle
   operations that only have vector operands, but no vector results, as
   is the case for VECREDUCE.
 * Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE),
   explicitly specify for which vector types the reductions are supported.

This does not handle anything related to VECREDUCE_STRICT_*.

Differential Revision: https://reviews.llvm.org/D58015

llvm-svn: 355860
2019-03-11 20:22:13 +00:00
Jonas Paulsson 8b8dc50e79 [RegAlloc] Avoid compile time regression with multiple copy hints.
As a fix for https://bugs.llvm.org/show_bug.cgi?id=40986 ("excessive compile
time building opencollada"), this patch makes sure that no phys reg is hinted
more than once from getRegAllocationHints().

This handles the case were many virtual registers are assigned to the same
physreg. The previous compile time fix (r343686) in weightCalcHelper() only
made sure that physical/virtual registers are passed no more than once to
addRegAllocationHint().

Review: Dimitry Andric, Quentin Colombet
https://reviews.llvm.org/D59201

llvm-svn: 355854
2019-03-11 19:00:37 +00:00
Simon Pilgrim 06ae025345 [X86] Extend widening comparison test.
Ensure we test both v2i16 unary and binary comparisons.

llvm-svn: 355849
2019-03-11 18:08:20 +00:00
Petar Jovanovic 28e13eb098 [MIPS][microMIPS] Add a pattern to match TruncIntFP
A pattern needed to match TruncIntFP was missing. This was causing multiple
tests from llvm test suite to fail during compilation for micromips.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D58722

llvm-svn: 355825
2019-03-11 14:13:31 +00:00
Petar Avramovic 5229f47f9f [MIPS GlobalISel] NarrowScalar G_UMULH
NarrowScalar G_UMULH in LegalizerHelper 
using multiplyRegisters helper function.
NarrowScalar G_UMULH for MIPS32.

Differential Revision: https://reviews.llvm.org/D58825

llvm-svn: 355815
2019-03-11 10:08:44 +00:00
Petar Avramovic 0b17e59b5c [MIPS GlobalISel] NarrowScalar G_MUL
Narrow Scalar G_MUL for MIPS32.
Revisit NarrowScalar implementation in LegalizerHelper.
Introduce new helper function multiplyRegisters.
It performs generic multiplication of values held in multiple registers.
Generated instructions use only types NarrowTy and i1.
Destination can be same or two times size of the source.

Differential Revision: https://reviews.llvm.org/D58824

llvm-svn: 355814
2019-03-11 10:00:17 +00:00
Craig Topper 00afa193f1 [X86] Enable sse2_cvtsd2ss intrinsic to use an EVEX encoded instruction.
llvm-svn: 355810
2019-03-11 06:01:04 +00:00
Amaury Sechet a5820cbd20 Add test case for add to sub post legalization. NFC
llvm-svn: 355797
2019-03-11 01:25:48 +00:00
Sanjay Patel 26e06e859e [x86] add x86-specific opcodes to extractelement scalarization list
llvm-svn: 355792
2019-03-10 18:56:21 +00:00
Craig Topper 93e15dfacc [X86] Make lowering of intrinsics with rounding mode stricter so that only valid rounding modes are lowered. Update tests accordingly
Many of our tests were not using valid rounding mode immediates. Clang verifies this in the frontend when it creates the intrinsics from builtins, but the backend would still lower invalid immediates.

With this change we will now leave them as intrinsics if the immediate is invalid. This will cause an isel selection failure.

llvm-svn: 355789
2019-03-10 17:20:45 +00:00
Nikita Popov bfec0d610c [AArch64] Add tests for saddsat/ssubsat; NFC
Signed versions of the existing unsigned tests.

llvm-svn: 355787
2019-03-10 12:21:36 +00:00
Nikita Popov 506c1aba4d [ARM] Use non-constant operand in umulo-32.ll; NFC
Currently the store+load is folded and both operands of the umulo
end up being constants. To avoid this getting folded away entirely,
make sure at least one operand is non-constant.

Also remove some allocas which don't seem relevant to the test.

llvm-svn: 355776
2019-03-09 13:43:21 +00:00
Nikita Popov 74dde7e5a1 [ARM] Generate test checks for umulo-32.ll; NFC
The second test case is going to be changed by D59041, so generate
full baseline checks.

llvm-svn: 355775
2019-03-09 13:21:15 +00:00
Alex Bradbury fea4957177 [RISCV] Support -target-abi at the MC layer and for codegen
This patch adds proper handling of -target-abi, as accepted by llvm-mc and
llc. Lowering (codegen) for the hard-float ABIs will follow in a subsequent
patch. However, this patch does add MC layer support for the hard float and
RVE ABIs (emission of the appropriate ELF flags
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#-file-header).

ABI parsing must be shared between codegen and the MC layer, so we add
computeTargetABI to RISCVUtils. A warning will be printed if an invalid or
unrecognized ABI is given.

Differential Revision: https://reviews.llvm.org/D59023

llvm-svn: 355771
2019-03-09 09:28:06 +00:00
Thomas Lively 972d7d514b [WebAssembly] Use named operands to identify loads and stores
Summary:
Uses the named operands tablegen feature to look up the indices of
offset, address, and p2align operands for all load and store
instructions. This replaces brittle, incorrect logic for identifying
loads and store when eliminating frame indices, which previously
crashed on bulk-memory ops. It also cleans up the SetP2Alignment pass.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59007

llvm-svn: 355770
2019-03-09 04:31:37 +00:00
Sanjay Patel 40bcc3de7d [x86] add tests for extract of FP select; NFC
llvm-svn: 355768
2019-03-09 02:11:05 +00:00
Craig Topper 69f8c1653d [ScalarizeMaskedMemIntrin] Use IRBuilder functions that take uint32_t/uint64_t for getelementptr, extractelement, and insertelement.
This saves needing to call getInt32 ourselves. Making the code a little shorter.

The test changes are because insert/extract use getInt64 internally. Shouldn't be a functional issue.

This cleanup because I plan to write similar code for expandload/compressstore.

llvm-svn: 355767
2019-03-09 02:08:41 +00:00
Amara Emerson 7a05d1c1f1 [AArch64][GlobalISel] Fix i1 arguments not being zero-extended as required by ABI.
Fixes PR41001.

llvm-svn: 355745
2019-03-08 22:17:00 +00:00
Sanjay Patel f84083b4db [x86] scalarize extract element 0 of FP cmp
An extension of D58282 noted in PR39665:
https://bugs.llvm.org/show_bug.cgi?id=39665

This doesn't answer the request to use movmsk, but that's an
independent problem. We need this and probably still need
scalarization of FP selects because we can't do that as a
target-independent transform (although it seems likely that
targets besides x86 should have this transform).

llvm-svn: 355741
2019-03-08 21:54:41 +00:00
Mitch Phillips 790edbc16e [HWASan] Save + print registers when tag mismatch occurs in AArch64.
Summary:
This change change the instrumentation to allow users to view the registers at the point at which tag mismatch occured. Most of the heavy lifting is done in the runtime library, where we save the registers to the stack and emit unwind information. This allows us to reduce the overhead, as very little additional work needs to be done in each __hwasan_check instance.

In this implementation, the fast path of __hwasan_check is unmodified. There are an additional 4 instructions (16B) emitted in the slow path in every __hwasan_check instance. This may increase binary size somewhat, but as most of the work is done in the runtime library, it's manageable.

The failure trace now contains a list of registers at the point of which the failure occured, in a format similar to that of Android's tombstones. It currently has the following format:

Registers where the failure occurred (pc 0x0055555561b4):
    x0  0000000000000014  x1  0000007ffffff6c0  x2  1100007ffffff6d0  x3  12000056ffffe025
    x4  0000007fff800000  x5  0000000000000014  x6  0000007fff800000  x7  0000000000000001
    x8  12000056ffffe020  x9  0200007700000000  x10 0200007700000000  x11 0000000000000000
    x12 0000007fffffdde0  x13 0000000000000000  x14 02b65b01f7a97490  x15 0000000000000000
    x16 0000007fb77376b8  x17 0000000000000012  x18 0000007fb7ed6000  x19 0000005555556078
    x20 0000007ffffff768  x21 0000007ffffff778  x22 0000000000000001  x23 0000000000000000
    x24 0000000000000000  x25 0000000000000000  x26 0000000000000000  x27 0000000000000000
    x28 0000000000000000  x29 0000007ffffff6f0  x30 00000055555561b4

... and prints after the dump of memory tags around the buggy address.

Every register is saved exactly as it was at the point where the tag mismatch occurs, with the exception of x16/x17. These registers are used in the tag mismatch calculation as scratch registers during __hwasan_check, and cannot be saved without affecting the fast path. As these registers are designated as scratch registers for linking, there should be no important information in them that could aid in debugging.

Reviewers: pcc, eugenis

Reviewed By: pcc, eugenis

Subscribers: srhines, kubamracek, mgorny, javed.absar, krytarowski, kristof.beyls, hiraditya, jdoerfert, llvm-commits, #sanitizers

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D58857

llvm-svn: 355738
2019-03-08 21:22:35 +00:00
Matt Arsenault e8c03a2511 AMDGPU: Move d16 load matching to preprocess step
When matching half of the build_vector to a load, there could still be
a hidden dependency on the other half of the build_vector the pattern
wouldn't detect. If there was an additional chain dependency on the
other value, a cycle could be introduced.

I don't think a tablegen pattern is capable of matching the necessary
conditions, so move this into PreprocessISelDAG. Check isPredecessorOf
for the other value to avoid a cycle. This has a warning that it's
expensive, so this should probably be moved into an MI pass eventually
that will have more freedom to reorder instructions to help match
this. That is currently complicated by the lack of a computeKnownBits
type mechanism for the selected function.

llvm-svn: 355731
2019-03-08 20:58:11 +00:00
Matt Arsenault 26e76ef0e2 DAG: Don't try to cluster loads with tied inputs
This avoids breaking possible value dependencies when sorting loads by
offset.

AMDGPU has some load instructions that write into the high or low bits
of the destination register, and have a tied input for the other input
bits. These can easily have the same base pointer, but be a swizzle so
the high address load needs to come first. This was inserting glue
forcing the opposite ordering, producing a cycle the InstrEmitter
would assert on. It may be potentially expensive to look for the
dependency between the other loads, so just skip any where this could
happen.

Fixes bug 40936 by reverting r351379, which added a hacky attempt to
fix this by adding chains in this case, which I think was just working
around broken glue before the InstrEmitter. The core of the patch is
re-implementing the fix for that problem.

llvm-svn: 355728
2019-03-08 20:46:15 +00:00
Sanjay Patel 43f098e719 [x86] add tests for extracted vector FP cmp; NFC
llvm-svn: 355727
2019-03-08 20:45:27 +00:00
Matt Arsenault 74c9c305e0 AMDGPU: Add more tests for d16 loads
Also fix a few cases that weren't testing what they were supposed to.

llvm-svn: 355724
2019-03-08 20:30:51 +00:00
Matt Arsenault 07f904befb AMDGPU: Correct DS implementation of areLoadsFromSameBasePtr
This was checking the wrong operands for the base register and the
offsets. The indexes are shifted by the number of output registers
from the machine instruction definition, and the chain is moved to the
end.

llvm-svn: 355722
2019-03-08 20:30:50 +00:00
Amaury Sechet 782ac933b5 [DAGCombiner] fold (add (add (xor a, -1), b), 1) -> (sub b, a)
Summary: This pattern is sometime created after legalization.

Reviewers: efriedma, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58874

llvm-svn: 355716
2019-03-08 19:39:32 +00:00
Sanjay Patel b22f438df3 [x86] prevent infinite looping from inverse shuffle transforms
llvm-svn: 355713
2019-03-08 19:20:28 +00:00
Simon Pilgrim 53652feab7 [X86] Add test case for PR22473
llvm-svn: 355712
2019-03-08 19:16:26 +00:00
Simon Pilgrim 00ab0339ed Fix typo in constant vector
llvm-svn: 355699
2019-03-08 15:17:26 +00:00
Clement Courbet 8e16d73346 [SelectionDAG] Allow the user to specify a memeq function.
Summary:
Right now, when we encounter a string equality check,
e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a
small compile-time constant, and fall back on calling `memcmp()` else.

This is sub-optimal because memcmp has to compute much more than
equality.

This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms
that support `bcmp`.

`bcmp` can be made much more efficient than `memcmp` because equality
compare is trivially parallel while lexicographic ordering has a chain
dependency.

Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits

Differential Revision: https://reviews.llvm.org/D56593

llvm-svn: 355672
2019-03-08 09:07:45 +00:00
Carl Ritson 1a98dc1840 [AMDGPU] V_CVT_F32_UBYTE{0,1,2,3} are full rate instructions
Summary: Fix a bug in the scheduling model where V_CVT_F32_UBYTE{0,1,2,3} are incorrectly marked as quarter rate instructions.

Reviewers: arsenm, rampitec

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59091

llvm-svn: 355671
2019-03-08 09:03:11 +00:00
Craig Topper 4505c99e72 [X86] Improve the type checking in isLegalMaskedLoad and isLegalMaskedGather.
We were just checking pointer size and type primitive size. But this caused unintended things like vectors of half being accepted by masked load/store.

For FP we now explicitly check for only double and float.

For pointers we now let any pointer through. Trusting that only 32 and 64 would be used to generate assembly.

We only check bitwidth after checking that the type is an integer.

llvm-svn: 355667
2019-03-08 07:33:43 +00:00
Sanjay Patel 5ed14ef1e4 [x86] add extract FP tests for target-specific nodes; NFC
llvm-svn: 355655
2019-03-07 23:55:54 +00:00
Konstantin Zhuravlyov 47f0bf8f1f AMDHSA: Code object v3 updates
- Copy kernel symbol attributes into kernel descriptor attributes
  - Make sure kernel symbol's visibility is not "higher" than protected

Differential Revision: https://reviews.llvm.org/D59057

llvm-svn: 355630
2019-03-07 19:58:29 +00:00
Vlad Tsyrklevich 2e1479e2f2 Delete x86_64 ShadowCallStack support
Summary:
ShadowCallStack on x86_64 suffered from the same racy security issues as
Return Flow Guard and had performance overhead as high as 13% depending
on the benchmark. x86_64 ShadowCallStack was always an experimental
feature and never shipped a runtime required to support it, as such
there are no expected downstream users.

Reviewers: pcc

Reviewed By: pcc

Subscribers: mgorny, javed.absar, hiraditya, jdoerfert, cfe-commits, #sanitizers, llvm-commits

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D59034

llvm-svn: 355624
2019-03-07 18:56:36 +00:00
David Green ffc922ec35 [LSR] Attempt to increase the accuracy of LSR's setup cost
In some loops, we end up generating loop induction variables that look like:
  {(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1}
As opposed to the simpler:
  {(zext i16 (%i0 * %i1) to i32),+,-1}
i.e we count up from -limit to 0, not the simpler counting down from limit to
0. This is because the scores, as LSR calculates them, are the same and the
second is filtered in place of the first. We end up with a redundant SUB from 0
in the code.

This patch tries to make the calculation of the setup cost a little more
thoroughly, recursing into the scev members to better approximate the setup
required. The cost function for comparing LSR costs is:

return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
                C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
       std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
                C2.ScaleCost, C2.ImmCost, C2.SetupCost);
So this will only alter results if none of the other variables turn out to be
different.

Differential Revision: https://reviews.llvm.org/D58770

llvm-svn: 355597
2019-03-07 13:44:40 +00:00
Petar Avramovic 3d3120dc9a [MIPS GlobalISel] Fix mul operands
Unsigned mul high for MIPS32 is selected into two PseudoInstructions:
PseudoMULTu and PseudoMFHI that use accumulator register class ACC64 for
some of its operands. Registers in this class have appropriate hi and lo
register as subregisters: $lo0 and $hi0 are subregisters of $ac0 etc.
mul instruction implicit-defs $lo0 and $hi0 according to MipsInstrInfo.td.
In functions where mul and PseudoMULTu are present fastRegisterAllocator
will "run out of registers during register allocation" because
'calcSpillCost' for $ac0 will return spillImpossible because subregisters
$lo0 and $hi0 of $ac0 are reserved by mul instruction above. A solution is
to mark implicit-defs of $lo0 and $hi0 as dead in mul instruction.

Differential Revision: https://reviews.llvm.org/D58715

llvm-svn: 355594
2019-03-07 13:28:29 +00:00
Craig Topper 3acc4236b8 [X86] Enable combineFMinNumFMaxNum for 512 bit vectors when AVX512 is enabled.
Simplified by just checking if the vector type is legal rather than listing all combinations of types and features.

Fixes PR40984.

llvm-svn: 355582
2019-03-07 06:30:19 +00:00
Craig Topper a0dd6e9a08 [X86] Add 512-bit fminnum/maxnum test cases for PR40984. Also add v8f32 minnum/maxnum tests. NFC
llvm-svn: 355581
2019-03-07 05:56:52 +00:00
Aakanksha Patil c56d2afc63 AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV)
A previous patch for "uniform-work-group-size" attribute was found to break
some RADV and possibly radeon SI tests and had to be retracted.
This patch fixes that.

Differential Revision: http://reviews.llvm.org/D58993

llvm-svn: 355574
2019-03-07 00:54:04 +00:00
Simon Atanasyan 83b88441ad [mips] Replace assertion by error message while lowering `RETURNADDR` and `FRAMEADDR`
MIPS target supports lowering `RETURNADDR` and `FRAMEADDR` for a current
frame only. It's better to show an error message then crash on assertion
if `__builtin_return_address` is invoked with non-zero argument.

llvm-svn: 355558
2019-03-06 22:40:28 +00:00
Abderrazek Zaafrani 5ced596198 [AArch64] Improve FP16 instruction selection for vector round and vector conver from half instructions
https://reviews.llvm.org/D58855

llvm-svn: 355545
2019-03-06 20:30:06 +00:00
Nikita Popov e1012e1efb [X86] Add vector mulo with power of two operand tests; NFC
llvm-svn: 355544
2019-03-06 20:25:49 +00:00
Paul Robinson 05efe0fdc4 [PS4] Emit a trap after a stack-protector fail call.
llvm-svn: 355542
2019-03-06 19:57:43 +00:00
Sanjay Patel f941631875 [AArch64] add tests for uaddsat/usubsat; NFC
The tests are copied from the sibling x86 test files.

llvm-svn: 355535
2019-03-06 19:02:01 +00:00
Simon Pilgrim 417f8c5be4 [PowerPC] Use real pointers instead of undef
The reduced test removed the pointer arguments, but to better survive D58017 and D58070 we need them back.

llvm-svn: 355532
2019-03-06 18:49:39 +00:00
Guozhi Wei 11308bdb43 [PPC] Adjust the computed branch offset for the possible shorter distance
In file PPCBranchSelector.cpp we tend to over estimate code size due to large
alignment and inline assembly. Usually it causes larger computed branch offset,
it is not big problem. But sometimes it may also causes smaller computed branch
offset than actual branch offset. If the offset is close to the limit of
encoding, it may cause problem at run time.
Following is a simplified example.

           actual        estimated
           address        address
 ...
bne Far      100            10c
.p2align 4
Near:        110            110
 ...
Far:        8108           8108

Actual offset:    0x8108 - 0x100 = 0x8008
Computed offset:  0x8108 - 0x10c = 0x7ffc

The computed offset is at most ((1 << alignment) - 4) bytes smaller than actual
offset. So we add this number to the offset for safety.

Differential Revision: https://reviews.llvm.org/D57718

llvm-svn: 355529
2019-03-06 18:22:22 +00:00
Krzysztof Parzyszek 9c005bbdd4 [Hexagon] Avoid creating 5-instruction packets with vgather pseudos
Change the resource usage of the vgather pseudos from SLOT0+LD to
SLOT0+SLOT1.

llvm-svn: 355524
2019-03-06 17:43:50 +00:00
Ryan Taylor 67f36903ae [AMDGPU] Add support for 64 bit buffer atomic artihmetic instructions
Summary:
This adds support for 64 bit buffer atomic arithmetic instructions but does not include
cmpswap as that depends on a fix to the way the register pairs are handled

Change-Id: Ib207ea65fb69487ccad5066ea647ae8ddfe2ce61

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58918

llvm-svn: 355520
2019-03-06 17:02:06 +00:00
Simon Pilgrim cdf95f8f07 [DAGCombiner] Enable UADDO/USUBO vector combine support
Differential Revision: https://reviews.llvm.org/D58965

llvm-svn: 355517
2019-03-06 16:11:03 +00:00
Alexander Kornienko 3d467a890e Revert "[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default"
This reverts commit 2a0f2c5ef3 (r355490).

The commit causes an assertion failure when compiling LLVM code:
$ cat repro.cpp
class QQQ {
public:
  bool x() const;
  bool y() const;
  unsigned getSizeInBits() const {
    if (y() || x())
      return getScalarSizeInBits();
    return getScalarSizeInBits() * 2;
  }
  unsigned getScalarSizeInBits() const;
};
int f(const QQQ &Ty) {
  switch (Ty.getSizeInBits()) {
    case 1:
    case 8:
      return 0;
    case 16:
      return 1;
    case 32:
      return 2;
    case 64:
      return 3;
    default:
      __builtin_unreachable();
  }
}
$ clang -O2 -o repro.o repro.cpp
assert.h assertion failed at llvm/include/llvm/ADT/ilist_iterator.h:139 in llvm::ilist_iterator::reference llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void>, true, false>::operator*() const [OptionsT = llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void>, IsReverse = true, IsConst = false]: !NodePtr->isKnownSentinel()
*** Check failure stack trace: ***
    @     0x558aab4afc10  __assert_fail
    @     0x558aa885479b  llvm::ilist_iterator<>::operator*()
    @     0x558aa8854715  llvm::MachineInstrBundleIterator<>::operator*()
    @     0x558aa92c33c3  llvm::X86InstrInfo::optimizeCompareInstr()
    @     0x558aa9a9c251  (anonymous namespace)::PeepholeOptimizer::optimizeCmpInstr()
    @     0x558aa9a9b371  (anonymous namespace)::PeepholeOptimizer::runOnMachineFunction()
    @     0x558aa99a4fc8  llvm::MachineFunctionPass::runOnFunction()
    @     0x558aab019fc4  llvm::FPPassManager::runOnFunction()
    @     0x558aab01a3a5  llvm::FPPassManager::runOnModule()
    @     0x558aab01aa9b  (anonymous namespace)::MPPassManager::runOnModule()
    @     0x558aab01a635  llvm::legacy::PassManagerImpl::run()
    @     0x558aab01afe1  llvm::legacy::PassManager::run()
    @     0x558aa5914769  (anonymous namespace)::EmitAssemblyHelper::EmitAssembly()
    @     0x558aa5910f44  clang::EmitBackendOutput()
    @     0x558aa5906135  clang::BackendConsumer::HandleTranslationUnit()
    @     0x558aa6d165ad  clang::ParseAST()
    @     0x558aa6a94e22  clang::ASTFrontendAction::ExecuteAction()
    @     0x558aa590255d  clang::CodeGenAction::ExecuteAction()
    @     0x558aa6a94840  clang::FrontendAction::Execute()
    @     0x558aa6a38cca  clang::CompilerInstance::ExecuteAction()
    @     0x558aa4e2294b  clang::ExecuteCompilerInvocation()
    @     0x558aa4df6200  cc1_main()
    @     0x558aa4e1b37f  ExecuteCC1Tool()
    @     0x558aa4e1a725  main
    @     0x7ff20d56abbd  __libc_start_main
    @     0x558aa4df51c9  _start

llvm-svn: 355515
2019-03-06 15:23:50 +00:00
Strahinja Petrovic 94fccc93de [PowerPC] Add secure plt support for TLS symbols
This patch supports secure plt mode for TLS symbols.

Differential Revision: https://reviews.llvm.org/D45520

llvm-svn: 355513
2019-03-06 15:00:10 +00:00
Simon Pilgrim 1bdc2d1874 [DAGCombiner] Add SADDO/SSUBO combine support
Basic constant handling folds, for both scalars and vectors

Differential Revision: https://reviews.llvm.org/D58967

llvm-svn: 355506
2019-03-06 14:22:21 +00:00
Roman Lebedev 4764310505 [X86][NFC] Autogenerate check lines in cmovcmov.ll test
Investigating 8-bit cmov promotion, this test comes up.

llvm-svn: 355496
2019-03-06 11:47:43 +00:00
Simon Pilgrim 642f53d292 [DAGCombiner] Enable SMULO/UMULO vector combine support (PR40442)
Differential Revision: https://reviews.llvm.org/D58968

llvm-svn: 355495
2019-03-06 11:04:21 +00:00
Simon Pilgrim 468bb2e601 [X86][SSE] VSELECT(XOR(Cond,-1), LHS, RHS) --> VSELECT(Cond, RHS, LHS)
As noticed on D58965

DAGCombiner::visitSELECT has something similar, so we should be able to move this to DAGCombiner and support VSELECT as well at some point.

Differential Revision: https://reviews.llvm.org/D58974

llvm-svn: 355494
2019-03-06 10:54:43 +00:00
Ayonam Ray 2a0f2c5ef3 [CodeGen] Omit range checks from jump tables when lowering switches with unreachable default
During the lowering of a switch that would result in the generation of a
jump table, a range check is performed before indexing into the jump
table, for the switch value being outside the jump table range and a
conditional branch is inserted to jump to the default block. In case the
default block is unreachable, this conditional jump can be omitted. This
patch implements omitting this conditional branch for unreachable
defaults.

Differential Revision: https://reviews.llvm.org/D52002
Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev

llvm-svn: 355490
2019-03-06 10:01:02 +00:00
Ayonam Ray af92b7a3b8 Reversing the commit of revision 355483 since it is giving a regression on a newly added test.
llvm-svn: 355487
2019-03-06 07:51:28 +00:00
Craig Topper c0e01d29a4 [X86] Enable the add with 128 -> sub with -128 encoding trick with X86ISD::ADD when the carry flag isn't used.
This allows us to use an 8-bit sign extended immediate instead of a 16 or 32 bit immediate.

Also do similar for 0x80000000 with 64-bit adds to avoid having to use a movabsq.

llvm-svn: 355485
2019-03-06 07:36:38 +00:00
Craig Topper 97a1c4c340 [X86] Suppress load folding for add/sub with 128 immediate.
128 won't fit in a sign extended 8-bit immediate, but we can negate it to -128 and use the other operation. This results in a shorter encoding since the move would have used 16 or 32 bits for the immediate.

llvm-svn: 355484
2019-03-06 07:36:36 +00:00
Ayonam Ray 6025fa8e30 [CodeGen] Omit range checks from jump tables when lowering switches with unreachable default
During the lowering of a switch that would result in the generation of a
jump table, a range check is performed before indexing into the jump
table, for the switch value being outside the jump table range and a
conditional branch is inserted to jump to the default block. In case the
default block is unreachable, this conditional jump can be omitted. This
patch implements omitting this conditional branch for unreachable
defaults.

Differential Revision: https://reviews.llvm.org/D52002
Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev

llvm-svn: 355483
2019-03-06 07:27:45 +00:00
Roman Lebedev 98d412ff13 [X86][NFC] Add proper test for promotion of i8 cmov's of trunc's
There was no proper test for that code in X86TargetLowering::LowerSELECT().
Noticed accidentally while trying to modify the last branch in that function.

llvm-svn: 355452
2019-03-05 22:43:53 +00:00
Heejin Ahn ef9d6aea45 [WebAssembly] Disable MachineBlockPlacement pass
Summary:
This pass hurts code size for wasm and sometimes generates irreducible
control flow.
Context: https://github.com/emscripten-core/emscripten/pull/8233

Reviewers: kripken, dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58953

llvm-svn: 355437
2019-03-05 20:35:34 +00:00
Roman Lebedev c38831e11d [NFC][CodeGen][X86][AArch64] Add tests for C++ std::midpoint() pattern (PR40965)
Tests only for integers, not floating point or pointers.

The scalar 8-bit case uses branch instead of CMOV,
because there is no no 8-bit CMOV.

Vector tests are for consistency, since it can be vectorized.

https://bugs.llvm.org/show_bug.cgi?id=40965

llvm-svn: 355436
2019-03-05 20:18:47 +00:00
Matt Arsenault 870397739e AMDGPU: Preserve undef flag when expanding SI_IF
Fixes undefined value verifier error.

llvm-svn: 355426
2019-03-05 18:38:00 +00:00
Craig Topper 4a9dd7c39b [X86] Enable 8-bit SHL to convert to LEA
Differential Revision: https://reviews.llvm.org/D58870

llvm-svn: 355425
2019-03-05 18:37:41 +00:00
Craig Topper 216bf7f03b [X86] Allow 8-bit INC/DEC to be converted to LEA.
We already do this for 16/32/64 as well as 8-bit add with register/immediate. Might as well do it for 8-bit INC/DEC too.

Differential Revision: https://reviews.llvm.org/D58869

llvm-svn: 355424
2019-03-05 18:37:37 +00:00
Craig Topper 572e94ca02 [X86] Enable 8-bit OR with disjoint bits to convert to LEA
We already support 8-bits adds in convertToThreeAddress. But we can also support 8-bit OR if the bits are disjoint. We already do this for 16/32/64.

Differential Revision: https://reviews.llvm.org/D58863

llvm-svn: 355423
2019-03-05 18:37:33 +00:00
Simon Pilgrim 40441aa86a [X86][SSE] Regenerate vector zero tests
llvm-svn: 355412
2019-03-05 16:52:14 +00:00
Jessica Paquette 00d5847b5c Revert "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT"
This broke test-suite::aarch64_neon_intrinsics.test

Reverting while I look into it.

Example failure:
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-quick/builds/17740

llvm-svn: 355408
2019-03-05 15:47:00 +00:00
Simon Pilgrim f011e53a78 [X86] Add SMULO/UMULO combine tests
Include scalar and vector test variants covering the folds in DAGCombiner (vector isn't currently supported - PR40442)

llvm-svn: 355407
2019-03-05 15:36:45 +00:00
Simon Pilgrim 65676571e1 Fix typo in constant vector
llvm-svn: 355405
2019-03-05 15:06:01 +00:00
Simon Pilgrim a3d06ccd5e [X86] Add SADDO/UADDO and SSUBO/USUBO combine tests
Include scalar and vector test variants covering the folds in DAGCombiner (vector isn't currently supported - PR40442)

llvm-svn: 355404
2019-03-05 14:52:42 +00:00
Simon Pilgrim 4d93b9c75c [X86] Add test cases for D58874
Add scalar and vector test cases for missing (add (add (xor a, -1), b), 1) -> (sub b, a) fold

llvm-svn: 355400
2019-03-05 13:52:09 +00:00
Carl Ritson 9e3f7d8ad0 [AMDGPU] Fix DPP operand order in atomic optimizer
Summary:
Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.

Change-Id: I631d050e1c00a3b4bc7c11a90437064403c4cf30

Reviewers: sheredom, tpr

Reviewed By: sheredom

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58900

llvm-svn: 355394
2019-03-05 12:21:44 +00:00
Oliver Stannard 4a9086b537 [ARM] Fix select_cc lowering for fp16
When lowering a select_cc node where the true and false values are of type f16,
we can't use a general conditional move because the FP16 instructions do not
support conditional execution. Instead, we must ensure that the condition code
is one of the four supported by the VSEL instruction.

Differential revision: https://reviews.llvm.org/D58813

llvm-svn: 355385
2019-03-05 10:42:34 +00:00
David Stuttard 81eec58a0d [AMDGPU] Omit KILL instructions from hazard recognizer
Summary:
In some cases the KILL was causing a hazard to be introduced as these were
scheduled into hazard slots, but don't result in an instruction.

KILL shouldn't be considered for hazard recognition.

Change-Id: Ib6d2a2160f8c94cd0ce611ab198c7e4f46aeffcf

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58898

llvm-svn: 355384
2019-03-05 10:25:16 +00:00
Chen Zheng 9cfe7e81f1 [PowerPC] fix killed/dead flag after convert x-form to d-form tranformation.
Differential Revision: https://reviews.llvm.org/D58428

llvm-svn: 355378
2019-03-05 04:56:54 +00:00
Yonghong Song d82247cb80 [BPF] Do not generate BTF sections unnecessarily
If There is no types/non-empty strings, do not generate
.BTF section. If there is no func_info/line_info, do
not generate .BTF.ext section.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D58936

llvm-svn: 355360
2019-03-05 01:01:21 +00:00
Florian Hahn fd2d89f98b Fix invalid target triples in tests. (NFC)
llvm-svn: 355349
2019-03-04 23:37:41 +00:00
Jessica Paquette caf62b1d47 [GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT
This adds instruction selection support for G_EXTRACT_VECTOR_ELT for cases
where the index is defined by a G_CONSTANT.

It also factos out the lane copy opcode selection part into its own function,
`getLaneCopyOpcode`. This is used by both `selectUnmergeValues` and
`selectExtractElt`.

Differential Revision: https://reviews.llvm.org/D58469

llvm-svn: 355344
2019-03-04 22:35:32 +00:00
Jessica Paquette 0632e12f89 [GlobalISel][AArch64] Legalize vector G_SELECT
Just scalarize it, and add a test showing it works.

Differential Revision: https://reviews.llvm.org/D58747

llvm-svn: 355339
2019-03-04 21:12:46 +00:00
Amara Emerson 8acb0d9c82 Re-commit r355104: "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1."
The code to materialize a mask from a constant pool load tried to use a 128 bit
LDR to load a 64 bit constant pool entry, which was 8 byte aligned. This resulted
in a link failure in the NEON tests in the test suite since the LDR address was
unaligned. This change fixes that to instead emit a 64 bit LDR if the entry is
64 bit, before converting back to a 128 bit register for the TBL.

llvm-svn: 355326
2019-03-04 19:16:00 +00:00
Craig Topper 509a8a3cf1 [DAGCombiner][X86][SystemZ][AArch64] Combine some cases of (bitcast (build_vector constants)) between legalize types and legalize dag.
This patch enables combining integer bitcasts of integer build vectors when the new scalar type is legal. I've avoided floating point because the implementation bitcasts float to int along the way and we would need to check the intermediate types for legality

Differential Revision: https://reviews.llvm.org/D58884

llvm-svn: 355324
2019-03-04 19:12:16 +00:00
Simon Pilgrim eeb1144d27 [X86] Regenerate illegal type load test with non-undef load address.
This would be affected by an upcoming patch without undoing some of the bugpoint reduction.

llvm-svn: 355316
2019-03-04 14:49:02 +00:00
Jeremy Morse 09d8ea5282 [X86] Avoid codegen changes when DBG_VALUE appears between lowered selects
X86TargetLowering::EmitLoweredSelect presently detects sequences of CMOV pseudo
instructions without accounting for debug intrinsics. This leads to different
codegen with and without option -g, if a DBG_VALUE instruction lands in the
middle of several lowered selects.

Work around this by skipping over debug instructions when looking for CMOV
sequences, and sinking those debug insts into the EmitLoweredSelect sunk block.
This might slightly shift where variables appear in the instruction sequence,
but won't re-order assignments.

Differential Revision: https://reviews.llvm.org/D58672

llvm-svn: 355307
2019-03-04 10:56:02 +00:00
Oliver Stannard 181afc7f3b [ARM] Fix selection of VLDR.16 instruction with imm offset
The isScaledConstantInRange function takes upper and lower bounds which are
checked after dividing by the scale, so the bounds checks for half, single and
double precision should all be the same. Previously, we had wrong bounds checks
for half precision, so selected an immediate the instructions can't actually
represent.

Differential revision: https://reviews.llvm.org/D58822

llvm-svn: 355305
2019-03-04 09:17:38 +00:00
Heejin Ahn 195a62e9ae [WebAssembly] Delete ThrowUnwindDest map from WasmEHFuncInfo
Summary:
Before when we implemented the first EH proposal, 'catch <tag>'
instruction may not catch an exception so there were multiple EH pads an
exception can unwind to. That means a BB could have multiple EH pad
successors.

Now after we switched to the new proposal, every 'catch' instruction
catches an exception, and there is only one catchpad per catchswitch, so
we at most have one EH pad successor, making `ThrowUnwindDest` map in
`WasmEHInfo` unnecessary.

Keeping `ThrowUnwindDest` map in `WasmEHInfo` has its own problems,
because other optimization passes can split a BB that contains possibly
throwing calls (previously invokes), and we have to update the map every
time that happens, which is not easy for common CodeGen passes.

This also correctly updates successor info in LateEHPrepare when we add
a rethrow instruction.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58486

llvm-svn: 355296
2019-03-03 22:35:56 +00:00
Craig Topper e9e4a0f5b4 [X86] Regenerate test to get the full FP operands printed. NFC
Missed when I updated the printer to print implicit %st operand on binops.

llvm-svn: 355295
2019-03-03 20:28:52 +00:00
Simon Pilgrim d8e91a54c0 [X86] getShuffleScalarElt - peek through insert/extract subvector nodes.
llvm-svn: 355288
2019-03-03 14:11:05 +00:00
Craig Topper ce68659772 [X86] Prefer VPBLENDD for v2i64/v4i64 blends with AVX2.
We were using VPBLENDW for v2i64 and VBLENDPD for v4i64. VPBLENDD has better throughput than VPBLENDW on some CPUs so it makes sense to use it when possible. VBLENDPD will probably become VBLENDD during execution domain fixing, but we might as well use integer in isel while we can.

This should work around some issues with the domain fixing pass prefering PBLENDW when we start with PBLENDW. There may still be some v8i16 cases that could use PBLENDD.

llvm-svn: 355281
2019-03-03 00:18:07 +00:00
Amaury Sechet 31291a403c Add test case for add to sub transformation. NFC
llvm-svn: 355269
2019-03-02 14:28:59 +00:00
Xing GUO 33649349c5 [Codegen] fix typos in test case
llvm-svn: 355264
2019-03-02 08:03:59 +00:00
Thomas Lively 43876ae7bc [WebAssembly] Expand operations not supported by SIMD
Summary:
This prevents crashes in instruction selection when these operations
are used. The tests check that the scalar version of the instruction
is used where applicable, although some expansions do not use the
scalar version.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58859

llvm-svn: 355261
2019-03-02 03:32:25 +00:00
Amaury Sechet f24abf6511 [X86] Improve use of SHLD/SHRD
Summary:
This extends the variety of pattern that can generate a SHLD instead of using two shifts.

This fixes a regression that would be introduced by D57367 or D33587

Reviewers: RKSimon, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57389

llvm-svn: 355260
2019-03-02 02:44:16 +00:00
Amaury Sechet 1cc0f6061f Add test case for truncate funnel shifts. NFC
llvm-svn: 355258
2019-03-02 02:24:36 +00:00
Vlad Tsyrklevich 8925138007 Revert "[MIPS GlobalISel] Fix mul operands"
This reverts commit r355178, it is causing ASan failures on the
sanitizer bots.

llvm-svn: 355219
2019-03-01 18:58:22 +00:00
Oliver Stannard 82fbbc21fd [ARM] Fix FP16 stack loads/stores for Thumb2 with frame pointer
The new addressing mode added for the v8.2A FP16 instructions uses bit 8 of the
immediate to encode the sign of the offset, like the other FP loads/stores, so
need to be treated the same way.

Differential revision: https://reviews.llvm.org/D58816

llvm-svn: 355201
2019-03-01 14:20:28 +00:00
Oliver Stannard e019e6223b [ARM] Consider undefined-on-NaN conditions in checkVSELConstraints
This function was not checking for the condition code variants which are
undefined if either input is NaN, so we were missing selection of the VSEL
instruction in some cases when using -fno-honor-nans or -ffast-math.

Differential revision: https://reviews.llvm.org/D58812

llvm-svn: 355199
2019-03-01 13:58:25 +00:00
Simon Pilgrim 1a059e6619 [X86] Regenerate legalize test files
Noticed while getting update_mir_test_checks.py to work on python3

llvm-svn: 355198
2019-03-01 13:13:40 +00:00
Simon Pilgrim 69c670e4e3 [Thumb] Add some integer abs testcases for different typesizes.
Committed on behalf of @ikulagin (Ivan Kulagin)

Differential Revision: https://reviews.llvm.org/D52138

llvm-svn: 355197
2019-03-01 12:08:50 +00:00
Diana Picus 54829ec5d0 [ARM GlobalISel] Support G_CTLZ for Thumb2
Same as ARM mode but with different opcode.

llvm-svn: 355191
2019-03-01 10:12:28 +00:00
Diana Picus afb3398da0 [ARM GlobalISel] Check target flags in test. NFCI
There was a time when we couldn't dump target-specific flags such as
arm-sbrel etc, so the tests didn't check for them. We can now be more
specific in our tests.

llvm-svn: 355189
2019-03-01 10:01:22 +00:00
Stanislav Mekhanoshin bb98841399 [AMDGPU] Mark ds instructions as meybeAtomic
These were not recognized as potential atomics by memory legalizer.
The test was working not because legalizer did a right thing, but
because it has skipped all these instructions. When I have fixed
DS desciption test started to fail because region address has
changed from 4 to 2 a while ago.

Differential Revision: https://reviews.llvm.org/D58802

llvm-svn: 355179
2019-03-01 07:59:17 +00:00
Petar Avramovic 9bf43b5c26 [MIPS GlobalISel] Fix mul operands
Unsigned mul high for MIPS32 is selected into two PseudoInstructions:
PseudoMULTu and PseudoMFHI that use accumulator register class ACC64 for
some of its operands. Registers in this class have appropriate hi and lo
register as subregisters: $lo0 and $hi0 are subregisters of $ac0 etc.
mul instruction implicit-defs $lo0 and $hi0 according to MipsInstrInfo.td.
In functions where mul and PseudoMULTu are present fastRegisterAllocator
will "run out of registers during register allocation" because
'calcSpillCost' for $ac0 will return spillImpossible because subregisters
$lo0 and $hi0 of $ac0 are reserved by mul instruction above. A solution is
to mark implicit-defs of $lo0 and $hi0 as dead in mul instruction.

Differential Revision: https://reviews.llvm.org/D58715

llvm-svn: 355178
2019-03-01 07:35:57 +00:00
Petar Avramovic a48285a190 [MIPS GlobalISel] Select G_UMULH
Legalize G_UMULO and select G_UMULH for MIPS32.

Differential Revision: https://reviews.llvm.org/D58714

llvm-svn: 355177
2019-03-01 07:25:44 +00:00
Tom Stellard 33634d1b25 AMDGPU/GlobalISel: Implement select for G_INSERT
Re-commit r344310.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53116

llvm-svn: 355159
2019-03-01 00:50:26 +00:00
Thomas Lively ae79f42a2f [WebAssembly] Fix crash when @llvm.global_dtors is external
Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58799

llvm-svn: 355157
2019-03-01 00:12:13 +00:00
Tom Stellard 41f32196a0 AMDGPU/GlobalISel: Implement select for G_EXTRACT
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49714

llvm-svn: 355156
2019-02-28 23:37:48 +00:00
Eli Friedman d19a7060c6 [AArch64] [Windows] Don't skip constructing UnwindHelp.
In certain cases, the first non-frame-setup instruction in a function is
a branch.  For example, it could be a cbz on an argument.  Make sure we
correctly allocate the UnwindHelp, and find an appropriate register to
use to initialize it.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40184

Differential Revision: https://reviews.llvm.org/D58752

llvm-svn: 355136
2019-02-28 20:38:45 +00:00
Abderrazek Zaafrani abfd10807c [AArch64] Improve FP16 vector convert from short instructions.
https://reviews.llvm.org/D58563

llvm-svn: 355134
2019-02-28 20:21:46 +00:00
Sanjay Patel 7fc6ef7dd7 [x86] scalarize extract element 0 of FP math
This is another step towards ensuring that we produce the optimal code for reductions,
but there are other potential benefits as seen in the tests diffs:

  1. Memory loads may get scalarized resulting in more efficient code.
  2. Memory stores may get scalarized resulting in more efficient code.
  3. Complex ops like fdiv/sqrt get scalarized which may be faster instructions depending on uarch.
  4. Even simple ops like addss/subss/mulss/roundss may result in faster operation/less frequency throttling when scalarized depending on uarch.

The TODO comment suggests 1 or more follow-ups for opcodes that can currently result in regressions.

Differential Revision: https://reviews.llvm.org/D58282

llvm-svn: 355130
2019-02-28 19:47:04 +00:00
Jiong Wang 3da8bcd0a0 bpf: enable sub-register code-gen for XADD
Support sub-register code-gen for XADD is like supporting any other Load
and Store patterns.

No new instruction is introduced.

  lock *(u32 *)(r1 + 0) += w2

has exactly the same underlying insn as:

  lock *(u32 *)(r1 + 0) += r2

BPF_W width modifier has guaranteed they behave the same at runtime. This
patch merely teaches BPF back-end that BPF_W width modifier could work
GPR32 register class and that's all needed for sub-register code-gen
support for XADD.

test/CodeGen/BPF/xadd.ll updated to include sub-register code-gen tests.

A new testcase test/CodeGen/BPF/xadd_legal.ll is added to make sure the
legal case could pass on all code-gen modes. It could also test dead Def
check on GPR32. If there is no proper handling like what has been done
inside BPFMIChecking.cpp:hasLivingDefs, then this testcase will fail.

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 355126
2019-02-28 19:21:28 +00:00
Craig Topper 8b1703fc1d [X86] Add test case that was supposed to go with r355116.
llvm-svn: 355117
2019-02-28 18:50:16 +00:00
Amara Emerson 8d70e6425c Revert "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1."
Seems to break some neon intrinsics tests.

llvm-svn: 355115
2019-02-28 18:47:29 +00:00
Thomas Lively f3b4f99007 [WebAssembly] Remove uses of ThreadModel
Summary:
In the clang UI, replaces -mthread-model posix with -matomics as the
source of truth on threading. In the backend, replaces
-thread-model=posix with the atomics target feature, which is now
collected on the WebAssemblyTargetMachine along with all other used
features. These collected features will also be used to emit the
target features section in the future.

The default configuration for the backend is thread-model=posix and no
atomics, which was previously an invalid configuration. This change
makes the default valid because the thread model is ignored.

A side effect of this change is that objects are never emitted with
passive segments. It will instead be up to the linker to decide
whether sections should be active or passive based on whether atomics
are used in the final link.

Reviewers: aheejin, sbc100, dschuff

Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish, steven_wu, dexonsmith, rupprecht, jfb, jdoerfert, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D58742

llvm-svn: 355112
2019-02-28 18:39:08 +00:00
Amara Emerson 85c3afd7f6 [AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1.
This extends the existing support for shufflevector to handle cases like
<2 x float>, which we can implement by concating the vectors and using a TBL1.

Differential Revision: https://reviews.llvm.org/D58684

llvm-svn: 355104
2019-02-28 16:43:11 +00:00
Stefan Pintilie bd5429ef38 [PowerPC] Move the stack pointer update instruction later in the prologue and earlier in the epilogue.
Move the stdu instruction in the prologue and epilogue.
This should provide a small performance boost in functions that are able to do
this. I've kept this change rather conservative at the moment and functions
with frame pointers or base pointers will not try to move the stack pointer
update.

Differential Revision: https://reviews.llvm.org/D42590

llvm-svn: 355085
2019-02-28 12:23:28 +00:00
Dmitri Gribenko 3d0576bbaf Fixed a typo in the test s/CEHCK/CHECK/
Summary:
Turns out the test was not correct, I had to adjust the test to work.  I
also added CHECK-LABELs for better error messages from FileCheck while
I'm here.

Reviewers: jsji

Subscribers: nemanjai, eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58614

llvm-svn: 355079
2019-02-28 10:56:39 +00:00
Simon Pilgrim 87aeff8bbb [X86][AVX] Fold vf64 concat_vectors(movddup(x),movddup(x)) -> broadcast(x)
llvm-svn: 355078
2019-02-28 10:53:58 +00:00
Diana Picus 3b7beafc77 [ARM GlobalISel] Support global variables for Thumb2
Add the same level of support as for ARM mode (i.e. still no TLS
support).

In most cases, it is sufficient to replace the opcodes with the
t2-equivalent, but there are some idiosyncrasies that I decided to
preserve because I don't understand the full implications:
* For ARM we use LDRi12 to load from constant pools, but for Thumb we
  use t2LDRpci (I'm not sure if the ideal would be to use t2LDRi12 for
  Thumb as well, or to use LDRcp for ARM).
* For Thumb we don't have an equivalent for MOV|LDRLIT_ga_pcrel_ldr, so
  we have to generate MOV|LDRLIT_ga_pcrel plus a load from GOT.

The tests are in separate files because they're hard enough to read even
without doubling the number of checks.

llvm-svn: 355077
2019-02-28 10:42:47 +00:00
Matt Arsenault bf1bf706c8 AMDGPU/GlobalISel: Add regbankselect test for phis
Add baseline for future fixes. These mostly show how this is broken
and producing illegal situations.

llvm-svn: 355057
2019-02-28 00:52:36 +00:00
Matt Arsenault 5d567dc137 AMDGPU: Enable function calls by default
Fixes some crashes on illegal call situations which are unfortunately
still valid IR.

llvm-svn: 355051
2019-02-28 00:40:32 +00:00
Abderrazek Zaafrani 2fc498a652 [AArch64] Generate FP16 vector compare instructions.
https://reviews.llvm.org/D58561

llvm-svn: 355050
2019-02-28 00:31:38 +00:00
Matt Arsenault aa03bcd23c AMDGPU: Fix crashes in invalid call cases
We have to at least tolerate calls to kernels, possibly with a
mismatched calling convention on the callsite.

llvm-svn: 355049
2019-02-28 00:28:44 +00:00
Matt Arsenault d3093c2f1f GlobalISel: Implement fewerElementsVector for phi
llvm-svn: 355048
2019-02-28 00:16:32 +00:00
Matt Arsenault 72bcf15dbf GlobalISel: Implement moreElementsVector for phi
llvm-svn: 355047
2019-02-28 00:01:05 +00:00
Joerg Sonnenberger 6a198366a0 Default to Secure PLT on PPC for NetBSD and OpenBSD.
This matches the default settings of clang.

llvm-svn: 355038
2019-02-27 21:53:14 +00:00
Philip Reames 288a95fc8c Seperate volatility and atomicity/ordering in SelectionDAG
At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc..).

This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success.

To make sure this patch itself is NFC, it is build on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll)..

Previously landed

    D57593 Fix a bug in the definition of isUnordered on MachineMemOperand
    D57596 [CodeGen] Be conservative about atomic accesses as for volatile
    D57802 Be conservative about unordered accesses for the moment
    rL353959: [Tests] First batch of cornercase tests for unordered atomics.
    rL353966: [Tests] RMW folding tests w/unordered atomic operations.
    rL353972: [Tests] More unordered atomic lowering tests.
    rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface
    rL354740: [Hexagon, SystemZ] Be super conservative about atomics
    rL354800: [Lanai] Be super conservative about atomics
    rL354845: [ARM] Be super conservative about atomics

Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.)

Differential Revision: https://reviews.llvm.org/D57601

llvm-svn: 355025
2019-02-27 20:20:08 +00:00
Dmitry Preobrazhensky ef92035827 [AMDGPU][MC][GFX8+] Added syntactic sugar for 'vgpr index' operand of instructions s_set_gpr_idx_on and s_set_gpr_idx_mode
See bug 39331: https://bugs.llvm.org/show_bug.cgi?id=39331

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D58288

llvm-svn: 354969
2019-02-27 13:12:12 +00:00
Simon Pilgrim 71bb6850cf [X86][AVX] Only combine loads to broadcasts for legal types
Thanks to @echristo for spotting this.

llvm-svn: 354961
2019-02-27 11:17:25 +00:00
Yonghong Song cc290a9e91 [BPF] Don't fail for static variables
Currently, the LLVM will print an error like
  Unsupported relocation: try to compile with -O2 or above,
  or check your static variable usage
if user defines more than one static variables in a single
ELF section (e.g., .bss or .data).

There is ongoing effort to support static and global
variables in libbpf and kernel. This patch removed the
assertion so user programs with static variables won't
fail compilation.

The static variable in-section offset is written to
the "imm" field of the corresponding to-be-relocated
bpf instruction. Below is an example to show how the
application (e.g., libbpf) can relate variable to relocations.

  -bash-4.4$ cat g1.c
  static volatile long a = 2;
  static volatile int b = 3;
  int test() { return a + b; }
  -bash-4.4$ clang -target bpf -O2 -c g1.c
  -bash-4.4$ llvm-readelf -r g1.o

  Relocation section '.rel.text' at offset 0x158 contains 2 entries:
      Offset             Info             Type               Symbol's Value  Symbol's Name
  0000000000000000  0000000400000001 R_BPF_64_64            0000000000000000 .data
  0000000000000018  0000000400000001 R_BPF_64_64            0000000000000000 .data
  -bash-4.4$ llvm-readelf -s g1.o

  Symbol table '.symtab' contains 6 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
       1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS g1.c
       2: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    4 a
       3: 0000000000000008     4 OBJECT  LOCAL  DEFAULT    4 b
       4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
       5: 0000000000000000    64 FUNC    GLOBAL DEFAULT    2 test
  -bash-4.4$ llvm-objdump -d g1.o

  g1.o:   file format ELF64-BPF

  Disassembly of section .text:
  0000000000000000 test:
       0:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00         r1 = 0 ll
       2:       79 11 00 00 00 00 00 00         r1 = *(u64 *)(r1 + 0)
       3:       18 02 00 00 08 00 00 00 00 00 00 00 00 00 00 00         r2 = 8 ll
       5:       61 20 00 00 00 00 00 00         r0 = *(u32 *)(r2 + 0)
       6:       0f 10 00 00 00 00 00 00         r0 += r1
       7:       95 00 00 00 00 00 00 00         exit
  -bash-4.4$

  . from symbol table, static variable "a" is in section #4, offset 0.
  . from symbol table, static variable "b" is in section #4, offset 8.
  . the first relocation is against symbol #4:
    4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
    and in-section offset 0 (see llvm-objdump result)
  . the second relocation is against symbol #4:
    4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
    and in-section offset 8 (see llvm-objdump result)
  . therefore, the first relocation is for variable "a", and
    the second relocation is for variable "b".

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 354954
2019-02-27 05:36:15 +00:00
Heejin Ahn 82da1ffc16 [WebAssembly] Fix ScopeTops info in CFGStackify for EH pads
Summary:
When creating `ScopeTops` info for `try` ~ `catch` ~ `end_try`, we
should create not only `end_try` -> `try` mapping but also `catch` ->
`try` mapping as well. If this is not created, `block` and `end_block`
markers later added may span across an existing `catch`, resulting in
the incorrect code like:
```
try
  block     --|  (X)
catch         |
  end_block --|
end_try
```

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58605

llvm-svn: 354945
2019-02-27 01:35:14 +00:00
Heejin Ahn cf699b4534 [WebAssembly] Remove unnecessary instructions after TRY marker placement
Summary:
This removes unnecessary instructions after TRY marker placement. There
are two cases:
- `end`/`end_block` can be removed if they overlap with `try`/`end_try`
  and they have the same return types.
- `br` right before `catch` that branches to after `end_try` can be
  deleted.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58591

llvm-svn: 354939
2019-02-27 00:50:53 +00:00
Jonas Paulsson 129826cd9f [SystemZ] Pass regalloc hints to help Load-and-Test transformations.
Since there is no "Load-and-Test-High" instruction, the 32 bit load of a
register to be compared with 0 can only be implemented with LT if the virtual
GRX32 register ends up in a low part (GR32 register).

This patch detects these cases and passes the GR32 registers (low parts) as
(soft) hints in getRegAllocationHints().

Review: Ulrich Weigand.
llvm-svn: 354935
2019-02-27 00:18:28 +00:00
Stanislav Mekhanoshin da1628eb67 [AMDGPU] Fixed hang during DAG combine
SITargetLowering::reassociateScalarOps() does not touch constants
so that DAGCombiner::ReassociateOps() does not revert the combine.
However a global address is not a ConstantSDNode.

Switched to the method used by DAGCombiner::ReassociateOps() itself
to detect constants.

Differential Revision: https://reviews.llvm.org/D58695

llvm-svn: 354926
2019-02-26 20:56:25 +00:00
Reid Kleckner 8fda7e15e6 [X86] Fix bug in vectorcall calling convention
Original implementation can't correctly handle __m256 and __m512 types
passed by reference through stack. This patch fixes it.

Patch by Wei Xiao!

Differential Revision: https://reviews.llvm.org/D57643

llvm-svn: 354921
2019-02-26 19:48:16 +00:00
Petar Avramovic bd39569913 [MIPS GlobalISel] Select G_UADDO
Lower G_UADDO.
Legalize G_UADDO for MIPS32

Differential Revision: https://reviews.llvm.org/D58671

llvm-svn: 354900
2019-02-26 17:22:42 +00:00
Ganesh Gopalasubramanian e172d7008d [X86] AMD znver2 enablement
This patch enables the following

1) AMD family 17h "znver2" tune flag (-march, -mcpu).
2) ISAs that are enabled for "znver2" architecture.
3) For the time being, it uses the znver1 scheduler model.
4) Tests are updated.
5) Scheduler descriptions are yet to be put in place.

Reviewers: craig.topper

Differential Revision: https://reviews.llvm.org/D58343

llvm-svn: 354897
2019-02-26 16:55:10 +00:00
Jonas Paulsson c110b5b69f [SystemZ] Wait with selection of legal vector/FP constants until Select().
This patch aims to make sure that any such constant that can be generated
with a vector instruction (for example VGBM) is recognized as such during
legalization and kept as a target independent node through post-legalize
DAGCombining.

Two new functions named isVectorConstantLegal() and loadVectorConstant()
replace old ways of handling vector/FP constants.

A new struct named SystemZVectorConstantInfo is used to cache the results of
isVectorConstantLegal() and pass them onto loadVectorConstant().

Support for fp128 constants in the presence of FeatureVectorEnhancements1
(z14) has been added.

Review: Ulrich Weigand
https://reviews.llvm.org/D58270

llvm-svn: 354896
2019-02-26 16:47:59 +00:00
Nirav Dave 582d46328c [DAG] Fix constant store folding to handle non-byte sizes.
Avoid crashes from zero-byte values due to sub-byte store sizes.

Reviewers: uabelho, courbet, rnk

Reviewed By: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58626

llvm-svn: 354884
2019-02-26 15:02:32 +00:00
Simon Atanasyan 8cb497027d [mips] Emit `.module softfloat` directive
This change fixes crash on an assertion in case of using
`soft float` ABI for mips32r6 target.

llvm-svn: 354882
2019-02-26 14:45:17 +00:00
Simon Pilgrim d4a406e499 [AArch64] Add arithmetic zext bswap tests.
As requested on D58017.

llvm-svn: 354872
2019-02-26 13:22:35 +00:00
Simon Pilgrim e42be1eae2 [AArch64] Add 'free' zext bswap tests.
As requested on D58017.

llvm-svn: 354869
2019-02-26 12:04:37 +00:00
Luke Cheeseman 9e285bef2b [ARM] Add Cortex-M35P
- Add LLVM backend support for Cortex-M35P
- Documentation can be found at
  https://developer.arm.com/products/processors/cortex-m/cortex-m35p

Differentail Revision: https://reviews.llvm.org/D57763

llvm-svn: 354868
2019-02-26 12:02:12 +00:00
Simon Pilgrim 810fa04ac7 [LegalizeDAG] Expand SADDO/SSUBO using SADDSAT/SSUBSAT (PR37763)
If SADDSAT/SSUBSAT are legal, then we can expand SADDO/SSUBO by performing a ADD/SUB and a SADDO/SSUBO and then compare the results.

I looked at doing this for UADDO/USUBO as well but as we don't have to do as many range comparisons I didn't see any/much benefit.

Differential Revision: https://reviews.llvm.org/D58637

llvm-svn: 354866
2019-02-26 11:27:53 +00:00
Simon Pilgrim 2ccc120d19 [AMDGPU] Regenerate bswap/bitreverse tests.
Make codegen changes more obvious in D58017

llvm-svn: 354863
2019-02-26 11:01:08 +00:00
Dan Gohman c71132c0be [WebAssembly] Properly align fp128 arguments in outgoing varargs arguments
For outgoing varargs arguments, it's necessary to check the OrigAlign field
of the corresponding OutputArg entry to determine argument alignment, rather
than just computing an alignment from the argument value type. This is
because types like fp128 are split into multiple argument values, with
narrower types that don't reflect the ABI alignment of the full fp128.

This fixes the printf("printfL: %4.*Lf\n", 2, lval); testcase.

Differential Revision: https://reviews.llvm.org/D58656

llvm-svn: 354846
2019-02-26 05:20:19 +00:00
Heejin Ahn d2a56ac661 [WebAssembly] Fix a bug deleting instruction in a ranged for loop
Summary: We shouldn't delete elements while iterating a ranged for loop.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58519

llvm-svn: 354844
2019-02-26 04:08:49 +00:00
Heejin Ahn 7829763e49 [WebAssembly] Improve readability of EH tests
Summary:
- Indent check lines to easily figure out try-catch-end structure
- Add the original C++ code the tests were genereated from
- Add a few more lines to make the structure more readable
- Rename a couple function / structures
- Add label and branch annotations to cfg-stackify-eh.ll
- Temporarily delete check lines for `test1` in `cfg-stackify-eh.ll`
  because it will be updated in a later CL soon and there's no point of
  making it look better here

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58562

llvm-svn: 354842
2019-02-26 03:29:59 +00:00
Reid Kleckner 2f055f026a [X86] Fix bug in x86_intrcc with arg copy elision
Summary:
Use a custom calling convention handler for interrupts instead of fixing
up the locations in LowerMemArgument. This way, the offsets are correct
when constructed and we don't need to account for them in as many
places.

Depends on D56883

Replaces D56275

Reviewers: craig.topper, phil-opp

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56944

llvm-svn: 354837
2019-02-26 02:11:25 +00:00
Stanislav Mekhanoshin ab25f1e65b [AMDGPU] Added target to mir test. NFC.
Test was used without -mcpu, although tested instructions
not available on all ASICs.

llvm-svn: 354830
2019-02-25 22:59:55 +00:00
Matt Arsenault 752579736e RegBankSelect: Handle slightly more complex value mappings
Try to use concat_vectors. Also remove unnecessary assert on
pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers
for AMDGPU.

llvm-svn: 354828
2019-02-25 22:24:13 +00:00
Matt Arsenault f4bfe4cd17 AMDGPU/GlobalISel: Fix bit ops for non-power-of-2 sizes
llvm-svn: 354825
2019-02-25 21:32:48 +00:00
Matt Arsenault 82b103998b AMDGPU/GlobalISel: Clamp max implicit_def elements
llvm-svn: 354818
2019-02-25 20:46:06 +00:00
Craig Topper 316c58e8f1 [X86] Improve detection of unneeded shift amount masking to also handle the case that the LHS has known zeroes in it
If the LHS has known zeros, the RHS immediate will have had bits removed. So call computeKnownBits to get the known zeroes so we can handle this case.

Differential Revision: https://reviews.llvm.org/D58475

llvm-svn: 354811
2019-02-25 19:42:47 +00:00
Nikita Popov fcbd7f6495 [Mips] Fix missing masking in fast-isel of br (PR40325)
Fixes https://bugs.llvm.org/show_bug.cgi?id=40325 by zero extending
(and x, 1) the condition before branching on it.

To avoid regressing trivial cases, I'm combining emission of cmp+br
sequences for the single-use + same block case (similar to what we
do in x86). icmpbr1.ll still regresses due to the cross-bb usage
of the condition.

Differential Revision: https://reviews.llvm.org/D58576

llvm-svn: 354808
2019-02-25 18:54:17 +00:00
Simon Pilgrim 28441ac75f [DAGCombine] Add undef shuffle elt support to partitionShuffleOfConcats
Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold

Differential Revision: https://reviews.llvm.org/D58585

llvm-svn: 354793
2019-02-25 16:02:01 +00:00
David Green b504f104b2 [ARM] Add some more missing T1 opcodes for the peephole optimisier
This adds a few extra Thumb1 opcodes to improve the peephole opimisers
ability to remove redundant cmp instructions. tADC and tSBC require
a small fixup to prevent MOVS being moved past the instruction, giving
the wrong flags.

Differential Revision: https://reviews.llvm.org/D58281

llvm-svn: 354791
2019-02-25 15:50:54 +00:00
Luke Cheeseman 59f77e7891 [AArch64] Add support for Cortex-A76 and Cortex-A76AE
- Add LLVM backend support for Cortex-A76 and Cortex-A76AE
- Documentation can be found at
  https://developer.arm.com/products/processors/cortex-a/cortex-a76

llvm-svn: 354788
2019-02-25 15:08:27 +00:00
Dmitri Gribenko a3a3964f98 Fixed typos in tests: s/CHEKC/CHECK/
Reviewers: ilya-biryukov

Subscribers: nemanjai, javed.absar, jsji, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D58611

llvm-svn: 354785
2019-02-25 13:41:59 +00:00
Dmitri Gribenko 751c5fbf6a Fixed typos in tests: s/CEHCK/CHECK/
Reviewers: ilya-biryukov

Subscribers: sanjoy, sdardis, javed.absar, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58608

llvm-svn: 354781
2019-02-25 13:12:33 +00:00
Simon Pilgrim c61f1e8e6c [X86] Merge ISD::ADD/SUB nodes into X86ISD::ADD/SUB equivalents (PR40483)
Avoid ADD/SUB instruction duplication by reusing the X86ISD::ADD/SUB results.

Includes ADD commutation - I tried to include NEG+SUB SUB commutation as well but this causes regressions as we don't have good combine coverage to simplify X86ISD::SUB.

Differential Revision: https://reviews.llvm.org/D58597

llvm-svn: 354771
2019-02-25 11:19:37 +00:00
Simon Tatham b70fc0c5fd [ARM] Make fullfp16 instructions not conditionalisable.
More or less all the instructions defined in the v8.2a full-fp16
extension are defined as UNPREDICTABLE if you put them in an IT block
(Thumb) or use with any condition other than AL (ARM). LLVM didn't
know that, and was happy to conditionalise them.

In order to force these instructions to count as not predicable, I had
to make a small Tablegen change. The code generation back end mostly
decides if an instruction was predicable by looking for something it
can identify as a predicate operand; there's an isPredicable bit flag
that overrides that check in the positive direction, but nothing that
overrides it in the negative direction.

(I considered the alternative approach of actually removing the
predicate operand from those instructions, but thought that it would
be more painful overall for instructions differing only in data type
to have different shapes of operand list. This way, the only code that
has to notice the difference is the if-converter.)

So I've added an isUnpredicable bit alongside isPredicable, and set
that bit on the right subset of FP16 instructions, and also on the
VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be
unpredicable for all data types.

I've included a couple of representative regression tests, both of
which previously caused an fp16 instruction to be conditionalised in
ARM state and (with -arm-no-restrict-it) to be put in an IT block in
Thumb.

Reviewers: SjoerdMeijer, t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57823

llvm-svn: 354768
2019-02-25 10:39:53 +00:00
Kang Zhang 4faa4090c9 [PowerPC] [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts
Summary:
Fast selection of llvm fptoi & fptrunc instructions is not handled well about
VSX instruction support.
We'd use VSX float convert integer instruction instead of non-vsx float convert
integer instruction if the operand register class is VSSRC or VSFRC because i32
and i64 are mapped to VSSRC and VSFRC correspondingly if VSX feature is
openeded.
For float trunc instruction, we do this silimar work like float convert integer
instruction to try to use VSX instruction.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D58430

llvm-svn: 354762
2019-02-25 02:46:16 +00:00
Simon Pilgrim f43c48cb52 [X86] Add PR40483 test cases
Demonstrate failure to merge ISD::ADD(x,y)/X86ISD::ADD(x,y) + ISD::SUB(x,y)/X86ISD::SUB(x,y) equivalent ops

llvm-svn: 354758
2019-02-24 21:13:29 +00:00
Simon Pilgrim cfaf663a35 [X86] Combine zext(packus(x),packus(y)) -> concat(x,y) (PR39637)
Its proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step.

llvm-svn: 354757
2019-02-24 19:57:52 +00:00
Craig Topper 3fe4bd464c [X86] Fix tls variable lowering issue with large code model
Summary:
The problem here is the lowering for tls variable. Below is the DAG for the code.
SelectionDAG has 11 nodes:

t0: ch = EntryToken
      t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64
        t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
      t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
    t12: i64 = add t8, t11
  t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64
t6: ch = CopyToReg t0, Register:i32 %0, t4
And when mcmodel is large, below instruction can NOT be folded.

  t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to " Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0"

When llvm start to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails.

The patch is to fold the load and X86ISD::WrapperRIP.

Fixes PR26906

Patch by LuoYuanke

Reviewers: craig.topper, rnk, annita.zhang, wxiao3

Reviewed By: rnk

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58336

llvm-svn: 354756
2019-02-24 19:33:37 +00:00
Craig Topper 5532a98737 [X86][SSE] Use pblendw for v4i32/v2i64 during isel.
Summary:

Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was a integer. So use pblendw instead.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58574

llvm-svn: 354755
2019-02-24 19:23:41 +00:00
Craig Topper be3348573e [LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents
Summary:
When promoting the over flow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it.

We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put a the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used.

Reviewers: spatel, RKSimon, nikic

Reviewed By: nikic

Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58567

llvm-svn: 354753
2019-02-24 19:23:36 +00:00
Sanjay Patel cb04ba032f [CGP] add special-cases to form unsigned add with overflow (PR40486)
There's likely a missed IR canonicalization for at least 1 of these
patterns. Otherwise, we wouldn't have needed the pattern-matching
enhancement in D57516.

Note that -- unlike usubo added with D57789 -- the TLI hook for
this transform defaults to 'on'. So if there's any perf fallout
from this, targets should look at how they're lowering the uaddo
node in SDAG and/or override that hook.

The x86 diffs suggest that there's some missing pattern-matching
for forming inc/dec.

This should fix the remaining known problems in:
https://bugs.llvm.org/show_bug.cgi?id=40486
https://bugs.llvm.org/show_bug.cgi?id=31754

llvm-svn: 354746
2019-02-24 15:31:27 +00:00
Craig Topper dc185522fb [TwoAddressInstructionPass] After commuting an instruction and before trying to look for more commutable operands, resample the number of operands.
The new instruciton might have less operands than the original instruction. If we don't resample, the next loop iteration might read an operand that doesn't exist.

X86 can commute blends to movss/movsd which reduces from 4 operands to 3. This happened in the test case that caused r354363 & company to be reverted. A reduced version of that has been committed here.

Really this whole checking for more commutable operands is a little fragile. It assumes that the new instructions operands are the same order and positions as the original except for the pair that was swapped. I don't know of anything that breaks this assumption today, but I've left a fixme. Fixing this will likely require an interface change.

llvm-svn: 354738
2019-02-23 21:41:44 +00:00
Craig Topper be9eeb5526 Recommit r354363 "[X86][SSE] Generalize X86ISD::BLENDI support to more value types"
And its follow ups r354511, r354640.

A follow patch will fix the issue that caused it to be reverted.

llvm-svn: 354737
2019-02-23 21:41:42 +00:00
Craig Topper ccc860cb81 Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract"
r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit."

These were reverted in r354713 as their context depended on other patches that were reverted for a bug.

llvm-svn: 354734
2019-02-23 19:51:32 +00:00
Nikita Popov e661f946a7 [WebAssembly] Fix select of and (PR40805)
Fixes https://bugs.llvm.org/show_bug.cgi?id=40805 introduced by
patterns added in D53676.

I'm removing the patterns entirely here, as they are not correct
in the general case. If necessary something more specific can be
added in the future.

Differential Revision: https://reviews.llvm.org/D58575

llvm-svn: 354733
2019-02-23 18:59:01 +00:00
Simon Pilgrim e08f177ea2 [X86][AVX] concat_vectors(scalar_to_vector(x),scalar_to_vector(x)) --> broadcast(x)
For AVX1, limit this to i32/f32/i64/f64 loading cases only.

llvm-svn: 354730
2019-02-23 18:34:05 +00:00
Simon Pilgrim 31793733a0 [X86][AVX] Shuffle->Permute+Blend if we have one v4f64/v4i64 shuffle input in place
Even on AVX1 we can pretty cheaply (VPERM2F128+VSHUFPD) permute a single v4f64/v4i64 input (on AVX2 its just a single VPERMPD), followed by a BLENDPD.

llvm-svn: 354729
2019-02-23 17:10:47 +00:00
Simon Dardis 86a589e38d [MIPS] Fix a incorrect test. (NFC)
This test is incorrect as it should be using the microMIPSR6 instruction to
return, not the microMIPS version.

llvm-svn: 354726
2019-02-23 15:56:32 +00:00
Craig Topper 75afc0105c [X86] Sign extend the 8-bit immediate when commuting blend instructions to match isel.
Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t.

This commit makes sure we do the same for commuting. The tests changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost.

The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results.

llvm-svn: 354724
2019-02-23 08:34:10 +00:00
Reid Kleckner e3876637cf Revert r354363 & co "[X86][SSE] Generalize X86ISD::BLENDI support to more value types"
r354363 caused https://crbug.com/934963#c1, which has a plain C reduced
test case.

I also had to revert some dependent changes:
- r354648
- r354647
- r354640
- r354511

llvm-svn: 354713
2019-02-23 01:19:42 +00:00
Craig Topper a9697f24cf [X86] Enable custom splitting of v8i64/v16i32 sext/zext for avx/avx2 when input type will be promoted by the type legalize to 128-bits.
If the the input type will be promoted to 128 bits its better to put a sign_extend_inreg/and in the 128 bit register before the split occurs. Otherwise we end up doing it on each half in the wider register.

Some of the overflow arithmetic tests are regressions, but I think we can make some improvement using getSetccResultType in DAG combine and/or type legalization.

llvm-svn: 354709
2019-02-23 00:35:02 +00:00
Craig Topper b95ca56361 [X86] Add a few test cases for a v8i64 sext/zext from an illegal type that needs to be promoted to 128 bits.
If v8i64 isn't a legal type but v4i64 is, these will be split and then each half will get their input promoted and become an any_extend_vector_inreg/punpckhwd + any_extend + and/sign_extend_inreg.

If we instead recognize the input will be promoted we can emit the and/sign_extend_inreg first in a 128 bit register. Then we can sign_extend/zero_extend one half and pshufd+sign_extend/zero_extend the other half.

llvm-svn: 354708
2019-02-23 00:34:58 +00:00
Sam Clegg 275d15ecf3 [WebAssembly] Update CodeGen test expectations after rL354697. NFC
llvm-svn: 354705
2019-02-23 00:07:39 +00:00
Sanjay Patel 973143ab79 [CGP] add tests for uaddo increment/decrement; NFC
llvm-svn: 354699
2019-02-22 23:19:34 +00:00
Guozhi Wei 4c8e480358 [MBP] Factor out function hasViableTopFallthrough and enhancement
This patch factor out the function hasViableTopFallthrough from rotateLoop. It is also enhanced. Original code checks only if there is a block can be placed before current loop top. This patch also checks if the loop top is the most possible successor of its predecessor. The attached test case shows its effect.

Differential Revision: https://reviews.llvm.org/D58393

llvm-svn: 354682
2019-02-22 18:04:37 +00:00
Nirav Dave 46f939c118 Disable big-endian constant store merges from rL354676.
llvm-svn: 354677
2019-02-22 16:20:34 +00:00
Nirav Dave 44037d7a63 [DAGCombine] Fold overlapping constant stores
Fold a smaller constant store into larger constant stores immediately
preceeding it.

Reviewers: rnk, courbet

Subscribers: javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58468

llvm-svn: 354676
2019-02-22 16:00:19 +00:00
Sanjay Patel a9e289174a [x86] allow narrowing of vector UINT_TO_FP
As discussed in:
D56864
D58197

Always use the narrow (128-bit) instruction when possible.
We already had the signed int version of this transform.

llvm-svn: 354675
2019-02-22 15:47:45 +00:00
Petar Jovanovic 6083106b12 [mips][micromips] fix filling delay slots for PseudoIndirectBranch_MM
Filling a delay slot in 32bit jump instructions with a 16bit instruction
can cause issues. According to the documentation such an operation is
unpredictable.
This patch adds opcode Mips::PseudoIndirectBranch_MM alongside
Mips::PseudoIndirectBranch and other instructions that are expanded to jr
instruction and do not allow a 16bit instruction in their delay slots.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D58507

llvm-svn: 354672
2019-02-22 14:53:58 +00:00
Roman Tereshin 99a6672bba [LowerSwitch][AMDGPU] Do not handle impossible values
This patch adds LazyValueInfo to LowerSwitch to compute the range of the
value being switched over and reduce the size of the tree LowerSwitch
builds to lower a switch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D58096

llvm-svn: 354670
2019-02-22 14:33:46 +00:00
David Green acb628b2af [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs
This adds a number of missing Thumb1 opcodes so that the peephole optimiser can
remove redundant CMP instructions.

Reapplying this after the first attempt broke non-thumb1 code as the t2ADDri
instruction can be used with frame indices. In thumb1 we use tADDframe.

Differential Revision: https://reviews.llvm.org/D57833

llvm-svn: 354667
2019-02-22 12:23:31 +00:00
Diana Picus 35e1c6663c [ARM GlobalISel] Support floating point for Thumb2
This is exactly the same as arm mode, so for the instruction selector
tests we just extract them to a new file and run with the same checks
for both arm and thumb mode.

For the legalizer we need to update the tests for soft float a bit, but
only because BL and tBL are slightly different. We could be pedantic and
check that we get a well-formed BL for arm mode and a tBL for thumb, but
for the purposes of the legalizer test it's sufficient to just skip over
the predicate operands in the checks. Also note that we have the
pedantic checks in the divmod test, so we're covered.

llvm-svn: 354665
2019-02-22 09:54:54 +00:00
Craig Topper fa6187d230 [LegalizeVectorOps] Improve the placement of ANDs in the ExpandLoad path for non-byte-sized loads.
When we need to merge two adjacent loads the AND mask for the low piece was still sized for the full src element size. But we didn't have that many bits. The upper bits are already zero due to the SRL. So we can skip the AND if we're going to combine with the high bits.

We do need an AND to clear out any bits from the high part. We were anding the high part before combining with the low part, but it looks like ANDing after the OR gets better results.

So we can just emit the final AND after the optional concatentation is done. That will handling skipping before the OR and get rid of extra high bits after the OR.

llvm-svn: 354655
2019-02-22 07:03:25 +00:00
Craig Topper 0ca023b3b7 [X86] Add test cases to cover the path in VectorLegalizer::ExpandLoad for non-byte sized loads where bits from two loads need to be concatenated.
If the scalar type doesn't divide evenly into the WideVT then the code will need to take some bits from adjacent scalar loads and combine them.

But most of our testing is for i1 element type which always divides evenly.

llvm-svn: 354653
2019-02-22 06:18:32 +00:00
Craig Topper 3a391fc0e8 [X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit.
Type legalization is causing two nodes to be created here, but we can use a single node to extend from v8i16 to v2i64.

llvm-svn: 354648
2019-02-22 01:49:53 +00:00
Craig Topper be22f329a9 [LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract.
Otherwise we end up creating extract_vector_elts that then each need to have their input promoted. This can lead to truncates needing to be emitted for each of those.

But we already emitted any_extends when we legalized the extract_subvector. So now we have pairs of any_extend+trunc that partially cancel. But depending on how DAGCombiner visits them we can get weird results.

By promoting the input at the same time we can create only a single any_extend or truncate.

There's one regression in the vector-narrow-binop.ll case, but that looks easy to fix with a follow up patch.

llvm-svn: 354647
2019-02-22 01:49:50 +00:00
Craig Topper 427404c769 [X86] Fix some copy/paste mistakes that caused a VR128 to be used as the address of a load in an isel pattern
This was introduced in r354511.

Fixes PR40811.

llvm-svn: 354640
2019-02-22 00:04:35 +00:00