Commit Graph

30013 Commits

Kai Luo fec7da8285 [PowerPC][Peephole] Check if `extsw`'s second operand is a virtual register
Summary:
When combining `extsw` and `sldi` in `PPCMIPeephole`, we have to check
if `extsw`'s second operand is a virtual register; otherwise we might
get a miscompile.

Differential Revision: https://reviews.llvm.org/D65315

llvm-svn: 367645
2019-08-02 03:14:17 +00:00
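A minimal sketch of the guard this commit describes, against LLVM's MachineInstr API of this era (the helper name and surrounding logic are illustrative, not the actual PPCMIPeephole code):

```cpp
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
using namespace llvm;

// Only fold extsw + sldi when extsw's source is a virtual register.
// A physical register may be redefined between the two instructions,
// so folding across it can miscompile.
static bool isSafeExtswSource(const MachineInstr &ExtSw) {
  unsigned SrcReg = ExtSw.getOperand(1).getReg();
  return TargetRegisterInfo::isVirtualRegister(SrcReg);
}
```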
Sanjay Patel 8560ea5534 [AArch64][x86] adjust tests with shift-add-shift; NFC
Prevent folding away the math completely.

llvm-svn: 367612
2019-08-01 21:08:08 +00:00
Sanjay Patel cb3140b7bf [AArch64][x86] add tests for shift-add-shift; NFC (PR42644)
llvm-svn: 367607
2019-08-01 20:32:27 +00:00
Matt Arsenault d9d30a408e GlobalISel: Lower scalarizing unmerge of a vector to shifts
AMDGPU sometimes has legal s16 and <2 x s16> operations, but all
registers are really 32-bit. An unmerge destination really should be
widened to a 32-bit register. When widening a scalarizing unmerge to a
target size that matches the vector size, bitcast to integer and
extract the relevant bits with shifts.

I'm not sure if this is the right place for this. This could arguably
be part of widenScalar for the result. I also have a growing feeling
that we're missing a bitcast legalize action.

llvm-svn: 367604
2019-08-01 19:10:05 +00:00
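Illustratively, this is what the shift-based lowering computes for a scalarizing unmerge of a <2 x s16> held in one 32-bit register (plain C++ standing in for the generated MIR; a sketch, not the legalizer code):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  uint32_t packed = 0xBEEF1234u;             // <2 x s16> bitcast to s32
  uint16_t elt0 = (uint16_t)packed;          // low half: truncate only
  uint16_t elt1 = (uint16_t)(packed >> 16);  // high half: shift, then truncate
  printf("elt0=0x%04x elt1=0x%04x\n", elt0, elt1); // 0x1234, 0xbeef
}
```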
Craig Topper a9ed5436bd [X86] In decomposeMulByConstant, legalize the VT before querying whether the multiply is legal
If a type is larger than a legal type and needs to be split, we would previously allow the multiply to be decomposed even if the split multiply is legal. Since the shift + add/sub code would also need to be split, it's not any better to decompose it.

This patch figures out what type the mul will eventually be legalized to and then uses that type for the query. I tried just returning false for illegal types and letting them get handled after type legalization, but then we can't recognize an i64 constant splat on 32-bit targets, since it will be destroyed by type legalization. We could special case vectors of i64 to avoid that...

Differential Revision: https://reviews.llvm.org/D65533

llvm-svn: 367601
2019-08-01 18:49:07 +00:00
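A simplified sketch of the query described above, using LLVM's TargetLowering API (the real D65533 code may differ in detail):

```cpp
#include "llvm/CodeGen/TargetLowering.h"
using namespace llvm;

// Walk VT down to what type legalization will produce, then ask whether
// MUL is legal on that type rather than on the original VT.
static bool splitMulIsLegal(const TargetLowering &TLI, LLVMContext &Ctx,
                            EVT VT) {
  EVT LegalVT = VT;
  while (TLI.getTypeAction(Ctx, LegalVT) != TargetLoweringBase::TypeLegal)
    LegalVT = TLI.getTypeToTransformTo(Ctx, LegalVT);
  return TLI.isOperationLegal(ISD::MUL, LegalVT);
}
```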
Craig Topper 005cc42316 [X86] Add some test cases for 512-bit truncate to 128-bits with min-legal-vector-width=0 and prefer-vector-width=256.
We currently split the 512-bit type, truncate each half to 128 bits,
concatenate them, and then truncate again. It would probably be better
to truncate each half to 64 bits and then concat the results
using vpunpcklqdq.

llvm-svn: 367600
2019-08-01 18:48:57 +00:00
Matt Arsenault bb582ebdba AMDGPU: Remove v0 workaround for DS_GWS_* instructions
As of r366067, any register should work for the src field, since the
used value is not pulled from the expected encoding field.

llvm-svn: 367598
2019-08-01 18:41:32 +00:00
Matt Arsenault 5faa533e47 GlobalISel: Fix widenScalar for G_MERGE_VALUES to pointer
AMDGPU testcase isn't broken now, but will be in a future patch
without this.

llvm-svn: 367591
2019-08-01 18:13:16 +00:00
Wouter van Oortmerssen 87af0b1911 [WebAssembly] Assembler/InstPrinter: support call_indirect type index.
A TYPE_INDEX operand (as used by call_indirect) used to be represented
by the InstPrinter as a symbol (e.g. .Ltype_index0@TYPE_INDEX) which
was a bit of a mismatch with the WasmObjectWriter, which expects an
unnamed symbol to receive the signature from and then turn into a
reloc.

There was really no good way to round-trip this information. An earlier
version of this patch tried to attach the signature information using
a .functype, but that ran into trouble when the symbol was re-emitted
without a name. Removing the name was a giant hack also.

The current version changes the assembly syntax to have an inline
signature spec for TYPEINDEX operands that is always unnamed, which
is much more elegant both in syntax and in implementation (as now the
assembler is able to follow the same path as the regular backend).

Reviewers: sbc100, dschuff, aheejin, jgravelle-google, sunfish, tlively

Subscribers: arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64758

llvm-svn: 367590
2019-08-01 18:08:26 +00:00
Simon Pilgrim 1d183b407a [TargetLowering] SimplifyMultipleUseDemandedBits - Add ISD::INSERT_VECTOR_ELT handling
Allow us to peek through vector insertions to avoid dependencies on entire insertion chains.

llvm-svn: 367588
2019-08-01 17:46:44 +00:00
Simon Pilgrim 63d4114f72 [X86][SSE] Add PEXTR*(PINSR*(v, s, c), c) -> s combine.
We should probably extend this to cover bitcasts as well to help other cases in promote-vec3.ll.

llvm-svn: 367582
2019-08-01 16:38:39 +00:00
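In intrinsics terms, the combine means a round trip like the following should collapse to just returning `s` (illustrative example; requires SSE4.1):

```cpp
#include <immintrin.h>
#include <cstdio>

// pextrd of the same lane a pinsrd just wrote comes back as `s`; the
// combine deletes the insert/extract pair. Build with -msse4.1.
int roundTrip(__m128i v, int s) {
  __m128i ins = _mm_insert_epi32(v, s, 2);
  return _mm_extract_epi32(ins, 2);
}

int main() {
  __m128i v = _mm_set_epi32(4, 3, 2, 1);
  printf("%d\n", roundTrip(v, 42)); // prints 42
}
```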
Simon Pilgrim 33f5f863b5 [X86][SSE] SimplifyMultipleUseDemandedBits - Add PEXTR/PINSR B+W handling
This adds X86 support for SimplifyMultipleUseDemandedBitsForTargetNode and uses it to peek through vector insertions to avoid dependencies on entire insertion chains.

llvm-svn: 367570
2019-08-01 14:46:03 +00:00
Simon Pilgrim f99f9881e3 [X86] EltsFromConsecutiveLoads - don't attempt to merge volatile loads (PR42846)
llvm-svn: 367556
2019-08-01 13:13:18 +00:00
David Green 1343814fb4 [ARM] Fix for MVE VREV64
The VREV64 instruction is apparently unpredictable if Qd == Qm, due to the
cross-beat nature of the instruction. This adds an earlyclobber to Qd, which
seems to be the same way we deal with this on other instructions like the
write-back on loads and stores.

Differential Revision: https://reviews.llvm.org/D65502

llvm-svn: 367544
2019-08-01 11:22:03 +00:00
Simon Pilgrim 7d766c393e [ARM] Regenerate BSWAP16 tests
llvm-svn: 367543
2019-08-01 11:12:10 +00:00
Sander de Smalen 7ebccfefb8 [AArch64] Do not allocate unnecessary emergency slot.
Fix an issue where the compiler still allocates an emergency spill slot even
though it already decided to spill an extra callee-save register to use
as a scratch register.

Reviewers: gberry, thegameg, mstorsjo, t.p.northover

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D65504

llvm-svn: 367540
2019-08-01 10:53:45 +00:00
Petar Avramovic 8a40cedfe6 [MIPS GlobalISel] Fold load/store + G_GEP + G_CONSTANT
Fold load/store + G_GEP + G_CONSTANT when the
immediate in G_CONSTANT fits into a 16-bit signed integer.

Differential Revision: https://reviews.llvm.org/D65507

llvm-svn: 367535
2019-08-01 09:40:13 +00:00
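A sketch of the folding guard implied by the description (hypothetical helper; the real selector checks more than this):

```cpp
#include "llvm/Support/MathExtras.h"

// The G_CONSTANT offset from a G_GEP can only be folded into the MIPS
// load/store addressing mode when it fits the signed 16-bit offset
// field of lw/sw.
static bool offsetFitsAddrMode(int64_t Imm) {
  return llvm::isInt<16>(Imm);
}
```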
David Zarzycki 4f1d893f9e [Testing] Fix tests that break with read-only checkouts
Found with `mount --bind -o ro ...` on Linux.

llvm-svn: 367519
2019-08-01 06:41:40 +00:00
Zi Xuan Wu 66c320908b recommit:[PowerPC] Eliminate loads/swap feeding swap/store for vector type by using big-endian load/store
On PowerPC, there is an instruction to load a vector in big-endian element order on a little-endian target.
So we can combine a vector load + reverse into a big-endian load to eliminate the swap instruction.
Likewise, combine a vector reverse + store into a big-endian store.

Differential Revision: https://reviews.llvm.org/D65063

llvm-svn: 367516
2019-08-01 05:26:02 +00:00
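In element-order terms, the equivalence being exploited looks like this (plain C++ illustration, not the DAG combine itself):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

int main() {
  uint32_t mem[4] = {1, 2, 3, 4};
  uint32_t v[4];
  std::copy(mem, mem + 4, v);  // little-endian element-order load
  std::reverse(v, v + 4);      // the vector reverse (swap)
  // A big-endian element-order load yields the same result directly:
  uint32_t be[4] = {mem[3], mem[2], mem[1], mem[0]};
  printf("%d\n", std::equal(v, v + 4, be)); // 1
}
```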
Fangrui Song 67a8d6c795 AMDGPU/GlobalISel: fix inst-select-load-local.mir in -DLLVM_ENABLE_ASSERTIONS=off builds after r367498
llvm-svn: 367514
2019-08-01 04:03:06 +00:00
Matt Arsenault 9952f46407 AMDGPU/GlobalISel: Fix flat load/store of pointer types
llvm-svn: 367513
2019-08-01 03:57:42 +00:00
Matt Arsenault 57495268ac AMDGPU/GlobalISel: Remove manual store select code
This regresses the weird types that are newly treated as legal load
types, but fixes incorrectly using flat instructions on SI.

llvm-svn: 367512
2019-08-01 03:52:40 +00:00
Matt Arsenault ae87b9f2c2 AMDGPU/GlobalISel: Select local atomic cmpxchg
llvm-svn: 367511
2019-08-01 03:41:41 +00:00
Matt Arsenault 26cb53b260 AMDGPU/GlobalISel: Handle G_ATOMICRMW_FADD
llvm-svn: 367509
2019-08-01 03:33:15 +00:00
Matt Arsenault da5b9bfa95 AMDGPU/GlobalISel: Allow selection of DS atomicrmw
llvm-svn: 367507
2019-08-01 03:29:01 +00:00
Matt Arsenault 3baf4d3418 AMDGPU/GlobalISel: Select simple local stores
llvm-svn: 367504
2019-08-01 03:09:15 +00:00
Matt Arsenault 7bedceb5b2 GlobalISel: moreElementsVector for G_LOAD/G_STORE
The AMDGPU change and test are a placeholder until a future patch adds
complete handling.

llvm-svn: 367503
2019-08-01 01:44:22 +00:00
Peter Collingbourne fbc563e2cb Create unique, but identically-named ELF sections for explicitly-sectioned functions and globals when using -function-sections and -data-sections.
This allows functions and globals to be reordered later in the linking phase
(using the -symbol-ordering-file) even though reordering will be limited to
the scope of the explicit section.

Patch by Rahman Lavaee!

Differential Revision: https://reviews.llvm.org/D65478

llvm-svn: 367501
2019-08-01 01:38:53 +00:00
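For example (the section name `.mytext` is made up), under -ffunction-sections both of these functions previously landed in one `.mytext` section; after this change each gets its own identically-named section, so an ordering file can move them independently:

```cpp
// clang -ffunction-sections -c example.cpp
// llvm-readobj --sections example.o  # two distinct ".mytext" sections
__attribute__((section(".mytext"))) void hot() {}
__attribute__((section(".mytext"))) void cold() {}
```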
Matt Arsenault d48324ff6f Reapply "AMDGPU: Split block for si_end_cf"
This reverts commit r359363, reapplying r357634

llvm-svn: 367500
2019-08-01 01:25:27 +00:00
Matt Arsenault 3594011de0 AMDGPU/GlobalISel: Select local loads
llvm-svn: 367498
2019-08-01 00:53:38 +00:00
Amy Huang 153f20057c Revert "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG" and
partial fix.
Causes Windows buildbot errors.

This reverts commit 6e65c34523963094acd0d6c94a5f5c64b32fe6aa and
53da7ca943.

llvm-svn: 367496
2019-07-31 23:59:31 +00:00
Eli Friedman 89b80f1239 [ARM] Lower "(x<<c) > 0x80000000U" to "lsls" on Thumb1.
This is extremely specific, but saves three instructions when it's
legal.  I don't think the code can be usefully generalized.

Differential Revision: https://reviews.llvm.org/D65351

llvm-svn: 367492
2019-07-31 23:19:21 +00:00
Eli Friedman 2f45ec1c39 [ARM] Transform compare of masked value to shift on Thumb1.
Thumb1 has very limited immediate modes, so turning an "and" into a
shift can save multiple instructions.

It's possible to simplify the generated code for test2 and test3 in
cmp-and-fold.ll a little more, but I'll implement that as a followup.

Differential Revision: https://reviews.llvm.org/D65175

llvm-svn: 367491
2019-07-31 23:17:34 +00:00
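A quick self-check (plain C++, sampled rather than exhaustive) of the kind of mask-to-shift equivalences involved; the exact patterns the backend rewrites may differ:

```cpp
#include <cassert>
#include <cstdint>

int main() {
  for (uint64_t i = 0; i <= 0xFFFFFFFFull; i += 0x1003) {
    uint32_t x = (uint32_t)i;
    // Mask of the top byte == 0  <=>  logical shift right leaves 0.
    assert(((x & 0xFF000000u) == 0) == ((x >> 24) == 0));
    // Mask of the low byte == 0  <=>  logical shift left leaves 0.
    assert(((x & 0x000000FFu) == 0) == ((x << 24) == 0));
  }
  return 0;
}
```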
Craig Topper b70026c43c [ScalarizeMaskedMemIntrin] Bitcast the mask to the scalar domain and use scalar bit tests for the branches.
X86 at least is able to use movmsk or kmov to move the mask to the scalar
domain. Then we can just use test instructions to test individual bits.

This is more efficient than extracting each mask element
individually.

I special cased v1i1 to use the previous behavior. This avoids
poor type legalization of bitcast of v1i1 to i1.

I've skipped expandload/compressstore as I think we need to
handle constant masks for those better first.

Many tests end up with duplicate test instructions due to tail
duplication in the branch folding pass. But the same thing
happens when constructing similar code in C. So it's not unique
to the scalarization.

Not sure if this lowering code will also be good for other targets,
but we're only testing X86 today.

Differential Revision: https://reviews.llvm.org/D65319

llvm-svn: 367489
2019-07-31 22:58:15 +00:00
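The shape of the expansion, in scalar C++ terms (names invented; the pass emits IR branches rather than a loop): the mask is moved to the scalar domain once, then each lane is guarded by a cheap bit test:

```cpp
#include <cstddef>
#include <cstdint>

// maskBits is what movmsk/kmov would produce from a v4i1 mask.
void maskedLoad4(const int32_t *p, uint8_t maskBits,
                 int32_t out[4], const int32_t passthru[4]) {
  for (size_t i = 0; i < 4; ++i)
    out[i] = (maskBits & (1u << i)) ? p[i] : passthru[i]; // scalar bit test
}
```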
Craig Topper b51dc64063 [X86] Add DAG combine to fold any_extend_vector_inreg+truncstore to an extractelement+store
We have custom code that ignores the normal promoting type legalization on less than 128-bit vector types like v4i8 to emit pavgb, paddusb, psubusb since we don't have the equivalent instruction on a larger element type like v4i32. If this operation appears before a store, we can be left with an any_extend_vector_inreg followed by a truncstore after type legalization. When truncstore isn't legal, this will normally be decomposed into shuffles and a non-truncating store. This will then combine away the any_extend_vector_inreg and shuffle leaving just the store. On avx512, truncstore is legal so we don't decompose it and we had no combines to fix it.

This patch adds a new DAG combine to detect this case and emit either an extract_store for 64-bit stores or an extractelement+store for 32- and 16-bit stores. This makes the avx512 codegen match the avx2 codegen for these situations. I'm restricting to only when -x86-experimental-vector-widening-legalization is false. When we're widening, we're not likely to create this any_extend_inreg+truncstore combination. This means we should be able to remove this code when we flip the default. I would like to flip the default soon, but I need to investigate some performance regressions it's causing in our branch that I wasn't seeing on trunk.

Differential Revision: https://reviews.llvm.org/D65538

llvm-svn: 367488
2019-07-31 22:43:08 +00:00
Michael Berg 005d705d43 Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize NoSignedZerosFPMath options control
Summary: Honoring no signed zeros is also available as a user control through clang, separately from any fastmath or UnsafeFPMath context, so the DAG guards should reflect this.

Reviewers: spatel, arsenm, hfinkel, wristow, craig.topper

Reviewed By: spatel

Subscribers: rampitec, foad, nhaehnle, wuzish, nemanjai, jvesely, wdng, javed.absar, MaskRay, jsji

Differential Revision: https://reviews.llvm.org/D65170

llvm-svn: 367486
2019-07-31 21:57:28 +00:00
Amy Huang 27a73dd02c Fix to r367374 "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG"
after a Windows buildbot failure.

Added a check that the MachineInstr exists and is a call before trying
to add symbols around it.

llvm-svn: 367483
2019-07-31 21:03:38 +00:00
Peter Collingbourne 09f39967a2 AArch64: Add a tagged-globals backend feature.
This feature instructs the backend to allow locally defined global variable
addresses to contain a pointer tag in bits 56-63 that will be ignored by
the hardware (i.e. TBI), but may be used by an instrumentation pass such
as HWASAN. It works by adding a MOVK instruction to the regular ADRP/ADD
sequence that sets bits 48-63 to the corresponding bits of the global, with
the linker bounds check disabled on the ADRP instruction to prevent the tag
from causing a link failure.

This implementation of the feature omits the MOVK when loading from or storing
to a global, which is sufficient for TBI. If the same approach is extended
to MTE, assuming that 0 is not configured as a catch-all tag, we will most
likely also need the MOVK in this case in order to avoid a tag mismatch.

Differential Revision: https://reviews.llvm.org/D65364

llvm-svn: 367475
2019-07-31 20:14:19 +00:00
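Illustrative arithmetic only (addresses and tag values made up): a tag placed in bits 56-63 is ignored under TBI, so stripping those bits recovers the real address:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  uint64_t addr   = 0x0000004000001000ull;     // untagged global address
  uint64_t tag    = 0x2a;                      // HWASAN-style tag
  uint64_t tagged = addr | (tag << 56);        // roughly what MOVK contributes
  uint64_t hwview = tagged & ~(0xffull << 56); // TBI ignores bits 56-63
  printf("%s\n", hwview == addr ? "match" : "mismatch"); // match
}
```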
Craig Topper d502f25373 [X86] Add test cases to show premature decomposition of vector multiplies into shift+add/sub for types that aren't legal and need to be split. NFC
llvm-svn: 367466
2019-07-31 19:05:11 +00:00
Craig Topper e3f0e67f2e [X86] Add AVX512DQ command lines to vector-mul.ll to show that we use vpmullq instead of shift+add/sub for some cases. NFC
llvm-svn: 367465
2019-07-31 19:05:03 +00:00
Simon Pilgrim c4fa139a5c [X86][SSE] Add test cases for PR42825
llvm-svn: 367435
2019-07-31 14:29:44 +00:00
Simon Pilgrim 24ad2b5e7d [X86][AVX] Ensure chained subvector insertions are the same size (PR42833)
Before combining insert_subvector(insert_subvector(vec, sub0, c0), sub1, c1) patterns, ensure that the subvectors are all the same type. On AVX512 targets especially we might have a mixture of 128/256 subvector insertions.

llvm-svn: 367429
2019-07-31 12:55:39 +00:00
Momchil Velikov a36d31478c [AArch64] Add support for Transactional Memory Extension (TME)
Re-commit r366322 after some fixes

TME is a future architecture technology, documented in

  https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools
  https://developer.arm.com/docs/ddi0601/a

More about the future architectures:

  https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture

This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and
TCANCEL and the target feature/arch extension "tme".

It also implements TME builtin functions, defined in ACLE Q2 2019
(https://developer.arm.com/docs/101028/latest)

Differential Revision: https://reviews.llvm.org/D64416

Patch by Javed Absar and Momchil Velikov

llvm-svn: 367428
2019-07-31 12:52:17 +00:00
Simon Pilgrim 7cf5ef08b8 [X86] Regenerate lrshrink test checks to make D65354 diff easier
llvm-svn: 367426
2019-07-31 12:30:24 +00:00
Simon Pilgrim 54a68f7c73 [X86] Regenerate callee-saved test checks to make D65354 diff easier
llvm-svn: 367425
2019-07-31 12:29:07 +00:00
Simon Pilgrim 83d8d62399 [X86] Regenerate alias-static-alloca test checks to make D65354 diff easier
I've manually added the stack offsets back as these are worth keeping - we really need a way for update_llc_test_checks.py not to mask out useful address math.

llvm-svn: 367424
2019-07-31 12:27:47 +00:00
Simon Pilgrim f69cbb43ec [X86] Regenerate vp2intersect tests
Enable nounwind to remove unnecessary stack manipulation code

llvm-svn: 367421
2019-07-31 12:17:10 +00:00
Simon Pilgrim 24e4e8087f [X86][AVX] Add reduced test case for PR42833
llvm-svn: 367412
2019-07-31 11:35:01 +00:00
Oliver Cruickshank 09a1b8172b [ARM] Generate MVE VFMAs
llvm-svn: 367408
2019-07-31 10:44:11 +00:00
Sam Elliott 9e6b2e1605 [RISCV] Support 'f' Inline Assembly Constraint
Summary:
This adds the 'f' inline assembly constraint, as supported by GCC. An
'f'-constrained operand is passed in a floating point register. Exactly
which kind of floating-point register (32-bit or 64-bit) is decided
based on the operand type and the available standard extensions (-f and
-d, respectively).

This patch adds support in both the clang frontend, and LLVM itself.

Reviewers: asb, lewis-revill

Reviewed By: asb

Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65500

llvm-svn: 367403
2019-07-31 09:45:55 +00:00
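Usage from C/C++ (assuming a target like riscv32 with -march=rv32if; the instruction shown is standard RV32F):

```cpp
// An 'f'-constrained operand is allocated to a floating-point register.
float fadd_asm(float a, float b) {
  float r;
  asm("fadd.s %0, %1, %2" : "=f"(r) : "f"(a), "f"(b));
  return r;
}
```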