Commit Graph

14296 Commits

Author SHA1 Message Date
Simon Pilgrim 5a07a614c0 [X86][SSE] Regenerated packss.ll test file.
Not sure what went wrong in rL366077....

llvm-svn: 366079
2019-07-15 16:23:42 +00:00
Simon Pilgrim 838c8e30c2 [X86][SSE] Add PACKSS with zero shuffle masks.
This is an example of expansion due to D61129 - it should combine back to a PACKSS with a zero operand.

llvm-svn: 366077
2019-07-15 15:43:04 +00:00
Craig Topper 635d103e0b [X86] Separate the memory size of vzext_load/vextract_store from the element size of the result type. Use them to improve the codegen of v2f32 loads/stores with SSE1 only.
Summary:
SSE1 only supports v4f32. But it does have instructions like movlps/movhps that load/store 64 bits of memory.

This patch breaks the connection between the node VT of the vzext_load/vextract_store patterns and the memory VT, enabling a v4f32 node with a 64-bit memory VT. I've used i64 as the memory VT here. I've written the PatFrag predicate to just check the store size, not the specific VT. I think the VT will only matter for CSE purposes. We could use v2f32, but if we want to start using these operations in more places, a simple integer type might make the most sense.

I'd like to maybe use this same thing for SSE2 and later as well, but that will need more work to be supported by EltsFromConsecutiveLoads to avoid regressing lit tests. I'd maybe also like to combine bitcasts with these load/stores nodes now that the types are disconnected. And I'd also like to consider canonicalizing (scalar_to_vector + load) to vzext_load.

If you want, I can split the mechanical tablegen changes where I added the 32/64 sizes off from the SSE1 change.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64528

llvm-svn: 366034
2019-07-15 02:02:31 +00:00
Sanjay Patel 03d5e28fe9 [x86] add test for sub-with-flags opportunity (PR40483); NFC
llvm-svn: 366019
2019-07-14 14:08:39 +00:00
Craig Topper 9450b0084a [X86] Remove offset of 8 from the call to FuseInst for UNPCKLPDrr folding added in r365287.
This was copy/pasted from above and I forgot to change it. We just
need the default offset of 0 here.

Fixes PR42616.

llvm-svn: 366011
2019-07-14 04:13:33 +00:00
Sanjay Patel 2097f75eab [x86] simplify cmov with same true/false operands
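For illustration, a minimal sketch of the shape being collapsed (hypothetical
source; in practice the pattern arises during lowering rather than from code
written like this):

    // A conditional move whose true and false operands are identical is just
    // the operand itself; no cmov is needed.
    int select_same(int c, int x) {
      return c ? x : x;  // ==> return x;
    }
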
llvm-svn: 365998
2019-07-13 12:04:52 +00:00
Alex Lorenz 6d187f0eff [macCatalyst] Use macCatalyst pretty name in .build_version darwin
assembly command

'macCatalyst' is more readable than 'maccatalyst'. I renamed the objdump output,
but the assembly should match it as well.

llvm-svn: 365964
2019-07-12 22:06:08 +00:00
Sanjay Patel e26bacb652 [x86] add test for bogus cmov (PR40483); NFC
llvm-svn: 365941
2019-07-12 18:38:29 +00:00
Simon Pilgrim ce8c35a33d [X86][AVX] Add PR34359 shuffle test case.
llvm-svn: 365926
2019-07-12 17:42:32 +00:00
Craig Topper 83b380860d [X86] Pre-commit test cases for D64574, along with a test case for PR42571. NFC
llvm-svn: 365803
2019-07-11 18:19:27 +00:00
Sanjay Patel 5cc7c9ab93 [X86] Merge negated ISD::SUB nodes into X86ISD::SUB equivalent (PR40483)
Follow-up to D58597, where it was noted that the commuted ISD::SUB variant
was having problems due to a lack of combines.

See also D63958 where we untangled setcc/sub pairs.
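
As a hypothetical source-level shape (not from the patch), the negated
subtract is just the commuted subtract:

    // neg(sub(x, y)) == sub(y, x), so a single SUB of (y, x) can provide
    // both the value and the flags.
    unsigned negated_sub(unsigned x, unsigned y) {
      return 0u - (x - y);  // ==> y - x
    }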

Differential Revision: https://reviews.llvm.org/D58875

llvm-svn: 365791
2019-07-11 15:56:33 +00:00
Simon Pilgrim d0307f93a7 [DAGCombine] narrowInsertExtractVectorBinOp - add CONCAT_VECTORS support
We already split extract_subvector(binop(insert_subvector(v,x),insert_subvector(w,y))) -> binop(x,y).

This patch adds support for extract_subvector(binop(concat_vectors(),concat_vectors())) cases as well.

In particular this means we don't have to wait for X86 lowering to convert concat_vectors to insert_subvector chains, which helps avoid some cases where demandedelts/combine calls occur too late to split large vector ops.

The fast-isel-store.ll load folding regression is annoying, but I don't think it's that critical.

Differential Revision: https://reviews.llvm.org/D63653

llvm-svn: 365785
2019-07-11 14:45:03 +00:00
Simon Pilgrim 0b7c38c9f9 [X86] Regenerate intrinsics tests. NFCI.
llvm-svn: 365755
2019-07-11 10:40:23 +00:00
Fangrui Song f9ca13cb5f [X86] -fno-plt: use GOT __tls_get_addr only if GOTPCRELX is enabled
Summary:
As of binutils 2.32, ld emits a bogus TLS relaxation error when it attempts
to relax the GD/LD code sequence using R_X86_64_GOTPCREL (instead of
R_X86_64_GOTPCRELX) to IE/LE (binutils PR24784). gold and lld are fine.

In gcc/config/i386/i386.md, there is a configure-time check of as/ld
support and the GOT relaxation will not be used if as/ld doesn't support
it:

    if (flag_plt || !HAVE_AS_IX86_TLS_GET_ADDR_GOT)
      return "call\t%P2";
    return "call\t{*%p2@GOT(%1)|[DWORD PTR %p2@GOT[%1]]}";

In clang, -DENABLE_X86_RELAX_RELOCATIONS=OFF is the default. The ld.bfd
bogus error can be reproduced with:

    thread_local int a;
    int main() { return a; }

    clang -fno-plt -fpic a.cc -fuse-ld=bfd

GOTPCRELX only gained good toolchain support in 2016, which is still
considered relatively new. It is even difficult to conditionally default to
-DENABLE_X86_RELAX_RELOCATIONS=ON due to cross-compilation concerns. So
work around the ld.bfd bug by using the GOT sequence only when GOTPCRELX
is enabled.

Reviewers: dalias, hjl.tools, nikic, rnk

Reviewed By: nikic

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64304

llvm-svn: 365752
2019-07-11 10:10:09 +00:00
Craig Topper 88729e3dec [X86] Don't convert 8 or 16 bit ADDs to LEAs on Atom in FixupLEAPass.
We use the functions that convert to three-address form to do the
conversion, but changing an 8- or 16-bit ADD will cause it to create a
virtual register. This can't be done after register allocation, which is
where this pass runs.

I've switched the pass completely to a whitelist of instructions that can
be converted to LEA, instead of an incorrect blacklist. This will avoid
surprises if we enhance the three-address conversion function to include
additional instructions in the future.

Fixes PR42565.

llvm-svn: 365720
2019-07-11 01:01:39 +00:00
Sanjay Patel 138328e45c [SDAG] commute setcc operands to match a subtract
If we have:

R = sub X, Y
P = cmp Y, X

...then flipping the operands in the compare instruction can allow using a subtract that sets compare flags.
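
As a sketch of the motivating shape (hypothetical source, not from the patch):

    // The subtract already sets flags; once the compare's operands (and its
    // condition) are commuted to match, the separate cmp can be dropped.
    unsigned sub_and_cmp(unsigned x, unsigned y, unsigned *r) {
      *r = x - y;    // R = sub X, Y
      return y < x;  // P = cmp Y, X  ==>  cmp X, Y with the condition swapped
    }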

Motivated by diffs in D58875 - not sure if this changes anything there,
but this seems like a good thing independent of that.

There's a more involved version of this transform already in IR (in instcombine
although that seems misplaced to me) - see "swapMayExposeCSEOpportunities()".

Differential Revision: https://reviews.llvm.org/D63958

llvm-svn: 365711
2019-07-10 23:23:54 +00:00
Craig Topper 1c327c7e0a [X86] Add patterns with and_flag_nocf for BLSI and TBM instructions.
Fixes similar issues to r352306.

llvm-svn: 365705
2019-07-10 22:44:32 +00:00
Craig Topper 472ad62b70 [X86] Add a few more TBM and BLSI test cases that show the same issue that r352306 fixed for BLSR.
llvm-svn: 365704
2019-07-10 22:44:24 +00:00
Michael Berg f4572249d7 Move three folds for FADD, FSUB and FMUL in the DAG combiner away from Unsafe to more aligned checks that reflect context
Summary: Unsafe alone does not map well to each of these three cases, as it is missing the NoNan context when accessed directly from clang. I have migrated the fold guards to reflect the expectation of handling the NaN and zero contexts directly (NoNan, NSZ) and updated some tests accordingly. Unsafe does include NSZ; however, there is already precedent for using the target option directly to reflect that context.

Reviewers: spatel, wristow, hfinkel, craig.topper, arsenm

Reviewed By: arsenm

Subscribers: michele.scandale, wdng, javed.absar

Differential Revision: https://reviews.llvm.org/D64450

llvm-svn: 365679
2019-07-10 18:23:26 +00:00
Simon Pilgrim 5a6d40be1f [X86] Regenerate tests. NFCI.
These tests hadn't been regenerated since the update script gained the ability to merge 32/64-bit checks.

llvm-svn: 365670
2019-07-10 17:22:31 +00:00
Craig Topper ab5a30ac9d [X86] Add tests for an alternative sequence for _mm_storel_pi/_mm_storeh_pi intrinsics. NFC
llvm-svn: 365667
2019-07-10 17:11:18 +00:00
Simon Pilgrim 6a58583951 [X86][SSE] EltsFromConsecutiveLoads - add basic dereferenceable support
This patch checks to see if the vector element loads are based off a dereferenceable pointer that covers the entire vector width, in which case we don't need to have element loads at both extremes of the vector width - just the start (base pointer) of it.

Another step towards partial vector loads...

Differential Revision: https://reviews.llvm.org/D64205

llvm-svn: 365614
2019-07-10 10:46:36 +00:00
Craig Topper 84a1f07363 [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it
Basically the problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores. But all vector loads and stores of the same width
are the same cost on X86.

This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.

Differential Revision: https://reviews.llvm.org/D64295

llvm-svn: 365549
2019-07-09 19:55:28 +00:00
Bjorn Pettersson 59029017a6 [LegalizeTypes] Fix saturation bug for smul.fix.sat
Summary:
Make sure we use SETGE instead of SETGT when checking
if the sign bit is zero at SMULFIXSAT expansion.

The faulty expansion occurred when doing "expand" of
SMULFIXSAT and the scale was exactly matching the
size of the smaller type. For example doing
  i64 Z = SMULFIXSAT X, Y, 32
and expanding X/Y/Z into using two i32 values.

The problem was that we sometimes did not saturate
to min when overflowing.

Here is an example using Q3.4 numbers:

Consider that we are multiplying X and Y.
  X = 0x80 (-8.0 as Q3.4)
  Y = 0x20 (2.0 as Q3.4)
To avoid loss of precision we do a widening
multiplication, getting a 16 bit result
  Z = 0xF000 (-16.0 as Q7.8)

To detect negative overflow we should check if
the five most significant bits in Z are less than -1.
Assume that we name the 4 most significant bits
as HH and the next 4 bits as HL. Then we can do the
check by examining if
 (HH < -1) or (HH == -1 && "sign bit in HL is zero").

The fault was that we have been doing the check as
 (HH < -1) or (HH == -1 && HL > 0)
instead of
 (HH < -1) or (HH == -1 && HL >= 0).

In our example HH is -1 and HL is 0, so the old
code did not trigger saturation and simply truncated
the result to 0x00 (0.0). With the bugfix we instead
detect that we should saturate to min, and the result
will be set to 0x80 (-8.0).
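
The arithmetic above can be checked with a small standalone sketch
(illustrative only; this is not the LegalizeTypes code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
      int8_t x = (int8_t)0x80;               // -8.0 as Q3.4
      int8_t y = (int8_t)0x20;               //  2.0 as Q3.4
      int16_t z = (int16_t)(x * y);          // widening multiply: 0xF000 (-16.0 as Q7.8)
      int hh = z >> 12;                      // HH, sign-extended: -1
      int hl_nonneg = ((z >> 11) & 1) == 0;  // sign bit of HL (bit 11) is zero
      int sat_min = hh < -1 || (hh == -1 && hl_nonneg);  // fixed check: >=, not >
      printf("saturate to min: %d\n", sat_min);          // prints 1
    }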

Reviewers: leonardchan, bevinh

Reviewed By: leonardchan

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64331

llvm-svn: 365455
2019-07-09 10:24:50 +00:00
Guillaume Chatelet 336f3e1601 Fixing @llvm.memcpy not honoring volatile.
This is explicitly not addressing target-specific code, or calls to memcpy.
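
As a source-level illustration (hypothetical; the patch itself is in the
generic lowering):

    // A volatile aggregate copy is emitted as @llvm.memcpy with the
    // isvolatile flag set; the fix keeps that flag honored when the
    // intrinsic is expanded.
    struct S { char bytes[64]; };
    volatile struct S dst, src;
    void copy(void) { dst = src; }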

Summary: https://bugs.llvm.org/show_bug.cgi?id=42254

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63215

llvm-svn: 365449
2019-07-09 09:53:36 +00:00
Reid Kleckner 2f07c2e9d9 Standardize on MSVC behavior for triples with no environment
Summary:
This makes it so that IR files using triples without an environment work
out of the box, without normalizing them.

Typically, the MSVC behavior is more desirable. For example, it tends to
enable things like constant merging, use of associative comdats, etc.

Addresses PR42491

Reviewers: compnerd

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64109

llvm-svn: 365387
2019-07-08 21:05:20 +00:00
Brian Homerding b4b21d807e Add, and infer, a nofree function attribute
This patch adds a function attribute, nofree, to indicate that a function does
not, directly or indirectly, call a memory-deallocation function (e.g., free,
C++'s operator delete).
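
For example (illustrative only), a function of this shape neither frees nor
calls anything that may free, so the inference can mark it nofree:

    // Pure arithmetic over caller-owned memory: no deallocation can happen,
    // directly or indirectly.
    int sum(const int *p, int n) {
      int s = 0;
      for (int i = 0; i < n; ++i)
        s += p[i];
      return s;
    }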

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D49165

llvm-svn: 365336
2019-07-08 15:57:56 +00:00
Craig Topper 1deca50ab1 [X86] Allow execution domain fixing to turn SHUFPD into SHUFPS.
This can help with code size on SSE targets where SHUFPD requires
a 0x66 prefix and SHUFPS doesn't.

llvm-svn: 365293
2019-07-08 06:52:49 +00:00
Craig Topper d8261f0288 [X86] Make movsd commutable to shufpd with a 0x02 immediate on pre-SSE4.1 targets.
This can help avoid a copy or enable load folding.

On SSE4.1 targets we can commute it to blendi instead.

I had to make shufpd with a 0x02 immediate commutable as well
since we expect commuting to be reversible.

llvm-svn: 365292
2019-07-08 06:52:43 +00:00
Craig Topper 46f2b583a2 [X86] Add MOVSDrr->MOVLPDrm entry to load folding table. Add custom handling to turn UNPCKLPDrr->MOVHPDrm when load is under aligned.
If the load is aligned we can turn UNPCKLPDrr into UNPCKLPDrm.

llvm-svn: 365287
2019-07-08 02:10:20 +00:00
Craig Topper e753247b06 [X86] Add PS<->PD domain changing support for MOVH/MOVL load instructions and MOVH store instructions.
These instructions don't have an integer domain equivalent, but
we can at least change between the two floating point domains.

This allows a smaller encoding on SSE targets if we can turn
PD into PS.

llvm-svn: 365268
2019-07-06 17:59:57 +00:00
Craig Topper 317d6093df [X86] Remove patterns from MOVLPSmr and MOVHPSmr instructions.
These patterns are the same as the MOVLPDmr and MOVHPDmr patterns,
but with a bitcast at the end. We can just select the PD instruction
and let execution domain fixing switch to PS.

llvm-svn: 365267
2019-07-06 17:59:51 +00:00
Craig Topper 913105ca42 [X86] Add patterns to select MOVLPDrm from MOVSD+load and MOVHPD from UNPCKL+load.
These narrow the load so we can only do it if the load isn't
volatile.

There are also tests in vector-shuffle-128-v4.ll that this should
support, but we don't seem to fold bitcast+load on pre-SSE4.2
targets due to the slow unaligned mem 16 flag.

llvm-svn: 365266
2019-07-06 17:59:45 +00:00
Craig Topper 8c036bf784 [X86] Copy some test cases from vector-shuffle-sse1.ll to vector-shuffle-128-v4.ll and v8 where sse1 did better load folding. NFC
llvm-svn: 365265
2019-07-06 17:59:41 +00:00
Matt Arsenault 705e46f449 RegUsageInfoCollector: Skip AMDGPU entry point functions
I'm not sure if it's worth it or not to add a hook to disable the pass
for an arbitrary function.

This pass is taking up to 5% of compile time in tiny programs by
iterating through all of the physical registers in every register
class. This pass should be rewritten in terms of regunits. For now,
skip doing anything for entry point functions. The vast majority of
functions in the real world aren't callable, so just not running this
will give the majority of the benefit.

llvm-svn: 365255
2019-07-05 23:33:43 +00:00
Robert Lougher 9dcfbbae76 This reverts r365061 and r365062 (test update)
Revision r365061 changed a skip of debug instructions to a skip
of meta instructions. This is not safe, as IMPLICIT_DEF is classed
as a meta instruction.

llvm-svn: 365202
2019-07-05 12:42:06 +00:00
Robert Lougher 2478b62098 Revert r365198, as it accidentally committed something that
should not have been added.

llvm-svn: 365199
2019-07-05 12:30:45 +00:00
Robert Lougher 3bea2b15f5 This reverts r365061 and r365062 (test update)
Revision r365061 changed a skip of debug instructions to a skip
of meta instructions. This is not safe, as IMPLICIT_DEF is classed
as a meta instruction.

llvm-svn: 365198
2019-07-05 12:20:21 +00:00
Simon Pilgrim 8b25d9bf01 [X86][SSE] LowerINSERT_VECTOR_ELT - early out for out of range indices
Fixes OSS-Fuzz #15662

llvm-svn: 365180
2019-07-05 10:34:53 +00:00
Craig Topper 171732aeb3 [X86] Add custom isel to select ADD/SUB/OR/XOR/AND to their non-immediate forms under optsize when the immediate has additional users.
Summary:
We attempt to prevent folding immediates with multiple users under optsize. But we only do this from store nodes and X86ISD::ADD/SUB/XOR/OR/AND patterns. We don't do it for ISD::ADD/SUB/XOR/OR/AND even though we count them as users when deciding whether to fold into other nodes. This leads to situations where we block folding to a compare for example, but still fold into an AND or OR as seen in PR27202.
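
A hypothetical source shape for the PR27202-style problem (illustrative, not
the exact test case):

    // The same 32-bit immediate has two users (the AND and the compare).
    // Under optsize it should be materialized once in a register rather
    // than folded into both instructions.
    int has_all(int x) {
      return (x & 0x12345678) == 0x12345678;
    }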

Unfortunately touching the isel patterns in tablegen for the ISD::ADD/SUB/XOR/OR/AND opcodes will cause the patterns to be unusable for fast isel. And we don't have a way to make a fast isel only pattern.

To work around this, this patch adds custom isel in front of the isel table that will select the non-immediate forms if the immediate has additional users. This may create some issues for ANDN and NOT matching, and there's room for improvement with unsigned 32-bit immediates on 64-bit AND.

This patch needs more thorough test cases, but I wanted to get feedback on the direction. Please send me any other test cases you've seen in the wild.

I think we probably have the same issue with the immediate matching when we fold RMW from X86ISD::ADD/SUB/XOR/OR/AND, and with our TEST immediate shrinking logic. Our cost modeling for immediates that can fit in a sign-extended 8-bit immediate on a 16/32/64-bit operation is completely wrong.

I also wonder if we should update the ConstantHoisting cost model and block folding for "opaque" constants. But of course constants can still be created by DAG combine and lowering optimizations.

Fixes PR27202

Reviewers: spatel, RKSimon, andreadb

Reviewed By: RKSimon

Subscribers: jsji, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59909

llvm-svn: 365163
2019-07-04 22:53:57 +00:00
Craig Topper e9aed963ce [DAGCombiner] Don't combine (addcarry (uaddo X, Y), 0, Carry) -> (addcarry X, Y, Carry) if the Carry comes from the uaddo.
Summary:
The uaddo won't be removed and the addcarry will still be
dependent on the uaddo. So we'll just increase the use count
of X and Y and potentially require a COPY.
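
A rough sketch of the problematic shape using clang's carry builtins
(hypothetical source; the combine itself operates on DAG nodes):

    // The addcarry's carry-in comes from the very uaddo that produced its
    // first operand, so rewriting to addcarry(X, Y, Carry) cannot delete
    // the uaddo and only adds extra uses of X and Y.
    unsigned long long f(unsigned long long x, unsigned long long y) {
      unsigned long long c1, c2;
      unsigned long long t = __builtin_addcll(x, y, 0, &c1);  // ~ uaddo X, Y
      return __builtin_addcll(t, 0, c1, &c2);                 // ~ addcarry(t, 0, c1)
    }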

Reviewers: spatel, RKSimon, deadalnix

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64190

llvm-svn: 365149
2019-07-04 18:18:46 +00:00
Simon Pilgrim e602f70de1 [X86][SSE] Add partial dereferenceable vector load test inspired by PR21780
llvm-svn: 365145
2019-07-04 15:00:04 +00:00
Simon Pilgrim 146f1f2e5e [X86][SSE] Add some partial dereferenceable vector load tests inspired by PR16739
llvm-svn: 365138
2019-07-04 13:31:49 +00:00
Simon Pilgrim 8351c32764 [X86] Regenerate load fold peephole test.
llvm-svn: 365136
2019-07-04 12:33:37 +00:00
Simon Pilgrim fde766de4b [X86][AVX1] Combine concat_vectors(pshufd(x,c),pshufd(y,c)) -> vpermilps(concat_vectors(x,y),c)
Bitcast v4i32 to v8f32 and back again - it might be worth adding isel patterns for X86PShufd v8i32 on AVX1 targets like we did for X86Blendi to avoid the bitcasts?

llvm-svn: 365125
2019-07-04 10:17:10 +00:00
Robert Lougher 11953acb13 [X86] Update test; NFC
This updates pr38743.ll after D62605.

llvm-svn: 365062
2019-07-03 17:45:24 +00:00
Robert Lougher 720baf0416 [X86] Avoid SFB - Skip meta instructions
This patch generalizes the fix in D61680 to ignore all meta instructions,
not just debug info.

Patch by Chris Dawson.

Differential Revision: https://reviews.llvm.org/D62605

llvm-svn: 365061
2019-07-03 17:43:55 +00:00
Francis Visoiu Mistrih 83bbe2f418 [CodeGen] Make branch funnels pass the machine verifier
We previously marked all the tests with branch funnels as
`-verify-machineinstrs=0`.

This is an attempt to fix it.

1) `ICALL_BRANCH_FUNNEL` has no defs. Mark it as `let OutOperandList =
(outs)`

2) After that we hit an assert:

    Assertion failed: (Op.getValueType() != MVT::Other && Op.getValueType() != MVT::Glue &&
    "Chain and glue operands should occur at end of operand list!"),
    function AddOperand, file
    /Users/francisvm/llvm/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp, line 461.

The chain operand was added at the beginning of the operand list. Move
that to the end.

3) After that we hit another verifier issue in the pseudo expansion
where the registers used in the cmps and jmps are not added to the
livein lists. Add the `EFLAGS` to all the new MBBs that we create.

PR39436

Differential Review: https://reviews.llvm.org/D54155

llvm-svn: 365058
2019-07-03 17:16:45 +00:00
Simon Pilgrim 26812c7675 [X86] ComputeNumSignBitsForTargetNode - add target shuffle support.
llvm-svn: 365057
2019-07-03 17:06:59 +00:00
Amaury Sechet bddb8c3597 [DAGCombine] More diamond carry pattern optimization.
Summary:
This diff improves the ability of DAGCombine to generate linear carry propagation in the presence of a diamond pattern. It is now able to match a large variety of patterns rather than a few hardcoded ones.

Arguably, the codegen in test cases is not better, but this is to be expected. The goal of this transformation is more about canonicalisation than actual optimisation.

Reviewers: hfinkel, RKSimon, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57302

llvm-svn: 365051
2019-07-03 16:15:59 +00:00