Commit Graph

13331 Commits

Author SHA1 Message Date
Nikita Popov 9d7e86a978 [CodeGen] Handle vector UADDO, SADDO, USUBO, SSUBO
This is part of https://bugs.llvm.org/show_bug.cgi?id=40442.

Vector legalization is implemented for the add/sub overflow opcodes.
UMULO/SMULO are also handled as far as legalization is concerned, but
they don't support vector expansion yet (so no tests for them).

The vector result widening implementation is suboptimal, because it
could result in a legalization loop.

Differential Revision: https://reviews.llvm.org/D57639

llvm-svn: 353464
2019-02-07 21:02:22 +00:00
Sanjay Patel a5c4a5e958 [x86] split more 256/512-bit shuffles in lowering
This is intentionally a small step because it's hard to know exactly 
where we might introduce a conflicting transform with the code that 
tries to form wider shuffles. But I think this is safe - if we have 
a wide shuffle with 2 operands, then we should do better with an 
extract + narrow shuffle.

Differential Revision: https://reviews.llvm.org/D57867

llvm-svn: 353427
2019-02-07 17:10:49 +00:00
Craig Topper 1c7ee20819 [X86] Change the CPU on the test case for pr40529.ll to really show the bug. NFC
llvm-svn: 353334
2019-02-06 19:50:59 +00:00
Sanjay Patel 29a710be6a [x86] add tests for horizontal ops (PR38971, PR33758); NFC
llvm-svn: 353332
2019-02-06 19:40:11 +00:00
Nirav Dave e5c37958f9 [InlineAsm][X86] Add backend support for X86 flag output parameters.
Allow custom handling of inline assembly output parameters and add X86
flag parameter support.

llvm-svn: 353307
2019-02-06 15:26:29 +00:00
Sanjay Patel e84fbb67a1 [x86] vectorize cast ops in lowering to avoid register file transfers
The proposal in D56796 may cross the line because we're trying to avoid vectorization 
transforms in generic DAG combining. So this is an alternate, later, x86-specific 
translation of that patch.

There are several potential follow-ups to enhance this:
1. Allow extraction from non-zero element index.
2. Peek through extends of smaller width integers.
3. Support x86-specific conversion opcodes like X86ISD::CVTSI2P

Differential Revision: https://reviews.llvm.org/D56864

llvm-svn: 353302
2019-02-06 14:59:39 +00:00
Philip Reames b5bb4a4ec6 [Test] Add codegen tests for unordered and monotonic integer operations
llvm-svn: 353266
2019-02-06 03:19:04 +00:00
Sanjay Patel 997b2aba58 [x86] add tests for extract+sitofp; NFC
llvm-svn: 353249
2019-02-06 00:19:56 +00:00
Craig Topper 4562f420cd [X86] Regenerate tests missed in r353061. NFC
We now print the implicit %st register on these instruction, but since they occur at the end of the line, FileCheck didn't see they were missing.

llvm-svn: 353222
2019-02-05 21:47:42 +00:00
Simon Pilgrim b0afc69435 [X86][SSE] Disable ZERO_EXTEND shuffle combining
rL352997 enabled ZERO_EXTEND from non-shuffle-able value types. I've disabled it for now to fix a regression identified by @asbirlea until I can fix this properly.

llvm-svn: 353198
2019-02-05 19:15:48 +00:00
Simon Pilgrim 822d2e35e7 [X86][AVX] Attempt to combine shuffles to subvector broadcast load
llvm-svn: 353189
2019-02-05 17:02:49 +00:00
Simon Pilgrim ec5a6761e5 [X86][AVX] Add PR34041 subvector broadcast test cases
llvm-svn: 353182
2019-02-05 16:18:30 +00:00
Sanjay Patel 237e208f16 [AArch64][x86] add tests for unsigned subtract with overflow; NFC
llvm-svn: 353178
2019-02-05 15:26:42 +00:00
Simon Pilgrim 8450ad17a9 [X86][SSE] Rename SimplifyDemandedVectorElts BLENDV tests
I'm going to be adding SimplifyDemandedBits tests shortly.

llvm-svn: 353171
2019-02-05 14:11:50 +00:00
Simon Pilgrim 62af24cc93 [X86][SSE] Add SimplifyDemandedVectorElts support for X86ISD::BLENDV
llvm-svn: 353165
2019-02-05 12:27:29 +00:00
Simon Pilgrim bd3adbb899 [X86][SSE] Add tests showing missing SimplifyDemandedVectorElts support for X86ISD::BLENDV
llvm-svn: 353164
2019-02-05 12:18:34 +00:00
Simon Pilgrim 9e595e3663 [X86][AVX] Attempt to share broadcasts of different widths (PR39454)
If we have broadcasts of different vector widths, keep the longest vector width and extract subvectors for the shorter vectors (which should be free).

Differential Revision: https://reviews.llvm.org/D57663

llvm-svn: 353154
2019-02-05 10:58:43 +00:00
Clement Courbet f3da6abf0f [DAG][NFC] Add unit tests.
In preparation for D57541.

llvm-svn: 353144
2019-02-05 08:00:17 +00:00
Craig Topper f86eb00f12 [X86] Connect the default fpsr and dirflag clobbers in inline assembly to the registers we have defined for them.
Summary:
We don't currently map these constraints to physical register numbers so they don't make it to the MachineIR representation of inline assembly.

This could have problems for proper dependency tracking in the machine schedulers though I don't have a test case that shows that.

Reviewers: rnk

Reviewed By: rnk

Subscribers: eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57641

llvm-svn: 353141
2019-02-05 06:13:06 +00:00
Craig Topper 31259c52ff [X86] Add test case from PR40529. NFC
llvm-svn: 353138
2019-02-05 04:48:23 +00:00
Matt Arsenault 7f09fd6b04 GlobalISel: Consolidate load/store legalization
The fewerElementsVectors implementation for load/stores
handles the scalar reduction case just as well, so drop
the redundant code in narrowScalar. This also introduces
support for narrowing irregular size breakdowns for
scalars.

llvm-svn: 353125
2019-02-05 00:26:12 +00:00
Craig Topper d4e37afe45 [DAGCombiner] Discard pointer info when combining extract_vector_elt of a vector load when the index isn't constant
Summary:
If the index isn't constant, this transform inserts a multiply and an add on the index to calculating the base pointer for a scalar load. But we still create a memory operand with an offset of 0 and the size of the scalar access. But the access is really to an unknown offset within the original access size.

This can cause the machine scheduler to incorrectly calculate dependencies between this load and other accesses. In the case we saw, there was a 32 byte vector store that was split into two 16 byte stores, one with offset 0 and one with offset 16. The size of the memory operand for both was 16. The scheduler correctly detected the alias with the offset 0 store, but not the offset 16 store.

This patch discards the pointer info so we don't incorrectly detect aliasing. I wasn't sure if we could keep using the original offset and size without risking some other transform on the load changing the size.

I tried to reduce a test case, but there's still a lot of memory operations needed to get the scheduler to do the bad reordering. So it looked pretty fragile to maintain.

Reviewers: efriedma

Reviewed By: efriedma

Subscribers: arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57616

llvm-svn: 353124
2019-02-05 00:22:23 +00:00
Matt Arsenault 1f795e2c2a GlobalISel: Enforce operand types for constants
A number of of tests were using imm operands, not cimm. Since CSE
relies on the exact ConstantInt* pointer used, and implicit
conversions are generally evil, also enforce the bitsize of the types.

llvm-svn: 353113
2019-02-04 23:29:31 +00:00
Matt Arsenault f2a26339e2 GlobalISel: Verify g_select
Factor the common vector element consistency check many instructions
need out, although this makes the error messages worse.

llvm-svn: 353112
2019-02-04 23:29:16 +00:00
Craig Topper 4ca0b850e0 [X86] Add test case for report_fatal_error added in r352699.
r352699 replaced an llvm_unreachable with a report_fatal_error. This patch adds a test case for it.

llvm-svn: 353096
2019-02-04 21:24:15 +00:00
Roman Lebedev b7ecc9b624 [X86] X86DAGToDAGISel::matchBitExtract(): prepare 'control' in 32 bits
Summary:
Noticed while looking at D56052.
```
  // The 'control' of BEXTR has the pattern of:
  // [15...8 bit][ 7...0 bit] location
  // [ bit count][     shift] name
  // I.e. 0b000000011'00000001 means  (x >> 0b1) & 0b11
```
I.e. we do not care about any of the bits aside from the low 16 bits.
So there is no point in doing the `slh`,`or` in 64 bits,
let's just do everything in 32 bits, and anyext if needed.

We could do that in 16 even, but we intentionally don't
zext to i16 (longer encoding IIRC),
so i'm guessing the same applies here.

Reviewers: craig.topper, andreadb, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D56715

llvm-svn: 353073
2019-02-04 19:04:26 +00:00
Craig Topper 8ea72a8201 [X86] Add ST0 as an implicit def/use of x87 load/store instructions during FP stackifying.
These instructions implicitly operate on ST0, but we don't currently add that information to the MachineInstr. We also don't add it the tablegen definitions either.

For the most part this doesn't cause any problems because the stackifying occurs after register allocation. All the instructions are marked as having side effects so the postRA scheduler won't reorder them amongst themselves.

But nothing stops inline assembly using X87 instructions from being reordered around other x87 instructions if that inline assembly wasn't marked volatile.

The two test cases I've identified so far in PR40539 involve loads and stores used to set up the inline assembly or capture the results of the inline assembly ending up in the wrong order.

This patch adds implicit ST0 uses/defs to the load/store instructions to prevent this from happening.

I plan to fix all of the FP instructions, but the binops are bit trickier to get right. So I've chosen fixing the known test cases as a good first step.

I think we also need to update the tablegen descriptions so MS inline assembly infers the right clobbers, but I haven't checked that yet.

Differential Revision: https://reviews.llvm.org/D57644

llvm-svn: 353070
2019-02-04 18:43:55 +00:00
Craig Topper bf7593ec4a [X86] Print all register forms of x87 fadd/fsub/fdiv/fmul as having two arguments where on is %st.
All of these instructions consume one encoded register and the other register is %st. They either write the result to %st or the encoded register. Previously we printed both arguments when the encoded register was written. And we printed one argument when the result was written to %st. For the stack popping forms the encoded register is always the destination and we didn't print both operands. This was inconsistent with gcc and objdump and just makes the output assembly code harder to read.

This patch changes things to always print both operands making us consistent with gcc and objdump. The parser should still be able to handle the single register forms just as it did before. This also matches the GNU assembler behavior.

llvm-svn: 353061
2019-02-04 17:28:18 +00:00
Leonard Chan 68d428e578 [Intrinsic] Unsigned Fixed Point Multiplication Intrinsic
Add an intrinsic that takes 2 unsigned integers with the scale of them
provided as the third argument and performs fixed point multiplication on
them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D55625

llvm-svn: 353059
2019-02-04 17:18:11 +00:00
Simon Pilgrim 6e5350a367 [X86][SSE] SimplifyDemandedBitsForTargetNode - PCMPGT(0,X) sign mask
For PCMPGT(0, X) patterns where we only demand the sign bit (e.g. BLENDV or MOVMSK) then we can use X directly.

Differential Revision: https://reviews.llvm.org/D57667

llvm-svn: 353051
2019-02-04 15:43:36 +00:00
Simon Pilgrim a536b89fe0 [DAGCombine] Add ADD(SUB,SUB) combines
Noticed while investigating PR40483, and fixes the basic test case from the bug - but not a more general case.

We're pretty weak at dealing with ADD/SUB combines compared to the SimplifyAssociativeOrCommutative/SimplifyUsingDistributiveLaws abilities that InstCombine can manage.

llvm-svn: 353044
2019-02-04 13:44:49 +00:00
Andrea Di Biagio edbf06a767 [AsmPrinter] Remove hidden flag -print-schedule.
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).

Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".

These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.

When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.

Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implmentation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.

Differential Revision: https://reviews.llvm.org/D57244

llvm-svn: 353043
2019-02-04 12:51:26 +00:00
Simon Pilgrim fb222aa319 [X86] Add a couple of missed ADD combine tests
Noticed while investigating PR40483

llvm-svn: 353042
2019-02-04 12:37:38 +00:00
Craig Topper b5e945c260 Recommit r352660 "[X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7."
We now print ST0 as 'st' when generating the clobber list for MS inline assembly in clang. This matches what the gcc reg name list expects.

Original commit message:

This fixes the test case in PR35982 by preventing MMX instructions that read MM0-7 from being moved below EMMS/FEMMS by the post RA scheduler.

Though as discussed in bugzilla, this is not a complete fix. There is still the possibility of reordering in IR or by the pre-RA scheduler.

Differential Revision: https://reviews.llvm.org/D57298

llvm-svn: 353016
2019-02-04 04:44:20 +00:00
Craig Topper 7a2944efe1 [X86] Print %st(0) as %st when its implicit to the instruction. Continue printing it as %st(0) when its encoded in the instruction.
This is a step back from the change I made in r352985. This appears to be more consistent with gcc and objdump behavior.

llvm-svn: 353015
2019-02-04 04:15:10 +00:00
Craig Topper 145ccb0eb9 [X86] Regenerate test to drop 'End function' comments some other other regex updates.
llvm-svn: 353014
2019-02-04 04:15:04 +00:00
Craig Topper f77b858dc3 Revert r352985 "[X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly"
Looking into gcc and objdump behavior more this was overly aggressive. If the register is encoded in the instruction we should print %st(0), if its implicit we should print %st.

I'll be making a more directed change in a future patch.

llvm-svn: 353013
2019-02-04 04:15:02 +00:00
Simon Pilgrim 135413d381 [NFC] Make vector types legal in UREM test
As discussed in D50222, this changes the vector types in tests required for that revision to ones legal for X86.

Patch by @hermord (Dmytro Shynkevych)

Differential Revision: https://reviews.llvm.org/D56372

llvm-svn: 353004
2019-02-03 19:38:15 +00:00
Sanjay Patel 84ceae6048 [CGP] adjust target constraints for forming uaddo
There are 2 changes visible here:
1. There's no reason to limit this transform based on number
   of condition registers. That diff allows PPC to produce 
   slightly better (dot-instructions should be generally good) 
   code.
   Note: someone that cares about PPC codegen might want to 
   look closer at that output because it seems like we could
   still improve this.

2. We (probably?) should not bother trying to form uaddo (or
   other overflow ops) when there's no target support for such
   an op. This goes beyond checking whether the op is expanded
   because both PPC and AArch64 show better codegen for standard
   types regardless of whether the op is legal/custom.

llvm-svn: 353001
2019-02-03 17:53:09 +00:00
Simon Pilgrim 1fce5a8b75 [X86][AVX] Support shuffle combining for VBROADCAST with smaller vector sources
getTargetShuffleMask can only do this safely if we're extracting the lowest subvector from a vector of the same result type.

llvm-svn: 352999
2019-02-03 16:51:33 +00:00
Sanjay Patel 837552fe9f [PatternMatch] add special-case uaddo matching for increment-by-one (2nd try)
This is the most important uaddo problem mentioned in PR31754:
https://bugs.llvm.org/show_bug.cgi?id=31754
...but that was overcome in x86 codegen with D57637.

That patch also corrects the inc vs. add regressions seen with the  previous attempt at this.

Still, we want to make this matcher complete, so we can potentially canonicalize the pattern 
even if it's an 'add 1' operation.
Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4 
commuted variants of uaddo.

There's also a test with a crazy type to show that the existing CGP transform based on this 
matcher is not limited by target legality checks.

I'm not sure if the Hexagon diff means the test is no longer testing what it intended to
test, but that should be solvable in a follow-up.

Differential Revision: https://reviews.llvm.org/D57516

llvm-svn: 352998
2019-02-03 16:16:48 +00:00
Simon Pilgrim 18b73a655b [X86][AVX] Support shuffle combining for VPMOVZX with smaller vector sources
llvm-svn: 352997
2019-02-03 16:10:18 +00:00
Simon Pilgrim a2a3e5b811 [X86][AVX] More aggressively simplify BROADCAST source operand
Aim to use scalar source or lowest 128-bit vector directly.

We're still missing some VZMOVL_LOAD combines.

llvm-svn: 352994
2019-02-03 14:39:41 +00:00
Sanjay Patel 3d6ecfc078 [x86] add CGP uaddo test with weird type; NFC
There's probably no reason to try this transform
for an obviously unsupported op.

llvm-svn: 352993
2019-02-03 14:22:43 +00:00
Craig Topper 5a570dd437 [X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly
Summary:
When calculating clobbers for MS style inline assembly we fail if the asm clobbers stack top because we print st(0) and try to pass it through the gcc register name check. This was found with when I attempted to make a emms/femms clobber all ST registers. If you use emms/femms in MS inline asm we would try to use st(0) as the clobber name but clang would think that wasn't a valid clobber name.

This also matches what objdump disassembly prints. It's also what is printed by gcc -S.

Reviewers: RKSimon, rnk, efriedma, spatel, andreadb, lebedev.ri

Reviewed By: rnk

Subscribers: eraman, gbedwell, lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D57621

llvm-svn: 352985
2019-02-03 07:53:39 +00:00
Craig Topper 950ca192f6 [X86] Lower ISD::UADDO to use the Z flag instead of C flag when the RHS is a constant 1 to encourage INC formation.
Summary:
Add an additional combine to combineCarryThroughADD to reverse it back to the C flag to avoid regressions.

I believe this catches the cases that D57547 got.

Reviewers: RKSimon, spatel

Reviewed By: spatel

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57637

llvm-svn: 352984
2019-02-03 07:25:06 +00:00
Craig Topper cf07b097c6 [X86] Add another test case for PR40539. NFC
llvm-svn: 352967
2019-02-02 22:01:41 +00:00
Simon Pilgrim dbf302c9f1 [X86][AVX] Enable INSERT_SUBVECTOR(SRC0, SHUFFLE(SRC1)) shuffle combining
Push the insert_subvector up through the shuffle operands to help find more cross-lane shuffles.

The is exposes a couple of minor issues that will be fixed shortly:
Missed broadcast folds - we have a mixture of vzext_load lengths that need cleaning up
combine-sdiv.ll - AVX1 SimplifyDemandedVectorElts failure (hits max depth due to a couple of extra bitcasts).

llvm-svn: 352963
2019-02-02 18:08:04 +00:00
Simon Pilgrim e95550f508 [X86][AVX] Add VMOVDDUP-VPBROADCASTQ execution domain mapping
Noticed in D57514.

Differential Revision: https://reviews.llvm.org/D57519

llvm-svn: 352922
2019-02-01 21:41:30 +00:00
Wolfgang Pieb 58513b7761 [DWARF v5] Fix DWARF emitter and consumer to produce/expect a uleb for a location description's length.
Reviewer: davide, JDevliegere

Differential Revision: https://reviews.llvm.org/D57550

llvm-svn: 352889
2019-02-01 17:11:58 +00:00