Commit Graph

11620 Commits

Author SHA1 Message Date
Chandler Carruth 0ca3bd0729 [x86] Model the direction flag (DF) separately from the rest of EFLAGS.
This cleans up a number of operations that only claimed to use EFLAGS
due to using DF. But no instructions which we think of as setting EFLAGS
actually modify DF (other than things like popf), and so this needlessly
creates uses of EFLAGS that aren't really there.

In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD,
and the whole-flags writes (WRFLAGS and POPF) need to model this.
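
As a rough illustration of how narrow DF's behaviour is (plain C++ with GNU
inline asm; my example, not code from this patch): the string instructions
read DF to decide whether to walk addresses upward or downward, and only
CLD/STD (or a whole-flags write such as POPF) change it -- ordinary
arithmetic that sets CF/ZF/etc. leaves DF alone.

  #include <cstddef>

  // Copy N bytes forward: CLD clears DF, so REP MOVSB increments RSI/RDI.
  void copyForward(char *Dst, const char *Src, std::size_t N) {
    asm volatile("cld\n\t"
                 "rep movsb"
                 : "+D"(Dst), "+S"(Src), "+c"(N)
                 :
                 : "memory", "cc");
  }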

I've also somewhat cleaned up some of the flag management instruction
definitions to be in the correct .td file.

Adding this extra register also uncovered a failure to use the correct
datatype to hold X86 registers, and I've corrected that as necessary
here.

Differential Revision: https://reviews.llvm.org/D45154

llvm-svn: 329673
2018-04-10 06:40:51 +00:00
Craig Topper 7e42af87a6 [X86] Prevent folding loads with 64-bit ANDs with immediates that fit in 32-bits.
Prefer to use the 32-bit AND with immediate instead.

Primarily I'm doing this to ensure that immediates created by shrinkAndImmediate will always get absorbed into the AND. But I do believe this would be a reduction in the number of uops that need to execute. Ideally we should shrink the 'and' and the 'load' during DAG combine to re-enable the fold.
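
As a scalar illustration (my example, not from the patch) of why the 32-bit
form is interchangeable whenever the mask's upper 32 bits are zero: writing a
32-bit register implicitly clears bits 63:32, so an andl with the 32-bit
immediate produces the same value as the 64-bit AND.

  #include <cstdint>

  uint64_t mask64(uint64_t X) {          // conceptually the 64-bit AND
    return X & 0xFFFF00FFu;
  }
  uint64_t mask32(uint64_t X) {          // same result via a 32-bit AND
    return static_cast<uint32_t>(X) & 0xFFFF00FFu;
  }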

Fixes PR37063.

llvm-svn: 329667
2018-04-10 03:44:15 +00:00
Chandler Carruth 19618fc639 [x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues.
The key idea is to lower COPY nodes populating EFLAGS by scanning the
uses of EFLAGS and introducing dedicated code to preserve the necessary
state in a GPR. In the vast majority of cases, these uses are cmovCC and
jCC instructions. For such cases, we can very easily save and restore
the necessary information by simply inserting a setCC into a GPR where
the original flags are live, and then testing that GPR directly to feed
the cmov or conditional branch.
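
As a source-level analogy (my sketch, not the pass's MI-level code): the
rewrite amounts to capturing the condition into a byte value where the
comparison happens, and then re-deriving the flag from that value at each
consumer, instead of keeping EFLAGS itself live across intervening code.

  #include <cstdio>

  int selectLater(int A, int B, int X, int Y) {
    bool Cond = A < B;                 // roughly: setl into a GR8
    std::printf("clobbers flags\n");   // arbitrary code that overwrites EFLAGS
    return Cond ? X : Y;               // roughly: test %cl,%cl ; cmovne
  }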

However, things are a bit more tricky if arithmetic is using the flags.
This patch handles the vast majority of cases that seem to come up in
practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of
partially preserved EFLAGS as LLVM doesn't currently model that at all.

There are a large number of operations that technically observe EFLAGS
currently but shouldn't in this case -- they typically are using DF.
Currently, they will not be handled by this approach. However, I have
never seen this issue come up in practice. It is already pretty rare to
have these patterns come up in practical code with LLVM. I had to resort
to writing MIR tests to cover most of the logic in this pass already.
I suspect even with its current amount of coverage of arithmetic users
of EFLAGS it will be a significant improvement over the current use of
pushf/popf. It will also produce substantially faster code in most of
the common patterns.

This patch also removes all of the old lowering for EFLAGS copies, and
the hack that forced us to use a frame pointer when EFLAGS copies were
found anywhere in a function so that the dynamic stack adjustment wasn't
a problem. None of this is needed as we now lower all of these copies
directly in MI, without requiring stack adjustments.

Lots of thanks to Reid who came up with several aspects of this
approach, and Craig who helped me work out a couple of things tripping
me up while working on this.

Differential Revision: https://reviews.llvm.org/D45146

llvm-svn: 329657
2018-04-10 01:41:17 +00:00
Vlad Tsyrklevich 0cdc6ec535 ShadowCallStack/x86_64: Ignore pseudo-machine instructions
llvm-svn: 329656
2018-04-10 01:31:01 +00:00
Simon Pilgrim 3a8fc92865 [X86] Added missing AAD/AAM immediate schedule tests
Added some more TODOs for missing instructions

llvm-svn: 329626
2018-04-09 21:46:57 +00:00
Craig Topper 47b2f9d836 [X86] Don't use Lower512IntUnary to split bitcasts with v32i16/v64i8 types on targets without AVX512BW.
LowerIntUnary, as its name says, has an assert for integer types. But for the bitcast case one side might be an FP type.

Rather than making sure the function really works for FP types and renaming it, just do really basic splitting directly. LowerIntUnary has the advantage that it can peek through BUILD_VECTOR because every other call is during Lowering. But these calls are during legalization and will be followed by a DAG combine round.
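
To make "really basic splitting" concrete, a plain C++ sketch (illustration
only, not the legalization code): a 512-bit bitcast can be performed as two
independent 256-bit halves, because a bitcast only reinterprets bytes and the
256-bit split point lines up in both v32i16 and v16f32.

  #include <array>
  #include <cstdint>
  #include <cstring>

  using V32i16 = std::array<uint16_t, 32>;
  using V16f32 = std::array<float, 16>;

  V16f32 bitcastWhole(const V32i16 &In) {
    V16f32 Out;
    std::memcpy(Out.data(), In.data(), sizeof(Out));
    return Out;
  }

  V16f32 bitcastBySplitting(const V32i16 &In) {
    V16f32 Out;
    std::memcpy(Out.data(), In.data(), 32);           // low 256-bit half
    std::memcpy(Out.data() + 8, In.data() + 16, 32);  // high 256-bit half
    return Out;                                       // equals bitcastWhole
  }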

Revert some changes to LowerVectorIntUnary that were originally made just to make these two calls work even in pure integer cases.

This was found purely by compiling the avx512f-builtins.c test from clang so I've copied over the offending function from that.

llvm-svn: 329616
2018-04-09 20:37:14 +00:00
Craig Topper 3a0cab73eb [X86] Remove GCCBuiltin name from pmuldq/pmuludq intrinsics so clang can custom lower to native IR. Update fast-isel intrinsic tests for clang's new codegen.
In some cases fast-isel fails to remove the and/shifts and uses blends or conditional moves.

But once masking gets involved, fast-isel aborts on the mask portion and we DAG combine more thoroughly.
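
For reference, the per-lane semantics written as scalars (my sketch, not
clang's actual output): each 64-bit lane multiplies only the low 32 bits of
its inputs -- unsigned for PMULUDQ, sign-extended for PMULDQ -- into a full
64-bit product, which is roughly what the masks and shifts in the native IR
express.

  #include <cstdint>

  uint64_t pmuludqLane(uint64_t A, uint64_t B) {
    return (A & 0xFFFFFFFFull) * (B & 0xFFFFFFFFull);
  }
  int64_t pmuldqLane(uint64_t A, uint64_t B) {
    return static_cast<int64_t>(static_cast<int32_t>(A)) *
           static_cast<int64_t>(static_cast<int32_t>(B));
  }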

llvm-svn: 329604
2018-04-09 19:17:38 +00:00
Craig Topper 0c2a12cb3e [X86] Revert the SLM part of r328914.
While it appears to be correct information based on Intel's optimization manual and Agner's data, it causes perf regressions on a couple of the benchmarks in our internal list.

llvm-svn: 329593
2018-04-09 17:07:40 +00:00
Simon Pilgrim 14566ea6ef [X86][SSE] Add floating point add/mul strict (ordered) vector.reduce tests (PR36732)
llvm-svn: 329587
2018-04-09 16:01:44 +00:00
Simon Pilgrim e5ed5e2cba [X86][MMX] Fix missing itinerary for PALIGNR
llvm-svn: 329568
2018-04-09 13:52:33 +00:00
Simon Pilgrim 140fee078f [X86][MMX] Fix missing itinerary for MOVQ2DQ instruction format
llvm-svn: 329567
2018-04-09 13:42:14 +00:00
Simon Pilgrim abf3611332 [X86][MMX] Fix missing itinerary for CVTPI2PS
llvm-svn: 329565
2018-04-09 13:27:47 +00:00
Simon Pilgrim 0047efdd1e [X86][MMX] Fix flipped reg/mem typo in MMX_MISC_FUNC_ITINS
The RR/RM itineraries were the wrong way around

llvm-svn: 329561
2018-04-09 13:02:07 +00:00
Simon Pilgrim 6131286553 [X86][SSE] Fix f32 mul/div itinerary groups typo
The RM folded itineraries were incorrectly using the f64 version.

llvm-svn: 329556
2018-04-09 10:45:53 +00:00
Sam Parker 1f4f4d9a08 [DAGCombine] Improve ReduceLoad for SRL
Recommitting r329283, third time lucky...

If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.
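
A concrete instance of the pattern (my illustration; little-endian layout
assumed): when the shifted value is immediately masked down to a byte, the
wide load can be narrowed to a byte load at the matching offset and the AND
disappears.

  #include <cstdint>
  #include <cstring>

  uint32_t viaWideLoad(const unsigned char *P) {
    uint32_t W;
    std::memcpy(&W, P, sizeof(W));
    return (W >> 16) & 0xFF;   // (and (srl (load i32), 16), 255)
  }

  uint32_t viaNarrowLoad(const unsigned char *P) {
    return P[2];               // same result: an i8 load, no AND needed
  }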

Differential Revision: https://reviews.llvm.org/D41350

llvm-svn: 329551
2018-04-09 08:16:11 +00:00
Michael Zolotukhin 8d052a0dd2 Remove MachineLoopInfo dependency from AsmPrinter.
Summary:
Currently MachineLoopInfo is used in only two places:
1) for computing the IsBasicBlockInsideInnermostLoop field of MCCodePaddingContext, and it is never used.
2) in emitBasicBlockLoopComments, which is called only if `isVerbose()` is true.
Despite that, we currently have a dependency on MachineLoopInfo, which makes
the pass manager compute it and MachineDominatorTree. This patch removes
use (1) and makes use (2) lazy, thus avoiding some redundant
recomputations.
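
The shape of the change for use (2), as a generic C++ sketch (not the actual
AsmPrinter code; the names in the usage comment are hypothetical): the
analysis is only computed the first time a verbose-only consumer asks for it.

  #include <functional>
  #include <optional>
  #include <utility>

  template <typename T> class Lazy {
    std::optional<T> Value;
    std::function<T()> Compute;

  public:
    explicit Lazy(std::function<T()> C) : Compute(std::move(C)) {}
    const T &get() {
      if (!Value)
        Value = Compute();   // first use pays for the computation
      return *Value;
    }
  };

  // e.g. only reached when isVerbose() is true:
  //   Lazy<LoopTree> Loops([&] { return computeLoopTree(MF); });
  //   if (isVerbose()) annotate(Loops.get());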

Reviewers: opaparo, gadi.haber, rafael, craig.topper, zvi

Subscribers: rengolin, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D44812

llvm-svn: 329542
2018-04-09 00:54:47 +00:00
Craig Topper b7baa358f6 [X86] Add SchedWrites for CMOV and SETCC. Use them to remove InstRWs.
Summary:
Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops.

This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs.

The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc.

Reviewers: RKSimon, andreadb, GGanesh

Reviewed By: RKSimon

Subscribers: GGanesh, llvm-commits

Differential Revision: https://reviews.llvm.org/D45380

llvm-svn: 329539
2018-04-08 17:53:18 +00:00
Craig Topper c362f42b6a [X86][Znver1] Remove InstRWs for BLENDVPS/PD
Summary:
This removes the InstRWs for BLENDVPS/PD in favor of WriteFVarBlend. The latency listed was 3 cycles, but WriteFVarBlend is defined with a 1 cycle latency, which matches Agner Fog's data.

The patterns were missing the VEX forms, which is why there are no test changes. We don't test "-mcpu=znver1 -mattr=-avx".

Reviewers: RKSimon, GGanesh

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44841

llvm-svn: 329538
2018-04-08 17:53:15 +00:00
Simon Pilgrim bf2df1e26c [X86] Regenerate and + immediate mask tests
Added i686 checks

llvm-svn: 329529
2018-04-08 12:31:52 +00:00
Simon Pilgrim 44374cf7b0 [X86][PKU] Regenerate rdpkru/wrpkru intrinsic tests
Added i686 checks

llvm-svn: 329528
2018-04-08 12:30:30 +00:00
Simon Pilgrim 14df0ae8d2 [X86][SSE3] Regenerate mwait/monitor intrinsic tests
Added i686 checks

llvm-svn: 329527
2018-04-08 12:29:11 +00:00
Zvi Rackover 7a53f169f1 DAGCombiner: Combine SDIV with non-splat vector pow2 divisor
Summary:
Extend the existing SDIV combine for pow2 constant divisors to handle
non-splat vectors of pow2 constants.
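
For reference, the scalar expansion that gets applied to each lane; with a
non-splat divisor every lane simply uses its own shift amount K, where the
divisor is 1 << K (my sketch, assuming arithmetic right shift as on x86):

  #include <cstdint>

  int32_t sdivByPow2(int32_t X, unsigned K) {       // K in [1, 31]
    int32_t Sign = X >> 31;                         // all ones if X is negative
    int32_t Bias = static_cast<int32_t>(
        static_cast<uint32_t>(Sign) >> (32 - K));   // (1 << K) - 1 if X < 0
    return (X + Bias) >> K;                         // truncates toward zero
  }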

Reviewers: RKSimon, craig.topper, spatel, hfinkel, efriedma

Reviewed By: RKSimon

Subscribers: magabari, llvm-commits

Differential Revision: https://reviews.llvm.org/D42479

llvm-svn: 329525
2018-04-08 11:35:20 +00:00
Simon Pilgrim 86588fc809 [X86][Btver2] Add vector extract costs
llvm-svn: 329524
2018-04-08 11:26:26 +00:00
Guozhi Wei 0eb86c8efc [DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))
In our real-world application, we found that the following optimization is missed in DAGCombiner:

(zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst))

If the user of the original zext is an add, it may enable further LEA optimization on x86.

This patch adds a new function CombineZExtLogicopShiftLoad to do this optimization.
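
A source pattern that produces this DAG (my illustrative example, not from
the original application): the 32-bit load is shifted and masked, then
zero-extended for a 64-bit add -- after the fold the zext is applied to the
load itself, and the add is the kind that can then become an LEA.

  #include <cstdint>

  uint64_t indexedOffset(const uint32_t *P, uint64_t Base) {
    // (add Base, (zext (and (srl (load i32), 3), 31)))
    return Base + ((P[0] >> 3) & 0x1F);
  }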

Differential Revision: https://reviews.llvm.org/D44402

llvm-svn: 329516
2018-04-07 23:36:10 +00:00
Simon Pilgrim d6981b1d37 [X86] Regenerate atom pshufb test
llvm-svn: 329511
2018-04-07 19:50:09 +00:00
Craig Topper ef37aebc96 [X86] Combine vXi64 multiplies to MULDQ/MULUDQ during DAG combine instead of lowering.
Previously we used a custom lowering for this because of the AVX1 splitting requirement, but we can do the split during DAG combine if we check the types and subtarget.

llvm-svn: 329510
2018-04-07 19:09:52 +00:00
Craig Topper 5b95eae1c3 [DAGCombiner] Add a combine to turn a build vector of zero extends of extract vector elts into a vector zero extend and possibly an extract subvector.
llvm-svn: 329509
2018-04-07 19:09:50 +00:00
Tim Northover e25e458d52 Reapply ARM: Do not spill CSR to stack on entry to noreturn functions
Should fix UBSan bot by also checking there's no "uwtable" attribute
before skipping. Otherwise the unwind table will be useless since its
moves expect CSRs to actually be preserved.

A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.
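
A minimal example of the kind of function that benefits (my sketch; assumes
the target does not force "uwtable", and logAndHalt is a hypothetical
helper): the function is noreturn and nounwind, so its prologue no longer
spills callee-saved registers it would never restore.

  extern void logAndHalt(const char *Msg);   // hypothetical helper

  [[noreturn]] void fatalError(const char *Msg) noexcept {
    logAndHalt(Msg);
    __builtin_trap();   // clang/GCC builtin; guarantees we never return
  }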

Should fix PR9970.

Patch mostly by myeisha (pmb).

llvm-svn: 329494
2018-04-07 10:57:03 +00:00
Vitaly Buka de5f196530 Revert "ARM: Do not spill CSR to stack on entry to noreturn functions"
Breaks ubsan test TestCases/Misc/missing_return.cpp on ARM

This reverts commit r329287

llvm-svn: 329486
2018-04-07 05:36:44 +00:00
Matt Davis 13b8331054 [StackProtector] Ignore certain intrinsics when calculating sspstrong heuristic.
Summary:
The 'strong' StackProtector heuristic takes into consideration call instructions.
Certain intrinsics, such as lifetime.start, can cause the
StackProtector to protect functions that do not need to be protected.

Specifically, a volatile variable (not optimized away) belonging to a stack
allocation will encourage an llvm.lifetime.start to be inserted during
compilation. Because that intrinsic is a 'call', the strong StackProtector
will see that the alloca'd variable is being passed to a call instruction and
insert a stack protector. In this case the intrinsic isn't really lowered to a
call. This can cause unnecessary stack checking at the cost of additional
(wasted) CPU cycles.
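
A minimal reproducer sketch (my example; assumes clang emits lifetime
markers for the local, which it normally does with optimizations enabled):
the volatile keeps the alloca alive, llvm.lifetime.start takes its address
as a "call" operand, and -fstack-protector-strong previously treated that as
the variable being passed to a call.

  void touchScratch() {
    volatile int Scratch = 0;   // stays on the stack, gets lifetime markers
    Scratch = Scratch + 1;
  }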

In the future we should rely on TargetTransformInfo::isLoweredToCall, but as of
now that routine considers all intrinsics as not being lowerable. That needs
to be corrected, and such a change is on my list of things to get moving on.

As a side note, the updated stack-protector-dbginfo.ll test always seems to
pass. I never see the dbg.declare/dbg.value reaching
StackProtector::HasAddressTaken, but I don't see any code excluding dbg
intrinsic calls either, so I think it's the safest thing to do.

Reviewers: void, timshen

Reviewed By: timshen

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45331

llvm-svn: 329450
2018-04-06 20:14:13 +00:00
Craig Topper f0d042619b [X86] Attempt to model basic arithmetic instructions in the Haswell/Broadwell/Skylake scheduler models without InstRWs
Summary:
This patch removes InstRW overrides for basic arithmetic/logic instructions. To do this I've added the store address port to RMW and used a WriteSequence to make the latency additive. It does not cover ADC/SBB because they have different latency.

Apparently we were inconsistent about whether the store has latency or not, hence the test changes.

I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5.

Reviewers: RKSimon, andreadb

Reviewed By: andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45351

llvm-svn: 329416
2018-04-06 16:16:48 +00:00
Simon Pilgrim 09eeb3a8b9 [X86][SandyBridge] Add (V)DPPS memory fold latencies
Noticed this during D44654

llvm-svn: 329389
2018-04-06 11:25:21 +00:00
Simon Pilgrim 8a83f16ccd [X86][SandyBridge] SBWriteResPair +5cy Memory Folds
As mentioned on D44647, this patch increases the default memory latency to +5cy, which more closely matches what most custom cases are doing for reg-mem instructions.

I've bumped LoadLatency, ReadAfterLd and WriteLoad values to 5cy to be consistent.

As Sandy Bridge is currently our default generic model, this affects a lot of scheduling tests...

Differential Revision: https://reviews.llvm.org/D44654

llvm-svn: 329388
2018-04-06 11:00:51 +00:00
Zvi Rackover 78a065ff16 X86 Tests: Add a case for combining sdiv by a splatted pow2 negative. NFC.
Noticed test was missing while working on D42479.

llvm-svn: 329356
2018-04-05 21:57:20 +00:00
Craig Topper fbe3132f67 [X86] Separate CDQ and CDQE in the scheduler model.
According to Agner's data, CDQE is closer to CWDE.

llvm-svn: 329354
2018-04-05 21:56:19 +00:00
Craig Topper 4cc3827791 [X86] Add MOVZPQILo2PQIrr to the Sandy Bridge scheduler model
llvm-svn: 329351
2018-04-05 21:40:32 +00:00
Craig Topper 3b0b96c591 [X86] Add LEAVE instruction to the scheduler models using the same data as LEAVE64. Make LEAVE/LEAVE64 more correct on Sandy Bridge.
This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64.

The Sandy Bridge version was missing a load port use.

llvm-svn: 329347
2018-04-05 21:16:26 +00:00
Simon Pilgrim 9b41cac3e9 [X86][SSE] Add floating point add/mul fast-math vector.reduce tests
Strict versions aren't working at all (PR36732) and the accumulators aren't supported (PR36734)

llvm-svn: 329344
2018-04-05 21:01:21 +00:00
Simon Pilgrim 806252fab0 [X86][SSE] Add floating point min/max vector.reduce tests
llvm-svn: 329343
2018-04-05 20:54:55 +00:00
Craig Topper c6bb36a3d0 [X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.

llvm-svn: 329339
2018-04-05 20:04:06 +00:00
Craig Topper 9eec2025c5 [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
Mostly vector load, store, and move instructions.

llvm-svn: 329330
2018-04-05 18:38:45 +00:00
Simon Pilgrim 7f6f43fa3e [X86][SSE] Add integer add/mul vector.reduce tests
llvm-svn: 329321
2018-04-05 17:37:35 +00:00
Simon Pilgrim de5d0ffe47 [X86][SSE] Add integer and/or/xor vector.reduce tests
llvm-svn: 329320
2018-04-05 17:29:51 +00:00
Simon Pilgrim 57d324082c [X86][SSE] Add integer min/max vector.reduce tests
llvm-svn: 329319
2018-04-05 17:25:40 +00:00
Tim Northover b30388bf11 ARM: Do not spill CSR to stack on entry to noreturn functions
A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.

Should fix PR9970.

Patch by myeisha (pmb).

llvm-svn: 329287
2018-04-05 14:26:06 +00:00
Sam Parker 0e7deb8104 [DAGCombine] Revert r329160
Again, broke the big endian stage 2 builders.

llvm-svn: 329283
2018-04-05 13:46:17 +00:00
Craig Topper 15303dda0d [X86] Revert r329251-329254
It's failing on the bots and I'm not sure why.

This reverts:

[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
[X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
[X86] Auto-generate complete checks. NFC

llvm-svn: 329256
2018-04-05 05:19:36 +00:00
Craig Topper 25c7110a37 [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
Mostly vector load, store, and move instructions.

llvm-svn: 329254
2018-04-05 04:42:03 +00:00
Craig Topper 6c4e08c835 [X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.

llvm-svn: 329252
2018-04-05 04:42:01 +00:00
Craig Topper 5c36557426 [X86] Auto-generate complete checks. NFC
llvm-svn: 329251
2018-04-05 04:41:59 +00:00