This cleans up a number of operations that only claimed to use EFLAGS
due to using DF. But no instructions which we think of as setting EFLAGS
actually modify DF (other than things like POPF), and so this needlessly
creates uses of EFLAGS that aren't really there.
In fact, DF is so restricted that it is pretty easy to model: only STD,
CLD, and the whole-flags writes (WRFLAGS and POPF) need to model it.
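For illustration, here is a behavioral sketch of what DF controls, as plain
C++ that just models the semantics (the function name is mine):

  #include <cstddef>

  // DF only steers the pointer direction of the string instructions
  // (MOVS, STOS, SCAS, CMPS, LODS). STD sets it, CLD clears it, and
  // nothing else writes it except the whole-flags writes.
  void repMovsb(unsigned char *dst, const unsigned char *src,
                std::size_t n, bool df) {
    for (std::size_t i = 0; i != n; ++i) {
      *dst = *src;
      if (df) { --dst; --src; } // DF = 1: walk backward
      else    { ++dst; ++src; } // DF = 0: walk forward
    }
  }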
I've also cleaned up some of the flag-management instruction definitions
so that they live in the correct .td file.
Adding this extra register also uncovered a failure to use the correct
datatype to hold X86 registers, and I've corrected that as necessary
here.
Differential Revision: https://reviews.llvm.org/D45154
llvm-svn: 329673
Prefer to use the 32-bit AND with immediate instead.
Primarily I'm doing this to ensure that immediates created by
shrinkAndImmediate will always get absorbed into the AND, but I also believe
this reduces the number of uops that need to execute. Ideally we should
shrink the 'and' and the 'load' during DAG combine to re-enable the fold.
Fixes PR37063.
llvm-svn: 329667
The key idea is to lower COPY nodes populating EFLAGS by scanning the
uses of EFLAGS and introducing dedicated code to preserve the necessary
state in a GPR. In the vast majority of cases, these uses are cmovCC and
jCC instructions. For such cases, we can very easily save and restore
the necessary information by simply inserting a setCC into a GPR where
the original flags are live, and then testing that GPR directly to feed
the cmov or conditional branch.
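Conceptually, the rewrite corresponds to the following plain C++ sketch (my
illustration of the idea, not the pass itself):

  // Rather than saving all of EFLAGS with pushf and restoring it with popf,
  // capture the one condition bit into a byte register while the flags are
  // live, then retest that byte where the condition is consumed.
  int selectMin(int a, int b, int x, int y) {
    bool below = a < b; // setcc: materialize the condition into a GPR
    // ... arbitrary flag-clobbering code can run here safely ...
    return below ? x : y; // test the byte; cmov or branch on the result
  }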
However, things are a bit more tricky if arithmetic is using the flags.
This patch handles the vast majority of cases that seem to come up in
practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of
partially preserved EFLAGS as LLVM doesn't currently model that at all.
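As a concrete illustration of an arithmetic user (my own example, not from
the patch), consider a two-limb add where the high limb consumes the carry
flag via adc:

  #include <cstdint>

  // The low-limb add defines CF and the high-limb add consumes it, so a
  // single setcc/test of one condition code is not enough here; the carry
  // itself has to be rematerialized where adc would read it.
  void add128(uint64_t a[2], const uint64_t b[2]) {
    uint64_t lo = a[0] + b[0];
    uint64_t carry = lo < a[0] ? 1 : 0; // CF from the low-limb add
    a[0] = lo;
    a[1] += b[1] + carry;               // adc consumes the carry
  }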
There are a large number of operations that technically observe EFLAGS
currently but shouldn't in this case -- they are typically using DF. These
will not be handled by this approach. However, I have never seen this issue
come up in practice: it is already pretty rare to have these patterns come
up in practical code with LLVM, and I had to resort to writing MIR tests
just to cover most of the logic in this pass.
I suspect that even with its current coverage of arithmetic users of
EFLAGS, this will be a significant improvement over the current use of
pushf/popf, and it will also produce substantially faster code in most of
the common patterns.
This patch also removes all of the old lowering for EFLAGS copies, and the
hack that forced us to use a frame pointer when EFLAGS copies were found
anywhere in a function so that the dynamic stack adjustment wasn't a
problem. None of this is needed, as we now lower all of these copies
directly in MI and without requiring stack adjustments.
Lots of thanks to Reid who came up with several aspects of this
approach, and Craig who helped me work out a couple of things tripping
me up while working on this.
Differential Revision: https://reviews.llvm.org/D45146
llvm-svn: 329657
LowerIntUnary, as its name says, has an assert for integer types. But for
the bitcast case, one side might be an FP type.
Rather than making sure the function really works for FP types and renaming
it, just do really basic splitting directly. LowerIntUnary has the advantage
that it can peek through BUILD_VECTOR because every other call to it happens
during lowering; but these calls happen during legalization and will be
followed by a DAG combine round.
Revert some changes to LowerVectorIntUnary that were originally made just to
make these two calls work even in pure integer cases.
This was found purely by compiling the avx512f-builtins.c test from clang,
so I've copied over the offending function from it.
llvm-svn: 329616
In some cases fast-isel fails to remove the and/shifts and uses blends or
conditional moves.
But once masking gets involved, fast-isel aborts on the mask portion and we
DAG combine more thoroughly.
llvm-svn: 329604
While it appears to be correct information based on Intel's optimization manual and Agner's data, it causes perf regressions on a couple of the benchmarks in our internal list.
llvm-svn: 329593
Recommitting r329283, third time lucky...
If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.
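As a sketch of the shape this targets (a hypothetical little-endian example
of mine, not from the patch):

  #include <cstdint>

  // (and (srl (load x), 16), 0xffff): the mask width matches a legal
  // narrower load type, so ExtVT can be set to i16 and the whole sequence
  // becomes a 16-bit load at byte offset 2, making the AND redundant.
  uint32_t middle16(const uint64_t *p) {
    return static_cast<uint32_t>((*p >> 16) & 0xffff);
  }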
Differential Revision: https://reviews.llvm.org/D41350
llvm-svn: 329551
Summary:
Currently MachineLoopInfo is used in only two places:
1) for computing the IsBasicBlockInsideInnermostLoop field of MCCodePaddingContext, which is never used.
2) in emitBasicBlockLoopComments, which is called only if `isVerbose()` is true.
Despite that, we currently have a dependency on MachineLoopInfo, which makes
the pass manager compute it and MachineDominatorTree. This patch removes
use (1) and makes use (2) lazy, thus avoiding some redundant
recomputations.
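Schematically, the lazy form of use (2) looks something like this simplified
C++ sketch (invented types, not the actual AsmPrinter code):

  #include <optional>

  struct LoopAnalysis { /* stand-in for MachineLoopInfo */ };

  class Emitter {
    std::optional<LoopAnalysis> LI; // empty until actually needed
    bool Verbose;

    const LoopAnalysis &getLoopInfo() {
      if (!LI)
        LI.emplace(); // computed at most once, on first use
      return *LI;
    }

  public:
    explicit Emitter(bool V) : Verbose(V) {}

    void emitBlockComments() {
      if (!Verbose)
        return;            // use (2) is guarded by isVerbose()
      (void)getLoopInfo(); // only verbose output pays for the analysis
    }
  };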
Reviewers: opaparo, gadi.haber, rafael, craig.topper, zvi
Subscribers: rengolin, javed.absar, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D44812
llvm-svn: 329542
Summary:
Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops.
This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs.
The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc.
Reviewers: RKSimon, andreadb, GGanesh
Reviewed By: RKSimon
Subscribers: GGanesh, llvm-commits
Differential Revision: https://reviews.llvm.org/D45380
llvm-svn: 329539
Summary:
This removes the InstRWs for BLENDVPS/PD in favor of WriteFVarBlend. The
latency listed was 3 cycles, but WriteFVarBlend is defined with a 1-cycle
latency, which matches Agner Fog's data.
The patterns were missing the VEX forms, which is why there are no test
changes; we don't test "-mcpu=znver1 -mattr=-avx".
Reviewers: RKSimon, GGanesh
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44841
llvm-svn: 329538
In our real-world application, we found the following optimization is
missed in DAGCombiner:
(zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst))
If the user of the original zext is an add, this may enable further LEA
optimization on x86.
This patch adds a new function CombineZExtLogicopShiftLoad to do this
optimization.
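For illustration, a source-level pattern of this shape (my own example, not
from the patch):

  #include <cstdint>

  // (zext (and (srl (load x), 2), 0x3f)) feeding an add: after the combine,
  // the narrow load becomes a zextload and the shift/mask happen at the
  // wider width, so the zext disappears and the add can fold into an LEA.
  uint64_t index(const uint8_t *p, uint64_t base) {
    uint32_t bits = (*p >> 2) & 0x3f;
    return base + bits; // zext of the masked value feeds the add
  }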
Differential Revision: https://reviews.llvm.org/D44402
llvm-svn: 329516
Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget.
llvm-svn: 329510
Should fix the UBSan bot by also checking that there's no "uwtable"
attribute before skipping. Otherwise the unwind table will be useless, since
its moves expect CSRs to actually be preserved.
A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.
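For example (standard attribute spelling; the IR-level attributes are what
the optimization actually keys on):

  #include <cstdlib>

  // 'noreturn' plus 'nounwind': control never comes back and never unwinds
  // through this frame, so any callee-saved registers clobbered here would
  // never be restored for the caller anyway. Spilling them in the prologue
  // is pure waste -- unless "uwtable" demands a usable unwind table.
  [[noreturn]] void fatalError() noexcept {
    std::abort();
  }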
Should fix PR9970.
Patch mostly by myeisha (pmb).
llvm-svn: 329494
Summary:
The 'strong' StackProtector heuristic takes into consideration call instructions.
Certain intrinsics, such as lifetime.start, can cause the
StackProtector to protect functions that do not need to be protected.
Specifically, a volatile variable that is not optimized away but belongs to
a stack allocation will encourage an llvm.lifetime.start to be inserted
during compilation. Because that intrinsic is a 'call', the strong
StackProtector will see that the alloca'd variable is being passed to a call
instruction and will insert a stack protector. But in this case the
intrinsic isn't really lowered to a call, so this causes unnecessary stack
checking at the cost of additional (wasted) CPU cycles.
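A minimal reproducer of the over-protection (my own example, assuming
optimizations are enabled so that lifetime markers get emitted):

  // The volatile local is kept on the stack, so llvm.lifetime.start/end get
  // emitted for its slot. Those intrinsics take the alloca's address as a
  // call argument, so -fstack-protector-strong protects a function that has
  // no buffers at all.
  void touchVolatile() {
    volatile int sentinel = 0;
    sentinel = 1; // not optimized away; never escapes to a real call
  }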
In the future we should rely on TargetTransformInfo::isLoweredToCall, but as of
now that routine considers all intrinsics as not being lowerable. That needs
to be corrected, and such a change is on my list of things to get moving on.
As a side note, the updated stack-protector-dbginfo.ll test always seems to
pass. I never see the dbg.declare/dbg.value reaching
StackProtector::HasAddressTaken, but I don't see any code excluding dbg
intrinsic calls either, so I think this is the safest thing to do.
Reviewers: void, timshen
Reviewed By: timshen
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45331
llvm-svn: 329450
Summary:
This patch removes InstRW overrides for basic arithmetic/logic
instructions. To do this, I've added the store address port to RMW and used
a WriteSequence to make the latency additive. It does not cover ADC/SBB
because they have different latencies.
Apparently we were inconsistent about whether the store has latency or not,
hence the test changes.
I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5.
Reviewers: RKSimon, andreadb
Reviewed By: andreadb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45351
llvm-svn: 329416
As mentioned on D44647, this patch increases the default memory latency to +5cy, which more closely matches what most custom cases are doing for reg-mem instructions.
I've bumped LoadLatency, ReadAfterLd and WriteLoad values to 5cy to be consistent.
As Sandy Bridge is currently our default generic model, this affects a lot of scheduling tests...
Differential Revision: https://reviews.llvm.org/D44654
llvm-svn: 329388
This is the 32-bit mode version of LEAVE64 and should be at least somewhat similar to it.
The Sandy Bridge version was missing a load port use.
llvm-svn: 329347
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.
llvm-svn: 329339
A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.
Should fix PR9970.
Patch by myeisha (pmb).
llvm-svn: 329287
It's failing on the bots and I'm not sure why.
This reverts:
[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
[X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
[X86] Auto-generate complete checks. NFC
llvm-svn: 329256
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.
llvm-svn: 329252