This cleans up a number of operations that only claimed te use EFLAGS
due to using DF. But no instructions which we think of us setting EFLAGS
actually modify DF (other than things like popf) and so this needlessly
creates uses of EFLAGS that aren't really there.
In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD,
and the whole-flags writes (WRFLAGS and POPF) need to model this.
I've also somewhat cleaned up some of the flag management instruction
definitions to be in the correct .td file.
Adding this extra register also uncovered a failure to use the correct
datatype to hold X86 registers, and I've corrected that as necessary
here.
Differential Revision: https://reviews.llvm.org/D45154
llvm-svn: 329673
The key idea is to lower COPY nodes populating EFLAGS by scanning the
uses of EFLAGS and introducing dedicated code to preserve the necessary
state in a GPR. In the vast majority of cases, these uses are cmovCC and
jCC instructions. For such cases, we can very easily save and restore
the necessary information by simply inserting a setCC into a GPR where
the original flags are live, and then testing that GPR directly to feed
the cmov or conditional branch.
However, things are a bit more tricky if arithmetic is using the flags.
This patch handles the vast majority of cases that seem to come up in
practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of
partially preserved EFLAGS as LLVM doesn't currently model that at all.
There are a large number of operations that techinaclly observe EFLAGS
currently but shouldn't in this case -- they typically are using DF.
Currently, they will not be handled by this approach. However, I have
never seen this issue come up in practice. It is already pretty rare to
have these patterns come up in practical code with LLVM. I had to resort
to writing MIR tests to cover most of the logic in this pass already.
I suspect even with its current amount of coverage of arithmetic users
of EFLAGS it will be a significant improvement over the current use of
pushf/popf. It will also produce substantially faster code in most of
the common patterns.
This patch also removes all of the old lowering for EFLAGS copies, and
the hack that forced us to use a frame pointer when EFLAGS copies were
found anywhere in a function so that the dynamic stack adjustment wasn't
a problem. None of this is needed as we now lower all of these copies
directly in MI and without require stack adjustments.
Lots of thanks to Reid who came up with several aspects of this
approach, and Craig who helped me work out a couple of things tripping
me up while working on this.
Differential Revision: https://reviews.llvm.org/D45146
llvm-svn: 329657
This commit is similar to r329120, but uses the existing getUsesRedZone() function
in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a
function *truly* uses a redzone instead of just the noredzone attribute on a
function.
Thus, after this commit, it's possible to outline from x86 without using
-mno-red-zone and still get outlining results.
This also adds a new test for the new redzone behaviour.
llvm-svn: 329134
for X86's instruction information. I've now got a second patch under
review that needs these same APIs. This bit is nicely orthogonal and
obvious, so landing it. NFC.
llvm-svn: 328944
This makes the Y position consistent with other instructions.
This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary.
llvm-svn: 328254
At the point the outliner runs, KILLs don't impact anything, but they're still
considered unique instructions. This commit makes them invisible like
DebugValues so that they can still be outlined without impacting outlining
decisions.
llvm-svn: 327760
These tables add 3000 lines to X86InstrInfo.cpp. And if we ever manage to auto generate them they'll be a separate file anyway.
Differential Revision: https://reviews.llvm.org/D43806
llvm-svn: 326225
hit from IR but creates a minefield for MI passes.
The x86 backend has fairly powerful logic to try and fold loads that
feed register operands to instructions into a memory operand on the
instruction. This is almost always a good thing, but there are specific
relocated loads that are only allowed to appear in specific
instructions. Notably, R_X86_64_GOTTPOFF is only allowed in `movq` and
`addq`. This patch blocks folding of memory operands using this
relocation unless the target is in fact `addq`.
The particular relocation indicates why we simply don't hit this under
normal circumstances. This relocation is only used for TLS, and it gets
used in very specific ways in conjunction with %fs-relative addressing.
The result is that loads using this relocation are essentially never
eligible for folding into an instruction's memory operands. Unless, of
course, you have an MI pass that inserts usage of such a load. I have
exactly such an MI pass and was greeted by truly mysterious miscompiles
where the linker replaced my instruction with a completely garbage byte
sequence. Go team.
This is the only such relocation I'm aware of in x86, but there may be
others that need to be similarly restricted.
Fixes PR36165.
Differential Revision: https://reviews.llvm.org/D42732
llvm-svn: 324546
If we are saving/restoring k-registers, the default behavior of getMinimalRegisterClass will find the VK64 class with a spill size of 64 bits. This will cause the KMOVQ opcode to be used for save/restore. If we don't have have BWI instructions we need to constrain the class returned to give us VK16 with a 16-bit spill size. We can do this by passing the either v16i1 or v64i1 into getMinimalRegisterClass.
Also add asserts to make sure BWI is enabled anytime we use KMOVD/KMOVQ. These are what caught this bug.
Fixes PR36256
Differential Revision: https://reviews.llvm.org/D42989
llvm-svn: 324533
Summary:
It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger.
Reviewers: craig.topper, niravd, spatel, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42646
llvm-svn: 323888
Summary:
It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger.
The main noteworthy regression I was able to observe was pattern of the type (setcc (trunc (and X, C)), 0) where C is such as it would benefit from the hi register trick. To prevent this, a new pattern is added to materialize such pattern using a 32 bits test. This has the added benefit of working with any constant that is materializable as a 32bits immediate, not just the ones that can leverage the high register trick, as demonstrated by the test case in test-shrink.ll using the constant 2049 .
Reviewers: craig.topper, niravd, spatel, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42646
llvm-svn: 323690
MMX instrutions all start with MMX_ so the 64 isn't needed for disambigutation.
SSE/AVX1 instructions are assumed 128-bit so we don't need to say 128.
AVX2 instructions should use a Y to indicate 256-bits.
llvm-svn: 323402
All other intrinsic instructions put the _Int on the end. This make these instructions consistent and gets the prefix instregexs in the scheduler models to pick them up.
llvm-svn: 323261
I'm not sure there's any way to generate these folding cases especially the movzx ones since even the register form is never emitted by codegen.
I'm just adding them to remove the difference with the autogenerated version of the folding table.
llvm-svn: 323200
1. ReachingDefsAnalysis - Allows to identify for each instruction what is the “closest” reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings.
3. BreakFalseDeps - Breaks false dependencies.
4. LoopTraversal - Creatws a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order they will traverse the basic blocks.
This also included the following changes to ExcecutionDepsFix original logic:
1. BreakFalseDeps and ReachingDefsAnalysis logic no longer restricted by a register class.
2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.
Additional changes in affected files:
1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps also was added to the passes they activate.
2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.
Additional refactoring changes will follow.
This commit is (almost) NFC.
The only functional change is that now BreakFalseDeps will break dependency for all register classes.
Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet.
In a future commit several instructions (and tests) will be added.
This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.
Additional relevant reviews:
https://reviews.llvm.org/D40331https://reviews.llvm.org/D40332https://reviews.llvm.org/D40333https://reviews.llvm.org/D40334
Differential Revision: https://reviews.llvm.org/D40330
Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
llvm-svn: 323087
As mentioned on PR35869, (and came up recently on D41517) we don't create a MMX zero register via the PXOR but instead perform a spill to stack from a XMM zero register.
This patch adds support for direct MMX zero vector creation and should make it easier to add better constant vector creation in the future as well.
Differential Revision: https://reviews.llvm.org/D41908
llvm-svn: 322525
Add support for custom execution domain fixing and implement support for BLENDPD/BLENDPS/PBLENDD/PBLENDW.
Differential Revision: https://reviews.llvm.org/D42042
llvm-svn: 322524
This commit does two things. Firstly, it adds a collection of flags which can
be passed along to the target to encode information about the MBB that an
instruction lives in to the outliner.
Second, it adds some of those flags to the AArch64 outliner in order to add
more stack instructions to the list of legal instructions that are handled
by the outliner. The two flags added check if
- There are calls in the MachineBasicBlock containing the instruction
- The link register is available in the entire block
If the link register is available and there are no calls, then a stack
instruction can always be outlined without fixups, regardless of what it is,
since in this case, the outliner will never modify the stack to create a
call or outlined frame.
The motivation for doing this was checking which instructions are most often
missed by the outliner. Instructions like, say
%sp<def> = ADDXri %sp, 32, 0; flags: FrameDestroy
are very common, but cannot be outlined in the case that the outliner might
modify the stack. This commit allows us to outline instructions like this.
llvm-svn: 322048
The instructions that load 64-bits or an xmm register should be TB_NO_REVERSE to avoid the load being widened during unfold. The instructions that load 128-bits need to ensure 128-bit alignment.
llvm-svn: 321956
The memory form of the xmm->xmm version only writes 64-bits. If we use it in the folding tables and its get used for a stack spill, only half the slot will be written. Then a reload may read all 128-bits which will pull in garbage. But without the spill the upper bits of the register would have been zero. By not folding we would preserve the zeros.
llvm-svn: 321950
This matches AVX512 version and is more consistent overall. And improves our scheduler models.
In some cases this adds _Int to instructions that didn't have any Int_ before. It's a side effect of the adjustments made to some of the multiclasses.
llvm-svn: 320325
The patch originally broke Chromium (crbug.com/791714) due to its failing to
specify that the new pseudo instructions clobber EFLAGS. This commit fixes
that.
> Summary: This strengthens the guard and matches MSVC.
>
> Reviewers: hans, etienneb
>
> Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D40622
llvm-svn: 319824
These command line options are not intended for public use, and often
don't even make sense in the context of a particular tool anyway. About
90% of them are already hidden, but when people add new options they
forget to hide them, so if you were to make a brand new tool today, link
against one of LLVM's libraries, and run tool -help you would get a
bunch of junk that doesn't make sense for the tool you're writing.
This patch hides these options. The real solution is to not have
libraries defining command line options, but that's a much larger effort
and not something I'm prepared to take on.
Differential Revision: https://reviews.llvm.org/D40674
llvm-svn: 319505
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.
* Only debug printing is affected. It now follows MIR.
Differential Revision: https://reviews.llvm.org/D40417
llvm-svn: 319187
The NewCC variable is calculated outside of the loop that processes jcc/setcc/cmovcc instructions. If we invert it during the loop it can cause an incorrect value to be used by a later iteration. Instead only read it during the loop and use a new variable to store the possibly inverted value.
Fixes PR35399.
llvm-svn: 318934
This is consistent with out normal implementation of scalar instructions.
While there disable load folding for the patterns with IMPLICIT_DEF unless optimizing for size which is also our standard practice.
llvm-svn: 317977
This patch moves the check for opt size and hasPartialRegUpdate into the lower level implementation of foldMemoryOperandImpl to catch the entry point that fast isel uses.
We're still folding undef register instructions in AVX that we should also probably disable, but that's a problem for another patch.
Unfortunately, this requires reordering a bunch of functions which is why the diff is so large. I can do the function reordering separately if we want.
Differential Revision: https://reviews.llvm.org/D39402
llvm-svn: 317112
Unfortunately, this weakens our ability to do domain fixing when AVX512DQ is not enabled, but it is consistent with our 256-bit behavior.
Maybe we should add custom handling to domain fixing to allow EVEX integer XOR/AND/OR/ANDN to switch to VEX encoded fp instructions if the high registers aren't being used?
llvm-svn: 316978
I have a future patch that wants to make use of the one of the partial functions in one of the earlier memory folding methods and the current ordering prevents that.
llvm-svn: 316883
I believe these were added incorrectly under the belief that the load size was smaller than the input register size, but that's not true.
llvm-svn: 315795
Say you have two identical linkonceodr functions, one in M1 and one in M2.
Say that the outliner outlines A,B,C from one function, and D,E,F from another
function (where letters are instructions). Now those functions are not
identical, and cannot be deduped. Locally to M1 and M2, these outlining
choices would be good-- to the whole program, however, this might not be true!
To mitigate this, this commit makes it so that the outliner sees linkonceodr
functions as unsafe to outline from. It also adds a flag,
-enable-linkonceodr-outlining, which allows the user to specify that they
want to outline from such functions when they know what they're doing.
Changing this handles most code size regressions in the test suite caused by
competing with linker dedupe. It also doesn't have a huge impact on the code
size improvements from the outliner. There are 6 tests that regress > 5% from
outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most
tests either improve or are not impacted.
Not outlined vs outlined without linkonceodrs:
https://hastebin.com/raw/qeguxavuda
Not outlined vs outlined with linkonceodrs:
https://hastebin.com/raw/edepoqoqic
Outlined with linkonceodrs vs outlined without linkonceodrs:
https://hastebin.com/raw/awiqifiheb
Numbers generated using compare.py with -m size.__text. Tests run for AArch64
with -Oz -mllvm -enable-machine-outliner -mno-red-zone.
llvm-svn: 315136
This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted.
This should fix PR33079
Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results.
Differential Revision: https://reviews.llvm.org/D38449
llvm-svn: 314914
Summary:
Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable.
For the register®ister form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register®ister form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem.
I believe this supercedes D38025 which was trying to switch the register®ister form back to pre-PR22995.
Reviewers: aymanmus, RKSimon, zvi
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38120
llvm-svn: 314639
This commit allows the outliner to avoid saving and restoring the link register
on AArch64 when it is dead within an entire class of candidates.
This introduces changes to the way the outliner interfaces with the target.
For example, the target now interfaces with the outliner using a
MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and
getOutliningFrameOverhead.
This also improves several comments on the outliner's cost model.
https://reviews.llvm.org/D36721
llvm-svn: 314341
This hook is called after register allocation with two physical registers. We don't need a separate instruction at that time to force register class constraints. I left in the assert though. We also have a fatal error in X86MCCodeEmitter if we ever encode an H-reg and a REX prefix.
llvm-svn: 314248
Summary: Knights Landing, because it is Atom derived, has slow two memory operand instructions. Mark the Knights Landing CPU model accordingly.
Patch by David Zarzycki.
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37224
llvm-svn: 311979
This commit
- Removes IsTailCall and replaces it with a target-defined unsigned
- Refactors getOutliningCallOverhead and getOutliningFrameOverhead so that they don't use IsTailCall
- Adds a call class + frame class classification to OutlinedFunction and Candidate respectively
This accomplishes a couple things.
Firstly, we don't need the notion of *tail call* in the general outlining algorithm.
Secondly, we now can have different "outlining classes" for each candidate within a set of candidates.
This will make it easy to add new ways to outline sequences for certain targets and dynamically choose
an appropriate cost model for a sequence depending on the context that that sequence lives in.
Ultimately, this should get us closer to being able to do something like, say avoid saving the link
register when outlining AArch64 instructions.
llvm-svn: 309475
This is some more cleanup in preparation for some actual
functional changes. This splits getOutliningBenefit into
two cost functions: getOutliningCallOverhead and
getOutliningFrameOverhead. These functions return the
number of instructions that would be required to call
a specific function and the number of instructions
that would be required to construct a frame for a
specific funtion. The actual outlining benefit logic
is moved into the outliner, which calls these functions.
The goal of refactoring getOutliningBenefit is to:
- Get us closer to getting rid of the IsTailCall flag
- Further split up "target-specific" things and
"general algorithm" things
llvm-svn: 309356
In testing, we've found yet another miscompile caused by the new tables.
And this one is even less clear how to fix (we could teach it to fold
a 16-bit load instead of the 32-bit load it wants, or block folding
entirely).
Also, the approach to excluding instructions seems increasingly to not
scale well.
I have left a more detailed analysis on the review log for the original
patch (https://reviews.llvm.org/D32684) along with suggested path
forward. I will land an additional test case that I wrote which covers
the code that was miscompiling (folding into the output of `pextrw`) in
a subsequent commit to keep this a pure revert.
For each commit reverted here, I've restricted the revert to the
non-test code touching the x86 fold table emission until the last commit
where I did revert the test updates. This means the *new* test cases
added for `insertps` and `xchg` remain untouched (and continue to pass).
Reverted commits:
r304540: [X86] Don't fold into memory operands into insertps in the ...
r304347: [TableGen] Adapt more places to getValueAsString now ...
r304163: [X86] Don't fold away the memory operand of an xchg.
r304123: Don't capture a temporary std::string in a StringRef.
r304122: Resubmit "[X86] Adding new LLVM TableGen backend that ..."
Original commit was in r304088, and after a string of fixes was reverted
previously in r304121 to fix build bots, and then re-landed in r304122.
llvm-svn: 304762
This was reverted due to buildbot breakages and I was not familiar
with this code to investigate it. But while trying to get a
useful backtrace for the author, it turns out the fix was very
obvious. Resubmitting this patch as is, and will submit the
fix in a followup so that the fix is not hidden in the larger
CL.
llvm-svn: 304122
This reverts commit 28cb1003507f287726f43c771024a1dc102c45fe as well
as all subsequent followups. llvm-tblgen currently segfaults with
this change, and it seems it has been broken on the bots all
day with no fixes in preparation. See, for example:
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/
llvm-svn: 304121
X86 backend holds huge tables in order to map between the register and memory forms of each instruction.
This TableGen Backend automatically generated all these tables with the appropriate flags for each entry.
Differential Revision: https://reviews.llvm.org/D32684
llvm-svn: 304088
AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq).
Differential Revision: https://reviews.llvm.org/D33169
llvm-svn: 303858
Summary:
Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp so it can be used by GloabalIsel instruction selector.
This is a pre-commit for a patch I'm working on to support G_ICMP. NFC.
Reviewers: zvi, guyblank, delena
Reviewed By: guyblank, delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33038
llvm-svn: 302767
1. RegisterClass::getSize() is split into two functions:
- TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const;
- TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const;
2. RegisterClass::getAlignment() is replaced by:
- TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const;
This will allow making those values depend on subtarget features in the
future.
Differential Revision: https://reviews.llvm.org/D31783
llvm-svn: 301221
In addition to the original commit, tighten the condition for when to
pad empty functions to COFF Windows. This avoids running into problems
when targeting e.g. Win32 AMDGPU, which caused test failures when this
was committed initially.
llvm-svn: 301047
Empty functions can lead to duplicate entries in the Guard CF Function
Table of a binary due to multiple functions sharing the same RVA,
causing the kernel to refuse to load that binary.
We had a terrific bug due to this in Chromium.
It turns out we were already doing this for Mach-O in certain
situations. This patch expands the code for that in
AsmPrinter::EmitFunctionBody() and renames
TargetInstrInfo::getNoopForMachoTarget() to simply getNoop() since it
seems it was used for not just Mach-O anyway.
Differential Revision: https://reviews.llvm.org/D32330
llvm-svn: 301040
Instructions CALLSEQ_START..CALLSEQ_END and their target dependent
counterparts keep data like frame size, stack adjustment etc. These
data are accessed by getOperand using hard coded indices. It is
error prone way. This change implements the access by special methods,
which improve readability and allow changing data representation without
massive changes of index values.
Differential Revision: https://reviews.llvm.org/D31953
llvm-svn: 300196
Check if the scale operand is identical (doesn't have to be 1) and
do not check the chaain operand.
Differential revision: https://reviews.llvm.org/D31833
llvm-svn: 299986
We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.
This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8.
I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa.
Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition.
This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.
Differential Revision: https://reviews.llvm.org/D30968
llvm-svn: 298928
In fact this default implementation should be the only implementation,
keep it virtual for now to accomodate targets that don't model flags
correctly.
Differential Revision: https://reviews.llvm.org/D30747
llvm-svn: 297980
This commit adds tail call support to the MachineOutliner pass. This allows
the outliner to insert jumps rather than calls in areas where tail calling is
possible. Outlined tail calls include the return or terminator of the basic
block being outlined from.
Tail call support allows the outliner to take returns and terminators into
consideration while finding candidates to outline. It also allows the outliner
to save more instructions. For example, in the X86-64 outliner, a tail called
outlined function saves one instruction since no return has to be inserted.
llvm-svn: 297653
I'm pretty sure there are more problems lurking here. But I think this fixes PR32241.
I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again.
llvm-svn: 297574
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.
llvm-svn: 297081
This is a patch for the outliner described in the RFC at:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html
The outliner is a code-size reduction pass which works by finding
repeated sequences of instructions in a program, and replacing them with
calls to functions. This is useful to people working in low-memory
environments, where sacrificing performance for space is acceptable.
This adds an interprocedural outliner directly before printing assembly.
For reference on how this would work, this patch also includes X86
target hooks and an X86 test.
The outliner is run like so:
clang -mno-red-zone -mllvm -enable-machine-outliner file.c
Patch by Jessica Paquette<jpaquette@apple.com>!
rdar://29166825
Differential Revision: https://reviews.llvm.org/D26872
llvm-svn: 296418
AVX versions of the converts work on f32/f64 types, while AVX512 version work on vectors.
Differential Revision: https://reviews.llvm.org/D29988
llvm-svn: 295940
The instructions are marked commutable, but without special handling we don't get the immediate correct.
While here also remove the masked memory forms that aren't commutable.
llvm-svn: 295602
This requires some instructions to be renamed to move the Y earlier in the instruction name. The new names are more consistent with other instructions.
llvm-svn: 295579
It seems we were already upgrading 128-bit VPCMOV, but the intrinsic was still defined and being used in isel patterns. While I was here I also simplified the tablegen multiclasses.
llvm-svn: 295564
The original commit was reverted in r283329 due to a miscompile in
Chromium. That turned out to be the same issue as PR31257, which was
fixed in r295262.
llvm-svn: 295357
This reverts r294348, which removed support for conditional tail calls
due to the PR above. It fixes the PR by marking live registers as
implicitly used and defined by the now predicated tailcall. This is
similar to how IfConversion predicates instructions.
Differential Revision: https://reviews.llvm.org/D29856
llvm-svn: 295262
Seems the execution dependency pass likes to use FP instructions when most of the consuming code is integer if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available.
This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported so that they are ignored by domain fixing. If AVX2 is supported we'll report the correct domain and allow them to switch between integer and fp.
Overall I think this produces better results in the modified test cases.
llvm-svn: 294824
This requires that we communicate to X86InstrInfo::optimizeCompareInstr
that the second operand is neither a register nor an immediate. The way we
do that is by setting CmpMask to zero.
Note that there were already instructions where the second operand was not a
register nor an immediate, namely X86::SUB*rm, so also set CmpMask to zero
for those instructions. This seems like a latent bug, but I was unable to
trigger it.
Differential Revision: https://reviews.llvm.org/D28621
llvm-svn: 294634
They are currently modelled incorrectly (as calls, which clobber
registers, confusing e.g. Machine Copy Propagation).
Reverting until we figure out the proper solution.
llvm-svn: 294348
This includes unmasked forms of variable shift and shifting by the lower element of a register.
Still need to do shift by immediate which was not foldable prior to avx512 and all the masked forms.
llvm-svn: 294285
This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.
In AArch64, it also expands the fusion to all instructions pairs in a
scheduling block, beyond just among the predecessors of the branch at the
end.
Differential revision: https://reviews.llvm.org/D28489
llvm-svn: 293737
Isel now selects masked move instructions for vselect instead of blendm. But sometimes it beneficial to register allocation to remove the tied register constraint by using blendm instructions.
This also picks up cases where the masked move was created due to a masked load intrinsic.
Differential Revision: https://reviews.llvm.org/D28454
llvm-svn: 292005
We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX available. Or if its not, but register allocation has selected a non-extended register we will use VEX VXORPS. And if its an extended register without VLX we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD.
This makes it possible for the register allocator to have all 32 registers available to work with.
llvm-svn: 292004
Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.
See https://reviews.llvm.org/D28057 for the whole discussion.
Differential Revision: https://reviews.llvm.org/D28556
llvm-svn: 291891
These aren't the most interesting set of blendm instructions as the unmasked version isn't useful. We were also missing the B and W forms. I'll add the masked versions of all sizes in a future patch.
llvm-svn: 291885
Summary:
The expression for computing the return value of getMemOpBaseRegImmOfs has only
one possible value. The other value would result in a return earlier in the
function. This patch replaces the expression with its only possible value.
Reviewers: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27437
llvm-svn: 290133
Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions.
Differential Revision: https://reviews.llvm.org/D27684
llvm-svn: 289825
The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are:
Support for folding multiple operands at once which reference the same load
Support for folding multiple loads into a single instruction
Walk all the operands of the instruction for varidic instructions (this is a bug fix!)
Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here.
Differential Revision: https://reviews.llvm.org/D24103
llvm-svn: 289510
Summary:
These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits.
As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded.
Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too.
Reviewers: spatel, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27611
llvm-svn: 289419
The second operand of an "ri" instruction may be an immediate, but it may
also be a globalvariable, so we should make any assumptions.
This fixes PR31271.
Differential Revision: https://reviews.llvm.org/D27481
llvm-svn: 288964
Summary:
This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently.
I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel.
I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available.
Reviewers: spatel, delena, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27401
llvm-svn: 288771
This changes the scalar non-intrinsic non-avx roundss/sd instruction
definitions not to read their destination register - allowing partial dependency
breaking.
This fixes PR31143.
Differential Revision: https://reviews.llvm.org/D27323
llvm-svn: 288703
This makes the createGenericSchedLive() function that constructs the
default scheduler available for the public API. This should help when
you want to get a scheduler and the default list of DAG mutations.
This also shrinks the list of default DAG mutations:
{Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer
added by default. Targets can easily add them if they need them. It also
makes it easier for targets to add alternative/custom macrofusion or
clustering mutations while staying with the default
createGenericSchedLive(). It also saves the callback back and forth in
TargetInstrInfo::enableClusterLoads()/enableClusterStores().
Differential Revision: https://reviews.llvm.org/D26986
llvm-svn: 288057
I don't think isel selects these today, favoring adding the register to itself instead. But the load folding tables shouldn't be so concerned with what isel will use and just represent the relationships.
llvm-svn: 288007
If we were to unfold these, the load size would be increased to the register size. This is not safe to do since the enlarged load can do things like cross a page boundary into a page that doesn't exist.
I probably missed some instructions, but this should be a large portion of them.
llvm-svn: 288001
Most of these are the SSE4.1 PMOVZX/PMOVSX instructions which all read less than 128-bits. The only other was PMOVUPD which by definition is an unaligned load.
llvm-svn: 287991