These are used to help convert OR->LEA when needed to avoid avoid a copy. They
aren't need after register allocation.
Happens to remove an ugly goto from X86MCCodeEmitter.cpp
llvm-svn: 356356
This allows us to use an 8-bit sign extended immediate instead of a 16 or 32 bit immediate.
Also do similar for 0x80000000 with 64-bit adds to avoid having to use a movabsq.
llvm-svn: 355485
We already support 8-bits adds in convertToThreeAddress. But we can also support 8-bit OR if the bits are disjoint. We already do this for 16/32/64.
Differential Revision: https://reviews.llvm.org/D58863
llvm-svn: 355423
If the LHS has known zeros, the RHS immediate will have had bits removed. So call computeKnownBits to get the known zeroes so we can handle this case.
Differential Revision: https://reviews.llvm.org/D58475
llvm-svn: 354811
def32 here means the producing instruction zeroed bits 63:32. We already do this for zext, but it looks like we can get an and+anyext sometimes.
Spotted in the diffs from D33587.
llvm-svn: 352303
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag.
This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used.
I had to avoid selecting DEC is the result isn't used. This will become a SUB immediate which will turned into a CMP later by optimizeCompareInstr. This lead to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag.
This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes.
Differential Revision: https://reviews.llvm.org/D55975
llvm-svn: 350245
I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that.
I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful.
Differential Revision: https://reviews.llvm.org/D55414
llvm-svn: 348959
This addresses a FIXME and avoids depending on an isel pattern match I think. I've remove the isel patterns too since he have no lit tests left that cover them. Hopefully that really means they are unused.
I'm trying to decide if we need SETCC_CARRY. This removes one of its usages.
Differential Revision: https://reviews.llvm.org/D55355
llvm-svn: 348536
The machine scheduler currently biases register copies to/from
physical registers to be closer to their point of use / def to
minimize their live ranges. This change extends this to also physical
register assignments from immediate values.
This causes a reduction in reduction in overall register pressure and
minor reduction in spills and indirectly fixes an out-of-registers
assertion (PR39391).
Most test changes are from minor instruction reorderings and register
name selection changes and direct consequences of that.
Reviewers: MatzeB, qcolombet, myatsina, pcc
Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya,
javed.absar, arphaman, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54218
llvm-svn: 346894
The sdivrem will emit its own MOVSX to move %ah to the low byte of a register. By using a MOVSX for an any_extend this allows a post-isel peephole to merge them.
llvm-svn: 346581
Instead of using the MOVGOT64r pseudo, use the existing
MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to
create a "scratch register operand" for the pseudo to use, and the
register allocator can make better decisions.
Fixes some X86 verifier errors tracked in PR27481.
llvm-svn: 345219
This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register similar to what we do for the various XMM/YMM/ZMM zeroing pseudos.
My main motivation is to enable the spill optimization in foldMemoryOperandImpl. As we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can try to reduce one from the code were looking at.
There's definitely some other machine CSE(and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling.
Differential Revision: https://reviews.llvm.org/D52757
llvm-svn: 345165
This is an alternative to D53080 since I think using a BEXTR for a shifted mask is definitely an improvement when the shl can be absorbed into addressing mode. The other cases I'm less sure about.
We already have several tricks for handling an and of a shift in address matching. This adds a new case for BEXTR.
I've moved the BEXTR matching code back to X86ISelDAGToDAG to allow it to match. I suppose alternatively we could directly emit a X86ISD::BEXTR node that isel could pattern match. But I'm trying to view BEXTR matching as an isel concern so DAG combine can see 'and' and 'shift' operations that are well understood. We did lose a couple cases from tbm_patterns.ll, but I think there are ways to recover that.
I've also put back the manual load folding code in matchBEXTRFromAnd that I removed a few months ago in r324939. This gives us some more freedom to make decisions based on the ability to fold a load. I haven't done anything with that yet.
Differential Revision: https://reviews.llvm.org/D53126
llvm-svn: 344270
The additional patterns needed for this aren't overwhelming and introducing extra bitcasts during lowering limits our ability to do computeNumSignBits. Not that I have a good example of that for select. I'm just becoming increasingly grumpy about promotion of AND/OR/XOR. SELECT was just a lot easier to fix.
llvm-svn: 343723
Add the .cv_fpo_stackalign directive so that we can define $T0, or the
VFRAME virtual register, with it. This was overlooked in the initial
implementation because unlike MSVC, we push CSRs before allocating stack
space, so this value is only needed to describe local variable
locations. Variables that the compiler now addresses via ESP are instead
described as being stored at offsets from VFRAME, which for us is ESP
after alignment in the prologue.
This adds tests that show that we use the VFRAME register properly in
our S_DEFRANGE records, and that we emit the correct FPO data to define
it.
Fixes PR38857
llvm-svn: 343603
Summary: Similar to D51893 which was for memcpy
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52063
llvm-svn: 342796
Summary:
In GNUX23, is64BitMode returns true, but pointers are 32-bits. So we shouldn't copy pointer values into RSI/RDI since the widths don't match.
Fixes PR38865 despite what the title says. I think the llvm_unreachable in the copyPhysReg code tricked the optimizer and made the fatal error trigger.
Reviewers: rnk, efriedma, MatzeB, echristo
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51893
llvm-svn: 342015
subtarget features for indirect calls and indirect branches.
This is in preparation for enabling *only* the call retpolines when
using speculative load hardening.
I've continued to use subtarget features for now as they continue to
seem the best fit given the lack of other retpoline like constructs so
far.
The LLVM side is pretty simple. I'd like to eventually get rid of the
old feature, but not sure what backwards compatibility issues that will
cause.
This does remove the "implies" from requesting an external thunk. This
always seemed somewhat questionable and is now clearly not desirable --
you specify a thunk the same way no matter which set of things are
getting retpolines.
I really want to keep this nicely isolated from end users and just an
LLVM implementation detail, so I've moved the `-mretpoline` flag in
Clang to no longer rely on a specific subtarget feature by that name and
instead to be directly handled. In some ways this is simpler, but in
order to preserve existing behavior I've had to add some fallback code
so that users who relied on merely passing -mretpoline-external-thunk
continue to get the same behavior. We should eventually remove this
I suspect (we have never tested that it works!) but I've not done that
in this patch.
Differential Revision: https://reviews.llvm.org/D51150
llvm-svn: 340515
Summary:
So far, `isReturn` property is used to mean both a return instruction
from a functon and the end of an EH scope, a scope that starts with a EH
scope entry BB and ends with a catchret or a cleanupret instruction.
Because WinEH uses funclets, all EH-scope-ending instructions are also
real return instruction from a function. But for wasm, they only serve
as the end marker of an EH scope but not a return instruction that
exits a function. This mismatch caused incorrect prolog and epilog
generation in wasm EH scopes. This patch fixes this.
This patch is in the same vein with rL333045, which splits
`MachineBasicBlock::isEHFuncletEntry` into `isEHFuncletEntry` and
`isEHScopeEntry`.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D50653
llvm-svn: 340325
Now we switch to the subregister in expandPostRAPseudos where we already switched the opcode.
This simplifies a few isel patterns that used the pseudo directly. And magically seems to have improved our ability to CSE it in the undef-label.ll test.
llvm-svn: 339496
If the store is volatile this might be a memory mapped IO access. In that case we shouldn't generate a load that didn't exist in the source
Differential Revision: https://reviews.llvm.org/D50270
llvm-svn: 339041
At one point in time acquire implied mayLoad and mayStore as did release. Thus we needed separate pseudos that also carried that property. This appears to no longer be the case. I believe it was changed in 2012 with a comment saying that atomic memory accesses are marked volatile which preserves the ordering.
So from what I can tell we shouldn't need additional pseudos since they aren't carry any flags that are different from the normal instructions. The only thing I can think of is that we may consider them for load folding candidates in the peephole pass now where we didn't before. If that's important hopefully there's something in the memory operand we can check to prevent the folding without relying on pseudo instructions.
Differential Revision: https://reviews.llvm.org/D50212
llvm-svn: 338925
Don't try to generate large PIC code for non-ELF targets. Neither COFF
nor MachO have relocations for large position independent code, and
users have been using "large PIC" code models to JIT 64-bit code for a
while now. With this change, if they are generating ELF code, their
JITed code will truly be PIC, but if they target MachO or COFF, it will
contain 64-bit immediates that directly reference external symbols. For
a JIT, that's perfectly fine.
llvm-svn: 337740
This unfortunately requires a bunch of bitcasts to be added added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128 which isn't always valid.
The test changes are because we were previously mixing fr128 and vr128 due to contrainRegClass finding FR128 first and passes like live range shrinking weren't handling that well.
llvm-svn: 337147
Reverting because this is causing failures in the LLDB test suite on
GreenDragon.
LLVM ERROR: unsupported relocation with subtraction expression, symbol
'__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction
expression
llvm-svn: 335894
BMI2 added new shift by register instructions that have the ability to fold a load.
Normally without doing anything special isel would prefer folding a load over folding an immediate because the load folding pattern has higher "complexity". This would require an instruction to move the immediate into a register. We would rather fold the immediate instead and have a separate instruction for the load.
We used to enforce this priority by artificially lowering the complexity of the load pattern.
This patch changes this to instead reject the load fold in isProfitableToFoldLoad if there is an immediate. This is more consistent with other binops and feels less hacky.
llvm-svn: 335804
If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1(or rotating a -1) and doing a binop. These instruction also ignore the upper bits of their index input so we can also remove an and if one is present on the index.
Fixes PR37938.
llvm-svn: 335754
Summary:
Same idea as D48529, but restricted to X86 and done very late to avoid any surprises where subtract might be better for DAG combining.
This seems like the safest way to do this trick. And we consider doing it as a DAG combine later.
Reviewers: spatel, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48557
llvm-svn: 335575
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
.LtmpN:
leaq .LtmpN(%rip), %rbx
movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
addq %rax, %rbx # GOT base reg
From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
movq extern_gv@GOT(%rbx), %rax
movq local_gv@GOTOFF(%rbx), %rax
All calls end up being indirect:
movabsq $local_fn@GOTOFF, %rax
addq %rbx, %rax
callq *%rax
The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg
DSO local data accesses will use it:
movq local_gv@GOTOFF(%rbx), %rax
Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
movq extern_gv@GOTPCREL(%rip), %rax
Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.
This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.
I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC
code model is not implemented for MachO yet.
Differential Revision: https://reviews.llvm.org/D47211
llvm-svn: 335508
For some reason the 64-bit patterns were separated from their 8/16/32-bit friends, but only for add/sub/mul. For and/or/xor they were together.
llvm-svn: 335429
Summary:
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
.LtmpN:
leaq .LtmpN(%rip), %rbx
movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
addq %rax, %rbx # GOT base reg
From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
movq extern_gv@GOT(%rbx), %rax
movq local_gv@GOTOFF(%rbx), %rax
All calls end up being indirect:
movabsq $local_fn@GOTOFF, %rax
addq %rbx, %rax
callq *%rax
The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg
DSO local data accesses will use it:
movq local_gv@GOTOFF(%rbx), %rax
Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
movq extern_gv@GOTPCREL(%rip), %rax
Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.
This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.
Reviewers: chandlerc, echristo
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47211
llvm-svn: 335297
Only the bottom 16-bits of BEXTR's control op are required (0:8 INDEX, 15:8 LENGTH).
Differential Revision: https://reviews.llvm.org/D47690
llvm-svn: 334083