We were calling tryFoldLoad with the 'and' node was the root and parent node of the load. But the parent of the load should be the shift that proceeds the and. While the and node is correctly the root node.
To fix this I had to make tryFoldLoad take a separate use and root input. I've added a convenience version with the old signature to avoid updating the other call sites.
llvm-svn: 317720
Added TESTM and TESTNM to the list of instructions that already zeroing unused upper bits
and does not need the redundant shift left and shift right instructions afterwards.
Added a pattern for TESTM and TESTNM in iselLowering, so now icmp(neq,and(X,Y), 0) goes folds into TESTM
and icmp(eq,and(X,Y), 0) goes folds into TESTNM
This commit is a preparation for lowering the test and testn X86 intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38732
llvm-svn: 317465
I think this code is unreachable due to some promotions that occur elsewhere. I'll look into that to be sure, but for now I thought I should at least fix the obvious typo.
llvm-svn: 316840
If the extend type is 64-bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/UDIVREM8_SEXT_HREG operation.
This gives a shorter encoding for the second extend in the sext case, and allows us to completely remove the second extend in the zext case.
This also adds known bit and num sign bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG.
Differential Revision: https://reviews.llvm.org/D38275
llvm-svn: 316702
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.
Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.
Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.
Differential Revision: https://reviews.llvm.org/D38406
llvm-svn: 315590
It broke the Chromium / SQLite build; see PR34830.
> Summary:
> 1/ Operand folding during complex pattern matching for LEAs has been
> extended, such that it promotes Scale to accommodate similar operand
> appearing in the DAG.
> e.g.
> T1 = A + B
> T2 = T1 + 10
> T3 = T2 + A
> For above DAG rooted at T3, X86AddressMode will no look like
> Base = B , Index = A , Scale = 2 , Disp = 10
>
> 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
> so that if there is an opportunity then complex LEAs (having 3 operands)
> could be factored out.
> e.g.
> leal 1(%rax,%rcx,1), %rdx
> leal 1(%rax,%rcx,2), %rcx
> will be factored as following
> leal 1(%rax,%rcx,1), %rdx
> leal (%rdx,%rcx) , %edx
>
> 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
> thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
>
> Reviewed By: lsaba
>
> Subscribers: jmolloy, spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 314919
Summary:
1/ Operand folding during complex pattern matching for LEAs has been
extended, such that it promotes Scale to accommodate similar operand
appearing in the DAG.
e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will no look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
so that if there is an opportunity then complex LEAs (having 3 operands)
could be factored out.
e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
thus avoiding creation of any complex LEAs within a loop.
Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
Reviewed By: lsaba
Subscribers: jmolloy, spatel, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 314886
Summary:
X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters.
I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs.
Reviewers: RKSimon, zvi, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38273
llvm-svn: 314474
Summary:
Lowering never creates X86ISD::UMUL for 8-bit types. X86ISD::UMUL8 is used instead. If X86ISD::UMUL 8-bit were ever used it would crash.
DAGCombiner replaces UMUL_LOHI/SMUL_LOHI with a wider MUL and a shift if the type twice as wide is legal. So we should never see i8 UMUL_LOHI/SMUL_LOHI. In fact I think there was a bug in part of the i8 code. Similar is true for i16 though without the bug.
Reviewers: RKSimon, spatel, zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38276
llvm-svn: 314430
This replaces the large number of patterns that handle every possible case of zeroing after a masked compare with a few simpler patterns that use a predicate to check for a masked compare producer.
This is similar to what we do for detecting free GR32->GR64 zero extends and free xmm->ymm/zmm zero extends.
This shrinks the isel table from ~590k to ~531k. This is a roughly 10% reduction in size.
Differential Revision: https://reviews.llvm.org/D38217
llvm-svn: 314133
Similar to what we do for X86ISD::SHRUNKBLEND just turn X86ISD::SELECT into ISD::VSELECT. This allows us to remove the duplicated TRUNC patterns.
Differential Revision: https://reviews.llvm.org/D38022
llvm-svn: 313644
This is similar to D37843, but for sub_8bit. This fixes all of the patterns except for the 2 that emit only an EXTRACT_SUBREG. That causes a verifier error with global isel because global isel doesn't know to issue the ABCD when doing this extract on 32-bits targets.
Differential Revision: https://reviews.llvm.org/D37890
llvm-svn: 313558
I'm pretty sure that InstrEmitter::EmitSubregNode will take care of this itself by calling ConstrainForSubReg which in turn calls TRI->getSubClassWithSubReg.
I think Jakob Stoklund Olesen alluded to this in his commit message for r141207 which added the code to EmitSubregNode.
Differential Revision: https://reviews.llvm.org/D37843
llvm-svn: 313557
This caused PR34629: asserts firing when building Chromium. It also broke some
buildbots building test-suite as reported on the commit thread.
> Summary:
> 1/ Operand folding during complex pattern matching for LEAs has been
> extended, such that it promotes Scale to accommodate similar operand
> appearing in the DAG.
> e.g.
> T1 = A + B
> T2 = T1 + 10
> T3 = T2 + A
> For above DAG rooted at T3, X86AddressMode will no look like
> Base = B , Index = A , Scale = 2 , Disp = 10
>
> 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
> so that if there is an opportunity then complex LEAs (having 3 operands)
> could be factored out.
> e.g.
> leal 1(%rax,%rcx,1), %rdx
> leal 1(%rax,%rcx,2), %rcx
> will be factored as following
> leal 1(%rax,%rcx,1), %rdx
> leal (%rdx,%rcx) , %edx
>
> 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
> thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet
>
> Reviewed By: lsaba
>
> Subscribers: spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 313376
Summary:
1/ Operand folding during complex pattern matching for LEAs has been
extended, such that it promotes Scale to accommodate similar operand
appearing in the DAG.
e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will no look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
so that if there is an opportunity then complex LEAs (having 3 operands)
could be factored out.
e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
thus avoiding creation of any complex LEAs within a loop.
Reviewers: lsaba, RKSimon, craig.topper, qcolombet
Reviewed By: lsaba
Subscribers: spatel, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 313343
Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so its as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp.
This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we dont' recognize yet. We already had this problem for several other TBM patterns so I think this fine and we can address of them together.
I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before got to isel.
I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch.
Differential Revision: https://reviews.llvm.org/D37592
llvm-svn: 313054
Summary:
Once we've done our custom isel for these nodes, I think we should be calling removeDeadNode to prune them out of the DAG. Table driven isel ultimately either calls morphNodeTo which modifies a node and doesn't leave dead nodes. Or it emits new nodes and then calls removeDeadNode as part of Opc_CompleteMatch.
If you run a simple multiply test case like this through llc with -debug you'll see a umul_lohi node get printed as part of the dump for Instruction Selection ends.
```
define i64 @foo(i64 %a, i64 %b) local_unnamed_addr #0 {
entry:
%conv = zext i64 %a to i128
%conv1 = zext i64 %b to i128
%mul = mul nuw nsw i128 %conv1, %conv
%shr = lshr i128 %mul, 64
%conv2 = trunc i128 %shr to i64
ret i64 %conv2
}
```
Reviewers: RKSimon, spatel, zvi, guyblank, niravd
Reviewed By: niravd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37547
llvm-svn: 312857
cover the bitwise operators.
Nothing really exciting here, this just stamps out the rest of the core
operations that can RMW memory and set flags.
Still not implemented here: ADC, SBB. Those will require more
interesting logic to channel the flags *in*, and I'm not currently
planning to try to tackle that. It might be interesting for someone who
wants to improve our code generation for bignum implementations.
Differential Revision: https://reviews.llvm.org/D37141
llvm-svn: 312768
operands and used flags to support matching immediate operands.
This is a bit trickier than register operands, and we still want to fall
back on a register operands even for things that appear to be
"immediates" when they won't actually select into the operation's
immediate operand. This also requires us to handle things like selecting
`sub` vs. `add` to minimize the number of bits needed to represent the
immediate, and picking the shortest immediate encoding. In order to
that, we in turn need to scan to make sure that CF isn't used as it will
get inverted.
The end result seems very nice though, and we're now generating
optimal instruction sequences for these patterns IMO.
A follow-up patch will further expand this to other operations with RMW
memory operands. But handing `add` and `sub` are useful starting points
to flesh out the machinery and make sure interesting and complex cases
can be handled.
Thanks to Craig Topper who provided a few fixes and improvements to this
patch in addition to the review!
Differential Revision: https://reviews.llvm.org/D37139
llvm-svn: 312764
Summary: Knights Landing, because it is Atom derived, has slow two memory operand instructions. Mark the Knights Landing CPU model accordingly.
Patch by David Zarzycki.
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37224
llvm-svn: 311979
to instructions.
These can't be reasonably matched in tablegen due to the handling of
flags, so we have to do this in C++ code. We only did it for `inc` and
`dec` historically, this starts fleshing that out to more interesting
instructions. Notably, this handles transfering operands to `add` and
`sub`.
Currently this forces them into a register. The next patch will add
support for keeping immediate operands as immediates. Then I'll extend
this beyond just `add` and `sub`.
I'm not super thrilled by the repeated switches in the code but
everything else I tried was really ugly or problematic.
Many thanks to Craig Topper for the suggestions about where to even
begin here and how to make this stuff work.
Differential Revision: https://reviews.llvm.org/D37130
llvm-svn: 311806
to handle other x86 pseudos that carry flags and thus can't be matched
by our ISel patterns with fused memory accesses.
Differential Revision: https://reviews.llvm.org/D37088
llvm-svn: 311749
This extracts the code out of a giant switch in preparation for expanding it to
handle operations other thin `inc` and `dec`. Add a FIXME indicating what's
coming here.
Differential Revision: https://reviews.llvm.org/D37045
llvm-svn: 311748
Summary: With masked operations, its possible for the operation node like fadd, fsub, etc. to be used by multiple different vselects. Since the pattern matching will start at the vselect, we need to make sure the operation node itself is only used once before we can fold a load. Otherwise we'll end up folding the same load into multiple instructions.
Reviewers: RKSimon, spatel, zvi, igorb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36938
llvm-svn: 311342
We can load the memory VT and check for natural alignment. This also adds a new preferNonTemporalLoad helper that checks the correct subtarget feature based on the load size.
This shrinks the isel table by at least 5000 bytes by allowing more reordering and combining to occur.
llvm-svn: 311266
Masked gather for vector length 2 is lowered incorrectly for element type i32.
The type <2 x i32> was automatically extended to <2 x i64> and we generated VPGATHERQQ instead of VPGATHERQD.
The type <2 x float> is extended to <4 x float>, so there is no bug for this type, but the sequence may be more optimal.
In this patch I'm fixing <2 x i32>bug and optimizing <2 x float> sequence for GATHERs only. The same fix should be done for Scatters as well.
Differential revision: https://reviews.llvm.org/D34343
llvm-svn: 305987
Summary: As per discution on how to get better codegen an large int legalization, it became clear that using a glue for the carry was preventing several desirable optimizations. Passing the carry down as a value allow for more flexibility.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D29872
llvm-svn: 301775
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.
This is largely a mechanical transformation from KnownZero to Known.Zero.
Differential Revision: https://reviews.llvm.org/D32569
llvm-svn: 301620
This will become asan errors once the patch lands that poisons the
memory after free. The x86 change is a hack, but I don't see how to
solve this properly at the moment.
llvm-svn: 300867
For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2.
llvm-svn: 297652
The Fuchsia ABI defines slots from the thread pointer where the
stack-guard value for stack-protector, and the unsafe stack pointer
for safe-stack, are stored. This parallels the Android ABI support.
Patch by Roland McGrath
Differential Revision: https://reviews.llvm.org/D30237
llvm-svn: 296081
Merging Load-add-store pattern into a increment op previously dropped
the load's chain from the instructions dependence if the store is
chained to a TokenFactor.
llvm-svn: 293892
Summary:
Attaching !absolute_symbol to a global variable does two things:
1) Marks it as an absolute symbol reference.
2) Specifies the value range of that symbol's address.
Teach the X86 backend to allow absolute symbols to appear in place of
immediates by extending the relocImm and mov64imm32 matchers. Start using
relocImm in more places where it is legal.
As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html
Differential Revision: https://reviews.llvm.org/D25878
llvm-svn: 289087
Summary: When selectScalarSSELoad is looking for a scalar_to_vector of a scalar load, it makes sure the load is only used by the scalar_to_vector. But it doesn't make sure the scalar_to_vector is only used once. This can cause the same load to be folded multiple times. This can be bad for performance. This also causes the chain output to be duplicated, but not connected to anything so chain dependencies will not be satisfied.
Reviewers: RKSimon, zvi, delena, spatel
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D26790
llvm-svn: 287983
We only ever create TargetConstantPool, TargetJumpTable, TargetExternalSymbol,
TargetGlobalAddress, TargetGlobalTLSAddress, MCSymbol and TargetBlockAddress
nodes as operands of X86ISD::Wrapper nodes, so we can remove one check and
invert the other.
Also update the documentation comment for X86ISD::Wrapper.
Differential Revision: https://reviews.llvm.org/D26731
llvm-svn: 287160
Suspected to be the cause of a sanitizer-windows bot failure:
Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420
llvm-svn: 286385
A relocatable immediate is either an immediate operand or an operand that
can be relocated by the linker to an immediate, such as a regular symbol
in non-PIC code.
Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands
of type "imm32_su". Remove a number of now-redundant patterns.
Differential Revision: https://reviews.llvm.org/D25812
llvm-svn: 286384
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.
llvm-svn: 278902
Summary:
Instead, we take a single flags arg (a bitset).
Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.
This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted. It also greatly simplifies the process of adding another flag
to getLoad.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D22249
llvm-svn: 275592
Memory references were not being propagated for this folded load. This
prevented optimizations like LICM from hoisting the load.
Added test to verify that this allows LICM to proceed.
llvm-svn: 273617
The setCallee function will set the number of fixed arguments based
on the size of the argument list. The FixedArgs parameter was often
explicitly set to 0, leading to a lack of consistent value for non-
vararg functions.
Differential Revision: http://reviews.llvm.org/D20376
llvm-svn: 273403
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.
llvm-svn: 272512
It's very common to want to replace a node and then remove it since
it's dead, especially as we port backends from the SDNode *Select API
to the void Select one. This helper makes this sequence a bit less
verbose.
llvm-svn: 269236
Don't bother returning a result we don't use here. I've also renamed
this from selectGather to tryGather to better indicate that it may not
do anything.
llvm-svn: 269215
This is a step towards removing the rampant undefined behaviour in
SelectionDAG, which is a part of llvm.org/PR26808.
We rename SelectionDAGISel::Select to SelectImpl and update targets to
match, and then change Select to return void and consolidate the
sketchy behaviour we're trying to get away from there.
Next, we'll update backends to implement `void Select(...)` instead of
SelectImpl and eventually drop the base Select implementation.
llvm-svn: 268693
Both Linux and kFreeBSD use glibc, so follow similiar code paths.
Add isTargetGlibc to check for this, and use it instead of isTargetLinux
in a few places.
Fixes PR22248 for kFreeBSD.
Differential Revision: http://reviews.llvm.org/D19104
llvm-svn: 268624
This fixes two use-after-frees in selectLEA64_32Addr. If matchAddress
matches an ADD with an AND as an operand, and that AND hits one of the
"heroic transforms" that folds masks and shifts, we end up with N
pointing to an SDNode that was deleted. Make sure we're done accessing
it before that.
Found by ASan with the recycling allocator changes in llvm.org/PR26808.
llvm-svn: 266130
Some Include What You Use suggestions were used too.
Use anonymous namespaces in source files.
Differential revision: http://reviews.llvm.org/D18778
llvm-svn: 265454
This is the same as r255936, with added logic for avoiding clobbering of the
red zone (PR26023).
Differential Revision: http://reviews.llvm.org/D18246
llvm-svn: 264375
This is long-standing dirtiness, as acknowledged by r77582:
The current trick is to select it into a merge_values with
the first definition being an implicit_def. The proper solution is
to add new ISD opcodes for the no-output variant.
Doing this before selection will let us combine away some constructs.
Differential Revision: http://reviews.llvm.org/D17659
llvm-svn: 262244
The red zone consists of 128 bytes beyond the stack pointer so that the
allocation of objects in leaf functions doesn't require decrementing
rsp. In r255656, we introduced an optimization that would cheaply
materialize certain constants via push/pop. Push decrements the stack
pointer and stores it's result at what is now the top of the stack.
However, this means that using push/pop would encroach on the red zone.
PR26023 gives an example where this corrupts an object in the red zone.
llvm-svn: 256808
Use the 3-byte (4 with REX prefix) push-pop sequence for materializing
small constants. This is smaller than using a mov (5, 6 or 7 bytes
depending on size and REX prefix), but it's likely to be slower, so
only used for 'minsize'.
This is a follow-up to r255656.
Differential Revision: http://reviews.llvm.org/D15549
llvm-svn: 255936
The motivation for this patch starts with the epic fail example in PR18007:
https://llvm.org/bugs/show_bug.cgi?id=18007
...unfortunately, this patch makes no difference for that case, but it solves some
simpler cases. We'll get there some day. :)
The current 'or' matching code was using computeKnownBits() via
isBaseWithConstantOffset() -> MaskedValueIsZero(), but that's an unnecessarily limited use.
We can do more by copying the logic in ValueTracking's haveNoCommonBitsSet(), so we can
treat the 'or' as if it was an 'add'.
There's a TODO comment here because we should lift the bit-checking logic into a helper
function, so it's not duplicated in DAGCombiner.
An example of the better LEA matching:
leal (%rdi,%rdi), %eax
andl $1, %esi
orl %esi, %eax
Becomes:
andl $1, %esi
leal (%rsi,%rdi,2), %eax
Differential Revision: http://reviews.llvm.org/D13956
llvm-svn: 252515
We can set additional bits in a mask given that we know the other
operand of an AND already has some bits set to zero. This can be more
efficient if doing so allows us to use an instruction which implicitly
sign extends the immediate.
This fixes PR24085.
Differential Revision: http://reviews.llvm.org/D11289
llvm-svn: 245169
First step in preventing immediates that occur more than once within a single
basic block from being pulled into their users, in order to prevent unnecessary
large instruction encoding .Currently enabled only when optimizing for size.
Patch by: zia.ansari@intel.com
Differential Revision: http://reviews.llvm.org/D11363
llvm-svn: 244601
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).
Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.
This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.
Differential Revision: http://reviews.llvm.org/D11734
llvm-svn: 243994
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D11028
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241775
Summary:
Initially, these intrinsics seemed like part of a family of "frame"
related intrinsics, but now I think that's more confusing than helpful.
Initially, the LangRef specified that this would create a new kind of
allocation that would be allocated at a fixed offset from the frame
pointer (EBP/RBP). We ended up dropping that design, and leaving the
stack frame layout alone.
These intrinsics are really about sharing local stack allocations, not
frame pointers. I intend to go further and add an `llvm.localaddress()`
intrinsic that returns whatever register (EBP, ESI, ESP, RBX) is being
used to address locals, which should not be confused with the frame
pointer.
Naming suggestions at this point are welcome, I'm happy to re-run sed.
Reviewers: majnemer, nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11011
llvm-svn: 241633
Before this we were producing a TargetExternalSymbol from a MCSymbol.
That meant extracting the symbol name and fetching the symbol again
down the pipeline.
This patch adds a DAG.getMCSymbol that lets the MCSymbol pass unchanged on the
DAG.
Doing so removes the need for MO_NOPREFIX and fixes the root cause of pr23900,
allowing r240130 to be committed again.
llvm-svn: 240300
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
Summary:
But still handle them the same way since I don't know how they differ on
this target.
Of these, 'o' and 'v' are not tested but were already implemented.
I'm not sure why 'i' is required for X86 since it's supposed to be an
immediate constraint rather than a memory constraint. A test asserts
without it so I've included it for now.
No functional change intended.
Reviewers: nadav
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8254
llvm-svn: 237517
x86 Windows uses the '_' prefix for all global symbols, and this was
mistakenly being applied to frameescape labels, which are not externally
visible global symbols. They use the private global prefix 'L'.
The *right* way to fix this is probably to stop masquerading this label
as an ExternalSymbol and create a new SDNode type. These labels are not
"external", and we know they will be resolved by assembly time. Having a
custom SDNode type would allow us to do better X86 address mode
matching, so it's probably worth doing eventually.
llvm-svn: 236123
[DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235989
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235977
addl has higher throughput and this was needlessly picking a suboptimal
encoding causing PR23098.
I wish there was a way of doing this without further duplicating tbl-
generated patterns, but so far I haven't found one.
llvm-svn: 233832
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.
This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break
anything.
The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate
Constraint_* values.
PR22883 was caused the matching operands copying the whole of the operand flags
for the matched operand. This included the constraint id which needed to be
replaced with the operand number. This has been fixed with a conversion
function. Following on from this, matching operands also used the operand
number as the constraint id. This has been fixed by looking up the matched
operand and taking it from there.
llvm-svn: 232165
This (r232027) has caused PR22883; so it seems those bits might be used by
something else after all. Reverting until we can figure out what else to do.
Original commit message:
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.
This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.
The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.
llvm-svn: 232093
Summary:
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.
This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.
The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8171
llvm-svn: 232027
Synthesizing a call directly using the MI layer would confuse the frame
lowering code. This is problematic as frame lowering is highly
sensitive the particularities of calls, etc.
llvm-svn: 230129
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
=> hasFnAttribute(Kind)
llvm-svn: 229214
Using KORTESTW for comparison i1 value with zero was wrong since the instruction tests 16 bits.
KORTESTW may be used with KSHIFTL+KSHIFTR that clean the 15 upper bits.
I removed (X86cmp i1, 0) pattern and zero-extend i1 to i8 and then use TESTB.
There are some cases where i1 is in the mask register and the upper bits are already zeroed.
Then KORTESTW is the better solution, but it is subject for optimization.
Meanwhile, I'm fixing the correctness issue.
llvm-svn: 228916
The assembler backend will relax to the long form if necessary. This removes a swap from long form to short form in the MCInstLowering code. Selecting the long form used to be required by the old JIT.
llvm-svn: 225242
condition to match a blend.
This prevents optimizations that work on VSELECT to perform invalid
transformations. Indeed, the optimized condition does not match the vector
boolean content that is expected and bad things may happen.
This patch yields the exact same code on the whole test-suite + specs (-O3 and
-O3 -march=core-avx2), it improves one test case (vector-blend.ll) and fixes a
bug reduced in vselect-avx.ll.
<rdar://problem/18819506>
llvm-svn: 221429
For 8-bit divrems where the remainder is used, we used to generate:
divb %sil
shrw $8, %ax
movzbl %al, %eax
That was to avoid an H-reg access, which is problematic mainly because
it isn't possible in REX-prefixed instructions.
This patch optimizes that to:
divb %sil
movzbl %ah, %eax
To do that, we explicitly extend AH, and extract the L-subreg in the
resulting register. The extension is done using the NOREX variants of
MOVZX. To support signed operations, MOVSX_NOREX is also added.
Further, this introduces a new SDNode type, [us]divrem_ext_hreg, which is
then lowered to a sequence containing a single zext (rather than 2).
Differential Revision: http://reviews.llvm.org/D6064
llvm-svn: 221176
Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where
3 are really needed.
This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to
MUL8/IMUL8 + SETO.
i8 is a special case because there is no two/three operand variants of
(I)MUL8, so the first operand and return value need to go in AL/AX.
Also, we can't write patterns for these instructions: TableGen refuses
patterns where output operands don't match SDNode results. In this case,
instructions where the output operand is an implicitly defined register.
A related special case (and FIXME) exists for MUL8 (X86InstrArith.td):
// FIXME: Used for 8-bit mul, ignore result upper 8 bits.
// This probably ought to be moved to a def : Pat<> if the
// syntax can be accepted.
[(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)]
Ideally, these go away with UMUL8, but we still need to improve TableGen
support of implicit operands in patterns.
Before this change:
movsbl %sil, %eax
movsbl %dil, %ecx
imull %eax, %ecx
movb %cl, %al
sarb $7, %al
movzbl %al, %eax
movzbl %ch, %esi
cmpl %eax, %esi
setne %al
After:
movb %dil, %al
imulb %sil
seto %al
Also, remove a made-redundant testcase for PR19858, and enable more FastISel
ALU-overflow tests for SelectionDAG too.
Differential Revision: http://reviews.llvm.org/D5809
llvm-svn: 220516
Summary:
Fix pr21099
The pseudocode of what we were doing (spread through two functions) was:
if (operand.doesNotFitIn32Bits())
Opc.initializeWithFoo();
if (operand < 0)
operand = -operand;
if (operand.doesFitIn8Bits())
Opc.initializeWithBar();
else if (operand.doesFitIn32Bits())
Opc.initializeWithBlah();
doStuff(Opc);
So for operand == INT32_MIN, Opc was never initialized because the operand changes
from fitting in 32 bits to not fitting, causing the various bugs/error messages
noted by pr21099.
This patch adds an extra test at the beginning for this case, and an
llvm_unreachable to have better error message if the operand ends up
not fitting in 32-bits at the end.
Test Plan: new test + make check
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5655
llvm-svn: 219257
In the X86 backend, matching an address is initiated by the 'addr' complex
pattern and its friends. During this process we may reassociate and-of-shift
into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
shift into the scale of the address.
However as demonstrated by the testcase, this can trigger CSE of not only the
shift and the AND which the code is prepared for but also the underlying load
node. In the testcase this node is sitting in the RecordedNode and MatchScope
data structures of the matcher and becomes a deleted node upon CSE. Returning
from the complex pattern function, we try to access it again hitting an assert
because the node is no longer a load even though this was checked before.
Now obviously changing the DAG this late is bending the rules but I think it
makes sense somewhat. Outside of addresses we prefer and-of-shift because it
may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
example because it create a non-canonical node). We currently don't recognize
addresses during DAGCombiner where arguably this canonicalization should be
performed. On the other hand, having this in the matcher allows us to cover
all the cases where an address can be used in an instruction.
I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible
for initiating the recursive CSE on users
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
is not strictly necessary since the shift is hooked into the visited user. Of
course it's safer to keep the DAG consistent at all times (e.g. for accurate
number of uses, etc.).
So rather than changing the fundamentals, I've decided to continue along the
previous patches and detect the CSE. This patch installs a very targeted
DAGUpdateListener for the duration of a complex-pattern match and updates the
matching state accordingly. (Previous patches used HandleSDNode to detect the
CSE but that's not practical here). The listener is only installed on X86.
I tested that there is no measurable overhead due to this while running
through the spec2k BC files with llc. The only thing we pay for is the
creation of the listener. The callback never ever triggers in spec2k since
this is a corner case.
Fixes rdar://problem/18206171
llvm-svn: 219009
Summary:
Mostly renaming the (not very explicit) variables Tmp0, .. Tmp4, and grouping
related statements together, along with a few lines of comments for the
surprising parts.
No functional change intended.
Test Plan: make check-all
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5088
llvm-svn: 216768
Summary:
Fixes http://llvm.org/bugs/show_bug.cgi?id=20016 reproducible on new
lea-5.ll case.
Also use RSP/RBP for x32 lea to save 1 byte used for 0x67 prefix in
ESP/EBP case.
Test Plan: lea tests modified to include x32/nacl and new test added
Reviewers: nadav, dschuff, t.p.northover
Subscribers: llvm-commits, zinovy.nis
Differential Revision: http://reviews.llvm.org/D4929
llvm-svn: 216065
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
llvm-svn: 215154
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.
Thanks to Lang Hames for making MCJIT a good replacement!
llvm-svn: 215111
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.
This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.
llvm-svn: 214449
The logic for expanding atomics that aren't natively supported in
terms of cmpxchg loops is much simpler to express at the IR level. It
also allows the normal optimisations and CodeGen improvements to help
out with atomics, instead of using a limited set of possible
instructions..
rdar://problem/13496295
llvm-svn: 212119
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.
Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.
This should fix PR15840.
llvm-svn: 191165
When selecting the DAG (add (WrapperRIP ...), (FrameIndex ...)), X86 code had
spotted the FrameIndex possibility and was working out whether it could fold
the WrapperRIP into this.
The test for forming a %rip version is notionally whether we already have a
base or index register (%rip precludes both), but we were forgetting to account
for the register that would be inserted later to access the frame.
rdar://problem/15024520
llvm-svn: 190995
Previously LEA64_32r went through virtually the entire backend thinking it was
using 32-bit registers until its blissful illusions were cruelly snatched away
by MCInstLower and 64-bit equivalents were substituted at the last minute.
This patch makes it behave normally, and take 64-bit registers as sources all
the way through. Previous uses (for 32-bit arithmetic) are accommodated via
SUBREG_TO_REG instructions which make the types and classes agree properly.
llvm-svn: 183693
Add earlyclobber constaints to prevent input register being allocated as
the output register because, according to Intel spec [1], "If any pair
of the index, mask, or destination registers are the same, this
instruction results a UD fault."
---
[1] http://software.intel.com/sites/default/files/319433-014.pdf
llvm-svn: 183327
The MOV64ri64i32 instruction required hacky MCInst lowering because it
was allocated as setting a GR64, but the eventual instruction ("movl")
only set a GR32. This converts it into a so-called "MOV32ri64" which
still accepts a (appropriate) 64-bit immediate but defines a GR32.
This is then converted to the full GR64 by a SUBREG_TO_REG operation,
thus keeping everyone happy.
This fixes a typo in the opcode field of the original patch, which
should make the legact JIT work again (& adds test for that problem).
llvm-svn: 183068
The MOV64ri64i32 instruction required hacky MCInst lowering because it was
allocated as setting a GR64, but the eventual instruction ("movl") only set a
GR32. This converts it into a so-called "MOV32ri64" which still accepts a
(appropriate) 64-bit immediate but defines a GR32. This is then converted to
the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy.
llvm-svn: 182991
Instead of having a bunch of separate MOV8r0, MOV16r0, ... pseudo-instructions,
it's better to use a single MOV32r0 (which will expand to "xorl %reg, %reg")
and obtain other sizes with EXTRACT_SUBREG and SUBREG_TO_REG. The encoding is
smaller and partial register updates can sometimes be avoided.
Until recently, this sequence was a barrier to rematerialization though. That
should now be fixed so it's an appropriate time to make the change.
llvm-svn: 182928
I need to handle this for the test case in my following scheduler
commit.
Work is already under way to redesign the mechanism for node order
propagation because this case by case approach is unmaintainable.
llvm-svn: 179448
To enable a load of a call address to be folded with that call, this
load is moved from outside of callseq into callseq. Such a moving
adds a non-glued node (that load) into a glued sequence. This non-glue
load is only removed when DAG selection folds them into a memory form
call instruction. When such instruction selection is disabled, it breaks
DAG schedule.
To prevent that, such moving is disabled when target favors register
indirect call.
Previous workaround disabling CALL32m/CALL64m insn selection is removed.
llvm-svn: 178308
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
directly.
This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.
llvm-svn: 171253
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
- Besides used in SjLj exception handling, __builtin_setjmp/__longjmp is also
used as a light-weight replacement of setjmp/longjmp which are used to
implementation continuation, user-level threading, and etc. The support added
in this patch ONLY addresses this usage and is NOT intended to support SjLj
exception handling as zero-cost DWARF exception handling is used by default
in X86.
llvm-svn: 165989
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.
llvm-svn: 165488
X86DAGToDAGISel::PreprocessISelDAG(), isel is moving load inside
callseq_start / callseq_end so it can be folded into a call. This can
create a cycle in the DAG when the call is glued to a copytoreg. We
have been lucky this hasn't caused too many issues because the pre-ra
scheduler has special handling of call sequences. However, it has
caused a crash in a specific tailcall case.
rdar://12393897
llvm-svn: 165072
- Merge the processing of LOAD_ADD with other atomic load-arith
operations
- Separate the logic getting target constant for atomic-load-op and add
an optimization for atomic-load-add on i16 with negative value
- Optimize a minor case for atomic-fetch-add i16 with negative operand. Test
case is revised.
llvm-svn: 164243
We don't have enough GR64_TC registers when calling a varargs function
with 6 arguments. Since %al holds the number of vector registers used,
only %r11 is available as a scratch register.
This means that addressing modes using both base and index registers
can't be folded into TCRETURNmi64.
<rdar://problem/12282281>
llvm-svn: 163761
- BlockAddress has no support of BA + offset form and there is no way to
propagate that offset into machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
simplify target block address forming;
- All targets are modified to use new interface and X86 backend is enhanced to
support BA + offset addressing.
llvm-svn: 163743