We perform the following:
1> Use SUB instead of CMP for i8,i16,i32 and i64 in ISel lowering.
2> Modify MachineCSE to correctly handle implicit defs.
3> Convert SUB back to CMP if possible at peephole.
Removed pattern matching of (a>b) ? (a-b):0 and like, since they are handled
by peephole now.
rdar://11873276
llvm-svn: 161462
are targeting an ELF platform. Only fold gs-relative (and fs-relative) loads
if it is actually sensible to do so for the target platform.
This fixes PR13438.
llvm-svn: 160687
Corrected type for index of llvm.x86.avx2.gather.d.pd.256
from 256-bit to 128-bit.
Corrected types for src|dst|mask of llvm.x86.avx2.gather.q.ps.256
from 256-bit to 128-bit.
Support the following intrinsics:
llvm.x86.avx2.gather.d.q, llvm.x86.avx2.gather.q.q
llvm.x86.avx2.gather.d.q.256, llvm.x86.avx2.gather.q.q.256
llvm.x86.avx2.gather.d.d, llvm.x86.avx2.gather.q.d
llvm.x86.avx2.gather.d.d.256, llvm.x86.avx2.gather.q.d.256
llvm-svn: 159402
x86 addressing modes. This allows PIE-based TLS offsets to fit directly
into an addressing mode immediate offset, which is the last remaining
code quality issue from PR12380. With this patch, that PR is completely
fixed.
To understand why this patch is correct to match these offsets into
addressing mode immediates, break it down by cases:
1) 32-bit is trivially correct, and unmodified here.
2) 64-bit non-small mode is unchanged and never matches.
3) 64-bit small PIC code which is RIP-relative is handled specially in
the match to try to fit RIP into the base register. If it fails, it
now early exits. This behavior is unchanged by the patch.
4) 64-bit small non-PIC code which is not RIP-relative continues to work
as it did before. The reason these immediates are safe is because the
ABI ensures they fit in small mode. This behavior is unchanged.
5) 64-bit small PIC code which is *not* using RIP-relative addressing.
This is the only case changed by the patch, and the primary place you
see it is in TLS, either the win64 section offset TLS or Linux
local-exec TLS model in a PIC compilation. Here the ABI again ensures
that the immediates fit because we are in small mode, and any other
operations required due to the PIC relocation model have been handled
externally to the Wrapper node (extra loads etc are made around the
wrapper node in ISelLowering).
I've tested this as much as I can comparing it with GCC's output, and
everything appears safe. I discussed this with Anton and it made sense
to him at least at face value. That said, if there are issues with PIC
code after this patch, yell and we can revert it.
llvm-svn: 154304
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
llvm-svn: 154011
This is a code change to add support for changing instruction sequences of the form:
load
inc/dec of 8/16/32/64 bits
store
into the appropriate X86 inc/dec through memory instruction:
inc[qlwb] / dec[qlwb]
The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better
named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode. The comments have also been expanded.
llvm-svn: 153635
This is a code change to add support for changing instruction sequences of the form:
load
inc/dec of 8/16/32/64 bits
store
into the appropriate X86 inc/dec through memory instruction:
inc[qlwb] / dec[qlwb]
The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better
named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode. The comments have also been expanded.
llvm-svn: 153617
If the DEC node had more than one user, it was doing this lowering but
leaving the original DEC node around and so decrementing twice.
Fixes PR11964.
llvm-svn: 150356
same pattern. We already had this pattern is a few places, but others
tried to make a rough approximation of an actual DAG structure. As not
everywhere went to this trouble, nothing could rely on this being done.
In fact, I've checked all references to these node Ids, and the ones
that are using the topo-sort properties are actually satisfied with
a strict-weak-ordering. The requirement appears to be that Use >= Def.
I've added a big blurb of comments to this bit of the transform to
clarify why the order is so important for the next reader of the code.
I'm starting with this change as it is very small, and trivially
reverted if something breaks or the >= above really does need to be >.
If that proves the case, we can hide the problem by reverting this
patch, but the problem exists elsewhere as well, and so a more
comprehensive solution will be needed.
llvm-svn: 148001
hoped this would revive one of the llvm-gcc selfhost build bots, but it
didn't so it doesn't appear that my transform is the culprit.
If anyone else is seeing failures, please let me know!
llvm-svn: 147957
strange build bot failures that look like a miscompile into an infloop.
I'll investigate this tomorrow, but I'd both like to know whether my
patch is the culprit, and get the bots back to green.
llvm-svn: 147945
mask+shift pairs at the beginning of the ISD::AND case block, and then
hoist the final pattern into a helper function, simplifying and
reflowing it appropriately. This should have no observable behavior
change, but several simplifications fell out of this such as directly
computing the new mask constant, etc.
llvm-svn: 147939
extracts and scaled addressing modes into its own helper function. No
functionality changed here, just hoisting and layout fixes falling out
of that hoisting.
llvm-svn: 147937
detect a pattern which can be implemented with a small 'shl' embedded in
the addressing mode scale. This happens in real code as follows:
unsigned x = my_accelerator_table[input >> 11];
Here we have some lookup table that we look into using the high bits of
'input'. Each entity in the table is 4-bytes, which means this
implicitly gets turned into (once lowered out of a GEP):
*(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));
The shift right followed by a shift left is canonicalized to a smaller
shift right and masking off the low bits. That hides the shift right
which x86 has an addressing mode designed to support. We now detect
masks of this form, and produce the longer shift right followed by the
proper addressing mode. In addition to saving a (rather large)
instruction, this also reduces stalls in Intel chips on benchmarks I've
measured.
In order for all of this to work, one part of the DAG needs to be
canonicalized *still further* than it currently is. This involves
removing pointless 'trunc' nodes between a zextload and a zext. Without
that, we end up generating spurious masks and hiding the pattern.
llvm-svn: 147936
this substraction will result in small negative numbers at worst which
become very large positive numbers on assignment and are thus caught by
the <=4 check on the next line. The >0 check clearly intended to catch
these as negative numbers.
Spotted by inspection, and impossible to trigger given the shift widths
that can be used.
llvm-svn: 147773
fixes: Use a separate register, instead of SP, as the
calling-convention resource, to avoid spurious conflicts with
actual uses of SP. Also, fix unscheduling of calling sequences,
which can be triggered by pseudo-two-address dependencies.
llvm-svn: 143206
it fixes the dragonegg self-host (it looks like gcc is miscompiled).
Original commit messages:
Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.
Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.
Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.
Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.
This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.
Delete #if 0 code accidentally left in.
llvm-svn: 143188
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.
Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.
Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.
Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.
This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.
llvm-svn: 143177
In 64-bit mode, sub_8bit_hi sub-registers can only be used by NOREX
instructions. The COPY created from the EXTRACT_SUBREG DAG node cannot
target all GR8 registers, only those in GR8_NOREX.
TO enforce this, we ensure that all instructions using the
EXTRACT_SUBREG are GR8_NOREX constrained.
This fixes PR11088.
llvm-svn: 141499