Note, this is a recommit of r236515 after fixing an error in r236514. The buildbot ran fast enough that it picked up r236514 prior to r236515 and threw an error. r236515 itself ran 'make check' without errors.
Original commit message follows:
A regmask (typically seen on a call) clobbers the set of registers it lists. The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.
These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier. Otherwise, uses after the if converted call could think they are reading an undefined register.
Reviewed by Matthias Braun and Quentin Colombet.
llvm-svn: 236550
The register set for LDMIA begins at offset 3, not 4. We were previously
missing the short encoding of this instruction in the case where the base
register was the first register in the register set.
Also clean up some dead code:
- The isARMLowRegister check is redundant with what VerifyLowRegs does;
replace with an assert.
- Remove handling of LDMDB instruction, which has no short encoding (and
does not appear in ReduceTable).
Differential Revision: http://reviews.llvm.org/D9485
llvm-svn: 236535
This patch makes ReplaceExtractVectorEltOfLoadWithNarrowedLoad convert
the element number from getVectorIdxTy() to PtrTy before doing pointer
arithmetic on it. This is needed on z, where element numbers are i32
but pointers are i64.
Original patch by Richard Sandiford.
llvm-svn: 236530
For little-endian, the function would convert (extract_vector_elt (load X), Y)
to X + Y*sizeof(elt). For big-endian it would instead use
X + sizeof(vec) - Y*sizeof(elt). The big-endian case wasn't right since
vector index order always follows memory/array order, even for big-endian.
(Note that the current handling has to be wrong for Y==0 since it would
access beyond the end of the vector.)
Original patch by Richard Sandiford.
llvm-svn: 236529
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal.
E.g. it would load a v4i8 as an i32 if i32 was legal.
This patch extends that behavior to promoted integers as well as legal ones.
If the integer type for the full vector width is TypePromoteInteger,
the element type is going to be TypePromoteInteger too, and it's still
better to use a single promoting load or truncating store rather than N
individual promoting loads or truncating stores. E.g. if you have a v2i8
on a target where i16 is promoted to i32, it's better to load the v2i8 as
an i16 rather than load both i8s individually.
Original patch by Richard Sandiford.
llvm-svn: 236528
This adds intrinsics to allow access to all of the z13 vector instructions.
Note that instructions whose semantics can be described by standard LLVM IR
do not get any intrinsics.
For each instructions whose semantics *cannot* (fully) be described, we
define an LLVM IR target-specific intrinsic that directly maps to this
instruction.
For instructions that also set the condition code, the LLVM IR intrinsic
returns the post-instruction CC value as a second result. Instruction
selection will attempt to detect code that compares that CC value against
constants and use the condition code directly instead.
Based on a patch by Richard Sandiford.
llvm-svn: 236527
The ABI specifies that <1 x i128> and <1 x fp128> are supposed to be
passed in vector registers. We do not yet support those types, and
some infrastructure is missing before we can do so.
In order to prevent accidentally generating code violating the ABI,
this patch adds checks to detect those types and error out if user
code attempts to use them.
llvm-svn: 236526
The ABI allows sub-128 vectors to be passed and returned in registers,
with the vector occupying the upper part of a register. We therefore
want to legalize those types by widening the vector rather than promoting
the elements.
The patch includes some simple tests for sub-128 vectors and also tests
that we can recognize various pack sequences, some of which use sub-128
vectors as temporary results. One of these forms is based on the pack
sequences generated by llvmpipe when no intrinsics are used.
Signed unpacks are recognized as BUILD_VECTORs whose elements are
individually sign-extended. Unsigned unpacks can have the equivalent
form with zero extension, but they also occur as shuffles in which some
elements are zero.
Based on a patch by Richard Sandiford.
llvm-svn: 236525
The z13 vector facility includes some instructions that operate only on the
high f64 in a v2f64, effectively extending the FP register set from 16
to 32 registers. It's still better to use the old instructions if the
operands happen to fit though, since the older instructions have a shorter
encoding.
Based on a patch by Richard Sandiford.
llvm-svn: 236524
The architecture doesn't really have any native v4f32 operations except
v4f32->v2f64 and v2f64->v4f32 conversions, with only half of the v4f32
elements being used. Even so, using vector registers for <4 x float>
and scalarising individual operations is much better than generating
completely scalar code, since there's much less register pressure.
It's also more efficient to do v4f32 comparisons by extending to 2
v2f64s, comparing those, then packing the result.
This particularly helps with llvmpipe.
Based on a patch by Richard Sandiford.
llvm-svn: 236523
This adds ABI and CodeGen support for the v2f64 type, which is natively
supported by z13 instructions.
Based on a patch by Richard Sandiford.
llvm-svn: 236522
This the first of a series of patches to add CodeGen support exploiting
the instructions of the z13 vector facility. This patch adds support
for the native integer vector types (v16i8, v8i16, v4i32, v2i64).
When the vector facility is present, we default to the new vector ABI.
This is characterized by two major differences:
- Vector types are passed/returned in vector registers
(except for unnamed arguments of a variable-argument list function).
- Vector types are at most 8-byte aligned.
The reason for the choice of 8-byte vector alignment is that the hardware
is able to efficiently load vectors at 8-byte alignment, and the ABI only
guarantees 8-byte alignment of the stack pointer, so requiring any higher
alignment for vectors would require dynamic stack re-alignment code.
However, for compatibility with old code that may use vector types, when
*not* using the vector facility, the old alignment rules (vector types
are naturally aligned) remain in use.
These alignment rules are not only implemented at the C language level
(implemented in clang), but also at the LLVM IR level. This is done
by selecting a different DataLayout string depending on whether the
vector ABI is in effect or not.
Based on a patch by Richard Sandiford.
llvm-svn: 236521
This reverts commit b27413cbfd78d959c18e713bfa271fb69e6b3303 (ie r236515).
This is to get the bots green while i investigate the failures.
llvm-svn: 236517
A regmask (typically seen on a call) clobbers the set of registers it lists. The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.
These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier. Otherwise, uses after the if converted call could think they are reading an undefined register.
Reviewed by Matthias Braun and Quentin Colombet.
llvm-svn: 236515
This reverts commit r236360.
This change exposed a bug in WinEHPrepare by opting win32 code into EH
preparation. We already knew that WinEHPrepare has bugs, and is the
status quo for x64, so I don't think that's a reason to hold off on this
change. I disabled exceptions in the sanitizer tests in r236505 and an
earlier revision.
llvm-svn: 236508
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.
As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b) {
%tmp = alloca i32, align 4
%tmp2 = icmp slt i32 %a, %b
br i1 %tmp2, label %true, label %false
true:
store i32 %a, i32* %tmp, align 4
%tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
br label %false
false:
%tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
ret i32 %tmp.0
}
On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f: ; @f
; BB#0:
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
LBB0_2: ; %false
mov sp, x29
ldp x29, x30, [sp], #16
ret
With shrink-wrapping we could generate:
_f: ; @f
; BB#0:
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
add sp, x29, #16 ; =16
ldp x29, x30, [sp], #16
LBB0_2: ; %false
ret
Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.
Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.
The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.
Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
llvm-svn: 236507
It adds v1i128 to the appropriate register classes and checks parameter passing
and return values.
This is related to http://reviews.llvm.org/D9081, which will add instructions
that exploit the v1i128 datatype.
Phabricator review: http://reviews.llvm.org/D9475
llvm-svn: 236503
Summary:
When using the N64 ABI, element-indices use the i64 type instead of i32.
In many cases, we can use iPTR to account for this but additional patterns
and pseudo's are also required.
This fixes most (but not quite all) failures in the test-suite when using
N64 and MSA together.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9342
llvm-svn: 236494
When deciding whether a value comes from the aggregate or inserted value of an
insertvalue instruction, we compare the indices against those of the location
we're interested in. One of the lists needs reversing because the input data is
backwards (so that modifications take place at the end of the SmallVector), but
we were reversing both before leading to incorrect results.
Should fix PR23408
llvm-svn: 236457
This pass is responsible for constructing the EH registration object
that gets linked into fs:00, which is all it does in this change. In the
future, it will also insert stores to update the EH state number.
I considered keeping this functionality in WinEHPrepare, but it's pretty
separable and X86 specific. It has conceptually very little to do with
the task of WinEHPrepare, which is currently outlining. WinEHPrepare is
also in theory useful on ARM, but this logic is pretty x86 specific.
Reviewers: andrew.w.kaylor, majnemer
Differential Revision: http://reviews.llvm.org/D9422
llvm-svn: 236339
Functions with jump tables need an alignment of 4 because they use the ADR
instruction, which aligns the PC to 4 bytes before adding an offset.
Differential Revision: http://reviews.llvm.org/D9424
llvm-svn: 236327
This patch fixes issues with vector constant folding not correctly handling scalar input operands if they require implicit truncation - this was tested with llvm-stress as recommended by Patrik H Hagglund.
The patch ensures that integer input scalars from a build vector are correctly truncated before folding, and that constant integer scalar results are promoted to a legal type before inclusion in the new folded build vector.
I have added another crash test case and also a test for UINT_TO_FP / SINT_TO_FP using an non-truncated scalar input, which was failing before this patch.
Differential Revision: http://reviews.llvm.org/D9282
llvm-svn: 236308
This pass was generating 'Instruction does not dominate all uses!'
errors for programs which had loops with a condition variable that
depended on the result of a phi instruction from outside of the loop.
The pass was inserting new phi nodes outside of the loop which used values
defined inside the loop.
http://bugs.freedesktop.org/show_bug.cgi?id=90056
llvm-svn: 236306
If we move an instruction from one block down to a MOVC and predicate it,
then the original instruction could be moved in to a loop. In this case,
its invalid for any kill flags to remain on there.
Fails with -verfy-machineinstrs.
rdar://problem/20752113
llvm-svn: 236290
When commuting a thumb instruction in the size reduction pass, thumb
instructions are represented as a bundle and so some operands may be marked
as internal. The internal flag has to move with the operand when commuting.
This test is sensitive to register allocation so can't specifically check that
this error was happening, but so long as it continues to pass with -verify then
hopefully its still ok.
rdar://problem/20752113
llvm-svn: 236282
The expansion for t2ABS was always setting the kill flag on the rsb instruction.
It should instead only be set on rsb if it was set on the original ABS instruction.
rdar://problem/20752113
llvm-svn: 236272
This helps reduce the frequency of stack realignment prologues in 32-bit
X86 Windows code. Before this change and the corresponding clang change,
we would take the max of the type preferred alignment and the explicit
alignment on the alloca.
If you don't override aggregate alignment in datalayout, you get a
default of 8. This dates back to 2007 / r34356, and changing it seems
prohibitively difficult at this point.
llvm-svn: 236270
Revision 220239 exposed a latent bug in method
'TargetInstrInfo::commuteInstruction'. When commuting the operands of a machine
instruction, method 'commuteInstruction' didn't correctly propagate the
'IsUndef' flag to the register operands of the new (commuted) instruction.
Before this patch, the following instruction:
%vreg4<def> = VADDSDrr %vreg14, %vreg5<undef>; FR64:%vreg4,%vreg14,%vreg5
was wrongly converted by method 'commuteInstruction' into:
%vreg4<def> = VADDSDrr %vreg5, %vreg14<undef>; FR64:%vreg4,%vreg5,%vreg14
The correct instruction should have been:
%vreg4<def> = VADDSDrr %vreg5<undef>, %vreg14; FR64:%vreg4,%vreg5,%vreg14
This patch fixes the problem in method 'TargetInstrInfo::commuteInstruction'.
When swapping the operands of a machine instruction, we now make sure that
'IsUndef' flags are correctly set.
Added test case 'pr23103.ll'.
Differential Revision: http://reviews.llvm.org/D9406
llvm-svn: 236258
In the test case here, the 'unreachable' BB was removed by BranchFolding because its empty.
It then rewrote the jump from 'entry' to jump to its fallthrough, which was a landing pad.
This results in 'entry' jumping to 2 different landing pads, which fails the machine verifier.
rdar://problem/20750162
llvm-svn: 236248
temporary.
Because of that:
1. The machine verifier was complaining on such code.
2. The generate code worked just because the thumb reduction size pass fixed the
opcode.
rdar://problem/20749824
llvm-svn: 236247
changes:
Don't apply on hexagon and NVPTX since they no longer claim to support UADDO/USUBO
Add location to getConstant
Drop comment about the ops being turned into expand
llvm-svn: 236240
Summary:
The majority of the checks are subtarget independent. The few that aren't
will be corrected shortly.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9340
llvm-svn: 236220
Summary:
This doesn't make much difference to MIPS32, but it will simplify a
MIPS64r6 bugfix which will follow shortly by removing unnecessary
sign-extension of parameters.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9338
llvm-svn: 236216
At the least it should be guarded by some kind of target hook.
It also introduced catastrophic compile time and code quality
regressions on some out of tree targets (test case still being
reduced/sanitized).
Sanjay agreed with reverting this patch until these issues can be
resolved.
llvm-svn: 236199
This will cause hot nodes to appear closer to the root.
The literature says building the tree like this makes it a near-optimal (in
terms of search time given key frequencies) binary search tree. In LLVM's case,
we can do up to 3 comparisons in each leaf node, so it might be better to opt
for lower tree height in some cases; that's something to look into in the
future.
Differential Revision: http://reviews.llvm.org/D9318
llvm-svn: 236192
This was breaking sqlite with the machine verifier because operand 0 was a def according to tablegen, but didn't have the 'isDef' flag set.
Looking at the ISA, its clear that this operand is a source as writing to st(0) is implicit. So move the operand to the correct place in the td file.
rdar://problem/20751584
llvm-svn: 236183
32-bit x86 MSVC-style exceptions are functionaly similar to 64-bit, but
they take no arguments. Instead, they implicitly use the value of EBP
passed in by the caller as a pointer to the parent's frame. In LLVM, we
can represent this as llvm.frameaddress(1), and feed that into all of
our calls to llvm.framerecover.
The next steps are:
- Add an alloca to the fs:00 linked list of handlers
- Add something like llvm.sjlj.lsda or generalize it to store in the
alloca
- Move state number calculation to WinEHPrepare, arrange for
FunctionLoweringInfo to call it
- Use the state numbers to insert explicit loads and stores in the IR
llvm-svn: 236172
x86 Windows uses the '_' prefix for all global symbols, and this was
mistakenly being applied to frameescape labels, which are not externally
visible global symbols. They use the private global prefix 'L'.
The *right* way to fix this is probably to stop masquerading this label
as an ExternalSymbol and create a new SDNode type. These labels are not
"external", and we know they will be resolved by assembly time. Having a
custom SDNode type would allow us to do better X86 address mode
matching, so it's probably worth doing eventually.
llvm-svn: 236123
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`. The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.
Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one. It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs. YMMV of
course.
Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py. I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three. It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).
Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.
llvm-svn: 236120
Summary:
The existing code was correct for 32-bit GPR's but not 64-bit GPR's. It now
accounts for both cases.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits, mohit.bhakkad, sagar
Differential Revision: http://reviews.llvm.org/D9337
llvm-svn: 236099
We were trying to look through COPY instructions, but only to the next
instruction in a BB and incorrectly anyway. The cases where that would actually
be a good idea are rare enough (and not even tested!) that it's not worth
trying to get right.
rdar://20721342
llvm-svn: 236050
This is a compromise: with this simple patch, we should always handle a chain of exactly 3
operations optimally, but we're not generating the optimal balanced binary tree for a longer
sequence.
In general, this transform will reduce the dependency chain for a sequence of instructions
using N operands from a worst case N-1 dependent operations to N/2 dependent operations.
The optimal balanced binary tree would reduce the chain to log2(N).
The trade-off for not dealing with longer sequences is: (1) we have less complexity in the
compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we
don't need to worry about the increased register pressure required to parallelize longer
sequences. It also seems unlikely that we would ever encounter really long strings of
dependent ops like that in the wild, but I'm not sure how to verify that speculation.
FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math
and this patch.
We can extend this patch to cover other associative operations such as fmul, fmax, fmin,
integer add, integer mul.
This is a partial fix for:
https://llvm.org/bugs/show_bug.cgi?id=17305
and if extended:
https://llvm.org/bugs/show_bug.cgi?id=21768https://llvm.org/bugs/show_bug.cgi?id=23116
The issue also came up in:
http://reviews.llvm.org/D8941
Differential Revision: http://reviews.llvm.org/D9232
llvm-svn: 236031
Fixes a crash with basically any OpenGL application using the radeonsi
driver.
Patch by: Michel Dänzer
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90176
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 236004
We need to track if an AddrSpaceCast expression was seen when
generating an MCExpr for a ConstantExpr. This change introduces a
custom lowerConstant method to the NVPTX asm printer that will create
NVPTXGenericMCSymbolRefExpr nodes at the appropriate places to encode
the information that a given symbol needs to be casted to a generic
address.
llvm-svn: 236000
Previously, the code would try to put a fall-through case last,
even if that meant moving a case with much higher branch weight
further down the chain.
Ordering by branch weight is most important, putting a fall-through
block last is secondary.
llvm-svn: 235942
After legalization, scalar SETCC has an i32 result type on AArch64.
The i1 requirement seems too conservative, replace it with an assert.
This also means that we now can run after legalization. That should also
be fine, since the ops legalizer runs again after each combine, and
all types created all have the same sizes as the (legal) inputs.
Exposed by r235917; while there, robustize its tests (bsl also uses the
register it defines).
llvm-svn: 235922
When the setcc has f64 operands, we can't build a vector setcc mask
to feed a vselect, because f64 doesn't divide v3f32 evenly.
Just bail out when that happens.
llvm-svn: 235917
This patch adds a new SSA MI pass that runs on little-endian PPC64
code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors
without alignment constraints are accomplished for little-endian using
lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional
xxswapd instructions hurts performance in comparison with big-endian
code, but they are necessary in the general case to support correct
semantics.
However, the general case does not apply to most vector code. Many
vector instructions are lane-insensitive; they do not "care" which
lanes the parallel computations are performed within, provided that
the resulting data is stored into the correct locations. Thus this
pass looks for computations that perform only lane-insensitive
operations, and remove the unnecessary swaps from loads and stores in
such computations.
Future improvements will allow computations using certain
lane-sensitive operations to also be optimized in this manner, by
modifying the lane-sensitive operations to account for the permuted
order of the lanes. However, this patch only adds the infrastructure
to permit this; no lane-sensitive operations are optimized at this
time.
This code is heavily exercised by the various vectorizing applications
in the projects/test-suite tree. For the time being, I have only added
one simple test case to demonstrate what the pass is doing. Although
it is quite simple, it provides coverage for much of the code,
including the special case handling of copies and subreg-to-reg
operations feeding the swaps. I plan to add additional tests in the
future as I fill in more of the "special handling" code.
Two existing tests were affected, because they expected the swaps to
be present, but they are now removed.
llvm-svn: 235910
Use a loop instruction with a constant extender for a hardware
loop instruction that is too far away from the start of the loop.
This is cheaper than changing the SA register value.
Differential Revision: http://reviews.llvm.org/D9262
llvm-svn: 235882
This reapplies r235194, which was reverted in r235495 because it was causing a
failure in our out-of-tree buildbots for MIPS. With the sign-extension patch
in r235718, this patch doesn't cause any problem any more.
llvm-svn: 235878
Patch to allow int8 vectors to be multiplied on the SSE unit instead of being scalarized.
The patch sign extends the i8 lanes to i16, uses the SSE2 pmullw multiplication instruction, then packs the lower byte from each result.
Differential Revision: http://reviews.llvm.org/D9115
llvm-svn: 235837
This introduces an intrinsic called llvm.eh.exceptioncode. It is lowered
by copying the EAX value live into whatever basic block it is called
from. Obviously, this only works if you insert it late during codegen,
because otherwise mid-level passes might reschedule it.
llvm-svn: 235768
Same as r235145 for the call instruction - the justification, tradeoffs,
etc are all the same. The conversion script worked the same without any
false negatives (after replacing 'call' with 'invoke').
llvm-svn: 235755
Summary:
Fixes a bug in the NVPTX codegen. The code used to miss necessary "generic()"
on aggregates of addrspacecasts.
Test Plan: addrspacecast-gvar.ll
Reviewers: eliben, jholewinski
Reviewed By: jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9130
llvm-svn: 235689
We were asserting on code like this:
extern "C" unsigned long _exception_code();
void might_crash(unsigned long);
void foo() {
__try {
might_crash(0);
} __except(1) {
might_crash(_exception_code());
}
}
Gtest and many other libraries get the exception code from the __except
block. What's supposed to happen here is that EAX is live into the
__except block, and it contains the exception code. Eventually we'll
represent that as a use of the landingpad ehptr value, but for now we
can replace it with undef.
llvm-svn: 235649
remove copies that are useful after breaking some hardware dependencies.
In other words, handle this kind of situations conservatively by assuming reg2
is redefined by the undef flag.
reg1 = copy reg2
= inst reg2<undef>
reg2 = copy reg1
Copy propagation used to remove the last copy.
This is incorrect because the undef flag on reg2 in inst, allows next
passes to put whatever trashed value in reg2 that may help.
In practice we end up with this code:
reg1 = copy reg2
reg2 = 0
= inst reg2<undef>
reg2 = copy reg1
This fixes PR21743.
llvm-svn: 235647
When the base register index of the vector plus the constant offset
was less than zero, we were passing the wrong base register to the indirect
addressing instruction.
In this case, we need to set the base register to v0 and then add
the computed (negative) index to m0.
llvm-svn: 235641
The order in which branches appear in ImmBranches is approximately their
order within the function body. By visiting later branches first, we reduce
the distance between earlier forward branches and their targets, making it
more likely that the cbn?z optimization, which can only apply to forward
branches, will succeed for those earlier branches.
Differential Revision: http://reviews.llvm.org/D9185
llvm-svn: 235640
In particular, this preserves the kill flag, which allows the Thumb2 cbn?z
optimization to be applied in cases where a branch has been re-created after
the live variables analysis pass, e.g. by the machine block placement pass.
This appears to be low risk; a number of other targets seem to already be
doing something similar, e.g. AArch64, PowerPC.
Differential Revision: http://reviews.llvm.org/D9184
llvm-svn: 235639
This allows the constant island pass to lower these branches to cbn?z
instructions, resulting in a shorter instruction sequence.
Differential Revision: http://reviews.llvm.org/D9183
llvm-svn: 235638
This makes it more likely that we can use the 16-bit push and pop instructions
on Thumb-2, saving around 4 bytes per function.
Differential Revision: http://reviews.llvm.org/D9165
llvm-svn: 235637
This appears to have been introduced back in r76698 as part of an unrelated
change. I can find no official ARM documentation stating that Thumb-2 functions
require 4-byte alignment; in fact, ARM documentation appears to contradict
this (see, e.g., ARM Architecture Reference Manual Thumb-2 Supplement,
section 2.6.1: "Thumb-2 enforces 16-bit alignment on all instructions.").
Also remove code that sets alignment for ARM functions, which is redundant
with code in the MachineFunction constructor, and remove the hidden
-arm-align-constant-islands flag, which has been enabled by default since
r146739 (Dec 2011) and has probably received sufficient testing by now.
Differential Revision: http://reviews.llvm.org/D9138
llvm-svn: 235636
TableGen had been nicely generating code to print a number of instructions using
shorter aliases (and PowerPC has plenty of short mnemonics), but we were not
calling it. For some of the aliases we support in the parser, TableGen can't
infer the "inverse" alias relationship, so there is still more to do.
Thus, after some hours of updating test cases...
llvm-svn: 235616
Summary:
Constant stores of f16 vectors can create NvCast nodes from various
operand types to v4f16 or v8f16 depending on patterns in the stored
constants. This patch adds nvcast rules with v4f16 and v8f16 values.
AArchISelLowering::LowerBUILD_VECTOR has the details on which constant
patterns generate the nvcast nodes.
Reviewers: jmolloy, srhines, ab
Subscribers: rengolin, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D9201
llvm-svn: 235610
Summary:
Set operation action for SINT_TO_FP and UINT_TO_FP nodes with v4i32,
v8i8, v8i16 inputs to allow promotion of v4f16 results.
Add tests for sitofp and uitofp for vec4, vec8, vec16, and i8, i16, i32,
and i64 vectors. Only missing tests are for v16i8 and v16i16 as the
shift operations are too complicated to write a proper check sequence.
The conversions from v4i64 to v4f16 do not depend on this patch - v4i64
is split and the conversion gets handled while lowering v2i64. I am
adding a test here for completeness.
Reviewers: aemerson, rengolin, ab, jmolloy, srhines
Subscribers: rengolin, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D9166
llvm-svn: 235609
Third time's the charm. The previous commit was reverted as a
reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--'
on an iterator at the beginning of a vector, causing asserts
when using debugging iterators. This commit fixes that.
llvm-svn: 235608
Patch to remove extra bitcasts from shuffles, this is often a legacy of XformToShuffleWithZero being used to combine bitmaskings (of float vectors bitcast to integer vectors) into shuffles: bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1)
Differential Revision: http://reviews.llvm.org/D9097
llvm-svn: 235578
This is a re-commit of r235101, which also fixes the problems with the previous patch:
- Switches with only a default case and non-fallthrough were handled incorrectly
- The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here.
> This is a major rewrite of the SelectionDAG switch lowering. The previous code
> would lower switches as a binary tre, discovering clusters of cases
> suitable for lowering by jump tables or bit tests as it went along. To increase
> the likelihood of finding jump tables, the binary tree pivot was selected to
> maximize case density on both sides of the pivot.
>
> By not selecting the pivot in the middle, the binary trees would not always
> be balanced, leading to performance problems in the generated code.
>
> This patch rewrites the lowering to search for clusters of cases
> suitable for jump tables or bit tests first, and then builds the binary
> tree around those clusters. This way, the binary tree will always be balanced.
>
> This has the added benefit of decoupling the different aspects of the lowering:
> tree building and jump table or bit tests finding are now easier to tweak
> separately.
>
> For example, this will enable us to balance the tree based on profile info
> in the future.
>
> The algorithm for finding jump tables is quadratic, whereas the previous algorithm
> was O(n log n) for common cases, and quadratic only in the worst-case. This
> doesn't seem to be major problem in practice, e.g. compiling a file consisting
> of a 10k-case switch was only 30% slower, and such large switches should be rare
> in practice. Compiling e.g. gcc.c showed no compile-time difference. If this
> does turn out to be a problem, we could limit the search space of the algorithm.
>
> This commit also disables all optimizations during switch lowering in -O0.
>
> Differential Revision: http://reviews.llvm.org/D8649
llvm-svn: 235560
This removes the -sehprepare flag and makes __C_specific_handler
functions always to use WinEHPrepare.
This was tested by building all of chromium_builder_tests and running a
few tests that use SEH, but if something breaks, we can revert this.
llvm-svn: 235557
In particular, this handles SSA values that are live *out* of a handler.
The existing code only handles values that are live *in* to a handler.
It also handles phi nodes in the block where normal control should
resume after the end of a catch handler. When EH return points have phi
nodes, we need to split the return edge. It is impossible for phi
elimination to emit copies in the previous block if that block gets
outlined. The indirectbr that we leave in the function is only notional,
and is eliminated from the MachineFunction CFG early on.
Reviewers: majnemer, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D9158
llvm-svn: 235545
Summary:
Remove the CHECK-DAG calls introduced in r235341, and add a comment that
this test may break due to scheduling variations.
This patch completes the fix discussed in http://reviews.llvm.org/D8804
Reviewers: dsanders, srhines
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9178
llvm-svn: 235530
This fixes a regression introduced at revision 218263.
On AVX, if we optimize for size, a splat build_vector of a load
is lowered into a VBROADCAST node. This is done even if the value type of the
splat build_vector node is v2i64.
Since AVX doesn't support v2f64/v2i64 broadcasts, revision 218263 added two
extra tablegen patterns to allow selecting a VMOVDDUPrm from an X86VBroadcast
where the scalar element comes from a loadi64/loadf64.
However, revision 218263 forgot to add an extra fallback pattern for the case
where we have a X86VBroadcast of a loadi64 with multiple uses.
This patch adds the missing tablegen pattern in X86InstrSSE.td.
This patch also adds an extra test to 'splat-for-size.ll' to verify that ISel
doesn't crash with a 'fatal error in the backend' due to a missing AVX pattern
to select v2i64 X86ISD::BROADCAST nodes.
llvm-svn: 235509
This turned up after r235333, but was a pre-existing bug. The optimization
which transforms select(c, load, load) into a load of a select of the addresses
does not handle indexed loads (pre/post inc/dec). However, it did not check for
them either, leading to a crash if it tried to transform one of them.
llvm-svn: 235497
Enough concerns were raised that this optimization is pessimising some code patterns.
The obvious fix, to add a Reassociate run afterwards, causes even more pessimisation in some cases due to fewer complex addressing modes being matched. As there isn't a trivial fix for this, backing this out by default until someone gets a chance to fix the addressing mode matcher.
llvm-svn: 235491
X86 backend.
The code generated for symbolic targets is identical to the code generated for
constant targets, except that a relocation is emitted to fix up the actual
target address at link-time. This allows IR and object files containing
patchpoints to be cached across JIT-invocations where the target address may
change.
llvm-svn: 235483
With SSE2, we can generate a 'movq' or other 64-bit store op on a 32-bit system
even though 64-bit integers are not legal types.
So instead of producing this:
pshufd $229, %xmm0, %xmm1 ## xmm1 = xmm0[1,1,2,3]
movd %xmm0, (%eax)
movd %xmm1, 4(%eax)
We can do:
movq %xmm0, (%eax)
This is a fix for the problem noted in D7296.
Differential Revision: http://reviews.llvm.org/D9134
llvm-svn: 235460
We should also teach the inliner to collapse framerecover of
frameaddress of the current frame down to an alloca, but that can happen
later.
llvm-svn: 235459
Keep the old SEH fan-in lowering on by default for now, since projects
rely on it. This will make it easy to test this change with a simple
flag flip.
llvm-svn: 235399
There doesn't seem to be a reason to perform this target ISD node matching
in an DAGCombine, moving it to lowering fixes PR23296.
Differential Revision: http://reviews.llvm.org/D9137
llvm-svn: 235394
Summary:
The 64-bit version of the variable shift instructions uses the
shift_rotate_reg class which uses a GPR32Opnd to specify the variable
shift amount. With this patch we avoid the generation of a redundant
SLL instruction for the variable shift instructions in 64-bit targets.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7413
llvm-svn: 235376
This is an updated version of Chandler's patch D7402 that got accepted but never committed, and has bit-rotted a bit since.
I've updated the execution domain declarations to match the approach of the packed templates and also added some extra scalar unary tests.
Differential Revision: http://reviews.llvm.org/D9095
llvm-svn: 235372
Fixed issue with the combine of CONCAT_VECTOR of 2 BUILD_VECTOR nodes - the optimisation wasn't ensuring that the scalar operands of both nodes were the same type/size for implicit truncation.
Test case spotted by Patrik Hagglund
llvm-svn: 235371
Summary:
This fixes http://llvm.org/bugs/show_bug.cgi?id=16439.
This is one possible way to approach this. The other would be to split InL>>(nbits-Amt) into (InL>>(nbits-1-Amt))>>1, which is also valid since since we only need to care about Amt up nbits-1. It's hard to tell which one is better since the shift might be expensive if this stage of expansion is not yet a legal machine integer, whereas comparisons with zero are relatively cheap at all sizes, but more expensive than a shift if the shift is on a legal machine type.
Patch by Keno Fischer!
Test Plan: regression test from http://reviews.llvm.org/D7752
Reviewers: chfast, resistor
Reviewed By: chfast, resistor
Subscribers: sanjoy, resistor, chfast, llvm-commits
Differential Revision: http://reviews.llvm.org/D4978
llvm-svn: 235370
X86ISD::ADDSUB, X86ISD::(F)HADD, X86ISD::(F)HSUB should not be selected
if the operand types do not match the result type because vector type
legalization cannot deal with this for custom nodes.
Testcase X86ISD::ADDSUB is attached. I could not create a testcase for
the FHADD/FHSUB cases because of: https://llvm.org/bugs/show_bug.cgi?id=23296
Differential Revision: http://reviews.llvm.org/D9120
llvm-svn: 235367
Summary:
In the f16-promote test, make the checks for native conversion instructions
similar to the libcall checks:
- Remove hard coded register names
- Do not check exact instruction sequences.
This fixes test flakiness due to non-determinism in instruction
scheduling and register allocation. I also fixed a few minor things in
the CHECK-LIBCALL checks.
I'll try to find a way to check that unnecessary loads, stores, or
conversions don't happen.
Reviewers: mzolotukhin, srhines, ab
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9112
llvm-svn: 235363