When LazyValueInfo uses @llvm.assume intrinsics to provide edge-value
constraints, we should check for intrinsics that dominate the edge's branch,
not just any potential context instructions. An assumption that dominates the
edge's branch represents a truth on that edge. This is specifically useful, for
example, if multiple predecessors assume a pointer to be nonnull, allowing us
to simplify a later null comparison.
The test case, and an initial patch, were provided by Philip Reames. Thanks!
llvm-svn: 219688
e.g Currently we'll generate following instructions if the immediate is too wide:
MOV X0, WideImmediate
ADD X1, BaseReg, X0
LDR X2, [X1, 0]
Using [Base+XReg] addressing mode can save one ADD as following:
MOV X0, WideImmediate
LDR X2, [BaseReg, X0]
Differential Revision: http://reviews.llvm.org/D5477
llvm-svn: 219665
This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block
and a test to check that this condition is handled.
llvm-svn: 219656
We assumed that negation operations of the form (0 - %Z) resulted in a
negative number. This isn't true if %Z was originally negative.
Substituting the negative number into the remainder operation may result
in undefined behavior because the dividend might be INT_MIN.
This fixes PR21256.
llvm-svn: 219639
We have a transform that changes:
(x lshr C1) udiv C2
into:
x udiv (C2 << C1)
However, it is unsafe to do so if C2 << C1 discards any of C2's bits.
This fixes PR21255.
llvm-svn: 219634
Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is
possible for a 64-bit multiply-accumulate instruction in AArch64 state to
generate an incorrect result. The details are quite complex and hard to
determine statically, since branches in the code may exist in some
circumstances, but all cases end with a memory (load, store, or prefetch)
instruction followed immediately by the multiply-accumulate operation.
The safest work-around for this issue is to make the compiler avoid emitting
multiply-accumulate instructions immediately after memory instructions and the
simplest way to do this is to insert a NOP.
This patch implements such work-around in the backend, enabled via the option
-aarch64-fix-cortex-a53-835769.
The work-around code generation is not enabled by default.
llvm-svn: 219603
Summary: [asan-asm-instrumentation] Fixed memory references which includes %rsp as a base or an index register.
Reviewers: eugenis
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5599
llvm-svn: 219602
This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning.
This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.) but by performing this in the lowest foldMemoryOperandImpl implementation it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too.
Differential Revision: http://reviews.llvm.org/D5701
llvm-svn: 219584
Consider the case where X is 2. (2 <<s 31)/s-2147483648 is zero but we
would fold to X. Note that this is valid when we are in the unsigned
domain because we require NUW: 2 <<u 31 results in poison.
This fixes PR21245.
llvm-svn: 219568
consider:
C1 = INT_MIN
C2 = -1
C1 * C2 overflows without a doubt but consider the following:
%x = i32 INT_MIN
This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1.
N. B. Move the unsigned version of this transform to InstSimplify, it
doesn't create any new instructions.
This fixes PR21243.
llvm-svn: 219567
consider:
mul i32 nsw %x, -2147483648
this instruction will not result in poison if %x is 1
however, if we transform this into:
shl i32 nsw %x, 31
then we will be generating poison because we just shifted into the sign
bit.
This fixes PR21242.
llvm-svn: 219566
Summary: Implement the most basic form of conditional branches in Mips fast-isel.
Test Plan:
br1.ll
run 4 flavors of test-suite. mips32 r1/r2 and at -O0/O2
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D5583
llvm-svn: 219556
The LLVM Lang Ref states for signed/unsigned int to float conversions:
"If the value cannot fit in the floating point value, the results are undefined."
And for FP to signed/unsigned int:
"If the value cannot fit in ty2, the results are undefined."
This matches the C definitions.
The existing behavior pins to infinity or a max int value, but that may just
lead to more confusion as seen in:
http://llvm.org/bugs/show_bug.cgi?id=21130
Returning undef will hopefully lead to a less silent failure.
Differential Revision: http://reviews.llvm.org/D5603
llvm-svn: 219542
Currently this only functions to match simple cases
where ds_read2_* / ds_write2_* instructions can be used.
In the future it might match some of the other weird
load patterns, such as direct to LDS loads.
Currently enabled only with a subtarget feature to enable
easier testing.
llvm-svn: 219533
It also makes it more aggressive in querying range information by
adding a call to isKnownPredicateWithRanges to
isLoopBackedgeGuardedByCond and isLoopEntryGuardedByCond.
phabricator: http://reviews.llvm.org/D5638
Reviewed by: atrick, hfinkel
llvm-svn: 219532
I was quiet surprised to find this feature being used. Fortunately the uses
I found look fairly simple. In fact, they are just a very verbose version
of the regular ar commands.
Start implementing it then by parsing the script and setting the command
variables as if we had a regular command line.
This patch adds just enough support to create an empty archive and do a bit
of error checking. In followup patches I will implement at least addmod
and addlib.
From the description in the manual, even the more general case should not
be too hard to implement if needed. The features that don't map 1:1 to
the simple command line are
* Reading from multiple archives.
* Creating multiple archives.
llvm-svn: 219521
ScalarEvolution in the presence of multiple exits. Previously all
loops exits had to have identical counts for a loop trip count to be
considered computable. This pessimization was implemented by calling
getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock)
inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME
in the comments of that function). The pessimization was added to fix
a corner case involving undefined behavior (pr/16130). This patch more
precisely handles the undefined behavior case allowing the pessimization
to be removed.
ControlsExit replaces IsSubExpr to more precisely track the case where
undefined behavior is expected to occur. Because undefined behavior is
tracked more precisely we can remove MustExit from ExitLimit. MustExit
was used to track the case where the limit was computed potentially
assuming undefined behavior even if undefined behavior didn't necessarily
occur.
llvm-svn: 219517
Fixes a logic error in the MachineScheduler found by Steve Montgomery (and
confirmed by Andy). This has gone unfixed for months because the fix has been
found to introduce some small performance regressions. However, Andy has
recommended that, at this point, we fix this to avoid further dependence on the
incorrect behavior (and then follow-up separately on any regressions), and I
agree.
Fixes PR18883.
llvm-svn: 219512
Summary: Add the ability to convert 64 or 32 bit floating point values to integer in mips fast-isel
Test Plan:
fpintconv.ll
ran 4 flavors of test-suite with no errors, misp32 r1/r2 O0/O2
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits, rfuhler, mcrosier
Differential Revision: http://reviews.llvm.org/D5562
llvm-svn: 219511
This change depends on the ApplePropertyString helper that I sent spearately.
Not sure how you want this tested: as a tool test by adding a binary to dump, or as an llvm test starting from an IR file?
Reviewers: dblaikie, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5689
llvm-svn: 219507
DW_AT_specification and DW_AT_abstract_origin resolving was only performed
on subroutine DIEs because it used the getSubroutineName method. Introduce
a more generic getName() and use it to dump the reference attributes.
Testcases have been updated to check the printed names instead of the offsets
except when the name could be ambiguous.
Reviewers: dblaikie, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5625
llvm-svn: 219506
instead
We used to transform this:
define void @test6(i1 %cond, i8* %ptr) {
entry:
br i1 %cond, label %bb1, label %bb2
bb1:
br label %bb2
bb2:
%ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
store i8 2, i8* %ptr.2, align 8
ret void
}
into this:
define void @test6(i1 %cond, i8* %ptr) {
%ptr.2 = select i1 %cond, i8* null, i8* %ptr
store i8 2, i8* %ptr.2, align 8
ret void
}
because the simplifycfg transformation into selects would happen to happen
before the simplifycfg transformation that removes unreachable control flow
(We have 'unreachable control flow' due to the store to null which is undefined
behavior).
The existing transformation that removes unreachable control flow in simplifycfg
is:
/// If BB has an incoming value that will always trigger undefined behavior
/// (eg. null pointer dereference), remove the branch leading here.
static bool removeUndefIntroducingPredecessor(BasicBlock *BB)
Now we generate:
define void @test6(i1 %cond, i8* %ptr) {
store i8 2, i8* %ptr.2, align 8
ret void
}
I did not see any impact on the test-suite + externals.
rdar://18596215
llvm-svn: 219462
Long section names are represented as a slash followed by a numeric
ASCII string. This number is an offset into a string table.
Print the appropriate entry in the string table instead of the less
enlightening /4.
N.B. yaml2obj already does the right thing, this test exercises both
sides of the (de-)serialization.
llvm-svn: 219458
This patch changes the fast-math implementation for calculating sqrt(x) from:
y = 1 / (1 / sqrt(x))
to:
y = x * (1 / sqrt(x))
This has 2 benefits: less code / faster code and one less estimate instruction
that may lose precision.
The only target that will be affected (until http://reviews.llvm.org/D5658 is approved)
is PPC. The difference in codegen for PPC is 2 less flops for a single-precision sqrtf
or vector sqrtf and 4 less flops for a double-precision sqrt.
We also eliminate a constant load and extra register usage.
Differential Revision: http://reviews.llvm.org/D5682
llvm-svn: 219445
The current implementation of GPR->FPR register moves uses a stack slot. This mechanism writes a double word and reads a word. In big-endian the load address must be displaced by 4-bytes in order to get the right value. In little endian this is no longer required. This patch fixes the issue and adds LE regression tests to fast-isel-conversion which currently expose this problem.
llvm-svn: 219441
LLVM assumes INSERT_SUBREG will always have register operands, so
we need to legalize non-register operands, like FrameIndexes, to
avoid random assertion failures.
llvm-svn: 219420
The VSX instruction definitions for lxsdx, lxvd2x, lxvdsx, and lxvw4x
incorrectly use the XForm_1 instruction format, rather than the
XX1Form instruction format. This is likely a pasto when creating
these instructions, which were based on lvx and so forth. This patch
uses the correct format.
The existing reformatting test (test/MC/PowerPC/vsx.s) missed this
because the two formats differ only in that XX1Form has an extension
to the target register field in bit 31. The tests for these
instructions used a target register of 7, so the default of 0 in bit
31 for XForm_1 didn't expose a problem. For register numbers 32-63
this would be noticeable. I've changed the test to use higher
register numbers to verify my change is effective.
llvm-svn: 219416
This patch fixes a bug in method InstCombiner::FoldCmpCstShrCst where we
wrongly computed the distance between the highest bits set of two negative
values.
This fixes PR21222.
Differential Revision: http://reviews.llvm.org/D5700
llvm-svn: 219406
This adds the Pat<>'s for the intrinsics. These are necessary because we
don't lower these intrinsics to SDNodes but match them directly. See the
rational in the previous commit.
llvm-svn: 219362
These derive from the new asm-only masking definitions.
Unfortunately I wasn't able to find a ISel pattern that we could legally
generate for the masking variants. The problem is that since the destination
is v4* we would need VK4 register classes and v4i1 value types to express the
masking. These are however not legal types/classes in AVX512f but only in VL,
so things get complicated pretty quickly. We can revisit this question later
if we have a more pressing need to express something like this.
So the ISel patterns are empty for the masking instructions and the next patch
will add Pat<>s instead to match the intrinsics calls with instructions.
llvm-svn: 219361
Summary:
I had forgotten to check for NotSlowIncDec in the patterns that can generate
inc/dec for the above pattern (added in D4796).
This currently applies to Atom Silvermont, KNL and SKX.
Test Plan: New checks on atomic_mi.ll
Reviewers: jfb, nadav
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5677
llvm-svn: 219336
A function with discardable linkage cannot be discarded if its a member
of a COMDAT group without considering all the other COMDAT members as
well. This sort of thing is already handled by GlobalOpt/GlobalDCE.
This fixes PR21206.
llvm-svn: 219335
The icmp-select-icmp optimization targets select-icmp.eq
only. This is now ensured by testing the branch predicate
explictly. This commit also includes the test case for pr21199.
llvm-svn: 219282
COFF normally doesn't allow us to describe the alignment of COMMON
symbols.
It turns out that most linkers use the symbol size as a hint as to how
aligned the symbol should be.
However the BFD folks have added a .drectve command, which we
now support as of r219229, that allows us to specify the alignment
precisely. With this in mind, stop rounding sizes up.
llvm-svn: 219281
Summary:
Fix pr21099
The pseudocode of what we were doing (spread through two functions) was:
if (operand.doesNotFitIn32Bits())
Opc.initializeWithFoo();
if (operand < 0)
operand = -operand;
if (operand.doesFitIn8Bits())
Opc.initializeWithBar();
else if (operand.doesFitIn32Bits())
Opc.initializeWithBlah();
doStuff(Opc);
So for operand == INT32_MIN, Opc was never initialized because the operand changes
from fitting in 32 bits to not fitting, causing the various bugs/error messages
noted by pr21099.
This patch adds an extra test at the beginning for this case, and an
llvm_unreachable to have better error message if the operand ends up
not fitting in 32-bits at the end.
Test Plan: new test + make check
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5655
llvm-svn: 219257
This is somewhat the inverse of how similar bugs in DAE and ArgPromo
manifested and were addressed. In those passes, individual call sites
were visited explicitly, and then the old function was deleted. This
left the debug info with a null llvm::Function* that needed to be
updated to point to the new function.
In the case of DFSan, it RAUWs the old function with the wrapper, which
includes debug info. So now the debug info refers to the wrapper, which
doesn't actually have any instructions with debug info in it, so it is
ignored entirely - resulting in a DW_TAG_subprogram with no high/low pc,
etc. Instead, fix up the debug info to refer to the original function
after the RAUW messed it up.
Reviewed/discussed with Peter Collingbourne on the llvm-dev mailing
list.
llvm-svn: 219249
`LoopUnrollPass` says that it preserves `LoopInfo` -- make it so. In
particular, tell `LoopInfo` about copies of inner loops when unrolling
the outer loop.
Conservatively, also tell `ScalarEvolution` to forget about the original
versions of these loops, since their inputs may have changed.
Fixes PR20987.
llvm-svn: 219241
Added a FIXME coment instead, we need to handle the case where the
two DS instructions being compared have different numbers of operands.
llvm-svn: 219236
The GNU linker supports an -aligncomm directive that allows for power-of-2
alignment of common data. Add support to emit this directive.
llvm-svn: 219229
This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block
to a select or a couple of selects (depending if the default block is reachable or not).
The typical case this optimization wants to be able to optimize is this one:
Example:
switch (a) {
case 10: %0 = icmp eq i32 %a, 10
return 10; %1 = select i1 %0, i32 10, i32 4
case 20: ----> %2 = icmp eq i32 %a, 20
return 2; %3 = select i1 %2, i32 2, i32 %1
default:
return 4;
}
It also sets the base for further optimizations that are planned and being reviewed.
llvm-svn: 219223
After some stellar (& inspired) help from Reid Kleckner providing a test
case for some rather unstable undefined behavior showing up as
assertions produced by r214761, I was able to fix this issue in DAE
involving the application of both varargs removal, followed by normal
argument removal.
Indeed I introduced this same bug into ArgumentPromotion (r212128) by
copying the code from DAE, and when I fixed the bug in ArgPromo
(r213805) and commented in that patch that I didn't need to address the
same issue in DAE because it was a single pass. Turns out it's two pass,
one for the varargs and one for the normal arguments, so the same fix is
needed (at least during varargs removal). So here it is.
(the observable/net effect of this bug, even when it didn't result in
assertion failure, is that debug info would describe the DAE'd function
in the abstract, but wouldn't provide high/low_pc, variable locations,
line table, etc (it would appear as though the function had been
entirely optimized away), see the original PR14016 for details of the
general problem)
I'm not recommitting the assertion just yet, as there's been another
regression of it since I last tried. It might just be a few test cases
weren't adequately updated after Adrian or Duncan's recent schema
changes.
llvm-svn: 219210
Summary:
According to the ABI documentation, f128 and {f128} should both be returned
in $f0 and $f2. However, this doesn't match GCC's behaviour which is to
return f128 in $f0 and $f2, but {f128} in $f0 and $f1.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5578
llvm-svn: 219196
Unfortunately, this isn't easy to fix since there's no simple way to figure out from the disassembler tables whether the W-bit is being used to select a 64-bit GPR or if its a required part of the opcode. The fix implemented here just looks for "64" in the instruction name and ignores the W-bit in 32-bit mode if its present.
Fixes PR21169.
llvm-svn: 219194
The plugin API doesn't have the notion of linkonce, only weak. It is up to the
plugin to figure out if a symbol used only for the symbol table can be dropped.
In particular, it has to avoid dropping a linkonce_odr selected by gold if there
is also a weak_odr.
llvm-svn: 219188
The code already folds sign-/zero-extends, but only if they are arguments to
mul and shift instructions. This extends the code to also fold them when they
are direct inputs.
llvm-svn: 219187
Tiny enhancement to the address computation code to also fold sub instructions
if the rhs is constant and can be folded into the offset.
llvm-svn: 219186
This commit fixes an issue with sign-/zero-extending loads that was discovered
by Richard Barton.
We use now the correct load instructions for sign-extending loads to 64bit. Also
updated and added more unit tests.
llvm-svn: 219185
The call to copyAttributesFrom will copy the visibility, which might assert
if it were to produce something invalid like "internal hidden". We avoid it
by first creating the replacement with the original linkage and then setting
it to internal affter the call to copyAttributesFrom.
llvm-svn: 219184
When creating an internal function replacement for use in an alias we were
not remapping the argument uses in the instructions to point to the new
arguments.
llvm-svn: 219177
Takes care of the assert that caused build fails.
Rather than asserting the code checks now that the definition
and use are in the same block, and does not attempt
to optimize when that is not the case.
llvm-svn: 219175
The patch's author points out that, despite the function's documentation,
getSetCCResultType is only used to get the SETCC result type (with one
here-removed problematic exception). In one case, getSetCCResultType was being
used to get the predicate type to use for a SELECT node, and then
SIGN_EXTENDing (or truncating) to get the input predicate to match that type.
Unfortunately, this was happening inside visitSIGN_EXTEND, and creating new
SIGN_EXTEND nodes was causing an infinite loop. In addition, this behavior was
wrong if a target was not using ZeroOrNegativeOneBooleanContent. Lastly, the
extension/truncation seems unnecessary here: SELECT is defined as:
Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not i1
then the high bits must conform to getBooleanContents.
So here we remove this use of getSetCCResultType and update
getSetCCResultType's documentation to reflect its actual uses.
Patch by deadal nix!
llvm-svn: 219141
This reverts r218944, which reverted r218714, plus a bug fix.
Description of the bug in r218714 (by Nick)
The original patch forgot to check if the Scale in VariableGEPIndex flipped the
sign of the variable. The BasicAA pass iterates over the instructions in the
order they appear in the function, and so BasicAliasAnalysis::aliasGEP is
called with the variable it first comes across as parameter GEP1. Adding a
%reorder label puts the definition of %a after %b so aliasGEP is called with %b
as the first parameter and %a as the second. aliasGEP later calculates that %a
== %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first
parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) -
ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly
conclude that %a > %b.
Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug.
Slightly modified by me to add an early exit from the loop and avoid
unnecessary, but expensive, function calls.
Original commit message:
Two related things:
1. Fixes a bug when calculating the offset in GetLinearExpression. The code
previously used zext to extend the offset, so negative offsets were converted
to large positive ones.
2. Enhance aliasGEP to deduce that, if the difference between two GEP
allocations is positive and all the variables that govern the offset are also
positive (i.e. the offset is strictly after the higher base pointer), then
locations that fit in the gap between the two base pointers are NoAlias.
Patch by Nick White!
llvm-svn: 219135
Codeview line tables for functions in different sections refer to a common
STRING_TABLE_SUBSECTION for filenames.
This happens when building with -Gy or with inline functions with MSVC.
Original patch by Jeff Muizelaar!
llvm-svn: 219125
We used to return PartialAlias if *either* variable being queried interacted
with arguments or globals. AFAICT, we can change this to only returning
MayAlias iff *both* variables being queried interacted with arguments or
globals.
Also, adding some basic functionality tests: some basic IPA tests, checking
that we give conservative responses with arguments/globals thrown in the mix,
and ensuring that we trace values through stores and loads.
Note that saying that 'x' interacted with arguments or globals means that the
Attributes of the StratifiedSet that 'x' belongs to has any bits set.
Patch by George Burgess IV, thanks!
llvm-svn: 219122
output of the llvm-dwarfdump and llvm-objdump report the endianness
used when the object files were generated.
Patch by Charlie Turner.
llvm-svn: 219110
Particularly, it addresses cases where Reassociate breaks Subtracts but then fails to optimize combinations like I1 + -I2 where I1 and I2 have the same rank and are identical.
Patch by Dmitri Shtilman.
llvm-svn: 219092
This trades a (register-renamer-friendly) movaps for a floating point
/ integer domain cross. That is a very bad trade, even on architectures
where domain crossing is relatively fast. On any chip where there is
even a cycle stall, this is a Very Bad Idea. It doesn't even seem likely
to cause a spill to be introduced because the reason for the copy is to
destructively shuffle in place.
Thanks to Ben Kramer for fixing a bug in this code that my new shuffle
lowering exposed and highlighting that perhaps it should just go away.
=]
llvm-svn: 219090
that are unused.
This allows the combiner to delete math feeding shuffles where the math
isn't actually necessary. This improves some of the vperm2x128 tests
that regressed when the vector shuffle lowering started actually
generating vperm instructions rather than forcibly decomposing them.
Sadly, this isn't enough to get this *really* right because we still
form a completely unnecessary permutation. To fix that, we also need to
fold shuffles which just rearrange concatenated or inserted subvectors.
llvm-svn: 219086
new vector shuffle lowering.
This is loosely based on a patch by Marius Wachtler to the PR (thanks!).
I refactored it a bi to use std::count_if and a mutable array ref but
the core idea was exactly right. I also added some direct testing of
this case.
I believe PR21137 is now the only remaining regression.
llvm-svn: 219081
shuffles using AVX and AVX2 instructions. This fixes PR21138, one of the
few remaining regressions impacting benchmarks from the new vector
shuffle lowering.
You may note that it "regresses" many of the vperm2x128 test cases --
these were actually "improved" by the naive lowering that the new
shuffle lowering previously did. This regression gave me fits. I had
this patch ready-to-go about an hour after flipping the switch but
wasn't sure how to have the best of both worlds here and thought the
correct solution might be a completely different approach to lowering
these vector shuffles.
I'm now convinced this is the correct lowering and the missed
optimizations shown in vperm2x128 are actually due to missing
target-independent DAG combines. I've even written most of the needed
DAG combine and will submit it shortly, but this part is ready and
should help some real-world benchmarks out.
llvm-svn: 219079
For any @llvm.assume intrinsic, if there is another which dominates it and uses
the same condition, then it is redundant and can be removed. While this does
not alter the semantics of the @llvm.assume intrinsics, it makes subsequent
handling more efficient (and the resulting IR easier to read).
llvm-svn: 219067
Update the entire regression test suite for the new shuffles. Remove
most of the old testing which was devoted to the old shuffle lowering
path and is no longer relevant really. Also remove a few other random
tests that only really exercised shuffles and only incidently or without
any interesting aspects to them.
Benchmarking that I have done shows a few small regressions with this on
LNT, zero measurable regressions on real, large applications, and for
several benchmarks where the loop vectorizer fires in the hot path it
shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy
Bridge machines. Running on AMD machines shows even more dramatic
improvements.
When using newer ISA vector extensions the gains are much more modest,
but the code is still better on the whole. There are a few regressions
being tracked (PR21137, PR21138, PR21139) but by and large this is
expected to be a win for x86 generated code performance.
It is also more correct than the code it replaces. I have fuzz tested
this extensively with ISA extensions up through AVX2 and found no
crashes or miscompiles (yet...). The old lowering had a few miscompiles
and crashers after a somewhat smaller amount of fuzz testing.
There is one significant area where the new code path lags behind and
that is in AVX-512 support. However, there was *extremely little*
support for that already and so this isn't a significant step backwards
and the new framework will probably make it easier to implement lowering
that uses the full power of AVX-512's table-based shuffle+blend (IMO).
Many thanks to Quentin, Andrea, Robert, and others for benchmarking
assistance. Thanks to Adam and others for help with AVX-512. Thanks to
Hal, Eric, and *many* others for answering my incessant questions about
how the backend actually works. =]
I will leave the old code path in the tree until the 3 PRs above are at
least resolved to folks' satisfaction. Then I will rip it (and 1000s of
lines of code) out. =] I don't expect this flag to stay around for very
long. It may not survive next week.
llvm-svn: 219046
the various ways in which blends can be used to do vector element
insertion for lowering with the scalar math instruction forms that
effectively re-blend with the high elements after performing the
operation.
This then allows me to bail on the element insertion lowering path when
we have SSE4.1 and are going to be doing a normal blend, which in turn
restores the last of the blends lost from the new vector shuffle
lowering when I got it to prioritize insertion in other cases (for
example when we don't *have* a blend instruction).
Without the patterns, using blends here would have regressed
sse-scalar-fp-arith.ll *completely* with the new vector shuffle
lowering. For completeness, I've added RUN-lines with the new lowering
here. This is somewhat superfluous as I'm about to flip the default, but
hey, it shows that this actually significantly changed behavior.
The patterns I've added are just ridiculously repetative. Suggestions on
making them better very much welcome. In particular, handling the
commuted form of the v2f64 patterns is somewhat obnoxious.
llvm-svn: 219033
perform a load to use blendps rather than movss when it is available.
For non-loads, blendps is *much* faster. It can execute on two ports in
Sandy Bridge and Ivy Bridge, and *three* ports on Haswell. This fixes
one of the "regressions" from aggressively taking the "insertion" path
in the new vector shuffle lowering.
This does highlight one problem with blendps -- it isn't commuted as
heavily as it should be. That's future work though.
llvm-svn: 219022
C++14 adds new builtin signatures for 'operator delete'. This change allows
new/delete pairs to be removed in C++14 onwards, as they were in C++11 and
before.
llvm-svn: 219014
This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash. The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).
Original commit message follows.
--
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
llvm-svn: 219010
In the X86 backend, matching an address is initiated by the 'addr' complex
pattern and its friends. During this process we may reassociate and-of-shift
into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
shift into the scale of the address.
However as demonstrated by the testcase, this can trigger CSE of not only the
shift and the AND which the code is prepared for but also the underlying load
node. In the testcase this node is sitting in the RecordedNode and MatchScope
data structures of the matcher and becomes a deleted node upon CSE. Returning
from the complex pattern function, we try to access it again hitting an assert
because the node is no longer a load even though this was checked before.
Now obviously changing the DAG this late is bending the rules but I think it
makes sense somewhat. Outside of addresses we prefer and-of-shift because it
may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
example because it create a non-canonical node). We currently don't recognize
addresses during DAGCombiner where arguably this canonicalization should be
performed. On the other hand, having this in the matcher allows us to cover
all the cases where an address can be used in an instruction.
I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible
for initiating the recursive CSE on users
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
is not strictly necessary since the shift is hooked into the visited user. Of
course it's safer to keep the DAG consistent at all times (e.g. for accurate
number of uses, etc.).
So rather than changing the fundamentals, I've decided to continue along the
previous patches and detect the CSE. This patch installs a very targeted
DAGUpdateListener for the duration of a complex-pattern match and updates the
matching state accordingly. (Previous patches used HandleSDNode to detect the
CSE but that's not practical here). The listener is only installed on X86.
I tested that there is no measurable overhead due to this while running
through the spec2k BC files with llc. The only thing we pay for is the
creation of the listener. The callback never ever triggers in spec2k since
this is a corner case.
Fixes rdar://problem/18206171
llvm-svn: 219009
The test Atomics-32.ll was both redundant (all operations are also checked by
atomics.ll at least) and not actually checking correctness (it was not using
FileCheck, just verifying that the compiler does not crash).
llvm-svn: 218997
Summary:
hwsync is only required for seq_cst fences, acquire and release one can use
the cheaper lwsync.
Test Plan: Added some cases to atomics.ll + make check-all
Reviewers: jfb, wschmidt
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5317
llvm-svn: 218995
Summary:
The register names t4-t7 are not available in the N32 and N64 ABIs.
This patch prints a warning, when those names are used in N32/64,
along with a fix-it with the correct register names.
Patch by Vasileios Kalintiris
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5272
llvm-svn: 218989
and MOVSD nodes for single element vector inserts.
This is particularly important because a number of patterns in the
backend detect these patterns and leverage them to simplify things. It
also fixes quite a few of the insertion bad code examples. However, it
regresses a specific area: when available, blendps and blendpd are
*dramatically* faster than movss and movsd respectively. But it doesn't
really work to form the blend logic first because the blends *aren't* as
crazy efficient when the data is coming from memory anyways, and thus
will have a movss or movsd regardless. Also, doing that would block
a bunch of the patterns that this is designed to hit.
So my plan is to go into the patterns for lowering MOVSS and MOVSD and
lower them via blends when available. However that's a pretty invasive
restructuring so it will need to be a follow-up patch.
I have already gone into the patterns to lower MOVSS and MOVSD from
memory using MOVLPD, etc. Without that, several of the test cases
I already have regress.
llvm-svn: 218985
That commit was introduced in order to help investigate a problem in ARM
codegen breaking from commit 202304 (Add a limit to the heuristic that register
allocates instructions in local order). Recent analisys indicated that the
problem no longer exists, so I'm reverting this change.
See PR18996.
llvm-svn: 218981
I got them quite wrong when updating it and had the SSE4.1 run checked
for SSE2 and the SSE2 run checked for SSE4.1. I think everything was
actually generic SSE, but this still seems good to fix. While here,
hoist the triple into the IR and make the flag set a bit more direct in
what it is trying to test.
llvm-svn: 218978
lowering to match VZEXT_MOVL patterns.
I hadn't realized that these had sufficient pattern smarts in the
backend to lower zext-ing from the low element of a vector without it
being a scalar_to_vector node. They do, and this is how to match a bunch
of patterns for movq, movss, etc.
There is a weird propensity to end up using pshufd to place the element
afterward even though it means domain crossing (or rather, to use
xorps+movss to zext the element rather than movq) but that's an
orthogonal problem with VZEXT_MOVL that someone should probably look at.
llvm-svn: 218977
vector to a zero vector for the v2 cases and fix the v4 integer cases to
actually blend from a vector.
There are already seprate tests for the case of inserting from a scalar.
These cases cover a lot of the regressions I've seen in the regression
test suite for the new vector shuffle lowering and specifically cover
the reported lack of using various zext-ing instruction patterns. My
next patch should fix a big chunk of this, but wanted to get a nice
baseline for these patterns in the test cases first.
llvm-svn: 218976
element types to form illegal vector types.
I've added a special SSE1 test case here that makes sure we don't break
this going forward.
llvm-svn: 218974
testing that we generated divps and divss but not in a very systematic
way. There are other tests for widening binary operations already that
make these unnecessary.
The second one seems mostly about testing Atom as well as normal X86,
but despite the comment claiming it is testing a different instruction
sequence, it then tests for exactly the same div instruction sequence!
(The sequence of instructions is actually quite different on Atom, but
not the sequence of div instructions....)
And then it has an "execution" test that simply isn't run? Very strange.
Anyways, none of this is really needed so clean this up.
llvm-svn: 218972
intergrated much more fully into some logical part of the backend to
really understand what it is trying to accomplish and how to update it.
I suspect it no longer holds enough value to be worth having.
llvm-svn: 218950
shufle switch.
I nuked a win64 config from one test as it doesn't really make sense to
cover that ABI specially for generic v2f32 tests...
llvm-svn: 218948
This patch broke 447.dealII on Darwin. I'm currently working on a reduced
test-case, but reverting for now to keep the bots happy.
<rdar://problem/18530107>
llvm-svn: 218944
test cases that will change with the new vector shuffle lowering. This
gives us a nice baseline for deltas against. I've checked and removed
the cases where there were weird register usage being pinned down, and
all of these are extremely pin-pointed tests so fully checking them
seems very appropriate.
llvm-svn: 218941
tighter, more strict FileCheck assertions. Some of these I really like
as they show case exactly what instruction sequences come out of these
microscopic functionality tests.
llvm-svn: 218936
baseline for updates from the new vector shuffle lowering.
I've inspected the results here, and I couldn't find any register
allocation decisions where there should be any realistic way to register
allocate things differently. The closest was the imul test case. If you
see something here you'd like register number variables on, just shout
and I'll add them.
llvm-svn: 218935
need to be updated for the new vector shuffle lowering.
After talking to Adam Nemet, Tim Northover, etc., it seems that testing
MC encodings in the same suite as the basic codegen isn't the right
approach. Instead, we're going to want dedicated MC tests for the
encodings. These encodings are starting to get in my way so I wanted to
cut them out early. The total set of instructions that should have
encoding tests added is:
vpaddd
vsqrtss
vsqrtsd
vmovlhps
vmovhlps
valignq
vbroadcastss
Not too many parts of these tests were even using this. =]
llvm-svn: 218932
Older Book-E cores, such as the PPC 440, support only msync (which has the same
encoding as sync 0), but not any of the other sync forms. Newer Book-E cores,
however, do support sync, and for performance reasons we should allow the use
of the more-general form.
This refactors msync use into its own feature group so that it applies by
default only to older Book-E cores (of the relevant cores, we only have
definitions for the PPC440/450 currently).
llvm-svn: 218923
Summary:
Atomic loads and store of up to the native size (32 bits, or 64 for PPC64)
can be lowered to a simple load or store instruction (as the synchronization
is already handled by AtomicExpand, and the atomicity is guaranteed thanks to
the alignment requirements of atomic accesses). This is exactly what this patch
does. Previously, these were implemented by complex
load-linked/store-conditional loops.. an obvious performance problem.
For example, this patch turns
```
define void @store_i8_unordered(i8* %mem) {
store atomic i8 42, i8* %mem unordered, align 1
ret void
}
```
from
```
_store_i8_unordered: ; @store_i8_unordered
; BB#0:
rlwinm r2, r3, 3, 27, 28
li r4, 42
xori r5, r2, 24
rlwinm r2, r3, 0, 0, 29
li r3, 255
slw r4, r4, r5
slw r3, r3, r5
and r4, r4, r3
LBB4_1: ; =>This Inner Loop Header: Depth=1
lwarx r5, 0, r2
andc r5, r5, r3
or r5, r4, r5
stwcx. r5, 0, r2
bne cr0, LBB4_1
; BB#2:
blr
```
into
```
_store_i8_unordered: ; @store_i8_unordered
; BB#0:
li r2, 42
stb r2, 0(r3)
blr
```
which looks like a pretty clear win to me.
Test Plan:
fixed the tests + new test for indexed accesses + make check-all
Reviewers: jfb, wschmidt, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5587
llvm-svn: 218922
Do not eliminate the frame pointer if there is a stackmap or patchpoint in the
function. All stackmap references should be FP relative.
This fixes PR21107.
llvm-svn: 218920
This patch defines a new iterator for the imported symbols.
Make a change to COFFDumper to use that iterator to print
out imported symbols and its ordinals.
llvm-svn: 218915
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
llvm-svn: 218914
elements as well as integer elements in order to form simpler shuffle
patterns.
This is the primary reason why we were failing to match some of the
2-and-2 floating point shuffles such as PR21140. Even after fixing this
we need to support some extra patterns in the backend in order to match
the resulting X86ISD::UNPCKL nodes into the correct instructions. This
commit should fix PR21140 and includes more comprehensive testing of
insertion patterns in v4 shuffles.
Not all of the added tests are beautiful. For example, we don't have
clever instructions to insert-via-load in the integer domain. There are
also some places where we aren't sufficiently cunning with our use of
movq and movd, but that's future work.
llvm-svn: 218911
When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X.
This can happen in the real world when calculating x ** 3/2. This occurs
in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c.
Differential Revision: http://reviews.llvm.org/D5584
llvm-svn: 218906
floating point and integer domains.
Merge the AVX2 test into it and add an extra RUN line. Generate clean
FileCheck statements with my script. Remove the now merged AVX2 tests.
llvm-svn: 218903
When the flag is given, the command prints out the COFF import table.
Currently only the import table directory will be printed.
I'm going to make another patch to print out the imported symbols.
The implementation of import directory entry iterator in
COFFObjectFile.cpp was buggy. This patch fixes that too.
http://reviews.llvm.org/D5569
llvm-svn: 218891
My commit rL216160 introduced a bug PR21014: IndVars widens code 'for (i = ; i < ...; i++) arr[ CONST - i]' into 'for (i = ; i < ...; i++) arr[ i - CONST]'
thus inverting index expression. This patch fixes it.
Thanks to Jörg Sonnenberger for pointing.
Differential Revision: http://reviews.llvm.org/D5576
llvm-svn: 218867
This file isn't really doing anything useful. Many of the tests that
seem to be combined are also repeats from other test files. Many of the
other tests, despite the comment that they should be combined into
a single shuffle... well... aren't combined into a single shuffle.
=/
llvm-svn: 218862
least seem *slightly* more interesting test wise, although given how
spotily we actually combine anything, I remain somewhat suspicious.
llvm-svn: 218861
checks for all the ISA variants.
If the SSE2 checks here terrify you, good. This is (in large part) the
kind of amazingly bad code that is holding LLVM back when vectorizing on
older ISAs.
At the same time, these tests seem increasingly dubious to me. There are
a very large number of tests and it isn't clear that they are
systematically covering a specific set of functionality. Anyways,
I don't want to reduce testing during the transition, I just want to
consolidate it to where it is easier to manage.
llvm-svn: 218860
file.
Some of these really don't make sense to test -- we're testing for the
*lack* of combining two shuffles into one, presumably because the two
would generate better shuffles in the end. But if you look at the
generated code shown here, in many cases the generated code is, frankly,
terrible. Or we combine any two generated shuffles back into a single
instruction! I've left a FIXME to revisit these decisions.
llvm-svn: 218859
and use the new grouped FileCheck patterns to match them.
No interesting changes yet, but this test is now in proper form to have
the other shuffle combining tests merged into it.
llvm-svn: 218857
The test has to do with DAG combines, and so it doesn't need the new
vector shuffle lowering to be effective. Also, it has a nice in-IR
triple string which we should really be using rather than command line
flags (unless it varies form RUN-line to RUN-line). Finally, I much
prefer letting LLVM synthesize the correct datalayout string from the
triple rather than baking one in here that will just become stale.
llvm-svn: 218856
generic DAG combining of shuffles relevant to x86.
My plan is to fold a bunch of the other DAG combining test cases into
this one, while converting them to use the nice new FileCheck assertion
syntax.
llvm-svn: 218855
a bare-metal triple and have nice BB labels, etc.
No significant change here, just tidying up to have a consistent set of
OS-agnostic vector functionality here.
llvm-svn: 218854
matching and lowering 64-bit insertions.
The first problem was that we weren't looking through bitcasts to
discover that we *could* lower as insertions. Once fixed, we in turn
weren't looking through bitcasts to discover that we could fold a load
into the lowering. Once fixed, we weren't forming a SCALAR_TO_VECTOR
node around the inserted element and instead were passing a scalar to
a DAG node that expected a vector. It turns out there are some patterns
that will "lower" this into the correct asm, but the rest of the X86
backend is very unhappy with such antics.
This should fix a few more edge case regressions I've spotted going
through the regression test suite to enable the new vector shuffle
lowering.
llvm-svn: 218839
Negative FABS of either a scalar or vector should be handled the same way
on x86 with SSE/AVX: a single OR instruction of the FP operand with a
constant to light up the sign bit(s).
http://llvm.org/bugs/show_bug.cgi?id=20578
Differential Revision: http://reviews.llvm.org/D5201
llvm-svn: 218822
test file.
This old test had a bunch of functions that were never even checked. =/
The only thing it really did was to make sure that we did something
reasonable in 32-bit mode with SSE4.1. Adding another run line to the
main vector-sext.ll test seems a better way to do that.
llvm-svn: 218810
of architectures: SSE2, SSSE3, SSE4.1, AVX, and AVX2.
Unfortunately, this exposses the absolute horror of the code we generate
for many of these patterns. Anyone wanting to familiarize themselves
with the x86 backend and improve performance could do a lot of good
sitting down and making these test cases not look so terrible. While the
new vector shuffle code I'm working on well help some, it won't fix all
of the crimes here.
llvm-svn: 218807
As discussed here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html
And again here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html
The sqrt of a negative number when using the llvm intrinsic is undefined.
We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref.
This change should not affect any code that isn't using "no-nans-fp-math";
ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call.
Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and
possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either.
We knowingly approve of this difference with the other compilers in an attempt to flag code
that is invoking undefined behavior.
A front-end warning should also try to convince the user that the program will fail:
http://llvm.org/bugs/show_bug.cgi?id=21093
Differential Revision: http://reviews.llvm.org/D5527
llvm-svn: 218803
These tests are far and away the best sext and zext tests we have for
vectors. I'm going to merge the other similar tests into them and expand
the ISA coverage.
llvm-svn: 218800
script to make them nice and predictable. This will ease updating them
for the new vector shuffle lowering and seeing the delta if any.
llvm-svn: 218795
avx-sext.ll using my new script.
Also add an AVX2 mode to this test.
Part of cleaning up the test suite before enabling the new vector
shuffle lowering. This also highlights some of the abysmal failures of
the old shuffle lowering. Check out those 'pinsrw' and 'pextrw'
sequences!
llvm-svn: 218794
As with x86 and AArch64, certain situations can arise where we need to spill
CPSR in the middle of a calculation. These should be avoided where possible
(MRS/MSR is rather expensive), which ARM is actually better at than the other
two since it tries to Glue defs to uses, but as a last ditch effort, copying is
better than crashing.
rdar://problem/18011155
llvm-svn: 218789
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
Summary: Implement conversion of 64 to 32 bit floating point numbers (fptrunc) in mips fast-isel
Test Plan:
fptrunc.ll
checked also with 4 internal mips build bot flavors mip32r1/miprs32r2 and at -O0 and -O2
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: rfuhler
Differential Revision: http://reviews.llvm.org/D5553
llvm-svn: 218785
r206400 and r209442 added remarks that are disabled by default.
However, if a diagnostic handler is registered, the remarks are sent
unfiltered to the handler. This is the right behaviour for clang, since
it has its own filters.
However, the diagnostic handler exposed in the LTO API receives only the
severity and message. It doesn't have the information to filter by pass
name. For LTO, disabled remarks should be filtered by the producer.
I've changed `LLVMContext::setDiagnosticHandler()` to take a `bool`
argument indicating whether to respect the built-in filters. This
defaults to `false`, so other consumers don't have a behaviour change,
but `LTOCodeGenerator::setDiagnosticHandler()` sets it to `true`.
To make this behaviour testable, I added a `-use-diagnostic-handler`
command-line option to `llvm-lto`.
This fixes PR21108.
llvm-svn: 218784