This provides an implementation of CFL alias analysis (including some
supporting data structures). Currently, we don't have any extremely fancy
features, sans some interprocedural analysis (i.e. no field sensitivity, etc.),
and we do best sitting behind BasicAA + TBAA. In such a configuration, we take
~0.6-0.8% of total compile time, and give ~7-8% NoAlias responses to queries
TBAA and BasicAA couldn't answer when bootstrapping LLVM. In testing this on
other projects, we've seen up to 10.5% of queries dropped by BasicAA+TBAA
answered with NoAlias by this algorithm.
Patch by George Burgess IV (with minor modifications by me -- mostly adapting
some BasicAA tests), thanks!
llvm-svn: 216970
This change moves FastISel for AArch64 to target-dependent instruction selection
only. This change replicates the existing target-independent behavior, therefore
there are no changes to the unit tests or new tests.
Future changes will take advantage of this change and update functionality
and unit tests.
llvm-svn: 216955
This allows the target to disable target-independent instruction selection and
jump directly into the target-dependent instruction selection code.
This can be beneficial for targets, such as AArch64, which could emit much
better code, but never got a chance to do so, because the target-independent
instruction selector was able to find an instruction sequence.
llvm-svn: 216947
We duplicate ~30 lines of code to lower FABS and FNEG for x86, so this patch combines them into one function.
No functional change intended, so no additional test cases. Test-suite behavior is unchanged.
Differential Revision: http://reviews.llvm.org/D5064
llvm-svn: 216942
Summary:
MemoryDependenceAnalysis is currently cautious when the QueryInstr is an atomic
load or store, but I forgot to check for atomic cmpxchg/atomicrmw. This patch
is a way of fixing that, and making it less brittle (i.e. no risk that I forget
another possible kind of atomic, even if the IR ends up changing in the future),
by adding a fallback checking mayReadOrWriteFromMemory.
Thanks to Philip Reames for finding this bug and suggesting this solution in
http://reviews.llvm.org/D4845
Sadly, I don't see how to add a test for this, since the passes depending on
MemoryDependenceAnalysis won't trigger for an atomic rmw anyway. Does anyone
see a way for testing it?
Test Plan: none possible at first sight
Reviewers: jfb, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5019
llvm-svn: 216940
"Setting" does not equal "copying". This bug has sat dormant for 2 reasons:
1. The unit test was not adequate.
2. Every current user of the "copyFastMathFlags" API is operating on a new instruction.
(ie, all existing fast-math flags are off). If you copy flags to an existing
instruction that has some flags on already, you will not necessarily turn them off
as expected.
I uncovered this bug while trying to implement a fix for PR20802.
llvm-svn: 216939
If an fmul was introduced by lowering, it wouldn't be folded
into a multiply by a constant since the earlier combine would
have replaced the fmul with the fadd.
llvm-svn: 216932
This removes static initializers from the backends which generate this data, and also makes this struct match the other Tablegen generated structs in behaviour
Reviewed by Andy Trick and Chandler C
llvm-svn: 216919
When folding a fused multiply-add builtin call, make sure that we propagate the
correct result in the case where the addend is zero, and the two other operands
are finite non-zero.
Example:
define double @test() {
%1 = call double @llvm.fma.f64(double 7.0, double 8.0, double 0.0)
ret double %1
}
Before this patch, the instruction simplifier wrongly folded the builtin call
in function @test to constant 'double 7.0'.
With this patch, method 'fusedMultiplyAdd' correctly evaluates the multiply and
propagates the expected result (i.e. 56.0).
Added test fold-builtin-fma.ll with the reproducible from PR20832 plus extra
test cases to verify the behavior of method 'fusedMultiplyAdd' in the presence
of NaN/Inf operands.
This fixes PR20832.
Differential Revision: http://reviews.llvm.org/D5152
llvm-svn: 216913
Summary:
BBs might contain non-LCSSA'd values after the LCSSA pass is run if they
are unreachable from the entry block.
Normally, the users of the instruction would be PHIs but the unreachable
BBs have normal users; rewrite their uses to be undef values.
An alternative fix could involve fixing this at LCSSA but that would
require this invariant to hold after subsequent transforms. If a BB
created an unreachable block, they would be in violation of this.
This fixes PR19798.
Differential Revision: http://reviews.llvm.org/D5146
llvm-svn: 216911
Summary:
Left shift of negative integer is an undefined behavior, and
is reported by UBSan. It's ok for imm values to be negative, so we can
just replace left shifts with multiplications.
Test Plan: check-llvm test suite
Reviewers: t.p.northover
Reviewed By: t.p.northover
Subscribers: aemerson, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D5132
llvm-svn: 216910
When I recommitted r208640 (in r216898) I added an exclusion for TargetConstant
offsets, as there is no guarantee that a backend can handle them on generic
ADDs (even if it generates them during address-mode matching) -- and,
specifically, applying this transformation directly with TargetConstants caused
a self-hosting failure on PPC64. Ignoring all TargetConstants, however, is less
than ideal. Instead, for non-opaque constants, we can convert them into regular
constants for use with the generated ADD (or SUB).
llvm-svn: 216908
I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The
underlying cause of the failure is that pre-inc loads with increments
represented by ISD::TargetConstants were being transformed into ISD:::ADDs with
ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they
were selected as invalid r+r adds.
This recommits r208640, rebased and with an exclusion for ISD::TargetConstant
increments. This behavior seems correct, although in the future we might want
to ask the target to split out the indexing that uses ISD::TargetConstants.
Unfortunately, I don't yet have small test case where the relevant invalid
'add' instruction is not itself dead (and thus eliminated by
DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things)
Original commit message (by Adam Nemet):
Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.
This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part). See the testcase.
In order to spot the dead load we need to revisit it when SimplifyDemandedBits
decided that the value of the load is masked off. This is the
CommitTargetLoweringOpt piece.
I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.
Fixes <rdar://problem/16031651>
llvm-svn: 216898
r208640 was reverted because it caused a self-hosting failure on ppc64. The
underlying cause was the formation of ISD::ADD nodes with ISD::TargetConstant
operands. Because we have no patterns for 'add' taking 'timm' nodes, these are
selected as r+r add instructions (which is a miscompile). Guard against this
kind of behavior in the future by making the backend crash should this occur
(instead of silently generating invalid output).
llvm-svn: 216897
The structures for Windows unwinding are shared across multiple platforms.
Indicate the encoding to be used for the particular target. Use this to switch
the unwind emitter instantiated by the AsmPrinter.
llvm-svn: 216895
This is an enum class, and will be appropriately prefixed, making the encoding
type prefix redundant. No change to any uses as the use of this was not yet
introduced.
llvm-svn: 216893
SROA may decide that it needs to insert a bitcast and would set it's
insertion point before a PHI. This will create an invalid module
right quick.
Instead, choose the first insertion point in the basic block that holds
our PHI.
This fixes PR20822.
Differential Revision: http://reviews.llvm.org/D5141
llvm-svn: 216891
This reverts commit r216698 which reverted r216523 and r216598.
We would attempt to perform the transformation even if the match()
failed because, as a side effect, it would set V. This would trick us
into believing that we correctly found a place to correctly apply the
transform.
An additional test case was added to getelementptr.ll so that we might
not regress in the future.
llvm-svn: 216890
This change will ease refactoring LowerFABS() and LowerFNEG()
since they have a lot of overlap.
Remove the creation of a floating point constant from an integer
because it's going to be used for a bitwise integer op anyway.
No change to codegen expected, but the verbose comment string
for asm output may change from float values to hex (integer),
depending on whether the constant already exists or not.
Differential Revision: http://reviews.llvm.org/D5052
llvm-svn: 216889
The loop vectorizer preserves wrapping, exact, and fast-math properties of scalar instructions.
This patch adds a convenience method to make that operation easier because we need to do this
in the loop vectorizer, SLP vectorizer, and possibly other places.
Although this is a 'no functional change' patch, I've added a testcase to verify that the exact
flag is preserved by the loop vectorizer. The wrapping and fast-math flags are already checked
in existing testcases.
Differential Revision: http://reviews.llvm.org/D5138
llvm-svn: 216886
This patch implements a few changes related to the Thumb2 M-class MSR instruction:
* better handling of unpredictable encodings,
* recognition of the _g and _nzcvqg variants by the asm parser only if the DSP
extension is available, preferred output of MSR APSR moves with the _<bits>
suffix for v7-M.
Patch by Petr Pavlu.
llvm-svn: 216874
implicit uses of the whole register when a sub register is defined.
Now the same iterator is used in the rematerilization loop as in the
spill loop later.
Patch provided by Mikael Holmen.
This fix was proposed and reviewed by Quentin Colombet,
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/076135.html.
Unfortunately, this error in the rematerilization code has only been
seen in a large test case for an out-of-tree target, and is probably
hard to reproduce on an in-tree target. Therefore, no testcase is
provided.
llvm-svn: 216873
chain became completely broken here as *all* intrinsic users ended up
being skipped, and the ones that seemed to be singled out were actually
the exact wrong set.
This is a great example of why long else-if chains can be easily
confusing. Switch the entire code to use early exits and early continues
to have simpler (and more importantly, correct) logic here, as well as
fixing the reversed logic for detecting and continuing on lifetime
intrinsics.
I've also significantly cleaned up the test case and added another test
case demonstrating an example where the optimization is not (trivially)
safe to perform.
llvm-svn: 216871
Previously, the hint mechanism relied on clean up passes to remove redundant
metadata, which still showed up if running opt at low levels of optimization.
That also has shown that multiple nodes of the same type, but with different
values could still coexist, even if temporary, and cause confusion if the
next pass got the wrong value.
This patch makes sure that, if metadata already exists in a loop, the hint
mechanism will never append a new node, but always replace the existing one.
It also enhances the algorithm to cope with more metadata types in the future
by just adding a new type, not a lot of code.
Re-applying again due to MSVC 2013 being minimum requirement, and this patch
having C++11 that MSVC 2012 didn't support.
Fixes PR20655.
llvm-svn: 216870
This feeds AA through the IFI structure into the inliner so that
AddAliasScopeMetadata can use AA->getModRefBehavior to figure out which
functions only access their arguments (instead of just hard-coding some
knowledge of memory intrinsics). Most of the information is only available from
BasicAA; this is important for preserving alias scoping information for
target-specific intrinsics when doing the noalias parameter attribute to
metadata conversion.
llvm-svn: 216866
I thought that I had fixed this problem in r216818, but I did not do a very
good job. The underlying issue is that when we add alias.scope metadata we are
asserting that this metadata completely describes the aliasing relationships
within the current aliasing scope domain, and so in the context of translating
noalias argument attributes, the pointers must all be based on noalias
arguments (as underlying objects) and have no other kind of underlying object.
In r216818 excluding appropriate accesses from getting alias.scope metadata is
done by looking for underlying objects that are not identified function-local
objects -- but that's wrong because allocas, etc. are also function-local
objects and we need to explicitly check that all underlying objects are the
noalias arguments for which we're adding metadata aliasing scopes.
This fixes the underlying-object check for adding alias.scope metadata, and
does some refactoring of the related capture-checking eligibility logic (and
adds more comments; hopefully making everything a bit clearer).
Fixes self-hosting on x86_64 with -mllvm -enable-noalias-to-md-conversion (the
feature is still disabled by default).
llvm-svn: 216863
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics
in isPostDominatedBy, use the real MachinePostDominatorTree. The old
heuristics caused instructions to sink unnecessarily, and might create
register pressure.
Test Plan:
Added a NVPTX codegen test to verify that our change is in effect. It also
shows the unnecessary register pressure caused by over-sinking. Updated
affected tests in AArch64 and X86.
Reviewers: eliben, meheff, Jiangning
Reviewed By: Jiangning
Subscribers: jholewinski, aemerson, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D4814
llvm-svn: 216862