The main issue being fixed here is that APCS targets handling a "byval align N"
parameter with N > 4 were miscounting what objects were where on the stack,
leading to FrameLowering setting the frame pointer incorrectly and clobbering
the stack.
But byval handling had grown over many years, and had multiple layers of cruft
trying to compensate for each other and calculate padding correctly. This only
really needs to be done once, in the HandleByVal function. Elsewhere should
just do what it's told by that call.
I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits
byvals with the correct C ABI alignment), which simplified HandleByVal.
rdar://20095672
llvm-svn: 231959
DW_AT_low_pc on functions is taken care of by the relocation processing, but
DW_AT_high_pc and DW_AT_low_pc on other lexical scopes need special handling.
llvm-svn: 231955
This is a follow-up to r231182. This adds the "vbroadcasti128" instruction
back, but without the intrinsic mapping. Also add a test to check the
instriction encoding.
This is related to rdar://problem/18742778.
llvm-svn: 231945
Summary:
The generic ELF TargetObjectFile defaults to .ctors, but Linux's
defaults to .init_array by calling InitializeELF with the value of
UseInitArray from TargetMachine. Make NaCl's behavior match.
Reviewers: jvoung
Differential Revision: http://reviews.llvm.org/D8240
llvm-svn: 231934
The CallGraphNode function "addCalledFunction()" asserts that edges are not to intrinsics.
This patch makes sure that the Inliner does not add such an edge to the callgraph.
Fix for clang crash by assertion: https://llvm.org/bugs/show_bug.cgi?id=22857
Differential Revision: http://reviews.llvm.org/D8231
llvm-svn: 231927
As of r231908, the test I added in r231902 actually gets run - but I'd
checked in a stale version of the input so it didn't pass. Fix the
input and un-xfail the test.
llvm-svn: 231911
This causes a crash if the referenced intrinsic was malformed. In this case, we
would already have reported an error on the referenced intrinsic, but then
crashed on the second one when it tried to introspect the first without
error checking.
llvm-svn: 231910
There were also errors in the CHECK line which I fixed and the test
doesn't actually pass as the "100" is in the wrong line. Not sure
whether this is a test failure or a coverage failure so making the test
XFAIL for now.
llvm-svn: 231908
Given that large parts of inst combine is restricted to instructions which have one use, getting rid of a use on the condition can help the effectiveness of the optimizer. Also, it allows the condition to potentially be deleted by instcombine rather than waiting for another pass.
I noticed this completely by accident in another test case. It's not anything that actually came from a real workload.
p.s. We should probably do the same thing for switch instructions.
Differential Revision: http://reviews.llvm.org/D8220
llvm-svn: 231881
There are still 4 tests that check for DW_AT_MIPS_linkage_name,
because they specify DWARF 2 or 3 in the module metadata. So, I didn't
create an explicit version-based test for the attribute.
Differential Revision: http://reviews.llvm.org/D8227
llvm-svn: 231880
This patch adds limited support in ValueTracking for inferring known bits of a value from conditional expressions which must be true to reach the instruction we're trying to optimize. At this time, the feature is off by default. Once landed, I'm hoping for feedback from others on both profitability and compile time impact.
Forms of conditional value propagation have been tried in LLVM before and have failed due to compile time problems. In an attempt to side step that, this patch only considers conditions where the edge leaving the branch dominates the context instruction. It does not attempt full dataflow. Even with that restriction, it handles many interesting cases:
* Early exits from functions
* Early exits from loops (for context instructions in the loop and after the check)
* Conditions which control entry into loops, including multi-version loops (such as those produced during vectorization, IRCE, loop unswitch, etc..)
Possible applications include optimizing using information provided by constructs such as: preconditions, assumptions, null checks, & range checks.
This patch implements two approaches to the problem that need further benchmarking. Approach 1 is to directly walk the dominator tree looking for interesting conditions. Approach 2 is to inspect other uses of the value being queried for interesting comparisons. From initial benchmarking, it appears that Approach 2 is faster than Approach 1, but this needs to be further validated.
Differential Revision: http://reviews.llvm.org/D7708
llvm-svn: 231879
- Use TargetLowering to check for the actual cost of each extension.
- Provide a factorized method to check for the cost of an extension:
TargetLowering::isExtFree.
- Provide a virtual method TargetLowering::isExtFreeImpl for targets to be able
to tune the cost of non-free extensions.
This refactoring offers a better granularity to model what really happens on
different targets.
No performance changes and very few code differences.
Part of <rdar://problem/19267165>
llvm-svn: 231855
This adds new node types for each intrinsic.
For instance, for addv, we have AArch64ISD::UADDV, such that:
(v4i32 (uaddv ...))
is the same as
(v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...))))
that is,
(v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
(i32 (int_aarch64_neon_uaddv ...)), ssub)
In a combine, we transform all such across-vector-lanes intrinsics to:
(i32 (extract_vector_elt (uaddv ...), 0))
This has one big advantage: by making the extract_element explicit, we
enable the existing patterns for lane-aware instructions to fire.
This lets us avoid needlessly going through the GPRs. Consider:
uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) {
return vmulq_n_u32(a, vaddvq_u32(b));
}
We now generate:
addv.4s s1, v1
mul.4s v0, v0, v1[0]
instead of the previous:
addv.4s s1, v1
fmov w8, s1
dup.4s v1, w8
mul.4s v0, v1, v0
rdar://20044838
llvm-svn: 231840
Follow up from r231505.
Fix the non-determinism by using a MapVector and reintroduce the AArch64
testcase. Defer deleting the got candidates up to the end and remove
them in a bulk, avoiding linear time removal of each element.
Thanks to Renato Golin for trying it out on other platforms.
llvm-svn: 231830
The dependences are now expose through the new getInterestingDependences
API so we can use that with -analyze too and fix the FIXME.
This lets us remove the test that relied on -debug to check the
dependences.
llvm-svn: 231807
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.
(Resubmitting this change after not being able to reproduce buildbot failure)
Differential Revision: http://reviews.llvm.org/D7760
llvm-svn: 231800
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.
This is the sibling patch for the Clang half of this change:
http://reviews.llvm.org/D8088
Differential Revision: http://reviews.llvm.org/D8086
llvm-svn: 231794
This crash occurs due to memory corruption when trying to update dependency
direction based on Constraints.
This crash was observed during lnt regression of Polybench benchmark test case dynprog.
Review: http://reviews.llvm.org/D8059
llvm-svn: 231788
This crash in Dependency analysis is because we assume here that in case of UsefulGEP
both source and destination have the same number of operands which may not be true.
This incorrect assumption results in crash while populating Pairs. Fix the same.
This crash was observed during lnt regression for code such as-
struct s{
int A[10][10];
int C[10][10][10];
} S;
void dep_constraint_crash_test(int k,int N) {
for( int i=0;i<N;i++)
for( int j=0;j<N;j++)
S.A[0][0] = S.C[0][0][k];
}
Review: http://reviews.llvm.org/D8162
llvm-svn: 231784
We failed to use a marking set to properly handle recursive types, which caused use
to recurse infinitely and eventually overflow the stack.
llvm-svn: 231760
In this situation we would always have already flagged an error on the statepoint intrinsic,
but then we carry on to parse other, related GC intrinsics, and could end up crashing during that
verification when they try to access data from the malformed statepoint.
llvm-svn: 231759
ReplaceInstUsesWith needs to return nullptr when the input has no users,
because in that case it does not mutate the program. Otherwise, we can
get stuck in an infinite loop of repeatedly attempting to constant fold
and instruction with no users.
llvm-svn: 231755
They mark the start of a compile unit, so name them .Lcu_*. Using
Section->getLabelBeginName() makes it looks like they mark the start of the
section.
While at it, switch to createTempSymbol to avoid collisions with labels
created in inline assembly. Not sure if a "don't crash" test is worth it.
With this getLabelBeginName is dead, delete it.
llvm-svn: 231750
CFLAA didn't know how to properly handle ConstantExprs; it would silently
ignore them. This was a problem if the ConstantExpr is, say, a GEP of a global,
because CFLAA wouldn't realize that there's a global there. :)
llvm-svn: 231743
We now treat pointers given to ptrtoint and pointers retrieved from
inttoptr as similar to arguments or globals (can alias anything, etc.)
This solves some of the problems we were having with giving incorrect
results.
llvm-svn: 231741
It turns out accelerator tables where totally broken if they contained
entries with colliding hashes. The failure mode is pretty bad, as it not
only impacted the colliding entries, but would basically make all the
entries after the first hash collision pointing in the wrong place.
The testcase uses the symbol names that where found to collide during a
clang build.
From a performance point of view, the patch adds a sort and a linear
walk over each bucket contents. While it has a measurable impact on the
accelerator table emission, it's not showing up significantly in clang
profiles (and I'd argue that correctness is priceless :-)).
llvm-svn: 231732
This fixes a subtle issue that was introduced in r205153.
When reusing a store for the extractelement expansion (to load directly
from it, inserting of going through the stack), later stores to the
same location might have overwritten the data we were expecting to
extract from.
To fix that, we need to explicitly replace the chain going out of the
reused store, so that later stores also have an explicit dependency on
the generated element-extracting loads, and can't clobber them.
rdar://20066785
Differential Revision: http://reviews.llvm.org/D8180
llvm-svn: 231721
Fix the double-deletion of AnalysisResolver when delegating through to
Dwarf EH preparation by creating one from scratch. Hopefully the new
pass manager simplifies this.
This reverts commit r229952.
llvm-svn: 231719
Summary:
This removes some duplicated code, and also helps optimization: e.g. in
the test case added, `%idx ULT 128` in `@x` is not currently optimized
to `true` by `-indvars` but will be, after this change.
The only functional change in ths commit is that for add recurrences,
ScalarEvolution::getRange will be more aggressive -- computing the
unsigned (resp. signed) range for a SCEVAddRecExpr will now look at the
NSW (resp. NUW) bits and check for signed (resp. unsigned) overflow.
This can be a strict improvement in some cases (such as the attached
test case), and should be no worse in other cases.
Reviewers: atrick, nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8142
llvm-svn: 231709
Summary:
Unused in this commit, but will be used in a subsequent change (D8142)
by a FileCheck test.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8143
llvm-svn: 231708