Added metadata to be able to make statistics on how many functions
that have been imported have been removed. Also module name might
be helpfull when debugging.
Reviewers: tejohnson, eraman
Subscribers: mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D21943
llvm-svn: 274668
Summary:
Fixes an incorrect assert that fails on 128-bit-sized loads or stores.
Augments the wset tests to include this case.
Reviewers: aizatsky
Subscribers: vitalybuka, zhaoqin, kcc, eugenis, llvm-commits
Differential Revision: http://reviews.llvm.org/D22062
llvm-svn: 274666
Now with a corrected test to account for a recently supported properties bit in the debug info of a struct.
Original review: http://reviews.llvm.org/D21939
This reverts commit 970c3fd497a28d25dd69526eb52594a696c37968.
llvm-svn: 274661
The dse_with_dbg_value.ll test committed with r273141 is removed because this
we no longer performs any type of back tracking, which is what was causing the
codegen differences with and without debug information.
Differential Revision: http://reviews.llvm.org/D21613
llvm-svn: 274660
Text suggested by Daniel Berlin. While it is likely to be exactly what
the advisory committee would do anyway, codifying it does no harm and
helps reassure people that rare does not mean arbitrary.
Differential Revision: http://reviews.llvm.org/D21981
llvm-svn: 274659
This is "cvtdq2ps" which does not appear to be particularly slow on any CPU
according to Agner's tables. Choosing "5" as a cost here as suggested in:
https://llvm.org/bugs/show_bug.cgi?id=21356
...but it seems very conservative given that the instruction is fully pipelined,
and I think these costs are supposed to model throughput.
Note that related costs are also most likely too high, but this fixes PR21356
and partly fixes PR28434.
llvm-svn: 274658
We were still crashing in the "no change" case because LVI was not
getting invalidated.
See the thread "Should analyses be able to hold AssertingVH to IR?
(related to PR28400)" for more discussion.
llvm-svn: 274656
This logic was introduced in r157663 and does not make any sense to me.
The motivating example in rdar://11538365 looks like this:
This is the tail:
BB#16: derived from LLVM BB %if.end68
Live Ins: %R0 %R4 %R5
Predecessors according to CFG: BB#15 BB#5
tBLXi pred:14, pred:%noreg, <ga:@CFRelease>, %R0<kill>, <regmask>, %LR<imp-def,dead>, %SP<imp-use>, %SP<imp-def>
t2B <BB#20>, pred:14, pred:%noreg
Successors according to CFG: BB#20
This is the predBB:
BB#5:
Live Ins: %R5
Predecessors according to CFG: BB#4
%R4<def> = t2MOVi 0, pred:14, pred:%noreg, opt:%noreg
t2B <BB#16>, pred:14, pred:%noreg
Successors according to CFG: BB#16
However this is invalid machine code to begin with, if %R0 is live-in to
BB#16 then it must be live-in to BB#5 as well if BB#5 does not define
it. We should not need logic to retroactively fix broken machine code
and in fact the example from r157663 passes cleanly with the code
removed and I do not see any (newly) failing tests with the machine
verifier enabled.
Differential Revision: http://reviews.llvm.org/D22031
llvm-svn: 274655
Cast cost tables are now sorted, for each cast type, lexicographically on
[source base type, source vector width, dest base type, base vector width].
llvm-svn: 274653
On SystemZ, shift and rotate instructions only use the bottom 6 bits of the shift/rotate amount.
Therefore, if the amount is ANDed with an immediate mask that has all of the bottom 6 bits set, we
can remove the AND operation entirely.
Differential Revision: http://reviews.llvm.org/D21854
llvm-svn: 274650
We were checking for 2 insertions (which is caught earlier in the pattern matching loop) instead of the case where we have no insertions.
Turns out this code never fires as we always try to lower to insertps after trying to lower to blendps, which would catch these cases - I'm about to make some changes to support combining to insertps which could cause this to fire so I don't want to remove it.
llvm-svn: 274648
There is a problem in VSXSwapRemoval where it is incorrectly removing permute instructions.
In this case, the permute is feeding both a vector store and also a non-store instruction. In this case, the permute cannot be removed.
The fix is to simply look at all the uses of the vector register defined by the permute and ensure that all the uses are vector store instructions.
This problem was reported in PR 27735 (https://llvm.org/bugs/show_bug.cgi?id=27735).
Test case based on the original problem reported.
Phabricator Review: http://reviews.llvm.org/D21802
llvm-svn: 274645
Summary:
findBetterNeighborChains may or may not find a better chain for each node it finds, which include the node ("St") that visitSTORE is currently processing. If no better chain is found for St, visitSTORE should continue instead of return SDValue(St, 0), as if it's CombinedTo'ed.
This fixes bug 28130. There might be other ways to make the test pass (see D21409). I think both of the patches are fixing actual bugs revealed by the same testcase.
Reviewers: echristo, wschmidt, hfinkel, kbarton, amehsan, arsenm, nemanjai, bogner
Subscribers: mehdi_amini, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D21692
llvm-svn: 274644
The cost model should not assume vector casts get completely scalarized, since
on targets that have vector support, the common case is a partial split up to
the legal vector size. So, when a vector cast gets split, the resulting casts
end up legal and cheap.
Instead of pessimistically assuming scalarization, base TTI can use the costs
the concrete TTI provides for the split vector, plus a fudge factor to account
for the cost of the split itself. This fudge factor is currently 1 by default,
except on AMDGPU where inserts and extracts are considered free.
Differential Revision: http://reviews.llvm.org/D21251
llvm-svn: 274642
We currently always vectorize induction variables. However, if an induction
variable is only used for counting loop iterations or computing addresses with
getelementptr instructions, we don't need to do this. Vectorizing these trivial
induction variables can create vector code that is difficult to simplify later
on. This is especially true when the unroll factor is greater than one, and we
create vector arithmetic when computing step vectors. With this patch, we check
if an induction variable is only used for counting iterations or computing
addresses, and if so, scalarize the arithmetic when computing step vectors
instead. This allows for greater simplification.
This patch addresses the suboptimal pointer arithmetic sequence seen in
PR27881.
Reference: https://llvm.org/bugs/show_bug.cgi?id=27881
Differential Revision: http://reviews.llvm.org/D21620
llvm-svn: 274627
This is a follow-up for r273544.
The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.
This commit also removes two command-line flags that weren't used in any of the
tests: widen-vmovs and swift-partial-update-clearance. The former may be easily
replaced with the mattr mechanism, but the latter may not (as it is a subtarget
property, and not a proper feature).
Differential Revision: http://reviews.llvm.org/D21797
llvm-svn: 274620
This is a follow-up for r273544 and r273853.
The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.
This commit also marks them as obsolete.
Differential Revision: http://reviews.llvm.org/D21796
llvm-svn: 274616
The patch removes redundant kmov instructions (not all, we still have a lot of work here) and redundant "and" instructions after "setcc".
I use "AssertZero" marker between X86ISD::SETCC node and "truncate" to eliminate extra "and $1" instruction.
I also changed zext, aext and trunc patterns in the .td file. It allows to remove extra "kmov" instruictions.
This patch fixes https://llvm.org/bugs/show_bug.cgi?id=28173.
Fast ISEL mode is not supported correctly for AVX-512. ICMP/FCMP scalar instruction should return result in k-reg. It will be fixed in one of the next patches. I redirected handling of "cmp" to the DAG builder mode. (The code looks worse in one specific test case, but without this fix the new patch fails).
Differential revision: http://reviews.llvm.org/D21956
llvm-svn: 274613
Summary:
Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that
ensures that a non-void-returning shader falls off the end of the last
basic block was effectively disabled, since SI_RETURN is now used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96731
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21975
llvm-svn: 274612
"More things" = StratifiedAttrs and various bits like interprocedural
summaries.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21964
llvm-svn: 274592
StratifiedSets (as implemented) is very fast, but its accuracy is also
limited. If we take a more aggressive andersens-like approach, we can be
way more accurate, but we'll also end up being slower.
So, we've decided to split CFLAA into CFLSteensAA and CFLAndersAA.
Long-term, we want to end up in a place where CFLSteens is queried
first; if it can provide an answer, great (since queries are basically
map lookups). Otherwise, we'll fall back to CFLAnders, BasicAA, etc.
This patch splits everything out so we can try to do something like
that when we get a reasonable CFLAnders implementation.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21910
llvm-svn: 274589
I think the Ops filled out by Regex::match contain pointers into the temporary
std::string returned by StringRef::upper. Its lifetime is extended by the call
to match, but only until the end of that call (not to the uses of Ops later
on).
llvm-svn: 274586
Registers are printed a lot, so don't create temporary
std::strings. Using char instead of a string to an ostream
saves a function call.
llvm-svn: 274581
The way the named arguments for various system instructions are handled at the
moment has a few problems:
- Large-scale duplication between AArch64BaseInfo.h and AArch64BaseInfo.cpp
- That weird Mapping class that I have no idea what I was on when I thought
it was a good idea.
- Searches are performed linearly through the entire list.
- We print absolutely all registers in upper-case, even though some are
canonically mixed case (SPSel for example).
- The ARM ARM specifies sysregs in terms of 5 fields, but those are relegated
to comments in our implementation, with a slightly opaque hex value
indicating the canonical encoding LLVM will use.
This adds a new TableGen backend to produce efficiently searchable tables, and
switches AArch64 over to using that infrastructure.
llvm-svn: 274576