There's nothing darwin-specific in these tests, and using
that setting causes extra phantom diffs when the auto-generated
check lines are regenerated today.
llvm-svn: 304614
This pass makes it possible to run register scavenging independently of
PrologEpilogInserter, for targeted testing.
Also adds some basic register scavenging tests.
llvm-svn: 304606
Since r288804, we try to lower build_vectors on AVX using broadcasts of
float/double. However, when we broadcast integer values that happen to
have a NaN float bitpattern, we lose the NaN payload, thereby changing
the integer value being broadcast.
This is caused by ConstantFP::get, to which we pass the splat i32 as
a float (by bitcasting it using bitsToFloat). ConstantFP::get takes
a double parameter, so we end up lossily converting a single-precision
NaN to double-precision.
Instead, avoid any kinds of conversions by directly building an APFloat
from the splatted APInt.
Note that this also fixes another piece of code (broadcast of
subvectors) which currently isn't susceptible to the same problem.
Also note that we could really just use APInt and ConstantInt
throughout: the constant pool type doesn't matter much. Still, for
consistency, use the appropriate type.
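For illustration, a minimal standalone C++ sketch (not the LLVM code itself;
the splat constant 0x7FA00001 is just a hypothetical value whose bits form a
signaling NaN) of how routing a single-precision NaN through double can
change the bits:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
  // Hypothetical splat value whose bit pattern is an sNaN with a payload.
  uint32_t Splat = 0x7FA00001;

  float F;
  std::memcpy(&F, &Splat, sizeof F);   // bitsToFloat-style reinterpretation

  double D = F;                        // float -> double: the sNaN is
                                       // typically quieted and the payload
                                       // bits repositioned
  float Back = static_cast<float>(D);

  uint32_t RoundTripped;
  std::memcpy(&RoundTripped, &Back, sizeof RoundTripped);

  // On common targets this prints two different values, i.e. the integer
  // being broadcast would have changed.
  std::printf("0x%08X -> 0x%08X\n", Splat, RoundTripped);
  return 0;
}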
llvm-svn: 304590
This initial patch doesn't actually do anything very useful. It's just to show where the new code goes. Once this is in, I'll extend the verification logic to check more useful properties.
For those curious, the more complicated version of this patch already found one very suspicious thing.
Differential Revision: https://reviews.llvm.org/D33819
llvm-svn: 304564
insertps behaves differently: the register form selects from an input
register based on the immediate operand, while the memory form just loads
from the given address. We have custom code to change the immediate in cases
where that's legal, so completely remove insertps from the generated
tables.
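As a rough illustration with SSE4.1 intrinsics (immediate encoding per the
ISA; the function names are made up):

#include <smmintrin.h>

// Register form: bits [7:6] of the immediate pick the source element of b.
// Here 0x40 inserts element 1 of b into element 0 of a.
__m128 insert_from_reg(__m128 a, __m128 b) {
  return _mm_insert_ps(a, b, 0x40);
}

// Memory form: the scalar comes from memory, so the source-select bits are
// meaningless; folding a load therefore requires rewriting the immediate.
__m128 insert_from_mem(__m128 a, const float *p) {
  return _mm_insert_ps(a, _mm_load_ss(p), 0x00);
}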
llvm-svn: 304540
Summary:
Add an early combine to match patterns such as:
(i16 bitcast (v16i1 x))
->
(i16 movmsk (v16i8 sext (v16i1 x)))
This combine needs to happen early enough before
type-legalization scalarizes the result of the setcc.
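Roughly what the combined sequence computes, sketched with SSE2 intrinsics
(the byte compare stands in for whatever setcc produces the v16i1; the
function name is illustrative):

#include <emmintrin.h>
#include <cstdint>

// The <16 x i1> compare result is sign-extended to <16 x i8> (each lane
// 0x00 or 0xFF) and PMOVMSKB collects one bit per lane into an i16,
// instead of scalarizing 16 separate i1 values.
uint16_t mask_of_equal_bytes(__m128i a, __m128i b) {
  __m128i cmp = _mm_cmpeq_epi8(a, b);
  return static_cast<uint16_t>(_mm_movemask_epi8(cmp));
}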
Reviewers: igorb, craig.topper, RKSimon
Subscribers: delena, llvm-commits
Differential Revision: https://reviews.llvm.org/D33311
llvm-svn: 304406
Summary: This pattern is not very useful per se, but it exposes optimizations for other patterns that wouldn't kick in otherwise. It's very common and worth optimizing for.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32756
llvm-svn: 304402
Summary: The LiveRangeShrink pass moves an instruction to right after its definition within the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees an optimal live range within the BB.
Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb
Reviewed By: MatzeB, andreadb
Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32563
llvm-svn: 304371
We should have a single call site entry with no landing pad. This
indicates that no EH action should be taken and the unwinder should
unwind to the next frame.
We currently don't recognize __gxx_personality_seh0 as a known
personality, so we forcibly emit a table, and that table was wrong. This
was filed as PR33220. Now we emit a correct table for that personality.
The next step is to recognize that we can completely skip the table for
this personality.
llvm-svn: 304363
Summary:
If we attempt to unfold an SUnit in ScheduleDAG that results in
finding an already scheduled load, we should abort the
unfold as it will not improve scheduling.
This fixes PR32610.
Reviewers: jmolloy, sunfish, bogner, spatel
Subscribers: llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D32911
llvm-svn: 304321
xchg with a mem operand has different locking semantics. If we unfold it
into an xchg r,r we will lose the implicit lock. Likewise we never want
to fold a register xchg into a memory one as it would be a lot slower.
This triggers during LLVM selfhost.
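For context, a small sketch of why the memory form matters (standard C++,
nothing backend-specific; the function name is made up):

#include <atomic>

// On x86, xchg with a memory operand is implicitly LOCK-ed, which is what
// makes lowering an atomic exchange to a single `xchg (mem), reg` correct;
// a register-register xchg has no such implicit lock, so unfolding the
// memory form would drop the atomicity.
int swap_in(std::atomic<int> &slot, int value) {
  return slot.exchange(value);
}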
llvm-svn: 304163
The extending load possibility was missed in:
https://reviews.llvm.org/rL304072
We might want to handle these cases as a follow-up, but bailing out for now
to avoid miscompiling.
llvm-svn: 304153
This was reverted due to buildbot breakages and I was not familiar
enough with this code to investigate it. But while trying to get a
useful backtrace for the author, it turns out the fix was very
obvious. Resubmitting this patch as is, and will submit the
fix in a followup so that the fix is not hidden in the larger
CL.
llvm-svn: 304122
This reverts commit 28cb1003507f287726f43c771024a1dc102c45fe as well
as all subsequent followups. llvm-tblgen currently segfaults with
this change, and it seems it has been broken on the bots all
day with no fixes in preparation. See, for example:
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/
llvm-svn: 304121
The X86 backend holds huge tables in order to map between the register and memory forms of each instruction.
This TableGen backend automatically generates all these tables with the appropriate flags for each entry.
Differential Revision: https://reviews.llvm.org/D32684
llvm-svn: 304088
If we have (extract_subvector(load wide vector)) with no other users,
that can just be (load narrow vector). This is intentionally conservative.
Follow-ups may loosen the one-use constraint to account for the extract cost
or just remove the one-use check.
The memop chain updating is based on code that already exists multiple times
in x86 lowering, so that should be pulled into a helper function as a follow-up.
Background: this is a potential improvement noticed via regressions caused by
making x86's peekThroughBitcasts() not loop on consecutive bitcasts (see
comments in D33137).
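A rough analogue of the transform, written with AVX intrinsics (the function
names are just for illustration):

#include <immintrin.h>

// Before: (extract_subvector (load wide vector), 0) -- load 256 bits, keep
// only the low 128.
__m128 low_half_via_wide_load(const float *p) {
  __m256 wide = _mm256_loadu_ps(p);
  return _mm256_castps256_ps128(wide);
}

// After: if the wide load has no other users, just do the narrow load.
__m128 low_half_via_narrow_load(const float *p) {
  return _mm_loadu_ps(p);
}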
Differential Revision: https://reviews.llvm.org/D33578
llvm-svn: 304072
Rewrite fixupKills() to use the LivePhysRegs class. Simplifies the code
and fixes a bug where the CSR registers in return blocks were missed,
leading to invalid kill flags. Also remove the unnecessary rule that we
wouldn't set kill flags on tied operands.
No tests as I have an upcoming commit improving MachineVerifier checks
to catch these cases in multiple existing lit tests.
llvm-svn: 304055
In the best case:
extract (binop (concat X1, X2), (concat Y1, Y2)), N --> binop XN, YN
...we kill all of the extract/concat and just have narrow binops remaining.
If only one of the binop operands is amenable, this transform is still
worthwhile because we kill some of the extract/concat.
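The best case looks roughly like this with AVX1 intrinsics (an FP AND is
used because AVX1 has no 256-bit integer binops; the names are
illustrative):

#include <immintrin.h>

// Before: concat both operand pairs, one 256-bit binop, then extract half N
// (here the high half).
__m128 high_half_wide(__m128 x1, __m128 x2, __m128 y1, __m128 y2) {
  __m256 x = _mm256_insertf128_ps(_mm256_castps128_ps256(x1), x2, 1); // concat X1, X2
  __m256 y = _mm256_insertf128_ps(_mm256_castps128_ps256(y1), y2, 1); // concat Y1, Y2
  return _mm256_extractf128_ps(_mm256_and_ps(x, y), 1);
}

// After: binop XN, YN directly on the narrow halves.
__m128 high_half_narrow(__m128 x2, __m128 y2) {
  return _mm_and_ps(x2, y2);
}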
Optional bitcasting makes the code more complicated, but there doesn't
seem to be a way to avoid that.
The TODO about extending to more than bitwise logic is there because we really
will regress several x86 tests including madd, psad, and even a plain
integer-multiply-by-2 or shift-left-by-1. I don't think there's anything
fundamentally wrong with this patch that would cause those regressions; those
folds are just missing or brittle.
If we extend to more binops, I found that this patch will fire on at least one
non-x86 regression test. There's an ARM NEON test in
test/CodeGen/ARM/coalesce-subregs.ll with a pattern like:
t5: v2f32 = vector_shuffle<0,3> t2, t4
t6: v1i64 = bitcast t5
t8: v1i64 = BUILD_VECTOR Constant:i64<0>
t9: v2i64 = concat_vectors t6, t8
t10: v4f32 = bitcast t9
t12: v4f32 = fmul t11, t10
t13: v2i64 = bitcast t12
t16: v1i64 = extract_subvector t13, Constant:i32<0>
There was no functional change in the codegen from this transform as far
as I could see, though.
For the x86 test changes:
1. PR32790() is the closest call. We don't reduce the AVX1 instruction count in that case,
but we improve throughput. Also, on a core like Jaguar that double-pumps 256-bit ops,
there's an unseen win because two 128-bit ops have the same cost as the wider 256-bit op.
SSE/AVX2/AVX512 are not affected, which is expected because only AVX1 has the extract/concat
ops to match the pattern.
2. do_not_use_256bit_op() is the best case. Everyone wins by avoiding the concat/extract.
Related bug for IR filed as: https://bugs.llvm.org/show_bug.cgi?id=33026
3. The SSE diffs in vector-trunc-math.ll are just scheduling/RA, so nothing real AFAICT.
4. The AVX1 diffs in vector-tzcnt-256.ll are all the same pattern: we reduced the instruction
count by one in each case by eliminating two insert/extract while adding one narrower logic op.
https://bugs.llvm.org/show_bug.cgi?id=32790
Differential Revision: https://reviews.llvm.org/D33137
llvm-svn: 303997
Rename the DEBUG_TYPE to match the names of the corresponding passes where
it makes sense. Also establish the pattern of simply referencing
DEBUG_TYPE instead of repeating the pass name where possible.
llvm-svn: 303921
AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the LLVM side of the addition of two new intrinsic-based instructions (vpopcntd and vpopcntq).
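For reference, a sketch of what the new instructions look like from C via
the corresponding vector intrinsics (assuming a compiler and target with
avx512vpopcntdq support):

#include <immintrin.h>

// Per-element population count; these should lower to vpopcntd / vpopcntq
// on an AVX512_VPOPCNTDQ target.
__m512i popcount_dwords(__m512i v) {
  return _mm512_popcnt_epi32(v);
}
__m512i popcount_qwords(__m512i v) {
  return _mm512_popcnt_epi64(v);
}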
Differential Revision: https://reviews.llvm.org/D33169
llvm-svn: 303858
I suspect this buildbot has slow-incdec set by default, most likely due to
the default CPU having this set. This feature bit can prevent optsize from
having an effect on this IR.
llvm-svn: 303720
Summary:
It's rare but a small number of patterns use IntInit's at the root of the match.
On X86, one such rule is enabled by the OptForSize predicate and causes the
compiler to use the smaller:
%0 = MOV32r1
instead of the usual:
%0 = MOV32ri 1
This patch adds support for matching IntInit's at the root and uses this as a
test case for the optsize attribute that was implemented in r301750.
Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D32791
llvm-svn: 303678