We may need to widen the vector to make the shifts legal, but if we do that we need to make sure we shift left/right after accounting for the new size. If not we can't guarantee we are shifting in zeros.
The test cases affected actually show cases where we should move the shifts all together, but that's another problem.
llvm-svn: 320248
We were previously using kunpck with zero inputs unnecessarily. And we had cases where we would insert into a zero vector and then insert into larger zero vector incurring two sets of shifts.
llvm-svn: 320244
Summary:
This relaxes an assertion inside SelectionDAGBuilder which is overly
restrictive on targets which have no concept of alignment (such as AVR).
In these architectures, all types are aligned to 8-bits.
After this, LLVM will only assert that accesses are aligned on targets
which actually require alignment.
This patch follows from a discussion on llvm-dev a few months ago
http://llvm.1065342.n5.nabble.com/llvm-dev-Unaligned-atomic-load-store-td112815.html
Reviewers: bogner, nemanjai, joerg, efriedma
Reviewed By: efriedma
Subscribers: efriedma, cactus, llvm-commits
Differential Revision: https://reviews.llvm.org/D39946
llvm-svn: 320243
The outliner previously would never outline calls. Calls are pretty common in
files, so it makes sense to outline them. In fact, in the LLVM test suite, if
you count the number of instructions that the outliner misses when you outline
calls vs when you don't, it turns out that, on average, around 6% of the
instructions encountered are calls. So, if we outline calls, we can find more
candidates, and thus save some more space.
This commit adds that functionality and updates the mir test to reflect that.
llvm-svn: 320229
MachineSink attempts to place instructions near the basic blocks where
they are needed. Once an instruction has been sunk, its location
relative to other instructions no longer is consistent with the
original source code. In order to ensure correct stepping in the
debugger, the debug location for sunk instructions is either merged
with the insertion point or erased if the target successor block is
empty.
Originally submitted as r318679, revised to fix sanitizer failure and
improve testing.
Patch by Matthew Voss!
Differential Revision: https://reviews.llvm.org/D39933
llvm-svn: 320216
This includes a fix so that it doesn't transform declarations, and it
puts the functionality under control of a command-line option which is off
by default to avoid breaking existing setups.
llvm-svn: 320196
For narrow sizes we'll widen the zero vector and widen the insert. Then do an extract_subvector to get back down to correct size.
This allows us to remove some patterns from the isel table that had to COPY_TO_REGCLASS to an oversized register, do the shift and then COPY_TO_REGCLASS back to the narrow register. Now this is represented explicitly in the DAG.
This seems to have perturbed the register allocation in one of the tests, but the number of instructions didn't change.
llvm-svn: 320190
Updated the scheduling information for the Haswell subtarget with the following changes:
Regrouped the instructions after adding appropriate load + store latencies.
Added scheduling for missing instructions such as the GATHER instrs.
The changes were made after revisiting the latencies impact of all memory uOps.
Reviewers: RKSimon, zvi, craig.topper, apilipenko
Differential Revision: https://reviews.llvm.org/D40021
Change-Id: Iaf6c1f5169add1552845a8a566af4e5a359217a7
llvm-svn: 320137
Replace interleaved store instructions by equivalent and more efficient instructions based on latency cost model.
Https://reviews.llvm.org/D38196
llvm-svn: 320123
This reverts commit 959e37e669b0c3cfad4cb9f1f7c9261ce9f5e9ae.
That commit doesn't handle the case where main is declared rather than defined,
in particular the even-more special case where main is a prototypeless
declaration (which is of course the one actually used by musl currently).
llvm-svn: 320121
We previously only supported inserting to the LSB or MSB where it was easy to zero to perform an OR to insert.
This change effectively extracts the old value and the new value, xors them together and then xors that single bit with the correct location in the original vector. This will cancel out the old value in the first xor leaving the new value in the position.
The way I've implemented this uses 3 shifts and two xors and uses an additional register. We can avoid the additional register at the cost of another shift.
llvm-svn: 320120
It is causing sanitizer failures on llvm tests in a bootstrapped compiler. No bot link since it's currently down, but following up to get the bot up.
This reverts commit r319218.
llvm-svn: 320106
The offset overflow check before was incorrect. It would always give the
correct result, but it was comparing the SCALED potential fixed-up offset
against an UNSCALED minimum/maximum. As a result, the outliner was missing a
bunch of frame setup/destroy instructions that ought to have been safe to
outline. This fixes that, and adds an instruction to the .mir test that
failed the old test.
llvm-svn: 320090
-amdgpu-waitcnt-forcezero={1|0} Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-amdgpu-waitcnt-forceexp=<n> Force emit a s_waitcnt expcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcevm=<n> Force emit a s_waitcnt vmcnt(0) before the first <n> instrs
Differential Revision: https://reviews.llvm.org/D40091
llvm-svn: 320084
There's no v2i1 or v4i1 kshift, and v8i1 is only supported with AVXDQ. Isel has fake patterns to extend these types to native shifts, but makes no guarantees about the value of any bits shifted in when shifting right.
This patch promotes the vector to a type that supports a native shift first and only allows inserting into the msb of a native sized shift.
I've constructed this in a way that doesn't do the promotion if we're going to fallback to using a xmm/ymm/zmm shuffle. I think I have a plan to remove the shuffle fall back entirely. In which case we this can be simplified, but I wanted to fix the correctness issue first.
llvm-svn: 320081
Tagged as IMUL instructions for a reasonable approximation (ALU tends to be a lot faster) - POPCNT is currently tagged as FAdd which I think should be replaced with IMUL as well
llvm-svn: 320051
I noticed this pattern in D38316 / D38388. We failed to combine a shuffle that is either
repeating a scalar insertion at the same position in a vector or translated to a different
element index.
Like the earlier patch, this could be an instcombine too, but since we opted to make this
a DAG transform earlier, I've made this one a DAG patch too.
We do not need any legality checking because the new insert is identical to the existing
insert except that it may have a different constant insertion operand.
The constant insertion test in test/CodeGen/X86/vector-shuffle-combining.ll was the
motivation for D38756.
Differential Revision: https://reviews.llvm.org/D40209
llvm-svn: 320050
WebAssembly requires caller and callee signatures to match, so the usual
C runtime trick of calling main and having it just work regardless of
whether main is defined as '()' or '(int argc, char *argv[])' doesn't
work. Extend the FixFunctionBitcasts pass to rewrite main to use the
latter form.
llvm-svn: 320041
This patch includes all missing functionality needed to provide first
compilation of a simple program that just returns from a function.
I've added a test case that checks for "ret" instruction printed in assembly
output.
Patch by Andrei Grischenko (andrei.l.grischenko@intel.com)
Differential revision: https://reviews.llvm.org/D39688
llvm-svn: 320035
Summary:
Changed use_instructions() to use_nodbg_instructions() when
building an instruction set.
We don't want the presence of debug info to affect the code
we generate.
Reviewers: dblaikie, Eugene.Zelenko, chandlerc, aprantl
Reviewed By: aprantl
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D40882
llvm-svn: 320010
Summary:
This patch adds MachineCombiner patterns for transforming
(fsub (fmul x y) z) into (fma x y (fneg z)). This has a lower
latency on micro architectures where fneg is cheap.
Patch based on work by George Steed.
Reviewers: rengolin, joelkevinjones, joel_k_jones, evandro, efriedma
Reviewed By: evandro
Subscribers: aemerson, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D40306
llvm-svn: 319980