We previously disabled this in r323371 because of a bug where we selected an
extending load, but didn't delete the old G_LOAD, resulting in two loads being
generated for volatile loads.
Since we now have dedicated G_SEXTLOAD/G_ZEXTLOAD operations, and the
tablegen patterns should no longer be able to select (ext(load x)) patterns, it
should be safe to re-enable it.
The old test case should still work as expected.
llvm-svn: 348320
The comment was misplaced, and the code didn't do what the comment indicated,
namely ignoring the varargs portion when computing the local stack size of a
funclet in emitEpilogue. This resulted in incorrect offset computations within
funclets contained in vararg functions.
Differential Revision: https://reviews.llvm.org/D55096
llvm-svn: 348222
This moves the stack check logic into a lambda within getOutliningCandidateInfo.
This allows us to be less conservative with stack checks. Whether or not a
stack instruction is safe to outline is dependent on the frame variant and call
variant of the outlined function; only in cases where we modify the stack can
these be unsafe.
So, if we move that logic later, to when we're looking at an individual
candidate, we can make better decisions.
This gives some code size savings as a result.
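A minimal standalone sketch of the shape of that check (all names and types
here are hypothetical, not the actual implementation):
```
#include <vector>

struct Candidate {
  bool ModifiesStack; // the candidate's body contains stack accesses
};

enum class CallVariant { Default, SaveLRToStack, NoSave };

// Stack instructions are only unsafe when the chosen variant itself has to
// modify the stack (e.g. to save/restore LR there), so decide per variant.
static unsigned countUsableCandidates(const std::vector<Candidate> &Cands,
                                      CallVariant CV) {
  auto IsSafe = [CV](const Candidate &C) {
    return !C.ModifiesStack || CV != CallVariant::SaveLRToStack;
  };
  unsigned N = 0;
  for (const Candidate &C : Cands)
    if (IsSafe(C))
      ++N;
  return N;
}
```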
llvm-svn: 348220
If dropping the candidates that modify the stack leaves too few candidates
for outlining to be beneficial, there's no reason to check the other cost
model qualities.
llvm-svn: 348219
If it's a bigger code size win to drop candidates that require stack fixups
than to demote every candidate to that variant, the outliner should do that.
This happens if the number of bytes taken by calls to functions that don't
require fixups, plus the number of bytes that'd be left, is less than the
number of bytes it'd take to emit a save + restore for all candidates.
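A standalone sketch of that comparison (names hypothetical; the byte counts
mirror the description above):
```
// Dropping the fixup candidates wins when keeping only the no-fixup
// candidates costs fewer bytes than demoting every candidate to the
// save + restore variant.
static bool shouldDropFixupCandidates(unsigned NoFixupCallBytes,
                                      unsigned BytesLeft,
                                      unsigned SaveRestoreBytes,
                                      unsigned NumCandidates) {
  return NoFixupCallBytes + BytesLeft < SaveRestoreBytes * NumCandidates;
}
```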
Also add tests for each possible new behaviour.
- machine-outliner-compatible-candidates shows that when we have candidates
that don't use the stack, we can use the default call variant along with the
no save/regsave variant.
- machine-outliner-all-stack shows that when it's better to fix up the stack,
we will still demote all candidates to that case.
- machine-outliner-drop-stack shows that we can discard candidates that
require stack fixups when it would be beneficial to do so.
llvm-svn: 348168
Summary:
SSBS (Speculative Store Bypass Safe) is only mandatory from Armv8.5-A
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SSBS, as it was previously only possible to
enable it by selecting -march=armv8.5-a.
Similar patch upstream in GNU binutils:
https://sourceware.org/ml/binutils/2018-09/msg00274.html
Reviewers: olista01, samparker, aemerson
Reviewed By: samparker
Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits
Differential Revision: https://reviews.llvm.org/D54629
llvm-svn: 348137
This patch splits out backend features currently
hidden behind architecture versions.
For example, currently the only way to activate
the complex numbers extension is to target a v8.3
architecture, whereas after this patch the extension
can be enabled separately.
This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html
Reviewers: DavidSpickett, olista01, t.p.northover
Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio
Differential revision: https://reviews.llvm.org/D54633
llvm-svn: 348121
If we know that we'll definitely save LR to a register, there's no reason to
pre-check whether a stack instruction is unsafe to fix up.
This patch makes it so that we check for that condition before mapping
instructions.
This allows us to outline more, since we don't pessimise as many instructions.
Also update some tests, since we outline more.
llvm-svn: 348081
Instead of treating the outlined functions for these as distinct frames, they
should be combined into one case. Neither allows for stack fixups, and both
generate the same frame. Thus, they ought to be considered one case.
This makes the code far easier to understand, for one thing. It also offers
some small code size improvements. It's fairly rare to see a class of outlined
functions that doesn't fall entirely into one variant (on CTMark anyway). It
does happen from time to time though.
Mostly, though, this is a significant simplification.
Also update the test to show the added functionality.
llvm-svn: 348036
All that you can legitimately do with the CFI for a nounwind function
is get a backtrace, and adjusting the SCS register is not (currently)
required for this purpose.
Differential Revision: https://reviews.llvm.org/D54988
llvm-svn: 348035
It makes more sense to order FI-based memops in descending order when
the stack goes down. This allows offsets to stay "consecutive" and allows
easier pattern matching.
llvm-svn: 347906
I believe we should be legalizing these with the rest of vector binary
operations. If any custom lowering is required for these nodes, this will give
the DAG combine between LegalizeVectorOps and LegalizeDAG a chance to run on
the custom code before constant build_vectors are lowered in LegalizeDAG.
I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the
lowering earlier caused build_vector+extract_subvector simplifications to kick
in, which made the generated code worse.
Differential Revision: https://reviews.llvm.org/D54276
llvm-svn: 347902
Change meaning of TargetOptions::EnableGlobalISel. The flag was
previously set only when a target switched on GlobalISel but it is now
always set when the GlobalISel pipeline is enabled. This makes the flag
consistent with TargetOptions::EnableFastISel and allows its use in
other parts of the compiler to determine when GlobalISel is enabled.
The EnableGlobalISel flag previously had only one use, in
TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value
to determine if GlobalISel was enabled by a target and returned false in
such a case. To preserve the current behaviour, a new flag
TargetOptions::GlobalISelAbort is introduced to separately record the
abort behaviour.
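A hedged sketch of how the repurposed flag can now be consulted (the helper
is hypothetical; TargetOptions and its EnableGlobalISel member are real):
```
#include "llvm/Target/TargetOptions.h"

// After this change, EnableGlobalISel is set whenever the GlobalISel
// pipeline is enabled, not only when a target switches it on by default;
// the abort behaviour lives in the separate GlobalISelAbort option.
static bool isGlobalISelEnabled(const llvm::TargetOptions &Opts) {
  return Opts.EnableGlobalISel;
}
```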
Differential Revision: https://reviews.llvm.org/D54518
llvm-svn: 347861
Before this patch, the following stores in `merge_fail` would fail to be
merged, while they would get merged in `merge_ok`:
```
void use(unsigned long long *);

void merge_fail(unsigned key, unsigned index)
{
  unsigned long long args[8];
  args[0] = key;
  args[1] = index;
  use(args);
}

void merge_ok(unsigned long long *dst, unsigned a, unsigned b)
{
  dst[0] = a;
  dst[1] = b;
}
```
The reason is that `getMemOpBaseImmOfs` would return false for FI base
operands.
This patch adds support for them.
Differential Revision: https://reviews.llvm.org/D54847
llvm-svn: 347747
Currently, instructions doing memory accesses through a base operand that is
not a register cannot be analyzed using `TII::getMemOpBaseRegImmOfs`.
This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.
The goal of this patch is to refactor all this to return a base
operand instead of a base register.
Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.
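A hedged sketch of a caller using the new interface (signature approximated
from the description; the helper itself is hypothetical):
```
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/TargetInstrInfo.h"

// With a MachineOperand base, frame-index bases no longer force a bail-out:
// callers can handle register and FI bases uniformly.
static bool hasAnalyzableBase(const llvm::TargetInstrInfo &TII,
                              llvm::MachineInstr &MI,
                              const llvm::TargetRegisterInfo *TRI) {
  llvm::MachineOperand *BaseOp = nullptr;
  int64_t Offset = 0;
  if (!TII.getMemOpBaseImmOfs(MI, BaseOp, Offset, TRI))
    return false;
  return BaseOp->isReg() || BaseOp->isFI();
}
```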
Differential Revision: https://reviews.llvm.org/D54846
llvm-svn: 347746
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::hasExtendedReg()`.
Differential revision: https://reviews.llvm.org/D54822
llvm-svn: 347599
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::hasShiftedReg()`.
Differential revision: https://reviews.llvm.org/D54820
llvm-svn: 347598
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::isScaledAddr()`.
Differential revision: https://reviews.llvm.org/D54777
llvm-svn: 347597
Summary:
Add a hook to the GCMetadataPrinter for emitting stack maps in a
custom format. The hook will be called at stack map generation
time. The default stack map format is used if there is no hook.
For this to be useful, a few data structures and accessors are
exposed from the StackMaps class, so the custom printer can
access the stack map data.
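A hedged sketch of a custom printer built on the new hook (the override is
approximated from the description above; MyGCPrinter is hypothetical):
```
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/CodeGen/GCMetadataPrinter.h"
#include "llvm/CodeGen/StackMaps.h"

namespace {
class MyGCPrinter : public llvm::GCMetadataPrinter {
public:
  // Return true to signal the stack maps were emitted in a custom format;
  // returning false falls back to the default format.
  bool emitStackMaps(llvm::StackMaps &SM, llvm::AsmPrinter &AP) override {
    // Walk the now-exposed stack map records here and emit them in
    // whatever layout the runtime expects.
    return true;
  }
};
} // end anonymous namespace
```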
This patch was authored by Cherry Zhang <cherryyz@google.com>.
Reviewers: thanm, apilipenko, reames
Reviewed By: reames
Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D53892
llvm-svn: 347584
A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into
BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn
BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop.
Fix this by making LowerBUILD_VECTOR not do this transformation for those
vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element
constant vector. Doing that means we get a DUP, which we then need to recognise
in ISel as a copy.
llvm-svn: 347456
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.
The pass extends LLVM's capabilities to use target-specific instruction
selection for interleaved load patterns (e.g. ld4 on AArch64
architectures).
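For illustration, a hypothetical source loop of the kind this is aimed at:
once vectorized into shufflevector-based interleaved loads, it can be matched
by the InterleavedAccessPass and selected as a single ld4 on AArch64:
```
// Four interleaved streams read from one buffer (hypothetical example).
void deinterleave(const float *in, float *a, float *b, float *c, float *d,
                  int n) {
  for (int i = 0; i < n; ++i) {
    a[i] = in[4 * i + 0];
    b[i] = in[4 * i + 1];
    c[i] = in[4 * i + 2];
    d[i] = in[4 * i + 3];
  }
}
```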
Differential Revision: https://reviews.llvm.org/D52653
llvm-svn: 347208
When unwinding past a function that uses shadow call stack, we must
subtract 8 from the value of the x18 register. This patch emits a call
frame instruction that causes the unwinder to do that.
Differential Revision: https://reviews.llvm.org/D54609
llvm-svn: 347089
Using the MBB flags, we can tell if X16/X17/NZCV are unused in a block,
and also not live out.
If this holds for all MBBs, then we can avoid checking for liveness
entirely. Furthermore, if it holds for an individual candidate's
MBB, then we can avoid checking for liveness on that candidate.
llvm-svn: 346901
If we keep track of if the ContainsCalls bit is set in the MBB flags for each
candidate, then we have a better chance of not checking the candidate for calls
at all.
This saves quite a few checks in some CTMark tests (~200 in Bullet, for
example).
llvm-svn: 346816
We already determine a bunch of information about an MBB in
getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating
things that must be false/true.
The first thing we can easily check is if an outlined sequence could ever
contain calls. There's no reason to walk over the outlined range, checking for
calls, if we already know that there are no calls in the block containing the
sequence.
llvm-svn: 346809
Since we never outline anything with fewer than 2 occurrences, there's no
reason to compute cost model information if there are fewer than that.
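A minimal standalone sketch of that early exit (types hypothetical):
```
#include <vector>

struct Candidate {};

// Outlining needs at least two occurrences to pay off, so skip the cost
// model entirely below that threshold.
static bool worthComputingCosts(const std::vector<Candidate> &Repeats) {
  return Repeats.size() >= 2;
}
```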
llvm-svn: 346803
Turns out it's way simpler to do this check with one LRU. Instead of
maintaining two, just keep one. Check if each of the registers is available,
and then check if it's a live out from the block. If it's a live out, but
available in the block, we know we're in an unsafe case.
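A standalone sketch of that check (types hypothetical; in the real code a
single LiveRegUnits plays the role of the tracker):
```
struct RegState {
  bool AvailableInBlock; // the register is not used within the block
  bool LiveOut;          // the register is live out of the block
};

// Available in the block but live out: a value defined elsewhere is still
// needed after the block, so outlining the candidate would be unsafe.
static bool isUnsafeCase(const RegState &R) {
  return R.AvailableInBlock && R.LiveOut;
}
```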
llvm-svn: 346721
Instead of returning Flags, return true if the MBB is safe to outline from.
This lets us check for unsafe situations, like, say, in AArch64, when X17 is
live across an MBB without being defined in that MBB. In that case, there's no
point in performing an instruction mapping.
llvm-svn: 346718
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.
I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.
For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.
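A hedged sketch of how a target could opt out through the existing hook
(the subclass is hypothetical; the hook itself is real):
```
#include "llvm/CodeGen/TargetLowering.h"

class MyTargetLowering : public llvm::TargetLowering {
public:
  using TargetLowering::TargetLowering;
  // Returning false keeps the single wide load + extract_subvector form
  // when an extra narrow vector load would be worse for this target.
  bool shouldReduceLoadWidth(llvm::SDNode *Load, llvm::ISD::LoadExtType ExtTy,
                             llvm::EVT NewVT) const override {
    return false;
  }
};
```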
Differential Revision: https://reviews.llvm.org/D54073
llvm-svn: 346595
This patch adds support for funclets in frame lowering and ISel
lowering. Together with D50288 and D50166, it enables C++ exception
handling.
Patch by Sanjin Sijaric, with some fixes by me.
Differential Revision: https://reviews.llvm.org/D51524
llvm-svn: 346568
Summary:
This simplifies the code and moves everything to tablegen for consistency. This
also prepares the ground for adding issue counters.
Reviewers: gchatelet, john.brawn, jsji
Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54297
llvm-svn: 346489
In this context, usesWindowsCFI() is basically the same thing as
isOSWindows(), but it makes the relevant property of the target
more explicit.
llvm-svn: 346366
Like the comment says, this isn't the most efficient fix in terms of
codesize, but it works.
Differential Revision: https://reviews.llvm.org/D54129
llvm-svn: 346358
Cleanup CCMP pattern matching code in preparation for review/bugfix:
- Rename `isConjunctionDisjunctionTree()` to `canEmitConjunction()`
(it won't accept arbitrary disjunctions and is really about whether we
can transform the subtree into a conjunction that we can emit).
- Rename `emitConjunctionDisjunctionTree()` to `emitConjunction()`
llvm-svn: 346203
The main caller of this already has an MVT, and several targets called
getSimpleVT inside without checking isSimple. This makes the simpleness
explicit.
llvm-svn: 346180
Summary:
This function was causing a crash when `MaxElements == 1` because
it was trying to create a single element vector type.
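A standalone sketch of the guard (approximated from the description; LLT is
LLVM's low-level type):
```
#include "llvm/Support/LowLevelTypeImpl.h"

// A "vector" of one element is just the scalar; LLT::vector asserts on a
// single element, which is what caused the crash.
static llvm::LLT pickVectorType(unsigned MaxElements, llvm::LLT EltTy) {
  return MaxElements == 1 ? EltTy : llvm::LLT::vector(MaxElements, EltTy);
}
```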
Reviewers: dsanders, aemerson, aditya_nandakumar
Reviewed By: dsanders
Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D53734
llvm-svn: 345875
This was added in r330630. GCC's -Wimplicit-fallthrough seems not to
fire when the previous case contains a switch itself.
This fallthrough was benign because the helper function implementing the
case used dyn_cast to re-check the type of the node in question. After
fixing the fallthrough, we can strengthen the cast.
llvm-svn: 345864