Differentiate between word and subword memory operations, as they take different numbers of cycles to complete. This just adds a basic model of the subword latency to the scheduler.
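As a rough illustration (not from the patch itself, names hypothetical), the scheduler can now model the subword access as slower than the full-word access:

    define i32 @loads(i8* %b, i32* %w) {
      %sub = load i8, i8* %b        ; subword access: modeled with extra latency
      %ext = zext i8 %sub to i32
      %word = load i32, i32* %w     ; full-word access
      %sum = add i32 %ext, %word
      ret i32 %sum
    }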
llvm-svn: 266898
Because lowering of CMP_SWAP_64 occurs during type legalization, i64 values can be produced by more than just a BUILD_PAIR or similar node. My initial tests used only incoming function args.
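A minimal sketch of that kind of input (function name illustrative), where the i64 operands come straight from incoming arguments rather than a BUILD_PAIR:

    define i64 @cas(i64* %addr, i64 %desired, i64 %new) {
      %pair = cmpxchg i64* %addr, i64 %desired, i64 %new seq_cst seq_cst
      %old = extractvalue { i64, i1 } %pair, 0
      ret i64 %old
    }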
llvm-svn: 266828
Summary:
This property is used to mark an intrinsic that only writes to memory, but
neither reads from memory nor has other side effects.
An example where this is useful is the llvm.amdgcn.buffer.store.format.*
intrinsic, which corresponds to a store instruction that goes through a special
buffer descriptor rather than through a plain pointer.
With this property, the intrinsic should still be handled as having side
effects at the LLVM IR level, but machine scheduling can make smarter
decisions.
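For illustration, a use of the intrinsic at the IR level (the exact operand list here is an assumption, abbreviated from the real signature):

    declare void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i1, i1)

    define void @store_format(<4 x float> %data, <4 x i32> %rsrc) {
      ; writes through the buffer descriptor %rsrc; reads no memory
      call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %data, <4 x i32> %rsrc, i32 0, i32 0, i1 0, i1 0)
      ret void
    }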
Reviewers: tstellarAMD, arsenm, joker.eph, reames
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18291
llvm-svn: 266826
Summary:
The added testcase, which triggered this, was derived from a shader-db case
via bugpoint. A separate question is why scalar branching wasn't used.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19208
llvm-svn: 266825
Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that
just return the thread pointer. I have a pending patch that does the same
for SystemZ (D19054), and there are many more targets that could benefit
from one.
This patch merges the ARM and AArch64 intrinsics into a single target-independent one that will also be used by subsequent targets.
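After the merge, the target-independent form reads:

    declare i8* @llvm.thread.pointer()

    define i8* @get_tp() {
      %tp = call i8* @llvm.thread.pointer()
      ret i8* %tp
    }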
Differential Revision: http://reviews.llvm.org/D19098
llvm-svn: 266818
With this change, ideally an IR pass can always generate an llvm.stackguard call to get the stack guard; but for now there are still IR-level stack guard customizations around (see getIRStackGuard()). Future SSP customization should go through LOAD_STACK_GUARD.
There is a behavior change: stack guard values are no longer CSEed, since we should never reuse the value in case it has been spilled (and corrupted). See ssp-guard-spill.ll. This also changes the stack size and codegen in the X86 and AArch64 test cases.
Ideally we'd like to know whether the guard created in llvm.stackprotector() gets spilled. If the value is spilled, discard it and reload the stack guard; otherwise reuse the value. This could be done by teaching the register allocator to rematerialize LOAD_STACK_GUARD and forcing a rematerialization (which seems hard), or by checking for spilling in expandPostRAPseudo. It only matters when the stack guard is a global variable, which requires more instructions to load. In any case, this is outside the scope of the current patch.
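For reference, a minimal sketch of the IR-level pair involved (the alloca slot is illustrative):

    declare i8* @llvm.stackguard()
    declare void @llvm.stackprotector(i8*, i8**)

    define void @f() sspreq {
      %slot = alloca i8*
      %guard = call i8* @llvm.stackguard()
      call void @llvm.stackprotector(i8* %guard, i8** %slot)
      ret void
    }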
llvm-svn: 266806
* Add lowering for SETCCE i32.
* Add a test checking that lowering of i64 compares uses the SETCCE expansion (outside of EQ and NE).
* Fix the select.ll test and immediate-form selection for RI operations.
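A sketch of the kind of compare that takes the SETCCE path on a 32-bit target (names illustrative): the low halves are compared with a subtract that produces a carry, and SETCCE consumes that carry when comparing the high halves.

    define i1 @slt64(i64 %a, i64 %b) {
      %c = icmp slt i64 %a, %b
      ret i1 %c
    }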
llvm-svn: 266802
Using VPERMQ/VPERMPD allows memory folding of the (repeated) input where VINSERTI128/VINSERTF128 cannot.
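For example, a 256-bit shuffle that repeats the low 128-bit lane of a loaded value (sketch, names illustrative); with VPERMQ the full-width load can stay folded into the shuffle, whereas VINSERTI128 needs the input in a register:

    define <4 x i64> @dup_low_lane(<4 x i64>* %p) {
      %v = load <4 x i64>, <4 x i64>* %p
      %s = shufflevector <4 x i64> %v, <4 x i64> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
      ret <4 x i64> %s
    }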
Differential Revision: http://reviews.llvm.org/D19228
llvm-svn: 266728
Summary:
The `"patchable-function"` attribute can be used by an LLVM client to
influence LLVM's code generation in ways that makes the generated code
easily patchable at runtime (for instance, to redirect control).
Right now only one patchability scheme is supported,
`"prologue-short-redirect"`, but this can be expanded in the future.
Reviewers: joker.eph, rnk, echristo, dberris
Subscribers: joker.eph, echristo, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19046
llvm-svn: 266715
The fast register-allocator cannot cope with inter-block dependencies without
spilling. This is fine for ldrex/strex loops coming from atomicrmw instructions
where any value produced within a block is dead by the end, but not for
cmpxchg. So we lower a cmpxchg at -O0 via a pseudo-inst that gets expanded
after regalloc.
Fortunately this is at -O0 so we don't have to care about performance. This
simplifies the various axes of expansion considerably: we assume a strong
seq_cst operation and ensure ordering via the always-present DMB instructions
rather than v8 acquire/release instructions.
Should fix the 32-bit part of PR25526.
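A minimal IR input that takes the new path at -O0 (sketch, names illustrative); the whole ldrex/strex loop now comes from expanding one post-RA pseudo rather than from separate blocks fed through the register allocator:

    define i32 @cas(i32* %p, i32 %old, i32 %new) {
      %pair = cmpxchg i32* %p, i32 %old, i32 %new seq_cst seq_cst
      %val = extractvalue { i32, i1 } %pair, 0
      ret i32 %val
    }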
llvm-svn: 266679
Also,
- Skip the pass if the module does not have debug info
- Minor comment changes
- Added test
Differential Revision: http://reviews.llvm.org/D19079
llvm-svn: 266626
This resolves more frame indexes early and folds
the immediate offsets into the scratch mubuf instructions.
This cleans up a lot of the mess that's currently emitted, such as adds of 0 and repeated initialization of the same register to 0 when spilling.
llvm-svn: 266508
Because HoistSpillHelper::hoistAllSpills is called in postOptimization, before this patch we didn't want LiveRangeEdit::eliminateDeadDefs to call splitSeparateComponents and generate unassigned new vregs. However, skipping splitSeparateComponents makes verify-machineinstrs unhappy, so I removed the early return and instead use HoistSpillHelper::LRE_DidCloneVirtReg to assign a physreg/stackslot to those new vregs.
In addition, some code was reorganized to make it possible for class HoistSpillHelper to privately inherit from LiveRangeEdit::Delegate, consistent with class RAGreedy and class RegisterCoalescer.
Differential Revision: http://reviews.llvm.org/D19142
llvm-svn: 266489
After r245976, LLVM will skip the last bit test case if it knows it will always be true. However, we would still erroneously update PHI nodes with incoming values from the MBB that would have performed the final bit test, causing -verify-machineinstrs to fail.
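A hypothetical shape of input that exercises this (illustrative only; the cases are chosen so switch lowering forms bit tests, and the phi's incoming block must track the rewritten CFG):

    define i32 @f(i32 %x) {
    entry:
      switch i32 %x, label %other [
        i32 1, label %hit
        i32 3, label %hit
        i32 5, label %hit
        i32 7, label %hit
      ]
    hit:
      %v = phi i32 [ 42, %entry ]   ; incoming block changes once bit-test MBBs are created
      ret i32 %v
    other:
      ret i32 0
    }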
llvm-svn: 266479
This improves AA in the MI scheduler when reasoning about paired instructions.
Phabricator Revision: http://reviews.llvm.org/D17098
PR26358
llvm-svn: 266462
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.
Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.
Motivation
----------
Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.
We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram, we need to store a
reference to the block containing the inlined subprogram.
Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a Python script
that updates LLVM IR testcases to the new format.
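Roughly, the ownership is reversed as below (a sketch: the exact spelling of the owning-unit field is an assumption here, and "..." elides the other fields):

    ; before: the compile unit owned the subprogram list
    !0 = distinct !DICompileUnit(..., subprograms: !{!1})
    !1 = distinct !DISubprogram(name: "f", isDefinition: true)

    ; after: each definition's DISubprogram points at its owning unit
    !0 = distinct !DICompileUnit(...)
    !1 = distinct !DISubprogram(name: "f", isDefinition: true, unit: !0)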
http://reviews.llvm.org/D19034
<rdar://problem/25256815>
llvm-svn: 266446
Summary:
Without MMOs, the callee-save load/store instructions were treated as
volatile by the MI post-RA scheduler and AArch64LoadStoreOptimizer.
Reviewers: t.p.northover, mcrosier
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17661
llvm-svn: 266439
[PPC] Previously, when casting generic loads to LXVD2X/STXVD2X instructions, we
would leave the original load return type in place, leading to an assertion
failure when we merge two equivalent LXVD2X nodes with different types.
This fixes PR27350.
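A sketch of the problematic shape (illustrative, not the committed testcase): two loads of the same address with different but equivalent types, which become mergeable LXVD2X nodes:

    define <2 x double> @merge(<2 x double>* %p) {
      %q = bitcast <2 x double>* %p to <2 x i64>*
      %a = load <2 x double>, <2 x double>* %p
      %b = load <2 x i64>, <2 x i64>* %q
      %c = bitcast <2 x i64> %b to <2 x double>
      %r = fadd <2 x double> %a, %c
      ret <2 x double> %r
    }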
Reviewers: nemanjai
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19133
llvm-svn: 266438
Perform store clustering just like load clustering. This change adds a
StoreClusterMutation in the machine scheduler. To control it,
enableClusterStores() was added to TargetInstrInfo.h; it is enabled only on
AArch64 for now.
This change also adds support for unscaled stores, which were not handled in
getMemOpBaseRegImmOfs().
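For instance, clustering lets adjacent stores like these be scheduled back to back so AArch64 can form a store pair (sketch, names illustrative):

    define void @two_stores(i64* %p, i64 %a, i64 %b) {
      %p1 = getelementptr i64, i64* %p, i64 1
      store i64 %a, i64* %p
      store i64 %b, i64* %p1
      ret void
    }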
llvm-svn: 266437
Summary:
In the added test-case, the atomic instruction feeds into a non-machine
CopyToReg node which hasn't been selected yet, so guard against
non-machine opcodes here.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19043
llvm-svn: 266433