Allow a target to do something other than search for copies
that will avoid cross register bank copies.
Implement for SI by only rewriting the most basic copies,
so it should look through anything like a subregister extract.
I'm not entirely satisified with this because it seems like
eliminating a reg_sequence that isn't fully used should work
generically for all targets without them having to override
something. However, it seems to be tricky to have a simple
implementation of this without rewriting to invalid kinds
of subregister copies on some targets.
I'm not sure if there is currently a generic way to easily check
if a subregister index would be valid for the current use.
The current set of TargetRegisterInfo::get*Class functions don't
quite behave like I would expect (e.g. getSubClassWithSubReg
returns the maximal register class rather than the minimal), so
I'm not sure how to make the generic test keep searching if
SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making
the default implementation to check for simple copies breaks
a variety of ARM and x86 tests by producing illegal subregister uses.
The ARM tests are not actually changed since it should still be using
the same sharesSameRegisterFile implementation, this just relaxes
them to not check for specific registers.
llvm-svn: 248478
If the stores are storing values from loads which partially
alias the stores, we could end up placing the merged loads
and stores on the same chain which has the potential to break.
Each store may have a different chain dependency on only some
of the original loads. Create a new TokenFactor to capture all
of the required dependencies of the stores rather than assuming
all stores can use the same chain.
The testcase is a situation where this happens, although
it does not have an observable change from this. The DAG nodes
just happened to not be reordered before despite this missing
chain dependency.
This is based on an off-list report for an out of tree target
which regressed due to r246307 and I haven't managed to find a case
where the nodes do end up reordered with an in tree target.
llvm-svn: 248468
Instead of always inserting a copy in case
the super register is itself a subregister,
only extract to the super reg class if this is
actually the case.
This shouldn't really change codegen, but
makes looking at the output of SIFixSGPRCopies
easier to read.
llvm-svn: 248467
Loop unswitching produces conditional branches with constant condition,
and it's beneficial for later passes to clean this up with simplify-cfg.
We do this after the second invocation of loop-unswitch, but not after
the first one. Not doing so might cause problem for passes like
LoopUnroll, whose estimate of loop body size would be less accurate.
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D13064
llvm-svn: 248460
Summary:
With this change, subclasses of `llvm::User` will be able to co-allocate
a variable number of bytes (called a "descriptor") with the `llvm::User`
instance. The co-allocated descriptor can later be accessed using
`llvm::User::getDescriptor`. This will be used in later changes to
implement operand bundles.
This change steals one bit from `NumUserOperands`, but given that it is
still 28 bits wide I don't think this will be a practical issue.
This change does not allow allocating hung off uses with descriptors.
This only for simplicity, not for any fundamental reason; and we can
easily add this functionality later if needed.
Reviewers: reames, chandlerc, dexonsmith, kmod, majnemer, pete, JosephTremoulet
Subscribers: pete, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12455
llvm-svn: 248453
Nothing is expected to change, except we do less redundant work in
clean-up.
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12951
llvm-svn: 248444
In -fprofile-instr-generate compilation, to remove the redundant profile
variables for the COMDAT functions, these variables are placed in the same
COMDAT group as its associated function. This way when the COMDAT function
is not picked by the linker, those profile variables will also not be
output in the final binary. This may cause warning when mix link objects
built w and wo -fprofile-instr-generate.
This patch puts the profile variables for COMDAT functions to its own COMDAT
group to avoid the problem.
Patch by xur.
Differential Revision: http://reviews.llvm.org/D12248
llvm-svn: 248440
...because that's what the cost model was intended to do.
As discussed in D12882, this fix has a temporary unintended consequence for
SimplifyCFG: it causes us to not speculate an fdiv. However, two wrongs make
PR24818 right, and two wrongs make PR24343 act right even though it's really
still wrong.
I intend to correct SimplifyCFG and add to CodeGenPrepare to account for this
cost model change and preserve the righteousness for the bug report cases.
https://llvm.org/bugs/show_bug.cgi?id=24818https://llvm.org/bugs/show_bug.cgi?id=24343
Differential Revision: http://reviews.llvm.org/D12882
llvm-svn: 248439
This time, the issue is that we weren't accounting for the possibility that
aligned DPRs could have been stored after the final "push" in a prologue. When
that happened we effectively moved a "sub sp, #N" from below the aligned stores
to above them, and everything went to pot.
To make it worse, I'd actually committed something testing that we produced
wrong code, so the test update is tiny.
llvm-svn: 248437
so the lookup works as expected after prepending the oso-prepend-path.
This manifested only on Windows, because "/" is not a relative path there.
llvm-svn: 248423
Patch by: simoncook
Unlike BitCasts, AddrSpaceCasts do not always produce an output the same size as its input, which was previously assumed. This fixes cases where two address spaces do not have the same size pointer, as an assertion failure would occur when trying to prove deferenceability. LoopUnswitch is used in the particular test, but LICM also exhibits the same problem.
Differential Revision: http://reviews.llvm.org/D13008
llvm-svn: 248422
This patch changes the order of GEPs generated by Splitting GEPs
pass, specially when one of the GEPs has constant and the base is
loop invariant, then we will generate the GEP with constant first
when beneficial, to expose more cases for LICM.
If originally Splitting GEP generate the following:
do.body.i:
%idxprom.i = sext i32 %shr.i to i64
%2 = bitcast %typeD* %s to i8*
%3 = shl i64 %idxprom.i, 2
%uglygep = getelementptr i8, i8* %2, i64 %3
%uglygep7 = getelementptr i8, i8* %uglygep, i64 1032
...
Now it genereates:
do.body.i:
%idxprom.i = sext i32 %shr.i to i64
%2 = bitcast %typeD* %s to i8*
%3 = shl i64 %idxprom.i, 2
%uglygep = getelementptr i8, i8* %2, i64 1032
%uglygep7 = getelementptr i8, i8* %uglygep, i64 %3
...
For no-loop cases, the original way of generating GEPs seems to
expose more CSE cases, so we don't change the logic for no-loop
cases, and only limit our change to the specific case we are
interested in.
llvm-svn: 248420
Note: I'm am not trying to describe what "should be"; I'm only describing what is true today.
This came out of my recent question to llvm-dev titled: When can the dominator tree not contain a node for a basic block?
Differential Revision: http://reviews.llvm.org/D13078
llvm-svn: 248417
Add two new ways of accessing the unsafe stack pointer:
* At a fixed offset from the thread TLS base. This is very similar to
StackProtector cookies, but we plan to extend it to other backends
(ARM in particular) soon. Bionic-side implementation here:
https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
not emutls).
This is a re-commit of a change in r248357 that was reverted in
r248358.
llvm-svn: 248405
The BEXTR comments didn't make sense before, we may want to extend the
FP logic transform to work on vectors, and this way is more beautiful.
llvm-svn: 248404
Summary:
This is the first part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848.
When range metadata is provided, it should be used to constant fold comparisons with constant values.
Reviewers: sanjoy, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12988
llvm-svn: 248402
This patch extends llvm-dsymutil's ODR type uniquing machinery to also
resolve forward decls for types defined in clang modules.
http://reviews.llvm.org/D13038
llvm-svn: 248398
This reverts r248388 and fixes the underlying bug: hasAddr64 was initialized
in runOnMachineFunction, but runOnMachineFunction isn't ever called in
CodeGen/WebAssembly/global.ll since that testcase has no functions. The fix
here is to use AsmPrinter's getPointerSize() as needed to determine the
pointer size instead.
llvm-svn: 248394
This changes the behavior of AddAligntmentAssumptions to match its
comment. I.e, prove the asserted alignment in the context of the caller,
not the callee.
Thanks to Mehdi Amini for seeing the issue here! Also to Artur Pilipenko
who also saw a fix for the issue.
rdar://22521387
Differential Revision: http://reviews.llvm.org/D12997
llvm-svn: 248390
Invoking a function which returns an aggregate can sometimes be
transformed to return a scalar value. However, this means that we need
to create an insertvalue instruction(s) to recreate the correct
aggregate type. We achieved this by inserting an insertvalue
instruction at the invoke's normal successor. However, this is not
feasible if the normal successor uses the invoke's return value inside a
PHI node.
Instead, split the edge between the invoke and the unwind successor and
create the insertvalue instruction in the new basic block. The new
basic block's successor will be the old invoke successor which leaves
us with IR which is well behaved.
This fixes PR24906.
llvm-svn: 248387
This change allows dead store elimination to remove zero and null stores into memory freshly allocated with calloc-like function.
Differential Revision: http://reviews.llvm.org/D13021
llvm-svn: 248374
The ARM backend has some logic that only allows the fast-isel to be enabled for
subtargets where it is known to be stable. This adds a backend option to
override this and force the fast-isel to be used for any target, to allow it to
be tested.
This is an ARM-specific option, because no other backend disables the fast-isel
on a per-subtarget basis.
llvm-svn: 248369
This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead.
LLVM counterpart to D12835
Differential Revision: http://reviews.llvm.org/D13002
llvm-svn: 248368
Summary:
It is fairly common to call SE->getConstant(Ty, 0) or
SE->getConstant(Ty, 1); this change makes such uses a little bit
briefer.
I've refactored the call sites I could find easily to use getZero /
getOne.
Reviewers: hfinkel, majnemer, reames
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12947
llvm-svn: 248362