This patch implements the transformation that promotes indirect calls to
conditional direct calls when the indirect-call value profile meta-data is
available.
Differential Revision: http://reviews.llvm.org/D17864
llvm-svn: 267815
For compilations with no explicit cpu specified, this exhibits
nice gains on Silvermont, with neutral performance on big cores.
Differential Revision: http://reviews.llvm.org/D19138
llvm-svn: 267809
The callseq_end node must be glued with the TLS calls, otherwise,
the generic code will miss the uses of the returned value and will
mark it dead.
Moreover, TLSCall 64-bit pseudo must not set an implicit-use on RDI,
the pseudo uses the symbol address at this point not RDI and the
lowering will do the right thing.
llvm-svn: 267797
This was crashing llvm-objdump with -macho -objc-meta-data when trying dump a non-existent section.
So the test binary is simply created from an empty .s file compiled with: clang -arch armv7 empty.s -c
llvm-svn: 267782
transferSuccessors() would LoadCmpBB a successor of DoneBB,
whereas it should be a successor of the original MBB.
Follow-up to r266339.
Unfortunately, it's tricky to catch this in the verifier.
llvm-svn: 267779
transferSuccessors() would LoadCmpBB a successor of DoneBB, whereas
it should be a successor of the original MBB.
The testcase changes are caused by Thumb2SizeReduction, which
was previously confused by the broken CFG.
Follow-up to r266679.
Unfortunately, it's tricky to catch this in the verifier.
llvm-svn: 267778
The sink cast machinery is supposed to sink casts as close to their user
as possible. However, an EH pad is the first instruction in it's basic
block. Don't sink if the user is an EH pad.
This fixes PR27536.
llvm-svn: 267767
"inferattrs" will deduce the attribute, but it will be too late for
many optimizations. Set it ourselves when creating the call.
Differential Revision: http://reviews.llvm.org/D17598
llvm-svn: 267762
We previously disallowed interleaved load groups that may cause us to
speculatively access memory out-of-bounds (r261331). We did this by ensuring
each load group had an access corresponding to the first and last member.
Instead of bailing out for these interleaved groups, this patch enables us to
peel off the last vector iteration, ensuring that we execute at least one
iteration of the scalar remainder loop. This solution was proposed in the
review of the previous patch.
Differential Revision: http://reviews.llvm.org/D19487
llvm-svn: 267751
Added support of TTMP quads.
Reworked M0 exclusion machinery for SMRD and similar instructions
to enable usage of TTMP registers in those instructions as destinations.
Tests added.
Differential Revision: http://reviews.llvm.org/D19342
llvm-svn: 267733
Summary:
llvm-symbolizer wants to get linkage names of functions for historical
reasons. Linkage names are only recorded in the PDB for public symbols,
and the linkage name is apparently stored separately in some "public
symbol" record. We had a workaround in PDBContext which would look for
such symbols when the user requested linkage names.
However, when given an address that was truly in a private function and
public funciton, we would accidentally find nearby public symbols and
return those function names. The fix is to look for both function
symbols and public symbols and only prefer the public symbol name if the
addresses of the symbols agree.
Fixes PR27492
Reviewers: zturner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19571
llvm-svn: 267732
Summary:
So it appears that to guarantee some of the ordering requirements of a GLSL
memoryBarrier() executed in the shader, we need to emit an s_waitcnt.
(We can't use an s_barrier, because memoryBarrier() may appear anywhere in
the shader, in particular it may appear in non-uniform control flow.)
Reviewers: arsenm, mareko, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19203
llvm-svn: 267729
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.
Differential Revision: http://reviews.llvm.org/D18523
llvm-svn: 267725
Possibility to specify code of hardware register kept.
Disassemble to symbolic name, if name is known.
Tests updated/added.
Differential Revision: http://reviews.llvm.org/D19335
llvm-svn: 267724
This test also runs on e.g. ARM-native builds when the X86 backend is also
built. This test produces code for the default instruction set, even though it
is in a "X86" sub-directory. Given that this test doesn't seem to be testing
anything architecture-specific, it seems it's best to adapt the check to not
check for an architecture-dependent value (the size of the function), rather
than hard-code the test to target x86.
llvm-svn: 267722
Summary:
With the removal of support for lazy parsing of combined index summary
records (e.g. r267344), we no longer need to include the summary record
bitcode offset in the VST entries for definitions. Change the combined
index format to be similar to the per-module index format in using value
ids to cross-reference from the summary record to the VST entry (rather
than the summary record bitcode offset to cross-reference in the other
direction).
The visible changes are:
1) Add the value id to the combined summary records
2) Remove the summary offset from the combined VST records, which has
the following effects:
- No longer need the VST_CODE_COMBINED_GVDEFENTRY record, as all
combined index VST entries now only contain the value id and
corresponding GUID.
- No longer have duplicate VST entries in the case where there are
multiple definitions of a symbol (e.g. weak/linkonce), as they all
have the same value id and GUID.
An implication of #2 above is that in order to hook up an alias to the
correct aliasee based on the value id of the aliasee recorded in the
combined index alias record, we need to scan the entries in the index
for that GUID to find the one from the same module (i.e. the case where
there are multiple entries for the aliasee). But the reader no longer
has to maintain a special map to hook up the alias/aliasee.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19481
llvm-svn: 267712
Teach Value::getPointerAlignment that allocas with no explicit alignment are aligned to preferred alignment of the allocated type.
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D17569
llvm-svn: 267689
Summary:
D19403 adds a new pragma for loop distribution. This change adds
support for the corresponding metadata that the pragma is translated to
by the FE.
As part of this I had to rethink the flag -enable-loop-distribute. My
goal was to be backward compatible with the existing behavior:
A1. pass is off by default from the optimization pipeline
unless -enable-loop-distribute is specified
A2. pass is on when invoked directly from opt (e.g. for unit-testing)
The new pragma/metadata overrides these defaults so the new behavior is:
B1. A1 + enable distribution for individual loop with the pragma/metadata
B2. A2 + disable distribution for individual loop with the pragma/metadata
The default value whether the pass is on or off comes from the initiator
of the pass. From the PassManagerBuilder the default is off, from opt
it's on.
I moved -enable-loop-distribute under the pass. If the flag is
specified it overrides the default from above.
Then the pragma/metadata can further modifies this per loop.
As a side-effect, we can now also use -enable-loop-distribute=0 from opt
to emulate the default from the optimization pipeline. So to be precise
this is the new behavior:
C1. pass is off by default from the optimization pipeline
unless -enable-loop-distribute or the pragma/metadata enables it
C2. pass is on when invoked directly from opt
unless -enable-loop-distribute=0 or the pragma/metadata disables it
Reviewers: hfinkel
Subscribers: joker.eph, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D19431
llvm-svn: 267672
This reverts commit r267657, r267656, and r267655.
The test does not pass on multiple bots, I'm unsure why yet but let's unbreak them.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267664
Summary:
It is incorrect to compare TripCount (which is BECount + 1)
with extraiters (or Count) to check if we should enter unrolled
loop or not, because TripCount can potentially overflow
(when BECount is max unsigned integer).
While comparing BECount with (Count - 1) is overflow safe and
therefore correct.
Reviewer: hfinkel
Differential Revision: http://reviews.llvm.org/D19256
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 267662
This effectively adds back the extractelt combine removed by r262358:
the direct case can still occur (because x86_mmx is special, see
r262446), but it's the indirect case that's now superseded by the
generic combine.
llvm-svn: 267651
Summary:
If the linker requested to preserve a linkonce function, we should
honor this even if we drop all uses.
Reviewers: dexonsmith
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D19527
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267644
When encountering a non-local pointer, LVI would eagerly scan the block for dereferences of the given object to prove the pointer to be non null. That's all well and good, but *then* we'd go recurse through our input blocks. As a result, we could end up scanning each and every block we traverse, even if the final definition was obviously non null or we found a constant value somewhere up the chain. The previous code papered over this by using the isKnownNonNull routine from value tracking. This made the duplication less painful in the common case.
Instead, we know do the block scan only *after* we've gotten the recursive results back. This lets us stop scanning individual blocks as soon as we've determined it to be non-null in any predecessor block and use our usual merge rules to propagate that information cheaply through successor blocks. For a pointer which can be found non-null, this does strictly less work and sometimes substaintially so.
Note that the case where we *can't* prove something non-null is still the really expensive case. We end up scanning each and every block looking for a dereference and never end up finding one.
llvm-svn: 267642
the prologue.
Do not use basic blocks that have EFLAGS live-in as prologue if we need
to realign the stack. Realigning the stack uses AND instruction and this
clobbers EFLAGS.
An other alternative would have been to save and restore EFLAGS around
the stack realignment code, but this is likely inefficient.
Fixes PR27531.
llvm-svn: 267634
As pointed out by John Regehr over in http://reviews.llvm.org/D19485, LVI was being incredibly stupid about applying its transfer rules. Rather than gathering local facts from the expression itself, it was simply giving up entirely if one of the inputs was overdefined. This greatly impacts the precision of the overall analysis and makes it far more fragile as well.
This patch builds on 267609 which did the same thing for unary casts.
llvm-svn: 267620
Essentially, I was using the wrong size function. For types which were sized, but not primitive, I wasn't getting a useful size for the operand and failed an assert. I fixed this, and also added a guard that the input is a sized type. Test case is for the original mistake. I'm not sure how to actually exercise the sized type check.
llvm-svn: 267618
We need the default ratio to be sufficiently large that it triggers transforms
based on block frequency info (BFI) and plays well with the recently introduced
BranchProbability used by CGP.
Differential Revision: http://reviews.llvm.org/D19435
llvm-svn: 267615
As pointed out by John Regehr over in http://reviews.llvm.org/D19485, LVI was being incredibly stupid about applying its transfer rules. Rather than gathering local facts from the expression itself, it was simply giving up entirely if one of the inputs was overdefined. This greatly impacts the precision of the overall analysis and makes it far more fragile as well.
This patch implements only the unary operation case. Once this is in, I'll implement the same for the binary operations.
Differential Revision: http://reviews.llvm.org/D19492
llvm-svn: 267609
The destination buffer that sprintf uses is restrict qualified, we do
not need to worry about derived pointers referenced via format
specifiers.
This reverts commit r267580.
llvm-svn: 267605
The DBI stream contains a lot of bookkeeping information for other
streams. In particular it contains information about section contributions
and linked modules. This patch is a first attempt at parsing some of the
information out of the DBI stream. It currently only parses and dumps the
headers of the DBI stream, so none of the module data or section
contribution data is pulled out.
This is just a proof of concept that we understand the basic properties of
the DBI stream's metadata, and followup patches will try to extract more
detailed information out.
Differential Revision: http://reviews.llvm.org/D19500
Reviewed By: majnemer, ruiu
llvm-svn: 267585
When a block is tail-duplicated, the PHI nodes from that block are
replaced with appropriate COPY instructions. When those PHI nodes
contained use operands with subregisters, the subregisters were
dropped from the COPY instructions, resulting in incorrect code.
Keep track of the subregister information and use this information
when remapping instructions from the duplicated block.
Differential Revision: http://reviews.llvm.org/D19337
llvm-svn: 267583
sprintf doesn't read or copy the terminating null byte from it's string
operands. sprintf will append it's own after processing all of the
format specifiers.
This fixes PR27526.
llvm-svn: 267580
The Machine Instruction Verifier flagged some issues in the serialized MIR.
Adjust the input to correct them.
Fixes the remaining portion of PR27480.
llvm-svn: 267578
This is part of solving PR27344:
https://llvm.org/bugs/show_bug.cgi?id=27344
CGP should undo the SimplifyCFG transform for the same reason that earlier patches have used this
same mechanism: it's possible that passes between SimplifyCFG and CGP may be able to optimize the
IR further with a select in place.
For the TLI hook default, >99% taken or not taken is chosen as the default threshold for a highly
predictable branch. Even the most limited HW branch predictors will be correct on this branch almost
all the time, so even a massive mispredict penalty perf loss would be overcome by the win from all
the times the branch was predicted correctly.
As a follow-up, we could make the default target hook less conservative by using the SchedMachineModel's
MispredictPenalty. Or we could just let targets override the default by implementing the hook with that
and other target-specific options. Note that trying to statically determine mispredict rates for
close-to-balanced profile weight data is generally impossible if the HW is sufficiently advanced. Ie,
50/50 taken/not-taken might still be 100% predictable.
Finally, note that this patch as-is will not solve PR27344 because the current __builtin_unpredictable()
branch weight default values are 4 and 64. A proposal to change that is in D19435.
Differential Revision: http://reviews.llvm.org/D19488
llvm-svn: 267572
Support for SDWA instructions for VOP1 and VOP2 encoding.
Not done yet:
- converters for support optional operands and modifiers
- VOPC
- sext() modifier
- intrinsics
- VOP2b (see vop_dpp.s)
- V_MAC_F32 (see vop_dpp.s)
Differential Revision: http://reviews.llvm.org/D19360
llvm-svn: 267553
Handle MachineBasicBlock as a memory displacement operand in the LEA optimization pass.
Differential Revision: http://reviews.llvm.org/D19409
llvm-svn: 267551
If the linker specifically requested for a linkonce to be preserved,
we need to make sure we won't drop it even if all the uses in the
current module disappear.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267543
Summary:
Instead of using maximum IR weight as the basic block weight, this patch uses the voting algorithm to find the most likely weight for the basic block. This can effectively avoid the cases when some IRs are annotated incorrectly due to code motion of the profiled binary.
This patch also updates propagate.ll unittest to include discriminator in the input file so that it is testing something meaningful.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19301
llvm-svn: 267519
Some of the JIT tests began failing with "[llvm] r266663 - [Orc] Re-commit
r266581 with fixes for MSVC, and format cleanups." on powerpc64 big endian.
To get the buildbots running I am marking these as UNSUPPORTED for now.
If this is fixed remove the UNSUPPORTED flag "powerpc64-unknown-linux-gnu".
In r267516 I marked these as XFAIL but they succeed on some of the bots
on stage1.
llvm-svn: 267518
When passed in via filename, this test will fail if the path to the test
has the strings "f1" and "f2" in somewhere. Pass the file through stdin
to prevent test failures due to coincidences in path names.
llvm-svn: 267517
Some of the JIT tests began failing with "[llvm] r266663 - [Orc] Re-commit
r266581 with fixes for MSVC, and format cleanups." on powerpc64 big endian.
To get the buildbots running I am marking these as XFAIL for now.
If this is fixed remove the XFAIL flag "powerpc64-unknown-linux-gnu".
llvm-svn: 267516
When SimplifyCFG merges identical instructions from both sides of a diamond, it
can preserve !llvm.mem.parallel_loop_access (as it does with most of the other
metadata). There's no real data or control dependency change in this case.
llvm-svn: 267515
I really thought we were doing this already, but we were not. Given this input:
void Test(int *res, int *c, int *d, int *p) {
for (int i = 0; i < 16; i++)
res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}
we did not vectorize the loop. Even with "assume_safety" the check that we
don't if-convert conditionally-executed loads (to protect against
data-dependent deferenceability) was not elided.
One subtlety: As implemented, it will still prefer to use a masked-load
instrinsic (given target support) over the speculated load. The choice here
seems architecture specific; the best option depends on how expensive the
masked load is compared to a regular load. Ideally, using the masked load still
reduces unnecessary memory traffic, and so should be preferred. If we'd rather
do it the other way, flipping the order of the checks is easy.
The LangRef is updated to make explicit that llvm.mem.parallel_loop_access also
implies that if conversion is okay.
Differential Revision: http://reviews.llvm.org/D19512
llvm-svn: 267514
Kill-flags, which computeRegisterLiveness uses, are not reliable.
LivePhysRegs is.
Differential Revision: http://reviews.llvm.org/D19472
llvm-svn: 267495
The SparcV8 fneg and fabs instructions interestingly come only in a
single-float variant. Since the sign bit is always the topmost bit no
matter what size float it is, you simply operate on the high
subregister, as if it were a single float.
However, the layout of double-floats in the float registers is reversed
on little-endian CPUs, so that the high bits are in the second
subregister, rather than the first.
Thus, this expansion must check the endianness to use the correct
subregister.
llvm-svn: 267489
This patch is what was the "instcombine" portion of D14185, with an additional
test added (see julia_pseudovec in test/Transforms/InstCombine/insert-val-extract-elem.ll).
The patch causes instcombine to replace sequences of extractelement-insertvalue-store
that act essentially like a bitcast followed by a store.
Differential review: http://reviews.llvm.org/D14260
llvm-svn: 267482
log2(Mask) is smaller than 32, we must use the 32-bit variant because the 64-bit
variant cannot encode it. Therefore, set the subreg part accordingly.
[AArch64] Fix optimizeCondBranch logic.
The opcode for the optimized branch does not depend on the size
of the activate bits in the AND masks, but the AND opcode itself.
Indeed, we need to use a X or W variant based on the AND variant
not based on whether the mask fits into the related variant.
Otherwise, we may end up using the W variant of the optimized branch
for 64-bit register inputs!
This fixes the last make check verifier issues for AArch64: PR27479.
llvm-svn: 267465
Fix early exit from linkInModule. IRMover::move returns false on
success and true on error.
Add a few more cases of merged common linkage variables with
different sizes and alignments.
llvm-svn: 267437
Until PR27449 (https://llvm.org/bugs/show_bug.cgi?id=27449) is fixed in
clang this warning is pointless, since ASTFileSignatures will change
randomly when a module is rebuilt.
rdar://problem/25610919
llvm-svn: 267427
visitAND, when folding and (load) forgets to check which output of
an indexed load is involved, happily folding the updated address
output on the following testcase:
target datalayout = "e-m:e-i64:64-n32:64"
target triple = "powerpc64le-unknown-linux-gnu"
%typ = type { i32, i32 }
define signext i32 @_Z8access_pP1Tc(%typ* %p, i8 zeroext %type) {
%b = getelementptr inbounds %typ, %typ* %p, i64 0, i32 1
%1 = load i32, i32* %b, align 4
%2 = ptrtoint i32* %b to i64
%3 = and i64 %2, -35184372088833
%4 = inttoptr i64 %3 to i32*
%_msld = load i32, i32* %4, align 4
%zzz = add i32 %1, %_msld
ret i32 %zzz
}
Fix this by checking ResNo.
I've found a few more places that currently neglect to check for
indexed load, and tightened them up as well, but I don't have test
cases for them. In fact, they might not be triggerable at all,
at least with current targets. Still, better safe than sorry.
Differential Revision: http://reviews.llvm.org/D19202
llvm-svn: 267420
Commit r266977 was reason for failing LLVM test suite with error message: fatal error: error in backend: Cannot select: t17: i32 = rotr t2, t11 ...
llvm-svn: 267418
We didn't have logic to correctly handle CFGs where there was more than
one EH-pad successor (these are novel with WinEH).
There were situations where a register was live in one exceptional
successor but not another but the code as written would only consider
the first exceptional successor it found.
This resulted in split points which were insufficiently early if an
invoke was present.
This fixes PR27501.
N.B. This removes getLandingPadSuccessor.
llvm-svn: 267412
Summary:
This patch adds support for the X asm constraint.
To do this, we lower the constraint to either a "w" or "r" constraint
depending on the operand type (both constraints are supported on ARM).
Fixes PR26493
Reviewers: t.p.northover, echristo, rengolin
Subscribers: joker.eph, jgreenhalgh, aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D19061
llvm-svn: 267411
The current logic assumes that any constant global will never be SRA'd. I presume this is because normally constant globals can be pushed into their uses and deleted. However, that sometimes can't happen (which is where you really want SRA, so the elements that can be eliminated, are!).
There seems to be no reason why we can't SRA constants too, so let's do it.
llvm-svn: 267393
If several regions cover the same area of code, we have to restore
the combined value for that area when return from a nested region.
This patch achieves that by combining regions before calling buildSegments.
Differential Revision: http://reviews.llvm.org/D18610
llvm-svn: 267390
Summary:
This implements a new method of run-time checking the NoWrap
SCEV predicates, which should be easier to optimize and nicer
for targets that don't correctly handle multiplication/addition
of large integer types (like i128).
If the AddRec is {a,+,b} and the backedge taken count is c,
the idea is to check that |b| * c doesn't have unsigned overflow,
and depending on the sign of b, that:
a + |b| * c >= a (b >= 0) or
a - |b| * c <= a (b <= 0)
where the comparisons above are signed or unsigned, depending on
the flag that we're checking.
The advantage of doing this is that we avoid extending to a larger
type and we avoid the multiplication of large types (multiplying
i128 can be expensive).
Reviewers: sanjoy
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D19266
llvm-svn: 267389
ADD8TLS, a variant of add instruction used for initial-exec TLS,
currently accepts r0 as a source register. While add itself supports
r0 just fine, linker can relax it to a local-exec sequence, converting
it to addi - which doesn't support r0.
Differential Revision: http://reviews.llvm.org/D19193
llvm-svn: 267388
in a debug-info-bearing function has a debug location attached to it. Failure to
do so causes an "!dbg attachment points at wrong subprogram for function"
assertion failure when the inliner sets up inline scope info.
rdar://problem/25878916
This reaplies r267320 without changes after fixing an issue in the OpenMP IR
generator in clang.
llvm-svn: 267370
This corrects the MI annotations for the stack adjustment following the __chkstk
invocation. We were marking the original SP usage as a Def rather than Kill.
The (new) assigned value is the definition, the original reference is killed.
Adjust the ISelLowering to mark Kills and FrameSetup as well.
This partially resolves PR27480.
llvm-svn: 267361
As discussed on D19318, if we only demand the first element of a DIVSS/DIVSD intrinsic, then reduce to a FDIV call. This matches the existing FADD/FSUB/FMUL patterns.
llvm-svn: 267359
Split from D17490. This patch improves support for determining the demanded vector elements through SSE scalar intrinsics:
1 - demanded vector element support for unary and some extra binary scalar intrinsics (RCP/RSQRT/SQRT/FRCZ and ADD/CMP/DIV/ROUND).
2 - addss/addsd get simplified to a fadd call if we aren't interested in the pass through elements
3 - if we don't need the lowest element of a scalar operation then just use the first argument (the pass through elements) directly
We can add support for propagating demanded elements through any equivalent packed SSE intrinsics in a future patch (these wouldn't use the pass through patterns).
Differential Revision: http://reviews.llvm.org/D19318
llvm-svn: 267357
This patch improves support for determining the demanded vector elements through SSE scalar intrinsics:
1 - recognise that we only need the lowest element of the second input for binary scalar operations (and all the elements of the first input)
2 - recognise that the roundss/roundsd intrinsics use the lowest element of the second input and the remaining elements from the first input
Differential Revision: http://reviews.llvm.org/D17490
llvm-svn: 267356
Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transfered to xmm.
This fixes an issue preventing PSHUFB target shuffle masks decoding rematerialized scalar constants and also exposes the XOP VPPERM bug described in PR27472.
llvm-svn: 267343
This fixes PR22248 on s390x. The previous attempt at this was D19101,
which was before LOAD_STACK_GUARD existed. Compared to the previous
version, this always emits a rather ugly block of 4 instructions, involving
a thread pointer load that can't be shared with other potential users.
However, this is necessary for SSP - spilling the guard value (or thread
pointer used to load it) is counter to the goal, since it could be
overwritten along with the frame it protects.
Differential Revision: http://reviews.llvm.org/D19363
llvm-svn: 267340
Add tests for some missing cases to bitcode upgrade in r267296.
- DICompositeType with an 'elements:' field, which will cause it to be
involved in a cycle after the upgrade.
- A DIDerivedType that references a class in 'extraData:'.
I updated test/Bitcode/dityperefs-3.8.ll with the missing cases and
regenerated test/Bitcode/dityperefs-3.8.ll.bc.
llvm-svn: 267332
The original patch caused crashes because it could derefence a null pointer
for SelectionDAGTargetInfo for targets that do not define it.
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math
llvm-svn: 267328
in a debug-info-bearing function has a debug location attached to it. Failure to
do so causes an "!dbg attachment points at wrong subprogram for function"
assertion failure when the inliner sets up inline scope info.
rdar://problem/25878916
llvm-svn: 267320
Right now it only contains the LinkageType, but will be extended
with "hasSection", "isOptSize", "hasInlineAssembly", etc.
Differential Revision: http://reviews.llvm.org/D19404
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267319
Keeping as much as possible internal/private is
known to help the optimizer. Let's try to benefit from
this in ThinLTO.
Note: this is early work, but is enough to build clang (and
all the LLVM tools). I still need to write some lit-tests...
Differential Revision: http://reviews.llvm.org/D19103
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267317
The option to control the emission of the new relocations
is -relax-relocations (blatantly copied from GNU as).
It can't be enabled by default because it breaks relatively
recent versions of ld.bfd/ld.gold (late 2015).
llvm-svn: 267307
It seems we still have some ordering issue in the combined index
emission, but I can't figure out why right now.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267306
This is a fixup for r267304.
The test was sensitive to the path in a subtle way:
the index in memory is sorted by GUID, which are hashes
that include the source filename for local globals.
Teresa recently added a directive at the IR level, so
we can specify it here to make the test independent of
the path.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267305
Summary:
As discussed in D18298, some local globals can't
be renamed/promoted (because they have a section, or because
they are referenced from inline assembly).
To be able to detect naming collision, we need to keep around
the "GUID" using their original name without taking the linkage
into account.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19454
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267304
Summary:
We are always importing the initializer for a GlobalVariable.
So if a GlobalVariable is in the export-list, we pull in any
refs as well.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19102
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267303
Eliminate DITypeIdentifierMap and make DITypeRef a thin wrapper around
DIType*. It is no longer legal to refer to a DICompositeType by its
'identifier:', and DIBuilder no longer retains all types with an
'identifier:' automatically.
Aside from the bitcode upgrade, this is mainly removing logic to resolve
an MDString-based reference to an actualy DIType. The commits leading
up to this have made the implicit type map in DICompileUnit's
'retainedTypes:' field superfluous.
This does not remove DITypeRef, DIScopeRef, DINodeRef, and
DITypeRefArray, or stop using them in DI-related metadata. Although as
of this commit they aren't serving a useful purpose, there are patchces
under review to reuse them for CodeView support.
The tests in LLVM were updated with deref-typerefs.sh, which is attached
to the thread "[RFC] Lazy-loading of debug info metadata":
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098318.html
llvm-svn: 267296
Summary:
This removes a couple of flags added to control this behavior, and
simply keeps all value names when save-temps is specified.
Reviewers: rafael
Subscribers: llvm-commits, pcc, davide
Differential Revision: http://reviews.llvm.org/D19384
llvm-svn: 267279
Since forward references for uniqued node operands are expensive (and
those for distinct node operands are cheap due to
DistinctMDOperandPlaceholder), minimize forward references in uniqued
node operands.
Moreover, guarantee that when a cycle is broken by a distinct node, none
of the uniqued nodes have any forward references. In
ValueEnumerator::EnumerateMetadata, enumerate uniqued node subgraphs
first, delaying distinct nodes until all uniqued nodes have been
handled. This guarantees that uniqued nodes only have forward
references when there is a uniquing cycle (since r267276 changed
ValueEnumerator::organizeMetadata to partition distinct nodes in front
of uniqued nodes as a post-pass).
Note that a single uniqued subgraph can hit multiple distinct nodes at
its leaves. Ideally these would themselves be emitted in post-order,
but this commit doesn't attempt that; I think it requires an extra pass
through the edges, which I'm not convinced is worth it (since
DistinctMDOperandPlaceholder makes forward references quite cheap
between distinct nodes).
I've added two testcases:
- test/Bitcode/mdnodes-distinct-in-post-order.ll is just like
test/Bitcode/mdnodes-in-post-order.ll, except with distinct nodes
instead of uniqued ones. This confirms that, in the absence of
uniqued nodes, distinct nodes are still emitted in post-order.
- test/Bitcode/mdnodes-distinct-nodes-break-cycles.ll is the minimal
example where a naive post-order traversal would cause one uniqued
node to forward-reference another. IOW, it's the motivating test.
llvm-svn: 267278
When an operand of a distinct node hasn't been read yet, the reader can
use a DistinctMDOperandPlaceholder. This is much cheaper than forward
referencing from a uniqued node. Change
ValueEnumerator::organizeMetadata to partition distinct nodes and
uniqued nodes to reduce the overhead of cycles broken by distinct nodes.
Mehdi measured this for me; this removes most of the RAUW from the
importing step of -flto=thin, even after a WIP patch that removes
string-based DITypeRefs (introducing many more cycles to the metadata
graph).
llvm-svn: 267276
Before we printed a warning to stderr and left the actual output stream in a
mess. This tries to print a .long or .short representation of what we saw (as
if there was a data-in-code directive).
This isn't guaranteed to restore synchronization in Thumb-mode (if the invalid
instruction was supposed to be 32-bits, we may be off-by-16 for the rest of the
function). But there's no certain way to deal with that, and it's invalid code
anyway (if the data really wasn't an instruction, the user can add proper
.data_in_code directives if they care)
llvm-svn: 267250
Only one consumer (llvm-objdump) actually cared about the fact that there were
two triples. Others were actively working around the fact that the Triple
returned by getArch might have been invalid. As for llvm-objdump, it needs to
be acutely aware of both Triples anyway, so being generic in the exposed API is
no benefit.
Also rename the version of getArch returning a Triple. Users were having to
pass an unwanted nullptr to disambiguate the two, which was nasty.
The only functional change here is that armv7m and armv7em object files no
longer crash llvm-objdump.
llvm-svn: 267249
The dwo_name was added to dwo files to improve diagnostics in dwp, but
it confuses tools that attempt to load any dwo named by a dwo_name, even
ones inside dwos. Avoid this by keeping track of whether a unit is
already a dwo unit, and if so, not loading further dwos.
llvm-svn: 267241
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267231
Rather than relying on the gmlt-like data emitted into the .o/executable
which only contains the simple name of any inlined functions, use the
.dwo file if present.
Test symbolication with/without a .dwo, and the old test that was
testing behavior when no gmlt-like data was present. (I haven't included
a test of non-gmlt-like data + no .dwo (that would be akin to
symbolication with no debug info) but we could add one for completeness)
The test was simplified a bit to be a little clearer (unoptimized, force
inline, using a function call as the inlined entity) and regenerated
with ToT clang. For the no-gmlt-like-data case, I modified Clang back to
its old behavior temporarily & the .dwo file is identical so it is
shared between the two executables.
llvm-svn: 267227
This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads
a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that
value and returns it. The constant folder specifically recognizes the form of
this intrinsic and the constant initializers it may load from; if a loaded
constant initializer is known to have the form ``i32 trunc(x - %ptr)``,
the intrinsic call is folded to ``x``.
LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if ``x`` is an
``unnamed_addr`` function. However, it does not provide this guarantee for
a constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a constant.
Differential Revision: http://reviews.llvm.org/D18367
llvm-svn: 267223
Summary:
We can fold compares to false when two distinct allocations within a
function are compared for equality.
Patch by Anna Thomas!
Reviewers: majnemer, reames, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19390
llvm-svn: 267214
The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual
functions defined in other DSOs. The unnamed_addr attribute means that the
function's address is not significant, so we're allowed to substitute it
with the address of a PLT entry.
Also includes a bonus feature: addends for COFF image-relative references.
Differential Revision: http://reviews.llvm.org/D17938
llvm-svn: 267211
Extend the type canonicalization logic to work for unordered atomic loads and stores. Note that while this change itself is fairly simple and low risk, there's a reasonable chance this will expose problems in the backends by suddenly generating IR they wouldn't have seen before. Anything of this nature will be an existing bug in the backend (you could write an atomic float load), but this will definitely change the frequency with which such cases are encountered. If you see problems, feel free to revert this change, but please make sure you collect a test case.
llvm-svn: 267210
The opcode for the optimized branch does not depend on the size
of the activate bits in the AND masks, but the AND opcode itself.
Indeed, we need to use a X or W variant based on the AND variant
not based on whether the mask fits into the related variant.
Otherwise, we may end up using the W variant of the optimized branch
for 64-bit register inputs!
This fixes the last make check verifier issues for AArch64: PR27479.
llvm-svn: 267206
Summary: This change will shorten memset if the beginning of memset is overwritten by later stores.
Reviewers: hfinkel, eeckstein, dberlin, mcrosier
Subscribers: mgrang, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18906
llvm-svn: 267197
Avoid quadratic complexity in unusually large basic blocks by limiting
the size of the ready lists.
Differential Revision: http://reviews.llvm.org/D19349
llvm-svn: 267189
In the next change, I am generalizing the function
findStringMetadataForLoop and I want to make sure I don't break this.
Looks like there was no coverage for this so far.
llvm-svn: 267182
We used to simply set the kill flags to true when transforming a scalar
instruction to a vector one.
SrcScalar1 = copy SrcVector1
... = opScalar SrcScalar1
=>
SrcScalar1 = copy SrcVector1
... = opVector SrcVector1<kill>
This is obviously wrong. The proper update consists in:
1. Propagate the kill status from the copy to the new opVector
2. Reset the kill status on the copy, since the live-range of
SrcVector1 got extended.
This fixes some of the machine verifier errors for AArch64 with make check.
llvm-svn: 267180
Rather than checking both stdout and stderr simultaneously, split it into two
tests. This apparently breaks on Windows where MSVCRT does not buffer output
correctly. NFC.
Thanks to chapuni for bringing the issue to my attention!
llvm-svn: 267179
Summary: eq imply [u|s]ge and [u|s]le are true.
Remove redundant logic by implementing isImpliedFalseByMatchingCmp(Pred1, Pred2)
as isImpliedTrueByMatchingCmp(Pred1, getInversePredicate(Pred2)).
llvm-svn: 267177
Summary:
(... while still not using a PostDomTree)
The way we use isKnownNotFullPoison from SCEV today, the new CFG walking
logic will not trigger for any realistic cases -- it will kick in only
for situations where we could have merged the contiguous basic blocks
anyway[0], since the poison generating instruction dominates all of its
non-PHI uses (which are the only uses we consider right now).
However, having this change in place will allow a later bugfix to break
fewer llvm-lit tests.
[0]: i.e. cases where block A branches to block B and B is A's only
successor and A is B's only predecessor.
Reviewers: broune, bjarke.roune
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19212
llvm-svn: 267175
Summary: [u|s]gt and [u|s]lt imply [u|s]ge and [u|s]le are true, respectively.
I've simplified the existing tests and added additional tests to cover the new
cases mentioned above. I've also added tests for all the cases where the
first compare doesn't imply anything about the second compare.
llvm-svn: 267171
A followup commit will replace these tests with simplified and more inclusive
tests. The diff is unreadable if this were to be done in a single commit.
llvm-svn: 267170
- Switch few loops to range-based for loops
- Fix nop insertion at the end of BB
- Fix formatting
- Check for endpgm
Differential Revision: http://reviews.llvm.org/D19380
llvm-svn: 267167
We take the intersection of overflow flags while CSE'ing.
This permits us to consider two instructions with different overflow
behavior to be replaceable.
llvm-svn: 267153
Summary:
When generating assembly using -m16 we must explicitly mark it as
16-bit. Emit .code16 at beginning of file. Fixes wrong results when
using -fno-integrated-as.
Reviewers: dwmw2
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19392
llvm-svn: 267152
When targetting MIPS64R6 some of the patterns for select were guarded by a
broken predicate. The predicate was supposed to test if a constant value
could fit in a 16 bit zero-extended field. Instead the value was tested to
fit in a 16 bit sign-extended field. For negative constants of native word
width this resulted in wrong code generation.
Reviewers: vkalintiris, dsanders
Differential Review: http://reviews.llvm.org/D19378
llvm-svn: 267151
r267049 broke multiple buildbots (e.g. clang-cmake-mips, and clang-x86_64-linux-selfhost-modules) which the follow-ups have not yet resolved and this is preventing subsequent committers from being notified about additional failures on the affected buildbots.
llvm-svn: 267148
Summary:
When optimizing PHIs which have inputs floating point binary
operators, we preserve all IR flags except the fast math
flags.
This change removes the logic which tracked some of the IR flags
(no wrap, exact) and replaces it by doing an and on the IR flags of
all inputs to the PHI - which will also handle the fast math
flags.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19370
llvm-svn: 267139
Summary:
rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS
operations during DAG combine. This generates better code for truncate, so cost
of truncate needs to be changed but looks like it got changed only in SSE2 table
Whereas this change is also applicable for SSE4.1, so the cost of truncate needs
to be changed for that as well. Cost of “TRUNCATE v16i32 to v16i8” & “TRUNCATE
v16i16 to v16i8” should be same in SSE4.1 & SSE2 table. Removing their cost from
SSE4.1, so it will fall back to SSE2.
Reviewers: Simon Pilgrim
llvm-svn: 267123
EarlyCSE had inconsistent behavior with regards to flag'd instructions:
- In some cases, it would pessimize if the available instruction had
different flags by not performing CSE.
- In other cases, it would miscompile if it replaced an instruction
which had no flags with an instruction which has flags.
Fix this by being more consistent with our flag handling by utilizing
andIRFlags.
llvm-svn: 267111
Summary:
This intrinsic returns true if the current thread belongs to a live pixel
and false if it belongs to a pixel that we are executing only for derivative
computation. It will be used by Mesa to implement gl_HelperInvocation.
Note that for pixels that are killed during the shader, this implementation
also returns true, but it doesn't matter because those pixels are always
disabled in the EXEC mask.
This unearthed a corner case in the instruction verifier, which complained
about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but
correct code, so make the verifier accept it as such.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19191
llvm-svn: 267102
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math
llvm-svn: 267098
This test used to write a .s file until r266971 fixed that. But on most bots,
the .s file still exists. Add an rm statement to clean up the bots. In a few
days, this statement can go away again.
llvm-svn: 267095
WIN__DBZCHK will insert a CBZ instruction into the stream. This instruction
reserves 3 bits for the condition register (rn). As such, we must ensure that
we restrict the register to a low register. Use the tGPR class instead of GPR
to ensure that this is properly constrained. In debug builds, we would attempt
to use lr as a condition register which would silently get truncated with no
hint that the register selection was incorrect.
llvm-svn: 267080
We'd disabled them on x86 because back in the early days some host tools
couldn't handle the new load commands. This no longer holds: anyone capable of
deploying Clang should be able to deploy its copies of ar/ranlib/etc.
rdar://25254790
llvm-svn: 267075
Summary:
Adds an instrumentation pass for the new EfficiencySanitizer ("esan")
performance tuning family of tools. Multiple tools will be supported
within the same framework. Preliminary support for a cache fragmentation
tool is included here.
The shared instrumentation includes:
+ Turn mem{set,cpy,move} instrinsics into library calls.
+ Slowpath instrumentation of loads and stores via callouts to
the runtime library.
+ Fastpath instrumentation will be per-tool.
+ Which memory accesses to ignore will be per-tool.
Reviewers: eugenis, vitalybuka, aizatsky, filcab
Subscribers: filcab, vkalintiris, pcc, silvas, llvm-commits, zhaoqin, kcc
Differential Revision: http://reviews.llvm.org/D19167
llvm-svn: 267058
Summary:
If we know that the pointer allocated within a function does not escape,
we can fold away comparisons that are done with global pointers
Patch by Anna Thomas!
Reviewers: reames, majnemer, sanjoy
Subscribers: mgrang, mcrosier, majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D19276
llvm-svn: 267035
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.
The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267022
This builds on 266999 which made FindAvailableValue do the right thing. Tests included show the newly enabled transforms and those which disabled either due to conservatism or correctness requirements.
llvm-svn: 267006
Before this fix, DILexicalBlockFile entries were skipped only in some cases and were not in other cases.
Differential Revision: http://reviews.llvm.org/D18724
llvm-svn: 267004
This change adds a couple of test cases to make sure FindAvailableLoadedValue does the right thing. At the moment, the code added is dead, but separating it makes follow on changes far more obvious.
llvm-svn: 266999
AArch64InstrInfo::optimizeCompareInstr has bug PR27158 which causes generation of incorrect code.
A compare instruction is substituted with another instruction which does not
produce the same flags as the original compare instruction.
This patch contains:
1. Fix of the bug.
2. A regression test in MIR.
3. A new test to check that SUBS is replaced by SUB.
Differential Revision: http://reviews.llvm.org/D18838
llvm-svn: 266969
This help to streamline the process of handling importing since
we don't need to special case alias everywhere: just like
linkonce_odr function, make sure at least one alias is emitted
by turning it weak.
Differential Revision: http://reviews.llvm.org/D19308
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266958
Summary:
`llvm.guard(false)` always bails out of the current compilation unit, so
we can prune any control flow following it.
Reviewers: hfinkel, pcc, reames
Subscribers: majnemer, reames, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19245
llvm-svn: 266955
Summary:
The function importer already decided what symbols need to be pulled
in. Also these magically added ones will not be in the export list
for the source module, which can confuse the internalizer for
instance.
Reviewers: tejohnson, rafael
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19096
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266948
Emit metadata nodes in post-order. The iterative algorithm from r266709
failed to maintain this property. After understanding my mistake, it
wasn't too hard to write a test with llvm-bcanalyzer (and I've actually
made this change once before: see r220340).
This also reverts the "noisy" testcase change from r266709. That should
have been more of a red flag :/.
Note: The same bug crept into the ValueMapper in r265456. I'm still
working on the fix.
llvm-svn: 266947
No matter what value you OR in to A, the result of (or A, B) is going to be UGE A. When A and B are positive, it's SGE too. If A is negative, OR'ing a value into it can't make it positive, but can increase its value closer to -1, therefore (or A, B) is SGE A. Working through all possible combinations produces this truth table:
```
A is
+, -, +/-
F F F + B is
T F ? -
? F ? +/-
```
The related optimizations are flipping the 'slt' for 'sge' which always NOTs the result (if the result is known), and swapping the LHS and RHS while swapping the comparison predicate.
There are more idioms left to implement (aren't there always!) but I've stopped here because any more would risk becoming unreasonable for reviewers.
llvm-svn: 266939
Produce another specific error message for a malformed Mach-O file when a symbol’s
string index is past the end of the string table. The existing test case in test/Object/macho-invalid.test
for macho-invalid-symbol-name-past-eof now reports the error with the message indicating
that a symbol at a specific index has a bad sting index and that bad string index value.
Again converting interfaces to Expected<> from ErrorOr<> does involve
touching a number of places. Where the existing code reported the error with a
string message or an error code it was converted to do the same. There is some
code for this that could be factored into a routine but I would like to leave that for
the code owners post-commit to do as they want for handling an llvm::Error. An
example of how this could be done is shown in the diff in
lib/ExecutionEngine/RuntimeDyld/RuntimeDyldImpl.h which had a Check() routine
already for std::error_code so I added one like it for llvm::Error .
Also there some were bugs in the existing code that did not deal with the
old ErrorOr<> return values. So now with Expected<> since they must be
checked and the error handled, I added a TODO and a comment:
“// TODO: Actually report errors helpfully” and a call something like
consumeError(NameOrErr.takeError()) so the buggy code will not crash
since needed to deal with the Error.
Note there fixes needed to lld that goes along with this that I will commit right after this.
So expect lld not to built after this commit and before the next one.
llvm-svn: 266919
Summary:
This is done for consistency with asan-use-after-return.
I see no other users than tests.
Reviewers: aizatsky, kcc
Differential Revision: http://reviews.llvm.org/D19306
llvm-svn: 266906
Differentiate between word and subword memory operations as they take different
amount of cycles to complete. This just adds a basic model of the subword
latency to the scheduler.
llvm-svn: 266898
This restores r266871 with a fix for gold tests relying on the value
names, when using a release compiler, by adding a way to disable the
default discarding. Update affected tests to use the new mechanism so
that value names are preserved as expected, regardless of how the
compiler was built.
llvm-svn: 266881
Summary:
This patch prevents importing from (and therefore exporting from) any
module with a "llvm.used" local value. Local values need to be promoted
and renamed when importing, and their presense on the llvm.used variable
indicates that there are opaque uses that won't see the rename. One such
example is a use in inline assembly.
See also the discussion at:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098047.html
As part of this, move collectUsedGlobalVariables out of Transforms/Utils
and into IR/Module so that it can be used more widely. There are several
other places in LLVM that used copies of this code that can be cleaned
up as a follow on NFC patch.
Reviewers: joker.eph
Subscribers: pcc, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18986
llvm-svn: 266877
This reverts commit r266871. Setting the default based on the NDEBUG
flag is causing test failures. Need to figure out whether to change this
approach or update tests.
llvm-svn: 266872
Summary:
Applies Mehdi's optimization (r263086) to disable value names other than
for GlobalValues to LTO/ThinLTO performed via the gold-plugin, in the
same manner as it is applied in libLTO.
Reviewers: rafael, joker-eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19269
llvm-svn: 266871
Add ParseAMDGPURegister which can be invoked recursively for parsing lists.
Rename getRegForName to getSpecialRegForName.
Support legacy SP3 register list syntax: [s2,s3,s4,s5] or [flat_scratch_lo,flat_scratch_hi].
Add 64-bit registers TBA, TMA where missing.
Add some tests.
Differential Revision: http://reviews.llvm.org/D19163
llvm-svn: 266865
This linkage is *not* intended to express that a declaration refers
to a weak symbol, but that the symbol might not be present at link
time. I don't believe it was the intent.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266856
Because lowering of CMP_SWAP_64 occurs during type legalization, there can be
i64 types produced by more than just a BUILD_PAIR or similar. My initial tests
used just incoming function args.
llvm-svn: 266828
Summary:
This property is used to mark an intrinsic that only writes to memory, but
neither reads from memory nor has other side effects.
An example where this is useful is the llvm.amdgcn.buffer.store.format.*
intrinsic, which corresponds to a store instruction that goes through a special
buffer descriptor rather than through a plain pointer.
With this property, the intrinsic should still be handled as having side
effects at the LLVM IR level, but machine scheduling can make smarter
decisions.
Reviewers: tstellarAMD, arsenm, joker.eph, reames
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18291
llvm-svn: 266826
Summary:
The added testcase, which triggered this, was derived from a shader-db case
via bugpoint. A separate question is why scalar branching wasn't used.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19208
llvm-svn: 266825
Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that
just return the thread pointer. I have a pending patch that does the same
for SystemZ (D19054), and there are many more targets that could benefit
from one.
This patch merges the ARM and AArch64 intrinsics into a single target
independent one that will also be used by subsequent targets.
Differential Revision: http://reviews.llvm.org/D19098
llvm-svn: 266818
With this change, ideally IR pass can always generate llvm.stackguard
call to get the stack guard; but for now there are still IR form stack
guard customizations around (see getIRStackGuard()). Future SSP
customization should go through LOAD_STACK_GUARD.
There is a behavior change: stack guard values are not CSEed anymore,
since we should never reuse the value in case that it has been spilled (and
corrupted). See ssp-guard-spill.ll. This also cause the change of stack
size and codegen in X86 and AArch64 test cases.
Ideally we'd like to know if the guard created in llvm.stackprotector() gets
spilled or not. If the value is spilled, discard the value and reload
stack guard; otherwise reuse the value. This can be done by teaching
register allocator to know how to rematerialize LOAD_STACK_GUARD and
force a rematerialization (which seems hard), or check for spilling in
expandPostRAPseudo. It only makes sense when the stack guard is a global
variable, which requires more instructions to load. Anyway, this seems to go out
of the scope of the current patch.
llvm-svn: 266806
* Add lowering for SETCCE i32.
* Add test to check lowering of i64 compares uses SETCCE expansion (outside of EQ and NE).
* Fix select.ll test and immediate form selection for RI operations.
llvm-svn: 266802
Add a new method, DICompositeType::buildODRType, that will create or
mutate the DICompositeType for a given ODR identifier, and use it in
LLParser and BitcodeReader instead of DICompositeType::getODRType.
The logic is as follows:
- If there's no node, create one with the given arguments.
- Else, if the current node is a forward declaration and the new
arguments would create a definition, mutate the node to match the
new arguments.
- Else, return the old node.
This adds a missing feature supported by the current DITypeIdentifierMap
(which I'm slowly making redudant). The only remaining difference is
that the DITypeIdentifierMap has a "the-last-one-wins" rule, whereas
DICompositeType::buildODRType has a "the-first-one-wins" rule.
For now I'm leaving behind DICompositeType::getODRType since it has
obvious, low-level semantics that are convenient for unit testing.
llvm-svn: 266786
Simplify the test logic a little, sharing logic between the two linking
directions by specifying -check-prefix multiple times. Now it's more
obvious what's hte same and different between the two directions, and
there is less CHECK duplication. This is a prep for expanding the test.
llvm-svn: 266773
This patch improves SimplifyCFG to catch cases like:
if (a < b) {
if (a > b) <- known to be false
unreachable;
}
Phabricator Revision: http://reviews.llvm.org/D18905
llvm-svn: 266767
Summary:
There is no reason to have a weak reference because the external
definition will be weak.
Reviewers: rafael
Subscribers: llvm-commits, tejohnson
Differential Revision: http://reviews.llvm.org/D19267
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266750
Lift the API for debug info ODR type uniquing up a layer. Instead of
clients managing the map directly on the LLVMContext, add a static
method to DICompositeType called getODRType and handle the map in the
background. Also adds DICompositeType::getODRTypeIfExists, so far just
for convenience in the unit tests.
This simplifies the logic in LLParser and BitcodeReader. Because of
argument spam there are actually a few more lines of code now; I'll see
if I come up with a reasonable way to clean that up.
llvm-svn: 266742
Using VPERMQ/VPERMPD allows memory folding of the (repeated) input where VINSERTI128/VINSERTF128 can not.
Differential Revision: http://reviews.llvm.org/D19228
llvm-svn: 266728
Summary:
The `"patchable-function"` attribute can be used by an LLVM client to
influence LLVM's code generation in ways that makes the generated code
easily patchable at runtime (for instance, to redirect control).
Right now only one patchability scheme is supported,
`"prologue-short-redirect"`, but this can be expanded in the future.
Reviewers: joker.eph, rnk, echristo, dberris
Subscribers: joker.eph, echristo, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19046
llvm-svn: 266715
Use a worklist instead of recursing through MDNode operands in
ValueEnumerator. The actual record output order has changed slightly,
but otherwise there's no functionality change.
I had to update test/Bitcode/metadata-function-blocks.ll. I renumbered
nodes so they continue to match the implicit record ids.
llvm-svn: 266709
When we suppress linkage names, for a non-inlined subprogram the name
can still be found in the object-file symbol table, because we have
the code address of the subprogram. This is not necessarily the case
for an inlined subprogram, so we still want to emit the linkage name
in the DWARF. Put this on the abstract-origin DIE because it's common
to all inlined instances.
Differential Revision: http://reviews.llvm.org/D18706
llvm-svn: 266692
The fast register-allocator cannot cope with inter-block dependencies without
spilling. This is fine for ldrex/strex loops coming from atomicrmw instructions
where any value produced within a block is dead by the end, but not for
cmpxchg. So we lower a cmpxchg at -O0 via a pseudo-inst that gets expanded
after regalloc.
Fortunately this is at -O0 so we don't have to care about performance. This
simplifies the various axes of expansion considerably: we assume a strong
seq_cst operation and ensure ordering via the always-present DMB instructions
rather than v8 acquire/release instructions.
Should fix the 32-bit part of PR25526.
llvm-svn: 266679
Summary:
Calls to @llvm.experimental.deoptimize are expected to "never execute",
so optimize them as such.
Reviewers: chandlerc
Subscribers: junbuml, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19095
llvm-svn: 266654
Also,
- Skip pass if machine module does not have debug info
- Minor comment changes
- Added test
Differential Revision: http://reviews.llvm.org/D19079
llvm-svn: 266626
The root of the problem was that findMainViewFileID(File, Function)
could return some ID for any given file, even though that file
was not the main file for that function.
This patch ensures that the result of this function is conformed
with the result of findMainViewFileID(Function).
This commit reapplies r266436, which was reverted by r266458,
with the .covmapping file serialized in v1 format.
Differential Revision: http://reviews.llvm.org/D18787
llvm-svn: 266620
This reverts commit r266477.
This commit introduces cyclic dependency. This commit has "Analysis" depend on "ProfileData",
while "ProfileData" depends on "Object", which depends on "BitCode", which
depends on "Analysis".
llvm-svn: 266619
Order should match the sp3 syntax, where destination (simm16 denoting the hwreg) is coming first.
Differential Revision: http://reviews.llvm.org/D19161
llvm-svn: 266617
Summary:
When clang is given -save-temps or -via-file-asm, any inline assembly in
the source is parsed twice. Once by the compiler, and again by the
assembler. We must take care to ensure that this doesn't lead to
double-filling delay slots.
Reviewers: sdardis, vkalintiris
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D19166
llvm-svn: 266608
This required changing several places to print VT enums as strings instead of raw ints since the proper method to use to print became ambiguous. This is probably an improvement anyway.
This also appears to save ~8K from an x86 self host build of llc.
llvm-svn: 266562
There's a hole in the verifier right now: if a module has no compile
units, it never checks that all the string-based DITypeRefs get
resolved. As a result, this testcase didn't fail the verifier, even
there were references to `!"has-uuid"` instead of `!"uuid"` (the former
was a composite type's 'name:' field, the latter its 'identifier:'
field).
I'm currently working on removing string-based type refs entirely, and
this testcase started failing (because the upgrade script can't resolve
the type refs). Rather than fixing the (about-to-be-removed) hole in
the verifier, I'm just going to fix the test so that my upgrade script
handles it.
llvm-svn: 266553
I accidentally replaced `mayBeOverridden` with `!isInterposable`.
Remove the negation and add a test case that would've caught this.
Many thanks to Håkan Hjort for spotting this!
llvm-svn: 266551
Rather than relying on the structural equivalence of DICompositeType to
merge type definitions, use an explicit map on the LLVMContext that
LLParser and BitcodeReader consult when constructing new nodes.
Each non-forward-declaration DICompositeType with a non-empty
'identifier:' field is stored/loaded from the type map, and the first
definiton will "win".
This map is opt-in: clients that expect ODR types from different modules
to be merged must call LLVMContext::ensureDITypeMap.
- Clients that just happen to load more than one Module in the same
LLVMContext won't magically merge types.
- Clients (like LTO) that want to continue to merge types based on ODR
identifiers should opt-in immediately.
I have updated LTOCodeGenerator.cpp, the two "linking" spots in
gold-plugin.cpp, and llvm-link (unless -disable-debug-info-type-map) to
set this.
With this in place, it will be straightforward to remove the DITypeRef
concept (i.e., referencing types by their 'identifier:' string rather
than pointing at them directly).
llvm-svn: 266549
Merge members that are describing the same member of the same ODR type,
even if other bits differ. If the file or line differ, we don't care;
if anything else differs, it's an ODR violation (and we still don't
really care).
For DISubprogram declarations, this looks at the LinkageName and Scope.
For DW_TAG_member instances of DIDerivedType, this looks at the Name and
Scope. In both cases, we know that the Scope follows ODR rules if it
has a non-empty identifier.
llvm-svn: 266548
Split up the long RUN and clarify the CHECK lines:
- Explicitly confirm there are no other subprograms inside of "A".
- Remove checks for "bar" and "baz", which were just implicitly
checking that there were no other subprograms inside of "A".
This prepares for adding a RUN line which links the two files in the
opposite direction.
llvm-svn: 266543
To be able to work accurately on the reference graph when taking
decision about internalizing, promoting, renaming, etc. We need
to have the alias information explicit.
Differential Revision: http://reviews.llvm.org/D18836
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266517
At the moment almost every lit.site.cfg.in contains two lines comment:
## Autogenerated by LLVM/Clang configuration.
# Do not edit!
The patch adds variable LIT_SITE_CFG_IN_HEADER, that is replaced from
configure_lit_site_cfg with the note and some useful information.
llvm-svn: 266515
This resolves more frame indexes early and folds
the immediate offsets into the scratch mubuf instructions.
This cleans up a lot of the mess that's currently emitted,
such as emitting add 0s and repeatedly initializing the same
register to 0 when spilling.
llvm-svn: 266508
Because HoistSpillHelper::hoistAllSpills is called in postOptimization, before the
patch we didn't want LiveRangeEdit::eliminateDeadDefs to call splitSeparateComponents
and generate unassigned new vregs. However, skipping splitSeparateComponents will make
verify-machineinstrs unhappy, so I remove the early return, and use
HoistSpillHelper::LRE_DidCloneVirtReg to assign physreg/stackslot for those new vregs.
In addition, some code reorganization to make class HoistSpillHelper privately inheriting
from LiveRangeEdit::Delegate possible. This is to be consistent with class RAGreedy and
class RegisterCoalescer.
Differential Revision: http://reviews.llvm.org/D19142
llvm-svn: 266489
Allow explicit section for indirectly called functions in cfi-icall.
Jumptables for functions in the same type class must be contiguous, so they
always go to the default text section.
Fixes PR25079.
llvm-svn: 266486
After r245976, LLVM will skip the last bit test case if knows it will always be
true. However, we would still erroneously update PHI nodes with incoming values
from the MBB that would perform the final bit test, causing -verify-machineinstrs
to fail.
llvm-svn: 266479
Adds an interface to get ProfileSummary for a module and makes InlineCost use ProfileSummary to get max function count.
Differential Revision: http://reviews.llvm.org/D18622
llvm-svn: 266477
Divisions by a constant can be converted into multiplies which are usually
cheaper, but this isn't possible if the constant gets separated (particularly
in loops). Fix this by telling ConstantHoisting that the immediate in a DIV is
cheap.
I considered making the check generic, but neither AArch64 (strangely) nor x86
showed any benefit on the tests I had.
llvm-svn: 266464
This improves AA in the MI schduler when reason about paired instructions.
Phabricator Revision: http://reviews.llvm.org/D17098
PR26358
llvm-svn: 266462
InstCombine wants to optimize compares of calls to fabs with zero.
However, we didn't have the necessary legality checking to verify that
the function call had the same behavior as fabs.
llvm-svn: 266452
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.
Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.
Motivation
----------
Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.
We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.
Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.
http://reviews.llvm.org/D19034
<rdar://problem/25256815>
llvm-svn: 266446
This is almost identical to:
http://reviews.llvm.org/rL264527
This doesn't solve PR27344; it just allows the profile weights to survive.
To solve the bug, we need to use the profile weights in the backend.
llvm-svn: 266442
Summary:
Without MMOs, the callee-save load/store instructions were treated as
volatile by the MI post-RA scheduler and AArch64LoadStoreOptimizer.
Reviewers: t.p.northover, mcrosier
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17661
llvm-svn: 266439
[PPC] Previously when casting generic loads to LXV2DX/ST instructions we
would leave the original load return type in place allowing for an
assertion failure when we merge two equivalent LXV2DX nodes with
different types.
This fixes PR27350.
Reviewers: nemanjai
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19133
llvm-svn: 266438
Perform store clustering just like load clustering. This change add
StoreClusterMutation in machine-scheduler. To control StoreClusterMutation,
added enableClusterStores() in TargetInstrInfo.h. This is enabled only on
AArch64 for now.
This change also add support for unscaled stores which were not handled in
getMemOpBaseRegImmOfs().
llvm-svn: 266437
The root of the problem was that findMainViewFileID(File, Function)
could return some ID for any given file, even though that file
was not the main file for that function.
This patch ensures that the result of this function is conformed
with the result of findMainViewFileID(Function).
Differential Revision: http://reviews.llvm.org/D18787
llvm-svn: 266436
Summary:
In the added test-case, the atomic instruction feeds into a non-machine
CopyToReg node which hasn't been selected yet, so guard against
non-machine opcodes here.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19043
llvm-svn: 266433
Summary:
This lets us add this pass to the IR pass manager unconditionally; it
will simply not do anything on targets without branch divergence.
Reviewers: tra
Subscribers: llvm-commits, jingyue, rnk, chandlerc
Differential Revision: http://reviews.llvm.org/D18625
llvm-svn: 266398
If the size of an AST entry changes, we also need to make sure we perform
necessary alias set merges, as the new size may overlap pointers in other sets.
We happen to run into this with memset, because memset allows an entry for a
i8* pointer to have a decidedly non-i8 size.
This fixes PR27262.
Differential Revision: http://reviews.llvm.org/D18939
llvm-svn: 266381
Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.
This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or have enough allowance
flags supporting lack of precision (ex. -ffast-math, Darwin).
For that, the target description now has a method which tells us if the
vectorizer is allowed to handle FP math without falling into unsafe
representations, plus a check on every FP instruction in the candidate loop
to check for the safety flags.
This commit makes LLVM behave like GCC with respect to ARM NEON support, but
it stops short of fixing the underlying problem: sub-normals. Neither GCC
nor LLVM have a flag for allowing sub-normal operations. Before this patch,
GCC only allows it using unsafe-math flags and LLVM allows it by default with
no way to turn it off (short of not using NEON at all).
As a first step, we push this change to make it safe and in sync with GCC.
The second step is to discuss a new sub-normal's flag on both communitues
and come up with a common solution. The third step is to improve the FastMath
flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.
Fixes PR16275.
llvm-svn: 266363
https://llvm.org/bugs/show_bug.cgi?id=27105
We can check if all bits outside of a constant mask are set with a
single constant.
As noted in the bug report, although this form should be considered the
canonical IR, backends may want to transform this into an 'andn' / 'andc'
comparison against zero because that could be a single machine instruction.
Differential Revision: http://reviews.llvm.org/D18842
llvm-svn: 266362
Summary:
This adds the necessary target code to be able to run the ir translator.
Lowering function arguments and returns is a nop and there is no support
for RegBankSelect.
Reviewers: arsenm, qcolombet
Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits
Differential Revision: http://reviews.llvm.org/D19077
llvm-svn: 266356
Summary:
If a PHI has an incoming undef, we can pretend that it is equal to one
non-undef, non-self incoming value.
This is particularly relevant in combination with the StructurizeCFG
pass, which introduces PHI nodes with undefs. Previously, this lead to
branch conditions that were uniform before StructurizeCFG to become
non-uniform afterwards, which confused the SIAnnotateControlFlow
pass.
This fixes a crash when Mesa radeonsi compiles a shader from
dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex
Reviewers: arsenm, tstellarAMD, jingyue
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19013
llvm-svn: 266347
Summary:
This pass is unnecessary and overly conservative. It was motivated by
situations like
def %vreg0:SGPR_32
...
if-block:
..
def %vreg1:SGPR_32
...
else-block:
...
use %vreg0:SGPR_32
...
and similar situations with uses after the non-uniform control flow, where
we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
even though in the original, thread/workitem-based CFG, it looks like the
live ranges of these registers do not overlap.
However, by the time register allocation runs, we have moved to a wave-based
CFG that accurately represents the fact that the wave may run through both
the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
overlap even without the SIFixSGPRLiveRanges pass.
In addition to proving this change correct, I have tested it with Piglit
and a small number of other tests.
Reviewers: arsenm, tstellarAMD
Subscribers: MatzeB, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19041
llvm-svn: 266345
FastRegAlloc works only at the basic-block level and spills all live-out
registers. Unfortunately for a stack-based cmpxchg near the spill slots, this
can perpetually clear the exclusive monitor, which means the cmpxchg will never
succeed.
I believe the only way to handle this within LLVM is by expanding the loop
post-regalloc. We don't want this in general because it severely limits the
optimisations that can be done, so we limit this to -O0 compilations.
It's an ugly hack, and about the one good point in the whole mess is that we
can treat all cmpxchg operations in the most naive way possible (seq_cst, no
clrex faff) without affecting correctness.
Should fix PR25526.
llvm-svn: 266339
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD.
This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions.
Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.
Reviewers: mareko, arsenm, tstellarAMD, nhaehnle
Subscribers: FireBurn, kerberizer, llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D18340
Patch By: Bas Nieuwenhuizen
llvm-svn: 266337
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders though,
The register should be the next SGPR after all user and system SGPR's.
We use that Mesa adds arguments for all input and system SGPR's and
take the next available SGPR for the scratch wave offset register.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewers: mareko, arsenm, nhaehnle, tstellarAMD
Subscribers: qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18941
Patch By: Bas Nieuwenhuizen
llvm-svn: 266336
Summary:
Add a print method to Predicated Scalar Evolution which prints all interesting
transformations done by PSE.
Loop Access Analysis will now print this as part of the analysis output.
We now use this to check the exact expression transformations that were done
by PSE in LAA.
The additional checking also acts as white-box testing for the getAsAddRec method.
Reviewers: anemet, sanjoy
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18792
llvm-svn: 266334
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.
This patch was previous committed as r266055 as seemed to have caused some spurious
test failures. They did not reappear after further local testing.
llvm-svn: 266301
Summary:
The only difference between the removed tests and the pre-existing
ones, is the materialization of the zero constant, which shouldn't
matter for these cases.
Reviewers: dsanders, sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D18693
llvm-svn: 266285
The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only
one of the inputs is NaN: -NUM will return the non-NaN argument while
-NAN would return NaN.
It is desirable to lower to @llvm.{min,max}num to -NAN if they don't
have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the
-NAN semantics.
N.B. Of course, it is only safe to do this if the intrinsic call is
marked nnan.
llvm-svn: 266279
At some point, ARM stopped getting any benefit from ConstantHoisting because
the pass called a different variant of getIntImmCost. Reimplementing the
correct variant revealed some problems, however:
+ ConstantHoisting was modifying switch statements. This is simply invalid,
the cases must remain integer constants no matter the notional cost.
+ ConstantHoisting was mangling alloca instructions in the entry block. These
should be handled by FrameLowering, so constants actually have a cost of 0.
Worse, the resulting bitcasts meant they became dynamic allocas.
rdar://25707382
llvm-svn: 266260
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D18901
llvm-svn: 266253
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D18902
llvm-svn: 266252
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D19007
llvm-svn: 266251
And update the existing test cases in test/Object/macho-invalid.test
to use llvm-objdump with the -macho option to produce these
error messages and stop producing the generic "Invalid data
was encountered while parsing the file" message.
Working from the beginning of the file, if the mach header is too large for
the size of the file and then if the load commands that follow extend past
the end of the file these two errors now generate correct error messages.
Both of these have existing test cases in test/Object/macho-invalid.test .
But the first with macho-invalid-header it will never trigger the error message
"mach header extends past the end of the file" using any of the llvm tools as
they all use identify_magic() which rejects files with the correct magic number
that are too small in size. So I tested this by hacking that code and seeing the
error message down in parseHeader() really does happen. So in case there
is ever code in llvm that directly calls createMachOObjectFile() this error
message will be correctly produced.
The second error message of "load commands extends past the end of the file"
is triggered by a number of existing tests cases in test/Object/macho-invalid.test .
Also other tests trigger different error messages now like "ilocalsym plus
nlocalsym in LC_DYSYMTAB load command extends past the end of the
symbol table".
There are two existing test cases that still get the "Invalid data was encountered ..."
error messages that I will tackle next. But they will involve a bit of pluming an
Expect<...> up through the call stack and I want to do those as separate changes.
FYI, for those test cases that were trying to test specific errors that now get
different errors I’ll fix those in follow on changes and create new test cases
for those so they test the error they were meant to test.
llvm-svn: 266248
Since we can't emit diagnostics for missing "jmp 1f" labels until the end of
the file, we need to be able to restore the context used to calculate
file/line. This is basically the "# line file" directive that's being used at
the time the expression is seen.
rdar://25706972
llvm-svn: 266238
LLVM optimization passes may reduce a profiled target expression
to a constant. Removing runtime calls at such instrumentation points
would help speedup the runtime of the instrumented program.
llvm-svn: 266229
This patch corresponds to review:
http://reviews.llvm.org/D17850
This patch implements the following instructions:
cmprb, cmpeqb, cnttzw, cnttzw., cnttzd, cnttzd.
llvm-svn: 266228
Disable LDP/STP for quads on Exynos M1 as they are not as efficient as pairs
of regular LDR/STR.
Patch by Abderrazek Zaafrani <a.zaafrani@samsung.com>.
llvm-svn: 266223
This patch fixes a bug (PR26827) when using anti-aliasing in store
merging. This sets the chain users of the component stores to point to
the new store instead of the component stores chain parent.
Reviewers: jyknight
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18909
llvm-svn: 266217
Summary:
To be able to work accurately on the reference graph when taking decision
about internalizing, promoting, renaming, etc. We need to have the alias
information explicit.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18836
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266214
Tests added along with implemented feature.
Note that there is a small leftover of unecessary MI sheduling issue
(more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix
the false regression.
TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.
Differential Revision: http://reviews.llvm.org/D17825
llvm-svn: 266205
Summary:
This is a special case for MIPS64 because the architecture requires
properly 32-bit sign-extended values in the register containers.
Additionaly, we merge consecutive trunc + AssertZExt nodes in order
to avoid unnecessary sign-extensions when the extension comes from a
type smaller than i32.
Reviewers: dsanders
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D18893
llvm-svn: 266203
This patch fixes calculating of builtin_object_size if it depends on a
condition. Before this patch compiler did not know how to calculate the
object size when it finds a condition that cannot be eliminated.
This patch enables calculating of builtin_object_size even in case when
condition cannot be eliminated by choosing minimum or maximum value as a
result from condition. Choosing minimum or maximum value from condition
is based on the second argument of __builtin_object_size function.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D18438
llvm-svn: 266193
Differential Revision: http://reviews.llvm.org/D17137
This patch was reverted after the revertion of dependant patch http://reviews.llvm.org/D17068.
There was the problem with test-suite failure.
The problem is hopefully solved with dependant patch so this patch is commited again.
llvm-svn: 266179
Remove an ad-hoc transform in InstCombine and replace it with more
general machinery (ValueTracking, InstructionSimplify and VectorUtils).
This fixes PR27332.
llvm-svn: 266175
Differential Revision: http://reviews.llvm.org/D17068
This changes contains fix for failing test-suite. So, this patch should hopefully work now.
llvm-svn: 266171
two fixes with one about error verify-regalloc reported, and
another about live range update of phi after rematerialization.
r265547:
Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.
analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.
To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.
Patches on top of r265547:
r265610 "Fix the compare-clang diff error introduced by r265547."
r265639 "Fix the sanitizer bootstrap error in r265547."
r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]"
Differential Revision: http://reviews.llvm.org/D15302
Differential Revision: http://reviews.llvm.org/D18934
Differential Revision: http://reviews.llvm.org/D18935
Differential Revision: http://reviews.llvm.org/D18936
llvm-svn: 266162
Initialization of m0 is emitted for each LDS operation, so
every block with LDS usage ends up with one. MachineLICM
used to fail to hoist this out of the loop, so every loop
iteration with LDS usage in it would re-initialize it.
This seems to be fixed now, so add a test to make sure that
it stays this way.
llvm-svn: 266156
This state is no longer useful and not guaranteed to be valid in later
codegen passes. For example, see the added test, which would print a
savepoint of %bb.-1 without this change, and crashes with a
use-after-free error under ASan if you apply the recycling allocator
patch from llvm.org/PR26808.
llvm-svn: 266150
This bug was introduced with:
http://reviews.llvm.org/rL262269
AVX masked loads are specified to set vector lanes to zero when the high bit of the mask
element for that lane is zero:
"If the mask is 0, the corresponding data element is set to zero in the load form of these
instructions, and unmodified in the store form." --Intel manual
Differential Revision: http://reviews.llvm.org/D19017
llvm-svn: 266148
Summary:
For correct handling of alias to nameless
function, we need to be able to refer them through a GUID in the summary.
Here we name them using a hash of the non-private global names in the module.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18883
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266132
Summary:
Let keep llvm-as "dumb": it converts textual IR to bitcode. This
commit removes the dependency from llvm-as to libLLVMAnalysis.
We'll add back summary in llvm-as if we get to a textual
representation for it at some point. In the meantime, opt seems
like a better place for that.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19032
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266131
Summary:
They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like
llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and
atomic counters at least when robust buffer access behavior is desired.
(These instructions perform no format conversion and do buffer range checking
per component.)
As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format,
it has become trivial to add support for the f32 and v2f32 variants of that
intrinsic, so the patch does so.
Also DAG-ify (and fix) some tests that I noticed intermittent failures in
while developing this patch.
Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects
changes to the BUFFER_STORE_DWORD* instructions. See also
http://reviews.llvm.org/D18291.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18292
llvm-svn: 266126
(Recommit of r266002, with r266011, r266016, and not accidentally
including an extra unused/uninitialized element in LibcallRoutineNames)
AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.
This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.
Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.
This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.
It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.
At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.
Differential Revision: http://reviews.llvm.org/D18200
llvm-svn: 266115
Summary:
We will be able to handle this case much better once the hazard recognizer
is finished, but this conservative implementation fixes a hang with the piglit
test:
spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra
Reviewers: arsenm, nhaehnle
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18988
llvm-svn: 266105
This helps clean up some of the mess when expanding unaligned 64-bit
loads when changed to be promote to v2i32, and fixes situations
where or x, 0 was emitted after splitting 64-bit ors during moveToVALU.
I think this could be a generic combine but I'm not sure.
llvm-svn: 266104
Add a check to catch violations. ~60 tests were broken and prevented
this change to be committed. Adrian and I (thanks Adrian!) went
through them in the last week or so updating. The check can be
done more efficiently but I'd still like to get this in ASAP to
avoid more broken tests to be checked in (if any).
PR: 27101
llvm-svn: 266102
Summary:
Under certain circumstances, multi-level breaks (or what is understood by
the control flow passes as such) could be miscompiled in a way that causes
infinite loops, by emitting incorrect control flow intrinsics.
This fixes a hang in
dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18967
llvm-svn: 266088
This is a resubmittion of 263158 change.
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
llvm-svn: 266086
Summary:
In getUnderlyingObjectsForInstr(): Don't give up on instructions with
multiple MMOs, instead look through all the MMOs and if they all meet
the conservative criteria previously used for single MMO instructions,
then return all of the underlying objects derived from the MMOs.
The change to ScheduleDAGInstrs::buildSchedGraph() is needed to avoid
the case where multiple underlying objects are present and are related
in such a way that successive iterations of the loop end up adding a
dependency from an instruction to itself.
Reviewers: atrick, hfinkel
Subscribers: MatzeB, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18093
llvm-svn: 266084
This patch enables assembler support for .set arch=octeon.
It will fix issues with inline assembler when this directive is used.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D18548
llvm-svn: 266081
When the memory vectorizer is enabled, these tests break.
These tests don't really care about the memory instructions,
and it's easier to write check lines with the unmerged loads.
llvm-svn: 266071
They broke the msan bot.
Original message:
Add __atomic_* lowering to AtomicExpandPass.
AtomicExpandPass can now lower atomic load, atomic store, atomicrmw,and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.
This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.
Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.
This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.
It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.
At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.
Differential Revision: http://reviews.llvm.org/D18200
llvm-svn: 266062
Summary:
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.
Reviewers: dsanders
Differential Revision: http://reviews.llvm.org/D18856
llvm-svn: 266055
This is intended to be shared by the ThinLTOCodeGenerator.
Note that there is a change in the way the verifier is run, previously
it was ran as a Pass on the merged module during internalization.
While now the verifier is called explicitely on the merged module
outside of the internalize "pass pipeline".
What remains strange in the API is the fact that `DisableVerify` in
the API does not disable this initial verifier.
Differential Revision: http://reviews.llvm.org/D19000
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266047
Resolve Bug 27046 (https://llvm.org/bugs/show_bug.cgi?id=27046).
The PPCInstrInfo::optimizeCompareInstr function could create a new use of
CR0, even if CR0 were previously dead. This patch marks CR0 live if a use of
CR0 is created.
Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng
http://reviews.llvm.org/D18884
llvm-svn: 266040
In the ELFv2 ABI, we are not required to save all CR fields. If only one
nonvolatile CR field is clobbered, use mfocrf instead of mfcr to
selectively save the field, because mfocrf has short latency compares to
mfcr.
Thanks Nemanja's invaluable hint!
Reviewers: nemanjai tjablin hfinkel kbarton
http://reviews.llvm.org/D17749
llvm-svn: 266038
`allocsize` is a function attribute that allows users to request that
LLVM treat arbitrary functions as allocation functions.
This patch makes LLVM accept the `allocsize` attribute, and makes
`@llvm.objectsize` recognize said attribute.
The review for this was split into two patches for ease of reviewing:
D18974 and D14933. As promised on the revisions, I'm landing both
patches as a single commit.
Differential Revision: http://reviews.llvm.org/D14933
llvm-svn: 266032
r237193 fix handling of alloca size / align in MergeFunctions, but only tested one and didn't follow FunctionComparator::cmpOperations's usual comparison pattern. It also didn't update Instruction.cpp:haveSameSpecialState which I'll do separately.
llvm-svn: 266022
This is more robust to changes in the link ordering.
Differential Revision: http://reviews.llvm.org/D18946
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266018
Add StackProtector to SafeStack. This adds limited protection against
data corruption in the caller frame. Current implementation treats
all stack protector levels as -fstack-protector-all.
llvm-svn: 266004
This is better for a few reasons:
+ It matches the other tooling for iOS.
+ It matches EABI in more cases (i.e. Thumb-mode, and in practice we don't
use ARM mode).
+ It leads to infinitesimally smaller code (0.2%, yay!).
rdar://25369506
llvm-svn: 266003
AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.
This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.
Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.
This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.
It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.
At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.
Differential Revision: http://reviews.llvm.org/D18200
llvm-svn: 266002
xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) was only being combined at the AfterLegalizeTypes stage, this patch permits the combine to occur anytime before then as well.
The main aim with this to improve the ability to recognise bitmasks that can be converted to shuffles.
I had to modify a number of AVX512 mask tests as the basic bitcast to/from scalar pattern was being stripped out, preventing testing of the mmask bitops. By replacing the bitcasts with loads we can get almost the same result.
Differential Revision: http://reviews.llvm.org/D18944
llvm-svn: 265998
We were incorrectly reporting all non-64 bit integers as int64s.
This is most evident when trying to print the "short" type, but
in theory could happen with chars too (although usually chars use
a different builtin type).
Additionally, we were using the wrong check when deciding whether
to print an enum definition as a global enum. We were checking
whether or not the enum was "nested", and if so saving it until
we print the class definition that it was nested in. But this is
not correct in rare situations where the enum is nested, but the
class that it's nested in does not have type information in the PDB.
So instead we check if there is a class definition for the parent
in the PDB. If so we save it for later, otherwise we print it.
llvm-svn: 265993
Before, ELF at least managed a diagnostic but it was a completely untraceable
"undefined symbol" error. MachO had a variety of even worse behaviours: crash,
emit corrupt file, or an equally bad message.
llvm-svn: 265984
This patch ensures that when we detect first-order recurrences, we reject a phi
node if its previous value is also a phi node. During vectorization the initial
and previous values of the recurrence are shuffled together to create the value
for the current iteration. However, phi nodes are not widened like other
instructions. This fixes PR27246.
Differential Revision: http://reviews.llvm.org/D18971
llvm-svn: 265983
This is the straightforward fix for PR26760:
https://llvm.org/bugs/show_bug.cgi?id=26760
But we still need to make some changes to generalize this helper function
and then send the lshr case into here.
llvm-svn: 265960
This change follows up defaults for GCC and Clang, so LLVM does not differ
from them. While number of the test files are touched with this change, they
all keep the old (expected) behaviour with the explicit option:
"-relocation-model=pic"
The tests that have not been touched are insensitive to relocation model.
Differential Revision: http://reviews.llvm.org/D17995
llvm-svn: 265949
Summary:
This is the first step in also serializing the index out to LLVM
assembly.
The per-module summary written to bitcode is moved out of the bitcode
writer and to a new analysis pass (ModuleSummaryIndexWrapperPass).
The pass itself uses a new builder class to compute index, and the
builder class is used directly in places where we don't have a pass
manager (e.g. llvm-as).
Because we are computing summaries outside of the bitcode writer, we no
longer can use value ids created by the bitcode writer's
ValueEnumerator. This required changing the reference graph edge type
to use a new ValueInfo class holding a union between a GUID (combined
index) and Value* (permodule index). The Value* are converted to the
appropriate value ID during bitcode writing.
Also, this enables removal of the BitWriter library's dependence on the
Analysis library that was previously required for the summary computation.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18763
llvm-svn: 265941
When we see a .arch or .cpu directive, we should try to avoid switching
ARM/Thumb mode if possible.
If we do have to switch modes, we also need to emit the correct mapping
symbol for the new ISA. We did not do this previously, so could emit
ARM code with Thumb mapping symbols (or vice-versa).
The GAS behaviour is to always stay in the same mode, and to emit an
error on any instructions seen when the current mode is not available on
the current target. We can't represent that situation easily (we assume
that Thumb mode is available if ModeThumb is set), so we differ from the
GAS behaviour when switching to a target that can't support the old
mode. I've added a warning for when this implicit mode-switch occurs.
Differential Revision: http://reviews.llvm.org/D18955
llvm-svn: 265936
This adds a conditional variant of CallBR instruction, CallBCR. Also,
it can be fused with integer comparisons, resulting in one of the new
C*BCall instructions.
In addition to CallBRCL limitations, this has another one: it won't
trigger if the function to call isn't already in %r1 - see f22 in the
test for an example (it's also why the loads in tests are volatile).
Author: koriakin
Differential Revision: http://reviews.llvm.org/D18928
llvm-svn: 265933
Restrict the max length of long nops for Lakemont to 7. Experiments on MCU
benchmarks (Dhrystone, Coremark) show that this is the most optimal length.
Differential Revision: http://reviews.llvm.org/D18897
llvm-svn: 265924
Summary:
If we can prove that an op.with.overflow intrinsic does not overflow, we
can get rid of the intrinsic, and replace it with non-wrapping
arithmetic.
Reviewers: atrick, regehr
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18685
llvm-svn: 265913
Summary:
This change teaches SCEV to see reduce `(extractvalue
0 (op.with.overflow X Y))` into `op X Y` (with a no-wrap tag if
possible).
Reviewers: atrick, regehr
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18684
llvm-svn: 265912
Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets.
I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL.
Differential Revision: http://reviews.llvm.org/D18937
llvm-svn: 265902
Vectorization cost of uniform load wasn't correctly calculated.
As a result, a simple loop that loads a uniform value wasn't vectorized.
Differential Revision: http://reviews.llvm.org/D18940
llvm-svn: 265901
Summary:
After we make the adjustment, we can assume that for local allocas, but
not for stack parameters, the return address, or any other fixed stack
object (which has a negative offset and therefore lies prior to the
adjusted SP).
Fixes PR26662.
Reviewers: hfinkel, qcolombet, rnk
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D18471
llvm-svn: 265886
This patch drops the debug info for all DISubprograms that are
(a) not attached to an llvm::Function and
(b) not indirectly reachable via inline scopes from any surviving Function and
(c) not reachable from a type (i.e.: member functions).
Background: I'm currently working on a patch to reverse the pointers
between DICompileUnit and DISubprogram (for more info check Duncan's RFC
on lazy-loading of debug info metadata
http://lists.llvm.org/pipermail/llvm-dev/2016-March/097419.html).
The idea is to remove the list of subprograms from DICompileUnit and
instead point to the owning compile unit from each DISubprogram.
After doing this all DISubprograms fulfilling the above criteria will be
implicitly dropped unless we go through an extra effort to preserve them.
http://reviews.llvm.org/D18477
<rdar://problem/25256815>
llvm-svn: 265876
This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle.
The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp.
Differential Revision: http://reviews.llvm.org/D18441
llvm-svn: 265874
Sample-based profiling and optimization remarks currently remove
DICompileUnits from llvm.dbg.cu to suppress the emission of debug info
from them. This is somewhat of a hack and only borderline legal IR.
This patch uses the recently introduced NoDebug emission kind in
DICompileUnit to achieve the same result without breaking the Verifier.
A nice side-effect of this change is that it is now possible to combine
NoDebug and regular compile units under LTO.
http://reviews.llvm.org/D18808
<rdar://problem/25427165>
llvm-svn: 265861
This is a cleanup patch for SSP support in LLVM. There is no functional change.
llvm.stackprotectorcheck is not needed, because SelectionDAG isn't
actually lowering it in SelectBasicBlock; rather, it adds check code in
FinishBasicBlock, ignoring the position where the intrinsic is inserted
(See FindSplitPointForStackProtector()).
llvm-svn: 265851
In Memcpy lowering we had missed a dependence from the load of the
operation to successor operations. This causes us to potentially
construct an in initial DAG with a memory dependence not fully
represented in the chain sub-DAG but rather require looking at the
entire DAG breaking alias analysis by allowing incorrect repositioning
of memory operations.
To work around this, r200033 changed DAGCombiner::GatherAllAliases to be
conservative if any possible issues to happen. Unfortunately this check
forbade many non-problematic situations as well. For example, it's
common for incoming argument lowering to add a non-aliasing load hanging
off of EntryNode. Then, if GatherAllAliases visited EntryNode, it would
find that other (unvisited) use of the EntryNode chain, and just give up
entirely. Furthermore, the check was incomplete: it would not actually
detect all such potentially problematic DAG constructions, because
GatherAllAliases did not guarantee to visit all chain nodes going up to
the root EntryNode. This is in general fine -- giving up early will just
miss a potential optimization, not generate incorrect results. But, for
this non-chain dependency detection code, it's possible that you could
have a load attached to a higher-up chain node than any which were
visited. If that load aliases your store, but the only dependency is
through the value operand of a non-aliasing store, it would've been
missed by this code, and potentially reordered.
With the dependence added, this check can be removed and Alias Analysis
can be much more aggressive. This fixes code quality regression in the
Consecutive Store Merge cleanup (D14834).
Test Change:
ppc64-align-long-double.ll now may see multiple serializations
of its stores
Differential Revision: http://reviews.llvm.org/D18062
llvm-svn: 265836
Summary:
The llvm cos intrinsic currently does not propagate undef's. This change
transforms cos(undef) to null value or 0.
There are 2 test cases added as well.
Patch by Anna Thomas!
Reviewers: sanjoy
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D18863
llvm-svn: 265825
This adds a conditional variant of CallJG instruction, CallBRCL.
It can be used for conditional sibling calls. Unfortunately, due
to IfCvt limitations, it only really works well for functions without
arguments.
Author: koriakin
Differential Revision: http://reviews.llvm.org/D18864
llvm-svn: 265814
We had a select of a cast of a select but attempted to replace the outer
select with the inner select dispite their incompatible types.
Patch by Anton Korobeynikov!
This fixes PR27236.
llvm-svn: 265805
InstCombine cannot effectively remove redundant assumptions without them
registered in the assumption cache. The vectorizer can create identical
assumptions but doesn't register them with the cache, resulting in
slower compile times because InstCombine tries to reason about a lot
more assumptions.
Fix this by registering the cloned assumptions.
llvm-svn: 265800
Added ISelDAGToDAG functions to enable selection of the smlawb, smlawt,
smulwb and smulwt instructions for the ARM backend. Also updated the smul
CodeGen test and removed the smulw one.
Differential Revision: http://reviews.llvm.org/D18892
llvm-svn: 265793
It caused PR27275: "ARM: Bad machine code: Using an undefined physical register"
Also reverting the following commits that were landed on top:
r265610 "Fix the compare-clang diff error introduced by r265547."
r265639 "Fix the sanitizer bootstrap error in r265547."
r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]"
llvm-svn: 265790
This re-commits r265535 which was reverted in r265541 because it
broke the windows bots. The problem was that we had a PointerIntPair
which took a pointer to a struct allocated with new. The problem
was that new doesn't provide sufficient alignment guarantees.
This pattern was already present before r265535 and it just happened
to work. To fix this, we now separate the PointerToIntPair from the
ExitNotTakenInfo struct into a pointer and a bool.
Original commit message:
Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.
However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.
In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.
We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.
Reviewers: anemet, mzolotukhin, hfinkel, sanjoy
Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17201
llvm-svn: 265786
This is the same change on PPC64 as r255821 on AArch64. I have even borrowed
his commit message.
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.
We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.
Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.
We add CSRsViaCopy, it will be explicitly handled during lowering.
1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
supports it for the given machine function and the function has only return
exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
virtual registers at beginning of the entry block and copies from virtual
registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.
Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng
http://reviews.llvm.org/D17533
llvm-svn: 265781
It seems that llc cannot be called used in assembler tests so test that
checks asm for particular target needs to be moved to codegen.
llvm-svn: 265770
This reverts commit r265765, reapplying r265759 after changing a call from
LocalAsMetadata::get to ValueAsMetadata::get (and adding a unit test). When a
local value is mapped to a constant (like "i32 %a" => "i32 7"), the new debug
intrinsic operand may no longer be pointing at a local.
http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/19020/
The previous coommit message follows:
--
This is a partial re-commit -- maybe more of a re-implementation -- of
r265631 (reverted in r265637).
This makes RF_IgnoreMissingLocals behave (almost) consistently between
the Value and the Metadata hierarchy. In particular:
- MapValue returns nullptr or "metadata !{}" for missing locals in
MetadataAsValue/LocalAsMetadata bridging paris, depending on
the RF_IgnoreMissingLocals flag.
- MapValue doesn't memoize LocalAsMetadata-related results.
- MapMetadata no longer deals with LocalAsMetadata or
RF_IgnoreMissingLocals at all. (This wasn't in r265631 at all, but
I realized during testing it would make the patch simpler with no
loss of generality.)
r265631 went too far, making both functions universally ignore
RF_IgnoreMissingLocals. This broke building (e.g.) compiler-rt.
Reassociate (and possibly other passes) don't currently maintain
dominates-use invariants for metadata operands, resulting in IR like
this:
define void @foo(i32 %arg) {
call void @llvm.some.intrinsic(metadata i32 %x)
%x = add i32 1, i32 %arg
}
If the inliner chooses to inline @foo into another function, then
RemapInstruction will call `MapValue(metadata i32 %x)` and assert that
the return is not nullptr.
I've filed PR27273 to add a Verifier check and fix the underlying
problem in the optimization passes.
As a workaround, return `!{}` instead of nullptr for unmapped
LocalAsMetadata when RF_IgnoreMissingLocals is unset. Otherwise, match
the behaviour of r265631.
Original commit message:
ValueMapper: Make LocalAsMetadata match function-local Values
Start treating LocalAsMetadata similarly to function-local members of
the Value hierarchy in MapValue and MapMetadata.
- Don't memoize them.
- Return nullptr if they are missing.
This also cleans up ConstantAsMetadata to stop listening to the
RF_IgnoreMissingLocals flag.
llvm-svn: 265768
Summary:
Fixes PR26774.
If you're aware of the issue, feel free to skip the "Motivation"
section and jump directly to "This patch".
Motivation:
I define "refinement" as discarding behaviors from a program that the
optimizer has license to discard. So transforming:
```
void f(unsigned x) {
unsigned t = 5 / x;
(void)t;
}
```
to
```
void f(unsigned x) { }
```
is refinement, since the behavior went from "if x == 0 then undefined
else nothing" to "nothing" (the optimizer has license to discard
undefined behavior).
Refinement is a fundamental aspect of many mid-level optimizations done
by LLVM. For instance, transforming `x == (x + 1)` to `false` also
involves refinement since the expression's value went from "if x is
`undef` then { `true` or `false` } else { `false` }" to "`false`" (by
definition, the optimizer has license to fold `undef` to any non-`undef`
value).
Unfortunately, refinement implies that the optimizer cannot assume
that the implementation of a function it can see has all of the
behavior an unoptimized or a differently optimized version of the same
function can have. This is a problem for functions with comdat
linkage, where a function can be replaced by an unoptimized or a
differently optimized version of the same source level function.
For instance, FunctionAttrs cannot assume a comdat function is
actually `readnone` even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store. As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal. The optimizer is allowed to refine
this function by first CSE'ing the two loads, and the folding the
comparision to always report that the two values are equal. Such a
refined variant will look like it is `readonly`. However, the
unoptimized version of the function can still write to memory (since
the two loads //can// result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.
Note: this is not just a problem with atomics or with linking
differently optimized object files. See PR26774 for more realistic
examples that involved neither.
This patch:
This change introduces a new set of linkage types, predicated as
`GlobalValue::mayBeDerefined` that returns true if the linkage type
allows a function to be replaced by a differently optimized variant at
link time. It then changes a set of IPO passes to bail out if they see
such a function.
Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18634
llvm-svn: 265762
This patch closes a gap in the DWARF backend that caused LLVM to drop
debug info for floating point variables that were constant for part of
their scope. Floating point constants are emitted as one or more
DW_OP_constu joined via DW_OP_piece.
This fixes a regression caught by the LLDB testsuite that I introduced
in r262247 when we stopped blindly expanding the range of singular
DBG_VALUEs to span the entire scope and started to emit location lists
with accurate ranges instead.
Also deletes a now-impossible testcase (debug-loc-empty-entries).
<rdar://problem/25448338>
llvm-svn: 265760
This is a partial re-commit -- maybe more of a re-implementation -- of
r265631 (reverted in r265637).
This makes RF_IgnoreMissingLocals behave (almost) consistently between
the Value and the Metadata hierarchy. In particular:
- MapValue returns nullptr or "metadata !{}" for missing locals in
MetadataAsValue/LocalAsMetadata bridging paris, depending on
the RF_IgnoreMissingLocals flag.
- MapValue doesn't memoize LocalAsMetadata-related results.
- MapMetadata no longer deals with LocalAsMetadata or
RF_IgnoreMissingLocals at all. (This wasn't in r265631 at all, but
I realized during testing it would make the patch simpler with no
loss of generality.)
r265631 went too far, making both functions universally ignore
RF_IgnoreMissingLocals. This broke building (e.g.) compiler-rt.
Reassociate (and possibly other passes) don't currently maintain
dominates-use invariants for metadata operands, resulting in IR like
this:
define void @foo(i32 %arg) {
call void @llvm.some.intrinsic(metadata i32 %x)
%x = add i32 1, i32 %arg
}
If the inliner chooses to inline @foo into another function, then
RemapInstruction will call `MapValue(metadata i32 %x)` and assert that
the return is not nullptr.
I've filed PR27273 to add a Verifier check and fix the underlying
problem in the optimization passes.
As a workaround, return `!{}` instead of nullptr for unmapped
LocalAsMetadata when RF_IgnoreMissingLocals is unset. Otherwise, match
the behaviour of r265631.
Original commit message:
ValueMapper: Make LocalAsMetadata match function-local Values
Start treating LocalAsMetadata similarly to function-local members of
the Value hierarchy in MapValue and MapMetadata.
- Don't memoize them.
- Return nullptr if they are missing.
This also cleans up ConstantAsMetadata to stop listening to the
RF_IgnoreMissingLocals flag.
llvm-svn: 265759
Summary:
EHPad BB are not entered the classic way and therefor do not need to be placed after their predecessors. This patch make sure EHPad BB are not chosen amongst successors to form chains, and are selected as last resort when selecting the best candidate.
EHPad are scheduled in reverse probability order in order to have them flow into each others naturally.
Reviewers: chandlerc, majnemer, rafael, MatzeB, escha, silvas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17625
llvm-svn: 265726
Standard load/store instructions with GLC bit set.
Reviewers: tstellardAMD, arsenm
Differential Revision: http://reviews.llvm.org/D18760
llvm-svn: 265709
Return is now considered a predicable instruction, and is converted
to a newly-added CondReturn (which maps to BCR to %r14) instruction by
the if conversion pass.
Also, fused compare-and-branch transform knows about conditional
returns, emitting the proper fused instructions for them.
This transform triggers on a *lot* of tests, hence the huge diffstat.
The changes are mostly jX to br %r14 -> bXr %r14.
Author: koriakin
Differential Revision: http://reviews.llvm.org/D17339
llvm-svn: 265689
When GVN wants to re-interpret an already available value in a smaller
type, it needs to right-shift the value on big-endian systems to ensure
the correct bytes are accessed. The shift value is the difference of
the sizes of the two types.
This is correct as long as both types occupy multiples of full bytes.
However, when one of them is a sub-byte type like i1, this no longer
holds true: we still need to shift, but only to access the correct
*byte*. Accessing bits within the byte requires no shift in either
endianness; e.g. an i1 resides in the least-significant bit of its
containing byte on both big- and little-endian systems.
Therefore, the appropriate shift value to be used is the difference of
the *storage* sizes of the two types. This is already handled correctly
in one place where such a shift takes place (GetStoreValueForLoad), but
is incorrect in two other places: GetLoadValueForLoad and
CoerceAvailableValueToLoadType.
This patch changes both places to use the storage size as well.
Differential Revision: http://reviews.llvm.org/D18662
llvm-svn: 265684
http://reviews.llvm.org/D18562
A large number of testcases has been modified so they pass after this test.
One testcase is deleted, because I realized even after undoing the original
change that was committed with this testcase, the testcase still passes. So
I removed it. The change to one other testcase (test/CodeGen/PowerPC/pr25802.ll)
is an arbitrary change to keep it passing. Given the original intention of the
testcase, and the fact that fixing it will require some time to change the testcase,
we concluded that this quick change will be enough.
llvm-svn: 265683
For VGPR_32 operand disassembler expects a VGPR register encoded as 0..255 (enum8 src operand).
readfirstlane/readline actually has enum9 operand and this change fixes VGPR_32 to VS_32 (enum9 encoding).
Differential Revision: http://reviews.llvm.org/D18696
llvm-svn: 265670
This patch add support for GCC attribute((ifunc("resolver"))) for
targets that use ELF as object file format. In general ifunc is a
special kind of function alias with type @gnu_indirect_function. Patch
for Clang http://reviews.llvm.org/D15524
Differential Revision: http://reviews.llvm.org/D15525
llvm-svn: 265667
Reenable reverted r265550 with endianness issue fixed. Variables of
endian-aware types such as ulittle32_t should be explicitly casted
to their natural equivalent types before passing it as vararg to
printf like functions (format in my case). Added lit config file
depending on AMDGPU target as the testcase uses assembler.
Differential revision: http://reviews.llvm.org/D16998
llvm-svn: 265645
Re-apply r265450 which caused PR27245 and was reverted in r265559
because of a wrong generalization: the fetch_and_add->add_and_fetch
combine only works in specific, but pretty common, cases:
(icmp slt x, 0) -> (icmp sle (add x, 1), 0)
(icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
(icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
(icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)
Original Message:
We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.
Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.
This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).
This uses the LOCK ISD opcodes added by r262244.
Differential Revision: http://reviews.llvm.org/D17633
llvm-svn: 265636
Third time's the charm? The previous attempt (r265345) caused ASan test
failures on X86, as broken CFI caused stack traces to not work.
This version of the patch makes sure not to merge with stack adjustments
that have CFI, and to not add merged instructions' offests to the CFI
about to be generated.
This is already covered by the lit tests; I just got the expectations
wrong previously.
llvm-svn: 265623
Produce the first specific error message for a malformed Mach-O file describing
the problem instead of the generic message for object_error::parse_failed of
"Invalid data was encountered while parsing the file”. Many more good error
messages will follow after this first one.
This is built on Lang Hames’ great work of adding the ’Error' class for
structured error handling and threading Error through MachOObjectFile
construction. And making createMachOObjectFile return Expected<...> .
So to to get the error to the llvm-obdump tool, I changed the stack of
these methods to also return Expected<...> :
object::ObjectFile::createObjectFile()
object::SymbolicFile::createSymbolicFile()
object::createBinary()
Then finally in ParseInputMachO() in MachODump.cpp the error can
be reported and the specific error message can be printed in llvm-objdump
and can be seen in the existing test case for the existing malformed binary
but with the updated error message.
Converting these interfaces to Expected<> from ErrorOr<> does involve
touching a number of places. To contain the changes for now use of
errorToErrorCode() and errorOrToExpected() are used where the callers
are yet to be converted.
Also there some were bugs in the existing code that did not deal with the
old ErrorOr<> return values. So now with Expected<> since they must be
checked and the error handled, I added a TODO and a comment:
“// TODO: Actually report errors helpfully” and a call something like
consumeError(ObjOrErr.takeError()) so the buggy code will not crash
since needed to deal with the Error.
Note there is one fix also needed to lld/COFF/InputFiles.cpp that goes along
with this that I will commit right after this. So expect lld not to built
after this commit and before the next one.
llvm-svn: 265606
Updating dominators for exit-blocks of the unrolled loops is not enough,
as shown in PR27157. The proper way is to update dominators for all
dominance-children of original loop blocks.
llvm-svn: 265605
http://reviews.llvm.org/D18405
When the integer value loaded is never used directly as integer we should use VSX
or Floating Point Facility integer loads and avoid extra direct move
llvm-svn: 265593
This makes it possible to distinguish between mesa shaders
and other kernels even in the presence of compute shaders.
Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Differential Revision: http://reviews.llvm.org/D18559
llvm-svn: 265589
This reverts commit r265550. There're problems with endianness on dumping instruction bytes. Need to find out how to use support::ulittle32_t type properly.
llvm-svn: 265554
when DenseMap growed and moved memory. I verified it fixed the bootstrap
problem on x86_64-linux-gnu but I cannot verify whether it fixes
the bootstrap error on clang-ppc64be-linux. I will watch the build-bot
result closely.
Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.
analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.
To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.
Differential Revision: http://reviews.llvm.org/D15302
llvm-svn: 265547
Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.
However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.
In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.
We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.
Reviewers: anemet, mzolotukhin, hfinkel, sanjoy
Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17201
llvm-svn: 265535
To quote the langref "Unlike sqrt in libm, however, llvm.sqrt has
undefined behavior for negative numbers other than -0.0 (which allows
for better optimization, because there is no need to worry about errno
being set). llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sqrt."
This means that it's unsafe to replace sqrt with llvm.sqrt unless the
call is annotated with nnan.
Thanks to Hal Finkel for pointing this out!
llvm-svn: 265521
r265273 added Mapper::mapBlockAddress, which delays mapping a
blockaddress value until the function has a body. The condition was
backwards, and should be checking Function::empty instead of
GlobalValue::isDeclaration.
llvm-svn: 265508
Instead of crashing, give a nice error. As a drive-by, fix the location
associated with the errors for unresolved metadata (the location was off
by one token).
llvm-svn: 265507
This patch enable sibling call optimization on ppc64 ELFv1/ELFv2 abi, and
add a couple of test cases. This patch also passed llvm/clang bootstrap
test, and spec2006 build/run/result validation.
Original issue: https://llvm.org/bugs/show_bug.cgi?id=25617
Great thanks to Tom's (tjablin) help, he contributed a lot to this patch.
Thanks Hal and Kit's invaluable opinions!
Reviewers: hfinkel kbarton
http://reviews.llvm.org/D16315
llvm-svn: 265506
This patch implements the following BookII and Book III instructions:
- copy copy_first cp_abort paste paste. paste_last
- msgsync
- slbieg slbsync
- stop
Total 10 instructions
Reviewers: nemanjai hfinkel tjablin amehsan kbarton
llvm-svn: 265504
While preserving the return value for @llvm.experimental.deoptimize at
the IR level is useful during mid-level optimization, doing so at the
machine instruction level requires generating some extra code and a
return that is non-ideal. This change has LLVM lower
```
%val = call @llvm.experimental.deoptimize
ret %val
```
to effectively
```
call @__llvm_deoptimize()
unreachable
```
instead.
llvm-svn: 265502
Don't emit a gc.result for a statepoint lowered from
@llvm.experimental.deoptimize since the call into __llvm_deoptimize is
effectively noreturn. Instead follow the corresponding gc.statepoint
with an "unreachable".
llvm-svn: 265485
Bionic has a defined thread-local location for the stack protector
cookie. Emit a direct load instead of going through __stack_chk_guard.
llvm-svn: 265481
Prior to this patch, CFLAA wouldn't tag arguments/globals properly if
it didn't find any "interesting" edges on them. This means that, if all
you do is store constants to a global or argument, we would never
actually treat it as a global/argument.
Test case:
define void @foo(i32* %A, i32* %B) #0 {
entry:
store i32 0, i32* %A, align 4
store i32 0, i32* %B, align 4
ret void
}
CFLAA would say that %A can't alias %B, because neither pointer was
used in an interesting way. This patch makes us note whether something
is an argument, global, ... regardless of how interesting CFLAA thinks
its uses are.
(For the record, using a value in an interesting way means loading
from it, using it in a GEP, ...)
llvm-svn: 265474
To start with, handle DW_FORM_string names. Follow up commit will handle
the interesting quirk with type units I was originally aiming for here.
llvm-svn: 265452
We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.
Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.
This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).
This uses the LOCK ISD opcodes added by r262244.
Differential Revision: http://reviews.llvm.org/D17633
llvm-svn: 265450
utils/update_test_checks.py was improved with:
http://reviews.llvm.org/rL265414
to CHECK-NEXT the first line of the IR function. This ensures that nothing bad
has happened before that.
llvm-svn: 265417
utils/update_test_checks.py was improved with:
http://reviews.llvm.org/rL265414
to include the first line of the function (expected to be
a comment line). This ensures that nothing bad has happened
before the first actual line of checked asm. It also matches
the existing behavior of the old script.
llvm-svn: 265416
Summary: LanaiSetflagAluCombiner could previously combine instructions across basic building blocks even when not legal. Make the LanaiSetflagAluCombiner more conservative to avoid this.
Reviewers: eliben
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18746
llvm-svn: 265411
Presently, CodeGenPrepare deletes all nearly empty (only phi and branch)
basic blocks. This pass can delete loop preheaders which frequently creates
critical edges. A preheader can be a convenient place to spill registers to
the stack. If the entrance to a loop body is a critical edge, then spills
may occur in the loop body rather than immediately before it. This patch
protects loop preheaders from deletion in CodeGenPrepare even if they are
nearly empty.
Since the patch alters the CFG, it affects a large number of test cases.
In most cases, the changes are merely cosmetic (basic blocks have different
names or instruction orders change slightly). I am somewhat concerned about
the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not
deleted, then the MIPS backend does not take advantage of a branch delay
slot. Consequently, I would like some close review by a MIPS expert.
The patch also partially subsumes D16893 from George Burgess IV. George
correctly notes that CodeGenPrepare does not actually preserve the dominator
tree. I think the dominator tree was usually not valid when CodeGenPrepare
ran, but I am using LoopInfo to mark preheaders, so the dominator tree is
now always valid before CodeGenPrepare.
Author: Tom Jablin (tjablin)
Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng
http://reviews.llvm.org/D16984
llvm-svn: 265397
This patch adds support for compact jumps similiar to the previous compact
branch support for MIPSR6. Unlike compact branches, compact jumps do not
have a forbidden slot.
As MipsInstrInfo::getEquivalentCompactForm can determine the correct
expansion for jumps and branches for both microMIPS and MIPSR6, remove the
unnecessary distinction in the delay slot filler.
Reviewers: vkalintiris
Subscribers: llvm-commits, dsanders
llvm-svn: 265390
Summary:
I encountered this issue when constant folding during inlining tried to
fold away a bitcast of a double to an x86_mmx, which is not an integral
type. The test case exposes the same issue with a smaller code snippet
during early CSE.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18528
llvm-svn: 265367
We missed a handful of .mir tests that existed outside the
test/CodeGen/MIR directory.
Also fix the three powerpc .mir tests that nobody noticed were broken.
llvm-svn: 265350
The original commit miscompiled things on 32-bit Windows, e.g. a Clang
boostrap. It turns out that mergeSPUpdates() was a bit too generous in
what it interpreted as a stack adjustment, causing the following code:
addl $12, %esp
leal -4(%ebp), %esp
To be "optimized" into simply:
addl $8, %esp
This commit tightens up mergeSPUpdates() and includes a new test
(test14 in movtopush.ll) for this situation.
llvm-svn: 265345
Adds some dllexport tests to verify that:
- Variables in bss are exported appropriately
- Non-dllexport symbols aliased to dllexport symbols are not exported
- Symbols declared as dllexport but are not defined are not exported
We plan to enable dllimport/dllexport support for the PS4, and these
additional tests are for points we noticed in our internal testing.
Patch by Warren Ristow!
Differential Revision: http://reviews.llvm.org/D18682
llvm-svn: 265333
Direct callees' that are cast to other function prototypes,
show up in the Call/Invoke instructions as ConstantExpr's.
Currently llvm::CallSite's getCalledFunction() fails
to return the callees in such expressions as direct calls.
Value profiling should avoid instrumenting such cases. Mostly NFC.
llvm-svn: 265330
We can only perform a tail call to a callee that preserves all the
registers that the caller needs to preserve.
This situation happens with calling conventions like preserver_mostcc or
cxx_fast_tls. It was explicitely handled for fast_tls and failing for
preserve_most. This patch generalizes the check to any calling
convention.
Related to rdar://24207743
Differential Revision: http://reviews.llvm.org/D18680
llvm-svn: 265329
Summary:
Useful for debugging since we lose this correlation after the permodule
summary/VST is read and until we later materialize source modules in the
function importer.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18555
llvm-svn: 265327
A seg-fault occurs due to a reference of a null pointer, which is
the value returned by getConstantPart. This function returns
null if the constant part is not found. The code that calls this
function needs to check for the null return value.
Differential Revision: http://reviews.llvm.org/D18718
llvm-svn: 265319