Summary:
This patch adds a very simple linker script to version the lib's symbols
and thus trying to avoid crashes if an application loads two different
LLVM versions (as long as they do not share data between them).
Note that we deliberately *don't* make LLVM_5.0 depend on LLVM_4.0:
they're incompatible and the whole point of this patch is
to tell the linker that.
Avoid unexpected crashes when two LLVM versions are used in the same process.
Author: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Author: Lisandro Damían Nicanor Pérez Meyer <lisandro@debian.org>
Author: Sylvestre Ledru <sylvestre@debian.org>
Bug-Debian: https://bugs.debian.org/848368
Reviewers: beanz, rnk
Reviewed By: rnk
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D31524
llvm-svn: 300496
So, `cast<Instruction>` is not guaranteed to succeed. Change the
code so that we create a new constant and use it in the newly
created instruction, as it's done in other places in InstCombine.
OK'ed by Sanjay/Craig. Fixes PR32686.
llvm-svn: 300495
the exponential behavior.
The patch is to fix PR32043. Functions getZeroExtendExpr and getSignExtendExpr
may call themselves recursively more than once. This is potentially a 2^N
complexity behavior. The exponential behavior was not commonly exposed before
because of existing global cache mechnism like UniqueSCEVs or some early return
mechanism when flags FlagNSW or FlagNUW are seen. However, we still have case
which can expose the exponential behavior, like the case in PR32043, so we add
a local cache in getZeroExtendExpr and getSignExtendExpr. If the input of the
functions -- SCEV and type pair have been seen before, we can find the extended
expression directly in the local cache.
Differential Revision: https://reviews.llvm.org/D30350
llvm-svn: 300494
Avoid looping through program to determine register counts.
This avoids needing to look at regmask operands.
Also fixes some counting errors with flat_scr when there
are no stack objects.
llvm-svn: 300482
While the incoming stack for a kernel is 256-byte aligned,
this refers to the base address of the entire wave. This isn't
useful information for most of codegen. Fixes unnecessarily
aligning stack objects in callees.
llvm-svn: 300481
The splitIndirectCriticalEdges function generates and invalid CFG when the
'Target' basic block is a loop to itself. When this occurs, the code that
updates the predecessor terminator needs to update the terminator in the split
basic block.
This occurs when there is an edge from block D back to D. Since D is split in
to D0 and D1, the code needs to update the terminator in D1. But D1 is not in
the OtherPreds vector, so it was not getting updated.
Differential Revision: https://reviews.llvm.org/D32126
llvm-svn: 300480
This was added to work around a bug in MSVC 2013's implementation of stable_sort. That bug has been fixed as of MSVC 2015 so we shouldn't need this anymore.
Technically the current implementation has undefined behavior because we only protect the deleting of the pVal array with the self move check. There is still a memcpy of that.VAL to VAL that isn't protected. In the case of self move those are the same local and memcpy is undefined for src and dst overlapping.
This reduces the size of the opt binary on my local x86-64 build by about 4k.
Differential Revision: https://reviews.llvm.org/D32116
llvm-svn: 300477
Currently we use getTypeSizeInBits which contains a switch statement to dispatch based on what the Type is. We know we always have a pointer type here, but the compiler isn't able to figure out that out to remove the switch.
This patch changes it to just call handle the pointer type directly by calling getPointerSizeInBits without going through a switch.
getPointerTypeSizeInBits is called pretty often, particularly by getOrEnforceKnownAlignment which is used by InstCombine. This should speed that up a little bit.
Differential Revision: https://reviews.llvm.org/D31841
llvm-svn: 300475
It's basically a terrible idea anyway but objc_msgSend gets emitted like that.
We can decide on a better way to deal with it in the unlikely event that anyone
actually uses it.
llvm-svn: 300474
The documentation for the waymarking algorithm says that we use the lower 2 bits of Use::Prev to store the way marking bits. But because we use a PointerIntPair with the default PointerLikeTypeTraits, we're using bits 2:1 on 64-bit targets.
There's also a trick employed for distinguishing Users that have Uses stored with them and Users that have Uses stored in a separate array. The documentation says we use the LSB of the first byte of the real User object or the User* that occurs at the end of the Use array. But again due to the PointerLikeTypeTraits we're really using bit 2(64-bit) or bit 1(32-bit) and not the LSB. This is a little worrying because the first byte of the User object is the vtable ptr so we're assuming the vtable has 8 byte or 4 byte alignment where what is documented would only require 2 byte alignment.
This patch provides a custom traits override for these two cases to put the bits where the documentation says they are. It also has the side effect of removing some shifts from the waymarking traversal implementation.
Differential Revision: https://reviews.llvm.org/D31733
llvm-svn: 300471
Add a top-level STRTAB block containing a string table blob, and start storing
strings for module codes FUNCTION, GLOBALVAR, ALIAS, IFUNC and COMDAT in
the string table.
This change allows us to share names between globals and comdats as well
as between modules, and improves the efficiency of loading bitcode files by
no longer using a bit encoding for symbol names. Once we start writing the
irsymtab to the bitcode file we will also be able to share strings between
it and the module.
On my machine, link time for Chromium for Linux with ThinLTO decreases by
about 7% for no-op incremental builds or about 1% for full builds. Total
bitcode file size decreases by about 3%.
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2017-April/111732.html
Differential Revision: https://reviews.llvm.org/D31838
llvm-svn: 300464
It's almost certainly not a good idea to actually use it in most cases (there's
a pretty large code size overhead on AArch64), but we can't do those
experiments until it's supported.
llvm-svn: 300462
Causes some VGPR usage improvements in shaderdb, but
introduces some SGPR spilling regressions due to random
scheduling changes later.
llvm-svn: 300453
Summary:
This seems like an uncontroversial first step toward providing access to the metadata hierarchy that now exists in LLVM. This should allow for good debug info support from C.
Future plans are to deprecate API that take mixed bags of values and metadata (mainly the LLVMMDNode family of functions) and migrate the rest toward the use of LLVMMetadataRef.
Once this is in place, mapping of DIBuilder will be able to start.
Reviewers: mehdi_amini, echristo, whitequark, jketema, Wallbraker
Reviewed By: Wallbraker
Subscribers: Eugene.Zelenko, axw, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D19448
llvm-svn: 300447
This patch is a generalization of the improvement introduced in rL296898.
Previously, we were able to peel one iteration of a loop to get rid of a Phi that becomes
an invariant on the 2nd iteration. In more general case, if a Phi becomes invariant after
N iterations, we can peel N times and turn it into invariant.
In order to do this, we for every Phi in loop's header we define the Invariant Depth value
which is calculated as follows:
Given %x = phi <Inputs from above the loop>, ..., [%y, %back.edge].
If %y is a loop invariant, then Depth(%x) = 1.
If %y is a Phi from the loop header, Depth(%x) = Depth(%y) + 1.
Otherwise, Depth(%x) is infinite.
Notice that if we peel a loop, all Phis with Depth = 1 become invariants,
and all other Phis with finite depth decrease the depth by 1.
Thus, peeling N first iterations allows us to turn all Phis with Depth <= N
into invariants.
Reviewers: reames, apilipenko, mkuper, skatkov, anna, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31613
llvm-svn: 300446
This is non-functional change to re-order if statements to bail out earlier
from unreachable and ColdCall heuristics.
Reviewers: sanjoy, reames, junbuml, vsk, chandlerc
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31704
llvm-svn: 300442
When peeling loops basing on phis becoming invariants, we make a wrong loop size check.
UP.Threshold should be compared against the total numbers of instructions after the transformation,
which is equal to 2 * LoopSize in case of peeling one iteration.
We should also check that the maximum allowed number of peeled iterations is not zero.
Reviewers: sanjoy, anna, reames, mkuper
Reviewed By: mkuper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31753
llvm-svn: 300441
Metadata potentially is more precise than any heuristics we use, so
it makes sense to use first metadata info if it is available. However it makes
sense to examine it against other strong heuristics like unreachable one.
If edge coming to unreachable block has higher probability then it is expected
by unreachable heuristic then we use heuristic and remaining probability is
distributed among other reachable blocks equally.
An example where metadata might be more strong then unreachable heuristic is
as follows: it is possible that there are two branches and for the branch A
metadata says that its probability is (0, 2^25). For the branch B
the probability is (1, 2^25).
So the expectation is that first edge of B is hotter than first edge of A
because first edge of A did not executed at least once.
If first edge of A points to the unreachable block then using the unreachable
heuristics we'll set the probability for A to (1, 2^20) and now edge of A
becomes hotter than edge of B.
This is unexpected behavior.
This fixed the biggest part of https://bugs.llvm.org/show_bug.cgi?id=32214
Reviewers: sanjoy, junbuml, vsk, chandlerc
Reviewed By: chandlerc
Subscribers: llvm-commits, reames, davidxl
Differential Revision: https://reviews.llvm.org/D30631
llvm-svn: 300440
If we already called computeKnownBits for the RHS being a constant power of 2, we've already computed everything we can and should just stop. I think previously we would still recurse if we had determined the result was negative or had not determined the sign bit at all.
llvm-svn: 300432
Our 16 bit support is assembler-only + the terrible hack that is
.code16gcc. Simply using 32 bit registers does the right thing for the
latter.
Fixes PR32681.
llvm-svn: 300429
The ConstantInt version has the same assert, and using null/allOnes is likely less efficient.
The only advantage of these local variants (and there's probably a better way to achieve this?)
is to save typing "ConstantInt::" over and over.
llvm-svn: 300426
This patch adds new optimization (Folding cmp(sub(a,b),0) into cmp(a,b))
to instCombineCall pass and was written specific for X86 CMP intrinsics.
Differential Revision: https://reviews.llvm.org/D31398
llvm-svn: 300422
This was throwing an assert because we determined the intra-word shift amount by subtracting the size of the full word shift from the total shift amount. But we failed to account for the fact that we clipped the full word shifts by total words first. To fix this just calculate the intra-word shift as the remainder of dividing by bits per word.
llvm-svn: 300405
Summary:
In PR32594, inline assembly using the 'A' constraint on x86_64 causes
llvm to crash with a "Cannot select" stack trace. This is because
`X86TargetLowering::getRegForInlineAsmConstraint` hardcodes that 'A'
means the EAX and EDX registers.
However, on x86_64 it means the RAX and RDX registers, and on 16-bit x86
(ia16?) it means the old AX and DX registers.
Add new register classes in `X86RegisterInfo.td` to support these cases,
and amend the logic in `getRegForInlineAsmConstraint` to cope with
different subtargets. Also add a test case, derived from PR32594.
Reviewers: craig.topper, qcolombet, RKSimon, ab
Reviewed By: ab
Subscribers: ab, emaste, royger, llvm-commits
Differential Revision: https://reviews.llvm.org/D31902
llvm-svn: 300404
This is a version of D32090 that unifies all of the
`getInstrProf*SectionName` helper functions. (Note: the build failures
which D32090 would have addressed were fixed with r300352.)
We should unify these helper functions because they are hard to use in
their current form. E.g we recently introduced more helpers to fix
section naming for COFF files. This scheme doesn't totally succeed at
hiding low-level details about section naming, so we should switch to an
API that is easier to maintain.
This is not an NFC commit because it fixes llvm-cov's testing support
for COFF files (this falls out of the API change naturally). This is an
area where we lack tests -- I will see about adding one as a follow up.
Testing: check-clang, check-profile, check-llvm.
Differential Revision: https://reviews.llvm.org/D32097
llvm-svn: 300381
When checking if we should return a constant, we create some temporary APInts to see if we know all bits. But the exact computations we do are needed in several other locations in the same code.
This patch moves them to named temporaries so we can reuse them.
Ideally we'd write directly to KnownZero/One, but we currently seem to only write those variables after all the simplifications checks and I didn't want to change that with this patch.
Differential Revision: https://reviews.llvm.org/D32094
llvm-svn: 300376
This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.
Previously we were testing return value attributes with index 0, so I
introduced hasReturnAttr() for that use case.
llvm-svn: 300367
Now that the libObect support for wasm is better we can
have readobj and nm produce more useful output too.
Differential Revision: https://reviews.llvm.org/D31514
llvm-svn: 300365
...when C1 differs from C2 by one bit and C1 <u C2:
http://rise4fun.com/Alive/Vuo
And move related folds to a helper function. This reduces code duplication and
will make it easier to remove the scalar-only restriction as a follow-up step.
llvm-svn: 300364
We currently only support folding a subtract into a select but not a PHI. This fixes that.
I had to fix an assumption in FoldOpIntoPhi that assumed the PHI node was always in operand 0. Now we pass it in like we do for FoldOpIntoSelect. But we still require some dancing to find the Constant when we create the BinOp or ConstantExpr. This is based code is similar to what we do for selects.
Since I touched all call sites, this also renames FoldOpIntoPhi to foldOpIntoPhi to match coding standards.
Differential Revision: https://reviews.llvm.org/D31686
llvm-svn: 300363
If a kernel's pointer argument is known to be readonly
set access qualifier accordingly. This allows RT not to
flush caches before dispatches.
Differential Revision: https://reviews.llvm.org/D32091
llvm-svn: 300362
One of the ValueTracking unittests creates a named ArrayRef initialized by a std::initializer_list. The underlying array for an std::initializer_list is only guaranteed to have a lifetime as long as the initializer_list object itself. So this can leave the ArrayRef pointing at an array that no long exists.
This fixes this to just create an explicit array instead of an ArrayRef.
Differential Revision: https://reviews.llvm.org/D32089
llvm-svn: 300354
Currently this code always makes 2 or 3 calls to tryFactorization regardless of whether the LHS/RHS are BinaryOperators. We make 3 calls when both operands are BinaryOperators with the same opcode. Or surprisingly, when neither are BinaryOperators. This is because getBinOpsForFactorization returns Instruction::BinaryOpsEnd when the operand is not a BinaryOperator. If both LHS and RHS are not BinaryOperators then they both have an Opcode of Instruction::BinaryOpsEnd. When this happens we rely on tryFactorization to early out due to A/B/C/D being null. Similar behavior occurs for the other calls, we rely on getBinOpsForFactorization having made A/B or C/D null to get tryFactorization to early out.
We also rely on these null checks to check the result of getIdentityValue and early out for it.
This patches refactors this to pull these checks up to SimplifyUsingDistributiveLaws so we don't rely on BinaryOpsEnd as a sentinel or this A/B/C/D null behavior. I think this makes this code easier to reason about. Should also give a tiny performance improvement for cases where the LHS or RHS isn't a BinaryOperator.
Differential Revision: https://reviews.llvm.org/D31913
llvm-svn: 300353
This change really saves just one foldingset lookup, but makes
SCEVRewriteVisitor "feature compatible" with the handwritten logic in
ScalarEvolutionNormalization, so that I can change
ScalarEvolutionNormalization to use SCEVRewriteVisitor in a next step.
This is a non-functional change, but _may_ improve performance in some
pathological cases, but that's unlikely.
llvm-svn: 300348
It won't compile after the recent changes I've made, and I think
keeping it in provides very little value.
Instead I've added (in an earlier commit) a C++ unit test to check the
Denormalize(Normalized(X)) == X property for specific instances of X,
which is what the assert was trying to do anyway.
llvm-svn: 300339
The PostIncTransform class was not pulling its weight, so delete it
and use free functions instead.
This also makes the use of `function_ref` more idiomatic. We were
storing an instance of function_ref in the PostIncTransform class
before, which was fine in that specific case, but the usage after this
change is more obviously okay.
llvm-svn: 300338
Looks like earlier I was relying on #include ordering in files that
used ScalarEvolutionNormalization.h.
Found thanks to the selfhost modules buildbot!
llvm-svn: 300336
It is cleaner to have a callback based system where the logic of
whether an add recurrence is normalized or not lives on IVUsers.
This is one step in a multi-step cleanup.
llvm-svn: 300330
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.
Clang companion patch: D31766.
Differential Revision: https://reviews.llvm.org/D31767
llvm-svn: 300325
This patch is part of D28975's breakdown - no change in output intended.
LV's code currently assumes the vectorized loop is a single basic block up
until predicateInstructions() is called. This patch removes two manifestations
of this assumption (loop phi incoming values, dominator tree update) by
replacing the use of vectorLoopBody with the vectorized loop's latch/header.
Differential Revision: https://reviews.llvm.org/D32040
llvm-svn: 300310
The APInt was created from an 'unsigned' and we just wanted to know how many bits the value needed to represent it. We can just use Log2_32 from MathExtras.h to get the info.
llvm-svn: 300309
Start using it in LLD to avoid needing to read bitcode again just to get the
target triple, and in llvm-lto2 to avoid printing symbol table information
that is inappropriate for the target.
Differential Revision: https://reviews.llvm.org/D32038
llvm-svn: 300300
The tests were failing due to an occasional deadlock in SerializationTraits
for Error: Both serializers and deserializers were protected by a single
mutex and in the unit test (where both ends of the RPC are in the same
process) one side might obtain the mutex, then block waiting for input,
leaving the other side of the connection unable to obtain the mutex to
write the data the first side was waiting for. Splitting the mutex into
two (one for serialization, one for deserialization) appears to have fixed the
issue.
llvm-svn: 300286
Now that we have a type that can represent the attributes on a single
return, function, or parameter, we can pass it around directly rather
than passing around AttributeList and Idx. Removes some more one-based
argument attribute index counting.
NFC
llvm-svn: 300285
This further improves Ahmed's change in rL299482. See the new comment for the
rationale.
The patch recovers most of the regression for bzip2 after D31965. We're down
to +2.68% from +6.97%.
Differential Revision: https://reviews.llvm.org/D32028
llvm-svn: 300276
Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1,
Kind) everywhere.
The fact that the AttributeList index for an argument is ArgNo+1 should
be a hidden implementation detail.
NFC
llvm-svn: 300272
If the offset cannot fit into the instruction, an addition to the
pointer is emitted before the actual access. However, BPF offsets are
16-bit but LLVM considers them to be, for the matter of this check,
to be 32-bit long.
This causes the following program:
int bpf_prog1(void *ign)
{
volatile unsigned long t = 0x8983984739ull;
return *(unsigned long *)((0xffffffff8fff0002ull) + t);
}
To generate the following (wrong) code:
0: 18 01 00 00 39 47 98 83 00 00 00 00 89 00 00 00
r1 = 590618314553ll
2: 7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1
3: 79 a1 f8 ff 00 00 00 00 r1 = *(u64 *)(r10 - 8)
4: 79 10 02 00 00 00 00 00 r0 = *(u64 *)(r1 + 2)
5: 95 00 00 00 00 00 00 00 exit
Fix it by changing the offset check to 16-bit.
Patch by Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Differential Revision: https://reviews.llvm.org/D32055
llvm-svn: 300269
- Refer to options by `-option` instead of `option`
- Use `-mtriple=` instead of `-march` in the example (-march will still
target the default operating system which is usually not what you want
in a test)
- Rephrase sentence because output does not go to stdout by default (you
need -o - for that as should be expected).
llvm-svn: 300268
The ErrorOr should not be dereferenced on the error path.
Patch by Jacob Young
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32032
llvm-svn: 300267
We may not have a working C++ standard library at this point so we
shouldn't rely on it when running CMake checks.
Differential Revision: https://reviews.llvm.org/D31942
llvm-svn: 300260
We call it unconditionally on the operands of the select. Then decide if its a min/max and call it on the min/max operands or on the select operands again. Either of those second calls will overwrite the results of the initial call so we can just delete the first call.
llvm-svn: 300256
For LCSSA purposes, loop BBs not dominating any of the exits aren't
interesting, as none of the values defined in these blocks can be
used outside the loop.
The way the code computed this information was by comparing each
BB of the loop with each of the exit blocks and ask the dominator tree
about their dominance relation. This is slow.
A more efficient way, implemented here, is that of starting from the
exit blocks and walking the dom upwards until we hit an header. By
transitivity, all the blocks we encounter in our path dominate an exit.
For the testcase provided in PR31851, this reduces compile time on
`opt -O2` by ~25%, going from 1m47s to 1m22s.
Thanks to Dan/MichaelZ for discussions/suggesting the approach/review.
Differential Revision: https://reviews.llvm.org/D31843
llvm-svn: 300255
Switch from Euclid's algorithm to Stein's algorithm for computing GCD. This
avoids the (expensive) APInt division operation in favour of bit operations.
Remove all memory allocation from within the GCD loop by tweaking our `lshr`
implementation so it can operate in-place.
Differential Revision: https://reviews.llvm.org/D31968
llvm-svn: 300252
Summary:
Bug noticed by inspection.
Extend the test to handle invokes as well as calls, and rewrite it to
not depend on the inliner and other passes.
Also simplify the call site replacement code with CallSite, similar to
what I did to dead arg elimination and arg promotion (rL300235 and
rL300229).
Reviewers: danielcdh, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32041
llvm-svn: 300251
Summary: For iterative SamplePGO, an indirect call can be speculatively promoted to multiple direct calls and get inlined. All these promoted direct calls will share the same callsite location (offset+discriminator). With the current implementation, we cannot distinguish between different promotion candidates and its inlined instance. This patch adds callee_name to the key of the callsite sample map. And added helper functions to get all inlined callee samples for a given callsite location. This helps the profile annotator promote correct targets and inline it before annotation, and ensures all indirect call targets to be annotated correctly.
Reviewers: davidxl, dnovillo
Reviewed By: davidxl
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31950
llvm-svn: 300240
Summary:
In first order recurrences where phi's are used outside the loop,
we should generate an additional vector.extract of the second last element from
the vectorized phi update.
This is because we require the phi itself (which is the value at the second last
iteration of the vector loop) and not the phi's update within the loop.
Also fix the code gen when we just unroll, but don't vectorize.
Fixes PR32396.
Reviewers: mssimpso, mkuper, anemet
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D31979
llvm-svn: 300238
This is effectively a retry of:
https://reviews.llvm.org/rL299851
but now we have tests and an assert to make sure the bug
that was exposed with that attempt will not happen again.
I'll fix the code duplication and missing sibling fold next,
but I want to make this change as small as possible to reduce
risk since I messed it up last time.
This should fix:
https://bugs.llvm.org/show_bug.cgi?id=32524
llvm-svn: 300236
Noticed by inspection while doing attribute work. DAE, InstCombineCalls,
and ArgPromotion have a fair amount of duplicated code for hacking on
call sites, and you can find bugs by comparing them.
Add a test case for this.
llvm-svn: 300229
In many cases ds operations can be combined even if offsets do not
fit into 8 bit encoding. What it takes is to adjust base address.
Differential Revision: https://reviews.llvm.org/D31993
llvm-svn: 300227
It's less efficient to produce 'ule' than 'ult' since we know we're going to
canonicalize to 'ult', but we shouldn't have duplicated code for these folds.
As a trade-off, this was a pretty terrible way to make a '2'. :)
if (LHSC == SubOne(RHSC))
AddC = ConstantExpr::getSub(AddOne(RHSC), LHSC);
The next steps are to share the code to fix PR32524 and add the missing 'and'
fold that was left out when PR14708 was fixed:
https://bugs.llvm.org/show_bug.cgi?id=14708
llvm-svn: 300222
Summary:
* Add a bitreverse case in the demanded bits analysis pass.
* Add tests for the bitreverse (and bswap) intrinsic in the
demanded bits pass.
* Add a test case to the BDCE tests: that manipulations to
high-order bits are eliminated once the bits are reversed
and then right-shifted.
Reviewers: mkuper, jmolloy, hfinkel, trentxintong
Reviewed By: jmolloy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31857
llvm-svn: 300215
Summary:
The linker needs to be able to determine whether a symbol is text or data to
handle the case of a common being overridden by a strong definition in an
archive. If the archive contains a text member of the same name as the common,
that function is discarded. However, if the archive contains a data member of
the same name, that strong definition overrides the common. This is a behavior
of ld.bfd, which the Qualcomm linker also supports in LTO.
Here's a test case to illustrate:
####
cat > 1.c << \!
int blah;
!
cat > 2.c << \!
int blah() {
return 0;
}
!
cat > 3.c << \!
int blah = 20;
!
clang -c 1.c
clang -c 2.c
clang -c 3.c
ar cr lib.a 2.o 3.o
ld 1.o lib.a -t
####
The correct output is:
1.o
(lib.a)3.o
Thanks to Shankar Easwaran and Hemant Kulkarni for the test case!
Reviewers: mehdi_amini, rafael, pcc, davide
Reviewed By: pcc
Subscribers: davide, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D31901
llvm-svn: 300205
If we had these tests, the bug caused by https://reviews.llvm.org/rL299851 would have been caught sooner.
There's also an assert in the code that should have caught that bug, but the assert line itself has a bug.
llvm-svn: 300201
Instructions CALLSEQ_START..CALLSEQ_END and their target dependent
counterparts keep data like frame size, stack adjustment etc. These
data are accessed by getOperand using hard coded indices. It is
error prone way. This change implements the access by special methods,
which improve readability and allow changing data representation without
massive changes of index values.
Differential Revision: https://reviews.llvm.org/D31953
llvm-svn: 300196
Throughout the effort of automatically generating the X86 memory folding tables these missing information were encountered.
This is a preparation work for a future patch including the automation of these tables.
Differential Revision: https://reviews.llvm.org/D31714
llvm-svn: 300190
Refactoring InnerLoopVectorizer's vectorizeBlockInLoop() to provide
vectorizeInstruction(). Aligning DeadInstructions with its only user.
Facilitates driving the transformation by VPlan - follows
https://reviews.llvm.org/D28975 and its tentative breakdown.
Differential Revision: https://reviews.llvm.org/D31997
llvm-svn: 300183
The bool type may be larger than the char type, so assuming we can cast from
bool to char and write a byte out to the stream is unsafe.
Hopefully this will get RPCUtilsTest.ReturnExpectedFailure passing on the bots.
llvm-svn: 300174
Summary:
APInt is currently implemented with an unsigned BitWidth field first and then a uint_64/pointer union. Due to the 64-bit size of the union there is a hole after the bitwidth.
Putting the union first allows the class to be packed. Making it 12 bytes instead of 16 bytes. An APSInt goes from 20 bytes to 16 bytes.
This shows a 4k reduction on the size of the opt binary on my local x86-64 build. So this enables some other improvement to the code as well.
Reviewers: dblaikie, RKSimon, hans, davide
Reviewed By: davide
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D32001
llvm-svn: 300171
At the very least, we have CallInst::setIsNoInline() for adding the
noinline attribute to callsites, and I'm told alwaysinline seems to
work.
Thought of adding "not all attributes are guaranteed to work here". If
someone thinks that would be better (or has a better way of phrasing
that, etc.), happy to add it.
llvm-svn: 300168
This patch allows Error and Expected types to be passed to and returned from
RPC functions.
Serializers and deserializers for custom error types (types deriving from the
ErrorInfo class template) can be registered with the SerializationTraits for
a given channel type (see registerStringError in RPCSerialization.h for an
example), allowing a given custom type to be sent/received. Unregistered types
will be serialized/deserialized as StringErrors using the custom type's log
message as the error string.
llvm-svn: 300167
This is a magic header file supported by the build system that provides a
single definition, LLVM_REVISION, containing an LLVM revision identifier,
if available. This functionality previously lived in the LTO library, but
I am moving it out to lib/Support because I want to also start using it in
lib/Object to create the IR symbol table.
This change also fixes a bug where LLVM_REVISION was never actually being
used in lib/LTO because the macro HAS_LLVM_REVISION was never defined (it
was misspelled as HAVE_SVN_VERSION_INC in lib/LTO/CMakeLists.txt, and was
only being defined in a non-existent file Version.cpp).
I also changed the code to use "git rev-parse --git-dir" to locate the .git
directory, instead of looking for it in the LLVM source root directory,
which makes this compatible with monorepos as well as git worktrees.
Differential Revision: https://reviews.llvm.org/D31985
llvm-svn: 300160
This seems like a much more natural API, based on Derek Schuff's
comments on r300015. It further hides the implementation detail of
AttributeList that function attributes come last and appear at index
~0U, which is easy for the user to screw up. git diff says it saves code
as well: 97 insertions(+), 137 deletions(-)
This also makes it easier to change the implementation, which I want to
do next.
llvm-svn: 300153
This typedef used to be conditional based on whether rvalue references were supported. Looks like it got left behind when we switched to always having rvalue references with c++11. I don't think it provides any value now.
llvm-svn: 300146
Improve performance of argument list parsing with large numbers of IDs and
large numbers of arguments, by tracking a conservative range of indexes within
the argument list that might contain an argument with each ID. In the worst
case (when the first and last argument with a given ID are at the opposite ends
of the argument list), this still results in a linear-time walk of the list,
but it helps substantially in the common case where each ID occurs only once,
or a few times close together in the list.
This gives a ~10x speedup to clang's `test/Driver/response-file.c`, which
constructs a very large set of command line arguments and feeds them to the
clang driver.
Differential Revision: https://reviews.llvm.org/D30130
llvm-svn: 300135
In a followup patch I intend to introduce an additional dumping
mode which dumps a graphical representation of a class's layout.
In preparation for this, the text-based layout printer needs to
be split out from the graphical layout printer, and both need
to be able to use the same code for printing the intro and outro
of a class's definition (e.g. base class list, etc).
This patch does so, and in the process introduces a skeleton
definition for the graphical printer, while currently making
the graphical printer just print nothing.
NFC
llvm-svn: 300134
Previously the dumping of class definitions was very primitive,
and it made it hard to do more than the most trivial of output
formats when dumping. As such, we would only dump one line for
each field, and then dump non-layout items like nested types
and enums.
With this patch, we do a complete analysis of the object
hierarchy including aggregate types, bases, virtual bases,
vftable analysis, etc. The only immediately visible effects
of this are that a) we can now dump a line for the vfptr where
before we would treat that as padding, and b) we now don't
treat virtual bases that come at the end of a class as padding
since we have a more detailed analysis of the class's storage
usage.
In subsequent patches, we should be able to use this analysis
to display a complete graphical view of a class's layout including
recursing arbitrarily deep into an object's base class / aggregate
member hierarchy.
llvm-svn: 300133
The test fails on Darwin because Fuzzer::DeathCallback (which calls
DumpCurrentUnit("crash-")) is called before DumpCurrentUnit("oom-") is
called in Fuzzer::RssLimitCallback. DeathCallback is transitively called
from __sanitizer_print_memory_profile.
This should fix the fuzzer bot that has been failing for a while:
http://lab.llvm.org:8080/green/job/libFuzzer/
llvm-svn: 300127
Previously it tried to call SimplifyInstruction which doesn't know anything about alloca so defers to constant folding which also doesn't do anything with alloca. This results in wasted cycles making calls that won't do anything. Given the frequency with which this function is called this time adds up.
llvm-svn: 300118
If workgroup size is known inform llvm about range returned by local
id and local size queries.
Differential Revision: https://reviews.llvm.org/D31804
llvm-svn: 300102
Summary:
Readnone attribute would cause CSE of two barriers with
the same argument, which is invalid by example:
struct Base {
virtual int foo() { return 42; }
};
struct Derived1 : Base {
int foo() override { return 50; }
};
struct Derived2 : Base {
int foo() override { return 100; }
};
void foo() {
Base *x = new Base{};
new (x) Derived1{};
int a = std::launder(x)->foo();
new (x) Derived2{};
int b = std::launder(x)->foo();
}
Here 2 calls of std::launder will produce @llvm.invariant.group.barrier,
which would be merged into one call, causing devirtualization
to devirtualize second call into Derived1::foo() instead of
Derived2::foo()
Reviewers: chandlerc, dberlin, hfinkel
Subscribers: llvm-commits, rsmith, amharc
Differential Revision: https://reviews.llvm.org/D31531
llvm-svn: 300101
Often you have a unique_ptr<T> where T supports LLVM's
casting methods, and you wish to cast it to a unique_ptr<U>.
Prior to this patch, this requires doing hacky things like:
unique_ptr<U> Casted;
if (isa<U>(Orig.get()))
Casted.reset(cast<U>(Orig.release()));
This is overly verbose, and it would be nice to just be able
to use unique_ptr directly with cast and dyn_cast. To this end,
this patch updates cast<> to work directly with unique_ptr<T>,
so you can now write:
auto Casted = cast<U>(std::move(Orig));
Since it's possible for dyn_cast<> to fail, however, we choose
to use a slightly different API here, because it's awkward to
write
if (auto Casted = dyn_cast<U>(std::move(Orig))) {}
when Orig may end up not having been moved at all. So the
interface for dyn_cast is
if (auto Casted = unique_dyn_cast<U>(Orig)) {}
Where the inclusion of `unique` in the name of the cast operator
re-affirms that regardless of success of or fail of the casting,
exactly one of the input value and the return value will contain
a non-null result.
Differential Revision: https://reviews.llvm.org/D31890
llvm-svn: 300098
This replicates the known bits and constant creation code from the single use case for these instructions and adds it here. The computeKnownBits and constant creation code for other instructions is now in the default case of the opcode switch.
llvm-svn: 300094
We already handled a superset check that included the known ones too and folded to a constant that may include ones. But it can also handle the case of no ones.
llvm-svn: 300093
As discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32486
...the canonicalization of vector select to shufflevector does not hold up
when undef elements are present in the condition vector.
Try to make the undef handling clear in the code and the LangRef.
Differential Revision: https://reviews.llvm.org/D31980
llvm-svn: 300092
The use of a DenseMap in precomputeTriangleChains does not cause
non-determinism, even though it is iterated over, as the only thing the
iteration does is to insert entries into a new DenseMap, which is not iterated.
Comment only change.
llvm-svn: 300088
Currently if we reach an instruction with multiples uses we know we can't do any optimizations to that instruction itself since we only have the demanded bits for one of the users. But if we know all of the bits are zero/one for that one user we can still go ahead and create a constant to give to that user.
This might then reduce the instruction to having a single use and allow additional optimizations on the other path.
This picks up an additional case that r300075 didn't catch.
Differential Revision: https://reviews.llvm.org/D31552
llvm-svn: 300084
The current heuristic is triggered on `InFlightCount > BufferLimit`
which isn't really helpful on in-order cores where BufferLimit is zero.
Note that we already get latency hiding effects for in order cores
by instructions staying in the pending queue on stalls; The additional
latency scheduling heuristics only have minimal effects after that while
occasionally increasing register pressure too much resulting in extra
spills.
My motivation here is additional spills/reloads ending up in a loop in
464.h264ref / BlockMotionSearch function resulting in a 4% overal
regression on an in order core. rdar://30264380
llvm-svn: 300083
If we are adding/subtractings 0s below the highest demanded bit we can just use the other operand and remove the operation.
My primary motivation is observing that we can call ShrinkDemandedConstant for the add/sub and create a 0 constant, rather than removing the add completely. In the case I saw, we modified the constant on an add instruction to a 0, but the add is not put into the worklist. So we didn't revisit it until the next InstCombine iteration. This caused an IR modification to remove add and a subsequent iteration to be ran.
With this change we get bypass the add in the first iteration and prevent the second iteration from changing anything.
Differential Revision: https://reviews.llvm.org/D31120
llvm-svn: 300075
One potential way to make InstCombine (very slightly?) faster is to recycle instructions
when possible instead of creating new ones. It's not explicitly stated AFAIK, but we don't
consider this an "InstSimplify". We could, however, make a new layer to house transforms
like this if that makes InstCombine more manageable (just throwing out an idea; not sure
how much opportunity is actually here).
Differential Revision: https://reviews.llvm.org/D31863
llvm-svn: 300067
Use '2>&1 |' and not '|&' to pipe debug output to FileCheck
Hopefully handles a "shell parser error" on
llvm-clang-x86_64-expensive-checks-win
test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll
llvm-svn: 300064
On FreeBSD backtrace is not part of libc and depends on libexecinfo
being available. Instead of using manual checks we can use the builtin
CMake module FindBacktrace.cmake to detect availability of backtrace()
in a portable way.
Patch By: Alex Richardson
Differential Revision: https://reviews.llvm.org/D27143
llvm-svn: 300062
In getEntryCost(), make the scalar type for a compare instruction that of the
operands, not i1. This is needed in order to call getCmpSelInstrCost() for a
compare in a sensible way, the same way as the LoopVectorizer does.
New test: test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll
Review: Matthew Simpson
https://reviews.llvm.org/D31601
llvm-svn: 300061
The cost for a branch after vectorization is very different depending on if
the vectorizer will if-convert the block (branch is eliminated), or if
scalarized and predicated blocks will be produced (branch duplicated before
each block). There is also the case of remaining scalar branches, such as the
back-edge branch.
This patch handles these cases differently with TTI based cost estimates.
Review: Matthew Simpson
https://reviews.llvm.org/D31175
llvm-svn: 300058
Since SystemZ supports vector element load/store instructions, there is no
need for extracts/inserts if a vector load/store gets scalarized.
This patch lets Target specify that it supports such instructions by means of
a new TTI hook that defaults to false.
The use for this is in the LoopVectorizer getScalarizationOverhead() method,
which will with this patch produce a smaller sum for a vector load/store on
SystemZ.
New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll
Review: Adam Nemet
https://reviews.llvm.org/D30680
llvm-svn: 300056
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.
Interleaved access vectorization enabled.
BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.
Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631
llvm-svn: 300052
This change is basically relative to D31136, where I initially wanted to
implement some relocations handling optimization which shows it can give
significant boost. Though even without any caching algorithm looks
code can have some cleanup at first.
Refactoring separates out code for taking symbol address, used in relocations
computation.
Differential revision: https://reviews.llvm.org/D31747
llvm-svn: 300039
Summary:
As far as instruction selection is concerned, all three appear to be same thing.
Support for these operands is experimental since AArch64 doesn't make use
of them and the in-tree targets that do use them (AMDGPU for
OperandWithDefaultOps, AMDGPU/ARM/Hexagon/Lanai for PredicateOperand, and ARM
for OperandWithDefaultOps) are not using tablegen-erated GlobalISel yet.
Reviewers: rovka, aditya_nandakumar, t.p.northover, qcolombet, ab
Reviewed By: rovka
Subscribers: inglorion, aemerson, rengolin, mehdi_amini, dberris, kristof.beyls, igorb, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D31135
llvm-svn: 300037
Summary:
Dead basic blocks may be forming a loop, for which SSA form is
fulfilled, but with a circular def-use chain. LoadCombine could
enter an infinite loop when analysing such dead code. This patch
solves the problem by simply avoiding to analyse all basic blocks
that aren't forward reachable, from function entry, in LoadCombine.
Fixes https://bugs.llvm.org/show_bug.cgi?id=27065
Reviewers: mehdi_amini, chandlerc, grosser, Bigcheese, davide
Reviewed By: davide
Subscribers: dberlin, zzheng, bjope, grandinj, Ka-Ka, materi, jholewinski, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D31032
llvm-svn: 300034
Summary:
Alias analysis would like to know that
invariant.group.barrier returns pointer that mustalias,
but this can't imply that we can replace one pointer with another
Reviewers: dberlin, sanjoy
Subscribers: llvm-commits, chandlerc, hfinkel, nlewycky, amharc
Differential Revision: https://reviews.llvm.org/D31758
llvm-svn: 300033
and to expose a handle to represent the actual case rather than having
the iterator return a reference to itself.
All of this allows the iterator to be used with common STL facilities,
standard algorithms, etc.
Doing this exposed some missing facilities in the iterator facade that
I've fixed and required some work to the actual iterator to fully
support the necessary API.
Differential Revision: https://reviews.llvm.org/D31548
llvm-svn: 300032
Collection of PostDominatedByUnreachable and PostDominatedByColdCall have been
split out of heuristics itself. Update of the data happens now for each basic
block (before update for PostDominatedByColdCall might be skipped if
unreachable or matadata heuristic handled this basic block).
This separation allows re-ordering of heuristics without loosing
the post-domination information.
Reviewers: sanjoy, junbuml, vsk, chandlerc, reames
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31701
llvm-svn: 300029
Not clearing was causing non-deterministic compiles for large files. Addresses
for MachineBasicBlocks would end up colliding and we would lay out a block that
we assumed had been pre-computed when it had not been.
llvm-svn: 300022
Summary:
COFF requires that every comdat contain a symbol with the same name as
the comdat. ThinLTOBitcodeWriter renames symbols, which may cause this
requirement to be violated. This change avoids such violations by
renaming comdats if their leaders are renamed. It also keeps comdats
together when splitting modules.
Reviewers: pcc, mehdi_amini, tejohnson
Reviewed By: pcc
Subscribers: rnk, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31963
llvm-svn: 300019
Summary:
For now, it just wraps AttributeSetNode*. Eventually, it will hold
AvailableAttrs as an inline bitset, and adding and removing enum
attributes will be super cheap.
This sinks AttributeSetNode back down to lib/IR/AttributeImpl.h.
Reviewers: pete, chandlerc
Subscribers: llvm-commits, jfb
Differential Revision: https://reviews.llvm.org/D31940
llvm-svn: 300014