Summary: This allows pthread_self to be pulled out of a loop by LICM.
Reviewers: hfinkel, arsenm, davide
Reviewed By: davide
Subscribers: davide, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D32782
llvm-svn: 303495
In the case where we have an operand defined by a lod of the
same memory location. Historically this was a VariableExpression
because we wanted to make sure they ended up in the same class,
but if we create the right expression, they end up in the same
class anyway.
Fixes PR32897. Thanks to Dan for the detailed discussion and the
fix suggestion.
llvm-svn: 303475
This was here because we don't want to switch leaders too much,
in order to avoid fixpoint(ing) issue, but it's not sure if it
matters in practice.
A first step towards fixing PR32897.
llvm-svn: 303473
Refactor the strlen optimization code to work for both strlen and wcslen.
This especially helps with programs in the wild where people pass
L"string"s to const std::wstring& function parameters and the wstring
constructor gets inlined.
This also fixes a lingerind API problem/bug in getConstantStringInfo()
where zeroinitializers would always give you an empty string (without a
length) back regardless of the actual length of the initializer which
did not work well in the TrimAtNul==false causing the PR mentioned
below.
Note that the fixed getConstantStringInfo() needed fixes to SelectionDAG
memcpy lowering and may lead to some cases for out-of-bounds
zeroinitializer accesses not getting optimized anymore. So some code
with UB may produce out of bound memory reads now instead of just
producing zeros.
The refactoring "accidentally" fixes http://llvm.org/PR32124
Differential Revision: https://reviews.llvm.org/D32839
llvm-svn: 303461
This is a complicated bug involving two issues:
1. What do we do with phi nodes when we prove all arguments are not
live?
2. When is it safe to use value leaders to determine if we can ignore
an argumnet?
llvm-svn: 303453
Summary:
NewGVN: Handle equivalence between phi of ops and op of phis.
This makes our GVN mostly-complete. It would be complete, modulo some
deliberate choices we make. This means it detects roughly all herband
equivalences in polynomial time, including cases notoriously hard for
other GVN's to detect. It also detects a very large swath of the
cases we currently rely on instcombine to detect that involve folding
upwards through phis.
Fixes PR 31125, 31463, PR 31868
Reviewers: davide
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D32151
llvm-svn: 303444
Summary:
This NFC simply refactors the return value of LoopIdiomRecognize::isLegalStore() from bool to an enumeration, and
removes the return-through-parameter mechanism that the function was using. This function is constructed such that it will
only ever recognize a single store idiom (memset, memset_pattern, or memcpy), and never a combination of these. As such it
makes much more sense for the return value to be the single idiom that the store matches, rather than
having a separate argument-return for each idiom -- it's cleaner, and makes it clearer that
only a single idiom can be matched.
Patch by Daniel Neilson!
Reviewers: anna, sanjoy, davide, haicheng
Reviewed By: anna, haicheng
Subscribers: haicheng, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D33359
llvm-svn: 303434
We can have cycles between PHIs and this causes singleReachablePhi()
to call itself indefintely (until we run out of stack). The proper
solution would be that of computing SCCs, but it's not worth for
now, so just keep a visited set and give up when we find a cycle.
Thanks to Dan for the discussion/help with this.
Fixes PR33014.
llvm-svn: 303393
Also, fix the old-style capitalization of the related functions
and move them to the 'private' section of the class since they
are just helpers of the visit* functions.
As shown in the post-commit comments for D32143, we are missing
folds for xor-of-icmps.
llvm-svn: 303381
Summary:
Implements PR889
Removing the virtual table pointer from Value saves 1% of RSS when doing
LTO of llc on Linux. The impact on time was positive, but too noisy to
conclusively say that performance improved. Here is a link to the
spreadsheet with the original data:
https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing
This change makes it invalid to directly delete a Value, User, or
Instruction pointer. Instead, such code can be rewritten to a null check
and a call Value::deleteValue(). Value objects tend to have their
lifetimes managed through iplist, so for the most part, this isn't a big
deal. However, there are some places where LLVM deletes values, and
those places had to be migrated to deleteValue. I have also created
llvm::unique_value, which has a custom deleter, so it can be used in
place of std::unique_ptr<Value>.
I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which
derives from User outside of lib/IR. Code in IR cannot include MemorySSA
headers or call the MemoryAccess object destructors without introducing
a circular dependency, so we need some level of indirection.
Unfortunately, no class derived from User may have any virtual methods,
because adding a virtual method would break User::getHungOffOperands(),
which assumes that it can find the use list immediately prior to the
User object. I've added a static_assert to the appropriate OperandTraits
templates to help people avoid this trap.
Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv
Reviewed By: chandlerc
Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D31261
llvm-svn: 303362
The testcase in PR33077 generates a LSR Use Formula with two SCEVAddRecExprs for the same
loop. Such uncommon formula will become non-canonical after GenerateTruncates adds sign
extension to the ScaledReg of the Formula, and it will break the assertion that every
Formula to be inserted is canonical.
The fix is to call canonicalize for the raw Formula generated by GenerateTruncates
before inserting it.
llvm-svn: 303361
Summary:
We have a bug when RAUWing the condition if experimental.guard or assumes is a use of that
condition. This is because LazyValueInfo may have used the guards/assumes to identify the
value of the condition at the end of the block. RAUW replaces the uses
at the guard/assume as well as uses before the guard/assume. Both of
these are incorrect.
For now, disable RAUW for conditions and fix the logic as a next
step: https://reviews.llvm.org/D33257
Reviewers: sanjoy, reames, trentxintong
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33279
llvm-svn: 303349
Summary:
There are several places in the codebase that try to calculate a maximum value in a Statistic object. We currently do this in one of two ways:
MaxNumFoo = std::max(MaxNumFoo, NumFoo);
or
MaxNumFoo = (MaxNumFoo > NumFoo) ? MaxNumFoo : NumFoo;
The first version reads from MaxNumFoo one time and uncontionally rwrites to it. The second version possibly reads it twice depending on the result of the first compare. But we have no way of knowing if the value was changed by another thread between the reads and the writes.
This patch adds a method to the Statistic object that can ensure that we only store if our value is the max and the previous max didn't change after we read it. If it changed we'll recheck if our value should still be the max or not and try again.
This spawned from an audit I'm trying to do of all places we uses the implicit conversion to unsigned on the Statistics objects. See my previous thread on llvm-dev https://groups.google.com/forum/#!topic/llvm-dev/yfvxiorKrDQ
Reviewers: dberlin, chandlerc, hfinkel, dblaikie
Reviewed By: chandlerc
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D33301
llvm-svn: 303318
I believe this technically fixes a multithreaded race condition in this code. But my primary concern was as part of looking at removing the ability to treat Statistics like a plain unsigned. There are many weird operations on Statistics in the codebase.
llvm-svn: 303314
The missing optimization for xor-of-icmps still needs to be added, but by
being more efficient (not generating unnecessary logic ops with constants)
we avoid the bug.
See discussion in post-commit comments:
https://reviews.llvm.org/D32143
llvm-svn: 303312
As noted in the post-commit comments in D32143, we should be
catching the constant operand cases sooner to be more efficient
and less likely to expose a missing fold.
llvm-svn: 303309
There should be a slight efficiency improvement from handling icmp/fcmp with one matcher and reducing duplicated code.
The larger motivation is that there are questions about how predicate canonicalization is handled, and the refactoring
should make it easier if we want to change any of that behavior.
1. As noted in the code comment, we've chosen 3 of the 16 FCMP preds as not canonical. Why those 3? It goes back to
rL32751 from what I can tell, but I'm not sure if there's a justification for that rule.
2. We currently do not canonicalize integer select conditions. Should we use the same rule that applies to branches
for selects?
3. We currently do canonicalize some FP select conditions, and those rules would conflict with the rule shown here.
Should one or both be changed?
No-functional-change-intended, but adding tests anyway because there's no coverage for most of the predicates.
Differential Revision: https://reviews.llvm.org/D33247
llvm-svn: 303261
If we need to spill the result of the PHI instruction, we insert the spill after
all of the PHIs and EHPads, however, in a catchswitch block there is no
room to insert the spill. Make room by splitting away catchswitch into a separate
block.
Before the fix:
catch.dispatch:
%val = phi i32 [ 1, %if.then ], [ 2, %if.else ]
%switch = catchswitch within none [label %catch] unwind label %cleanuppad
After:
catch.dispatch:
%val = phi i32 [ 1, %if.then ], [ 2, %if.else ]
%tok = cleanuppad within none []
; spill goes here
cleanupret from %tok unwind label %catch.dispatch.switch
catch.dispatch.switch:
%switch = catchswitch within none [label %catch] unwind label %cleanuppad
https://reviews.llvm.org/D31846
llvm-svn: 303232
CTLZ idiom recognition (r303102).
Summary:
The following case:
i = 1;
if(n)
while (n >>= 1)
i++;
use(i);
Was converted to:
i = 1;
if(n)
i += builtin_ctlz(n >> 1, false);
use(i);
Which is not correct. The patch make it:
i = 1;
if(n)
i += builtin_ctlz(n >> 1, true);
use(i);
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 303212
Summary:
RewritePHIs algorithm used in building of CoroFrame inserts a placeholder
```
%placeholder = phi [%val]
```
on every edge leading to a block starting with PHI node with multiple incoming edges,
so that if one of the incoming values was spilled and need to be reloaded, we have a
place to insert a reload. We use SplitEdge helper function to split the incoming edge.
SplitEdge function does not deal with unwind edges comping into a block with an EHPad.
This patch adds an ehAwareSplitEdge function that can correctly split the unwind edge.
For landing pads, we clone the landing pad into every edge block and replace the original
landing pad with a PHI collection the values from all incoming landing pads.
For WinEH pads, we keep the original EHPad in place and insert cleanuppad/cleapret in the
edge blocks.
Reviewers: majnemer, rnk
Reviewed By: majnemer
Subscribers: EricWF, llvm-commits
Differential Revision: https://reviews.llvm.org/D31845
llvm-svn: 303172
This function gives the wrong answer on some non-ELF platforms in some
cases. The function that does the right thing lives in Mangler.h. To try to
discourage people from using this function, give it a different name.
Differential Revision: https://reviews.llvm.org/D33162
llvm-svn: 303134
There's no need (& a bit incorrect) to mask off the high bits of the
register reference when describing a simple bool value.
Reviewers: aprantl
Differential Revision: https://reviews.llvm.org/D31062
llvm-svn: 303117
ARM Neon has native support for half-sized vector registers (64 bits). This
is beneficial for example for 2D and 3D graphics. This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.
*** Performance Analysis
This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.
The results are with -O3 and PGO. A negative percentage is an improvement.
The testsuite was run with a sample size of 4.
** SPEC
* CFP2006/482.sphinx3 -3.34%
A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.
* CFP2000/177.mesa -3.34%
* CINT2000/256.bzip2 +6.97%
My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%. There are also other problems with the codegen in
this loop so there is further room for improvement.
** LLVM testsuite
* SingleSource/Benchmarks/Misc/ReedSolomon -10.75%
There are multiple small SLP vectorizations outside the hot code. It's a bit
surprising that it adds up to 10%. Some of this may be code-layout noise.
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%
The opt-viewer screenshot can be seen at F3218284. We start at a colder store
but the tree leads us into the hottest loop.
* MultiSource/Applications/lambda-0.1.3/lambda -2.68%
* MultiSource/Benchmarks/Bullet/bullet -2.18%
This is using 3D vectors.
* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%
Noise, binary is unchanged.
* MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%
There is an additional SLP in the cold code. The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.
* MultiSource/Applications/aha/aha +1.63%
* MultiSource/Applications/JM/lencod/lencod +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark +1.15%
Differential Revision: https://reviews.llvm.org/D31965
llvm-svn: 303116
Summary:
The following loops should be recognized:
i = 0;
while (n) {
n = n >> 1;
i++;
body();
}
use(i);
And replaced with builtin_ctlz(n) if body() is empty or
for CPUs that have CTLZ instruction converted to countable:
for (j = 0; j < builtin_ctlz(n); j++) {
n = n >> 1;
i++;
body();
}
use(builtin_ctlz(n));
Reviewers: rengolin, joerg
Differential Revision: http://reviews.llvm.org/D32605
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 303102
verifyMemoryCongruency() filters out trivially dead MemoryDef(s),
as we find them immediately dead, before moving from TOP to a new
congruence class.
This fixes the same problem for PHI(s) skipping MemoryPhis if all
the operands are dead.
Differential Revision: https://reviews.llvm.org/D33044
llvm-svn: 303100
Summary:
Merge overflow computation for signed add,
appearing both in InstCombine and ValueTracking.
As part of the merge,
cleanup the interface for overflow checks in InstCombine.
Patch by Yoav Ben-Shalom.
Reviewers: craig.topper, majnemer
Reviewed By: craig.topper
Subscribers: takuto.ikuta, llvm-commits
Differential Revision: https://reviews.llvm.org/D32946
llvm-svn: 303029
Summary:
If the Worklist build causes an IR change this change flag currently factors into the flag for running another iteration of the iteration loop. But only changes during processing should trigger another loop.
This patch captures the worklist creation change flag into the outside the loop flag currently used for DbgDeclares and only sends that flag up to the caller. Rerunning the loop only depends on IC.run() now.
This uses the debug output of InstCombine to determine if one or two iterations run. I couldn't think of a better way to detect it since the second spurious iteration shoudn't make any visible changes. Just wasted computation.
I can do a pre-commit of the test case with the CHECK-NOT as a CHECK if this is an ok way to check this.
This is a subset of D31678 as I'm still not sure how to verify the analysis behavior for that.
Reviewers: davide, majnemer, spatel, chandlerc
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32453
llvm-svn: 302982
Implemented frequency based cost/saving analysis
and related options.
The pass is now in a state ready to be turne on
in the pipeline (in follow up).
Differential Revision: http://reviews.llvm.org/D32783
llvm-svn: 302967
This patch adds min/max population count, leading/trailing zero/one bit counting methods.
The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.
Differential Revision: https://reviews.llvm.org/D32931
llvm-svn: 302925
This code was missing a check for stores, so we were thinking the
congruency class didn't have any memory members, and reset the
memory leader.
Differential Revision: https://reviews.llvm.org/D33056
llvm-svn: 302905
invariant PHI inputs and to rewrite PHI nodes during the actual
unswitching.
The checking is quite easy, but rewriting the PHI nodes is somewhat
surprisingly challenging. This should handle both branches and switches.
I think this is now a full featured trivial unswitcher, and more full
featured than the trivial cases in the old pass while still being (IMO)
somewhat simpler in how it works.
Next up is to verify its correctness in more widespread testing, and
then to add non-trivial unswitching.
Thanks to Davide and Sanjoy for the excellent review. There is one
remaining question that I may address in a follow-up patch (see the
review thread for details) but it isn't related to the functionality
specifically.
Differential Revision: https://reviews.llvm.org/D32699
llvm-svn: 302867
The approach I followed was to emit the remark after getTreeCost concludes
that SLP is profitable. I initially tried emitting them after the
vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely
remember missing a few cases for example in HorizontalReduction::tryToReduce.
ORE is placed in BoUpSLP so that it's available from everywhere (notably
HorizontalReduction::tryToReduce).
We use the first instruction in the root bundle as the locator for the remark.
In order to get a sense how far the tree is spanning I've include the size of
the tree in the remark. This is not perfect of course but it gives you at
least a rough idea about the tree. Then you can follow up with -view-slp-tree
to really see the actual tree.
llvm-svn: 302811
Introduce LoopVectorizationPlanner.executePlan(), replacing ILV.vectorize() and
refactoring ILV.vectorizeLoop(). Method collectDeadInstructions() is moved from
ILV to LVP. These changes facilitate building VPlans and using them to generate
code, following https://reviews.llvm.org/D28975 and its tentative breakdown.
Method ILV.createEmptyLoop() is renamed ILV.createVectorizedLoopSkeleton() to
improve clarity; it's contents remain intact.
Differential Revision: https://reviews.llvm.org/D32200
llvm-svn: 302790
It turned out that MSan was incorrectly calculating the shadow for int comparisons: it was done by truncating the result of (Shadow1 OR Shadow2) to i1, effectively rendering all bits except LSB useless.
This approach doesn't work e.g. in the case where the values being compared are even (i.e. have the LSB of the shadow equal to zero).
Instead, if CreateShadowCast() has to cast a bigger int to i1, we replace the truncation with an ICMP to 0.
This patch doesn't affect the code generated for SPEC 2006 binaries, i.e. there's no performance impact.
For the test case reported in PR32842 MSan with the patch generates a slightly more efficient code:
orq %rcx, %rax
jne .LBB0_6
, instead of:
orl %ecx, %eax
testb $1, %al
jne .LBB0_6
llvm-svn: 302787
// (X ^ C1) | C2 --> (X | C2) ^ (C1&~C2)
This canonicalization was added at:
https://reviews.llvm.org/rL7264
By moving xors out/down, we can more easily combine constants. I'm adding
tests that do not change with this patch, so we can verify that those kinds
of transforms are still happening.
This is no-functional-change-intended because there's a later fold:
// (X^C)|Y -> (X|Y)^C iff Y&C == 0
...and demanded-bits appears to guarantee that any fold that would have
hit the fold we're removing here would be caught by that 2nd fold.
Similar reasoning was used in:
https://reviews.llvm.org/rL299384
The larger motivation for removing this code is that it could interfere with
the fix for PR32706:
https://bugs.llvm.org/show_bug.cgi?id=32706
Ie, we're not checking if the 'xor' is actually a 'not', so we could reverse
a 'not' optimization and cause an infinite loop by altering an 'xor X, -1'.
Differential Revision: https://reviews.llvm.org/D33050
llvm-svn: 302733
This fixes a ubsan bot failure after r302597, which made getProfileCount
non-static, but ended up invoking it on a null ProfileSummaryInfo object
in some cases from buildModuleSummaryIndex.
Most testing passed because the non-static getProfileCount currently
doesn't access any member variables, but I found this when testing a
follow on patch (D32877) that adds a member variable access.
llvm-svn: 302705
This is another step towards favoring 'not' ops over random 'xor' in IR:
https://bugs.llvm.org/show_bug.cgi?id=32706
This transformation may have occurred in longer IR sequences using computeKnownBits,
but that could be much more expensive to calculate.
As the scalar result shows, we do not currently favor 'not' in all cases. The 'not'
created by the transform is transformed again (unnecessarily). Vectors don't have
this problem because vectors are (wrongly) excluded from several other combines.
llvm-svn: 302659
This pass doesn't correctly handle testing for when it is legal to hoist
arbitrary instructions. The whitelist happens to make it safe, so before
it is removed the pass's legality checks will need to be enhanced.
Details have been added to the code review thread for the patch.
llvm-svn: 302640
This pass uses a new target hook to decide whether or not to expand a particular
intrinsic to the shuffevector sequence.
Differential Revision: https://reviews.llvm.org/D32245
llvm-svn: 302631
This change is required because the notion of count is different for
sample profiling and getProfileCount will need to determine the
underlying profile type.
Differential revision: https://reviews.llvm.org/D33012
llvm-svn: 302597
Summary:
This fixes the immediate crash caused by introducing an incorrect inttoptr
before attempting the conversion. There may still be a legality
check missing somewhere earlier for non-integral pointers, but this change
seems necessary in any case.
Reviewers: sanjoy, dberlin
Reviewed By: dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32623
llvm-svn: 302587
The AArch64 instruction set has a few "widening" instructions (e.g., uaddl,
saddl, uaddw, etc.) that take one or more doubleword operands and produce
quadword results. The operands are automatically sign- or zero-extended as
appropriate. However, in LLVM IR, these extends are explicit. This patch
updates TTI to consider these widening instructions as single operations whose
cost is attached to the arithmetic instruction. It marks extends that are part
of a widening operation "free" and applies a sub-target specified overhead
(zero by default) to the arithmetic instructions.
Differential Revision: https://reviews.llvm.org/D32706
llvm-svn: 302582
The motivation for getting rid of dyn_castNotVal is to allow fixing:
https://bugs.llvm.org/show_bug.cgi?id=32706
So this was supposed to be functional-change-intended for the case
of inverting constants and applying DeMorgan. However, I can't find
any cases where that pattern will actually get to matchDeMorgansLaws()
because we have other folds in visitAnd/visitOr that do the same
thing. So this ends up just being a clean-up patch with slight efficiency
improvement, but no-functional-change-intended.
llvm-svn: 302581
As recently discussed on llvm-dev [1], this patch makes it illegal for
two Functions to point to the same DISubprogram and updates
FunctionCloner to also clone the debug info of a function to conform
to the new requirement. To simplify the implementation it also factors
out the creation of inlineAt locations from the Inliner into a
general-purpose utility in DILocation.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html
<rdar://problem/31926379>
Differential Revision: https://reviews.llvm.org/D32975
This reapplies r302469 with a fix for a bot failure (reparentDebugInfo
now checks for the case the orig and new function are identical).
llvm-svn: 302576
Summary:
Since I will post patch with some changes to
replaceDominatedUsesWith, it would be good to avoid
duplicating code again.
Reviewers: davide, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32798
llvm-svn: 302575
Use variadic templates instead of relying on <cstdarg> + sentinel.
This enforces better type checking and makes code more readable.
Differential Revision: https://reviews.llvm.org/D32541
llvm-svn: 302571
The way we currently define congruency for two PHIExpression(s) is:
1) The operands to the phi functions are congruent
2) The PHIs are defined in the same BasicBlock.
NewGVN works under the assumption that phi operands are in predecessor
order, or at least in some consistent order. OTOH, is valid IR:
patatino:
%meh = phi i16 [ %0, %winky ], [ %conv1, %tinky ]
%banana = phi i16 [ %0, %tinky ], [ %conv1, %winky ]
br label %end
and the in-memory representations of the two SSA registers have an
inconsistent order. This violation of NewGVN assumptions results into
two PHIs found congruent when they're not. While we think it's useful
to have always a consistent order enforced, let's fix this in NewGVN
sorting uses in predecessor order before creating a PHI expression.
Differential Revision: https://reviews.llvm.org/D32990
llvm-svn: 302552
The comment says to avoid the case where zero bits are shifted into the truncated value,
but the code checks that the shift is smaller than the truncated value instead of the
number of bits added by the sign extension. Fixing this allows a shift by more than the
value size to be introduced, which is undefined behavior, so the shift is capped at the
value size minus one, which has the expected behavior of filling the value with the sign
bit.
Patch by Jacob Young!
Differential Revision: https://reviews.llvm.org/D32285
llvm-svn: 302548
This caused PR32977.
Original commit message:
> Make it illegal for two Functions to point to the same DISubprogram
>
> As recently discussed on llvm-dev [1], this patch makes it illegal for
> two Functions to point to the same DISubprogram and updates
> FunctionCloner to also clone the debug info of a function to conform
> to the new requirement. To simplify the implementation it also factors
> out the creation of inlineAt locations from the Inliner into a
> general-purpose utility in DILocation.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html
> <rdar://problem/31926379>
>
> Differential Revision: https://reviews.llvm.org/D32975
llvm-svn: 302533
Summary:
In first order recurrence vectorization, when the previous value is a phi node, we need to
set the insertion point to the first non-phi node.
We can have the previous value being a phi node, due to the generation of new
IVs as part of trunc optimization [1].
[1] https://reviews.llvm.org/rL294967
Reviewers: mssimpso, mkuper
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32969
llvm-svn: 302532
- This change allows targets to opt-in to using them instead of the log2
shufflevector algorithm.
- The SLP and Loop vectorizers have the common code to do shuffle reductions
factored out into LoopUtils, and now have a unified interface for generating
reductions regardless of the preference of the target. LoopUtils now uses TTI
to determine what kind of reductions the target wants to handle.
- For CodeGen, basic legalization support is added.
Differential Revision: https://reviews.llvm.org/D30086
llvm-svn: 302514
As recently discussed on llvm-dev [1], this patch makes it illegal for
two Functions to point to the same DISubprogram and updates
FunctionCloner to also clone the debug info of a function to conform
to the new requirement. To simplify the implementation it also factors
out the creation of inlineAt locations from the Inliner into a
general-purpose utility in DILocation.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html
<rdar://problem/31926379>
Differential Revision: https://reviews.llvm.org/D32975
llvm-svn: 302469
This is another step towards getting rid of dyn_castNotVal,
so we can recommit:
https://reviews.llvm.org/rL300977
As the tests show, we were missing the lshr case for constants
and both ashr/lshr vector splat folds. The ashr case with constant
was being performed inefficiently in 2 steps. It's also possible
there was a latent bug in that case because we can't do that fold
if the constant is positive:
http://rise4fun.com/Alive/Bge
llvm-svn: 302465
Previously SimplifyCFG used getSetSize which returns an APInt that is 1 bit wider than the ConstantRange's bit width. In the reasonably common case that the ConstantRange is 64-bits wide, this requires returning a 65-bit APInt. APInt's can only store 64-bits without a memory allocation so this is inefficient.
The new method takes the 8 as an input and tells if the range contains more than that many elements without requiring any wider math.
llvm-svn: 302385
We can simplify (or (icmp X, C1), (icmp X, C2)) to 'true' or one of the icmps in many cases.
I had to check some of these with Alive to prove to myself it's right, but everything seems
to check out. Eg, the deleted code in instcombine was completely ignoring predicates with
mismatched signedness.
This is a follow-up to:
https://reviews.llvm.org/rL301260https://reviews.llvm.org/D32143
llvm-svn: 302370
wcslen is part of the C99 and C++98 standards.
- This introduces the function to TargetLibraryInfo.
- Also set attributes for wcslen in llvm::inferLibFuncAttributes().
Differential Revision: https://reviews.llvm.org/D32837
llvm-svn: 302278
This adds routines for reseting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown.
Differential Revision: https://reviews.llvm.org/D32637
llvm-svn: 302262
Loop Idiom recognition was generating memset in a case that
would result generating a division operation to an unsafe location.
Differential Revision: https://reviews.llvm.org/D32674
llvm-svn: 302238
Compares always return a scalar integer or vector of integers. isIntegerTy returns false for vectors, but that's not completely obvious. So using isVectorTy is less confusing.
llvm-svn: 302198
This fixes a regression since SVN rev 273808 (which was supposed to
not change functionality).
The regression caused miscompilations (noted in the wild when targeting
AArch64) on platforms with 32 bit long.
Differential Revision: https://reviews.llvm.org/D32850
llvm-svn: 302137
When profiling a no-op incremental link of Chromium I found that the functions
computeImportForFunction and computeDeadSymbols were consuming roughly 10% of
the profile. The goal of this change is to improve the performance of those
functions by changing the map lookups that they were previously doing into
pointer dereferences.
This is achieved by changing the ValueInfo data structure to be a pointer to
an element of the global value map owned by ModuleSummaryIndex, and changing
reference lists in the GlobalValueSummary to hold ValueInfos instead of GUIDs.
This means that a ValueInfo will take a client directly to the summary list
for a given GUID.
Differential Revision: https://reviews.llvm.org/D32471
llvm-svn: 302108
Change checkRippleForAdd from a heuristic to a full check -
if it is provable that the add does not overflow return true, otherwise false.
Patch by Yoav Ben-Shalom
Differential Revision: https://reviews.llvm.org/D32686
llvm-svn: 302093
This patch adds isConstant and getConstant for determining if KnownBits represents a constant value and to retrieve the value. Use them to simplify code.
Differential Revision: https://reviews.llvm.org/D32785
llvm-svn: 302091
This patch adds zext, sext, and trunc methods to KnownBits and uses them where possible.
Differential Revision: https://reviews.llvm.org/D32784
llvm-svn: 302088
Summary:
Do three things to help with that:
- Add AttributeList::FirstArgIndex, which is an enumerator currently set
to 1. It allows us to change the indexing scheme with fewer changes.
- Add addParamAttr/removeParamAttr. This just shortens addAttribute call
sites that would otherwise need to spell out FirstArgIndex.
- Remove some attribute-specific getters and setters from Function that
take attribute list indices. Most of these were only used from
BuildLibCalls, and doesNotAlias was only used to test or set if the
return value is malloc-like.
I'm happy to split the patch, but I think they are probably easier to
review when taken together.
This patch should be NFC, but it sets the stage to change the indexing
scheme to this, which is more convenient when indexing into an array:
0: func attrs
1: retattrs
2...: arg attrs
Reviewers: chandlerc, pete, javed.absar
Subscribers: david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D32811
llvm-svn: 302060
Summary:
Cloning basic blocks in the loop for runtime loop unroller depends on loop being
in rotated form (i.e. loop latch target is the exit block).
Assert that this is true, so that callers of runtime loop unroller pass in
canonical loops.
The single caller of this function has that check recently added:
https://reviews.llvm.org/rL301239
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32801
llvm-svn: 302058
Summary:
Currently, loop deletion deletes loop where the only values
that are used outside the loop are loop-invariant.
This patch adds logic to delete loops where the loop is proven to be
never executed (i.e. the only predecessor of the loop preheader has a
constant conditional branch as terminator, and the preheader is not the
taken target). This will remove loops that become dead after
loop-unswitching generates constant conditional branches.
The next steps are:
1. moving the loop deletion implementation to LoopUtils.
2. Add logic in loop-simplifyCFG which will support changing conditional
constant branches to unconditional branches. If loops become unreachable in this
process, they can be removed using `deleteDeadLoop` function.
Reviewers: chandlerc, efriedma, sanjoy, reames
Reviewed by: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32494
llvm-svn: 302015
This was originally checked in here:
https://reviews.llvm.org/rL301923
And reverted here:
https://reviews.llvm.org/rL301924
Because there's a clang test that would fail after this. I fixed/removed the
offending CHECK lines in:
https://reviews.llvm.org/rL301928
So let's try this again. Original commit message:
This is the fold that causes the infinite loop in BoringSSL
(https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c)
when we fix instcombine demanded bits to prefer 'not' ops as in https://reviews.llvm.org/D32255.
There are 2 or 3 problems with dyn_castNotVal, and I don't think we can
reinstate https://reviews.llvm.org/D32255 until dyn_castNotVal is completely eliminated.
1. As shown here, it transforms 'not' into random xor. This transform is harmful to SCEV and codegen because 'not' can often be folded while random xor cannot.
2. It does not transform vector constants. This is actually a good thing, but if you don't believe the above argument, then we shouldn't have excluded vectors.
3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't match the greedy nature of instcombine. If we DeMorganize a pattern that has an extra 'not' in it: ~(~(~X) & Y) --> (~X | ~Y)
That's just another case of DeMorgan, so we should trust that we'll fold that pattern too: (~X | ~ Y) --> ~(X & Y)
Differential Revision: https://reviews.llvm.org/D32665
llvm-svn: 301929
This is the fold that causes the infinite loop in BoringSSL
(https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c)
when we fix instcombine demanded bits to prefer 'not' ops as in D32255.
There are 2 or 3 problems with dyn_castNotVal, and I don't think we can
reinstate D32255 until dyn_castNotVal is completely eliminated.
1. As shown here, it transforms 'not' into random xor. This transform is
harmful to SCEV and codegen because 'not' can often be folded while
random xor cannot.
2. It does not transform vector constants. This is actually a good thing,
but if you don't believe the above argument, then we shouldn't have
excluded vectors.
3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't
match the greedy nature of instcombine. If we DeMorganize a pattern
that has an extra 'not' in it:
~(~(~X) & Y) --> (~X | ~Y)
That's just another case of DeMorgan, so we should trust that we'll fold
that pattern too:
(~X | ~ Y) --> ~(X & Y)
Differential Revision: https://reviews.llvm.org/D32665
llvm-svn: 301923
In the testcase attached, we believe %tmp1 implies %tmp4.
where:
br i1 %tmp1, label %bb2, label %bb7
br i1 %tmp4, label %bb5, label %bb7
because Wwhile looking at PredicateInfo stuffs we end up calling
isImpliedTrueByMatchingCmp() with the arguments backwards.
Differential Revision: https://reviews.llvm.org/D32718
llvm-svn: 301849
If we have ~(~X & Y), it only makes sense to transform it to (X | ~Y) when we do not need
the intermediate (~X & Y) value. In that case, we would need an extra instruction to
generate ~Y + 'or' (as shown in the test changes).
It's ok if we have multiple uses of ~X or Y, however. In those cases, we may not reduce the
instruction count or critical path, but we might improve throughput because we can generate
~X and ~Y in parallel. Whether that actually makes perf sense or not for a target is something
we can't answer in IR.
Differential Revision: https://reviews.llvm.org/D32703
llvm-svn: 301848
We may not be able to rewrite indirect branch target, but we also want to take it into
account when folding, i.e. if it and all its successor's predecessors go to the same
destination, we can fold, i.e. no need to thread.
llvm-svn: 301816
Summary: [JumpThread] Do RAUW in case Cond folds to a constant in the CFG
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32407
llvm-svn: 301804
Summary:
programUndefinedIfPoison makes more sense, given what the function
does; and I'm about to add a function with a name similar to
isKnownNotFullPoison (so do the rename to avoid confusion).
Reviewers: broune, majnemer, bjarke.roune
Reviewed By: broune
Subscribers: mcrosier, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30444
llvm-svn: 301776
Summary: This patch adds isNegative, isNonNegative for querying whether the sign bit is known. It also adds makeNegative and makeNonNegative for controlling the sign bit.
Reviewers: RKSimon, spatel, davide
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32651
llvm-svn: 301747
retainAutoreleasedReturnValue that retains the returned value.
This commit fixes a bug in ARC optimizer where it moves a release
between a call and a retainAutoreleasedReturnValue, causing the returned
object to be released before the retainAutoreleasedReturnValue can
retain it.
This commit accomplishes that by doing a lookahead and checking whether
the call prevents the release from moving upwards. In the long term, we
should treat the region between the retainAutoreleasedReturnValue and
the call as a critical section and disallow moving anything there
(possibly using operand bundles).
rdar://problem/20449878
llvm-svn: 301724
I fixed my miscompile in r301722 and I hope I don't have to take
a look at this code again now that Chandler has a new LoopUnswitch
pass, but maybe this could be of use for somebody else in the
meanwhile.
llvm-svn: 301723
This broke the Clang build. (Clang-side patch missing?)
Original commit message:
> [IR] Make add/remove Attributes use AttrBuilder instead of
> AttributeList
>
> This change cleans up call sites and avoids creating temporary
> AttributeList objects.
>
> NFC
llvm-svn: 301712
While looking at pure addressing expressions, it's possible
for the value to appear later in Postorder.
I haven't been able to come up with a testcase where this
exhibits an actual issue, but if you insert a dump before
the value map lookup, a few testcases crash.
llvm-svn: 301705
Eliminates some more cases where some subset of the addressing
computation remains flat. Some cases with addrspacecasts
in nested constant expressions are still left behind however.
llvm-svn: 301704
While debugging a miscompile I realized loopunswitch doesn't
put newlines when printing the instruction being replacement.
Ending up with a single line with many instruction replaced isn't
the best for readability and/or mental sanity.
llvm-svn: 301692
The method is called "get *Param* Alignment", and is only used for
return values exactly once, so it should take argument indices, not
attribute indices.
Avoids confusing code like:
IsSwiftError = CS->paramHasAttr(ArgIdx, Attribute::SwiftError);
Alignment = CS->getParamAlignment(ArgIdx + 1);
Add getRetAlignment to handle the one case in Value.cpp that wants the
return value alignment.
This is a potentially breaking change for out-of-tree backends that do
their own call lowering.
llvm-svn: 301682
This eliminates many extra 'Idx' induction variables in loops over
arguments in CodeGen/ and Target/. It also reduces the number of places
where we assume that ReturnIndex is 0 and that we should add one to
argument numbers to get the corresponding attribute list index.
NFC
llvm-svn: 301666
Summary:
Skip memops if the total value profiled count is 0, we can't correctly
scale up the counts and there is no point anyway.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32624
llvm-svn: 301645
This is a follow up to the fix in r298360 to improve the handling of debug
values when redundant LEAs are removed. The fix in r298360 effectively
discarded the debug values. This patch now attempts to preserve the debug
values by using the DWARF DW_OP_stack_value operation via prependDIExpr.
Moved functions appendOffset and prependDIExpr from Local.cpp to
DebugInfoMetadata.cpp and made them available as static member functions of
DIExpression.
Differential Revision: https://reviews.llvm.org/D31604
llvm-svn: 301630
EarlyCSE should not just ignore assumes. It should use the fact that its condition is true for all dominated instructions.
Reviewers: sanjoy, reames, apilipenko, anna, skatkov
Reviewed By: reames, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32482
llvm-svn: 301625
If a condition is calculated only once, and there are multiple guards on this condition, we should be able
to remove all guards dominated by the first of them. This patch allows EarlyCSE to try to find the condition
of a guard among the known values, and if it is true, remove the guard. Otherwise we keep the guard and
mark its condition as 'true' for future consideration.
Reviewers: sanjoy, reames, apilipenko, skatkov, anna, dberlin
Reviewed By: reames, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32476
llvm-svn: 301623
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.
Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.
This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.
At this moment it does not work on Gold (as in the globals are never
stripped).
This is a second re-land of r298158. This time, this feature is
limited to -fdata-sections builds.
llvm-svn: 301587
When possible, put ASan ctor/dtor in comdat.
The only reason not to is global registration, which can be
TU-specific. This is not the case when there are no instrumented
globals. This is also limited to ELF targets, because MachO does
not have comdat, and COFF linkers may GC comdat constructors.
The benefit of this is a lot less __asan_init() calls: one per DSO
instead of one per TU. It's also necessary for the upcoming
gc-sections-for-globals change on Linux, where multiple references to
section start symbols trigger quadratic behaviour in gold linker.
This is a second re-land of r298756. This time with a flag to disable
the whole thing to avoid a bug in the gold linker:
https://sourceware.org/bugzilla/show_bug.cgi?id=19002
llvm-svn: 301586
Currently, this pass only focuses on *trivial* loop unswitching. At that
reduced problem it remains significantly better than the current loop
unswitch:
- Old pass is worse than cubic complexity. New pass is (I think) linear.
- New pass is much simpler in its design by focusing on full unswitching. (See
below for details on this).
- New pass doesn't carry state for thresholds between pass iterations.
- New pass doesn't carry state for correctness (both miscompile and
infloop) between pass iterations.
- New pass produces substantially better code after unswitching.
- New pass can handle more trivial unswitch cases.
- New pass doesn't recompute the dominator tree for the entire function
and instead incrementally updates it.
I've ported all of the trivial unswitching test cases from the old pass
to the new one to make sure that major functionality isn't lost in the
process. For several of the test cases I've worked to improve the
precision and rigor of the CHECKs, but for many I've just updated them
to handle the new IR produced.
My initial motivation was the fact that the old pass carried state in
very unreliable ways between pass iterations, and these mechansims were
incompatible with the new pass manager. However, I discovered many more
improvements to make along the way.
This pass makes two very significant assumptions that enable most of these
improvements:
1) Focus on *full* unswitching -- that is, completely removing whatever
control flow construct is being unswitched from the loop. In the case
of trivial unswitching, this means removing the trivial (exiting)
edge. In non-trivial unswitching, this means removing the branch or
switch itself. This is in opposition to *partial* unswitching where
some part of the unswitched control flow remains in the loop. Partial
unswitching only really applies to switches and to folded branches.
These are very similar to full unrolling and partial unrolling. The
full form is an effective canonicalization, the partial form needs
a complex cost model, cannot be iterated, isn't canonicalizing, and
should be a separate pass that runs very late (much like unrolling).
2) Leverage LLVM's Loop machinery to the fullest. The original unswitch
dates from a time when a great deal of LLVM's loop infrastructure was
missing, ineffective, and/or unreliable. As a consequence, a lot of
complexity was added which we no longer need.
With these two overarching principles, I think we can build a fast and
effective unswitcher that fits in well in the new PM and in the
canonicalization pipeline. Some of the remaining functionality around
partial unswitching may not be relevant today (not many test cases or
benchmarks I can find) but if they are I'd like to add support for them
as a separate layer that runs very late in the pipeline.
Purely to make reviewing and introducing this code more manageable, I've
split this into first a trivial-unswitch-only pass and in the next patch
I'll add support for full non-trivial unswitching against a *fixed*
threshold, exactly like full unrolling. I even plan to re-use the
unrolling thresholds, as these are incredibly similar cost tradeoffs:
we're cloning a loop body in order to end up with simplified control
flow. We should only do that when the total growth is reasonably small.
One of the biggest changes with this pass compared to the previous one
is that previously, each individual trivial exiting edge from a switch
was unswitched separately as a branch. Now, we unswitch the entire
switch at once, with cases going to the various destinations. This lets
us unswitch multiple exiting edges in a single operation and also avoids
numerous extremely bad behaviors, where we would introduce 1000s of
branches to test for thousands of possible values, all of which would
take the exact same exit path bypassing the loop. Now we will use
a switch with 1000s of cases that can be efficiently lowered into
a jumptable. This avoids relying on somehow forming a switch out of the
branches or getting horrible code if that fails for any reason.
Another significant change is that this pass actively updates the CFG
based on unswitching. For trivial unswitching, this is actually very
easy because of the definition of loop simplified form. Doing this makes
the code coming out of loop unswitch dramatically more friendly. We
still should run loop-simplifycfg (at the least) after this to clean up,
but it will have to do a lot less work.
Finally, this pass makes much fewer attempts to simplify instructions
based on the unswitch. Something like loop-instsimplify, instcombine, or
GVN can be used to do increasingly powerful simplifications based on the
now dominating predicate. The old simplifications are things that
something like loop-instsimplify should get today or a very, very basic
loop-instcombine could get. Keeping that logic separate is a big
simplifying technique.
Most of the code in this pass that isn't in the old one has to do with
achieving specific goals:
- Updating the dominator tree as we go
- Unswitching all cases in a switch in a single step.
I think it is still shorter than just the trivial unswitching code in
the old pass despite having this functionality.
Differential Revision: https://reviews.llvm.org/D32409
llvm-svn: 301576
Just calling dropAllReferences leaves pointers to the ConstantExpr
behind, so we would eventually crash with a null pointer dereference.
Differential Revision: https://reviews.llvm.org/D32551
llvm-svn: 301575
Summary:
Misc improvements to debug output. Fix a couple typos and also dump the
value profile before we make any profitability checks.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32607
llvm-svn: 301574
also a discussion about exactly what we should do prior to re-enabling
it.
The current bug is http://llvm.org/PR32821 and the discussion about this
is in the review thread for r300200.
llvm-svn: 301505
This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit.
Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch.
I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. Once place default constructed the APInts so I added a default constructor for those cases.
Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with.
Differential Revision: https://reviews.llvm.org/D32376
llvm-svn: 301432
Commits were:
"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"
The changes assumed pointers are 8 byte aligned on all architectures.
llvm-svn: 301429
Summary:
I plan to use WeakVH to mean "nulls itself out on deletion, but does
not track RAUW" in a subsequent commit.
Reviewers: dblaikie, davide
Reviewed By: davide
Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D32266
llvm-svn: 301424
Summary:
Otherwise we might end up with some empty basic blocks or
single-entry-single-exit basic blocks.
This fixes PR32085
Reviewers: chandlerc, danielcdh
Subscribers: mehdi_amini, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D30468
llvm-svn: 301395
The code I've removed here exists in ExpandBinOp in InstSimplify which we call into before SimplifyUsingDistributiveLaws. The code in InstSimplify looks to have been copied from here.
I verified this code doesn't fire on any lit tests. Not that that proves its definitely dead.
Differential Revision: https://reviews.llvm.org/D32472
llvm-svn: 301341
This patch uses various APInt methods to reduce temporary APInt creation.
This should be all of the unrelated cleanups that got buried in D32376(creating a KnownBits struct) as well as some pointed out by Simon during the review of that. Plus a few improvements to use counting instead of masking.
I've left out any places where we do something like (KnownZero & KnownOne) != 0 as I plan to add a helper method to KnownBits to ask that question and didn't want to thrash that code an additional time.
Differential Revision: https://reviews.llvm.org/D32495
llvm-svn: 301338
The matching here wasn't able to handle all the possible commutes. It always assumed the not would be on the left of the xor, but that's not guaranteed.
Differential Revision: https://reviews.llvm.org/D32474
llvm-svn: 301316
One of the fast-math optimizations is to replace calls to standard double
functions with their float equivalents, e.g. exp -> expf. However, this can
cause infinite loops for the following:
float expf(float val) { return (float) exp((double) val); }
A similar inline declaration exists in the MinGW-w64 math.h header file which
when compiled with -O2/3 and fast-math generates infinite loops.
So this fix checks that the calling function to the standard double function
that is being replaced does not match the float equivalent.
Differential Revision: https://reviews.llvm.org/D31806
llvm-svn: 301304
This patch is part of D28975's breakdown.
Genreating the control-flow to guard predicated instructions modified to
only use SplitBlockAndInsertIfThen() for producing the if-then construct.
Differential Revision: https://reviews.llvm.org/D32224
llvm-svn: 301293
We need to do this to prevent a miscompile which sinks an objc_retain
past an objc_release that releases the object objc_retain retains. This
happens because the top-down and bottom-up traversals each determines
the insert point for retain or release individually without knowing
where the other instruction is moved.
For example, when the following IR is fed to the ARC optimizer, the
top-down traversal decides to insert objc_retain right before
objc_release and the bottom-up traversal decides to insert objc_release
right after clang.arc.use.
(IR before ARC optimizer)
%11 = call i8* @objc_retain(i8* %10)
call void (...) @clang.arc.use(%0* %5)
call void @llvm.dbg.value(...)
call void @objc_release(i8* %6)
This reverses the order of objc_release and objc_retain, which causes
the object to be destructed prematurely.
(IR after ARC optimizer)
call void (...) @clang.arc.use(%0* %5)
call void @objc_release(i8* %6)
call void @llvm.dbg.value(...)
%11 = call i8* @objc_retain(i8* %10)
rdar://problem/30530580
llvm-svn: 301289
We can simplify (and (icmp X, C1), (icmp X, C2)) to one of the icmps in many cases.
I had to check some of these with Alive to prove to myself it's right, but everything
seems to check out. Eg, the code in instcombine was completely ignoring predicates with
mismatched signedness.
Handling or-of-icmps would be a follow-up step.
Differential Revision: https://reviews.llvm.org/D32143
llvm-svn: 301260
Summary:
Ensure that the new merge BB (which contains the rest of the original BB
after the mem op being optimized) gets a profile frequency, in case
there are additional mem ops later in the BB. Otherwise they get skipped
as the merge BB looks cold.
Reviewers: davidxl, xur
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32447
llvm-svn: 301244
This reverts commit r300732. This breaks a few tests.
I think the problem is related to adding more uses of
the condition that don't yet exist at this point.
llvm-svn: 301242
The current Loop Unroll implementation works with loops having a
single latch that contains a conditional branch to a block outside
the loop (the other successor is, by defition of latch, the header).
If this precondition doesn't hold, avoid unrolling the loop as
the code is not ready to handle such circumstances.
Differential Revision: https://reviews.llvm.org/D32261
llvm-svn: 301239
This is a straight cut and paste, but there's a bigger problem: if this
fold exists for simplifyOr, there should be a DeMorganized version for
simplifyAnd. But more than that, we have a patchwork of ad hoc logic
optimizations in InstCombine. There should be some structure to ensure
that we're not missing sibling folds across and/or/xor.
llvm-svn: 301213
When the location description of a source variable involves arithmetic
on the value itself, it needs to be marked with DW_OP_stack_value since it
is not describing the variable's location, but rather its value.
This is a follow-up to r297971 and fixes the source testcase quoted in
the comment in debuginfo-dce.ll.
rdar://problem/30725338
This reapplies r301093 without modifications.
llvm-svn: 301210
There is logic to track the expected number of instructions
produced. It thought in this case an instruction would
be necessary to negate the result, but here it folded
into a ConstantExpr fneg when the non-undef value operand
was cancelled out by the second fsub.
I'm not sure why we don't fold constant FP ops with undef currently,
but I think that would also avoid this problem.
llvm-svn: 301199
Summary:
Instead of keeping a variable indicating whether there are early exits
in the loop. We keep all the early exits. This improves LICM's ability to
move instructions out of the loop based on is-guaranteed-to-execute.
I am going to update compilation time as well soon.
Reviewers: hfinkel, sanjoy, efriedma, mkuper
Reviewed By: hfinkel
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D32433
llvm-svn: 301196
Summary:
The return value of these intrinsics should always have 0 bits for
inactive threads. This means that when all arguments are constant
and the comparison evaluates to true, the intrinsic should return
the current exec mask.
Fixes some GL_ARB_shader_ballot tests.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D32344
llvm-svn: 301195
We handled all of the commuted variants for plain xor already,
although they were scattered around and sometimes folded less
efficiently using distributive laws. We had no folds for not-xor.
Handling all of these patterns consistently is part of trying to
reinstate:
https://reviews.llvm.org/rL300977
llvm-svn: 301144
Summary:
In case all predecessor go to a single successor of current BB. We want to fold (not thread).
I failed to update the phi nodes properly in the last patch https://reviews.llvm.org/rL300657.
Phi nodes values are per predecessor in LLVM.
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32400
llvm-svn: 301139
There's probably some better way to write this that eliminates the
code duplication without hurting readability, but at least this
eliminates the logic holes and is hopefully slightly more efficient
than creating new instructions.
llvm-svn: 301129
This reverts commit r301105, 4, 3 and 1, as a follow up of the previous
revert, which broke even more bots.
For reference:
Revert "[APInt] Use operator<<= where possible. NFC"
Revert "[APInt] Use operator<<= instead of shl where possible. NFC"
Revert "[APInt] Use ashInPlace where possible."
PR32754.
llvm-svn: 301111
... in the per-TU -O0 pipeline.
The problem is that there could be passes registered using
`addExtensionsToPM()` introducing unnamed globals.
Asan is an example, but there may be others. Building cppcheck
with `-flto=thin` and `-fsanitize=address` triggers an assertion
while we're reading bitcode (in lib/LTO), as the BitcodeReader
assumes there are no unnamed globals (because the namer has run).
Unfortunately I wasn't able to find an easy way to test this.
I added a comment in the hope nobody moves this again.
llvm-svn: 301102
When the location description of a source variable involves arithmetic
on the value itself, it needs to be marked with DW_OP_stack_value since it
is not describing the variable's location, but rather its value.
This is a follow-up to r297971 and fixes the source testcase quoted in
the comment in debuginfo-dce.ll.
rdar://problem/30725338
llvm-svn: 301093
The later uses of dyn_castNotVal in this block are either
incomplete (doesn't handle vector constants) or overstepping
(shouldn't handle constants at all), but this first use is
just unnecessary. 'I' is obviously not a constant, and it
can't be a not-of-a-not because that would already be
instsimplified.
llvm-svn: 301088
The bug was introduced by r301018 "[InstCombine] fadd double (sitofp x), y check that the promotion is valid". The patch didn't expect that fadd can be on vectors not necessarily scalars. Add vector support along with the test.
llvm-svn: 301070
Fixes leaving intermediate flat addressing computations
where a GEP instruction's source is a constant expression.
Still leaves behind a trivial addrspacecast + gep pair that
instcombine is able to handle, which ideally could be folded
here directly.
llvm-svn: 301044
Doing these transformations check that the result of integer addition is representable in the FP type.
(fadd double (sitofp x), fpcst) --> (sitofp (add int x, intcst))
(fadd double (sitofp x), (sitofp y)) --> (sitofp (add int x, y))
This is a fix for https://bugs.llvm.org//show_bug.cgi?id=27036
Reviewed By: andrew.w.kaylor, scanon, spatel
Differential Revision: https://reviews.llvm.org/D31182
llvm-svn: 301018
Currently we choose PostBB as the single successor of QFB, but its possible that QTB's single successor is QFB which would make QFB the correct choice.
Differential Revision: https://reviews.llvm.org/D32323
llvm-svn: 300992
places based on it.
Existing constant hoisting pass will merge a group of contants in a small range
and hoist the const materialization code to the common dominator of their uses.
However, if the uses are all in cold pathes, existing implementation may hoist
the materialization code from cold pathes to a hot place. This may hurt performance.
The patch introduces BFI to the pass and selects the best insertion places based
on it.
The change is controlled by an option consthoist-with-block-frequency which is
off by default for now.
Differential Revision: https://reviews.llvm.org/D28962
llvm-svn: 300989