This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.
Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:
(1) The DIGlobalVariable should describe the source level variable,
not how to get to its location.
(2) It makes it unsafe/hard to update the expressions when we call
replaceExpression on the DIGLobalVariable.
(3) It makes it impossible to represent a global variable that is in
more than one location (e.g., a variable with multiple
DW_OP_LLVM_fragment-s). We also moved away from attaching the
DIExpression to DILocalVariable for the same reasons.
<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769
llvm-svn: 289902
This pass prepares a module containing type metadata for ThinLTO by splitting
it into regular and thin LTO parts if possible, and writing both parts to
a multi-module bitcode file. Modules that do not contain type metadata are
written unmodified as a single module.
All globals with type metadata are added to the regular LTO module, and
the rest are added to the thin LTO module.
Differential Revision: https://reviews.llvm.org/D27324
llvm-svn: 289899
Summary:
We were reinvoking exportGlobalInModule numerous times redundantly.
No need to re-export globals referenced by a global that was already
imported from its module. This resulted in a large speedup in the thin
link for a big application, particularly when importing aggressiveness
was cranked up.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27687
llvm-svn: 289896
Lowering to llvm.cttz() will result in constant folding anyway
if the argument to ffs is a constant. Pointed out by Eli for
fls() in D14590.
llvm-svn: 289888
The code change for D27687 accidentally got committed along with the
main change in r289843. Revert it temporarily, so that I can recommit it
along with its test as intended.
llvm-svn: 289875
Summary:
Thin link efficiency improvement. After adding an importing candidate to
the worklist we might have later added it again with a higher threshold.
Skip it when popped from the worklist if we recorded a higher threshold
than the current worklist entry, it will get processed again at the
higher threshold when that entry is popped.
This required adding the summary's GUID to the worklist, so that it can
be used to query the recorded highest threshold for it when we pop from the
worklist.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27696
llvm-svn: 289867
This patch sets the default value of the "-enable-cond-stores-vec" command line
option to "true".
Differential Revision: https://reviews.llvm.org/D27814
llvm-svn: 289863
Now that a new API to merge debug locations has been committed at r289661 (see
review D26256 for more details), we can use it to "improve" the code added by
revision r280995.
Instead of nulling the debugloc of a commoned instruction, we use the 'merged'
debug location. At the moment, this is just a no functional change since
function `DILocation::getMergedLocation()` is just a stub and would always
return a null location.
Differential Revision: https://reviews.llvm.org/D27804
llvm-svn: 289862
Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns.
This patch won't solve the example that was attached to that thread, so something else still needs fixing.
The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that
we want to fold to already exists, but sometimes it's the swapped form of what we want.
Corresponding changes for smax/umin/umax to follow.
Differential Revision: https://reviews.llvm.org/D27531
llvm-svn: 289855
This is split out from D27696, since it turned out to be a bug fix and
not part of the NFC efficiency change.
Keep the same adjusted (possibly decayed) threshold in both the worklist
and the ImportList. Otherwise if we encountered it first along a cold
path, the callee would be added to the worklist with a lower decayed
threshold than when it is later encountered along a hot path. But the
logic uses the threshold recorded in the ImportList entry to check if
we should re-add it, and without this patch the threshold recorded there
is the same along both paths so we don't re-add it. Using the
same possibly decayed threshold in the ImportList ensures we re-add it
later with the higher non-decayed hot path threshold.
llvm-svn: 289843
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.
Differential Revision: https://reviews.llvm.org/D27590
llvm-svn: 289828
A number of new patterns for simplifying and/xor of icmp:
(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.
(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
%s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8
violates condition (2) above. So this optimization cannot be applied.
llvm-svn: 289813
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
llvm-svn: 289756
There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).
Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.
At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.
Differential Revision: https://reviews.llvm.org/D27259
llvm-svn: 289755
Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass.
Reviewers: eraman, davidxl
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D27733
llvm-svn: 289725
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.
Reviewers: tejohnson, davidxl, dnovillo
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D27743
llvm-svn: 289714
If all the operands to a phi node are compares that have a RHS constant,
instcombine will try to pull them through the phi node, combining them into
a single operation. When it does this, the debug location of the new op
should be the merged debug locations of the phi node arguments.
Patch 8 of 8 for D26256. Folding of a compare that has a RHS constant.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289704
If all the operands to a phi node are a binop with a RHS constant, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the new op should be the
merged debug locations of the phi node arguments.
Patch 7 of 8 for D26256. Folding of a binop with RHS constant.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289699
Summary:
Move GVNHoist to later in the optimization pipeline, specifically, to
the function simplification part of the pipeline. The new pipeline
location allows GVNHoist to run on a function after its callees have
been inlined but before the function has been considered for inlining
into its callers, exposing more opportunities for hoisting.
Performance results on AArch64 kryo:
Improvements:
Benchmarks/CoyoteBench/fftbench -24.952%
spec2006/bzip2 -4.071%
internal bmark -3.177%
Benchmarks/PAQ8p/paq8p -1.754%
spec2000/perlbmk -1.328%
spec2006/h264ref -1.140%
Regressions:
internal bmark +1.818%
Benchmarks/mafft/pairlocalalign +1.084%
Reviewers: sebpop, dberlin, hiraditya
Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D27722
llvm-svn: 289696
If all the operands to a phi node are a cast, instcombine will try to pull
them through the phi node, combining them into a single cast. When it does
this, the debug location of the new cast should be the merged debug locations
of the phi node arguments.
Patch 6 of 8 for D26256. Folding of a cast operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289693
If all the operands to a phi node are a load, instcombine will try to pull
them through the phi node, combining them into a single load. When it does
this, the debug location of the new load should be the merged debug locations
of the phi node arguments.
Patch 5 of 8 for D26256. Folding of a load operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289688
If all the operands to a phi node are getelementptr, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the new getelementptr
should be the merged debug locations of the phi node arguments.
Patch 4 of 8 for D26256. Folding of a getelementptr operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289684
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.
Patch 3 of 8 for D26256. Folding of a compare operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289681
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.
Patch 2 of 8 for D26256. Folding of a binary operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289679
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.
Reviewers: tejohnson, davidxl, dnovillo
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D27743
llvm-svn: 289669
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.
Differential Revision: https://reviews.llvm.org/D26671
llvm-svn: 289647
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Similarly we clear bit 0 for optimizing operand 0.
Also calculate UndefElts correctly.
Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support.
llvm-svn: 289629
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused.
Also calculate UndefElts correctly.
Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support.
llvm-svn: 289628
Summary:
This patch will add loop metadata on the pre and post loops generated by IRCE.
Currently, we have metadata for disabling optimizations such as vectorization,
unrolling, loop distribution and LICM versioning (and confirmed that these
optimizations check for the metadata before proceeding with the transformation).
The pre and post loops generated by IRCE need not go through loop opts (since
these are slow paths).
Added two test cases as well.
Reviewers: sanjoy, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26806
llvm-svn: 289588
We currently check if the exact trip count is known and is smaller than the
"tiny loop" bound. We should be checking the maximum bound on the trip count
instead.
Differential Revision: https://reviews.llvm.org/D27690
llvm-svn: 289583
Summary:
This is last in of a series of patches to evolve ADCE.cpp to support
removing of unnecessary control flow.
This patch adds the code to update the control and data flow graphs
to remove the dead control flow.
Also update unit tests to test the capability to remove dead,
may-be-infinite loop which is enabled by the switch
-adce-remove-loops.
Previous patches:
D23824 [ADCE] Add handling of PHI nodes when removing control flow
D23559 [ADCE] Add control dependence computation
D23225 [ADCE] Modify data structures to support removing control flow
D23065 [ADCE] Refactor anticipating new functionality (NFC)
D23102 [ADCE] Refactoring for new functionality (NFC)
Reviewers: dberlin, majnemer, nadav, mehdi_amini
Subscribers: llvm-commits, david2050, freik, twoh
Differential Revision: https://reviews.llvm.org/D24918
llvm-svn: 289548
Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed.
Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support.
llvm-svn: 289523
Summary:
Since we don't break BBs for function calls. We might get some insane counts
(wrap of unsigned) in the presence of noreturn calls.
This patch sets these counts to zero instead of the wrapped number.
Reviewers: davidxl
Subscribers: xur, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D27602
llvm-svn: 289521
This patch ensures the correct minimum bit width during type-shrinking.
Previously when type-shrinking, we always sign-extended values back to their
original width. However, if we are going to sign-extend, and the sign bit is
unknown, we have to increase the minimum bit width by one bit so the
sign-extend will fill the upper bits correctly. If the sign bit is known to be
zero, we can perform a zero-extend instead. This should fix PR31243.
Reference: https://llvm.org/bugs/show_bug.cgi?id=31243
Differential Revision: https://reviews.llvm.org/D27466
llvm-svn: 289470