If all the operands to a phi node are a binop with a RHS constant, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the new op should be the
merged debug locations of the phi node arguments.
Patch 7 of 8 for D26256. Folding of a binop with RHS constant.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289699
Summary:
Move GVNHoist to later in the optimization pipeline, specifically, to
the function simplification part of the pipeline. The new pipeline
location allows GVNHoist to run on a function after its callees have
been inlined but before the function has been considered for inlining
into its callers, exposing more opportunities for hoisting.
Performance results on AArch64 kryo:
Improvements:
Benchmarks/CoyoteBench/fftbench -24.952%
spec2006/bzip2 -4.071%
internal bmark -3.177%
Benchmarks/PAQ8p/paq8p -1.754%
spec2000/perlbmk -1.328%
spec2006/h264ref -1.140%
Regressions:
internal bmark +1.818%
Benchmarks/mafft/pairlocalalign +1.084%
Reviewers: sebpop, dberlin, hiraditya
Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D27722
llvm-svn: 289696
If all the operands to a phi node are a cast, instcombine will try to pull
them through the phi node, combining them into a single cast. When it does
this, the debug location of the new cast should be the merged debug locations
of the phi node arguments.
Patch 6 of 8 for D26256. Folding of a cast operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289693
This patch fixes the linkage for __crashtracer_info__, making it have the proper mangling (extern "C") and linkage (private extern).
It also adds a new PrettyStackTrace type, allowing LLDB to adopt this instead of Host::SetCrashDescriptionWithFormat().
Without this patch, CrashTracer on macOS won't pick up pretty stack traces from any LLVM client.
An LLDB commit adopting this API will follow shortly.
Differential Revision: https://reviews.llvm.org/D27683
llvm-svn: 289689
If all the operands to a phi node are a load, instcombine will try to pull
them through the phi node, combining them into a single load. When it does
this, the debug location of the new load should be the merged debug locations
of the phi node arguments.
Patch 5 of 8 for D26256. Folding of a load operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289688
Summary:
D27549 (partial fix for PR26619) emits a constant value in the debug
metadata for a floating-point static const that does not exceed 64
bits in size. Whether or not a long double exceeds 64 bits in size
depends on the target. Modify the test case so that it expects a
constant value for long double if and only if the long double is no
larger than 64 bits.
Reviewers: cfe-commits, probinson
Differential Revision: https://reviews.llvm.org/D27597
llvm-svn: 289686
The driver passes flags to cc1 that enable various checkers based on
the target triple. This commit adds tests for these flags on Darwin, Linux,
and Windows.
This is a test-only change.
llvm-svn: 289685
If all the operands to a phi node are getelementptr, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the new getelementptr
should be the merged debug locations of the phi node arguments.
Patch 4 of 8 for D26256. Folding of a getelementptr operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289684
Adds a "Discriminator" field to struct DILineInfo, which defaults to 0.
Fills out the "Discriminator" field in DILineInfo in DWARFDebugLine::LineTable::getFileLineInfoForAddress().
in order to have a slightly nicer interface in getFileLineInfoForAddress.
Patch by Simon Que!
Differential Revision: https://reviews.llvm.org/D27649
llvm-svn: 289683
Objects may move during the garbage collection, and JVM needs
to notify ThreadAnalyzer about that. The new function
__tsan_java_find eliminates the need to maintain these
objects both in ThreadAnalyzer and JVM.
Author: Alexander Smundak (asmundak)
Reviewed in https://reviews.llvm.org/D27720
llvm-svn: 289682
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.
Patch 3 of 8 for D26256. Folding of a compare operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289681
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.
Patch 2 of 8 for D26256. Folding of a binary operation.
Differential Revision: https://reviews.llvm.org/D26256
llvm-svn: 289679
r289225 broke AST invariants by reparenting enumerators into function
decl contexts. This improves things by only reparenting TagDecls while
also attempting to preserve the lexical declcontext chain. The
interesting example here is:
int f(struct S { enum E { a = 1 } b; } c);
The semantic contexts of E and S should be f, and the lexical context of
S should be f and the lexical context of E should be S. We didn't do
that with r289225, but now we should.
This change should also improve our behavior on this example:
void f() {
extern void ext(struct S { } o);
// S injected here
}
Before r289225 we would only remove 'S' from the surrounding tag
injection context if it was the TU, but now we properly reparent S from
f to ext.
Fixes PR31366
llvm-svn: 289678
Since SGPRs should spill to VGPRs, they should be allocated first.
I don't think this is sufficient for SGPRs to always spill to
VGPRs though.
llvm-svn: 289671
Summary:
We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from
clang to llvm pass manager builder.
Reviewers: tejohnson, davidxl, dnovillo
Subscribers: mehdi_amini, cfe-commits
Differential Revision: https://reviews.llvm.org/D27744
llvm-svn: 289670
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.
Reviewers: tejohnson, davidxl, dnovillo
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D27743
llvm-svn: 289669
This change allows setting the default linker used by the Clang
driver when configuring the build.
Differential Revision: https://reviews.llvm.org/D25263
llvm-svn: 289668
Summary:
Now that we are not rounding up the sizes passed to the secondary allocator,
the memalign test could run out of aligned addresses to return for larger
alignments. We now reduce the size of the quarantine for that test, and
allocate less chunks for the larger alignments.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27760
llvm-svn: 289665
Summary:
The current test only checks whether ld64 is available, causing tests
to fail when ld64 is avilable but libLTO is not built.
Reviewers: beanz, mehdi_amini
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D27739
llvm-svn: 289662
Given two debug locations the function getMergedLocation combines the
locations into a single location (which may be an empty location).
Please see https://reviews.llvm.org/D26256 for the discussion leading
up to this API.
Note the function is currently a stub. This allows optimisations to
use the API although no location will actually be used.
This is patch 1 out of 8 for D26256. As suggested by David Blaikie,
each change in D26256 has been broken out into a separate patch.
llvm-svn: 289661
Retrying after fixing after removing load-store factoring through
token factors in favor of improved token factor operand pruning
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.
Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and
merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 289659
Generalize sdiv/udiv/srem/urem combines using APInt::isPowerOf2, which only works for const/splat-const values, to call SelectionDAG::isKnownToBeAPowerOfTwo instead which recognises many more cases.
Added a DAGCombiner::BuildLogBase2 helper since PowerOf2 combines often involve taking the log2 of such a value.
Differential Revision: https://reviews.llvm.org/D27714
llvm-svn: 289654
adding new optimization opportunity by adding new X86ISelLowering pattern. The test case was shown in https://llvm.org/bugs/show_bug.cgi?id=30945.
Test explanation:
Select gets three arguments mask, op and op2. In this case, the Mask is a result of ICMP. The ICMP instruction compares (with equal operand) the zero initializer vector and the result of the first ICMP.
In general, The result of "cmp eq, op1, zero initializers" is "not(op1)" where op1 is a mask. By rearranging of the two arguments inside the Select instruction, we can get the same result. Without the necessary of the middle phase ("cmp eq, op1, zero initializers").
Missed optimization opportunity:
vpcmpled %zmm0, %zmm1, %k0
knotw %k0, %k1
can be combine to
vpcmpgtd %zmm0, %zmm2, %k1
Reviewers:
1. delena
2. igorb
Commited after check all
Differential Revision: https://reviews.llvm.org/D27160
llvm-svn: 289653
The function SemaBuiltinFPClassification removed superfluous float to double
casts, this was changed to also remove float to float casts but this isn't
valid in all cases, for example when doing an rvaluetolvalue cast. Added a
check to only remove if this was a conventional floating cast.
Added additional tests into SemaOpenCL/extensions to cover these cases
llvm-svn: 289650