And another attempt to start untangling this ball of threads around gather.
There's `TTI::prefersVectorizedAddressing()`hoop, which confusingly defaults to `true`,
which tells LV to try to vectorize the addresses that lead to loads,
but X86 generally can not deal with vectors of addresses,
the only instructions that support that are GATHER/SCATTER,
but even those aren't available until AVX2, and aren't really usable until AVX512.
This specializes the hook for X86, to return true only if we have AVX512 or AVX2 w/ fast gather.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111546
Incorrectly triggers for template classes that inherit
from a base class that has virtual destructor.
Any class inheriting from a base that has a virtual destructor
will have their destructor also virtual, as per the Standard:
https://timsong-cpp.github.io/cppwp/n4140/class.dtor#9
> If a class has a base class with a virtual destructor,
> its destructor (whether user- or implicitly-declared) is virtual.
Added unit tests to prevent regression.
Fixes bug https://bugs.llvm.org/show_bug.cgi?id=51912
Differential Revision: https://reviews.llvm.org/D110614
The rules were too restrictive, causing out-of-place bufferization when the result of two ExtractSliceOp is fed into an InsertSliceOp.
Differential Revision: https://reviews.llvm.org/D111861
This patch removes code very specific to affine dependence analysis and
refactors it as a FlatAfffineRelation.
A FlatAffineRelation represents a set of ordered pairs (domain -> range) where
"domain" and "range" are tuples of identifiers. These relations are used to
represent an "access relation" for memory access on a memref. An access
relation maps elements of an iteration domain to the element(s) of an array
domain accessed by that iteration of the associated statement through some
array reference. The dependence relation representing the dependence
constraints between two memory accesses can be built by composing the access
relation of the destination access by the inverse of the access relation of
source access.
This patch does not change the functionality of the existing dependence
analysis in checkMemrefAccessDependence, but refactors it to use
FlatAffineRelations to deduplicate code and enable code reuse for future
development of features like scheduling, value-based dependence analysis, etc.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D110563
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:
(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
This revision also adds a few passes to the sparse compiler part to unify the transformation sequence with all other paths we currently use.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111900
This is the only lowering to Linalg Tosa has, so it's needlessly
verbose. Likely this was a carry over from IREE's usage where we
originally lowered to linalg on buffers (the only linalg that existed at
the time), so the everything on tensors needed the suffix. We're dropping
it in IREE also, having transitioned entirely to using Linalg on
tensors.
Reviewed By: sjarus
Differential Revision: https://reviews.llvm.org/D111911
RecordMemberExprValidator was not looking through ElaboratedType
nodes when looking for candidates which occur in base classes.
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D111830
It seems StringConvert.cpp was moved, and the Xcode project file
wasn't updated.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D111910
For the common case where the shift amount is constant (a single
element range) we can easily compute a precise range (up to
unsigned envelope), so do that.
This removes an over-specified fold. The more general transform
was added with:
727e642e97
There's a difference on an existing test that shows a potentially
unnecessary use limit on an icmp fold.
That fold is in InstCombinerImpl::foldICmpSubConstant(), and IIRC
there was some back-and-forth on it and similar folds because they
could cause analysis/passes (SCEV, LSR?) to miss optimizations.
Differential Revision: https://reviews.llvm.org/D111410
We always want to check correctness, but for some operations we
can only guarantee optimality for a subset of inputs. Accept an
additional predicate that determines whether optimality for a
given pair of ranges should be checked.
(iN X s>> (N-1)) & Y --> (X < 0) ? Y : 0
https://alive2.llvm.org/ce/z/qeYhdz
I was looking at a missing abs() transform and found my way to this
generalization of an existing fold that was added with D67799.
As discussed in that review, we want to make sure codegen handles
this difference well, and for all of the targets/types that I
spot-checked, it looks good.
I am leaving the existing fold in place in this commit because
it covers a potentially missing icmp fold, but I plan to remove
that as a follow-up commit as suggested during review.
Differential Revision: https://reviews.llvm.org/D111410
Print a friendly error message including the inputs, result and
not-contained element if an exhaustive correctness test fails,
same as we do if the optimality test fails.
Mimic the behavior of including headers where a system includer makes an
includee a system header too.
rdar://84049469
Differential Revision: https://reviews.llvm.org/D111476
This patch adds a pass option to only run transforms that scalarize
vector operations and do not create new vector instructions.
When running VectorCombine early in the pipeline introducing new vector
operations can have negative effects, like blocking loop or SLP
vectorization. To avoid regressions, restrict the early VectorCombine
run (when using -enable-matrix) to only perform scalarization and not
introduce new vector operations.
This is done as option to the pass directly, which is then set when
adding the pass to the pipeline. This is done for the new pass manager
only.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D111800
Hide __llvm_profile_raw_version so as not to resolve reference from a
dependent shared object. Since libclang_rt.profile is added later in
the command line, a definition of __llvm_profile_raw_version is not
included if it is provided from an earlier object, e.g. from a shared
dependency.
This causes an extra dependence edge where if libA.so depends on libB.so
and both are coverage-instrumented, libA.so uses libB.so's definition of
__llvm_profile_raw_version. This leads to a runtime link failure if the
libB.so available at runtime does not provide this symbol (but provides
the other dependent symbols). Such a scenario can occur in Android's
mainline modules.
E.g.:
ld -o libB.so libclang_rt.profile-x86_64.a
ld -o libA.so -l B libclang_rt.profile-x86_64.a
libB.so has a global definition of __llvm_profile_raw_version. libA.so
uses libB.so's definition of __llvm_profile_raw_version. At runtime,
libB.so may not be coverage-instrumented (i.e. not export
__llvm_profile_raw_version) so runtime linking of libA.so will fail.
Marking this symbol as hidden forces each binary to use the definition
of __llvm_profile_raw_version from libclang_rt.profile.
Differential Revision: https://reviews.llvm.org/D111759
By default clang emits complete contructors as alias of base constructors if they are the same.
The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols.
@yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true`
and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values
with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had
to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.
Reviewed By: yaxunl, #amdgpu
Differential Revision: https://reviews.llvm.org/D109707
prepareSymbolRelocation() in Writer.cpp adds both symbols that need binding and
symbols relocated with a pointer relocation to the got.
Pointer relocations are emitted for non-movq GOTPCREL(%rip) loads. (movqs
become GOT_LOADs so that the linker knows they can be relaxed to leaqs, while
others, such as addq, become just GOT -- a pointer relocation -- since they
can't be relaxed in that way).
For example, this C file produces a private_extern GOT relocation when
compiled with -O2 with clang:
extern const char kString[];
const char* g(int a) { return kString + a; }
Linkers need to put pointer-relocated symbols into the GOT, but ld64 marks them
as LOCAL in the indirect symbol table. This matters, since `strip -x` looks at
the indirect symbol table when deciding what to strip.
The indirect symtab emitting code was assuming that only symbols that need
binding are in the GOT, but pointer relocations where there too. Hence, the
code needs to explicitly check if a symbol is a private extern.
Fixes https://crbug.com/1242638, which has some more information in comments 14
and 15. With this patch, the output of `nm -U` on Chromium Framework after
stripping now contains just two symbols when using lld, just like with ld64.
Differential Revision: https://reviews.llvm.org/D111852
- When a redundant MBB is being erased from MDT, check whether its
single successor is dominiated by it. If yes, update that successor's
idom before erasing MBB; otherwise, it implies MBB is a leaf node and
could be erased directly.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D111831
This test starts failing when people add a setting starting with
`target.process.t` which of course can easily happen. Make it a bit more
resistant by only requiring that `target.process.thr` has a unique completion.