`X86TTIImpl::getGSScalarCost()` has (at least) two issues:
* it naively computes the cost of a sequence of `insertelement`/`extractelement`.
If we are operating not on XMM but on YMM/ZMM registers,
this greatly overestimates the cost of subvector insertions/extractions.
* Gather/scatter takes a vector of pointers, and scalarization results in us performing
a scalar memory operation for each of these pointers, but we never account for the cost
of extracting these pointers out of the vector of pointers.
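To make the intended formula concrete, here is a minimal sketch of a corrected
scalarization cost; the function and parameter names are illustrative, not the
actual X86TTIImpl code:

#include <cstdint>

// Hedged sketch, not the real X86TTIImpl::getGSScalarCost(). VF is the number
// of lanes; InsertCost should come from a subvector-aware scalarization
// overhead (so YMM/ZMM lane insertions are not all priced as cross-lane ops),
// and PtrExtractCost covers pulling each pointer out of the pointer vector,
// which the old code never charged for.
uint64_t scalarizedGatherCost(unsigned VF, uint64_t InsertCost,
                              uint64_t PtrExtractCost,
                              uint64_t ScalarLoadCost) {
  return InsertCost + PtrExtractCost + VF * ScalarLoadCost;
}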
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111222
Even if there are no interesting functions, the SCCP solver would still run
before bailing. Now bail out earlier and avoid running the solver for nothing.
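A minimal sketch of the early-bail shape, assuming a hypothetical
"interesting" predicate; this is illustrative and not the actual pass code:

#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"

// Illustrative predicate; the real pass has its own notion of "interesting".
static bool isInterestingFunction(const llvm::Function &F) {
  return !F.isDeclaration();
}

static bool runWithEarlyBail(llvm::Module &M) {
  // Bail before the solver is ever constructed when there is nothing to do.
  if (llvm::none_of(M, isInterestingFunction))
    return false;
  // ... build and run the SCCP solver only when there is actual work ...
  return true;
}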
Differential Revision: https://reviews.llvm.org/D111645
Added functions that implement "atomic compare".
Though clang does not use library interfaces to implement OpenMP atomics,
the functions are added for consistency.
Also added missing functions for 80-bit floating-point min/max atomics.
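For reference, a hedged example of the OpenMP 5.1 construct these entry points
correspond to (clang lowers this directly rather than calling the library, as
noted above):

// Atomic "max" via the compare form.
void atomic_max(int &x, int e) {
#pragma omp atomic compare
  if (x < e) { x = e; }
}

// 80-bit long double atomic min, matching the missing min/max functions noted above.
void atomic_min(long double &x, long double e) {
#pragma omp atomic compare
  if (x > e) { x = e; }
}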
Differential Revision: https://reviews.llvm.org/D110109
This finds the curl libraries if LLVM_ENABLE_CURL is set. This is needed
to implement the debuginfod client library in LLVM.
Patch By: noajshu
Differential Revision: https://reviews.llvm.org/D111238
Replaced storing an ittnotify domain array index in the location info
structure (which is now read-only) with storing (location info address +
ittnotify domain + team size) in a hash map.
Replaced the __kmp_itt_barrier_domains and __kmp_itt_imbalance_domains arrays with
the __kmp_itt_barrier_domains hash map, and the __kmp_itt_region_domains and
__kmp_itt_region_team_size arrays with the __kmp_itt_region_domains hash map.
Basic functionality did not change (or at least the intent was not to change it).
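A minimal sketch of the new mapping, with illustrative names and a standard
container standing in for the runtime's own hash map:

#include <unordered_map>

struct __itt_domain; // opaque ittnotify domain handle (from ittnotify.h)

// Value stored per barrier/region; previously this lived in separate
// __kmp_itt_*_domains / __kmp_itt_region_team_size arrays indexed from the
// location info structure.
struct DomainInfo {
  __itt_domain *domain;
  int team_size;
};

// Keyed by the address of the (now read-only) location info structure.
using RegionDomainMap = std::unordered_map<const void *, DomainInfo>;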
The patch fixes https://bugs.llvm.org/show_bug.cgi?id=48644.
Differential Revision: https://reviews.llvm.org/D111580
NFC. This check does not verify any functional property since size 8
was added. Remove it for simplicity.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D111737
Change-Id: Ifd7cbd324a137f939d8dc04acb8fbd54c9527a42
This patch implements saving of the XPLINK callee-saved registers
on z/OS.
Reviewed By: uweigand, Kai
Differential Revision: https://reviews.llvm.org/D111653
This is the first step towards supporting general sparse tensors as the output
of operations. The init sparse tensor operation is used to materialize an empty
sparse tensor of a given shape and sparsity for use in a subsequent computation
(similar to its dense tensor counterpart).
Example:
%c = sparse_tensor.init %d1, %d2 : tensor<?x?xf32, #SparseMatrix>
%0 = linalg.matmul
ins(%a, %b: tensor<?x?xf32>, tensor<?x?xf32>)
outs(%c: tensor<?x?xf32, #SparseMatrix>) -> tensor<?x?xf32, #SparseMatrix>
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111684
These tests fail roughly once every 10 runs on Windows, causing both local and buildbot failures.
Differential Revision: https://reviews.llvm.org/D111659
Adds initial parsing and sema for the 'adjust_args' clause.
Note that an AST clause is not created; instead, its expressions are added
to the OMPDeclareVariantAttr.
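A hedged example of the clause being parsed; the function names are made up,
and the syntax follows OpenMP 5.1 'declare variant' with a dispatch context:

void gpu_variant(void *p);

#pragma omp declare variant(gpu_variant) \
    match(construct = {dispatch}) adjust_args(need_device_ptr : p)
void base(void *p);

void use(void *dev_ptr) {
#pragma omp dispatch
  base(dev_ptr); // p is adjusted to a device pointer when dispatching to gpu_variant
}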
Differential Revision: https://reviews.llvm.org/D99905
If the node has an invalid source location, it triggers an assertion in
isInSystemHeader(...):
void test() {
  __builtin_va_list __args;
  // __builtin_va_list has no definition in any source file and its
  // CXXConstructorDecl has an invalid SourceLocation
}
This crashes with "Assertion `Loc.isValid() && "Can't get file
characteristic of invalid loc!"' failed." in
getFileCharacteristic(SourceLocation).
This is a follow-on to D111675, which implements the gep case. I'd originally left it out because I was hoping to actually implement the inrange todo, but after a bit of staring at the code I decided to leave it as is, since it doesn't affect this use case (i.e. instcombine requires the op to freeze to be an instruction).
Differential Revision: https://reviews.llvm.org/D111691
This has a couple of benefits:
1. It can sometimes fix clusters that got broken apart when the register
allocator inserted a copy.
2. Post-RA scheduling does not have to worry about increasing register
pressure, which in some cases gives it more freedom to reorder
instructions.
Testing on a collection of 10,000 graphics shaders compiled for gfx1010
showed:
- The average length of each run of one or more load instructions
increased by about 1%.
- The number of runs of two or more load instructions increased by
about 4%.
Differential Revision: https://reviews.llvm.org/D111646
During explicit modular build, PCM files are typically specified via the `-fmodule-file=<path>` command-line option. Early during the compilation, Clang uses the `ASTReader` to read their contents and caches the result so that the module isn't loaded implicitly later on. A listener is attached to the `ASTReader` to collect names of the modules read from the PCM files. However, if the PCM has already been loaded previously via PCH:
1. the `ASTReader` doesn't do anything for the second time,
2. the listener is not invoked at all,
3. the module load result is not cached,
4. the compilation fails when attempting to load the module implicitly later on.
This patch solves this problem by attaching the listener to the `ASTReader` for PCH reading as well.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D111560
The AffinePromotion and AffineDemotion passes were upstreamed
in their current state from fir-dev. In order to make sure everybody
is on the same page, this patch adds some comments to state that.
Reviewed By: schweitz
Differential Revision: https://reviews.llvm.org/D111629
The type can be inferred trivially, but the inference is currently done by
string stitching between ODS and C++ and is not easily exposed to Python.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D111712
In D110173 we start using the existing LLVM IDF calculator to place PHIs as
we reconstruct an SSA form of the machine-code program. Sadly that's slower
than the old (but broken) way; this patch attempts to recover some of that
performance.
The key observation: every time we def a register, we also have to def its
register units. If we def'd $rax, in the current implementation we
independently calculate PHI locations for {al, ah, ax, eax, hax, rax}, and
they will all have the same PHI positions. Instead of doing that, we can
calculate the PHI positions for {al, ah} and place PHIs for any aliasing
registers in the same positions. Any def of a super-register has to def
the unit, and vice versa, so this is sound. It cuts down the SSA placement
we need to do significantly.
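A hedged sketch of that idea (illustrative types and names, not the actual
InstrRefBasedLDV code):

#include <map>
#include <set>
#include <vector>

using BlockID = unsigned;
using RegUnit = unsigned;

// Filled by running the IDF calculator once per register unit over the blocks
// that def that unit.
std::map<RegUnit, std::set<BlockID>> PHIBlocksPerUnit;

// A register's PHI blocks are the union over its units: any def of the
// register defs all of its units and vice versa, so nothing is missed.
std::set<BlockID> phiBlocksForReg(const std::vector<RegUnit> &UnitsOfReg) {
  std::set<BlockID> Result;
  for (RegUnit U : UnitsOfReg)
    Result.insert(PHIBlocksPerUnit[U].begin(), PHIBlocksPerUnit[U].end());
  return Result;
}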
This doesn't work for stack slots, or registers we only ever read, so place
PHIs normally for those. LiveDebugValues chooses to ignore writes to SP at
calls, and now has to ignore writes to SP register units too.
Differential Revision: https://reviews.llvm.org/D111627
Instead of setting operands to undef as the "operands" pass does,
convert the operands to a function argument. This avoids having to
introduce undef values into the IR, which behave somewhat unpredictably
during optimizations.
For instance,
define void @func() {
entry:
%val = add i32 32, 21
store i32 %val, i32* null
ret void
}
is reduced to
define void @func(i32 %val) {
entry:
%val1 = add i32 32, 21
store i32 %val, i32* null
ret void
}
(note that the instruction %val is renamed to %val1 when printing
the IR to avoid ambiguity; ideally %val1 would be removed by dce or the
instruction reduction pass)
Any call to @func is replaced with a call to the function with the
new signature, with the added arguments filled with undef. This is not ideal
for IPA passes, but those are out of scope for now.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D111503
The builtin __rlwnm is currently constrained to accept only constants
for the shift parameter, but the instructions emitted for it have no such
constraint. This patch allows the builtin to accept a variable shift.
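A hedged example of what now compiles (PowerPC targets; previously the shift
argument had to be a compile-time constant):

// "Rotate Left Word then aNd with Mask" with a runtime shift amount.
unsigned rotate_and_mask(unsigned value, unsigned shift) {
  return __rlwnm(value, shift, 0x00FFFF00u);
}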
Reviewed By: NeHuang, amyk
Differential Revision: https://reviews.llvm.org/D111229
This is NFC-intended for scalar code. There are still unnecessary
m_ConstantInt restrictions in surrounding code, so this is not a
complete fix.
This prevents regressions seen with a planned follow-on to D111410.
There's a substantial pile of scalar tests for transforms that
depend on this code, but zero vector coverage. This patch adds
a vector test next to the first scalar test in each file that
is affected by foldLogOpOfMaskedICmps.
The code that handles these transforms is artificially limited
from working with vector splat constants.
When writing the user-facing documentation, I noticed several inconsistencies
and asymmetries in the Python API we provide. Fix them by adding:
- the `owner` property to regions, similarly to blocks;
- the `isinstance` method to any class derived from `PyConcreteAttr`,
`PyConcreteValue` and `PyConcreteAffineExpr`, similar to `PyConcreteType`, to
enable `isa`-like calls without having to handle exceptions;
- a mechanism to create the first block in a region, as we could only create
blocks relative to other blocks, which is impossible in an empty region.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D111556
Old versions of gcc want template specialisations to happen within the
namespace where the template lives; this is still present in gcc 5.1, which
we officially support, so it has to be worked around.
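An illustrative reduction of the issue (the names are made up):

namespace lib {
template <class T> struct Traits { static const bool value = false; };

// Older gcc (including 5.1) only accepts the explicit specialisation when it
// is written inside the namespace where the template lives, as here:
template <> struct Traits<int> { static const bool value = true; };
} // namespace lib

// Writing the specialisation outside the namespace, e.g.
//   template <> struct lib::Traits<long> { ... };
// is rejected by those old gcc versions, so that form is avoided.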
I came across an issue where, since we build the library for Apple with
the install name directory being /usr/lib, we end up loading the
system-provided libc++abi when running the tests unless we run them with
DYLD_LIBRARY_PATH. That wreaks havoc.
Instead of fixing it in the legacy config file, this commit introduces
an Apple libc++abi config file that does the right thing.
Differential Revision: https://reviews.llvm.org/D111279
Instead of always defining LIBCXXABI_NO_TIMER to run the tests, only
define LIBCXXABI_USE_TIMER when we want to enable the timer. This makes
the libc++abi testing configuration simpler.
As a fly-by fix, remove the unused LIBUNWIND_NO_TIMER macro from libunwind.
Differential Revision: https://reviews.llvm.org/D111667
InstrRefBasedLDV used to try and determine which values are in which
registers using a lattice approach; however this is hard to understand, and
broken in various ways. This patch replaces that approach with a standard
SSA approach using existing LLVM utilities. PHIs are placed at dominance
frontiers; value propagation then eliminates unnecessary PHIs.
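A minimal sketch of that standard SSA placement step using the existing LLVM
utility (shown on IR blocks for brevity; the pass itself works on machine
basic blocks):

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/IteratedDominanceFrontier.h"
#include "llvm/IR/Dominators.h"

using namespace llvm;

// Given the blocks where a location is defined, compute the blocks that need
// PHIs at the iterated dominance frontier.
void placePHIs(DominatorTree &DT,
               const SmallPtrSetImpl<BasicBlock *> &DefBlocks,
               SmallVectorImpl<BasicBlock *> &PHIBlocks) {
  ForwardIDFCalculator IDF(DT);
  IDF.setDefiningBlocks(DefBlocks);
  IDF.calculate(PHIBlocks);
}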
This patch also adds a bunch of unit tests that should cover many of the
weirder forms of control flow.
Differential Revision: https://reviews.llvm.org/D110173
Previously we would call getAsTemplate() when kind == TemplateExpansion,
which triggers an assertion. The call is now replaced with
getAsTemplateOrTemplatePattern(), which is exactly the same as
getAsTemplate(), except it allows calls when kind == TemplateExpansion.
No change in behavior for no-assert builds.
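A hedged sketch of the guarded access; the helper is illustrative, only the
TemplateArgument API calls are real:

#include "clang/AST/TemplateBase.h"
#include <cassert>

using namespace clang;

static TemplateName templateNameOf(const TemplateArgument &Arg) {
  assert(Arg.getKind() == TemplateArgument::Template ||
         Arg.getKind() == TemplateArgument::TemplateExpansion);
  // getAsTemplate() asserts on TemplateExpansion; this accessor accepts both
  // and otherwise behaves identically.
  return Arg.getAsTemplateOrTemplatePattern();
}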
Differential Revision: https://reviews.llvm.org/D111648
Just moving that block inside DWARFASTParserClang::ParseChildMembers into
its own function. Also early-exits instead of using a large if when
num_attributes is 0.