Change the default type of v64 register class from v512i32 to v256f64.
Add a regression test also.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D91301
Added documentation about the bufferization features.
Furthermore, the usage of pre- and post-processing is described.
This also includes information about optimization functionalities.
Differential Revision: https://reviews.llvm.org/D90675
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to a SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack.
I've fixed this by ensuring that:
1. When we don't have enough registers to allocate the whole block
we mark any remaining SVE registers temporarily as allocated.
2. We temporarily remove the InConsecutiveRegs flags from the last
tuple part argument and reinvoke the autogenerated calling
convention handler. Doing this prevents the code from entering
an infinite recursion and, in combination with 1), ensures we
switch over to the Indirect mode.
3. After allocating a GPR register for the pointer to the tuple we
then deallocate any SVE registers we marked as allocated in 1).
We also set the InConsecutiveRegs flags back how they were before.
4. I've changed the AArch64ISelLowering LowerCALL and
LowerFormalArguments functions to detect the start of a tuple,
which involves allocating a single stack object and doing the
correct numbers of legal loads and stores.
Differential Revision: https://reviews.llvm.org/D90219
This was implemented in 410b650e674496e61506fa88f3026759b8759d0f:
"Implement P0340R3: Make 'underlying_type' SFINAE-friendly. Reviewed as https://reviews.llvm.org/D63574
llvm-svn: 364094"
This change does two main things
1) An operation might have multiple dependences to the same
producer. Not tracking them correctly can result in incorrect code
generation with fusion. To rectify this the dependence tracking
needs to also have the operand number in the consumer.
2) Improve the logic used to find the fused loops making it easier to
follow. The only constraint for fusion is that linalg ops (on
buffers) have update semantics for the result. Fusion should be
such that only one iteration of the fused loop (which is also a
tiled loop) must touch only one (disjoint) tile of the output. This
could be relaxed by allowing for recomputation that is the default
when oeprands are tensors, or can be made legal with promotion of
the fused view (in future).
Differential Revision: https://reviews.llvm.org/D90579
A piece of logic of `isLoopInvariantExitCondDuringFirstIterations` is actually
a generalized predicate monotonicity check. This patch moves it into the
corresponding method and generalizes it a bit.
Differential Revision: https://reviews.llvm.org/D90395
Reviewed By: apilipenko
Exposing the C versions of the methods of the sparse runtime support lib
through header files will enable using the same methods in an MLIR program
as well as a C++ program, which will simplify future benchmarking comparisons
(e.g. comparing MLIR generated code with eigen for Matrix Market sparse matrices).
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D91316
Sometimes the an instruction we are trying to widen is used by the IV
(which means the instruction is the IV increment). Currently this may
prevent its widening. We should ignore such user because it will be
dead once the transform is done anyways.
Differential Revision: https://reviews.llvm.org/D90920
Reviewed By: fhahn
In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered as escaped.
This is a bit too conservative. Specifically, in non-optimized build mode, it's often to have patterns of code that first store an alloca somewhere and then load it right away.
These used should be handled without conservatively marking them escaped.
This patch tracks how the memory location where an alloca pointer is stored into is being used. As long as we only try to load from that location and nothing else, we can still
consider the original alloca not escaping and keep it on the stack instead of putting it on the frame.
Differential Revision: https://reviews.llvm.org/D91305
InstCombine canonicalizes 'sub nuw' instructions to 'add' without the
`nuw` flag. The typical case where we see it is decrementing induction
variables. For them, IndVars fails to prove that it's legal to widen them,
and inserts unprofitable `zext`'s.
This patch adds recognition of such pattern using SCEV.
Differential Revision: https://reviews.llvm.org/D89550
Reviewed By: fhahn, skatkov
We have option -mabi=ieeelongdouble to set current long double to
IEEEquad semantics. Like what GCC does, we need to define
__LONG_DOUBLE_IEEE128__ macro in this case, and __LONG_DOUBLE_IBM128__
if using PPCDoubleDouble.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D90208
This CL integrates the new sparse annotations (hereto merely added as fully
transparent attributes) more tightly to the generic linalg op in order to add
verification of the annotations' consistency as well as to make make other
passes more aware of their presence (in the long run, rewriting rules must
preserve the integrity of the annotations).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D91224
Summary:
This patch begins to add support for a set of scripts that can be used to get information from OpenMP programs to better describe problems and eventually show the data to the user in formatted output. Right now the only support is forformatting the register and memory usage reports from ptxas and nvlink. This is simply done as a wrapper around clang and clang++.
Reviewers: jdoerfert
DIfferential Revision: https://reviews.llvm.org/D91085
The implementation of Messages with forward_list<> makes some
nonstandard assumptions about the validity of iterators that don't
hold up with MSVC's implementation. Use list<> instead. The
measured performance is comparable.
This change obviated a distinction between two member functions
of Messages, and the uses of one have been replaced with calls
to the other.
Similar usage in CharBuffer was also replaced for consistency.
Differential revision: https://reviews.llvm.org/D91210
Non-mechanical changes:
- Added FIXME to StringLiteral to cover multi-token string literals.
- LiteralExpression::getLiteralToken() is gone. (It was never called)
This is because we don't codegen methods in Alternatives
It's conceptually suspect if we consider multi-token string literals, though.
Differential Revision: https://reviews.llvm.org/D91277
Following discussion in D91193, a change made in D88792 was not quite right.
This restores the message argument, and switches from `expect` to `runCmd`.
Differential Revision: https://reviews.llvm.org/D91206
We need to be able to call function pointers. Inline the dispatch
function.
Also inline the context projection function.
Transfer debug locations from the suspend point to the inlined functions.
Use the function argument index instead of the function argument in
coro.id.async. This solves any spurious use issues.
Coerce the arguments of the tail call function at a suspend point. The LLVM
optimizer seems to drop casts leading to a vararg intrinsic.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D91098
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.
This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.
Since callbacks may end up not adding passes, we need to check if the
pass managers are empty before adding them, so PassManager now has an
isEmpty() function. For example, polly adds callbacks but doesn't always
add passes in those callbacks, so this is necessary to keep
-debug-pass-manager tests' output from changing depending on if polly is
enabled or not.
Tests are a continuation of those added in
https://reviews.llvm.org/D89083.
Reviewed By: asbirlea, Meinersbur
Differential Revision: https://reviews.llvm.org/D89158
except where they are necessary to disambiguate the target.
This substantially improves diagnostics from the standard library,
which are otherwise full of `::__1::` noise.
arguments of types by default.
This somewhat improves the worst-case printing of types like
std::string, std::vector, etc., where many irrelevant default arguments
can be included in the type as printed if we've lost the type sugar.
- Add verbose logging of payloads
- Add public logging of request summaries
- fix non-logging of messages in request scopes (oops!)
- add test for public/non-public logging, extending pipeline_helper a bit.
We've accumulated quite a lot of duplication in the request handlers by now.
I should factor that out, but not in this patch...
Differential Revision: https://reviews.llvm.org/D90654
Operand tree forwarding can cause the change of an access kind; in
particular change from a scalar kind to an array kind if the scalar
dependency is not necessary. Such an access cannot and doesn't need to
be forwarded anymore.
Fixes llvm.org/PR48034
Print to dbgs() any taken action.
Also, read-only scalars do not require any action unless
-polly-analyze-read-only-scalars=true is used. Better refect this by
using ForwardingAction::triviallyForwardable and thus not bumping the
statistics.
Avoid requiring an actual MemoryBuffer in ComputePreambleBounds, when
a MemoryBufferRef will do just fine.
Differential Revision: https://reviews.llvm.org/D90890
Previously a corrupted index shard could cause us to resize arrays to an
arbitrary int32. This tends to be a huge number, and can render the
system unresponsive.
Instead, cap this at the amount of data that might reasonably be read
(e.g. the #bytes in the file). If the specified length is more than that,
assume the data is corrupt.
Differential Revision: https://reviews.llvm.org/D91258
This has been a long-standing TODO item, however we have now been requiring
a monorepo layout to build libc++ and libc++abi for a while now. Hence,
we can fix this code duplication issue now.
Note that it's still not super pretty to reach into libc++ to include
headers, but it's better than having duplicated code which can get out
of sync.
Avoid a spurious error message about a dummy procedure reference
in a specification expression by restructuring the handling of
use-associated and host-associated symbols.
Updated to fix a circular dependence between shared library
binaries that was introduced by the original patch.
Differential revision: https://reviews.llvm.org/D91286
Previously the inliner did a bit of a hack by adding ref edges for all
new edges introduced by performing an inline before calling
updateCGAndAnalysisManagerForPass(). This was because
updateCGAndAnalysisManagerForPass() didn't handle new non-trivial call
edges.
This adds handling of non-trivial call edges to
updateCGAndAnalysisManagerForPass(). The inliner called
updateCGAndAnalysisManagerForFunctionPass() since it was handling adding
newly introduced edges (so updateCGAndAnalysisManagerForPass() would
only have to handle promotion), but now it needs to call
updateCGAndAnalysisManagerForCGSCCPass() since
updateCGAndAnalysisManagerForPass() is now handling the new call edges
and function passes cannot add new edges.
We follow the previous path of adding trivial ref edges then letting promotion
handle changing the ref edges to call edges and the CGSCC updates. So
this still does not allow adding call edges that result in an addition
of a non-trivial ref edge.
This is in preparation for better detecting devirtualization. Previously
since the inliner itself would add ref edges,
updateCGAndAnalysisManagerForPass() would think that promotion and thus
devirtualization had happened after any sort of inlining.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D91046
The Itanium CXX ABI grammer has been extended to support parameterized
vendor extended types [1].
This patch updates Clang's mangling for matrix types to use the new
extension.
[1] b359d28971
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D91253
Some changes were made to the libc++abi new/delete definitions, but
they were not copied back to the libc++ definition. It sucks that we
have this duplication, but for now at least let's keep them in sync.
When there is full fp16 support, there is no reason to widen 16-bit
G_FCONSTANTs to 32 bits. Mark them as legal in this case.
Also, we currently import a pattern for materializing a 16-bit 0.0.
Add a testcase showing we select it.
(All other 16-bit G_FCONSTANTS are not yet selected.)
Differential Revision: https://reviews.llvm.org/D89164
Support a vector register constraint in inline asm of clang.
Add a regression test also.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D91251
F18 clause 5.3.3 explicitly allows labels on program unit END statements.
Label resolution code accounts for this for singleton program units,
but incorrectly generates an error for host subprograms with internal
subprograms.
subroutine s(n)
call s1(n)
if (n == 0) goto 88 ! incorrect error
print*, 's'
contains
subroutine s1(n)
if (n == 0) goto 77 ! ok
print*, 's1'
77 end subroutine s1
88 end
Label resolution code makes a sequential pass over an entire file to
collect label information for all subprograms, followed by a pass through
that information for semantics checks. The problem is that END statements
may be separated from prior subprogram code by internal subprogram
definitions, so an END label can be associated with the wrong subprogram.
There are several ways to fix this. Labels are always local to a
subprogram. So the two separate passes over the entire file could probably
instead be interleaved to perform analysis on a subprogram as soon as the
end of the subprogram is reached, using a small stack. The stack structure
would account for the "split" code case. This might work.
It is possible that there is some not otherwise apparent advantage to
the current full-file pass design. The parse tree has productions that
provide access to a subprogram END statement "in advance". An alternative
is to access this information to solve the problem. This PR implements
this latter option.
Differential revision: https://reviews.llvm.org/D91217
Before the change, DFSan always does the propagation. W/o
origin tracking, it is harder to understand such flows. After
the change, the flag is off by default.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D91234