Summary:
This patch starts adding support for adding information dumps to libomptarget
and rtl plugins. The information printing is controlled by the
LIBOMPTARGET_INFO environment variable introduced in D86483. The goal of this
patch is to provide the user with additional information about the device
during kernel execution and providing the user with information dumps in the
case of failure. This patch added the ability to dump the pointer mapping table
as well as printing the number of blocks and threads in the cuda RTL.
Reviewers: jdoerfort gkistanova ye-luo
Subscribers: guansong openmp-commits sstefan1 yaxunl ye-luo
Tags: #OpenMP
Differential Revision: https://reviews.llvm.org/D87165
This commit specifies reduction dimensions for ConvOps. This prevents
running reduction loops in parallel and enables easier detection of kernel dimensions
which we will need later on.
Differential Revision: https://reviews.llvm.org/D87288
In standard C library, both rint and nearbyint returns rounding result
in current rounding mode. But nearbyint never raises inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.
One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87220
Since we always generate CopyToRegs for statepoint results,
we must update DAG root after emitting statepoint, so that
these copies are scheduled before any possible local uses.
Note: getControlRoot() flushes all PendingExports, not only
those we generates for relocates. If that'll become a problem,
we can change it to flushing relocate exports only.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D87251
Consistently use the same pattern of returning *this from the clearUnusedBits() call to allow us to early out from the isSingleWord() path and avoid an else statement.
If the debug section's name isn't recognized, it should be
dumped as a raw content section.
Reviewed By: jhenderson, grimar
Differential Revision: https://reviews.llvm.org/D87346
For out of tree builds, the user generally needs to specify LLVM_DIR and
MLIR_DIR on the command line so that the correct LLVM and MLIR
installations are picked up.
If the provided paths are absolute, everything works fine, however for
buildbots it is customary to work with relative paths, and that makes it
difficult for CMake to find the right modules to include.
This patch changes CMakeLists.txt to convert LLVM_DIR and MLIR_DIR to
absolute paths before adding them to CMAKE_MODULE_PATH. The inputs are
assumed to be relative to the source directory (llvm-project/flang).
Differential Revision: https://reviews.llvm.org/D87083
This commit introduces end-to-end integration tests for
convolutions that test multiple ways of ConvOps lowering.
Differential Revision: https://reviews.llvm.org/D87277
If a function had at most one return block, the pass would return false
regardless if an unified unreachable block was created.
This patch fixes that by refactoring runOnFunction into two separate
helper functions for handling the unreachable blocks respectively the
return blocks, as suggested by @bjope in a review comment.
This was caught using the check introduced by D80916.
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D85818
This patch follows D85345 and adds more noundef attributes to return values/arguments of library functions
that are mostly about accessing the file system or processes.
A few functions like `chmod` or `times` use typedef `mode_t` and `clock_t`.
They are neither struct nor union, so they cannot contain undef even if they're lowered to iN in IR. So, it is fine to add noundef to them.
- clock_t's actual type is size_t (C17, 7.27.1.3), so it isn't struct or union.
- For mode_t, either int or long is used in practice because programmers use bit manipulation. So, I think it is okay that it's never aggregate in practice.
After this patch, the remaining library functions are those that eagerly participate in optimizations: they can be removed, reordered, or
introduced by a transformation from primitive IR operations.
For them, a few testings is needed, since it may not be valid to add noundef anymore even if C standard says it's okay.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85894
This patch adds isGuaranteedNotToBePoison and programUndefinedIfUndefOrPoison.
isGuaranteedNotToBePoison will be used at D75808. The latter function is used at isGuaranteedNotToBePoison.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84242
Some constructors of IEEEFloat do not initialize member variable exponent.
Fix it by initializing exponent with the following values:
For NaNs, the `exponent` is `maxExponent+1`.
For Infinities, the `exponent` is `maxExponent+1`.
For Zeroes, the `exponent` is `maxExponent-1`.
Patch by: @nullptr.cpp (Yang Fan)
Differential Revision: https://reviews.llvm.org/D86997
Currentl DomTreeNodeBase is using std::vectot to store it's children.
Using SmallVector should be more efficient in terms of compile-time.
A size of 4 seems to be the sweet-spot in terms of compile-time,
according to
http://llvm-compile-time-tracker.com/compare.php?from=9933188c90615c9c264ebb69117f09726e909a25&to=d7a801d027648877b20f0e00e822a7a64c58d976&stat=instructions
This results in the following geomean improvements
```
geomean insts max rss
O3 -0.31 % +0.02 %
ReleaseThinLTO -0.35 % -0.12 %
ReleaseLTO -0.28 % -0.12 %
O0 -0.06 % -0.02 %
NewPM O3 -0.36 % +0.05 %
ReleaseThinLTO (link only) -0.44 % -0.10 %
ReleaseLTO-g (link only): -0.32 % -0.03 %
```
I am not sure if there's any other benefits of using std::vector over
SmallVector.
Reviewed By: kuhar, asbirlea
Differential Revision: https://reviews.llvm.org/D87319
Add subtarget feature check to avoid using ds_read/write_b96/128 with too
low alignment if a bug is present on that specific hardware.
Add this "feature" to GFX 10.1.1 as it is also affected.
Add global-isel test.
The current BufferPlacement transformation cannot handle loops properly. Buffers
passed via backedges will not be freed automatically introducing memory leaks.
This CL adds support for loops to overcome these limitations.
Differential Revision: https://reviews.llvm.org/D85513
This adds support for substituting std::pair instantiations with enabled
import-std-module.
With the fixes in parent revisions we can currently substitute a single pair
(however, a result that returns a second pair currently causes LLDB to crash
while importing the second template instantiation).
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D85141
The ASTImporter has an `Imported(From, To)` callback that notifies subclasses
that a declaration has been imported in some way. LLDB uses this in the
`CompleteTagDeclsScope` to see which records have been imported into the scratch
context. If the record was declared inside the expression, then the
`CompleteTagDeclsScope` will forcibly import the full definition of that record
to the scratch context so that the expression AST can safely be disposed later
(otherwise we might end up going back to the deleted AST to complete the
minimally imported record). The way this is implemented is that there is a list
of decls that need to be imported (`m_decls_to_complete`) and we keep completing
the declarations inside that list until the list is empty. Every `To` Decl we
get via the `Imported` callback will be added to the list of Decls to be
completed.
There are some situations where the ASTImporter will actually give us two
`Imported` calls with the same `To` Decl. One way where this happens is if the
ASTImporter decides to merge an imported definition into an already imported
one. Another way is that the ASTImporter just happens to get two calls to
`ASTImporter::Import` for the same Decl. This for example happens when importing
the DeclContext of a Decl requires importing the Decl itself, such as when
importing a RecordDecl that was declared inside a function.
The bug addressed in this patch is that when we end up getting two `Imported`
calls for the same `To` Decl, then we would crash in the
`CompleteTagDeclsScope`. That's because the first time we complete the Decl we
remove the Origin tracking information (that maps the Decl back to from where it
came from). The next time we try to complete the same `To` Decl the Origin
tracking information is gone and we hit the `to_context_md->getOrigin(decl).ctx
== m_src_ctx` assert (`getOrigin(decl).ctx` is a nullptr the second time as the
Origin was deleted).
This is actually a regression coming from D72495. Before D72495
`m_decls_to_complete` was actually a set so every declaration in there could
only be queued once to be completed. The set was changed to a vector to make the
iteration over it deterministic, but that also causes that we now potentially
end up trying to complete a Decl twice.
This patch essentially just reverts D72495 and makes the `CompleteTagDeclsScope`
use a SetVector for the list of declarations to be completed. The SetVector
should filter out the duplicates (as the original `set` did) and also ensure that
the completion order is deterministic. I actually couldn't find any way to cause
LLDB to reproduce this bug by merging declarations (this would require that we
for example declare two namespaces in a non-top-level expression which isn't
possible). But the bug reproduces very easily by just declaring a class in an
expression, so that's what the test is doing.
Reviewed By: shafik
Differential Revision: https://reviews.llvm.org/D85648
The MemorySSAWrapperPass depends on AAResultsWrapperPass and if
MemorySSA is preserved but AAResultsWrapperPass is not, this could lead
to a crash when updating the last user of the MemorySSAWrapperPass.
Alternatively AAResultsWrapperPass could be marked preserved by GVN, but
I am not sure if that would be safe. I am not sure what is required in
order to preserve AAResultsWrapperPass. At the moment, it seems like a
couple of passes that do similar transforms to GVN are preserving it.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87137
SemaSourceWithPriorities is a special SemaSource that wraps our normal LLDB
ExternalASTSource and the ASTReader (which is used for the C++ module loading).
It's only active when the `import-std-module` setting is turned on.
The `CompleteType` function there in `SemaSourceWithPriorities` is looping over
all ExternalASTSources and asks each to complete the type. However, that loop is
in another loop that keeps doing that until the type is complete. If that
function is ever called on a type that is a forward decl then that causes LLDB
to go into an infinite loop.
I remember I added that second loop and the comment because I thought I saw a
similar pattern in some other Clang code, but after some grepping I can't find
that code anywhere and it seems the rest of the code base only calls
CompleteType once (It would also be kinda silly to have calling it multiple
times). So it seems that's just a silly mistake.
The is implicitly tested by importing `std::pair`, but I also added a simpler
dedicated test that creates a dummy libc++ module with some forward declarations
and then imports them into the scratch AST context. At some point the
ASTImporter will check if one of the forward decls could be completed by the
ExternalASTSource, which will cause the `SemaSourceWithPriorities` to go into an
infinite loop once it receives the `CompleteType` call.
Reviewed By: shafik
Differential Revision: https://reviews.llvm.org/D87289
Take advantage of the new `dynamic_tensor_from_elements` operation in `std`.
Instead of stack-allocated memory, we can now lower directly to a single `std`
operation.
Differential Revision: https://reviews.llvm.org/D86935
The rewrite engine's cost model may determine some patterns to be irrelevant
ahead of their application. These patterns were silently ignored previously and
now cause a message in `--debug` mode.
Differential Revision: https://reviews.llvm.org/D87290
Current code in InstEmitter assumes all GC pointers are either
VRegs or stack slots - hence, taking only one operand.
But it is possible to have constant base, in which case it
occupies two machine operands.
Add a convinience function to StackMaps to get index of next
meta argument and use it in InsrEmitter to properly advance to
the next statepoint meta operand.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D87252
We really want to try and avoid spilling P0, which can be difficult
since there's only one register, so try to rematerialize any VCTP
instructions.
Differential Revision: https://reviews.llvm.org/D87280
This commit cleans up the ::initialize method of various AAs in the
following ways:
- If an associated function is required, give up on declarations.
This was discovered as a real problem when lots of llvm.dbg.XXX
call sites were assumed `noreturn` until proven otherwise. That
does not make any sense and caused huge regressions and missed
deductions.
- Require more associated declarations for function interface AAs.
- Use the IRAttribute::initialize to determine if function interface
AAs can be used in IPO, don't replicate the checks (especially
isFunctionIPOAmendable) all over the place. Arguably the function
declaration check should be moved to some central place to.