Parametrize the cache file with TARGET_TRIPLE parameter. Normalize
the target triple to follow the runtime library installation directory.
Explicity enable LLVM_ENABLE_PER_TARGET_RUNTIME_DIR option.
Original interfaces are not safe to be called during dialect conversion.
This is because some ops (e.g. `dynamic_reshape(input, target_shape)`)
depend on the values of their operands to calculate the output shape.
However the operands may be out of reach during dialect conversion (e.g.
converting from tensor world to buffer world). This patch provides a new
kind of interface which accpets user-provided operands to solve this
problem.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D102317
"[mlir] Speed up Lexer::getEncodedSourceLocation"
This reverts commit 3043be9d2d and commit
861d69a525.
This change resulted in printing textual MLIR that can't be parsed; see
review thread https://reviews.llvm.org/D102567 for details.
Summary:
The OpenMP runtime functions don't always provide unique thread ID's to
determine if a basic block is truly single-threaded. Change the implementation
to only check NVPTX intrinsics for now.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102700
This patch transforms the sequence
lea (reg1, reg2), reg3
sub reg3, reg4
to two sub instructions
sub reg1, reg4
sub reg2, reg4
Similar optimization can also be applied to LEA/ADD sequence.
The modifications to TwoAddressInstructionPass is to ensure the operands of ADD
instruction has expected order (the dest register of LEA should be src register
of ADD).
Differential Revision: https://reviews.llvm.org/D101970
At present, a lot of code contains main function bodies like "return failed(mlir::MlirOptMain(...);". This is unfortunate for two reasons: a) it uses ADL, which is maybe not what the free "failed" function was designed for; and b) it is a bit awkward to read, requring the reader to both understand the boolean nature of the value and the semantics of main's return value. (And it's also not portable, since 1 is not a portable success value.)
The replacement code, `return mlir::AsMainReturnCode(mlir::MlirOptMain(...))` is a bit more self-explanatory.
The change applies the new function to a few internal uses of MlirOptMain, too.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D102641
We currently use SourceMgr::getLineAndColumn to get the line and column for an SMLoc, but this includes a call to StringRef::find_last_of that ends up dominating compile time. In D102567, we start creating locations from the input file for block arguments which resulted in an extreme performance regression for modules with very large amounts of block arguments. This revision switches to just using a pointer offset from the beginning of the line to calculate the column(all MLIR files are simple ascii), resulting in a compile time reduction from 4700 seconds (1 hour and 18 minutes) to 8 seconds.
Differential Revision: https://reviews.llvm.org/D102734
Add this attribute to some types to ensure that they have
debug info.
The debug info for these classes are required for debuggers to display
some STL types. With constructor homing (a new debug info optimization)
their debug info isn't emitted because their constructors are never
called.
The list of types with the attribute added are __hash_value_type,
__value_type, __tree_node_base, __tree_node, __hash_node, __list_node,
and __forward_list_node.
Differential Revision: https://reviews.llvm.org/D98750
Fix __bitop_unsigned_integer and rename to __libcpp_is_unsigned_integer.
There are only five unsigned integer types, so we should just list them out.
Also provide `__libcpp_is_signed_integer`, even though the Standard doesn't
consume that trait anywhere yet.
Notice that `concept uniform_random_bit_generator` is specifically specified
to rely on `concept unsigned_integral` and *not* `__is_unsigned_integer`.
Instantiating `std::ranges::sample` with a type `U` satisfying
`uniform_random_bit_generator` where `unsigned_integral<U::result_type>`
and not `__is_unsigned_integer<U::result_type>` is simply IFNDR.
Orthogonally, fix an undefined behavior in std::countr_zero(__uint128_t).
Orthogonally, improve tests for the <bit> manipulation functions.
It was these new tests that detected the bug in countr_zero.
Differential Revision: https://reviews.llvm.org/D102328
This patch implements first part of Flow Sensitive SampleFDO (FSAFDO).
It has the following changes:
(1) disable current discriminator encoding scheme,
(2) new hierarchical discriminator for FSAFDO.
For this patch, option "-enable-fs-discriminator=true" turns on the new
functionality. Option "-enable-fs-discriminator=false" (the default)
keeps the current SampleFDO behavior. When the fs-discriminator is
enabled, we insert a flag variable, namely, llvm_fs_discriminator, to
the object. This symbol will checked by create_llvm_prof tool, and used
to generate a profile with FS-AFDO discriminators enabled. If this
happens, for an extbinary format profile, create_llvm_prof tool
will add a flag to profile summary section.
Differential Revision: https://reviews.llvm.org/D102246
Revert recent commit to require x86-registered-target (e4b790c5e3).
Remove -O1 from the run lines so they are less dependent on backend passes.
Update the CHECK6 and CHECK10 lines with script.
Differential Revision: https://reviews.llvm.org/D102720
In many cases it is helpful to know at what address the resolved function starts.
This patch adds a new StartAddress member to the DILineInfo structure.
Reviewed By: jhenderson, dblaikie
Differential Revision: https://reviews.llvm.org/D102316
X86 NaCl generally requires the stack to be aligned to 16 bytes.
This change was already implemented in two downstream NaCl compilers
based on llvm.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D102610
While developing a change to the allocator I ended up breaking
realloc on secondary allocations with increasing sizes. That didn't
cause any of the unit tests to fail, which indicated that we're
missing some test coverage here. Add a unit test for that case.
Differential Revision: https://reviews.llvm.org/D102716
This is a hook that allows for providing custom initialization of the pattern, e.g. if it has bounded recursion, setting the debug name, etc., without needing to define a custom constructor. A non-virtual hook was chosen to avoid polluting the vtable with code that we really just want to be inlined when constructing the pattern. The alternative to this would be to just define a constructor for each pattern, this unfortunately creates a lot of otherwise unnecessary boiler plate for a lot of patterns and a hook provides a much simpler/cleaner interface for the very common case.
Differential Revision: https://reviews.llvm.org/D102440
We currently do not document how the pattern rewriter infra treats recursion when it gets detected. This revision adds a blurb on recursion in patterns, and how patterns can signal that they are equipped to handle it.
Differential Revision: https://reviews.llvm.org/D102439
For opaque pointers, we're trying to avoid uses of
PointerType::getElementType().
A couple of ISel places use PointerType::getElementType(). Some of these
are easy to fix by using ArgListEntry's indirect types.
The inalloca type wasn't stored there, as opposed to preallocated and
byval which have their indirect types available, so add it and use it.
This is a reland after an MSan fix in D102667.
Differential Revision: https://reviews.llvm.org/D101713
This patch adds the XPLINK64 calling convention to the SystemZ
backend. It specifies and implements the argument passing and
return value conventions.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D101010
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.
SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D102032
Don't check that types match when the pointer operand is an opaque
pointer.
I would separate the Assembler and Verifier changes, but
verify-uselistorder in the Assembler test ends up running the verifier.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102450
We already use -O0 for unittests under ThinLTO, do the same for full LTO
where the compile time costs to runtime benefits tradeoff is even worse.
Differential Revision: https://reviews.llvm.org/D102718
Handle PDB writing errors like any other error in LLD: emit an error and
continue. This allows the linker to print timing data and summary data
after linking, which can be helpful for finding PDB size problems. Also
report how large the file would have been.
Example output:
lld-link: error: Output data is larger than 4 GiB. File size would have been 6,937,108,480
lld-link: error: failed to write PDB file ./chrome.dll.pdb
Summary
--------------------------------------------------------------------------------
33282 Input OBJ files (expanded from all cmd-line inputs)
4 PDB type server dependencies
0 Precomp OBJ dependencies
33396931 Input type records
... snip ...
Input File Reading: 59756 ms ( 45.5%)
GC: 7500 ms ( 5.7%)
ICF: 3336 ms ( 2.5%)
Code Layout: 6329 ms ( 4.8%)
PDB Emission (Cumulative): 46192 ms ( 35.2%)
Add Objects: 27609 ms ( 21.0%)
Type Merging: 16740 ms ( 12.8%)
Symbol Merging: 10761 ms ( 8.2%)
Publics Stream Layout: 9383 ms ( 7.1%)
TPI Stream Layout: 1678 ms ( 1.3%)
Commit to Disk: 3461 ms ( 2.6%)
--------------------------------------------------
Total Link Time: 131244 ms (100.0%)
Differential Revision: https://reviews.llvm.org/D102713
The version is used by LSP clients to ignore stale diagnostics, and can be used in a followup to help verify incremental changes.
Differential Revision: https://reviews.llvm.org/D102644
D101838 incorrectly handled indices vectors of the same size but with higher element counts to just bitcast to the target indices type instead of performing a ZERO_EXTEND_VECTOR_INREG
In cases where -fno-integrated-as is specified we should overwrite the
EmitAssembly action as well.
We also should rely on the target triple from the process at least until we
implement out-of-process execution.
This patch should improve clang-repl on AIX.
Discussion available at: https://reviews.llvm.org/D96033
Differential revision: https://reviews.llvm.org/D102688
This patch introduces functionality used by BOLT when
re-linking the final binary. It adds to MemoryManager a new member
function allowStubAllocation to control whether this MemoryManager
supports increasing code size with stubs or not. Since BOLT can
rewrite some files in-place, it needs to avoid stub insertion done
by the linker. This patch also introduces allowsZeroSymbols to the
JITSymbolResolver class, enabling us to finish a link successfully
even when some symbols resolve to the value zero. When rewriting a
binary, sometimes we do need to resolve a target to zero in case
the input binary calls address zero and we want to be bug
compatible. We also expose reassignSectionAddress as it is used by
BOLT.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97898
The MaybePromotable set keeps track of loads/stores for which
promotion was not attempted yet. Normally, any load/stores that
are promoted in the current iteration will be removed from this
set, because they naturally MustAlias with the promoted value.
However, if the source program has UB with metadata claiming that
a store is NoAlias, while it is actually MustAlias, and multiple
different pointers are promoted in the same iteration, it can
happen that a store is removed that is still in the MaybePromotable
set, causing a use-after-free.
While this could be fixed by explicitly invalidating values in
MaybePromotable in the LoopPromoter, I'm going with the more
radical option of dropping the set entirely here and check all
load/stores on each promotion iteration. As promotion, and especially
repeated promotion, are quite rare, this doesn't seem to have any
impact on compile-time.
Fixes https://bugs.llvm.org/show_bug.cgi?id=50367.
Define an API for the transformational intrinsic function MATMUL,
implement it, and add some basic unit tests. The large number of
possible argument type combinations are covered by a set of
generalized templates that are instantiated for each valid
pair of possible argument types.
Places where BLAS-2/3 routines could be called for acceleration
are marked with TODOs. Handling for other special cases (e.g.,
known-shape 3x3 matrices and vectors) are deferred.
Some minor tweaks were made to the recent related implementation
of DOT_PRODUCT to reflect lessons learned.
Differential Revision: https://reviews.llvm.org/D102652
This is a GNU as and Clang cc1as option, not a GCC option.
Users should specify `-Wa,-mimplicit-it=` instead.
Note: mixing the -m option and the -Wa, option doesn't work
`-Wa,-mimplicit-it=never -mimplicit-it=always` =>
`clang (LLVM option parsing): for the --arm-implicit-it option: may only occur zero or one times!`
Reviewed By: nickdesaulniers, raj.khem
Differential Revision: https://reviews.llvm.org/D102568