This requires a minor change to CodeGenPrepare to ensure that
shouldSinkOperands will be called for And.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D110106
Trying to use the DIA SDK reader only to fail with "DIA SDK wasn't enabled"
isn't very useful. The native PDB reader is missing some stuff, but it's still
better than nothing.
Reduces number of lldb-check-shell test failures with LLVM_ENABLE_DIA_SDK=NO
from 27 to 15.
Differential Revision: https://reviews.llvm.org/D110172
This implements the generic std.format.spec framework for all types.
The Unicode support will be added in a separate patch.
Implements parts of:
- P0645 Text Formatting
Completes:
- LWG-3242 std::format: missing rules for arg-id in width and precision
- P1892 Extended locale-specific presentation specifiers for std::format
Reviewed By: #libc, ldionne, vitaut
Differential Revision: https://reviews.llvm.org/D103368
We already have pow(x, y) * pow(x, z) -> pow(x, y + z) transformation, but we are missing same transformation for powi (power is integer).
Requires reassoc.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D109954
This patch changes the signature of the load and store vector pair
builtins to match their documentation. The type of the `signed long long`
argument is changed to `signed long`. This patch also changes existing testcases
to match the signature change.
Reviewed By: lei, Conanap
Differential Revision: https://reviews.llvm.org/D109996
The current code writes the pc-table at the process startup,
which may happen before the common_flags() are initialized.
Move writing to the process end.
This is consistent with how we write the counters and avoids the problem with the uninitalized flags.
Add prints if verbosity>=1.
Reviewed By: kostik
Differential Revision: https://reviews.llvm.org/D110119
isValidAssumeForContext can provide better results with access to the
dominator tree in some cases. This patch adjusts computeConstantRange to
allow passing through a dominator tree.
The use VectorCombine is updated to pass through the DT to enable
additional scalarization.
Note that similar APIs like computeKnownBits already accept optional dominator
tree arguments.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D110175
When using instructions which have a MetadataAsValue argument
(e.g. some target-specific intrinsics) MD canonicalization strips
internal MDNodes with a single ConstantAsMetadata child. That
prevented IRTranslator from the proper translation of such a calls.
Compute the tiled producer slice dimensions directly starting from the consumer not using the producer at all.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110147
Add a helper method to check if an index vector contains a permutation of its indices. Additionally, refactor applyPermutationToVector to take int64_t.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110135
It was previously assumed that tensor.insert_slice should be bufferized first in a greedy fashion to avoid out-of-place bufferization of the large tensor. This heuristic does not hold upon further inspection.
This CL removes the special handling of such ops and adds a test that exhibits better behavior and appears in real use cases.
The only test adversely affected is an artificial test which results in a returned memref: this pattern is not allowed by comprehensive bufferization in real scenarios anyway and the offending test is deleted.
Differential Revision: https://reviews.llvm.org/D110072
Previously, comprehensive bufferize would consider all aliasing reads and writes to
the result buffer and matching operand. This resulted in spurious dependences
being considered and resulted in too many unnecessary copies.
Instead, this revision revisits the gathering of read and write alias sets.
This results in fewer alloc and copies.
An exhaustive test cases is added that considers all possible permutations of
`matmul(extract_slice(fill), extract_slice(fill), ...)`.
Optimize (add (mul x, c0), c1) -> (ADDI (MUL (ADDI, c1/c0), c0), c1%c0),
if c1/c0 and c1%c0 are simm12, while c1 is not.
Optimize (add (mul x, c0), c1) -> (MUL (ADDI, c1/c0), c0),
if c1%c0 is zero, and c1/c0 is simm12 while c1 is not.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D108607
Helper function `getDefaultOpenCLPointeeAddrSpace()` introduced to
`ASTContext` class. It returns default OpenCL address space
depending on language version and enabled features. If generic
address space is supported, the helper function returns value
`LangAS::opencl_generic`. Otherwise, value `LangAS::opencl_private`
is returned. Code refactoring changes performed in several suitable
places.
Differential Revision: https://reviews.llvm.org/D109874
This patch allows sinking an instruction which can have multiple uses in a
single user. We were previously over-restrictive by looking for exactly one use,
rather than one user.
Also added an API for retrieving a unique undroppable user.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D109700
Add documentation of unbundling of heterogeneous device archives to
create device specific archives, as introduced by D93525. Also, add
documentation for supported text file formats.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D110083
The updated list is based on the output of
cmake -G Ninja -S llvm -B build -DLLVM_ENABLE_PROJECTS='foo'.
Differential Revision: https://reviews.llvm.org/D110124
One of the two inputs of the Shufflevector is often a placeholder.
Previously, there were cases where the placeholder was undef, and there were cases where it was poison.
I added these constructors to create a placeholder consistently.
Changing to use the newly added constructor will be written in a separate patch.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D110146
SystemZ adds the EXRL target instructions in the end of each file. This must
be done before debug info emission since that may end the text section, and
therefore this is now done in emitConstantPools() (instead of in
emitEndOfAsmFile).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D109513
Currently we set thr->tctx after OnStarted callback
taking thread registry mutex again and searching for the context.
But OnStarted already runs under the thread registry mutex
and has access to the context, so set it in the OnStarted.
This makes code simpler and faster.
Depends on D110132.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110133
Thread state functions are split into 2 parts:
tsan entry function (e.g. ThreadStart) and thread registry
state change callback (e.g. OnStart). Currently these
pairs of functions are located far from each other and
in reverse order. This makes it hard to read and follow the logic.
Reorder the code so that OnFoo directly follows ThreadFoo.
No other code changes.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110132
Some of the DPrintf's currently produce -Wformat warnings if enabled.
Fix these format strings.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D110131
FMA_W_CHAIN is used when lowering fdiv f32. Prefer to select it to fmac
if there are no source modifiers, just like we do for other mad/mac and
fma/fmac cases.
Differential Revision: https://reviews.llvm.org/D110074
v_fmac with source modifiers forces VOP3 encoding, but it is strictly
better to use the VOP3-only v_fma instead, because $dst and $src2 are
not tied so it gives the register allocator more freedom and avoids a
copy in some cases.
This is the same strategy we already use for v_mad vs v_mac and
v_fma_legacy vs v_fmac_legacy.
Differential Revision: https://reviews.llvm.org/D110070
Add generic helper function that matches constant splat. It has option to
match constant splat with undef (some elements can be undef but not all).
Add util function and matcher for G_FCONSTANT splat.
Differential Revision: https://reviews.llvm.org/D104410
The logic in howManyLessThans is fishy. It first checks invariance of
RHS, and then uses OrigRHS as argument for isLoopEntryGuardedByCond, which
is, strictly saying, a different thing. We are seeing a very rare intermittent
failure of availability checks, and it looks like this precondition is
sometimes broken. Before we can figure out what's going on, adding asserts
that all involved values that may possibly to to isLoopEntryGuardedByCond
are available at loop entry.
If either of these asserts fails (OrigRHS is the most likely suspect), it
means that the logic here is flawed.
This fixes PR51730, a heap-use-after-free bug in
replaceConditionalBranchesOnConstant().
With the attached reproducer we were left with a function looking
something like this after replaceAndRecursivelySimplify():
[...]
cont2.i:
br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i
handler.type_mismatch3.i:
%3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ]
unreachable
cont4.i:
unreachable
[...]
with both the branch instruction and PHI node being in the worklist. As
a result of replacing the branch instruction with an unconditional
branch, the PHI node in %handler.type_mismatch3.i would be removed. This
then resulted in a heap-use-after-free bug due to accessing that removed
PHI node in the next worklist iteration.
This is solved by using a value handle worklist. I am a unsure if this
is the most idiomatic solution. Another solution could have been to
produce a worklist just containing the interesting branch instructions,
but I thought that it perhaps was a bit cleaner to keep all worklist
filtering in the loop that does the rewrites.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D109221