Lower !gpu.async.tokens returned from async.execute regions to events instead of streams.
Make !gpu.async.token returned from !async.execute single-use.
This allows creating one event per use and destroying them without leaking or ref-counting.
Technically we only need this for stream/event-based lowering. I kept the code separate
from the rest of the gpu-async-region pass so that we can make this optional or move
to a separate pass as needed.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D96965
This patch extends support for our custom-lowering of scalable-vector
truncates to include those of fixed-length vectors. It does this by
co-opting the custom RISCVISD::TRUNCATE_VECTOR node and adding mask and
VL operands. This avoids unnecessary duplication of patterns and
inflation of the ISel table.
Some truncates go through CONCAT_VECTORS which currently isn't
efficiently handled, as it goes through the stack. This can be improved
upon in the future.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97202
This patch adds support for the custom lowering sign- and zero-extension
of fixed-length vector types. It does so through custom nodes. Since the
source and destination types are (necessarily) of different sizes, it is
possible that the source type is legal whilst the larger destination
type isn't. In this case the legalization makes heavy use of
EXTRACT_SUBVECTOR.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97194
This patch unifies the two disparate paths for lowering
EXTRACT_SUBVECTOR operations under one roof. Consequently, with this
patch it is possible to support any fixed-length subvector extraction,
not just "cast-like" ones.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97192
Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D88287
We should be able to extend this "canonicalizeShuffleWithBinOps" to handle more generic binop cases where either/both operands can be cheaply shuffled.
Allow users to use a non-system version of perl, python and awk, which is useful
in certain package managers.
Reviewed By: JDevlieghere, MaskRay
Differential Revision: https://reviews.llvm.org/D95119
When calling SelectionDAG::getNode() to create an ADD or SUB
of two vectors with i1 element types we can canonicalise this
to use XOR instead, where 1+1 is treated as wrapping around
to 0 and 0-1 wraps to 1.
I've added the following tests for SVE targets:
CodeGen/AArch64/sve-pred-arith.ll
and modified some X86 tests to reflect the much simpler codegen
required.
Differential Revision: https://reviews.llvm.org/D97276
Some people are using alternative address spaces to track GC data, but
otherwise they behave exactly the same. This is the only place in the backend
we even try to care about it so it's really not achieving anything.
Finally, this patch moves from round-tripping one `CompilerInvocation` at a time to round-tripping the invocation as a whole.
This patch includes only the code required to make round-tripping the whole invocation work. More cleanups will be done in a follow-up patch.
Depends on D96847, D97041 & D97042.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D96280
After a revision of D96274 changed `DiagnosticOptions` to not store all remark arguments **as-written**, it is no longer possible to reconstruct the arguments accurately from the class.
This is caused by the fact that for `-Rpass=regexp` and friends, `DiagnosticOptions` store only the group name `pass` and not `regexp`. This is the same representation used for the plain `-Rpass` argument.
Note that each argument must be generated exactly once in `CompilerInvocation::generateCC1CommandLine`, otherwise each subsequent call would produce more arguments than the previous one. Currently this works out because of the way `RoundTrip` splits the responsibilities for certain arguments based on what arguments were queried during parsing. However, this invariant breaks when we move to single round-trip for the whole `CompilerInvocation`.
This patch ensures that for one `-Rpass=regexp` argument, we don't generate two arguments (`-Rpass` from `DiagnosticOptions` and `-Rpass=regexp` from `CodeGenOptions`) by shifting the responsibility for handling both cases to `CodeGenOptions`. To distinguish between the cases correctly, additional information is stored in `CodeGenOptions`.
The `CodeGenOptions` parser of `-Rpass[=regexp]` arguments also looks at `-Rno-pass` and `-R[no-]everything`, which is necessary for generating the correct argument regardless of the ordering of `CodeGenOptions`/`DiagnosticOptions` parsing/generation.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D96847
We can now express all marshalling semantics in `Opt{In,Out}FFlag` via `BoolFOption`.
This patch moves remaining `Opt{In,Out}FFlag` instances using marshalling to `BoolFOption` and removes marshalling capabilities from `Opt{In,Out}FFlag` entirely.
This simplifies the decisions developers have to make when creating new boolean options:
* For simple cc1 flag pairs, use `Bool{,F,G}Option`.
* For cc1 flag pairs that require complex marshalling logic, use `Opt{In,Out}FFlag` and implement marshalling manually.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D97370
This patch introduces a tablegen multiclass called `MarshallingInfoEnum`. It has the same semantics as `MarshallingInfoString` had in combination with `AutoNormalizeEnum`, but it's easier to use and follows the convention used for other `MarshallingInfoXxx` multiclasses.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D97375
This fixes the documentation emitted for type parameters. Also adds a
missing empty line, rendered as line break in mark down.
Co-authored-by: Simon Camphausen <simon.camphausen@iml.fraunhofer.de>
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D97267
Fix the logic to find the app bundle in a path by correctly accounting
for paths containing multiple occurrences of `.app`. The new logic will
correctly extract `com.app.Foo.app` from `com.app.Foo.app/com.app.Foo`.
rdar://74666208
Differential revision: https://reviews.llvm.org/D97441
There are two preconditions to reproduce the issue,
1. Use -save-temps option
2. Provide the -o option with name equal to the input file name
without the file extension. For e.g. clang a.c -o a
With the -o specified, the AssembleJobAction after OffloadWrapperJobAction
will produce the object file with same name as host code object file.
Due to this clash, the OffloadWrapperAction overwrites the initial host
object file, which results in lld error. This also fixes the `multiple definition of __dummy.omp_offloading.entry'` issue in D96769 .
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D97273
This document was originally introduced in ab4648504b, and was reverted in
912bc4980e while I investigated a number of shpinx bot errors. This commit
reintroduces the document with fixes for those errors, as well as some
improvements to the wording and formatting.
In the existing logic, we look at the lifetime.start marker of each alloca, and check all uses of the alloca, to see if any pair of the lifetime marker and an use of alloca crosses suspension point.
This approach is unfortunately incorrect. An use of alloca does not need to be a direct use, but can be an indirect use through alias.
Only checking direct uses can miss cases where indirect uses are crossing suspension point.
This can be demonstrated in the newly added test case 007.
In the test case, both x and y are only directly used prior to suspend, but they are captured into an alias, merged through a PHINode (so they couldn't be materialized), and used after CoroSuspend.
If we only check whether the lifetime starts cross suspension points with direct uses, we will put the allocas to the stack, and then capture their addresses in the frame.
Instead of fixing it in D96441 and D96566, this patch takes a different approach which I think is better.
We still checks the lifetime info in the same way as before, but with two differences:
1. The collection of liftime.start is moved into AllocaUseVisitor to make the logic more concentrated.
2. When looking at lifetime.start and use pairs, we not only checks the direct uses as before, but in this patch we check all uses collected by AllocaUseVisitor, which would include all indirect uses through alias. This will make the analysis more accurate without throwing away the lifetime optimization.
Differential Revision: https://reviews.llvm.org/D96922
Move the remaing of FIR types to TableGen type definition. This follow suggestion in D96422.
Reviewed By: schweitz, jeanPerier, rriddle
Differential Revision: https://reviews.llvm.org/D96987
We're running into undefined references using ThinLTO with -O0 on
Windows/Chrome. This fixes that.
This matches the legacy PM.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D97414
Adding support for intrinsics of AMX-BF16.
This patch alse fix a bug that AMX-INT8 instructions will be selected with wrong
predicate.
Differential Revision: https://reviews.llvm.org/D97358
Add availability checks to the os_signpost code so this can be used with
an older deployment target.
Differential revision: https://reviews.llvm.org/D97410
We always create the VL operand using a register, but if we can
determine that it came from an ADDI X0, imm with a sufficiently
small immediate, we can use VSETIVLI.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97332
We just started using a ComplexPattern for sexti32. This updates
zexti32 to match.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D97231
For -fgpu-rdc mode, static device vars in different TU's may have the same name.
To support accessing file-scope static device variables in host code, we need to give them
a distinct name and external linkage. This can be done by postfixing each static device variable with
a distinct CUID (Compilation Unit ID) hash.
Since the static device variables have different name across compilation units, now we let
them have external linkage so that they can be looked up by the runtime.
Reviewed by: Artem Belevich, and Jon Chesterfield
Differential Revision: https://reviews.llvm.org/D85223
This expands the op to support error propagation and also makes it symmetric with "shape.get_extent" op.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D97261
Fix a buffer overrun that can occur when parsing '%c' at the end of a
filename pattern string.
rdar://74571261
Reviewed By: kastiglione
Differential Revision: https://reviews.llvm.org/D97239
Define inline versions of __compiler_rt_fmax* and __compiler_rt_scalbn*
rather than depend on the versions in libm. As with
__compiler_rt_logbn*, these functions are only defined for single,
double, and quad precision (binary128).
Fixes PR32279 for targets using only these FP formats (e.g. Android
on arm/arm64/x86/x86_64).
For single and double precision, on AArch64, use __builtin_fmax[f]
instead of the new inline function, because the builtin expands to the
AArch64 fmaxnm instruction.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D91841
Make sure to set the bottom bit of the symbol even when the type
attribute of a label is set after the label.
GNU as sets the thumb state according to the thumb state of the label.
If a .type directive is placed after the label, set the symbol's thumb
state according to the thumb state of the .type directive. This matches
GNU as in most cases.
From: Stefan Agner <stefan@agner.ch>
This fixes:
https://bugs.llvm.org/show_bug.cgi?id=44860https://github.com/ClangBuiltLinux/linux/issues/866
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D74927