The patch fixes the SPIRV backend build using clang. It also replaces
UndefValue with PoisonValue in SPIRVRegularizer.cpp.
Fixes: #57773
Differential Revision: https://reviews.llvm.org/D134071
One must pick the same name as the one referenced in CodeGenFunction when
generating .inline version of an inline builtin, otherwise they are not
correctly replaced.
Differential Revision: https://reviews.llvm.org/D134362
This pass allows a user to dump a MIR function to a dot file
and view it as a graph. It is targeted to provide a similar
functionality as -dot-cfg pass on LLVM-IR. As of now the pass
also support below flags:
-dot-mcfg-only [optional][won't print instructions in the
graph just block name]
-mcfg-dot-filename-prefix [optional][prefix to add to output dot file]
-mcfg-func-name [optional] [specify function name or it's
substring, handy if mir file contains multiple functions and
you need to see graph of just one]
More flags and details can be introduced as per the requirements
in future. This pass is inspired from -dot-cfg IR pass and APIs
are written in almost identical format.
Patch by Yashwant Singh <Yashwant.Singh@amd.com> (yassingh)
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D133709
With Zbp removed, we no longer need the generalized forms.
The computeKnownBitsForTargetNode code brev8/orc.b is still based
on the general form with the shift amount forced to 7.
This is a bugfix patch that resolves the following two bugs in loop interchange:
1. PR57148 which is an assertion error due to of loss of LCSSA form after interchange,
as referred to test1() in pr57148.ll.
2. Use before def for the outermost loop induction variables after interchange,
as referred to test2() in pr57148.ll.
The fix in this patch is that:
1. In cases where the LCSSA form is not maintained after interchange, we update the IR
to the LCSSA form again.
2. We split the phi nodes in the inner loop header into a separate basic block to avoid
the situation where use of the outer indvar appears before its def after interchange.
Previously we already did this for innermost loops, now we do it for non-innermost
loops (e.g., middle loops) as well.
Reviewed By: bmahjour, Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D132055
Use load32_zero instead of load32_splat to load the low 32 bits from memory to
v128. Test cases are added to cover this change.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D134257
This patch is to resolve the bug reported and discussed in
https://reviews.llvm.org/D124926#3718761 and https://reviews.llvm.org/D124926#3719876.
The problem is that loop interchange is a loopnest pass under the new pass manager,
but the loop nest may not be constructed correctly by the loop pass manager after
running loop interchange and before running the next pass, which might cause problems
when it continues running the next pass.
The reason that the loop nest is constructed incorrectly is that the outermost
loop might have changed after interchange, and what was the original outermost
loop is not the current outermost loop anymore. Constructing the loop nest based
on the original outermost loop would generate an invalid loop nest.
The fix in this patch is that, in the loop pass manager before running each loopnest
pass, we re-cosntruct the loop nest based on the current outermost loop, if LPMUpdater
notifies the loop pass manager that the previous loop nest has been invalidated by passes
like loop interchange.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D132199
Unicode 15.0 adds 4,489 characters, for a total of 149,186 characters.
These additions include 2 new scripts along with 20 new emoji characters,
and 4,193 CJK ideographs.
This changes modify most existing tables including
- XID_Start/XID_Continue in Clang
- The character name database (used by \N{} in Clang)
- The list of formattable/printable codepoints
- The case folding algorithm (which we had not updated since Unicode 9)
- The list of nonspacing/enclosing marks used by the column width
computation algorithm. The rest of the column width algorithm
is not updated.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D133807
Specifically predicates for extensions that are subsets of other
extensions. These predicates should never be used. Should always
check the superset extension or the superset ORed with the sub extendsion.
This allows for incrementally updating the old API usages without
needing to update everything at once. PDL will be left on Both
for a little bit and then flipped to prefixed when all APIs have been
updated.
Differential Revision: https://reviews.llvm.org/D134387
This allows for incrementally updating the old API usages without
needing to update everything at once. These will be left on Both
for a little bit and then flipped to prefixed when all APIs have been
updated.
Differential Revision: https://reviews.llvm.org/D134386
showBRParamDiagnostics assumed stores happen only via function parameters while that
can also happen via implicit parameters like 'self' or 'this'.
The regression test caused a failed assert in the original cast to ParmVarDecl.
Differential Revision: https://reviews.llvm.org/D133815
Instrumentation just ORs shadow of inputs.
I assume some result shadow bits can be reset if we go into specifics of particular checks,
but as-is it is still an improvement against existing default strict instruction handler, when
every set bit of input shadow is reported as an error.
Reviewed By: kda
Differential Revision: https://reviews.llvm.org/D134123
We have namespaces `DXIL` and `dxil`, which is just confusing. This
renames `DXIL` -> `dxil` making everything consistent.
While the LLVM coding standards don't have a clear direction here, I
chose lower case because by my current unscientific count there are
more places where we had the lowercase namespace than the uppercase.
When `objc_direct` methods were implemented, the implicit `_cmd` parameter was left as an argument to the method implementation function, but was unset by callers; if the method body referenced the `_cmd` variable, a selector load would be emitted inside the body. However, this leaves an unused argument in the ABI, and is unnecessary.
This change removes the empty/unset argument, and if `_cmd` is referenced inside an `objc_direct` method it will emit local storage for the implicit variable. From the ABI perspective, `objc_direct` methods will have the implicit `self` parameter, immediately followed by whatever explicit arguments are defined on the method, rather than having one unset/undefined register in the middle.
Differential Revision: https://reviews.llvm.org/D131424
Name them after the instructions VFCVT_RTZ_X(U)_F_VL to make it
clear that the ISD nodes don't have the poison semantics of
ISD::SINT_TO_FP/UINT_TO_FP.
I play to reuse this node for a FP_TO_SINT_SAT/FP_TO_UINT_SAT
patch and need the instruction semantics.
Tested with the following program:
```
static volatile int* x = nullptr;
void throws() __attribute__((noinline)) {
if (getpid() == 0)
return;
throw "error";
}
void maybe_throws() __attribute__((noinline)) {
volatile int y = 1;
x = &y;
throws();
y = 2;
}
int main(int argc, char** argv) {
int y;
try {
maybe_throws();
} catch (const char* e) {
//printf("Caught\n");
}
y = *x;
printf("%d\n", y); // should be MTE failure.
return 0;
}
```
Built using `clang++ -c -O2 -target aarch64-linux -fexceptions -march=armv8-a+memtag -fsanitize=memtag-heap,memtag-stack`
Currently only Android implements runtime support for MTE stack tagging.
Without this change, we crash on `__cxa_get_globals` when trying to catch
the exception (because the stack frame __cxa_get_globals frame will fail due
to tags left behind on the stack). With this change, we crash on the `y = *x;`
as expected, because the stack frame has been untagged, but the pointer hasn't.
Reviewed By: #libunwind, compnerd, MaskRay
Differential Revision: https://reviews.llvm.org/D128998
This option specifies a GCC installation directory such as
/usr/lib/gcc/x86_64-linux-gnu/12, /usr/lib/gcc/x86_64-gentoo-linux-musl/11.2.0 .
It is intended to replace --gcc-toolchain=, which specifies a directory where
`lib/gcc{,-cross}` can be found. When --gcc-toolchain= is specified, the
selected `lib/gcc/$triple/$version` installation uses complex logic and the
largest GCC version is picked. There is no way to specify another version in the
presence of multiple GCC versions.
D25661 added gcc-config detection for Gentoo: `ScanGentooConfigs`.
The implementation may be simplified by using --gcc-install-dir=.
Reviewed By: mgorny
Differential Revision: https://reviews.llvm.org/D133329
This validation was introduced in D34003 for v_qsad/v_mqsad instructions
but it applies to all instructions with earlyclobber operands, which now
includes v_mad_i64/v_mad_u64.
In all these cases I do not think there is documentation saying that the
destination must not overlap the sources. Rather there are *some* cases
where the instruction may not function correctly if there is an overlap,
and we are using earlyclobber as a conservative way of preventing
codegen from generating those cases.
I think it is unhelpful for the assembler to enforce the earlyclobber
restriction because it prevents assembling cases where the programmer
knows that in fact the overlap is safe.
See also: https://github.com/llvm/llvm-project/issues/57610
Differential Revision: https://reviews.llvm.org/D134272
I don't know what was going on originally with these tests. It seems reasonable
to have the immediate be the same byte alignment unit as the IR, in which case
we need to take the log2 in order to set the right number of low bits.
This fixes a miscompile in chromium.
Differential Revision: https://reviews.llvm.org/D134380