This patch adds the following built-ins:
__builtin_vsx_build_pair
__builtin_mma_build_acc
Reviewed By: #powerpc, nemanjai, lei
Differential Revision: https://reviews.llvm.org/D107647
SplitBlockPredecessors is unable to insert an additional BasicBlock
between an indirectbr/callbr terminator and the successor blocks.
This is needed by Polly to normalize the control flow before emitting
its optimized code.
This patch rejects regions entered by an indirectbr/callbr so that they do
not fail later at code generation.
This fixes llvm.org/PR51964
Recommit with "REQUIRES: asserts" in test that uses statistics.
With improved analysis in determining CFG equivalence that does
not require strict dominance and post-dominance conditions, we
now relax isSafeToMoveBefore() such that an instruction I can
be moved before InsertPoint even if they do not strictly dominate
each other, as long as they follow the same control flow path.
For example, we can move Instruction 0 before Instruction 1,
and vice versa.
```
if (cond1)
// Instruction 0: %add = add i32 1, 2
if (cond1)
// Instruction 1: %add2 = add i32 2, 1
```
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D110456
The MSP430 ABI supports build attributes for specifying
the ISA, code model, data model and enum size in ELF object files.
Differential Revision: https://reviews.llvm.org/D107969
In reaction to the issues raised by Richard in https://llvm.org/D109066,
this commit does not apply P1951 as a DR in previous standard modes,
since it breaks valid code.
I do believe it should be applied as a DR, however ideally we'd get some
sort of statement from the Committee to this effect (and all implementations
would behave consistently). In the meantime, only implement P1951 starting
with C++23 -- we can always come back and apply it as a DR if that's what
the Committee says.
Differential Revision: https://reviews.llvm.org/D110347
Precommit testcase for D110438. Since we do not preserve BPI in the loop
pass manager, we are forced to compute BPI every time loop predication is
invoked.
The referenced patch changes that behaviour by preserving lossy BPI for
loop passes.
GWP-ASan's `AllocatorState` was recently extended with a
`AllocatorVersionMagic` structure required so that GWP-ASan bug reports
can be understood by tools at different versions.
On Fuchsia, this is included in the `scudo::Allocator` structure, and
by having non-zero initializers, this effectively moved the static
allocator structure from the `.bss` segment to the `.data` segment, thus
increasing (significantly) the size of the libc.
This CL proposes to initialize the structure with its magic numbers at
runtime, allowing for the allocator to go back into the `.bss` segment.
I will work on adding a test on the Scudo side to ensure that this type
of change gets detected early on. Additional work is also needed to
reduce the footprint of the (large) memory-tagging related structures
that are currently part of the allocator.
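A minimal sketch of the runtime-initialization idea, not the actual GWP-ASan or Scudo code. `State`, `kMagic`, `kVersion`, `globalState`, and `initState` are hypothetical names chosen for illustration. The assumption is the usual toolchain behaviour: a static object with no non-zero initializers can be placed in `.bss`, and its magic numbers are then written when the allocator is first set up.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a versioned state header.
// No default member initializers, so a static instance is all zeros.
struct State {
  uint8_t Magic[4];
  uint16_t Version;
  uint16_t Reserved;
};

// Assumed magic bytes and version, for illustration only.
constexpr uint8_t kMagic[4] = {'D', 'E', 'M', 'O'};
constexpr uint16_t kVersion = 1;

// Zero-initialized static storage; typically eligible for .bss.
static State GlobalState;

State &globalState() { return GlobalState; }

// Returns true once the magic bytes and version have been written.
bool isInitialized(const State &S) {
  return std::memcmp(S.Magic, kMagic, sizeof(kMagic)) == 0 &&
         S.Version == kVersion;
}

// Writes the magic numbers at runtime instead of via a static initializer.
void initState(State &S) {
  std::memcpy(S.Magic, kMagic, sizeof(kMagic));
  S.Version = kVersion;
  S.Reserved = 0;
  assert(isInitialized(S));
}
```

Calling `initState(globalState())` once during allocator setup would give tools the same bytes to inspect as a static initializer would, without forcing the object into `.data`.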
Differential Revision: https://reviews.llvm.org/D110575
These values allow, for example, `--target=aarch64` and
`--target=aarch64-linux-gnu` to detect `aarch64-linux-android`. This is
confusing. Users should specify `--target=aarch64-linux-android` to get the Android GCC
installation.
Reverts D53463.
Reviewed By: nickdesaulniers, danalbert
Differential Revision: https://reviews.llvm.org/D110379
Thinlink provides an opportunity to propagate function attributes across modules, enabling additional propagation opportunities.
This change propagates (currently default off, turn on with `disable-thinlto-funcattrs=1`) noRecurse and noUnwind based off of function summaries of the prevailing functions in bottom-up call-graph order. Testing on clang self-build:
1. There's a 35-40% increase in noUnwind functions due to the additional propagation opportunities.
2. Thinlink time increases by 10-15%; thinlink itself accounts for 1.5% of E2E link time.
Implementation-wise this adds the following summary function attributes:
1. noUnwind: function is noUnwind
2. mayThrow: function contains a non-call instruction that `Instruction::mayThrow` returns true on (e.g. windows SEH instructions)
3. hasUnknownCall: function contains calls that don't make it into the summary call-graph thus should not be propagated from (e.g. indirect for now, could add no-opt functions as well)
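The way the three flags interact can be sketched as follows. This is a simplified model, not the code in the patch: `Summary` and `propagateNoUnwind` are hypothetical names, functions are identified by vector index, and a fixed-point iteration stands in for the bottom-up call-graph traversal described above. It also ignores functions that are already known to be noUnwind from their own attributes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified per-function summary; not LLVM's FunctionSummary.
struct Summary {
  bool MayThrow = false;             // has a non-call instruction that may throw
  bool HasUnknownCall = false;       // has calls missing from the summary call graph
  std::vector<std::size_t> Callees;  // indices of known callees
  bool NoUnwind = false;             // result of propagation
};

// A function is marked noUnwind when it cannot throw itself, has no unknown
// calls, and every known callee is noUnwind. The loop starts optimistic and
// demotes functions until nothing changes, so recursion is handled.
void propagateNoUnwind(std::vector<Summary> &Fns) {
  for (Summary &F : Fns)
    F.NoUnwind = !F.MayThrow && !F.HasUnknownCall;

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (Summary &F : Fns) {
      if (!F.NoUnwind)
        continue;
      for (std::size_t C : F.Callees) {
        assert(C < Fns.size() && "callee index out of range");
        if (!Fns[C].NoUnwind) {
          F.NoUnwind = false;
          Changed = true;
          break;
        }
      }
    }
  }
}
```

In this model a function with `HasUnknownCall` set is never marked, which mirrors the rule that such functions should not be propagated from.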
Testing:
Clang self-build passes and 2nd stage build passes check-all
ninja check-all with newly added tests passing
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D36850
Adapt the signature of the PaddingValueComputationFunction callback to either return the padding value or failure to signal padding is not desired.
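A rough sketch of the "value or failure" callback shape, using `std::optional` (C++17) as a stand-in for the result type. None of these names come from MLIR: `Operand`, `PaddingValueFn`, and `padIfRequested` are hypothetical and only illustrate how a caller can skip padding when the callback reports failure.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical operand description; real code would inspect an actual operand.
struct Operand {
  bool IsFloat;
};

// Returning std::nullopt plays the role of "failure": do not pad this operand.
using PaddingValueFn = std::function<std::optional<double>(const Operand &)>;

// Pads Data up to Size elements only if the callback supplies a value.
// Returns true if padding was applied.
bool padIfRequested(std::vector<double> &Data, std::size_t Size,
                    const Operand &Op, const PaddingValueFn &GetPadding) {
  assert(GetPadding && "callback must be set");
  std::optional<double> Pad = GetPadding(Op);
  if (!Pad || Data.size() >= Size)
    return false;
  Data.resize(Size, *Pad);
  return true;
}
```

With this shape a single callback decides both whether to pad and which value to pad with, so no separate "should pad" hook is needed.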
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D110572
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For this tuple, measuring becomes problematic since there's a lot of spilling going on,
but apparently all these memory ops do not affect worst-case estimate at all here.
For load we have:
https://godbolt.org/z/zP4hd8MT6 - for intels `Block RThroughput: =150.0`; for ryzens, `Block RThroughput: <=59`
So pick cost of `150`.
For store we have:
https://godbolt.org/z/vKb8zTK8E - for intels `Block RThroughput: =64.0`; for ryzens, `Block RThroughput: <=24.0`
So pick cost of `64`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
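The selection rule used for these costs appears to be "take the worst case across the measured models". That is a reading of the numbers above, not something the patch states as code. A small sketch under that assumption, with `pickWorstCaseCost` as a hypothetical helper:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Returns the largest reciprocal throughput among the measurements,
// rounded up to an integer cost. Expects at least one measurement.
int pickWorstCaseCost(const std::vector<double> &RThroughputs) {
  assert(!RThroughputs.empty() && "need at least one measurement");
  double Worst = *std::max_element(RThroughputs.begin(), RThroughputs.end());
  return static_cast<int>(std::ceil(Worst));
}
```

For the load case above, the inputs `150.0` and `59.0` yield the chosen cost of `150`.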
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110548
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =75.0`; for ryzens, `Block RThroughput: <=29.5`
So pick cost of `75`. (note that `# 32-byte Reload` does not affect throughput there.)
For store we have:
https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `32`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110543
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/dd8T5P471 - for intels `Block RThroughput: =33.0`; for ryzens, `Block RThroughput: <=14.5`
So pick cost of `33`.
For store we have:
https://godbolt.org/z/zPxcKWhn4 - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `10`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110541
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/rnsf639Wh - for intels `Block RThroughput: =17.0`; for ryzens, `Block RThroughput: <=7.5`
So pick cost of `17`.
For store we have:
https://godbolt.org/z/565KKrcY6 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `6`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110537
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/5EYc6r9nh - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.
For store we have:
https://godbolt.org/z/z61e5d6GE - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110536
This patch adds a new preprocessor extension ``#pragma clang final``
which enables warning on undefinition and re-definition of macros.
The intent of this warning is to extend beyond ``-Wmacro-redefined`` to
warn against any and all alterations to macros that are marked `final`.
This warning is part of the ``-Wpedantic-macros`` diagnostics group.
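A short usage sketch, assuming the parenthesized `#pragma clang final(MACRO)` form from Clang's documentation. `BUFFER_SIZE` and `bufferSize` are hypothetical names, and the pragma is guarded so that other compilers skip it.

```cpp
#include <cassert>

// Hypothetical macro name, used only for illustration.
#define BUFFER_SIZE 64

#if defined(__clang__)
// Marks BUFFER_SIZE as final; a later #undef or re-#define should be diagnosed.
#pragma clang final(BUFFER_SIZE)
#endif

// Uncommenting either line below is expected to trigger the warning
// in a Clang version that supports the pragma:
// #undef BUFFER_SIZE
// #define BUFFER_SIZE 128

int bufferSize() {
  assert(BUFFER_SIZE > 0 && "BUFFER_SIZE must stay positive");
  return BUFFER_SIZE;
}
```

The two commented-out lines are left disabled so the sketch compiles without diagnostics.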
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D108567
This integration test runs a fused and non-fused version of
sampled matrix multiplication. Both should eventually have the
same performance!
NOTE: relies on pending tensor.init fix!
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110444
Fixes 51982. Minor refactor to remove `return x = y` construct.
Test case derived from https://github.com/ROCm-Developer-Tools/aomp/\
blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting
parts while checking the assertion failure still occurred.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110556
This revision makes sure that when the output buffer materializes locally
(in contrast with the passing in of output tensors either in-place or not
in-place), the zero initialization assumption is preserved. This also adds
a bit more documentation on our sparse kernel assumption (viz. TACO
assumptions).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110442
We see that it might otherwise do:
%10 = getelementptr {}**, <2 x {}***> %9, <2 x i32> <i32 10, i32 4>
%11 = bitcast <2 x {}***> %10 to <2 x i64*>
...
%27 = extractelement <2 x i64*> %11, i32 0
%28 = bitcast i64* %27 to <2 x i64>*
store <2 x i64> %22, <2 x i64>* %28, align 4, !tbaa !2
Which is an out-of-bounds store (the extractelement got offset 10
instead of offset 4 as intended). With the fix, we correctly generate
extractelement for i32 1 and generate correct code.
Differential Revision: https://reviews.llvm.org/D106613
Fixes https://bugs.llvm.org/show_bug.cgi?id=51790. The check triggers
incorrectly with non-type template parameters.
A bisect determined that the bug was introduced here:
ea2225a10b
Unfortunately that patch can no longer be reverted on top of the main
branch, so add a fix instead. Add a unit test to avoid regression in
the future.
Enforce constraints C1034 & C1038, which disallow the use
of otherwise valid statements as branch targets when they
appear in FORALL &/or WHERE constructs. (And make the
diagnostic message somewhat more user-friendly.)
Differential Revision: https://reviews.llvm.org/D109936