This allows the instruction selector to realize that it can directly
broadcast the low byte of the memset value, rather than replicating
it to a 64-bit GPR before broadcasting.
This fixes PR50985.
Differential Revision: https://reviews.llvm.org/D108354
Android enables zero initialisation globally by default, but also allows
subprojects to override with different option. Clang complains the above
flag being unused in this case.
Instead of adding a 75 char long -no-* flag, don't warn unused argument
for this flag.
Differential Revision: https://reviews.llvm.org/D108278
This was probably bugging more than is reasonable, but it makes merging
changes in this file slightly less annoying to have the trailing comma
here. I only noticed this because Rust is currently carrying a patch to
this file and it kept making life a little difficult.
Require debug build for CodeGen/X86/fsafdo_test2.ll since it checks for
messages only printed in debug mode.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D108364
We recently enabled crt for powerpc in
https://reviews.llvm.org/rGb7611ad0b16769d3bf172e84fa9296158f8f1910.
And we started to see some unexpected error message when running
check-runtimes.
eg:
https://lab.llvm.org/buildbot/#/builders/57/builds/9488/steps/6/logs/stdio
line 100 - 103:
"
clang-14: error: unknown argument: '-m64 -fno-function-sections'
clang-14: error: unknown argument: '-m64 -fno-function-sections'
clang-14: error: unknown argument: '-m64 -fno-function-sections'
clang-14: error: unknown argument: '-m64 -fno-function-sections'
"
Looks like we shouldn't strip the space at the beginning,
or else the command line passed to subprocess won't work well.
Reviewed By: phosek, MaskRay
Differential Revision: https://reviews.llvm.org/D108329
This changes the lowering of saddsat and ssubsat so that instead of
using:
r,o = saddo x, y
c = setcc r < 0
s = c ? INTMAX : INTMIN
ret o ? s : r
into using asr and xor to materialize the INTMAX/INTMIN constants:
r,o = saddo x, y
s = ashr r, BW-1
x = xor s, INTMIN
ret o ? x : r
https://alive2.llvm.org/ce/z/TYufgD
This seems to reduce the instruction count in most testcases across most
architectures. X86 has some custom lowering added to compensate for
cases where it can increase instruction count.
Differential Revision: https://reviews.llvm.org/D105853
Previously we pre-calculated this and cached it for every
instruction in the function. Most of the calculated results will
never be used. So instead calculate it only on the first use, and
then cache it.
The cache was originally added to fix a compile time issue which
caused r216066 to be reverted.
This change exposed that we weren't pre-computing the Value for
Arguments. I've explicitly disabled that for now as it seemed to
regress some tests on AArch64 which has sext built into its compare
instructions.
Spotted while investigating how to improve heuristics to work better
with RISCV preferring sign extend for unsigned compares for i32 on RV64.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D107976
This encapsulates the APInt creation and worklist management into
a helper function.
To keep one common interface I've use Log2_32 in places that
previously created a mask by subtracting 1 from a power of 2.
Differential Revision: https://reviews.llvm.org/D108324
Apply the "for loop peeling" pattern from SCF dialect transforms. This pattern splits scf.for loops into full and partial iterations. In the full iteration, all masked loads/stores are canonicalized to unmasked loads/stores.
Differential Revision: https://reviews.llvm.org/D107733
This reverts commit 9934a5b2ed.
This patch may cause miscompiles because it missed a constraint
as shown in the examples from:
https://llvm.org/PR51531
This makes the intrinsic logic match the cmp+select idiom folds
just below. It's not clearly a win either way unless we think
that a 'not' op costs more than min/max.
The cmp+select folds on these patterns are more extensive than
the intrinsics currently and may have some complicated interactions,
so I'm trying to make those line up and bring the optimizations
for intrinsics up to parity.
[nfc] Replaces enum indices into an array with a struct. Named the
fields to match the enum, leaves memory layout and initialization unchanged.
Motivation is to later safely remove dead fields and replace redundant ones
with (compile time) computation. It should also be possible to factor some
common fields into a base and introduce a gfx10 amdgpu instance with less
duplication than the arrays of integers require.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D108339
There is an assertion failure in computeOverflowForUnsignedMul
(used in checkOverflow) due to the inner and outer trip counts
having different types. This occurs when the IV has been widened,
but the loop components are not successfully rediscovered.
This is fixed by some refactoring of the code in findLoopComponents
which identifies the trip count of the loop.
Differential Revision: https://reviews.llvm.org/D108107
This patch adds the beginnings of more thorough support in the
legalizers for vector-predicated (VP) operations.
The first step is the ability to widen illegal vectors. The more
complicated scenario in which the result/operands need widening but the
mask doesn't has not been handled here. That would require a lot of code
without an in-tree target on which to test it.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D107904
Completion now looks more like function/member completion:
used
alias(Aliasee)
abi_tag(Tags...)
Differential Revision: https://reviews.llvm.org/D108109
Change 636428c727 enabled BlockingRegion hooks for pthread_once().
Unfortunately this seems to cause crashes on Mac OS X which uses
pthread_once() from locations that seem to result in crashes:
| ThreadSanitizer:DEADLYSIGNAL
| ==31465==ERROR: ThreadSanitizer: stack-overflow on address 0x7ffee73fffd8 (pc 0x00010807fd2a bp 0x7ffee7400050 sp 0x7ffee73fffb0 T93815)
| #0 __tsan::MetaMap::GetSync(__tsan::ThreadState*, unsigned long, unsigned long, bool, bool) tsan_sync.cpp:195 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x78d2a)
| #1 __tsan::MutexPreLock(__tsan::ThreadState*, unsigned long, unsigned long, unsigned int) tsan_rtl_mutex.cpp:143 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x6cefc)
| #2 wrap_pthread_mutex_lock sanitizer_common_interceptors.inc:4240 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x3dae0)
| #3 flockfile <null>:2 (libsystem_c.dylib:x86_64+0x38a69)
| #4 puts <null>:2 (libsystem_c.dylib:x86_64+0x3f69b)
| #5 wrap_puts sanitizer_common_interceptors.inc (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x34d83)
| #6 __tsan::OnPotentiallyBlockingRegionBegin() cxa_guard_acquire.cpp:8 (foo:x86_64+0x100000e48)
| #7 wrap_pthread_once tsan_interceptors_posix.cpp:1512 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2f6e6)
From the stack trace it can be seen that the caller is unknown, and the
resulting stack-overflow seems to indicate that whoever the caller is
does not have enough stack space or otherwise is running in a limited
environment not yet ready for full instrumentation.
Fix it by reverting behaviour on Mac OS X to not call BlockingRegion
hooks from pthread_once().
Reported-by: azharudd
Reviewed By: glider
Differential Revision: https://reviews.llvm.org/D108305
With -fpreserve-vec3-type enabled, a cast was not created when
converting from a vec3 type to a non-vec3 type, even though a
conversion to vec4 was performed. This resulted in creation of
invalid store instructions.
Differential Revision: https://reviews.llvm.org/D107963
Refactored implementation of AddressSanitizerPass and
HWAddressSanitizerPass to use pass options similar to passes like
MemorySanitizerPass. This makes sure that there is a single mapping
from class name to pass name (needed by D108298), and options like
-debug-only and -print-after makes a bit more sense when (despite
that it is the unparameterized pass name that should be used in those
options).
A result of the above is that some pass names are removed in favor
of the parameterized versions:
- "khwasan" is now "hwasan<kernel;recover>"
- "kasan" is now "asan<kernel>"
- "kmsan" is now "msan<kernel>"
Differential Revision: https://reviews.llvm.org/D105007
SmallBitVector implements a level of indirection over BitVector by
storing a smaller bit-vector in a pointer-sized element, or in case the
number of elements exceeds the bucket size, it creates a new pointer to
a BitVector and uses that as its storage.
However, the functions returning the vector size were using `unsigned`,
which is ok for BitVector, but not for SmallBitVector, which is actually
`uintptr_t`.
This commit reuses the `size_type` definition to more than just `count`
and propagates them into range iteration, size calculation, etc.
This is a continuation of D108124.
I haven't changed all occurrences of `unsigned` or `uintptr_t` to
`size_type`, just those that were directly related.
Following directions from clang-tidy on case of variables.
Differential Revision: https://reviews.llvm.org/D108290
Currently, `printHelp` behaves differently for options that:
* do not define `HelpText` (such options _are not printed_), and
* define its `HelpText` as `HelpText<"">` (such options _are printed_).
In practice, both approaches lead to no help text and `printHelp` should
treat them consistently. This patch addresses that by making
`printHelpt` check the length of the help text to be printed.
All affected tests have been updated accordingly. The option definitions
for llvm-cvtres have been updated with a short description or "Not
implemented" for options that are ignored by the tool.
Differential Revision: https://reviews.llvm.org/D107557
I have added a new TTI interface called enableOrderedReductions() that
controls whether or not ordered reductions should be enabled for a
given target. By default this returns false, whereas for AArch64 it
returns true and we rely upon the cost model to make sensible
vectorisation choices. It is still possible to override the new TTI
interface by setting the command line flag:
-force-ordered-reductions=true|false
I have added a new RUN line to show that we use ordered reductions by
default for SVE and Neon:
Transforms/LoopVectorize/AArch64/strict-fadd.ll
Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
Differential Revision: https://reviews.llvm.org/D106653
Replacing Hello World example Plugin with one that counts and prints the names of
functions and subroutines.
This involves changing the `PluginParseTreeAction` Plugin base class to
inherit from `PrescanAndSemaAction` class to get access to the Parse Tree
so that the Plugin can walk it.
Additionally, there are tests of this new Plugin to check it prints the correct
things in different circumstances.
Depends on: D106137
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D107089
Simplify affine.min ops, enabling various other canonicalizations inside the peeled loop body.
affine.min ops such as:
```
map = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1)>
%r = affine.min #affine.min #map(%iv)[%step, %ub]
```
are rewritten them into (in the case the peeled loop):
```
%r = %step
```
To determine how an affine.min op should be rewritten and to prove its correctness, FlatAffineConstraints is utilized.
Differential Revision: https://reviews.llvm.org/D107222
This is very similar to CPU_TIME, except that we return nanoseconds
rather than seconds. This means we're potentially dealing with rather
large numbers, so we'll have to wrap around to avoid overflows.
Differential Revision: https://reviews.llvm.org/D105970