These intrinsics, not the icmp+select sequence, are the canonical form nowadays,
so we might as well directly emit them.
This should not cause any regressions, but if it does,
then they would need to be fixed regardless.
Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.
Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
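As a rough sketch of the difference (using signed max and the IRBuilder API as an assumed example; the actual emission sites in this patch differ):

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

// Illustrative only: instead of emitting the old icmp+select expansion,
//   %cmp = icmp sgt i32 %a, %b
//   %max = select i1 %cmp, i32 %a, i32 %b
// emit the canonical intrinsic form, a single call to llvm.smax.
static Value *emitSMax(IRBuilder<> &Builder, Value *A, Value *B) {
  return Builder.CreateBinaryIntrinsic(Intrinsic::smax, A, B);
}
```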
This patch adds support for passing vector call operands to variadic
functions. Arguments that are fixed shadow GPRs and stack space even
when they are passed in vector registers, while arguments passed through
the ellipsis are passed in properly aligned GPRs if available and on the
stack once all GPR argument registers are consumed.
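As a hedged illustration of the kind of call this enables (the vector type and helpers below are examples, not taken from the patch):

```cpp
#include <cstdarg>

typedef int v4i32 __attribute__((vector_size(16)));

// A vector operand passed through the ellipsis of a variadic function; the
// ABI rules described above decide whether it travels in vector registers,
// shadowed GPRs, or on the stack.
static int use_first_lane(int n, ...) {
  va_list ap;
  va_start(ap, n);
  v4i32 v = va_arg(ap, v4i32);
  va_end(ap);
  return v[0];
}

int caller() {
  v4i32 v = {1, 2, 3, 4};
  return use_first_lane(1, v);
}
```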
Differential Revision: https://reviews.llvm.org/D97956
We have the `enable-loopinterchange` option in the legacy pass manager but not in the NPM.
Add `LoopInterchange` pass to the optimization pipeline (at the same position as before)
when `enable-loopinterchange` is turned on.
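For context, this is roughly the transformation the pass performs (illustrative C++ only, not part of this change):

```cpp
// Before interchange: the inner loop strides across rows of A and B,
// touching a new cache line on almost every iteration.
void slow(float A[256][256], const float B[256][256]) {
  for (int j = 0; j < 256; ++j)
    for (int i = 0; i < 256; ++i)
      A[i][j] += B[i][j];
}

// After interchange: the inner loop walks each row contiguously,
// which is what LoopInterchange aims to produce when profitable.
void fast(float A[256][256], const float B[256][256]) {
  for (int i = 0; i < 256; ++i)
    for (int j = 0; j < 256; ++j)
      A[i][j] += B[i][j];
}
```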
Reviewed By: aeubanks, fhahn
Differential Revision: https://reviews.llvm.org/D98116
This patch modifies the x86_64 XRay trampolines to fix the CFI information
generated by the assembler. One of the main issues in correcting the CFI
directives is the `ALIGNED_CALL_RAX` macro, which makes the CFA dependent on
the alignment of the stack. However, this macro is not really necessary because
some additional assumptions can be made on the alignment of the stack when the
trampolines are called. The code has been written as if the stack were guaranteed
to be 8-byte aligned; however, it is instead guaranteed to be misaligned by 8
bytes with respect to a 16-byte alignment. For this reason, always moving the
stack pointer by 8 bytes is sufficient to restore the appropriate alignment.
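The alignment argument, spelled out as a small self-contained check (illustration only, not the trampoline code):

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // At the call site the x86-64 psABI keeps RSP 16-byte aligned; the CALL
  // into the trampoline then pushes an 8-byte return address.
  uint64_t rsp_at_call_site = 0x7fffffffd000;          // 16-byte aligned (assumed)
  uint64_t rsp_in_trampoline = rsp_at_call_site - 8;   // after CALL
  assert(rsp_in_trampoline % 16 == 8);
  // So a fixed 8-byte adjustment always restores 16-byte alignment,
  // with no need for ALIGNED_CALL_RAX's dynamic realignment.
  assert((rsp_in_trampoline - 8) % 16 == 0);
  return 0;
}
```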
Trampolines that are called from within a function as a result of the builtins
`__xray_typedevent` and `__xray_customevent` are necessarily called with the
stack properly aligned, so in this case too `ALIGNED_CALL_RAX` can be
eliminated.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49060
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D96785
The hackery is due to glibc clock_gettime crashing from preinit_array (D40679).
32-bit musl architectures do not define `__NR_clock_gettime`, so the code causes a compile error.
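A purely illustrative sketch of the split this implies (names and structure are assumptions, not the sanitizer_common code):

```cpp
#include <time.h>
#if defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

static int monotonic_now(struct timespec *ts) {
#if defined(__GLIBC__) && defined(__NR_clock_gettime)
  // glibc: clock_gettime can crash when invoked from .preinit_array (D40679),
  // so issue the raw syscall instead.
  return syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts);
#else
  // musl and non-Linux: call the libc function; 32-bit musl in particular
  // does not define __NR_clock_gettime at all.
  return clock_gettime(CLOCK_MONOTONIC, ts);
#endif
}
```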
Tested on Alpine Linux x86-64 (musl) and FreeBSD x86-64.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D96925
GVN basically doesn't handle phi nodes at all. This is for a reason: we can't value-number their inputs, since the predecessor blocks have probably not been visited yet.
However, it also creates a significant pass-ordering problem. As it stands, instcombine and simplifycfg end up implementing CSE of phi nodes. This means that for any series of CSE opportunities intermixed with phi nodes, we end up having to alternate instcombine/simplifycfg and gvn to make progress.
This patch handles the simplest case by simply preprocessing the phi instructions in a block, and CSEing them if they are syntactically identical. This turns out to be powerful enough to handle many cases in a single invocation of GVN, since blocks which use the CSE'd phi results are visited after the block containing the phi. If there's a CSE opportunity in one of the phi predecessors required to recognize the phi CSE opportunity, that will require a second iteration on the function. (Still within a single run of gvn though.)
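A minimal sketch of that preprocessing step (assumed shape and helper name; not the exact GVN code):

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Scan the PHIs at the top of the block and fold any PHI that is
// syntactically identical to an earlier one into that earlier PHI.
static void cseIdenticalPHIs(BasicBlock &BB) {
  SmallVector<PHINode *, 8> Seen;
  for (PHINode &PN : make_early_inc_range(BB.phis())) {
    auto It = find_if(Seen, [&](PHINode *Other) { return PN.isIdenticalTo(Other); });
    if (It != Seen.end()) {
      PN.replaceAllUsesWith(*It);
      PN.eraseFromParent();
    } else {
      Seen.push_back(&PN);
    }
  }
}
```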
Compile time wise, this could go either way. On one hand, we're potentially causing GVN to iterate over the function more. On the other, we're cutting down on iterations between two passes and potentially shrinking the IR aggressively. So, a bit unclear what to expect.
Note that this does still rely on instcombine to canonicalize block order of the phis, but that's a one time transformation independent of the values incoming to the phi.
Differential Revision: https://reviews.llvm.org/D98080
As a pragmatic tradeoff, the ease of updating the tests outweighs the slightly easier-to-understand test conditions. Where relevant, debug output was converted to comments to aid human understanding.
To unify the naming scheme across all ops in the SPIR-V dialect, we are
moving from spv.camelCase to spv.CamelCase everywhere. For ops that
don't have a SPIR-V spec counterpart, we use spv.mlir.snake_case.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D98014
Spack is a package management tool extensively used by the HPC community.
As the HPC community builds ROCm packages with Spack, we need to teach the
clang driver to detect ROCm installations built by Spack.
Reviewed By: Artem Belevich
Differential Revision: https://reviews.llvm.org/D97340
Normally tensors will be stored in buffers before converting to SPIR-V,
given that this is how large amounts of data are sent to the GPU. However,
SPIR-V supports converting from tensors directly too. This is for the
cases where the tensor contains only a small number of elements and it
makes sense to directly inline them as a small data array in the shader.
To handle this, the conversion might internally create new local
variables. SPIR-V consumers in GPU drivers may or may not optimize those
away, so this has implications for register pressure. Therefore, a
threshold is used to control when the patterns should kick in.
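A standalone model of that threshold decision (illustrative only; the names and the default value are assumptions, and the real code is an MLIR rewrite pattern):

```cpp
#include <cstdint>

// Assumed default; the actual patch exposes the threshold as an option.
constexpr int64_t kTensorInlineByteThreshold = 64;

// Inline small, statically shaped tensors directly as constant data in the
// shader; larger tensors should stay in buffers, since the local variables
// the conversion introduces may not be optimized away by the driver and can
// increase register pressure.
bool shouldInlineTensorAsConstant(int64_t numElements, int64_t elementByteWidth) {
  return numElements * elementByteWidth <= kTensorInlineByteThreshold;
}
```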
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D98052
This patch is extracted from D69372.
It fixes https://bugs.llvm.org/show_bug.cgi?id=42219.
In noimplicitfloat mode, the compiler must not generate
floating-point code unless it was directly asked to do so.
Currently this rule does not hold for variadic function arguments.
Although the compiler correctly guards the block of code that copies xmm vararg
parameters with a check of %al, it does not protect the spills of the xmm registers.
Thus, such spills are generated in non-protected areas and can break code
that does not expect floating-point data. The problem happens at the -O0
optimization level, which uses the fast register allocator; it spills virtual
registers at basic block boundaries and does not protect the spills with
additional control-flow modifications.
To resolve the problem, incoming physical xmm registers are no longer copied
into virtual registers; instead, they are stored directly to memory.
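A minimal C-level shape of the problem (illustrative only, not the regression test; compiling with the -mno-implicit-float driver flag is an assumed way to get the noimplicitfloat attribute):

```cpp
#include <cstdarg>

// On x86-64, the ABI passes the number of vector registers used by the caller
// in %al, and the variadic prologue's copy of %xmm0-%xmm7 into the register
// save area is guarded by a test of %al.  At -O0, however, the spills of
// those xmm values were emitted outside the guarded region, producing SSE
// instructions that could run even when no vector arguments were passed.
static int sum_ints(int count, ...) {
  va_list ap;
  va_start(ap, count);
  int total = 0;
  for (int i = 0; i < count; ++i)
    total += va_arg(ap, int);
  va_end(ap);
  return total;
}

int call_it() { return sum_ints(3, 1, 2, 3); }
```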
Differential Revision: https://reviews.llvm.org/D80163
There seems to be an impedance mismatch between what the type
system considers an aggregate (structs and arrays) and what
constants consider an aggregate (structs, arrays and vectors).
Rather than adjusting the type check, simply drop it entirely,
as getAggregateElement() is well-defined for non-aggregates: It
simply returns null in that case.
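A small self-contained illustration of that behaviour (a vector splat and a plain scalar; nothing here is from the patch itself):

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

int main() {
  LLVMContext Ctx;
  Type *I32 = Type::getInt32Ty(Ctx);
  // A vector constant: not an aggregate for the type system, but
  // getAggregateElement() still knows how to index it.
  Constant *Splat = ConstantVector::getSplat(ElementCount::getFixed(4),
                                             ConstantInt::get(I32, 7));
  Constant *Elt = Splat->getAggregateElement(0u);      // i32 7
  // A scalar constant: getAggregateElement() simply returns null.
  Constant *Scalar = ConstantInt::get(I32, 1);
  Constant *None = Scalar->getAggregateElement(0u);    // nullptr
  return (Elt && !None) ? 0 : 1;
}
```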
Instead of handling a number of special cases for selects, handle
this generally when inferring ranges from conditions. We already
infer ranges from `x + C pred C2` to `x`, so doing the same for
`x pred C2` to `x + C` is straightforward.
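A hedged illustration of the symmetry using ConstantRange (standalone; not the code in this patch):

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

int main() {
  // From the condition `x u< 10` we already get a range for x...
  ConstantRange X =
      ConstantRange::makeExactICmpRegion(CmpInst::ICMP_ULT, APInt(8, 10));
  // ...and adding the constant offset gives the corresponding range for
  // `x + 5`, which is the direction this patch infers as well.
  ConstantRange XPlus5 = X.add(ConstantRange(APInt(8, 5)));
  return (XPlus5.getLower() == 5 && XPlus5.getUpper() == 15) ? 0 : 1;
}
```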
It moved the logic for CMake target arguments into llvm_ExternalProject_Add().
No handling was added for CMAKE_CROSSCOMPILING, which has a separate set of compiler_args.
This broke cross-compiling, as the runtimes builds now defaulted to the compiler's default.
I've also added passing of CMAKE_ASM_COMPILER, which was missing before although we were passing the triple for it.
Reviewed By: zero9178
Differential Revision: https://reviews.llvm.org/D97855
These tests didn't test the pattern they were supposed to, because
%a instead of %add was used in the select, which turned this into
a normal min/max.
Noticed this when commenting out the clamp handling code did not
result in any test failures...
Without this patch, the file list of the preamble index contains URIs, while the file lists of other indexes contain file paths.
This makes `indexedFiles()` always return `IndexContents::None` for the preamble index, because the current implementation expects file paths inside the file list of the index.
This patch fixes this problem and also helps to avoid a lot of URI-to-path conversions when merging indexes.
Reviewed By: kadircet
Differential Revision: https://reviews.llvm.org/D97535
If cross testing (and manually specifying a LIBCXX_TARGET_INFO in the
cmake configuration, as the default is to match the build platform),
we want the accessors for querying the target platform, is_windows,
is_darwin, to return the right value depending on which target info
class is used, not based on what platform is running the build and
driving the tests.
When LIBCXX_TARGET_INFO isn't defined, the right target info class
is chosen automatically based on the platform one is running on, so
this shouldn't make any practical difference for such setups.
Differential Revision: https://reviews.llvm.org/D98045
For MinGW targets, we distinguish between an explicitly shared unwinder
library (requested via -shared-libgcc), an explicitly static one
(requested via -static-libgcc or -static) and the default case (which
just passes -lunwind to the linker, which will pick either shared or
static depending on what's available, with the normal linker logic).
This makes the implicit default case (as added in D79995) actually work as
it was intended, when using the g++ driver (which is the main use case for
libunwind as far as I know).
Differential Revision: https://reviews.llvm.org/D98023
CLANG_DEFAULT_RTLIB had a typo, and libunwind isn't a valid
option for it.
This keeps the actual behaviour from before, defaulting to none if
using compiler-rt as rtlib.
Differential Revision: https://reviews.llvm.org/D98022
Implements parts of:
- P0898R3 Standard Library Concepts
- P1754 Rename concepts to standard_case for C++20, while we still can
Depends on D96742
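For illustration, a couple of the <concepts> facilities this work covers (which exact concepts land in this particular patch isn't spelled out here, so treat these as examples):

```cpp
#include <concepts>

// snake_case naming per P1754; definitions per P0898R3.
template <std::integral T>
constexpr T twice(T x) { return x + x; }

static_assert(std::same_as<decltype(twice(21)), int>);
static_assert(std::convertible_to<char, int>);
static_assert(!std::same_as<int, long>);
```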
Differential Revision: https://reviews.llvm.org/D97162
Prior to this fix, constrained decltype(auto) behaves exactly the same
as constrained regular auto.
This fixes it so it deduces like decltype(auto).
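A small example of the difference (illustrative; not from the test suite):

```cpp
#include <concepts>

int global = 0;
int &ref() { return global; }

// With the fix, a constrained decltype(auto) still deduces like
// decltype(auto), preserving the reference:
std::same_as<int &> decltype(auto) r = ref();   // deduces int&, constraint OK

// A constrained plain auto deduces like auto and drops the reference:
std::same_as<int> auto v = ref();               // deduces int
```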
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D98087
We've observed this test being significantly flaky on our Mac CI
machines when we're running the full check-clang suite. It fails because
the wait_for condition isn't met within 3 seconds. We believe it's
because our CI machines are somewhat underpowered and pretty heavily
loaded when we're running the full check-clang suite.
I ran some experiments on increasing the timeout. I ran the full
check-clang suite 100 times with each timeout value and recorded how
many flaky failures we encountered in these tests. The results are:
3 second timeout (baseline): 20 failures
10 second timeout: 14 failures
20 second timeout: 4 failures
30 second timeout: 2 failures
40 second timeout: 1 failure
50 second timeout: 0 failures
60 second timeout: 0 failures
I ran another set of 100 tests for the 50 second timeout and observed
one flaky failure. By contrast, I ended up running check-clang 500 times
for the 60 second timeout and didn't observe a single flaky failure.
That's how the 60 second timeout value used in this patch was derived.
While a 60 second timeout might seem high, keep in mind that:
- This is a timeout, not a sleep; the test should require much less time in
the vast majority of instances, especially on more powerful machines.
- The long timeout is most likely to occur when other tests are also
running at the same time, so the latency of the timeout will also be
masked by the latency of the other tests.
See https://reviews.llvm.org/D58418?id=200123#inline-554211 for where
this timeout was originally introduced and where the possibility of raising
it if it wasn't enough was discussed.
Reviewed By: plotfi
Differential Revision: https://reviews.llvm.org/D97878