Silence warning Undefined Behavior Sanitzer warning:
runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D90710
We don't need custom matching, we just a need a predicate to check
the immediate is greater than 32. We can use the existing ImmSub32
to adjust the immediate.
I've also used the new predicate in the other location that used
ImmSub32. I tried to create a test case where we would break without
the greater than 32 check on that pattern, but DAG combine defeated me.
Still seemed safer to have it.
Differential Revision: https://reviews.llvm.org/D90546
Pseudo-registers allow different register encodings
between gpu generations. Make sure we resolve the
pseudo regs to real regs whenever we get their
hardware encoding.
Using the correct encodings revealed a register
bank conflict and an unnecessary write dependency.
Tests have been updated to match.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D90721
Change-Id: I73c154cd24aecc820993b50bebaf4df97a5710ca
This allows those instrumentation to log when they decide to skip a
pass. This provides extra helpful info for optnone functions and also
will help with opt-bisect.
Have OptNoneInstrumentation print when it skips due to seeing optnone.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D90545
The insertion of waterfall loops splits the current basic block into
three blocks. So the basic block that we iterate over must be updated.
This failed assert(!NodePtr->isKnownSentinel()) in ilist_iterator for
divergent calls in branches before.
Differential Revision: https://reviews.llvm.org/D90596
This patch adds the `async` lowering of coroutines.
This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.
This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D90612
Allows the MACRO directive to define macro procedures with parameters and macro-local symbols.
Supports required and optional parameters (including default values), and matches ml64.exe for its macro-local symbol handling (up to 65536 macro-local symbols in any translation unit).
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D89729
This is slightly better compile-time wise,
since we avoid potentially-costly knownbits analysis that will
ultimately not allow us to actually do anything with said `add`.
`+vpu` controls whether VEISelLowering adds any vregs. This defaults to
`-vpu` to have scalar code generation out of the box. We bring up
vector isel under the `+vpu` flag. Once vector isel is stable we switch
to `+vpu` and advertise vregs and vops in TTI.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D90465
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.
Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D89382
Previously, the default value for ieee mode was
- on for compute kernels and compute shaders,
- off for all shaders except compute shaders.
This commit changes the default to be
- on for compute kernels,
- off for shaders.
This aligns the default value with the settings that are actually in
use. To my knowledge, all users of shader calling conventions (mesa and
llpc) disable the ieee mode by default.
Differential Revision: https://reviews.llvm.org/D89388
This patch replaces the AArch64StackOffset class by the generic one
defined in TypeSize.h.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D88983
This reverts commit 59b22e495c.
That commit broke building for ARM and AArch64, reproducible like this:
$ cat apedec-reduced.c
a;
b(e) {
int c;
unsigned d = f();
c = d >> 32 - e;
return c;
}
g() {
int h = i();
if (a)
h = h << a | b(a);
return h;
}
$ clang -target aarch64-linux-gnu -w -c -O3 apedec-reduced.c
clang: ../lib/Transforms/InstCombine/InstructionCombining.cpp:3656: bool llvm::InstCombinerImpl::run(): Assertion `DT.dominates(BB, UserParent) && "Dominance relation broken?"' failed.
Same thing for e.g. an armv7-linux-gnueabihf target.
- Basically iterate each pair of memory operands from both instructions
and return true if any of them may alias.
- The exception are memory instructions without any memory operand. They
may touch everything and could alias to any memory instruction.
Differential Revision: https://reviews.llvm.org/D89447
For the <2 x float> case, instead of adding another combine or legalization to
get it into a <4 x float> form, I'm just adding a GISel specific selection
pattern to cover it.
Differential Revision: https://reviews.llvm.org/D90699
When machine instructions are in the form of
```
%0 = CONST_I32 @str
%1 = ADD_I32 %stack.0, %0
%2 = LOAD 0, 0, %1
```
In the `ADD_I32` instruction, it is possible to fold it if `%0` is a
`CONST_I32` from an immediate number. But in this case it is a global
address, so we shouldn't do that. But we haven't checked if the operand
of `ADD` is an immediate so far. This fixes the problem. (The case
applies the same for `ADD_I64` and `CONST_I64` instructions.)
Fixes https://bugs.llvm.org/show_bug.cgi?id=47944.
Patch by Julien Jorge (jjorge@quarkslab.com)
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D90577
This is based on the same idea that I am using for the basic model implementation
and what I have partly already done for x86: throughput cost is number of
instructions/uops, so size/blended costs are identical except in special cases
(for example, fdiv or other known-expensive machine instructions or things like
MVE that may require cracking into >1 uop)).
Differential Revision: https://reviews.llvm.org/D90692
There is already an API in BasicBlock that checks and returns the musttail call if it precedes the return instruction.
Use it instead of manually checking in each place.
Differential Revision: https://reviews.llvm.org/D90693
InstCombine is quite aggressive in doing the opposite transform,
folding `add` of operands with no common bits set into an `or`,
and that not many things support that new pattern..
In this case, teaching Reassociate about it is easy,
there's preexisting art for `sub`/`shl`:
just convert such an `or` into an `add`:
https://rise4fun.com/Alive/Xlyv
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.
With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.
Differential Revision: https://reviews.llvm.org/D89896
This patch adds a new "heap type" operand kind to the WebAssembly MC
layer, used by ref.null. Currently the possible values are "extern" and
"func"; when typed function references come, though, this operand may be
a type index.
Note that the "heap type" production is still known as "refedtype" in
the draft proposal; changing its name in the spec is
ongoing (https://github.com/WebAssembly/reference-types/issues/123).
The register form of ref.null is still untested.
Differential Revision: https://reviews.llvm.org/D90608
As discussed on D90527, we should be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.
DAGCombine doesn't canonicalize rotl/rotr with immediate so we
need patterns for both.
Remove the custom matcher for rotl to RORI and just use a SDNodeXForm
to convert the immediate instead. Doing this gives priority to the
rev32/rev16 versions of grevi over rori since an explicit immediate
is more precise than any immediate. I also added rotr patterns for
rev32/rev16. And removed the (or (shl), (shr)) patterns that should be
combined to rotl by DAG combine.
There is at least one other grev pattern that probably needs a
another rotr pattern, but we need more test coverage first.
Differential Revision: https://reviews.llvm.org/D90575
As discussed on D90527, we should be be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.
The refactor to use the KnownBits fixed/min/max constant helpers allows us to hit a couple of cases that we were missing before.
We still need the getValidMinimumShiftAmountConstant case as KnownBits doesn't handle per-element vector cases.
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.
Differential Revision: https://reviews.llvm.org/D90419
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
These instructions use a scaled offset. We were wrongly selecting them
even when the required offset was not a multiple of the scale factor.
Differential Revision: https://reviews.llvm.org/D90607
This lets external consumers customize the output, similar to how
AssemblyAnnotationWriter lets the caller define callbacks when printing
IR. The array of handlers already existed, this just cleans up the code
so that it can be exposed publically.
Replaces https://reviews.llvm.org/D74158
Differential Revision: https://reviews.llvm.org/D89613
As discussed on D90527, we should be be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.
The refactor to use the KnownBits fixed/min/max constant helpers allows us to hit a couple of cases that we were missing before.
We still need the getValidMinimumShiftAmountConstant case as KnownBits doesn't handle per-element vector cases.