This will use the python that LLVM was configured to use rather than
python from PATH.
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D105224
The combine was disabled in 4e22c7265d as it caused failures in
the ppc64be-multistage (bootstrap) bot.
It turns out that the combine did not correctly update the MMO for
the high load which caused aliased stores to be reported as unaliased.
This patch fixes that problem and re-enables the combine.
There are cases where infer address spaces pass cannot yet
infer an address space in the opt pipeline and then in the
llc pipeline it runs too late for atomic expand pass to
benefit from a specific address space.
Move atomic expand pass past the infer address spaces.
Fixes: SWDEV-293410
Differential Revision: https://reviews.llvm.org/D105511
This applies to memory accesses to (compile-time) constant addresses
(such as memory-mapped registers). Currently when a misaligned access
to such an address is detected, a fatal error is reported. This change
will emit a remark, and the compilation will continue with a trap,
and "undef" (for loads) emitted.
This fixes https://llvm.org/PR50838.
Differential Revision: https://reviews.llvm.org/D50524
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.
There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.
The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.
As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base. This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.
This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes). The resulting changes
seem okay to me, but suggestions welcome. As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.
I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out
how to repair it to test what it was actually trying to test.
Recommitting with fix to MemoryDepChecker::isDependent.
Differential Revision: https://reviews.llvm.org/D104806
These are fp->int conversions using either RMM or dynamic rounding modes.
The lround and lrint opcodes have a return type of either i32 or
i64 depending on sizeof(long) in the frontend which should follow
xlen. llround/llrint should always return i64 so we'll need a libcall
for those on rv32.
The frontend will only emit the intrinsics if -fno-math-errno is in
effect otherwise a libcall will be emitted which will not use
these ISD opcodes.
gcc also does this optimization.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D105206
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.
There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.
The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.
As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base. This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.
This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes). The resulting changes
seem okay to me, but suggestions welcome. As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.
I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out
how to repair it to test what it was actually trying to test.
Differential Revision: https://reviews.llvm.org/D104806
The odd register of a (128 bit) register pair is accessed with the 'N' code
with an inline assembly operand.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D105502
This adds a DAG combine to detect sext/zext inputs and emit a
new ISD opcode. The extends will either be removed or replaced
with narrower extends.
Isel patterns are used to match add and widening mul to vwmacc
similar to the recently added vmacc patterns.
There's still some work to be to match vmulsu.
We should also rewrite splats that were extended as scalars and
then splatted.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D104802
Benchmarking has shown that it is worthwhile to implement a variable length
memset of 0 with XC (exclusive or) like gcc does, instead of using a libcall.
This requires the use of the EXecute Relative Long (EXRL) instruction which
can now be done in a framework that can also be used with other target
instructions (not just XC).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D103865
This avoids the use of the vector unit for copying from scalar to
vector. There is an extra ptrue instruction, but a predicate register
with the ptrue pattern populated is likely to be free in the context of
real code.
Tests were generated from a template to cover the axes mentioned at the
top of the test file.
Co-authored-by: Francesco Petrogalli <francesco.petrogalli@arm.com>
Differential Revision: https://reviews.llvm.org/D103170
Set informational fields in the .shader_functions table.
Also correct the documentation, .scratch_memory_size and .lds_size are
integers.
Differential Revision: https://reviews.llvm.org/D105116
This patch implaments the load and reserve and store conditional
builtins for the PowerPC target, in order to have feature parody with
xlC on AIX.
Differential revision: https://reviews.llvm.org/D105236
For the following case:
t8: i32 = or t7, t4
t10: i32 = ORRWrs t8, t8, TargetConstant:i32<73>
Current code wrongly returns (t8 >> shiftConstant) as the
UsefulBits of t8, which in fact is (t8 | (t8 >> shiftConstant)).
Reviewed by: sdesmalen, mdchen
Differential Revision: https://reviews.llvm.org/D102759
This patch fixes PR50823.
The shuffle mask should be twisted twice before gotten the correct one due to the difference between inner HOP and outer.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D104903
Improve codegen when lowering the common vector shuffle case from the
vectorizer (op1[last]:op2[0:last-1]). This patch only handles this
common case as it is difficult to handle this more generally when using
fixed length vectors, due to being unable to use the SVE ext instruction.
Differential Revision: https://reviews.llvm.org/D105289
Added support to check if architecture supports s_mulhi which is used as part of
the decision whether or not to use valu 24 bit mul (if the mulhi gets
transformed to a valu op anyway, then may as well use it).
This is an extension of the work in D97063
Differential Revision: https://reviews.llvm.org/D103321
Change-Id: I80b1323de640a52623d69ac005a97d06a5d42a14
This replaces the current ad-hoc implementation,
by syncing the code from InstCombine's implementation in `InstCombinerImpl::visitUnreachableInst()`,
with one exception that here in SimplifyCFG we are allowed to remove EH instructions.
Effectively, this now allows SimplifyCFG to remove calls (iff they won't throw and will return),
arithmetic/logic operations, etc.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D105374
This adds simple patterns for signed and unsigned saturating extract
narrow instructions. They combine a min/max/truncate into a single
instruction, providing that the immediates on the min/max are correct
for the saturation type. This is just handled in tablegen with some
extra patterns.
v2i64->v2i32 is not handled here as the min/max nodes are not legal,
making the lowering quite different.
Differential Revision: https://reviews.llvm.org/D103263
Allocate non-volatile registers in order to be compatible with ABI, regarding gpr_save.
Quoted from https://www.ibm.com/docs/en/ssw_aix_72/assembler/assembler_pdf.pdf page55,
> The preferred method of using GPRs is to use the volatile registers first. Next, use the nonvolatile registers
> in descending order, starting with GPR31.
This patch is based on @jsji 's initial draft.
Tested on test-suite and SPEC, found no degradation.
Reviewed By: jsji, ZarkoCA, xingxue
Differential Revision: https://reviews.llvm.org/D100167
We're trying to match a few pointer computation patterns here for
re-association opportunities.
1) Isolating a constant operand to be on the RHS, e.g.:
G_PTR_ADD(BASE, G_ADD(X, C)) -> G_PTR_ADD(G_PTR_ADD(BASE, X), C)
2) Folding two constants in each sub-tree as long as such folding
doesn't break a legal addressing mode.
G_PTR_ADD(G_PTR_ADD(BASE, C1), C2) -> G_PTR_ADD(BASE, C1+C2)
AArch64 code size improvements on CTMark with -Os:
Program before after diff
pairlocalalign 251048 251044 -0.0%
consumer-typeset 421820 421812 -0.0%
kc 431348 431320 -0.0%
SPASS 413404 413300 -0.0%
clamscan 384396 384220 -0.0%
tramp3d-v4 370640 370412 -0.1%
lencod 432096 431772 -0.1%
bullet 479400 478796 -0.1%
sqlite3 288504 288072 -0.1%
7zip-benchmark 573796 570768 -0.5%
Geomean difference -0.1%
Differential Revision: https://reviews.llvm.org/D105069
Add a flag so that target can choose to use AsmParser for parsing inline asm.
And set the flag by default for AIX.
-no-intergrated-as will override this default if specified explicitly.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D105314
Fixes bugs [[ https://bugs.llvm.org/show_bug.cgi?id=50580 | 50580 ]] and [[ https://bugs.llvm.org/show_bug.cgi?id=49446 | 49446 ]]
When compiling with -g "DBG_VALUE <reg>" instructions are added in the MIR, if such a instruction is inserted between instructions that use <reg> then MachineCopyPropagation invalidates <reg> , this causes some copies to not be propagated and causes differences in code generation (ex bugs 50580 and 49446 ). DBG_VALUE instructions should be ignored since they don't actually modify the register.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D104394
While this should not matter for most architectures (where the program
address space is 0), it is important for CHERI (and therefore Arm Morello).
We use address space 200 for all of our code pointers and without this
change we assert in the SelectionDAG handling of BlockAddress nodes.
It is also useful for AVR: previously programs targeting
AVR that attempt to read their own machine code
via a pointer to a label would instead read from RAM
using a pointer relative to the the start of program flash.
Reviewed By: dylanmckay, theraven
Differential Revision: https://reviews.llvm.org/D48803
Reland of 31859f896.
This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.
Differential Revision: https://reviews.llvm.org/D104797
In `IRTranslator::translateGetElementPtr`, when we run into a vector gep with
some scalar operands, we try to normalize those operands using
`buildSplatVector`.
This is fine except for when the getelementptr has a <1 x N> type. In that case
it is treated as a scalar. If we run into one of these then every call to
```
// With VectorWidth = 1
LLT::fixed_vector(VectorWidth, PtrTy)
```
will assert.
Here's an example (equivalent to the added testcase):
https://godbolt.org/z/hGsTnMYdW
To get around this, this patch adds a variable, `WantSplatVector`, which
is true when our vector type ought to actually be represented using a vector.
When it's false, we'll translate as a scalar. This checks if `VectorWidth > 1`.
This fixes this bug:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35496
Differential Revision: https://reviews.llvm.org/D105316
Target-independent code only knows how to spill to the stack; instead,
use AArch64ISD::REINTERPRET_CAST.
Differential Revision: https://reviews.llvm.org/D104573
D104868 removed an (incorrect) fold for distributing BFI instructions in
a chain, combining them into a single instruction. BFIs like that are
hard to test, as the patterns are often destroyed before they become
BFIs. But it can come up in places, with chains of BFIs that can be
combined.
This patch adds a replacement, which reassociates BFI instructions with
non-overlapping insertion masks so that low bits are inserted first.
This can end up sorting the nodes so that adjacent inserts are next to
one another, allowing the existing folds to combine into a single BFI.
Differential Revision: https://reviews.llvm.org/D105096
Inserting into a smaller-than-legal scalable vector would result in an
internal compiler error. For example, inserting a <vscale x 4 x i8> into
a <vscale x 8 x i8> (both illegal vector types for SVE) would cause a
crash.
This crash was happening because there was no code to promote (legalise)
the result of an INSERT_SUBVECTOR node.
This patch implements PromoteIntRes_INSERT_SUBVECTOR, which legalises
the ISD node. This is currently done by going through memory. This is
necessary because of the requirement that the SubVec parameter of the
INSERT_SUBVECTOR node must be smaller than the Vec parameter, which
means that INSERT_SUBVECTOR cannot always have a legal result/operand
types.
Co-Authored-by: Joe Ellis <joe.ellis@arm.com>
Differential Revision: https://reviews.llvm.org/D102766