This is currently built on top of the SelectionDAG call lowering, but
does not use it the same way. SelectionDAG passes legalized types to
the assignment functions, and the tablegenerated assignment functions
may change the value types expected for registers. This does not
change the types used, just moves the register creation to help fix
this in the future.
Defer the register creation until after all of the assignment
decisions have been made. This will also help have correct tail call
compatibility checking in a future change. Currently it does not work
as expected for any arguments split across multiple registers.
In GFX10 VOP3 can have a literal, which opens up the possibility of two
operands using the same literal value, which is allowed and only counts
as one use of the constant bus.
AMDGPUAsmParser::validateConstantBusLimitations already knew about this
but SIInstrInfo::verifyInstruction did not.
Differential Revision: https://reviews.llvm.org/D100770
This demotes the apple-a12 CPU selection for arm64e to just be the
last-resort default. Concretely, this means:
- an explicitly-specified -mcpu will override the arm64e default;
a user could potentially pick an invalid CPU that doesn't have
v8.3a support, but that's not a major problem anymore
- arm64e-apple-macos (and variants) will pick apple-m1 instead of
being forced to apple-a12.
apple-m1 has the same level of ISA support as apple-a14,
so this is a straightforward mechanical change. However, that
also means this inherits apple-a14's v8.5a+nobti quirkiness.
rdar://68287159
Based on D100682 and D99855.
(Note: I originally was going to just make this part of D99855, but I decided not to because this patch moves lots of unrelated code around, and I didn't want to make D99855 harder to review because of unrelated code-changes/moves.)
Differential Revision: https://reviews.llvm.org/D100686
* adds `iterator_traits` specialisation that supports all expected
member aliases except for `pointer`
* adds `iterator_traits` specialisations for iterators that meet the
legacy iterator requirements but might lack multiple member aliases
* makes pointer `iterator_traits` specialisation require objects
Depends on D99854.
Differential Revision: https://reviews.llvm.org/D99855
The generic SoftFloatVectorExtract.ll test was failing when run on arm
machines, as it tries to create a f64 under soft float. Limit the
transform to when f64 is legal.
Also add a missing override, as reported in D100244.
As reported in PR50025, sometimes we would end up not emitting functions
needed by inline multiversioned variants. This is because we typically
use the 'deferred decl' mechanism to emit these. However, the variants
are emitted after that typically happens. This fixes that by ensuring
we re-run deferred decls after this happens. Also, the multiversion
emission is done recursively to ensure that MV functions that require
other MV functions to be emitted get emitted.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D100495
std.xor ops on bool are lowered to spv.LogicalNotEqual. For Boolean values, xor
and not-equal are the same thing.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D100817
Mark MULHS/MULHU nodes as legal for both scalable and fixed SVE types,
and lower them to the appropriate SVE instructions.
Additionally now that the MULH nodes are legal, integer divides can be
expanded into a more performant code sequence.
Differential Revision: https://reviews.llvm.org/D100487
This adds a combine for extract(x, n); extract(x, n+1) ->
VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the
same time in a single instruction, and thanks to the other VMOVRRD folds
we have added recently can help reduce the amount of executed
instructions. Floating point types are very similar, but will include a
bitcast to an integer type.
This also adds a shouldRewriteCopySrc, to prevent copy propagation from
DPR to SPR, which can break as not all DPR regs can be extracted from
directly. Otherwise the machine verifier is unhappy.
Differential Revision: https://reviews.llvm.org/D100244
This is just a small update that makes sure that errors arising from
parsing command-line options are captured more visibly. Also, all
parsing methods will now consistently return either a bool ("may fail")
or void ("never fails").
An instance of `InputKind` coming from `-x` is added to
`FrontendOptions` rather then being returned from `ParseFrontendArgs`.
It's currently not used, but we will require it shortly. In particular,
once code-generation is available we will use it to differentiate
between LLVM IR and Fortran input. `FrontendOptions` is a very suitable
place to keep it.
This changes don't affect the error reporting in the driver. In this
respect these are non-functional-changes. However, it will simplify
things in the forthcoming patches in which we may need a better error
tracking/recovery mechanism.
Differential Revision: https://reviews.llvm.org/D100556
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D100495
Currently a CHECK-NOT directive succeeds whenever the corresponding
match fails. However match can fail due to an error rather than a lack
of match, for instance if a variable is undefined. This commit makes match
error a failure for CHECK-NOT.
Reviewed By: jdenny
Differential Revision: https://reviews.llvm.org/D86222
Instructions on the transcendental unit are executed in parallel to the
normal VALU, so add this as an extra resource.
This doesn't seem to have any effect, but it should be more correct.
Differential Revision: https://reviews.llvm.org/D100123
This is similar to https://reviews.llvm.org/D100309, i.e. `%f18` is
replaced with `%flang_new`.
resolve105.f90 wasn't in tree when D100309 was worked on, so it's
updated here instead.
label14.f90 requires `-fsyntax-only`. I didn't notice that when
submitting D100309, hence updating it now instead. `-fsyntax-only` is
required to prevent `%f18` from calling an external compiler (which then
fails and returns a non-zero exit code).
Differential Revision: https://reviews.llvm.org/D100655
Make sure that the `CriticalMemoryInstruction` of a memory group is invalidated
if it references an already executed instruction. This avoids a potential
use-after-free if the critical memory info becomes stale, and the value is
read after the instruction has executed.
Extend shuffle canonicalization and conversion of shuffles fed by vectorized
scalars to big endian subtargets. For big endian subtargets, loads and direct
moves of scalars into vector registers put the data in the correct element for
SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is
narrower, the value still ends up in the wrong place - althouth a different
wrong place than on little endian targets.
This patch extends the combine that keeps values where they are if they feed a
shuffle to big endian targets.
Differential revision: https://reviews.llvm.org/D100478
This implements an LLVM tool that's flag- and output-compatible
with macOS's `otool` -- except for bugs, but from testing with both
`otool` and `xcrun otool-classic`, llvm-otool matches vanilla
otool's behavior very well already. It's not 100% perfect, but
it's a very solid start.
This uses the same approach as llvm-objcopy: llvm-objdump uses
a different OptTable when it's invoked as llvm-otool. This
is possible thanks to D100433.
Differential Revision: https://reviews.llvm.org/D100583
The patch extends the vectorization pass to lower linalg index operations to vector code. It allocates constant 1d vectors that enumerate the indexes along the iteration dimensions and broadcasts/transposes these 1d vectors to the iteration space.
Differential Revision: https://reviews.llvm.org/D100373
A lit feature guards tests for the lit timeout functionality because on
most system it depends on the availability of the psutil Python module.
However, that feature is defined based on the ability of the testing lit
to cancel test, which does not necessarily apply to the ability of the
tested lit.
In particular, RUN commands have a cleared PYTHONPATH and user site
packages are disabled. In the case where psutil is found by the testing
lit from one of those two source of python path, the tested lit would
not be able to find it, causing timeout tests to fail.
This commit fixes the issue by testing the ability to cancel tests in
the RUN command environment.
Reviewed By: yln
Differential Revision: https://reviews.llvm.org/D99728
Previously, clang-format would erroneously merge import and export
statements. These need to be kept separate, as the semantics differ.
Differential Revision: https://reviews.llvm.org/D100752
The NSS FileCheck variables at the end of the
CodeGenCXX/split-stacks.cpp clang testcase are off by 1, resulting in
the use of an undefined variable (NSS3). One of the CHECK-NOT is also
redundant because _Z8tnosplitIiEiv uses the same attribute as _Z3foov
without split stack. This commit fixes that.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D99839
when the predicate used by last{a,b} specifies a known vector length.
For example:
aarch64_sve_lasta(VL1, D) -> extractelement(D, #1)
aarch64_sve_lastb(VL1, D) -> extractelement(D, #0)
Co-authored-by: Paul Walker <paul.walker@arm.com>
Differential Revision: https://reviews.llvm.org/D100476