scalar_to_vector takes only one argument, not two.
The a16 tests now also check the packing of coordinates into registers
Differential Revision: https://reviews.llvm.org/D73482
This should lower the amount of used registers for gfx9.
I updated some of the changed tests with the update script because
changing them by hand is tedious.
Differential Revision: https://reviews.llvm.org/D73884
Previously the description allowed to describe symbols with use of
`Name` and `Index` keys. This patch removes them and now it is still
possible to use either names or symbol indexes, but the code is simpler
and the format is slightly different.
Such a change will be useful for another patches, e.g:
https://reviews.llvm.org/D73788#inline-671077
Differential revision: https://reviews.llvm.org/D73888
We currently only handle mem instructions with a single define.
Avoid the call site parameter debug info when we find the case with
multiple defs, rather than throwing an assert.
Differential Revision: https://reviews.llvm.org/D73954
Same for any_extend though we don't have coverage for that.
The test changes are because isel didn't check one use of the
setcc_carry. So in isel we would end up with two different
sized setcc_carry instructions. And since it clobbers
the flags we would need to recreate the flags for the second
instruction.
This code handles additional uses by truncating the new wide
setcc_carry back to the original size for those uses.
The old version might be faster on EG (RECIP_IEEE is Trans only),
but it'd need extra corner case checks.
This gives correct corner case behaviour and saves a register.
Fixes OCL CTS sqrt test (1-thread, scalar) on Turks.
Reviewer: arsenm
Differential Revision: https://reviews.llvm.org/D74017
Summary:
This reverts commit 3ef169e586. The
purpose of this commit was to allow stack machines to perform
instruction selection for instructions with variadic defs. However,
MachineInstrs fundamentally cannot support variadic defs right now, so
this change does not turn out to be useful.
Depends on D73927.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73928
Originally committed in: 1ced28cbe7
Reverted in: f75301d16d
(reverted due to tests failing on non-linux/x86 targets, tests have since been
generalized and specialized... since Split DWARF isn't supported on non-elf
targets anyway and we have no way to run on "whatever elf target is available"
so they fail on MacOS without an explicit target triple)
This code was incorrectly emitting extra bytes into arbitrary parts of
the object file when it was meant to be hashing them to compute the DWO
ID.
Follow-up patch(es) will refactor this API somewhat to make such bugs
harder to introduce, hopefully.
Since we don't support Split DWARF emission on non-ELF formats, hardcode
an elfine triple (we don't have a way to ask for "any ELF triple" it
seems, so hardcoded will have to do)
The compiler may transform the following code
ctx = ctx + reloc_offset
... (*(u32 *)ctx) & 0x8000 ...
to
ctx = ctx + reloc_offset
... (*(u8 *)(ctx + 1)) & 0x80 ...
where reloc_offset will be replaced with a constant during
AsmPrinter phase.
The above transformed code will be rejected the kernel verifier
as it does not allow
*(type *)((ctx + non_zero_offset1) + non_zero_offset2)
style access pattern.
It is hard at SelectionDag phase to identify whether a load
is related to context or not. Sometime, interprocedure analysis
may be needed. So let us simply prevent such optimization
from happening.
Differential Revision: https://reviews.llvm.org/D73997
Originally committed in: 552a8fe12b
Reverted in: f75301d16d
Reverted because it was running llc directly (rather than %llc_dwarf)
which uses COFF files on Windows which LLVM doesn't support all DWARF
features in.
This functionality isn't fully working, but sets up the testing for a
follow-on patch that demonstrates and fixes the brokenness related to
DWO ID hashing this construct.
Summary:
Moves a batch of instructions from unimplemented-simd128 to simd128
because they have recently become available in V8.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73926
Originally committed in: 5327b917e3
and follow on fix: 4f281f0474
Reverted in: 191a9a78b3
and: f75301d16d
Reverted because it wasn't portable between the targets it was running
on. Using %llc_dwarf ensures the target triple is always elfine and thus
DWARF compatible.
lrint/llrint are defined as rounding using the current rounding
mode. Numbers that can't be converted raise FE_INVALID and an
implementation defined value is returned. They may also write to
errno.
I believe this means we can use cvtss2si/cvtsd2si or fist to
convert as long as -fno-math-errno is passed on the command line.
Clang will leave them as libcalls if errno is enabled so they
won't become ISD::LRINT/LLRINT in SelectionDAG.
For 64-bit results on a 32-bit target we can't use cvtss2si/cvtsd2si
but we can use fist since it can write to a 64-bit memory location.
Though maybe we could consider using vcvtps2qq/vcvtpd2qq on avx512dq
targets?
gcc also does this optimization.
I think we might be able to do this with STRICT_LRINT/LLRINT as
well, but I've left that for future work.
Differential Revision: https://reviews.llvm.org/D73859
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.
Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.
Fixes PR44697
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D73752
https://reviews.llvm.org/D72312 introduced an infinite loop which involves
DAGCombiner::visitFMA and AMDGPUTargetLowering::performFNegCombine.
fma( a, fneg(b), fneg(c) ) => fneg( fma (a, b, c) ) => fma( a, fneg(b), fneg(c) ) ...
This only breaks with types where 'isFNegFree' returns flase, e.g. v4f32.
Reproducing the issue also needs the attribute 'no-signed-zeros-fp-math',
and no source mods allowed on one of the users of the Op.
This fix makes changes to indicate that it is not free to negate a fma if it
has users with source mods.
Differential Revision: https://reviews.llvm.org/D73939
Summary:
It can be useful to tune the default inline threshold without overriding other inlining thresholds (e.g. in code compiled for size).
The existing `-inline-threshold` flag overrides other thresholds, so it is insufficient in codebases where there is a mix of code compiled for size and speed.
Patch by Michael Holman <michael.holman@microsoft.com>
Reviewers: eraman, tejohnson
Reviewed By: tejohnson
Subscribers: tejohnson, mtrofin, davidxl, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73217
Prepare to accurately track the future denormal-fp-math attribute
changes. The way to actually set these separately is not wired in yet.
This is just a mechanical change, and mostly still assumes the input
and output mode match. This should be refined for some cases. For
example, fcanonicalize lowering should use the flushing variant if
either input or output flushing is enabled
Similar to D73680 (AArch64 BTI).
A local linkage function whose address is not taken does not need ENDBR32/ENDBR64. Placing the patch label after ENDBR32/ENDBR64 has the advantage that code does not need to differentiate whether the function has an initial ENDBR.
Also, add 32-bit tests and test that .cfi_startproc is at the function
entry. The line information has a general implementation and is tested
by AArch64/patchable-function-entry-empty.mir
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D73760
Linux commit
1cf5b23988 (diff-289313b9fec99c6f0acfea19d9cfd949)
uses "#pragma clang attribute push (__attribute__((preserve_access_index)),
apply_to = record)"
to apply CO-RE relocations to all records including the following pattern:
#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)
typedef struct {
int a;
} __t;
#pragma clang attribute pop
int test(__t *arg) { return arg->a; }
The current approach to use struct/union type in the relocation record will
result in an anonymous struct, which make later type matching difficult
in bpf loader. In fact, current BPF backend will fail the above program
with assertion:
clang: ../lib/Target/BPF/BPFAbstractMemberAccess.cpp:796: ...
Assertion `TypeName.size()' failed.
clang will change to use the type of the base of the member access
which will preserve the typedef modifier for the
preserve_{struct,union}_access_index intrinsics in the above example.
Here we adjust BPF backend to accept that the debuginfo
type metadata may be 'typedef' and handle them properly.
Differential Revision: https://reviews.llvm.org/D73902
This patch reverts part of r362750 / D62650, which stopped
LiveDebugVariables from trimming leading variable location ranges down
to only covering those instructions that are in scope. I've observed some
circumstances where the number of DBG_VALUEs in a function can be
amplified in an un-necessary way, to cover more instructions that are
out of scope, leading to very slow compile times. Trimming the range
of instructions that the variables cover solves the slow compile times.
The specific problem that r362750 tries to fix is addressed by the
assignment to RStart that I've added. Any variable location that begins
at the first instruction of a block will now be considered to begin at the
start of the block. While these sound the same, the have different
SlotIndexes, and the register allocator may shoehorn additional
instructions in between the two. The test added in the past
(wrong_debug_loc_after_regalloc.ll) still works with this modification.
live-debug-variables.ll has a range trimmed to not cover the prologue of
the function, while dbg-addr-dse.ll has a DBG_VALUE sink past one
instruction with no DebugLoc, which is expected behaviour.
Differential Revision: https://reviews.llvm.org/D73691
This is a bug noted in the recent D72733 and seen
in the similar transform just above the changed source code.
I added tests with illegal types and zexts to show the bug -
we could transform legal phi ops to illegal, etc. I did not add
tests with trunc because we won't see any diffs on those patterns.
That is because InstCombiner::SliceUpIllegalIntegerPHI() appears to
do those transforms independently of datalayout. It can also create
more casts than are present in existing code.
There are some existing regression tests that do not include a
datalayout that would be altered by this fix. I assumed that the
lack of a datalayout in those regression files is an oversight, so
I added the minimal layout (make i32 legal) necessary to preserve
behavior on those tests.
Differential Revision: https://reviews.llvm.org/D73907
Summary: Replace '-' in the error message with <stdin>. This is also consistent with another error message in the code.
Reviewers: jhenderson, probinson, jdenny, grimar, arichardson
Reviewed By: jhenderson
Subscribers: thopre, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73793
Under MVE, we do not have any lowering for fminimum, which a
vector_reduce_fmin without NoNan will be expanded into. As with the
other recent patches, force this to expand in the pre-isel pass. Note
that Neon lowering would be OK because the scalar fminimum uses the
vector VMIN instruction, but is probably better to just rely on the
scalar operations, which is what is done here.
Also fixes what appears to be the reversal of INF vs -INF in the
vector_reduce_fmin widening code.