PR25157 identifies a bug where a load plus a vector shuffle is
incorrectly converted into an LXVDSX instruction. That optimization
is only valid if the load is of a doubleword, and in the noted case,
it was not. This corrects that problem.
Joint patch with Eric Schweitz, who provided the bugpoint-reduced test
case.
llvm-svn: 250324
This patch teaches x86 fast-isel how to select nontemporal stores.
On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords.
Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal
vector stores.
Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti'
for doubleword/quadword nontemporal stores. In the case of nontemporal stores
of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of
movntps/movntpd/movntdq.
With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now
always get the expected (V)MOVNT instructions.
The lack of fast-isel support for nontemporal stores was spotted when analyzing
the -O0 codegen for nontemporal stores.
Differential Revision: http://reviews.llvm.org/D13698
llvm-svn: 250285
One of the changes in lib/Target/AMDGPU/AMDGPUMCInstLower.cpp was a new
one. Previously, bundle iterators and single-instruction iterators
could be compared to each other (comparing on underlying pointers).
I changed a comparison from using `MBB->end()` to using
`MBB->instr_end()`, since both end iterators should point at the some
place anyway.
I don't think the implicit conversion between the two iterator types is
a good idea since it's fairly easy to accidentally compare to the wrong
thing (they aren't always end iterators). Otherwise I would have just
added the conversion.
Even with that, no there should be functionality change here.
llvm-svn: 250218
Summary:
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags.
This patch adds the missing EFLAGS definition.
Floating point operations don't set flags, the subsequent fadd
optimization is therefore correct. The same applies for surrounding
load/store optimizations.
Reviewers: rsmith, rtrieu
Subscribers: llvm-commits, reames, morisset
Differential Revision: http://reviews.llvm.org/D13680
llvm-svn: 250135
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.
InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.
llvm-svn: 250129
We made them SP relative back in March (r233137) because that's the
value the runtime passes to EH functions. With the new cleanuppad IR,
funclets adjust their frame argument from SP to FP, so our offsets
should now be FP-relative.
llvm-svn: 250088
Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
assumption that for non-AVX512 targets, the source type and destination type
of a type-legalized setcc node were always the same type.
This assumption was unfortunately incorrect; the type legalizer is not always
able to promote the return type of a setcc to the same type as the first
operand of a setcc.
In the case of a vsetcc node, the legalizer firstly checks if the first input
operand has a legal type. If so, then it promotes the return type of the vsetcc
to that same type. Otherwise, the return type is promoted to the 'next legal
type', which, for vectors of MVT::i1 is always a 128-bit integer vector type.
Example (-mattr=+avx):
%0 = trunc <8 x i32> %a to <8 x i23>
%1 = icmp eq <8 x i23> %0, zeroinitializer
The initial selection dag for the code above is:
v8i1 = setcc t5, t7, seteq:ch
t5: v8i23 = truncate t2
t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
t7: v8i32 = build_vector of all zeroes.
The type legalizer would firstly check if 't5' has a legal type. If so, then it
would reuse that same type to promote the return type of the setcc node.
Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
promote the return type of the setcc node. Consequently, the setcc return type
is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
following dag node:
v8i16 = setcc t32, t25, seteq:ch
where t32 and t25 are now values of type v8i32.
Before this patch, function LowerVSETCC would have wrongly expanded the setcc
to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
instruction. In our case, ISel would have matched a VPCMPEQWrr:
t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25
However, t36 and t25 are both VR256, while the result type is instead of class
VR128. This inconsistency ended up causing the insertion of COPY instructions
like this:
%vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3
Which is an invalid full copy (not a sub register copy).
Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
instruction" in the attempt to expand the malformed pseudo COPY instructions.
This patch fixes the problem adding the missing logic in LowerVSETCC to handle
the corner case of a setcc with 128-bit return type and 256-bit operand type.
This problem was originally reported by Dimitry as PR25080. It has been latent
for a very long time. I have added the minimal reproducible from that bugzilla
as test setcc-lowering.ll.
Differential Revision: http://reviews.llvm.org/D13660
llvm-svn: 250085
Summary:
This removes unnecessary instructions when extracting from an undefined register
and also fixes a crash for O32 when passing undef to a double argument in
held in integer registers.
Reviewers: vkalintiris
Subscribers: llvm-commits, zoran.jovanovic, petarj
Differential Revision: http://reviews.llvm.org/D13467
llvm-svn: 250039
The Swift Machine Scheduler Model is incomplete. There are instructions
missing which can trigger the "incomplete machine model" abort. This was
observed when a downstream SchedMachineModel was added to the ARM
target.
Patch by Christof Douma!
llvm-svn: 250033
This patch fixes a problem in function 'combineX86ShuffleChain' that causes a
chain of shuffles to be wrongly folded away when the combined shuffle mask has
only one element.
We may end up with a combined shuffle mask of one element as a result of
multiple calls to function 'canWidenShuffleElements()'.
Function canWidenShuffleElements attempts to simplify a shuffle mask by widening
the size of the elements being shuffled.
For every pair of shuffle indices, function canWidenShuffleElements checks if
indices refer to adjacent elements. If all pairs refer to "adjacent" elements
then the shuffle mask is safely widened. As a consequence of widening, we end up
with a new shuffle mask which is half the size of the original shuffle mask.
The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero
indices. Function canWidenShuffleElements would combine each pair of
SM_SentinelZero indices into a single SM_SentinelZero index. So, in a
logarithmic number of steps (4 in this case), the pshufb mask is simplified to
a mask with only one index which is equal to SM_SentinelZero.
Before this patch, function combineX86ShuffleChain wrongly assumed that a mask
of size one is always equivalent to an identity mask. So, the entire shuffle
chain was just folded away as the combined shuffle mask was treated as a no-op
mask.
With this patch we know check if the only element of a combined shuffle mask is
SM_SentinelZero. In case, we propagate a zero vector.
Differential Revision: http://reviews.llvm.org/D13364
llvm-svn: 250027
The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646).
llvm-svn: 249976
expandPostRAPseudo():
STX -> 2 * STD: The first STD should not have the kill flag set for the address.
SystemZElimCompare:
BRC -> BRCT conversion: Don't forget to remove the CC<use,kill> operand.
Needed to make SystemZ/asm-17.ll pass with -verify-machineinstrs, which
now runs with this flag.
Reviewed by Ulrich Weigand.
llvm-svn: 249945
The new implementation works at least as well as the old implementation
did.
Also delete the associated preparation tests. They don't exercise
interesting corner cases of the new implementation. All the codegen
tests of the EH tables have already been ported.
llvm-svn: 249918
x64 catchpads use rax to inform the unwinder where control should go
next. However, we must initialize rax before the epilogue sequence so
as to not perturb the unwinder.
llvm-svn: 249910
This occurred due to introducing the invalid i64 type after type
legalization had already finished, in an attempt to workaround bitcast
f64 -> v2i32 not doing constant folding.
The *right* thing is to actually fix bitcast, but that has other
complications. So, for now, just get rid of the broken workaround, and
check in a test-case showing that it doesn't crash, with TODOs for
emitting proper code.
llvm-svn: 249908
When running combine on an extract_vector_elt, it wants to look through
a bitcast to check if the argument to the bitcast was itself an
extract_vector_elt with particular operands.
However, it called getOperand() on the argument to the bitcast *before*
checking that the opcode was EXTRACT_VECTOR_ELT, assert-failing if there
were zero operands for the actual opcode.
Fix, and add trivial test.
llvm-svn: 249891
After r249764, if you didn't see the full context, it looked like
`std::next(I)` would get the same result as
`++MachineBasicBlock::iterator(I)`. However, `I` is a `MachineInstr*`
(not a `MachineBasicBlock::iterator`).
Use the `getIterator()` helper I added later (r249782) to make this code
more clear.
llvm-svn: 249852
This patch corresponds to review:
http://reviews.llvm.org/D12032
This patch builds onto the patch that provided scalar to vector conversions
without stack operations (D11471).
Included in this patch:
- Vector element extraction for all vector types with constant element number
- Vector element extraction for v16i8 and v8i16 with variable element number
- Removal of some unnecessary COPY_TO_REGCLASS operations that ended up
unnecessarily moving things around between registers
Not included in this patch (will be in upcoming patch):
- Vector element extraction for v4i32, v4f32, v2i64 and v2f64 with
variable element number
- Vector element insertion for variable/constant element number
Testing is provided for all extractions. The extractions that are not
implemented yet are just placeholders.
llvm-svn: 249822
LLCH, LLHH and CLIH had the wrong register classes for the def-operand.
Tie operands if changing opcode to an instruction with tied ops.
Comment typo fix.
These fixes were needed in order to make regression test case
SystemZ/asm-18.ll pass with -verify-machineinstrs (not used by
default).
Reviewed by Ulrich Weigand.
llvm-svn: 249811
Let parseRegister() allow RegFP Group if expecting RegV Group, since the
%f register prefix yields the FP group even while used with vector instructions.
Reviewed by Ulrich Weigand.
llvm-svn: 249810
Accept r11 when targeting Windows on ARM rather than just low registers.
Because we are in a thumb-2 only mode, this may be slightly more expensive in
code size, but results in better code for the environment since it spills the
frame register, which is generally desired for fast stack walking as per the
ABI.
llvm-svn: 249804
Stop using `getNextNode()` to get an insertion point (at least, in this
one place). Instead, use iterator logic directly.
The `getNextNode()` interface isn't actually supposed to work for
creating iterators; it's supposed to return `nullptr` (not a real
iterator) if this is the last node. It's currently broken and will
"happen" to work, but if we ever fix the function, we'll get some
strange failures in places like this.
llvm-svn: 249764
Stop using `getNextNode()` to create an insertion point for machine
instructions (at least, in this one place). Instead, use an iterator.
As a drive-by, clean up dump statements to use iterator logic.
The `getNextNode()` interface isn't actually supposed to work for
insertion points; it's supposed to return `nullptr` if this is the last
node. It's currently broken and will "happen" to work, but if we ever
fix the function, we'll get some strange failures.
llvm-svn: 249758
its own variable.
This is needed so that we can explicitly turn off MMX without turning
off SSE and also so that we can diagnose feature set incompatibilities
that involve MMX without SSE.
Rationale:
// sse3
__m128d test_mm_addsub_pd(__m128d A, __m128d B) {
return _mm_addsub_pd(A, B);
}
// mmx
void shift(__m64 a, __m64 b, int c) {
_mm_slli_pi16(a, c);
_mm_slli_pi32(a, c);
_mm_slli_si64(a, c);
_mm_srli_pi16(a, c);
_mm_srli_pi32(a, c);
_mm_srli_si64(a, c);
_mm_srai_pi16(a, c);
_mm_srai_pi32(a, c);
}
clang -msse3 -mno-mmx file.c -c
For this code we should be able to explicitly turn off MMX
without affecting the compilation of the SSE3 function and then
diagnose and error on compiling the MMX function.
This matches the existing gcc behavior and follows the spirit of
the SSE/MMX separation in llvm where we can (and do) turn off
MMX code generation except in the presence of intrinsics.
Updated a couple of tests, but primarily tested with a couple of tests
for turning on only mmx and only sse.
This is paired with a patch to clang to take advantage of this behavior.
llvm-svn: 249731
o Before this patch, BPF backend will expand UNDEF node
to i64 constant 0.
o For second pass of dag combiner, legalizer will run through
each to-be-processed dag node.
o If any new SDNode is generated and has an undef operand,
dag combiner will put undef node, newly-generated constant-0 node,
and any node which uses these nodes in the working list.
o During this process, it is possible undef operand is
generated again, and this will form an infinite loop
for dag combiner pass2.
o This patch allows UNDEF to be a legal type.
Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
llvm-svn: 249718
This fixes yet another scenario where tryBuildVectorShuffle would
attempt to create a BUILD_VECTOR node with an invalid combination
of types. This can happen if the incoming BUILD_VECTOR has elements
of a type different from the vector element type, which is allowed
in certain cases as long as they are all the same type.
When one of these elements is used in the residual vector, and
UNDEF elements are added to fill up the residual vector, those
UNDEFs then have to use the type of the original element, not
the vector element type, or else the resulting BUILD_VECTOR
will have an invalid type combination.
llvm-svn: 249706
We emit 1 compact unwind encoding per function, and this can’t represent
the varying stack pointer that will be generated by X86CallFrameOptimization.
Disable the optimization on Darwin.
(It might be possible to split the function into multiple ranges
and emit 1 compact unwind info per range. The compact unwind emission
code isn’t ready for that and this kind of info certainly isn’t
tested/used anywhere. It might be worth exploring this path if we want
to get the space savings at some point though)
llvm-svn: 249694
This instructions doesn't have intrincis.
Added tests for lowering and encoding.
Differential Revision: http://reviews.llvm.org/D12317
llvm-svn: 249688
This fixes two separate bugs:
1) The mask for the high lane was not set correctly. That fixes PR24532.
2) The transformation should bail out if it believes it involves more than
2 lanes, as it does not currently do anything sensible in this case.
Differential Revision: http://reviews.llvm.org/D13505
llvm-svn: 249669
Compare elimination extended to recognize load-and-test instructions used
for comparison and eliminate them the same way as with compare instructions.
Test case fp-cmp-05.ll updated to expect optimized results now also for z13.
The order of instruction shortening and compare elimination passes have been
changed so that opcodes do not have to be handled in both passes.
Reviewed by Ulrich Weigand.
llvm-svn: 249666
The following instruction shortening transformations would introduce a
definition of the CC reg, so therefore liveness of CC reg must be checked:
WFADB -> ADBR
WFSDB -> SDBR
Also add the CC reg implicit def operand to the MI in case of change of opcode.
Reviewed by Ulrich Weigand.
llvm-svn: 249665
Since the LTxBRCompare instructions can't be used with vector registers, a
normal load-and-test instruction (with a modelled def operand) is used instead.
Reviewed by Ulrich Weigand.
llvm-svn: 249664
In r224059, we started verifying after addPass, but missed doing so on
insertPass. There isn't a good reason for the discrepancy, and
skipping the verifier in these cases causes bugs.
This also exposes a verifier error that was introduced in r249087, but
the verifier doesn't run until after the register coalescer, when the
issue happens to have been resolved. I've skipped the verifier after
SIFixSGPRLiveRangesID to avoid the failures for now and will follow up
with Matt for a proper fix.
llvm-svn: 249643
In particular, passing non-trivially copyable objects by value on win32
uses a dynamic alloca (inalloca). We would clobber ESP in the epilogue
and end up returning to outer space.
llvm-svn: 249637
I'll be using the function in a similar combine for AArch64. The helper was
also improved to handle undef values.
Part of http://reviews.llvm.org/D13442
llvm-svn: 249572
The ARM RTABI defines the half- to single-precision float conversion functions
with an __aeabi prefix, but libgcc only has them with a __gnu prefix. Therefore
we need to emit the __aeabi version when compiling with an eabi or eabihf
triple, and the __gnu version with a gnueabi or gnueabihf triple.
llvm-svn: 249565
Without an additional check for NEON, the compiler crashes during
legalization of NEON ldN/stN.
Differential Revision: http://reviews.llvm.org/D13508
llvm-svn: 249550
When outgoing function arguments are passed using push instructions, and EH
is enabled, we may need to indicate to the stack unwinder that the stack
pointer was adjusted before the call.
This should fix the exception handling issues in PR24792.
Differential Revision: http://reviews.llvm.org/D13132
llvm-svn: 249522
Because of the constant bus requirement, it is never legal to
use a literal constant for these instructions despite the encoding
allowing it. This was already doing the right thing, but note why.
llvm-svn: 249500
This stops using an unknown reg class operand.
Currently build_vector selection has a broken looking check
where it tries to use a VGPR reg class and an SGPR one if it
sees an SGPR use.
With the source operand has an explicit VGPR class,
illegal copies will be inserted that SIFixSGPRCopies will take care
of normally later, which will allow removing the weird check
of build_vector users. Without this, when removed v_movrels_b32 would
still be emitted even though all of the values were only stored in
SGPRs.
llvm-svn: 249494
I'm not sure why this would be necessary, and no tests fail with
them removed. Looking at the uses is suspect as well because
the use reg classes will likely change when the users are moved
as a result of moving this instruction.
llvm-svn: 249493
Summary:
We currently ignore the calling convention, so there is no real reason to
assert on the calling convention of functions.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D13367
llvm-svn: 249468
Summary:
- Add CoreCLR to if/else ladders and switches as appropriate.
- Rename isMSVCEHPersonality to isFuncletEHPersonality to better
reflect what it captures.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: pgavlin, AndyAyers, llvm-commits
Differential Revision: http://reviews.llvm.org/D13449
llvm-svn: 249455
Summary:
The assembly printing of these is still missing the encoding size
suffix, but this will be fixed in a later commit.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D13436
llvm-svn: 249424
Summary:
This fixes 7 tests during fast LLVM test-suite run:
* MultiSource/Benchmarks/McCat/18-imp/imp
* MultiSource/Applications/oggenc/oggenc
* MultiSource/Benchmarks/MallocBench/gs/gs
* MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer
* MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame
* MultiSource/Benchmarks/Bullet/bullet
Error message was in the form of:
fatal error: error in backend: Cannot select: 0x95c3288: f32 = fsqrt 0x95c0190 [ORD=9] [ID=18]
0x95c0190: f32 = fadd 0x95bef30, 0x95c4d00 [ORD=8] [ID=17]
0x95bef30: f32 = fmul 0x95c4988, 0x95c4988 [ORD=5] [ID=16]
...
There was problem with selecting sqrt instruction in LLVM backend.
To fix the issue changes are made in TableGen definition for sqrt instruction in MipsInstrFPU.td and new test file sqrt.ll is added to LLVM regression tests.
Patch by Zlatko Buljan
Reviewers: zoran.jovanovic, hvarga, dsanders
Subscribers: llvm-commits, petarj
Differential Revision: http://reviews.llvm.org/D13235
llvm-svn: 249416
For the program like below
struct key_t {
int pid;
char name[16];
};
extern void test1(char *);
int test() {
struct key_t key = {};
test1(key.name);
return 0;
}
For key.name, the llc/bpf may generate the below code:
R1 = R10 // R10 is the frame pointer
R1 += -24 // framepointer adjustment
R1 |= 4 // R1 is then used as the first parameter of test1
OR operation is not recognized by in-kernel verifier.
This patch introduces an intermediate FI_ri instruction and
generates the following code that can be properly verified:
R1 = R10
R1 += -20
Patch by Yonghong Song <yhs@plumgrid.com>
llvm-svn: 249371
Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister.
llvm-svn: 249370
This new syntax is built around putting each instruction on its own line
in a "mnemonic op, op, op" like syntax. It also uses conventional data
section directives like ".byte" and so on rather than requiring everything
to be in hierarchical S-expression format. This is a more natural syntax
for a ".s" file format from the perspective of LLVM MC and related tools,
while remaining easy to translate into other forms as needed.
llvm-svn: 249364
The CATCHRET operand did not match the MachineFunction's CFG. This
mismatch happened because FrameLowering created a new MachineBasicBlock
and updated the CFG but forgot to update the CATCHRET operand.
Let's make sure this doesn't happen again by strengthing the funclet
membership analysis: it can now reason about the membership of all basic
blocks, not just those inside of funclets.
llvm-svn: 249344
Summary:
We are currently only using these aliases for VOPC instructions,
but this helper will make it easier to use them everywhere.
These aliases allow for the automatic matching of instructions
with forced 32-bit encoding. Eventually, we should be able to remove
the custom C++ logic we have for this in the assembler.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D13396
llvm-svn: 249330
We were previously codegen'ing memcpy as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach the
register allocator to allocate the registers in ascending order. This never got
implemented, and up to now we've been stuck with very poor codegen.
A much simpler approach for achieving better codegen is to create MEMCPY pseudo
instructions, attach scratch virtual registers to them and then, post register
allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers.
The register allocator will have picked arbitrary registers which we sort when
expanding the MEMCPY. This approach also avoids the need to repeatedly calculate
offsets which ultimately ought to be eliminated pre-RA in order to decrease
register pressure.
Fixes PR9199 and PR23768.
[This is based on Peter Collingbourne's r238473 which was reverted.]
Differential Revision: http://reviews.llvm.org/D13239
Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
Author: Peter Collingbourne <peter@pcc.me.uk>
llvm-svn: 249322
"msr pan, #imm", while only 1-bit immediate values should be valid.
Changed encoding and decoding for msr pstate instructions.
Differential Revision: http://reviews.llvm.org/D13011
llvm-svn: 249313
Summary:
An instruction like "(d)la $5, symbol+8" previously would have crashed the
assembler as it contains an expression. This is now fixed.
A few tests cases have also been changed to reflect these changes, however
these should only be syntax changes. Some new test cases have also been
added.
Patch by Scott Egerton.
Reviewers: vkalintiris, dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12760
llvm-svn: 249311
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.
With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.
In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.
This patch is a joint work by Maxim Ostapenko andy myself.
llvm-svn: 249303
The custom lowering in LowerExtendedLoad is doing the equivalent shuffle, so make use of existing lowering code to reduce duplication.
llvm-svn: 249243
We previously stopped producing Thumb2 relaxations when they weren't supported,
but only diagnosed the case where an actual relocation was produced. We should
also tell people if local symbols aren't going to work rather than silently
overflowing.
llvm-svn: 249164
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.
Example:
%1 = bitcast <4 x i31> %V to <2 x i64>
Before (-fast-isel -fast-isel-abort=1):
FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>
Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.
Originally reviewed here: http://reviews.llvm.org/D13347
llvm-svn: 249147
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.
Example:
%1 = bitcast <4 x i31> %V to <2 x i64>
Before (-fast-isel -fast-isel-abort=1):
FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>
Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.
Differential Revision: http://reviews.llvm.org/D13347
llvm-svn: 249121
Replace LiveInterval usage with LiveVariables. LiveIntervals
computes far more information than is needed for this pass
which just needs to find if an SGPR is live out of the
defining block.
LiveIntervals are not usually available that early, requiring
computing them twice which is very expensive. The extra run of
LiveIntervals/LiveVariables/SlotIndexes was costing in total
about 5% of compile time.
Continuing to use LiveIntervals is problematic. It seems
there is an option (early-live-intervals) to run the analysis
about where it should go to avoid recomputing LiveVariables,
but it seems to be completely broken with subreg liveness
enabled. There are also problems from trying to recompute
LiveIntervals since this seems to undo LiveVariables
and clearing kill flags, causing TwoAddressInstructions
to make bad decisions.
Insert the pass right after live variables and preserve it.
The tricky case to worry about might be phis since
LiveVariables doesn't count a register as live out if
in the successor block it is only used in a phi,
but I don't think this is a concern right now
because SIFixSGPRCopies replaces SGPR phis.
llvm-svn: 249087