when constraint "w" is used on a 32-bit operand.
This enables compiling the following code, which used to error out in
the backend:
void foo1(int a) {
asm volatile ("sqxtn h0, %s0\n" : : "w"(a):);
}
Fixes PR28633.
llvm-svn: 276344
Summary:
This change also changes findMatchingInsn and
findMatchingUpdateInsnForward to take DBG_VALUE opcodes into account
when tracking register defs and uses, which could potentially inhibit
these optimizations in the presence of debug information.
Reviewers: mcrosier
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22582
llvm-svn: 276293
At -O0, cmpxchg survives AtomicExpand: it's mostly straightforward
to select it in fast-isel, and let the pseudo be expanded later.
extractvalues on the result are the tricky part: the generic logic
only works for legal types (and it would be painful to make it
support illegal types), so we can only support i32/i64 cmpxchg.
llvm-svn: 276183
This should be all the low-level instruction selection needs to determine how
to implement an operation, with the remaining context taken from the opcode
(e.g. G_ADD vs G_FADD) or other flags not based on type (e.g. fast-math).
llvm-svn: 276158
Inference of the 'returned' attribute was fixed in r276008, lets try
turning the backend support back on.
This reverts commit r275677.
llvm-svn: 276081
r275042 reverted function-attribute inference for the 'returned' attribute
because the feature triggered self-hosting failures on ARM and AArch64. James
Molloy determined that the this-return argument forwarding feature, which
directly ties the returned input argument to the returned value, was the cause.
It seems likely that this forwarding code contains, or triggers, a subtle bug.
Disabling for now until we can track that down.
llvm-svn: 275677
Summary:
Instead, we take a single flags arg (a bitset).
Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.
This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted. It also greatly simplifies the process of adding another flag
to getLoad.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D22249
llvm-svn: 275592
Summary:
Previously we took an unsigned.
Hooray for type-safety.
Reviewers: chandlerc
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D22282
llvm-svn: 275591
If a subtarget has both ZCZeroing and CustomCheapAsMoveHandling features (now
only Kryo has both), set COPY (W|X)ZR isAsCheapAsAMove.
Differential Revision: http://reviews.llvm.org/D22360
llvm-svn: 275503
Summary:
Make the target-specific flags in MachineMemOperand::Flags real, bona
fide enum values. This simplifies users, prevents various constants
from going out of sync, and avoids the false sense of security provided
by declaring static members in classes and then forgetting to define
them inside of cpp files.
Reviewers: MatzeB
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22372
llvm-svn: 275451
Summary:
- Give it a shorter name (because we're going to refer to it often from
SelectionDAG and friends).
- Split the flags and alignment into separate variables.
- Specialize FlagsEnumTraits for it, so we can do bitwise ops on it
without losing type information.
- Make some enum values constants in MachineMemOperand instead.
MOMaxBits should not be a valid Flag.
- Simplify some of the bitwise ops for dealing with Flags.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22281
llvm-svn: 275438
If a subtarget has both ZCZeroing and CustomCheapAsMoveHandling features (now
only Kryo has both), set FMOVS0 and FMOVD0 isAsCheapAsAMove.
Differential Revision: http://reviews.llvm.org/D22256
llvm-svn: 275178
Avoid implicit conversions from MachineInstrBundleInstr to MachineInstr*
in the AArch64 backend, mainly by preferring MachineInstr& over
MachineInstr* when a pointer isn't nullable.
llvm-svn: 274924
Stop using an implicit conversion from the return of
MachineBasicBlock::getFirstTerminator to MachineInstr*. In two cases,
directly dereference to a MachineInstr& since later code assumes it's
valid. In a third case, change to an iterator since later code checks
against MachineBasicBlock::end.
Although the fix for the third case avoids undefined behaviour, I expect
this doesn't cause a functionality change in practice (since the basic
block already has a terminator).
llvm-svn: 274898
Support for the macro fusion of simple ALU ops with branches for the Vulcan sub-target.
Patch by Meador Inge <meadori@gmail.com>
Differential Revision: http://reviews.llvm.org/D22042
llvm-svn: 274837
The commit reinstates r273279, which was informally approved.
Original Review: http://reviews.llvm.org/D21414
This reverts commit ca632c91aaa7cafc50942f890c49f727a046ace1.
llvm-svn: 274790
On CPUs with the zero cycle zeroing feature enabled "movi v.2d" should
be used to zero a vector register. This was previously done at
instruction selection time, however the register coalescer sometimes
widened multiple vregs to the Q width because of that leading to extra
spills. This patch leaves the decision on how to zero a register to the
AsmPrinter phase where it doesn't affect register allocation anymore.
This patch also sets isAsCheapAsAMove=1 on FMOVS0, FMOVD0.
This fixes http://llvm.org/PR27454, rdar://25866262
Differential Revision: http://reviews.llvm.org/D21826
llvm-svn: 274686
findScratchNonCalleeSaveRegister() just needs a simple liveness
analysis, use LivePhysRegs for that as it is simpler and does not depend
on the kill flags.
This commit adds a convenience function available() to LivePhysRegs:
This function returns true if the given register is not reserved and
neither the register nor any of its aliases are alive.
Differential Revision: http://reviews.llvm.org/D21865
llvm-svn: 274685
I think the Ops filled out by Regex::match contain pointers into the temporary
std::string returned by StringRef::upper. Its lifetime is extended by the call
to match, but only until the end of that call (not to the uses of Ops later
on).
llvm-svn: 274586
The way the named arguments for various system instructions are handled at the
moment has a few problems:
- Large-scale duplication between AArch64BaseInfo.h and AArch64BaseInfo.cpp
- That weird Mapping class that I have no idea what I was on when I thought
it was a good idea.
- Searches are performed linearly through the entire list.
- We print absolutely all registers in upper-case, even though some are
canonically mixed case (SPSel for example).
- The ARM ARM specifies sysregs in terms of 5 fields, but those are relegated
to comments in our implementation, with a slightly opaque hex value
indicating the canonical encoding LLVM will use.
This adds a new TableGen backend to produce efficiently searchable tables, and
switches AArch64 over to using that infrastructure.
llvm-svn: 274576
This reverts commit r259387 because it inserts illegal code after legalization
in some backends where i64 OR type is illegal for example.
llvm-svn: 274573
The other use really does only care about the SDNode (it checks the
opcode against a whitelist), but bitFieldPlacement can be misled if
the node produces multiple results.
Patch by Ismail Badawi.
llvm-svn: 274567
For the most part this simplifies all callers. There were two places in X86 that needed an explicit makeArrayRef to shorten a statically sized array.
llvm-svn: 274337
TargetSubtargetInfo::overrideSchedPolicy takes two MachineInstr*
arguments (begin and end) that invite implicit conversions from
MachineInstrBundleIterator. One option would be to change their type to
an iterator, but since they don't seem to have been used since the API
was added in 2010, I'm deleting the dead code.
llvm-svn: 274304
This is a mechanical change to make TargetLowering API take MachineInstr&
(instead of MachineInstr*), since the argument is expected to be a valid
MachineInstr. In one case, changed a parameter from MachineInstr* to
MachineBasicBlock::iterator, since it was used as an insertion point.
As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.
llvm-svn: 274287
This is mostly a mechanical change to make TargetInstrInfo API take
MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator)
when the argument is expected to be a valid MachineInstr. This is a
general API improvement.
Although it would be possible to do this one function at a time, that
would demand a quadratic amount of churn since many of these functions
call each other. Instead I've done everything as a block and just
updated what was necessary.
This is mostly mechanical fixes: adding and removing `*` and `&`
operators. The only non-mechanical change is to split
ARMBaseInstrInfo::getOperandLatencyImpl out from
ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a
`MachineInstr*` which it updated to the instruction bundle leader; now,
the latter calls the former either with the same `MachineInstr&` or the
bundle leader.
As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.
Note: I updated WebAssembly, Lanai, and AVR (despite being
off-by-default) since it turned out to be easy. I couldn't run tests
for AVR since llc doesn't link with it turned on.
llvm-svn: 274189
The setCallee function will set the number of fixed arguments based
on the size of the argument list. The FixedArgs parameter was often
explicitly set to 0, leading to a lack of consistent value for non-
vararg functions.
Differential Revision: http://reviews.llvm.org/D20376
llvm-svn: 273403
Summary:
Code generation for Cortex-A72/Cortex-A73 was accidentally changed
by r271555, which was a NFCI. The isCortexA57() predicate was not true
for Cortex-A72/Cortex-A73 before r271555 (since it was checking the CPU
string). Because Cortex-A72/Cortex-A73 inherit all features from Cortex-A57,
all decisions previously guarded by isCortexA57() are now taken.
This change restores the behaviour before r271555 by adding separate
ProcA72/ProcA73, which have the required features to preserve code
generation.
Reviewers: kristof.beyls, aadg, mcrosier, rengolin
Subscribers: mcrosier, llvm-commits, aemerson, t.p.northover, MatzeB, rengolin
Differential Revision: http://reviews.llvm.org/D21182
llvm-svn: 273277
The backend has been around for years, it's pretty ridiculous that we can't
even use the preferred form for printing "MOV" aliases. Unfortunately, TableGen
can't handle the complex predicates when printing so it's a bunch of nasty C++.
Oh well.
llvm-svn: 272865
Of course the assembly was right but because the opcode was MOVZWi it was
encoded as "movz w16, #65535, lsl #32" which is an unallocated encoding and
would go horribly wrong on a CPU.
No idea how this bug survived this long. It seems nobody is using that aspect
of patchpoints.
llvm-svn: 272831
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.
llvm-svn: 272512
Add support to the AArch64 IAS for the `.arch` directive. This allows the
assembly input to use architectural functionality in part of a file. This is
used in existing code like BoringSSL.
Resolves PR26016!
llvm-svn: 272241
Teach AArch64RegisterBankInfo that G_OR can be mapped on either GPR or
FPR for 64-bit or 32-bit values.
Add test cases demonstrating how this information is used to coalesce a
computation on a single register bank.
llvm-svn: 272170
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.
llvm-svn: 272126
The cost of a copy may be different based on how many bits we have to
copy around. E.g., a 8-bit copy may be different than a 32-bit copy.
llvm-svn: 272084
We were assuming all SBFX-like operations would have the shl/asr form, but often
when the field being extracted is an i8 or i16, we end up with a
SIGN_EXTEND_INREG acting on a shift instead.
This is a port of r213754 from ARM to AArch64.
llvm-svn: 271677
new instruction to ARM and AArch64 targets and several system registers.
Patch by: Roger Ferrer Ibanez and Oliver Stannard
Differential Revision: http://reviews.llvm.org/D20282
llvm-svn: 271670
Testing for specific CPUs has a number of problems, better use subtarget
features:
- When some tweak is added for a specific CPU it is often desirable for
the next version of that CPU as well, yet we often forget to add it.
- It is hard to keep track of checks scattered around the target code;
Declaring all target specifics together with the CPU in the tablegen
file is a clear representation.
- Subtarget features can be tweaked from the command line.
To discourage people from using CPU checks in the future I removed the
isCortexXX(), isCyclone(), ... functions. I added an getProcFamily()
function for exceptional circumstances but made it clear in the comment
that usage is discouraged.
Reformat feature list in AArch64.td to have 1 feature per line in
alphabetical order to simplify merging and sorting for out of tree
tweaks.
No functional change intended.
Differential Revision: http://reviews.llvm.org/D20762
llvm-svn: 271555
Summary:
If the target requests it, use emptry spaces in the fixed and
callee-save stack area to allocate local stack objects.
AArch64: Change last callee-save reg stack object alignment instead of
size to leave a gap to take advantage of above change.
Reviewers: t.p.northover, qcolombet, MatzeB
Subscribers: rengolin, mcrosier, llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D20220
llvm-svn: 271527
A constant pool holding the address of a variable in equivalent to
a got entry. It produces exactly the same instruction sequence as a
got use and unlike a got use this is not uniqued by the linker.
llvm-svn: 271311
Canonicalize (srl (bswap i32 x), 16) to (rotr (bswap i32 x), 16), if the high
16-bits of x are zero. Similarly, canonicalize (srl (bswap i64 x), 32) to
(rotr (bswap i64 x), 32), if the high 32-bits of x are zero.
test_rev_w_srl16: test_rev_w_srl16:
and w8, w0, #0xffff and w8, w0, #0xffff
rev w8, w8 ---> rev16 w0, w8
lsr w0, w8, #16
test_rev_x_srl32: test_rev_x_srl32:
rev x8, x8 ---> rev32 x0, x8
lsr x0, x8, #32
llvm-svn: 270896
If and only if the value being inserted sets only known zero bits.
This combine transforms things like
and w8, w0, #0xfffffff0
movz w9, #5
orr w0, w8, w9
into
movz w8, #5
bfxil w0, w8, #0, #4
The combine is tuned to make sure we always reduce the number of instructions.
We avoid churning code for what is expected to be performance neutral changes
(e.g., converted AND+OR to OR+BFI).
Differential Revision: http://reviews.llvm.org/D20387
llvm-svn: 270846
Summary:
As this optimization converts two loads into one load with two shift instructions,
it could potentially hurt performance if a loop is arithmetic operation intensive.
Reviewers: t.p.northover, mcrosier, jmolloy
Subscribers: evandro, jmolloy, aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20172
llvm-svn: 270251
Mask0Imm and ~Mask1Imm must be equivalent and one of the MaskImms is a shifted
mask (e.g., 0x000ffff0). Both 'and's must have a single use.
This changes code like:
and w8, w0, #0xffff000f
and w9, w1, #0x0000fff0
orr w0, w9, w8
into
lsr w8, w1, #4
bfi w0, w8, #4, #12
llvm-svn: 270063
Having an enum member named Default is quite confusing: Is it distinct
from the others?
This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been maped to the
default value, which is now clear has no be one of the remaining 3
options.
llvm-svn: 269988
Summary:
Fix bug in MachO path where a frame index offset would not be reserved
for handling large frames when an extra non-used callee-save register
was saved. In the case where the extra register is reserved or not a
GPR (e.g. %FP in the MachO case), this would lead to the register
scavenger later failing when called from PrologEpilogInserter.
Reviewers: t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20185
llvm-svn: 269697
Most immediates are printed in Aarch64InstPrinter using 'formatImm' macro,
but not all of them.
Implementation contains following rules:
- floating point immediates are always printed as decimal
- signed integer immediates are printed depends on flag settings
(for negative values 'formatImm' macro prints the value as i.e -0x01
which may be convenient when imm is an address or offset)
- logical immediates are always printed as hex
- the 64-bit immediate for advSIMD, encoded in "a🅱️c:d:e:f:g:h" is always printed as hex
- the 64-bit immedaite in exception generation instructions like:
brk, dcps1, dcps2, dcps3, hlt, hvc, smc, svc is always printed as hex
- the rest of immediates is printed depends on availability
of -print-imm-hex
Signed-off-by: Maciej Gabka <maciej.gabka@arm.com>
Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
Differential Revision: http://reviews.llvm.org/D16929
llvm-svn: 269446
This one has a lot of code churn, but it's all mechanical and
straightforward.
- Where we were returning a node before, call ReplaceNode instead.
- Where we would return null to fall back to another selector, rename
the method to try* and return a bool for success.
- Where we were calling SelectNodeTo, just return afterwards.
Part of llvm.org/pr26808.
llvm-svn: 269379
For narrow stores (e.g., strb, srth) we know the upper bits of the register are
unused/not useful. In some cases we can use this information to eliminate
unnecessary instructions.
For example, without this patch we generate (from the 2nd test case):
ldr w8, [x0]
and w8, w8, #0xfff0
bfxil w8, w2, #16, #4
strh w8, [x1]
and after the patch the 'and' is removed:
ldr w8, [x0]
bfxil w8, w2, #16, #4
strh w8, [x1]
ret
During the lowering of the bitfield insert instruction the 'and' is eliminated
because we know the upper 16-bits that are masked off are unused and the lower
4-bits that are masked off are overwritten by the insert itself. Therefore, the
'and' is unnecessary.
Differential Revision: http://reviews.llvm.org/D20175
llvm-svn: 269226
Summary: When emitting comparison for fp16, in addition to promote the LHS and RHS to fp32, we need to change the VT as well.
Reviewers: t.p.northover
Subscribers: t.p.northover, aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D19922
llvm-svn: 269151
Unlike xN/wN, the size of vN is genuinely ambiguous in the assembly, so we
should try to infer what was intended from the type. But only down to 64-bits
(vN can never represent sN, hN or bN).
llvm-svn: 269132
SystemZ (and probably other targets as well) can fold a memory operand
by changing the opcode into a new instruction that as a side-effect
also clobbers the CC-reg.
In order to do this, liveness of that reg must first be checked. When
LIS is passed, getRegUnit() can be called on it and the right
LiveRange is computed on demand.
Reviewed by Matthias Braun.
http://reviews.llvm.org/D19861
llvm-svn: 269026
Many files include Passes.h but only a fraction needs to know about the
TargetPassConfig class. Move it into an own header. Also rename
Passes.cpp to TargetPassConfig.cpp while we are at it.
llvm-svn: 269011
Summary:
This implements the lowering of the X constraint on
AArch64.
The default behaviour of the X constraint lowering is to
restrict it to "f". This is a problem because the "f"
constraint is not implemented on AArch64 and would be too
restrictive anyway. Therefore, the AArch64 hook will
lower this to "w" (if the operand is a floating point or
vector) or "r" otherwise.
The implementation is similar with the one added for
ARM (r267411).
This is the AArch64 side of the fix for http://llvm.org/PR26493
Reviewers: rengolin
Subscribers: aemerson, rengolin, llvm-commits, t.p.northover
Differential Revision: http://reviews.llvm.org/D19967
llvm-svn: 268907
Summary:
If a function needs to allocate both callee-save stack memory and local
stack memory, we currently decrement/increment the SP in two steps:
first for the callee-save area, and then for the local stack area. This
changes the code to allocate them both at once at the very beginning/end
of the function. This has two benefits:
1) there is one fewer sub/add micro-op in the prologue/epilogue
2) the stack adjustment instructions act as a scheduling barrier, so
moving them to the very beginning/end of the function increases post-RA
scheduler's ability to move instructions (that only depend on argument
registers) before any of the callee-save stores
This change can cause an increase in instructions if the original local
stack SP decrement could be folded into the first store to the stack.
This occurs when the first local stack store is to stack offset 0. In
this case we are trading off one more sub instruction for one fewer sub
micro-op (along with benefits (2) and (3) above).
Reviewers: t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18619
llvm-svn: 268746
Summary: This change refactors to decouple the zero store promotion from the narrow ld merge and add a flag (enable-narrow-ld-merge=true) to control the narrow ld merge optimization.
Reviewers: jmolloy, t.p.northover, mcrosier
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19885
llvm-svn: 268744
This is a step towards removing the rampant undefined behaviour in
SelectionDAG, which is a part of llvm.org/PR26808.
We rename SelectionDAGISel::Select to SelectImpl and update targets to
match, and then change Select to return void and consolidate the
sketchy behaviour we're trying to get away from there.
Next, we'll update backends to implement `void Select(...)` instead of
SelectImpl and eventually drop the base Select implementation.
llvm-svn: 268693
This patch adds support for estimating the square root, its reciprocal and
division or reciprocal using the combiner generic reciprocal machinery.
llvm-svn: 268539
Remove the AddPristinesAndCSRs parameters from
addLiveIns()/addLiveOuts().
We need to respect pristine registers after prologue epilogue insertion,
Seeing that we got this wrong in at least two commits already, we should
rather pay the small price to query MachineFrameInfo for it.
There are three cases that did not set AddPristineAndCSRs to true even
after register allocation:
- ExecutionDepsFix: live-out registers are used as a hint that the
register is used soon. This is not true for pristine registers so
use the new addLiveOutsNoPristines() to maintain this behaviour.
- SystemZShortenInst: Not setting AddPristineAndCSRs to true looks like
a bug, should do the right thing automatically now.
- StackMapLivenessAnalysis: Not adding pristine registers looks like a
bug to me. Added a FIXME comment but maintain the current behaviour
as a change may need to get coordinated with GC runtimes.
llvm-svn: 268336
transferSuccessors() would LoadCmpBB a successor of DoneBB,
whereas it should be a successor of the original MBB.
Follow-up to r266339.
Unfortunately, it's tricky to catch this in the verifier.
llvm-svn: 267779
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.
Differential Revision: http://reviews.llvm.org/D18523
llvm-svn: 267725
Summary:
We don't use MinLatency any more since r184032.
Reviewers: atrick, hfinkel, mcrosier
Differential Revision: http://reviews.llvm.org/D19474
llvm-svn: 267502
log2(Mask) is smaller than 32, we must use the 32-bit variant because the 64-bit
variant cannot encode it. Therefore, set the subreg part accordingly.
[AArch64] Fix optimizeCondBranch logic.
The opcode for the optimized branch does not depend on the size
of the activate bits in the AND masks, but the AND opcode itself.
Indeed, we need to use a X or W variant based on the AND variant
not based on whether the mask fits into the related variant.
Otherwise, we may end up using the W variant of the optimized branch
for 64-bit register inputs!
This fixes the last make check verifier issues for AArch64: PR27479.
llvm-svn: 267465
The original patch caused crashes because it could derefence a null pointer
for SelectionDAGTargetInfo for targets that do not define it.
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math
llvm-svn: 267328
The opcode for the optimized branch does not depend on the size
of the activate bits in the AND masks, but the AND opcode itself.
Indeed, we need to use a X or W variant based on the AND variant
not based on whether the mask fits into the related variant.
Otherwise, we may end up using the W variant of the optimized branch
for 64-bit register inputs!
This fixes the last make check verifier issues for AArch64: PR27479.
llvm-svn: 267206
We used to simply set the kill flags to true when transforming a scalar
instruction to a vector one.
SrcScalar1 = copy SrcVector1
... = opScalar SrcScalar1
=>
SrcScalar1 = copy SrcVector1
... = opVector SrcVector1<kill>
This is obviously wrong. The proper update consists in:
1. Propagate the kill status from the copy to the new opVector
2. Reset the kill status on the copy, since the live-range of
SrcVector1 got extended.
This fixes some of the machine verifier errors for AArch64 with make check.
llvm-svn: 267180
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math
llvm-svn: 267098
AArch64InstrInfo::optimizeCompareInstr has bug PR27158 which causes generation of incorrect code.
A compare instruction is substituted with another instruction which does not
produce the same flags as the original compare instruction.
This patch contains:
1. Fix of the bug.
2. A regression test in MIR.
3. A new test to check that SUBS is replaced by SUB.
Differential Revision: http://reviews.llvm.org/D18838
llvm-svn: 266969