This produces worse code when i16 is legal, mostly
due to combines getting confused by conversions inserted
for uniform 16-bit operations.
llvm-svn: 291717
This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded.
This needs a simple probability check because there are some cases where it is
not profitable.
llvm-svn: 291695
The new matchers work after legalization to make them simpler, and to avoid
blocking other optimizations.
Differential Revision: https://reviews.llvm.org/D27779
llvm-svn: 291693
Commit rL290616 (https://reviews.llvm.org/rL290616) changed a checking command
for the triple arm-apple-darwin in LLVM::CodeGen/ARM/fpcmp_ueq.ll. As a result
of the changes the test could fail for the valid generated code.
These changes fixes the test to check only instructions we would expect.
Differential Revision: https://reviews.llvm.org/D28159
llvm-svn: 291678
DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512.
And VPACKUS* instructions on SEE* targets.
Differential Revision: https://reviews.llvm.org/D28216
llvm-svn: 291670
The code emiited by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd,
(v)cvtsd2ss and (v)cvtss2sd is lowered to a code sequence that includes
redundant (v)movss/(v)movsd instructions. This patch adds patterns for
optimizing these sequences.
Differential revision: https://reviews.llvm.org/D28455
llvm-svn: 291660
Even with aggressive fusion enabled, this requires duplicating
the fmul, or increases an fadd to another fma which is not an
improvement.
llvm-svn: 291642
This was reverted because it would miscompile code where the cmp had
multiple uses. That was due to a deficiency in the existing code, which
was fixed in r291630 (see the PR for details).
This re-commit includes an extra test for the kind of code that got
miscompiled: @test_sub_1_setcc_jcc.
llvm-svn: 291640
We would miscompile the following:
void g(int);
int f(volatile long long *p) {
bool b = __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST) < 0;
g(b ? 12 : 34);
return b ? 56 : 78;
}
into
pushq %rax
lock incq (%rdi)
movl $12, %eax
movl $34, %edi
cmovlel %eax, %edi
callq g(int)
testq %rax, %rax <---- Bad.
movl $56, %ecx
movl $78, %eax
cmovsl %ecx, %eax
popq %rcx
retq
because the code failed to take into account that the cmp has multiple
uses, replaced one of them, and left the other one comparing garbage.
llvm-svn: 291630
Summary:
Previously if you had
* a function with the fast-math-enabled attr, followed by
* a function without the fast-math attr,
the second function would inherit the first function's fast-math-ness.
This means that mixing fast-math and non-fast-math functions in a module
was completely broken unless you explicitly annotated every
non-fast-math function with "unsafe-fp-math"="false". This appears to
have been broken since r176986 (March 2013), when the resetTargetOptions
function was introduced.
This patch tests the correct behavior as best we can. I don't think I
can test FPDenormalMode and NoTrappingFPMath, because they aren't used
in any backends during function lowering. Surprisingly, I also can't
find any uses at all of LessPreciseFPMAD affecting generated code.
The NVPTX/fast-math.ll test changes are an expected result of fixing
this bug. When FMA is disabled, we emit add as "add.rn.f32", which
prevents fma combining. Before this patch, fast-math was enabled in all
functions following the one which explicitly enabled it on itself, so we
were emitting plain "add.f32" where we should have generated
"add.rn.f32".
Reviewers: mkuper
Subscribers: hfinkel, majnemer, jholewinski, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28507
llvm-svn: 291618
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well.
Differential revision: https://reviews.llvm.org/D27742
llvm-svn: 291609
If a vector index is out of bounds, the result is supposed to be
undefined but is not undefined behavior. Change the legalization
for indexing the vector on the stack so that an out of bounds
index does not create an out of bounds memory access.
llvm-svn: 291604
When we collect 2 uses of a function in FindUses and then RAUW when we
visit the first, we end up visiting the wrapper (because the second was
RAUW'd). We still want to use RAUW instead of just Use->set() because
it has special handling for Constants, so this patch just ensures that
only one use of each constant is added to the work list.
Differential Revision: https://reviews.llvm.org/D28504
llvm-svn: 291603
This patch fix PR31351: https://llvm.org/bugs/show_bug.cgi?id=31351
1. This patch adds new type of shuffle lowering
2. We can use the expand instruction, When the shuffle pattern is as following:
{ 0*a[0]0*a[1]...0*a[n] , n >=0 where a[] elements in a ascending order}.
Reviewers: 1. igorb
2. guyblank
3. craig.topper
4. RKSimon
Differential Revision: https://reviews.llvm.org/D28352
llvm-svn: 291584
The usage of some MIPS MSA instrinsics that took immediates could crash LLVM
during lowering. This patch addresses that behaviour. Crucially this patch
also makes the use of intrinsics with out of range immediates as producing an
internal error.
The ld,st instrinsics would trigger an assertion failure for MIPS64 as their
lowering would attempt to add an i32 offset to a i64 pointer.
Reviewers: vkalintiris, slthakur
Differential Revision: https://reviews.llvm.org/D25438
llvm-svn: 291571
Previous the lowering of FILL_FW would use the MSA128W register class when
performing a vector splat. Instead it should be honouring -mno-odd-spreg and
only use the even registers when performing a splat from word to vector
register.
Logical follow-on from r230235.
This fixes PR/31369.
A previous commit was missing the test case and had another differential
in it.
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D28373
llvm-svn: 291566
Summary:
This patch enables the following
1. AMD family 17h architecture using "znver1" tune flag (-march, -mcpu).
2. ISAs that are enabled for "znver1" architecture.
3. Checks ADX isa from cpuid to identify "znver1" flag when -march=native is used.
4. ISAs FMA4, XOP are disabled as they are dropped from amdfam17.
5. For the time being, it uses the btver2 scheduler model.
6. Test file is updated to check this flag.
This item is linked to clang review item https://reviews.llvm.org/D28018
Patch by Ganesh Gopalasubramanian
Reviewers: RKSimon, craig.topper
Subscribers: vprasad, RKSimon, ashutosh.nema, llvm-commits
Differential Revision: https://reviews.llvm.org/D28017
llvm-svn: 291543
While we can usually replace bitcast like instructions
(MachineInstr::isBitcast()) with a COPY this is not legal if any of the
users uses SUBREG_TO_REG to assert the upper bits of the result are
zero.
Differential Revision: https://reviews.llvm.org/D28474
llvm-svn: 291483
For i16 zeroext arguments when i16 was a legal type, the
known bits information from the truncate was lost. Insert
a zeroext so the known bits optimizations work with the 32-bit
loads.
Fixes code quality regressions vs. SI in min.ll test.
llvm-svn: 291461
Use the existing AVX2 v8i16 vector shift lowering for v16i8 (extending to v16i32) on AVX512 targets and v32i8 (extending to v32i16) on AVX512BW targets.
Cost model updates to follow.
llvm-svn: 291451
Use the existing AVX2 v8i16 vector shift lowering for v16i16 on AVX512 targets (AVX512BW will have already have lowered with vpsravw).
Cost model updates to follow.
llvm-svn: 291445
Summary:
Originally
i64 = umax t8, Constant:i64<4>
was expanded into
i32,i32 = umax Constant:i32<0>, Constant:i32<0>
i32,i32 = umax t7, Constant:i32<4>
Now instead the two produced umax:es return i32 instead of i32, i32.
Thanks to Jan Vesely for help with the test case.
Patch by mikael.holmen at ericsson.com
Reviewers: bogner, jvesely, tstellarAMD, arsenm
Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D28135
llvm-svn: 291441
I noticed this problem as part of the ongoing attempt to canonicalize min/max ops in IR.
The debug output shows nodes like this:
t4: i32 = xor t2, Constant:i32<-1>
t21: i8 = setcc t4, Constant:i32<0>, setlt:ch
t14: i32 = select t21, t4, Constant:i32<-1>
And because the select is holding onto the t4 (xor) node while EmitTest creates a new
x86-specific xor node, the lowering results in:
t4: i32 = xor t2, Constant:i32<-1>
t25: i32,i32 = X86ISD::XOR t2, Constant:i32<-1>
t28: i32,glue = X86ISD::CMOV Constant:i32<-1>, t4, Constant:i8<15>, t25:1
Differential Revision: https://reviews.llvm.org/D28374
llvm-svn: 291392
We should probably teach the two address instruction pass to turn masked moves into BLENDM when its beneficial to the register allocator.
llvm-svn: 291371
Gracefully leave code that performs function-pointer bitcasts implying
non-trivial pointer conversions alone, rather than aborting, since it's
just undefined behavior.
llvm-svn: 291326
WebAssembly requires caller and callee signatures to match exactly. In LLVM,
there are a variety of circumstances where signatures may be mismatched in
practice, and one can bitcast a function address to another type to call it
as that type. This patch adds a pass which replaces bitcasted function
addresses with wrappers to replace the bitcasts.
This doesn't catch everything, but it does match many common cases.
llvm-svn: 291315
Re-apply r288561: This time with a fix where the ADDs that are part of a
3 instruction LOH would not invalidate the "LastAdrp" state. This fixes
http://llvm.org/PR31361
Previously this pass was using up to 5% compile time in some cases which
is a bit much for what it is doing. The pass featured a full blown
data-flow analysis which in the default configuration was restricted to a
single block.
This rewrites the pass under the assumption that we only ever work on a
single block. This is done in a single pass maintaining a state machine
per general purpose register to catch LOH patterns.
Differential Revision: https://reviews.llvm.org/D27329
This reverts commit 9e6cedb0a4f14364d6511597a9160305e7d34493.
llvm-svn: 291266
We used the logBase2 of the high instead of the ceilLogBase2 resulting
in the wrong result for certain values. For example, it resulted in an
i1 AssertZExt when the exclusive portion of the range was 3.
llvm-svn: 291196
Add an assert that checks whether liveins are up to date before they are
used.
- Do not print liveins into .mir files anymore in situations where they
are out of date anyway.
- The assert in the RegisterScavenger is superseded by the new one in
livein_begin().
- Skip parts of the liveness updating logic in IfConversion.cpp when
liveness isn't tracked anymore (just enough to avoid hitting the new
assert()).
Differential Revision: https://reviews.llvm.org/D27562
llvm-svn: 291169
Summary:
For instructions such as PSLLW/PSLLD/PSLLQ a variable shift amount may be passed in an XMM register.
The lower 64-bits of the register are evaluated to determine the shift amount.
This patch improves the construction of the vector containing the shift amount.
Reviewers: craig.topper, delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28353
llvm-svn: 291120
To make this work, pointers from the MachineBasicBlock to the LLVM-IR-level
basic blocks need to be initialized, as the AsmPrinter uses this link to be
able to print out labels for the basic blocks that are address-taken.
Most of the changes in this commit are about adapting existing tests to include
the basic block name that is now printed out in the MIR format, now that the
name becomes available as the link to the LLVM-IR basic block is initialized.
The relevant test change for the functionality added in this patch are the
added "(address-taken)" strings in
test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll.
Differential Revision: https://reviews.llvm.org/D28123
llvm-svn: 291105
This commit does this using a trivial chain of conditional branches. In the
future, we probably want to reuse the optimized switch lowering used in
SelectionDAG.
Differential Revision: https://reviews.llvm.org/D28176
llvm-svn: 291099
Summary:
When promoting fp-to-uint16 to fp-to-sint32, the result is actually zero
extended. For example, given double 65534.0, without legalization:
fp-to-uint16: 65534.0 -> 0xfffe
With the legalization:
fp-to-sint32: 65534.0 -> 0x0000fffe
Without this patch, legalization wrongly emits a signed extend assertion,
which is consumed by later icmp instruction, and cause miscompile.
Note that the floating point value must be in [0, 65535), otherwise the
behavior is undefined.
This patch reverts r279223 behavior and adds more tests and
documentations.
In PR29041's context, James Molloy mentioned that:
We don't need to mask because conversion from float->uint8_t is
undefined if the integer part of the float value is not representable in
uint8_t. Therefore we can assume this doesn't happen!
which is totally true and good, because fptoui is documented clearly to
have undefined behavior when overflow/underflow happens. We should take
the advantage of this behavior so that we can save unnecessary mask
instructions.
Reviewers: jmolloy, nadav, echristo, kbarton
Subscribers: mehdi_amini, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28284
llvm-svn: 291015
Summary:
Instead of matching:
(a + i) + 1 -> (a + i, undef, 1)
Now it matches:
(a + i) + 1 -> (a, i, 1)
Reviewers: rengolin
Differential Revision: http://reviews.llvm.org/D26367
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 291012
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:
1. When determining tail-call eligibility. If we make a tail call (i.e.
directly branch to a function) then there is no place for the linker to add
a TOC restoration.
2. When determining when we need to add a nop instruction after a call.
Likewise, if there is a possibility that the linker might need to add a
TOC restoration after a call, then we need to put a nop after the call
(the bl instruction).
First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.
Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).
There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.
Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. Unfortunately, because of the way
that the ABI works, we need to generate code as if we supported interposition
whenever the linker might insert stubs for the purpose of supporting it.
Differential Revision: https://reviews.llvm.org/D27231
llvm-svn: 291003
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.
In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.
Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)
Reviewers: aprantl, MatzeB, mkuper
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D27688
llvm-svn: 290955
In some cases its more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types.
This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or an one use constant we can fold).
Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs.
I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need.
Differential Revision: https://reviews.llvm.org/D28219
llvm-svn: 290947
GNU as rejects input where .cfi_sections is used after .cfi_startproc,
if the new section differs from the old. Adjust our output to always
emit .cfi_sections before the first .cfi_startproc to minimize necessary
code.
Differential Revision: https://reviews.llvm.org/D28011
llvm-svn: 290817
This reverts commit r290694. It broke sanitizer tests on Win64. I'll
probably bring this back, but the jump tables will just live in .text
like they do for MSVC.
llvm-svn: 290714
This change adds a new intrinsic which is intended to provide memcpy functionality
with additional atomicity guarantees. Please refer to the review thread
or language reference for further details.
Differential Revision: https://reviews.llvm.org/D27133
llvm-svn: 290708
Summary:
We were already using 32-bit jump table entries, but this was a
consequence of the default PIC model on Win64, and not an intentional
design decision. This patch ensures that we always use 32-bit label
difference jump table entries on Win64 regardless of the PIC model. This
is a good idea because it saves executable size and object file size.
Moving the jump tables to .rdata cleans up the disassembled object code
and reduces the available ROP targets, but it requires adding one more
RIP-relative lea to the code. COFF doesn't have relocations to express
the difference between two arbitrary symbols, so we can't use the jump
table label in the label difference like we do elsewhere.
Fixes PR31488
Reviewers: majnemer, compnerd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28141
llvm-svn: 290694
Jump table emission can switch to .rdata before
WinException::endFunction gets called. Just remember the appropriate
text section we started in and reset back to it when we end the
function. We were already switching sections back from .xdata anyway.
Fixes the first problem in PR31488, so that now COFF switch tables can
live in .rdata if we want them to.
llvm-svn: 290678
There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers.
The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled.
Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky
Differential Revision: https://reviews.llvm.org/D27901
llvm-svn: 290663
Replace the use of grep with FileCheck. Tidy up some of the tests. A
few of the tests have been left as weak as previously, though some have
been made more stringent.
llvm-svn: 290616
The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will removed and autoupgraded.
llvm-svn: 290573
These particular sequences will be generated after a future change to teach InstCombine to turn masked scalar arithmetic intrinsics into native IR.
llvm-svn: 290563
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.
In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.
Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)
Reviewers: mkuper, MatzeB, aprantl
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D27688
llvm-svn: 290423
1.Fix pessimized case in FIXME.
2.Add tests for it.
3.The canonicalisation on shifts results in different sequence for
tests of machine-licm.Correct some check lines.
Differential Revision: https://reviews.llvm.org/D27916
llvm-svn: 290410
Summary:
This change rewrites a core component in the ImplicitNullChecks pass for
greater simplicity since the original design was over-complicated for no
good reason. Please review this as essentially a new pass. The change
is almost NFC and I've added a test case for a scenario that this new
code handles that wasn't handled earlier.
The implicit null check pass, at its core, is a code hoisting transform.
It differs from "normal" code transforms in that it speculates
potentially faulting instructions (by design), but a lot of the usual
hazard detection logic (register read-after-write etc.) still applies.
We previously detected hazards by keeping track of registers defined and
used by machine instructions over an instruction range, but that was
unwieldy and did not actually confer any performance benefits. The
intent was to have linear time complexity over the number of machine
instructions considered, but it ended up being N^2 is practice.
This new version is more obviously O(N^2) (with N capped to 8 by
default) in hazard detection. It does not attempt to be clever in
tracking register uses or defs (the previous cleverness here was a
source of bugs).
Once this is checked in, I'll extract out the `IsSuitableMemoryOp` and
`CanHoistLoadInst` lambda into member functions (they're too complicated
to be inline lambdas) and do some other related NFC cleanups.
Reviewers: reames, anna, atrick
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D27592
llvm-svn: 290394
Target specific instructions have requirements that are not compatible
with what we want to test here. Namely, target specific instructions
must have their operands properly mapped on register classes.
llvm-svn: 290379
The InstructionSelect pass will not look at target specific instructions
since they are already selected. As a result, the operands of target
specific instructions must be properly constrained, because it is not
going to fix them.
This fixes invalid register classes on call instruction.
llvm-svn: 290377
When generic virtual registers get constrained, because of a use on a
target specific operation for instance, we end up with regular virtual
registers with a type and that's perfectly fine.
llvm-svn: 290376
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.
This seems to usually be better but causes some code size regressions
in some tests.
llvm-svn: 290372
This is a succeeding patch of https://reviews.llvm.org/D22840 to address the
issue when a value to be merged into an int64 pair is in a different BB. Redoing
the store splitting in CodeGenPrepare so we can match the pattern across multiple
BBs and move some instructions into the same BB. We still keep the code in dag
combine so that we can catch cases that show up after DAG combining runs.
Differential Revision: https://reviews.llvm.org/D25914
llvm-svn: 290365
Follow up to D27209 fix, this patch now properly handles single transient
instruction in basic block.
Patch by Aleksandar Beserminji.
Differential Revision: https://reviews.llvm.org/D27856
llvm-svn: 290361
When the pipeliner is renaming phi values, it may need to iterate through
the phi operands to check for other phis. However, the pipeliner should
stop once it reaches a phi that is outside the pipelined loop.
Also, when the generateExistingPhis code is unable to reuse an existing
phi, the default code that computes the PhiOp2 is only to be used when
the pipeliner is generating the kernel. Otherwise, the phi may be a value
computed earlier in the same epilog.
Patch by Brendon Cahoon.
llvm-svn: 290355
This patch renumbers the metadata nodes in debug info testcases after
https://reviews.llvm.org/D26769. This is a separate patch because it
causes so much churn. This was implemented with a python script that
pipes the testcases through llvm-as - | llvm-dis - and then goes
through the original and new output side-by side to insert all
comments at a close-enough location.
Differential Revision: https://reviews.llvm.org/D27765
llvm-svn: 290292
I added API for creation a target specific memory node in DAG. Today, all memory nodes are common for all targets and their constructors are located in SelectionDAG.cpp.
There are some cases in X86 where we need to create a special node - truncation-with-saturation store, float-to-half-store.
In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for truncation-with-saturation pattern.
Differential Revision: https://reviews.llvm.org/D27899
llvm-svn: 290250
The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible.
vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use.
The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above.
The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it.
This aubmit also includes additional lit tests to cover better HVAs corner cases.
Differential Revision: https://reviews.llvm.org/D27392
llvm-svn: 290240
we used to print UNKNOWN instructions when the instruction to be printer was not
yet inserted in any BB: in that case the pretty printer would not be able to
compute a TII as the instruction does not belong to any BB or function yet.
This patch explicitly passes the TII to the pretty-printer.
Differential Revision: https://reviews.llvm.org/D27645
llvm-svn: 290228
See https://reviews.llvm.org/D6678 for the history of
isExtractSubvectorCheap. Essentially the same considerations apply
to ARM.
This temporarily breaks the formation of vpadd/vpaddl in certain cases;
AddCombineToVPADDL essentially assumes that we won't form VUZP shuffles.
See https://reviews.llvm.org/D27779 for followup fix.
Differential Revision: https://reviews.llvm.org/D27774
llvm-svn: 290198