The exit-on-error flag is necessary to avoid some assertions/unreachables. We
can get past them by creating a few dummy nodes.
Fixes PR27768, PR27769.
Differential Revision: http://reviews.llvm.org/D20726
llvm-svn: 271200
This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.
Reapplied now that the the companion patch (D20684) removes/auto-upgrade the clang intrinsics has been committed.
Differential Revision: http://reviews.llvm.org/D20686
llvm-svn: 271131
We were producing R_X86_64_GOTPCRELX for invalid instructions and
sometimes producing R_X86_64_GOTPCRELX instead of
R_X86_64_REX_GOTPCRELX.
llvm-svn: 271118
It would be better to check the valid/expected size of the immediate operand, but this is
generally better than what we print right now.
Differential Revision: http://reviews.llvm.org/D20385
llvm-svn: 271114
Composing subreg_loreg with subreg_oveflow leads to strange results with
lane masks for register classes with subreg_loreg. In particular, dead
lane detection generates incorrect code.
llvm-svn: 271087
Remove broken patterns matching it. This was matching the
unsafe math pattern and expanding the fix for the buggy instruction
from the pattern. The problems are also on CI. Remove the workarounds
and only use fract with unsafe math or from the intrinsic.
llvm-svn: 271078
The isMemWithSimmOffset predicate rejects relocations which is incorrect
behaviour. Linkers and other tools should handle|warn|error when the
field overflows.
Reviewers: dsanders, vkalintiris
Differential Revision: http://reviews.llvm.org/D20727
llvm-svn: 270995
Register numbers may be specified as assembly-time expressions.
This feature can be useful in macros and alike. However, expressions
are supported within sqare braces only.
Sqare braces were initially intended to support specifying of multiple
(pairs/quads...) registers. Syntax like v[8:8] which specifies single register
is also supported. That allows expressions but looks a bit unnatural.
This change supports syntax REG[EXPR].
Tests added.
Differential Revision: http://reviews.llvm.org/D20588
llvm-svn: 270990
This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.
A companion patch (D20684) removes/auto-upgrade the clang intrinsics.
Differential Revision: http://reviews.llvm.org/D20686
llvm-svn: 270973
The aggressive anti-dependency breaker can rename the restored callee-
saved registers. To prevent this, mark these registers are live on all
paths to the return/tail-call instructions, and add implicit use operands
for them to these instructions.
llvm-svn: 270898
Canonicalize (srl (bswap i32 x), 16) to (rotr (bswap i32 x), 16), if the high
16-bits of x are zero. Similarly, canonicalize (srl (bswap i64 x), 32) to
(rotr (bswap i64 x), 32), if the high 32-bits of x are zero.
test_rev_w_srl16: test_rev_w_srl16:
and w8, w0, #0xffff and w8, w0, #0xffff
rev w8, w8 ---> rev16 w0, w8
lsr w0, w8, #16
test_rev_x_srl32: test_rev_x_srl32:
rev x8, x8 ---> rev32 x0, x8
lsr x0, x8, #32
llvm-svn: 270896
NVVMIntrRange adds !range metadata to calls of NVVM intrinsics
that return values within known limited range.
This allows LLVM to generate optimal code for indexing arrays
based on tid/ctaid which is a frequently used pattern in CUDA code.
Differential Revision: http://reviews.llvm.org/D20644
llvm-svn: 270872
Hwreg(...) syntax implementation unified with sendmsg(...).
Common strings moved to Utils
MathExtras.h functionality utilized.
Added missing build dependency in Disassembler.
Differential Revision: http://reviews.llvm.org/D20381
llvm-svn: 270871
Most often as not this is what it started out as, the extraction is zero-cost on AVX and the PMOVZX/PMOVSX folding logic is based around 128-bit loads.
llvm-svn: 270858
The exit-on-error flag is needed to avoid an assert where
llvm::SelectionDAGISel::LowerArguments doesn't create enough arguments. Fill up
with zeroes to reach the right number of args.
Fixes PR27767.
Differential Revision: http://reviews.llvm.org/D20571
llvm-svn: 270855
If and only if the value being inserted sets only known zero bits.
This combine transforms things like
and w8, w0, #0xfffffff0
movz w9, #5
orr w0, w8, w9
into
movz w8, #5
bfxil w0, w8, #0, #4
The combine is tuned to make sure we always reduce the number of instructions.
We avoid churning code for what is expected to be performance neutral changes
(e.g., converted AND+OR to OR+BFI).
Differential Revision: http://reviews.llvm.org/D20387
llvm-svn: 270846
f32 vectors would use a sequence of BFI instructions instead
of unrolled cmp + select. This was better in the case of a VALU
select with SGPR inputs, but we don't have a way of dealing with that
in the DAG.
llvm-svn: 270731
By making pointer extraction from a vector more expensive in the cost model,
we avoid the vectorization of a loop that is very likely to be memory-bound:
https://llvm.org/bugs/show_bug.cgi?id=27826
There are still bugs related to this, so we may need a more general solution
to avoid vectorizing obviously memory-bound loops when we don't have HW gather
support.
Differential Revision: http://reviews.llvm.org/D20601
llvm-svn: 270729
As noted in the review, there are still problems, so this doesn't the bug completely.
Differential Revision: http://reviews.llvm.org/D20529
llvm-svn: 270718
Followup to D20528 clang patch, this removes the (V)CVTDQ2PD(Y) and (V)CVTPS2PD(Y) llvm intrinsics and auto-upgrades to sitofp/fpext instead.
Differential Revision: http://reviews.llvm.org/D20568
llvm-svn: 270678
[AMDGPU] emitPrologue looks for an unused unallocated SGPR that is not
the scratch descriptor. Continue search if unused register found fails
other requirements.
Reviewers: arsenm, tstellarAMD, nhaehnle
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D20526
llvm-svn: 270646
Instead of this:
i32.const $push10=, __stack_pointer
i32.load $push11=, 0($pop10)
Emit this:
i32.const $push10=, 0
i32.load $push11=, __stack_pointer($pop10)
It's not currently clear which is better, though there's a chance the second
form may be better at overall compression. We can revisit this when we have
more data; for now it makes sense to make PEI consistent with isel.
Differential Revision: http://reviews.llvm.org/D20411
llvm-svn: 270635
Summary:
Change process of parsing of optional operands. All optional operands use same parsing method - parseOptionalOperand().
No default values are added to OperandsVector.
Get rid of WORKAROUND_USE_DUMMY_OPERANDS_INSTEAD_MUTIPLE_DEFAULT_OPERANDS.
Reviewers: tstellarAMD, vpykhtin, artem.tamazov, nhaustov
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D20527
llvm-svn: 270556
Patch by Nitesh Jain.
Summary: The type of Imm in MipsDisassembler.cpp was incorrect since SignExtend64 return int64_t type.As per the MIPSr6 doc ,the offset is added to the address of the instruction following the branch (not the branch itself), to form a PC-relative effective target address hence “4” is added to the offset. The offset of some test case are update to reflect the changes due to “ + 4 ” offset and new test case for negative offset are added.
Reviewers: dsanders, vkalintiris
Differential Revision: http://reviews.llvm.org/D17540
llvm-svn: 270542
They were accidentally using the 32-bit load/store instruction for
8/16-bit operations, due to incorrect patterns
(8/16-bit cmpxchg and atomicrmw will be fixed in subsequent changes)
llvm-svn: 270486
Use the more specific LiveInterval::removeSegment instead of
LiveInterval::shrinkToUses when we know the specific range that's
being removed.
llvm-svn: 270463
The exit-on-error flag on the many_args1.ll test is needed to avoid an
unreachable in BPFTargetLowering::LowerCall. We can also avoid it by ignoring
any superfluous arguments to the call (i.e. any arguments after the first 5).
Fixes PR27766.
Differential Revision: http://reviews.llvm.org/D20471
v2 of r270419
llvm-svn: 270440
This patch reverts r270419 because it broke a lot of buildbots,
mostly Windows. We'd like help in investigating the issues, but
for now, it should stay out.
llvm-svn: 270433
The exit-on-error flag on the many_args1.ll test is needed to avoid an
unreachable in BPFTargetLowering::LowerCall. We can also avoid it by ignoring
any superfluous arguments to the call (i.e. any arguments after the first 5).
Fixes PR27766
llvm-svn: 270419
This code should have been with the previous check-in (r270417) and prevents the DelaySlotFiller pass being utilized in functions where the erratum fix has been applied as this will break the run-time code.
llvm-svn: 270418
Due to an erratum in some versions of LEON, we must insert a NOP after any LD or LDF instruction to ensure the processor has time to load the value correctly before using it. This pass will implement that erratum fix.
The code will have no effect for other Sparc, but non-LEON processors.
Differential Review: http://reviews.llvm.org/D20353
llvm-svn: 270417
This isn't the complete fix, but it handles the trivial examples of duplicate vzero* ops in PR27823:
https://llvm.org/bugs/show_bug.cgi?id=27823
...and amusingly, the bogus cases already exist as regression tests, so let's take this baby step.
We'll need to do more in the general case where there's legitimate AVX usage in the function + there's
already a vzero in the code.
Differential Revision: http://reviews.llvm.org/D20477
llvm-svn: 270378
Allocating larger register classes first should give better allocation
results (and more importantly for myself, make the lit tests more stable
with respect to scheduler changes).
Patch by Matthias Braun
llvm-svn: 270312
The current SGPR spilling test does not stress this
because it is using s_buffer_load instructions to
increase SGPR pressure and spill, but their output
operands have the same SReg_32_XM0 constraint. This fixes
an error when the SReg_32 output from most instructions
is spilled.
llvm-svn: 270301
This saves a small amount of code size, and is a first small step toward
passing values on the stack across block boundaries.
Differential Review: http://reviews.llvm.org/D20450
llvm-svn: 270294
- Where we were returning a node before, call ReplaceNode instead.
- Where we would return null to fall back to another selector, rename
the method to try* and return a bool for success.
- Where we were calling SelectNodeTo, just return afterwards.
Part of llvm.org/pr26808.
llvm-svn: 270283
Summary:
As this optimization converts two loads into one load with two shift instructions,
it could potentially hurt performance if a loop is arithmetic operation intensive.
Reviewers: t.p.northover, mcrosier, jmolloy
Subscribers: evandro, jmolloy, aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20172
llvm-svn: 270251
We performed a number of memory allocations each time getTTI was called,
remove them by using SmallString.
No functionality change intended.
llvm-svn: 270246
This patch is a first step towards a more extendible method of matching combined target shuffle masks.
Initially this just pulls out the existing basic mask matches and adds support for some 256/512 bit equivalents. Future patterns will require a number of features to be added but I wanted to keep this patch simple.
I hope we can avoid duplication between shuffle lowering and combining and share more complex pattern match functions in future commits.
Differential Revision: http://reviews.llvm.org/D19198
llvm-svn: 270230
This refactors the logic in X86 to avoid code duplication. It also
splits it in two steps: it first decides if a symbol is local to the DSO
and then uses that information to decide how to access it.
The first part is implemented by shouldAssumeDSOLocal. It is not in any
way specific to X86. In a followup patch I intend to move it to
somewhere common and reused it in other backends.
llvm-svn: 270209
Since the calls don't return, the instruction afterwards will never run,
and is just taking up unnecessary space in the binary.
Differential Revision: http://reviews.llvm.org/D20406
llvm-svn: 270109
Mask0Imm and ~Mask1Imm must be equivalent and one of the MaskImms is a shifted
mask (e.g., 0x000ffff0). Both 'and's must have a single use.
This changes code like:
and w8, w0, #0xffff000f
and w9, w1, #0x0000fff0
orr w0, w9, w8
into
lsr w8, w1, #4
bfi w0, w8, #4, #12
llvm-svn: 270063
Fixes for MUBUF_Atomic instructions to make operand list valid:
- For RTN insns, make a copy of $vdata_in operand as $vdata.
- Do not add operand for GLC, it is hardcoded and comes as a token.
Workaround to avoid adding multiple default optional operands.
Tests added.
Differential Revision: http://reviews.llvm.org/D20257
llvm-svn: 270049
Enable "Remove Redundant LEAs" part of the LEA optimization pass for -O2.
This gives 6.4% performance improve on Broadwell on nnet benchmark from Coremark-pro.
There is no significant effect on other benchmarks (Geekbench, Spec2000, Spec2006).
Differential Revision: http://reviews.llvm.org/D19659
llvm-svn: 270036
Having an enum member named Default is quite confusing: Is it distinct
from the others?
This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been maped to the
default value, which is now clear has no be one of the remaining 3
options.
llvm-svn: 269988
isReturn() was returning different values with and without -g which led to
different code being generated. Change isFlagSettingInstruction to query
an instruction's effect on SR instead.
llvm-svn: 269986
Use signed division otherwise all back jumps fail the check
Fixes regression introduced in r269951
Differential Revision: http://reviews.llvm.org/D20380
llvm-svn: 269972
with an additional fix to make RegAllocFast ignore undef physreg uses. It would
previously get confused about the "push %eax" instruction's use of eax. That
method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate
as well, but since that runs after register-allocation, we didn't run into the
RegAllocFast issue before.
llvm-svn: 269949
Use register class that does not include them when looking
for unallocated registers.
This is hit by the udiv v8i64 test in the opencl integer
conformance test, and takes a few seconds to compile in
a debug build so no test included.
llvm-svn: 269938
Don't expand divisions by constants if it would require multiple instructions.
The current assumption is that engines will perform the desired optimizations.
llvm-svn: 269930
Summary:
The ordering of registers in BinaryRRF instructions are wrong, and
affects the copysign instruction (CPSDR). This results in the wrong
magnitude and sign being set.
Author: zhanjunl
Reviewers: kbarton, uweigand
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20308
llvm-svn: 269922
Summary:
MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT
pair while adding a timer function, such that another termination of the MWAITX
instruction occurs when the timer expires. The presence of the MONITORX and
MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29.
The MONITORX and MWAITX instructions are intercepted by the same bits that
intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be
monitored. MWAITX instruction causes the processor to stop instruction execution
and enter an implementation-dependent optimized state until occurrence of a
class of events.
Opcode of MONITORX instruction is "0F 01 FA". Opcode of MWAITX instruction is
"0F 01 FB". These opcode information is used in adding tests for the
disassembler.
These instructions are enabled for AMD's bdver4 architecture.
Patch by Ganesh Gopalasubramanian!
Reviewers: echristo, craig.topper, RKSimon
Subscribers: RKSimon, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19795
llvm-svn: 269911
MC only needs to know if the output is PIC or not. It never has to
decide about creating GOTs and PLTs for example. The only thing that
MC itself uses this information for is expanding "macros" in sparc and
mips. The rest I am pretty sure could be moved to CodeGen.
This is a cleanup and isolates the code from future changes to
Reloc::Model.
llvm-svn: 269909
It defined the LLVM_AVR_GCC_COMPAT constant, which would enable/disable
certain GCC-specific behaviours.
There is no point conditionally turning it on/off, as it will always be
turned on, and we have to maintain both code paths anyway.
llvm-svn: 269904
Restrict the creation of compact branches so that they do meet the ISA
requirements. Notably do not permit $zero to be used as a operand for compact
branches and ensure that some other branches fulfil the requirement that
rs != rt.
Fixup cases where $rs > $rt for bnec and beqc.
Recommit of rL269893 with reviewers comments.
Reviewers: dsanders, vkalintiris
Differential Review: http://reviews.llvm.org/D20284
llvm-svn: 269899
Restrict the creation of compact branches so that they meet the ISA encoding
requirements. Notably do not permit $zero to be used as a operand for compact
branches and ensure that some other branches fulfil the requirement that
rs != rt.
Fixup cases where $rs > $rt for bnec and beqc.
Reviewers: dsanders, vkalintiris
Differential Review: http://reviews.llvm.org/D20284
llvm-svn: 269893
This change adds support for software floating point operations for Sparc targets.
This is the first in a set of patches to enable software floating point on Sparc. The next patch will enable the option to be used with Clang.
Differential Revision: http://reviews.llvm.org/D19265
llvm-svn: 269892
We currently don't represent get_local and set_local explicitly; they
are just implied by virtual register use and def. This avoids a lot of
clutter, but it does complicate stackifying: get_locals read their
operands at their position in the stack evaluation order, rather than
at their parent instruction. This patch adds code to walk the stack to
determine the precise ordering, when needed.
llvm-svn: 269854
This patch moves the expansion of WIN_ALLOCA pseudo-instructions
into a separate pass that walks the CFG and lowers the instructions
based on a conservative estimate of the offset between the stack
pointer and the lowest accessed stack address.
The goal is to reduce binary size and run-time costs by removing
calls to _chkstk. While it doesn't fix all the code quality problems
with inalloca calls, it's an incremental improvement for PR27076.
Differential Revision: http://reviews.llvm.org/D20263
llvm-svn: 269828
Since r207518 they are printed exactly like non-hidden stubs on x86 and
since r207517 on ARM.
This means we can use a single set for all stubs in those platforms.
llvm-svn: 269776
The movw instruction is only available in ARM state for V6T2 and above.
The MOVi16 instruction has requirement HasV6T2 but the InstAlias
for mov rd, imm where the operand is imm0_65535_expr:$imm does not.
This means that movw can incorrectly be used in ARMv4 and ARMv5 by
writing mov rd, 0x1234. The simple fix is to the requirement HasV6T2
to the InstAlias. Tests added to not-armv4.s.
Patch by Peter Smith.
llvm-svn: 269761