Commit Graph

16639 Commits

Author SHA1 Message Date
Matthias Braun 84fd4bee6c RegScavenging: Add scavengeRegisterBackwards()
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.

This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

llvm-svn: 276044
2016-07-19 22:37:09 +00:00
Evandro Menezes 238fa76574 [AArch64] Properly validate the reciprocal estimation.
Add check for legal data types when expanding into a Newton series.

Differential Revision: https://reviews.llvm.org/D22267

llvm-svn: 276041
2016-07-19 22:31:11 +00:00
Ahmed Bougacha 5a59b24bdd [GlobalISel] Mark newly-created gvregs as having a bank.
Also verify that we never try to set the size of a vreg associated
to a register class.

Report an error when we encounter that in MIR. Fix a testcase that
hit that error and had a size for no reason.

llvm-svn: 276012
2016-07-19 19:48:36 +00:00
Simon Pilgrim 5366d0e0bc [X86][AVX512] Added AVX512 subvector broadcast tests
llvm-svn: 275994
2016-07-19 17:04:28 +00:00
Simon Pilgrim f2d02cb0f6 [X86][AVX] Fixed typo in test names
llvm-svn: 275992
2016-07-19 16:52:05 +00:00
Simon Pilgrim 0ea8d275cc [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.

It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).

This patch changes both scalar and packed versions back to using x86-specific builtins.

It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.

A companion clang patch is at D22105

Differential Revision: https://reviews.llvm.org/D22106

llvm-svn: 275981
2016-07-19 15:07:43 +00:00
Sam Parker 6ca4bbb00d [ARM] Refactor Thumb2 Mul and Mla instr descs
Recommitting after r274347 was reverted. This patch introduces some
classes to refactor the 3 and 4 register Thumb2 multiplication
instruction descriptions, plus improved tests for some of those
instructions.

Differential Revision: https://reviews.llvm.org/D21929

llvm-svn: 275979
2016-07-19 14:44:05 +00:00
Simon Pilgrim b87a21f1c3 [AARCH64] Fix linu triple typo
As promised in D22191

llvm-svn: 275976
2016-07-19 14:12:45 +00:00
Simon Pilgrim fc4d4b251d [AARCH64] Enable AARCH64 lit tests on windows dev machines
As discussed on PR27654, this patch fixes the triples of a lot of aarch64 tests and enables lit tests on windows

This will hopefully help stop cases where windows developers break the aarch64 target

Differential Revision: https://reviews.llvm.org/D22191

llvm-svn: 275973
2016-07-19 13:35:11 +00:00
Daniel Sanders 6a73883c48 [mips] Correct label prefixes for N32 and N64.
Summary:
N32 and N64 follow the standard ELF conventions (.L) whereas O32 uses its own
($).

This fixes the majority of object differences between -fintegrated-as and
-fno-integrated-as.

Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: https://reviews.llvm.org/D22412

llvm-svn: 275967
2016-07-19 10:49:03 +00:00
Elena Demikhovsky 2c0780b8e5 AVX-512: Fixed BT instruction selection.
The following condition expression ( a >> n) & 1 is converted to "bt a, n" instruction. It works on all intel targets.
But on AVX-512 it was broken because the expression is modified to (truncate (a >>n) to i1).

I added the new sequence (truncate (a >>n) to i1) to the BT pattern.

Differential Revision: https://reviews.llvm.org/D22354

llvm-svn: 275950
2016-07-19 07:14:21 +00:00
Craig Topper d6ca1dc45e [AVX512] Give priority to EVEX encoded PSHUFB over the VEX versions.
llvm-svn: 275942
2016-07-19 02:00:38 +00:00
Matt Arsenault cb540bc03c AMDGPU: Expand register indexing pseudos in custom inserter
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

llvm-svn: 275934
2016-07-19 00:35:03 +00:00
Matt Arsenault 50b76399ed AMDGPU: Fix test name and broken CHECK-LABEL
llvm-svn: 275928
2016-07-18 23:09:51 +00:00
Artem Belevich 9f97dcb018 [NVPTX] Make sure we adjust alignment at all call sites
.. including calls from kernel functions that were
ignored by mistake before.

llvm-svn: 275920
2016-07-18 21:58:48 +00:00
Artem Belevich 052b1ed2fd [NVPTX] Force minimum alignment of 4 for byval arguments of device-side functions.
Taking address of a byval variable in PTX is legal, but currently runs
into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042).
Work around the issue by enforcing minimum alignment on byval arguments
of device functions.

The change is a no-op on SASS level for sm_3x where ptxas already aligns
local copy by at least 4.

Differential Revision: https://reviews.llvm.org/D22428

llvm-svn: 275893
2016-07-18 19:54:56 +00:00
Vitaly Buka c93e10fcbb Revert "[ARM] Skip inline asm memory operands in DAGToDAGISel"
Breaks asan, see https://reviews.llvm.org/D22103

This reverts commit r275776.

llvm-svn: 275890
2016-07-18 19:44:01 +00:00
Vitaly Buka fa474e3eb9 Revert "[ARM] Update test to use CHECK-LABEL. NFCI."
Breaks asan, see https://reviews.llvm.org/D22103

This reverts commit r275777.

llvm-svn: 275889
2016-07-18 19:43:58 +00:00
Simon Pilgrim 069c732f82 [X86][SSE] Regenerate extraction from promotion test
Added tests for SSE2 as well as SSE41

llvm-svn: 275878
2016-07-18 18:53:15 +00:00
Simon Pilgrim a68b8df3a7 [X86][SSE] Regenerate extraction+store memop tests
Added tests for SSE2 as well as SSE41+AVX

llvm-svn: 275876
2016-07-18 18:44:01 +00:00
Simon Pilgrim b21b47ba61 [X86][SSE] Regenerate truncate+extension memop tests
Added tests for SSE2 as well as SSE41

llvm-svn: 275875
2016-07-18 18:42:33 +00:00
Simon Pilgrim 600baaed89 Regenerate test
llvm-svn: 275872
2016-07-18 18:38:51 +00:00
Matt Arsenault c96e1deffa AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32
llvm-svn: 275871
2016-07-18 18:35:05 +00:00
Matt Arsenault 4c519d3518 AMDGPU/R600: Replace barrier intrinsics
llvm-svn: 275870
2016-07-18 18:34:59 +00:00
Matt Arsenault efb24540b1 AMDGPU: Remove dead check in AMDGPUPromoteAlloca
This is currently only called with GEP users. A direct
alloca would only happen with current typed pointers
for arrays which are a perverse case.

Also fix crashes on 0 x and 1 x arrays.

llvm-svn: 275869
2016-07-18 18:34:53 +00:00
Tim Northover 918f05063c CodeGenPrep: use correct function to determine Global's alignment.
Elsewhere (particularly computeKnownBits) we assume that a global will be
aligned to the value returned by Value::getPointerAlignment. This is used to
boost the alignment on memcpy/memset, so any target-specific request can only
increase that value.

llvm-svn: 275866
2016-07-18 18:28:52 +00:00
Simon Pilgrim c941f6b329 [X86][AVX] Add target shuffle decode support for VBROADCAST
Currently we only decode broadcasts from a vector of the same size.

llvm-svn: 275823
2016-07-18 17:32:59 +00:00
Krzysztof Parzyszek 5948ea78b9 [Hexagon] Handle returning small structures by value
This is compliant with the official ABI, but allows experimentation with
calling conventions.

llvm-svn: 275822
2016-07-18 17:30:41 +00:00
Chih-Hung Hsieh 4d9f2c154d [X86] Accept SELECT op code for x86-64 fp128 type
DAGTypeLegalizer::CanSkipSoftenFloatOperand should allow
SELECT op code for x86_64 fp128 type for MME targets,
so SoftenFloatOperand does not abort on SELECT op code.

Differential Revision: http://reviews.llvm.org/D21758

llvm-svn: 275818
2016-07-18 17:20:09 +00:00
Simon Pilgrim 4ac7420618 [X86][AVX2] Added tests that demonstrate duplicate broadcasts
We don't yet decode broadcasts as a target shuffle

llvm-svn: 275808
2016-07-18 16:17:34 +00:00
Krzysztof Parzyszek 786333ffcc [Hexagon] Enable .cur formation in MISched for Hexagon V60
Schedule a load and its use in the same packet in MISched. Previously,
isResourceAvailable was returning false for dependences in the same
packet, which prevented MISched from packetizing a load and its use in
the same packet for v60.

Patch by Ikhlas Ajbar.

llvm-svn: 275804
2016-07-18 16:05:27 +00:00
Nemanja Ivanovic d3c284f645 [PowerPC] Remove redundant direct moves when extracting integers and converting to FP
This patch corresponds to review:
https://reviews.llvm.org/D21354

We use direct moves for extracting integer elements from vectors. We also use
direct moves when converting integers to FP. When these operations are chained,
we get a direct move out of a VSR followed by a direct move back into a VSR.
These are redundant - all we need to do is line up the element and convert.

llvm-svn: 275796
2016-07-18 15:30:00 +00:00
Krzysztof Parzyszek 393b37937b [Hexagon] Use timing class info as tie-breaker in machine scheduler
Patch by Sirish Pande.

llvm-svn: 275794
2016-07-18 15:17:10 +00:00
Krzysztof Parzyszek 3467e9d0a9 [Hexagon] HexagonMachineScheduler should account for resources
The machine scheduler needs to account for available resources
more accurately in order to avoid scheduling an instruction that
forces a new packet to be created.

This occurs in two ways: First, an instruction without an available
resource may have a large priority due to other metrics and be
scheduled when there are other instructions with available resources.
Second, an instruction with a non-zero latency may become available
prematurely. In both these cases, we attempt change the priority
in order to allow a better instruction to be scheduled.

Patch by Brendon Cahoon.

llvm-svn: 275793
2016-07-18 14:52:13 +00:00
Krzysztof Parzyszek 748d3efec6 [Hexagon] Fix zero latency instructions with multiple predecessors
An instruction may have multiple predecessors that are candidates
for using .cur. However, only one of them can use .cur in the
packet. When this case occurs, we need to make sure that only
one of the dependences gets a 0 latency value.

Patch by Brendon Cahoon.

llvm-svn: 275790
2016-07-18 14:23:10 +00:00
Simon Dardis d32a2d30cb [inlineasm] Propagate operand constraints to the backend
When SelectionDAGISel transforms a node representing an inline asm
block, memory constraint information is not preserved. This can cause
constraints to be broken when a memory offset is of the form:

offset + frame index

when the frame is resolved.

By propagating the constraints all the way to the backend, targets can
enforce memory operands of inline assembly to conform to their constraints.

For MIPSR6, some instructions had their offsets reduced to 9 bits from
16 bits such as ll/sc. This becomes problematic when using inline assembly
to perform atomic operations, as an offset can generated that is too big to
encode in the instruction.

Reviewers: dsanders, vkalintris

Differential Review: https://reviews.llvm.org/D21615

llvm-svn: 275786
2016-07-18 13:17:31 +00:00
Nicolai Haehnle bef1ceb815 AMDGPU: Disable AMDGPUPromoteAlloca pass for shader calling conventions.
Summary:
The work item intrinsics are not available for the shader
calling conventions. And even if we did hook them up most
shader stages haves some extra restrictions on the amount
of available LDS.

Reviewers: tstellarAMD, arsenm

Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D20728

llvm-svn: 275779
2016-07-18 09:02:47 +00:00
Diana Picus 6731f13458 [ARM] Update test to use CHECK-LABEL. NFCI.
llvm-svn: 275777
2016-07-18 07:48:42 +00:00
Diana Picus 73ed44d328 [ARM] Skip inline asm memory operands in DAGToDAGISel
The current logic for handling inline asm operands in DAGToDAGISel interprets
the operands by looking for constants, which should represent the flags
describing the kind of operand we're dealing with (immediate, memory, register
def etc). The operands representing actual data are skipped only if they are
non-const, with the exception of immediate operands which are skipped explicitly
when a flag describing an immediate is found.

The oversight is that memory operands may be const too (e.g. for device drivers
reading a fixed address), so we should explicitly skip the operand following a
flag describing a memory operand. If we don't, we risk interpreting that
constant as a flag, which is definitely not intended.

Fixes PR26038

Differential Revision: https://reviews.llvm.org/D22103

llvm-svn: 275776
2016-07-18 07:35:14 +00:00
Craig Topper a3c55f5915 [AVX512] Add EVEX versions of scalar ADD/SUB/MUL/DIV to load folding tables.
llvm-svn: 275775
2016-07-18 06:49:32 +00:00
Craig Topper 83613bb436 [X86] Fix test checks to include leading 'v' on avx mnemonic names.
llvm-svn: 275774
2016-07-18 06:49:29 +00:00
Diana Picus 774d157a5d [ARM] Honour ABI for rem under -O0 for EABI, GNUEABI, Android and Musl
At higher optimization levels, we generate the libcall for DIVREM_Ix, which is
fine: aeabi_{u|i}divmod. At -O0 we generate the one for REM_Ix, which is the
default {u}mod{q|h|s|d}i3.

This commit makes sure that we don't generate REM_Ix calls for ABIs that
don't support them (i.e. where we need to use DIVREM_Ix instead). This is
achieved by bailing out of FastISel, which can't handle non-double multi-reg
returns, and letting the legalization infrastructure expand the REM_Ix calls.

It also updates the divmod-eabi.ll test to run under -O0 as well, and adds some
Windows checks to it to make sure we don't break things for it.

Fixes PR27068

Differential Revision: https://reviews.llvm.org/D21926

llvm-svn: 275773
2016-07-18 06:48:25 +00:00
Craig Topper 1af6cc00dc [X86] Add VPADD instructions to X86InstrInfo::isAssociativeAndCommutative.
llvm-svn: 275769
2016-07-18 06:14:54 +00:00
Craig Topper ba9b93d7f2 [X86] Add floating point packed logical ops to X86InstrInfo::isAssociativeAndCommutative.
llvm-svn: 275768
2016-07-18 06:14:50 +00:00
Craig Topper 3a99de4067 [X86] Add AVX512 instructions to X86InstrInfo::isAssociativeAndCommutative.
llvm-svn: 275767
2016-07-18 06:14:47 +00:00
Craig Topper f7a06c29bc [X86] Add AVX512 load opcodes and a couple AVX load opcodes to X86InstrInfo::areLoadsFromSameBasePtr.
llvm-svn: 275765
2016-07-18 06:14:43 +00:00
Craig Topper 650a15e2b3 [X86] Add more opcodes to isFrameLoadOpcode/isFrameStoreOpcode. Mainly AVX-512 related.
llvm-svn: 275764
2016-07-18 06:14:39 +00:00
Craig Topper 5c913e84df [AVX512] Use VMOVAPSZ128rr/VMOVAPS256rr for VR128X/VR256X physreg moves when VLX is supported.
Ideally we would use VEX encoded moves instead of EVEX if the high 16 registers aren't referenced, but this a good first step.

llvm-svn: 275763
2016-07-18 06:14:34 +00:00
Simon Pilgrim 5aa90c55b6 [X86][AVX] Added VBROADCASTF128/VBROADCASTI128 tests
llvm-svn: 275713
2016-07-17 17:44:18 +00:00
Simon Pilgrim d1e941ae85 [X86] Regenerated ctlz/cttz scalar tests for 32/64-bit targets with/without LZCNT/TZCNT support
llvm-svn: 275710
2016-07-17 16:15:51 +00:00