Commit Graph

40232 Commits

Author SHA1 Message Date
Nikolai Bozhenov bb64aa14a3 [x86] Minor refactoring of X86TargetLowering::EmitInstrWithCustomInserter
Move the definitions of three variables out of the switch.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25192

llvm-svn: 287874
2016-11-24 13:15:49 +00:00
Nikolai Bozhenov a2dabed3b6 [x86] Rewrite getAddressFromInstr helper function
- It does not modify the input instruction
- Second operand of any address is always an Index Register,
  make sure we actually check for that, instead of a check for
  an immediate value

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D24938

llvm-svn: 287873
2016-11-24 13:05:43 +00:00
Simon Pilgrim a3af79678e [X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI
Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions.

This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.

Differential Revision: https://reviews.llvm.org/D27072

llvm-svn: 287870
2016-11-24 12:13:46 +00:00
Matt Arsenault 7b54dd039e AMDGPU: Preserve m0 value when spilling
llvm-svn: 287844
2016-11-24 00:26:50 +00:00
Matt Arsenault 94b32ffe8e TRI: Add hook to pass scavenger during frame elimination
The scavenger was not passed if requiresFrameIndexScavenging was
enabled. I need to be able to test for the availability of an
unallocatable register here, so I can't create a virtual register for
it.

It might be better to just always use the scavenger and stop
creating virtual registers.

llvm-svn: 287843
2016-11-24 00:26:47 +00:00
Matt Arsenault 5ee3325358 AMDGPU: Remove m0 spilling code
Since m0 isn't allocatable it should never be spilled anymore.

llvm-svn: 287842
2016-11-24 00:26:44 +00:00
Matt Arsenault 9e5c7b1031 AMDGPU: Make m0 unallocatable
m0 may need to be written for spill code, so
we don't want general code uses relying on the
value stored in it.

This introduces a few code quality regressions where copies
from m0 are not coalesced into copies of a copy of m0.

llvm-svn: 287841
2016-11-24 00:26:40 +00:00
Simon Pilgrim 3ce6a545c7 [X86][SSE] Add awareness of (v)cvtpd2dq and vcvtpd2udq implicit zeroing of upper 64-bits of xmm result
We've already added the equivalent for (v)cvttpd2dq (rL284459) and vcvttpd2udq

llvm-svn: 287835
2016-11-23 22:35:06 +00:00
Matt Arsenault a24d84beb9 AMDGPU: Cleanup immediate folding code
Move code down to use, reorder to avoid hard to follow
immediate folding logic.

llvm-svn: 287818
2016-11-23 21:51:07 +00:00
Matt Arsenault 391c3ea9bc AMDGPU: Fix debug printing
The uint8_t was printed as a char which didn't really work.

llvm-svn: 287817
2016-11-23 21:51:05 +00:00
Matt Arsenault 997a9abf4c AMDGPU: Fix not setting kill flag on temp reg when spilling
llvm-svn: 287808
2016-11-23 21:00:12 +00:00
Matt Arsenault dd0cb2a3e5 AMDGPU: Fix adding extra implicit def of register
In the scalar case, there's no reason to add an additional
def of the same register.

llvm-svn: 287807
2016-11-23 21:00:10 +00:00
Matt Arsenault 2669a76f01 AMDGPU: Fix MMO when splitting spill
The size and offset were wrong. The size of the object was
being used for the size of the access, when here it is really
being split into 4-byte accesses. The underlying object size
is set in the MachinePointerInfo, which also didn't have the
offset set.

llvm-svn: 287806
2016-11-23 20:52:53 +00:00
Michael Kuperstein 47eb85a003 [X86] Allow folding of stack reloads when loading a subreg of the spilled reg
We did not support subregs in InlineSpiller:foldMemoryOperand() because targets
may not deal with them correctly.

This adds a target hook to let the spiller know that a target can handle
subregs, and actually enables it for x86 for the case of stack slot reloads.
This fixes PR30832.

Differential Revision: https://reviews.llvm.org/D26521

llvm-svn: 287792
2016-11-23 18:33:49 +00:00
Nemanja Ivanovic 10fc3cfc63 [PowerPC] Remove InstAlias definitions that cause incorrect assembly
In rL283190, I added some InstAlias definitions to generate extended mnemonics
for some uses of the XXPERMDI instruction. However, when the assembler matches
these extended mnemonics, it matches the new instruction in situations where it
should match the old one.
This patch removes these definitions and accomplishes that by defining these
mnemonics with additional instructions that are isCodeGenOnly.

Fixes PR31127.

llvm-svn: 287765
2016-11-23 15:51:52 +00:00
Simon Pilgrim 4e9b9cbee9 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Simon Pilgrim 03cd8f887c [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs
llvm-svn: 287760
2016-11-23 13:42:09 +00:00
Craig Topper f57e17def0 [AVX-512] Remove intrinsics for valignd/q and autoupgrade them to native shuffles.
llvm-svn: 287744
2016-11-23 06:54:55 +00:00
Zvi Rackover 14aba43ea9 [X86] Simplify lowerVectorShuffleAsBitMask to handle only integer VT's
Summary: This function is only called with integer VT arguments, so remove code that handles FP vectors.

Reviewers: RKSimon, craig.topper, delena, andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26985

llvm-svn: 287743
2016-11-23 06:45:25 +00:00
Kuba Mracek 06995e866b [xray] Add XRay support for Mach-O in CodeGen
Currently, XRay only supports emitting the XRay table (xray_instr_map) on ELF binaries. Let's add Mach-O support.

Differential Revision: https://reviews.llvm.org/D26983

llvm-svn: 287734
2016-11-23 02:07:04 +00:00
Matthias Braun 7f423442d1 TargetSubtargetInfo: Move implementation to lib/CodeGen; NFC
TargetSubtargetInfo is filled with CodeGen specific interfaces nowadays
(getInstrInfo(), getFrameLowering(), getSelectionDAGInfo()) most of the
tuning flags like enablePostRAScheduler(), getAntiDepBreakMode(),
enableRALocalReassignment(), ... also do not seem to be universal enough
to make sense outside of CodeGen.

Differential Revision: https://reviews.llvm.org/D26948

llvm-svn: 287708
2016-11-22 22:09:03 +00:00
Simon Dardis 6efb8dd2e3 [mips] seb, seh instruction aliases
Add the single operand form.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D26961

llvm-svn: 287681
2016-11-22 19:17:23 +00:00
Nemanja Ivanovic b8e30d6db6 [PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE
This patch corresponds to review:
https://reviews.llvm.org/D26861

It also fixes PR30730.

Committing on behalf of Lei Huang.

llvm-svn: 287679
2016-11-22 19:02:07 +00:00
Simon Pilgrim 4aa876ca7c [X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles.
This occurs during UINT_TO_FP v2f64 lowering. 

We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle

llvm-svn: 287676
2016-11-22 17:50:06 +00:00
Vasileios Kalintiris 04dc211e6a [mips] Add support for unaligned load/store macros.
Add missing unaligned store macros (ush/usw) and fix the exisiting
implementation of the unaligned load macros in order to generate
identical expansions with the GNU assembler.

llvm-svn: 287646
2016-11-22 16:43:49 +00:00
Tim Northover b64fb453ea CodeGen: simplify TargetMachine::getSymbol interface. NFC.
No-one actually had a mangler handy when calling this function, and
getSymbol itself went most of the way towards getting its own mangler
(with a local TLOF variable) so forcing all callers to supply one was
just extra complication.

llvm-svn: 287645
2016-11-22 16:17:20 +00:00
Zvi Rackover 9a355219d1 [X86] Change lowerBuildVectorToBitOp() to take a BuildVectorSDNode. NFC.
llvm-svn: 287644
2016-11-22 15:33:28 +00:00
Zvi Rackover 0aa1c32d14 [X86] Remove dead code from LowerVectorBroadcast
Summary: Splat vectors are canonicalized to BUILD_VECTOR's so the code can be simplified. NFC-ish.

Reviewers: craig.topper, delena, RKSimon, andreadb

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D26678

llvm-svn: 287643
2016-11-22 15:17:52 +00:00
Chad Rosier ecc77273a0 [AArch64] Set the max interleave factor for Falkor.
llvm-svn: 287642
2016-11-22 14:25:02 +00:00
Chad Rosier 2abc29c593 [AArch64] Maximize 80-column. NFC.
llvm-svn: 287640
2016-11-22 14:12:09 +00:00
Coby Tayree 49b3733d57 [AVX512][inline-asm] Fix AVX512 inline assembly instruction resolution when the size qualifier of a memory operand is not specified explicitly.
This commit handles cases where the size qualifier of an indirect memory reference operand in Intel syntax is missing (e.g. "vaddps xmm1, xmm2, [a]").

GCC will deduce the size qualifier for AVX512 vector and broadcast memory operands based on the possible matches:
"vaddps xmm1, xmm2, [a]" matches only “XMMWORD PTR” qualifier.
"vaddps xmm1, xmm2, [a]{1to4}" matches only “DWORD PTR” qualifier.

This is different from the current behavior of LLVM, which deduces the size qualifier based on the size of the memory operand.
For "vaddps xmm1, xmm2, [a]"
"char a;" will imply "BYTE PTR" qualifier
"short a;" will imply "WORD PTR" qualifier.

This commit aligns LLVM to GCC’s behavior.

This is the LLVM part of the review.
The Clang part of the review: https://reviews.llvm.org/D26587

Differential Revision: https://reviews.llvm.org/D26586

llvm-svn: 287630
2016-11-22 09:30:29 +00:00
Craig Topper 3dcf45f08d [X86] Remove alternate CodeGenOnly version of (v)movq that declared the load size as i128mem. Change all uses to the use the i64mem version.
I'm sure this caused the load size to misprint in Intel syntax output. We were also inconsistent about which patterns used which instruction between VEX and EVEX.

There are two different reg/reg versions of movq, one from a GPR and one from the lower 64-bits of an XMM register. This changes the loading folding table to use the single i64mem memory form for folding both cases. But we need to use TB_NO_REVERSE to prevent a duplicate entry in the unfolding table.

llvm-svn: 287622
2016-11-22 05:31:43 +00:00
Craig Topper cada9f2275 [AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD).
Summary:
The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands are the one that can load from memory so this can't be used to increase memory folding opportunities.

We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too.

Reviewers: igorb, delena, Ayal, Farhana, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25652

llvm-svn: 287621
2016-11-22 04:57:34 +00:00
Craig Topper da22267055 [AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type
Summary:
Shuffle lowering widens the element size of a shuffle if elements are contiguous. This is sometimes help because wider element types have more shuffle options. If the shuffle is one of the arguments to a vselect this shuffle widening can introduce a bitcast between the vselect and the shuffle. This will prevent isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size to match the vselect type we should change the shuffle type to allow masking.

This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018.

I plan to add support for more operations in future patches.

Reviewers: RKSimon, zvi, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26902

llvm-svn: 287612
2016-11-22 03:51:53 +00:00
Stanislav Mekhanoshin ae0f6620e4 [AMDGPU] Fix multiple vreg definitions in si-lower-control-flow
Differential Revision: https://reviews.llvm.org/D26939

llvm-svn: 287608
2016-11-22 01:42:34 +00:00
Geoff Berry e0bf52f394 [AArch64LoadStoreOptimizer] Don't treat write to XZR/WZR as a clobber.
Summary:
When searching for load/store instructions to pair/merge don't treat
writes to WZR/XZR as clobbers since they don't change the value read
from WZR/XZR (which is always 0).

Reviewers: mcrosier, junbuml, jmolloy, t.p.northover

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D26921

llvm-svn: 287592
2016-11-21 22:51:10 +00:00
Simon Dardis 43115a1ce4 [mips] seq macro support
This patch adds the seq macro.

This partially resolves PR/30381.

Thanks to Sean Bruno for reporting the issue!

Reviewers: zoran.jovanovic, vkalintiris, seanbruno

Differential Revision: https://reviews.llvm.org/D24607

llvm-svn: 287573
2016-11-21 20:30:41 +00:00
Coby Tayree 94ddbb4a04 small fixup which enables the issuing of the aforementioned instruction (w/o operands), on MS/Intel syntax.
Differential Revision: https://reviews.llvm.org/D26913

llvm-svn: 287548
2016-11-21 15:50:56 +00:00
Simon Pilgrim b7bbaa669b [X86][SSE] Allow PACKSS to be used to truncate any type of all/none sign bits input
At the moment we only use truncateVectorCompareWithPACKSS with direct vector comparison results (just one example of a known all/none signbits input).

This change relaxes the direct matching of a SETCC opcode by moving the logic up into SelectionDAG::ComputeNumSignBits and accepting any input with a known splatted signbit.

llvm-svn: 287535
2016-11-21 12:05:49 +00:00
Michael Zuckerman 8462faeaba Fixing a small typo (A->U).
This seem to fixes PR30992.

-         HasAVX512 ? X86::VMOVAPSZ128rm_NOVLX 
+         HasAVX512 ? X86::VMOVUPSZ128rm_NOVLX 

llvm-svn: 287532
2016-11-21 11:52:11 +00:00
Craig Topper 9f2d632ee7 [AVX-512] Add EVEX form of VMOVZPQILo2PQIZrm to load folding tables to match SSE and AVX.
llvm-svn: 287523
2016-11-21 07:51:31 +00:00
Alexei Starovoitov 7ab125dbf3 [bpf] fix dwarf elf relocs and line numbers
- teach RelocVisitor to recognize bpf relocations
- fix AsmInfo->PointerSize to make sure dwarf is emitted correctly
- add a test for the above

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 287521
2016-11-21 06:21:23 +00:00
Craig Topper 0dfc09372f [X86] Remove duplicate instructions for (v)movq and replace with patterns on other instructions. NFC
llvm-svn: 287519
2016-11-21 04:07:56 +00:00
Dean Michael Berris 31761f300d [XRay][AArch64] Implemented a test for the compile-time sleds emitted, and fixed a bug in the jump instruction
This patch adds a test for the assembly code emitted with XRay
instrumentation. It also fixes a bug where the operand of a jump
instruction must be not the number of bytes to jump over, but rather the
number of 4-byte instructions.

Author: rSerge

Reviewers: dberris, rengolin

Differential Revision: https://reviews.llvm.org/D26805

llvm-svn: 287516
2016-11-21 03:01:43 +00:00
Simon Dardis 1dcb911061 [mips] Restrict tail call optimization
The tail call optimization was being used without proper consideration of
ABI requirements for saving and restoring the GP. This patch restricts tail
call optimization to functions within the same translation unit.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D24763

llvm-svn: 287505
2016-11-20 21:23:08 +00:00
Coby Tayree 99a6639047 The 'vpmultishiftqb' instruction was implemented falsely, this patch amend it.
More specifically - (MS dialect) broadcasting variants were implemented falsely.

Differential Revision: https://reviews.llvm.org/D26257

llvm-svn: 287501
2016-11-20 17:19:55 +00:00
Coby Tayree 97e9cf62f4 Some instructions were missing, other implemented falsely. this patch aims at amending those issues. full list:
vcvtps2pd
vcvtudq2pd
vcvtps2qq
vcvttps2qq
vcvtps2uqq
vcvttps2uqq

variants are:

[Dst]XMM(zero-masked/merge-masked/unmasked)
[Src]Mem64

Differential Revision: https://reviews.llvm.org/D26799

llvm-svn: 287500
2016-11-20 17:09:56 +00:00
Simon Pilgrim 5fadce4a3f [X86][AVX512] Combine unary + zero target shuffles to VPERMV3 with a zero vector where possible
llvm-svn: 287497
2016-11-20 16:11:36 +00:00
Simon Pilgrim 5401bae523 [X86][AVX512] Add support for VBMI VPERMV3 target shuffle combines
llvm-svn: 287496
2016-11-20 15:24:38 +00:00
Simon Pilgrim 3f40412e0f [X86][AVX512] Add support for VBMI VPERMV target shuffle combines
llvm-svn: 287495
2016-11-20 15:05:45 +00:00