Commit Graph

24155 Commits

Author SHA1 Message Date
Sanjay Patel 6124cae8f7 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, 
so replace a pair of casts with the equivalent node. We don't have to account for 
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909

llvm-svn: 328921
2018-03-31 17:55:44 +00:00
Simon Pilgrim 3b8ad346f9 [X86][Btver2] Add MMX_PSHUFB to the JWritePSHUFB InstRW entries
llvm-svn: 328918
2018-03-31 09:15:54 +00:00
Puyan Lotfi 57c4f38c35 [MIR-Canon] Adding support for local idempotent instruction hoisting.
llvm-svn: 328915
2018-03-31 05:48:51 +00:00
Craig Topper 13a0f83a05 [X86] Add SchedRW for PMULLD
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.

This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.

I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.

Reviewers: RKSimon, GGanesh, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, gbedwell, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D44972

llvm-svn: 328914
2018-03-31 04:54:32 +00:00
Krzysztof Parzyszek 526fbf8e33 [Hexagon] Fix testcase
llvm-svn: 328899
2018-03-30 19:46:28 +00:00
Krzysztof Parzyszek 0f983d69a4 [Hexagon] Avoid creating invalid offsets in packetizer
Two memory instructions with a dependency only on the address register
between the two (the first one of them being post-incrememnt) can be
packetized together after the offset on the second was updated to the
incremement value. Make sure that the new offset is valid for the
instruction.

llvm-svn: 328897
2018-03-30 19:28:37 +00:00
Puyan Lotfi 399b46c98d [MIR] Adding support for Named Virtual Registers in MIR.
llvm-svn: 328887
2018-03-30 18:15:54 +00:00
Stanislav Mekhanoshin 74e2974ac6 [AMDGPU] Fixed some instructions latencies
Differential Revision: https://reviews.llvm.org/D45073

llvm-svn: 328874
2018-03-30 16:19:13 +00:00
Sanjay Patel e09b7dcf3d [SelectionDAG] Removing FABS folding from DAGCombiner
The code has bugs dealing with -0.0.

Since D44550 introduced FABS pattern folding in InstCombine, 
this patch removes the now-redundant code that causes 
https://bugs.llvm.org/show_bug.cgi?id=36600.

Patch by Mikhail Dvoretckii!

Differential Revision: https://reviews.llvm.org/D44683

llvm-svn: 328872
2018-03-30 15:42:52 +00:00
Krzysztof Parzyszek 46abcb236b [Hexagon] Fix printing :mem_noshuf on compiler-generated packets
llvm-svn: 328869
2018-03-30 15:09:05 +00:00
Michael Bedy 59e5ef793c [AMDGPU] Fix the SDWA Peephole phase to handle src for dst:UNUSED_PRESERVE.
Summary:
The phase attempts to transform operations that extract a portion of a value
into an SDWA src operand in cases where that value is used only once. It
was not prepared for this use to be the preserved portion of a value for
dst:UNUSED_PRESERVE, resulting in a crash or assert.

This change either rejects the illegal SDWA attempt, or in the case where
dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded
extract instruction.

Reviewers: arsenm, #amdgpu

Reviewed By: arsenm, #amdgpu

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D44364

llvm-svn: 328856
2018-03-30 05:03:36 +00:00
Eli Friedman 208fe67a78 [MachineCopyPropagation] Handle COPY with overlapping source/dest.
MachineCopyPropagation::CopyPropagateBlock has a bunch of special
handling for COPY instructions. This handling assumes that COPY
instructions do not modify the source of the copy; this is wrong if
the COPY destination overlaps the source.

To fix the bug, check explicitly for this situation, and fall back to
the generic instruction handling.

This bug can't happen for most register classes because they don't
have this sort of overlap, but there are a few register classes
where this is possible. The testcase uses the AArch64 QQQQ register
class.

Differential Revision: https://reviews.llvm.org/D44911

llvm-svn: 328851
2018-03-30 00:56:03 +00:00
Matt Arsenault 03ae399d50 AMDGPU: Support realigning stack
While the stack access instructions don't care about
alignment > 4, some transformations on the pointer calculation
do make assumptions based on knowing the low bits of a pointer
are 0. If a stack object ends up being accessed through its
absolute address (relative to the kernel scratch wave offset),
the addressing expression may depend on the stack frame being
properly aligned. This was breaking in a testcase due to the
add->or combine.

I think some of the SP/FP handling logic is still backwards,
and overly simplistic to support all of the stack features.
Code which tries to modify the SP with inline asm for example
or variable sized objects will probably require redoing this.

llvm-svn: 328831
2018-03-29 21:30:06 +00:00
Craig Topper 89310f56c8 [X86] Correct the placement of ReadAfterLd in BEXTR and BZHI. Add dedicated SchedRW for BEXTR/BZHI.
These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd

Differential Revision: https://reviews.llvm.org/D44838

llvm-svn: 328823
2018-03-29 20:41:39 +00:00
Matt Arsenault ffb132e74b AMDGPU: Increase default stack alignment
8 and 16-byte values are common, so increase the default
alignment to avoid realigning the stack in most functions.

llvm-svn: 328821
2018-03-29 20:22:04 +00:00
Matt Arsenault 6c041a3cab AMDGPU: Fix selection error on constant loads with < 4 byte alignment
llvm-svn: 328818
2018-03-29 19:59:28 +00:00
Paul Robinson 407ff1b1cd Try to fix a couple tests for Windows.
llvm-svn: 328814
2018-03-29 18:59:33 +00:00
Paul Robinson b271f31d8d Reapply "[DWARFv5] Emit file 0 to the line table."
DWARF v5 specifies that the root file (also given in the DW_AT_name
attribute of the compilation unit DIE) should be emitted explicitly to
the line table's list of files.  This makes the line table more
independent of the .debug_info section.
We emit the new syntax only for DWARF v5 and later.

Fixes the bug found by asan. Also XFAIL the new test for Darwin, which
is stuck on DWARF v2, and fix up other tests so they stop failing on
Windows.  Last but not least, don't break "clang -g" of an assembler
file that has .file directives in it.

Differential Revision: https://reviews.llvm.org/D44054

llvm-svn: 328805
2018-03-29 17:16:41 +00:00
Krzysztof Parzyszek dc7a557e6a [Hexagon] Add support to handle bit-reverse load intrinsics
Patch by Sumanth Gundapaneni.

llvm-svn: 328774
2018-03-29 13:52:46 +00:00
Jun Bum Lim f90fe701ef [PostRAMachineSink] preserve CFG
Summary: Mark CFG is preserved  since this pass do not make any change in CFG.

Reviewers: sebpop, mzolotukhin, mcrosier

Reviewed By: mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44845

llvm-svn: 328727
2018-03-28 19:56:26 +00:00
Krzysztof Parzyszek 440ba3ae5c [Hexagon] Add support for "new" circular buffer intrinsics
These instructions have been around for a long time, but we
haven't supported intrinsics for them. The "new" versions use
the CSx register for the start of the buffer instead of the K
field in the Mx register.

We need to use pseudo instructions for these instructions until
after register allocation. The problem is that these instructions
allocate a M0/CS0 or M1/CS1 pair. But, we can't generate code for
the CSx set-up until after register allocation when the Mx
register has been fixed for the instruction.

There is a related clang patch.

Patch by Brendon Cahoon.

llvm-svn: 328724
2018-03-28 19:38:29 +00:00
Jessica Paquette 4aa14dbcc2 [MachineOutliner] Simplify call outlining + require valid callee save info for call outlining
This commit simplifies the call outlining logic by removing references to the
Function associated with the callee. To do this, it requires that valid
callee save info is available to the outliner.

llvm-svn: 328719
2018-03-28 17:52:31 +00:00
Simon Pilgrim 7237e0cf39 [X86][AVX2] Add shuffle test case from PR36933
llvm-svn: 328714
2018-03-28 16:48:48 +00:00
Alexander Potapenko 202f809437 Revert "Reapply "[DWARFv5] Emit file 0 to the line table.""
This reverts commit r328676.

Commit r328676 broke the -no-integrated-as flag necessary to build Linux kernel with Clang:

$ cat t.c
void foo() {}
$ clang -no-integrated-as   -c  t.c -g
/tmp/t-dcdec5.s: Assembler messages:
/tmp/t-dcdec5.s:8: Error: file number less than one
clang-7.0: error: assembler command failed with exit code 1 (use -v to see invocation)

llvm-svn: 328699
2018-03-28 12:36:46 +00:00
Tim Renouf cdac172e2a Revert "[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader"
This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981.

It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a
release-with-asserts build. I will resubmit the change when I have fixed
that.

Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a
llvm-svn: 328695
2018-03-28 11:21:07 +00:00
Christof Douma a1e77c0e02 [ARM] Support float literals under XO
Follow up patch of r328313 to support the UseVMOVSR constraint. Removed
some unneeded instructions from the test and removed some stray
comments.

Differential Revision: https://reviews.llvm.org/D44941

llvm-svn: 328691
2018-03-28 10:02:26 +00:00
Mikael Holmen 6c062b7641 [RegisterCoalescing] Don't move COPY if it would interfere with another value
Summary:
RegisterCoalescer::removePartialRedundancy tries to hoist B = A from
BB0/BB2 to BB1:

  BB1:
       ...
  BB0/BB2:  ----
       B = A;   |
       ...      |
       A = B;   |
         |-------
         |

It does so if a number of conditions are fulfilled. However, it failed
to check if B was used by any of the terminators in BB1. Since we must
insert B = A before the terminators (since it's not a terminator itself),
this means that we could erroneously insert a new definition of B before a
use of it.

Reviewers: wmi, qcolombet

Reviewed By: wmi

Subscribers: MatzeB, llvm-commits, sdardis

Differential Revision: https://reviews.llvm.org/D44918

llvm-svn: 328689
2018-03-28 06:01:30 +00:00
Sanjay Patel bb33007b25 [AArch64] add ftrunc tests; NFC
As suggested in D44909.

llvm-svn: 328683
2018-03-28 00:56:00 +00:00
Sanjay Patel 594c1546f1 [PowerPC] add ftrunc vector tests; NFC
Baseline tests for vectors as suggested in D44909.

llvm-svn: 328682
2018-03-28 00:49:12 +00:00
Paul Robinson 07480bd177 Reapply "[DWARFv5] Emit file 0 to the line table."
DWARF v5 specifies that the root file (also given in the DW_AT_name
attribute of the compilation unit DIE) should be emitted explicitly to
the line table's list of files.  This makes the line table more
independent of the .debug_info section.

Fixes the bug found by asan. Also XFAIL the new test for Darwin, which
is stuck on DWARF v2, and fix up other tests so they stop failing on
Windows.  Last but not least, don't break "clang -g" of an assembler
file that has .file directives in it.

Differential Revision: https://reviews.llvm.org/D44054

llvm-svn: 328676
2018-03-27 22:40:34 +00:00
Jessica Paquette 2519ee7081 [MachineOutliner] AArch64: Don't outline ADRPs with un-outlinable operands
If an ADRP appears with, say, a CPI operand, we shouldn't outline it.

This moves the check for unsafe operands so that it occurs before the special-case
for ADRPs. Also add a test for outlining ADRPs.

llvm-svn: 328674
2018-03-27 22:23:48 +00:00
Tim Renouf e4208bfa5b [AMDGPU] For OS type AMDPAL, fixed scratch on compute shader
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).

This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.

Reviewers: kzhuravl, nhaehnle, timcorringham

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm

Differential Revision: https://reviews.llvm.org/D44468

Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
llvm-svn: 328673
2018-03-27 21:35:00 +00:00
Paul Robinson 7cb26ad2ef [DWARF] Suppress split line tables more carefully.
If a given split type unit does not have source locations, don't have
it refer to the split line table.
If no split type unit refers to the split line table, don't emit the
line table at all.

This will save a little space on rare occasions, but also refactors
things a bit to improve which class is responsible for what.

Responding to review comments on r326395.

Differential Revision: https://reviews.llvm.org/D44220

llvm-svn: 328670
2018-03-27 21:28:59 +00:00
Tim Renouf 4db0960420 [CodeGen] Fixed unreachable with -print-machineinstrs and custom pseudo source value
Summary:
Rev 327580 "[CodeGen] Use MIR syntax for MachineMemOperand printing"
broke -print-machineinstrs for us on AMDGPU, because we have custom
pseudo source values, and MIR serialization does not implement that.

This commit at least restores the functionality of -print-machineinstrs,
even if it does not properly implement the missing MIR serialization
functionality.

Differential Revision: https://reviews.llvm.org/D44871

Change-Id: I44961c0b90bf6d48c01484ed7a4e466fd300db66
llvm-svn: 328668
2018-03-27 21:14:04 +00:00
Simon Pilgrim a2f26788a3 [X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes
Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer.

Differential Revision: https://reviews.llvm.org/D44924

llvm-svn: 328664
2018-03-27 20:38:54 +00:00
Matt Arsenault 17f3338015 AMDGPU: Fix not preserving CSR VGPR if used for SGPR spills
Before this was not done if the function had no calls in it. This
is still a possible issue with any callable function, regardless
of calls present.

llvm-svn: 328659
2018-03-27 19:42:55 +00:00
Matt Arsenault 0a0c871f60 AMDGPU: Fix crash when MachinePointerInfo invalid
The combine on a select of a load only triggers for
addrspace 0, and discards the MachinePointerInfo. The
conservative default needs to be used for this.

llvm-svn: 328652
2018-03-27 18:39:45 +00:00
Matt Arsenault 126a874952 AMDGPU: Fix register name format in tests
These were changed to match the asm output name a long time ago,
although I think the old tablegenerated names still work.

llvm-svn: 328651
2018-03-27 18:39:42 +00:00
Matt Arsenault e9f3679031 AMDGPU: Fix FP restore from being reordered with stack ops
In a function, s5 is used as the frame base SGPR. If a function
is calling another function, during the call sequence
it is copied to a preserved SGPR and restored.

Before it was possible for the scheduler to move stack operations
before the restore of s5, since there's nothing to associate
a frame index access with the restore.

Add an implicit use of s5 to the adjcallstack pseudo which ends
the call sequence to preven this from happening. I'm not 100%
satisfied with this solution, but I'm not sure what else would be
better.

llvm-svn: 328650
2018-03-27 18:38:51 +00:00
Krzysztof Parzyszek 0375cd46ef [Hexagon] Implement TTI::shouldMaximizeVectorBandwidth
llvm-svn: 328648
2018-03-27 18:10:47 +00:00
Stefan Pintilie 659f040351 [Power9] Fix the resource list for the COPY instruction.
The COPY instruction was listed as a 4 cycle instruction.
It is now listed correctly as a 2 cycle ALU instruction.

llvm-svn: 328647
2018-03-27 17:51:53 +00:00
Artur Pilipenko ca1d849cd6 Fix a reoccuring typo in load-combine tests
%tmp = bitcast i32* %arg to i8*
   %tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0
-  %tmp2 = load i8, i8* %tmp, align 1
+  %tmp2 = load i8, i8* %tmp1, align 1

This doesn't change the semantics of the tests but makes use of %tmp1 which was originally intended.

llvm-svn: 328642
2018-03-27 17:33:50 +00:00
Krzysztof Parzyszek 52396bb9c5 Use .set instead of = when printing assignment in assembly output
On Hexagon "x = y" is a syntax used in most instructions, and is not
treated as a directive.

Differential Revision: https://reviews.llvm.org/D44256

llvm-svn: 328635
2018-03-27 16:44:41 +00:00
Simon Pilgrim 5f7ab4fedf [X86][Btver2] Add MMX_PMOVMSKBrr to MOVMSK scheduler class
llvm-svn: 328620
2018-03-27 12:26:12 +00:00
Strahinja Petrovic 06cf6a6490 [PowerPC] Secure PLT support
This patch supports secure PLT mode for PowerPC 32 architecture.

Differential Revision: https://reviews.llvm.org/D42112

llvm-svn: 328617
2018-03-27 11:23:53 +00:00
Sanjay Patel 15f7df9f44 [x86] add RUN for target before roundss; NFC
llvm-svn: 328601
2018-03-27 00:32:19 +00:00
Sanjay Patel 8653776367 [x86] add tests for ftrunc; NFC
llvm-svn: 328592
2018-03-26 23:18:32 +00:00
Simon Pilgrim f6440b6fb1 Fix newlines. NFCI.
llvm-svn: 328583
2018-03-26 21:07:59 +00:00
Simon Pilgrim 28e7bcbba6 [X86] Add WriteCRC32 scheduler class
Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis.

Differential Revision: https://reviews.llvm.org/D44647

llvm-svn: 328582
2018-03-26 21:06:14 +00:00
Rafael Espindola 78fdca3cd5 Use local symbols for creating .stack-size.
llvm-svn: 328581
2018-03-26 20:40:22 +00:00
Reid Kleckner 41fb2dba9c [X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32
Summary:
Re-lands r328386 and r328443, reverting r328482.

Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small
parameters in i8 and i16 do not end up in the SysV register parameters
(EDI, ESI, etc).

I added tests for how we receive small parameters, since that is the
important part. It's always safe to store more bytes than will be read,
but the assumptions you make when loading them are what really matter.

I also tested this by self-hosting clang and it passed tests on win64.

Reviewers: mstorsjo, hans

Subscribers: hiraditya, mstorsjo, llvm-commits

Differential Revision: https://reviews.llvm.org/D44900

llvm-svn: 328570
2018-03-26 18:49:48 +00:00
Simon Pilgrim f33d905293 [X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes (PR36881)
Give the bit count instructions their own scheduler classes instead of forcing them into existing classes.

These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar).

Differential Revision: https://reviews.llvm.org/D44879

llvm-svn: 328566
2018-03-26 18:19:28 +00:00
Krzysztof Parzyszek 5488deb1ab [Hexagon] Add more lit tests
llvm-svn: 328561
2018-03-26 17:53:48 +00:00
Lei Huang be0afb0870 [Power9]Legalize and emit code for quad-precision convert from double-precision
Legalize and emit code for quad-precision floating point operation xscvdpqp
and add option to guard the quad precision operation support.

Differential Revision: https://reviews.llvm.org/D44746

llvm-svn: 328558
2018-03-26 17:46:25 +00:00
Stefan Pintilie 26d4f923c4 [PowerPC] Infrastructure work. Implement getting the opcode for a spill in one place.
A new function getOpcodeForSpill should now be the only place to get
the opcode for a given spilled register.

Differential Revision: https://reviews.llvm.org/D43086

llvm-svn: 328556
2018-03-26 17:39:18 +00:00
Simon Pilgrim 86ea53123d [X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs
We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR.....

llvm-svn: 328551
2018-03-26 17:02:02 +00:00
Krzysztof Parzyszek 9f041b1830 [Pipeliner] Add missing loop carried dependences
The pipeliner is not adding a dependence edge for a loop carried
dependence, and ends up scheduling a load from iteration n prior
to an aliased store in iteration n-1.

The code that adds the loop carried dependences in the pipeliner
doesn't check if the memory objects for loads and stores are
"identified" (i.e., distinct) objects. If they are not, then the
code that adds the dependences needs to be conservative. The
objects can be used to check dependences only when they are
distinct objects.

The code that checks for loop carried dependences has been updated
to classify loads and stores that are not identified as "unknown"
values. A store with an "unknown" value can potentially create
a loop carried dependence with any pending load.

Patch by Brendon Cahoon.

llvm-svn: 328547
2018-03-26 16:50:11 +00:00
Krzysztof Parzyszek a212204453 [Pipeliner] Use latency to compute RecMII
The patch contains severals changes needed to pipeline an example
that was transformed so that a Phi with a subreg is converted to
copies.

The pipeliner wasn't working for a couple of reasons.
- The RecMII was 3 instead of 2 due to the extra copies.
- Copy instructions contained a latency of 1.
- The node order algorithm was not choosing the best "bottom"
node, which caused an instruction to be scheduled that had a 
predecessor and successor already scheduled.
- Updated the Hexagon Machine Scheduler to check if the node is
latency bound when adding the cost for a 0-latency dependence.

The RecMII was 3 because the computation looks at the number of
nodes in the recurrence. The extra copy is an extra node but
it shouldn't increase the latency. The new RecMII computation
looks at the latency of the instructions in the recurrence. We
changed the latency of the dependence of a copy to 0. The latency
computation for the copy also checks the use of the copy (similar
to a reg_sequence).

The node order algorithm was not choosing the last instruction
in the recurrence for a bottom up traversal. This was when the
last instruction is a copy. A check was added when choosing the
instruction to check for NodeNum if the maxASAP is the same. This
means that the scheduler will not end up with another node in
the recurrence that has both a predecessor and successor already
scheduled.

The cost computation in Hexagon Machine Scheduler adds cost when
an instruction can be packetized with a zero-latency instruction.
We should only do this if the schedule is latency bound. 

Patch by Brendon Cahoon.

llvm-svn: 328542
2018-03-26 16:33:16 +00:00
Simon Pilgrim 8815105cd5 [X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs
llvm-svn: 328541
2018-03-26 16:24:13 +00:00
Krzysztof Parzyszek f13bbf1d58 [Pipeliner] Fix assert caused by pipeliner serialization
The pipeliner is asserting because the serialization step that 
occurs at the end is deleting an instruction.  The assert
occurs later on because there is a use without a definition.  

The problem occurs when an instruction defines a value used 
by a REQ_SEQUENCE and that value is used by a COPY instruction.
The latencies between these instructions are zero, so they are
put in to the same packet.  The serialization code is unable to
handle this correctly, and ends up putting the REG_SEQUENCE
before its definition.

There is special code in the serialization step that attempts
to handle zero-cost instructions (phis, copy, reg_sequence)
differently than regular instructions. Unfortunately, this means
the order does not come out correct.

This patch simplifies the code by changing the seperate steps for
handling zero-cost and regular instructions. Only phis are
handled separate now, since they should occurs first. Then, this
patch adds checks to make use the MoveUse is set to the smallest
value if there are multiple uses in a cycle.

Patch by Brendon Cahoon.

llvm-svn: 328540
2018-03-26 16:23:29 +00:00
Krzysztof Parzyszek 8e1363df4e [Pipeliner] Fix check for order dependences when finalizing instructions
The code in orderDepdences that looks at the order dependences between
instructions was processing all the successor and predecessor order
dependences. However, we really only want to check for an order dependence
for instructions scheduled in the same cycle.

Also, fixed how the pipeliner handles output dependences. An output
dependence is also a potential loop carried dependence. The pipeliner
didn't handle this case properly so an invalid schedule could be created
that allowed an output dependence to be scheduled in the next iteration
at the same cycle.

Patch by Brendon Cahoon.

llvm-svn: 328516
2018-03-26 16:05:55 +00:00
Krzysztof Parzyszek 3a0a15afe7 [Pipeliner] Fix in the pipeliner phi reuse code
When the definition of a phi is used by a phi in the next iteration,
the pipeliner was assuming that the definition is processed first.
Because of the assumption, an incorrect phi name was used. This patch
has a check to see if the phi definition has been processed already.

Patch by Brendon Cahoon.

llvm-svn: 328510
2018-03-26 15:58:16 +00:00
Krzysztof Parzyszek 785b6cec11 [Pipeliner] Correctly update memoperands in the epilog
The pipeliner needs to be conservative when updating the memoperands
of instructions in the epilog. Previously, the pipeliner was changing
the offset of the memoperand based upon the scheduling stage. However,
that is incorrect when control flow branches around the kernel code.
The bug enabled a load and store to the same stack offset to be swapped.

This patch fixes the bug by updating the size of the memoperands to be
UINT_MAX. This conservative value means that dependences will be created
between other loads and stores.

Patch by Brendon Cahoon.

llvm-svn: 328508
2018-03-26 15:45:55 +00:00
Krzysztof Parzyszek 56f0fc4716 [Hexagon] Give priority to post-incremementing memory accesses in LSR
llvm-svn: 328506
2018-03-26 15:32:03 +00:00
Simon Pilgrim 0b73b29388 [X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs
Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write)

This also adds missing vcvttss2si tests 

llvm-svn: 328505
2018-03-26 15:30:47 +00:00
Simon Pilgrim 3aa9344605 [X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costs
These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults.

llvm-svn: 328501
2018-03-26 14:44:24 +00:00
Simon Pilgrim 67df1cf597 [X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs
The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps

llvm-svn: 328497
2018-03-26 14:03:40 +00:00
Simon Pilgrim caa203aed5 [X86][Btver2] Double the AGU and schedule pipe resources for YMM
Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model.

llvm-svn: 328491
2018-03-26 13:15:20 +00:00
Hans Wennborg 311b63f13b Revert r328386 "[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32"
This broke Chromium (see crbug.com/825748). It looks like mstorsjo's follow-up
patch at D44876 fixes this, but let's revert back to green for now until that's
ready to land.

(Also reverts r328443.)

> Both GCC and MSVC only look at the low byte of a boolean when it is
> passed.

llvm-svn: 328482
2018-03-26 10:07:51 +00:00
Craig Topper 6f28d3c954 [X86] Fix the SchedRW for intrinsic register form of SQRT/RCP/RSQRT.
llvm-svn: 328474
2018-03-26 05:05:12 +00:00
Craig Topper cdfcf8ecda [X86] Merge the SSE and AVX versions of fp divs and sqrts in the SandyBridge/Haswell/Broadwell/Skylake scheduler models.
I've used Agner's data as best I could to get the values to converge on.

llvm-svn: 328473
2018-03-26 05:05:10 +00:00
Craig Topper fbf2d850e3 [X86] Add itinerary to intrinsic version of sqrtss, rcpss, and rsqrtss instructions.
llvm-svn: 328472
2018-03-26 04:20:36 +00:00
Craig Topper 659f85af14 [X86] Swap the itineraries on the memory and register forms of CVTDQ2PD.
They were backwards.

llvm-svn: 328469
2018-03-26 02:17:13 +00:00
Craig Topper 15fef89ad9 [X86] Move (v)movss to port 5 only for Skylake. Move (v)movups/d to port 015 for Skylake.
This matches Agner's data and is consistent with what the EVEX instructions were doing on SKX.

llvm-svn: 328465
2018-03-25 23:40:56 +00:00
Mandeep Singh Grang 98bc25a0f2 [RISCV] Use init_array instead of ctors for RISCV target, by default
Summary:
LLVM defaults to the newer .init_array/.fini_array scheme for static
constructors rather than the less desirable .ctors/.dtors (the UseCtors
flag defaults to false). This wasn't being respected in the RISC-V
backend because it fails to call TargetLoweringObjectFileELF::InitializeELF with the the appropriate
flag for UseInitArray.
This patch fixes this by implementing RISCVELFTargetObjectFile and overriding its Initialize method to call
InitializeELF(TM.Options.UseInitArray).

Reviewers: asb, apazos

Reviewed By: asb

Subscribers: mgorny, rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, llvm-commits

Differential Revision: https://reviews.llvm.org/D44750

llvm-svn: 328433
2018-03-24 18:37:19 +00:00
Simon Pilgrim 913345f8f5 [X86][AES] Ensure we're testing both non-VEX/VEX variants of AES instructions on AVX targets
Add skylake server tests as well

llvm-svn: 328424
2018-03-24 15:05:12 +00:00
Simon Pilgrim 91fe24b8cf [X86][SSE] Ensure we're testing both non-VEX/VEX variants of SSE instructions on AVX targets
And ensure we don't use later instruction sets in SSE schedule tests

llvm-svn: 328423
2018-03-24 14:51:52 +00:00
Simon Pilgrim f7d0f7e6db [X86][AVX1] Ensure we don't use later instruction sets in AVX1 schedule tests
llvm-svn: 328421
2018-03-24 13:47:48 +00:00
Simon Pilgrim d2016f95fb [X86][AVX2] Ensure we don't use later instruction sets in AVX2 schedule tests
llvm-svn: 328420
2018-03-24 13:47:01 +00:00
Craig Topper 2c0a62ab9a [X86] Add a DAG combine to simplify PMULDQ/PMULUDQ nodes
These nodes only use the lower 32 bits of their inputs so we can use SimplifyDemandedBits to simplify them.

Differential Revision: https://reviews.llvm.org/D44375

llvm-svn: 328405
2018-03-24 01:52:01 +00:00
Reid Kleckner e27b410661 [X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32
Both GCC and MSVC only look at the low byte of a boolean when it is
passed.

llvm-svn: 328386
2018-03-23 23:38:53 +00:00
Krzysztof Parzyszek bcf0a96f9e [Hexagon] Boost profit for word-mask immediates, reduce for others
This avoids unnecessary splitting due to uninteresting immediates.

llvm-svn: 328364
2018-03-23 20:11:00 +00:00
Krzysztof Parzyszek e247526cc9 [Hexagon] Fold offset in base+immediate loads/stores
Optimize Ry = add(Rx,#n); memw(Ry+#0) = Rz  =>  memw(Rx,#n) = Rz.

Patch by Jyotsna Verma.

llvm-svn: 328355
2018-03-23 19:30:34 +00:00
Tony Tye 88441a3d1e [AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPU
Add two additional implicit arguments for OpenCL for the AMDGPU target using the AMDHSA runtime to support device enqueue.

Differential Revision: https://reviews.llvm.org/D44697

llvm-svn: 328351
2018-03-23 18:58:47 +00:00
Tony Tye 7a893d4e34 [AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU
- Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target.
- Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS.

Differential Revision: https://reviews.llvm.org/D43736

llvm-svn: 328349
2018-03-23 18:45:18 +00:00
Krzysztof Parzyszek 5f7ba9a74c [Hexagon] Always generate mux out of predicated transfers if possible
HexagonGenMux would collapse pairs of predicated transfers if it assumed
that the predicated .new forms cannot be created. Turns out that generating
mux is preferable in almost all cases.
Introduce an option -hexagon-gen-mux-threshold that controls the minimum
distance between the instruction defining the predicate and the later of
the two transfers. If the distance is closer than the threshold, mux will
not be generated. Set the threshold to 0 by default.

llvm-svn: 328346
2018-03-23 18:43:09 +00:00
Krzysztof Parzyszek 80f10e4fe5 [Hexagon] Avoid early if-conversion for one sided branches
Patch by Anand Kodnani.

llvm-svn: 328344
2018-03-23 18:00:18 +00:00
Ana Pazos 41573804f2 [ARM] Fix "Constant pool entry out of range!" in Thumb1 mode
This patch fixes PR36658, "Constant pool entry out of range!" in Thumb1 mode.

In ARMConstantIslands::optimizeThumb2JumpTables() in Thumb1 mode,
adjustBBOffsetsAfter() is not calculating postOffset correctly by
properly accounting for the padding that is required for the constant pool
that immediately follows the jump table branch  instruction.

Reviewers: t.p.northover, eli.friedman

Reviewed By: t.p.northover

Subscribers: chrib, tstellar, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D44709

llvm-svn: 328341
2018-03-23 17:53:27 +00:00
Krzysztof Parzyszek 570c6440cd [Hexagon] Two fixes in early if-conversion
- Fix checking for vector predicate registers.
- Avoid speculating llvm.lifetime.end intrinsic.

Patch by Harsha Jagasia and Brendon Cahoon.

llvm-svn: 328339
2018-03-23 17:46:09 +00:00
Simon Pilgrim e5c0a041ff [X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unit
Add missing non-VEX and (V)PMOVMSKB instructions to the pattern

llvm-svn: 328338
2018-03-23 17:38:59 +00:00
Zaara Syeda 6535993625 Re-commit: [MachineLICM] Add functions to MachineLICM to hoist invariant stores
This patch adds functions to allow MachineLICM to hoist invariant stores.
Currently, MachineLICM does not hoist any store instructions, however
when storing the same value to a constant spot on the stack, the store
instruction should be considered invariant and be hoisted. The function
isInvariantStore iterates each operand of the store instruction and checks
that each register operand satisfies isCallerPreservedPhysReg. The store
may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore.
This patch also adds the PowerPC changes needed to consider the stack
register as caller preserved.

Differential Revision: https://reviews.llvm.org/D40196

llvm-svn: 328326
2018-03-23 15:28:15 +00:00
John Brawn e3b44f9de6 [AArch64] Don't reduce the width of loads if it prevents combining a shift
Loads and stores can only shift the offset register by the size of the value
being loaded, but currently the DAGCombiner will reduce the width of the load
if it's followed by a trunc making it impossible to later combine the shift.

Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and
make it prevent the width reduction if this is what would happen, though do
allow it if reducing the load width will let us eliminate a later sign or zero
extend.

Differential Revision: https://reviews.llvm.org/D44794

llvm-svn: 328321
2018-03-23 14:47:07 +00:00
Simon Pilgrim 8619962c73 [X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units
Fixes throughput to match Agner/Fam16h-SoG as well.

llvm-svn: 328318
2018-03-23 14:27:26 +00:00
Christof Douma 4a025cc79d [ARM] Support float literals under XO
When targeting execute-only and fp-armv8, float constants in a compare
resulted in instruction selection failures. This is now fixed by using
vmov.f32 where possible, otherwise the floating point constant is
lowered into a integer constant that is moved into a floating point
register.

This patch also restores using fpcmp with immediate 0 under fp-armv8.

Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443
llvm-svn: 328313
2018-03-23 13:02:03 +00:00
Amara Emerson f542355942 [GlobalISel] Fix legalizer combine to not use illegal input G_EXTRACT.
This was being masked because GISel is enabled by default for -O0 and
the abort was disabled. Modified test to explicitly enable abort.

llvm-svn: 328311
2018-03-23 12:48:57 +00:00
Simon Pilgrim 2755893834 [X86][SandyBridge] Fix missing comma that was causing string concatenation of 2 instregex entries
Found while updating D44687

llvm-svn: 328308
2018-03-23 11:56:38 +00:00
Martin Storsjo db75aa96d3 Revert "[DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))"
This reverts commit r328252. This change broke building a number
of projects when targeting ARM and AArch64, see PR36873.

llvm-svn: 328297
2018-03-23 08:36:47 +00:00
Craig Topper 4787b7f434 [X86] Correct the latencies of SNB integer vector multiplies based on Agner's data. Add missing MMX multiplies.
llvm-svn: 328295
2018-03-23 06:41:43 +00:00
Craig Topper 7580a7997d [X86] Change VPSADBW itinerary to SSE_INTALU_ITINS_P to match the SSE version.
llvm-svn: 328293
2018-03-23 06:41:40 +00:00
Craig Topper 7f142b8bf1 [X86] Merge VMOVMSKBrr and MOVMSKBrr in the SNB sheduler model.
The VMOVMSKBrr was in a separate InstRW with a lower latency, but I assume they should be the same and the higher latency matches Agners table so I'm going with that.

llvm-svn: 328291
2018-03-23 06:41:38 +00:00
Craig Topper fae4173b47 [X86] Add VEXTRB/W/D/Q to Zen scheduler model.
The SSE versions were present, but not the VEX version.

llvm-svn: 328290
2018-03-23 06:41:36 +00:00
Michael Zolotukhin fab7a676c2 State that CFG is preserved in 'Falkor HW Prefetch Fix Late Phase'.
That removes some redundant recomputations from the passes pipeline.

llvm-svn: 328272
2018-03-22 23:44:40 +00:00
Michael Zolotukhin 3520331f93 Reapply "[test] Add tests for llc passes pipelines." with a fix for bots with expensive checks on.
llvm-svn: 328267
2018-03-22 23:02:48 +00:00
Craig Topper adb173314d [X86] Correct the VROUND regular expressions in Znver1 scheduler model to account for r328254
llvm-svn: 328260
2018-03-22 22:17:11 +00:00
Craig Topper 40d3b32e12 [X86] Rename VROUNDYPS* and VROUNDYPD* instructions to VROUNDPSY* and VROUNDPDY*. Fix itinerary mistake on all memory forms of VROUNDPD
This makes the Y position consistent with other instructions.

This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary.

llvm-svn: 328254
2018-03-22 21:55:20 +00:00
Guozhi Wei 17ff975eb1 [DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))
In our real world application, we found the following optimization is missed in DAGCombiner

(zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst))

If the user of original zext is an add, it may enable further lea optimization on x86.

This patch add a new function CombineZExtLogicopShiftLoad to do this optimization.

Differential Revision: https://reviews.llvm.org/D44402

llvm-svn: 328252
2018-03-22 21:47:25 +00:00
Craig Topper 58afb4ea58 [X86][SkylakeClient] Fix a bunch of instructions that were incorrectly assigned Port015 instead of Port01.
The VEC ADD and VEC MUL units aren't present on port 5 on SkylakeClient.

llvm-svn: 328241
2018-03-22 21:10:07 +00:00
Jun Bum Lim 2ecb7ba4c6 [CodeGen] Add a new pass for PostRA sink
Summary:
This pass sinks COPY instructions into a successor block, if the COPY is not
used in the current block and the COPY is live-in to a single successor
(i.e., doesn't require the COPY to be duplicated).  This avoids executing the
the copy on paths where their results aren't needed.  This also exposes
additional opportunites for dead copy elimination and shrink wrapping.

These copies were either not handled by or are inserted after the MachineSink
pass. As an example of the former case, the MachineSink pass cannot sink
COPY instructions with allocatable source registers; for AArch64 these type
of copy instructions are frequently used to move function parameters (PhyReg)
into virtual registers in the entry block..

For the machine IR below, this pass will sink %w19 in the entry into its
successor (%bb.1) because %w19 is only live-in in %bb.1.

```
   %bb.0:
      %wzr = SUBSWri %w1, 1
      %w19 = COPY %w0
      Bcc 11, %bb.2
    %bb.1:
      Live Ins: %w19
      BL @fun
      %w0 = ADDWrr %w0, %w19
      RET %w0
    %bb.2:
      %w0 = COPY %wzr
      RET %w0
```
As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be
able to see %bb.0 as a candidate.

With this change I observed 12% more shrink-wrapping candidate and 13% more dead copies deleted  in spec2000/2006/2017 on AArch64.

Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz

Reviewed By: sebpop

Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41463

llvm-svn: 328237
2018-03-22 20:06:47 +00:00
Nirav Dave 8c5f47ac40 [DAG, X86] Fix ISel-time node insertion ids
As in SystemZ backend, correctly propagate node ids when inserting new
unselected nodes into the DAG during instruction Seleciton for X86
target.

Fixes PR36865.

Reviewers: jyknight, craig.topper

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D44797

llvm-svn: 328233
2018-03-22 19:32:07 +00:00
Craig Topper 4a3be6e578 [X86] Correct the scheduling data for some of the 32 and 64 bit multiplies to as best as I understand how they are implemented.
llvm-svn: 328231
2018-03-22 19:22:51 +00:00
Aditya Nandakumar b3297ef051 [GISel]: Fix incorrect IRTranslation while translating null pointer types
https://reviews.llvm.org/D44762

Currently IRTranslator produces
%vreg17<def>(p0) = G_CONSTANT 0;

instead we should build
%vreg16(s64) = G_CONSTANT 0
%vreg17(p0) = G_INTTOPTR %vreg16

reviewed by @aemerson.

llvm-svn: 328218
2018-03-22 17:31:38 +00:00
Jonas Devlieghere 7e69dd02bb Revert "[test] Add tests for llc passes pipelines."
This reverts r328159 because the two AArch64 tests fail on GreenDragon:
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/11030/

llvm-svn: 328188
2018-03-22 10:34:06 +00:00
Michael Zolotukhin 7e6fa1d6ae [test] Add tests for llc passes pipelines.
This is basically an extension of existing test
test/CodeGen/X86/O0-pipeline.ll introduced in r302608.

llvm-svn: 328159
2018-03-21 22:17:13 +00:00
Artem Belevich 30512869ff [NVPTX] Make tensor shape part of WMMA intrinsic's name.
This is needed for the upcoming implementation of the
new 8x32x16 and 32x8x16 variants of WMMA instructions
introduced in CUDA 9.1.

Differential Revision: https://reviews.llvm.org/D44719

llvm-svn: 328158
2018-03-21 21:55:02 +00:00
Lei Huang efd6f1c8e2 [POWER9][NFC] update testcase check statements
llvm-svn: 328147
2018-03-21 20:59:45 +00:00
Sanjay Patel e235942a1e [InstSimplify] fp_binop X, NaN --> NaN
We propagate the existing NaN value when possible.

Differential Revision: https://reviews.llvm.org/D44521

llvm-svn: 328140
2018-03-21 19:31:53 +00:00
Krzysztof Parzyszek c715a5d2b8 [Hexagon] Eliminate subregisters from PHI nodes before pipelining
The pipeliner needs to remove instructions from the SlotIndexes
structure when they are deleted. Otherwise, the SlotIndexes map
has stale data, and an assert will occur when adding new
instructions.

This patch also changes the pipeliner to make the back-edge of
a loop carried dependence 1 cycle. The 1 cycle latency is added
to the anti-dependence that represents the back-edge. This
changes eliminates a couple of hacks added to the pipeliner to
handle the latency of the back-edge. It is needed to correctly
pipeline the test case for the sub-register elimination pass.

llvm-svn: 328113
2018-03-21 16:39:11 +00:00
Alex Bradbury 65d6ea5e68 [RISCV] Codegen support for RV32F floating point comparison operations
This patch also includes extensive tests targeted at select and br+fcmp IR
inputs. A sequence of br+fcmp required support for FPR32 registers to be added
to RISCVInstrInfo::storeRegToStackSlot and
RISCVInstrInfo::loadRegFromStackSlot.

llvm-svn: 328104
2018-03-21 15:11:02 +00:00
Alex Bradbury 77d5927a1c [RISCV] Add tests missed from r327979
llvm-svn: 328102
2018-03-21 14:50:27 +00:00
Craig Topper 137a4dd84d [X86] Fix the SchedRW for XOP vpcom register form instructions to not be marked as loads.
llvm-svn: 328071
2018-03-21 03:41:33 +00:00
Craig Topper d25f1acf67 [X86] Change PMULLD to 10 cycles on Skylake per Agner's tables and llvm-exegesis.
Also restrict to port 0 and 1 for SkylakeClient. It looks like the scheduler models don't account for client not having a full vector ALU on port 5 like server.

Fixes PR36808.

llvm-svn: 328061
2018-03-20 23:39:48 +00:00
Derek Schuff 39b5367cba [WebAssembly] Strip threadlocal attribute from globals in single thread mode
The default thread model for wasm is single, and in this mode thread-local
global variables can be lowered identically to non-thread-local variables.

Differential Revision: https://reviews.llvm.org/D44703

llvm-svn: 328049
2018-03-20 22:01:32 +00:00
Martin Storsjo 07589fc496 [X86] Don't use the MSVC stack protector names on mingw
Mingw uses the same stack protector functions as GCC provides
on other platforms as well.

Patch by Valentin Churavy!

Differential Revision: https://reviews.llvm.org/D27296

llvm-svn: 328039
2018-03-20 20:37:51 +00:00
Abderrazek Zaafrani 4c60c222e4 [AArch64] Add vmulxh_lane fp16 vector intrinsic
https://reviews.llvm.org/D44591

llvm-svn: 328035
2018-03-20 20:25:40 +00:00
Krzysztof Parzyszek 4094ab73cc [Hexagon] Add a few more lit tests, NFC
llvm-svn: 328023
2018-03-20 19:35:09 +00:00
Krzysztof Parzyszek 65059ee284 [Hexagon] Add heuristic to exclude critical path cost for scheduling
Patch by Brendon Cahoon.

llvm-svn: 328022
2018-03-20 19:26:27 +00:00
Craig Topper c2dbd677bd [PowerPC][LegalizeFloatTypes] Move the PPC hacks for (i32 fp_to_sint/fp_to_uint (ppcf128 X)) out of LegalizeFloatTypes and into PPC specific code
I'm not entirely sure these hacks are still needed. If you remove the hacks completely, the name of the library call that gets generated doesn't match the grep the test previously had. So the test wasn't really checking anything.

If the hack is still needed it belongs in PPC specific code. I believe the FP_TO_SINT code here is the only place in the tree where a FP_ROUND_INREG node is created today. And I don't think its even being used correctly because the legalization returned a BUILD_PAIR with the same value twice. That doesn't seem right to me. By moving the code entirely to PPC we can avoid creating the FP_ROUND_INREG at all.

I replaced the grep in the existing test with full checks generated by hacking update_llc_test_check.py to support ppc32 just long enough to generate it.

Differential Revision: https://reviews.llvm.org/D44061

llvm-svn: 328017
2018-03-20 18:49:28 +00:00
Krzysztof Parzyszek eb0c510ecd [X86] Add phony registers for high halves of regs with low halves
Registers E[A-D]X, E[SD]I, E[BS]P, and EIP have 16-bit subregisters
that cover the low halves of these registers. This change adds artificial
subregisters for the high halves in order to differentiate (in terms of
register units) between the 32- and the low 16-bit registers.

This patch contains parts that aim to preserve the calculated register
pressure. This is in order to preserve the current codegen (minimize the
impact of this patch). The approach of having artificial subregisters
could be used to fix PR23423, but the pressure calculation would need
to be changed.

Differential Revision: https://reviews.llvm.org/D43353

llvm-svn: 328016
2018-03-20 18:46:55 +00:00
Artem Belevich 914d4babec [NVPTX] Make tensor load/store intrinsics overloaded.
This way we can support address-space specific variants without explicitly
encoding the space in the name of the intrinsic. Less intrinsics to deal with ->
less boilerplate.

Added a bit of tablegen magic to match/replace an intrinsics with a pointer
argument in particular address space with the space-specific instruction
variant.

Updated tests to use non-default address spaces.

Differential Revision: https://reviews.llvm.org/D43268

llvm-svn: 328006
2018-03-20 17:18:59 +00:00
Krzysztof Parzyszek 4c6b65f685 [Hexagon] Correct the computation of TopReadyCycle and BotReadyCycle of SU
TopReadyCycle and BotReadyCycle were off by one cycle when an SU is either
the first instruction or the last instruction in a packet.

Patch by Ikhlas Ajbar.

llvm-svn: 328000
2018-03-20 17:03:27 +00:00
Michael Zolotukhin fb3f509e01 [XRay] Lazily compute MachineLoopInfo instead of requiring it.
Summary:
Currently X-Ray Instrumentation pass has a dependency on MachineLoopInfo
(and thus on MachineDominatorTree as well) and we have to compute them
even if X-Ray is not used. This patch changes it to a lazy computation
to save compile time by avoiding these redundant computations.

Reviewers: dberris, kubamracek

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D44666

llvm-svn: 327999
2018-03-20 17:02:29 +00:00
Sanjay Patel 5a9210e651 [AArch64] add fabs tests for PR36600; NFC
llvm-svn: 327995
2018-03-20 16:08:47 +00:00
Simon Pilgrim 62690e9d0e [X86][Haswell][Znver1] Fix typo in fldl instregexs
Missing comma was casing 2 instregex entries to be concatenated together by mistake.

Found while investigating PR35548

llvm-svn: 327992
2018-03-20 15:44:47 +00:00
Alex Bradbury 80c8eb7696 [RISCV] Add codegen for RV32F floating point load/store
As part of this, add support for load/store from the constant pool. This is
used to materialise f32 constants.

llvm-svn: 327979
2018-03-20 13:26:12 +00:00
Alex Bradbury 76c29ee815 [RISCV] Add codegen for RV32F arithmetic and conversion operations
Currently, only a soft floating point ABI is supported.

llvm-svn: 327976
2018-03-20 12:45:35 +00:00
Krzysztof Parzyszek dca383123f [Hexagon] Improve scheduling based on register pressure
Patch by Brendon Cahoon.

llvm-svn: 327975
2018-03-20 12:28:43 +00:00
Dylan McKay 212841b7ad [AVR] Add a regression test for struct return lowering
The test is taken from
https://github.com/avr-rust/rust/issues/57

The originally implementation of struct return lowering was made in
r325474.

Patch by Peter Nimmervoll

llvm-svn: 327967
2018-03-20 11:23:03 +00:00
Jonas Paulsson 8ad035d8e5 [SystemZ] Add "REQUIRES: asserts" to test case to fix build bots.
llvm-svn: 327958
2018-03-20 08:29:19 +00:00
Martin Storsjo 802b434156 [X86] Properly implement the calling convention for f80 for mingw/x86_64
In these cases, both parameters and return values are passed
as a pointer to a stack allocation.

MSVC doesn't use the f80 data type at all, while it is used
for long doubles on mingw.

Normally, this part of the calling convention is handled
within clang, but for intrinsics that are lowered to libcalls,
it may need to be handled within llvm as well.

Differential Revision: https://reviews.llvm.org/D44592

llvm-svn: 327957
2018-03-20 06:19:38 +00:00
Craig Topper 4778fa7e8a [X86] Fix the SchedRW for memory forms of CMP and TEST.
They were incorrectly marked as RMW operations. Some of the CMP instrucions worked, but the ones that use a similar encoding as RMW form of ADD ended up marked as RMW.

TEST used the same tablegen class as some of the CMPs.

llvm-svn: 327947
2018-03-20 03:55:17 +00:00
Craig Topper 3e9462607e [X86] Add TEST16mi/TEST32mi/TEST64mi32 to the Sandybridge/Haswell/Broadwell/Skylake scheduler models.
Move it from a load+store group on SNB to a load only group, the same group as CMP.

llvm-svn: 327944
2018-03-20 03:02:03 +00:00
Craig Topper 7c90e29cf8 [X86] Add ROR/ROL/SHR/SAR by 1 instructions to the Sandy Bridge scheduler model.
I assume these match the generic immediate version like they do in the other models.

llvm-svn: 327943
2018-03-20 03:01:59 +00:00
Quentin Colombet 508f68233d [ShrinkWrap] Take into account landing pad
When scanning the function for CSRs uses and defs, also check if
the basic block are landing pads.
Consider that landing pads needs the CSRs to be properly set.
That way we force the prologue/epilogue to always be pushed out
of the problematic "throw" region. The "throw" region is
problematic because the jumps are not properly modeled.

Fixes PR36513

llvm-svn: 327942
2018-03-20 02:44:40 +00:00
Shiva Chen cbd498ac10 [RISCV] Preserve stack space for outgoing arguments when the function contain variable size objects
E.g.

bar (int x)
{
  char p[x];

  push outgoing variables for foo.
  call foo
}

We need to generate stack adjustment instructions for outgoing arguments by
eliminateCallFramePseudoInstr when the function contains variable size
objects to avoid outgoing variables corrupt the variable size object.

Default hasReservedCallFrame will return !hasFP().
We don't want to generate extra sp adjustment instructions when hasFP()
return true, So We override hasReservedCallFrame as !hasVarSizedObjects().

Differential Revision: https://reviews.llvm.org/D43752

llvm-svn: 327938
2018-03-20 01:39:17 +00:00
Craig Topper 2330d6cd55 [X86] Fix the SNB scheduler for BLENDVB.
PBLENDVBrr0 was with the memory version of VBLENDVB and PBLENDVBrm0 was missing.

llvm-svn: 327937
2018-03-20 01:30:21 +00:00
Rafael Espindola d947977e87 Run dos2unix on a test. NFC.
llvm-svn: 327934
2018-03-20 01:06:29 +00:00
Aaron Smith 6738960588 [SelectionDAG] Transfer DbgValues when integer operations are promoted
Summary:
DbgValue nodes were not transferred when integer DAG nodes were promoted. For example, if an i32 add node was promoted to an i64 add node by DAGTypeLegalizer::PromoteIntegerResult(), its DbgValue node was not transferred to the new node. The simple fix is to update SetPromotedInteger() to transfer DbgValues. 

Add AArch64/dbg-value-i8.ll to test this change and fix ARM/debug-info-d16-reg.ll which had the wrong DILocalVariable nodes with arg numbers even though they are not for function parameters.

Patch by Se Jong Oh!

Reviewers: vsk, JDevlieghere, aprantl

Reviewed By: JDevlieghere

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D44546

llvm-svn: 327919
2018-03-19 22:58:50 +00:00
Jessica Paquette 563548d8f3 [MachineOutliner] AArch64: Emit CFI instructions when outlining calls
When outlining calls, the outliner needs to update CFI to ensure that, say,
exception handling works. This commit adds that functionality and adds a test
just for call outlining.

Call outlining stuff in machine-outliner.mir should be moved into
machine-outliner-calls.mir in a later commit.

llvm-svn: 327917
2018-03-19 22:48:40 +00:00
Krzysztof Parzyszek c76fefc6de [Hexagon] Add REQUIRES: asserts to test/CodeGen/Hexagon/v6vec_inc1.ll
llvm-svn: 327907
2018-03-19 21:05:21 +00:00
Nirav Dave 3264c1bdf6 [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"
Reland ISel cycle checking improvements after simplifying node id
invariant traversal and correcting typo.

llvm-svn: 327898
2018-03-19 20:19:46 +00:00
Martin Storsjo 9a55c1b0dc [ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probes
This extends the use of this attribute on ARM and AArch64 from
SVN r325900 (where it was only checked for fixed stack
allocations on ARM/AArch64, but for all stack allocations on X86).

This also adds a testcase for the existing use of disabling the
fixed stack probe with the attribute on ARM and AArch64.

Differential Revision: https://reviews.llvm.org/D44291

llvm-svn: 327897
2018-03-19 20:06:50 +00:00
Sanjay Patel f2d85e78df [AMDGPU] change test to avoid NaN math
llvm-svn: 327891
2018-03-19 19:26:22 +00:00
Sanjay Patel dad3d13b99 [AMDGPU] adjust tests to be nan-free
As suggested in D44521 - bitcast to integer for the math,
so we preserve the intent of these tests when NaN math
gets folded away.

llvm-svn: 327890
2018-03-19 19:23:53 +00:00
Lei Huang ecfede94a7 [Power9]Legalize and emit code for quad-precision copySign/abs/nabs/neg/sqrt
Legalize and emit code for quad-precision floating point operations:

  * xscpsgnqp
  * xsabsqp
  * xsnabsqp
  * xsnegqp
  * xssqrtqp

Differential Revision: https://reviews.llvm.org/D44530

llvm-svn: 327889
2018-03-19 19:22:52 +00:00
Krzysztof Parzyszek 461e6691eb [Hexagon] Add a few more lit tests
llvm-svn: 327884
2018-03-19 19:03:18 +00:00
Craig Topper 5e65996fac [X86] Remove OUT32rr/OUT8rr/OUT32ri/OUT8ri from Sandybridge scheduler model.
PR35590 was already filed for this information being wrong. It's probably better to default to WriteSystem behavior instead of using something completely wrong.

llvm-svn: 327882
2018-03-19 19:00:35 +00:00
Craig Topper b4c7873f8c [X86] Add JCXZ/JECXZ to Sandybridge/Haswell/Broadwell/Skylake scheduler models.
JRCXZ was already present, but not the others.

We never codegen this instruction so this doesn't affect much just trying to get them all into a single generated scheduler class in the output.

llvm-svn: 327881
2018-03-19 19:00:32 +00:00
Craig Topper afabf36505 [X86] Correct regular expression in Zen scheduler model that was excluding JECXZ instruction.
The regex was looking for JECXZ_32 or JECXZ_64, but their is just one instruction called JECXZ. They used to exist as separate instructions, but were merged over 3 years ago.

llvm-svn: 327880
2018-03-19 19:00:29 +00:00
Lei Huang 6d1596a98c [PowerPC][Power9]Legalize and emit code for quad-precision add/div/mul/sub
Legalize and emit code for quad-precision floating point operations:

  * xsaddqp
  * xssubqp
  * xsdivqp
  * xsmulqp

Differential Revision: https://reviews.llvm.org/D44506

llvm-svn: 327878
2018-03-19 18:52:20 +00:00
Nemanja Ivanovic d9d5bd3067 [PowerPC] Make AddrSpaceCast noop
PowerPC targets do not use address spaces. As a result, we can get selection
failures with address space casts. This patch makes those casts noops.

Patch by Valentin Churavy.

Differential revision: https://reviews.llvm.org/D43781

llvm-svn: 327877
2018-03-19 18:50:02 +00:00
Craig Topper 259eaa6e7c [X86] Remove sse41 specific code from lowering v16i8 multiply
With the SRAs removed from the SSE2 code in D44267, then there doesn't appear to be any advantage to the sse41 code. The punpcklbw instruction and pmovsx seem to have the same latency and throughput on most CPUs. And the SSE41 code requires moving the upper 64-bits into the lower 64-bit before the sign extend can be done. The unpckhbw in sse2 code can do better than that.

llvm-svn: 327869
2018-03-19 17:31:41 +00:00
Craig Topper 5ccd87233f [X86] Make the multiply and divide itineraries more consistent.
Sometimes we used the same itinerary for MEM and REG forms, but that seems inconsistent with our usual usage.

We also used the MUL8 itinerary for MULX32/64 which was also weird.

The test changes are because we were using IIC_IMUL32_RR and IIC_IMUL64_RR instead of IIC_IMUL32_REG/IIC_IMUL64_REG for the 32 and 64 bit multiplies that produce double width result.

llvm-svn: 327866
2018-03-19 16:38:33 +00:00
Zaara Syeda 01f414baaa Revert [MachineLICM] This reverts commit rL327856
Failing build bots. Revert the commit now.

llvm-svn: 327864
2018-03-19 16:19:44 +00:00
Matt Davis 4b54e5fc38 [CodeGen] Avoid handling DBG_VALUE in the LivePhysRegs (addUses,removeDefs,stepForward)
Summary:
This patch prevents DBG_VALUE instructions from influencing
LivePhysRegs::stepBackwards and stepForwards.  In at least one case,
specifically branch folding, the stepBackwards logic was having an
influence on code generation.  The result was that certain code
compiled with '-g -O2' would differ from that compiled with '-O2'
alone. It seems that the original logic, accounting for DBG_VALUE,
was influencing the placement of an IMPLICIT_DEF which had a later
impact on how blocks were processed in branch folding.

Reviewers: kparzysz, MatzeB

Reviewed By: kparzysz

Subscribers: bjope, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D43850

llvm-svn: 327862
2018-03-19 16:06:40 +00:00
Zaara Syeda ff05e2b0e6 [MachineLICM] Add functions to MachineLICM to hoist invariant stores
This patch adds functions to allow MachineLICM to hoist invariant stores.
Currently, MachineLICM does not hoist any store instructions, however
when storing the same value to a constant spot on the stack, the store
instruction should be considered invariant and be hoisted. The function
isInvariantStore iterates each operand of the store instruction and checks
that each register operand satisfies isCallerPreservedPhysReg. The store
may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore.
This patch also adds the PowerPC changes needed to consider the stack
register as caller preserved.

Differential Revision: https://reviews.llvm.org/D40196

llvm-svn: 327856
2018-03-19 14:52:25 +00:00
Simon Pilgrim 30c38c3849 [X86] Generalize schedule classes to support multiple stages
Currently the WriteResPair style multi-classes take a single pipeline stage and latency, this patch generalizes this to make it easier to create complex schedules with ResourceCycles and NumMicroOps be overriden from their defaults.

This has already been done for the Jaguar scheduler to remove a number of custom schedule classes and adding it to the other x86 targets will make it much tidier as we add additional classes in the future to try and replace so many custom cases.

I've converted some instructions but a lot of the models need a bit of cleanup after the patch has been committed - memory latencies not being consistent, the class not actually being used when we could remove some/all customs, etc. I'd prefer to keep this as NFC as possible so later patches can be smaller and target specific.

Differential Revision: https://reviews.llvm.org/D44612

llvm-svn: 327855
2018-03-19 14:46:07 +00:00
Sanjay Patel 05daae75ad [x86] put nops into the WriteNop class and customize for Jaguar
1. Given that we already have a classification bucket with 'nop' in the name, 
   that's where 'nop' belongs. Right now, it's only used for prefix bytes and 'pause'.
2. Make the latency of this class '1' for Jaguar to tell the scheduler (and presumably 
   llvm-mca) how to model the resource requirements better even though a nop has no 
   dependencies.

Differential Revision: https://reviews.llvm.org/D44608

llvm-svn: 327853
2018-03-19 14:26:50 +00:00
Matt Arsenault fed0a45036 AMDGPU/GlobalISel: RegBankSelect for basic int ops
llvm-svn: 327843
2018-03-19 14:07:23 +00:00
Matt Arsenault 69932e4d69 AMDGPU: Don't leave dead illegal VGPR->SGPR copies
Normally DCE kills these, but at -O0 these get left behind
leaving suspicious looking illegal copies.

Replace with IMPLICIT_DEF to avoid iterator issues.

llvm-svn: 327842
2018-03-19 14:07:15 +00:00
Clement Courbet 6d047b70a4 [MergeICmps] Re-land 324317 "Enable the MergeICmps Pass by default."
Now that PR36557 is fixed.

llvm-svn: 327840
2018-03-19 13:37:04 +00:00
Sjoerd Meijer d16037d9bb [ARM] Support for v4f16 and v8f16 vectors
This is the groundwork for adding the Armv8.2-A FP16 vector intrinsics, which
uses v4f16 and v8f16 vector operands and return values. All the moving parts
are tested with two intrinsics, a 1-operand v8f16 and a 2-operand v4f16
intrinsic. In a follow-up patch the rest of the intrinsics and tests will be
added.

Differential Revision: https://reviews.llvm.org/D44538

llvm-svn: 327839
2018-03-19 13:35:25 +00:00
Jonas Paulsson a6216ec4cc [SystemZ] Bugfix of CC liveness in emitMemMemWrapper (CLC).
If DoneMBB becomes empty it must have CC added to its live-in list, since it
will fall-through into EndMBB. This happens when the CLC loop does the
complete range.

Review: Ulrich Weigand
llvm-svn: 327834
2018-03-19 13:05:22 +00:00
Alex Bradbury 0171a9f4ec [RISCV] Peephole optimisation for load/store of global values or constant addresses
(load (add base, off), 0) -> (load base, off)
(store val, (add base, off)) -> (store val, base, off)

This is similar to an equivalent peephole optimisation in PPCISelDAGToDAG.

llvm-svn: 327831
2018-03-19 11:54:28 +00:00
Craig Topper d10ceffa5f [X86] Add ADD16i16/ADD32i32/ADD64i32 and similar to the scheduler models to match ADD8i8.
Also move ADC8i8 and SBB8i8 in the Sandy Bridge model to the same class as ADC8ri and SBB8ri. That seems more accurate since its the 8i8 is just the register forced to AL instead of coming from modrm.

llvm-svn: 327820
2018-03-19 04:21:40 +00:00
Dylan McKay a35ee70641 [AVR] Lower i128 divisions to runtime library calls
This patch adds i128 division support by instruction LLVM to lower
128-bit divisions to the __udivmodti4 and __divmodti4 rtlib functions.

This also adds test for 64-bit division and 128-bit division.

Patch by Peter Nimmervoll.

llvm-svn: 327814
2018-03-19 00:55:50 +00:00
Simon Pilgrim 203876f104 [X86][Btver2] Fix crc32 schedule costs
The default is currently FAdd for some reason

llvm-svn: 327807
2018-03-18 19:54:42 +00:00
Craig Topper 2d451e73f9 [X86] Fix a bunch of overlapping regular expressions in the scheduler models.
llvm-svn: 327787
2018-03-18 08:38:06 +00:00
Craig Topper 89dcda3e90 [X86] Remove MMX_MASKMOVQ64 and VMASKMOVDQU from scheduler models.
The information was so wildly inaccurate and incomplete its better to just remove it.

MMX_MASKMOVQ64 showed up twice in several scheduler models. In Haswell and Broadwell they were on adjacent lines. On Skylake the copies had different information.

MMX_MASKMOVQ and MASKMOVDQU were completely missing.

MMX_MASKMOVQ64 was listed on Haswell/Broadwell as 1 cycle on port 1 despite it being a store instruction.

Filed PR36780 to track fixing this right.

llvm-svn: 327783
2018-03-18 03:24:42 +00:00
Nirav Dave 5f0ab71b62 Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172""
as it times out building test-suite on PPC.

llvm-svn: 327778
2018-03-17 19:24:54 +00:00
Nirav Dave 982d3a56ea [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"
Reland ISel cycle checking improvements after simplifying and reducing
node id invariant traversal.

llvm-svn: 327777
2018-03-17 17:42:10 +00:00
Matt Arsenault abdc4f2dc7 AMDGPU/GlobalISel: Cleanup constant legality
llvm-svn: 327774
2018-03-17 15:17:48 +00:00
Matt Arsenault 685d1e8157 AMDGPU/GlobalISel: Basic G_GEP legality
llvm-svn: 327773
2018-03-17 15:17:45 +00:00
Matt Arsenault 85803366d6 AMDGPU/GlobalISel: Basic legality for load/store
llvm-svn: 327772
2018-03-17 15:17:41 +00:00
Oren Ben Simhon fdd72fd522 [X86] Added support for nocf_check attribute for indirect Branch Tracking
X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET).
IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp.
The `nocf_check` attribute has two roles in the context of X86 IBT technology:
	1. Appertains to a function - do not add ENDBR instruction at the beginning of the function.
	2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction.

This patch implements `nocf_check` context for Indirect Branch Tracking.
It also auto generates `nocf_check` prefixes before indirect branchs to jump tables that are guarded by range checks.

Differential Revision: https://reviews.llvm.org/D41879

llvm-svn: 327767
2018-03-17 13:29:46 +00:00
Jonas Paulsson dbcf1bf503 [SystemZ] Add 'REQUIRES: asserts' to test case using debug output.
llvm-svn: 327766
2018-03-17 09:15:13 +00:00
Jonas Paulsson 138960770c [SystemZ] computeKnownBitsForTargetNode() / ComputeNumSignBitsForTargetNode()
Improve/implement these methods to improve DAG combining. This mainly
concerns intrinsics.

Some constant operands to SystemZISD nodes have been marked Opaque to avoid
transforming back and forth between generic and target nodes infinitely.

Review: Ulrich Weigand
llvm-svn: 327765
2018-03-17 08:32:12 +00:00
Jonas Paulsson e9f7fa83d5 [SelectionDAG] Handle big endian target BITCAST in computeKnownBits()
The BITCAST handling in computeKnownBits() previously only worked for little
endian.

This patch reverses the iteration over elements for a big endian target which
allows this to work in this case also.

SystemZ test case.

Review: Eli Friedman
https://reviews.llvm.org/D44249

llvm-svn: 327764
2018-03-17 08:04:00 +00:00
Jessica Paquette b3e7dc9144 [MachineOutliner] Make KILLs invisible
At the point the outliner runs, KILLs don't impact anything, but they're still
considered unique instructions. This commit makes them invisible like
DebugValues so that they can still be outlined without impacting outlining
decisions.

llvm-svn: 327760
2018-03-16 22:53:34 +00:00
Krzysztof Parzyszek f81a8d03c1 [Hexagon] Avoid bank conflicts in post-RA scheduler
Avoid scheduling two loads in such a way that they would end up in the
same packet. If there is a load in a packet, try to schedule a non-load
next.

Patch by Brendon Cahoon.

llvm-svn: 327742
2018-03-16 20:55:49 +00:00
Krzysztof Parzyszek 889cbcacbc [Hexagon] Add lit testcases for atomic intrinsics
Patch by Ben Craig.

llvm-svn: 327737
2018-03-16 20:21:43 +00:00
Reid Kleckner f8b51c5f90 [IR] Avoid the need to prefix MS C++ symbols with '\01'
Now the Windows mangling modes ('w' and 'x') do not do any mangling for
symbols starting with '?'. This means that clang can stop adding the
hideous '\01' leading escape. This means LLVM debug logs are less likely
to contain ASCII escape characters and it will be easier to copy and
paste MS symbol names from IR.

Finally.

For non-Windows platforms, names starting with '?' still get IR
mangling, so once clang stops escaping MS C++ names, we will get extra
'_' prefixing on MachO. That's fine, since it is currently impossible to
construct a triple that uses the MS C++ ABI in clang and emits macho
object files.

Differential Revision: https://reviews.llvm.org/D7775

llvm-svn: 327734
2018-03-16 20:13:32 +00:00
Farhana Aleen c6c9dc8773 [AMDGPU] Supported ds_write_b128 generation.
Summary: This is a follow-on patch of https://reviews.llvm.org/D44210

Author: FarhanaAleen

Reviewed By: msearles

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44319

llvm-svn: 327726
2018-03-16 18:12:00 +00:00
Craig Topper e6913ec340 [X86] Post process the DAG after isel to remove vector moves that were added to zero upper bits.
We previously avoided inserting these moves during isel in a few cases which is implemented using a whitelist of opcodes. But it's too difficult to generate a perfect list of opcodes to whitelist. Especially with AVX512F without AVX512VL using 512 bit vectors to implement some 128/256 bit operations. Since isel is done bottoms up, we'd have to check the VT and opcode and subtarget in order to determine whether an EXTRACT_SUBREG would be generated for some operations.

So instead of doing that, this patch adds a post processing step that detects when the moves are unnecesssary after isel. At that point any EXTRACT_SUBREGs would have already been created and appear in the DAG. So then we just need to ensure the input to the move isn't one.

Differential Revision: https://reviews.llvm.org/D44289

llvm-svn: 327724
2018-03-16 17:13:42 +00:00
Dmitry Preobrazhensky 4c8f4234b6 [AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP opcodes
See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751

Differential Revision: https://reviews.llvm.org/D44529

Reviewers: artem.tamazov, arsenm
llvm-svn: 327723
2018-03-16 16:38:04 +00:00
Krzysztof Parzyszek 9915291ab8 [Hexagon] Fix zero-extending non-HVX bool vectors
llvm-svn: 327712
2018-03-16 15:03:37 +00:00
Simon Pilgrim 23578e7d3c [X86][Btver2] Add correct mul/imul schedule costs
Integer multiply is performed on the JMul function unit and i64 requires double pumping

llvm-svn: 327707
2018-03-16 14:01:01 +00:00
Simon Pilgrim 8d28ae6aec [X86][Btver2] Add correct lzcnt/tzcnt/popcnt schedule costs
Don't use WriteIMul defaults

llvm-svn: 327706
2018-03-16 13:43:55 +00:00
Sjoerd Meijer d391a1a985 [ARM] FP16 codegen support for VSEL
This implements lowering of SELECT_CC for f16s, which enables
codegen of VSEL with f16 types.

Differential Revision: https://reviews.llvm.org/D44518

llvm-svn: 327695
2018-03-16 08:06:25 +00:00
Craig Topper 1b8cf49704 [SelectionDAG][ARM][X86] Teach PromoteIntRes_SETCC to do a better job picking the result type for the setcc.
Previously if getSetccResultType returned an illegal type we just fell back to using the default promoted type. This appears to have been to handle the case where for vectors getSetccResultType returns the input type, but the input type itself isn't legal and will need to be promoted. Without the legality check we would never reach a legal type.

But just picking the promoted type to be the setcc type can create strange setccs where the result type is 128 bits and the operand type is 256 bits. If for example the result type was promoted to v8i16 from v8i1, but the input type was promoted from v8i23 to v8i32. We currently handle this with custom lowering code in X86.

This legality check also caused us reject the getSetccResultType when the input type needed to be widened or split. Even though that result wouldn't have caused legalization to get stuck.

This patch tries to fix this by detecting the getSetccResultType needs to be promoted. If its input type also needs to be promoted we'll try a ask for a new setcc result type based on its eventual promoted value. Otherwise we fall back to default type to promote to.

For any other illegal values we might get back from the initial call to getSetccResultType we just keep and allow it to be re-legalized later via splitting or widening or scalarizing.

llvm-svn: 327683
2018-03-15 23:04:11 +00:00
Craig Topper c3983c34cd [X86] Make sure we use FSUB instruction as the reference for operand order in isAddSubOrSubAdd when recognizing subadd
The FADD part of the addsub/subadd pattern can have its operands commuted, but when checking for fsubadd we were using the fadd as reference and commuting the fsub node.

llvm-svn: 327660
2018-03-15 20:30:54 +00:00