Commit Graph

8945 Commits

Author SHA1 Message Date
Simon Pilgrim e8a5ab35ca [X86][AVX512] Added v64i8 reverse shuffle test (PR31470)
llvm-svn: 290544
2016-12-26 17:38:58 +00:00
Florian Hahn 898127fe36 Revert r290423 because it broke the sanitizer-x86_64-linux-autoconf buildbot.
llvm-svn: 290425
2016-12-23 12:26:11 +00:00
Florian Hahn 1d6b1a7b79 [framelowering] Skip dbg values when getting next/previous instruction.
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.

In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.

Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)

Reviewers: mkuper, MatzeB, aprantl

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D27688

llvm-svn: 290423
2016-12-23 11:35:00 +00:00
Sanjoy Das 9a129807f3 Reimplement dependency tracking in the ImplicitNullChecks pass
Summary:
This change rewrites a core component in the ImplicitNullChecks pass for
greater simplicity since the original design was over-complicated for no
good reason.  Please review this as essentially a new pass.  The change
is almost NFC and I've added a test case for a scenario that this new
code handles that wasn't handled earlier.

The implicit null check pass, at its core, is a code hoisting transform.
It differs from "normal" code transforms in that it speculates
potentially faulting instructions (by design), but a lot of the usual
hazard detection logic (register read-after-write etc.) still applies.
We previously detected hazards by keeping track of registers defined and
used by machine instructions over an instruction range, but that was
unwieldy and did not actually confer any performance benefits.  The
intent was to have linear time complexity over the number of machine
instructions considered, but it ended up being N^2 in practice.
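
For readers unfamiliar with the pass, a minimal C++ sketch of the source-level idiom it rewrites (names invented for illustration; this is not the pass's code):

```
struct Obj { int Field; };

// Before: the branch performs the null check. After the pass: the load is
// hoisted above the branch and the fault itself becomes the check (the
// trap handler resumes at the null path).
int getFieldOrZero(Obj *O) {
  if (!O)
    return 0;        // explicit null path
  return O->Field;   // potentially faulting load the pass speculates
}
```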

This new version is more obviously O(N^2) (with N capped to 8 by
default) in hazard detection.  It does not attempt to be clever in
tracking register uses or defs (the previous cleverness here was a
source of bugs).

Once this is checked in, I'll extract out the `IsSuitableMemoryOp` and
`CanHoistLoadInst` lambdas into member functions (they're too complicated
to be inline lambdas) and do some other related NFC cleanups.

Reviewers: reames, anna, atrick

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27592

llvm-svn: 290394
2016-12-23 00:41:21 +00:00
Wei Mi a2f0b594c2 Redo store splitting in CodeGenPrepare.
This is a follow-up patch to https://reviews.llvm.org/D22840 to address the
issue when a value to be merged into an int64 pair is in a different BB. Redoing
the store splitting in CodeGenPrepare so we can match the pattern across multiple
BBs and move some instructions into the same BB. We still keep the code in dag
combine so that we can catch cases that show up after DAG combining runs.
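
A rough C++ illustration of the cross-BB shape this enables (function and values invented):

```
#include <cstdint>

// The i64 store is assembled from two i32 halves, with the high half
// defined across a basic-block boundary; the per-block DAG combine can't
// see the whole pattern, but CodeGenPrepare now can split the store.
void storePair(int32_t Lo, int32_t Hi, int64_t *P, bool Cond) {
  if (Cond)
    Hi += 1; // 'Hi' now reaches the store from a different block
  uint64_t V = (uint32_t)Lo | ((uint64_t)(uint32_t)Hi << 32);
  *P = (int64_t)V;
}
```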

Differential Revision: https://reviews.llvm.org/D25914

llvm-svn: 290365
2016-12-22 19:44:45 +00:00
Adrian Prantl 1eadba1c8c Renumber testcase metadata nodes after r290153.
This patch renumbers the metadata nodes in debug info testcases after
https://reviews.llvm.org/D26769. This is a separate patch because it
causes so much churn. This was implemented with a python script that
pipes the testcases through llvm-as - | llvm-dis - and then goes
through the original and new output side by side to insert all
comments at a close-enough location.

Differential Revision: https://reviews.llvm.org/D27765

llvm-svn: 290292
2016-12-22 00:45:21 +00:00
Simon Pilgrim 081abbb164 [X86][SSE] Improve lowering of vXi64 multiplies
As mentioned on PR30845, we were performing our vXi64 multiplication as:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32);

when we could avoid one of the upper shifts with:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi + AhiBlo, 32);

This matches the lowering on gcc/icc.
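
A scalar C++ model of the identity (same algebra, minus the vectors):

```
#include <cstdint>

// 64x64 multiply built from 32x32->64 partial products (what pmuludq
// computes per lane); the two shift-by-32 terms can share a single shift.
uint64_t mul64(uint64_t A, uint64_t B) {
  uint64_t AloBlo = (A & 0xffffffff) * (B & 0xffffffff);
  uint64_t AloBhi = (A & 0xffffffff) * (B >> 32);
  uint64_t AhiBlo = (A >> 32) * (B & 0xffffffff);
  return AloBlo + ((AloBhi + AhiBlo) << 32); // one shift instead of two
}
```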

Differential Revision: https://reviews.llvm.org/D27756

llvm-svn: 290267
2016-12-21 20:00:10 +00:00
Elena Demikhovsky 7c7bf1b432 Added a template for building a target-specific memory node in the DAG.
I added an API for creating a target-specific memory node in the DAG. Today, all memory nodes are common to all targets and their constructors are located in SelectionDAG.cpp.
There are some cases in X86 where we need to create a special node - truncation-with-saturation store, float-to-half-store. 
In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for truncation-with-saturation pattern.
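
A scalar C++ model of what a truncation-with-saturation store expresses per element (a sketch of the semantics, not the new node's API):

```
#include <cstdint>

// Truncate i32 -> i8 with signed saturation, then store: out-of-range
// values clamp to the extremes instead of wrapping.
void storeSatS8(int32_t V, int8_t *P) {
  if (V > INT8_MAX) V = INT8_MAX; // clamp high
  if (V < INT8_MIN) V = INT8_MIN; // clamp low
  *P = (int8_t)V;
}
```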

Differential Revision: https://reviews.llvm.org/D27899

llvm-svn: 290250
2016-12-21 10:43:36 +00:00
Oren Ben Simhon cecc4af496 [X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
Fixing failing test.

llvm-svn: 290246
2016-12-21 09:18:37 +00:00
Oren Ben Simhon 3b95157090 [X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible.
vectorcall uses more registers for arguments than fastcall or the default x64 calling convention does.
The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above.

The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it.
This patch also includes additional lit tests to better cover HVA corner cases.
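
For context, a sketch of an HVA under vectorcall (type and function invented; requires an MSVC-compatible compiler such as clang-cl):

```
#include <immintrin.h>

// An HVA: up to four members of one identical vector type, eligible to be
// passed entirely in SSE registers under __vectorcall.
struct HVA4 { __m128 V0, V1, V2, V3; };

__m128 __vectorcall sumHVA(HVA4 H) {
  return _mm_add_ps(_mm_add_ps(H.V0, H.V1), _mm_add_ps(H.V2, H.V3));
}
```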

Differential Revision: https://reviews.llvm.org/D27392

llvm-svn: 290240
2016-12-21 08:31:45 +00:00
Adrian Prantl bceaaa9643 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariable holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGlobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable, that makes upgrading the
old format unambiguous also for variables without DIExpressions.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 290153
2016-12-20 02:09:43 +00:00
Dean Michael Berris 03b8be575e [XRay] Fix assertion failure on empty machine basic blocks (PR 31424)
The original version of the code in XRayInstrumentation.cpp assumed that
functions could not have empty machine basic blocks (or at least that the
first one could not be empty). This change addresses that specific
situation.

We provide two .mir test-cases to make sure we're handling this
appropriately.

Fixes llvm.org/PR31424.

Reviewers: chandlerc

Subscribers: varno, llvm-commits

Differential Revision: https://reviews.llvm.org/D27913

llvm-svn: 290091
2016-12-19 09:20:38 +00:00
Daniel Jasper 373f9a6a0c Revert r289955 and r289962. This is causing lots of ASAN failures for us.
Not sure whether it causes an ASAN false positive, whether it
actually leads to incorrect code, or whether it even exposes bad code.
Hans, I'll get you instructions to reproduce this.

llvm-svn: 290066
2016-12-18 14:36:38 +00:00
Simon Pilgrim e940daf532 [X86][SSE] Add support for combining target shuffles to SHUFPS.
As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point.

llvm-svn: 290064
2016-12-18 14:26:02 +00:00
Craig Topper 7029db0eaa [X86][SSE][AVX-512] Convert FAND/FOR/FXOR/FANDN nodes to integer operations if they are available. This will allow a bunch of patterns to be removed.
These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine.

For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode.

For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now.
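
For background, a scalar C++ sketch of why these ops are pure bit logic in the first place, so the integer and FP-domain forms compute identical bits:

```
#include <cstdint>
#include <cstring>

// FABS is just a bit mask on the IEEE-754 representation, so an integer
// AND (pand) computes the same bits as the FP-domain AND (andps).
float fabsBitwise(float X) {
  uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof Bits);
  Bits &= 0x7fffffffu; // clear the sign bit
  std::memcpy(&X, &Bits, sizeof Bits);
  return X;
}
```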

llvm-svn: 290060
2016-12-18 07:54:23 +00:00
Craig Topper add9cc697a [AVX-512] Use EVEX encoded XOR instruction for zeroing scalar registers when DQI and VLX instructions are available.
This can give the register allocator more registers to use.

llvm-svn: 290057
2016-12-18 06:23:14 +00:00
Craig Topper 2baef8f466 [AVX-512] Make sure VLX is also enabled before using EVEX encoded logic ops for scalars. I missed this in r290049.
llvm-svn: 290055
2016-12-18 04:17:00 +00:00
Craig Topper d3295c6a3a [AVX-512] Use EVEX encoded logic operations for scalar types when they are available. This gives the register allocator more registers to work with.
llvm-svn: 290049
2016-12-17 19:26:00 +00:00
Craig Topper 81b021e7c0 [AVX-512] Update scalar logic test to show missed opportunity to use EVEX encoded logic instructions to get more registers to use.
llvm-svn: 290048
2016-12-17 19:25:55 +00:00
Jun Bum Lim 90b6b5074a [CodeGenPrep] Skip merging empty case blocks
This is a recommit of r287553 after fixing the invalid loop info after eliminating an empty block, and the unit test failures in AVR and WebAssembly:

Summary: Merging an empty case block into the header block of a switch could cause ISel to add COPY instructions in the header of the switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase the dynamic instruction count, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting.
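
A small invented C++ example of the shape in question:

```
// 'case 1' is an empty block that only jumps to the merge point. If it is
// merged into the switch header, the header becomes an incoming block of
// the PHI for 'R', and ISel places COPYs there -- executed on every
// iteration when the switch sits inside a loop.
int pick(int K, int A, int B) {
  int R = A; // value reaching the merge block when K == 1
  switch (K) {
  case 1: // empty case block
    break;
  default:
    R = B;
    break;
  }
  return R; // PHI of A/B at the merge block
}
```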

Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl

Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 289988
2016-12-16 20:38:39 +00:00
Adrian Prantl 73ec065604 Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has no DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recommitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).

Sorry for the churn!

llvm-svn: 289982
2016-12-16 19:39:01 +00:00
Jun Bum Lim f9416af191 Revert "[CodeGenPrep] Skip merging empty case blocks"
This reverts commit r289951.

llvm-svn: 289960
2016-12-16 17:06:14 +00:00
Hans Wennborg 35f21cba13 [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367)
atomic_load_add returns the value before addition, but sets EFLAGS based on the
result of the addition. That means it's setting the flags based on effectively
subtracting C from the value at x, which is also what the outer cmp does.

This targets a pattern that occurs frequently with reference counting pointers:

  void decrement(long volatile *ptr) {
    if (_InterlockedDecrement(ptr) == 0)
      release();
  }

Clang would previously compile it (for 32-bit at -Os) as:

00000000 <?decrement@@YAXPCJ@Z>:
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   31 c9                   xor    %ecx,%ecx
   6:   49                      dec    %ecx
   7:   f0 0f c1 08             lock xadd %ecx,(%eax)
   b:   83 f9 01                cmp    $0x1,%ecx
   e:   0f 84 00 00 00 00       je     14 <?decrement@@YAXPCJ@Z+0x14>
  14:   c3                      ret

and with this patch it becomes:

00000000 <?decrement@@YAXPCJ@Z>:
   0:   8b 44 24 04             mov    0x4(%esp),%eax
   4:   f0 ff 08                lock decl (%eax)
   7:   0f 84 00 00 00 00       je     d <?decrement@@YAXPCJ@Z+0xd>
   d:   c3                      ret

(Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add
or pre-decrement operator generate the same code.)

Differential Revision: https://reviews.llvm.org/D27781

llvm-svn: 289955
2016-12-16 16:34:59 +00:00
Jun Bum Lim 85347dde27 [CodeGenPrep] Skip merging empty case blocks
This is a recommit of r287553 after fixing the invalid loop info after eliminating an empty block:

Summary: Merging an empty case block into the header block of a switch could cause ISel to add COPY instructions in the header of the switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase the dynamic instruction count, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl

Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 289951
2016-12-16 16:03:31 +00:00
Simon Pilgrim 9519bd9232 [X86][AVX512] use a single shufps for 512-bit vectors when it can save instructions
This is the 512-bit counterpart to the 128-bit transform checked in here:
https://reviews.llvm.org/rL289837

This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

llvm-svn: 289946
2016-12-16 14:30:04 +00:00
Simon Pilgrim 224416a9e4 [X86][AVX512] Add tests showing missed opportunity to efficiently lower v16i32 to VSHUFPS (PR27885)
llvm-svn: 289945
2016-12-16 14:21:57 +00:00
Simon Pilgrim f159a3414f [X86][SSE] Combine shuffles to MOVSS/MOVSD whatever the domain.
We already do the same thing in shuffle lowering, but don't do it there if SSE41 (PBLEND) is available instead.

llvm-svn: 289937
2016-12-16 11:48:51 +00:00
Andrew V. Tischenko 5f643ad847 Extra coverage tests to demonstrate fixes in D72618 and D26855
llvm-svn: 289931
2016-12-16 09:56:02 +00:00
Adrian Prantl 74a835cda0 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariable holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGlobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289920
2016-12-16 04:25:54 +00:00
Chandler Carruth 4154062b69 Revert patch series introducing the DAG combine to match a load-by-bytes
idiom.

r289538: Match load by bytes idiom and fold it into a single load
r289540: Fix a buildbot failure introduced by r289538
r289545: Use more detailed assertion messages in the code ...
r289646: Add a couple of assertions to the load combine code ...

This DAG combine has a bad crash in it that is quite hard to trigger
sadly -- it relies on sneaking code with UB through the SDAG build and
into this particular combine. I've responded to the original commit with
a test case that reproduces it.

However, the code also has other problems that will require substantial
changes to address and so I'm going ahead and reverting it for now. This
should unblock us and perhaps others that are hitting the crash in the
wild and will let a fresh patch with updated approach come in cleanly
afterward.

Sorry for any trouble or disruption!

llvm-svn: 289916
2016-12-16 04:05:22 +00:00
Adrian Prantl 03c6d31a3b Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289902 while investigating bot breakage.

llvm-svn: 289906
2016-12-16 01:00:30 +00:00
Adrian Prantl ce13935776 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariable holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGlobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289902
2016-12-16 00:36:43 +00:00
Quentin Colombet 327f942876 [IRTranslator] Merge the entry and ABI lowering blocks.
The IRTranslator uses an additional block before the LLVM-IR entry block
to perform all the ABI lowering and the constant hoisting. Thus, this
block is the actual entry block and falls through to the LLVM-IR entry
block. However, with such a representation, we end up with two basic
blocks that are not maximal.

Therefore, this patch adds a bit of canonicalization by merging both the
LLVM-IR entry block and the ABI lowering/constant hoisting block into one
block, making the resulting block more likely to be maximal (indeed the
LLVM-IR entry block might not have been maximal).

llvm-svn: 289891
2016-12-15 23:32:25 +00:00
Eli Friedman 379294676d Don't combine splats with other shuffles.
We sometimes end up creating shuffles which are worse than the obvious
translation of the IR.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 .

Differential Revision: https://reviews.llvm.org/D27793

llvm-svn: 289882
2016-12-15 22:41:40 +00:00
Eli Friedman 34505083c6 Don't combine a shuffle of two BUILD_VECTORs with duplicate elements.
Targets can't handle this case well in general; we often transform
a shuffle of two cheap BUILD_VECTORs to element-by-element insertion,
which is very inefficient.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially
fixes https://llvm.org/bugs/show_bug.cgi?id=31301.

Differential Revision: https://reviews.llvm.org/D27787

llvm-svn: 289874
2016-12-15 21:36:59 +00:00
Sanjay Patel a97358bc8e [x86] use a single shufps for 256-bit vectors when it can save instructions
This is the 256-bit counterpart to the 128-bit transform checked in here:
https://reviews.llvm.org/rL289837

This patch is based on the draft by @sroland (Roland Scheidegger) that is
attached to PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

llvm-svn: 289846
2016-12-15 18:43:46 +00:00
Sanjay Patel a0d8a278a7 [x86] use a single shufps when it can save instructions
This is a tiny patch with a big pile of test changes.
This partially fixes PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885

My motivating case looks like this:

  - vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
  - vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
  - vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]

  + vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]

And this happens several times in the diffs. For chips with domain-crossing penalties,
the instruction count and size reduction should usually overcome any potential 
domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such
as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so
using shufps is a pure win.

So the test case diffs all appear to be improvements except one test in 
vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate 
zero elements and one test in combine-sra.ll where multiple uses prevent the expected
shuffle combining.

Differential Revision: https://reviews.llvm.org/D27692

llvm-svn: 289837
2016-12-15 18:03:38 +00:00
Simon Pilgrim 7522f54feb [X86][SSE] Fix domains for scalar store instructions
As discussed on D27692

llvm-svn: 289834
2016-12-15 17:09:24 +00:00
Simon Pilgrim d7518896ff [X86][SSE] Fix domains for VZEXT_LOAD type instructions
Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions.

Differential Revision: https://reviews.llvm.org/D27684

llvm-svn: 289825
2016-12-15 16:05:29 +00:00
Alexey Bataev 2db6045b29 Revert "[TESTS] Initial commit of tests, by Andrew Tischenko"
This reverts commit ee709f8988653a0334fbf100cdbbdd83a3933347.

llvm-svn: 289814
2016-12-15 12:26:18 +00:00
Alexey Bataev 67c90c7d95 [TESTS] Initial commit of tests, by Andrew Tischenko
llvm-svn: 289807
2016-12-15 11:48:24 +00:00
Sanjoy Das 93b1de0f8c Add missing -mtriple to MIR test case
llvm-svn: 289779
2016-12-15 07:13:50 +00:00
Sanjoy Das d7389d6261 [MachineBlockPlacement] Don't make blocks "uneditable"
Summary:
This fixes an issue with MachineBlockPlacement due to a badly timed call
to `analyzeBranch` with `AllowModify` set to true.  The timeline is as
follows:

 1. `MachineBlockPlacement::maybeTailDuplicateBlock` calls
    `TailDup.shouldTailDuplicate` on its argument, which in turn calls
    `analyzeBranch` with `AllowModify` set to true.

 2. This `analyzeBranch` call edits the terminator sequence of the block
    based on the physical layout of the machine function, turning an
    unanalyzable non-fallthrough block into an unanalyzable fallthrough
    block.  Normally MBP bails out of rearranging such blocks, but this
    block was unanalyzable non-fallthrough (and thus rearrangeable) the
    first time MBP looked at it, and so it goes ahead and decides where
    it should be placed in the function.

 3. When placing this block, MBP fails to analyze, and thus update, the
    block in keeping with the new physical layout.

Concretely, before (1) we have something like:

```
LBL0:
  < unknown terminator op that may branch to LBL1 >
  jmp LBL1

LBL1:
  ... A

LBL2:
  ... B
```

In (2), analyzeBranch simplifies this to

```
LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump removed

LBL1:
  ... A

LBL2:
  ... B
```

In (3), MachineBlockPlacement goes ahead with its plan of putting LBL2
after the first block since that is profitable.

```
LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump

LBL2:
  ... B

LBL1:
  ... A
```

and the program now has incorrect behavior (we no longer fall-through
from `LBL0` to `LBL1`) because MBP can no longer edit LBL0.

There are several possible solutions, but I went with removing the teeth
from the `analyzeBranch` calls in TailDuplicator.  That makes thinking
about the result of these calls easier, and breaks nothing in the lit
test suite.

I've also added some bookkeeping to the MachineBlockPlacement pass and
used that to write an assert that would have caught this.

Reviewers: chandlerc, gberry, MatzeB, iteratee

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27783

llvm-svn: 289764
2016-12-15 05:08:57 +00:00
Eli Friedman db07ebbab6 Add testcases for some shuffle bugs.
See https://llvm.org/bugs/show_bug.cgi?id=31301 and
https://llvm.org/bugs/show_bug.cgi?id=31364 .

llvm-svn: 289751
2016-12-15 01:47:15 +00:00
Nirav Dave f5bf03c7ef Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
Reverting due to ARM MCJIT and MIPS LLD error.

This reverts commit r289659.

llvm-svn: 289667
2016-12-14 16:43:44 +00:00
Nirav Dave 8527ab0ad2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after removing load-store factoring through
token factors in favor of improved token factor operand pruning.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates
non-interfering loads/stores from the store-merging logic.

When merging stores, we search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.ll -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we don't do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289659
2016-12-14 15:44:26 +00:00
Simon Pilgrim 05ab8ffc7e [DAGCombiner] Try to use SelectionDAG::isKnownToBeAPowerOfTwo instead of just APInt::isPowerOf2
Generalize sdiv/udiv/srem/urem combines using APInt::isPowerOf2, which only works for const/splat-const values, to call SelectionDAG::isKnownToBeAPowerOfTwo instead which recognises many more cases.

Added a DAGCombiner::BuildLogBase2 helper since PowerOf2 combines often involve taking the log2 of such a value.
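
A scalar C++ model of the combines being generalized (unsigned case), assuming the divisor is known to be 1 << K:

```
#include <cstdint>

// Once the divisor is known to be a power of two -- not merely a
// constant -- division and remainder reduce to a shift and a mask,
// with K = log2(divisor) (what BuildLogBase2 computes in DAG form).
uint32_t udivPow2(uint32_t X, uint32_t K) { return X >> K; }
uint32_t uremPow2(uint32_t X, uint32_t K) { return X & ((1u << K) - 1); }
```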

Differential Revision: https://reviews.llvm.org/D27714

llvm-svn: 289654
2016-12-14 15:08:13 +00:00
Michael Zuckerman 1ce2a23a1e Fix bug 30945 - [AVX512] Failure to flip vector comparison to remove not mask instruction
This adds a new optimization opportunity via a new X86ISelLowering pattern. The test case was shown in https://llvm.org/bugs/show_bug.cgi?id=30945.

Test explanation:
Select gets three arguments: mask, op and op2. In this case, the mask is the result of an ICMP. The ICMP instruction compares (for equality) the zero-initializer vector with the result of the first ICMP.

In general, the result of "cmp eq, op1, zero initializers" is "not(op1)" where op1 is a mask. By rearranging the two arguments inside the Select instruction we can get the same result, without the need for the middle phase ("cmp eq, op1, zero initializers").

Missed optimization opportunity: 
vpcmpled %zmm0, %zmm1, %k0
knotw %k0, %k1

can be combined to
vpcmpgtd %zmm0, %zmm2, %k1
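
The scalar intuition, as C++ (per mask bit):

```
// Negating a comparison mask is the same as comparing with the inverse
// predicate, so a cmple followed by knot folds into a single cmpgt.
bool le(int A, int B) { return A <= B; }
bool gtViaNot(int A, int B) { return !le(A, B); } // == (A > B)
```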

Reviewers: 
1. delena
2. igorb 

Committed after check-all passed.
Differential Revision: https://reviews.llvm.org/D27160

llvm-svn: 289653
2016-12-14 14:57:10 +00:00
Simon Pilgrim ebe58191c8 [X86][SSE] Add AVX1 tests to sdiv/udiv srem/urem combine tests
As requested on D27714

llvm-svn: 289652
2016-12-14 14:39:51 +00:00
Paul Robinson 8fec3da00c [DWARF] Preserve column number when emitting 'line 0' record
Follow-up to r289256, address a FIXME to avoid resetting the column
number. This reduced .debug_line by 2.6% in a RelWithDebInfo
self-build of clang.

llvm-svn: 289620
2016-12-14 00:27:35 +00:00
Simon Pilgrim 5f2db1351f [X86][SSE] Regenerate vector of pointers tests
llvm-svn: 289555
2016-12-13 17:22:39 +00:00
Artur Pilipenko c93cc5955f [DAGCombiner] Match load by bytes idiom and fold it into a single load
Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the target supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32*)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32*)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bits which constitute the resulting OR value. If all the OR bits come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: hfinkel, RKSimon, filcab

Differential Revision: https://reviews.llvm.org/D26149

llvm-svn: 289538
2016-12-13 14:21:14 +00:00
Philip Reames 1f1bbac8da [peephole] Enhance folding logic to work for STATEPOINTs
The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are:

    Support for folding multiple operands at once which reference the same load
    Support for folding multiple loads into a single instruction
    Walk all the operands of the instruction for variadic instructions (this is a bug fix!)

Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here.

Differential Revision: https://reviews.llvm.org/D24103

llvm-svn: 289510
2016-12-13 01:38:41 +00:00
Philip Reames 51387a8c28 [Statepoints] Reuse stack slots more than once within a basic block
The stack slot reuse code had a really amusing bug. We ended up only reusing a stack slot exactly once (initial use + reuse) within a basic block. If we had a third statepoint to process, we ended up allocating a new set of stack slots. If we crossed a basic block boundary, the set got cleared. As a result, code which is invoke heavy doesn't see the problem, but multiple calls within a basic block do. Net result: as we optimize invokes into calls, lowering gets worse.

The root error here is that the bitmap used by the custom allocator wasn't kept in sync. The result was that we ended up resizing the bitmap on the next statepoint (to handle the cross-block case), resetting the bit once, but then never resetting it again.

Differential Revision: https://reviews.llvm.org/D25243

llvm-svn: 289509
2016-12-13 01:21:15 +00:00
Sanjay Patel 2a1554a0b6 [x86] fix test specifications
llvm-svn: 289493
2016-12-12 23:16:35 +00:00
Sanjay Patel 1740526e99 [x86] fix test specifications and auto-generate checks
llvm-svn: 289492
2016-12-12 23:15:15 +00:00
Andrew Kaylor ff6a1edfa8 Avoid infinite loops in branch folding
Differential Revision: https://reviews.llvm.org/D27582

llvm-svn: 289486
2016-12-12 23:05:38 +00:00
Paul Robinson ac7fe5e0c4 Recommit r288212: Emit 'no line' information for interesting 'orphan' instructions.
DWARF specifies that "line 0" really means "no appropriate source
location" in the line table.  By default, use this for branch targets
and some other cases that have no specified source location, to
prevent inheriting unfortunate line numbers from physically preceding
instructions (which might be from completely unrelated source).

Updated patch allows enabling or suppressing this behavior for all
unspecified source locations.

Differential Revision: http://reviews.llvm.org/D24180

llvm-svn: 289468
2016-12-12 20:49:11 +00:00
Simon Pilgrim a64d4dc22f [X86] Regenerate vector bitcast/widening tests.
llvm-svn: 289443
2016-12-12 16:15:45 +00:00
Simon Pilgrim d4ff86b973 [X86] Regenerate test.
llvm-svn: 289438
2016-12-12 15:47:53 +00:00
Simon Pilgrim 5ebd2b542b [X86][SSE] Add support for combining SSE VSHLI/VSRLI uniform constant shifts.
Fixes some missed constant folding opportunities and allows us to combine shuffles that end with a logical bit shift.

llvm-svn: 289429
2016-12-12 13:33:58 +00:00
Simon Pilgrim 369cd349b9 [X86][SSE] Lower suitably sign-extended mul vXi64 using PMULDQ
PMULDQ returns the 64-bit result of the signed multiplication of the lower 32 bits of vXi64 vector inputs, so we can lower with this if the sign bits stretch that far.
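
A scalar C++ model of a single PMULDQ lane (a sketch of the semantics, not the lowering code):

```
#include <cstdint>

// PMULDQ: signed 32x32->64 multiply of the low halves of each i64 lane.
// It is a valid lowering of a full i64 multiply whenever both inputs are
// already sign-extended from 32 bits.
int64_t pmuldqLane(int64_t A, int64_t B) {
  return (int64_t)(int32_t)A * (int64_t)(int32_t)B;
}
```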

Differential Revision: https://reviews.llvm.org/D27657

llvm-svn: 289426
2016-12-12 10:49:15 +00:00
Simon Pilgrim 040a36c176 [SelectionDAG] Add support for EXTRACT_SUBVECTOR to ComputeNumSignBits
Pre-commit as discussed on D27657

llvm-svn: 289425
2016-12-12 10:29:43 +00:00
Craig Topper 36ecce9bed [X86] Teach selectScalarSSELoad to accept full 128-bit vector loads and the X86ISD::VZEXT_LOAD opcode.
Disable peephole on some of the tests that no longer require it to properly fold scalar intrinsics.

llvm-svn: 289424
2016-12-12 07:57:24 +00:00
Craig Topper 081c0e2864 [X86] Remove some intrinsic instructions from hasPartialRegUpdate
Summary:
These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits.

As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded.

Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too.

Reviewers: spatel, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27611

llvm-svn: 289419
2016-12-12 05:07:17 +00:00
Simon Pilgrim 831435cb14 [X86][SSE] Add support for combining target shuffles to SHUFPD.
llvm-svn: 289407
2016-12-11 21:26:25 +00:00
Ayman Musa 7ec4ed55d3 [X86][AVX512] Add missing patterns for broadcast fallback in case load node has multiple uses (for v4i64 and v4f64).
When the load node which the broadcast instruction broadcasts has multiple uses, it cannot be folded.
A fallback pattern is added to catch these cases and provide another solution.

Differential Revision: https://reviews.llvm.org/D27661

llvm-svn: 289404
2016-12-11 20:11:17 +00:00
Simon Pilgrim 7c98a79f7b [X86][AVX512] Add target shuffle test showing missing PSHUFPD combine.
llvm-svn: 289400
2016-12-11 19:41:23 +00:00
Simon Pilgrim 8766a76f3d [X86][XOP] Add target shuffle tests showing missing PSHUFPD combine.
llvm-svn: 289398
2016-12-11 19:36:25 +00:00
Oren Ben Simhon 9683ecbff6 [X86] Regcall - Adding support for mask types
The regcall calling convention passes mask-type arguments in x86 GPR registers.
The review includes the changes required in order to support v32i1, v16i1 and v8i1.

Differential Revision: https://reviews.llvm.org/D27148

llvm-svn: 289383
2016-12-11 14:10:52 +00:00
Craig Topper 1f1b441267 [X86] Remove masking from 512-bit VPERMIL intrinsics in preparation for being able to constant fold them in InstCombineCalls like we do for 128/256-bit.
llvm-svn: 289350
2016-12-11 01:26:44 +00:00
Craig Topper edab02b50b [X86] Remove masking from 512-bit PSHUFB intrinsics in preparation for being able to constant fold it in InstCombineCalls like we do for 128/256-bit.
llvm-svn: 289344
2016-12-10 23:09:43 +00:00
Simon Pilgrim b71b214287 [X86][SSE] Add tests for sign extended vXi64 multiplication
llvm-svn: 289342
2016-12-10 22:02:36 +00:00
Craig Topper abe7c5b5e9 [AVX-512] Remove 128/256 masked vpermil instrinsics and autoupgrade to a select around the unmasked avx1 intrinsics.
llvm-svn: 289340
2016-12-10 21:15:52 +00:00
Craig Topper 18b57da491 [AVX-512] Add support for lowering (v2i64 (fp_to_sint (v2f32))) to vcvttps2uqq when AVX512DQ and AVX512VL are available.
llvm-svn: 289335
2016-12-10 19:35:39 +00:00
Simon Pilgrim 54945a12ec [SelectionDAG] Add ability for computeKnownBits to peek through bitcasts from 'large element' scalar/vector to 'small element' vector.
Extension to D27129 which already supported bitcasts from 'small element' vector to 'large element' scalar/vector types.

llvm-svn: 289329
2016-12-10 17:00:00 +00:00
Simon Pilgrim 90a040e745 [X86][XOP] Add permil2ps buildvector combine test
llvm-svn: 289327
2016-12-10 13:45:08 +00:00
Simon Pilgrim 8dc97a4591 [X86] Regenerate test
llvm-svn: 289279
2016-12-09 21:53:12 +00:00
Simon Pilgrim 017b7a71d8 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED)
Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type.

llvm-svn: 289232
2016-12-09 17:53:11 +00:00
Nirav Dave bedb5d906c Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r289221 which appears to be triggering an assertion

llvm-svn: 289226
2016-12-09 17:18:24 +00:00
Nirav Dave fd51ff4fd8 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing an overly aggressive load-store forwarding optimization.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates
non-interfering loads/stores from the store-merging logic.

When merging stores, we search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.ll -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we don't do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289221
2016-12-09 16:15:12 +00:00
Simon Pilgrim e4050a2961 [SelectionDAG] Add partial BITCAST support to computeKnownBits
Adds support for bitcasting a little endian 'small element' vector to 'large element' scalar/vector (e.g. v16i8 to v4i32 or v2i32 to i64), which is required for PR30845. We extract the knownbits for each 'small element' part and concatenate the results together.
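
The little-endian concatenation rule, as a scalar C++ sketch:

```
#include <cstdint>

// Known bits of a v2i32 value bitcast to i64: element 0 supplies the low
// 32 bits and element 1 the high 32 bits (little endian).
uint64_t concatKnownBits(uint32_t KnownElt0, uint32_t KnownElt1) {
  return ((uint64_t)KnownElt1 << 32) | KnownElt0;
}
```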

We can add support for big endian and 'large element' scalar/vector to 'small element' vector bitcasting once we have test cases for them.

Differential Revision: https://reviews.llvm.org/D27129

llvm-svn: 289200
2016-12-09 10:13:45 +00:00
Daniel Jasper f51e05ffbc Revert "[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes"
This reverts commit r288916 as it is currently causing a crasher in
Halide. Reproducer on llvm.org/PR31323. While it might be that halide is
generating invalid IR, llc shouldn't crash.

llvm-svn: 289194
2016-12-09 09:04:51 +00:00
Craig Topper a55b483bb5 [AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics
Summary:
Scalar intrinsics have specific semantics about which input's upper bits are passed through to the output. The same input is also supposed to be the one we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection.

This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1 (one of the multiply inputs) or operand 3 (the addition/subtraction input) should pass through its upper bits.

We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version.

We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that.

This fixes PR30913.
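
For reference, the scalar-FMA passthru contract as seen from the intrinsics (a usage sketch, assuming an FMA-capable target, e.g. compiled with -mfma):

```
#include <immintrin.h>

// _mm_fmadd_ss(A, B, C): lane 0 is A[0]*B[0] + C[0]; lanes 1-3 pass
// through from A. The fix makes isel honor exactly this, including which
// operand supplies lane 0 when a mask bit is 0 in the AVX-512 forms.
__m128 fmaddScalar(__m128 A, __m128 B, __m128 C) {
  return _mm_fmadd_ss(A, B, C);
}
```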

Reviewers: delena, zvi, v_klochkov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27144

llvm-svn: 289190
2016-12-09 06:42:28 +00:00
Craig Topper c4f2b0996d [X86] Add masked versions of VPERMT2* and VPERMI2* to load folding tables.
llvm-svn: 289186
2016-12-09 05:20:11 +00:00
Craig Topper 2aeb456425 [AVX-512] Add vpermilps/pd to load folding tables.
llvm-svn: 289173
2016-12-09 02:18:11 +00:00
Craig Topper df9de00928 [AVX-512] Move some floating point stack folding test cases out of the integer test.
llvm-svn: 289172
2016-12-09 02:18:07 +00:00
Reid Kleckner 785e7d282c Don't emit .seh_handler directives for any cleanup funclets
We were falsely claiming that we had an LSDA for the relevant EH
personality before this change, which could lead to the EH machinery
interpreting random adjacent data as an LSDA.

Fixes PR31317

This change is safe because cleanups can't contain exception handlers
today. We do these things to maintain that invariant:
- C++ destructors are naturally out-of-line
- __finally blocks are outlined in clang
- LLVM's inliner will not inline EH constructs into cleanups

llvm-svn: 289101
2016-12-08 20:38:46 +00:00
Peter Collingbourne 235c275b20 IR, X86: Understand !absolute_symbol metadata on global variables.
Summary:
Attaching !absolute_symbol to a global variable does two things:
1) Marks it as an absolute symbol reference.
2) Specifies the value range of that symbol's address.
Teach the X86 backend to allow absolute symbols to appear in place of
immediates by extending the relocImm and mov64imm32 matchers. Start using
relocImm in more places where it is legal.

As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html

Differential Revision: https://reviews.llvm.org/D25878

llvm-svn: 289087
2016-12-08 19:01:00 +00:00
Nicolai Haehnle 3c67a08d1b X86: Add checks for fma_patterns[_wide].ll with -enable-no-infs-fp-math
This re-adds checks for the patterns that were disabled with r288506.

Reviewers: spatel, delena, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27346

llvm-svn: 289049
2016-12-08 14:08:08 +00:00
Simon Pilgrim d9c53710d5 [X86][SSE] Add vector test for (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2) detailed in D19325
llvm-svn: 289035
2016-12-08 10:17:25 +00:00
Quentin Colombet ae3168da3f [InlineSpiller] Don't call TargetInstrInfo::foldMemoryOperand with an empty list.
Since r287792 if we try to do that we will hit an assert.

llvm-svn: 289001
2016-12-08 00:06:51 +00:00
Tim Northover 05cc4859ad GlobalISel: simplify MachineIRBuilder interface.
MachineIRBuilder had weird before/after and beginning/end flags for the insert
point. Unfortunately the non-default setting means that instructions will be inserted
in reverse order which is almost never what anyone wants.

Really, I think we just want (like IRBuilder has) the ability to insert at any
C++ iterator-style point (i.e. before any instruction or before MBB.end()). So
this fixes MIRBuilders to behave like IRBuilders in this respect.

llvm-svn: 288980
2016-12-07 21:05:38 +00:00
Michael Kuperstein 5842b20633 [X86] Skip over DEBUG_VALUE while looking for start of call sequence
If we don't skip over DEBUG_VALUEs, we get differences between -g and non-g
code.

This fixes PR31242.

Differential Revision: https://reviews.llvm.org/D27485

llvm-svn: 288965
2016-12-07 19:31:08 +00:00
Michael Kuperstein 18092cf2c3 [X86] Do not assume "ri" instructions always have an immediate operand
The second operand of an "ri" instruction may be an immediate, but it may
also be a globalvariable, so we should make any assumptions.

This fixes PR31271.

Differential Revision: https://reviews.llvm.org/D27481

llvm-svn: 288964
2016-12-07 19:29:18 +00:00
Simon Pilgrim ba05d41095 [SelectionDAG] Add knownbits support for vector demandedelts in SMAX/SMIN/UMAX/UMIN opcodes
llvm-svn: 288926
2016-12-07 17:54:00 +00:00
Simon Pilgrim ef76b83164 [X86] Add knownbits vector UMAX test
In preparation for demandedelts support

llvm-svn: 288920
2016-12-07 17:21:13 +00:00
Simon Pilgrim 967325b373 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes
llvm-svn: 288916
2016-12-07 16:28:21 +00:00
Simon Pilgrim b421ef2370 [X86] Add test to show missed opportunities to calculate knownbits in INSERT_VECTOR_ELT
llvm-svn: 288912
2016-12-07 15:27:18 +00:00
Simon Pilgrim 33f2a669c1 [X86][SSE] Fix vpextrd/vpextrq checks
They were testing for the pre-VEX versions.

llvm-svn: 288911
2016-12-07 15:10:05 +00:00
Simon Pilgrim 4b1ebf97fc [X86][SSE] Force execution domain of 32-bit extractps/pextrd in the stack folding tests
llvm-svn: 288910
2016-12-07 15:06:14 +00:00
Simon Pilgrim e75ff02269 [X86][SSE] Regenerate test.
llvm-svn: 288906
2016-12-07 13:05:04 +00:00
Simon Pilgrim 8893bd95f0 [X86][SSE] Consistently set MOVD/MOVQ load/store/move instructions to integer domain
We are being inconsistent with these instructions (and all their variants), with a random mix of them using the default float domain.

Differential Revision: https://reviews.llvm.org/D27419

llvm-svn: 288902
2016-12-07 12:10:49 +00:00
Simon Pilgrim d5bc5c16b2 [X86][XOP] Fix VPERMIL2 non-constant pool shuffle decoding (PR31296)
The non-constant pool version of DecodeVPERMIL2PMask was not offsetting correctly for the second input. I've updated the code to match the implementation in the constant-pool version.

Annoyingly this bug was hidden for so long as it's tricky to combine to useful variable shuffle masks that don't become constant-pool entries.

llvm-svn: 288898
2016-12-07 11:19:00 +00:00
Simon Pilgrim 0559b9e557 [X86][XOP] Add test case for PR31296
llvm-svn: 288858
2016-12-06 22:50:13 +00:00
Zvi Rackover 8bc7e4da51 [X86] Prefer reduced width multiplication over pmulld on Silvermont
Summary:
Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld.
On Silvermont [source: Optimization Reference Manual]:
PMULLD has a throughput of 1/11 [instruction/cycles].
PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles].

Fixes pr31202.

Analysis of this issue was done by Fahana Aleen.
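
A scalar C++ model of the reduced-width expansion for inputs known to fit in 16 bits:

```
#include <cstdint>

// pmullw yields the low 16 bits of each 16x16 product and pmulhuw the
// high 16; unpacklwd/unpackhwd interleave them back into 32-bit lanes.
uint32_t mulVia16(uint16_t A, uint16_t B) {
  uint16_t Lo = (uint16_t)((uint32_t)A * B);         // pmullw
  uint16_t Hi = (uint16_t)(((uint32_t)A * B) >> 16); // pmulhuw
  return ((uint32_t)Hi << 16) | Lo;                  // unpack step
}
```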

Reviewers: wmi, delena, mkuper

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D27203

llvm-svn: 288844
2016-12-06 19:35:20 +00:00
Simon Pilgrim dd6ca639d5 [DAGCombine] Add (sext_in_reg (zext x)) -> (sext x) combine
Handle the case where a sign extension has ended up being split into separate stages (typically to get around vector legal ops) and a zext + sext_in_reg gets inserted.
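
The scalar equivalence, in C++:

```
#include <cstdint>

// zext i8 -> i32 followed by sign_extend_inreg at width 8 is just a
// plain sext of the original i8 value.
int32_t viaZextThenSextInReg(int8_t X) {
  uint32_t Z = (uint8_t)X;         // zext
  return (int32_t)(Z << 24) >> 24; // sext_in_reg i8 == (int32_t)X
}
```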

Differential Revision: https://reviews.llvm.org/D27461

llvm-svn: 288842
2016-12-06 19:09:37 +00:00
Simon Pilgrim 1577b39f51 [SelectionDAG] We can ignore knownbits from an undef shuffle vector index if we don't actually demand that element
llvm-svn: 288839
2016-12-06 18:58:25 +00:00
Simon Pilgrim 4a2979ce12 [X86][SSE] Add knownbits test demonstrating demandedelts not ignoring undef shuffle elements
llvm-svn: 288825
2016-12-06 17:00:47 +00:00
Simon Pilgrim 0caaadfc2d [X86][SSE] Added vector sext_in_reg combine tests
llvm-svn: 288819
2016-12-06 15:57:26 +00:00
Simon Pilgrim 7c7b649639 [X86] Improve UMAX/UMIN knownbits test
Test the sequential effect of each op

llvm-svn: 288815
2016-12-06 15:17:50 +00:00
Ayman Musa 86c00b799f [X86][AVX512] Detect repeated constant patterns in BUILD_VECTOR suitable for broadcasting.
Check if a build_vector node includes a repeated constant pattern and replace it with a broadcast of that pattern.
For example:
"build_vector <0, 1, 2, 3, 0, 1, 2, 3>" would be replaced by "broadcast <0, 1, 2, 3>"

Differential Revision: https://reviews.llvm.org/D26802

llvm-svn: 288804
2016-12-06 12:24:14 +00:00
Simon Pilgrim ae63dd10f8 [X86] Add tests to show missed opportunities to calculate knownbits in SMAX/SMIN/UMAX/UMIN
llvm-svn: 288801
2016-12-06 12:12:20 +00:00
Florian Hahn 7582c669bd [framelowering] Improve tracking of first CS pop instruction.
Summary: This patch makes sure FirstCSPop and MBBI never point to DBG_VALUE instructions, which affected the code generated.

Reviewers: mkuper, aprantl, MatzeB

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27343

llvm-svn: 288794
2016-12-06 10:24:55 +00:00
Craig Topper b34eef7b41 [X86] Remove another weird scalar sqrt/rcp/rsqrt pattern.
This pattern turned a vector sqrt/rcp/rsqrt operation of sse_load_f32/f64 into the scalar instruction for the operation and put undef into the upper bits. For correctness, the resulting code should still perform the sqrt/rcp/rsqrt on the upper bits after the load is extended, since that's what the operation asked for. In particular, when the upper bits are 0, we need to calculate the sqrt/rcp/rsqrt of the zeroes and keep the result in the upper bits. This implies we should still be using the packed instruction.

The only test case for this pattern is one I just added so there was no coverage of this.

llvm-svn: 288784
2016-12-06 08:08:12 +00:00
Craig Topper 26ce4267ef [X86] Add test case demonstrating a case where a vector sqrt being passed (scalar_to_vector loadf64) uses a scalar sqrt instruction.
This occurs due to a pattern that uses sse_load_f32/f64 with vector sqrt/rcp/rsqrt operations and turns them into scalar instructions. Perhaps for the case where the upper bits come from undef this is ok. I believe a (vzmovl load64) would do the same thing, but those seem to become vzload instead, and selectScalarSSELoad doesn't handle that today. In that case we should be performing the vector operation on the zeros in the upper bits, which is not equivalent to using a scalar instruction.

I will remove this pattern in a follow up patch. There appears to be no other test content for it.

llvm-svn: 288783
2016-12-06 08:08:09 +00:00
Craig Topper aa2c38378c [X86] Regenerate a test using update_llc_test_checks.py
llvm-svn: 288782
2016-12-06 08:08:07 +00:00
Craig Topper 683470bf1b [X86] Remove bad pattern that caused 128-bit loads being used by scalar sqrt/rcp/rsqrt intrinsics to select the memory form of the corresponding instruction and violate the semantics of the intrinsic.
The intrinsics are supposed to pass the upper bits straight through to their output register. This means we need to make sure we still perform the 128-bit load to get those upper bits to pass to give to the instruction since the memory form of the instruction only reads 32 or 64 bits.

llvm-svn: 288781
2016-12-06 08:08:04 +00:00
Craig Topper 125939ff65 [X86] Add test case that shows a scalar sqrtsd intrinsic of a 128-bit vector load using the load form of the sqrtsd instruction which violates the intrinsic semantics.
The sqrtsd instruction only loads 64-bits and writes bits 63:0 with the sqrt result. Bits 127:64 are preserved in the destination register. The semantics of the intrinsic indicate bits 127:64 should come from the intrinsic argument which in this case is a 128-bit load. So the generated code should have a 128-bit load and use a register form of sqrtsd.

llvm-svn: 288780
2016-12-06 08:08:01 +00:00
Craig Topper 5fc7bc91f9 [X86] Correct pattern for VSQRTSSr_Int, VSQRTSDr_Int, VRCPSSr_Int, and VRSQRTSSr_Int to not have an IMPLICIT_DEF on the first input. The semantics of the intrinsic are clear and not undefined.
The intrinsic takes one argument; the lower bits are affected by the operation and the upper bits should be passed through. The instruction itself takes two operands; the high bits of the first operand are passed through and the low bits of the second operand are modified by the operation. To match this to the intrinsic we should pass the single intrinsic input to both operands.

I had to remove the stack folding test for these instructions since they depended on the incorrect behavior. The same register is now used for both inputs so the load can't be folded.

llvm-svn: 288779
2016-12-06 08:07:58 +00:00
Craig Topper 6413f8a8f2 [X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead
Summary:
This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently.

I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel.

I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available.

Reviewers: spatel, delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27401

llvm-svn: 288771
2016-12-06 04:58:39 +00:00
Michael Kuperstein e3036abcf9 [X86] Fix non-intrinsic roundss/roundsd to not read the destination register
This changes the scalar non-intrinsic non-avx roundss/sd instruction
definitions not to read their destination register - allowing partial dependency
breaking.

This fixes PR31143.

Differential Revision: https://reviews.llvm.org/D27323

llvm-svn: 288703
2016-12-05 20:57:37 +00:00
Adrian Prantl 941fa7588b [DIExpression] Introduce a dedicated DW_OP_LLVM_fragment operation
so we can stop using DW_OP_bit_piece with the wrong semantics.

The entire back story can be found here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html

The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's
offset field to mean the offset into the source variable rather than
the offset into the location at the top of the DWARF expression stack. In
order to be able to fix this in a subsequent patch, this patch
introduces a dedicated DW_OP_LLVM_fragment operation with the
semantics that we used to apply to DW_OP_bit_piece, which is what we
actually need while inside of LLVM. This patch is complete with a
bitcode upgrade for expressions using the old format. It does not yet
fix the DWARF backend to use DW_OP_bit_piece correctly.

Implementation note: We discussed several options for implementing
this, including reserving a dedicated field in DIExpression for the
fragment size and offset, but using a custom operator at the end of
the expression works just fine and is more efficient because we then
only pay for it when we need it.
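
For illustration, a hedged sketch (an IR fragment, not a complete module; !var stands in for the variable's DILocalVariable) of the new operator describing bits [0, 32) of a 64-bit variable that was split into two scalars:

  ; the fragment operator goes at the end of the expression:
  ;   !DIExpression(DW_OP_LLVM_fragment, <offset in bits>, <size in bits>)
  call void @llvm.dbg.value(metadata i32 %lo, i64 0, metadata !var,
                            metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32))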

Differential Revision: https://reviews.llvm.org/D27361
rdar://problem/29335809

llvm-svn: 288683
2016-12-05 18:04:47 +00:00
Sanjay Patel 1f158d6955 [TargetLowering] add special-case for demanded bits analysis of 'not'
We treat bitwise 'not' as a special operation and try not to reduce its all-ones mask. 
Presumably, this is because a 'not' may be cheaper than a generic 'xor' or it may get
folded into another logic op if the target has those. However, if we can remove a logic
instruction by changing the xor's constant mask value, that should always be a win.

Note that the IR version of SimplifyDemandedBits() does not treat 'not' as a special-case
currently (although that's marked with a FIXME). So if you run this IR through -instcombine,
you should get the same end result. I'm hoping to add a different backend transform that 
will expose this problem though, so I need to solve this first.

Differential Revision: https://reviews.llvm.org/D27356

llvm-svn: 288676
2016-12-05 15:58:21 +00:00
Sanjay Patel f807f6a05f [x86] fold fand (fxor X, -1) Y --> fandn X, Y
I noticed this gap in the scalar FP-logic matching with:
D26712
and:
rL287171
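
For illustration, a minimal IR sketch (hypothetical function name) of the source-level pattern; whether the FAND/FXOR nodes form here depends on the surrounding FP-logic lowering, but this is the and-not-of-float-bits shape the fold targets:

  define float @andn_f32(float %x, float %y) {
    %bx = bitcast float %x to i32
    %by = bitcast float %y to i32
    %notx = xor i32 %bx, -1        ; becomes fxor X, -1 as FP logic
    %r = and i32 %notx, %by        ; fand (fxor X, -1), Y -> fandn X, Y
    %f = bitcast i32 %r to float
    ret float %f
  }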

Differential Revision: https://reviews.llvm.org/D27385

llvm-svn: 288675
2016-12-05 15:45:27 +00:00
Simon Pilgrim b08c98f125 [X86][SSE] Add support for combining target shuffles to UNPCKL/UNPCKH.
llvm-svn: 288663
2016-12-05 11:25:13 +00:00
Craig Topper db8467ae26 [AVX-512] Teach fast isel to handle 512-bit vector bitcasts.
llvm-svn: 288641
2016-12-05 05:50:51 +00:00
Craig Topper 7ef6ea324a [AVX-512] Teach fast isel to use masked compare and movss for handling scalar cmp and select sequence when AVX-512 is enabled. This matches the behavior of normal isel.
llvm-svn: 288636
2016-12-05 04:51:31 +00:00
Craig Topper 227d4279a8 [AVX-512] Add avx512f command lines to fast isel SSE select test.
Currently the fast isel code emits an avx1 instruction sequence even with avx512 enabled. This is different from normal isel. A follow-up commit will fix this.

llvm-svn: 288635
2016-12-05 04:51:28 +00:00
Simon Pilgrim 6133fc3aa2 [X86][XOP] Add target shuffle tests showing missing UNPCKL combine.
llvm-svn: 288628
2016-12-04 22:55:57 +00:00
Simon Pilgrim 38d245197e [X86][AVX512] Add target shuffle tests showing missing UNPCK combines.
llvm-svn: 288627
2016-12-04 22:54:21 +00:00
Matt Arsenault 92fede361f DAG: Fold out out-of-bounds insert_vector_elt
getNode already prevents formation of out of bounds constant
extract_vector_elts. Do the same for insert_vector_elt.
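
For illustration, a minimal IR sketch: an insertelement with a constant out-of-bounds index produces undef, so the resulting insert_vector_elt node can now be folded away instead of selected:

  define <4 x i32> @oob_insert(<4 x i32> %v, i32 %x) {
    ; index 7 is out of bounds for a 4-element vector
    %r = insertelement <4 x i32> %v, i32 %x, i32 7
    ret <4 x i32> %r
  }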

llvm-svn: 288603
2016-12-03 23:03:26 +00:00
Craig Topper 9d16bfa0f5 [AVX-512] Add many of the VPERM instructions to the load folding table. Move VPERMPDZri to the correct table.
llvm-svn: 288591
2016-12-03 19:37:39 +00:00
Craig Topper c210827b53 [AVX-512] Add EVEX VPMADDUBSW and VPMADDWD to the load folding tables.
llvm-svn: 288587
2016-12-03 17:19:15 +00:00
Craig Topper 8e7498976a [X86] Fix VEX encoded VPMADDUBSW to not be marked commutable.
This was accidentally broken in r285515 when we started lowering the intrinsic to an ISD node. Should fix PR31241.

llvm-svn: 288578
2016-12-03 05:35:44 +00:00
Craig Topper da73a09fcd [X86] Add test cases demonstrating where we incorrectly commute VEX VPMADDUBSW due to a bug introduced in r285515.
I believe this is the cause of PR31241.

llvm-svn: 288577
2016-12-03 05:35:38 +00:00
Sanjay Patel a5dbdf342b [x86] add common check prefix to reduce duplication; NFC
llvm-svn: 288522
2016-12-02 17:58:26 +00:00
Sanjay Patel c731187732 fix check-label
llvm-svn: 288517
2016-12-02 17:50:14 +00:00
Sanjay Patel 91d1ed5ee6 [x86] add tests to show missing demanded bits analysis; NFC
llvm-svn: 288515
2016-12-02 17:48:48 +00:00
Nicolai Haehnle 33ca182c91 [DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default
Summary:
When X = 0 and Y = inf, the original code produces inf, but the transformed
code produces nan. So this transform (and its relatives) should only be
used when the no-infs-fp-math flag is explicitly enabled.

Also disable the transform using fmad (intermediate rounding) when unsafe-math
is not enabled, since it can reduce the precision of the result; consider this
example with binary floating point numbers with two bits of mantissa:

  x = 1.01
  y = 111

  x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step)

  x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero)

The example relies on rounding towards zero at least in the second step.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578

Reviewers: RKSimon, tstellarAMD, spatel, arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26602

llvm-svn: 288506
2016-12-02 16:06:18 +00:00
Simon Pilgrim 3a19863f1c [X86][SSE] Renamed shuffle combine test.
We're trying to combine to vpunpckhbw not vpunpckhwd

llvm-svn: 288501
2016-12-02 14:43:39 +00:00
Simon Pilgrim cbf5f97018 [X86][SSE] Add support for extracting constant bit data from broadcasted constants
llvm-svn: 288499
2016-12-02 13:16:08 +00:00
Craig Topper 4961fa9bba [AVX-512] Add EVEX vpshuflw/vpshufhw/vpshufd instructions to load folding tables.
llvm-svn: 288484
2016-12-02 07:57:11 +00:00
Craig Topper 17ddb521ef [AVX-512] Add EVEX PSHUFB instructions to load folding tables.
llvm-svn: 288482
2016-12-02 07:06:30 +00:00
Craig Topper f7866fad54 [AVX-512] Add masked VINSERTF/VINSERTI instructions to load folding tables.
llvm-svn: 288481
2016-12-02 06:24:38 +00:00
Chih-Hung Hsieh 76b913c470 [SelectionDAG] getRawSubclassData should not return HasDebugValue.
This change fixes a regression in r279537 and
makes getRawSubclassData behave like r279536.
Without this change, the fp128-g.ll test case will have an
infinite loop involving SoftenFloatRes_LOAD.

Differential Revision: http://reviews.llvm.org/D26942

llvm-svn: 288420
2016-12-01 21:56:33 +00:00
Simon Pilgrim 5fe6236035 [X86][SSE] Classify AND bitmasks as variable shuffle masks
They are loading the bitmasks from the constant pool so the cost is similar to loading a shuffle mask.

llvm-svn: 288367
2016-12-01 16:00:14 +00:00
Simon Pilgrim 1e4d870999 [X86][SSE] Add support for combining AND bitmasks to shuffles.
llvm-svn: 288365
2016-12-01 15:41:40 +00:00
Simon Pilgrim 2cff28dd27 [X86][SSE] Tidied up filecheck prefixes for uitofp fast-math tests.
They should be in 'narrowing' order from common to more specific test prefixes.

llvm-svn: 288338
2016-12-01 14:56:48 +00:00
Simon Pilgrim 55066e5622 [X86][SSE] Add support for combining target shuffles to AND bitmasks.
llvm-svn: 288335
2016-12-01 13:47:02 +00:00
Simon Pilgrim 947650e99d [X86][SSE] Add support for combining ISD::AND with shuffles.
Attempts to convert an AND with a vector of 255 or 0 values into a shuffle (blend) mask.
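
For illustration, a minimal IR sketch (hypothetical function name): an AND whose mask lanes are all-ones or all-zeros is the same as blending with a zero vector, i.e. a shuffle:

  define <16 x i8> @and_as_blend(<16 x i8> %v) {
    ; keep the even bytes, zero the odd bytes; equivalent to
    ; shufflevector %v, zeroinitializer with mask <0, 16, 2, 16, ...>
    %r = and <16 x i8> %v, <i8 -1, i8 0, i8 -1, i8 0, i8 -1, i8 0, i8 -1, i8 0,
                            i8 -1, i8 0, i8 -1, i8 0, i8 -1, i8 0, i8 -1, i8 0>
    ret <16 x i8> %r
  }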

llvm-svn: 288333
2016-12-01 11:52:37 +00:00
Simon Pilgrim ed4ede0c29 [X86][SSE] Added tests showing missed combines of shuffles with ANDs.
llvm-svn: 288330
2016-12-01 11:26:07 +00:00
Kostya Serebryany b66cb88c2e revert r288283 as it causes debug info (line numbers) to be lost in instrumented code. also revert r288299 which was a workaround for the problem.
llvm-svn: 288300
2016-12-01 02:06:56 +00:00
Matthias Braun 39c3c89cdc MCStreamer: Use "cfi" for CFI related temp labels.
Choosing a "cfi" name makes the intend a bit clearer in an assembly dump
and more importantly the assembly dumps are slightly more stable as the
numbers don't move around anymore when unrelated code calls
createTempSymbol() more or less often.
As they are temp labels the name doesn't influence the generated object
code.

Differential Revision: https://reviews.llvm.org/D27244

llvm-svn: 288290
2016-11-30 23:48:26 +00:00
Paul Robinson 37a13ddb4b Recommit r288212: Emit 'no line' information for interesting 'orphan' instructions.
The LLDB tests are now ready for this patch.

DWARF specifies that "line 0" really means "no appropriate source
location" in the line table.  Use this for branch targets and some
other cases that have no specified source location, to prevent
inheriting unfortunate line numbers from physically preceding
instructions (which might be from completely unrelated source).

Differential Revision: http://reviews.llvm.org/D24180

llvm-svn: 288283
2016-11-30 22:49:55 +00:00
Simon Pilgrim 4ae3834792 [X86][SSE] Added tests showing missed combines of ANDs with shuffles.
llvm-svn: 288259
2016-11-30 18:15:10 +00:00
Simon Pilgrim 288c088c17 [X86][SSE] Add support for target shuffle constant folding
Initial support for target shuffle constant folding in cases where all shuffle inputs are constant. We may be able to relax this and merge shuffles with only some constant inputs in the future.

I've added the helper function getTargetConstantBitsFromNode (based off a similar function in X86ShuffleDecodeConstantPool.cpp) that could be reused for other cases requiring constant vector extraction.

Differential Revision: https://reviews.llvm.org/D27220

llvm-svn: 288250
2016-11-30 16:33:46 +00:00
Simon Pilgrim 6c38b7548d Updated test with -verify-machineinstrs to check for PR21931
llvm-svn: 288242
2016-11-30 13:21:12 +00:00
Simon Pilgrim b099d16516 [X86][SSE] Add tests demonstrating missed opportunities to combine 64-bit element unpacks with horizontal pair ops.
llvm-svn: 288240
2016-11-30 11:30:33 +00:00
Paul Robinson 957ba405e8 Revert r288212 due to lldb failure.
llvm-svn: 288216
2016-11-29 23:20:35 +00:00
Paul Robinson 96de8c778b Emit 'no line' information for interesting 'orphan' instructions.
DWARF specifies that "line 0" really means "no appropriate source
location" in the line table.  Use this for branch targets and some
other cases that have no specified source location, to prevent
inheriting unfortunate line numbers from physically preceding
instructions (which might be from completely unrelated source).

Differential Revision: http://reviews.llvm.org/D24180

llvm-svn: 288212
2016-11-29 22:41:16 +00:00
Simon Pilgrim 17062a2bf6 [X86][AVX512VL] Improved testing of vcvtpd2ps, vcvtpd2dq/vcvtpd2udq and vcvttpd2dq/vcvttpd2udq implicit zeroing of upper 64-bits of xmm result
Ensure that the masked instruction doesn't assume implicit zeroing.

llvm-svn: 288211
2016-11-29 22:38:30 +00:00
Simon Pilgrim 849473ae99 [X86][AVX512DQVL] Improved testing of vcvtqq2ps/vcvtuqq2ps implicit zeroing of upper 64-bits of xmm result
Ensure that the masked instruction doesn't assume implicit zeroing.

llvm-svn: 288209
2016-11-29 22:36:28 +00:00
Simon Pilgrim 35c47c494d [X86][SSE] Add initial support for combining target shuffles to (V)PMOVZX.
We can only handle 128-bit vectors until we support target shuffle inputs of different size to the output.

llvm-svn: 288140
2016-11-29 14:18:51 +00:00
Simon Pilgrim c17fb85090 [X86][SSE] Added tests showing missed combines to (V)PMOVZX
llvm-svn: 288136
2016-11-29 13:16:11 +00:00
Reid Kleckner c68a6c4ca9 Recognize ${:uid} escapes in intel syntax inline asm
It looks like this logic was duplicated long ago and the GCC side of
things has grown additional functionality. We need ${:uid} at least to
generate unique MS inline asm labels (PR23715), so expose these.

llvm-svn: 288092
2016-11-29 00:29:27 +00:00
Joerg Sonnenberger caaa82d90d Revert r287553: [CodeGenPrep] Skip merging empty case blocks
It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line
670 ("Expected irreducible CFG").

llvm-svn: 288052
2016-11-28 18:56:54 +00:00
Simon Pilgrim 2228f70a85 [X86][SSE] Add initial support for combining (V)PMOVZX with shuffles.
llvm-svn: 288049
2016-11-28 17:58:19 +00:00
Simon Pilgrim 3f10e66981 [X86][SSE] Added support for combining bit-shifts with shuffles.
Bit-shifts by a whole number of bytes can be represented as a shuffle mask suitable for combining.

Added a 'getFauxShuffleMask' function to allow us to create shuffle masks from other suitable operations.

llvm-svn: 288040
2016-11-28 16:25:01 +00:00
Simon Pilgrim 3def9e11e2 [X86][SSE] Added tests showing missed combines of shifts with shuffles.
llvm-svn: 288037
2016-11-28 15:50:39 +00:00
Craig Topper ff9d45875a [X86][FMA4] Add load folding support for FMA4 scalar intrinsic instructions.
llvm-svn: 288009
2016-11-27 21:37:00 +00:00
Craig Topper b00872b983 [X86][FMA4] Add test cases to demonstrate missed folding opportunities for FMA4 scalar intrinsics.
llvm-svn: 288008
2016-11-27 21:36:58 +00:00
Simon Pilgrim 91d6f5fbc1 [X86][SSE] Add support for combining target shuffles to 128/256-bit PSLL/PSRL bit shifts
llvm-svn: 288006
2016-11-27 21:08:19 +00:00
Craig Topper 4fab487265 [AVX-512] Add integer and fp unpck instructions to load folding tables.
llvm-svn: 288004
2016-11-27 19:51:41 +00:00
Simon Pilgrim 4571157d2d [X86][SSE] Added tests showing missed combines for shuffle to shifts.
llvm-svn: 288000
2016-11-27 18:25:02 +00:00
Craig Topper c3b3926f8b [AVX-512] Add masked EVEX vpmovzx/sx instructions to load folding tables.
llvm-svn: 287995
2016-11-27 08:55:31 +00:00
Craig Topper 991d1ca3ba [X86] Add a hasOneUse check to selectScalarSSELoad to keep the same load from being folded multiple times.
Summary: When selectScalarSSELoad is looking for a scalar_to_vector of a scalar load, it makes sure the load is only used by the scalar_to_vector, but it doesn't make sure the scalar_to_vector itself is only used once. This can cause the same load to be folded multiple times, which is bad for performance. It also causes the chain output to be duplicated but not connected to anything, so chain dependencies will not be satisfied.

Reviewers: RKSimon, zvi, delena, spatel

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D26790

llvm-svn: 287983
2016-11-26 17:29:25 +00:00
Craig Topper 10d5eec1a1 [AVX-512] Add unmasked EVEX vpmovzx/sx instructions to load folding tables.
llvm-svn: 287975
2016-11-26 08:21:52 +00:00
Craig Topper 97169ea5f9 [AVX-512] Add masked 128/256-bit integer add/sub instructions to load folding tables.
llvm-svn: 287974
2016-11-26 08:21:48 +00:00
Craig Topper 53b33de1e3 [AVX-512] Add masked 512-bit integer add/sub instructions to load folding tables.
llvm-svn: 287972
2016-11-26 07:21:00 +00:00
Craig Topper 6677bb4e50 [AVX-512] Teach LowerFormalArguments to use the extended register class when available. Fix the avx512vl stack folding tests to clobber more registers or otherwise they use xmm16 after this change.
llvm-svn: 287971
2016-11-26 07:20:57 +00:00
Craig Topper 39265bb1ce [AVX-512] Add VLX versions of VDIVPD/PS and VMULPD/PS to load folding tables.
llvm-svn: 287970
2016-11-26 07:20:53 +00:00
Craig Topper 88071b37ab [AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when its feeding a vselect with 32-bit element size.
Summary:
Shuffle lowering may have widened the element size of an i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect, this can prevent us from selecting masked operations.

This patch detects this and changes the element size to match the vselect.

I don't handle changing integer to floating point or vice versa, as it's not clear whether it's better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now.

Reviewers: delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27087

llvm-svn: 287939
2016-11-25 16:48:05 +00:00
Craig Topper 1e48829747 [AVX-512] Add VPERMT2* and VPERMI2* instructions to load folding tables.
llvm-svn: 287937
2016-11-25 16:33:53 +00:00
Simon Pilgrim 84b6f26eca [X86][SSE] Added knownbits through bitcast test
llvm-svn: 287928
2016-11-25 15:07:15 +00:00
Simon Pilgrim 8424b03d00 [X86][SSE] Added v16i8 shuffle test case from PR31151
llvm-svn: 287919
2016-11-25 11:10:43 +00:00
Craig Topper d621e3a25b [X86] Modify two tests that passed undef to both sides of a vselect to instead pass unique values.
I'd like to teach DAG combine to remove vselects where both sides are identical and these tests were in the way of that.

llvm-svn: 287903
2016-11-24 21:48:50 +00:00
Craig Topper 00758090ca [AVX-512] Add tests demonstrating failure to generated masked instructions for VSHUFF32x4 and VSHUFI32x4 due to shuffle lowering widening elements.
llvm-svn: 287897
2016-11-24 18:24:46 +00:00
Simon Pilgrim 9c71e07276 [X86][SSE] Improve UINT_TO_FP v2i32 -> v2f64
Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarizing it (albeit still on the SIMD unit).

The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32.
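
For illustration, a minimal IR sketch of the conversion that is now vectorized:

  define <2 x double> @uitofp_v2i32(<2 x i32> %x) {
    %r = uitofp <2 x i32> %x to <2 x double>
    ret <2 x double> %r
  }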

Differential Revision: https://reviews.llvm.org/D26938

llvm-svn: 287886
2016-11-24 15:12:56 +00:00
Simon Pilgrim 841d7ca463 [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287882
2016-11-24 14:46:55 +00:00
Simon Pilgrim 7c26a6f9ef [X86][AVX512DQVL] Add awareness of vcvtqq2ps and vcvtuqq2ps implicit zeroing of upper 64-bits of xmm result
llvm-svn: 287878
2016-11-24 14:02:30 +00:00
Simon Pilgrim ab323ec411 [X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering
llvm-svn: 287877
2016-11-24 13:38:59 +00:00
Simon Pilgrim 1e7a846d5f [X86][AVX512DQVL] Add v2i64 -> v2f32 + zero codegen tests
llvm-svn: 287876
2016-11-24 13:26:51 +00:00
Nikolai Bozhenov 3a8d108b2b [x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B
The bug arises during register allocation on i686 for the
CMPXCHG8B instruction when a base pointer is needed. CMPXCHG8B
needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address,
and ESI is reserved as the base pointer. With such constraints, the only
way the register allocator can do its job successfully is when the
addressing mode of the instruction requires only one register. If that is
not the case, we emit an additional LEA instruction to compute the address.

It fixes PR28755.
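
For illustration, a minimal IR sketch (hypothetical function name) of the troublesome operation; on i686 an i64 cmpxchg lowers to CMPXCHG8B, which already consumes EAX, EBX, ECX and EDX:

  define i64 @cas64(i64* %p, i64 %old, i64 %new) {
    ; with ESI reserved as the base pointer, the memory operand below
    ; must be addressable through a single remaining register
    %pair = cmpxchg i64* %p, i64 %old, i64 %new seq_cst seq_cst
    %val = extractvalue { i64, i1 } %pair, 0
    ret i64 %val
  }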

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25088

llvm-svn: 287875
2016-11-24 13:23:35 +00:00
Craig Topper f23b995f78 [AVX-512] Fix some mask shuffle tests to actually test the case they were supposed to test.
llvm-svn: 287854
2016-11-24 05:36:50 +00:00
Craig Topper 993c7416d3 [AVX-512] Move a 16 x float shuffle test to the v16 test file and add an integer variant.
llvm-svn: 287853
2016-11-24 05:36:47 +00:00
Simon Pilgrim 3ce6a545c7 [X86][SSE] Add awareness of (v)cvtpd2dq and vcvtpd2udq implicit zeroing of upper 64-bits of xmm result
We've already added the equivalent for (v)cvttpd2dq (rL284459) and vcvttpd2udq

llvm-svn: 287835
2016-11-23 22:35:06 +00:00
Simon Pilgrim eda1193456 [X86][AVX512VL] Add v2f64 -> v2i32/v2f32 + zero codegen tests
llvm-svn: 287821
2016-11-23 22:01:50 +00:00
Simon Pilgrim 9234ff26d9 [X86][SSE] Add v2i64 -> v2i32 + zero codegen test
llvm-svn: 287813
2016-11-23 21:19:57 +00:00
Michael Kuperstein 47eb85a003 [X86] Allow folding of stack reloads when loading a subreg of the spilled reg
We did not support subregs in InlineSpiller::foldMemoryOperand() because targets
may not deal with them correctly.

This adds a target hook to let the spiller know that a target can handle
subregs, and actually enables it for x86 for the case of stack slot reloads.
This fixes PR30832.

Differential Revision: https://reviews.llvm.org/D26521

llvm-svn: 287792
2016-11-23 18:33:49 +00:00
John Brawn 150addb45c [DAGCombiner] Fix infinite loop in vector mul/shl combining
We have the following DAGCombiner transformations:
 (mul (shl X, c1), c2) -> (mul X, c2 << c1)
 (mul (shl X, C), Y) -> (shl (mul X, Y), C)
 (shl (mul x, c1), c2) -> (mul x, c1 << c2)
Usually the constant shift is optimised by SelectionDAG::getNode when it is
constructed, by SelectionDAG::FoldConstantArithmetic, but when we're dealing
with vectors and one of those vector constants contains an undef element
FoldConstantArithmetic does not fold and we enter an infinite loop.

Fix this by making FoldConstantArithmetic use getNode to decide how to fold each
vector element, the same as FoldConstantVectorArithmetic does, and, rather than
adding the constant shift to the work list, only applying the transformation once
the shift has already been folded into a constant; if it hasn't, we would loop
endlessly. Additionally, add a missing NoOpaques check to one of those
transformations, which I noticed when writing the tests for this.
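
For illustration, a minimal IR sketch (hypothetical function name) of an input that could trigger the loop; the undef lane in the shift amount defeats FoldConstantArithmetic, so the two combines kept re-queuing each other's output:

  define <2 x i32> @mul_of_shl_undef(<2 x i32> %x) {
    %s = shl <2 x i32> %x, <i32 1, i32 undef>
    %m = mul <2 x i32> %s, <i32 3, i32 3>
    ret <2 x i32> %m
  }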

Differential Revision: https://reviews.llvm.org/D26605

llvm-svn: 287766
2016-11-23 16:05:51 +00:00
Simon Pilgrim 4e9b9cbee9 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Elena Demikhovsky 09375d98b8 Type legalization for compressstore and expandload intrinsics.
Implemented widening (v2f32) and splitting (v16f64).
On splitting, I use "popcnt" to calculate the memory increment.
More type legalization work will come in the next patches.
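
For illustration, a minimal IR sketch of a call whose v2f32 result type needs widening during legalization:

  declare <2 x float> @llvm.masked.expandload.v2f32(float*, <2 x i1>, <2 x float>)

  define <2 x float> @expandload_v2f32(float* %p, <2 x i1> %m, <2 x float> %pass) {
    %r = call <2 x float> @llvm.masked.expandload.v2f32(float* %p, <2 x i1> %m,
                                                        <2 x float> %pass)
    ret <2 x float> %r
  }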

llvm-svn: 287761
2016-11-23 13:58:24 +00:00
Craig Topper f57e17def0 [AVX-512] Remove intrinsics for valignd/q and autoupgrade them to native shuffles.
llvm-svn: 287744
2016-11-23 06:54:55 +00:00
Kuba Mracek 06995e866b [xray] Add XRay support for Mach-O in CodeGen
Currently, XRay only supports emitting the XRay table (xray_instr_map) on ELF binaries. Let's add Mach-O support.

Differential Revision: https://reviews.llvm.org/D26983

llvm-svn: 287734
2016-11-23 02:07:04 +00:00
Simon Pilgrim eda365cf80 [X86][AVX512DQ] Add fp <-> int tests for AVX512DQ/AVX512DQ+VL
llvm-svn: 287706
2016-11-22 22:04:50 +00:00
Simon Pilgrim 4aa876ca7c [X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles.
This occurs during UINT_TO_FP v2f64 lowering. 

We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle

llvm-svn: 287676
2016-11-22 17:50:06 +00:00
Simon Pilgrim d70c03ad68 Fix line endings
llvm-svn: 287638
2016-11-22 13:27:29 +00:00
Simon Pilgrim 72e43570b7 [SelectionDAG] ComputeNumSignBits of TRUNCATE operations
Add basic ComputeNumSignBits support for TRUNCATE ops for cases where the source's number of sign bits overlaps with the truncated size.

Improves X86 SIGN_EXTEND_IN_REG vector cases which were needlessly sign extending boolean vector results.
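
For illustration, a minimal IR sketch: each lane of the sign-extended compare is all sign bits, so the truncated lanes are still known to be all sign bits and the sign_extend_inreg becomes redundant:

  define <4 x i16> @trunc_all_signbits(<4 x i32> %a, <4 x i32> %b) {
    %c = icmp sgt <4 x i32> %a, %b       ; lanes are all-ones or all-zeros
    %s = sext <4 x i1> %c to <4 x i32>   ; 32 sign bits per lane
    %t = trunc <4 x i32> %s to <4 x i16> ; 16 sign bits remain per lane
    ret <4 x i16> %t
  }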

Differential Revision: https://reviews.llvm.org/D26851

llvm-svn: 287635
2016-11-22 11:29:19 +00:00
Craig Topper cada9f2275 [AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD).
Summary:
The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands is the one that can load from memory, so this can't be used to increase memory folding opportunities.

We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too.

Reviewers: igorb, delena, Ayal, Farhana, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25652

llvm-svn: 287621
2016-11-22 04:57:34 +00:00
Craig Topper da22267055 [AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type
Summary:
Shuffle lowering widens the element size of a shuffle if elements are contiguous. This sometimes helps because wider element types have more shuffle options. However, if the shuffle is one of the arguments to a vselect, this widening can introduce a bitcast between the vselect and the shuffle, which prevents isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size that matches the vselect type, we should change the shuffle type to allow masking.

This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018.

I plan to add support for more operations in future patches.

Reviewers: RKSimon, zvi, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26902

llvm-svn: 287612
2016-11-22 03:51:53 +00:00
Jun Bum Lim 82f55c5446 [CodeGenPrep] Skip merging empty case blocks
Summary: Merging an empty case block into the header block of a switch could cause
ISel to add COPY instructions in the header of the switch, instead of the case
block, if the case block is used as an incoming block of a PHI. This could
potentially increase the dynamic instruction count, especially when the switch is
in a loop. I added a test case which was reduced from the benchmark I was targeting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl

Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 287553
2016-11-21 16:47:28 +00:00
Simon Pilgrim 6704059a0d [X86][SSE] Add SSE reciprocal estimate tests
llvm-svn: 287543
2016-11-21 15:28:21 +00:00
Simon Pilgrim 49d7eda968 [SelectionDAG] Add ComputeNumSignBits support for CONCAT_VECTORS opcode
llvm-svn: 287541
2016-11-21 14:36:19 +00:00
Simon Pilgrim b7bbaa669b [X86][SSE] Allow PACKSS to be used to truncate any type of all/none sign bits input
At the moment we only use truncateVectorCompareWithPACKSS with direct vector comparison results (just one example of a known all/none signbits input).

This change relaxes the direct matching of a SETCC opcode by moving the logic up into SelectionDAG::ComputeNumSignBits and accepting any input with a known splatted signbit.

llvm-svn: 287535
2016-11-21 12:05:49 +00:00
Craig Topper 9f2d632ee7 [AVX-512] Add EVEX form of VMOVZPQILo2PQIZrm to load folding tables to match SSE and AVX.
llvm-svn: 287523
2016-11-21 07:51:31 +00:00
Simon Pilgrim 0878d46416 [X86][SSE] Add some initial combine tests that could (should?) use PACKSS
llvm-svn: 287504
2016-11-20 21:12:49 +00:00
Craig Topper 85a1f5c20c [AVX-512] Add tests for masked palignr/valignd/valignq shuffles, many of which show failures to fold the masking into the operation.
Many of these problems are because shuffle lowering widens element size and reduces element count when possible. This causes the shuffle to become separated from the select by a bitcast. Future patches will work to improve these cases by rewriting the shuffle back to a narrow element type if we think it can result in folding the mask.

llvm-svn: 287503
2016-11-20 19:50:32 +00:00
Simon Pilgrim 5fadce4a3f [X86][AVX512] Combine unary + zero target shuffles to VPERMV3 with a zero vector where possible
llvm-svn: 287497
2016-11-20 16:11:36 +00:00
Simon Pilgrim 5401bae523 [X86][AVX512] Add support for VBMI VPERMV3 target shuffle combines
llvm-svn: 287496
2016-11-20 15:24:38 +00:00
Simon Pilgrim 3f40412e0f [X86][AVX512] Add support for VBMI VPERMV target shuffle combines
llvm-svn: 287495
2016-11-20 15:05:45 +00:00
Simon Pilgrim 9e3f5cc015 [X86][AVX512] Add some initial VBMI target shuffle combine tests
llvm-svn: 287494
2016-11-20 14:45:46 +00:00
Simon Pilgrim 096b6d4f81 [X86][AVX512F] Add support for uint_to_fp v2i32 to v2f64 on AVX512F-only targets
Use 512-bit instructions (we already do something similar for uint_to_fp v4i32 to v4f64)

llvm-svn: 287491
2016-11-20 14:03:23 +00:00
Oren Ben Simhon c0f073b67f [X86] RegCall - Handling long double arguments
The change is part of RegCall calling convention support for LLVM.
Long double (f80) requires special treatment as the first f80 parameter is saved in FP0 (floating point stack).
This review presents the change and the corresponding tests.
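
For illustration, a minimal IR sketch (hypothetical function) of the case being handled; under the regcall convention the first x86_fp80 argument travels in FP0 on the x87 stack:

  define x86_regcallcc x86_fp80 @regcall_f80(x86_fp80 %a) {
    ret x86_fp80 %a
  }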

Differential Revision: https://reviews.llvm.org/D26151

llvm-svn: 287485
2016-11-20 11:06:07 +00:00
Simon Pilgrim a14e0cb852 [X86][SSE] Improve PSHUFB lowering from either input
Canonicalization may leave the zeroable vector in the first input.

llvm-svn: 287461
2016-11-19 20:41:48 +00:00
Simon Pilgrim 623a7c57b5 [X86][AVX512] Add VPERMV/VPERMV3 v64i8 byte shuffles on avx512vbmi targets
llvm-svn: 287459
2016-11-19 20:12:34 +00:00
Simon Pilgrim f011f7e160 [X86][AVX512] Add avx512vbmi tests
llvm-svn: 287447
2016-11-19 18:12:48 +00:00
Simon Pilgrim 28f1e0dab9 [X86][AVX512] Added some more complex v64i8 shuffles
llvm-svn: 287444
2016-11-19 17:50:14 +00:00
Simon Pilgrim e40900dddd [SelectionDAG] Add knownbits support for CONCAT_VECTORS opcode
llvm-svn: 287387
2016-11-18 22:21:22 +00:00
Simon Pilgrim 3a5328ecdd [X86] Add knownbits concat_vector test
Support coming in a future patch

llvm-svn: 287385
2016-11-18 21:59:38 +00:00
Simon Pilgrim 7bde5df5f0 [X86][AVX512] Split AVX512F/AVX512VL tests to demonstrate missed int2fp opportunities without AVX512VL
llvm-svn: 287348
2016-11-18 15:31:36 +00:00
Simon Pilgrim 3e5045e8f1 [X86][AVX2] Add v8i32->v8i64 mul test (PR30845)
llvm-svn: 287332
2016-11-18 11:00:36 +00:00
Craig Topper 02b5a1b50f [AVX-512] Replace masked 16-bit element variable shift intrinsics with new unmasked versions and selects.
The same thing was done to 32-bit and 64-bit element sizes previously.

This will allow us to support these shifts in InstCombineCalls along with the other variable shift intrinsics.

llvm-svn: 287312
2016-11-18 05:04:44 +00:00
Craig Topper 07f1c15995 [AVX-512] Support FCOPYSIGN for v16f32 and v8f64
Summary:
This extends FCOPYSIGN support to 512-bit vectors.

I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.

Reviewers: delena, zvi, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26791

llvm-svn: 287298
2016-11-18 02:25:34 +00:00
Simon Pilgrim 8eca5520dc [X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves
vXi64 multiplication is lowered into 3 calls of vpmuludq with the upper/lower 32-bit halves.

If any of these halves are zero then we can remove the individual calls. Although there was isBuildVectorAllZeros code to do this, I don't think it ever worked (except perhaps for constant-folded cases that no longer seem to be tested).

This requires additional X86ISD support for computeKnownBitsForTargetNode, so far I've just added support for X86ISD::VZEXT (VPMOVZX* - helping the AVX2+ cases).

Partial fix for PR30845
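
For illustration, a minimal IR sketch (hypothetical function name): both factors are zero-extended from i32, so their upper 32-bit halves are known zero and a single vpmuludq suffices:

  define <2 x i64> @mul_zext_zext(<2 x i32> %a, <2 x i32> %b) {
    %za = zext <2 x i32> %a to <2 x i64>
    %zb = zext <2 x i32> %b to <2 x i64>
    %m = mul <2 x i64> %za, %zb    ; two of the three vpmuludq steps drop out
    ret <2 x i64> %m
  }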

Differential Revision: https://reviews.llvm.org/D26590

llvm-svn: 287223
2016-11-17 12:14:49 +00:00
Oren Ben Simhon 489d6eff4f [X86] RegCall - Handling v64i1 in 32/64 bit target
Register Calling Convention defines a new behavior for v64i1 types.
This type should be saved in a GPR.
However, on a 32-bit machine we need to split the value into 2 GPRs (because each is 32 bits wide).

Differential Revision: https://reviews.llvm.org/D26181

llvm-svn: 287217
2016-11-17 09:59:40 +00:00
Sanjoy Das 4a8fe09040 [ImplicitNullCheck] Fix an edge case where we were hoisting incorrectly
ImplicitNullCheck keeps track of one instruction that the memory
operation depends on that it also hoists with the memory operation.
When hoisting this dependency, it would sometimes clobber a live-in
value to the basic block we were hoisting the two things out of.  Fix
this by explicitly looking for such dependencies.

I also noticed two redundant checks on `MO.isDef()` in IsMIOperandSafe.
They're redundant since register MachineOperands are either Defs or Uses
-- there is no third kind.  I'll change the checks to asserts in a later
commit.

llvm-svn: 287213
2016-11-17 07:29:40 +00:00
Craig Topper dfaf9201cb [X86] Add a test case where, due to a bug in selectScalarSSELoad, we fold the same load twice.
llvm-svn: 287210
2016-11-17 05:37:39 +00:00
Sanjay Patel 066139a3ec [x86] allow FP-logic ops when one operand is FP and result is FP
We save an inter-register file move this way. If there's any CPU where
the FP logic is slower, we could transform this back to int-logic in 
MachineCombiner.

This helps, but doesn't solve, PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137

The 'andn' test shows that we're missing a pattern match to
recognize the xor with -1 constant as a 'not' op.

llvm-svn: 287171
2016-11-16 22:34:05 +00:00
Sanjoy Das df4b162e4d [ImplicitNullChecks] Do not handle call MachineInstrs
We don't track callee clobbered registers correctly, so avoid hoisting
across calls.

Note: for this bug to trigger we need a `readonly` call target, since we
already have logic that avoids hoisting across potentially storing
instructions.

llvm-svn: 287159
2016-11-16 21:45:22 +00:00
Sanjay Patel 7f3d51f840 [x86] add fake scalar FP logic instructions to ReplaceableInstrs to save some bytes
We can replace "scalar" FP-bitwise-logic with other forms of bitwise-logic instructions. 
Scalar SSE/AVX FP-logic instructions only exist in your imagination and/or the bowels of 
compilers, but logically equivalent int, float, and double variants of bitwise-logic 
instructions are reality in x86, and the float variant may be a shorter instruction 
depending on which flavor (SSE or AVX) of vector ISA you have...so just prefer float all 
the time.

This is a preliminary step towards solving PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137

Differential Revision:
https://reviews.llvm.org/D26712

llvm-svn: 287122
2016-11-16 17:42:40 +00:00
Simon Pilgrim 79416ea76a [X86] Add integer division test for PR23590
Shows missed opportunity to recognise reduced integer division result size

llvm-svn: 287110
2016-11-16 14:54:34 +00:00
Simon Pilgrim b57dd17142 [X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with generic IR
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen.

LLVM counterpart to D26686

Differential Revision: https://reviews.llvm.org/D26736

llvm-svn: 287108
2016-11-16 14:48:32 +00:00
Simon Pilgrim 9e355bc5bb [X86][AVX512] Added some mask/maskz tests for sitofp/uitofp i32 to f64
llvm-svn: 287106
2016-11-16 14:24:04 +00:00
Simon Pilgrim c223aa52b1 [X86] Regenerated integer divide tests to test on 32 and 64 bit targets
llvm-svn: 287104
2016-11-16 14:12:11 +00:00
Simon Pilgrim dd8c71c646 [X86][SSE] Added PSUBUS from SELECT tests from D25987
llvm-svn: 287103
2016-11-16 13:59:03 +00:00
Ayman Musa 4d60243bfd [X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics.
Differential Revision: https://reviews.llvm.org/D26128

llvm-svn: 287087
2016-11-16 09:00:28 +00:00
Craig Topper 6910fa0ef4 [X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul
Summary: These intrinsics have been unused by clang for a while. This patch removes them. We auto-upgrade them to extractelements, a scalar operation, and then an insertelement. This matches the sequence used by clang's intrinsic file.
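
For illustration, a sketch (hypothetical function name) of what the auto-upgrade emits for a call such as llvm.x86.sse.add.ss:

  define <4 x float> @upgraded_add_ss(<4 x float> %a, <4 x float> %b) {
    %ea = extractelement <4 x float> %a, i32 0
    %eb = extractelement <4 x float> %b, i32 0
    %s = fadd float %ea, %eb
    %r = insertelement <4 x float> %a, float %s, i32 0
    ret <4 x float> %r
  }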

Reviewers: zvi, delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26660

llvm-svn: 287083
2016-11-16 05:24:10 +00:00
Quentin Colombet fb9b0cdcfe [RegAllocGreedy] Record missed hint for late recoloring.
In https://reviews.llvm.org/D25347, Geoff noticed that we still have
useless copies that we can eliminate after register allocation. At the
time the allocation is chosen for those copies, they are not useless,
but, because of changes in the surrounding code, later on they might
become useless.
The Greedy allocator already has a mechanism to deal with such cases
via a late recoloring. However, we failed to record some of the
missed hints.

This commit fixes that.

llvm-svn: 287070
2016-11-16 01:07:12 +00:00
Sanjay Patel aaf430452b [x86] regenerate checks; NFC
llvm-svn: 287051
2016-11-15 23:09:53 +00:00