Commit Graph

3207 Commits

David Majnemer 596c8d76fc [X86] Use a better sentinel offset for the FrameAddr index
Other pieces of CodeGen want to negate frame object offsets to account
for architectures where the stack grows down.  Our object is a pseudo
object so its offset doesn't matter.  However, we shouldn't choose an
offset which results in undefined behavior if you negate it.

llvm-svn: 237474
2015-05-15 20:08:27 +00:00
Elena Demikhovsky d5b3e376d2 AVX-512: Added i1 type handling for calling conventions.
The i1 type is legal on AVX-512 and can be passed as a parameter or return value.
i1 is promoted to i8 on return and to i32 for call arguments (i8 is also promoted to i32 here).
The resulting code is similar to that of previous X86 targets, where i1 is always promoted to i8.
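
For illustration, a function like this hypothetical one now takes the new
promotion path: the i1 return value is promoted to i8, and the i1 arguments
are promoted to i32 at call sites:

  ; both i1 parameters are widened to i32 by the calling convention;
  ; the i1 result comes back as an i8
  define i1 @cmp_and(i1 %a, i1 %b) {
    %r = and i1 %a, %b
    ret i1 %r
  }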

llvm-svn: 237350
2015-05-14 09:04:45 +00:00
Chandler Carruth 942fba95e2 Revert r237175: [X86] Always return the sret parameter in eax/rax ...
This commit broke an x86 test and the bots have been broken for well
over an hour now so I'm just reverting.

llvm-svn: 237210
2015-05-12 23:34:27 +00:00
Reid Kleckner b465563b46 [X86] Always return the sret parameter in eax/rax, even on 32-bit
Summary:
This rule was always in the old SysV i386 ABI docs and the new ones that
H.J. Lu has put together, but we never noticed:

  EAX   scratch register; also used to return integer and pointer values
        from functions; also stores the address of a returned struct or union

Fixes PR23491.
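
As a sketch (not the committed test), a function returning a struct through
an sret pointer looks like this in IR; with this change, the i386 backend
also leaves that pointer in %eax on return:

  %struct.S = type { i32, i32, i32 }

  ; the caller allocates the return slot and passes its address;
  ; on i386 that address is now also returned in %eax
  define void @make_s(%struct.S* sret %out) {
    store %struct.S { i32 1, i32 2, i32 3 }, %struct.S* %out
    ret void
  }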

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9715

llvm-svn: 237175
2015-05-12 20:56:32 +00:00
Sanjay Patel 7713e6849d use 'auto' to improve readability; NFC
llvm-svn: 237144
2015-05-12 15:15:55 +00:00
Elena Demikhovsky fae20d3565 AVX-512, X86: Added lowering for shift operations for SKX.
The other changes in LowerShift() are not functional; they just make
the code more convenient.
So the functional changes affect SKX only.
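
A minimal sketch of the kind of IR affected, assuming a target with the SKX
features (e.g. llc -mattr=+avx512bw,+avx512vl), is a variable per-element
vector shift:

  ; v32i16 variable shifts only become legal with the AVX512BW feature
  define <32 x i16> @var_shl(<32 x i16> %a, <32 x i16> %b) {
    %r = shl <32 x i16> %a, %b
    ret <32 x i16> %r
  }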

llvm-svn: 237129
2015-05-12 13:25:46 +00:00
Andrea Di Biagio 454f7909c6 [X86] Remove useless target specific combine on TRUNCATE dag nodes.
Before revision 171146, function 'PerformTruncateCombine' used to perform
a premature lowering of TRUNCATE dag nodes.
Revision 171146 then moved all the logic implemented by PerformTruncateCombine
to a custom lowering hook. However, that revision forgot to delete
function PerformTruncateCombine from the code.

This patch removes function 'PerformTruncateCombine' since it has no effect
on the SelectionDAG. No functional change intended.

llvm-svn: 237122
2015-05-12 12:34:22 +00:00
Elena Demikhovsky c1ac5d7bd5 AVX-512: select operation for i1 vectors
like: select i1 %cond, <16 x i1> %a, <16 x i1> %b.
I added pseudo-CMOV patterns to resolve the "select".
Added tests for KNL and SKX.

llvm-svn: 237106
2015-05-12 09:36:52 +00:00
Michael Kuperstein 6f5ff6905c [X86] DAGCombine should not assume arbitrary vector types are simple
The X86-specific DAGCombine for stores should not assume vector types are always simple.
This fixes PR23476.
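
For context: a "simple" type is one with a corresponding MVT. A store of an
oddly sized vector such as the following hypothetical reduction (not the
exact PR23476 test) has no simple type and used to trip the assumption:

  ; <5 x i32> has no matching MVT, so it is not a simple value type
  define void @store_odd(<5 x i32> %v, <5 x i32>* %p) {
    store <5 x i32> %v, <5 x i32>* %p
    ret void
  }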

Differential Revision: http://reviews.llvm.org/D9659

llvm-svn: 237097
2015-05-12 07:33:07 +00:00
Eric Christopher 824f42f209 Migrate existing backends that care about software floating point
to use the information in the module rather than TargetOptions.

We've had the use-soft-float attribute for some time now, and clang has
used it, so have the backends set a subtarget feature based on a
particular function now that subtargets are created based on functions
and function attributes.

For the one middle-end soft-float check, go ahead and create an
overridable TargetLowering::useSoftFloat function that just checks
the TargetSubtargetInfo in all cases.

Also remove the command-line option that hard-codes whether or not
soft-float is set, by using the attribute for all of the
target-specific test cases; for the generic ones, just go ahead and
add the attribute in the one case that showed up.
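
The attribute in question is a string function attribute, so a test would
mark the function roughly like this:

  ; force soft-float lowering for this one function only
  define float @add_soft(float %a, float %b) #0 {
    %r = fadd float %a, %b
    ret float %r
  }

  attributes #0 = { "use-soft-float"="true" }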

llvm-svn: 237079
2015-05-12 01:26:05 +00:00
Pirama Arumuga Nainar af171e7720 [X86] Updates to X86 backend for f16 promotion
Summary:
r235215 adds support for f16 to be considered as a load/store type and
promotes f16 operations to f32.

This patch has miscellaneous fixes for the X86 backend so all f16
operations are handled:
1. Set the LoadExtAction for f16 vectors to Expand.
2. Handle FP_EXTEND in a switch statement when handling v2f32.
3. Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or
(store (SINT_TO_FP )) to a FILD.

Tests included.
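
A minimal sketch of the pattern that point 3 guards against (a hypothetical
reduction, not one of the included tests): the f16 load must be extended to
f32 before the conversion instead of being folded into an
FP_TO_INT*_IN_MEM node:

  ; the half load may not be folded into the int conversion
  define i32 @f16_to_int(half* %p) {
    %h = load half, half* %p
    %i = fptosi half %h to i32
    ret i32 %i
  }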

Reviewers: ab, srhines, delena

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9092

llvm-svn: 237004
2015-05-11 17:14:39 +00:00
Elena Demikhovsky 5b9ee1ba7e Fixed compilation warning, NFC.
llvm-svn: 236972
2015-05-11 06:23:41 +00:00
Elena Demikhovsky f40342d6a2 AVX-512: fixed UINT_TO_FP operation for 512-bit types.
llvm-svn: 236955
2015-05-10 14:23:52 +00:00
Elena Demikhovsky 75d1489326 AVX-512: fixed a bug in i1 vectors lowering
llvm-svn: 236947
2015-05-10 10:33:32 +00:00
Arnold Schwaighofer f54b73d681 ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not non-aliasing distinct objects
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail calling function two FixedStackObjects might refer to the
same location. Worse, 'immutable' fixed stack objects like function arguments are
not immutable and will be clobbered.

Change this so that a load from a FixedStackObject is not invariant in a
tail-calling function, and don't return a PseudoSourceValue for an
instruction in tail-calling functions when building the dependence graph,
so that we handle function arguments conservatively.
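
As a sketch of why this matters (hypothetical example): in a tail call, the
outgoing argument stores land in the caller's own incoming-argument slots,
so the "immutable" fixed stack objects holding %a and %b are overwritten
before the jump:

  ; the swapped arguments are written into the very stack slots
  ; that %a and %b were loaded from
  define void @caller(i32 %a, i32 %b) {
    tail call void @callee(i32 %b, i32 %a)
    ret void
  }

  declare void @callee(i32, i32)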

Fix for PR23459.

rdar://20740035

llvm-svn: 236916
2015-05-08 23:52:00 +00:00
Pat Gavlin cc0431d1c0 Extend the statepoint intrinsic to allow statepoints to be marked as transitions from GC-aware code to code that is not GC-aware.
This changes the shape of the statepoint intrinsic from:

  @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args)

to:

  @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args)

This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back.

In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation.

Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops.

Differential Revision: http://reviews.llvm.org/D9501

llvm-svn: 236888
2015-05-08 18:07:42 +00:00
Andrea Di Biagio 84e22b9096 [X86] Teach 'getTargetShuffleMask' how to look through ISD::WrapperRIP when decoding a PSHUFB mask.
The function 'getTargetShuffleMask' already knows how to deal with PSHUFB nodes
where the mask node is a load from constant pool, and the constant pool node
is wrapped by a X86ISD::Wrapper node. This patch extends that logic by teaching
it how to also look through X86ISD::WrapperRIP.

This helps function combineX86ShufflesRecursively to combine more shuffle
sequences containing PSHUFB nodes if we are in RIPRel PIC mode.

Before this change, llc (with -relocation-model=pic -march=x86-64) was unable
to decode a pshufb where the mask was loaded from a constant pool. For example,
the no-op shuffle from test 'x86-fold-pshufb.ll' was not folded into its
operand, so instead of generating a single 'movaps' the backend always
generated a sub-optimal 'movdqa + pshufb' sequence.

Added test x86-fold-pshufb.ll.

llvm-svn: 236863
2015-05-08 15:11:07 +00:00
Denis Protivensky 159a49e5d6 Fix gcc warning of different enum and non-enum types in ternary
Make '0' literal explicitly unsigned with '0u'.
This appeared after r236775.

llvm-svn: 236838
2015-05-08 12:21:03 +00:00
Matthias Braun d04893fa36 Change getTargetNodeName() to produce compiler warnings for missing cases, fix them
llvm-svn: 236775
2015-05-07 21:33:59 +00:00
Elena Demikhovsky 29792e9a80 AVX-512: Added all forms of FP compare instructions for KNL and SKX.
Added intrinsics for the instructions. The CC parameter of the intrinsics was changed from i8 to i32 according to the spec.

By Igor Breger (igor.breger@intel.com)

llvm-svn: 236714
2015-05-07 11:24:42 +00:00
NAKAMURA Takumi d7c0be9c42 Revert r236546, "propagate IR-level fast-math-flags to DAG nodes (NFC)"
It caused undefined behavior.

llvm-svn: 236600
2015-05-06 14:03:12 +00:00
Sanjay Patel 801caff64d propagate IR-level fast-math-flags to DAG nodes (NFC)
This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997

...which split the existing nsw / nuw / exact flags and FMF
into their own struct.

There are 2 structural changes here:

1. The main diff is that we're preparing to extend the optimization
flags to affect more than just binary SDNodes. E.g., IR intrinsics
( https://llvm.org/bugs/show_bug.cgi?id=21290 ) or non-binop nodes
that don't even exist in IR, such as FMA, FNEG, etc.

2. The other change is that we're actually copying the FP fast-math-flags
from the IR instructions to SDNodes. 
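
Concretely (a minimal sketch, not one of the commit's tests), the fast
flags on IR like this are now copied onto the resulting fadd SDNode, even
though nothing in the backend consumes them yet:

  ; 'fast' implies nnan, ninf, nsz, and arcp on this fadd
  define float @fadd_fast(float %x, float %y) {
    %r = fadd fast float %x, %y
    ret float %r
  }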

Differential Revision: http://reviews.llvm.org/D8900

llvm-svn: 236546
2015-05-05 21:40:38 +00:00
Elena Demikhovsky 60eb9db7bb AVX-512: added calling convention for i1 vectors in 32-bit mode.
Fixed some bugs in extend/truncate for AVX-512 target.
Removed VBROADCASTM (masked broadcast) node, since it is not used any more.

llvm-svn: 236420
2015-05-04 12:40:50 +00:00
Elena Demikhovsky 1b60ed7069 Masked gather and scatter intrinsics - enabled codegen for KNL.
llvm-svn: 236394
2015-05-03 07:12:25 +00:00
Simon Pilgrim d5e20306cc [SSE2] Minor tidyup of v16i8 SHL lowering. NFC.
Removed code that replicated the v8i16 'shift + mask' implementation, which is done more cleanly by making use of LowerScalarImmediateShift.

llvm-svn: 236388
2015-05-02 14:42:43 +00:00
Simon Pilgrim ecf5875bd5 [SSE] Fix for MUL v16i8 on pre-SSE41 targets (PR23369).
Sign extension of i8 to i16 was placing the unpacked bytes in the lower byte instead of the upper byte.

llvm-svn: 236209
2015-04-30 08:23:16 +00:00
Elena Demikhovsky a9f20495a2 fixed 80-chars; NFC
llvm-svn: 236093
2015-04-29 08:49:57 +00:00
Eric Christopher 0ba41a6841 Reuse a lookup in an assert.
llvm-svn: 236054
2015-04-28 22:38:35 +00:00
Sanjay Patel ba55804ea3 move IR-level optimization flags into their own struct
This is a preliminary step to using the IR-level floating-point fast-math-flags in the SDAG (D8900).

In this patch, we introduce the optimization flags as their own struct. As noted in the TODO comment, 
we should eventually share this data between the IR passes and the backend.

We also switch the existing nsw / nuw / exact bit functionality of the BinaryWithFlagsSDNode class to
use the new struct.

The tradeoff is that instead of using the free but limited space of SDNode's SubclassData, we add a
data member to the subclass. This means we don't have to repeat all of the get/set methods per flag,
but we're potentially adding size to all nodes of this subclass type.

In practice on 64-bit systems (measured on Linux and MacOS X), there is no size difference between an
SDNode and BinaryWithFlagsSDNode after this change: they're both 80 bytes. This means that we had at
least one free byte to play with due to struct alignment.

Differential Revision: http://reviews.llvm.org/D9325

llvm-svn: 235997
2015-04-28 16:39:12 +00:00
Elena Demikhovsky 1f7b3644d3 Fixed crash of variable shift inst on AVX2
https://llvm.org/bugs/show_bug.cgi?id=22955

llvm-svn: 235993
2015-04-28 14:46:35 +00:00
Sergey Dmitrouk 842a51bad8 Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"
[DebugInfo] Add debug locations to constant SD nodes

This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases the
choice is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything; instead they just check it
for SDNodes, ARM and AArch64, where it's easy to get incorrect locations on
constants.

This is not a complete fix, as FastISel contains a workaround for wrong
debug locations, which drops locations from instructions when processing
constants, but there isn't currently a way to use debug locations from
constants there, as llvm::Constant doesn't cache it (yet). That is a
somewhat different issue, though, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235989
2015-04-28 14:05:47 +00:00
Daniel Jasper 48e93f7181 Revert "[DebugInfo] Add debug locations to constant SD nodes"
This breaks a test:
http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870

llvm-svn: 235987
2015-04-28 13:38:35 +00:00
Sergey Dmitrouk adb4c69d5c [DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases the
choice is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything; instead they just check it
for SDNodes, ARM and AArch64, where it's easy to get incorrect locations on
constants.

This is not a complete fix, as FastISel contains a workaround for wrong
debug locations, which drops locations from instructions when processing
constants, but there isn't currently a way to use debug locations from
constants there, as llvm::Constant doesn't cache it (yet). That is a
somewhat different issue, though, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235977
2015-04-28 11:56:37 +00:00
Elena Demikhovsky a480ef5494 AVX-512: added calling conventions for i1 vectors.
Fixed bug: https://llvm.org/bugs/show_bug.cgi?id=20724

llvm-svn: 235889
2015-04-27 15:11:19 +00:00
Elena Demikhovsky d1084c5b3f AVX-512: Extend/Truncate operations for SKX,
SETCC for bit-vectors

llvm-svn: 235875
2015-04-27 12:57:59 +00:00
Simon Pilgrim 4f683c264a [X86][SSE] Add v16i8/v32i8 multiplication support
Patch to allow int8 vectors to be multiplied on the SSE unit instead of being scalarized.

The patch sign extends the i8 lanes to i16, uses the SSE2 pmullw multiplication instruction, then packs the lower byte from each result.
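
So IR like this hypothetical test should now select to an unpack + pmullw +
pack sequence instead of sixteen scalar multiplies:

  ; lowered via sign extension to i16 lanes and PMULLW on SSE2+
  define <16 x i8> @mul_v16i8(<16 x i8> %a, <16 x i8> %b) {
    %r = mul <16 x i8> %a, %b
    ret <16 x i8> %r
  }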

Differential Revision: http://reviews.llvm.org/D9115

llvm-svn: 235837
2015-04-27 07:55:46 +00:00
Sanjay Patel fe1365ac50 [x86] allow 64-bit extracted vector element integer stores on a 32-bit system
With SSE2, we can generate a 'movq' or other 64-bit store op on a 32-bit system
even though 64-bit integers are not legal types.

So instead of producing this:

  pshufd	$229, %xmm0, %xmm1      ## xmm1 = xmm0[1,1,2,3]
  movd	%xmm0, (%eax)
  movd	%xmm1, 4(%eax)

We can do:

  movq %xmm0, (%eax)

This is a fix for the problem noted in D7296.
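
The kind of IR that reaches this path looks roughly like this hypothetical
reduction, compiled for a 32-bit target with SSE2:

  ; i64 is illegal on x86-32, but the extracted element can still
  ; be stored with a single 64-bit movq
  define void @store_lo64(<2 x i64> %v, i64* %p) {
    %e = extractelement <2 x i64> %v, i32 0
    store i64 %e, i64* %p
    ret void
  }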

Differential Revision: http://reviews.llvm.org/D9134

llvm-svn: 235460
2015-04-22 00:24:30 +00:00
Matthias Braun 9e9e8b3230 X86: Match for X86ISD nodes in LowerBUILD_VECTOR instead of BUILD_VECTORCombine
There doesn't seem to be a reason to perform this target ISD node matching
in a DAGCombine; moving it to lowering fixes PR23296.

Differential Revision: http://reviews.llvm.org/D9137

llvm-svn: 235394
2015-04-21 17:21:36 +00:00
Elena Demikhovsky 50b88ddb87 AVX-512: Added logical and arithmetic instructions for SKX
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 235375
2015-04-21 10:27:40 +00:00
Matthias Braun b6b5aaad98 X86: Do not select X86 custom vector nodes if operand types don't match
X86ISD::ADDSUB, X86ISD::(F)HADD, X86ISD::(F)HSUB should not be selected
if the operand types do not match the result type because vector type
legalization cannot deal with this for custom nodes.

Testcase X86ISD::ADDSUB is attached. I could not create a testcase for
the FHADD/FHSUB cases because of: https://llvm.org/bugs/show_bug.cgi?id=23296

Differential Revision: http://reviews.llvm.org/D9120

llvm-svn: 235367
2015-04-21 01:13:41 +00:00
Simon Pilgrim 749953eebb [X86][SSE] Fix for getScalarValueForVectorElement to detect scalar sources requiring truncation.
The fix ensures that scalar sources inserted into a vector are the correct bit size.

Integer scalar sources from BUILD_VECTOR and SCALAR_TO_VECTOR nodes may require truncation that this function doesn't currently support.

llvm-svn: 235281
2015-04-19 22:16:49 +00:00
Sanjay Patel 7024b8121a [x86] Implement combineRepeatedFPDivisors
Set the transform bar at 2 divisions because the fastest current
x86 FP divider circuit is in SandyBridge / Haswell at 10 cycle
latency (best case) relative to a 5 cycle multiplier. 
So that's the worst case for this transform (no latency win), 
but multiplies are obviously pipelined while divisions are not,
so there's still a big throughput win which we would expect to
show up in typical FP code.

These are the sequences I'm comparing:

  divss   %xmm2, %xmm0
  mulss   %xmm1, %xmm0
  divss   %xmm2, %xmm0

Becomes:

  movss   LCPI0_0(%rip), %xmm3    ## xmm3 = mem[0],zero,zero,zero
  divss   %xmm2, %xmm3
  mulss   %xmm3, %xmm0
  mulss   %xmm1, %xmm0
  mulss   %xmm3, %xmm0

[Ignore for the moment that we don't optimize the chain of 3 multiplies
into 2 independent fmuls followed by 1 dependent fmul...this is the DAG
version of: https://llvm.org/bugs/show_bug.cgi?id=21768 ...if we fix that,
then the transform becomes even more profitable on all targets.]
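
The IR input corresponding to the sequences above would be roughly the
following, assuming the transform is allowed (e.g. under llc
-enable-unsafe-fp-math):

  ; two divisions by the same value %z, with a multiply in between
  define float @two_divs(float %x, float %y, float %z) {
    %d0 = fdiv float %x, %z
    %m = fmul float %d0, %y
    %d1 = fdiv float %m, %z
    ret float %d1
  }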

Differential Revision: http://reviews.llvm.org/D8941

llvm-svn: 235012
2015-04-15 15:22:55 +00:00
Krzysztof Parzyszek a46c36b8f4 Allow memory intrinsics to be tail calls
llvm-svn: 234764
2015-04-13 17:16:45 +00:00
Benjamin Kramer 619c4e57ba Reduce dyn_cast<> to isa<> or cast<> where possible.
No functional change intended.

llvm-svn: 234586
2015-04-10 11:24:51 +00:00
Simon Pilgrim 49ba9b8e2f [X86][SSE] Use (V)PINSRB for direct byte insertion in 16i8 buildvector on SSE4.1 targets
This patch allows SSE4.1 targets to use (V)PINSRB to create v16i8 vectors by inserting i8 scalars directly into an XMM register instead of merging pairs of i8 scalars into an i16 and using the SSE2 PINSRW instruction.

This allows folding of byte loads and reduces scalar register usage as well.
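
For example, building a v16i8 from scalars (a hypothetical reduction) can
now use one (V)PINSRB per element rather than PINSRW on merged i16 pairs:

  ; each i8 is inserted directly into the XMM register on SSE4.1
  define <16 x i8> @build_v16i8(i8 %a, i8 %b) {
    %v0 = insertelement <16 x i8> undef, i8 %a, i32 0
    %v1 = insertelement <16 x i8> %v0, i8 %b, i32 1
    ret <16 x i8> %v1
  }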

Differential Revision: http://reviews.llvm.org/D8839

llvm-svn: 234193
2015-04-06 18:39:00 +00:00
David Majnemer 3337064a47 [WinEH] Sink UnwindHelp completely out of IR
We don't need to represent UnwindHelp in IR.  Instead, we can use the
knowledge that we are emitting the parent function to decide if we
should create the UnwindHelp stack object.

llvm-svn: 234061
2015-04-03 22:32:26 +00:00
Sanjay Patel eca590ffb3 [AVX] Improve insertion of i8 or i16 into low element of 256-bit zero vector
Without this patch, we split the 256-bit vector into halves and produced something like:
	movzwl	(%rdi), %eax
	vmovd	%eax, %xmm0
	vxorps	%xmm1, %xmm1, %xmm1
	vblendps	$15, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]

Now, we eliminate the xor and blend because those zeros are free with the vmovd:
        movzwl  (%rdi), %eax
        vmovd   %eax, %xmm0

This should be the final fix needed to resolve PR22685:
https://llvm.org/bugs/show_bug.cgi?id=22685

llvm-svn: 233941
2015-04-02 20:21:52 +00:00
David Majnemer a225a19dd0 [WinEH] Generate .xdata for catch handlers
This lets us catch exceptions in simple cases.

N.B. Things that do not work include (but are not limited to):
- Throwing from within a catch handler.
- Catching an object with a named catch parameter.
- 'CatchHigh' is fictitious, we aren't sure of its purpose.
- We aren't entirely efficient with regards to the number of EH states
  that we generate.
- IP-to-State tables are sensitive to the order of emission.

llvm-svn: 233767
2015-03-31 22:35:44 +00:00
Sanjay Patel 2ae9943881 [X86, AVX] try to lowerVectorShuffleAsElementInsertion() for all 256-bit vector sub-types
I suggested this change in D7898 (http://llvm.org/viewvc/llvm-project?view=revision&revision=231354)

It improves the v4i64 case, although not optimally. This AVX codegen:

  vmovq {{.*#+}} xmm0 = mem[0],zero
  vxorpd %ymm1, %ymm1, %ymm1
  vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]

Becomes:

  vmovsd {{.*#+}} xmm0 = mem[0],zero

Unfortunately, this doesn't completely solve PR22685. There are still at least two problems here:

    We're not handling v32i8 / v16i16.
    We're not getting the FP / int domains right for instruction selection.

But since this patch alone appears to do no harm, reduces code duplication, and helps v4i64, 
I'm submitting this patch ahead of fixing the above.

Differential Revision: http://reviews.llvm.org/D8341

llvm-svn: 233704
2015-03-31 16:32:11 +00:00
Sanjay Patel bbe1756a8c more space; NFC
llvm-svn: 233554
2015-03-30 15:31:32 +00:00