Commit Graph

4734 Commits

Author SHA1 Message Date
Juergen Ributzka 5ad463f55e Revert "[FastIsel][X86] Add support for lowering the first 8 floating-point arguments."
Reverting it because it breaks several tests.

llvm-svn: 210810
2014-06-12 19:21:43 +00:00
Tom Stellard 7783b0adf4 Revert "SelectionDAG: Enable (and (setcc x), (setcc y)) -> (setcc (and x, y)) for vectors"
This reverts commit r210540, adds a testcase for the regression it
caused, and marks the R600 test it was supposed to fix as XFAIL.

llvm-svn: 210792
2014-06-12 16:04:47 +00:00
Andrea Di Biagio 972ff97f8c [X86] Teach how to combine AVX and AVX2 horizontal binop on packed 256-bit vectors.
This patch adds target combine rules to match:
 - [AVX] Horizontal add/sub of packed single/double precision floating point
   values from 256-bit vectors;
 - [AVX2] Horizontal add/sub of packed integer values from 256-bit vectors.

llvm-svn: 210761
2014-06-12 10:53:48 +00:00
Juergen Ributzka b43a559514 [FastISel][x86] Add testcase for r210719.
llvm-svn: 210746
2014-06-12 03:54:05 +00:00
Juergen Ributzka 7eac929609 [x86] Improve frameaddress test from r210709.
llvm-svn: 210743
2014-06-12 03:29:29 +00:00
Juergen Ributzka 04558dc77a [FastISel] Add support for the stackmap intrinsic.
This implements target-independent FastISel lowering for the stackmap intrinsic.

llvm-svn: 210742
2014-06-12 03:29:26 +00:00
Juergen Ributzka 272b570a80 [FastISel][X86] Add support for the sqrt intrinsic.
llvm-svn: 210720
2014-06-11 23:11:02 +00:00
Juergen Ributzka 4dc958777c [FastISel][X86] Add support for the frameaddress intrinsic.
llvm-svn: 210709
2014-06-11 21:44:44 +00:00
Cameron McInally 5d1b7b94e4 Add AVX512 masked leadz instrinsic support.
llvm-svn: 210652
2014-06-11 12:54:45 +00:00
Andrea Di Biagio c7af75f9a7 [X86] Refactor the logic to select horizontal adds/subs to a helper function.
This patch moves part of the logic implemented by the target specific
combine rules added at r210477 to a separate helper function.
This should make easier to add more rules for matching AVX/AVX2 horizontal
adds/subs.

This patch also fixes a problem caused by a wrong check performed on indices
of extract_vector_elt dag nodes in input to the scalar adds/subs.

New tests have been added to verify that we correctly check indices of
extract_vector_elt dag nodes when selecting a horizontal operation.

llvm-svn: 210644
2014-06-11 07:57:50 +00:00
Juergen Ributzka 2dace6e54b [FastISel][X86] Extend support for {s|u}{add|sub|mul}.with.overflow intrinsics.
llvm-svn: 210610
2014-06-10 23:52:44 +00:00
Andrea Di Biagio fa508af0fe [X86] Improved target combine rules for selecting horizontal add/sub.
This patch slightly changes the algorithm introduced at revision 210477
to fix a problem where the algorithm was producing incorrect code for 
the VEX.256 encoded versions of horizontal add/sub.

For these cases, we now try to split the two 256-bit vectors into
128-bit chunks before emitting horizontal add/sub dag nodes.

Added a new test case into haddsub-2.ll.

llvm-svn: 210545
2014-06-10 16:42:57 +00:00
Adam Nemet 7f62b23e92 [X86] AVX512: Add vmovntdqa
Along with the corresponding intrinsic and tests.

llvm-svn: 210543
2014-06-10 16:39:53 +00:00
Tim Northover 7b9f86da5d Revert "X86: elide comparisons after cmpxchg instructions."
This reverts commit r210523. It was committed prematurely without waiting for
review.

llvm-svn: 210524
2014-06-10 10:50:11 +00:00
Tim Northover 84ad29ca1f X86: elide comparisons after cmpxchg instructions.
The C++ and C semantics of the compare_and_swap operations actually
require us to return a boolean "success" value. In LLVM terms this
means a second comparison of the output of "cmpxchg" against the input
desired value.

However, x86's "cmpxchg" instruction sets all flags for the comparison
formed, so we can skip any secondary comparison. (N.b. this isn't true
for cmpxchg8b/16b, which only set ZF).

rdar://problem/13201607

llvm-svn: 210523
2014-06-10 10:49:07 +00:00
Alp Toker d3d017cf00 Reduce verbiage of lit.local.cfg files
We can just split targets_to_build in one place and make it immutable.

llvm-svn: 210496
2014-06-09 22:42:55 +00:00
Andrea Di Biagio f99dd64f0a [X86] Add target combine rules for horizontal add/sub.
This patch adds new target specific combine rules to identify horizontal
add/sub idioms from BUILD_VECTOR dag nodes.

This patch also teaches the DAGCombiner how to canonicalize sequences of
insert_vector_elt dag nodes according to the following rule:

  (insert_vector_elt (insert_vector_elt A, I0), I1) ->
    (insert_vecto_elt (insert_vector_elt A, I1), I0)

This new canonicalization rule only triggers if the inner insert_vector
dag node has exactly one use; also, both indices must be known constants,
and I1 < I0.
This last rule made it possible to write a simpler algorithm to identify
horizontal add/sub patterns because now we don't have to worry about the
ordering of insert_vector_elt dag nodes.

llvm-svn: 210477
2014-06-09 16:54:41 +00:00
NAKAMURA Takumi 0ecb6e77f3 llvm/test/CodeGen/X86/2014-05-29-factorial.ll: Relax an expression to match Win32 x64.
llvm-svn: 210471
2014-06-09 14:20:23 +00:00
Andrea Di Biagio dfbdc71ea1 [X86] Avoid emitting unnecessary test instructions.
This patch teaches the backend how to check for the 'NoSignedWrap' flag on
binary operations to improve the emission of 'test' instructions.

If the result of a binary operation is known not to overflow we know that
resetting the Overflow flag is unnecessary and so we can avoid emitting
the test instruction.

Patch by Marcello Maggioni.

llvm-svn: 210468
2014-06-09 12:34:50 +00:00
Andrea Di Biagio 4db1abea15 [DAG] Expose NoSignedWrap, NoUnsignedWrap and Exact flags to SelectionDAG.
This patch modifies SelectionDAGBuilder to construct SDNodes with associated
NoSignedWrap, NoUnsignedWrap and Exact flags coming from IR BinaryOperator
instructions.

Added a new SDNode type called 'BinaryWithFlagsSDNode' to allow accessing
nsw/nuw/exact flags during codegen.

Patch by Marcello Maggioni.

llvm-svn: 210467
2014-06-09 12:32:53 +00:00
Alp Toker 5c53639492 Fix typos
llvm-svn: 210401
2014-06-07 21:23:09 +00:00
Benjamin Kramer d0700b2919 X86: Don't turn shifts into ands if there's another use that may not check for equality.
Fixes PR19964.

llvm-svn: 210371
2014-06-06 21:08:55 +00:00
Filipe Cabecinhas 5181255696 Fixed a bug in lowering shuffle_vectors to insertps
Summary:
We were being too strict and not accounting for undefs.
Added a test case and fixed another one where we improved codegen.

Reviewers: grosbach, nadav, delena

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4039

llvm-svn: 210361
2014-06-06 18:07:06 +00:00
Tom Roeder 44cb65fff1 Add a new attribute called 'jumptable' that creates jump-instruction tables for functions marked with this attribute.
It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables.

This also adds backend support for generating the jump-instruction tables on ARM and X86.
Note that since the jumptable attribute creates a second function pointer for a
function, any function marked with jumptable must also be marked with unnamed_addr.

llvm-svn: 210280
2014-06-05 19:29:43 +00:00
Eric Christopher dd240fd79c Revert r209381 as it isn't a local variable. Add a testcase so that
we know next time this happens.

llvm-svn: 210127
2014-06-03 21:01:39 +00:00
Rafael Espindola 64c1e18033 Allow alias to point to an arbitrary ConstantExpr.
This  patch changes GlobalAlias to point to an arbitrary ConstantExpr and it is
up to MC (or the system assembler) to decide if that expression is valid or not.

This reduces our ability to diagnose invalid uses and how early we can spot
them, but it also lets us do things like

@test5 = alias inttoptr(i32 sub (i32 ptrtoint (i32* @test2 to i32),
                                 i32 ptrtoint (i32* @bar to i32)) to i32*)

An important implication of this patch is that the notion of aliased global
doesn't exist any more. The alias has to encode the information needed to
access it in its metadata (linkage, visibility, type, etc).

Another consequence to notice is that getSection has to return a "const char *".
It could return a NullTerminatedStringRef if there was such a thing, but when
that was proposed the decision was to just uses "const char*" for that.

llvm-svn: 210062
2014-06-03 02:41:57 +00:00
Andrea Di Biagio 4760813831 [X86] Fix checked arithmetic for i8 on X86.
When lowering a ISD::BRCOND into a test+branch, make sure that we
always use the correct condition code to emit the test operation.

This fixes PR19858: "i8 checked mul is wrong on x86".

Patch by Keno Fisher!

llvm-svn: 210032
2014-06-02 16:00:27 +00:00
Filipe Cabecinhas 83f4192a47 Make blend tests more specific
Following the lead set by r209324, I'm making these tests match the whole
instruction, so we can be sure we're lowering them correctly.

llvm-svn: 209947
2014-05-31 00:52:23 +00:00
Andrea Di Biagio 446a527905 [X86] Add two combine rules to simplify dag nodes introduced during type legalization when promoting nodes with illegal vector type.
This patch teaches the backend how to simplify/canonicalize dag node
sequences normally introduced by the backend when promoting certain dag nodes
with illegal vector type.

This patch adds two new combine rules:
1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) ->
        (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)

2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) ->
        (shuffle (BINOP A, B), Undef, <Mask>).

Both rules are only triggered on the type-legalized DAG.
In particular, rule 1. is a target specific combine rule that attempts
to sink a bitconvert into the operands of a binary operation.
Rule 2. is a target independet rule that attempts to move a shuffle
immediately after a binary operation.

llvm-svn: 209930
2014-05-30 23:17:53 +00:00
Filipe Cabecinhas 82111f12fb Convert a vselect into a concat_vector if possible
Summary:
If both vector args to vselect are concat_vectors and the condition is
constant and picks half a vector from each argument, convert the vselect
into a concat_vectors.

Added a test.

The ConvertSelectToConcatVector is assuming it doesn't get vselects with
arguments of, for example, <undef, undef, true, true>. Those get taken
care of in the checks above its call.

Reviewers: nadav, delena, grosbach, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3916

llvm-svn: 209929
2014-05-30 23:03:11 +00:00
Filipe Cabecinhas d3aebaf875 Separate the check for blend shuffle_vector masks
Summary:
Separate the check for blend shuffle_vector masks into isBlendMask.
This function will also be used to check if a vector shuffle is legal. No
change in functionality was intended, but we ended up improving codegen on
two tests, which were being (more) optimized only if the resulting shuffle
was legal.

Reviewers: nadav, delena, andreadb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3964

llvm-svn: 209923
2014-05-30 21:31:21 +00:00
Rafael Espindola 92945eee80 [pr19636] Fix known bit computation in urem instruction with power of two.
Patch by Andrey Kuharev.

llvm-svn: 209902
2014-05-30 15:00:45 +00:00
Adam Nemet b3587e98b7 [X86] Move test from r209863 to CodeGen/X86
We should only run this if X86 is in the targets.

llvm-svn: 209866
2014-05-29 23:52:53 +00:00
Adam Nemet 35b80eaef1 [X86] Remove AVX1 vbroadcast intrinsics
The corresponding CFE patch replaces these intrinsics with vector initializers
in avxintrin.h.  This patch removes the LLVM intrinsics from the backend.

We now stop lowering at X86ISD::VBROADCAST custom node rather than lowering
that further to the intrinsics.

The patch only changes VBROADCASTS* and leaves VBROADCAST[FI]128 to continue
to use intrinsics.  As explained in the CFE patch, the reason is that we
currently don't generate as good code for them without the intrinsics.

CodeGen/X86/avx-vbroadcast.ll already provides coverage for this change.  It
checks that for a series of insertelements we generate the appropriate
vbroadcast instruction.

Also verified that there was no assembly change in the test-suite before and
after this patch.

llvm-svn: 209864
2014-05-29 23:35:36 +00:00
Filipe Cabecinhas 229dc17610 Added tests for shufflevector lowering to blend instrs.
These tests ensure that a change I will propose in clang works as
expected.

Summary:
Added tests for the generation of blend+immediate instructions from a
shufflevector.
These tests were proposed along with a patch that was dropped. I'm
committing the tests anyway to protect against possible regressions in
codegen.

Reviewers: nadav, bkramer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3600

llvm-svn: 209853
2014-05-29 22:04:42 +00:00
Michael J. Spencer f375d80635 [x86] Fold extract_vector_elt of a load into the Load's address computation.
An address only use of an extract element of a load can be simplified to a
load. Without this the result of the extract element is spilled to the
stack so that an address is available.

llvm-svn: 209788
2014-05-29 01:42:45 +00:00
Rafael Espindola 59f7eba2b5 [pr19844] Add thread local mode to aliases.
This matches gcc's behavior. It also seems natural given that aliases
contain other properties that govern how it is accessed (linkage,
visibility, dll storage).

Clang still has to be updated to expose this feature to C.

llvm-svn: 209759
2014-05-28 18:15:43 +00:00
Filipe Cabecinhas 82ac07c283 Convert some X86 blendv* intrinsics into IR.
Summary:
Implemented an InstCombine transformation that takes a blendv* intrinsic
call and translates it into an IR select, if the mask is constant.

This will eventually get lowered into blends with immediates if possible,
or pblendvb (with an option to further optimize if we can transform the
pblendvb into a blend+immediate instruction, depending on the selector).
It will also enable optimizations by the IR passes, which give up on
sight of the intrinsic.

Both the transformation and the lowering of its result to asm got shiny
new tests.

The transformation is a bit convoluted because of blendvp[sd]'s
definition:

Its mask is a floating point value! This forces us to convert it and get
the highest bit. I suppose this happened because the mask has type
__m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin.

I will send an email to llvm-dev to discuss if we want to change this or
not.

Reviewers: grosbach, delena, nadav

Differential Revision: http://reviews.llvm.org/D3859

llvm-svn: 209643
2014-05-27 03:42:20 +00:00
Rafael Espindola f557536f56 Just check the entire string.
Thanks to David Blaikie for the suggestion.

llvm-svn: 209610
2014-05-26 04:08:51 +00:00
Rafael Espindola 4a04c4b69c Emit data or code export directives based on the type.
Currently we look at the Aliasee to decide what type of export
directive to use. It seems better to use the type of the alias
directly. This is similar to how we handle the alias having the
same address but other attributes (linkage, visibility) from the
aliasee.

With this patch it is now possible to do things like

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc"
@foo = global [6 x i8] c"\B8*\00\00\00\C3", section ".text", align 16
@f = dllexport alias i32 (), [6 x i8]* @foo
!llvm.module.flags = !{!0}
!0 = metadata !{i32 6, metadata !"Linker Options", metadata !1}
!1 = metadata !{metadata !2, metadata !3}
!2 = metadata !{metadata !"/DEFAULTLIB:libcmt.lib"}
!3 = metadata !{metadata !"/DEFAULTLIB:oldnames.lib"}

llvm-svn: 209600
2014-05-25 12:49:07 +00:00
Rafael Espindola d234b0f08c Make these CHECKs a bit more strict.
The " at the end of the line makes sure we matched the entire directive.

llvm-svn: 209599
2014-05-25 12:43:13 +00:00
Rafael Espindola a5bb2f61cf Use alias linkage and visibility to decide tls access mode.
This matches both what we do for the non-thread case and what gcc does.

With this patch clang would match gcc's behaviour in

static __thread int a = 42;
extern __thread int b __attribute__((alias("a")));
int *f(void) { return &a; }
int *g(void) { return &b; }

if not for pr19843. Manually writing the IL does produce the same access modes.

It is also a step in the direction of fixing pr19844.

llvm-svn: 209543
2014-05-23 19:16:56 +00:00
Nico Rieck fb4c55fb64 Remove unused CHECK lines
llvm-svn: 209539
2014-05-23 19:06:44 +00:00
Rafael Espindola fbe44200ce Convert test to use FileCheck.
llvm-svn: 209528
2014-05-23 16:51:13 +00:00
Andrea Di Biagio c8dd1ad85b [X86] Improve the lowering of BITCAST from MVT::f64 to MVT::v4i16/MVT::v8i8.
This patch teaches the x86 backend how to efficiently lower ISD::BITCAST dag
nodes from MVT::f64 to MVT::v4i16 (and vice versa), and from MVT::f64 to
MVT::v8i8 (and vice versa).

This patch extends the logic from revision 208107 to also handle MVT::v4i16
and MVT::v8i8. Also, this patch correctly propagates Undef values when
performing the widening of a vector (example: when widening from v2i32 to
v4i32, the upper 64bits of the resulting vector are 'undef').

llvm-svn: 209451
2014-05-22 16:21:39 +00:00
Tim Northover f9e798ba6a Segmented stacks: omit __morestack call when there's no frame.
Patch by Florian Zeitz

llvm-svn: 209436
2014-05-22 13:03:43 +00:00
Quentin Colombet b4d53f1afa [X86] Fix a bug in the lowering of BLENDI introduced in r209043.
ISD::VSELECT mask uses 1 to identify the first argument and 0 to identify the
second argument.
On the other hand, BLENDI uses 0 to identify the first argument and 1 to
identify the second argument.
Fix the generation of the blend mask to account for this difference.

The bug did not show up with r209043, because we were not checking for the
actual arguments of the blend instruction!
This commit also fixes the test cases.

Note: The same mask works for the BLENDr variant because the arguments are
swapped during instruction selection (see the BLENDXXrr patterns).

<rdar://problem/16975435>

llvm-svn: 209324
2014-05-21 22:00:39 +00:00
Eric Christopher 2feed5fd68 Move the function and data section flags into the options struct and
make the functions to set them non-static.
Move and rename the llvm specific backend options to avoid conflicting
with the clang option.

Paired with a backend commit to update.

llvm-svn: 209238
2014-05-20 21:25:34 +00:00
Quentin Colombet c88baa5c10 [LSR] Canonicalize reg1 + ... + regN into reg1 + ... + 1*regN.
This commit introduces a canonical representation for the formulae.
Basically, as soon as a formula has more that one base register, the scaled
register field is used for one of them. The register put into the scaled
register is preferably a loop variant.
The commit refactors how the formulae are built in order to produce such
representation.
This yields a more accurate, but still perfectible, cost model.

<rdar://problem/16731508>

llvm-svn: 209230
2014-05-20 19:25:04 +00:00
Renato Golin e919fcac05 Avoids DCE on write_register
llvm-svn: 209222
2014-05-20 17:40:03 +00:00