Commit Graph

4636 Commits

Author SHA1 Message Date
Robert Lougher 48ee75b7e3 Test commit - added a new line to vec_shuf-insert.ll.
llvm-svn: 201083
2014-02-10 12:42:13 +00:00
Elena Demikhovsky 9f423d6f25 AVX-512: Fixed extract_vector_elt for v16i1 and v8i1 vectors.
llvm-svn: 201066
2014-02-10 07:02:39 +00:00
Rafael Espindola 5054362920 Always create a temporary symbol to use with the cfi frame.
This is a small simplification and a small step in fixing pr18743 since
private functions on MachO should be using a 'l' prefix.

llvm-svn: 200994
2014-02-07 21:23:18 +00:00
Rafael Espindola e393aab13b Use FileCheck variables to simplify this test.
llvm-svn: 200992
2014-02-07 21:11:33 +00:00
Rafael Espindola 61acf5d9b0 Fix a bug with .weak_def_can_be_hidden: Mutable variables cannot use it.
Thanks to John McCall for noticing it.

llvm-svn: 200977
2014-02-07 16:21:30 +00:00
Jim Grosbach e9008de652 X86: Resolve a long standing FIXME and properly isel pextr[bw].
Generalize the AArch64 .td nodes for AssertZext and AssertSext. Use
them to match the relevant pextr store instructions.

The test widen_load-2.ll requires a slight change because with the
stores gone, the remaining instructions are scheduled in a different
order.

Add test cases for SSE4 and AVX variants.

Resolves rdar://13414672.

Patch by Adam Nemet <anemet@apple.com>.

llvm-svn: 200957
2014-02-07 00:16:33 +00:00
Quentin Colombet 3a4bf0405e [CodeGenPrepare] Move away sign extensions that get in the way of addressing
mode.

Basically the idea is to transform code like this:
%idx = add nsw i32 %a, 1
%sextidx = sext i32 %idx to i64
%gep = gep i8* %myArray, i64 %sextidx
load i8* %gep

Into:
%sexta = sext i32 %a to i64
%idx = add nsw i64 %sexta, 1
%gep = gep i8* %myArray, i64 %idx
load i8* %gep

That way the computation can be folded into the addressing mode.

This transformation is done as part of the addressing mode matcher.
If the matching fails (not profitable, addressing mode not legal, etc.), the
matcher will revert the related promotions.

<rdar://problem/15519855>

llvm-svn: 200947
2014-02-06 21:44:56 +00:00
Quentin Colombet 87769713cf [RegAlloc] Add a last chance recoloring mechanism when everything else failed to
find a register.

The idea is to choose a color for the variable that cannot be allocated and
recolor its interferences around. Unlike the current register allocation scheme,
it is allowed to change the color of an already assigned (but maybe not
splittable or spillable) live interval while propagating this change to its
neighbors.
In other word, there are two things that may help finding an available color:
- Already assigned variables (RS_Done) can be recolored to different color.
- The recoloring allows to catch solutions that needs to touch more that just
  the neighbors of the current allocated variable.

E.g.,
vA can use {R1, R2    }
vB can use {    R2, R3}
vC can use {R1        }
Where vA, vB, and vC cannot be split anymore (they are reloads for instance) and
they all interfere.

vA is assigned R1
vB is assigned R2
vC tries to evict vA but vA is already done.
=> Regular register allocation heuristic fails.

Last chance recoloring kicks in:
vC does as if vA was evicted => vC uses R1.
vC is marked as fixed.
vA needs to find a color.
None are available.
vA cannot evict vC: vC is a fixed virtual register now.
vA does as if vB was evicted => vA uses R2.
vB needs to find a color.
R3 is available.
Recoloring => vC = R1, vA = R2, vB = R3.

<rdar://problem/15947839>

llvm-svn: 200883
2014-02-05 22:13:59 +00:00
Rafael Espindola b4eec1daa1 Remove support for not using .loc directives.
Clang itself was not using this. The only way to access it was via llc.

llvm-svn: 200862
2014-02-05 18:00:21 +00:00
Elena Demikhovsky 0b79be8ab2 AVX-512: optimized icmp -> sext -> icmp pattern
llvm-svn: 200849
2014-02-05 16:17:36 +00:00
Elena Demikhovsky a30e437659 AVX-512: Added intrinsic for cvtph2ps.
Added VPTESTNM instruction.
Added a pattern to vselect (lit tests will follow).

llvm-svn: 200823
2014-02-05 07:05:03 +00:00
David Blaikie 5e390e4df7 DebugInfo: Remove some unneeded conditionals now that DIBuilder no longer emits zero-length arrays as {i32 0}
A bunch of test cases needed to be cleaned up for this, many my fault -
when implementid imported modules I updated test cases by simply
duplicating the prior metadata field - which wasn't always the empty
metadata entry.

llvm-svn: 200731
2014-02-04 01:23:52 +00:00
Hal Finkel 5c968d9440 Expand vector bswap in LegalizeVectorOps
ISD::BSWAP was missing from the list of node types that should be expanded
element-wise.

llvm-svn: 200705
2014-02-03 17:27:25 +00:00
Josh Magee 24c7f06333 [stackprotector] Implement the sspstrong rules for stack layout.
This changes the PrologueEpilogInserter and LocalStackSlotAllocation passes to
follow the extended stack layout rules for sspstrong and sspreq.

The sspstrong layout rules are:
 1. Large arrays and structures containing large arrays (>= ssp-buffer-size)
are closest to the stack protector.
 2. Small arrays and structures containing small arrays (< ssp-buffer-size) are
2nd closest to the protector.
 3. Variables that have had their address taken are 3rd closest to the
protector.


Differential Revision: http://llvm-reviews.chandlerc.com/D2546

llvm-svn: 200601
2014-02-01 01:36:16 +00:00
Reid Kleckner f5b76518c9 Implement inalloca codegen for x86 with the new inalloca design
Calls with inalloca are lowered by skipping all stores for arguments
passed in memory and the initial stack adjustment to allocate argument
memory.

Now the frontend is responsible for the memory layout, and the backend
doesn't have to do any work.  As a result these changes are pretty
minimal.

Reviewers: echristo

Differential Revision: http://llvm-reviews.chandlerc.com/D2637

llvm-svn: 200596
2014-01-31 23:50:57 +00:00
Reid Kleckner dfbed59cc2 Don't put non-static allocas in the static alloca map
Allocas marked inalloca are never static, but we were trying to put them
into the static alloca map if they were in the entry block.  Also add an
assertion in x86 fastisel.

llvm-svn: 200593
2014-01-31 23:45:12 +00:00
Reid Kleckner edb94c70c1 Set -mcpu to make this test pass on atom bots
llvm-svn: 200588
2014-01-31 22:58:10 +00:00
Lang Hames 5ec150c967 Replace X86 FMA intrinsic pseduo-instructions with def pats.
It looks like these pseudos were only used for pattern matching. Def pats are
the appropriate way to do that. As a bonus, these intrinsics will now have
memory operands folded properly, and better FMA3 variants selected where
appropriate (see r199933).

<rdar://problem/15611947>

llvm-svn: 200577
2014-01-31 21:29:19 +00:00
Reid Kleckner 1c843228f8 [ms-cxxabi] Add a new calling convention that swaps 'this' and 'sret'
MSVC always places the 'this' parameter for a method first.  The
implicit 'sret' pointer for methods always comes second.  We already
implement this for __thiscall by putting sret parameters on the stack,
but __cdecl methods require putting both parameters on the stack in
opposite order.

Using a special calling convention allows frontends to keep the sret
parameter first, which avoids breaking lots of assumptions in LLVM and
Clang.

Fixes PR15768 with the corresponding change in Clang.

Reviewers: ributzka, majnemer

Differential Revision: http://llvm-reviews.chandlerc.com/D2663

llvm-svn: 200561
2014-01-31 17:41:22 +00:00
Manman Ren 413a6cb42b This patch teaches the DAGCombiner how to fold insert_subvector nodes
when the input is a concat_vectors and the insert replaces one of the
concat halves:

Lower half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors Z, Y)
Upper half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors X, Z)

This can be seen with the following IR:

define <8 x float> @lower_half(<4 x float> %v1, <4 x float> %v2, <4 x
float> %v3) {
  %1 = shufflevector <4 x float> %v1, <4 x float> %v2, <8 x i32> <i32
0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %2 = tail call <8 x float> @llvm.x86.avx.vinsertf128.ps.256(<8 x
float> %1, <4 x float> %v3, i8 0)

The vinsertf128 intrinsic is converted into an insert_subvector node
in SelectionDAGBuilder.cpp.

Using AVX, without the patch this generates two vinsertf128 instructions:

vinsertf128 $1, %xmm1, %ymm0, %ymm0
vinsertf128 $0, %xmm2, %ymm0, %ymm0

With the patch this is optimized into:

vinsertf128 $1, %xmm1, %ymm2, %ymm0

Patch by Robert Lougher.

llvm-svn: 200506
2014-01-31 01:10:35 +00:00
Manman Ren 4ece7452ba PGO branch weight: update edge weights in SelectionDAGBuilder.
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.

The previous attempt at r200431 was reverted at r200434 because of
two testing case failures. I modified my patch a little, but forgot
to re-run "make check-all".

Testing case CodeGen/ARM/lsr-unfolded-offset.ll is updated because of
the patch's impact on branch probability which causes changes in
spill placement.

llvm-svn: 200502
2014-01-31 00:42:44 +00:00
Juergen Ributzka fb4d648295 [Stackmaps] Record the stack size of each function that contains a stackmap/patchpoint intrinsic.
Re-applying the patch, but this time without using AsmPrinter methods.

Reviewed by Andy

llvm-svn: 200481
2014-01-30 18:58:27 +00:00
Juergen Ributzka f6f0ce903e Revert "[Stackmaps] Record the stack size of each function that contains a stackmap/patchpoint intrinsic."
This reverts commit r200444 to unbreak buildbots.

llvm-svn: 200445
2014-01-30 03:34:02 +00:00
Juergen Ributzka aece7583a7 [Stackmaps] Record the stack size of each function that contains a stackmap/patchpoint intrinsic.
Reviewed by Andy

llvm-svn: 200444
2014-01-30 03:06:14 +00:00
Manman Ren 7407e0e31c Revert r200431 due to bot failures.
llvm-svn: 200434
2014-01-30 00:53:27 +00:00
Manman Ren 104e0c80cc PGO branch weight: update edge weights in SelectionDAGBuilder.
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.

llvm-svn: 200431
2014-01-30 00:24:37 +00:00
Rafael Espindola 310f501ef0 Use a raw_stream to implement the mangler.
This is a bit more convenient for some callers, but more importantly, it is
easier to implement correctly. Doing this removes the patching of already
printed data that was used for fastcall, fixing a crash with private fastcall
symbols.

llvm-svn: 200367
2014-01-29 02:30:38 +00:00
Andrea Di Biagio 2ea61f17ad [X86] Add extra rules for combining vselect dag nodes into movsd.
This improves the fix committed at revision 199683 adding the
following new target specific combine rules:

1) fold (v4i32: vselect <0,0,-1,-1>, A, B) ->
        (v4i32 (bitcast (movsd (v2i64 (bitcast A)), (v2i64 (bitcast B))) ))

2) fold (v4f32: vselect <0,0,-1,-1>, A, B) ->
        (v4f32 (bitcast (movsd (v2f64 (bitcast A)), (v2f64 (bitcast B))) ))

3) fold (v4i32: vselect <-1,-1,0,0>, A, B) ->
        (v4i32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) ))

4) fold (v4f32: vselect <-1,-1,0,0>, A, B) ->
        (v4f32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) ))

llvm-svn: 200324
2014-01-28 18:14:21 +00:00
Andrea Di Biagio b6d39afbda [DAGCombiner] Avoid introducing an illegal build_vector when folding a sign_extend.
Make sure that we don't introduce illegal build_vector dag nodes
when trying to fold a sign_extend of a build_vector.

This fixes a regression introduced by r200234.
Added test CodeGen/X86/fold-vector-sext-crash.ll
to verify that llc no longer crashes with an assertion failure
due to an illegal build_vector of type MVT::v4i64.

Thanks to Ilia Filippov for spotting this regression and for
providing a reproducible test case.

llvm-svn: 200313
2014-01-28 12:53:56 +00:00
Andrea Di Biagio f09a357765 [DAGCombiner] Teach how to fold sext/aext/zext of constant build vectors.
This patch teaches the DAGCombiner how to fold a sext/aext/zext dag node when
the operand in input is a build vector of constants (or UNDEFs).

The inability to fold a sext/zext of a constant build_vector was the root
cause of some pcg bugs affecting vselect expansion on x86-64 with AVX support.

Before this change, the DAGCombiner only knew how to fold a sext/zext/aext of a
ConstantSDNode.

llvm-svn: 200234
2014-01-27 18:45:30 +00:00
Stepan Dyatkovskiy 55139555c4 Additional fix for 200201: due to dependence on bitwidth test was moved to X86 directory.
llvm-svn: 200202
2014-01-27 09:43:10 +00:00
Juergen Ributzka f26beda7c7 Revert "Revert "Add Constant Hoisting Pass" (r200034)"
This reverts commit r200058 and adds the using directive for
ARMTargetTransformInfo to silence two g++ overload warnings.

llvm-svn: 200062
2014-01-25 02:02:55 +00:00
Hans Wennborg 4d67a2e85a Revert "Add Constant Hoisting Pass" (r200034)
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.

We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemnted in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.

llvm-svn: 200058
2014-01-25 01:18:18 +00:00
Juergen Ributzka 4f3df4ad64 Add Constant Hoisting Pass
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.

llvm-svn: 200034
2014-01-24 20:18:00 +00:00
Lang Hames c63c52e03c Add a testcase for the changes in r199938.
<rdar://problem/15611947>

llvm-svn: 200027
2014-01-24 19:00:19 +00:00
Juergen Ributzka 50e7e80d00 Revert "Add Constant Hoisting Pass"
This reverts commit r200022 to unbreak the build bots.

llvm-svn: 200024
2014-01-24 18:40:30 +00:00
Juergen Ributzka 38b67d0caf Add Constant Hoisting Pass
This pass identifies expensive constants to hoist and coalesces them to
better prepare it for SelectionDAG-based code generation. This works around the
limitations of the basic-block-at-a-time approach.

First it scans all instructions for integer constants and calculates its
cost. If the constant can be folded into the instruction (the cost is
TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't
consider it expensive and leave it alone. This is the default behavior and
the default implementation of getIntImmCost will always return TCC_Free.

If the cost is more than TCC_BASIC, then the integer constant can't be folded
into the instruction and it might be beneficial to hoist the constant.
Similar constants are coalesced to reduce register pressure and
materialization code.

When a constant is hoisted, it is also hidden behind a bitcast to force it to
be live-out of the basic block. Otherwise the constant would be just
duplicated and each basic block would have its own copy in the SelectionDAG.
The SelectionDAG recognizes such constants as opaque and doesn't perform
certain transformations on them, which would create a new expensive constant.

This optimization is only applied to integer constants in instructions and
simple (this means not nested) constant cast experessions. For example:
%0 = load i64* inttoptr (i64 big_constant to i64*)

Reviewed by Eric

llvm-svn: 200022
2014-01-24 18:23:08 +00:00
Alp Toker cb40291100 Fix known typos
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.

llvm-svn: 200018
2014-01-24 17:20:08 +00:00
Juergen Ributzka e758ddcd16 [X86] Prevent the creation of redundant ops for sadd and ssub with overflow.
This commit teaches the X86 backend to create the same X86 instructions when it
lowers an sadd/ssub with overflow intrinsic and a conditional branch that uses
that overflow result. This allows SelectionDAG to recognize and remove one of
the redundant operations.

This fixes <rdar://problem/15874016> and <rdar://problem/15661073>.

Reviewed by Nadav

llvm-svn: 199976
2014-01-24 06:47:57 +00:00
Lang Hames 23de211c5d Replace vfmaddxx213 instructions with their 231-type equivalents in accumulator
loops. Writing back to the accumulator (231-type) allows the coalescer to
eliminate an extra copy.

llvm-svn: 199933
2014-01-23 20:23:36 +00:00
Elena Demikhovsky a5d38a39a0 AVX-512: added VPERM2D VPERM2Q VPERM2PS VPERM2PD instructions,
they give better sequences than VPERMI

llvm-svn: 199893
2014-01-23 14:27:26 +00:00
Owen Anderson 77e4d44411 Revert r162101 and replace it with a solution that works for targets where the pointer type is illegal.
This is a horrible bit of code.  We're calling a simplification routine *in the middle* of type legalization.  We tell the
simplification routine that it's running after legalization, but some of the types it will encounter will be illegal!  The
fix is only to invoke the simplification if the types in question were legal, so that none of its invariants will be violated.

llvm-svn: 199847
2014-01-22 22:34:17 +00:00
Quentin Colombet 5a8739e023 Add a testcase for r199430.
llvm-svn: 199831
2014-01-22 20:11:50 +00:00
Elena Demikhovsky 9d56f1e0e5 AVX512: combining setcc and zext is wrong on AVX512
because vector compare instruction puts result in mask register.

llvm-svn: 199798
2014-01-22 12:26:19 +00:00
James Molloy d787d3e593 MachineCopyPropagation has special logic for removing COPY instructions. It will remove plain COPYs using eraseFromParent(), but if the COPY has imp-defs/imp-uses it will convert it to a KILL, to keep the imp-def around.
This actually totally breaks and causes the machine verifier to cry in several cases, one of which being:

%RAX<def> = COPY %RCX<kill>
%ECX<def> = COPY %EAX<kill>, %RAX<imp-use,kill>

These subregister copies are together identified as noops, so are both removed. However, the second one as it has an imp-use gets converted into a kill:

%ECX<def> = KILL %EAX<kill>, %RAX<imp-use,kill>

As the original COPY has been removed, the verifier goes into tears at the use of undefined EAX and RAX.

There are several hacky solutions to this hacky problem (which is all to do with imp-use/def weirdnesses), but the least hacky I've come up with is to *always* remove COPYs by converting to KILLs. KILLs are no-ops to the code generator so the generated code doesn't change (which is why they were partially used in the first place), but using them also keeps the def/use and imp-def/imp-use chains alive:

%RAX<def> = KILL %RCX<kill>
%ECX<def> = KILL %EAX<kill>, %RAX<imp-use,kill>

The patch passes all test cases including the ones that check the removal of MOVs in this circumstance, along with an extra test I added to check subregister behaviour (which made the machine verifier fall over before my patch).

The patch also adds some DEBUG() statements because the file hadn't got any.

llvm-svn: 199797
2014-01-22 09:12:27 +00:00
Chandler Carruth 24785a1048 Tweak the spelling of the asserts requirement a bit more. This makes it
match the (reasonably prevelant) usage in Clang's test suite and so
seems more "canonical".

llvm-svn: 199767
2014-01-21 22:39:19 +00:00
Andrea Di Biagio 450d1661be [X86] Teach how to combine a vselect into a movss/movsd
Add target specific rules for combining vselect dag nodes into movss/movsd
when possible.

If the vector type of the vselect dag node in input is either MVT::v4i13 or
MVT::v4f32, then try to fold according to rules:

  1) fold (vselect (build_vector (0, -1, -1, -1)), A, B) -> (movss A, B)
  2) fold (vselect (build_vector (-1, 0, 0, 0)), A, B) -> (movss B, A)

If the vector type of the vselect dag node in input is either MVT::v2i64 or
MVT::v2f64 (and we have SSE2), then try to fold according to rules:

  3) fold (vselect (build_vector (0, -1)), A, B) -> (movsd A, B)
  4) fold (vselect (build_vector (-1, 0)), A, B) -> (movsd B, A)

llvm-svn: 199683
2014-01-20 19:35:22 +00:00
Hal Finkel 9ff54e1fcb Fix misched-aa-colored.ll to require asserts (trying again)
Perhaps it needs to be in caps.

llvm-svn: 199661
2014-01-20 14:15:28 +00:00
Hal Finkel a6bcadeb4a Fix misched-aa-colored.ll to require asserts.
-misched=shuffle is NDEBUG only. Maybe we should change that.

llvm-svn: 199659
2014-01-20 14:09:34 +00:00
Hal Finkel cd9569c19e Update IR when merging slots in stack coloring
The way that stack coloring updated MMOs when merging stack slots, while
correct, is suboptimal, and is incompatible with the use of AA during
instruction scheduling. The solution, which involves the use of const_cast (and
more importantly, updating the IR from within an MI-level pass), obviously
requires some explanation:

When the stack coloring pass was originally committed, the code in
ScheduleDAGInstrs::buildSchedGraph tracked possible alias sets by using
GetUnderlyingObject, and all load/store and store/store memory control
dependencies where added between SUs at the object level (where only one
object, that returned by GetUnderlyingObject, was used to identify the object
associated with each MMO). When stack coloring merged stack slots, it would
replace MMOs derived from the remapped alloca with the alloca with which the
remapped alloca was being replaced. Because ScheduleDAGInstrs only used single
objects, and tracked alias sets at the object level, this was a fine solution.

In r169744, (Andy and) I updated the code in ScheduleDAGInstrs to use
GetUnderlyingObjects, and track alias sets using, potentially, multiple
underlying objects for each MMO. This was done, primarily, to provide the
ability to look through PHIs, and provide better scheduling for
induction-variable-dependent loads and stores inside loops. At this point, the
MMO-updating code in stack coloring became suboptimal, because it would clear
the MMOs for (i.e. completely pessimize) all instructions for which r169744
might help in scheduling. Updating the IR directly is the simplest fix for this
(and the one with, by far, the least compile-time impact), but others are
possible (we could give each MMO a small vector of potential values, or make
use of a remapping table, constructed from MFI, inside ScheduleDAGInstrs).

Unfortunately, replacing all MMO values derived from the remapped alloca with
the base replacement alloca fundamentally breaks our ability to use AA during
instruction scheduling (which is critical to performance on some targets). The
reason is that the original MMO might have had an offset (either constant or
dynamic) from the base remapped alloca, and that offset is not present in the
updated MMO. One possible way around this would be to use
GetPointerBaseWithConstantOffset, and update not only the MMO's value, but also
its offset based on the original offset. Unfortunately, this solution would
only handle constant offsets, and for safety (because AA is not completely
restricted to deducing relationships with constant offsets), we would need to
clear all MMOs without constant offsets over the entire function. This would be
an even worse pessimization than the current single-object restriction. Any
other solution would involve passing around a vector of remapped allocas, and
teaching AA to use it, introducing additional complexity and overhead into AA.

Instead, when remapping an alloca, we replace all IR uses of that alloca as
well (optionally inserting a bitcast as necessary). This is even more efficient
that the old MMO-updating code in the stack coloring pass (because it removes
the need to call GetUnderlyingObject on all MMO values), removes the
single-object pessimization in the default configuration, and enables the
correct use of AA during instruction scheduling (all without any additional
overhead).

LLVM now no longer miscompiles itself on x86_64 when using -enable-misched
-enable-aa-sched-mi -misched-bottomup=0 -misched-topdown=0 -misched=shuffle!
Fixed PR18497.

Because the alloca replacement is now done at the IR level, unless the MMO
directly refers to the remapped alloca, the change cannot be seen at the MI
level. As a result, there is no good way to fix test/CodeGen/X86/pr14090.ll.

llvm-svn: 199658
2014-01-20 14:03:16 +00:00
Juergen Ributzka e625013071 Add two new calling conventions for runtime calls
This patch adds two new target-independent calling conventions for runtime
calls - PreserveMost and PreserveAll.
The target-specific implementation for X86-64 is defined as following:
  - Arguments are passed as for the default C calling convention
  - The same applies for the return value(s)
  - PreserveMost preserves all GPRs - except R11
  - PreserveAll preserves all GPRs and all XMMs/YMMs - except R11

Reviewed by Lang and Philip

llvm-svn: 199508
2014-01-17 19:47:03 +00:00
Elena Demikhovsky d1487261a0 AVX-512: fixed a compare pattern
llvm-svn: 199366
2014-01-16 08:45:54 +00:00
Rafael Espindola 306e2b019f Convert test to FileCheck.
llvm-svn: 199355
2014-01-16 06:31:20 +00:00
Andrea Di Biagio 12de5f3bc1 Update test/CodeGen/X86/vbinop-simplify-bug.ll.
Redirect the output of llc to /dev/null.

llvm-svn: 199329
2014-01-15 20:16:14 +00:00
Andrea Di Biagio d7c03ec348 [DAGCombiner] Fix a wrong check in method SimplifyVBinOp.
This fixes a regression intruced by r199135.

Revision 199135 tried to simplify part of the logic in method
DAGCombiner::SimplifyVBinOp introducing calls to method BuildVectorSDNode::isConstant().

However, that revision wrongly changed the check performed by method
SimplifyVBinOp to identify dag nodes that can be folded.
Before revision 199135, that method only tried to simplify vector binary operations
if both operands were build_vector of Constant/ConstantFP/Undef only.

After revision 199135, method SimplifyVBinop tried to
simplify also vector binary operations with only one constant operand.

This fixes the problem restoring the old behavior of SimplifyVBinOp.

llvm-svn: 199328
2014-01-15 19:51:32 +00:00
Nico Rieck c60647f0db Handle dllexport for global aliases
llvm-svn: 199219
2014-01-14 15:23:25 +00:00
Nico Rieck 7157bb765e Decouple dllexport/dllimport from linkage
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.

Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:

  define available_externally dllimport void @f() {}
  @Var = dllexport global i32 1, align 4

Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.

llvm-svn: 199218
2014-01-14 15:22:47 +00:00
Elena Demikhovsky 767fc967b4 AVX-512: optimized scalar compare patterns
removed AVX512SI format, since it is similar to AVX512BI.

llvm-svn: 199217
2014-01-14 15:10:08 +00:00
Andrea Di Biagio 5448a3c771 [X86] Fix assertion failure caused by a wrong folding of vector shifts by immediate count.
This fixes a regression intruced by r198113.

Revision r198113 introduced an algorithm that tries to fold a vector shift
by immediate count into a build_vector if the input vector is a known vector
of constants.

However the algorithm only worked under the assumption that the input vector
type and the shift type are exactly the same.

This patch disables the folding of vector shift by immediate count if the
input vector type and the shift value type are not the same.

llvm-svn: 199213
2014-01-14 13:17:12 +00:00
Nico Rieck 9d2e0df049 Revert "Decouple dllexport/dllimport from linkage"
Revert this for now until I fix an issue in Clang with it.

This reverts commit r199204.

llvm-svn: 199207
2014-01-14 12:38:32 +00:00
Nico Rieck 1794b62f54 Revert "Handle dllexport for global aliases"
This reverts commit r199205.

llvm-svn: 199206
2014-01-14 12:36:54 +00:00
Nico Rieck 4192acdbc3 Handle dllexport for global aliases
llvm-svn: 199205
2014-01-14 11:55:40 +00:00
Nico Rieck e43aaf7967 Decouple dllexport/dllimport from linkage
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.

Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:

  define available_externally dllimport void @f() {}
  @Var = dllexport global i32 1, align 4

Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.

llvm-svn: 199204
2014-01-14 11:55:03 +00:00
Mark Seaborn 8271118a65 Fix llc to not reuse spill slots in functions that invoke setjmp()
We need to ensure that StackSlotColoring.cpp does not reuse stack
spill slots in functions that call "returns_twice" functions such as
setjmp(), otherwise this can lead to miscompiled code, because a stack
slot would be clobbered when it's still live.

This was already handled correctly for functions that call setjmp()
(though this wasn't covered by a test), but not for functions that
invoke setjmp().

We fix this by changing callsFunctionThatReturnsTwice() to check for
invoke instructions.

This fixes PR18244.

llvm-svn: 199180
2014-01-14 04:20:01 +00:00
Juergen Ributzka 7384405f23 [DAG] Teach DAG to also reassociate vector operations
This commit teaches DAG to reassociate vector ops, which in turn enables
constant folding of vector op chains that appear later on during custom lowering
and DAG combine.

Reviewed by Andrea Di Biagio

llvm-svn: 199135
2014-01-13 20:51:35 +00:00
Elena Demikhovsky b19c9dc1a1 AVX-512: Embedded Rounding Control - encoding and printing
Changed intrinsics for vrcp14/vrcp28 vrsqrt14/vrsqrt28 - aligned with GCC.

llvm-svn: 199102
2014-01-13 12:55:03 +00:00
Nico Rieck f15341c9de Make test independent of scheduling
llvm-svn: 199055
2014-01-12 15:57:38 +00:00
NAKAMURA Takumi d7032ac21e llvm/test/CodeGen/X86/shl_undef.ll: Tweak to satisfy r199050.
Use intel syntax, or "shl" might hit "pushl".

llvm-svn: 199051
2014-01-12 14:41:41 +00:00
Nico Rieck b5262d6d8f Fix non-deterministic SDNodeOrder-dependent codegen
Reset SelectionDAGBuilder's SDNodeOrder to ensure deterministic code
generation.

llvm-svn: 199050
2014-01-12 14:09:17 +00:00
Benjamin Kramer c10563d14e Fix broken CHECK lines.
llvm-svn: 199016
2014-01-11 21:06:00 +00:00
NAKAMURA Takumi 80a474c1c3 llvm/test/CodeGen/X86/anyregcc.ll: Add explicit -mtriple=x86_64-unknown-unknown.
XMM(s) are really spilling for targeting Win64.

llvm-svn: 198999
2014-01-11 09:23:44 +00:00
Juergen Ributzka 976d94b834 [anyregcc] Fix callee-save mask for anyregcc
Use separate callee-save masks for XMM and YMM registers for anyregcc on X86 and
select the proper mask depending on the target cpu we compile for.

llvm-svn: 198985
2014-01-11 01:00:27 +00:00
Andrew Trick 32e1be7bd0 llvm.experimental.stackmap: fix encoding of large constants.
In the stackmap format we advertise the constant field as signed.
However, we were determining whether to promote to a 64-bit constant
pool based on an unsigned comparison.

This fix allows -1 to be encoded as a small constant.

llvm-svn: 198816
2014-01-09 00:22:31 +00:00
Hal Finkel 2150e3a743 Conservatively handle multiple MMOs in MIsNeedChainEdge
MIsNeedChainEdge, which is used by -enable-aa-sched-mi (AA in misched), had an
llvm_unreachable when -enable-aa-sched-mi is enabled and we reach an
instruction with multiple MMOs. Instead, return a conservative answer. This
allows testing -enable-aa-sched-mi on x86.

Also, this moves the check above the isUnsafeMemoryObject checks.
isUnsafeMemoryObject is currently correct only for instructions with one MMO
(as noted in the comment in isUnsafeMemoryObject):

  // We purposefully do no check for hasOneMemOperand() here
  // in hope to trigger an assert downstream in order to
  // finish implementation.

The problem with this is that, had the candidate edge passed the
"!MIa->mayStore() && !MIb->mayStore()" check, the hoped-for assert would never
happen (which could, in theory, lead to incorrect behavior if one of these
secondary MMOs was volatile, for example).

llvm-svn: 198795
2014-01-08 21:52:02 +00:00
Andrea Di Biagio 23df4e4a2d Teach the DAGCombiner how to fold 'vselect' dag nodes according
to the following two rules:
  1) fold (vselect (build_vector AllOnes), A, B) -> A
  2) fold (vselect (build_vector AllZeros), A, B) -> B

llvm-svn: 198777
2014-01-08 18:33:04 +00:00
David Woodhouse 79dd505ce1 [x86] Disambiguate RET[QL] and fix aliases for 16-bit mode
I couldn't see how to do this sanely without splitting RETQ from RETL.

Eric says: "sad about the inability to roundtrip them now, but...".
I have no idea what that means, but perhaps it wants preserving in the
commit comment.

llvm-svn: 198756
2014-01-08 12:58:07 +00:00
Elena Demikhovsky 172a27c750 AVX-512: Added more intrinsics for pmin/pmax, pabs, blend, pmuldq.
llvm-svn: 198745
2014-01-08 10:54:22 +00:00
Iain Sandoe 618def651b [patch] Adjust behavior of FDE cross-section relocs for targets that don't support abs-differences.
Modern versions of OSX/Darwin's ld (ld64 > 97.17) have an optimisation present that allows the back end to omit relocations (and replace them with an absolute difference) for FDE some text section refs.

This patch allows a backend to opt-in to this behaviour by setting "DwarfFDESymbolsUseAbsDiff".  At present, this is only enabled for modern x86 OSX ports.

test changes by David Fang.

llvm-svn: 198744
2014-01-08 10:22:54 +00:00
Rafael Espindola 170a6e7944 Don't assert with private type info variables.
With the gnu objc runtime private strings are used. Since we only need to
produce a unique label, the fix is to just drop the asserts.

llvm-svn: 198701
2014-01-07 19:38:47 +00:00
Andrew Trick dfacda3635 Fix for PR18396: Assertion: MO->isDead "Cannot fold physreg def".
InlineSpiller::foldMemoryOperand needs to handle undef call operands.

llvm-svn: 198679
2014-01-07 07:31:10 +00:00
Elena Demikhovsky 3629b4aa0e AVX-512: added intrinsic vcvtpd2ps (with rounding mode and without)
llvm-svn: 198593
2014-01-06 08:45:54 +00:00
Elena Demikhovsky f404e054a1 AVX-512: changed property name from "neverHasSideEffects=1" to "hasSideEffects=0", added this property to VMOVSS/VMOVSD;
Optimized a truncate pattern.

llvm-svn: 198562
2014-01-05 14:21:07 +00:00
Elena Demikhovsky 52e4a0e109 AVX-512: Added more intrinsics for convert and min/max.
Removed vzeroupper from AVX-512 mode - our optimization gude does not recommend to insert vzeroupper at all.

llvm-svn: 198557
2014-01-05 10:46:09 +00:00
Quentin Colombet 1fb3362a6e [RegAlloc] Make tryInstructionSplit less aggressive.
The greedy register allocator tries to split a live-range around each
instruction where it is used or defined to relax the constraints on the entire
live-range (this is a last chance split before falling back to spill).
The goal is to have a big live-range that is unconstrained (i.e., that can use
the largest legal register class) and several small local live-range that carry
the constraints implied by each instruction.
E.g.,
Let csti be the constraints on operation i.

V1=
op1 V1(cst1)
op2 V1(cst2)

V1 live-range is constrained on the intersection of cst1 and cst2.

tryInstructionSplit relaxes those constraints by aggressively splitting each
def/use point:
V1=
V2 = V1
V3 = V2
op1 V3(cst1)
V4 = V2
op2 V4(cst2)

Because of how the coalescer infrastructure works, each new variable (V3, V4)
that is alive at the same time as V1 (or its copy, here V2) interfere with V1.
Thus, we end up with an uncoalescable copy for each split point.

To make tryInstructionSplit less aggressive, we check if the split point
actually relaxes the constraints on the whole live-range. If it does not, we do
not insert it.
Indeed, it will not help the global allocation problem:
- V1 will have the same constraints.
- V1 will have the same interference + possibly the newly added split variable
  VS.
- VS will produce an uncoalesceable copy if alive at the same time as V1.

<rdar://problem/15570057>

llvm-svn: 198369
2014-01-02 22:47:22 +00:00
Elena Demikhovsky de3f751baf AVX-512: Added intrinsics for vcvt, vcvtt, vrndscale, vcmp
Printing rounding control.
Enncoding for EVEX_RC (rounding control).

llvm-svn: 198277
2014-01-01 15:12:34 +00:00
NAKAMURA Takumi cf396cf82c llvm/test/CodeGen/X86/vselect.ll: Unbreak Windows x64 targets to add -mtriple=x86_64-unknown-unknown.
llvm-svn: 198114
2013-12-28 13:04:29 +00:00
Andrea Di Biagio eaceba0ed0 [X86] Teach the backend how to fold target specific dag node for packed
vector shift by immedate count (VSHLI/VSRLI/VSRAI) into a build_vector when
the vector in input to the shift is a build_vector of all constants or UNDEFs.

Target specific nodes for packed shifts by immediate count are in
general introduced by function 'getTargetVShiftByConstNode' (in
X86ISelLowering.cpp) when lowering shift operations, SSE/AVX immediate
shift intrinsics and (only in very few cases) SIGN_EXTEND_INREG dag
nodes.

This patch adds extra rules for simplifying vector shifts inside
function 'getTargetVShiftByConstNode'.

Added file test/CodeGen/X86/vec_shift5.ll to verify that packed
shifts by immediate are correctly folded into a build_vector when the
input vector to the shift dag node is a vector of constants or undefs.

llvm-svn: 198113
2013-12-28 11:11:52 +00:00
Andrea Di Biagio 46dcddb350 Teach DAGCombiner how to fold a SIGN_EXTEND_INREG of a BUILD_VECTOR of
ConstantSDNodes (or UNDEFs) into a simple BUILD_VECTOR.

For example, given the following sequence of dag nodes:

  i32 C = Constant<1>
  v4i32 V = BUILD_VECTOR C, C, C, C
  v4i32 Result = SIGN_EXTEND_INREG V, ValueType:v4i1

The SIGN_EXTEND_INREG node can be folded into a build_vector since
the vector in input is a BUILD_VECTOR of constants.

The optimized sequence is:

  i32 C = Constant<-1>
  v4i32 Result = BUILD_VECTOR C, C, C, C

llvm-svn: 198084
2013-12-27 20:20:28 +00:00
Elena Demikhovsky 64c9548d66 AVX-512: fixed some patterns for MVT::i1
llvm-svn: 197981
2013-12-24 14:24:07 +00:00
Elena Demikhovsky fe24a30e38 AVX512: SETCC returns i1 for AVX-512 and i8 for all others
llvm-svn: 197876
2013-12-22 10:13:18 +00:00
Quentin Colombet 90a646e4d1 [X86][fast-isel] Fix select lowering.
The condition in selects is supposed to be i1.
Make sure we are just reading the less significant bit
of the 8 bits width value to match this constraint.

<rdar://problem/15651765>

llvm-svn: 197712
2013-12-19 18:32:04 +00:00
Josh Magee 22b8ba2d67 [stackprotector] Use analysis from the StackProtector pass for stack layout in PEI a nd LocalStackSlot passes.
This changes the MachineFrameInfo API to use the new SSPLayoutKind information
produced by the StackProtector pass (instead of a boolean flag) and updates a
few pass dependencies (to preserve the SSP analysis).

The stack layout follows the same approach used prior to this change - i.e.,
only LargeArray stack objects will be placed near the canary and everything
else will be laid out normally.  After this change, structures containing large
arrays will also be placed near the canary - a case previously missed by the
old implementation.

Out of tree targets will need to update their usage of
MachineFrameInfo::CreateStackObject to remove the MayNeedSP argument. 

The next patch will implement the rules for sspstrong and sspreq.  The end goal
is to support ssp-strong stack layout rules.

WIP.

Differential Revision: http://llvm-reviews.chandlerc.com/D2158

llvm-svn: 197653
2013-12-19 03:17:11 +00:00
Andrew Trick e4083f9e85 Disabled subregister copy coalescing during MachineCSE.
This effectively backs out r197465 but leaves some of the general
fixes in place. Not all targets are ready to handle this feature. To
enable it, some infrastructure work is needed to better handle
register class constraints.

llvm-svn: 197514
2013-12-17 19:29:36 +00:00
Quentin Colombet b4c44d239c Add warning capabilities in LLVM.
This reapplies r197438 and fixes the link-time circular dependency between
IR and Support. The fix consists in moving the diagnostic support into IR.

The patch adds a new LLVMContext::diagnose that can be used to communicate to
the front-end, if any, that something of interest happened.
The diagnostics are supported by a new abstraction, the DiagnosticInfo class.
The base class contains the following information:
- The kind of the report: What this is about.
- The severity of the report: How bad this is.

This patch also adds 2 classes:
- DiagnosticInfoInlineAsm: For inline asm reporting. Basically, this diagnostic
will be used to switch to the new diagnostic API for LLVMContext::emitError.
- DiagnosticStackSize: For stack size reporting. Comes as a replacement of the
hard coded warning in PEI.

This patch also features dynamic diagnostic identifiers. In other words plugins
can use this infrastructure for their own diagnostics (for more details, see
getNextAvailablePluginDiagnosticKind).

This patch introduces a new DiagnosticHandlerTy and a new DiagnosticContext in
the LLVMContext that should be set by the front-end to be able to map these
diagnostics in its own system.

http://llvm-reviews.chandlerc.com/D2376
<rdar://problem/15515174>

llvm-svn: 197508
2013-12-17 17:47:22 +00:00
Duncan P. N. Exon Smith 93c6ba6ae8 Setting the CPU in the new vaargs test
Trying to fix buildbots after r197503 (test passes locally).

<rdar://problem/15627766>

llvm-svn: 197505
2013-12-17 16:20:37 +00:00
Duncan P. N. Exon Smith 512601d77f Revert "Revert "Mark vastart_save_xmm_regs as changing EFLAGS""
This reverts commit r197481, recommiting r197469 with an extra fix.

The vastart_save_xmm_regs pseudo-instruction expands to a test and a
branch, so it modifies EFLAGS.  Mark it so, or else the scheduler might
place it in the middle of another test+branch.

This fixes a bug exposed by r192750, which changed the initial scheduler
to source-order as part of enabling the MI Scheduler for X86.

This re-commit changes the VASTART_SAVE_XMM_REGS custom inserter not to
try to save %flags, and adds a test that catches the bad behavior of
r197469.

<rdar://problem/15627766>

llvm-svn: 197503
2013-12-17 15:54:45 +00:00
Stepan Dyatkovskiy 7f7c2710e0 Fix for PR18045:
http://llvm.org/bugs/show_bug.cgi?id=18045

Short issue description:
For X86 machines with sse < sse4.1 we got failures for some
particular load/store vector sequences:

$ clang-trunk -m32 -O2 test-case.c
fatal error: error in backend: Cannot select: 0x4200920: v4i32,ch = load 0x41d6ab0, 0x4205850,
      0x41dcb10<LD16[getelementptr inbounds ([4 x i32]* @e, i32 0, i32 0)](align=4)> [ORD=82]
      [ID=58]
  0x4205850: i32 = X86ISD::Wrapper 0x41d5490 [ORD=26] [ID=43]
    0x41d5490: i32 = TargetGlobalAddress<[4 x i32]* @e> 0 [ORD=26] [ID=23]
  0x41dcb10: i32 = undef [ID=2]

The reason is that EltsFromConsecutiveLoads could emit such load instruction
both before and after legalize stage. Though this instruction is not legal for
machines with SSSE3 and lower.

The fix: In EltsFromConsecutiveLoads, if we have passed legalize stage, we
check whether nodes it emits are legal. 

P.S.: If you get failure in time from 12:00 and till 22:00 (UTC-8),
perhaps I'll slow with response, so you better reject this commit. Thanks!

llvm-svn: 197492
2013-12-17 12:07:33 +00:00
Elena Demikhovsky c5f6726a24 AVX-512: Added implementation of CONCAT_VECTORS for v8i1 vectors (by Alexey Bader).
Added implementation of "truncate" from integer type (i64/i32/i16/i8) to i1.

llvm-svn: 197482
2013-12-17 08:33:15 +00:00
Duncan P. N. Exon Smith b2d4274d3f Revert "Mark vastart_save_xmm_regs as changing EFLAGS"
This reverts commit r197469.

The sanitizer and dragonegg buildbots are failing, I think because of
this change.  Reverting until I figure out why.

llvm-svn: 197481
2013-12-17 07:13:58 +00:00
Duncan P. N. Exon Smith a4acde39e9 Mark vastart_save_xmm_regs as changing EFLAGS
The vastart_save_xmm_regs pseudo-instruction expands to a test and a
branch, so it modifies EFLAGS.  Mark it so, or else the scheduler might
place it in the middle of another test+branch.

This fixes a bug exposed by r192750, which turned on the MI Scheduler
for X86.

<rdar://problem/15627766>

llvm-svn: 197469
2013-12-17 06:12:05 +00:00
Andrew Trick e339828b90 Allow MachineCSE to coalesce trivial subregister copies the same way that it coalesces normal copies.
Without this, MachineCSE is powerless to handle redundant operations with truncated source operands.

This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled:

     %vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1
     %vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2
     %vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def>

Test case: cse-add-with-overflow.ll.

This exposed an existing bug in
PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case:
PowerPC/crash.ll.

llvm-svn: 197465
2013-12-17 04:50:45 +00:00
Quentin Colombet 382b135d92 Revert r197438 and r197447 until we figure out how to avoid circular dependency at link time
llvm-svn: 197451
2013-12-17 01:19:59 +00:00
Quentin Colombet 66673f4075 Add warning capabilities in LLVM.
The patch adds a new LLVMContext::diagnose that can be used to communicate to
the front-end, if any, that something of interest happened.
The diagnostics are supported by a new abstraction, the DiagnosticInfo class.
The base class contains the following information:
- The kind of the report: What this is about.
- The severity of the report: How bad this is.

This patch also adds 2 classes:
- DiagnosticInfoInlineAsm: For inline asm reporting. Basically, this diagnostic
will be used to switch to the new diagnostic API for LLVMContext::emitError.
- DiagnosticStackSize: For stack size reporting. Comes as a replacement of the
hard coded warning in PEI.

This patch also features dynamic diagnostic identifiers. In other words plugins
can use this infrastructure for their own diagnostics (for more details, see
getNextAvailablePluginDiagnosticKind).

This patch introduces a new DiagnosticHandlerTy and a new DiagnosticContext in
the LLVMContext that should be set by the front-end to be able to map these
diagnostics in its own system.

http://llvm-reviews.chandlerc.com/D2376
<rdar://problem/15515174>

llvm-svn: 197438
2013-12-16 23:22:51 +00:00
Juergen Ributzka 9ed985baad [Stackmap] Allow WebKit_JS calling convention to store 4 byte sized and aligned arguments.
This allows the WebKit_JS calling convention to perform partial writes on a 4
byte granularity to stack slots.

llvm-svn: 197431
2013-12-16 22:05:32 +00:00
Rafael Espindola f152836788 Revert "Allow MachineCSE to coalesce trivial subregister copies the same way that it coalesces normal copies."
This reverts commit r197414.

It broke the ppc64 bootstrap. I will post a testcase in a sec.

llvm-svn: 197424
2013-12-16 20:57:09 +00:00
Juergen Ributzka b1612c18ab [Stackmap] The first integer argument is passed in register for the WebKit_JS calling convention.
Pass the first integer argument (callee) in register to optimize inline caches.

llvm-svn: 197416
2013-12-16 19:53:31 +00:00
Andrew Trick 88bd8629b2 Allow MachineCSE to coalesce trivial subregister copies the same way
that it coalesces normal copies.

Without this, MachineCSE is powerless to handle redundant operations
with truncated source operands.

This required fixing the 2-addr pass to handle tied subregisters. It
isn't clear what combinations of subregisters can legally be tied, but
the simple case of truncated source operands is now safely handled:

     %vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1
     %vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2
     %vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def>

llvm-svn: 197414
2013-12-16 19:36:21 +00:00
Elena Demikhovsky 3bb50b0ff8 fixed one more line
llvm-svn: 197387
2013-12-16 14:36:50 +00:00
Elena Demikhovsky e8323a88ea Fixed the test - added -mcpu=penryn flag to avoid ambiguity in code generation.
llvm-svn: 197385
2013-12-16 14:24:08 +00:00
Elena Demikhovsky 47fc44e52e AVX-512: Added legal type MVT::i1 and VK1 register for it.
Added scalar compare VCMPSS, VCMPSD.
Implemented LowerSELECT for scalar FP operations.
I replaced FSETCCss, FSETCCsd with one node type FSETCCs.
Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1.

llvm-svn: 197384
2013-12-16 13:52:35 +00:00
Juergen Ributzka e82947539e [Stackmap] Liveness Analysis Pass
This optional register liveness analysis pass can be enabled with either
-enable-stackmap-liveness, -enable-patchpoint-liveness, or both. The pass
traverses each basic block in a machine function. For each basic block the
instructions are processed in reversed order and if a patchpoint or stackmap
instruction is encountered the current live-out register set is encoded as a
register mask and attached to the instruction.

Later on during stackmap generation the live-out register mask is processed and
also emitted as part of the stackmap.

This information is optional and intended for optimization purposes only. This
will enable a client of the stackmap to reason about the registers it can use
and which registers need to be preserved.

Reviewed by Andy

llvm-svn: 197317
2013-12-14 06:53:06 +00:00
Andrew Trick 7bcb0100df Revert "Liveness Analysis Pass"
This reverts commit r197254.

This was an accidental merge of Juergen's patch. It will be checked in
shortly, but wasn't meant to go in quite yet.

Conflicts:
	include/llvm/CodeGen/StackMaps.h
	lib/CodeGen/StackMaps.cpp
	test/CodeGen/X86/stackmap-liveness.ll

llvm-svn: 197260
2013-12-13 18:57:20 +00:00
Andrew Trick e8cba373a3 Grow the stackmap/patchpoint format to hold 64-bit IDs.
llvm-svn: 197255
2013-12-13 18:37:10 +00:00
Andrew Trick 8d6a658430 Liveness Analysis Pass
llvm-svn: 197254
2013-12-13 18:37:03 +00:00
Benjamin Kramer e723bb10b0 X86: When lowering shl_parts, don't emit shift amounts larger than the bit width.
While it's safe for the X86-specific shift nodes, dag combining will
kill generic nodes. Insert an AND to make it safe, isel will nuke it
as x86's shift instructions have an implicit AND.

Fixes PR16108, which contains a contraption to hit this case in between
constant folders.

llvm-svn: 197228
2013-12-13 13:40:24 +00:00
Kai Nacke 87b23aec08 Change stack probing code for MingW.
Since gcc 4.6 the compiler uses ___chkstk_ms which has the same semantics as the
MS CRT function __chkstk. This simplifies the prologue generation a bit.

Reviewed by Rafael Espíndola. 

llvm-svn: 197205
2013-12-13 05:37:05 +00:00
Rafael Espindola 32cb5ac904 Switch to the new MingW ABI.
GCC 4.7 changed the MingW ABI. On the LLVM side it means that sret functions
don't pop the stack.

llvm-svn: 197163
2013-12-12 16:06:58 +00:00
Andrea Di Biagio 9b5c3dcf01 Added new X86 patterns to select SSE scalar fp arithmetic instructions from
a vector packed single/double fp operation followed by a vector insert.

The effect is that the backend coverts the packed fp instruction
followed by a vectro insert into a SSE or AVX scalar fp instruction.

For example, given the following code:
   __m128 foo(__m128 A, __m128 B) {
     __m128 C = A + B;
     return (__m128) {c[0], a[1], a[2], a[3]};
   }

 previously we generated:
   addps %xmm0, %xmm1
   movss %xmm1, %xmm0
 
 we now generate:
   addss %xmm1, %xmm0

llvm-svn: 197145
2013-12-12 11:50:47 +00:00
Rafael Espindola 2b5a0c9e68 On ELF and COFF treat linker_private like private.
The linkers on these systems don't have anything special to do with these
symbols. Since the intent is for them to be absent from the final object,
just treat them as private.

llvm-svn: 197080
2013-12-11 22:18:44 +00:00
Elena Demikhovsky cf08809813 AVX-512: Removed "z" suffix from AVX-512 instructions, since it is incompatible with GCC.
I moved a test from avx512-vbroadcast-crash.ll to avx512-vbroadcast.ll
I defined HasAVX512 predicate as AssemblerPredicate. It means that you should invoke llvm-mc with "-mcpu=knl" to get encoding for AVX-512 instructions. I need this to let AsmMatcher to set different encoding for AVX and AVX-512 instructions that have the same mnemonic and operands (all scalar instructions).

llvm-svn: 197041
2013-12-11 14:31:04 +00:00
Manuel Klimek b6d333fb09 Fix XFAIL rules.
llvm-svn: 197017
2013-12-11 08:38:42 +00:00
Rafael Espindola ffa9b9e427 Make this test a bit stricter.
The extra CHECK and CHECK-NEXT are there to show that we don't print a
linker private symbol on linux.

llvm-svn: 197003
2013-12-11 04:10:41 +00:00
Reid Kleckner ad92aca47c Revert the backend fatal error from r196939
The combination of inline asm, stack realignment, and dynamic allocas
turns out to be too common to reject out of hand.

ASan inserts empy inline asm fragments and uses aligned allocas.
Compiling any trivial function containing a dynamic alloca with ASan is
enough to trigger the check.

XFAIL the test cases that would be miscompiled and add one that uses the
relevant functionality.

llvm-svn: 196986
2013-12-10 23:23:52 +00:00
David Fang 1b01849f2d on darwin<10, fallback to .weak_definition (PPC,X86)
.weak_def_can_be_hidden was not yet supported by the system assembler

llvm-svn: 196970
2013-12-10 21:37:41 +00:00
Reid Kleckner ee08897fb8 Reland "Fix miscompile of MS inline assembly with stack realignment"
This re-lands commit r196876, which was reverted in r196879.

The tests have been fixed to pass on platforms with a stack alignment
larger than 4.

Update to clang side tests will land shortly.

llvm-svn: 196939
2013-12-10 18:27:32 +00:00
Andrea Di Biagio f7c33c8162 Ensure that the backend no longer emits unnecessary vector insert instructions
immediately after SSE scalar fp instructions like addss or mulss.

Added patterns to select SSE scalar fp arithmetic instructions from a scalar
fp operation followed by a blend.

For example, given the following code:
  __m128 foo(__m128 A, __m128 B) {
    A[0] += B[0];
    return A;
  }

previously we generated:
  addss %xmm0, %xmm1
  movss %xmm1, %xmm0

now we generate:
  addss %xmm1, %xmm0

llvm-svn: 196925
2013-12-10 15:22:48 +00:00
Elena Demikhovsky e382c3fdcd AVX-512: changed intrinsics for mask operations
llvm-svn: 196918
2013-12-10 13:53:10 +00:00
Elena Demikhovsky 6270b388c8 AVX-512: Changed intrinsics of VPCONFLICT to match GCC builtin form
llvm-svn: 196914
2013-12-10 11:58:35 +00:00
Reid Kleckner 0a9509f080 Revert "Fix miscompile of MS inline assembly with stack realignment"
This reverts commit r196876.  Its tests failed on the bots, so I'll
figure it out tomorrow.

llvm-svn: 196879
2013-12-10 05:31:27 +00:00
Reid Kleckner 7f10a8cd45 Fix miscompile of MS inline assembly with stack realignment
For stack frames requiring realignment, three pointers may be needed:
- ebp to address incoming arguments
- esi (could be any callee-saved register) to address locals
- esp to address outgoing arguments

We would use esi unconditionally without verifying that it did not
conflict with inline assembly.

This change doesn't do the verification, it simply emits a fatal error
on functions that use stack realignment, dynamic SP adjustments, and
inline assembly.

Because stack realignment is common on Windows, we also no longer assume
that MS inline assembly clobbers esp.  Instead, we analyze the inline
instructions for implicit definitions and check if esp is there.  If so,
we require the use of a base pointer and consider it in the condition
above.

Mostly fixes PR16830, but we could try harder to find a non-conflicting
base pointer.

Reviewers: sunfish

Differential Revision: http://llvm-reviews.chandlerc.com/D1317

llvm-svn: 196876
2013-12-10 05:12:23 +00:00
Nadav Rotem 6eee080450 Fix PR18162 - Incorrect assertion assumed that the SDValue resno is zero.
llvm-svn: 196858
2013-12-10 01:13:59 +00:00
Rafael Espindola 1a3a22fad1 Don't add suffixes for stdcall/fastcall on 64 coff.
This matches the behavior of both msvc and mingw.

llvm-svn: 196814
2013-12-09 20:44:48 +00:00
Cameron McInally e3cc4aacb9 Update AVX512 vector blend intrinsic names.
llvm-svn: 196581
2013-12-06 13:35:35 +00:00
Juergen Ributzka e3cba95d5c [Stackmap] Update stackmap unit test to use AnyRegCC.
llvm-svn: 196552
2013-12-06 00:28:54 +00:00
Alp Toker f907b891da Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00
Rafael Espindola 01d19d0299 Hide the stub created for MO_ExternalSymbol too.
given

declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1)
declare void @foo()
define void @bar() {
  call void @foo()
  call void @llvm.memset.p0i8.i32(i8* null, i8 0, i32 188, i32 1, i1 false)
  ret void
}

We used to produce

L_foo$stub:
        .indirect_symbol        _foo
        .ascii  "\364\364\364\364\364"

_memset$stub:
        .indirect_symbol        _memset
        .ascii  "\364\364\364\364\364"

We not produce a private stub for memset too.

Stubs are not needed with recent linkers, but we still produce them for darwin8.

Thanks to David Fang for confirming that gcc used to do this too.

llvm-svn: 196468
2013-12-05 05:19:12 +00:00
Cameron McInally 164097a6eb Add FileCheck statements for r196435.
llvm-svn: 196449
2013-12-05 01:20:36 +00:00
Cameron McInally 30bbb214e5 Add AVX512 patterns for v16i32 broadcast and v2i64 zero extend load.
Patch by Aleksey Bader.

llvm-svn: 196435
2013-12-05 00:11:25 +00:00
Cameron McInally cbb51dacfb Fix assembly syntax for AVX512 vector blend instructions.
llvm-svn: 196393
2013-12-04 18:05:36 +00:00
Cameron McInally c5f420e129 Suppress '(x < y) ? a : 0 -> (x < y) & a' transform on X86 architectures with dedicated mask registers.
Patch by Aleksey Bader.

llvm-svn: 196386
2013-12-04 14:52:33 +00:00
Rafael Espindola 757ae64731 Add -mcpu=core2 to all llc invocations in this test.
Should fix the atom buildbot.

llvm-svn: 196340
2013-12-04 01:25:24 +00:00
Juergen Ributzka 17e0d9ee6c [Stackmap] Emit multi-byte nops for X86.
llvm-svn: 196334
2013-12-04 00:39:08 +00:00
Rafael Espindola f4d34153f5 Use CHECK-LABEL to make this test more strict.
llvm-svn: 196321
2013-12-03 21:12:36 +00:00
Rafael Espindola 0a2baf8eaf Fix mingw32 thiscall + sret.
Unlike msvc, when handling a thiscall + sret gcc will
* Put the sret in %ecx
* Put the this pointer is (%esp)

This fixes, for example, calling stringstream::str.

llvm-svn: 196312
2013-12-03 20:51:23 +00:00
Michael Liao 14b02848a3 Enhance the fix of PR17631
- The fix to PR17631 fixes part of the cases where 'vzeroupper' should
  not be issued before 'call' insn. There're other cases where helper
  calls will be inserted not limited to epilog. These helper calls do
  not follow the standard calling convention and won't clobber any YMM
  registers. (So far, all call conventions will clobber any or part of
  YMM registers.)
  This patch enhances the previous fix to cover more cases 'vzerosupper' should
  not be inserted by checking if that function call won't clobber any YMM
  registers and skipping it if so.

llvm-svn: 196261
2013-12-03 09:17:32 +00:00
Rafael Espindola aaaf02206a Also test the created stubs on 32 bits.
llvm-svn: 196052
2013-12-01 21:24:30 +00:00
Andrew Trick ca45c817c3 Add -mcpu to stackmap.ll
llvm-svn: 196051
2013-12-01 18:17:05 +00:00
Juergen Ributzka 5b6234dc4a Force CPU type to unbreak unit tests on Haswell machines.
llvm-svn: 195971
2013-11-30 03:07:16 +00:00
Rafael Espindola 09cf06c75e Cleanup and test X86AsmPrinter::printPCRelImm.
It is only used for asm printing.

On X86 we put basic block addresses on register before passing them to inline
asm, so the MO_MachineBasicBlock case was dead.

MO_ExternalSymbol was dead since any symbol being passed to inline asm
is represented as MO_GlobalAddress.

The MO_GlobalAddress and MO_Register cases were not tested.

llvm-svn: 195824
2013-11-27 06:53:13 +00:00
Michael Liao d617a3015d Fix PR18054
- Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG
  lowering where we need to check whether x is a vector type (in-reg
  type) of i8, i16 or i32; otherwise, that optimization is not valid.

llvm-svn: 195779
2013-11-26 20:31:31 +00:00
Andrew Trick 391dbadb51 StackMap: Implement support for DirectMemRefOp.
A Direct stack map location records the address of frame index. This
address is itself the value that the runtime requested. This differs
from IndirectMemRefOp locations, which refer to a stack locations from
which the requested values must be loaded. Direct locations can
directly communicate the address if an alloca, while IndirectMemRefOp
handle register spills.

For example:

entry:
  %a = alloca i64...
  llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a)

Since both the alloca and stackmap intrinsic are in the entry block,
and the intrinsic takes the address of the alloca, the runtime can
assume that LLVM will not substitute alloca with any intervening
value. This must be verified by the runtime by checking that the stack
map's location is a Direct location type. The runtime can then
determine the alloca's relative location on the stack immediately after
compilation, or at any time thereafter. This differs from Register and
Indirect locations, because the runtime can only read the values in
those locations when execution reaches the instruction address of the
stack map.

llvm-svn: 195712
2013-11-26 02:03:25 +00:00
Cameron McInally c592e5251c Add an intrinsic for the SSE2 PAUSE instruction.
llvm-svn: 195697
2013-11-26 00:20:43 +00:00
Bill Wendling 9200bb08f9 Unrevert r195599 with testcase fix.
I'm not sure how it was checking for the wrong values...
PR18023.

llvm-svn: 195670
2013-11-25 18:05:22 +00:00
Amara Emerson f59125f5bb Revert r195599 as it broke the builds.
llvm-svn: 195636
2013-11-25 11:24:18 +00:00
Bill Wendling e3c48709ed Don't look past volatile loads.
A volatile load should block us from trying to coalesce stores.
PR18023

llvm-svn: 195599
2013-11-25 05:01:21 +00:00
Manman Ren d664bd7725 Debug Info: update testing cases to specify the debug info version number.
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.

Make tests more robust by removing hard-coded metadata numbers in CHECK lines.

llvm-svn: 195535
2013-11-23 01:16:29 +00:00
Manman Ren 409558f81e Debug Info: update testing cases to specify the debug info version number.
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.

llvm-svn: 195504
2013-11-22 21:49:45 +00:00
Jim Grosbach 860934a924 X86: Perform integer comparisons at i32 or larger.
Utilizing the 8 and 16 bit comparison instructions, even when an input can
be folded into the comparison instruction itself, is typically not worth it.
There are too many partial register stalls as a result, leading to significant
slowdowns. By always performing comparisons on at least 32-bit
registers, performance of the calculation chain leading to the
comparison improves. Continue to use the smaller comparisons when
minimizing size, as that allows better folding of loads into the
comparison instructions.

rdar://15386341

llvm-svn: 195496
2013-11-22 19:57:47 +00:00
Paul Robinson d89125a5d8 Teach ISel not to optimize 'optnone' functions (revised).
Improvements over r195317:
- Set/restore EnableFastISel flag instead of just running FastISel within
  SelectAllBasicBlocks; the flag is checked in various places, and
  FastISel won't run properly if those places don't do the right thing.
- Test looks for normal ISel versus FastISel behavior, and not
  something more subtle that doesn't work everywhere.

Based on work by Andrea Di Biagio.

llvm-svn: 195491
2013-11-22 19:11:24 +00:00
Andrew Trick 4a1abb7ab5 patchpoint: factor SD builder code for live vars. Plain stackmap also optimizes Constant values now.
llvm-svn: 195488
2013-11-22 19:07:36 +00:00
Michael Liao 02160d580b Fix PR18014
- When simplifying the mask generation for BLEND, check whether that mask is
  also consumed by other non-BLEND insns. If true, skip that simplification.

llvm-svn: 195476
2013-11-22 17:56:57 +00:00
Rafael Espindola 5a8e985ad3 Don't produce tail calls when the caller is x86_thiscallcc.
The callee will not pop the stack for us.

llvm-svn: 195467
2013-11-22 15:18:28 +00:00
Kostya Serebryany 4007009815 Revert r195318 as it causes miscompilation (PR18029)
llvm-svn: 195439
2013-11-22 10:30:39 +00:00
NAKAMURA Takumi 08be5afea4 Tweak 3 tests in llvm/test/CodeGen/X86 to add -mcpu=generic since r195383.
They failed on bdver2 buildslave.

FIXME: FileCheck-ize them.
llvm-svn: 195407
2013-11-22 02:28:04 +00:00
Ekaterina Romanova d5fa55470c SHLD/SHRD are VectorPath (microcode) instructions known to have poor latency on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or and lea which are directPath instructions. These alternative instructions not only have a lower latency but they also increase the decode bandwidth by allowing simultaneous decoding of a third directPath instruction.
AMD's processors family K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD's processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size.

It might be beneficial to disable this folding for some of the Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel.

llvm-svn: 195383
2013-11-21 23:21:26 +00:00
Bill Wendling 07787f8747 The basic problem is that some mainstream programs cannot deal with the way
clang optimizes tail calls, as in this example:

int foo(void);
int bar(void) {
 return foo();
}

where the call is transformed to:

  calll .L0$pb
.L0$pb:
  popl  %eax
.Ltmp0:
  addl  $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax
  movl  foo@GOT(%eax), %eax
  popl  %ebp
  jmpl  *%eax                   # TAILCALL

However, the GOT references must all be resolved at dlopen() time, and so this
approach cannot be used with lazy dynamic linking (e.g. using RTLD_LAZY), which
usually populates the PLT with stubs that perform the actual resolving.

This patch changes X86TargetLowering::LowerCall() to skip tail call
optimization, if the called function is a global or external symbol.

Patch by Dimitry Andric!

PR15086

llvm-svn: 195318
2013-11-21 07:04:30 +00:00
Benjamin Kramer c8160d6523 MachineBlockPlacement: Strengthen the source order bias when picking an exit block.
We now only allow breaking source order if the exit block frequency is
significantly higher than the other exit block. The actual bias is
currently under a flag so the best cut-off can be found; the flag
defaults to the old behavior. The idea is to get some benchmark coverage
over different values for the flag and pick the best one.

When we require the new frequency to be at least 20% higher than the old
frequency I see a 5% speedup on zlib's deflate when compressing a random
file on x86_64/westmere. Hal reported a small speedup on Fhourstones on
a BG/Q and no regressions in the test suite.

The test case is the full long_match function from zlib's deflate. I was
reluctant to add it for previous tweaks to branch probabilities because
it's large and potentially fragile, but changed my mind since it's an
important use case and more likely to break with all the current work
going into the PGO infrastructure.

Differential Revision: http://llvm-reviews.chandlerc.com/D2202

llvm-svn: 195265
2013-11-20 19:08:44 +00:00
Elena Demikhovsky e1f9bf054f AVX-512: Concat 4 128-bit vectors in one 512-bit vector.
llvm-svn: 195229
2013-11-20 09:10:40 +00:00
Cameron McInally d1cd0be6f3 Fix assembly operands for the SSE2 cvtsd2ss instruction.
llvm-svn: 195129
2013-11-19 14:36:00 +00:00
Andrew Trick 0ab5ba8c35 Use symbolic operands in the patchpoint folding routine and fix a spilling bug.
Fixes <rdar://15487687> [JS] AnyRegCC argument ends up being spilled

llvm-svn: 195094
2013-11-19 03:29:59 +00:00
Bill Wendling 1ead4a484a Testcase for PR17964
llvm-svn: 194961
2013-11-17 10:53:19 +00:00
Benjamin Kramer bb1dd73d3e DAGCombiner: Partially revert r192795, getNOT was fixed not to create illegal constants.
llvm-svn: 194959
2013-11-17 10:40:03 +00:00
Andrew Trick 10d5be4e6e Added a size field to the stack map record to handle subregister spills.
Implementing this on bigendian platforms could get strange. I added a
target hook, getStackSlotRange, per Jakob's recommendation to make
this as explicit as possible.

llvm-svn: 194942
2013-11-17 01:36:23 +00:00
Bob Wilson 9f3e6b25ee Avoid illegal integer promotion in fastisel
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow.  Results are different when:

    sext(a) + sext(b) != sext(a + b)

Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.

<rdar://problem/15292280>

Patch by Duncan Exon Smith!

llvm-svn: 194840
2013-11-15 19:09:27 +00:00
Cameron McInally ad41f1f693 Add AVX512 unmasked FMA intrinsics and support.
llvm-svn: 194824
2013-11-15 17:01:14 +00:00
Alexey Samsonov 07cdeb4a6c Redirect unused test case output to /dev/null
llvm-svn: 194798
2013-11-15 09:36:58 +00:00
Andrew Trick 4f0794fd47 Platform proof a test case.
llvm-svn: 194788
2013-11-15 05:52:56 +00:00
Matt Arsenault b03bd4d96b Add addrspacecast instruction.
Patch by Michele Scandale!

llvm-svn: 194760
2013-11-15 01:34:59 +00:00
Eric Christopher ad59015bf7 Simplify testcase.
llvm-svn: 194748
2013-11-14 23:43:10 +00:00
Rafael Espindola f04bb72b61 Add a triple and switch test to FileCheck.
On windows we don't print .weak for function definitions, so count was only
finding 1 'weak'.

llvm-svn: 194713
2013-11-14 17:12:32 +00:00
Rafael Espindola 4929301af4 Error if we see an alias to a declaration.
In ELF and COFF an alias is just another offset in a section. There is no way
to represent an alias to something in another file.

In MachO, the spec has the N_INDR type which should allow for exactly that, but
is not currently implemented. Given that it is specified but not implemented,
we error in codegen to avoid miscompiling but don't reject aliases to
declarations in the verifier to leave the option open of implementing it.

In the past we have used alias to declarations as a way of implementing
weakref, which is why it exists in some old tests which this patch updates.

llvm-svn: 194705
2013-11-14 13:58:06 +00:00
Elena Demikhovsky 0a74b7da35 AVX-512: Handled extractelement from mask vector;
Added VMOSHDUP/VMOVSLDUP shuffle instructions.

llvm-svn: 194691
2013-11-14 11:29:27 +00:00
Andrew Trick 561f2218e0 Minor extension to llvm.experimental.patchpoint: don't require a call.
If a null call target is provided, don't emit a dummy call. This
allows the runtime to reserve as little nop space as it needs without
the requirement of emitting a call.

llvm-svn: 194676
2013-11-14 06:54:10 +00:00
Rafael Espindola fe4e088dfb Don't mangle \n and "
There is nothing special about quotes and newlines from the object
file point of view, only the assembler has to worry about expanding
the \n and \".

This patch then removes the special handling from the Mangler.

llvm-svn: 194667
2013-11-14 06:05:49 +00:00
Rafael Espindola fdc88137f4 Remove AllowQuotesInName and friends from MCAsmInfo.
Accepting quotes is a property of an assembler, not of an object file. For
example, ELF can support any names for sections and symbols, but the gnu
assembler only accepts quotes in some contexts and llvm-mc in a few more.

LLVM should not produce different symbols based on a guess about which assembler
will be reading the code it is printing.

llvm-svn: 194575
2013-11-13 14:01:59 +00:00
Andrew Trick 5469ae8f21 Add a test case to verify that misusing anyregcc crashes as expected.
llvm-svn: 194553
2013-11-13 03:46:19 +00:00
Juergen Ributzka 34c652d34d SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
This patch reapplies r193676 with an additional fix for the Hexagon backend. The
SystemZ backend has already been fixed by r194148.

The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.

This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. Now the type
legalizer will split both VSELECT and SETCC.

This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.

Reviewed by Nadav

llvm-svn: 194542
2013-11-13 01:57:54 +00:00
Andrew Trick 0ef482ef02 Cleanup the stackmap operand folding code and fix a corner case.
I still don't know how to refer to the fixed operands symbolically. I
plan to look into it.

llvm-svn: 194529
2013-11-12 22:58:39 +00:00
Andrew Trick 3112a5e4c0 Simplify operand folding when rematerializing a load.
We already know how to fold a reload from a frameindex without
analyzing the load instruction. Generalize this to handle any
frameindex load. This streamlines the logic for rematerializing loads
from stack arguments. As a side effect, it allows stackmaps to record
a stack argument location without spilling it.

Verified no effect on codegen for llvm test-suite.

llvm-svn: 194497
2013-11-12 18:06:12 +00:00
Andrew Trick a28099fdd4 Fix the recently added anyregcc convention to handle spilled operands.
Fixes <rdar://15432754> [JS] Assertion: "Folded a def to a non-store!"

The primary purpose of anyregcc is to prevent a patchpoint's call
arguments and return value from being spilled. They must be available
in a register, although the calling convention does not pin the
register. It's up to the front end to avoid using this convention for
calls with more arguments than allocatable registers.

llvm-svn: 194428
2013-11-11 22:40:25 +00:00
Juergen Ributzka 87ed906b2e [Stackmap] Materialize the jump address within the patchpoint noop slide.
This patch moves the jump address materialization inside the noop slide. This
enables patching of the materialization itself or its complete removal. This
patch also adds the ability to define scratch registers that can be used safely
by the code called from the patchpoint intrinsic. At least one scratch register
is required, because that one is used for the materialization of the jump
address. This patch depends on D2009.

Differential Revision: http://llvm-reviews.chandlerc.com/D2074

Reviewed by Andy

llvm-svn: 194306
2013-11-09 01:51:33 +00:00
Juergen Ributzka 9969d3e6e8 [Stackmap] Add AnyReg calling convention support for patchpoint intrinsic.
The idea of the AnyReg Calling Convention is to provide the call arguments in
registers, but not to force them to be placed in a paticular order into a
specified set of registers. Instead it is up tp the register allocator to assign
any register as it sees fit. The same applies to the return value (if
applicable).

Differential Revision: http://llvm-reviews.chandlerc.com/D2009

Reviewed by Andy

llvm-svn: 194293
2013-11-08 23:28:16 +00:00
Andrew Trick 6664df12fb Slightly change the way stackmap and patchpoint intrinsics are lowered.
MorphNodeTo is not safe to call during DAG building. It eagerly
deletes dependent DAG nodes which invalidates the NodeMap. We could
expose a safe interface for morphing nodes, but I don't think it's
worth it. Just create a new MachineNode and replaceAllUsesWith.

My understaning of the SD design has been that we want to support
early target opcode selection. That isn't very well supported, but
generally works. It seems reasonable to rely on this feature even if
it isn't widely used.

llvm-svn: 194102
2013-11-05 22:44:04 +00:00
Eric Christopher 542c8d934d Check for both styles of clobbers, those produced by dragonegg and
those produced by clang for the inline asm bswap conversion.

Modified from a patch by Chris Smowton.

llvm-svn: 194016
2013-11-04 21:41:21 +00:00
Cameron McInally d80f7d34de Add support for AVX512 masked vector blend intrinsics.
llvm-svn: 194006
2013-11-04 19:14:56 +00:00
Elena Demikhovsky dacddb0bab AVX-512: added VPCONFLICT instruction and intrinsics,
added EVEX_KZ to tablegen

llvm-svn: 193959
2013-11-03 13:46:31 +00:00
Michael Liao b638d05ecb Fix PR17764
- When selecting BLEND from vselect, the operands need swapping as due to the
  difference between vselect and SSE/AVX's BLEND insn

llvm-svn: 193900
2013-11-02 00:10:02 +00:00
Andrew Trick f990411256 These test cases for experimental features are a bit too darwin-specific still. Use a triple.
llvm-svn: 193820
2013-10-31 22:46:51 +00:00
Andrew Trick a3a11dedca Add new calling convention for WebKit Java Script.
llvm-svn: 193812
2013-10-31 22:12:01 +00:00
Andrew Trick 153ebe6d2a Add support for stack map generation in the X86 backend.
Originally implemented by Lang Hames.

llvm-svn: 193811
2013-10-31 22:11:56 +00:00