Commit Graph

5840 Commits

Author SHA1 Message Date
Philip Reames f8f0933b48 Allow explicit spill slots to be specified for a gc.statepoint
This patch adds support for explicitly provided spill slots in the GC arguments of a gc.statepoint.  This is somewhat analogous to gcroot, but leverages the STATEPOINT MI node and StackMap infrastructure.  The motivation for this is:
1) The stack spilling code for gc.statepoints hasn't advanced as fast as I'd like.  One major option is to give up on doing spilling in the backend and do it at the IR level instead.  We'd give up the ability to have gc values in registers, but that's a minor cost in practice.  We are not neccessarily moving in that direction, but having the ability to prototype such a thing cheaply is interesting.
2) I want to port the gcroot lowering to use the statepoint infastructure.  Given the metadata printers for gcroot expect a fixed set of stack roots, it's easiest to just reuse the explicit stack slots and pass them directly to the underlying statepoint.  

I'm holding off on the documentation for the new feature until I'm reasonable sure this is going to stick around.

llvm-svn: 233356
2015-03-27 04:52:48 +00:00
Andrew Trick 5533adc117 Reintroduce the SelectionDAG scheduler test for r233351.
This test returns nonnative integer types which aren't supported on all targets.
The real issue with the SelectionDAG scheduler is with x86 EFLAGS.

llvm-svn: 233355
2015-03-27 04:42:52 +00:00
Duncan P. N. Exon Smith 219c8d3876 DebugInfo: Update testcases with invalid variables
Fix testcases whose variables are invalid.  I'm working on a patch that
adds `Verifier` checks for `MDLocalVariable` (and `MDGlobalVariable`),
and these failed because:

  - `scope:` fields need to point at `MDLocalScope` and can't be null.
  - `file:` fields need to point at `MDFile`.
  - `inlinedAt:` fields need to point at `MDLocation`.

llvm-svn: 233349
2015-03-27 01:58:34 +00:00
Andrea Di Biagio 8f7feec5fd [X86][FastIsel] Teach how to select vector load instructions.
This patch teaches fast-isel how to select 128-bit vector load instructions.
Added test CodeGen/X86/fast-isel-vecload.ll

Differential Revision: http://reviews.llvm.org/D8605

llvm-svn: 233270
2015-03-26 11:29:02 +00:00
Quentin Colombet 2c6e0597c6 [RegisterCoalescer] Add a rule to consider more profitable copies first when
those are in the same basic block.
The previous approach was the topological order of the basic block.

By default this rule is disabled.

Related to PR22768.

llvm-svn: 233241
2015-03-26 01:01:48 +00:00
Simon Pilgrim 09f3ff9a0a [DAGCombiner] Add support for TRUNCATE + FP_EXTEND vector constant folding
This patch adds supports for the vector constant folding of TRUNCATE and FP_EXTEND instructions and tidies up the SINT_TO_FP and UINT_TO_FP instructions to match.

It also moves the vector constant folding for the FNEG and FABS instructions to use the DAG.getNode() functionality like the other unary instructions.

Differential Revision: http://reviews.llvm.org/D8593

llvm-svn: 233224
2015-03-25 22:30:31 +00:00
Sanjay Patel 2f8f019daf [X86, AVX] improve insertion into zero element of 256-bit vector
This patch allows AVX blend instructions to handle insertion into the low
element of a 256-bit vector for the appropriate data types.

For f32, instead of:

   vblendps	$1, %xmm1, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[1,2,3]
   vblendps	$15, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7]

we get:

   vblendps	$1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3,4,5,6,7]

For f64, instead of:

   vmovsd	%xmm1, %xmm0, %xmm1     ## xmm1 = xmm1[0],xmm0[1]
   vblendpd	$3, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1],ymm0[2,3]

we get:

   vblendpd	$1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3]

For the hardware-neglected integer data types, I left a TODO comment in the
code and added regression tests for a follow-on patch.

Differential Revision: http://reviews.llvm.org/D8609

llvm-svn: 233199
2015-03-25 17:36:01 +00:00
Sanjay Patel defd9b9b4c use update_llc_test_checks.py to tighten checking in these tests
1. There were no CHECK-LABELs, so we could match instructions from the wrong function.
2. The use of zero operands meant multiple xor instructions could match some CHECKs.
3. The test was over-specified to need a Sandybridge CPU and Darwin triple.

llvm-svn: 233198
2015-03-25 17:34:11 +00:00
Andrea Di Biagio 07a26d6b2f [X86] Simplify check lines in tests. No functional change.
Also, removed unused check lines from test atomic6432.ll.

llvm-svn: 233181
2015-03-25 11:44:19 +00:00
Paul Robinson 284f0451cf 'optnone' should not disable DAG combiner.
Reverts the code change from r221168 and the relevant test.
It was a mistake to disable the combiner, and based on the ultimate
definition of 'optnone' we shouldn't have considered the test case
as failing in the first place.

llvm-svn: 233153
2015-03-25 00:10:24 +00:00
Reid Kleckner 11470c48d0 X86: Fix frameescape when not using an FP
We can't use TargetFrameLowering::getFrameIndexOffset directly, because
Win64 really wants the offset from the stack pointer at the end of the
prologue. Instead, use X86FrameLowering::getFrameIndexOffsetFromSP(),
which is a pretty close approximiation of that. It fails to handle cases
with interestingly large stack alignments, which is pretty uncommon on
Win64 and is TODO.

llvm-svn: 233137
2015-03-24 23:46:01 +00:00
Sanjay Patel 99d246d7d7 [X86, AVX] recognize shufflevector with zero input as a vperm2 (PR22984)
vperm2x128 instructions have the special ability (aka free hardware capability)
to shuffle zero values into a vector.

This patch recognizes that type of shuffle and generates the appropriate
control byte.

https://llvm.org/bugs/show_bug.cgi?id=22984

Differential Revision: http://reviews.llvm.org/D8563

llvm-svn: 233100
2015-03-24 19:19:07 +00:00
Simon Pilgrim 481f4146cd [SelectionDAG] Fixed issue with uitofp vector constant folding being treated as sitofp
While the uitofp scalar constant folding treats an integer as an unsigned value (from lang ref):

%X = sitofp i8 -1 to double ; yields double:-1.0
%Y = uitofp i8 -1 to double ; yields double:255.0

The vector constant folding was always using sitofp:

%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>

This patch fixes this so that the correct opcode is used for sitofp and uitofp.

%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double 255.0, double 255.0>

Differential Revision: http://reviews.llvm.org/D8560

llvm-svn: 233033
2015-03-23 22:44:55 +00:00
Duncan P. N. Exon Smith 9b9cc2dad4 DebugInfo: Overload get() in DIDescriptor subclasses
Continue to simplify the `DIDescriptor` subclasses, so that they behave
more like raw pointers.  Remove `getRaw()`, replace it with an
overloaded `get()`, and overload the arrow and cast operators.  Two
testcases started to crash on the arrow operators with this change
because of `scope:` references that weren't real scopes.  I fixed them.
Soon I'll add verifier checks for them too.

This also adds explicit dereference operators.  Previously, the builtin
dereference against `operator MDNode *()` would have worked, but now the
builtins are ambiguous.

llvm-svn: 233030
2015-03-23 21:54:07 +00:00
Simon Pilgrim 307cb8fe5d Tidied up vec_zero_cse.ll test. NFCI.
Added target triple and refactored the CHECKs to be per function.

llvm-svn: 232894
2015-03-21 14:05:12 +00:00
Eric Christopher c5a85af3b2 Cache the Function dependent subtarget on the MachineFunction.
As preparation for removing the getSubtargetImpl() call from
TargetMachine go ahead and flip the switch on caching the function
dependent subtarget and remove the bare getSubtargetImpl call
from the X86 port. As part of this add a few tests that show we
can generate code and assemble on X86 based on features/cpu on
the Function.

llvm-svn: 232879
2015-03-21 03:13:10 +00:00
Sanjay Patel c88f724fed [X86] Prefer blendps over insertps codegen for one special case
With this patch, for this one exact case, we'll generate:

  blendps %xmm0, %xmm1, $1

instead of:

  insertps %xmm0, %xmm1, $0

If there's a memory operand available for load folding and we're
optimizing for size, we'll still generate the insertps.

The detailed performance data motivation for this may be found in D7866; 
in summary, blendps has 2-3x throughput vs. insertps on widely used chips.

Differential Revision: http://reviews.llvm.org/D8332

llvm-svn: 232850
2015-03-20 21:19:52 +00:00
Daniel Jasper 214997c63b [MBP] Don't outline short optional branches
With the option -outline-optional-branches, LLVM will place optional
branches out of line (more details on r231230).

With this patch, this is not done for short optional branches. A short
optional branch is a branch containing a single block with an
instruction count below a certain threshold (defaulting to 3). Still
everything is guarded under -outline-optional-branches).

Outlining a short branch can't significantly improve code locality. It
can however decrease performance because of the additional jmp and in
cases where the optional branch is hot. This fixes a compile time
regression I have observed in a benchmark.

Review: http://reviews.llvm.org/D8108
llvm-svn: 232802
2015-03-20 10:00:37 +00:00
Sanjay Patel d5c2d287f9 [X86, AVX] use blends instead of insert128 with index 0
Another case of x86-specific shuffle strength reduction:
avoid generating insert*128 instructions with index 0 because
they are slower than their non-lane-changing blend equivalents.

Shuffle lowering already catches most of these cases, but
the zero vector case and some other paths such as in the
modified test in vector-shuffle-256-v32.ll were getting
through.

Differential Revision: http://reviews.llvm.org/D8366

llvm-svn: 232773
2015-03-19 22:29:40 +00:00
Simon Pilgrim cf1d7df2e3 Fixed failing test due to missing target triple causing different results on different buildbots.
llvm-svn: 232685
2015-03-18 22:51:45 +00:00
Simon Pilgrim 5ec5c9cafe [X86][SSE] Avoid scalarization of v2i64 vector shifts (REAPPLIED)
Fixed broken tests.

Differential Revision: http://reviews.llvm.org/D8416

llvm-svn: 232682
2015-03-18 22:18:51 +00:00
Eric Christopher 050f590a0c Revert "[X86][SSE] Avoid scalarization of v2i64 vector shifts" as it
appears to have broken tests/bots.

This reverts commit r232660.

llvm-svn: 232670
2015-03-18 21:01:00 +00:00
Simon Pilgrim 5c837edc2a [X86][SSE] Avoid scalarization of v2i64 vector shifts
Currently v2i64 vectors shifts (non-equal shift amounts) are scalarized, costing 4 x extract, 2 x x86-shifts and 2 x insert instructions - and it gets even more awkward on 32-bit targets.

This patch separately shifts the vector by both shift amounts and then shuffles the partial results back together, costing 2 x shuffles and 2 x sse-shifts instructions (+ 2 movs on pre-AVX hardware).

Note - this patch only improves the SHL / LSHR logical shifts as only these are supported in SSE hardware.

Differential Revision: http://reviews.llvm.org/D8416

llvm-svn: 232660
2015-03-18 19:35:31 +00:00
Matthias Braun 3b36533112 TableGen: Fix register class lane masks being too conservative.
When calculating the lanemask of a register class we have to include the
masks of subregisters supported by any of the class members, not just
the ones supported by all class members.

This fixes problems when coalescing towards a subclass with additional
subregisters available.

The attached testcase works fine as is, but does crash if you enable
subregister liveness on x86 without this change applied.

llvm-svn: 232652
2015-03-18 17:56:09 +00:00
Sanjay Patel e4cdb8fcb3 Use utils/update_llc_test_checks.py to update all CHECKs
The checks here were so vague that we could nuke intrinsics
from existence and still pass the test because we'd match
the function name.

llvm-svn: 232647
2015-03-18 16:38:44 +00:00
Sanjay Patel e90d0387d9 fixed to test features, not CPU model
The 'vmovntdq' was only passing due to a fluke in
SandyBridge codegen that splits 32-byte stores in half, 
but that meant that the test was not correctly checking
for the 32-byte store that we thought we were generating.

The lax checking in this file will be addressed in
another commit. There are bigger problems here.

llvm-svn: 232644
2015-03-18 16:07:10 +00:00
Daniel Jasper 9ec834036e Change test to accept an additional critical edge split.
The two hot blocks are right next to each other and I verified that
there is no performance regression by compressing/uncompressing some
files with a minigzip built with the different options.

llvm-svn: 232629
2015-03-18 12:45:45 +00:00
Josh Magee 89f5dec0bb Add testcases for BEXTR.
These BEXTR cases are a check for the 64-bit load form and two negative cases where the bitrange is non-contiguous.  From a private patch equivalent to r189742/PR17028.

llvm-svn: 232580
2015-03-18 01:34:06 +00:00
David Majnemer e48237df95 DAGCombiner: fold (xor (shl 1, x), -1) -> (rotl ~1, x)
Targets which provide a rotate make it possible to replace a sequence of
(XOR (SHL 1, x), -1) with (ROTL ~1, x).  This saves an instruction on
architectures like X86 and POWER(64).

Differential Revision: http://reviews.llvm.org/D8350

llvm-svn: 232572
2015-03-18 00:03:36 +00:00
David Majnemer 7db449a6e7 COFF: Let globals with private linkage reside in their own section
COFF COMDATs (for selection kinds other than 'select any') require at
least one non-section symbol in the symbol table.
Satisfy this by morally enhancing the linkage from private to internal.

Differential Revision: http://reviews.llvm.org/D8394

llvm-svn: 232570
2015-03-17 23:54:51 +00:00
David Majnemer 63b1d99943 Revert "COFF: Let globals with private linkage reside in their own section"
This reverts commit r232539.  This was committed accidently.

llvm-svn: 232543
2015-03-17 20:41:11 +00:00
David Majnemer 47e3842982 COFF: Let globals with private linkage reside in their own section
Summary:
COFF COMDATs (for selection kinds other than 'select any') require at
least one non-section symbol in the symbol table.
Satisfy this by morally enhancing the linkage from private to internal.

Reviewers: rafael

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8374

llvm-svn: 232539
2015-03-17 20:39:25 +00:00
Rafael Espindola ba41539548 Replace a use of GetTempSymbol with createTempSymbol.
This is cleaner and avoids a crash in a corner case.

llvm-svn: 232471
2015-03-17 12:54:04 +00:00
David Majnemer a20616ec10 CodeGen: @llvm.eh.typeid.for replaced @llvm.eh.typeid.for.i32
We removed @llvm.eh.typeid.for.i32 and replaced it with
@llvm.eh.typeid.for quite some time ago.  Fix up some test cases which
never got updated.

llvm-svn: 232421
2015-03-16 21:36:38 +00:00
Duncan P. N. Exon Smith b786572d7c DebugInfo: Fix testcases that fail -verify-debug-info=true
As part of PR22777, fix testcases that fail the debug info verifier.
The changes fall into the following categories:

  - Empty `filename:` fields in `MDFile`s.  Compile units and some types
    require non-empty filenames.  A number of testcases have empty
    filenames, probably due to hand-reduction of testcases.
  - Not-quite empty arrays: `!{i32 0}`.  This used to be equivalent in
    the debug info schema to `!{}`.  They cause problems for
    `!MDSubroutineType`'s `types:` array, since it requires all operands
    to be valid types.  (Note that `!{null}` is the correct type array
    for functions that take no arguments and return `void`.)
  - Significantly bitrotted testcases.  Nodes got left behind a few
    upgrades ago because of missing or invalid tags.

llvm-svn: 232415
2015-03-16 21:10:12 +00:00
Sanjay Patel 11ce908e4c fixed to test feature, not CPU
llvm-svn: 232398
2015-03-16 18:24:28 +00:00
Sanjay Patel a8ec726bb6 add CHECK-LABELs for more reliable testing
llvm-svn: 232391
2015-03-16 17:59:07 +00:00
Sanjay Patel 43ec821cb8 fixed to test feature, not CPU; removed unnecessary declaration
llvm-svn: 232387
2015-03-16 17:01:34 +00:00
Rafael Espindola 933f51af54 Use the i8 immediate cmp instructions when possible.
llvm-svn: 232378
2015-03-16 14:25:08 +00:00
Simon Pilgrim 253fbb17e3 [SSE} Added tests for float4-float3 conversions (PR11580)
llvm-svn: 232324
2015-03-15 16:19:15 +00:00
Simon Pilgrim ece7475951 Simplified some stack folding tests.
Replaced explicit pmovzx* intrinsic tests with general shuffles

llvm-svn: 232286
2015-03-14 23:16:43 +00:00
Daniel Jasper 15e6954aea [MachineLICM] First steps of sinking GEPs near calls.
Specifically, if there are copy-like instructions in the loop header
they are moved into the loop close to their uses. This reduces the live
intervals of the values and can avoid register spills.

This is working towards a fix for http://llvm.org/PR22230.
Review: http://reviews.llvm.org/D7259

Next steps:
- Find a better cost model (which non-copy instructions should be sunk?)
- Make this dependent on register pressure

llvm-svn: 232262
2015-03-14 10:58:38 +00:00
Ahmed Bougacha 082c5c707a Add a bunch of CHECK missing colons in tests. NFC.
Some wouldn't pass;  fixed most, the rest will be fixed separately.

llvm-svn: 232239
2015-03-14 01:43:57 +00:00
Rafael Espindola e7ce9ec398 Use add32ri8 and friends on fast isel.
This fixes pr22854.

The core issue on the bug is that there are multiple instructions that
print the same in assembly. In fact, there doesn't seem to be any
syntax for specifying that a constant that fits in 8 bits should use a 32 bit
immediate.

The attached patch changes fast isel to consider i16immSExt8,
i32immSExt8, and i64immSExt8. They were disabled because fastisel didn’t know
to call the predicate back in the day.

llvm-svn: 232223
2015-03-13 22:18:18 +00:00
David Blaikie f72d05bc7b [opaque pointer type] Add textual IR support for explicit type parameter to gep operator
Similar to gep (r230786) and load (r230794) changes.

Similar migration script can be used to update test cases, which
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manually changes in Clang.

(this script will read the contents of stdin and massage it into stdout
- wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases)

import fileinput
import sys
import re

rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)

def conv(match):
  line = match.group(1)
  line += match.group(4)
  line += ", "
  line += match.group(2)
  return line

line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
  sys.stdout.write(line[off:match.start()])
  sys.stdout.write(conv(match))
  off = match.end()
sys.stdout.write(line[off:])

llvm-svn: 232184
2015-03-13 18:20:45 +00:00
Andrea Di Biagio 510feca1b8 [X86][AVX] Fix wrong lowering of v4x64 shuffles into concat_vector plus extract_subvector nodes.
This patch fixes a bug in the shuffle lowering logic implemented by function
'lowerV2X128VectorShuffle'.

The are few cases where function 'lowerV2X128VectorShuffle' wrongly expands a
shuffle of two v4X64 vectors into a CONCAT_VECTORS of two EXTRACT_SUBVECTOR
nodes. The problematic expansion only occurs when the shuffle mask M has an
'undef' element at position 2, and M is equivalent to mask <0,1,4,5>.
In that case, the algorithm propagates the wrong vector to one of the two
new EXTRACT_SUBVECTOR nodes.

Example:
;;
define <4 x double> @test(<4 x double> %A, <4 x double> %B) {
entry:
  %0 = shufflevector <4 x double> %A, <4 x double> %B, <4 x i32><i32 undef, i32 1, i32 undef, i32 5>
  ret <4 x double> %0
}
;;

Before this patch, llc (-mattr=+avx) generated:
  vinsertf128 $1, %xmm0, %ymm0, %ymm0

With this patch, llc correctly generates:
  vinsertf128 $1, %xmm1, %ymm0, %ymm0

Added test lower-vec-shuffle-bug.ll

Differential Revision: http://reviews.llvm.org/D8259

llvm-svn: 232179
2015-03-13 17:29:49 +00:00
Sanjay Patel 4339abe66f [X86, AVX2] Replace inserti128 and extracti128 intrinsics with generic shuffles
This should complete the job started in r231794 and continued in r232045:
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.

AVX2 introduced proper integer variants of the hacked integer insert/extract
C intrinsics that were created for this same functionality with AVX1.

This should complete the removal of insert/extract128 intrinsics.

The Clang precursor patch for this change was checked in at r232109.

llvm-svn: 232120
2015-03-12 23:16:18 +00:00
Simon Pilgrim fb53eded5f Removed useless palignr test - we don't actually provide a llvm.x86.ssse3.palign.r.128 intrinsic
Differential Revision: http://reviews.llvm.org/D8302

llvm-svn: 232108
2015-03-12 21:42:03 +00:00
Quentin Colombet f59b2d034c [X86] Fix a regression introduced by r223641.
The permps and permd instructions have their operands swapped compared to the
intrinsic definition. Therefore, they do not fall into the INTR_TYPE_2OP
category.

I did not create a new category for those two, as they are the only one AFAICT
in that case.

<rdar://problem/20108262>

llvm-svn: 232085
2015-03-12 19:34:12 +00:00
Andrea Di Biagio de2fb00a16 [X86] Fix wrong target specific combine on SETCC nodes.
Part of the folding logic implemented by function 'PerformISDSETCCCombine'
only worked under the assumption that the condition code in input could have
been either SETNE or SETEQ.
Unfortunately that assumption was incorrect, and in some cases the algorithm
ended up incorrectly folding SETCC nodes.

The incorrect folding only affected SETCC dag nodes where:
 - one of the operands was a build_vector of all zeroes;
 - the other operand was a SIGN_EXTEND from a vector of MVT:i1 elements;
 - the condition code was neither SETNE nor SETEQ.

Example:
  (setcc (v4i32 (sign_extend v4i1:%A)), (v4i32 VectorOfAllZeroes), setge)

Before this patch, the entire dag node sequence from the example was
incorrectly folded to node %A.

With this patch, the dag node sequence is folded to a
  (xor %A, (v4i1 VectorOfAllOnes)).

Added test setcc-combine.ll.

Thanks to Greg Bedwell for spotting this issue.

llvm-svn: 232046
2015-03-12 15:16:58 +00:00