Commit Graph

5554 Commits

Author SHA1 Message Date
Simon Pilgrim 0238b96c06 [X86] Force fp stack folding tests to keep to specific domain.
General boolean instructions (AND, ANDN, OR, XOR) need to use a specific domain instruction (and not just the default).

llvm-svn: 228495
2015-02-07 16:14:55 +00:00
Simon Pilgrim 947ce78d49 [X86][AVX2] More AVX2 integer stack folding tests.
llvm-svn: 228494
2015-02-07 16:07:27 +00:00
David Majnemer 5614ea9aae MC: Emit COFF section flags in the "proper" order
COFF section flags are not idempotent:
  'rd' will make a read-write section because 'd' implies write
  'dr' will make a read-only section because 'r' disables write

llvm-svn: 228490
2015-02-07 08:26:40 +00:00
Simon Pilgrim 76cb85a6c7 [X86][AVX2] Begun adding AVX2 integer stack folding tests.
llvm-svn: 228462
2015-02-06 23:12:15 +00:00
Reid Kleckner 526ec29370 Don't dllexport declarations
Fixes PR22488

llvm-svn: 228411
2015-02-06 17:59:49 +00:00
Matthias Braun 5cfba2f573 X86: Test cleanup
Use FileCheck, make it more consistent and do not rely on unoptimized
or(cmp,cmp) getting combined for max to be matched.

llvm-svn: 228361
2015-02-05 23:52:12 +00:00
Ahmed Bougacha e892d13d90 [CodeGen] Add hook/combine to form vector extloads, enabled on X86.
The combine that forms extloads used to be disabled on vector types,
because "None of the supported targets knows how to perform load and
sign extend on vectors in one instruction."

That's not entirely true, since at least SSE4.1 X86 knows how to do
those sextloads/zextloads (with PMOVS/ZX).
But there are several aspects to getting this right.
First, vector extloads are controlled by a profitability callback.
For instance, on ARM, several instructions have folded extload forms,
so it's not always beneficial to create an extload node (and trying to
match extloads is a whole 'nother can of worms).

The interesting optimization enables folding of s/zextloads to illegal
(splittable) vector types, expanding them into smaller legal extloads.

It's not ideal (it introduces some legalization-like behavior in the
combine) but it's better than the obvious alternative: form illegal
extloads, and later try to split them up.  If you do that, you might
generate extloads that can't be split up, but have a valid ext+load
expansion.  At vector-op legalization time, it's too late to generate
this kind of code, so you end up forced to scalarize. It's better to
just avoid creating egregiously illegal nodes.

This optimization is enabled unconditionally on X86.

Note that the splitting combine is happy with "custom" extloads. As
is, this bypasses the actual custom lowering, and just unrolls the
extload. But from what I've seen, this is still much better than the
current custom lowering, which does some kind of unrolling at the end
anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
FIXME).

Also note that the existing combine that forms extloads is now also
enabled on legal vectors.  This doesn't have a big effect on X86
(because sext+load is usually combined to sext_inreg+aextload).
On ARM it fires on some rare occasions; that's for a separate commit.

Differential Revision: http://reviews.llvm.org/D6904

llvm-svn: 228325
2015-02-05 18:31:02 +00:00
Andrew Trick 7fc4583eda X86 ABI fix for return values > 24 bytes.
The return value's address must be returned in %rax.
i.e. the callee needs to copy the sret argument (%rdi)
into the return value (%rax).

This probably won't manifest as a bug when the caller is LLVM-compiled
code. But it is an ABI guarantee and tools expect it.

llvm-svn: 228321
2015-02-05 18:09:05 +00:00
Bruno Cardoso Lopes ab9ae87623 [X86][MMX] Handle i32->mmx conversion using movd
Implement a BITCAST dag combine to transform i32->mmx conversion patterns
into a X86 specific node (MMX_MOVW2D) and guarantee that moves between
i32 and x86mmx are better handled, i.e., don't use store-load to do the
conversion..

llvm-svn: 228293
2015-02-05 13:23:07 +00:00
Bruno Cardoso Lopes cc6089d2e0 [X86][MMX] Add several bitcast tests
Avoid regression in previously supported MMX code by adding different
combinations of tests which exercise MMX bitcasts. Small improvements
to these patterns should come next.

llvm-svn: 228292
2015-02-05 13:22:57 +00:00
Rafael Espindola a092f17580 Don' try to make sections in comdats SHF_MERGE.
Parts of llvm were not expecting it and we wouldn't print
the entity size of the section.

Given what comdats are used for, having SHF_MERGE sections would be
just a small improvement, so just disable it for now.

Fixes pr22463.

llvm-svn: 228196
2015-02-04 21:27:24 +00:00
Michael Kuperstein cd63c5fa73 Fixes a bug in vector load legalization that confused bits and bytes.
Differential Revision: http://reviews.llvm.org/D7400

llvm-svn: 228168
2015-02-04 18:54:01 +00:00
Chandler Carruth 4d31f58c88 [x86] Give movss and movsd execution domains in the x86 backend.
This associates movss and movsd with the packed single and packed double
execution domains (resp.). While this is largely cosmetic, as we now
don't have weird ping-pong-ing between single and double precision, it
is also useful because it avoids the domain fixing algorithm from seeing
domain breaks that don't actually exist. It will also be much more
important if we have an execution domain default other than packed
single, as that would cause us to mix movss and movsd with integer
vector code on a regular basis, a very bad mixture.

llvm-svn: 228135
2015-02-04 10:58:53 +00:00
Chandler Carruth 78c8dcd9d3 [x86] Remove a low-value test that was just checking how we cleared
a register. We have lots of tests covering this.

llvm-svn: 228133
2015-02-04 10:47:34 +00:00
Chandler Carruth bb525e336b [x86] Mechanically update a bunch of tests' check lines using the latest
version of the script.

Changes include:
- Using the VEX prefix
- Skipping more detail when we have useful shuffle comments to match
- Matching more shuffle comments that have been added to the printer
  (yay!)
- Matching the destination registers of some AVX instructions
- Stripping trailing whitespace that crept in
- Fixing indentation issues

Nothing interesting going on here. I'm just trying really hard to ensure
these changes don't show up in the diffs with actual changes to the
backend.

llvm-svn: 228132
2015-02-04 10:46:53 +00:00
Chandler Carruth 22b1525ae8 [x86] Include the destination register in the check-lines for AVX
instructions.

No actual change here.

llvm-svn: 228127
2015-02-04 09:18:27 +00:00
Chandler Carruth 18ba596609 [x86] Add some tests I missed in the prior commit to cover blends with
zero for v8i16 as well.

These exhibit the same domain badness, but also exhibit other weaknesses
in our blend lowering. More fixes to come.

llvm-svn: 228126
2015-02-04 09:15:46 +00:00
Chandler Carruth 024cf8efd7 [x86] Start to introduce bit-masking based blend lowering.
This is the simplest form of bit-math based blending which only fires
when we are blending with zero and is relatively profitable. I've only
enabled this path on very specific lowering strategies. I'm planning to
widen its applicability in subsequent patches, but so far you'll notice
that even though we get fewer shufps instructions, we *still* do the bit
math in the FP execution port. I'm looking into why this is still
happening.

llvm-svn: 228124
2015-02-04 09:06:05 +00:00
Chandler Carruth 872d80e7a4 [x86] Add tests for blends-with-zero on 4-element vectors.
llvm-svn: 228122
2015-02-04 09:05:58 +00:00
Chandler Carruth abd09a1f35 [x86] Refresh the checks of a number of tests using
update_llc_test_checks.py.

The exact format of the checks has changed over time. This includes
different indenting rules, new shuffle comments that have been added,
and more operand hiding behind regular expressions.

No functional change to the tests are expected here, but this will make
subsequent patches have a clean diff as they change shuffle lowering.

llvm-svn: 228097
2015-02-04 00:58:42 +00:00
Chandler Carruth abde67eb1c [x86] Switch to using the long '--check-prefix' form which the
update_llc_test_checks.py script uses, and refresh the checks in this
test.

No functionality changed here, just bringing this test up to work with
automated updates using the python script.

llvm-svn: 228096
2015-02-04 00:58:40 +00:00
Chandler Carruth 52332dc620 [x86] Port this test to use utils/update_llc_test_checks.py.
This will make it easy to update as I change some parts of the X86
backend, makes it more clear what instruction differences are
introduced, and I find it makes it a bit easier to read as well.

llvm-svn: 228095
2015-02-04 00:58:37 +00:00
Sanjay Patel b82b8d6b84 improved CHECK
llvm-svn: 228086
2015-02-04 00:24:06 +00:00
Simon Pilgrim 46cd4f7400 [X86][SSE] psrl(w/d/q) and psll(w/d/q) bit shifts for SSE2
Patch to match cases where shuffle masks can be reduced to bit shifts. Similar to byte shift shuffle matching from D5699.

Differential Revision: http://reviews.llvm.org/D6649

llvm-svn: 228047
2015-02-03 21:58:29 +00:00
Chandler Carruth 1fff318a41 [x86] Add two truly horrific test cases for the new vector shuffle
lowering. I'm prepping patches to improve these, and this will let the
delta of those patches show the improvement. =]

llvm-svn: 228044
2015-02-03 21:56:28 +00:00
Chandler Carruth 4ce669d91c [x86] Update the indent and layout of some tests in this file. NFC
This is just to remove voise from using the update_llc_test_checks
script.

llvm-svn: 228043
2015-02-03 21:56:24 +00:00
Chandler Carruth a4a77ed59e [x86] Tweak my update script to use test case function names starting
with 'stress' to indicate that the specific output isn't interesting and
relax them to only check the last instruction (a ret).

I've updated the one test case that really uses this to name the one
'stress_test' which was actually producing output we can directly check.
With this, the script doesn't introduce noise when run over the v16 test
file.

llvm-svn: 228033
2015-02-03 21:26:45 +00:00
Simon Pilgrim d9885856e6 [X86][SSE] Added general integer shuffle matching for MOVQ instruction
This patch adds general shuffle pattern matching for the MOVQ zero-extend instruction (copy lower 64bits, zero upper) for all 128-bit integer vectors, it is added as a fallback test in lowerVectorShuffleAsZeroOrAnyExtend.

llvm-svn: 228022
2015-02-03 20:09:18 +00:00
Simon Pilgrim 6544f815b3 [X86][AVX2] Enabled shuffle matching for the AVX2 zero extension (128bit -> 256bit) vpmovzx* instructions.
Differential Revision: http://reviews.llvm.org/D7251

llvm-svn: 228014
2015-02-03 19:34:09 +00:00
Rafael Espindola 8d911b4419 Fix typo in test/CodeGen/X86/sibcall.ll (pr22331).
llvm-svn: 228011
2015-02-03 19:20:26 +00:00
Sanjay Patel b7d5628784 Merge consecutive 16-byte loads into one 32-byte load (PR22329)
This patch detects consecutive vector loads using the existing 
EltsFromConsecutiveLoads() logic. This fixes:
http://llvm.org/bugs/show_bug.cgi?id=22329

This patch effectively reverts the tablegen additions of D6492 / 
http://reviews.llvm.org/rL224344 ...which in hindsight were a horrible hack.

The test cases that were added with that patch are simply modified to load
from varying offsets of a base pointer. These loads did not match the existing
tablegen patterns.

A happy side effect of doing this optimization earlier is that we can now fold
the load into a math op where possible; this is shown in some of the updated
checks in the test file.

Differential Revision: http://reviews.llvm.org/D7303

llvm-svn: 228006
2015-02-03 18:54:00 +00:00
Sanjay Patel ffd039bde1 Fix program crashes due to alignment exceptions generated for SSE memop instructions (PR22371).
r224330 introduced a bug by misinterpreting the "FeatureVectorUAMem" bit.
The commit log says that change did not affect anything, but that's not correct.
That change allowed SSE instructions to have unaligned mem operands folded into
math ops, and that's not allowed in the default specification for any SSE variant. 

The bug is exposed when compiling for an AVX-capable CPU that had this feature
flag but without enabling AVX codegen. Another mistake in r224330 was not adding
the feature flag to all AVX CPUs; the AMD chips were excluded.

This is part of the fix for PR22371 ( http://llvm.org/bugs/show_bug.cgi?id=22371 ).

This feature bit is SSE-specific, so I've renamed it to "FeatureSSEUnalignedMem".
Changed the existing test case for the feature bit to reflect the new name and
renamed the test file itself to better reflect the feature.
Added runs to fold-vex.ll to check for the failing codegen.

Note that the feature bit is not set by default on any CPU because it may require a
configuration register setting to enable the enhanced unaligned behavior.

llvm-svn: 227983
2015-02-03 17:13:04 +00:00
Sanjay Patel a4276fb294 Improve test to actually check for a folded load.
This test was checking for lack of a "movaps" (an aligned load)
rather than a "movups" (an unaligned load). It also included
a store which complicated the checking.

Add specific CPU runs to prevent subtarget feature flag overrides
from inhibiting this optimization.

llvm-svn: 227972
2015-02-03 15:37:18 +00:00
Bruno Cardoso Lopes 077774b820 [X86][MMX] Improve transfer from mmx to i32
Improve EXTRACT_VECTOR_ELT DAG combine to catch conversion patterns
between x86mmx and i32 with more layers of indirection.

Before:
  movq2dq %mm0, %xmm0
  movd %xmm0, %eax
After:
  movd %mm0, %eax

llvm-svn: 227969
2015-02-03 14:46:49 +00:00
Alex Rosenberg bacd479a5d Revert part of r227437 as it was unnecessary. Thanks to echristo for
pointing this out.

llvm-svn: 227897
2015-02-02 23:58:54 +00:00
Bruno Cardoso Lopes f7410e5292 [X86][MMX] Add tests for MMX extract element
LLVM ToT produces poor MMX code compared to 3.5. However, part of the previous
functionality can be achieved by using -x86-experimental-vector-widening-legalization.
Add tests to be sure we don't regress again.

llvm-svn: 227869
2015-02-02 22:00:48 +00:00
Bruno Cardoso Lopes e4716e65b3 [X86][MMX] Cleanup shuffle, bitcast and insert element tests
- Merge MMX arg passing test files
- Merge MMX bitcast, insert elt and shuffle tests

llvm-svn: 227867
2015-02-02 21:56:11 +00:00
Sanjay Patel 122a7a7098 fix typo
llvm-svn: 227815
2015-02-02 17:47:30 +00:00
Michael Kuperstein 13fbd45263 [X86] Convert esp-relative movs of function arguments to pushes, step 2
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a 
reserved call frame), and perform rudimentary call folding. It still doesn't 
have a heuristic, so it is enabled only for optsize/minsize, with stack 
alignment <= 8, where it ought to be a fairly clear win.

(Re-commit of r227728)

Differential Revision: http://reviews.llvm.org/D6789

llvm-svn: 227752
2015-02-01 16:56:04 +00:00
Michael Kuperstein e86aa9a8a4 Revert r227728 due to bad line endings.
llvm-svn: 227746
2015-02-01 16:15:07 +00:00
Michael Kuperstein bd57186c76 [X86] Convert esp-relative movs of function arguments to pushes, step 2
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a 
reserved call frame), and perform rudimentary call folding. It still doesn't 
have a heuristic, so it is enabled only for optsize/minsize, with stack 
alignment <= 8, where it ought to be a fairly clear win.

Differential Revision: http://reviews.llvm.org/D6789

llvm-svn: 227728
2015-02-01 11:44:44 +00:00
Elena Demikhovsky 534a99d878 AVX2: Added 2 more tests for gather intrinsics.
llvm-svn: 227718
2015-02-01 08:52:15 +00:00
Simon Pilgrim 9c76b47469 [X86][SSE] Shuffle mask decode support for zero extend, scalar float/double moves and integer load instructions
This patch adds shuffle mask decodes for integer zero extends (pmovzx** and movq xmm,xmm) and scalar float/double loads/moves (movss/movsd).

Also adds shuffle mask decodes for integer loads (movd/movq).

Differential Revision: http://reviews.llvm.org/D7228

llvm-svn: 227688
2015-01-31 14:09:36 +00:00
Ahmed Bougacha 2d80ea1939 [X86] Cleanup tabs in test vector-zext.ll. NFC.
Some tests have tabs, some don't.
In vector-[sz]ext.ll, space wins (well duh!).

llvm-svn: 227615
2015-01-30 21:41:28 +00:00
Reid Kleckner a580b6ec67 Win64: Put a REX_W prefix on all TAILJMP* instructions
MSDN's x64 software conventions page says that this is one of the fixed
list of legal epilogues:
https://msdn.microsoft.com/en-us/library/tawsa7cb.aspx

Presumably this is how the unwinder distinguishes epilogue jumps from
in-function control flow.

Also normalize the way we place "## TAILCALL" comments on such jumps.

llvm-svn: 227611
2015-01-30 21:03:31 +00:00
Reid Kleckner ca9b9feb2c x86: Fix large model calls to __chkstk for dynamic allocas
In the large code model, we now put __chkstk in %r11 before calling it.

Refactor the code so that we only do this once. Simplify things by using
__chkstk_ms instead of __chkstk on cygming. We already use that symbol
in the prolog emission, and it simplifies our logic.

Second half of PR18582.

llvm-svn: 227519
2015-01-29 23:58:04 +00:00
Reid Kleckner dafc2ae1ad Update comments to use unreachable instead of llvm.trap, as implemented now
win64: Call __chkstk through a register with the large code model

Fixes half of PR18582. True dynamic allocas will still have a
CALL64pcrel32 which will fail.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7267

llvm-svn: 227503
2015-01-29 22:33:00 +00:00
Robert Lougher c69cfeeafa [X86] Use single add/sub for large stack offsets
For large stack offsets the compiler generates multiple immediate mode
sub/add instructions in the prologue/epilogue.  This patch makes the
compiler place the final amount to be added/subtracted into a register,
which is then added/substracted with a single operation.

Differential Revision: http://reviews.llvm.org/D7226

llvm-svn: 227458
2015-01-29 16:18:29 +00:00
Alex Rosenberg 96e833a6a6 Make the test actually test what it's supposed to test. Add a test for the from memory variant of vcvtph2ps for 256-bit.
llvm-svn: 227446
2015-01-29 15:19:54 +00:00
Alex Rosenberg c235500295 Cleanup a few tests on sse4a machines and FileCheckize along the way.
llvm-svn: 227437
2015-01-29 13:31:32 +00:00