Commit Graph

18 Commits

Author SHA1 Message Date
Gadi Haber 19c4fc5e62 This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible.
There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers.
The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled.

Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky 
Differential Revision: https://reviews.llvm.org/D27901

llvm-svn: 290663
2016-12-28 10:12:48 +00:00
Craig Topper 36ecce9bed [X86] Teach selectScalarSSELoad to accept full 128-bit vector loads and the X86ISD::VZEXT_LOAD opcode.
Disable peephole on some of the tests that no longer require it to properly fold scalar intrinsics.

llvm-svn: 289424
2016-12-12 07:57:24 +00:00
Craig Topper 081c0e2864 [X86] Remove some intrinsic instructions from hasPartialRegUpdate
Summary:
These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits.

As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded.

Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too.

Reviewers: spatel, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27611

llvm-svn: 289419
2016-12-12 05:07:17 +00:00
Craig Topper a4a51f1afe [AVX-512] Add -show-mc-encoding to legacy vector intrinsic tests so we can see when VEX or EVEX encoded instructions are being emitted. Make sure the tests all have an avx2 command line and an skx command line.
llvm-svn: 286055
2016-11-06 02:03:58 +00:00
Michael Kuperstein 3e3652aef2 Recommit r274692 - [X86] Transform setcc + movzbl into xorl + setcc
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.
This fixes PR28146.

The original commit tried inserting an 8bit-subreg into a GR32 (not GR32_ABCD)
which was not appreciated by fast regalloc on 32-bit.

llvm-svn: 274802
2016-07-07 22:50:23 +00:00
Michael Kuperstein edb38a94f8 Revert r274692 to check whether this is what breaks windows selfhost.
llvm-svn: 274771
2016-07-07 16:55:35 +00:00
Michael Kuperstein 1ef6c59b1d [X86] Transform setcc + movzbl into xorl + setcc
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.

This fixes PR28146.

Differential Revision: http://reviews.llvm.org/D21774

llvm-svn: 274692
2016-07-06 21:56:18 +00:00
Simon Pilgrim 9602d678cb [X86][SSE] (Reapplied) Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm)
This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.

Reapplied now that the the companion patch (D20684) removes/auto-upgrade the clang intrinsics has been committed.

Differential Revision: http://reviews.llvm.org/D20686

llvm-svn: 271131
2016-05-28 18:03:41 +00:00
Simon Pilgrim 4642a57fbf Revert: r270973 - [X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm)
llvm-svn: 270976
2016-05-27 09:02:25 +00:00
Simon Pilgrim c013e5737b [X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm)
This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused.

A companion patch (D20684) removes/auto-upgrade the clang intrinsics.

Differential Revision: http://reviews.llvm.org/D20686

llvm-svn: 270973
2016-05-27 08:49:15 +00:00
Simon Pilgrim d6469e3467 [X86][SSE41] Removed pblendw intrinsics tests - they are auto-upgraded
Equivalent tests included in sse41-intrinsics-x86-upgrade.ll - the i8/i32 immediate diff doesn't matter anymore

llvm-svn: 270767
2016-05-25 21:27:58 +00:00
Simon Pilgrim fa814259ad [X86][SSE41] Regenerated intrinsics tests
llvm-svn: 270764
2016-05-25 21:21:51 +00:00
Simon Pilgrim 1bed207f88 [X86][SSE41] Removed blendpd/blendps intrinsics tests - they are auto-upgraded
Equivalent tests included in sse41-intrinsics-x86-upgrade.ll

llvm-svn: 270761
2016-05-25 21:06:36 +00:00
Simon Pilgrim 9cb018b6b6 [X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR
This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead.

LLVM counterpart to D12835

Differential Revision: http://reviews.llvm.org/D13002

llvm-svn: 248368
2015-09-23 08:48:33 +00:00
Elena Demikhovsky 4aed59fc89 AVX-512: Enabled SSE intrinsics on AVX-512.
Predicate UseAVX depricates pattern selection on AVX-512.
This predicate is necessary for DAG selection to select EVEX form.
But mapping SSE intrinsics to AVX-512 instructions is not ready yet.
So I replaced UseAVX with HasAVX for intrinsics patterns.

llvm-svn: 237903
2015-05-21 14:01:32 +00:00
Chandler Carruth 373b2b1728 [x86] Fix a pretty horrible bug and inconsistency in the x86 asm
parsing (and latent bug in the instruction definitions).

This is effectively a revert of r136287 which tried to address
a specific and narrow case of immediate operands failing to be accepted
by x86 instructions with a pretty heavy hammer: it introduced a new kind
of operand that behaved differently. All of that is removed with this
commit, but the test cases are both preserved and enhanced.

The core problem that r136287 and this commit are trying to handle is
that gas accepts both of the following instructions:

  insertps $192, %xmm0, %xmm1
  insertps $-64, %xmm0, %xmm1

These will encode to the same byte sequence, with the immediate
occupying an 8-bit entry. The first form was fixed by r136287 but that
broke the prior handling of the second form! =[ Ironically, we would
still emit the second form in some cases and then be unable to
re-assemble the output.

The reason why the first instruction failed to be handled is because
prior to r136287 the operands ere marked 'i32i8imm' which forces them to
be sign-extenable. Clearly, that won't work for 192 in a single byte.
However, making thim zero-extended or "unsigned" doesn't really address
the core issue either because it breaks negative immediates. The correct
fix is to make these operands 'i8imm' reflecting that they can be either
signed or unsigned but must be 8-bit immediates. This patch backs out
r136287 and then changes those places as well as some others to use
'i8imm' rather than one of the extended variants.

Naturally, this broke something else. The custom DAG nodes had to be
updated to have a much more accurate type constraint of an i8 node, and
a bunch of Pat immediates needed to be specified as i8 values.

The fallout didn't end there though. We also then ceased to be able to
match the instruction-specific intrinsics to the instructions so
modified. Digging, this is because they too used i32 rather than i8 in
their signature. So I've also switched those intrinsics to i8 arguments
in line with the instructions.

In order to make the intrinsic adjustments of course, I also had to add
auto upgrading for the intrinsics.

I suspect that the intrinsic argument types may have led everything down
this rabbit hole. Pretty happy with the result.

llvm-svn: 217310
2014-09-06 10:00:01 +00:00
Craig Topper 551862d66b Replace sse41/sse42 with sse4.1/sse4.2 in test command lines to fix bots.
llvm-svn: 193311
2013-10-24 07:00:06 +00:00
Craig Topper 5b7c2b4f0d Add tests for SSE intrinsics in non-avx mode by copying from the AVX test cases. Some of these may have been tested by other tests, but most weren't. Patch by Cameron McInally.
llvm-svn: 193309
2013-10-24 06:45:13 +00:00