Commit Graph

45659 Commits

Author SHA1 Message Date
Hiroshi Inoue 84aafee4fb [SelectionDAG] set dereferenceable flag in MergeConsecutiveStores to fix assetion failure
When SelectionDAG merges consecutive stores and loads in MergeConsecutiveStores, it does not set dereferenceable flag for a created load instruction. This results in an assertion failure if SelectionDAG commonizes this load instruction with other load instructions, as well as it may miss optimization opportunities.

This patch sat dereferenceable flag for the newly created load instruction if all the load instructions to be merged are dereferenceable.

Differential Revision: https://reviews.llvm.org/D34679

llvm-svn: 306404
2017-06-27 12:43:08 +00:00
Ayman Musa 721d97f7b8 Recommitting rL305465 after fixing bug in TableGen in rL306251 & rL306371
[X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).

AVX512 compare instructions return v*i1 types.
In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type.
Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes.
The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.

When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction.

Differential Revision: https://reviews.llvm.org/D33188

llvm-svn: 306402
2017-06-27 12:08:37 +00:00
Diana Picus 0e74a134f8 [ARM] GlobalISel: Support G_SELECT for pointers
All we need to do is mark it as legal, otherwise it's just like s32.

llvm-svn: 306390
2017-06-27 10:29:50 +00:00
Simon Pilgrim 71d8b67bea [X86][AVX512] Regenerate avx512 arithmetic tests
llvm-svn: 306389
2017-06-27 10:13:56 +00:00
Daniel Sanders cc36dbf55d [globalisel][tablegen] Add support for EXTRACT_SUBREG.
Summary:
After this patch, we finally have test cases that require multiple
instruction emission.

Depends on D33590

Reviewers: ab, qcolombet, t.p.northover, rovka, kristof.beyls

Subscribers: javed.absar, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D33596

llvm-svn: 306388
2017-06-27 10:11:39 +00:00
Diana Picus 7145d22f81 [ARM] GlobalISel: Support G_SELECT for i32
* Mark as legal for (s32, i1, s32, s32)
* Map everything into GPRs
* Select to two instructions: a CMP of the condition against 0, to set
  the flags, and a MOVCCr to select between the two inputs based on the
  flags that we've just set

llvm-svn: 306382
2017-06-27 09:19:51 +00:00
Hiroshi Inoue 912eec378c [PowerPC] fix incorrect processor name for -mcpu in a test case
to surpress warnings. ppc970 should be 970 (or g5)

llvm-svn: 306380
2017-06-27 08:35:35 +00:00
Chandler Carruth 3f81d8024c [SROA] Fix PR32902 by more carefully propagating !nonnull metadata.
This is based heavily on the work done ni D34285. I mostly wanted to do
test cleanup for the author to save them some time, but I had a really
hard time understanding why it was so hard to write better test cases
for these issues.

The problem is that because SROA does a second rewrite of the loads and
because we *don't* propagate !nonnull for non-pointer loads, we first
introduced invalid !nonnull metadata and then stripped it back off just
in time to avoid most ways of this PR manifesting. Moving to the more
careful utility only fixes this by changing the predicate to look at the
new load's type rather than the target type. However, that *does* fix
the bug, and the utility is much nicer including adding range metadata
to model the nonnull property after a conversion to an integer.

However, we have bigger problems because we don't actually propagate
*range* metadata, and the utility to do this extracted from instcombine
isn't really in good shape to do this currently. It *only* handles the
case of copying range metadata from an integer load to a pointer load.
It doesn't even handle the trivial cases of propagating from one integer
load to another when they are the same width! This utility will need to
be beefed up prior to using in this location to get the metadata to
fully survive.

And even then, we need to go and teach things to turn the range metadata
into an assume the way we do with nonnull so that when we *promote* an
integer we don't lose the information.

All of this will require a new test case that looks kind-of like
`preserve-nonnull.ll` does here but focuses on range metadata. It will
also likely require more testing because it needs to correctly handle
changes to the integer width, especially as SROA actively tries to
change the integer width!

Last but not least, I'm a little worried about hooking the range
metadata up here because the instcombine logic for converting from
a range metadata *to* a nonnull metadata node seems broken in the face
of non-zero address spaces where null is not mapped to the integer `0`.
So that probably needs to get fixed with test cases both in SROA and in
instcombine to cover it.

But this *does* extract the core PR fix from D34285 of preventing the
!nonnull metadata from being propagated in a broken state just long
enough to feed into promotion and crash value tracking.

On D34285 there is some discussion of zero-extend handling because it
isn't necessary. First, the new load size covers all of the non-undef
(ie, possibly initialized) bits. This may even extend past the original
alloca if loading those bits could produce valid data. The only way its
valid for us to zero-extend an integer load in SROA is if the original
code had a zero extend or those bits were undef. And we get to assume
things like undef *never* satifies nonnull, so non undef bits can
participate here. No need to special case the zero-extend handling, it
just falls out correctly.

The original credit goes to Ariel Ben-Yehuda! I'm mostly landing this to
save a few rounds of trivial edits fixing style issues and test case
formulation.

Differental Revision: D34285

llvm-svn: 306379
2017-06-27 08:32:03 +00:00
Nicolai Haehnle 43cc6c4e0f AMDGPU: M0 operands to spill/restore opcodes are dead
Summary:
With scalar stores, M0 is clobbered and therefore marked as implicitly
defined. However, it is also dead.

This fixes an assertion when the Greedy Register Allocator decides to
optimize a spill/restore pair away again (via tryHintsRecoloring).

Reviewers: arsenm

Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33319

llvm-svn: 306375
2017-06-27 08:04:13 +00:00
Igor Breger 925f088bae [GlobalISel][X86] Add fp32/62 legalizer, regbank-select, selection tests for G_FADD, G_FSUB, G_FMUL, G_FDIV. NFC.
llvm-svn: 306370
2017-06-27 07:01:54 +00:00
Mikael Holmen 37b5120a9a [Reassociate] Make sure EraseInst sets MadeChange
Summary:
EraseInst didn't report that it made IR changes through MadeChange.

It is essential that changes to the IR are reported correctly,
since for example ReassociatePass::run() will indicate that all
analyses are preserved otherwise.
And the CGPassManager determines if the CallGraph is up-to-date
based on status from InstructionCombiningPass::runOnFunction().

Reviewers: craig.topper, rnk, davide

Reviewed By: rnk, davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34616

llvm-svn: 306368
2017-06-27 05:32:13 +00:00
Hiroshi Inoue 5102028f63 [PowerPC] set optimization level in SelectionDAGISel
PowerPC backend does not pass the current optimization level to SelectionDAGISel and so SelectionDAGISel works with the default optimization level regardless of the current optimization level.
This patch makes the PowerPC backend set the optimization level correctly.

Differential Revision: https://reviews.llvm.org/D34615

llvm-svn: 306367
2017-06-27 04:52:17 +00:00
Craig Topper f9319a78a5 [InstCombine] Add test cases demonstrating that we don't optmize select+cmp+cttz/ctlz when the bitwidth is larger than 64 bits.
llvm-svn: 306365
2017-06-27 04:50:47 +00:00
Chandler Carruth c691623585 [SROA] Further test cleanup and add a test for the actual propagation of
the nonnull attribute distinct from rewriting it into an assume.

llvm-svn: 306358
2017-06-27 03:08:45 +00:00
Chandler Carruth 3b978394ba [SROA] Clean up a test case a bit prior to adding more testing for
nonnull as part of fixing PR32902.

llvm-svn: 306353
2017-06-27 02:23:15 +00:00
Matthias Braun e2ae001982 ScheduleDAGInstrs: Fix fixupKills() adding too many kill flags.
Remove invalid shortcut in fixupKills(): A register needs to be marked
live even when we are not adding a kill flag. This is because a
partially live register must not get a kill flags, but it still needs to
be fully marked live when walking backwards.

llvm-svn: 306352
2017-06-27 00:58:48 +00:00
Wolfgang Pieb 9f65858235 DAGCombine: Make sure we only eliminate trunc/extend when the scales of truncation and extension match.
This fixes PR33368.

Reviewer: rksimon

Differential Revision:  https://reviews.llvm.org/D34069

llvm-svn: 306345
2017-06-26 23:05:51 +00:00
Dehao Chen 8b7effb344 revert r306336 for breaking ppc test.
llvm-svn: 306344
2017-06-26 23:05:35 +00:00
Sanjay Patel b859910eb2 [x86] add tests for missing sbb transforms; NFC
llvm-svn: 306337
2017-06-26 22:20:07 +00:00
Dehao Chen 79655792cc Enable vectorizer-maximize-bandwidth by default.
Summary:
vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:

spec/2006/fp/C++/444.namd                 26.84  -0.31%
spec/2006/fp/C++/447.dealII               46.19  +0.89%
spec/2006/fp/C++/450.soplex               42.92  -0.44%
spec/2006/fp/C++/453.povray               38.57  -2.25%
spec/2006/fp/C/433.milc                   24.54  -0.76%
spec/2006/fp/C/470.lbm                    41.08  +0.26%
spec/2006/fp/C/482.sphinx3                47.58  -0.99%
spec/2006/int/C++/471.omnetpp             22.06  +1.87%
spec/2006/int/C++/473.astar               22.65  -0.12%
spec/2006/int/C++/483.xalancbmk           33.69  +4.97%
spec/2006/int/C/400.perlbench             33.43  +1.70%
spec/2006/int/C/401.bzip2                 23.02  -0.19%
spec/2006/int/C/403.gcc                   32.57  -0.43%
spec/2006/int/C/429.mcf                   40.35  +0.27%
spec/2006/int/C/445.gobmk                 26.96  +0.06%
spec/2006/int/C/456.hmmer                  24.4  +0.19%
spec/2006/int/C/458.sjeng                 27.91  -0.08%
spec/2006/int/C/462.libquantum            57.47  -0.20%
spec/2006/int/C/464.h264ref               46.52  +1.35%

geometric mean                                   +0.29%

The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.

I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.

Reviewers: hfinkel, mkuper, davidxl, chandlerc

Reviewed By: chandlerc

Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33341

llvm-svn: 306336
2017-06-26 21:41:09 +00:00
Dehao Chen 38f1bc7834 Fix the bug when handling shufflevector for aarch64.
Summary: This Fixes https://bugs.llvm.org/show_bug.cgi?id=33600

Reviewers: mssimpso, davidxl, Carrot

Reviewed By: mssimpso

Subscribers: aemerson, rengolin, sanjoy, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D34641

llvm-svn: 306334
2017-06-26 21:33:51 +00:00
Matt Arsenault 53fae0772a RenameIndependentSubregs: Fix iterator problem
Fixes bug 33597.

Use of substituteRegister in the tied operand case messes
up the register use iterator, causing some uses to be left
unprocessed.

llvm-svn: 306333
2017-06-26 21:33:36 +00:00
Sam Clegg 933df2658d [WebAssembly] Add more support for weak symbols
Add weak symbol tests to MC
Add symbol flags to output of `llvm-readobj -t`.

Differential Revision: https://reviews.llvm.org/D34635

llvm-svn: 306330
2017-06-26 21:01:39 +00:00
Tim Northover c2d5e6d637 AArch64: legalize G_EXTRACT operations.
This is the dual problem to legalizing G_INSERTs so most of the code and
testing was cribbed from there.

llvm-svn: 306328
2017-06-26 20:34:13 +00:00
Tim Northover 9ac3e42211 AArch64: remove all kill flags when extending register liveness.
When we forward a stored value to a load and eliminate it entirely we need to
make sure the liveness of the register is maintained all the way to its use.
Previously we only cleared liveness on the store doing the forwarding, but
there could be other killing uses in between.

We already do the right thing when the load has to be converted into something
else, it was just this one path that skipped it.

llvm-svn: 306318
2017-06-26 18:49:25 +00:00
Simon Pilgrim d58f051792 [X86][SSE] Check SSE2/SSE3 codegen tests on i686 and x86_64
llvm-svn: 306314
2017-06-26 18:20:46 +00:00
Wei Mi 71f06420e4 [GVN] Recommit the patch "Add phi-translate support in scalarpre".
The recommit fixes three bugs: The first one is to use CurrentBlock instead of
PREInstr's Parent as param of performScalarPREInsertion because the Parent
of a clone instruction may be uninitialized. The second one is stop PRE when
CurrentBlock to its predecessor is a backedge and an operand of CurInst is
defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available. The third one is an out-of-bound
array access in a flipped if guard.

Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.

long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();

void foo(long a, long b, long c, long d) {

  g1 = a * b;
  if (__builtin_expect(g2 > 3, 0)) {
    a = c;
    b = d;
    g2 = a * b;
  }
  g3 = a * b;      // fully redundant.

}

The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.

llvm-svn: 306313
2017-06-26 18:16:10 +00:00
Matt Arsenault f28683cf51 AMDGPU: Setup SP/FP in callee function prolog/epilog
llvm-svn: 306312
2017-06-26 17:53:59 +00:00
Zachary Turner e79b07e41e [llvm-pdbutil] Add a mode to `bytes` for dumping split debug chunks.
llvm-svn: 306309
2017-06-26 17:22:36 +00:00
Ulrich Weigand af98b748f6 [SystemZ] Fix missing emergency spill slot corner case
We sometimes need emergency spill slots for the register scavenger.
This may be the case when code needs to access a stack slot that
has an offset of 4096 or more relative to the stack pointer.

To make that determination, processFunctionBeforeFrameFinalized
currently simply checks the total stack frame size of the current
function.  But this is not enough, since code may need to access
stack slots in the caller's stack frame as well, in particular
incoming arguments stored on the stack.

This commit fixes the problem by taking argument slots into account.

llvm-svn: 306305
2017-06-26 16:50:32 +00:00
Simon Pilgrim f07663876a [X86][SSE] Add combine tests for PMULDQ/PMULUDQ
Found several missed optimizations while investigating replacing _mm_mul_epi32/_mm_mul_epu32 with generic implementations

llvm-svn: 306302
2017-06-26 16:22:52 +00:00
Ahmed Bougacha 58a197414e [X86][AVX-512] Don't raise inexact in ceil, floor, round, trunc.
The non-AVX-512 behavior was changed in r248266 to match N1778
(C bindings for IEEE-754 (2008)), which defined the four functions
to not raise the inexact exception ("rint" is still defined as raising
it).

Update the AVX-512 lowering of these functions to match that: it should
not be different.

llvm-svn: 306299
2017-06-26 16:00:24 +00:00
Tom Stellard eb8f1e27d9 AMDGPU/GlobalISel: Mark 32-bit G_SHL as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34589

llvm-svn: 306298
2017-06-26 15:56:52 +00:00
Simon Pilgrim 0ad0e5802b [X86] Add test case for PR15981
llvm-svn: 306296
2017-06-26 15:53:11 +00:00
Sanjay Patel 15748d239e [x86] transform vector inc/dec to use -1 constant (PR33483)
Convert vector increment or decrement to sub/add with an all-ones constant:

add X, <1, 1...> --> sub X, <-1, -1...>
sub X, <1, 1...> --> add X, <-1, -1...>

The all-ones vector constant can be materialized using a pcmpeq instruction that is 
commonly recognized as an idiom (has no register dependency), so that's better than 
loading a splat 1 constant.

AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better
way to produce 512 one-bits.

The general advantages of this lowering are:
1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, 
   so in theory, this could be better for perf, but...

2. That seems unlikely to affect any OOO implementation, and I can't measure any real 
   perf difference from this transform on Haswell or Jaguar, but...

3. It doesn't look like it from the diffs, but this is an overall size win because we 
   eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting 
   a scalar load (which might itself be a bug), then we're replacing a scalar constant 
   load + broadcast with a single cheap op, so that should always be smaller/better too.

4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1 
   and psub x, -1, so we should use that form for +1 too because we can. If there's some
   reason to favor a constant load on some CPU, let's make the reverse transform for all
   of these cases (either here in the DAG or in a later machine pass).

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=33483

Differential Revision: https://reviews.llvm.org/D34336

llvm-svn: 306289
2017-06-26 14:19:26 +00:00
Krzysztof Parzyszek 918e6d70bd [Hexagon] Handle cases when the aligned stack pointer is missing
llvm-svn: 306288
2017-06-26 14:17:58 +00:00
Jonas Paulsson 8c33647ba1 [SystemZ] Add a check against zero before calling getTestUnderMaskCond()
Csmith discovered that this function can be called with a zero argument,
in which case an assert for this triggered.

This patch also adds a guard before the other call to this function since
it was missing, although the test only covers the case where it was
discovered.

Reduced test case attached as CodeGen/SystemZ/int-cmp-54.ll.

Review: Ulrich Weigand
llvm-svn: 306287
2017-06-26 13:38:27 +00:00
Michael Zuckerman ce7e187f84 [X86][LLVM][test]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess test.
Adding base tast (to trunk) for Store strid=4 vf=32. 

llvm-svn: 306286
2017-06-26 13:27:32 +00:00
Serguei Katkov 0e70206c8f This reverts commit r306272.
Revert "[MBP] do not rotate loop if it creates extra branch"

It breaks the sanitizer build bots. Need to fix this.

llvm-svn: 306276
2017-06-26 06:51:45 +00:00
Hiroshi Inoue 4484ff03df fix trivial typo in comment, NFC
llvm-svn: 306274
2017-06-26 06:32:04 +00:00
Serguei Katkov b01fff06ed [MBP] do not rotate loop if it creates extra branch
This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.

We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C , where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.

Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold bock to header if there is one

So if C is not a predecessor of H then we introduce extra branch.

This change actually prohibits rotation of the loop if both true
1) Best Exit has next element in chain as successor.
2) Last element in chain is not a predecessor of first element of chain.

Reviewers: iteratee, xur
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34271

llvm-svn: 306272
2017-06-26 05:27:27 +00:00
Matt Arsenault 10fc062b2b AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling
This should not be treated as a different version of
private_segment_buffer. These are distinct things with
different uses and register classes, and requires the
function argument info to have more context about the
function's type and environment.

Also add missing test coverage for the intrinsic, and
emit an error for HSA. This also encovers that the intrinsic
is broken unless there happen to be stack objects.

llvm-svn: 306264
2017-06-26 03:01:31 +00:00
Chandler Carruth 4a000883c7 [LoopSimplify] Re-instate r306081 with a bug fix w.r.t. indirectbr.
This was reverted in r306252, but I already had the bug fixed and was
just trying to form a test case.

The original commit factored the logic for forming dedicated exits
inside of LoopSimplify into a helper that could be used elsewhere and
with an approach that required fewer intermediate data structures. See
that commit for full details including the change to the statistic, etc.

The code looked fine to me and my reviewers, but in fact didn't handle
indirectbr correctly -- it left the 'InLoopPredecessors' vector dirty.

If you have code that looks *just* right, you can end up leaking these
predecessors into a subsequent rewrite, and crash deep down when trying
to update PHI nodes for predecessors that don't exist.

I've added an assert that makes the bug much more obvious, and then
changed the code to reliably clear the vector so we don't get this bug
again in some other form as the code changes.

I've also added a test case that *does* manage to catch this while also
giving some nice positive coverage in the face of indirectbr.

The real code that found this came out of what I think is CPython's
interpreter loop, but any code with really "creative" interpreter loops
mixing indirectbr and other exit paths could manage to tickle the bug.
I was hard to reduce the original test case because in addition to
having a particular pattern of IR, the whole thing depends on the order
of the predecessors which is in turn depends on use list order. The test
case added here was designed so that in multiple different predecessor
orderings it should always end up going down the same path and tripping
the same bug. I hope. At least, it tripped it for me without
manipulating the use list order which is better than anything bugpoint
could do...

llvm-svn: 306257
2017-06-25 22:45:31 +00:00
Chandler Carruth 73367b6a09 [LoopSimplify] Improve a test for loop simplify minorly. NFC.
I did some basic testing while looking for a bug in my recent change to
loop simplify and even though it didn't find the bug it seems like
a useful improvement anyways.

llvm-svn: 306256
2017-06-25 22:24:02 +00:00
Daniel Jasper 4c6cd4ccb7 Revert "[LoopSimplify] Factor the logic to form dedicated exits into a utility."
This leads to a segfault. Chandler already has a test case and should be
able to recommit with a fix soon.

llvm-svn: 306252
2017-06-25 17:58:25 +00:00
Simon Pilgrim 9956364a1f [X86] Add test case for PR15705
llvm-svn: 306246
2017-06-25 16:12:45 +00:00
Sanjay Patel 2f3ead7adc [InstCombine] add (sext i1 X), 1 --> zext (not X)
http://rise4fun.com/Alive/i8Q

A narrow bitwise logic op is obviously better than math for value tracking, 
and zext is better than sext. Typically, the 'not' will be folded into an 
icmp predicate.

The IR difference would even survive through codegen for x86, so we would see 
worse code:

https://godbolt.org/g/C14HMF

one_or_zero(int, int):                      # @one_or_zero(int, int)
        xorl    %eax, %eax
        cmpl    %esi, %edi
        setle   %al
        retq

one_or_zero_alt(int, int):                  # @one_or_zero_alt(int, int)
        xorl    %ecx, %ecx
        cmpl    %esi, %edi
        setg    %cl
        movl    $1, %eax
        subl    %ecx, %eax
        retq

llvm-svn: 306243
2017-06-25 14:15:28 +00:00
Elena Demikhovsky 72f991cded AVX-512: Fixed a crash during legalization of <3 x i8> type
The compiler fails with assertion during legalization of SETCC for <3 x i8> operands.
The result is extended to <4 x i8> and then truncated <4 x i1>. It does not happen on AVX2, because the final result of SETCC is <4 x i32>.

Differential Revision: https://reviews.llvm.org/D34503

llvm-svn: 306242
2017-06-25 13:36:20 +00:00
Igor Breger f5035d6ee5 [GlobalISel][X86] Support vector type G_EXTRACT selection.
Summary:
Support vector type G_EXTRACT selection. For now G_EXTRACT marked as legal for any type, so nothing to do in legalizer.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: guyblank

Subscribers: guyblank, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33957

llvm-svn: 306240
2017-06-25 11:42:17 +00:00
Dorit Nuzman e0e0f1ddb0 [AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2
The cost of an interleaved access was only implemented for AVX512. For other
X86 targets an overly conservative Base cost was returned, resulting in
avoiding vectorization where it is actually profitable to vectorize.
This patch starts to add costs for AVX2 for most prominent cases of
interleaved accesses (stride 3,4 chars, for now).

Note1: Improvements of up to ~4x were observed in some of EEMBC's rgb
workloads; There is also a known issue of 15-30% degradations on some of these
workloads, associated with an interleaved access followed by type
promotion/widening; the resulting shuffle sequence is currently inefficient and
will be improved by a series of patches that extend the X86InterleavedAccess pass
(such as D34601 and more to follow).

Note 2: The costs in this patch do not reflect port pressure penalties which can
be very dominant in the case of interleaved accesses since most of the shuffle
operations are restricted to a single port. Further tuning, that may incorporate
these considerations, will be done on top of the upcoming improved shuffle
sequences (that is, along with the abovementioned work to extend
X86InterleavedAccess pass).


Differential Revision: https://reviews.llvm.org/D34023

llvm-svn: 306238
2017-06-25 08:26:25 +00:00