and MOVSD nodes for single element vector inserts.
This is particularly important because a number of patterns in the
backend detect these patterns and leverage them to simplify things. It
also fixes quite a few of the insertion bad code examples. However, it
regresses a specific area: when available, blendps and blendpd are
*dramatically* faster than movss and movsd respectively. But it doesn't
really work to form the blend logic first because the blends *aren't* as
crazy efficient when the data is coming from memory anyways, and thus
will have a movss or movsd regardless. Also, doing that would block
a bunch of the patterns that this is designed to hit.
So my plan is to go into the patterns for lowering MOVSS and MOVSD and
lower them via blends when available. However that's a pretty invasive
restructuring so it will need to be a follow-up patch.
I have already gone into the patterns to lower MOVSS and MOVSD from
memory using MOVLPD, etc. Without that, several of the test cases
I already have regress.
llvm-svn: 218985
That commit was introduced in order to help investigate a problem in ARM
codegen breaking from commit 202304 (Add a limit to the heuristic that register
allocates instructions in local order). Recent analisys indicated that the
problem no longer exists, so I'm reverting this change.
See PR18996.
llvm-svn: 218981
I got them quite wrong when updating it and had the SSE4.1 run checked
for SSE2 and the SSE2 run checked for SSE4.1. I think everything was
actually generic SSE, but this still seems good to fix. While here,
hoist the triple into the IR and make the flag set a bit more direct in
what it is trying to test.
llvm-svn: 218978
lowering to match VZEXT_MOVL patterns.
I hadn't realized that these had sufficient pattern smarts in the
backend to lower zext-ing from the low element of a vector without it
being a scalar_to_vector node. They do, and this is how to match a bunch
of patterns for movq, movss, etc.
There is a weird propensity to end up using pshufd to place the element
afterward even though it means domain crossing (or rather, to use
xorps+movss to zext the element rather than movq) but that's an
orthogonal problem with VZEXT_MOVL that someone should probably look at.
llvm-svn: 218977
vector to a zero vector for the v2 cases and fix the v4 integer cases to
actually blend from a vector.
There are already seprate tests for the case of inserting from a scalar.
These cases cover a lot of the regressions I've seen in the regression
test suite for the new vector shuffle lowering and specifically cover
the reported lack of using various zext-ing instruction patterns. My
next patch should fix a big chunk of this, but wanted to get a nice
baseline for these patterns in the test cases first.
llvm-svn: 218976
element types to form illegal vector types.
I've added a special SSE1 test case here that makes sure we don't break
this going forward.
llvm-svn: 218974
testing that we generated divps and divss but not in a very systematic
way. There are other tests for widening binary operations already that
make these unnecessary.
The second one seems mostly about testing Atom as well as normal X86,
but despite the comment claiming it is testing a different instruction
sequence, it then tests for exactly the same div instruction sequence!
(The sequence of instructions is actually quite different on Atom, but
not the sequence of div instructions....)
And then it has an "execution" test that simply isn't run? Very strange.
Anyways, none of this is really needed so clean this up.
llvm-svn: 218972
intergrated much more fully into some logical part of the backend to
really understand what it is trying to accomplish and how to update it.
I suspect it no longer holds enough value to be worth having.
llvm-svn: 218950
shufle switch.
I nuked a win64 config from one test as it doesn't really make sense to
cover that ABI specially for generic v2f32 tests...
llvm-svn: 218948
This patch broke 447.dealII on Darwin. I'm currently working on a reduced
test-case, but reverting for now to keep the bots happy.
<rdar://problem/18530107>
llvm-svn: 218944
test cases that will change with the new vector shuffle lowering. This
gives us a nice baseline for deltas against. I've checked and removed
the cases where there were weird register usage being pinned down, and
all of these are extremely pin-pointed tests so fully checking them
seems very appropriate.
llvm-svn: 218941
tighter, more strict FileCheck assertions. Some of these I really like
as they show case exactly what instruction sequences come out of these
microscopic functionality tests.
llvm-svn: 218936
baseline for updates from the new vector shuffle lowering.
I've inspected the results here, and I couldn't find any register
allocation decisions where there should be any realistic way to register
allocate things differently. The closest was the imul test case. If you
see something here you'd like register number variables on, just shout
and I'll add them.
llvm-svn: 218935
need to be updated for the new vector shuffle lowering.
After talking to Adam Nemet, Tim Northover, etc., it seems that testing
MC encodings in the same suite as the basic codegen isn't the right
approach. Instead, we're going to want dedicated MC tests for the
encodings. These encodings are starting to get in my way so I wanted to
cut them out early. The total set of instructions that should have
encoding tests added is:
vpaddd
vsqrtss
vsqrtsd
vmovlhps
vmovhlps
valignq
vbroadcastss
Not too many parts of these tests were even using this. =]
llvm-svn: 218932
Older Book-E cores, such as the PPC 440, support only msync (which has the same
encoding as sync 0), but not any of the other sync forms. Newer Book-E cores,
however, do support sync, and for performance reasons we should allow the use
of the more-general form.
This refactors msync use into its own feature group so that it applies by
default only to older Book-E cores (of the relevant cores, we only have
definitions for the PPC440/450 currently).
llvm-svn: 218923
Summary:
Atomic loads and store of up to the native size (32 bits, or 64 for PPC64)
can be lowered to a simple load or store instruction (as the synchronization
is already handled by AtomicExpand, and the atomicity is guaranteed thanks to
the alignment requirements of atomic accesses). This is exactly what this patch
does. Previously, these were implemented by complex
load-linked/store-conditional loops.. an obvious performance problem.
For example, this patch turns
```
define void @store_i8_unordered(i8* %mem) {
store atomic i8 42, i8* %mem unordered, align 1
ret void
}
```
from
```
_store_i8_unordered: ; @store_i8_unordered
; BB#0:
rlwinm r2, r3, 3, 27, 28
li r4, 42
xori r5, r2, 24
rlwinm r2, r3, 0, 0, 29
li r3, 255
slw r4, r4, r5
slw r3, r3, r5
and r4, r4, r3
LBB4_1: ; =>This Inner Loop Header: Depth=1
lwarx r5, 0, r2
andc r5, r5, r3
or r5, r4, r5
stwcx. r5, 0, r2
bne cr0, LBB4_1
; BB#2:
blr
```
into
```
_store_i8_unordered: ; @store_i8_unordered
; BB#0:
li r2, 42
stb r2, 0(r3)
blr
```
which looks like a pretty clear win to me.
Test Plan:
fixed the tests + new test for indexed accesses + make check-all
Reviewers: jfb, wschmidt, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5587
llvm-svn: 218922
Do not eliminate the frame pointer if there is a stackmap or patchpoint in the
function. All stackmap references should be FP relative.
This fixes PR21107.
llvm-svn: 218920
This patch defines a new iterator for the imported symbols.
Make a change to COFFDumper to use that iterator to print
out imported symbols and its ordinals.
llvm-svn: 218915
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
llvm-svn: 218914
elements as well as integer elements in order to form simpler shuffle
patterns.
This is the primary reason why we were failing to match some of the
2-and-2 floating point shuffles such as PR21140. Even after fixing this
we need to support some extra patterns in the backend in order to match
the resulting X86ISD::UNPCKL nodes into the correct instructions. This
commit should fix PR21140 and includes more comprehensive testing of
insertion patterns in v4 shuffles.
Not all of the added tests are beautiful. For example, we don't have
clever instructions to insert-via-load in the integer domain. There are
also some places where we aren't sufficiently cunning with our use of
movq and movd, but that's future work.
llvm-svn: 218911
When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X.
This can happen in the real world when calculating x ** 3/2. This occurs
in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c.
Differential Revision: http://reviews.llvm.org/D5584
llvm-svn: 218906
floating point and integer domains.
Merge the AVX2 test into it and add an extra RUN line. Generate clean
FileCheck statements with my script. Remove the now merged AVX2 tests.
llvm-svn: 218903
When the flag is given, the command prints out the COFF import table.
Currently only the import table directory will be printed.
I'm going to make another patch to print out the imported symbols.
The implementation of import directory entry iterator in
COFFObjectFile.cpp was buggy. This patch fixes that too.
http://reviews.llvm.org/D5569
llvm-svn: 218891
My commit rL216160 introduced a bug PR21014: IndVars widens code 'for (i = ; i < ...; i++) arr[ CONST - i]' into 'for (i = ; i < ...; i++) arr[ i - CONST]'
thus inverting index expression. This patch fixes it.
Thanks to Jörg Sonnenberger for pointing.
Differential Revision: http://reviews.llvm.org/D5576
llvm-svn: 218867
This file isn't really doing anything useful. Many of the tests that
seem to be combined are also repeats from other test files. Many of the
other tests, despite the comment that they should be combined into
a single shuffle... well... aren't combined into a single shuffle.
=/
llvm-svn: 218862
least seem *slightly* more interesting test wise, although given how
spotily we actually combine anything, I remain somewhat suspicious.
llvm-svn: 218861
checks for all the ISA variants.
If the SSE2 checks here terrify you, good. This is (in large part) the
kind of amazingly bad code that is holding LLVM back when vectorizing on
older ISAs.
At the same time, these tests seem increasingly dubious to me. There are
a very large number of tests and it isn't clear that they are
systematically covering a specific set of functionality. Anyways,
I don't want to reduce testing during the transition, I just want to
consolidate it to where it is easier to manage.
llvm-svn: 218860
file.
Some of these really don't make sense to test -- we're testing for the
*lack* of combining two shuffles into one, presumably because the two
would generate better shuffles in the end. But if you look at the
generated code shown here, in many cases the generated code is, frankly,
terrible. Or we combine any two generated shuffles back into a single
instruction! I've left a FIXME to revisit these decisions.
llvm-svn: 218859
and use the new grouped FileCheck patterns to match them.
No interesting changes yet, but this test is now in proper form to have
the other shuffle combining tests merged into it.
llvm-svn: 218857
The test has to do with DAG combines, and so it doesn't need the new
vector shuffle lowering to be effective. Also, it has a nice in-IR
triple string which we should really be using rather than command line
flags (unless it varies form RUN-line to RUN-line). Finally, I much
prefer letting LLVM synthesize the correct datalayout string from the
triple rather than baking one in here that will just become stale.
llvm-svn: 218856
generic DAG combining of shuffles relevant to x86.
My plan is to fold a bunch of the other DAG combining test cases into
this one, while converting them to use the nice new FileCheck assertion
syntax.
llvm-svn: 218855