Commit Graph

Chandler Carruth 44deb8015c [x86] Introduce tests covering the gamut of 256-bit vector shuffling.
These are just test cases, no actual code yet. This establishes the
baseline fallback strategy we're starting from on AVX2 and the expected
lowering we use on AVX1.

Also, these test cases are very much generated. I've manually crafted
the specific pattern set that I'm hoping will be useful at exercising
the lowering code, but I've not (and could not) manually verify *all* of
these. I've spot checked and they seem legit to me.

As with the rest of vector shuffling, at a certain point the only really
useful way to check the correctness of this stuff is through fuzz
testing.

llvm-svn: 218267
2014-09-22 20:25:08 +00:00
Sanjay Patel 7939d7229d Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2).
We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors.
This patch should preserve all existing behavior with regular optimization levels, 
but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.

The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save
at least 8 bytes (up to 31 bytes) of constant pool data.

Differential Revision: http://reviews.llvm.org/D5347

llvm-svn: 218263
2014-09-22 18:54:01 +00:00
Akira Hatanaka f2a721a875 Fix test case committed in r218242 to appease buildbot.
llvm-svn: 218261
2014-09-22 18:07:20 +00:00
Tom Stellard 9f73851e39 Revert "R600/SI: Add support for global atomic add"
This reverts commit r218254.

The global_atomics.ll test fails with asserts disabled.  For some reason,
the compiler fails to produce the atomic no return variants.

llvm-svn: 218257
2014-09-22 16:44:04 +00:00
Frederic Riss 220fa48491 Fix a test introduced in r218246 to work also on Windows.
llvm-svn: 218255
2014-09-22 16:17:32 +00:00
Tom Stellard 2355a77e74 R600/SI: Add support for global atomic add
llvm-svn: 218254
2014-09-22 15:35:35 +00:00
Pavel Chupin be9f12102f [x32] Fix segmented stacks support
Summary:
Update segmented-stacks*.ll tests with x32 target case and make
corresponding changes to make them pass.

Test Plan: tests updated with x32 target

Reviewers: nadav, rafael, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5245

llvm-svn: 218247
2014-09-22 13:11:35 +00:00
Frederic Riss 955724e3f5 [dwarfdump] Dump full filenames as DW_AT_(decl|call)_file attribute values
Reviewers: dblaikie, samsonov

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5192

llvm-svn: 218246
2014-09-22 12:36:04 +00:00
Frederic Riss 58ed53cfcd Allow DWARFDebugInfoEntryMinimal::getSubroutineName to resolve cross-unit references.
Summary: getSubroutineName is currently only used by llvm-symbolizer, thus add a binary test containing a cross-cu inlining example.

Reviewers: samsonov, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5394

llvm-svn: 218245
2014-09-22 12:35:53 +00:00
Robert Lougher 6da8a243f9 Fix assert when decoding PSHUFB mask
The PSHUFB mask decode routine used to assert if the mask index was out of
range (<0 or greater than the size of the vector).  The problem is, we can
legitimately have a PSHUFB with a large index using intrinsics.  The
instruction only uses the least significant 4 bits.  This change removes the
assert and masks the index to match the instruction behaviour.

llvm-svn: 218242
2014-09-22 11:54:38 +00:00
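
To illustrate the instruction behaviour referenced above, here is a minimal
standalone C++ sketch of per-byte PSHUFB semantics (illustration only, not
LLVM's decode routine): a set high bit in the mask byte zeroes the output
byte, and otherwise only the low 4 bits select the source lane, so "large"
indices simply reduce modulo 16.

    #include <cstdint>
    #include <cstdio>

    // Per-byte PSHUFB semantics on a 16-byte vector: bit 7 of the mask byte
    // zeroes the lane; otherwise only the low 4 bits index into the source.
    static void pshufb16(const uint8_t Src[16], const uint8_t Mask[16],
                         uint8_t Out[16]) {
      for (int I = 0; I < 16; ++I)
        Out[I] = (Mask[I] & 0x80) ? 0 : Src[Mask[I] & 0x0F];
    }

    int main() {
      uint8_t Src[16], Mask[16], Out[16];
      for (int I = 0; I < 16; ++I) {
        Src[I] = (uint8_t)(I * 2);
        Mask[I] = (uint8_t)(16 + I); // out-of-range indices, high bit clear
      }
      pshufb16(Src, Mask, Out);
      for (int I = 0; I < 16; ++I)
        printf("%u ", Out[I]); // same as Src, since (16 + I) & 0xF == I
      return 0;
    }
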
Oliver Stannard 14f97d0017 Downgrade DWARF2 section limit error to a warning
We currently emit an error when trying to assemble a file with more
than one section using DWARF2 debug info. This should be a warning
instead, as the resulting file will still be usable, but with a
degraded debug illusion.

llvm-svn: 218241
2014-09-22 10:45:16 +00:00
Chandler Carruth 7158c95d65 [x86] Move the AVX v4i64 test cases down to group them together.
Increasingly I don't want to mix the integer and floating point tests,
especially with AVX where they are handled quite differently.

llvm-svn: 218233
2014-09-22 03:05:23 +00:00
Chandler Carruth 12bbf7d922 [x86] Back out a bad choice about lowering v4i64 and pave the way for
a more sane approach to AVX2 support.

Fundamentally, there is no useful way to lower integer vectors in AVX.
None. We always end up with a VINSERTF128 in the end, so we might as
well eagerly switch to the floating point domain and do everything
there. This cleans up lots of weird and unlikely to be correct
differences between integer and floating point shuffles when we only
have AVX1.

The other nice consequence is that by doing things this way we will make
it much easier to write the integer lowering routines as we won't need
to duplicate the logic to check for AVX vs. AVX2 in each one -- if we
actually try to lower a 256-bit vector as an integer vector, we have
AVX2 and can rely on it. I think this will make the code much simpler
and more comprehensible.

Currently, I've disabled *all* support for AVX2 so that we always fall
back to AVX. This keeps everything working rather than asserting. That
will go away with the subsequent series of patches that provide
a baseline AVX2 implementation.

Please note, I'm going to implement AVX2 *without access to hardware*.
That means I cannot correctness test this path. I will be relying on
those with access to AVX2 hardware to do correctness testing and fix
bugs here, but as a courtesy I'm trying to sketch out the framework for
the new-style vector shuffle lowering in the context of the AVX2 ISA.

llvm-svn: 218228
2014-09-22 00:32:15 +00:00
Chandler Carruth 5d45962b2c [x86] Teach the new vector shuffle lowering how to cleverly lower single
input v8f32 shuffles which are not 128-bit lane crossing but have
different shuffle patterns in the low and high lanes. This removes most
of the extract/insert traffic that was unnecessary and is particularly
good at lowering cases where only one of the two lanes is shuffled at
all.

I've also added a collection of test cases with undef lanes because this
lowering is somewhat more sensitive to undef lanes than others.

llvm-svn: 218226
2014-09-21 23:46:13 +00:00
Chandler Carruth b195e860f9 [x86] Add a bunch of test cases where we have different shuffle patterns
in the high and low 128-bit lanes of a v8f32 vector.

No functionality change yet, but wanted to set up the baseline for my
next patch which will make these quite a bit better. =]

llvm-svn: 218224
2014-09-21 23:32:42 +00:00
Chandler Carruth b3125c7522 [x86] Teach the new vector shuffle lowering to re-use the SHUFPS
lowering when it can use a symmetric SHUFPS across both 128-bit lanes.

This required making the SHUFPS lowering tolerant of other vector types,
and adjusting our canonicalization to canonicalize harder.

This is the last of the clever uses of symmetry I've thought of for
v8f32. The rest of the tricks I'm aware of here are to work around
asymmetry in the mask.

llvm-svn: 218216
2014-09-21 13:35:14 +00:00
Chandler Carruth 33eda72802 [x86] Teach the new vector shuffle lowering the basics about insertion
of a single element into a zero vector for v4f64 and v4i64 in AVX.
Ironically, there is less to see here because xor+blend is so crazy fast
that we can't really beat that to zero the high 128-bit lane.

llvm-svn: 218214
2014-09-21 12:49:46 +00:00
Chandler Carruth 43f5974ea0 [x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and
UNPCKHPS with AVX vectors by recognizing those patterns when they are
repeated for both 128-bit lanes.

With this, we now generate the exact same (really nice) code for
Quentin's avx_test_case.ll which was the most significant regression
reported for the new shuffle lowering. In fact, I'm out of specific test
cases for AVX lowering, the rest were AVX2 I think. However, there are
a bunch of pretty obvious remaining things to improve with AVX...

llvm-svn: 218213
2014-09-21 12:20:44 +00:00
Chandler Carruth 78f4798913 [x86] Add test cases for UNPCK instructions with v8f32 AVX vectors in
preparation for enhancing their support in the new vector shuffle
lowering.

llvm-svn: 218212
2014-09-21 12:13:11 +00:00
Chandler Carruth 88404c4f9b [x86] Begin teaching the new vector shuffle lowering among the most
important bits of cleverness: to detect and lower repeated shuffle
patterns between the two 128-bit lanes with a single instruction.

This patch just teaches it how to lower single-input shuffles that fit
this model using VPERMILPS. =] There is more that needs to happen here.

llvm-svn: 218211
2014-09-21 12:01:19 +00:00
Chandler Carruth 83252ac8f4 [x86] Regenerate this test case now that I've improved my script for
generating the test cases to format things more consistently and
actually catch all the operand sequences that should be elided in favor
of the asm comments. No actual changes here.

llvm-svn: 218210
2014-09-21 11:51:33 +00:00
Chandler Carruth e81bfbada9 [x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct
VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows
down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the
reciprocal throughput on Ivy Bridge CPUs much like it does everywhere
for 128-bits. There isn't a downside, so just eagerly use this
instruction when it suffices.

llvm-svn: 218208
2014-09-21 11:17:55 +00:00
Chandler Carruth 6aea21df8e [x86] Add some more comprehensive tests for v4f64 blending.
llvm-svn: 218207
2014-09-21 11:12:19 +00:00
Chandler Carruth 908afb56c0 [x86] Re-generate a bunch of the v4f64 test cases with my new script.
This expands the integer cases to cover the fact that AVX2 moves their
lane-crossing shuffles into the integer domain. It also adds proper
support for AVX2 run lines and the "ALL" group when it doesn't matter.

llvm-svn: 218206
2014-09-21 11:07:41 +00:00
Chandler Carruth 293327ddcd [x86] Teach the new vector shuffle lowering the first step toward more
actual support for complex AVX shuffling tricks. We can do independent
blends of the low and high 128-bit lanes of an avx vector, so shuffle
the inputs into place and then do the blend at 256 bits. This will in
many cases remove one blend instruction.

The next step is to permute the low and high halves in-place rather than
extracting them and re-inserting them.

llvm-svn: 218202
2014-09-21 09:35:22 +00:00
David Majnemer 48227a3759 MC: Support aligned COMMON symbols for COFF
link.exe:
Fuzz testing has shown that COMMON symbols with size > 32 will always
have an alignment of at least 32 and all symbols with size < 32 will
have an alignment of at least the largest power of 2 less than the size
of the symbol.

binutils:
The BFD linker essentially follows the link.exe behavior but with
alignment 4 instead of 32.  The BFD linker also supports an extension to
COFF which adds an -aligncomm argument to the .drectve section which
permits specifying a precise alignment for a variable but MC currently
doesn't support editing .drectve in this way.

With all of this in mind, we decide to play a little trick: we can
ensure that the alignment will be respected by bumping the size of the
global to its alignment.

llvm-svn: 218201
2014-09-21 09:18:07 +00:00
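
A minimal sketch of the size-bumping trick described above (hypothetical
helper, mirroring the commit's description rather than MC's actual code):
since these linkers derive alignment from size, rounding the size up to the
alignment makes the size-based rule imply the requested alignment.

    #include <cassert>
    #include <cstdint>

    // Bump a COMMON symbol's size so that a size-based alignment rule
    // (as used by link.exe and the BFD linker) yields at least 'Align'.
    // 'Align' is assumed to be a power of two.
    static uint64_t bumpCommonSize(uint64_t Size, uint64_t Align) {
      assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two");
      return Size < Align ? Align : Size;
    }

    int main() {
      assert(bumpCommonSize(5, 16) == 16);  // small symbol, big alignment
      assert(bumpCommonSize(64, 16) == 64); // already large enough
      return 0;
    }
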
Chandler Carruth 8ff73c0170 [x86] Add some more test cases covering specific blend patterns.
llvm-svn: 218200
2014-09-21 09:01:26 +00:00
Chandler Carruth 7a6108d652 [x86] Add the beginnings of some tests for our v8f32 shuffle lowering
under AVX.

This really just documents the current state of the world. I'm going to
try to flesh it out to cover any test cases I plan to improve prior to
improving them so that the delta made by changes is actually visible to
code reviewers.

This is made easier by the fact that I now have a script to automate the
process of producing test cases including the check lines. =]

llvm-svn: 218199
2014-09-21 08:49:27 +00:00
Chandler Carruth a454812ac8 [x86] Teach the new vector shuffle lowering to use VPERMILPD for
single-input shuffles with doubles. This allows them to fold memory
operands into the shuffle, etc. This is just the analog to the v4f32
case in my prior commit.

llvm-svn: 218193
2014-09-20 22:09:27 +00:00
Chandler Carruth aa5b798ae7 [x86] Add an AVX run to the 128-bit v2 tests, teach them to have
a generic SSE and AVX mode in addition to a specific AVX1 test path, and
flesh out the AVX tests.

llvm-svn: 218192
2014-09-20 21:26:41 +00:00
David Majnemer fb83977538 Update tests which broke from r218189
llvm-svn: 218191
2014-09-20 21:18:43 +00:00
Chandler Carruth 6f80abac4e [x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS
instruction for single-vector floating point shuffles. This in turn
allows the shuffles to fold a load into the instruction which is one of
the common regressions hit with the new shuffle lowering.

llvm-svn: 218190
2014-09-20 20:52:07 +00:00
David Majnemer 7d0dc3ef18 MC: Fix MCSectionCOFF::PrintSwitchToSection
We had a few bugs:
- We were considering the GVKind instead of just looking at the section
  characteristics
- We would never print out 'y' when a section was meant to be unreadable
- We would never print out 's' when a section was meant to be shared
- We translated IMAGE_SCN_MEM_DISCARDABLE to 'n' when it should've meant
  IMAGE_SCN_LNK_REMOVE

llvm-svn: 218189
2014-09-20 20:40:50 +00:00
Chandler Carruth 78a761ce8c [x86] Start moving to a fancier check syntax to reduce the need for
duplication of check lines. The idea is to have broad sets of
compilation modes that will frequently diverge without having to always
and immediately explode to the precise ISA feature set.

While this already helps due to VEX encoded differences, it will help
much more as I teach the new shuffle lowering about more of the new VEX
encoded instructions which can still be used to implement 128-bit
shuffles.

llvm-svn: 218188
2014-09-20 18:36:39 +00:00
David Majnemer b8dbebb31c MC: Treat ReadOnlyWithRel and ReadOnlyWithRelLocal as ReadOnly for COFF
A problem with our old behavior becomes observable under x86-64 COFF
when we need a read-only GV which has an initializer which is referenced
using a relocation: we would mark the section as writable.  Marking the
section as writable interferes with section merging.

This fixes PR21009.

llvm-svn: 218179
2014-09-20 07:31:46 +00:00
Chandler Carruth 8c4cccd4aa [x86] Teach the v4f32 path of the new shuffle lowering to handle the
tricky case of single-element insertion into the zero lane of a zero
vector.

We can't just use the same pattern here as we do in every other vector
type because the general insertion logic can handle insertion into the
non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we
have INSERTPS that is a much better choice than the generic one for such
lowerings. But INSERTPS can do lots of other lowerings as well so
factoring its logic into the general insertion logic doesn't work very
well. We also can't just extract the core common part of the general
insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that
lower to MOVSS when they can) because VZEXT_MOVL is often *faster* than
a blend while INSERTPS is slower! So instead we do a restrictive
condition on attempting to use the generic insertion logic to narrow it
to those cases where VZEXT_MOVL won't need a shuffle afterward and thus
will do better than INSERTPS. Then we try blending. Then we go back to
INSERTPS.

This still doesn't generate perfect code for some silly reasons that can
be fixed by tweaking the td files for lowering VZEXT_MOVL to use
XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends
up in a register rather than a load from memory -- BLENDPSrr has twice
the reciprocal throughput of MOVSSrr. Don't you love this ISA?

llvm-svn: 218177
2014-09-20 04:15:22 +00:00
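
For readers unfamiliar with INSERTPS, a small standalone intrinsics demo of
what the instruction does (illustration only, unrelated to the patch's
code). The immediate encodes the source lane in bits 7:6, the destination
lane in bits 5:4, and a zero mask in bits 3:0; needs SSE4.1 (compile with
-msse4.1).

    #include <smmintrin.h> // SSE4.1
    #include <cstdio>

    int main() {
      __m128 A = _mm_setr_ps(1.f, 2.f, 3.f, 4.f);
      __m128 B = _mm_setr_ps(9.f, 8.f, 7.f, 6.f);
      // Copy lane 0 of B into lane 2 of A: imm = (0 << 6) | (2 << 4) | 0.
      __m128 R = _mm_insert_ps(A, B, 0x20);
      float Out[4];
      _mm_storeu_ps(Out, R);
      printf("%g %g %g %g\n", Out[0], Out[1], Out[2], Out[3]); // 1 2 9 4
      return 0;
    }
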
Chandler Carruth 00389f3ed9 [x86] Generalize the single-element insertion lowering to work with
floating point types and use it for both v2f64 and v2i64 single-element
insertion lowering.

This fixes the last non-AVX performance regression test case I've gotten
for the new vector shuffle lowering. There is obvious analogous
lowering for v4f32 that I'll add in a follow-up patch (because with
INSERTPS, v4f32 requires special treatment). After that, it's AVX stuff.

llvm-svn: 218175
2014-09-20 03:32:25 +00:00
David Majnemer f4dc456eef llvm-readobj: pretty-print special COFF section names
Print IMAGE_SYM_DEBUG and the like instead of (-2).

llvm-svn: 218172
2014-09-20 00:25:06 +00:00
Peter Collingbourne 975726345c Fix crash with an insertvalue that produces an empty object.
llvm-svn: 218171
2014-09-20 00:10:47 +00:00
Matt Arsenault de0253791c R600: Un-xfail a test which passes with pass disabled
llvm-svn: 218165
2014-09-19 23:02:20 +00:00
Matt Arsenault 5e5b242946 R600/SI: Un-xfail tests which work now
llvm-svn: 218164
2014-09-19 23:02:18 +00:00
Matt Arsenault a986554377 R600/SI: Un-xfail a test that works now
llvm-svn: 218162
2014-09-19 22:42:40 +00:00
Juergen Ributzka 92e8978e40 [FastIsel][AArch64] Fix a think-o in address computation.
When looking through sign/zero-extensions the code would always assume there is
such an extension instruction and use the wrong operand for the address.

There was also a minor issue in the handling of 'AND' instructions. I
accidentally used a 'cast' instead of a 'dyn_cast'.

llvm-svn: 218161
2014-09-19 22:23:46 +00:00
Chandler Carruth 0fc0c22fa9 [x86] Fully generalize the zext lowering in the new vector shuffle
lowering to support both anyext and zext and to custom lower for many
different microarchitectures.

Using this allows us to get *exactly* the right code for zext and anyext
shuffles in all the vector sizes. For v16i8, the improvement is *huge*.
The new SSE2 test case added here is one I refused to add before because
it was sooooo many instructions.

llvm-svn: 218143
2014-09-19 20:00:32 +00:00
Justin Bogner a829fde160 llvm-cov: Prevent a test from matching its own check lines
Since llvm-cov shows the source file in its output, be careful about
potentially matching the check lines themselves.

llvm-svn: 218138
2014-09-19 19:04:08 +00:00
David Blaikie db119544a2 Fix test case to be portable to different architectures.
llvm-svn: 218134
2014-09-19 18:31:25 +00:00
Matt Arsenault 4505f3a73d R600/SI: Fix test to prepare for scheduler
llvm-svn: 218131
2014-09-19 18:11:16 +00:00
David Blaikie 3a7ce252cc Omit DW_TAG_subprograms for subprograms without inlined subroutines when producing -gmlt data
To reduce the size of -gmlt data, skip the subprograms without any
inlined subroutines. We've now got the ability to make these
determinations in the backend (funnily enough - we added the flag so we
wouldn't produce ranges under -gmlt, but with this change we use the
flag, yet go back to producing ranges under -gmlt).

Instead, just produce CU ranges to inform the consumer which parts of
the code are described by this CU's line table. Tools could inspect the
line table directly to compute the range, but the CU ranges only seem to
be about 0.5% of object/executable size, so I'm not too worried about
teaching llvm-symbolizer that trick just yet - it's certainly a possible
piece of future work.

Update an llvm-symbolizer test just to demonstrate that this schema is
acceptable there (if it wasn't, the compiler-rt tests would catch this,
but good to have an in-llvm-tree test for llvm-symbolizer's behavior
here)

Building the clang binary with -gmlt with this patch reduces the total
size of object files by 5.1% (5.56% without ranges) without compression
and the executable by 4.37% (4.75% without ranges).

llvm-svn: 218129
2014-09-19 17:03:16 +00:00
Hal Finkel 62ac736faa Optionally enable more-aggressive FMA formation in DAGCombine
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA while leaving the FMUL in place
can still allow for higher overall throughput and decreased critical-path length.

Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.

Patch by Olivier Sallenave, thanks!

llvm-svn: 218120
2014-09-19 11:42:56 +00:00
Chandler Carruth 8a6536d4b2 [x86] Recognize that we can use duplication to widen v16i8 shuffles due
to undef lanes as well as defined widenable lanes. This dramatically
improves the lowering we use for undef-shuffles in a zext-ish pattern
for SSE2.

llvm-svn: 218115
2014-09-19 09:45:21 +00:00
Chandler Carruth 662b6d84e7 [x86] Actually test the SSE2 lowering for most of the zext-ish shuffles.
Not sure why I only did SSSE3 here. Also, I've left out some of the SSE2
ones because the shuffles are so absurd it's not worth transcribing
them. Will try to fix them to be sane and then check them.

llvm-svn: 218114
2014-09-19 08:51:06 +00:00
Chandler Carruth 2e275142cd [x86] Teach the new vector shuffle lowering to also use pmovzx for v4i32
shuffles that are zext-ing.

Not a lot to see here; the undef lane variant is better handled with
pshufd, but this improves the actual zext pattern.

llvm-svn: 218112
2014-09-19 08:37:44 +00:00
Justin Bogner 13ba23bb79 llvm-cov: Fix dropped lines when filters were applied
Uncovered lines in the middle of a covered region weren't being shown
when filtering to a particular function.

llvm-svn: 218109
2014-09-19 08:13:16 +00:00
Chandler Carruth 398ba9a018 [x86] Add a dedicated lowering path for zext-compatible vector shuffles
to the new vector shuffle lowering code.

This allows us to emit PMOVZX variants consistently for patterns where
it is a viable lowering. This instruction is both fast and allows us to
fold loads into it. This only hooks the new lowering up for i16 and i8
element widths, mostly so I could manage the change to the tests. I'll
add the i32 one next, although it is significantly less interesting.

One thing to note is that we already had some tests for these patterns
but those tests had far less horrible instructions. The problem is that
those tests weren't checking the strict start and end of the instruction
sequence. =[ As a consequence something changed in the lowering making
us generate *TERRIBLE* code for these patterns in SSE2 through SSSE3.
I've consolidated all of the tests and spelled out the madness that we
currently emit for these shuffles. I'm going to try to figure out what
has gone wrong here.

llvm-svn: 218102
2014-09-19 06:07:49 +00:00
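
A small standalone intrinsics demo of the PMOVZX pattern being matched
here (illustration only, not part of the patch): PMOVZXBW zero-extends the
low eight bytes of its source to eight 16-bit words in a single
instruction (SSE4.1, compile with -msse4.1).

    #include <smmintrin.h> // SSE4.1
    #include <cstdint>
    #include <cstdio>

    int main() {
      __m128i Bytes = _mm_setr_epi8(1, 2, 3, 4, 5, 6, 7, (char)255,
                                    0, 0, 0, 0, 0, 0, 0, 0);
      __m128i Words = _mm_cvtepu8_epi16(Bytes); // PMOVZXBW
      uint16_t Out[8];
      _mm_storeu_si128((__m128i *)Out, Words);
      for (int I = 0; I < 8; ++I)
        printf("%u ", Out[I]); // 1 2 3 4 5 6 7 255
      return 0;
    }
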
Jiangning Liu ffbc690933 Optimize sext/zext insertion algorithm in back-end.
With this optimization, we no longer always insert a zext for values crossing
basic blocks; instead, we insert a sext when the users of a value that crosses
basic blocks prefer a signed predicate.

llvm-svn: 218101
2014-09-19 05:30:35 +00:00
David Blaikie 03c3dbeb62 Omit DW_AT_frame_base under -gmlt for size
llvm-svn: 218100
2014-09-19 04:55:05 +00:00
David Blaikie 73b65d236c Omit all the extra static attributes on subprograms in -gmlt
This omission will be done in a fancier manner once we're dealing with
"put gmlt in the skeleton CUs under fission" - it'll have to be
conditional on the kind of CU we're emitting into (skeleton or gmlt).

llvm-svn: 218098
2014-09-19 04:30:36 +00:00
Hans Wennborg c0f0c511db Fix an it's vs. its typo.
llvm-svn: 218093
2014-09-19 01:14:56 +00:00
Matt Arsenault 46cbc4367b R600: Better fix for bug 20982
Just do the left shift as unsigned to avoid the UB.

llvm-svn: 218092
2014-09-19 00:42:06 +00:00
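
The fix pattern described above ("do the left shift as unsigned") in a
standalone C++ sketch (illustration only): left-shifting a negative signed
value is undefined behaviour, so shift the unsigned representation and
convert back.

    #include <cstdint>
    #include <cstdio>

    // Shifting a negative signed value left is UB; shifting its unsigned
    // representation is well defined. The conversion back to int64_t is
    // implementation-defined before C++20 and two's complement since.
    static int64_t shiftLeftSafe(int64_t V, unsigned Amt) {
      return (int64_t)((uint64_t)V << Amt);
    }

    int main() {
      printf("%lld\n", (long long)shiftLeftSafe(-3, 4)); // -48
      return 0;
    }
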
Chandler Carruth be58fd2f2d [x86] Extend this test to cover SSE4.1. Nothing interesting here, but
paves the way for subsequent changes.

llvm-svn: 218091
2014-09-19 00:30:24 +00:00
Peter Collingbourne 6b433e4d46 Try to fix i686-cygming bots.
llvm-svn: 218086
2014-09-18 22:56:00 +00:00
Peter Collingbourne 10039c02ea LTO: introduce object file-based on-disk module format.
This format is simply a regular object file with the bitcode stored in a
section named ".llvmbc", plus any number of other (non-allocated) sections.

One immediate use case for this is to accommodate compilation processes
which expect the object file to contain metadata in non-allocated sections,
such as the ".go_export" section used by some Go compilers [1], although I
imagine that in the future we could consider compiling parts of the module
(such as large non-inlinable functions) directly into the object file to
improve LTO efficiency.

[1] http://golang.org/doc/install/gccgo#Imports

Differential Revision: http://reviews.llvm.org/D4371

llvm-svn: 218078
2014-09-18 21:28:49 +00:00
Quentin Colombet 17799fedb7 [ARM] Do not perform a tail call when the caller returns several values.
The fix is slightly different than for x86 (see r216117) because the number of values
attached to a return can vary even for a single returned value (e.g., f64 yields
two returned values).

<rdar://problem/18352998>

llvm-svn: 218076
2014-09-18 21:17:50 +00:00
Robin Morisset 5349e8e532 Restore "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors"
Summary:
This patch was originally in D5304 (I could not find a way to reopen that revision).
It was accepted, commited and broke the build bots because the overloading of
the constructor of ArrayRef for braced initializer lists is not supported by all
toolchains. I then reverted it, and propose this fixed version that uses a plain
C array instead in makeDMB (that array is then converted implicitly to an
ArrayRef, but that is not behind an ifdef). Could someone confirm me whether
initialization lists for plain C arrays are supported by every toolchain used
to build llvm ? Otherwise I can just initialize the array in the old way:
args[0] = ...; .. ; args[5] = ...;

Below is the description of the original patch:
```
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if a cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.

Thanks to luqmana for having noticed this bug.
```

Test Plan: Added more cases to atomic-load-store.ll + make check-all

Reviewers: jfb, t.p.northover, luqmana

Subscribers: llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D5386

llvm-svn: 218066
2014-09-18 18:56:04 +00:00
Matt Arsenault 6462f94884 R600: Bug 20982 - Avoid undefined left shift of negative value
I'm not sure what the hardware actually does, so don't
bother trying to fold it for now.

llvm-svn: 218057
2014-09-18 15:52:26 +00:00
Chandler Carruth 9057fcaf82 [x86] Use PALIGNR for v4i32 and v2i64 blends when appropriate.
There is no purpose in using it for single-input shuffles as
pshufd is just as fast and doesn't tie the two operands. This removes
a substantial amount of wrong-domain blend operations in SSSE3 mode. It
also completes the usage of PALIGNR for integer shuffles and addresses
one of the test cases Quentin hit with the new vector shuffle lowering.

There is still the question of whether and when to use this for floating
point shuffles. It is faster than shufps or shufpd but in the integer
domain. I don't yet really have a good heuristic here for when to use
this instruction for floating point vectors.

llvm-svn: 218038
2014-09-18 09:00:25 +00:00
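
For readers unfamiliar with PALIGNR, a small standalone intrinsics demo of
the byte-wise rotation it performs (illustration only, unrelated to the
patch's code); needs SSSE3 (compile with -mssse3).

    #include <tmmintrin.h> // SSSE3
    #include <cstdint>
    #include <cstdio>

    int main() {
      // PALIGNR concatenates its operands and extracts 16 bytes at a byte
      // offset; with both operands equal it rotates the vector.
      __m128i A = _mm_setr_epi32(0, 1, 2, 3);
      __m128i R = _mm_alignr_epi8(A, A, 4); // rotate by one 32-bit lane
      int32_t Out[4];
      _mm_storeu_si128((__m128i *)Out, R);
      printf("%d %d %d %d\n", Out[0], Out[1], Out[2], Out[3]); // 1 2 3 0
      return 0;
    }
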
Chandler Carruth 0fe4928fbe [x86] Add an SSSE3 run and check mode to the 128-bit v2 tests of the new
vector shuffle lowering. This will be needed for up-coming palignr
tests.

llvm-svn: 218037
2014-09-18 08:33:04 +00:00
Juergen Ributzka 1d3a312e2d Revert "[FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ."
Reverting it until I have time to investigate a regression.

llvm-svn: 218035
2014-09-18 08:07:40 +00:00
Juergen Ributzka 0f3076785f Fix previous commit: [FastISel][AArch64] Simplify XALU multiplies.
When folding the intrinsic flag into the branch or select we also have to
consider whether the intrinsic got simplified, because that changes the
flag we have to check for.

llvm-svn: 218034
2014-09-18 07:26:26 +00:00
Juergen Ributzka 2964b832ef [FastISel][AArch64] Simplify XALU multiplies.
Simplify {s|u}mul.with.overflow to {s|u}add.with.overflow when possible.

llvm-svn: 218033
2014-09-18 07:04:54 +00:00
Juergen Ributzka 2fc851002b [FastISel][AArch64] Followup commit for 218031 to handle negative offsets too.
llvm-svn: 218032
2014-09-18 07:04:49 +00:00
Juergen Ributzka a33070c321 [FastISel][AArch64] Try to fold the offset into the add instruction when simplifying a memory address.
Small optimization in 'simplifyAddress'. When the offset cannot be encoded in
the load/store instruction, then we need to materialize the address manually.
The add instruction can encode a wider range of immediates than the load/store
instructions. This change tries to fold the offset into the add instruction
first before materializing the offset in a register.

llvm-svn: 218031
2014-09-18 05:40:47 +00:00
Juergen Ributzka 99b7758ba0 [FastISel][AArch64] Fold 'AND' instruction during the address computation.
The 'AND' instruction could be used to mask out the lower 32 bits of a register.
If this is done inside an address computation we might be able to fold the
instruction into the memory instruction itself.

and  x1, x1, #0xffffffff
ldrb x0, [x0, x1]          --->  ldrb x0, [x0, w1, uxtw]

llvm-svn: 218030
2014-09-18 05:40:41 +00:00
Chandler Carruth e0d77ef053 [x86] Add an SSSE3 run to the v4 shuffle test.
llvm-svn: 218028
2014-09-18 04:38:32 +00:00
Saleem Abdulrasool bfdfb14a8f ARM: prevent crash on ELF directives on COFF
Certain directives are unsupported on Windows (some of which could/should be
supported).  We would not diagnose the use but rather crash during the emission
as we try to access the Target Streamer.  Add an assertion to prevent creating a
NULL reference (which is not permitted under C++) as well as a test to ensure
that we can diagnose the disabled directives.

llvm-svn: 218014
2014-09-18 04:28:29 +00:00
Chandler Carruth 867930aadf [x86] Initial step of teaching the new vector shuffle lowering about
PALIGNR. This just adds it to the v8i16 and v16i8 lowering steps where
it is completely unmatched. It also introduces the logic for detecting
rotation shuffle masks even in the presence of single input or blend
masks and arbitrarily undef lanes.

I've added fairly comprehensive tests for the matching logic in v8i16
because the tests at that size are much easier to write and manage.

I've not checked the SSE2 code generated for these tests because the
code is *horrible*. It is absolute madness. Testing it will just make
the test brittle without giving any interesting improvements in the
correctness confidence.

llvm-svn: 218013
2014-09-18 04:11:29 +00:00
Saleem Abdulrasool 8c61c6c0f9 ARM: use a more precise check for MachO
Rather than relying on support for a specific directive to determine if we are
targeting MachO, explicitly check the output format.

As an additional bonus, cleanup the caret diagnostic for the non-MachO case and
avoid the spurious error caused by not discarding the statement.

llvm-svn: 218012
2014-09-18 03:49:55 +00:00
Juergen Ributzka c35fb03661 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).

llvm-svn: 218010
2014-09-18 02:44:13 +00:00
Samuel Antao 61570df715 Fix FastISel bug in boolean returns for PowerPC.
For PPC targets, FastISel does not take the sign extension information into
account when selecting return instructions whose operands are constants. A
consequence of this is that the return of boolean values is not correct. This
patch fixes the problem by evaluating the sign extension information also for
constants, forwarding this information to PPCMaterializeInt which takes this
information to drive the sign extension during the materialization.

llvm-svn: 217993
2014-09-17 23:25:06 +00:00
Juergen Ributzka f6430314b4 [FastISel][AArch64] Custom lower sdiv by power-of-2.
Emit an optimized instruction sequence for sdiv by power-of-2 depending on the
exact flag.

This fixes rdar://problem/18224511.

llvm-svn: 217986
2014-09-17 21:55:55 +00:00
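
For reference, the classic bias-and-shift expansion of sdiv by a power of
two, as a scalar C++ sketch (illustration only, not the FastISel code);
when the IR division carries the `exact` flag, the bias addition can be
dropped and a plain arithmetic shift suffices.

    #include <cstdio>

    // Signed division by 2^K, rounding toward zero, without a divide:
    // add a bias of (2^K - 1) for negative inputs, then arithmetic shift.
    // (X >> 31 relies on arithmetic right shift of negative ints, which is
    // implementation-defined but universal in practice.)
    static int sdivPow2(int X, unsigned K) {
      int Bias = (X >> 31) & ((1 << K) - 1);
      return (X + Bias) >> K;
    }

    int main() {
      printf("%d %d\n", sdivPow2(-7, 1), sdivPow2(7, 1)); // -3 3
      return 0;
    }
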
Nick Kledzik 3e95fa431e [llvm-objdump] clean up test cases now that build bots are green
llvm-svn: 217985
2014-09-17 21:53:07 +00:00
Justin Bogner 5cbed6e09e llvm-cov: Push some more debug output into the View (NFC)
llvm-svn: 217984
2014-09-17 21:48:52 +00:00
Rafael Espindola 51bd8ee309 Internalize common symbols when we can.
This fixes pr20974.

llvm-svn: 217981
2014-09-17 20:41:13 +00:00
Juergen Ributzka c611d72754 [FastISel][AArch64] Simplify mul to shift when possible.
This is related to rdar://problem/18369687.

llvm-svn: 217980
2014-09-17 20:35:41 +00:00
Alexey Samsonov 7bddb0a56a Exclude known and bugzilled failures from UBSan bootstrap
llvm-svn: 217979
2014-09-17 20:17:52 +00:00
Juergen Ributzka 3871c69422 [FastISel][AArch64] Fold mul into add/sub and logical operations.
Try to fold the multiply into the add/sub or logical operations (when
possible).

This is related to rdar://problem/18369687.

llvm-svn: 217978
2014-09-17 19:51:38 +00:00
Juergen Ributzka 22d4cd0a4f [FastISel][AArch64] Fold mul into the address computation of memory operations.
Teach 'computeAddress' to also fold multiplies into the address computation
(when possible).

This fixes rdar://problem/18369443.

llvm-svn: 217977
2014-09-17 19:19:31 +00:00
Robin Morisset bf26f8fd56 Revert "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors"
It is breaking the build on the buildbots but works fine on my machine, so I am
reverting while trying to understand what happens (it appears to depend on the
compiler used to build; I probably used a C++11 feature that is not perfectly
supported by some of the buildbots).

This reverts commit feb3176c4d006f99af8b40373abd56215a90e7cc.

llvm-svn: 217973
2014-09-17 18:09:13 +00:00
Juergen Ributzka d8e30c0db8 [FastISel][AArch64] Fold compare with zero and branch into CBZ and CBNZ.
This takes advantage of the CBZ and CBNZ instructions to further optimize the
common null check pattern into a single instruction.

This is related to rdar://problem/18358882.

llvm-svn: 217972
2014-09-17 18:05:34 +00:00
Juergen Ributzka fb3e14375a [FastISel][AArch64] Improve branch selection to support all FP conditions.
This adds the last two missing floating-point condition codes (FCMP_UEQ and
FCMP_ONE) also to the branch selection. In these two cases an additional branch
instruction is required.

This also adds unit tests to check all the different condition codes.

This is related to rdar://problem/18358882.

llvm-svn: 217966
2014-09-17 17:46:47 +00:00
Robin Morisset 1c8a457575 [ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors
Summary:
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if a cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.

Thanks to luqmana for having noticed this bug.

Test Plan: Added more cases to atomic-load-store.ll + make check-all

Reviewers: jfb, t.p.northover, luqmana

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5304

llvm-svn: 217965
2014-09-17 17:41:16 +00:00
Matt Arsenault 02dc26529e R600/SI: Change formatting of printed FP immediates
Only 1 decimal place should be printed for inline immediates.
Other constants should be hex constants.

Does not include f64 tests because folding those inline
immediates currently does not work.

llvm-svn: 217964
2014-09-17 17:32:13 +00:00
Chad Rosier 307b50b0f6 [IndVarSimplify] Partially revert r217953 to see if this fixes the bots.
Specifically, disable widening of unsigned compare instructions.

llvm-svn: 217962
2014-09-17 16:35:09 +00:00
Chad Rosier bb99f40530 [IndVarSimplify] Widen loop compare instructions.
This improves other optimizations such as LSR.  A sext may be added to the
compare's other operand, but this can often be hoisted outside of the loop.

llvm-svn: 217953
2014-09-17 14:10:33 +00:00
Andrea Di Biagio 5b92b4971a [InstCombine] Fix wrong folding of constant comparison involving ashr and negative quantities (PR20945).
Example:
define i1 @foo(i32 %a) {
  %shr = ashr i32 -9, %a
  %cmp = icmp ne i32 %shr, -5
  ret i1 %cmp
}

Before this fix, the instruction combiner wrongly thought that %shr
could have never been equal to -5. Therefore, %cmp was always folded to 'true'.
However, when %a is equal to 1, then %cmp evaluates to 'false'. Therefore,
in this example, it is not valid to fold %cmp to 'true'.
The problem was only affecting the case where the comparison was between
negative quantities where one of the quantities was obtained from arithmetic
shift of a negative constant.

This patch fixes the problem with the wrong folding (fixes PR20945).
With this patch, the 'icmp' from the example is now simplified to a
comparison between %a and 1. This still allows us to get rid of the arithmetic
shift (%shr).

llvm-svn: 217950
2014-09-17 11:32:31 +00:00
Toma Tabacu 351b2feeb3 [mips] Add assembler support for the .set nodsp directive.
Summary: This directive is used to tell the assembler to reject DSP-specific instructions.

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D5142

llvm-svn: 217946
2014-09-17 09:01:54 +00:00
Pavel Chupin 37b65d81dd [x32] Fix function indirect calls
Summary: Zero-extend register to 64-bit for callq/jmpq.

Test Plan: 3 tests added

Reviewers: nadav, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5355

llvm-svn: 217942
2014-09-17 07:09:23 +00:00
David Majnemer b435a4214e InstSimplify: Don't allow (x srem y) urem y -> x srem y
Let's consider the case where:
%x i16 = 32768
%y i16 = 384

%x srem %y = 65408
(%x srem %y) urem %y = 128

llvm-svn: 217939
2014-09-17 04:16:35 +00:00
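
The arithmetic above checks out in a quick standalone C++ program
(illustration only):

    #include <cstdint>
    #include <cstdio>

    int main() {
      int16_t X = INT16_MIN; // bit pattern 32768 when viewed as u16
      int16_t Y = 384;
      int16_t Srem = X % Y;                         // srem: -128 (u16 65408)
      uint16_t Urem = (uint16_t)Srem % (uint16_t)Y; // urem: 128
      printf("%u %u\n", (unsigned)(uint16_t)Srem, (unsigned)Urem); // 65408 128
      return 0;
    }
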
David Majnemer ac717f0972 InstSimplify: ((X % Y) % Y) -> (X % Y)
Patch by Sonam Kumari!

Differential Revision: http://reviews.llvm.org/D5350

llvm-svn: 217937
2014-09-17 03:34:34 +00:00
Nick Kledzik 3006130a8e [llvm-objdump] properly use c_str() with format("%s"). Improve getLibraryShortNameByIndex() error handling.
llvm-svn: 217930
2014-09-17 00:25:22 +00:00
Quentin Colombet ac55b15bf4 [CodeGenPrepare][AddressingModeMatcher] The promotion mechanism was expecting
instructions when truncate, sext, or zext were created. Fix that.

llvm-svn: 217926
2014-09-16 22:36:07 +00:00
Nick Kledzik 53a80d3a46 tweak test case for debugging bot
llvm-svn: 217906
2014-09-16 21:29:54 +00:00
Kevin Enderby 98c9accace Hookup the MCSymbolizer to llvm-objdump’s disassembly for Mach-O files.
The first step done in this commit is to flesh out enough of the
SymbolizerGetOpInfo() routine to symbolicate an X86_64 hello world .o and
its loading of the literal string and call to printf.  The code to
symbolicate the X86_64_RELOC_SUBTRACTOR relocation is also added, along
with a test to show a slightly more complicated case.

Next will be to flesh out enough of SymbolizerSymbolLookUp() to get the
literal string “Hello world” printed as a comment on the instruction that
loads the pointer to it.

llvm-svn: 217893
2014-09-16 18:00:57 +00:00
Adam Nemet e5a07167f5 [TableGen] Fully resolve class-instance values before defs in multiclasses
By class-instance values I mean 'Class<Arg>' in 'Class<Arg>.Field' or in
'Other<Class<Arg>>' (syntactically a SimpleValue).  This is to differentiate
from unnamed/anonymous record definitions (syntactically an ObjectBody) which
are not affected by this change.

Consider the testcase:

    class Struct<int i> {
      int I = !shl(i, 1);
      int J = !shl(I, 1);
    }

    class Class<Struct s> {
        int Class_J = s.J;
    }

    multiclass MultiClass<int i> {
      def Def : Class<Struct<i>>;
    }

    defm Defm : MultiClass<2>;

Before this fix, DefmDef.Class_J yields !shl(I, 1) instead of 8.

This is the sequence of events.  We start with this:

    multiclass MultiClass<int i> {
      def Def : Class<Struct<i>>;
    }

During ParseDef the anonymous object for the class-instance value is created:

    multiclass Multiclass<int i> {
      def anonymous_0 : Struct<i>;

      def Def : Class<NAME#anonymous_0>;
    }

Then class Struct<i> is added to anonymous_0.  Also Class<NAME#anonymous_0> is
added to Def:

    multiclass Multiclass<int i> {
      def anonymous_0 {
        int I = !shl(i, 1);
        int J = !shl(I, 1);
      }

      def Def {
        int Class_J = NAME#anonymous_0.J;
      }
    }

So far so good but then we move on to instantiating this in the defm
by substituting the template arg 'i'.

This is how the anonymous prototype looks after fully instantiating.

    defm Defm = {
      def Defmanonymous_0 {
         int I = 4;
         int J = !shl(I, 1);
      }

Note that we only resolved the reference to the template arg.  The
non-template-arg reference in 'J' has not been resolved yet.

Then we go on to instantiating the Def prototype:

      def DefmDef {
         int Class_J = NAME#anonymous_0.J;
      }

Which is resolved to Defmanonymous_0.J and then to !shl(I, 1).

When we fully resolve each record in a defm, Defmanonymous_0.J does get set
to 8 but that's too late for its use.

The patch adds a new attribute to the Record class that indicates that this
def is actually a class-instance value that may be *used* by other defs in a
multiclass.  (This is unlike regular defs which don't reference each other and
thus can be resolved independently.)  They are then fully resolved before the
other defs while the multiclass is instantiated.

I added vg_leak to the new test.  I am not sure if this is necessary but I
don't think I have a way to test it.  I can also check in without the XFAIL
and let the bots test this part.

Also tested that X86.td.expanded and AArch64.td.expanded were unchanged before
and after this change.  (The issue triggering this problem is a WIP patch.)

Part of <rdar://problem/17688758>

llvm-svn: 217886
2014-09-16 17:14:13 +00:00
Toma Tabacu 65f1057191 [mips] Improve the error messages given by MipsAsmParser.
Summary: Changed error messages to be more informative and to resemble other clang/llvm error messages (first letter is lower case, no ending punctuation) and updated corresponding tests.

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D5065

llvm-svn: 217873
2014-09-16 15:00:52 +00:00
Toma Tabacu 18227e6f20 [mips] Move 32-bit ADDiu instruction alias from Mips64InstrInfo.td to MipsInstrInfo.td.
Patch by Vasileios Kalintiris.

Differential Revision: http://reviews.llvm.org/D5244

llvm-svn: 217868
2014-09-16 10:19:03 +00:00
Toma Tabacu 25cdd222b0 [mips] Marked the ADDi instruction aliases as not available in Mips32R6 and Mips64R6.
Patch by Vasileios Kalintiris.

Differential Revision: http://reviews.llvm.org/D5242

llvm-svn: 217867
2014-09-16 09:26:09 +00:00
Tilmann Scheller 40fc9595c8 [InstCombine] Remove redundant test case.
Patch by Sonam Kumari!

Differential Revision: http://reviews.llvm.org/D5284

llvm-svn: 217865
2014-09-16 08:50:10 +00:00
Elena Demikhovsky 27012478d2 AVX-512: added cost for some AVX-512 instructions
llvm-svn: 217863
2014-09-16 07:57:37 +00:00
Nick Kledzik c17c8093db tweak test case to help build bot
llvm-svn: 217860
2014-09-16 04:51:38 +00:00
Hal Finkel cc4f31d3d7 Fix BasicTTI::getCmpSelInstrCost to deal with illegal vector types
The default implementation of getCmpSelInstrCost, which provides the cost of
icmp/fcmp/select instructions, did not deal sensibly with illegal vector types
that were scalarized. We'd ask for the legalization cost of the vector type,
which would return something like (4, f64) given an input of <4 x double>, and
we'd then check the TLI status of the ISD opcode on that scalar type. This would
result in querying (ISD::VSELECT, f64), for example. Amusingly enough,
ISD::VSELECT on scalar types is marked as Legal by default (as with most other
operations), and most backends never change this because VSELECT is never
generated on scalars. However, seeing the resulting operation as Legal, we'd
neglect to add the scalarization cost before returning. The result is that we'd
grossly under-estimate the cost of cmps/selects on illegal vector types.

Now, if type legalization clearly results in scalarization, we skip the early
return and add the scalarization cost.

llvm-svn: 217859
2014-09-16 04:35:50 +00:00
David Majnemer 2cbc13878f yaml2obj: Support bigobj
Teach yaml2obj how to make a bigobj COFF file.  Like the rest of LLVM,
we automatically decide whether or not to use regular COFF or bigobj
COFF on the fly depending on how many sections the resulting object
would have.

This ends the task of adding bigobj support to LLVM.

N.B. This was tested by forcing yaml2obj to be used in bigobj mode
regardless of the number of sections.  While a dedicated test was
written, the smallest I could make it was 36 MB (!) of yaml and it still
took a significant amount of time to execute on a powerful machine.

llvm-svn: 217858
2014-09-16 03:52:46 +00:00
Nick Kledzik c1a750bba6 tweak test case to help solve why failing on one build bot
llvm-svn: 217856
2014-09-16 02:33:36 +00:00
Nick Kledzik 56ebef45ef [llvm-objdump] for mach-o add -bind, -lazy-bind, and -weak-bind options
This finishes the ability of llvm-objdump to print out all information from
the LC_DYLD_INFO load command.

The -bind option prints out symbolic references that dyld must resolve 
immediately.

The -lazy-bind option prints out symbolic references that are lazily resolved on
first use.

The -weak-bind option prints out information about symbols which dyld must
try to coalesce across images.

llvm-svn: 217853
2014-09-16 01:41:51 +00:00
Juergen Ributzka 59e631c728 [FastISel][AArch64] Add vector support to argument lowering.
Lower the first 8 vector arguments too.

llvm-svn: 217850
2014-09-16 00:25:30 +00:00
Chandler Carruth f845e89425 [x86] As a follow-up to r217819, don't check for VSELECT legality now
that we don't use VSELECT and directly emit an addsub synthetic node.
Also remove a stale comment referencing VSELECT.

The test case is updated to use 'core2' which only has SSE3, not SSE4.1,
and it still passes. Previously it would not because we lacked
sufficient blend support to legalize the VSELECT.

llvm-svn: 217849
2014-09-16 00:24:42 +00:00
Chandler Carruth de5f2b356b [x86] Add the beginnings of a proper DAG combine to match ADDSUBPS and
ADDSUBPD nodes out of blends of adds and subs.

This allows us to actually form these instructions with SSE3 rather than
only forming them when we had both SSE3 for the ADDSUB instructions and
SSE4.1 for the blend instructions. ;] Kind-of important.

I've adjusted the CPU requirements on one of the tests to demonstrate
this kicking in nicely for an SSE3 cpu configuration.

llvm-svn: 217848
2014-09-16 00:15:20 +00:00
Juergen Ributzka f693787ed0 [FastISel][AArch64] Add missing test case for previous commit.
This adds the missing test case for the previous commit:
Allow handling of vectors during return lowering for little endian machines.

Sorry for the noise.

llvm-svn: 217847
2014-09-15 23:47:57 +00:00
Juergen Ributzka 993224a553 [FastISel][AArch64] Lower sin/cos/pow to runtime lib calls.
Also lower sin/cos/pow to runtime lib calls.

This fixes rdar://problem/18343468.

llvm-svn: 217839
2014-09-15 22:33:06 +00:00
Justin Bogner 92bb302314 llvm-cov: Make debug output more consistent
This changes the debug output of the llvm-cov tool to consistently
write to stderr, and moves the highlighting output closer to where
it's relevant.

llvm-svn: 217838
2014-09-15 22:23:29 +00:00
Justin Bogner 0b3614f806 llvm-cov: Fix an issue with showing regions but not counts
In r217746, though it was supposed to be NFC, I broke llvm-cov's
handling of showing regions without showing counts. This should've
shown up in the existing tests, except they were checking debug output
that was displayed regardless of what was actually output. I've moved
the relevant debug output to a more appropriate place so that the
tests catch this kind of thing.

llvm-svn: 217835
2014-09-15 22:12:28 +00:00
Rafael Espindola 9dd2d5810f Add back tests for empty function in SPARC and PowerPC.
llvm-svn: 217834
2014-09-15 22:11:07 +00:00
Juergen Ributzka afa034fb61 [FastISel][AArch64] Add lowering support for frem.
This lowers frem to a runtime libcall inside fast-isel.

The test case also checks the CallLoweringInfo bug that was exposed by this
change.

This fixes rdar://problem/18342783.

llvm-svn: 217833
2014-09-15 22:07:49 +00:00
Juergen Ributzka 8984f48d89 [FastISel][AArch64] Improve floating-point compare support.
Add support for the last two missing fcmp condition codes: UEQ and ONE.

This fixes rdar://problem/18341575.

llvm-svn: 217823
2014-09-15 20:47:16 +00:00
Reed Kotler 32be74b178 Add mips32 r1 to the list of supported targets for Mips fast-isel
Summary:
Expand list of supported targets for Mips to include mips32 r1.
Previously it only included r2. More patches are coming where there is
a difference, but in the current patches as pushed upstream, r1 and r2
are equivalent.

Test Plan:
simplestorefp1.ll

add new build bots at mips to test this flavor at both -O0 and -O2

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D5306

llvm-svn: 217821
2014-09-15 20:30:25 +00:00
NAKAMURA Takumi 33d585cb25 llvm/test/CodeGen/X86/peephole-fold-movsd.ll: Relax an expression for win32.
llvm-svn: 217806
2014-09-15 19:00:31 +00:00
Rafael Espindola 4235c2dade Add a triple to fix the bots.
llvm-svn: 217805
2014-09-15 18:54:41 +00:00
Rafael Espindola 6865d6f08a Fix a lot of confusion around inserting nops on empty functions.
On MachO, and MachO only, we cannot have a truly empty function since that
breaks the linker logic for atomizing the section.

When we are emitting a frame pointer, the presence of an unreachable will
create a cfi instruction pointing past the last instruction. This is perfectly
fine. The FDE information encodes the pc range it applies to. If some tool
cannot handle this, we should explicitly say which bug we are working around
and only work around it when it is actually relevant (not for ELF for example).

Given the unreachable we could omit the .cfi_def_cfa_register, but then
again, we could also omit the entire function prologue if we wanted to.

llvm-svn: 217801
2014-09-15 18:32:58 +00:00
Quentin Colombet 9dcb724d31 [CodeGenPrepare][AddressingModeMatcher] Fix a think-o for the sext(zext) -> zext promotion
introduced in r217629.
We were returning the old sext instead of the new zext as the promoted instruction!

Thanks Joerg Sonnenberger for the test case.

llvm-svn: 217800
2014-09-15 18:26:58 +00:00
Akira Hatanaka 760814a7e1 [X86] Fix a bug in X86's peephole optimization.
Peephole optimization was folding MOVSDrm, which is a zero-extending double
precision floating point load, into ADDPDrr, which is a SIMD add of two packed
double precision floating point values.

(before)
%vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21
%vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21; VR128:%vreg23,%vreg20,%vreg21

(after)
%vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20

X86InstrInfo::foldMemoryOperandImpl already had the logic that prevented this
from happening. However the check wasn't being conducted for loads from stack
objects. This commit factors out the logic into a new function and uses it for
checking loads from stack slots are not zero-extending loads.

rdar://problem/18236850

llvm-svn: 217799
2014-09-15 18:23:52 +00:00
Matt Arsenault f090bda1d5 CHECK-LABELize test
llvm-svn: 217797
2014-09-15 17:56:56 +00:00
Matt Arsenault 49dd4283ed R600/SI: Prefer selecting more e64 instruction forms.
Add some more tests to make sure better operand
choices are still made. Leave some cases that seem
to have no reason to ever be e64 alone.

llvm-svn: 217789
2014-09-15 17:15:02 +00:00
Matt Arsenault 0fd0a316ed R600/SI: Make sure double vector fmul is tested
llvm-svn: 217787
2014-09-15 17:04:54 +00:00
Matt Arsenault 72aafd0689 R600/SI: Add some mubuf testcases.
I noticed some odd looking cases where addr64 wasn't set
when storing to a pointer in an SGPR. This seems to be intentional,
and partially tested already.

The documentation seems to describe addr64 in terms of which registers
addressing modifiers come from, but I would expect to always need
addr64 when using 64-bit pointers. If no offset is applied,
it makes sense to not need to worry about doing a 64-bit add
for the final address. A small immediate offset can be applied,
so is it OK to not have addr64 set if a carry is necessary when adding
the base pointer in the resource to the offset?

llvm-svn: 217785
2014-09-15 16:48:01 +00:00
Matt Arsenault 3f98140c87 R600/SI: Add preliminary support for flat address space
llvm-svn: 217777
2014-09-15 15:41:53 +00:00
Toma Tabacu bbd0eca340 [mips] Marked the DADDiu instruction aliases as MIPS III.
Patch by Vasileios Kalintiris.

Differential Revision: http://reviews.llvm.org/D5239

llvm-svn: 217770
2014-09-15 14:47:46 +00:00
Chandler Carruth 707a2e098d [x86] Begin emitting PBLENDW instructions for integer blend operations
when SSE4.1 is available.

This removes a ton of domain crossing from blend code paths that were
ending up in the floating point code path.

This is just the tip of the iceberg though. The real switch is for
integer blend lowering to more actively rely on this instruction being
available so we don't hit shufps at all any longer. =] That will come in
a follow-up patch.

Another place where we need better support is for using PBLENDVB when
doing so avoids the need to have two complementary PSHUFB masks.

llvm-svn: 217767
2014-09-15 12:40:54 +00:00
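
A small standalone intrinsics demo of PBLENDW's semantics (illustration
only, not from the patch): the 8-bit immediate selects, per 16-bit lane,
whether the result comes from the first or second operand (SSE4.1,
compile with -msse4.1).

    #include <smmintrin.h> // SSE4.1
    #include <cstdint>
    #include <cstdio>

    int main() {
      __m128i A = _mm_set1_epi16(0);
      __m128i B = _mm_set1_epi16(-1);
      __m128i R = _mm_blend_epi16(A, B, 0xAA); // odd lanes from B
      uint16_t Out[8];
      _mm_storeu_si128((__m128i *)Out, R);
      for (int I = 0; I < 8; ++I)
        printf("%u ", Out[I]); // 0 65535 0 65535 0 65535 0 65535
      return 0;
    }
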
Chandler Carruth 00b1e0fc9d [x86] Add an explicit SSE3 run to this test and flesh out a bunch of
missing specific checks.

While there is a lot of redundancy here where all-but-one mode use the
same code generation, I'd rather have each variant spelled out and
checked so that readers aren't misled by an omission in the test suite.

llvm-svn: 217765
2014-09-15 11:40:20 +00:00
Chandler Carruth 12d4a70cbd [x86] Teach the x86 DAG combiner to form UNPCKLPS and UNPCKHPS
instructions from the relevant shuffle patterns.

This is the last tweak I'm aware of to generate essentially perfect
v4f32 and v2f64 shuffles with the new vector shuffle lowering up through
SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32
is amenable to exhaustive exploration, but this is all of the tricks I'm
aware of.

With AVX there is a new trick to use the VPERMILPS instruction, that's
coming up in a subsequent patch.

llvm-svn: 217761
2014-09-15 11:26:25 +00:00
Chandler Carruth 41a25dd7ef [x86] Teach the x86 DAG combiner to form MOVSLDUP and MOVSHDUP
instructions when it finds an appropriate pattern.

These are lovely instructions, and it's a shame to not use them. =] They
are fast, and can have loads folded into their operands, etc.

I've also plumbed the comment shuffle decoding through the various
layers so that the test cases are printed nicely.

llvm-svn: 217758
2014-09-15 11:15:23 +00:00
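
For reference, a standalone intrinsics demo of what these two instructions
compute (illustration only): MOVSLDUP duplicates the even float lanes and
MOVSHDUP the odd ones (SSE3, compile with -msse3).

    #include <pmmintrin.h> // SSE3
    #include <cstdio>

    int main() {
      __m128 V = _mm_setr_ps(0.f, 1.f, 2.f, 3.f);
      float Lo[4], Hi[4];
      _mm_storeu_ps(Lo, _mm_moveldup_ps(V)); // MOVSLDUP: 0 0 2 2
      _mm_storeu_ps(Hi, _mm_movehdup_ps(V)); // MOVSHDUP: 1 1 3 3
      printf("%g %g %g %g | %g %g %g %g\n", Lo[0], Lo[1], Lo[2], Lo[3],
             Hi[0], Hi[1], Hi[2], Hi[3]);
      return 0;
    }
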
Chandler Carruth 35e3b545d6 [x86] Undo a flawed transform I added to form UNPCK instructions when
AVX is available, and generally tidy up things surrounding UNPCK
formation.

Originally, I was thinking that the only advantage of PSHUFD over UNPCK
instruction variants was its free copy, and otherwise we should use the
shorter encoding UNPCK instructions. This isn't right though, there is
a larger advantage of being able to fold a load into the operand of
a PSHUFD. For UNPCK, the operand *must* be in a register so it can be
the second input.

This removes the UNPCK formation in the target-specific DAG combine for
v4i32 shuffles. It also lifts the v8 and v16 cases out of the
AVX-specific check as they are potentially replacing multiple
instructions with a single instruction and so should always be valuable.
The floating point checks are simplified accordingly.

This also adjusts the formation of PSHUFD instructions to attempt to
match the shuffle mask to one which would fit an UNPCK instruction
variant. This was originally motivated to allow it to match the UNPCK
instructions in the combiner, but clearly won't now.

Eventually, we should add a MachineCombiner pass that can form UNPCK
instructions post-RA when the operand is known to be in a register and
thus there is no loss.
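
To illustrate the load-folding point (a sketch, not from the patch):

  typedef int v4si __attribute__((vector_size(16)));
  // This single-input shuffle can lower to 'pshufd $0x50, (%rdi), %xmm0'
  // with the load folded; an unpck lowering would first need the value
  // in a register.
  v4si dup_lo_pairs(const v4si *p) {
    v4si x = *p;
    return __builtin_shufflevector(x, x, 0, 0, 1, 1);
  }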

llvm-svn: 217755
2014-09-15 10:35:41 +00:00
Chandler Carruth 44e64b5267 [x86] Teach the new vector shuffle lowering to use 'punpcklwd' and
'punpckhwd' instructions when suitable rather than falling back to the
generic algorithm.

While we could canonicalize to these patterns late in the process, that
wouldn't help when the freedom to use them is only visible during
initial lowering when undef lanes are well understood. This, it turns
out, is very important for matching the shuffle patterns that are used
to lower sign extension. Fixes a small but relevant regression in
gcc-loops with the new lowering.

When I changed this I noticed that several 'pshufd' lowerings became
unpck variants. This is bad because it removes the ability to freely
copy in the same instruction. I've adjusted the widening test to handle
undef lanes correctly and now those will correctly continue to use
'pshufd' to lower. However, this caused a bunch of churn in the test
cases. No functional change, just churn.

Both of these changes are part of addressing a general weakness in the
new lowering -- it doesn't sufficiently leverage undef lanes. I've at
least a couple of patches that will help there at least in an academic
sense.
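
The kind of pattern involved, as a sketch (not from the patch):

  typedef short v8i16 __attribute__((vector_size(16)));
  // Interleaving the low words of the inputs matches punpcklwd;
  // indices 4, 12, 5, 13, 6, 14, 7, 15 would match punpckhwd.
  v8i16 zip_lo_words(v8i16 a, v8i16 b) {
    return __builtin_shufflevector(a, b, 0, 8, 1, 9, 2, 10, 3, 11);
  }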

llvm-svn: 217752
2014-09-15 09:02:37 +00:00
David Majnemer a315bd80c2 InstSimplify: Simplify trivial and/or of icmps
Some ICmpInsts, when anded/ored with another ICmpInst, trivially reduce
to true or false, depending on whether all integers or no integers
satisfy the intersected/unioned range.

This sort of trivial-looking code can come about when InstCombine
performs a range reduction-type operation on sdiv and the like.

This fixes PR20916.
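
A source-level illustration (not from the patch): the two ranges below
have an empty intersection, so the whole expression now simplifies to 0.

  int no_int_satisfies(int x) {
    return (x < 4) & (x > 10);
  }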

llvm-svn: 217750
2014-09-15 08:15:28 +00:00
Chandler Carruth 0a98790b32 [x86] Teach the new vector shuffle lowering to use BLENDPS and BLENDPD.
These are super simple. They even take precedence over crazy
instructions like INSERTPS because they have very high throughput on
modern x86 chips.

I still have to teach the integer shuffle variants about this to avoid
so many domain crossings. However, due to the particular instructions
available, that's a touch more complex and so a separate patch.

Also, the backend doesn't seem to realize it can commute blend
instructions by negating the mask. That would help remove a number of
copies here. Suggestions on how to do this welcome, it's an area I'm
less familiar with.
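
A sketch of the kind of shuffle this covers (not from the patch):

  typedef float v4f32 __attribute__((vector_size(16)));
  // Every lane comes unmodified from one input or the other, so this
  // can be a single 'blendps $0xA, %xmm1, %xmm0'.
  v4f32 blend(v4f32 a, v4f32 b) {
    return __builtin_shufflevector(a, b, 0, 5, 2, 7);
  }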

llvm-svn: 217744
2014-09-14 23:43:33 +00:00
NAKAMURA Takumi da86d7c26b llvm/test/CodeGen/X86/vec_shuffle-38.ll: Add explicit -mtriple=x86_64-unknown to avoid incompatibility with win32.
llvm-svn: 217742
2014-09-14 23:39:01 +00:00
Chandler Carruth f2a92921f9 [x86] Add an SSE41 mode to this test. Nothing interesting here, it's the
same as SSE3.

llvm-svn: 217741
2014-09-14 23:28:12 +00:00
Chandler Carruth b396922647 [x86] Switch this test to use an ALL prefix with special SSE2 and SSE3
variants where significant.

This will make it more obvious what is happening when we start using
blends in SSE41.

llvm-svn: 217740
2014-09-14 23:19:37 +00:00
Chandler Carruth da5ce5cad8 [x86] Add some test cases where we should emit blendpd in SSE4.1. No
actual change yet though.

llvm-svn: 217739
2014-09-14 23:15:52 +00:00
Chandler Carruth 47ebd24e24 [x86] Teach the vector combiner that picks a canonical shuffle form to
support transforming the forms from the new vector shuffle lowering to
use 'movddup' when appropriate.

A bunch of the cases where we form 'movddup' don't actually show up in
the test results because something even later than DAG legalization
maps them back to 'unpcklpd'. If this shows back up as a performance
problem, I'll probably chase it down, but it is at least an encoded
size loss. =/

To make this work, also always do this canonicalizing step for floating
point vectors where the baseline shuffle instructions don't provide any
free copies of their inputs. This also causes us to canonicalize
unpck[hl]pd into mov{hl,lh}ps (resp.) which is a nice encoding space
win.

There is one test which is "regressed" by this: extractelement-load.
There, in the test case where the optimization being tested *fails*, the
exact instruction pattern which results is slightly different. This
should probably be fixed by having the appropriate extract formed
earlier in the DAG, but that would defeat the purpose of the test.... If
this test case is critically important for anyone, please let me know
and I'll try to work on it. The prior behavior was actually contrary to
the comment in the test case and seems likely to have been an accident.
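
For reference, the canonical movddup shape (an illustration, not from
the patch):

  typedef double v2f64 __attribute__((vector_size(16)));
  // Splatting the low double matches movddup on SSE3 and later.
  v2f64 splat_lo(v2f64 a) {
    return __builtin_shufflevector(a, a, 0, 0);
  }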

llvm-svn: 217738
2014-09-14 22:41:37 +00:00
Matt Arsenault f620a575bf R600/SI: Fix broken check lines
llvm-svn: 217736
2014-09-14 18:32:05 +00:00
Juergen Ributzka 85c1f84650 [FastISel][AArch64] Add support for non-native types for logical ops.
Extend the logical ops selection to also support non-native types such as i1,
i8, and i16.

Fixes rdar://problem/18330589.

llvm-svn: 217732
2014-09-13 23:46:28 +00:00
Chad Rosier ce65c060e7 [AArch64] Update test case to pass with post-RA MI scheduler.
Check that the post RA scheduler is being skipped, regardless of
whether it's the top-down list latency scheduler or the post-RA
MI scheduler.

llvm-svn: 217725
2014-09-13 03:23:23 +00:00
Nick Kledzik b8536b1db8 Stop suppressing error messages in test case to see why one buildbot is failing
llvm-svn: 217715
2014-09-12 22:46:01 +00:00
Nick Kledzik ac43144e5a [llvm-objdump] support -rebase option for mach-o to dump rebasing info
Similar to my previous -exports-trie option, the -rebase option dumps info from
the LC_DYLD_INFO load command. The rebasing info is a list of the locations
that dyld needs to adjust if a mach-o image is not loaded at its preferred 
address. Since ASLR is now the default, images almost never load at their
preferred address, and thus need to be rebased by dyld.

llvm-svn: 217709
2014-09-12 21:34:15 +00:00
Justin Bogner 54b112828f llvm-profdata: Avoid undefined behaviour when reading raw profiles
The raw profiles that are generated in compiler-rt always add padding
so that each profile is aligned, so we can simply treat files that
don't have this property as malformed.
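
A minimal sketch of the idea, with hypothetical names (not the actual
llvm-profdata code):

  #include <cstddef>
  // Profiles are padded to the record alignment, so a size that isn't
  // a multiple of it indicates a malformed file.
  bool isMalformedProfile(std::size_t FileSize, std::size_t RecordAlign) {
    return FileSize % RecordAlign != 0;
  }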

Caught by Alexey's new ubsan bot. Thanks!

llvm-svn: 217708
2014-09-12 21:22:55 +00:00
Chad Rosier e668f61076 FileCheckize. NFC.
llvm-svn: 217698
2014-09-12 17:55:16 +00:00
Chad Rosier 486e087f26 [AArch64] Enable post-RA MI scheduler.
Phabricator Revision: http://reviews.llvm.org/D5278
Patch by Sanjin Sijaric!

llvm-svn: 217693
2014-09-12 17:40:39 +00:00
Jordan Rose ef78038775 [lit] Parse all strings as UTF-8 rather than ASCII.
As far as I can tell UTF-8 has been supported since the beginning of Python's
codec support, and it's the de facto standard for text these days, at least
for primarily-English text. This allows us to put Unicode into lit RUN lines.

rdar://problem/18311663

llvm-svn: 217688
2014-09-12 16:46:05 +00:00
NAKAMURA Takumi 9e424c0eb4 llvm/test/CodeGen/X86/vec_ctbits.ll: Add explicit -mtriple=x86_64-unknown. It was incompatible with Win32 x64.
llvm-svn: 217683
2014-09-12 15:10:56 +00:00
Zoran Jovanovic c74e3eb9a6 [mips][microMIPS] Implement JRADDIUSP instruction
Differential Revision: http://reviews.llvm.org/D5046

llvm-svn: 217681
2014-09-12 14:29:54 +00:00
Bill Schmidt b73b370809 Address comments on r217622
llvm-svn: 217680
2014-09-12 14:26:36 +00:00
Zoran Jovanovic ed6dd6bd39 [mips][microMIPS] Implement BGEZALS and BLTZALS instructions
Differential Revision: http://reviews.llvm.org/D5004

llvm-svn: 217678
2014-09-12 13:51:58 +00:00
Zoran Jovanovic ac9ef12fc5 [mips][microMIPS] Implement JALS and JALRS instructions.
Differential Revision: http://reviews.llvm.org/D5003

llvm-svn: 217676
2014-09-12 13:43:41 +00:00
Zoran Jovanovic 4e7ac4ad2a [mips][microMIPS] Implement TLBP, TLBR, TLBWI and TLBWR instructions
Differential Revision: http://reviews.llvm.org/D5211

llvm-svn: 217675
2014-09-12 13:33:33 +00:00
James Molloy a9f47b6bae [ARM] Teach the cost model that cross-class copies are costly.
Cross-class copies being expensive is actually a trait of the microarchitecture, but as I haven't yet seen an example of a microarchitecture where they're cheap, it seems best to just enable this by default, covering the non-mcpu build case.

llvm-svn: 217674
2014-09-12 13:29:40 +00:00
Benjamin Kramer 6d527ef9d6 Legalizer: Use the scalar bit width when promoting bit counting instrs on
vectors.

e.g. when promoting ctlz from <2 x i32> to <2 x i64> we have to fixup
the result by 32 bits, not 64. PR20917.
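
The scalar identity behind the fix, as an illustration:

  // A 32-bit value zero-extended to 64 bits gains exactly 32 leading
  // zeros, so the promoted count must be adjusted by 64 - 32, not 64.
  unsigned clz32_via_64(unsigned x) { // x must be nonzero
    return __builtin_clzll((unsigned long long)x) - 32;
  }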

llvm-svn: 217671
2014-09-12 12:50:27 +00:00
Justin Bogner 3d7260e7b2 Revert "llvm-cov: Remove an overly system specific test"
This fixes a call to sys::fs::equivalent that should've been to
CodeCoverageTool::equivalentFiles, which lets us restore the test of
r217476 that was removed in r217478.

This reverts r217478, but the test works this time.

llvm-svn: 217646
2014-09-11 23:20:48 +00:00
Matt Arsenault 362f345bab R600/SI: Fix off by 1 error in used register count
The register numbers start at 0, so if only 1 register
was used, this was reported as 0.

llvm-svn: 217636
2014-09-11 22:51:37 +00:00
Lang Hames 691a21ce5a [MCJIT] Make sure we test ARM BR24 relocations with both internal and external
symbols.

Previously we have only been testing these relocations with external symbols.

<rdar://problem/18308413>

llvm-svn: 217635
2014-09-11 22:43:36 +00:00
Quentin Colombet b2c5c6dde3 [CodeGenPrepare] Teach the addressing mode matcher how to promote zext.
I.e., teach it about 'sext (zext a to ty) to ty2' => zext a to ty2.
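
A source-level illustration of the equivalence (not from the patch):

  // The zext of an 8-bit value to 16 bits is always non-negative, so
  // the following sext to 32 bits is equivalent to one zext of the
  // original 8-bit value.
  int widen(unsigned char a) {
    short t = (unsigned short)a; // zext i8 -> i16
    return t;                    // sext i16 -> i32 == zext i8 -> i32
  }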

llvm-svn: 217629
2014-09-11 21:22:14 +00:00
Bill Schmidt 3ae268076b Add missing colon to RUN line...
llvm-svn: 217623
2014-09-11 20:13:52 +00:00
Bill Schmidt be95fd5357 [PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm
Inline asm may specify 'U' and 'X' constraints to print a 'u' for an
update-form memory reference, or an 'x' for an indexed-form memory
reference.  However, these are really only useful in GCC internal code
generation.  In inline asm the operand of the memory constraint is
typically just a register containing the address, so 'U' and 'X' make
no sense.

This patch quietly accepts 'U' and 'X' in inline asm patterns, but
otherwise does nothing.  If we ever unexpectedly see a non-register,
we'll assert and sort it out afterwards.

I've added a new test for these constraints; the test case should be
used for other asm-constraints changes down the road.

llvm-svn: 217622
2014-09-11 20:10:03 +00:00
Lang Hames 6f1048f94e [MCJIT] Add support for ARM HALF_DIFF relocations to MCJIT.
Fixes <rdar://problem/18297804>.

llvm-svn: 217620
2014-09-11 19:21:14 +00:00
Matt Arsenault d40e1c3fbc Add triple to test to fix bots
llvm-svn: 217612
2014-09-11 17:50:20 +00:00
Brad Smith 2ce0d91bde Provide an implementation of getNoopForMachoTarget for SPARC.
llvm-svn: 217611
2014-09-11 17:40:51 +00:00
Matt Arsenault 8239eaab99 Add DAG combine for shl + add of constants.
Do
 (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)

This is already done for multiplies, but since multiplies
by powers of two are turned into shifts, we also need
to handle it here.

This might want checks for isLegalAddImmediate to avoid
transforming an add of a legal immediate with one that isn't.
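
A concrete instance of the fold:

  // (x + 3) << 2 can now become (x << 2) + 12, matching what we
  // already did when the shift was written as a multiply.
  unsigned fold_shl_add(unsigned x) {
    return (x + 3u) << 2;
  }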

llvm-svn: 217610
2014-09-11 17:34:19 +00:00
Lang Hames 4669cd08a7 [MCJIT] Take the relocation addend into account when applying ARM MachO VANILLA
and BR24 relocations.

<rdar://problem/18296496>

llvm-svn: 217605
2014-09-11 17:27:01 +00:00
Adam Nemet 053c4e825c [AVX512] Fix miscompile for unpack
r189189 implemented AVX512 unpack by essentially performing a 256-bit unpack
between the low and the high 256 bits of src1 into the low part of the
destination and another unpack of the low and high 256 bits of src2 into the
high part of the destination.

I don't think that's how unpack works.  AVX512 unpack simply has more 128-bit
lanes, but other than that it works the same way as AVX.  So in each 128-bit
lane, we're always interleaving certain parts of both operands rather than
different parts of one of the operands.

E.g. for this:
__v16sf a = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
__v16sf b = { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 };
__v16sf c = __builtin_shufflevector(a, b, 0, 8, 1, 9, 4, 12, 5, 13,
                                    16, 24, 17, 25, 20, 28, 21, 29);

we generated punpcklps (notice how the elements of a and b are not interleaved
in the shuffle).  In turn, c was set to this:

  0 16 1 17 4 20 5 21 8 24 9 25 12 28 13 29

Obviously, given how a and b are initialized, c should have simply
matched the mask vector of the shufflevector.

I mostly reverted this change and made sure the original AVX code worked
for 512-bit vectors as well.

Also updated the tests, because they had been written to match the
incorrect logic in the code.

llvm-svn: 217602
2014-09-11 16:51:10 +00:00
Sanjay Patel 1eb5047ddb Add triple and remove hashes to account for buildbot differences in comment strings.
llvm-svn: 217601
2014-09-11 16:08:44 +00:00
Sanjay Patel 7bd228a82e Combine fmul vector FP constants when unsafe math is allowed.
This is an extension of the change made with r215820:
http://llvm.org/viewvc/llvm-project?view=revision&revision=215820

That patch allowed combining of splatted vector FP constants that are multiplied.

This patch allows combining non-uniform vector FP constants too by relaxing the
check on the type of vector. Also, canonicalize a vector fmul in the
same way that we already do for scalars - if only one operand of the fmul is a
constant, make it operand 1. Otherwise, we miss potential folds.

This fold is also done by -instcombine, but it's possible that extra
fmuls may have been generated during lowering.
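
An illustration of the new fold (not a test from the patch):

  typedef float v4f32 __attribute__((vector_size(16)));
  // With unsafe math, the two non-uniform constant multiplies can be
  // combined into one: v * {10, 40, 90, 160}.
  v4f32 fold(v4f32 v) {
    v4f32 c1 = {1, 2, 3, 4}, c2 = {10, 20, 30, 40};
    return (v * c1) * c2;
  }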

Differential Revision: http://reviews.llvm.org/D5254

llvm-svn: 217599
2014-09-11 15:45:27 +00:00
Aaron Watry 3ffc560094 R600: Test local atomics for evergreen
Now that the operations are all implemented, we can test this sub-arch here.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 217595
2014-09-11 15:02:52 +00:00
Tilmann Scheller ee0e49398c [ARM] Add Thumb-2 code size optimization regression test for LSR (register).
llvm-svn: 217582
2014-09-11 10:45:50 +00:00
Tilmann Scheller 579379a6f4 [ARM] Add Thumb-2 code size optimization regression test for LSR (immediate).
llvm-svn: 217581
2014-09-11 10:42:17 +00:00
Arnaud A. de Grandmaison 3690266739 [AArch64] Reenable the PBQP test now that the leak issue has been fixed.
David Blaikie's commits r217563 & r217564, which added shared_ptr to the
CostPool, have fixed some memory leak issues exposed by the PBQP with
coalescing constraints.

The sanitizer bot was failing because of those leaks. Now that the leaks
are gone, we can reenable the aarch64/pbqp test.

llvm-svn: 217580
2014-09-11 10:39:52 +00:00
Tilmann Scheller 0c1249ac60 [ARM] Add Thumb-2 code size optimization regression test for LSL (register).
llvm-svn: 217579
2014-09-11 10:33:39 +00:00
Tilmann Scheller 7430df486e [ARM] Add Thumb2 code size optimization regression test for LSL (immediate).
llvm-svn: 217576
2014-09-11 10:29:42 +00:00
Chandler Carruth 1ec3e4e4bd [x86] Fixup r217565 which baked in an assumption about the function
name that breaks on some platforms. This part of the test just doesn't
matter...

llvm-svn: 217575
2014-09-11 10:21:25 +00:00
Hal Finkel f83e1f7f66 [AlignmentFromAssumptions] Don't crash just because the target is 32-bit
We used to crash processing any relevant @llvm.assume on a 32-bit target
(because we'd ask SE to subtract expressions of differing types). I've copied
our 'simple.ll' test, but with the data layout from arm-linux-gnueabihf to get
some meaningful test coverage here.

llvm-svn: 217574
2014-09-11 08:40:17 +00:00
David Xu f7aff68fe3 Build correct vector filled with undef nodes
llvm-svn: 217570
2014-09-11 05:10:28 +00:00
Chandler Carruth 292303dd47 [x86] FileCheck-ize this test.
llvm-svn: 217565
2014-09-11 00:13:35 +00:00
Matt Arsenault 61a528adc7 R600/SI: Fix losing chain when fixing reg class of loads.
The lost chain resulted in earlier side-effecting nodes
being deleted.

llvm-svn: 217561
2014-09-10 23:26:19 +00:00
Peter Collingbourne d0ec5ab948 Add LLVMgold target to test dependencies.
llvm-svn: 217557
2014-09-10 22:20:49 +00:00
Matt Arsenault 16e313343d R600: Custom lower frem
llvm-svn: 217553
2014-09-10 21:44:27 +00:00
Hal Finkel 71b7084112 [AlignmentFromAssumptions] Don't divide by zero for unknown starting alignment
The routine that determines an alignment given some SCEV returns zero if the
answer is unknown. In a case where we could determine the increment of an
AddRec but not the starting alignment, we would compute the integer modulus by
zero (which is illegal and traps). Prevent this by returning early if either
the start or increment alignment is unknown (zero).
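
A sketch of the guard, with hypothetical names (not the pass's actual
code):

  // Zero means "unknown" here; bail out early rather than compute
  // a modulus by zero below.
  unsigned knownAlign(unsigned StartAlign, unsigned IncAlign) {
    if (StartAlign == 0 || IncAlign == 0)
      return 0;
    return StartAlign % IncAlign; // now safe
  }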

llvm-svn: 217544
2014-09-10 21:05:52 +00:00
Rafael Espindola 71143ed24b Remember to eraseFromParent after replaceAllUsesWith.
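
The idiom, sketched (names are illustrative):

  #include "llvm/IR/Instruction.h"
  // Rewrite all uses first, then delete the now-dead instruction so
  // it doesn't linger in the function.
  void replaceAndErase(llvm::Instruction *OldInst, llvm::Value *NewValue) {
    OldInst->replaceAllUsesWith(NewValue);
    OldInst->eraseFromParent();
  }
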
llvm-svn: 217536
2014-09-10 19:39:41 +00:00
Arnaud A. de Grandmaison d17f96c9ad [AArch64] Temporarily deactivate the PBQP test, while I investigate some leaks in the allocator
llvm-svn: 217531
2014-09-10 18:40:18 +00:00
Sanjay Patel b653de1ada Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.
"Unroll" is not the appropriate name for this variable. Clang already uses 
the term "interleave" in pragmas and metadata for this.

Differential Revision: http://reviews.llvm.org/D5066

llvm-svn: 217528
2014-09-10 17:58:16 +00:00
Arnaud A. de Grandmaison c75dbbbdd6 [AArch64] Add experimental PBQP support
This adds target specific support for using the PBQP register allocator on the
AArch64, for the A57 cpu.

By default, the PBQP allocator is not used, unless explicitly required
on the command line with "-aarch64-pbqp".

llvm-svn: 217504
2014-09-10 14:06:10 +00:00
Asiri Rathnayake 369c030633 [AArch64] Use a constant pool load for weak symbol references when
using the static relocation model and small code model.

Summary: currently we generate GOT-based relocations for weak symbol
references regardless of the underlying relocation model. This should
be changed so that in the static relocation model we use a constant
pool load instead.

Patch from: Keith Walker

Reviewers: Renato Golin, Tim Northover
llvm-svn: 217503
2014-09-10 13:54:38 +00:00
Tim Northover ba1d704229 ARM: don't size-reduce STMs using the LR register.
The only Thumb-1 multi-store capable of using LR is the PUSH instruction, which
translates to STMDB, so we shouldn't convert STMIAs.

Patch by Sergey Dmitrouk.

llvm-svn: 217498
2014-09-10 12:53:28 +00:00