Commit Graph

26244 Commits

Author SHA1 Message Date
Juergen Ributzka 27e959d7b2 [FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left for booleans (i1).
Shift-left immediate with sign-/zero-extensions also works for boolean values.
Update the assert and the test cases to reflect that fact.

This should fix a bug found by Chad.

llvm-svn: 218275
2014-09-22 21:08:53 +00:00
David Majnemer 597be2ded6 MC: ReadOnlyWithRel section kinds should map to rdata in COFF
Don't consider ReadOnlyWithRel as a writable section in COFF, they
really belong in .rdata.

llvm-svn: 218268
2014-09-22 20:39:23 +00:00
Chandler Carruth 44deb8015c [x86] Introduce tests covering the gamut of 256-bit vector shuffling.
These are just test cases, no actual code yet. This establishes the
baseline fallback strategy we're starting from on AVX2 and the expected
lowering we use on AVX1.

Also, these test cases are very much generated. I've manually crafted
the specific pattern set that I'm hoping will be useful at exercising
the lowering code, but I've not (and could not) manually verify *all* of
these. I've spot checked and they seem legit to me.

As with the rest of vector shuffling, at a certain point the only really
useful way to check the correctness of this stuff is through fuzz
testing.

llvm-svn: 218267
2014-09-22 20:25:08 +00:00
Sanjay Patel 7939d7229d Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2).
We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors.
This patch should preserve all existing behavior with regular optimization levels, 
but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.

The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save
at least 8 bytes (up to 31 bytes) of constant pool data.

Differential Revision: http://reviews.llvm.org/D5347

llvm-svn: 218263
2014-09-22 18:54:01 +00:00
Akira Hatanaka f2a721a875 Fix test case commited in r218242 to appease buildbot.
llvm-svn: 218261
2014-09-22 18:07:20 +00:00
Tom Stellard 9f73851e39 Revert "R600/SI: Add support for global atomic add"
This reverts commit r218254.

The global_atomics.ll test fails with asserts disabled.  For some reason,
the compiler fails to produce the atomic no return variants.

llvm-svn: 218257
2014-09-22 16:44:04 +00:00
Frederic Riss 220fa48491 Fix a test introduced in r218246 to work also on Windows.
llvm-svn: 218255
2014-09-22 16:17:32 +00:00
Tom Stellard 2355a77e74 R600/SI: Add support for global atomic add
llvm-svn: 218254
2014-09-22 15:35:35 +00:00
Pavel Chupin be9f12102f [x32] Fix segmented stacks support
Summary:
Update segmented-stacks*.ll tests with x32 target case and make
corresponding changes to make them pass.

Test Plan: tests updated with x32 target

Reviewers: nadav, rafael, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5245

llvm-svn: 218247
2014-09-22 13:11:35 +00:00
Frederic Riss 955724e3f5 [dwarfdump] Dump full filenames as DW_AT_(decl|call)_file attribute values
Reviewers: dblaikie samsonov

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5192

llvm-svn: 218246
2014-09-22 12:36:04 +00:00
Frederic Riss 58ed53cfcd Allow DWARFDebugInfoEntryMinimal::getSubroutineName to resolve cross-unit references.
Summary: getSubroutineName is currently only used by llvm-symbolizer, thus add a binary test containing a cross-cu inlining example.

Reviewers: samsonov, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5394

llvm-svn: 218245
2014-09-22 12:35:53 +00:00
Robert Lougher 6da8a243f9 Fix assert when decoding PSHUFB mask
The PSHUFB mask decode routine used to assert if the mask index was out of
range (<0 or greater than the size of the vector).  The problem is, we can
legitimately have a PSHUFB with a large index using intrinsics.  The
instruction only uses the least significant 4 bits.  This change removes the
assert and masks the index to match the instruction behaviour.

llvm-svn: 218242
2014-09-22 11:54:38 +00:00
Oliver Stannard 14f97d0017 Downgrade DWARF2 section limit error to a warning
We currently emit an error when trying to assemble a file with more
than one section using DWARF2 debug info. This should be a warning
instead, as the resulting file will still be usable, but with a
degraded debug illusion.

llvm-svn: 218241
2014-09-22 10:45:16 +00:00
Chandler Carruth 7158c95d65 [x86] Move the AVX v4i64 test cases down to group them together.
Increasingly I don't want to mix the integer and floating point tests,
especially with AVX where they are handled quite differently.

llvm-svn: 218233
2014-09-22 03:05:23 +00:00
Chandler Carruth 12bbf7d922 [x86] Back out a bad choice about lowering v4i64 and pave the way for
a more sane approach to AVX2 support.

Fundamentally, there is no useful way to lower integer vectors in AVX.
None. We always end up with a VINSERTF128 in the end, so we might as
well eagerly switch to the floating point domain and do everything
there. This cleans up lots of weird and unlikely to be correct
differences between integer and floating point shuffles when we only
have AVX1.

The other nice consequence is that by doing things this way we will make
it much easier to write the integer lowering routines as we won't need
to duplicate the logic to check for AVX vs. AVX2 in each one -- if we
actually try to lower a 256-bit vector as an integer vector, we have
AVX2 and can rely on it. I think this will make the code much simpler
and more comprehensible.

Currently, I've disabled *all* support for AVX2 so that we always fall
back to AVX. This keeps everything working rather than asserting. That
will go away with the subsequent series of patches that provide
a baseline AVX2 implementation.

Please note, I'm going to implement AVX2 *without access to hardware*.
That means I cannot correctness test this path. I will be relying on
those with access to AVX2 hardware to do correctness testing and fix
bugs here, but as a courtesy I'm trying to sketch out the framework for
the new-style vector shuffle lowering in the context of the AVX2 ISA.

llvm-svn: 218228
2014-09-22 00:32:15 +00:00
Chandler Carruth 5d45962b2c [x86] Teach the new vector shuffle lowering how to cleverly lower single
input v8f32 shuffles which are not 128-bit lane crossing but have
different shuffle patterns in the low and high lanes. This removes most
of the extract/insert traffic that was unnecessary and is particularly
good at lowering cases where only one of the two lanes is shuffled at
all.

I've also added a collection of test cases with undef lanes because this
lowering is somewhat more sensitive to undef lanes than others.

llvm-svn: 218226
2014-09-21 23:46:13 +00:00
Chandler Carruth b195e860f9 [x86] Add a bunch of test cases where we have different shuffle patterns
in the high and low 128-bit lanes of a v8f32 vector.

No functionality change yet, but wanted to set up the baseline for my
next patch which will make these quite a bit better. =]

llvm-svn: 218224
2014-09-21 23:32:42 +00:00
Chandler Carruth b3125c7522 [x86] Teach the new vector shuffle lowering to re-use the SHUFPS
lowering when it can use a symmetric SHUFPS across both 128-bit lanes.

This required making the SHUFPS lowering tolerant of other vector types,
and adjusting our canonicalization to canonicalize harder.

This is the last of the clever uses of symmetry I've thought of for
v8f32. The rest of the tricks I'm aware of here are to work around
assymetry in the mask.

llvm-svn: 218216
2014-09-21 13:35:14 +00:00
Chandler Carruth 33eda72802 [x86] Teach the new vector shuffle lowering the basics about insertion
of a single element into a zero vector for v4f64 and v4i64 in AVX.
Ironically, there is less to see here because xor+blend is so crazy fast
that we can't really beat that to zero the high 128-bit lane.

llvm-svn: 218214
2014-09-21 12:49:46 +00:00
Chandler Carruth 43f5974ea0 [x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and
UNPCKHPS with AVX vectors by recognizing those patterns when they are
repeated for both 128-bit lanes.

With this, we now generate the exact same (really nice) code for
Quentin's avx_test_case.ll which was the most significant regression
reported for the new shuffle lowering. In fact, I'm out of specific test
cases for AVX lowering, the rest were AVX2 I think. However, there are
a bunch of pretty obvious remaining things to improve with AVX...

llvm-svn: 218213
2014-09-21 12:20:44 +00:00
Chandler Carruth 78f4798913 [x86] Add test cases for UNPCK instructions with v8f32 AVX vectors in
preparation for enhancing their support in the new vector shuffle
lowering.

llvm-svn: 218212
2014-09-21 12:13:11 +00:00
Chandler Carruth 88404c4f9b [x86] Begin teaching the new vector shuffle lowering among the most
important bits of cleverness: to detect and lower repeated shuffle
patterns between the two 128-bit lanes with a single instruction.

This patch just teaches it how to lower single-input shuffles that fit
this model using VPERMILPS. =] There is more that needs to happen here.

llvm-svn: 218211
2014-09-21 12:01:19 +00:00
Chandler Carruth 83252ac8f4 [x86] Regenerate this test case now that I've improved my script for
generating the test cases to format things more consistently and
actually catch all the operand sequences that should be elided in favor
of the asm comments. No actual changes here.

llvm-svn: 218210
2014-09-21 11:51:33 +00:00
Chandler Carruth e81bfbada9 [x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct
VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows
down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the
reciprocal throughput on Ivy Bridge CPUs much like it does everywhere
for 128-bits. There isn't a downside, so just eagerly use this
instruction when it suffices.

llvm-svn: 218208
2014-09-21 11:17:55 +00:00
Chandler Carruth 6aea21df8e [x86] Add some more comprehensive tests for v4f64 blending.
llvm-svn: 218207
2014-09-21 11:12:19 +00:00
Chandler Carruth 908afb56c0 [x86] Re-generate a bunch of the v4f64 test cases with my new script.
This expands the integer cases to cover the fact that AVX2 moves their
lane-crossing shuffles into the integer domain. It also adds proper
support for AVX2 run lines and the "ALL" group when it doesn't matter.

llvm-svn: 218206
2014-09-21 11:07:41 +00:00
Chandler Carruth 293327ddcd [x86] Teach the new vector shuffle lowering the first step toward more
actual support for complex AVX shuffling tricks. We can do independent
blends of the low and high 128-bit lanes of an avx vector, so shuffle
the inputs into place and then do the blend at 256 bits. This will in
many cases remove one blend instruction.

The next step is to permute the low and high halves in-place rather than
extracting them and re-inserting them.

llvm-svn: 218202
2014-09-21 09:35:22 +00:00
David Majnemer 48227a3759 MC: Support aligned COMMON symbols for COFF
link.exe:
Fuzz testing has shown that COMMON symbols with size > 32 will always
have an alignment of at least 32 and all symbols with size < 32 will
have an alignment of at least the largest power of 2 less than the size
of the symbol.

binutils:
The BFD linker essentially work like the link.exe behavior but with
alignment 4 instead of 32.  The BFD linker also supports an extension to
COFF which adds an -aligncomm argument to the .drectve section which
permits specifying a precise alignment for a variable but MC currently
doesn't support editing .drectve in this way.

With all of this in mind, we decide to play a little trick: we can
ensure that the alignment will be respected by bumping the size of the
global to it's alignment.

llvm-svn: 218201
2014-09-21 09:18:07 +00:00
Chandler Carruth 8ff73c0170 [x86] Add some more test cases covering specific blend patterns.
llvm-svn: 218200
2014-09-21 09:01:26 +00:00
Chandler Carruth 7a6108d652 [x86] Add the beginnings of some tests for our v8f32 shuffle lowering
under AVX.

This really just documents the current state of the world. I'm going to
try to flesh it out to cover any test cases I plan to improve prior to
improving them so that the delta made by changes is actually visible to
code reviewers.

This is made easier by the fact that I now have a script to automate the
process of producing test cases including the check lines. =]

llvm-svn: 218199
2014-09-21 08:49:27 +00:00
Chandler Carruth a454812ac8 [x86] Teach the new vector shuffle lowering to use VPERMILPD for
single-input shuffles with doubles. This allows them to fold memory
operands into the shuffle, etc. This is just the analog to the v4f32
case in my prior commit.

llvm-svn: 218193
2014-09-20 22:09:27 +00:00
Chandler Carruth aa5b798ae7 [x86] Add an AVX run to the 128-bit v2 tests, teach them to have
a generic SSE and AVX mode in addition to a specific AVX1 test path, and
flesh out the AVX tests.

llvm-svn: 218192
2014-09-20 21:26:41 +00:00
David Majnemer fb83977538 Update tests which broke from r218189
llvm-svn: 218191
2014-09-20 21:18:43 +00:00
Chandler Carruth 6f80abac4e [x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS
instruction for single-vector floating point shuffles. This in turn
allows the shuffles to fold a load into the instruction which is one of
the common regressions hit with the new shuffle lowering.

llvm-svn: 218190
2014-09-20 20:52:07 +00:00
David Majnemer 7d0dc3ef18 MC: Fix MCSectionCOFF::PrintSwitchToSection
We had a few bugs:
- We were considering the GVKind instead of just looking at the section
  characteristics
- We would never print out 'y' when a section was meant to be unreadable
- We would never print out 's' when a section was meant to be shared
- We translated IMAGE_SCN_MEM_DISCARDABLE to 'n' when it should've meant
  IMAGE_SCN_LNK_REMOVE

llvm-svn: 218189
2014-09-20 20:40:50 +00:00
Chandler Carruth 78a761ce8c [x86] Start moving to a fancier check syntax to reduce the need for
duplication of check lines. The idea is to have broad sets of
compilation modes that will frequently diverge without having to always
and immediately explode to the precise ISA feature set.

While this already helps due to VEX encoded differences, it will help
much more as I teach the new shuffle lowering about more of the new VEX
encoded instructions which can still be used to implement 128-bit
shuffles.

llvm-svn: 218188
2014-09-20 18:36:39 +00:00
David Majnemer b8dbebb31c MC: Treat ReadOnlyWithRel and ReadOnlyWithRelLocal as ReadOnly for COFF
A problem with our old behavior becomes observable under x86-64 COFF
when we need a read-only GV which has an initializer which is referenced
using a relocation: we would mark the section as writable.  Marking the
section as writable interferes with section merging.

This fixes PR21009.

llvm-svn: 218179
2014-09-20 07:31:46 +00:00
Chandler Carruth 8c4cccd4aa [x86] Teach the v4f32 path of the new shuffle lowering to handle the
tricky case of single-element insertion into the zero lane of a zero
vector.

We can't just use the same pattern here as we do in every other vector
type because the general insertion logic can handle insertion into the
non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we
have INSERTPS that is a much better choice than the generic one for such
lowerings. But INSERTPS can do lots of other lowerings as well so
factoring its logic into the general insertion logic doesn't work very
well. We also can't just extract the core common part of the general
insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that
lower to MOVSS when they can) because VZEXT_MOVL is often *faster* than
a blend while INSERTPS is slower! So instead we do a restrictive
condition on attempting to use the generic insertion logic to narrow it
to those cases where VZEXT_MOVL won't need a shuffle afterward and thus
will do better than INSERTPS. Then we try blending. Then we go back to
INSERTPS.

This still doesn't generate perfect code for some silly reasons that can
be fixed by tweaking the td files for lowering VZEXT_MOVL to use
XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends
up in a register rather than a load from memory -- BLENDPSrr has twice
the reciprocal throughput of MOVSSrr. Don't you love this ISA?

llvm-svn: 218177
2014-09-20 04:15:22 +00:00
Chandler Carruth 00389f3ed9 [x86] Generalize the single-element insertion lowering to work with
floating point types and use it for both v2f64 and v2i64 single-element
insertion lowering.

This fixes the last non-AVX performance regression test case I've gotten
of for the new vector shuffle lowering. There is obvious analogous
lowering for v4f32 that I'll add in a follow-up patch (because with
INSERTPS, v4f32 requires special treatment). After that, its AVX stuff.

llvm-svn: 218175
2014-09-20 03:32:25 +00:00
David Majnemer f4dc456eef llvm-readobj: pretty-print special COFF section names
Print IMAGE_SYM_DEBUG and the like instead of (-2).

llvm-svn: 218172
2014-09-20 00:25:06 +00:00
Peter Collingbourne 975726345c Fix crash with an insertvalue that produces an empty object.
llvm-svn: 218171
2014-09-20 00:10:47 +00:00
Matt Arsenault de0253791c R600: Un-xfail a test which passes with pass disabled
llvm-svn: 218165
2014-09-19 23:02:20 +00:00
Matt Arsenault 5e5b242946 R600/SI: Un-xfail tests which work now
llvm-svn: 218164
2014-09-19 23:02:18 +00:00
Matt Arsenault a986554377 R600/SI: Un xfail a test that works now
llvm-svn: 218162
2014-09-19 22:42:40 +00:00
Juergen Ributzka 92e8978e40 [FastIsel][AArch64] Fix a think-o in address computation.
When looking through sign/zero-extensions the code would always assume there is
such an extension instruction and use the wrong operand for the address.

There was also a minor issue in the handling of 'AND' instructions. I
accidentially used a 'cast' instead of a 'dyn_cast'.

llvm-svn: 218161
2014-09-19 22:23:46 +00:00
Chandler Carruth 0fc0c22fa9 [x86] Fully generalize the zext lowering in the new vector shuffle
lowering to support both anyext and zext and to custom lower for many
different microarchitectures.

Using this allows us to get *exactly* the right code for zext and anyext
shuffles in all the vector sizes. For v16i8, the improvement is *huge*.
The new SSE2 test case added I refused to add before this because it was
sooooo muny instructions.

llvm-svn: 218143
2014-09-19 20:00:32 +00:00
Justin Bogner a829fde160 llvm-cov: Prevent a test from matching its own check lines
Since llvm-cov shows the source file in its output, be careful about
potentially matching the check lines themselves.

llvm-svn: 218138
2014-09-19 19:04:08 +00:00
David Blaikie db119544a2 Fix test case to be portable to different architectures.
llvm-svn: 218134
2014-09-19 18:31:25 +00:00
Matt Arsenault 4505f3a73d R600/SI: Fix test to prepare for scheduler
llvm-svn: 218131
2014-09-19 18:11:16 +00:00
David Blaikie 3a7ce252cc Omit DW_TAG_subprograms for subprograms without inlined subroutines when producing -gmlt data
To reduce the size of -gmlt data, skip the subprograms without any
inlined subroutines. Since we've now got the ability to make these
determinations in the backend (funnily enough - we added the flag so we
wouldn't produce ranges under -gmlt, but with this change we use the
flag, but go back to producing ranges under -gmlt).

Instead, just produce CU ranges to inform the consumer which parts of
the code are described by this CU's line table. Tools could inspect the
line table directly to compute the range, but the CU ranges only seem to
be about 0.5% of object/executable size, so I'm not too worried about
teaching llvm-symbolizer that trick just yet - it's certainly a possible
piece of future work.

Update an llvm-symbolizer test just to demonstrate that this schema is
acceptable there (if it wasn't, the compiler-rt tests would catch this,
but good to have an in-llvm-tree test for llvm-symbolizer's behavior
here)

Building the clang binary with -gmlt with this patch reduces the total
size of object files by 5.1% (5.56% without ranges) without compression
and the executable by 4.37% (4.75% without ranges).

llvm-svn: 218129
2014-09-19 17:03:16 +00:00
Hal Finkel 62ac736faa Optionally enable more-aggressive FMA formation in DAGCombine
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.

Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.

Patch by Olivier Sallenave, thanks!

llvm-svn: 218120
2014-09-19 11:42:56 +00:00
Chandler Carruth 8a6536d4b2 [x86] Recognize that we can use duplication to widen v16i8 shuffles due
to undef lanes as well as defined widenable lanes. This dramatically
improves the lowering we use for undef-shuffles in a zext-ish pattern
for SSE2.

llvm-svn: 218115
2014-09-19 09:45:21 +00:00
Chandler Carruth 662b6d84e7 [x86] Actually test the SSE2 lowering for most of the zext-ish shuffles.
Not sure why I only did SSSE3 here. Also, I've left out some of the SSE2
ones because the shuffles are so absurd it's not worth transcribing
them. Will try to fix them to be sane and then check them.

llvm-svn: 218114
2014-09-19 08:51:06 +00:00
Chandler Carruth 2e275142cd [x86] Teach the new vector shuffle lowering to also use pmovzx for v4i32
shuffles that are zext-ing.

Not a lot to see here; the undef lane variant is better handled with
pshufd, but this improves the actual zext pattern.

llvm-svn: 218112
2014-09-19 08:37:44 +00:00
Justin Bogner 13ba23bb79 llvm-cov: Fix dropped lines when filters were applied
Uncovered lines in the middle of a covered region weren't being shown
when filtering to a particular function.

llvm-svn: 218109
2014-09-19 08:13:16 +00:00
Chandler Carruth 398ba9a018 [x86] Add a dedicated lowering path for zext-compatible vector shuffles
to the new vector shuffle lowering code.

This allows us to emit PMOVZX variants consistently for patterns where
it is a viable lowering. This instruction is both fast and allows us to
fold loads into it. This only hooks the new lowering up for i16 and i8
element widths, mostly so I could manage the change to the tests. I'll
add the i32 one next, although it is significantly less interesting.

One thing to note is that we already had some tests for these patterns
but those tests had far less horrible instructions. The problem is that
those tests weren't checking the strict start and end of the instruction
sequence. =[ As a consequence something changed in the lowering making
us generate *TERRIBLE* code for these patterns in SSE2 through SSSE3.
I've consolidated all of the tests and spelled out the madness that we
currently emit for these shuffles. I'm going to try to figure out what
has gone wrong here.

llvm-svn: 218102
2014-09-19 06:07:49 +00:00
Jiangning Liu ffbc690933 Optimize sext/zext insertion algorithm in back-end.
With this optimization, we will not always insert zext for values crossing
basic blocks, but insert sext if the users of a value crossing basic block
has preference of sign predicate.

llvm-svn: 218101
2014-09-19 05:30:35 +00:00
David Blaikie 03c3dbeb62 Omit DW_AT_frame_base under -gmlt for size
llvm-svn: 218100
2014-09-19 04:55:05 +00:00
David Blaikie 73b65d236c Omit all the extra static attributes on subprograms in -gmlt
This omission will be done in a fancier manner once we're dealing with
"put gmlt in the skeleton CUs under fission" - it'll have to be
conditional on the kind of CU we're emitting into (skeleton or gmlt).

llvm-svn: 218098
2014-09-19 04:30:36 +00:00
Hans Wennborg c0f0c511db Fix an it's vs. its typo.
llvm-svn: 218093
2014-09-19 01:14:56 +00:00
Matt Arsenault 46cbc4367b R600: Better fix for bug 20982
Just do the left shift as unsigned to avoid the UB.

llvm-svn: 218092
2014-09-19 00:42:06 +00:00
Chandler Carruth be58fd2f2d [x86] Extend this test to cover SSE4.1. Nothing interesting here, but
paves the way for subsequent changes.

llvm-svn: 218091
2014-09-19 00:30:24 +00:00
Peter Collingbourne 6b433e4d46 Try to fix i686-cygming bots.
llvm-svn: 218086
2014-09-18 22:56:00 +00:00
Peter Collingbourne 10039c02ea LTO: introduce object file-based on-disk module format.
This format is simply a regular object file with the bitcode stored in a
section named ".llvmbc", plus any number of other (non-allocated) sections.

One immediate use case for this is to accommodate compilation processes
which expect the object file to contain metadata in non-allocated sections,
such as the ".go_export" section used by some Go compilers [1], although I
imagine that in the future we could consider compiling parts of the module
(such as large non-inlinable functions) directly into the object file to
improve LTO efficiency.

[1] http://golang.org/doc/install/gccgo#Imports

Differential Revision: http://reviews.llvm.org/D4371

llvm-svn: 218078
2014-09-18 21:28:49 +00:00
Quentin Colombet 17799fedb7 [ARM] Do not perform a tail call when the caller returns several values.
The fix is slightly different then x86 (see r216117) because the number of values
attached to a return can vary even for a single returned value (e.g., f64 yields
two returned values).

<rdar://problem/18352998>

llvm-svn: 218076
2014-09-18 21:17:50 +00:00
Robin Morisset 5349e8e532 Restore "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors"
Summary:
This patch was originally in D5304 (I could not find a way to reopen that revision).
It was accepted, commited and broke the build bots because the overloading of
the constructor of ArrayRef for braced initializer lists is not supported by all
toolchains. I then reverted it, and propose this fixed version that uses a plain
C array instead in makeDMB (that array is then converted implicitly to an
ArrayRef, but that is not behind an ifdef). Could someone confirm me whether
initialization lists for plain C arrays are supported by every toolchain used
to build llvm ? Otherwise I can just initialize the array in the old way:
args[0] = ...; .. ; args[5] = ...;

Below is the description of the original patch:
```
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if a cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.

Thanks to luqmana for having noticed this bug.
```

Test Plan: Added more cases to atomic-load-store.ll + make check-all

Reviewers: jfb, t.p.northover, luqmana

Subscribers: llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D5386

llvm-svn: 218066
2014-09-18 18:56:04 +00:00
Matt Arsenault 6462f94884 R600: Bug 20982 - Avoid undefined left shift of negative value
I'm not sure what the hardware actually does, so don't
bother trying to fold it for now.

llvm-svn: 218057
2014-09-18 15:52:26 +00:00
Chandler Carruth 9057fcaf82 [x86] Use PALIGNR for v4i32 and v2i64 blends when appropriate.
There is no purpose in using it for single-input shuffles as
pshufd is just as fast and doesn't tie the two operands. This removes
a substantial amount of wrong-domain blend operations in SSSE3 mode. It
also completes the usage of PALIGNR for integer shuffles and addresses
one of the test cases Quentin hit with the new vector shuffle lowering.

There is still the question of whether and when to use this for floating
point shuffles. It is faster than shufps or shufpd but in the integer
domain. I don't yet really have a good heuristic here for when to use
this instruction for floating point vectors.

llvm-svn: 218038
2014-09-18 09:00:25 +00:00
Chandler Carruth 0fe4928fbe [x86] Add an SSSE3 run and check mode to the 128-bit v2 tests of the new
vector shuffle lowering. This will be needed for up-coming palignr
tests.

llvm-svn: 218037
2014-09-18 08:33:04 +00:00
Juergen Ributzka 1d3a312e2d Revert "[FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ."
Reverting it until I have time to investigate a regression.

llvm-svn: 218035
2014-09-18 08:07:40 +00:00
Juergen Ributzka 0f3076785f Fix previous commit: [FastISel][AArch64] Simplify XALU multiplies.
When folding the intrinsic flag into the branch or select we also have to
consider the fact if the intrinsic got simplified, because it changes the
flag we have to check for.

llvm-svn: 218034
2014-09-18 07:26:26 +00:00
Juergen Ributzka 2964b832ef [FastISel][AArch64] Simplify XALU multiplies.
Simplify {s|u}mul.with.overflow to {s|u}add.with.overflow when possible.

llvm-svn: 218033
2014-09-18 07:04:54 +00:00
Juergen Ributzka 2fc851002b [FastISel][AArch64] Followup commit for 218031 to handle negative offsets too.
llvm-svn: 218032
2014-09-18 07:04:49 +00:00
Juergen Ributzka a33070c321 [FastISel][AArch64] Try to fold the offset into the add instruction when simplifying a memory address.
Small optimization in 'simplifyAddress'. When the offset cannot be encoded in
the load/store instruction, then we need to materialize the address manually.
The add instruction can encode a wider range of immediates than the load/store
instructions. This change tries to fold the offset into the add instruction
first before materializing the offset in a register.

llvm-svn: 218031
2014-09-18 05:40:47 +00:00
Juergen Ributzka 99b7758ba0 [FastISel][AArch64] Fold 'AND' instruction during the address computation.
The 'AND' instruction could be used to mask out the lower 32 bits of a register.
If this is done inside an address computation we might be able to fold the
instruction into the memory instruction itself.

and  x1, x1, #0xffffffff   ---> ldrb x0, [x0, w1, uxtw]
ldrb x0, [x0, x1]

llvm-svn: 218030
2014-09-18 05:40:41 +00:00
Chandler Carruth e0d77ef053 [x86] Add an SSSE3 run to the v4 shuffle test.
llvm-svn: 218028
2014-09-18 04:38:32 +00:00
Saleem Abdulrasool bfdfb14a8f ARM: prevent crash on ELF directives on COFF
Certain directives are unsupported on Windows (some of which could/should be
supported).  We would not diagnose the use but rather crash during the emission
as we try to access the Target Streamer.  Add an assertion to prevent creating a
NULL reference (which is not permitted under C++) as well as a test to ensure
that we can diagnose the disabled directives.

llvm-svn: 218014
2014-09-18 04:28:29 +00:00
Chandler Carruth 867930aadf [x86] Initial step of teaching the new vector shuffle lowering about
PALIGNR. This just adds it to the v8i16 and v16i8 lowering steps where
it is completely unmatched. It also introduces the logic for detecting
rotation shuffle masks even in the presence of single input or blend
masks and arbitrarily undef lanes.

I've added fairly comprehensive tests for the matching logic in v8i16
because the tests at that size are much easier to write and manage.

I've not checked the SSE2 code generated for these tests because the
code is *horrible*. It is absolute madness. Testing it will just make
the test brittle without giving any interesting improvements in the
correctness confidence.

llvm-svn: 218013
2014-09-18 04:11:29 +00:00
Saleem Abdulrasool 8c61c6c0f9 ARM: use a more precise check for MachO
Rather than relying on support for a specific directive to determine if we are
targeting MachO, explicitly check the output format.

As an additional bonus, cleanup the caret diagnostic for the non-MachO case and
avoid the spurious error caused by not discarding the statement.

llvm-svn: 218012
2014-09-18 03:49:55 +00:00
Juergen Ributzka c35fb03661 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).

llvm-svn: 218010
2014-09-18 02:44:13 +00:00
Samuel Antao 61570df715 Fix FastISel bug in boolean returns for PowerPC.
For PPC targets, FastISel does not take the sign extension information into account when selecting return instructions whose operands are constants. A consequence of this is that the return of boolean values is not correct. This patch fixes the problem by evaluating the sign extension information also for constants, forwarding this information to PPCMaterializeInt which takes this information to drive the sign extension during the materialization. 

llvm-svn: 217993
2014-09-17 23:25:06 +00:00
Juergen Ributzka f6430314b4 [FastISel][AArch64] Custom lower sdiv by power-of-2.
Emit an optimized instruction sequence for sdiv by power-of-2 depending on the
exact flag.

This fixes rdar://problem/18224511.

llvm-svn: 217986
2014-09-17 21:55:55 +00:00
Nick Kledzik 3e95fa431e [llvm-objdump] clean up test cases now that build bots are green
llvm-svn: 217985
2014-09-17 21:53:07 +00:00
Justin Bogner 5cbed6e09e llvm-cov: Push some more debug output into the View (NFC)
llvm-svn: 217984
2014-09-17 21:48:52 +00:00
Rafael Espindola 51bd8ee309 Internalize common symbols when we can.
This fixes pr20974.

llvm-svn: 217981
2014-09-17 20:41:13 +00:00
Juergen Ributzka c611d72754 [FastISel][AArch64] Simplify mul to shift when possible.
This is related to rdar://problem/18369687.

llvm-svn: 217980
2014-09-17 20:35:41 +00:00
Alexey Samsonov 7bddb0a56a Exclude known and bugzilled failures from UBSan bootstrap
llvm-svn: 217979
2014-09-17 20:17:52 +00:00
Juergen Ributzka 3871c69422 [FastISel][AArch64] Fold mul into add/sub and logical operations.
Try to fold the multiply into the add/sub or logical operations (when
possible).

This is related to rdar://problem/18369687.

llvm-svn: 217978
2014-09-17 19:51:38 +00:00
Juergen Ributzka 22d4cd0a4f [FastISel][AArch64] Fold mul into the address computation of memory operations.
Teach 'computeAddress' to also fold multiplies into the address computation
(when possible).

This fixes rdar://problem/18369443.

llvm-svn: 217977
2014-09-17 19:19:31 +00:00
Robin Morisset bf26f8fd56 Revert "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors"
It is breaking the build on the buildbots but works fine on my machine, I revert
while trying to understand what happens (it appears to depend on the compiler used
to build, I probably used a C++11 feature that is not perfectly supported by some
of the buildbots).

This reverts commit feb3176c4d006f99af8b40373abd56215a90e7cc.

llvm-svn: 217973
2014-09-17 18:09:13 +00:00
Juergen Ributzka d8e30c0db8 [FastISel][AArch64] Fold compare with zero and branch into CBZ and CBNZ.
This takes advanatage of the CBZ and CBNZ instruction to further optimize the
common null check pattern into a single instruction.

This is related to rdar://problem/18358882.

llvm-svn: 217972
2014-09-17 18:05:34 +00:00
Juergen Ributzka fb3e14375a [FastISel][AArch64] Improve branch selection to support all FP conditions.
This adds the last two missing floating-point condition codes (FCMP_UEQ and
FCMP_ONE) also to the branch selection. In these two cases an additonal branch
instruction is required.

This also adds unit tests to checks all the different condition codes.

This is related o rdar://problem/18358882.

llvm-svn: 217966
2014-09-17 17:46:47 +00:00
Robin Morisset 1c8a457575 [ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors
Summary:
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if a cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.

Thanks to luqmana for having noticed this bug.

Test Plan: Added more cases to atomic-load-store.ll + make check-all

Reviewers: jfb, t.p.northover, luqmana

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5304

llvm-svn: 217965
2014-09-17 17:41:16 +00:00
Matt Arsenault 02dc26529e R600/SI: Change formatting of printed FP immediates
Only 1 decimal place should be printed for inline immediates.
Other constants should be hex constants.

Does not include f64 tests because folding those inline
immediates currently does not work.

llvm-svn: 217964
2014-09-17 17:32:13 +00:00
Chad Rosier 307b50b0f6 [IndVarSimplify] Partially revert r217953 to see if this fixes the bots.
Specifically, disable widening of unsigned compare instructions.

llvm-svn: 217962
2014-09-17 16:35:09 +00:00
Chad Rosier bb99f40530 [IndVarSimplify] Widen loop compare instructions.
This improves other optimizations such as LSR.  A sext may be added to the
compare's other operand, but this can often be hoisted outside of the loop.

llvm-svn: 217953
2014-09-17 14:10:33 +00:00
Andrea Di Biagio 5b92b4971a [InstCombine] Fix wrong folding of constant comparison involving ahsr and negative quantities (PR20945).
Example:
define i1 @foo(i32 %a) {
  %shr = ashr i32 -9, %a
  %cmp = icmp ne i32 %shr, -5
  ret i1 %cmp
}

Before this fix, the instruction combiner wrongly thought that %shr
could have never been equal to -5. Therefore, %cmp was always folded to 'true'.
However, when %a is equal to 1, then %cmp evaluates to 'false'. Therefore,
in this example, it is not valid to fold %cmp to 'true'.
The problem was only affecting the case where the comparison was between
negative quantities where one of the quantities was obtained from arithmetic
shift of a negative constant.

This patch fixes the problem with the wrong folding (fixes PR20945).
With this patch, the 'icmp' from the example is now simplified to a
comparison between %a and 1. This still allows us to get rid of the arithmetic
shift (%shr).

llvm-svn: 217950
2014-09-17 11:32:31 +00:00
Toma Tabacu 351b2feeb3 [mips] Add assembler support for the .set nodsp directive.
Summary: This directive is used to tell the assembler to reject DSP-specific instructions.

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D5142

llvm-svn: 217946
2014-09-17 09:01:54 +00:00
Pavel Chupin 37b65d81dd [x32] Fix function indirect calls
Summary: Zero-extend register to 64-bit for callq/jmpq.

Test Plan: 3 tests added

Reviewers: nadav, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5355

llvm-svn: 217942
2014-09-17 07:09:23 +00:00
David Majnemer b435a4214e InstSimplify: Don't allow (x srem y) urem y -> x srem y
Let's consider the case where:
%x i16 = 32768
%y i16 = 384

%x srem %y = 65408
(%x srem %y) urem %y = 128

llvm-svn: 217939
2014-09-17 04:16:35 +00:00