Commit Graph

26294 Commits

Author SHA1 Message Date
Chandler Carruth 397d12c4b4 [x86] More tweaks to the v32i8 test cases.
I made a mistake in the previous commit and produced the wrong pattern.
Fix that. Also make one more shuffle pattern byte-based rather than
word-based, and add two more blend patterns.

llvm-svn: 218439
2014-09-25 02:44:39 +00:00
Chandler Carruth a03011ffae [x86] Re-work a bunch of the v32i8 test cases to actually involve byte
shuffles rather than word shuffles.

As you might guess, these were built starting from the word shuffle test
cases and I failed to properly port a bunch of them and left them as
widened word shuffle test cases. We still have a couple of tests that
check our ability to widen shuffles, but now we will test the actual
byte shuffle quite a bit better.

llvm-svn: 218438
2014-09-25 02:20:02 +00:00
Reid Kleckner 81782f0cb8 MC: Use @IMGREL instead of @IMGREL32, which we can't parse
Nico Rieck added support for this 32-bit COFF relocation some time ago
for Win64 stuff. It appears that as an oversight, the assembly output
used "foo"@IMGREL32 instead of "foo"@IMGREL, which is what we can parse.

Sadly, there were actually tests that took in IMGREL and put out
IMGREL32, and we didn't notice the inconsistency. Oh well. Now LLVM can
assemble it's own output with slightly more fidelity.

llvm-svn: 218437
2014-09-25 02:09:18 +00:00
Chandler Carruth a577bc26b6 [x86] Fix the v16i16 blend logic I added in the prior commit and add the
missing test cases for it.

Unsurprisingly, without test cases, there were bugs here. Surprisingly,
this bug wasn't caught at compile time. Yep, there is an X86ISD::BLENDV.
It isn't wired to anything. Oops. I'll fix than next.

llvm-svn: 218434
2014-09-25 01:13:38 +00:00
Justin Bogner b35a72ae9e llvm-cov: Combine segments that cover the same location
If we have multiple coverage counts for the same segment, we need to
add them up rather than arbitrarily choosing one. This fixes that and
adds a test with template instantiations to exercise it.

llvm-svn: 218432
2014-09-25 00:34:18 +00:00
Akira Hatanaka 8cc48bd159 [X86,AVX] Add an isel pattern for X86VBroadcast.
This fixes PR21050 and rdar://problem/18434607.

llvm-svn: 218431
2014-09-25 00:26:15 +00:00
Chandler Carruth 98443d89b9 [x86] Implement v16i16 support with AVX2 in the new vector shuffle
lowering.

This also implements the fancy blend lowering for v16i16 using AVX2 and
teaches the X86 backend to print shuffle masks for 256-bit PSHUFB
and PBLENDW instructions. It also makes the mask decoding correct for
PBLENDW instructions. The yaks, they are legion.

Tests are updated accordingly. There are some missing tests for the
VBLENDVB lowering, but I'll add those in a follow-up as this commit has
accumulated enough cruft already.

llvm-svn: 218430
2014-09-25 00:24:19 +00:00
Kevin Enderby bf246f5a9d Flush out enough of llvm-objdump’s SymbolizerSymbolLookUp() for Mach-O files to
get the literal string “Hello world” printed as a comment on the instruction
that loads the pointer to it. For now this is just for x86_64. So for object
files with relocation entries it produces things like:

	leaq	L_.str(%rip), %rax      ## literal pool for: "Hello world\n"

and similar for fully linked images like executables:

	leaq	0x4f(%rip), %rax        ## literal pool for: "Hello world\n"

Also to allow testing against darwin’s otool(1), I hooked up the existing 
-no-show-raw-insn option to the Mach-O parser code, added the new Mach-O
only -full-leading-addr option to match otool(1)'s printing of addresses and
also added the new -print-imm-hex option.

llvm-svn: 218423
2014-09-24 23:08:22 +00:00
Kostya Serebryany 34ddf8725c [asan] don't instrument module CTORs that may be run before asan.module_ctor. This fixes asan running together -coverage
llvm-svn: 218421
2014-09-24 22:41:55 +00:00
Renato Golin 9c4a6d87ec Removing empty ARM tests from failed revert
llvm-svn: 218419
2014-09-24 21:58:04 +00:00
Renato Golin a86bbc37f2 Removing empty tests from failed revert
llvm-svn: 218417
2014-09-24 21:45:26 +00:00
Renato Golin 4b5f91f513 Revert 218406 - Refactor the RelocVisitor::visit method
llvm-svn: 218416
2014-09-24 21:30:43 +00:00
Renato Golin ba89f068bf Revert 218407 - Add support for ARM and AArch64 BE object files
llvm-svn: 218415
2014-09-24 21:30:14 +00:00
Renato Golin d35e6f6aee Revert 218408 - Report endianness in output of {dwarf, obj}dump
llvm-svn: 218414
2014-09-24 21:29:45 +00:00
Renato Golin 2328747ede Revert 218411 - XFAIL reloc test on x86/hexagon
llvm-svn: 218413
2014-09-24 21:28:53 +00:00
Renato Golin 7aa836043f XFAIL reloc test on x86/hexagon
llvm-svn: 218411
2014-09-24 21:00:30 +00:00
Renato Golin 6f92c6b982 Report endianness in output of {dwarf, obj}dump
For biendian targets like ARM and AArch64, it is useful to have the
output of the llvm-dwarfdump and llvm-objdump report the endianness
used when the object files were generated.

Patch by Charlie Turner.

llvm-svn: 218408
2014-09-24 20:07:41 +00:00
Renato Golin ed654f5852 Add support for ARM and AArch64 BE object files
This change fixes the ARM and AArch64 relocation visitors in
RelocVisitor.  They were unconditionally assuming the object data are
little-endian.  Tests have been added to ensure that the
llvm-dwarfdump utility does not crash when processing big-endian
object files.

Patch by Charlie Turner.

llvm-svn: 218407
2014-09-24 20:07:30 +00:00
Renato Golin 2b25450061 Refactor the RelocVisitor::visit method
This change replaces the brittle if/else chain of string comparisons
with a switch statement on the detected target triple, removing the
need for testing arbitrary architecture names returned from
getFileFormatName, whose primary purpose seems to be for display
(user-interface) purposes. The visitor now takes a reference to the
object file, rather than its arbitrary file format name to figure out
whether the file is a 32 or 64-bit object file and what the detected
target triple is.

A set of tests have been added to help show that the refactoring processes
relocations for the same targets as the original code.

Patch by Charlie Turner.

llvm-svn: 218406
2014-09-24 20:07:22 +00:00
Scott Douglass ae671341c4 pass environment when invoking llvm-config from lit.cfg
Use the same environment when invoking llvm-config from lit.cfg as
will be used when running tests, so that ASAN_OPTIONS, INCLUDE, etc.
are present.

llvm-svn: 218403
2014-09-24 18:37:48 +00:00
Kaelyn Takata c4067328cf Revert "Add support for ARM and AArch64 BE object files"
This reverts commit r218389 as it depends on r218388.

llvm-svn: 218398
2014-09-24 18:00:20 +00:00
Kaelyn Takata e43d88e3f5 Revert "Report endianness in output of {dwarf, obj}dump"
This reverts commit r218391 as it depends on r218388 and r218389

llvm-svn: 218397
2014-09-24 18:00:17 +00:00
Kaelyn Takata f2fce14920 Revert "Refactor the RelocVisitor::visit method"
This reverts commit faac033f7364bb4226e22c8079c221c96af10d02.

The test depends on all targets to be enabled in llc in order to pass,
and needs to be rewritten/refactored to not have that dependency.

llvm-svn: 218393
2014-09-24 17:49:07 +00:00
Renato Golin 4edda28b8a Report endianness in output of {dwarf, obj}dump
For biendian targets like ARM and AArch64, it is useful to have the
output of the llvm-dwarfdump and llvm-objdump report the endianness
used when the object files were generated.

Patch by Charlie Turner.

llvm-svn: 218391
2014-09-24 17:01:33 +00:00
Renato Golin 0e92815e94 Add support for ARM and AArch64 BE object files
This change fixes the ARM and AArch64 relocation visitors in
RelocVisitor.  They were unconditionally assuming the object data are
little-endian.  Tests have been added to ensure that the
llvm-dwarfdump utility does not crash when processing big-endian
object files.

Patch by Charlie Turner.

llvm-svn: 218389
2014-09-24 17:01:06 +00:00
Renato Golin 53f6034f8e Refactor the RelocVisitor::visit method
This change replaces the brittle if/else chain of string comparisons
with a switch statement on the detected target triple, removing the
need for testing arbitrary architecture names returned from
getFileFormatName, whose primary purpose seems to be for display
(user-interface) purposes. The visitor now takes a reference to the
object file, rather than its arbitrary file format name to figure out
whether the file is a 32 or 64-bit object file and what the detected
target triple is.

A set of tests have been added to help show that the refactoring processes
relocations for the same targets as the original code.

Patch by Charlie Turner.

llvm-svn: 218388
2014-09-24 17:00:42 +00:00
David Peixotto 0d4d5e64ec Fix assertion in LICM doFinalization()
The doFinalization method checks that the LoopToAliasSetMap is
empty. LICM populates that map as it runs through the loop nest,
deleting the entries for child loops as it goes. However, if a child
loop is deleted by another pass (e.g. unrolling) then the loop will
never be deleted from the map because LICM walks the loop nest to
find entries it can delete.

The fix is to delete the loop from the map and free the alias set
when the loop is deleted from the loop nest.

Differential Revision: http://reviews.llvm.org/D5305

llvm-svn: 218387
2014-09-24 16:48:31 +00:00
Moritz Roth f5d0c7c2c0 [Thumb] Make load/store optimizer less conservative.
If it's safe to clobber the condition flags, we can do a few extra things:
it's then possible to reset the base register writeback using a SUBS, so
we can try to merge even if the base register isn't dead after the merged
instruction.

This is effectively a (heavily bug-fixed) rewrite of r208992.

llvm-svn: 218386
2014-09-24 16:35:50 +00:00
Oliver Stannard 1ae8b476f4 [Thumb] 32-bit encodings of 'cps' are not valid for v7M
v7M only allows the 16-bit encoding of the 'cps' (Change Processor
State) instruction, and does not have the 32-bit encoding which is
valid from v6T2 onwards.

llvm-svn: 218382
2014-09-24 14:20:01 +00:00
Chandler Carruth e7e9c04ddf [x86] Teach the instruction lowering to add comments describing constant
pool data being loaded into a vector register.

The comments take the form of:

  # ymm0 = [a,b,c,d,...]
  # xmm1 = <x,y,z...>

The []s are used for generic sequential data and the <>s are used for
specifically ConstantVector loads. Undef elements are printed as the
letter 'u', integers in decimal, and floating point values as floating
point values. Suggestions on improving the formatting or other aspects
of the display are very welcome.

My primary use case for this is to be able to FileCheck test masks
passed to vector shuffle instructions in-register. It isn't fantastic
for that (no decoding special zeroing semantics or other tricks), but it
at least puts the mask onto an instruction line that could reasonably be
checked. I've updated many of the new vector shuffle lowering tests to
leverage this in their test cases so that we're actually checking the
shuffle masks remain as expected.

Before implementing this, I tried a *bunch* of different approaches.
I looked into teaching the MCInstLower code to scan up the basic block
and find a definition of a register used in a shuffle instruction and
then decode that, but this seems incredibly brittle and complex.
I talked to Hal a lot about the "right" way to do this: attach the raw
shuffle mask to the instruction itself in some form of unencoded
operands, and then use that to emit the comments. I still think that's
the optimal solution here, but it proved to be beyond what I'm up for
here. In particular, it seems likely best done by completing the
plumbing of metadata through these layers and attaching the shuffle mask
in metadata which could have fully automatic dropping when encoding an
actual instruction.

llvm-svn: 218377
2014-09-24 09:39:41 +00:00
Matt Arsenault 3e0effa223 R600/SI: Fix weird CHECK-DAG usage
This prevents these from failing in a future commit.

llvm-svn: 218356
2014-09-24 02:14:26 +00:00
Tom Stellard 744b99b476 R600/SI: Enable selecting SALU inside branches
We can do this now that the FixSGPRLiveRanges pass is working.

llvm-svn: 218353
2014-09-24 01:33:28 +00:00
Chandler Carruth 9bd10e7492 [x86] Teach the new vector shuffle lowering to lower v8i32 shuffles with
the native AVX2 instructions.

Note that the test case is really frustrating here because VPERMD
requires the mask to be in the register input and we don't produce
a comment looking through that to the constant pool. I'm going to
attempt to improve this in a subsequent commit, but not sure if I will
succeed.

llvm-svn: 218347
2014-09-24 01:24:44 +00:00
Chandler Carruth fd11815a7d [x86] Fix a really terrible bug in the repeated 128-bin-lane shuffle
detection. It was incorrectly handling undef lanes by actually treating
an undef lane in the first 128-bit lane as a *numeric* shuffle value.

Fortunately, this almost always DTRT and disabled detecting repeated
patterns. But not always. =/ This patch introduces a much more
principled approach and fixes the miscompiles I spotted by inspection
previously.

llvm-svn: 218346
2014-09-24 01:03:57 +00:00
Robin Morisset dc1b248ccf Fix swift-atomics testcase
This testcase was not testing what it meant: because there were only two checks for
dmb {{ish}} in the second function, it could have missed a bug where one of the three
required dmb {{ish}} became dmb {{ishst}}. As I was fixing it, I also added
CHECK-LABELs to make it a bit less brittle.

llvm-svn: 218341
2014-09-23 23:18:01 +00:00
Chandler Carruth df2e421845 [x86] Teach the new vector shuffle lowering to lower v4i64 vector
shuffles using the AVX2 instructions. This is the first step of cutting
in real AVX2 support.

Note that I have spotted at least one bug in the test cases already, but
I suspect it was already present and just is getting surfaced. Will
investigate next.

llvm-svn: 218338
2014-09-23 22:39:02 +00:00
Reid Kleckner 78927e884b GlobalOpt: Preserve comdats of unoptimized initializers
Rather than slurping in and splatting out the whole ctor list, preserve
the existing array entries without trying to understand them.  Only
remove the entries that we know we can optimize away.  This way we don't
need to wire through priority and comdats or anything else we might add.

Fixes a linker issue where the .init_array or .ctors entry would point
to discarded initialization code if the comdat group from the TU with
the faulty global_ctors entry was dropped.

llvm-svn: 218337
2014-09-23 22:33:01 +00:00
Jim Grosbach 57fd2623c3 AArch64: allow constant expressions for shifted reg literals
e.g., add w1, w2, w3, lsl #(2 - 1)

This sort of thing comes up in pre-processed assembly playing macro games.
Still validate that it's an assembly time constant. The early exit error check
was just a bit overzealous and disallowed a left paren.

rdar://18430542

llvm-svn: 218336
2014-09-23 22:16:02 +00:00
Chandler Carruth 9a94bd6fa4 [x86] Teach the rest of the 'target shuffle' machinery about blends and
add VPBLENDD to the InstPrinter's comment generation so we get nice
comments everywhere.

Now that we have the nice comments, I can see the bug introduced by
a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay
tests that are easy to inspect.

llvm-svn: 218335
2014-09-23 22:14:14 +00:00
Robin Morisset 6dbbbc28b0 [X86] Make wide loads be managed by AtomicExpand
Summary:
AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.

Only one functionnal change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).

Test Plan: make check-all (lots of tests for this functionality already exist)

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5404

llvm-svn: 218332
2014-09-23 20:59:25 +00:00
Robin Morisset 2212996936 [Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate
Summary:
This patch makes use of AtomicExpandPass in Power for inserting fences around
atomic as part of an effort to remove fence insertion from SelectionDAGBuilder.
As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic
lwsync) instead of sync 0 (heavyweight sync) in many cases.

I also added a test, as there was no test for the barriers emitted by the Power
backend for atomic loads and stores.

Test Plan: new test + make check-all

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5180

llvm-svn: 218331
2014-09-23 20:46:49 +00:00
Chandler Carruth adcfec995c [x86] Teach the new shuffle lowering's blend functionality to use AVX2's
VPBLENDD where appropriate even on 128-bit vectors.

According to Agner's tables, this instruction is significantly higher
throughput (can execute on any port) on Haswell chips so we should
aggressively try to form it when available.

Sadly, this loses our delightful shuffle comments. I'll add those back
for VPBLENDD next.

llvm-svn: 218322
2014-09-23 18:16:12 +00:00
Oliver Stannard c546625c4f Fix segfault in AArch64 backend with -g and -mbig-endian
Fix a null pointer dereference when trying to swap the endianness of
fixups in the .eh_frame section in the AArch64 backend.

llvm-svn: 218311
2014-09-23 15:38:11 +00:00
Timur Iskhodzhanov f6b889126c Fix a small typo in the test comment
llvm-svn: 218306
2014-09-23 14:07:12 +00:00
Timur Iskhodzhanov d171153f81 Rebuild the inputs for the codeview-linetables.test with VS2013
Also provide reproducible instructions

llvm-svn: 218303
2014-09-23 13:49:51 +00:00
Chandler Carruth 40592d2dec [x86] Teach the vector comment parsing and printing to correctly handle
undef in the shuffle mask. This shows up when we're printing comments
during lowering and we still have an IR-level constant hanging around
that models undef.

A nice consequence of this is *much* prettier test cases where the undef
lanes actually show up as undef rather than as a particular set of
values. This also allows us to print shuffle comments in cases that use
undef such as the recently added variable VPERMILPS lowering. Now those
test cases have nice shuffle comments attached with their details.

The shuffle lowering for PSHUFB has been augmented to use undef, and the
shuffle combining has been augmented to comprehend it.

llvm-svn: 218301
2014-09-23 11:15:19 +00:00
Chandler Carruth 6d5916a2d7 [x86] Teach the AVX1 path of the new vector shuffle lowering one more
trick that I missed.

VPERMILPS has a non-immediate memory operand mode that allows it to do
asymetric shuffles in the two 128-bit lanes. Use this rather than two
shuffles and a blend.

However, it turns out the variable shuffle path to VPERMILPS (and
VPERMILPD, although that one offers no functional differenc from the
immediate operand other than variability) wasn't even plumbed through
codegen. Do such plumbing so that we can reasonably emit
a variable-masked VPERMILP instruction. Also plumb basic comment parsing
and printing through so that the tests are reasonable.

There are still a few tests which don't show the shuffle pattern. These
are tests with undef lanes. I'll teach the shuffle decoding and printing
to handle undef mask entries in a follow-up. I've looked at the masks
and they seem reasonable.

llvm-svn: 218300
2014-09-23 10:08:29 +00:00
Michael Kuperstein 946b3b2e16 Ensure bitcode encoding stays stable.
This includes constants, attributes, and some additional instructions not covered by previous tests.

Work was done by lama.saba@intel.com.

llvm-svn: 218297
2014-09-23 08:48:01 +00:00
Sanjay Patel 4bc685c206 tighten up checks
We manage to generate all of the matching instructions (and a lot more) via
the reciprocal optimization function - even if we completely remove the square
root optimization. With CHECK_NEXT, we assure that we're executing the
expected square root optimization paths and not generating extra insts.

llvm-svn: 218284
2014-09-22 22:46:44 +00:00
Sanjay Patel 5cf7561d21 remove unnecessary labels; NFC
llvm-svn: 218278
2014-09-22 21:52:53 +00:00
Juergen Ributzka 27e959d7b2 [FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left for booleans (i1).
Shift-left immediate with sign-/zero-extensions also works for boolean values.
Update the assert and the test cases to reflect that fact.

This should fix a bug found by Chad.

llvm-svn: 218275
2014-09-22 21:08:53 +00:00
David Majnemer 597be2ded6 MC: ReadOnlyWithRel section kinds should map to rdata in COFF
Don't consider ReadOnlyWithRel as a writable section in COFF, they
really belong in .rdata.

llvm-svn: 218268
2014-09-22 20:39:23 +00:00
Chandler Carruth 44deb8015c [x86] Introduce tests covering the gamut of 256-bit vector shuffling.
These are just test cases, no actual code yet. This establishes the
baseline fallback strategy we're starting from on AVX2 and the expected
lowering we use on AVX1.

Also, these test cases are very much generated. I've manually crafted
the specific pattern set that I'm hoping will be useful at exercising
the lowering code, but I've not (and could not) manually verify *all* of
these. I've spot checked and they seem legit to me.

As with the rest of vector shuffling, at a certain point the only really
useful way to check the correctness of this stuff is through fuzz
testing.

llvm-svn: 218267
2014-09-22 20:25:08 +00:00
Sanjay Patel 7939d7229d Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2).
We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors.
This patch should preserve all existing behavior with regular optimization levels, 
but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.

The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save
at least 8 bytes (up to 31 bytes) of constant pool data.

Differential Revision: http://reviews.llvm.org/D5347

llvm-svn: 218263
2014-09-22 18:54:01 +00:00
Akira Hatanaka f2a721a875 Fix test case commited in r218242 to appease buildbot.
llvm-svn: 218261
2014-09-22 18:07:20 +00:00
Tom Stellard 9f73851e39 Revert "R600/SI: Add support for global atomic add"
This reverts commit r218254.

The global_atomics.ll test fails with asserts disabled.  For some reason,
the compiler fails to produce the atomic no return variants.

llvm-svn: 218257
2014-09-22 16:44:04 +00:00
Frederic Riss 220fa48491 Fix a test introduced in r218246 to work also on Windows.
llvm-svn: 218255
2014-09-22 16:17:32 +00:00
Tom Stellard 2355a77e74 R600/SI: Add support for global atomic add
llvm-svn: 218254
2014-09-22 15:35:35 +00:00
Pavel Chupin be9f12102f [x32] Fix segmented stacks support
Summary:
Update segmented-stacks*.ll tests with x32 target case and make
corresponding changes to make them pass.

Test Plan: tests updated with x32 target

Reviewers: nadav, rafael, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5245

llvm-svn: 218247
2014-09-22 13:11:35 +00:00
Frederic Riss 955724e3f5 [dwarfdump] Dump full filenames as DW_AT_(decl|call)_file attribute values
Reviewers: dblaikie samsonov

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5192

llvm-svn: 218246
2014-09-22 12:36:04 +00:00
Frederic Riss 58ed53cfcd Allow DWARFDebugInfoEntryMinimal::getSubroutineName to resolve cross-unit references.
Summary: getSubroutineName is currently only used by llvm-symbolizer, thus add a binary test containing a cross-cu inlining example.

Reviewers: samsonov, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5394

llvm-svn: 218245
2014-09-22 12:35:53 +00:00
Robert Lougher 6da8a243f9 Fix assert when decoding PSHUFB mask
The PSHUFB mask decode routine used to assert if the mask index was out of
range (<0 or greater than the size of the vector).  The problem is, we can
legitimately have a PSHUFB with a large index using intrinsics.  The
instruction only uses the least significant 4 bits.  This change removes the
assert and masks the index to match the instruction behaviour.

llvm-svn: 218242
2014-09-22 11:54:38 +00:00
Oliver Stannard 14f97d0017 Downgrade DWARF2 section limit error to a warning
We currently emit an error when trying to assemble a file with more
than one section using DWARF2 debug info. This should be a warning
instead, as the resulting file will still be usable, but with a
degraded debug illusion.

llvm-svn: 218241
2014-09-22 10:45:16 +00:00
Chandler Carruth 7158c95d65 [x86] Move the AVX v4i64 test cases down to group them together.
Increasingly I don't want to mix the integer and floating point tests,
especially with AVX where they are handled quite differently.

llvm-svn: 218233
2014-09-22 03:05:23 +00:00
Chandler Carruth 12bbf7d922 [x86] Back out a bad choice about lowering v4i64 and pave the way for
a more sane approach to AVX2 support.

Fundamentally, there is no useful way to lower integer vectors in AVX.
None. We always end up with a VINSERTF128 in the end, so we might as
well eagerly switch to the floating point domain and do everything
there. This cleans up lots of weird and unlikely to be correct
differences between integer and floating point shuffles when we only
have AVX1.

The other nice consequence is that by doing things this way we will make
it much easier to write the integer lowering routines as we won't need
to duplicate the logic to check for AVX vs. AVX2 in each one -- if we
actually try to lower a 256-bit vector as an integer vector, we have
AVX2 and can rely on it. I think this will make the code much simpler
and more comprehensible.

Currently, I've disabled *all* support for AVX2 so that we always fall
back to AVX. This keeps everything working rather than asserting. That
will go away with the subsequent series of patches that provide
a baseline AVX2 implementation.

Please note, I'm going to implement AVX2 *without access to hardware*.
That means I cannot correctness test this path. I will be relying on
those with access to AVX2 hardware to do correctness testing and fix
bugs here, but as a courtesy I'm trying to sketch out the framework for
the new-style vector shuffle lowering in the context of the AVX2 ISA.

llvm-svn: 218228
2014-09-22 00:32:15 +00:00
Chandler Carruth 5d45962b2c [x86] Teach the new vector shuffle lowering how to cleverly lower single
input v8f32 shuffles which are not 128-bit lane crossing but have
different shuffle patterns in the low and high lanes. This removes most
of the extract/insert traffic that was unnecessary and is particularly
good at lowering cases where only one of the two lanes is shuffled at
all.

I've also added a collection of test cases with undef lanes because this
lowering is somewhat more sensitive to undef lanes than others.

llvm-svn: 218226
2014-09-21 23:46:13 +00:00
Chandler Carruth b195e860f9 [x86] Add a bunch of test cases where we have different shuffle patterns
in the high and low 128-bit lanes of a v8f32 vector.

No functionality change yet, but wanted to set up the baseline for my
next patch which will make these quite a bit better. =]

llvm-svn: 218224
2014-09-21 23:32:42 +00:00
Chandler Carruth b3125c7522 [x86] Teach the new vector shuffle lowering to re-use the SHUFPS
lowering when it can use a symmetric SHUFPS across both 128-bit lanes.

This required making the SHUFPS lowering tolerant of other vector types,
and adjusting our canonicalization to canonicalize harder.

This is the last of the clever uses of symmetry I've thought of for
v8f32. The rest of the tricks I'm aware of here are to work around
assymetry in the mask.

llvm-svn: 218216
2014-09-21 13:35:14 +00:00
Chandler Carruth 33eda72802 [x86] Teach the new vector shuffle lowering the basics about insertion
of a single element into a zero vector for v4f64 and v4i64 in AVX.
Ironically, there is less to see here because xor+blend is so crazy fast
that we can't really beat that to zero the high 128-bit lane.

llvm-svn: 218214
2014-09-21 12:49:46 +00:00
Chandler Carruth 43f5974ea0 [x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and
UNPCKHPS with AVX vectors by recognizing those patterns when they are
repeated for both 128-bit lanes.

With this, we now generate the exact same (really nice) code for
Quentin's avx_test_case.ll which was the most significant regression
reported for the new shuffle lowering. In fact, I'm out of specific test
cases for AVX lowering, the rest were AVX2 I think. However, there are
a bunch of pretty obvious remaining things to improve with AVX...

llvm-svn: 218213
2014-09-21 12:20:44 +00:00
Chandler Carruth 78f4798913 [x86] Add test cases for UNPCK instructions with v8f32 AVX vectors in
preparation for enhancing their support in the new vector shuffle
lowering.

llvm-svn: 218212
2014-09-21 12:13:11 +00:00
Chandler Carruth 88404c4f9b [x86] Begin teaching the new vector shuffle lowering among the most
important bits of cleverness: to detect and lower repeated shuffle
patterns between the two 128-bit lanes with a single instruction.

This patch just teaches it how to lower single-input shuffles that fit
this model using VPERMILPS. =] There is more that needs to happen here.

llvm-svn: 218211
2014-09-21 12:01:19 +00:00
Chandler Carruth 83252ac8f4 [x86] Regenerate this test case now that I've improved my script for
generating the test cases to format things more consistently and
actually catch all the operand sequences that should be elided in favor
of the asm comments. No actual changes here.

llvm-svn: 218210
2014-09-21 11:51:33 +00:00
Chandler Carruth e81bfbada9 [x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct
VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows
down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the
reciprocal throughput on Ivy Bridge CPUs much like it does everywhere
for 128-bits. There isn't a downside, so just eagerly use this
instruction when it suffices.

llvm-svn: 218208
2014-09-21 11:17:55 +00:00
Chandler Carruth 6aea21df8e [x86] Add some more comprehensive tests for v4f64 blending.
llvm-svn: 218207
2014-09-21 11:12:19 +00:00
Chandler Carruth 908afb56c0 [x86] Re-generate a bunch of the v4f64 test cases with my new script.
This expands the integer cases to cover the fact that AVX2 moves their
lane-crossing shuffles into the integer domain. It also adds proper
support for AVX2 run lines and the "ALL" group when it doesn't matter.

llvm-svn: 218206
2014-09-21 11:07:41 +00:00
Chandler Carruth 293327ddcd [x86] Teach the new vector shuffle lowering the first step toward more
actual support for complex AVX shuffling tricks. We can do independent
blends of the low and high 128-bit lanes of an avx vector, so shuffle
the inputs into place and then do the blend at 256 bits. This will in
many cases remove one blend instruction.

The next step is to permute the low and high halves in-place rather than
extracting them and re-inserting them.

llvm-svn: 218202
2014-09-21 09:35:22 +00:00
David Majnemer 48227a3759 MC: Support aligned COMMON symbols for COFF
link.exe:
Fuzz testing has shown that COMMON symbols with size > 32 will always
have an alignment of at least 32 and all symbols with size < 32 will
have an alignment of at least the largest power of 2 less than the size
of the symbol.

binutils:
The BFD linker essentially work like the link.exe behavior but with
alignment 4 instead of 32.  The BFD linker also supports an extension to
COFF which adds an -aligncomm argument to the .drectve section which
permits specifying a precise alignment for a variable but MC currently
doesn't support editing .drectve in this way.

With all of this in mind, we decide to play a little trick: we can
ensure that the alignment will be respected by bumping the size of the
global to it's alignment.

llvm-svn: 218201
2014-09-21 09:18:07 +00:00
Chandler Carruth 8ff73c0170 [x86] Add some more test cases covering specific blend patterns.
llvm-svn: 218200
2014-09-21 09:01:26 +00:00
Chandler Carruth 7a6108d652 [x86] Add the beginnings of some tests for our v8f32 shuffle lowering
under AVX.

This really just documents the current state of the world. I'm going to
try to flesh it out to cover any test cases I plan to improve prior to
improving them so that the delta made by changes is actually visible to
code reviewers.

This is made easier by the fact that I now have a script to automate the
process of producing test cases including the check lines. =]

llvm-svn: 218199
2014-09-21 08:49:27 +00:00
Chandler Carruth a454812ac8 [x86] Teach the new vector shuffle lowering to use VPERMILPD for
single-input shuffles with doubles. This allows them to fold memory
operands into the shuffle, etc. This is just the analog to the v4f32
case in my prior commit.

llvm-svn: 218193
2014-09-20 22:09:27 +00:00
Chandler Carruth aa5b798ae7 [x86] Add an AVX run to the 128-bit v2 tests, teach them to have
a generic SSE and AVX mode in addition to a specific AVX1 test path, and
flesh out the AVX tests.

llvm-svn: 218192
2014-09-20 21:26:41 +00:00
David Majnemer fb83977538 Update tests which broke from r218189
llvm-svn: 218191
2014-09-20 21:18:43 +00:00
Chandler Carruth 6f80abac4e [x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS
instruction for single-vector floating point shuffles. This in turn
allows the shuffles to fold a load into the instruction which is one of
the common regressions hit with the new shuffle lowering.

llvm-svn: 218190
2014-09-20 20:52:07 +00:00
David Majnemer 7d0dc3ef18 MC: Fix MCSectionCOFF::PrintSwitchToSection
We had a few bugs:
- We were considering the GVKind instead of just looking at the section
  characteristics
- We would never print out 'y' when a section was meant to be unreadable
- We would never print out 's' when a section was meant to be shared
- We translated IMAGE_SCN_MEM_DISCARDABLE to 'n' when it should've meant
  IMAGE_SCN_LNK_REMOVE

llvm-svn: 218189
2014-09-20 20:40:50 +00:00
Chandler Carruth 78a761ce8c [x86] Start moving to a fancier check syntax to reduce the need for
duplication of check lines. The idea is to have broad sets of
compilation modes that will frequently diverge without having to always
and immediately explode to the precise ISA feature set.

While this already helps due to VEX encoded differences, it will help
much more as I teach the new shuffle lowering about more of the new VEX
encoded instructions which can still be used to implement 128-bit
shuffles.

llvm-svn: 218188
2014-09-20 18:36:39 +00:00
David Majnemer b8dbebb31c MC: Treat ReadOnlyWithRel and ReadOnlyWithRelLocal as ReadOnly for COFF
A problem with our old behavior becomes observable under x86-64 COFF
when we need a read-only GV which has an initializer which is referenced
using a relocation: we would mark the section as writable.  Marking the
section as writable interferes with section merging.

This fixes PR21009.

llvm-svn: 218179
2014-09-20 07:31:46 +00:00
Chandler Carruth 8c4cccd4aa [x86] Teach the v4f32 path of the new shuffle lowering to handle the
tricky case of single-element insertion into the zero lane of a zero
vector.

We can't just use the same pattern here as we do in every other vector
type because the general insertion logic can handle insertion into the
non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we
have INSERTPS that is a much better choice than the generic one for such
lowerings. But INSERTPS can do lots of other lowerings as well so
factoring its logic into the general insertion logic doesn't work very
well. We also can't just extract the core common part of the general
insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that
lower to MOVSS when they can) because VZEXT_MOVL is often *faster* than
a blend while INSERTPS is slower! So instead we do a restrictive
condition on attempting to use the generic insertion logic to narrow it
to those cases where VZEXT_MOVL won't need a shuffle afterward and thus
will do better than INSERTPS. Then we try blending. Then we go back to
INSERTPS.

This still doesn't generate perfect code for some silly reasons that can
be fixed by tweaking the td files for lowering VZEXT_MOVL to use
XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends
up in a register rather than a load from memory -- BLENDPSrr has twice
the reciprocal throughput of MOVSSrr. Don't you love this ISA?

llvm-svn: 218177
2014-09-20 04:15:22 +00:00
Chandler Carruth 00389f3ed9 [x86] Generalize the single-element insertion lowering to work with
floating point types and use it for both v2f64 and v2i64 single-element
insertion lowering.

This fixes the last non-AVX performance regression test case I've gotten
of for the new vector shuffle lowering. There is obvious analogous
lowering for v4f32 that I'll add in a follow-up patch (because with
INSERTPS, v4f32 requires special treatment). After that, its AVX stuff.

llvm-svn: 218175
2014-09-20 03:32:25 +00:00
David Majnemer f4dc456eef llvm-readobj: pretty-print special COFF section names
Print IMAGE_SYM_DEBUG and the like instead of (-2).

llvm-svn: 218172
2014-09-20 00:25:06 +00:00
Peter Collingbourne 975726345c Fix crash with an insertvalue that produces an empty object.
llvm-svn: 218171
2014-09-20 00:10:47 +00:00
Matt Arsenault de0253791c R600: Un-xfail a test which passes with pass disabled
llvm-svn: 218165
2014-09-19 23:02:20 +00:00
Matt Arsenault 5e5b242946 R600/SI: Un-xfail tests which work now
llvm-svn: 218164
2014-09-19 23:02:18 +00:00
Matt Arsenault a986554377 R600/SI: Un xfail a test that works now
llvm-svn: 218162
2014-09-19 22:42:40 +00:00
Juergen Ributzka 92e8978e40 [FastIsel][AArch64] Fix a think-o in address computation.
When looking through sign/zero-extensions the code would always assume there is
such an extension instruction and use the wrong operand for the address.

There was also a minor issue in the handling of 'AND' instructions. I
accidentially used a 'cast' instead of a 'dyn_cast'.

llvm-svn: 218161
2014-09-19 22:23:46 +00:00
Chandler Carruth 0fc0c22fa9 [x86] Fully generalize the zext lowering in the new vector shuffle
lowering to support both anyext and zext and to custom lower for many
different microarchitectures.

Using this allows us to get *exactly* the right code for zext and anyext
shuffles in all the vector sizes. For v16i8, the improvement is *huge*.
The new SSE2 test case added I refused to add before this because it was
sooooo muny instructions.

llvm-svn: 218143
2014-09-19 20:00:32 +00:00
Justin Bogner a829fde160 llvm-cov: Prevent a test from matching its own check lines
Since llvm-cov shows the source file in its output, be careful about
potentially matching the check lines themselves.

llvm-svn: 218138
2014-09-19 19:04:08 +00:00
David Blaikie db119544a2 Fix test case to be portable to different architectures.
llvm-svn: 218134
2014-09-19 18:31:25 +00:00
Matt Arsenault 4505f3a73d R600/SI: Fix test to prepare for scheduler
llvm-svn: 218131
2014-09-19 18:11:16 +00:00
David Blaikie 3a7ce252cc Omit DW_TAG_subprograms for subprograms without inlined subroutines when producing -gmlt data
To reduce the size of -gmlt data, skip the subprograms without any
inlined subroutines. Since we've now got the ability to make these
determinations in the backend (funnily enough - we added the flag so we
wouldn't produce ranges under -gmlt, but with this change we use the
flag, but go back to producing ranges under -gmlt).

Instead, just produce CU ranges to inform the consumer which parts of
the code are described by this CU's line table. Tools could inspect the
line table directly to compute the range, but the CU ranges only seem to
be about 0.5% of object/executable size, so I'm not too worried about
teaching llvm-symbolizer that trick just yet - it's certainly a possible
piece of future work.

Update an llvm-symbolizer test just to demonstrate that this schema is
acceptable there (if it wasn't, the compiler-rt tests would catch this,
but good to have an in-llvm-tree test for llvm-symbolizer's behavior
here)

Building the clang binary with -gmlt with this patch reduces the total
size of object files by 5.1% (5.56% without ranges) without compression
and the executable by 4.37% (4.75% without ranges).

llvm-svn: 218129
2014-09-19 17:03:16 +00:00