llvm-project

Commit Graph

Author	SHA1	Message	Date
Chandler Carruth	397d12c4b4	[x86] More tweaks to the v32i8 test cases. I made a mistake in the previous commit and produced the wrong pattern. Fix that. Also make one more shuffle pattern byte-based rather than word-based, and add two more blend patterns. llvm-svn: 218439	2014-09-25 02:44:39 +00:00
Chandler Carruth	a03011ffae	[x86] Re-work a bunch of the v32i8 test cases to actually involve byte shuffles rather than word shuffles. As you might guess, these were built starting from the word shuffle test cases and I failed to properly port a bunch of them and left them as widened word shuffle test cases. We still have a couple of tests that check our ability to widen shuffles, but now we will test the actual byte shuffle quite a bit better. llvm-svn: 218438	2014-09-25 02:20:02 +00:00
Reid Kleckner	81782f0cb8	MC: Use @IMGREL instead of @IMGREL32, which we can't parse Nico Rieck added support for this 32-bit COFF relocation some time ago for Win64 stuff. It appears that as an oversight, the assembly output used "foo"@IMGREL32 instead of "foo"@IMGREL, which is what we can parse. Sadly, there were actually tests that took in IMGREL and put out IMGREL32, and we didn't notice the inconsistency. Oh well. Now LLVM can assemble its own output with slightly more fidelity. llvm-svn: 218437	2014-09-25 02:09:18 +00:00
Chandler Carruth	a577bc26b6	[x86] Fix the v16i16 blend logic I added in the prior commit and add the missing test cases for it. Unsurprisingly, without test cases, there were bugs here. Surprisingly, this bug wasn't caught at compile time. Yep, there is an X86ISD::BLENDV. It isn't wired to anything. Oops. I'll fix that next. llvm-svn: 218434	2014-09-25 01:13:38 +00:00
Justin Bogner	b35a72ae9e	llvm-cov: Combine segments that cover the same location If we have multiple coverage counts for the same segment, we need to add them up rather than arbitrarily choosing one. This fixes that and adds a test with template instantiations to exercise it. llvm-svn: 218432	2014-09-25 00:34:18 +00:00
Akira Hatanaka	8cc48bd159	[X86,AVX] Add an isel pattern for X86VBroadcast. This fixes PR21050 and rdar://problem/18434607. llvm-svn: 218431	2014-09-25 00:26:15 +00:00
Chandler Carruth	98443d89b9	[x86] Implement v16i16 support with AVX2 in the new vector shuffle lowering. This also implements the fancy blend lowering for v16i16 using AVX2 and teaches the X86 backend to print shuffle masks for 256-bit PSHUFB and PBLENDW instructions. It also makes the mask decoding correct for PBLENDW instructions. The yaks, they are legion. Tests are updated accordingly. There are some missing tests for the VBLENDVB lowering, but I'll add those in a follow-up as this commit has accumulated enough cruft already. llvm-svn: 218430	2014-09-25 00:24:19 +00:00
Kevin Enderby	bf246f5a9d	Flush out enough of llvm-objdump’s SymbolizerSymbolLookUp() for Mach-O files to get the literal string “Hello world” printed as a comment on the instruction that loads the pointer to it. For now this is just for x86_64. So for object files with relocation entries it produces things like: leaq L_.str(%rip), %rax ## literal pool for: "Hello world\n" and similar for fully linked images like executables: leaq 0x4f(%rip), %rax ## literal pool for: "Hello world\n" Also to allow testing against darwin’s otool(1), I hooked up the existing -no-show-raw-insn option to the Mach-O parser code, added the new Mach-O only -full-leading-addr option to match otool(1)'s printing of addresses and also added the new -print-imm-hex option. llvm-svn: 218423	2014-09-24 23:08:22 +00:00
Kostya Serebryany	34ddf8725c	[asan] don't instrument module CTORs that may be run before asan.module_ctor. This fixes asan running together with -coverage llvm-svn: 218421	2014-09-24 22:41:55 +00:00
Renato Golin	9c4a6d87ec	Removing empty ARM tests from failed revert llvm-svn: 218419	2014-09-24 21:58:04 +00:00
Renato Golin	a86bbc37f2	Removing empty tests from failed revert llvm-svn: 218417	2014-09-24 21:45:26 +00:00
Renato Golin	4b5f91f513	Revert 218406 - Refactor the RelocVisitor::visit method llvm-svn: 218416	2014-09-24 21:30:43 +00:00
Renato Golin	ba89f068bf	Revert 218407 - Add support for ARM and AArch64 BE object files llvm-svn: 218415	2014-09-24 21:30:14 +00:00
Renato Golin	d35e6f6aee	Revert 218408 - Report endianness in output of {dwarf, obj}dump llvm-svn: 218414	2014-09-24 21:29:45 +00:00
Renato Golin	2328747ede	Revert 218411 - XFAIL reloc test on x86/hexagon llvm-svn: 218413	2014-09-24 21:28:53 +00:00
Renato Golin	7aa836043f	XFAIL reloc test on x86/hexagon llvm-svn: 218411	2014-09-24 21:00:30 +00:00
Renato Golin	6f92c6b982	Report endianness in output of {dwarf, obj}dump For biendian targets like ARM and AArch64, it is useful to have the output of the llvm-dwarfdump and llvm-objdump report the endianness used when the object files were generated. Patch by Charlie Turner. llvm-svn: 218408	2014-09-24 20:07:41 +00:00
Renato Golin	ed654f5852	Add support for ARM and AArch64 BE object files This change fixes the ARM and AArch64 relocation visitors in RelocVisitor. They were unconditionally assuming the object data are little-endian. Tests have been added to ensure that the llvm-dwarfdump utility does not crash when processing big-endian object files. Patch by Charlie Turner. llvm-svn: 218407	2014-09-24 20:07:30 +00:00
Renato Golin	2b25450061	Refactor the RelocVisitor::visit method This change replaces the brittle if/else chain of string comparisons with a switch statement on the detected target triple, removing the need for testing arbitrary architecture names returned from getFileFormatName, whose primary purpose seems to be for display (user-interface) purposes. The visitor now takes a reference to the object file, rather than its arbitrary file format name to figure out whether the file is a 32 or 64-bit object file and what the detected target triple is. A set of tests have been added to help show that the refactoring processes relocations for the same targets as the original code. Patch by Charlie Turner. llvm-svn: 218406	2014-09-24 20:07:22 +00:00
Scott Douglass	ae671341c4	pass environment when invoking llvm-config from lit.cfg Use the same environment when invoking llvm-config from lit.cfg as will be used when running tests, so that ASAN_OPTIONS, INCLUDE, etc. are present. llvm-svn: 218403	2014-09-24 18:37:48 +00:00
Kaelyn Takata	c4067328cf	Revert "Add support for ARM and AArch64 BE object files" This reverts commit r218389 as it depends on r218388. llvm-svn: 218398	2014-09-24 18:00:20 +00:00
Kaelyn Takata	e43d88e3f5	Revert "Report endianness in output of {dwarf, obj}dump" This reverts commit r218391 as it depends on r218388 and r218389 llvm-svn: 218397	2014-09-24 18:00:17 +00:00
Kaelyn Takata	f2fce14920	Revert "Refactor the RelocVisitor::visit method" This reverts commit faac033f7364bb4226e22c8079c221c96af10d02. The test depends on all targets to be enabled in llc in order to pass, and needs to be rewritten/refactored to not have that dependency. llvm-svn: 218393	2014-09-24 17:49:07 +00:00
Renato Golin	4edda28b8a	Report endianness in output of {dwarf, obj}dump For biendian targets like ARM and AArch64, it is useful to have the output of the llvm-dwarfdump and llvm-objdump report the endianness used when the object files were generated. Patch by Charlie Turner. llvm-svn: 218391	2014-09-24 17:01:33 +00:00
Renato Golin	0e92815e94	Add support for ARM and AArch64 BE object files This change fixes the ARM and AArch64 relocation visitors in RelocVisitor. They were unconditionally assuming the object data are little-endian. Tests have been added to ensure that the llvm-dwarfdump utility does not crash when processing big-endian object files. Patch by Charlie Turner. llvm-svn: 218389	2014-09-24 17:01:06 +00:00
Renato Golin	53f6034f8e	Refactor the RelocVisitor::visit method This change replaces the brittle if/else chain of string comparisons with a switch statement on the detected target triple, removing the need for testing arbitrary architecture names returned from getFileFormatName, whose primary purpose seems to be for display (user-interface) purposes. The visitor now takes a reference to the object file, rather than its arbitrary file format name to figure out whether the file is a 32 or 64-bit object file and what the detected target triple is. A set of tests have been added to help show that the refactoring processes relocations for the same targets as the original code. Patch by Charlie Turner. llvm-svn: 218388	2014-09-24 17:00:42 +00:00
David Peixotto	0d4d5e64ec	Fix assertion in LICM doFinalization() The doFinalization method checks that the LoopToAliasSetMap is empty. LICM populates that map as it runs through the loop nest, deleting the entries for child loops as it goes. However, if a child loop is deleted by another pass (e.g. unrolling) then the loop will never be deleted from the map because LICM walks the loop nest to find entries it can delete. The fix is to delete the loop from the map and free the alias set when the loop is deleted from the loop nest. Differential Revision: http://reviews.llvm.org/D5305 llvm-svn: 218387	2014-09-24 16:48:31 +00:00
Moritz Roth	f5d0c7c2c0	[Thumb] Make load/store optimizer less conservative. If it's safe to clobber the condition flags, we can do a few extra things: it's then possible to reset the base register writeback using a SUBS, so we can try to merge even if the base register isn't dead after the merged instruction. This is effectively a (heavily bug-fixed) rewrite of r208992. llvm-svn: 218386	2014-09-24 16:35:50 +00:00
Oliver Stannard	1ae8b476f4	[Thumb] 32-bit encodings of 'cps' are not valid for v7M v7M only allows the 16-bit encoding of the 'cps' (Change Processor State) instruction, and does not have the 32-bit encoding which is valid from v6T2 onwards. llvm-svn: 218382	2014-09-24 14:20:01 +00:00
Chandler Carruth	e7e9c04ddf	[x86] Teach the instruction lowering to add comments describing constant pool data being loaded into a vector register. The comments take the form of: # ymm0 = [a,b,c,d,...] # xmm1 = <x,y,z...> The []s are used for generic sequential data and the <>s are used for specifically ConstantVector loads. Undef elements are printed as the letter 'u', integers in decimal, and floating point values as floating point values. Suggestions on improving the formatting or other aspects of the display are very welcome. My primary use case for this is to be able to FileCheck test masks passed to vector shuffle instructions in-register. It isn't fantastic for that (no decoding special zeroing semantics or other tricks), but it at least puts the mask onto an instruction line that could reasonably be checked. I've updated many of the new vector shuffle lowering tests to leverage this in their test cases so that we're actually checking the shuffle masks remain as expected. Before implementing this, I tried a bunch of different approaches. I looked into teaching the MCInstLower code to scan up the basic block and find a definition of a register used in a shuffle instruction and then decode that, but this seems incredibly brittle and complex. I talked to Hal a lot about the "right" way to do this: attach the raw shuffle mask to the instruction itself in some form of unencoded operands, and then use that to emit the comments. I still think that's the optimal solution here, but it proved to be beyond what I'm up for here. In particular, it seems likely best done by completing the plumbing of metadata through these layers and attaching the shuffle mask in metadata which could have fully automatic dropping when encoding an actual instruction. llvm-svn: 218377	2014-09-24 09:39:41 +00:00
Matt Arsenault	3e0effa223	R600/SI: Fix weird CHECK-DAG usage This prevents these from failing in a future commit. llvm-svn: 218356	2014-09-24 02:14:26 +00:00
Tom Stellard	744b99b476	R600/SI: Enable selecting SALU inside branches We can do this now that the FixSGPRLiveRanges pass is working. llvm-svn: 218353	2014-09-24 01:33:28 +00:00
Chandler Carruth	9bd10e7492	[x86] Teach the new vector shuffle lowering to lower v8i32 shuffles with the native AVX2 instructions. Note that the test case is really frustrating here because VPERMD requires the mask to be in the register input and we don't produce a comment looking through that to the constant pool. I'm going to attempt to improve this in a subsequent commit, but not sure if I will succeed. llvm-svn: 218347	2014-09-24 01:24:44 +00:00
Chandler Carruth	fd11815a7d	[x86] Fix a really terrible bug in the repeated 128-bit-lane shuffle detection. It was incorrectly handling undef lanes by actually treating an undef lane in the first 128-bit lane as a numeric shuffle value. Fortunately, this almost always DTRT and disabled detecting repeated patterns. But not always. =/ This patch introduces a much more principled approach and fixes the miscompiles I spotted by inspection previously. llvm-svn: 218346	2014-09-24 01:03:57 +00:00
Robin Morisset	dc1b248ccf	Fix swift-atomics testcase This testcase was not testing what it meant: because there were only two checks for dmb {{ish}} in the second function, it could have missed a bug where one of the three required dmb {{ish}} became dmb {{ishst}}. As I was fixing it, I also added CHECK-LABELs to make it a bit less brittle. llvm-svn: 218341	2014-09-23 23:18:01 +00:00
Chandler Carruth	df2e421845	[x86] Teach the new vector shuffle lowering to lower v4i64 vector shuffles using the AVX2 instructions. This is the first step of cutting in real AVX2 support. Note that I have spotted at least one bug in the test cases already, but I suspect it was already present and just is getting surfaced. Will investigate next. llvm-svn: 218338	2014-09-23 22:39:02 +00:00
Reid Kleckner	78927e884b	GlobalOpt: Preserve comdats of unoptimized initializers Rather than slurping in and splatting out the whole ctor list, preserve the existing array entries without trying to understand them. Only remove the entries that we know we can optimize away. This way we don't need to wire through priority and comdats or anything else we might add. Fixes a linker issue where the .init_array or .ctors entry would point to discarded initialization code if the comdat group from the TU with the faulty global_ctors entry was dropped. llvm-svn: 218337	2014-09-23 22:33:01 +00:00
Jim Grosbach	57fd2623c3	AArch64: allow constant expressions for shifted reg literals e.g., add w1, w2, w3, lsl #(2 - 1) This sort of thing comes up in pre-processed assembly playing macro games. Still validate that it's an assembly time constant. The early exit error check was just a bit overzealous and disallowed a left paren. rdar://18430542 llvm-svn: 218336	2014-09-23 22:16:02 +00:00
Chandler Carruth	9a94bd6fa4	[x86] Teach the rest of the 'target shuffle' machinery about blends and add VPBLENDD to the InstPrinter's comment generation so we get nice comments everywhere. Now that we have the nice comments, I can see the bug introduced by a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay tests that are easy to inspect. llvm-svn: 218335	2014-09-23 22:14:14 +00:00
Robin Morisset	6dbbbc28b0	[X86] Make wide loads be managed by AtomicExpand Summary: AtomicExpand already had logic for expanding wide loads and stores on LL/SC architectures, and for expanding wide stores on CmpXchg architectures, but not for wide loads on CmpXchg architectures. This patch fills this hole, and makes use of this new feature in the X86 backend. Only one functional change: we now lose the SynchScope attribute. It is regrettable, but I have another patch that I will submit soon that will solve this for all of AtomicExpand (it seemed better to split it apart as it is a different concern). Test Plan: make check-all (lots of tests for this functionality already exist) Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5404 llvm-svn: 218332	2014-09-23 20:59:25 +00:00
Robin Morisset	2212996936	[Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate Summary: This patch makes use of AtomicExpandPass in Power for inserting fences around atomic as part of an effort to remove fence insertion from SelectionDAGBuilder. As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic lwsync) instead of sync 0 (heavyweight sync) in many cases. I also added a test, as there was no test for the barriers emitted by the Power backend for atomic loads and stores. Test Plan: new test + make check-all Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5180 llvm-svn: 218331	2014-09-23 20:46:49 +00:00
Chandler Carruth	adcfec995c	[x86] Teach the new shuffle lowering's blend functionality to use AVX2's VPBLENDD where appropriate even on 128-bit vectors. According to Agner's tables, this instruction is significantly higher throughput (can execute on any port) on Haswell chips so we should aggressively try to form it when available. Sadly, this loses our delightful shuffle comments. I'll add those back for VPBLENDD next. llvm-svn: 218322	2014-09-23 18:16:12 +00:00
Oliver Stannard	c546625c4f	Fix segfault in AArch64 backend with -g and -mbig-endian Fix a null pointer dereference when trying to swap the endianness of fixups in the .eh_frame section in the AArch64 backend. llvm-svn: 218311	2014-09-23 15:38:11 +00:00
Timur Iskhodzhanov	f6b889126c	Fix a small typo in the test comment llvm-svn: 218306	2014-09-23 14:07:12 +00:00
Timur Iskhodzhanov	d171153f81	Rebuild the inputs for the codeview-linetables.test with VS2013 Also provide reproducible instructions llvm-svn: 218303	2014-09-23 13:49:51 +00:00
Chandler Carruth	40592d2dec	[x86] Teach the vector comment parsing and printing to correctly handle undef in the shuffle mask. This shows up when we're printing comments during lowering and we still have an IR-level constant hanging around that models undef. A nice consequence of this is much prettier test cases where the undef lanes actually show up as undef rather than as a particular set of values. This also allows us to print shuffle comments in cases that use undef such as the recently added variable VPERMILPS lowering. Now those test cases have nice shuffle comments attached with their details. The shuffle lowering for PSHUFB has been augmented to use undef, and the shuffle combining has been augmented to comprehend it. llvm-svn: 218301	2014-09-23 11:15:19 +00:00
Chandler Carruth	6d5916a2d7	[x86] Teach the AVX1 path of the new vector shuffle lowering one more trick that I missed. VPERMILPS has a non-immediate memory operand mode that allows it to do asymmetric shuffles in the two 128-bit lanes. Use this rather than two shuffles and a blend. However, it turns out the variable shuffle path to VPERMILPS (and VPERMILPD, although that one offers no functional difference from the immediate operand other than variability) wasn't even plumbed through codegen. Do such plumbing so that we can reasonably emit a variable-masked VPERMILP instruction. Also plumb basic comment parsing and printing through so that the tests are reasonable. There are still a few tests which don't show the shuffle pattern. These are tests with undef lanes. I'll teach the shuffle decoding and printing to handle undef mask entries in a follow-up. I've looked at the masks and they seem reasonable. llvm-svn: 218300	2014-09-23 10:08:29 +00:00
Michael Kuperstein	946b3b2e16	Ensure bitcode encoding stays stable. This includes constants, attributes, and some additional instructions not covered by previous tests. Work was done by lama.saba@intel.com. llvm-svn: 218297	2014-09-23 08:48:01 +00:00
Sanjay Patel	4bc685c206	tighten up checks We manage to generate all of the matching instructions (and a lot more) via the reciprocal optimization function - even if we completely remove the square root optimization. With CHECK-NEXT, we ensure that we're executing the expected square root optimization paths and not generating extra insts. llvm-svn: 218284	2014-09-22 22:46:44 +00:00
Sanjay Patel	5cf7561d21	remove unnecessary labels; NFC llvm-svn: 218278	2014-09-22 21:52:53 +00:00
Juergen Ributzka	27e959d7b2	[FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left for booleans (i1). Shift-left immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact. This should fix a bug found by Chad. llvm-svn: 218275	2014-09-22 21:08:53 +00:00
David Majnemer	597be2ded6	MC: ReadOnlyWithRel section kinds should map to rdata in COFF Don't consider ReadOnlyWithRel as a writable section in COFF, they really belong in .rdata. llvm-svn: 218268	2014-09-22 20:39:23 +00:00
Chandler Carruth	44deb8015c	[x86] Introduce tests covering the gamut of 256-bit vector shuffling. These are just test cases, no actual code yet. This establishes the baseline fallback strategy we're starting from on AVX2 and the expected lowering we use on AVX1. Also, these test cases are very much generated. I've manually crafted the specific pattern set that I'm hoping will be useful at exercising the lowering code, but I've not (and could not) manually verify all of these. I've spot checked and they seem legit to me. As with the rest of vector shuffling, at a certain point the only really useful way to check the correctness of this stuff is through fuzz testing. llvm-svn: 218267	2014-09-22 20:25:08 +00:00
Sanjay Patel	7939d7229d	Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2). We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors. This patch should preserve all existing behavior with regular optimization levels, but also use splats whenever possible when optimizing for size on any CPU with AVX or AVX2. The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save at least 8 bytes (up to 31 bytes) of constant pool data. Differential Revision: http://reviews.llvm.org/D5347 llvm-svn: 218263	2014-09-22 18:54:01 +00:00
Akira Hatanaka	f2a721a875	Fix test case committed in r218242 to appease buildbot. llvm-svn: 218261	2014-09-22 18:07:20 +00:00
Tom Stellard	9f73851e39	Revert "R600/SI: Add support for global atomic add" This reverts commit r218254. The global_atomics.ll test fails with asserts disabled. For some reason, the compiler fails to produce the atomic no return variants. llvm-svn: 218257	2014-09-22 16:44:04 +00:00
Frederic Riss	220fa48491	Fix a test introduced in r218246 to work also on Windows. llvm-svn: 218255	2014-09-22 16:17:32 +00:00
Tom Stellard	2355a77e74	R600/SI: Add support for global atomic add llvm-svn: 218254	2014-09-22 15:35:35 +00:00
Pavel Chupin	be9f12102f	[x32] Fix segmented stacks support Summary: Update segmented-stacks*.ll tests with x32 target case and make corresponding changes to make them pass. Test Plan: tests updated with x32 target Reviewers: nadav, rafael, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5245 llvm-svn: 218247	2014-09-22 13:11:35 +00:00
Frederic Riss	955724e3f5	[dwarfdump] Dump full filenames as DW_AT_(decl\|call)_file attribute values Reviewers: dblaikie samsonov Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5192 llvm-svn: 218246	2014-09-22 12:36:04 +00:00
Frederic Riss	58ed53cfcd	Allow DWARFDebugInfoEntryMinimal::getSubroutineName to resolve cross-unit references. Summary: getSubroutineName is currently only used by llvm-symbolizer, thus add a binary test containing a cross-cu inlining example. Reviewers: samsonov, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5394 llvm-svn: 218245	2014-09-22 12:35:53 +00:00
Robert Lougher	6da8a243f9	Fix assert when decoding PSHUFB mask The PSHUFB mask decode routine used to assert if the mask index was out of range (<0 or greater than the size of the vector). The problem is, we can legitimately have a PSHUFB with a large index using intrinsics. The instruction only uses the least significant 4 bits. This change removes the assert and masks the index to match the instruction behaviour. llvm-svn: 218242	2014-09-22 11:54:38 +00:00
Oliver Stannard	14f97d0017	Downgrade DWARF2 section limit error to a warning We currently emit an error when trying to assemble a file with more than one section using DWARF2 debug info. This should be a warning instead, as the resulting file will still be usable, but with a degraded debug illusion. llvm-svn: 218241	2014-09-22 10:45:16 +00:00
Chandler Carruth	7158c95d65	[x86] Move the AVX v4i64 test cases down to group them together. Increasingly I don't want to mix the integer and floating point tests, especially with AVX where they are handled quite differently. llvm-svn: 218233	2014-09-22 03:05:23 +00:00
Chandler Carruth	12bbf7d922	[x86] Back out a bad choice about lowering v4i64 and pave the way for a more sane approach to AVX2 support. Fundamentally, there is no useful way to lower integer vectors in AVX. None. We always end up with a VINSERTF128 in the end, so we might as well eagerly switch to the floating point domain and do everything there. This cleans up lots of weird and unlikely to be correct differences between integer and floating point shuffles when we only have AVX1. The other nice consequence is that by doing things this way we will make it much easier to write the integer lowering routines as we won't need to duplicate the logic to check for AVX vs. AVX2 in each one -- if we actually try to lower a 256-bit vector as an integer vector, we have AVX2 and can rely on it. I think this will make the code much simpler and more comprehensible. Currently, I've disabled all support for AVX2 so that we always fall back to AVX. This keeps everything working rather than asserting. That will go away with the subsequent series of patches that provide a baseline AVX2 implementation. Please note, I'm going to implement AVX2 without access to hardware. That means I cannot correctness test this path. I will be relying on those with access to AVX2 hardware to do correctness testing and fix bugs here, but as a courtesy I'm trying to sketch out the framework for the new-style vector shuffle lowering in the context of the AVX2 ISA. llvm-svn: 218228	2014-09-22 00:32:15 +00:00
Chandler Carruth	5d45962b2c	[x86] Teach the new vector shuffle lowering how to cleverly lower single input v8f32 shuffles which are not 128-bit lane crossing but have different shuffle patterns in the low and high lanes. This removes most of the extract/insert traffic that was unnecessary and is particularly good at lowering cases where only one of the two lanes is shuffled at all. I've also added a collection of test cases with undef lanes because this lowering is somewhat more sensitive to undef lanes than others. llvm-svn: 218226	2014-09-21 23:46:13 +00:00
Chandler Carruth	b195e860f9	[x86] Add a bunch of test cases where we have different shuffle patterns in the high and low 128-bit lanes of a v8f32 vector. No functionality change yet, but wanted to set up the baseline for my next patch which will make these quite a bit better. =] llvm-svn: 218224	2014-09-21 23:32:42 +00:00
Chandler Carruth	b3125c7522	[x86] Teach the new vector shuffle lowering to re-use the SHUFPS lowering when it can use a symmetric SHUFPS across both 128-bit lanes. This required making the SHUFPS lowering tolerant of other vector types, and adjusting our canonicalization to canonicalize harder. This is the last of the clever uses of symmetry I've thought of for v8f32. The rest of the tricks I'm aware of here are to work around asymmetry in the mask. llvm-svn: 218216	2014-09-21 13:35:14 +00:00
Chandler Carruth	33eda72802	[x86] Teach the new vector shuffle lowering the basics about insertion of a single element into a zero vector for v4f64 and v4i64 in AVX. Ironically, there is less to see here because xor+blend is so crazy fast that we can't really beat that to zero the high 128-bit lane. llvm-svn: 218214	2014-09-21 12:49:46 +00:00
Chandler Carruth	43f5974ea0	[x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and UNPCKHPS with AVX vectors by recognizing those patterns when they are repeated for both 128-bit lanes. With this, we now generate the exact same (really nice) code for Quentin's avx_test_case.ll which was the most significant regression reported for the new shuffle lowering. In fact, I'm out of specific test cases for AVX lowering, the rest were AVX2 I think. However, there are a bunch of pretty obvious remaining things to improve with AVX... llvm-svn: 218213	2014-09-21 12:20:44 +00:00
Chandler Carruth	78f4798913	[x86] Add test cases for UNPCK instructions with v8f32 AVX vectors in preparation for enhancing their support in the new vector shuffle lowering. llvm-svn: 218212	2014-09-21 12:13:11 +00:00
Chandler Carruth	88404c4f9b	[x86] Begin teaching the new vector shuffle lowering among the most important bits of cleverness: to detect and lower repeated shuffle patterns between the two 128-bit lanes with a single instruction. This patch just teaches it how to lower single-input shuffles that fit this model using VPERMILPS. =] There is more that needs to happen here. llvm-svn: 218211	2014-09-21 12:01:19 +00:00
Chandler Carruth	83252ac8f4	[x86] Regenerate this test case now that I've improved my script for generating the test cases to format things more consistently and actually catch all the operand sequences that should be elided in favor of the asm comments. No actual changes here. llvm-svn: 218210	2014-09-21 11:51:33 +00:00
Chandler Carruth	e81bfbada9	[x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the reciprocal throughput on Ivy Bridge CPUs much like it does everywhere for 128-bits. There isn't a downside, so just eagerly use this instruction when it suffices. llvm-svn: 218208	2014-09-21 11:17:55 +00:00
Chandler Carruth	6aea21df8e	[x86] Add some more comprehensive tests for v4f64 blending. llvm-svn: 218207	2014-09-21 11:12:19 +00:00
Chandler Carruth	908afb56c0	[x86] Re-generate a bunch of the v4f64 test cases with my new script. This expands the integer cases to cover the fact that AVX2 moves their lane-crossing shuffles into the integer domain. It also adds proper support for AVX2 run lines and the "ALL" group when it doesn't matter. llvm-svn: 218206	2014-09-21 11:07:41 +00:00
Chandler Carruth	293327ddcd	[x86] Teach the new vector shuffle lowering the first step toward more actual support for complex AVX shuffling tricks. We can do independent blends of the low and high 128-bit lanes of an avx vector, so shuffle the inputs into place and then do the blend at 256 bits. This will in many cases remove one blend instruction. The next step is to permute the low and high halves in-place rather than extracting them and re-inserting them. llvm-svn: 218202	2014-09-21 09:35:22 +00:00
David Majnemer	48227a3759	MC: Support aligned COMMON symbols for COFF link.exe: Fuzz testing has shown that COMMON symbols with size > 32 will always have an alignment of at least 32 and all symbols with size < 32 will have an alignment of at least the largest power of 2 less than the size of the symbol. binutils: The BFD linker essentially works like the link.exe behavior but with alignment 4 instead of 32. The BFD linker also supports an extension to COFF which adds an -aligncomm argument to the .drectve section which permits specifying a precise alignment for a variable but MC currently doesn't support editing .drectve in this way. With all of this in mind, we decide to play a little trick: we can ensure that the alignment will be respected by bumping the size of the global to its alignment. llvm-svn: 218201	2014-09-21 09:18:07 +00:00
Chandler Carruth	8ff73c0170	[x86] Add some more test cases covering specific blend patterns. llvm-svn: 218200	2014-09-21 09:01:26 +00:00
Chandler Carruth	7a6108d652	[x86] Add the beginnings of some tests for our v8f32 shuffle lowering under AVX. This really just documents the current state of the world. I'm going to try to flesh it out to cover any test cases I plan to improve prior to improving them so that the delta made by changes is actually visible to code reviewers. This is made easier by the fact that I now have a script to automate the process of producing test cases including the check lines. =] llvm-svn: 218199	2014-09-21 08:49:27 +00:00
Chandler Carruth	a454812ac8	[x86] Teach the new vector shuffle lowering to use VPERMILPD for single-input shuffles with doubles. This allows them to fold memory operands into the shuffle, etc. This is just the analog to the v4f32 case in my prior commit. llvm-svn: 218193	2014-09-20 22:09:27 +00:00
Chandler Carruth	aa5b798ae7	[x86] Add an AVX run to the 128-bit v2 tests, teach them to have a generic SSE and AVX mode in addition to a specific AVX1 test path, and flesh out the AVX tests. llvm-svn: 218192	2014-09-20 21:26:41 +00:00
David Majnemer	fb83977538	Update tests which broke from r218189 llvm-svn: 218191	2014-09-20 21:18:43 +00:00
Chandler Carruth	6f80abac4e	[x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS instruction for single-vector floating point shuffles. This in turn allows the shuffles to fold a load into the instruction which is one of the common regressions hit with the new shuffle lowering. llvm-svn: 218190	2014-09-20 20:52:07 +00:00
David Majnemer	7d0dc3ef18	MC: Fix MCSectionCOFF::PrintSwitchToSection We had a few bugs: - We were considering the GVKind instead of just looking at the section characteristics - We would never print out 'y' when a section was meant to be unreadable - We would never print out 's' when a section was meant to be shared - We translated IMAGE_SCN_MEM_DISCARDABLE to 'n' when it should've meant IMAGE_SCN_LNK_REMOVE llvm-svn: 218189	2014-09-20 20:40:50 +00:00
Chandler Carruth	78a761ce8c	[x86] Start moving to a fancier check syntax to reduce the need for duplication of check lines. The idea is to have broad sets of compilation modes that will frequently diverge without having to always and immediately explode to the precise ISA feature set. While this already helps due to VEX encoded differences, it will help much more as I teach the new shuffle lowering about more of the new VEX encoded instructions which can still be used to implement 128-bit shuffles. llvm-svn: 218188	2014-09-20 18:36:39 +00:00
David Majnemer	b8dbebb31c	MC: Treat ReadOnlyWithRel and ReadOnlyWithRelLocal as ReadOnly for COFF A problem with our old behavior becomes observable under x86-64 COFF when we need a read-only GV which has an initializer which is referenced using a relocation: we would mark the section as writable. Marking the section as writable interferes with section merging. This fixes PR21009. llvm-svn: 218179	2014-09-20 07:31:46 +00:00
Chandler Carruth	8c4cccd4aa	[x86] Teach the v4f32 path of the new shuffle lowering to handle the tricky case of single-element insertion into the zero lane of a zero vector. We can't just use the same pattern here as we do in every other vector type because the general insertion logic can handle insertion into the non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we have INSERTPS that is a much better choice than the generic one for such lowerings. But INSERTPS can do lots of other lowerings as well so factoring its logic into the general insertion logic doesn't work very well. We also can't just extract the core common part of the general insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that lower to MOVSS when they can) because VZEXT_MOVL is often faster than a blend while INSERTPS is slower! So instead we do a restrictive condition on attempting to use the generic insertion logic to narrow it to those cases where VZEXT_MOVL won't need a shuffle afterward and thus will do better than INSERTPS. Then we try blending. Then we go back to INSERTPS. This still doesn't generate perfect code for some silly reasons that can be fixed by tweaking the td files for lowering VZEXT_MOVL to use XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends up in a register rather than a load from memory -- BLENDPSrr has twice the reciprocal throughput of MOVSSrr. Don't you love this ISA? llvm-svn: 218177	2014-09-20 04:15:22 +00:00
Chandler Carruth	00389f3ed9	[x86] Generalize the single-element insertion lowering to work with floating point types and use it for both v2f64 and v2i64 single-element insertion lowering. This fixes the last non-AVX performance regression test case I've gotten for the new vector shuffle lowering. There is obvious analogous lowering for v4f32 that I'll add in a follow-up patch (because with INSERTPS, v4f32 requires special treatment). After that, it's AVX stuff. llvm-svn: 218175	2014-09-20 03:32:25 +00:00
David Majnemer	f4dc456eef	llvm-readobj: pretty-print special COFF section names Print IMAGE_SYM_DEBUG and the like instead of (-2). llvm-svn: 218172	2014-09-20 00:25:06 +00:00
Peter Collingbourne	975726345c	Fix crash with an insertvalue that produces an empty object. llvm-svn: 218171	2014-09-20 00:10:47 +00:00
Matt Arsenault	de0253791c	R600: Un-xfail a test which passes with pass disabled llvm-svn: 218165	2014-09-19 23:02:20 +00:00
Matt Arsenault	5e5b242946	R600/SI: Un-xfail tests which work now llvm-svn: 218164	2014-09-19 23:02:18 +00:00
Matt Arsenault	a986554377	R600/SI: Un xfail a test that works now llvm-svn: 218162	2014-09-19 22:42:40 +00:00
Juergen Ributzka	92e8978e40	[FastIsel][AArch64] Fix a think-o in address computation. When looking through sign/zero-extensions the code would always assume there is such an extension instruction and use the wrong operand for the address. There was also a minor issue in the handling of 'AND' instructions. I accidentally used a 'cast' instead of a 'dyn_cast'. llvm-svn: 218161	2014-09-19 22:23:46 +00:00
Chandler Carruth	0fc0c22fa9	[x86] Fully generalize the zext lowering in the new vector shuffle lowering to support both anyext and zext and to custom lower for many different microarchitectures. Using this allows us to get exactly the right code for zext and anyext shuffles in all the vector sizes. For v16i8, the improvement is huge. The new SSE2 test case added I refused to add before this because it was sooooo many instructions. llvm-svn: 218143	2014-09-19 20:00:32 +00:00
Justin Bogner	a829fde160	llvm-cov: Prevent a test from matching its own check lines Since llvm-cov shows the source file in its output, be careful about potentially matching the check lines themselves. llvm-svn: 218138	2014-09-19 19:04:08 +00:00
David Blaikie	db119544a2	Fix test case to be portable to different architectures. llvm-svn: 218134	2014-09-19 18:31:25 +00:00
Matt Arsenault	4505f3a73d	R600/SI: Fix test to prepare for scheduler llvm-svn: 218131	2014-09-19 18:11:16 +00:00
David Blaikie	3a7ce252cc	Omit DW_TAG_subprograms for subprograms without inlined subroutines when producing -gmlt data To reduce the size of -gmlt data, skip the subprograms without any inlined subroutines. Since we've now got the ability to make these determinations in the backend (funnily enough - we added the flag so we wouldn't produce ranges under -gmlt, but with this change we use the flag, but go back to producing ranges under -gmlt). Instead, just produce CU ranges to inform the consumer which parts of the code are described by this CU's line table. Tools could inspect the line table directly to compute the range, but the CU ranges only seem to be about 0.5% of object/executable size, so I'm not too worried about teaching llvm-symbolizer that trick just yet - it's certainly a possible piece of future work. Update an llvm-symbolizer test just to demonstrate that this schema is acceptable there (if it wasn't, the compiler-rt tests would catch this, but good to have an in-llvm-tree test for llvm-symbolizer's behavior here) Building the clang binary with -gmlt with this patch reduces the total size of object files by 5.1% (5.56% without ranges) without compression and the executable by 4.37% (4.75% without ranges). llvm-svn: 218129	2014-09-19 17:03:16 +00:00

26294 Commits