llvm-project

Commit Graph

Author	SHA1	Message	Date
Juergen Ributzka	27e959d7b2	[FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left for booleans (i1). Shift-left immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact. This should fix a bug found by Chad. llvm-svn: 218275	2014-09-22 21:08:53 +00:00
David Majnemer	597be2ded6	MC: ReadOnlyWithRel section kinds should map to rdata in COFF Don't consider ReadOnlyWithRel as a writable section in COFF, they really belong in .rdata. llvm-svn: 218268	2014-09-22 20:39:23 +00:00
Chandler Carruth	44deb8015c	[x86] Introduce tests covering the gamut of 256-bit vector shuffling. These are just test cases, no actual code yet. This establishes the baseline fallback strategy we're starting from on AVX2 and the expected lowering we use on AVX1. Also, these test cases are very much generated. I've manually crafted the specific pattern set that I'm hoping will be useful at exercising the lowering code, but I've not (and could not) manually verify all of these. I've spot checked and they seem legit to me. As with the rest of vector shuffling, at a certain point the only really useful way to check the correctness of this stuff is through fuzz testing. llvm-svn: 218267	2014-09-22 20:25:08 +00:00
Sanjay Patel	7939d7229d	Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2). We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors. This patch should preserve all existing behavior with regular optimization levels, but also use splats whenever possible when optimizing for size on any CPU with AVX or AVX2. The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save at least 8 bytes (up to 31 bytes) of constant pool data. Differential Revision: http://reviews.llvm.org/D5347 llvm-svn: 218263	2014-09-22 18:54:01 +00:00
Akira Hatanaka	f2a721a875	Fix test case committed in r218242 to appease buildbot. llvm-svn: 218261	2014-09-22 18:07:20 +00:00
Tom Stellard	9f73851e39	Revert "R600/SI: Add support for global atomic add" This reverts commit r218254. The global_atomics.ll test fails with asserts disabled. For some reason, the compiler fails to produce the atomic no return variants. llvm-svn: 218257	2014-09-22 16:44:04 +00:00
Frederic Riss	220fa48491	Fix a test introduced in r218246 to work also on Windows. llvm-svn: 218255	2014-09-22 16:17:32 +00:00
Tom Stellard	2355a77e74	R600/SI: Add support for global atomic add llvm-svn: 218254	2014-09-22 15:35:35 +00:00
Pavel Chupin	be9f12102f	[x32] Fix segmented stacks support Summary: Update segmented-stacks*.ll tests with x32 target case and make corresponding changes to make them pass. Test Plan: tests updated with x32 target Reviewers: nadav, rafael, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5245 llvm-svn: 218247	2014-09-22 13:11:35 +00:00
Frederic Riss	955724e3f5	[dwarfdump] Dump full filenames as DW_AT_(decl|call)_file attribute values Reviewers: dblaikie samsonov Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5192 llvm-svn: 218246	2014-09-22 12:36:04 +00:00
Frederic Riss	58ed53cfcd	Allow DWARFDebugInfoEntryMinimal::getSubroutineName to resolve cross-unit references. Summary: getSubroutineName is currently only used by llvm-symbolizer, thus add a binary test containing a cross-cu inlining example. Reviewers: samsonov, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5394 llvm-svn: 218245	2014-09-22 12:35:53 +00:00
Robert Lougher	6da8a243f9	Fix assert when decoding PSHUFB mask The PSHUFB mask decode routine used to assert if the mask index was out of range (<0 or greater than the size of the vector). The problem is, we can legitimately have a PSHUFB with a large index using intrinsics. The instruction only uses the least significant 4 bits. This change removes the assert and masks the index to match the instruction behaviour. llvm-svn: 218242	2014-09-22 11:54:38 +00:00
Oliver Stannard	14f97d0017	Downgrade DWARF2 section limit error to a warning We currently emit an error when trying to assemble a file with more than one section using DWARF2 debug info. This should be a warning instead, as the resulting file will still be usable, but with a degraded debug illusion. llvm-svn: 218241	2014-09-22 10:45:16 +00:00
Chandler Carruth	7158c95d65	[x86] Move the AVX v4i64 test cases down to group them together. Increasingly I don't want to mix the integer and floating point tests, especially with AVX where they are handled quite differently. llvm-svn: 218233	2014-09-22 03:05:23 +00:00
Chandler Carruth	12bbf7d922	[x86] Back out a bad choice about lowering v4i64 and pave the way for a more sane approach to AVX2 support. Fundamentally, there is no useful way to lower integer vectors in AVX. None. We always end up with a VINSERTF128 in the end, so we might as well eagerly switch to the floating point domain and do everything there. This cleans up lots of weird and unlikely to be correct differences between integer and floating point shuffles when we only have AVX1. The other nice consequence is that by doing things this way we will make it much easier to write the integer lowering routines as we won't need to duplicate the logic to check for AVX vs. AVX2 in each one -- if we actually try to lower a 256-bit vector as an integer vector, we have AVX2 and can rely on it. I think this will make the code much simpler and more comprehensible. Currently, I've disabled all support for AVX2 so that we always fall back to AVX. This keeps everything working rather than asserting. That will go away with the subsequent series of patches that provide a baseline AVX2 implementation. Please note, I'm going to implement AVX2 without access to hardware. That means I cannot correctness test this path. I will be relying on those with access to AVX2 hardware to do correctness testing and fix bugs here, but as a courtesy I'm trying to sketch out the framework for the new-style vector shuffle lowering in the context of the AVX2 ISA. llvm-svn: 218228	2014-09-22 00:32:15 +00:00
Chandler Carruth	5d45962b2c	[x86] Teach the new vector shuffle lowering how to cleverly lower single input v8f32 shuffles which are not 128-bit lane crossing but have different shuffle patterns in the low and high lanes. This removes most of the extract/insert traffic that was unnecessary and is particularly good at lowering cases where only one of the two lanes is shuffled at all. I've also added a collection of test cases with undef lanes because this lowering is somewhat more sensitive to undef lanes than others. llvm-svn: 218226	2014-09-21 23:46:13 +00:00
Chandler Carruth	b195e860f9	[x86] Add a bunch of test cases where we have different shuffle patterns in the high and low 128-bit lanes of a v8f32 vector. No functionality change yet, but wanted to set up the baseline for my next patch which will make these quite a bit better. =] llvm-svn: 218224	2014-09-21 23:32:42 +00:00
Chandler Carruth	b3125c7522	[x86] Teach the new vector shuffle lowering to re-use the SHUFPS lowering when it can use a symmetric SHUFPS across both 128-bit lanes. This required making the SHUFPS lowering tolerant of other vector types, and adjusting our canonicalization to canonicalize harder. This is the last of the clever uses of symmetry I've thought of for v8f32. The rest of the tricks I'm aware of here are to work around asymmetry in the mask. llvm-svn: 218216	2014-09-21 13:35:14 +00:00
Chandler Carruth	33eda72802	[x86] Teach the new vector shuffle lowering the basics about insertion of a single element into a zero vector for v4f64 and v4i64 in AVX. Ironically, there is less to see here because xor+blend is so crazy fast that we can't really beat that to zero the high 128-bit lane. llvm-svn: 218214	2014-09-21 12:49:46 +00:00
Chandler Carruth	43f5974ea0	[x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and UNPCKHPS with AVX vectors by recognizing those patterns when they are repeated for both 128-bit lanes. With this, we now generate the exact same (really nice) code for Quentin's avx_test_case.ll which was the most significant regression reported for the new shuffle lowering. In fact, I'm out of specific test cases for AVX lowering, the rest were AVX2 I think. However, there are a bunch of pretty obvious remaining things to improve with AVX... llvm-svn: 218213	2014-09-21 12:20:44 +00:00
Chandler Carruth	78f4798913	[x86] Add test cases for UNPCK instructions with v8f32 AVX vectors in preparation for enhancing their support in the new vector shuffle lowering. llvm-svn: 218212	2014-09-21 12:13:11 +00:00
Chandler Carruth	88404c4f9b	[x86] Begin teaching the new vector shuffle lowering among the most important bits of cleverness: to detect and lower repeated shuffle patterns between the two 128-bit lanes with a single instruction. This patch just teaches it how to lower single-input shuffles that fit this model using VPERMILPS. =] There is more that needs to happen here. llvm-svn: 218211	2014-09-21 12:01:19 +00:00
Chandler Carruth	83252ac8f4	[x86] Regenerate this test case now that I've improved my script for generating the test cases to format things more consistently and actually catch all the operand sequences that should be elided in favor of the asm comments. No actual changes here. llvm-svn: 218210	2014-09-21 11:51:33 +00:00
Chandler Carruth	e81bfbada9	[x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the reciprocal throughput on Ivy Bridge CPUs much like it does everywhere for 128-bits. There isn't a downside, so just eagerly use this instruction when it suffices. llvm-svn: 218208	2014-09-21 11:17:55 +00:00
Chandler Carruth	6aea21df8e	[x86] Add some more comprehensive tests for v4f64 blending. llvm-svn: 218207	2014-09-21 11:12:19 +00:00
Chandler Carruth	908afb56c0	[x86] Re-generate a bunch of the v4f64 test cases with my new script. This expands the integer cases to cover the fact that AVX2 moves their lane-crossing shuffles into the integer domain. It also adds proper support for AVX2 run lines and the "ALL" group when it doesn't matter. llvm-svn: 218206	2014-09-21 11:07:41 +00:00
Chandler Carruth	293327ddcd	[x86] Teach the new vector shuffle lowering the first step toward more actual support for complex AVX shuffling tricks. We can do independent blends of the low and high 128-bit lanes of an avx vector, so shuffle the inputs into place and then do the blend at 256 bits. This will in many cases remove one blend instruction. The next step is to permute the low and high halves in-place rather than extracting them and re-inserting them. llvm-svn: 218202	2014-09-21 09:35:22 +00:00
David Majnemer	48227a3759	MC: Support aligned COMMON symbols for COFF link.exe: Fuzz testing has shown that COMMON symbols with size > 32 will always have an alignment of at least 32 and all symbols with size < 32 will have an alignment of at least the largest power of 2 less than the size of the symbol. binutils: The BFD linker essentially works like the link.exe behavior but with alignment 4 instead of 32. The BFD linker also supports an extension to COFF which adds an -aligncomm argument to the .drectve section which permits specifying a precise alignment for a variable but MC currently doesn't support editing .drectve in this way. With all of this in mind, we decide to play a little trick: we can ensure that the alignment will be respected by bumping the size of the global to its alignment. llvm-svn: 218201	2014-09-21 09:18:07 +00:00
Chandler Carruth	8ff73c0170	[x86] Add some more test cases covering specific blend patterns. llvm-svn: 218200	2014-09-21 09:01:26 +00:00
Chandler Carruth	7a6108d652	[x86] Add the beginnings of some tests for our v8f32 shuffle lowering under AVX. This really just documents the current state of the world. I'm going to try to flesh it out to cover any test cases I plan to improve prior to improving them so that the delta made by changes is actually visible to code reviewers. This is made easier by the fact that I now have a script to automate the process of producing test cases including the check lines. =] llvm-svn: 218199	2014-09-21 08:49:27 +00:00
Chandler Carruth	a454812ac8	[x86] Teach the new vector shuffle lowering to use VPERMILPD for single-input shuffles with doubles. This allows them to fold memory operands into the shuffle, etc. This is just the analog to the v4f32 case in my prior commit. llvm-svn: 218193	2014-09-20 22:09:27 +00:00
Chandler Carruth	aa5b798ae7	[x86] Add an AVX run to the 128-bit v2 tests, teach them to have a generic SSE and AVX mode in addition to a specific AVX1 test path, and flesh out the AVX tests. llvm-svn: 218192	2014-09-20 21:26:41 +00:00
David Majnemer	fb83977538	Update tests which broke from r218189 llvm-svn: 218191	2014-09-20 21:18:43 +00:00
Chandler Carruth	6f80abac4e	[x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS instruction for single-vector floating point shuffles. This in turn allows the shuffles to fold a load into the instruction which is one of the common regressions hit with the new shuffle lowering. llvm-svn: 218190	2014-09-20 20:52:07 +00:00
David Majnemer	7d0dc3ef18	MC: Fix MCSectionCOFF::PrintSwitchToSection We had a few bugs: - We were considering the GVKind instead of just looking at the section characteristics - We would never print out 'y' when a section was meant to be unreadable - We would never print out 's' when a section was meant to be shared - We translated IMAGE_SCN_MEM_DISCARDABLE to 'n' when it should've meant IMAGE_SCN_LNK_REMOVE llvm-svn: 218189	2014-09-20 20:40:50 +00:00
Chandler Carruth	78a761ce8c	[x86] Start moving to a fancier check syntax to reduce the need for duplication of check lines. The idea is to have broad sets of compilation modes that will frequently diverge without having to always and immediately explode to the precise ISA feature set. While this already helps due to VEX encoded differences, it will help much more as I teach the new shuffle lowering about more of the new VEX encoded instructions which can still be used to implement 128-bit shuffles. llvm-svn: 218188	2014-09-20 18:36:39 +00:00
David Majnemer	b8dbebb31c	MC: Treat ReadOnlyWithRel and ReadOnlyWithRelLocal as ReadOnly for COFF A problem with our old behavior becomes observable under x86-64 COFF when we need a read-only GV which has an initializer which is referenced using a relocation: we would mark the section as writable. Marking the section as writable interferes with section merging. This fixes PR21009. llvm-svn: 218179	2014-09-20 07:31:46 +00:00
Chandler Carruth	8c4cccd4aa	[x86] Teach the v4f32 path of the new shuffle lowering to handle the tricky case of single-element insertion into the zero lane of a zero vector. We can't just use the same pattern here as we do in every other vector type because the general insertion logic can handle insertion into the non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we have INSERTPS that is a much better choice than the generic one for such lowerings. But INSERTPS can do lots of other lowerings as well so factoring its logic into the general insertion logic doesn't work very well. We also can't just extract the core common part of the general insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that lower to MOVSS when they can) because VZEXT_MOVL is often faster than a blend while INSERTPS is slower! So instead we do a restrictive condition on attempting to use the generic insertion logic to narrow it to those cases where VZEXT_MOVL won't need a shuffle afterward and thus will do better than INSERTPS. Then we try blending. Then we go back to INSERTPS. This still doesn't generate perfect code for some silly reasons that can be fixed by tweaking the td files for lowering VZEXT_MOVL to use XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends up in a register rather than a load from memory -- BLENDPSrr has twice the reciprocal throughput of MOVSSrr. Don't you love this ISA? llvm-svn: 218177	2014-09-20 04:15:22 +00:00
Chandler Carruth	00389f3ed9	[x86] Generalize the single-element insertion lowering to work with floating point types and use it for both v2f64 and v2i64 single-element insertion lowering. This fixes the last non-AVX performance regression test case I've gotten for the new vector shuffle lowering. There is obvious analogous lowering for v4f32 that I'll add in a follow-up patch (because with INSERTPS, v4f32 requires special treatment). After that, it's AVX stuff. llvm-svn: 218175	2014-09-20 03:32:25 +00:00
David Majnemer	f4dc456eef	llvm-readobj: pretty-print special COFF section names Print IMAGE_SYM_DEBUG and the like instead of (-2). llvm-svn: 218172	2014-09-20 00:25:06 +00:00
Peter Collingbourne	975726345c	Fix crash with an insertvalue that produces an empty object. llvm-svn: 218171	2014-09-20 00:10:47 +00:00
Matt Arsenault	de0253791c	R600: Un-xfail a test which passes with pass disabled llvm-svn: 218165	2014-09-19 23:02:20 +00:00
Matt Arsenault	5e5b242946	R600/SI: Un-xfail tests which work now llvm-svn: 218164	2014-09-19 23:02:18 +00:00
Matt Arsenault	a986554377	R600/SI: Un xfail a test that works now llvm-svn: 218162	2014-09-19 22:42:40 +00:00
Juergen Ributzka	92e8978e40	[FastISel][AArch64] Fix a think-o in address computation. When looking through sign/zero-extensions the code would always assume there is such an extension instruction and use the wrong operand for the address. There was also a minor issue in the handling of 'AND' instructions. I accidentally used a 'cast' instead of a 'dyn_cast'. llvm-svn: 218161	2014-09-19 22:23:46 +00:00
Chandler Carruth	0fc0c22fa9	[x86] Fully generalize the zext lowering in the new vector shuffle lowering to support both anyext and zext and to custom lower for many different microarchitectures. Using this allows us to get exactly the right code for zext and anyext shuffles in all the vector sizes. For v16i8, the improvement is huge. I had refused to add the new SSE2 test case before this because it was sooooo many instructions. llvm-svn: 218143	2014-09-19 20:00:32 +00:00
Justin Bogner	a829fde160	llvm-cov: Prevent a test from matching its own check lines Since llvm-cov shows the source file in its output, be careful about potentially matching the check lines themselves. llvm-svn: 218138	2014-09-19 19:04:08 +00:00
David Blaikie	db119544a2	Fix test case to be portable to different architectures. llvm-svn: 218134	2014-09-19 18:31:25 +00:00
Matt Arsenault	4505f3a73d	R600/SI: Fix test to prepare for scheduler llvm-svn: 218131	2014-09-19 18:11:16 +00:00
David Blaikie	3a7ce252cc	Omit DW_TAG_subprograms for subprograms without inlined subroutines when producing -gmlt data To reduce the size of -gmlt data, skip the subprograms without any inlined subroutines. Since we've now got the ability to make these determinations in the backend (funnily enough - we added the flag so we wouldn't produce ranges under -gmlt, but with this change we use the flag, but go back to producing ranges under -gmlt). Instead, just produce CU ranges to inform the consumer which parts of the code are described by this CU's line table. Tools could inspect the line table directly to compute the range, but the CU ranges only seem to be about 0.5% of object/executable size, so I'm not too worried about teaching llvm-symbolizer that trick just yet - it's certainly a possible piece of future work. Update an llvm-symbolizer test just to demonstrate that this schema is acceptable there (if it wasn't, the compiler-rt tests would catch this, but good to have an in-llvm-tree test for llvm-symbolizer's behavior here) Building the clang binary with -gmlt with this patch reduces the total size of object files by 5.1% (5.56% without ranges) without compression and the executable by 4.37% (4.75% without ranges). llvm-svn: 218129	2014-09-19 17:03:16 +00:00
Hal Finkel	62ac736faa	Optionally enable more-aggressive FMA formation in DAGCombine The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one use, but this is overly-conservative on some systems. Specifically, if the FMA and the FADD have the same latency (and the FMA does not compete for resources with the FMUL any more than the FADD does), there is no need for the restriction, and furthermore, forming the FMA leaving the FMUL can still allow for higher overall throughput and decreased critical-path length. Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to elide the hasOneUse check. This is enabled for PowerPC by default, as most PowerPC systems will benefit. Patch by Olivier Sallenave, thanks! llvm-svn: 218120	2014-09-19 11:42:56 +00:00
Chandler Carruth	8a6536d4b2	[x86] Recognize that we can use duplication to widen v16i8 shuffles due to undef lanes as well as defined widenable lanes. This dramatically improves the lowering we use for undef-shuffles in a zext-ish pattern for SSE2. llvm-svn: 218115	2014-09-19 09:45:21 +00:00
Chandler Carruth	662b6d84e7	[x86] Actually test the SSE2 lowering for most of the zext-ish shuffles. Not sure why I only did SSSE3 here. Also, I've left out some of the SSE2 ones because the shuffles are so absurd it's not worth transcribing them. Will try to fix them to be sane and then check them. llvm-svn: 218114	2014-09-19 08:51:06 +00:00
Chandler Carruth	2e275142cd	[x86] Teach the new vector shuffle lowering to also use pmovzx for v4i32 shuffles that are zext-ing. Not a lot to see here; the undef lane variant is better handled with pshufd, but this improves the actual zext pattern. llvm-svn: 218112	2014-09-19 08:37:44 +00:00
Justin Bogner	13ba23bb79	llvm-cov: Fix dropped lines when filters were applied Uncovered lines in the middle of a covered region weren't being shown when filtering to a particular function. llvm-svn: 218109	2014-09-19 08:13:16 +00:00
Chandler Carruth	398ba9a018	[x86] Add a dedicated lowering path for zext-compatible vector shuffles to the new vector shuffle lowering code. This allows us to emit PMOVZX variants consistently for patterns where it is a viable lowering. This instruction is both fast and allows us to fold loads into it. This only hooks the new lowering up for i16 and i8 element widths, mostly so I could manage the change to the tests. I'll add the i32 one next, although it is significantly less interesting. One thing to note is that we already had some tests for these patterns but those tests had far less horrible instructions. The problem is that those tests weren't checking the strict start and end of the instruction sequence. =[ As a consequence something changed in the lowering making us generate TERRIBLE code for these patterns in SSE2 through SSSE3. I've consolidated all of the tests and spelled out the madness that we currently emit for these shuffles. I'm going to try to figure out what has gone wrong here. llvm-svn: 218102	2014-09-19 06:07:49 +00:00
Jiangning Liu	ffbc690933	Optimize sext/zext insertion algorithm in back-end. With this optimization, we will not always insert zext for values crossing basic blocks, but insert sext if the users of a value crossing basic blocks prefer a signed predicate. llvm-svn: 218101	2014-09-19 05:30:35 +00:00
David Blaikie	03c3dbeb62	Omit DW_AT_frame_base under -gmlt for size llvm-svn: 218100	2014-09-19 04:55:05 +00:00
David Blaikie	73b65d236c	Omit all the extra static attributes on subprograms in -gmlt This omission will be done in a fancier manner once we're dealing with "put gmlt in the skeleton CUs under fission" - it'll have to be conditional on the kind of CU we're emitting into (skeleton or gmlt). llvm-svn: 218098	2014-09-19 04:30:36 +00:00
Hans Wennborg	c0f0c511db	Fix an it's vs. its typo. llvm-svn: 218093	2014-09-19 01:14:56 +00:00
Matt Arsenault	46cbc4367b	R600: Better fix for bug 20982 Just do the left shift as unsigned to avoid the UB. llvm-svn: 218092	2014-09-19 00:42:06 +00:00
Chandler Carruth	be58fd2f2d	[x86] Extend this test to cover SSE4.1. Nothing interesting here, but paves the way for subsequent changes. llvm-svn: 218091	2014-09-19 00:30:24 +00:00
Peter Collingbourne	6b433e4d46	Try to fix i686-cygming bots. llvm-svn: 218086	2014-09-18 22:56:00 +00:00
Peter Collingbourne	10039c02ea	LTO: introduce object file-based on-disk module format. This format is simply a regular object file with the bitcode stored in a section named ".llvmbc", plus any number of other (non-allocated) sections. One immediate use case for this is to accommodate compilation processes which expect the object file to contain metadata in non-allocated sections, such as the ".go_export" section used by some Go compilers [1], although I imagine that in the future we could consider compiling parts of the module (such as large non-inlinable functions) directly into the object file to improve LTO efficiency. [1] http://golang.org/doc/install/gccgo#Imports Differential Revision: http://reviews.llvm.org/D4371 llvm-svn: 218078	2014-09-18 21:28:49 +00:00
Quentin Colombet	17799fedb7	[ARM] Do not perform a tail call when the caller returns several values. The fix is slightly different than x86 (see r216117) because the number of values attached to a return can vary even for a single returned value (e.g., f64 yields two returned values). <rdar://problem/18352998> llvm-svn: 218076	2014-09-18 21:17:50 +00:00
Robin Morisset	5349e8e532	Restore "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors" Summary: This patch was originally in D5304 (I could not find a way to reopen that revision). It was accepted, committed and broke the build bots because the overloading of the constructor of ArrayRef for braced initializer lists is not supported by all toolchains. I then reverted it, and propose this fixed version that uses a plain C array instead in makeDMB (that array is then converted implicitly to an ArrayRef, but that is not behind an ifdef). Could someone confirm whether initialization lists for plain C arrays are supported by every toolchain used to build LLVM? Otherwise I can just initialize the array in the old way: args[0] = ...; .. ; args[5] = ...; Below is the description of the original patch: ``` I had only tested this code for ARMv7 and ARMv8. This patch adds several fallback paths if the processor does not support dmb ish: - dmb sy if a cortex-M with support for dmb - mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB) These fallback paths were chosen based on the code for fence seq_cst. Thanks to luqmana for having noticed this bug. ``` Test Plan: Added more cases to atomic-load-store.ll + make check-all Reviewers: jfb, t.p.northover, luqmana Subscribers: llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D5386 llvm-svn: 218066	2014-09-18 18:56:04 +00:00
Matt Arsenault	6462f94884	R600: Bug 20982 - Avoid undefined left shift of negative value I'm not sure what the hardware actually does, so don't bother trying to fold it for now. llvm-svn: 218057	2014-09-18 15:52:26 +00:00
Chandler Carruth	9057fcaf82	[x86] Use PALIGNR for v4i32 and v2i64 blends when appropriate. There is no purpose in using it for single-input shuffles as pshufd is just as fast and doesn't tie the two operands. This removes a substantial amount of wrong-domain blend operations in SSSE3 mode. It also completes the usage of PALIGNR for integer shuffles and addresses one of the test cases Quentin hit with the new vector shuffle lowering. There is still the question of whether and when to use this for floating point shuffles. It is faster than shufps or shufpd but in the integer domain. I don't yet really have a good heuristic here for when to use this instruction for floating point vectors. llvm-svn: 218038	2014-09-18 09:00:25 +00:00
Chandler Carruth	0fe4928fbe	[x86] Add an SSSE3 run and check mode to the 128-bit v2 tests of the new vector shuffle lowering. This will be needed for up-coming palignr tests. llvm-svn: 218037	2014-09-18 08:33:04 +00:00
Juergen Ributzka	1d3a312e2d	Revert "[FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ." Reverting it until I have time to investigate a regression. llvm-svn: 218035	2014-09-18 08:07:40 +00:00
Juergen Ributzka	0f3076785f	Fix previous commit: [FastISel][AArch64] Simplify XALU multiplies. When folding the intrinsic flag into the branch or select we also have to consider the fact if the intrinsic got simplified, because it changes the flag we have to check for. llvm-svn: 218034	2014-09-18 07:26:26 +00:00
Juergen Ributzka	2964b832ef	[FastISel][AArch64] Simplify XALU multiplies. Simplify {s|u}mul.with.overflow to {s|u}add.with.overflow when possible. llvm-svn: 218033	2014-09-18 07:04:54 +00:00
Juergen Ributzka	2fc851002b	[FastISel][AArch64] Followup commit for 218031 to handle negative offsets too. llvm-svn: 218032	2014-09-18 07:04:49 +00:00
Juergen Ributzka	a33070c321	[FastISel][AArch64] Try to fold the offset into the add instruction when simplifying a memory address. Small optimization in 'simplifyAddress'. When the offset cannot be encoded in the load/store instruction, then we need to materialize the address manually. The add instruction can encode a wider range of immediates than the load/store instructions. This change tries to fold the offset into the add instruction first before materializing the offset in a register. llvm-svn: 218031	2014-09-18 05:40:47 +00:00
Juergen Ributzka	99b7758ba0	[FastISel][AArch64] Fold 'AND' instruction during the address computation. The 'AND' instruction could be used to mask out the lower 32 bits of a register. If this is done inside an address computation we might be able to fold the instruction into the memory instruction itself. and x1, x1, #0xffffffff; ldrb x0, [x0, x1] ---> ldrb x0, [x0, w1, uxtw] llvm-svn: 218030	2014-09-18 05:40:41 +00:00
Chandler Carruth	e0d77ef053	[x86] Add an SSSE3 run to the v4 shuffle test. llvm-svn: 218028	2014-09-18 04:38:32 +00:00
Saleem Abdulrasool	bfdfb14a8f	ARM: prevent crash on ELF directives on COFF Certain directives are unsupported on Windows (some of which could/should be supported). We would not diagnose the use but rather crash during the emission as we try to access the Target Streamer. Add an assertion to prevent creating a NULL reference (which is not permitted under C++) as well as a test to ensure that we can diagnose the disabled directives. llvm-svn: 218014	2014-09-18 04:28:29 +00:00
Chandler Carruth	867930aadf	[x86] Initial step of teaching the new vector shuffle lowering about PALIGNR. This just adds it to the v8i16 and v16i8 lowering steps where it is completely unmatched. It also introduces the logic for detecting rotation shuffle masks even in the presence of single input or blend masks and arbitrarily undef lanes. I've added fairly comprehensive tests for the matching logic in v8i16 because the tests at that size are much easier to write and manage. I've not checked the SSE2 code generated for these tests because the code is horrible. It is absolute madness. Testing it will just make the test brittle without giving any interesting improvements in the correctness confidence. llvm-svn: 218013	2014-09-18 04:11:29 +00:00
Saleem Abdulrasool	8c61c6c0f9	ARM: use a more precise check for MachO Rather than relying on support for a specific directive to determine if we are targeting MachO, explicitly check the output format. As an additional bonus, cleanup the caret diagnostic for the non-MachO case and avoid the spurious error caused by not discarding the statement. llvm-svn: 218012	2014-09-18 03:49:55 +00:00
Juergen Ributzka	c35fb03661	[FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ. Teach selectBranch to fold bit test and branch into a single instruction (TBZ or TBNZ). llvm-svn: 218010	2014-09-18 02:44:13 +00:00
Samuel Antao	61570df715	Fix FastISel bug in boolean returns for PowerPC. For PPC targets, FastISel does not take the sign extension information into account when selecting return instructions whose operands are constants. A consequence of this is that the return of boolean values is not correct. This patch fixes the problem by evaluating the sign extension information also for constants, forwarding this information to PPCMaterializeInt which takes this information to drive the sign extension during the materialization. llvm-svn: 217993	2014-09-17 23:25:06 +00:00
Juergen Ributzka	f6430314b4	[FastISel][AArch64] Custom lower sdiv by power-of-2. Emit an optimized instruction sequence for sdiv by power-of-2 depending on the exact flag. This fixes rdar://problem/18224511. llvm-svn: 217986	2014-09-17 21:55:55 +00:00
Nick Kledzik	3e95fa431e	[llvm-objdump] clean up test cases now that build bots are green llvm-svn: 217985	2014-09-17 21:53:07 +00:00
Justin Bogner	5cbed6e09e	llvm-cov: Push some more debug output into the View (NFC) llvm-svn: 217984	2014-09-17 21:48:52 +00:00
Rafael Espindola	51bd8ee309	Internalize common symbols when we can. This fixes pr20974. llvm-svn: 217981	2014-09-17 20:41:13 +00:00
Juergen Ributzka	c611d72754	[FastISel][AArch64] Simplify mul to shift when possible. This is related to rdar://problem/18369687. llvm-svn: 217980	2014-09-17 20:35:41 +00:00
Alexey Samsonov	7bddb0a56a	Exclude known and bugzilled failures from UBSan bootstrap llvm-svn: 217979	2014-09-17 20:17:52 +00:00
Juergen Ributzka	3871c69422	[FastISel][AArch64] Fold mul into add/sub and logical operations. Try to fold the multiply into the add/sub or logical operations (when possible). This is related to rdar://problem/18369687. llvm-svn: 217978	2014-09-17 19:51:38 +00:00
Juergen Ributzka	22d4cd0a4f	[FastISel][AArch64] Fold mul into the address computation of memory operations. Teach 'computeAddress' to also fold multiplies into the address computation (when possible). This fixes rdar://problem/18369443. llvm-svn: 217977	2014-09-17 19:19:31 +00:00
Robin Morisset	bf26f8fd56	Revert "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors" It is breaking the build on the buildbots but works fine on my machine, I revert while trying to understand what happens (it appears to depend on the compiler used to build, I probably used a C++11 feature that is not perfectly supported by some of the buildbots). This reverts commit feb3176c4d006f99af8b40373abd56215a90e7cc. llvm-svn: 217973	2014-09-17 18:09:13 +00:00
Juergen Ributzka	d8e30c0db8	[FastISel][AArch64] Fold compare with zero and branch into CBZ and CBNZ. This takes advantage of the CBZ and CBNZ instructions to further optimize the common null check pattern into a single instruction. This is related to rdar://problem/18358882. llvm-svn: 217972	2014-09-17 18:05:34 +00:00
Juergen Ributzka	fb3e14375a	[FastISel][AArch64] Improve branch selection to support all FP conditions. This adds the last two missing floating-point condition codes (FCMP_UEQ and FCMP_ONE) also to the branch selection. In these two cases an additional branch instruction is required. This also adds unit tests to check all the different condition codes. This is related to rdar://problem/18358882. llvm-svn: 217966	2014-09-17 17:46:47 +00:00
Robin Morisset	1c8a457575	[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors Summary: I had only tested this code for ARMv7 and ARMv8. This patch adds several fallback paths if the processor does not support dmb ish: - dmb sy if a cortex-M with support for dmb - mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB) These fallback paths were chosen based on the code for fence seq_cst. Thanks to luqmana for having noticed this bug. Test Plan: Added more cases to atomic-load-store.ll + make check-all Reviewers: jfb, t.p.northover, luqmana Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5304 llvm-svn: 217965	2014-09-17 17:41:16 +00:00
Matt Arsenault	02dc26529e	R600/SI: Change formatting of printed FP immediates Only 1 decimal place should be printed for inline immediates. Other constants should be hex constants. Does not include f64 tests because folding those inline immediates currently does not work. llvm-svn: 217964	2014-09-17 17:32:13 +00:00
Chad Rosier	307b50b0f6	[IndVarSimplify] Partially revert r217953 to see if this fixes the bots. Specifically, disable widening of unsigned compare instructions. llvm-svn: 217962	2014-09-17 16:35:09 +00:00
Chad Rosier	bb99f40530	[IndVarSimplify] Widen loop compare instructions. This improves other optimizations such as LSR. A sext may be added to the compare's other operand, but this can often be hoisted outside of the loop. llvm-svn: 217953	2014-09-17 14:10:33 +00:00
Andrea Di Biagio	5b92b4971a	[InstCombine] Fix wrong folding of constant comparison involving ashr and negative quantities (PR20945). Example: define i1 @foo(i32 %a) { %shr = ashr i32 -9, %a %cmp = icmp ne i32 %shr, -5 ret i1 %cmp } Before this fix, the instruction combiner wrongly thought that %shr could have never been equal to -5. Therefore, %cmp was always folded to 'true'. However, when %a is equal to 1, then %cmp evaluates to 'false'. Therefore, in this example, it is not valid to fold %cmp to 'true'. The problem was only affecting the case where the comparison was between negative quantities where one of the quantities was obtained from arithmetic shift of a negative constant. This patch fixes the problem with the wrong folding (fixes PR20945). With this patch, the 'icmp' from the example is now simplified to a comparison between %a and 1. This still allows us to get rid of the arithmetic shift (%shr). llvm-svn: 217950	2014-09-17 11:32:31 +00:00
Toma Tabacu	351b2feeb3	[mips] Add assembler support for the .set nodsp directive. Summary: This directive is used to tell the assembler to reject DSP-specific instructions. Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D5142 llvm-svn: 217946	2014-09-17 09:01:54 +00:00
Pavel Chupin	37b65d81dd	[x32] Fix function indirect calls Summary: Zero-extend register to 64-bit for callq/jmpq. Test Plan: 3 tests added Reviewers: nadav, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5355 llvm-svn: 217942	2014-09-17 07:09:23 +00:00
David Majnemer	b435a4214e	InstSimplify: Don't allow (x srem y) urem y -> x srem y Let's consider the case where: %x i16 = 32768 %y i16 = 384 %x srem %y = 65408 (%x srem %y) urem %y = 128 llvm-svn: 217939	2014-09-17 04:16:35 +00:00
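The counterexample in the InstSimplify entry (b435a4214e) can be checked by hand. The sketch below is not LLVM code: the helper names `to_signed`, `to_unsigned`, `srem`, and `urem` are hypothetical, and they only model 16-bit remainder semantics as commonly described for LLVM IR (`srem` takes the sign of the dividend, `urem` treats both operands as unsigned). Division by zero is not handled.

```python
BITS = 16  # the commit message uses i16 values


def to_unsigned(value, bits=BITS):
    """Wrap an integer into the unsigned range of the given width."""
    return value & ((1 << bits) - 1)


def to_signed(value, bits=BITS):
    """Reinterpret a bit pattern as a two's-complement signed integer."""
    value = to_unsigned(value, bits)
    return value - (1 << bits) if value >> (bits - 1) else value


def srem(x, y, bits=BITS):
    """Signed remainder; the result takes the sign of the dividend."""
    sx, sy = to_signed(x, bits), to_signed(y, bits)
    magnitude = abs(sx) % abs(sy)
    return to_unsigned(-magnitude if sx < 0 else magnitude, bits)


def urem(x, y, bits=BITS):
    """Unsigned remainder on the raw bit patterns."""
    return to_unsigned(x, bits) % to_unsigned(y, bits)


x, y = 32768, 384          # 32768 is -32768 when read as a signed i16
inner = srem(x, y)         # bit pattern of -128
outer = urem(inner, y)     # remainder of that bit pattern read as unsigned

print(inner, outer)
```

With these inputs `inner` and `outer` differ, which is why folding `(x srem y) urem y` to `x srem y` is not valid in general.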

26244 Commits