llvm-project


Author	SHA1	Message	Date
Tim Northover	d4d294dd51	ARM-MachO: materialize callee address correctly on v4t. llvm-svn: 214958	2014-08-06 11:13:06 +00:00
Chandler Carruth	c3927cd8c9	[x86] Fix two independent miscompiles in the process of getting the same test case to actually generate correct code. The primary miscompile fixed here is that we weren't correctly handling in-place elements in one half of a single-input v8i16 shuffle when moving a dword of elements from that half to the other half. Sometimes, we would clobber the in-place elements in forming the dword to move across halves. The fix to this involves forcibly marking the in-place inputs even when there is no need to gather them into a dword, and to much more carefully re-arrange the elements when grouping them into a dword to move across halves. With these two changes we would generate correct shuffles for the test case, but found another miscompile. There are also some random perturbations of the generated shuffle pattern in SSE2. It looks like a wash; more instructions in some cases, fewer in others. The second miscompile would corrupt the results into nonsense. This is a buggy pattern in one of the added DAG combines. Mapping elements through a PSHUFD when pairing redundant half-shuffles is much harder than this code makes it out to be -- it requires reasoning about all of where the input is used in the PSHUFD, not just one part of where it is used. Plus, we can't combine a half shuffle into a PSHUFD but the code didn't guard against it. I think this was just a bad idea and I've just removed that aspect of the combine. No tests regress as a consequence so seems OK. llvm-svn: 214954	2014-08-06 10:16:36 +00:00
Adam Nemet	5ec912881f	[X86] Fixes commit r214890 to match the posted patch This was another fallout from my local rebase where something went wrong :( llvm-svn: 214951	2014-08-06 07:13:12 +00:00
Peter Collingbourne	df240b252a	[dfsan] Try not to create too many additional basic blocks in functions which already have a large number of blocks. Works around a performance issue with the greedy register allocator. llvm-svn: 214944	2014-08-06 00:33:40 +00:00
Matt Arsenault	d5f4de27b6	R600: Increase nearby load scheduling threshold. This partially fixes weird looking load scheduling in memcpy test. The load clustering doesn't seem particularly smart, but this method seems to be partially deprecated so it might not be worth trying to fix. llvm-svn: 214943	2014-08-06 00:29:49 +00:00
Matt Arsenault	c10853f29f	R600/SI: Implement areLoadsFromSameBasePtr This currently has a noticeable effect on the kernel argument loads. LDS and global loads are more problematic, I think because of how copies are currently inserted to ensure that the address is a VGPR. llvm-svn: 214942	2014-08-06 00:29:43 +00:00
David Blaikie	fb0412f039	DebugInfo: Assert that any CU for which debug_loc lists are emitted, has at least one range. This was coming in weird debug info that had variables (and hence debug_locs) but was in GMLT mode (because it was missing the 13th field of the compile_unit metadata) so no ranges were constructed. We should always have at least one range for any CU with a debug_loc in it - because the range should cover the debug_loc. The assertion just ensures that the "!= 1" range case inside the subsequent loop doesn't get entered for the case where there are no ranges at all, which should never reach here in the first place. llvm-svn: 214939	2014-08-06 00:21:25 +00:00
David Blaikie	cabf54a313	DebugInfo: Fix a bunch of tests that, owing to their compile_unit metadata not including a 13th field, had some subtle behavior. Without the 13th field, the "emission kind" field defaults to 0 (which is not equal to either of the values of the emission kind enum (1 == full debug info, 2 == line tables only)). In this particular instance, the comparison with "FullDebugInfo" was done when adding elements to the ranges list - so for these test cases no values were added to the ranges list. This got weirder when emitting debug_loc entries as the addresses should be relative to the range of the CU if the CU has only one range (the reasonable assumption is that if we're emitting debug_loc lists for a CU that CU has at least one range - but due to the above situation, it has zero) so the ranges were emitted relative to the start of the section rather than relative to the start of the CU's singular range. Fix these tests by accounting for the difference in the description of debug_loc entries (in some cases making the test ignorant to these differences, in others adding the extra label difference expression, etc) or the presence/absence of high/low_pc on the CU, and add the 13th field to their CUs to enable proper "full debug info" emission here. In a future commit I'll fix up a bunch of other test cases that are not so rigorously depending on this behavior, but still doing similarly weird things due to the missing 13th field. llvm-svn: 214937	2014-08-05 23:57:31 +00:00
Jonathan Roelofs	ef84bda531	Re-apply r214881: Fix return sequence on armv4 thumb This reverts r214893, re-applying r214881 with the test case relaxed a bit to satiate the build bots. POP on armv4t cannot be used to change thumb state (unlike later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214928	2014-08-05 21:32:21 +00:00
Bill Schmidt	42a6936c78	[PowerPC] Swap arguments and adjust shift count for vsldoi on little endian Commits r213915 and r214718 fix recognition of shuffle masks for vmrg* and vpku*um instructions for a little-endian target, by swapping the input arguments. The vsldoi instruction requires similar treatment, and also needs its shift count adjusted for little endian. Reviewed by Ulrich Weigand. This is a bug fix candidate for release 3.5 (and hopefully the last of those for PowerPC). llvm-svn: 214923	2014-08-05 20:47:25 +00:00
Sanjay Patel	1954f2e924	Improved test cases that were added with r214892. 1. Added ':' to CHECK-LABELs 2. Added more CHECKs 3. Added CHECK-NEXTs 4. Added verbose hex immediate comments to CHECKs llvm-svn: 214921	2014-08-05 20:16:35 +00:00
Rafael Espindola	f9e52cf015	Don't internalize all but main by default. This is mostly a cleanup, but it changes a fairly old behavior. Every "real" LTO user was already disabling the silly internalize pass and creating the internalize pass itself. The difference with this patch is for "opt -std-link-opts" and the C api. Now to get a usable behavior out of opt one doesn't need the funny looking command line: opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts llvm-svn: 214919	2014-08-05 20:10:38 +00:00
Rafael Espindola	c03b6e7880	Add a test showing the interaction of linker scripts and plugin. In particular, the linker script is processed early enough for function g to be internalized. llvm-svn: 214916	2014-08-05 19:56:53 +00:00
Chandler Carruth	a746239be3	[x86] Fix a crasher due to shuffles which cancel each other out and add a test case. We also miscompile this test case which is showing a serious flaw in the single-input v8i16 shuffle code. I've left the specific instruction checks FIXME-ed out until I can address the bug in the single-input code, but I wanted to separate out a significant functionality change to produce correct code from a very simple and targeted crasher fix. The miscompile problem stems from keeping track of inputs by value rather than by index. As a consequence of doing this, we can't reliably update those inputs because they might swap and we can't detect this without copying the mask. The blend code now uses indices for the input lists and this seems strictly better. It also should make it easier to sort things and do other cleanups. I think the time has come to simplify The Great Lambda here. llvm-svn: 214914	2014-08-05 18:45:49 +00:00
Philip Reames	00c9b6461f	Remove dead zero store to calloc initialized memory Optimize the following IR: %1 = tail call noalias i8* @calloc(i64 1, i64 4) %2 = bitcast i8* %1 to i32* ; This store is dead and should be removed store i32 0, i32* %2, align 4 Memory returned by calloc is guaranteed to be zero initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store. If the store is to an out of bounds address, it is undefined and thus also removable. Reviewed By: nicholas Differential Revision: http://reviews.llvm.org/D3942 llvm-svn: 214897	2014-08-05 17:48:20 +00:00
Jonathan Roelofs	064eb5a177	Revert r214881 because it broke lots of build-bots llvm-svn: 214893	2014-08-05 17:36:05 +00:00
Sanjay Patel	8e5beb6edb	Optimize vector fabs of bitcasted constant integer values. Allow vector fabs operations on bitcasted constant integer values to be optimized in the same way that we already optimize scalar fabs. So for code like this: %bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000 %fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast) %ret = bitcast <2 x float> %fabs to i64 Instead of generating something like this: movabsq (constant pool loadi of mask for sign bits) vmovq (move from integer register to vector/fp register) vandps (mask off sign bits) vmovq (move vector/fp register back to integer return register) We should generate: mov (put constant value in return register) I have also removed a redundant clause in the first 'if' statement: N0.getOperand(0).getValueType().isInteger() is the same thing as: IntVT.isInteger() Testcases for x86 and ARM added to existing files that deal with vector fabs. One existing testcase for x86 removed because it is no longer ideal. For more background, please see: http://reviews.llvm.org/D4770 And: http://llvm.org/bugs/show_bug.cgi?id=20354 Differential Revision: http://reviews.llvm.org/D4785 llvm-svn: 214892	2014-08-05 17:35:22 +00:00
Adam Nemet	fd2161b710	[AVX512] Add masking variant and intrinsics for valignd/q This is similar to what I did with the two-source permutation recently. (It's almost too similar so that we should consider generating the masking variants with some tablegen help.) Both encoding and intrinsic tests are added as well. For the latter, this is what the IR that the intrinsic test on the clang side generates. Part of <rdar://problem/17688758> llvm-svn: 214890	2014-08-05 17:23:04 +00:00
Jonathan Roelofs	f5fad3767b	Fix return sequence on armv4 thumb POP on armv4t cannot be used to change thumb state (unlike later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214881	2014-08-05 17:13:17 +00:00
David Blaikie	c74ffa9cab	Improve test for merged global debug info by using llvm-dwarfdump. It's a bit of a tradeoff, since llvm-dwarfdump doesn't print the name of the global symbol being used as an address in the addressing mode, but this avoids the dependence on hardcoded set labels that keep changing (5+ commits over the last few years that each update the set label as it changes due to other, unrelated differences in output). This could've, instead, been changed to match the set name then match the name in the string pool but that would present other issues (needing to skip over the sets that weren't of interest, etc) and checking that the addresses (granted, without relocations applied - so it's not the whole story) match in the two variable location descriptions seems sufficient and fairly stable here. There are a few similar other tests with similar label dependence that I'll update soonish. llvm-svn: 214878	2014-08-05 16:20:25 +00:00
Joerg Sonnenberger	c4ce42980e	Add accessors for the PPC 403 bank registers. llvm-svn: 214875	2014-08-05 15:45:15 +00:00
Renato Golin	877b9b3513	Add tests for cp10/cp11 on ARMv5/6 Tests for ARMv7/8 are already on diagnostics.s llvm-svn: 214872	2014-08-05 15:29:41 +00:00
Keith Walker	1045717584	Specify that the thumb setend and blx <immed> instructions are not valid on an m-class target llvm-svn: 214871	2014-08-05 15:11:59 +00:00
Keith Walker	292aa3d5f7	Define stc2/stc2l/ldc2/ldc2l as thumb2 instructions llvm-svn: 214868	2014-08-05 14:58:05 +00:00
Joerg Sonnenberger	936a4c8ceb	Accessors for SSR2 and SSR3 on PPC 403. llvm-svn: 214867	2014-08-05 14:53:05 +00:00
Tom Stellard	229d5e669b	R600/SI: Update MUBUF assembly string to match AMD proprietary compiler llvm-svn: 214866	2014-08-05 14:48:12 +00:00
Tom Stellard	b37f797678	R600/SI: Avoid generating REGISTER_LOAD instructions. SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads. llvm-svn: 214865	2014-08-05 14:40:52 +00:00
Joerg Sonnenberger	412471271e	Add dci/ici instructions for PPC 476 and friends. llvm-svn: 214864	2014-08-05 14:40:32 +00:00
Joerg Sonnenberger	048284e1b6	Add mftblo and mftbhi for PPC 4xx. llvm-svn: 214863	2014-08-05 14:18:16 +00:00
Joerg Sonnenberger	9dedceb71d	Add lswi / stswi for assembler use with a warning to not add patterns for them. llvm-svn: 214862	2014-08-05 13:34:01 +00:00
Yi Kong	e56de69500	AArch64: Add support for instruction prefetch intrinsic Instruction prefetch is not implemented for AArch64, it is incorrectly translated into data prefetch instruction. Differential Revision: http://reviews.llvm.org/D4777 llvm-svn: 214860	2014-08-05 12:46:47 +00:00
James Molloy	2b8933c354	Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost. Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account. llvm-svn: 214859	2014-08-05 12:30:34 +00:00
Joerg Sonnenberger	6b41a9900a	Allow binary and for tblgen math. llvm-svn: 214851	2014-08-05 09:43:25 +00:00
Chandler Carruth	947cef191d	[x86] Fix a crash and wrong-code bug in the new vector lowering all found by a single test reduced out of a failure on llvm-stress. The start of the problem (and the crash) came when we tried to use a find of a non-used slot in the move-to half of the move-mask as the target for two bad-half inputs. While if lucky this will be the first of a pair of slots which we can place the bad-half inputs into, it isn't actually guaranteed. This really isn't surprising, not sure what I was thinking. The correct way to find the two unused slots is to look for one of the used slots. We know it isn't that pair, and we can use some modular arithmetic to find the other pair by masking off the odd bit and adding 2 modulo 4. With this, we reliably found a viable pair of slots for the bad-half inputs. Sadly, that wasn't enough. We also had a wrong code bug that surfaced when I reduced the test case for this where we would use the same slot twice for the two bad inputs. This is because both of the bad inputs could be in odd slots originally and thus the mod-2 mapping would actually be the same. The whole point of the weird indexing into the pair of empty slots was to try to leverage when the end result needed the two bad-half inputs to be paired in a dword and pre-pair them in the correct orientation. This is less important with the powerful combining we're now doing, and also easier and more reliable to achieve by noting that we add the bad-half inputs in order. Thus, if they are in a dword pair, the low part of that will be the first input in the sequence. Always putting that in the low element will just do the right thing in addition to computing the correct result. Test case added. =] llvm-svn: 214849	2014-08-05 08:19:21 +00:00
Juergen Ributzka	a126d1ef3c	[FastISel][AArch64] Implement the FastLowerArguments hook. This implements basic argument lowering for AArch64 in FastISel. It only handles a small subset of the C calling convention. It supports simple arguments that can be passed in GPR and FPR registers. This should cover most of the trivial cases without falling back to SelectionDAG. This fixes <rdar://problem/17890986>. llvm-svn: 214846	2014-08-05 05:43:48 +00:00
Kevin Qin	ec100526e3	Revert "r214832 - MachineCombiner Pass for selecting faster instruction" It broke compilation of most benchmarks and internal tests, as clang crashed with a segmentation fault or assertion failure. llvm-svn: 214845	2014-08-05 05:43:47 +00:00
Juergen Ributzka	51f5326e25	[FastISel][AArch64] Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended. llvm-svn: 214844	2014-08-05 05:43:44 +00:00
Gerolf Hoflehner	4dbf44b9d8	MachineCombiner Pass for selecting faster instruction sequence on AArch64 Re-commit of r214669 without changes to test cases LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and LLVM:: CodeGen/AArch64/dp-3source.ll This resolves the reported compfails of the original commit. llvm-svn: 214832	2014-08-05 01:16:13 +00:00
Joerg Sonnenberger	755ffa9b54	Add TCR register access llvm-svn: 214826	2014-08-04 23:53:42 +00:00
Joerg Sonnenberger	5995e0021d	Add PPC 603's tlbld and tlbli instructions. llvm-svn: 214825	2014-08-04 23:49:45 +00:00
Bill Schmidt	f04e998e00	[PPC64LE] Fix wrong IR for vec_sld and vec_vsldoi My original LE implementation of the vsldoi instruction, with its altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect shufflevector operations in the LLVM IR. Correct code is generated because the back end handles the incorrect shufflevector in a consistent manner. This patch and a companion patch for Clang correct this problem by removing the fixup from altivec.h and the corresponding fixup from the PowerPC back end. Several test cases are also modified to reflect the now-correct LLVM IR. llvm-svn: 214800	2014-08-04 23:21:01 +00:00
Kevin Enderby	e3c13468bf	Enable Darwin vararg parameters support in assembler macros. Duplicate the vararg tests for linux and add a test which mixes vararg arguments with darwin positional parameters. Patch by: Janne Grunau <j@jannau.net> llvm-svn: 214799	2014-08-04 23:14:37 +00:00
Joerg Sonnenberger	51cf733427	Add simplified aliases for access to DCCR, ICCR, DEAR and ESR llvm-svn: 214797	2014-08-04 22:56:42 +00:00
Juergen Ributzka	53533e885a	[FastISel][AArch64] Fix shift lowering for i8 and i16 value types. This fix changes the parameters #r and #s that are passed to the UBFM/SBFM instruction to get the zero/sign-extension for free. The original problem was that the shift left would use the 32-bit shift even for i8/i16 value types, which could leave the upper bits set with "garbage" values. The arithmetic shift right on the other side would use the wrong MSB as sign-bit to determine what bits to shift into the value. This fixes <rdar://problem/17907720>. llvm-svn: 214788	2014-08-04 21:49:51 +00:00
Chandler Carruth	40dbd382ad	[SDAG] Fix a really, really terrible bug in the DAG combiner. This code is completely wrong. It is also dead, as if it were to ever run, it would crash. Fortunately, after my work to the combiner, it is at least possible to reach the code, and llvm-stress has found a test case. Thanks to Patrick for reporting. It would be really good if anyone who remembers how this code works and what it was intended to do could add some more obvious test coverage instead of my completely contrived and reduced test case. My test case was so brittle I left a bread crumb comment in it to help the next person to stumble on it and not know what it was actually testing for. llvm-svn: 214785	2014-08-04 21:29:59 +00:00
Joerg Sonnenberger	6c3e38522a	tlbre / tlbwe / tlbsx / tlbsx. variants for the PPC 4xx CPUs. llvm-svn: 214784	2014-08-04 21:28:22 +00:00
Chad Rosier	5908ab4dd6	[AArch64] Extend the number of scalar instructions supported in the AdvSIMD scalar integer instruction pass. This is a patch I had lying around from a few months ago. The pass is currently disabled by default, so nothing too interesting. llvm-svn: 214779	2014-08-04 21:20:25 +00:00
Joerg Sonnenberger	6d05a2b461	MC uses .lcomm now, so adjust. llvm-svn: 214776	2014-08-04 21:06:00 +00:00
Reid Kleckner	e704010450	Fix failure to invoke exception handler on Win64 When the last instruction prior to a function epilogue is a call, we need to emit a nop so that the return address is not in the epilogue IP range. This is consistent with MSVC's behavior, and may be a workaround for a bug in the Win64 unwinder. Differential Revision: http://reviews.llvm.org/D4751 Patch by Vadim Chugunov! llvm-svn: 214775	2014-08-04 21:05:27 +00:00
Joerg Sonnenberger	6e842b34a0	Recognize mftbl as alias for mftb, for symmetry with mttb. llvm-svn: 214769	2014-08-04 20:28:34 +00:00
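
The slot-pair arithmetic described in commit 947cef191d (mask off the odd bit of a used slot, then add 2 modulo 4) can be illustrated with a small sketch. It is written in Python rather than the C++ of the actual patch, and the function name `other_pair_start` and the four-slot layout per half are assumptions made for illustration, not code taken from LLVM.

```python
def other_pair_start(used_slot: int) -> int:
    """Return the first slot of the dword pair that does not contain used_slot.

    Assumes four 16-bit slots per half, grouped as pairs (0, 1) and (2, 3).
    The name and layout are hypothetical, chosen only for this illustration.
    """
    if not 0 <= used_slot < 4:
        raise ValueError("slot index must be in 0..3")
    pair_start = used_slot & ~1   # mask off the odd bit: gives 0 or 2
    return (pair_start + 2) % 4   # step to the other pair, wrapping at 4


for slot in range(4):
    print(slot, "->", other_pair_start(slot))
```

Starting from any used slot, the result is always the even-numbered start of the opposite pair, which is why the approach does not depend on which slot of a pair happened to be found first.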


25524 Commits
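
The constant fold described in commit 8e5beb6edb (vector fabs of a bitcasted constant integer) amounts to clearing the sign bit of each 32-bit lane. A minimal Python sketch of that arithmetic follows; the names `SIGN_MASK_2XF32` and `fold_fabs_v2f32` are hypothetical and this is not the DAG combiner's implementation, only the bit-level effect the commit message describes.

```python
# Clears bit 31 (the IEEE-754 sign bit) of each 32-bit lane in a 64-bit value.
SIGN_MASK_2XF32 = 0x7FFF_FFFF_7FFF_FFFF


def fold_fabs_v2f32(bits: int) -> int:
    """Fold fabs over an i64 constant reinterpreted as <2 x float> (hypothetical helper)."""
    return bits & SIGN_MASK_2XF32


# The constant from the commit message: 0xFFFF_FFFF_0000_0000.
value = 18446744069414584320
print(hex(fold_fabs_v2f32(value)))
```

Because the mask is the same for both lanes, the folded value does not depend on which lane is considered first.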