llvm-project

Commit Graph

Author	SHA1	Message	Date
Jonathan Roelofs	ef84bda531	Re-apply r214881: Fix return sequence on armv4 thumb This reverts r214893, re-applying r214881 with the test case relaxed a bit to satiate the build bots. POP on armv4t cannot be used to change thumb state (unilke later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214928	2014-08-05 21:32:21 +00:00
Bill Schmidt	42a6936c78	[PowerPC] Swap arguments and adjust shift count for vsldoi on little endian Commits r213915 and r214718 fix recognition of shuffle masks for vmrg* and vpku*um instructions for a little-endian target, by swapping the input arguments. The vsldoi instruction requires similar treatment, and also needs its shift count adjusted for little endian. Reviewed by Ulrich Weigand. This is a bug fix candidate for release 3.5 (and hopefully the last of those for PowerPC). llvm-svn: 214923	2014-08-05 20:47:25 +00:00
Sanjay Patel	1954f2e924	Improved test cases that were added with r214892. 1. Added ':' to CHECK-LABELs 2. Added more CHECKs 3. Added CHECK-NEXTs 4. Added verbose hex immediate comments to CHECKs llvm-svn: 214921	2014-08-05 20:16:35 +00:00
Chandler Carruth	a746239be3	[x86] Fix a crasher due to shuffles which cancel each other out and add a test case. We also miscompile this test case which is showing a serious flaw in the single-input v8i16 shuffle code. I've left the specific instruction checks FIXME-ed out until I can address the bug in the single-input code, but I wanted to separate out a significant functionality change to produce correct code from a very simple and targeted crasher fix. The miscompile problem stems from keeping track of inputs by value rather than by index. As a consequence of doing this, we can't reliably update those inputs because they might swap and we can't detect this without copying the mask. The blend code now uses indices for the input lists and this seems strictly better. It also should make it easier to sort things and do other cleanups. I think the time has come to simplify The Great Lambda here. llvm-svn: 214914	2014-08-05 18:45:49 +00:00
Jonathan Roelofs	064eb5a177	Revert r214881 because it broke lots of build-bots llvm-svn: 214893	2014-08-05 17:36:05 +00:00
Sanjay Patel	8e5beb6edb	Optimize vector fabs of bitcasted constant integer values. Allow vector fabs operations on bitcasted constant integer values to be optimized in the same way that we already optimize scalar fabs. So for code like this: %bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000 %fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast) %ret = bitcast <2 x float> %fabs to i64 Instead of generating something like this: movabsq (constant pool loadi of mask for sign bits) vmovq (move from integer register to vector/fp register) vandps (mask off sign bits) vmovq (move vector/fp register back to integer return register) We should generate: mov (put constant value in return register) I have also removed a redundant clause in the first 'if' statement: N0.getOperand(0).getValueType().isInteger() is the same thing as: IntVT.isInteger() Testcases for x86 and ARM added to existing files that deal with vector fabs. One existing testcase for x86 removed because it is no longer ideal. For more background, please see: http://reviews.llvm.org/D4770 And: http://llvm.org/bugs/show_bug.cgi?id=20354 Differential Revision: http://reviews.llvm.org/D4785 llvm-svn: 214892	2014-08-05 17:35:22 +00:00
Adam Nemet	fd2161b710	[AVX512] Add masking variant and intrinsics for valignd/q This is similar to what I did with the two-source permutation recently. (It's almost too similar so that we should consider generating the masking variants with some tablegen help.) Both encoding and intrinsic tests are added as well. For the latter, this is what the IR that the intrinsic test on the clang side generates. Part of <rdar://problem/17688758> llvm-svn: 214890	2014-08-05 17:23:04 +00:00
Jonathan Roelofs	f5fad3767b	Fix return sequence on armv4 thumb POP on armv4t cannot be used to change thumb state (unilke later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214881	2014-08-05 17:13:17 +00:00
David Blaikie	c74ffa9cab	Improve test for merged global debug info by using llvm-dwarfdump. It's a bit of a tradeoff, since llvm-dwarfdump doesn't print the name of the global symbol being used as an address in the addressing mode, but this avoids the dependence on hardcoded set labels that keep changing (5+ commits over the last few years that each update the set label as it changes due to other, unrelated differences in output). This could've, instead, been changed to match the set name then match the name in the string pool but that would present other issues (needing to skip over the sets that weren't of interest, etc) and checking that the addresses (granted, without relocations applied - so it's not the whole story) match in the two variable location descriptions seems sufficient and fairly stable here. There are a few similar other tests with similar label dependence that I'll update soonish. llvm-svn: 214878	2014-08-05 16:20:25 +00:00
Tom Stellard	229d5e669b	R600/SI: Update MUBUF assembly string to match AMD proprietary compiler llvm-svn: 214866	2014-08-05 14:48:12 +00:00
Tom Stellard	b37f797678	R600/SI: Avoid generating REGISTER_LOAD instructions. SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads. llvm-svn: 214865	2014-08-05 14:40:52 +00:00
Yi Kong	e56de69500	AArch64: Add support for instruction prefetch intrinsic Instruction prefetch is not implemented for AArch64, it is incorrectly translated into data prefetch instruction. Differential Revision: http://reviews.llvm.org/D4777 llvm-svn: 214860	2014-08-05 12:46:47 +00:00
Chandler Carruth	947cef191d	[x86] Fix a crash and wrong-code bug in the new vector lowering all found by a single test reduced out of a failure on llvm-stress. The start of the problem (and the crash) came when we tried to use a find of a non-used slot in the move-to half of the move-mask as the target for two bad-half inputs. While if lucky this will be the first of a pair of slots which we can place the bad-half inputs into, it isn't actually guaranteed. This really isn't surprising, not sure what I was thinking. The correct way to find the two unused slots is to look for one of the used slots. We know it isn't that pair, and we can use some modular arithmetic to find the other pair by masking off the odd bit and adding 2 modulo 4. With this, we reliably found a viable pair of slots for the bad-half inputs. Sadly, that wasn't enough. We also had a wrong code bug that surfaced when I reduced the test case for this where we would use the same slot twice for the two bad inputs. This is because both of the bad inputs could be in odd slots originally and thus the mod-2 mapping would actually be the same. The whole point of the weird indexing into the pair of empty slots was to try to leverage when the end result needed the two bad-half inputs to be paired in a dword and pre-pair them in the correct orrientation. This is less important with the powerful combining we're now doing, and also easier and more reliable to achieve be noting that we add the bad-half inputs in order. Thus, if they are in a dword pair, the low part of that will be the first input in the sequence. Always putting that in the low element will just do the right thing in addition to computing the correct result. Test case added. =] llvm-svn: 214849	2014-08-05 08:19:21 +00:00
Juergen Ributzka	a126d1ef3c	[FastISel][AArch64] Implement the FastLowerArguments hook. This implements basic argument lowering for AArch64 in FastISel. It only handles a small subset of the C calling convention. It supports simple arguments that can be passed in GPR and FPR registers. This should cover most of the trivial cases without falling back to SelectionDAG. This fixes <rdar://problem/17890986>. llvm-svn: 214846	2014-08-05 05:43:48 +00:00
Kevin Qin	ec100526e3	Revert "r214832 - MachineCombiner Pass for selecting faster instruction" It broke compiling of most Benchmark and internal test, as clang got clashed by segmentation fault or assertion. llvm-svn: 214845	2014-08-05 05:43:47 +00:00
Juergen Ributzka	51f5326e25	[FastISel][AArch64] Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended. llvm-svn: 214844	2014-08-05 05:43:44 +00:00
Gerolf Hoflehner	4dbf44b9d8	MachineCombiner Pass for selecting faster instruction sequence on AArch64 Re-commit of r214669 without changes to test cases LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and LLVM:: CodeGen/AArch64/dp-3source.ll This resolves the reported compfails of the original commit. llvm-svn: 214832	2014-08-05 01:16:13 +00:00
Bill Schmidt	f04e998e00	[PPC64LE] Fix wrong IR for vec_sld and vec_vsldoi My original LE implementation of the vsldoi instruction, with its altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect shufflevector operations in the LLVM IR. Correct code is generated because the back end handles the incorrect shufflevector in a consistent manner. This patch and a companion patch for Clang correct this problem by removing the fixup from altivec.h and the corresponding fixup from the PowerPC back end. Several test cases are also modified to reflect the now-correct LLVM IR. llvm-svn: 214800	2014-08-04 23:21:01 +00:00
Juergen Ributzka	53533e885a	[FastISel][AArch64] Fix shift lowering for i8 and i16 value types. This fix changes the parameters #r and #s that are passed to the UBFM/SBFM instruction to get the zero/sign-extension for free. The original problem was that the shift left would use the 32-bit shift even for i8/i16 value types, which could leave the upper bits set with "garbage" values. The arithmetic shift right on the other side would use the wrong MSB as sign-bit to determine what bits to shift into the value. This fixes <rdar://problem/17907720>. llvm-svn: 214788	2014-08-04 21:49:51 +00:00
Chandler Carruth	40dbd382ad	[SDAG] Fix a really, really terrible bug in the DAG combiner. This code is completely wrong. It is also dead, as if it were to ever run, it would crash. Fortunately, after my work to the combiner, it is at least possible to reach the code, and llvm-stress has found a test case. Thanks to Patrick for reporting. It would be really good if anyone who remembers how this code works and what it was intended to do could add some more obvious test coverage instead of my completely contrived and reduced test case. My test case was so brittle I left a bread crumb comment in it to help the next person to stumble on it and not know what it was actually testing for. llvm-svn: 214785	2014-08-04 21:29:59 +00:00
Chad Rosier	5908ab4dd6	[AArch64] Extend the number of scalar instructions supported in the AdvSIMD scalar integer instruction pass. This is a patch I had lying around from a few months ago. The pass is currently disabled by default, so nothing to interesting. llvm-svn: 214779	2014-08-04 21:20:25 +00:00
Joerg Sonnenberger	6d05a2b461	MC uses .lcomm now, so adjust. llvm-svn: 214776	2014-08-04 21:06:00 +00:00
Reid Kleckner	e704010450	Fix failure to invoke exception handler on Win64 When the last instruction prior to a function epilogue is a call, we need to emit a nop so that the return address is not in the epilogue IP range. This is consistent with MSVC's behavior, and may be a workaround for a bug in the Win64 unwinder. Differential Revision: http://reviews.llvm.org/D4751 Patch by Vadim Chugunov! llvm-svn: 214775	2014-08-04 21:05:27 +00:00
Ulrich Weigand	983341d3f3	[PowerPC] Add target triple to vec_urem_const.ll test case This should hopefully fix build bots on other architectures. llvm-svn: 214721	2014-08-04 14:55:26 +00:00
Robert Khasanov	7ca7df0bf9	[SKX] Enabling load/store instructions: encoding Instructions: VMOVAPD, VMOVAPS, VMOVDQA8, VMOVDQA16, VMOVDQA32,VMOVDQA64, VMOVDQU8, VMOVDQU16, VMOVDQU32,VMOVDQU64, VMOVUPD, VMOVUPS, Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 214719	2014-08-04 14:35:15 +00:00
Ulrich Weigand	cc9909b881	[PowerPC] Swap arguments to vpkuhum/vpkuwum on little-endian In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl* by swapping the input arguments. As it turns out, the exact same fix is also required for the vpkuhum/vpkuwum patterns. This fixes another regression in llvmpipe when vector support is enabled. Reviewed by Bill Schmidt. llvm-svn: 214718	2014-08-04 13:53:40 +00:00
Ulrich Weigand	51eccec5d9	[PowerPC] MULHU/MULHS are not legal for vector types I ran into some test failures where common code changed vector division by constant into a multiply-high operation (MULHU). But these are not implemented by the back-end, so we failed to recognize the insn. Fixed by marking MULHU/MULHS as Expand for vector types. llvm-svn: 214716	2014-08-04 13:27:12 +00:00
Ulrich Weigand	c4cc7febb0	[PowerPC] Fix and improve vector comparisons This patch refactors code generation of vector comparisons. This fixes a wrong code-gen bug for ISD::SETGE for floating-point types, and improves generated code for vector comparisons in general. Specifically, the patch moves all logic deciding how to implement vector comparisons into getVCmpInst, which gets two extra boolean outputs indicating to its caller whether its needs to swap the input operands and/or negate the result of the comparison. Apart from implementing these two modifications as directed by getVCmpInst, there is no need to ever implement vector comparisons in any other manner; in particular, there is never a need to perform two separate comparisons (e.g. one for equal and one for greater-than, as code used to do before this patch). Reviewed by Bill Schmidt. llvm-svn: 214714	2014-08-04 13:13:57 +00:00
Chandler Carruth	0e2ddb2790	[x86] Just unilaterally prefer SSSE3-style PSHUFB lowerings over clever use of PACKUS. It's cleaner that way. I looked at implementing clever combine-based folding of PACKUS chains into PSHUFB but it is quite hard and doesn't seem likely to be worth it. The most annoying part would be detecting that the correct masking had been done to use PACKUS-style instructions as a blend operation rather than there being any saturating as is indicated by its name. We generate really nice code for what few test cases I've come up with that aren't completely contrived for this by just directly prefering PSHUFB and so let's go with that strategy for now. =] llvm-svn: 214707	2014-08-04 10:17:35 +00:00
Chandler Carruth	06e6f1cae2	[x86] Implement more aggressive use of PACKUS chains for lowering common patterns of v16i8 shuffles. This implements one of the more important FIXMEs for the SSE2 support in the new shuffle lowering. We now generate the optimal shuffle sequence for truncate-derived shuffles which show up essentially everywhere. Unfortunately, this exposes a weakness in other parts of the shuffle logic -- we can no longer form PSHUFB here. I'll add the necessary support for that and other things in a subsequent commit. llvm-svn: 214702	2014-08-04 09:40:02 +00:00
Kevin Qin	f31ecf3fea	Revert "r214669 - MachineCombiner Pass for selecting faster instruction" This commit broke "make check" for several hours, so get it reverted. llvm-svn: 214697	2014-08-04 05:10:33 +00:00
Chandler Carruth	37a18821cd	[x86] Handle single input shuffles in the SSSE3 case more intelligently. I spent some time looking into a better or more principled way to handle this. For example, by detecting arbitrary "unneeded" ORs... But really, there wasn't any point. We just shouldn't build blatantly wrong code so late in the pipeline rather than adding more stages and logic later on to fix it. Avoiding this is just too simple. llvm-svn: 214680	2014-08-04 01:14:24 +00:00
Chandler Carruth	7bbfd245b0	[x86] Fix the test case added in r214670 and tweaked in r214674 further. Fundamentally, there isn't a really portable way to test the constant pool contents. Instead, pin this test to the bare-metal triple. This also makes it a 64-bit triple which allows us to only match a single constant pool rather than two. It can also just hard code the '.' prefix as the format should be stable now that it has a fixed triple. Finally, I've switched it to use CHECK-NEXT to be more precise in the instruction sequence expected and to use variables rather than hard coding decisions by the register allocator. llvm-svn: 214679	2014-08-04 00:54:28 +00:00
Sanjay Patel	065cabf43e	Account for possible leading '.' in label string. llvm-svn: 214674	2014-08-03 23:20:16 +00:00
Sanjay Patel	2ef67440fc	fix for PR20354 - Miscompile of fabs due to vectorization This is intended to be the minimal change needed to fix PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation. This patch will not generate the optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon. There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too. llvm-svn: 214670	2014-08-03 22:48:23 +00:00
Gerolf Hoflehner	35ba467122	MachineCombiner Pass for selecting faster instruction sequence - AArch64 target support This patch turns off madd/msub generation in the DAGCombiner and generates them in the MachineCombiner instead. It replaces the original code sequence with the combined sequence when it is beneficial to do so. When there is no machine model support it always generates the madd/msub instruction. This is true also when the objective is to optimize for code size: when the combined sequence is shorter is always chosen and does not get evaluated. When there is a machine model the combined instruction sequence is evaluated for critical path and resource length using machine trace metrics and the original code sequence is replaced when it is determined to be faster. rdar://16319955 llvm-svn: 214669	2014-08-03 22:03:40 +00:00
Matt Arsenault	9215b17eb7	R600/SI: Fix extra whitespace in asm str This slipped in in r214467, so something like V_MOV_B32_e32 v0, ... is now printed with 2 spaces between the instruction name and first operand. llvm-svn: 214660	2014-08-03 05:27:14 +00:00
James Molloy	6b999ae682	Update test to use a more modern AArch64 triple, as requested by Renato. llvm-svn: 214637	2014-08-02 17:15:11 +00:00
James Molloy	ce45be0465	[AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available. The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers! llvm-svn: 214634	2014-08-02 14:51:24 +00:00
Chandler Carruth	bec57b406d	[x86] Give this test a bare metal triple so it doesn't use the weird Darwin x86 asm comment prefix designed to work around GAS on that platform. That makes the comment-matching of the test much more stable. llvm-svn: 214629	2014-08-02 11:17:41 +00:00
Chandler Carruth	4c57955fe3	[x86] Largely complete the use of PSHUFB in the new vector shuffle lowering with a small addition to it and adding PSHUFB combining. There is one obvious place in the new vector shuffle lowering where we should form PSHUFBs directly: when without them we will unpack a vector of i8s across two different registers and do a potentially 4-way blend as i16s only to re-pack them into i8s afterward. This is the crazy expensive fallback path for i8 shuffles and we can just directly use pshufb here as it will always be cheaper (the unpack and pack are two instructions so even a single shuffle between them hits our three instruction limit for forming PSHUFB). However, this doesn't generate very good code in many cases, and it leaves a bunch of common patterns not using PSHUFB. So this patch also adds support for extracting a shuffle mask from PSHUFB in the X86 lowering code, and uses it to handle PSHUFBs in the recursive shuffle combining. This allows us to combine through them, combine multiple ones together, and generally produce sufficiently high quality code. Extracting the PSHUFB mask is annoyingly complex because it could be either pre-legalization or post-legalization. At least this doesn't have to deal with re-materialized constants. =] I've added decode routines to handle the different patterns that show up at this level and we dispatch through them as appropriate. The two primary test cases are updated. For the v16 test case there is still a lot of room for improvement. Since I was going through it systematically I left behind a bunch of FIXME lines that I'm hoping to turn into ALL lines by the end of this. llvm-svn: 214628	2014-08-02 10:39:15 +00:00
Chandler Carruth	34f9a987e9	[x86] Teach the target shuffle mask extraction to recognize unary forms of normally binary shuffle instructions like PUNPCKL and MOVLHPS. This detects cases where a single register is used for both operands making the shuffle behave in a unary way. We detect this and adjust the mask to use the unary form which allows the existing DAG combine for shuffle instructions to actually work at all. As a consequence, this uncovered a number of obvious bugs in the existing DAG combine which are fixed. It also now canonicalizes several shuffles even with the existing lowering. These typically are trying to match the shuffle to the domain of the input where before we only really modeled them with the floating point variants. All of the cases which change to an integer shuffle here have something in the integer domain, so there are no more or fewer domain crosses here AFAICT. Technically, it might be better to go from a GPR directly to the floating point domain, but detecting floating point outputs despite integer inputs is a lot more code and seems unlikely to be worthwhile in practice. If folks are seeing domain-crossing regressions here though, let me know and I can hack something up to fix it. Also as a consequence, a bunch of missed opportunities to form pshufb now can be formed. Notably, splats of i8s now form pshufb. Interestingly, this improves the existing splat lowering too. We go from 3 instructions to 1. Yes, we may tie up a register, but it seems very likely to be worth it, especially if splatting the 0th byte (the common case) as then we can use a zeroed register as the mask. llvm-svn: 214625	2014-08-02 10:27:38 +00:00
Akira Hatanaka	dc08c30df9	[ARM] In dynamic-no-pic mode, ARM's post-RA pseudo expansion was incorrectly expanding pseudo LOAD_STATCK_GUARD using instructions that are normally used in pic mode. This patch fixes the bug. <rdar://problem/17886592> llvm-svn: 214614	2014-08-02 05:40:40 +00:00
Matt Arsenault	4de324442b	R600: Cleanup fneg tests llvm-svn: 214612	2014-08-02 02:26:51 +00:00
Chandler Carruth	063f425ea7	[x86] Make some questionable tests not spew assembly to stdout, which makes a mess of the lit output when they ultimately fail. The 2012-10-02-DAGCycle test is really frustrating because the only explanation for what it is testing is a rdar link. I would really rather that rdar links (which are not public or part of the open source project) were not committed to the source code. Regardless, the actual problem must be described as the rdar link is completely opaque. The fact that this test didn't check for any particular output further exacerbates the inability of any other developer to debug failures. The mem-promote-integers test has nice comments and seems to be a great test for our lowering... except that we don't actually check that any of the generated code is correct or matches some pattern. We just avoid crashing. It would be great to go back and populate this test with the actual expectations. llvm-svn: 214605	2014-08-02 00:50:10 +00:00
Akira Hatanaka	3516669a50	[X86] Simplify X87 stackifier pass. Stop using ST registers for function returns and inline-asm instructions and use FP registers instead. This allows removing a large amount of code in the stackifier pass that was needed to track register liveness and handle copies between ST and FP registers and function calls returning floating point values. It also fixes a bug which manifests when an ST register defined by an inline-asm instruction was live across another inline-asm instruction, as shown in the following sequence of machine instructions: 1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5> 2. INLINEASM <es:fldcw $0> 3. %FP0<def> = COPY %ST0 <rdar://problem/16952634> llvm-svn: 214580	2014-08-01 22:19:41 +00:00
NAKAMURA Takumi	49a53507d0	llvm/test/CodeGen/Mips/cconv/arguments-varargs.ll: Add explicit -mtriple=(mips\|mipsel)-linux on 4 lines. llvm-svn: 214578	2014-08-01 22:15:38 +00:00
Tom Stellard	4973a13680	Revert "R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp" This reverts commit r214566. I did not mean to commit this yet. llvm-svn: 214572	2014-08-01 21:55:50 +00:00
Reid Kleckner	6a2de90039	MS inline asm: Hide symbol to attempt to fix test failure on darwin If the symbol comes from an external DSO, it apparently requires indirection through a register. llvm-svn: 214571	2014-08-01 21:54:37 +00:00
Tom Stellard	c16f73d7c5	R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit private loads. llvm-svn: 214566	2014-08-01 21:50:47 +00:00

1 2 3 4 5 ...

10447 Commits