llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	b2baffaffd	R600/SI: Fix offset folding in some cases with shifted pointers. Ordinarily (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) is only done if the add has one use. If the resulting constant add can be folded into an addressing mode, force this to happen for the pointer operand. This ends up happening a lot because of how LDS objects are allocated. Since the globals are allocated next to each other, acessing the first element of the second object is directly indexed by a shifted pointer. llvm-svn: 215739	2014-08-15 17:49:05 +00:00
Chandler Carruth	03f456abbe	[x86] Teach the new AVX v4f64 shuffle lowering to use UNPCK instructions where applicable for blending. llvm-svn: 215737	2014-08-15 17:42:00 +00:00
Matt Arsenault	2e7cc48baa	R600/SI: Add intrinsic for ldexp llvm-svn: 215734	2014-08-15 17:30:25 +00:00
Juergen Ributzka	a6faae3c11	[FastISel][ARM] Fix unit test from r215682. Thanks Jim for finding this. llvm-svn: 215733	2014-08-15 17:23:20 +00:00
Matt Arsenault	5015a89aa5	R600/SI: Implement isLegalAddressingMode The default assumes that a 16-bit signed offset is used. LDS instruction use a 16-bit unsigned offset, so it wasn't being used in some cases where it was assumed a negative offset could be used. More should be done here, but first isLegalAddressingMode needs to gain an addressing mode argument. For now, copy most of the rest of the default implementation with the immediate offset change. llvm-svn: 215732	2014-08-15 17:17:07 +00:00
Moritz Roth	8f3765625e	ARM: Fix and re-enable load/store optimizer for Thumb1. In a previous iteration of the pass, we would try to compensate for writeback by updating later instructions and/or inserting a SUBS to reset the base register if necessary. Since such a SUBS sets the condition flags it's not generally safe to do this. For now, only merge LDR/STRs if there is no writeback to the base register (LDM that loads into the base register) or the base register is killed by one of the merged instructions. These cases are clear wins both in terms of instruction count and performance. Also add three new test cases, and update the existing ones accordingly. llvm-svn: 215729	2014-08-15 17:00:30 +00:00
Amara Emerson	82da7d07a1	[AArch64] Narrow arguments passed in wrong position on the stack in big-endian mode. Patch by Asiri Rathnayake. Differential Revision: http://reviews.llvm.org/D4922 llvm-svn: 215716	2014-08-15 14:29:57 +00:00
Bill Schmidt	9bac9ca796	[PPC64] Add test case for r215685. I had deferred adding this test case until I could get it down to a reasonable size. That's done now. Thanks, Bill llvm-svn: 215711	2014-08-15 13:51:57 +00:00
Chandler Carruth	f88612581e	[x86] Add the initial skeleton of type-based dispatch for AVX vectors in the new shuffle lowering and an implementation for v4 shuffles. This allows us to handle non-half-crossing shuffles directly for v4 shuffles, both integer and floating point. This currently misses places where we could perform the blend via UNPCK instructions, but otherwise generates equally good or better code for the test cases included to the existing vector shuffle lowering. There are a few cases that are entertainingly better. ;] llvm-svn: 215702	2014-08-15 11:01:40 +00:00
Chandler Carruth	17fd848bfa	[x86] Fix the very broken formation of vpunpck instructions in the target-specific shuffl DAG combines. We were recognizing the paired shuffles backwards. This code needs to be replaced anyways as we have the same functionality elsewhere, but I'll do the refactoring in a follow-up, this is the minimal fix to the behavior. In addition to fixing miscompiles with the new vector shuffle lowering, it also causes the canonicalization to kick in much better, selecting the smaller encoding variants in lots of places in the new AVX path. This still isn't quite ideal as we don't need both the shufpd and the punpck instructions, but that'll get fixed in a follow-up patch. llvm-svn: 215690	2014-08-15 03:54:49 +00:00
Chandler Carruth	372c143c2f	[x86] Fix PR20540 where the x86 shuffle DAG combiner had completely broken logic for merging shuffle masks in the face of SM_SentinelZero mask operands. While these are '-1' they don't mean 'undef' the way '-1' means in the pre-legalized shuffle masks. Instead, they mean that the shuffle operation is forcibly zeroing that lane. Reflect this and explicitly handle it in a bunch of places. In one place the effect is equivalent but much more clear. In the rest it was really weirdly broken. Also, rewrite the entire merging thing to be a more directy operation with a single loop and just doing math to map the indices through the various masks. Also add a bunch of asserts to try to make in extremely clear what the different masks can possibly look like. Finally, add some comments to clarify that we're merging shuffle masks up here rather than down as we do everywhere else, and thus the logic is quite confusing. Thanks to several different people for sending test cases, and for Robert Khasanov for an initial attempt at fixing. llvm-svn: 215687	2014-08-15 02:43:18 +00:00
Juergen Ributzka	81db58e177	[FastISel][ARM] Fall-back to constant pool loads when materializing an i32 constant. FastEmit_i won't always succeed to materialize an i32 constant and just fail. This would trigger a fall-back to SelectionDAG, which is really not necessary. This fix will first fall-back to a constant pool load to materialize the constant before giving up for good. This fixes <rdar://problem/18022633>. llvm-svn: 215682	2014-08-14 23:29:49 +00:00
Juergen Ributzka	790bacf232	Revert several FastISel commits to track down a buildbot error. This reverts: r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants." r215594 "[FastISel][X86] Use XOR to materialize the "0" value." r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization." r215591 "[FastISel][AArch64] Make use of the zero register when possible." r215588 "[FastISel] Let the target decide first if it wants to materialize a constant." r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI." llvm-svn: 215673	2014-08-14 19:56:28 +00:00
Adam Nemet	241c3252c3	[AVX512] Add test for FMA masking instrinsics llvm-svn: 215665	2014-08-14 17:13:33 +00:00
Adam Nemet	6fa675754a	[AVX512] Switch FMA intrinsics to the masking version This does the renaming and updates the lowering logic. Part of <rdar://problem/17688758> llvm-svn: 215664	2014-08-14 17:13:30 +00:00
Juergen Ributzka	34ed422c42	Revert "[FastISel][AArch64] Add support for more addressing modes." This reverts commits r215597, because it might have broken the build bots. llvm-svn: 215659	2014-08-14 17:10:54 +00:00
Sanjay Patel	35d3133650	optimize vector fneg of bitcasted integer value This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops. This patch is very similar to a fabs patch committed at r214892. Differential Revision: http://reviews.llvm.org/D4852 llvm-svn: 215646	2014-08-14 15:15:28 +00:00
Toma Tabacu	726f1ea2c5	[mips] Improve robustness of some tests. Summary: This is done by removing some hardcoded registers like $at or expecting a single digit register to be selected. Contains work done by Matheus Almeida. Reviewers: matheusalmeida, dsanders Reviewed By: dsanders Subscribers: tomatabacu Differential Revision: http://reviews.llvm.org/D4227 llvm-svn: 215640	2014-08-14 13:10:48 +00:00
Chandler Carruth	a8311b3681	[x86] Begin stubbing out the AVX support in the new vector shuffle lowering scheme. Currently, this just directly bails to the fallback path of splitting the 256-bit vector into two 128-bit vectors, operating there, and then joining the results back together. While the results are far from perfect, they are shockingly good for what we're doing here. I'll be layering the rest of the functionality on top of this piece by piece and updating tests as I go. Note that 256-bit vectors in this mode are still somewhat WIP. While I think the code paths that I'm adding here are clean and good-to-go, there are still a lot of 128-bit assumptions that I'll need to stomp out as I march through the functional spread here. llvm-svn: 215637	2014-08-14 12:13:59 +00:00
Chandler Carruth	7cd15be784	[SDAG] Fix a bug in the DAG combiner where we would fail to return the input node after manually adding it to the worklist and using CombineTo. Once we use CombineTo the input node may have been deleted. Despite this being completely confusing and somewhat broken, the only way to "correctly" return from a DAG combine after potentially deleting the input node is to return that exact node.... But really, this code should just never have used CombineTo. It won't do what it wants (returning the node as mentioned above just causes the combine to infloop). The correct way to combine away a casted load to a load of the correct type is to RAUW the chain directly and then return the loaded value to replace the actual value node. I managed to find this with the vector shuffle fuzzer even though it clearly has nothing at all to do with vector shuffles and rather those happen to trigger a load of a constant pool that hits this combine just right. I've included the test as it is small and a nice stress test that the infrastructure isn't asserting. llvm-svn: 215622	2014-08-14 08:18:34 +00:00
Chandler Carruth	8039b16de7	[SDAG] Fix a case where we would iteratively legalize a node during combining by replacing it with something else but not re-process the node afterward to remove it. In a truly remarkable stroke of bad luck, this would (in the test case attached) end up getting some other node combined into it without ever getting re-processed. By adding it back on to the worklist, in addition to deleting the dead nodes more quickly we also ensure that if it stops being dead for any reason it makes it back through the legalizer. Without this, the test case will end up failing during instruction selection due to an and node with a type we don't have an instruction pattern for. It took many million runs of the shuffle fuzz tester to find this. llvm-svn: 215611	2014-08-14 01:07:37 +00:00
Akira Hatanaka	b74db09c97	[AArch64, fast-isel] Fall back to SelectionDAG to select tail calls. Certain functions such as objc_autoreleaseReturnValue have to be called as tail-calls even at -O0. Since normal fast-isel doesn't emit calls as tail calls, we have to fall back to SelectionDAG to select calls that are marked as tail. <rdar://problem/17991614> llvm-svn: 215600	2014-08-13 23:23:58 +00:00
Juergen Ributzka	98347d902e	[FastISel][AArch64] Add support for more addressing modes. FastISel didn't take much advantage of the different addressing modes available to it on AArch64. This commit allows the ComputeAddress method to recognize more addressing modes that allows shifts and sign-/zero-extensions to be folded into the memory operation itself. For Example: lsl x1, x1, #3 --> ldr x0, [x0, x1, lsl #3] ldr x0, [x0, x1] sxtw x1, w1 lsl x1, x1, #3 --> ldr x0, [x0, x1, sxtw #3] ldr x0, [x0, x1] llvm-svn: 215597	2014-08-13 22:53:29 +00:00
Juergen Ributzka	0f8bc043c5	[FastISel][X86] Add large code model support for materializing floating-point constants. In the large code model for X86 floating-point constants are placed in the constant pool and materialized by loading from it. Since the constant pool could be far away, a PC relative load might not work. Therefore we first materialize the address of the constant pool with a movabsq and then load from there the floating-point value. Fixes <rdar://problem/17674628>. llvm-svn: 215595	2014-08-13 22:25:35 +00:00
Juergen Ributzka	ba8b79e932	[FastISel][X86] Use XOR to materialize the "0" value. llvm-svn: 215594	2014-08-13 22:22:17 +00:00
Juergen Ributzka	230494b399	[FastISel][X86] Emit more efficient instructions for integer constant materialization. This mostly affects the i64 value type, which always resulted in an 15byte mobavsq instruction to materialize any constant. The custom code checks the value of the immediate and tries to use a different and smaller mov instruction when possible. This fixes <rdar://problem/17420988>. llvm-svn: 215593	2014-08-13 22:18:11 +00:00
Juergen Ributzka	24080d60fa	[FastISel][AArch64] Make use of the zero register when possible. This change materializes now the value "0" from the zero register. The zero register can be folded by several instruction, so no materialization is need at all. Fixes <rdar://problem/17924413>. llvm-svn: 215591	2014-08-13 22:13:14 +00:00
Juergen Ributzka	7cee768e55	[FastISel] Let the target decide first if it wants to materialize a constant. This changes the order in which FastISel tries to materialize a constant. Originally it would try to use a simple target-independent approach, which can lead to the generation of inefficient code. On X86 this would result in the use of movabsq to materialize any 64bit integer constant - even for simple and small values such as 0 and 1. Also some very funny floating-point materialization could be observed too. On AArch64 it would materialize the constant 0 in a register even the architecture has an actual "zero" register. On ARM it would generate unnecessary mov instructions or not use mvn. This change simply changes the order and always asks the target first if it likes to materialize the constant. This doesn't fix all the issues mentioned above, but it enables the targets to implement such optimizations. Related to <rdar://problem/17420988>. llvm-svn: 215588	2014-08-13 22:08:02 +00:00
Juergen Ributzka	a5b083853c	[FastISel][ARM] Use MOVT/MOVW if the subtarget requests it. This change is also in preparation for a future change to make sure that the constant materialization uses MOVT/MOVW when available and not a load from the constant pool. llvm-svn: 215584	2014-08-13 21:42:19 +00:00
Matt Arsenault	74ef277774	R600: Correctly set the src value offset for scalarized kernel args This for some reason fixes v1i64 kernel arguments on pre-SI. This currently breaks some other cases in the kernel-args.ll test for R600, but I'm not particularly confident in the new output. VTX_READ_* are not used for some of the scalarized cases, and the code reading from the constant buffer doesn't make much sense to me. llvm-svn: 215564	2014-08-13 18:14:11 +00:00
Andrea Di Biagio	ace8e1e3d4	[DAGCombiner] Improved target independent vector shuffle combine rule. This patch improves the existing algorithm in DAGCombiner that attempts to fold shuffles according to rule: shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3) Before this change, there were cases where the DAGCombiner conservatively avoided folding shuffles even if the resulting mask would have been legal. That is because the algorithm wrongly assumed that commuting an illegal shuffle mask would always produce an illegal mask. With this change, we now correctly compute the commuted shuffle mask before calling method 'isShuffleMaskLegal' on it. On X86, this improves for example the codegen for the following function: define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) { %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7> %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3> ret <4 x i32> %2 } Before this change the X86 backend (-mcpu=corei7) generated the following assembly code for function @test: shufps $-23, %xmm0, %xmm1 # xmm1 = xmm1[1,2],xmm0[2,3] movhlps %xmm1, %xmm1 # xmm1 = xmm1[1,1] movaps %xmm1, %xmm0 Now we produce: movhlps %xmm0, %xmm0 # xmm0 = xmm0[1,1] Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly fold according to the above-mentioned rule. llvm-svn: 215555	2014-08-13 16:09:40 +00:00
Robert Khasanov	ed8829703f	[SKX] Extended non-temporal load/store instructions for AVX512VL subsets. Added avx512_movnt_vl multiclass for handling 256/128-bit forms of instruction. Added encoding and lowering tests. Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 215536	2014-08-13 10:46:00 +00:00
Elena Demikhovsky	51bbd011c3	AVX-512: Fixed a bug in shufflevector lowering. PALIGNR instruction does not exist in AVX-512F set. Added a test. llvm-svn: 215526	2014-08-13 07:58:43 +00:00
Chandler Carruth	b7eda21bb0	[x86] Rewrite a core part of the new vector shuffle lowering to handle one pesky test case correctly. This test case caused the old code to infloop occilating between solving the low-half and the high-half. The 'side balancing' part of single-input v8 shuffle lowering didn't handle the one pattern which can cause it to occilate. Fortunately the fuzz testing found this case. Unfortuately it was terrible to handle. I'm really sorry for the amount and density of the code here, I'd love suggestions on how to simplify it. I feel like there must be a simpler form here, but after a lot of days I've not found it. This is the only one I've found that even works. I've added the one pesky test case along with some nice comments explaining the core problem that we have to solve here. So far this has survived approximately 32k test cases. More strenuous fuzzing commencing. llvm-svn: 215519	2014-08-13 01:25:45 +00:00
Hal Finkel	46ef7ce283	[PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsic This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store intrinsics. As with the construction of the MachineMemOperands for the intrinsic calls used for unaligned load/store lowering, the only slight complication is that we need to represent a larger memory range than the loaded/stored value-type size (because the address is rounded down to an aligned address, and we need to conservatively represent the entire possible range of the actual access). This required adding an extra size field to TargetLowering::IntrinsicInfo, and this was done in a way that required no modifications to other targets (the size defaults to the store size of the provided memory data type). This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed). llvm-svn: 215512	2014-08-13 01:15:40 +00:00
Hal Finkel	415e344f29	Fix classof for ISD::INTRINSIC_W_CHAIN and INTRINSIC_VOID Unfortunately, our use of the SDNode class hierarchy for INTRINSIC_W_CHAIN and INTRINSIC_VOID nodes is somewhat broken right now. These nodes sometimes are used for memory intrinsics (those with MachineMemOperands), and sometimes not. When not, the nodes are not created as instances of MemIntrinsicSDNode, but rather created as some other subclass of SDNode using DAG::getNode. When they are memory intrinsics, they are created using DAG::getMemIntrinsicNode as instances of MemIntrinsicSDNode. MemIntrinsicSDNode is a subclass of MemSDNode, but prior to r214452, we had a non-self-consistent setup whereby MemIntrinsicSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would return true but MemSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would return false. In r214452, MemSDNode::classof was changed to return true for INTRINSIC_W_CHAIN and INTRINSIC_VOID, which is now self-consistent. The problem is that neither the pre-r214452 logic and the post-r214452 logic are really right. The truth is that not all INTRINSIC_W_CHAIN and INTRINSIC_VOID nodes are instances of MemIntrinsicSDNode (or MemSDNode for that matter), and the return value from classof needs to reflect that. This was broken before r214452 (because MemIntrinsicSDNode::classof always returned true), and was broken afterward (because MemSDNode::classof also always returned true), and will now be correct. The minimal solution is to grab one of the SubclassData bits (there is one left for MemIntrinsicSDNode nodes) and use it to store whether or not a particular INTRINSIC_W_CHAIN or INTRINSIC_VOID is really an instance of MemIntrinsicSDNode or not. Doing this allows both MemIntrinsicSDNode::classof and MemSDNode::classof to return the correct answer for the underlying object for both the memory-intrinsic and non-memory-intrinsic cases. This fixes the problem that r214452 created in the SelectionDAGDumper (thanks to Matt Arsenault for pointing it out). Because PowerPC does not implement getTgtMemIntrinsic, this change breaks test/CodeGen/PowerPC/unal-altivec-wint.ll. I've XFAILed it for now, and will fix it in a follow-up commit. llvm-svn: 215511	2014-08-13 01:15:37 +00:00
Adam Nemet	18355283ce	[AVX512] Verify the code generated for the intrinsic _mm512_broadcastsd_pd llvm-svn: 215487	2014-08-13 00:30:05 +00:00
Adam Nemet	cee9d0a460	[AVX512] Handle valign masking intrinsic via C++ lowering I think that this will scale better in most cases than adding a Pat<> for each mapping from the intrinsic DAG to the intruction (i.e. rri, rrik, rrikz). We can just lower to the SDNode and have the resulting DAG be matches by the DAG patterns. Alternatively (long term), we could keep the Pat<>s but generate them via the new AVX512_masking multiclass. The difficulty is that in order to formulate that we would have to concatenate DAGs. Currently this is only supported if the operators of the input DAGs are identical. llvm-svn: 215473	2014-08-12 21:13:12 +00:00
Jan Vesely	e5ca27d716	R600: Use optimized 24bit path in udivrem v2: drop enum keyword use correct extension mode don't bother computing the sign in unsinged case Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215462	2014-08-12 17:31:20 +00:00
Jan Vesely	4a33bc6206	R600: Use i24 optimized path for SREM v2: add tests rename LowerSDIV24 to LowerSDIVREM24 handle the rem part in this function Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215460	2014-08-12 17:31:17 +00:00
Gerolf Hoflehner	eb90500d06	[MachineCombiner] Fix for ICE bug 20598 The combiner ignored DBG nodes when checking the uses of a virtual register. It combined a sequence like %vreg1 = madd %vreg2, %vreg3,... DBG_VALUE (%vreg1 ...) %vreg4 = add %vreg1,... to %vreg4 = madd %vreg2, %vreg3 leaving behind a dangling DBG_VALUE with a definition. This triggered an assertion in the MachineTraceMetrics.cpp module. llvm-svn: 215431	2014-08-12 07:54:12 +00:00
Michael J. Spencer	6b2f5b47d2	[x86] Fold extract_vector_elt of a load into the Load's address computation. llvm-svn: 215409	2014-08-11 23:49:33 +00:00
Tom Stellard	155bbb7713	R600/SI: Add a ComplexPattern for selecting MUBUF _OFFSET variant This saves us from having to copy a 64-bit 0 value into VGPRs for BUFFER_* instruction which only have a 12-bit immediate offset. llvm-svn: 215399	2014-08-11 22:18:17 +00:00
Tom Stellard	48194cdd81	R600/SI: Add check for low 32 bits of encoding to mubuf tests There are no variable values like registers encoded in the low 32 bits of MUBUF instructions, so it is relatively easy to check these bits, and it will help prevent us from introducing encoding bugs. llvm-svn: 215397	2014-08-11 22:18:11 +00:00
Tom Stellard	93ba12f163	R600/SI: Clear lds bit on MUBUF instructions used for private stores This bit was left uninitialized, which was causing some random failures of piglit tests. NOTE: This is a candidate for the 3.5 branch. llvm-svn: 215396	2014-08-11 22:18:09 +00:00
Tom Stellard	e04fd9d430	R600/SI: Fix broken test llvm-svn: 215395	2014-08-11 22:18:05 +00:00
Quentin Colombet	c64c175173	[AArch64] Fix registerAllocator assigns same register for base and wback in pre/post-index load and store. Patch by Steven Wu <stevenwu@apple.com> llvm-svn: 215390	2014-08-11 21:39:53 +00:00
Saleem Abdulrasool	27c78bf131	ARM: try harder to detect non-IT eligible instructions For many Thumb-1 register register instructions, setting the CPSR is not permitted inside an IT block. We would not correctly flag those instructions. The previous change to identify this scenario was insufficient as it did not actually catch all the instances. The current list is formed by manual inspection of the ARMv6M ARM. The change to the Thumb2 IT block test is due to the fact that the new more stringent checking of the MIs results in the If Conversion pass being prevented from executing (since not all the instructions in the BB are predicable). This results in code gen changes. Thanks to Tim Northover for pointing out that the previous patch was insufficient and hinting that the use of the v6M ARM would be much easier to use than the v7 or v8! llvm-svn: 215382	2014-08-11 20:13:25 +00:00
Sanjay Patel	1f80cde804	Correct a missing RUN line in the ARM codegen test for fneg ops. We should also explicitly specify +/-neonfp. The bug was introduced at r99570 when use of "-arm-use-neon-fp" was removed. Differential Revision: http://reviews.llvm.org/D4846 llvm-svn: 215377	2014-08-11 19:04:28 +00:00
Oliver Stannard	11790b2dac	ARM: __gnu_h2f_ieee and __gnu_f2h_ieee always use the soft-float calling convention By default, LLVM uses the "C" calling convention for all runtime library functions. The half-precision FP conversion functions use the soft-float calling convention, and are needed for some targets which use the hard-float convention by default, so must have their calling convention explicitly set. llvm-svn: 215348	2014-08-11 09:12:32 +00:00
Jiangning Liu	dd6e12d71c	In Machine CSE pass, the source register of a COPY machine instruction can be propagated to all its users, and this propagation could increase the probability of finding common subexpressions. If the COPY has only one user, the COPY itself can be removed. llvm-svn: 215344	2014-08-11 05:17:19 +00:00
Petar Jovanovic	3a908a0bfc	Add support for scalarizing cttz_zero_undef Follow up to r214266. Add missing case in ScalarizeVectorResult() for cttz_zero_undef. Differential Revision: http://reviews.llvm.org/D4813 llvm-svn: 215330	2014-08-10 22:49:54 +00:00
Saleem Abdulrasool	ed8885b402	ARM: correct isPredicable for MULS in ThHUMB mode The ARM ARM states that CPSR may not be updated by a MUL in thumb mode. Due to an ordering of Thumb 2 Size Reduction and If Conversion, we would end up generating a THUMB MULS inside an IT block. The If Conversion pass uses the TTI isPredicable method to ensure that it can transform a Basic Block. However, because we only check for IT handling on Thumb2 functions, we may miss some cases. Even then, it only validates that the CPSR is not live rather than it is not accessed. This corrects the handling for that particular case since the same restriction does not hold on the vast majority of the instructions. This does prevent the IfConversion optimization from kicking in in certain cases, but generating correct code is more valuable. Addresses PR20555. llvm-svn: 215328	2014-08-10 22:20:37 +00:00
Tom Stellard	c0503db9e2	R600/SI: Custom lower CONCAT_VECTORS This will lower them using register copies rather than loads and stores to the stack. llvm-svn: 215270	2014-08-09 01:06:56 +00:00
Tom Stellard	4f575f7aaf	R600/SI: Update concat_vectors.ll to check for scratch usage These tests were using SI-NOT: MOVREL to make sure concat vectors weren't being lowered to stack loads and stores, but we are using scratch buffers for the stack now instead of registers, so we need to add an additional SI-NOT check for scratch buffers. With this change I was able to uncover one broken test which will be fixed in a future commit. llvm-svn: 215269	2014-08-09 01:06:53 +00:00
Joerg Sonnenberger	7ee0f31a8b	Provide an implementation of getNoopForMachoTarget for PPC, otherwise empty functions will assert in the MC object writer. llvm-svn: 215238	2014-08-08 19:13:23 +00:00
Juergen Ributzka	793f28d274	[FastISel][X86] Fix INC/DEC optimization (r215230) I accidentally also used INC/DEC for unsigned arithmetic which doesn't work, because INC/DEC don't set the required flag which is used for the overflow check. llvm-svn: 215237	2014-08-08 18:47:04 +00:00
Juergen Ributzka	4022614899	[FastISel][X86] Use INC/DEC when possible for {sadd\|ssub}.with.overflow intrinsics. This is a small peephole optimization to emit INC/DEC when possible. Fixes <rdar://problem/17952308>. llvm-svn: 215230	2014-08-08 17:21:37 +00:00
Daniel Sanders	feb613028b	[mips] Invert the abicalls feature bit to be noabicalls so that it's possible for -mno-abicalls to take effect. Also added the testcase that should have been in r215194. This behaviour has surprised me a few times now. The problem is that the generated MipsSubtarget::ParseSubtargetFeatures() contains code like this: if ((Bits & Mips::FeatureABICalls) != 0) IsABICalls = true; so '-abicalls' means 'leave it at the default' and '+abicalls' means 'set it to true'. In this case, (and the similar -modd-spreg case) I'd like the code to be IsABICalls = (Bits & Mips::FeatureABICalls) != 0; or possibly: if ((Bits & Mips::FeatureABICalls) != 0) IsABICalls = true; else IsABICalls = false; and preferably arrange for 'Bits & Mips::FeatureABICalls' to be true by default (on some triples). llvm-svn: 215211	2014-08-08 15:47:17 +00:00
Jiangning Liu	dcc651f99f	[AArch64] Fix a type conversion bug for anlyzing compare. The bug can cause spec2006/483.xalancbmk failure. Patched by David Xu. llvm-svn: 215206	2014-08-08 14:19:29 +00:00
Daniel Sanders	c30f30fe8a	[mips] Remove reason for XFAIL from a test that isn't actually XFAILed. llvm-svn: 215201	2014-08-08 12:58:17 +00:00
James Molloy	3feea9c11a	[AArch64] Add an FP load balancing pass for Cortex-A57 For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations. This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU. Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness). This optimization affects S-register and D-register floating point multiplies and FMADD/FMAs, as well as vector (floating point only) muls and FMADD/FMA. Q register instructions (and 128-bit vector instructions) are not affected. llvm-svn: 215199	2014-08-08 12:33:21 +00:00
Tim Northover	0f18ff9817	AArch64: stop trying to take control of all UnknownArch triples. This short-circuited our error reporting for incorrectly specified target triples (you'd get AArch64 code instead). Should fix PR20567. llvm-svn: 215191	2014-08-08 08:27:44 +00:00
Patrik Hagglund	b0e86ec814	[pr19635] Revert most of r170537, and add new testcase. Patch provided by Andrey Kuharev. Sorry, r170537 was obviously wrong. llvm-svn: 215190	2014-08-08 08:21:19 +00:00
Adam Nemet	7d498629f1	[AVX512] Add zero-masking variant to AVX512_masking multiclass This completes one item from the todo-list of r215125 "Generate masking instruction variants with tablegen". The AddedComplexity is needed just like for the k variant. Added a codegen test based on valignq. llvm-svn: 215173	2014-08-07 23:53:38 +00:00
Adam Nemet	fa1f7201fc	[AVX512] Add codegen test for the masking variant of valign The AddedComplexity is needed just like in avx512_perm_3src. There may be a bug in the complexity computation... llvm-svn: 215168	2014-08-07 23:18:18 +00:00
Akira Hatanaka	5acc58fcfb	[stack protector] Look through bitcasts to get global variable __stack_chk_guard. Handle the case where the pointer operand of the load instruction that loads the stack guard is not a global variable but instead a bitcast. %StackGuard = load i8 bitcast (i64 @__stack_chk_guard to i8*) call void @llvm.stackprotector(i8 %StackGuard, i8** %StackGuardSlot) Original test case provided by Ana Pazos. This fixes PR20558. llvm-svn: 215167	2014-08-07 23:08:24 +00:00
Adrian Prantl	80c8b2742f	Make these regexes stricter by disallowing any additional characters in the output. Thanks to dblaikie for pointing this out! llvm-svn: 215166	2014-08-07 23:04:07 +00:00
Adrian Prantl	26e66b155f	Reflow this comment. llvm-svn: 215160	2014-08-07 22:44:24 +00:00
Reed Kotler	87048a4c9e	fix materialization of one bit constants and global values which are accessed through a base GOT entry. Summary: get tip of tree mips fast-isel to pass test-suite Two bugs were fixed: 1) one bit booleans were treated as 1 bit signed integers and so the literal '1' could become sign extended. 2) mips uses got for pic but in certain cases, as with string constants for example, many items can be referenced from the same got entry and this case was not handled properly. Test Plan: test-suite Reviewers: dsanders Reviewed By: dsanders Subscribers: mcrosier Differential Revision: http://reviews.llvm.org/D4801 llvm-svn: 215155	2014-08-07 22:09:01 +00:00
Gerolf Hoflehner	97c383bc36	MachineCombiner Pass for selecting faster instruction sequence on AArch64 Re-commit of r214832,r21469 with a work-around that avoids the previous problem with gcc build compilers The work-around is to use SmallVector instead of ArrayRef of basic blocks in preservesResourceLen()/MachineCombiner.cpp llvm-svn: 215151	2014-08-07 21:40:58 +00:00
Akira Hatanaka	bbd33f6766	[Branch probability] Recompute branch weights of tail-merged basic blocks. BranchFolderPass was not correctly setting the basic block branch weights when tail-merging created or merged blocks. This patch recomutes the weights of tail-merged blocks using the following formula: branch_weight(merged block to successor j) = sum(block_frequency(bb) * branch_probability(bb -> j)) bb is a block that is in the set of merged blocks. <rdar://problem/16256423> llvm-svn: 215135	2014-08-07 19:30:13 +00:00
Chandler Carruth	4e8fcbd3fd	[x86] Fix another miscompile found through fuzz testing the new vector shuffle lowering. This is closely related to the previous one. Here we failed to use the source offset when swapping in the other case -- where we end up swapping the final shuffle. The cause of this bug is a bit different: I simply wasn't thinking about the fact that this mask is actually a slice of a wide mask and thus has numbers that need SourceOffset applied. Simple fix. Would be even more simple with an algorithm-y thing to use here, but correctness first. =] llvm-svn: 215095	2014-08-07 10:37:35 +00:00
Chandler Carruth	e206385e99	[x86] Fix another miscompile in the new vector shuffle lowering found via the fuzz tester. Here I missed an offset when round-tripping a value through a shuffle mask. I got it right 2 lines below. See a problem? I do. ;] I'll probably be adding a little "swap" algorithm which accepts a range and two values and swaps those values where they occur in the range. Don't really have a name for it, let me know if you do. llvm-svn: 215094	2014-08-07 10:14:27 +00:00
Chandler Carruth	78494364d1	[x86] Fix another miscompile in the new vector shuffle lowering found through the new fuzzer. This one is great: bad operator precedence led the modulus to happen at the wrong point. All the asserts didn't fire because there were usually the right values past the end of the 4 element region we were looking at. Probably could have gotten a crash here with ASan + fuzzing, but the correctness tests pinpointed this really nicely. llvm-svn: 215092	2014-08-07 09:45:02 +00:00
Pavel Chupin	f55eb450e5	[x32] Use ebp/esp as frame and stack pointer Summary: Since pointers are 32-bit on x32 we can use ebp and esp as frame and stack pointer. Some operations like PUSH/POP and CFI_INSTRUCTION still require 64-bit register, so using 64-bit MachineFramePtr where required. X86_64 NaCl uses 64-bit frame/stack pointers, however it's been found that both isTarget64BitLP64 and isTarget64BitILP32 are true for NaCl. Addressing this issue here as well by making isTarget64BitLP64 false. Also mark hasReservedSpillSlot unreachable on X86. See inlined comments. Test Plan: Add one new simple test and upgrade 2 existing with x32 target case. Reviewers: nadav, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D4617 llvm-svn: 215091	2014-08-07 09:41:19 +00:00
Chandler Carruth	27046758de	[x86] Fix a miscompile in the new shuffle lowering found through the new fuzz testing. The function which tested for adjacency did what it said on the tin, but when I called it, I wanted it to do something more thorough: I wanted to know if the pairs of shuffle elements were adjacent and started at 0 mod 2. In one place I had the decency to try to test for this, but in the other it was completely skipped, miscompiling this test case. Fix this by making the helper actually do what I wanted it to do everywhere I called it (and removing the now redundant code in one place). I really dislike the name "canWidenShuffleElements" for this predicate. If anyone can come up with a better name, please let me know. The other name I thought about was "canWidenShuffleMask" but is it really widening the mask to reduce the number of lanes shuffled? I don't know. Naming things is hard. llvm-svn: 215089	2014-08-07 08:11:31 +00:00
Sanjay Patel	cd47959eb6	Fix a test that has no checks. X86 doesn't have fneg, so check for xor. Differential Revision: http://reviews.llvm.org/D4812 llvm-svn: 214992	2014-08-06 20:45:30 +00:00
Matt Arsenault	a6dc6c281c	R600: Cleanup fadd and fsub tests llvm-svn: 214991	2014-08-06 20:27:55 +00:00
Reid Kleckner	2daa731bab	Add a triple to this test to get the right IR mangling llvm-svn: 214982	2014-08-06 18:09:15 +00:00
Reid Kleckner	61bac93faa	Don't count inreg params when mangling fastcall functions This is consistent with MSVC. llvm-svn: 214981	2014-08-06 18:09:04 +00:00
Reid Kleckner	e41d957028	Round up the size of byval arguments to MinAlign Otherwise we can end up with an argument frame size that is not a multiple of stack slot size, which is very awkward. This fixes PR20547, which was a bug in x86_64 Sys V vararg handling. However, it's much easier to test this with x86 callee-cleanup functions, which previously ended in "retl $6" instead of "retl $8". This does affect behavior of all backends, but it presumably fixes the same bug in all of them. llvm-svn: 214980	2014-08-06 17:57:23 +00:00
Robert Khasanov	3c30c4bdec	[AVX512] Added load/store instructions to Register2Memory opcode tables. Added lowering tests for load/store. Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 214972	2014-08-06 15:40:34 +00:00
James Molloy	99917946da	[AArch64] Add a testcase for r214957. llvm-svn: 214965	2014-08-06 13:31:32 +00:00
Tim Northover	2a417b96d4	ARM: do not generate BLX instructions on Cortex-M CPUs. Particularly on MachO, we were generating "blx _dest" instructions on M-class CPUs, which don't actually exist. They happen to get fixed up by the linker into valid "bl _dest" instructions (which is why such a massive issue has remained largely undetected), but we shouldn't rely on that. llvm-svn: 214959	2014-08-06 11:13:14 +00:00
Tim Northover	d4d294dd51	ARM-MachO: materialize callee address correctly on v4t. llvm-svn: 214958	2014-08-06 11:13:06 +00:00
Chandler Carruth	c3927cd8c9	[x86] Fix two independent miscompiles in the process of getting the same test case to actually generate correct code. The primary miscompile fixed here is that we weren't correctly handling in-place elements in one half of a single-input v8i16 shuffle when moving a dword of elements from that half to the other half. Some times, we would clobber the in-place elements in forming the dword to move across halves. The fix to this involves forcibly marking the in-place inputs even when there is no need to gather them into a dword, and to much more carefully re-arrange the elements when grouping them into a dword to move across halves. With these two changes we would generate correct shuffles for the test case, but found another miscompile. There are also some random perturbations of the generated shuffle pattern in SSE2. It looks like a wash; more instructions in some cases fewer in others. The second miscompile would corrupt the results into nonsense. This is a buggy pattern in one of the added DAG combines. Mapping elements through a PSHUFD when pairing redundant half-shuffles is much harder than this code makes it out to be -- it requires reasoning about all of where the input is used in the PSHUFD, not just one part of where it is used. Plus, we can't combine a half shuffle into a PSHUFD but the code didn't guard against it. I think this was just a bad idea and I've just removed that aspect of the combine. No tests regress as a consequence so seems OK. llvm-svn: 214954	2014-08-06 10:16:36 +00:00
Adam Nemet	5ec912881f	[X86] Fixes commit r214890 to match the posted patch This was another fallout from my local rebase where something went wrong :( llvm-svn: 214951	2014-08-06 07:13:12 +00:00
Matt Arsenault	d5f4de27b6	R600: Increase nearby load scheduling threshold. This partially fixes weird looking load scheduling in memcpy test. The load clustering doesn't seem particularly smart, but this method seems to be partially deprecated so it might not be worth trying to fix. llvm-svn: 214943	2014-08-06 00:29:49 +00:00
Matt Arsenault	c10853f29f	R600/SI: Implement areLoadsFromSameBasePtr This currently has a noticable effect on the kernel argument loads. LDS and global loads are more problematic, I think because of how copies are currently inserted to ensure that the address is a VGPR. llvm-svn: 214942	2014-08-06 00:29:43 +00:00
David Blaikie	fb0412f039	DebugInfo: Assert that any CU for which debug_loc lists are emitted, has at least one range. This was coming in weird debug info that had variables (and hence debug_locs) but was in GMLT mode (because it was missing the 13th field of the compile_unit metadata) so no ranges were constructed. We should always have at least one range for any CU with a debug_loc in it - because the range should cover the debug_loc. The assertion just ensures that the "!= 1" range case inside the subsequent loop doesn't get entered for the case where there are no ranges at all, which should never reach here in the first place. llvm-svn: 214939	2014-08-06 00:21:25 +00:00
David Blaikie	cabf54a313	DebugInfo: Fix a bunch of tests that, owing to their compile_unit metadata not including a 13th field, had some subtle behavior. Without the 13th field, the "emission kind" field defaults to 0 (which is not equal to either of the values of the emission kind enum (1 == full debug info, 2 == line tables only)). In this particular instance, the comparison with "FullDebugInfo" was done when adding elements to the ranges list - so for these test cases no values were added to the ranges list. This got weirder when emitting debug_loc entries as the addresses should be relative to the range of the CU if the CU has only one range (the reasonable assumption is that if we're emitting debug_loc lists for a CU that CU has at least one range - but due to the above situation, it has zero) so the ranges were emitted relative to the start of the section rather than relative to the start of the CU's singular range. Fix these tests by accounting for the difference in the description of debug_loc entries (in some cases making the test ignorant to these differences, in others adding the extra label difference expression, etc) or the presence/absence of high/low_pc on the CU, and add the 13th field to their CUs to enable proper "full debug info" emission here. In a future commit I'll fix up a bunch of other test cases that are not so rigorously depending on this behavior, but still doing similarly weird things due to the missing 13th field. llvm-svn: 214937	2014-08-05 23:57:31 +00:00
Jonathan Roelofs	ef84bda531	Re-apply r214881: Fix return sequence on armv4 thumb This reverts r214893, re-applying r214881 with the test case relaxed a bit to satiate the build bots. POP on armv4t cannot be used to change thumb state (unilke later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214928	2014-08-05 21:32:21 +00:00
Bill Schmidt	42a6936c78	[PowerPC] Swap arguments and adjust shift count for vsldoi on little endian Commits r213915 and r214718 fix recognition of shuffle masks for vmrg* and vpku*um instructions for a little-endian target, by swapping the input arguments. The vsldoi instruction requires similar treatment, and also needs its shift count adjusted for little endian. Reviewed by Ulrich Weigand. This is a bug fix candidate for release 3.5 (and hopefully the last of those for PowerPC). llvm-svn: 214923	2014-08-05 20:47:25 +00:00
Sanjay Patel	1954f2e924	Improved test cases that were added with r214892. 1. Added ':' to CHECK-LABELs 2. Added more CHECKs 3. Added CHECK-NEXTs 4. Added verbose hex immediate comments to CHECKs llvm-svn: 214921	2014-08-05 20:16:35 +00:00
Chandler Carruth	a746239be3	[x86] Fix a crasher due to shuffles which cancel each other out and add a test case. We also miscompile this test case which is showing a serious flaw in the single-input v8i16 shuffle code. I've left the specific instruction checks FIXME-ed out until I can address the bug in the single-input code, but I wanted to separate out a significant functionality change to produce correct code from a very simple and targeted crasher fix. The miscompile problem stems from keeping track of inputs by value rather than by index. As a consequence of doing this, we can't reliably update those inputs because they might swap and we can't detect this without copying the mask. The blend code now uses indices for the input lists and this seems strictly better. It also should make it easier to sort things and do other cleanups. I think the time has come to simplify The Great Lambda here. llvm-svn: 214914	2014-08-05 18:45:49 +00:00
Jonathan Roelofs	064eb5a177	Revert r214881 because it broke lots of build-bots llvm-svn: 214893	2014-08-05 17:36:05 +00:00
Sanjay Patel	8e5beb6edb	Optimize vector fabs of bitcasted constant integer values. Allow vector fabs operations on bitcasted constant integer values to be optimized in the same way that we already optimize scalar fabs. So for code like this: %bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000 %fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast) %ret = bitcast <2 x float> %fabs to i64 Instead of generating something like this: movabsq (constant pool loadi of mask for sign bits) vmovq (move from integer register to vector/fp register) vandps (mask off sign bits) vmovq (move vector/fp register back to integer return register) We should generate: mov (put constant value in return register) I have also removed a redundant clause in the first 'if' statement: N0.getOperand(0).getValueType().isInteger() is the same thing as: IntVT.isInteger() Testcases for x86 and ARM added to existing files that deal with vector fabs. One existing testcase for x86 removed because it is no longer ideal. For more background, please see: http://reviews.llvm.org/D4770 And: http://llvm.org/bugs/show_bug.cgi?id=20354 Differential Revision: http://reviews.llvm.org/D4785 llvm-svn: 214892	2014-08-05 17:35:22 +00:00
Adam Nemet	fd2161b710	[AVX512] Add masking variant and intrinsics for valignd/q This is similar to what I did with the two-source permutation recently. (It's almost too similar so that we should consider generating the masking variants with some tablegen help.) Both encoding and intrinsic tests are added as well. For the latter, this is what the IR that the intrinsic test on the clang side generates. Part of <rdar://problem/17688758> llvm-svn: 214890	2014-08-05 17:23:04 +00:00
Jonathan Roelofs	f5fad3767b	Fix return sequence on armv4 thumb POP on armv4t cannot be used to change thumb state (unilke later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214881	2014-08-05 17:13:17 +00:00

1 2 3 4 5 ...

10589 Commits