llvm-project

Commit Graph

Author	SHA1	Message	Date
Arnaud A. de Grandmaison	d17f96c9ad	[AArch64] Temporarily desactivate the PBQP test, while I investigate some leaks in the allocator llvm-svn: 217531	2014-09-10 18:40:18 +00:00
Arnaud A. de Grandmaison	c75dbbbdd6	[AArch64] Add experimental PBQP support This adds target specific support for using the PBQP register allocator on the AArch64, for the A57 cpu. By default, the PBQP allocator is not used, unless explicitely required on the command line with "-aarch64-pbqp". llvm-svn: 217504	2014-09-10 14:06:10 +00:00
Asiri Rathnayake	369c030633	[AArch 64] Use a constant pool load for weak symbol references when using static relocation model and small code model. Summary: currently we generate GOT based relocations for weak symbol references regardless of the underlying relocation model. This should be change so that in static relocation model we use a constant pool load instead. Patch from: Keith Walker Reviewers: Renato Golin, Tim Northover llvm-svn: 217503	2014-09-10 13:54:38 +00:00
Chad Rosier	3528c1e4c6	[AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph. Patch by Sanjin Sijaric <ssijaric@codeaurora.org>! Phabricator Review: http://reviews.llvm.org/D5103 llvm-svn: 217371	2014-09-08 14:43:48 +00:00
Jiangning Liu	1a486da543	[AArch64] Add pass to enable additional comparison optimizations by CSE. Patched by Sergey Dmitrouk. This pass tries to make consecutive compares of values use same operands to allow CSE pass to remove duplicated instructions. For this it analyzes branches and adjusts comparisons with immediate values by converting: GE -> GT GT -> GE LT -> LE LE -> LT and adjusting immediate values appropriately. It basically corrects two immediate values towards each other to make them equal. llvm-svn: 217220	2014-09-05 02:55:24 +00:00
Tim Northover	f7423fd090	AArch64: fix vector-immediate BIC/ORR on big-endian devices. Follow up to r217138, extending the logic to other NEON-immediate instructions. As before, the instruction already performs the correct operation and we're just using a different type for convenience, so we want a true nop-cast. Patch by Asiri Rathnayake. llvm-svn: 217159	2014-09-04 15:05:24 +00:00
Tim Northover	bb72e6c804	AArch64: fix big-endian immediate materialisation We were materialising big-endian constants using DAG nodes with types different from what was requested, followed by a bitcast. This is fine on little-endian machines where bitcasting is a nop, but we need a slightly different representation for big-endian. This adds a new set of NVCAST (natural-vector cast) operations which are always nops. Patch by Asiri Rathnayake. llvm-svn: 217138	2014-09-04 09:46:14 +00:00
Juergen Ributzka	4bea494569	Revert r216803 "[MachineSinking] Clear kill flag of all operands at all their uses." This reverts commit r216803, because it might have broken the buildbot. The issue is tracked in PR20842. llvm-svn: 217120	2014-09-04 02:07:36 +00:00
Juergen Ributzka	1dbc15f02d	[FastISel][AArch64] Add target-specific lowering for logical operations. This change adds support for immediate and shift-left folding into logical operations. This fixes rdar://problem/18223183. llvm-svn: 217118	2014-09-04 01:29:18 +00:00
Juergen Ributzka	31e5b7fb12	Reapply r216805 "[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR."" This reapplies r216805 with a fix to a copy-past error, which resulted in an incorrect register class. Original commit message: Select the correct register class for the various instructions that are generated when combining instructions and constrain the registers to the appropriate register class. This fixes rdar://problem/18183707. llvm-svn: 217019	2014-09-03 07:07:10 +00:00
Juergen Ributzka	a1148b2173	[FastISel][AArch64] Add target-dependent instruction selection for Add/Sub. There is already target-dependent instruction selection support for Adds/Subs to support compares and the intrinsics with overflow check. This takes advantage of the existing infrastructure to also support Add/Sub, which allows the folding of immediates, sign-/zero-extends, and shifts. This fixes rdar://problem/18207316. llvm-svn: 217007	2014-09-03 01:38:36 +00:00
Juergen Ributzka	53dbef6ef1	[FastISel][AArch64] Use the target-dependent selection code for shifts first. This uses the target-dependent selection code for shifts first, which allows us to create better code for shifts with immediates and sign-/zero-extend folding. Vector type are not handled yet and the code falls back to target-independent instruction selection for these cases. This fixes rdar://problem/17907920. llvm-svn: 216985	2014-09-02 22:33:57 +00:00
Rafael Espindola	4dd3677b5f	Replace -use-init-array with -use-ctors. We have been using .init-array for most systems for quiet some time, but tools like llc are still defaulting to .ctors because the old option was never changed. This patch makes llc default to .init-array and changes the option to be -use-ctors. Clang is not affected by this. It has its own fancier logic. llvm-svn: 216905	2014-09-02 13:54:53 +00:00
David Xu	052b9d9282	Merge Extend and Shift into a UBFX llvm-svn: 216899	2014-09-02 09:33:56 +00:00
Hal Finkel	51e6fa2201	Revert "Revert '[DAGCombiner] Split up an indexed load if only the base pointer value is live'" I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The underlying cause of the failure is that pre-inc loads with increments represented by ISD::TargetConstants were being transformed into ISD:::ADDs with ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they were selected as invalid r+r adds. This recommits r208640, rebased and with an exclusion for ISD::TargetConstant increments. This behavior seems correct, although in the future we might want to ask the target to split out the indexing that uses ISD::TargetConstants. Unfortunately, I don't yet have small test case where the relevant invalid 'add' instruction is not itself dead (and thus eliminated by DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things) Original commit message (by Adam Nemet): Right now the load may not get DCE'd because of the side-effect of updating the base pointer. This can happen if we lower a read-modify-write of an illegal larger type (e.g. i48) such that the modification only affects one of the subparts (the lower i32 part but not the higher i16 part). See the testcase. In order to spot the dead load we need to revisit it when SimplifyDemandedBits decided that the value of the load is masked off. This is the CommitTargetLoweringOpt piece. I checked compile time with ARM64 by sending SPEC bitcode files through llc. No measurable change. Fixes <rdar://problem/16031651> llvm-svn: 216898	2014-09-02 06:24:04 +00:00
Jingyue Wu	5208cc5dbe	[MachineSink] Use the real post dominator tree Summary: Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in isPostDominatedBy, use the real MachinePostDominatorTree. The old heuristics caused instructions to sink unnecessarily, and might create register pressure. Test Plan: Added a NVPTX codegen test to verify that our change is in effect. It also shows the unnecessary register pressure caused by over-sinking. Updated affected tests in AArch64 and X86. Reviewers: eliben, meheff, Jiangning Reviewed By: Jiangning Subscribers: jholewinski, aemerson, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D4814 llvm-svn: 216862	2014-09-01 03:47:25 +00:00
Juergen Ributzka	25816b0fdd	Revert r216805 "[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR." I think this broke the build bot. Reverting it for now until I have time to take a closer look. llvm-svn: 216813	2014-08-30 06:16:26 +00:00
Juergen Ributzka	3e7f88c169	[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR. Select the correct register class for the various instructions that are generated when combining instructions and constrain the registers to the appropriate register class. This fixes rdar://problem/18183707. llvm-svn: 216805	2014-08-29 23:48:09 +00:00
Juergen Ributzka	c5c1c6090f	[FastISel][AArch64] Use the correct register class for branches. Also constrain the register class for branches. This fixes rdar://problem/18181496. llvm-svn: 216804	2014-08-29 23:48:06 +00:00
Juergen Ributzka	00d78221ab	[MachineSinking] Clear kill flag of all operands at all their uses. When sinking an instruction it might be moved past the original last use of one of its operands. This last use has the kill flag set and the verifier will obviously complain about this. Before Machine Sinking (AArch64): %vreg3<def> = ASRVXr %vreg1, %vreg2<kill> %XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def> ... After Machine Sinking: %XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def> ... %vreg3<def> = ASRVXr %vreg1, %vreg2<kill> This fix clears all the kill flags in all instruction that use the same operands as the instruction that is being sunk. This fixes rdar://problem/18180996. llvm-svn: 216803	2014-08-29 23:48:03 +00:00
Louis Gerbarg	03c627e8a7	Remove spurious mask operations from AArch64 add->compares on 16 and 8 bit values This patch checks for DAG patterns that are an add or a sub followed by a compare on 16 and 8 bit inputs. Since AArch64 does not support those types natively they are legalized into 32 bit values, which means that mask operations are inserted into the DAG to emulate overflow behaviour. In many cases those masks do not change the result of the processing and just introduce a dependent operation, often in the middle of a hot loop. This patch detects the relevent DAG patterns and then tests to see if the transforms are equivalent with and without the mask, removing the mask if possible. The exact mechanism of this patch was discusses in http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074444.html There is a reasonably good chance there are missed oppurtunities due to similiar (but not identical) DAG patterns that could be funneled into this test, adding them should be simple if we see test cases. Tests included. rdar://13754426 llvm-svn: 216776	2014-08-29 21:00:22 +00:00
Juergen Ributzka	f6ee7a7cdd	[FastISel][AArch64] Fix an incorrect kill flag due to a bug in SelectTrunc. When we select a trunc instruction we don't emit any code if the type is already i32 or smaller. This is because the instruction that uses the truncated value will deal with it. This behavior can incorrectly transfer a kill flag, which was meant for the result of the truncate, onto the source register. %2 = trunc i32 %1 to i16 ... = ... %2 -> ... = ... vreg1 <kill> ... = ... %1 ... = ... vreg1 This commit fixes this by emitting a COPY instruction, so that the result and source register are distinct virtual registers. This fixes rdar://problem/18178188. llvm-svn: 216750	2014-08-29 17:58:16 +00:00
Tim Northover	3c0915e858	AArch64: only try to get operand of a known node. A bug in r216725 meant we tried to discover the type of a SETCC before confirming the node actually was a SETCC. llvm-svn: 216734	2014-08-29 15:34:58 +00:00
Tim Northover	c1c05aeb5d	AArch64: skip select/setcc combine in complex case. In an llvm-stress generated test, we were trying to create a v0iN type and asserting when that failed. This case could probably be handled by the function, but not without added complexity and the situation it arises in is sufficiently odd that there's probably no benefit anyway. Should fix PR20775. llvm-svn: 216725	2014-08-29 13:05:18 +00:00
Jiangning Liu	08f4cda2ec	[AArch64] Fix some failures exposed by value type v4f16 and v8f16. 1) Add some missing bitcast patterns for v8f16. 2) Add type promotion for operand of ld/st operations. llvm-svn: 216706	2014-08-29 01:31:42 +00:00
Juergen Ributzka	77bc09f5ab	[FastISel][AArch64] Don't fold instructions that are not in the same basic block. This fix checks first if the instruction to be folded (e.g. sign-/zero-extend, or shift) is in the same machine basic block as the instruction we are folding into. Not doing so can result in incorrect code, because the value might not be live-out of the basic block, where the value is defined. This fixes rdar://problem/18169495. llvm-svn: 216700	2014-08-29 00:19:21 +00:00
Jim Grosbach	ec2b0d0b11	AArch64: More correctly constrain target vector extend lowering. The AArch64 target lowering for [zs]ext of vectors is set up to handle input simple types and expects the generic SDag path to do something reasonable with anything that's not a simple type. The code, however, was only checking that the result type was a simple type and assuming that implied that the source type would also be a simple type. That's not a valid assumption, as operations like "zext <1 x i1> %0 to <1 x i32>" demonstrate. The fix is to simply explicitly validate the source type as well as the result type. PR20791 llvm-svn: 216689	2014-08-28 22:08:28 +00:00
David Xu	ee978203e6	Generate CMN when comparing a short int with minus llvm-svn: 216651	2014-08-28 04:59:53 +00:00
Juergen Ributzka	31328168bb	[FastISel] Undo phi node updates when falling-back to SelectionDAG. The included test case would fail, because the MI PHI node would have two operands from the same predecessor. This problem occurs when a switch instruction couldn't be selected. This happens always, because there is no default switch support for FastISel to begin with. The problem was that FastISel would first add the operand to the PHI nodes and then fall-back to SelectionDAG, which would then in turn add the same operands to the PHI nodes again. This fix removes these duplicate PHI node operands by reseting the PHINodesToUpdate to its original state before FastISel tried to select the instruction. This fixes <rdar://problem/18155224>. llvm-svn: 216640	2014-08-28 02:06:55 +00:00
Juergen Ributzka	4f1a54a41a	[FastISel] Currently instructions are folded very aggressively for AArch64 into the memory operation, which can lead to the use of killed operands: %vreg1<def> = ADDXri %vreg0<kill>, 2 %vreg2<def> = LDRBBui %vreg0, 2 ... = ... %vreg1 ... This usually happens when the result is also used by another non-memory instruction in the same basic block, or any instruction in another basic block. This fix teaches hasTrivialKill to not only check the LLVM IR that the value has a single use, but also to check if the register that represents that value has already been used. This can happen when the instruction with the use was folded into another instruction (in this particular case a load instruction). This fixes rdar://problem/18142857. llvm-svn: 216634	2014-08-28 00:09:46 +00:00
Juergen Ributzka	843f14f411	Revert "[FastISel][AArch64] Don't fold instructions too aggressively into the memory operation." Quentin pointed out that this is not the correct approach and there is a better and easier solution. llvm-svn: 216632	2014-08-27 23:09:40 +00:00
Juergen Ributzka	ad8beabe38	[FastISel][AArch64] Don't fold instructions too aggressively into the memory operation. Currently instructions are folded very aggressively into the memory operation, which can lead to the use of killed operands: %vreg1<def> = ADDXri %vreg0<kill>, 2 %vreg2<def> = LDRBBui %vreg0, 2 ... = ... %vreg1 ... This usually happens when the result is also used by another non-memory instruction in the same basic block, or any instruction in another basic block. If the computed address is used by only memory operations in the same basic block, then it is safe to fold them. This is because all memory operations will fold the address computation and the original computation will never be emitted. This fixes rdar://problem/18142857. llvm-svn: 216629	2014-08-27 22:52:33 +00:00
Juergen Ributzka	3c1b286152	[FastISel][AArch64] Fix simplify address when the address comes from a shift. When the address comes directly from a shift instruction then the address computation cannot be folded into the memory instruction, because the zero register is not available as a base register. Simplify addess needs to emit the shift instruction and use the result as base register. llvm-svn: 216621	2014-08-27 21:38:33 +00:00
Juergen Ributzka	100a9b7fda	[FastISel][AArch64] Use the zero register for stores. Use the zero register directly when possible to avoid an unnecessary register copy and a wasted register at -O0. This also uses integer stores to store a positive floating-point zero. This saves us from materializing the positive zero in a register and then storing it. llvm-svn: 216617	2014-08-27 21:04:52 +00:00
Oliver Stannard	89d1542840	Teach the AArch64 backend about v4f16 and v8f16 This teaches the AArch64 backend to deal with the operations required to deal with the operations on v4f16 and v8f16 which are exposed by NEON intrinsics, plus the add, sub, mul and div operations. llvm-svn: 216555	2014-08-27 16:16:04 +00:00
Juergen Ributzka	fb506a417d	[FastISel][AArch64] Fix address simplification. When a shift with extension or an add with shift and extension cannot be folded into the memory operation, then the address calculation has to be materialized separately. While doing so the code forgot to consider a possible sign-/zero- extension. This fix folds now also the sign-/zero-extension into the add or shift instruction which is used to materialize the address. This fixes rdar://problem/18141718. llvm-svn: 216511	2014-08-27 00:58:30 +00:00
Juergen Ributzka	99dd30f338	[FastISel][AArch64] Fold Sign-/Zero-Extend into the shift immediate instruction. llvm-svn: 216510	2014-08-27 00:58:26 +00:00
Juergen Ributzka	0e0b4c1cda	[FastISel][AArch64] Add support for variable shift. This adds the missing variable shift support for value type i8, i16, and i32. This fixes <rdar://problem/18095685>. llvm-svn: 216242	2014-08-21 23:06:07 +00:00
Juergen Ributzka	addb75a4f3	[FastISel][AArch64] Use the correct register class to make the MI verifier happy. This is mostly achieved by providing the correct register class manually, because getRegClassFor always returns the GPRAllRegClass for MVT::i32 and MVT::i64. Also cleanup the code to use the FastEmitInst_ method whenever possible. This makes sure that the operands' register class is properly constrained. For all the remaining cases this adds the missing constrainOperandRegClass calls for each operand. llvm-svn: 216225	2014-08-21 20:57:57 +00:00
Quentin Colombet	0c740d4b9a	[AArch64] Run a peephole pass right after AdvSIMD pass. The AdvSIMD pass may produce copies that are not coalescer-friendly. The peephole optimizer knows how to fix that as demonstrated in the test case. <rdar://problem/12702965> llvm-svn: 216200	2014-08-21 18:10:07 +00:00
Juergen Ributzka	95c0f153e4	[FastISel][AArch64] Remove redundant test. These tests and many more are already covered by fast-isel-addressing-modes.ll. llvm-svn: 216186	2014-08-21 16:40:05 +00:00
Jiangning Liu	deb4b5fc37	Revert r216066, "Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type". llvm-svn: 216147	2014-08-21 01:59:30 +00:00
Juergen Ributzka	e1bb055ed3	[FastISel][AArch64] Don't fold the sign-/zero-extend from i1 into the compare. This fixes a bug I introduced in a previous commit (r216033). Sign-/Zero- extension from i1 cannot be folded into the ADDS/SUBS instructions. Instead both operands have to be sign-/zero-extended with separate instructions. Related to <rdar://problem/17913111>. llvm-svn: 216073	2014-08-20 16:34:15 +00:00
Jiangning Liu	f841b3b79e	Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type legalization stage. With those two optimizations, fewer signed/zero extension instructions can be inserted, and then we can expose more opportunities to Machine CSE pass in back-end. llvm-svn: 216066	2014-08-20 12:05:15 +00:00
Yi Kong	c655f0c898	ARM: Fix codegen for rbit intrinsic LLVM generates illegal `rbit r0, #352` instruction for rbit intrinsic. According to ARM ARM, rbit only takes register as argument, not immediate. The correct instruction should be rbit <Rd>, <Rm>. The bug was originally introduced in r211057. Differential Revision: http://reviews.llvm.org/D4980 llvm-svn: 216064	2014-08-20 10:40:20 +00:00
Juergen Ributzka	0781b860e4	[FastISel][AArch64] Use the proper FMOV instruction to materialize a +0.0. Use FMOVWSr/FMOVXDr instead of FMOVSr/FMOVDr, which have the proper register class to be used with the zero register. This makes the MachineInstruction verifier happy again. This is related to <rdar://problem/18027157>. llvm-svn: 216040	2014-08-20 01:10:36 +00:00
Juergen Ributzka	c0886dd5b0	[FastISel][AArch64] Factor out ADDS/SUBS instruction emission and add support for extensions and shift folding. Factor out the ADDS/SUBS instruction emission code into helper functions and make the helper functions more clever to support most of the different ADDS/SUBS instructions the architecture support. This includes better immedediate support, shift folding, and sign-/zero-extend folding. This fixes <rdar://problem/17913111>. llvm-svn: 216033	2014-08-19 22:29:55 +00:00
Juergen Ributzka	6afeffb078	[FastISel][AArch64] Extend floating-point materialization test. This adds the missing test that I promised for r215753 to test the materialization of the floating-point value +0.0. Related to <rdar://problem/18027157>. llvm-svn: 216019	2014-08-19 20:35:07 +00:00
Juergen Ributzka	b46ea081ad	Reapply [FastISel][AArch64] Add support for more addressing modes (r215597). Note: This was originally reverted to track down a buildbot error. Reapply without any modifications. Original commit message: FastISel didn't take much advantage of the different addressing modes available to it on AArch64. This commit allows the ComputeAddress method to recognize more addressing modes that allows shifts and sign-/zero-extensions to be folded into the memory operation itself. For Example: lsl x1, x1, #3 --> ldr x0, [x0, x1, lsl #3] ldr x0, [x0, x1] sxtw x1, w1 lsl x1, x1, #3 --> ldr x0, [x0, x1, sxtw #3] ldr x0, [x0, x1] llvm-svn: 216013	2014-08-19 19:44:17 +00:00
Juergen Ributzka	7e23f77d82	Reapply [FastISel][AArch64] Make use of the zero register when possible (r215591). Note: This was originally reverted to track down a buildbot error. Reapply without any modifications. Original commit message: This change materializes now the value "0" from the zero register. The zero register can be folded by several instruction, so no materialization is need at all. Fixes <rdar://problem/17924413>. llvm-svn: 216009	2014-08-19 19:44:02 +00:00
Juergen Ributzka	5460cbfda4	[FastISel][AArch64] Fix a few BuildMI callsites where the result register was added as an operand register. This fixes a few BuildMI callsites where the result register was added by using addReg, which is per default a use and therefore an operand register. Also use the zero register as result register when emitting a compare instruction (SUBS with unused result register). llvm-svn: 215997	2014-08-19 17:41:53 +00:00
Oliver Stannard	f5469bec97	Teach the AArch64 backend to handle f16 This allows the AArch64 backend to handle fadd, fsub, fmul and fdiv operations on f16 (half-precision) types by promoting to f32. llvm-svn: 215891	2014-08-18 14:22:39 +00:00
Oliver Stannard	12993dd916	[ARM,AArch64] Do not tail-call to an externally-defined function with weak linkage Externally-defined functions with weak linkage should not be tail-called on ARM or AArch64, as the AAELF spec requires normal calls to undefined weak functions to be replaced with a NOP or jump to the next instruction. The behaviour of branch instructions in this situation (as used for tail calls) is implementation-defined, so we cannot rely on the linker replacing the tail call with a return. llvm-svn: 215890	2014-08-18 12:42:15 +00:00
Amara Emerson	82da7d07a1	[AArch64] Narrow arguments passed in wrong position on the stack in big-endian mode. Patch by Asiri Rathnayake. Differential Revision: http://reviews.llvm.org/D4922 llvm-svn: 215716	2014-08-15 14:29:57 +00:00
Juergen Ributzka	790bacf232	Revert several FastISel commits to track down a buildbot error. This reverts: r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants." r215594 "[FastISel][X86] Use XOR to materialize the "0" value." r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization." r215591 "[FastISel][AArch64] Make use of the zero register when possible." r215588 "[FastISel] Let the target decide first if it wants to materialize a constant." r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI." llvm-svn: 215673	2014-08-14 19:56:28 +00:00
Juergen Ributzka	34ed422c42	Revert "[FastISel][AArch64] Add support for more addressing modes." This reverts commits r215597, because it might have broken the build bots. llvm-svn: 215659	2014-08-14 17:10:54 +00:00
Akira Hatanaka	b74db09c97	[AArch64, fast-isel] Fall back to SelectionDAG to select tail calls. Certain functions such as objc_autoreleaseReturnValue have to be called as tail-calls even at -O0. Since normal fast-isel doesn't emit calls as tail calls, we have to fall back to SelectionDAG to select calls that are marked as tail. <rdar://problem/17991614> llvm-svn: 215600	2014-08-13 23:23:58 +00:00
Juergen Ributzka	98347d902e	[FastISel][AArch64] Add support for more addressing modes. FastISel didn't take much advantage of the different addressing modes available to it on AArch64. This commit allows the ComputeAddress method to recognize more addressing modes that allows shifts and sign-/zero-extensions to be folded into the memory operation itself. For Example: lsl x1, x1, #3 --> ldr x0, [x0, x1, lsl #3] ldr x0, [x0, x1] sxtw x1, w1 lsl x1, x1, #3 --> ldr x0, [x0, x1, sxtw #3] ldr x0, [x0, x1] llvm-svn: 215597	2014-08-13 22:53:29 +00:00
Juergen Ributzka	24080d60fa	[FastISel][AArch64] Make use of the zero register when possible. This change materializes now the value "0" from the zero register. The zero register can be folded by several instruction, so no materialization is need at all. Fixes <rdar://problem/17924413>. llvm-svn: 215591	2014-08-13 22:13:14 +00:00
Gerolf Hoflehner	eb90500d06	[MachineCombiner] Fix for ICE bug 20598 The combiner ignored DBG nodes when checking the uses of a virtual register. It combined a sequence like %vreg1 = madd %vreg2, %vreg3,... DBG_VALUE (%vreg1 ...) %vreg4 = add %vreg1,... to %vreg4 = madd %vreg2, %vreg3 leaving behind a dangling DBG_VALUE with a definition. This triggered an assertion in the MachineTraceMetrics.cpp module. llvm-svn: 215431	2014-08-12 07:54:12 +00:00
Quentin Colombet	c64c175173	[AArch64] Fix registerAllocator assigns same register for base and wback in pre/post-index load and store. Patch by Steven Wu <stevenwu@apple.com> llvm-svn: 215390	2014-08-11 21:39:53 +00:00
Jiangning Liu	dd6e12d71c	In Machine CSE pass, the source register of a COPY machine instruction can be propagated to all its users, and this propagation could increase the probability of finding common subexpressions. If the COPY has only one user, the COPY itself can be removed. llvm-svn: 215344	2014-08-11 05:17:19 +00:00
Jiangning Liu	dcc651f99f	[AArch64] Fix a type conversion bug for anlyzing compare. The bug can cause spec2006/483.xalancbmk failure. Patched by David Xu. llvm-svn: 215206	2014-08-08 14:19:29 +00:00
James Molloy	3feea9c11a	[AArch64] Add an FP load balancing pass for Cortex-A57 For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations. This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU. Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness). This optimization affects S-register and D-register floating point multiplies and FMADD/FMAs, as well as vector (floating point only) muls and FMADD/FMA. Q register instructions (and 128-bit vector instructions) are not affected. llvm-svn: 215199	2014-08-08 12:33:21 +00:00
Tim Northover	0f18ff9817	AArch64: stop trying to take control of all UnknownArch triples. This short-circuited our error reporting for incorrectly specified target triples (you'd get AArch64 code instead). Should fix PR20567. llvm-svn: 215191	2014-08-08 08:27:44 +00:00
Akira Hatanaka	5acc58fcfb	[stack protector] Look through bitcasts to get global variable __stack_chk_guard. Handle the case where the pointer operand of the load instruction that loads the stack guard is not a global variable but instead a bitcast. %StackGuard = load i8 bitcast (i64 @__stack_chk_guard to i8*) call void @llvm.stackprotector(i8 %StackGuard, i8** %StackGuardSlot) Original test case provided by Ana Pazos. This fixes PR20558. llvm-svn: 215167	2014-08-07 23:08:24 +00:00
Gerolf Hoflehner	97c383bc36	MachineCombiner Pass for selecting faster instruction sequence on AArch64 Re-commit of r214832,r21469 with a work-around that avoids the previous problem with gcc build compilers The work-around is to use SmallVector instead of ArrayRef of basic blocks in preservesResourceLen()/MachineCombiner.cpp llvm-svn: 215151	2014-08-07 21:40:58 +00:00
James Molloy	99917946da	[AArch64] Add a testcase for r214957. llvm-svn: 214965	2014-08-06 13:31:32 +00:00
Yi Kong	e56de69500	AArch64: Add support for instruction prefetch intrinsic Instruction prefetch is not implemented for AArch64, it is incorrectly translated into data prefetch instruction. Differential Revision: http://reviews.llvm.org/D4777 llvm-svn: 214860	2014-08-05 12:46:47 +00:00
Juergen Ributzka	a126d1ef3c	[FastISel][AArch64] Implement the FastLowerArguments hook. This implements basic argument lowering for AArch64 in FastISel. It only handles a small subset of the C calling convention. It supports simple arguments that can be passed in GPR and FPR registers. This should cover most of the trivial cases without falling back to SelectionDAG. This fixes <rdar://problem/17890986>. llvm-svn: 214846	2014-08-05 05:43:48 +00:00
Kevin Qin	ec100526e3	Revert "r214832 - MachineCombiner Pass for selecting faster instruction" It broke compiling of most Benchmark and internal test, as clang got clashed by segmentation fault or assertion. llvm-svn: 214845	2014-08-05 05:43:47 +00:00
Juergen Ributzka	51f5326e25	[FastISel][AArch64] Don't perform sign-/zero-extension for function arguments that have already been sign-/zero-extended. llvm-svn: 214844	2014-08-05 05:43:44 +00:00
Gerolf Hoflehner	4dbf44b9d8	MachineCombiner Pass for selecting faster instruction sequence on AArch64 Re-commit of r214669 without changes to test cases LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and LLVM:: CodeGen/AArch64/dp-3source.ll This resolves the reported compfails of the original commit. llvm-svn: 214832	2014-08-05 01:16:13 +00:00
Juergen Ributzka	53533e885a	[FastISel][AArch64] Fix shift lowering for i8 and i16 value types. This fix changes the parameters #r and #s that are passed to the UBFM/SBFM instruction to get the zero/sign-extension for free. The original problem was that the shift left would use the 32-bit shift even for i8/i16 value types, which could leave the upper bits set with "garbage" values. The arithmetic shift right on the other side would use the wrong MSB as sign-bit to determine what bits to shift into the value. This fixes <rdar://problem/17907720>. llvm-svn: 214788	2014-08-04 21:49:51 +00:00
Chad Rosier	5908ab4dd6	[AArch64] Extend the number of scalar instructions supported in the AdvSIMD scalar integer instruction pass. This is a patch I had lying around from a few months ago. The pass is currently disabled by default, so nothing to interesting. llvm-svn: 214779	2014-08-04 21:20:25 +00:00
Kevin Qin	f31ecf3fea	Revert "r214669 - MachineCombiner Pass for selecting faster instruction" This commit broke "make check" for several hours, so get it reverted. llvm-svn: 214697	2014-08-04 05:10:33 +00:00
Gerolf Hoflehner	35ba467122	MachineCombiner Pass for selecting faster instruction sequence - AArch64 target support This patch turns off madd/msub generation in the DAGCombiner and generates them in the MachineCombiner instead. It replaces the original code sequence with the combined sequence when it is beneficial to do so. When there is no machine model support it always generates the madd/msub instruction. This is true also when the objective is to optimize for code size: when the combined sequence is shorter is always chosen and does not get evaluated. When there is a machine model the combined instruction sequence is evaluated for critical path and resource length using machine trace metrics and the original code sequence is replaced when it is determined to be faster. rdar://16319955 llvm-svn: 214669	2014-08-03 22:03:40 +00:00
James Molloy	6b999ae682	Update test to use a more modern AArch64 triple, as requested by Renato. llvm-svn: 214637	2014-08-02 17:15:11 +00:00
James Molloy	ce45be0465	[AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available. The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers! llvm-svn: 214634	2014-08-02 14:51:24 +00:00
Juergen Ributzka	5dcb33bdbb	[FastISel][AArch64] Fold offset into the memory operation. Fold simple offsets into the memory operation: add x0, x0, #8 ldr x0, [x0] --> ldr x0, [x0, #8] Fixes <rdar://problem/17887945>. llvm-svn: 214545	2014-08-01 19:40:16 +00:00
Juergen Ributzka	50a4005e35	[FastISel][AArch64] Add branch weights. Add branch weights to branch instructions, so that the following passes can optimize based on it (i.e. basic block ordering). Fixes <rdar://problem/17887137>. llvm-svn: 214537	2014-08-01 18:39:24 +00:00
Chad Rosier	4d71a4e2c6	[AArch64] Fix test from r214518 in an attempt to appease buildbots. llvm-svn: 214521	2014-08-01 15:30:41 +00:00
Chad Rosier	579c02c9a5	[AArch64] Generate tbz/tbnz when comparing against zero. The tbz/tbnz checks the sign bit to convert op w1, w1, w10 cmp w1, #0 b.lt .LBB0_0 to op w1, w1, w10 tbnz w1, #31, .LBB0_0 Differential Revision: http://reviews.llvm.org/D4440 llvm-svn: 214518	2014-08-01 14:48:56 +00:00
Juergen Ributzka	82ecc7ff2a	[FastISel][AArch64] Fix the immediate versions of the {s\|u}{add\|sub}.with.overflow intrinsics. ADDS and SUBS cannot encode negative immediates or immediates larger than 12bit. This fix checks if the immediate version can be used under this constraints and if we can convert ADDS to SUBS or vice versa to support negative immediates. Also update the test cases to test the immediate versions. llvm-svn: 214470	2014-08-01 01:25:55 +00:00
Juergen Ributzka	c537bd2da4	[FastISel][AArch64] Add basic bitcast support for conversion between float and int. Fixes <rdar://problem/17867078>. llvm-svn: 214389	2014-07-31 06:25:37 +00:00
Juergen Ributzka	130e77e431	[FastISel][AArch64] Add sqrt intrinsic support. Fixes <rdar://problem/17867067>. llvm-svn: 214388	2014-07-31 06:25:33 +00:00
Juergen Ributzka	a80dd08b56	[FastISel][AArch64] Update and enable patchpoint and stackmap intrinsic tests for FastISel. This commit updates the existing SelectionDAG tests for the stackmap and patchpoint intrinsics and enables FastISel testing. It also splits up the tests into separate files, due to different codegen between SelectionDAG and FastISel. llvm-svn: 214382	2014-07-31 04:10:43 +00:00
Juergen Ributzka	052e6c289b	[FastISel][AArch64] Add MachO large code model support for function calls. Currently the large code model for MachO uses the GOT to make function calls. Emit the required adrp and ldr instructions to load the address from the GOT. Related to <rdar://problem/17733076>. llvm-svn: 214381	2014-07-31 04:10:40 +00:00
Juergen Ributzka	3771fbb2f5	[FastISel][AArch64] Add select folding support for the XALU intrinsics. This improves the code generation for the XALU intrinsics when the condition is feeding a select instruction. This also updates and enables the XALU unit tests for FastISel. This fixes <rdar://problem/17831117>. llvm-svn: 214350	2014-07-30 22:04:37 +00:00
Juergen Ributzka	a75cb11f14	[FastISel][AArch64] Add support for shift-immediate. Currently the shift-immediate versions are not supported by tblgen and hopefully this can be later removed, once the required support has been added to tblgen. llvm-svn: 214345	2014-07-30 22:04:22 +00:00
Jiangning Liu	cd296378a7	Implement AArch64 TTI interface isAsCheapAsAMove. llvm-svn: 214159	2014-07-29 02:09:26 +00:00
Tim Northover	2c46beb0d1	AArch64: fix conversion of 'J' inline asm constraints. 'J' represents a negative number suitable for an add/sub alias instruction, but while preparing it to become an int64_t we were mangling the sign extension. So "i32 -1" became 0xffffffffLL, for example. Should fix one half of PR20456. llvm-svn: 214052	2014-07-27 07:10:29 +00:00
Akira Hatanaka	e5b6e0d231	[stack protector] Fix a potential security bug in stack protector where the address of the stack guard was being spilled to the stack. Previously the address of the stack guard would get spilled to the stack if it was impossible to keep it in a register. This patch introduces a new target independent node and pseudo instruction which gets expanded post-RA to a sequence of instructions that load the stack guard value. Register allocator can now just remat the value when it can't keep it in a register. <rdar://problem/12475629> llvm-svn: 213967	2014-07-25 19:31:34 +00:00
Juergen Ributzka	5d6c43e294	[FastISel][AArch64] Add support for frameaddress intrinsic. This commit implements the frameaddress intrinsic for the AArch64 architecture in FastISel. There were two test cases that pretty much tested the same, so I combined them to a single test case. Fixes <rdar://problem/17811834> llvm-svn: 213959	2014-07-25 17:47:14 +00:00
Chandler Carruth	9f4530b95d	[SDAG] Introduce a combined set to the DAG combiner which tracks nodes which have successfully round-tripped through the combine phase, and use this to ensure all operands to DAG nodes are visited by the combiner, even if they are only added during the combine phase. This is critical to have the combiner reach nodes that are introduced during combining. Previously these would sometimes be visited and sometimes not be visited based on whether they happened to end up on the worklist or not. Now we always run them through the combiner. This fixes quite a few bad codegen test cases lurking in the suite while also being more principled. Among these, the TLS codegeneration is particularly exciting for programs that have this in the critical path like TSan-instrumented binaries (although I think they engineer to use a different TLS that is faster anyways). I've tried to check for compile-time regressions here by running llc over a merged (but not LTO-ed) clang bitcode file and observed at most a 3% slowdown in llc. Given that this is essentially a worst case (none of opt or clang are running at this phase) I think this is tolerable. The actual LTO case should be even less costly, and the cost in normal compilation should be negligible. With this combining logic, it is possible to re-legalize as we combine which is necessary to implement PSHUFB formation on x86 as a post-legalize DAG combine (my ultimate goal). Differential Revision: http://reviews.llvm.org/D4638 llvm-svn: 213898	2014-07-24 22:15:28 +00:00
Kevin Qin	9a2a2c502b	[AArch64] Fix a bug generating incorrect instruction when building small vector. This bug is introduced by r211144. The element of operand may be smaller than the element of result, but previous commit can only handle the contrary condition. This commit is to handle this scenario and generate optimized codes like ZIP1. llvm-svn: 213830	2014-07-24 02:05:42 +00:00
Jiangning Liu	451f30e89f	[AArch64] Disable some optimization cases for type conversion from sint to fp, because those optimization cases are micro-architecture dependent and only make sense for Cyclone. A new predicate Cyclone is introduced in .td file. llvm-svn: 213827	2014-07-24 01:29:59 +00:00
Jim Grosbach	d3c7942f4a	Use an explicit triple in testcase. Make the test work better on non-darwin hosts. Hopefully. llvm-svn: 213801	2014-07-23 20:46:32 +00:00
Jim Grosbach	724e438c62	[X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants. The transform to constant fold unary operations with an AND across a vector comparison applies when the constant is not a splat of a scalar as well. llvm-svn: 213800	2014-07-23 20:41:43 +00:00
Jim Grosbach	8f6f0858ec	X86: restrict combine to when type sizes are safe. The folding of unary operations through a vector compare and mask operation is only safe if the unary operation result is of the same size as its input. For example, it's not safe for [su]itofp from v4i32 to v4f64. llvm-svn: 213799	2014-07-23 20:41:38 +00:00
Juergen Ributzka	1b014504ab	[FastISel][AArch64] Fix return type in FastLowerCall. I used the wrong method to obtain the return type inside FinishCall. This fix simply uses the return type from FastLowerCall, which we already determined to be a valid type. Reduced test case from Chad. Thanks. llvm-svn: 213788	2014-07-23 20:03:13 +00:00
Chad Rosier	17020f96c7	[AArch64] Lower sdiv x, pow2 using add + select + shift. The target-independent DAGcombiner will generate: asr w1, X, #31 w1 = splat sign bit. add X, X, w1, lsr #28 X = X + 0 or pow2-1 asr w0, X, asr #4 w0 = X/pow2 However, the add + shifts is expensive, so generate: add w0, X, 15 w0 = X + pow2-1 cmp X, wzr X - 0 csel X, w0, X, lt X = (X < 0) ? X + pow2-1 : X; asr w0, X, asr 4 w0 = X/pow2 llvm-svn: 213758	2014-07-23 14:57:52 +00:00
Tim Northover	35910d7fa8	AArch64: remove "arm64_be" support in favour of "aarch64_be". There really is no arm64_be: it was a useful fiction to test big-endian support while both backends existed in parallel, but now the only platform that uses the name (iOS) doesn't have a big-endian variant, let alone one called "arm64_be". llvm-svn: 213748	2014-07-23 12:58:11 +00:00
Juergen Ributzka	2581fa505f	[FastIsel][AArch64] Add support for the FastLowerCall and FastLowerIntrinsicCall target-hooks. This commit modifies the existing call lowering functions to be used as the FastLowerCall and FastLowerIntrinsicCall target-hooks instead. This enables patchpoint intrinsic lowering for AArch64. This fixes <rdar://problem/17733076> llvm-svn: 213704	2014-07-22 23:14:58 +00:00
Juergen Ributzka	88f6faf1f4	[AArch64] Use CHECK-LABEL in ARM64 ABI unit tests. llvm-svn: 213703	2014-07-22 23:14:54 +00:00
Tim Northover	f7a02c1762	CodeGen: emit IR-level f16 conversion intrinsics as fptrunc/fpext This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc, and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation path is now uniform, regardless of the input IR: fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall fpext -> FP16_TO_FP (if f16 illegal) -> libcall Each target should be able to select the version that best matches its operations and not be required to duplicate patterns for both fptrunc and FP_TO_FP16 (for example). As a result we can remove some redundant AArch64 patterns. llvm-svn: 213507	2014-07-21 09:13:56 +00:00
Tim Northover	f8bfe21fad	AArch64: implement efficient f16 bitcasts Because i16 is illegal, there's no native DAG method to represent a bitcast to or from an f16 type. This meant LLVM was inserting a stack store/load pair which is really not ideal. llvm-svn: 213378	2014-07-18 13:07:05 +00:00
Tim Northover	b94f0859e5	AArch64: support f16 extend/trunc operations. llvm-svn: 213375	2014-07-18 13:01:31 +00:00
Tim Northover	20bd0ced30	CodeGen: soften f16 type by default instead of marking legal. Actual support for softening f16 operations is still limited, and can be added when it's needed. But Soften is much closer to being a useful thing to try than keeping it Legal when no registers can actually hold such values. Longer term, we probably want something between Soften and Promote semantics for most targets, it'll be more efficient to promote the 4 basic operations to f32 than libcall them. llvm-svn: 213372	2014-07-18 12:41:46 +00:00
Jim Grosbach	f7502c4884	AArch64: Constant fold converting vector setcc results to float. Since the result of a SETCC for AArch64 is 0 or -1 in each lane, we can move unary operations, in this case [su]int_to_fp through the mask operation and constant fold the operation away. Generally speaking: UNARYOP(AND(VECTOR_CMP(x,y), constant)) --> AND(VECTOR_CMP(x,y), constant2) where constant2 is UNARYOP(constant). This implements the transform where UNARYOP is [su]int_to_fp. For example, consider the simple function: define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind { %cmp = fcmp oeq <4 x float> %val, %test %ext = zext <4 x i1> %cmp to <4 x i32> %result = sitofp <4 x i32> %ext to <4 x float> ret <4 x float> %result } Before this change, the code is generated as: fcmeq.4s v0, v0, v1 movi.4s v1, #0x1 // Integer splat value. and.16b v0, v0, v1 // Mask lanes based on the comparison. scvtf.4s v0, v0 // Convert each lane to f32. ret After, the code is improved to: fcmeq.4s v0, v0, v1 fmov.4s v1, #1.00000000 // f32 splat value. and.16b v0, v0, v1 // Mask lanes based on the comparison. ret The svvtf.4s has been constant folded away and the floating point 1.0f vector lanes are materialized directly via fmov.4s. Rather than do the folding manually in the target code, teach getNode() in the generic SelectionDAG to handle folding constant operands of vector [su]int_to_fp nodes. It is reasonable (as noted in a FIXME) to do additional constant folding there as well, but I don't have test cases for those operations, so leaving them for another time when it becomes appropriate. rdar://17693791 llvm-svn: 213341	2014-07-18 00:40:52 +00:00
Tim Northover	fd7e424935	CodeGen: extend f16 conversions to permit types > float. This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too. During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here. Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here. llvm-svn: 213248	2014-07-17 10:51:23 +00:00
Yi Kong	2355066e43	Port memory barriers intrinsics to AArch64 Memory barrier __builtin_arm_[dmb, dsb, isb] intrinsics are required to implement their corresponding ACLE and MSVC intrinsics. This patch ports ARM dmb, dsb, isb intrinsic to AArch64. Differential Revision: http://reviews.llvm.org/D4520 llvm-svn: 213247	2014-07-17 10:50:20 +00:00
Tim Northover	e4b8e138e1	AArch64: fall back to generic code for out of range extract/insert. rdar://problem/17624784 llvm-svn: 213059	2014-07-15 10:00:26 +00:00
Tim Northover	3cb24110b1	AArch64: remove unnecessary pseudo-instruction. Sufficiently twisted use of TableGen lets us write patterns directly for f16 (as an i16 promoted to i32) -> f32 conversion. llvm-svn: 212933	2014-07-14 11:16:02 +00:00
Saleem Abdulrasool	f74d48a011	AArch64: add support for llvm.aarch64.hint intrinsic This adds a llvm.aarch64.hint intrinsic to mirror the llvm.arm.hint in order to support the various hint intrinsic functions in the ACLE. Add an optional pattern field that permits the subclass to specify the pattern that matches the selection. The intrinsic pattern is set as mayLoad, mayStore, so overload the value for the definition of the hint instruction. llvm-svn: 212883	2014-07-12 21:20:49 +00:00
Oliver Stannard	6eda6ffc0c	ARM: Allow __fp16 as a function arg or return type for AArch64 ACLE 2.0 allows __fp16 to be used as a function argument or return type. This enables this for AArch64. llvm-svn: 212812	2014-07-11 13:33:46 +00:00
Tim Northover	fee2adefba	AArch64: correctly fast-isel i8 & i16 multiplies We were asking for a register for type i8 or i16 which caused an assert. rdar://problem/17620015 llvm-svn: 212718	2014-07-10 14:18:46 +00:00
Hao Liu	71224b02fb	[AArch64]Fix an assertion failure in DAG Combiner about concating 2 build_vector. llvm-svn: 212677	2014-07-10 03:41:50 +00:00
Jim Grosbach	34cc92b475	AArch64: Better codegen for storing to __fp16. Storing will generally be immediately preceded by rounding from an f32 or f64, so make sure to match those patterns directly to convert into the FPR16 register class directly rather than going through the integer GPRs. This also eliminates an extra step in the convert-from-f64 path which was first converting to f32 and then to f16 from there. rdar://17594379 llvm-svn: 212638	2014-07-09 18:55:52 +00:00
Jim Grosbach	04691a530d	AArch64: Better codegen for loading from __fp16. Loading will generally extend to an f32 or an 64, so make sure to match those patterns directly to load into the FPR16 register class directly rather than going through the integer GPRs. This also eliminates an extra step in the convert-to-f64 path which was first converting to f32 and then to f64 from there. rdar://17594379 llvm-svn: 212573	2014-07-08 23:28:48 +00:00
Louis Gerbarg	4c5b4054b2	Allow AArch64FastISel to degrade graceully in the presence of an MVT::i128 Currently AArch64FastISel crashes if it tries to extend an integer into an MVT::i128. This can happen by creating 128 bit integers like so: typedef unsigned int uint128_t __attribute__((mode(TI))); typedef int sint128_t __attribute__((mode(TI))); This patch makes EmitIntExt check for their presence and then falls back to SelectionDAG. Tests included. rdar://17516686 llvm-svn: 212492	2014-07-07 21:37:51 +00:00
Tim Northover	55beb64bd0	CodeGen: it turns out that NAND is not the same thing as BIC. At all. We've been performing the wrong operation on ARM for "atomicrmw nand" for years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b". This bled over into the generic expansion pass. So I assume no-one has ever actually tried to do an atomic nand in the real world. Oh well. llvm-svn: 212443	2014-07-07 09:06:35 +00:00
Kevin Qin	4473c1943f	[AArch64] Normalize all constants to build a vector. The value of constant operands will be truncated to fit element width. llvm-svn: 212428	2014-07-07 02:45:40 +00:00
Chandler Carruth	395421fd98	[aarch64] Add a test that should have been in r212242 but I forgot to add it. Sorry about that. llvm-svn: 212251	2014-07-03 02:12:26 +00:00
Chandler Carruth	9d010fffe1	[codegen,aarch64] Add a target hook to the code generator to control vector type legalization strategies in a more fine grained manner, and change the legalization of several v1iN types and v1f32 to be widening rather than scalarization on AArch64. This fixes an assertion failure caused by scalarizing nodes like "v1i32 trunc v1i64". As v1i64 is legal it will fail to scalarize v1i32. This also provides a foundation for other targets to have more granular control over how vector types are legalized. Patch by Hao Liu, reviewed by Tim Northover. I'm committing it to allow some work to start taking place on top of this patch as it adds some really important hooks to the backend that I'd like to immediately start using. =] http://reviews.llvm.org/D4322 llvm-svn: 212242	2014-07-03 00:23:43 +00:00
Duncan P. N. Exon Smith	de58870394	AArch64: Re-enable AArch64AddressTypePromotion This reverts commits r212189 and r212190. While this pass was accidentally disabled (until r212073), r205437 slipped in a use of `auto` that should have been `auto&`. This fixes PR20188. llvm-svn: 212201	2014-07-02 18:17:40 +00:00
Duncan P. N. Exon Smith	292fa19077	XFAIL the test to go with r202189 llvm-svn: 212190	2014-07-02 17:07:03 +00:00
Chad Rosier	aba845e835	Revert "Revert "MachineScheduler: better book-keeping for asserts."" This reverts commit r212109, which reverted r212088. However, disable the assert as it's not necessary for correctness. There are several corner cases that the assert needed to handle better for in-order scheduling, but none of them are incorrect scheduler behavior. The assert is mainly there to collect good unit tests like this and ensure that the target-independent scheduler is working as expected with the various machine models. llvm-svn: 212187	2014-07-02 16:46:08 +00:00
Chad Rosier	f575a73751	Revert "MachineScheduler: better book-keeping for asserts." This reverts commit r212088, which is causing a number of spec failures. Will provide reduced test cases shortly. PR20057 llvm-svn: 212109	2014-07-01 17:23:11 +00:00
Andrew Trick	f1b307bcb0	MachineScheduler: better book-keeping for asserts. Fixes another test case under PR20057. llvm-svn: 212088	2014-07-01 03:23:13 +00:00
Duncan P. N. Exon Smith	7d7ae93139	AArch64: Actually do address type promotion AArch64AddressTypePromotion was doing nothing because it was using the old semantics of `Use` and `uses()`, when it really wanted to get at the `users()`. llvm-svn: 212073	2014-06-30 23:42:14 +00:00
Chad Rosier	304fe3ff71	[AArch64] Unsized types don't specify an alignment. PR20109 llvm-svn: 212045	2014-06-30 15:03:00 +00:00
Chad Rosier	e6b8761ab9	[AArch64] Convert mul x, -(pow2 +/- 1) to shift + add/sub. The combine for mul x, pow2 +/- 1 is unchanged. Test cases for both combines as well as mul x, pow2 have been added as well. llvm-svn: 212044	2014-06-30 14:51:14 +00:00
Chad Rosier	5235973ee0	[AArch64] Fix memset ICE when memset value is f128. llvm-svn: 211960	2014-06-27 21:05:09 +00:00
Andrew Trick	5632722cab	MachineScheduler: add some book-keeping to fix an assert. Fixe for Bug 20057 - Assertion failied in llvm::SUnit* llvm::SchedBoundary::pickOnlyChoice(): Assertion `i <= (HazardRec->getMaxLookAhead() + MaxObservedStall) && "permanent hazard"' Thanks to Chad for the test case. llvm-svn: 211865	2014-06-27 04:57:05 +00:00
Weiming Zhao	b1d4dbdcc7	Resubmit commit r211533 "Fix PR20056: Implement pseudo LDR <reg>, =<literal/label> for AArch64" Missed files are added in this commit. llvm-svn: 211605	2014-06-24 16:21:38 +00:00
Kevin Qin	93d45ecdbf	[AArch64] Fix a build_vector pattern match fail caused by defect in isBuildVectorAllZeros(). llvm-svn: 211567	2014-06-24 05:37:27 +00:00
Arnold Schwaighofer	fc308f5c9f	Add a triple so that right syntax is choosen on mac osx systems llvm-svn: 211188	2014-06-18 17:20:49 +00:00
Kevin Qin	f0ec9aff2a	[AArch64] Fix a pattern match failure caused by creating improper CONCAT_VECTOR. ReconstructShuffle() may wrongly creat a CONCAT_VECTOR trying to concat 2 of v2i32 into v4i16. This commit is to fix this issue and try to generate UZP1 instead of lots of MOV and INS. Patch is initalized by Kevin Qin, and refactored by Tim Northover. llvm-svn: 211144	2014-06-18 05:54:42 +00:00
Tim Northover	d5531f72dc	AArch64: estimate inline asm length during branch relaxation To make sure branches are in range, we need to do a better job of estimating the length of an inline assembly block than "it's probably 1 instruction, who'd write asm with more than that?". Fortunately there's already a (highly suspect, see how many ways you can think of to break it!) callback for this purpose, which is used by the other targets. rdar://problem/17277590 llvm-svn: 211095	2014-06-17 11:31:42 +00:00
James Molloy	1e3b5a49e1	[AArch64] Fix a fencepost error in lowering for llvm.aarch64.neon.uqshl. Patch by Jiangning Liu! llvm-svn: 211014	2014-06-16 10:39:21 +00:00
Tim Northover	dbecc3b3fc	AArch64: improve handling & modelling of FP_TO_XINT nodes. There's probably no acatual change in behaviour here, just updating the LowerFP_TO_INT function to be more similar to the reverse implementation and updating costs to current CodeGen. llvm-svn: 210985	2014-06-15 09:27:15 +00:00
Tim Northover	ef0d760cd9	AArch64: improve vector [su]itofp handling. This somehow got missed in the AArch64 merge, so should fix a performance regression since 3.4. llvm-svn: 210984	2014-06-15 09:27:06 +00:00
Jiangning Liu	96e92c1d75	Move GlobalMerge from Transform to CodeGen. This patch is to move GlobalMerge pass from Transform/Scalar to CodeGen, because GlobalMerge depends on TargetMachine. In the mean time, the macro INITIALIZE_TM_PASS is also moved to CodeGen/Passes.h. With this fix we can avoid making libScalarOpts depend on libCodeGen. llvm-svn: 210951	2014-06-13 22:57:59 +00:00
Tim Northover	420a216817	IR: add "cmpxchg weak" variant to support permitted failure. This commit adds a weak variant of the cmpxchg operation, as described in C++11. A cmpxchg instruction with this modifier is permitted to fail to store, even if the comparison indicated it should. As a result, cmpxchg instructions must return a flag indicating success in addition to their original iN value loaded. Thus, for uniformity all cmpxchg instructions now return "{ iN, i1 }". The second flag is 1 when the store succeeded. At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been added as the natural representation for the new cmpxchg instructions. It is a strong cmpxchg. By default this gets Expanded to the existing ATOMIC_CMP_SWAP during Legalization, so existing backends should see no change in behaviour. If they wish to deal with the enhanced node instead, they can call setOperationAction on it. Beware: as a node with 2 results, it cannot be selected from TableGen. Currently, no use is made of the extra information provided in this patch. Test updates are almost entirely adapting the input IR to the new scheme. Summary for out of tree users: ------------------------------ + Legacy Bitcode files are upgraded during read. + Legacy assembly IR files will be invalid. + Front-ends must adapt to different type for "cmpxchg". + Backends should be unaffected by default. llvm-svn: 210903	2014-06-13 14:24:07 +00:00
Chad Rosier	2205d4ef05	[AArch64] Basic Sched Model for Cortex-A57. Patch by Dave Estes<cestes@codeaurora.org> Differential Revision: http://reviews.llvm.org/D4008 llvm-svn: 210705	2014-06-11 21:06:56 +00:00
Jiangning Liu	b2ae37fb67	Global merge for global symbols. This commit is to improve global merge pass and support global symbol merge. The global symbol merge is not enabled by default. For aarch64, we need some more back-end fix to make it really benifit ADRP CSE. llvm-svn: 210640	2014-06-11 06:44:53 +00:00
Chad Rosier	d863ae39d1	[AArch64] Emit .ident compiler version attribute. Patch by Ana Pazos<apazos@codeaurora.org>! llvm-svn: 210535	2014-06-10 14:32:08 +00:00
Tim Northover	9ffd0b020f	AArch64: disallow x30 & x29 as the destination for indirect tail calls As Ana Pazos pointed out, these have to be restored to their incoming values before a function returns; i.e. before the tail call. So they can't be used correctly as the destination register. llvm-svn: 210525	2014-06-10 10:50:24 +00:00
Tim Northover	c141ad4b75	AArch64: teach FastISel how to handle offset FrameIndices Previously we were abandonning the attempt, leading to some combination of extra work (when selection of a load/store fails completely) and inferior code (when this leads to a real memcpy call instead of inlining). rdar://problem/17187463 llvm-svn: 210520	2014-06-10 09:52:44 +00:00
Tim Northover	c19445d07a	AArch64: make FastISel memcpy emission more robust. We were hitting an assert if FastISel couldn't create the load or store we requested. Currently this happens for large frame-local addresses, though CodeGen could be improved there. rdar://problem/17187463 llvm-svn: 210519	2014-06-10 09:52:40 +00:00
Alp Toker	d3d017cf00	Reduce verbiage of lit.local.cfg files We can just split targets_to_build in one place and make it immutable. llvm-svn: 210496	2014-06-09 22:42:55 +00:00
Chad Rosier	3fe0c876c4	[AArch64] Fix the ordering of the accumulate operand in SchedRW list. Patch by Dave Estes <cestes@codeaurora.org> http://reviews.llvm.org/D4037 llvm-svn: 210446	2014-06-09 01:54:00 +00:00
Chad Rosier	d96e9f14ee	[AArch64] When combining constant mul of power of 2 plus/minus 1, prefer shift plus add. The shift can be folded into the add. This only effects codegen when the constant is 3. llvm-svn: 210445	2014-06-09 01:25:51 +00:00
Tilmann Scheller	2a7efeb7b7	[AArch64] Add regression tests for the load/store optimizer which cover post-index update folding with sub rather than add. The tests check that the following transform happens: (ldr\|str) X, [x20] ... sub x20, x20, #16 -> (ldr\|str) X, [x20], #-16 with X being either w0, x0, s0, d0 or q0. llvm-svn: 210113	2014-06-03 16:03:00 +00:00
Tim Northover	6890add11d	AArch64: mark small types (i1, i8, i16) as promoted This means the output of LowerFormalArguments returns a lowered SDValue with the correct type (expected in SelectionDAGBuilder). Without this, an assertion under a DEBUG macro triggers when those types are passed on the stack. llvm-svn: 210102	2014-06-03 13:54:53 +00:00
Jiangning Liu	cc4f38bc28	[AArch64] Correctly deal with VPR stack parameter passing. llvm-svn: 210067	2014-06-03 03:25:09 +00:00
Tilmann Scheller	75fa6e5a23	[AArch64] Add some more regression tests for store pre-index update folding in the load/store optimizer. Add tests for the following transform: add x8, x8, #16 ... str X, [x8] -> str X, [x8, #16]! with X being either w0, x0, s0, d0 or q0. llvm-svn: 210021	2014-06-02 12:33:33 +00:00
Tilmann Scheller	cfbacc8a65	[AArch64] Add some more regression tests for load pre-index update folding in the load/store optimizer. Add tests for the following transform: add x8, x8, #16 ... ldr X, [x8] -> ldr X, [x8, #16]! with X being either w0, x0, s0, d0 or q0. llvm-svn: 210018	2014-06-02 11:57:09 +00:00
Tim Northover	b4ddc0845a	ARM & AArch64: make use of common cmpxchg idioms after expansion The C and C++ semantics for compare_exchange require it to return a bool indicating success. This gets mapped to LLVM IR which follows each cmpxchg with an icmp of the value loaded against the desired value. When lowered to ldxr/stxr loops, this extra comparison is redundant: its results are implicit in the control-flow of the function. This commit makes two changes: it replaces that icmp with appropriate PHI nodes, and then makes sure earlyCSE is called after expansion to actually make use of the opportunities revealed. I've also added -{arm,aarch64}-enable-atomic-tidy options, so that existing fragile tests aren't perturbed too much by the change. Many of them either rely on undef/unreachable too pervasively to be restored to something well-defined (particularly while making sure they test the same obscure assert from many years ago), or depend on a particular CFG shape, which is disrupted by SimplifyCFG. rdar://problem/16227836 llvm-svn: 209883	2014-05-30 10:09:59 +00:00
Tim Northover	ce6538c38d	AArch64 & ARM: remove undefined behaviour from some tests. llvm-svn: 209880	2014-05-30 08:59:55 +00:00
Hao Liu	5c7314b68d	Test cases named with dates is a legacy rule not used now. Rename several test cases. llvm-svn: 209877	2014-05-30 05:58:19 +00:00
Hao Liu	d670c7eee0	Rename a test case to contain correct date info. llvm-svn: 209799	2014-05-29 09:21:23 +00:00
Hao Liu	4091450181	Fix an assertion failure caused by v1i64 in DAGCombiner Shrink. llvm-svn: 209798	2014-05-29 09:19:07 +00:00
Hal Finkel	2c77fe59d9	Revert "[DAGCombiner] Split up an indexed load if only the base pointer value is live" This reverts r208640 (I've just XFAILed the test) because it broke ppc64/Linux self-hosting. Because nearly every regression test triggers a segfault, I hope this will be easy to fix. llvm-svn: 209747	2014-05-28 15:33:19 +00:00
Tilmann Scheller	7c747fc7c6	[AArch64] Add store post-index update folding regression tests for the load/store optimizer. Add regression tests for the following transformation: str X, [x20] ... add x20, x20, #32 -> str X, [x20], #32 with X being either w0, x0, s0, d0 or q0. llvm-svn: 209715	2014-05-28 06:43:00 +00:00
Tilmann Scheller	35e451461f	[AArch64] Add load post-index update folding regression tests for the load/store optimizer. Add regression tests for the following transformation: ldr X, [x20] ... add x20, x20, #32 -> ldr X, [x20], #32 with X being either w0, x0, s0, d0 or q0. llvm-svn: 209711	2014-05-28 05:44:14 +00:00
Tim Northover	9a6217b650	AArch64: add test for NZCV cross-copy save. llvm-svn: 209665	2014-05-27 16:50:09 +00:00
Tim Northover	de9402d345	AArch64: add AArch64-specific test for 'c' and 'n'. llvm-svn: 209664	2014-05-27 16:50:03 +00:00
Tim Northover	68ae503de9	AArch64: force i1 to be zero-extended at an ABI boundary. This commit is debatable. There are two possible approaches, neither of which is really satisfactory: 1. Use "@foo(i1 zeroext)" to mean an extension to 32-bits on Darwin, and 8 bits otherwise. 2. Redefine "@foo(i1)" to mean that the i1 is extended by the caller to 8 bits. This goes against the spirit of "zeroext" I think, but it's a bit of a vague construct anyway (by definition you're going to extend to the amount required by the ABI, that's why it's the ABI!). This implements option 2. The DAG machinery really isn't setup for the first (there's a fairly strong assumption that "zeroext" goes to at least the smallest register size), and even if it was the resulting DAG looks like it would be inferior in many cases. Theoretically we could add AssertZext nodes in the consumers of ABI-passed values too now, but this actually seems to make the code worse in practice by making truncation proceed in two steps. The code produced is equally valid if we continue to assume only the low bit is defined. Should fix PR19850 llvm-svn: 209637	2014-05-26 17:22:07 +00:00
Tim Northover	47e003c65d	AArch64: simplify calling conventions slightly. We can eliminate the custom C++ code in favour of some TableGen to check the same things. Functionality should be identical, except for a buffer overrun that was present in the C++ code and meant webkit failed if any small argument needed to be passed on the stack. llvm-svn: 209636	2014-05-26 17:21:53 +00:00
Tilmann Scheller	cc3ebc8a98	[AArch64] Add store + add folding regression tests for the load/store optimization pass. Add tests for the following transform: str X, [x0, #32] ... add x0, x0, #32 -> str X, [x0, #32]! with X being either w1, x1, s0, d0 or q0. llvm-svn: 209627	2014-05-26 13:36:47 +00:00
Tilmann Scheller	112ada831f	[AArch64] Add more regression tests for the load/store optimization pass. Cover the following cases: ldr X, [x0, #32] ... add x0, x0, #32 -> ldr X, [x0, #32]! with X being either w1, x1, s0, d0 or q0. llvm-svn: 209624	2014-05-26 12:15:51 +00:00
Tilmann Scheller	968d599dc4	Remove accidentally committed whitespace. llvm-svn: 209619	2014-05-26 09:40:40 +00:00
Tilmann Scheller	2d746bcda3	[AArch64] Add a regression test for the load store optimizer. We have a couple of regression tests for load/store pairing, but (to my knowledge) there are no regression tests for the load/store + add/sub folding. As a first step towards increased test coverage of this area, this commit adds a test for one instance of a load + add to pre-indexed load transformation. llvm-svn: 209618	2014-05-26 09:37:19 +00:00
Tim Northover	3b0846e8f7	AArch64/ARM64: move ARM64 into AArch64's place This commit starts with a "git mv ARM64 AArch64" and continues out from there, renaming the C++ classes, intrinsics, and other target-local objects for consistency. "ARM64" test directories are also moved, and tests that began their life in ARM64 use an arm64 triple, those from AArch64 use an aarch64 triple. Both should be equivalent though. This finishes the AArch64 merge, and everyone should feel free to continue committing as normal now. llvm-svn: 209577	2014-05-24 12:50:23 +00:00
Tim Northover	cc08e1fe1b	AArch64/ARM64: remove AArch64 from tree prior to renaming ARM64. I'm doing this in two phases for a better "git blame" record. This commit removes the previous AArch64 backend and redirects all functionality to ARM64. It also deduplicates test-lines and removes orphaned AArch64 tests. The next step will be "git mv ARM64 AArch64" and rewire most of the tests. Hopefully LLVM is still functional, though it would be even better if no-one ever had to care because the rename happens straight afterwards. llvm-svn: 209576	2014-05-24 12:42:26 +00:00
Tim Northover	71d04225cf	AArch64/ARM64: enable more AArch64 tests. llvm-svn: 209408	2014-05-22 07:40:55 +00:00
Rafael Espindola	5a52b9f139	Revert "Implement global merge optimization for global variables." This reverts commit r208934. The patch depends on aliases to GEPs with non zero offsets. That is not supported and fairly broken. The good news is that GlobalAlias is being redesigned and will have support for offsets, so this patch should be a nice match for it. llvm-svn: 208978	2014-05-16 13:02:18 +00:00
Tim Northover	5896b066e6	TableGen: fix operand counting for aliases TableGen has a fairly dubious heuristic to decide whether an alias should be printed: does the alias have lest operands than the real instruction. This is bad enough (particularly with no way to override it), but it should at least be calculated consistently for both strings. This patch implements that logic: first get the correct string for the variant, in the same way as the Matcher, without guessing; then count the number of whitespace chars. There are basically 4 changes this brings about after the previous commits; all of these appear to be good, so I have changed the tests: + ARM64: we print "neg X, Y" instead of "sub X, xzr, Y". + ARM64: we skip implicit "uxtx" and "uxtw" modifiers. + Sparc: we print "mov A, B" instead of "or %g0, A, B". + Sparc: we print "fcmpX A, B" instead of "fcmpX %fcc0, A, B" llvm-svn: 208969	2014-05-16 09:42:04 +00:00
Jiangning Liu	932e1c3924	Implement global merge optimization for global variables. This commit implements two command line switches -global-merge-on-external and -global-merge-aligned, and both of them are false by default, so this optimization is disabled by default for all targets. For ARM64, some back-end behaviors need to be tuned to get this optimization further enabled. llvm-svn: 208934	2014-05-15 23:45:42 +00:00
Tim Northover	2509a3fc64	ARM64: print correct aliases for NEON mov & mvn instructions In all cases, if a "mov" alias exists, it is the canonical form of the instruction. Now that TableGen can support aliases containing syntax variants, we can enable them and improve the quality of the asm output. llvm-svn: 208874	2014-05-15 12:11:02 +00:00
Tim Northover	d8d65a69cf	TableGen/ARM64: print aliases even if they have syntax variants. To get at least one use of the change (and some actual tests) in with its commit, I've enabled the AArch64 & ARM64 NEON mov aliases. llvm-svn: 208867	2014-05-15 11:16:32 +00:00
Jiangning Liu	09cc564310	[ARM64] Support aggressive fastcc/tailcallopt breaking ABI by popping out argument stack from callee. llvm-svn: 208837	2014-05-15 01:33:17 +00:00
Tim Northover	ee20caaf82	TableGen: use PrintMethods to print more aliases llvm-svn: 208607	2014-05-12 18:04:06 +00:00
Tim Northover	88a51d983e	AArch64/ARM64: optimise vector selects & enable test When performing a scalar comparison that feeds into a vector select, it's actually better to do the comparison on the vector side: the scalar route would be "CMP -> CSEL -> DUP", the vector is "CM -> DUP" since the vector comparisons are all mask based. llvm-svn: 208210	2014-05-07 14:10:27 +00:00
James Molloy	ccc7f982c1	[ARM64-BE] Make big endian (scalar) argument passing work correctly. This completes the port of r204814 (cpirker "AArch64_BE function argument passing for ARM ABI") from AArch64 to ARM64, and fixes a bunch of issues found during later development along the way. The biggest of these was that the alignment fixup logic wasn't replicated into all the places it should have been. llvm-svn: 208192	2014-05-07 11:28:36 +00:00
Tim Northover	df723343fa	AArch64/ARM64: run test on ARM64 too. llvm-svn: 208188	2014-05-07 10:47:04 +00:00
Tim Northover	76a94e6ead	AArch64/ARM64: put annotation in test It makes finding already covered tests much easier with "grep -L arm64". llvm-svn: 208187	2014-05-07 10:47:00 +00:00
Renato Golin	c7aea40ec6	Implememting named register intrinsics This patch implements the infrastructure to use named register constructs in programs that need access to specific registers (bare metal, kernels, etc). So far, only the stack pointer is supported as a technology preview, but as it is, the intrinsic can already support all non-allocatable registers from any architecture. llvm-svn: 208104	2014-05-06 16:51:25 +00:00
Rafael Espindola	52dc5d828f	Special case aliases in GlobalValue::getAlignment. An alias has the address of what it points to, so it also has the same alignment. This allows a few optimizations to see past aliases for free. llvm-svn: 208103	2014-05-06 16:48:58 +00:00
Tim Northover	534acbdf73	AArch64/ARM64: print BFM instructions as BFI or BFXIL The canonical form of the BFM instruction is always one of the more explicit extract or insert operations, which makes reading output much easier. llvm-svn: 207752	2014-05-01 12:29:38 +00:00
Tim Northover	20ad359b77	AArch64/ARM64: use HS instead of CS & LO instead of CC. On instructions using the NZCV register, a couple of conditions have dual representations: HS/CS and LO/CC (meaning unsigned-higher-or-same/carry-set and unsigned-lower/carry-clear). The first of these is more descriptive in most circumstances, so we should print it. llvm-svn: 207644	2014-04-30 13:14:03 +00:00
Tim Northover	970c4a8d35	ARM64: use hex immediates for movz/movk instructions Since these are mostly used in "lsl #16", "lsl #32", "lsl #48" combinations to piece together an immediate in 16-bit chunks, hex is probably the most appropriate format. llvm-svn: 207635	2014-04-30 11:19:40 +00:00
Tim Northover	cfd6e66544	ARM64: print canonical syntax for add/sub (imm) instructions. Since these instructions only accept a 12-bit immediate, possibly shifted left by 12, the canonical syntax used by the architecture reference manual is "#N {, lsl #12 }". We should accept an immediate that has already been shifted, (e.g. Also, print a comment giving the full addend since it can be helpful. llvm-svn: 207633	2014-04-30 11:19:15 +00:00
James Molloy	7c39df37b2	[ARM64] Ensure arm64_be is dealt with when emitting debug info. This is a partial port of r204816 (cpirker "Elf support for MC-JIT runtime dynamic linker") from AArch64 to ARM64. llvm-svn: 207625	2014-04-30 10:15:35 +00:00
Benjamin Kramer	e1ab3f062e	AArch64: Mark vector long multiplication as expand. There are no patterns for this. This was already fixed for ARM64 but I forgot to apply it to AArch64 too. llvm-svn: 207515	2014-04-29 09:37:54 +00:00
Bradley Smith	672df15122	[ARM64] Print preferred aliases for SFBM/UBFM in InstPrinter llvm-svn: 207219	2014-04-25 10:25:29 +00:00
Kevin Qin	022d395c9c	[ARM64] Add RUN lines for "–target arm64 –mattr=-fp-armv8" on AArch64 no-fp test. This patch is a supplement of implementing predicate of FP, enabling aarch64 backend no-fp tests on arm64 target for verification. During this, one bug is exposed and fixed by this patch. llvm-svn: 207215	2014-04-25 09:44:20 +00:00
Tim Northover	6331d4b975	AArch64: print NEON lists with a space. This matches ARM64 behaviour, which I think is clearer. It also puts all the churn from that difference into one easily ignored commit. llvm-svn: 207116	2014-04-24 14:06:20 +00:00

... 2 3 4 5 6 ...

613 Commits