llvm-project

Commit Graph

Author	SHA1	Message	Date
Oliver Stannard	d55e115b58	ARM: Correctly align arguments after a byval struct is passed on the stack llvm-svn: 202985	2014-03-05 15:25:27 +00:00
Benjamin Kramer	b6d0bd48bd	[C++11] Replace llvm::next and llvm::prior with std::next and std::prev. Remove the old functions. llvm-svn: 202636	2014-03-02 12:27:27 +00:00
Mark Seaborn	be266aa325	Use 16 byte stack alignment for NaCl on ARM NaCl's ARM ABI uses 16 byte stack alignment, so set that in ARMSubtarget.cpp. Using 16 byte alignment exposes an issue in code generation in which a varargs function leaves a 4 byte gap between the values of r1-r3 saved to the stack and the following arguments that were passed on the stack. (Previously, this code only needed to support 4 byte and 8 byte alignment.) With this issue, llc generated: varargs_func: sub sp, sp, #16 push {lr} sub sp, sp, #12 add r0, sp, #16 // Should be 20 stm r0, {r1, r2, r3} ldr r0, .LCPI0_0 // Address of va_list add r1, sp, #16 str r1, [r0] bl external_func Fix the bug by checking for "Align > 4". Also simplify the code by using OffsetToAlignment(), and update comments. Differential Revision: http://llvm-reviews.chandlerc.com/D2677 llvm-svn: 201497	2014-02-16 18:59:48 +00:00
Tim Northover	b0430415e6	ARM: use natural LLVM IR for vshll instructions Similarly to the vshrn instructions, these are simple zext/sext + trunc operations. Using normal LLVM IR should allow for better code, and more sharing with the AArch64 backend. llvm-svn: 201093	2014-02-10 16:20:29 +00:00
Tim Northover	170daafe01	ARM: use LLVM IR to represent the vshrn operation vshrn is just the combination of a right shift and a truncate (and the limits on the immediate value actually mean the signedness of the shift doesn't matter). Using that representation allows us to get rid of an ARM-specific intrinsic, share more code with AArch64 and hopefully get better code out of the mid-end optimisers. llvm-svn: 201085	2014-02-10 14:04:07 +00:00
Matt Arsenault	25793a3f22	Add address space argument to allowsUnalignedMemoryAccess. On R600, some address spaces have more strict alignment requirements than others. llvm-svn: 200887	2014-02-05 23:15:53 +00:00
Juergen Ributzka	659ce00d60	[TLI] Add a new hook to TargetLowering to query the target if a load of a constant should be converted to simply the constant itself. Before this patch we used getIntImmCost from TargetTransformInfo to determine if a load of a constant should be converted to just a constant, but the threshold for this was set to an arbitrary value. This value works well for the two targets (X86 and ARM) that implement this target-hook, but it isn't target-independent at all. Now targets have the possibility to decide directly if this optimization should be performed. The default value is set to false to preserve the current behavior. The target hook has been moved to TargetLowering, which removed the last use and need of TargetTransformInfo in SelectionDAG. llvm-svn: 200271	2014-01-28 01:20:14 +00:00
Alp Toker	cb40291100	Fix known typos Sweep the codebase for common typos. Includes some changes to visible function names that were misspelt. llvm-svn: 200018	2014-01-24 17:20:08 +00:00
Jakob Stoklund Olesen	209120621a	Switch the NEON register class from QPR to DPair. The already allocatable DPair superclass contains odd-even D register pair in addition to the even-odd pairs in the QPR register class. There is no reason to constrain the set of D register pairs that can be used for NEON values. Any NEON instructions that require a Q register will automatically constrain the register class to QPR. The allocation order for DPair begins with the QPR registers, so register allocation is unlikely to change much. llvm-svn: 199186	2014-01-14 06:18:34 +00:00
Tim Northover	d6a729bb85	ARM MachO: sort out isTargetDarwin/isTargetIOS/... checks. The ARM backend has been using most of the MachO related subtarget checks almost interchangeably, and since the only target it's had to run on has been IOS (which is all three of MachO, Darwin and IOS) it's worked out OK so far. But we'd like to support embedded targets under the "--none-macho" triple, which means everything starts falling apart and inconsistent behaviours emerge. This patch should pick a reasonably sensible set of behaviours for the new triple (and any others that come along, with luck). Some choices were debatable (notably FP == r7 or r11), but we can revisit those later when deficiencies become apparent. llvm-svn: 198617	2014-01-06 14:28:05 +00:00
Bill Wendling	13199b17f8	Remove unnecessary #includes. llvm-svn: 198585	2014-01-06 06:00:00 +00:00
Bill Wendling	908bf814e7	Refactor function that checks that __builtin_returnaddress's argument is constant. This moves the check up into the parent class so that all targets can use it without having to copy (and keep in sync) the same error message. llvm-svn: 198579	2014-01-06 00:43:20 +00:00
Bill Wendling	df7dd28dc8	Emit an error message if the value passed to __builtin_returnaddress isn't a constant. __builtin_returnaddress requires that the value passed into it be a constant. However, at -O0 even a constant expression may not be converted to a constant. Emit an error message instead of crashing. llvm-svn: 198531	2014-01-05 01:47:20 +00:00
Weiming Zhao	63871d255f	[aarch32] fix bug 18268: Incorrect condition of vsel. Given vsel_cc, op1, op2, since vsel has no LE/LT, to generate vsel for such selection, it needs to invert cc and swap op1 and op2. To invert cc, both L/G and E bits should be flipped. llvm-svn: 197615	2013-12-18 22:25:17 +00:00
Alp Toker	f907b891da	Correct word hyphenations This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471	2013-12-05 05:44:44 +00:00
Tim Northover	dee8604caf	ARM: decide whether to use movw/movt based on "minsize" attribute. llvm-svn: 196102	2013-12-02 14:46:26 +00:00
Tim Northover	72360d201c	ARM: add pseudo-instructions for lit-pool global materialisation These are used by MachO only at the moment, and (much like the existing MOVW/MOVT set) work around the fact that the labels used in the actual instructions often contain PC-dependent components, which means that repeatedly materialising the same global can't be CSEed. With small modifications, it could be adapted to how ELF finds the address of _GLOBAL_OFFSET_TABLE_, which would give similar benefits in PIC mode there. llvm-svn: 196090	2013-12-02 10:35:41 +00:00
Tim Northover	fa36dfeeca	Darwin-ARM: use movw/movt for static relocations llvm-svn: 195759	2013-11-26 12:45:05 +00:00
Tim Northover	db962e2c45	ARM: remove special cases for Darwin dynamic-no-pic mode. These are handled almost identically to static mode (and ELF's global address materialisation), except that a symbol may have "$non_lazy_ptr" appended. This can be handled by passing appropriate flags along with the instruction instead of using entirely separate pseudo-instructions. llvm-svn: 195655	2013-11-25 16:24:52 +00:00
Tim Northover	28adfbb0d1	ARM: produce friendly error for invalid inline asm We used to perform an invalid operation on an MVT and crash, which wasn't much fun. Patch by Oliver Stannard. llvm-svn: 194714	2013-11-14 17:15:39 +00:00
Bob Wilson	e7dde0c061	Enable optimization of sin / cos pair into call to __sincos_stret for iOS7+. rdar://12856873 Patch by Evan Cheng, with a fix for rdar://13209539 by Tilmann Scheller llvm-svn: 193942	2013-11-03 06:14:38 +00:00
Jim Grosbach	7236678687	Legalize: Improve legalization of long vector extends. When an extend more than doubles the size of the elements (e.g., a zext from v16i8 to v16i32), the normal legalization method of splitting the vectors will run into problems as by the time the destination vector is legal, the source vector is illegal. The end result is the operation often becoming scalarized, with the typical horrible performance. For example, on x86_64, the simple input of: define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind { %tmp = zext <16 x i8> %a to <16 x i32> store <16 x i32> %tmp, <16 x i32>*%p ret void } Generates: .section __TEXT,__text,regular,pure_instructions .section __TEXT,__const .align 5 LCPI0_0: .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .section __TEXT,__text,regular,pure_instructions .globl _bar .align 4, 0x90 _bar: vpunpckhbw %xmm0, %xmm0, %xmm1 vpunpckhwd %xmm0, %xmm1, %xmm2 vpmovzxwd %xmm1, %xmm1 vinsertf128 $1, %xmm2, %ymm1, %ymm1 vmovaps LCPI0_0(%rip), %ymm2 vandps %ymm2, %ymm1, %ymm1 vpmovzxbw %xmm0, %xmm3 vpunpckhwd %xmm0, %xmm3, %xmm3 vpmovzxbd %xmm0, %xmm0 vinsertf128 $1, %xmm3, %ymm0, %ymm0 vandps %ymm2, %ymm0, %ymm0 vmovaps %ymm0, (%rdi) vmovaps %ymm1, 32(%rdi) vzeroupper ret So instead we can check if there are legal types that enable us to split more cleverly when the input vector is already legal such that we don't turn it into an illegal type. If the extend is such that it's more than doubling the size of the input we check if - the number of vector elements is even, - the source type is legal, - the type of a split source is illegal, - the type of an extended (by doubling element size) source is legal, and - the type of that extended source when split is legal. 
If the conditions are met, instead of just splitting both the destination and the source types, we create an extend that only goes up one "step" (doubling the element width), and then continue legalizing the rest of the operation normally. The result is that this operates as a new, more efficient, termination condition for the loop of "split the operation until the destination type is legal." With this change, the above example now compiles to: _bar: vpxor %xmm1, %xmm1, %xmm1 vpunpcklbw %xmm1, %xmm0, %xmm2 vpunpckhwd %xmm1, %xmm2, %xmm3 vpunpcklwd %xmm1, %xmm2, %xmm2 vinsertf128 $1, %xmm3, %ymm2, %ymm2 vpunpckhbw %xmm1, %xmm0, %xmm0 vpunpckhwd %xmm1, %xmm0, %xmm3 vpunpcklwd %xmm1, %xmm0, %xmm0 vinsertf128 $1, %xmm3, %ymm0, %ymm0 vmovaps %ymm0, 32(%rdi) vmovaps %ymm2, (%rdi) vzeroupper ret This generalizes a custom lowering that was added a while back to the ARM backend. That lowering is no longer necessary, and is removed. The testcases for it, however, provide excellent ARM tests for this change and so remain. rdar://14735100 llvm-svn: 193727	2013-10-31 00:20:48 +00:00
Manman Ren	b504f49448	Struct byval cleanup: add helper functions to reduce code duplication. Helper functions are added: emitPostLd: emit a post-increment load operation with given size. emitPostSt: emit a post-increment store operation with given size. No functionality change. llvm-svn: 193656	2013-10-29 22:27:32 +00:00
Tim Northover	c7ea8048e7	ARM: don't expand atomicrmw inline on Cortex-M0 There's a barrier instruction so that should still be used, but most actual atomic operations are going to need a platform decision on the correct behaviour (either nop if single-threaded or OS-support otherwise). rdar://problem/15287210 llvm-svn: 193399	2013-10-25 09:30:24 +00:00
Jim Grosbach	1d1d6d4675	ARM: Tweak usage of '*vfp' compiler_rt functions. Only use them if the subtarget has ARM mode, as these routines are implemented as ARM code. rdar://15302004 llvm-svn: 193381	2013-10-24 23:07:11 +00:00
David Peixotto	b0653e539b	Remove class abstraction from ARM struct byval lowering This commit changes the struct byval lowering for arm to use inline checks for the subtarget instead of a class abstraction to represent the differences. The class abstraction was judged to be too much code for this task. No intended functionality change. llvm-svn: 193357	2013-10-24 16:39:36 +00:00
Tim Northover	94ecbd2e6c	ARM: Use non-VFP softcalls on embedded Darwinish targets The compiler-rt functions __adddf3vfp and so on exist purely to allow Thumb1 code to make use of VFP instructions by switching back to ARM mode, they make no sense for M-class processors which don't even have an ARM mode. Given that justification, in practice this is a platform ABI decision so the actual check is based on that rather than CPU features. rdar://problem/15302004 llvm-svn: 193327	2013-10-24 10:37:09 +00:00
David Peixotto	8e5abc52cb	17309 ARM backend incorrectly lowers COPY_STRUCT_BYVAL_I32 for thumb1 targets This commit implements the correct lowering of the COPY_STRUCT_BYVAL_I32 pseudo-instruction for thumb1 targets. Previously, the lowering of COPY_STRUCT_BYVAL_I32 generated the post-increment forms of ldr/ldrh/ldrb instructions. Thumb1 does not have the post-increment form of these instructions so the generated assembly contained invalid instructions. Passing the generated assembly to gcc caused it to complain with an error like this: Error: cannot honor width suffix -- `ldrb r3,[r0],#1' and the integrated assembler would generate an object file with an invalid instruction encoding. This commit contains a small test case that demonstrates the problem with thumb1 targets as well as an expanded test case that more thoroughly tests the lowering of byval struct passing for arm, thumb1, and thumb2 targets. llvm-svn: 192916	2013-10-17 19:52:05 +00:00
David Peixotto	c32e24a1b7	Refactor lowering for COPY_STRUCT_BYVAL_I32 This commit refactors the lowering of the COPY_STRUCT_BYVAL_I32 pseudo-instruction in the ARM backend. We introduce a new helper class that encapsulates all of the operations needed during the lowering. The operations are implemented for each subtarget in different subclasses. Currently only arm and thumb2 subtargets are supported. This refactoring was done to easily implement support for thumb1 subtargets. This initial patch does not add support for thumb1, but is only a refactoring. A follow on patch will implement the support for thumb1 subtargets. No intended functionality change. llvm-svn: 192915	2013-10-17 19:49:22 +00:00
Manman Ren	fd956dbae0	Struct byval: fix a copy-paste error for thumb2. PR17309 llvm-svn: 192730	2013-10-15 19:42:32 +00:00
Manman Ren	5a78755336	Struct byval: use the correct alignment for loads generated to load from struct byval to registers. We used to pass 0 which means the alignment of PtrVT. Even when the alignment of the struct is smaller than 4, the LOADs would have alignment of 4, and further optimizations could combine the LOADs into a ldm, which would cause crash. The fix is to pass the alignment of the struct byval. rdar://problem/15144402 llvm-svn: 192126	2013-10-07 19:47:53 +00:00
Matthias Braun	c22630e164	ARM: do not add a regmask for TAILJUMPs The jump doesn't really kill the registers, the following call does but we never get back anyway. This avoids some verify-machineinstrs problems when TAILJUMPs are if-converted. llvm-svn: 191962	2013-10-04 16:52:54 +00:00
Tim Northover	d840745829	ARM: support interrupt attribute This function-attribute modifies the callee-saved register list and function epilogue (specifically the return instruction) so that a routine is suitable for use as an interrupt-handler of the specified type without disrupting user-mode applications. rdar://problem/14207019 llvm-svn: 191766	2013-10-01 14:33:28 +00:00
Amara Emerson	b4ad2f396a	[ARM] Use the load-acquire/store-release instructions optimally in AArch32. Patch by Artyom Skrobov. llvm-svn: 191428	2013-09-26 12:22:36 +00:00
Weiming Zhao	2052f4843b	Fix PR 17368: disable vector mul distribution for square of add/sub for ARM Generally, it is desirable to distribute (a + b) * c to a * c + b * c for ARM with VMLx forwarding, where a, b and c are vectors. However, for (a + b) * (a + b), distribution will result in one extra instruction. With distribution: x = a + b (add) y = a * x (mul) z = y + b * x (mla) Without distribution: x = a + b (add) z = x * x (mul) This patch checks if a mul is a square of add/sub. If yes, skip distribution. llvm-svn: 191410	2013-09-25 23:12:06 +00:00
Joey Gouly	ccd04894c4	[ARMv8] Change hasV8Fp to hasFPARMv8, and other command line options to be more consistent. llvm-svn: 190692	2013-09-13 13:46:57 +00:00
Joey Gouly	926d3f5809	[ARMv8] Implement the new DMB/DSB operands. This removes the custom ISD Node: MEMBARRIER and replaces it with an intrinsic. llvm-svn: 190055	2013-09-05 15:35:24 +00:00
Cameron Esfahani	943908b78d	Clean up some usage of Triple. The base class has methods for determining if the target is iOS and Linux. llvm-svn: 189604	2013-08-29 20:23:14 +00:00
Tim Northover	f5769880d9	ARM: Use "dmb sy" for barriers on M-class CPUs The usual default of "dmb ish" (inner-shareable) isn't even a valid instruction on v6M or v7M (well, it does the same thing but software is strongly discouraged from using it) so we should emit a full-system barrier there. llvm-svn: 189483	2013-08-28 14:39:19 +00:00
Joey Gouly	e3dd684aad	[ARMv8] Add CodeGen for VMAXNM/VMINNM. llvm-svn: 189103	2013-08-23 12:01:13 +00:00
Joey Gouly	881eab53be	[ARMv8] Add CodeGen support for VSEL. This uses the ARMcmov pattern that Tim cleaned up in r188995. Thanks to Simon Tatham for his floating point help! llvm-svn: 189024	2013-08-22 15:29:11 +00:00
Joey Gouly	e1de9e9c33	[ARM] Constrain some register classes in EmitAtomicBinary64 so that we pass these tests with -verify-machineinstrs. llvm-svn: 189006	2013-08-22 12:19:24 +00:00
Tim Northover	f79c3a5aef	ARM: implement some simple f64 materializations. Previously we used a const-pool load for virtually all 64-bit floating values. Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov" instructions of one stripe or another. llvm-svn: 188773	2013-08-20 08:57:11 +00:00
Tim Northover	cc2e903bda	ARM: implement allowTruncateForTailCall Now that it's in place, it seems silly not to let ARM make use of the extra tail call opportunities. llvm-svn: 187795	2013-08-06 13:58:03 +00:00
Saleem Abdulrasool	0c2ee5a2cb	[ARM] check bitwidth in PerformORCombine When simplifying a (or (and B A) (and C ~A)) to a (VBSL A B C) ensure that the bitwidth of the second operands to both ands match before comparing the negation of the values. Split the check of the value of the second operands to the ands. Move the cast and variable declaration slightly higher to make it slightly easier to follow. Bug-Id: 16700 Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org> llvm-svn: 187404	2013-07-30 04:43:08 +00:00
Quentin Colombet	0f2fe74aaf	[ARM][ISel] Improve the lowering of vector loads. When vectors are built from a single value, the ARM lowering issues a scalar_to_vector node. This node is then always morphed into a move from the general purpose unit to the vector unit. When the value comes from a load, this can be simplified into a vector load to the right lane. This patch changes the lowering of insert_vector_elt to expose a vector friendly pattern in this situation. This is a step toward fixing <rdar://problem/14170854>. llvm-svn: 186999	2013-07-23 22:34:47 +00:00
Tim Northover	069f95f926	ARM: allow printing of ARM atomic DAG nodes. We'd forgotten to provide string representations for the special ARMISD atomic nodes; this adds them in. No effect on CodeGen, just makes the output of "-view-whatever-dags" slightly more readable. llvm-svn: 186406	2013-07-16 12:15:36 +00:00
Tim Northover	a7ecd241d2	ARM: implement ldrex, strex and clrex intrinsics Intrinsics already existed for the 64-bit variants, so these support operations of size at most 32-bits. llvm-svn: 186392	2013-07-16 09:46:55 +00:00
Renato Golin	8761069e22	ARM EABI divmod support This patch enables calls to __aeabi_idivmod when in EABI mode, by using the remainder value returned on registers (R1), enabled by the ARM triple "none-eabi". Note that Darwin and GNUEABI triples will continue lowering on GNU style, that is, using the stack for the remainder. Still need to add SREM/UREM support fix for 64-bit lowering. llvm-svn: 186390	2013-07-16 09:32:17 +00:00
Craig Topper	5871321e49	Use llvm::array_lengthof to replace sizeof(array)/sizeof(array[0]). llvm-svn: 186301	2013-07-15 04:27:47 +00:00
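
Note on commit 170daafe01 (vshrn as a right shift plus a truncate): the message says the limits on the immediate mean the signedness of the shift doesn't matter. The sketch below is not LLVM code. It is a small standalone model, with hypothetical helper names, of a 16-bit to 8-bit narrowing shift, and it assumes the shift immediate for that element size ranges from 1 to 8. Under that assumption the truncated result only contains bits that were present in the original value, so the logical and arithmetic variants should agree.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 16 -> 8 bit narrowing right shift, written as
// "shift right, then truncate". The shift fills the top bits with zeros.
static uint8_t narrowShiftLogical(uint16_t v, unsigned n) {
  return static_cast<uint8_t>(static_cast<uint32_t>(v) >> n);
}

// Same operation, but the shift fills the top bits with copies of the
// sign bit. The value is sign-extended to 32 bits by hand so that no
// implementation-defined signed shift is needed.
static uint8_t narrowShiftArithmetic(uint16_t v, unsigned n) {
  uint32_t wide = v;
  if (v & 0x8000u)
    wide |= 0xFFFF0000u;  // replicate the sign bit into the upper half
  return static_cast<uint8_t>(wide >> n);
}

// Counts the (value, shift) pairs for which the two variants disagree,
// over every 16-bit value and every shift in [minShift, maxShift].
static unsigned long countMismatches(unsigned minShift, unsigned maxShift) {
  assert(minShift <= maxShift && maxShift < 16);
  unsigned long mismatches = 0;
  for (unsigned n = minShift; n <= maxShift; ++n) {
    for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
      uint16_t x = static_cast<uint16_t>(v);
      if (narrowShiftLogical(x, n) != narrowShiftArithmetic(x, n))
        ++mismatches;
    }
  }
  return mismatches;
}
```

Calling countMismatches(1, 8) covers the assumed immediate range and finds no disagreement. A shift of 9 would pull a sign-fill bit into the truncated result, so countMismatches(9, 9) is nonzero.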


993 Commits