llvm-project

Commit Graph

Author	SHA1	Message	Date
Mehdi Amini	44ede33a69	Make TargetLowering::getPointerTy() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D11028 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241775	2015-07-09 02:09:04 +00:00
Mehdi Amini	ffc1402fad	Remove IsLittleEndian from TargetLowering and redirect to DataLayout Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11017 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241655	2015-07-08 01:00:38 +00:00
Akira Hatanaka	1bc8af78f4	[ARM] Define a subtarget feature and use it to decide whether long calls should be emitted. This is needed to enable ARM long calls for LTO and enable and disable it on a per-function basis. Out-of-tree projects currently using EnableARMLongCalls to emit long calls should start passing "+long-calls" to the feature string (see the changes made to clang in r241565). rdar://problem/21529937 Differential Revision: http://reviews.llvm.org/D9364 llvm-svn: 241566	2015-07-07 06:54:42 +00:00
Peter Collingbourne	6a9d1774d0	IR: Do not consider available_externally linkage to be linker-weak. From the linker's perspective, an available_externally global is equivalent to an external declaration (per isDeclarationForLinker()), so it is incorrect to consider it to be a weak definition. Also clean up some logic in the dead argument elimination pass and clarify its comments to better explain how its behavior depends on linkage, introduce GlobalValue::isStrongDefinitionForLinker() and start using it throughout the optimizers and backend. Differential Revision: http://reviews.llvm.org/D10941 llvm-svn: 241413	2015-07-05 20:52:35 +00:00
Benjamin Kramer	9bfb627a0e	[TargetLowering] StringRefize asm constraint getters. There is some functional change here because it changes target code from atoi(3) to StringRef::getAsInteger which has error checking. For valid constraints there should be no difference. llvm-svn: 241411	2015-07-05 19:29:18 +00:00
Hao Liu	2cd34bb585	[ARM] Lower interleaved memory accesses to vldN/vstN intrinsics. This patch also adds a function to calculate the cost of interleaved memory accesses. E.g. Lower an interleaved load: %wide.vec = load <8 x i32>, <8 x i32>* %ptr, align 4 %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6> %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7> into: %vld2 = { <4 x i32>, <4 x i32> } call llvm.arm.neon.vld2(%ptr, 4) %vec0 = extractelement { <4 x i32>, <4 x i32> } %vld2, i32 0 %vec1 = extractelement { <4 x i32>, <4 x i32> } %vld2, i32 1 E.g. Lower an interleaved store: %i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11> store <12 x i32> %i.vec, <12 x i32>* %ptr, align 4 into: %sub.v0 = shuffle <8 x i32> %v0, <8 x i32> v1, <0, 1, 2, 3> %sub.v1 = shuffle <8 x i32> %v0, <8 x i32> v1, <4, 5, 6, 7> %sub.v2 = shuffle <8 x i32> %v0, <8 x i32> v1, <8, 9, 10, 11> call void llvm.arm.neon.vst3(%ptr, %sub.v0, %sub.v1, %sub.v2, 4) Differential Revision: http://reviews.llvm.org/D10533 llvm-svn: 240755	2015-06-26 02:45:36 +00:00
Alexander Kornienko	f00654e31b	Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) Apparently, the style needs to be agreed upon first. llvm-svn: 240390	2015-06-23 09:49:53 +00:00
Alexander Kornienko	70bc5f1398	Fixed/added namespace ending comments using clang-tidy. NFC The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-,llvm-namespace-comment -header-filter='llvm/.\|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137	2015-06-19 15:57:42 +00:00
Ahmed Bougacha	9a9094260d	[ARM] Look through concat when lowering in-place shuffles (VZIP, ..) Currently, we canonicalize shuffles that produce a result larger than their operands with: shuffle(concat(v1, undef), concat(v2, undef)) -> shuffle(concat(v1, v2), undef) because we can access quad vectors (see PerformVECTOR_SHUFFLECombine). This is useful in the general case, but there are special cases where native shuffles produce larger results: the two-result ops. We can look through the concat when lowering them: shuffle(concat(v1, v2), undef) -> concat(VZIP(v1, v2):0, :1) This lets us generate the native shuffles instead of scalarizing to dozens of VMOVs. Differential Revision: http://reviews.llvm.org/D10424 llvm-svn: 240118	2015-06-19 02:32:35 +00:00
Ahmed Bougacha	2ffa91f908	[ARM] Factor out two-result shuffle matching. NFCI. In preparation for a future patch: makes it easier to do the same matching to generate different nodes, without duplication. llvm-svn: 240116	2015-06-19 02:25:01 +00:00
Daniel Sanders	c81f450f1a	Clean up redundant copies of Triple objects. NFC Summary: Reviewers: rengolin Reviewed By: rengolin Subscribers: llvm-commits, rengolin, jholewinski Differential Revision: http://reviews.llvm.org/D10382 llvm-svn: 239823	2015-06-16 15:44:21 +00:00
Reid Kleckner	c35e7f52ba	Revert "Move dllimport name mangling to IR mangler." This reverts commit r239437. This broke clang-cl self-hosts. We'd end up calling the __imp_ symbol directly instead of using it to do an indirect function call. llvm-svn: 239502	2015-06-11 01:31:48 +00:00
Peter Collingbourne	9fe51fdf18	Move dllimport name mangling to IR mangler. This ensures that LTO clients see the correct external symbol name. Differential Revision: http://reviews.llvm.org/D10318 llvm-svn: 239437	2015-06-09 22:09:53 +00:00
Akira Hatanaka	d9699bc7bd	Remove DisableTailCalls from TargetOptions and the code in resetTargetOptions that was resetting it. Remove the uses of DisableTailCalls in subclasses of TargetLowering and use the value of function attribute "disable-tail-calls" instead. Also, unconditionally add pass TailCallElim to the pipeline and check the function attribute at the start of runOnFunction to disable the pass on a per-function basis. This is part of the work to remove TargetMachine::resetTargetOptions, and since DisableTailCalls was the last non-fast-math option that was being reset in that function, we should be able to remove the function entirely after the work to propagate IR-level fast-math flags to DAG nodes is completed. Out-of-tree users should remove the uses of DisableTailCalls and make changes to attach attribute "disable-tail-calls"="true" or "false" to the functions in the IR. rdar://problem/13752163 Differential Revision: http://reviews.llvm.org/D10099 llvm-svn: 239427	2015-06-09 19:07:19 +00:00
Peter Collingbourne	6679fc1a79	Revert r238473, "Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM." as it caused miscompilations and assertion failures (PR23768, http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150601/280380.html). llvm-svn: 239169	2015-06-05 18:01:28 +00:00
Luke Cheeseman	85fd06d389	Re-commit of r238201 with fix for building with shared libraries. llvm-svn: 238739	2015-06-01 12:02:47 +00:00
Matt Arsenault	bd7d80a4a6	Add address space argument to isLegalAddressingMode This is important because of different addressing modes depending on the address space for GPU targets. This only adds the argument, and does not update any of the uses to provide the correct address space. llvm-svn: 238723	2015-06-01 05:31:59 +00:00
Peter Collingbourne	450fbee6b2	Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM. We were previously codegen'ing these as regular load/store operations and hoping that the register allocator would allocate registers in ascending order so that we could apply an LDM/STM combine after register allocation. According to the commit that first introduced this code (r37179), we planned to teach the register allocator to allocate the registers in ascending order. This never got implemented, and up to now we've been stuck with very poor codegen. A much simpler approach for achiveing better codegen is to create LDM/STM instructions with identical sets of virtual registers, let the register allocator pick arbitrary registers and order register lists when printing an MCInst. This approach also avoids the need to repeatedly calculate offsets which ultimately ought to be eliminated pre-RA in order to decrease register pressure. This is implemented by lowering the memcpy intrinsic to a series of SD-only MCOPY pseudo-instructions which performs a memory copy using a given number of registers. During SD->MI lowering, we lower MCOPY to LDM/STM. This is a little unusual, but it avoids the need to encode register lists in the SD, and we can take advantage of SD use lists to decide whether to use the _UPD variant of the instructions. Fixes PR9199. Differential Revision: http://reviews.llvm.org/D9508 llvm-svn: 238473	2015-05-28 20:02:45 +00:00
Diego Novillo	bfecc06656	Revert "Re-commit changes in r237579 with fix for bug breaking windows builds." This reverts commit r238201 to fix linking problems in x86 Linux http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150525/278413.html llvm-svn: 238223	2015-05-26 17:45:38 +00:00
Luke Cheeseman	a5d053d6f4	Re-commit changes in r237579 with fix for bug breaking windows builds. llvm-svn: 238201	2015-05-26 13:40:31 +00:00
Luke Cheeseman	0af4f635f1	Test Commit llvm-svn: 238199	2015-05-26 13:10:35 +00:00
Matthias Braun	6091208331	ARM: Fix comment and make it slightly more readable llvm-svn: 237820	2015-05-20 18:40:06 +00:00
David Blaikie	ff6409d096	Simplify IRBuilder::CreateCall* by using ArrayRef+initializer_list/braced init only llvm-svn: 237624	2015-05-18 22:13:54 +00:00
Oliver Stannard	6cb23465e0	Revert r237579, as it broke windows buildbots llvm-svn: 237583	2015-05-18 16:39:16 +00:00
Oliver Stannard	0c553afe6a	[LLVM - ARM/AArch64] Add ACLE special register intrinsics This patch implements LLVM support for the ACLE special register intrinsics in section 10.1, __arm_{w,r}sr{,p,64}. This patch is intended to lower the read/write_register instrinsics, used to implement the special register intrinsics in the clang patch for special register intrinsics (see http://reviews.llvm.org/D9697), to ARM specific instructions MRC,MCR,MSR etc. to allow reading an writing of coprocessor registers in AArch32 and AArch64. This is done by inspecting the register string passed to the intrinsic and then lowering to the appropriate instruction. Patch by Luke Cheeseman. Differential Revision: http://reviews.llvm.org/D9699 llvm-svn: 237579	2015-05-18 16:23:33 +00:00
Artyom Skrobov	a70dfe18d3	Re-apply r237247 - [AArch64] Codegen VMAX/VMIN for safe math cases No longer breaks SPEC2000/2006 llvm-svn: 237361	2015-05-14 12:59:46 +00:00
Tim Northover	4998a47f73	ARM: remove custom jump table UID We were creating and propagating two separate indices for each jump table (from back in the mists of time). However, the generic index used by other backends is sufficient to emit a unique symbol so this was unneeded. llvm-svn: 237294	2015-05-13 20:28:38 +00:00
Silviu Baranga	780a3b3be7	Revert r237247 - [AArch64] Codegen VMAX/VMIN.. as it is causing failures in SPEC2000/2006 llvm-svn: 237256	2015-05-13 14:03:18 +00:00
Artyom Skrobov	b526681e08	[AArch64] Codegen VMAX/VMIN for safe math cases llvm-svn: 237247	2015-05-13 12:01:09 +00:00
Eric Christopher	824f42f209	Migrate existing backends that care about software floating point to use the information in the module rather than TargetOptions. We've had and clang has used the use-soft-float attribute for some time now so have the backends set a subtarget feature based on a particular function now that subtargets are created based on functions and function attributes. For the one middle end soft float check go ahead and create an overloadable TargetLowering::useSoftFloat function that just checks the TargetSubtargetInfo in all cases. Also remove the command line option that hard codes whether or not soft-float is set by using the attribute for all of the target specific test cases - for the generic just go ahead and add the attribute in the one case that showed up. llvm-svn: 237079	2015-05-12 01:26:05 +00:00
Arnold Schwaighofer	f54b73d681	ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not non-aliasing distinct objects The code that builds the dependence graph assumes that two PseudoSourceValues don't alias. In a tail calling function two FixedStackObjects might refer to the same location. Worse 'immutable' fixed stack objects like function arguments are not immutable and will be clobbered. Change this so that a load from a FixedStackObject is not invariant in a tail calling function and don't return a PseudoSourceValue for an instruction in tail calling functions when building the dependence graph so that we handle function arguments conservatively. Fix for PR23459. rdar://20740035 llvm-svn: 236916	2015-05-08 23:52:00 +00:00
Matthias Braun	f45afee3dc	Fix typo. llvm-svn: 236785	2015-05-07 22:16:10 +00:00
Matthias Braun	d04893fa36	Change getTargetNodeName() to produce compiler warnings for missing cases, fix them llvm-svn: 236775	2015-05-07 21:33:59 +00:00
Artyom Skrobov	3f8eae92a4	[ARM] generate VMAXNM/VMINNM for a compare followed by a select, in safe math mode too llvm-svn: 236590	2015-05-06 11:44:10 +00:00
Pete Cooper	5111881cfc	Don't always apply kill flag in thumb2 ABS pseudo expansion. The expansion for t2ABS was always setting the kill flag on the rsb instruction. It should instead only be set on rsb if it was set on the original ABS instruction. rdar://problem/20752113 llvm-svn: 236272	2015-04-30 22:15:59 +00:00
Sergey Dmitrouk	842a51bad8	Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes" [DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235989	2015-04-28 14:05:47 +00:00
Daniel Jasper	48e93f7181	Revert "[DebugInfo] Add debug locations to constant SD nodes" This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987	2015-04-28 13:38:35 +00:00
Sergey Dmitrouk	adb4c69d5c	[DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235977	2015-04-28 11:56:37 +00:00
Matthias Braun	eec4efcca5	Cleanup, remove unused return value llvm-svn: 235952	2015-04-28 00:37:05 +00:00
Scott Douglass	7ad7792088	[ARM] make vminnm/vmaxnm work with ?le, ?ge and no-nans-fp-math Because -menable-no-nans causes fcmp conditions to be rewritten without 'o' or 'u' the recognition code in needs to cope. Also extended it to handle 'le' and 'ge. Differential Revision: http://reviews.llvm.org/D8725 llvm-svn: 234421	2015-04-08 17:18:28 +00:00
Derek Schuff	b051389f04	Use movw/movt instead of constant pool loads to lower byval parameter copies Summary: The ARM backend can use a loop to implement copying byval parameters before a call. In non-thumb2 mode it uses a constant pool load to materialize the trip count. For targets that need movt instead (e.g. Native Client), use the same code as in thumb2 mode to materialize the trip count. Reviewers: jfb, t.p.northover Differential Revision: http://reviews.llvm.org/D8442 llvm-svn: 233324	2015-03-26 22:11:00 +00:00
Benjamin Kramer	799003bf8c	Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used. llvm-svn: 232998	2015-03-23 19:32:43 +00:00
James Molloy	fa041153e5	[ARM] Remove target-specific ITOFP/FPTOI nodes Anton tried this 5 years ago but it was reverted due to extra VMOVs being emitted. This can be easily fixed with a liberal application of patterns - matching loads/stores and extractelts. llvm-svn: 232958	2015-03-23 16:15:16 +00:00
John Brawn	0dbcd65442	[ARM] Align stack objects passed to memory intrinsics Memcpy, and other memory intrinsics, typically tries to use LDM/STM if the source and target addresses are 4-byte aligned. In CodeGenPrepare look for calls to memory intrinsics and, if the object is on the stack, 4-byte align it if it's large enough that we expect that memcpy would want to use LDM/STM to copy it. Differential Revision: http://reviews.llvm.org/D7908 llvm-svn: 232627	2015-03-18 12:01:59 +00:00
Eric Christopher	ae32649ff2	In preparation for moving ARM's TargetRegisterInfo to the TargetMachine merge Thumb1RegisterInfo and Thumb2RegisterInfo. This will enable us to match the TargetMachine for our TargetRegisterInfo classes. llvm-svn: 232117	2015-03-12 22:48:50 +00:00
Aaron Ballman	c579d66b9a	Silencing an "enumeral and non-enumeral type in conditional expression" warning; NFC. llvm-svn: 232035	2015-03-12 13:24:06 +00:00
Eric Christopher	9deb75d176	Have getCallPreservedMask and getThisCallPreservedMask take a MachineFunction argument so that we can grab subtarget specific features off of it. llvm-svn: 231979	2015-03-11 22:42:13 +00:00
Tim Northover	8cda34f5e7	ARM: simplify and extend byval handling The main issue being fixed here is that APCS targets handling a "byval align N" parameter with N > 4 were miscounting what objects were where on the stack, leading to FrameLowering setting the frame pointer incorrectly and clobbering the stack. But byval handling had grown over many years, and had multiple layers of cruft trying to compensate for each other and calculate padding correctly. This only really needs to be done once, in the HandleByVal function. Elsewhere should just do what it's told by that call. I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits byvals with the correct C ABI alignment), which simplified HandleByVal. rdar://20095672 llvm-svn: 231959	2015-03-11 18:54:22 +00:00
Benjamin Kramer	7bd1f7cb58	Remove the remaining uses of abs64 and nuke it. std::abs works just fine and we're already using it in many places. NFC intended. llvm-svn: 231696	2015-03-09 20:20:16 +00:00
Benjamin Kramer	867bfc53ee	Make constant arrays that are passed to functions as const. In theory this allows the compiler to skip materializing the array on the stack. In practice clang often fails to do that, but that's a different story. NFC. llvm-svn: 231571	2015-03-07 17:41:00 +00:00
Ahmed Bougacha	4200cc95b4	[ARM] Enable vector extload combine for legal types. This commit enables forming vector extloads for ARM. It only does so for legal types, and when we can't fold the extension in a wide/long form of the user instruction. Enabling it for larger types isn't as good an idea on ARM as it is on X86, because: - we pretend that extloads are legal, but end up generating vld+vmov - we have instructions like vld {dN, dM}, which can't be generated when we "manually expand" extloads to vld+vmov. For legal types, the combine doesn't fire that often: in the integration tests only in a big endian testcase, where it removes a pointless AND. Related to rdar://19723053 Differential Revision: http://reviews.llvm.org/D7423 llvm-svn: 231396	2015-03-05 19:37:53 +00:00
JF Bastien	f14889ee34	Mutate TargetLowering::shouldExpandAtomicRMWInIR to specifically dictate how AtomicRMWInsts are expanded. Summary: In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all. This does not represent a behavioural change and as such no tests were added. Patch by: Richard Diamond. Reviewers: jfb Reviewed By: jfb Subscribers: jfb, aemerson, t.p.northover, llvm-commits Differential Revision: http://reviews.llvm.org/D7713 llvm-svn: 231250	2015-03-04 15:47:57 +00:00
Pete Cooper	ef21bd444d	Remove MCStreamer.h include from MCContext.h and explictly include it where necessary. NFC llvm-svn: 231193	2015-03-04 01:24:11 +00:00
Eric Christopher	11e4df73c8	getRegForInlineAsmConstraint wants to use TargetRegisterInfo for a lookup, pass that in rather than use a naked call to getSubtargetImpl. This involved passing down and around either a TargetMachine or TargetRegisterInfo. Update all callers/definitions around the targets and SelectionDAG. llvm-svn: 230699	2015-02-26 22:38:43 +00:00
Eric Christopher	23a3a7c871	Remove an argument-less call to getSubtargetImpl from TargetLoweringBase. This required plumbing a TargetRegisterInfo through computeRegisterProperties and into findRepresentativeClass which uses it for register class iteration. This required passing a subtarget into a few target specific initializations of TargetLowering. llvm-svn: 230583	2015-02-26 00:00:24 +00:00
Tim Northover	e95c5b3236	ARM: treat [N x i32] and [N x i64] as AAPCS composite types The logic is almost there already, with our special homogeneous aggregate handling. Tweaking it like this allows front-ends to emit AAPCS compliant code without ever having to count registers or add discarded padding arguments. Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to apply the logic to all integer arrays for more consistency. llvm-svn: 230348	2015-02-24 17:22:34 +00:00
Eric Christopher	ed47b22951	Rewrite the global merge pass to be subprogram agnostic for now. It was previously using the subtarget to get values for the global offset without actually checking each function as it was generating code. Go ahead and solidify the current behavior and make the existing FIXMEs more prominent. As a note the ARM backend previously had a thumb1 and non-thumb1 set of defaults. Only the former was tested so I've changed the behavior to only use that for now. llvm-svn: 230245	2015-02-23 19:28:45 +00:00
Tim Northover	3b6b7ca2bc	CodeGen: convert CCState interface to using ArrayRefs Everyone except R600 was manually passing the length of a static array at each callsite, calculated in a variety of interesting ways. Far easier to let ArrayRef handle that. There should be no functional change, but out of tree targets may have to tweak their calls as with these examples. llvm-svn: 230118	2015-02-21 02:11:17 +00:00
Ahmed Bougacha	db141ac37d	[ARM] Re-re-apply VLD1/VST1 base-update combine. This re-applies r223862, r224198, r224203, and r224754, which were reverted in r228129 because they exposed Clang misalignment problems when self-hosting. The combine caused the crashes because we turned ISD::LOAD/STORE nodes to ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we were very lax for the former, and only emitted the alignment operand (as in "[r1:128]") when it was larger than the standard alignment of the memory type. However, for ARMISD nodes, we just used the MMO alignment, no matter what. In our case, we turned ISD nodes to ARMISD nodes, and this caused the alignment operands to start being emitted. And that's how we exposed alignment problems that were ignored before (but I believe would have been caught with SCTRL.A==1?). To fix this, we can just mirror the hack done for ISD nodes: only take into account the MMO alignment when the access is overaligned. Original commit message: We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). rdar://19717869, rdar://14062261. llvm-svn: 229932	2015-02-19 23:52:41 +00:00
Ahmed Bougacha	dfdf54bed0	[ARM] Minor cleanup to CombineBaseUpdate. NFC. In preparation for a future patch: - rename isLoad to isLoadOp: the former is confusing, and can be taken to refer to the fact that the node is an ISD::LOAD. (it isn't, yet.) - change formatting here and there. - add some comments. - const-ify bools. llvm-svn: 229929	2015-02-19 23:30:37 +00:00
Ahmed Bougacha	4c2b0781a5	[CodeGen] Use ArrayRef instead of std::vector&. NFC. The former lets us use SmallVectors. Do so in ARM and AArch64. llvm-svn: 229925	2015-02-19 23:13:10 +00:00
Andrew Trick	05938a5481	AArch64: Safely handle the incoming sret call argument. This adds a safe interface to the machine independent InputArg struct for accessing the index of the original (IR-level) argument. When a non-native return type is lowered, we generate the hidden machine-level sret argument on-the-fly. Before this fix, we were representing this argument as OrigArgIndex == 0, which is an outright lie. In particular this crashed in the AArch64 backend where we actually try to access the type of the original argument. Now we use a sentinel value for machine arguments that have no original argument index. AArch64, ARM, Mips, and PPC now check for this case before accessing the original argument. Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering llvm-svn: 229413	2015-02-16 18:10:47 +00:00
Duncan P. N. Exon Smith	2cff9e19a2	ARM: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229220	2015-02-14 02:24:44 +00:00
Benjamin Kramer	5f6a907288	MathExtras: Bring Count(Trailing\|Leading)Ones and CountPopulation in line with countTrailingZeros Update all callers. llvm-svn: 228930	2015-02-12 15:35:40 +00:00
Tim Northover	45aa89c925	ARM & AArch64: teach LowerVSETCC that output type size may differ from input. While various DAG combines try to guarantee that a vector SETCC operation will have the same output size as input, there's nothing intrinsic to either creation or LegalizeTypes that actually guarantees it, so the function needs to be ready to handle a mismatch. Fortunately this is easy enough, just extend or truncate the naturally compared result. I couldn't reproduce the failure in other backends that I know have SIMD, so it's probably only an issue for these two due to shared heritage. Should fix PR21645. llvm-svn: 228518	2015-02-08 00:50:47 +00:00
Renato Golin	2a5c0a51ce	Reverting VLD1/VST1 base-updating/post-incrementing combining This reverts patches 223862, 224198, 224203, and 224754, which were all related to the vector load/store combining and were reverted/reaplied a few times due to the same alignment problems we're seeing now. Further tests, mainly self-hosting Clang, will be needed to reapply this patch in the future. llvm-svn: 228129	2015-02-04 10:11:59 +00:00
Eric Christopher	1889fdc142	Remove getSubtargetImpl from ARMISelLowering and cache the correct subtarget by passing it in during the constructor as TargetLowering is Subtarget specific. llvm-svn: 227401	2015-01-29 00:19:39 +00:00
Jyoti Allur	f1d7050a25	This patch fixes issue with lowering below mentioned pattern :- _foo: smull r0, r1, r1, r0 smull r2, r3, r3, r2 adds r0, r2, r0 adc r1, r3, r1 bx lr to _foo: smull r0, r1, r1, r0 smlal r0, r1, r3, r2 bx lr llvm-svn: 226904	2015-01-23 09:10:03 +00:00
Ahmed Bougacha	2b6917b020	[SelectionDAG] Allow targets to specify legality of extloads' result type (in addition to the memory type). The LoadExt legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421	2015-01-08 00:51:32 +00:00
Ahmed Bougacha	67dd2d25a3	[CodeGen] Use MVT iterator_ranges in legality loops. NFC intended. A few loops do trickier things than just iterating on an MVT subset, so I'll leave them be for now. Follow-up of r225387. llvm-svn: 225392	2015-01-07 21:27:10 +00:00
Saleem Abdulrasool	67f729933f	ARM: permit tail calls to weak externals on COFF Weak externals are resolved statically, so we can actually generate the tail call on PE/COFF targets without breaking the requirements. It is questionable whether we want to propagate the current behaviour for MachO as the requirements are part of the ARM ELF specifications, and it seems that prior to the SVN r215890, we would have tail'ed the call. For now, be conservative and only permit it on PE/COFF where the call will always be fully resolved. llvm-svn: 225119	2015-01-03 21:35:00 +00:00
Ahmed Bougacha	4553bff412	[ARM] Don't break alignment when combining base updates into load/stores. r223862/r224203 tried to also combine base-updating load/stores. There was a mistake there: the alignment was added as is as an operand to the ARMISD::VLD/VST node. However, the VLD/VST selection logic doesn't care about less-than-standard alignment attributes. For example, no matter the alignment of a v2i64 load (say 1), SelectVLD picks VLD1q64 (because of the memory type). But VLD1q64 ("vld1.64 {dXX, dYY}") is 8-aligned, per ARMARMv7a 3.2.1. For the 1-aligned load, what we really want is VLD1q8. This commit introduces bitcasts if necessary, and changes the vld/vst type to one whose standard alignment matches the original load/store alignment. Differential Revision: http://reviews.llvm.org/D6759 llvm-svn: 224754	2014-12-23 06:07:31 +00:00
Aaron Ballman	0d6a010c13	Fixing -Wsign-compare warnings; NFC. llvm-svn: 224337	2014-12-16 14:04:11 +00:00
Bradley Smith	ececb7f6e2	[ARM] Prevent PerformVCVTCombine from combining a vmul/vcvt with 8 lanes This would result in a crash since the vcvt used does not support v8i32 types. llvm-svn: 224332	2014-12-16 10:59:27 +00:00
Michael Ilseman	addddc441f	Silence more static analyzer warnings. Add in definedness checks for shift operators, null checks when pointers are assumed by the code to be non-null, and explicit unreachables. llvm-svn: 224255	2014-12-15 18:48:43 +00:00
Ahmed Bougacha	0cb861634b	Reapply "[ARM] Combine base-updating/post-incrementing vector load/stores." r223862 tried to also combine base-updating load/stores. r224198 reverted it, as "it created a regression on the test-suite on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order in which the words are shown." Reapply, with a fix to ignore non-normal load/stores. Truncstores are handled elsewhere (you can actually write a pattern for those, whereas for postinc loads you can't, since they return two values), but it should be possible to also combine extloads base updates, by checking that the memory (rather than result) type is of the same size as the addend. Original commit message: We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). Differential Revision: http://reviews.llvm.org/D6585 llvm-svn: 224203	2014-12-13 23:22:12 +00:00
Renato Golin	df8f9b6dc9	Revert "[ARM] Combine base-updating/post-incrementing vector load/stores." This reverts commit r223862, as it created a regression on the test-suite on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order in which the words are shown. We'll investigate the issue and re-apply when safe. llvm-svn: 224198	2014-12-13 20:23:18 +00:00
Ahmed Bougacha	7efbac74ec	[ARM] Combine base-updating/post-incrementing vector load/stores. We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). Differential Revision: http://reviews.llvm.org/D6585 llvm-svn: 223862	2014-12-10 00:07:37 +00:00
Ahmed Bougacha	b31fba1613	[ARM] Factor out base-updating VLD/VST combiner function. NFC. Move the combiner-state check into another function, add a few small comments, and use a more general type in a cast<>. In preparation for a future patch. llvm-svn: 223834	2014-12-09 21:30:00 +00:00
Ahmed Bougacha	2316746e40	[ARM] Move the store combiner function down. NFC. And flip its final condition. In preparation for a future patch. llvm-svn: 223833	2014-12-09 21:26:53 +00:00
Eric Christopher	66322e822c	Both of these subtargets have functions that check whether or not the target is mach-o. Use them. llvm-svn: 223420	2014-12-05 00:22:35 +00:00
Craig Topper	61e88f44f9	Remove a bunch of unnecessary typecasts to 'const TargetRegisterClass *' llvm-svn: 222509	2014-11-21 05:58:21 +00:00
Reid Kleckner	343c395f11	Fix more instances of -Wsentinel on Windows with s/NULL/nullptr/ Follow up to r221940, where I must not have caught em all. NFC llvm-svn: 222481	2014-11-20 23:51:47 +00:00
David Blaikie	70573dcd9f	Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334	2014-11-19 07:49:26 +00:00
Aditya Nandakumar	3053155652	We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed. llvm-svn: 221926	2014-11-13 21:29:21 +00:00
Aditya Nandakumar	a27193297f	This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively llvm-svn: 221878	2014-11-13 09:26:31 +00:00
Akira Hatanaka	b961534818	[ARM, inline-asm] Fix ARMTargetLowering::getRegForInlineAsmConstraint to return register class tGPRRegClass if the target is thumb1. This commit fixes a crash that occurs during register allocation which was triggered when a virtual register defined by an inline-asm instruction had to be spilled. rdar://problem/18740489 llvm-svn: 221178	2014-11-03 20:37:04 +00:00
Daniel Sanders	8104b75c9f	Renamed CCState members that appear to misspell 'Processed' as 'Proceed'. NFC. Reviewers: rnk Reviewed By: rnk Subscribers: rnk, llvm-commits Differential Revision: http://reviews.llvm.org/D5978 llvm-svn: 221061	2014-11-01 19:32:23 +00:00
Quentin Colombet	c32615dfef	[CodeGenPrepare] Move extractelement close to store if they can be combined. This patch adds an optimization in CodeGenPrepare to move an extractelement right before a store when the target can combine them. The optimization may promote any scalar operations to vector operations in the way to make that possible. Context Some targets use different register files for both vector and scalar operations. This means that transitioning from one domain to another may incur copy from one register file to another. These copies are not coalescable and may be expensive. For example, according to the scheduling model, on cortex-A8 a vector to GPR move is 20 cycles. Motivating Example Let us consider an example: define void @foo(<2 x i32>* %addr1, i32* %dest) { %in1 = load <2 x i32>* %addr1, align 8 %extract = extractelement <2 x i32> %in1, i32 1 %out = or i32 %extract, 1 store i32 %out, i32* %dest, align 4 ret void } As it is, this IR generates the following assembly on armv7: vldr d16, [r0] @vector load vmov.32 r0, d16[1] @ cross-register-file copy: 20 cycles orr r0, r0, #1 @ scalar bitwise or str r0, [r1] @ scalar store bx lr Whereas we could generate much faster code: vldr d16, [r0] @ vector load vorr.i32 d16, #0x1 @ vector bitwise or vst1.32 {d16[1]}, [r1:32] @ vector extract + store bx lr Half of the computation made in the vector is useless, but this allows to get rid of the expensive cross-register-file copy. Proposed Solution To avoid this cross-register-copy penalty, we promote the scalar operations to vector operations. The penalty will be removed if we manage to promote the whole chain of computation in the vector domain. Currently, we do that only when the chain of computation ends by a store and the target is able to combine an extract with a store. Stores are the most likely candidates, because other instructions produce values that would need to be promoted and so, extracted as some point[1]. Moreover, this is customary that targets feature stores that perform a vector extract (see AArch64 and X86 for instance). The proposed implementation relies on the TargetTransformInfo to decide whether or not it is beneficial to promote a chain of computation in the vector domain. Unfortunately, this interface is rather inaccurate for this level of details and although this optimization may be beneficial for X86 and AArch64, the inaccuracy will lead to the optimization being too aggressive. Basically in TargetTransformInfo, everything that is legal has a cost of 1, whereas, even if a vector type is legal, usually a vector operation is slightly more expensive than its scalar counterpart. That will lead to too many promotions that may not be counter balanced by the saving of the cross-register-file copy. For instance, on AArch64 this penalty is just 4 cycles. For now, the optimization is just enabled for ARM prior than v8, since those processors have a larger penalty on cross-register-file copies, and the scope is limited to basic blocks. Because of these two factors, we limit the effects of the inaccuracy. Indeed, I did not want to build up a fancy cost model with block frequency and everything on top of that. [1] We can imagine targets that can combine an extractelement with other instructions than just stores. If we want to go into that direction, the current interfaces must be augmented and, moreover, I think this becomes a global isel problem. Differential Revision: http://reviews.llvm.org/D5921 <rdar://problem/14170854> llvm-svn: 220978	2014-10-31 17:52:53 +00:00
Oliver Stannard	79efe41a0c	[ARM] Select VMAXNM and VMINNM regardless of operand order Currently, the ARM backend will select the VMAXNM and VMINNM for these C expressions: (a < b) ? a : b (a > b) ? a : b but not these expressions: (a > b) ? b : a (a < b) ? b : a This patch allows all of these expressions to be matched. llvm-svn: 220671	2014-10-27 09:23:02 +00:00
Renato Golin	6fb9c2ea70	Do not emit intermediate register for zero FP immediate This updates check for double precision zero floating point constant to allow use of instruction with immediate value rather than temporary register. Currently "a == 0.0", where "a" is of "double" type generates: vmov.i32 d16, #0x0 vcmpe.f64 d0, d16 With this change it becomes: vcmpe.f64 d0, #0 Patch by Sergey Dmitrouk. llvm-svn: 220486	2014-10-23 15:31:50 +00:00
Tim Northover	23075ccee7	ARM: rework Thumb1 frame index rewriting The previous code had a few problems, motivating the choices here. 1. It could create instructions clobbering CPSR, but the incoming MachineInstr didn't reflect this. A potential source of corruption. This is why the patch has a new PseudoInst for before lowering. 2. Similarly, there was some code to handle the incoming instruction not being ARMCC::AL, but this would have caused massive problems if it was actually invoked when a complex offset needing more than one instruction was requested. 3. It wasn't designed to handle unaligned pointers (or offsets). These should probably be minimised anyway, but the code needs to deal with them properly regardless. 4. It had some rather dubious ad-hoc code to avoid calling emitThumbRegPlusImmediate, a function which should be designed to do precisely this job. We seem to cover the common cases correctly now, and hopefully can enhance emitThumbRegPlusImmediate to handle any extra optimisations we need to add in future. llvm-svn: 220236	2014-10-20 21:28:41 +00:00
Bob Wilson	9868d71ffe	Use triple's isiOS() and isOSDarwin() methods. These methods are already used in lots of places. This makes things more consistent. NFC. llvm-svn: 219386	2014-10-09 05:43:30 +00:00
Eric Christopher	5312afe7e1	constify TargetMachine argument. llvm-svn: 218930	2014-10-03 00:17:59 +00:00
Oliver Stannard	d4e0a4fd2c	[ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5 Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions when targeting ARMv8, but they are actually present on any target with FP-ARMv8. Note that FP-ARMv8 is called FPv5 when is is part of an M-profile core, but they have the same instructions so we model them both as FPARMv8 in the ARM backend. llvm-svn: 218763	2014-10-01 13:13:18 +00:00
Tom Stellard	1fa1ce6112	ARM: Remove unneeded check for MI->hasPostISelHook() llvm-svn: 218459	2014-09-25 18:59:23 +00:00
Robin Morisset	dedef3325f	Add AtomicExpandPass::bracketInstWithFences, and use it whenever getInsertFencesForAtomic would trigger in SelectionDAGBuilder Summary: The goal is to eventually remove all the code related to getInsertFencesForAtomic in SelectionDAGBuilder as it is wrong (designed for ARM, not really portable, works mostly by accident because the backends are overly conservative), and repeats the same logic that goes in emitLeading/TrailingFence. In this patch, I make AtomicExpandPass insert the fences as it knows better where to put them. Because this requires getting the fences and not just passing an IRBuilder around, I had to change the return type of emitLeading/TrailingFence. This code only triggers on ARM for now. Because it is earlier in the pipeline than SelectionDAGBuilder, it triggers and lowers atomic accesses to atomic so SelectionDAGBuilder does not add barriers anymore on ARM. If this patch is accepted I plan to implement emitLeading/TrailingFence for all backends that setInsertFencesForAtomic(true), which will allow both making them less conservative and simplifying SelectionDAGBuilder once they are all using this interface. This should not cause any functionnal change so the existing tests are used and not modified. Test Plan: make check-all, benefits from existing tests of atomics on ARM Reviewers: jfb, t.p.northover Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5179 llvm-svn: 218329	2014-09-23 20:31:14 +00:00
Robin Morisset	a7b357fed1	Just add a fixme about a possibly faster implementation of some atomic loads on some ARM processors llvm-svn: 218326	2014-09-23 18:33:21 +00:00
Quentin Colombet	17799fedb7	[ARM] Do not perform a tail call when the caller returns several values. The fix is slightly different then x86 (see r216117) because the number of values attached to a return can vary even for a single returned value (e.g., f64 yields two returned values). <rdar://problem/18352998> llvm-svn: 218076	2014-09-18 21:17:50 +00:00
Robin Morisset	5349e8e532	Restore "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors" Summary: This patch was originally in D5304 (I could not find a way to reopen that revision). It was accepted, commited and broke the build bots because the overloading of the constructor of ArrayRef for braced initializer lists is not supported by all toolchains. I then reverted it, and propose this fixed version that uses a plain C array instead in makeDMB (that array is then converted implicitly to an ArrayRef, but that is not behind an ifdef). Could someone confirm me whether initialization lists for plain C arrays are supported by every toolchain used to build llvm ? Otherwise I can just initialize the array in the old way: args[0] = ...; .. ; args[5] = ...; Below is the description of the original patch: ``` I had only tested this code for ARMv7 and ARMv8. This patch adds several fallback paths if the processor does not support dmb ish: - dmb sy if a cortex-M with support for dmb - mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB) These fallback paths were chosen based on the code for fence seq_cst. Thanks to luqmana for having noticed this bug. ``` Test Plan: Added more cases to atomic-load-store.ll + make check-all Reviewers: jfb, t.p.northover, luqmana Subscribers: llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D5386 llvm-svn: 218066	2014-09-18 18:56:04 +00:00
Robin Morisset	bf26f8fd56	Revert "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors" It is breaking the build on the buildbots but works fine on my machine, I revert while trying to understand what happens (it appears to depend on the compiler used to build, I probably used a C++11 feature that is not perfectly supported by some of the buildbots). This reverts commit feb3176c4d006f99af8b40373abd56215a90e7cc. llvm-svn: 217973	2014-09-17 18:09:13 +00:00
Robin Morisset	1c8a457575	[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors Summary: I had only tested this code for ARMv7 and ARMv8. This patch adds several fallback paths if the processor does not support dmb ish: - dmb sy if a cortex-M with support for dmb - mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB) These fallback paths were chosen based on the code for fence seq_cst. Thanks to luqmana for having noticed this bug. Test Plan: Added more cases to atomic-load-store.ll + make check-all Reviewers: jfb, t.p.northover, luqmana Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5304 llvm-svn: 217965	2014-09-17 17:41:16 +00:00
Robin Morisset	25c8e318e4	[X86] Use the generic AtomicExpandPass instead of X86AtomicExpandPass This required a new hook called hasLoadLinkedStoreConditional to know whether to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to CmpXchg (X86). Apart from that, the new code in AtomicExpandPass is mostly moved from X86AtomicExpandPass. The main result of this patch is to get rid of that pass, which had lots of code duplicated with AtomicExpandPass. llvm-svn: 217928	2014-09-17 00:06:58 +00:00
Aaron Ballman	169eeb913d	Silencing a usually-helpful-but-braindead-silly-in-this-case sign mismatch warning with MSVC. NFC. llvm-svn: 217143	2014-09-04 11:52:24 +00:00
Robin Morisset	ed3d48f161	Refactor AtomicExpandPass and add a generic isAtomic() method to Instruction Summary: Split shouldExpandAtomicInIR() into different versions for Stores/Loads/RMWs/CmpXchgs. Makes runOnFunction cleaner (no more redundant checking/casting), and will help moving the X86 backend to this pass. This requires a way of easily detecting which instructions are atomic. I followed the pattern of mayReadFromMemory, mayWriteOrReadMemory, etc.. in making isAtomic() a method of Instruction implemented by a switch on the opcodes. Test Plan: make check Reviewers: jfb Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D5035 llvm-svn: 217080	2014-09-03 21:29:59 +00:00
Robin Morisset	a47cb411dc	Use target-dependent emitLeading/TrailingFence instead of the target-independent insertLeading/TrailingFence (in AtomicExpandPass) Fixes two latent bugs: - There was no fence inserted before expanded seq_cst load (unsound on Power) - There was only a fence release before seq_cst stores (again unsound, in particular on Power) It is not even clear if this is correct on ARM swift processors (where release fences are DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift as it is not clear whether it is incorrect. I would love to get documentation stating whether it is correct or not. These two bugs were not triggered because Power is not (yet) using this pass, and these behaviours happen to be (mostly?) working on ARM (although they completely butchered the semantics of the llvm IR). See: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html for an example of the problems that can be caused by the second of these bugs. I couldn't see a way of fixing these in a completely target-independent way without adding lots of unnecessary fences on ARM, hence the target-dependent parts of this patch. This patch implements the new target-dependent parts only for ARM (the default of not doing anything is enough for AArch64), other architectures will use this infrastructure in later patches. llvm-svn: 217076	2014-09-03 21:01:03 +00:00
Eric Christopher	79cc1e3ae7	Reinstate "Nuke the old JIT." Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reinstates commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 216982	2014-09-02 22:28:02 +00:00
Craig Topper	e1d1294853	Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created. llvm-svn: 216525	2014-08-27 05:25:25 +00:00
Craig Topper	4627679cec	Use range based for loops to avoid needing to re-mention SmallPtrSet size. llvm-svn: 216351	2014-08-24 23:23:06 +00:00
Chad Rosier	ad7c910ecf	Revert "ARM: improve RTABI 4.2 conformance on Linux" This reverts commit r215862 due to nightly failures. Will work on getting a reduced test case, but I wanted to get our bots green in the meantime. llvm-svn: 216325	2014-08-23 18:29:43 +00:00
Chad Rosier	d2959362fb	Revert "ARM: mark missing functions from RTABI" This reverts commit r215863. llvm-svn: 216324	2014-08-23 18:29:40 +00:00
Reid Kleckner	2d9bb65b3d	ARM / x86_64 varargs: Don't save regparms in prologue without va_start There's no need to do this if the user doesn't call va_start. In the future, we're going to have thunks that forward these register parameters with musttail calls, and they won't need these spills for handling va_start. Most of the test suite changes are adding va_start calls to existing tests to keep things working. llvm-svn: 216294	2014-08-22 21:59:26 +00:00
Jonathan Roelofs	5e98ff967b	Add a thread-model knob for lowering atomics on baremetal & single threaded systems http://reviews.llvm.org/D4984 llvm-svn: 216182	2014-08-21 14:35:47 +00:00
Oliver Stannard	51b1d460cb	[ARM] Enable DP copy, load and store instructions for FPv4-SP The FPv4-SP floating-point unit is generally referred to as single-precision only, but it does have double-precision registers and load, store and GPR<->DPR move instructions which operate on them. This patch enables the use of these registers, the main advantage of which is that we now comply with the AAPCS-VFP calling convention. This partially reverts r209650, which added some AAPCS-VFP support, but did not handle return values or alignment of double arguments in registers. This patch also adds tests for Thumb2 code generation for floating-point instructions and intrinsics, which previously only existed for ARM. llvm-svn: 216172	2014-08-21 12:50:31 +00:00
Yi Kong	c655f0c898	ARM: Fix codegen for rbit intrinsic LLVM generates illegal `rbit r0, #352` instruction for rbit intrinsic. According to ARM ARM, rbit only takes register as argument, not immediate. The correct instruction should be rbit <Rd>, <Rm>. The bug was originally introduced in r211057. Differential Revision: http://reviews.llvm.org/D4980 llvm-svn: 216064	2014-08-20 10:40:20 +00:00
Robin Morisset	b155f529fc	Make use of isAtLeastRelease/Acquire in the ARM/AArch64 backends Summary: Make use of isAtLeastRelease/Acquire in the ARM/AArch64 backends These helper functions are introduced in D4844. Depends D4844 Test Plan: make check-all passes Reviewers: jfb Subscribers: aemerson, llvm-commits, mcrosier, reames Differential Revision: http://reviews.llvm.org/D4937 llvm-svn: 215902	2014-08-18 16:48:58 +00:00
Oliver Stannard	12993dd916	[ARM,AArch64] Do not tail-call to an externally-defined function with weak linkage Externally-defined functions with weak linkage should not be tail-called on ARM or AArch64, as the AAELF spec requires normal calls to undefined weak functions to be replaced with a NOP or jump to the next instruction. The behaviour of branch instructions in this situation (as used for tail calls) is implementation-defined, so we cannot rely on the linker replacing the tail call with a return. llvm-svn: 215890	2014-08-18 12:42:15 +00:00
Saleem Abdulrasool	3fd996ef5c	ARM: mark missing functions from RTABI Simply indicate the functions that are part of the runtime library that we do not setup libcalls for. This is merely for ease of identification. NFC. llvm-svn: 215863	2014-08-17 22:51:04 +00:00
Saleem Abdulrasool	017bd57fce	ARM: improve RTABI 4.2 conformance on Linux The set of functions defined in the RTABI was separated for no real reason. This brings us closer to proper utilisation of the functions defined by the RTABI. It also sets the ground for correctly emitting function calls to AEABI functions on all AEABI conforming platforms. The previously existing lie on the behaviour of __ldivmod and __uldivmod is propagated as it is beyond the scope of the change. The changes to the test are due to the fact that we now use the divmod functions which return both the quotient and remainder and thus we no longer need to invoke two functions on Linux (making it closer to EABI's behaviour). llvm-svn: 215862	2014-08-17 22:51:02 +00:00
Saleem Abdulrasool	740be89f51	ARM: whitespace Whitespace fix, NFC. llvm-svn: 215861	2014-08-17 22:50:59 +00:00
Robin Morisset	d18cda620c	Fix typos in comments llvm-svn: 215777	2014-08-15 22:17:28 +00:00
Chad Rosier	b1bbf6f8ce	[AArch32] Add support for FP rounding operations for ARMv8/AArch32. Phabricator Revision: http://reviews.llvm.org/D4935 llvm-svn: 215772	2014-08-15 21:38:16 +00:00
Justin Bogner	c0087f3611	IR: Print a newline when dumping Types Type::dump() doesn't print a newline, which makes for a poor experience in a debugger. This looks like it was an ommission considering Value::dump() two lines above, so I've changed Type to add a newline as well. Of the two in-tree callers, one added a newline anyway, and I've updated the other one to use Type::print instead. llvm-svn: 215421	2014-08-12 03:24:59 +00:00
Oliver Stannard	11790b2dac	ARM: __gnu_h2f_ieee and __gnu_f2h_ieee always use the soft-float calling convention By default, LLVM uses the "C" calling convention for all runtime library functions. The half-precision FP conversion functions use the soft-float calling convention, and are needed for some targets which use the hard-float convention by default, so must have their calling convention explicitly set. llvm-svn: 215348	2014-08-11 09:12:32 +00:00
Eric Christopher	b9fd9ed37e	Temporarily Revert "Nuke the old JIT." as it's not quite ready to be deleted. This will be reapplied as soon as possible and before the 3.6 branch date at any rate. Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reverts commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 215154	2014-08-07 22:02:54 +00:00
Rafael Espindola	f8b27c41e8	Nuke the old JIT. I am sure we will be finding bits and pieces of dead code for years to come, but this is a good start. Thanks to Lang Hames for making MCJIT a good replacement! llvm-svn: 215111	2014-08-07 14:21:18 +00:00
Eric Christopher	b5217507c7	Remove the target machine from CCState. Previously it was only used to get the subtarget and that's accessible from the MachineFunction now. This helps clear the way for smaller changes where we getting a subtarget will require passing in a MachineFunction/Function as well. llvm-svn: 214988	2014-08-06 18:45:26 +00:00
Tim Northover	2a417b96d4	ARM: do not generate BLX instructions on Cortex-M CPUs. Particularly on MachO, we were generating "blx _dest" instructions on M-class CPUs, which don't actually exist. They happen to get fixed up by the linker into valid "bl _dest" instructions (which is why such a massive issue has remained largely undetected), but we shouldn't rely on that. llvm-svn: 214959	2014-08-06 11:13:14 +00:00
Tim Northover	d4d294dd51	ARM-MachO: materialize callee address correctly on v4t. llvm-svn: 214958	2014-08-06 11:13:06 +00:00
Jonathan Roelofs	ef84bda531	Re-apply r214881: Fix return sequence on armv4 thumb This reverts r214893, re-applying r214881 with the test case relaxed a bit to satiate the build bots. POP on armv4t cannot be used to change thumb state (unilke later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214928	2014-08-05 21:32:21 +00:00
Jonathan Roelofs	064eb5a177	Revert r214881 because it broke lots of build-bots llvm-svn: 214893	2014-08-05 17:36:05 +00:00
Jonathan Roelofs	f5fad3767b	Fix return sequence on armv4 thumb POP on armv4t cannot be used to change thumb state (unilke later non-m-class architectures), therefore we need a different return sequence that uses 'bx' instead: POP {r3} ADD sp, #offset BX r3 This patch also fixes an issue where the return value in r3 would get clobbered for functions that return 128 bits of data. In that case, we generate this sequence instead: MOV ip, r3 POP {r3} ADD sp, #offset MOV lr, r3 MOV r3, ip BX lr http://reviews.llvm.org/D4748 llvm-svn: 214881	2014-08-05 17:13:17 +00:00
Eric Christopher	d913448b38	Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. llvm-svn: 214781	2014-08-04 21:25:23 +00:00
Chandler Carruth	3707dda904	[SDAG] Let the DAG combiner take care of dead nodes rather than manually deleting them. This already seems to work, as no tests fail without this. llvm-svn: 214601	2014-08-02 00:19:10 +00:00
Louis Gerbarg	67474e3755	Make sure no loads resulting from load->switch DAGCombine are marked invariant Currently when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load the new load inherits the isInvariant flag of the left side. This is incorrect since invariant loads can be reordered in cases where it is illegal to reoarder normal loads. This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load. llvm-svn: 214449	2014-07-31 21:45:05 +00:00
Tim Northover	4e13a61413	ARM: add __aeabi_d2h for truncation on AEABI systems ARM does actually define the name for this conversion, so we should use it on "-eabi" platforms. llvm-svn: 214176	2014-07-29 09:56:45 +00:00
Matt Arsenault	6f2a526101	Add alignment value to allowsUnalignedMemoryAccess Rename to allowsMisalignedMemoryAccess. On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code. llvm-svn: 214055	2014-07-27 17:46:40 +00:00
Akira Hatanaka	e5b6e0d231	[stack protector] Fix a potential security bug in stack protector where the address of the stack guard was being spilled to the stack. Previously the address of the stack guard would get spilled to the stack if it was impossible to keep it in a register. This patch introduces a new target independent node and pseudo instruction which gets expanded post-RA to a sequence of instructions that load the stack guard value. Register allocator can now just remat the value when it can't keep it in a register. <rdar://problem/12475629> llvm-svn: 213967	2014-07-25 19:31:34 +00:00
Hal Finkel	cc39b67530	AA metadata refactoring (introduce AAMDNodes) In order to enable the preservation of noalias function parameter information after inlining, and the representation of block-level __restrict__ pointer information (etc.), additional kinds of aliasing metadata will be introduced. This metadata needs to be carried around in AliasAnalysis::Location objects (and MMOs at the SDAG level), and so we need to generalize the current scheme (which is hard-coded to just one TBAA MDNode). This commit introduces only the necessary refactoring to allow for the introduction of other aliasing metadata types, but does not actually introduce any (that will come in a follow-up commit). What it does introduce is a new AAMDNodes structure to hold all of the aliasing metadata nodes associated with a particular memory-accessing instruction, and uses that structure instead of the raw MDNode in AliasAnalysis::Location, etc. No functionality change intended. llvm-svn: 213859	2014-07-24 12:16:19 +00:00
Saleem Abdulrasool	c4e00289a7	ARM: correct WoA __builtin_alloca handling on O0 When performing a dynamic stack adjustment without optimisations, we would mark SP as def and R4 as kill. This occurred as part of the expansion of a WIN__CHKSTK SDNode which indicated the proper handling of SP and R4. The result would be that we would double define SP as part of an operation, which is obviously incorrect. Furthermore, the VTList for the chain had an incorrect parameter type of i32 instead of Other. Correct these to permit proper lowering of __builtin_alloca at -O0. llvm-svn: 213442	2014-07-19 01:29:51 +00:00
Tim Northover	4e80b584fe	ARM: support legalisation of "fptrunc ... to half" operations. llvm-svn: 213373	2014-07-18 13:01:19 +00:00
Tim Northover	53f3bcf0e9	ARM: support direct f16 <-> f64 conversions ARMv8 has instructions to handle it, otherwise a libcall is needed. llvm-svn: 213254	2014-07-17 11:27:04 +00:00
Tim Northover	fd7e424935	CodeGen: extend f16 conversions to permit types > float. This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too. During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here. Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here. llvm-svn: 213248	2014-07-17 10:51:23 +00:00
Saleem Abdulrasool	763f9a50a5	ARM: properly lower dllimport'ed global values This completes the handling for DLL import storage symbols when lowering instructions. A DLL import storage symbol must have an additional load performed prior to use. This is applicable to variables and functions. This is particularly important for non-function symbols as it is possible to handle function references by emitting a thunk which performs the translation from the unprefixed __imp_ symbol to the proper symbol (although, this is a non-optimal lowering). For a variable symbol, no such thunk can be accommodated. llvm-svn: 212431	2014-07-07 05:18:35 +00:00
Eric Christopher	c1058df66f	Move function dependent resetting of a subtarget variable out of the subtarget. This involved having the movt predicate take the current function - since we care about size in instruction selection for whether or not to use movw/movt take the function so we can check the attributes. This required adding the current MachineFunction to FastISel and propagating through. llvm-svn: 212309	2014-07-04 01:55:26 +00:00
Juergen Ributzka	3bd03c7099	[DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI. The argument list vector is never used after it has been passed to the CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and pretty much as cheap as keeping a pointer to it. llvm-svn: 212135	2014-07-01 22:01:54 +00:00
Christian Pirker	6f81e75dab	ARMEB: Vector extend operations Reviewed at http://reviews.llvm.org/D4043 llvm-svn: 211520	2014-06-23 18:05:53 +00:00
Alp Toker	1d099d9339	Fix typos llvm-svn: 211304	2014-06-19 19:41:26 +00:00
Jim Grosbach	07393ba31b	ARM: intrinsic support for rbit. We already have an ARMISD node. Create an intrinsic to map to it so we can add support for the frontend __rbit() intrinsic. rdar://9283021 llvm-svn: 211057	2014-06-16 21:55:30 +00:00
Tim Northover	b45c3b74b4	ARM: implement correct atomic operations on v7M ARM v7M has ldrex/strex but not ldrexd/strexd. This means 32-bit operations should work as normal, but 64-bit ones are almost certainly doomed. Patch by Phoebe Buckheister. llvm-svn: 211042	2014-06-16 18:49:36 +00:00
Christian Pirker	2cc1cf0d4b	ARMEB: Fix trunc store for vector types Reviewed at http://reviews.llvm.org/D4135 llvm-svn: 211010	2014-06-16 09:17:30 +00:00
Oliver Stannard	b5e596f7c3	ARM: Fix fastcc calling convention for Thumb1 When targetting Thumb1 on a processor which has a VFP unit (which is not accessible from Thumb1), we were converting the fastcc calling convention to AAPCS-VFP, which is not possible. llvm-svn: 210889	2014-06-13 08:33:03 +00:00
Saleem Abdulrasool	abac6e92a0	ARM: add VLA extension for WoA Itanium ABI The armv7-windows-itanium environment is nearly identical to the MSVC ABI. It has a few divergences, mostly revolving around the use of the Itanium ABI for C++. VLA support is one of the extensions that are amongst the set of the extensions. This adds support for proper VLA emission for this environment. This is somewhat similar to the handling for __chkstk emission on X86 and the large stack frame emission for ARM. The invocation style for chkstk is still controlled via the -mcmodel flag to clang. Make an explicit note that this is an extension. llvm-svn: 210489	2014-06-09 20:18:42 +00:00
Saleem Abdulrasool	90386ad60a	ARM: correct assertion for long-calls on WoA COFF/PE, so the relocation model is never static. Loosen the assertion accordingly. The relocation can still be emitted properly, as it will be converted to an IMAGE_REL_ARM_ADDR32 which will be resolved by the loader taking the base relocation into account. This is necessary to permit the emission of long calls which can be controlled via the -mlong-calls option in the driver. llvm-svn: 210399	2014-06-07 20:29:27 +00:00
Christian Pirker	762b2c624f	ARMEB: Fix function return type f64 Reviewed at http://reviews.llvm.org/D3968 llvm-svn: 209990	2014-06-01 09:30:52 +00:00
Eric Christopher	8995833a34	Have the TLOF creation take a Triple rather than needing a subtarget. llvm-svn: 209937	2014-05-31 00:07:32 +00:00
Tim Northover	4f1909f1da	ARM: teach AAPCS-VFP to deal with Cortex-M4. Cortex-M4 only has single-precision floating point support, so any LLVM "double" type will have been split into 2 i32s by now. Fortunately, the consecutive-register framework turns out to be precisely what's needed to reconstruct the double and follow AAPCS-VFP correctly! rdar://problem/17012966 llvm-svn: 209650	2014-05-27 10:43:38 +00:00
Benjamin Kramer	f3ad23551d	SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not. - On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though. - On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal. - On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled. llvm-svn: 209123	2014-05-19 13:12:38 +00:00
Saleem Abdulrasool	8bfb192ecd	ARM: make libcall setup more table driven Rather than create a series of function calls to setup the library calls, create a table with the information and just use the table to drive the configuration of the library calls. This makes it easier to both inspect the list as well as to modify it. NFC. llvm-svn: 209089	2014-05-18 16:39:11 +00:00
Saleem Abdulrasool	f11f4b4e20	ARM: consolidate frame pointer register knowledge Use the ARMBaseRegisterInfo to query the frame register. The base register info is aware of the frame register that is used for the frame pointer. Use that to determine the frame register rather than duplicating the knowledge. Although, the code path is slightly different in that it may return SP, that can only occur if the frame pointer has been omitted in the machine function, which is supposed to contain the desired value in that case. llvm-svn: 209084	2014-05-18 03:18:09 +00:00
Saleem Abdulrasool	f3a5a5c546	Target: remove old constructors for CallLoweringInfo This is mostly a mechanical change changing all the call sites to the newer chained-function construction pattern. This removes the horrible 15-parameter constructor for the CallLoweringInfo in favour of setting properties of the call via chained functions. No functional change beyond the removal of the old constructors are intended. llvm-svn: 209082	2014-05-17 21:50:17 +00:00
Saleem Abdulrasool	46fed305db	ARM: use the proper target object format for WoA WoA uses COFF, not ELF. ARMISelLowering::createTLOF would previously return ELF for any non-MachO platform. This was a missed site when the original change for target format support for Windows on ARM was done. llvm-svn: 209057	2014-05-17 04:28:08 +00:00
Saleem Abdulrasool	056fc3da4a	ARM: add some integer/floating point conversion libcalls Add some Windows on ARM specific library calls. These are provided by msvcrt, and can be used to perform integer to floating-point conversions (and vice-versa) mirroring similar functions in the RTABI. llvm-svn: 208949	2014-05-16 05:41:33 +00:00
Jay Foad	a0653a3e6c	Rename ComputeMaskedBits to computeKnownBits. "Masked" has been inappropriate since it lost its Mask parameter in r154011. llvm-svn: 208811	2014-05-14 21:14:37 +00:00
Christian Pirker	6692e7c116	ARM-BE: test files for vector argument passing Reviewed at http://reviews.llvm.org/D3766 llvm-svn: 208793	2014-05-14 16:59:44 +00:00
Christian Pirker	238c7c165b	ARM: Implement big endian bit-conversion for NEON type llvm-svn: 208538	2014-05-12 11:19:20 +00:00
Hal Finkel	f0e086a0bc	Pass the value type to TLI::getRegisterByName We must validate the value type in TLI::getRegisterByName, because if we don't and the wrong type was used with the IR intrinsic, then we'll assert (because we won't be able to find a valid register class with which to construct the requested copy operation). For PPC64, additionally, the type information is necessary to decide between the 64-bit register and the 32-bit subregister. No functionality change. llvm-svn: 208508	2014-05-11 19:29:07 +00:00
Louis Gerbarg	3342bf1451	Add custom lowering for add/sub with overflow intrinsics to ARM This patch adds support to ARM for custom lowering of the llvm.{u\|s}add.with.overflow.i32 intrinsics for i32/i64. This is particularly useful for handling idiomatic saturating math functions as generated by InstCombineCompare. Test cases included. rdar://14853450 llvm-svn: 208435	2014-05-09 17:02:49 +00:00
Oliver Stannard	c24f2171ca	ARM: HFAs must be passed in consecutive registers When using the ARM AAPCS, HFAs (Homogeneous Floating-point Aggregates) must be passed in a block of consecutive floating-point registers, or on the stack. This means that unused floating-point registers cannot be back-filled with part of an HFA, however this can currently happen. This patch, along with the corresponding clang patch (http://reviews.llvm.org/D3083) prevents this. llvm-svn: 208413	2014-05-09 14:01:47 +00:00
Saleem Abdulrasool	40bca0afab	ARM: support PIC on Windows on ARM Handle lowering of global addresses for PIC mode compilation on Windows. Always use the movw/movt load to load the address as Windows on ARM requires ARMv7+ and is a pure Thumb environment. llvm-svn: 208385	2014-05-09 00:58:32 +00:00
Christian Pirker	b5728191c2	ARM big endian function argument passing llvm-svn: 208316	2014-05-08 14:06:24 +00:00
Renato Golin	c7aea40ec6	Implememting named register intrinsics This patch implements the infrastructure to use named register constructs in programs that need access to specific registers (bare metal, kernels, etc). So far, only the stack pointer is supported as a technology preview, but as it is, the intrinsic can already support all non-allocatable registers from any architecture. llvm-svn: 208104	2014-05-06 16:51:25 +00:00
Craig Topper	64941d9786	Convert SelectionDAG::getMergeValues to use ArrayRef. llvm-svn: 207374	2014-04-27 19:20:57 +00:00
Craig Topper	206fcd450a	Convert getMemIntrinsicNode to take ArrayRef of SDValue instead of pointer and size. llvm-svn: 207329	2014-04-26 19:29:41 +00:00
Craig Topper	48d114bed1	Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>. llvm-svn: 207327	2014-04-26 18:35:24 +00:00
Benjamin Kramer	4dae598bc8	DAGCombiner: Turn divs of vector splats into vectorized multiplications. Otherwise the legalizer would just scalarize everything. Support for mulhi in the targets isn't that great yet so on most targets we get exactly the same scalarized output. Add a test for x86 vector udiv. I had to disable the mulhi nodes on ARM because there aren't any patterns for it. As far as I know ARM has instructions for getting the high part of a multiply so this should be fixed. llvm-svn: 207315	2014-04-26 12:06:28 +00:00
Craig Topper	062a2baef0	[C++] Use 'nullptr'. Target edition. llvm-svn: 207197	2014-04-25 05:30:21 +00:00
Reid Kleckner	5772b77789	Add 'musttail' marker to call instructions This is similar to the 'tail' marker, except that it guarantees that tail call optimization will occur. It also comes with convervative IR verification rules that ensure that tail call optimization is possible. Reviewers: nicholas Differential Revision: http://llvm-reviews.chandlerc.com/D3240 llvm-svn: 207143	2014-04-24 20:14:34 +00:00
Tim Northover	978d25f391	ARM: disable emission of __XYZvfp in soft-float environment. The point of these calls is to allow Thumb-1 code to make use of the VFP unit to perform its operations. This is not desirable with -msoft-float, since most of the reasons you'd want that apply equally to the runtime library. rdar://problem/13766161 llvm-svn: 206874	2014-04-22 10:10:09 +00:00
Chandler Carruth	84e68b2994	[Modules] Fix potential ODR violations by sinking the DEBUG_TYPE definition below all of the header #include lines, lib/Target/... edition. llvm-svn: 206842	2014-04-22 02:41:26 +00:00
Tim Northover	037f26f212	Atomics: promote ARM's IR-based atomics pass to CodeGen. Still only 32-bit ARM using it at this stage, but the promotion allows direct testing via opt and is a reasonably self-contained patch on the way to switching ARM64. At this point, other targets should be able to make use of it without too much difficulty if they want. (See ARM64 commit coming soon for an example). llvm-svn: 206485	2014-04-17 18:22:47 +00:00
Craig Topper	abb4ac7f87	Convert SelectionDAG::getVTList to use ArrayRef llvm-svn: 206357	2014-04-16 06:10:51 +00:00
Craig Topper	840beec2d0	Make consistent use of MCPhysReg instead of uint16_t throughout the tree. llvm-svn: 205610	2014-04-04 05:16:06 +00:00
Jim Grosbach	1a59711505	Tidy up. Trailing whitespace. llvm-svn: 205583	2014-04-03 23:43:18 +00:00
Tim Northover	01b4aa9437	ARM: tell LLVM about zext properties of ldrexb/ldrexh Implementing this via ComputeMaskedBits has two advantages: + It actually works. DAGISel doesn't deal with the chains properly in the previous pattern-based solution, so they never trigger. + The information can be used in other DAG combines, as well as the trivial "get rid of truncs". For example if the trunc is in a different basic block. rdar://problem/16227836 llvm-svn: 205540	2014-04-03 15:10:35 +00:00
Tim Northover	c882eb0723	ARM: expand atomic ldrex/strex loops in IR The previous situation where ATOMIC_LOAD_WHATEVER nodes were expanded at MachineInstr emission time had grown to be extremely large and involved, to account for the subtly different code needed for the various flavours (8/16/32/64 bit, cmpxchg/add/minmax). Moving this transformation into the IR clears up the code substantially, and makes future optimisations much easier: 1. an atomicrmw followed by using the new value can be more efficient. As an IR pass, simple CSE could handle this efficiently. 2. Making use of cmpxchg success/failure orderings only has to be done in one (simpler) place. 3. The common "cmpxchg; did we store?" idiom can be exposed to optimisation. I intend to gradually improve this situation within the ARM backend and make sure there are no hidden issues before moving the code out into CodeGen to be shared with (at least ARM64/AArch64, though I think PPC & Mips could benefit too). llvm-svn: 205525	2014-04-03 11:44:58 +00:00
Silviu Baranga	a3106e6847	[ARM] When generating a vpaddl node the input lane type is not always the type of the add operation since extract_vector_elt can perform an extend operation. Get the input lane type from the vector on which we're performing the vpaddl operation on and extend or truncate it to the output type of the original add node. llvm-svn: 205523	2014-04-03 10:44:27 +00:00
Saleem Abdulrasool	cd1308296e	ARM: update subtarget information for Windows on ARM Update the subtarget information for Windows on ARM. This enables using the MC layer to target Windows on ARM. llvm-svn: 205459	2014-04-02 20:32:05 +00:00
Oliver Stannard	b14c625111	ARM: Add support for segmented stacks Patch by Alex Crichton, ILyoan, Luqman Aden and Svetoslav. llvm-svn: 205430	2014-04-02 16:10:33 +00:00
Tim Northover	1ff5f29fb5	ARM: add intrinsics for the v8 ldaex/stlex We've already got versions without the barriers, so this just adds IR-level support for generating the new v8 ones. rdar://problem/16227836 llvm-svn: 204813	2014-03-26 14:39:31 +00:00
Arnaud A. de Grandmaison	1182600f20	ARM: no need to update SplatBits as it is not used llvm-svn: 204575	2014-03-23 21:14:32 +00:00
Craig Topper	a9253267a9	Prune includes in ARM target. llvm-svn: 204548	2014-03-22 23:51:00 +00:00
Saleem Abdulrasool	0d96f3dd6e	ARM: honour -f{no-,}optimize-sibling-calls Use the options in the ARMISelLowering to control whether tail calls are optimised or not. Previously, this option was entirely ignored on the ARM target and only honoured on x86. This option is mostly useful in profiling scenarios. The default remains that tail call optimisations will be applied. llvm-svn: 203577	2014-03-11 15:09:54 +00:00
Saleem Abdulrasool	b720a6bab7	ARM: remove ancient -arm-tail-calls option This option is from 2010, designed to work around a linker issue on Darwin for ARM. According to grosbach this is no longer an issue and this option can safely be removed. llvm-svn: 203576	2014-03-11 15:09:49 +00:00
Tim Northover	445dd58aae	ARM: simplify EmitAtomicBinary64 ATOMIC_STORE operations always get here as a lowered ATOMIC_SWAP, so there's no need for any code to handle them specially. There should be no functionality change so no tests. llvm-svn: 203567	2014-03-11 13:19:55 +00:00
Tim Northover	e94a518a22	IR: add a second ordering operand to cmpxhg for failure The syntax for "cmpxchg" should now look something like: cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic where the second ordering argument gives the required semantics in the case that no exchange takes place. It should be no stronger than the first ordering constraint and cannot be either "release" or "acq_rel" (since no store will have taken place). rdar://problem/15996804 llvm-svn: 203559	2014-03-11 10:48:52 +00:00
Oliver Stannard	d55e115b58	ARM: Correctly align arguments after a byval struct is passed on the stack llvm-svn: 202985	2014-03-05 15:25:27 +00:00
Benjamin Kramer	b6d0bd48bd	[C++11] Replace llvm::next and llvm::prior with std::next and std::prev. Remove the old functions. llvm-svn: 202636	2014-03-02 12:27:27 +00:00
Mark Seaborn	be266aa325	Use 16 byte stack alignment for NaCl on ARM NaCl's ARM ABI uses 16 byte stack alignment, so set that in ARMSubtarget.cpp. Using 16 byte alignment exposes an issue in code generation in which a varargs function leaves a 4 byte gap between the values of r1-r3 saved to the stack and the following arguments that were passed on the stack. (Previously, this code only needed to support 4 byte and 8 byte alignment.) With this issue, llc generated: varargs_func: sub sp, sp, #16 push {lr} sub sp, sp, #12 add r0, sp, #16 // Should be 20 stm r0, {r1, r2, r3} ldr r0, .LCPI0_0 // Address of va_list add r1, sp, #16 str r1, [r0] bl external_func Fix the bug by checking for "Align > 4". Also simplify the code by using OffsetToAlignment(), and update comments. Differential Revision: http://llvm-reviews.chandlerc.com/D2677 llvm-svn: 201497	2014-02-16 18:59:48 +00:00
Tim Northover	b0430415e6	ARM: use natural LLVM IR for vshll instructions Similarly to the vshrn instructions, these are simple zext/sext + trunc operations. Using normal LLVM IR should allow for better code, and more sharing with the AArch64 backend. llvm-svn: 201093	2014-02-10 16:20:29 +00:00

... 2 3 4 5 6 ...

1339 Commits