llvm-project

Commit Graph

Author	SHA1	Message	Date
Ahmed Bougacha	1ffe7c7d36	[AArch64] Promote f16 operations to f32. For the most common ones (such as fadd), we already did the promotion. Do the same thing for all the others. Currently, we'll just crash/assert on all these operations, as there's no hardware or libcall support whatsoever. f16 (half) is specified as an interchange - not arithmetic - format, and is expected to be promoted to single-precision for arithmetic operations. While there, teach the legalizer about promoting some of the (mostly floating-point) operations that we never needed before. Differential Revision: http://reviews.llvm.org/D8648 See related discussion on the thread for: http://reviews.llvm.org/D8755 llvm-svn: 234550	2015-04-10 00:08:48 +00:00
Lang Hames	522bf13b83	[AArch64] Remove redundant -march option. Also fix a think-o from r234462. llvm-svn: 234467	2015-04-09 05:34:57 +00:00
Lang Hames	903338511b	[AArch64] Teach AArch64TargetLowering::getOptimalMemOpType to consider alignment restrictions when choosing a type for small-memcpy inlining in SelectionDAGBuilder. This ensures that the loads and stores output for the memcpy won't be further expanded during legalization, which would cause the total number of instructions for the memcpy to exceed (often significantly) the inlining thresholds. <rdar://problem/17829180> llvm-svn: 234462	2015-04-09 03:40:33 +00:00
Matthias Braun	b6ac8fa39e	AArch64: Don't lower ISD::SELECT to ISD::SELECT_CC Instead of lowering SELECT to SELECT_CC which is further lowered later immediately call the SELECT_CC lowering code. This is preferable because: - Avoids an unnecessary roundtrip through the legalization queues with an intermediate node. - More importantly: Lowered operations get visited last leading to SELECT_CC getting visited with legalized operands and unlegalized ones for preexisting SELECT_CC nodes. This does not hurt the current code (hence no testcase) but is required for another patch I am working on. Differential Revision: http://reviews.llvm.org/D8187 llvm-svn: 234334	2015-04-07 17:33:05 +00:00
Quentin Colombet	6843ac470b	[AArch64] Enable the codegenprepare optimization that promotes operation to form extended loads. Implement the related target lowering hook so that the optimization has a better estimation of the cost of an extension. rdar://problem/19267165 llvm-svn: 233753	2015-03-31 20:52:32 +00:00
David Blaikie	186d2cbd1d	Refactor: Simplify boolean expressions in AArch64 target Simplify boolean expressions using `true` and `false` with `clang-tidy` Patch by Richard Thomson. Reviewed By: rengolin Differential Revision: http://reviews.llvm.org/D8525 llvm-svn: 233089	2015-03-24 16:24:01 +00:00
Ahmed Bougacha	e6bb09ac3f	[AArch64] Prefer UZP for concat_vector of illegal truncs. Follow-up to r232459: prefer a UZP shuffle to the intermediate truncs. llvm-svn: 232871	2015-03-21 01:08:39 +00:00
Pirama Arumuga Nainar	12aeefc63b	Fix bug while building FP16 constant vectors for AArch64 Summary: Building FP16 constant vectors caused the FP16 data to be bitcast to i64. This patch creates a BITCAST node with the correct value, and adds a test to verify correct handling. Reviewers: mcrosier Reviewed By: mcrosier Subscribers: mcrosier, jmolloy, ab, srhines, llvm-commits, rengolin, aemerson Differential Revision: http://reviews.llvm.org/D8369 llvm-svn: 232562	2015-03-17 23:10:29 +00:00
NAKAMURA Takumi	c085eca176	Appease AArch64ISelLowering.cpp miscompiled by g++-4.7.2. I will revert this when 4.7.3 is ready. llvm-svn: 232561	2015-03-17 22:55:01 +00:00
Ahmed Bougacha	e0afb1fe6c	[AArch64] Use intermediate step for concat_vectors of illegal truncs. Optimize concat_vectors of truncated vectors, where the intermediate type is illegal, to avoid said illegality, e.g., (v4i16 (concat_vectors (v2i16 (truncate (v2i64))), (v2i16 (truncate (v2i64))))) -> (v4i16 (truncate (v4i32 (concat_vectors (v2i32 (truncate (v2i64))), (v2i32 (truncate (v2i64))))))) This isn't really target-specific, and, as such, would best go in the DAGCombiner. However, ISD::TRUNCATE legality isn't keyed on both input and result type, so we might generate worse code when we don't know better. On AArch64 we know it's fine for v2i64->v4i16 and v4i32->v8i8. rdar://20022387 llvm-svn: 232459	2015-03-17 03:23:09 +00:00
Ahmed Bougacha	e33e6c979c	[AArch64] Factor out N->getOperand()s; format. NFCI. llvm-svn: 232458	2015-03-17 03:19:18 +00:00
Eric Christopher	9deb75d176	Have getCallPreservedMask and getThisCallPreservedMask take a MachineFunction argument so that we can grab subtarget specific features off of it. llvm-svn: 231979	2015-03-11 22:42:13 +00:00
Ahmed Bougacha	fab5892f8b	[AArch64] Avoid going through GPRs for across-vector instructions. This adds new node types for each intrinsic. For instance, for addv, we have AArch64ISD::UADDV, such that: (v4i32 (uaddv ...)) is the same as (v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...)))) that is, (v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)), (i32 (int_aarch64_neon_uaddv ...)), ssub) In a combine, we transform all such across-vector-lanes intrinsics to: (i32 (extract_vector_elt (uaddv ...), 0)) This has one big advantage: by making the extract_element explicit, we enable the existing patterns for lane-aware instructions to fire. This lets us avoid needlessly going through the GPRs. Consider: uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) { return vmulq_n_u32(a, vaddvq_u32(b)); } We now generate: addv.4s s1, v1 mul.4s v0, v0, v1[0] instead of the previous: addv.4s s1, v1 fmov w8, s1 dup.4s v1, w8 mul.4s v0, v1, v0 rdar://20044838 llvm-svn: 231840	2015-03-10 20:45:38 +00:00
Benjamin Kramer	57a3d084cd	Make static variables const if possible. Makes them go into a read-only section. Or fold them into an initializer list which has the same effect. NFC. llvm-svn: 231598	2015-03-08 16:07:39 +00:00
JF Bastien	f14889ee34	Mutate TargetLowering::shouldExpandAtomicRMWInIR to specifically dictate how AtomicRMWInsts are expanded. Summary: In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all. This does not represent a behavioural change and as such no tests were added. Patch by: Richard Diamond. Reviewers: jfb Reviewed By: jfb Subscribers: jfb, aemerson, t.p.northover, llvm-commits Differential Revision: http://reviews.llvm.org/D7713 llvm-svn: 231250	2015-03-04 15:47:57 +00:00
Kristof Beyls	aea8461820	Fix PR22408 - LLVM producing AArch64 TLS relocations that GNU linkers cannot handle yet. As is described at http://llvm.org/bugs/show_bug.cgi?id=22408, the GNU linkers ld.bfd and ld.gold currently only support a subset of the whole range of AArch64 ELF TLS relocations. Furthermore, they assume that some of the code sequences to access thread-local variables are produced in a very specific sequence. When the sequence is not as the linker expects, it can silently mis-relax/mis-optimize the instructions. Even if that wouldn't be the case, it's good to produce the exact sequence, as that ensures that linkers can perform optimizing relaxations. This patch: * implements support for 16MiB TLS area size instead of 4GiB TLS area size. Ideally clang would grow an -mtls-size option to allow support for both, but that's not part of this patch. * by default doesn't produce local dynamic access patterns, as even modern ld.bfd and ld.gold linkers do not support the associated relocations. An option (-aarch64-elf-ldtls-generation) is added to enable generation of local dynamic code sequence, but is off by default. * makes sure that the exact expected code sequence for local dynamic and general dynamic accesses is produced, by making use of a new pseudo instruction. The patch also removes two (AArch64ISD::TLSDESC_BLR, AArch64ISD::TLSDESC_CALL) pre-existing AArch64-specific pseudo SDNode instructions that are superseded by the new one (TLSDESC_CALLSEQ). llvm-svn: 231227	2015-03-04 09:12:08 +00:00
Chad Rosier	8e38f30e49	[AArch64] When combining constant mul of -3, prefer (sub x, (shl x, N)). This change only affects codegen when the constant is -3. llvm-svn: 231085	2015-03-03 17:31:01 +00:00
Benjamin Kramer	5fbfe2ffdc	Convert push_back loops into append calls. No functionality change intended. llvm-svn: 230849	2015-02-28 13:20:15 +00:00
Eric Christopher	11e4df73c8	getRegForInlineAsmConstraint wants to use TargetRegisterInfo for a lookup, pass that in rather than use a naked call to getSubtargetImpl. This involved passing down and around either a TargetMachine or TargetRegisterInfo. Update all callers/definitions around the targets and SelectionDAG. llvm-svn: 230699	2015-02-26 22:38:43 +00:00
Eric Christopher	23a3a7c871	Remove an argument-less call to getSubtargetImpl from TargetLoweringBase. This required plumbing a TargetRegisterInfo through computeRegisterProperties and into findRepresentativeClass which uses it for register class iteration. This required passing a subtarget into a few target specific initializations of TargetLowering. llvm-svn: 230583	2015-02-26 00:00:24 +00:00
Eric Christopher	ed47b22951	Rewrite the global merge pass to be subprogram agnostic for now. It was previously using the subtarget to get values for the global offset without actually checking each function as it was generating code. Go ahead and solidify the current behavior and make the existing FIXMEs more prominent. As a note the ARM backend previously had a thumb1 and non-thumb1 set of defaults. Only the former was tested so I've changed the behavior to only use that for now. llvm-svn: 230245	2015-02-23 19:28:45 +00:00
Chad Rosier	543900539f	Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity. This patch adds the isProfitableToHoist API. For AArch64, we want to prevent a fmul from being hoisted in cases where it is more profitable to form a fmsub/fmadd. Phabricator Review: http://reviews.llvm.org/D7299 Patch by Lawrence Hu <lawrence@codeaurora.org> llvm-svn: 230241	2015-02-23 19:15:16 +00:00
Tim Northover	3b6b7ca2bc	CodeGen: convert CCState interface to using ArrayRefs Everyone except R600 was manually passing the length of a static array at each callsite, calculated in a variety of interesting ways. Far easier to let ArrayRef handle that. There should be no functional change, but out of tree targets may have to tweak their calls as with these examples. llvm-svn: 230118	2015-02-21 02:11:17 +00:00
Ahmed Bougacha	4c2b0781a5	[CodeGen] Use ArrayRef instead of std::vector&. NFC. The former lets us use SmallVectors. Do so in ARM and AArch64. llvm-svn: 229925	2015-02-19 23:13:10 +00:00
Benjamin Kramer	6cd780ff21	Prefer SmallVector::append/insert over push_back loops. Same functionality, but hoists the vector growth out of the loop. llvm-svn: 229500	2015-02-17 15:29:18 +00:00
Andrew Trick	05938a5481	AArch64: Safely handle the incoming sret call argument. This adds a safe interface to the machine independent InputArg struct for accessing the index of the original (IR-level) argument. When a non-native return type is lowered, we generate the hidden machine-level sret argument on-the-fly. Before this fix, we were representing this argument as OrigArgIndex == 0, which is an outright lie. In particular this crashed in the AArch64 backend where we actually try to access the type of the original argument. Now we use a sentinel value for machine arguments that have no original argument index. AArch64, ARM, Mips, and PPC now check for this case before accessing the original argument. Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering llvm-svn: 229413	2015-02-16 18:10:47 +00:00
Duncan P. N. Exon Smith	003bb7d96e	AArch64: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229218	2015-02-14 02:09:06 +00:00
Tim Northover	45aa89c925	ARM & AArch64: teach LowerVSETCC that output type size may differ from input. While various DAG combines try to guarantee that a vector SETCC operation will have the same output size as input, there's nothing intrinsic to either creation or LegalizeTypes that actually guarantees it, so the function needs to be ready to handle a mismatch. Fortunately this is easy enough, just extend or truncate the naturally compared result. I couldn't reproduce the failure in other backends that I know have SIMD, so it's probably only an issue for these two due to shared heritage. Should fix PR21645. llvm-svn: 228518	2015-02-08 00:50:47 +00:00
Ahmed Bougacha	df956a2e78	[AArch64] Use the source location of the IR branch when creating Bcc from a conditional branch fed by an add/sub/mul-with-overflow node. We previously used the SDLoc of the overflow node, for no good reason. In some cases, this led to the Bcc and B terminators having different source orders, and DBG_VALUEs being inserted between them. The real issue is with the code that can't handle DBG_VALUEs between terminators: the few places affected by this will be fixed soon. In the meantime, fixing the SDLoc is a positive change no matter what. No tests, as I have no idea how to get .loc emitted for branches? rdar://19347133 llvm-svn: 228463	2015-02-06 23:15:39 +00:00
Hao Liu	e0335d77c3	[AArch64] Fix PR21675, a bug in lowering llvm.ctpop.i32. We should not use "DAG.getUNDEF(MVT::v8i8)" to get an all-zero vector. Patch by Wei-cheng Wang. llvm-svn: 227550	2015-01-30 02:13:53 +00:00
Eric Christopher	905f12d96d	Remove getSubtargetImpl from AArch64ISelLowering and cache the correct subtarget by passing it in during the constructor as TargetLowering is Subtarget specific. llvm-svn: 227402	2015-01-29 00:19:42 +00:00
Sanjay Patel	08efcd9039	fix typos; NFC llvm-svn: 227386	2015-01-28 22:37:32 +00:00
Eric Christopher	6c901623c0	Migrate AArch64 except for TTI and AsmPrinter away from getSubtargetImpl. llvm-svn: 227293	2015-01-28 03:51:33 +00:00
Greg Fitzgerald	fa78d08675	[AArch64] Implement GHC calling convention Original patch by Luke Iannini. Minor improvements and test added by Erik de Castro Lopo. Differential Revision: http://reviews.llvm.org/D6877 From: Erik de Castro Lopo <erikd@mega-nerd.com> llvm-svn: 226473	2015-01-19 17:40:05 +00:00
Ahmed Bougacha	2b6917b020	[SelectionDAG] Allow targets to specify legality of extloads' result type (in addition to the memory type). The LoadExt legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421	2015-01-08 00:51:32 +00:00
Ahmed Bougacha	67dd2d25a3	[CodeGen] Use MVT iterator_ranges in legality loops. NFC intended. A few loops do trickier things than just iterating on an MVT subset, so I'll leave them be for now. Follow-up of r225387. llvm-svn: 225392	2015-01-07 21:27:10 +00:00
Saleem Abdulrasool	67f729933f	ARM: permit tail calls to weak externals on COFF Weak externals are resolved statically, so we can actually generate the tail call on PE/COFF targets without breaking the requirements. It is questionable whether we want to propagate the current behaviour for MachO as the requirements are part of the ARM ELF specifications, and it seems that prior to the SVN r215890, we would have tail'ed the call. For now, be conservative and only permit it on PE/COFF where the call will always be fully resolved. llvm-svn: 225119	2015-01-03 21:35:00 +00:00
Juergen Ributzka	2326650ceb	[AArch64] MachO large code-model: Materialize FP constants in code. In the large code model we have to first get the address of the GOT entry, load the address of the constant, and then load the constant itself. To avoid these loads and the GOT entry altogether, this commit changes how FP constants are materialized in the large code model. The constants are now materialized in a GPR and then bitconverted/moved into the FPR. Reviewed by Tim Northover Fixes rdar://problem/16572564. llvm-svn: 223941	2014-12-10 19:43:32 +00:00
Tim Northover	5e84fe3ed4	AArch64: use explicit MVT::i64 when creating EXTRACT_SUBVECTOR nodes. All our patterns use MVT::i64, but the ISelLowering nodes were inconsistent in their choice. No functional change. llvm-svn: 223551	2014-12-06 00:33:37 +00:00
Weiming Zhao	cc4bf3ff3d	[AArch64] Combining Load and IntToFp should check for neon availability llvm-svn: 223382	2014-12-04 20:25:50 +00:00
Tim Northover	293d414380	AArch64: fix wrong-endian parameter passing. The blocked arguments code didn't take account of the hacks needed to support it. llvm-svn: 223247	2014-12-03 17:49:26 +00:00
Ahmed Bougacha	d0ce058f2c	[AArch64] Don't combine "select (setcc i1 LHS, RHS), vL, vR". r208210 introduced an optimization that improves the vector select codegen by doing the setcc on vectors directly. This is a problem when the setcc operands are i1s, because the optimization would create vectors of i1, which aren't legal. Part of PR21549. Differential Revision: http://reviews.llvm.org/D6308 llvm-svn: 223075	2014-12-01 20:59:00 +00:00
Ahmed Bougacha	879463206e	[AArch64] Fix v2i8->i16 bitcast legalization. r213378 improved f16 bitcasts, so that they go directly through subregs, instead of through the stack. That code now causes an assertion failure for bitcasts from other 16-bits types (most importantly v2i8). Correct that by doing the custom lowering for i16 bitcasts only when the input is an f16. Part of PR21549. Differential Revision: http://reviews.llvm.org/D6307 llvm-svn: 223074	2014-12-01 20:52:32 +00:00
Tim Northover	3c55ccac48	AArch64: treat [N x Ty] as a block during procedure calls. The AAPCS treats small structs and homogeneous floating (or vector) aggregates specially, and guarantees they either get passed as a contiguous block of registers, or prevent any future use of those registers and get passed on the stack. This concept can fit quite neatly into LLVM's own type system, mapping an HFA to [N x float] and so on, and small structs to [N x i64]. Doing so allows front-ends to emit AAPCS compliant code without having to duplicate the register counting logic. llvm-svn: 222903	2014-11-27 21:02:42 +00:00
Hao Liu	44e5d7a131	DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor into FMULs by the reciprocal. E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip) A hook is added to allow the target to control whether it needs to perform such a combine. Reviewed in http://reviews.llvm.org/D6334 llvm-svn: 222510	2014-11-21 06:39:58 +00:00
Reid Kleckner	343c395f11	Fix more instances of -Wsentinel on Windows with s/NULL/nullptr/ Follow up to r221940, where I must not have caught em all. NFC llvm-svn: 222481	2014-11-20 23:51:47 +00:00
Weiming Zhao	7a2d15678e	[AArch64] Custom lowering of CTPOP to SIMD should check for NEON availability llvm-svn: 222292	2014-11-19 00:29:14 +00:00
Aditya Nandakumar	3053155652	We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed. llvm-svn: 221926	2014-11-13 21:29:21 +00:00
Aditya Nandakumar	a27193297f	This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively llvm-svn: 221878	2014-11-13 09:26:31 +00:00
Oliver Stannard	269a275cb4	[AArch64] Fix miscompile of comparison with 0xffffffffffffffff Some literals in the AArch64 backend had 15 'f's rather than 16, causing comparisons with a constant 0xffffffffffffffff to be miscompiled. llvm-svn: 221157	2014-11-03 15:28:40 +00:00

246 Commits