llvm-project

Commit Graph

Author	SHA1	Message	Date
Chad Rosier	1bbd7fb38e	[AArch64] Add support for generating pre- and post-index load/store pairs. llvm-svn: 248593	2015-09-25 17:48:17 +00:00
Lawrence Hu	cac0b89289	Swap loop invariant GEP with loop variant GEP to allow more LICM. This patch changes the order of GEPs generated by Splitting GEPs pass, specially when one of the GEPs has constant and the base is loop invariant, then we will generate the GEP with constant first when beneficial, to expose more cases for LICM. If originally Splitting GEP generate the following: do.body.i: %idxprom.i = sext i32 %shr.i to i64 %2 = bitcast %typeD* %s to i8* %3 = shl i64 %idxprom.i, 2 %uglygep = getelementptr i8, i8* %2, i64 %3 %uglygep7 = getelementptr i8, i8* %uglygep, i64 1032 ... Now it genereates: do.body.i: %idxprom.i = sext i32 %shr.i to i64 %2 = bitcast %typeD* %s to i8* %3 = shl i64 %idxprom.i, 2 %uglygep = getelementptr i8, i8* %2, i64 1032 %uglygep7 = getelementptr i8, i8* %uglygep, i64 %3 ... For no-loop cases, the original way of generating GEPs seems to expose more CSE cases, so we don't change the logic for no-loop cases, and only limit our change to the specific case we are interested in. llvm-svn: 248420	2015-09-23 19:25:30 +00:00
Ahmed Bougacha	07a844d758	[AArch64] Emit clrex in the expanded cmpxchg fail block. In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it. Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations". Differential Revision: http://reviews.llvm.org/D13033 llvm-svn: 248291	2015-09-22 17:21:44 +00:00
Stephen Canon	8216d88511	Don't raise inexact when lowering ceil, floor, round, trunc. The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions did raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior. This also lets us fold some logic into TD patterns, which is nice. Differential Revision: http://reviews.llvm.org/D12969 llvm-svn: 248266	2015-09-22 11:43:17 +00:00
Geoff Berry	43ec15e57e	[AArch64] Improved bitfield instruction selection. Summary: For bitfield insert OR matching, check both operands for larger pattern first before checking for smaller pattern. Add pattern for unsigned bitfield insert-in-zero done with SHL+AND. Resolves PR21631. Reviewers: jmolloy, t.p.northover Subscribers: aemerson, rengolin, llvm-commits, mcrosier Differential Revision: http://reviews.llvm.org/D12908 llvm-svn: 248006	2015-09-18 17:11:53 +00:00
Quentin Colombet	b4c6886215	[ShrinkWrap] Refactor the handling of infinite loop in the analysis. - Strenghten the logic to be sure we hoist the restore point out of the current loop. (The fixes a bug with infinite loop, added as part of the patch.) - Walk over the exit blocks of the current loop to conver to the desired restore point in one iteration of the update loop. llvm-svn: 247958	2015-09-17 23:21:34 +00:00
Jun Bum Lim	34b9bd0435	Improve ISel using across lane min/max reduction In vectorized integer min/max reduction code, the final "reduce" step is sub-optimal. In AArch64, this change wll combine : %svn0 = vector_shuffle %0, undef<2,3,u,u> %smax0 = smax %0, svn0 %svn3 = vector_shuffle %smax0, undef<1,u,u,u> %sc = setcc %smax0, %svn3, gt %n0 = extract_vector_elt %sc, #0 %n1 = extract_vector_elt %smax0, #0 %n2 = extract_vector_elt $smax0, #1 %result = select %n0, %n1, n2 becomes : %1 = smaxv %0 %result = extract_vector_elt %1, 0 This change extends r246790. llvm-svn: 247575	2015-09-14 16:19:52 +00:00
David Blaikie	2f40830dde	[opaque pointer type] Add textual IR support for explicit type parameter for global aliases update.py: import fileinput import sys import re alias_match_prefix = r"(.(?:=\|:\|^)\s(?:external \|)(?:(?:private\|internal\|linkonce\|linkonce_odr\|weak\|weak_odr\|common\|appending\|extern_weak\|available_externally) )?(?:default \|hidden \|protected )?(?:dllimport \|dllexport )?(?:unnamed_addr \|)(?:thread_local(?:$[a-z]$)? )?alias" plain = re.compile(alias_match_prefix + r" (.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|addrspacecast\|\[\[[a-zA-Z]\|\{\{).$)") cast = re.compile(alias_match_prefix + r") ((?:bitcast\|inttoptr\|addrspacecast)\s$. to (.?)(\| addrspace\(\d+$ )\\)\s(?:;.)?$)") gep = re.compile(alias_match_prefix + r") ((?:getelementptr)\s(?:inbounds)?\s$(?P<type>.), (?P=type)(?:\saddrspace\(\d+$\s)?\* .\)\s(?:;.)?$)") def conv(line): m = re.match(cast, line) if m: return m.group(1) + " " + m.group(3) + ", " + m.group(2) m = re.match(gep, line) if m: return m.group(1) + " " + m.group(3) + ", " + m.group(2) m = re.match(plain, line) if m: return m.group(1) + ", " + m.group(2) + m.group(3) + "" + m.group(4) + "\n" return line for line in sys.stdin: sys.stdout.write(conv(line)) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name .ll \| xargs ./apply.sh llvm-svn: 247378	2015-09-11 03:22:04 +00:00
Silviu Baranga	df9ce8408a	[DAGCombine] Truncate BUILD_VECTOR operators if necessary when constant folding vectors Summary: The BUILD_VECTOR node will truncate its operators to match the type. We need to take this into account when constant folding - we need to perform a truncation before constant folding the elements. This is because the upper bits can change the result, depending on the operation type (for example this is the case for min/max). This change also adds a regression test. Reviewers: jmolloy Subscribers: jmolloy, llvm-commits Differential Revision: http://reviews.llvm.org/D12697 llvm-svn: 247265	2015-09-10 10:34:34 +00:00
Ahmed Bougacha	05541459fa	[AArch64] Match FI+offset in STNP addressing mode. First, we need to teach isFrameOffsetLegal about STNP. It already knew about the STP/LDP variants, but those were probably never exercised, because it's only the load/store optimizer that generates STP/LDP, and the only user of the method is frame lowering, which runs earlier. The STP/LDP cases were wrong: they didn't take into account the fact that they return two results, not one, so the immediate offset will be the 4th operand, not the 3rd. Follow-up to r247234. llvm-svn: 247236	2015-09-10 01:54:43 +00:00
Ahmed Bougacha	c0ac38d584	[AArch64] Match base+offset in STNP addressing mode. Followup to r247231. llvm-svn: 247234	2015-09-10 01:48:29 +00:00
Ahmed Bougacha	b8886b517d	[AArch64] Support selecting STNP. We could go through the load/store optimizer and match STNP where we would have matched a nontemporal-annotated STP, but that's not reliable enough, as an opportunistic optimization. Insetad, we can guarantee emitting STNP, by matching them at ISel. Since there are no single-input nontemporal stores, we have to resort to some high-bits-extracting trickery to generate an STNP from a plain store. Also, we need to support another, LDP/STP-specific addressing mode, base + signed scaled 7-bit immediate offset. For now, only match the base. Let's make it smart separately. Part of PR24086. llvm-svn: 247231	2015-09-10 01:42:28 +00:00
Steven Wu	332eeca1e4	Fix the testcase in r246790 Using generic neon syntax to avoid test failure on apple platforms. llvm-svn: 246833	2015-09-04 01:39:24 +00:00
Chad Rosier	6c36eff1d6	[AArch64] Improve ISel using across lane addition reduction. In vectorized add reduction code, the final "reduce" step is sub-optimal. This change wll combine : ext v1.16b, v0.16b, v0.16b, #8 add v0.4s, v1.4s, v0.4s dup v1.4s, v0.s[1] add v0.4s, v1.4s, v0.4s into addv s0, v0.4s PR21371 http://reviews.llvm.org/D12325 Patch by Jun Bum Lim <junbuml@codeaurora.org>! llvm-svn: 246790	2015-09-03 18:13:57 +00:00
Chad Rosier	08ef462d15	Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR." This reverts commit r246769. This appears to have broken Multisource/Benchmarks/tramp3d-v4. llvm-svn: 246782	2015-09-03 16:41:28 +00:00
Chad Rosier	491a1bd998	[AArch64] Improve load/store optimizer to handle LDUR + LDR. This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. llvm-svn: 246769	2015-09-03 14:41:37 +00:00
Sanjay Patel	42574203e5	use "unpredictable" metadata in fast-isel when splitting compares This patch uses the metadata defined in D12341 to avoid creating an unpredictable branch. Differential Revision: http://reviews.llvm.org/D12342 llvm-svn: 246692	2015-09-02 19:23:23 +00:00
Ahmed Bougacha	b0ff6437cb	[AArch64] Lower READCYCLECOUNTER using MRS PMCCTNR_EL0. This matches the ARM behavior. In both cases, the register is part of the optional Performance Monitors extension, so, add the feature, and enable it for the A-class processors we support. Differential Revision: http://reviews.llvm.org/D12425 llvm-svn: 246555	2015-09-01 16:23:45 +00:00
Cong Hou	511298b919	Distribute the weight on the edge from switch to default statement to edges generated in lowering switch. Currently, when edge weights are assigned to edges that are created when lowering switch statement, the weight on the edge to default statement (let's call it "default weight" here) is not considered. We need to distribute this weight properly. However, without value profiling, we have no idea how to distribute it. In this patch, I applied the heuristic that this weight is evenly distributed to successors. For example, given a switch statement with cases 1,2,3,5,10,11,20, and every edge from switch to each successor has weight 10. If there is a binary search tree built to test if n < 10, then its two out-edges will have weight 4x10+10/2 = 45 and 3x10 + 10/2 = 35 respectively (currently they are 40 and 30 without considering the default weight). Each distribution (which is 5 here) will be stored in each SwitchWorkListItem for further distribution. There are some exceptions: For a jump table header which doesn't have any edge to default statement, we don't distribute the default weight to it. For a bit test header which covers a contiguous range and hence has no edges to default statement, we don't distribute the default weight to it. When the branch checks a single value or a contiguous range with no edge to default statement, we don't distribute the default weight to it. In other cases, the default weight is evenly distributed to successors. Differential Revision: http://reviews.llvm.org/D12418 llvm-svn: 246522	2015-09-01 01:42:16 +00:00
Hans Wennborg	4a61370b8f	Fix CHECK directives that weren't checking. llvm-svn: 246485	2015-08-31 21:10:35 +00:00
Quentin Colombet	a80b9c824e	[AArch64][CollectLOH] Remove an invalid assertion and add a test case exposing it. rdar://problem/22491525 llvm-svn: 246472	2015-08-31 19:02:00 +00:00
Matthias Braun	0acbd08f3c	AArch64: Fix loads to lower NEON vector lanes using GPR registers The ISelLowering code turned insertion turned the element for the lowest lane of a BUILD_VECTOR into an INSERT_SUBREG, this prohibited the patterns for SCALAR_TO_VECTOR(Load) to match later. Restrict this to cases without a load argument. Reported in rdar://22223823 Differential Revision: http://reviews.llvm.org/D12467 llvm-svn: 246462	2015-08-31 18:25:15 +00:00
Duncan P. N. Exon Smith	814b8e91c7	DI: Require subprogram definitions to be distinct As a follow-up to r246098, require `DISubprogram` definitions (`isDefinition: true`) to be 'distinct'. Specifically, add an assembler check, a verifier check, and bitcode upgrading logic to combat testcase bitrot after the `DIBuilder` change. While working on the testcases, I realized that test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its purpose was to check for a corner case in PR22792 where two subprogram definitions match exactly and share the same metadata node. The new verifier check, requiring that subprogram definitions are 'distinct', precludes that possibility. I updated almost all the IR with the following script: git grep -l -E -e '= !DISubprogram$.* isDefinition: true' \| grep -v test/Bitcode \| xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true$/= distinct \1/' Likely some variant of would work for out-of-tree testcases. llvm-svn: 246327	2015-08-28 20:26:49 +00:00
Quentin Colombet	fa4ecb4b9a	[AArch64][CollectLOH] Fix a regression that prevented us to detect chains of more than 2 instructions. I introduced this regression a while back and did not noticed it because I somehow forgot to push the initial test cases for the pass! Fix that as well! llvm-svn: 246239	2015-08-27 23:47:10 +00:00
Jonathan Roelofs	054026dba2	Fix a case of `CHECK[^:]*$`. http://reviews.llvm.org/D11917 llvm-svn: 246163	2015-08-27 17:03:14 +00:00
Sanjay Patel	1015661edf	fix CHECK-LABEL and wrong label llvm-svn: 245958	2015-08-25 18:12:40 +00:00
Matthias Braun	25fd09a756	AArch64: Fix testcase of r245640 llvm-svn: 245647	2015-08-21 00:23:19 +00:00
Matthias Braun	46e5639806	AArch64: Fix cmp;ccmp ordering When producing conditional compare sequences for or operations we need to negate the operands and the finally tested flags. The thing is if we negate the finally tested flags this equals a logical negation of all previously emitted expressions. There was a case missing where we have to order OR expressions so they get emitted first. This fixes http://llvm.org/PR24459 llvm-svn: 245641	2015-08-20 23:33:34 +00:00
Matthias Braun	266204b7dc	AArch64: Do not create CCMP on multiple users. Create CMP;CCMP sequences from and/or trees does not gain us anything if the and/or tree is materialized to a GP register anyway. While most of the code already checked for hasOneUse() there was one important case missing. llvm-svn: 245640	2015-08-20 23:33:31 +00:00
Juergen Ributzka	b12248e9cd	[AArch64][FastISel] Don't fold shifts with UB. We are already falling back to SelectionDAG when encountering an shift with UB. This adds the same checks for shifts with UB that get folded into arithmetic or logical operations. This fixes rdar://problem/22345295. llvm-svn: 245499	2015-08-19 20:52:55 +00:00
Chih-Hung Hsieh	fdcf541871	Split ARM and AArch64 emutls.ll test Differential Revision: http://reviews.llvm.org/D12127 llvm-svn: 245399	2015-08-19 01:44:51 +00:00
Matthias Braun	fa3b248a66	DAGCombiner: Improve DAGCombiner select normalization The current code normalizes select(C0, x, select(C1, x, y)) towards select(C0\|C1, x, y) if the targets prefers that form. This patch adds an additional rule that if the select(C1, x, y) part already exists in the function then we want to normalize into the other direction because the effects of reusing the existing value are bigger than transforming into the target preferred form. This addresses regressions following r238793, see also: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150727/290272.html Differential Revision: http://reviews.llvm.org/D11616 llvm-svn: 245350	2015-08-18 20:48:36 +00:00
Matthias Braun	2e920bd04f	DAGCombiner: Optimize SELECTs first before turning them into SELECT_CC This is part of http://reviews.llvm.org/D11616 - I just decided to split this up into a separate commit. llvm-svn: 245349	2015-08-18 20:48:29 +00:00
James Molloy	ef183397b1	Generate FMINNAN/FMINNUM/FMAXNAN/FMAXNUM from SDAGBuilder. These only get generated if the target supports them. If one of the variants is not legal and the other is, and it is safe to do so, the other variant will be emitted. For example on AArch32 (V8), we have scalar fminnm but not fmin. Fix up a couple of tests while we're here - one now produces better code, and the other was just plain wrong to start with. llvm-svn: 245196	2015-08-17 07:13:10 +00:00
Simon Pilgrim	0750c84623	[DAGCombiner] Attempt to mask vectors before zero extension instead of after. For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after a ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors. (zext (truncate x)) -> (zext (and(x, m)) Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code. This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles. Differential Revision: http://reviews.llvm.org/D11764 llvm-svn: 245160	2015-08-15 13:27:30 +00:00
Ahmed Bougacha	cd35787217	[AArch64] Fix FMLS scalar-indexed-from-2s-after-neg patterns. We canonicalize V64 vectors to V128 through insert_subvector: the other FMLA/FMLS/FMUL/FMULX patterns match that already, but this one doesn't, so we'd fail to match fmls and generate fneg+fmla instead. The vector equivalents are already tested and functional. llvm-svn: 245107	2015-08-14 22:06:05 +00:00
James Molloy	63be198712	[AArch64] FMINNAN/FMAXNAN on f16 is not legal. Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal. I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet. llvm-svn: 245035	2015-08-14 09:08:50 +00:00
James Molloy	e7695bca0f	[AArch64] Small rejig of fmax tests, NFCI. These tests relied on -enable-no-nans-fp-math, whereas soon they'll take their no-nans hint from the FCMP instruction itself, so split the no-nans stuff out into its own test. Also do a slight rejig of instruction order. The old FMIN/MAX backend matching had to deal with looking through casts, which it never did particularly well. Now, instcombine will recognize such patterns and canonicalize the cast outside the select. So modify the test inputs to assume that instcombine has already run. llvm-svn: 244913	2015-08-13 17:28:10 +00:00
Ahmed Bougacha	a196661bb0	[CodeGen] Mark the promoted FCOPYSIGN result FP_ROUND as TRUNCating. Now that we can properly promote mismatched FCOPYSIGNs (r244858), we can mark the FP_ROUND on the result as truncating, to expose folding. FCOPYSIGN doesn't change anything but the sign bit, so (fp_round (fcopysign (fpext a), b)) is equivalent to (modulo the sign bit): (fp_round (fpext a)) which is a no-op. llvm-svn: 244862	2015-08-13 01:32:30 +00:00
Ahmed Bougacha	b2a9ed910e	[AArch64] Cleanup vector-fcopysign.ll test. NFC. llvm-svn: 244861	2015-08-13 01:20:38 +00:00
Ahmed Bougacha	2a97b1bcf8	[AArch64] Also custom-lowering mismatched vector/f16 FCOPYSIGN. We can lower them using our cool tricks if we fpext/fptrunc the second input, like we do for f32/f64. Follow-up to r243924, r243926, and r244858. llvm-svn: 244860	2015-08-13 01:13:56 +00:00
Ahmed Bougacha	40ded502ff	[CodeGen] When Promoting, don't extend the 2nd FCOPYSIGN operand. We don't care about its type, and there's even a combine that'll fold away the FP_EXTEND if we let it run. However, until it does, we'll have something broken like: (f32 (fp_extend (f64 v))) Scalar f16 follow-up to r243924. llvm-svn: 244858	2015-08-13 01:09:43 +00:00
John Brawn	75fc09ddba	Redo "Make global aliases have symbol size equal to their type" r242520 was reverted in r244313 as the expected behaviour of the alias attribute in C is that the alias has the same size as the aliasee. However we can re-introduce adding the size on the alias when the aliasee does not, from a source code or object perspective, exist as a discrete entity. This happens when the aliasee is not a symbol, or when that symbol is private. Differential Revision: http://reviews.llvm.org/D11943 llvm-svn: 244752	2015-08-12 15:05:39 +00:00
John Brawn	0bef27d836	[GlobalMerge] Only emit aliases for internal linkage variables for non-Mach-O On Mach-O emitting aliases for the variables that make up a MergedGlobals variable can cause problems when linking with dead stripping enabled so don't do that, except for external variables where we must emit an alias. llvm-svn: 244748	2015-08-12 13:36:48 +00:00
John Brawn	863bfdbfb4	[GlobalMerge] Use private linkage for MergedGlobals variables Other objects can never reference the MergedGlobals symbol so external linkage is never needed. Using private instead of internal linkage means the object is more similar to what it looks like when global merging is not enabled, with the only difference being that the merged variables are addressed indirectly relative to the start of the section they are in. Also add aliases for merged variables with internal linkage, as this also makes the object be more like what it is when they are not merged. Differential Revision: http://reviews.llvm.org/D11942 llvm-svn: 244615	2015-08-11 15:48:04 +00:00
Sanjay Patel	c454f07eb1	delete FIXME comment; it's fixed llvm-svn: 244605	2015-08-11 14:35:29 +00:00
Sanjay Patel	74ca312666	fix minsize detection: minsize attribute implies optimizing for size llvm-svn: 244604	2015-08-11 14:31:14 +00:00
Sanjay Patel	52c2691829	add missing test for machine combiner when optimizing for size The minsize test will be fixed in the next commit. llvm-svn: 244603	2015-08-11 14:29:45 +00:00
James Molloy	b7b2a1e9b4	[AArch64] Match fminnum/fmaxnum for vector fminnm/fmaxnm instead of an intrinsic. Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmannum and match that instead. Minimal functional change: - Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistant. - f16 test updated because now we actually generate scalar fminnm/fmaxnm we no longer need to bail out to a libcall! llvm-svn: 244595	2015-08-11 12:06:37 +00:00
Alex Lorenz	2f43dd5a12	StackMap: FastISel: Add an appropriate number of immediate operands to the frame setup instruction. This commit ensures that the stack map lowering code in FastISel adds an appropriate number of immediate operands to the frame setup instruction. The previous code added just one immediate operand, which was fine for a target like AArch64, but on X86 the ADJCALLSTACKDOWN64 instruction needs two explicit operands. This caused the machine verifier to report an error when the old code added just one. Reviewers: Juergen Ributzka Differential Revision: http://reviews.llvm.org/D11853 llvm-svn: 244508	2015-08-10 21:27:03 +00:00
Jonathan Roelofs	49e46ce8e2	Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI I looked into adding a warning / error for this to FileCheck, but there doesn't seem to be a good way to avoid it triggering on the instances of it in RUN lines. llvm-svn: 244481	2015-08-10 19:01:27 +00:00
John Brawn	64e5a66794	Revert "Make global aliases have symbol size equal to their type" This reverts r242520, as it caused pr24379. Also removes part of the test added by r243874 that checks the size of alias symbols. llvm-svn: 244313	2015-08-07 10:56:21 +00:00
Juergen Ributzka	f09c7a3d0f	[AArch64][FastISel] Always use AND before checking the branch flag. When we are not emitting the condition for the branch, because the condition is in another BB or SDAG did the selection for us, then we have to mask the flag in the register with AND. This is required when the condition comes from a truncate, because SDAG only truncates down to a legal size of i32. This fixes rdar://problem/22161062. llvm-svn: 244291	2015-08-06 22:44:15 +00:00
Juergen Ributzka	9f54dbe7a1	Revert "[AArch64][FastISel] Add more truncation tests." and "[AArch64][FastISel] Always use an AND instruction when truncating to non-legal types." This reverts commit r243198 and 243304. Turns out this wasn't the correct fix for this problem. It works only within FastISel, but fails when the truncate is selected by SDAG. llvm-svn: 244287	2015-08-06 22:13:48 +00:00
Kit Barton	a7bf96ab5c	Fix possible infinite loop in shrink wrapping when searching for save/restore points. There is an infinite loop that can occur in Shrink Wrapping while searching for the Save/Restore points. Part of this search checks whether the save/restore points are located in different loop nests and if so, uses the (post) dominator trees to find the immediate (post) dominator blocks. However, if the current block does not have any immediate (post) dominators then this search will result in an infinite loop. This can occur in code containing an infinite loop. The modification checks whether the immediate (post) dominator is different from the current save/restore block. If it is not, then the search terminates and the current location is not considered as a valid save/restore point for shrink wrapping. Phabricator: http://reviews.llvm.org/D11607 llvm-svn: 244247	2015-08-06 19:01:57 +00:00
Ahmed Bougacha	e0e12db8c8	[AArch64] Add isel support for f16 indexed LD/ST. llvm-svn: 243935	2015-08-04 01:29:38 +00:00
Ahmed Bougacha	b0ae36f0d1	[AArch64] Vector FCOPYSIGN supports Custom-lowering: mark it as such. There's a bunch of code in LowerFCOPYSIGN that does smart lowering, and is actually already vector-aware; let's use it instead of scalarizing! The only interesting change is that for v2f32, we previously always used use v4i32 as the integer vector type. Use v2i32 instead, and mark FCOPYSIGN as Custom. llvm-svn: 243926	2015-08-04 00:42:34 +00:00
Ahmed Bougacha	f65371a235	[CodeGen] Fix FCOPYSIGN legalization to account for mismatched types. We used to legalize it like it's any other binary operations. It's not, because it accepts mismatched operand types. Because of that, we used to hit various asserts and miscompiles. Specialize vector legalizations to, in the worst case, unroll, or, when possible, to just legalize the operand that needs legalization. Scalarization isn't covered, because I can't think of a target where some but not all of the 1-element vector types are to be scalarized. llvm-svn: 243924	2015-08-04 00:32:55 +00:00
Duncan P. N. Exon Smith	55ca964e94	DI: Disallow uniquable DICompileUnits Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s. The backend is liable to start relying on that (if it hasn't already), so make uniquable `DICompileUnit`s illegal and automatically upgrade old bitcode. This is a nice cleanup, since we can remove an unnecessary `DenseSet` (and the associated uniquing info) from `LLVMContextImpl`. Almost all the testcases were updated with this script: git grep -e '= !DICompileUnit' -l -- test \| grep -v test/Bitcode \| xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,' I imagine something similar should work for out-of-tree testcases. llvm-svn: 243885	2015-08-03 17:26:41 +00:00
Duncan P. N. Exon Smith	ed013cd221	DI: Remove DW_TAG_arg_variable and DW_TAG_auto_variable Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags, using `DW_TAG_variable` in their place Stop exposing the `tag:` field at all in the assembly format for `DILocalVariable`. Most of the testcase updates were generated by the following sed script: find test/ -name ".ll" -o -name ".mir" \| xargs grep -l 'DILocalVariable' \| xargs sed -i '' \ -e 's/tag: DW_TAG_arg_variable, //' \ -e 's/tag: DW_TAG_auto_variable, //' There were only a handful of tests in `test/Assembly` that I needed to update by hand. (Note: a follow-up could change `DILocalVariable::DILocalVariable()` to set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable` (as appropriate), instead of having that logic magically in the backend in `DbgVariable`. I've added a FIXME to that effect.) llvm-svn: 243774	2015-07-31 18:58:39 +00:00
Geoff Berry	8a7ef3b2ee	[AArch64] Favor extended reg patterns for sub Summary: Favor the extended reg patterns over the shifted reg patterns that match only the operand shift and not the full sign/zero extend and shift. Reviewers: jmolloy, t.p.northover Subscribers: mcrosier, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D11569 llvm-svn: 243753	2015-07-31 15:55:54 +00:00
Tim Northover	2a9d801fd5	AArch64: use 32-bit MOV rather than UBFX to truncate registers. It's potentially more efficient on Cyclone, and from the optimization guides & schedulers looks like it has no effect on Cortex-A53 or A57. In general you'd expect a MOV to be about the most efficient instruction with its semantics, even though the official "UXTW" alias is really a UBFX. llvm-svn: 243576	2015-07-29 21:34:32 +00:00
Akira Hatanaka	f53b0403f8	[AArch64] Define subtarget feature strict-align. This commit defines subtarget feature strict-align and uses it instead of cl::opt -aarch64-strict-align to decide whether strict alignment should be forced. rdar://problem/21529937 llvm-svn: 243516	2015-07-29 14:17:26 +00:00
Tim Northover	17ae83a25f	AArch64: be careful of large immediates when optimising cmps. llvm-svn: 243492	2015-07-28 22:42:32 +00:00
Chih-Hung Hsieh	9843f406ec	Move unit tests to target specific directories. Differential Revision: http://reviews.llvm.org/D10522 llvm-svn: 243454	2015-07-28 17:32:49 +00:00
Chih-Hung Hsieh	1e859582d6	Implement target independent TLS compatible with glibc's emutls.c. The 'common' section TLS is not implemented. Current C/C++ TLS variables are not placed in common section. DWARF debug info to get the address of TLS variables is not generated yet. clang and driver changes in http://reviews.llvm.org/D10524 Added -femulated-tls flag to select the emulated TLS model, which will be used for old targets like Android that do not support ELF TLS models. Added TargetLowering::LowerToTLSEmulatedModel as a target-independent function to convert a SDNode of TLS variable address to a function call to __emutls_get_address. Added into lib/Target//ISelLowering.cpp to call LowerToTLSEmulatedModel for TLSModel::Emulated. Although all targets supporting ELF TLS models are enhanced, emulated TLS model has been tested only for Android ELF targets. Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for emulated TLS variables. Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls. TODO: Add proper DIE for emulated TLS variables. Added new unit tests with emulated TLS. Differential Revision: http://reviews.llvm.org/D10522 llvm-svn: 243438	2015-07-28 16:24:05 +00:00
Geoff Berry	c573bf7a5f	[AArch64] Match float round and convert to int instructions. Summary: Add patterns for doing floating point round with various rounding modes followed by conversion to int as a single FCVT* instruction. Reviewers: t.p.northover, jmolloy Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D11424 llvm-svn: 243422	2015-07-28 15:24:10 +00:00
Adhemerval Zanella	7bc3319d84	Implement __builtin_thread_pointer This path add the aarch64 lowering of __builtin_thread_pointer. It uses the already implemented AArch64ISD::THREAD_POINTER used in TLS generation. llvm-svn: 243412	2015-07-28 13:03:31 +00:00
Akira Hatanaka	2541e0241c	[AArch64] Remove check for Darwin that was needed to decide if x18 should be reserved. The decision to reserve x18 is going to be made solely by the front-end, so it isn't necessary to check if the OS is Darwin in the backend. llvm-svn: 243308	2015-07-27 19:18:47 +00:00
Juergen Ributzka	93d67463a3	[AArch64][FastISel] Add more truncation tests. This is a follow-up to r243198 and adds more truncation tests. llvm-svn: 243304	2015-07-27 19:00:23 +00:00
Juergen Ributzka	6364985b58	[AArch64][FastISel] Always use an AND instruction when truncating to non-legal types. When truncating to non-legal types (such as i16, i8 and i1) always use an AND instruction to mask out the upper bits. This was only done when the source type was an i64, but not when the source type was an i32. This commit fixes this and adds the missing i32 truncate tests. This fixes rdar://problem/21990703. llvm-svn: 243198	2015-07-25 02:16:53 +00:00
Akira Hatanaka	0d4c9ea6e0	[AArch64] Define subtarget feature "reserve-x18", which is used to decide whether register x18 should be reserved. This change is needed because we cannot use a backend option to set cl::opt "aarch64-reserve-x18" when doing LTO. Out-of-tree projects currently using cl::opt option "-aarch64-reserve-x18" to reserve x18 should make changes to add subtarget feature "reserve-x18" to the IR. rdar://problem/21529937 Differential Revision: http://reviews.llvm.org/D11463 llvm-svn: 243186	2015-07-25 00:18:31 +00:00
Weiming Zhao	b33a5557f4	This patch eanble register coalescing to coalesce the following: %vreg2<def> = MOVi32imm 1; GPR32:%vreg2 %W1<def> = COPY %vreg2; GPR32:%vreg2 into: %W1<def> = MOVi32imm 1 Patched by Lawrence Hu (lawrence@codeaurora.org) llvm-svn: 243033	2015-07-23 19:24:53 +00:00
Matthias Braun	e536f4f681	AArch64: Add aditional Cyclone macroop fusion opportunities Related to rdar://19205407 Differential Revision: http://reviews.llvm.org/D10746 llvm-svn: 242724	2015-07-20 22:34:47 +00:00
Matthias Braun	2bd6dd8d54	MachineScheduler: Restrict macroop fusion to data-dependent instructions. Before creating a schedule edge to encourage MacroOpFusion check that: - The predecessor actually writes a register that the branch reads. - The predecessor has no successors in the ScheduleDAG so we can schedule it in front of the branch. This avoids skewing the scheduling heuristic in cases where macroop fusion cannot happen. Differential Revision: http://reviews.llvm.org/D10745 llvm-svn: 242723	2015-07-20 22:34:44 +00:00
Chad Rosier	3da0ea7f5d	[AArch64] Change EON pattern to match more often. Phabricator: http://reviews.llvm.org/D11359 Patch by Geoff Berry <gberry@codeaurora.org> llvm-svn: 242694	2015-07-20 18:42:27 +00:00
Quentin Colombet	11922946fe	[RAGreedy] Add an experimental deferred spilling feature. The idea of deferred spilling is to delay the insertion of spill code until the very end of the allocation. A "candidate" to spill variable might not required to be spilled because of other evictions that happened after this decision was taken. The spirit is similar to the optimistic coloring strategy implemented in Preston and Briggs graph coloring algorithm. For now, this feature is highly experimental. Although correct, it would require much more modification to properly model the effect of spilling. Anyway, this early patch helps prototyping this feature. Note: The test case cannot unfortunately be reduced and is probably fragile. llvm-svn: 242585	2015-07-17 23:04:06 +00:00
John Brawn	9ca9ca2805	Make global aliases have symbol size equal to their type This is mainly for the benefit of GlobalMerge, so that an alias into a MergedGlobals variable has the same size as the original non-merged variable. Differential Revision: http://reviews.llvm.org/D10837 llvm-svn: 242520	2015-07-17 12:12:03 +00:00
Tim Northover	ca0ffc3561	AArch64: make inexact signalling on round Darwin-specific C11 leaves the choice on whether round-to-integer operations set the inexact flag implementation-defined. Darwin does expect it to be set, but this seems to be against the intent of the IEEE document and slower to implement anyway. So it should be opt-in. llvm-svn: 242446	2015-07-16 21:30:21 +00:00
Matthias Braun	af7d7709d6	AArch64: Implement conditional compare sequence matching. This is a new iteration of the reverted r238793 / http://reviews.llvm.org/D8232 which wrongly assumed that any and/or trees can be represented by conditional compare sequences, however there are some restrictions to that. This version fixes this and adds comments that explain exactly what types of and/or trees can actually be implemented as conditional compare sequences. Related to http://llvm.org/PR20927, rdar://18326194 Differential Revision: http://reviews.llvm.org/D10579 llvm-svn: 242436	2015-07-16 20:02:37 +00:00
Alexey Bataev	b9288601a3	[SDAG] Optimize unordered comparison in soft-float mode (patch by Anton Nadolskiy) Current implementation handles unordered comparison poorly in soft-float mode. Consider (a ULE b) which is a <= b. It is lowered to (ledf2(a, b) <= 0 \|\| unorddf2(a, b) != 0) (in general). We can do better job by lowering it to (__gtdf2(a, b) <= 0). Such replacement is true for other CMP's (ult, ugt, uge). In general, we just call same function as for ordered case but negate comparison against zero. Differential Revision: http://reviews.llvm.org/D10804 llvm-svn: 242280	2015-07-15 08:39:35 +00:00
JF Bastien	c8f48c19d3	WebAssembly: fix build breakage. Summary: processFunctionBeforeCalleeSavedScan was renamed to determineCalleeSaves and now takes a BitVector parameter as of rL242165, reviewed in http://reviews.llvm.org/D10909 WebAssembly is still marked as experimental and therefore doesn't build by default. It does, however, grep by default! I notice that processFunctionBeforeCalleeSavedScan is still mentioned in a few comments and error messages, which I also fixed. Reviewers: qcolombet, sunfish Subscribers: jfb, dsanders, hfinkel, MatzeB, llvm-commits Differential Revision: http://reviews.llvm.org/D11199 llvm-svn: 242242	2015-07-14 23:06:07 +00:00
Quentin Colombet	8b984d19f2	[ShrinkWrap][PEI] Do not insert epilogue for unreachable blocks. Although this is not incorrect to insert such code, it is useless and it hurts the binary size. llvm-svn: 241946	2015-07-10 22:09:55 +00:00
Evgeniy Stepanov	00b3020453	Fix AArch64 prologue for empty frame with dynamic allocas. Fixes PR23804: assertion failure in emitPrologue in the case of a function with an empty frame and a dynamic alloca that needs stack realignment. This is a typical case for AddressSanitizer. llvm-svn: 241943	2015-07-10 21:24:07 +00:00
Fiona Glaser	b08ae7affb	ComputeKnownBits: be a bit smarter about ADDs If our two inputs have known top-zero bit counts M and N, we trivially know that the output cannot have any bits set in the top (min(M, N)-1) bits, since nothing could carry past that point. llvm-svn: 241927	2015-07-10 18:29:02 +00:00
Arnaud A. de Grandmaison	f40f99e3a4	[AArch64] Select SBFIZ or UBFIZ instead of left + right shifts And rename LSB to Immr / MSB to Imms to match the ARM ARM terminology. llvm-svn: 241803	2015-07-09 14:33:38 +00:00
Renato Golin	ce4ccdc2fc	Test for 241794 (nest attribute in AArch64) Forgot to git add the test. Patch by Stephen Cross. llvm-svn: 241797	2015-07-09 13:29:35 +00:00
Arnold Schwaighofer	3d43f66c91	Add more nvcasts Tim Northover has told me that they can occur when the compiler cleverly constructs constants - as demonstrated in the test case. rdar://21703486 llvm-svn: 241641	2015-07-07 23:13:18 +00:00
Arnold Schwaighofer	7b315ea3c3	Add CHECK lines to test case llvm-svn: 241619	2015-07-07 19:26:31 +00:00
Arnold Schwaighofer	4bc34b1515	Add a pattern for a nvcast from v2f64 -> v4f32 Since the NvCast is generated by the selection process the concerns about endianess and bit reversal don't apply. rdar://21703486 llvm-svn: 241611	2015-07-07 18:31:55 +00:00
Nadav Rotem	754eb7c563	Fix an overly aggressive assertion in getCopyFromPartsVector. The assertion in getCopyFromPartsVector assumed that the vector 'part' must match the type of argument (arguments are potentially split into multiple parts). However, in some cases the targets return a 'part' of the right size but with a different type. We already handle this case correctly later on and generate a bitcast. This commit just makes sure that we are actually checking the property that we care about. llvm-svn: 241312	2015-07-02 23:23:52 +00:00
Hao Liu	7ec8ee3119	[AArch64] Lower interleaved memory accesses to ldN/stN intrinsics. This patch also adds a function to calculate the cost of interleaved memory accesses. E.g. Lower an interleaved load: %wide.vec = load <8 x i32>, <8 x i32>* %ptr %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6> %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7> into: %ld2 = { <4 x i32>, <4 x i32> } call llvm.aarch64.neon.ld2(%ptr) %vec0 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 0 %vec1 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 1 E.g. Lower an interleaved store: %i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11> store <12 x i32> %i.vec, <12 x i32>* %ptr into: %sub.v0 = shuffle <8 x i32> %v0, <8 x i32> v1, <0, 1, 2, 3> %sub.v1 = shuffle <8 x i32> %v0, <8 x i32> v1, <4, 5, 6, 7> %sub.v2 = shuffle <8 x i32> %v0, <8 x i32> v1, <8, 9, 10, 11> call void llvm.aarch64.neon.st3(%sub.v0, %sub.v1, %sub.v2, %ptr) Differential Revision: http://reviews.llvm.org/D10533 llvm-svn: 240754	2015-06-26 02:32:07 +00:00
Eric Christopher	572e03a396	Fix "the the" in comments. llvm-svn: 240112	2015-06-19 01:53:21 +00:00
Yi Jiang	e0b3499db7	Avoid redundant select node in early if-conversion pass llvm-svn: 240072	2015-06-18 22:34:09 +00:00
David Majnemer	7fddeccb8b	Move the personality function from LandingPadInst to Function The personality routine currently lives in the LandingPadInst. This isn't desirable because: - All LandingPadInsts in the same function must have the same personality routine. This means that each LandingPadInst beyond the first has an operand which produces no additional information. - There is ongoing work to introduce EH IR constructs other than LandingPadInst. Moving the personality routine off of any one particular Instruction and onto the parent function seems a lot better than have N different places a personality function can sneak onto an exceptional function. Differential Revision: http://reviews.llvm.org/D10429 llvm-svn: 239940	2015-06-17 20:52:32 +00:00
Ahmed Bougacha	f32991461f	[CodeGenPrepare] Generalize inserted set from truncs to any inst. It's been used before to avoid infinite loops caused by separate CGP optimizations undoing one another. We found one more such issue caused by r238054. To avoid it, generalize the "InsertedTruncs" set to any inst, and use it to avoid touching those again. llvm-svn: 239938	2015-06-17 20:44:32 +00:00
Matthias Braun	8321006d44	Revert "AArch64: Use CMP;CCMP sequences for and/or/setcc trees." The patch triggers a miscompile on SPEC 2006 403.gcc with the (ref) 200.i and scilab.i inputs. I opened PR23866 to track analysis of this. This reverts commit r238793. llvm-svn: 239880	2015-06-17 04:02:32 +00:00
Ahmed Bougacha	8c7754b965	[AArch64] Generalize extract-high DUP extension to MOVI/MVNI. These are really immediate DUPs, and suffer from the same problem with long instructions with a high/2 variant (e.g. smull). By extending a MOVI (or DUP, before this patch), we can avoid an ext on the other operand of the long instruction, e.g. turning: ext.16b v0, v0, v0, #8 movi.4h v1, #0x53 smull.4s v0, v0, v1 into: movi.8h v1, #0x53 smull2.4s v0, v0, v1 While there, add a now-necessary combine to fold (VT NVCAST (VT x)). llvm-svn: 239799	2015-06-16 01:18:14 +00:00
Ahmed Bougacha	d300722b93	[AArch64] Robustize neon-2velem-high test. NFC. llvm-svn: 239798	2015-06-16 01:05:39 +00:00
Evgeny Astigeevich	ff1f4be4c7	On behalf of Alexandros Lamprineas: LLVM targeting aarch64 doesn't correctly produce aligned accesses for non-aligned data at -O0/fast-isel (-mno-unaligned-access). The root cause seems to be in fast-isel not producing unaligned access correctly for -mno-unaligned-access. The patch just aborts fast-isel for loads and stores when -mno-unaligned-access is present. The regression test is updated to check this new test case (-mno-unaligned-access together with fast-isel). Differential Revision: http://reviews.llvm.org/D10360 llvm-svn: 239732	2015-06-15 15:48:44 +00:00
Hao Liu	1c2e89a57a	[AArch64] Delete two empty files, which should be removed by r239713. llvm-svn: 239715	2015-06-15 02:56:40 +00:00
Hao Liu	d0ca8d7edd	[AArch64] Revert r239711 again. We need to discuss how to share code between AArch64 and ARM backend. llvm-svn: 239713	2015-06-15 01:56:40 +00:00
Hao Liu	cb070e3833	[AArch64] Match interleaved memory accesses into ldN/stN instructions. Re-commit after adding "-aarch64-neon-syntax=generic" to fix the failure on OS X. This patch was firstly committed in r239514, then reverted in r239544 because of a syntax incompatible failure on OS X. llvm-svn: 239711	2015-06-15 01:35:49 +00:00
Tim Northover	02cfdbb7f1	AArch64: map bare-metal arm64-macho triple to MachO MC layer. Far better than an assertion about expecting ELF. llvm-svn: 239647	2015-06-12 23:37:11 +00:00
Rafael Espindola	65d37e64a9	This reverts commit r239529 and r239514. Revert "[AArch64] Match interleaved memory accesses into ldN/stN instructions." Revert "Fixing MSVC 2013 build error." The test/CodeGen/AArch64/aarch64-interleaved-accesses.ll test was failing on OS X. llvm-svn: 239544	2015-06-11 17:30:33 +00:00
Hao Liu	4566d18e89	[AArch64] Match interleaved memory accesses into ldN/stN instructions. Add a pass AArch64InterleavedAccess to identify and match interleaved memory accesses. This pass transforms an interleaved load/store into ldN/stN intrinsic. As Loop Vectorizor disables optimization on interleaved accesses by default, this optimization is also disabled by default. To enable it by "-aarch64-interleaved-access-opt=true" E.g. Transform an interleaved load (Factor = 2): %wide.vec = load <8 x i32>, <8 x i32>* %ptr %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6> ; Extract even elements %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7> ; Extract odd elements Into: %ld2 = { <4 x i32>, <4 x i32> } call aarch64.neon.ld2(%ptr) %v0 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 0 %v1 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 1 E.g. Transform an interleaved store (Factor = 2): %i.vec = shuffle %v0, %v1, <0, 4, 1, 5, 2, 6, 3, 7> ; Interleaved vec store <8 x i32> %i.vec, <8 x i32>* %ptr Into: %v0 = shuffle %i.vec, undef, <0, 1, 2, 3> %v1 = shuffle %i.vec, undef, <4, 5, 6, 7> call void aarch64.neon.st2(%v0, %v1, %ptr) llvm-svn: 239514	2015-06-11 09:05:02 +00:00
Chad Rosier	cf90acc104	[AArch64] Remove an overly conservative check when generating store pairs. Store instructions do not modify register values and therefore it's safe to form a store pair even if the source register has been read in between the two store instructions. Previously, the read of w1 (see below) prevented the formation of a stp. str w0, [x2] ldr w8, [x2, #8] add w0, w8, w1 str w1, [x2, #4] ret We now generate the following code. stp w0, w1, [x2] ldr w8, [x2, #8] add w0, w8, w1 ret All correctness tests with -Ofast on A57 with Spec200x and EEMBC pass. Performance results for SPEC2K were within noise. llvm-svn: 239432	2015-06-09 20:59:41 +00:00
Ahmed Bougacha	8207641251	[GlobalMerge] Take into account minsize on Global users' parents. Now that we can look at users, we can trivially do this: when we would have otherwise disabled GlobalMerge (currently -O<3), we can just run it for minsize functions, as it's usually a codesize win. Differential Revision: http://reviews.llvm.org/D10054 llvm-svn: 239087	2015-06-04 20:39:23 +00:00
James Molloy	37593732a4	Don't create a MIN/MAX node if the underlying compare has more than one use. If the compare in a select pattern has another use then it can't be removed, so we'd just be creating repeated code if we created a min/max node. Spotted by Matt Arsenault! llvm-svn: 239037	2015-06-04 13:48:23 +00:00
Matthias Braun	72b8f74813	AArch64: Use CMP;CCMP sequences for and/or/setcc trees. Previously CCMP/FCCMP instructions were only used by the AArch64ConditionalCompares pass for control flow. This patch uses them for SELECT like instructions as well by matching patterns in ISelLowering. PR20927, rdar://18326194 Differential Revision: http://reviews.llvm.org/D8232 llvm-svn: 238793	2015-06-01 22:31:17 +00:00
Luke Cheeseman	4c476858cc	Removing commited assembly file. llvm-svn: 238742	2015-06-01 13:18:53 +00:00
Luke Cheeseman	85fd06d389	Re-commit of r238201 with fix for building with shared libraries. llvm-svn: 238739	2015-06-01 12:02:47 +00:00
Diego Novillo	bfecc06656	Revert "Re-commit changes in r237579 with fix for bug breaking windows builds." This reverts commit r238201 to fix linking problems in x86 Linux http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150525/278413.html llvm-svn: 238223	2015-05-26 17:45:38 +00:00
Luke Cheeseman	a5d053d6f4	Re-commit changes in r237579 with fix for bug breaking windows builds. llvm-svn: 238201	2015-05-26 13:40:31 +00:00
Ahmed Bougacha	236f9040d0	[AArch64][CGP] Sink zext feeding stxr/stlxr into the same block. The usual CodeGenPrepare trickery, on a target-specific intrinsic. Without this, the expansion of atomics will usually have the zext be hoisted out of the loop, defeating the various patterns we have to catch this precise case. Differential Revision: http://reviews.llvm.org/D9930 llvm-svn: 238054	2015-05-22 21:37:17 +00:00
Ahmed Bougacha	3d2d9d1d91	[AArch64] Robustize atomic cmpxchg test a little more. NFC. We changed the test to test non-constant values in r238049. We can also use CHECK-NEXT to be a little stricter. llvm-svn: 238052	2015-05-22 21:35:14 +00:00
Ahmed Bougacha	df94265963	[AArch64] Robustize atomic cmpxchg test. NFC. Constants are easy to get right the wrong way. llvm-svn: 238049	2015-05-22 21:08:15 +00:00
Chad Rosier	ce8e5abbaf	[AArch64] Enhance the load/store optimizer with target-specific alias analysis. Phabricator: http://reviews.llvm.org/D9863 llvm-svn: 237963	2015-05-21 21:36:46 +00:00
Oliver Stannard	6cb23465e0	Revert r237579, as it broke windows buildbots llvm-svn: 237583	2015-05-18 16:39:16 +00:00
Oliver Stannard	0c553afe6a	[LLVM - ARM/AArch64] Add ACLE special register intrinsics This patch implements LLVM support for the ACLE special register intrinsics in section 10.1, __arm_{w,r}sr{,p,64}. This patch is intended to lower the read/write_register instrinsics, used to implement the special register intrinsics in the clang patch for special register intrinsics (see http://reviews.llvm.org/D9697), to ARM specific instructions MRC,MCR,MSR etc. to allow reading an writing of coprocessor registers in AArch32 and AArch64. This is done by inspecting the register string passed to the intrinsic and then lowering to the appropriate instruction. Patch by Luke Cheeseman. Differential Revision: http://reviews.llvm.org/D9699 llvm-svn: 237579	2015-05-18 16:23:33 +00:00
James Molloy	cfb0443af6	Mark SMIN/SMAX/UMIN/UMAX nodes as legal and add patterns for them. The new [SU]{MIN,MAX} SDNodes can be lowered directly to instructions for most NEON datatypes - the big exclusion being v2i64. llvm-svn: 237455	2015-05-15 16:15:57 +00:00
Artyom Skrobov	a70dfe18d3	Re-apply r237247 - [AArch64] Codegen VMAX/VMIN for safe math cases No longer breaks SPEC2000/2006 llvm-svn: 237361	2015-05-14 12:59:46 +00:00
Silviu Baranga	780a3b3be7	Revert r237247 - [AArch64] Codegen VMAX/VMIN.. as it is causing failures in SPEC2000/2006 llvm-svn: 237256	2015-05-13 14:03:18 +00:00
Artyom Skrobov	b526681e08	[AArch64] Codegen VMAX/VMIN for safe math cases llvm-svn: 237247	2015-05-13 12:01:09 +00:00
Sunil Srivastava	d79dfcbc37	Changed renaming of local symbols by inserting a dot vefore the numeric suffix. One code change and several test changes to match that details in http://reviews.llvm.org/D9481 llvm-svn: 237150	2015-05-12 16:47:30 +00:00
NAKAMURA Takumi	7f52d6d82c	llvm/test/CodeGen/AArch64/tailcall_misched_graph.ll: s/REQUIRE/REQUIRES/ llvm-svn: 236928	2015-05-09 05:59:00 +00:00
Arnold Schwaighofer	f54b73d681	ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not non-aliasing distinct objects The code that builds the dependence graph assumes that two PseudoSourceValues don't alias. In a tail calling function two FixedStackObjects might refer to the same location. Worse 'immutable' fixed stack objects like function arguments are not immutable and will be clobbered. Change this so that a load from a FixedStackObject is not invariant in a tail calling function and don't return a PseudoSourceValue for an instruction in tail calling functions when building the dependence graph so that we handle function arguments conservatively. Fix for PR23459. rdar://20740035 llvm-svn: 236916	2015-05-08 23:52:00 +00:00
Pete Cooper	85b1c48b20	Clear kill flags on all used registers when sinking instructions. The test here was sinking the AND here to a lower BB: %vreg7<def> = ANDWri %vreg8, 0; GPR32common:%vreg7,%vreg8 TBNZW %vreg8<kill>, 0, <BB#1>; GPR32common:%vreg8 which meant that vreg8 was read after it was killed. This commit changes the code from clearing kill flags on the AND to clearing flags on all registers used by the AND. llvm-svn: 236886	2015-05-08 17:54:32 +00:00
Pete Cooper	f52123b454	[AArch64] Fix sext/zext folding in address arithmetic. We were accidentally folding a sign/zero extend in to address arithmetic in a different BB when the extend wasn't available there. Cross BB fast-isel isn't safe, so restrict this to only when the extend is in the same BB as the use. llvm-svn: 236764	2015-05-07 19:21:36 +00:00
Tim Northover	e4310fe946	CodeGen: move over-zealous assert into actual if statement. It's quite possible to encounter an insertvalue instruction that's more deeply nested than the value we're looking for, but when that happens we really mustn't compare beyond the end of the index array. Since I couldn't see any guarantees about what comparisons std::equal makes, we probably need to directly check the size beforehand. In practice, I suspect most std::equal implementations would probably bail early, which would be OK. But just in case... rdar://20834485 llvm-svn: 236635	2015-05-06 20:07:38 +00:00
Quentin Colombet	61b305edfd	[ShrinkWrap] Add (a simplified version) of shrink-wrapping. This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exits blocks. As an example and to avoid regressions to be introduce, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64. Context Currently we insert the prologue and epilogue of the method/function in the entry and exits blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places. Motivating example Let us consider the following function that perform a call only in one branch of a if: define i32 @f(i32 %a, i32 %b) { %tmp = alloca i32, align 4 %tmp2 = icmp slt i32 %a, %b br i1 %tmp2, label %true, label %false true: store i32 %a, i32* %tmp, align 4 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) br label %false false: %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] ret i32 %tmp.0 } On AArch64 this code generates (removing the cfi directives to ease readabilities): _f: ; @f ; BB#0: stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething LBB0_2: ; %false mov sp, x29 ldp x29, x30, [sp], #16 ret With shrink-wrapping we could generate: _f: ; @f ; BB#0: cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething add sp, x29, #16 ; =16 ldp x29, x30, [sp], #16 LBB0_2: ; %false ret Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call. Proposed Solution This patch introduces a new machine pass that perform the shrink-wrapping analysis (See the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore point into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI. Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need hack to properly avoid frequently executed point. Instead, it relies on dominance and loop properties. The pass is off by default and each target can opt-in by setting the EnableShrinkWrap boolean to true in their derived class of TargetPassConfig. This setting can also be overwritten on the command line by using -enable-shrink-wrap. Before you try out the pass for your target, make sure you properly fix your emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not necessarily the entry block. Design Decisions 1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but for debugging and clarity I thought it was best to have its own file. 2. Right now, we only support one save point and one restore point. At some point we can expand this to several save point and restore point, the impacted component would then be: - The pass itself: New algorithm needed. - MachineFrameInfo: Hold a list or set of Save/Restore point instead of one pointer. - PEI: Should loop over the save point and restore point. Anyhow, at least for this first iteration, I do not believe this is interesting to support the complex cases. We should revisit that when we motivating examples. Differential Revision: http://reviews.llvm.org/D9210 <rdar://problem/3201744> llvm-svn: 236507	2015-05-05 17:38:16 +00:00
Tim Northover	851ff69b42	CodeGen: match up correct insertvalue indices when assessing tail calls. When deciding whether a value comes from the aggregate or inserted value of an insertvalue instruction, we compare the indices against those of the location we're interested in. One of the lists needs reversing because the input data is backwards (so that modifications take place at the end of the SmallVector), but we were reversing both before leading to incorrect results. Should fix PR23408 llvm-svn: 236457	2015-05-04 20:41:51 +00:00
Quentin Colombet	0de2346859	[AArch64][FastISel] Variant of the logical instructions that use two input registers cannot write on SP. rdar://problem/20748715 llvm-svn: 236352	2015-05-01 21:34:57 +00:00
Quentin Colombet	9df2fa261b	[AArch64][FastISel] Fix the setting of kill flags for MUL -> UMULH sequences. rdar://problem/20748715 llvm-svn: 236346	2015-05-01 20:57:11 +00:00
Quentin Colombet	329fa890ba	[AArch64] Fix bad register class constraint in fast-isel for TST instruction. rdar://problem/20748715 llvm-svn: 236273	2015-04-30 22:27:20 +00:00
Duncan P. N. Exon Smith	a9308c49ef	IR: Give 'DI' prefix to debug info metadata Finish off PR23080 by renaming the debug info IR constructs from `MD` to `DI`. The last of the `DIDescriptor` classes were deleted in r235356, and the last of the related typedefs removed in r235413, so this has all baked for about a week. Note: If you have out-of-tree code (like a frontend), I recommend that you get everything compiling and tests passing with the previous commit before updating to this one. It'll be easier to keep track of what code is using the `DIDescriptor` hierarchy and what you've already updated, and I think you're extremely unlikely to insert bugs. YMMV of course. Back to this commit: I did this using the rename-md-di-nodes.sh upgrade script I've attached to PR23080 (both code and testcases) and filtered through clang-format-diff.py. I edited the tests for test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns were off-by-three. It should work on your out-of-tree testcases (and code, if you've followed the advice in the previous paragraph). Some of the tests are in badly named files now (e.g., test/Assembler/invalid-mdcompositetype-missing-tag.ll should be 'dicompositetype'); I'll come back and move the files in a follow-up commit. llvm-svn: 236120	2015-04-29 16:38:44 +00:00
Ahmed Bougacha	c004c60c0a	[AArch64] Also combine vector selects fed by non-i1 SETCCs. After legalization, scalar SETCC has an i32 result type on AArch64. The i1 requirement seems too conservative, replace it with an assert. This also means that we now can run after legalization. That should also be fine, since the ops legalizer runs again after each combine, and all types created all have the same sizes as the (legal) inputs. Exposed by r235917; while there, robustize its tests (bsl also uses the register it defines). llvm-svn: 235922	2015-04-27 21:43:12 +00:00
Ahmed Bougacha	89bba61c84	[AArch64] Don't assert when combining (v3f32 select (setcc f64)). When the setcc has f64 operands, we can't build a vector setcc mask to feed a vselect, because f64 doesn't divide v3f32 evenly. Just bail out when that happens. llvm-svn: 235917	2015-04-27 21:01:20 +00:00
Yaron Keren	de2e2b0214	Teach AArch64\lit.local.cfg the new triple names windows-gnu and windows-msvc. Tests were failing when built with -DLLVM_DEFAULT_TARGET_TRIPLE=i686-pc-windows-gnu. llvm-svn: 235733	2015-04-24 17:14:16 +00:00
Pirama Arumuga Nainar	745615ca00	[AArch64] Add nvcast patterns for v4f16 and v8f16 Summary: Constant stores of f16 vectors can create NvCast nodes from various operand types to v4f16 or v8f16 depending on patterns in the stored constants. This patch adds nvcast rules with v4f16 and v8f16 values. AArchISelLowering::LowerBUILD_VECTOR has the details on which constant patterns generate the nvcast nodes. Reviewers: jmolloy, srhines, ab Subscribers: rengolin, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D9201 llvm-svn: 235610	2015-04-23 17:32:25 +00:00
Pirama Arumuga Nainar	b18815354d	[AArch64] Handle vec4, vec8, vec16 *itofp for half Summary: Set operation action for SINT_TO_FP and UINT_TO_FP nodes with v4i32, v8i8, v8i16 inputs to allow promotion of v4f16 results. Add tests for sitofp and uitofp for vec4, vec8, vec16, and i8, i16, i32, and i64 vectors. Only missing tests are for v16i8 and v16i16 as the shift operations are too complicated to write a proper check sequence. The conversions from v4i64 to v4f16 do not depend on this patch - v4i64 is split and the conversion gets handled while lowering v2i64. I am adding a test here for completeness. Reviewers: aemerson, rengolin, ab, jmolloy, srhines Subscribers: rengolin, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D9166 llvm-svn: 235609	2015-04-23 17:16:27 +00:00
James Molloy	cd2334e86e	[AArch64] Disable complex GEP optimization by default. Enough concerns were raised that this optimization is pessimising some code patterns. The obvious fix, to add a Reassociate run afterwards, causes even more pessimisation in some cases due to fewer complex addressing modes being matched. As there isn't a trivial fix for this, backing this out by default until someone gets a chance to fix the addressing mode matcher. llvm-svn: 235491	2015-04-22 09:11:38 +00:00
Ahmed Bougacha	279e3ee954	[GlobalMerge] Look at uses to create smaller global sets. Instead of merging everything together, look at the users of GlobalVariables, and try to group them by function, to create sets of globals used "together". Using that information, a less-aggressive alternative is to keep merging everything together except globals that are only ever used alone, that is, those for which it's clearly non-profitable to merge with others. In my testing, grouping by Function is too aggressive, but grouping by BasicBlock is too conservative. Anything in-between isn't trivially available, so stick with Function grouping for now. cl::opts are added for testing; both enabled by default. A few of the testcases aren't testing the merging proper, but just various edge cases when merging does occur. Update them to use the previous grouping behavior. Also, one of the tests is unrelated to GlobalMerge; change it accordingly. While there, switch to r234666' flags rather than the brutal -O3. Differential Revision: http://reviews.llvm.org/D8070 llvm-svn: 235249	2015-04-18 01:21:58 +00:00
Ahmed Bougacha	e14a4d487e	[AArch64] Don't force MVT::Untyped when selecting LD1LANEpost. The result is either an Untyped reg sequence, on ldN with N > 1, or just the type of the input vector, on ld1. Don't force Untyped. Instead, just use the type of the reg sequence. This mirrors the behavior of createTuple, which feeds the LD1*_POST. The narrow code path wasn't actually covered by tests, because V64 insert_vector_elt are widened to V128 before the LD1LANEpost combine has the chance to run, usually. The only case where it does run on V64 vectors is if the vector ops legalizer ran. So, tickle the code with a ctpop. Fixes PR23265. llvm-svn: 235243	2015-04-17 23:43:33 +00:00
Ahmed Bougacha	28fc7ca53f	Fix another typo in r235224 testcase. NFC. Third time's the charm! llvm-svn: 235242	2015-04-17 23:38:46 +00:00
Pete Cooper	2bbbd8b543	AArch64: Add test for returning [2 x i64] in registers. NFC. llvm-svn: 235228	2015-04-17 21:31:25 +00:00
Ahmed Bougacha	84b801f6b4	Fix typo in r235224 testcase. NFC. llvm-svn: 235226	2015-04-17 21:11:58 +00:00
Ahmed Bougacha	2448ef5f33	[AArch64] Avoid vector->load dependency cycles when creating LD1post. They would break the SelectionDAG. Note that the opposite load->vector dependency is already obvious in: (LD1post vec, ..) llvm-svn: 235224	2015-04-17 21:02:30 +00:00
James Molloy	a4ff7b2713	Fix TRUNCATE splitting helper logic. This is a followon to r233681 - I'd misunderstood the semantics of FTRUNC, and had confused it with (FP_ROUND ..., 0). Thanks for Ahmed Bougacha for his post-commit review! llvm-svn: 235191	2015-04-17 13:51:40 +00:00
Ahmed Bougacha	941420d9ea	[AArch64] Don't assert on f16 in DUP PerfectShuffle generator. Found by code inspection, but breaking i16 at least breaks other tests. They aren't checking this in particular though, so also add some explicit tests for the already working types. llvm-svn: 235148	2015-04-16 23:57:07 +00:00

1 2 3 4 5 ...

911 Commits