Commit Graph

1587 Commits

Author SHA1 Message Date
Aaron Ballman fc64ef1a15 Reverting r260922-260923; they cause link failures with MSVC.
http://lab.llvm.org:8011/builders/lldb-x86-windows-msvc2015/builds/15436/steps/build/logs/stdio
http://bb.pgr.jp/builders/msbuild-llvmclang-x64-msc18-DA/builds/961/steps/build_llvm/logs/stdio

llvm-svn: 260972
2016-02-16 15:29:06 +00:00
Quentin Colombet 1ce38545fb [GlobalISel] Get rid of the ifdefs in TargetLowering.
Introduce a new API used only by GlobalISel: CallLowering.
This API will contain target hooks dedicated to call lowering.

llvm-svn: 260922
2016-02-16 00:57:44 +00:00
Chad Rosier 026f15e687 [AArch64] Enable post-RA MI scheduler for Kryo.
This should have landed in r260686.

llvm-svn: 260739
2016-02-12 21:27:33 +00:00
Geoff Berry c25d3bd238 [AArch64] Reduce number of callee-save save/restores.
Summary:
Before this change, callee-save registers would be rounded up to even
pairs of GPRs and FPRs.  This change eliminates these extra padding
load/stores, though it does keep the stack allocation the same size
unless both the GPR and FPR sets have an odd size, in which case one
full pair stack slot (16 bytes) is saved.

This optimization cannot currently be done for MachO targets since they
rely on a fast-path .debug_frame equivalent that can only encode
callee-save registers as pairs.
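
A hypothetical illustration (register choices and offsets invented
here, not taken from the patch): with an odd GPR save set such as
{x19, x20, x21}, the padding half of the last pair store goes away
while the allocation keeps its size:

  Before:
    stp x19, x20, [sp, #-32]!
    stp x21, x22, [sp, #16]    # x22 stored only as padding
  After:
    stp x19, x20, [sp, #-32]!
    str x21, [sp, #16]         # padding store gone; stack size unchanged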

Reviewers: t.p.northover, rengolin, mcrosier, jmolloy

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17000

llvm-svn: 260689
2016-02-12 16:31:41 +00:00
Chad Rosier cd2be7f084 [AArch64] Add support for Qualcomm Kryo CPU.
Machine model description by Dave Estes <cestes@codeaurora.org>.

llvm-svn: 260686
2016-02-12 15:51:51 +00:00
Jun Bum Lim 397eb7b0b3 [AArch64] Merge two adjacent str WZR into str XZR
Summary:
This change merges adjacent 32-bit zero stores into a 64-bit zero store.
e.g.,
  str wzr, [x0]
  str wzr, [x0, #4]
becomes
  str xzr, [x0]

Therefore, four adjacent 32-bit zero stores will be a single stp.
e.g.,
  str wzr, [x0]
  str wzr, [x0, #4]
  str wzr, [x0, #8]
  str wzr, [x0, #12]
becomes
  stp xzr, xzr, [x0]

Reviewers: mcrosier, jmolloy, gberry, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16933

llvm-svn: 260682
2016-02-12 15:25:39 +00:00
Quentin Colombet 1cb8fac171 [AArch64] Implements the lowering of formal arguments for GlobalISel.
This is just a trivial implementation:
- Support only arguments passed in registers.
- Support only "plain" arguments, i.e., no sext/zext attribute.

At this point, it is possible to play with the IRTranslator on AArch64:
llc -mtriple arm64-<vendor>-<os> -print-machineinstrs <input.ll> -o - -global-isel

For now, we only support the translation of programs with adds and returns.

Follow-up patches are on their way to add a test case (the MIRParser is
not ready as it is).

llvm-svn: 260600
2016-02-11 21:45:08 +00:00
Quentin Colombet 5cf7b415cc [AArch64] Trivial implementation of lower return for the IRTranslator.
llvm-svn: 260574
2016-02-11 19:45:27 +00:00
Quentin Colombet d96f49543d [AArch64] Plug the beginning of the GlobalISel pipeline.
llvm-svn: 260569
2016-02-11 19:35:06 +00:00
Jun Bum Lim 633b2d81eb [AArch64] Refactoring findMatchingStore() in aarch64-ldst-opt; NFC
Summary: This change makes findMatchingStore() follow the same coding style introduced in r260275.

Reviewers: gberry, junbuml

Subscribers: aemerson, rengolin, haicheng, bmakam, mssimpso

Differential Revision: http://reviews.llvm.org/D17083

llvm-svn: 260534
2016-02-11 16:18:24 +00:00
Chad Rosier 00f9d23f8e [AArch64] Improve load/store optimizer to handle LDUR + LDR.
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
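
A hand-written sketch of the kind of pairing this enables (offsets
chosen for illustration): one unscaled and one scaled load at adjacent
addresses can now form a single pair, e.g.,
  ldur w0, [x2, #-4]
  ldr  w1, [x2]
becomes
  ldp  w0, w1, [x2, #-4]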

This is a reapplication of r259812, which had an incorrect assert.  The
test_stur_str_no_assert() test is a reduced version of the issue hit in
the AArch64 self-host.

PR24465

llvm-svn: 260523
2016-02-11 14:25:08 +00:00
Chad Rosier c3f6cb95f9 [AArch64] Refactor this logic into a helper function. NFC.
llvm-svn: 260419
2016-02-10 19:45:48 +00:00
Chad Rosier 9f4ec2ea85 [AArch64] Update comment to match reality. NFC.
llvm-svn: 260406
2016-02-10 18:49:28 +00:00
Chad Rosier fc3bf1f526 [AArch64] This bit of logic is specific to pairing. NFC.
llvm-svn: 260383
2016-02-10 15:52:46 +00:00
Ahmed Bougacha f8dfb47c02 [CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI.
llvm-svn: 260316
2016-02-09 22:54:12 +00:00
Chad Rosier f7cd8ea71f [AArch64] This check is specific to merging instructions. NFC.
llvm-svn: 260283
2016-02-09 21:20:12 +00:00
Geoff Berry 173b14db7c [AArch64] AArch64LoadStoreOptimizer: fix bug in pre-inc check iterator
Summary:
Fix case where a pre-inc/dec load/store would not be formed if the
add/sub that forms the inc/dec part of the operation was the first
instruction in the block being examined.

Reviewers: mcrosier, jmolloy, t.p.northover, junbuml

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16785

llvm-svn: 260275
2016-02-09 20:47:21 +00:00
Chad Rosier cc5d61f98e [AArch64] Bail even earlier if the instruction modifies the base register. NFC.
llvm-svn: 260274
2016-02-09 20:44:41 +00:00
Chad Rosier 1c44c598dd [AArch64] Simplify. NFC.
llvm-svn: 260273
2016-02-09 20:27:45 +00:00
Chad Rosier 87e3341ff6 [AArch64] Add an assert to ensure we don't scale an offset that can't be scaled.
llvm-svn: 260272
2016-02-09 20:18:07 +00:00
Chad Rosier 3f8b09da3f [AArch64] Add a FIXME about invalid KILL markers after the ld/st opt pass.
llvm-svn: 260264
2016-02-09 19:42:19 +00:00
Chad Rosier c46ef8876b [AArch64] Remove redundant calls and clang format. NFC.
llvm-svn: 260260
2016-02-09 19:33:42 +00:00
Chad Rosier 11eedc98af [AArch64] Hoist now common logic. NFC.
llvm-svn: 260257
2016-02-09 19:17:18 +00:00
Chad Rosier d7363db659 [AArch64] Rename variable to make it clear we're merging here, not pairing.
llvm-svn: 260256
2016-02-09 19:09:22 +00:00
Chad Rosier b5933d7bde [AArch64] Separate the codegen logic for widening vs. pairing. NFC.
llvm-svn: 260249
2016-02-09 19:02:12 +00:00
Chad Rosier 24c46ad50f [AArch64] Cleanup to simplify logic when widening vs. pairing loads/stores. NFC.
The logic to pair instructions and merge narrow instructions has become kludgy
and error-prone.  This patch begins to unravel these two similar, but distinct
optimizations.

llvm-svn: 260242
2016-02-09 18:10:20 +00:00
Chad Rosier 5c6a66ce34 [AArch64] Rename variable to improve readability. NFC.
llvm-svn: 260228
2016-02-09 15:59:57 +00:00
Chad Rosier 4f28e50dc8 [AArch64] Remove stale comment.
llvm-svn: 260226
2016-02-09 15:51:33 +00:00
Tim Northover e316f76222 AArch64: match correct order in subtraction pattern.
The accumulator in multiply-and-subtract instructions is actually subtracted
*from* so these patterns were computing the wrong value.
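
For reference (standard AArch64 semantics, not text from the commit),
the accumulator is the minuend:
  msub x0, x1, x2, x3    # x0 = x3 - (x1 * x2)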

llvm-svn: 260131
2016-02-08 19:33:18 +00:00
Evandro Menezes d761ca2308 [AArch64] Add the scheduling model for Exynos-M1
Summary:
Add the core scheduling model for the Samsung Exynos-M1 (ARMv8-A).


Reviewers: jmolloy, rengolin, christof, MinSeongKIM, t.p.northover

Subscribers: aemerson, rengolin, MatzeB

Differential Revision: http://reviews.llvm.org/D16644

llvm-svn: 259958
2016-02-06 00:01:41 +00:00
Jun Bum Lim 1de2d44dcf [AArch64] Refactoring aarch64-ldst-opt. NFC.
Remove narrow load / store instructions from getMatchingPairOpcode(),
and add getMatchingWideOpcode().

llvm-svn: 259914
2016-02-05 20:02:03 +00:00
Renato Golin 6274e5222d Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3)."
This reverts commit r259812 as it broke AArch64 self-hosting.

llvm-svn: 259881
2016-02-05 12:14:30 +00:00
Chad Rosier 35706ad6bb [AArch64] Bound the number of instructions we scan when searching for updates.
This only impacts the creation of pre-/post-index instructions.  The bound was
set high enough such that it did not change code generation for SPEC200X.

llvm-svn: 259828
2016-02-04 21:26:02 +00:00
Chad Rosier 05f8020cdf [AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3).
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

This is a reapplication of r246769 and r259790.  The tramp3d failure was caused
by an incorrect refactoring in the patch.  Specifically, we weren't always
properly clearing the SExtIdx flag.

llvm-svn: 259812
2016-02-04 18:59:49 +00:00
Silviu Baranga 33b3bd17dd [AArch64] Multiply extended 32-bit ints with `[U|S]MADDL'
During instruction selection, the AArch64 backend can recognise the
following pattern and generate an [U|S]MADDL instruction, i.e. a
multiply of two 32-bit operands with a 64-bit result:

(mul (sext i32), (sext i32))

However, when one of the operands is constant, the sign extension
gets folded into the constant in SelectionDAG::getNode(). This means
that the instruction selection sees this:

(mul (sext i32), i64)
...which doesn't match the pattern. Sign-extension and 64-bit
multiply instructions are generated, which are slower than one 32-bit
multiply.

Add a pattern to match this and generate the correct instruction, for
both signed and unsigned multiplies.
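
A minimal sketch of the case being fixed (function name and constant
invented for illustration):

define i64 @f(i32 %a) {
  %ext = sext i32 %a to i64
  %mul = mul i64 %ext, 42    ; getNode() has folded the sext into 42
  ret i64 %mul
}

With the new pattern this can select a single SMULL/SMADDL instead of
a sign-extend followed by a 64-bit multiply.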

Patch by Chris Diamand!

llvm-svn: 259800
2016-02-04 16:47:09 +00:00
Chad Rosier 18896c0f5e Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."
This reverts commit r259790. tramp3d-v4 is still having problems.

llvm-svn: 259795
2016-02-04 16:01:40 +00:00
Chad Rosier feec2aeb0f [AArch64] Improve load/store optimizer to handle LDUR + LDR.
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

This is a reapplication of r246769, which was reverted in r246782 due to a
test-suite failure.  I'm unable to reproduce the issue at this time.

llvm-svn: 259790
2016-02-04 14:42:55 +00:00
Chad Rosier 1142f3cf90 [AArch64] Add a FIXME comment.
llvm-svn: 259515
2016-02-02 15:22:55 +00:00
Chad Rosier bba881ef3d [AArch64] Allocate the modified and used regs only once per function.
llvm-svn: 259510
2016-02-02 15:02:30 +00:00
Chad Rosier dbdb1d6eaf Move comments a bit closer to associated code. NFC.
llvm-svn: 259411
2016-02-01 21:38:31 +00:00
Chad Rosier 064261da16 Remove extra semicolon. NFC.
llvm-svn: 259402
2016-02-01 20:54:36 +00:00
Balaram Makam 92431703d7 AArch64: Implement missed conditional compare sequences.
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.
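
A hand-written sketch of the kind of sequence this targets (the nzcv
immediate is worked out by hand and may differ from what the compiler
emits): for r = (a > b) || (c < d) ? e : f,
  cmp  w0, w1            # a vs b
  ccmp w2, w3, #8, le    # c vs d only if !(a > b); else nzcv <- 1000
  csel w0, w4, w5, lt
replaces materializing both conditions and or-ing them.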

Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB

Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry

Differential Revision: http://reviews.llvm.org/D16291

llvm-svn: 259387
2016-02-01 19:13:07 +00:00
Geoff Berry 29d4a695f4 [AArch64] Simplify prolog/epilog callee save/restore. NFC.
Summary:
Factor out common code for callee-save register pair calculation.  This
is intended to simplify follow-on changes that reduce the number of
registers saved/restored.

Depends on D16732

Reviewers: mcrosier, jmolloy, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16734

llvm-svn: 259384
2016-02-01 19:07:06 +00:00
Geoff Berry 04bf91a8c1 [AArch64] Simplify callee-save register save/restore. NFC.
Summary:
Simplify callee-save register save/restore code generation by
remembering the size of the callee-save area when it is computed so we
don't have to scan the prologue/epilogue instructions again later to
reconstruct it.

This is intended to simplify follow-on changes that reduce the number of
registers saved/restored.

Reviewers: mcrosier, jmolloy, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16732

llvm-svn: 259365
2016-02-01 16:29:19 +00:00
Ahmed Bougacha 53010a0d5b [AArch64] Fix i64 nontemporal high-half extraction.
Since we only have pair - not single - nontemporal store instructions,
we have to extract the high part into a separate register to be able
to use them.

When the initial nontemporal codegen support was added, I wrote the
extract using the nonsensical UBFX [0,32[.
Use the correct LSR form instead.

llvm-svn: 259134
2016-01-29 01:08:41 +00:00
Chad Rosier 3ada75f7e8 [AArch64] Set MMOs on pre- and post-index instructions.
Without the MMOs the MI scheduler is unable to reason about the dependencies of
these instructions.

llvm-svn: 259052
2016-01-28 15:38:24 +00:00
Benjamin Kramer f9172fd4ac Rename TargetSelectionDAGInfo into SelectionDAGTargetInfo and move it to CodeGen/
It's a SelectionDAG thing, not a Target thing.

llvm-svn: 258939
2016-01-27 16:32:26 +00:00
Benjamin Kramer b3e8a6d2b8 Move MCTargetAsmParser.h to llvm/MC/MCParser where it belongs.
llvm-svn: 258917
2016-01-27 10:01:28 +00:00
Chris Bieneman e49730d4ba Remove autoconf support
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html

"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi

Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark

Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D16471

llvm-svn: 258861
2016-01-26 21:29:08 +00:00
Benjamin Kramer f57c1977c1 Reflect the MC/MCDisassembler split on the include/ level.
No functional change, just moving code around.

llvm-svn: 258818
2016-01-26 16:44:37 +00:00
Junmo Park 3ca3e192d0 Silence a -Wparentheses warning; NFC.
llvm-svn: 258676
2016-01-25 10:17:17 +00:00
Aaron Ballman add830b5d1 Silence a -Wparentheses warning; NFC.
llvm-svn: 258626
2016-01-23 15:42:21 +00:00
Matthias Braun 327bca776c Inline variable into assert
Seems like some compilers still give unused variable warnings for
bool var = ...;
(void)var;
so I have to inline the variable.

llvm-svn: 258619
2016-01-23 06:49:29 +00:00
NAKAMURA Takumi 9974fa9c8c AArch64ISelLowering.cpp: Fix a warning. [-Wunused-variable]
llvm-svn: 258618
2016-01-23 06:34:59 +00:00
Matthias Braun fdef49b183 AArch64ISel: Fix ccmp code selection matching deep expressions.
Some of the conditions necessary to produce ccmp sequences were only
checked in recursive calls to emitConjunctionDisjunctionTree() after
some of the earlier expressions were already built. Move all checks over
to isConjunctionDisjunctionTree() so they are all checked before we
start emitting instructions.

Also rename some variables to better reflect their usage.

llvm-svn: 258605
2016-01-23 04:05:22 +00:00
Matthias Braun 985bdf9084 AArch64ISelLowering: Reduce maximum recursion depth of isConjunctionDisjunctionTree()
This function will exhibit exponential runtime (2**n), so we should
use a lower limit.

llvm-svn: 258604
2016-01-23 04:05:18 +00:00
Matthias Braun fd13c14669 Fix wrong indentation
llvm-svn: 258603
2016-01-23 04:05:16 +00:00
Ahmed Bougacha 78d6efdb93 [AArch64] Simplify emitConditionalCompare calls. NFC.
Now that both callsites are identical, we can simplify the
prototype and make it easier to reason about the 2-CC case.

llvm-svn: 258534
2016-01-22 19:43:57 +00:00
Ahmed Bougacha 99209b90a4 [AArch64] Lower 2-CC FCCMPs (one/ueq) using AND'ed CCs.
The current behavior is incorrect, as the two CCs returned by
changeFPCCToAArch64CC, intended to be OR'ed, are instead used
in an AND ccmp chain.

Consider:
define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) {
  %cc1 = fcmp one float %a, %b
  %cc2 = fcmp olt float %c, %d
  %and = and i1 %cc1, %cc2
  %r = select i1 %and, i32 %e, i32 %f
  ret i32 %r
}

Assuming (%a < %b) and (%c < %d); we used to do:
  fcmp  s0, s1            # nzcv <- 1000
  orr   w8, wzr, #0x1     # w8 <- 1
  csel  w9, w8, wzr, mi   # w9 <- 1
  csel  w8, w8, w9, gt    # w8 <- 1
  fcmp  s2, s3            # nzcv <- 1000
  cset   w9, mi           # w9 <- 1
  tst    w8, w9           # (w8 & w9) == 1, so: nzcv <- 0000
  csel  w0, w0, w1, ne    # w0 <- w0

We now do:
  fcmp  s2, s3            # nzcv <- 1000
  fccmp s0, s1, #0, mi    #  mi, so: nzcv <- 1000
  fccmp s0, s1, #8, le    # !le, so: nzcv <- 1000
  csel  w0, w0, w1, pl    # !pl, so: w0 <- w1

In other words, we transformed:
  (c < d) &&  ((a < b) || (a > b))
into:
  (c < d) &&   (a u>= b) && (a u<= b)
whereas, per De Morgan's, we wanted:
  (c < d) && !((a u>= b) && (a u<= b))

Note that this problem doesn't occur in the test-suite.

changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt.
We can't represent that in the fccmp chain; it can't express
arbitrary OR sequences, as one comment explains:
  In general we can create code for arbitrary "... (and (and A B) C)"
  sequences.  We can also implement some "or" expressions, because
  "(or A B)" is equivalent to "not (and (not A) (not B))" and we can
  implement some  negation operations. [...] However there is no way
  to negate the result of a partial sequence.

Instead, introduce changeFPCCToANDAArch64CC, which produces the
conjunct cond codes:
- (a one b)
    == ((a olt b) || (a ogt b))
    == ((a ord b) && (a une b))
- (a ueq b)
    == ((a uno b) || (a oeq b))
    == ((a ule b) && (a uge b))

Note that, at first, one might think that, when PushNegate is true,
we should use the disjunct CCs, in effect doing:
  (a || b)
  = !(!a && !(b))
  = !(!a && !(b1 || b2))  <- changeFPCCToAArch64CC(b, b1, b2)
  = !(!a && !b1 && !b2)

However, we can take advantage of the fact that the CC is already
negated, which lets us avoid special-casing PushNegate and doing
the simpler to reason about:

  (a || b)
  = !(!a && (!b))
  = !(!a && (b1 && b2))   <- changeFPCCToANDAArch64CC(!b, b1, b2)
  = !(!a && b1 && b2)

This makes both emitConditionalCompare cases behave identically,
and produces correct ccmp sequences for the 2-CC fcmps.

llvm-svn: 258533
2016-01-22 19:43:54 +00:00
Ahmed Bougacha 6345b9ecfa [AArch64] Assert that CCMP isel didn't fail inconsistently.
We verify that the op tree is eligible for CCMP emission in
isConjunctionDisjunctionTree, but it's also possible that
emitConjunctionDisjunctionTree fails later.
The initial check is useful, as it avoids building nodes
that will get discarded.
Still, make sure that inconsistencies don't happen with
an assert.

llvm-svn: 258532
2016-01-22 19:43:43 +00:00
Pirama Arumuga Nainar 71e9a2a4c4 Do not lower VSETCC if operand is an f16 vector
Summary:
SETCC with f16 vectors has OperationAction set to Expand but still gets
lowered to FCM* intrinsics based on its result type.  This patch skips
lowering of VSETCC if the operand is an f16 vector.

v4 and v8 tests included.

Reviewers: ab, jmolloy

Subscribers: srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D15361

llvm-svn: 258471
2016-01-22 01:16:57 +00:00
Keith Walker 8c44bf1b89 Write AArch64 big endian data fixup entries as BE.
There was support for writing the AArch64 big endian data fixup entries in
the .eh_frame section in BE.  This is changed to write all such fixup
entries in BE with no restriction on the section.  This is similar to
the existing support for fixup entries for ARM.

A test is added to check the length field in the .debug_line section as
this is an example of where such a fixup occurs.

Differential Revision: http://reviews.llvm.org/D16064

llvm-svn: 258320
2016-01-20 15:59:14 +00:00
Oliver Stannard f7696f8267 [AArch64] Fix two bugs in the .inst directive
The AArch64 .inst directive was implemented using EmitIntValue, which resulted
in both $x and $d (code and data) mapping symbols being emitted at the same
address. This fixes it to only emit the $x mapping symbol.

EmitIntValue also emits the value in big-endian order when targeting big-endian
systems, but instructions are always emitted in little-endian order for
AArch64.

Differential Revision: http://reviews.llvm.org/D16349

llvm-svn: 258308
2016-01-20 12:54:31 +00:00
Eduard Burtescu 23c4d83aa3 [NFC] Replace several manual GEP loops with gep_type_iterator.
Reviewers: dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16335

llvm-svn: 258262
2016-01-20 00:26:52 +00:00
Chad Rosier 5c72966ea3 [AArch64] Remove a bunch of useless FIXME comments.
llvm-svn: 258193
2016-01-19 21:47:24 +00:00
Chad Rosier b11c82d3e2 [AArch64] Remove more dead code after r258093.
llvm-svn: 258191
2016-01-19 21:27:05 +00:00
Eduard Burtescu 19eb03106d [opaque pointer types] [NFC] GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.

GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.

Reviewers: mjacob, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16275

llvm-svn: 258145
2016-01-19 17:28:00 +00:00
Chad Rosier 401a4ab8d8 Typo.
llvm-svn: 258137
2016-01-19 16:50:45 +00:00
Chad Rosier 234bf6fe5c [AArch64] Remove unused arguments. NFC.
AFAICT, these have been unused since the initial backend import.

llvm-svn: 258093
2016-01-18 21:56:40 +00:00
Manuel Jacob 5f6eaac611 GlobalValue: use getValueType() instead of getType()->getPointerElementType().
Reviewers: mjacob

Subscribers: jholewinski, arsenm, dsanders, dblaikie

Patch by Eduard Burtescu.

Differential Revision: http://reviews.llvm.org/D16260

llvm-svn: 257999
2016-01-16 20:30:46 +00:00
Manman Ren 4632e8e625 CXX_FAST_TLS calling convention: fix issue on AArch64.
When we have a single basic block, the explicit copy-back instructions should
be inserted right before the terminator. Before this fix, they were wrongly
placed at the beginning of the basic block.

I will commit fixes to other platforms as well.

PR26136

llvm-svn: 257929
2016-01-15 20:13:28 +00:00
Weiming Zhao 038393bba0 Fix AArch64ConditionOptimizer
Summary:
This pass may modify the Cmp operands. However, the flags register may be used by both the branch and the CSEL.
Modifying the CMP will have a side effect on the CSEL.

Reviewers: t.p.northover

Subscribers: llvm-commits, aemerson, rengolin

Differential Revision: http://reviews.llvm.org/D16147

llvm-svn: 257844
2016-01-15 00:06:58 +00:00
Rui Ueyama da00f2fdf4 Update to use new name alignTo().
llvm-svn: 257804
2016-01-14 21:06:47 +00:00
Ahmed Bougacha dfc77357a0 [AArch64] Don't assume extractelt constant index when matching shuffle.
llvm-svn: 257735
2016-01-14 02:12:30 +00:00
Rafael Espindola 8340f94df1 Convert a few assert failures into proper errors.
Fixes PR25944.

llvm-svn: 257697
2016-01-13 22:56:57 +00:00
Haicheng Wu 08b9462540 [AArch64 MachineCombine] Enhance/Add support for general reassociation to reduce the critical path
Allow fadd/fmul to be reassociated in aarch64.
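
A minimal sketch (IR invented for illustration): a serial chain of
fast-math fadds can be rebalanced so the two halves execute in
parallel, cutting the critical path from three dependent adds to two:
  %t0 = fadd fast double %a, %b
  %t1 = fadd fast double %t0, %c
  %t2 = fadd fast double %t1, %d   ; ((a+b)+c)+d, depth 3
becomes
  %u0 = fadd fast double %a, %b
  %u1 = fadd fast double %c, %d
  %u2 = fadd fast double %u0, %u1  ; (a+b)+(c+d), depth 2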

llvm-svn: 257024
2016-01-07 04:01:02 +00:00
Philip Reames c86ed0055d Extract helper function to merge MemoryOperand lists [NFC]
In the discussion on http://reviews.llvm.org/D15730, Andy pointed out we had a utility function for merging MMO lists. Since it turned we actually had two copies and there's another review in progress (http://reviews.llvm.org/D15230) which needs the same, extract it into a utility function and clean up the interfaces to make it easier to use with a MachineInstBuilder.

I introduced a pair here to track size and allocation together. I think we should probably move in the direction of the MachineOperandsRef helper class, but I'm leaving that for further work. I want to get the poison state introduced before I make major changes to the interface.

Differential Revision: http://reviews.llvm.org/D15757

llvm-svn: 256909
2016-01-06 04:39:03 +00:00
Junmo Park 3a40237c03 Delete trailing whitespace; NFC
llvm-svn: 256908
2016-01-06 03:53:36 +00:00
Junmo Park 3ec882feed Delete trailing whitespace; NFC
llvm-svn: 256906
2016-01-06 03:41:30 +00:00
MinSeong Kim a7385ebf78 [AArch64] Add support for Samsung Exynos-M1
Adds core tuning support for new Samsung Exynos-M1 core (ARMv8-A).

Differential Revision: http://reviews.llvm.org/D15663

llvm-svn: 256828
2016-01-05 12:51:59 +00:00
Junmo Park 3b8c715b2f Remove extra whitespace. NFC.
llvm-svn: 256820
2016-01-05 09:36:47 +00:00
Geoff Berry 9e934b0cc2 [AArch64] Optimize some simple TBZ/TBNZ cases.
Summary:
Add some AArch64 dag combines to optimize some simple TBZ/TBNZ cases:

 (tbz (and x, m), b) -> (tbz x, b)
 (tbz (shl x, c), b) -> (tbz x, b-c)
 (tbz (shr x, c), b) -> (tbz x, b+c)
 (tbz (xor x, -1), b) -> (tbnz x, b)
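
For instance, the first combine in assembly form (registers and label
invented for illustration):
  and x8, x0, #0x8
  tbz x8, #3, .LBB0_2
becomes
  tbz x0, #3, .LBB0_2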

Reviewers: jmolloy, mcrosier, t.p.northover

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D15702

llvm-svn: 256765
2016-01-04 18:55:47 +00:00
Craig Topper daf2e3ff7a Remove extra forward declarations and scrub includes for all in tree InstPrinters. NFC
llvm-svn: 256427
2015-12-25 22:10:01 +00:00
Jun Bum Lim 6755c3bc5f [AArch64] Promote loads from stores
This is a recommit of r256004 which was reverted in r256160. The issue was the
incorrect promotion for half and byte loads transformed into mov instructions.
This fix will replace half and byte type loads only with bit field extracts.

Original commit message:

This change promotes load instructions which directly read from stores by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
  STRWui %W1, %X0, 1
  %W0 = LDRHHui %X0, 3
becomes
  STRWui %W1, %X0, 1
  %W0 = UBFMWri %W1, 16, 31

llvm-svn: 256249
2015-12-22 16:36:16 +00:00
Matthew Simpson 11c4de6054 [AArch64] Add additional extract-extend patterns for smov
This patch adds to the target description two additional patterns for matching
extract-extend operations to SMOV. The patterns catch the v16i8-to-i64 and
v8i16-to-i64 cases. The existing patterns miss these cases because the
extracted elements must first be legalized to i32, resulting in any_extend
nodes.
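
A minimal sketch of one newly matched case (IR invented for
illustration):
  %elt = extractelement <16 x i8> %v, i64 1
  %ext = sext i8 %elt to i64
now selects
  smov x0, v0.b[1]
rather than a narrower smov plus a separate extend.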

This was originally implemented as a DAG combine (r255895), but was reverted
due to failing out-of-tree tests.

llvm-svn: 256176
2015-12-21 18:31:25 +00:00
Jun Bum Lim 4bb171c8da Revert "[AArch64] Promote loads from stores"
This reverts commit r256004 due to a failure in cortex-a53.

llvm-svn: 256160
2015-12-21 15:36:49 +00:00
Chad Rosier d016574df8 [AArch64] Enable PostRAScheduler for AArch64 generic build.
Disable post-ra scheduler for perturbed tests to appease the bots and to
preserve the history of the tests.

http://reviews.llvm.org/D15652

llvm-svn: 256158
2015-12-21 14:43:45 +00:00
Jun Bum Lim 3509d64c24 [AArch64] Promote loads from stores
This change promotes load instructions which directly read from stores by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
  STRWui %W1, %X0, 1
  %W0 = LDRHHui %X0, 3
becomes
  STRWui %W1, %X0, 1
  %W0 = UBFMWri %W1, 16, 31

llvm-svn: 256004
2015-12-18 18:08:30 +00:00
Matthew Simpson 13dddb0799 Revert "[AArch64] Add DAG combine for extract extend pattern"
This reverts commit r255895. The patch breaks internal tests. Reverting until a
fix is ready.

llvm-svn: 255928
2015-12-17 21:29:47 +00:00
Rafael Espindola 9e1cae510f Revert "[AArch64] Enable PostRAScheduler for AArch64 generic build"
This reverts commit r255896. It broke the tests.

llvm-svn: 255899
2015-12-17 15:12:26 +00:00
MinSeong Kim d05e9fd194 [AArch64] Enable PostRAScheduler for AArch64 generic build
This patch enables PostRAScheduler specifically for AArch64 generic build,
which is beneficial from the performance perspective.
Speedups of 2 to 7% are observed for some benchmarks on A57 and A53.
Also benchmarks from LLVM test-suite did not regress.

Differential Revision: http://reviews.llvm.org/D15557

llvm-svn: 255896
2015-12-17 14:51:22 +00:00
Matthew Simpson 4355e404d5 [AArch64] Add DAG combine for extract extend pattern
This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) ->
(extract_vector_elt v, i). The combine enables us to better match some SMOV
patterns.

Differential Revision: http://reviews.llvm.org/D15515

llvm-svn: 255895
2015-12-17 14:30:55 +00:00
Matthias Braun 454192917b AArch64: Simplify emitEpilogue() and related code; NFC
This is in preparation to an upcoming patch.

llvm-svn: 255872
2015-12-17 03:18:47 +00:00
Ahmed Bougacha 66834ec6e1 [AArch64] Simplify some TRI/TII getters. NFC.
We don't need static_casts when we use the right Subtarget.

llvm-svn: 255836
2015-12-16 22:54:06 +00:00
Manman Ren cbe4f9417d CXX_FAST_TLS calling convention: performance improvement for AArch64.
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by the register allocator, which will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, it will be explicitly handled during lowering.

1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.

The target independent portion was committed as r255353.
rdar://problem/23557469

Differential Revision: http://reviews.llvm.org/D15341

llvm-svn: 255821
2015-12-16 21:04:19 +00:00
Geoff Berry 8f5acb1bd1 Remove dead function AArch64TargetLowering::getFunctionAlignment. NFC.
Reviewers: t.p.northover, jmolloy, mcrosier

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D15458

llvm-svn: 255509
2015-12-14 17:01:10 +00:00
Cong Hou c106989fd5 Normalize MBB's successors' probabilities in several locations.
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places are found by checking if the
sum of successors' probabilities is approximately one in MachineBlockPlacement
pass with some instrumented code (not in this patch).


Differential revision: http://reviews.llvm.org/D15259

llvm-svn: 255455
2015-12-13 09:26:17 +00:00
Hal Finkel cd8664c3c2 Revert r248483, r242546, r242545, and r242409 - absdiff intrinsics
After much discussion, ending here:

  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html

it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.

r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation

llvm-svn: 255387
2015-12-11 23:11:52 +00:00
Matthias Braun 60d69e2865 CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness()
computeRegisterLiveness() was broken in that it reported dead for a
register even if a subregister was alive. I assume this was because the
results of analyzePhysRegs() are hard to understand with respect to
subregisters.

This commit: Changes the results of analyzePhysRegs (=struct
PhysRegInfo) to be clearly understandable, also renames the fields to
avoid silent breakage of third-party code (and improve the grammar).

Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling
it and removing workarounds for the bug.

This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033

Differential Revision: http://reviews.llvm.org/D15320

llvm-svn: 255362
2015-12-11 19:42:09 +00:00
Matt Arsenault fbd9bbfda3 Start replacing vector_extract/vector_insert with extractelt/insertelt
These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node name, and has stricter type checking so prefer it.

Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.

Example from AArch64:

def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
          (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;

Which is trying to do sext_inreg i8, i8.

llvm-svn: 255359
2015-12-11 19:20:16 +00:00
Pirama Arumuga Nainar 1317d5f311 Fix fptosi, fptoui from f16 vectors to i8, i16 vectors
Summary:
Convert f16 vectors to corresponding f32 vectors before doing the
conversion to int.

Add tests for v4f16, v8f16.

Reviewers: ab, jmolloy

Subscribers: llvm-commits, srhines

Differential Revision: http://reviews.llvm.org/D14936

llvm-svn: 255263
2015-12-10 17:16:49 +00:00
Oliver Stannard 86f729296a [AArch64] Fix FP16 vector instructions that should only accept low registers
llvm-svn: 255113
2015-12-09 14:32:11 +00:00
Ahmed Bougacha 97564c3a1b [AArch64][ARM] Don't base interleaved op legality on type alloc size.
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).

DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.

One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.

Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.

Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!

llvm-svn: 255089
2015-12-09 01:19:50 +00:00
Pirama Arumuga Nainar e6ccd7b66a Define selection for v4f16, v8f16 scalar_to_vector
Summary:
This fixes failure when trying to select
    insertelement <4 x half> undef, half %a, i64 0
which gets transformed to a scalar_to_vector node.

The accompanying v4 and v8 tests fail instruction selection without this
patch.

Reviewers: ab, jmolloy

Subscribers: srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D15322

llvm-svn: 255072
2015-12-08 23:07:06 +00:00
Oliver Stannard e4c3d21ea6 [AArch64] Add ARMv8.2-A FP16 vector instructions
ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

Note that VFP without SIMD is not a valid combination for any version of
ARMv8-A, but I have ensured that these instructions all depend on both
FeatureNEON and FeatureFullFP16 for consistency.

The ".2h" vector type specifier is now legal (for the scalar pairwise
reduction instructions), so some unrelated tests have been modified as
different error messages are emitted. This is not a problem as the
invalid operands are still caught.

llvm-svn: 255010
2015-12-08 12:16:10 +00:00
Manman Ren cb8470b4b5 [CXX TLS calling convention] Add support for AArch64.
rdar://9001553

llvm-svn: 254978
2015-12-08 00:14:38 +00:00
Craig Topper e5e035a3a8 Replace uint16_t with the MCPhysReg typedef in many places. A lot of physical register arrays already use this typedef.
llvm-svn: 254843
2015-12-05 07:13:35 +00:00
Philip Reames 7c6692de16 [EarlyCSE] IsSimple vs IsVolatile naming clarification (NFC)
When the notion of target-specific memory intrinsics was introduced to EarlyCSE, the commit confused the notions of volatile and simple memory access.  Since I'm about to start working on this area, clean up the naming so that patches aren't horribly confusing.  Note that the actual implementation was always bailing if the load or store wasn't simple.

Reminder:
- "volatile" - C++ volatile, can't remove any memory operations, but in principal unordered
- "ordered" - imposes ordering constraints on other nearby memory operations
- "atomic" - can't be split or sheared.  In LLVM terms, all "ordered" operations are also atomic so the predicate "isAtomic" is often used.
- "simple" - a load which is none of the above.  These are normal loads and what most of the optimizer works with.

llvm-svn: 254805
2015-12-05 00:18:33 +00:00
Chad Rosier f3491496dc [AArch64] Expand vector SDIVREM/UDIVREM operations.
http://reviews.llvm.org/D15214
Patch by Ana Pazos <apazos@codeaurora.org>!

llvm-svn: 254773
2015-12-04 21:38:44 +00:00
Matthias Braun 0d4505c067 AArch64FastISel: Use cbz/cbnz to branch on i1
In the case of a conditional branch without a preceding cmp we used to emit
an "and; cmp; b.eq/b.ne" sequence; use tbz/tbnz instead.

Differential Revision: http://reviews.llvm.org/D15122

llvm-svn: 254621
2015-12-03 17:19:58 +00:00
Christof Douma 8b5dc2c94e [AArch64] Add support for Cortex-A35
Adds support for the new Cortex-A35 ARMv8-A core.

llvm-svn: 254503
2015-12-02 11:53:44 +00:00
Tim Northover f3be9d5c0b AArch64: fix 128-bit shifts
We mustn't introduce a shift of exactly 64 bits for any inputs, since that's an
UNDEF value (and worse, it's not what you want with the natural AArch64
implementation).

The generated code is pretty horrific, but I couldn't come up with an obviously
better alternative (if the amount is constant EXTR could help). Turns out
128-bit shifts are just nasty.

rdar://22491037

llvm-svn: 254475
2015-12-02 00:33:54 +00:00
Weiming Zhao 56ab51870c [AArch64] Fix a corner case in BitField select
Summary:
When there are no useful bits, BitWidth becomes 0 and APInt will not be happy.

See https://llvm.org/bugs/show_bug.cgi?id=25571

We can just mark the operand as IMPLICIT_DEF if none of its bits are used.

Reviewers: t.p.northover, jmolloy

Subscribers: gberry, jmolloy, mgrang, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14803

llvm-svn: 254440
2015-12-01 19:17:49 +00:00
Oliver Stannard a34e47066e [AArch64] Add ARMv8.2-A Statistical Profiling Extension
The Statistical Profiling Extension is an optional extension to
ARMv8.2-A. Since it is an optional extension, I have added the
FeatureSPE subtarget feature to control it. The assembler-visible parts
of this extension are the new "psb csync" instruction, which is
equivalent to "hint #17", and a number of system registers.

Differential Revision: http://reviews.llvm.org/D15021

llvm-svn: 254401
2015-12-01 10:48:51 +00:00
Oliver Stannard b25914e03f [AArch64] Add ARMv8.2-A FP16 scalar instructions
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.

Most of these instructions are the same as the 32- and 64-bit versions,
but with the type field (bits 23-22) set to 0b11. Previously the top bit
of the size field was always 0, so the instruction classes only provided
a 1-bit size field, which I have widened to 2 bits.

Differential Revision: http://reviews.llvm.org/D15014

llvm-svn: 254198
2015-11-27 13:04:48 +00:00
Oliver Stannard 64c167db7a [AArch64] Add ARMv8.2-A new AT instruction variants
ARMv8.2-A adds new variants of the "at" (address translate) system
instruction, which take the PSTATE.PAN bit (added in ARMv8.1-A). These
are a required part of ARMv8.2-A, so no additional subtarget features
are required.

Differential Revision: http://reviews.llvm.org/D15018

llvm-svn: 254159
2015-11-26 15:34:44 +00:00
Oliver Stannard 911ea20f07 [AArch64] Add ARMv8.2-A UAO PSTATE bit
ARMv8.2-A adds a new PSTATE bit, PSTATE.UAO, which allows the LDTR/STTR
instructions to behave the same as LDR/STR with respect to execute-only
pages at higher privilege levels. New variants of the MSR/MRS
instructions are added to allow reading and writing this bit. It is a
required part of ARMv8.2-A, so no additional subtarget features are
required.

Differential Revision: http://reviews.llvm.org/D15020

llvm-svn: 254157
2015-11-26 15:32:30 +00:00
Oliver Stannard 1a81cc9f43 [AArch64] Add ARMv8.2-A persistent memory instruction
ARMv8.2-A adds the "dc cvap" instruction, which is a system instruction
that cleans caches to the point of persistence (for systems that have
persistent memory). It is a required part of ARMv8.2-A, so no additional
subtarget features are required.

Differential Revision: http://reviews.llvm.org/D15016

llvm-svn: 254156
2015-11-26 15:28:47 +00:00
Oliver Stannard 48b43741d0 [AArch64] Add ARMv8.2-A ID_A64MMFR2_EL1 register
ARMv8.2-A adds a new ID register, ID_A64MMFR2_EL1, which behaves in the
same way as ID_A64MMFR0_EL1 and ID_A64MMFR1_EL1. It is a required part
of ARMv8.2-A, so no additional subtarget features are required.

Differential Revision: http://reviews.llvm.org/D15017

llvm-svn: 254155
2015-11-26 15:26:10 +00:00
Oliver Stannard 7cc0c4e675 [AArch64] Add subtarget features for ARMv8.2-A
This adds subtarget features for ARMv8.2-A, which builds on (and
requires the features from) ARMv8.1-A. Most assembler-visible features
of ARMv8.2-A are system instructions, and are all required parts of the
architecture, so just depend on the HasV8_2aOps subtarget feature. There
is also one large, optional feature, which adds 16-bit floating point
versions of all existing floating-point instructions (VFP and SIMD),
this is represented by the FeatureFullFP16 subtarget feature.

Differential Revision: http://reviews.llvm.org/D15013

llvm-svn: 254154
2015-11-26 15:23:32 +00:00
Artyom Skrobov 314ee04268 Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC)
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.

Reviewers: MatzeB, andreadb, tstellarAMD

Subscribers: arsenm, jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D14945

llvm-svn: 254085
2015-11-25 19:41:11 +00:00
Cong Hou 1938f2eb98 Let SelectionDAG start to use probability-based interface to add successors.
The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes.
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights.
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This the second patch above. In this patch SelectionDAG starts to use
probability-based interfaces in MBB to add successors but other MC passes are
still using weight-based interfaces. Therefore, we need to maintain correct
weight list in MBB even when probability-based interfaces are used. This is
done by updating weight list in probability-based interfaces by treating the
numerator of probabilities as weights. This change affects many test cases
that check successor weight values. I will update those test cases once this
patch looks good to you.


Differential revision: http://reviews.llvm.org/D14361

llvm-svn: 253965
2015-11-24 08:51:23 +00:00
Jun Bum Lim 80ec0d3f5a [AArch64] Merge narrow zero stores to a wider store
This change merges adjacent zero stores into a wider single store.
For example :
  strh wzr, [x0]
  strh wzr, [x0, #2]
becomes
  str wzr, [x0]

This will fix PR25410.

llvm-svn: 253711
2015-11-20 21:14:07 +00:00
Jun Bum Lim c12c2790e1 [AArch64] Refactoring aarch64-ldst-opt. NFC.
Summary:
 * Rename isSmallTypeLdMerge() to isNarrowLoad().
 * Rename NumSmallTypeMerged to NumNarrowTypePromoted.
 * Use Subtarget defined as a member variable.

llvm-svn: 253587
2015-11-19 18:41:27 +00:00
Jun Bum Lim 4c35ccac91 [AArch64] Extend merging narrow loads into a wider load
This change extends r251438 to handle more narrow load promotions
including byte type, unscaled, and signed. For example, this change will
convert :
  ldursh w1, [x0, #-2]
  ldurh  w2, [x0, #-4]
into
  ldur  w2, [x0, #-4]
  asr   w1, w2, #16
  and   w2, w2, #0xffff

llvm-svn: 253577
2015-11-19 17:21:41 +00:00
Pete Cooper 67cf9a723b Revert "Change memcpy/memset/memmove to have dest and source alignments."
This reverts commit r253511.

This likely broke the bots in
http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787

llvm-svn: 253543
2015-11-19 05:56:52 +00:00
Quentin Colombet f6645cce91 [AArch64] Enable shrink-wrapping by default.
Differential Revision: http://reviews.llvm.org/D14360

rdar://problem/20820748

llvm-svn: 253520
2015-11-18 23:12:20 +00:00
Pete Cooper 72bc23ef02 Change memcpy/memset/memmove to have dest and source alignments.
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

These intrinsics currently have an explicit alignment argument which is
required to be a constant integer.  It represents the alignment of the
source and dest, and so must be the minimum of those.

This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments.  The alignment
argument itself is removed.

There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe.  For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.

For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

For out of tree owners, I was able to strip alignment from calls using sed by replacing:
  (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
  $1i1 false)

and similarly for memmove and memcpy.

I then added back in alignment to test cases which needed it.

A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.

In IRBuilder itself, a new argument was added.  Instead of calling:
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool.  This is to prevent isVolatile here from passing its default
parameter to the source alignment.

Note, changes in future can now be made to codegen.  I didn't change anything here, but this
change should enable better memcpy code sequences.

Reviewed by Hal Finkel.

llvm-svn: 253511
2015-11-18 22:17:24 +00:00
Matthew Simpson 343af07aa9 [AArch64] Add cost for missing extensions.
This patch adds a cost estimate for some missing sign and zero extensions. The
costs were determined by counting the number of shift instructions generated
without context for each new extension.

Differential Revision: http://reviews.llvm.org/D14730

llvm-svn: 253482
2015-11-18 18:03:06 +00:00
Ahmed Bougacha 88ddeae8bd [AArch64] Promote f16 SELECT_CC CC operands when op is legal.
SELECT_CC has the nasty property of having operands with unrelated
types. So if you do something like:

  f32 = select_cc f16, f16, f32, f32, cc

You'd only look for the action for <select_cc, f32>, but never f16.
If the types are all legal, but the op isn't (as for f16 on AArch64,
or for f128 on x86_64/AArch64?), then you get into trouble.
For f128, we have softenSetCCOperands to handle this case.

Similarly, for f16, we can directly promote the CC operands.

llvm-svn: 253344
2015-11-17 16:45:40 +00:00
Oliver Stannard 9be59af3ab [Assembler] Make fatal assembler errors non-fatal
Currently, if the assembler encounters an error after parsing (such as an
out-of-range fixup), it reports this as a fatal error, and so stops after the
first error. However, for most of these there is an obvious way to recover
after emitting the error, such as emitting the fixup with a value of zero. This
means that we can report on all of the errors in a file, not just the first
one. MCContext::reportError records the fact that an error was encountered, so
we won't actually emit an object file with the incorrect contents.

Differential Revision: http://reviews.llvm.org/D14717

llvm-svn: 253328
2015-11-17 10:00:43 +00:00
Oliver Stannard 9327a7575b [ARM,AArch64] Store source location of asm constant pool entries
Storing the source location of the expression that created a constant pool
entry allows us to emit better error messages if we later discover that the
expression cannot be represented by a relocation.

Differential Revision: http://reviews.llvm.org/D14646

llvm-svn: 253220
2015-11-16 16:25:47 +00:00
Oliver Stannard 09be060606 [ARM,AArch64] Store source location for values in assembly files
The MCValue class can store a SMLoc to allow better error messages to be
emitted if an error is detected after parsing. The ARM and AArch64 assembly
parsers were not setting this, so error messages did not have source
information.

Differential Revision: http://reviews.llvm.org/D14645

llvm-svn: 253219
2015-11-16 16:22:47 +00:00
Oliver Stannard db9081bf89 [AArch64] ldr= pseudo-instruction silently ignored if register invalid
The AArch64 assembler was silently ignoring instructions like this:
  ldr foo, =bar

AArch64AsmParser::parseOperand was returning true as the parse failed, but was
not calling AArch64AsmParser::Error to report this to the user, so the
instruction was ignored without printing an error message.

Differential Revision: http://reviews.llvm.org/D14651

llvm-svn: 253193
2015-11-16 10:25:19 +00:00
Akira Hatanaka b11ef0897c Reduce the size of MCRelaxableFragment.
MCRelaxableFragment previously kept a copy of MCSubtargetInfo and
MCInst to enable re-encoding the MCInst later during relaxation. A copy
of MCSubtargetInfo (instead of a reference or pointer) was needed
because the feature bits could be modified by the parser.

This commit replaces the MCSubtargetInfo copy in MCRelaxableFragment
with a constant reference to MCSubtargetInfo. The copies of
MCSubtargetInfo are kept in MCContext, and the target parsers are now
responsible for asking MCContext to provide a copy whenever the feature
bits of MCSubtargetInfo have to be toggled.
 
With this patch, I saw a 4% reduction in peak memory usage when I
compiled verify-uselistorder.lto.bc using llc.

rdar://problem/21736951

Differential Revision: http://reviews.llvm.org/D14346

llvm-svn: 253127
2015-11-14 06:35:56 +00:00
Akira Hatanaka bd9fc28444 [MCTargetAsmParser] Move the member variables that reference
MCSubtargetInfo in the subclasses into MCTargetAsmParser and define a
member function getSTI.

This is done in preparation for making changes to shrink the size of
MCRelaxableFragment. (see http://reviews.llvm.org/D14346).

llvm-svn: 253124
2015-11-14 05:20:05 +00:00
Justin Bogner fff708db92 AArch64: Default AArch64Subtarget::ReserveX18 to true on darwin
Darwin reserves x18, so it's never ABI compliant to generate code that
uses it. Set the default value based on the OS part of the triple
rather than forcing front-ends to set the +reserve-x18 target feature
in order to build correct code for Darwin.

This will make r243310 redundant, so I'll revert that shortly.

llvm-svn: 253102
2015-11-13 23:05:46 +00:00
Ahmed Bougacha 4a85643907 [MC] Use LShr for constant evaluation of ">>" on non-arm64 darwin.
Follow-up to r235963: this matches other assemblers and is less
unexpected (e.g. PR23227).

llvm-svn: 252681
2015-11-11 00:51:36 +00:00
Sanjay Patel 241c31fb64 [AArch64] add overrides for isCheapToSpeculateCttz() and isCheapToSpeculateCtlz()
AArch64 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any AArch64
implementation.

The net result of allowing this speculation for the regression tests in this
patch is that we get this code:

ctlz:
  clz  w0, w0
  ret

cttz:
  rbit  w8, w0
  clz  w0, w8
  ret

Instead of:

ctlz:
  cbz  w0, .LBB0_2
  clz  w0, w0
  ret
.LBB0_2:
  orr  w0, wzr, #0x20
  ret

cttz:
  cbz  w0, .LBB1_2
  rbit  w8, w0
  clz  w0, w8
  ret
.LBB1_2:
  orr  w0, wzr, #0x20
  ret

See D14469 for the larger motivation.

Differential Revision: http://reviews.llvm.org/D14505

llvm-svn: 252625
2015-11-10 18:11:37 +00:00
Oliver Stannard d414c99b9c [AArch64] Fix halfword load merging for big-endian targets
For big-endian targets, when we merge two halfword loads into a word load, the
order of the halfwords in the loaded value is reversed compared to
little-endian, so the load-store optimiser needs to swap the destination
registers.

This does not affect merging of two word loads, as we use ldp, which treats the
memory as two separate 32-bit words.
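
A hand-written sketch of the big-endian case (compare the
little-endian example in r251438 below): the halfword at the lower
address lands in the high bits of the merged word, so the extract
destinations swap:
  ldrh w0, [x2]
  ldrh w1, [x2, #2]
becomes
  ldr  w8, [x2]
  ubfx w0, w8, #16, #16   # halfword at [x2] is the high half on BE
  and  w1, w8, #0xffff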

llvm-svn: 252597
2015-11-10 11:04:18 +00:00
Tim Northover 339c83e27f AArch64: add experimental support for address tagging.
AArch64 has the ability to use the top 8-bits of an "address" for extra
information, with the memory subsystem automatically masking them off for loads
and stores. When that's happening, we can sometimes skip masks on memory
operations in the compiler.
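
A minimal sketch (IR invented for illustration): with tagging enabled,
an explicit mask of the top byte before a memory access can be
dropped, since the hardware ignores those bits anyway:
  %raw = ptrtoint i32* %p to i64
  %m   = and i64 %raw, 72057594037927935   ; clear top 8 bits (0x00FFFFFFFFFFFFFF)
  %q   = inttoptr i64 %m to i32*
  %v   = load i32, i32* %q
can become a load directly through %p when the target preserves tags.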

However, this requires the host OS and support stack to preserve those bits so
it can't be enabled everywhere. In principle iOS 8.0 and above do take the
required precautions and but we'll put it under a flag for now.

llvm-svn: 252573
2015-11-10 00:44:23 +00:00
Sanjay Patel 776e59b0fe don't repeat function names in comments; NFC
llvm-svn: 252502
2015-11-09 19:18:26 +00:00
Charlie Turner 90dafb1b6d [AArch64] Add UABDL patterns for log2 shuffle.
Summary:
This matches the sum-of-absdiff patterns emitted by the vectoriser using log2 shuffles.

Relies on D14207 to be able to match the `extract_subvector(..., 0)`

Reviewers: t.p.northover, jmolloy

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14208

llvm-svn: 252465
2015-11-09 13:10:52 +00:00
Charlie Turner 7b7b06f737 [AArch64] Handle extract_subvector(..., 0) in ISel.
Summary:
Lowering this pattern early to an `EXTRACT_SUBREG` was making it impossible to match larger patterns in tblgen that use `extract_subvector(..., 0)` as part of their input pattern.

It seems like there will exist somewhere a better way of specifying this pattern over all relevant register value types, but I didn't manage to find it.

Reviewers: t.p.northover, jmolloy

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14207

llvm-svn: 252464
2015-11-09 12:45:11 +00:00
Colin LeMahieu 8a0453e23a [AsmParser] Backends can parameterize ASM tokenization.
llvm-svn: 252439
2015-11-09 00:31:07 +00:00
Joseph Tremoulet f748c8937e [WinEH] Update exception pointer registers
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.

Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.

Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.


Reviewers: pgavlin, majnemer, rnk

Subscribers: jyknight, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D14344

llvm-svn: 252383
2015-11-07 01:11:31 +00:00
Ahmed Bougacha cf49b523a0 [AArch64][FastISel] Don't even try to select vector icmps.
We used to try to constant-fold them to i32 immediates.
Given that fast-isel doesn't otherwise support vNi1, when selecting
the result users, we'd fall back to SDAG anyway.
However, if the users were in another block, we'd insert broken
cross-class copies (GPR32 to FPR64).

Give up, let SDAG agree with itself on a vNi1 legalization strategy.

llvm-svn: 252364
2015-11-06 23:16:53 +00:00
Jun Bum Lim 22fe15ee86 [AArch64] Enable narrow load promotion only on profitable microarchitectures
The benefit from converting narrow loads into a wider load (r251438) could be
micro-architecturally dependent, as it assumes that a single load with two bitfield
extracts is cheaper than two narrow loads. Currently, this conversion is
enabled only on Cortex-A57, where the performance benefits were verified.

llvm-svn: 252316
2015-11-06 16:27:47 +00:00
Tim Northover 775aaeb765 Remove Windows line endings introduced by r252177. NFC.
llvm-svn: 252217
2015-11-05 21:54:58 +00:00
Sanjay Patel 387e66e79f replace MachineCombinerPattern namespace and enum with enum class; NFCI
Also, remove an enum hack where enum values were used as indexes into an array.

We may want to make this a real class to allow pattern-based queries/customization (D13417).
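The shape of the change, sketched (member names are assumptions):

  // Before: a namespace-wrapped plain enum; values convert implicitly to
  // integers and were used directly as array indexes.
  namespace MachineCombinerPattern { enum MC_PATTERN { MC_REASSOC_AX_BY }; }

  // After: a scoped enum, so the array-index hack no longer compiles and
  // any conversion has to be spelled out.
  enum class MachineCombinerPattern { REASSOC_AX_BY, REASSOC_AX_YB };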

llvm-svn: 252196
2015-11-05 19:34:57 +00:00
Oleg Ranevskyy 057c5a6b2b [DebugInfo] Fix ARM/AArch64 prologue_end position. Related to D11268.
Summary:
This is related to another review request, http://reviews.llvm.org/D11268; it does the same thing and merely fixes a couple of issues with it.

D11268 is quite old and has merge conflicts against the current trunk.
This request
 - rebases D11268 onto the new trunk;
 - resolves the merge conflicts;
 - fixes the prologue_end tests, which do not pass because the subprogram definitions are not marked as distinct.

Reviewers: echristo, rengolin, kubabrecka

Subscribers: aemerson, rengolin, jyknight, dsanders, llvm-commits, asl

Differential Revision: http://reviews.llvm.org/D14338

llvm-svn: 252177
2015-11-05 17:50:17 +00:00
Craig Topper 4b27576001 Remove templates from CostTableLookup functions. All instantiations had the same type.
This also lets us remove the versions of the functions that took a statically sized array as we can rely on ArrayRef implicit conversion now.
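A sketch of a call site after this change (the table entry is invented for
illustration); ArrayRef's implicit conversion from a C array is what makes the
sized-array overloads redundant:

  static const CostTblEntry CostTbl[] = {
      {ISD::MUL, MVT::v4i32, 4}, // illustrative entry, not a real cost
  };
  if (const CostTblEntry *Entry = CostTableLookup(CostTbl, ISD::MUL, MVT::v4i32))
    return Entry->Cost;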

llvm-svn: 251490
2015-10-28 04:02:12 +00:00
Jun Bum Lim c9879ecfbc [AArch64] Merge halfword loads into a 32-bit load
This recommits r250719, which caused a failure in SPEC2000.gcc
because of an incorrect insert point for the new wider load.

Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example:
  ldrh w0, [x2]
  ldrh w1, [x2, #2]
becomes
  ldr w0, [x2]
  ubfx w1, w0, #16, #16
  and  w0, w0, #0xffff

llvm-svn: 251438
2015-10-27 19:16:03 +00:00
Cong Hou 07eeb8001e Create a new interface addSuccessorWithoutWeight(MBB*) in MBB to add successors when optimization is disabled.
When optimization is disabled, the edge weights stored in MBB won't be used, so we don't have to store them. Currently, this is done by adding successors with default weight 0, and if all successors have default weights, the weight list will be empty. But an empty weight list doesn't necessarily mean optimization is disabled (as is stated several times in MachineBasicBlock.cpp): it may also mean that all successors just have default weights.

We should discourage using default weights when adding successors, because it is very easy for users to forget to update the correct edge weights and use default ones instead (one exception is when the MBB has only one successor). In order to detect such usages, it is better to differentiate using default weights from the case when optimization is disabled.

In this patch, a new interface addSuccessorWithoutWeight(MBB*) is created for use when optimization is disabled. In this case, MBB will try to maintain an empty weight list, but it cannot guarantee this, as many uses of addSuccessor() don't check whether optimization is disabled. It can guarantee, however, that if optimization is enabled, the weight list always has the same size as the successor list.
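A sketch of the intended usage (surrounding variables assumed):

  if (TM.getOptLevel() == CodeGenOpt::None)
    // No weight is recorded; the weight list stays empty.
    MBB->addSuccessorWithoutWeight(SuccMBB);
  else
    // With optimization enabled, every successor gets an explicit weight.
    MBB->addSuccessor(SuccMBB, Weight);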

Differential revision: http://reviews.llvm.org/D13963

llvm-svn: 251429
2015-10-27 17:59:36 +00:00
Charlie Turner 458e79b814 [ARM] Expand ROTL and ROTR of vector value types
Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.
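The fix presumably amounts to marking the rotates for expansion on vector
types, along these lines (a sketch, not the exact hunk):

  for (MVT VT : MVT::vector_valuetypes()) {
    setOperationAction(ISD::ROTL, VT, Expand);
    setOperationAction(ISD::ROTR, VT, Expand);
  }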

Reviewers: rengolin, t.p.northover

Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D14082

llvm-svn: 251401
2015-10-27 10:25:20 +00:00
Craig Topper ee0c859788 Convert cost table lookup functions to return a pointer to the entry or nullptr instead of the index.
This avoids mentioning the table name an extra time and allows the lookup to be done directly in the if conditions by relying on the bool conversion of the pointer.

While there, make use of ArrayRef and std::find_if.
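Before and after, sketched:

  // Before: the lookup returned an index into the table.
  int Idx = CostTableLookup(Tbl, ISD, VT);
  if (Idx != -1)
    return Tbl[Idx].Cost;

  // After: it returns a pointer (or nullptr), usable directly in the if.
  if (const CostTblEntry *Entry = CostTableLookup(Tbl, ISD, VT))
    return Entry->Cost;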

llvm-svn: 251382
2015-10-27 04:14:24 +00:00
Evgeniy Stepanov d1aad26589 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.
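Conceptually, the fast path derives the slot address from the thread pointer;
a sketch (the offset here is an invented placeholder, not the real Android
slot):

  // The unsafe stack pointer lives at a fixed offset from the thread
  // pointer. TLS_SLOT_OFFSET is a made-up value for illustration.
  enum { TLS_SLOT_OFFSET = 5 * sizeof(void *) };

  static inline void **unsafe_stack_ptr_addr(void) {
    return (void **)((char *)__builtin_thread_pointer() + TLS_SLOT_OFFSET);
  }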

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend, because ARM lowers
__builtin_thread_pointer as a call to __aeabi_read_tp, which is not
available on Android.

The previous iteration of this change was reverted in r250461. This
version leaves the generic, compiler-rt based implementation in
SafeStack.cpp instead of moving it to TargetLoweringBase in order to
allow testing without a TargetMachine.

llvm-svn: 251324
2015-10-26 18:28:25 +00:00
Craig Topper 7bf52c9d26 Use MVT::SimpleValueType instead of MVT in template parameter. NFC
llvm-svn: 251217
2015-10-25 00:27:14 +00:00
Craig Topper 272d6a57bb Call the version of ConvertCostTableLookup that takes a statically sized array rather than pointer and size. NFC
llvm-svn: 251196
2015-10-24 18:40:22 +00:00
James Molloy 5b18b4ce96 Revert "[AArch64]Merge halfword loads into a 32-bit load"
This reverts commit r250719. This introduced a codegen fault in SPEC2000.gcc when compiled for Cortex-A53.

llvm-svn: 251108
2015-10-23 10:41:38 +00:00
Matthias Braun d276de6db1 AArch64: Disable the latency heuristic
It turned out not to improve any of our benchmarks, and occasionally led
to increased register pressure and spilling.

The heuristic is disabled only for the Cyclone CPU, as the results on the
Cortex CPUs are mixed.

Differential Revision: http://reviews.llvm.org/D13708

llvm-svn: 251038
2015-10-22 18:07:38 +00:00
Craig Topper 8fe40e0ed5 Change makeLibCall to take an ArrayRef<SDValue> instead of pointer and size. This removes the need to pass a hardcoded size in many places. NFC
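A sketch of a call site after the change (libcall and operands invented for
illustration; parameter order assumed):

  SDValue Ops[] = {Op.getOperand(0), Op.getOperand(1)};
  std::pair<SDValue, SDValue> Res =
      TLI.makeLibCall(DAG, RTLIB::ADD_F128, MVT::f128, Ops,
                      /*isSigned=*/false, SDLoc(Op));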
llvm-svn: 251032
2015-10-22 17:05:00 +00:00
Jun Bum Lim d3548303ec [AArch64] Merge halfword loads into a 32-bit load
Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example:
  ldrh w0, [x2]
  ldrh w1, [x2, #2]
becomes
  ldr w0, [x2]
  ubfx w1, w0, #16, #16
  and  w0, w0, #0xffff

llvm-svn: 250719
2015-10-19 18:34:53 +00:00
Craig Topper 2626094fa1 Make a bunch of static arrays const.
llvm-svn: 250642
2015-10-18 05:15:34 +00:00
Charlie Turner 434d4599d4 [AArch64] Implement vector splitting on UADDV.
Summary: Fixes PR25056.

Reviewers: mcrosier, junbuml, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13466

llvm-svn: 250520
2015-10-16 15:38:25 +00:00
Evgeniy Stepanov 9addbc9fc1 Revert "[safestack] Fast access to the unsafe stack pointer on AArch64/Android."
Breaks the Hexagon buildbot.

llvm-svn: 250461
2015-10-15 21:26:49 +00:00
Evgeniy Stepanov 142947e9f0 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend, because ARM lowers
__builtin_thread_pointer as a call to __aeabi_read_tp, which is not
available on Android.

llvm-svn: 250456
2015-10-15 20:50:16 +00:00
Duncan P. N. Exon Smith d3b9df02b3 AArch64: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250216
2015-10-13 20:02:15 +00:00
Akira Hatanaka 5a4e4f8d8a [AArch64] Check the size of the vector before accessing its elements.
This fixes an assert in AArch64AsmParser::MatchAndEmitInstruction.

rdar://problem/23081753

llvm-svn: 250207
2015-10-13 18:55:34 +00:00
Duncan P. N. Exon Smith 769e1a972d AArch64: Make getNextNode() cleanup in r249764 more clear
After r249764, if you didn't see the full context, it looked like
`std::next(I)` would get the same result as
`++MachineBasicBlock::iterator(I)`.  However, `I` is a `MachineInstr*`
(not a `MachineBasicBlock::iterator`).

Use the `getIterator()` helper I added later (r249782) to make this code
more clear.
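Sketched (variable names are assumptions):

  // Before: relied on the implicit MachineInstr* -> iterator conversion,
  // which hides what is actually being advanced.
  MachineBasicBlock::iterator InsertPt = ++MachineBasicBlock::iterator(MI);

  // After: the conversion is explicit and reads as plain iterator logic.
  MachineBasicBlock::iterator InsertAfter = std::next(MI->getIterator());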

llvm-svn: 249852
2015-10-09 16:54:54 +00:00
Jun Bum Lim 0aace13d18 Improve ISel across lane float min/max reduction
In vectorized float min/max reduction code, the final "reduce" step
is suboptimal. On AArch64, this change will combine:

  svn0 = vector_shuffle t0, undef<2,3,u,u>
  fmin = fminnum t0,svn0
  svn1 = vector_shuffle fmin, undef<1,u,u,u>
  cc = setcc fmin, svn1, ole
  n0 = extract_vector_elt cc, #0
  n1 = extract_vector_elt fmin, #0
  n2 = extract_vector_elt fmin, #1
  result = select n0, n1, n2
into :
  result = llvm.aarch64.neon.fminnmv t0

This change extends r247575.

llvm-svn: 249834
2015-10-09 14:11:25 +00:00
Duncan P. N. Exon Smith d389165c14 AArch64: Stop using MachineInstr::getNextNode()
Stop using `getNextNode()` to get an insertion point (at least, in this
one place).  Instead, use iterator logic directly.

The `getNextNode()` interface isn't actually supposed to work for
creating iterators; it's supposed to return `nullptr` (not a real
iterator) if this is the last node.  It's currently broken and will
"happen" to work, but if we ever fix the function, we'll get some
strange failures in places like this.

llvm-svn: 249764
2015-10-08 22:43:26 +00:00
Evgeniy Stepanov 5fe279e727 Add Triple::isAndroid().
This is a simple refactoring that replaces Triple.getEnvironment()
checks for Android with Triple.isAndroid().
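Before and after, sketched:

  // Before the change:
  bool WasAndroid = TT.getEnvironment() == Triple::Android;
  // After the change:
  bool IsAndroid = TT.isAndroid();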

llvm-svn: 249750
2015-10-08 21:21:24 +00:00
Chad Rosier 7c6ac2b8f9 [AArch64] Fold a floating-point divide by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249579
2015-10-07 17:51:37 +00:00
Chad Rosier fa30c9b436 [AArch64] Fold a floating-point multiply by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249576
2015-10-07 17:39:18 +00:00
Jeroen Ketema aebca09543 [ARM][AArch64] Only lower to interleaved load/store if the target has NEON
Without an additional check for NEON, the compiler crashes during
legalization of NEON ldN/stN.
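The guard is presumably of this shape (a sketch; the real check sits in the
interleaved-access lowering hooks):

  // Without NEON there is nothing to lower ldN/stN to, so don't claim
  // interleaved load/store support.
  if (!Subtarget->hasNEON())
    return false;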

Differential Revision: http://reviews.llvm.org/D13508

llvm-svn: 249550
2015-10-07 14:53:29 +00:00
Alexandros Lamprineas 1bab191f25 [MC layer][AArch64] llvm-mc accepts 4-bit immediate values for
"msr pan, #imm", but only 1-bit immediate values are valid.
Changed the encoding and decoding of msr pstate instructions.

Differential Revision: http://reviews.llvm.org/D13011

llvm-svn: 249313
2015-10-05 13:42:31 +00:00
Rafael Espindola e3a20f57d9 Fix PR24486.
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.

With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.

In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.

This patch is joint work by Maxim Ostapenko and myself.

llvm-svn: 249303
2015-10-05 12:07:05 +00:00
Chad Rosier 1f385618c0 [ARM] Typo. NFC.
llvm-svn: 249153
2015-10-02 16:42:59 +00:00
Chad Rosier f11d040f01 [AArch64] Deprecate a command-line option used for testing.
Support for pairing unscaled loads and stores has been enabled since the
original ARM64 port.  This feature is no longer experimental, AFAICT.

llvm-svn: 249049
2015-10-01 18:17:12 +00:00
Chad Rosier b7c5b91068 [AArch64] Hoist commonly failing check. NFC.
llvm-svn: 249011
2015-10-01 13:43:05 +00:00
Chad Rosier 0b15e7c618 [AArch64] Rename variable to improve readability. NFC.
llvm-svn: 249008
2015-10-01 13:33:31 +00:00
Chad Rosier 7a83d770ae [AArch64] Update comment to reflect reality.
llvm-svn: 249007
2015-10-01 13:09:44 +00:00
Chad Rosier 11c825f7db [AArch64] Remove an unnecessary restriction on pre-index instructions.
Previously, the index was constrained to the size of the memory operation for
no apparent reason.  This change removes that constraint so that we can form
pre-index instructions with any valid offset.

llvm-svn: 248931
2015-09-30 19:44:40 +00:00
Chad Rosier 4f04e2ec87 [AArch64] Use helper function to improve readability. NFC.
llvm-svn: 248914
2015-09-30 16:50:41 +00:00
Chad Rosier 4315012769 [AArch64] Add support for pre- and post-index LDPSWs.
llvm-svn: 248825
2015-09-29 20:39:55 +00:00
Chad Rosier dabe2534ed [AArch64] Add integer pre- and post-index halfword/byte loads and stores.
llvm-svn: 248817
2015-09-29 18:26:15 +00:00
Chad Rosier 32d4d37e61 [AArch64] Scale offsets by the size of the memory operation. NFC.
The immediate in the load/store should be scaled by the size of the memory
operation, not the size of the register being loaded/stored.  This change gets
us one step closer to forming LDPSW instructions, and it also enables pre- and
post-indexing for halfword and byte loads and stores.

llvm-svn: 248804
2015-09-29 16:07:32 +00:00
Chad Rosier a4d3217e81 [AArch64] Remove some redundant cases. NFC.
llvm-svn: 248800
2015-09-29 14:57:10 +00:00
Sanjay Patel bbbf9a1a34 merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)
This is a redo of D7208 (r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242).

The patch was reverted because an AArch64 target could enter an infinite loop after the change in
DAGCombiner to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses()
wasn't telling the truth. It reported all unaligned memory accesses as fast, but then split some
128-bit unaligned accesses up in performSTORECombine() because they are slow.

This patch attempts to fix the problem in AArch64's allowsMisalignedMemoryAccesses() while preserving
the existing (perhaps questionable) lowering behavior.
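A minimal sketch of the intent (assumed logic, not the committed hunk): only
report an unaligned access as fast if later combines won't split it:

  bool AArch64TargetLowering::allowsMisalignedMemoryAccesses(
      EVT VT, unsigned AddrSpace, unsigned Align, bool *Fast) const {
    if (Subtarget->requiresStrictAlign())
      return false;
    if (Fast)
      // 128-bit unaligned stores are split up by performSTORECombine(),
      // so don't advertise them as fast.
      *Fast = VT.getSizeInBits() != 128;
    return true;
  }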

The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.

Differential Revision: http://reviews.llvm.org/D12635
 

llvm-svn: 248622
2015-09-25 21:49:48 +00:00
Chad Rosier 1bbd7fb38e [AArch64] Add support for generating pre- and post-index load/store pairs.
llvm-svn: 248593
2015-09-25 17:48:17 +00:00
Chad Rosier b02f5a5a1f [AArch64] Improve the readability of the ld/st optimization pass. NFC.
In this context, MI is an add/sub instruction, not a load/store.

llvm-svn: 248540
2015-09-24 21:27:49 +00:00
Chad Rosier 7cd472b719 [AArch64] The paired post-increment store instruction has an output register.
The pre- and post-increment versions update the base register, but the post-
increment version was defined incorrectly.  There is no test case, as we don't
currently generate these instructions, but I plan on changing that in the near future.

llvm-svn: 248528
2015-09-24 19:21:42 +00:00
Chad Rosier 2dfd35499e [AArch64] Refactor pre- and post-index merge functions into a single function. NFC.
llvm-svn: 248377
2015-09-23 13:51:44 +00:00
Ahmed Bougacha 07a844d758 [AArch64] Emit clrex in the expanded cmpxchg fail block.
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.

Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.

Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".

Differential Revision: http://reviews.llvm.org/D13033

llvm-svn: 248291
2015-09-22 17:21:44 +00:00
Stephen Canon 8216d88511 Don't raise inexact when lowering ceil, floor, round, trunc.
The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions *did* raise inexact, and the LLVM lowerings followed that convention. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior. This also lets us fold some logic into TD patterns, which is nice.

Differential Revision: http://reviews.llvm.org/D12969

llvm-svn: 248266
2015-09-22 11:43:17 +00:00
NAKAMURA Takumi 0a7d0ad95f Untabify.
llvm-svn: 248264
2015-09-22 11:15:07 +00:00
NAKAMURA Takumi a9cb538a74 Reformat blank lines.
llvm-svn: 248263
2015-09-22 11:14:39 +00:00
Chad Rosier 03a47305ec [Machine Combiner] Refactor machine reassociation code to be target-independent.
No functional change intended.
Patch by Haicheng Wu <haicheng@codeaurora.org>!

http://reviews.llvm.org/D12887
PR24522

llvm-svn: 248164
2015-09-21 15:09:11 +00:00
Craig Topper 3c76c523e1 Cleanup places that passed SMLoc by const reference to pass it by value instead. NFC
llvm-svn: 248135
2015-09-20 23:35:59 +00:00