llvm-project

Commit Graph

Author	SHA1	Message	Date
Pete Cooper	f52123b454	[AArch64] Fix sext/zext folding in address arithmetic. We were accidentally folding a sign/zero extend in to address arithmetic in a different BB when the extend wasn't available there. Cross BB fast-isel isn't safe, so restrict this to only when the extend is in the same BB as the use. llvm-svn: 236764	2015-05-07 19:21:36 +00:00
Wei Mi	062c74484d	[X86] Disable loop unrolling in loop vectorization pass when VF is 1. The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture, by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce the cost of overflow check, memory boundary check and extra prologue/epilogue code when regular unroller will unroll the loop another time. Disable it when VF==1 remove the unnecessary cost on x86. The same can be done for other platforms after verifying interleaving/memory bound checking to be not perf critical on those platforms. Differential Revision: http://reviews.llvm.org/D9515 llvm-svn: 236613	2015-05-06 17:12:25 +00:00
Quentin Colombet	61b305edfd	[ShrinkWrap] Add (a simplified version) of shrink-wrapping. This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exits blocks. As an example and to avoid regressions to be introduce, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64. Context Currently we insert the prologue and epilogue of the method/function in the entry and exits blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places. Motivating example Let us consider the following function that perform a call only in one branch of a if: define i32 @f(i32 %a, i32 %b) { %tmp = alloca i32, align 4 %tmp2 = icmp slt i32 %a, %b br i1 %tmp2, label %true, label %false true: store i32 %a, i32* %tmp, align 4 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) br label %false false: %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] ret i32 %tmp.0 } On AArch64 this code generates (removing the cfi directives to ease readabilities): _f: ; @f ; BB#0: stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething LBB0_2: ; %false mov sp, x29 ldp x29, x30, [sp], #16 ret With shrink-wrapping we could generate: _f: ; @f ; BB#0: cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething add sp, x29, #16 ; =16 ldp x29, x30, [sp], #16 LBB0_2: ; %false ret Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call. Proposed Solution This patch introduces a new machine pass that perform the shrink-wrapping analysis (See the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore point into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI. Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need hack to properly avoid frequently executed point. Instead, it relies on dominance and loop properties. The pass is off by default and each target can opt-in by setting the EnableShrinkWrap boolean to true in their derived class of TargetPassConfig. This setting can also be overwritten on the command line by using -enable-shrink-wrap. Before you try out the pass for your target, make sure you properly fix your emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not necessarily the entry block. Design Decisions 1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but for debugging and clarity I thought it was best to have its own file. 2. Right now, we only support one save point and one restore point. At some point we can expand this to several save point and restore point, the impacted component would then be: - The pass itself: New algorithm needed. - MachineFrameInfo: Hold a list or set of Save/Restore point instead of one pointer. - PEI: Should loop over the save point and restore point. Anyhow, at least for this first iteration, I do not believe this is interesting to support the complex cases. We should revisit that when we motivating examples. Differential Revision: http://reviews.llvm.org/D9210 <rdar://problem/3201744> llvm-svn: 236507	2015-05-05 17:38:16 +00:00
Quentin Colombet	0de2346859	[AArch64][FastISel] Variant of the logical instructions that use two input registers cannot write on SP. rdar://problem/20748715 llvm-svn: 236352	2015-05-01 21:34:57 +00:00
Quentin Colombet	9df2fa261b	[AArch64][FastISel] Fix the setting of kill flags for MUL -> UMULH sequences. rdar://problem/20748715 llvm-svn: 236346	2015-05-01 20:57:11 +00:00
Quentin Colombet	329fa890ba	[AArch64] Fix bad register class constraint in fast-isel for TST instruction. rdar://problem/20748715 llvm-svn: 236273	2015-04-30 22:27:20 +00:00
Tim Northover	03b99f66d7	AArch64: add BFC alias for the BFI/BFM instructions. Unlike 32-bit ARM, AArch64 can use wzr/xzr to implement this without the need for a separate instruction. rdar://18679590 llvm-svn: 236245	2015-04-30 18:28:58 +00:00
Manman Ren	0e20822887	[AArch64] Refactor out codes that depend on specific CS save sequence. No functionality change. llvm-svn: 236143	2015-04-29 20:03:38 +00:00
Duncan P. N. Exon Smith	a9308c49ef	IR: Give 'DI' prefix to debug info metadata Finish off PR23080 by renaming the debug info IR constructs from `MD` to `DI`. The last of the `DIDescriptor` classes were deleted in r235356, and the last of the related typedefs removed in r235413, so this has all baked for about a week. Note: If you have out-of-tree code (like a frontend), I recommend that you get everything compiling and tests passing with the previous commit before updating to this one. It'll be easier to keep track of what code is using the `DIDescriptor` hierarchy and what you've already updated, and I think you're extremely unlikely to insert bugs. YMMV of course. Back to this commit: I did this using the rename-md-di-nodes.sh upgrade script I've attached to PR23080 (both code and testcases) and filtered through clang-format-diff.py. I edited the tests for test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns were off-by-three. It should work on your out-of-tree testcases (and code, if you've followed the advice in the previous paragraph). Some of the tests are in badly named files now (e.g., test/Assembler/invalid-mdcompositetype-missing-tag.ll should be 'dicompositetype'); I'll come back and move the files in a follow-up commit. llvm-svn: 236120	2015-04-29 16:38:44 +00:00
Sergey Dmitrouk	842a51bad8	Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes" [DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235989	2015-04-28 14:05:47 +00:00
Daniel Jasper	48e93f7181	Revert "[DebugInfo] Add debug locations to constant SD nodes" This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987	2015-04-28 13:38:35 +00:00
Sergey Dmitrouk	adb4c69d5c	[DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235977	2015-04-28 11:56:37 +00:00
Ahmed Bougacha	190528703f	[MC] Use LShr for constant evaluation of ">>" on ELF/arm64--darwin. This matches other assemblers and is less unexpected (e.g. PR23227). On ELF, I tried binutils gas v2.24 and nasm 2.10.09, and they both agree on LShr. On COFF, I couldn't get my hands on an assembler yet, so don't change the behavior. For now, don't change it on non-AArch64 Darwin either, as the other assembler is gas v1.38, which does an AShr. llvm-svn: 235963	2015-04-28 01:37:11 +00:00
Ahmed Bougacha	c004c60c0a	[AArch64] Also combine vector selects fed by non-i1 SETCCs. After legalization, scalar SETCC has an i32 result type on AArch64. The i1 requirement seems too conservative, replace it with an assert. This also means that we now can run after legalization. That should also be fine, since the ops legalizer runs again after each combine, and all types created all have the same sizes as the (legal) inputs. Exposed by r235917; while there, robustize its tests (bsl also uses the register it defines). llvm-svn: 235922	2015-04-27 21:43:12 +00:00
Ahmed Bougacha	89bba61c84	[AArch64] Don't assert when combining (v3f32 select (setcc f64)). When the setcc has f64 operands, we can't build a vector setcc mask to feed a vselect, because f64 doesn't divide v3f32 evenly. Just bail out when that happens. llvm-svn: 235917	2015-04-27 21:01:20 +00:00
Lang Hames	9ff69c8f4d	[AsmPrinter] Make AsmPrinter's OutStreamer member a unique_ptr. AsmPrinter owns the OutStreamer, so an owning pointer makes sense here. Using a reference for this is crufty. llvm-svn: 235752	2015-04-24 19:11:51 +00:00
Pirama Arumuga Nainar	745615ca00	[AArch64] Add nvcast patterns for v4f16 and v8f16 Summary: Constant stores of f16 vectors can create NvCast nodes from various operand types to v4f16 or v8f16 depending on patterns in the stored constants. This patch adds nvcast rules with v4f16 and v8f16 values. AArchISelLowering::LowerBUILD_VECTOR has the details on which constant patterns generate the nvcast nodes. Reviewers: jmolloy, srhines, ab Subscribers: rengolin, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D9201 llvm-svn: 235610	2015-04-23 17:32:25 +00:00
Pirama Arumuga Nainar	b18815354d	[AArch64] Handle vec4, vec8, vec16 *itofp for half Summary: Set operation action for SINT_TO_FP and UINT_TO_FP nodes with v4i32, v8i8, v8i16 inputs to allow promotion of v4f16 results. Add tests for sitofp and uitofp for vec4, vec8, vec16, and i8, i16, i32, and i64 vectors. Only missing tests are for v16i8 and v16i16 as the shift operations are too complicated to write a proper check sequence. The conversions from v4i64 to v4f16 do not depend on this patch - v4i64 is split and the conversion gets handled while lowering v2i64. I am adding a test here for completeness. Reviewers: aemerson, rengolin, ab, jmolloy, srhines Subscribers: rengolin, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D9166 llvm-svn: 235609	2015-04-23 17:16:27 +00:00
Pete Cooper	037b700b7f	[AArch64] Use MachineRegisterInfo instead of LiveIntervals to calculate liveness. NFC. The CondOpt pass currently uses LiveIntervals to set the dead flag on a def. This patch uses MachineRegisterInfo::use_empty instead as that is equivalent to the def being dead. This removes an instance of LiveIntervals in the pass manager pipeline and saves 3.8% of compile time on llc conpiled for AArch64. Reviewed by Chad Rosier and Zhaoshi. llvm-svn: 235532	2015-04-22 18:05:13 +00:00
James Molloy	cd2334e86e	[AArch64] Disable complex GEP optimization by default. Enough concerns were raised that this optimization is pessimising some code patterns. The obvious fix, to add a Reassociate run afterwards, causes even more pessimisation in some cases due to fewer complex addressing modes being matched. As there isn't a trivial fix for this, backing this out by default until someone gets a chance to fix the addressing mode matcher. llvm-svn: 235491	2015-04-22 09:11:38 +00:00
Vladimir Sukharev	bad1d1dc02	[AArch64] LORID_EL1 register must be treated as read-only Patch by: John Brawn Reviewers: jmolloy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9105 llvm-svn: 235314	2015-04-20 16:54:37 +00:00
Ahmed Bougacha	e14a4d487e	[AArch64] Don't force MVT::Untyped when selecting LD1LANEpost. The result is either an Untyped reg sequence, on ldN with N > 1, or just the type of the input vector, on ld1. Don't force Untyped. Instead, just use the type of the reg sequence. This mirrors the behavior of createTuple, which feeds the LD1*_POST. The narrow code path wasn't actually covered by tests, because V64 insert_vector_elt are widened to V128 before the LD1LANEpost combine has the chance to run, usually. The only case where it does run on V64 vectors is if the vector ops legalizer ran. So, tickle the code with a ctpop. Fixes PR23265. llvm-svn: 235243	2015-04-17 23:43:33 +00:00
Ahmed Bougacha	2448ef5f33	[AArch64] Avoid vector->load dependency cycles when creating LD1post. They would break the SelectionDAG. Note that the opposite load->vector dependency is already obvious in: (LD1post vec, ..) llvm-svn: 235224	2015-04-17 21:02:30 +00:00
Benjamin Kramer	97fbdd5a39	[mc] Clean up emission of byte sequences No functional change intended. llvm-svn: 235178	2015-04-17 11:12:43 +00:00
Ahmed Bougacha	941420d9ea	[AArch64] Don't assert on f16 in DUP PerfectShuffle generator. Found by code inspection, but breaking i16 at least breaks other tests. They aren't checking this in particular though, so also add some explicit tests for the already working types. llvm-svn: 235148	2015-04-16 23:57:07 +00:00
Pete Cooper	19d704d13c	Disable AArch64 fast-isel on big-endian call vector returns. A big-endian vector return needs a byte-swap which we aren't doing right now. For now just bail on these cases to get correctness back. llvm-svn: 235133	2015-04-16 21:19:36 +00:00
Vladimir Sukharev	6334cf3d69	[AArch64] Add v8.1a "Virtualization Host Extensions" Reviewers: t.p.northover Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8500 Patch by: Tom Coxon llvm-svn: 235107	2015-04-16 15:38:58 +00:00
Vladimir Sukharev	d49cb8fdd7	[AArch64] Add v8.1a "Limited Ordering Regions" extension Reviewers: t.p.northover Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8499 Patch by: Tom Coxon llvm-svn: 235105	2015-04-16 15:30:43 +00:00
Vladimir Sukharev	251ce0c2db	[AArch64] Add v8.1a "Privileged Access Never" extension Reviewers: jmolloy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8498 llvm-svn: 235104	2015-04-16 15:20:51 +00:00
Vladimir Sukharev	a11db3eb88	[AArch64] Handle Cyclone-specific register in common way Reviewers: jmolloy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8584 Patch by: Tom Coxon llvm-svn: 235102	2015-04-16 15:01:20 +00:00
Vladimir Sukharev	950b606a2b	[AArch64] Follow-up to: Refactor AArch64NamedImmMapper to become dependent on subtarget features Fixed compilation with clang on some buildbots with "-Werror -Wmissing-field-initializers" Related to: http://reviews.llvm.org/rL235089 llvm-svn: 235099	2015-04-16 14:36:13 +00:00
Vladimir Sukharev	a98f6897a2	[AArch64] Refactor AArch64NamedImmMapper to become dependent on subtarget features. In order to introduce v8.1a-specific entities, Mappers should be aware of SubtargetFeatures available. This patch introduces refactoring, that will then allow to easily introduce: - v8.1-specific "pan" PState for PStateMapper (PAN extension) - v8.1-specific sysregs for SysRegMapper (LOR,VHE extensions) Reviewers: jmolloy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8496 Patch by Tom Coxon llvm-svn: 235089	2015-04-16 12:15:27 +00:00
James Molloy	f8aa57aa3b	[AArch64] Fix invalid use of references to BuildMI. This was found in GCC PR65773 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65773). We shouldn't be taking a reference to the temporary that BuildMI returns, we must copy it. llvm-svn: 235088	2015-04-16 11:37:40 +00:00
Richard Trieu	6b1aa5f5e1	Change range-based for-loops to be -Wrange-loop-analysis clean. No functionality change. llvm-svn: 234963	2015-04-15 01:21:15 +00:00
Rafael Espindola	5560a4cfbd	Use raw_pwrite_stream in the object writer/streamer. The ELF object writer will take advantage of that in the next commit. llvm-svn: 234950	2015-04-14 22:14:34 +00:00
Bradley Smith	b913653b91	[AArch64] Allow non-standard INS/DUP encodings The ARMv8 ARMARM states that for these instructions in A64 state: "Unspecified bits in "imm5" are ignored but should be set to zero by an assembler.", (imm4 for INS). Make the disassembler accept any encoding with these ignored bits set to 1. llvm-svn: 234896	2015-04-14 15:07:26 +00:00
Duncan P. N. Exon Smith	7348ddaa74	DebugInfo: Gut DIVariable and DIGlobalVariable Gut all the non-pointer API from the variable wrappers, except an implicit conversion from `DIGlobalVariable` to `DIDescriptor`. Note that if you're updating out-of-tree code, `DIVariable` wraps `MDLocalVariable` (`MDVariable` is a common base class shared with `MDGlobalVariable`). llvm-svn: 234840	2015-04-14 02:22:36 +00:00
Krzysztof Parzyszek	a46c36b8f4	Allow memory intrinsics to be tail calls llvm-svn: 234764	2015-04-13 17:16:45 +00:00
Alexander Kornienko	f817c1cb9a	Use 'override/final' instead of 'virtual' for overridden methods The patch is generated using clang-tidy misc-use-override check. This command was used: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \ -checks='-*,misc-use-override' -header-filter='llvm\|clang' \ -j=32 -fix -format http://reviews.llvm.org/D8925 llvm-svn: 234679	2015-04-11 02:11:45 +00:00
Ahmed Bougacha	b96444efd1	[CodeGen] Split -enable-global-merge into ARM and AArch64 options. Currently, there's a single flag, checked by the pass itself. It can't force-enable the pass (and is on by default), because it might not even have been created, as that's the targets decision. Instead, have separate explicit flags, so that the decision is consistently made in the target. Keep the flag as a last-resort "force-disable GlobalMerge" for now, for backwards compatibility. llvm-svn: 234666	2015-04-11 00:06:36 +00:00
Quentin Colombet	fd7475b5e8	[AArch64] Strengthen the code for the prologue insertion. The spilled registers are pristine and thus, correctly handled by the register scavenger and so on, but the liveness information is strictly speaking wrong at this point. Fix that. llvm-svn: 234664	2015-04-10 23:14:34 +00:00
Chad Rosier	518659d9b4	[AArch64] Changes some SchedAlias to WriteRes for Cortex-A57. Using SchedAliases is convenient and works well for latency and resource lookup for instructions. However, this creates an entry in AArch64WriteLatencyTable with a WriteResourceID of 0, breaking any SchedReadAdvance since the lookup will fail. http://reviews.llvm.org/D8043 Patch by Dave Estes <cestes@codeaurora.org>! llvm-svn: 234594	2015-04-10 13:19:27 +00:00
Chad Rosier	a82c876045	[AArch64] Adjusts Cortex-A57 machine model to handle zero shift. http://reviews.llvm.org/D8043 Patch by Dave Estes <cestes@codeaurora.org>! llvm-svn: 234593	2015-04-10 13:19:21 +00:00
Benjamin Kramer	619c4e57ba	Reduce dyn_cast<> to isa<> or cast<> where possible. No functional change intended. llvm-svn: 234586	2015-04-10 11:24:51 +00:00
Ahmed Bougacha	1ffe7c7d36	[AArch64] Promote f16 operations to f32. For the most common ones (such as fadd), we already did the promotion. Do the same thing for all the others. Currently, we'll just crash/assert on all these operations, as there's no hardware or libcall support whatsoever. f16 (half) is specified as an interchange - not arithmetic - format, and is expected to be promoted to single-precision for arithmetic operations. While there, teach the legalizer about promoting some of the (mostly floating-point) operations that we never needed before. Differential Revision: http://reviews.llvm.org/D8648 See related discussion on the thread for: http://reviews.llvm.org/D8755 llvm-svn: 234550	2015-04-10 00:08:48 +00:00
Juergen Ributzka	bd0c7eb4dc	[AArch64][FastISel] Fix integer extend optimization. The integer extend optimization tries to fold the extend into the load instruction. This requires us to identify if the extend has already been emitted or not and act accordingly on it. The check that was originally performed for this was not sufficient. Besides checking the ValueMap for a mapped register we also need to check if the virtual register has already an associated machine instruction that defines it. This fixes rdar://problem/20470788. llvm-svn: 234529	2015-04-09 20:00:46 +00:00
Rafael Espindola	49286e9f4a	clang-format bits of code to make a followup patch easy to read. llvm-svn: 234519	2015-04-09 18:32:58 +00:00
Kristof Beyls	17cb8982f4	[AArch64] Add support for dynamic stack alignment Differential Revision: http://reviews.llvm.org/D8876 llvm-svn: 234471	2015-04-09 08:49:47 +00:00
Lang Hames	522bf13b83	[AArch64] Remove redundant -march option. Also fix a think-o from r234462. llvm-svn: 234467	2015-04-09 05:34:57 +00:00
Lang Hames	903338511b	[AArch64] Teach AArch64TargetLowering::getOptimalMemOpType to consider alignment restrictions when choosing a type for small-memcpy inlining in SelectionDAGBuilder. This ensures that the loads and stores output for the memcpy won't be further expanded during legalization, which would cause the total number of instructions for the memcpy to exceed (often significantly) the inlining thresholds. <rdar://problem/17829180> llvm-svn: 234462	2015-04-09 03:40:33 +00:00

1 2 3 4 5 ...

998 Commits