llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	3b251963c3	[CGP] Add support for sinking operands to their users, if they are free. This patch improves code generation for some AArch64 ACLE intrinsics. It adds support to CGP to duplicate and sink operands to their user, if they can be folded into a target instruction, like zexts and sub into usubl. It adds a TargetLowering hook shouldSinkOperands, which looks at the operands of instructions to see if sinking is profitable. I decided to add a new target hook, as for the sinking to be profitable, at least on AArch64, we have to look at multiple operands of an instruction, instead of looking at the users of a zext for example. The sinking is done in CGP, because it works around an instruction selection limitation. If instruction selection is not limited to a single basic block, this patch should not be needed any longer. Alternatively this could be done in the LoopSink pass, which tries to undo LICM for instructions in blocks that are not executed frequently. Note that we do not force the operands to sink to have a single user, because we duplicate them before sinking. Therefore this is only desirable if they really can be done for free. Additionally we could consider the impact on live ranges later on. This should fix https://bugs.llvm.org/show_bug.cgi?id=40025. As for performance, we have internal code that uses intrinsics and can be speed up by 10% by this change. Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D57377 llvm-svn: 353152	2019-02-05 10:27:40 +00:00
James Y Knight	7976eb5838	[opaque pointer types] Pass function types to CallInst creation. This cleans up all CallInst creation in LLVM to explicitly pass a function type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57170 llvm-svn: 352909	2019-02-01 20:43:25 +00:00
Sjoerd Meijer	f7cc34cae8	[SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS And instead just generate a libcall. My motivating example on ARM was a simple: shl i64 %A, %B for which the code bloat is quite significant. For other targets that also accept __int128/i128 such as AArch64 and X86, it is also beneficial for these cases to generate a libcall when optimising for minsize. On these 64-bit targets, the 64-bits shifts are of course unaffected because the SHIFT/SHIFT_PARTS lowering operation action is not set to custom/expand. Differential Revision: https://reviews.llvm.org/D57386 llvm-svn: 352736	2019-01-31 08:07:30 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Eli Friedman	ad1151cf6a	[ARM64] [Windows] Handle funclets This patch adds support for funclets in frame lowering and ISel lowering. Together with D50288 and D50166, it enables C++ exception handling. Patch by Sanjin Sijaric, with some fixes by me. Differential Revision: https://reviews.llvm.org/D51524 llvm-svn: 346568	2018-11-09 23:33:30 +00:00
Mandeep Singh Grang	397765bc51	[COFF, ARM64] Add support for MSVC buffer security check Reviewers: rnk, mstorsjo, compnerd, efriedma, TomTan Reviewed By: rnk Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D54248 llvm-svn: 346469	2018-11-09 02:48:36 +00:00
Craig Topper	0b5f8169b0	[TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit. llvm-svn: 346180	2018-11-05 23:26:13 +00:00
Mandeep Singh Grang	547a0d765a	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Patch by: Yin Ma (yinma@codeaurora.org) Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53996 llvm-svn: 345909	2018-11-01 23:22:25 +00:00
Mandeep Singh Grang	b0cdf56dd7	Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64" This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2. llvm-svn: 345863	2018-11-01 17:53:57 +00:00
Mandeep Singh Grang	88ad9ac720	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma Reviewed By: efriedma Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53673 llvm-svn: 345791	2018-10-31 23:16:20 +00:00
Tim Northover	1c353419ab	AArch64: add a pass to compress jump-table entries when possible. llvm-svn: 345188	2018-10-24 20:19:09 +00:00
Alex Bradbury	79518b02cd	[AtomicExpandPass]: Add a hook for custom cmpxchg expansion in IR This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they will see no functional change. Previously this hook returned bool, but it now returns AtomicExpansionKind. This hook allows targets to select how a given cmpxchg is to be expanded. D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic. See my associated RFC for more info on the motivation for this change <http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>. Differential Revision: https://reviews.llvm.org/D48130 llvm-svn: 342550	2018-09-19 14:51:42 +00:00
David Green	9dd1d451d9	[AArch64] Add Tiny Code Model for AArch64 This adds the plumbing for the Tiny code model for the AArch64 backend. This, instead of loading addresses through the normal ADRP;ADD pair used in the Small model, uses a single ADR. The 21 bit range of an ADR means that the code and its statically defined symbols need to be within 1MB of each other. This makes it mostly interesting for embedded applications where we want to fit as much as we can in as small a space as possible. Differential Revision: https://reviews.llvm.org/D49673 llvm-svn: 340397	2018-08-22 11:31:39 +00:00
Eli Friedman	0d12e90bf5	[ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly. Intentionally excluding nodes from the DAGCombine worklist is likely to lead to weird optimizations and infinite loops, so it's generally a bad idea. To avoid the infinite loops, fix DAGCombine to use the isDesirableToCommuteWithShift target hook before performing the transforms in question, and implement the target hook in the ARM backend disable the transforms in question. Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a reduced testcase for that bug. But we should have sufficient test coverage for PerformSHLSimplify given that we're not playing weird tricks with the worklist. I can try to bugpoint it if necessary, though.) Differential Revision: https://reviews.llvm.org/D50667 llvm-svn: 339734	2018-08-14 22:10:25 +00:00
Craig Topper	2f60ef2c78	[DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to BuildSDIV/BuildUDIV/etc. The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation. llvm-svn: 338329	2018-07-30 23:22:00 +00:00
Craig Topper	a568a27dfa	[DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to BuildSDIVPow2. llvm-svn: 338303	2018-07-30 21:04:34 +00:00
Roman Lebedev	de506632aa	[X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 \| PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for 'check for [no] signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. But the IR-optimal patter does not lower efficiently, so we want to undo it.. This handles the simple pattern. There is a second pattern with predicate and constants inverted. NOTE: we do not check uses here. we always do the transform. Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49266 llvm-svn: 337166	2018-07-16 12:44:10 +00:00
Adhemerval Zanella	cadcfed7aa	[AArch64] Add custom lowering for v4i8 trunc store This patch adds a custom trunc store lowering for v4i8 vector types. Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h) and default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extended the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectores. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjust the memory cost for autovectorization, so the C code: void foo (const int src, int width, unsigned char dst) { for (int i = 0; i < width; i++) dst++ = src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735	2018-06-27 13:58:46 +00:00
Tim Northover	70666e7765	[AArch64] Implement FLT_ROUNDS macro. Very similar to ARM implementation, just maps to an MRS. Should fix PR25191. Patch by Michael Brase. llvm-svn: 335118	2018-06-20 12:09:01 +00:00
Evandro Menezes	f8425340e4	[AArch64] Fix PR32384: bump up the number of stores per memset and memcpy As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change makes the inlining of `memset()` and `memcpy()` more aggressive when compiling for speed. The tuning remains the same when optimizing for size. Patch by: Sebastian Pop <s.pop@samsung.com> Evandro Menezes <e.menezes@samsung.com> Differential revision: https://reviews.llvm.org/D45098 llvm-svn: 333429	2018-05-29 15:58:50 +00:00
Roman Lebedev	7772de25d0	[DAGCombine][X86][AArch64] Masked merge unfolding: vector edition. Summary: This appears to be the last missing piece for the masked merge pattern handling in the backend. This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 \| PR37104 ]]. [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly. Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`) Now, they would no longer be generated (see `@in`), and we need to make sure that they are generated. Differential Revision: https://reviews.llvm.org/D46528 llvm-svn: 332904	2018-05-21 21:41:02 +00:00
Haicheng Wu	0aae2bc260	[CGP] Split large data structres to sink more GEPs Accessing the members of a large data structures needs a lot of GEPs which usually have large offsets due to the size of the underlying data structure. If the offsets are too large to fit into the r+i addressing mode, these GEPs cannot be sunk to their users' blocks and many extra registers are needed then to carry the values of these GEPs. This patch tries to split a large data struct starting from %base like the following. Before: BB0: %base = BB1: %gep0 = gep %base, off0 %gep1 = gep %base, off1 %gep2 = gep %base, off2 BB2: %load1 = load %gep0 %load2 = load %gep1 %load3 = load %gep2 After: BB0: %base = %new_base = gep %base, off0 BB1: %new_gep0 = %new_base %new_gep1 = gep %new_base, off1 - off0 %new_gep2 = gep %new_base, off2 - off0 BB2: %load1 = load i32, i32* %new_gep0 %load2 = load i32, i32* %new_gep1 %load3 = load i32, i32* %new_gep2 In the above example, the struct is split into two parts. The first part still starts from %base and the second part starts from %new_base. After the splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to BB2 and folded into their users. The algorithm to split data structure is simple and very similar to the work of merging SExts. First, it collects GEPs that have large offsets when iterating the blocks. Second, it splits the underlying data structures and updates the collected GEPs to use smaller offsets. Differential Revision: https://reviews.llvm.org/D42759 llvm-svn: 332015	2018-05-10 18:27:36 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
John Brawn	e3b44f9de6	[AArch64] Don't reduce the width of loads if it prevents combining a shift Loads and stores can only shift the offset register by the size of the value being loaded, but currently the DAGCombiner will reduce the width of the load if it's followed by a trunc making it impossible to later combine the shift. Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and make it prevent the width reduction if this is what would happen, though do allow it if reducing the load width will let us eliminate a later sign or zero extend. Differential Revision: https://reviews.llvm.org/D44794 llvm-svn: 328321	2018-03-23 14:47:07 +00:00
Martin Storsjo	cc24096d4d	[AArch64] Implement native TLS for Windows Differential Revision: https://reviews.llvm.org/D43971 llvm-svn: 327220	2018-03-10 19:05:21 +00:00
Sebastian Pop	41073e8046	[AArch64] define isExtractSubvectorCheap Following the ARM-neon backend, define isExtractSubvectorCheap to return true when extracting low and high part of a neon register. The patch disables a test in llvm/test/CodeGen/AArch64/arm64-ext.ll This testcase is fragile in the sense that it requires a BUILD_VECTOR to "survive" all DAG transforms until ISelLowering. The testcase is supposed to check that AArch64TargetLowering::ReconstructShuffle() works, and for that we need a BUILD_VECTOR in ISelLowering. As we now transform the BUILD_VECTOR earlier into an VEXT + vector_shuffle, we don't have the BUILD_VECTOR pattern when we get to ISelLowering. As there is no way to disable the combiner to only exercise the code in ISelLowering, the patch disables the testcase. Differential revision: https://reviews.llvm.org/D43973 llvm-svn: 326811	2018-03-06 16:54:55 +00:00
Martin Storsjo	a63a5b993e	[AArch64] Implement dynamic stack probing for windows This makes sure that alloca() function calls properly probe the stack as needed. Differential Revision: https://reviews.llvm.org/D42356 llvm-svn: 325433	2018-02-17 14:26:32 +00:00
Oliver Stannard	02f08c9d1f	[AArch64] Improve v8.1-A code-gen for atomic load-and Armv8.1-A added an atomic load-clear instruction (which performs bitwise and with the complement of it's operand), but not a load-and instruction. Our current code-generation for atomic load-and always inserts an MVN instruction to invert its argument, even if it could be folded into a constant or another instruction. This adds lowering early in selection DAG to convert a load-and operation into an xor with -1 and a load-clear, allowing the normal DAG optimisations to work on it. To do this, I've had to add a new ISD opcode, ATOMIC_LOAD_CLR. I don't see any easy way to do this with an AArch64-specific ISD node, because the code-generation for atomic operations assumes the SDNodes are of type AtomicSDNode. I've left the old tablegen patterns in because they are still needed for global isel. Differential revision: https://reviews.llvm.org/D42478 llvm-svn: 324908	2018-02-12 17:03:11 +00:00
Oliver Stannard	4269917304	[AArch64] Improve v8.1-A code-gen for atomic load-subtract Armv8.1-A added an atomic load-add instruction, but not a load-subtract instruction. Our current code-generation for atomic load-subtract always inserts a NEG instruction to negate it's argument, even if it could be folded into a constant or another instruction. This adds lowering early in selection DAG to convert a load-subtract operation into a subtract and a load-add, allowing the normal DAG optimisations to work on it. I've left the old tablegen patterns in because they are still needed for global isel. Some of the tests in this patch are copied from D35375 by Chad Rosier (which was abandoned). Differential revision: https://reviews.llvm.org/D42477 llvm-svn: 324892	2018-02-12 14:22:03 +00:00
Joel Jones	0715092c65	[AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others. This patch enables aggressive FMA by default on T99, and provides a -mllvm option to enable the same on other AArch64 micro-arch's (-mllvm -aarch64-enable-aggressive-fma). Test case demonstrating the effects on T99 is included. Patch by: steleman (Stefan Teleman) Differential Revision: https://reviews.llvm.org/D40696 llvm-svn: 323474	2018-01-25 21:55:39 +00:00
Matthias Braun	5c290dc206	AArch64: Fix emergency spillslot being out of reach for large callframes Re-commit of r322200: The testcase shouldn't hit machineverifiers anymore with r322917 in place. Large callframes (calls with several hundreds or thousands or parameters) could lead to situations in which the emergency spillslot is out of range to be addressed relative to the stack pointer. This commit forces the use of a frame pointer in the presence of large callframes. This commit does several things: - Compute max callframe size at the end of instruction selection. - Add mirFileLoaded target callback. Use it to compute the max callframe size after loading a .mir file when the size wasn't specified in the file. - Let TargetFrameLowering::hasFP() return true if there exists a callframe > 255 bytes. - Always place the emergency spillslot close to FP if we have a frame pointer. - Note that `useFPForScavengingIndex()` would previously return false when a base pointer was available leading to the emergency spillslot getting allocated late (that's the whole effect of this callback). Which made no sense to me so I took this case out: Even though the emergency spillslot is technically not referenced by FP in this case we still want it allocated early. Differential Revision: https://reviews.llvm.org/D40876 llvm-svn: 322919	2018-01-19 03:16:36 +00:00
Evgeniy Stepanov	99fa3e774d	[hwasan] Stack instrumentation. Summary: Very basic stack instrumentation using tagged pointers. Tag for N'th alloca in a function is built as XOR of: * base tag for the function, which is just some bits of SP (poor man's random) * small constant which is a function of N. Allocas are aligned to 16 bytes. On every ReturnInst allocas are re-tagged to catch use-after-return. This implementation has a bunch of issues that will be taken care of later: 1. lifetime intrinsics referring to tagged pointers are not recognized in SDAG. This effectively disables stack coloring. 2. Generated code is quite inefficient. There is one extra instruction at each memory access that adds the base tag to the untagged alloca address. It would be better to keep tagged SP in a callee-saved register and address allocas as an offset of that XOR retag, but that needs better coordination between hwasan instrumentation pass and prologue/epilogue insertion. 3. Lifetime instrinsics are ignored and use-after-scope is not implemented. This would be harder to do than in ASan, because we need to use a differently tagged pointer depending on which lifetime.start / lifetime.end the current instruction is dominated / post-dominated. Reviewers: kcc, alekseyshl Subscribers: srhines, kubamracek, javed.absar, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41602 llvm-svn: 322324	2018-01-11 22:53:30 +00:00
Matthias Braun	e3a8db7ba1	Revert "AArch64: Fix emergency spillslot being out of reach for large callframes" Revert for now as the testcase is hitting a pre-existing verifier error that manifest as a failure when expensive checks are enabled (or -verify-machineinstrs) is used. This reverts commit r322200. llvm-svn: 322231	2018-01-10 22:36:28 +00:00
Matthias Braun	b42ffa1283	AArch64: Fix emergency spillslot being out of reach for large callframes Large callframes (calls with several hundreds or thousands or parameters) could lead to situations in which the emergency spillslot is out of range to be addressed relative to the stack pointer. This commit forces the use of a frame pointer in the presence of large callframes. This commit does several things: - Compute max callframe size at the end of instruction selection. - Add mirFileLoaded target callback. Use it to compute the max callframe size after loading a .mir file when the size wasn't specified in the file. - Let TargetFrameLowering::hasFP() return true if there exists a callframe > 255 bytes. - Always place the emergency spillslot close to FP if we have a frame pointer. - Note that `useFPForScavengingIndex()` would previously return false when a base pointer was available leading to the emergency spillslot getting allocated late (that's the whole effect of this callback). Which made no sense to me so I took this case out: Even though the emergency spillslot is technically not referenced by FP in this case we still want it allocated early. Differential Revision: https://reviews.llvm.org/D40876 llvm-svn: 322200	2018-01-10 18:16:24 +00:00
Matthias Braun	f1caa2833f	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884	2017-12-15 22:22:58 +00:00
Matt Arsenault	7d7adf4f2e	TLI: Allow using PSV for intrinsic mem operands llvm-svn: 320756	2017-12-14 22:34:10 +00:00
Nirav Dave	7d8f3e0c93	[ARM][AArch64][DAG] Reenable post-legalize store merge Reenable post-legalize stores with constant merging computation and corresponding test case. * Properly truncate store merge constants * Disable merging of truncated stores floating points * Ensure merges of constant stores into a single vector are constructed from legal elements. Reviewers: eastig, efriedma Reviewed By: eastig Subscribers: spatel, rengolin, aemerson, javed.absar, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40701 llvm-svn: 319899	2017-12-06 15:30:13 +00:00
Nirav Dave	839ff79a8d	[DAG][AArch64] Disable post-legalization store Disable post-legalization store for AArch64 backend which is causing errors out-of-tree. llvm-svn: 319607	2017-12-02 04:01:26 +00:00
David Blaikie	b3bde2ea50	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490	2017-11-17 01:07:10 +00:00
Martin Storsjo	373c8efa1e	[AArch64] Add support for dllimport of values and functions Previously, the dllimport attribute did the right thing in terms of treating it as a pointer to a value, but this makes sure the names get mangled properly, and calls to such functions load the function from the __imp_ pointer. This is based on SVN r212431 and r212430 where the same was implemented for ARM. Differential Revision: https://reviews.llvm.org/D38530 llvm-svn: 316555	2017-10-25 07:25:18 +00:00
Tim Northover	ef1fc5ae89	GlobalISel (AArch64): fix ABI at border between GPRs and SP. If a struct would end up half in GPRs and half on SP the ABI says it should actually go entirely on the stack. We were getting this wrong in GlobalISel before, causing compatibility issues. llvm-svn: 311388	2017-08-21 21:56:11 +00:00
Zvi Rackover	1b73682243	TargetLowering: Change isShuffleMaskLegal's mask argument type to ArrayRef<int>. NFCI. Changing mask argument type from const SmallVectorImpl<int>& to ArrayRef<int>. This came up in D35700 where a mask is received as an ArrayRef<int> and we want to pass it to TargetLowering::isShuffleMaskLegal(). Also saves a few lines of code. llvm-svn: 309085	2017-07-26 08:06:58 +00:00
Jonas Paulsson	024e319489	[SystemZ, LoopStrengthReduce] This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory. In order to achieve this, the following common code changes were made: * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers. * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores. SystemZ changes: * isLSRCostLess() implemented with Insns first, and without ImmCost. * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D35262 https://reviews.llvm.org/D35049 llvm-svn: 308729	2017-07-21 11:59:37 +00:00
Nirav Dave	8d0ecbedbe	Avoid store merge to f128 in context of noimpiccitfloat NFCI. Prevent store merge from merging stores into an invalid 128-bit store (realized as a f128 value in the context of the noimplicitfloat attribute). Previously, such stores are immediately split back into valid stores. llvm-svn: 308184	2017-07-17 15:09:47 +00:00
Geoff Berry	b1e8714af9	[AArch64][Falkor] Avoid HW prefetcher tag collisions (step 1) Summary: This patch is the first step in reducing HW prefetcher instruction tag collisions in inner loops for Falkor. It adds a pass that annotates IR loads with metadata to indicate that they are known to be strided loads, and adds a target lowering hook that translates this metadata to a target-specific MachineMemOperand flag. A follow on change will use this MachineMemOperand flag to re-write instructions to reduce tag collisions. Reviewers: mcrosier, t.p.northover Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34963 llvm-svn: 308059	2017-07-14 21:44:12 +00:00
Martin Storsjo	68266faa31	[AArch64] Implement support for windows style vararg functions Pass parameters properly in calls to such functions (pass all floats in integer registers), and handle va_start properly (allocate stack immediately below the arguments on the stack, to save the register arguments into a single continuous array). Differential Revision: https://reviews.llvm.org/D35006 llvm-svn: 307928	2017-07-13 17:03:12 +00:00
Amara Emerson	c9916d7e97	Re-commit r302678, fixing PR33053. The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions which didn't have a lowering. llvm-svn: 303211	2017-05-16 21:29:22 +00:00
Hans Wennborg	bd6e9e77a7	Revert r302678 "[AArch64] Enable use of reduction intrinsics." This caused PR33053. Original commit message: > The new experimental reduction intrinsics can now be used, so I'm enabling this > for AArch64. We will need this for SVE anyway, so it makes sense to do this for > NEON reductions as well. > > The existing code to match shufflevector patterns are replaced with a direct > lowering of the reductions to AArch64-specific nodes. Tests updated with the > new, simpler, representation. > > Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 303115	2017-05-15 20:59:32 +00:00
Amara Emerson	816542ceb3	[AArch64] Enable use of reduction intrinsics. The new experimental reduction intrinsics can now be used, so I'm enabling this for AArch64. We will need this for SVE anyway, so it makes sense to do this for NEON reductions as well. The existing code to match shufflevector patterns are replaced with a direct lowering of the reductions to AArch64-specific nodes. Tests updated with the new, simpler, representation. Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 302678	2017-05-10 15:15:38 +00:00
Craig Topper	d0af7e8ab8	[SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620	2017-04-28 05:31:46 +00:00

1 2 3 4

193 Commits