llvm-project

Commit Graph

Author	SHA1	Message	Date
Justin Lebar	ed1e312f05	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass. Summary: This has been replaced by the NVPTXInferAddressSpaces pass. We've had the new one as the default with the old one accessible via a flag for some months now, and we've had no problems. Reviewers: tra Subscribers: llvm-commits, jholewinski, jingyue, mgorny Differential Revision: https://reviews.llvm.org/D26165 llvm-svn: 285642	2016-10-31 21:51:42 +00:00
Nemanja Ivanovic	60bdfe5a7c	[PPC] add absolute difference altivec instructions and matching intrinsics This patch corresponds to review https://reviews.llvm.org/D26072. Committing on behalf of Sean Fertile. llvm-svn: 285627	2016-10-31 19:47:52 +00:00
Tim Northover	037af52c8b	GlobalISel: allow truncating pointer casts on AArch64. llvm-svn: 285615	2016-10-31 18:31:09 +00:00
Tim Northover	cdf23f1d93	GlobalISel: translate stack protector intrinsics llvm-svn: 285614	2016-10-31 18:30:59 +00:00
Krzysztof Parzyszek	22586dcb2a	[Hexagon] Don't expand mux instructions with both sources identical llvm-svn: 285588	2016-10-31 15:45:09 +00:00
Manuel Klimek	7c41f20a04	Add triple to test so it does not fail on windows. llvm-svn: 285560	2016-10-31 11:40:14 +00:00
Manuel Klimek	bab67d2af4	Delete .s file that did not test anything, and check in test that works. In D26098, Davide Italiano submitted a .s file instead of the .ll file that was the last stage of the review. llvm-svn: 285559	2016-10-31 11:18:39 +00:00
Craig Topper	d4e580705d	[AVX-512] Add missing patterns for selecting masked vector extracts that started from shuffles. llvm-svn: 285546	2016-10-31 05:55:57 +00:00
Sanjay Patel	339a51ac13	[DAG] x | x --> x llvm-svn: 285522	2016-10-30 18:19:35 +00:00
Sanjay Patel	13aee345ca	[DAG] x & x --> x llvm-svn: 285521	2016-10-30 18:13:30 +00:00
Sanjay Patel	8a5f9810a0	[x86] add tests for basic logic op folds llvm-svn: 285520	2016-10-30 18:04:19 +00:00
Sanjay Patel	36eeb6d6f6	[ValueTracking] recognize more variants of smin/smax Try harder to detect obfuscated min/max patterns: the initial pattern was added with D9352 / rL236202. There was a bug fix for PR27137 at rL264996, but I think we can do better by folding the corresponding smax pattern and commuted variants. The codegen tests demonstrate the effect of ValueTracking on the backend via SelectionDAGBuilder. We can't expose these differences minimally in IR because we don't have smin/smax intrinsics for IR. Differential Revision: https://reviews.llvm.org/D26091 llvm-svn: 285499	2016-10-29 16:21:19 +00:00
Sanjay Patel	e9fa95e572	[x86] add tests for smin/smax matchSelPattern (D26091) llvm-svn: 285498	2016-10-29 16:02:57 +00:00
Simon Pilgrim	75a697a17e	[DAGCombiner] (REAPPLIED) Add vector demanded elements support to computeKnownBits Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used. I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course. DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit. This looked like this had caused compile time regressions on some buildbots (and was reverted in rL285381), but appears to have just been a harmless bystander! Differential Revision: https://reviews.llvm.org/D25691 llvm-svn: 285494	2016-10-29 11:29:39 +00:00
Elena Demikhovsky	519b4ccd70	Fixed FMA + FNEG combine. Masked form of FMA should be omitted in this optimization. Differential Revision: https://reviews.llvm.org/D25984 llvm-svn: 285492	2016-10-29 08:44:46 +00:00
Matt Arsenault	c88ba36eab	AMDGPU: Use 1/2pi inline imm on VI I'm guessing at how it is supposed to be printed llvm-svn: 285490	2016-10-29 04:05:06 +00:00
Davide Italiano	86168b23cf	[DAGCombiner] Fix a crash visiting `AND` nodes. Instead of asserting that the shift count is != 0 we just bail out as it's not profitable trying to optimize a node which will be removed anyway. Differential Revision: https://reviews.llvm.org/D26098 llvm-svn: 285480	2016-10-28 23:55:32 +00:00
Tom Stellard	6695ba0440	AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions Summary: Flat instructions can return out of order, so we always need to wait for all the outstanding flat operations. Reviewers: tony-tye, arsenm Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D25998 llvm-svn: 285479	2016-10-28 23:53:48 +00:00
Matt Arsenault	7b6475568d	AMDGPU: Add definitions for scalar store instructions Also add glc bit to the scalar loads since they exist on VI and change the caching behavior. This currently has an assembler bug where the glc bit is incorrectly accepted on SI/CI which do not have it. llvm-svn: 285463	2016-10-28 21:55:15 +00:00
Justin Lebar	f0a80ba385	[NVPTX] Compute 'rem' using the result of 'div', if possible. Summary: In isel, transform Num % Den into Num - (Num / Den) * Den if the result of Num / Den is already available. Reviewers: tra Subscribers: hfinkel, llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26090 llvm-svn: 285461	2016-10-28 21:44:00 +00:00
Matt Arsenault	b5f2bb1a88	AMDGPU: Change check prefix in test llvm-svn: 285449	2016-10-28 20:33:01 +00:00
Matt Arsenault	4eae301995	AMDGPU: Diagnose using too many SGPRs This is possible when using inline asm. llvm-svn: 285447	2016-10-28 20:31:47 +00:00
Krzysztof Parzyszek	2717175c99	Handle non-~0 lane masks on live-in registers in LivePhysRegs When LivePhysRegs adds live-in registers, it recognizes ~0 as a special lane mask indicating the entire register. If the lane mask is not ~0, it will only add the subregisters that overlap the specified lane mask. The problem is that if a live-in register does not have subregisters, and the lane mask is not ~0, it will not be added to the live set. (The given lane mask may simply be the lane mask of its register class.) If a register does not have subregisters, add it to the live set if the lane mask is non-zero. Differential Revision: https://reviews.llvm.org/D26094 llvm-svn: 285440	2016-10-28 20:06:37 +00:00
Matt Arsenault	08906a3c62	AMDGPU: Fix using incorrect private resource with no allocation It's possible to have a use of the private resource descriptor or scratch wave offset registers even though there are no allocated stack objects. This would result in continuing to use the maximum number of reserved registers. This could go over the number of SGPRs available on VI, or violate the SGPR limit requested by the function attributes. llvm-svn: 285435	2016-10-28 19:43:31 +00:00
Nemanja Ivanovic	e28a0fc72a	Implement vector count leading/trailing bytes with zero lsb and vector parity builtins - llvm portion This patch corresponds to review https://reviews.llvm.org/D26003. Committing on behalf of Zaara Syeda. llvm-svn: 285434	2016-10-28 19:38:24 +00:00
Arnold Schwaighofer	6200b2b67e	Make swift calling convention test specific to armv7 llvm-svn: 285431	2016-10-28 19:18:09 +00:00
Sanjay Patel	03a585e882	[x86] add tests for missed umin/umax This is actually a deficiency in ValueTracking's matchSelectPattern(), but a codegen test is the simplest way to expose the bug. llvm-svn: 285429	2016-10-28 19:08:20 +00:00
Arnold Schwaighofer	7f4b31c057	More swift calling convention tests llvm-svn: 285417	2016-10-28 17:21:05 +00:00
Krzysztof Parzyszek	87a47be039	[Hexagon] Maintain kill flags through splitting in expand-condsets Do not use LiveIntervals to recalculate kills, because that cannot be done accurately without implicit uses on predicated instructions. llvm-svn: 285409	2016-10-28 15:50:22 +00:00
Juergen Ributzka	5cee232be4	Revert "[DAGCombiner] Add vector demanded elements support to computeKnownBits" This seems to have increased LTO compile time beyond 2x of previous builds. See http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto/10676/ llvm-svn: 285381	2016-10-28 04:01:12 +00:00
Tom Stellard	aea899e2a0	AMDGPU/SI: Handle hazard with s_rfe_b64 Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25638 llvm-svn: 285368	2016-10-27 23:50:21 +00:00
Tom Stellard	04051b5fad	AMDGPU/SI: Handle hazard with sgpr lane selects for v_{read,write}lane Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25637 llvm-svn: 285367	2016-10-27 23:42:29 +00:00
Tom Stellard	b133fbb9a4	AMDGPU/SI: Handle hazard with > 8 byte VMEM stores Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25577 llvm-svn: 285359	2016-10-27 23:05:31 +00:00
Tom Stellard	30d30824b4	AMDGPU/SI: Handle s_setreg hazard in GCNHazardRecognizer Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25528 llvm-svn: 285338	2016-10-27 20:39:09 +00:00
Ehsan Amiri	2492721c36	[PPC] Adding the removed testcase again This testcase was originally part of r284995, but I put it in a wrong directory. So I removed it. Before adding it back I did some small enhancements. Also I changed the assertions a little bit, to take into account the impact of some changes performed since code review is done. This is similar to changes done for another testcase in the original commit. See: https://reviews.llvm.org/D23614#577749 Basically for instead of vxor we now generate xxlxor in some cases, which is better. llvm-svn: 285333	2016-10-27 19:10:09 +00:00
Saleem Abdulrasool	075d2e3c59	ARM: ensure that the Windows DBZ check is in range The Windows ARM target expects the compiler to emit a division-by-zero check. The check would use the form of: cmp r?, #0 cbz .Ltrap b .Lbody .Lbody: ... .Ltrap: udf #249 @ __brkdiv0 This works great most of the time. However, if the body of the function is greater than 127 bytes, the branch target limitation of cbz becomes an issue. This occurs in the unoptimized code generation cases sometimes (like in compiler-rt). Since this is a matter of correctness, possibly pay a small penalty instead. We now form this slightly differently: cbnz .Lbody udf #249 @ __brkdiv0 .Lbody: ... The positive case is through the branch instead of being the next instruction. However, because of the basic block layout, the negated branch is going to be a short distance always (2 bytes away, after the inserted __brkdiv0). The new t__brkdiv0 instruction is required to explicitly mark the instruction as a terminator as the generic UDF instruction is not a terminator. Addresses PR30532! llvm-svn: 285312	2016-10-27 16:59:22 +00:00
Vasileios Kalintiris	cfb005a0ee	[mips] Do not allow -opt-bisect-limit to skip the PIC call optimization pass. r282428 added the MipsOptimizePICCall as an opt-in pass that can be skipped when using the -opt-bisect-limit option. However, this pass is needed because it generates code that conforms to the o32 ABI specification by using the $t9 register for PIC calls with JALR instructions. This bug was exposed by the fact that skipFunction() also checks for the "optnone" attribute. This caused functions with that attribute to break the requirements of the o32 ABI. llvm-svn: 285305	2016-10-27 15:50:36 +00:00
Simon Pilgrim	820e1326d7	[X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64 With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq). Updated cost table accordingly. Differential Revision: https://reviews.llvm.org/D26011 llvm-svn: 285304	2016-10-27 15:27:00 +00:00
Krzysztof Parzyszek	046da74699	[Hexagon] Do not expand ISD::SELECT for HVX vectors llvm-svn: 285297	2016-10-27 14:30:16 +00:00
Simon Pilgrim	01e755eab1	[DAGCombiner] Add vector demanded elements support to computeKnownBits Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used. I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course. DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit. Differential Revision: https://reviews.llvm.org/D25691 llvm-svn: 285296	2016-10-27 14:29:28 +00:00
Sam Parker	09947a3155	[ARM] Add newline char to test. Missed a newline in the previous commit. Differential Revision: https://reviews.llvm.org/D26027 llvm-svn: 285280	2016-10-27 10:43:02 +00:00
Sam Parker	e7d9505c08	[ARM] Predicate UMAAL selection on hasDSP. UMAAL is a DSP instruction and it is not available on thumbv7m (Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong CHECK prefix in longMAC.ll test. Patch by Vadzim Dambrouski. Differential Revision: https://reviews.llvm.org/D25890 llvm-svn: 285278	2016-10-27 09:47:10 +00:00
Nicolai Haehnle	7b0e25b7ad	AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due to register dependencies Summary: When finding a match for a merge and collecting the instructions that must be moved, keep in mind that the instruction we merge might actually use one of the defs that are being moved. Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect]. The fact that the ds_read in the test case is not eliminated suggests that there might be another problem related to alias analysis, but that's a separate problem: this pass should still work correctly even when earlier optimization passes missed something or were disabled. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25829 llvm-svn: 285273	2016-10-27 08:15:07 +00:00
Nemanja Ivanovic	32b5fed639	[PowerPC] - No SExt/ZExt needed for count trailing zeros This patch corresponds to review: https://reviews.llvm.org/D25896 It just eliminates the redundant ZExt after a count trailing zeros instruction. llvm-svn: 285267	2016-10-27 05:17:58 +00:00
Tim Northover	a9cc385664	ARM: don't rely on push/pop reglists being in order when folding SP adjust. It would be a very nice invariant to rely on, but unfortunately it doesn't necessarily hold (and the causes of mis-sorted reglists appear to be quite varied) so to be robust the frame lowering code can't assume that the first register in the list is also the first one that actually gets pushed. Should fix an issue where we were turning something like: push {r8, r4, r7, lr} sub sp, #24 into nonsense like: push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr} llvm-svn: 285232	2016-10-26 20:01:00 +00:00
Nemanja Ivanovic	275853e777	Do not assume that FP vector operands are never legalized by expanding This patch ensures that if a floating point vector operand is legalized by expanding, it is legalized through the stack rather than by calling DAGTypeLegalizer::IntegerToVector which will cause a failure since the operand is a non-integer type. This fixes PR 30715. llvm-svn: 285231	2016-10-26 19:51:35 +00:00
Nemanja Ivanovic	0f45998bc6	[PowerPC] Implement vec_insert_exp builtins - llvm portion This revision corresponds to review: https://reviews.llvm.org/D25957. Committing on behalf of Zaara Syeda. llvm-svn: 285225	2016-10-26 19:03:40 +00:00
Chad Rosier	96e5e16acb	Fix test from r285217. llvm-svn: 285222	2016-10-26 18:49:16 +00:00
Chad Rosier	0c621fda0d	[AArch64] Avoid materializing constant 1 when generating cneg instructions. Instead of cmp w0, #1 orr w8, wzr, #0x1 cneg w0, w8, ne we now generate cmp w0, #1 csinv w0, w0, wzr, eq PR28965 llvm-svn: 285217	2016-10-26 18:15:32 +00:00
Yaxun Liu	94add85adb	AMDGPU: Refactor processor definition to use ISA version features Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend. Refactor processor definition to use ISA version features. Fixed ISA version for stoney. Based on Laurent Morichetti's patch. Differential Revision: https://reviews.llvm.org/D25919 llvm-svn: 285210	2016-10-26 16:37:56 +00:00
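The computeKnownBits change (D25691, commits 01e755eab1 and 75a697a17e) lets callers say which vector lanes they care about, so the known bits are intersected only across those lanes. The sketch below illustrates the idea on a vector of 8-bit constants. It is a simplified, hypothetical model: `KnownBits8` and `knownBitsOfConstVector` are illustrative names, not LLVM's API, and returning "nothing known" when no lane is demanded is a conservative choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified known-bits record for 8-bit lanes.
struct KnownBits8 {
  uint8_t Zero; // bits known to be 0 in every demanded lane
  uint8_t One;  // bits known to be 1 in every demanded lane
};

// DemandedElts: bit i set means lane i matters to the caller.
KnownBits8 knownBitsOfConstVector(const std::vector<uint8_t> &Elts,
                                  uint64_t DemandedElts) {
  KnownBits8 K{0xFF, 0xFF}; // start fully known, then intersect per lane
  bool AnyDemanded = false;
  for (std::size_t I = 0; I < Elts.size() && I < 64; ++I) {
    if (((DemandedElts >> I) & 1) == 0)
      continue; // skip lanes nobody asked about
    K.Zero &= static_cast<uint8_t>(~Elts[I]);
    K.One &= Elts[I];
    AnyDemanded = true;
  }
  if (!AnyDemanded)
    return {0, 0}; // nothing demanded: conservatively assume nothing is known
  return K;
}
```

For the constant vector `{0x0F, 0xFF, 0x0F, 0x00}`, demanding all four lanes yields no known bits, because lanes 1 and 3 disagree with lanes 0 and 2. Demanding only lanes 0 and 2 (mask `0x5`) makes every bit known: `Zero == 0xF0`, `One == 0x0F`. This is the precision gain the commit message describes for combines that only inspect some elements.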

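The NVPTX change in f0a80ba385 (D26090) rewrites `Num % Den` as `Num - (Num / Den) * Den` when the quotient is already available. The sketch below checks that identity in plain C++, where integer division truncates toward zero, as LLVM's `sdiv`/`srem` do. `remFromQuotient` is a hypothetical helper for illustration, not code from the NVPTX backend. Inputs whose division is undefined, such as a zero divisor or `INT_MIN / -1`, are outside its contract.

```cpp
#include <cassert>
#include <cstdint>

// Derive the remainder from an already-computed quotient:
//   Num % Den == Num - (Num / Den) * Den
// For signed T this relies on truncating division (guaranteed since C++11);
// for unsigned T the arithmetic wraps modulo 2^N and still matches.
template <typename T>
T remFromQuotient(T Num, T Den, T Quot) {
  return Num - Quot * Den;
}
```

The saving comes from the fact that a multiply and a subtract are usually cheaper than a second division-like remainder operation when the quotient has already been computed.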

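The AArch64 change in 0c621fda0d (PR28965) replaces `orr w8, wzr, #0x1; cneg w0, w8, ne` with `csinv w0, w0, wzr, eq` after `cmp w0, #1`. The two sequences agree for the following reasons. On the `eq` path, `w0` is already 1, so it can stand in for the materialized constant. On the `ne` path, `-1` and `~0` (the inverse of `wzr`) are the same 32-bit pattern. The sketch below models the two instructions on 32-bit `W` registers and compares the sequences. The function names are hypothetical, and the models ignore flags other than the single condition being tested.

```cpp
#include <cassert>
#include <cstdint>

// Simplified models of A64 conditional-select forms on 32-bit W registers.
// CNEG Rd, Rn, cond  : Rd = cond ? -Rn : Rn
uint32_t cneg(uint32_t Rn, bool Cond) { return Cond ? (0u - Rn) : Rn; }
// CSINV Rd, Rn, Rm, cond : Rd = cond ? Rn : ~Rm
uint32_t csinv(uint32_t Rn, uint32_t Rm, bool Cond) { return Cond ? Rn : ~Rm; }

// Sequence before the patch.
uint32_t oldSeq(uint32_t W0) {
  bool Eq = (W0 == 1);   // cmp  w0, #1
  uint32_t W8 = 1;       // orr  w8, wzr, #0x1
  return cneg(W8, !Eq);  // cneg w0, w8, ne
}

// Sequence after the patch: no constant materialization.
uint32_t newSeq(uint32_t W0) {
  bool Eq = (W0 == 1);      // cmp   w0, #1
  return csinv(W0, 0, Eq);  // csinv w0, w0, wzr, eq
}
```

Both sequences return 1 when `w0 == 1` and `0xFFFFFFFF` (that is, -1) otherwise. The new form saves one instruction and one scratch register.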
17893 Commits