llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	04840ab752	[X86] Update test case I missed in r294876. llvm-svn: 294878	2017-02-11 23:23:11 +00:00
Craig Topper	1c37e991e6	[X86] Move code for using blendi for insert_subvector out to an isel pattern. This gives the DAG combiner more opportunity to optimize without needing to dig through the blend. llvm-svn: 294876	2017-02-11 22:57:12 +00:00
Simon Pilgrim	755d9127f5	[X86][SSE] Use VSEXT/VZEXT constant folding for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG Preparatory step for PR31712 llvm-svn: 294874	2017-02-11 22:47:06 +00:00
Simon Pilgrim	437d64c49e	[X86][SSE] Improve VSEXT/VZEXT constant folding. Generalize VSEXT/VZEXT constant folding to work with any target constant bits source not just BUILD_VECTOR. llvm-svn: 294873	2017-02-11 21:55:24 +00:00
Amaury Sechet	cafc256fd4	Fix atomic-minmax-i6432.ll. llvm-svn: 294867	2017-02-11 19:34:11 +00:00
Amaury Sechet	42fb927438	Regen expected tests result. NFC llvm-svn: 294866	2017-02-11 19:27:15 +00:00
Sanjay Patel	63499b61c9	[TargetLowering] check for sign-bit comparisons in SimplifyDemandedBits I don't know if anything other than x86 vectors is affected by this change, but this may allow us to remove target-specific intrinsics for blendv* (vector selects). The simplification arises from the fact that blendv* instructions only use the sign-bit when deciding which vector element to choose for the destination vector. The mechanism to fold VSELECT into SHRUNKBLEND nodes already exists in x86 lowering; this demanded bits change just enables the transform to fire more often. The original motivation starts with a bug for DSE of masked stores that seems completely unrelated, but I've explained the likely steps in this series here: https://llvm.org/bugs/show_bug.cgi?id=11210 Differential Revision: https://reviews.llvm.org/D29687 llvm-svn: 294863	2017-02-11 18:01:55 +00:00
Craig Topper	255343483d	[AVX-512] Add VPMINS/MINU/MAXS/MAXU instructions to load folding tables. llvm-svn: 294858	2017-02-11 17:35:28 +00:00
Simon Pilgrim	86a95c1ff7	[X86][3DNow!] Add tests to ensure PFMAX/PFMIN are not commuted. llvm-svn: 294848	2017-02-11 14:01:37 +00:00
Simon Pilgrim	6411a0ebed	[X86][3DNow!] Enable PFSUB<->PFSUBR commutation llvm-svn: 294847	2017-02-11 13:51:14 +00:00
Simon Pilgrim	4ead1d4aa9	[X86][3DNow!] Enable commutation for PFADD/PFMUL/PFCMPEQ/PAVGUSB/PMULHRW All commutations confirmed to give identical results - note PFMAX/PFMIN do not PFSUB<->PFSUBR should be commutable as well llvm-svn: 294846	2017-02-11 13:32:55 +00:00
Simon Pilgrim	6b4a5134af	[X86][3DNow!] Add tests showing missed commutation opportunities. llvm-svn: 294845	2017-02-11 13:00:32 +00:00
Simon Pilgrim	8158816efe	[X86][XOP] Regenerate XOP commutation tests. Added 32-bit tests as well. llvm-svn: 294841	2017-02-11 12:30:59 +00:00
Simon Pilgrim	008ba63e04	[X86][SSE] Regenerate float comparison commutation tests. llvm-svn: 294840	2017-02-11 12:29:56 +00:00
Simon Pilgrim	0d8632f089	[X86] Regenerate CLMUL commutation tests. llvm-svn: 294839	2017-02-11 12:23:22 +00:00
Craig Topper	1f6153bab4	[AVX-512] Add VPINSRB/W/D/Q instructions to load folding tables. llvm-svn: 294830	2017-02-11 07:01:40 +00:00
Craig Topper	3afa777f10	[AVX-512] Add VPSADBW instructions to load folding tables. llvm-svn: 294827	2017-02-11 06:24:03 +00:00
Craig Topper	464b8cb244	[X86] Don't base domain decisions on VEXTRACTF128/VINSERTF128 if only AVX1 is available. Seems the execution dependency pass likes to use FP instructions when most of the consuming code is integer if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available. This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported so that they are ignored by domain fixing. If AVX2 is supported we'll report the correct domain and allow them to switch between integer and fp. Overall I think this produces better results in the modified test cases. llvm-svn: 294824	2017-02-11 05:32:57 +00:00
Wei Mi	8f20e63a20	[LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with outerloop. The recommit includes some changes of testcases. No functional change to the patch. In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser and dropped. Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only handle inner loop now so only %for.body2 will be handled. Using the logic above, formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related with outerloop. Only formula like reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept because the SCEVAddRecExpr related with outerloop is folded into the initial value of the SCEVAddRecExpr related with current loop. But in some cases, we do need to share the basic induction variable reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally. From the existing comment, it tries to avoid considering multiple level loops at the same time. However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other than current loop, it is an invariant and will be simple to handle, and the formula doesn't have to be dropped. Differential Revision: https://reviews.llvm.org/D26429 llvm-svn: 294814	2017-02-11 00:50:23 +00:00
Ahmed Bougacha	2e275e272f	[X86] Bitcast subvector before broadcasting it. Since r274013, we've been looking through bitcasts on broadcast inputs. In the scalar-folding case (from a load, build_vector, or sc2vec), the input type didn't matter, as we'd simply bitcast the resulting scalar back. However, when broadcasting a 128-bit-lane-aligned element, we create an EXTRACT_SUBVECTOR. Use proper types, by creating an extract_subvector of the original input type. llvm-svn: 294774	2017-02-10 19:51:47 +00:00
Simon Pilgrim	39f8da3823	[X86][AVX512] Add vector rotate tests for AVX512 targets AVX512 does have vector rotate instructions, but we don't lower to them yet llvm-svn: 294766	2017-02-10 18:06:11 +00:00
Amaury Sechet	280ad2cebb	Autogenerate results for test/CodeGen/X86/peep-test-4.ll. NFC llvm-svn: 294765	2017-02-10 17:57:48 +00:00
Amaury Sechet	f6308cfe87	Autogenerate results for test/CodeGen/X86/pr14314.ll. NFC llvm-svn: 294764	2017-02-10 17:57:46 +00:00
Amaury Sechet	c8587e4257	Use autogenerate check in CodeGen/X86/pr16031.ll . NFC llvm-svn: 294761	2017-02-10 17:26:21 +00:00
Amaury Sechet	3b87944433	Check full codegen in CodeGen/X86/i256-add.ll NFC llvm-svn: 294756	2017-02-10 16:34:17 +00:00
Simon Pilgrim	a3362a1c9e	[X86][SSE] Added chained FDIV test cases for D26855 Tests to demonstrate throughput-latency decision between div and rcp on faster hardware such as Haswell llvm-svn: 294750	2017-02-10 14:56:12 +00:00
Simon Pilgrim	bfb1747806	[DAGCombine] Allow vector constant folding of any value type before type legalization The patch comes in 2 parts: 1 - it makes use of the SelectionDAG::NewNodesMustHaveLegalTypes flag to tell when it can safely constant fold illegal types. 2 - it correctly resets SelectionDAG::NewNodesMustHaveLegalTypes at the start of each call to SelectionDAGISel::CodeGenAndEmitDAG so all the pre-legalization stages can make use of it - not just the first basic block that gets handled. Fix for PR30760 Differential Revision: https://reviews.llvm.org/D29568 llvm-svn: 294749	2017-02-10 14:37:25 +00:00
Simon Pilgrim	c371159aac	[X86][SSE] Add support for extracting target constants from BUILD_VECTOR In some cases we call getTargetConstantBitsFromNode for nodes that haven't been lowered from BUILD_VECTOR yet Note: We're getting very close to being able to move most of the constant extraction code from getTargetShuffleMaskIndices into getTargetConstantBitsFromNode llvm-svn: 294746	2017-02-10 14:04:11 +00:00
Igor Breger	b4442f34cd	[X86][GlobalISel] Add general-purpose Register Bank Summary: [X86][GlobalISel] Add general-purpose Register Bank. Add trivial handling of G_ADD legalization. Add Register Bank selection for COPY and G_ADD instructions Reviewers: rovka, zvi, ab, t.p.northover, qcolombet Reviewed By: qcolombet Subscribers: qcolombet, mgorny, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29771 llvm-svn: 294723	2017-02-10 07:05:56 +00:00
David L. Jones	e072cf51da	Update test/CodeGen/X86/sse-align-10.ll to use FileCheck instead of grep Patch by Jorge Gorbe (lethalantidote). Differential Revision: https://reviews.llvm.org/D29797 llvm-svn: 294686	2017-02-10 01:35:31 +00:00
Peter Collingbourne	ef089bdb4b	X86: Introduce relocImm-based patterns for cmp. Differential Revision: https://reviews.llvm.org/D28690 llvm-svn: 294636	2017-02-09 22:02:28 +00:00
Peter Collingbourne	d7dd65ad7c	X86: Teach X86InstrInfo::analyzeCompare to recognize compares of symbols. This requires that we communicate to X86InstrInfo::optimizeCompareInstr that the second operand is neither a register nor an immediate. The way we do that is by setting CmpMask to zero. Note that there were already instructions where the second operand was not a register nor an immediate, namely X86::SUB*rm, so also set CmpMask to zero for those instructions. This seems like a latent bug, but I was unable to trigger it. Differential Revision: https://reviews.llvm.org/D28621 llvm-svn: 294634	2017-02-09 21:58:24 +00:00
Simon Pilgrim	b25f60210f	[X86][BMI2] Regenerate mulx tests llvm-svn: 294598	2017-02-09 17:54:51 +00:00
David Bozier	93e773e9be	Revert: "[Stack Protection] Add diagnostic information for why stack protection was applied to a function" this reverts revision r294590 as it broke some buildbots. llvm-svn: 294593	2017-02-09 15:40:14 +00:00
Artur Pilipenko	0e4583b56c	Add DAGCombiner load combine tests for partially available values If some of the trailing or leading bytes of a load combine pattern are zeroes we can combine the pattern to a load + zext and shift. Currently we don't support it, so the tests check the current codegen without load combine. This change will make the patch to support this kind of combine a bit more clear. llvm-svn: 294591	2017-02-09 15:13:40 +00:00
David Bozier	6a44b7c2eb	[Stack Protection] Add diagnostic information for why stack protection was applied to a function Stack Smash Protection is not completely free, so in hot code, the overhead it causes can cause performance issues. By adding diagnostic information for which function have SSP and why, a user can quickly determine what they can do to stop SSP being applied to a specific hot function. This change adds an SSP-specific DiagnosticInfo class and uses of it to the Stack Protection code. A subsequent change to clang will cause the remarks to be emitted when enabled. Patch by: James Henderson Differential Revision: https://reviews.llvm.org/D29023 llvm-svn: 294590	2017-02-09 15:08:40 +00:00
Pierre Gousseau	6953b32475	[X86][btver2] PR31902: Fix a crash in combineOrCmpEqZeroToCtlzSrl under fast math. In combineOrCmpEqZeroToCtlzSrl, replace "getConstantOperand == 0" by "isNullConstant" to account for floating point constants. Differential Revision: https://reviews.llvm.org/D29756 llvm-svn: 294588	2017-02-09 14:43:58 +00:00
Simon Pilgrim	05ac1f70be	[X86][SSE] Added extra FMA/NO-FMA reciprocal test cases for D26855 Test for expected codegen for nr reciprocal cases with/without FMA llvm-svn: 294587	2017-02-09 14:14:06 +00:00
Artur Pilipenko	4a64031954	[DAGCombiner] Support non-zero offset in load combine Enable folding patterns which load the value from non-zero offset: i8 *a = ... i32 val = a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24) => i32 val = *((i32*)(a+4)) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D29394 llvm-svn: 294582	2017-02-09 12:06:01 +00:00
Simon Pilgrim	563e23e66e	[X86][SSE] Attempt to break register dependencies during lowerBuildVector LowerBuildVectorv16i8/LowerBuildVectorv8i16 insert values into a UNDEF vector if the build vector doesn't contain any zero elements, resulting in register dependencies with a previous use of the register. This patch attempts to break the register dependency by either always zeroing the vector before hand or (if we're inserting to the 0'th element) by using VZEXT_MOVL(SCALAR_TO_VECTOR(i32 AEXT(Elt))) which lowers to (V)MOVD and performs a similar function. Additionally (V)MOVD is a shorter instruction than PINSRB/PINSRW. We already do something similar for SSE41 PINSRD. On pre-SSE41 LowerBuildVectorv16i8 we go a little further and use VZEXT_MOVL(SCALAR_TO_VECTOR(i32 ZEXT(Elt))) if the build vector contains zeros to avoid the vector zeroing at the cost of a scalar zero extension, which can probably be brought over to the other cases in a future patch in some cases (load folding etc.) Differential Revision: https://reviews.llvm.org/D29720 llvm-svn: 294581	2017-02-09 11:50:19 +00:00
Igor Breger	ed43f15637	Add new tests for EXTRACT_VECTOR_ELT (vector of packed i8/16/i32/i64/ps/pd data) llvm-svn: 294565	2017-02-09 07:39:19 +00:00
Craig Topper	50f3d1452c	[X86] Clzero intrinsic and its addition under znver1 This patch does the following. 1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero 2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1) 3. Adds the clzero feature under znver1 architecture. 4. The custom inserter is added in Lowering. 5. A testcase is added to check the intrinsic. 6. The clzero instruction is added to assembler test. Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me. Differential revision: https://reviews.llvm.org/D29385 llvm-svn: 294558	2017-02-09 04:27:34 +00:00
Simon Pilgrim	696e27e1ec	[X86][SSE] Regenerate scalar integer conversions to float tests llvm-svn: 294499	2017-02-08 19:01:27 +00:00
Sanjay Patel	28ef27e3dc	[x86] add AVX512vl target for more coverage; NFC llvm-svn: 294462	2017-02-08 15:22:52 +00:00
Craig Topper	3fd463a15a	[X86] Add test for clflushopt intrinsic and only enable it to be selected if the feature flag is set. llvm-svn: 294407	2017-02-08 05:45:46 +00:00
Amaury Sechet	4b946916ac	[DAGCombiner] Push truncate through adde when the carry isn't used. Summary: As per title. Reviewers: mkuper, spatel, bkramer, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29528 llvm-svn: 294394	2017-02-08 00:32:36 +00:00
Simon Pilgrim	39c138cc76	[X86][SSE] Add SSE2 build vector insertion tests llvm-svn: 294365	2017-02-07 22:23:12 +00:00
Simon Pilgrim	90ee0b2786	[X86][SSE] Add additional v4i32/v8i16/v16i8 build vector insertion tests With particular interest in cases where we don't make use of implicit zeroing or fail to break register dependencies llvm-svn: 294363	2017-02-07 22:03:37 +00:00
Hans Wennborg	819e3e02a9	[X86] Disable conditional tail calls (PR31257) They are currently modelled incorrectly (as calls, which clobber registers, confusing e.g. Machine Copy Propagation). Reverting until we figure out the proper solution. llvm-svn: 294348	2017-02-07 20:37:45 +00:00
Sanjoy Das	2f63cbcc0c	[ImplicitNullCheck] Extend Implicit Null Check scope by using stores Summary: This change allows usage of store instruction for implicit null check. Memory Aliasing Analysis is not used and change conservatively supposes that any store and load may access the same memory. As a result re-ordering of store-store, store-load and load-store is prohibited. Patch by Serguei Katkov! Reviewers: reames, sanjoy Reviewed By: sanjoy Subscribers: atrick, llvm-commits Differential Revision: https://reviews.llvm.org/D29400 llvm-svn: 294338	2017-02-07 19:19:49 +00:00
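
The load-combine pattern described in commit 4a64031954 can be checked with a small sketch. This is only an illustration of why the fold preserves the value, assuming a little-endian target; the function names are hypothetical and the code models plain byte arithmetic, not LLVM's DAGCombiner.

```python
def load_bytewise(a: bytes, offset: int) -> int:
    """OR together four bytes starting at `offset`, least significant first.

    Mirrors the source pattern: a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24).
    """
    return (a[offset]
            | (a[offset + 1] << 8)
            | (a[offset + 2] << 16)
            | (a[offset + 3] << 24))


def load_wide(a: bytes, offset: int) -> int:
    """Read the same four bytes as a single little-endian 32-bit load.

    Stands in for the folded form: a 32-bit load from (a + offset).
    """
    return int.from_bytes(a[offset:offset + 4], "little")


# Hypothetical sample buffer: bytes 0x10 through 0x17.
data = bytes(range(0x10, 0x18))

# Both forms yield the same value at the non-zero offset 4.
assert load_bytewise(data, 4) == load_wide(data, 4)
```

On a big-endian target the byte order in the shifted pattern would have to be reversed for the same equivalence to hold.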


9028 Commits