llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	80aa2290fb	[X86] Merge the different Jcc instructions for each condition code into single instructions that store the condition code as an operand. Summary: This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon Reviewed By: RKSimon Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60228 llvm-svn: 357802	2019-04-05 19:28:09 +00:00
Craig Topper	7323c2bf85	[X86] Merge the different SETcc instructions for each condition code into single instructions that store the condition code as an operand. Summary: This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri Reviewed By: andreadb Subscribers: hiraditya, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60138 llvm-svn: 357801	2019-04-05 19:27:49 +00:00
Craig Topper	e0bfeb5f24	[X86] Merge the different CMOV instructions for each condition code into single instructions that store the condition code as an immediate. Summary: Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models. This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between CMOV instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. This does complicate the scheduler models a little since we can't assign the A and BE instructions to a separate class now. I plan to make similar changes for SETcc and Jcc. Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet Reviewed By: RKSimon Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60041 llvm-svn: 357800	2019-04-05 19:27:41 +00:00
Clement Courbet	1d8c9dfe03	[ExpandMemCmp][NFC] Add tests for `memcmp(p, q, n) < 0` case. llvm-svn: 357767	2019-04-05 15:03:25 +00:00
Simon Pilgrim	17586cda4a	[SelectionDAG] Add fcmp UNDEF handling to SelectionDAG::FoldSetCC Second half of PR40800, this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.). This involves a lot of tweaking to reduced tests as bugpoint loves to reduce fcmp arguments to undef........ Differential Revision: https://reviews.llvm.org/D60006 llvm-svn: 357765	2019-04-05 14:56:21 +00:00
Sanjay Patel	50a8652785	[DAGCombiner][x86] scalarize splatted vector FP ops There are a variety of vector patterns that may be profitably reduced to a scalar op when scalar ops are performed using a subset (typically, the first lane) of the vector register file. For x86, this is true for float/double ops and element 0 because insert/extract is just a sub-register rename. Other targets should likely enable the hook in a similar way. Differential Revision: https://reviews.llvm.org/D60150 llvm-svn: 357760	2019-04-05 13:32:17 +00:00
Simon Pilgrim	faa5b939f0	[X86][AVX] Add PR34584 masked store test cases llvm-svn: 357757	2019-04-05 11:34:30 +00:00
Simon Pilgrim	329e63b915	[X86] Add SSE/AVX1/AVX2 masked trunc+store tests llvm-svn: 357756	2019-04-05 11:22:28 +00:00
Piotr Sobczak	0376ac1d94	[SelectionDAG] Compute known bits of CopyFromReg Summary: Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if the virtual reg used has one def only. This can be particularly useful when calling isBaseWithConstantOffset() with the ISD::CopyFromReg argument, as more optimizations may get enabled in the result. Also add a missing truncation on X86, found by testing of this patch. Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa Reviewers: bogner, craig.topper, RKSimon Reviewed By: RKSimon Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59535 llvm-svn: 357745	2019-04-05 07:44:09 +00:00
Craig Topper	94f1772b1e	[X86] Promote i16 SRA instructions to i32 We already promote SRL and SHL to i32. This will introduce sign extends sometimes which might be harder to deal with than the zero we use for promoting SRL. I ran this through some of our internal benchmark lists and didn't see any major regressions. I think there might be some DAG combine improvement opportunities in the test changes here. Differential Revision: https://reviews.llvm.org/D60278 llvm-svn: 357743	2019-04-05 06:32:50 +00:00
Serguei Katkov	c39636cc2c	[FastISel] Fix crash for gc.relocate lowring Lowering safepoint checks that all gc.relocaes observed in safepoint must be lowered. However Fast-Isel is able to skip dead gc.relocate. To resolve this issue we just ignore dead gc.relocate in the check. Reviewers: reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D60184 llvm-svn: 357742	2019-04-05 05:41:08 +00:00
James Y Knight	a040174418	Revert [X86] When using Win64 ABI, exit with error if SSE is disabled for varargs It unnecessarily breaks previously-working code which used varargs, but didn't pass any float/double arguments (such as EDK2). Also revert the fixup on top of that: Revert [X86] Fix a test from r357317 This reverts r357317 (git commit `d413f41de6`) This reverts r357380 (git commit `7af32444b9`) llvm-svn: 357718	2019-04-04 19:05:48 +00:00
Sanjay Patel	17648b848e	[x86] eliminate unnecessary broadcast of horizontal op This is another pattern that comes up if we more aggressively scalarize FP ops. llvm-svn: 357703	2019-04-04 14:46:13 +00:00
Craig Topper	3649c20884	[X86] Use INSERT_SUBREG rather than SUBREG_TO_REG when creating LEA64_32 during isel. SUBREG_TO_REG is supposed to be used to assert that we know the upper bits are zero. But that isn't the case here. We've done no analysis of the inputs. llvm-svn: 357673	2019-04-04 05:00:18 +00:00
Serguei Katkov	fb44846e37	[FastISel] Fix the crash in gc.result lowering The Fast ISel has a fallback to SelectionDAGISel in case it cannot handle the instruction. This works as follows: Using reverse order, try to select instruction using Fast ISel, if it cannot handle instruction it fallbacks to SelectionDAGISel for these instructions if it is a call and continue fast instruction selections. However if unhandled instruction is not a call or statepoint related instruction it fallbacks to SelectionDAGISel for all remaining instructions in basic block. However gc.result instruction is missed and as a result it is possible that gc.result is processed earlier than statepoint causing breakage invariant the gc.results should be handled after statepoint. Test is updated because in the current form fast-isel cannot handle ret instruction (due to i1 ret type without explicit ext) and as a result test does not check fast-isel at all. Reviewers: reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D60182 llvm-svn: 357672	2019-04-04 04:19:56 +00:00
Craig Topper	051bd16faf	[X86] Remove CustomInserters for RDPKRU/WRPKRU. Use some custom lowering and new ISD opcodes instead. These inserters inserted some instructions to zero some registers and copied from virtual registers to physical registers. This change instead inserts the zeros directly into the DAG at lowering time using new ISD opcodes that take the extra zeroes as inputs. The zeros will then go through isel on their own to select the MOV32r0 pseudo. Then we just need to mention the physical registers directly in the isel patterns and the isel table and InstrEmitter will take care of inserting the necessary copies to/from physical registers. llvm-svn: 357659	2019-04-04 00:28:49 +00:00
Craig Topper	52cac4b79f	[X86] Remove CustomInserter pseudos for MONITOR/MONITORX/CLZERO. Use custom instruction selection instead. This custom inserter existed so we could do a weird thing where we pretended that the instructions support a full address mode instead of taking a pointer in EAX/RAX. I think was largely so we could be pointer size agnostic in the isel pattern. To make this work we would then put the address into an LEA into EAX/RAX in front of the instruction after isel. But the LEA is overkill when we just have a base pointer. So we end up using the LEA as a slower MOV instruction. With this change we now just do custom selection during isel instead and just assign the incoming address of the intrinsic into EAX/RAX based on its size. After the intrinsic is selected, we can let isel take care of selecting an LEA or other operation to do any address computation needed in this basic block. I've also split the instruction into a 32-bit mode version and a 64-bit mode version so the implicit use is properly sized based on the pointer. Without this we get comments in the assembly output about killing eax and defing rax or vice versa depending on whether we define the instruction to use EAX/RAX. llvm-svn: 357652	2019-04-03 23:28:30 +00:00
Craig Topper	477008bd50	[X86] Remove dead CHECK lines for a test. NFC llvm-svn: 357651	2019-04-03 23:28:18 +00:00
Craig Topper	437b45a1f8	[X86] Autogenerate checks. NFC llvm-svn: 357650	2019-04-03 23:28:11 +00:00
Sanjay Patel	c9a012e4ea	[x86] fold shuffles of h-ops that have an undef operand If an operand is undef, we can assume it's the same as the other operand. llvm-svn: 357644	2019-04-03 22:40:35 +00:00
Sanjay Patel	61b5e3c6a9	[x86] eliminate movddup of horizontal op This pattern would show up as a regression if we more aggressively convert vector FP ops to scalar ops. There's still a missed optimization for the v4f64 legal case (AVX) because we create that h-op with an undef operand. We should probably just duplicate the operands for that pattern to avoid trouble. llvm-svn: 357642	2019-04-03 22:15:29 +00:00
Sanjay Patel	0b874c7c60	[x86] add another test for disguised h-op; NFC llvm-svn: 357636	2019-04-03 21:10:55 +00:00
Sanjay Patel	8c9ceecdc6	[x86] add test for disguised horizontal op; NFC llvm-svn: 357630	2019-04-03 20:34:22 +00:00
Krzysztof Parzyszek	4841643a1d	[X86] Extend boolean arguments to inline-asm according to getBooleanType Differential Revision: https://reviews.llvm.org/D60208 llvm-svn: 357615	2019-04-03 17:43:14 +00:00
Simon Pilgrim	15919ad306	[X86][AVX] combineHorizontalPredicateResult - split any/allof v16i16/v32i8 reduction on AVX1 Perform the 2 x 128-bit lo/hi OR/AND on the vectors before calling PMOVMSKB on the 128-bit result. llvm-svn: 357611	2019-04-03 17:28:34 +00:00
Simon Pilgrim	9e28dddf55	[X86][AVX] combineHorizontalPredicateResult - support v16i16/v32i8 reduction on AVX1 Use getPMOVMSKB helper which splits v32i8 MOVMSK calls on pre-AVX2 targets. llvm-svn: 357608	2019-04-03 17:17:13 +00:00
Sanjay Patel	8055034666	[x86] make stack folding tests immune to unrelated transforms; NFC llvm-svn: 357604	2019-04-03 16:33:24 +00:00
Sanjay Patel	281cf28329	[x86] remove duplicate tests Accidentally double-committed these. llvm-svn: 357593	2019-04-03 14:45:45 +00:00
Sanjay Patel	393458f3ed	[x86] add negative tests for FP scalarization; NFC These go with the proposal in D60150. llvm-svn: 357592	2019-04-03 14:41:28 +00:00
Sanjay Patel	04848090cd	[x86] add tests with constants for FP scalarization; NFC llvm-svn: 357591	2019-04-03 14:41:24 +00:00
Sanjay Patel	eb5ffc7842	[x86] add tests with constants for FP scalarization; NFC llvm-svn: 357587	2019-04-03 14:36:47 +00:00
Sanjay Patel	00dae6b22d	[DAGCombiner] loosen restrictions for moving shuffles after vector binop There are 3 changes to make this correspond to the same transform in instcombine: 1. Remove the legality check - we can't create anything less legal than we started with. 2. Ease the use restriction, so we only bail out if both operands have >1 use. 3. Ease the use restriction for binops with a repeated operand (eg, mul x, x). As discussed in D60150, there's a scalarization opportunity that will be made easier by allowing this transform more generally. llvm-svn: 357580	2019-04-03 13:42:06 +00:00
Simon Pilgrim	143279e61f	[X86] Regenerate LEA codegen tests llvm-svn: 357573	2019-04-03 12:33:16 +00:00
Clement Courbet	26a8ed3ac9	[X86] Make the post machine scheduler macrofusion-aware. Summary: Given that X86 does not use this currently, this is an NFC. I'll experiment with enabling and will report numbers. Reviewers: andreadb, lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60185 llvm-svn: 357568	2019-04-03 09:37:30 +00:00
Clement Courbet	5bfa946d69	[X86][NFC] Add tests for misched macro-fusion. llvm-svn: 357565	2019-04-03 08:21:54 +00:00
Hans Wennborg	94b867dc7c	Revert r357256 "[DAGCombine] Improve Lifetime node chains." As it caused a pathological compile-time regressionin V8, see PR41352. > Improve both start and end lifetime nodes chain dependencies. > > Reviewers: courbet > > Reviewed By: courbet > > Subscribers: hiraditya, llvm-commits > > Tags: #llvm > > Differential Revision: https://reviews.llvm.org/D59795 This also reverts the follow-up r357309: > [DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop. > > Avoid EXPENSIVE_CHECK failure. NFCI. llvm-svn: 357563	2019-04-03 07:41:58 +00:00
Craig Topper	16683a3ef8	[X86] Update the test case for v4i1 bitselect in combine-bitselect.ll to not have an infinite loop in IR. In fact we don't even need a loop at all. I backed out the bug fix this was testing for and verified that this new case hit the same issue. This should stop D59626 from deleting some of this code by realizing it was dead due to the loop. llvm-svn: 357544	2019-04-03 00:05:03 +00:00
Craig Topper	ca9eb68541	[X86] Autogenerate complete checks. NFC llvm-svn: 357543	2019-04-03 00:04:57 +00:00
Stanislav Mekhanoshin	ea2e227926	X86: regenerate speculative-load-hardening-indirect.ll tests. NFC. llvm-svn: 357537	2019-04-02 22:44:46 +00:00
Sanjay Patel	8e6d41aeb2	[x86] add more tests for FP scalarization; NFC llvm-svn: 357523	2019-04-02 20:24:06 +00:00
Craig Topper	0d3a533270	[X86] Allow FixupLEAs to form INC/DEC under OptSize not just MinSize This matches our usual INC/DEC heuristic used during isel. llvm-svn: 357497	2019-04-02 17:13:03 +00:00
Simon Pilgrim	64bd87ad4b	[X86][AVX] Add test case showing failure to fold broadcast load if its also used as a scalar llvm-svn: 357465	2019-04-02 10:31:00 +00:00
Craig Topper	536383a354	[X86] Add test cases to fixup-lea.ll for optsize and no size optimization. Add +/-slow-incdec command lines We only form inc/dec in FixupLEAs under minsize today, but all other locations in the compiler for inc/dec with optsize. llvm-svn: 357446	2019-04-02 00:54:22 +00:00
Craig Topper	c133015975	[X86] Autogenerate complete checks. NFC llvm-svn: 357445	2019-04-02 00:54:15 +00:00
Clement Courbet	7e062c9b1f	[X86] Make post-ra scheduling macrofusion-aware. Subscribers: MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59688 llvm-svn: 357384	2019-04-01 13:48:50 +00:00
Clement Courbet	d9f6ee1c3c	[X86MacroFusion][NFC] Add more tests. In preparation for D59688. llvm-svn: 357381	2019-04-01 13:18:34 +00:00
Krasimir Georgiev	7af32444b9	[X86] Fix a test from r357317 Summary: The missing `<` causes the lld command to override the test file, which fails in environments marking the test files as readonly. Reviewers: bkramer Reviewed By: bkramer Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60060 llvm-svn: 357380	2019-04-01 11:42:54 +00:00
Simon Pilgrim	e8c3136994	[X86][SSE] Add fcmp constant folding tests Initial test coverage for D60006 llvm-svn: 357379	2019-04-01 10:54:04 +00:00
Sanjay Patel	e1bc360fc6	[x86] allow movmsk with 2-element reductions One motivation for making this change is that the lack of using movmsk is likely a main source of perf difference between clang and gcc on the C-Ray benchmark as shown here: https://www.phoronix.com/scan.php?page=article&item=gcc-clang-2019&num=5 ...but this change alone isn't enough to solve that problem. The 'all-of' examples show what is likely the worst case trade-off: we end up with an extra instruction (or 2 if we count the 'xor' register clearing). The 'any-of' examples look clearly better using movmsk because we've traded 2 vector instructions for 2 scalar instructions, and movmsk may have better timing than the generic 'movq'. If we examine the llvm-mca output for these cases, it appears that even though the 'all-of' movmsk variant looks worse on paper, it would perform better on both Haswell and Jaguar. $ llvm-mca -mcpu=haswell no_movmsk.s -timeline Iterations: 100 Instructions: 400 Total Cycles: 504 Total uOps: 400 Dispatch Width: 4 uOps Per Cycle: 0.79 IPC: 0.79 Block RThroughput: 1.0 $ llvm-mca -mcpu=haswell movmsk.s -timeline Iterations: 100 Instructions: 600 Total Cycles: 358 Total uOps: 600 Dispatch Width: 4 uOps Per Cycle: 1.68 IPC: 1.68 Block RThroughput: 1.5 $ llvm-mca -mcpu=btver2 no_movmsk.s -timeline Iterations: 100 Instructions: 400 Total Cycles: 407 Total uOps: 400 Dispatch Width: 2 uOps Per Cycle: 0.98 IPC: 0.98 Block RThroughput: 2.0 $ llvm-mca -mcpu=btver2 movmsk.s -timeline Iterations: 100 Instructions: 600 Total Cycles: 311 Total uOps: 600 Dispatch Width: 2 uOps Per Cycle: 1.93 IPC: 1.93 Block RThroughput: 3.0 Finally, there may be CPUs where movmsk is horribly slow (old AMD small cores?), but if that's true, then we're also almost certainly making the wrong transform already for reductions with >2 elements, so that should be fixed independently. Differential Revision: https://reviews.llvm.org/D59997 llvm-svn: 357367	2019-03-31 15:11:34 +00:00
Craig Topper	e4a0fc7d75	[X86] Teach isel for RMW binops to handle negate Negate updates flags like a subtract. We should be able to use the flags from the RMW form of negate when we have (store (X86ISD::SUB 0, load A), A) Differential Revision: https://reviews.llvm.org/D60007 llvm-svn: 357353	2019-03-30 18:59:17 +00:00
Simon Pilgrim	10c9032c02	[X86][SSE] detectAVGPattern - Match zext(or(x,y)) 'add like' patterns (PR41316) Fixes PR41316 where the expanded PAVG intrinsic had had one of its ADDs turned into an OR due to its operands having no conflicting bits. llvm-svn: 357351	2019-03-30 17:12:29 +00:00
Simon Pilgrim	cfdf09ba7d	[X86][SSE] Add PAVG test case from PR41316 llvm-svn: 357346	2019-03-30 13:53:11 +00:00
Amara Emerson	d413f41de6	[X86] When using Win64 ABI, exit with error if SSE is disabled for varargs We need XMM registers to handle varargs with the Win64 ABI. Before we would silently generate bad code resulting in an assertion failure elsewhere in the backend. llvm-svn: 357317	2019-03-29 21:30:51 +00:00
Craig Topper	103fbbbfca	[X86] Add test cases showing failure to use RMW form of negate when only flags are used. NFC llvm-svn: 357300	2019-03-29 19:09:37 +00:00
Nirav Dave	fe59e14031	[DAGCombine] Prune unnused nodes. Summary: Nodes that have no uses are eventually pruned when they are selected from the worklist. Record nodes newly added to the worklist or DAG and perform pruning after every combine attempt. Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight Reviewed By: jyknight Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58070 llvm-svn: 357283	2019-03-29 17:35:56 +00:00
Simon Pilgrim	4e00a93558	[X86] Fix some tests using fcmp with undef arguments Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC) llvm-svn: 357278	2019-03-29 17:20:27 +00:00
Sanjay Patel	12685d0f7c	[DAGCombiner] simplify shuffle of shuffle After investigating the examples from D59777 targeting an SSE4.1 machine, it looks like a very different problem due to how we map illegal types (256-bit in these cases). We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand. We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that generality means it is limited to patterns with a one-use constraint, and the examples here have 2 uses. We don't need any uses or legality limitations for a simplification (no new value is created). It looks like we miss this pattern in IR too. In one of the zext examples here, we have shuffle masks like this: Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7> Shuf = vector_shuffle<4,u,6,7,u,u,u,u> ...so that's moving the high half of the 1st vector into the low half. But the high half of the 1st vector is already identical to the low half. Differential Revision: https://reviews.llvm.org/D59961 llvm-svn: 357258	2019-03-29 14:20:38 +00:00
Nirav Dave	9259de217e	[DAGCombine] Improve Lifetime node chains. Improve both start and end lifetime nodes chain dependencies. Reviewers: courbet Reviewed By: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59795 llvm-svn: 357256	2019-03-29 14:09:47 +00:00
Sanjay Patel	665a385035	[DAGCombiner] fold sext into decrement This is a sibling to rL357178 that I noticed we'd hit if we chose an alternate transform in D59818. %z = zext i8 %x to i32 %dec = add i32 %z, -1 %r = sext i32 %dec to i64 => %z2 = zext i8 %x to i64 %r = add i64 %z2, -1 https://rise4fun.com/Alive/kPP The x86 vector diffs show a slight regression, so there's a chance that we should limit this and the previous transform to scalars. But given that we allowed vectors before, I'm matching that behavior here. We should change both transforms together if that's the right thing to do. llvm-svn: 357254	2019-03-29 13:49:08 +00:00
Hans Wennborg	800b12f90a	Switch lowering: exploit unreachable fall-through when lowering case range cluster In the example below, we would previously emit two range checks, one for cases 1--3 and one for 4--6. This patch makes us exploit the fact that the fall-through is unreachable and only one range check is necessary. switch i32 %i, label %default [ i32 1, label %bb1 i32 2, label %bb1 i32 3, label %bb1 i32 4, label %bb2 i32 5, label %bb2 i32 6, label %bb2 ] default: unreachable llvm-svn: 357252	2019-03-29 13:40:05 +00:00
Sanjay Patel	881bcbe094	[x86] add tests for decrement+sext; NFC llvm-svn: 357251	2019-03-29 13:34:48 +00:00
Simon Pilgrim	aeaf7fcdde	[X86] Add X86TargetLowering::isCommutativeBinOp override. We currently just have test coverage for PMULUDQ - will add more in the future. llvm-svn: 357244	2019-03-29 11:25:58 +00:00
Craig Topper	c25c9b4d16	[X86] Teach the isel optimization for (x << C1) op C2 to (x op (C2>>C1)) << C1 to consider cases where C2>>C1 can fit an unsigned 32-bit immediate For 64-bit operations we should consider if the immediate can be made to fit in an unsigned 32-bits immedate. For OR/XOR this allows us to load the immediate with MOV32ri instead of movabsq. For AND this allows us to fold the immediate. Differential Revision: https://reviews.llvm.org/D59867 llvm-svn: 357196	2019-03-28 18:05:37 +00:00
Sanjay Patel	ffa8d3def7	[DAGCombiner] fold sext into negation As noted in D59818: %z = zext i8 %x to i32 %neg = sub i32 0, %z %r = sext i32 %neg to i64 => %z2 = zext i8 %x to i64 %r = sub i64 0, %z2 https://rise4fun.com/Alive/KzSR llvm-svn: 357178	2019-03-28 15:46:02 +00:00
Sanjay Patel	e781528278	[x86] add vector test for sext of negate; NFC llvm-svn: 357177	2019-03-28 15:30:09 +00:00
Sanjay Patel	5bbf6f0bd8	[x86] avoid cmov in movmsk reduction This is probably the least important of our movmsk problems, but I'm starting at the bottom to reduce distractions. We were creating a select_cc which bypasses the select and bitmask codegen optimizations that we have now. If we produce a compare+negate instead, we allow things like neg/sbb carry bit hacks, and in all cases we avoid a cmov. There's no partial register update danger in these sequences because we always produce the zero-register xor ahead of the 'set' if needed. There seems to be a missing fold for sext of a bool bit here: negl %ecx movslq %ecx, %rax ...but that's an independent transform. Differential Revision: https://reviews.llvm.org/D59818 llvm-svn: 357172	2019-03-28 14:16:13 +00:00
Clement Courbet	699dc025a6	[X86MacroFusion] Handle branch fusion (AMD CPUs). Summary: This adds a BranchFusion feature to replace the usage of the MacroFusion for AMD CPUs. See D59688 for context. Reviewers: andreadb, lebedev.ri Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59872 llvm-svn: 357171	2019-03-28 14:12:46 +00:00
Simon Pilgrim	38a0616c1d	[DAGCombiner] Fold truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y)) If scalar truncates are free, attempt to pre-truncate build_vectors source operands. Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering. Differential Revision: https://reviews.llvm.org/D59654 llvm-svn: 357161	2019-03-28 11:34:21 +00:00
Simon Pilgrim	22be913ac0	[X85][AVX] Add missing vXi16 broadcast fold patterns Now that D59484 has landed its easier to add these. Added missing AVX512BW v32i16 equivalents while I was at it. llvm-svn: 357155	2019-03-28 10:25:13 +00:00
Craig Topper	929932954d	[X86] Add test cases from PR27202. llvm-svn: 357132	2019-03-27 23:12:19 +00:00
Sanjay Patel	1df0bb6264	[x86] improve AVX lowering of vector zext If we know the 2 halves of an oversized zext-in-reg are the same, don't create those halves independently. I tried several different approaches to fold this, but it's difficult to get right during legalization. In the default path, we are creating a generic shuffle that looks like an unpack high, but it can get transformed into a different mask (a blend), so it's not straightforward to match that. If we try to fold after it actually becomes an X86ISD::UNPCKH node, we can't be sure what the operand node is - it might be a generic shuffle, or it could be some x86-specific op. From the test output, we should be doing something like this for SSE4.1 as well, but I'd rather leave that as a follow-up since it involves changing lowering actions. Differential Revision: https://reviews.llvm.org/D59777 llvm-svn: 357129	2019-03-27 22:42:11 +00:00
Daniel Sanders	495156dc6a	test/CodeGen/X86/codegen-prepare-replacephi.mir requires a default triple llvm-svn: 357122	2019-03-27 20:43:47 +00:00
Nirav Dave	6b741a8038	[DAGCombiner] Teach TokenFactor pruning to peek through lifetime nodes Summary: Lifetime nodes were inhibiting TokenFactor simplification inhibiting chain-based optimizations. Reviewers: courbet, jyknight Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59897 llvm-svn: 357121	2019-03-27 20:37:08 +00:00
Justin Bogner	b1650f0da9	[LegalizeVectorTypes] Allow single loads and stores for more short vectors When lowering a load or store for TypeWidenVector, the type legalizer would use a single load or store if the associated integer type was legal or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable. (See https://reviews.llvm.org/rL236528 for reference.) This applies that behaviour to vector types. If the vector type is TypePromoteInteger, the element type is going to be TypePromoteInteger as well, which will lead to have a single promoting load rather than N individual promoting loads. For instance, if we have a v3i1, we would now have a load of v4i1 instead of 3 loads of i1. Patch by Guillaume Marques. Thanks! Differential Revision: https://reviews.llvm.org/D56201 llvm-svn: 357120	2019-03-27 20:35:56 +00:00
Nirav Dave	c6dfaa0e83	Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes." This patch appears to trigger very large compile time increases in halide builds. llvm-svn: 357116	2019-03-27 19:54:41 +00:00
Craig Topper	7c9afc35bc	[X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRD Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cyc latency and 0.5 cycle reciprocal throughput for any SHLD. When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization. This patch adds psuedo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work. Fixes PR41055 Differential Revision: https://reviews.llvm.org/D59391 llvm-svn: 357096	2019-03-27 17:29:34 +00:00
Clement Courbet	678d128b5a	[X86MacroFusion][NFC] Improve macrofusion testing. Add negative tests. Add arithmetic/inc/cmp/and macrofusion tests. llvm-svn: 357076	2019-03-27 15:43:03 +00:00
Hans Wennborg	5c0d7a24e8	Re-commit r355490 "[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default" Original commit by Ayonam Ray. This commit adds a regression test for the issue discovered in the previous commit: that the range check for the jump table can only be omitted if the fall-through destination of the jump table is unreachable, which isn't necessarily true just because the default of the switch is unreachable. This addresses the missing optimization in PR41242. > During the lowering of a switch that would result in the generation of a > jump table, a range check is performed before indexing into the jump > table, for the switch value being outside the jump table range and a > conditional branch is inserted to jump to the default block. In case the > default block is unreachable, this conditional jump can be omitted. This > patch implements omitting this conditional branch for unreachable > defaults. > > Differential Revision: https://reviews.llvm.org/D52002 > Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev llvm-svn: 357067	2019-03-27 14:10:11 +00:00
Simon Pilgrim	d6f9baf74f	[X86][SSE] Add shuffle test case for PR41249 llvm-svn: 357062	2019-03-27 11:21:09 +00:00
Simon Pilgrim	ccb71b2985	Revert rL356864 : [X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure. This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain. ........ Causes PR41249 llvm-svn: 357057	2019-03-27 10:25:02 +00:00
Craig Topper	feadc2a1de	[X86] Add test cases for missed opportunities in (x << C1) op C2 to (x op (C2>>C1)) << C1 transform. We handle the case where the C2 does not fit in a signed 32-bit immediate, but (C2>>C1) does. But there's also some 64-bit opportunities when C2 is not an unsigned 32-bit immediate, but (C2>>C1) is. For OR/XOR this allows us to load the immediate with with MOV32ri instead of a movabsq. For AND it allows us to use a 32-bit AND and fold the immediate. llvm-svn: 357050	2019-03-27 06:07:05 +00:00
Craig Topper	7da7b97487	[X86] When iselling (x << C1) and/or/xor C2 as (x and/or/xor (C2>>C1)) << C1, go through the isel table instead of manually selecting. Previously we manually selected the AND/OR/XOR with immediate and the SHL(or ADD if the shift is 1). But this was missing out on the opportunity to use a 64 bit AND with a 32-bit immediate and possibly other isel tricks we have built into the tables. Instead, insert the new nodes into the DAG using insertDAGNode and allow them each to be selected through the normal table. llvm-svn: 357049	2019-03-27 04:45:58 +00:00
Craig Topper	06cdd7e488	[X86] Autogenerate complete checks. NFC llvm-svn: 357046	2019-03-27 02:18:41 +00:00
Francis Visoiu Mistrih	ee1a6e70fa	[Remarks] Emit a section containing remark diagnostics metadata A section containing metadata on remark diagnostics will be emitted if the flag (-mllvm) -remarks-section is present. For now, the metadata is: * a magic number for remarks: "REMARKS\0" * the version number: a little-endian uint64_t * the absolute file path to the serialized remark diagnostics: a null-terminated string. Differential Revision: https://reviews.llvm.org/D59571 llvm-svn: 357043	2019-03-27 01:13:59 +00:00
Sanjay Patel	bb5cba3cca	[SDAG] add simplifications for FP at node creation time We have the folds for fadd/fsub/fmul already in DAGCombiner, so it may be possible to remove that code if we can guarantee that these ops are zapped before they can exist. llvm-svn: 357029	2019-03-26 20:54:15 +00:00
Nirav Dave	a28c514581	[DAG] Avoid smart constructor-based dangling nodes. Various SelectionDAG non-combine operations (e.g. the getNode smart constructor and legalization) may leave dangling nodes by applying optimizations or not fully pruning unused result values. This can result in nodes that are never added to the worklist and therefore can not be pruned. Add a node inserter as the current node deleter to make sure such nodes have the chance of being pruned. Many minor changes, mostly positive. llvm-svn: 356996	2019-03-26 15:08:14 +00:00
Simon Pilgrim	e24441aab0	[TargetLowering] Add SimplifyDemandedBits support for ISD::INSERT_VECTOR_ELT This helps us relax the extension of a lot of scalar elements before they are inserted into a vector. Its exposes an issue in DAGCombiner::convertBuildVecZextToZext as some/all the zero-extensions may be relaxed to ANY_EXTEND, so we need to handle that case to avoid a couple of AVX2 VPMOVZX test regressions. Once this is in it should be easier to fix a number of remaining failures to fold loads into VBROADCAST nodes. Differential Revision: https://reviews.llvm.org/D59484 llvm-svn: 356989	2019-03-26 12:32:01 +00:00
Sanjay Patel	9bcb0766eb	[x86] add tests for vector cmps; NFC llvm-svn: 356959	2019-03-25 22:08:45 +00:00
Simon Pilgrim	167af1bafb	[SelectionDAG] Add icmp UNDEF handling to SelectionDAG::FoldSetCC First half of PR40800, this patch adds DAG undef handling to icmp instructions to match the behaviour in llvm::ConstantFoldCompareInstruction and SimplifyICmpInst, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.). This involved a lot of tweaking to reduced tests as bugpoint loves to reduce icmp arguments to undef........ Differential Revision: https://reviews.llvm.org/D59363 llvm-svn: 356938	2019-03-25 18:51:57 +00:00
Sanjay Patel	f49e33e252	[x86] add another vector zext test; NFC Goes with the proposal in D59777 llvm-svn: 356930	2019-03-25 17:53:56 +00:00
Sanjay Patel	76c1ef3d07	[x86] add tests for vector zext; NFC The AVX1 lowering is poor. llvm-svn: 356914	2019-03-25 15:54:34 +00:00
Jonas Paulsson	0e75e21eb3	[RegAlloc] Simplify MIR test Remove the IR part from test/CodeGen/X86/regalloc-copy-hints.mir (added by r355854). To make the test remain functional, the parts of the MBB names referring to BB names have been removed, as well as all machine memory operands. llvm-svn: 356899	2019-03-25 14:28:32 +00:00
Craig Topper	7c2554dd92	Revert r356688 "[X86] Don't avoid folding multiple use sign extended 8-bit immediate into instructions under optsize." Looking back over how the one use optimization works, I don't think this is the right way to fix this. llvm-svn: 356866	2019-03-25 01:25:32 +00:00
Simon Pilgrim	87d4ab8b92	[X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure. This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain. llvm-svn: 356864	2019-03-24 19:06:35 +00:00
Simon Pilgrim	4465a765ee	[X86] Remove icmp undef from reduced tests Pre-commit for D59363 (Add icmp UNDEF handling to SelectionDAG::FoldSetCC) Approved by @spatel (Sanjay Patel) llvm-svn: 356859	2019-03-24 17:02:08 +00:00
Simon Pilgrim	a71c0ed471	[X86][AVX] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) Just enable this for AVX for now as SSE41 introduces extra register moves for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern (but otherwise helps reduce port5 usage on Intel targets). Only AVX support is required for PR40685 as the issue is due to 8i8->8i32 zext shuffle leftovers. llvm-svn: 356858	2019-03-24 16:30:35 +00:00
Sanjay Patel	7d676dfd86	[x86] improve the default expansion of uaddsat/usubsat This is yet another step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 uaddsat X, Y --> (X >u (X + Y)) ? -1 : X + Y usubsat X, Y --> (X >u Y) ? X - Y : 0 We can't count on a sane vector ISA, so override the default (umin/umax) expansion of unsigned add/sub saturate in cases where we do not have umin/umax. Differential Revision: https://reviews.llvm.org/D59006 llvm-svn: 356855	2019-03-24 13:55:54 +00:00
Craig Topper	ce1ed55a4a	[X86] Use xmm registers to implement 64-bit popcnt on 32-bit targets if possible if popcnt instruction is not available On 32-bit targets without popcnt, we currently expand 64-bit popcnt to sequences of arithmetic and logic ops for each 32-bit half and then add the 32 bit halves together. If we have xmm registers we can use use those to implement the operation instead. This results in less instructions then doing two separate 32-bit popcnt sequences. This mitigates some of PR41151 for the i64 on i686 case when we have SSE2. Differential Revision: https://reviews.llvm.org/D59662 llvm-svn: 356808	2019-03-22 20:47:02 +00:00
Craig Topper	1ffd8e8114	[X86] Use movq for i64 atomic load on 32-bit targets when sse2 is enable We used a lock cmpxchg8b to do i64 atomic loads. But if we have SSE2 we can do better and use a plain movq to do the load instead. I tried to just use an f64 atomic load and add isel patterns to MOVSD(which the domain fixing pass can turn to MOVQ), but the atomic_load SDNode in TargetSelectionDAG.td requires the type to be integer. So I've emitted VZEXT_LOAD instead which should be selected by isel to a MOVQ. Hopefully we don't need a specific atomic flavor of this. I kept the memory operand from the original AtomicSDNode. I wasn't sure if I might need to set the MOVolatile flag? I've left some FIXMEs for improvements we can do without SSE2. Differential Revision: https://reviews.llvm.org/D59679 llvm-svn: 356807	2019-03-22 20:46:56 +00:00
James Y Knight	c0e6b8ac3a	IR: Support parsing numeric block ids, and emit them in textual output. Just as as llvm IR supports explicitly specifying numeric value ids for instructions, and emits them by default in textual output, now do the same for blocks. This is a slightly incompatible change in the textual IR format. Previously, llvm would parse numeric labels as string names. E.g. define void @f() { br label %"55" 55: ret void } defined a label named "55", even without needing to be quoted, while the reference required quoting. Now, if you intend a block label which looks like a value number to be a name, you must quote it in the definition too (e.g. `"55":`). Previously, llvm would print nameless blocks only as a comment, and would omit it if there was no predecessor. This could cause confusion for readers of the IR, just as unnamed instructions did prior to the addition of "%5 = " syntax, back in 2008 (PR2480). Now, it will always print a label for an unnamed block, with the exception of the entry block. (IMO it may be better to print it for the entry-block as well. However, that requires updating many more tests.) Thus, the following is supported, and is the canonical printing: define i32 @f(i32, i32) { %3 = add i32 %0, %1 br label %4 4: ret i32 %3 } New test cases covering this behavior are added, and other tests updated as required. Differential Revision: https://reviews.llvm.org/D58548 llvm-svn: 356789	2019-03-22 18:27:13 +00:00
Simon Pilgrim	aea9db9d40	[X86] Regenerate powi tests to include i686 x87/sse targets llvm-svn: 356787	2019-03-22 18:04:28 +00:00
Simon Pilgrim	08380afaab	[X86] Add PR13897 test case (i128 mul on i686) llvm-svn: 356786	2019-03-22 17:52:21 +00:00
Simon Pilgrim	564392d752	[X86] lowerShuffleAsBitMask - ensure float bit masks are the correct width (PR41203) llvm-svn: 356784	2019-03-22 17:23:55 +00:00
Sanjay Patel	221081e365	[x86] auto-generate complete test checks; NFC llvm-svn: 356763	2019-03-22 15:33:59 +00:00
Sanjay Patel	0893351c1c	[x86] auto-generate complete test checks; NFC llvm-svn: 356762	2019-03-22 15:33:55 +00:00
Sanjay Patel	61e2333acb	[x86] add 'nounwind' to tests to reduce noise; NFC llvm-svn: 356761	2019-03-22 15:33:51 +00:00
Sanjay Patel	f39494e795	[x86] auto-generate complete checks for test; NFC llvm-svn: 356760	2019-03-22 15:33:47 +00:00
Craig Topper	b865084ef3	[X86] Add 32-bit command lines with and without SSE2 to atomic-non-integer.ll. NFC llvm-svn: 356733	2019-03-22 04:28:40 +00:00
Craig Topper	056b9a995b	[X86] Autogenerate complete checks. NFC llvm-svn: 356723	2019-03-21 23:09:56 +00:00
Simon Pilgrim	c2e4405475	[X86] canonicalizeBitSelect - don't attempt to canonicalize mask registers We don't use X86ISD::ANDNP for mask registers. Test case from @craig.topper (Craig Topper) llvm-svn: 356696	2019-03-21 18:32:38 +00:00
Sanjay Patel	0760758fed	[x86] add tests with movmsk potential (PR39665); NFC llvm-svn: 356691	2019-03-21 17:57:56 +00:00
Craig Topper	c14f3e4222	[X86] Don't avoid folding multiple use sign extended 8-bit immediate into instructions under optsize. Under optsize we try to avoid folding immediates into instructions under optsize. But if the immediate is 16-bits or 32 bits, but can be encoded as an 8-bit immediate we don't save enough from disabling the folding unless the immediate has enough uses to make up for the size of the move which is either 3 bytes or 5 bytes since there are no sign extended 8-bit moves. We would also save something if the immediate was a live out of the basic block and thus a move was unavoidable, but that would require a more advanced heuristic than just counting uses. Note we only avoid folding multiple use immediates into the patterns that use X86ISD::ADD/SUB/XOR/OR/AND/CMP/ADC/SBB nodes and not the more common ISD::ADD/SUB/XOR/OR/AND nodes. Differential Revision: https://reviews.llvm.org/D59522 llvm-svn: 356688	2019-03-21 17:38:58 +00:00
Craig Topper	9f0b17a248	[ScalarizeMaskedMemIntrin] Add support for scalarizing expandload and compressstore intrinsics. This adds support for scalarizing these intrinsics as well the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle. I've omitted handling special cases for constant masks for this first pass. Though CodeGenPrepare can constant fold the branch conditions and remove some of the control flow anyway. Fixes PR40994 and is covers most of PR3666. Might want to implement constant masks to close that. Differential Revision: https://reviews.llvm.org/D59180 llvm-svn: 356687	2019-03-21 17:38:52 +00:00
Krzysztof Parzyszek	4719502941	Add more rotate tests, including ORs of rotates This is a part of https://reviews.llvm.org/D47735. llvm-svn: 356683	2019-03-21 17:14:22 +00:00
Simon Pilgrim	da4992bf8d	[DAGCombine] SimplifySelectCC - call FoldSetCC with the setcc result type We were calling FoldSetCC with the compare operand type instead of the result type. Found by OSS-Fuzz #13838 (https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13838) llvm-svn: 356667	2019-03-21 14:07:18 +00:00
Sanjay Patel	d47eac59ef	[CodeGenPrepare] limit formation of overflow intrinsics (PR41129) This is probably a bigger limitation than necessary, but since we don't have any evidence yet that this transform led to real-world perf improvements rather than regressions, I'm making a quick, blunt fix. In the motivating x86 example from: https://bugs.llvm.org/show_bug.cgi?id=41129 ...and shown in the regression test, we want to avoid an extra instruction in the dominating block because that could be costly. The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version is any better than the other yet. Differential Revision: https://reviews.llvm.org/D59602 llvm-svn: 356665	2019-03-21 13:57:07 +00:00
Craig Topper	8d46403b8e	[X86] Add CMPXCHG8B feature flag. Set it for all CPUs except i386/i486 including 'generic'. Disable use of CMPXCHG8B when this flag isn't set. CMPXCHG8B was introduced on i586/pentium generation. If its not enabled, limit the atomic width to 32 bits so the AtomicExpandPass will expand to lib calls. Unclear if we should be using a different limit for other configs. The default is 1024 and experimentation shows that using an i256 atomic will cause a crash in SelectionDAG. Differential Revision: https://reviews.llvm.org/D59576 llvm-svn: 356631	2019-03-20 23:35:49 +00:00
Craig Topper	0367553304	[X86] Call lowerShuffleAsBitMask for 512-bit vectors in lowerShuffleAsBlend. This patch enables the use of lowerShuffleAsBitMask for 512-bit blends before falling back to move immedate, GPR to k-register, and masked op. I had to make some changes to support v8i64 when i64 is not a legal type. And to support floating point types. This trades a load for the move immediate and GPR move which is higher latency. But its probably better for register pressure not having to hop through other register classes. The load+and should play better with LICM and rematerialization I think. Differential Revision: https://reviews.llvm.org/D59479 llvm-svn: 356618	2019-03-20 21:30:20 +00:00
Sanjay Patel	fb44f99b73	[CGP][x86] add tests for usubo regression (PR41129); NFC llvm-svn: 356559	2019-03-20 15:02:35 +00:00
Clement Courbet	238af52ded	[ExpandMemCmp] Trigger on bcmp too. Summary: Fixes 41150. Reviewers: gchatelet Subscribers: hiraditya, llvm-commits, ckennelly, sbenza, jyknight Tags: #llvm Differential Revision: https://reviews.llvm.org/D59593 llvm-svn: 356550	2019-03-20 11:51:11 +00:00
Craig Topper	fda1f96d28	[X86] Remove X32 check lines from a test that doesn't have an X32 FileCheck prefix. Regenerate the test using update_llc_test_checks. NFC llvm-svn: 356535	2019-03-20 03:13:28 +00:00
Matt Arsenault	c2e35a6f32	RegAllocFast: Remove early selection loop, the spill calculation will report cost 0 anyway for free regs The 2nd loop calculates spill costs but reports free registers as cost 0 anyway, so there is little benefit from having a separate early loop. Surprisingly this is not NFC, as many register are marked regDisabled so the first loop often picks up later registers unnecessarily instead of the first one available in the allocation order... Patch by Matthias Braun llvm-svn: 356499	2019-03-19 19:01:34 +00:00
Philip Reames	db65a5b776	Allow unordered loads to be considered invariant in CodeGen The actual code change is fairly straight forward, but exercising it isn't. First, it turned out we weren't adding the appropriate flags in SelectionDAG. Second, it turned out that we've got some optimization gaps, so obvious test cases don't work. My first attempt (in atomic-unordered.ll) points out a deficiency in our peephole-opt folding logic which I plan to fix separately. Instead, I'm exercising this through MachineLICM. Differential Revision: https://reviews.llvm.org/D59375 llvm-svn: 356494	2019-03-19 18:27:18 +00:00
Simon Pilgrim	e744f513c4	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - handle repeated shift amounts If a value with multiple uses is only ever used for SSE shift amounts then we know that only the bottom 64-bits are needed. llvm-svn: 356483	2019-03-19 17:23:25 +00:00
Philip Reames	2153c4b828	[AtomicExpand] Fix a crash bug when lowering unordered loads to cmpxchg Add tests for wider atomic loads and stores. In the process, fix a crasher where we appearently handled unorder stores, but not loads, when lowering to cmpxchg idioms. llvm-svn: 356482	2019-03-19 17:20:49 +00:00
Justin Bogner	b353d6887e	[DAGCombine] Fix a miscompile when reducing BUILD_VECTORs to a shuffle In r311255 we added a case where we split vectors whose elements are all derived from the same input vector so that we could shuffle it more efficiently. In doing so, createBuildVecShuffle was taught to adjust for the fact that all indices would be based off of the first vector when this happens, but it's possible for the code that checked that to fire incorrectly if we happen to have a BUILD_VECTOR of extracts from subvectors and don't hit this new optimization. Instead of trying to detect if we've split the vector by checking if we have extracts from the same base vector, we can just pass that information into createBuildVecShuffle, avoiding the miscompile. Differential Revision: https://reviews.llvm.org/D59507 llvm-svn: 356476	2019-03-19 16:52:00 +00:00
Philip Reames	376c87fcd4	[Tests] Update to newer ISA There are some issues w/missed opts on older platforms, but that's not the purpose of this test. Using a newer API points out that some TODOs are already handled, and allows addition of tests to exercise other issues (future patch.) llvm-svn: 356473	2019-03-19 16:46:56 +00:00
Craig Topper	f086e562f9	[X86] Use relocImm in the ROL8ri/ROL16ri/ROL32ri/ROL64ri patterns to be consistent with the ROR patterns. llvm-svn: 356407	2019-03-18 20:43:15 +00:00
Nirav Dave	55c921f4bf	[DAG] Cleanup unused node in SimplifySelectCC. Delete temporarily constructed node uses for analysis after it's use, holding onto original input nodes. Ideally this would be rewritten without making nodes, but this appears relatively complex. Reviewers: spatel, RKSimon, craig.topper Subscribers: jdoerfert, hiraditya, deadalnix, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57921 llvm-svn: 356382	2019-03-18 17:02:38 +00:00
Nikita Popov	9a4453592b	[DAGCombine] Fold (x & ~y) \| y patterns Fold (x & ~y) \| y and it's four commuted variants to x \| y. This pattern can in particular appear when a vselect c, x, -1 is expanded to (x & ~c) \| (-1 & c) and combined to (x & ~c) \| c. This change has some overlap with D59066, which avoids creating a vselect of this form in the first place during uaddsat expansion. Differential Revision: https://reviews.llvm.org/D59174 llvm-svn: 356333	2019-03-17 15:45:38 +00:00
Simon Pilgrim	3b0a6c69ee	[DAGCombine] combineShuffleOfScalars - handle non-zero SCALAR_TO_VECTOR indices (PR41097) rL356292 reduces the size of scalar_to_vector if we know the upper bits are undef - which means that shuffles may find they are suddenly referencing scalar_to_vector elements other than zero - so make sure we handle this as undef. llvm-svn: 356327	2019-03-16 17:36:26 +00:00
Simon Pilgrim	f2c53b5d6c	[X86][SSE] Constant fold PEXTRB/PEXTRW/EXTRACT_VECTOR_ELT nodes. Replaces existing i1-only fold. llvm-svn: 356325	2019-03-16 15:02:00 +00:00
Simon Pilgrim	0f472e1d01	[X86] Add SimplifyDemandedBitsForTargetNode support for PEXTRB/PEXTRW Improved constant folding for PEXTRB/PEXTRW will be added in a future commit llvm-svn: 356324	2019-03-16 14:29:50 +00:00
Roman Lebedev	9f37790608	[X86] X86ISelLowering::combineSextInRegCmov(): also handle i8 CMOV's Summary: As noted by @andreadb in https://reviews.llvm.org/D59035#inline-525780 If we have `sext (trunc (cmov C0, C1) to i8)`, we can instead do `cmov (sext (trunc C0 to i8)), (sext (trunc C1 to i8))` Reviewers: craig.topper, andreadb, RKSimon Reviewed By: craig.topper Subscribers: llvm-commits, andreadb Tags: #llvm Differential Revision: https://reviews.llvm.org/D59412 llvm-svn: 356301	2019-03-15 21:18:05 +00:00
Roman Lebedev	b6e376ddfa	[X86] Promote i8 CMOV's (PR40965) Summary: @mclow.lists brought up this issue up in IRC, it came up during implementation of libc++ `std::midpoint()` implementation (D59099) https://godbolt.org/z/oLrHBP Currently LLVM X86 backend only promotes i8 CMOV if it came from 2x`trunc`. This differential proposes to always promote i8 CMOV. There are several concerns here: * Is this actually more performant, or is it just the ASM that looks cuter? * Does this result in partial register stalls? * What about branch predictor? # Indeed, performance should be the main point here. Let's look at a simple microbenchmark: {F8412076} ``` #include "benchmark/benchmark.h" #include <algorithm> #include <cmath> #include <cstdint> #include <iterator> #include <limits> #include <random> #include <type_traits> #include <utility> #include <vector> // Future preliminary libc++ code, from Marshall Clow. namespace std { template <class _Tp> __inline _Tp midpoint(_Tp __a, _Tp __b) noexcept { using _Up = typename std::make_unsigned<typename remove_cv<_Tp>::type>::type; int __sign = 1; _Up __m = __a; _Up __M = __b; if (__a > __b) { __sign = -1; __m = __b; __M = __a; } return __a + __sign * _Tp(_Up(__M - __m) >> 1); } } // namespace std template <typename T> std::vector<T> getVectorOfRandomNumbers(size_t count) { std::random_device rd; std::mt19937 gen(rd()); std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(), std::numeric_limits<T>::max()); std::vector<T> v; v.reserve(count); std::generate_n(std::back_inserter(v), count, [&dis, &gen]() { return dis(gen); }); assert(v.size() == count); return v; } struct RandRand { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { return std::make_pair(getVectorOfRandomNumbers<T>(count), getVectorOfRandomNumbers<T>(count)); } }; struct ZeroRand { template <typename T> static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) { return std::make_pair(std::vector<T>(count, T(0)), getVectorOfRandomNumbers<T>(count)); } }; template <class T, class Gen> void BM_StdMidpoint(benchmark::State& state) { const size_t Length = state.range(0); const std::pair<std::vector<T>, std::vector<T>> Data = Gen::template Gen<T>(Length); const std::vector<T>& a = Data.first; const std::vector<T>& b = Data.second; assert(a.size() == Length && b.size() == a.size()); benchmark::ClobberMemory(); benchmark::DoNotOptimize(a); benchmark::DoNotOptimize(a.data()); benchmark::DoNotOptimize(b); benchmark::DoNotOptimize(b.data()); for (auto _ : state) { for (size_t i = 0; i < Length; i++) { const auto calculated = std::midpoint(a[i], b[i]); benchmark::DoNotOptimize(calculated); } } state.SetComplexityN(Length); state.counters["midpoints"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant); state.counters["midpoints/sec"] = benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate); const size_t BytesRead = 2 * sizeof(T) * Length; state.counters["bytes_read/iteration"] = benchmark::Counter(BytesRead, benchmark::Counter::kDefaults, benchmark::Counter::OneK::kIs1024); state.counters["bytes_read/sec"] = benchmark::Counter( BytesRead, benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024); } template <typename T> static void CustomArguments(benchmark::internal::Benchmark* b) { const size_t L2SizeBytes = 2 * 1024 * 1024; // What is the largest range we can check to always fit within given L2 cache? const size_t MaxLen = L2SizeBytes / /total bufs/ 2 / /maximal elt size/ sizeof(T) / /safety margin/ 2; b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN); } // Both of the values are random. // The comparison is unpredictable. BENCHMARK_TEMPLATE(BM_StdMidpoint, int32_t, RandRand) ->Apply(CustomArguments<int32_t>); BENCHMARK_TEMPLATE(BM_StdMidpoint, uint32_t, RandRand) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_StdMidpoint, int64_t, RandRand) ->Apply(CustomArguments<int64_t>); BENCHMARK_TEMPLATE(BM_StdMidpoint, uint64_t, RandRand) ->Apply(CustomArguments<uint64_t>); BENCHMARK_TEMPLATE(BM_StdMidpoint, int16_t, RandRand) ->Apply(CustomArguments<int16_t>); BENCHMARK_TEMPLATE(BM_StdMidpoint, uint16_t, RandRand) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_StdMidpoint, int8_t, RandRand) ->Apply(CustomArguments<int8_t>); BENCHMARK_TEMPLATE(BM_StdMidpoint, uint8_t, RandRand) ->Apply(CustomArguments<uint8_t>); // One value is always zero, and another is bigger or equal than zero. // The comparison is predictable. BENCHMARK_TEMPLATE(BM_StdMidpoint, uint32_t, ZeroRand) ->Apply(CustomArguments<uint32_t>); BENCHMARK_TEMPLATE(BM_StdMidpoint, uint64_t, ZeroRand) ->Apply(CustomArguments<uint64_t>); BENCHMARK_TEMPLATE(BM_StdMidpoint, uint16_t, ZeroRand) ->Apply(CustomArguments<uint16_t>); BENCHMARK_TEMPLATE(BM_StdMidpoint, uint8_t, ZeroRand) ->Apply(CustomArguments<uint8_t>); ``` ``` $ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks ./llvm-cmov-bench-OLD ./llvm-cmov-bench-NEW RUNNING: ./llvm-cmov-bench-OLD --benchmark_out=/tmp/tmp5a5qjm 2019-03-06 21:53:31 Running ./llvm-cmov-bench-OLD Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 1.78, 1.81, 1.36 ---------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters<...> ---------------------------------------------------------------------------------------------------- <...> BM_StdMidpoint<int32_t, RandRand>/131072 300398 ns 300404 ns 2330 bytes_read/iteration=1024k bytes_read/sec=3.25083G/s midpoints=305.398M midpoints/sec=436.319M/s BM_StdMidpoint<int32_t, RandRand>_BigO 2.29 N 2.29 N BM_StdMidpoint<int32_t, RandRand>_RMS 2 % 2 % <...> BM_StdMidpoint<uint32_t, RandRand>/131072 300433 ns 300433 ns 2330 bytes_read/iteration=1024k bytes_read/sec=3.25052G/s midpoints=305.398M midpoints/sec=436.278M/s BM_StdMidpoint<uint32_t, RandRand>_BigO 2.29 N 2.29 N BM_StdMidpoint<uint32_t, RandRand>_RMS 2 % 2 % <...> BM_StdMidpoint<int64_t, RandRand>/65536 169857 ns 169858 ns 4121 bytes_read/iteration=1024k bytes_read/sec=5.74929G/s midpoints=270.074M midpoints/sec=385.828M/s BM_StdMidpoint<int64_t, RandRand>_BigO 2.59 N 2.59 N BM_StdMidpoint<int64_t, RandRand>_RMS 3 % 3 % <...> BM_StdMidpoint<uint64_t, RandRand>/65536 169770 ns 169771 ns 4125 bytes_read/iteration=1024k bytes_read/sec=5.75223G/s midpoints=270.336M midpoints/sec=386.026M/s BM_StdMidpoint<uint64_t, RandRand>_BigO 2.59 N 2.59 N BM_StdMidpoint<uint64_t, RandRand>_RMS 3 % 3 % <...> BM_StdMidpoint<int16_t, RandRand>/262144 591169 ns 591179 ns 1182 bytes_read/iteration=1024k bytes_read/sec=1.65189G/s midpoints=309.854M midpoints/sec=443.426M/s BM_StdMidpoint<int16_t, RandRand>_BigO 2.25 N 2.25 N BM_StdMidpoint<int16_t, RandRand>_RMS 1 % 1 % <...> BM_StdMidpoint<uint16_t, RandRand>/262144 591264 ns 591274 ns 1184 bytes_read/iteration=1024k bytes_read/sec=1.65162G/s midpoints=310.378M midpoints/sec=443.354M/s BM_StdMidpoint<uint16_t, RandRand>_BigO 2.25 N 2.25 N BM_StdMidpoint<uint16_t, RandRand>_RMS 1 % 1 % <...> BM_StdMidpoint<int8_t, RandRand>/524288 2983669 ns 2983689 ns 235 bytes_read/iteration=1024k bytes_read/sec=335.156M/s midpoints=123.208M midpoints/sec=175.718M/s BM_StdMidpoint<int8_t, RandRand>_BigO 5.69 N 5.69 N BM_StdMidpoint<int8_t, RandRand>_RMS 0 % 0 % <...> BM_StdMidpoint<uint8_t, RandRand>/524288 2668398 ns 2668419 ns 262 bytes_read/iteration=1024k bytes_read/sec=374.754M/s midpoints=137.363M midpoints/sec=196.479M/s BM_StdMidpoint<uint8_t, RandRand>_BigO 5.09 N 5.09 N BM_StdMidpoint<uint8_t, RandRand>_RMS 0 % 0 % <...> BM_StdMidpoint<uint32_t, ZeroRand>/131072 300887 ns 300887 ns 2331 bytes_read/iteration=1024k bytes_read/sec=3.24561G/s midpoints=305.529M midpoints/sec=435.619M/s BM_StdMidpoint<uint32_t, ZeroRand>_BigO 2.29 N 2.29 N BM_StdMidpoint<uint32_t, ZeroRand>_RMS 2 % 2 % <...> BM_StdMidpoint<uint64_t, ZeroRand>/65536 169634 ns 169634 ns 4102 bytes_read/iteration=1024k bytes_read/sec=5.75688G/s midpoints=268.829M midpoints/sec=386.338M/s BM_StdMidpoint<uint64_t, ZeroRand>_BigO 2.59 N 2.59 N BM_StdMidpoint<uint64_t, ZeroRand>_RMS 3 % 3 % <...> BM_StdMidpoint<uint16_t, ZeroRand>/262144 592252 ns 592255 ns 1182 bytes_read/iteration=1024k bytes_read/sec=1.64889G/s midpoints=309.854M midpoints/sec=442.62M/s BM_StdMidpoint<uint16_t, ZeroRand>_BigO 2.26 N 2.26 N BM_StdMidpoint<uint16_t, ZeroRand>_RMS 1 % 1 % <...> BM_StdMidpoint<uint8_t, ZeroRand>/524288 987295 ns 987309 ns 711 bytes_read/iteration=1024k bytes_read/sec=1012.85M/s midpoints=372.769M midpoints/sec=531.028M/s BM_StdMidpoint<uint8_t, ZeroRand>_BigO 1.88 N 1.88 N BM_StdMidpoint<uint8_t, ZeroRand>_RMS 1 % 1 % RUNNING: ./llvm-cmov-bench-NEW --benchmark_out=/tmp/tmpPvwpfW 2019-03-06 21:56:58 Running ./llvm-cmov-bench-NEW Run on (8 X 4000 MHz CPU s) CPU Caches: L1 Data 16K (x8) L1 Instruction 64K (x4) L2 Unified 2048K (x4) L3 Unified 8192K (x1) Load Average: 1.17, 1.46, 1.30 ---------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters<...> ---------------------------------------------------------------------------------------------------- <...> BM_StdMidpoint<int32_t, RandRand>/131072 300878 ns 300880 ns 2324 bytes_read/iteration=1024k bytes_read/sec=3.24569G/s midpoints=304.611M midpoints/sec=435.629M/s BM_StdMidpoint<int32_t, RandRand>_BigO 2.29 N 2.29 N BM_StdMidpoint<int32_t, RandRand>_RMS 2 % 2 % <...> BM_StdMidpoint<uint32_t, RandRand>/131072 300231 ns 300226 ns 2330 bytes_read/iteration=1024k bytes_read/sec=3.25276G/s midpoints=305.398M midpoints/sec=436.578M/s BM_StdMidpoint<uint32_t, RandRand>_BigO 2.29 N 2.29 N BM_StdMidpoint<uint32_t, RandRand>_RMS 2 % 2 % <...> BM_StdMidpoint<int64_t, RandRand>/65536 170819 ns 170777 ns 4115 bytes_read/iteration=1024k bytes_read/sec=5.71835G/s midpoints=269.681M midpoints/sec=383.752M/s BM_StdMidpoint<int64_t, RandRand>_BigO 2.60 N 2.60 N BM_StdMidpoint<int64_t, RandRand>_RMS 3 % 3 % <...> BM_StdMidpoint<uint64_t, RandRand>/65536 171705 ns 171708 ns 4106 bytes_read/iteration=1024k bytes_read/sec=5.68733G/s midpoints=269.091M midpoints/sec=381.671M/s BM_StdMidpoint<uint64_t, RandRand>_BigO 2.62 N 2.62 N BM_StdMidpoint<uint64_t, RandRand>_RMS 3 % 3 % <...> BM_StdMidpoint<int16_t, RandRand>/262144 592510 ns 592516 ns 1182 bytes_read/iteration=1024k bytes_read/sec=1.64816G/s midpoints=309.854M midpoints/sec=442.425M/s BM_StdMidpoint<int16_t, RandRand>_BigO 2.26 N 2.26 N BM_StdMidpoint<int16_t, RandRand>_RMS 1 % 1 % <...> BM_StdMidpoint<uint16_t, RandRand>/262144 614823 ns 614823 ns 1180 bytes_read/iteration=1024k bytes_read/sec=1.58836G/s midpoints=309.33M midpoints/sec=426.373M/s BM_StdMidpoint<uint16_t, RandRand>_BigO 2.33 N 2.33 N BM_StdMidpoint<uint16_t, RandRand>_RMS 4 % 4 % <...> BM_StdMidpoint<int8_t, RandRand>/524288 1073181 ns 1073201 ns 650 bytes_read/iteration=1024k bytes_read/sec=931.791M/s midpoints=340.787M midpoints/sec=488.527M/s BM_StdMidpoint<int8_t, RandRand>_BigO 2.05 N 2.05 N BM_StdMidpoint<int8_t, RandRand>_RMS 1 % 1 % BM_StdMidpoint<uint8_t, RandRand>/524288 1071010 ns 1071020 ns 653 bytes_read/iteration=1024k bytes_read/sec=933.689M/s midpoints=342.36M midpoints/sec=489.522M/s BM_StdMidpoint<uint8_t, RandRand>_BigO 2.05 N 2.05 N BM_StdMidpoint<uint8_t, RandRand>_RMS 1 % 1 % <...> BM_StdMidpoint<uint32_t, ZeroRand>/131072 300413 ns 300416 ns 2330 bytes_read/iteration=1024k bytes_read/sec=3.2507G/s midpoints=305.398M midpoints/sec=436.302M/s BM_StdMidpoint<uint32_t, ZeroRand>_BigO 2.29 N 2.29 N BM_StdMidpoint<uint32_t, ZeroRand>_RMS 2 % 2 % <...> BM_StdMidpoint<uint64_t, ZeroRand>/65536 169667 ns 169669 ns 4123 bytes_read/iteration=1024k bytes_read/sec=5.75568G/s midpoints=270.205M midpoints/sec=386.257M/s BM_StdMidpoint<uint64_t, ZeroRand>_BigO 2.59 N 2.59 N BM_StdMidpoint<uint64_t, ZeroRand>_RMS 3 % 3 % <...> BM_StdMidpoint<uint16_t, ZeroRand>/262144 591396 ns 591404 ns 1184 bytes_read/iteration=1024k bytes_read/sec=1.65126G/s midpoints=310.378M midpoints/sec=443.257M/s BM_StdMidpoint<uint16_t, ZeroRand>_BigO 2.26 N 2.26 N BM_StdMidpoint<uint16_t, ZeroRand>_RMS 1 % 1 % <...> BM_StdMidpoint<uint8_t, ZeroRand>/524288 1069421 ns 1069413 ns 655 bytes_read/iteration=1024k bytes_read/sec=935.092M/s midpoints=343.409M midpoints/sec=490.258M/s BM_StdMidpoint<uint8_t, ZeroRand>_BigO 2.04 N 2.04 N BM_StdMidpoint<uint8_t, ZeroRand>_RMS 0 % 0 % Comparing ./llvm-cmov-bench-OLD to ./llvm-cmov-bench-NEW Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------- <...> BM_StdMidpoint<int32_t, RandRand>/131072 +0.0016 +0.0016 300398 300878 300404 300880 <...> BM_StdMidpoint<uint32_t, RandRand>/131072 -0.0007 -0.0007 300433 300231 300433 300226 <...> BM_StdMidpoint<int64_t, RandRand>/65536 +0.0057 +0.0054 169857 170819 169858 170777 <...> BM_StdMidpoint<uint64_t, RandRand>/65536 +0.0114 +0.0114 169770 171705 169771 171708 <...> BM_StdMidpoint<int16_t, RandRand>/262144 +0.0023 +0.0023 591169 592510 591179 592516 <...> BM_StdMidpoint<uint16_t, RandRand>/262144 +0.0398 +0.0398 591264 614823 591274 614823 <...> BM_StdMidpoint<int8_t, RandRand>/524288 -0.6403 -0.6403 2983669 1073181 2983689 1073201 <...> BM_StdMidpoint<uint8_t, RandRand>/524288 -0.5986 -0.5986 2668398 1071010 2668419 1071020 <...> BM_StdMidpoint<uint32_t, ZeroRand>/131072 -0.0016 -0.0016 300887 300413 300887 300416 <...> BM_StdMidpoint<uint64_t, ZeroRand>/65536 +0.0002 +0.0002 169634 169667 169634 169669 <...> BM_StdMidpoint<uint16_t, ZeroRand>/262144 -0.0014 -0.0014 592252 591396 592255 591404 <...> BM_StdMidpoint<uint8_t, ZeroRand>/524288 +0.0832 +0.0832 987295 1069421 987309 1069413 ``` What can we tell from the benchmark? * `BM_StdMidpoint<[u]int8_t, RandRand>` indeed has the worst performance. * All `BM_StdMidpoint<uint{8,16,32}_t, ZeroRand>` are all performant, even the 8-bit case. That is because there we are computing mid point between zero and some random number, thus if the branch predictor is in use, it is in optimal situation. * Promoting 8-bit CMOV did improve performance of `BM_StdMidpoint<[u]int8_t, RandRand>`, by -59%..-64%. # What about branch predictor? * `BM_StdMidpoint<uint8_t, ZeroRand>` was faster than `BM_StdMidpoint<uint{16,32,64}_t, ZeroRand>`, which may mean that well-predicted branch is better than `cmov`. * Promoting 8-bit CMOV degraded performance of `BM_StdMidpoint<uint8_t, ZeroRand>`, `cmov` is up to +10% worse than well-predicted branch. * However, i do not believe this is a concern. If the branch is well predicted, then the PGO will also say that it is well predicted, and LLVM will happily expand cmov back into branch: https://godbolt.org/z/P5ufig # What about partial register stalls? I'm not really able to answer that. What i can say is that if the branch is unpredictable (if it is predictable, then use PGO and you'll have branch) in ~50% of cases you will have to pay branch misprediction penalty. ``` $ grep -i MispredictPenalty X86Sched*.td X86SchedBroadwell.td: let MispredictPenalty = 16; X86SchedHaswell.td: let MispredictPenalty = 16; X86SchedSandyBridge.td: let MispredictPenalty = 16; X86SchedSkylakeClient.td: let MispredictPenalty = 14; X86SchedSkylakeServer.td: let MispredictPenalty = 14; X86ScheduleBdVer2.td: let MispredictPenalty = 20; // Minimum branch misdirection penalty. X86ScheduleBtVer2.td: let MispredictPenalty = 14; // Minimum branch misdirection penalty X86ScheduleSLM.td: let MispredictPenalty = 10; X86ScheduleZnver1.td: let MispredictPenalty = 17; ``` .. which it can be as small as 10 cycles and as large as 20 cycles. Partial register stalls do not seem to be an issue for AMD CPU's. For intel CPU's, they should be around ~5 cycles? Is that actually an issue here? I'm not sure. In short, i'd say this is an improvement, at least on this microbenchmark. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40965 \| PR40965 ]]. Reviewers: craig.topper, RKSimon, spatel, andreadb, nikic Reviewed By: craig.topper, andreadb Subscribers: jfb, jdoerfert, llvm-commits, mclow.lists Tags: #llvm, #libc Differential Revision: https://reviews.llvm.org/D59035 llvm-svn: 356300	2019-03-15 21:17:53 +00:00
Simon Pilgrim	d33e62c826	[X86][SSE] Fold scalar_to_vector(i64 anyext(x)) -> bitcast(scalar_to_vector(i32 anyext(x))) Reduce the size of an any-extended i64 scalar_to_vector source to i32 - the any_extend nodes are often introduced by SimplifyDemandedBits. llvm-svn: 356292	2019-03-15 19:14:28 +00:00
Philip Reames	d238bf7855	[X86][GlobalISEL] Support lowering aligned unordered atomics The existing lowering code is accidentally correct for unordered atomics as far as I can tell. An unordered atomic has no memory ordering, and simply requires the actual load or store to be done as a single well aligned instruction. As such, relax the restriction while adding tests to ensure the lowering remains correct in the future. Differential Revision: https://reviews.llvm.org/D57803 llvm-svn: 356280	2019-03-15 17:50:30 +00:00
Simon Pilgrim	8fbe439345	[SelectionDAG] Add SimplifyDemandedBits handling for ISD::SCALAR_TO_VECTOR Fixes a lot of constant folding mismatches between i686 and x86_64 llvm-svn: 356273	2019-03-15 17:00:55 +00:00
Simon Pilgrim	65165d54bb	[X86] Add SimplifyDemandedBitsForTargetNode support for PINSRB/PINSRW llvm-svn: 356270	2019-03-15 16:16:49 +00:00
Mikael Holmen	339daae806	[CodeGenPrepare] avoid crashing from replacing a phi twice Summary: This is a fix to bug 41052: https://bugs.llvm.org/show_bug.cgi?id=41052 While trying to optimize a memory instruction in a dead basic block, we end up registering the same phi for replacement twice. This patch avoids registering more than the first replacement candidate for a phi. Patch by: JesperAntonsson Reviewers: skatkov, aprantl Reviewed By: aprantl Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59358 llvm-svn: 356260	2019-03-15 13:51:05 +00:00
Simon Pilgrim	0ad17402a9	[X86][SSE] Attempt to convert SSE shift-by-var to shift-by-imm. Prep work for PR40203 llvm-svn: 356249	2019-03-15 11:05:42 +00:00
Philip Reames	81abc7fb0c	[Tests] Add tests to demonstrate hoisting of unordered invariant loads llvm-svn: 356184	2019-03-14 18:06:15 +00:00
Philip Reames	9616cf0510	[Tests] Revert an accident change to a test llvm-svn: 356183	2019-03-14 18:02:19 +00:00
Philip Reames	c53f02a32a	Auto-generate an existing test to make it easier to update llvm-svn: 356181	2019-03-14 17:59:59 +00:00
Philip Reames	af41b282c5	[Tests] Add tests for reordering of unordered atomics on invariant locations llvm-svn: 356172	2019-03-14 17:36:58 +00:00
Philip Reames	70d156991c	Allow code motion (and thus folding) for atomic (but unordered) memory operands Building on the work done in D57601, now that we can distinguish between atomic and volatile memory accesses, go ahead and allow code motion of unordered atomics. As seen in the diffs, this allows much better folding of memory operations into using instructions. (Mostly done by the PeepholeOpt pass.) Note: I have not reviewed all callers of hasOrderedMemoryRef since one of them - isSafeToMove - is very widely used. I'm relying on the documented semantics of each method to judge correctness. Differential Revision: https://reviews.llvm.org/D59345 llvm-svn: 356170	2019-03-14 17:20:59 +00:00
Philip Reames	8dd9b54d9b	[Tests] Add negative folding tests w/fences as requested in D59345 llvm-svn: 356165	2019-03-14 17:05:18 +00:00
Craig Topper	c747ac3f93	[X86] Fix the pattern changes from r356121 so that the RORr1/RORm1 pattern use the rotr opcode. These instructions used to use rotl with a bitwidth-1 immediate. I changed the immediate to 1, but failed to change the opcode. Thankfully this seems to have not caused a functional issue because we now had two rotl by 1 patterns, but the correct ones were earlier and took priority. So we just missed some optimization. llvm-svn: 356164	2019-03-14 16:53:24 +00:00
Sanjay Patel	5d1df114e8	[x86] prevent infinite looping from vselect commutation (PR41066) This is an immediate fix for: https://bugs.llvm.org/show_bug.cgi?id=41066 ...but as noted there and the code comments, we should do better by stubbing this out sooner. llvm-svn: 356158	2019-03-14 15:32:34 +00:00
Craig Topper	54a0b53308	[X86] Add patterns for rotr by immediate to fix PR41057. Prior to the introduction of funnel shift intrinsics we could count on rotate by immediates prefering to use rotl since that's what MatchRotate would check first. The or+shift pattern doesn't have a direction so one must be chosen arbitrarily. With funnel shift, there is a direction and fshr will try to use rotr first. While fshl will try to use rotl first. This patch adds the isel patterns for rotr to complement the rotl patterns. I've put the rotr by 1 patterns in the instruction patterns. And moved the rotl by bitwidth-1 patterns to separate Pat patterns. Fixes PR41057. llvm-svn: 356121	2019-03-14 07:07:26 +00:00
Craig Topper	c867847016	[X86] Add various test cases for PR41057. NFC llvm-svn: 356120	2019-03-14 07:07:24 +00:00
Quentin Colombet	e77e5f44b8	[GlobalISel][Utils] Add a getConstantVRegVal variant that looks through instrs getConstantVRegVal used to only look for G_CONSTANT when looking at unboxing the value of a vreg. However, constants are sometimes not directly used and are hidden behind trunc, s\|zext or copy chain of computation. In particular this may be introduced by the legalization process that doesn't want to simplify these patterns because it can lead to infine loop when legalizing a constant. To circumvent that problem, add a new variant of getConstantVRegVal, named getConstantVRegValWithLookThrough, that allow to look through extensions. Differential Revision: https://reviews.llvm.org/D59227 llvm-svn: 356116	2019-03-14 01:37:13 +00:00
Craig Topper	fad96a1588	[X86] Add 64-bit mode command lines to rot32.ll so that it will demonstrate PR41055 for 32 bit. NFC llvm-svn: 356112	2019-03-14 00:23:31 +00:00
Simon Pilgrim	e15cd7909b	[X86] Remove icmp undef in more reduced tests llvm-svn: 356084	2019-03-13 19:07:54 +00:00
Simon Pilgrim	8f1b825068	[X86] Regenerate tail call tests llvm-svn: 356083	2019-03-13 19:04:45 +00:00
Craig Topper	84abec2855	[X86] Check for 64-bit mode in X86Subtarget::hasCmpxchg16b() The feature flag alone can't be trusted since it can be passed via -mattr. Need to ensure 64-bit mode as well. We had a 64 bit mode check on the instruction to make the assembler work correctly. But we weren't guarding any of our lowering code or the hooks for the AtomicExpandPass. I've added 32-bit command lines to atomic128.ll with and without cx16. The tests there would all previously fail if -mattr=cx16 was passed to them. I had to move one test case for f128 to a new file as it seems to have a different 32-bit mode or possibly sse issue. Differential Revision: https://reviews.llvm.org/D59308 llvm-svn: 356078	2019-03-13 18:48:50 +00:00
Simon Pilgrim	e1be3403ff	[X86] Avoid icmp undef in reduced tests Because we don't currently simplify icmp with undef in DAG, bugpoint loves to introduce them during reduction. This is a small step towards re-adding non-undef values into some of the simpler tests so that they should still test correctly and emit similar/same codegen. Prep work for PR40800 ([SelectionDAG] Add UNDEF handling to SelectionDAG::FoldSetCC). llvm-svn: 356076	2019-03-13 18:36:59 +00:00
Simon Pilgrim	510f26dca8	Regenerate test llvm-svn: 356071	2019-03-13 18:18:24 +00:00
Nirav Dave	d6351340bb	[DAGCombiner] If a TokenFactor would be merged into its user, consider the user later. Summary: A number of optimizations are inhibited by single-use TokenFactors not being merged into the TokenFactor using it. This makes we consider if we can do the merge immediately. Most tests changes here are due to the change in visitation causing minor reorderings and associated reassociation of paired memory operations. CodeGen tests with non-reordering changes: X86/aligned-variadic.ll -- memory-based add folded into stored leaq value. X86/constant-combiners.ll -- Optimizes out overlap between stores. X86/pr40631_deadstore_elision -- folds constant byte store into preceding quad word constant store. Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet Reviewed By: courbet Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59260 llvm-svn: 356068	2019-03-13 17:07:09 +00:00
Simon Pilgrim	bef4fe056d	[X86][AVX] Add X86ISD::VTRUNC handling to SimplifyDemandedVectorEltsForTargetNode llvm-svn: 356067	2019-03-13 17:00:18 +00:00
Simon Pilgrim	d9aa879b67	[X86][AVX] Add combineConcatVectors support to improve subvector handling Attempt to combine CONCAT_VECTORS nodes, which we only really have pre-legalization. This encourages a lot of X86ISD::SUBV_BROADCAST generation, so I've added SimplifyDemandedVectorEltsForTargetNode handling for this at the same time. The X86ISD::VTRUNC regression in shuffle-vs-trunc-256-widen.ll will be handled in a future commit. llvm-svn: 356064	2019-03-13 16:37:30 +00:00
Sanjay Patel	0a251e4076	[x86] limit extractelement of setcc to pre-legalization A fuzzer found the crasher: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13700 The bug was introduced recently here: rL355741 This is the quick fix. If we need to do this transform later, then we'd have to extend/truncate the vector setcc element type to the scalar setcc type (i8). llvm-svn: 356053	2019-03-13 14:49:52 +00:00
Clement Courbet	3bb5d0bb9b	Re-land r354244 "[DAGCombiner] Eliminate dead stores to stack." Always check candidates for hasOtherUses(), not only stores. llvm-svn: 356050	2019-03-13 13:56:23 +00:00
Simon Pilgrim	7abbd70300	[X86][AVX] lowerShuffleAsBroadcast - improve load folding by avoiding bitcasts AVX1 broadcasts were failing as we were adding bitcasts that caused MayFoldLoad's hasOneUse to return false. This patch stops introducing bitcasts so early and also replaces the broadcast index scaling through bitcasts (which can't succeed in some cases) to instead just keep track of the bitoffset which can be converted back to the broadcast index later on. Differential Revision: https://reviews.llvm.org/D58888 llvm-svn: 356043	2019-03-13 12:20:39 +00:00
Simon Pilgrim	360ce82db2	[DAG] Move integer setcc %x, %x folding into FoldSetCC First step towards PR40800 - I intend to move the float case in a separate future patch. I had to tweak the (overly reduced) thumb2 test and the x86 widening test change is annoying (no longer rematerializable) but we should address this separately. Differential Revision: https://reviews.llvm.org/D59244 llvm-svn: 356040	2019-03-13 11:08:57 +00:00
Philip Reames	21a50ccf9c	[ImplicitNullChecks] Support unordered atomic accesses Update the INC pass to allow folding unordered atomics. This is the first optimization unblocked by the changes landed from D57601. llvm-svn: 356006	2019-03-13 03:25:20 +00:00
Philip Reames	80ccc88869	[Tests] Expand implicit null check coverage llvm-svn: 356004	2019-03-13 03:17:58 +00:00
Craig Topper	9bae5ba076	[X86] Add ImmArg markings to intrinsics. Remove test cases that checked for not crashing when immediate operands were passed not an immediate. These are now considered ill-formed in IR. This was done by manually scanning the intrinsic file for llvm_i32_ty and llvm_i8_ty which are the predominant types we use for immediates. Most of them are on vector intrinsics. I might have missed some other intrinsics. Differential Revision: https://reviews.llvm.org/D58302 llvm-svn: 355993	2019-03-12 23:48:07 +00:00
Philip Reames	b760558517	[Test] Add tests for implicit null checks on atomic/volatile instructions llvm-svn: 355983	2019-03-12 21:09:58 +00:00
Philip Reames	9134f84ba4	For faulting ops, include a comment w/the fault destination A faulting_op is one that has specified behavior when a fault occurs, generally redirecting control flow to another location. This change just adds a comment to the assembly output which makes it both human readable, and machine checkable w/o having to parse the FaultMap section. This is used to split a test file into two parts, so that I can (in a near future commit) easily extend the test file to demonstrate another case. llvm-svn: 355982	2019-03-12 21:05:31 +00:00
Sanjay Patel	737c27a9cd	[x86] scalarize extractelement 0 of FP vselect llvm-svn: 355955	2019-03-12 19:20:45 +00:00
Nikita Popov	149bc099f6	[SDAG] Expand pow2 mulo using shifts Expand MULO with constant power of two operand into a shift. The overflow is checked with (x << shift) >> shift == x, where the right shift will be logical for umulo and arithmetic for smulo (with exception for multiplications by signed_min). Differential Revision: https://reviews.llvm.org/D59041 llvm-svn: 355937	2019-03-12 16:57:25 +00:00
Jonas Paulsson	8b8dc50e79	[RegAlloc] Avoid compile time regression with multiple copy hints. As a fix for https://bugs.llvm.org/show_bug.cgi?id=40986 ("excessive compile time building opencollada"), this patch makes sure that no phys reg is hinted more than once from getRegAllocationHints(). This handles the case were many virtual registers are assigned to the same physreg. The previous compile time fix (r343686) in weightCalcHelper() only made sure that physical/virtual registers are passed no more than once to addRegAllocationHint(). Review: Dimitry Andric, Quentin Colombet https://reviews.llvm.org/D59201 llvm-svn: 355854	2019-03-11 19:00:37 +00:00
Simon Pilgrim	06ae025345	[X86] Extend widening comparison test. Ensure we test both v2i16 unary and binary comparisons. llvm-svn: 355849	2019-03-11 18:08:20 +00:00
Craig Topper	00afa193f1	[X86] Enable sse2_cvtsd2ss intrinsic to use an EVEX encoded instruction. llvm-svn: 355810	2019-03-11 06:01:04 +00:00
Amaury Sechet	a5820cbd20	Add test case for add to sub post legalization. NFC llvm-svn: 355797	2019-03-11 01:25:48 +00:00
Sanjay Patel	26e06e859e	[x86] add x86-specific opcodes to extractelement scalarization list llvm-svn: 355792	2019-03-10 18:56:21 +00:00
Craig Topper	93e15dfacc	[X86] Make lowering of intrinsics with rounding mode stricter so that only valid rounding modes are lowered. Update tests accordingly Many of our tests were not using valid rounding mode immediates. Clang verifies this in the frontend when it creates the intrinsics from builtins, but the backend would still lower invalid immediates. With this change we will now leave them as intrinsics if the immediate is invalid. This will cause an isel selection failure. llvm-svn: 355789	2019-03-10 17:20:45 +00:00
Sanjay Patel	40bcc3de7d	[x86] add tests for extract of FP select; NFC llvm-svn: 355768	2019-03-09 02:11:05 +00:00
Craig Topper	69f8c1653d	[ScalarizeMaskedMemIntrin] Use IRBuilder functions that take uint32_t/uint64_t for getelementptr, extractelement, and insertelement. This saves needing to call getInt32 ourselves. Making the code a little shorter. The test changes are because insert/extract use getInt64 internally. Shouldn't be a functional issue. This cleanup because I plan to write similar code for expandload/compressstore. llvm-svn: 355767	2019-03-09 02:08:41 +00:00
Sanjay Patel	f84083b4db	[x86] scalarize extract element 0 of FP cmp An extension of D58282 noted in PR39665: https://bugs.llvm.org/show_bug.cgi?id=39665 This doesn't answer the request to use movmsk, but that's an independent problem. We need this and probably still need scalarization of FP selects because we can't do that as a target-independent transform (although it seems likely that targets besides x86 should have this transform). llvm-svn: 355741	2019-03-08 21:54:41 +00:00
Sanjay Patel	43f098e719	[x86] add tests for extracted vector FP cmp; NFC llvm-svn: 355727	2019-03-08 20:45:27 +00:00
Amaury Sechet	782ac933b5	[DAGCombiner] fold (add (add (xor a, -1), b), 1) -> (sub b, a) Summary: This pattern is sometime created after legalization. Reviewers: efriedma, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58874 llvm-svn: 355716	2019-03-08 19:39:32 +00:00
Sanjay Patel	b22f438df3	[x86] prevent infinite looping from inverse shuffle transforms llvm-svn: 355713	2019-03-08 19:20:28 +00:00
Simon Pilgrim	53652feab7	[X86] Add test case for PR22473 llvm-svn: 355712	2019-03-08 19:16:26 +00:00
Simon Pilgrim	00ab0339ed	Fix typo in constant vector llvm-svn: 355699	2019-03-08 15:17:26 +00:00
Clement Courbet	8e16d73346	[SelectionDAG] Allow the user to specify a memeq function. Summary: Right now, when we encounter a string equality check, e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a small compile-time constant, and fall back on calling `memcmp()` else. This is sub-optimal because memcmp has to compute much more than equality. This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms that support `bcmp`. `bcmp` can be made much more efficient than `memcmp` because equality compare is trivially parallel while lexicographic ordering has a chain dependency. Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits Differential Revision: https://reviews.llvm.org/D56593 llvm-svn: 355672	2019-03-08 09:07:45 +00:00
Craig Topper	4505c99e72	[X86] Improve the type checking in isLegalMaskedLoad and isLegalMaskedGather. We were just checking pointer size and type primitive size. But this caused unintended things like vectors of half being accepted by masked load/store. For FP we now explicitly check for only double and float. For pointers we now let any pointer through. Trusting that only 32 and 64 would be used to generate assembly. We only check bitwidth after checking that the type is an integer. llvm-svn: 355667	2019-03-08 07:33:43 +00:00
Sanjay Patel	5ed14ef1e4	[x86] add extract FP tests for target-specific nodes; NFC llvm-svn: 355655	2019-03-07 23:55:54 +00:00
Vlad Tsyrklevich	2e1479e2f2	Delete x86_64 ShadowCallStack support Summary: ShadowCallStack on x86_64 suffered from the same racy security issues as Return Flow Guard and had performance overhead as high as 13% depending on the benchmark. x86_64 ShadowCallStack was always an experimental feature and never shipped a runtime required to support it, as such there are no expected downstream users. Reviewers: pcc Reviewed By: pcc Subscribers: mgorny, javed.absar, hiraditya, jdoerfert, cfe-commits, #sanitizers, llvm-commits Tags: #clang, #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D59034 llvm-svn: 355624	2019-03-07 18:56:36 +00:00
Craig Topper	3acc4236b8	[X86] Enable combineFMinNumFMaxNum for 512 bit vectors when AVX512 is enabled. Simplified by just checking if the vector type is legal rather than listing all combinations of types and features. Fixes PR40984. llvm-svn: 355582	2019-03-07 06:30:19 +00:00
Craig Topper	a0dd6e9a08	[X86] Add 512-bit fminnum/maxnum test cases for PR40984. Also add v8f32 minnum/maxnum tests. NFC llvm-svn: 355581	2019-03-07 05:56:52 +00:00
Nikita Popov	e1012e1efb	[X86] Add vector mulo with power of two operand tests; NFC llvm-svn: 355544	2019-03-06 20:25:49 +00:00
Paul Robinson	05efe0fdc4	[PS4] Emit a trap after a stack-protector fail call. llvm-svn: 355542	2019-03-06 19:57:43 +00:00
Simon Pilgrim	cdf95f8f07	[DAGCombiner] Enable UADDO/USUBO vector combine support Differential Revision: https://reviews.llvm.org/D58965 llvm-svn: 355517	2019-03-06 16:11:03 +00:00
Alexander Kornienko	3d467a890e	Revert "[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default" This reverts commit `2a0f2c5ef3` (r355490). The commit causes an assertion failure when compiling LLVM code: $ cat repro.cpp class QQQ { public: bool x() const; bool y() const; unsigned getSizeInBits() const { if (y() \|\| x()) return getScalarSizeInBits(); return getScalarSizeInBits() * 2; } unsigned getScalarSizeInBits() const; }; int f(const QQQ &Ty) { switch (Ty.getSizeInBits()) { case 1: case 8: return 0; case 16: return 1; case 32: return 2; case 64: return 3; default: __builtin_unreachable(); } } $ clang -O2 -o repro.o repro.cpp assert.h assertion failed at llvm/include/llvm/ADT/ilist_iterator.h:139 in llvm::ilist_iterator::reference llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void>, true, false>::operator() const [OptionsT = llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void>, IsReverse = true, IsConst = false]: !NodePtr->isKnownSentinel() Check failure stack trace: * @ 0x558aab4afc10 __assert_fail @ 0x558aa885479b llvm::ilist_iterator<>::operator() @ 0x558aa8854715 llvm::MachineInstrBundleIterator<>::operator() @ 0x558aa92c33c3 llvm::X86InstrInfo::optimizeCompareInstr() @ 0x558aa9a9c251 (anonymous namespace)::PeepholeOptimizer::optimizeCmpInstr() @ 0x558aa9a9b371 (anonymous namespace)::PeepholeOptimizer::runOnMachineFunction() @ 0x558aa99a4fc8 llvm::MachineFunctionPass::runOnFunction() @ 0x558aab019fc4 llvm::FPPassManager::runOnFunction() @ 0x558aab01a3a5 llvm::FPPassManager::runOnModule() @ 0x558aab01aa9b (anonymous namespace)::MPPassManager::runOnModule() @ 0x558aab01a635 llvm::legacy::PassManagerImpl::run() @ 0x558aab01afe1 llvm::legacy::PassManager::run() @ 0x558aa5914769 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly() @ 0x558aa5910f44 clang::EmitBackendOutput() @ 0x558aa5906135 clang::BackendConsumer::HandleTranslationUnit() @ 0x558aa6d165ad clang::ParseAST() @ 0x558aa6a94e22 clang::ASTFrontendAction::ExecuteAction() @ 0x558aa590255d clang::CodeGenAction::ExecuteAction() @ 0x558aa6a94840 clang::FrontendAction::Execute() @ 0x558aa6a38cca clang::CompilerInstance::ExecuteAction() @ 0x558aa4e2294b clang::ExecuteCompilerInvocation() @ 0x558aa4df6200 cc1_main() @ 0x558aa4e1b37f ExecuteCC1Tool() @ 0x558aa4e1a725 main @ 0x7ff20d56abbd __libc_start_main @ 0x558aa4df51c9 _start llvm-svn: 355515	2019-03-06 15:23:50 +00:00
Simon Pilgrim	1bdc2d1874	[DAGCombiner] Add SADDO/SSUBO combine support Basic constant handling folds, for both scalars and vectors Differential Revision: https://reviews.llvm.org/D58967 llvm-svn: 355506	2019-03-06 14:22:21 +00:00
Roman Lebedev	4764310505	[X86][NFC] Autogenerate check lines in cmovcmov.ll test Investigating 8-bit cmov promotion, this test comes up. llvm-svn: 355496	2019-03-06 11:47:43 +00:00
Simon Pilgrim	642f53d292	[DAGCombiner] Enable SMULO/UMULO vector combine support (PR40442) Differential Revision: https://reviews.llvm.org/D58968 llvm-svn: 355495	2019-03-06 11:04:21 +00:00
Simon Pilgrim	468bb2e601	[X86][SSE] VSELECT(XOR(Cond,-1), LHS, RHS) --> VSELECT(Cond, RHS, LHS) As noticed on D58965 DAGCombiner::visitSELECT has something similar, so we should be able to move this to DAGCombiner and support VSELECT as well at some point. Differential Revision: https://reviews.llvm.org/D58974 llvm-svn: 355494	2019-03-06 10:54:43 +00:00

... 2 3 4 5 6 ...

13828 Commits