llvm-project

Commit Graph

Author	SHA1	Message	Date
Guy Blank	509d1b2a5a	[X86][AVX512] regenerate avx512-insert-extract.ll llvm-svn: 307654	2017-07-11 11:51:49 +00:00
Serguei Katkov	0e831c996c	Revert Revert [MBP] do not rotate loop if it creates extra branch This is a second attempt to land this patch. The first one crashed the clang sanitizer buildbot. The fix is included and a regression test is added. This is the last fix for the corner case of PR32214. Actually, this is not really a corner case in general. We should not do a loop rotation if it creates an additional branch. Consider a loop chain H, M, B, C, where: H is the header with viable fallthrough from the pre-header and an exit from the loop; M is some middle block; B is the backedge to the header, also with an exit from the loop; C is some cold block of the loop. Suppose H is determined to be the best exit. If we rotate the loop to M, B, C, H we can introduce an extra branch. Let's compute the change in the number of branches: +1 branch from pre-header to header; -1 branch from header to exit; +1 branch from header to middle block, if there is one; -1 branch from cold block to header, if there is one. So if C is not a predecessor of H, we introduce an extra branch. This change prohibits rotation of the loop if both of the following are true: (1) the best exit has the next element in the chain as a successor; (2) the last element in the chain is not a predecessor of the first element of the chain. Reviewers: iteratee, xur, sammccall, chandlerc Reviewed By: iteratee Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34745 llvm-svn: 307631	2017-07-11 08:34:58 +00:00
Igor Breger	324d3791f8	[GlobalISel][X86] Use correct AND instructions. AND8ri8 not supported in 64bit. llvm-svn: 307630	2017-07-11 08:04:51 +00:00
Serguei Katkov	0b7b59ada3	[CGP] Relax a bit restriction for optimizeMemoryInst to extend scope CodeGenPrepare::optimizeMemoryInst contains a check that makes it do nothing if all instructions that compute the address for a memory instruction are in the same block as the memory instruction itself. However, if any of these instructions is placed after the memory instruction, the address calculation will not be folded into the memory instruction. The added test case shows an example. Reviewers: loladiro, spatel, efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34862 llvm-svn: 307628	2017-07-11 06:24:44 +00:00
Matthias Braun	b38736706e	Revert "[DAG] Improve Aliasing of operations to static alloca" Reverting as it breaks tramp3d-v4 in the llvm test-suite. I added some comments to https://reviews.llvm.org/D33345 about it. This reverts commit r307546. llvm-svn: 307589	2017-07-10 20:51:30 +00:00
Andrew V. Tischenko	ae9d6db769	[X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler Part-1 (PR28573). The new version of the model is definitely faster. Differential Revision: https://reviews.llvm.org/D35198 llvm-svn: 307552	2017-07-10 16:36:03 +00:00
Nirav Dave	163e1ad9dc	[DAG] Improve Aliasing of operations to static alloca Memory accesses offset from frame indices may alias, e.g., we may merge writes from function arguments passed on the stack when they are contiguous. As a result, when checking aliasing, we consider the underlying frame index's offset from the stack pointer. Static allocas are realized as stack objects in SelectionDAG, but their offsets are not set until post-DAG, causing DAGCombiner's alias check to frequently treat accesses to static allocas as aliasing. Modify isAlias to consider access between static allocas and access from other frame objects to be considered aliasing. Many test changes are included here. Most are fixes for tests which indirectly relied on our aliasing ability and needed to be modified to preserve their original intent. The remaining tests have minor improvements due to relaxed ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll, which has a minor degradation even though the pre-legalized DAG is improved. Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand Reviewed By: rnk Subscribers: sdardis, nemanjai, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33345 llvm-svn: 307546	2017-07-10 15:39:41 +00:00
Gadi Haber	f4d154c089	This patch completely replaces the scheduling information for the SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target. The SandyBridge architects have provided us with more accurate information about each instruction's latency, number of uOps and used ports, and I used it to replace the existing estimated SNB instruction scheduling and to add missing scheduling information. Please note that the patch extensively affects the X86 MC instr scheduling for SNB. Also note that this patch will be followed by additional patches for the remaining target architectures HSW, IVB, BDW, SKL and SKX. The updated and extended information about each instruction includes the following details: • static latency of the instruction • number of uOps of which the instruction consists • all ports used by the instruction's uOps. For example, the following code dictates that the instructions ADC64mr, ADC8mr, SBB64mr and SBB8mr have a static latency of 9 cycles. Each of these instructions is decoded into 6 micro-operations which use port 4, ports 2 or 3, port 0, and ports 0, 1 or 5: def SBWriteResGroup94 : SchedWriteRes<[SBPort4,SBPort23,SBPort0,SBPort015]> { let Latency = 9; let NumMicroOps = 6; let ResourceCycles = [1,2,2,1]; } def: InstRW<[SBWriteResGroup94], (instregex "ADC64mr")>; def: InstRW<[SBWriteResGroup94], (instregex "ADC8mr")>; def: InstRW<[SBWriteResGroup94], (instregex "SBB64mr")>; def: InstRW<[SBWriteResGroup94], (instregex "SBB8mr")>; Note that apart from the header, most of the X86SchedSandyBridge.td file was generated by a script. Reviewers: zvi, chandlerc, RKSimon, m_zuckerman, craig.topper, igorb Differential Revision: https://reviews.llvm.org/D35019#inline-304691 llvm-svn: 307529	2017-07-10 09:53:16 +00:00
Igor Breger	d8b51e134e	[GlobalISel][X86] Support G_LOAD/G_STORE i1. Summary: Support G_LOAD/G_STORE i1. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35178 llvm-svn: 307527	2017-07-10 09:26:09 +00:00
Igor Breger	d48c5e4855	[GlobalISel][X86] extend G_ZEXT support. Summary: Mark G_ZEXT/G_SEXT i1 to i8/i16, i8 to i16 as legal. Support G_ZEXT i1 to i8/i16 instruction selection (C++ code). This patch is required to support G_LOAD/G_STORE i1. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D35177 llvm-svn: 307526	2017-07-10 09:07:34 +00:00
Davide Italiano	c4b0ccd049	[X86] Relax an assertion when legalizing vector types. WidenVSELECTAndMask can fold (and it folds in this case) so we get a BUILD_VECTOR of constants as mask. convertMask() seems to work fine when the input is a vector of constants, and we still need to call it to extend/add elements at the end, but the current code just asserts on anything but a SETCC or AND/OR/XOR of 2xSETCC. This change was discussed briefly with Simon Pilgrim, who also suggests we might consider dropping this assertion in the future. Fixes PR33715. llvm-svn: 307508	2017-07-09 19:22:48 +00:00
Simon Pilgrim	8247687e0f	[X86][AVX512] Regenerate AVX512VL comparison tests. Show poor codegen on KNL targets as mentioned on D35179 llvm-svn: 307500	2017-07-09 15:47:43 +00:00
Igor Breger	769cd05232	[GlobalISel][X86] Add legalizer tests for G_LOAD/G_STORE operations. NFC. llvm-svn: 307494	2017-07-09 07:25:57 +00:00
Igor Breger	b80b44b7b9	[FastISel] fix a fallback diagnostic. Summary: FastISel was marked as failed in case instruction selection succeeded. Reviewers: qcolombet, zvi, rovka, ab Reviewed By: zvi Subscribers: javed.absar, ab, qcolombet, bogner, llvm-commits Differential Revision: https://reviews.llvm.org/D34438 llvm-svn: 307489	2017-07-09 05:55:20 +00:00
Hiroshi Inoue	713b5ba2de	fix trivial typos; NFC sucessor -> successor llvm-svn: 307488	2017-07-09 05:54:44 +00:00
Sanjay Patel	18ee908ca2	[x86] add SBB optimization for SETBE (ule) condition code x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess with missing optimizations. We handle some patterns, but miss logical variants. To clean that up, we should convert all select-of-constants to logic/math and enhance the combining for the expected patterns from that. Selecting 0 or -1 needs extra attention to produce the optimal code as shown here. Attempt to verify that all of these IR forms are logically equivalent: http://rise4fun.com/Alive/plxs Earlier steps in this series: rL306040 rL306072 rL307404 (D34652) As acknowledged in the earlier review, there's a possibility that some Intel uarch would prefer to produce an xor to clear the fake register operand with sbb %eax, %eax. This will likely need to be addressed in a separate pass. llvm-svn: 307471	2017-07-08 14:04:48 +00:00
Sanjay Patel	dd36f75733	[x86] add SBB optimization for SETAE (uge) condition code x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess with missing optimizations. We handle some patterns, but miss logical variants. To clean that up, we should convert all select-of-constants to logic/math and enhance the combining for the expected patterns from that. DAGCombiner already has the foundation to allow the transforms, so we just need to fill in the holes for x86 math op lowering. Selecting 0 or -1 needs extra attention to produce the optimal code as shown here. Attempt to verify that all of these IR forms are logically equivalent: http://rise4fun.com/Alive/plxs Earlier steps in this series: rL306040 rL306072 Differential Revision: https://reviews.llvm.org/D34652 llvm-svn: 307404	2017-07-07 14:56:20 +00:00
Wei Mi	20526b2725	[ConstHoisting] choose to hoist when frequency is the same. The patch is to adjust the strategy of frequency based consthoisting: Previously when the candidate block has the same frequency with the existing blocks containing a const, it will not hoist the const to the candidate block. For that case, now we change the strategy to hoist the const if only existing blocks have more than one block member. This is helpful for reducing code size. Differential Revision: https://reviews.llvm.org/D35084 llvm-svn: 307328	2017-07-06 22:32:27 +00:00
Simon Pilgrim	a80cb1d7a7	[X86][SSE] Tests for bitcasting iX integers to vXi1 boolean vectors Including sign/zero extension to legal types llvm-svn: 307301	2017-07-06 19:33:10 +00:00
Simon Pilgrim	0fee3372c9	[X86][SSE] Dropped -mcpu from bitcast+setcc tests Use triple and attribute only for consistency Added SSE2/AVX tests on 256-bit vectors to test PACKSS behaviour llvm-svn: 307289	2017-07-06 18:27:34 +00:00
Wei Mi	90707394e3	[LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale. When the formulae search space is huge, LSR uses a series of heuristics to keep pruning the search space until the number of possible solutions is within a certain limit. The big hammer of the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs, which picks the register which is used by the most LSRUses and deletes the other formulae which don't use the register. This is an effective way to prune the search space, but quite often not a good way to keep the best solution. We have seen cases where the heuristic pruned the best formula candidate out of the search space. To relieve the problem, we introduce a new heuristic called NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is that, in order to reduce the search space while keeping the best formula, we want to keep as many formulae with different Scale and ScaledReg as possible. That is because the central idea of LSR is to choose a group of loop induction variables and use those induction variables to represent LSRUses. An induction variable candidate is often represented by the Scale and ScaledReg in a formula. If we have more formulae with different ScaledReg and Scale to choose from, we have a better opportunity to find the best solution. That is why we believe pruning the search space by only keeping the best formula with the same Scale and ScaledReg should be more effective than PickingWinnerReg. We use two criteria to choose the best formula with the same Scale and ScaledReg. The first criterion is to select the formula using fewer non-shared registers, and the second criterion is to select the formula with the lower cost as computed by RateFormula. The patch implements the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which is the last resort. Testing shows we get 1.8% and 2% on two internal benchmarks on x86. llvm nightly testsuite performance is neutral. 
We also tried lsr-exp-narrow and it didn't help on the two improved internal cases we saw. Differential Revision: https://reviews.llvm.org/D34583 llvm-svn: 307269	2017-07-06 15:52:14 +00:00
Simon Pilgrim	713600747e	[X86][SSE4A] Add support for shuffle combining to INSERTQI. llvm-svn: 307268	2017-07-06 15:34:17 +00:00
Simon Pilgrim	03641df383	[X86][SSE4A] Add test showing missed opportunities to combine INSERTQI shuffle llvm-svn: 307265	2017-07-06 14:52:24 +00:00
Sanjay Patel	2a341620e7	[x86] fix over-specified triple and auto-generate checks; NFC llvm-svn: 307262	2017-07-06 14:15:15 +00:00
Simon Pilgrim	cc0f785dca	[X86][SSE4A] Add support for shuffle combining to EXTRQ. llvm-svn: 307254	2017-07-06 12:22:58 +00:00
Simon Pilgrim	40c0ae200f	[X86][SSE4A] Add scheduling tests for SSE4A instructions llvm-svn: 307251	2017-07-06 11:26:43 +00:00
Simon Pilgrim	ac78daf517	[DAGCombiner] Fold (rot x, 0) -> x llvm-svn: 307184	2017-07-05 18:27:11 +00:00
Simon Pilgrim	49123d4bb0	[X86] Test bitfield loadstore tests on i686 as well llvm-svn: 307182	2017-07-05 18:09:30 +00:00
Andrew Zhogin	45d192823e	[DAGCombiner] visitRotate patch to optimize pair of ROTR/ROTL instructions into one with combined shift operand. For two ROTR operations with shifts C1, C2; combined shift operand will be (C1 + C2) % bitsize. Differential revision: https://reviews.llvm.org/D12833 llvm-svn: 307179	2017-07-05 17:55:42 +00:00
Simon Pilgrim	55006b407b	[X86][SSE] Dropped -mcpu from bitcast+setcc mask tests Use triple and attribute only for consistency llvm-svn: 307176	2017-07-05 17:30:30 +00:00
Igor Breger	55e2f5963a	[GlobalIsel] allow x86_fp80 values to be dumped. Summary: Otherwise the fallback path fails with an assertion on x86_64 targets, when "x86_fp80" is encountered. Reviewers: t.p.northover, zvi, guyblank Reviewed By: zvi Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34975 llvm-svn: 307140	2017-07-05 11:11:10 +00:00
Nirav Dave	b320ef9fab	Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset Relanding after rewriting undef.ll test to avoid host-dependent endianness. As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using generic checks. Also, propagate missing local handling from there to BaseIndexOffset checks. Tests of note: * test/CodeGen/X86/build-vector* - Improved. * test/CodeGen/BPF/undef.ll - Improved store alignment allows an additional store merge * test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a case we already do not handle well. Here, the DAG is improved, but scheduling causes a code size degradation. Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D34472 llvm-svn: 307114	2017-07-05 01:21:23 +00:00
Gadi Haber	689426e3cb	NFC. Made some updates to the half.ll test under CodeGen to make it friendly to the update_llc_test_checks.py tool as follows: 1. Removing the llc flag -asm-verbose=false 2. Grouping the multiple check-prefix directives 3. Applying the update_llc_test_checks.py tool to the test. This change is needed to easily update scheduling changes in an upcoming patch. Reviewers: zvi, RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D34934 llvm-svn: 307108	2017-07-04 21:51:05 +00:00
Simon Pilgrim	ac3e7f3f57	[X86][SSE4A] Add support for combining from non-v16i8 EXTRQI/INSERTQI shuffles With the improved shuffle decoding we can now combine EXTRQI/INSERTQI shuffles from non-v16i8 vector types llvm-svn: 307099	2017-07-04 18:11:02 +00:00
Anna Thomas	505941e7d6	[FastISel] Move gc intrinsic test to X86 directory Move from generic to X86 directory since gc intrinsics are only supported on 64-bit X86. Add target triple as well. Fixes build failure in i686-linux-RA caused by rL307084. llvm-svn: 307086	2017-07-04 15:24:08 +00:00
Simon Pilgrim	d128222f0c	[X86] Add combine tests for vector rotates Reference tests for D12833 llvm-svn: 307073	2017-07-04 12:33:53 +00:00
Gadi Haber	4980790e81	NFC commit. Converting the Codegen test "extractelement-legalization-store-ordering.ll" to be "update_llc_test_checks" friendly. The changes to the test are needed for an upcoming scheduling patch. Reviewers: zvi, RKSimon Differential Revision: https://reviews.llvm.org/D34935 llvm-svn: 307066	2017-07-04 07:18:03 +00:00
Craig Topper	ad140cfb68	[X86] Add comment string for broadcast loads from the constant pool. Summary: When broadcasting from the constant pool its useful to print out the final vector similar to what we do for normal moves from the constant pool. I changed only a couple tests that were broadcast focused. One of them had been previously hand tweaked after running the script so that it could check the constant pool declaration. But I think this patch makes that unnecessary now since we can check the comment instead. Reviewers: spatel, RKSimon, zvi Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34923 llvm-svn: 307062	2017-07-04 05:46:11 +00:00
Anton Yartsev	66d32c5e06	[legalize-types] Clean up softening machinery. The patch makes SoftenFloatResult/Operand logic just the same as all other legalization routines have: SoftenFloatResult() now fills the SoftenFloats map and SoftenFloatOperand() performs all needed replacements. This prevents the softening machinery from leaving stale entries in the SoftenFloats map (that resulted in errors during the legalize type checking) and clarifies softening. The patch replaces https://reviews.llvm.org/D29265. Differential Revision: https://reviews.llvm.org/D31946 llvm-svn: 307053	2017-07-04 01:08:55 +00:00
Simon Pilgrim	fa6e675267	[X86][SSE4A] Add support for combining from EXTRQI/INSERTQI shuffles llvm-svn: 307048	2017-07-03 20:58:16 +00:00
Simon Pilgrim	bdfb3b1d5f	[X86][SSE4A] Add SSE4A shuffle tests on pre-SSSE3 hardware llvm-svn: 307042	2017-07-03 16:53:11 +00:00
Simon Pilgrim	b5c68a6717	[X86][SSE4A] Test SSE4A shuffle combining on SSE42 capable target as well llvm-svn: 307038	2017-07-03 15:55:54 +00:00
Zvi Rackover	d7a1c334ce	DAGCombine: Combine BUILD_VECTOR to TRUNCATE Summary: Add a combine for creating a truncate to replace a build_vector composed of extracts with indices that form a stride-2^N series. Example: v8i32 V = ... v4i32 build_vector((extract_elt V, 0), (extract_elt V, 2), (extract_elt V, 4), (extract_elt V, 6)) --> v4i32 truncate (bitcast V to v4i64) Related discussion in llvm-dev about canonicalizing shuffles to truncates in LLVM IR: http://lists.llvm.org/pipermail/llvm-dev/2017-January/108936.html. Reviewers: spatel, RKSimon, efriedma, igorb, craig.topper, wolfgangp, delena Reviewed By: delena Subscribers: guyblank, delena, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D34077 llvm-svn: 307036	2017-07-03 15:47:40 +00:00
Sanjay Patel	e9b1d16a8c	[x86] auto-generate complete checks for tests; NFC These all used 'CHECK-NOT' which isn't necessary if we have complete checks. There were also over-specifications in the RUN params such as CPU model. llvm-svn: 307033	2017-07-03 15:27:19 +00:00
Sanjay Patel	d3173740fd	[x86] auto-generate complete checks for tests; NFC These all used 'CHECK-NOT' which isn't necessary if we have complete checks. There were also several over-specifications in the RUN params such as CPU model or OS requirement llvm-svn: 307028	2017-07-03 15:04:05 +00:00
Simon Pilgrim	decfaca033	[X86][SSE4A] Add tests showing missed opportunities to combine EXTRQI/INSERTQI shuffles llvm-svn: 307027	2017-07-03 15:01:07 +00:00
Sanjay Patel	dab798a25f	[x86] auto-generate complete checks for tests; NFC These all used 'CHECK-NOT' which isn't necessary if we have complete checks. llvm-svn: 307024	2017-07-03 14:29:45 +00:00
Igor Breger	5c787ab346	[GlobalISel][X86] fix %ptr(p0) = G_CONSTANT selection. llvm-svn: 307019	2017-07-03 11:06:54 +00:00
Simon Pilgrim	f05c5ef441	[X86][AVX512] Test AVX512VPOPCNTDQ CTPOP with/without AVX512BW llvm-svn: 306991	2017-07-02 19:52:20 +00:00
Simon Pilgrim	a9655ffb42	[X86][AVX512VPOPCNTDQ] Improve support for v16i8/v8i16/v16i16/ CTPOP Zero extend to v16i32/v8i64, use VPOPCNTDQ instructions and truncate back. llvm-svn: 306990	2017-07-02 19:32:37 +00:00
Simon Pilgrim	3f5ed96f92	[X86][AVX512] Cleanup tzcnt tests triples and attributes Avoid use of specific -mcpu llvm-svn: 306989	2017-07-02 18:51:48 +00:00
Simon Pilgrim	df55dd09d6	[X86][AVX512] Cleanup popcnt tests triples and attributes Avoid use of specific -mcpu llvm-svn: 306988	2017-07-02 18:35:22 +00:00
Sanjay Patel	7d263c1a27	[x86] auto-generate complete checks for tests; NFC These all used 'CHECK-NOT' which isn't necessary if we have complete checks. llvm-svn: 306984	2017-07-02 15:24:08 +00:00
Sanjay Patel	dd076f0178	[x86] remove unnecessary RUN for test after auto-generating checks; NFC llvm-svn: 306983	2017-07-02 15:16:17 +00:00
Sanjay Patel	c22223e6cd	[x86] update test to use FileCheck and auto-generate checks; NFC llvm-svn: 306982	2017-07-02 15:15:18 +00:00
Sanjay Patel	27cccc96c2	[x86] auto-generate complete checks for tests; NFC These all used 'CHECK-NOT' which isn't necessary if we have complete checks. llvm-svn: 306981	2017-07-02 14:50:35 +00:00
Simon Pilgrim	8971b2904e	[X86][SSE] Attempt to combine 64-bit and 32-bit shuffles to unary shuffles before bit shifts We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive llvm-svn: 306978	2017-07-02 14:16:25 +00:00
Simon Pilgrim	4cb5613c38	[X86][SSE] Attempt to combine 64-bit and 16-bit shuffles to unary shuffles before bit shifts We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive The 32-bit shuffles are a bit tricky and will be dealt with in a later patch llvm-svn: 306977	2017-07-02 13:19:10 +00:00
Simon Pilgrim	638af5f1c4	[X86][SSE] Add test showing missed opportunity to combine to pshuflw We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive llvm-svn: 306976	2017-07-02 12:56:10 +00:00
Gadi Haber	dc25c2b08b	[X86] Rerun "update_llc_test_checks" tool on CodeGen tests. NFC. This is NFC after rerunning the "update_llc_test_checks.py" tool on the CodeGen X86 tests in order to submit a patch. Minor differences due to added "End of Function" lines. Reviewers: zvi Differential Revision: https://reviews.llvm.org/D34933 llvm-svn: 306973	2017-07-02 12:01:33 +00:00
Igor Breger	717bd36c83	[GlobalISel][X86] Support G_GLOBAL_VALUE operation. Summary: Support G_GLOBAL_VALUE operation. For now most of the PIC configurations are not implemented yet. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34738 Conflicts: test/CodeGen/X86/GlobalISel/regbankselect-X86_64.mir llvm-svn: 306972	2017-07-02 08:58:29 +00:00
Igor Breger	b186a69aa5	[GlobalISel][X86] Support vector type G_UNMERGE_VALUES selection. Summary: Support vector type G_UNMERGE_VALUES selection. For now G_UNMERGE_VALUES is marked as legal for any type, so there is nothing to do in the legalizer. Reviewers: t.p.northover, qcolombet, zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D33665 llvm-svn: 306971	2017-07-02 08:15:49 +00:00
Hiroshi Inoue	bb703e8960	fix trivial typos; NFC suport -> support llvm-svn: 306968	2017-07-02 03:24:54 +00:00
Simon Pilgrim	3bad6f3167	[X86][RDSEED] Split off i64 intrinsic tests and test i16/i32 on 32-bit target as well. llvm-svn: 306961	2017-07-01 16:42:16 +00:00
Simon Pilgrim	2d320161e5	[X86][RDRAND] Split off i64 intrinsic tests and test i16/i32 on 32-bit target as well. llvm-svn: 306960	2017-07-01 16:41:12 +00:00
Simon Pilgrim	2b679e1812	[X86] Removed reference to update_test_checks.py llvm-svn: 306959	2017-07-01 16:34:29 +00:00
Simon Pilgrim	ad7f0844ea	[X86][AVX] Remove duplicate autogeneration note llvm-svn: 306958	2017-07-01 16:32:02 +00:00
Nirav Dave	a35938d827	Revert "[DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset" This reverts commit r306819, which appears to be exposing underlying issues in a stage1 ppc64be build llvm-svn: 306820	2017-06-30 12:56:02 +00:00
Nirav Dave	c5a48c1ee8	[DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using generic checks. Also, propagate missing local handling from there to BaseIndexOffset checks. Tests of note: * test/CodeGen/X86/build-vector* - Improved. * test/CodeGen/BPF/undef.ll - Improved store alignment allows an additional store merge * test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a case we already do not handle well. Here, the DAG is improved, but scheduling causes a code size degradation. Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D34472 llvm-svn: 306819	2017-06-30 12:23:41 +00:00
Simon Pilgrim	e5e9232260	[X86] Updated 32-bit memcmp tests to run with/without SSE2 llvm-svn: 306816	2017-06-30 11:23:59 +00:00
Taewook Oh	0e35ea3b7c	Remove redundant copy in recurrences Summary: If there is a chain of instructions formulating a recurrence, commuting operands can help remove a redundant copy. In the following example code, ``` BB#1: ; Loop Header %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13 ... BB#6: ; Loop Latch %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15 %vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0 %vreg3<def,tied1> = ADD32rr %vreg2<kill,tied0>, %vreg10<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2,%vreg10 CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3 %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3 JL_1 <BB#1>, %EFLAGS<imp-use,kill> ``` Existing two-address generation pass generates following code: ``` BB#1: %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13 ... BB#6: Predecessors according to CFG: BB#5 BB#4 %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15 %vreg10<def> = COPY %vreg1<kill>; GR32:%vreg10,%vreg1 %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg0 %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10 %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2 CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3 %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3 JL_1 <BB#1>, %EFLAGS<imp-use,kill> JMP_1 <BB#7> ``` This is suboptimal because the assembly code generated has a redundant copy at the end of BB#6 to feed %vreg13 to BB#1: ``` .LBB0_6: addl %esi, %edi addl %ebx, %edi cmpl $10, %edi movl %edi, %esi jl .LBB0_1 ``` This redundant copy can be eliminated by making instructions in the recurrence chain compute the value "into" the register that actually holds the feedback value. In this example, this can be achieved by commuting %vreg0 and %vreg1 to compute %vreg10. 
With that change, code after two-address generation becomes ``` BB#1: %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13 ... BB#6: derived from LLVM BB %bb7 Predecessors according to CFG: BB#5 BB#4 %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15 %vreg10<def> = COPY %vreg0<kill>; GR32:%vreg10,%vreg0 %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg1<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1 %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10 %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2 CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3 %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3 JL_1 <BB#1>, %EFLAGS<imp-use,kill> JMP_1 <BB#7> ``` and the final assembly does not have redundant copy: ``` .LBB0_6: addl %edi, %eax addl %ebx, %eax cmpl $10, %eax jl .LBB0_1 ``` Reviewers: qcolombet, MatzeB, wmi Reviewed By: wmi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31821 llvm-svn: 306758	2017-06-29 23:11:24 +00:00
Daniel Jasper	559aa75382	Revert "r306529 - [X86] Correct dwarf unwind information in function epilogue" I am 99% sure that this breaks the PPC ASAN build bot: http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/3112/steps/64-bit%20check-asan/logs/stdio If it doesn't go back to green, we can recommit (and fix the original commit message at the same time :) ). llvm-svn: 306676	2017-06-29 13:58:24 +00:00
Igor Breger	0cddd34876	[GlobalISel][X86] Support vector type G_MERGE_VALUES selection. Summary: Support vector type G_MERGE_VALUES selection. For now G_MERGE_VALUES is marked as legal for any type, so there is nothing to do in the legalizer. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D33958 llvm-svn: 306665	2017-06-29 12:08:28 +00:00
Simon Pilgrim	9a68e69c68	[X86][SSE] Dropped -mcpu from palignr tests Use triple and attribute only for consistency Add AVX tests as well llvm-svn: 306664	2017-06-29 11:13:39 +00:00
Simon Pilgrim	e2eacbfc23	[X86][SSE] Regenerate shuffle test with update_llc_test_checks.py llvm-svn: 306663	2017-06-29 11:11:37 +00:00
Simon Pilgrim	0afe97f480	[X86][SSE] Dropped -mcpu from vector shift tests Use triple and attribute only for consistency llvm-svn: 306662	2017-06-29 11:09:53 +00:00
Simon Pilgrim	91539ce2d3	[X86][SSE] Dropped -mcpu from zero insertion tests Use triple and attribute only for consistency llvm-svn: 306661	2017-06-29 11:08:11 +00:00
Michael Zuckerman	4bcb9c3349	[LLVM][X86][Goldmont] Adding new target-cpu: Goldmont [LLVM SIDE] Connecting the Goldmont processor to its features. Reviewers: 1. igorb 2. zvi 3. delena 4. RKSimon 5. craig.topper Differential Revision: https://reviews.llvm.org/D34504 llvm-svn: 306658	2017-06-29 10:00:33 +00:00
Zvi Rackover	da3943d600	[X86] Adding shuffle tests demonstrating missed vcompress opportunities. NFC llvm-svn: 306646	2017-06-29 06:22:01 +00:00
Chih-Hung Hsieh	514dafdae3	Another test commit. llvm-svn: 306567	2017-06-28 17:12:51 +00:00
Simon Pilgrim	48b30c3d55	[X86] Added BSWAP tests for illegal i64/i128/i256 'wide' scalar integers llvm-svn: 306546	2017-06-28 14:07:50 +00:00
Simon Pilgrim	4f5fcb03ad	[X86][SSE] Dropped -mcpu from vector bswap tests Use triple and attribute only for consistency llvm-svn: 306545	2017-06-28 13:59:15 +00:00
Michael Zuckerman	d0e663a697	[X86][LLVM][test] Expanding Supports lowerInterleavedStore() in X86InterleavedAccess test. Expanding the test to include AVX target. Adding base test (to trunk) for Store stride=4 vf=32. llvm-svn: 306543	2017-06-28 13:42:45 +00:00
Igor Breger	86cf07a32e	[GlobalISel][X86] Test G_CONSTANT i32 0 TableGen'erated selection.NFC. llvm-svn: 306537	2017-06-28 12:43:21 +00:00
Igor Breger	d5b59cf914	[GlobalISel][X86] Support bitwise operations : G_AND, G_OR, G_XOR Summary: Support G_AND, G_OR, G_XOR for i8/i16/i32/i64. Selection done via TableGen'erated code. Reviewers: zvi, guyblank, aymanmus, m_zuckerman Reviewed By: aymanmus Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34605 llvm-svn: 306533	2017-06-28 11:39:04 +00:00
Michael Zuckerman	f66840020c	Reverting commit 306414 on behalf of @gadi.haber llvm-svn: 306532	2017-06-28 11:23:31 +00:00
Simon Pilgrim	b9fa16bc53	[X86][AVX2] Dropped -mcpu from avx2 arithmetic/intrinsics tests Use triple and attribute only for consistency llvm-svn: 306531	2017-06-28 10:54:54 +00:00
Petar Jovanovic	7b3a38ec30	[X86] Correct dwarf unwind information in function epilogue CFI instructions that set appropriate cfa offset and cfa register are now inserted in emitEpilogue() in X86FrameLowering. Majority of the changes in this patch: 1. Ensure that CFI instructions do not affect code generation. 2. Enable maintaining correct information about cfa offset and cfa register in a function when basic blocks are reordered, merged, split, duplicated. These changes are target independent and described below. Changed CFI instructions so that they: 1. are duplicable 2. are not counted as instructions when tail duplicating or tail merging 3. can be compared as equal Add information to each MachineBasicBlock about cfa offset and cfa register that are valid at its entry and exit (incoming and outgoing CFI info). Add support for updating this information when basic blocks are merged, split, duplicated, created. Add a verification pass (CFIInfoVerifier) that checks that outgoing cfa offset and register of predecessor blocks match incoming values of their successors. Incoming and outgoing CFI information is used by a late pass (CFIInstrInserter) that corrects CFA calculation rule for a basic block if needed. That means that additional CFI instructions get inserted at basic block beginning to correct the rule for calculating CFA. Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D18046 llvm-svn: 306529	2017-06-28 10:21:17 +00:00
Sanjay Patel	4b23fa0abf	[CGP] add specialization for memcmp expansion with only one basic block llvm-svn: 306485	2017-06-27 23:15:01 +00:00
Sanjay Patel	70b36f193d	[CGP] eliminate a sub instruction in memcmp expansion As noted in D34071, there are some IR optimization opportunities that could be handled by normal IR passes if this expansion wasn't happening so late in CGP. Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, so I'm proposing this change: %s = sub i32 %x, %y %r = icmp ne %s, 0 => %r = icmp ne %x, %y Changing the predicate to 'eq' mimics what InstCombine would do, so that's just an efficiency improvement if we decide this expansion should happen sooner. The fact that the PowerPC backend doesn't eliminate the 'subf.' might be something for PPC folks to investigate separately. Differential Revision: https://reviews.llvm.org/D34416 llvm-svn: 306471	2017-06-27 21:46:34 +00:00
Chih-Hung Hsieh	ff680f0386	Another test commit llvm-svn: 306420	2017-06-27 16:18:41 +00:00
Gadi Haber	13759a7ed6	Updated and extended the information about each instruction in HSW and SNB to include the following data: • static latency • number of uOps of which the instruction consists • all ports used by the instruction Reviewers: RKSimon zvi aymanmus m_zuckerman Differential Revision: https://reviews.llvm.org/D33897 llvm-svn: 306414	2017-06-27 15:05:13 +00:00
Ayman Musa	721d97f7b8	Recommitting rL305465 after fixing bug in TableGen in rL306251 & rL306371 [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). AVX512 compare instructions return v*i1 types. In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes. The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class. When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction. Differential Revision: https://reviews.llvm.org/D33188 llvm-svn: 306402	2017-06-27 12:08:37 +00:00
Simon Pilgrim	71d8b67bea	[X86][AVX512] Regenerate avx512 arithmetic tests llvm-svn: 306389	2017-06-27 10:13:56 +00:00
Igor Breger	925f088bae	[GlobalISel][X86] Add fp32/64 legalizer, regbank-select, selection tests for G_FADD, G_FSUB, G_FMUL, G_FDIV. NFC. llvm-svn: 306370	2017-06-27 07:01:54 +00:00
Wolfgang Pieb	9f65858235	DAGCombine: Make sure we only eliminate trunc/extend when the scales of truncation and extension match. This fixes PR33368. Reviewer: rksimon Differential Revision: https://reviews.llvm.org/D34069 llvm-svn: 306345	2017-06-26 23:05:51 +00:00
Sanjay Patel	b859910eb2	[x86] add tests for missing sbb transforms; NFC llvm-svn: 306337	2017-06-26 22:20:07 +00:00
Simon Pilgrim	d58f051792	[X86][SSE] Check SSE2/SSE3 codegen tests on i686 and x86_64 llvm-svn: 306314	2017-06-26 18:20:46 +00:00
Simon Pilgrim	f07663876a	[X86][SSE] Add combine tests for PMULDQ/PMULUDQ Found several missed optimizations while investigating replacing _mm_mul_epi32/_mm_mul_epu32 with generic implementations llvm-svn: 306302	2017-06-26 16:22:52 +00:00
Ahmed Bougacha	58a197414e	[X86][AVX-512] Don't raise inexact in ceil, floor, round, trunc. The non-AVX-512 behavior was changed in r248266 to match N1778 (C bindings for IEEE-754 (2008)), which defined the four functions to not raise the inexact exception ("rint" is still defined as raising it). Update the AVX-512 lowering of these functions to match that: it should not be different. llvm-svn: 306299	2017-06-26 16:00:24 +00:00
Simon Pilgrim	0ad0e5802b	[X86] Add test case for PR15981 llvm-svn: 306296	2017-06-26 15:53:11 +00:00
Sanjay Patel	15748d239e	[x86] transform vector inc/dec to use -1 constant (PR33483) Convert vector increment or decrement to sub/add with an all-ones constant: add X, <1, 1...> --> sub X, <-1, -1...> sub X, <1, 1...> --> add X, <-1, -1...> The all-ones vector constant can be materialized using a pcmpeq instruction that is commonly recognized as an idiom (has no register dependency), so that's better than loading a splat 1 constant. AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better way to produce 512 one-bits. The general advantages of this lowering are: 1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, so in theory, this could be better for perf, but... 2. That seems unlikely to affect any OOO implementation, and I can't measure any real perf difference from this transform on Haswell or Jaguar, but... 3. It doesn't look like it from the diffs, but this is an overall size win because we eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting a scalar load (which might itself be a bug), then we're replacing a scalar constant load + broadcast with a single cheap op, so that should always be smaller/better too. 4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1 and psub x, -1, so we should use that form for +1 too because we can. If there's some reason to favor a constant load on some CPU, let's make the reverse transform for all of these cases (either here in the DAG or in a later machine pass). This should fix: https://bugs.llvm.org/show_bug.cgi?id=33483 Differential Revision: https://reviews.llvm.org/D34336 llvm-svn: 306289	2017-06-26 14:19:26 +00:00
Michael Zuckerman	ce7e187f84	[X86][LLVM][test] Expanding Supports lowerInterleavedStore() in X86InterleavedAccess test. Adding base test (to trunk) for Store stride=4 vf=32. llvm-svn: 306286	2017-06-26 13:27:32 +00:00
Serguei Katkov	0e70206c8f	This reverts commit r306272. Revert "[MBP] do not rotate loop if it creates extra branch" It breaks the sanitizer build bots. Need to fix this. llvm-svn: 306276	2017-06-26 06:51:45 +00:00
Serguei Katkov	b01fff06ed	[MBP] do not rotate loop if it creates extra branch This is a last fix for the corner case of PR32214. Actually this is not really a corner case in general. We should not do a loop rotation if we create an additional branch due to it. Consider the case where we have a loop chain H, M, B, C, where H is the header with viable fallthrough from pre-header and exit from the loop M - some middle block B - backedge to Header but with exit from the loop also. C - some cold block of the loop. Suppose H is determined to be the best exit. If we do a loop rotation M, B, C, H we can introduce an extra branch. Let's compute the change in number of branches: +1 branch from pre-header to header -1 branch from header to exit +1 branch from header to middle block if there is such -1 branch from cold block to header if there is one So if C is not a predecessor of H then we introduce an extra branch. This change actually prohibits rotation of the loop if both are true: 1) Best Exit has next element in chain as successor. 2) Last element in chain is not a predecessor of first element of chain. Reviewers: iteratee, xur Reviewed By: iteratee Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34271 llvm-svn: 306272	2017-06-26 05:27:27 +00:00
Simon Pilgrim	9956364a1f	[X86] Add test case for PR15705 llvm-svn: 306246	2017-06-25 16:12:45 +00:00
Elena Demikhovsky	72f991cded	AVX-512: Fixed a crash during legalization of <3 x i8> type The compiler fails with an assertion during legalization of SETCC for <3 x i8> operands. The result is extended to <4 x i8> and then truncated to <4 x i1>. It does not happen on AVX2, because the final result of SETCC is <4 x i32>. Differential Revision: https://reviews.llvm.org/D34503 llvm-svn: 306242	2017-06-25 13:36:20 +00:00
Igor Breger	f5035d6ee5	[GlobalISel][X86] Support vector type G_EXTRACT selection. Summary: Support vector type G_EXTRACT selection. For now G_EXTRACT marked as legal for any type, so nothing to do in legalizer. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33957 llvm-svn: 306240	2017-06-25 11:42:17 +00:00
Nirav Dave	cedfeb364f	Add bitcast store-merge test. llvm-svn: 306158	2017-06-23 20:52:14 +00:00
Sanjay Patel	3de6bad65f	[x86] fix value types for SBB transform (PR33560) I'm not sure yet why this wouldn't fail in the simple case, but clearly I used the wrong value type with: https://reviews.llvm.org/rL306040 ...and the bug manifests with: https://bugs.llvm.org/show_bug.cgi?id=33560 llvm-svn: 306139	2017-06-23 18:42:15 +00:00
Simon Pilgrim	19cee0d56c	[X86][AVX] Regenerate i256 bitcasted store test Check on slow/fast unaligned memory targets llvm-svn: 306138	2017-06-23 18:34:56 +00:00
Simon Pilgrim	dfa436079f	Regenerate extract-store.ll tests llvm-svn: 306131	2017-06-23 17:19:44 +00:00
Sanjay Patel	021f32fd0f	[x86] auto-generate complete checks; NFC llvm-svn: 306114	2017-06-23 15:29:49 +00:00
Sanjay Patel	02469b63c2	[x86] auto-generate complete checks; NFC llvm-svn: 306113	2017-06-23 15:22:27 +00:00
Sanjay Patel	563e5afa0e	[x86] remove overridden target settings in test; NFC r306109 was supposed to make this change, but I committed the wrong version. llvm-svn: 306110	2017-06-23 15:06:30 +00:00
Sanjay Patel	8e06df4303	[x86] rename test file and auto-generate complete checks; NFC The command-line params override the target setting in the file itself, so delete that. Also, remove the cpu and arch because those don't matter and neither does the OS specification in the triple. llvm-svn: 306109	2017-06-23 14:58:21 +00:00
Simon Pilgrim	859b48d2d3	[X86][AVX] Extended vector average tests Added AVX1 tests and merged AVX1/AVX2/AVX512 checks where possible llvm-svn: 306107	2017-06-23 14:38:00 +00:00
Simon Pilgrim	dbd20ffee1	[X86][SSE] Dropped -mcpu from vector average tests Use triple and attribute only for consistency llvm-svn: 306104	2017-06-23 14:16:50 +00:00
Simon Pilgrim	dbf8f5ace7	[X86][SSE] Dropped -mcpu from scalar math tests Use triple and attribute only for consistency llvm-svn: 306097	2017-06-23 13:07:20 +00:00
Simon Pilgrim	5d3d716815	[X86][SSE] Dropped -mcpu from insertps tests Use triple and attribute only for consistency llvm-svn: 306092	2017-06-23 11:00:49 +00:00
Sanjay Patel	359ae44fb4	[x86] add/sub (X==0) --> sbb(cmp X, 1) This is very similar to the transform in: https://reviews.llvm.org/rL306040 ...but in this case, we use cmp X, 1 to set the carry bit as needed. Again, we can show that all of these are logically equivalent (although InstCombine currently canonicalizes to a form not seen here), and if we believe IACA, then this is the smallest/fastest code. Eg, with SNB: \| Num Of \| Ports pressure in cycles \| \| \| Uops \| 0 - DV \| 1 \| 2 - D \| 3 - D \| 4 \| 5 \| \| --------------------------------------------------------------------- \| 1 \| 1.0 \| \| \| \| \| \| \| cmp edi, 0x1 \| 2 \| \| 1.0 \| \| \| \| 1.0 \| CP \| sbb eax, eax The larger motivation is to clean up all select-of-constants combining/lowering because we're missing some common cases. llvm-svn: 306072	2017-06-22 23:47:15 +00:00
Farhana Aleen	4b652a5335	Supported lowerInterleavedStore() in X86InterleavedAccess. Reviewers: RKSimon, DavidKreitzer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32658 llvm-svn: 306068	2017-06-22 22:59:04 +00:00
Sanjay Patel	ff051957fc	[x86] add more tests for select --> sbb transform; NFC These are siblings of the tests added with r306032. llvm-svn: 306064	2017-06-22 22:17:05 +00:00
Craig Topper	792fc92be2	[AVX-512] Remove and autoupgrade the masked integer compare intrinsics Summary: These intrinsics aren't used by clang and haven't been for a while. There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But as I said these intrinsics aren't used by clang even before this patch so this codegen reflects our clang behavior today. Reviewers: spatel, RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34389 llvm-svn: 306047	2017-06-22 20:11:01 +00:00
Sanjay Patel	41a34e4111	[x86] add/sub (X==0) --> sbb(neg X) Our handling of select-of-constants is lumpy in IR (https://reviews.llvm.org/D24480), lumpy in DAGCombiner, and lumpy in X86ISelLowering. That's why we only had the 'sbb' codegen in 1 out of the 4 tests. This is a step towards smoothing that out. First, show that all of these IR forms are equivalent: http://rise4fun.com/Alive/mx Second, show that the 'sbb' version is faster/smaller. IACA output for SandyBridge (later Intel and AMD chips are similar based on Agner's tables): This is the "obvious" x86 codegen (what gcc appears to produce currently): \| Num Of \| Ports pressure in cycles \| \| \| Uops \| 0 - DV \| 1 \| 2 - D \| 3 - D \| 4 \| 5 \| \| --------------------------------------------------------------------- \| 1* \| \| \| \| \| \| \| \| xor eax, eax \| 1 \| 1.0 \| \| \| \| \| \| CP \| test edi, edi \| 1 \| \| \| \| \| \| 1.0 \| CP \| setnz al \| 1 \| \| 1.0 \| \| \| \| \| CP \| neg eax This is the adc version: \| 1* \| \| \| \| \| \| \| \| xor eax, eax \| 1 \| 1.0 \| \| \| \| \| \| CP \| cmp edi, 0x1 \| 2 \| \| 1.0 \| \| \| \| 1.0 \| CP \| adc eax, 0xffffffff And this is sbb: \| 1 \| 1.0 \| \| \| \| \| \| \| neg edi \| 2 \| \| 1.0 \| \| \| \| 1.0 \| CP \| sbb eax, eax If IACA is trustworthy, then sbb became a single uop in Broadwell, so this will be clearly better than the alternatives going forward. llvm-svn: 306040	2017-06-22 18:11:19 +00:00
Sanjay Patel	96e4e0967e	[x86] add tests for select --> sbb transform; NFC llvm-svn: 306032	2017-06-22 17:01:14 +00:00
whitequark	cebe8241ca	[X86] Add support for "probe-stack" attribute This commit adds prologue code emission for stack probe function calls. Reviewed By: majnemer Differential Revision: https://reviews.llvm.org/D34387 llvm-svn: 306010	2017-06-22 15:42:53 +00:00
Igor Breger	1c29be7e4f	[GlobalISel][X86] Support vector type G_INSERT legalization/selection. Summary: Support vector type G_INSERT legalization/selection. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33956 llvm-svn: 305989	2017-06-22 09:43:35 +00:00
Elena Demikhovsky	2dac0b4d58	AVX-512: Lowering Masked Gather intrinsic - fixed a bug Masked gather for vector length 2 is lowered incorrectly for element type i32. The type <2 x i32> was automatically extended to <2 x i64> and we generated VPGATHERQQ instead of VPGATHERQD. The type <2 x float> is extended to <4 x float>, so there is no bug for this type, but the sequence may be more optimal. In this patch I'm fixing the <2 x i32> bug and optimizing the <2 x float> sequence for GATHERs only. The same fix should be done for Scatters as well. Differential revision: https://reviews.llvm.org/D34343 llvm-svn: 305987	2017-06-22 06:47:41 +00:00
Davide Italiano	9b8e3d308f	[Solaris] emit .init_array instead of .ctors on Solaris (Sparc/x86) Patch by Fedor Sergeev. Differential Revision: https://reviews.llvm.org/D33868 llvm-svn: 305948	2017-06-21 20:36:32 +00:00
Simon Pilgrim	550cb7e82c	[X86][SSE] Dropped -mcpu from 256-bit vector shuffle tests Use triple and attribute only for consistency llvm-svn: 305916	2017-06-21 14:51:23 +00:00
Simon Pilgrim	9d0c2b7bad	[X86][SSE] Dropped -mcpu from 128-bit vector shuffle tests Use triple and attribute only for consistency llvm-svn: 305913	2017-06-21 14:23:02 +00:00
Simon Pilgrim	5309b7d5c9	[X86][SSE] Regenerate merge store tests llvm-svn: 305910	2017-06-21 13:46:42 +00:00
Simon Pilgrim	e74e08fe61	[X86][SSE] Dropped -mcpu from vector blend shuffle tests and regenerate Use triple and attribute only for consistency llvm-svn: 305909	2017-06-21 13:45:33 +00:00
Simon Pilgrim	98aab7c6fc	[X86][SSE] Dropped -mcpu from vector shuffle tests Use triple and attribute only for consistency llvm-svn: 305908	2017-06-21 13:26:52 +00:00
Simon Pilgrim	6d5d6b542b	[X86][SSE] Dropped -mcpu from vector zero extend tests Use triple and attribute only for consistency llvm-svn: 305907	2017-06-21 13:17:14 +00:00
Simon Pilgrim	c388ec32e0	[X86][SSE] Dropped -mcpu from variable shuffle tests Use triple and attribute only for consistency llvm-svn: 305906	2017-06-21 13:15:41 +00:00
Simon Pilgrim	73814a2594	[X86][AVX] Add AVX1 shuffle truncation tests llvm-svn: 305905	2017-06-21 12:58:56 +00:00
Simon Pilgrim	db6c3fa872	[X86][SSE] Add SSE2/SSE42 shuffle truncation tests llvm-svn: 305904	2017-06-21 12:58:19 +00:00
Zvi Rackover	845ca8fba9	[X86] Rerun the update_llc_test_checks tool on test. NFC. llvm-svn: 305897	2017-06-21 11:21:43 +00:00
Guy Blank	52d73fce85	[DAGCombiner] Add another combine from build vector to shuffle Add support for combining a build vector to a shuffle. When the build vector is of extracted elements from 2 vectors (vec1, vec2) where vec2 is 2 times smaller than vec1. llvm-svn: 305883	2017-06-21 07:38:41 +00:00
Dean Michael Berris	28ecff5cf1	[XRay] Reduce synthetic references emitted by XRay Summary: When we're building with XRay instrumentation, we use a trick that preserves references from the function to a function sled index. This index table lives in a separate section, and without this trick the linker is free to garbage-collect this section and all the segments it refers to. Until we're able to tell the linkers to preserve these sections, we use this reference trick to keep around both the index and the entries in the instrumentation map. Before this change we emitted both a synthetic reference to the label in the instrumentation map, and to the entry in the function map index. This change removes the first synthetic reference and only emits one synthetic reference to the index -- the index entry has the references to the labels in the instrumentation map, so the linker will still preserve those if the function itself is preserved. This reduces the amount of synthetic references we emit from 16 bytes to just 8 bytes in x86_64, and similarly to other platforms. Reviewers: dblaikie Subscribers: javed.absar, kpw, pelikan, llvm-commits Differential Revision: https://reviews.llvm.org/D34340 llvm-svn: 305880	2017-06-21 06:39:42 +00:00
Serguei Katkov	0b0dc57dd8	[ImplicitNullChecks] Uphold an invariant in areMemoryOpsAliased Right now areMemoryOpsAliased has an assertion justified as: MMO1 should have a value due it comes from operation we'd like to use as implicit null check. assert(MMO1->getValue() && "MMO1 should have a Value!"); However, it is possible for that invariant to not be upheld in the following situation (conceptually): Null check %RAX NotNullSucc: %RAX = LEA %RSP, 16 // I0 %RDX = MOV64rm %RAX // I1 With the current code, we will have an early exit from ImplicitNullChecks::isSuitableMemoryOp on I0 with SR_Unsuitable. However, I1 will look plausible (since it loads from %RAX) and will go ahead and call areMemoryOpsAliased(I1, I0). This will cause us to fail the assert mentioned above since I1 does not load from an IR level value and thus is allowed to have a non-Value base address. The fix is to bail out earlier whenever we see an unsuitable instruction overwrite PointerReg. This would guarantee that when we call areMemoryOpsAliased, we're guaranteed to be looking at an instruction that loads from or stores to an IR level value. Original Patch Author: sanjoy Reviewers: sanjoy, mkazantsev, reames Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34385 llvm-svn: 305879	2017-06-21 06:38:23 +00:00
Sanjay Patel	0656629b87	[x86] enable CGP memcmp() expansion for 2/4/8 byte sizes There are a couple of potential improvements as seen in the IR and asm: 1. We're unnecessarily extending to a larger type to compare values. 2. The codegen for (select cond, 1, -1) could avoid a cmov. (or we could change the order of the compares, so we have a select with 0 operand) llvm-svn: 305802	2017-06-20 15:58:30 +00:00
Simon Pilgrim	4822b5b649	[X86][SSE] Relax 0/-1 vector element insertion to work for any vector with >=16bit elements Shuffle lowering/combining now does a good job for 256/512-bit vectors - we don't need to prevent this llvm-svn: 305801	2017-06-20 15:19:02 +00:00
Simon Pilgrim	b4a77fe83a	Fixed test name. NFCI. llvm-svn: 305787	2017-06-20 10:24:06 +00:00
Igor Breger	1dcd5e8dc8	[GlobalISel][X86] Get correct RegClass for given RegBank. Summary: In some cases RegClass depends on target feature. High (16-31) vector registers exist only if AVX512f is available. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: t.p.northover, guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33952 Conflicts: test/CodeGen/X86/GlobalISel/select-memop-scalar.mir llvm-svn: 305784	2017-06-20 09:15:10 +00:00
Igor Breger	14535f0fc2	[GlobalISel] combine not symmetric merge/unmerge nodes. Summary: In some cases legalization ends up with not symmetric merge/unmerge nodes. Transform it to merge/unmerge nodes. Reviewers: t.p.northover, qcolombet, zvi Reviewed By: t.p.northover Subscribers: rovka, kristof.beyls, guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D33626 llvm-svn: 305783	2017-06-20 08:54:17 +00:00
Igor Breger	22ab175658	[GlobalISel][X86] add legalizer mir tests. NFC llvm-svn: 305781	2017-06-20 08:30:48 +00:00
Sanjoy Das	7ba830d61c	Fix machine instruction in test case The AND64rm instruction used in the test case was incorrect. Since the first input register to AND64rm is tied to the output register, they must be the same. Thanks to Jesper Antonsson for pointing this out! llvm-svn: 305756	2017-06-19 22:35:48 +00:00
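
The flag arithmetic behind the two sbb commits listed above (41a34e4111, r306040, and 359ae44fb4, r306072) can be checked with a small model. The sketch below is Python, not LLVM code. It models only the 32-bit result and the carry flag of NEG, CMP, ADC and SBB, and every function name in it is hypothetical, chosen for illustration. It assumes the instruction sequences exactly as quoted in those commit messages.

```python
MASK = 0xFFFFFFFF  # model 32-bit registers


def neg(x):
    """NEG x: returns (result, CF). CF is set when the operand is nonzero."""
    return (-x) & MASK, int((x & MASK) != 0)


def cmp_cf(a, b):
    """CMP a, b: CF is the unsigned borrow of a - b."""
    return int((a & MASK) < (b & MASK))


def adc(dst, src, cf):
    """ADC dst, src: dst + src + CF, truncated to 32 bits."""
    return (dst + src + cf) & MASK


def sbb(dst, src, cf):
    """SBB dst, src: dst - src - CF, truncated to 32 bits."""
    return (dst - src - cf) & MASK


def setnz_neg(x):
    # xor eax, eax; test edi, edi; setnz al; neg eax
    eax = int((x & MASK) != 0)
    return neg(eax)[0]


def adc_version(x):
    # xor eax, eax; cmp edi, 0x1; adc eax, 0xffffffff
    return adc(0, 0xFFFFFFFF, cmp_cf(x, 1))


def neg_sbb(x, eax=0x12345678):
    # neg edi; sbb eax, eax  (the old value of eax cancels out)
    _, cf = neg(x)
    return sbb(eax, eax, cf)


def cmp_sbb(x, eax=0x12345678):
    # cmp edi, 0x1; sbb eax, eax  (sequence from r306072)
    return sbb(eax, eax, cmp_cf(x, 1))


# The three sequences from r306040 agree: all-ones when x != 0, else zero.
for x in (0, 1, 2, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
    expected = 0 if x == 0 else MASK
    assert setnz_neg(x) == adc_version(x) == neg_sbb(x) == expected
    # The r306072 sequence yields the complementary mask: all-ones when x == 0.
    assert cmp_sbb(x) == (MASK if x == 0 else 0)
```

The model says nothing about uop counts or port pressure; those figures in the commit messages come from IACA and are not reproduced here.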

9930 Commits