llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	2f963a7e83	[SLPVectorizer] Add initial alternate opcode support for cast instructions. We currently only support binary instructions in the alternate opcode shuffles. This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism: 1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly. 2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this. 3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc. 4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements. Differential Revision: https://reviews.llvm.org/D49135 llvm-svn: 336804	2018-07-11 13:34:09 +00:00
Farhana Aleen	3b416db19b	[SLP] Recognize min/max pattern using instructions producing same values. Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization. %1 = extractelement <2 x i32> %a, i32 0 %2 = extractelement <2 x i32> %a, i32 1 %cond = icmp sgt i32 %1, %2 %3 = extractelement <2 x i32> %a, i32 0 %4 = extractelement <2 x i32> %a, i32 1 %select = select i1 %cond, i32 %3, i32 %4 Author: FarhanaAleen Reviewed By: ABataev, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D47608 llvm-svn: 336130	2018-07-02 17:55:31 +00:00
Simon Pilgrim	35f196c179	[SLPVectorizer][X86] Begin adding alternate tests for call operators Alternate opcode handling only supports binary operators, these tests demonstrate a missed opportunity to vectorize ceil/floor calls llvm-svn: 336125	2018-07-02 17:23:45 +00:00
Simon Pilgrim	265793d52a	[SLPVectorizer] Fix alternate opcode + shuffle cost function to correctly handle SK_Select patterns. We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case. This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now... llvm-svn: 336095	2018-07-02 11:28:01 +00:00
Simon Pilgrim	84f77ecba9	[SLPVectorizer][X86] Add some alternate tests for cast operators Alternate opcode handling only supports binary operators, these tests demonstrate missed opportunities to vectorize some sitofp/uitofp and fptosi/fptoui style casts as well as some (successful) float bits manipulations llvm-svn: 336060	2018-07-01 11:29:46 +00:00
Simon Pilgrim	bbfc18b5b5	[SLPVectorizer] Recognise non uniform power of 2 constants Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them. As SLP works with arrays of values I don't think we can easily use the pattern match helpers here. Differential Revision: https://reviews.llvm.org/D48214 llvm-svn: 335621	2018-06-26 16:20:16 +00:00
Simon Pilgrim	9d3ef8ee2b	[SLPVectorizer] Support alternate opcodes in tryToVectorizeList Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree. NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc. Differential Revision: https://reviews.llvm.org/D48488 llvm-svn: 335364	2018-06-22 16:37:34 +00:00
Simon Pilgrim	1e564504bb	[SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle. This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle. Differential Revision: https://reviews.llvm.org/D48477 llvm-svn: 335349	2018-06-22 14:04:06 +00:00
Simon Pilgrim	229a781214	[SLPVectorizer][X86] Add alternate opcode tests for simple build vector cases llvm-svn: 335348	2018-06-22 13:53:58 +00:00
Simon Pilgrim	9c8f9374b5	[CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion. This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174. I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more. Differential Revision: https://reviews.llvm.org/D48172 llvm-svn: 335329	2018-06-22 09:45:31 +00:00
Simon Pilgrim	2a9cde026c	[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. llvm-svn: 335216	2018-06-21 11:37:13 +00:00
Simon Pilgrim	d08fbf6486	[SLPVectorizer][X86] Add horizontal add/sub tests Shows PR37882 perf regression llvm-svn: 335215	2018-06-21 11:16:10 +00:00
Simon Pilgrim	2e2f20a949	[SLPVectorizer] Relax "alternate" opcode vectorisation to work with any SK_Select shuffle pattern D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand. This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation. The AArch64 regression will be fixed by D48172. Differential Revision: https://reviews.llvm.org/D48174 llvm-svn: 335130	2018-06-20 14:26:28 +00:00
Simon Pilgrim	180497ea11	[SLP][X86] Add AVX2 run to POW2 SDIV Tests Non-uniform pow2 tests only make sense on targets with fast (low cost) non-uniform shifts llvm-svn: 334821	2018-06-15 10:29:37 +00:00
Simon Pilgrim	ca6215f8c8	[SLP][X86] Regenerate POW2 SDIV Tests Added non-uniform pow2 test as well llvm-svn: 334819	2018-06-15 10:07:03 +00:00
Farhana Aleen	078cd48a39	[SLP] Add testcases of min/max reduction pattern for AMDGPU. Author: FarhanaAleen llvm-svn: 334435	2018-06-11 20:29:31 +00:00
Matt Arsenault	1349a04ef5	AMDGPU: Make v2i16/v2f16 legal on VI This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. llvm-svn: 332953	2018-05-22 06:32:10 +00:00
Farhana Aleen	e24f3ff8de	[AMDGPU] Support horizontal vectorization of min/max. Author: FarhanaAleen Reviewed By: rampitec Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D46604 llvm-svn: 331920	2018-05-09 21:18:34 +00:00
Shiva Chen	2c864551df	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. In order to set breakpoints on labels and list source code around labels, we need to collect debug information for labels, i.e., the label name, the function the label belongs to, the line number in the file, and the address where the label is located. To keep this information in LLVM IR and to allow the backend to generate debug information correctly, we create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep as much debug information as possible even when the code is optimized. So, we create a new kind of intrinsic for label metadata to prevent the metadata from being eliminated along with its basic block. The intrinsic survives as long as it is not optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, the DILabel metadata. The intrinsic immediately follows the label. The backend can get the label metadata through the intrinsic's parameter. We also create a DIBuilder API for labels to be used by frontends. A frontend can use createLabel() to allocate DILabel objects and insertLabel() to insert the llvm.dbg.label intrinsic into LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841	2018-05-09 02:40:45 +00:00
Farhana Aleen	e2dfe8a853	[AMDGPU] Support horizontal vectorization. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D46213 llvm-svn: 331313	2018-05-01 21:41:12 +00:00
Matthew Simpson	661e6a02bd	[SLP] Add additional test for transposable binary operations with reuse llvm-svn: 331274	2018-05-01 15:59:26 +00:00
Davide Italiano	bd3bf1660b	[SLPVectorizer] Debug info shouldn't impact spill cost computation. <rdar://problem/39794738> (Also, PR32761). Differential Revision: https://reviews.llvm.org/D46199 llvm-svn: 331199	2018-04-30 16:57:33 +00:00
Benjamin Kramer	733c7fc55d	[NVPTX] Turn on Loop/SLP vectorization Since PTX has grown a <2 x half> datatype vectorization has become more important. The late LoadStoreVectorizer intentionally only does loads and stores, but now arithmetic has to be vectorized for optimal throughput too. This is still very limited, SLP vectorization happily creates <2 x half> if it's a legal type but there's still a lot of register moving happening to get that fed into a vectorized store. Overall it's a small performance win by reducing the amount of arithmetic instructions. I haven't really checked what the loop vectorizer does to PTX code, the cost model there might need some more tweaks. I didn't see it causing harm though. Differential Revision: https://reviews.llvm.org/D46130 llvm-svn: 331035	2018-04-27 13:36:05 +00:00
Matthew Simpson	cfdec0ff70	[SLP] Add tests for transposable binary operations These test cases are vectorizable, but we are currently unable to vectorize them effectively. llvm-svn: 330945	2018-04-26 14:50:04 +00:00
Craig Topper	60c7e0d587	[X86] Remove unnecessary -mattr to enable avx512bw when the -mcpu already enabled it. NFC This makes the test similar to the arith-sub.ll and arith-mul.ll tests. llvm-svn: 330144	2018-04-16 18:14:19 +00:00
Haicheng Wu	f7466f3164	[SLP] Use getExtractWithExtendCost() to compute the scalar cost of extractelement/ext pair We use getExtractWithExtendCost to calculate the cost of extractelement and s|zext together when computing the extract cost after vectorization, but we calculate the cost of extractelement and s|zext separately when computing the scalar cost which is larger than it should be. Differential Revision: https://reviews.llvm.org/D45469 llvm-svn: 330143	2018-04-16 18:09:49 +00:00
Haicheng Wu	5ba379557d	[SLP] update a test case. NFC. llvm-svn: 329818	2018-04-11 15:09:49 +00:00
Alexey Bataev	2f67dbb73e	[SLP] Additional tests for reorder reuse vectorization, NFC. llvm-svn: 329603	2018-04-09 19:02:34 +00:00
Simon Pilgrim	f1e668830f	[SLPVectorizer][X86] Regenerate some tests. NFCI llvm-svn: 329196	2018-04-04 13:53:51 +00:00
Alexey Bataev	428e9d9d87	[SLP] Fix PR36481: vectorize reassociated instructions. Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Patch does not support reordering of the repeated instruction, this must be handled in the separate patch. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 329085	2018-04-03 17:14:47 +00:00
Alexey Bataev	976aff148a	[SLP] Added tests for checks of reordering of the repeated instructions, NFC. llvm-svn: 329080	2018-04-03 16:31:26 +00:00
Benjamin Kramer	2fc3b18922	Revert "[SLP] Fix PR36481: vectorize reassociated instructions." This reverts commit r328980 and r329046. Makes the vectorizer crash. llvm-svn: 329071	2018-04-03 14:40:33 +00:00
Haicheng Wu	7f0daaeb86	[SLP] Distinguish "demanded and shrinkable" from "demanded and not shrinkable" values when determining the minimum bitwidth We use two approaches for determining the minimum bitwidth. * Demanded bits * Value tracking If demanded bits doesn't result in a narrower type, we then try value tracking. We need this if we want to root SLP trees with the indices of getelementptr instructions since all the bits of the indices are demanded. But there is a missing piece though. We need to be able to distinguish "demanded and shrinkable" from "demanded and not shrinkable". For example, the bits of %i in %i = sext i32 %e1 to i64 %gep = getelementptr inbounds i64, i64* %p, i64 %i are demanded, but we can shrink %i's type to i32 because it won't change the result of the getelementptr. On the other hand, in %tmp15 = sext i32 %tmp14 to i64 %tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0 it doesn't make sense to shrink %tmp15 and we can skip the value tracking. Ideas are from Matthew Simpson! Differential Revision: https://reviews.llvm.org/D44868 llvm-svn: 329035	2018-04-03 00:05:10 +00:00
Alexey Bataev	3decaf4275	[SLP] Fix PR36481: vectorize reassociated instructions. Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 328980	2018-04-02 14:51:37 +00:00
Dinar Temirbulatov	c326c1c582	[SLPVectorizer] Add tests related to PR30787, NFCI. llvm-svn: 328813	2018-03-29 18:57:03 +00:00
Haicheng Wu	b45f921678	[SLP] Add more checks to a test case. NFC. llvm-svn: 328572	2018-03-26 18:59:28 +00:00
Haicheng Wu	0ec1dbe417	[SLP] Add a test case. NFC. llvm-svn: 328546	2018-03-26 16:47:37 +00:00
Matthew Simpson	6c289a1c74	[SLP] Stop counting cost of gather sequences with multiple uses When building the SLP tree, we look for reuse among the vectorized tree entries. However, each gather sequence is represented by a unique tree entry, even though the sequence may be identical to another one. This means, for example, that a gather sequence with two uses will be counted twice when computing the cost of the tree. We should only count the cost of the definition of a gather sequence rather than its uses. During code generation, the redundant gather sequences are emitted, but we optimize them away with CSE. So it looks like this problem just affects the cost model. Differential Revision: https://reviews.llvm.org/D44742 llvm-svn: 328316	2018-03-23 14:18:27 +00:00
Matthew Simpson	b17fff79f0	[SLP] Add test case for a gather sequence with multiple uses llvm-svn: 328133	2018-03-21 19:13:14 +00:00
Matthew Simpson	eacfefd056	[AArch64] Implement getArithmeticReductionCost This patch provides an implementation of getArithmeticReductionCost for AArch64. We can specialize the cost of add reductions since they are computed using the 'addv' instruction. Differential Revision: https://reviews.llvm.org/D44490 llvm-svn: 327702	2018-03-16 11:34:15 +00:00
Alexey Bataev	625ce229b1	[SLP] Additional tests for stores vectorization, NFC. llvm-svn: 326740	2018-03-05 20:20:12 +00:00
Mohammad Shahid	ddeee12f59	[SLP] Added new tests and updated existing for jumbled load, NFC. llvm-svn: 326303	2018-02-28 04:19:34 +00:00
Sanjay Patel	04d1d79ee5	[AArch64] add SLP test based on TSVC; NFC This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 326217	2018-02-27 18:06:15 +00:00
Simon Pilgrim	9929f90740	[X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280) Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark. Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch. Differential Revision: https://reviews.llvm.org/D43733 llvm-svn: 326133	2018-02-26 22:10:17 +00:00
Alexey Bataev	b44e2b75e8	[SLP] Added new test + fixed some checks, NFC. llvm-svn: 326117	2018-02-26 20:01:24 +00:00
Simon Pilgrim	864949d5e9	[SLPVectorizer][X86] Add load extend tests (PR36091) llvm-svn: 325772	2018-02-22 12:19:34 +00:00
Sanjay Patel	d53da082a0	[AArch64] fix IR names to not be 'tmp' because that gives the CHECK script problems llvm-svn: 325718	2018-02-21 20:48:14 +00:00
Sanjay Patel	ffe51e450f	[AArch64] add SLP test for matmul (PR36280); NFC This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 325717	2018-02-21 20:34:16 +00:00
Alexey Bataev	cdd0675ddc	[SLP] Fix test checks, NFC. llvm-svn: 325689	2018-02-21 15:32:58 +00:00
Sanjay Patel	e6143904b9	revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280) There are too many perf regressions resulting from this, so we need to investigate (and add tests for) targets like ARM and AArch64 before trying to reinstate. llvm-svn: 325658	2018-02-21 01:42:52 +00:00
Alexey Bataev	47dfd249f0	[SLP] Fix tests checks, NFC. llvm-svn: 325605	2018-02-20 18:11:50 +00:00
Sanjay Patel	3e8a76abfd	[TTI CostModel] change default cost of FP ops to 1 (PR36280) This change was mentioned at least as far back as: https://bugs.llvm.org/show_bug.cgi?id=26837#c26 ...and I found a real program that is harmed by this: Himeno running on AMD Jaguar gets 6% slower with SLP vectorization: https://bugs.llvm.org/show_bug.cgi?id=36280 ...but the change here appears to solve that bug only accidentally. The div/rem costs for x86 look very wrong in some cases, but that's already true, so we can fix those in follow-up patches. There's also evidence that more cost model changes are needed to solve SLP problems as shown in D42981, but that's an independent problem (though the solution may be adjusted after this change is made). Differential Revision: https://reviews.llvm.org/D43079 llvm-svn: 325515	2018-02-19 16:11:44 +00:00
Alexey Bataev	862c476fc2	[SLP] Fix the test for the reversed stores, NFC. llvm-svn: 325268	2018-02-15 17:11:50 +00:00
Alexey Bataev	ac619599d8	[SLP] Added test for reversed stores, NFC. llvm-svn: 325265	2018-02-15 16:56:49 +00:00
Alexey Bataev	7f246e003a	[SLP] Allow vectorization of reversed loads. Summary: Reversed loads are handled as gathering. But we can just reshuffle these values. Patch adds support for vectorization of reversed loads. Reviewers: RKSimon, spatel, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43022 llvm-svn: 325134	2018-02-14 15:29:15 +00:00
Alexey Bataev	ca2396e673	[SLP] Take user instructions cost into consideration in insertelement vectorization. Summary: For better vectorization result we should take into consideration the cost of the user insertelement instructions when we try to vectorize sequences that build the whole vector. I.e. if we have the following scalar code: ``` <Scalar code> insertelement <ScalarCode>, ... ``` we should consider the cost of the last `insertelement ` instructions as the cost of the scalar code. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D42657 llvm-svn: 324893	2018-02-12 14:54:48 +00:00
Sanjay Patel	574fb73c89	[SLPVectorizer] auto-generate complete checks; NFC llvm-svn: 324616	2018-02-08 15:32:28 +00:00
Sanjay Patel	124392f038	[SLPVectorizer] auto-generate complete checks; NFC llvm-svn: 324615	2018-02-08 15:30:39 +00:00
Sanjay Patel	e2c5e9a970	[SLPVectorizer] move RUN line to top-of-file; NFC I was confused what we were checking because the RUN line was in the middle of the file. llvm-svn: 324614	2018-02-08 15:28:49 +00:00
Sanjay Patel	cfa5c03039	[SLPVectorizer] auto-generate complete checks; NFC llvm-svn: 324612	2018-02-08 15:16:26 +00:00
Alexey Bataev	cd8d6de381	[SLP] Add a test for PR36280, NFC. llvm-svn: 324510	2018-02-07 20:11:37 +00:00
Alexey Bataev	1e593fe73e	[SLP] Update test checks, NFC. llvm-svn: 324387	2018-02-06 20:00:05 +00:00
Alexey Bataev	1c8f53f47d	[SLP] Add extra test for extractelement shuffle, NFC. llvm-svn: 323815	2018-01-30 21:06:06 +00:00
Alexey Bataev	9c5c103283	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counted as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323662	2018-01-29 16:08:52 +00:00
Alexey Bataev	10f5c9e765	[SLP] Add a test with extract for PR32086, NFC. llvm-svn: 323661	2018-01-29 15:56:52 +00:00
Alexey Bataev	f86be12182	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323530 to fix possible problems in users code. llvm-svn: 323581	2018-01-27 02:42:21 +00:00
Alexey Bataev	7ad4e31c3b	[SLP] Test for trunc vectorization, NFC. llvm-svn: 323556	2018-01-26 20:07:55 +00:00
Alexey Bataev	167003df28	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counted as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323530	2018-01-26 14:31:09 +00:00
Alexey Bataev	102d4b59f9	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323441 to fix buildbots. llvm-svn: 323447	2018-01-25 17:28:12 +00:00
Alexey Bataev	c8cfa14b6d	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counted as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323441	2018-01-25 16:45:18 +00:00
Alexey Bataev	a0b2c78efc	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323430 to fix buildbots. llvm-svn: 323432	2018-01-25 15:20:29 +00:00
Alexey Bataev	ad51fe3644	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counted as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323430	2018-01-25 15:01:36 +00:00
Alexey Bataev	0affccc8d7	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323348 because of the broken buildbots. llvm-svn: 323359	2018-01-24 18:36:51 +00:00
Alexey Bataev	4bd8e5332f	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counted as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323348	2018-01-24 17:50:53 +00:00
Sanjay Patel	c4ed9ed276	[SLPVectorizer] add test for PR13837; NFC This was probably fixed long ago, but I don't see a test that lines up with the example and target in the bug report: https://bugs.llvm.org/show_bug.cgi?id=13837 ...so adding it here. llvm-svn: 323269	2018-01-23 22:04:17 +00:00
Alexey Bataev	4f74a31c0e	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323246 because of the broken buildbots. llvm-svn: 323252	2018-01-23 20:11:27 +00:00
Alexey Bataev	6719e2418c	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counted as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323246	2018-01-23 19:30:26 +00:00
Alexey Bataev	fa80c47c6a	[SLP] Fix vectorization for tree with trunc to minimum required bit width. Summary: If the vectorized tree has truncate to minimum required bit width and the vector type of the cast operation after the truncation is the same as the vector type of the cast operands, count cost of the vector cast operation as 0, because this cast will be later removed. Also, if the vectorization tree root operations are integer cast operations, do not consider them as candidates for truncation. It will just create extra number of the same vector/scalar operations, which will be removed by instcombiner. Reviewers: RKSimon, spatel, mkuper, hfinkel, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41948 llvm-svn: 322946	2018-01-19 14:40:13 +00:00
Alexey Bataev	a2d6fe4ab4	[SLP] Fix test checks, NFC. llvm-svn: 322865	2018-01-18 17:34:27 +00:00
Alexey Bataev	6977dbcc7b	[SLP] Fix for PR32164: Improve vectorization of reverse order of extract operations. Summary: Sometimes vectorization of insertelement instructions with extractelement operands may produce an extra shuffle operation, if these operands are in the reverse order. Patch tries to improve this situation by the reordering of the operands to remove this extra shuffle operation. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D33954 llvm-svn: 322579	2018-01-16 18:17:01 +00:00
Alexey Bataev	90e29b81d6	[SLP] Add/update tests for SLP vectorizer, NFC. llvm-svn: 322225	2018-01-10 21:29:18 +00:00
Alexey Bataev	771ec9f399	[COST]Fix PR35865: Fix cost model evaluation for shuffle on X86. Summary: If the vector type is transformed to non-vector single type, the compiler may crash trying to get vector information about non-vector type. Reviewers: RKSimon, spatel, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41862 llvm-svn: 322106	2018-01-09 19:08:22 +00:00
Alexey Bataev	5b9a77d4ea	[SLP] Fix PR35777: Incorrect handling of aggregate values. Summary: Fixes the bug with incorrect handling of InsertValue|InsertElement instructions in SLP vectorizer. Currently, we may use incorrect ExtractElement instructions as the operands of the original InsertValue|InsertElement instructions. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41767 llvm-svn: 321994	2018-01-08 14:43:06 +00:00
Alexey Bataev	118a0a2c38	[SLP] Fix PR35628: Count external uses on extra reduction arguments. Summary: If the vectorized value is marked as extra reduction argument, its users are not considered as external users. Patch fixes this. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41786 llvm-svn: 321993	2018-01-08 14:33:11 +00:00
Davide Italiano	4c39758a38	[SLPVectorizer] Reintroduce std::stable_sort(properlyDominates()). The approach was never discussed, I wasn't able to reproduce this non-determinism, and the original author went AWOL. After a discussion on the ML, Philip suggested to revert this. llvm-svn: 321974	2018-01-07 22:06:24 +00:00
Alexey Bataev	fa13848da8	[SLP] Update more test checks, NFC. llvm-svn: 321872	2018-01-05 16:15:17 +00:00
Alexey Bataev	e565ebcdad	[SLP] Update test checks, NFC. llvm-svn: 321870	2018-01-05 15:20:40 +00:00
Alexey Bataev	988db0bd50	[SLP] Update tests checks, NFC. llvm-svn: 321869	2018-01-05 14:40:04 +00:00
Mohammad Shahid	3a934d6ab9	Revert r320548:[SLP] Vectorize jumbled memory loads llvm-svn: 321181	2017-12-20 15:26:59 +00:00
Guozhi Wei	d22d1b953d	[SLPVectorizer] Don't ignore scalar extraction instructions of aggregate value In SLPVectorizer, the vector build instructions (insertvalue for aggregate type) are passed to BoUpSLP.buildTree and treated as the UserIgnoreList, so later in cost estimation the cost of these instructions is not counted. For an aggregate value, later uses are more likely to be done in scalar registers, either as individual scalars or as a whole for a function call or return value. Ignoring the scalar extraction instructions may cause overly aggressive vectorization of aggregate values and slow down performance. So for vectorization of aggregate values, the scalar extraction instructions are required in cost estimation. Differential Revision: https://reviews.llvm.org/D41139 llvm-svn: 320736	2017-12-14 19:35:43 +00:00
Mohammad Shahid	dbd30edb7f	[SLP] Vectorize jumbled memory loads. Summary: This patch tries to vectorize loads of consecutive memory accesses, accessed in non-consecutive or jumbled way. An earlier attempt was made with patch D26905 which was reverted back due to some basic issue with representing the 'use mask' of jumbled accesses. This patch fixes the mask representation by recording the 'use mask' in the usertree entry. Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh Reviewed By: Ayal Subscribers: mgrang, dcaballe, hans, mzolotukhin Differential Revision: https://reviews.llvm.org/D36130 llvm-svn: 320548	2017-12-13 03:08:29 +00:00
Hans Wennborg	e2470b95da	Revert r319531 "[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops." It causes builds to fail with "Instruction does not dominate all uses" (PR35497). > Patch tries to improve vectorization of the following code: > > void add1(int * __restrict dst, const int * __restrict src) { > dst++ = src++; > dst++ = src++ + 1; > dst++ = src++ + 2; > dst++ = src++ + 3; > } > Allows to vectorize even if the very first operation is not a binary add, but just a load. > > Fixed issues related to previous commit. > > Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev > > Reviewed By: ABataev, RKSimon > > Subscribers: llvm-commits, RKSimon > > Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 319550	2017-12-01 16:17:24 +00:00
Dinar Temirbulatov	29e86584c6	[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops. Patch tries to improve vectorization of the following code: void add1(int * __restrict dst, const int * __restrict src) { *dst++ = *src++; *dst++ = *src++ + 1; *dst++ = *src++ + 2; *dst++ = *src++ + 3; } Allows to vectorize even if the very first operation is not a binary add, but just a load. Fixed issues related to previous commit. Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev Reviewed By: ABataev, RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 319531	2017-12-01 11:10:47 +00:00
NAKAMURA Takumi	519ea284af	SLPVectorizer.cpp: Avoid std::stable_sort(properlyDominates()). properlyDominates() shouldn't be used as a sort key. It causes different output between libstdc++ and libc++. Instead, I introduced RPOT. In most cases, it works for CSE. llvm-svn: 318743	2017-11-21 09:41:01 +00:00
Adam Nemet	572a87c76f	[SLP] Added more missed optimization remarks Summary: Added more remarks to SLP pass, in particular "missed" optimization remarks. Also proposed several tests for new functionality. Patch by Vladimir Miloserdov! For reference you may look at: https://reviews.llvm.org/rL302811 Reviewers: anemet, fhahn Reviewed By: anemet Subscribers: javed.absar, lattner, petecoup, yakush, llvm-commits Differential Revision: https://reviews.llvm.org/D38367 llvm-svn: 318307	2017-11-15 17:04:53 +00:00
Hans Wennborg	45cabacd2f	Revert r318193 "[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops." It crashes building sqlite; see reply on the llvm-commits thread. > [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops. > > Patch tries to improve vectorization of the following code: > > void add1(int * __restrict dst, const int * __restrict src) { > *dst++ = *src++; > *dst++ = *src++ + 1; > *dst++ = *src++ + 2; > *dst++ = *src++ + 3; > } > Allows to vectorize even if the very first operation is not a binary add, but just a load. > > Fixed issues related to previous commit. > > Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev > > Reviewed By: ABataev, RKSimon > > Subscribers: llvm-commits, RKSimon > > Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 318239	2017-11-15 00:38:13 +00:00
Dinar Temirbulatov	2bd1836520	[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops. Patch tries to improve vectorization of the following code: void add1(int * __restrict dst, const int * __restrict src) { *dst++ = *src++; *dst++ = *src++ + 1; *dst++ = *src++ + 2; *dst++ = *src++ + 3; } Allows to vectorize even if the very first operation is not a binary add, but just a load. Fixed issues related to previous commit. Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev Reviewed By: ABataev, RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 318193	2017-11-14 20:55:08 +00:00
Dinar Temirbulatov	a9e47fd7d9	NFC, Allow SystemZ SLP tests only when SystemZ is supported. llvm-svn: 318070	2017-11-13 18:35:43 +00:00
Alexey Bataev	0bd9004425	[SLP] Fix PR23510: Try to find best possible vectorizable stores. Summary: The analysis of the store sequence goes in straight order - from the first store to the last. But the best opportunity for vectorization will happen if we're going to use reverse order - from last store to the first. It may be best because usually users have some initialization part + further processing and this first initialization may confuse SLP vectorizer. Reviewers: RKSimon, hfinkel, mkuper, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39606 llvm-svn: 317821	2017-11-09 19:07:16 +00:00
Dan Gohman	2c74fe977d	Add an @llvm.sideeffect intrinsic This patch implements Chandler's idea [0] for supporting languages that require support for infinite loops with side effects, such as Rust, providing part of a solution to bug 965 [1]. Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual effect, but which appears to optimization passes to have obscure side effects, such that they don't optimize away loops containing it. It also teaches several optimization passes to ignore this intrinsic, so that it doesn't significantly impact optimization in most cases. As discussed on llvm-dev [2], this patch is the first of two major parts. The second part, to change LLVM's semantics to have defined behavior on infinite loops by default, with a function attribute for opting into potential-undefined-behavior, will be implemented and posted for review in a separate patch. [0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html [1] https://bugs.llvm.org/show_bug.cgi?id=965 [2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html Differential Revision: https://reviews.llvm.org/D38336 llvm-svn: 317729	2017-11-08 21:59:51 +00:00

526 Commits