llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	1349a04ef5	AMDGPU: Make v2i16/v2f16 legal on VI This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. llvm-svn: 332953	2018-05-22 06:32:10 +00:00
Farhana Aleen	e24f3ff8de	[AMDGPU] Support horizontal vectorization of min/max. Author: FarhanaAleen Reviewed By: rampitec Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D46604 llvm-svn: 331920	2018-05-09 21:18:34 +00:00
Shiva Chen	2c864551df	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841	2018-05-09 02:40:45 +00:00
Farhana Aleen	e2dfe8a853	[AMDGPU] Support horizontal vectorization. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D46213 llvm-svn: 331313	2018-05-01 21:41:12 +00:00
Matthew Simpson	661e6a02bd	[SLP] Add additional test for transposable binary operations with reuse llvm-svn: 331274	2018-05-01 15:59:26 +00:00
Davide Italiano	bd3bf1660b	[SLPVectorizer] Debug info shouldn't impact spill cost computation. <rdar://problem/39794738> (Also, PR32761). Differential Revision: https://reviews.llvm.org/D46199 llvm-svn: 331199	2018-04-30 16:57:33 +00:00
Benjamin Kramer	733c7fc55d	[NVPTX] Turn on Loop/SLP vectorization Since PTX has grown a <2 x half> datatype vectorization has become more important. The late LoadStoreVectorizer intentionally only does loads and stores, but now arithmetic has to be vectorized for optimal throughput too. This is still very limited, SLP vectorization happily creates <2 x half> if it's a legal type but there's still a lot of register moving happening to get that fed into a vectorized store. Overall it's a small performance win by reducing the amount of arithmetic instructions. I haven't really checked what the loop vectorizer does to PTX code, the cost model there might need some more tweaks. I didn't see it causing harm though. Differential Revision: https://reviews.llvm.org/D46130 llvm-svn: 331035	2018-04-27 13:36:05 +00:00
Matthew Simpson	cfdec0ff70	[SLP] Add tests for transposable binary operations These test cases are vectorizable, but we are currently unable to vectorize them effectively. llvm-svn: 330945	2018-04-26 14:50:04 +00:00
Craig Topper	60c7e0d587	[X86] Remove unnecessary -mattr to enable avx512bw when the -mcpu already enabled it. NFC This makes the test similar to the arith-sub.ll and arith-mul.ll tests. llvm-svn: 330144	2018-04-16 18:14:19 +00:00
Haicheng Wu	f7466f3164	[SLP] Use getExtractWithExtendCost() to compute the scalar cost of extractelement/ext pair We use getExtractWithExtendCost to calculate the cost of extractelement and s|zext together when computing the extract cost after vectorization, but we calculate the cost of extractelement and s|zext separately when computing the scalar cost which is larger than it should be. Differential Revision: https://reviews.llvm.org/D45469 llvm-svn: 330143	2018-04-16 18:09:49 +00:00
Haicheng Wu	5ba379557d	[SLP] update a test case. NFC. llvm-svn: 329818	2018-04-11 15:09:49 +00:00
Alexey Bataev	2f67dbb73e	[SLP] Additional tests for reorder reuse vectorization, NFC. llvm-svn: 329603	2018-04-09 19:02:34 +00:00
Simon Pilgrim	f1e668830f	[SLPVectorizer][X86] Regenerate some tests. NFCI llvm-svn: 329196	2018-04-04 13:53:51 +00:00
Alexey Bataev	428e9d9d87	[SLP] Fix PR36481: vectorize reassociated instructions. Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Patch does not support reordering of the repeated instruction, this must be handled in the separate patch. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 329085	2018-04-03 17:14:47 +00:00
Alexey Bataev	976aff148a	[SLP] Added tests for checks of reordering of the repeated instructions, NFC. llvm-svn: 329080	2018-04-03 16:31:26 +00:00
Benjamin Kramer	2fc3b18922	Revert "[SLP] Fix PR36481: vectorize reassociated instructions." This reverts commit r328980 and r329046. Makes the vectorizer crash. llvm-svn: 329071	2018-04-03 14:40:33 +00:00
Haicheng Wu	7f0daaeb86	[SLP] Distinguish "demanded and shrinkable" from "demanded and not shrinkable" values when determining the minimum bitwidth We use two approaches for determining the minimum bitwidth. * Demanded bits * Value tracking If demanded bits doesn't result in a narrower type, we then try value tracking. We need this if we want to root SLP trees with the indices of getelementptr instructions since all the bits of the indices are demanded. But there is a missing piece though. We need to be able to distinguish "demanded and shrinkable" from "demanded and not shrinkable". For example, the bits of %i in %i = sext i32 %e1 to i64 %gep = getelementptr inbounds i64, i64* %p, i64 %i are demanded, but we can shrink %i's type to i32 because it won't change the result of the getelementptr. On the other hand, in %tmp15 = sext i32 %tmp14 to i64 %tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0 it doesn't make sense to shrink %tmp15 and we can skip the value tracking. Ideas are from Matthew Simpson! Differential Revision: https://reviews.llvm.org/D44868 llvm-svn: 329035	2018-04-03 00:05:10 +00:00
Alexey Bataev	3decaf4275	[SLP] Fix PR36481: vectorize reassociated instructions. Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 328980	2018-04-02 14:51:37 +00:00
Dinar Temirbulatov	c326c1c582	[SLPVectorizer] Add tests related to PR30787, NFCI. llvm-svn: 328813	2018-03-29 18:57:03 +00:00
Haicheng Wu	b45f921678	[SLP] Add more checks to a test case. NFC. llvm-svn: 328572	2018-03-26 18:59:28 +00:00
Haicheng Wu	0ec1dbe417	[SLP] Add a test case. NFC. llvm-svn: 328546	2018-03-26 16:47:37 +00:00
Matthew Simpson	6c289a1c74	[SLP] Stop counting cost of gather sequences with multiple uses When building the SLP tree, we look for reuse among the vectorized tree entries. However, each gather sequence is represented by a unique tree entry, even though the sequence may be identical to another one. This means, for example, that a gather sequence with two uses will be counted twice when computing the cost of the tree. We should only count the cost of the definition of a gather sequence rather than its uses. During code generation, the redundant gather sequences are emitted, but we optimize them away with CSE. So it looks like this problem just affects the cost model. Differential Revision: https://reviews.llvm.org/D44742 llvm-svn: 328316	2018-03-23 14:18:27 +00:00
Matthew Simpson	b17fff79f0	[SLP] Add test case for a gather sequence with multiple uses llvm-svn: 328133	2018-03-21 19:13:14 +00:00
Matthew Simpson	eacfefd056	[AArch64] Implement getArithmeticReductionCost This patch provides an implementation of getArithmeticReductionCost for AArch64. We can specialize the cost of add reductions since they are computed using the 'addv' instruction. Differential Revision: https://reviews.llvm.org/D44490 llvm-svn: 327702	2018-03-16 11:34:15 +00:00
Alexey Bataev	625ce229b1	[SLP] Additional tests for stores vectorization, NFC. llvm-svn: 326740	2018-03-05 20:20:12 +00:00
Mohammad Shahid	ddeee12f59	[SLP] Added new tests and updated existing for jumbled load, NFC. llvm-svn: 326303	2018-02-28 04:19:34 +00:00
Sanjay Patel	04d1d79ee5	[AArch64] add SLP test based on TSVC; NFC This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 326217	2018-02-27 18:06:15 +00:00
Simon Pilgrim	9929f90740	[X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280) Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark. Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch. Differential Revision: https://reviews.llvm.org/D43733 llvm-svn: 326133	2018-02-26 22:10:17 +00:00
Alexey Bataev	b44e2b75e8	[SLP] Added new test + fixed some checks, NFC. llvm-svn: 326117	2018-02-26 20:01:24 +00:00
Simon Pilgrim	864949d5e9	[SLPVectorizer][X86] Add load extend tests (PR36091) llvm-svn: 325772	2018-02-22 12:19:34 +00:00
Sanjay Patel	d53da082a0	[AArch64] fix IR names to not be 'tmp' because that gives the CHECK script problems llvm-svn: 325718	2018-02-21 20:48:14 +00:00
Sanjay Patel	ffe51e450f	[AArch64] add SLP test for matmul (PR36280); NFC This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 325717	2018-02-21 20:34:16 +00:00
Alexey Bataev	cdd0675ddc	[SLP] Fix test checks, NFC. llvm-svn: 325689	2018-02-21 15:32:58 +00:00
Sanjay Patel	e6143904b9	revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280) There are too many perf regressions resulting from this, so we need to investigate (and add tests for) targets like ARM and AArch64 before trying to reinstate. llvm-svn: 325658	2018-02-21 01:42:52 +00:00
Alexey Bataev	47dfd249f0	[SLP] Fix tests checks, NFC. llvm-svn: 325605	2018-02-20 18:11:50 +00:00
Sanjay Patel	3e8a76abfd	[TTI CostModel] change default cost of FP ops to 1 (PR36280) This change was mentioned at least as far back as: https://bugs.llvm.org/show_bug.cgi?id=26837#c26 ...and I found a real program that is harmed by this: Himeno running on AMD Jaguar gets 6% slower with SLP vectorization: https://bugs.llvm.org/show_bug.cgi?id=36280 ...but the change here appears to solve that bug only accidentally. The div/rem costs for x86 look very wrong in some cases, but that's already true, so we can fix those in follow-up patches. There's also evidence that more cost model changes are needed to solve SLP problems as shown in D42981, but that's an independent problem (though the solution may be adjusted after this change is made). Differential Revision: https://reviews.llvm.org/D43079 llvm-svn: 325515	2018-02-19 16:11:44 +00:00
Alexey Bataev	862c476fc2	[SLP] Fix the test for the reversed stores, NFC. llvm-svn: 325268	2018-02-15 17:11:50 +00:00
Alexey Bataev	ac619599d8	[SLP] Added test for reversed stores, NFC. llvm-svn: 325265	2018-02-15 16:56:49 +00:00
Alexey Bataev	7f246e003a	[SLP] Allow vectorization of reversed loads. Summary: Reversed loads are handled as gathering. But we can just reshuffle these values. Patch adds support for vectorization of reversed loads. Reviewers: RKSimon, spatel, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43022 llvm-svn: 325134	2018-02-14 15:29:15 +00:00
Alexey Bataev	ca2396e673	[SLP] Take user instructions cost into consideration in insertelement vectorization. Summary: For better vectorization result we should take into consideration the cost of the user insertelement instructions when we try to vectorize sequences that build the whole vector. I.e. if we have the following scalar code: ``` <Scalar code> insertelement <ScalarCode>, ... ``` we should consider the cost of the last `insertelement ` instructions as the cost of the scalar code. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D42657 llvm-svn: 324893	2018-02-12 14:54:48 +00:00
Sanjay Patel	574fb73c89	[SLPVectorizer] auto-generate complete checks; NFC llvm-svn: 324616	2018-02-08 15:32:28 +00:00
Sanjay Patel	124392f038	[SLPVectorizer] auto-generate complete checks; NFC llvm-svn: 324615	2018-02-08 15:30:39 +00:00
Sanjay Patel	e2c5e9a970	[SLPVectorizer] move RUN line to top-of-file; NFC I was confused what we were checking because the RUN line was in the middle of the file. llvm-svn: 324614	2018-02-08 15:28:49 +00:00
Sanjay Patel	cfa5c03039	[SLPVectorizer] auto-generate complete checks; NFC llvm-svn: 324612	2018-02-08 15:16:26 +00:00
Alexey Bataev	cd8d6de381	[SLP] Add a tests for PR36280, NFC. llvm-svn: 324510	2018-02-07 20:11:37 +00:00
Alexey Bataev	1e593fe73e	[SLP] Update test checks, NFC. llvm-svn: 324387	2018-02-06 20:00:05 +00:00
Alexey Bataev	1c8f53f47d	[SLP] Add extra test for extractelement shuffle, NFC. llvm-svn: 323815	2018-01-30 21:06:06 +00:00
Alexey Bataev	9c5c103283	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323662	2018-01-29 16:08:52 +00:00
Alexey Bataev	10f5c9e765	[SLP] Add a test with extract for PR32086, NFC. llvm-svn: 323661	2018-01-29 15:56:52 +00:00
Alexey Bataev	f86be12182	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323530 to fix possible problems in users code. llvm-svn: 323581	2018-01-27 02:42:21 +00:00

460 Commits