llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	8fe8adb9f1	[X86] Add v2i64->v2i32/v2i16/v2i8 test cases to the trunc packus/ssat/usat tests. NFC llvm-svn: 374704	2019-10-13 05:47:42 +00:00
Simon Pilgrim	9f0885d38d	[X86][SSE] Avoid unnecessary PMOVZX in v4i8 sum reduction This should go away once D66004 has landed and we can simplify shuffle chains using demanded elts. llvm-svn: 374658	2019-10-12 15:19:13 +00:00
Craig Topper	9bd542dcd5	[X86] Use pack instructions for packus/ssat truncate patterns when 256-bit is the largest legal vector and the result type is at least 256 bits. Since the input type is larger than 256 bits, we'll need to do some concatenating to reassemble the results. The pack instructions' ability to concatenate while packing makes this a shorter/faster sequence. llvm-svn: 374643	2019-10-12 07:59:29 +00:00
Craig Topper	80a4feed7c	[X86] Test SKX cpu in the vector-trunc-packus/ssat/usat.ll tests instead of min-legal-vector-width.ll This adds "min-legal-vector-width"="256" function attributes to all the tests for a larger than 256-bit input. Also switch any larger than 512-bit inputs to use a load. This makes the arguments consistent with the min-legal-vector-width attribute, which should usually be at least as large as the arguments. The SKX configuration will avoid using zmm registers on the modified test cases. For many of them we should use something closer to the AVX2 codegen with pack instructions instead of the avx512 saturating truncates. llvm-svn: 374642	2019-10-12 07:59:24 +00:00
Craig Topper	3472feb94c	[X86] Fold a VTRUNCS/VTRUNCUS+store into a saturating truncating store. We already did this for VTRUNCUS with a specific combination of types. This extends this to VTRUNCS and handles any types where a truncating store is legal. llvm-svn: 374615	2019-10-12 00:01:08 +00:00
Craig Topper	7dcd440d44	[X86] Add test case showing missing opportunity to fold vpmovsdb into a store after type legalization. NFC llvm-svn: 374614	2019-10-12 00:00:59 +00:00
Stanislav Mekhanoshin	f87fe45d5c	[AMDGPU] Use GCN prefix in dpp_combine.mir. NFC. llvm-svn: 374607	2019-10-11 22:28:04 +00:00
Stanislav Mekhanoshin	e2d104f64c	[AMDGPU] link dpp pseudos and real instructions on gfx10 This defaults to a zero fi operand, but we do not expose it anyway. Should we expose it later, it needs to be added to the pseudo. This enables dpp combining on gfx10. Differential Revision: https://reviews.llvm.org/D68888 llvm-svn: 374604	2019-10-11 22:03:36 +00:00
David Blaikie	289c45cc62	DebugInfo: Use base address selection entries for debug_loc Unify the range and loc emission (for both DWARFv4 and DWARFv5 style lists) and take advantage of that unification to use strategic base addresses for loclists. Differential Revision: https://reviews.llvm.org/D68620 llvm-svn: 374600	2019-10-11 21:52:41 +00:00
David Green	7c30af8e65	Revert 374373: [Codegen] Alter the default promotion for saturating adds and subs This commit is not extending the promoted integers as it should. Reverting whilst I look into the details. llvm-svn: 374592	2019-10-11 20:33:03 +00:00
Quentin Colombet	9c36ec5941	[GISel][CallLowering] Enable vector support in argument lowering The existing code is actually already enough to handle the splitting of vector arguments, but we were lacking a test case. This commit adds a test case for vector argument lowering involving splitting and enables the related support in call lowering. llvm-svn: 374589	2019-10-11 20:22:57 +00:00
David Blaikie	f358c3d371	llvm-dwarfdump: Add verbose printing for debug_loclists llvm-svn: 374582	2019-10-11 19:06:35 +00:00
Simon Pilgrim	af6c15f679	[X86][SSE] Add support for v4i8 add reduction llvm-svn: 374579	2019-10-11 17:54:15 +00:00
Sanjay Patel	781c49de9c	[AArch64] add tests for (v)select-of-constants; NFC These are copied from existing test files in x86/PPC. llvm-svn: 374568	2019-10-11 16:10:23 +00:00
Kerry McLaughlin	ee0a0a3464	[AArch64][SVE] Implement sdot and udot (lane) intrinsics Summary: Implements the following arithmetic intrinsics: - int_aarch64_sve_sdot - int_aarch64_sve_sdot_lane - int_aarch64_sve_udot - int_aarch64_sve_udot_lane This patch includes tests for the Subdivide4Argument type added by D67549 Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D67551 llvm-svn: 374566	2019-10-11 15:53:41 +00:00
David Tenty	033d16cedc	[AIX] Use .space instead of .zero in assembly Summary: The AIX system assembler does not understand .zero, so we should prefer emitting .space. Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68815 llvm-svn: 374564	2019-10-11 15:07:28 +00:00
Sanjay Patel	3b581ac80f	[DAGCombiner] fold vselect-of-constants to shift The diffs suggest that we are missing some more basic analysis/transforms, but this keeps the vector path in sync with the scalar (rL374397). This is again a preliminary step for introducing the reverse transform in IR as proposed in D63382. llvm-svn: 374555	2019-10-11 14:17:56 +00:00
QingShan Zhang	bb8d540010	[TableGen] Fix a bug where MCSchedClassDesc is interfered with between different SchedModels Assume that ModelA has a scheduling resource for InstA and ModelB has a scheduling resource for InstB. This is what the llvm::MCSchedClassDesc should look like: llvm::MCSchedClassDesc ModelASchedClasses[] = { ... InstA, 0, ... InstB, -1,... }; llvm::MCSchedClassDesc ModelBSchedClasses[] = { ... InstA, -1,... InstB, 0,... }; The -1 means an invalid number of macro ops, while it is valid if it is >= 0. This is what it looks like now: llvm::MCSchedClassDesc ModelASchedClasses[] = { ... InstA, 0, ... InstB, 0,... }; llvm::MCSchedClassDesc ModelBSchedClasses[] = { ... InstA, 0,... InstB, 0,... }; And the compiler hits the assertion here because the SCDesc is now valid for both InstA and InstB. Differential Revision: https://reviews.llvm.org/D67950 llvm-svn: 374524	2019-10-11 08:36:54 +00:00
Craig Topper	e0cb1cf7e3	[X86] Add v8i64->v8i8 ssat/usat/packus truncate tests to min-legal-vector-width.ll I wonder if we should split the v8i8 stores in order to form two v4i8 saturating truncating stores. This would remove the unpckl needed to concatenate the v4i8 results to make a single store. llvm-svn: 374519	2019-10-11 07:24:36 +00:00
Yi-Hong Lyu	2fbfb04ffe	[PowerPC] Remove assertion "Shouldn't overwrite a register before it is killed" The assertion is overzealous and fails tests like: renamable $x3 = LI8 0 STD renamable $x3, 16, $x1 renamable $x3 = LI8 0 Remove the assertion since the killed flag of $x3 is not mandatory. Differential Revision: https://reviews.llvm.org/D68344 llvm-svn: 374515	2019-10-11 05:32:29 +00:00
Craig Topper	ccc85ac855	[X86] Add a DAG combine to turn v16i16->v16i8 VTRUNCUS+store into a saturating truncating store. llvm-svn: 374509	2019-10-11 04:16:49 +00:00
Craig Topper	4b9947e2e7	[X86] Add test case for trunc_packus_v16i32_v16i8_store to min-legal-vector-width.ll We aren't folding the vpmovuswb into the store. llvm-svn: 374507	2019-10-11 04:02:04 +00:00
Craig Topper	32097c2696	[X86] Add more packus/ssat/usat truncate tests from legal vectors to less than 128-bit vectors. Some of these have sub-optimal codegen for avx512 relative to avx2. llvm-svn: 374505	2019-10-11 03:46:39 +00:00
Craig Topper	b560fd6c52	[X86] Improve the AVX512 bailout in combineTruncateWithSat to allow pack instructions in more situations. If we don't have VLX we won't end up selecting a saturating truncate for 256-bit or smaller vectors so we should just use the pack lowering. llvm-svn: 374487	2019-10-11 00:38:51 +00:00
Craig Topper	4dc27c69b6	[X86] Update trunc_packus_v32i32_v32i8 test in min-legal-vector-width.ll to use a load for the large type and add the min-legal-vector-width attribute. The attribute is needed to avoid zmm registers. Using memory avoids argument splitting for large vectors. llvm-svn: 374486	2019-10-11 00:38:41 +00:00
Craig Topper	a0df8b72f2	[X86] Add test cases for packus/ssat/usat v32i32->v32i8. NFC llvm-svn: 374459	2019-10-10 21:46:44 +00:00
Marcello Maggioni	0112123eea	[GISel] Allow getConstantVRegVal() to return G_FCONSTANT values. In GISel we have both G_CONSTANT and G_FCONSTANT, but because in GISel we don't really have a concept of Float vs Int value, the only difference between the two is where the data originates from. What both G_CONSTANT and G_FCONSTANT return is just a bag of bits with the constant representation in it. By making getConstantVRegVal() return G_FCONSTANT's bit representation as well, we allow ConstantFold and other things to operate with G_FCONSTANT. Adding tests that show ConstantFolding to work on mixed G_CONSTANT and G_FCONSTANT sources. Differential Revision: https://reviews.llvm.org/D68739 llvm-svn: 374458	2019-10-10 21:46:26 +00:00
Stanislav Mekhanoshin	19a1a739b1	[AMDGPU] Handle undef old operand in DPP combine It was missing an undef flag. Differential Revision: https://reviews.llvm.org/D68813 llvm-svn: 374455	2019-10-10 21:32:41 +00:00
Sanjay Patel	8dd16ed0c8	[x86] reduce duplicate test assertions; NFC llvm-svn: 374436	2019-10-10 19:52:27 +00:00
Craig Topper	0e561437c5	[X86] Use packusdw+vpmovuswb to implement v16i32->v16i8 that clamps signed inputs to be between 0 and 255 when zmm registers are disabled on SKX. If we've disabled zmm registers, the v16i32 will need to be split. This split will propagate through the min/max and the truncate. This creates two sequences that need to be concatenated back to v16i8. We can instead use packusdw to do part of the clamping, truncating, and concatenating all at once. Then we can use a vpmovuswb to finish off the clamp. Differential Revision: https://reviews.llvm.org/D68763 llvm-svn: 374431	2019-10-10 19:40:44 +00:00
Reid Kleckner	67d440b949	Print quoted backslashes in LLVM IR as \\ instead of \5C This improves readability of Windows path string literals in LLVM IR. The LLVM assembler has supported \\ in IR strings for a long time, but the lexer doesn't tolerate escaped quotes, so they have to be printed as \22 for now. llvm-svn: 374415	2019-10-10 18:31:57 +00:00
Sanjay Patel	7b904ce724	[DAGCombiner] fold select-of-constants to shift This reverses the scalar canonicalization proposed in D63382. Pre: isPowerOf2(C1) %r = select i1 %cond, i32 C1, i32 0 => %z = zext i1 %cond to i32 %r = shl i32 %z, log2(C1) https://rise4fun.com/Alive/Z50 x86 already tries to fold this pattern, but it isn't done uniformly, so we still see a diff. AArch64 probably should enable the TLI hook to benefit too, but that's a follow-on. llvm-svn: 374397	2019-10-10 17:52:02 +00:00
David Green	8628bb0491	[ARM] VQSUB instruction Same as VQADD, VQSUB can be selected from llvm.ssub.sat intrinsics. Differential Revision: https://reviews.llvm.org/D68567 llvm-svn: 374377	2019-10-10 16:34:30 +00:00
David Green	94d379095a	[Codegen] Alter the default promotion for saturating adds and subs The default promotion for the add_sat/sub_sat nodes currently does: 1. ANY_EXTEND iN to iM 2. SHL by M-N 3. [US][ADD|SUB]SAT 4. L/ASHR by M-N If the promoted add_sat or sub_sat node is not legal, this can produce code that effectively does a lot of shifting (and requires large constants to be materialised) just to use the overflow flag. It is simpler to just do the saturation manually, using the higher bitwidth addition and a min/max against the saturating bounds. That is what this patch attempts to do. Differential Revision: https://reviews.llvm.org/D68643 llvm-svn: 374373	2019-10-10 16:04:49 +00:00
Yonghong Song	d46a6a9e68	[BPF] Remove relocation for patchable externs Previously, patchable extern relocations were introduced to patch external variables used for multi-versioning in the compile once, run everywhere use case. The load instruction will be converted into a move with a patchable immediate which can be changed by the bpf loader on the host. The kernel verifier has evolved and is able to load and propagate constant values, so compiler relocation becomes unnecessary. This patch removes the code related to this. Differential Revision: https://reviews.llvm.org/D68760 llvm-svn: 374367	2019-10-10 15:33:09 +00:00
Stanislav Mekhanoshin	cbe55c7caf	[AMDGPU] Fixed dpp_combine.mir with expensive checks. NFC. llvm-svn: 374365	2019-10-10 15:28:52 +00:00
Simon Pilgrim	6a38474f77	[X86] combineFMA - Convert to use isNegatibleForFree/GetNegatedExpression. Split off from D67557. llvm-svn: 374356	2019-10-10 14:14:12 +00:00
Dmitri Gribenko	eaf6dd482b	Revert "[IRBuilder] Update IRBuilder::CreateFNeg(...) to return a UnaryOperator" This reverts commit r374240. It broke OCaml tests: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19014 llvm-svn: 374354	2019-10-10 14:13:54 +00:00
Amaury Sechet	aaf0507896	[DAGCombine] Match more patterns for half word bswap Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68250 llvm-svn: 374340	2019-10-10 13:20:10 +00:00
David Green	39596ec2fe	[ARM] VQADD instructions This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat intrinsics. Differential Revision: https://reviews.llvm.org/D68566 llvm-svn: 374336	2019-10-10 13:05:04 +00:00
Sanjay Patel	3370d4d2b7	[AArch64][x86] add tests for (v)select bit magic; NFC llvm-svn: 374334	2019-10-10 12:53:24 +00:00
Mirko Brkusanin	c2e481679b	[Mips] Fix 374055 EXPENSIVE_CHECKS build was failing on new test. This is fixed by marking $ra register as undef. Test now has -verify-machineinstrs to check for operand flags. llvm-svn: 374320	2019-10-10 12:02:14 +00:00
Oliver Stannard	4f454b2275	[IfCvt][ARM] Optimise diamond if-conversion for code size Currently, the heuristics the if-conversion pass uses for diamond if-conversion are based on execution time, with no consideration for code size. This adds a new set of heuristics to be used when optimising for code size. This is mostly target-independent, because the if-conversion pass can see the code size of the instructions which it is removing. For thumb, there are a few passes (insertion of IT instructions, selection of narrow branches, and selection of CBZ instructions) which are run after if conversion and affect these heuristics, so I've added target hooks to better predict the code-size effect of a proposed if-conversion. Differential revision: https://reviews.llvm.org/D67350 llvm-svn: 374301	2019-10-10 09:58:28 +00:00
Matt Arsenault	12994a70cf	AMDGPU: Use SGPR_128 instead of SReg_128 for vregs SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284	2019-10-10 07:11:33 +00:00
Craig Topper	0a84576262	[X86] Add test case for trunc_packus_v16i32_v16i8 with avx512vl+avx512bw and prefer-vector-width=256 and min-legal-vector-width=256. NFC llvm-svn: 374283	2019-10-10 06:25:00 +00:00
Chen Zheng	92e00293fd	[PowerPC] add testcase for ppc loop instr form prep - NFC llvm-svn: 374273	2019-10-10 03:00:15 +00:00
Thomas Lively	3414bce07a	[WebAssembly] Fix tests missed in rL374235 llvm-svn: 374259	2019-10-09 23:06:38 +00:00
Matt Arsenault	f8bf7d7f42	AMDGPU: Don't fold copies to physregs In a future patch, this will help clean up m0 handling. The register coalescer handles copies from a register that materializes an immediate, but doesn't handle move immediates itself. The virtual register uses will often be allocated to the same register, so there ends up being no real copy. llvm-svn: 374257	2019-10-09 22:51:42 +00:00
Matt Arsenault	85dfa82302	AMDGPU/GlobalISel: Fix crash on wide constant load with VGPR pointer This was ignoring the register bank of the input pointer, and isUniformMMO seems overly aggressive. This will now conservatively assume a VGPR in cases where the incoming bank hasn't been determined yet (i.e. is from a loop phi). llvm-svn: 374255	2019-10-09 22:44:49 +00:00
Matt Arsenault	3cd3959fe2	GlobalISel: Implement fewerElementsVector for G_BUILD_VECTOR Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR. llvm-svn: 374252	2019-10-09 22:44:43 +00:00
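
The packusdw+vpmovuswb commit (llvm-svn 374431) relies on two saturations composing into a single [0, 255] clamp. The following is a scalar Python model written only for illustration: the helper names are hypothetical, and it models the 128-bit PACKUSDW lane order (the 256-bit form packs within each 128-bit lane, which this sketch ignores).

```python
from typing import List


def packusdw(a: List[int], b: List[int]) -> List[int]:
    """Scalar model of 128-bit PACKUSDW: each signed i32 lane is saturated to
    an unsigned i16, and the results of both operands are concatenated."""
    def sat_u16(x: int) -> int:
        return max(0, min(0xFFFF, x))
    return [sat_u16(x) for x in a] + [sat_u16(x) for x in b]


def vpmovuswb(v: List[int]) -> List[int]:
    """Scalar model of VPMOVUSWB: each unsigned i16 lane is truncated to an
    unsigned i8 with unsigned saturation."""
    return [min(0xFF, x) for x in v]


def clamp_0_255(v: List[int]) -> List[int]:
    """Reference semantics: clamp each signed i32 input to [0, 255]."""
    return [max(0, min(0xFF, x)) for x in v]


lo_half = [-5, 0, 255, 256]
hi_half = [70000, -70000, 100, 2**31 - 1]
print(vpmovuswb(packusdw(lo_half, hi_half)))
# [0, 0, 255, 255, 255, 0, 100, 255]
```

Because PACKUSDW already leaves every lane in [0, 65535], the unsigned-saturating VPMOVUSWB only has to enforce the upper bound, and the pack step concatenates the two split halves as a side effect.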

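
The select-of-constants fold (llvm-svn 374397, with the vector version in 374555) rests on the identity stated in its commit message: when C1 is a power of two, `select i1 %cond, i32 C1, i32 0` equals `shl (zext %cond), log2(C1)`. A brute-force Python check over every 32-bit power of two, using hypothetical helper names, sketches that equivalence; it is not how the DAGCombiner code is written.

```python
def select_form(cond: bool, c1: int) -> int:
    # %r = select i1 %cond, i32 C1, i32 0
    return c1 if cond else 0


def shift_form(cond: bool, c1: int) -> int:
    # Precondition from the commit message: isPowerOf2(C1)
    assert c1 > 0 and c1 & (c1 - 1) == 0, "C1 must be a power of two"
    z = int(cond)                        # %z = zext i1 %cond to i32
    log2_c1 = c1.bit_length() - 1        # exact log2 for a power of two
    return (z << log2_c1) & 0xFFFFFFFF   # %r = shl i32 %z, log2(C1)


for k in range(32):
    c1 = 1 << k  # compared as unsigned 32-bit bit patterns
    for cond in (False, True):
        assert select_form(cond, c1) == shift_form(cond, c1)
print("equivalent for all 32-bit powers of two")
```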
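
The promotion change for saturating adds and subs (llvm-svn 374373, later reverted in 374592) contrasts two ways of computing an i8 saturating add on a wider i32 type. The Python sketch below models both for the add case, with hypothetical function names, and checks that they agree on every i8/u8 input. Per the revert note, the promoted operands must be properly extended, so the new-style functions extend explicitly; the old-style ones model ANY_EXTEND as a plain extension because its high bits are shifted out anyway.

```python
I32_MIN, I32_MAX = -(1 << 31), (1 << 31) - 1
U32_MAX = (1 << 32) - 1
SHIFT = 32 - 8  # M - N when promoting i8 (N = 8) to i32 (M = 32)


def sadd_sat_i8_old(a: int, b: int) -> int:
    # 1-2. extend to i32 and SHL by M-N so the i8 value sits in the top bits
    wa, wb = a << SHIFT, b << SHIFT
    # 3. SADDSAT at i32: the i32 overflow point lines up with the i8 one
    s = max(I32_MIN, min(I32_MAX, wa + wb))
    # 4. ASHR by M-N (Python's >> is arithmetic on negative ints)
    return s >> SHIFT


def sadd_sat_i8_new(a: int, b: int) -> int:
    # sign-extend, add at i32 (cannot overflow), then clamp to the i8 bounds
    return max(-128, min(127, a + b))


def uadd_sat_u8_old(a: int, b: int) -> int:
    s = min(U32_MAX, (a << SHIFT) + (b << SHIFT))  # UADDSAT at i32
    return s >> SHIFT                              # LSHR by M-N


def uadd_sat_u8_new(a: int, b: int) -> int:
    # zero-extend, add at i32, then clamp against the u8 maximum
    return min(255, a + b)


for a in range(-128, 128):
    for b in range(-128, 128):
        assert sadd_sat_i8_old(a, b) == sadd_sat_i8_new(a, b)
for a in range(256):
    for b in range(256):
        assert uadd_sat_u8_old(a, b) == uadd_sat_u8_new(a, b)
```

Both forms compute the same values; the difference the commit targets is cost. When the promoted saturating node is not legal, the old form expands into extra shifting and large constants, while the new form is a plain wide add followed by min/max against the saturating bounds.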

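
The half-word bswap combine (llvm-svn 374340) targets code that swaps the two bytes inside each 16-bit half of a 32-bit value. That is the same as a full 32-bit byte swap followed by a rotate by 16. The Python sketch below uses hypothetical helper names and checks only this identity, not the DAGCombine matching logic.

```python
MASK32 = 0xFFFFFFFF


def bswap32(x: int) -> int:
    # reverse the four bytes of a 32-bit value
    return int.from_bytes(x.to_bytes(4, "little"), "big")


def rotl32(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK32


def bswap_hwords_pattern(x: int) -> int:
    # shift/mask idiom: swap the bytes within each 16-bit half word
    return ((x << 8) & 0xFF00FF00) | ((x >> 8) & 0x00FF00FF)


def bswap_hwords_combined(x: int) -> int:
    # one full bswap plus a rotate by 16 yields the same value
    return rotl32(bswap32(x), 16)


x = 0xAABBCCDD
print(hex(bswap_hwords_pattern(x)))   # 0xbbaaddcc
print(hex(bswap_hwords_combined(x)))  # 0xbbaaddcc
```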
31019 Commits