llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	8e682086a0	AMDGPU/GlobalISel: Explicitly track d16 for image legalization We were trying to guess at the original IR type for image intrinsics after legalization to figure out if they were d16, but this didn't work. Explicitly track if this is a d16 operation or not in the opcode, as is done for the buffer intrinsics. The OpenCL library is using f32 image writes with a dmask of 15 for some reason, and this was incorrectly switching them to use d16. Fixes image failures in the OpenCL conformance test. The equivalent dmask for loads doesn't even select in either selector.	2022-01-10 14:25:14 -05:00
Matt Arsenault	0ba4e4b500	GlobalISel: Pass DebugLoc to getFunctionLiveInPhysReg Fixes crash in assertion about dropping debug info.	2022-01-10 13:50:52 -05:00
Matt Arsenault	68468bbe15	AMDGPU: Avoid null check during addrspacecast lowering If we know the source is a valid object, we do not need to insert a null check. This misses a lot of opportunities from metadata/attributes not tracked in codegen.	2022-01-10 13:27:39 -05:00
Kazu Hirata	720c48b58e	[AMDGPU] Fix an unused variable warning (NFC) This patch fixes: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:2245:12: error: unused variable 'Ins' [-Werror,-Wunused-variable]	2022-01-10 08:49:46 -08:00
Serge Guelton	d2cc6c2d0c	Use a sorted array instead of a map to store AttrBuilder string attributes Using and std::map<SmallString, SmallString> for target dependent attributes is inefficient: it makes its constructor slightly heavier, and involves extra allocation for each new string attribute. Storing the attribute key/value as strings implies extra allocation/copy step. Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions Differential Revision: https://reviews.llvm.org/D116599	2022-01-10 14:49:53 +01:00
Simon Pilgrim	5eb47961c4	[CostModel][X86] Update ROTL/ROTR vXi8/vXi16 costs on AVX512BW targets Refresh based off recent improvements to codegen and the helper script from D103695	2022-01-10 13:18:25 +00:00
Tim Northover	581e855623	AArch64: don't claim to preserve registers used by prologue code	2022-01-10 12:27:04 +00:00
Petar Avramovic	d9d2516aaf	AMDGPU/GlobalISel: Rework legalization for extract/insert vector elt Use G_MERGE_VALUES and G_UNMERGE_VALUES on vector elements instead of G_EXTRACT and G_INSERT when doing custom legalization for G_EXTRACT_VECTOR_ELT and G_INSERT_VECTOR_ELT. With this approach legalization artifact combiner gets direct access to all vector elements. Differential Revision: https://reviews.llvm.org/D116115	2022-01-10 13:15:20 +01:00
Cullen Rhodes	eee993ae4c	[AArch64][SVE] Fold predicate into compare Codegen of added testcase before this patch: ptrue p0.s cmpgt p1.s, p0/z, z0.s, z1.s cmpge p2.s, p0/z, z2.s, z1.s and p0.b, p0/z, p1.b, p2.b ret Patterns originally authored by Will Lovett. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D116749	2022-01-10 10:52:06 +00:00
Craig Topper	b645bcd98a	[RISCV] Generalize (srl (and X, 0xffff), C) -> (srli (slli X, (XLen-16), (XLen-16) + C) optimization. This can be generalized to (srl (and X, C2), C) -> (srli (slli X, (XLen-C3), (XLen-C3) + C). Where C2 is a mask with C3 trailing ones. This can avoid constant materialization for C2. This is beneficial even when C2 can be selected to ANDI because the SLLI can become C.SLLI, but C.ANDI cannot cover all the immediates of ANDI. This also enables CSE in some cases of i8 sdiv by constant codegen.	2022-01-09 23:37:10 -08:00
Pavel Kosov	34a91d7748	[SchedModels][CortexA55] Fix scheduling of FP loads Patch fixes scheduling of FP load instructions with pre/post increment adding WriteAdr for address operand. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D116361 OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg	2022-01-10 10:14:45 +03:00
Zi Xuan Wu	8ddc816929	[CSKY] Lower leaf DAG node such as global symbol, frame address and jumptable, etc. Lower global symbols such as call/external symbol. Lower other leaf DAG node such as frame address/block address/jumptable/vastart. Normally some leaf symbols need reside in constant pool as ABI prefers, and are addressed by lrw or jsri instructions. Every symbol in constant pool is lowered with one entry in target constant pool. The entry has different type corresponding to different leaf node such as blockaddress, jumptable, or global value.	2022-01-10 14:35:07 +08:00
Craig Topper	296e8cae5c	[RISCV] Isel (sra (sext_inreg X, i16), C) -> (srai (slli X, (XLen-16), (XLen-16) + C). Similar for (sra (sext_inreg X, i8), C). With Zbb, sext_inreg of i8 and i16 are legal for sext.b and sext.h. This transform makes the Zbb codegen the same as without Zbb. The shifts are more compressible. This also exposes an opportunity for CSE with another slli in the i16 sdiv by constant codegen.	2022-01-09 21:23:43 -08:00
jacquesguan	6b8362eb8d	[RISCV] Disable EEW=64 for index values when XLEN=32. Disable EEW=64 for vector index load/store when XLEN=32. Differential Revision: https://reviews.llvm.org/D106518	2022-01-10 10:51:27 +08:00
Craig Topper	2dd52f840b	[RISCV] Fold (srl (and X, 0xffff), C)->(srli (slli X, (XLen-16), (XLen-16) + C) even with Zbb/Zbp. We can use zext.h with Zbb, but srli/slli may offer more opportunities for compression.	2022-01-09 18:42:03 -08:00
Chen Zheng	2c46ca96e2	[PowerPC] fast isel can lower intrinsics call on AIX. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D114778	2022-01-10 02:30:05 +00:00
Alexander Shaposhnikov	22430ede7e	[CodeGen] Rename emitCalleeSavedFrameMoves This diff renames emitCalleeSavedFrameMoves to avoid conflicts with non-virtual methods of derived classes having the same name but different semantics. E.g. the class AArch64FrameLowering used to have (non-virtual) "emitCalleeSavedFrameMoves" but it started to override TargetFrameLowering::emitCalleeSavedFrameMoves after https://github.com/llvm/llvm-project/commit/c3e6555616 though its usage and semantics didn't change. P.S. for x86 there was no conflict because the signature of non-virtual X86FrameLowering::emitCalleeSavedFrameMoves is different Test plan: make check-all Differential revision: https://reviews.llvm.org/D114140	2022-01-10 01:33:04 +00:00
Sanjay Patel	e745507eda	[x86] exclude "X==0 ? Y :-1" from math/logic transform This is the last step in a series to improve lowering via "SBB" asm: `68defc0134` `aab1f55e33` ...and fixes #53006	2022-01-09 09:03:39 -05:00
Sanjay Patel	aab1f55e33	[x86] use SETCC_CARRY instead of SBB node for select lowering This is a suggested follow-up to D116765. This removes a clear of the register operand, so it is better for code size, but it does potentially create a false register dependency on surrounding code. If that is a problem, it should be solvable using dependency-breaking code that is used for other instructions. Differential Revision: https://reviews.llvm.org/D116804	2022-01-09 06:23:50 -05:00
Kazu Hirata	f44473ec4e	[llvm] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-08 11:56:44 -08:00
Kazu Hirata	435a5a3652	[llvm] Fix bugprone argument comments (NFC) Identified with bugprone-argument-comment.	2022-01-08 11:56:38 -08:00
Simon Pilgrim	75d8507e45	[X86] LowerRotate - enable ROTL vXi16 rotate-by-splat-amount on pre-AVX targets To enable this on all targets there's still a number of regressions due to getSplatValue/getTargetVShiftNode but these don't really affect pre-AVX targets.	2022-01-08 14:57:00 +00:00
Simon Pilgrim	b5d2e232b8	[X86][SSE] Add initial FSHL/FSHR vXi8 lowering support This is very similar to the existing ROTL/ROTR support for scalar shifts in LowerRotate, I think as time goes on we should be able to share much of this code in helpers between Funnel Shift + Rotation lowering.	2022-01-08 12:19:25 +00:00
Jay Foad	ff971873b3	[GlobalISel] Fix legality checks for G_UBFX combines 1. Fix CombinerHelper::matchBitfieldExtractFromAnd to check legality with the correct types for the G_UBFX that it builds. 2. Fix AMDGPUTargetLowering::isConstantUnsignedBitfieldExtractLegal to match the legality rules: result and first operand can be s32 or s64 but the "shift amount" operands are always s32. 3. Add AMDGPU tests where the post-legalizer combiner would create illegal MIR without the above fixes. Differential Revision: https://reviews.llvm.org/D116802	2022-01-08 09:20:44 +00:00
Kazu Hirata	9d74582810	[Target] use range-based for loops (NFC)	2022-01-07 21:20:36 -08:00
Craig Topper	042394b69e	[RISCV] Add a command line option to control the LMUL used by TTI's getRegisterBitWidth. By default we return the width of an LMUL=1 register. We can enable testing with larger LMUL values by returning a larger bit width. This patch adds a RISCV specific option to provide a LMUL which will be multiplied by the LMUL=1 bit width. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D116339	2022-01-07 20:02:10 -08:00
Kazu Hirata	4e2ec7e38d	[llvm] Remove unused forward declarations (NFC)	2022-01-07 20:00:34 -08:00
Kito Cheng	f142c45f1e	[RISCV] Set getMinVectorRegisterBitWidth to 16 if enable fixed length vector code gen for RVV getMinVectorRegisterBitWidth means what vector types is supported in this target, and actually RISC-V support all fixed length vector types with vector length less than `getMinRVVVectorSizeInBits`, so set it to 16, means 2 x i8, that is minimal fixed length vector size in theory. That also fixed one issue, some testcase migth become non-vectorizable when `-riscv-v-vector-bits-min` set to larger value, because the vector size is smaller than `-riscv-v-vector-bits-min`. For example, following code can vectorize by SLP with `-riscv-v-vector-bits-min=128` or `-riscv-v-vector-bits-min=256`, but can't vectorize `-riscv-v-vector-bits-min=512` or larger: ``` void foo(double *da) { da[0] = 0; da[1] = 1; da[2] = 2; da[3] = 3; } ``` Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116534	2022-01-08 11:16:21 +08:00
Baoshan Pang	af931a51b9	[RISCV] Materializing constants with 'rori' Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116574	2022-01-07 15:39:22 -08:00
Vitaly Buka	5c46c1c23a	Initialize output parameter Or code like this have UB passing uninitialized CmpValue: ``` int64_t CmpMask, CmpValue; if (!TII->analyzeCompare(MI, SrcReg, SrcReg2, CmpMask, CmpValue)) return false; if (TII->optimizeCompareInstr(MI, SrcReg, SrcReg2, CmpMask, CmpValue, MRI)) { ``` Detected by msan with: -Xclang -enable-noundef-analysis -mllvm -msan-eager-checks=1 Differential Revision: https://reviews.llvm.org/D116831	2022-01-07 15:21:22 -08:00
Vitaly Buka	bd9ae596d8	Initialize ExtAddrMode::Scale Detected by msan with: -Xclang -enable-noundef-analysis -mllvm -msan-eager-checks=1 Differential Revision: https://reviews.llvm.org/D116830	2022-01-07 15:21:22 -08:00
Sumanth Gundapaneni	ec2945d031	[Hexagon] Reconize M2_mnaci in HexagonBitTracker	2022-01-07 14:48:29 -08:00
Krzysztof Parzyszek	07ecb98798	[Hexagon] Use map from HexagonDepArch instead of local one, NFC Co-authored-by: Brian Cain <bcain@quicinc.com>	2022-01-07 13:02:57 -08:00
Krzysztof Parzyszek	d9ee9a1419	[Hexagon] Extract condition into function, NFC Co-authored-by: Brian Cain <bcain@quicinc.com>	2022-01-07 12:35:12 -08:00
Krzysztof Parzyszek	dfbe74be63	[Hexagon] Fix release build break after `5476585673`	2022-01-07 12:21:02 -08:00
Michael Lambert	028444c2b3	[Hexagon] Duplex error: wrong branch hint	2022-01-07 12:04:01 -08:00
colinl	4096ef3ed7	[Hexagon] Consider direction hint forming dealloc_return duplex	2022-01-07 12:04:00 -08:00
colinl	5476585673	[Hexagon] Improve check for subinstruction registers	2022-01-07 11:33:14 -08:00
Yuanxiang Ye	137642f433	[Hexagon] Reject accumulating on vd.tmp Added hvx accum checker function and test cases.	2022-01-07 11:13:19 -08:00
Brian Cain	1f71e46f2a	[Hexagon] Apply tiny core packet size slots limit	2022-01-07 10:33:12 -08:00
colinl	a247360173	[Hexagon] Simplify AX instruction detection	2022-01-07 10:33:12 -08:00
Sanjay Patel	68defc0134	[x86] make select lowering using SBB hack more flexible select (X != 0), -1, Y --> 0 - X; or (sbb), Y select (X != 0), Y, -1 --> X - 1; or (sbb), Y We already had these x86 carry-flag transforms, but one was over-specified to handle a "0" select arm only. That's just a special-case of the more general pattern (the 'or' will be deleted if Y is zero). This is part of solving #53006, but it misses that example because some other combine has already converted that exact pattern into math ops. Differential Revision: https://reviews.llvm.org/D116765	2022-01-07 13:23:09 -05:00
Brian Cain	9af53d2f0c	[Hexagon] s/Fatal/ReportErrors/ Rename argument from 'Fatal' => 'ReportErrors'. HexagonShuffler refers to this arg as 'ReportErrors' and calling it 'Fatal' in HexagonMCShuffler is misleading and inconsistent.	2022-01-07 08:27:34 -08:00
Brian Cain	a58a062fba	[Hexagon] Show slot resources for errors For a scalar packet resource error, emit details about the slots available for each instruction in the packet.	2022-01-07 08:27:33 -08:00
Krzysztof Parzyszek	88397739a3	[Hexagon] Misc shuffling fixes Co-authored-by: Brian Cain <bcain@quicinc.com>	2022-01-07 08:27:33 -08:00
David Green	bc615e436c	[AArch64] Update addo and subo costs Similar to D116732, this adds basic scalar sadd_with_overflow, uadd_with_overflow, ssub_with_overflow and usub_with_overflow costs for aarch64, which are usually quite efficiently lowered. Differential Revision: https://reviews.llvm.org/D116734	2022-01-07 16:20:23 +00:00
Luo, Yuanke	21babe4db3	[X86] Combine reduce(add (mul x, y)) to VNNI instruction. For below C code, we can use VNNI to combine the mul and add operation. int usdot_prod_qi(unsigned char restrict a, char restrict b, int c, int n) { int i; for (i = 0; i < 32; i++) { c += ((int)a[i] * (int)b[i]); } return c; } We didn't support the combine acoss basic block in this patch. Differential Revision: https://reviews.llvm.org/D116039	2022-01-07 21:12:19 +08:00
alex-t	5d46263a5a	[AMDGPU] Enable divergence-driven 'ctpop' selection This change adds the patterns and divergence predicates for the ctpop (bitcount) nodes to make them selected according to the divergence. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D116284	2022-01-07 16:07:38 +03:00
Jay Foad	3f3fe4a5cf	[GlobalISel] Fix typo Extact to Extract in function name. NFC.	2022-01-07 11:13:35 +00:00
Lian Wang	e8f1dfe923	[RISCV] Supplement PACKH instruction pattern Optimize (rs1 & 255) \| ((rs2 & 255) << 8) -> (PACKH rs1, rs2). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116791	2022-01-07 17:59:19 +08:00

1 2 3 4 5 ...

65490 Commits