Updated the affected scalable_of_scalable tests in sve-gep.ll, as isConstantSplatValue now returns true in DAGCombiner::visitMUL and folds `(mul x, 1) -> x`.
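A minimal IR sketch of the kind of GEP affected (modelled on the
sve-gep.ll tests; names hypothetical): indexing i8 elements scales the
vector index by a splat of 1, which now folds away.
```
define <vscale x 2 x ptr> @gep_i8(ptr %base, <vscale x 2 x i64> %idx) {
  ; Byte-sized elements give (mul %idx, splat(1)); with splat_vector
  ; recognised as a constant splat, the multiply folds to %idx.
  %d = getelementptr i8, ptr %base, <vscale x 2 x i64> %idx
  ret <vscale x 2 x ptr> %d
}
```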
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D91363
Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).
https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.
There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:
    Local values may also be used by no-op casts, which adds the
    register to the RegFixups table. Without reversing the RegFixups
    map direction, we don't have enough information to sink these
    instructions.
This patch undoes most of D43093, and instead flushes the local
value map after(*) every IR instruction, using that instruction's
debug location. This avoids the sometimes-incorrect locations used
previously, and emits instructions in a more natural order.
This does mean materialized values are not re-used across IR
instruction boundaries; however, only about 5% of those values
were reused in an experimental self-build of clang.
(*) Actually, just prior to the next instruction. It seems like
it would be cleaner the other way, but I was having trouble
getting that to work.
Differential Revision: https://reviews.llvm.org/D91734
This patch adds a target-specific DAG combine for mscatter to promote indices
with element types i8 or i16 before legalisation, plus various tests with illegal types.
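A hedged IR sketch of the sort of illegal-typed test added (names
hypothetical): the i8 offset vector must be promoted before the
scatter can be legalised.
```
declare void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32>, <vscale x 4 x ptr>, i32, <vscale x 4 x i1>)

define void @scatter_i8_offsets(<vscale x 4 x i32> %data, ptr %base, <vscale x 4 x i8> %offs, <vscale x 4 x i1> %pg) {
  ; <vscale x 4 x i8> is an illegal index type; the new combine extends
  ; it to a legal element type before legalisation.
  %ptrs = getelementptr i32, ptr %base, <vscale x 4 x i8> %offs
  call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> %data, <vscale x 4 x ptr> %ptrs, i32 4, <vscale x 4 x i1> %pg)
  ret void
}
```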
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90945
This uses the same reasoning as other similar conversions just before
selection; without it we miss out on selection because the importer
considers s64 and p0 distinct types.
This reapplies 36c64af9d7 in updated
form.
Emit the xdata for each function at .seh_endproc. This keeps the
exact same output section order for most code generated by the LLVM
CodeGen layer. (Sections still change order for code built from
assembly where functions lack an explicit .seh_handlerdata
directive, and functions with chained unwind info.)
The practical effect should be that assembly output lacks
superfluous ".seh_handlerdata; .text" pairs at the end of functions
that don't handle exceptions, which allows such functions to use
the AArch64 packed unwind format again.
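A before/after assembly sketch of that effect (function name
hypothetical):
```
        .seh_proc f             // f handles no exceptions
f:
        ret
        // previously emitted here: .seh_handlerdata
        //                          .text
        .seh_endproc            // xdata is now emitted at this point
```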
Differential Revision: https://reviews.llvm.org/D87448
X86 was already specially marking fma as commutable, which allowed
tablegen to autogenerate commuted patterns. This moves it to the
target-independent definition and fixes up the targets to remove
now-unneeded patterns.
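A sketch of the target-independent change, assuming the fma SDNode
definition in TargetSelectionDAG.td:
```
// Marking the node commutative in its multiplicand operands lets
// TableGen autogenerate the commuted patterns for every target.
def fma : SDNode<"ISD::FMA", SDTFPTernaryOp, [SDNPCommutative]>;
```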
Unfortunately, the tests change because the commuted versions of the
patterns generate operands in a different order than the explicit
patterns.
Differential Revision: https://reviews.llvm.org/D91842
This patch implements the out-of-line atomics deployment mechanism for
LSE. Details of how it works can be found in llvm/docs/Atomics.rst.
The options -moutline-atomics and -mno-outline-atomics were added to
the clang driver to enable and disable it. This is the clang and LLVM
part of the out-of-line atomics interface; the library part is already
supported by libgcc. Compiler-rt support is provided in a separate
patch.
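A small IR sketch of the effect (function name hypothetical): with
-moutline-atomics, an atomic RMW like the one below becomes a call to
a runtime helper such as __aarch64_ldadd4_relax, which dispatches
between LSE and LL/SC instructions at run time.
```
define i32 @fetch_add(ptr %p) {
  ; With -moutline-atomics this lowers to "bl __aarch64_ldadd4_relax".
  %old = atomicrmw add ptr %p, i32 1 monotonic
  ret i32 %old
}
```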
Differential Revision: https://reviews.llvm.org/D91157
In some cases, the values passed to `asm sideeffect` calls cannot be
mapped directly to simple MVTs. Currently, we crash in the backend if
that happens. An example can be found in the @test_vector_too_large_r_m
test case, where we pass <9 x float> vectors. In practice, this can
happen in cases like the simple C example below.
```
using vec = float __attribute__((ext_vector_type(9)));
void f1(vec m) {
  asm volatile("" : "+r,m"(m) : : "memory");
}
```
One case that uses "+r,m" constraints for arbitrary data types in
practice is google-benchmark's DoNotOptimize.
This patch updates visitInlineAsm so that it uses MVT::Other for
constraints with complex VTs. It looks like the rest of the backend
correctly deals with that and properly legalizes the type.
And we still report an error if there are no registers to satisfy the
constraint.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D91710
This patch uses the new `getMnemonic` helper from D90039
to display mnemonics instead of the internal opcodes.
The main motivation behind using the mnemonics is that they are more
user-friendly and more directly related to the assembly the users
will be presented with.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D90040
When we see
```
xor = G_XOR xor_lhs, -1
select = G_SELECT cc, tval, xor
```
Fold this into
```
select = CSINV tval, xor_lhs, cc
```
Update select-select.mir to reflect the changes.
For now, only handle the case where the G_XOR is the false-value for the
G_SELECT. It may make more sense to handle the true-value case in post-legalizer
lowering.
Differential Revision: https://reviews.llvm.org/D90774
The G_ZEXT in these cases seems to actually come from a combine that we do but
SelectionDAG doesn't. Looking through it allows us to match "uxtw #2" addressing
modes.
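A hedged gMIR sketch of the shape being matched (register names
hypothetical):
```
%ext:gpr(s64) = G_ZEXT %idx(s32)
%cst:gpr(s64) = G_CONSTANT i64 2
%off:gpr(s64) = G_SHL %ext, %cst(s64)
%addr:gpr(p0) = G_PTR_ADD %base, %off(s64)
%val:gpr(s32) = G_LOAD %addr(p0) :: (load 4)
```
Looking through the G_ZEXT here lets the load use the
"[x_base, w_idx, uxtw #2]" register-offset form.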
Differential Revision: https://reviews.llvm.org/D91475
When we see
```
%sub = G_SUB 0, %x
%select = G_SELECT %cc, %t, %sub
```
Fold away the G_SUB by producing
```
%select = CSNEG %t, %x, cc
```
Simple IR example: https://godbolt.org/z/K8TEnh
This is valid on both sides of the select, but for now, just handle one side.
It may make more sense to handle swapping sides during post-legalizer lowering.
Differential Revision: https://reviews.llvm.org/D90723
If the scatter store is able to perform the sign/zero extend of
its index, this is folded into the instruction with refineIndexType().
Additionally, refineUniformBase() will return the base pointer and index
from an add + splat_vector.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90942
Select the following:
- G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc
- G_SELECT cc, 0, -1 -> CSINV zreg, zreg, cc
- G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc
- G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc
- G_SELECT cc, t, 1 -> CSINC t, zreg, cc
- G_SELECT cc, t, -1 -> CSINV t, zreg, cc
(IR example: https://godbolt.org/z/YfPna9)
These correspond to a bunch of the AArch64csel patterns in AArch64InstrInfo.td.
Unfortunately, it doesn't seem like we can import patterns that use NZCV like
those ones do. E.g.
```
def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV),
(CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>;
```
So we have to manually select these for now.
This replaces `selectSelectOpc` with an `emitSelect` function, which performs
these optimizations.
Differential Revision: https://reviews.llvm.org/D90701
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to an SVE stack
object in a GPR. However, if we switch over part-way through
processing an SVE tuple then part of it will be in registers and
the other part will be on the stack.
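For illustration, a hedged IR sketch of the problem case (signature
hypothetical): the single vectors %v0-%v6 occupy z0-z6, leaving only
one Z register for the two-register tuple %t.
```
define void @f(<vscale x 4 x i32> %v0, <vscale x 4 x i32> %v1,
               <vscale x 4 x i32> %v2, <vscale x 4 x i32> %v3,
               <vscale x 4 x i32> %v4, <vscale x 4 x i32> %v5,
               <vscale x 4 x i32> %v6, <vscale x 8 x i32> %t) {
  ; Without the fix, %t would be split between z7 and the stack.
  ret void
}
```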
I've fixed this by ensuring that:
1. When we don't have enough registers to allocate the whole block
we mark any remaining SVE registers temporarily as allocated.
2. We temporarily remove the InConsecutiveRegs flags from the last
tuple part argument and reinvoke the autogenerated calling
convention handler. Doing this prevents the code from entering
an infinite recursion and, in combination with 1), ensures we
switch over to the Indirect mode.
3. After allocating a GPR register for the pointer to the tuple we
then deallocate any SVE registers we marked as allocated in 1).
We also set the InConsecutiveRegs flags back to how they were before.
4. I've changed the AArch64ISelLowering LowerCALL and
LowerFormalArguments functions to detect the start of a tuple,
which involves allocating a single stack object and doing the
correct number of legal loads and stores.
Differential Revision: https://reviews.llvm.org/D90219
When there is full fp16 support, there is no reason to widen 16-bit
G_FCONSTANTs to 32 bits. Mark them as legal in this case.
Also, we currently import a pattern for materializing a 16-bit 0.0.
Add a testcase showing we select it.
(All other 16-bit G_FCONSTANTs are not yet selected.)
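A gMIR sketch of the constant that now stays 16-bit (0xH0000 is the
IR spelling of half 0.0):
```
%cst:fpr(s16) = G_FCONSTANT half 0xH0000
```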
Differential Revision: https://reviews.llvm.org/D89164
The manual selection code for add/sub was not checking if it was possible to
fold in shifts + extends (the *rx opcode variants).
As a result, we could never select things like
```
cmp x1, w0, uxtw #2
```
This is because we don't import any patterns for compares.
This adds support for the arithmetic shifted register forms and updates tests
for instructions selected using `emitADD`, `emitADDS`, and `emitSUBS`.
This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.
Differential Revision: https://reviews.llvm.org/D91207
Previously, we only handled negative arithmetic immediates in the imported
selector code.
Since we don't import code for, say, compares, we were missing opportunities
for things like
```
%cst:gpr(s64) = G_CONSTANT i64 -10
%cmp:gpr(s32) = G_ICMP intpred(eq), %reg0(s64), %cst
->
%adds = ADDSXri %reg0, 10, 0, implicit-def $nzcv
%cmp = CSINCWr $wzr, $wzr, 1, implicit $nzcv
```
Instead, we would have to materialize the constant and emit a SUBS.
This adds support for selection like above for SUB, SUBS, ADD, and ADDS.
This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.
Differential Revision: https://reviews.llvm.org/D91108
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)
Changes included in this patch:
- Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
- Added the getCanonicalIndexType function to convert redundant addressing
modes (e.g. scaling is redundant when accessing bytes)
- Tests with 32 & 64-bit scaled & unscaled offsets
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D90941
These do things like turn a multiply of a pow-2+1 into a shift and an add,
which is a common pattern that pops up, and is universally better than expensive
madd instructions with a constant.
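For example, a gMIR sketch of the combine for a multiply by
9 = 2^3 + 1:
```
%cst:_(s64) = G_CONSTANT i64 9
%mul:_(s64) = G_MUL %x, %cst
; ==>
%shift:_(s64) = G_CONSTANT i64 3
%shl:_(s64) = G_SHL %x, %shift(s64)
%mul:_(s64) = G_ADD %shl, %x
```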
I've added check lines to an existing codegen test since the code
being ported is almost identical; however, the mul-by-negative-pow2
constant tests don't generate the same code because we're still
missing some generic G_MUL combines.
Differential Revision: https://reviews.llvm.org/D91125
For example, if the sign extension is only used by a TBZ and the value is used elsewhere with a zero extension, this can eliminate the sign extension.
Reviewed By: samparker
Differential Revision: https://reviews.llvm.org/D90606
FastISel generates instructions to materialize "local values" at the
top of a block, in the hope that these values could be reused within
the block. To reduce spills and restores, FastISel treats calls as
sub-block boundaries, flushing the "local value map" at each call.
This patch treats the mem* intrinsics as if they were calls, because
at O0 generally they are calls. Eliminating these spills/restores is
actually better for debugging (especially a "continue at this line"
command), code size, stack frame size, and maybe even performance.
Differential Revision: https://reviews.llvm.org/D90877
This is to help review the impact of https://reviews.llvm.org/D89952,
which allows targets to fine-tune what SelectionDAG does when vector
CTPOP is not legal.
Add support for the Neoverse V1 CPU to the ARM and AArch64 backends.
This is based on patches from Mark Murray and Victor Campos.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D90765
These were previously handled by pattern matching shuffles in the selector, but
adding a new opcode and making it equivalent to the AArch64duplane SDAG node
allows us to select more patterns, like lane-indexed FMLAs (a patch
adding a test for that will be committed later).
The pattern matching code has been simply moved to postlegalize lowering.
Differential Revision: https://reviews.llvm.org/D90820
Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247.
Differential Revision: https://reviews.llvm.org/D90644
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.
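A hedged IR sketch (function name hypothetical) of a predicate
reduction that now lowers via ptest:
```
declare i1 @llvm.vector.reduce.or.nxv16i1(<vscale x 16 x i1>)

define i1 @reduce_or(<vscale x 16 x i1> %p) {
  ; VECREDUCE_OR on a predicate type lowers to a ptest plus a cset.
  %r = call i1 @llvm.vector.reduce.or.nxv16i1(<vscale x 16 x i1> %p)
  ret i1 %r
}
```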
Lowering scalable floating-point reductions will be addressed in a
follow-up patch; for now these will hit the assertion added to
expandVecReduce() in TargetLowering.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D89382
- Basically, iterate over each pair of memory operands from the two
instructions and return true if any pair may alias.
- The exception is memory instructions without any memory operands:
they may touch everything and could alias with any memory instruction.
Differential Revision: https://reviews.llvm.org/D89447
For the <2 x float> case, instead of adding another combine or legalization to
get it into a <4 x float> form, I'm just adding a GISel specific selection
pattern to cover it.
Differential Revision: https://reviews.llvm.org/D90699
This caused an explosion in ICF times during linking on Windows when
libfuzzer instrumentation is enabled. For a small binary we see ICF
time go from ~0 to ~10 s. For a large binary it goes from ~1 s to
forever (I gave up after 30 minutes).
See comment on the code review.
> If we are going to write handler data (that is written as variable
> length data following after the unwind info in .xdata), we need to
> emit the handler data immediately, but for cases where no such
> info is going to be written, skip emitting it right away. (Unwind
> info for all remaining functions that hasn't gotten it emitted
> directly is emitted at the end.)
>
> This does slightly change the ordering of sections (triggering a
> bunch of updates to DebugInfo/COFF tests), but the change should be
> benign.
>
> This also matches GCC's assembly output, which doesn't output
> .seh_handlerdata unless it actually is needed.
>
> For ARM64, the unwind info can be packed into the runtime function
> entry itself (leaving no data in the .xdata section at all), but
> that can only be done if there's no follow-on data in the .xdata
> section. If emission of the unwind info is triggered via
> EmitWinEHHandlerData (or the .seh_handlerdata directive), which
> implicitly switches to the .xdata section, there's a chance of the
> caller wanting to pass further data there, so the packed format
> can't be used in that case.
>
> Differential Revision: https://reviews.llvm.org/D87448
This reverts commit 36c64af9d7.
This patch adds a bunch of CHECK lines to guard against implicit
conversions of TypeSize -> uint64_t occurring in code-paths that previously
were safe for scalable vectors.
Adds patterns to catch masks preceding a long multiply and generate a
single umull/smull instruction instead.
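An IR sketch of the pattern (hedged example): both multiply operands
are masked to their low 32 bits, so a single umull suffices.
```
define i64 @umull(i64 %a, i64 %b) {
  ; The masks prove the operands fit in 32 bits, so this can select
  ; "umull x0, w0, w1" instead of two ands plus a madd/mul.
  %al = and i64 %a, 4294967295
  %bl = and i64 %b, 4294967295
  %m = mul i64 %al, %bl
  ret i64 %m
}
```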
Differential Revision: https://reviews.llvm.org/D89956
We added a new load/store cluster algorithm in D85517. However,
AArch64 sees some compile-time degradation with the new algorithm, as
IsReachable() is not cheap if the DAG is complex; it is O(M+N). See
https://bugs.llvm.org/show_bug.cgi?id=47966.
So this patch adds a heuristic to switch to the old cluster algorithm
if the DAG is too complex.
Reviewed By: Owen Anderson
Differential Revision: https://reviews.llvm.org/D90144