llvm-project

Commit Graph

Author	SHA1	Message	Date
Thomas Preud'homme	3ba246719b	[test, AArch64] Fix use of var defined in CHECK-NOT LLVM test CodeGen/AArch64/speculation-hardening.ll tries to check for the absence of a sequence of instructions with several CHECK-NOT with one of those directives using a variable defined in another. However CHECK-NOT are checked independently so that is using a variable defined in a pattern that should not occur in the input. This commit removes the dependency between those CHECK-NOT by replacing single occurence of the undefined variable by a regex match, and multiple occurences by a definition followed by a use. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D99866	2021-04-06 21:15:15 +01:00
Thomas Preud'homme	638d70be6b	[test, AArch64] Fix use of var defined in CHECK-NOT LLVM test CodeGen/AArch64/aarch64-tbz.ll tries to check for the absence of a sequence of instructions with several CHECK-NOT with one of those directives using a variable defined in another. However CHECK-NOT are checked independently so that is using a variable defined in a pattern that should not occur in the input. This commit removes the definition and uses of variable to check each line independently, making the check stronger than the current one. It also removes unnecessary regex match for labels. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99602	2021-04-06 10:45:08 +01:00
Sjoerd Meijer	d5f1131c81	[AArch64] Default to zero-cycle-zeroing FP registers It is generally beneficial to prefer "movi d0, #0" over "fmov s0, wzr" as this is most efficient across all cores; it is recognised as a zeroing idiom. For newer cores, fmov instructions can also be eliminated early and there is no difference with movi, but some implementations lack this so is not true for other/older cores. Thus this standardises on using movi as this should always gives the same or better performance than the fmov with wzr. Differential Revision: https://reviews.llvm.org/D99586	2021-04-06 09:47:50 +01:00
Sjoerd Meijer	ef05b08c61	[AArch64] Use 64-bit movi for zeroing halfs/floats This was using the .2d variant which zeros 128 bits, but using the .2s variant that zeros 64 bits is faster on some cores. This is a prep step for D99586 to always using movi for zeroing floats. Differential Revision: https://reviews.llvm.org/D99710	2021-04-06 08:42:13 +01:00
Nikita Popov	665065821e	[FastISel] Remove kill tracking This is a followup to D98145: As far as I know, tracking of kill flags in FastISel is just a compile-time optimization. However, I'm not actually seeing any compile-time regression when removing the tracking. This probably used to be more important in the past, before FastRA was switched to allocate instructions in reverse order, which means that it discovers kills as a matter of course. As such, the kill tracking doesn't really seem to serve a purpose anymore, and just adds additional complexity and potential for errors. This patch removes it entirely. The primary changes are dropping the hasTrivialKill() method and removing the kill arguments from the emitFast methods. The rest is mechanical fixup. Differential Revision: https://reviews.llvm.org/D98294	2021-04-03 15:50:13 +02:00
Brendon Cahoon	09a88278cb	[GlobalISel] Allow different types for G_SBFX and G_UBFX operands Change the definition of G_SBFX and G_UBFX so that the lsb and width can have different types than the src and dst operands. Differential Revision: https://reviews.llvm.org/D99739	2021-04-02 11:11:06 -04:00
Jun Ma	2dfa2c0ea0	[NFC][SVE] update sve-intrinsics-int-arith.ll under update_llc_test_checks.py	2021-04-02 20:17:11 +08:00
Jun Ma	274ac9d40e	[AArch64][SVE] Lowering sve.dot to DOT node Differential Revision: https://reviews.llvm.org/D99699	2021-04-02 20:05:17 +08:00
Mircea Trofin	ce61def529	[regalloc] Ensure Query::collectInterferringVregs is called before interval iteration The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs() needs to be called before iterating the interfering live ranges. The rest of the patch offers support that is the case: instead of clearing the query's InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix is invalidated (existing triggering mechanism). Without the change in RegAllocGreedy.cpp, the compiler ices. This patch should make it more easily discoverable by developers that collectInterferringVregs needs to be called before iterating. I will follow up with a subsequent patch to improve the usability and maintainability of Query. Differential Revision: https://reviews.llvm.org/D98232	2021-04-01 08:33:28 -07:00
Bradley Smith	2f45e632c0	[AArch64][SVE] Improve codegen for select nodes with fixed types Additionally, move the existing fixed vselect tests to *-vselect.ll. Differential Revision: https://reviews.llvm.org/D99418	2021-04-01 15:54:37 +01:00
Bradley Smith	0934fa4f5d	[AArch64][SVE] SVE functions should use the SVE calling convention for fast calls When an SVE function calls another SVE function using the C calling convention we use the more efficient SVE VectorCall PCS. However, for the Fast calling convention we're incorrectly falling back to the generic AArch64 PCS. This patch adds the same "can use SVE vector calling convention" detection used by CallingConv::C to CallingConv::Fast. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D99657	2021-04-01 15:52:08 +01:00
Simonas Kazlauskas	777a58e05b	Support {S,U}REMEqFold before legalization This allows these optimisations to apply to e.g. `urem i16` directly before `urem` is promoted to i32 on architectures where i16 operations are not intrinsically legal (such as on Aarch64). The legalization then later can happen more directly and generated code gets a chance to avoid wasting time on computing results in types wider than necessary, in the end. Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88785	2021-04-01 01:35:41 +03:00
Bradley Smith	4e52daa254	[AArch64][SVE] Add tests for UREM/SREM using fixed SVE types Differential Revision: https://reviews.llvm.org/D99265	2021-03-31 16:09:55 +01:00
Florian Hahn	52e015081a	[AArch64] Avoid SCALAR_TO_VECTOR for single FP constant vector. Currently the code only checks for integer constants (ConstantSDNode) and triggers an infinite cycle for single-element floating point vector constants. We need to check for both FP and integer constants. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D99384	2021-03-31 10:17:36 +01:00
Amara Emerson	a35c2c7942	[GlobalISel] Implement fewerElements legalization for vector reductions. This patch adds 3 methods, one for power-of-2 vectors which use tree reductions using vector ops, before a final reduction op. For non-pow-2 types it generates multiple narrow reductions and combines the values with scalar ops. Differential Revision: https://reviews.llvm.org/D97163	2021-03-30 11:19:21 -07:00
Amara Emerson	1bc90847ee	[AArch64][GlobalISel] Define some legalization rules for G_ROTR and G_ROTL. For imported pattern purposes, we have a custom rule that promotes the rotate amount to 64b as well. Differential Revision: https://reviews.llvm.org/D99463	2021-03-30 11:11:19 -07:00
Amara Emerson	91887cd4ec	[AArch64][GlobalISel] Combine funnel shifts to rotates. Differential Revision: https://reviews.llvm.org/D99388	2021-03-30 11:00:36 -07:00
Jessica Paquette	700431128e	[GlobalISel][AArch64] Combine G_SEXT_INREG + right shift -> G_SBFX Basically a port of isBitfieldExtractOpFromSExtInReg in AArch64ISelDAGToDAG. This is only done post-legalization for now. Once the legalizer knows how to decompose these back into shifts, this requirement can probably be removed. Differential Revision: https://reviews.llvm.org/D99230	2021-03-30 10:14:30 -07:00
Joe Ellis	a7dde4c5f7	[AArch64][SVE] Lower fixed length INSERT_VECTOR_ELT Differential Revision: https://reviews.llvm.org/D98496	2021-03-30 09:37:11 +00:00
Joe Ellis	c4d39f64d0	[AArch64][SVE] Lower fixed length EXTRACT_VECTOR_ELT Differential Revision: https://reviews.llvm.org/D98625	2021-03-30 09:35:44 +00:00
Jun Ma	1af373c673	[AArch64][SVE] Codegen dup_lane for dup(vector_extract) Differential Revision: https://reviews.llvm.org/D99324	2021-03-30 10:35:08 +08:00
Jun Ma	b0db2dbc29	[AArch64][SVEIntrinsicOpts] Optimize tbl+dup into dup+extractelement Differential Revision: https://reviews.llvm.org/D99412	2021-03-30 10:35:08 +08:00
Jessica Paquette	247ff26a89	[AArch64][GlobalISel] NFC: Replace IR regbankselect test with MIR test regbank-ceil.ll -> regbank-ceil.mir The IR test was intended to only check register banks. This makes it brittle, especially as we improve load/store combines in GlobalISel. Rewriting this as a MIR test also makes it more consistent with the rest of the testcases in GlobalISel.	2021-03-29 16:32:34 -07:00
Florian Hahn	482283042f	[AArch64] Remove custom zext/sext legalization code. Currently performExtendCombine assumes that the src-element bitwidth * 2 is a valid MVT. But this is not the case for i1 and it causes a crash on the v64i1 test cases added in this patch. It turns out that this code appears to not be needed; the same patterns are handled by other code and we end up with the same results, even without the custom lowering. I also added additional test cases in `a50037aaa6`. Let's just remove the unneeded code. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99437	2021-03-29 22:22:05 +01:00
Florian Hahn	a50037aaa6	[AArch64] Add a few more vector extension tests.	2021-03-29 18:56:00 +01:00
Bradley Smith	9745dce8c3	[SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps This is currently performed in SelectionDAGLegalize, here we make it also happen in LegalizeVectorOps, allowing a target to lower the SETCC condition codes first in LegalizeVectorOps and then lower to a custom node afterwards, without having to duplicate all of the SETCC condition legalization in the target specific lowering. As a result of this, fixed length floating point SETCC nodes can now be properly lowered for SVE. Differential Revision: https://reviews.llvm.org/D98939	2021-03-29 15:32:25 +01:00
Matt Arsenault	2f779e79d5	AArch64/GlobalISel: Remove IR section from test	2021-03-28 11:12:59 -04:00
Amara Emerson	55533203d7	[GlobalISel] Add G_ROTR and G_ROTL opcodes for rotates. Differential Revision: https://reviews.llvm.org/D99383	2021-03-25 17:23:30 -07:00
Jessica Paquette	23f657c165	[AArch64][GlobalISel] Emit bzero on Darwin Darwin platforms for both AArch64 and X86 can provide optimized `bzero()` routines. In this case, it may be preferable to use `bzero` in place of a memset of 0. This adds a G_BZERO generic opcode, similar to G_MEMSET et al. This opcode can be generated by platforms which may want to use bzero. To emit the G_BZERO, this adds a pre-legalize combine for AArch64. The conditions for this are largely a port of the bzero case in `AArch64SelectionDAGInfo::EmitTargetCodeForMemset`. The only difference in comparison to the SelectionDAG code is that, when compiling for minsize, this will fire for all memsets of 0. The original code notes that it's not beneficial to do this for small memsets; however, using bzero here will save a mov from wzr. For minsize, I think that it's preferable to prioritise omitting the mov. This also fixes a bug in the libcall legalization code which would delete instructions which could not be legalized. It also adds a check to make sure that we actually get a libcall name. Code size improvements (Darwin): - CTMark -Os: -0.0% geomean (-0.1% on pairlocalalign) - CTMark -Oz: -0.2% geomean (-0.5% on bullet) Differential Revision: https://reviews.llvm.org/D99358	2021-03-25 17:14:25 -07:00
Amara Emerson	0d2c4db637	[GlobalISel] Fix crash in RBS with a non-generic IMPLICIT_DEF. This may occur when swifterror codegen in the translator generates these, but we shouldn't try to handle them since they should have regclasses anyway. rdar://75784009 Differential Revision: https://reviews.llvm.org/D99287	2021-03-24 23:08:51 -07:00
Jessica Paquette	a141c7d06b	[AArch64][GlobalISel] Select G_SBFX and G_UBFX Add selection support for G_SBFX and G_UBFX and add a test. These must always have a constant LSB and width. Differential Revision: https://reviews.llvm.org/D99224	2021-03-24 11:15:57 -07:00
Jessica Paquette	1818dc394f	[AArch64][GlobalISel] Mark G_SBFX/G_UBFX as legal for s32 and s64 This isn't perfect, since we should also verify that these only use constants. Differential Revision: https://reviews.llvm.org/D99219	2021-03-24 11:08:41 -07:00
Nashe Mncube	ac2a1e9596	[SVE] Suppress vselect warning from incorrect interface call The VSelectCombine handler within AArch64ISelLowering, uses an interface call which only expects fixed vectors. This generates a warning when the call is made on a scalable vector. This warning has been suppressed with this change, by using the ElementCount interface, which supports both fixed and scalable vectors. I have also added a regression test which recreates the warning. Differential Revision: https://reviews.llvm.org/D98249	2021-03-24 14:34:34 +00:00
Amara Emerson	45a7fe1911	[AArch64][GlobalISel] Add test for G_FSHR legalization.	2021-03-23 16:11:45 -07:00
Amara Emerson	7bddf00581	[AArch64][GlobalISel] Lower G_FSHL and G_FSHR. Codegen isn't as good as we need it, but that'll be done later.	2021-03-23 16:09:19 -07:00
Amara Emerson	75b6a47bd0	[AArch64][GlobalISel] Lower G_CTLZ_ZERO_UNDEF. This adds some missing legalizer tests, which uncovered a v2s64 selection test that wasn't working since there's no legalization or instruction for that.	2021-03-23 12:49:10 -07:00
David Sherwood	748ae5281d	[IR][SVE] Add new llvm.experimental.stepvector intrinsic This patch adds a new llvm.experimental.stepvector intrinsic, which takes no arguments and returns a linear integer sequence of values of the form <0, 1, ...>. It is primarily intended for scalable vectors, although it will work for fixed width vectors too. It is intended that later patches will make use of this new intrinsic when vectorising induction variables, currently only supported for fixed width. I've added a new CreateStepVector method to the IRBuilder, which will generate a call to this intrinsic for scalable vectors and fall back on creating a ConstantVector for fixed width. For scalable vectors this intrinsic is lowered to a new ISD node called STEP_VECTOR, which takes a single constant integer argument as the step. During lowering this argument is set to a value of 1. The reason for this additional argument at the codegen level is because in future patches we will introduce various generic DAG combines such as mul step_vector(1), 2 -> step_vector(2) add step_vector(1), step_vector(1) -> step_vector(2) shl step_vector(1), 1 -> step_vector(2) etc. that encourage a canonical format for all targets. This hopefully means all other targets supporting scalable vectors can benefit from this too. I've added cost model tests for both fixed width and scalable vectors: llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll as well as codegen lowering tests for fixed width and scalable vectors: llvm/test/CodeGen/AArch64/neon-stepvector.ll llvm/test/CodeGen/AArch64/sve-stepvector.ll See this thread for discussion of the intrinsic: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html	2021-03-23 10:43:35 +00:00
Joe Ellis	6dc32da1b0	[AArch64][SVE] Test more types in sve-fixed-length-subvector.ll Previously only the i32 type was tested. Now, the {i,f}{16,32,64} types are tested. The v8{i,f}16 cases lower differently to the other cases, which is worth defending. The lowering for the other cases is currently identical, but probably worth having for the better coverage. Differential Revision: https://reviews.llvm.org/D98690	2021-03-22 14:09:05 +00:00
Sjoerd Meijer	7515e81e8c	[AArch64] Add some float -> int -> float conversion patterns This adds some conversion match patterns for which we want to keep the int values in FP registers using the corresponding NEON instructions (not the FP instructions) to avoid more costly int <-> fp register transfers. Differential Revision: https://reviews.llvm.org/D98956	2021-03-22 11:06:08 +00:00
Jessica Paquette	0ca83730cc	Recommit "[AArch64][GlobalISel] Fold constants into G_GLOBAL_VALUE" This reverts commit `962b73dd0f`. This commit was reverted because of some internal SPEC test failures. It turns out that this wasn't actually relevant to anything in open source, so it's safe to recommit this.	2021-03-18 16:01:02 -07:00
Peter Waller	0d6482a76a	[llvm][AArch64][SVE] Lower fixed length vector fabs Seemingly striaghtforward. Differential Revision: https://reviews.llvm.org/D98434	2021-03-18 17:20:08 +00:00
Matt Arsenault	61f834cc09	GlobalISel: Insert memcpy for outgoing byval arguments byval requires an implicit copy between the caller and callee such that the callee may write into the stack area without it modifying the value in the parent. Previously, this was passing through the raw pointer value which would break if the callee wrote into it. Most of the time, this copy can be optimized out (however we don't have the optimization SelectionDAG does yet). This will trigger more fallbacks for AMDGPU now, since we don't have legalization for memcpy yet (although we should stop using byval anyway).	2021-03-18 09:16:54 -04:00
Thomas Preud'homme	b79044391e	[test] Fix incorrect use of string variable use LLVM test CodeGen/AArch64/machine-outliner-retaddr-sign-thunk.ll uses a string substitution block that contains a regex matching block. This seems like as a copy/paste from other similar test where the match also defines a variable, hence the [[]] syntax. In this case however this is a CHECK-NOT variable so nothing should match. No variable definition is thus expected and the square brackets can be dropped. Reviewed By: chill Differential Revision: https://reviews.llvm.org/D98853	2021-03-18 12:19:51 +00:00
Sjoerd Meijer	90ecb862a0	[AArch64] Rewrite (add, csel) to cinc Don't rewrite an add instruction with 2 SET_CC operands into a csel instruction. The total instruction sequence uses an extra instruction and register. Preventing this allows us to match a `(add, csel)` pattern and rewrite this into a `cinc`. Differential Revision: https://reviews.llvm.org/D98704	2021-03-18 08:49:27 +00:00
Amara Emerson	28963d895b	[GlobalISel] Don't DCE LIFETIME_START/LIFETIME_END markers. These are pseudos without any users, so DCE was killing them in the combiner. Marking them as having side effects doesn't seem quite right since they don't. Gives a nice 0.3% geomean size win on CTMark -Os. Differential Revision: https://reviews.llvm.org/D98811	2021-03-17 18:02:08 -07:00
Amara Emerson	d7fed7b899	[AArch64][GlobalISel] Fall back if disabling neon/fp in the translator. The previous technique relied on early-exiting the legalizer predicate initialization, leaving an empty rule table. That causes a fallback for most instructions, but some have legacy rules defined like G_ZEXT which can try continue, but then crash. We should fall back earlier, in the translator, to avoid this issue. Differential Revision: https://reviews.llvm.org/D98730	2021-03-17 15:08:08 -07:00
Pavel Iliin	bd79b565e3	[NFC][AArch64] Add codegen tests for various csinc-cmp sequences.	2021-03-17 20:17:40 +00:00
Bradley Smith	cf0da91ba5	[AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE Previously NEON used a target specific intrinsic for frintn, given that the FROUNDEVEN ISD node now exists, move over to that instead and add codegen support for that node for both NEON and fixed length SVE. Differential Revision: https://reviews.llvm.org/D98487	2021-03-17 11:41:22 +00:00
Joe Ellis	ff2dd8a212	[AArch64][SVE] Fold vector ZExt/SExt into gather loads where possible This commit folds sxtw'd or uxtw'd offsets into gather loads where possible with a DAGCombine optimization. As an example, the following code: 1 #include <arm_sve.h> 2 3 svuint64_t func(svbool_t pred, const int32_t *base, svint64_t offsets) { 4 return svld1sw_gather_s64offset_u64( 5 pred, base, svextw_s64_x(pred, offsets) 6 ); 7 } would previously lower to the following assembly: sxtw z0.d, p0/m, z0.d ld1sw { z0.d }, p0/z, [x0, z0.d] ret but now lowers to: ld1sw { z0.d }, p0/z, [x0, z0.d, sxtw] ret Differential Revision: https://reviews.llvm.org/D97858	2021-03-16 15:09:46 +00:00
Joe Ellis	14bd44edc6	[AArch64][SVEIntrinsicOpts] Factor out redundant SVE mul/fmul intrinsics This commit implements an IR-level optimization to eliminate idempotent SVE mul/fmul intrinsic calls. Currently, the following patterns are captured: fmul pg (dup_x 1.0) V => V mul pg (dup_x 1) V => V fmul pg V (dup_x 1.0) => V mul pg V (dup_x 1) => V fmul pg V (dup v pg 1.0) => V mul pg V (dup v pg 1) => V The result of this commit is that code such as: 1 #include <arm_sve.h> 2 3 svfloat64_t foo(svfloat64_t a) { 4 svbool_t t = svptrue_b64(); 5 svfloat64_t b = svdup_f64(1.0); 6 return svmul_m(t, a, b); 7 } will lower to a nop. This commit does not capture all possibilities; only the simple cases described above. There is still room for further optimisation. Differential Revision: https://reviews.llvm.org/D98033	2021-03-16 14:50:17 +00:00

1 2 3 4 5 ...

4505 Commits