llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	6cb0d23f2e	AArch64/GlobalISel: Narrow stack passed argument access size This fixes a verifier error in the testcase from bug 47619. The stack passed s3 value was widened to 4 bytes, producing a 4-byte memory access with a < 1 byte result type. We need to either widen the result type or narrow the access size. This copies the code directly from the AMDGPU handling, which narrows the load size. I don't like that every target has to handle this, but this is currently broken on the 11 release branch and this is the simplest fix. This reverts commit `42bfa7c63b`.	2020-09-25 13:35:17 -04:00
Amara Emerson	b5e87c9ef2	[AArch64][GlobalISel] Add selection support for <8 x s16> G_INSERT_VECTOR_ELT with GPR scalar. Fixes the neon intrinsics test in the test suite.	2020-09-25 09:51:04 -07:00
Cameron McInally	e2ccf7f178	[SVE] Lower fixed length VECREDUCE_[SMAX|SMIN] to Scalable This patch is pretty similar to the VECREDUCE_ADD patch, with some minor tweaks. Results from the AArch64ISD::[SMAX|SMIN]V_PRED return element sized results. This requires an ANY_EXTEND for results < 32-bits, since Legalization promotes those results. There is no NEON i64 vector support for SMAXV|SMINV, so use SVE for those. Differential Revision: https://reviews.llvm.org/D88259	2020-09-25 09:58:17 -05:00
Momchil Velikov	a88c722e68	[AArch64] PAC/BTI code generation for LLVM generated functions PAC/BTI-related codegen in the AArch64 backend is controlled by a set of LLVM IR function attributes, added to the function by Clang, based on command-line options and GCC-style function attributes. However, functions generated in the LLVM middle end (for example, asan.module.ctor or __llvm_gcov_write_out) do not get any attributes and the backend incorrectly does not do any PAC/BTI code generation. This patch records the default state of PAC/BTI codegen in a set of LLVM IR module-level attributes, based on command-line options: * "sign-return-address", with non-zero value means generate code to sign return addresses (PAC-RET), zero value means disable PAC-RET. * "sign-return-address-all", with non-zero value means enable PAC-RET for all functions, zero value means enable PAC-RET only for functions which spill LR. * "sign-return-address-with-bkey", with non-zero value means use B-key for signing, zero value means use A-key. This set of attributes is always added for AArch64 targets (as opposed, for example, to interpreting a missing attribute as having a value 0) in order to be able to check for conflicts when combining module attributes during LTO. Module-level attributes are overridden by function-level attributes. All the decision making about whether or not to generate PAC and/or BTI code is factored out into AArch64FunctionInfo; there shouldn't be any places left, other than AArch64FunctionInfo, which directly examine PAC/BTI attributes, except AArch64AsmPrinter.cpp, which is/will be handled by a separate patch. Differential Revision: https://reviews.llvm.org/D85649	2020-09-25 11:47:14 +01:00
Simon Pilgrim	42bfa7c63b	Revert rGe55410f8b260 : "AArch64/GlobalISel: Add testcase for bug 47619" This reverts commit `e55410f8b2`. This is failing on EXPENSIVE_CHECKS buildbots	2020-09-25 11:31:14 +01:00
Amara Emerson	f7b36b35b6	[AArch64][GlobalISel] Manually select G_DUP with s8/s16 gpr scalar operands. These don't get selected by the imported patterns, and avoiding generating them is a whole load of not-worth-it-hassle (until we have fp types in GlobalISel).	2020-09-25 01:59:16 -07:00
Amara Emerson	ade6fa46f9	[AArch64][GlobalISel] Make <8 x s16> for G_INSERT_VECTOR_ELT legal.	2020-09-25 01:59:16 -07:00
Daniel Kiss	2a96f47c5f	[AArch64] __builtin_return_address for PAuth. This change adds the support for __builtin_return_address for ARMv8.3A Pointer Authentication. Location of the authentication code in the pointer depends on the system configuration, therefore a dedicated instruction is used for effectively removing the authentication code without authenticating the pointer. Reviewed By: chill Differential Revision: https://reviews.llvm.org/D75044	2020-09-24 23:23:49 +02:00
Matt Arsenault	e55410f8b2	AArch64/GlobalISel: Add testcase for bug 47619 This is asserting on the 11 release branch, and wasn't covered by existing tests at the time. This was fixed by `b98f902f18`.	2020-09-24 15:44:26 -04:00
Simon Pilgrim	bdd6af3a58	[AArch64] Regenerate dag-numsignbits.ll checks To improve the codegen diff in D87502	2020-09-24 18:40:49 +01:00
Momchil Velikov	bd44558001	[AArch64][GlobalISel] Implement __builtin_return_address for PAC-RET This patch implements stripping of the PAC in the return address for GlobalISel. Implementation for when not using GlobalISel is in https://reviews.llvm.org/D75044 The analogous GCC patch is https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=a70d5d81c41048556fd86eaa1036018a6bfba115 Differential Revision: https://reviews.llvm.org/D84502	2020-09-24 18:04:37 +01:00
Simon Pilgrim	a815578c31	[AArch64] Regenerate dag-combine-mul-shl.ll checks	2020-09-24 13:42:03 +01:00
Eli Friedman	3f739f736b	[SelectionDAG][GISel] Make LegalizeDAG lower FNEG using integer ops. Previously, if a floating-point type was legal, but FNEG wasn't legal, we would use FSUB. Instead, we should use integer ops, to preserve the semantics. (Alternatively, there's a compiler-rt call we could use, but there isn't much reason to use that.) It turns out we actually are still using this obscure codepath in a few cases: on some targets, we have "legal" floating-point types that don't actually support any floating-point operations. In particular, ARM and AArch64 are using this path. The implementation for SelectionDAG is pretty simple because we can reuse the infrastructure from FCOPYSIGN. See also `9a3dc3e`, the corresponding change to type legalization. Also includes a "bonus" change to STRICT_FSUB legalization, so we can lower a STRICT_FSUB to a float libcall. Includes the changes to both LegalizeDAG and GlobalISel so we don't have inconsistent results in the future. Fixes https://bugs.llvm.org/show_bug.cgi?id=46792 . Differential Revision: https://reviews.llvm.org/D84287	2020-09-23 14:10:33 -07:00
Cameron McInally	e8413ac97f	[AArch64] Expand some vector of i64 reductions on NEON With the exception of VECREDUCE_ADD, there are no NEON instructions to support vector of i64 reductions. This patch removes the Custom lowerings for those and adds some test coverage to confirm. Differential Revision: https://reviews.llvm.org/D88161	2020-09-23 16:01:24 -05:00
Eli Friedman	b92d084910	[AArch64][SVE] Fix frame offset calculation when d8 is saved. If d8 is saved, the fp is not actually adjacent to the SVE spills/allocations. Fix the offset calculation to account for this. Differential Revision: https://reviews.llvm.org/D88117	2020-09-23 11:33:53 -07:00
Andrew Wei	c2deacd929	[AArch64] Fix ldst optimization of non-immediate store offset When matching store instruction for ldst opt, we should make sure store instr is in 'reg+imm' form as load instr, otherwise, it will have assertion in isLdOffsetInRangeOfSt since it will use getImm() directly. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87905	2020-09-23 23:00:13 +08:00
Cameron McInally	db40a74344	[SVE] Lower fixed length ISD::VECREDUCE_ADD to Scalable Differential Revision: https://reviews.llvm.org/D87796	2020-09-23 09:08:07 -05:00
Matt Arsenault	c463fd136e	GlobalISel: Fix truncating shift amount in trunc (shl) combine The shift amount type does not necessarily match the result type. This was inserting a trunc from s32 to s32, which asserted. Just preserve the original shift amount type which can be legalized later.	2020-09-23 09:07:50 -04:00
Kerry McLaughlin	d0149ba9b4	[SVE][CodeGen] Lower legal integer -> floating point conversions This patch adds new ISD nodes, SCVTZ_MERGE_PASSTHRU & UCVTZ_MERGE_PASSTHRU, which are used to lower both legal scalable vector [S|U]INT_TO_FP operations and the following intrinsics: - llvm.aarch64.sve.scvtf - llvm.aarch64.sve.ucvtf Reviewed By: sdesmalen, efriedma Differential Revision: https://reviews.llvm.org/D87913	2020-09-23 11:53:53 +01:00
Philip Reames	e1a3271ebb	[AArch64] Teach analyzeBranch to remove branch equivalent to fallthrough The motivation here is that MachineBlockPlacement relies on analyzeBranch to remove branches to fallthrough blocks when the branch is not fully analyzable. With the introduction of the FAULTING_OP pseudo for implicit null checking (see D87861), this case becomes important. Note that it's hard to otherwise exercise this path as BranchFolding handles any fully analyzable branch sequence without using this interface. p.s. For anyone who saw my comment in the original review, what I thought was an issue in BranchFolding originally turned out to simply be a bug in my patch. (Now fixed.) Differential Revision: https://reviews.llvm.org/D88035	2020-09-22 14:38:27 -07:00
Congzhe Cao	4edb3d3646	[AArch64] Avoid pairing loads with same result reg When pairing ldr instructions to an ldp instruction, we cannot pair two ldr destination registers where one is a sub or super register of the other. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D86906	2020-09-22 16:25:08 -04:00
Muhammad Omair Javaid	73a6a164b8	Revert "Reapply Revert "RegAllocFast: Rewrite and improve"" This reverts commit `55f9f87da2`. Breaks following buildbots: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4306 http://lab.llvm.org:8011/builders/lldb-aarch64-ubuntu/builds/9154	2020-09-22 14:40:06 +05:00
Amara Emerson	e3f5046e44	[AArch64][GlobalISel] Merge selection of vector-vector G_ASHR/G_LSHR and support more cases. The vector-immediate cases are handled elsewhere in an earlier commit.	2020-09-21 16:04:52 -07:00
Amara Emerson	a513fdec90	[AArch64][GlobalISel] Add a post-legalize combine for lowering vector-immediate G_ASHR/G_LSHR. In order to select the immediate forms using the imported patterns, we need to lower them into new G_VASHR/G_VLSHR target generic ops. Add a combine to do this matching build_vector of constant operands. With this, we get selection for free.	2020-09-21 16:04:52 -07:00
Amara Emerson	825203daae	[AArch64][GlobalISel] Make <4 x s16> G_ASHR and G_LSHR legal. Selection support for these is coming up.	2020-09-21 15:32:48 -07:00
Martin Storsjö	36c64af9d7	[CodeGen] [WinException] Only produce handler data at the end of the function if needed If we are going to write handler data (that is written as variable length data following after the unwind info in .xdata), we need to emit the handler data immediately, but for cases where no such info is going to be written, skip emitting it right away. (Unwind info for all remaining functions that hasn't gotten it emitted directly is emitted at the end.) This does slightly change the ordering of sections (triggering a bunch of updates to DebugInfo/COFF tests), but the change should be benign. This also matches GCC's assembly output, which doesn't output .seh_handlerdata unless it actually is needed. For ARM64, the unwind info can be packed into the runtime function entry itself (leaving no data in the .xdata section at all), but that can only be done if there's no follow-on data in the .xdata section. If emission of the unwind info is triggered via EmitWinEHHandlerData (or the .seh_handlerdata directive), which implicitly switches to the .xdata section, there's a chance of the caller wanting to pass further data there, so the packed format can't be used in that case. Differential Revision: https://reviews.llvm.org/D87448	2020-09-21 23:42:59 +03:00
Matt Arsenault	55f9f87da2	Reapply Revert "RegAllocFast: Rewrite and improve" This reverts commit `dbd53a1f0c`. Needed lldb test updates	2020-09-21 15:45:27 -04:00
Paul Walker	f3fa954b5b	[SVE] Change definition of reduction ISD nodes to have an SVE vector result type. The current nodes, AArch64::SMAXV_PRED for example, are defined to return a NEON vector result. This is incorrect because they modify the complete SVE register and are thus changed to represent such. This patch also adds nodes for UADDV_PRED and SADDV_PRED, which unifies the handling of all SVE reductions. NOTE: Floating-point reductions are already implemented correctly, so this patch is essentially making everything consistent with those. Differential Revision: https://reviews.llvm.org/D87843	2020-09-21 13:16:28 +01:00
Paul Walker	6457455248	[SVE] Use NEON for extract_vector_elt when the index is in range. Patch also adds missing patterns for unpacked vector types and extracts of element zero. Differential Revision: https://reviews.llvm.org/D87842	2020-09-21 13:12:28 +01:00
Amara Emerson	5a50f8b39f	[AArch64][GlobalISel] Add legalization and selection support for <4 x s16> G_SHL.	2020-09-18 23:32:01 -07:00
Eric Christopher	dbd53a1f0c	Temporarily Revert "RegAllocFast: Rewrite and improve" as it's breaking a few tests in the lldb test suite. Bot: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4226/steps/test/logs/stdio This reverts commit `c8757ff3aa`.	2020-09-18 18:11:21 -07:00
Amara Emerson	cce24bb38d	[AArch64][GlobalISel] Add tests for pre-existing selection support for <4 x s16> arithmetic/bitwise ops.	2020-09-18 17:13:55 -07:00
Amara Emerson	269bcc39ca	[AArch64][GlobalISel] Legalize arithmetic ops for <4 x s16>	2020-09-18 17:13:55 -07:00
Amara Emerson	5d34d7f1a0	[GlobalISel] Add lowering support for G_ABS and use for AArch64. Differential Revision: https://reviews.llvm.org/D87952	2020-09-18 16:17:18 -07:00
Matt Arsenault	c8757ff3aa	RegAllocFast: Rewrite and improve This rewrites big parts of the fast register allocator. The basic strategy of doing block-local allocation hasn't changed but I tweaked several details: Track register state on register units instead of physical registers. This simplifies and speeds up handling of register aliases. Process basic blocks in reverse order: Definitions are known to end register livetimes when walking backwards (contrary when walking forward then uses may or may not be a kill so we need heuristics). Check register mask operands (calls) instead of conservatively assuming everything is clobbered. Enhance heuristics to detect killing uses: In case of a small number of defs/uses check if they are all in the same basic block and if so the last one is a killing use. Enhance heuristic for copy-coalescing through hinting: We check the first k defs of a register for COPYs rather than relying on there just being a single definition. When testing this on the full llvm test-suite including SPEC externals I measured: average 5.1% reduction in code size for X86, 4.9% reduction in code on aarch64. (ranging between 0% and 20% depending on the test) 0.5% faster compiletime (some analysis suggests the pass is slightly slower than before, but we more than make up for it because later passes are faster with the reduced instruction count) Also adds a few testcases that were broken without this patch, in particular bug 47278. Patch mostly by Matthias Braun	2020-09-18 14:05:18 -04:00
Matt Arsenault	870fd53e4f	Reapply "RegAllocFast: Record internal state based on register units" The regressions this caused should be fixed when https://reviews.llvm.org/D52010 is applied. This reverts commit `a21387c654`.	2020-09-18 14:05:18 -04:00
Amara Emerson	615695de27	[AArch64][GlobalISel] Make <8 x s8> of G_BUILD_VECTOR legal.	2020-09-18 10:32:33 -07:00
Tim Northover	2afe4becec	AArch64: make sure jump table entries can reach entire image This turns all jump table entries into deltas within the target function because in the small memory model all code & static data must be in a 4GB block somewhere in memory. When the entries were a delta between the table location and a basic block, the 32-bit signed entries are not enough to guarantee reachability. https://reviews.llvm.org/D87286	2020-09-18 09:50:40 +01:00
Andrew Wei	8f09cec8c9	[AArch64] Add tests for zext pattern match with AssertZext/AssertSext operand, NFC	2020-09-18 15:02:43 +08:00
Andrew Wei	992698cfbc	[AArch64] Emit zext move when the source of the zext is AssertZext or AssertSext When the source of the zext is AssertZext or AssertSext, it is hard to know any information about the upper 32 bits, so we should insert a zext move before emitting SUBREG_TO_REG to define the lower 32 bits. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87771	2020-09-18 12:48:41 +08:00
Amara Emerson	f5898f8c2d	[AArch64][GlobalISel] Make G_STORE <8 x s8> legal.	2020-09-17 16:42:18 -07:00
Philip Reames	b04c181ed7	[AArch64] Enable implicit null check transformation This change enables the generic implicit null transformation for the AArch64 target. As background for those unfamiliar with our implicit null check support: An implicit null check is the use of a signal handler to catch a null pointer access and redirect it to a handler. Specifically, it's replacing an explicit conditional branch with such a redirect. This is only done for very cold branches under frontend control w/appropriate metadata. FAULTING_OP is used to wrap the faulting instruction. It is modelled as being a conditional branch to reflect the fact it can transfer control in the CFG. FAULTING_OP does not need to be an analyzable branch to achieve its purpose. (Or at least, that's the x86 model. I find this slightly questionable.) When lowering to MC, we convert the FAULTING_OP back into the actual instruction, record the labels, and lower the original instruction. As can be seen in the test changes, currently the AArch64 backend does not eliminate the unconditional branch to the fallthrough block. I've tried two approaches, neither of which worked. I plan to return to this in a separate change set once I've wrapped my head around the interactions a bit better. (X86 handles this via AllowModify on analyzeBranch, but adding the obvious code causes BranchFolding to crash. I haven't yet figured out if it's a latent bug in BranchFolding, or something I'm doing wrong.) Differential Revision: https://reviews.llvm.org/D87851	2020-09-17 16:00:19 -07:00
Victor Huang	a4bb71b1c0	Disable hoisting MI to hotter basic blocks when using pgo This is a follow up patch for https://reviews.llvm.org/D63676 to enable the feature when using pgo. Differential Revision: https://reviews.llvm.org/D85240	2020-09-17 14:17:00 -05:00
Cameron McInally	a35c7f3076	[SVE][WIP] Implement lowering for fixed length VSELECT to Scalable Map fixed length VSELECT to its Scalable equivalent. Differential Revision: https://reviews.llvm.org/D85364	2020-09-17 14:02:57 -05:00
Amara Emerson	7d5b103483	[AArch64][GlobalISel] Widen G_EXTRACT_VECTOR_ELT element types if < 8b. In order to not unnecessarily promote the source vector to greater than our native vector size of 128b, I've added some cascading rules to widen based on the number of elements.	2020-09-17 11:50:33 -07:00
Amara Emerson	bea7749d03	[AArch64][GlobalISel] Make <8 x s16> and <16 x s8> legal for shifts.	2020-09-17 11:50:32 -07:00
Amara Emerson	79b21fc187	[AArch64][GlobalISel] Fix bug in fewVectorElts action while legalizing oversize G_FPTRUNC vectors. For <8 x s32> = fptrunc <8 x s64> the fewerElementsVector action tries to break down the source vector into the final source vectors of <2 x s64> using unmerge. This fixes a crash due to using the wrong number of elements for the breakdown type. Also add some legalizer tests for explicitly G_FPTRUNC which we didn't have. Differential Revision: https://reviews.llvm.org/D87814	2020-09-17 08:56:26 -07:00
Sanne Wouda	d5fd3d9b90	[AArch64] Match pairwise add/fadd pattern D75689 turns the faddp pattern into a shuffle with vector add. Match this new pattern in target-specific DAG combine, rather than ISel, because legalization (for v2f32) turns it into a bit of a mess. - extended to cover f16, f32, f64 and i64	2020-09-17 16:27:01 +01:00
Sanne Wouda	3ee87a976d	Precommit test updates	2020-09-17 16:27:01 +01:00
Kerry McLaughlin	f7185b271f	[SVE][CodeGen] Lower floating point -> integer conversions This patch adds new ISD nodes, FCVTZS_MERGE_PASSTHRU & FCVTZU_MERGE_PASSTHRU, which are used to lower scalable vector FP_TO_SINT/FP_TO_UINT operations and the following intrinsics: - llvm.aarch64.sve.fcvtzu - llvm.aarch64.sve.fcvtzs Reviewed By: efriedma, paulwalker-arm Differential Revision: https://reviews.llvm.org/D87232	2020-09-17 14:04:22 +01:00
Philip Reames	7af4f44c3e	[aarch64][tests] Add tests which show current lack of implicit null support I will be posting a patch which adds appropriate target support shortly; landing the tests so that the diffs are clear.	2020-09-16 12:55:29 -07:00
Amara Emerson	6ad33d8360	[AArch64][GlobalISel] Make G_BUILD_VECTOR of <16 x s8> legal.	2020-09-16 11:19:47 -07:00
Michael Kitzan	c4e589b795	[GISel] Add new combines for unary FP instrs with constant operand https://reviews.llvm.org/D86393 Patch adds five new `GICombinerRules`, one for each of the following unary FP instrs: `G_FNEG`, `G_FABS`, `G_FPTRUNC`, `G_FSQRT`, and `G_FLOG2`. The combine rules perform the FP operation on the constant operand and replace the original instr with the result. Patch additionally adds new combiner tests for the AArch64 target to test these new combiner rules.	2020-09-16 10:34:15 -07:00
Francesco Petrogalli	15e9a6c211	[llvm][CodeGen] Do not scalarize `llvm.masked.[gather|scatter]` operating on scalable vectors. This patch prevents the `llvm.masked.gather` and `llvm.masked.scatter` intrinsics from being scalarized when invoked on scalable vectors. The change in `Function.cpp` is needed to prevent the warning that is raised when `getNumElements` is used in place of `getElementCount` on `VectorType` instances. The tests guard against regressions on this change. The tests make sure that calls to `llvm.masked.[gather|scatter]` are still scalarized when: # the intrinsics are operating on fixed size vectors, and # the compiler is not targeting fixed length SVE code generation. Reviewed By: efriedma, sdesmalen Differential Revision: https://reviews.llvm.org/D86249	2020-09-16 16:00:28 +00:00
Jessica Paquette	ffe9986de4	[AArch64][GlobalISel] Refactor + improve CMN, ADDS, and ADD emit functions These functions were extremely similar: - `emitADD` - `emitADDS` - `emitCMN` Refactor them a little, introducing a more generic `emitInstr` function to do most of the work. Also add support for the immediate + shifted register addressing modes in each of them. Update select-uaddo.mir to show that selecting ADDS now supports folding immediates + shifts. (I don't think this can impact CMN, because the CMN checks require a G_SUB with a non-constant on the RHS.) This is around a 0.02% code size improvement on CTMark at -O3. Differential Revision: https://reviews.llvm.org/D87529	2020-09-15 17:18:05 -07:00
Aditya Nandakumar	97203cfd6b	[GISel] Add new GISel combiners for G_MUL https://reviews.llvm.org/D87668 Patch adds two new GICombinerRules, one for G_MUL(X, 1) and another for G_MUL(X, -1). G_MUL(X, 1) is an identity combine, and G_MUL(X, -1) gets replaced with G_SUB(0, X). Patch additionally adds new combiner tests for the AArch64 target to test these new combiner rules, as well as updates AMDGPU GISel tests. Patch by mkitzan	2020-09-15 16:08:47 -07:00
Volkan Keles	a4e35cc2ec	GlobalISel: Add combines for G_TRUNC https://reviews.llvm.org/D87050	2020-09-15 15:50:34 -07:00
Muhammad Asif Manzoor	d417488ef5	[AArch64][SVE] Add lowering for llvm fsqrt Add the functionality to lower fsqrt for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D87707	2020-09-15 15:26:17 -04:00
Hans Wennborg	a21387c654	Revert "RegAllocFast: Record internal state based on register units" This seems to have caused incorrect register allocation in some cases, breaking tests in the Zig standard library (PR47278). As discussed on the bug, revert back to green for now. > Record internal state based on register units. This is often more > efficient as there are typically fewer register units to update > compared to iterating over all the aliases of a register. > > Original patch by Matthias Braun, but I've been rebasing and fixing it > for almost 2 years and fixed a few bugs causing intermediate failures > to make this patch independent of the changes in > https://reviews.llvm.org/D52010. This reverts commit `66251f7e1d`, and follow-ups `931a68f26b` and `0671a4c508`. It also adjusts some test expectations.	2020-09-15 13:25:41 +02:00
Quentin Colombet	b3afad0463	[GlobalISel] Add a `X, Y = G_UNMERGE(G_ZEXT Z)` -> X = G_ZEXT Z; Y = 0 combine Add a combiner helper to transform unmerge of zext into one zext and a constant 0 Differential Revision: https://reviews.llvm.org/D87427	2020-09-14 17:27:23 -07:00
Quentin Colombet	d2321129bd	[GlobalISel] Add `X,Y<dead> = G_UNMERGE Z` -> X = G_TRUNC Z Add a combiner helper that replaces G_UNMERGE where all the destination lanes are dead except the first one with a G_TRUNC. Differential Revision: https://reviews.llvm.org/D87174	2020-09-14 17:27:23 -07:00
Philip Reames	e6bc7037d3	[AArch64] Statepoint support for AArch64. Differential Revision: https://reviews.llvm.org/D66012 Patch By: loicottet (with major rebase by me)	2020-09-14 16:43:08 -07:00
Quentin Colombet	a36278c2f8	[GlobalISel] Add G_UNMERGE(Cst) -> Cst1, Cst2, ... combine Add a combiner helper that replaces G_UNMERGE of big constants into direct use of smaller constants. Differential Revision: https://reviews.llvm.org/D87166	2020-09-14 16:30:18 -07:00
Aditya Nandakumar	46f9137e43	[GISel]: Add combine for G_FABS to G_FABS https://reviews.llvm.org/D87554 Patch adds one new GICombinerRule for G_FABS. The combine rule folds G_FABS(G_FABS(X)) to G_FABS(X). Patch additionally adds new combiner tests for the AArch64 target to test this new combiner rule. Patch by mkitzan.	2020-09-14 15:56:24 -07:00
Quentin Colombet	670c276232	[GlobalISel] Add G_UNMERGE_VALUES(G_MERGE_VALUES) combine Add the matching and applying function to the combiner helper for G_UNMERGE_VALUES(G_MERGE_VALUES). This combine also supports any merge-like input nodes, like G_BUILD_VECTORS and is robust against bitcasts in between int unmerge and merge nodes. When the input type of the merge node and the output type of the unmerge node are not the same, but the sizes are, the combine still applies but creates bitcasts between the sources and the destinations instead of reusing the destinations directly. Long term, the artifact combiner should probably reuse that helper, but as of today, it doesn't use any outside helper, so I kept it this way. Differential Revision: https://reviews.llvm.org/D87117	2020-09-14 15:45:06 -07:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Nikita Popov	cc94720728	[AArch64] Add additional vecreduce fmax/fmin legalization tests (NFC) Add a vector widening test with ninf flag to the existing fmax tests, and mirror them over into fmin tests.	2020-09-14 21:13:45 +02:00
David Green	06fb4e9064	[CGP] Limit converting phi types to simple loads and stores Instcombine limits converting phi types to simple loads and stores. This does the same in codegenprepare, not processing phis that are not simple. Note that volatile loads/store ISel will happily convert between float and int. Atomics are more likely to always be integer. This just keeps things simple and doesn't process either. Differential Revision: https://reviews.llvm.org/D83770	2020-09-14 12:08:34 +01:00
David Green	9237fde481	[CGP] Prevent optimizePhiType from iterating forever The recently added optimizePhiType algorithm had no checks to make sure it didn't continually iterate back and forth between float and int types. This means that given an input like store(phi(bitcast(load))), we could convert that back and forth to store(bitcast(phi(load))). This particular case would usually have been simplified to a different load type (folding the bitcast into the load) before CGP, but other cases can occur. The one that came up was phi(bitcast(phi)), where the two phis of different types were bitcast between. That was not helped by a dead bitcast being kept around which could make conversion look profitable. This adds an extra check of the bitcast Uses or Defs, to make sure that at least one is grounded and will not end up being converted back. It also makes sure that dead bitcasts are removed, and there is a minor change to include newly created Phi nodes in the Visited set so that they do not need to be revisited. Differential Revision: https://reviews.llvm.org/D82676	2020-09-13 16:11:01 +01:00
Craig Topper	ad3d6f993d	[SelectionDAG][X86][ARM][AArch64] Add ISD opcode for __builtin_parity. Expand it to shifts and xors. Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop isn't natively supported by the target, this leads to poor codegen due to the expansion of ctpop being more complex than what is needed for parity. This adds a DAG combine to convert the pattern to ISD::PARITY before operation legalization. Type legalization is updated to handled Expanding and Promoting this operation. If after type legalization, CTPOP is supported for this type, LegalizeDAG will turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a series of shifts and xors followed by an AND with 1. I've avoided vectors in this patch to avoid more legalization complexity for this patch. X86 previously had a custom DAG combiner for this. This is now moved to Custom lowering for the new opcode. There is a minor regression in vector-reduce-xor-bool.ll, but a follow up patch can easily fix that. Fixes PR47433 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87209	2020-09-12 11:42:18 -07:00
Sanjay Patel	3a8ea8609b	[Intrinsics] define semantics for experimental fmax/fmin vector reductions As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html This is hopefully the final remaining showstopper before we can remove the 'experimental' from the reduction intrinsics. No behavior was specified for the FP min/max reductions, so we have a mess of different interpretations. There are a few potential options for the semantics of these max/min ops. I think this is the simplest based on current behavior/implementation: make the reductions inherit from the existing llvm.maxnum/minnum intrinsics. These correspond to libm fmax/fmin, and those are similar to the (now deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing data). So the default expansion creates calls to libm functions. Another option would be to inherit from llvm.maximum/minimum (NaNs propagate), but most targets just crash in codegen when given those nodes because no default expansion was ever implemented AFAICT. We could also just assume 'nnan' semantics by default (we are already assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets (AArch64, PowerPC) support the more defined behavior, so it doesn't make much sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to loosen the semantics. (Note that D67507 was proposed to update the LangRef to acknowledge the more recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do update based on the new standard, the reduction instructions can seamlessly inherit from whatever updates are made to the max/min intrinsics.) x86 sees a regression here on 'nnan' tests because we have underlying, longstanding bugs in FMF creation/propagation. Those need to be fixed apart from this change (for example: https://llvm.org/PR35538). The expansion sequence before this patch may not have been correct. Differential Revision: https://reviews.llvm.org/D87391	2020-09-12 09:10:28 -04:00
Martin Storsjö	1308bb99e0	[MC] [Win64EH] Write packed ARM64 epilogues if possible This gives a pretty substantial size reduction; for a 6.5 MB DLL with 300 KB .xdata, the .xdata shrinks by 66 KB. Differential Revision: https://reviews.llvm.org/D87369	2020-09-11 10:31:04 +03:00
Martin Storsjö	46416f0803	[CodeGen] [WinException] Remove a redundant explicit section switch for aarch64 The following EmitWinEHHandlerData() implicitly switches to .xdata, just like on x86_64. This became orphaned from the original code requiring it in `0b61d220c9` / https://reviews.llvm.org/D61095. Differential Revision: https://reviews.llvm.org/D87447	2020-09-11 10:31:04 +03:00
Amara Emerson	0448d11a06	[AArch64][GlobalISel] Don't emit a branch for a fallthrough G_BR at -O0. With optimizations we leave the decision to eliminate fallthrough branches to block placement, but at -O0 we should do it in the selector to save code size. This regressed -O0 with a recent change to a combiner.	2020-09-10 15:01:26 -07:00
Volkan Keles	d4bf90271f	GlobalISel: Combine fneg(fneg x) to x https://reviews.llvm.org/D87473	2020-09-10 12:57:38 -07:00
Owen Anderson	3d9c85e4d8	Mark FMOV constant materialization as being as cheap as a move. This prevents us from doing things like LICM'ing it out of a loop, which is usually a net loss because we end up having to spill a callee-saved FPR to accommodate it. This does perturb instruction scheduling around this instruction, so a number of tests had to be updated to account for it. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D87316	2020-09-10 16:38:59 +00:00
Kerry McLaughlin	cd89f5c91b	[SVE][CodeGen] Legalisation of truncate for scalable vectors Truncating from an illegal SVE type to a legal type, e.g. `trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>` fails after PromoteIntOp_CONCAT_VECTORS attempts to create a BUILD_VECTOR. This patch changes the promote function to create a sequence of INSERT_SUBVECTORs if the return type is scalable, and replaces these with UNPK+UZP1 for AArch64. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86548	2020-09-10 11:35:33 +01:00
Martin Storsjö	8060283ff8	[llvm-readobj] [ARMWinEH] Print set_fp/add_fp differently in epilogues This matches how e.g. stp/ldp and other opcodes are printed differently for epilogues. Also add a missing --strict-whitespace in an existing test that was added explicitly for testing vertical alignment, and change to using temp files for the generated object files. Differential Revision: https://reviews.llvm.org/D87363	2020-09-10 11:26:43 +03:00
Jessica Paquette	480e7f43a2	[AArch64][GlobalISel] Share address mode selection code for memops We were missing support for the G_ADD_LOW + ADRP folding optimization in the manual selection code for G_LOAD, G_STORE, and G_ZEXTLOAD. As a result, we were missing cases like this: ``` @foo = external hidden global i32* define void @baz(i32* %0) { store i32* %0, i32** @foo ret void } ``` https://godbolt.org/z/16r7ad This functionality already existed in the addressing mode functions for the importer. So, this patch makes the manual selection code use `selectAddrModeIndexed` rather than duplicating work. This is a 0.2% geomean code size improvement for CTMark at -O3. There is one code size increase (0.1% on lencod) which is likely because `selectAddrModeIndexed` doesn't look through constants. Differential Revision: https://reviews.llvm.org/D87397	2020-09-09 15:14:46 -07:00
Amara Emerson	a9f7970762	Add REQUIRES: asserts to a test that uses an asserts only flag.	2020-09-09 14:31:12 -07:00
Amara Emerson	e5784ef8f6	[GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator. We weren't using this before, so none of the MachineFunction CFG edges had the branch probability information added. As a result, block placement later in the pipeline was flying blind. This is enabled only with optimizations enabled like SelectionDAG. Differential Revision: https://reviews.llvm.org/D86824	2020-09-09 14:31:12 -07:00
Amara Emerson	467a071285	[GlobalISel][IRTranslator] Generate better conditional branch lowering. This is a port of the functionality from SelectionDAG, which tries to find a tree of conditions from compares that are then combined using OR or AND, before using that result as the input to a branch. Instead of naively lowering the code as is, this change converts that into a sequence of conditional branches on the sub-expressions of the tree. Like SelectionDAG, we re-use the case block codegen functionality from the switch lowering utils, which causes us to generate some different code. The result of which I've tried to mitigate in earlier combine patches. Differential Revision: https://reviews.llvm.org/D86665	2020-09-09 13:16:11 -07:00
Amara Emerson	cc76da7ada	[GlobalISel] Rewrite the elide-br-by-swapping-icmp-ops combine to do less. This combine previously tried to take sequences like: %cond = G_ICMP pred, a, b G_BRCOND %cond, %truebb G_BR %falsebb %truebb: ... %falsebb: ... and by inverting the compare predicate and swapping branch targets, delete the G_BR and instead have a single conditional branch to the falsebb. Since in an earlier patch we have a combine to fold not(icmp) into just an inverted icmp, we don't need this combine to do as much. This patch instead generalizes the combine by just looking for: G_BRCOND %cond, %truebb G_BR %falsebb %truebb: ... %falsebb: ... and then inverting the condition using a not (xor). The xor can be folded away in a separate combine. This change also lets us avoid some optimization code in the IRTranslator. I also think that deleting G_BRs in the combiner is unnecessary. That's something that targets can decide to do at selection time and could simplify generic code in future. Differential Revision: https://reviews.llvm.org/D86664	2020-09-09 13:08:16 -07:00
Craig Topper	b1e68f885b	[SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact. This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130. This required adding a SDNodeFlags to SelectionDAG::getSetCC. Now we manage to constant fold some stuff undefs during the initial getNode that we don't do in later DAG combines. Differential Revision: https://reviews.llvm.org/D87200	2020-09-08 15:27:21 -07:00
Volkan Keles	1242dd330d	GlobalISel: Combine `op undef, x` to 0 https://reviews.llvm.org/D86611	2020-09-08 09:46:38 -07:00
Simon Wallis	8ee1419ab6	[AARCH64][RegisterCoalescer] clang miscompiles zero-extension to long long Implement AArch64 variant of shouldCoalesce() to detect a known failing case and prevent the coalescing of a 32-bit copy into a 64-bit sign-extending load. Do not coalesce in the following case: COPY where source is bottom 32 bits of a 64-register, and destination is a 32-bit subregister of a 64-bit register, ie it causes the rest of the register to be implicitly set to zero. A mir test has been added. In the test case, the 32-bit copy implements a 32 to 64 bit zero extension and relies on the upper 32 bits being zeroed. Coalescing to the result of the 64-bit load meant overwriting the upper 32 bits incorrectly when the loaded byte was negative. Reviewed By: john.brawn Differential Revision: https://reviews.llvm.org/D85956	2020-09-08 08:04:52 +01:00
Sanjay Patel	7a06b166b1	[DAGCombiner] allow more store merging for non-i8 truncated ops This is a follow-up suggested in D86420 - if we have a pair of stores in inverted order for the target endian, we can rotate the source bits into place. The "be_i64_to_i16_order" test shows a limitation of the current function (which might be avoided if we integrate this function with the other cases in mergeConsecutiveStores). In the earlier "be_i64_to_i16" test, we skip the first 2 stores because we do not match the full set as consecutive or rotate-able, but then we reach the last 2 stores and see that they are an inverted pair of 16-bit stores. The "be_i64_to_i16_order" test alters the program order of the stores, so we miss matching the sub-pattern. Differential Revision: https://reviews.llvm.org/D87112	2020-09-07 14:12:36 -04:00
Jay Foad	713c2ad60c	[GlobalISel] Extend not_cmp_fold to work on conditional expressions Differential Revision: https://reviews.llvm.org/D86709	2020-09-07 09:31:08 +01:00
Muhammad Asif Manzoor	1ffcbe35ae	[AArch64][SVE] Add lowering for rounding operations Add the functionality to lower SVE rounding operations for passthru variant. Created a new test case file for all rounding operations. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86793	2020-09-04 11:16:57 -04:00
David Sherwood	73a3d350a4	[SVE][CodeGen] Fix up warnings in sve-split-insert/extract tests I have fixed up some more ElementCount/TypeSize related warnings in the following tests: CodeGen/AArch64/sve-split-extract-elt.ll CodeGen/AArch64/sve-split-insert-elt.ll In SelectionDAG::CreateStackTemporary we were relying upon the implicit cast from TypeSize -> uint64_t when calling MachineFrameInfo::CreateStackObject. I've fixed this by passing in the known minimum size instead, which I believe is fine because the associated stack id indicates whether this is a scalable object or not. I've also fixed up a case in TargetLowering::SimplifyDemandedBits when extracting a vector element from a scalable vector. The result is a scalar, hence it wasn't caught at the start of the function. If the vector is scalable we just bail out for now. Differential Revision: https://reviews.llvm.org/D86431	2020-09-04 09:51:31 +01:00
Paul Walker	f72121254d	[SVE] Don't reorder subvector/binop sequences when the resulting binop is not legal. When lowering fixed length vector operations for SVE the subvector operations are used extensively to marshall data between scalable and fixed-length vectors. This means that sequences like: extract_subvec(binop(insert_subvec(a), insert_subvec(b))) are very common. DAGCombine only checks if the resulting binop is legal or can be custom lowered when undoing such sequences. When it's custom lowering that is introducing them the result is an infinite legalise->combine->legalise loop. This patch extends the isOperationLegalOr... functions to include a "LegalOnly" parameter to restrict the check to legal operations only. Although isOperationLegal could be used it's common for the affected code paths to be visited pre and post legalisation, so the extra parameter keeps the code tidy. Differential Revision: https://reviews.llvm.org/D86450	2020-09-02 11:01:33 +01:00
Sander de Smalen	f13beac51b	[AArch64][SVE] Preserve full vector regs over EH edge. Unwinders may only preserve the lower 64bits of Neon and SVE registers, as only the registers in the base ABI are guaranteed to be preserved over the exception edge. The caller will need to preserve additional registers for when the call throws an exception and the unwinder has tried to recover state. For e.g. svint32_t bar(svint32_t); svint32_t foo(svint32_t x, bool err) { try { bar(x); } catch (...) { err = true; } return x; } `z0` needs to be spilled before the call to `bar(x)` and reloaded before returning from foo, as the exception handler may have clobbered z0. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84737	2020-09-02 10:54:18 +01:00
Amara Emerson	520ab710fb	Revert "Revert "[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)" (and dependent patch "Optimize away a Not feeding a brcond by using tbz instead of tbnz.")" This reverts commit `8693ddc743`. Re-committing with the test requiring asserts.	2020-09-01 14:29:04 -07:00
Jordan Rupprecht	8693ddc743	Revert "[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)" (and dependent patch "Optimize away a Not feeding a brcond by using tbz instead of tbnz.") This reverts commit `8ad8f484b6`. It causes crashes when running `ninja check-llvm-codegen-aarch64-globalisel`, e.g. http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24132/steps/test-stage1-compiler/logs/stdio. Note that the crash does not seem to reproduce in debug builds. `5ded444252` depends on this, so revert that too.	2020-09-01 13:31:57 -07:00
Owen Anderson	5987da8764	Revert "Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain"" This reverts commit `bc9a29b9ee`. The reasoning that this patch was wrong was itself incorrect (see discussion on llvm-commits). This patch does seem to be exposing a latent SVE code generation bug on non-public tests, which should not block a correctness fix for public, non-SVE use cases.	2020-09-01 19:29:03 +00:00
Amara Emerson	5ded444252	[AArch64][GlobalISel] Optimize away a Not feeding a brcond by using tbz instead of tbnz. Usually brconds are fed by compares, but not always, in which case we would miss this fold. Differential Revision: https://reviews.llvm.org/D86413	2020-09-01 11:06:06 -07:00
Amara Emerson	8ad8f484b6	[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _) This is needed for an upcoming change to how we translate conditional branches which might generate these. Differential Revision: https://reviews.llvm.org/D86383	2020-09-01 10:57:17 -07:00
Volkan Keles	061182b7ba	GlobalISel: Add combines for extend operations https://reviews.llvm.org/D86516	2020-09-01 08:50:06 -07:00
Paul Walker	bc9a29b9ee	Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain" This reverts commit `e9d9a61208`. This patch was previously reverted by `04879086b4` with the reapplication being done after breaking the assert used to ensure SP is always 16-byte aligned, which is a requirement of the AAPCS. For extra context the latest patch caused runtime failures when building with "-march=armv8-a+sve -mllvm -aarch64-sve-vector-bits-min=256".	2020-09-01 16:09:37 +01:00
David Sherwood	9fbb113247	[SVE][CodeGen] Fix TypeSize/ElementCount related warnings in sve-split-load.ll I have fixed up a number of warnings resulting from TypeSize -> uint64_t casts and calling getVectorNumElements() on scalable vector types. I think most of the changes are fairly trivial except for those in DAGTypeLegalizer::SplitVecRes_MLOAD I've tried to ensure we create the MachineMemoryOperands in a sensible way for scalable vectors. I have added a CHECK line to the following test: CodeGen/AArch64/sve-split-load.ll that ensures no new warnings are added. Differential Revision: https://reviews.llvm.org/D86697	2020-09-01 07:47:59 +01:00
Sanjay Patel	1c9a09f42e	[DAGCombiner] skip reciprocal divisor optimization for x/sqrt(x), better I tried to fix this in: rG716e35a0cf53 ...but that patch depends on the order that we encounter the magic "x/sqrt(x)" expression in the combiner's worklist. This patch should improve that by waiting until we walk the user list to decide if there's a use to skip. The AArch64 test reveals another (existing) ordering problem though - we may try to create an estimate for plain sqrt(x) before we see that it is part of a 1/sqrt(x) expression.	2020-08-31 09:35:59 -04:00
Sanjay Patel	11e0c5b648	[AArch64] add another test for reciprocal sqrt; NFC	2020-08-31 09:35:59 -04:00
Sanjay Patel	716e35a0cf	[DAGCombiner] skip reciprocal divisor optimization for x/sqrt(x) In general, we probably want to try the multi-use reciprocal transform before sqrt transforms, but x/sqrt(x) is a special-case because that will always reduce to plain sqrt(x) or an estimate. The AArch64 tests show that the transform is limited by TLI hook to patterns where there are 3 or more uses of the divisor. So this change can result in an extra division compared to what we had, but that's the intended behavior based on the current setting of that hook.	2020-08-30 10:55:45 -04:00
Sanjay Patel	7692cb1a6f	[AArch64] add tests for multi-use fast sqrt/recip; NFC	2020-08-30 10:55:44 -04:00
Martin Storsjö	5b86d130e2	[AArch64] Generate and parse SEH assembly directives This ensures that you get the same output regardless if generating code directly to an object file or if generating assembly and assembling that. Add implementations of the EmitARM64WinCFI*() methods in AArch64TargetAsmStreamer, and fill in one blank in MCAsmStreamer. Add corresponding directive handlers in AArch64AsmParser and COFFAsmParser. Some SEH directive names have been picked to match the prior art for SEH assembly directives for x86_64, e.g. the spelling of ".seh_startepilogue" matching the preexisting ".seh_endprologue". For the directives for saving registers, the exact spelling from the arm64 documentation is picked, e.g. ".seh_save_reg" (to follow that naming for all the other ones, e.g. ".seh_save_fregp_x"), while the corresponding one for x86_64 is plain ".seh_savereg" without the second underscore. Directives in the epilogues have the same names as in prologues, e.g. .seh_savereg, even though the registers are restored, not saved, at that point. Differential Revision: https://reviews.llvm.org/D86529	2020-08-29 15:15:22 +03:00
Kai Luo	b904324788	[DAGCombiner] Enhance (zext(setcc)) Current `v:t = zext(setcc x,y,cc)` will be transformed to `select x, y, 1:t, 0:t, cc`. It misses some opportunities if x's type size is less than `t`'s size. This patch enhances the above transformation. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86687	2020-08-29 03:37:41 +00:00
Ties Stuij	d678e14c55	[AArch64][CodeGen] Restrict bfloat vector operations to what's actually supported Previously in addTypeForNeon, we would set the operations for bfloat vectors like other generic types. But as bfloat is a storage-only type a number of operations shouldn't be set. This patch fixes that. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D85101	2020-08-28 11:44:37 +01:00
Matt Arsenault	0034e00da0	AArch64/GlobalISel: Fix missing function begin marker in test	2020-08-27 16:56:17 -04:00
Matt Arsenault	9607ccf626	GlobalISel: Remove leftover lit.local.cfg The global-isel feature has been required for a long time and was removed in `c9455d3c57`, so this was causing all tests to be skipped.	2020-08-27 13:49:06 -04:00
Mikhail Maltsev	ae1396c7d4	[ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics This patch adjusts the following ARM/AArch64 LLVM IR intrinsics: - neon_bfmmla - neon_bfmlalb - neon_bfmlalt so that they take and return bf16 and float types. Previously these intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from implementation lacking bf16 IR type). The neon_vbfdot[q] intrinsics are adjusted similarly. This change required some additional selection patterns for vbfdot itself and also for vector shuffles (in a previous patch) because of SelectionDAG transformations kicking in and mangling the original code. This patch makes the generated IR cleaner (less useless bitcasts are produced), but it does not affect the final assembly. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D86146	2020-08-27 18:43:16 +01:00
Owen Anderson	e9d9a61208	Reapply D70800: Fix AArch64 AAPCS frame record chain Original Commit Message: After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register) may be placed in the middle of a stack frame if a function has both callee-saved general-purpose registers and floating point registers. This will break the stack unwinders that simply walk through the frame records (based on the guarantee from AAPCS64 "The Frame Pointer" section). This commit fixes the problem by adding the frame record offset. Patch By: logan Differential Revision: D70800	2020-08-27 17:29:41 +00:00
Aditya Nandakumar	db464a3dbf	[GISel] Add new GISel combiners for G_SELECT https://reviews.llvm.org/D83833 Patch adds two new GICombinerRules for G_SELECT. The rules include: combining selects with undef comparisons into their first selectee value, and to combine away selects with constant comparisons. Patch additionally adds a new combiner test for the AArch64 target to test these new G_SELECT combiner rules and the existing select_same_val combiner rule. Patch by mkitzan	2020-08-27 09:40:15 -07:00
Mikhail Maltsev	23d5e93f34	[AArch64] Optimize instruction selection for certain vector shuffles This patch adds code to recognize vector shuffles which can be represented as VDUP (splat) of a vector lane of a different (wider) type than the original vector lane type. For example: shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1> is essentially: shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 0, i32 0> Such patterns are generated by the SelectionDAG machinery in some cases (see DAGCombiner::visitBITCAST in DAGCombiner.cpp, the "Remove double bitcasts from shuffles" part). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D86225	2020-08-27 11:06:49 +01:00
Paul Walker	81337c915f	[SVE] Fallback to default expansion when lowering SIGN_EXTEND_INREG from non-byte based source. Differential Revision: https://reviews.llvm.org/D86394	2020-08-27 10:57:37 +01:00
Martin Storsjö	04879086b4	Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain" This reverts commit `9936455204`. That commit caused failed assertions e.g. like this: $ cat alloca.c a; b() { float c; d(); a = __builtin_alloca(d); c = e(); f(a); return c; } $ clang -target aarch64-linux-gnu -c alloca.c -O2 clang: ../lib/Target/AArch64/AArch64InstrInfo.cpp:3446: void llvm::emitFrameOffset(llvm::MachineBasicBlock&, llvm::MachineBasicBlock::iterator, const llvm::DebugLoc&, unsigned int, unsigned int, llvm::StackOffset, const llvm::TargetInstrInfo, llvm::MachineInstr::MIFlag, bool, bool, bool): Assertion `(DestReg != AArch64::SP \|\| Bytes % 16 == 0) && "SP increment/decrement not 16-byte aligned"' failed.	2020-08-27 09:39:56 +03:00
Matt Arsenault	0b7f6cc71a	GlobalISel: Add generic instructions for memory intrinsics AArch64, X86 and Mips currently directly consumes these and custom lowering to produce a libcall, but really these should follow the normal legalization process through the libcall/lower action.	2020-08-26 20:08:45 -04:00
Muhammad Asif Manzoor	fd536eeed9	[AArch64][SVE] Add lowering for llvm fceil Add the functionality to lower fceil for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D84548	2020-08-26 15:59:44 -04:00
Owen Anderson	9936455204	Reapply D70800: Fix AArch64 AAPCS frame record chain Original Commit Message: After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register) may be placed in the middle of a stack frame if a function has both callee-saved general-purpose registers and floating point registers. This will break the stack unwinders that simply walk through the frame records (based on the guarantee from AAPCS64 "The Frame Pointer" section). This commit fixes the problem by adding the frame record offset. Patch By: logan	2020-08-26 19:38:38 +00:00
Sanjay Patel	54a5dd485c	[DAGCombiner] allow store merging non-i8 truncated ops We have a gap in our store merging capabilities for shift+truncate patterns as discussed in: https://llvm.org/PR46662 I generalized the code/comments for this function in earlier commits, so we only need ease the type restriction and adjust the address/endian checking to make this work. AArch64 lets us switch endian to make sure that patterns are matched either way. Differential Revision: https://reviews.llvm.org/D86420	2020-08-26 15:23:08 -04:00
QingShan Zhang	ebf3b188c6	[Scheduling] Implement a new way to cluster loads/stores Before calling target hook to determine if two loads/stores are clusterable, we put them into different groups to avoid fake cluster due to dependency. For now, we are putting the loads/stores into the same group if they have the same predecessor. We assume that, if two loads/stores have the same predecessor, it is likely that, they didn't have dependency for each other. However, one SUnit might have several predecessors and for now, we just pick up the first predecessor that has non-data/non-artificial dependency, which is too arbitrary. And we are struggling to fix it. So, I am proposing some better implementation. 1. Collect all the loads/stores that has memory info first to reduce the complexity. 2. Sort these loads/stores so that we can stop the seeking as early as possible. 3. For each load/store, seeking for the first non-dependency instruction with the sorted order, and check if they can cluster or not. Reviewed By: Jay Foad Differential Revision: https://reviews.llvm.org/D85517	2020-08-26 12:33:59 +00:00
Sander de Smalen	5f47d4456d	[AArch64][SVE] Fix calculation restore point for SVE callee saves. This fixes an issue where the restore point of callee-saves in the function epilogues was incorrectly calculated when the basic block consisted of only a RET instruction. This caused dealloc instructions to be inserted in between the block of callee-save restore instructions, rather than before it. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86099	2020-08-26 10:02:31 +01:00
Martin Storsjö	db259fe38b	[llvm-readobj] Fix arm64 unwind opcode disassembly printing Add a missing minus, fix vertical alignment of instructions for one opcode. Differential Revision: https://reviews.llvm.org/D86523	2020-08-26 09:38:11 +03:00
Matt Arsenault	1b3de8812d	AArch64: Fix hardcoded register in test	2020-08-25 13:56:39 -04:00
Paul Walker	73ac3c0ede	[SVE] Lower scalable vector ISD::FNEG operations. Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A) transformation when -1 is expressed as an ISD::SPLAT_VECTOR. Differential Revision: https://reviews.llvm.org/D86415	2020-08-25 11:22:28 +01:00
Venkataramanan Kumar	62e91bf563	[DAGCombine]: Fold X/Sqrt(X) to Sqrt(X) With FMF ( "nsz" and " reassoc") fold X/Sqrt(X) to Sqrt(X). This is done after targets have the chance to produce a reciprocal sqrt estimate sequence because that expansion is probably more efficient than an expansion of a non-reciprocal sqrt. That is also why we deferred doing this transform in IR (D85709). Differential Revision: https://reviews.llvm.org/D86403	2020-08-24 18:16:13 -04:00
Sanjay Patel	a74dc598fb	[x86][AArch64] adjust fast-math-flags in tests; NFC This goes with the proposal in D86403.	2020-08-24 18:16:13 -04:00
Sanjay Patel	c1dc44f914	[AArch64] add tests for store merge of truncs; NFC	2020-08-22 14:54:40 -04:00
Cameron McInally	36dbb8fc97	[SVE] Lower fixed length UDIV to scalable Pretty much just a copy of the SDIV patches (D86114 and D85982) with string replacement. Differential Revision: https://reviews.llvm.org/D86316	2020-08-21 09:01:25 -05:00
Jay Foad	0819a6416f	[SelectionDAG] Better legalization for FSHL and FSHR In SelectionDAGBuilder always translate the fshl and fshr intrinsics to FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and ORs. Improve the legalization of FSHL and FSHR to avoid code quality regressions. Differential Revision: https://reviews.llvm.org/D77152	2020-08-21 10:32:49 +01:00
Cameron McInally	8372e47bb9	[NFCI][SVE] Move fixed length i32/i64 SDIV tests Move fixed length SDIV tests from sve-fixed-length-int-arith.ll to sve-fixed-length-int-div.ll. The former uses CHECK lines that verify legalization decisions. That's overkill for the i8/i16 SDIV tests, since they have a tricky legalization.	2020-08-20 14:46:26 -05:00
Cameron McInally	ac63959460	[SVE] Lower fixed length vXi8/vXi16 SDIV to scalable There are no nxv16i8/nxv8i16 SDIV instructions, so these fixed width operations must be promoted to nxv4i32. Differential Revision: https://reviews.llvm.org/D86114	2020-08-20 13:47:01 -05:00
Paul Walker	0015b8db8e	[SVE] Add ISEL patterns for predicated shifts by an immediate. For scalable vector shifts the predicate is typically all active, which gets selected to an unpredicated shift by immediate. When code generating for fixed length vectors the predicate is based on the vector length and so additional patterns are required to make use of SVE's predicated shift by immediate instructions. Differential Revision: https://reviews.llvm.org/D86204	2020-08-20 11:47:20 +01:00
Konstantin Schwarz	7497b861f4	[GlobalISel][IRTranslator] Support PHI instructions in landingpad blocks The check for the landingpad instructions was overly restrictive. In optimized builds PHI nodes can appear before the landingpad instructions, resulting in a fallback to SelectionDAG. This change relaxes the check to allow PHI nodes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D86141	2020-08-20 10:49:31 +02:00
Raul Tambre	e887d0e89b	[AArch64][GlobalISel] Handle rtcGPR64RegClassID in AArch64RegisterBankInfo::getRegBankFromRegClass() TargetRegisterInfo::getMinimalPhysRegClass() returns rtcGPR64RegClassID for X16 and X17, as it's the last matching class. This in turn gets passed to AArch64RegisterBankInfo::getRegBankFromRegClass(), which hits an unreachable. It seems sensible to handle this case, so copies from X16 and X17 work. Copying from X17 is used in inline assembly in libunwind for pointer authentication. Differential Revision: https://reviews.llvm.org/D85720	2020-08-19 12:52:30 -07:00
Jessica Paquette	d25b12bdc3	[GlobalISel] Add combine for (x & mask) -> x when (x & mask) == x If we have a mask, and a value x, where (x & mask) == x, we can drop the AND and just use x. This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64. In AArch64, this is most useful post-legalization. Patterns like this often show up when legalizing s1s, which must be extended to larger types. e.g. ``` %cmp:_(s32) = G_ICMP ... %and:_(s32) = G_AND %cmp, 1 ``` Since G_ICMP only produces a single bit, there's no reason to mask it with the G_AND. Differential Revision: https://reviews.llvm.org/D85463	2020-08-19 10:20:57 -07:00
Paul Walker	08ba4f112d	[SVE] Add tests for fixed length vector integer operations with immediate operands.	2020-08-19 11:12:03 +01:00
David Sherwood	3f36561f69	[SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer::GenWidenVectorLoads In DAGTypeLegalizer::GenWidenVectorLoads the algorithm assumes it only ever deals with fixed width types, hence the offsets for each individual store never take 'vscale' into account. I've changed the code in that function to use TypeSize instead of unsigned for tracking the remaining load amount. In addition, I've changed the load loop to use the new IncrementPointer helper function for updating the addresses in each iteration, since this handles scalable vector types. Also, I've added report_fatal_errors in GenWidenVectorExtLoads, TargetLowering::scalarizeVectorLoad and TargetLowering::scalarizeVectorStores, since these functions currently use a sequence of element-by-element scalar loads/stores. In a similar vein, I've also added a fatal error report in FindMemType for the case when we decide to return the element type for a scalable vector type. I've added new tests in CodeGen/AArch64/sve-split-load.ll CodeGen/AArch64/sve-ld-addressing-mode-reg-imm.ll for the changes in GenWidenVectorLoads. Differential Revision: https://reviews.llvm.org/D85909	2020-08-19 07:54:32 +01:00
Eli Friedman	be944c85f3	[AArch64][SVE] Add patterns for integer mla/mls. We probably want to introduce pseudo-instructions at some point, like we have for binary operations, but this seems okay for now. One thing I'm not sure about is whether we should be doing this as a DAGCombine instead of directly pattern-matching it. I don't see any big downside to doing it this way, though. Differential Revision: https://reviews.llvm.org/D85681	2020-08-18 12:51:16 -07:00
Eli Friedman	bb18532399	[AArch64][SVE] Allow llvm.aarch64.sve.st2/3/4 with vectors of pointers. This isn't necessary for ACLE, but could be useful in other situations. And the change is simple. Differential Revision: https://reviews.llvm.org/D85251	2020-08-18 12:51:16 -07:00
Jessica Paquette	f29e6277ad	[GlobalISel][CallLowering] Don't tail call with non-forwarded explicit sret Similar to this commit: `faf8065a99` Testcase is pretty much the same as test/CodeGen/AArch64/tailcall-explicit-sret.ll Except it uses i64 (since we don't handle the i1024 return values yet), and doesn't have indirect tail call testcases (because we can't translate those yet). Differential Revision: https://reviews.llvm.org/D86148	2020-08-18 11:06:57 -07:00
Amara Emerson	04a6ea5d77	[GlobalISel] Add a combine for sext_inreg(load x), c --> sextload x This is restricted to single use loads; folding these to sextloads lets us find more optimal addressing modes on AArch64. This also fixes an overload of the MachineFunction::getMachineMemOperand() method which was incorrectly using the MF alignment instead of the MMO alignment. Differential Revision: https://reviews.llvm.org/D85966	2020-08-18 10:42:15 -07:00
Amara Emerson	40e269ea6d	[GlobalISel] Add a combine for ashr(shl x, c), c --> sext_inreg x, c' By detecting this sign extend pattern early, we can uncover opportunities for more optimizations. Differential Revision: https://reviews.llvm.org/D85965	2020-08-18 10:42:15 -07:00
Jessica Paquette	224a8c639e	[GlobalISel][CallLowering] Look through call parameters for flags We weren't looking through the parameters on calls at all. E.g., say you had ``` declare i32 @zext(i32 zeroext %x) ... %y = call i32 @zext(i32 %something) ... ``` At the point of the call, we wouldn't know that the %something should have the zeroext attribute. This sets flags in about the same way as TargetLoweringBase::ArgListEntry::setAttributes. Differential Revision: https://reviews.llvm.org/D86125	2020-08-18 08:48:56 -07:00
Paul Walker	9f63dc3265	[SVE] Fix shift-by-imm patterns used by asr, lsl & lsr intrinsics. Right shift patterns will no longer incorrectly accept a shift amount of zero. At the same time they will allow larger shift amounts that are now saturated to their upper bound. Patterns have been extended to enable immediate forms for shifts taking an arbitrary predicate. This patch also unifies the code path for immediate parsing so the i64 based shifts are no longer treated specially. Differential Revision: https://reviews.llvm.org/D86084	2020-08-18 11:41:26 +01:00
Paul Walker	cb5cc47a65	[SVE] Lower fixed length vector ISD::SPLAT_VECTOR operations. Also strengthens the CHECK lines for scalable vector splat tests. Differential Revision: https://reviews.llvm.org/D86070	2020-08-18 11:19:43 +01:00
QingShan Zhang	9b32ef9413	[Test][NFC] Add a new test to verify if scheduler can cluster two ld/st even with different preds	2020-08-18 09:42:15 +00:00
Dávid Bolvanský	0f14b2e6cb	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit `50c743fa71`. Patch will be split to smaller ones.	2020-08-17 20:44:33 +02:00
Vitaly Buka	e10e7829bf	[StackSafety] Skip ambiguous lifetime analysis If we can't identify the alloca used in a lifetime marker we need to assume the worst case scenario. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D84630	2020-08-16 18:05:52 -07:00
Amara Emerson	7006bb69ef	[GlobalISel] Enable copy-propagation in post-legalizer combiner. This cleans up copies that the legalizer or other combines leave around. They can occasionally end up escaping as moves. Differential Revision: https://reviews.llvm.org/D85964	2020-08-15 13:44:30 -07:00
Cameron McInally	92593f9e77	[SVE] Lower fixed length vXi32/vXi64 SDIV to scalable vectors. Differential Revision: https://reviews.llvm.org/D85982	2020-08-14 18:47:22 -05:00

4101 Commits