llvm-project

Commit Graph

Author	SHA1	Message	Date
Tim Northover	45de42116e	AArch64: use correct operand for ubsantrap immediate. I accidentally pushed the wrong patch originally.	2020-12-09 10:17:16 +00:00
Jessica Paquette	40d1fb2229	[AArch64][GlobalISel] Swap select operands when inverting condition code This was not obvious when reading the imported tablegen patterns in AArch64GenDAGISel. Update select-select.mir.	2020-12-08 14:17:26 -08:00
Jessica Paquette	21308c2b4c	[AArch64][GlobalISel] Check if G_SELECT has been optimized when folding binops `TryFoldBinOpIntoSelect` didn't have a check for `Optimized`, meaning you could end up folding twice. (e.g. a select with a G_ADD on the true side, and a G_SUB on the false side) Add in the missing `if` and a test.	2020-12-08 13:47:08 -08:00
Jessica Paquette	5b5d3fa9d9	[AArch64][GlobalISel] Fold G_SELECT cc, %t, (G_ADD %x, 1) -> CSINC %t, %x, cc This implements `G_SELECT cc, %true, (G_ADD %x, 1) -> CSINC %true, %x, cc` and `G_SELECT cc, (G_ADD %x, 1), %false -> CSINC %x, %false, inv_cc`. Godbolt example: https://godbolt.org/z/eoPqKq Differential Revision: https://reviews.llvm.org/D92868	2020-12-08 10:53:37 -08:00
Jessica Paquette	cd9a52b99e	[AArch64][GlobalISel] Fold binops on the true side of G_SELECT This implements the following folds: `G_SELECT cc, (G_SUB 0, %x), %false -> CSNEG %x, %false, inv_cc` and `G_SELECT cc, (G_XOR %x, -1), %false -> CSINV %x, %false, inv_cc`. This is similar to the folds introduced in `5bc0bd05e6`. In `5bc0bd05e6` I mentioned that we may prefer to do this in AArch64PostLegalizerLowering. I think that it's probably better to do this in the selector. The way we select G_SELECT depends on what register banks end up being assigned to it. If we did this in AArch64PostLegalizerLowering, then we'd end up checking every G_SELECT to see if it's worth swapping operands. Doing it in the selector allows us to restrict the optimization to only relevant G_SELECTs. Also fix up some comments in `TryFoldBinOpIntoSelect` which are kind of confusing IMO. Example IR: https://godbolt.org/z/3qPGca Differential Revision: https://reviews.llvm.org/D92860	2020-12-08 10:42:59 -08:00
Jessica Paquette	ce199667f6	[AArch64][GlobalISel] Don't explicitly write to the zero register in emitCMN This case was missed in `78ccb0359d`. Differential Revision: https://reviews.llvm.org/D92438	2020-12-08 10:42:05 -08:00
Jessica Paquette	b15491eb33	[AArch64][GlobalISel] Select G_SADDO and G_SSUBO We didn't have selector support for these. Selection code is similar to `getAArch64XALUOOp` in AArch64ISelLowering. Similar to that code, this returns the AArch64CC and the instruction produced. In SDAG, this is used to optimize select + overflow and condition branch + overflow pairs. (See `AArch64TargetLowering::LowerBR_CC` and `AArch64TargetLowering::LowerSelect`) (G_USUBO should be easy to add here, but it isn't legalized right now.) This also factors out the existing G_UADDO selection code, and removes an unnecessary check for s32/s64. AFAIK, we shouldn't ever get anything other than s32/s64. It makes more sense for this to be handled by the type assertion in `emitAddSub`. Differential Revision: https://reviews.llvm.org/D92610	2020-12-08 09:18:28 -08:00
Tim Northover	c5978f42ec	UBSAN: emit distinctive traps Sometimes people get minimal crash reports after a UBSAN incident. This change tags each trap with an integer representing the kind of failure encountered, which can aid in tracking down the root cause of the problem.	2020-12-08 10:28:26 +00:00
Jessica Paquette	d49f6491b6	[AArch64][GlobalISel] Refactor G_BRCOND selection `selectCompareBranch` was hard to understand. Also, it was being needlessly pessimistic with the `ProduceNonFlagSettingCondBr` case. It assumed that everything in `selectCompareBranch` would emit a TB(N)Z or C(B)NZ. That's not true; the G_FCMP + G_BRCOND case would never emit those instructions, and the G_ICMP + G_BRCOND case was capable of emitting an integer compare + Bcc. - Refactor `selectCompareBranch` into separate functions based off of what is feeding the G_BRCOND's condition. - Move G_BRCOND selection code from `select` to `selectCompareBranch`. - Remove duplicated constraint code from the code originally in `select`; `emitTestBit` already handles that, so no need to constrain twice. - Factor out the G_FCMP + G_BRCOND case into `selectCompareBranchFedByFCmp`. - Split the G_ICMP + G_BRCOND case into an optimization function, `tryOptCompareBranchFedByICmp` and a general selection function, `selectCompareBranchFedByICmp`. - Reduce the number of things passed to `tryOptAndIntoCompareBranch`. - Improve documentation. - Give some variables more descriptive names. Other than improving the code generation for functions with speculative_load_hardening by getting the logic correct, this is NFC. Differential Revision: https://reviews.llvm.org/D92582	2020-12-07 17:24:23 -08:00
Jessica Paquette	195a7af0ab	[AArch64][GlobalISel] Narrow 128-bit regs to 64-bit regs in emitTestBit When we have a 128-bit register, emitTestBit would incorrectly narrow to 32 bits always. If the bit number was >= 32, then we would need a TB(N)ZX. This would cause a crash, as we'd have the wrong register class. (PR48379) This generalizes `narrowExtReg` into `moveScalarRegClass`. This also allows us to remove `widenGPRBankRegIfNeeded` entirely, since `selectCopy` correctly handles SUBREG_TO_REG etc. This does create some codegen changes (since `selectCopy` uses the `all` regclass variants). However, I think that these will likely be optimized away, and we can always improve the `selectCopy` code. It looks like we should revisit `selectCopy` at this point, and possibly refactor it into at least one `emit` function. Differential Revision: https://reviews.llvm.org/D92707	2020-12-07 15:04:33 -08:00
Jessica Paquette	c82f002cea	[AArch64][GlobalISel] Don't write to WZR in non-flag-setting G_BRCOND case We are avoiding writing to WZR just about everywhere else. Also update the code to use MachineIRBuilder for the sake of consistency. We also didn't have a GlobalISel testcase for this path, so add a simple one now. Differential Revision: https://reviews.llvm.org/D90626	2020-12-01 16:45:37 -08:00
Jessica Paquette	6c3fa97d8a	[AArch64][GlobalISel] Select Bcc when it's better than TB(N)Z Instead of falling back to selecting TB(N)Z when we fail to select an optimized compare against 0, select Bcc instead. Also simplify selectCompareBranch a little while we're here, because the logic was kind of hard to follow. At -O0, this is a 0.1% geomean code size improvement for CTMark. A simple example of where this can kick in is here: https://godbolt.org/z/4rra6P In the example above, GlobalISel currently produces a subs, cset, and tbnz. SelectionDAG, on the other hand, just emits a compare and b.le. Differential Revision: https://reviews.llvm.org/D92358	2020-12-01 15:45:14 -08:00
Amara Emerson	87ff156414	[AArch64][GlobalISel] Fix crash during legalization of a vector G_SELECT with scalar mask. The lowering of vector selects needs to splat the scalar mask into a vector first. This was causing a crash when building oggenc in the test suite. Differential Revision: https://reviews.llvm.org/D91655	2020-11-30 16:37:49 -08:00
Amara Emerson	ca7fdf7ce0	[AArch64][GlobalISel] Add pre-isel lowering to convert p0 G_DUPs to use s64. This uses the same reasoning as other similar conversions just before selection; without it, we miss out on selection because the importer considers s64 and p0 distinct types.	2020-11-23 22:59:35 -08:00
Amara Emerson	0fb76b9035	[AArch64][GlobalISel] Make <2 x p0> of G_SHUFFLE_VECTOR legal.	2020-11-23 22:59:35 -08:00
Amara Emerson	c58df88886	[AArch64][GlobalISel] Make G_EXTRACT_VECTOR_ELT of <2 x p0> legal. Also fix a selection issue for this which was using LLT::isScalar() when it should have been using !isVector(), add test for that too.	2020-11-20 14:07:45 -08:00
Jessica Paquette	5bc0bd05e6	[AArch64][GlobalISel] Fold G_XOR x, -1 into G_SELECT and select CSINV When we see `xor = G_XOR xor_lhs, -1` followed by `select = G_SELECT cc, tval, xor`, fold this into `select = CSINV tval, xor_lhs, cc`. Update select-select.mir to reflect the changes. For now, only handle the case where the G_XOR is the false-value for the G_SELECT. It may make more sense to handle the true-value case in post-legalizer lowering. Differential Revision: https://reviews.llvm.org/D90774	2020-11-16 14:14:14 -08:00
Amara Emerson	0b6090699a	[AArch64][GlobalISel] Look through a G_ZEXT when trying to match shift-extended register offsets. The G_ZEXT in these cases seems to actually come from a combine that we do but SelectionDAG doesn't. Looking through it allows us to match "uxtw #2" addressing modes. Differential Revision: https://reviews.llvm.org/D91475	2020-11-16 10:50:46 -08:00
Jessica Paquette	9a8bfe3835	[AArch64][GlobalISel] Select G_SELECT cc, t, (G_SUB 0, x) -> CSNEG t, x, cc When we see `%sub = G_SUB 0, %x` followed by `%select = G_SELECT %cc, %t, %sub`, fold away the G_SUB by producing `%select = CSNEG %t, %x, cc`. Simple IR example: https://godbolt.org/z/K8TEnh This is valid on both sides of the select, but for now, just handle one side. It may make more sense to handle swapping sides during post-legalizer lowering. Differential Revision: https://reviews.llvm.org/D90723	2020-11-13 10:12:51 -08:00
Jessica Paquette	6c20c1da1e	[AArch64][GlobalISel] NFC: Use CmpInst::isUnsigned instead of static helper Reducing some code duplication. We had a helper for checking if a predicate is unsigned. Remove that and use the existing function in Instructions.cpp. Differential Revision: https://reviews.llvm.org/D91288	2020-11-13 09:35:42 -08:00
Jessica Paquette	b184a2eccf	[GlobalISel] Add matchers for specific constants and a matcher for negations It's fairly common to need matchers for a specific constant value, or for common idioms like finding a negated register. Add - `m_SpecificICst`, which returns true when matching a specific value. - `m_ZeroInt`, which returns true when an integer 0 is matched. - `m_Neg`, which returns true when a register is negated. Also update a few places which use idioms related to the new matchers. Differential Revision: https://reviews.llvm.org/D91397	2020-11-13 09:24:54 -08:00
Jessica Paquette	d0ba6c4002	[AArch64][GlobalISel] Select CSINC and CSINV for G_SELECT with constants Select the following: - G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc - G_SELECT cc, 0, -1 -> CSINV zreg, zreg, cc - G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc - G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc - G_SELECT cc, t, 1 -> CSINC t, zreg, cc - G_SELECT cc, t, -1 -> CSINV t, zreg, cc (IR example: https://godbolt.org/z/YfPna9) These correspond to several of the AArch64csel patterns in AArch64InstrInfo.td. Unfortunately, it doesn't seem like we can import patterns that use NZCV like those ones do, e.g. `def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV), (CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>;` So we have to manually select these for now. This replaces `selectSelectOpc` with an `emitSelect` function, which performs these optimizations. Differential Revision: https://reviews.llvm.org/D90701	2020-11-12 14:44:01 -08:00
Amara Emerson	ad376657c1	[AArch64][GlobalISel] Optimize G_PTR_ADD with a negated offset to be a G_SUB.	2020-11-11 22:46:53 -08:00
Jessica Paquette	7a70a2f04d	[AArch64][GlobalISel] Mark G_FCONSTANT as legal when there is full fp16 support When there is full fp16 support, there is no reason to widen 16-bit G_FCONSTANTs to 32 bits. Mark them as legal in this case. Also, we currently import a pattern for materializing a 16-bit 0.0. Add a testcase showing we select it. (All other 16-bit G_FCONSTANTs are not yet selected.) Differential Revision: https://reviews.llvm.org/D89164	2020-11-11 13:25:11 -08:00
Jessica Paquette	c42053f79b	[AArch64][GlobalISel] Select arith extended add/sub in manual selection code The manual selection code for add/sub was not checking if it was possible to fold in shifts + extends (the *rx opcode variants). As a result, we could never select things like `cmp x1, w0, uxtw #2`, because we don't import any patterns for compares. This adds support for the arithmetic extended-register forms and updates tests for instructions selected using `emitADD`, `emitADDS`, and `emitSUBS`. This is a 0.1% geomean code size improvement on SPECINT2000 at -Os. Differential Revision: https://reviews.llvm.org/D91207	2020-11-11 09:26:03 -08:00
Jessica Paquette	f0580c73bb	[AArch64][GlobalISel] Select negative arithmetic immediates in manual selector Previously, we only handled negative arithmetic immediates in the imported selector code. Since we don't import code for, say, compares, we were missing opportunities for things like `%cst:gpr(s64) = G_CONSTANT i64 -10`, `%cmp:gpr(s32) = G_ICMP intpred(eq), %reg0(s64), %cst` -> `%adds = ADDSXri %reg0, 10, 0, implicit-def $nzcv`, `%cmp = CSINCWr $wzr, $wzr, 1, implicit $nzcv`. Instead, we would have to materialize the constant and emit a SUBS. This adds support for selection like the above for SUB, SUBS, ADD, and ADDS. This is a 0.1% geomean code size improvement on SPECINT2000 at -Os. Differential Revision: https://reviews.llvm.org/D91108	2020-11-11 09:20:05 -08:00
Amara Emerson	2262393090	[AArch64][GlobalISel] Port some AArch64 target specific MUL combines from SDAG. These do things like turn a multiply by a (power of 2) + 1 constant into a shift and an add, which is a common pattern that pops up, and is universally better than an expensive madd instruction with a constant. I've added check lines to an existing codegen test since the code being ported is almost identical; however, the tests for multiplying by a negative power-of-2 constant don't generate the same code because we're still missing some generic G_MUL combines. Differential Revision: https://reviews.llvm.org/D91125	2020-11-10 22:21:13 -08:00
Amara Emerson	f347d78cca	[AArch64][GlobalISel] Add AArch64::G_DUPLANE[X] opcodes for lane duplicates. These were previously handled by pattern matching shuffles in the selector, but adding a new opcode and making it equivalent to the AArch64duplane SDAG node allows us to select more patterns, like lane indexed FMLAs (patch adding a test for that will be committed later). The pattern matching code has been simply moved to postlegalize lowering. Differential Revision: https://reviews.llvm.org/D90820	2020-11-05 11:18:11 -08:00
Amara Emerson	393b55380a	[AArch64][GlobalISel] Add combine for G_EXTRACT_VECTOR_ELT to allow selection of pairwise FADD. For the <2 x float> case, instead of adding another combine or legalization to get it into a <4 x float> form, I'm just adding a GISel specific selection pattern to cover it. Differential Revision: https://reviews.llvm.org/D90699	2020-11-03 17:25:14 -08:00
Fangrui Song	d5adadb3a5	[AArch64][GlobalISel] Fix -Wunused-variable. NFC	2020-10-24 12:47:11 -07:00
Amara Emerson	0f0fd383b4	[AArch64][GlobalISel] Introduce a new post-isel optimization pass. There are two optimizations here: 1. Consider the following code: FCMPSrr %0, %1, implicit-def $nzcv %sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv %sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv FCMPSrr %0, %1, implicit-def $nzcv %sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv This kind of code, where we have 2 FCMPs each feeding a CSEL, can happen when we have a single IR fcmp being used by two selects. During selection, to ensure that there can be no clobbering of nzcv between the fcmp and the csel, we have to generate an fcmp immediately before each csel is selected. However, often we can essentially CSE these together later in MachineCSE. This doesn't work, though, if there are unrelated flag-setting instructions in between the two FCMPs. In this case, the SUBS defines NZCV but it doesn't have any users, being overwritten by the second FCMP. Our solution here is to try to convert the flag-setting operations between two identical FCMPs into non-flag-setting ones, so that MachineCSE will be able to eliminate one of the FCMPs. 2. SelectionDAG imported patterns for arithmetic ops currently select the flag-setting ops for CSE reasons, and add the implicit-def $nzcv operand to those instructions. However, if those impdef operands are not marked as dead, the peephole optimizations are not able to optimize them into non-flag-setting variants. The optimization here is to find these dead imp-defs and mark them as such. This pass is only enabled when optimizations are enabled. Differential Revision: https://reviews.llvm.org/D89415	2020-10-23 10:18:36 -07:00
Jessica Paquette	19dc9c9780	[AArch64][GlobalISel] Move imm adjustment for G_ICMP to post-legalizer lowering Move the code which adjusts the immediate/predicate on a G_ICMP to AArch64PostLegalizerLowering. This - Reduces the number of places we need to test for optimized compares in the selector. We know that the compare should have been simplified by the time it hits the selector, so we can avoid testing this in selects, brconds, etc. - Allows us to potentially fold more compares (previously, this optimization was only done after calling `tryFoldCompare`, this may allow us to hit some more TST cases) - Simplifies the selection code in `emitIntegerCompare` significantly; we can just use an emitSUBS function. - Allows us to avoid checking that the predicate has been updated after `emitIntegerCompare`. Also add a utility header file for things that may be useful in the selector and various combiners. No need for an implementation file at this point, since it's just one constexpr function for now. I've run into a couple cases where having one of these would be handy, so might as well add it here. There are a couple functions in the selector that can probably be factored out into here. Differential Revision: https://reviews.llvm.org/D89823	2020-10-22 15:27:36 -07:00
Jessica Paquette	147b9497e7	[AArch64][GlobalISel] Split post-legalizer combiner to allow for lowering at -O0 There are a lot of combines in AArch64PostLegalizerCombiner which exist to facilitate instruction matching in the selector. (E.g. matching for G_ZIP and other shuffle vector pseudos) It still makes sense to select these instructions at -O0. Matching earlier in a combiner can reduce complexity in the selector significantly. For example, a good portion of our selection code for compares would be a lot easier to represent in a combine. This patch moves matching combines into an "AArch64PostLegalizerLowering" combiner which runs at all optimization levels. Also, while we're here, improve the documentation for the AArch64PostLegalizerCombiner, and fix up the filepath in its file comment. And also add an 'r' which somehow got dropped from a bunch of function names. https://reviews.llvm.org/D89820	2020-10-22 14:43:25 -07:00
Amara Emerson	4ad459997e	[AArch64][GlobalISel] Select csinc if a select has a 1 on RHS. Differential Revision: https://reviews.llvm.org/D89513	2020-10-16 16:49:52 -07:00
Amara Emerson	39c05a1a71	[AArch64][GlobalISel] Add selection support for v2s32 and v2s64 reductions for FADD/ADD. We'll need legalizer lower() support for the other types to work. Differential Revision: https://reviews.llvm.org/D89159	2020-10-16 11:41:57 -07:00
Amara Emerson	32f77eea2d	[AArch64][GlobalISel] Regbankselect reductions to use FPR bank for scalars. Differential Revision: https://reviews.llvm.org/D89075	2020-10-16 10:42:15 -07:00
Amara Emerson	9190411fcf	[AArch64][GlobalISel] Add basic legalizer rules for supported add/fadd reductions. NEON is pretty limited in its reduction support. As a first step, add some basic rules for the legal types we can select. Differential Revision: https://reviews.llvm.org/D89070	2020-10-16 10:35:46 -07:00
Jessica Paquette	609d765cd3	[AArch64][GlobalISel] NFC: Refactor emitIntegerCompare Simplify emitIntegerCompare and improve comments + asserts. Mostly making the code a little easier to follow. Also, this code is only used for G_ICMP. The legalizer ensures that the LHS/RHS for every G_ICMP is either a s32 or s64. So, there's no need to handle anything else. This lets us remove a bunch of checks for whether or not we successfully emitted the compare. Differential Revision: https://reviews.llvm.org/D89433	2020-10-15 16:04:08 -07:00
Amara Emerson	78ccb0359d	[AArch64][GlobalISel] Don't use explicit zero registers for compare results. These cause problems for later optimizations. Just using an unused vreg, like SelectionDAG does, generates better code in the end and obviates the need for some GISel-specific flag optimizations. Differential Revision: https://reviews.llvm.org/D89419	2020-10-14 16:49:33 -07:00
Jessica Paquette	5402d11b1d	[GlobalISel][AArch64] Don't emit cset for G_FCMPs feeding into G_BRCONDs Similar to the FP case in `AArch64TargetLowering::LowerBR_CC`. Instead of emitting the csets + a tbnz, just emit a compare + bcc (or two bccs, depending on the condition code) This improves cases like this: https://godbolt.org/z/v8hebx This is a 0.1% geomean code size improvement for CTMark at -O3. Differential Revision: https://reviews.llvm.org/D88624	2020-10-01 15:34:16 -07:00
Jessica Paquette	8e8664e55e	[AArch64][GlobalISel] Use emitTestBit in selection for G_BRCOND Partially refactoring, partially fixing a bug. - We shouldn't use TB(N)ZX unless the bit number is >= 32 - We can fold more than xor using emitTestBit Also remove a check which isn't relevant anymore + update tests. Rename select-brcond-of-not.mir to select-brcond-of-binop.mir, since it now tests more than just G_XOR. Differential Revision: https://reviews.llvm.org/D88702	2020-10-01 15:33:34 -07:00
Amara Emerson	017b871502	[AArch64][GlobalISel] Alias rules for G_FCMP to G_ICMP. No need to be different here for the vast majority of rules.	2020-10-01 15:20:09 -07:00
Amara Emerson	e28c5899a2	[AArch64][GlobalISel] Make <8 x s8> integer arithmetic ops legal.	2020-10-01 14:35:21 -07:00
Amara Emerson	a97e97faed	[AArch64][GlobalISel] Make <8 x s8> shifts legal and add selection support.	2020-10-01 14:21:18 -07:00
Amara Emerson	9a2b3bbc59	Revert "[AArch64][GlobalISel] Make <8 x s8> shifts legal." Accidentally pushed this.	2020-10-01 14:15:57 -07:00
Amara Emerson	8071c2f5c6	[AArch64][GlobalISel] Make <8 x s8> shifts legal.	2020-10-01 14:10:10 -07:00
Amara Emerson	9f6acb1358	[AArch64][GlobalISel] Merge G_SHL, G_ASHR and G_LSHR legalizer rules together. There's no need for any difference between these.	2020-10-01 14:02:45 -07:00
Amara Emerson	73457536ff	[AArch64][GlobalISel] Use custom legalization for G_TRUNC for v8i8 vectors. Truncating to v8i8 is a case where we want to split the source but also generate intermediate truncates to reduce the size of the source vector before truncating down to v8i8. This implements the same strategy as what SelectionDAG does, but I'm not certain where, if anywhere, in generic code it should live. Use it for legalization of v8s8 = G_ICMP v8s32. Differential Revision: https://reviews.llvm.org/D88191	2020-10-01 13:22:00 -07:00
Amara Emerson	4c265ce665	[AArch64][GlobalISel] Clamp oversize v4s64 G_FPEXT operations.	2020-10-01 13:08:31 -07:00
Amara Emerson	da11479fd1	[AArch64][GlobalISel] Select all-zero G_BUILD_VECTOR into a zero mov. Unfortunately the leaf SDAG patterns aren't supported yet so we need to do this manually, but it's not a significant amount of code anyway. Differential Revision: https://reviews.llvm.org/D87924	2020-09-30 23:53:38 -07:00
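
Several of the G_SELECT commits above (d0ba6c4002, 9a8bfe3835, 5bc0bd05e6, cd9a52b99e, 5b5d3fa9d9, 40d1fb2229) rely on the AArch64 conditional-select family: CSINC returns `cond ? Rn : Rm + 1`, CSINV returns `cond ? Rn : ~Rm`, and CSNEG returns `cond ? Rn : -Rm`. The following is a minimal C++ model, not LLVM selector code, that checks each fold with the operand order described in 40d1fb2229: a binop on the false side maps directly, while a binop on the true side needs the inverted condition and swapped operands. The helper names (`csinc`, `csinv`, `csneg`, `gselect`, `foldsHold`) are hypothetical, and the condition is modelled as a `bool` rather than NZCV flags.

```cpp
#include <cassert>  // for assert() in the usage example
#include <cstdint>

// Simplified model of the 64-bit conditional-select family. `cond` stands in
// for "the condition code holds for the current NZCV flags"; unsigned
// arithmetic gives the same wraparound behaviour as the hardware.
constexpr uint64_t csinc(uint64_t n, uint64_t m, bool cond) { return cond ? n : m + 1; }
constexpr uint64_t csinv(uint64_t n, uint64_t m, bool cond) { return cond ? n : ~m; }
constexpr uint64_t csneg(uint64_t n, uint64_t m, bool cond) { return cond ? n : 0 - m; }

// Generic G_SELECT semantics: the true value when the condition holds.
constexpr uint64_t gselect(bool cc, uint64_t t, uint64_t f) { return cc ? t : f; }

// Checks every fold for one set of inputs; `!cc` plays the role of inv_cc
// and 0 plays the role of the zero register.
constexpr bool foldsHold(bool cc, uint64_t t, uint64_t f, uint64_t x) {
  const uint64_t minusOne = ~uint64_t{0};
  return
      // G_SELECT cc, t, (G_ADD x, 1)  ->  CSINC t, x, cc
      gselect(cc, t, x + 1) == csinc(t, x, cc) &&
      // G_SELECT cc, (G_ADD x, 1), f  ->  CSINC f, x, inv_cc
      gselect(cc, x + 1, f) == csinc(f, x, !cc) &&
      // G_SELECT cc, t, (G_SUB 0, x)  ->  CSNEG t, x, cc
      gselect(cc, t, 0 - x) == csneg(t, x, cc) &&
      // G_SELECT cc, (G_SUB 0, x), f  ->  CSNEG f, x, inv_cc
      gselect(cc, 0 - x, f) == csneg(f, x, !cc) &&
      // G_SELECT cc, t, (G_XOR x, -1) ->  CSINV t, x, cc
      gselect(cc, t, x ^ minusOne) == csinv(t, x, cc) &&
      // G_SELECT cc, (G_XOR x, -1), f ->  CSINV f, x, inv_cc
      gselect(cc, x ^ minusOne, f) == csinv(f, x, !cc) &&
      // Constant forms from d0ba6c4002.
      gselect(cc, 0, 1) == csinc(0, 0, cc) &&
      gselect(cc, 0, minusOne) == csinv(0, 0, cc) &&
      gselect(cc, 1, f) == csinc(f, 0, !cc) &&
      gselect(cc, minusOne, f) == csinv(f, 0, !cc) &&
      gselect(cc, t, 1) == csinc(t, 0, cc) &&
      gselect(cc, t, minusOne) == csinv(t, 0, cc);
}

// Compile-time spot checks over a few edge-case inputs.
static_assert(foldsHold(true, 3, 9, 0), "");
static_assert(foldsHold(false, 3, 9, ~uint64_t{0}), "");
static_assert(foldsHold(false, 0, 0, uint64_t{1} << 63), "");
```

As a sanity check on the model, writing the true-side ADD fold as `CSINC x, f, inv_cc` (operands not swapped) makes the first `static_assert` fail: with `cc = true` the select yields `x + 1 = 1`, but `csinc(0, 9, false)` yields 10.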

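Commits 195a7af0ab and 8e8664e55e both turn on which TB(N)Z form `emitTestBit` needs. The W form reads a 32-bit register, so it can only test bits 0 to 31. Bits 32 and above need the X form on a 64-bit register. Testing a bit below 64 in a 128-bit value only needs the low 64-bit half. The following is a small C++ model of that decision, not LLVM's implementation. `TestBitForm`, `pickTestBitForm`, `tbzTaken`, `U128` and `tbzTaken128` are hypothetical names, and the model assumes the bit number is below 64.

```cpp
#include <cassert>  // for assert() in the usage example
#include <cstdint>

// Hypothetical labels for the two encodings: W tests bits 0-31 of a 32-bit
// register, X tests bits 0-63 of a 64-bit register.
enum class TestBitForm { W, X };

constexpr TestBitForm pickTestBitForm(unsigned bit) {
  return bit < 32 ? TestBitForm::W : TestBitForm::X;
}

// Models "would TBZ branch?" (TBNZ is the negation), reading only the
// register width that the chosen form uses. Assumes bit < 64.
constexpr bool tbzTaken(uint64_t value, unsigned bit) {
  return pickTestBitForm(bit) == TestBitForm::W
             ? ((static_cast<uint32_t>(value) >> bit) & 1u) == 0
             : ((value >> bit) & 1u) == 0;
}

// A 128-bit value as two halves. For bit < 64 only the low half matters,
// so narrowing to a 64-bit register is sufficient.
struct U128 { uint64_t lo, hi; };

constexpr bool tbzTaken128(U128 v, unsigned bit) {
  return tbzTaken(v.lo, bit);  // valid only for bit < 64
}

static_assert(pickTestBitForm(31) == TestBitForm::W, "");
static_assert(pickTestBitForm(32) == TestBitForm::X, "bit 32 needs the X form");
static_assert(!tbzTaken(uint64_t{1} << 40, 40), "bit 40 is set");
```

Narrowing to 32 bits unconditionally, as the code did before 195a7af0ab, would pair a W register with a bit number that only the X form can encode.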

133 Commits