llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	376233113e	[RISCV] Use TargetConstant for CSR number for READ_CSR/WRITE_CSR. This is consistent with what we do for other operands that are required to be constants. I don't think this results in any real changes. The pattern match code for isel treats ConstantSDNode and TargetConstantSDNode the same.	2021-11-08 15:10:24 -08:00
Ard Biesheuvel	2caf85ad7a	[ARM] implement LOAD_STACK_GUARD for remaining targets Currently, LOAD_STACK_GUARD on ARM is only implemented for Mach-O targets, and other targets rely on the generic support which may result in spilling of the stack canary value or address, or may cause it to be kept in a callee save register across function calls, which means they essentially get spilled as well, only by the callee when it wants to free up this register. So let's implement LOAD_STACK_GUARD for other targets as well. This ensures that the load of the stack canary is rematerialized fully in the epilogue. This code was split off from D112768: [ARM] implement support for TLS register based stack protector for which it is a prerequisite. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112811	2021-11-08 22:59:15 +01:00
Michael Liao	bf225939bc	[InferAddressSpaces] Support assumed addrspaces from addrspace predicates. - CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, it breaks the portability between Clang and NVCC. - This change proposes inferring a pointer's address space from assumptions built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g., ``` foo(float *p) { __builtin_assume(__isGlobal(p)); // From there, we could assume p is a global pointer instead of a // generic one. } ``` This makes the code portable without introducing implementation-specific features. Note that NVCC supports __builtin_assume starting from version 11. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112041	2021-11-08 16:51:57 -05:00
Craig Topper	304edbb553	[RISCV] SMUL_LOHI/UMUL_LOHI should expand for RVV. These and MULHS/MULHU both default to Legal. Targets need to set the ones they don't support to Expand. I think MULHS/MULHU likely has priority in most places so this change probably isn't directly testable. I found it while looking at disabling MULHS/MULHU for nxvXi64 as required for Zve64x. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D113325	2021-11-08 09:38:36 -08:00
Sander de Smalen	2829376bb2	[LV] Use VScaleForTuning to fine-tune the cost per lane. When targeting a specific CPU with scalable vectorization, the knowledge of that particular CPU's vscale value can be used to tune the cost-model and make the cost per lane less pessimistic. If the target implements 'TTI.getVScaleForTuning()', the cost-per-lane is calculated as: Cost / (VScaleForTuning * VF.KnownMinLanes) Otherwise, it assumes a value of 1 meaning that the behavior is unchanged and calculated as: Cost / VF.KnownMinLanes Reviewed By: kmclaughlin, david-arm Differential Revision: https://reviews.llvm.org/D113209	2021-11-08 16:59:46 +00:00
Joe Nash	79f52af4cd	[AMDGPU] Make getInstSizeInBytes more generic NFC. This check mainly handles size affecting literals. Make it check all explicit operands instead of a few by name. Also make the isLiteral check handle the KIMM operands, see https://reviews.llvm.org/D111067 Change-Id: I1a362d55b2a10f5c74d445272e8b7829a8b77597 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D113318 Change-Id: Ie6c688f30a71e0335d1c6dd1ff65019bd7ce684e	2021-11-08 10:34:49 -05:00
Simon Pilgrim	7f32edea23	[X86] combineMulToPMADDWD - use ComputeMinSignedBits(). NFCI. Use ComputeMinSignedBits() to ensure the mul source operands at least sign-extend down from the bottom 16 bits. This will make it easier if/when we try to support handling of source types larger than 32-bits.	2021-11-08 15:28:31 +00:00
Mindong Chen	495e258fd7	[AArch64][SVE] Add FP types to the supported SVE structure load/stores vector type list This adds FP type support to the SVE Container type list as a supplement to D112303. Reviewed By: peterwaller-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D113333	2021-11-08 22:29:08 +08:00
Simon Pilgrim	f059b04f7b	[DAG] Add SelectionDAG::ComputeMinSignedBits helper As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper. I've included some usage; it's not exhaustive, just the cases where the intention is obvious. Differential Revision: https://reviews.llvm.org/D113396	2021-11-08 14:12:45 +00:00
David Sherwood	8d38c24fb6	[SVE][CodeGen] Improve codegen for some FP insert_subvector cases When inserting an unpacked FP subvector into a packed vector we can simply cast the unpacked value into a packed value, since both types are legal for SVE. We can then use this as the input for the UZP instruction. This avoids us expanding the operation by going through the stack. Differential Revision: https://reviews.llvm.org/D113270	2021-11-08 13:45:55 +00:00
Matt	4a59694ba1	[AArch64][SVE] Combine FADD and FMUL aarch64 intrinsics to FMLA This is a refinement to the work in https://reviews.llvm.org/D111638 Fold (fadd p a (fmul p b c)) into (fma p a b c) Differential Revision: https://reviews.llvm.org/D113095	2021-11-08 12:22:38 +00:00
Simon Pilgrim	f60d3ec0c7	[DAG] Add BuildVectorSDNode::getConstantRawBits helper We have several places where we need to extract the raw bits data from a BUILD_VECTOR node, so consolidate this to a single helper function that handles Undefs and Integer/FP constants, including implicit truncation. This should make it easier to extend D113202 to handle more constant folding of bitcasted constant data. Differential Revision: https://reviews.llvm.org/D113351	2021-11-08 12:07:38 +00:00
Simon Moll	c2b91eef27	[VE] default to integrated asm in AsmInfo VE integrated asm has been the default in Clang. Also use the default setting for integrated asm in the backend. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D113384	2021-11-08 11:58:29 +01:00
David Green	a982940eb5	[AArch64] Combine fptoi.sat(fmul) to fixed point cvtf We already have patterns for fptosi and fptoui plus fmul to fixed point convert, this adds equivalent patterns for fptosi.sat and fptoui.sat, which should apply equally well for the legal saturating variants. Differential Revision: https://reviews.llvm.org/D113199	2021-11-08 10:07:34 +00:00
Qiu Chaofan	9b5e2b5261	[PowerPC] Implement basic macro fusion in Power10 Including basic fusion types around arithmetic and logical instructions. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D111693	2021-11-08 17:23:56 +08:00
Andrew Wei	bf3784b882	[AArch64] Canonicalize X*(Y+1) or X*(1-Y) to madd/msub Perform the rearrangement of add/sub and mul instructions to match the madd/msub pattern. Reviewed By: dmgreen, sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D111862	2021-11-08 16:49:31 +08:00
skc7	a0633f5ccb	[AMDGPU] Test Commit. NFC Reviewed By: hsmhsm Differential Revision: https://reviews.llvm.org/D113379	2021-11-08 07:09:09 +00:00
Ben Shi	e32cf690df	[RISCV] Optimize (add (mul r, c0), c1) Optimize (add (mul x, c0), c1) -> (add (mul (add x, c1/c0+1), c0), c1%c0-c0), if c1/c0+1 and c1%c0-c0 are simm12, while c1 is not. Optimize (add (mul x, c0), c1) -> (add (mul (add x, c1/c0-1), c0), c1%c0+c0), if c1/c0-1 and c1%c0+c0 are simm12, while c1 is not. Reviewed By: craig.topper, asb Differential Revision: https://reviews.llvm.org/D111141	2021-11-08 02:58:25 +00:00
Chen Zheng	7c6f5950f0	[PowerPC] comment for different input register classes; nfc Add comments to explain why XXPERMDIs and XXPERMDI have different input register classes, vsfrc for XXPERMDIs and vsrc for XXPERMDI. This addresses the comments in abandoned patch D113178, we keep using `f0` instead of using `vs0` for XXPERMDIs on purpose.	2021-11-08 02:21:30 +00:00
Zi Xuan Wu	4fb282fec5	[CSKY] Add CSKY 16-bit instruction format and encoding CSKY is an architecture that natively supports a mixture of 16-bit and 32-bit instructions, and there is no individual predicate or feature to enable/disable 16-bit instructions. So I think it's better to add the 16-bit instructions early and naturally use 16-bit and 32-bit instructions together. Differential Revision: https://reviews.llvm.org/D112919	2021-11-08 10:02:15 +08:00
Simon Pilgrim	55e4cd8485	[X86][AVX2] Recognise 256-bit truncation shuffles and mask 256-bit source For v8i16 shuffle patterns that are lowered with AND+PACKUS, check to see if the sources are from a 256-bit vector and perform the masking using BLENDW at the 256-bit level. With the test changes we can see more examples of duplicate XMM/YMM zero vectors (PR26018) :(	2021-11-07 21:24:55 +00:00
Nikita Popov	a8c318b50e	[BasicAA] Use index size instead of pointer size When accumulating the GEP offset in BasicAA, we should use the pointer index size rather than the pointer size. Differential Revision: https://reviews.llvm.org/D112370	2021-11-07 18:56:11 +01:00
Kazu Hirata	aee86f9b6c	[AMDGPU] Remove unused declaration selectSMRD (NFC) The function body proper was removed on Feb 20, 2019 in commit `79b5c3842b`.	2021-11-07 09:53:18 -08:00
Kazu Hirata	41ef3187e0	[ARM, X86] Use MachineBasicBlock::{predecessors,successors} (NFC)	2021-11-07 09:53:16 -08:00
Simon Pilgrim	d391e4fe84	[X86] Update RET/LRET instruction to use the same naming convention as IRET (PR36876). NFC Be more consistent in the naming convention for the various RET instructions to specify in terms of bitwidth. Helps prevent future scheduler model mismatches like those that were only addressed in D44687. Differential Revision: https://reviews.llvm.org/D113302	2021-11-07 15:06:54 +00:00
Benjamin Kramer	9b8b16457c	Put implementation details into anonymous namespaces. NFCI.	2021-11-07 15:18:30 +01:00
Simon Pilgrim	b5ef56f0bc	[X86][AVX] Add missing X86ISD::VBROADCAST(v4f32 -> v8f32) isel pattern for AVX1 targets D109434 addressed the v2f64 -> v4f64 case, an internal test has found an equivalent crash for the v4f32 -> v8f32 case.	2021-11-07 12:59:35 +00:00
Kazu Hirata	22e21da47d	[WebAssembly] Remove unused declaration SelectExternRefAddr (NFC)	2021-11-06 19:31:22 -07:00
Kazu Hirata	e4bab21848	[AMDGPU] Use MachineBasicBlock::{predecessors,successors} (NFC)	2021-11-06 19:31:20 -07:00
Kazu Hirata	cefc01fa65	[X86] Simplify a call to MachineBasicBlock::erase (NFC)	2021-11-06 13:08:25 -07:00
Kazu Hirata	815e8b5a20	[Hexagon] Remove an extraneous variable (NFC)	2021-11-06 13:08:23 -07:00
Kazu Hirata	14d656b3d8	[Target] Use llvm::reverse (NFC)	2021-11-06 13:08:21 -07:00
Roman Lebedev	f8efc5c0ac	[NFC][TTI] Add/extract `getReplicationShuffleCost()` method, deduplicate its implementations Hiding it in `getInterleavedMemoryOpCost()` is problematic for a number of reasons, including testability and reuse; let's do better. In a followup, `getUserCost()` will be taught to use it to estimate the mask costs, which will allow for better cost model tests for it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113313	2021-11-06 16:45:15 +03:00
Bin Cheng	54d891a7d5	[RISCV]: Fix typo by abstracting VWholeLoad* classes This patch abstracts VWholeLoad* classes into VWholeLoadN, simplifies existing code as well as fixes a typo. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D109319	2021-11-06 10:48:03 +08:00
Bin Cheng	d488f1fff2	[RISCV][NFC]: Refactor classes for load/store instructions of RVV This patch refactors classes for load/store of V extension by: - Introducing a new class for VUnitStrideLoadFF and VUnitStrideSegmentLoadFF so that uses of L/SUMOP* are not spread around different places. - Reordering classes for Unit-Stride load/store in line with the table describing lumop/sumop in riscv-v-spec.pdf. Reviewed By: HsiangKai, craig.topper Differential Revision: https://reviews.llvm.org/D109318	2021-11-06 10:48:03 +08:00
Florian Hahn	f64580f8d2	[AArch64][GISel] Optimize 8 and 16 bit variants of uaddo. Try to simplify G_UADDO with 8 or 16 bit operands to wide G_ADD and TBNZ if the result is only used in the no-overflow case. It is restricted to cases where we know that the high-bits of the operands are 0. If there's an overflow, then the 9th or 17th bit must be set, which can be checked using TBNZ. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D111888	2021-11-05 19:11:15 +01:00
Shao-Ce SUN	5c3d7184b4	[RISCV] Support Zfhmin extension According to RISC-V Unprivileged ISA 15.6. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D111866	2021-11-06 01:41:02 +08:00
Kazu Hirata	2c4ba3e9d3	[Target] Use make_early_inc_range (NFC)	2021-11-05 09:14:32 -07:00
Roman Lebedev	ad617183bb	[X86] `X86TTIImpl::getInterleavedMemoryOpCostAVX512()`: mask is i8 not i1 Even though AVX512's masked mem ops (unlike AVX1/2) have a mask that is a `VF x i1`, replication of said masks happens after promotion of it to `VF x i8`, so we should use `i8`, not `i1`, when calculating the cost of mask replication.	2021-11-05 17:27:02 +03:00
David Sherwood	657a1dcd0d	[AArch64] Add target DAG combine for UUNPKHI/LO When creating a UUNPKLO/HI node with an undef input, the output should also be undef. I've added a target DAG combine function to ensure we avoid creating an unnecessary uunpklo/hi instruction. Differential Revision: https://reviews.llvm.org/D113266	2021-11-05 13:50:59 +00:00
Quinn Pham	c71fbdd87b	[NFC] Inclusive language: Remove instances of master in URLs [NFC] This patch fixes URLs containing "master". Old URLs were either broken or redirecting to the new URL. Reviewed By: #libc, ldionne, mehdi_amini Differential Revision: https://reviews.llvm.org/D113186	2021-11-05 08:48:41 -05:00
Jingu Kang	a7b1872593	[AArch64] Fix a bug from a pattern for uaddv(uaddlp(x)) ==> uaddlv A pattern selected the wrong uaddlv MI. It should be as follows: uaddv(uaddlp(v8i8)) ==> uaddlv(v8i8) Differential Revision: https://reviews.llvm.org/D113263	2021-11-05 12:48:18 +00:00
Simon Pilgrim	5e9ac7c0a5	[X86] Enable v32i16 rotate lowering on non-BWI targets Fixes one of the regressions in D113192	2021-11-05 11:00:31 +00:00
Chen Zheng	fed2889f07	[PowerPC] use correct selection for v16i8/v8i16 splat load Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D113236	2021-11-05 10:04:03 +00:00
Jay Foad	c93bf53a3e	[AMDGPU] NFC formatting fixes in SIMemoryLegalizer	2021-11-05 09:10:24 +00:00
Qiu Chaofan	5fd406e254	[PowerPC] Add intrinsic to convert between ppc_fp128 and fp128 ppc_fp128 and fp128 are both 128-bit floating point types. However, we can't do conversion between them now, since trunc/ext are not allowed for same-size fp types. This patch adds two new intrinsics: llvm.ppc.convert.f128.to.ppcf128 and llvm.ppc.convert.ppcf128.to.f128, to support such conversion. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D109421	2021-11-05 16:58:38 +08:00
Chen Zheng	9695027066	[PowerPC] address post-commit comments for D106555; NFC Address nemanjai's post-commit comments.	2021-11-05 05:30:53 +00:00
Shengchen Kan	be08e452f3	[X86][MS-InlineAsm] Add constraint m for memory access w/ global var Constraint `m` should be used when the address of a variable is passed as a value. The constraint is missing for MS inline assembly when something is written to the address of the variable. Its absence causes the FE to delete the definition of the static variable, which then results in an "undefined reference to xxx" error. Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D113096	2021-11-05 09:11:41 +08:00
Yonghong Song	41860e602a	BPF: Support btf_type_tag attribute A new kind BTF_KIND_TYPE_TAG is defined. The tags associated with a pointer type are emitted in their IR order as modifiers. For example, for the following declaration: int __tag1 * __tag1 __tag2 *g; The BTF type chain will look like VAR(g) -> __tag1 -> __tag2 -> pointer -> __tag1 -> pointer -> int In the above, "->" means BTF CommonType.Type, which indicates the pointed-to type. Differential Revision: https://reviews.llvm.org/D113222	2021-11-04 17:01:36 -07:00
Ben Langmuir	a2639dcbe6	[ORC] Add a utility for adding missing "self" relocations to a Symbol If a tool wants to introduce new indirections via stubs at link-time in ORC, it can cause fidelity issues around the address of the function if some references to the function do not have relocations. This is known to happen inside the body of the function itself on x86_64 for example, where a PC-relative address is formed, but without a relocation. ``` _foo: leaq -7(%rip), %rax ## form pointer to '_foo' without relocation _bar: leaq (%rip), %rax ## uses X86_64_RELOC_SIGNED to '_foo' ``` The consequence of introducing a stub for such a function at link time is that if it forms a pointer to itself without relocation, it will not have the same value as a pointer from outside the function. If the function pointer is used as a key, this can cause problems. This utility provides best-effort support for adding such missing relocations using MCDisassembler and MCInstrAnalysis to identify the problematic instructions. Currently it is only implemented for x86_64. Note: the related issue with call/jump instructions is not handled here, only forming function pointers. rdar://83514317 Differential revision: https://reviews.llvm.org/D113038	2021-11-04 15:01:05 -07:00
Thomas Symalla	76cbe62262	[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values. This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unnecessary s_mov instructions for arguments the caller would otherwise save and restore in these registers. Reviewed By: sebastian-ne Differential Revision: https://reviews.llvm.org/D111637	2021-11-04 21:50:18 +01:00
David Green	091244023a	[ARM] Move VPTBlock pass after post-ra scheduling Currently when tail predicating loops, vpt blocks need to be created with the vctp predicate in case we need to revert to non-tail predicated form. This has the unfortunate side effect of severely hampering post-ra scheduling at times as the instructions are already stuck in vpt blocks, not allowed to be independently ordered. This patch addresses that by just moving the creation of VPT blocks later in the pipeline, after post-ra scheduling has been performed. This allows more optimal scheduling post-ra before the vpt blocks are created, leading to more optimal tail predicated loops. Differential Revision: https://reviews.llvm.org/D113094	2021-11-04 18:42:12 +00:00
Wouter van Oortmerssen	a320f877ce	[WebAssembly] Fix debug locations for ExplicitLocals pass This is a reworked version of the reverted patch: https://reviews.llvm.org/D112487 Note that a) it doesn't need the test changes anymore, and b) I checked at least locally it passes other.test_pthread_lsan_leak Differential Revision: https://reviews.llvm.org/D113208	2021-11-04 11:38:03 -07:00
Zakk Chen	0649dfebba	[RISCV] Rename some assembler mnemonic and intrinsic functions for RVV 1.0. Rename vpopc/vmandnot/vmornot to vcpop/vmandn/vmorn assembler mnemonic. Reviewed By: frasercrmck, jrtc27, craig.topper Differential Revision: https://reviews.llvm.org/D111062	2021-11-04 10:08:01 -07:00
Kazu Hirata	2887117d2c	[Hexagon] Use make_early_inc_range (NFC)	2021-11-04 08:51:05 -07:00
Sander de Smalen	1ea4296208	[NFC] Remove from UnivariateLinearPolyBase::getValue(). This interface should not have existed in the first place, let alone be a public member. It allows calling `ElementCount::get(..)->getValue()`, which is ambiguous. The interfaces to be used are either getFixedValue() or getKnownMinValue().	2021-11-04 14:32:08 +00:00
Chen Zheng	f6db18fd4a	[PowerPC][NFC] make option ppc-formprep-max-vars settable more than once.	2021-11-04 13:44:58 +00:00
Simon Pilgrim	87d5bb66eb	[X86][SSE] Improve PMADDWD SimplifyDemandedVectorElts handling Check both operands for zero elements to remove unnecessary demanded elts. Try to help reduce some minor regressions noticed in D110995	2021-11-04 12:56:31 +00:00
Qiu Chaofan	a84118756c	[PowerPC] Enforce side effects to FPSCR read/set intrinsics Currently, FPSCR is not modeled, so in some early passes (such as early-cse), the read/set intrinsics to FPSCR may get incorrect simplification. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D112380	2021-11-04 11:45:32 +08:00
RamNalamothu	539f500e78	[AMDGPU] Do not add debug locations to the code inside prologue There is no real source location for code inside the prologue, as it is generated by the compiler, but source locations are being added to code inside the prologue as a side effect of https://reviews.llvm.org/D99269, because buildSpillLoadStore() uses the source location of the real instruction in the basic block, if any. Fixes: SWDEV-307590 Reviewed By: scott.linder, sebastian-ne Differential Revision: https://reviews.llvm.org/D113100	2021-11-04 08:02:41 +05:30
Craig Topper	5022ac0771	[RISCV] Use HasVInstructions and HasVInstructionsAnyF in more places in TableGen. NFC Change RISCVSubtarget.hasVInstructionAnyF() to call hasVInstructionsF32 so that any changes to hasVInstructionsF32 are reflected. The files were missed in D112496.	2021-11-03 14:32:45 -07:00
Matthias Braun	847a680733	X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr This is a re-commit of `e2c7ee0743` which was reverted in `a2a58d91e8`. This includes a fix to consistently check for EFLAGS being live-out. See phabricator review. Original Summary: This extends `optimizeCompareInstr` to re-use previous comparison results if the previous comparison was with an immediate that was 1 bigger or smaller. Example: CMP x, 13 ... CMP x, 12 ; can be removed if we change the SETg SETg ... ; x > 12 changed to `SETge` (x >= 13) removing CMP Motivation: This often happens because SelectionDAG canonicalization tends to add/subtract 1 often when optimizing for fallthrough blocks. Example for `x > C` the fallthrough optimization switches true/false blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into `x < C + 1`. Differential Revision: https://reviews.llvm.org/D110867	2021-11-03 14:12:23 -07:00
alex-t	0a3d755ee9	[AMDGPU] Enable divergence-driven BFE selection Detailed description: This change enables selection of the bit field extract patterns to s_bfe_u32 or v_bfe_u32 depending on the divergence of the pattern root node. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D110950	2021-11-03 23:26:59 +03:00
Harald van Dijk	889c2b97bd	[X86] Fix X32 indirect call generation The check for whether a zero extension was needed was subtly wrong and saw a value that was already 64 bits, so did not extend. Fixes PR52357. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112860	2021-11-03 16:43:44 +00:00
Kazu Hirata	4bef0304e1	[AArch64, AMDGPU] Use make_early_inc_range (NFC)	2021-11-03 09:22:51 -07:00
Hans Wennborg	a2a58d91e8	Revert "X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr" This caused miscompiles of switches; see comments on the code review. > This extends `optimizeCompareInstr` to re-use previous comparison > results if the previous comparison was with an immediate that was 1 > bigger or smaller. Example: > > CMP x, 13 > ... > CMP x, 12 ; can be removed if we change the SETg > SETg ... ; x > 12 changed to `SETge` (x >= 13) removing CMP > > Motivation: This often happens because SelectionDAG canonicalization > tends to add/subtract 1 often when optimizing for fallthrough blocks. > Example for `x > C` the fallthrough optimization switches true/false > blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into > `x < C + 1`. > > Differential Revision: https://reviews.llvm.org/D110867 This reverts commit `e2c7ee0743`.	2021-11-03 17:01:36 +01:00
Roman Lebedev	df93c8a919	[X86] `X86TTIImpl::getInterleavedMemoryOpCostAVX512()`: fallback to scalarization cost computation for mask I don't really buy that masked interleaved memory loads/stores are supported on X86. There is zero costmodel test coverage, no actual cost modelling for the generation of the mask repetition, and basically only two LV tests. Additionally, I'm not very interested in AVX512. I don't know if this really helps the "soft" block over at https://reviews.llvm.org/D111460#inline-1075467, but I think it can't make things worse at least. When we are being told that there is masking, instead of completely giving up and falling back to fully scalarizing `BasicTTIImplBase::getInterleavedMemoryOpCost()`, let's correctly query the cost of masked memory ops, keep all the pretty shuffle cost modelling, but scalarize the cost computation for the mask replication. I think not scalarizing the shuffles themselves may adjust the computed costs a bit, and hopefully just enough to hide the "regressions" at https://reviews.llvm.org/D111460#inline-1075467. I do mean hide, because the test coverage is non-existent. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112873	2021-11-03 18:14:35 +03:00
Peter Waller	7a34145f40	Reland "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store" This reverts commit `753eba6421`. Contiguous gather => masked load: (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1)) => (masked.load (gep BasePtr IndexBase) Align Mask undef) Contiguous scatter => masked store: (sve.st1.scatter.index Value Mask BasePtr (sve.index IndexBase 1)) => (masked.store Value (gep BasePtr IndexBase) Align Mask) Tests with <vscale x 2 x double>: [Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation]. Differential Revision: https://reviews.llvm.org/D112076	2021-11-03 13:42:14 +00:00
Peter Waller	753eba6421	Revert "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store" This reverts commit `1febf42f03`, which has a use-of-uninitialized-memory bug. See: https://reviews.llvm.org/D112076	2021-11-03 13:39:38 +00:00
Andrew Savonichev	123ad720f1	[NVPTX] Mark special registers as reserved A reserved register: - is not allocatable - is considered always live - is ignored by liveness tracking NVPTX special registers match the criteria, and marking them as reserved helps to avoid machine verifier error: * Bad machine code: Using an undefined physical register * - function: foo - basic block: %bb.0 (0x557bb178b708) - instruction: %0:int32regs = MOV_SPECIAL $envreg0 - operand 1: $envreg0 Differential Revision: https://reviews.llvm.org/D113008	2021-11-03 15:48:04 +03:00
Andrew Savonichev	0e70785538	[NVPTX] Add MoveParam instruction for TargetExternalSymbol operand TargetExternalSymbol is considered to be an immediate and not a register, so machine verifier emits an error: * Bad machine code: Expected a register operand. * - function: static_offset - basic block: %bb.0 bb (0x560e9b306028) - instruction: %3:int64regs = MoveParamI64 &static_offset_param_1 - operand 1: &static_offset_param_1 The patch adds variants of this instruction with an immediate operand for byval arguments on 64-bit and 32-bit targets. Differential Revision: https://reviews.llvm.org/D113006	2021-11-03 14:43:41 +03:00
David Green	3bc586b9aa	[ARM] Treat MVE gather add-like-or's like adds LLVM has the habit of turning adds with no common bits set into ors, which means we need to detect them and treat them like adds again in the MVE gather/scatter lowering pass. Differential Revision: https://reviews.llvm.org/D112922	2021-11-03 11:41:06 +00:00
Peter Waller	1febf42f03	[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store Contiguous gather => masked load: (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1)) => (masked.load (gep BasePtr IndexBase) Align Mask undef) Contiguous scatter => masked store: (sve.st1.scatter.index Value Mask BasePtr (sve.index IndexBase 1)) => (masked.store Value (gep BasePtr IndexBase) Align Mask) Tests with <vscale x 2 x double>: [Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation]. Differential Revision: https://reviews.llvm.org/D112076	2021-11-03 11:02:44 +00:00
David Green	d36dd1f842	[ARM] Push gather/scatter shl index updates out of loops This teaches the MVE gather scatter lowering pass that SHL is essentially the same as Mul, where we are able to optimize the induction of a gather/scatter address by pushing them out of loops. https://alive2.llvm.org/ce/z/wG4VyT Differential Revision: https://reviews.llvm.org/D112920	2021-11-03 11:00:05 +00:00
Qiu Chaofan	741aeda97d	[PowerPC] Implement longdouble pack/unpack builtins Implement two builtins to pack/unpack IBM extended long double float, according to GCC 'Basic PowerPC Builtin Functions Available ISA 2.05'. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D112055	2021-11-03 17:57:25 +08:00
Andrew Savonichev	30a3a17df8	[NVPTX] Copy machine operand flags in TII::insertBranch Before this patch, flags such as undef were dropped by TII::insertBranch (used by BranchFolding pass), resulting in the following error from machine verifier: * Bad machine code: Reading virtual register without a def * - function: hoge - basic block: %bb.0 bb (0x562e9c240e68) - instruction: CBranch %2:int1regs, %bb.3 - operand 0: %2:int1regs Differential Revision: https://reviews.llvm.org/D113001	2021-11-03 12:38:27 +03:00
Yi Kong	803d4f8a35	[ARM][AsmParser] Don't emit "deprecated instruction in IT block" warning if requested Also fixed formatting in AsmMatcherEmitter because it was confusing. Differential Revision: https://reviews.llvm.org/D112993	2021-11-03 17:18:04 +08:00
Ben Shi	59c3b48d99	Revert "[AArch64] Optimize add/sub with immediate" This reverts commit `3de3ca3137`.	2021-11-03 14:15:21 +08:00
Chen Zheng	5a8b196340	[PowerPC] handle more splat loads without stack operation This mostly improves splat loads code generation on Power7 Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D106555	2021-11-03 05:17:41 +00:00
Abinav Puthan Purayil	fbe61fb0aa	[AMDGPU] Fix SGPR checks in S_MOV_B64_IMM_PSEUDO generation. The function to generate S_MOV_B64_IMM_PSEUDO was recently modified to optimize AGPR to AGPR copy but it missed checking for the SGPR clobbering for the S_MOV_B64_IMM_PSEUDO generation. Differential Revision: https://reviews.llvm.org/D113005	2021-11-03 09:09:24 +05:30
Ben Shi	3de3ca3137	[AArch64] Optimize add/sub with immediate Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Reviewed By: jaykang10, dmgreen Differential Revision: https://reviews.llvm.org/D111034	2021-11-03 03:06:43 +00:00
Phoebe Wang	8f101971b6	[X86][VARARG] Assign MMO earlier to avoid the prolog insert point being sunk across VASTART_SAVE_XMM_REGS The changes in D80163 deferred the assignment of MachineMemOperand (MMO) until the X86ExpandPseudo pass. This results in a crash due to the prolog insert point being sunk across the pseudo instruction VASTART_SAVE_XMM_REGS. Moving the assignment to the creation of the node avoids the problem. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D112859	2021-11-03 10:13:32 +08:00
Eli Friedman	c964afb2c8	[AArch64] Diagnose large adrp offset on Windows. On Windows, this relocation can only encode a 21-bit offset. Make sure we emit an error, instead of silently truncating the offset. Found investigating https://bugs.llvm.org/show_bug.cgi?id=52378 Differential Revision: https://reviews.llvm.org/D113051	2021-11-02 15:11:22 -07:00
Simon Pilgrim	53900a19fd	[X86][AVX] combineConcatVectorOps - use getBROADCAST_LOAD helper for splat of normal vector loads. NFCI. Reapplied from rG1cfecf4fc427 with fix for PR51226 - ensure the load is a normal (non-ext) load.	2021-11-02 20:03:25 +00:00
Simon Pilgrim	82e0eb22af	[X86][AVX] combineConcatVectorOps - use getBROADCAST_LOAD helper. NFCI. This is part of rG1cfecf4fc427 that was reverted to fix PR51226 - concatenating the broadcasts is OK, it's the splatted loads that crash (we're not detecting extloads). I'm still creating a reduced test case so haven't added the load handling again yet.	2021-11-02 18:04:35 +00:00
Fraser Cormack	d065b03801	[RISCV] Optimize vp.load with an all-ones mask Similar to D110206, this patch optimizes unmasked vp.load intrinsics to avoid the need for a vmset instruction to set the mask. It does so by selecting a riscv_vle intrinsic rather than a riscv_vle_mask intrinsic. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113022	2021-11-02 17:23:39 +00:00
Jay Foad	be1a8f8834	[AMDGPU] Really preserve LiveVariables in SILowerControlFlow https://bugs.llvm.org/show_bug.cgi?id=52204 Differential Revision: https://reviews.llvm.org/D112731	2021-11-02 15:03:37 +00:00
Matt	895145aacb	Revert "[AArch64][SVE] Combine predicated FMUL/FADD into FMA" This reverts commit `fc28a2f8ce`.	2021-11-02 14:56:01 +00:00
Simon Pilgrim	e173631dd1	[X86][AVX] SimplifyDemandedVectorEltsForTargetNode - use getBROADCAST_LOAD helper. NFCI. Reduce width of X86ISD::SUBV_BROADCAST_LOAD node.	2021-11-02 14:07:22 +00:00
Simon Pilgrim	8ca666a280	[X86][AVX] lowerV2X128Shuffle - use getBROADCAST_LOAD helper. NFCI.	2021-11-02 14:07:21 +00:00
Martin Liska	c5029023fb	Fix building with GCC 12: Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380 Differential Revision: https://reviews.llvm.org/D112990	2021-11-02 14:28:00 +01:00
David Callahan	4ec1b8eeac	[RISCV] Fix invalid kill on callee save A callee save may be live (specifically X1) on entry and so a spill should not mark it killed. Differential Revision: https://reviews.llvm.org/D111285	2021-11-02 11:56:54 +00:00
Wouter van Oortmerssen	ac65366485	[WebAssembly] support "return" and unreachable code in asm type checker To support return (poor support for it was the root cause of https://github.com/WebAssembly/wasi-sdk/issues/200) we also have to have at least a basic notion of unreachable, which in this case just means to stop type checking until there is an end_block (an incoming control flow edge). This is conservative (it may miss some type checking opportunities) but is simple and an improvement over what we had before. Differential Revision: https://reviews.llvm.org/D112953	2021-11-01 15:42:58 -07:00
Yonghong Song	f63405f6e3	BPF: Work around an InstCombine ICmp transformation with llvm.bpf.compare builtin Commit `acabad9ff6` ("[InstCombine] try to canonicalize icmp with trunc op into mask and cmp") added a transformation to convert "(conv)a < power_2_const" to "a & <const>" in certain cases. The bpf kernel verifier has to handle the resulting code conservatively, and this may reject an otherwise legitimate program. This commit tries to prevent such a transformation. A bpf backend builtin llvm.bpf.compare is added. The ICMP insn, which is subject to the above InstCombine transformation, is converted to the builtin function. The builtin function is later lowered to the original ICMP insn, after the InstCombine pass. With this change, all affected bpf strobemeta* selftests now pass. Differential Revision: https://reviews.llvm.org/D112938	2021-11-01 14:46:20 -07:00
Cameron McInally	702fd3d323	[SVE] Fix VLS FMA matching for CodeGenOpt::Aggressive. For NEON, FMA matching is done in the MachineCombiner, and not the DAGCombiner. That causes problems with VLS lowering, since the vectors are fixed width at the DAGCombiner, but are scalable in the MachineCombiner. This patch corrects it by matching FMAs for VLS vectors in the DAGCombiner. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D112557	2021-11-01 10:43:52 -07:00
Kazu Hirata	d000431fb2	[X86] Remove X86ELFObjectWriter in X86AsmBackend.cpp (NFC) Note that the identically named class is defined in an anonymous namespace in X86ELFObjectWriter.cpp.	2021-11-01 08:31:54 -07:00
Jay Foad	7afef22926	[AMDGPU] Use MachineInstrBuilder::addReg. NFC.	2021-11-01 15:29:51 +00:00
Jay Foad	2b548b18c1	[AMDGPU] Shrink v_mac_legacy_f32 and v_fmac_legacy_f32 Differential Revision: https://reviews.llvm.org/D112917	2021-11-01 13:55:53 +00:00
Mubashar Ahmad	0b83a18a2b	[AArch64] Enablement of Cortex-X2 Enables support for Cortex-X2 cores. Differential Revision: https://reviews.llvm.org/D112459	2021-11-01 11:55:24 +00:00
Simon Pilgrim	6fc50e531d	[CostModel][X86] Remove old FIXME comments for AVX512F vector splitting Similar to AVX1, the cost of splitting/merging 512-bit -> 256-bit vectors for arithmetic operations is typically hidden because different ports are used, etc.	2021-11-01 11:11:11 +00:00
Simon Pilgrim	fd485d8cda	[X86][AVX] Prefer VINSERTF128 over VPERM2F128 for 128->256 subvector concatenations The VINSERTF128 instruction is often much quicker, and never slower, than the more general VPERM2F128 instruction, so we should try to use that in more circumstances. This requires a fallback to a commuted VPERM2F128 for the case where we need to fold the 256-bit vector source instead of the 128-bit subvector source. There is one interesting side effect - DAGCombine's narrowExtractedVectorLoad combine gets called in a number of locations, this often creates an extracted subvector load without regard to other uses of the original wider load. I'm expecting AVX cpus to be capable of merging such aliased loads, but I do wonder whether narrowExtractedVectorLoad's call to X86TargetLowering::shouldReduceLoadWidth needs to be altered to check for more partial uses? Noticed while investigating the quality of interleaved load/store codegen. Differential Revision: https://reviews.llvm.org/D111960	2021-11-01 10:45:50 +00:00
Kazu Hirata	476e1ee3da	[AArch64] Remove unused declaration hasSwiftExtendedFrame (NFC)	2021-10-31 22:58:56 -07:00
Chen Zheng	eeed1545b2	[PowerPC] turn off chain commoning by default.	2021-11-01 04:11:10 +00:00
Zi Xuan Wu	cf78715cae	[CSKY] First patch to construct codegen infra and generate first add instruction Ooops. It constructs the codegen infra and provides only basic code to generate the first add instruction successfully. Differential Revision: https://reviews.llvm.org/D112206	2021-11-01 10:06:56 +08:00
Craig Topper	ada5458521	[RISCV] Expand scalable vector bswap. Fix crash for bitreverse. Fix LegalizeVectorOps to not try shuffle or unrolling expansions for scalable vectors. Differential Revision: https://reviews.llvm.org/D112236	2021-10-31 10:01:27 -07:00
Kazu Hirata	72710af233	[CodeGen, Target] Use MachineBasicBlock::terminators (NFC)	2021-10-31 07:57:34 -07:00
Kazu Hirata	5970249439	[Hexagon] Remove chksetELFHeaderEFlags (NFC) The function was introduced without any use on Nov 9, 2015 in commit `7cd0892729`.	2021-10-30 08:43:43 -07:00
Kazu Hirata	c3d63a0697	[Hexagon] Remove ValidArch (NFC) This function seems to be unused for at least one year.	2021-10-30 08:43:41 -07:00
Kazu Hirata	c5cd371cc9	[Hexagon] Remove unused struct InstTy (NFC)	2021-10-30 08:43:39 -07:00
Christudasan Devadasan	aa2d3b59ce	GlobalISel/Utils: Use incoming regbank while constraining the superclasses Register operands with superclasses can possibly have multiple regBanks if they have different register types. The regBank ambiguity resolved during regbankselect should be used to constrain the operand regclass instead of obtaining one from the MCInstrDesc. This is a prerequisite patch for D109300 that introduces allocatable AV_* Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to restrain the regclass to either A or V based on the incoming regbank. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112323	2021-10-30 07:20:45 -04:00
Stanislav Mekhanoshin	e5340ed30c	[AMDGPU] Fix global isel for kernels using agprs on gfx90a With Global ISel getReservedRegs() is called before the function is regbank selected for the first time. Defer caching of usesAGPRs() in this case. Differential Revision: https://reviews.llvm.org/D112644	2021-10-29 14:23:14 -07:00
Sam Clegg	3b039c68f2	Revert "[WebAssembly] Fix debug locations for ExplicitLocals pass" This reverts commit `a66451ebbe`. This caused a failure when integrated with emscripten: https://ci.chromium.org/ui/p/emscripten-releases/builders/try/linux/b8832019855439718609/overview	2021-10-29 13:34:18 -07:00
Nick Desaulniers	39e5dd113f	[SparcISelLowering] avoid emitting libcalls to __muloti4 and __mulodi4 These compiler-rt-only symbols aren't available in libgcc. Similar to D108842, D108844, and D108926. Fixes: pr/52043 Reviewed By: craig.topper, rengolin Differential Revision: https://reviews.llvm.org/D112750	2021-10-29 13:14:09 -07:00
Sanjay Patel	285b8abce4	[x86] limit vector increment fold to allow load folding The tests are based on the example from: https://llvm.org/PR52032 I suspect that it looks worse than it actually is. :) That is, llvm-mca says there's no uop/timing difference with the load folding and pcmpeq vs. broadcast on Haswell (and probably other targets). The load-folding definitely makes the code smaller, so it's good for that at least. So this requires carving a narrow hole in the transform to get just this case without changing others that look good as-is (in other words, the transform still seems good for most examples). Differential Revision: https://reviews.llvm.org/D112464	2021-10-29 15:48:35 -04:00
Sanjay Patel	837518d6a0	[x86] make mayFold* helpers visible to more files; NFC The first function is needed for D112464, but we might as well keep these together in case the others can be used someday.	2021-10-29 15:48:35 -04:00
Amara Emerson	5dd9e019dd	[AArch64][GlobalISel] Fix a crash in RBS due to a new regclass being added. rdar://84674985	2021-10-29 11:47:00 -07:00
Matt Morehouse	33cc0cfd46	[X86] Don't affect jump tables under +tagged-globals. `classifyLocalReference(nullptr)` is called to get the appropriate relocation type for jump tables. We should not use @GOTPCREL for this case. The new test cases trigger assertions without this patch. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D112832	2021-10-29 10:37:43 -07:00
Craig Topper	aefcd59895	[RISCV] Teach RISCVInsertVSETVLI::needVSETVLI to handle mask register instructions better. If the VL operand of a mask register instruction comes from an explicit vsetvli with a different VTYPE, we can still avoid needing a vsetvli as long as the SEW/LMUL ratio is the same and policy bits match. Differential Revision: https://reviews.llvm.org/D112762	2021-10-29 09:49:36 -07:00
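The SEW/LMUL condition in the commit above follows from the RVV definition VLMAX = LMUL * VLEN / SEW. Two VTYPE settings with the same SEW/LMUL ratio have the same VLMAX, so the vl derived from a given AVL is based on the same limit. This is a minimal Python sketch that assumes VLEN = 128 purely for illustration. `vlmax` is a hypothetical helper, not an LLVM API.

```python
from fractions import Fraction

def vlmax(vlen, sew, lmul):
    # VLMAX = LMUL * VLEN / SEW; LMUL may be fractional (e.g. Fraction(1, 2)).
    return int(Fraction(lmul) * vlen / sew)

VLEN = 128  # assumed vector register width in bits
# e8/m1, e16/m2 and e32/m4 all have SEW/LMUL == 8 ...
same_ratio = [vlmax(VLEN, 8, 1), vlmax(VLEN, 16, 2), vlmax(VLEN, 32, 4)]
# ... while e32/m1 has SEW/LMUL == 32, giving a different VLMAX.
different_ratio = vlmax(VLEN, 32, 1)
```

Mask register instructions operate on one bit per element, so matching VLMAX (and policy) is what matters, not the exact SEW and LMUL values.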
Simon Pilgrim	6102e5d56b	[CostModel][X86] Remove old TODO comment BMI (TZCNT) scalar handling was added at rGa2db388dce77c2f23f2009d7363a0b63bb54523c	2021-10-29 17:28:45 +01:00
Bradley Smith	86972f1114	[AArch64][SVE] Use TargetFrameIndex in more SVE load/store addressing modes Add support for generating TargetFrameIndex in complex patterns for indexed addressing modes in SVE. Additionally, add missing load/stores to getMemOpInfo and getLoadStoreImmIdx. Differential Revision: https://reviews.llvm.org/D112617	2021-10-29 14:44:16 +00:00
Jay Foad	1b758925ad	[IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction createReplacementInstr was a trivial wrapper around ConstantExpr::getAsInstruction, which also inserted the new instruction into a basic block. Implement this directly in getAsInstruction by adding an InsertBefore parameter and change all callers to use it. NFC. A follow-up patch will remove createReplacementInstr. Differential Revision: https://reviews.llvm.org/D112791	2021-10-29 15:02:58 +01:00
Jay Foad	21a1d4cf71	[AMDGPU] Change numBitsSigned for simplicity and document it. NFC. Change numBitsSigned to return the minimum size of a signed integer that can hold the value. This is different by one from the previous result but is more consistent with numBitsUnsigned. Update all callers. All callers are now more consistent between the signed and unsigned cases, and some callers get simpler, especially the ones that deal with quantities like numBitsSigned(LHS) + numBitsSigned(RHS). Differential Revision: https://reviews.llvm.org/D112813	2021-10-29 14:22:06 +01:00
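The new convention from the commit above can be modelled at the value level as the minimum width of a two's-complement integer that holds v. This is a minimal Python sketch. `num_bits_signed` and `num_bits_unsigned` are hypothetical value-based stand-ins, because the LLVM helpers reason about DAG values rather than concrete integers.

```python
def num_bits_signed(v):
    # Smallest n with -2**(n-1) <= v <= 2**(n-1) - 1.
    return (v if v >= 0 else ~v).bit_length() + 1

def num_bits_unsigned(v):
    # Smallest n with 0 <= v <= 2**n - 1 (v must be non-negative).
    assert v >= 0
    return v.bit_length()
```

With this definition, 127 and -128 both need 8 bits and 128 needs 9. The product of an n-bit and an m-bit signed value always fits in n + m bits, which is the kind of `numBitsSigned(LHS) + numBitsSigned(RHS)` reasoning the message mentions.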
Chen Zheng	7591d21032	[PowerPC] fix a miscompile for Solaris build	2021-10-29 12:06:25 +00:00
Bradley Smith	bf72a469ba	[AArch64][SVE] Fix build failure introduced in `13faa5f440`	2021-10-29 11:57:02 +00:00
Simon Pilgrim	154c036ebb	[X86] combineX86GatherScatter - only fold scale if the index isn't extended As mentioned on D108539, when the gather indices are smaller than the pointer size, they are sign-extended BEFORE scale is applied, making the general fold unsafe. If the index has sufficient sign-bits then folding the scale could be safe - I'll investigate this.	2021-10-29 11:48:05 +01:00
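The following minimal Python model of the address computation shows why folding the scale is unsafe for narrow indices. It assumes 32-bit indices that are sign-extended to 64 bits before scaling, as the message above describes. `sext32` and `gather_addr` are hypothetical helpers for illustration only.

```python
def sext32(x):
    # Interpret the low 32 bits of x as a signed integer.
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def gather_addr(base, index32, scale):
    # Element address: the index is sign-extended BEFORE the scale is applied.
    return base + sext32(index32) * scale

base, idx = 0x10000, 0x40000000
# Original: the index is doubled (shl by 1) in 32 bits, then scaled by 4.
unfolded = gather_addr(base, idx << 1, 4)
# Folded: the shl is moved into the scale immediate (4 * 2 = 8).
folded = gather_addr(base, idx, 8)
```

Here `idx << 1` wraps to a negative 32-bit value before sign extension, so the two addresses differ by 2**34. For small indices that do not wrap, the fold gives the same address.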
Bradley Smith	13faa5f440	[AArch64][SVE] Generate SVE >1 element structured load/stores from fixed types This adds support for SVE structured loads/stores to the relevant target hooks, such that we can support these instructions in the InterleavedAccess pass. Depends on D112078 Differential Revision: https://reviews.llvm.org/D112303	2021-10-29 09:35:57 +00:00
Cullen Rhodes	8686626244	[Sparc] NFC: Remove unused tblgen template args Identified in D109359. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D109712	2021-10-29 09:16:15 +00:00
Vang Thao	52b43d1549	[AMDGPU] Fix cvt_f32_ubyte combine with shl Shift node is still needed to check if the shift is shr or shl to increment/decrement offset. Do not override the node. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112733	2021-10-28 21:43:06 -07:00
Kazu Hirata	01b4789b62	[AMDGPU] Remove hasDefinedInitializer (NFC) The last use was removed on Sep 16, 2021 in commit `7a62a5b56d`.	2021-10-28 20:33:34 -07:00
Kazu Hirata	dd5d46b009	[AMDGPU] Remove unused BBSelectRegister in AMDGPUMachineCFGStructurizer (NFC) This field seems to be unused for at least one year.	2021-10-28 20:33:32 -07:00
Kazu Hirata	309357c01a	[AMDGPU] Remove unused declaration eliminateDeadBranchOperands (NFC)	2021-10-28 20:33:30 -07:00
Thomas Lively	fb67f3d969	[WebAssembly] Add prototype relaxed float to int trunc instructions Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u, i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero. These are only exposed as builtins, and require user opt-in. Differential Revision: https://reviews.llvm.org/D112186	2021-10-28 14:01:53 -07:00
Daniel Kiss	d8075e8781	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869`. This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies that __cxa_end_cleanup is to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 21:45:09 +02:00
Wouter van Oortmerssen	a66451ebbe	[WebAssembly] Fix debug locations for ExplicitLocals pass Differential Revision: https://reviews.llvm.org/D112487	2021-10-28 12:35:46 -07:00
Ahmed Bougacha	bef777206e	[AArch64] Rename some timm predicates for consistency. NFC. timm isn't the common case, and TImmLeafs should make it clear what they are. We're adding a plain ImmLeaf for 0_65535, so rename i64_imm0_65535 to timm64_0_65535, and imm32_0_7 to timm32_0_7.	2021-10-28 11:41:29 -07:00
Matthias Braun	e2c7ee0743	X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr This extends `optimizeCompareInstr` to re-use previous comparison results if the previous comparison was with an immediate that was 1 bigger or smaller. Example: `CMP x, 13` ... `CMP x, 12` (can be removed if we change the SETg) `SETg ...` (x > 12, changed to `SETge`, i.e. x >= 13, removing the CMP). Motivation: This often happens because SelectionDAG canonicalization tends to add/subtract 1 when optimizing for fallthrough blocks. For example, for `x > C` the fallthrough optimization switches true/false blocks with `!(x > C)` --> `x <= C`, and canonicalization turns this into `x < C + 1`. Differential Revision: https://reviews.llvm.org/D110867	2021-10-28 10:33:56 -07:00
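This minimal Python sketch shows the condition-code adjustment from the commit above for signed comparisons only. It returns a condition that gives the same answer when evaluated on the flags of a previous `CMP x, prev_imm`. `PREDICATES` and `reuse_condition` are hypothetical names. The real pass also handles other cases, such as unsigned conditions, which are not modelled here.

```python
# Signed conditions after "CMP x, imm" (x86 SETcc suffixes).
PREDICATES = {
    "g": lambda x, c: x > c,
    "ge": lambda x, c: x >= c,
    "l": lambda x, c: x < c,
    "le": lambda x, c: x <= c,
}

def reuse_condition(cond, cur_imm, prev_imm):
    """Return a condition on `CMP x, prev_imm` equivalent to `cond` on
    `CMP x, cur_imm`, or None. Python ints do not wrap; real code must also
    ensure cur_imm +/- 1 does not overflow the immediate's width."""
    if prev_imm == cur_imm + 1:
        # x > C  <=> x >= C + 1 ;  x <= C <=> x < C + 1
        return {"g": "ge", "le": "l"}.get(cond)
    if prev_imm == cur_imm - 1:
        # x >= C <=> x > C - 1 ;  x < C  <=> x <= C - 1
        return {"ge": "g", "l": "le"}.get(cond)
    return None
```

For the example in the message, `reuse_condition("g", 12, 13)` returns `"ge"`, so the second CMP can be dropped and SETg becomes SETge.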
Matthias Braun	97a1570d8c	X86InstrInfo: Optimize more combinations of SUB+CMP `X86InstrInfo::optimizeCompareInstr` would only optimize a `SUB` followed by a `CMP` in `isRedundantFlagInstr`. This extends the code to also look for other combinations like `CMP`+`CMP`, `TEST`+`TEST`, `SUB x,0`+`TEST`. - Change `isRedundantFlagInstr` to run `analyzeCompareInstr` on the candidate instruction and compare the results. This normalizes things and gives consistent results for various comparisons (`CMP x, y`, `SUB x, y`) and immediate cases (`TEST x, x`, `SUB x, 0`, `CMP x, 0`...). - Turn `isRedundantFlagInstr` into a member function so it can call `analyzeCompare`. - We now also check `isRedundantFlagInstr` even if `IsCmpZero` is true, since we now have cases like `TEST`+`TEST`. Differential Revision: https://reviews.llvm.org/D110865	2021-10-28 10:33:56 -07:00
Daniel Kiss	66e03db814	Revert "Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."" This reverts commit `b6420e575f`.	2021-10-28 17:24:53 +02:00
Daniel Kiss	b6420e575f	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869`. This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies that __cxa_end_cleanup is to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 16:49:19 +02:00
Simon Pilgrim	d29ccbecd0	[X86][AVX] Attempt to fold a scaled index into a gather/scatter scale immediate (PR13310) If the index operand for a gather/scatter intrinsic is being scaled (self-addition or a shl-by-immediate) then we may be able to fold that scaling into the intrinsic scale immediate value instead. Fixes PR13310. Differential Revision: https://reviews.llvm.org/D108539	2021-10-28 14:07:17 +01:00
Abinav Puthan Purayil	2da6ef3664	[AMDGPU] Add 24-bit mulhi intrinsics in INTRINSIC_WO_CHAIN combine. mul24 intrinsic's operands are simplified by AMDGPUTargetLowering::performIntrinsicWOChainCombine(). This change adds the mul24hi intrinsics in the combine since its operands can be simplified like that of the mul24 intrinsics. Differential Revision: https://reviews.llvm.org/D112702	2021-10-28 16:57:48 +05:30
Sebastian Neubauer	fd1cfc9094	[AMDGPU][GlobalISel] Fix waterfall loops - Move the `s_and exec` to its correct position before the content of the waterfall loop - Use the SI_WATERFALL pseudo instruction, like for sdag, to benefit from optimizations - Add support for indirect function calls To support indirect calls, add a G_SI_CALL instruction without register class restrictions and insert a waterfall loop when applying register banks. Differential Revision: https://reviews.llvm.org/D109052	2021-10-28 10:30:55 +02:00
Caroline Concatto	2186b011e9	[Driver][AArch64] Add driver support for neoverse-512tvb target The support for neoverse-512tvb mirrors the same option available in GCC[1]. There is no functional effect for this option yet. This patch ensures the driver accepts "-mcpu=neoverse-512tvb", and enough plumbing is in place to allow the new option to be used in the future. [1]https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html Differential Revision: https://reviews.llvm.org/D112406	2021-10-28 09:08:40 +01:00
Hsiangkai Wang	7051f73d69	[RISCV] Sync Zvlsseg register order with vector registers. Sync the order of Zvlsseg registers with vector registers to avoid unnecessary register copies between vector instructions and zvlsseg instructions. Differential Revision: https://reviews.llvm.org/D110250	2021-10-28 13:34:53 +08:00
Kazu Hirata	cee3419d65	[AMDGPU] Remove unused declaration findNumUsedRegistersSI (NFC)	2021-10-27 21:24:02 -07:00
Phoebe Wang	2bc28c6f82	[X86] Add a dependency breaking xor before any gathers with an undef passthru value. In the instruction encoding, the passthru register is always tied to the destination register. The CPU scheduler has to wait for the last writer of this register to finish executing before the gather can start. This is true even if the initial mask is all ones so that the passthru will never be used. By explicitly zeroing the register we can break the false dependency. The zero idiom is executed completely by the register renamer and so is immediately considered ready. Authored by Craig. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D112505	2021-10-28 11:44:52 +08:00
Hsiangkai Wang	0a9b82960c	[RISCV] Use vmv.v.[v|i] if we know COPY is under the same vl and vtype. If we know the source operand of COPY is defined by a vector instruction with tail agnostic and the same LMUL and there is no vsetvli between COPY and the define instruction to change the vl and vtype, we could use vmv.v.v or vmv.v.i to copy vector registers to get better performance than the whole vector register move instructions. If the source of COPY is from vmv.v.i, we could use vmv.v.i for the COPY. This patch only considers all these instructions within one basic block. Case 1: ``` bb.0: ... VSETVLI # The first VSETVLI before COPY and VOP. ... # Use this VSETVLI to check LMUL and tail agnostic. ... vy = VOP va, vb # Define vy. ... # There is no vsetvli between VOP and COPY. vx = COPY vy ``` Case 2: ``` bb.0: ... VSETVLI # The first VSETVLI before VOP. ... # Use this VSETVLI to check LMUL and tail agnostic. ... vy = VOP va, vb # Define vy. ... # There is no vsetvli to change vl between VOP and COPY. ... VSETVLI # The first VSETVLI before COPY. ... # This VSETVLI does not change vl and vtype. ... vx = COPY vy ``` Co-Authored-by: Zakk Chen <zakk.chen@sifive.com> Co-Authored-by: Kito Cheng <kito.cheng@sifive.com> Differential Revision: https://reviews.llvm.org/D103510	2021-10-28 11:39:04 +08:00
Craig Topper	1387483e72	[RISCV] Replace most uses of RISCVSubtarget::hasStdExtV. NFCI Add new hasVInstructions() which is currently equivalent. Replace vector uses of hasStdExtZfh/F/D with new vector specific versions. The vector spec no longer requires that the vectors implement the same types as scalar. It only requires that the scalar type is the maximum size the vectors can support. This is currently implemented using the scalar rule we were using before. Add new hasVInstructionsI64() and begin using it to qualify code that requires i64 vector elements. This is all NFC for now, but we can start using this to better implement D112408 which introduces the Zve extensions. Reviewed By: frasercrmck, eopXD Differential Revision: https://reviews.llvm.org/D112496	2021-10-27 19:33:48 -07:00
Ard Biesheuvel	d7e089f2d6	[ARM] Use hardware TLS register in Thumb2 mode when -mtp=cp15 is passed In ARM mode, passing -mtp=cp15 forces the use of an inline MRC system register read to move the thread pointer value into a register. Currently, in Thumb2 mode, -mtp=cp15 is ignored, and a call to the __aeabi_read_tp helper is emitted instead. This is inconsistent, and breaks the Linux/ARM build for Thumb2 targets, as the Linux kernel does not provide an implementation of __aeabi_read_tp. Reviewed By: nickdesaulniers, peter.smith Differential Revision: https://reviews.llvm.org/D112600	2021-10-27 16:42:11 -07:00
Michael Liao	e6a4ba3aa6	[amdgpu] Handle the case where there is no scavenged register. - When an unconditional branch is expanded into an indirect branch, if there is no scavenged register, an SGPR pair needs spilling to enable the destination PC calculation. In addition, before jumping into the destination, that clobbered SGPR pair needs restoring. - As SGPR cannot be spilled to or restored from memory directly, the spilling/restoring of that SGPR pair reuses the regular SGPR spilling support but without spilling it into memory. As that spilling and restoring points are fully controlled, we only need to spill that SGPR into the temporary VGPR, which needs spilling into its emergency slot. - The target-specific hook is revised to take additional restore block, where the restoring code is filled. After that, the relaxation will place that restore block directly before the destination block and insert an unconditional branch in any fall-through block into the destination block. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106449	2021-10-27 18:37:27 -04:00
Kazu Hirata	593451bd3c	[X86] Remove getSETOpc (NFC) This function seems to be unused for at least one year.	2021-10-27 09:22:31 -07:00
Kazu Hirata	e6b6190ead	[X86] Remove NeedsRetpoline in X86AsmPrinter (NFC) This field seems to be unused for at least one year.	2021-10-27 09:22:29 -07:00
Kazu Hirata	cc73310a81	[X86] Remove CallOperand in X86Operand (NFC) This field seems to be unused for at least one year.	2021-10-27 09:22:27 -07:00
Daniel Kiss	894ddba1c9	Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This reverts commit `da1d1a0869`.	2021-10-27 14:29:35 +02:00
Sanjay Patel	6c0a2c2804	[x86] enhance mayFoldLoad to check alignment As noted in D112464, a pre-AVX target may not be able to fold an under-aligned vector load into another op, so we shouldn't report that as a load folding candidate. I only found one caller where this would make a difference -- combineCommutableSHUFP() -- so that's where I added a test to show the (minor) regression. Differential Revision: https://reviews.llvm.org/D112545	2021-10-27 07:54:25 -04:00
Matt	fc28a2f8ce	[AArch64][SVE] Combine predicated FMUL/FADD into FMA Combine FADD and FMUL intrinsics into FMA when the result of the FMUL is an FADD operand with only one use and both use the same predicate. Differential Revision: https://reviews.llvm.org/D111638	2021-10-27 11:41:23 +00:00
Alexandros Lamprineas	8689f5e6e7	[AArch64] Add support for the 'R' architecture profile. This change introduces subtarget features to predicate certain instructions and system registers that are available only on 'A' profile targets. Those features are not present when targeting a generic CPU, which is the default processor. In other words the generic CPU now means the intersection of 'A' and 'R' profiles. To maintain backwards compatibility we enable the features that correspond to -march=armv8-a when the architecture is not explicitly specified on the command line. References: https://developer.arm.com/documentation/ddi0600/latest Differential Revision: https://reviews.llvm.org/D110065	2021-10-27 12:32:30 +01:00
Daniel Kiss	da1d1a0869	[ARM] __cxa_end_cleanup should be called instead of _UnwindResume. ARM EHABI[1] specifies that __cxa_end_cleanup is to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-27 10:40:00 +02:00
Kazu Hirata	6af3e87d2d	[Hexagon] Remove set-but-unused variables (NFC)	2021-10-26 23:38:15 -07:00
Phoebe Wang	eb55c1f153	[X86][NFC] Add the missed `break;` for `79f9dfef0d`	2021-10-27 13:58:31 +08:00
Craig Topper	2783a5cfaf	[RISCV] Add ICmp and FCmp to shouldSinkOperands.	2021-10-26 22:23:54 -07:00
Ben Shi	97e52e1c35	[RISCV] Optimize immediate materialisation with SLLI.UW in the Zba extension Simplify "LUI+SLLI+ADDI+SLLI" and "LUI+ADDIW+SLLI+ADDI+SLLI" to "LUI+ADDIW+SLLIUW" to reduce the total instruction count. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111933	2021-10-27 02:48:38 +00:00
Austin Kerbow	02e60f2e77	[AMDGPU] Use max waves for scheduler's initial occupancy target The scheduler should set critical/excess register usage thresholds that are guided by the maximum possible occupancy for the function. This change is focused on setting proper lower bounds on register usage which we would typically only see when a specific number of maximum waves is requested with the "waves-per-eu" attribute, or by setting "amdgpu-num-vgpr|sgpr" directly. This was broken previously. I have a follow-on patch that will address issues with the scheduler not targeting correct upper bounds on register usage which is typical with launch bounds and min "waves-per-eu". Changes by this patch: Set the initial critical register usage thresholds to minimum values that are determined by the maximum possible occupancy for the function, or the number of allocatable registers, whichever is lower. Avoid unsigned overflow if register limits are lower than the register tracking "ErrorMargin", i.e. when using stress-regalloc=2. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112373	2021-10-26 15:30:26 -07:00
Fangrui Song	226465efe3	[ARC] Fix `undefined symbol: llvm::MachineFunction::dump() const`	2021-10-26 11:44:18 -07:00
Kazu Hirata	c3e698e2f5	[CodeGen, Hexagon] Use MachineBasicBlock::phis (NFC)	2021-10-26 09:01:29 -07:00
Jonas Paulsson	bb506938be	[SystemZ] Improvement of emitMemMemWrapper() It was discovered that an extra register COPY remained when expanding a (variable length) memory operation with a loop and there was another use of the involved address register(s) afterwards. A simple fix for this is to COPY the address registers before the loop and use that new vreg instead. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D112065	2021-10-26 17:03:01 +02:00
Neubauer, Sebastian	eb16570ab0	[AMDGPU] Remove unused CSR defs CSR_AMDGPU_VGPRs_24_255 and CSR_AMDGPU_VGPRs_32_255 are not used anywhere, so remove them. Differential Revision: https://reviews.llvm.org/D112535	2021-10-26 16:01:49 +02:00
Abinav Puthan Purayil	61e3b9fefe	[AMDGPU] Add constrained shift pattern matches. The motivation for this is due to clang's conformance to https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift which makes clang emit (<shift> a, (and b, <width> - 1)) for `a <shift> b` in OpenCL where a is an int of bit width <width>. Differential revision: https://reviews.llvm.org/D110231	2021-10-26 19:07:19 +05:30
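The OpenCL rule in the commit above can be sketched in Python. The shift amount is reduced modulo the bit width with an AND. A shift instruction that already ignores the high bits of its amount makes that AND redundant, which is what matching the constrained pattern can exploit. This sketch assumes that the hardware shift reads only the low log2(width) bits of the amount. `opencl_shl` and `hw_shl` are hypothetical names.

```python
def opencl_shl(a, b, width=32):
    # OpenCL C: the shift amount is taken modulo the bit width of a,
    # which clang expresses as (shl a, (and b, width - 1)).
    mask = (1 << width) - 1
    return (a << (b & (width - 1))) & mask

def hw_shl(a, b, width=32):
    # Assumed hardware behaviour: only the low log2(width) bits of b are used.
    mask = (1 << width) - 1
    return (a << (b % width)) & mask
```

Because `width` is a power of two, `b & (width - 1) == b % width` for non-negative `b`, so `hw_shl(a, b & 31)` and `hw_shl(a, b)` agree and the AND can be dropped.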
Chen Zheng	631f44f338	[PowerPC] use right extend type for SCEV Fix an issue caused by D108750 Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D112502	2021-10-26 13:32:03 +00:00
Abinav Puthan Purayil	781dd39b7b	[AMDGPU] Enable 48-bit mul in AMDGPUCodeGenPrepare. We were bailing out of creating 24-bit muls for results wider than 32 bits in AMDGPUCodeGenPrepare. With the 24-bit mulhi intrinsic, this change teaches AMDGPUCodeGenPrepare to generate the 48-bit mul correctly. Differential Revision: https://reviews.llvm.org/D112395	2021-10-26 18:53:07 +05:30
Abinav Puthan Purayil	9bd5cfeb1f	[AMDGPU] Implement llvm.amdgcn.mulhi.[i,u]24 intrinsics. These intrinsics map to the 24-bit v_mul_hi instructions. This change also fixes an incorrect assumption on the associativity of 24-bit mulhi in its SDNode record in tblgen. Differential Revision: https://reviews.llvm.org/D112394	2021-10-26 18:53:07 +05:30
Sanjay Patel	2ab0148c14	[x86] use cast instead of dyn_cast for unchecked usage; NFC This was noted as an independent clean-up in D112464.	2021-10-26 08:20:19 -04:00
Neubauer, Sebastian	487f15603e	[AMDGPU] Fix setcc combine for i128 The combine asserted if constants could not be represented as uint64_t. Use APInts to fix this. Differential Revision: https://reviews.llvm.org/D112416	2021-10-26 13:39:50 +02:00
Jay Foad	c8e5aef1a0	[AMDGPU] Use standard MachineBasicBlock::getFallThrough method. NFCI. Differential Revision: https://reviews.llvm.org/D101825	2021-10-26 12:07:54 +01:00
Jonas Paulsson	9f8872779a	[SystemZ] Provide size values for PATCHPOINT, STACKMAP and FENTRY_CALL. All instructions must have a correct size value close to emission when SystemZLongBranch runs, or a necessary branch relaxation may be missed. This patch also adds an assert for instruction sizes in SystemZLongBranch. Review: Ulrich Weigand	2021-10-26 12:07:22 +02:00
Phoebe Wang	79f9dfef0d	[X86] Move splat addends from the gather/scatter index operand to the base address This can avoid a vector add and a constant pool load. Or an explicit broadcast in case of non-constant. Also reverse the transform any time we encounter a constant index addend that can't be moved to base. In that case pull the constant from base into the index. This reduces code size needed for the displacement since we needed the index add anyway. Limit this to scale of 1 to avoid divisibility and wrap issues. Authored by Craig. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111595	2021-10-26 12:35:57 +08:00
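The following sketch shows why the rewrite is sound for a scale of 1. Adding a splat constant to every index lane produces the same addresses as adding it once to the scalar base. The lane count, the 64-bit index width, and the function names are assumptions made for illustration. The model uses wrapping unsigned arithmetic, so it does not capture the narrower-index wrap or scale-divisibility cases that the commit avoids by restricting the transform to scale 1.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;  // assumed vector width
using Vec = std::array<int64_t, kLanes>;
using Addrs = std::array<uint64_t, kLanes>;

// Original form: the splat addend c is added to every index lane
// (a vector add), then each lane is added to the base with scale 1.
Addrs addrs_index_form(uint64_t base, const Vec& idx, int64_t c) {
  Addrs out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = base + (static_cast<uint64_t>(idx[i]) + static_cast<uint64_t>(c));
  return out;
}

// Rewritten form: c is folded once into the scalar base address.
Addrs addrs_base_form(uint64_t base, const Vec& idx, int64_t c) {
  uint64_t new_base = base + static_cast<uint64_t>(c);
  Addrs out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = new_base + static_cast<uint64_t>(idx[i]);
  return out;
}
```

With a scale other than 1, the constant would have to be multiplied by the scale when it is moved to the base, and moving a base constant back into the index would require it to be divisible by the scale.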
Zarko Todorovski	e9163660b1	[PPC][LLVM] Inclusive terms: remove references to sanity check in lib/Target/PowerPC Removed references to `sanity check` in `PPCBranchCoalescing.cpp` code comments. No word substitution made in this case, as the comments and code following illustrated are sufficient IMO. Reviewed By: quinnp Differential Revision: https://reviews.llvm.org/D112452	2021-10-25 18:13:54 -04:00
Wouter van Oortmerssen	5694dbccc3	[WebAssembly] support Memory64 in target_features section Differential Revision: https://reviews.llvm.org/D112266	2021-10-25 09:31:45 -07:00
Craig Topper	e2b7aabb57	[RISCV] Reduce the number of RISCV vector builtins by an order of magnitude. All but 2 of the vector builtins are only used by clang_builtin_alias. When using clang_builtin_alias, the type string of the builtin is never checked. Only the types in the function definition used for the alias are checked. This patch takes advantage of this to share a single builtin for many different types. We already used type overloads on the IR intrinsic so the codegen for the builtins that are being merged were already the same. This extends the type overloading to the builtins. I had to make a few tweaks to make this work. -Floating point vector-vector vmerge now uses the vmerge intrinsic instead of the vfmerge intrinsic. New isel patterns and tests are added to support this. -The SemaChecking for the immediate of vset_v/vget_v has been removed. Determining the valid range is harder now. I've added masking to ManualCodegen to ensure valid IR for invalid input. This reduces the number of builtins from ~25000 to ~1100. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D112102	2021-10-25 09:03:59 -07:00
Craig Topper	210b586a85	[RISCV] Add vcsr CSR name for V extension. Reviewed By: frasercrmck, kito-cheng Differential Revision: https://reviews.llvm.org/D112342	2021-10-25 08:56:25 -07:00
Danila Malyutin	2d9ee590b6	[AArch64] Handle ST1iN instructions in isAArch64FrameOffsetLegal Before the code would crash with "unhandled opcode in isAArch64FrameOffsetLegal" when there was a spill from extractelement. Fixes pr52249 Differential Revision: https://reviews.llvm.org/D112311	2021-10-25 17:05:12 +03:00
Kerry McLaughlin	1f49b71fe5	[SVE][CodeGen] Enable reciprocal estimates for scalable fdiv/fsqrt This patch enables the use of reciprocal estimates for SVE when both the -Ofast and -mrecip flags are used. Reviewed By: david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D111657	2021-10-25 11:30:44 +01:00
Jingu Kang	a502436259	[AArch64] Remove redundant ORRWrs which is generated by zero-extend %3:gpr32 = ORRWrs $wzr, %2, 0 %4:gpr64 = SUBREG_TO_REG 0, %3, %subreg.sub_32 If AArch64's 32-bit form of instruction defines the source operand of ORRWrs, we can remove the ORRWrs because the upper 32 bits of the source operand are set to zero. Differential Revision: https://reviews.llvm.org/D110841	2021-10-25 09:47:07 +01:00
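Below is a small model of the property this commit relies on. On AArch64, writing a W register zero-extends the 32-bit result into the full X register, so a following `ORRWrs $wzr, src, 0` (a 32-bit register move) cannot change a value that a 32-bit instruction just produced. The helper names are hypothetical and model values, not machine instructions.

```cpp
#include <cstdint>

// Writing a W register: the 32-bit result is zero-extended to 64 bits.
uint64_t write_w(uint32_t result32) { return static_cast<uint64_t>(result32); }

// A 32-bit ADD (W form): operates on the low halves; the result is
// zero-extended into the X register.
uint64_t add_w(uint64_t xn, uint64_t xm) {
  return write_w(static_cast<uint32_t>(xn) + static_cast<uint32_t>(xm));
}

// "ORRWrs $wzr, src, 0" acts as a 32-bit move: keep the low 32 bits of src
// and zero-extend them.
uint64_t orr_w_zr(uint64_t x_src) {
  return write_w(static_cast<uint32_t>(x_src));
}
```

Because `add_w` already clears the upper 32 bits, `orr_w_zr(add_w(a, b)) == add_w(a, b)` for all inputs. When the source may have nonzero upper bits, the move is not a no-op.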
Chen Zheng	80e6aff6bb	[PowerPC] common chains to reuse offsets to reduce register pressure. Add a new preparation pattern in PPCLoopInstFormPrep pass to reduce register pressure. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D108750	2021-10-25 03:27:16 +00:00
Kazu Hirata	9800731367	[Target, Transforms] Use predecessors instead of pred_begin and pred_end (NFC)	2021-10-24 17:35:35 -07:00
Kazu Hirata	4bd46501c3	Use llvm::any_of and llvm::none_of (NFC)	2021-10-24 17:35:33 -07:00
Matthias Braun	4b75d674f8	X86InstrInfo: Look across basic blocks in optimizeCompareInstr This extends `optimizeCompareInstr` to continue the backwards search when it reached the beginning of a basic block. If there is a single predecessor block then we can just continue the search in that block and mark the EFLAGS register as live-in. Differential Revision: https://reviews.llvm.org/D110862	2021-10-24 16:22:45 -07:00
Matthias Braun	683994c863	X86InstrInfo: Refactor and cleanup optimizeCompareInstr This changes the first part of `optimizeCompareInstr` being split into a loop with a forward scan for cases that re-use zero flags from a producer in case of compare with zero and a backward scan for finding an instruction equivalent to a compare. The code now uses a single backward scan searching for the next instructions that reads or writes EFLAGS. Also: - Add comments giving examples for the 3 cases handled. - Check `MI` which contains the result of the zero-compare cases, instead of re-checking `IsCmpZero`. - Tweak coding style in some loops. - Add new MIR based tests that test the optimization in isolation. This also removes a check for flag readers in situations like this: ``` = SUB32rr %0, %1, implicit-def $eflags ... we no longer stop when there are $eflag users here CMP32rr %0, %1 ; will be removed ... ``` Differential Revision: https://reviews.llvm.org/D110857	2021-10-24 16:22:45 -07:00
Fangrui Song	54405a49d8	[ARC] Fix -Wunused-variable. NFC	2021-10-24 10:31:44 -07:00
Simon Pilgrim	b09f2ee57c	[X86] findEltLoadSrc - fix shift amount variable name. NFCI. Fix the copy + paste, renaming shift amt from Idx to Amt	2021-10-23 21:24:37 +01:00
Kazu Hirata	d8e4170b0a	Ensure newlines at the end of files (NFC)	2021-10-23 08:45:29 -07:00
Jessica Clarke	2d8c18fbbd	[X86] Don't add implicit REP prefix to VIA PadLock xstore Commit `8fa3e8fa14` added an implicit REP prefix to all VIA PadLock instructions, but GNU as doesn't add one to xstore, only all the others. This resulted in a kernel panic regression in FreeBSD upon updating to LLVM 11 (https://bugs.freebsd.org/259218) which includes the commit in question. This partially reverts that commit. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D112355	2021-10-23 01:57:17 +01:00
Matt Arsenault	ec57b37551	AMDGPU: Use attributor to propagate amdgpu-flat-work-group-size This can merge the acceptable ranges based on the call graph, rather than the simple application of the attribute. Remove the handling from the old pass.	2021-10-22 16:23:50 -04:00
Matt Arsenault	8d4b74ac3f	AMDGPU: Don't consider whether amdgpu-flat-work-group-size was set It should be semantically identical if it was set to the same value as the default. Also improve the documentation.	2021-10-22 16:23:50 -04:00
Craig Topper	cd824f9e39	[X86] Fix bad formatting. NFC	2021-10-22 13:16:35 -07:00
Jay Foad	58e7ec471c	[AMDGPU] Run SIShrinkInstructions before post-RA scheduling Run post-RA SIShrinkInstructions just before post-RA scheduling, instead of afterwards. After the fixes in D112305 and D112317 this seems to make no difference, but it paves the way for scheduler tweaks that are sensitive to the e32 vs e64 encoding of VALU instructions. Differential Revision: https://reviews.llvm.org/D112341	2021-10-22 20:24:03 +01:00
Jay Foad	3f34f75a68	[AMDGPU] Fix latency for implicit vcc_lo operands on GFX10 wave32 As described in the comment, the way we change vcc to vcc_lo in these operands confuses addPhysRegDataDeps into treating them as implicit pseudo operands. Fix this by setting the correct latency from the SchedModel after addPhysRegDataDeps wrongly set it to 0. Differential Revision: https://reviews.llvm.org/D112317	2021-10-22 20:03:29 +01:00
Craig Topper	04c184bba7	[TargetLowering] Simplify the interface of expandABS. NFC Instead of returning a bool to indicate success and a separate SDValue, return the SDValue and have the callers check if it is null. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112331	2021-10-22 10:22:23 -07:00
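The shape of this interface change can be sketched with a stand-in for `SDValue`, a handle that can be null and is commonly tested with LLVM's `if (SDValue V = ...)` idiom. `Value`, `expandFooOld`, and `expandFooNew` are hypothetical names. The sketch shows the pattern of the refactor, not the real `expandABS` signature.

```cpp
// Hypothetical nullable handle: a default-constructed Value is "null" and
// converts to false in a boolean context.
struct Value {
  const char* node = nullptr;
  explicit operator bool() const { return node != nullptr; }
};

// Old shape: a bool reports success, and the result goes to an out-parameter.
bool expandFooOld(bool can_expand, Value& result) {
  if (!can_expand)
    return false;
  result = Value{"expanded"};
  return true;
}

// New shape: return the value directly; a null Value signals failure.
Value expandFooNew(bool can_expand) {
  if (!can_expand)
    return Value{};
  return Value{"expanded"};
}
```

Folding the success flag into the returned value removes the out-parameter and lets a caller write `if (Value V = expandFooNew(...)) return V;`.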
Kazu Hirata	6fe949c4ed	[Target, Transforms] Use StringRef::contains (NFC)	2021-10-22 08:52:33 -07:00
Jonas Paulsson	12b44bf5ee	[SystemZ] Give the EXRL_Pseudo a size value of 6 bytes. This pseudo is expanded very late (AsmPrinter) and therefore has to have a correct size value, or the branch relaxation pass may make a wrong decision. Review: Ulrich Weigand	2021-10-22 17:38:51 +02:00

64968 Commits