llvm-project

Commit Graph

Author	SHA1	Message	Date
Kazushi (Jam) Marukawa	9c82944b2d	[VE] Add vector control instructions Add LVL/SVL/SMVL/LVIX instructions. Add regression tests too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90355	2020-10-29 19:24:31 +09:00
Ben Shi	076a8d915b	[NFC][AVR] Improve device list Reviewed By: dylanmckay https://reviews.llvm.org/D87968	2020-10-29 10:54:17 +08:00
Kazushi (Jam) Marukawa	7942960199	[VE] Add vector mask operation instructions Add VFMK/VFMS/VFMF/ANDM/ORM/XORM/EQVM/NNDM/NEGM/PCVM/LZVM/TOVM instructions. Add regression tests too. Also add new patterns to parse VFMK/VFMS/VFMF mnemonics. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90297	2020-10-29 08:42:41 +09:00
Austin Kerbow	de51867343	[AMDGPU] Add Reset function to GCNHazardRecognizer Reset the tracked emitted instructions when starting scheduling on a new region. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90347	2020-10-28 16:32:32 -07:00
Jay Foad	5b91a6a88b	[AMDGPU] Allow some modifiers on VOP3B instructions V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src modifier, but they can still use NEG and the usual output modifiers. This partially reverts `3b99f12a4e` "AMDGPU: Remove modifiers from v_div_scale_*". Differential Revision: https://reviews.llvm.org/D90296	2020-10-28 21:54:14 +00:00
Jay Foad	50ee22d791	[AMDGPU] Fix double space in disassembly of SDWA instructions with vcc Differential Revision: https://reviews.llvm.org/D90317	2020-10-28 21:39:39 +00:00
Florian Hahn	772aaa6023	[AArch64] Improve lowering of insert_vector_elt with 0.0 consts. When moving +0.0 into a float vector, we can use to vi*gpr variants of INS. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D90176	2020-10-28 21:35:33 +00:00
Austin Kerbow	8b127a8661	[AMDGPU] Fix inserting combined s_nop in bundles Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90334	2020-10-28 14:34:04 -07:00
Florian Hahn	ba78cae20f	[AArch64] Use DUP for BUILD_VECTOR with few different elements. If most elements of BUILD_VECTOR are the same, with a few different elements, it is better to use DUP for the common elements and INSERT_VECTOR_ELT for the different elements. Currently this transform is guarded quite restrictively to only trigger in clearly beneficial cases. With D90176, the lowering for patterns originating from code like ` float32x4_t y = {a,a,a,0};` (common in 3D apps) are lowered even better (unnecessary fmov is removed). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D90233	2020-10-28 19:48:20 +00:00
Sanjay Patel	7c395f31a6	[CostModel][x86] remove cost-kind predicate for intrinsic costs We model cost as number of instructions / uops, so it does not make sense to treat size/blended costs any differently than throughput.	2020-10-28 14:33:37 -04:00
Thomas Lively	31e944556f	[WebAssembly] Prototype extending multiplication SIMD instructions As proposed in https://github.com/WebAssembly/simd/pull/376. This commit implements new builtin functions and intrinsics for these instructions, but does not yet add them to wasm_simd128.h because they have not yet been merged to the proposal. These are the first instructions with opcodes greater than 0xff, so this commit updates the MC layer and disassembler to handle that correctly. Differential Revision: https://reviews.llvm.org/D90253	2020-10-28 09:38:59 -07:00
Paul C. Anagnostopoulos	9d72065cf6	[TableGen] [AMDGPU] Add !sub operator for subtraction Use it in the AMDGPU target to eliminate !add(value1, !mul(value2, -1)) Differential Revision: https://reviews.llvm.org/D90107	2020-10-28 12:27:53 -04:00
Jay Foad	9e634bc22f	[AMDGPU] Omit needless string concatenations. NFC.	2020-10-28 12:56:52 +00:00
Kazushi (Jam) Marukawa	cbdee7df06	[VE] Add vector merger operation instructions Add VMRG/VSHF/VCP/VEX instructions. Add regression tests too. Also add new patterns to parse new UImm4 operand. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90292	2020-10-28 19:57:10 +09:00
Kazushi (Jam) Marukawa	7ce2b93cbe	[VE] Add vector iterative operation instructions Add VFIA/VFIS/VFIM/VFIAM/VFISM/VFIMA/VFIMS instructions. Add regression tests too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90252	2020-10-28 19:06:46 +09:00
Kazushi (Jam) Marukawa	15f6250bed	[VE][NFC] Fix typo in comment	2020-10-28 18:51:07 +09:00
Kazushi (Jam) Marukawa	b22e32a9c8	[VE] Specify to expand BRIND and BR_JT BRIND and BR_JT are not implemented yet, so expand them atm. Add regression tests too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90283	2020-10-28 18:50:20 +09:00
David Green	066737fdbc	[AArch64] Remove AArch64ISD::NOT, use vnot instead vnot (xor -1) should be equivalent to the AArch64 specific AArch64ISD::NOT node, but allow more folding thanks to all the target independent optimizations. Specifically this allows select(icmp ne, x, y) to become "cmeq; bsl y, x" as opposed to needing to convert the predicate with "cmeq; mvn; bsl x, y" Unfortunately there is a regression in a cmtst test, but the code it selected from was already non-canonical, with instcombine preferring to use an eq predicate instead. Plus the more common case of icmp ne is improved. Differential Revision: https://reviews.llvm.org/D90126	2020-10-28 08:15:37 +00:00
Carl Ritson	057934a6d7	[AMDGPU] Fix insert of SIPreAllocateWWMRegs in FastRegAlloc SIPreAllocateWWMRegs was being inserted after RegisterCoalescer but this pass does not exist during FastAlloc so pre-allocation pass was never being run. Insert pre-allocation after TwoAddressInstructionPass instead. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90236	2020-10-28 12:15:15 +09:00
Nemanja Ivanovic	5459d08795	[PowerPC] Fix single-use check and update chain users for ld-splat When converting a BUILD_VECTOR or VECTOR_SHUFFLE to a splatting load as of `1461fb6e78`, we inaccurately check for a single user of the load and neglect to update the users of the output chain of the original load. As a result, we can emit a new load when the original load is kept and the new load can be reordered after a dependent store. This patch fixes those two issues. Fixes https://bugs.llvm.org/show_bug.cgi?id=47891	2020-10-27 16:49:38 -05:00
Stanislav Mekhanoshin	78ae1f6c90	[AMDGPU] Change predicate for fma/fmac legacy I do not exactly like the use of a negative predicate to enable instructions' support. Change HasNoMadMacF32Insts with HasFmaLegacy32. Differential Revision: https://reviews.llvm.org/D90250	2020-10-27 12:03:52 -07:00
Victor Huang	2e1a737f46	[PowerPC][PCRelative] Turn on TLS support for PCRel by default Turn on TLS support for PCRel by default and update the test cases. Differential Revision: https://reviews.llvm.org/D88738 Reviewed by: stefanp, kamaub	2020-10-27 13:58:44 -05:00
Michael Liao	46c3d5cb05	[amdgpu] Add the late codegen preparation pass. Summary: - Teach that pass to widen naturally aligned but not DWORD aligned sub-DWORD loads. Reviewers: rampitec, arsenm Subscribers: Tags: #llvm Differential Revision: https://reviews.llvm.org/D80364	2020-10-27 14:07:59 -04:00
Kazushi (Jam) Marukawa	a65883a78a	[VE] Add vector reduction instructions Add VSUMS/VSUMX/VFSUM/VMAXS/VMAXX/VFMAX/VRAND/VROR/VRXOR instructions. Add regression tests too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90227	2020-10-28 02:33:21 +09:00
Michael Liao	0d092303b4	[amdgpu] Enable use of AA during codegen. - Add an internal option `-amdgpu-use-aa-in-codegen` to enable or disable this feature. By Default, it's enabled. Differential Revision: https://reviews.llvm.org/D89320	2020-10-27 09:46:23 -04:00
Benjamin Kramer	35f7cbf9df	[X86] Don't crash on CVTPS2PH with wide vector inputs.	2020-10-27 14:42:02 +01:00
Kazushi (Jam) Marukawa	c5fa6bae12	[VE] Add vector float instructions Add VFAD/VFSB/VFMP/VFDV/VFSQRT/VFCP/VFCM/VFMAD/VFMSB/VFNMAD/VFNMSB/ VRCP/VRSQRT/VRSQRTNEX/VFIX/VFIXX/VFLT/VFLTX/VCVS/VCVD instructions. Add regression tests too. Also add additional AsmParser for VFIX and VFIXX instructions to parse their mnemonic. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90166	2020-10-27 20:42:24 +09:00
Jay Foad	6539ebe97d	[AMDGPU] Use DPP instead of Ext in a couple of class names. NFC.	2020-10-27 10:22:30 +00:00
Craig Topper	f385823e04	[X86] Alternate implementation of D88194. This uses PreprocessISelDAG to replace the constant before instruction selection instead of matching opcodes after. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D89178	2020-10-27 00:20:03 -07:00
Wei Wang	d602e79a81	[X86] Encode global address in small code model In small code model, program and its symbols are linked in the lower 2 GB of the address space. Try encoding global address even when the range is unknown in such case. Differential Revision: https://reviews.llvm.org/D89341	2020-10-26 23:14:06 -07:00
Bing1 Yu	2c08f1b4b6	[CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded... In each 128-lane, if there is at least one index is demanded and not all indices are demanded and this 128-lane is not the first 128-lane of the legalized-vector, then this 128-lane needs a extracti128; If in each 128-lane, there is at least one index is demanded, this 128-lane needs a inserti128. The following cases will help you build a better understanding: Assume we insert several elements into a v8i32 vector in avx2, Case#1: inserting into 1th index needs vpinsrd + inserti128 Case#2: inserting into 5th index needs extracti128 + vpinsrd + inserti128 Case#3: inserting into 4,5,6,7 index needs 4*vpinsrd + inserti128. Reviewed By: pengfei, RKSimon Differential Revision: https://reviews.llvm.org/D89767	2020-10-27 11:21:13 +08:00
Chen Zheng	00e573cadb	[LSR] fix typo in comments and rename for a new added hook.	2020-10-26 22:29:22 -04:00
Carl Ritson	7a880ab388	[AMDGPU] Move WQM Pass after MI Scheduler Exec mask manipulation inserted by SIWholeQuadMode barriers to instruction scheduling. Move the entire pass after the machine instruction scheduler and make changes so pass is correct for non-SSA operation. These changes should leave the pass still usable pre-scheduler, although tests have been updated to reflect post-scheduler results. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D88081	2020-10-27 10:25:53 +09:00
Amy Kwan	803cc3aff2	[PowerPC] Implement Set Boolean Condition Instructions This patch implements the set boolean condition instructions introduced in POWER10. The set boolean condition instructions (set[n]bc[r]) are used during the following situations: - sign/zero/any extending i1 to an i32 or i64, - reg+reg, reg+imm or floating point comparisons being sign/zero extended to i32 or i64, - spilling CR bits (using the setnbc instruction) Differential Revision: https://reviews.llvm.org/D87705	2020-10-26 18:42:51 -05:00
Stanislav Mekhanoshin	d176e13ca5	Fixed release build after D89170	2020-10-26 16:00:57 -07:00
Stanislav Mekhanoshin	038d884a50	[AMDGPU] Use flat scratch instructions where available The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled to swizzled as used by flat scratch instructions, so it cannot be mixed with MUBUF stack access. At the very least missing: - GlobalISel; - Some optimizations in frame elimination in between vector and scalar ALU; - It shall finally allow to always materialize frame index as an SGPR, but that is not implemented and frame elimination cannot handle it yet; - Unaligned and/or multidword flat scratch shall work, but it is legalized now for MUBUF; - Operand folding cannot optimize FI like with MUBUF yet; - It will need scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address; Differential Revision: https://reviews.llvm.org/D89170	2020-10-26 14:40:42 -07:00
Evgeny Leviant	a28388f95b	[ARM][SchedModels] Move IsLDMBaseRegInListPred to ARMSchedule.td. NFC This predicate is not specific to cortex-a57 and can be used in other processor models as well.	2020-10-26 22:31:41 +03:00
Stanislav Mekhanoshin	ad8131bb03	[AMDGPU] Fix VC warning about signed/unsigned comparison. NFC. This is the warning reported in https://reviews.llvm.org/D89599	2020-10-26 11:55:57 -07:00
Evgeny Leviant	e74f66125e	[ARM][SchedModels] Convert IsLdstsoScaledNotOptimalPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90150	2020-10-26 20:22:41 +03:00
Evgeny Leviant	a877bda397	Fix issue in cortex-a57 sched model Differential revision: https://reviews.llvm.org/D90152	2020-10-26 20:16:40 +03:00
Benjamin Kramer	b777d30496	[AMDGPU] Avoid unused variable warning in Release builds. NFC. SIRegisterInfo.cpp:480:19: error: unused variable 'SOffset'	2020-10-26 18:11:57 +01:00
Kazushi (Jam) Marukawa	9d0db405b5	[VE] Add vector shift instructions Add VSLL/VSLD/VSRL/VSLA/VSLAX/VSRA/VSRAX/VSFA instructions. Add additional AsmParser for VSLD special operand. Also add regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90143	2020-10-27 00:30:27 +09:00
Kazushi (Jam) Marukawa	83cb423c6e	[VE] Add vector logical instructions Add VAND/VOR/VXOE/VEQV/VLDZ/VPCNT/VBRV/VSEQ instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90141	2020-10-27 00:29:33 +09:00
Kazushi (Jam) Marukawa	cfefef50c1	[VE] Support atomic store Support atomic store instructions and add a regression test. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90137	2020-10-27 00:28:11 +09:00
Jay Foad	0ca4124798	[AMDGPU] Make more use of printNamedBit in AMDGPUInstPrinter. NFC.	2020-10-26 14:03:35 +00:00
Kazushi (Jam) Marukawa	8aa60f67dc	[VE] Add vector comparison and min/max Add VCMP/VCPS/VCPX/VCMS/VCMX vector instructions. Also add regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89643	2020-10-26 18:32:04 +09:00
Kazushi (Jam) Marukawa	0acf700243	[VE] Add integer arithmetic vector instructions Add VADD/VADS/VADX/VSUB/VSBS/VSBX/VMPY/VMPS/VMPX/VMPD/VDIV/VDVS/VDVX instructions. Also add regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89642	2020-10-26 18:30:11 +09:00
Sebastian Neubauer	a094b4fa4b	[AMDGPU] Emit new pal metadata by default If no pal metadata is given, default to the msgpack format instead of the legacy metadata. This makes tests better readable. Differential Revision: https://reviews.llvm.org/D90035	2020-10-26 10:16:17 +01:00
Evgeny Leviant	a95ce5f65f	[ARM][SchedModels] Rename and generalize predicate. NFC	2020-10-26 12:14:55 +03:00
Kazushi (Jam) Marukawa	f32992ad24	[VE] Support atomic load Support atomic load instruction and add a regression test. VE uses release consistency, so need to insert fence around atomic instructions. This patch enable AtomicExpandPass and use emitLeadingFence and emitTrailingFence mechanism for such purpose. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90135	2020-10-26 18:02:45 +09:00
Evgeny Leviant	99b2756517	[ARM][SchedModels] Get rid of IsLdrAm2ScaledPred Differential revision: https://reviews.llvm.org/D90024	2020-10-26 12:01:39 +03:00
Evgeny Leviant	a4fc18e641	[ARM][SchedModels] Convert IsLdstsoMinusRegPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90029	2020-10-26 11:54:08 +03:00
Evgeny Leviant	d613e39d52	[ARM][SchedModels] Convert IsLdrAm3NegRegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D90045	2020-10-26 11:43:02 +03:00
David Green	61bc18de0b	[Schedule] Add a MultiHazardRecognizer This adds a MultiHazardRecognizer and starts to make use of it in the ARM backend. The idea of the class is to allow multiple independent hazard recognizers to be added to a single base MultiHazardRecognizer, allowing them to all work in parallel without requiring them to be chained into subclasses. They can then be added or not based on cpu or subtarget features, which will become useful in the ARM backend once more hazard recognizers are being used for various things. This also renames ARMHazardRecognizer to ARMHazardRecognizerFPMLx in the process, to more clearly explain what that recognizer is designed for. Differential Revision: https://reviews.llvm.org/D72939	2020-10-26 08:06:17 +00:00
Kazushi (Jam) Marukawa	52f03fe115	[VE] Support atomic fence Support atomic fence instruction and add a regression test. Add MEMBARRIER pseudo instruction also to use it as a barrier against to the compiler optimizations. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D90112	2020-10-26 17:03:09 +09:00
Christudasan Devadasan	5a061041ec	[AMDGPU] Avoid offset register in MUBUF for direct stack object accesses We use an absolute address for stack objects and it would be necessary to have a constant 0 for soffset field. Fixes: SWDEV-228562 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D89234	2020-10-26 11:08:37 +05:30
Craig Topper	82974e0114	[X86] Don't disassemble wbinvd with 0xf2 or 0x66 prefix. The 0xf3 prefix has been defined as wbnoinvd on Icelake Server. So the prefix isn't ignored by the CPU. AMD documentation suggests that wbnoinvd is treated as wbinvd on older processors. Intel documentation is not clear. Perhaps 0xf2 and 0x66 are treated the same, but its not documented. This patch changes TB to PS in the td file so 0xf2 and 0x66 will be treated as errors. This matches versions of objdump after wbnoinvd was added.	2020-10-25 20:56:01 -07:00
Liu, Chen3	180548c5c7	[X86] VEX/EVEX prefix doesn't work for inline assembly. For now, we lost the encoding information if we using inline assembly. The encoding for the inline assembly will keep default even if we add the vex/evex prefix. Differential Revision: https://reviews.llvm.org/D90009	2020-10-26 08:37:45 +08:00
Craig Topper	63ba82ed00	[X86] Use TargetConstant for immediates for VASTART_SAVE_XMM_REGS.	2020-10-25 12:52:56 -07:00
Craig Topper	2ed16aa66f	[X86] Use TargetConstant instead of Constant for operands to X86vaarg64.	2020-10-25 12:24:59 -07:00
Craig Topper	a222d832d5	[X86] Use TargetConstant for FPDiff with X86::TC_RETURN. It's required to be a constant and can never be in a register so make it explicit.	2020-10-25 00:29:11 -07:00
Fangrui Song	f04d92af94	[X86] Produce R_X86_64_GOTPCRELX for test/binop instructions (MOV32rm/TEST32rm/...) when -Wa,-mrelax-relocations=yes is enabled We have been producing R_X86_64_REX_GOTPCRELX (MOV64rm/TEST64rm/...) and R_X86_64_GOTPCRELX for CALL64m/JMP64m without the REX prefix since 2016 (to be consistent with GNU as), but not for MOV32rm/TEST32rm/...	2020-10-24 15:14:17 -07:00
Fangrui Song	d5adadb3a5	[AArch64][GlobalISel] Fix -Wunused-variable. NFC	2020-10-24 12:47:11 -07:00
Benjamin Kramer	39a0d6889d	[X86] Add a stub for Intel's alderlake. No scheduling, no autodetection.	2020-10-24 19:01:22 +02:00
Benjamin Kramer	bd2cf96c09	[X86] Add a stub for znver3 based on the little public information there is in AMD's manuals No scheduling, no autodetection. Just enough so -march=znver3 works.	2020-10-24 19:01:22 +02:00
dfukalov	9068c20965	[AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions. 1. Throughput and codesize costs estimations was separated and updated. 2. Updated fdiv cost estimation for different cases. 3. Added scalarization processing for types that are treated as !isSimple() to improve codesize estimation in getArithmeticInstrCost() and getArithmeticInstrCost(). The code was borrowed from TCK_RecipThroughput path of base implementation. Next step is unify scalarization part in base class that is currently works for TCK_RecipThroughput path only. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89973	2020-10-24 19:53:08 +03:00
David Green	92205bf122	[ARM] Remove some dead code. NFC	2020-10-24 17:22:49 +01:00
Simon Pilgrim	ce356e1546	[DAG] Add BuildVectorSDNode::getRepeatedSequence helper to recognise multi-element splat patterns Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method. I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns. Differential Revision: https://reviews.llvm.org/D87930	2020-10-24 12:23:09 +01:00
Jonas Paulsson	7c026a83ee	[SystemZ] Define MaxInstLength to have the value of 6. This value had the default value of 4 which caused branch relaxation to fail. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D90065	2020-10-24 09:19:34 +02:00
Krzysztof Parzyszek	1b5baa42bc	[Hexagon] Handle selection between HVX vector predicates Make sure that (select i1 q0 q1) is handled properly.	2020-10-23 18:22:03 -05:00
Evandro Menezes	fe9a7d9627	[RISCV] Use the commercial name for scheduling model (NFC) Use the commercial name for the scheduling model for the SiFive 7 Series.	2020-10-23 16:33:27 -05:00
Cameron McInally	a1cc274cb3	[SVE] Lower fixed length VECREDUCE_SEQ_FADD operation Differential Revision: https://reviews.llvm.org/D89162	2020-10-23 16:24:02 -05:00
Stanislav Mekhanoshin	2e64ad9494	[AMDGPU] Fixed isLegalRegOperand() with physregs This does not change anything at the moment, but needed for D89170. In that change I am probing a physical SGPR to see if it is legal. RC is SReg_32, but DRC for scratch instructions is SReg_32_XEXEC_HI and test fails. That is sufficient just to check if DRC contains a register here in case of physreg. Physregs also do not use subregs so the subreg handling below is irrelevant for these. Differential Revision: https://reviews.llvm.org/D90064	2020-10-23 11:33:34 -07:00
Baptiste Saleil	edb27912a3	[PowerPC] Add intrinsics for MMA This patch adds support for MMA intrinsics. Authored by: Baptiste Saleil Reviewed By: #powerpc, bsaleil, amyk Differential Revision: https://reviews.llvm.org/D89345	2020-10-23 13:16:02 -05:00
Amara Emerson	0f0fd383b4	[AArch64][GlobalISel] Introduce a new post-isel optimization pass. There are two optimizations here: 1. Consider the following code: FCMPSrr %0, %1, implicit-def $nzcv %sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv %sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv FCMPSrr %0, %1, implicit-def $nzcv %sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv This kind of code where we have 2 FCMPs each feeding a CSEL can happen when we have a single IR fcmp being used by two selects. During selection, to ensure that there can be no clobbering of nzcv between the fcmp and the csel, we have to generate an fcmp immediately before each csel is selected. However, often we can essentially CSE these together later in MachineCSE. This doesn't work though if there are unrelated flag-setting instructions in between the two FCMPs. In this case, the SUBS defines NZCV but it doesn't have any users, being overwritten by the second FCMP. Our solution here is to try to convert flag setting operations between a interval of identical FCMPs, so that CSE will be able to eliminate one. 2. SelectionDAG imported patterns for arithmetic ops currently select the flag-setting ops for CSE reasons, and add the implicit-def $nzcv operand to those instructions. However if those impdef operands are not marked as dead, the peephole optimizations are not able to optimize them into non-flag setting variants. The optimization here is to find these dead imp-defs and mark them as such. This pass is only enabled when optimizations are enabled. Differential Revision: https://reviews.llvm.org/D89415	2020-10-23 10:18:36 -07:00
Huihui Zhang	1e113c078a	[AArch64][SVE] Fix umin/umax lowering to handle out of range imm. Immediate must be in an integer range [0,255] for umin/umax instruction. Extend pattern matching helper SelectSVEArithImm() to take in value type bitwidth when checking immediate value is in range or not. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D89831	2020-10-23 09:42:56 -07:00
Victor Huang	7a74bb899a	[PowerPC] Fix the Predicates for enabling pcrelative-memops and PLXVP/PSTXVP definitions In this patch, Predicates fix added for the following: * disable prefix-instrs will disable pcrelative-memops * set two predicates PairedVectorMemops and PrefixInstrs for PLXVP/PSTXVP definitions Differential Revision: https://reviews.llvm.org/D89727 Reviewed by: amyk, steven.zhang	2020-10-23 11:33:20 -05:00
vpykhtin	00255f4192	[AMDGPU] Fix access beyond the end of the basic block in execMayBeModifiedBeforeAnyUse. I was wrong in thinking that MRI.use_instructions return unique instructions and mislead Jay in his previous patch D64393. First loop counted more instructions than it was in reality and the second loop went beyond the basic block with that counter. I used Jay's previous code that relied on MRI.use_operands to constrain the number of instructions to check among. modifiesRegister is inlined to reduce the number of passes over instruction operands and added assert on BB end boundary. Differential Revision: https://reviews.llvm.org/D89386	2020-10-23 19:17:48 +03:00
Paulo Matos	69e2797eae	[WebAssembly] Implementation of (most) table instructions Implementation of instructions table.get, table.set, table.grow, table.size, table.fill, table.copy. Missing instructions are table.init and elem.drop as they deal with element sections which are not yet implemented. Added more tests to tables.s Differential Revision: https://reviews.llvm.org/D89797	2020-10-23 08:42:54 -07:00
Jay Foad	958130dfda	[AMDGPU] Add simplification/combines for llvm.amdgcn.fma.legacy This follows on from D89558 which added the new intrinsic and D88955 which added similar combines for llvm.amdgcn.fmul.legacy. Differential Revision: https://reviews.llvm.org/D90028	2020-10-23 16:16:13 +01:00
Paul C. Anagnostopoulos	876af264c1	[TableGen] Change !getop and !setop to !getdagop and !setdagop. Differential Revision: https://reviews.llvm.org/D89814	2020-10-23 10:36:05 -04:00
Matt Arsenault	8a59d4b654	AMDGPU: Don't query for TII in TII	2020-10-23 10:34:24 -04:00
Matt Arsenault	d61996473d	AMDGPU: Increase branch size estimate with offset bug This will be relaxed to insert a nop if the offset hits the bad value, so over estimate branch instruction sizes.	2020-10-23 10:34:24 -04:00
Chen Zheng	1e0b6c1df0	[LSR] ignore profitable chain when reg num is not major cost. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D89665	2020-10-23 09:35:48 -04:00
Simon Pilgrim	936ef89ebe	[X86] lowerShuffleWithPERMV - use MVT::changeTypeToInteger helper. NFCI.	2020-10-23 12:35:27 +01:00
Evgeny Leviant	cb86522c94	[ARM][SchedModels] Convert IsR1P0AndLaterPred to MCSchedPredicate. NFC Differential revision: https://reviews.llvm.org/D90017	2020-10-23 14:27:49 +03:00
Florian Hahn	0fcc6f7a76	[AArch64] Implement getIntrinsicInstrCost, handle min/max intrinsics. This patch adds a specialized implementation of getIntrinsicInstrCost and add initial cost-modeling for min/max vector intrinsics. AArch64 NEON support umin/smin/umax/smax for vectors <8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>. Notably, it does not support vectors with i64 elements. This change by itself should have very little impact on codegen, but in follow-up patches I plan to teach the vectorizers to consider using those intrinsics on platforms where it is profitable, e.g. because there is no general 'select'-like instruction. The current cost returned should be better for throughput, latency and size. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D89953	2020-10-23 11:32:42 +01:00
Jay Foad	86a480e9ce	[AMDGPU] Add simplification/combines for llvm.amdgcn.fmul.legacy Differential Revision: https://reviews.llvm.org/D88955	2020-10-23 09:31:00 +01:00
Evgeny Leviant	7a78073be7	[ARM][SchedModels] Let ldm* instruction scheduling use MCSchedPredicate Differential revision: https://reviews.llvm.org/D89957	2020-10-23 10:33:20 +03:00
Jessica Paquette	19dc9c9780	[AArch64][GlobalISel] Move imm adjustment for G_ICMP to post-legalizer lowering Move the code which adjusts the immediate/predicate on a G_ICMP to AArch64PostLegalizerLowering. This - Reduces the number of places we need to test for optimized compares in the selector. We know that the compare should have been simplified by the time it hits the selector, so we can avoid testing this in selects, brconds, etc. - Allows us to potentially fold more compares (previously, this optimization was only done after calling `tryFoldCompare`, this may allow us to hit some more TST cases) - Simplifies the selection code in `emitIntegerCompare` significantly; we can just use an emitSUBS function. - Allows us to avoid checking that the predicate has been updated after `emitIntegerCompare`. Also add a utility header file for things that may be useful in the selector and various combiners. No need for an implementation file at this point, since it's just one constexpr function for now. I've run into a couple cases where having one of these would be handy, so might as well add it here. There are a couple functions in the selector that can probably be factored out into here. Differential Revision: https://reviews.llvm.org/D89823	2020-10-22 15:27:36 -07:00
Jessica Paquette	147b9497e7	[AArch64][GlobalISel] Split post-legalizer combiner to allow for lowering at -O0 There are a lot of combines in AArch64PostLegalizerCombiner which exist to facilitate instruction matching in the selector. (E.g. matching for G_ZIP and other shuffle vector pseudos) It still makes sense to select these instructions at -O0. Matching earlier in a combiner can reduce complexity in the selector significantly. For example, a good portion of our selection code for compares would be a lot easier to represent in a combine. This patch moves matching combines into a "AArch64PostLegalizerLowering" combiner which runs at all optimization levels. Also, while we're here, improve the documentation for the AArch64PostLegalizerCombiner, and fix up the filepath in its file comment. And also add a 'r' which somehow got dropped from a bunch of function names. https://reviews.llvm.org/D89820	2020-10-22 14:43:25 -07:00
Tim Corringham	3c1273d737	[AMDGPU] Add amdgpu specific loop threshold metadata Add new loop metadata amdgpu.loop.unroll.threshold to allow the initial AMDGPU specific unroll threshold value to be specified on a loop by loop basis. The intention is to be able to allow more nuanced hints, e.g. specifying a low threshold value to indicate that a loop may be unrolled if cheap enough rather than using the all or nothing llvm.loop.unroll.disable metadata. Differential Revision: https://reviews.llvm.org/D84779	2020-10-22 17:21:32 +01:00
Mircea Trofin	e24537d48f	[NFC][MC] Use MCRegister for ReachingDefAnalysis APIs Also updated the users of the APIs; and a drive-by small change to RDFRegister.cpp Differential Revision: https://reviews.llvm.org/D89912	2020-10-22 08:47:35 -07:00
Piotr Sobczak	7ae0033ca8	[AMDGPU] Fix expansion of i16 MULH This commit marks i16 MULH as expand in AMDGPU backend, which is necessary after the refactoring in D80485. Differential Revision: https://reviews.llvm.org/D89965	2020-10-22 17:05:06 +02:00
Evgeny Leviant	ed6a91f456	[ARM][SchedModels] Convert IsLdstsoScaledPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89939	2020-10-22 18:03:01 +03:00
Simon Pilgrim	2692978050	[X86] X86AsmParser - make methods const where possible. NFCI. Reported by cppcheck	2020-10-22 15:55:06 +01:00
Simon Pilgrim	091b18ba81	[X86] Return const& in IntelExprStateMachine::getIdentifierInfo(). NFCI. Avoid unnecessary copy in X86AsmParser::ParseIntelOperand	2020-10-22 15:55:06 +01:00
Matt Arsenault	d5c0561667	AMDGPU: Fix not always reserving VGPRs used for SGPR spilling The VGPRs used for SGPR spills need to be reserved, even if we aren't speculatively reserving one. This was broken by `117e5609e9`.	2020-10-22 10:19:19 -04:00
Matt Arsenault	d3bcfe2a36	AMDGPU: Implement getNoPreservedMask We don't support funclets for exception handling and I hit this when manually reducing MIR.	2020-10-22 10:17:31 -04:00
Simon Pilgrim	794dc7ad26	[CodeGen] Split MVT::changeTypeToInteger() functionality from EVT::changeTypeToInteger(). Add the MVT equivalent handling for EVT changeTypeToInteger/changeVectorElementType/changeVectorElementTypeToInteger. All the SimpleVT code already exists inside the EVT equivalents, but by splitting this out we can use these directly inside MVT types without converting to/from EVT.	2020-10-22 14:27:42 +01:00
Jeremy Morse	d73275993b	[DebugInstrRef] Substitute debug value numbers to handle optimizations This patch touches two optimizations, TwoAddressInstruction and X86's FixupLEAs pass, both of which optimize by re-creating instructions. For LEAs, various bits of arithmetic are better represented as LEAs on X86, while TwoAddressInstruction sometimes converts instrs into three address instructions if it's profitable. For debug instruction referencing, both of these require substitutions to be created -- the old instruction number must be pointed to the new instruction number, as illustrated in the added test. If this isn't done, any variable locations based on the optimized instruction are conservatively dropped. Differential Revision: https://reviews.llvm.org/D85756	2020-10-22 13:01:03 +01:00
Tianqing Wang	be39a6fe6f	[X86] Add User Interrupts(UINTR) instructions For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D89301	2020-10-22 17:33:07 +08:00
Tony	8e8cc587a5	[NFC][AMDGPU] Reorder SIMemoryLegalizer functions to be consistent - Make the SIMemoryLegalizer insertAcquire function be in the same order for each target to be consistent. Differential Revision: https://reviews.llvm.org/D89880	2020-10-22 05:39:18 +00:00
Xiang1 Zhang	7c3fea7721	[X86] Support customizing stack protector guard Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D88631	2020-10-22 10:08:14 +08:00
Craig Topper	9e884169a2	[FPEnv][X86][SystemZ] Use different algorithms for i64->double uint_to_fp under strictfp to avoid producing -0.0 when rounding toward negative infinity Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes. There are still problems with unsigned i32 conversions too which I'll try to fix in another patch. Fixes part of PR47393 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87115	2020-10-21 18:12:54 -07:00
Gaurav Jain	4634ad6c0b	[NFC] Set return type of getStackPointerRegisterToSaveRestore to Register Differential Revision: https://reviews.llvm.org/D89858	2020-10-21 16:19:38 -07:00
Evgeny Leviant	bf9edcb6fd	[ARM][SchedModels] Convert IsLdrAm3RegOffPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89876	2020-10-21 20:49:10 +03:00
Stanislav Mekhanoshin	611959f004	[AMDGPU] Fixed v_swap_b32 match 1. Fixed liveness issue with implicit kills. 2. Fixed potential problem with an indirect mov. Fixes: SWDEV-256848 Differential Revision: https://reviews.llvm.org/D89599	2020-10-21 10:14:24 -07:00
Joe Nash	f6d7832f4c	[AMDGPU] Refactor SOPC & SOPP .td for extension We use the Real vs Pseudo instruction abstraction for other types of instructions to facilitate changes in opcode between gpu generations. This patch introduces that abstraction to SOPC and SOPP. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89738 Change-Id: I59d53c2c7058b49d05b60350f4062a9b542d3138	2020-10-21 12:35:52 -04:00
Matt Arsenault	1ed4caff1d	AMDGPU: Lower the threshold reported for maximum stack size exceeded Check the actual maximum supported stack size for a kernel.	2020-10-21 12:06:27 -04:00
Matt Arsenault	53c43431bc	AMDGPU: Propagate amdgpu-flat-work-group-size attributes Fixes being overly conservative with the register counts in called functions. This should try to do a conservative range merge, but for now just clone. Also fix not being able to functionally run the pass standalone.	2020-10-21 12:06:24 -04:00
Paul C. Anagnostopoulos	dfd6b69e01	[ARM] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans Differential Revision: https://reviews.llvm.org/D89822	2020-10-21 09:52:45 -04:00
Jonas Paulsson	1606755da0	[SystemZ] Mark unsaved argument R6 as live throughout function. For historical reasons, the R6 register is a callee-saved argument register. This means that if it is used to pass an argument to a function that does not clobber it, it is live throughout the function. This patch makes sure that in this special case any kill flags of it are removed. Review: Ulrich Weigand, Eli Friedman Differential Revision: https://reviews.llvm.org/D89451	2020-10-21 14:38:59 +02:00
Nicholas Guy	9a2d2bedb7	Add "SkipDead" parameter to TargetInstrInfo::DefinesPredicate Some instructions may be removable through processes such as IfConversion, however DefinesPredicate can not be made aware of when this should be considered. This parameter allows DefinesPredicate to distinguish these removable instructions on a per-call basis, allowing for more fine-grained control from processes like ifConversion. Renames DefinesPredicate to ClobbersPredicate, to better reflect its purpose Differential Revision: https://reviews.llvm.org/D88494	2020-10-21 11:52:47 +01:00
Sebastian Neubauer	5290f50e44	[AMDGPU] Fix off by one in assert D89217 did not subtract one when accessing SubRegFromChannelTable in one place. Differential Revision: https://reviews.llvm.org/D89804	2020-10-21 12:37:52 +02:00
Jay Foad	f6a5699c6c	[AMDGPU][TableGen] Make more use of !ne !not !and !or. NFC.	2020-10-21 09:56:43 +01:00
Craig Topper	d4d0b41a82	[X86] Remove period from end of error message in assembler Addresses post-commit feedback from D89837.	2020-10-21 00:43:23 -07:00
Craig Topper	79a69f558f	[X86] Error on using h-registers with REX prefix in the assembler instead of leaving it to a fatal error in the encoder. Using a fatal error is bad for user experience. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D89837	2020-10-20 21:35:44 -07:00
Carl Ritson	324a15cead	[AMDGPU][NFC] Fix missing size in comment	2020-10-21 11:38:21 +09:00
Austin Kerbow	ebdcef20ce	[AMDGPU] Avoid inserting noops during scheduling Passes that are run after the post-RA scheduler may insert instructions like waitcnt which eliminate the need for certain noops. After this patch the scheduler is still aware of possible latency from hazards but noops will not be inserted until the dedicated hazard recognizer pass is run. Depends on D89753. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D89754	2020-10-20 17:11:36 -07:00
Austin Kerbow	37d907899f	[HazardRec] Allow inserting multiple wait-states simultaneously If a target can encode multiple wait-states into a noop allow emitting such instructions directly. Reviewed By: rampitec, dmgreen Differential Revision: https://reviews.llvm.org/D89753	2020-10-20 17:03:47 -07:00
Tony	1bc7bfffdb	[AMDGPU] Optimize waitcnt insertion for flat memory operations Change waitcnt insertion to check the memory operand tokens to see if flat memory operations access VMEM in the same way it does to check if accessing LDS. This avoids adding waitcnt for counters for address spaces that are not accessed. In addition, only generate the pessimistic waitcnt 0 if a flat memory operation appears to access both VMEM and LDS. This benefits flat memory operations that explicitly specify the address space as GLOBAL or LOCAL. Differential Revision: https://reviews.llvm.org/D89618	2020-10-20 22:55:12 +00:00
Craig Topper	1298252f80	[X86] Move 'int $3' -> 'int3' handling in the assembler to processInstruction. Instead of handling before parsing, just fix it after parsing.	2020-10-20 15:22:00 -07:00
Craig Topper	702aae368a	[X86] Move 's{hr,ar,hl} , <op>' to 'shift <op>' optimization in the assembler into processInstruction. Instead of detecting the mnemonic and hacking the operands before parsing. Just fix it up after parsing.	2020-10-20 15:20:46 -07:00
Paul C. Anagnostopoulos	332ff48dee	[AMDGPU] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans Differential Revision: https://reviews.llvm.org/D89796	2020-10-20 16:23:15 -04:00
vnalamot	89f7ccea6f	[AMDGPU] Remove getAllVGPR32() which cannot handle Accum VGPRs properly Remove getAllVGPR32() interface and update the SGPR spill code to use a proper method to get the relevant VGPR registers list. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D89806	2020-10-20 23:15:24 +05:30
Jay Foad	4913e3627c	[AMDGPU] Remove unused declaration. NFC. The implementation of this method was removed in D89706.	2020-10-20 16:31:42 +01:00
Michael Liao	2a0e4d1c01	[amdgpu] Enhance AMDGPU AA. - In general, a generic pointer may alias to pointers in all other address spaces. However, for certain cases enforced by the programming model, we may find a generic pointer won't alias to pointers to local objects. * When a generic pointer is loaded from the constant address space, it could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it won't alias to pointers to the PRIVATE or LOCAL address space. * When a generic pointer is passed as a kernel argument, it also could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it also won't alias to pointers to the PRIVATE or LOCAL address space. Differential Revision: https://reviews.llvm.org/D89525	2020-10-20 09:54:12 -04:00
Carl Ritson	be2afbd019	[AMDGPU] Remove fix up operand from SI_ELSE Remove immediate operand from SI_ELSE which indicates if EXEC has been modified. Instead always emit code that handles EXEC and remove unnecessary instructions during pre-RA optimisation. This facilitates passes (i.e. SIWholeQuadMode) adding exec mask manipulation post control flow lowering, and pre control flow lower passes do not need to be aware of SI_ELSE handling. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D89644	2020-10-20 19:15:21 +09:00
Carl Ritson	6aabbeadae	[AMDGPU][NFC] Tidy SIOptimizeExecMaskingPreRA for extensibility Remove duplicate code and move things around to make it easier to add additional optimisations to the pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89619	2020-10-20 17:22:43 +09:00
Ulrich Weigand	c299f3555d	[SystemZ] Fix disassembler crashes The "Size" value returned by SystemZDisassembler::getInstruction is used by common code even in the case where the routine returns failure. If that Size value exceeds the number of bytes remaining in the section, that could cause disassembler crashes. Fixed by never returning more than the number of bytes remaining.	2020-10-20 10:21:42 +02:00
Evgeny Leviant	991e86156c	[ARM][SchedModels] Convert IsCPSRDefinedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89460	2020-10-20 11:14:21 +03:00
David Green	6dcbc323fd	Revert "[ARM][LowOverheadLoops] Adjust Start insertion." This reverts commit `38f625d0d1`. This commit contains some holes in its logic and has been causing issues since it was committed. The idea sounds OK but some cases were not handled correctly. Instead of trying to fix that up later it is probably simpler to revert it and work to reimplement it in a more reliable way.	2020-10-20 08:55:21 +01:00
Luqman Aden	51892a42da	[COFF][ARM] Fix CodeView for Windows on 32bit ARM targets. Create the LLVM / CodeView register mappings for the 32-bit ARM Windows targets. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D89622	2020-10-19 22:16:16 -07:00
Wang, Pengfei	3a85472af2	[X86] Fix assert fail when element type is i1. extract_vector_elt will turn type vxi1 into i8, which triggers the assertion fail. Since we don't really handle vxi1 cases in below code, we can just return from here. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89096	2020-10-20 09:26:32 +08:00
Stanislav Mekhanoshin	6ddadf9901	[AMDGPU] flat scratch ST addressing mode on gfx10 GFX10 enables third addressing mode for flat scratch instructions, an ST mode. In that mode both register operands are omitted and only swizzled offset is used in addition to flat_scratch base. Differential Revision: https://reviews.llvm.org/D89501	2020-10-19 15:29:52 -07:00
Sergei Trofimovich	1eb812e06d	[VE] Fix initializer visibility Before the change attempt to link libLTO.so against shared LLVM library failed as: ``` [ 76%] Linking CXX shared library ../../lib/libLTO.so ... /usr/bin/cmake -E cmake_link_script CMakeFiles/LTO.dir/link.txt --verbose=1 c++ -o ...libLTO.so.12git ...ibLLVM-12git.so ld: CMakeFiles/LTO.dir/lto.cpp.o: in function `llvm::InitializeAllTargetInfos()': include/llvm/Config/Targets.def:31: undefined reference to `LLVMInitializeVETargetInfo' ``` It happens because on linux llvm build system sets default symbol visibility to "hidden". The fix is to set visibility back to "default" for exported APIs with LLVM_EXTERNAL_VISIBILITY. Bug: https://bugs.llvm.org/show_bug.cgi?id=47847 Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89633	2020-10-19 22:54:41 +01:00
Evgenii Stepanov	188a7d6710	Add alloca size threshold for StackTagging initializer merging. Summary: Initializer merging generates pretty inefficient code for large allocas that also happens to trigger an exponential algorithm somewhere in Machine Instruction Scheduler. See https://bugs.llvm.org/show_bug.cgi?id=47867. This change adds an upper limit for the alloca size. The default limit is selected such that worst case size of memtag-generated code is similar to non-memtag (but because of the ISA quirks, this case is realized at the different value of alloca size, ex. memset inlining triggers at sizes below 512, but stack tagging instructions are 2x shorter, so limit is approx. 256). We could try harder to emit more compact code with initializer merging, but that would only affect large, sparsely initialized allocas, and those are doing fine already. Reviewers: vitalybuka, pcc Subscribers: llvm-commits	2020-10-19 13:44:07 -07:00
Craig Topper	e28376ec28	[X86] Add i32->float and i64->double bitcast pseudo instructions to store folding table. We have pseudo instructions we use for bitcasts between these types. We have them in the load folding table, but not the store folding table. This adds them there so they can be used for stack spills. I added an exact size check so that we don't fold when the stack slot is larger than the GPR. Otherwise the upper bits in the stack slot would be garbage. That would be fine for Eli's test case in PR47874, but I'm not sure it's safe in general. A step towards fixing PR47874. Next steps are to change the ADDSSrr_Int pseudo instructions to use FR32 as the second source register class instead of VR128. That will keep the coalescer from promoting the register class of the bitcast instruction which will make the stack slot 4 bytes instead of 16 bytes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89656	2020-10-19 12:53:14 -07:00
Tony	6be9c7d2dc	[AMDGPU] Correct comment typo in SIMemoryLegalizer.cpp	2020-10-19 18:50:28 +00:00
Jay Foad	56f6bf1a8d	[AMDGPU] Remove MUL_LOHI_U24/MUL_LOHI_I24 These were introduced in r279902 on the grounds that using separate MUL_U24/MUL_I24 and MULHI_U24/MULHI_I24 nodes would introduce multiple uses of the operands, which would prevent SimplifyDemandedBits from simplifying the operands. This has since been fixed by D24672 "AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations" No functional change intended. At least it has no effect on lit tests. Differential Revision: https://reviews.llvm.org/D89706	2020-10-19 19:15:34 +01:00
Amy Kwan	6a946fd06f	[DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use isOperationLegalOrCustom directly instead. MULH is often expanded on targets. This patch removes the isMulhCheaperThanMulShift hook and uses isOperationLegalOrCustom instead. Differential Revision: https://reviews.llvm.org/D80485	2020-10-19 12:23:04 -05:00
Tony	151e297034	[AMDGPU] Simplify cumode handling in SIMemoryLegalizer Differential Revision: https://reviews.llvm.org/D89663	2020-10-19 17:13:45 +00:00
Paul C. Anagnostopoulos	2871c6c93f	[Aarch64] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans. Differential Revision: https://reviews.llvm.org/D89551	2020-10-19 10:33:55 -04:00
Piotr Sobczak	c872faf6e0	[AMDGPU] Do not generate S_CMP_LG_U64 on gfx7 S_CMP_LG_U64 was added in gfx8 and is guarded by hasScalarCompareEq64(). Rewrite S_CMP_LG_U64 to S_OR_B32 + S_CMP_LG_U32 for targets that do not support 64-bit scalar compare. Differential Revision: https://reviews.llvm.org/D89536	2020-10-19 14:44:31 +02:00
Kazushi (Jam) Marukawa	6bb60d3e26	[VE] Add setcc for fp128 Add setcc for fp128 and clean existing ISel patterns. Also add a regression test. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89683	2020-10-19 21:36:57 +09:00
Kazushi (Jam) Marukawa	fb2bb6fad4	[VE] Add cast to/from fp128 patterns Add cast to/from fp128 patterns. Clean other cast patterns too. Update a regression test by adding missing tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89682	2020-10-19 21:35:27 +09:00
Kazushi (Jam) Marukawa	8796746b2a	[VE] Support select_cc Add missing ISel patterns related to select_cc DAG nodes. Add regression test of all combination of possible scalar types. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89672	2020-10-19 18:54:25 +09:00
Kazushi (Jam) Marukawa	f2fd42098c	[VE] Add VBRD/VMV instructions Add VBRD/VMV vector instructions. In order to do that, also support VM512 registers and RV instruction format in MC layer. Also add regression tests for new instructions. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89641	2020-10-19 18:33:54 +09:00
Kazushi (Jam) Marukawa	7a09aec804	[VE] Add LSV/LVS/LVM/SVM instructions Add LSV/LVS/LVM/SVM vector instructions and regression tests. Also update AsmParser to support new format of operands. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89499	2020-10-19 18:32:48 +09:00

59968 Commits