llvm-project

Commit Graph

Author	SHA1	Message	Date
Dmitry Preobrazhensky	933ebc4078	[AMDGPU][MC][GFX8+] Enabled clamp for v_mul_i32_i24_e64 and v_mul_u32_u24_e64 See bug 45925: https://bugs.llvm.org/show_bug.cgi?id=45925 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D80287	2020-05-22 14:11:31 +03:00
QingShan Zhang	d1076d729a	[NFC][Test] Add test coverage for fsqrt on PowerPC	2020-05-22 10:59:27 +00:00
Victor Campos	872ee78f65	Revert "[ARM] Improve codegen of volatile load/store of i64" This reverts commit `8a12553223`. A bug has been found when generating code for Thumb2. In some very specific cases, the prologue/epilogue emitter generates erroneous stack offsets for the new LDRD instructions that access the stack. This bug does not seem to be caused by the reverted patch though. Likely the latter has made an undiscovered issue emerge in the prologue/epilogue emission pass. Nevertheless, this reversion is necessary since it is blocking users of the ARM backend.	2020-05-22 11:01:57 +01:00
Jessica Paquette	49a4f3f7d8	[AArch64][GlobalISel] Add a post-legalizer combiner with a very simple combine. (This patch is by Jessica, I'm just committing it on her behalf because I need a post-legalizer combiner for something else). This supersedes D77250, which did equivalent work in the selector. This can be done pre-legalization or post-legalization. Post-legalization is more likely to hit, since G_IMPLICIT_DEFs tend to appear during legalization. There's no reason to not do it pre-legalization though-- if it can be caught earlier, great. (I also think that it might be worth reimplementing D78769 using a target-specific post-legalization combine too after thinking about it for a while.) Differential Revision: https://reviews.llvm.org/D78852	2020-05-21 18:47:32 -07:00
Alexey Lapshin	bf242c067e	[AARCH64][NEON] Allow to sink operands of aarch64_neon_pmull64. Summary: This patch fixes a problem when pmull2 instruction is not generated for vmull_high_p64 intrinsic. ISel has a pattern for int_aarch64_neon_pmull64 intrinsic to generate PMULL2 instruction. That pattern assumes that extraction operations are located in the same basic block. We need to sink them if they are not. Handle operands of int_aarch64_neon_pmull64 into AArch64TargetLowering::shouldSinkOperands. Reviewed by: efriedma Differential Revision: https://reviews.llvm.org/D80320	2020-05-22 01:35:24 +03:00
Arthur Eubanks	fc937806ef	Don't jump to landing pads in Control Flow Optimizer Summary: Likely fixes https://bugs.llvm.org/show_bug.cgi?id=45858. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80047	2020-05-21 15:19:10 -07:00
Tim Renouf	d13a508820	[AMDGPU] Fixed incorrect PAL metadata register naming This only affects assembly and -filetype=asm codegen of PAL metadata. Differential Revision: https://reviews.llvm.org/D78860 Change-Id: I7b822e1917bf7b403486820d31afc483be207652	2020-05-21 22:13:19 +01:00
Jean-Michel Gorius	7019cea26d	[CodeGen] Add support for multiple memory operands in MachineInstr::mayAlias Summary: To support all targets, the mayAlias member function needs to support instructions with multiple operands. This revision also changes the order of the emitted instructions in some test cases. Reviewers: efriedma, hfinkel, craig.topper, dmgreen Reviewed By: efriedma Subscribers: MatzeB, dmgreen, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80161	2020-05-21 23:02:54 +02:00
Stanislav Mekhanoshin	689e616ed0	[AMDGPU] Promote alloca to vector in opt Promote alloca to vector before SROA and loop unroll. If we manage to eliminate allocas before unroll we may choose to unroll less. Differential Revision: https://reviews.llvm.org/D80386	2020-05-21 13:49:51 -07:00
Eli Friedman	f09d220c71	[AArch64][SVE] Fill out missing unpredicated load/store patterns. The set of patterns for unpredicated load/store was incomplete: it only included non-extending stores. Fill out the remaining patterns for extending stores, and add the corresponding support to frame offset lowering. Differential Revision: https://reviews.llvm.org/D80349	2020-05-21 13:29:30 -07:00
Stanislav Mekhanoshin	71bbe5d799	[AMDGPU] Added opt pipeline test. NFC.	2020-05-21 11:58:35 -07:00
Stanislav Mekhanoshin	1dfd1b3e4b	[AMDGPU] Tune threshold for cmp/select vector lowering It was set in total vector size while the idea was to limit a number of instructions. Now it started to work with doubles and thresholds needs to be updated. Differential Revision: https://reviews.llvm.org/D80322	2020-05-21 08:59:35 -07:00
Jon Roelofs	5fb979dd06	[llvm][test] Add missing FileCheck colons. NFC	2020-05-21 09:29:27 -06:00
Jon Roelofs	183d6af081	[llvm][test] Add COM: directives before colon-less non-CHECKs in comments. NFC Differential Revision: https://reviews.llvm.org/D79963	2020-05-21 09:29:27 -06:00
Sjoerd Meijer	b0614509a0	[HardwareLoops] llvm.loop.decrement.reg definition This is split off from D80316, slightly tightening the definition of overloaded hardwareloop intrinsic llvm.loop.decrement.reg specifying that both operands its result have the same type.	2020-05-21 10:48:16 +01:00
Denis Antrushin	dedcefe09d	[Statepoint] Constant fold FP deopt args. We do not have any special handling for constant FP deopt arguments. They are just spilled to stack or generated in register by MOVS instruction. This is inefficient and, when we have too many such constant arguments, may result in register allocation failure. Instead, we can bitcast such constant FP operands to appropriately sized integer and record as constant into statepoint and later, into StackMap. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D80318	2020-05-21 11:02:54 +03:00
Chen Zheng	8086cdd1b0	[PowerPC] add more high latency opcodes for machine combiner pass Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D80097	2020-05-21 02:39:20 -04:00
Craig Topper	ae5ab2f40a	[LegalizeDAG] Modify ExpandLegalINT_TO_FP to swap data for little/big endian instead of the pointers. Will make it easier to pass the pointer info and alignment correctly to the loads/stores. While there also make the i32 stores independent and use a token factor to join before the load.	2020-05-20 22:29:59 -07:00
Eli Friedman	b4f9b34701	[AArch64] Fix unwind info generated by outliner. The offsets were wrong. The result is now the same as what the compiler would generate for a function that spills lr normally. Differential Revision: https://reviews.llvm.org/D80238	2020-05-20 16:39:00 -07:00
Francis Visoiu Mistrih	770ba4f051	[AArch64] Fix GlobalISel tests on non-darwin platforms http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/6998	2020-05-20 16:26:58 -07:00
Francis Visoiu Mistrih	161122ea1c	[AArch64] Provide Darwin variants of most calling conventions With the new SVE stack layout, we now need to provide a Darwin variant for all the calling conventions based on the main AAPCS CSR save order. This also changes APCS_SwiftError to have a Darwin and a non-Darwin version, assuming it could be used on other platforms these days, and restricts the AArch64_CXX_TLS calling convention to Darwin. Differential Revision: https://reviews.llvm.org/D73805	2020-05-20 16:03:48 -07:00
Stanislav Mekhanoshin	4eecf17164	[AMDGPU] Always expand ext/insertelement with divergent idx Even though series of cmd/cndmask can produce quite a lot of code that is still better than a loop. In case of doubles we would even produce two loops. Differential Revision: https://reviews.llvm.org/D80032	2020-05-20 15:51:29 -07:00
aartbik	645bba8d3d	[llvm] [CodeGen] [X86] Fix issues with v4i1 instruction selection Summary: Fixes issue https://bugs.llvm.org/show_bug.cgi?id=45995 Reviewers: mehdi_amini, nicolasvasilache, reidtatge, craig.topper, ftynse, bkramer Reviewed By: craig.topper Subscribers: RKSimon, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80231	2020-05-20 11:34:56 -07:00
Arthur Eubanks	8a88755610	Reland [X86] Codegen for preallocated See https://reviews.llvm.org/D74651 for the preallocated IR constructs and LangRef changes. In X86TargetLowering::LowerCall(), if a call is preallocated, record each argument's offset from the stack pointer and the total stack adjustment. Associate the call Value with an integer index. Store the info in X86MachineFunctionInfo with the integer index as the key. This adds two new target independent ISDOpcodes and two new target dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}. The setup ISelDAG node takes in a chain and outputs a chain and a SrcValue of the preallocated call Value. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an %esp adjustment, the exact amount determined by looking in X86MachineFunctionInfo with the integer index key. The arg ISelDAG node takes in a chain, a SrcValue of the preallocated call Value, and the arg index int constant. It produces a chain and the pointer fo the arg. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a lea of the stack pointer plus an offset determined by looking in X86MachineFunctionInfo with the integer index key. Force any function containing a preallocated call to use the frame pointer. Does not yet handle a setup without a call, or a conditional call. Does not yet handle musttail. That requires a LangRef change first. Tried to look at all references to inalloca and see if they apply to preallocated. I've made preallocated versions of tests testing inalloca whenever possible and when they make sense (e.g. not alloca related, inalloca edge cases). 
Aside from the tests added here, I checked that this codegen produces correct code for something like ``` struct A { A(); A(A&&); ~A(); }; void bar() { foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8); } ``` by replacing the inalloca version of the .ll file with the appropriate preallocated code. Running the executable produces the same results as using the current inalloca implementation. Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77689	2020-05-20 11:25:44 -07:00
Arthur Eubanks	b8cbff51d3	Revert "[X86] Codegen for preallocated" This reverts commit `810567dc69`. Some tests are unexpectedly passing	2020-05-20 10:04:55 -07:00
Arthur Eubanks	810567dc69	[X86] Codegen for preallocated See https://reviews.llvm.org/D74651 for the preallocated IR constructs and LangRef changes. In X86TargetLowering::LowerCall(), if a call is preallocated, record each argument's offset from the stack pointer and the total stack adjustment. Associate the call Value with an integer index. Store the info in X86MachineFunctionInfo with the integer index as the key. This adds two new target independent ISDOpcodes and two new target dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}. The setup ISelDAG node takes in a chain and outputs a chain and a SrcValue of the preallocated call Value. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an %esp adjustment, the exact amount determined by looking in X86MachineFunctionInfo with the integer index key. The arg ISelDAG node takes in a chain, a SrcValue of the preallocated call Value, and the arg index int constant. It produces a chain and the pointer fo the arg. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a lea of the stack pointer plus an offset determined by looking in X86MachineFunctionInfo with the integer index key. Force any function containing a preallocated call to use the frame pointer. Does not yet handle a setup without a call, or a conditional call. Does not yet handle musttail. That requires a LangRef change first. Tried to look at all references to inalloca and see if they apply to preallocated. I've made preallocated versions of tests testing inalloca whenever possible and when they make sense (e.g. not alloca related, inalloca edge cases). 
Aside from the tests added here, I checked that this codegen produces correct code for something like ``` struct A { A(); A(A&&); ~A(); }; void bar() { foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8); } ``` by replacing the inalloca version of the .ll file with the appropriate preallocated code. Running the executable produces the same results as using the current inalloca implementation. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77689	2020-05-20 09:20:38 -07:00
Matt Arsenault	e8f6b0e583	AMDGPU/GlobalISel: Fix splitting 64-bit extensions This was replicating the low bits into the high bits for G_ZEXT, rather than using 0.	2020-05-20 11:13:32 -04:00
Jay Foad	3c84353804	[AMDGPU] Add the test from D49097.	2020-05-20 14:34:51 +01:00
Pierre-vh	835251f7d9	[Target][ARM] Make Low Overhead Loops coexist with VPT blocks. Previously, the LowOverheadLoops pass couldn't handle VPT blocks with conditions, or with multiple VCTPs. This patch improves the LowOverheadLoops pass so it can handle those cases. It also adds support for VCMPs before the VCTP. Differential Revision: https://reviews.llvm.org/D78206	2020-05-20 12:24:55 +01:00
Kang Zhang	58684fbb6f	[NFC][PowerPC] Add 2 new cases to test livevars pass	2020-05-20 05:32:09 +00:00
Stanislav Mekhanoshin	677929e352	[AMDGPU] Process V_MOV_B32_indirect in SET_GPR_IDX optimization Differential Revision: https://reviews.llvm.org/D80256	2020-05-19 21:37:14 -07:00
Matt Arsenault	77f05e5b53	AMDGPU/GlobalISel: Fix bug in test register bank The intent wasn't cases with illegal VGPR to SGPR copies.	2020-05-19 22:52:59 -04:00
QingShan Zhang	2b59e9f1bd	[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression We have the getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression. However, during negating the expression, the cost might change as we are changing the DAG, and then, hit the assertion if we negated the wrong expression as the cost is not trustful anymore. This patch is target to remove the getNegatibleCost to avoid the out of sync with getNegatedExpression, and check the cost during negating the expression. It also reduce the duplicated code between getNegatibleCost and getNegatedExpression. And fix the crash for the test in D76638 Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D77319	2020-05-20 02:12:16 +00:00
Matt Arsenault	21d2884a9c	AMDGPU: Annotate functions that have stack objects Relying on any MachineFunction state in the MachineFunctionInfo constructor is hazardous, because the construction time is unclear and determined by the first use. The function may be only partially constructed, which is part of why we have many of these hacky string attributes to track what we need for ABI lowering. For SelectionDAG, all stack objects are created up-front before calling convention lowering so stack objects are visible at construction time. For GlobalISel, none of the IR function has been visited yet and the allocas haven't been added to the MachineFrameInfo yet. This should fix failing to set flat_scratch_init in GlobalISel when needed. This pass really needs to be turned into some kind of analysis, but I haven't found a nice way use one here.	2020-05-19 18:51:00 -04:00
Cameron McInally	e89a08aefd	[SVE] MOVPRFX zero merging test renaming Differential Revision: https://reviews.llvm.org/D80244	2020-05-19 17:33:19 -05:00
Matt Arsenault	08ae945318	GlobalISel: Copy correct flags to select This was looking for a compare condition, and copying the compare flags. I don't think this was ever correct outside of certain min/max patterns which aren't checked, but this probably predates select instructions having fast math flags.	2020-05-19 18:31:24 -04:00
Matt Arsenault	074b802654	AMDGPU: Fix DAG divergence for implicit function arguments This should be directly implied from the register class, and there's no need to special case live ins here. This was getting the wrong answer for the queue ptr argument in callable functions, since it's not an explicit IR argument and is always uniform. Fixes not using scalar loads for the aperture in addrspacecast lowering, and any other places that use implicit SGPR arguments.	2020-05-19 18:11:34 -04:00
Eli Friedman	5d2c3a0b8c	[AArch64] Disable MachineOutliner on Windows. The handling of unwind info is broken, so disable it for now.	2020-05-19 13:49:03 -07:00
Thomas Lively	8a43d41a40	[WebAssembly] Fix bug in custom shuffle combine Summary: The code previously assumed the source of the bitcast in the combined pattern was a vector type, but this is not always true. This patch adds a check to avoid an assertion failure in that case. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80164	2020-05-19 12:54:15 -07:00
Thomas Lively	3181273be7	[WebAssembly] Implement i64x2.mul and remove i8x16.mul Summary: This reflects changes in the spec proposal made since basic arithmetic was first implemented. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D80174	2020-05-19 12:50:44 -07:00
Craig Topper	ccba60a784	[StackColoring] When remapping alloca's move the To alloca if the From alloca is before it. If To is after From its possible that there's a use of From between them. Fixes issue reported here http://lists.llvm.org/pipermail/llvm-dev/2020-May/141421.html Differential Revision: https://reviews.llvm.org/D80101	2020-05-19 10:37:27 -07:00
Matt Arsenault	a7759d1785	GlobalISel: Fix IRTranslator for constantexpr selects This was assuming a select is always an instruction, which is not true.	2020-05-19 09:52:48 -04:00
Sam Parker	e86f3075f8	[NFC][ARM] Add more tail predication tests	2020-05-19 14:01:10 +01:00
Carl Ritson	eeece6dbe6	[AMDGPU] Add more VMEM to SALU WAR hazard tests. NFC	2020-05-19 19:52:13 +09:00
Jonas Paulsson	b3bd0c37ec	[SystemZ] Eliminate the need to create a zero vector by reusing the VPERM mask. Try to avoid creating VGBMs by reusing the permutation mask if it contains a zero. If the first byte was into (any byte of) a zero vector, then the first byte of the mask can become zero and reused by putting the mask also as the first operand. If there instead was a first-byte use of the other source operand, then that zero index can be reused if the mask is placed as the second operand. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D79925	2020-05-19 09:37:19 +02:00
Qiu Chaofan	ad4f196e25	[NFC] [PowerPC] Refresh fma-mutate.ll using script This is a clean-up after D78989. The old comments are out of date.	2020-05-19 13:39:58 +08:00
Chen Zheng	a6be4d17e3	[PowerPC-QPX] adjust operands order of qpx fma instructions. convert %3 = QVFMADD %2, %0, %1, implicit $rm to %3 = QVFMADD %2, %1, %0, implicit $rm Reviewed By: hfinkel, steven.zhang Differential Revision: https://reviews.llvm.org/D78986	2020-05-18 22:59:51 -04:00
Yonghong Song	8e8f1bd75a	[BPF] Return fail if disassembled insn registers out of range Daniel reported a llvm-objdump segfault like below: $ llvm-objdump -D bpf_xdp.o ... 0000000000000000 <.strtab>: 0: 00 63 69 6c 69 75 6d 5f <unknown> 1: 6c 62 36 5f 61 66 66 69 w2 <<= w6 ... (llvm-objdump: lib/Target/BPF/BPFGenAsmWriter.inc:1087: static const char* llvm::BPFInstPrinter::getRegisterName(unsigned int): Assertion `RegNo && RegNo < 25 && "Invalid register number!"' failed. Stack dump: 0. Program arguments: llvm-objdump -D bpf_xdp.o ... abort ... llvm::BPFInstPrinter::getRegisterName(unsigned int) llvm::BPFInstPrinter::printMemOperand(llvm::MCInst const, int, llvm::raw_ostream&, char const) llvm::BPFInstPrinter::printInstruction(llvm::MCInst const, unsigned long, llvm::raw_ostream&) llvm::BPFInstPrinter::printInst(llvm::MCInst const, unsigned long, llvm::StringRef, llvm::MCSubtargetInfo const&, llvm::raw_ostream&) ... Basically, since -D enables disassembly for all sections, .strtab is also disassembled, but some strings are decoded as legal instructions but with illegal register numbers. When llvm-objdump tries to print register name for these illegal register numbers, assertion and segfault happens. The patch fixed the issue by returning fail for a disassembled insn if that insn contains a reg operand with illegal reg number. The insn will be printed as "<unknown>" instead of causing an assertion.	2020-05-18 18:53:23 -07:00
Chen Zheng	4a69eda6f3	[PowerPC][MachineCombiner] add testcase for reassociating FMA - NFC	2020-05-18 21:18:01 -04:00
Yonghong Song	ddff9799d2	[BPF] Prevent disassembly segfault for NOP insn For a simple program like below: -bash-4.4$ cat t.c int test() { asm volatile("r0 = r0" ::); return 0; } compiled with clang -target bpf -O2 -c t.c the following llvm-objdump command will segfault. llvm-objdump -d t.o 0: bf 00 00 00 00 00 00 00 nop llvm-objdump: ../include/llvm/ADT/SmallVector.h:180 ... Assertion `idx < size()' failed ... abort ... llvm::BPFInstPrinter::printOperand llvm::BPFInstPrinter::printInstruction ... The reason is both NOP and MOV_rr (r0 = r0) having the same encoding. The disassembly getInstruction() decodes to be a NOP instruciton but during printInstruction() the same encoding is interpreted as a MOV_rr instruction. Such a mismatcch caused the segfault. The fix is to make NOP instruction as CodeGen only so disassembler will skip NOP insn for disassembling. Note that instruction "r0 = r0" should not appear in non inline asm codes since BPF Machine Instruction Peephole optimization will remove it. Differential Revision: https://reviews.llvm.org/D80156	2020-05-18 17:40:18 -07:00

34024 Commits