llvm-project

Commit Graph

Author	SHA1	Message	Date
Jay Foad	e2a2df2a1e	[AMDGPU] Add test for set_gpr_idx removal with conditional branches	2021-04-30 15:01:32 +01:00
Simon Moll	43bc584dc0	[VP,Integer,#2] ExpandVectorPredication pass This patch implements expansion of llvm.vp.* intrinsics (https://llvm.org/docs/LangRef.html#vector-predication-intrinsics). VP expansion is required for targets that do not implement VP code generation. Since expansion is controllable with TTI, targets can switch on the VP intrinsics they do support in their backend offering a smooth transition strategy for VP code generation (VE, RISC-V V, ARM SVE, AVX512, ..). Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D78203	2021-04-30 15:47:28 +02:00
Jay Foad	181c492ee7	[AMDGPU] Add implicit negative check for the set_gpr_idx tests The only effect of the optimization is to remove s_set_gpr_idx_* instructions, and update_mir_test_checks.py always inserts CHECK: rather than CHECK-NEXT: checks, so without this implicit negative check, the tests would always pass even if the optimization did nothing. Differential Revision: https://reviews.llvm.org/D101622	2021-04-30 14:45:12 +01:00
Jay Foad	66b8a16cc0	[AMDGPU] Fix inconsistent ---/... in MIR tests and regenerate checks In some cases the lack of --- or ... confused update_mir_test_checks.py into not adding any checks for a function.	2021-04-30 14:10:50 +01:00
Jun Ma	b310dd1501	[AArch64][SVE] Lower index_vector to step_vector As discussed in D100107, this patch first converts index_vector to step_vector, and converts step_vector back to index_vector after LegalizeDAG. Differential Revision: https://reviews.llvm.org/D100816	2021-04-30 19:04:39 +08:00
Fraser Cormack	1d85b24762	[RISCV][NFC] Merge RV32/RV64 test checks with a common prefix	2021-04-30 09:43:48 +01:00
Fraser Cormack	791766e6d2	[RISCV] Support STEP_VECTOR with a step greater than one DAGCombiner was recently taught how to combine STEP_VECTOR nodes, meaning the step value is no longer guaranteed to be one by the time it reaches the backend for lowering. This patch supports such cases on RISC-V by lowering other step values to a multiply following the vid.v instruction. It includes a small optimization for common cases where the multiply can be expressed as a shift left. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D100856	2021-04-30 09:36:18 +01:00
Qiu Chaofan	bd48def3e2	Pre-commit test for PPC vector extraction test	2021-04-30 12:02:37 +08:00
Christudasan Devadasan	544be70864	[AMDGPU] Skip promote-alloca for insertelement/insertvalue users It is difficult to track the users of vector and aggregate types. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D101562	2021-04-30 08:37:26 +05:30
luxufan	5603ed60ad	[RISCV] Fix StackOffset calculation when using sp to access the fixed stack object in the case of rvv vector objects existed When rvv vector objects exist, using sp to access the fixed stack object passes over the rvv vector objects field, so the StackOffset needs an additional scalable offset equal to the size of the rvv vector objects field. Differential Revision: https://reviews.llvm.org/D100286	2021-04-30 11:02:38 +08:00
luxufan	325b454ed8	[RISCV] Precommit a test case that test accessing a fixed object when has rvv vector object existed Differential Revision: https://reviews.llvm.org/D100284	2021-04-30 10:35:03 +08:00
Brendon Cahoon	d7d85f72ef	[AArch64][GlobalISel] Fix width value for G_SBFX/G_UBFX When creating G_SBFX/G_UBFX opcodes, the last operand is the width instead of the bit position. The bit position is used for the AArch64 SBFM and UBFM instructions. The bit position is converted to a width if the SBFX/UBFX aliases are generated. For other SBFM/UBFM aliases, such as shifts, the bit position is used. Differential Revision: https://reviews.llvm.org/D101543	2021-04-29 21:54:19 -04:00
Matt Arsenault	e6701e575c	AMDGPU: Add missing runline to test There are checks for gfx908, but this wasn't actually running with it.	2021-04-29 20:59:22 -04:00
Zequan Wu	cab48e2f0e	[CodeGen] don't emit addrsig symbol if it's used only by metadata Value only used by metadata can be removed from .addrsig table. This solves the undefined symbol error when enabling addrsig table on COFF LTO. Differential Revision: https://reviews.llvm.org/D101512	2021-04-29 15:39:30 -07:00
jasonliu	7049fbf960	[XCOFF] Handle the case when personality routine is an alias Summary: Personality routine could be an alias to another personality routine. Fix the situation when we compile the file that contains the personality routine and the file also has functions that need to refer to the personality routine. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D101401	2021-04-29 22:03:30 +00:00
Amara Emerson	96ec6d91e4	[AArch64][GlobalISel] Simplify out of range rotate amount. Differential Revision: https://reviews.llvm.org/D101005	2021-04-29 14:05:58 -07:00
Jay Foad	16d707e656	[AMDGPU] Fix v_swap_b32 formation on physical registers As explained in the comments, matchSwap matches: // mov t, x // mov x, y // mov y, t and turns it into: // mov t, x (t is potentially dead and move eliminated) // v_swap_b32 x, y On physical registers we don't have full use-def chains so the check for T being live-out was not working properly with subregs/superregs. Differential Revision: https://reviews.llvm.org/D101546	2021-04-29 20:53:40 +01:00
Sriraman Tallam	a64411916c	Basic block sections for functions with implicit-section-name attribute Functions can have section names set via #pragma or section attributes, basic block sections should be correctly named for such functions. With #pragma, the expectation is that all functions in that file are placed in the same section in the final binary. Basic block sections should be correctly named with the unique flag set so that the final binary has all the basic blocks of the function in that named section. This patch fixes the bug by calling getExplicitSectionGlobal when implicit-section-name attribute is set to make sure the function's basic blocks get the correct section name. Differential Revision: https://reviews.llvm.org/D101311	2021-04-29 12:29:34 -07:00
Tim Northover	c1b7460b5b	Revert "RegAlloc: do not consider liveins to EH-pad successors as liveout." Some liveins can come from this block (e.g. any SSA value except the call), it's only the ones that produce `landingpad` values that can't and I didn't think it through properly.	2021-04-29 20:00:07 +01:00
Petar Avramovic	c34900e133	AMDGPU/GlobalISel: Fix selection of image intrinsics with unused return When atomic image intrinsic return value is unused, register class for destination of a sub-register copy of return value ends up not being set. This copy then hits 'Register class not set' assert later. If return value has uses, register class is determined by use instruction. Fix is to not create sub-register copy when image intrinsic destination has no uses because it would be deleted by dead-mi-elimination later anyway. Differential Revision: https://reviews.llvm.org/D101448	2021-04-29 20:56:03 +02:00
Tim Northover	438a63e13b	RegAlloc: do not consider liveins to EH-pad successors as liveout. These registers get defined by the runtime, not the block being allocated, and treating them as preassigned in RegAllocFast adds extra pressure, sometimes enough to make the function unallocatable.	2021-04-29 19:34:49 +01:00
Victor Huang	ae3377c553	[AIX][TLS] Add ASM portion changes to support TLSGD relocations to XCOFF objects - Add new variantKinds for the symbol's variable offset and region handle - Print the proper relocation specifier @gd in the asm streamer when emitting the TC Entry for the variable offset for the symbol - Fix the switch section failure between the TC Entry of variable offset and region handle - Put .__tls_get_addr symbol in the ProgramCodeSects with XTY_ER property Reviewed by: sfertile Differential Revision: https://reviews.llvm.org/D100956	2021-04-29 13:18:59 -05:00
Benjamin Kramer	df323ba445	Revert "[X86] Support AMX fast register allocation" This reverts commit `3b8ec86fd5`. Revert "[X86] Refine AMX fast register allocation" This reverts commit `c3f95e9197`. This pass breaks using LLVM in a multi-threaded environment by introducing global state.	2021-04-29 18:56:33 +02:00
Craig Topper	dcdda2bdf2	[RISCV] Teach DAG combine to fold (and (select_cc lhs, rhs, cc, -1, c), x) -> (select_cc lhs, rhs, cc, x, (and, x, c)) Similar for or/xor with 0 in place of -1. This is the canonical form produced by InstCombine for something like `c ? x & y : x;` Since we have to use control flow to expand select we'll usually end up with a mv in basic block. By folding this we may be able to pull the and/or/xor into the block instead and avoid a mv instruction. The code here is based on code from ARM that uses this to create predicated instructions. I'm doing it on SELECT_CC so it happens late, but we could do it on select earlier which is what ARM does. I'm not sure if we lose any combine opportunities if we do it earlier. I left out add and sub because this can separate sext.w from the add/sub. It also made a conditional i64 addition/subtraction on RV32 worse. I guess both of those would be fixed by doing this earlier on select. The select-binop-identity.ll test has not been committed yet, but I made the diff show the changes to it. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D101485	2021-04-29 09:43:51 -07:00
Craig Topper	60216adef1	[RISCV] Add test cases for D101485. NFC	2021-04-29 09:43:51 -07:00
Craig Topper	0c330afdfa	[RISCV] Enable SPLAT_VECTOR for fixed vXi64 types on RV32. This replaces D98479. This allows type legalization to form SPLAT_VECTOR_PARTS so we don't lose the splattedness when the scalar type is split. I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so we can continue using non-VL nodes for scalable vectors. I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR with other operations. Especially interesting is a splat BUILD_VECTOR of the extract_vector_elt which can become a splat shuffle, but won't if we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR or add visitSPLAT_VECTOR. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100803	2021-04-29 08:20:09 -07:00
Craig Topper	25391cec3a	[RISCV] Teach computeKnownBits that vsetvli returns number less than 2^31. This seems like a reasonable upper bound on VL. WG discussions for the V spec would probably allow us to use 2^16 as an upper bound on VLEN, but this is good enough for now. This allows us to remove sext and zext if user happens to assign the size_t result into an int and then uses it as a VL intrinsic argument which is size_t. Reviewed By: frasercrmck, rogfer01, arcbbb Differential Revision: https://reviews.llvm.org/D101472	2021-04-29 08:07:59 -07:00
Jay Foad	1ecddddbec	[AMDGPU] Add a v_swap_b32 test case to be fixed	2021-04-29 16:03:15 +01:00
Bradley Smith	354604a2a7	[AArch64][SVE] Use SIMD variant of INSR when scalar is the result of a vector extract At the intrinsic layer the sve.insr operation takes a scalar. When this scalar is an integer we are forcing a data transition between GPRs and ZPRs that is potentially costly. Often the integer scalar is the result of a vector extract, when performing a reduction for example. In such cases we should keep all data within the ZPRs. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D101169	2021-04-29 12:17:42 +01:00
Bradley Smith	c8f20ed448	[AArch64][SVE] Move convert.{from,to}.svbool optimization into InstCombine As part of this the ptrue coalescing done in SVEIntrinsicOpts has been modified to not introduce redundant converts, since the convert removal will no longer run after that optimisation to clean up. Differential Revision: https://reviews.llvm.org/D101302	2021-04-29 12:17:42 +01:00
Fraser Cormack	f6c54a61da	[RISCV][NFC] Combine identical RV32 and RV64 test checks	2021-04-29 11:38:10 +01:00
David Green	e11420ca23	[ARM] Ensure CSINC has one use in CSINV combine Otherwise the CMP glue may be used in multiple nodes, needing to be emitted multiple times. Currently this either increases instruction count or fails as it attempts to insert the same node multiple times.	2021-04-29 10:59:14 +01:00
Serguei Katkov	2e1150d8aa	[Greedy RA] Replace ll to mir test to make more stable to check an error.	2021-04-29 16:20:41 +07:00
Qiu Chaofan	56d923efdb	[SPE] Support constrained float operations on SPE This patch enables support on SPE for constrained arithmetic and comparison operations. This fixes bugzilla 50070. One thing not covered is fcmp vs. fcmps on SPE. Some condition codes generate signaling comparisons while others do not. In this patch, all are treated as signaling, so there might still be some issues when compiling from C code. Reviewed By: jhibbits Differential Revision: https://reviews.llvm.org/D101282	2021-04-29 16:34:10 +08:00
Fraser Cormack	43ad058a01	[RISCV] Fix stack slot for argument types (Bug 49500) This is a complementary/alternative fix for D99068. It takes a slightly different approach by explicitly summing up all of the required split part type sizes and ensuring we allocate enough space for them. It also takes the maximum alignment of each part. Compared with D99068 there are fewer changes to the stack objects in existing tests. However, @luismarques has shown in that patch that there are opportunities to reduce our stack usage in the future. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D99087	2021-04-29 09:10:48 +01:00
Harald van Dijk	1b788607f5	[X32][CET] Fix handling of indirect branches As X32 uses 32-bit pointers without having 32-bit indirect branch instructions, we need to fix up indirect branches by extending the branch targets to 64 bits. This was already done for BRIND but not yet for NT_BRIND. The same logic works for both, so this applies that existing logic to NT_BRIND as well. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D101499	2021-04-29 08:33:22 +01:00
David Green	465df35355	[ARM] Use just ARM::t2B in ARMBlockPlacementPass The ARMConstantIsland pass will convert any t2B to tB if they are within range after it has added or moved any constant pools. They don't need to be deliberately converted beforehand, and it doesn't deal with needing to convert tB to t2B very well.	2021-04-29 07:44:04 +01:00
Jessica Paquette	4d41810cf6	[AArch64][GlobalISel] Don't match thread-local globals in matchFoldGlobalOffset SelectionDAG has separate ISD opcodes for regular global values and thread-local global values, while GlobalISel does not. This combine was ported from SDAG directly without knowing that. As a result, it was running on TLS globals. This makes it so that `matchFoldGlobalOffset` doesn't match on TLS globals, and adds an assert to `selectTLSGlobalValue` to make sure that TLS globals never have offsets. Differential Revision: https://reviews.llvm.org/D101478	2021-04-28 13:48:18 -07:00
Joe Nash	168228d76a	[AMDGPU] Make some VOP3 insts commutable Note, only src0 and src1 will be commuted if the isCommutable flag is set. This patch does not change that, it just makes it possible to commute src0 and src1 of some U/I/B vop3 instructions. This patch revises `d35d8da7d6`. It contains the commute opportunities excluding float insts Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D101474 Change-Id: I62938173d750453839f2457a3851661a29135faf	2021-04-28 13:59:08 -04:00
Qiu Chaofan	d5c2492455	[PowerPC] Fix SELECT_CC with i64 operand on PPC32 This patch fixes the infinite loop in legalization of PPC32 SELECT_CC with 64-bit operand.	2021-04-28 17:48:33 +08:00
Joe Ellis	1eb81f8309	[AArch64] Add missing UINT_TO_FP promotions for v16i8 Differential Revision: https://reviews.llvm.org/D101042	2021-04-28 08:49:15 +00:00
Heejin Ahn	b4a5dd4da9	[WebAssembly] Error when wasm EH is used with Emscripten EH/SjLj - Error out when both Emscripten EH and wasm EH are used together, i.e., both `-enable-emscripten-cxx-exceptions` and `-exception-model=wasm` are given together. This will not happen if you use Emscripten, but this can happen when you call `llc` manually with wrong set of arguments. - Currently we don't yet support using wasm EH with Emscripten SjLj. Unlike `-enable-emscripten-cxx-exceptions` which is turned on only when you use `emcc -s DISABLE_EXCEPTION_CATCHING=0`, `-enable-emscripten-sjlj` is turned on by Emscripten by default. So we error out only when it is turned on and `setjmp` or `longjmp` is actually used. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D101403	2021-04-27 16:07:53 -07:00
Craig Topper	ce09dd54e6	[RISCV] Select 5 bit immediate for VSETIVLI during isel rather than peepholing in the custom inserter. This adds a special operand type that is allowed to be either an immediate or register. By giving it a unique operand type the machine verifier will ignore it. This perturbs a lot of tests but mostly it is just slightly different instruction orders. Something bad did happen to some min/max reduction tests. We're spilling vector registers when we weren't before. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D101246	2021-04-27 14:38:16 -07:00
David Green	8de7d8b2c2	[ARM] Recognize VIDUP from BUILDVECTORs of additions This adds a pattern to recognize VIDUP from BUILD_VECTOR of incrementing adds. This can come up from either geps or adds, and came up recently in D100550. We are just looking for a BUILD_VECTOR where each lane is an add of the first lane with N*i, where i is the lane and N is one of 1, 2, 4, or 8, supported by the VIDUP instruction. Differential Revision: https://reviews.llvm.org/D101263	2021-04-27 19:33:24 +01:00
David Green	268f1963af	[ARM] Additional VIDUP tests. NFC	2021-04-27 19:33:24 +01:00
Nick Desaulniers	ea8416bf4d	[CodeGenOptions] make StackProtectorGuardOffset signed GCC supports negative values for -mstack-protector-guard-offset=, this should be a signed value. Pre-req to D100919. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D101325	2021-04-27 10:12:58 -07:00
Victor Huang	241c2da406	[AIX][Power10] Restrict prefixed instructions from crossing the 64byte boundary This patch adds the support to restrict prefixed instruction from crossing the 64 byte boundary: - Add the infrastructure to register a custom XCOFF streamer - Add a custom XCOFF streamer for PowerPC to allow us to intercept instructions as they are being emitted and align all 8 byte instructions to a 64 byte boundary if required by adding a 4 byte nop. Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D101107	2021-04-27 11:55:18 -05:00
Simon Pilgrim	decab8e973	Revert rG9b7a0a50355d5 - Revert "[X86] Add support for reusing ZF etc. from locked XADD instructions (PR20841)" Still causing some sanitizer buildbot failures.	2021-04-27 15:39:20 +01:00
Simon Pilgrim	9b7a0a5035	[X86] Add support for reusing ZF etc. from locked XADD instructions (PR20841) XADD has the same EFLAGS behaviour as ADD Reapplies rG2149aa73f640 (after it was reverted at rG535df472b042) - AFAICT rG029e41ec9800 should ensure we correctly tag the LXADD* ops as load/stores - I haven't been able to repro the sanitizer buildbot fails locally so this is a speculative commit.	2021-04-27 15:01:13 +01:00
Petar Avramovic	8110fcc8fc	AMDGPU/GlobalISel: Fix negative offset folding for buffer_load Buffer_load does unsigned offset calculations. Don't fold operands of 32-bit add that are likely to cause unsigned add overflow (common case is when one of the operands is negative). Differential Revision: https://reviews.llvm.org/D91336	2021-04-27 14:45:22 +02:00

38694 Commits
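
Commit 241c2da406 above describes keeping 8-byte prefixed instructions from crossing a 64-byte boundary by emitting a 4-byte nop first. The following is only a minimal Python sketch of that arithmetic, not the streamer code from the patch. The names `nop_padding_before` and `layout` are hypothetical, and the sketch assumes every instruction offset is already 4-byte aligned, so a single nop is always enough.

```python
PREFIXED_SIZE = 8  # bytes in a prefixed instruction (from the commit message)
NOP_SIZE = 4       # bytes in the padding nop (from the commit message)
BOUNDARY = 64      # bytes; prefixed instructions must not cross this


def nop_padding_before(offset: int, inst_size: int) -> int:
    """Return how many nop bytes to emit before an instruction at `offset`.

    Hypothetical helper: assumes `offset` is a multiple of 4.
    """
    if inst_size != PREFIXED_SIZE:
        return 0  # only 8-byte prefixed instructions are constrained
    first_block = offset // BOUNDARY
    last_block = (offset + inst_size - 1) // BOUNDARY
    # The instruction crosses a boundary when its first and last bytes
    # fall in different 64-byte blocks.
    return NOP_SIZE if first_block != last_block else 0


def layout(sizes):
    """Return the start offset of each instruction after padding."""
    offset = 0
    placed = []
    for size in sizes:
        offset += nop_padding_before(offset, size)
        placed.append(offset)
        offset += size
    return placed


if __name__ == "__main__":
    # Fifteen 4-byte instructions fill bytes 0..59; the prefixed
    # instruction that follows would span 60..67, so it is pushed to 64.
    print(layout([4] * 15 + [8])[-1])
```

Under the 4-byte alignment assumption, the only offset within a block that triggers padding is 60, which is why one nop suffices in this sketch.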