llvm-project

Commit Graph

Author	SHA1	Message	Date
Tim Northover	88464a55b4	AArch64: emit @llvm.debugtrap as `brk #0xf000` on all platforms It's useful for a debugger to be able to distinguish an @llvm.debugtrap from a (noreturn) @llvm.trap, so this extends the existing Windows behaviour to other platforms.	2020-07-20 10:31:26 +01:00
Petar Avramovic	ba938f6388	AMDGPU/GlobalISel: Legalize s16->s64 G_FPTOSI/G_FPTOUI Add narrowScalarFor action. Add narrow scalar for typeIndex == 0 for G_FPTOSI/G_FPTOUI. Legalize using narrowScalarFor as s16->s32 G_FPTOSI/G_FPTOUI followed by s32->s64 G_SEXT/G_ZEXT. Differential Revision: https://reviews.llvm.org/D84010	2020-07-20 11:06:11 +02:00
Roman Lebedev	3de4166325	[NFC][SimplifyCFG] Add standalone test for common code hoisting xform option Also, move one test into it's correct place	2020-07-20 10:29:29 +03:00
Sanjay Patel	50afa18772	[x86] split FMA with fast-math-flags to avoid libcall fma reassoc A, B, C --> fadd (fmul A, B), C (when target has no FMA hardware) C/C++ code may use explicit fma() calls (which become LLVM fma intrinsics in IR) but then gets compiled with -ffast-math or similar. For targets that do not have FMA hardware, we don't want to go out to the math library for a precise but slow FMA result. I tried this as a generic DAGCombine, but it caused infinite looping on more than 1 other target, so there's likely some over-reaching fma formation happening. There's also a potential intersection of strict FP with fast-math here. Deferring to current behavior for that case (assuming that strict-ness overrides fast-ness). Differential Revision: https://reviews.llvm.org/D83981	2020-07-19 10:03:55 -04:00
David Green	3504acc33e	[ARM] Don't mark vctp as having sideeffects As far as I can tell, it should not be necessary for VCTP to be unpredictable in tail predicated loops. Either it has a a valid loop counter as a operand which will naturally keep it in the right loop, or it doesn't and it won't be converted to a tail predicated loop. Not marking it as having side effects allows it to be scheduled more cleanly for cases where it is not expected to become a tail predicate loop. Differential Revision: https://reviews.llvm.org/D83907	2020-07-19 09:28:09 +01:00
Kang Zhang	d37befdfe5	[PowerPC] Remove the redundant implicit operands in ppc-early-ret pass Summary: In the `ppc-early-ret` pass, we have use `BuildMI` and `copyImplicitOps` when the branch instructions can do the early return. But the two functions will add implicit operands twice, this is not correct. This patch is to remove the redundant implicit operands in `ppc-early-ret pass`. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D76042	2020-07-19 07:01:45 +00:00
Matt Arsenault	c73df56966	AMDGPU/GlobalISel: Address some test fixmes that don't fail now	2020-07-18 10:54:39 -04:00
Matt Arsenault	918f3fc2c7	AMDGPU/GlobalISel: Fix test copy paste error	2020-07-18 10:09:01 -04:00
Evgeny Leviant	24089928be	[CodeGen][TargetPassConfig] Add TargetTransformInfo pass correctly Patch adds tti pass directly enforcing its execution with correctly set TargetTransformInfo. Differential revision: https://reviews.llvm.org/D84047	2020-07-18 14:11:40 +03:00
Dmitry Preobrazhensky	2e87acac9b	[AMDGPU] Removed s_mov_regrd and mov_fed opcodes These opcodes are not intended for public use. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D81659	2020-07-17 19:52:54 +03:00
Yonghong Song	0e347c0ff0	BPF: generate .rodata BTF datasec for certain initialized local var's Currently, BTF datasec type for .rodata is generated only if there are user-defined readonly global variables which have debuginfo generated. Certain readonly global variables may be generated from initialized local variables. For example, void foo(const void *); int test() { const struct { unsigned a[4]; char b; } val = { .a = {2, 3, 4, 5}, .b = 6 }; foo(&val); return 0; } The clang will create a private linkage const global to store the initialized value: @__const.test.val = private unnamed_addr constant %struct.anon { [4 x i32] [i32 2, i32 3, i32 4, i32 5], i8 6 }, align 4 This global variable eventually is put in .rodata ELF section. If there is .rodata ELF section, libbpf expects a BTF .rodata datasec as well even though it may be empty meaning there are no global readonly variables with proper debuginfo. Martin reported a bug where without this empty BTF .rodata datasec, the bpftool gen will exit with an error. This patch fixed the issue by generating .rodata BTF datasec if there exists local var intial data which will result in .rodata ELF section. Differential Revision: https://reviews.llvm.org/D84002	2020-07-17 09:45:57 -07:00
Matt Arsenault	994fb86bc2	AMDGPU: Fix promoting f16 fpowi with legal f16	2020-07-17 11:29:05 -04:00
Jay Foad	f05bce86af	[AMDGPU] Add some missing check prefixes and tweak test The test needed some extra ALU instructions to prevent it from being memory bound.	2020-07-17 12:57:47 +01:00
Jay Foad	2dc3d1b313	[AMDGPU] Add some missing check prefixes	2020-07-17 12:56:29 +01:00
Sanjay Patel	7598ad3ead	[x86] add tests for FMA with FMF; NFC	2020-07-17 07:52:29 -04:00
Sam Tebbs	6c348e4067	[HWLoops] Stop converting to a while loop when it would be unsafe to There were cases where a do-while loop would be converted to a while loop before finding out that it would be unsafe to expand the SCEV in this situation and then bailing out of hardware loop conversion. This patch checks if it would be unsafe to expand the SCEV and if so stops converting the do-while into a while, allowing conversion to a hardware loop. Differential Revision: https://reviews.llvm.org/D83953	2020-07-17 11:47:08 +01:00
Jay Foad	760af7a074	[AMDGPU] Avoid splitting FLAT offsets in unsafe ways As explained in the comment: // For a FLAT instruction the hardware decides whether to access // global/scratch/shared memory based on the high bits of vaddr, // ignoring the offset field, so we have to ensure that when we add // remainder to vaddr it still points into the same underlying object. // The easiest way to do that is to make sure that we split the offset // into two pieces that are both >= 0 or both <= 0. In particular FLAT (as opposed to SCRATCH and GLOBAL) instructions have an unsigned immediate offset field, so we can't use it to help split a negative offset. Differential Revision: https://reviews.llvm.org/D83394	2020-07-17 11:44:10 +01:00
Jay Foad	62fd7f767c	[MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics tryLatency compares two sched candidates. For the top zone it prefers the one with lesser depth, but only if that depth is greater than the total latency of the instructions we've already scheduled -- otherwise its latency would be hidden and there would be no stall. Unfortunately it only tests the depth of one of the candidates. This can lead to situations where the TopDepthReduce heuristic does not kick in, but a lower priority heuristic chooses the other candidate, whose depth is greater than the already scheduled latency, which causes a stall. The fix is to apply the heuristic if the depth of either candidate is greater than the already scheduled latency. All this also applies to the BotHeightReduce heuristic in the bottom zone. Differential Revision: https://reviews.llvm.org/D72392	2020-07-17 11:02:13 +01:00
Florian Hahn	e297006d6f	[ScheduleDAG] Move DBG_VALUEs after first term forward. MBBs are not allowed to have non-terminator instructions after the first terminator. Currently in some cases (see the modified test), EmitSchedule can add DBG_VALUEs after the last terminator, for example when referring a debug value that gets folded into a TCRETURN instruction on ARM. This patch updates EmitSchedule to move inserted DBG_VALUEs just before the first terminator. I am not sure if there are terminators produce values that can in turn be used by a DBG_VALUE. In that case, moving the DBG_VALUE might result in referencing an undefined register. But in any case, it seems like currently there is no way to insert a proper DBG_VALUEs for such registers anyways. Alternatively it might make sense to just remove those extra DBG_VALUES. I am not too familiar with the details of debug info in the backend and would appreciate any suggestions on how to address the issue in the best possible way. Reviewers: vsk, aprantl, jpaquette, efriedma, paquette Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D83561	2020-07-17 10:27:43 +01:00
Kai Luo	817767abee	[PowerPC] Precommit test case for PR46759. NFC.	2020-07-17 08:41:15 +00:00
Simon Wallis	3e0ccf9a90	[ARM] halfword store hits llvm_unreachable with big-endian Summary: [ARM] halfword store hits llvm_unreachable with big-endian Provide missing case in getFixupKindContainerSizeBytes(). This stops execution reaching llvm_unreachable("Unknown fixup kind!") D83947 Reviewers: olista01, ostannard Reviewed By: ostannard Subscribers: ostannard, kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83947 Change-Id: I598aa1fb51fd1c6f424c557c85d6df6d1958bc62	2020-07-17 08:56:44 +01:00
hsmahesha	4905536086	Revert "[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size" This reverts commit `cc9d693856`.	2020-07-17 12:20:37 +05:30
Craig Topper	6bba95831e	[X86] Change the scheduler model for 'pentium4' to SandyBridgeModel. I meant to do this in D83913, but missed it while updating the feature list. Interestingly I think this is disabling the postRA scheduler. But it does match our default 64-bit behavior. Reviewed By: echristo Differential Revision: https://reviews.llvm.org/D83996	2020-07-16 22:04:29 -07:00
Carl Ritson	3a18665748	[AMDGPU] Translate s_and/s_andn2 to s_mov in vcc optimisation When SCC is dead, but VCC is required then replace s_and / s_andn2 with s_mov into VCC when mask value is 0 or -1. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D83850	2020-07-17 11:48:57 +09:00
Wouter van Oortmerssen	cc1b9b680f	[WebAssembly] 64-bit (function) pointer fixes. Accounting for the fact that Wasm function indices are 32-bit, but in wasm64 we want uniform 64-bit pointers. Includes reloc types for 64-bit table indices. Differential Revision: https://reviews.llvm.org/D83729	2020-07-16 14:10:22 -07:00
Craig Topper	5408024fa8	[X86] Move integer hadd/hsub formation into a helper function shared by combineAdd and combineSub. There was a lot of duplicate code here for checking the VT and subtarget. Moving it into a helper avoids that. It also fixes a bug that combineAdd reused Op0/Op1 after a call to isHorizontalBinOp may have changed it. The new helper function has its own local version of Op0/Op1 that aren't shared by other code. Fixes PR46455. Reviewed By: spatel, bkramer Differential Revision: https://reviews.llvm.org/D83971	2020-07-16 13:27:27 -07:00
Matt Arsenault	6c5b635e95	AMDGPU: Add a few more missing test for AGPR tuple copying	2020-07-16 15:53:11 -04:00
Craig Topper	ad171d24b9	[X86] Change the tuning settings for pentium4 to be more modern since its the default 32-bit cpu in clang Alternative to D83897. I believe the big change here is that I removed slow unaligned memory 16 Down side that it may adversely effect tuning if someone explicitly targets -march=pentium4 and expects pentium4 tuned code. Of course pentium4 is so old our default behavior with the previous settings may not have been the best either. Reviewed By: echristo, RKSimon Differential Revision: https://reviews.llvm.org/D83913	2020-07-16 12:51:25 -07:00
Matt Arsenault	10382285ac	AMDGPU: Add missing tests for copyPhysReg AGPR tuples	2020-07-16 15:27:57 -04:00
Thomas Lively	ecb2e5bcd7	[WebAssembly] Implement v128.select Although the SIMD spec proposal does not specifically include a select instruction, the select instruction in MVP WebAssembly is polymorphic over the selected types, so it is able to work on v128 values when they are enabled. This patch introduces a new variant of the select instruction for each legal vector type. Additional ISel patterns are adapted from the SELECT_I32 and SELECT_I64 patterns. Depends on D83736. Differential Revision: https://reviews.llvm.org/D83737	2020-07-16 11:37:25 -07:00
Thomas Lively	f7868f87ac	[WebAssembly] Autogenerate tests for simd-select.ll Updating the simd-select.ll tests manually with consistent named regexps for the register numbers was taking more time than it was worth, so this patch updates that test file to have autogenerated output. This is not a significant readability regression because the tests in that file are all very small. Depends on D83734. Differential Revision: https://reviews.llvm.org/D83736	2020-07-16 11:19:09 -07:00
Thomas Lively	f0f9787646	[WebAssembly] Lower vselect to v128.bitselect We were previously expanding vselect and matching on the expansion to generate bitselects, but in some cases the expansion would be further combined and a bitselect would not get generated. This patch improves codegen in those cases by legalizing vselect and lowering it to v128.bitselect. The old pattern that matches the expansion is still useful for lowering IR that already uses the expansion rather than a select operation. Differential Revision: https://reviews.llvm.org/D83734	2020-07-16 11:11:19 -07:00
Craig Topper	9adf7461f7	[X86] Add test case for PR46455.	2020-07-16 11:06:55 -07:00
Matt Arsenault	9d3e56e2ee	DAG: Try scalarizing when expanding saturating add/sub In an upcoming AMDGPU patch, the scalar cases will be legal and vector ops should be scalarized, rather than producing a long sequence of vector ops which will also need to be scalarized. Use a lazy heuristic that seems to work and improves the thumb2 MVE test.	2020-07-16 14:05:16 -04:00
Denis Antrushin	ef658ebd62	MIR Statepoint refactoring. Part 1: Basic MI level changes. Basic support for variadic-def MIR Statepoint: - Change TableGen STATEPOINT description to variadic out list (For self-documentation purpose; by itself it does not affect code generation in any way). - Update StatepointOpers helper class to handle variadic defs. - Update MachineVerifier to properly handle them, too. With this change, new Statepoint instruction can be passed through backend (excluding ISEL) without errors. Full change set is available at D81603. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D81645	2020-07-17 00:57:21 +07:00
Matt Arsenault	79f67cae91	AMDGPU: Rename add/sub with carry out instructions The hardware has created a real mess in the naming for add/sub, which have been renamed basically every generation. Switch the carry out pseudos to have the gfx9/gfx10 names. We were using the original SI/CI v_add_i32/v_sub_i32 names. Later targets reintroduced these names as carryless instructions with a saturating clamp bit, which we do not define. Do this rename so we can unambiguously add these missing instructions. The carry-in versions should also be renamed, but at least those had a consistent _u32 name to begin with. The 16-bit instructions were also renamed, but aren't ambiguous. This does regress assembler error message quality in some cases. In mismatched wave32/wave64 situations, this will switch from "unsupported instruction" to "invalid operand", with the error pointing at the wrong position. I couldn't quite follow how the assembler selects these, but the previous behavior seemed accidental to me. It looked like there was a partial attempt to handle this which was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it isn't used for anything).	2020-07-16 13:16:30 -04:00
Petar Avramovic	6850033ca6	AMDGPU/GlobalISel: Legalize s64->s16 G_SITOFP/G_UITOFP Add widenScalar for TypeIdx == 0 for G_SITOFP/G_UITOFP. Legailize, using widenScalar, as s64->s32 G_SITOFP/G_UITOFP followed by s32->s16 G_FPTRUNC. Differential Revision: https://reviews.llvm.org/D83880	2020-07-16 16:31:57 +02:00
Jay Foad	fc2317f0f5	[PowerPC] Precommit 64-bit funnel shift test cases	2020-07-16 15:20:52 +01:00
James Y Knight	60433c63ac	Remove TwoAddressInstructionPass::sink3AddrInstruction. This function has a bug which will incorrectly reschedule instructions after an INLINEASM_BR (which can branch). (The bug may also allow scheduling past a throwing-CALL, I'm not certain.) I could fix that bug, but, as the removed FIXME notes, it's better to attempt rescheduling before converting to 3-addr form, as that may remove the need to convert in the first place. In fact, the code to do such reordering was added to this pass only a few months later, in 2011, via the addition of the function rescheduleMIBelowKill. That code does not contain the same bug. The removal of the sink3AddrInstruction function is not a no-op: in some cases it would move an instruction post-conversion, when rescheduleMIBelowKill would not move the instruction pre-converison. However, this does not appear to be important: the machine instruction scheduler can reorder the after-conversion instructions, in any case. This patch fixes a kernel panic 4.4 LTS x86_64 Linux kernels, when built with clang after `4b0aa5724f`. Link: https://github.com/ClangBuiltLinux/linux/issues/1085 Differential Revision: https://reviews.llvm.org/D83708	2020-07-16 10:02:52 -04:00
Jay Foad	482753fe9c	[PowerPC] Use CHECK-LABEL for better diagnostics	2020-07-16 13:41:29 +01:00
Paul Walker	509351d768	[SVE] Add lowering for scalable vector fadd, fdiv, fmul and fsub operations. Lower the operations to predicated variants. This is prep work required for fixed length code generation but also fixes a bug whereby these operations fail selection when "unpacked" vector types (e.g. MVT::nxv2f32) are used. This patch also adds the missing "unpacked" patterns for FMA. Differential Revision: https://reviews.llvm.org/D83765	2020-07-16 11:31:35 +00:00
Pavel Iliin	b9a6fb6428	[ARM] VBIT/VBIF support added. Vector bitwise selects are matched by pseudo VBSP instruction and expanded to VBSL/VBIT/VBIF after register allocation depend on operands registers to minimize extra copies.	2020-07-16 11:25:53 +01:00
David Green	146d35b6ee	[ARM] CSEL generation This adds a peephole optimisation to turn a t2MOVccr that could not be folded into any other instruction into a CSEL on 8.1-m. The t2MOVccr would usually be expanded into a conditional mov, that becomes an IT; MOV pair. We can instead generate a CSEL instruction, which can potentially be smaller and allows better register allocation freedom, which can help reduce codesize. Performance is more variable and may depend on the micrarchitecture details, but initial results look good. If we need to control this per-cpu, we can add a subtarget feature as we need it. Original patch by David Penry. Differential Revision: https://reviews.llvm.org/D83566	2020-07-16 11:10:53 +01:00
Kerry McLaughlin	2762da0a16	[SVE][CodeGen] Legalisation of masked loads and stores Summary: This patch modifies IncrementMemoryAddress to use a vscale when calculating the new address if the data type is scalable. Also adds tablegen patterns which match an extract_subvector of a legal predicate type with zip1/zip2 instructions Reviewers: sdesmalen, efriedma, david-arm Reviewed By: efriedma, david-arm Subscribers: tschuett, hiraditya, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83137	2020-07-16 10:55:45 +01:00
Petar Avramovic	5658002b80	AMDGPU/GlobalISel: Select G_FREEZE Select G_FREEZE in the same way that COPY is selected. Differential Revision: https://reviews.llvm.org/D83031	2020-07-16 11:10:48 +02:00
Amy Kwan	fc55308628	[PowerPC][Power10] Fix VINS* (vector insert byte/half/word) instructions to have i32 arguments. Previously, the vins* intrinsic was incorrectly defined to have its second and third argument arguments as an i64. This patch fixes the second and third argument of the vins* instruction and intrinsic to have i32s instead. Differential Revision: https://reviews.llvm.org/D83497	2020-07-16 00:30:24 -05:00
Carl Ritson	5bf2a9dd40	[AMDGPU] Update VMEM scalar write hazard mitigation sequence Using s_waitcnt_depctr 0xffe3 is potentially faster than v_nop. Reviewed By: rampitec, foad Differential Revision: https://reviews.llvm.org/D83872	2020-07-16 11:37:45 +09:00
dfukalov	7520393842	[NFC] Fixed typo in tests parameters Summary: llc reports `fp32-denormals` is not recognized. I guess it was intended to be `-denormal-fp-math-f32={preserve-sign\|ieee} -mattr=+mad-mac-f32-insts` Reviewers: rampitec Reviewed By: rampitec Subscribers: jvesely, nhaehnle, llvm-commits, kerbowa Tags: #llvm Differential Revision: https://reviews.llvm.org/D83883	2020-07-15 22:09:01 +03:00
Hiroshi Yamauchi	f233b92f92	[PGO][PGSO] Add profile guided size optimization to LegalizeDAG. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83333	2020-07-15 10:03:38 -07:00
lewis-revill	c9c955ada8	[RISCV] Add matching of codegen patterns to RISCV Bit Manipulation Zbt asm instructions This patch provides optimization of bit manipulation operations by enabling the +experimental-b target feature. It adds matching of single block patterns of instructions to specific bit-manip instructions from the ternary subset (zbt subextension) of the experimental B extension of RISC-V. It adds also the correspondent codegen tests. This patch is based on Claire Wolf's proposal for the bit manipulation extension of RISCV: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf Differential Revision: https://reviews.llvm.org/D79875	2020-07-15 12:19:34 +01:00

1 2 3 4 5 ...

34870 Commits