llvm-project

Commit Graph

Author	SHA1	Message	Date
Aakanksha Patil	464e4dc50f	[AMDGPU] Add gfx1034 target Differential Revision: https://reviews.llvm.org/D102306	2021-05-13 14:25:18 -04:00
cynecx	8ec9fd4839	Support unwinding from inline assembly I've taken the following steps to add unwinding support from inline assembly: 1) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax: ``` invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"() to label %exit unwind label %uexit ``` 2) Add Bitcode writing/reading support + LLVM-IR parsing. 3) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled. 4) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind. 5) Add clang support by introducing a new clobber: "unwind", which lowers to `canThrow` being enabled. 6) Don't allow unwinding callbr. Reviewed By: Amanieu Differential Revision: https://reviews.llvm.org/D95745	2021-05-13 19:13:03 +01:00
Stefan Pintilie	54310fc176	[PowerPC] Add ROP Protection to prologue and epilogue Added hashst to the prologue and hashchk to the epilogue. The hash for the prologue and epilogue must always be stored as the first element in the local variable space on the stack. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D99377	2021-05-13 12:54:44 -05:00
David Green	1011d4ed60	[ARM] Constrain CMPZ shift combine to a single use We currently prefer t2CMPrs over t2CMPri when the node contains a shift. This can introduce more nodes if the shift has multiple uses though, as the value from the shift will be needed anyway, and in the case of a t2CMPri compared with zero will more readily be removed entirely. Differential Revision: https://reviews.llvm.org/D101688	2021-05-13 18:31:01 +01:00
Stanislav Mekhanoshin	8f98356bb5	[AMDGPU] Only allow global fp atomics with unsafe option Previously we allowed FP atomics without the -amdgpu-unsafe-fp-atomics option if the scope was less than system. This is not safe either if we have UC memory. This change only allows global and flat FP atomics with the unsafe option. Consequently, that makes the check for denorm mode redundant, since we skip it with the unsafe option and do not have a way to produce these instructions without it anyway. Differential Revision: https://reviews.llvm.org/D102347	2021-05-13 08:52:20 -07:00
Bradley Smith	b1a074951f	[AArch64][SVE] Fix missed immediate selection due to mishandling of signedness The complex selection pattern for add/sub shifted immediates is incorrect in its handling of incoming constant values, in that it does not properly anticipate the values being sign-extended to 32 bits. Co-authored-by: Graham Hunter <graham.hunter@arm.com> Differential Revision: https://reviews.llvm.org/D101833	2021-05-13 16:02:49 +01:00
Juneyoung Lee	395607af3c	Reapply [ConstantFold] Fold more operations to poison This was reverted to mitigate miscompiles caused by the logical and/or to bitwise and/or fold. Reapply it now that the underlying issue has been fixed by D101191. ----- This patch folds more operations to poison. Alive2 proof: https://alive2.llvm.org/ce/z/mxcb9G (it does not contain tests about div/rem because they fold to poison when raising UB) Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D92270	2021-05-13 16:04:12 +02:00
Jinsong Ji	b1509d067e	[AIX] XFAIL CodeGen/Generic/externally_available.ll Globals with “available_externally” linkage should never be emitted into the object file corresponding to the LLVM module. However, the AIX system assembler prints an error for undefined references by default, so AIX chose to emit the available externally symbols into .s, so that users won't run into errors in situations like: clang -target powerpc-ibm-aix -xc -<<<$'extern inline __attribute__((__gnu_inline__)) void foo() {}\nvoid bar() { foo(); }' -O -Xclang -disable-llvm-passes Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D102377	2021-05-13 13:24:48 +00:00
Stefan Pintilie	15051f0b4a	[PowerPC] Handle inline assembly clobber of link register This patch adds the handling of clobbers of the link register LR for inline assembly. This patch is to fix: https://bugs.llvm.org/show_bug.cgi?id=50147 Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D101657	2021-05-13 07:43:37 -05:00
Fraser Cormack	797e580db9	[RISCV][NFC] Simplify test run lines Several tests had -verify-machineinstrs twice, and several tests were explicitly specifying the default FileCheck prefix of CHECK.	2021-05-13 12:41:00 +01:00
Nemanja Ivanovic	39e4676ca7	[PowerPC] Provide doubleword vector predicate form comparisons on Power7 There are two reasons this shouldn't be restricted to Power8 and up: 1. For XL compatibility 2. Because clang will expand comparison operators to these intrinsics* *Without this patch, the following causes a selection error: int test(vector signed long a, vector signed long b) { return a < b; } This patch provides the handling for the intrinsics in the back end and removes the Power8 guards from the predicate functions (vec_{all\|any}_{eq\|ne\|gt\|ge\|lt\|le}).	2021-05-13 04:56:56 -05:00
Serge Pavlov	12537ab772	[FPEnv][X86] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D74730	2021-05-13 14:30:38 +07:00
Sam Clegg	3041b16f73	[WebAssembly] Add TLS data segment flag: WASM_SEG_FLAG_TLS Previously the linker was relying solely on the name of the segment to imply TLS. Differential Revision: https://reviews.llvm.org/D102202	2021-05-12 13:31:02 -07:00
Heejin Ahn	ba38b72ec2	[WebAssembly] Allow Wasm EH with Emscripten SjLj We explicitly made it error out in D101403, with the good intention that the error message would leave people less confused. It turns out we weren't failing all cases of wasm EH + SjLj; only a few cases were failing, and our client was able to get around them by fixing source code. But now we made it fail for all cases, so even the cases that previously succeeded fail, which we didn't intend. This reverts that change. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D102364	2021-05-12 13:27:04 -07:00
Simon Pilgrim	fb1d61b725	[X86][AVX] Fold concat(pslq(x,32),pslq(y,32)) -> shuffle(concat(x,y),zero) (PR46621) On AVX1 targets we can handle v4i64 logical shifts by 32 bits as a pair of v8f32 shuffles with zero. I was hoping to put this in LowerScalarImmediateShift, but performing that early causes regressions where other instructions were resplitting the subvectors.	2021-05-12 18:04:40 +01:00
Amara Emerson	dc8d16c03f	[AArch64][GlobalISel] Add MMOs to constant pool loads to allow LICM hoisting. This caused performance regressions vs SDAG on SingleSource/Benchmarks/Adobe-C++	2021-05-12 09:47:09 -07:00
Baptiste Saleil	5885f1a4cb	[AMDGPU] Disable the SIFormMemoryClauses pass at -O1 This patch disables the SIFormMemoryClauses pass at -O1. This pass has a significant impact on compilation time, so we only want it to be enabled starting from -O2. Differential Revision: https://reviews.llvm.org/D101939	2021-05-12 11:51:59 -04:00
Simon Pilgrim	778562ada3	[X86][AVX] Add v4i64 shift-by-32 tests AVX1 could perform this as a v8f32 shuffle instead of splitting - based off PR46621	2021-05-12 16:42:18 +01:00
Fraser Cormack	c5ec00e62b	[TargetLowering] Improve legalization of scalable vector types This patch extends the vector type-conversion and legalization capabilities of scalable vector types. Firstly, `vscale x 1` types now behave more like the corresponding `vscale x 2+` types. This enables the integer promotion legalization of extended scalable types, such as the promotion of `<vscale x 1 x i5>` to `<vscale x 1 x i8>`. These `vscale x 1` types are also now better handled by `getVectorTypeBreakdown`, where what looks like older handling for 1-element fixed-length vector types was spuriously updated to include scalable types. Widening of scalable types is now better supported, by using `INSERT_SUBVECTOR` to insert the smaller scalable vector "value" type into the wider scalable vector "part" type. This allows AArch64 to pass and return `vscale x 1` types by value by widening. There are still cases where we are unable to legalize `vscale x 1` types, such as where expansion would require splitting the vector in two. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D102073	2021-05-12 16:33:07 +01:00
Stefan Pintilie	8d37411e48	Revert "[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics" This reverts commit `6c80361b84`. Breaks PowerPC Big Endian buildbots.	2021-05-12 09:46:18 -05:00
Hendrik Greving	762ac725bf	[DAGCombiner] Fix DAG combine store elimination, different address space. Fixes a bug in the DAG combiner that eliminates the stores because it failed to inspect the address space of the pointers. %v = load %ptr_as1 // no chain side effect store %v, %ptr_as2 As well as store %v, %ptr_as1 store %v, %ptr_as2 Fixes a test for the above in X86. Differential Revision: https://reviews.llvm.org/D102096	2021-05-12 07:14:22 -07:00
Hendrik Greving	4b00ffa767	[DAGCombiner] Add test exposing bug in DAG combine. Adds a test in X86, exposing a bug in DAG combine eliminating stores that are the same value but not the same address space. Differential Revision: https://reviews.llvm.org/D102243	2021-05-12 07:14:21 -07:00
Peter Waller	3fa6510f6e	[CodeGen][AArch64][SVE] Fold [rdffr, ptest] => rdffrs; bugfix for optimizePTestInstr When a ptest is used to set flags from the output of rdffr, the ptest can be eliminated, using a flags-setting rdffrs instead. Additionally, check that nothing consumes flags between rdffr and ptest; this case appears to have been missed previously. * There is no unpredicated RDFFRS instruction. * If substituting RDFFR_PP, require that the mask argument of the PTEST matches that of the RDFFR_PP. * Move some precondition code up inside optimizePTestInstr, so that it covers the new code paths for RDFFR which return earlier. * Only consider RDFFR, PTEST in same basic block. * Check for other flag setting instructions between the two, abort if found. * Drop an old TODO comment about removing dead PTEST instructions. RDFFR_P to follow in later patch. Differential Revision: https://reviews.llvm.org/D101357	2021-05-12 15:06:22 +01:00
Julien Pagès	46adccc5cc	[AMDGPU] Improve Codegen for build_vector Improve the code generation of build_vector. Use the v_pack_b32_f16 instruction instead of v_and_b32 + v_lshl_or_b32 Differential Revision: https://reviews.llvm.org/D98081 Patch by Julien Pagès!	2021-05-12 14:17:44 +01:00
Sanjay Patel	f58e0513dd	[x86] try harder to lower to PCMPGT instead of not-of-PCMPEQ This is motivated by the example in https://llvm.org/PR50055 , but it doesn't do anything for that bug currently because we don't actually have a zero-extended setcc there. Proof for the generic transform (inverse of what we would try to do in combining): https://alive2.llvm.org/ce/z/aBL-Mg Differential Revision: https://reviews.llvm.org/D102275	2021-05-12 08:25:29 -04:00
Sanjay Patel	24d06fff55	[x86] add test for pcmpeq with 0; NFC	2021-05-12 08:25:29 -04:00
Simon Pilgrim	72e242a286	[X86][AVX] canonicalizeShuffleMaskWithHorizOp - improve support for 256/512-bit vectors Extend the HOP(HOP(X,Y),HOP(Z,W)) and SHUFFLE(HOP(X,Y),HOP(Z,W)) folds to handle repeating 256/512-bit vector cases. This allows us to drop the UNPACK(HOP(),HOP()) custom fold in combineTargetShuffle. This required isRepeatedTargetShuffleMask to be tweaked to support target shuffle masks taking more than 2 inputs.	2021-05-12 12:13:24 +01:00
Peter Waller	6e6f9a636b	[AArch64][SVE] Improve sve.convert.to.svbool lowering The sve.convert.to.svbool lowering has the effect of widening a logical <M x i1> vector representing lanes into a physical <16 x i1> vector representing bits in a predicate register. In general, if converting to svbool, the contents of lanes in the physical register might not be known. For sve.convert.to.svbool the new lanes are specified to be zeroed, requiring 'and' instructions to mask off the new lanes. For lanes coming from a ptrue or a comparison, however, they are known to be zero. CodeGen Before: ptrue p0.s, vl16 ptrue p1.s ptrue p2.b and p0.b, p2/z, p0.b, p1.b ret After: ptrue p0.s, vl16 ret Differential Revision: https://reviews.llvm.org/D101544	2021-05-12 10:57:25 +01:00
Piotr Sobczak	68137ef568	[AMDGPU] Skip invariant loads when avoiding WAR conflicts No need to handle invariant loads when avoiding WAR conflicts, as there cannot be a vector store to the same memory location. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D101177	2021-05-12 10:57:05 +02:00
Tomas Matheson	34c098b780	[ARM] Prevent spilling between ldrex/strex pairs Based on the same for AArch64: `4751cadcca` At -O0, the fast register allocator may insert spills between the ldrex and strex instructions inserted by AtomicExpandPass when expanding atomicrmw instructions in LL/SC loops. To avoid this, expand to cmpxchg loops and therefore expand the cmpxchg pseudos after register allocation. Required a tweak to ARMExpandPseudo::ExpandCMP_SWAP to use the 4-byte encoding of UXT, since the pseudo instruction can be allocated a high register (R8-R15) which the 2-byte encoding doesn't support. However, the 4-byte encodings are not present for ARM v8-M Baseline. To enable this, two new pseudos are added for Thumb which are only valid for v8mbase, tCMP_SWAP_8 and tCMP_SWAP_16. The previously committed attempt in D101164 had to be reverted due to runtime failures in the test suites. Rather than spending time fixing that implementation (adding another implementation of atomic operations and more divergence between backends) I have chosen to follow the approach taken in D101163. Differential Revision: https://reviews.llvm.org/D101898 Depends on D101912	2021-05-12 09:43:21 +01:00
Tomas Matheson	edf9d88266	[ARM] Precommit test for D101898 Differential Revision: https://reviews.llvm.org/D101912	2021-05-12 09:43:21 +01:00
Matt Arsenault	cc79aaced0	AMDGPU: Fix SILoadStoreOptimizer for gfx90a This was hardcoding the register class to use for the newly created pointer registers, violating the aligned VGPR requirement.	2021-05-11 21:26:43 -04:00
Matt Arsenault	a15ed701ab	AMDGPU: Fix assert on constant load from addrspacecasted pointer This was trying to create a bitcast between different address spaces.	2021-05-11 20:12:20 -04:00
Matt Arsenault	24e2e5df0e	GlobalISel: Split ValueHandler into assignment and emission classes Currently the ValueHandler handles both selecting the type and location for arguments, as well as inserting instructions needed to handle them. Split this so that the determination of the argument handling is independent of the function state. Currently the checks for tail call compatibility do not follow the full assignment logic, so it misses cases where arguments require nontrivial legalization. This should help avoid targets ending up in a buggy state where the argument evaluation may change in different contexts.	2021-05-11 19:50:12 -04:00
Austin Kerbow	4433f4601e	[AMDGPU] Fix extra waitcnt being added with BUFFER_INVL2 The waitcnt pass would increment the number of vmem events for some buffer invalidates that were not handled by the pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D102252	2021-05-11 13:17:33 -07:00
Craig Topper	d092dd56ae	[RISCV] Regenerate stepvector.ll. NFC It looks like the RV32 and RV64 prefixes were removed from the RUN lines while another patch was in review that added check lines that used them.	2021-05-11 13:04:57 -07:00
Albion Fung	ffbffaf6b6	[PowerPC] Improve codegen for int-to-fp conversion of subword vector extract When an integer is converted into floating point in subword vector extract, it can be done in 2 instructions instead of the 3+ instructions it generates right now. This patch removes the unnecessary generation. Differential: https://reviews.llvm.org/D100604	2021-05-11 15:00:11 -05:00
Amara Emerson	69069509b2	[AArch64][GlobalISel] Mark target generic instructions as HasNoSideEffects. One test needed updating because the newly side-effect-free instructions were now being DCE'd.	2021-05-11 12:38:53 -07:00
Amara Emerson	ae2b36e8bd	[AArch64][GlobalISel] Support truncstorei8/i16 w/ combine to form truncating G_STOREs. This needs some tablegen changes so that we can actually import the patterns properly. Differential Revision: https://reviews.llvm.org/D102204	2021-05-11 11:33:03 -07:00
Fangrui Song	ec27c5f170	[RISCV] Prefer to lower MC_GlobalAddress operands to .Lfoo$local Similar to X86 D73230 and AArch64 D101872 With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode, for default visibility external linkage non-ifunc-non-COMDAT definitions. For such dso_local definitions, variable access/taking the address of a function/calling a function will go through a local alias to avoid GOT/PLT. Reviewed By: jrtc27, luismarques Differential Revision: https://reviews.llvm.org/D101875	2021-05-11 11:29:45 -07:00
Simon Pilgrim	4f80340fb6	[X86][SSE] Add tests for permute(phaddw(phaddw(x,y),phaddw(z,w))) -> phaddw(phaddw(),phaddw()) folds. We currently only fold if NumEltsPerLane == 4	2021-05-11 17:47:10 +01:00
Craig Topper	ce6e4f27dd	[RISCV] Use fractional LMULs for fixed length types smaller than riscv-v-vector-bits-min. My thought process is that if v2i64 is an LMUL=1 type then v2i32 should be an LMUL=1/2 type. We limit the fractional LMUL so that SEW=64 clips to LMUL=1, SEW=32 clips to LMUL=1/2, etc. This ensures there's always a fractional LMUL available to truncate a type. This does reduce the number of vsetvlis in some cases. Some tests increase vsetvlis because the best container type for a mask type is dependent on the LMUL+SEW that the mask was produced from, but you can't tell that from the type. I think this is something we need to solve in the machine IR when optimizing vsetvlis. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D101215	2021-05-11 09:42:48 -07:00
Roman Lebedev	5f78ba001c	[X86][Codegen] Shift amount mod: sh? i64 x, (32-y) --> sh? i64 x, -(y+32) I've seen this in the RawSpeed's BitPumpMSB*::push() hotpath, after fixing the buffer abstraction to a more sane one, when looking into a +5% runtime regression. I was hoping that this would fix it, but it does not look like it does. This seems to be at least not worse than the original pattern. But I'm actually mainly interested in the case where we already compute `(y+32)` (see last test), https://alive2.llvm.org/ce/z/ZCzJio Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D101944	2021-05-11 19:39:41 +03:00
Craig Topper	dc00cbb505	[RISCV] Match trunc_vector_vl+sra_vl/srl_vl with splat shift amount to vnsra/vnsrl. Limited to splats because we would need to truncate the shift amount vector otherwise. I tried to do this with new ISD nodes and a DAG combine to avoid such a large pattern, but we don't form the splat until LegalizeDAG and need DAG combine to remove a scalable->fixed->scalable cast before it becomes visible to the shift node. By the time that happens we've already visited the truncate node and won't revisit it. I think I have an idea how to improve i64 on RV32 I'll save for a follow up. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102019	2021-05-11 09:29:31 -07:00
Roman Lebedev	2c1f9f390b	[NFC][X86] Precommit another testcase for D101944	2021-05-11 18:34:43 +03:00
Simon Pilgrim	9acc03ad92	[X86][SSE] Replace foldShuffleOfHorizOp with generalized version in canonicalizeShuffleMaskWithHorizOp foldShuffleOfHorizOp only handled basic shufps(hop(x,y),hop(z,w)) folds - by moving this to canonicalizeShuffleMaskWithHorizOp we can work with more general/combined v4x32 shuffles masks, float/integer domains and support shuffle-of-packs as well. The next step will be to support 256/512-bit vector cases.	2021-05-11 14:18:45 +01:00
Piotr Sobczak	09fe84abb4	[AMDGPU] Move code sinking before structurizer Moving code sinking pass before structurizer creates more sinking opportunities. The extra flow edges introduced by the structurizer can have adverse effects on sinking, because the sinking pass prefers moving instructions to blocks with unique predecessors and the structurizer destroys that property in some cases. A notable example is moving high-latency image instructions across kills. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D101115	2021-05-11 14:07:23 +02:00
Stefan Pintilie	c79bc5942d	[PowerPC][Bug] Fix Bug in Stack Frame Update Code The stack frame update code does not take into consideration spilling to registers for callee saved registers. The option -ppc-enable-pe-vector-spills turns on spilling to registers for callee saved registers and may expose a bug in the code that moves a stack frame pointer update instruction. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D101366	2021-05-11 05:54:07 -05:00
Denis Antrushin	df47368d40	[RegAllocFast] properly handle STATEPOINT instruction. STATEPOINT is a fancy and complex pseudo instruction which has both tied defs and regmask operand. Basic FastRA algorithm is as follows: 1. Mark registers used by defs as free 2. If instruction has regmask operand displace clobbered registers according to regmask. 3. Assign registers for use operands. In case of tied defs step 1 is replaced with allocation of registers for them. But regmask is still processed, which may displace already allocated registers. As a result, tied use and def will get assigned to different registers. This patch makes FastRA process the instruction's RegMask (if any) when checking for physical register interference. That way tied operands won't get registers clobbered by regmask. Reviewed By: arsenm, skatkov Differential Revision: https://reviews.llvm.org/D99284	2021-05-11 17:27:00 +07:00
Jay Foad	3b873831c4	[AMDGPU] Add some GFX10.3 testing. NFC.	2021-05-11 11:21:19 +01:00

38893 Commits