llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	2bdfcf0cac	GlobalISel: Move AArch64 AssignFnVarArg to base class We can handle the distinction easily enough in the generic code, and this makes it easier to abstract the selection of type/location from the code to insert code.	2021-05-11 19:50:12 -04:00
Jordan Rupprecht	fec2945998	Revert "[GVN] Clobber partially aliased loads." This reverts commit `6c57044231`. It causes assertion errors due to widening atomic loads, and potentially causes miscompile elsewhere too. Repro, also posted to D95543:
```
$ cat repro.ll
; ModuleID = 'repro.ll'
source_filename = "repro.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.widget = type { i32 }
%struct.baz = type { i32, %struct.snork }
%struct.snork = type { %struct.spam }
%struct.spam = type { i32, i32 }

@global = external local_unnamed_addr global %struct.widget, align 4
@global.1 = external local_unnamed_addr global i8, align 1
@global.2 = external local_unnamed_addr global i32, align 4

define void @zot(%struct.baz* %arg) local_unnamed_addr align 2 {
bb:
  %tmp = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1
  %tmp1 = bitcast %struct.snork* %tmp to i64*
  %tmp2 = load i64, i64* %tmp1, align 4
  %tmp3 = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1, i32 0, i32 1
  %tmp4 = icmp ugt i64 %tmp2, 4294967295
  br label %bb5

bb5:                                              ; preds = %bb14, %bb
  %tmp6 = load i32, i32* %tmp3, align 4
  %tmp7 = icmp ne i32 %tmp6, 0
  %tmp8 = select i1 %tmp7, i1 %tmp4, i1 false
  %tmp9 = zext i1 %tmp8 to i8
  store i8 %tmp9, i8* @global.1, align 1
  %tmp10 = load i32, i32* @global.2, align 4
  switch i32 %tmp10, label %bb11 [
    i32 1, label %bb12
    i32 2, label %bb12
  ]

bb11:                                             ; preds = %bb5
  br label %bb14

bb12:                                             ; preds = %bb5, %bb5
  %tmp13 = load atomic i32, i32* getelementptr inbounds (%struct.widget, %struct.widget* @global, i64 0, i32 0) acquire, align 4
  br label %bb14

bb14:                                             ; preds = %bb12, %bb11
  br label %bb5
}
$ opt -O2 repro.ll -disable-output
opt: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Utils/VNCoercion.cpp:496: llvm::Value *llvm::VNCoercion::getLoadValueForLoad(llvm::LoadInst *, unsigned int, llvm::Type *, llvm::Instruction *, const llvm::DataLayout &): Assertion `SrcVal->isSimple() && "Cannot widen volatile/atomic load!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0. Program arguments: /home/rupprecht/dev/opt -O2 repro.ll -disable-output
...
```
	2021-05-11 16:08:53 -07:00
Lang Hames	d63860a052	[JITLink] Fix bogus format string.	2021-05-11 16:04:00 -07:00
Congzhe Cao	40e3aa39bd	[LoopInterchange] Fix legality for triangular loops This is a bug fix in legality check. When we encounter triangular loops such as the following form: for (int i = 0; i < m; i++) for (int j = 0; j < i; j++), or for (int i = 0; i < m; i++) for (int j = 0; j*i < n; j++), we should not perform interchange since the number of executions of the loop body will be different before and after interchange, resulting in incorrect results. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D101305	2021-05-11 18:36:53 -04:00
Petr Hosek	8280ece0c9	[Coverage] Support overriding compilation directory When making compilation relocatable, for example in distributed compilation scenarios, we want to set compilation dir to a relative value like `.` but this presents a problem when generating reports because if the file path is relative as well, for example `..`, you may end up writing files outside of the output directory. This change introduces a flag that allows overriding the compilation directory that's stored inside the profile with a different value that is absolute. Differential Revision: https://reviews.llvm.org/D100232	2021-05-11 15:26:45 -07:00
Lang Hames	a0162a81b1	[JITLink][MachO/x86_64] Expose API for creating eh-frame fixing passes. These can be used to create eh-frame section fixing passes outside the usual linker pipeline, which can be useful for tests and tools that just want to verify or dump graphs.	2021-05-11 15:26:16 -07:00
Lang Hames	74a96b4c98	[JITLink][x86-64] Add an x86_64 PointerSize constexpr. This can be used in place of magic '8' values in generic x86-64 utilities.	2021-05-11 15:26:15 -07:00
Lang Hames	cbcfca343f	[JITLink] Make LinkGraph debug dumps more readable. This commit reorders some fields and fixes the width of others to try to maintain more consistent columns. It also switches to long-hand scope and linkage names, since LinkGraph dumps aren't read often enough for single-character codes to be memorable.	2021-05-11 15:26:15 -07:00
Congzhe Cao	d3f89d4d16	Revert "[LoopInterchange] Fix legality for triangular loops" This reverts commit `29342291d2`. The test case requires an assert build. Will add REQUIRES and re-commit.	2021-05-11 18:10:58 -04:00
Petr Hosek	489a3531a4	[llvm-cov] Support for v4 format in convert-for-testing v4 moves function records to a dedicated section so we need to write and read it separately. https://reviews.llvm.org/D100535	2021-05-11 14:41:55 -07:00
Evandro Menezes	3a64b7080d	[RISCV] Move instruction information into the RISCVII namespace (NFC) Move instruction attributes into the `RISCVII` namespace and add associated helper functions. Differential Revision: https://reviews.llvm.org/D102268	2021-05-11 16:32:42 -05:00
Nikita Popov	1556540372	[InstCombine] Clean up one-hot merge optimization (NFC) Remove the requirement that the instruction is a BinaryOperator, make the predicate check more compact and use slightly more meaningful naming for the and operands.	2021-05-11 23:22:11 +02:00
Austin Kerbow	4433f4601e	[AMDGPU] Fix extra waitcnt being added with BUFFER_INVL2 The waitcnt pass would increment the number of vmem events for some buffer invalidates that were not handled by the pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D102252	2021-05-11 13:17:33 -07:00
Albion Fung	ffbffaf6b6	[PowerPC] Improve codegen for int-to-fp conversion of subword vector extract When an integer is converted into floating point in subword vector extract, it can be done in 2 instructions instead of the 3+ instructions it generates right now. This patch removes the unnecessary generation. Differential: https://reviews.llvm.org/D100604	2021-05-11 15:00:11 -05:00
Amara Emerson	69069509b2	[AArch64][GlobaISel] Mark target generic instructions as HasNoSideEffects. One test needed updating because the newly side-effect-free instructions were now being DCE'd.	2021-05-11 12:38:53 -07:00
Roman Lebedev	97e04d41e6	[X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): canonicalize to integer type This way we don't have to duplicate i32/f32 and i64/f64 entries, which was already forgotten to be done for a few tuples.	2021-05-11 21:35:58 +03:00
Fangrui Song	129f466e22	[GlobalOpt] Remove heap SROA GlobalOpt implements a heap SROA (SROA for a malloc-allocated struct or array of structs) which is largely undertested (heap-sra-[1234].ll are basically the same test with very little difference) and does not trigger at all when bootstrapping clang (it only supports the case of one single store). The heap SROA implementation causes PR50027 (GEP is not properly handled; crash or miscompile). Just drop the implementation. I have deleted some obviously duplicated tests but kept `heap-sra-[12]{,-no-nullopt}.ll`. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D102257	2021-05-11 11:34:37 -07:00
Amara Emerson	ae2b36e8bd	[AArch64][GlobalISel] Support truncstorei8/i16 w/ combine to form truncating G_STOREs. This needs some tablegen changes so that we can actually import the patterns properly. Differential Revision: https://reviews.llvm.org/D102204	2021-05-11 11:33:03 -07:00
Fangrui Song	ec27c5f170	[RISCV] Prefer to lower MC_GlobalAddress operands to .Lfoo$local Similar to X86 D73230 and AArch64 D101872 With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode, for default visibility external linkage non-ifunc-non-COMDAT definitions. For such dso_local definitions, variable access/taking the address of a function/calling a function will go through a local alias to avoid GOT/PLT. Reviewed By: jrtc27, luismarques Differential Revision: https://reviews.llvm.org/D101875	2021-05-11 11:29:45 -07:00
Eli Friedman	61cbbba7a6	[ArgumentPromotion] Fix byval alignment handling. Make sure the alignment of the generated operations matches the alignment of the byval argument. Previously, we were just ignoring alignment and getting lucky. While I'm here, also delete the unnecessary "tail" handling. Passing a pointer to a byval argument to a "tail" call is UB, so rewriting to an alloca doesn't require any special handling. Differential Revision: https://reviews.llvm.org/D89819	2021-05-11 11:22:18 -07:00
Sam Powell	cba508fb67	[TextAPI] Reformat llvm_unreachable message Change llvm_unreachable message from "Unknown llvm.MachO.PlatformKind enum" to "Unknown llvm::MachO::PlatformKind enum". Differential revision: https://reviews.llvm.org/D102250	2021-05-11 09:59:26 -07:00
Craig Topper	ce6e4f27dd	[RISCV] Use fractional LMULs for fixed length types smaller than riscv-v-vector-bits-min. My thought process is that if v2i64 is an LMUL=1 type then v2i32 should be an LMUL=1/2 type. We limit the fractional LMUL so that SEW=64 clips to LMUL=1, SEW=32 clips to LMUL=1/2, etc. This ensures there's always a fractional LMUL available to truncate a type. This does reduce the number of vsetvlis in some cases. Some tests increase vsetvlis because the best container type for a mask type is dependent on the LMUL+SEW that the mask was produced from, but you can't tell that from the type. I think this is something we need to solve in the machine IR when optimizing vsetvlis. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D101215	2021-05-11 09:42:48 -07:00
Roman Lebedev	5f78ba001c	[X86][Codegen] Shift amount mod: sh? i64 x, (32-y) --> sh? i64 x, -(y+32) I've seen this in the RawSpeed's BitPumpMSB*::push() hotpath, after fixing the buffer abstraction to a more sane one, when looking into a +5% runtime regression. I was hoping that this would fix it, but it does not look like it does. This seems to be at least not worse than the original pattern. But I'm actually mainly interested in the case where we already compute `(y+32)` (see last test), https://alive2.llvm.org/ce/z/ZCzJio Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D101944	2021-05-11 19:39:41 +03:00
Craig Topper	dc00cbb505	[RISCV] Match trunc_vector_vl+sra_vl/srl_vl with splat shift amount to vnsra/vnsrl. Limited to splats because we would need to truncate the shift amount vector otherwise. I tried to do this with new ISD nodes and a DAG combine to avoid such a large pattern, but we don't form the splat until LegalizeDAG and need DAG combine to remove a scalable->fixed->scalable cast before it becomes visible to the shift node. By the time that happens we've already visited the truncate node and won't revisit it. I think I have an idea how to improve i64 on RV32 that I'll save for a follow-up. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102019	2021-05-11 09:29:31 -07:00
Steven Wu	4eff946947	[IR][AutoUpgrade] Drop align attribute from void return types Since D87304, `align` became an invalid attribute on non-pointer types and the verifier will reject bitcode that has an invalid `align` attribute. The problem is that before the change, DeadArgumentElimination could easily turn a pointer return type into a void return type without removing the `align` attribute. Teach AutoUpgrade to remove the invalid `align` attribute from return types to maintain bitcode compatibility. rdar://77022993 Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D102201	2021-05-11 08:23:55 -07:00
Congzhe Cao	29342291d2	[LoopInterchange] Fix legality for triangular loops This is a bug fix in legality check. When we encounter triangular loops such as the following form: for (int i = 0; i < m; i++) for (int j = 0; j < i; j++), or for (int i = 0; i < m; i++) for (int j = 0; j*i < n; j++), we should not perform interchange since the number of executions of the loop body will be different before and after interchange, resulting in incorrect results. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D101305	2021-05-11 11:00:46 -04:00
Aakanksha Patil	c58912eca7	Fix typo "Execpt" in comments Differential Revision: https://reviews.llvm.org/D101858	2021-05-11 10:47:01 -04:00
Paul C. Anagnostopoulos	46402eb103	Revert "[TableGen] Make the NUL character invalid in .td files" At least one build uses a 'sed' that does not understand \x00. This reverts commit cf9647011c4f05e1eb4423c6637d84e2f26b2042.	2021-05-11 10:43:13 -04:00
Florian Hahn	faebc6bf10	[VPlan] Register recipe for instr if the simplified value is recipe. If the simplified VPValue is a recipe, we need to register it for Instr, in case it needs to be recorded. The way this is handled in general may change soon, following some post-commit comments. This fixes PR50298.	2021-05-11 14:32:34 +01:00
Roman Lebedev	69ed93a435	[X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost() Now that getMemoryOpCost() correctly handles all the vector variants, we should no longer hand-roll our own version of it, but use it directly. The AVX512 variant probably needs a similar change, but there it is less obvious.	2021-05-11 16:28:00 +03:00
Paul C. Anagnostopoulos	6ca2bdb03c	[TableGen] Make the NUL character invalid in .td files Differential Revision: https://reviews.llvm.org/D101923	2021-05-11 09:20:42 -04:00
Simon Pilgrim	759b97e55a	[X86] Replace repeated isa/cast<ConstantSDNode> calls with a single dyn_cast<>. NFCI. Noticed while looking at D101944	2021-05-11 14:18:45 +01:00
Simon Pilgrim	9acc03ad92	[X86][SSE] Replace foldShuffleOfHorizOp with generalized version in canonicalizeShuffleMaskWithHorizOp foldShuffleOfHorizOp only handled basic shufps(hop(x,y),hop(z,w)) folds - by moving this to canonicalizeShuffleMaskWithHorizOp we can work with more general/combined v4x32 shuffles masks, float/integer domains and support shuffle-of-packs as well. The next step will be to support 256/512-bit vector cases.	2021-05-11 14:18:45 +01:00
Matt Arsenault	bce3cca488	CodeGen: Fix null dereference before null check	2021-05-11 09:07:32 -04:00
Roman Lebedev	c02476f315	[X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again Instead of handling power-of-two sized vector chunks, try handling the large vector in a stream mode, decreasing the operational vector size once it no longer works for the elements left to process. Notably, this improves costs for overaligned loads - loading padding is fine. This more directly tracks when we need to insert/extract the YMM/XMM subvector, some costs fluctuate because of that. Reviewed By: RKSimon, ABataev Differential Revision: https://reviews.llvm.org/D100684	2021-05-11 16:02:22 +03:00
Sanjay Patel	49950cb1f6	[SLP] restrict matching of load combine candidates The test example from https://llvm.org/PR50256 (and reduced here) shows that we can match a load combine candidate even when there are no "or" instructions. We can avoid that by confirming that we do see an "or". This doesn't apply when matching an or-reduction because that match begins from the operands of the reduction. Differential Revision: https://reviews.llvm.org/D102074	2021-05-11 08:46:40 -04:00
Piotr Sobczak	09fe84abb4	[AMDGPU] Move code sinking before structurizer Moving code sinking pass before structurizer creates more sinking opportunities. The extra flow edges introduced by the structurizer can have adverse effects on sinking, because the sinking pass prefers moving instructions to blocks with unique predecessors and the structurizer destroys that property in some cases. A notable example is moving high-latency image instructions across kills. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D101115	2021-05-11 14:07:23 +02:00
Stefan Pintilie	c79bc5942d	[PowerPC][Bug] Fix Bug in Stack Frame Update Code The stack frame update code does not take into consideration spilling to registers for callee saved registers. The option -ppc-enable-pe-vector-spills turns on spilling to registers for callee saved registers and may expose a bug in the code that moves a stack frame pointer update instruction. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D101366	2021-05-11 05:54:07 -05:00
Denis Antrushin	df47368d40	[RegAllocFast] properly handle STATEPOINT instruction. STATEPOINT is a fancy and complex pseudo instruction which has both tied defs and regmask operand. Basic FastRA algorithm is as follows: 1. Mark registers used by defs as free 2. If instruction has regmask operand displace clobbered registers according to regmask. 3. Assign registers for use operands. In case of tied defs step 1 is replaced with allocation of registers for them. But regmask is still processed, which may displace already allocated registers. As a result, tied use and def will get assigned to different registers. This patch makes FastRA process the instruction's RegMask (if any) when checking for physical register interference. That way tied operands won't get registers clobbered by regmask. Reviewed By: arsenm, skatkov Differential Revision: https://reviews.llvm.org/D99284	2021-05-11 17:27:00 +07:00
Andy Wingo	b2f21b145a	[CodeGen][WebAssembly] Better lowering for WASM_SYMBOL_TYPE_GLOBAL symbols As we have been missing support for WebAssembly globals on the IR level, the lowering of WASM_SYMBOL_TYPE_GLOBAL to IR was incomplete. This commit fleshes out the lowering support, lowering references to and definitions of addrspace(1) values to correctly typed WASM_SYMBOL_TYPE_GLOBAL symbols. Depends on D101608. Differential Revision: https://reviews.llvm.org/D101913	2021-05-11 11:47:40 +02:00
Paulo Matos	d7086af214	[WebAssembly] Support for WebAssembly globals in LLVM IR This patch adds support for WebAssembly globals in LLVM IR, representing them as pointers to global values, in a non-default, non-integral address space. Instruction selection legalizes loads and stores to these pointers to new WebAssemblyISD nodes GLOBAL_GET and GLOBAL_SET. Once the lowering creates the new nodes, tablegen pattern matches those and converts them to Wasm global.get/set of the appropriate type. Based on work by Paulo Matos in https://reviews.llvm.org/D95425. Reviewed By: pmatos Differential Revision: https://reviews.llvm.org/D101608	2021-05-11 11:19:29 +02:00
Alex Orlov	05d1ae4e18	* Add support for JSON output style to llvm-symbolizer This patch adds JSON output style to llvm-symbolizer to better support CLI automation by providing a machine readable output. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D96883	2021-05-11 13:10:54 +04:00
Hsiangkai Wang	d8ec2b183e	[RISCV] Fix the calculation of the offset of Zvlsseg spilling. For Zvlsseg spilling, we need to convert the pseudo instructions into multiple vector load/store instructions with appropriate offsets. For example, for PseudoVSPILL3_M2, we need to convert it to VS2R %v2, %base ADDI %base, %base, (vlenb x 2) VS2R %v4, %base ADDI %base, %base, (vlenb x 2) VS2R %v6, %base We need to keep the size of the offset in the pseudo spilling instructions. In this case, it is (vlenb x 2). In the original implementation, we used the size of the frame object divided by the number of vectors in the zvlsseg type. The size of the frame object is not necessarily exactly the same as the size of the spilled data. It may be larger. So, we change it to (VLENB x LMUL) in this patch. The calculation is more direct and easier to understand. Differential Revision: https://reviews.llvm.org/D101869	2021-05-11 10:13:18 +08:00
Stanislav Mekhanoshin	22d295f695	[AMDGPU] Constant fold Intrinsic::amdgcn_perm Differential Revision: https://reviews.llvm.org/D102203	2021-05-10 16:23:11 -07:00
Sam Clegg	3b8d2be527	Reland: "[lld][WebAssembly] Initial support merging string data" This change was originally landed in: `5000a1b4b9` It was reverted in: `061e071d8c` This change adds support for a new WASM_SEG_FLAG_STRINGS flag in the object format which works in a similar fashion to SHF_STRINGS in the ELF world. Unlike the ELF linker this support is currently limited: - No support for SHF_MERGE (non-string merging) - Always do full tail merging ("lo" can be merged with "hello") - Only support single byte strings (p2align 0) Like the ELF linker merging is only performed at `-O1` and above. This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828, although crucially it does not currently support debug sections because they are not represented by data segments (they are custom sections) Differential Revision: https://reviews.llvm.org/D97657	2021-05-10 16:03:38 -07:00
Jessica Paquette	79be9c59c6	[AArch64][GlobalISel] Add post-legalizer lowering for NEON vector fcmps This is roughly equivalent to the floating point portion of `AArch64TargetLowering::LowerVSETCC`. Main part that's missing is the v4s16 bit. This also adds helpers equivalent to `EmitVectorComparison`, and `changeVectorFPCCToAArch64CC`. This moves `changeFCMPPredToAArch64CC` out of the selector into AArch64GlobalISelUtils for the sake of code reuse. This is done in post-legalizer lowering with pseudos to simplify selection. The imported patterns end up handling selection for us this way. Differential Revision: https://reviews.llvm.org/D101782	2021-05-10 15:40:06 -07:00
Nico Weber	061e071d8c	Revert "[lld][WebAssembly] Initial support merging string data" This reverts commit `5000a1b4b9`. Breaks tests, see https://reviews.llvm.org/D97657#2749151 Easily repros locally with `ninja check-llvm-mc-webassembly`.	2021-05-10 18:28:28 -04:00
Jessica Paquette	6d8b070d96	[AArch64][GlobalISel] Enable memcpy family combines on minsize functions The combines in `tryCombineMemCpyFamily` have heuristics (e.g. `TLI.getMaxStoresPerMemset`) which consider size. So, theoretically, enabling these combines on minsize functions shouldn't be harmful. With this enabled we save 0.9% geomean on CTMark at -Oz, and 5.1% on Bullet. There are no code size regressions. Differential Revision: https://reviews.llvm.org/D102198	2021-05-10 15:25:23 -07:00
Krzysztof Parzyszek	8b9c15c281	[Hexagon] Handle loads and stores of scalar predicate vectors Handle v2i1, v4i1, and v8i1.	2021-05-10 16:42:22 -05:00
Sanjay Patel	5577e86691	[InstCombine] fold extract subvector of bitcast insertelt This is visible in the original example from: https://llvm.org/PR50055 (but this change doesn't solve the bug) https://alive2.llvm.org/ce/z/vM_Yq-	2021-05-10 17:20:10 -04:00
Roman Lebedev	6a64c462eb	[X86] AMD Zen 3: same-reg AVX YMM VPCMP is dep breaking one-idiom As measured by exegesis, and confirmed by ref docs. Still not zero-cycle :)	2021-05-10 23:49:27 +03:00
Roman Lebedev	2953245337	[X86] AMD Zen 3: same-reg AVX XMM VPCMP is dep breaking one-idiom As measured by exegesis, and confirmed by ref docs. Again, it's not zero-cycle.	2021-05-10 23:49:26 +03:00
Roman Lebedev	0f3bcb97ef	[X86] AMD Zen 3: same-reg SSE XMM PCMP is dep breaking one-idiom As measured by exegesis, and confirmed by ref docs. Much like with MMX PCMP, it does actually have to execute, though.	2021-05-10 23:49:26 +03:00
Roman Lebedev	b24edfff4f	[X86] AMD Zen 3: same-reg PCMPEQ is an MMX all-ones dep breaking idiom They are, however, not zero-cycle, and do actually execute. As measured by exegesis, and confirmed by ref docs.	2021-05-10 23:49:26 +03:00
Nikita Popov	463ea28e96	[InstCombine] Fold comparison of integers by parts Let's say you represent (i32, i32) as an i64 from which the parts are extracted with lshr/trunc. Then, if you compare two tuples by parts you get something like A[0] == B[0] && A[1] == B[1], just that the part extraction happens by lshr/trunc and not a narrow load or similar. The fold implemented here reduces such equality comparisons by converting them into a comparison on a larger part of the integer (which might be the whole integer). It handles both the "and of eq" and the conjugated "or of ne" case. I'm being conservative with one-use for now, though this could be relaxed if profitable (the base pattern converts 11 instructions into 5 instructions, but there's quite a few variations on how it can play out). Differential Revision: https://reviews.llvm.org/D101232	2021-05-10 22:22:39 +02:00
Florian Hahn	93a9a8a8d9	[VecLib] Add support for vector fns from Darwin's libsystem. This patch adds support for Darwin's libsystem math vector functions to TLI. Darwin's libsystem provides a range of vector functions for libm functions. This initial patch only adds the 2 x double and 4 x float versions, which are available on both X86 and ARM64. On X86, wider vector versions are supported as well. Reviewed By: jroelofs Differential Revision: https://reviews.llvm.org/D101856	2021-05-10 21:19:58 +01:00
Sam Clegg	5000a1b4b9	[lld][WebAssembly] Initial support merging string data This change adds support for a new WASM_SEG_FLAG_STRINGS flag in the object format which works in a similar fashion to SHF_STRINGS in the ELF world. Unlike the ELF linker this support is currently limited: - No support for SHF_MERGE (non-string merging) - Always do full tail merging ("lo" can be merged with "hello") - Only support single byte strings (p2align 0) Like the ELF linker merging is only performed at `-O1` and above. This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828, although crucially it does not currently support debug sections because they are not represented by data segments (they are custom sections) Differential Revision: https://reviews.llvm.org/D97657	2021-05-10 13:15:12 -07:00
Arthur Eubanks	85af8a8c1b	[NFC] Use ArgListEntry indirect types more in ISel lowering For opaque pointers, we're trying to avoid uses of PointerType::getElementType(). A couple of ISel places use PointerType::getElementType(). Some of these are easy to fix by using ArgListEntry's indirect types. The inalloca type wasn't stored there, as opposed to preallocated and byval which have their indirect types available, so add it and use it. Differential Revision: https://reviews.llvm.org/D101713	2021-05-10 13:05:15 -07:00
Nikita Popov	aa9b02ac75	[Inliner] Fix noalias metadata handling for instructions simplified during cloning (PR50270) Instead of using VMap, which may include instructions from the caller as a result of simplification, iterate over the (FirstNewBlock, Caller->end()) range, which will only include new instructions. Fixes https://bugs.llvm.org/show_bug.cgi?id=50270. Differential Revision: https://reviews.llvm.org/D102110	2021-05-10 21:59:59 +02:00
Stefan Pintilie	6215f49b8f	[PowerPC] Spilling to registers does not require frame index scavenging If spills are to registers instead of to the stack then a copy will be used and frame index scavenging is not required. This patch adds debug info to frame index scavenging and makes sure that spilling to registers does not cause frame index scavenging. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D101360	2021-05-10 14:42:39 -05:00
Arthur Eubanks	16748bd2fb	[TargetLowering] Only inspect attributes in the arguments for ArgListEntry Parameter attributes are considered part of the function [1], and like mismatched calling conventions [2], we can't have the verifier check for mismatched parameter attributes. [1] https://llvm.org/docs/LangRef.html#parameter-attributes [2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D101806	2021-05-10 12:35:11 -07:00
Amara Emerson	dc75499998	[GlobalISel][IRTranslator] Fix bit-test lowering dropping phi edges. For contiguous ranges we drop the last bit-test case but in doing so we skip adding the new MBB PHI edges to the list of replacement PHI edges, and as a result we incorrectly omit them in the G_PHI in finishPendingPhis(). Was found when bootstrapping clang with -O3 and GlobalISel enabled on Apple Silicon.	2021-05-10 11:59:31 -07:00
Sanjay Patel	88d8f10baf	[PassManager] add helper function to hold set of vector passes (2nd try) This is better no-functional-change-intended than the 1st attempt. As noted in D102002, there were at least 2 diffs that went unchecked in pass manager regressions tests: different pass parameters (SimplifyCFG) and an extension point/callback. Those should be lifted from the original code blocks correctly now.	2021-05-10 14:43:00 -04:00
Roman Lebedev	08cf2776ac	[X86] AMD Zen 3: sub-32-bit CMP also break dependencies They measure as having the same effect as 32-bit CMP.	2021-05-10 20:57:38 +03:00
Simon Pilgrim	e32374ed5c	[X86][SSE] canonicalizeShuffleMaskWithHorizOp - add TODO for better 256/512-bit shuffle+hop folding support. NFC.	2021-05-10 18:43:16 +01:00
Andy Kaylor	7086025d65	[Dependence Analysis] Enable delinearization of fixed sized arrays Patch by Artem Radzikhovskyy! Allow delinearization of fixed sized arrays if we can prove that the GEP indices do not overflow the array dimensions. The checks applied are similar to the ones that are used for delinearization of parametric size arrays. Make sure that the GEP indices are non-negative and that they are smaller than the range of that dimension. Changes Summary: - Updated the LIT tests with more exact values, as we are able to delinearize and apply more exact tests - profitability.ll - now able to delinearize in all cases, no need to use -da-disable-delinearization-checks flag and run the test twice - loop-interchange-optimization-remarks.ll - in one of the cases we are able to delinearize without using -da-disable-delinearization-checks - SimpleSIVNoValidityCheckFixedSize.ll - removed unnecessary "-da-disable-delinearization-checks" flag. Now can get the exact answer without it. - SimpleSIVNoValidityCheckFixedSize.ll and PreliminaryNoValidityCheckFixedSize.ll - made negative tests more explicit, in order to demonstrate the need for "-da-disable-delinearization-checks" flag Differential Revision: https://reviews.llvm.org/D101486	2021-05-10 10:30:15 -07:00
Craig Topper	80b9510806	[RISCV] Correct VL for fixed length masked scatter. We were incorrectly calling getVectorNumElements on a scalable vector type. This shouldn't be allowed. This gives a warning on EVT, but not MVT.	2021-05-10 09:50:08 -07:00
Tomasz Miąsko	2961f86317	[Demangle][Rust] Parse basic types Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D102142	2021-05-10 09:44:46 -07:00
Harald van Dijk	b0ef2070bc	[X86] Fix position-independent TType encoding The logic for x86_64 position-independent TType encodings was backwards, using 8 bytes where 4 were wanted and 4 where 8 were wanted. For regular x86_64, this was mostly harmless, exception tables are allowed to use 8-byte encodings even when it is not needed. For the large code model, and for X32, however, the generated exception tables were wrong. For the large code model, we cannot assume that the address will fit in 4 bytes. For X32, we cannot use 64-bit relocations. Fixes PR50148. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D102132	2021-05-10 17:04:33 +01:00
Momchil Velikov	5c7b43aa82	[clang][AArch32] Correctly align HA arguments when passed on the stack Analogously to https://reviews.llvm.org/D98794 this patch uses the `alignstack` attribute to fix incorrect passing of homogeneous aggregate (HA) arguments on AArch32. The EABI/AAPCS was recently updated to clarify how VFP co-processor candidates are aligned: `4488e34998` Differential Revision: https://reviews.llvm.org/D100853	2021-05-10 16:28:46 +01:00
Sanjay Patel	822be4bec8	Revert "[PassManager] add helper function to hold set of vector passes" This reverts commit `fefcb1f878`. It was supposed to be NFC, but as noted in the post-commit comments in D102002, that was not true: SimplifyCFG uses different parameters and there's a difference in an extension point / callback.	2021-05-10 10:59:30 -04:00
Zarko Todorovski	0c41f77857	[PowerPC] Enable safe for 32bit vins* P10 instructions Correctly emit `vins` instructions that are safe in 32bit mode. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D101383	2021-05-10 10:13:13 -04:00
Alexey Bataev	30463bc3f1	[SLP]Do not count perfect diamond matches for gathers several times. Need to remove the old code for avoiding double counting of the gather nodes with perfect diamond matches within the tree after we started detecting perfect/shuffled matching in the previous patch D100495. We may skip the cost for such nodes completely. Differential Revision: https://reviews.llvm.org/D102023	2021-05-10 07:08:07 -07:00
Bradley Smith	635164b95a	[AArch64][SVE] Improve SVE codegen for fixed length BITCAST Expanding a fixed length operation involves wrapping the operation in an insert/extract subvector pair, as such, when this is done to bitcast we end up with an extract_subvector of a bitcast. DAGCombine tries to convert this into a bitcast of an extract_subvector which restores the initial fixed length bitcast, causing an infinite loop of legalization. As part of this patch, we must make sure the above DAGCombine does not trigger after legalization if the created bitcast would not be legal. Differential Revision: https://reviews.llvm.org/D101990	2021-05-10 14:43:53 +01:00
Simon Pilgrim	605f90475f	X86FlagsCopyLowering.cpp - try to pass DebugLoc by const-ref to avoid costly TrackingMDNodeRef copies. NFCI.	2021-05-10 14:00:37 +01:00
Simon Pilgrim	9243a584d3	X86LoadValueInjectionLoadHardening.cpp - use const-reference in for-range loops to avoid unnecessary copies. NFCI.	2021-05-10 14:00:36 +01:00
Fraser Cormack	3212a08a8c	[Constant] Allow ConstantAggregateZero a scalable element count A ConstantAggregateZero may be created from a scalable vector type. However, it still assumed fixed number of elements when queried for them. This patch changes ConstantAggregateZero to correctly report its element count. This change fixes a couple of issues. Firstly, it fixes a crash in Constant::getUniqueValue when called on a scalable-vector zeroinitializer constant. Secondly, it fixes a latent bug in GlobalISel's IRTranslator in which translating a scalable-vector zeroinitializer would hit the assertion in ConstantAggregateZero::getNumElements when casting to a FixedVectorType, rather than reporting an error more gracefully. This is currently hypothetical as the IRTranslator has deeper issues preventing the use of scalable vector types. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D102082	2021-05-10 13:51:53 +01:00
Bradley Smith	65c89cd1a6	[AArch64][SVE] Better utilisation of unpredicated forms of remaining intrinsics When using predicated intrinsics, if the predicate used is all lanes active, use an unpredicated form of the instruction, additionally this allows for better use of immediate forms. This only includes instructions where the unpredicated/predicated forms matched in such a way that instruction selection would not introduce extra ptrue instructions. This allows us to convert the intrinsics directly to architecture independent ISD nodes. Depends on D101062 Differential Revision: https://reviews.llvm.org/D101828	2021-05-10 13:06:02 +01:00
Bradley Smith	f8f953c2a6	[AArch64][SVE] Better utilisation of unpredicated forms of arithmetic intrinsics When using predicated arithmetic intrinsics, if the predicate used is all lanes active, use an unpredicated form of the instruction, additionally this allows for better use of immediate forms. This also includes a new complex isel pattern which allows matching an all active predicate when the types are different but the predicate is a superset of the type being used. For example, to allow a b8 ptrue for a b32 predicate operand. This only includes instructions where the unpredicated/predicated forms are mismatched between variants, meaning that the removal of the predicate is done during instruction selection in order to prevent spurious re-introductions of ptrue instructions. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D101062	2021-05-10 13:05:37 +01:00
Momchil Velikov	f3139b20a0	[GlobalISel] Fix wrong invocation of `getParamStackAlign` (NFC) The function template `CallLowering::setArgFlags` is invoked both for arguments and return values. In the latter case, it calls `getParamStackAlign` with argument index `~0u`. Nothing wrong happens now, as the argument is safely incremented back to 0 inside `getParamStackAlign` (the type is `unsigned`), but in principle it's fragile and may become incorrect. Differential Revision: https://reviews.llvm.org/D102004	2021-05-10 12:16:33 +01:00
Sander de Smalen	407a33889d	[AArch64][SVE] Fix isel failure for FP-extending loads DAGCombiner tries to combine a (fpext (load)) to (fround (extload)) but SVE has no FP-extending loads. By marking these as expand, the combine no longer happens. This also fixes a similar issue for fptrunc, where the source type is not a legal type. Reviewed By: bsmith, kmclaughlin Differential Revision: https://reviews.llvm.org/D102053	2021-05-10 11:27:38 +01:00
Simon Pilgrim	ea64200b61	HexagonVectorCombine.cpp - don't negate a bool value. NFCI. Silences MSVC warning.	2021-05-10 10:50:37 +01:00
Mats Petersson	7280f4b279	[OpenMP][MLIR]Add support for guided, auto and runtime scheduling When using parallel loop construct, the OpenMP specification allows for guided, auto and runtime as scheduling variants (as well as static and dynamic which are already supported). This adds the translation from MLIR to LLVM-IR for these scheduling variants. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D101435	2021-05-10 09:18:52 +00:00
Petar Avramovic	f6985a197e	AMDGPU/GlobalISel: Use destination register bank in applyMappingLoad Large loads on target that does not useFlatForGlobal have to be split in regbankselect. This did not happen in case when destination had vgpr bank and address had sgpr bank. Instead of checking if address bank is sgpr check bank of the destination. Differential Revision: https://reviews.llvm.org/D101992	2021-05-10 10:18:30 +02:00
Fraser Cormack	6db0cedd23	[LegalizeVectorOps][RISCV] Add scalable-vector SELECT expansion This patch extends VectorLegalizer::ExpandSELECT to permit expansion also for scalable vector types. The only real change is conditionally checking for BUILD_VECTOR or SPLAT_VECTOR legality depending on the vector type. We can use this to fix "cannot select" errors for scalable vector selects on the RISCV target. Note that in future patches RISCV will possibly custom-lower vector SELECTs to VSELECTs for branchless codegen. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102063	2021-05-10 08:22:35 +01:00
Jun Ma	b3aeb13892	[AArch64][SVE] Remove index_vector node. Since index_vector is lowered into step_vector in D100816, we can just remove index_vector, use step_vector for codegen directly. Differential Revision: https://reviews.llvm.org/D101593	2021-05-10 11:08:58 +08:00
Lang Hames	7f9a89f9a2	[ORC] Use the new dispatchTask API to run query callbacks. Dispatching query callbacks, rather than running them on the current thread, will allow them to be distributed across multiple threads.	2021-05-09 19:19:40 -07:00
Lang Hames	5344c88dcb	[ORC] Generalize materialization dispatch to task dispatch. Generalizing this API allows work to be distributed more evenly. In particular, query callbacks can now be dispatched (rather than running immediately on the thread that satisfied the query). This avoids the pathological case where an operation on one thread satisfies many queries simultaneously, causing large amounts of work to be run on that thread while other threads potentially sit idle.	2021-05-09 19:19:39 -07:00
Teresa Johnson	220f6e5271	[SimplifyCFG] Ignore ephemeral values when counting insts for threading Ignore ephemeral values (only feeding llvm.assume intrinsics) when computing the instruction count to decide if a block is small enough for threading. This is similar to the handling of these values in the InlineCost computation. These instructions will eventually be removed and shouldn't count against code size (similar to the existing ignoring of phis). Without this change, when enabling -fwhole-program-vtables, which causes type test / assume sequences to be inserted by clang, we can get different threading decisions. In particular, when building with instrumentation FDO it can affect the optimization decisions before FDO matching, leading to some mismatches. Differential Revision: https://reviews.llvm.org/D101494	2021-05-09 19:06:54 -07:00
Zakk Chen	446ed6394b	[RISCV][NFC] Don't need to create a new STI in RISCVAsmPrinter. RISCVAsmPrinter already has MCSubtargetInfo. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D101889	2021-05-10 09:33:23 +08:00
Tomasz Miąsko	78e949159d	[Demangle][Rust] Print special namespaces Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D101821	2021-05-09 15:45:57 -07:00
Roman Lebedev	be23d5e814	[X86] AMD Zen 3: same-reg CMP is a zero-cycle dependency-breaking instruction As measured by exegesis, and confirmed by ref docs.	2021-05-10 00:03:20 +03:00
Roman Lebedev	11b0568dce	[X86] AMD Zen 3: same-reg SBB is a dependency-breaking instruction As confirmed by exegesis measurements, and ref docs. It does actually execute. While there, bump latency for MULX32rr, that seems to match measurements.	2021-05-10 00:03:20 +03:00
Roman Lebedev	eed8552787	[X86] AMD Zen 3: same-register XOR/SUB are GPR dependency breaking zero-idioms As measured by exegesis and confirmed in reference docs.	2021-05-10 00:03:20 +03:00
David Green	76786037c6	[ARM] Fix postinc of vst1xN These nodes are not handled correctly by CombineBaseUpdate. For the moment, similar to `5f1cad4d29` mark them as unsupported.	2021-05-09 21:57:55 +01:00
Nikita Popov	d26ca78c18	[SCEV] Handle and/or in applyLoopGuards() applyLoopGuards() already combines conditions from multiple nested guards. However, it cannot use multiple conditions on the same guard, combined using and/or. Add support for this by recursing into either `and` or `or`, depending on the direction of the branch. Differential Revision: https://reviews.llvm.org/D101692	2021-05-09 21:34:28 +02:00
Roman Lebedev	675daef58b	[NFC][X86] Znver3: drop obsolete fixme	2021-05-09 20:37:57 +03:00
Roman Lebedev	a21df76db6	[X86] AMD Zen 3: XCHG is a zero-cycle instruction As measured by exegesis and confirmed by reference docs.	2021-05-09 20:37:57 +03:00
Roman Lebedev	f858929208	[NFCI][X86] Mark Znver3 scheduling model as complete To the best of my knowledge, all instructions are modelled, and have reasonable values to them; flipping the switch doesn't cause any diff for MCA tests, so either we're good, or we have test coverage gaps. I'm not really sure why no other X86 sched model is marked as complete.	2021-05-09 01:07:07 +03:00
Roman Lebedev	d5494931f2	[NFCI][X86] Mark a few lately-added system instructions as such for Scheduling purposes	2021-05-09 01:07:07 +03:00

146902 Commits