llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	fdc902774e	[DAG][AMDGPU][X86] Add SimplifyMultipleUseDemandedBits handling for SIGN/ZERO_EXTEND + SIGN/ZERO_EXTEND_VECTOR_INREG Peek through multiple use ops like we already do for ANY_EXTEND/ANY_EXTEND_VECTOR_INREG Differential Revision: https://reviews.llvm.org/D84863	2020-07-29 18:10:59 +01:00
Roman Lebedev	1d51dc38d8	[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHI's, and some of the loops aren't even in a rotated form. After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure. Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike it's friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline. I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently i've picked the same one as for code sinking, but i suppose we could enable it as soon as right after loop rotation happens. Experimentation shows that this does indeed unsurprizingly help, more loops got rotated, although other issues remain elsewhere. Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, that may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late. As per benchmarks i've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure this will expose many more pre-existing missed optimizations, as usual :S llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprizingly) * size impact varies; for ThinLTO it's actually an improvement The largest fallout appears to be in GVN's load partial redundancy elimination, it spends much more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time. If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions It is likely to be sub-optimal for when optimizing for code size, so one might want to change tune pipeline by enabling sinking/hoisting when optimizing for size. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D84108	2020-07-29 20:05:30 +03:00
Kang Zhang	802c043078	[PowerPC] Set v1i128 to expand for SETCC to avoid crash Summary: PPC only supports the instruction selection for v16i8, v8i16, v4i32, v2i64, v4f32 and v2f64 for ISD::SETCC, don't support the v1i128, so v1i128 for ISD::SETCC will crash. This patch is to set v1i128 to expand to avoid crash. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D84238	2020-07-29 16:39:27 +00:00
Philip Reames	31342eb63e	[Statepoint] When using the tied def lowering, unconditionally use vregs [almost NFC] This builds on `3da1a96` on the path towards supporting invokes and cross block relocations. The actual change attempts to be NFC, but does fail in one corner-case explained below. The change itself is fairly mechanical. Rather than remember SDValues - which are inherently block local - immediately produce a virtual register copy and remember that. Once this lands, we'll update the FunctionLoweringInfo::StatepointSpillMap map to allow register based lowerings, delete VirtRegs from StatepointLowering, and drop the restriction against cross block relocations. I deliberately separate the semantic part into it's own change for easy of understanding and fault isolation. The corner-case which isn't quite NFC is that the old implementation implicitly CSEd gc.relocates of the same SDValue regardless of type. The new implementation still only relocates once, but it produces distinct vregs for the bitcast and it's source, whereas SelectionDAG's generic CSE was able to remove the bitcast in the old implementation. Note that the final assembly doesn't change (at least in the test), as our MI level optimizations catch the duplication. I assert that this is an uninteresting corner-case. It's functionally correct, and if we find a case where this influences performance, we should really be canonicalizing types to i8* at the IR level. Differential Revision: https://reviews.llvm.org/D84692	2020-07-29 09:23:52 -07:00
Joel E. Denny	cee52dd026	[OpenMP] Implement TR8 `present` motion modifier in runtime (2/2) This patch implements OpenMP runtime support for the OpenMP TR8 `present` motion modifier for `omp target update` directives. The previous patch in this series implements Clang front end support. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D84712	2020-07-29 12:18:50 -04:00
Joel E. Denny	9f2f3b9de6	[OpenMP] Implement TR8 `present` motion modifier in Clang (1/2) This patch implements Clang front end support for the OpenMP TR8 `present` motion modifier for `omp target update` directives. The next patch in this series implements OpenMP runtime support. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D84711	2020-07-29 12:18:45 -04:00
Arthur Eubanks	4a10029d7e	[NewPM][Attributor] Pin tests with -attributor to legacy PM All these tests already explicitly test against both legacy PM and NPM. $ sed -i 's/ -attributor / -attributor -enable-new-pm=0 /g' $(rg --path-separator // -l -- -passes=) $ sed -i 's/ -attributor-cgscc / -attributor-cgscc -enable-new-pm=0 /g' $(rg --path-separator // -l -- -passes=) Now all tests in Transforms/Attributor/ pass under NPM. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D84813	2020-07-29 09:02:30 -07:00
Sanjay Patel	3e8534fbc6	[InstSimplify] allow partial undef constants for vector min/max folds	2020-07-29 11:53:41 -04:00
Sanjay Patel	3c20ede18b	[InstSimplify] fold integer min/max intrinsic with same args	2020-07-29 11:53:41 -04:00
Kang Zhang	a4ade9ed21	[MachineVerifier] Handle the PHI node for verifyLiveVariables() Summary: When doing MachineVerifier for LiveVariables, the MachineVerifier pass will calculate the LiveVariables, and compares the result with the result livevars pass gave. If they are different, verifyLiveVariables() will give error. But when we calculate the LiveVariables in MachineVerifier, we don't consider the PHI node, while livevars considers. This patch is to fix above bug. Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D80274	2020-07-29 15:43:47 +00:00
Matt Arsenault	d42c7b2211	AMDGPU: Account for the size of LDS globals used through constant expressions. Also "fix" the longstanding bug where the computed size depends on the order of the visitation. We could try to predict the allocation order used by legalization, but it would never be 100% perfect. Until we start fixing the addresses somehow (or have a more reliable allocation scheme later), just try to compute the size based on the worst case padding.	2020-07-29 11:40:42 -04:00
Nathan James	bbc2ddecbd	[clang-tidy] Handled insertion only fixits when determining conflicts. Handle insertion fix-its when removing incompatible errors by introducting a new EventType `ET_Insert` This has lower prioirty than End events, but higher than begin. Idea being If an insert is at the same place as a begin event, the insert should be processed first to reduce unnecessary conflicts. Likewise if its at the same place as an end event, process the end event first for the same reason. This also fixes https://bugs.llvm.org/show_bug.cgi?id=46511. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D82898	2020-07-29 16:35:44 +01:00
David Sherwood	9ad7c980bb	[SVE] Don't consider scalable vector types in SLPVectorizerPass::vectorizeChainsInBlock In vectorizeChainsInBlock we try to collect chains of PHI nodes that have the same element type, but the code is relying upon the implicit conversion from TypeSize -> uint64_t. For now, I have modified the code to ignore PHI nodes with scalable types. Differential Revision: https://reviews.llvm.org/D83542	2020-07-29 16:29:19 +01:00
Yuanfang Chen	7a2e1122ae	[NewPM][PassInstrument] Make PrintIR and TimePasses to use before-pass-run callback Reviewed By: asbirlea, aeubanks Differential Revision: https://reviews.llvm.org/D84773	2020-07-29 08:26:36 -07:00
Yuanfang Chen	5cf0c2e67b	[NewPM][PassInstrument] Add a new kind of before-pass callback that only get called if the pass is not skipped TODO * PrintIRInstrumentation and TimePassesHandler would be using this new callback. * "Running pass" logging will also be moved to use this callback. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D84772	2020-07-29 08:26:36 -07:00
Johannes Doerfert	ee05167cc4	[OpenMP] Allow traits for the OpenMP context selector `isa` It was unclear what `isa` was supposed to mean so we did not provide any traits for this context selector. With this patch we will allow any string or identifier. We use the target attribute and target info to determine if the trait matches. In other words, we will check if the provided value is a target feature that is available (at the call site). Fixes PR46338 Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D83281	2020-07-29 10:22:27 -05:00
Johannes Doerfert	7db017bf34	[OpenMP][Docs] Update Clang Support docs after D75591	2020-07-29 10:21:05 -05:00
Simon Wallis	6a05c6bfc8	[MachineCopyPropagation] BackwardPropagatableCopy: add check for hasOverlappingMultipleDef In MachineCopyPropagation::BackwardPropagatableCopy(), a check is added for multiple destination registers. The copy propagation is avoided if the copied destination register is the same register as another destination on the same instruction. A new test is added. This used to fail on ARM like this: error: unpredictable instruction, RdHi and RdLo must be different umull r9, r9, lr, r0 Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D82638	2020-07-29 16:21:01 +01:00
Nathan James	62beb7c6f4	[clang-tidy] Fix module options being registered with different priorities Not a bug that is ever likely to materialise, but still worth fixing Reviewed By: DmitryPolukhin Differential Revision: https://reviews.llvm.org/D84850	2020-07-29 16:19:07 +01:00
Xing GUO	bfa140376d	[DWARFYAML] Make the field names consistent with the DWARF spec. NFC. This patch replaces 'AddrSize'/'SegSize' with 'AddressSize'/'SegmentSelectorSize'. NFC.	2020-07-29 23:10:08 +08:00
Sanjay Patel	9ee7d7122c	[ConstantFolding] fold integer min/max intrinsics If both operands are undef, return undef. If one operand is undef, clamp to limit constant.	2020-07-29 11:01:13 -04:00
Sanjay Patel	9f95895833	[ConstantFolding] add tests for integer min/max intrinsics; NFC	2020-07-29 11:01:13 -04:00
Chris Bowler	d5776f250f	[NFC][PPC][AIX] Add test coverage for _Complex return values Differential Revision: https://reviews.llvm.org/D84069	2020-07-29 10:59:52 -04:00
Simon Pilgrim	d1abca187d	[CostModel][X86] Add SSE costs for SMAX/SMIN/UMAX/UMIN intrinsics	2020-07-29 15:55:43 +01:00
Frederik Gossen	5fc34fafa7	[MLIR][Shape] Limit shape to SCF lowering patterns to their supported types Differential Revision: https://reviews.llvm.org/D84444	2020-07-29 14:54:09 +00:00
Jakub Lichman	1aaf8aa53d	[mlir][Linalg] Conv1D, Conv2D and Conv3D added as named ops This commit is part of a greater project which aims to add full end-to-end support for convolutions inside mlir. The reason behind having conv ops for each rank rather than having one generic ConvOp is to enable better optimizations for every N-D case which reflects memory layout of input/kernel buffers better and simplifies code as well. We expect plain linalg.conv to be progressively retired. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D83879	2020-07-29 16:39:56 +02:00
Nathan James	b99630e432	[clang-tidy] Fix RedundantStringCStrCheck with r values The previous fix for this, https://reviews.llvm.org/D76761, Passed test cases but failed in the real world as std::string has a non trivial destructor so creates a CXXBindTemporaryExpr. This handles that shortfall and updates the test case std::basic_string implementation to use a non trivial destructor to reflect real world behaviour. Reviewed By: gribozavr2 Differential Revision: https://reviews.llvm.org/D84831	2020-07-29 15:35:31 +01:00
Alexey Bader	8d27be8dba	[OpenCL] Add global_device and global_host address spaces This patch introduces 2 new address spaces in OpenCL: global_device and global_host which are a subset of a global address space, so the address space scheme will be looking like: ``` generic->global->host ->device ->private ->local constant ``` Justification: USM allocations may be associated with both host and device memory. We want to give users a way to tell the compiler the allocation type of a USM pointer for optimization purposes. (Link to the Unified Shared Memory extension: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/cl_intel_unified_shared_memory.asciidoc) Before this patch USM pointer could be only in opencl_global address space, hence a device backend can't tell if a particular pointer points to host or device memory. On FPGAs at least we can generate more efficient hardware code if the user tells us where the pointer can point - being able to distinguish between these types of pointers at compile time allows us to instantiate simpler load-store units to perform memory transactions. Patch by Dmitry Sidorov. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D82174	2020-07-29 17:24:53 +03:00
Tim Keith	2c662f3d3d	[flang] Fix bug with intrinsic in type declaration stmt When an instrinsic function is declared in a type declaration statement we need to set the INTRINSIC attribute and (per 8.2(3)) ignore the specified type. To simplify the check, add IsIntrinsic utility to BaseVisitor. Also, intrinsics and external procedures were getting assigned a size and offset and they shouldn't be. Differential Revision: https://reviews.llvm.org/D84702	2020-07-29 07:23:31 -07:00
Juneyoung Lee	672df0fc67	[InstSimplify] add tests for expandCommutativeBinOp; NFC	2020-07-29 23:21:39 +09:00
Florian Hahn	99166fd4fb	[SCEVExpander] Add option to preserve LCSSA directly. This patch teaches SCEVExpander to directly preserve LCSSA. As it is currently, SCEV does not look through PHI nodes in loops, as it might break LCSSA form. Once SCEVExpander can preserve LCSSA form, it should be safe for SCEV to look through PHIs. To preserve LCSSA form, this patch uses formLCSSAForInstructions on operands of newly created instructions, if the definition is inside a different loop than the new instruction. The final value we return from expandCodeFor may also need LCSSA phis, depending on the insert point. As no user for it exists there yet, create a temporary instruction at the insert point, which can be passed to formLCSSAForInstructions. This temporary instruction is removed after LCSSA construction. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D71538	2020-07-29 15:07:37 +01:00
Frederik Gossen	6673c6cd82	[MLIR][Shape] Limit shape to standard lowerings to their supported types The lowering does not support all types for its source operations. This change makes the patterns fail in a well-defined manner. Differential Revision: https://reviews.llvm.org/D84443	2020-07-29 13:56:52 +00:00
Bruno Ricci	517fe058d4	[clang][NFC] clang-format fix after `eb10b065f2`	2020-07-29 14:55:16 +01:00
Bruno Ricci	1ae63b4179	[clang][NFC] Pass the ASTContext to CXXRecordDecl::setCaptures In general Decl::getASTContext() is relatively expensive and here the changes are non-invasive. NFC.	2020-07-29 14:55:15 +01:00
Tres Popp	ad793ed903	Forward extent tensors through shape.broadcast. Differential Revision: https://reviews.llvm.org/D84832	2020-07-29 15:49:10 +02:00
Sanjay Patel	8c3262a7b4	[ConstantFolding] update test checks FP min/max intrinsics There's a slight difference in functionality with the new CHECK lines: before, we allowed either -0.0 or 0.0 for maxnum/minnum. That matches the definition, but we should always get a deterministic result from constant folding within the compiler, so now we assert that we got the single expected result in all cases.	2020-07-29 09:43:33 -04:00
Victor Campos	71bf6dd682	[Driver][ARM] Fix testcase that should only run on ARM Fix testcase introduced in `d1a3396bfb`.	2020-07-29 14:35:14 +01:00
Simon Pilgrim	0a0f28254a	[CostModel][X86] Add SSE costs for ABS intrinsics	2020-07-29 14:33:59 +01:00
Victor Campos	d1a3396bfb	[Driver][ARM] Disable unsupported features when nofp arch extension is used A list of target features is disabled when there is no hardware floating-point support. This is the case when one of the following options is passed to clang: - -mfloat-abi=soft - -mfpu=none This option list is missing, however, the extension "+nofp" that can be specified in -march flags, such as "-march=armv8-a+nofp". This patch also disables unsupported target features when nofp is passed to -march. Differential Revision: https://reviews.llvm.org/D82948	2020-07-29 14:13:22 +01:00
Andrew Ng	8725a49409	[ELF][test] Add test coverage of `__real_` to wrap-plt.s Differential Revision: https://reviews.llvm.org/D84749	2020-07-29 14:10:38 +01:00
Simon Pilgrim	75182104f0	[TTI] Move abs/smax/smin/umax/umin cost expansion to ICA getIntrinsicInstrCost variant This will simplify target overrides, and matches what we do for most integer intrinsic costs.	2020-07-29 13:44:38 +01:00
Stephan Herhut	823ffef009	[mlir][Standard] Allow unranked memrefs as operands to dim and rank `std.dim` currently only accepts ranked memrefs and `std.rank` is limited to tensors. Differential Revision: https://reviews.llvm.org/D84790	2020-07-29 14:42:58 +02:00
David Green	9ddb28964c	[ARM] Tune getCastInstrCost for extending masked loads and truncating masked stores This patch uses the feature added in D79162 to fix the cost of a sext/zext of a masked load, or a trunc for a masked store. Previously, those were considered cheap or even free, but it's not the case as we cannot split the load in the same way we would for normal loads. This updates the costs to better reflect reality, and adds a test for it in test/Analysis/CostModel/ARM/cast.ll. It also adds a vectorizer test that showcases the improvement: in some cases, the vectorizer will now choose a smaller VF when tail-predication is enabled, which results in better codegen. (Because if it were to use a higher VF in those cases, the code we see above would be generated, and the vmovs would block tail-predication later in the process, resulting in very poor codegen overall) Original Patch by Pierre van Houtryve Differential Revision: https://reviews.llvm.org/D79163	2020-07-29 13:41:34 +01:00
David Green	60280e9818	[Analysis] TTI: Add CastContextHint for getCastInstrCost Currently, getCastInstrCost has limited information about the cast it's rating, often just the opcode and types. Sometimes there is a context instruction as well, but it isn't trustworthy: for instance, when the vectorizer is rating a plan, it calls getCastInstrCost with the old instructions when, in fact, it's trying to evaluate the cost of the instruction post-vectorization. Thus, the current system can get the cost of certain casts incorrect as the correct cost can vary greatly based on the context in which it's used. For example, if the vectorizer queries getCastInstrCost to evaluate the cost of a sext(load) with tail predication enabled, getCastInstrCost will think it's free most of the time, but it's not always free. On ARM MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar situations can come up with how masked loads can be extended when being split. To fix that, this path adds a new parameter to getCastInstrCost to give it a hint about the context of the cast. It adds a CastContextHint enum which contains the type of the load/store being created by the vectorizer - one for each of the types it can produce. Original patch by Pierre van Houtryve Differential Revision: https://reviews.llvm.org/D79162	2020-07-29 13:32:53 +01:00
David Sherwood	2078771759	[SVE][CodeGen] Add simple integer add tests for SVE tuple types I have added tests to: CodeGen/AArch64/sve-intrinsics-int-arith.ll for doing simple integer add operations on tuple types. Since these tests introduced new warnings due to incorrect use of getVectorNumElements() I have also fixed up these warnings in the same patch. These fixes are: 1. In narrowExtractedVectorBinOp I have changed the code to bail out early for scalable vector types, since we've not yet hit a case that proves the optimisations are profitable for scalable vectors. 2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced calls to getVectorNumElements with getVectorMinNumElements in cases that work with scalable vectors. For the other cases I have added asserts that the vector is not scalable because we should not be using shuffle vectors and build vectors in such cases. Differential revision: https://reviews.llvm.org/D84016	2020-07-29 13:32:10 +01:00
Sjoerd Meijer	85342c27a3	[ARM] Optimize immediate selection Optimize some specific immediates selection by materializing them with sub/mvn instructions as opposed to loading them from the constant pool. Patch by Ben Shi, powerman1st@163.com. Differential Revision: https://reviews.llvm.org/D83745	2020-07-29 13:29:17 +01:00
Matt Arsenault	200bb5191a	AMDGPU/GlobalISel: Refactor special argument management	2020-07-29 08:27:31 -04:00
Matt Arsenault	c230965ccf	AMDGPU: Make saturating add/sub legal for DAG path	2020-07-29 08:27:31 -04:00
Matt Arsenault	cdd45d5f9c	AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.csub Remove the custom node boilerplate. Not sure why this tried to handle the LDS atomic stuff.	2020-07-29 08:27:31 -04:00
Chris Gyurgyik	33abb7292e	[libc] [obvious] Fix typo in binary header.	2020-07-29 08:18:07 -04:00

1 2 3 4 5 ...

361907 Commits All Branches Search

361907 Commits

All Branches