llvm-project

Commit Graph

Author	SHA1	Message	Date
Martin Storsjö	46416f0803	[CodeGen] [WinException] Remove a redundant explicit section switch for aarch64 The following EmitWinEHHandlerData() implicitly switches to .xdata, just like on x86_64. This became orphaned from the original code requiring it in `0b61d220c9` / https://reviews.llvm.org/D61095. Differential Revision: https://reviews.llvm.org/D87447	2020-09-11 10:31:04 +03:00
Michael Liao	f787fe15d8	[EarlyCSE] Remove unnecessary operand swap. - As min/max are commutative operators, there is no need to swap operands. That breaks the convention calculating the hash value.	2020-09-11 02:14:04 -04:00
Alok Kumar Sharma	e45b0708ae	[DebugInfo] Fixing CodeView assert related to lowerBound field of DISubrange. This is to fix CodeView build failure https://bugs.llvm.org/show_bug.cgi?id=47287 after DIsSubrange upgrade D80197 Assert condition is now removed and Count is calculated in case LowerBound is absent or zero and Count or UpperBound is constant. If Count is unknown it is later handled as VLA (currently Count is set to zero). Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D87406	2020-09-11 11:12:49 +05:30
Mircea Trofin	da92448828	[NFC][MLInliner] Presort instruction successions. Differential Revision: https://reviews.llvm.org/D87489	2020-09-10 21:40:49 -07:00
Michael Liao	41e68f7ee7	[EarlyCSE] Fix and recommit the revised `c9826829d7` In addition to calculate hash consistently by swapping SELECT's operands, we also need to inverse the select pattern favor to match the original logic. [EarlyCSE] Equivalent SELECTs should hash equally DenseMap<SimpleValue> assumes that, if its isEqual method returns true for two elements, then its getHashValue method must return the same value for them. This invariant is broken when one SELECT node is a min/max operation, and the other can be transformed into an equivalent min/max by inverting its predicate and swapping its operands. This patch fixes an assertion failure that would occur intermittently while compiling the following IR: define i32 @t(i32 %i) { %cmp = icmp sle i32 0, %i %twin1 = select i1 %cmp, i32 %i, i32 0 %cmpinv = icmp sgt i32 0, %i %twin2 = select i1 %cmpinv, i32 0, i32 %i %sink = add i32 %twin1, %twin2 ret i32 %sink } Differential Revision: https://reviews.llvm.org/D86843	2020-09-10 23:30:56 -04:00
Michael Liao	39dc75f66c	Revert "[EarlyCSE] Equivalent SELECTs should hash equally" This reverts commit `c9826829d7` as it breaks regression tests.	2020-09-10 22:37:35 -04:00
Zarko Todorovski	035396197a	Remove unused variable introduce in `0448d11a06` causing build failures with -Werror on.	2020-09-10 20:07:11 -04:00
Reid Kleckner	2c73bef7fa	Fix wrong comment about enabling optimizations to work around a bug	2020-09-10 16:45:20 -07:00
Amara Emerson	0448d11a06	[AArch64][GlobalISel] Don't emit a branch for a fallthrough G_BR at -O0. With optimizations we leave the decision to eliminate fallthrough branches to bock placement, but at -O0 we should do it in the selector to save code size. This regressed -O0 with a recent change to a combiner.	2020-09-10 15:01:26 -07:00
Reid Kleckner	4e3edef4b8	Use pragmas to work around MSVC x86_32 debug miscompile bug Halide users reported this here: https://llvm.org/pr46176 I reported the issue to MSVC here: https://developercommunity.visualstudio.com/content/problem/1179643/msvc-copies-overaligned-non-trivially-copyable-par.html This codepath is apparently not covered by LLVM's unit tests, so I added coverage in a unit test. If we want to support this configuration going forward, it means that is in general not safe to pass a SmallVector<T, N> by value if alignof(T) is greater than 4. This doesn't appear to come up often because passing a SmallVector by value is inefficient and not idiomatic: it copies the inline storage. In this case, the SmallVector<LLT,4> is captured by value by a lambda, and the lambda is passed by value into std::function, and that's how we hit the bug. Differential Revision: https://reviews.llvm.org/D87475	2020-09-10 14:50:01 -07:00
Florian Hahn	fb109c42d9	[DSE] Switch to MemorySSA-backed DSE by default. The tests have been updated and I plan to move them from the MSSA directory up. Some end-to-end tests needed small adjustments. One difference to the legacy DSE is that legacy DSE also deletes trivially dead instructions that are unrelated to memory operations. Because MemorySSA-backed DSE just walks the MemorySSA, we only visit/check memory instructions. But removing unrelated dead instructions is not really DSE's job and other passes will clean up. One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll, but I think this comes down to legacy DSE not handling instructions that may throw correctly in that case. To cover this with MemorySSA-backed DSE, we need an update to llvm.coro.begin to treat it's return value to belong to the same underlying object as the passed pointer. There are some minor cases MemorySSA-backed DSE currently misses, e.g. related to atomic operations, but I think those can be implemented after the switch. This has been discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html For the MultiSource/SPEC2000/SPEC2006 the number of eliminated stores goes from ~17500 (legayc DSE) to ~26300 (MemorySSA-backed). More numbers and details in the thread on llvm-dev. Impact on CTMark: ``` Legacy Pass Manager exec instrs size-text O3 + 0.60% - 0.27% ReleaseThinLTO + 1.00% - 0.42% ReleaseLTO-g. + 0.77% - 0.33% RelThinLTO (link only) + 0.87% - 0.42% RelLO-g (link only) + 0.78% - 0.33% ``` http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions ``` New Pass Manager exec instrs. size-text O3 + 0.95% - 0.25% ReleaseThinLTO + 1.34% - 0.41% ReleaseLTO-g. + 1.71% - 0.35% RelThinLTO (link only) + 0.96% - 0.41% RelLO-g (link only) + 2.21% - 0.35% ``` http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions Reviewed By: asbirlea, xbolva00, nikic Differential Revision: https://reviews.llvm.org/D87163	2020-09-10 22:24:32 +01:00
Bryan Chan	c9826829d7	[EarlyCSE] Equivalent SELECTs should hash equally DenseMap<SimpleValue> assumes that, if its isEqual method returns true for two elements, then its getHashValue method must return the same value for them. This invariant is broken when one SELECT node is a min/max operation, and the other can be transformed into an equivalent min/max by inverting its predicate and swapping its operands. This patch fixes an assertion failure that would occur intermittently while compiling the following IR: define i32 @t(i32 %i) { %cmp = icmp sle i32 0, %i %twin1 = select i1 %cmp, i32 %i, i32 0 %cmpinv = icmp sgt i32 0, %i %twin2 = select i1 %cmpinv, i32 0, i32 %i %sink = add i32 %twin1, %twin2 ret i32 %sink } Differential Revision: https://reviews.llvm.org/D86843	2020-09-10 16:59:24 -04:00
Lang Hames	c74900ca67	[ORC] Make MaterializationResponsibility immovable, pass by unique_ptr. Making MaterializationResponsibility instances immovable allows their associated VModuleKeys to be updated by the ExecutionSession while the responsibility is still in-flight. This will be used in the upcoming removable code feature to enable safe merging of resource keys even if there are active compiles using the keys being merged.	2020-09-10 13:21:46 -07:00
Nikita Popov	a5168bdb4a	[DemandedBits][BDCE] Add support for min/max intrinsics Add DemandedBits / BDCE support for min/max intrinsics: If the low bits are not demanded in the result, they also aren't demanded in the operands. Differential Revision: https://reviews.llvm.org/D87161	2020-09-10 22:13:31 +02:00
Nikita Popov	99e78cb718	[DemandedBits] Add braces to large if (NFC) While the if only contains a single statement, it happens to be a huge switch. Add braces to make this code easier to read.	2020-09-10 22:13:27 +02:00
Volkan Keles	d4bf90271f	GlobalISel: Combine fneg(fneg x) to x https://reviews.llvm.org/D87473	2020-09-10 12:57:38 -07:00
Anna Thomas	b1b9806370	[ImplicitNullChecks] NFC: Remove unused PointerReg arg in dep analysis The PointerReg arg was passed into the dependence function for an assertion which no longer exists. So, this patch updates the dependence functions to avoid the PointerReg in the signature. Tests-Run: make check	2020-09-10 15:31:57 -04:00
Christopher Tetreault	7ddfd9b3eb	[SVE] Bail from VectorUtils heuristics for scalable vectors Bail from maskIsAllZeroOrUndef and maskIsAllOneOrUndef prior to iterating over the number of elements for scalable vectors. Assert that the mask type is not scalable in possiblyDemandedEltsInMask . Assert that the types are correct in all three functions. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87424	2020-09-10 12:29:37 -07:00
Krzysztof Parzyszek	783e28a508	[Hexagon] Split pair-based masked memops	2020-09-10 14:24:42 -05:00
Dominic Chen	4252f3009b	[WebAssembly] Set unreachable as canonical to permit disassembly Currently, using llvm-objdump to disassemble a function containing unreachable will trigger an assertion while decoding the opcode, since both unreachable and debug_unreachable have the same encoding. To avoid this, set unreachable as the canonical decoding. Differential Revision: https://reviews.llvm.org/D87431	2020-09-10 15:04:16 -04:00
Craig Topper	c195ae2f00	[SLPVectorizer][X86][AMDGPU] Remove fcmp+select to fmin/fmax reduction support. Previously we could match fcmp+select to a reduction if the fcmp had the nonans fast math flag. But if the select had the nonans fast math flag, InstCombine would turn it into a fminnum/fmaxnum intrinsic before SLP gets to it. Seems fairly likely that if one of the fcmp+select pair have the fast math flag, they both would. My plan is to start vectorizing the fmaxnum/fminnum version soon, but I wanted to get this code out as it had some of the strangest fast math flag behaviors.	2020-09-10 11:49:19 -07:00
Fangrui Song	a0ffe2b21a	[PGO] Skip if an IndirectBrInst critical edge cannot be split PGOInstrumentation runs `SplitIndirectBrCriticalEdges` but some IndirectBrInst critical edge cannot be split. `getInstrBB` will crash when calling `SplitCriticalEdge`, e.g. int foo(char p) { void targets[2]; targets[0] = &&indirect; targets[1] = &&end; for (;; p++) if (p == 7) { indirect: goto targets[p[1]]; // the self loop is critical in -O } end: return 0; } Skip such critical edges to prevent a crash. Reviewed By: davidxl, lebedev.ri Differential Revision: https://reviews.llvm.org/D87435	2020-09-10 11:04:14 -07:00
Anna Thomas	46329f6079	[ImplicitNullCheck] Handle instructions that preserve zero value This is the first in a series of patches to make implicit null checks more general. This patch identifies instructions that preserves zero value of a register and considers that as a valid instruction to hoist along with the faulting load. See added testcases. Reviewed-By: reames, dantrushin Differential Revision: https://reviews.llvm.org/D87108	2020-09-10 13:39:50 -04:00
Mircea Trofin	e543708e5e	[NFC][ThinLTO] Let llvm::EmbedBitcodeInModule handle serialization. llvm::EmbedBitcodeInModule handles serializing the passed-in module, if the provided MemoryBufferRef is invalid. This is already the path taken in one of the uses of the API - clang::EmbedBitcode, when called from BackendConsumer::HandleTranslationUnit - so might as well do the same here and reduce (by very little) code duplication. The only difference this patch introduces is that the serialization happens with ShouldPreserveUseListOrder set to true. Differential Revision: https://reviews.llvm.org/D87339	2020-09-10 10:25:00 -07:00
Ettore Tiotto	6b13cfe739	[ArgumentPromotion]: Copy function metadata after promoting arguments The argument promotion pass currently fails to copy function annotations over to the modified function after promoting arguments. This patch copies the original function annotation to the new function. Reviewed By: fhann Differential Revision: https://reviews.llvm.org/D86630	2020-09-10 13:08:57 -04:00
Kit Barton	009cd4e491	[PPC][GlobalISel] Add initial GlobalIsel infrastructure This adds the initial GlobalISel skeleton for PowerPC. It can only run ir-translator and legalizer for `ret void`. This is largely based on the initial GlobalISel patch for RISCV (https://reviews.llvm.org/D65219). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D83100	2020-09-10 11:58:01 -05:00
Simon Pilgrim	f42f733af9	SwitchLoweringUtils.h - reduce TargetLowering.h include. NFCI. Only include the headers we actually need, and move the remaining includes down to implicit dependent files.	2020-09-10 17:42:18 +01:00
Owen Anderson	3d9c85e4d8	Mark FMOV constant materialization as being as cheap as a move. This prevents us from doing things like LICM'ing it out of a loop, which is usually a net loss because we end up having to spill a callee-saved FPR to accomodate it. This does perturb instruction scheduling around this instruction, so a number of tests had to be updated to account for it. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D87316	2020-09-10 16:38:59 +00:00
Simon Pilgrim	601557e9f9	Hexagon.h - remove unnecessary includes. NFCI. Replace with forward declarations and move includes to implicit dependent files.	2020-09-10 16:59:43 +01:00
Krzysztof Parzyszek	8a08740db6	[GVN] Account for masked loads/stores depending on load/store instructions This is a case where an intrinsic depends on a non-call instruction. Differential Revision: https://reviews.llvm.org/D87423	2020-09-10 10:57:33 -05:00
Simon Pilgrim	b585fdae24	[X86] Use Register instead of unsigned. NFCI. Fixes llvm-prefer-register-over-unsigned clang-tidy warnings.	2020-09-10 16:05:33 +01:00
Simon Pilgrim	9f830e0af7	AArch64MachineFunctionInfo.h - remove unnecessary TargetFrameLowering.h include. NFCI.	2020-09-10 16:05:33 +01:00
Nikita Popov	4e413e1621	[InstCombine] Temporarily do not drop volatile stores before unreachable See discussion in D87149. Dropping volatile stores here is legal per LLVM semantics, but causes issues for real code and may result in a change to LLVM volatile semantics. Temporarily treat volatile stores as "not guaranteed to transfer execution" in just this place, until this issue has been resolved.	2020-09-10 16:16:44 +02:00
Jay Foad	517202c720	[TargetLowering] Fix comments describing XOR -> OR/AND transformations	2020-09-10 13:56:34 +01:00
Florian Hahn	a5ec99da6e	[DSE] Support eliminating memcpy.inline. MemoryLocation has been taught about memcpy.inline, which means we can get the memory locations read and written by it. This means DSE can handle memcpy.inline	2020-09-10 13:19:25 +01:00
Max Kazantsev	8c0bbbade1	[NFC] Refactoring in SCEV: add missing `const` qualifiers	2020-09-10 19:06:37 +07:00
Simon Pilgrim	de25ebaac6	[CostModel][X86] Add vXi32 division by uniform constant costs (PR47476) Other types can be handled in future patches but their uniform / non-uniform costs are more similar and don't appear to cause many vectorization issues.	2020-09-10 12:17:54 +01:00
Simon Pilgrim	fc49abee56	[X86][SSE] lowerShuffleAsSplitOrBlend always returns a shuffle. lowerShuffleAsSplitOrBlend always returns a target shuffle result (and is the default operation for lowering some shuffle types), so we don't need to check for null.	2020-09-10 11:45:08 +01:00
Simon Pilgrim	e80605e242	[X86] Remove WaitInsert::TTI member. NFCI. This is only ever set/used inside WaitInsert::runOnMachineFunction so don't bother storing it in the class.	2020-09-10 11:45:08 +01:00
Kerry McLaughlin	cd89f5c91b	[SVE][CodeGen] Legalisation of truncate for scalable vectors Truncating from an illegal SVE type to a legal type, e.g. `trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>` fails after PromoteIntOp_CONCAT_VECTORS attempts to create a BUILD_VECTOR. This patch changes the promote function to create a sequence of INSERT_SUBVECTORs if the return type is scalable, and replaces these with UNPK+UZP1 for AArch64. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86548	2020-09-10 11:35:33 +01:00
Juneyoung Lee	1b9884df8d	Enable InsertFreeze flag of JumpThreading when used in LTO This patch enables inserting freeze when JumpThreading converts a select to a conditional branch when it is run in LTO. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D85534	2020-09-10 19:05:49 +09:00
Sam Tebbs	b81c57d646	[ARM][LowOverheadLoops] Allow tail predication on predicated instructions with unknown lane values The effects of unpredicated vector instruction with unknown lanes cannot be predicted and therefore cannot be tail predicated. This does not apply to predicated vector instructions and so this patch allows tail predication on them. Differential Revision: https://reviews.llvm.org/D87376	2020-09-10 10:34:32 +01:00
Sam Parker	0bdf8c9127	[SCEV] Constant expansion cost at minsize As code size is the only thing we care about at minsize, query the cost of materialising immediates when calculating the cost of a SCEV expansion. We also modify the CostKind to TCK_CodeSize for minsize, instead of RecipThroughput. Differential Revision: https://reviews.llvm.org/D76434	2020-09-10 08:21:11 +01:00
Sam Parker	1919b65052	[ARM] Tail predicate VQDMULH and VQRDMULH Mark the family of instructions as valid for tail predication. Differential Revision: https://reviews.llvm.org/D87348	2020-09-10 08:20:07 +01:00
Juneyoung Lee	39c1653b3d	[JumpThreading] Conditionally freeze its condition when unfolding select This patch fixes pr45956 (https://bugs.llvm.org/show_bug.cgi?id=45956 ). To minimize its impact to the quality of generated code, I suggest enabling this only for LTO as a start (it has two JumpThreading passes registered). This patch contains a flag that makes JumpThreading enable it. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84940	2020-09-10 15:49:40 +09:00
Max Kazantsev	cde8fc65ae	[NFC] Rename variables to avoid name confusion Name `LI` is used for loop info, loop and load inst at the same function, which causes a lot of confusion.	2020-09-10 13:41:10 +07:00
Max Kazantsev	c413a8a8ec	[LoopLoadElim] Filter away candidates that stop being AddRecs after loop versioning. PR47457 The test in PR47457 demonstrates a situation when candidate load's pointer's SCEV is no loger a SCEVAddRec after loop versioning. The code there assumes that it is always a SCEVAddRec and crashes otherwise. This patch makes sure that we do not consider candidates for which this requirement is broken after the versioning. Differential Revision: https://reviews.llvm.org/D87355 Reviewed By: asbirlea	2020-09-10 13:30:31 +07:00
Qiu Chaofan	6afb279100	[PowerPC] [FPEnv] Disable strict FP mutation by default `22a0edd0` introduced a config IsStrictFPEnabled, which controls the strict floating point mutation (transforming some strict-fp operations into non-strict in ISel). This patch disables the mutation by default since we've finished PowerPC strict-fp enablement in backend. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D87222	2020-09-10 13:28:09 +08:00
Petr Hosek	c4d7536136	[CMake] Simplify CMake handling for libxml2 This matches the changes made to handling of zlib done in `10b1b4a` where we rely on find_package and the imported target rather than manually appending the library and include paths. The use of LLVM_LIBXML2_ENABLED has been replaced by LLVM_ENABLE_LIBXML2 thus reducing the number of variables. Differential Revision: https://reviews.llvm.org/D84563	2020-09-09 21:44:44 -07:00
Jordan Rupprecht	52f0837778	[NFC] Move definition of variable now only used in debug builds	2020-09-09 20:23:59 -07:00
Matt Arsenault	e15215e041	AMDGPU: Hoist check for VGPRs	2020-09-09 19:45:40 -04:00
Matt Arsenault	85490874b2	AMDGPU: Skip all meta instructions in hazard recognizer This was not adding a necessary nop due to thinking the kill counted.	2020-09-09 19:45:40 -04:00
Matt Arsenault	82cbc9330a	AMDGPU: Fix inserting waitcnts before kill uses	2020-09-09 19:45:40 -04:00
Juneyoung Lee	a6183d0f02	[ValueTracking] isKnownNonZero, computeKnownBits for freeze This implements support for isKnownNonZero, computeKnownBits when freeze is involved. ``` br (x != 0), BB1, BB2 BB1: y = freeze x ``` In the above program, we can say that y is non-zero. The reason is as follows: (1) If x was poison, `br (x != 0)` raised UB (2) If x was fully undef, the branch again raised UB (3) If x was non-zero partially undef, say `undef \| 1`, `freeze x` will return a nondeterministic value which is also non-zero. (4) If x was just a concrete value, it is trivial Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D75808	2020-09-10 08:07:38 +09:00
dfukalov	c259d3a061	[AMDGPU] Fix for folding v2.16 literals. It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are incorrectly processed so one of two packed values were lost. Introduced new function to check immediate 32-bit operand can be folded. Converted condition about current op_sel flags value to fall-through. Fixes: SWDEV-247595 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D87158	2020-09-10 01:39:25 +03:00
Jessica Paquette	480e7f43a2	[AArch64][GlobalISel] Share address mode selection code for memops We were missing support for the G_ADD_LOW + ADRP folding optimization in the manual selection code for G_LOAD, G_STORE, and G_ZEXTLOAD. As a result, we were missing cases like this: ``` @foo = external hidden global i32* define void @baz(i32* %0) { store i32* %0, i32** @foo ret void } ``` https://godbolt.org/z/16r7ad This functionality already existed in the addressing mode functions for the importer. So, this patch makes the manual selection code use `selectAddrModeIndexed` rather than duplicating work. This is a 0.2% geomean code size improvement for CTMark at -O3. There is one code size increase (0.1% on lencod) which is likely because `selectAddrModeIndexed` doesn't look through constants. Differential Revision: https://reviews.llvm.org/D87397	2020-09-09 15:14:46 -07:00
Florian Hahn	9969c317ff	[DSE,MemorySSA] Handle atomic stores explicitly in isReadClobber. Atomic stores are modeled as MemoryDef to model the fact that they may not be reordered, depending on the ordering constraints. Atomic stores that are monotonic or weaker do not limit re-ordering, so we do not have to treat them as potential read clobbers. Note that llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll already contains a set of negative test cases. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D87386	2020-09-09 23:01:58 +01:00
Nikita Popov	0a5dc7effb	[DAGCombiner] Fold fmin/fmax of NaN fminnum(X, NaN) is X, fminimum(X, NaN) is NaN. This mirrors the behavior of existing InstSimplify folds. This is expected to improve the reduction lowerings in D87391, which use NaN as a neutral element. Differential Revision: https://reviews.llvm.org/D87415	2020-09-09 23:53:32 +02:00
Amara Emerson	e5784ef8f6	[GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator. We weren't using this before, so none of the MachineFunction CFG edges had the branch probability information added. As a result, block placement later in the pipeline was flying blind. This is enabled only with optimizations enabled like SelectionDAG. Differential Revision: https://reviews.llvm.org/D86824	2020-09-09 14:31:12 -07:00
Amara Emerson	467a071285	[GlobalISel][IRTranslator] Generate better conditional branch lowering. This is a port of the functionality from SelectionDAG, which tries to find a tree of conditions from compares that are then combined using OR or AND, before using that result as the input to a branch. Instead of naively lowering the code as is, this change converts that into a sequence of conditional branches on the sub-expressions of the tree. Like SelectionDAG, we re-use the case block codegen functionality from the switch lowering utils, which causes us to generate some different code. The result of which I've tried to mitigate in earlier combine patches. Differential Revision: https://reviews.llvm.org/D86665	2020-09-09 13:16:11 -07:00
Amara Emerson	cc76da7ada	[GlobalISel] Rewrite the elide-br-by-swapping-icmp-ops combine to do less. This combine previously tried to take sequences like: %cond = G_ICMP pred, a, b G_BRCOND %cond, %truebb G_BR %falsebb %truebb: ... %falsebb: ... and by inverting the compare predicate and swapping branch targets, delete the G_BR and instead have a single conditional branch to the falsebb. Since in an earlier patch we have a combine to fold not(icmp) into just an inverted icmp, we don't need this combine to do as much. This patch instead generalizes the combine by just looking for: G_BRCOND %cond, %truebb G_BR %falsebb %truebb: ... %falsebb: ... and then inverting the condition using a not (xor). The xor can be folded away in a separate combine. This change also lets us avoid some optimization code in the IRTranslator. I also think that deleting G_BRs in the combiner is unnecessary. That's something that targets can decide to do at selection time and could simplify generic code in future. Differential Revision: https://reviews.llvm.org/D86664	2020-09-09 13:08:16 -07:00
Hiroshi Yamauchi	0ab6a15698	[X86] Add support for using fast short rep mov for memcpy lowering. Disabled by default behind an option. Differential Revision: https://reviews.llvm.org/D86883	2020-09-09 12:46:40 -07:00
Jian Cai	415a4fbea7	[MC] Resolve the difference of symbols in consecutive MCDataFragements Try to resolve the difference of two symbols in consecutive MCDataFragments. This is important for an idiom like "foo:instr; .if . - foo; instr; .endif" (https://bugs.llvm.org/show_bug.cgi?id=43795). Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D69411	2020-09-09 12:35:43 -07:00
Fangrui Song	ad61e346d3	[gcov] Give the __llvm_gcov_ctr load instruction a name for more readable output	2020-09-09 12:34:43 -07:00
Krzysztof Parzyszek	0ee54cf883	[Hexagon] Account for truncating pairs to non-pairs when widening truncates Added missing selection patterns for vpackl.	2020-09-09 14:31:52 -05:00
Fangrui Song	dbac20bb6b	[gcov] Don't split entry block; add a synthetic entry block instead The entry block is split at the first instruction where `shouldKeepInEntry` returns false. The created basic block has a br jumping to the original entry block. The new basic block causes the function label line and the other entry block lines to be covered by different basic blocks, which can affect line counts with special control flows (fork/exec in the entry block requires heuristics in llvm-cov gcov to get consistent line counts). int main() { // BB0 return 0; // BB2 (due to entry block splitting) } // BB1 is the exit block (since gcov 4.8) This patch adds a synthetic entry block (like PGOInstrumentation and GCC) and inserts an edge from the synthetic entry block to the original entry block. We can thus remove the tricky `shouldKeepInEntry` and entry block splitting. The number of basic blocks does not change, but the emitted .gcno files will be smaller because we can save one GCOV_TAG_LINES tag. // BB0 is the synthetic entry block with a single edge to BB2 int main() { // BB2 return 0; // BB2 } // BB1 is the exit block (since gcov 4.8)	2020-09-09 12:25:24 -07:00
Guillaume Chatelet	5a4a0cfcfb	[NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD) This is preparatory work to unable storing alignment for AtomicCmpXchgInst. See D83136 for context and bug: https://bugs.llvm.org/show_bug.cgi?id=27168 This is the fixed version of D83375, which was submitted and reverted. Differential Revision: https://reviews.llvm.org/D87373	2020-09-09 19:10:30 +00:00
Mark de Wever	08196e0b2e	Implements [[likely]] and [[unlikely]] in IfStmt. This is the initial part of the implementation of the C++20 likelihood attributes. It handles the attributes in an if statement. Differential Revision: https://reviews.llvm.org/D85091	2020-09-09 20:48:37 +02:00
Krzysztof Parzyszek	81ff2d30a9	[DSE] Handle masked stores	2020-09-09 13:31:31 -05:00
Ulrich Weigand	1a25133bcd	[DAGCombine] Skip re-visiting EntryToken to avoid compile time explosion During the main DAGCombine loop, whenever a node gets replaced, the new node and all its users are pushed onto the worklist. Omit this if the new node is the EntryToken (e.g. if a store managed to get optimized out), because re-visiting the EntryToken and its users will not uncover any additional opportunities, but there may be a large number of such users, potentially causing compile time explosion. This compile time explosion showed up in particular when building the SingleSource/UnitTests/matrix-types-spec.cpp test-suite case on any platform without SIMD vector support. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D86963	2020-09-09 19:13:46 +02:00
Mircea Trofin	4b15fc9ddb	[NFC][MLInliner] Don't initialize in an assert. Since the build bots have assertions enabled, this flew under the radar.	2020-09-09 09:56:07 -07:00
Simon Pilgrim	6e45b98934	X86CallFrameOptimization.cpp - use const references where possible. NFCI.	2020-09-09 16:35:08 +01:00
Simon Pilgrim	e706116e11	X86FrameLowering::adjustStackWithPops - cleanup auto usage. NFCI. Don't use auto for non-obvious types, and use const references.	2020-09-09 16:15:02 +01:00
Qiu Chaofan	88ff4d2ca1	[PowerPC] Fix STRICT_FRINT/STRICT_FNEARBYINT lowering In standard C library, both rint and nearbyint returns rounding result in current rounding mode. But nearbyint never raises inexact exception. On PowerPC, x(v\|s)r(d\|s)pic may modify FPSCR XX, raising inexact exception. So we can't select constrained fnearbyint into xvrdpic. One exception here is xsrqpi, which will not raise inexact exception, so fnearbyint f128 is okay here. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D87220	2020-09-09 22:40:58 +08:00
Jay Foad	649bde488c	[AMDGPU] Simplify S_SETREG_B32 case in EmitInstrWithCustomInserter NFC.	2020-09-09 15:18:31 +01:00
Dmitry Preobrazhensky	95b7040e43	[AMDGPU][MC] Improved diagnostic messages for invalid registers Corrected parser to issue meaningful error messages for invalid and malformed registers. See bug 41303: https://bugs.llvm.org/show_bug.cgi?id=41303 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D87234	2020-09-09 16:44:03 +03:00
Alon Kom	818cf30b83	[MachinePipeliner] Fix II_setByPragma initialization II_setByPragma was not reset between 2 calls of the MachinePipleiner pass Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D87088	2020-09-09 13:38:35 +00:00
Denis Antrushin	4358fa782e	[Statepoints] Update DAG root after emitting statepoint. Since we always generate CopyToRegs for statepoint results, we must update DAG root after emitting statepoint, so that these copies are scheduled before any possible local uses. Note: getControlRoot() flushes all PendingExports, not only those we generates for relocates. If that'll become a problem, we can change it to flushing relocate exports only. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D87251	2020-09-09 20:22:10 +07:00
Ronak Chauhan	f078577f31	Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors" This reverts commit `487a805310`. Tests fail on big endian machines.	2020-09-09 18:01:28 +05:30
Simon Pilgrim	d816499f95	[KnownBits] Move SelectionDAG::computeKnownBits ISD::ABS handling to KnownBits::abs Move the ISD::ABS handling to a KnownBits::abs handler, to simplify future implementations in ValueTracking/GlobalISel.	2020-09-09 13:22:58 +01:00
David Stenberg	48fc781438	[UnifyFunctionExitNodes] Fix Modified status for unreachable blocks If a function had at most one return block, the pass would return false regardless if an unified unreachable block was created. This patch fixes that by refactoring runOnFunction into two separate helper functions for handling the unreachable blocks respectively the return blocks, as suggested by @bjope in a review comment. This was caught using the check introduced by D80916. Reviewed By: serge-sans-paille Differential Revision: https://reviews.llvm.org/D85818	2020-09-09 13:36:03 +02:00
Juneyoung Lee	36c8621638	[BuildLibCalls] Add more noundef to library functions This patch follows D85345 and adds more noundef attributes to return values/arguments of library functions that are mostly about accessing the file system or processes. A few functions like `chmod` or `times` use typedef `mode_t` and `clock_t`. They are neither struct nor union, so they cannot contain undef even if they're lowered to iN in IR. So, it is fine to add noundef to them. - clock_t's actual type is size_t (C17, 7.27.1.3), so it isn't struct or union. - For mode_t, either int or long is used in practice because programmers use bit manipulation. So, I think it is okay that it's never aggregate in practice. After this patch, the remaining library functions are those that eagerly participate in optimizations: they can be removed, reordered, or introduced by a transformation from primitive IR operations. For them, a few testings is needed, since it may not be valid to add noundef anymore even if C standard says it's okay. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85894	2020-09-09 20:33:35 +09:00
Juneyoung Lee	25ce1e0497	[ValueTracking] Add UndefOrPoison/Poison-only version of relevant functions This patch adds isGuaranteedNotToBePoison and programUndefinedIfUndefOrPoison. isGuaranteedNotToBePoison will be used at D75808. The latter function is used at isGuaranteedNotToBePoison. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84242	2020-09-09 20:00:26 +09:00
Simon Pilgrim	455cce3e21	TrigramIndex.cpp - remove unnecessary includes. NFCI. TrigramIndex.h already includes most of these.	2020-09-09 11:38:31 +01:00
Simon Pilgrim	f16b2d8315	ARMTargetParser.cpp - use auto const references in for range loops. NFCI. Fix static analysis warnings about unnecessary copies.	2020-09-09 11:38:31 +01:00
Simon Pilgrim	24ecfdac7b	[APFloat] Fix uninitialized variable in IEEEFloat constructors Some constructors of IEEEFloat do not initialize member variable exponent. Fix it by initializing exponent with the following values: For NaNs, the `exponent` is `maxExponent+1`. For Infinities, the `exponent` is `maxExponent+1`. For Zeroes, the `exponent` is `maxExponent-1`. Patch by: @nullptr.cpp (Yang Fan) Differential Revision: https://reviews.llvm.org/D86997	2020-09-09 11:38:30 +01:00
Mirko Brkusanin	43af2a6faa	[AMDGPU] Workaround for LDS Misalignment bug on GFX10 Add subtarget feature check to avoid using ds_read/write_b96/128 with too low alignment if a bug is present on that specific hardware. Add this "feature" to GFX 10.1.1 as it is also affected. Add global-isel test.	2020-09-09 11:46:09 +02:00
Florian Hahn	2bcc4db761	[EarlyCSE] Explicitly require AAResultsWrapperPass. The MemorySSAWrapperPass depends on AAResultsWrapperPass and if MemorySSA is preserved but AAResultsWrapperPass is not, this could lead to a crash when updating the last user of the MemorySSAWrapperPass. Alternatively AAResultsWrapperPass could be marked preserved by GVN, but I am not sure if that would be safe. I am not sure what is required in order to preserve AAResultsWrapperPass. At the moment, it seems like a couple of passes that do similar transforms to GVN are preserving it. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D87137	2020-09-09 09:14:50 +01:00
Denis Antrushin	2a52c3301a	[Statepoints] Properly handle const base pointer. Current code in InstEmitter assumes all GC pointers are either VRegs or stack slots - hence, taking only one operand. But it is possible to have constant base, in which case it occupies two machine operands. Add a convinience function to StackMaps to get index of next meta argument and use it in InsrEmitter to properly advance to the next statepoint meta operand. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D87252	2020-09-09 14:07:00 +07:00
Sam Parker	3ebc755227	[ARM] Try to rematerialize VCTP instructions We really want to try and avoid spilling P0, which can be difficult since there's only one register, so try to rematerialize any VCTP instructions. Differential Revision: https://reviews.llvm.org/D87280	2020-09-09 07:41:22 +01:00
Johannes Doerfert	d445b6dfec	[Attributor] Cleanup `::initialize` of various AAs This commit cleans up the ::initialize method of various AAs in the following ways: - If an associated function is required, give up on declarations. This was discovered as a real problem when lots of llvm.dbg.XXX call sites were assumed `noreturn` until proven otherwise. That does not make any sense and caused huge regressions and missed deductions. - Require more associated declarations for function interface AAs. - Use the IRAttribute::initialize to determine if function interface AAs can be used in IPO, don't replicate the checks (especially isFunctionIPOAmendable) all over the place. Arguably the function declaration check should be moved to some central place to.	2020-09-09 01:38:25 -05:00
Fangrui Song	6a9a0bfc33	[llvm-cov gcov] Simply computation of line counts and exit block counter	2020-09-08 23:15:37 -07:00
Johannes Doerfert	849146ba93	[Attributor] Associate the callback callee with a call site argument (if any) If we have a callback, call site arguments were already associated with the callback callee. Now we also associate the function with the callback callee, thus we know ensure that the following holds true (if all return nonnull): `getAssociatedArgument()->getParent() == getAssociatedFunction()` To test this an early exit from `AAMemoryBehaviorCallSiteArgument::initialize`` is included as well. Without the change to getAssociatedFunction() this kind of early exit for declarations would cause callback call site arguments to miss out.	2020-09-09 00:52:17 -05:00
Johannes Doerfert	cefd2a2c70	[Attributor] Cleanup `IRPosition::getArgNo` usages As we handle callback calls we need to disambiguate the call site argument number from the callee argument number. While always equal in non-callback calls, a callback comes with a partial parameter-argument mapping so there is no implicit correspondence. Here we split `IRPosition::getArgNo()` into two public functions, `getCallSiteArgNo()` and `getCalleeArgNo()`. Usages are adjusted to pick the right one for their purpose. This fixed some problems that would have been exposed as we more aggressively optimize callbacks.	2020-09-09 00:52:17 -05:00
Johannes Doerfert	c0ab901bdd	[Attributor] Selectively look at the callee even when there are operand bundles While operand bundles carry unpredictable semantics, we know some of them and can therefore "ignore" them. In this case we allow to look at the declaration of `llvm.assume` when asked for the attributes at a call site. The assume operand bundles we have do not invalidate the declaration attributes. We cannot test this in isolation because the llvm.assume attributes are determined by the parser. However, a follow up patch will provide test coverage.	2020-09-09 00:52:17 -05:00
Johannes Doerfert	d5d75f61e5	[Attributor] Provide a command line option that limits recursion depth In `MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.cpp` we initialized attributes until stack frame ~35k caused space to run out. The initial size 1024 is pretty much random.	2020-09-09 00:47:02 -05:00
Max Kazantsev	795e4ee9d2	[NFC] Move functon from IndVarSimplify to SCEV This function can be reused in other places. Differential Revision: https://reviews.llvm.org/D87274 Reviewed By: fhahn, lebedev.ri	2020-09-09 11:20:59 +07:00
Krzysztof Parzyszek	c2b7b9b642	[Hexagon] Fix order of operands in V6_vdealb4w	2020-09-08 22:09:28 -05:00
Fangrui Song	b9d086693b	[llvm-cov gcov] Compute unmeasured arc counts by Kirchhoff's circuit law For a CFG G=(V,E), Knuth describes that by Kirchoff's circuit law, the minimum number of counters necessary is \|E\|-(\|V\|-1). The emitted edges form a spanning tree. libgcov emitted .gcda files leverages this optimization while clang --coverage's doesn't. Propagate counts by Kirchhoff's circuit law so that llvm-cov gcov can correctly print line counts of gcc --coverage emitted files and enable the future improvement of clang --coverage.	2020-09-08 18:45:11 -07:00
Brad Smith	88b368a1c4	[PowerPC] Set setMaxAtomicSizeInBitsSupported appropriately for 32-bit PowerPC in PPCTargetLowering Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D86165	2020-09-08 21:21:14 -04:00
Mircea Trofin	4013bab9c4	[NFC][ThinLTO] EmbedBitcodeSection doesn't need the Config Instead, passing in the command line options, initialized to nullptr. In an upcoming patch, we can then use the parameter to pass actual command line options. Differential Revision: https://reviews.llvm.org/D87336	2020-09-08 17:14:44 -07:00
Krzysztof Parzyszek	055d209589	Handle masked loads and stores in MemoryLocation/Dependence Differential Revision: https://reviews.llvm.org/D87061	2020-09-08 19:08:44 -05:00
David Blaikie	be561fad1e	Remove unused variable(s)	2020-09-08 16:58:01 -07:00
Craig Topper	844e94a502	[SelectionDAGBuilder] Remove Unnecessary FastMathFlags temporary. Use SDNodeFlags instead. NFCI This was a missed simplication in D87200	2020-09-08 15:50:12 -07:00
David Blaikie	69da27c749	llvm-symbolizer: Add optional "start file" to match "start line" Since a function might have portions of its code coming from multiple different files, "start line" is ambiguous (it can't just be resolved relative to the file/line specified). Add start file to disambiguate it.	2020-09-08 15:40:58 -07:00
Craig Topper	b1e68f885b	[SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact.: This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130. This required adding a SDNodeFlags to SelectionDAG::getSetCC. Now we manage to contant fold some stuff undefs during the initial getNode that we don't do in later DAG combines. Differential Revision: https://reviews.llvm.org/D87200	2020-09-08 15:27:21 -07:00
Krzysztof Parzyszek	d183f47261	[Hexagon] Handle widening of truncation's operand with legal result Failing example: v8i8 = truncate v8i32. v8i8 is legal, but v8i32 was widened to HVX. Make sure that v8i8 does not get altered (even if it's changed to another legal type).	2020-09-08 16:07:39 -05:00
Nikita Popov	8453fbf088	[ValueTracking] Compute known bits of min/max intrinsics Implement known bits for the min/max intrinsics based on the recently added KnownBits primitives.	2020-09-08 21:08:17 +02:00
David Stenberg	17dce2fe43	[UnifyFunctionExitNodes] Remove unused getters, NFC The get{Return,Unwind,Unreachable}Block functions in UnifyFunctionExitNodes have not been used for many years, so just remove them. Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D87078	2020-09-08 20:42:28 +02:00
Nikita Popov	f6b87da0c7	[InstCombine] Fold comparison of abs with int min If the abs is poisoning, this is already folded to true/false. For non-poisoning abs, we can convert this to a comparison with the operand.	2020-09-08 20:23:03 +02:00
Nikita Popov	e97f3b1b43	[InstCombine] Fold abs of known negative operand If we know that the abs operand is known negative, we can replace it with a neg. To avoid computing known bits twice, I've removed the fold for the non-negative case from InstSimplify. Both the non-negative and the negative case are handled by InstCombine now, with one known bits call. Differential Revision: https://reviews.llvm.org/D87196	2020-09-08 20:14:35 +02:00
Xun Li	59a467ee4f	[Coroutine] Make dealing with alloca spills more robust D66230 attempted to fix a problem where when there are allocas used before CoroBegin. It keeps allocas and their uses stay in put if there are no escapse/changes to the data before CoroBegin. Unfortunately that's incorrect. Consider this code: %var = alloca i32 %1 = getelementptr .. %var; stays put %f = call i8* @llvm.coro.begin store ... %1 After this fix, %1 will now stay put, however if a store happens after coro.begin and hence modifies the content, this change will not be reflected in the coroutine frame (and will eventually be DCEed). To generalize the problem, if any alias ptr is created before coro.begin for an Alloca and that alias ptr is latter written into after coro.begin, it will lead to incorrect behavior. There are also a few other minor issues, such as incorrect dominate condition check in the ptr visitor, unhandled memory intrinsics and etc. Ths patch attempts to fix some of these issue, and make it more robust to deal with aliases. While visiting through the alloca pointer, we also keep track of all aliases created that will be used after CoroBegin. We track the offset of each alias, and then reacreate these aliases after CoroBegin using these offset. It's worth noting that this is not perfect and there will still be cases we cannot handle. I think it's impractical to handle all cases given the current design. This patch makes it more robust and should be a pure win. In the meantime, we need to think about what how to completely elimiante these issues, likely through the route as @rjmccall mentioned in D66230. Differential Revision: https://reviews.llvm.org/D86859	2020-09-08 10:59:13 -07:00
Craig Topper	e6bb4c8e7b	[X86] SSE4_A should only imply SSE3 not SSSE3 in the frontend. SSE4_1 and SSE4_2 due imply SSSE3. So I guess I got confused when switching the code to being table based in D83273. Fixes PR47464	2020-09-08 10:50:59 -07:00
Simon Pilgrim	0dacf3b5ac	RISCVMatInt.h - remove unnecessary includes. NFCI. Add APInt forward declaration and move include to RISCVMatInt.cpp	2020-09-08 18:25:24 +01:00
Volkan Keles	1242dd330d	GlobalISel: Combine `op undef, x` to 0 https://reviews.llvm.org/D86611	2020-09-08 09:46:38 -07:00
Heejin Ahn	d25c17f317	[WebAssembly] Fix fixEndsAtEndOfFunction for try-catch When the function return type is non-void and `end` instructions are at the very end of a function, CFGStackify's `fixEndsAtEndOfFunction` function fixes the corresponding block/loop/try's type to match the function's return type. This is applied to consecutive `end` markers at the end of a function. For example, when the function return type is `i32`, ``` block i32 ;; return type is fixed to i32 ... loop i32 ;; return type is fixed to i32 ... end_loop end_block end_function ``` But try-catch is a little different, because it consists of two parts: a try part and a catch part, and both parts' return type should satisfy the function's return type. Which means, ``` try i32 ;; return type is fixed to i32 ... block i32 ;; this should be changed i32 too! ... end_block catch ... end_try end_function ``` As you can see in this example, it is not sufficient to only `end` instructions at the end of a function; in case of `try`, we should check instructions before `catch`es, in case their corresponding `try`'s type has been fixed. This changes `fixEndsAtEndOfFunction`'s algorithm to use a worklist that contains a reverse iterator, each of which is a starting point for a new backward `end` instruction search. Fixes https://bugs.llvm.org/show_bug.cgi?id=47413. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D87207	2020-09-08 09:27:40 -07:00
Simon Pilgrim	3c83b967cf	LiveRegUnits.h - reduce MachineRegisterInfo.h include. NFC. We only need to include MachineInstrBundle.h, but exposes an implicit dependency in MachineOutliner.h. Also, remove duplicate includes from LiveRegUnits.cpp + MachineOutliner.cpp.	2020-09-08 17:27:00 +01:00
Ronak Chauhan	487a805310	[AMDGPU] Support disassembly for AMDGPU kernel descriptors Decode AMDGPU Kernel descriptors as assembler directives. Reviewed By: scott.linder, jhenderson, kzhuravl Differential Revision: https://reviews.llvm.org/D80713	2020-09-08 21:26:11 +05:30
Jonas Paulsson	6dc3e22b57	[DAGTypeLegalizer] Handle ZERO_EXTEND of promoted type in WidenVecRes_Convert. On SystemZ, a ZERO_EXTEND of an i1 vector handled by WidenVecRes_Convert() always ended up being scalarized, because the type action of the input is promotion which was previously an unhandled case in this method. This fixes https://bugs.llvm.org/show_bug.cgi?id=47132. Differential Revision: https://reviews.llvm.org/D86268 Patch by Eli Friedman. Review: Ulrich Weigand	2020-09-08 16:49:51 +02:00
Florian Hahn	c7b7c32f4a	[DSE,MemorySSA] Increase walker limit a bit. This slightly bumps the walker limit so that it covers more cases while not increasing compile-time too much: http://llvm-compile-time-tracker.com/compare.php?from=0fc1c2b51ba0cfb9145139af35be638333865251&to=91144a50ea4fa82c0c877e77784f60371640b263&stat=instructions	2020-09-08 14:55:46 +01:00
Simon Pilgrim	fcff2c32c0	X86CallLowering.cpp - improve auto const/pointer/reference qualifiers. NFCI. Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.	2020-09-08 13:01:23 +01:00
Simon Pilgrim	0729ae367a	X86DomainReassignment.cpp - improve auto const/pointer/reference qualifiers. NFCI. Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.	2020-09-08 13:01:23 +01:00
Xing GUO	25c3fa3f13	[DWARFYAML] Make the debug_ranges section optional. This patch makes the debug_ranges section optional. When we specify an empty debug_ranges section, yaml2obj only emits the section header. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D87263	2020-09-08 19:55:47 +08:00
Sam Tebbs	7aabb6ad77	[ARM][LowOverheadLoops] Remove modifications to the correct element count register After my patch at D86087, code that now uses the mov operand rather than the vctp operand will no longer remove modifications to the vctp operand as they should. This patch fixes that by explicitly removing modifications to the vctp operand rather than the register used as the element count.	2020-09-08 10:30:05 +01:00
Qiu Chaofan	8d9c13f37d	Revert "[PowerPC] Implement instruction clustering for stores" This reverts commit `3c0b325023`, (along with `ea795304` and `bb39eb9e`) since it breaks test with UB sanitizer.	2020-09-08 17:24:08 +08:00
Serge Guelton	38778e1087	Provide anchor for compiler extensions This patch is cherry-picked from 04b0a4e22e3b4549f9d241f8a9f37eebecb62a31, and amended to prevent an undefined reference to `llvm::EnableABIBreakingChecks'	2020-09-08 10:33:38 +02:00
Qiu Chaofan	bb39eb9e7f	[PowerPC] Fix getMemOperandWithOffsetWidth Commit `3c0b3250` introduced memory cluster under pwr10 target, but a check for operands was unexpectedly removed. This adds it back to avoid regression.	2020-09-08 15:35:25 +08:00
Simon Wallis	8ee1419ab6	[AARCH64][RegisterCoalescer] clang miscompiles zero-extension to long long Implement AArch64 variant of shouldCoalesce() to detect a known failing case and prevent the coalescing of a 32-bit copy into a 64-bit sign-extending load. Do not coalesce in the following case: COPY where source is bottom 32 bits of a 64-register, and destination is a 32-bit subregister of a 64-bit register, ie it causes the rest of the register to be implicitly set to zero. A mir test has been added. In the test case, the 32-bit copy implements a 32 to 64 bit zero extension and relies on the upper 32 bits being zeroed. Coalescing to the result of the 64-bit load meant overwriting the upper 32 bits incorrectly when the loaded byte was negative. Reviewed By: john.brawn Differential Revision: https://reviews.llvm.org/D85956	2020-09-08 08:04:52 +01:00
Mikael Holmen	ea795304ec	[PowerPC] Add parentheses to silence gcc warning Without gcc 7.4 warns with ../lib/Target/PowerPC/PPCInstrInfo.cpp:2284:25: warning: suggest parentheses around '&&' within '\|\|' [-Wparentheses] BaseOp1.isFI() && ~~~~~~~~~~~~~~~^~ "Only base registers and frame indices are supported."); ~	2020-09-08 08:39:57 +02:00
Andrew Wei	78071fb524	[LSR] Canonicalize a formula before insert it into the list In GenerateConstantOffsetsImpl, we may generate non canonical Formula if BaseRegs of that Formula is updated and includes a recurrent expr reg related with current loop while its ScaledReg is not. Patched by: mdchen Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D86939	2020-09-08 13:14:53 +08:00
Johannes Doerfert	711bf7dcf9	[Attributor][FIX] Don't crash on internalizing linkonce_odr hidden functions The CloneFunctionInto has implicit requirements with regards to the linkage and visibility of the function. We now update these after we did the CloneFunctionInto on the copy with the same linkage and visibility as the original.	2020-09-07 23:38:09 -05:00
Johannes Doerfert	e6208849c8	[Attributor][NFC] Change variable spelling	2020-09-07 23:38:09 -05:00
Johannes Doerfert	8637acac5a	[Attributor][NFC] Clang tidy: no else after continue	2020-09-07 23:38:08 -05:00
Johannes Doerfert	ff70c25d76	[Attributor][NFC] Expand `auto` types (clang-fix-it)	2020-09-07 23:38:08 -05:00
Johannes Doerfert	79651265b2	[Attributor][FIX] Properly return changed if the IR was modified Deleting or replacing anything is certainly a modification. This caused a later assertion in IPSCCP when compiling 400.perlbench with the new PM. I'm not sure how to test this.	2020-09-07 23:38:08 -05:00
Qiu Chaofan	3c0b325023	[PowerPC] Implement instruction clustering for stores On Power10, it's profitable to schedule some stores with adjacent target address together. This patch implements this feature. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D86754	2020-09-08 11:03:09 +08:00
Florian Hahn	efb8e156da	[DSE,MemorySSA] Add an early check for read clobbers to traversal. Depending on the benchmark, this early exit can save a substantial amount of compile-time: http://llvm-compile-time-tracker.com/compare.php?from=505f2d817aa8e07ba98e5fd4a8f6ff0666f89df1&to=eb4e441147f9b4b7a5fcbbc57428cadbe9e01f10&stat=instructions	2020-09-07 23:22:10 +01:00
Roman Lebedev	bb7d3af113	Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline This was reverted in `503deec218` because it caused gigantic increase (3x) in branch mispredictions in certain benchmarks on certain CPU's, see https://reviews.llvm.org/D84108#2227365. It has since been investigated and here are the results: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html > It's an amazingly severe regression, but it's also all due to branch > mispredicts (about 3x without this). The code layout looks ok so there's > probably something else to deal with. I'm not sure there's anything we can > reasonably do so we'll just have to take the hit for now and wait for > another code reorganization to make the branch predictor a bit more happy :) > > Thanks for giving us some time to investigate and feel free to recommit > whenever you'd like. > > -eric So let's just reland this. Original commit message: I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHI's, and some of the loops aren't even in a rotated form. After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure. Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike it's friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline. I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently i've picked the same one as for code sinking, but i suppose we could enable it as soon as right after loop rotation happens. Experimentation shows that this does indeed unsurprizingly help, more loops got rotated, although other issues remain elsewhere. Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, that may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late. As per benchmarks i've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure this will expose many more pre-existing missed optimizations, as usual :S llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprizingly) * size impact varies; for ThinLTO it's actually an improvement The largest fallout appears to be in GVN's load partial redundancy elimination, it spends much more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time. If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions It is likely to be sub-optimal for when optimizing for code size, so one might want to change tune pipeline by enabling sinking/hoisting when optimizing for size. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D84108 This reverts commit `503deec218`.	2020-09-08 00:24:03 +03:00
Nikita Popov	ddab4cd83e	[KnownBits] Avoid some copies (NFC) These lambdas don't need copies, use const reference.	2020-09-07 22:19:29 +02:00
Nikita Popov	9fb46a452d	[SCCP] Compute ranges for supported intrinsics For intrinsics supported by ConstantRange, compute the result range based on the argument ranges. We do this independently of whether some or all of the input ranges are full, as we can often still constrain the result in some way. Differential Revision: https://reviews.llvm.org/D87183	2020-09-07 22:16:06 +02:00
Craig Topper	da79b1eecc	[SelectionDAG][X86][ARM] Teach ExpandIntRes_ABS to use sra+add+xor expansion when ADDCARRY is supported. Rather than using SELECT instructions, use SRA, UADDO/ADDCARRY and XORs to expand ABS. This is the multi-part version of the sequence we use in LegalizeDAG. It's also the same as the Custom sequence uses for i64 on 32-bit and i128 on 64-bit. So we can remove the X86 customization. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D87215	2020-09-07 13:15:26 -07:00
Sanjay Patel	8b30067919	[InstCombine] improve fold of pointer differences This was supposed to be an NFC cleanup, but there's a real logic difference (did not drop 'nsw') visible in some tests in addition to an efficiency improvement. This is because in the case where we have 2 GEPs, the code was always swapping the operands and negating the result. But if we have 2 GEPs, we should never need swapping/negation AFAICT. This is part of improving flags propagation noticed with PR47430.	2020-09-07 15:54:32 -04:00
Craig Topper	01b3e16757	[X86] Use the same sequence for i128 ISD::ABS on 64-bit targets as we use for i64 on 32-bit targets. Differential Revision: https://reviews.llvm.org/D87214	2020-09-07 11:14:05 -07:00
Sanjay Patel	7a06b166b1	[DAGCombiner] allow more store merging for non-i8 truncated ops This is a follow-up suggested in D86420 - if we have a pair of stores in inverted order for the target endian, we can rotate the source bits into place. The "be_i64_to_i16_order" test shows a limitation of the current function (which might be avoided if we integrate this function with the other cases in mergeConsecutiveStores). In the earlier "be_i64_to_i16" test, we skip the first 2 stores because we do not match the full set as consecutive or rotate-able, but then we reach the last 2 stores and see that they are an inverted pair of 16-bit stores. The "be_i64_to_i16_order" test alters the program order of the stores, so we miss matching the sub-pattern. Differential Revision: https://reviews.llvm.org/D87112	2020-09-07 14:12:36 -04:00
Eric Astor	a3ec4a3158	[ms] [llvm-ml] Allow use of locally-defined variables in expressions MASM allows variables defined by equate statements to be used in expressions. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D86946	2020-09-07 14:00:14 -04:00
Eric Astor	2feb6e9b84	[ms] [llvm-ml] Fix STRUCT field alignment MASM aligns fields to the _minimum_ of the STRUCT alignment value and the size of the next field. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D86945	2020-09-07 13:58:59 -04:00
Eric Astor	e52e7ad54d	[ms] [llvm-ml] Add support for bitwise named operators (AND, NOT, OR) in MASM Add support for expressions of the form '1 or 2', etc. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D86944	2020-09-07 13:57:54 -04:00
Simon Pilgrim	5ea9e655ef	VPlan.h - remove unnecessary forward declarations. NFCI. Already defined in includes.	2020-09-07 18:35:06 +01:00
Simon Pilgrim	4e89a0ab02	MipsISelLowering.h - remove CCState/CCValAssign forward declarations. NFCI. These are already defined in the CallingConvLower.h include.	2020-09-07 18:15:26 +01:00
Simon Pilgrim	95ca3aacf0	BTFDebug.h - reduce MachineInstr.h include to forward declaration. NFCI.	2020-09-07 17:51:13 +01:00

1 2 3 4 5 ...

138936 Commits