llvm-project

Commit Graph

Author	SHA1	Message	Date
Shinji Okumura	21e4b9b204	[Attributor][NFC] Add tests to range.ll Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86128	2020-08-19 15:01:14 +09:00
Roman Lebedev	78bd4231bf	[InstCombine] PHI-aware aggregate reconstruction: properly handle duplicate predecessors While it may seem like we can just "deduplicate" the case where some basic block happens to be a predecessor more than once, which happens for e.g. switches, that is not correct thing to do. We must actually add a PHI operand for each predecessor. This was initially reported to me by David Major as a clang crash during gecko build for android.	2020-08-19 01:00:42 +03:00
Sanjay Patel	0b98a59fed	[VectorCombine] add tests for vector loads; NFC	2020-08-18 16:23:33 -04:00
David Green	b8088ada05	[LV] Predicated reduction tests. NFC	2020-08-18 16:02:21 +01:00
Sanjay Patel	c98fcba55c	[SLP] remove instcombine dependency from regression test; NFC InstCombine doesn't do that much here - sinks some instructions and improves alignments - but that should not be part of the SLP pass unit testing.	2020-08-18 10:18:22 -04:00
Sanjay Patel	139da9c4d7	[InstCombine] fold fabs of select with negated operand This is the FP example shown in: https://bugs.llvm.org/PR39474	2020-08-18 09:23:07 -04:00
Sanjay Patel	e0aa335334	[InstCombine] add tests for fneg+fabs; NFC	2020-08-18 09:23:07 -04:00
Sam Parker	5eb705d5dc	[NFC] Add some more Arm tests for IndVarSimplify Copy some generic functions and apply minsize for arm.	2020-08-18 11:24:35 +01:00
Shinji Okumura	5e361e2aa4	[Attributor] Deduce noundef attribute This patch introduces a new abstract attribute `AANoUndef` which corresponds to `noundef` IR attribute and deduce them. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85184	2020-08-18 18:05:54 +09:00
Johannes Doerfert	8abd69aa9e	[Attributor] Bail early if AAMemoryLocation cannot derive anything Before this change we looked through all memory operations in a function even if the first was an unknown call that could do anything. This did cost a lot of time but there is little use to do so. We also avoid creating AAs for things that we would have looked at in case no other AA will; that is the reason for the test changes. Running only the attributor-cgscc pass on a IR version of `llvm-test-suite/MultiSource/Applications/SPASS/clause.c` reduced the time we spend in `AAMemoryLocation::update` from 4% total to 0.9% (disclaimer: no accurate measurements).	2020-08-17 23:36:36 -05:00
Johannes Doerfert	b27bdf955a	[Attributor][FIX] Handle function pointers properly in AANonNull Before we tired to create a dominator tree for a declaration when we wanted to determine if the function pointer is `nonnull`. We now avoid looking at global values if `Value::getPointerDereferenceableBytes` not already determined `nonnull`.	2020-08-17 23:36:35 -05:00
Hamilton Tobon Mosquera	496f8e5b36	[OpenMPOpt][HideMemTransfersLatency] Split __tgt_target_data_begin_mapper into its "issue" and "wait" counterparts. WIP that tries to hide the latency of runtime calls that involve host to device memory transfers by splitting them into their "issue" and "wait" versions. The "issue" is moved upwards as much as possible. The "wait" is moved downards as much as possible. The "issue" issues the memory transfer asynchronously, returning a handle. The "wait" waits in the returned handle for the memory transfer to finish. We still lack of the movement.	2020-08-17 20:56:10 -05:00
Mircea Trofin	62fc44ca3c	[MLInliner] In development mode, obtain the output specs from a file Different training algorithms may produce models that, besides the main policy output (i.e. inline/don't inline), produce additional outputs that are necessary for the next training stage. To facilitate this, in development mode, we require the training policy infrastructure produce a description of the outputs that are interesting to it, in the form of a JSON file. We special-case the first entry in the JSON file as the inlining decision - we care about its value, so we can guide inlining during training - but treat the rest as opaque data that we just copy over to the training log. Differential Revision: https://reviews.llvm.org/D85674	2020-08-17 16:56:47 -07:00
Johannes Doerfert	19bd4ef157	[Attributor] Properly use the call site argument position	2020-08-17 18:21:09 -05:00
Johannes Doerfert	5dfc207c53	[Attributor][FIX] Do not request an AANonNull for non-pointer types	2020-08-17 18:21:08 -05:00
Hamilton Tobon Mosquera	ad03d0647f	[OpenMPOpt][HideMemTransfersLatency] Update regression test with new runtime calls.	2020-08-17 18:16:16 -05:00
Roman Lebedev	03127f795b	[InstCombine] PHI-aware aggregate reconstruction: correctly detect "use" basic block While the original implementation added in D85787 / `ae7f08812e` is not incorrect, it is known to be suboptimal. In particular, it is not incorrect to use the basic block in which the original `insertvalue` instruction is located as the merge point, that is not necessarily optimal, as `@test6` shows. We should look at all the AggElts, and, if they are all defined in the same basic block, then that is the basic block we should use. On RawSpeed library, this catches +4% (+50) more cases. On vanilla LLVM test-suits, this catches +12% (+92) more cases.	2020-08-18 00:45:18 +03:00
Roman Lebedev	4973ca3eac	[NFC][InstCombine] PHI-aware aggregate reconstruction: insert PHI node manually This is NFC at the moment, because right now we always insert the PHI into the same basic block in which the original `insertvalue` instruction is, but that will change. Also, fixes addition of the suffix to the value names.	2020-08-18 00:45:17 +03:00
Roman Lebedev	4791cbdaf9	[NFC][InstCombine] Add more tests for aggregate reconstruction w/ PHI handling Even without handling several layers of PHI nodes, we can handle more cases, as `@test6` shows.	2020-08-18 00:45:17 +03:00
Vitaly Buka	516328860c	[safe-stack] Fix typo in test command line	2020-08-17 13:38:39 -07:00
Tyker	a79e604462	[AssumeBundles] Fix Bug in Assume Queries this bug was causing miscompile. now clang cant properly selfhost with -mllvm --enable-knowledge-retention Reviewed By: jdoerfert, lebedev.ri Differential Revision: https://reviews.llvm.org/D83507	2020-08-17 21:36:53 +02:00
Dávid Bolvanský	0f14b2e6cb	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit `50c743fa71`. Patch will be split to smaller ones.	2020-08-17 20:44:33 +02:00
Florian Hahn	139810449b	[DSE,MemorySSA] Account for ScanLimit == 0 on entry. Currently the code does not account for the fact that getDomMemoryDef can be called with ScanLimit == 0, if we reached the limit while processing an earlier access. Also tighten the check a bit more and bump the scan limit now that it is handled properly. In some cases, this brings a 2x speedup in terms of compile-time.	2020-08-17 17:55:14 +01:00
Sam Parker	dad04e62f1	[NFC] run update test script On Transforms/LoopUnroll/runtime-small-upperbound.ll	2020-08-17 13:54:28 +01:00
Sanjay Patel	e6b6787d01	[InstCombine] fold abs(X)/X to cmp+select The backend can convert the select-of-constants to bit-hack shift+logic if desirable. https://alive2.llvm.org/ce/z/pgJT6E define i8 @src(i8 %x) { %0: %a = abs i8 %x, 1 %d = sdiv i8 %x, %a ret i8 %d } => define i8 @tgt(i8 %x) { %0: %cond = icmp sgt i8 %x, 255 %r = select i1 %cond, i8 1, i8 255 ret i8 %r } Transformation seems to be correct!	2020-08-17 08:01:28 -04:00
Sanjay Patel	61512ddd2d	[InstCombine] add tests for sdiv-of-abs; NFC	2020-08-17 08:01:27 -04:00
Sam Parker	613d8f2953	[NFC] Run update script on test Update IndVarSimplify/no-iv-rewrite.ll	2020-08-17 12:53:14 +01:00
Cullen Rhodes	2ccde3c96b	[InlineCost] Fix scalable vectors in visitAlloca Discovered as part of the VLS type work (see D85128). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85848	2020-08-17 10:34:27 +00:00
Vitaly Buka	e10e7829bf	[StackSafety] Skip ambiguous lifetime analysis If we can't identify alloca used in lifetime marker we need to assume to worst case scenario. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D84630	2020-08-16 18:05:52 -07:00
Roman Lebedev	ae7f08812e	[InstCombine] Aggregate reconstruction simplification (PR47060) This pattern happens in clang C++ exception lowering code, on unwind branch. We end up having a `landingpad` block after each `invoke`, where RAII cleanup is performed, and the elements of an aggregate `{i8, i32}` holding exception info are `extractvalue`'d, and we then branch to common block that takes extracted `i8` and `i32` elements (via `phi` nodes), form a new aggregate, and finally `resume`'s the exception. The problem is that, if the cleanup block is effectively empty, it shouldn't be there, there shouldn't be that `landingpad` and `resume`, said `invoke` should be a `call`. Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`. But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft, while it is pointless, does not look like "empty cleanup block". So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception is has higher cost than it could have on unwind branch :S This doesn't happen that often, but it will basically happen once per C++ function with complex CFG that called more than one other function that isn't known to be `nounwind`. I think, this is a missing fold in InstCombine, so i've implemented it. I think, the algorithm/implementation is rather self-explanatory: 1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate. 2. For each element, try to find from which aggregate it was extracted. If it was extracted from the aggregate with identical type, from identical element index, great. 3. If all elements were found to have been extracted from the same aggregate, then we can just use said original source aggregate directly, instead of re-creating it. 4. If we fail to find said aggregate when looking only in the current block, we need be PHI-aware - we might have different source aggregate when coming from each predecessor. I'm not sure if this already handles everything, and there are some FIXME's, i'll deal with all that later in followups. I'd be fine with going with post-commit review here code-wise, but just in case there are thoughts, i'm posting this. On RawSpeed, for example, this has the following effect: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| abs(%) \| \|---------------------------------------------------\|---------:\|---------:\|------:\|--------:\|-------:\| \| instcombine.NumAggregateReconstructionsSimplified \| 0 \| 1253 \| 1253 \| 0.00% \| 0.00% \| \| simplifycfg.NumInvokes \| 948 \| 1355 \| 407 \| 42.93% \| 42.93% \| \| instcount.NumInsertValueInst \| 4382 \| 3210 \| -1172 \| -26.75% \| 26.75% \| \| simplifycfg.NumSinkCommonCode \| 574 \| 458 \| -116 \| -20.21% \| 20.21% \| \| simplifycfg.NumSinkCommonInstrs \| 1154 \| 921 \| -233 \| -20.19% \| 20.19% \| \| instcount.NumExtractValueInst \| 29017 \| 26397 \| -2620 \| -9.03% \| 9.03% \| \| instcombine.NumDeadInst \| 166618 \| 174705 \| 8087 \| 4.85% \| 4.85% \| \| instcount.NumPHIInst \| 51526 \| 50678 \| -848 \| -1.65% \| 1.65% \| \| instcount.NumLandingPadInst \| 20865 \| 20609 \| -256 \| -1.23% \| 1.23% \| \| instcount.NumInvokeInst \| 34023 \| 33675 \| -348 \| -1.02% \| 1.02% \| \| simplifycfg.NumSimpl \| 113634 \| 114708 \| 1074 \| 0.95% \| 0.95% \| \| instcombine.NumSunkInst \| 15030 \| 14930 \| -100 \| -0.67% \| 0.67% \| \| instcount.TotalBlocks \| 219544 \| 219024 \| -520 \| -0.24% \| 0.24% \| \| instcombine.NumCombined \| 644562 \| 645805 \| 1243 \| 0.19% \| 0.19% \| \| instcount.TotalInsts \| 2139506 \| 2135377 \| -4129 \| -0.19% \| 0.19% \| \| instcount.NumBrInst \| 156988 \| 156821 \| -167 \| -0.11% \| 0.11% \| \| instcount.NumCallInst \| 1206144 \| 1207076 \| 932 \| 0.08% \| 0.08% \| \| instcount.NumResumeInst \| 5193 \| 5190 \| -3 \| -0.06% \| 0.06% \| \| asm-printer.EmittedInsts \| 948580 \| 948299 \| -281 \| -0.03% \| 0.03% \| \| instcount.TotalFuncs \| 11509 \| 11507 \| -2 \| -0.02% \| 0.02% \| \| inline.NumDeleted \| 97595 \| 97597 \| 2 \| 0.00% \| 0.00% \| \| inline.NumInlined \| 210514 \| 210522 \| 8 \| 0.00% \| 0.00% \| ``` So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half, and there is a very apparent decrease in instruction and basic block count. On vanilla llvm-test-suite: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| abs(%) \| \|---------------------------------------------------\|---------:\|---------:\|------:\|--------:\|-------:\| \| instcombine.NumAggregateReconstructionsSimplified \| 0 \| 744 \| 744 \| 0.00% \| 0.00% \| \| instcount.NumInsertValueInst \| 2705 \| 2053 \| -652 \| -24.10% \| 24.10% \| \| simplifycfg.NumInvokes \| 1212 \| 1424 \| 212 \| 17.49% \| 17.49% \| \| instcount.NumExtractValueInst \| 21681 \| 20139 \| -1542 \| -7.11% \| 7.11% \| \| simplifycfg.NumSinkCommonInstrs \| 14575 \| 14361 \| -214 \| -1.47% \| 1.47% \| \| simplifycfg.NumSinkCommonCode \| 6815 \| 6743 \| -72 \| -1.06% \| 1.06% \| \| instcount.NumLandingPadInst \| 14851 \| 14712 \| -139 \| -0.94% \| 0.94% \| \| instcount.NumInvokeInst \| 27510 \| 27332 \| -178 \| -0.65% \| 0.65% \| \| instcombine.NumDeadInst \| 1438173 \| 1443371 \| 5198 \| 0.36% \| 0.36% \| \| instcount.NumResumeInst \| 2880 \| 2872 \| -8 \| -0.28% \| 0.28% \| \| instcombine.NumSunkInst \| 55187 \| 55076 \| -111 \| -0.20% \| 0.20% \| \| instcount.NumPHIInst \| 321366 \| 320916 \| -450 \| -0.14% \| 0.14% \| \| instcount.TotalBlocks \| 886816 \| 886493 \| -323 \| -0.04% \| 0.04% \| \| instcount.TotalInsts \| 7663845 \| 7661108 \| -2737 \| -0.04% \| 0.04% \| \| simplifycfg.NumSimpl \| 886791 \| 887171 \| 380 \| 0.04% \| 0.04% \| \| instcount.NumCallInst \| 553552 \| 553733 \| 181 \| 0.03% \| 0.03% \| \| instcombine.NumCombined \| 3200512 \| 3201202 \| 690 \| 0.02% \| 0.02% \| \| instcount.NumBrInst \| 741794 \| 741656 \| -138 \| -0.02% \| 0.02% \| \| simplifycfg.NumHoistCommonInstrs \| 14443 \| 14445 \| 2 \| 0.01% \| 0.01% \| \| asm-printer.EmittedInsts \| 7978085 \| 7977916 \| -169 \| 0.00% \| 0.00% \| \| inline.NumDeleted \| 73188 \| 73189 \| 1 \| 0.00% \| 0.00% \| \| inline.NumInlined \| 291959 \| 291968 \| 9 \| 0.00% \| 0.00% \| ``` Roughly similar effect, less instructions and blocks total. See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e. Compile-time wise, this appears to be roughly geomean-neutral: http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions And this is a win size-wize in general: http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text See https://bugs.llvm.org/show_bug.cgi?id=47060 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D85787	2020-08-16 23:27:56 +03:00
Sanjay Patel	29e1d16a3e	Revert "[PhaseOrdering] add test for memcpy removal (PR47114); NFC" This reverts commit `babb59496b`. This test addition was queued up with some unrelated changes, but it seems more likely that we need to fix something internal to -memcpyopt. Also, I'm not sure if including target-specifc attributes in a generic regression test dir will cause bot problems.	2020-08-16 09:52:33 -04:00
Sanjay Patel	3ffb751f3d	[InstCombine] fold copysign with fabs/fneg operand We already get this in the backend, but we need to do it in IR too to consistently get yet more copysign transforms.	2020-08-16 08:53:47 -04:00
Sanjay Patel	4d5fdff434	[InstCombine] add tests for copysign; NFC	2020-08-16 08:53:47 -04:00
Sanjay Patel	babb59496b	[PhaseOrdering] add test for memcpy removal (PR47114); NFC	2020-08-16 08:53:47 -04:00
Wenlei He	577e58bcc7	[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context. A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose. This is a resubmit of https://reviews.llvm.org/D83743	2020-08-15 20:17:21 -07:00
Luofan Chen	87a85f3d57	[Attributor] Use internalized version of non-exact functions This patch internalize non-exact functions and replaces of their uses with the internalized version. Doing this enables the analysis of non-exact functions. We can do this because some non-exact functions with the same name whose linkage is `linkonce_odr` or `weak_odr` should have the same semantics, so we can safely internalize and replace use of them (the result of the other version of this function should be the same.). Note that not all functions can be internalized, e.g., function with `linkonce` or `weak` linkage. For now when specified in commandline, we internalize all functions that meet the requirements without calculating the cost of such internalzation. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D84167	2020-08-15 20:23:38 +08:00
Dávid Bolvanský	f134fc4f1b	Reland "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)"	2020-08-15 12:14:57 +02:00
Martin Storsjö	3e7403a134	Revert "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)" This reverts commit `6dbf0cfcf7`. That commit caused failed assertions, e.g. like this: $ cat sprintf-strcpy.c char ptr; void func(void) { ptr += sprintf(ptr, "%s", ""); } $ clang -c sprintf-strcpy.c -O2 -target x86_64-linux-gnu clang: ../lib/IR/Value.cpp:473: void llvm::Value::doRAUW(llvm::Value, llvm::Value::ReplaceMetadataUses): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.	2020-08-15 09:35:11 +03:00
Philip Reames	1621c004da	[Tests] Be consistent w/definition of statepoint-example These tests use the statepoint-example builtin gc which expects address space #1 to the only non-integral address space. The fact the test used as=0 happened to work, but was caught by a downstream assert. (Literally years ago, I just happened to notice the XFAIL and fix it now.)	2020-08-14 20:45:48 -07:00
Philip Reames	48f4312d4e	Remove inline gc arguments from statepoints The "gc-live" operand bundles were recently added, and all tests have been updated to use that format. A migration period was provided, though it's worth noting these intrinsics are experimental, so formally there is no compatibile requirement. This is an extension to `a96fc46`. "gc-live" hadn't been implemented at the point that patch was initially posted.	2020-08-14 19:44:24 -07:00
Dávid Bolvanský	f62de7c9c7	[SLC] Transform strncpy(dst, "text", C) to memcpy(dst, "text\0\0\0", C) for C <= 128 only Transformation creates big strings for big C values, so bail out for C > 128. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86004	2020-08-15 01:53:32 +02:00
Arthur Eubanks	72effd8d5b	[test][LoopUnroll] Cleanup FullUnroll.ll This is in preparation for enabling proper handling of optnone under the NPM. Most optimizations won't run on an optnone function. Previously the test would rely on lots of optimizations to optimize the IR into a simple infinite loop. This is an optnone function, so clearly that shouldn't be the case. This IR was found by printing the module before the LoopFullUnrollerPass ran. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D85578	2020-08-14 16:05:04 -07:00
Dávid Bolvanský	6dbf0cfcf7	[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str) Transform sprintf(dst, "%s", str) -> strcpy(dst, str) if result is unused Avoid sprintf(dest, "%s", str) -> llvm.memcpy(align 1 dest, align 1 str, strlen(str)+1) if optimizing for size. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85963	2020-08-14 23:48:53 +02:00
Shinji Okumura	5f55a8193c	[Attributor] Implement AAPotentialValues This patch provides an implementation of `AAPotentialValues`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85632	2020-08-14 20:51:14 +09:00
Sam Parker	725400f993	[NFCI][SimpleLoopUnswitch] Adjust CostKind query When getUserCost was transitioned to use an explicit CostKind, TCK_CodeSize was used even though the original kind was implicitly SizeAndLatency so restore this behaviour. We now only query for CodeSize when optimising for minsize. I expect this to not change anything as, I think all, targets will currently return the same value for CodeSize and SizeLatency. Indeed I see no changes in the test suite for Arm, AArch64 and X86. Differential Revision: https://reviews.llvm.org/D85829	2020-08-14 07:54:20 +01:00
Arthur Eubanks	48cd5b72b1	Revert "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)" This reverts commit `ab9fc8bae8`. Incorrect transformation if the result is used. Causes breakages, e.g. http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/8193/	2020-08-13 21:05:03 -07:00
Thomas Lively	b182ccfc2d	[WebAssembly] Add missing lit.local.cfg	2020-08-13 16:33:52 -07:00
Arthur Eubanks	41f49736a9	[ConstProp] Handle insertelement constants Previously ConstantFoldExtractElementInstruction() would only work with insertelement instructions, not contants. This properly handles insertelement constants as well. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85865	2020-08-13 15:59:17 -07:00
Dávid Bolvanský	ab9fc8bae8	[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str) Solves 46489	2020-08-14 00:05:55 +02:00
Thomas Lively	d53d952810	[WebAssembly] Allow inlining functions with different features Allow inlining only when the Callee has a subset of the Caller's features. In principle, we should be able to inline regardless of any features because WebAssembly supports features at module granularity, not function granularity, but without this restriction it would be possible for a module to "forget" about features if all the functions that used them were inlined. Requested in PR46812. Differential Revision: https://reviews.llvm.org/D85494	2020-08-13 13:57:43 -07:00
Dávid Bolvanský	18910c4cb5	[Tests] Update strncpy tests	2020-08-13 22:37:44 +02:00
Dávid Bolvanský	5ef2287d36	[SLC] Optimize strncpy(a, a, C) to memcpy(a, a000, C) Solves PR47154	2020-08-13 22:22:51 +02:00
Nikita Popov	524f591a39	[InstSimplify] Add tests for assume with min/max intrinsic (NFC) If we assume one of the operands is smaller/greater, then min/max may be simplified.	2020-08-13 22:10:07 +02:00
Aditya Kumar	1a8c9cd1d9	Fix PR45442: Bail out when MemorySSA information is not available Reviewers: sebpop, uabelho, fhahn Reviewed by: fhahn Differential Revision: https://reviews.llvm.org/D85881	2020-08-13 11:25:58 -07:00
Dávid Bolvanský	50c743fa71	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 19:54:27 +02:00
Aditya Kumar	44716856db	Fix PR45442: Bail out when MemorySSA information is not available	2020-08-13 09:31:18 -07:00
Bjorn Pettersson	11446b02c7	[VectorCombine] Fix for non-zero addrspace when creating vector load from scalar load This is a fixup to commit `43bdac2906`, to make sure the address space from the original load pointer is retained in the vector pointer. Resolves problem with Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed. due to address space mismatch. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D85912	2020-08-13 18:25:32 +02:00
Serguei Katkov	98ba0a5ffe	[InstCombine] Handle gc.relocate(null) in one iteration InstCombine adds users of transformed instruction to working list to process on the same iteration. However gc.relocate may have a hidden user (next gc.relocate) which is connected through gc.statepoint intrinsic and there is no direct def-use chain between them. In this case if the next gc.relocation is already processed it will not be added to worklist and will not be able to be processed on the same iteration. Let's we have the following case: A = gc.relocate(null) B = statepoint(A) C = gc.relocate(B, hidden(A)) If C is already considered then after replacement of A with null, statepoint B instruction will be added to the queue but not C. C can be processed only on the next iteration. If the chain of relocation is pretty long the many iteration may be required. This change is to reduce the number of iteration to meet the latest changes related to reducing infinite loop threshold. This is a quick (not best) fix. In the follow up patches I plan to move gc relocation handling into statepoint handler. This should also help to remove unused gc live entries in statepoint bundle. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D75598	2020-08-13 23:16:27 +07:00
Roman Lebedev	3bd2513ebd	[NFC] Add test case showing the miscompile being fixed by D83507 See https://reviews.llvm.org/D83507	2020-08-13 16:13:33 +03:00
David Stenberg	e8ebebb0bd	[InstCombine] Fix incorrect Modified status When removing instructions from unreachable blocks, and only debug info intrinsics were removed, InstCombine could incorrectly return a false Modified status. This is fixed by making removeAllNonTerminatorAndEHPadInstructions() also return how many debug info intrinsics that were removed, and take that into account. This was caught using the check introduced by D80916. Reviewed By: majnemer Differential Revision: https://reviews.llvm.org/D85839	2020-08-13 15:10:41 +02:00
Dávid Bolvanský	f9264995a6	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit `44587e2f7e`. Sanitizer tests need to be updated.	2020-08-13 14:37:40 +02:00
Dávid Bolvanský	44587e2f7e	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 14:23:58 +02:00
Florian Hahn	3b0878a370	[DSE,MSSA] Fix crash when using tryToMergePartialOverlappingStores. We are re-using tryToMergePartialOverlappingStores, which requires earlier to domiante Later. In the long run, tryToMergeParialOverlappingStores should be re-written using MemorySSA. Fixes PR46513.	2020-08-13 12:07:56 +01:00
Florian Hahn	3948341fa5	[InstCombine] Precommit tests for PR47149.	2020-08-13 10:36:52 +01:00
Ali Tamur	0581c0b0ee	Revert "[SCEV] Look through single value PHIs." This reverts commit `e441b7a7a0`. This patch causes a compile error in tensorflow opensource project. The stack trace looks like: Point of crash: llvm/include/llvm/Analysis/LoopInfoImpl.h : line 35 (gdb) ptype this type = const class llvm::LoopBase<llvm::BasicBlock, llvm::Loop> [with BlockT = llvm::BasicBlock, LoopT = llvm::Loop] (gdb) p this $1 = {ParentLoop = 0x0, SubLoops = std::vector of length 0, capacity 0, Blocks = std::vector of length 0, capacity 1, DenseBlockSet = {<llvm::SmallPtrSetImpl<llvm::BasicBlock const>> = {<llvm::SmallPtrSetImplBase> = {<llvm::DebugEpochBase> = {Epoch = 3}, SmallArray = 0x1b2bf6c8, CurArray = 0x1b2bf6c8, CurArraySize = 8, NumNonEmpty = 0, NumTombstones = 0}, <No data fields>}, SmallStorage = {0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, IsInvalid = true} (gdb) p this->DenseBlockSet->CurArray $2 = (const void *) 0xfffffffffffffffe I will try to get a case from tensorflow or use creduce to get a small case.	2020-08-12 23:13:24 -07:00
Nikita Popov	eba5f5f798	[ValueTracking] Add abs intrinsics support to computeConstantRange() Implementation is the same as for SPF_ABS.	2020-08-12 22:28:46 +02:00
Nikita Popov	6446c11840	[InstSimplify] Add additional abs intrinsic icmp tests (NFC) While abs >= 0 already folds, some variations thereon don't.	2020-08-12 22:28:46 +02:00
Nikita Popov	7397a019b8	[InstSimplify] Extract abs intrinsic tests into separate file (NFC) Also move some tests from InstCombine to InstSimplify, as they are already handled by InstSimplify.	2020-08-12 22:28:46 +02:00
Nikita Popov	e2040d38a1	[ValueTracking] Support min/max intrinsics in computeConstantRange() The implementation is the same as for the SPF_* case.	2020-08-12 22:07:29 +02:00
Nikita Popov	c1abd47aa1	[InstSimplify] Add tests for icmp of min/max with constants (NFC) Test the case where the constants are not the same, but the result is still known.	2020-08-12 22:07:29 +02:00
Sanjay Patel	23bd33c6ac	[InstCombine] prefer xor with -1 because 'not' is easier to understand (PR32706) This is a retry of rL300977 which was reverted because of infinite loops. We have fixed all of the known places where that would happen, but there's still a chance that this patch will cause infinite loops. This matches the demanded bits behavior in the DAG and should fix: https://bugs.llvm.org/show_bug.cgi?id=32706 Differential Revision: https://reviews.llvm.org/D32255	2020-08-12 15:50:33 -04:00
Sanjay Patel	0a1514d7ca	[InstCombine] add test for 'not' vs 'xor'; NFC	2020-08-12 15:50:33 -04:00
Roman Lebedev	12d93a27e7	[InstCombine] Sanitize undef vector constant to 1 in X(2^C) with X << C (PR47133) While xundef is undef, shift-by-undef is poison, which we must avoid introducing. Also log2(iN undef) is NOT iN undef, because log2(iN undef) u< N. See https://bugs.llvm.org/show_bug.cgi?id=47133	2020-08-12 22:06:53 +03:00
Craig Topper	a7a06ded8b	Recommit "[InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms" and its follow up patches This recommits the following patches now that D85684 has landed `1cf6f210a2` [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison. `469da663f2` [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison `122b0640fc` [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison `ac0af12ed2` [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison `9b1e95329a` [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms	2020-08-12 10:45:27 -07:00
Sanjay Patel	cc892fd9f4	[VectorCombine] early exit if target has no vector registers Based on post-commit discussion in: D81766 Other vectorization passes (SLP and Loop) use this TTI API similarly.	2020-08-12 09:22:31 -04:00
Sanjay Patel	89a7f64afc	[VectorCombine] add test for x86 target with SSE disabled; NFC	2020-08-12 09:22:31 -04:00
Sanjay Patel	912c09e845	[InstCombine] eliminate a pointer cast around insertelement I'm not sure if this solves PR46839 completely, but reducing the casting should help: https://bugs.llvm.org/show_bug.cgi?id=46839 Differential Revision: https://reviews.llvm.org/D85647	2020-08-12 09:08:17 -04:00
Sanjay Patel	b97e402ca5	[VectorCombine] add test for Hexagon that would crash; NFC This test verifies the code change from: rGb0b95dab1ce2 (although that would not be true if PR47128 is fixed)	2020-08-12 08:38:20 -04:00
Sam Parker	ea8448e361	[LoopUnroll] Adjust CostKind query When TTI was updated to use an explicit cost, TCK_CodeSize was used although the default implicit cost would have been the hand-wavey cost of size and latency. So, revert back to this behaviour. This is not expected to have (much) impact on targets since most (all?) of them return the same value for SizeAndLatency and CodeSize. When optimising for size, the logic has been changed to query CodeSize costs instead of SizeAndLatency. This patch also adds a testing option in the unroller so that OptSize thresholds can be specified. Differential Revision: https://reviews.llvm.org/D85723	2020-08-12 12:56:09 +01:00
Cullen Rhodes	511d5aaca3	[Transforms][SROA] Skip uses of allocas where the type is scalable When visiting load and store instructions in SROA skip scalable vectors. This is relevant in the implementation of the 'arm_sve_vector_bits' attribute that is used to define VLS types, where an alloca of a fixed-length vector could be bitcasted to scalable. See D85128 for more information. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85725	2020-08-12 09:35:48 +00:00
Florian Hahn	e441b7a7a0	[SCEV] Look through single value PHIs. Now that SCEVExpander can preserve LCSSA form, we do not have to worry about LCSSA form when trying to look through PHIs. SCEVExpander will take care of inserting LCSSA PHI nodes as required. This increases precision of the analysis in some cases. Reviewed By: mkazantsev, bmahjour Differential Revision: https://reviews.llvm.org/D71539	2020-08-12 10:03:42 +01:00
Johannes Doerfert	3a033921ed	[Attributor][NFC] Reformat tests after D85099 Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D85700	2020-08-12 01:04:19 -05:00
Vedant Kumar	30c1633386	Revert "[Instruction] Add updateLocationAfterHoist helper" This reverts commit `4a646ca9e2`. This is causing some bots to fail with "!dbg attachment points at wrong subprogram for function", like: http://lab.llvm.org:8011/builders/sanitizer-windows/builds/67958/steps/stage%201%20check/logs/stdio	2020-08-11 14:54:09 -07:00
Kazu Hirata	cfdc96714b	[Instcombine] Fix uses of undef (PR46940) Without this patch, we attempt to distribute And over Xor even in unsafe circumstances like so: undef & (true ^ true) ==> (undef & true) ^ (undef & true) and evaluate it to undef instead of false. Note that "true ^ true" may show up implicitly with one true being part of a PHI node. This patch fixes the problem by teaching SimplifyUsingDistributiveLaws to not use undef as part of simplifications. Reviewers: spatel, aqjune, nikic, lebedev.ri, fhahn, jdoerfert Differential Revision: https://reviews.llvm.org/D85687	2020-08-11 14:13:32 -07:00
Vedant Kumar	4a646ca9e2	[Instruction] Add updateLocationAfterHoist helper Introduce a helper on Instruction which can be used to update the debug location after hoisting. Use this in GVN and LICM, where we were mistakenly introducing new line 0 locations after hoisting (the docs recommend dropping the location in this case). For more context, see the discussion in https://reviews.llvm.org/D60913. Differential Revision: https://reviews.llvm.org/D85670	2020-08-11 14:05:20 -07:00
Roman Lebedev	e74e8b2b69	[NFC][InstCombine] Add tests for PHI merging/aggregate reconstruction (PR47060) We should be able to see that the new aggregate we have produced is identical to the source aggregate from which we've extracted the elements that we used to form a new aggregate. This happens (a lot) in clang C++ exception code on unwind branch.	2020-08-11 22:40:29 +03:00
Thomas Lively	f969734c21	Reland "[SLPVectorizer] Pre-commit a test for D85759" This reverts commit `52b71aa8b1`. The problem was a missing lit.local.cfg file, which was causing the test to be incorrectly run on bots that had not built the WebAssembly target.	2020-08-11 12:18:33 -07:00
Thomas Lively	52b71aa8b1	Revert "[SLPVectorizer] Pre-commit a test for D85759" This reverts commit `94791970de`. The test is failing on multiple bots, event though it passes for me locally. Reverting while I investigate further.	2020-08-11 12:11:24 -07:00
Thomas Lively	94791970de	[SLPVectorizer] Pre-commit a test for D85759 `8cc911fa5b` refactored the `getIntrinsicInstrCost` function and was meant to be a nonfunctional change, but it accidentally changed how costs were calculated in the SLP vectorizer, which regressed WebAssembly codegen and resulted in a downstream bug report at https://github.com/emscripten-core/emscripten/issues/11449. The fix for this regression is in D85759, and this patch just pre-commits the test from that patch to demonstrate the regressed behavior first.	2020-08-11 11:30:09 -07:00
Sanjay Patel	1470ce4a76	[InstSimplify] fold min/max with matching min/max operands I think this is the last remaining translation of an existing instcombine transform for the corresponding cmp+sel idiom. This interpretation is more general though - we can remove mismatched signed/unsigned combinations in addition to the more obvious cases. min/max(X, Y) must produce X or Y as the result, so this is just another clause in the existing transform that was already matching a min/max of min/max.	2020-08-11 11:23:15 -04:00
Sanjay Patel	bad205fe0c	[InstSimplify] add tests for min/max intrinsics with common operands; NFC There are 444 = 64 variations. We currently handle some, but not all, of the alternative patterns with cmp+sel in instcombine.	2020-08-11 11:23:15 -04:00
Sanjay Patel	ea8c186c40	[InstCombine] add tests for pointer casts with insertelement; NFC	2020-08-11 11:23:15 -04:00
Sam Parker	ac50efd67c	[NFC][ARM][SimplifyCFG] Add some tests. Add some tests around thresholds and minsize.	2020-08-11 15:13:58 +01:00
Florian Hahn	0b774acf11	[SLP] Make sure instructions are ordered when computing spill cost. The entries in VectorizableTree are not necessarily ordered by their position in basic blocks. Collect them and order them by dominance so later instructions are guaranteed to be visited first. For instructions in different basic blocks, we only scan to the beginning of the block, so their order does not matter, as long as all instructions in a basic block are grouped together. Using dominance ensures a deterministic order. The modified test case contains an example where we compute a wrong spill cost (2) without this patch, even though there is no call between any instruction in the bundle. This seems to have limited practical impact, .e.g on X86 with a recent Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006 there are no binary changes. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D82444	2020-08-11 11:18:12 +02:00
Dávid Bolvanský	c2f0101310	[InstCombine] ~(~X + Y) -> X - Y Proof: https://alive2.llvm.org/ce/z/4xharr Solves PR47051 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D85593	2020-08-11 11:05:42 +02:00
Juneyoung Lee	63b5b92bc9	[LazyValueInfo] Let getEdgeValueLocal look into freeze instructions This patch makes getEdgeValueLocal more precise when a freeze instruction is given, by adding support for freeze into constantFoldUser Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84629	2020-08-11 16:39:34 +09:00
Shinji Okumura	06eee8748f	[Attributor][NFC] Connect AAPotentialValues with AAValueSimplify This patch enables `AAValueSimplify` to use information from `AAPotentialValues` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85668	2020-08-11 15:52:02 +09:00
Wang, Pengfei	9512525947	[X86][FPEnv] Teach X86 mask compare intrinsics to respect strict FP semantics. When we use mask compare intrinsics under strict FP option, the masked elements shouldn't raise any exception. So, we cann't replace the intrinsic with a full compare + "and" operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D85385	2020-08-11 10:28:41 +08:00
Johannes Doerfert	fa5d22a045	[OpenMP][NFC] Reuse OMPIRBuilder `struct ident_t` handling in Clang Replace the `ident_t` handling in Clang with the methods offered by the OMPIRBuilder. This cuts down on the clang code as well as the differences between the two, making further transitions easier. Tests have changed but there should not be a real functional change. The most interesting difference is probably that we stop generating local ident_t allocations for now and just use globals. Given that this happens only with debug info, the location part of the `ident_t` is probably bigger than the test anyway. As the location part is already a global, we can avoid the allocation, memcpy, and store in favor of a constant global that is slightly bigger. This can be revisited if there are complications. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D80735	2020-08-10 17:13:26 -05:00
Arthur Eubanks	aae349e276	[InstSimplify][test] Remove unused parameter in vscale.ll Reviewed By: huihuiz Differential Revision: https://reviews.llvm.org/D85688	2020-08-10 14:48:32 -07:00
Nikita Popov	566a66703f	[InstSimplify] Add test for expand binop undef issue (NFC) Add test case from https://reviews.llvm.org/D83360#2146539.	2020-08-10 22:39:59 +02:00
Wei Mi	4cd8e9b169	[SampleFDO] Stop letting findCalleeFunctionSamples return unrelated profiles for invoke instructions. We see a warning of "No debug information found in function foo: Function profile not used" in a case. The function foo is called by an invoke instruction. It has no debug information because it has attribute((nodebug)) in the definition. It shouldn't have profile instance in the sample profile but compiler thinks it does, that turns out to be a compiler bug in findCalleeFunctionSamples. The bug is exposed when sample-profile-merge-inlinee is enabled recently. Currently in findCalleeFunctionSamples, CalleeName is unset and is empty for invoke instruction. For empty CalleeName, findFunctionSamplesAt will treat the call as an indirect call and will return any inline instance profile at the same location as the instruction. That leads to a wrong profile being returned to function foo. The patch set CalleeName when the instruction is an invoke. Differential Revision: https://reviews.llvm.org/D85664	2020-08-10 12:41:09 -07:00
Fangrui Song	3b21a07fd7	[PGO] Delete dead comdat renaming code related to GlobalAlias. NFC A GlobalAlias is an address-taken user of its aliased function. canRenameComdatFunc has excluded such cases. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D85597	2020-08-10 09:02:04 -07:00
Simon Pilgrim	90f721404f	[SLP] Regenerate load-merge.ll tests Noticed this NFC change in D57779	2020-08-10 16:09:26 +01:00
Sanjay Patel	3d5118b75c	[InstCombine] auto-generate test checks; NFC	2020-08-10 08:27:38 -04:00
Florian Hahn	8393b9fd1f	[LoopInterchange] Move instructions from preheader to outer loop header. Instructions defined in the original inner loop preheader may depend on values defined in the outer loop header, but the inner loop header will become the entry block in the loop nest. Move the instructions from the preheader to the outer loop header, so we do not break dominance. We also have to check for unsafe instructions in the preheader. If there are no unsafe instructions, all instructions should be movable. Currently we move all instructions except the terminator and rely on LICM to hoist out invariant instructions later. Fixes PR45743	2020-08-10 12:41:33 +01:00
Florian Hahn	54cb552b96	[LoopInterchange] Form LCSSA phis for values in orig outer loop header. Values defined in the outer loop header could be used in the inner loop latch. In that case, we need to create LCSSA phis for them, because after interchanging they will be defined in the new inner loop and used in the new outer loop.	2020-08-10 11:33:19 +01:00
Simon Pilgrim	0b26c9eddc	[ScalarizeMaskedMemIntrin][X86] Refresh missed transform test cases from rGc0c3b9a25fee Differential Revision: https://reviews.llvm.org/D85416	2020-08-10 11:14:01 +01:00
Juneyoung Lee	ef018cb65c	[BuildLibCalls] Add noundef to standard I/O functions This patch adds noundef to return value and arguments of standard I/O functions. With this patch, passing undef or poison to the functions becomes undefined behavior in LLVM IR. Since undef/poison is lowered from operations having UB in C/C++, passing undef to them was already UB in source. With this patch, the functions cannot return undef or poison anymore as well. According to C17 standard, ungetc/ungetwc/fgetpos/ftell can generate unspecified value; 3.19.3 says unspecified value is a valid value of the relevant type, and using unspecified value is unspecified behavior, which is not UB, so it cannot be undef (using undef is UB when e.g. it is used at branch condition). — The value of the file position indicator after a successful call to the ungetc function for a text stream, or the ungetwc function for any stream, until all pushed-back characters are read or discarded (7.21.7.10, 7.29.3.10). — The details of the value stored by the fgetpos function (7.21.9.1). — The details of the value returned by the ftell function for a text stream (7.21.9.4). In the long run, most of the functions listed in BuildLibCalls should have noundefs; to remove redundant diffs which will anyway disappear in the future, I added noundef to a few more non-I/O functions as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85345	2020-08-10 10:58:25 +09:00
Florian Hahn	d236e1c7b6	[InstSimplify/NewGVN] Add option to control the use of undef. Making use of undef is not safe if the simplification result is not used to replace all uses of the result. This leads to problems in NewGVN, which does not replace all uses in the IR directly. See PR33165 for more details. This patch adds an option to SimplifyQuery to disable the use of undef. Note that I've only guarded uses if isa<UndefValue>/m_Undef where SimplifyQuery is currently available. If we agree on the general direction, I'll update the remaining uses. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84792	2020-08-09 19:16:56 +01:00
Florian Hahn	23817cbd0b	[SCEVExpander] Make sure cast properly dominates Builder's IP. The selected cast must properly dominate the Builder's IP, so we cannot re-use the cast, if it matches the builder's IP.	2020-08-09 16:51:19 +01:00
Aditya Kumar	53ac144848	[HotColdSplit] Add options for splitting cold functions in separate section Add support for (if enabled) splitting cold functions into a separate section in order to further boost locality of hot code. Authored by: rjf (Ruijie Fang) Reviewed by: hiraditya,rcorcs,vsk Differential Revision: https://reviews.llvm.org/D85331	2020-08-09 08:48:12 -07:00
Dávid Bolvanský	dee938e5cc	[Tests] Precommit tests for D85593	2020-08-09 16:07:36 +02:00
Sanjay Patel	43bdac2906	[VectorCombine] try to create vector loads from scalar loads This patch was adjusted to match the most basic pattern that starts with an insertelement (so there's no extract created here). Hopefully, that removes any concern about interfering with other passes. Ie, the transform should almost always be profitable. We could make an argument that this could be part of canonicalization, but we conservatively try not to create vector ops from scalar ops in passes like instcombine. If the transform is not profitable, the backend should be able to re-scalarize the load. Differential Revision: https://reviews.llvm.org/D81766	2020-08-09 09:05:06 -04:00
Florian Hahn	c70f0b9d4a	[SCEVExpander] Avoid re-using existing casts if it means updating users. Currently the SCEVExpander tries to re-use existing casts, even if they are not exactly at the insertion point it was asked to create the cast. To do so in some case, it creates a new cast at the insertion point and updates all users to use the new cast. This behavior is problematic, because it changes the IR outside of the instructions created during the expansion. Therefore we cannot completely undo all changes made during expansion. This re-use should be only an extra optimization, so only using the new cast in the expanded instructions should not be a correctness issue. There are many cases equivalent instructions are created during expansion. This patch also adjusts findInsertPointAfter to skip instructions inserted during expansion. This enables re-using existing casts without the renaming any uses, by picking a better insertion point. Reviewed By: efriedma, lebedev.ri Differential Revision: https://reviews.llvm.org/D84399	2020-08-09 13:25:17 +01:00
Roman Lebedev	e492f0e03b	[SimplifyCFG] Fix invoke->call fold w/ multiple invokes in presence of lifetime intrinsics SimplifyCFG has two main folds for resumes - one when resume is directly using the landingpad, and the other one where resume is using a PHI node. While for the first case, we were already correctly ignoring all the PHI nodes, and both the debug info intrinsics and lifetime intrinsics, in the PHI-based-one, we weren't ignoring PHI's in the resume block, and weren't ignoring lifetime intrinsics. That is clearly a bug. On RawSpeed library, this results in +9.34% (+81) more invoke->call folds, -0.19% (-39) landing pads, -0.24% (-81) invoke instructions but +51 call instructions and -132 basic blocks. Though, the run-time performance impact appears to be within the noise.	2020-08-08 20:00:28 +03:00
Roman Lebedev	c2ebb32465	[NFC][SimplifyCFG] Add a test showing invoke->call simplification failure	2020-08-08 20:00:28 +03:00
Juneyoung Lee	b6d9add71b	[InstCombine] Optimize select(freeze(icmp eq/ne x, y), x, y) This patch adds an optimization that folds select(freeze(icmp eq/ne x, y), x, y) to x or y. This was needed to resolve slowdown after D84940 is applied. I tried to bake this logic into foldSelectInstWithICmp, but it wasn't clear. This patch conservatively writes the pattern in a separate function, foldSelectWithFrozenICmp. The output does not need freeze; https://alive2.llvm.org/ce/z/X49hNE (from @nikic) Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D85533	2020-08-08 15:22:29 +09:00
Juneyoung Lee	595d3b5ecc	[InstCombine] Add tests for select(freeze(icmp x, y), x, y); NFC	2020-08-08 15:09:08 +09:00
Simon Pilgrim	f35992b75b	[SLP][X86] Add smax intrinsic reduction tests SLP currently only matches the ICMP+SELECT patterns for min/max reductions	2020-08-07 11:48:08 +01:00
Simon Pilgrim	aa38e97ad5	[SLP][X86] Add abs/smax/smin/umax/umin intrinsic vectorization tests	2020-08-07 11:23:43 +01:00
Shinji Okumura	c575ba28de	[Attributor] AAPotentialValues Interface This is a split patch of D80991. This patch introduces AAPotentialValues and its interface only. For more detail of AAPotentialValues abstract attribute, see the original patch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D83283	2020-08-07 17:35:12 +09:00
Max Kazantsev	9b49a4d301	[Test] Add one more test on IndVars that was failing on one of older builds	2020-08-07 14:23:55 +07:00
Shinji Okumura	f13f2e16f0	[Attributor] Check violation of returned position nonnull and noundef attribute in AAUndefinedBehavior This patch is a follow up of D84733. If a function has noundef attribute in returned position, instructions that return undef or poison value cause UB. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85178	2020-08-07 12:02:42 +09:00
Arthur Eubanks	039fb7f68a	[NewPM][GuardWidening] Fix loop guard widening tests under NPM Reviewed By: ychen, asbirlea Differential Revision: https://reviews.llvm.org/D85394	2020-08-06 15:32:59 -07:00
Roman Lebedev	be02adfad7	[InstCombine] Fold (x + C1) * (-1<<C2) --> (-C1 - x) * (1<<C2) Negator knows how to do this, but the one-use reasoning is getting a bit muddy here, we don't really want to increase instruction count, so we need to both lie that "IsNegation" and have an one-use check on the outermost LHS value.	2020-08-06 23:40:16 +03:00
Roman Lebedev	0c1c756a31	[InstCombine] Generalize %x * (-1<<C) --> (-%x) * (1<<C) fold Multiplication is commutative, and either of operands can be negative, so if the RHS is a negated power-of-two, we should try to make it true power-of-two (which will allow us to turn it into a left-shift), by trying to sink the negation down into LHS op. But, we shouldn't re-invent the logic for sinking negation, let's just use Negator for that. Tests and original patch by: Simon Pilgrim @RKSimon! Differential Revision: https://reviews.llvm.org/D85446	2020-08-06 23:39:53 +03:00
Roman Lebedev	a404acb86a	[NFC][InstCombine] Add some more tests for negation sinking into mul	2020-08-06 23:37:17 +03:00
Roman Lebedev	7ce76b06ec	[InstCombine] Fold sdiv exact X, -1<<C --> -(ashr exact X, C) While that does increases instruction count, shift is obviously better than a division. Name: base Pre: (1<<C1) >= 0 %o0 = shl i8 1, C1 %r = sdiv exact i8 C0, %o0 => %r = ashr exact i8 C0, C1 Name: neg %o0 = shl i8 -1, C1 %r = sdiv exact i8 C0, %o0 => %t0 = ashr exact i8 C0, C1 %r = sub i8 0, %t0 Name: reverse Pre: C1 != 0 && C1 u< 8 %t0 = ashr exact i8 C0, C1 %r = sub i8 0, %t0 => %o0 = shl i8 -1, C1 %r = sdiv exact i8 C0, %o0 https://rise4fun.com/Alive/MRplf	2020-08-06 23:37:16 +03:00
Roman Lebedev	442cb88f53	[InstCombine] Generalize sdiv exact X, 1<<C --> ashr exact X, C fold to handle non-splat vectors	2020-08-06 23:37:15 +03:00
Roman Lebedev	8633a0d985	[NFC][InstCombine] Better tests for x s/EXACT (1 << y) pattern	2020-08-06 23:37:15 +03:00
Roman Lebedev	1c21635c94	[NFC][InstCombine] Tests for x s/EXACT (-1 << y) pattern	2020-08-06 23:37:15 +03:00
Sanjay Patel	250a167c41	[InstSimplify] avoid crashing by trying to rem-by-zero Bug was noted in the post-commit comments for: rGe8760bb9a8a3	2020-08-06 16:06:31 -04:00
Sanjay Patel	c9bcc237a2	[VectorCombine] add tests for load+insert; NFC	2020-08-06 15:45:02 -04:00
Anton Afanasyev	a7478fab6c	[SLP] Fix order of `insertelement`/`insertvalue` seed operands Summary: This patch takes the indices operands of `insertelement`/`insertvalue` into account while generation of seed elements for `findBuildAggregate()`. This function has kept the original order of `insert`s before. Also this patch optimizes `findBuildAggregate()` preventing it from redundant temporary vector allocations and its multiple reversing. Fixes llvm.org/pr44067 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83779	2020-08-06 22:09:24 +03:00
Arthur Eubanks	d0acd97c68	[NewPM][LoopUnswitch] Pin loop-unswitch to legacy PM or use simple-loop-unswitch As mentioned in http://lists.llvm.org/pipermail/llvm-dev/2020-July/143395.html, loop-unswitch has not been ported to the NPM. Instead people are using simple-loop-unswitch. Pin all tests in Transforms/LoopUnswitch to legacy PM and replace all other uses of loop-unswitch with simple-loop-unswitch. One test that didn't fit into the above was 2014-06-21-congruent-constant.ll which seems to only pass with loop-unswitch. That is also pinned to legacy PM. Now all tests containing "-loop-unswitch" anywhere in the test succeed with NPM turned on by default. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D85360	2020-08-06 10:56:00 -07:00
Arthur Eubanks	5bb6b8250a	[NewPM] Pin -assumption-cache-tracker tests to legacy PM All tests have corresponding NPM RUN lines. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D85395	2020-08-06 10:38:03 -07:00
Simon Pilgrim	3b93464dcf	[SLP][X86] Regenerate sdiv test noticed in D83779. NFC.	2020-08-06 18:00:21 +01:00
Simon Pilgrim	8f5b2cb828	[InstCombine] Add tests for mul(add(x,c),negpow2) -> mul(sub(-c,x),pow2) fold Also fix some undef vector elements in the similar vector tests that I missed.	2020-08-06 17:13:28 +01:00
Mircea Trofin	87fb7aa137	[llvm][MLInliner] Don't log 'mandatory' events We don't want mandatory events in the training log. We do want to handle them, to keep the native size accounting accurate, but that's all. Fixed the code, also expanded the test to capture this. Differential Revision: https://reviews.llvm.org/D85373	2020-08-06 09:04:15 -07:00
Simon Pilgrim	d1a91d947f	[InstCombine] Add tests for mul(sub(x,y),negpow2) -> mul(sub(y,x),pow2) fold Add full vector coverage (that currently are not folded).	2020-08-06 16:31:57 +01:00
Sanjay Patel	60f2c6a94c	[PatternMatch] allow intrinsic form of min/max with existing matchers I skimmed the existing users of these matchers and don't see any problems (eg, the caller assumes the matched value was a select instruction without checking). So I think we can generalize the matching to allow the new intrinsics or the cmp+select idioms. I did not find any unit tests for the matchers, so added some basics there. The instsimplify tests are adapted from existing tests for the cmp+select pattern and cover the folds in simplifyICmpWithMinMax(). Differential Revision: https://reviews.llvm.org/D85230	2020-08-06 10:50:24 -04:00
Juneyoung Lee	c771087161	[InstCombine] Fold freeze(undef) into a proper constant This is a simple patch that folds freeze(undef) into a proper constant after inspecting its uses. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84948	2020-08-06 18:40:04 +09:00
Juneyoung Lee	54a1097b83	[InstCombine] Add tests for D84948; NFC	2020-08-06 18:23:45 +09:00
David Green	745bf6cf44	[LoopVectorizer] Inloop vector reductions Arm MVE has multiple instructions such as VMLAVA.s8, which (in this case) can take two 128bit vectors, sign extend the inputs to i32, multiplying them together and sum the result into a 32bit general purpose register. So taking 16 i8's as inputs, they can multiply and accumulate the result into a single i32 without any rounding/truncating along the way. There are also reduction instructions for plain integer add and min/max, and operations that sum into a pair of 32bit registers together treated as a 64bit integer (even though MVE does not have a plain 64bit addition instruction). So giving the vectorizer the ability to use these instructions both enables us to vectorize at higher bitwidths, and to vectorize things we previously could not. In order to do that we need a way to represent that the reduction operation, specified with a llvm.experimental.vector.reduce when vectorizing for Arm, occurs inside the loop not after it like most reductions. This patch attempts to do that, teaching the vectorizer about in-loop reductions. It does this through a vplan recipe representing the reductions that the original chain of reduction operations is replaced by. Cost modelling is currently just done through a prefersInloopReduction TTI hook (which follows in a later patch). Differential Revision: https://reviews.llvm.org/D75069	2020-08-06 10:10:50 +01:00
Roman Lebedev	141357663e	[InstCombine] (-NSW x) u<= x --> x s<=0 (PR39480) Name: (-x) u<= x --> x s<= 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp ule i8 %neg_x, %x => %r = icmp sle i8 %x, 0 https://rise4fun.com/Alive/V22 https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:36 +03:00
Roman Lebedev	132be1f502	[InstCombine] (-NSW x) u< x --> x s< 0 (PR39480) Name: (-x) u< x --> x s< 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp ult i8 %neg_x, %x => %r = icmp slt i8 %x, 0 https://rise4fun.com/Alive/zSuf https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:36 +03:00
Roman Lebedev	0e1241a3c9	[InstCombine] (-NSW x) u>= x --> x s>= 0 (PR39480) Name: (-x) u>= x --> x s>= 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp uge i8 %neg_x, %x => %r = icmp sge i8 %x, 0 https://rise4fun.com/Alive/LLHd https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:35 +03:00
Roman Lebedev	16c642fa39	[InstCombine] (-NSW x) u> x --> x s> 0 (PR39480) Name: (-x) u> x --> x s> 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp ugt i8 %neg_x, %x => %r = icmp sgt i8 %x, 0 https://rise4fun.com/Alive/Raea https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:35 +03:00
Roman Lebedev	59387c0dd7	[InstCombine] (-NSW x) s<= x --> x s>= 0 (PR39480) Name: (-x) s<= x --> x >= 0 %neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN %r = icmp sle i8 %neg_x, %x => %r = icmp sge i8 %x, 0 https://rise4fun.com/Alive/91k https://bugs.llvm.org/show_bug.cgi?id=39480	2020-08-06 11:50:35 +03:00

1 2 3 4 5 ...

15878 Commits