Commit Graph

136 Commits

Author SHA1 Message Date
Amara Emerson 322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00
Roman Lebedev e00f189d39
[InstCombine] Revert rL226781 "Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available." (PR47592)
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)

This canonicalization seems dubious.

Most importantly, while it does not create `inttoptr` casts by itself,
it may cause them to appear later, see e.g. D88788.

I think it's pretty obvious that it is an undesirable outcome,
by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts
are not no-op, and are no longer eager to look past them.
Which e.g. means that given
```
%a = load i32
%b = inttoptr %a
%c = inttoptr %a
```
we likely won't be able to tell that `%b` and `%c` is the same thing.

As we can see in D88789 / D88788 / D88806 / D75505,
we can't really teach SCEV about this (not without the https://bugs.llvm.org/show_bug.cgi?id=47592 at least)
And we can't recover the situation post-inlining in instcombine.

So it really does look like this fold is actively breaking
otherwise-good IR, in a way that is not recoverable.
And that means, this fold isn't helpful in exposing the passes
that are otherwise unaware of these patterns it produces.

Thusly, i propose to simply not perform such a canonicalization.
The original motivational RFC does not state what larger problem
that canonicalization was trying to solve, so i'm not sure
how this plays out in the larger picture.

On vanilla llvm test-suite + RawSpeed, this results in
increase of asm instructions and final object size by ~+0.05%
decreases final count of bitcasts by -4.79% (-28990),
ptrtoint casts by -15.41% (-3423),
and of inttoptr casts by -25.59% (-6919, *sic*).
Overall, there's -0.04% less IR blocks, -0.39% instructions.

See https://bugs.llvm.org/show_bug.cgi?id=47592

Differential Revision: https://reviews.llvm.org/D88789
2020-10-06 00:00:30 +03:00
Roman Lebedev 03bd5198b6
[OldPM] Pass manager: run SROA after (simple) loop unrolling
I have stumbled into this pretty accidentally, when rewriting
some spaghetti-like code into something more structured,
which involved using some `std::array<>`s. And to my surprise,
the `alloca`s remained, causing about `+160%` perf regression.

https://llvm-compile-time-tracker.com/compare.php?from=bb6f4d32aac3eecb51909f4facc625219307ee68&to=d563e66f40f9d4d145cb2050e41cb961e2b37785&stat=instructions
suggests that this has geomean compile-time cost of `+0.08%`.

Note that D68593 / cecc0d27ad
already did this chage for NewPM, but left OldPM in a pessimized state.

This fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40011 | PR40011 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=42794 | PR42794 ]] and probably some other reports.

Reviewed By: nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D87972
2020-10-04 11:53:50 +03:00
Roman Lebedev 1038ce4b6b
[NFC][PhaseOrdering] Add a test showing new inttoptr casts after SROA due to InstCombine (PR47592)
We could either try to make SROA more picky to the new type
and/or prevent InstCombine from creating the original problem (converting load-stores to operate on ints),
and/or make InstCombine recover the situation by cleaning up all that cruft.
2020-10-03 22:49:58 +03:00
Florian Hahn 6481a76495 [PhaseOrdering] Add test that requires peeling before vectorization.
Test case for PR47671.
2020-10-02 12:19:22 +01:00
Sanjay Patel 149f5b573c [APFloat] convert SNaN to QNaN in convert() and raise Invalid signal
This is an alternate fix (see D87835) for a bug where a NaN constant
gets wrongly transformed into Infinity via truncation.
In this patch, we uniformly convert any SNaN to QNaN while raising
'invalid op'.
But we don't have a way to directly specify a 32-bit SNaN value in LLVM IR,
so those are always encoded/decoded by calling convert from/to 64-bit hex.

See D88664 for a clang fix needed to allow this change.

Differential Revision: https://reviews.llvm.org/D88238
2020-10-01 14:37:38 -04:00
Florian Hahn 915310bf14 Revert "[DSE] Switch to MemorySSA-backed DSE by default."
There appears to be a mis-compile with MemorySSA-backed DSE in
combination with llvm.lifetime.end. It currently appears like
DSE is doing the right thing and the llvm.lifetime.end markers
are incorrect. The reverted patch uncovers the mis-compile.

This patch temporarily switches back to the legacy DSE
implementation, while we investigate.

This reverts commit 9d172c8e9c.
2020-09-26 18:35:27 +01:00
Sanjay Patel 2625433e77 [PhaseOrdering] move test with target requirement to x86 dir
I'm not sure if the target is actually necessary,
but since it was specified, I'm moving to the
appropriate dir to avoid bot fallout.
2020-09-24 09:54:14 -04:00
Sanjay Patel 9cf647bb3f [PhaseOrdering] move an 'opt' test from x86 codegen; NFC
This file comes from 2007, and I'm not entirely sure of the
motivation, but it was going through all of opt and llc.
The llc part is almost certainly unnecessary as shown in
the now auto-generated FileCheck lines.

This test may be affected by a logic change suggested in:
D87835
2020-09-24 09:47:38 -04:00
Roman Lebedev bb6f4d32aa
[NFC][PhaseOrdering] Add test showing SROA not being performed after loop unrolling 2020-09-19 21:18:35 +03:00
Nikita Popov 13e19d2e7c Revert "[InstCombine] Canonicalize SPF_ABS to abs intrinc"
This reverts commit 05d4c4ebc2.

mstorsjo reports a miscompile after this change in
https://reviews.llvm.org/D87188#2281093. Reverting until I can
investigate this.
2020-09-18 09:38:26 +02:00
Nikita Popov 05d4c4ebc2 [InstCombine] Canonicalize SPF_ABS to abs intrinc
Enable canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic.

To be conservative, the one-use check on the comparison is retained,
this may be relaxed if all goes well.

It's pretty likely that this will uncover places that missing
handling for the abs() intrinsic. Please report any seen performance
regressions.

Differential Revision: https://reviews.llvm.org/D87188
2020-09-17 22:28:34 +02:00
Sanjay Patel 8985755762 [InstSimplify] add limit folds for fmin/fmax
If the constant operand is the opposite of the min/max value,
then the result must be the other value.

This is based on the similar codegen transform proposed in:
D87571
2020-09-15 10:58:44 -04:00
Simon Pilgrim e2dee9af8d [X86] Add test cases for PR11210
Demonstrates that redundant masked stores may be removed, as long as we're able to replace the AVX/AVX2 masked store with a generic masked store (constant mask or sign-extended bool vector mask).
2020-09-13 13:38:05 +01:00
Tyker 78de7297ab Reland [AssumeBundles] Use operand bundles to encode alignment assumptions
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
2020-09-12 15:36:06 +02:00
Roman Lebedev bb7d3af113
Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec218
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.

It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric

So let's just reland this.
Original commit message:


I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

This reverts commit 503deec218.
2020-09-08 00:24:03 +03:00
Roman Lebedev 503deec218
Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"
As disscussed in post-commit review starting with
	https://reviews.llvm.org/D84108#2227365
while this appears to be mostly a win overall, especially code-size-wise,
this appears to shake //certain// code pattens in a way that is extremely
unfavorable for performance (+30% runtime regression)
on certain CPU's (i personally can't reproduce).

So until the behaviour is better understood, and a path forward is mapped,
let's back this out for now.

This reverts commit 1d51dc38d8.
2020-08-22 00:33:22 +03:00
Tyker a79e604462 [AssumeBundles] Fix Bug in Assume Queries
this bug was causing miscompile.
now clang cant properly selfhost with -mllvm --enable-knowledge-retention

Reviewed By: jdoerfert, lebedev.ri

Differential Revision: https://reviews.llvm.org/D83507
2020-08-17 21:36:53 +02:00
Sanjay Patel 29e1d16a3e Revert "[PhaseOrdering] add test for memcpy removal (PR47114); NFC"
This reverts commit babb59496b.

This test addition was queued up with some unrelated changes,
but it seems more likely that we need to fix something internal
to -memcpyopt. Also, I'm not sure if including target-specifc
attributes in a generic regression test dir will cause bot
problems.
2020-08-16 09:52:33 -04:00
Sanjay Patel babb59496b [PhaseOrdering] add test for memcpy removal (PR47114); NFC 2020-08-16 08:53:47 -04:00
Roman Lebedev 3bd2513ebd
[NFC] Add test case showing the miscompile being fixed by D83507
See https://reviews.llvm.org/D83507
2020-08-13 16:13:33 +03:00
Anton Afanasyev a7478fab6c [SLP] Fix order of `insertelement`/`insertvalue` seed operands
Summary:
This patch takes the indices operands of `insertelement`/`insertvalue`
into account while generation of seed elements for `findBuildAggregate()`.
This function has kept the original order of `insert`s before.
Also this patch optimizes `findBuildAggregate()` preventing it from
redundant temporary vector allocations and its multiple reversing.

Fixes llvm.org/pr44067

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83779
2020-08-06 22:09:24 +03:00
Roman Lebedev 1d51dc38d8
[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108
2020-07-29 20:05:30 +03:00
Roman Lebedev b636e7d1fc
[NFC][PhaseOrdering] Add a test demonstrating pitfails of common code hoisting on loop rotation
Depending on the -rotation-max-header-size=?,
hoisting common code early makes loop rotation impossible.
2020-07-16 23:53:26 +03:00
Eric Christopher 7bfaa40086 Temporarily Revert "[AssumeBundles] Use operand bundles to encode alignment assumptions"
due to the performance bugs filed in https://bugs.llvm.org/show_bug.cgi?id=46753.

An SROA change soon may obviate some of these problems.

This reverts commit 8d09f20798.
2020-07-16 11:54:04 -07:00
Max Kazantsev 90798e09e2 Re-enable "[InstCombine] Simplify boolean Phis with const inputs using CFG"
This reverts commit b893822e32.

+ Clang test fixes
+ Insertion point fix for landing pads
2020-07-16 16:09:08 +07:00
Max Kazantsev b893822e32 Revert "[InstCombine] Simplify boolean Phis with const inputs using CFG"
This reverts commit 00472067c3.

Need to fix failing clang tests.
2020-07-16 12:58:39 +07:00
Max Kazantsev 00472067c3 [InstCombine] Simplify boolean Phis with const inputs using CFG
This patch adds simplification for pattern:
```
  if (cond)
  /       \
 ...      ...
  \       /
p = phi [true] [false]
...
br p, succ_1, succ_2
```
If we can prove that top block's branches dominate respective
inputs of a block that has a Phi with constant inputs, we can
use the branch condition (maybe inverted) instead of Phi.
This will make proofs of implication for further jump threading
more transparent.

Differential Revision: https://reviews.llvm.org/D81375
Reviewed By: xbolva00
2020-07-16 12:06:10 +07:00
Tyker 8d09f20798 [AssumeBundles] Use operand bundles to encode alignment assumptions
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.

Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71739
2020-07-14 01:05:58 +02:00
Roman Lebedev 7ea46aee36
Revert "[AssumeBundles] Use operand bundles to encode alignment assumptions"
Assume bundle can have more than one entry with the same name,
but at least AlignmentFromAssumptionsPass::extractAlignmentInfo() uses
getOperandBundle("align"), which internally assumes that it isn't the
case, and happily crashes otherwise.

Minimal reduced reproducer: run `opt -alignment-from-assumptions` on

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%0 = type { i64, %1*, i8*, i64, %2, i32, %3*, i8* }
%1 = type opaque
%2 = type { i8, i8, i16 }
%3 = type { i32, i32, i32, i32 }

; Function Attrs: nounwind
define i32 @f(%0* noalias nocapture readonly %arg, %0* noalias %arg1) local_unnamed_addr #0 {
bb:
  call void @llvm.assume(i1 true) [ "align"(%0* %arg, i64 8), "align"(%0* %arg1, i64 8) ]
  ret i32 0
}

; Function Attrs: nounwind willreturn
declare void @llvm.assume(i1) #1

attributes #0 = { nounwind "reciprocal-estimates"="none" }
attributes #1 = { nounwind willreturn }


This is what we'd have with -mllvm -enable-knowledge-retention

This reverts commit c95ffadb24.
2020-07-04 23:49:23 +03:00
Hiroshi Yamauchi 6bd1db08e7 [InstCombine] Don't let an alignment assume prevent new/delete removals.
Remove allocations with alignment assume.

Differential Revision: https://reviews.llvm.org/D81854
2020-07-01 09:22:32 -07:00
Sanjay Patel 09b8dbf70c [PhaseOrdering][NewPM] update test that silently showed bug with SpeculativeExecutionPass; NFC
See D82735 / rG1a6cebb4d12c744699e23624f8afda5cbe216fe6
2020-06-30 14:22:20 -04:00
Sanjay Patel b6315aee5b [VectorCombine] try to form vector compare and binop to eliminate scalar ops
binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1)
-->
vcmp = cmp Pred X, VecC
ext (binop vNi1 vcmp, (shuffle vcmp, Index1)), Index0

This is a larger pattern than the existing extractelement folds because we can't
reasonably vectorize the sub-patterns with constants based on cost model calcs
(it doesn't usually make sense to replace a single extracted scalar op with
constant operand with a vector op).

I salvaged as much of the existing logic as I could, but there might be better
ways to share and reduce code.

The motivating case from PR43745:
https://bugs.llvm.org/show_bug.cgi?id=43745
...is the special case of a 2-way reduction. We tried to get SLP to handle that
particular pattern in D59710, but that caused crashing and regressions.
This patch is more general, but hopefully safer.

The v2f64 test with SSE2 surprised me - the cost model accounting looks like this:
OldCost = 0 (free extract of f64 at index 0) + 1 (extract of f64 at index 1) + 2 (scalar fcmps) + 1 (and of bools) = 4
NewCost = 2 (vector fcmp) + 1 (shuffle) + 1 (vector 'and') + 1 (extract of bool) = 5

Differential Revision: https://reviews.llvm.org/D82474
2020-06-29 10:38:52 -04:00
Sanjay Patel 2f3549f813 Revert "[VectorCombine] add test for scalable vectors; NFC"
This reverts commit 700ec6b848.
An extra test diff snuck here.
2020-06-28 12:43:11 -04:00
Sanjay Patel 700ec6b848 [VectorCombine] add test for scalable vectors; NFC 2020-06-28 12:42:00 -04:00
Sanjay Patel c336f21af5 [PhaseOrdering] delete test for vectorization; NFC
As requested in D81416, I'm deleting the file that I added with:
rGdf79443
2020-06-25 09:34:11 -04:00
Tyker c95ffadb24 [AssumeBundles] Use operand bundles to encode alignment assumptions
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.

Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71739
2020-06-25 12:59:44 +02:00
Sanjay Patel a809cea68c [PhaseOrdering] add test for missed vectorization; NFC (PR43745)
Either SLP or VectorCombine should be able to form vector compares
reliably on this example.
2020-06-23 11:57:32 -04:00
Sanjay Patel df794431e0 [PhaseOrdering] add test for vectorizer cooperation; NFC
This would potentially change with the proposal in D81416.
2020-06-23 08:58:36 -04:00
Sanjay Patel 8953ecf22b [InstCombine] reassociate diff of sums into sum of diffs
This is the integer sibling to D81491.

(a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) -->
(a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3])

Removing the "experimental" from these intrinsics is likely
not too far away.
2020-06-22 20:47:09 -04:00
Sanjay Patel 98c2f4eea5 [VectorCombine] add helper to replace uses and rename
The tests are regenerated to show a path that missed renaming,
but there should be no functional difference from this patch.
2020-06-22 09:58:49 -04:00
Sanjay Patel de65b356dc [VectorCombine] add/use pass-level IRBuilder
This saves creating/destroying a builder every time we
perform some transform.

The tests show instruction ordering diffs resulting from
always inserting at the root instruction now, but those
should be benign.
2020-06-22 09:01:29 -04:00
Sanjay Patel cce625f73d [VectorCombine] improve IR debugging by providing/salvaging value names
The tests are regenerated to show the diffs, but there should be no
functional change from this patch.
2020-06-22 08:35:47 -04:00
Sanjay Patel b5fb26951a [InstCombine] reassociate FP diff of sums into sum of diffs
(a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) -->
(a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3])

This should be the last step in solving PR43953:
https://bugs.llvm.org/show_bug.cgi?id=43953

We started emitting reduction intrinsics with:
D80867/ rGe50059f6b6b3
So it's a relatively easy pattern match now to re-order those ops.
Also, I have not seen any complaints for the switch to intrinsics
yet, so I'll propose to remove the "experimental" tag from the
intrinsics soon.

Differential Revision: https://reviews.llvm.org/D81491
2020-06-14 09:09:03 -04:00
Simon Pilgrim 5dc4e7c2b9 [VectorCombine] scalarizeBinop - support an all-constant src vector operand
scalarizeBinop currently folds

  vec_bo((inselt VecC0, V0, Index), (inselt VecC1, V1, Index))
  ->
  inselt(vec_bo(VecC0, VecC1), scl_bo(V0,V1), Index)

This patch extends this to account for cases where one of the vec_bo operands is already all-constant and performs similar cost checks to determine if the scalar binop with a constant still makes sense:

  vec_bo((inselt VecC0, V0, Index), VecC1)
  ->
  inselt(vec_bo(VecC0, VecC1), scl_bo(V0,extractelt(V1,Index)), Index)

Fixes PR42174

Differential Revision: https://reviews.llvm.org/D80885
2020-06-09 19:02:05 +01:00
Sanjay Patel e50059f6b6 [x86] form reduction intrinsics from vectorizers instead of raw IR
Motivating examples are seen in the PhaseOrdering tests based on:
https://bugs.llvm.org/show_bug.cgi?id=43953#c2 - if we have
intrinsics there, some pass can fold them.

The intrinsics are still named "experimental" at this point, but
if there is no fallout from this patch, that will be a good
indicator that it is safe to finalize them.

Differential Revision: https://reviews.llvm.org/D80867
2020-06-05 12:38:49 -04:00
Sanjay Patel 22c4c6dd38 [PhaseOrdering] add tests for reductions; NFC (PR43953) 2020-06-05 12:38:49 -04:00
Sanjay Patel ecbf34c0e4 [PhaseOrdering] add more tests for vector reductions; NFC
More coverage for D80867.
2020-06-04 08:19:40 -04:00
Sanjay Patel 91b45fb527 [PhaseOrdering] add test for hoisting/CSE (PR46115); NFC 2020-05-31 10:34:18 -04:00
Sanjay Patel 129c501aa9 [PhaseOrdering] add scalarization test for PR42174; NFC
Motivating test for vector-combine enhancement in D80885.
Make sure that vectorization and canonicalization are
working together as expected.
2020-05-31 08:43:34 -04:00