Commit Graph

28237 Commits

Sanjay Patel 14eefa57f2 [InstCombine] factorize min/max intrinsic ops with common operand (2nd try)
This is a re-try of 6de1dbbd09, which was reverted because
it missed a null check. An extra test for that failure has been added.

Original commit message:
This is an adaptation of D41603 and another step on the way
to canonicalizing to the intrinsic forms of min/max.

See D98152 for status.
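
A minimal C++ sketch of the underlying identity (illustrative only, not part
of the patch; the fold itself operates on the llvm.{s,u}{min,max} intrinsics):

    #include <algorithm>
    #include <cassert>

    // max(max(a, b), max(a, c)) == max(a, max(b, c)): the common operand
    // `a` factors out, replacing three max operations with two.
    int factored(int a, int b, int c) { return std::max(a, std::max(b, c)); }

    int main() {
      for (int a = -2; a <= 2; ++a)
        for (int b = -2; b <= 2; ++b)
          for (int c = -2; c <= 2; ++c)
            assert(std::max(std::max(a, b), std::max(a, c)) ==
                   factored(a, b, c));
    }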
2021-08-12 16:32:07 -04:00
Amy Huang 427520a8fa Revert "[InstCombine] factorize min/max intrinsic ops with common operand"
This reverts commit 6de1dbbd09 because it causes a
compiler crash.
2021-08-12 12:36:25 -07:00
Florian Hahn f999312872
Recommit "[Matrix] Overload stride arg in matrix.columnwise.load/store."
This reverts the revert 28c04794df.

The failing MLIR test that caused the revert should be fixed in this
version.

Also includes a PPC test fix previously in 1f87c7c478.
2021-08-12 18:31:57 +01:00
Roman Lebedev f30a7dff8a
[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological
We really shouldn't deal with a conditional branch that can be trivially
constant-folded into an unconditional branch.

Indeed, barring failure to trigger BB reprocessing, that should be true,
so let's assert as much, and hope the assertion never fires.
If it does, we have a bug to fix.
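
A rough C++ sketch of what such an assertion looks like (assumed shape, ours,
not the actual patch; `BI` stands for the branch being simplified):

    #include <cassert>
    #include "llvm/IR/Constants.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // A "tautological" conditional branch is one whose condition has already
    // folded to a constant, i.e. an unconditional branch in disguise.
    static void assertNonTautological(const BranchInst *BI) {
      assert(BI->isConditional() && !isa<ConstantInt>(BI->getCondition()) &&
             "branch should have been constant-folded to unconditional");
    }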
2021-08-12 20:03:09 +03:00
Roman Lebedev 628f63d3d5
[SimplifyCFG] If FoldTwoEntryPHINode() changed things, restart
Mainly, I want to add an assertion that `SimplifyCFGOpt::simplifyCondBranch()`
doesn't get asked to deal with non-unconditional branches,
and if I do that, then said assertion fires on existing tests,
and this is what prevents it from firing.
2021-08-12 20:03:09 +03:00
Sanjay Patel 790c29ab86 [InstCombine] fold umax/umin intrinsics based on demanded bits
This is a direct translation of the select folds added with
D53033 / D53036 and another step towards canonicalization
using the intrinsics (see D98152).
2021-08-12 12:37:45 -04:00
maekawatoshiki dd3eea6566 [LICM] Support sinking in LNICM
Currently, the LNICM pass does not support sinking instructions out of a loop nest.
This patch enables LNICM to sink as many instructions as possible down to the exit block of the outermost loop.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D107219
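
An illustration in C++ (ours, not from the patch) of the kind of code this
enables; the invariant product is only needed after the nest, so it can be
sunk below the outermost loop:

    // Before sinking: `t` is recomputed on every inner iteration even
    // though only its final value escapes the loop nest.
    int before(int x, int y, int n) {
      int s = 0, t = 0;
      for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
          t = x * y;  // loop-nest invariant
          s += i + j;
        }
      return s + t;
    }

    // After LNICM-style sinking to the exit of the outermost loop:
    int after(int x, int y, int n) {
      int s = 0;
      for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
          s += i + j;
      int t = (n > 0) ? x * y : 0;  // computed once, only if the loop ran
      return s + t;
    }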
2021-08-13 00:56:26 +09:00
Sanjay Patel cd44cc86e3 [InstCombine] remove unused function argument; NFC
This was just added with 6de1dbbd09, and I missed
pulling the extra arg from the final revision.
2021-08-12 11:47:25 -04:00
Johannes Doerfert 4e7d7cae67 [Attributor][FIX] Do not try to rewrite functions with casted call sites
If we cast a function at the call site, it is hard(er) to get the rewrite
correct, so let's not attempt it for now.

Fixes PR51448.
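
For illustration (ours), a "casted call site" is a call through a function
pointer cast to a mismatched prototype; rewriting the callee's signature
underneath such a call is what the patch now declines to do:

    long callee(long x) { return x + 1; }

    int main() {
      // The cast hides the real prototype from the call site. (Calling
      // through it is undefined behavior in C++, but equivalent IR shows
      // up in practice and the pass must handle it conservatively.)
      int (*fp)(int) = reinterpret_cast<int (*)(int)>(&callee);
      return fp(41);
    }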
2021-08-12 10:39:53 -05:00
Johannes Doerfert 5f543919b2 [Attributor][FIX] Guard constant casts with type size checks 2021-08-12 10:39:53 -05:00
Johannes Doerfert a420f80bf1 [Attributor] Do not delete volatile stores to null/undef
See D106309.

Differential Revision: https://reviews.llvm.org/D107906
2021-08-12 10:39:52 -05:00
Sanjay Patel be0698559b [InstCombine] remove shl(neg x), y transform
This diff was accidentally committed with:
1b5a195845
2021-08-12 11:27:22 -04:00
Sanjay Patel 6de1dbbd09 [InstCombine] factorize min/max intrinsic ops with common operand
This is an adaptation of D41603 and another step on the way
to canonicalizing to the intrinsic forms of min/max.

See D98152 for status.
2021-08-12 11:19:09 -04:00
Sanjay Patel 1b5a195845 [InstCombine] add tests for factorization of min/max intrinsics; NFC 2021-08-12 11:19:09 -04:00
Liqiang Tao 422fc5603a [llvm][Inline] Refactor out InlineOrder
Move InlineOrder to a separate file.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D107831
2021-08-12 22:19:53 +08:00
Mehdi Amini 28c04794df Revert "[Matrix] Overload stride arg in matrix.columnwise.load/store."
This reverts commit a1ef81de35.

Broke the MLIR buildbot.
2021-08-12 11:57:19 +00:00
Florian Hahn a1ef81de35
[Matrix] Overload stride arg in matrix.columnwise.load/store.
This patch adjusts the intrinsics definition of
llvm.matrix.column.major.load and llvm.matrix.column.major.store to
allow overloading the type of the stride. The bitwidth of the stride is
used to perform the offset computation.

This fixes a crash when using __builtin_matrix_column_major_load or
__builtin_matrix_column_major_store on 32-bit platforms. The stride argument
of the builtins is defined as `size_t`, which is 32 bits wide on 32-bit
platforms.

Note that we still perform offset computations with 64 bit width on 32
bit platforms for accesses that do not take a user-specified stride.
This can be fixed separately.

Fixes PR51304.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D107349
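
For context, a sketch of the affected builtin (ours; requires Clang's matrix
extension, -fenable-matrix). On a 32-bit target `size_t` is 32 bits, so the
stride below becomes an i32, which previously crashed a lowering that
hard-wired the intrinsic's stride to i64:

    // clang++ -fenable-matrix -c example.cpp
    #include <cstddef>

    typedef float m4x4_t __attribute__((matrix_type(4, 4)));

    m4x4_t load_strided(float *p, std::size_t stride) {
      // Lowers to llvm.matrix.column.major.load; with this patch the
      // stride's type overloads the intrinsic instead of being fixed.
      return __builtin_matrix_column_major_load(p, 4, 4, stride);
    }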
2021-08-12 10:45:25 +01:00
Christudasan Devadasan 5d940b71ae Reapply "SROA: Enhance speculateSelectInstLoads"
Originally committed as ffc3fb665d
Reverted in fcf2d5f402 due to an
assertion failure.

Original commit message:

Allow the folding even if there is an
intervening bitcast.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106667
2021-08-11 22:58:54 -04:00
Akira Hatanaka 643ce61fb3 [ObjC][ARC] Don't form a StoreStrong call if it is unsafe to move the
release call

findSafeStoreForStoreStrongContraction checks whether it's safe to move
the release call to the store by inspecting all instructions between the
two, but it was ignoring retain instructions. This was causing objects to
be released and deallocated before they were retained.

rdar://81668577
2021-08-11 13:50:19 -07:00
Sanjay Patel a0a9c9e188 [InstCombine] avoid breaking up min/max (cmp+sel) idioms
This is a quick fix for a motivating case that looks like this:
https://godbolt.org/z/GeMqzMc38

As noted, we might be able to restore the min/max patterns
with select folds, or we just wait for this to become easier
with canonicalization to min/max intrinsics.
2021-08-11 12:48:11 -04:00
Yolanda Chen 8fa16cc628 [LTO][lld] Add lto-pgo-warn-mismatch option
When enabling CSPGO with ThinLTO, profile CFG mismatch warnings can cause lld-link errors (with /WX)
due to source changes (e.g. `#if` code that runs for profile generation but not for profile use).
To disable them we have to use the internal "/mllvm:-no-pgo-warn-mismatch" option.
In contrast, clang uses the "-Wno-backend-plugin" option to avoid such warnings, and gcc has an explicit "-Wno-coverage-mismatch" option.

Add a "lto-pgo-warn-mismatch" option to lld COFF/ELF to help turn the profile mismatch warnings on/off explicitly when building with ThinLTO and CSPGO.

Differential Revision: https://reviews.llvm.org/D104431
2021-08-11 09:45:55 -07:00
Wang, Pengfei 6c4809825d Revert "[lld] Add lto-pgo-warn-mismatch option"
This reverts commit 0cfb00a1c9.
2021-08-11 16:25:42 +08:00
Yolanda Chen 0cfb00a1c9 [lld] Add lto-pgo-warn-mismatch option
When enabling CSPGO with ThinLTO, profile CFG mismatch warnings can cause lld-link errors (with /WX).
To disable them we have to use the internal "/mllvm:-no-pgo-warn-mismatch" option.
In contrast, clang uses the "-Wno-backend-plugin" option to avoid such warnings, and gcc has an explicit "-Wno-coverage-mismatch" option.

Add this "lto-pgo-warn-mismatch" option to lld to help turn the profile mismatch warnings on/off explicitly when building with ThinLTO and CSPGO.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D104431
2021-08-11 14:43:26 +08:00
Petr Hosek 389dc94d4b [InstrProfiling] Generate runtime hook for Fuchsia
When none of the translation units in the binary have been instrumented,
we shouldn't need to link the profile runtime. However, because we pass
-u__llvm_profile_runtime on Linux and Fuchsia, the runtime would still
be pulled in and incur some overhead. On Fuchsia, which uses runtime
counter relocation, it also means that we cannot reference the bias
variable unconditionally.

This change modifies the InstrProfiling pass to pull in the profile
runtime only when needed, by declaring the __llvm_profile_runtime symbol
in the translation unit only when needed. For now we restrict this to
Fuchsia, but it can later be expanded to other platforms. This approach
was already used prior to 9a041a7522, but we changed it to always
generate __llvm_profile_runtime due to a TAPI limitation; that
limitation may no longer apply, and it certainly doesn't apply on
platforms like Fuchsia.

Differential Revision: https://reviews.llvm.org/D98061
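
For background, the hook in question is the __llvm_profile_runtime symbol; a
sketch of the documented user-side idiom it interacts with (ours, not code
from the patch):

    // Providing the symbol yourself satisfies the -u__llvm_profile_runtime
    // reference without pulling in the default profile runtime -- the
    // long-standing opt-out idiom this pass-side change complements.
    extern "C" int __llvm_profile_runtime = 0;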
2021-08-10 23:21:15 -07:00
Petr Hosek c0c1c3cf93 Revert "[InstrProfiling] Emit bias variable eagerly"
This reverts commit 6660cec568 since
it was superseded by https://reviews.llvm.org/D98061.
2021-08-10 23:21:15 -07:00
Johannes Doerfert fc32a5c87d [Attributor][NFC] Try to make the windows build bots happy
Failed for some reason, potentially because of the inner type
declaration in combination with the `using`. This might help.

Failure:
https://lab.llvm.org/buildbot/#/builders/127/builds/15432
2021-08-11 01:11:37 -05:00
Johannes Doerfert e7e3585cde [Attributor][FIX] Handle recurrences (PHIs) in AAPointerInfo explicitly
PHI nodes are not pass-through but change their value; we have to
account for that to avoid missing stores.

Follow up for D107798 to fix PR51249 for good.

Differential Revision: https://reviews.llvm.org/D107808
2021-08-11 00:49:54 -05:00
Johannes Doerfert 96da6dd6ba [Attributor][FIX] Only avoid visiting PHI uses multiple times (PR51249)
AAPointerInfoFloating needs to visit all uses, and some multiple times if
we go through PHI nodes. Attributor::checkForAllUses keeps a visited set
so we don't recurse endlessly. We now allow recursion for non-PHI uses so
we track all pointer offsets via PHI nodes properly without endless
recursion.

This replaces the first attempt D107579.

Differential Revision: https://reviews.llvm.org/D107798
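
A generic C++ sketch of the visitation scheme described above (ours, not the
Attributor code): only PHI uses go into the visited set, which still bounds
the walk because every use-def cycle in SSA passes through a PHI:

    #include <functional>
    #include <set>
    #include <vector>

    struct Use { bool IsPhiUse = false; std::vector<Use *> Users; };

    void visitAllUses(Use *Root, const std::function<void(Use *)> &Visit) {
      std::set<Use *> VisitedPhiUses;
      std::vector<Use *> Worklist{Root};
      while (!Worklist.empty()) {
        Use *U = Worklist.back();
        Worklist.pop_back();
        // Each PHI use is processed at most once; that alone terminates
        // the walk, since SSA cycles always go through a PHI.
        if (U->IsPhiUse && !VisitedPhiUses.insert(U).second)
          continue;
        Visit(U);
        // Non-PHI uses may be reached again along a different PHI path,
        // so offsets reached that way are recomputed rather than skipped.
        for (Use *Next : U->Users)
          Worklist.push_back(Next);
      }
    }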
2021-08-11 00:49:54 -05:00
Johannes Doerfert e0c5d83a92 [OpenMP][FIX] Disabled optimizations have to be made known
To avoid simplification with wrong constants we need to make sure we
know that we won't perform specific optimizations based on the user's
request. The non-SPMDzation and non-CustomStateMachine flags previously
only prevented the final transformation but allowed value simplification
to go ahead.

Differential Revision: https://reviews.llvm.org/D107862
2021-08-11 00:49:53 -05:00
Christopher Di Bella c874dd5362 [llvm][clang][NFC] updates inline licence info
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.

Differential Revision: https://reviews.llvm.org/D107528
2021-08-11 02:48:53 +00:00
Adrian Prantl a353edb8d6 Simplify coro::salvageDebugInfo() (NFC-ish)
This patch removes the hand-rolled implementation of salvageDebugInfo
for cast and GEPs and replaces it with a call into
llvm::salvageDebugInfoImpl().

A side-effect of this is that additional redundant convert operations
are introduced, but those don't have any negative effect on the
resulting DWARF expression.

rdar://80227769

Differential Revision: https://reviews.llvm.org/D107384
2021-08-10 15:21:18 -07:00
Adrian Prantl d6b6880172 Streamline the API of salvageDebugInfoImpl (NFC)
This patch refactors / simplifies salvageDebugInfoImpl(). The goal
here is to simplify the implementation of coro::salvageDebugInfo() in
a followup patch.

  1. Change the return value to I.getOperand(0). Currently users of
     salvageDebugInfoImpl() assume that the first operand is
     I.getOperand(0). This patch makes this information explicit. A
     nice side-effect of this change is that it allows us to salvage
     expressions such as add i8 1, %a in the future.

  2. Factor out the creation of a DIExpression and return an array of
     DIExpression operations instead. This change allows users that
     call salvageDebugInfoImpl() in a loop to avoid the costly
     creation of temporary DIExpressions and to defer the creation of
     a DIExpression until the end.

This patch does not change any functionality.

rdar://80227769

Differential Revision: https://reviews.llvm.org/D107383
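
A toy C++ model (ours; the names and types are stand-ins, not the LLVM API)
of the new contract: each salvage step appends its expression operations to a
shared buffer and returns operand 0, so a caller walking a chain builds a
single DIExpression only at the end:

    #include <cstdint>
    #include <vector>

    struct Inst { Inst *Op0 = nullptr; uint64_t DwarfOp = 0; };

    // One salvage step: record the op, hand back operand 0 (now explicit).
    Inst *salvageOne(Inst &I, std::vector<uint64_t> &Ops) {
      Ops.push_back(I.DwarfOp);  // deferred: no DIExpression built here
      return I.Op0;
    }

    // Caller loop: one op buffer for the whole chain, one expression at end.
    std::vector<uint64_t> salvageChain(Inst *I) {
      std::vector<uint64_t> Ops;
      while (I)
        I = salvageOne(*I, Ops);
      return Ops;  // would feed a single DIExpression creation
    }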
2021-08-10 15:21:18 -07:00
Nikita Popov 17db125b48 [MemCpyOpt] Optimize MemoryDef insertion
When converting a store into a memset, we currently insert the new
MemoryDef after the store MemoryDef, which requires all uses to be
renamed to the new def using a whole block scan. Instead, we can
insert the new MemoryDef before the store and not rename uses,
because we know that the location is immediately overwritten, so
all uses should still refer to the old MemoryDef. Those uses will
get renamed when the old MemoryDef is actually dropped, which is
efficient.

I expect something similar can be done for some of the other MSSA
updates in MemCpyOpt. This is an alternative to D107513, at least
for this particular case.

Differential Revision: https://reviews.llvm.org/D107702
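
A sketch of the insertion strategy against LLVM's MemorySSAUpdater (ours; API
shapes assumed from the description, to be compiled against LLVM headers):

    #include "llvm/Analysis/MemorySSA.h"
    #include "llvm/Analysis/MemorySSAUpdater.h"
    using namespace llvm;

    // Insert the new memset's MemoryDef *before* the store it replaces and
    // skip renaming: the store immediately overwrites the same location, so
    // every existing use may keep referring to the old (store's) MemoryDef.
    static void insertMemSetDef(MemorySSA &MSSA, MemorySSAUpdater &MSSAU,
                                Instruction *MemSetI, Instruction *StoreI) {
      MemoryUseOrDef *StoreAccess = MSSA.getMemoryAccess(StoreI);
      MemoryAccess *NewAccess = MSSAU.createMemoryAccessBefore(
          MemSetI, StoreAccess->getDefiningAccess(), StoreAccess);
      MSSAU.insertDef(cast<MemoryDef>(NewAccess), /*RenameUses=*/false);
    }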
2021-08-10 21:28:29 +02:00
Sanjay Patel b267d3ce8d [InstCombine] avoid infinite loops from min/max canonicalization
The intrinsics have an extra chunk of known bits logic
compared to the normal cmp+select idiom. That allows
folding the icmp in each case to something better, but
that then opposes the canonical form of min/max that
we try to form for a select.

I'm carving out a narrow exception to preserve all
existing regression tests while avoiding the inf-loop.
It seems unlikely that this is the only bug like this
left, but this should fix:
https://llvm.org/PR51419
2021-08-10 14:42:37 -04:00
Carl Ritson a1783b54e8 [SimplifyCFG] Remove recursion from FoldCondBranchOnPHI. NFCI.
Avoid stack overflow errors on systems with small stack sizes
by removing recursion in FoldCondBranchOnPHI.

This is a simple change as the recursion was only iteratively
calling the function again on the same arguments.
Ideally this would be compiled to a tail call, but there is
no guarantee.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107803
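
A generic sketch of the change's shape (ours, not the actual code): a
self-recursive call on unchanged arguments becomes a fixpoint loop with
constant stack usage:

    struct BB { int PendingFolds = 2; };

    // Stand-in for one folding step; returns true if anything changed.
    static bool foldOnce(BB *Block) { return Block->PendingFolds-- > 0; }

    // Before: `if (foldOnce(Block)) return foldToFixpoint(Block);` -- a
    // tail call that is not guaranteed to be optimized. After: iterate.
    bool foldToFixpoint(BB *Block) {
      bool Changed = false;
      while (foldOnce(Block))
        Changed = true;
      return Changed;
    }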
2021-08-10 19:14:31 +09:00
David Sherwood ce394161cb [InstCombine] Add more complex folds for extractelement + stepvector
I have updated cheapToScalarize to also consider the case when
extracting lanes from a stepvector intrinsic. This required removing
the existing 'bool IsConstantExtractIndex' and passing in the actual
index as a Value instead. We do this because we need to know if the
index is <= the known minimum number of elements returned by the stepvector
intrinsic. Effectively, when extracting lane X from a stepvector we
know the value returned is also X.

New tests added here:

  Transforms/InstCombine/vscale_extractelement.ll

Differential Revision: https://reviews.llvm.org/D106358
2021-08-10 09:17:21 +01:00
Arnold Schwaighofer b987c283ae [coro] Correct CurrentBlock tracking bug recently introduced
We use the CurrentBlock to determine whether we have already processed a
block. Don't reuse this variable for setting where we should insert the
rematerialization. The rematerialization block is different to the
current block when we rematerialize for coro suspend block users.

Differential Revision: https://reviews.llvm.org/D107573
2021-08-09 10:41:41 -07:00
Christudasan Devadasan fcf2d5f402 Revert "SROA: Enhance speculateSelectInstLoads"
This reverts commit ffc3fb665d.
2021-08-09 01:13:39 -04:00
Michael Liao b5e470aa2e [LowerMemIntrinsics] Typo fix. 2021-08-08 22:38:58 -04:00
Dorit Nuzman 67278b8a90 [LV] Support Interleaved Store Group With Gaps
Teach LV to use masked stores to support interleaved store groups with
gaps (instead of scatters/scalarization).

The symmetric case of using masked loads to support interleaved load
groups with gaps was introduced a while ago, by
https://reviews.llvm.org/D53668; this patch completes the store scenario
left over from D53668, and solves PR50566.

Reviewed by: Ayal Zaks

Differential Revision: https://reviews.llvm.org/D104750
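
An illustration in C++ (ours) of an interleaved store group with a gap; with
this patch LV can emit one masked wide store whose mask disables the gap
lanes, instead of scattering or scalarizing:

    // Stride-3 store group with members at offsets 0 and 1; offset 2 is
    // the gap. A masked store covers the group with the gap lanes off.
    void store_group(int *A, int x, int y, int n) {
      for (int i = 0; i < n; ++i) {
        A[3 * i]     = x;  // member 0
        A[3 * i + 1] = y;  // member 1
        // A[3 * i + 2] is intentionally untouched (the gap)
      }
    }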
2021-08-08 10:32:02 +03:00
Nikita Popov 88003cea1c [MemCpyOpt] Remove MemDepAnalysis-based implementation
The MemorySSA-based implementation has been enabled for a few months
(since D94376). This patch drops the old MDA-based implementation
entirely.

I've kept this to only the basic cleanup of dropping various
conditions -- the code could be further cleaned up now that there
is only one implementation.

Differential Revision: https://reviews.llvm.org/D102113
2021-08-07 22:35:44 +02:00
Krishna a9a176ca3b [InstCombine] Remove nnan requirement for transformation to fabs from select
In this patch, the "nnan" requirement is removed for the canonicalization of select with fcmp to fabs.
(i) FSub logic: Remove the check for the nnan flag on the fsub. Example: https://alive2.llvm.org/ce/z/751svg (fsub).
(ii) FNeg logic: Remove the check for the nnan and nsz flags on the fneg. Example: https://alive2.llvm.org/ce/z/a_fsdp (fneg).

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D106872
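
The source-level shapes of the idiom in question (ours; see the alive2 links
above for the exact IR-level flag requirements):

    // select (fcmp olt x, 0.0), (fsub 0.0, x), x  -- the fsub form
    double abs_via_fsub(double x) { return x < 0.0 ? 0.0 - x : x; }

    // select (fcmp olt x, 0.0), (fneg x), x       -- the fneg form
    double abs_via_fneg(double x) { return x < 0.0 ? -x : x; }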
2021-08-07 22:38:45 +05:30
Roman Lebedev 0a241e90d4
[NFC][InstCombine] `vector_reduce_xor(?ext(<n x i1>))` --> `?ext(vector_reduce_add(<n x i1>))`
Instead of expanding it ourselves,
we can just forward to `?ext(vector_reduce_add(<n x i1>))`, as per alive2:
https://alive2.llvm.org/ce/z/ymz7zE (self)
https://alive2.llvm.org/ce/z/eKu2v2 (skipped zext)
https://alive2.llvm.org/ce/z/c3BXgc (skipped sext)
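
A scalar C++ model (ours) of why the forwarding is sound: xor-reducing i1
lanes computes their parity, which is exactly the low bit of the
add-reduction of the extended lanes:

    #include <cassert>
    #include <vector>

    bool reduce_xor(const std::vector<bool> &lanes) {
      bool x = false;
      for (bool b : lanes) x ^= b;    // vector_reduce_xor(<n x i1>)
      return x;
    }

    bool via_add(const std::vector<bool> &lanes) {
      unsigned sum = 0;
      for (bool b : lanes) sum += b;  // vector_reduce_add(?ext(<n x i1>))
      return sum & 1u;                // parity == truncated sum
    }

    int main() {
      std::vector<bool> v{true, false, true, true};
      assert(reduce_xor(v) == via_add(v));  // both true (3 set lanes)
    }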
2021-08-07 17:31:33 +03:00
Roman Lebedev c6ff867f92
[NFC][InstCombine] Simplify emitted IR for `vector_reduce_xor(?ext(<n x i1>))`
Now that we canonicalize low-bit splatting to the form we were emitting
here ourselves, we can emit simpler IR and let it be canonicalized later.

See 1e801439be for proofs:
https://alive2.llvm.org/ce/z/MjCm5W (self)
https://alive2.llvm.org/ce/z/kgqF4M (skipped zext)
https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)
2021-08-07 17:31:24 +03:00
Roman Lebedev e71870512f
[InstCombine] Prefer `-(x & 1)` as the low bit splatting pattern (PR51305)
Both patterns are equivalent (https://alive2.llvm.org/ce/z/jfCViF),
so we should have a preference. It seems like mask+negation is better
than two shifts.
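
Both patterns written out in C++ (ours); each maps the low bit of x to either
0 or all-ones:

    #include <cstdint>

    // Preferred form: mask the low bit, then negate.
    uint32_t splat_mask(uint32_t x) { return 0u - (x & 1u); }

    // Equivalent two-shift form: move the low bit into the sign position,
    // then arithmetic-shift it across the word (well-defined under C++20's
    // two's-complement rules).
    uint32_t splat_shifts(uint32_t x) {
      return static_cast<uint32_t>(static_cast<int32_t>(x << 31) >> 31);
    }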
2021-08-07 17:25:28 +03:00
Christudasan Devadasan ffc3fb665d SROA: Enhance speculateSelectInstLoads
Allow the folding even if there is an
intervening bitcast.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106667
2021-08-07 09:09:14 -04:00
Florian Hahn a00aafc30d
[VPlan] Iterate over phi recipes to detect reductions to fix.
After refactoring the phi recipes, we can now iterate over all header
phis in a VPlan to detect reductions when it comes to fixing them up
when tail folding.

This reduces the coupling with the cost model & legal by using the
information directly available in VPlan. It also removes a call to
getOrAddVPValue, which references the original IR value which may
become outdated after VPlan transformations.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100102
2021-08-07 14:06:50 +01:00
Sanjay Patel 0369714b31 [InstCombine] reduce vector casting before icmp
There may be some generalizations (see test comments) of these patterns,
but this should handle the cases motivated by:
https://llvm.org/PR51315
https://llvm.org/PR51259

The backend may want to transform differently, but at least for
the x86 examples that I looked at, there does not appear to be
any significant perf diff either way.
2021-08-06 17:09:38 -04:00
Artem Belevich 6a9cf21f5a [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.
The attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact that
there are users that do not expect LLVM to materialize the `memset` intrinsic.

While other passes can do that, too, MemCpyOpt triggers it more frequently and
breaks sanitizers and some downstream users.

For now, introduce a flag to force-enable the pass, and opt in only CUDA
compilation with the NVPTX back-end.

Differential Revision: https://reviews.llvm.org/D106401
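
The mechanism is an ordinary command-line flag; a sketch of its shape (ours —
the flag's actual name and description live in D106401):

    #include "llvm/Support/CommandLine.h"
    using namespace llvm;

    // Hypothetical spelling of the force-enable flag added for CUDA/NVPTX.
    static cl::opt<bool> EnableMemCpyOptWithoutLibcalls(
        "enable-memcpyopt-without-libcalls", cl::init(false), cl::Hidden,
        cl::desc("Run memcpyopt even when libcalls may not be inserted"));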
2021-08-06 11:13:52 -07:00
Michael Liao d1cacd5928 [MemCpyOpt] Teach memcpyopt to handle loads from the constant memory.
- Loads from constant memory (either explicit loads or the sources of
  memory transfer intrinsics) won't alias any stores.

Reviewed By: asbirlea, efriedma

Differential Revision: https://reviews.llvm.org/D107605
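
An illustration (ours) of the aliasing fact being exploited: a load from
read-only memory can never observe a store, so the two may be freely
reordered:

    static const int Table[4] = {1, 2, 3, 4};  // constant memory

    int f(int *p, unsigned i) {
      *p = 42;              // no store can modify Table
      return Table[i & 3];  // load is independent of the store above
    }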
2021-08-06 12:43:52 -04:00