Commit Graph

10443 Commits

Author SHA1 Message Date
Artur Pilipenko 6ec2c5e402 GC-parseable element atomic memcpy/memmove
This change introduces a GC parseable lowering for element atomic
memcpy/memmove intrinsics. This way runtime can provide an
implementation which can take a safepoint during copy operation.

See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev
for the background and details:
https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ

Differential Revision: https://reviews.llvm.org/D88861
2020-10-23 14:06:09 -07:00
Chen Zheng 1e0b6c1df0 [LSR] ignore profitable chain when reg num is not major cost.
Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D89665
2020-10-23 09:35:48 -04:00
Caroline Concatto 2415636475 [SVE]Clarify TypeSize comparisons in llvm/lib/Transforms
Use isKnownXY comparators when one of the operands can be with
scalable vectors or getFixedSize() for all the other cases.

This patch also does bug fixes for getPrimitiveSizeInBits by using
getFixedSize() near the places with the TypeSize comparison.

Differential Revision: https://reviews.llvm.org/D89703
2020-10-23 09:15:17 +01:00
Max Kazantsev 6e574abf61 [SCEV][NFC] Cache symbolic max exit count
We want to have a caching version of symbolic BE exit count
rather than recompute it every time we need it.

Differential Revision: https://reviews.llvm.org/D89954
Reviewed By: nikic, efriedma
2020-10-23 12:29:37 +07:00
Nikita Popov 3e37543111 [MemCpyOpt] Move GEP during call slot optimization
When performing a call slot optimization to a GEP destination, it
will currently usually fail, because the GEP is directly before the
memcpy and as such does not dominate the call. We should move it
above the call if that satisfies the domination requirement.

I think that a constant-index GEP is the only useful thing to move
here, as otherwise isDereferenceablePointer couldn't look through
it anyway. As such I'm not trying to generalize this further.

Differential Revision: https://reviews.llvm.org/D89623
2020-10-22 20:40:56 +02:00
Serguei Katkov 75d0e0cd5f [IRCE] consolidate profitability check
Use BFI if it is available and BPI otherwise.
This is a promised follow-up after D89541.

Reviewers: ebrevnov, mkazantsev
Reviewed By: ebrevnov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D89773
2020-10-22 11:26:45 +07:00
Zequan Wu 2f29341114 Revert "Revert "SimplifyCFG: Clean up optforfuzzing implementation""
This reverts commit 716f7636e1.
2020-10-21 17:08:56 -07:00
Zequan Wu 716f7636e1 Revert "SimplifyCFG: Clean up optforfuzzing implementation"
See discussion: https://reviews.llvm.org/D89590
This reverts commit cdd006eec9.
2020-10-21 16:56:32 -07:00
Arthur Eubanks aa6c305344 [LowerMatrixIntrinsics][NewPM] Fix PreservedAnalyses result
PreservedCFGCheckerInstrumentation was saying that LowerMatrixIntrinsics
didn't properly preserve CFG even though it claimed to. The legacy pass
says it doesn't. Match the legacy pass's preserved analyses.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89175
2020-10-21 12:42:16 -07:00
Artur Pilipenko e8cce5ad89 [RS4GC] NFC. Preparatory refactoring to make GC parseable memcpy
For GC parseable element atomic memcpy/memmove we'll need to
shuffle statepoint arguments. Make it possible by storing the
arguments as Value *, not Use *.
2020-10-21 12:38:20 -07:00
Nicolai Hähnle 848a68a032 DomTree: Extract (mostly) read-only logic into type-erased base classes
Avoid having to instantiate and compile a subset of the dominator tree logic
separately for each node type. More importantly, this allows generic
algorithms to be built on top of dominator trees without writing them as
templates -- such algorithms can now use opaque CfgBlockRef and
CfgInterface instead.

A type-erased implementation of dominator trees could be written in
terms of CfgInterface as well, but doing so would change the current
trade-off: it would slightly reduce code size at the cost of a slight
runtime overhead.

This patch does not change the trade-off, as it only does type-erasure
where basic blocks can be treated in a fully opaque way, i.e. it only
moves methods that don't require iteration over CFG successors and
predecessors.

v5:
- rename generic_{begin,end,children} back without the generic_ prefix
  and refer explictly to base class methods in NewGVN, which wants to
  mutate the order of dominator tree node children directly

v6:
- style change: iDom -> idom; it's arguable whether this is really
  invalid, since it is actually standard camelCase, but clang-tidy
  complains about it so... *shrug*
- rename {to,from}Generic -> {wrap,unwrap}Ref

Change-Id: Ib860dc04cf8bb093d8ed00be7def40d662213672

Differential Revision: https://reviews.llvm.org/D83089
2020-10-20 19:53:07 +02:00
Florian Hahn 2e58010208 [DSE] Do not scan users of memory terminators for further reads.
isMemTerminator checks if the current def is a memory terminator that
terminates the memory pointed to by DefLoc. We do not have to add any of
their users to the worklist, because the follow-on users cannot read the
memory in question.

This leads to more stores eliminated in the presence of lifetime calls.
Previously we added the users of those intrinsics to the worklist,
limiting elimination.

In terms of removed stores, this gives a nice boost on some benchmarks
(MultiSource/SPEC2000/SPEC2006 on X86 with -flto -O3):

Same hash: 205 (filtered out)
Remaining: 32
Metric: dse.NumFastStores

Program                                          base   patch   diff
 test-suite...000/197.parser/197.parser.test     4.00    8.00  100.0%
 test-suite...rolangs-C++/family/family.test     4.00    7.00  75.0%
 test-suite...marks/7zip/7zip-benchmark.test   1722.00 2189.00 27.1%
 test-suite...CFP2000/177.mesa/177.mesa.test    30.00   38.00  26.7%
 test-suite :: External/Nurbs/nurbs.test        44.00   49.00  11.4%
 test-suite...lications/sqlite3/sqlite3.test   115.00  128.00  11.3%
 test-suite...006/447.dealII/447.dealII.test   2715.00 3013.00 11.0%
 test-suite...ProxyApps-C++/CLAMR/CLAMR.test   237.00  261.00  10.1%
 test-suite...tions/lambda-0.1.3/lambda.test    40.00   44.00  10.0%
 test-suite...3.xalancbmk/483.xalancbmk.test   1366.00 1475.00  8.0%
 test-suite...abench/jpeg/jpeg-6a/cjpeg.test    13.00   14.00   7.7%
 test-suite...oxyApps-C++/miniFE/miniFE.test    43.00   46.00   7.0%
 test-suite...lications/ClamAV/clamscan.test   230.00  246.00   7.0%
 test-suite...006/450.soplex/450.soplex.test   284.00  299.00   5.3%
 test-suite...nsumer-jpeg/consumer-jpeg.test    21.00   22.00   4.8%
2020-10-20 16:55:22 +01:00
Florian Hahn 6439fde6d4 [DSE] Bail out from getLocForWriteEx if call is not argmemonly/inacc_mem.
This change should currently not have any impact, but guard against
further inconsistencies between MemoryLocation and function attributes.
2020-10-20 14:37:53 +01:00
Serguei Katkov 38799975ce [IRCE] Do not transform if loop has small number of iterations
IRCE has some overhead for runtime checks and in case number of iteration is small
the overhead can kill the benefit from optimizations.

This CL bases on BlockFrequencyInfo of pre-header and header to estimate the
number of loop iterations. If it is less than irce-min-estimated-iters we do not transform the loop.

Probably it is better to make more complex cost model but for simplicity it seems the be enough.

The usage of BFI is added only for new pass manager and tries to use it efficiently.

Reviewers: ebrevnov, dantrushin, asbirlea, mkazantsev
Reviewed By: mkazantsev
Subscribers: llvm-commits, fhahn
Differential Revision: https://reviews.llvm.org/D89541
2020-10-20 10:33:59 +07:00
Roman Lebedev e0567582b8
[NFCI][SCEV] Always refer to enum SCEVTypes as enum, not integer
The main tricky thing here is forward-declaring the enum:
we have to specify it's underlying data type.

In particular, this avoids the danger of switching over the SCEVTypes,
but actually switching over an integer, and not being notified
when some case is not handled.

I have updated most of such switches to be exaustive and not have
a default case, where it's pretty obvious to be the intent,
however not all of them.
2020-10-20 00:10:22 +03:00
Roman Lebedev d083d55c2c
[NFC][SCEV] Rename SCEVCastExpr into SCEVIntegralCastExpr
All existing SCEV cast types operate on integers.
D89456 will add SCEVPtrToIntExpr cast expression type.
I believe this is best for consistency.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89455
2020-10-19 10:59:53 +03:00
Florian Hahn f5cf7f544b [DSE] Do not consider 'noop' intrinsics as read-clobbers.
isNoopIntrinsic returns true for some intrinsics that are modeled in
MemorySSA but do not actually read or write any memory and do not block
DSE. Such intrinsics should not be considered as read-clobbers.
2020-10-18 15:51:05 +01:00
Nikita Popov 50cc9a0e61 [MemCpyOpt] Extract common function for unwinding check
These two cases should be using the same logic. Not NFC, as this
resolves the TODO regarding use of the underlying object.
2020-10-17 15:30:39 +02:00
Pedro Tammela 60b19424bb [NFC] fix some typos in LoopUnrollPass
This patch fixes a couple of typos in the LoopUnrollPass.cpp comments

Differential Revision: https://reviews.llvm.org/D89603
2020-10-17 14:20:55 +01:00
Benjamin Kramer b740899c50 [Indvars][NFCI] Simplify assertion.
This should be semantically identical. Also avoids unused variable
warnings in Release builds.
2020-10-16 19:58:55 +02:00
Max Kazantsev 0857029011 [Indvars][NFC] Merge two functions together
Logic of widenWithVariantUse is split into check and transform
part, unlike any other transform in IndVars. We want to pass some
extra flags from analysis to transform part and standartize
the code at once, so merging them together.
2020-10-16 19:21:57 +07:00
Max Kazantsev bb39372e5e [Indvars][NFCI] Remove meaningless restrictive code in IndVars
Variable ExtendOperExpr only exists to check whether it is a SCEV ext.
We create it as SCEV ext right here, so semantically this check is
trivially true. In theory, it may fail if SCEV is smart enough and can
simplify the expression. However, no matter whether it is an ext or not,
we never use this fact for further reasoning. So this code is currently
useless and in theory may become harmful with SCEV's development.

We do not expect any behavior changes with removing it. If it caused
negative changes, the patch should be reverted.
2020-10-16 18:04:31 +07:00
Max Kazantsev 0ee0c7dcc3 [Indvars][NFC] Remove duplicating checks
Some facts have already been checked in widenWithVariantUse and then
checked again in widenWithVariantUseCodegen. The latter is redundant,
we can replace it with asserts.
2020-10-16 17:35:14 +07:00
Florian Hahn 51ff04567b Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
After investigation by @asbirlea, the issue that caused the
revert appears to be an issue in the original source, rather
than a problem with the compiler.

This patch enables MemorySSA DSE again.

This reverts commit 915310bf14.
2020-10-16 09:02:53 +01:00
Florian Hahn 89c0124273 [LoopVersion] Unify SCEVChecks and alias check handling (NFC).
This is an initial cleanup of the way LoopVersioning interacts with LAA.

Currently LoopVersioning has 2 ways of initializing things:

1. Passing LAI and passing UseLAIChecks = true
2. Passing UseLAIChecks = false, followed by calling setSCEVChecks and
   setAliasChecks.

Both ways of initializing lead to the same result and the duplication
seems more complicated than necessary.

This patch removes the UseLAIChecks flag from the constructor and the
setSCEVChecks & setAliasChecks helpers and move initialization
exclusively to the constructor.

This simplifies things, by providing a single way to initialize
LoopVersioning and reducing duplication.

Reviewed By: Meinersbur, lebedev.ri

Differential Revision: https://reviews.llvm.org/D84406
2020-10-15 22:02:17 +01:00
Juneyoung Lee 9b3c2a72e4 [ValueTracking] Use assume's noundef operand bundle
This patch updates `isGuaranteedNotToBeUndefOrPoison` to use `llvm.assume`'s `noundef` operand bundle.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89219
2020-10-14 20:16:33 +09:00
Nikita Popov 3b31f05372 [LICM] Don't require AST in LoopPromoter (NFC)
While promotion currently always has an AST available, it is only
relevant for invalidation purposes in LoopPromoter, so we do not
need to have it as a hard dependency.
2020-10-13 22:08:49 +02:00
Nikita Popov cd6f40f432 [MemCpyOpt] Add test scaffolding for MSSA based MemCpyOpt
This adds an -enable-memcpyopt-memoryssa option that currently does
nothing apart from requiring MSSA as a dependency. The tests are
split to run both with the option disabled and enabled. I went with
this rather than the separate directory DSE uses, as I found it
convenient to have a direct side-by-side comparison of differences.

Differential Revision: https://reviews.llvm.org/D89206
2020-10-13 21:45:05 +02:00
Nikita Popov e79ca751fc [MemCpyOpt] Fix MemorySSA preservation
moveUp() moves instructions, so we should move the corresponding
memory accesses as well. We should also move the store instruction
itself: Even though we'll end up removing it later, this gives us
a correct MemoryDef to replace.

The implementation is somewhat more complicated than it should be,
because we also handle the case where P does not have a memory
access due to a degnerate AA pipeline. Hopefully, the need for this
will go away in the future, when the rest of the pass is based on
MSSA.

Differential Revision: https://reviews.llvm.org/D88778
2020-10-13 21:39:09 +02:00
Nikita Popov baa3b87015 [MemCpyOpt] Don't shorten memset if memcpy operands may be the same
If the memcpy operands are the same (which is allowed since D86815)
then the memcpy is effectively a no-op and the partially overlapping
memset is not dead.

Differential Revision: https://reviews.llvm.org/D89192
2020-10-13 21:19:19 +02:00
Nikita Popov 39c39e8a7f [MemCpyOpt] Don't shorten memset if destination observable through unwinding
MemCpyOpt can shorten a memset if it is later partially overwritten
by a memcpy. It checks that the destination is not read in between,
but we also need to make sure that the destination cannot be observed
via unwinding.

Differential Revision: https://reviews.llvm.org/D89190
2020-10-13 21:12:19 +02:00
Nikita Popov 6713332fdd [LoopVersioningLICM] Fix noalias metadata emission
The previous code added the scope on each iteration, so that the
same scope was represented many times in the same !noalias metadata.
That's legal, and semantically equivalent to only storing the scope
once, but it's also wasteful and may pessimize further optimization
if AATags get intersected naively, as done by the AliasSetTracker.
2020-10-13 18:58:05 +02:00
Nikita Popov 5e855f1e80 [MemCpyOpt] Don't hoist store that's not guaranteed to execute
MemCpyOpt can hoist stores while load+store pairs into memcpy.
This hoisting can currently result in stores being executed that
weren't guaranteed to execute in the original problem.

Differential Revision: https://reviews.llvm.org/D89154
2020-10-10 10:26:28 +02:00
Changpeng Fang f192a27ed3 Sink: Handle instruction sink when a user is dead
Summary:
  The current instruction sink pass uses findNearestCommonDominator of all users to find block to sink the instruction to.
However, a user may be in a dead block, which will result in unexpected behavior.

This patch handles such cases by skipping dead blocks. This patch fixes:
https://bugs.llvm.org/show_bug.cgi?id=47415

Reviewers:
  MaskRay, arsenm

Differential Revision:
  https://reviews.llvm.org/D89166
2020-10-09 16:20:26 -07:00
Eli Friedman 278299b0f0 [SCCP] Reduce the number of times ResolvedUndefsIn is called for large modules.
If a module has many values that need to be resolved by
ResolvedUndefsIn, compilation takes quadratic time overall. Solve should
do a small amount of work, since not much is added to the worklists each
time markOverdefined is called. But ResolvedUndefsIn is linear over the
length of the function/module, so resolving one undef at a time is
quadratic in general.

To solve this, make ResolvedUndefsIn resolve every undef value at once,
instead of resolving them one at a time. This loses a little
optimization power, but can be a lot faster.

We still need a loop around ResolvedUndefsIn because markOverdefined
could change the set of blocks that are live. That should be uncommon,
hopefully. We could optimize it by tracking which blocks transition from
dead to live, instead of iterating over the whole module to find them.
But I'll leave that for later. (The whole function will become a lot
simpler once we start pruning branches on undef.)

The regression test changes seem minor. The specific cases in question
could probably be optimized with a bit more work, but they seem like
edge cases that don't really matter.

Fixes an "infinite" compile issue my team found on an internal workoad.

Differential Revision: https://reviews.llvm.org/D89080
2020-10-09 15:24:16 -07:00
Arthur Eubanks 9c21c6c966 [LoopInterchange][NewPM] Port -loop-interchange to NPM
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89058
2020-10-09 09:21:31 -07:00
Max Kazantsev 225df71951 [NFC] Add option to disable IV widening if needed
IV widening is sometimes a strictly harmful transform (some examples
of this are shown in tests 11, 12 in widen-loop-comp.ll). One of the
reasons of this is that sometimes SCEV fails to prove some facts after
part of guards has been widened.

Though each single such case looks like a bug that can be addressed,
it seems that disabling of IV widening may be profitable in some cases.
We want to have an option to do so. By default, existing behavior is
preserved and IV widening is on.
2020-10-09 18:32:03 +07:00
Simon Pilgrim 6aa10ae5bf [Transforms] visitCmpBlock - don't dereference a dyn_cast<>. NFCI.
Use cast<> as we immediately dereference the pointer afterwards - cast<> will assert if we fail.

Prevents clang static analyzer warning that we could deference a null pointer.
2020-10-08 20:18:32 +01:00
Markus Lavin 06758c6a61 [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo). Before transformation compute SCEV for each
@llvm.dbg.value in the loop body and store it (along side its current
DIExpression). After transformation update those @llvm.dbg.value now
referencing undef by comparing its stored SCEV to the SCEV of the
current loop-header PHI-nodes. Allow match with offset by inserting
compensation code in the DIExpression.

Includes fix for the nullptr deref that caused the original commit
to be reverted in 9d63029770.

Fixes : PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-10-08 13:16:43 +02:00
Max Kazantsev fba42aea43 [NFC] Use getZero instead of getConstant(0) 2020-10-07 13:53:36 +07:00
Roman Lebedev 7fa503ef4a
[SROA] rewritePartition()/findCommonType(): if uses have conflicting type, try getTypePartition() before falling back to largest integral use type (PR47592)
And another step towards transformss not introducing inttoptr and/or
ptrtoint casts that weren't there already.

In this case, when load/store uses have conflicting types,
instead of falling back to the iN, we can try to use allocated sub-type.
As disscussed, this isn't the best idea overall (we shouldn't rely on
allocated type), but it works fine as a temporary measure.

I've measured, and @ `-O3` as of vanilla llvm test-suite + RawSpeed,
this results in +0.05% more bitcasts, -5.51% less inttoptr
and -1.05% less ptrtoint (at the end of middle-end opt pipeline)

See https://bugs.llvm.org/show_bug.cgi?id=47592

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D88788
2020-10-07 09:20:19 +03:00
Nikita Popov 616f545048 [MemCpyOpt] Use dereferenceable pointer helper
The call slot optimization has some home-grown code for checking
whether the destination is dereferenceable. Replace this with the
generic isDereferenceableAndAlignedPointer() helper.

I'm not checking alignment here, because that is currently handled
separately and may be an enforced alignment for allocas. The clean
way of integrating that part would probably be to accept a callback
in isDereferenceableAndAlignedPointer() for the actual isAligned check,
which would then have a chance to use an enforced alignment instead.

This allows the destination to be a GEP (among other things), though
the two open TODOs may prevent it from working in practice.

Differential Revision: https://reviews.llvm.org/D88805
2020-10-06 18:41:19 +02:00
Nikita Popov 6b441ca523 [MemCpyOpt] Check for throwing calls during call slot optimization
When performing call slot optimization for a non-local destination,
we need to check whether there may be throwing calls between the
call and the copy. Otherwise, the early write to the destination
may be observable by the caller.

This was already done for call slot optimization of load/store,
but not for memcpys. For the sake of clarity, I'm moving this check
into the common optimization function, even if that does need an
additional instruction scan for the load/store case.

As efriedma pointed out, this check is not sufficient due to
potential accesses from another thread. This case is left as a TODO.

Differential Revision: https://reviews.llvm.org/D88799
2020-10-06 18:24:40 +02:00
Nikita Popov 80cde02e85 [MemCpyOpt] Add separate statistic for call slot optimization (NFC) 2020-10-06 18:14:10 +02:00
Serguei Katkov b988898013 [GVN LoadPRE] Extend the scope of optimization by using context to prove safety of speculation
Use context to prove that load can be safely executed at a point where load is being hoisted.

Postpone the decision about safety of speculative load execution till the moment we know
where we hoist load and check safety at that context.

Reviewers: nikic, fhahn, mkazantsev, lebedev.ri, efriedma, reames
Reviewed By: reames, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D88725
2020-10-06 09:25:16 +07:00
Nikita Popov 9d63029770 Revert "[DebugInfo] Improve dbg preservation in LSR."
This reverts commit a3caf7f610.

The ReleaseLTO-g test-suite configuration has been failing
to build since this commit, because clang segfaults while
building 7zip.
2020-10-05 19:02:30 +02:00
Markus Lavin a3caf7f610 [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo). Before transformation compute SCEV for each
@llvm.dbg.value in the loop body and store it (along side its current
DIExpression). After transformation update those @llvm.dbg.value now
referencing undef by comparing its stored SCEV to the SCEV of the
current loop-header PHI-nodes. Allow match with offset by inserting
compensation code in the DIExpression.

Fixes : PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-10-05 09:55:16 +02:00
Nikita Popov fbf818724f [MemCpyOpt] Make moveUp() a member method (NFC)
So we don't have to pass through more parameters in the future.
2020-10-03 11:28:49 +02:00
Nikita Popov 94704ed008 [MemCpyOpt] Add helper to erase instructions (NFC)
Next to erasing the instruction, we also always want to remove
it from MSSA and MD. Use a common function to do so.

This is a refactoring split out from D26739.
2020-10-02 21:52:10 +02:00
Nikita Popov 87b63c1726 [MemCpyOpt] Avoid double invalidation (NFCI)
The removal of the cpy instruction is left to the caller of
performCallSlotOptzn(), including the invalidation of MD. Both
call-sites already do this.

Also handle incrementation of NumMemCpyInstr consistently at the
call-site. One of the call-site was already doing this, which
ended up incrementing the statistic twice.

This fix was part of D26739.
2020-10-02 21:50:46 +02:00