Commit Graph

538 Commits

Author SHA1 Message Date
Philip Reames 31764ea295 [LCSSA] Extract a utility for deciding if a new use requires a new lcssa phi [NFC]
(Triggered by a review comment on D98728, but otherwise unrelated.)
2021-03-17 12:14:01 -07:00
Philip Reames 7c7f4676cd [LICM] Fix a crash when sinking instructions w/token operands
It is not legal to form a phi node with token type. The generic LCSSA construction code handles this correctly - by not forming LCSSA for such cases - but the adhoc fixup implementation in LICM did not.

This was noticed in the context of PR49607, but can be demonstrated on ToT with the tweaked test case. This is not specific to gc.relocate btw, it also applies to usage of the preallocated family of intrinsics as well.

Differential Revision: https://reviews.llvm.org/D98728
2021-03-17 11:18:46 -07:00
Nikita Popov 403da6a69a Reapply [LICM] Make promotion faster
Relative to the previous implementation, this always uses
aliasesUnknownInst() instead of aliasesPointer() to correctly
handle atomics. The added test case was previously miscompiled.

-----

Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-11 10:50:28 +01:00
Alina Sbirlea 29482426b5 Revert "[LICM] Make promotion faster"
Revert 3d8f842712
Revision triggers a miscompile sinking a store incorrectly outside a
threading loop. Detected by tsan.
Reverting while investigating.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-08 12:53:03 -08:00
Xun Li 03f668613c [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions
See pr46990(https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks which cross coro.suspend intrinsics. This breaks semantic of coro.suspend intrinsic which return to caller directly. Also this leads to use-after-free if the coroutine is freed before control returns to the caller in multithread environment.

This patch disable promotion by check whether loop contains coro.suspend intrinsics.
This is a resubmit of D86190.
Disabling LICM for loops with coroutine suspension is a better option not only for correctness purpose but also for performance purpose.
In most cases LICM sinks memory operations. In the case of coroutine, sinking memory operation out of the loop does not improve performance since coroutien needs to get data from the frame anyway. In fact LICM would hurt coroutine performance since it adds more entries to the frame.

Differential Revision: https://reviews.llvm.org/D96928
2021-03-03 15:21:57 -08:00
Nikita Popov 3d8f842712 [LICM] Make promotion faster
Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-02 22:10:48 +01:00
Kazu Hirata 5fc9e30985 [Scalar] Use range-based for loops (NFC) 2021-02-25 19:54:38 -08:00
Quentin Colombet 905623b64d [NFC][LICM] Minor improvements to debug output
Added a utility function in Value class to print block name and use
block labels for unnamed blocks.
Changed LICM to call this function in its debug output.

Patch by Xiaoqing Wu <xiaoqing_wu@apple.com>

Differential Revision: https://reviews.llvm.org/D93577
2021-01-11 18:02:49 -08:00
Kazu Hirata 4d92ab1669 [Transforms] Use llvm::find_if (NFC) 2021-01-09 09:24:58 -08:00
Sjoerd Meijer 1e260f955d [LICM][docs] Document that LICM is also a canonicalization transform. NFC.
This documents that LICM is a canonicalization transform, which we discussed
recently in:

http://lists.llvm.org/pipermail/llvm-dev/2020-December/147184.html

but which was also discused earlier, e.g. in:

http://lists.llvm.org/pipermail/llvm-dev/2019-September/135058.html
2020-12-08 11:56:35 +00:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Jamie Schmeiser 7f6360cdc6 Reland: Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager.  Enable memoryssa for loopsink with new pass manager.  This
combination exposed a bug that was previously fixed for loopsink
without memoryssa.  When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation.  This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.

Respond to review comments.  Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control.  Update tests accordingly.

Respond to review comments.  Add options controlling whether memoryssa is
used for loop sink, defaulting to off.  Expand testing based on these
options.

Respond to review comments.  Properly indicated preserved analyses.

This relanding addresses a compile-time performance problem by moving
test for profile data earlier to avoid unnecessary computations.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
2020-11-20 10:26:33 -05:00
Jamie Schmeiser cff479b145 Revert "Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug."""
This reverts commit e29292969b.

This apparently causes a regression in compile time (ie, it slows down).
2020-11-18 16:07:16 -05:00
Jamie Schmeiser e29292969b Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.""
This reverts commit 562addba65.

Reverted change too quickly, the failing test cases passed on the next build.
So reverting revert (to include the changes).
2020-11-18 15:33:02 -05:00
Jamie Schmeiser 562addba65 Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug."
This reverts commit d4ba28bddc.
2020-11-18 15:17:53 -05:00
Jamie Schmeiser d4ba28bddc Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager.  Enable memoryssa for loopsink with new pass manager.  This
combination exposed a bug that was previously fixed for loopsink
without memoryssa.  When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation.  This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.

Respond to review comments.  Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control.  Update tests accordingly.

Respond to review comments.  Add options controlling whether memoryssa is
used for loop sink, defaulting to off.  Expand testing based on these
options.

Respond to review comments.  Properly indicated preserved analyses.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
2020-11-18 14:08:42 -05:00
Jamie Schmeiser f79b483385 [NFC intended] Refactor SinkAndHoistLICMFlags to allow others to construct without exposing internals
Summary:
Refactor SinkAdHoistLICMFlags from a struct to a class with accessors and constructors to allow other
classes to construct flags with meaningful defaults while not exposing LICM internal details.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea)

Differential Revision: https://reviews.llvm.org/D90482
2020-11-12 15:06:59 +00:00
Quentin Colombet a585228027 Prevent LICM and machineLICM from hoisting convergent operations
Results of convergent operations are implicitly affected by the
enclosing control flows and should not be hoisted out of arbitrary
loops.

Patch by Xiaoqing Wu <xiaoqing_wu@apple.com>

Differential Revision: https://reviews.llvm.org/D90361
2020-11-06 10:26:39 -08:00
Alina Sbirlea f514b32a89 [LICM] Add assert of AST/MSSA exclusiveness.
The API `canSinkOrHoistInst` may be called by LoopSink. Add assert to
avoid having two analyses passed in.
2020-11-02 18:04:43 -08:00
Nikita Popov 3b31f05372 [LICM] Don't require AST in LoopPromoter (NFC)
While promotion currently always has an AST available, it is only
relevant for invalidation purposes in LoopPromoter, so we do not
need to have it as a hard dependency.
2020-10-13 22:08:49 +02:00
Jeremy Morse 05659606a2 Revert "[gardening] Replace some uses of setDebugLoc(DebugLoc()) with dropLocation(), NFC"
Some of the buildbots have croaked with this patch, for examples failures
that begin in this build:

  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/29933

This reverts commit 674f57870f.
2020-09-30 09:52:12 +01:00
Vedant Kumar 674f57870f [gardening] Replace some uses of setDebugLoc(DebugLoc()) with dropLocation(), NFC 2020-09-29 17:39:07 -07:00
Vedant Kumar dfc5a9eb57 [Instruction] Add dropLocation and updateLocationAfterHoist helpers
Introduce a helper which can be used to update the debug location of an
Instruction after the instruction is hoisted. This can be used to safely
drop a source location as recommended by the docs.

For more context, see the discussion in https://reviews.llvm.org/D60913.

Differential Revision: https://reviews.llvm.org/D85670
2020-09-24 15:00:04 -07:00
Stefanos Baziotis 89c1e35f3c [LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
2020-09-22 23:28:51 +03:00
Wenlei He 2c391a5a14 [LICM] Make Loop ICM profile aware again
D65060 was reverted because it introduced non-determinism by using BFI counts from already freed blocks. The parent of this revision fixes that by using a VH callback on blocks to prevent this from happening and makes sure BFI data is passed correctly in LoopStandardAnalysisResults.

This re-introduces the previous optimization of using BFI data to prevent LICM from hoisting/sinking if the instruction will end up moving to a colder block.

Internally at Facebook this change results in a ~7% win in a CPU related metric in one of our big services by preventing hoisting cold code into a hot pre-header like the added test case demonstrates.

Testing:
ninja check

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87551
2020-09-15 17:21:58 -07:00
Wenlei He 2ea4c2c598 [BFI] Make BFI information available through loop passes inside LoopStandardAnalysisResults
~~D65060 uncovered that trying to use BFI in loop passes can lead to non-deterministic behavior when blocks are re-used while retaining old BFI data.~~

~~To make sure BFI is preserved through loop passes a Value Handle (VH) callback is registered on blocks themselves. When a block is freed it now also wipes out the accompanying BFI entry such that stale BFI data can no longer persist resolving the determinism issue. ~~

~~An optimistic approach would be to incrementally update BFI information throughout the loop passes rather than only invalidating them on removed blocks. The issues with that are:~~
~~1. It is not clear how BFI information should be incrementally updated: If a block is duplicated does its BFI information come with? How about if it's split/modified/moved around? ~~
~~2. Assuming we can address these problems the implementation here will be a massive undertaking. ~~

~~There's a known need of BFI in LICM analysis which requires correct but not incrementally updated BFI data. A follow-up change can register BFI in all loop passes so this preserved but potentially lossy data is available to any loop pass that wants it.~~

See: D75341 for an identical implementation of preserving BFI via VH callbacks. The previous statements do still apply but this change no longer has to be in this diff because it's already upstream 😄 .

This diff also moves BFI to be a part of LoopStandardAnalysisResults since the previous method using getCachedResults now (correctly!) statically asserts (D72893) that this data isn't static through the loop passes.

Testing
Ninja check

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D86156
2020-09-15 16:16:24 -07:00
David Sherwood 69cccb3189 [SVE] Fix isLoadInvariantInLoop for scalable vectors
I've amended the isLoadInvariantInLoop function to bail out for
scalable vectors for now since the invariant.start intrinsic is only
ever generated by the clang frontend for thread locals or struct
and class constructors, neither of which support sizeless types.
In addition, the intrinsic itself does not currently support the
concept of a scaled size, which makes it impossible to compare
the sizes of different scalable objects, e.g. <vscale x 32 x i8>
and <vscale x 16 x i8>.

Added new tests here:

  Transforms/LICM/AArch64/sve-load-hoist.ll
  Transforms/LICM/hoisting.ll

Differential Revision: https://reviews.llvm.org/D87227
2020-09-15 08:30:19 +01:00
Vedant Kumar 30c1633386 Revert "[Instruction] Add updateLocationAfterHoist helper"
This reverts commit 4a646ca9e2.

This is causing some bots to fail with "!dbg attachment points at wrong
subprogram for function", like:

http://lab.llvm.org:8011/builders/sanitizer-windows/builds/67958/steps/stage%201%20check/logs/stdio
2020-08-11 14:54:09 -07:00
Vedant Kumar 4a646ca9e2 [Instruction] Add updateLocationAfterHoist helper
Introduce a helper on Instruction which can be used to update the debug
location after hoisting.

Use this in GVN and LICM, where we were mistakenly introducing new line
0 locations after hoisting (the docs recommend dropping the location in
this case).

For more context, see the discussion in https://reviews.llvm.org/D60913.

Differential Revision: https://reviews.llvm.org/D85670
2020-08-11 14:05:20 -07:00
Vitaly Buka b0eb40ca39 [NFC] Remove unused GetUnderlyingObject paramenter
Depends on D84617.

Differential Revision: https://reviews.llvm.org/D84621
2020-07-31 02:10:03 -07:00
Vitaly Buka 89051ebace [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
Guillaume Chatelet 8dbafd24d6 [Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82977
2020-07-02 11:28:02 +00:00
Simon Pilgrim c18b753686 LoopUtils.h - reduce AliasAnalysis.h include to forward declarations. NFC.
Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-06-24 17:58:38 +01:00
Chris Jackson c6c65164af [DebugInfo] Reduce SalvageDebugInfo() functions
- Now all SalvageDebugInfo() calls will mark undef if the salvage
  attempt fails.

 Reviewed by: vsk, Orlando

 Differential Revision: https://reviews.llvm.org/D78369
2020-06-08 19:28:18 +01:00
Davide Italiano da52aa2c33 [LICM] When promoting loads to the preheader, drop the location.
It's really almost going to be misleading, see the example in
https://bugs.llvm.org/show_bug.cgi?id=45820

Maybe at some point we can do something fancier, but at least
this will fix a bug where we step on dead code while debugging.
2020-05-14 17:05:23 -07:00
Sam Parker e9c9329aa4 [TTI] Add TargetCostKind argument to getUserCost
There are several different types of cost that TTI tries to provide
explicit information for: throughput, latency, code size along with
a vague 'intersection of code-size cost and execution cost'.

The vectorizer is a keen user of RecipThroughput and there's at least
'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
help with this cost. The latency cost has a single use and a single
implementation. The intersection cost appears to cover most of the
rest of the API.

getUserCost is explicitly called from within TTI when the user has
been explicit in wanting the code size (also only one use) as well
as a few passes which are concerned with a mixture of size and/or
a relative cost. In many cases these costs are closely related, such
as when multiple instructions are required, but one evident diverging
cost in this function is for div/rem.

This patch adds an argument so that the cost required is explicit,
so that we can make the important distinction when necessary.

Differential Revision: https://reviews.llvm.org/D78635
2020-04-28 08:57:45 +01:00
Max Kazantsev a116f0fa86 [LICM][NFC] Reorder checks to speed up things slightly
Side effect check is made faster than potentially heavy other checks.
2020-04-21 11:34:44 +07:00
Nikita Popov 54d01cbc15 [IPT] Don't use OrderedInstructions (NFC)
Use Instruction::comesBefore() instead of OrderedInstructions
inside InstructionPrecedenceTracking. This also removes the
dominator tree dependency.

Differential Revision: https://reviews.llvm.org/D78461
2020-04-20 18:25:31 +02:00
Florian Hahn 32af48cdcf [IVDescriptors] Clean up includes.
Some includes are not required and forward declarations can be used
instead. This also exposed a few places that were not directly including
required files.
2020-04-19 20:07:47 +01:00
Davide Italiano 5f87415efc [LICM] Try to merge debug locations when sinking.
The current strategy LICM uses when sinking for debuginfo is
that of picking the debug location of one of the uses.
This causes stepping to be wrong sometimes, see, e.g. PR45523.

This patch introduces a generalization of getMergedLocation(),
that operates on a vector of locations instead of two, and try
to merge all them together, and use the new API in LICM.

<rdar://problem/61750950>
2020-04-15 12:29:34 -07:00
Tyker c35194b800 [AssumeBundles] preserve information in LICM
Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77407
2020-04-14 12:48:14 +02:00
Eli Friedman 3f13ee8a00 [NFC] Modernize misc. uses of Align/MaybeAlign APIs.
Use the current getAlign() APIs where it makes sense, and use Align
instead of MaybeAlign when we know the value is non-zero.
2020-04-06 17:53:04 -07:00
OCHyams 9b56cc9361 [DebugInfo] Salvage debug info when sinking loop invariant instructions
Reviewed By: vsk, aprantl, djtodoro

Differential Revision: https://reviews.llvm.org/D77318
2020-04-03 09:19:26 +01:00
Matt Arsenault f9047ede58 LICM: Reorder condition checks
Check the fast math flag before the more expensive loop check.
2020-03-03 17:15:57 -05:00
Juneyoung Lee 9f1f244d3c [LICM] Allow freeze to hoist/sink out of a loop
Summary: This patch allows LICM to hoist/sink freeze instructions out of a loop.

Reviewers: reames, fhahn, efriedma

Reviewed By: reames

Subscribers: jfb, lebedev.ri, hiraditya, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75400
2020-03-03 12:29:39 +09:00
Philip Reames 14845b2c45 Revert "[LICM] Support hosting of dynamic allocas out of loops"
This reverts commit 8d22100f66.

There was a functional regression reported (https://bugs.llvm.org/show_bug.cgi?id=44996).  I'm not actually sure the patch is wrong, but I don't have time to investigate currently, and this line of work isn't something I'm likely to get back to quickly.
2020-02-25 09:05:31 -08:00
Bill Wendling 2fe457690d Filter callbr insts from critical edge splitting
Similarly to how splitting predecessors with an indirectbr isn't handled
in the generic way, we also shouldn't split callbrs, for similar
reasons.
2020-02-20 16:24:42 -08:00
Alina Sbirlea 4f33a68973 Compute ORE, BPI, BFI in Loop passes.
Summary:
Passes ORE, BPI, BFI are not being preserved by Loop passes, hence it
is incorrect to retrieve these passes as cached.
This patch makes the loop passes in question compute a new instance.

In some of these cases, however, it may be beneficial to change the Loop pass to
a Function pass instead, similar to the change for LoopUnrollAndJam.

Reviewers: chandlerc, dmgreen, jdoerfert, reames

Subscribers: mehdi_amini, hiraditya, zzheng, steven_wu, dexonsmith, Whitney, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72891
2020-02-12 09:15:18 -08:00
Alina Sbirlea 1c03cc5a39 [NFCI] Update according to style.
clang-tidy + clang-format
2020-02-04 17:11:36 -08:00
Daniil Suchkov 53a28bd891 [LICM] NFC. Remove AST caching infrastructure
Since LICM doesn't use AST caching any more (see D73081), this
infrastructure is now obsolete and we can remove it.

Reviewers: asbirlea, fhahn, efriedma, reames

Reviewed-By: asbirlea

Differential Revision: https://reviews.llvm.org/D73084
2020-01-23 12:33:50 +07:00