Commit Graph

4912 Commits

Author SHA1 Message Date
Giorgis Georgakoudis b0e06e1fc0 [Attributor][NFC] Modify isAssumedHeapToStack for const argument
There is no need for a non-const argument interface and the const argument modification covers existing and upcoming use cases.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106418
2021-07-21 10:28:21 -07:00
Arthur Eubanks 8bc298d041 [NewPM][Inliner] Check if deleted function is in current SCC
In weird cases, the inliner will inline internal recursive functions,
sometimes causing them to have no more uses, in which case the
inliner will mark the function to be deleted. The function is
actually deleted after the call to
updateCGAndAnalysisManagerForCGSCCPass(). In
updateCGAndAnalysisManagerForCGSCCPass(), UR.UpdatedC may be set to
the SCC containing the function to be deleted. Then the inliner calls
CG.removeDeadFunction() which can cause that SCC to be deleted, even
though it's still stored in UR.UpdatedC.

We could potentially check in the wrappers/pass managers if UR.UpdatedC
is in UR.InvalidatedSCCs before doing anything with it, but it's safer
to do this as close to possible to the call to CG.removeDeadFunction()
to avoid issues with allocating a new SCC in the same address as
the deleted one.

It's hard to find a small test case since we need to have recursive
internal functions be reachable from non-internal functions, yet they
need to become non-recursive and not referenced by other functions when
inlined.

Similar to https://reviews.llvm.org/D106306.

Fixes PR50788.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D106405
2021-07-21 08:47:45 -07:00
Sami Tolvanen e901e581ef Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This reverts commit 700d07f8ce.

Reverting due to a ThinLTO+CFI breakage on -msvc targets.
2021-07-20 13:59:46 -07:00
Fangrui Song 3924877932 [IR] Rename `comdat noduplicates` to `comdat nodeduplicate`
In the textual format, `noduplicates` means no COMDAT/section group
deduplication is performed. Therefore, if both sets of sections are retained, and
they happen to define strong external symbols with the same names,
there will be a duplicate definition linker error.

In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`.
The name describes the corollary instead of the immediate semantics.  The name
can cause confusion to other binary formats (ELF, wasm) which have implemented/
want to implement the "no deduplication" selection kind. Rename it to be clearer.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D106319
2021-07-20 12:47:10 -07:00
Nikita Popov 0c794abff1 [ThinTLOBitcodeWriter] Fix unused variable warning (NFC) 2021-07-20 20:19:47 +02:00
Sami Tolvanen 700d07f8ce ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them. This version uses module inline assembly
to avoid issues with LowerTypeTestsModule.

Relands commmit 8e3b5cb39e with arch
specific tests fixed.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-07-20 10:30:02 -07:00
Giorgis Georgakoudis e8439ec893 [OpenMP] Set RequiresFullRuntime false in SPMDization
SPMDization in D102307 does not change the RequiresFullRuntime argument of kmpc_target_init/deinit calls. However, the constraints of SPMDization detection for converting a target region to SPMD mode should guarantee that the region does not require full runtime support. Hence, this patch sets RequiresFullRuntime to false for improved execution performance.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105556
2021-07-20 09:54:51 -07:00
Shimin Cui 0b043bb39b This patch extends the OptimizeGlobalAddressOfMalloc to handle the null check of global pointer variables. It is disabled with https://reviews.llvm.org/rGb7cd291c1542aee12c9e9fde6c411314a163a8ea. This PR is to reenable it while fixing the original problem reported. The fix is to set the store value correctly when creating store for the new created global init bool symbol.
Reviewed By: efriedma

Differential Revision:  https://reviews.llvm.org/D102711
2021-07-20 12:27:26 -04:00
Kazu Hirata 4a30a5c8d9 [SampleProfile] Remove ProfileIsValid (NFC)
The last use was removed on Jan 22, 2021 in commit
c9cd9a0066.
2021-07-20 08:07:04 -07:00
Johannes Doerfert d62bbbebbf [Attributor] Initialize effectively unused value to appease UBSAN 2021-07-20 09:18:51 -05:00
Johannes Doerfert c66cbee140 [Attributor] Use set vector instead of vector to prevent duplicates 2021-07-20 01:39:34 -05:00
Johannes Doerfert 5957cf9f11 [Attributor] Simplify to values in the genericValueTraversal
We already simplified to a constant, given the new interface we can also
simplify to a generic value.
2021-07-20 01:39:34 -05:00
Johannes Doerfert 5eba7846a5 [Attributor] Use checkForAllUses instead of custom use tracking
AAMemoryBehaviorFloating used a custom use tracking mechanism even
though checkForAllUses exists and is already more powerful. Further,
AAMemoryBehaviorFloating uses AANoCapture to guarantee that there are no
aliases and following the uses is sufficient. This is an OK assumption
if checkForAllUses is used but custom tracking is easily out of sync
with AANoCapture and problems follow.
2021-07-20 01:39:33 -05:00
Johannes Doerfert b96ea6b1fd [Attributor] Ensure to simplify operands in AAValueConstantRange
As with other patches before, the simplification callback interface
requires us to go through the Attributor::getAssumedSimplified API first
before we recurs.

It is unclear if the problem can be explicitly tested with our current
infrastructure.
2021-07-20 00:35:14 -05:00
Johannes Doerfert 5fbb51d8d5 [Attributor] Extend the AAValueSimplify compare simplification logic
We first simplify the operands of a compare and then reason on the
simplified versions, e.g., with AANonNull.

This does improve the simplification capabilities but also fixes a
potential problem that has not yet been observed by simplifying the
operands first.
2021-07-20 00:35:14 -05:00
Johannes Doerfert 9c00aabd60 [Attributor][NFCI] Expose `getAssumedUnderlyingObjects` API 2021-07-20 00:35:13 -05:00
Johannes Doerfert 5e169818fb [Attributor][NFC] Fix function name spelling 2021-07-20 00:35:13 -05:00
Johannes Doerfert 44a9ee170c [Attributor][FIX] Do not simplify byval arguments
A byval argument is a different value in the caller and callee, we
cannot propagate the information as part of AAValueSimplify. Users that
want to deal with byval arguments need to specifically perform the
argument -> call site step. We do not do this for now.
2021-07-19 22:48:51 -05:00
Johannes Doerfert c2281f1565 [Attributor] Introduce AAPointerInfo
This patch introduces AAPointerInfo which tracks the uses of a pointer
and places them in "bins" based on their offset from the base and access
size.

As with other AAs, any pointer can be tracked but it is up to the user
to make sense of the results. The user in this patch is AAValueSimplify
and AAPotentialValues which both utilize AAPointerInfo to determine the
value of a load. For now, this is restricted to loads of allocas and
internal globals. Through the use of AAPointerInfo and the "bins" we can
track struct members separately. The users also know that storing only
zeros (at unknown indices) will result in loading only 0 (from unknown
indices). Other than that, the users are flow and context insensitive
(for now).

To deal with the "bins" more easily, AAPointerInfo provides a
forallInterfearingAccesses that applies a callback on all accesses
that might interfere with a given load or store.

Differential Revision: https://reviews.llvm.org/D104432
2021-07-19 22:48:35 -05:00
Johannes Doerfert 28c78a9e12 [Attributor] Simplify loads
As a first step to simplify loads we only handle `null` and `undef`
underlying objects, as well as objects that have the load as a single user.
Loads of those values can be replaced by the initializer, if any.
Proper reasoning is introduced in a follow up patch

Differential Revision: https://reviews.llvm.org/D103862
2021-07-19 22:47:29 -05:00
Johannes Doerfert 97387fdf6d [OpenMP] Fix carefully track SPMDCompatibilityTracker
We did not properly use SPMDCompatibilityTracker in various places.
This patch makes sure we look at the validity properly and also fix
the state if we can.

Differential Revision: https://reviews.llvm.org/D106085
2021-07-19 22:47:03 -05:00
Shilei Tian d3454ee8d2 [AbstractAttributor] Fix two issues in folding __kmpc_is_spmd_exec_mode
This patch fixed two issues found when folding `__kmpc_is_spmd_exec_mode`:
1. When the reaching kernels are empty, it should not fold to generic mode.
2. When creating AA for the caller when updating information, the dependency
   should be required.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D106209
2021-07-17 13:13:44 -04:00
Wenlei He f9f3c34e0f [CSSPGO] Turn on iterative-BFI for CSSPGO
Iterative-BFI produces better count quality and performance when evaluated on internal benchmarks. Turning it on by default now for CSSPGO. We can consider turn it on by default for AutoFDO as well in the future.

Differential Revision: https://reviews.llvm.org/D106202
2021-07-16 17:35:49 -07:00
Sami Tolvanen 0ad1d9fdf2 Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This reverts commit 8e3b5cb39e.

Reverting to investigate test failures.
2021-07-16 14:47:33 -07:00
Sami Tolvanen 8e3b5cb39e ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them. This version uses module inline assembly
to avoid issues with LowerTypeTestsModule.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-07-16 14:33:34 -07:00
Joseph Huber b910a109f8 [OpenMP][NFC] Update the comment header for optimizations. 2021-07-16 14:13:13 -04:00
Joseph Huber 2c31d5ebfb [OpenMP] Add IDs to OpenMP remarks
This patch adds unique idenfitiers to the existing OpenMP remarks. This makes
it easier to identify the corresponding documentation for each remark that will
be hosted in the OpenMP webpage.

Depends on D105898

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105939
2021-07-16 14:07:03 -04:00
Joseph Huber eef6601b0f [OpenMP] Rework OpenMP remarks
This patch rewrites and reworks a few of the existing remarks to make the mmore
concise and consistent prior to writing the documentation for them.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105898
2021-07-16 14:07:00 -04:00
Shilei Tian c23da666b5 [Attributor] Add support for compound assignment for ChangeStatus
A common use of `ChangeStatus` is as follows:
```
ChangeStatus Changed = ChangeStatus::UNCHANGED;
Changed |= foo();
```
where `foo` returns `ChangeStatus` as well. Currently `ChangeStatus` doesn't
support compound assignment, we have to write as
```
Changed = Changed | foo();
```
which is not that convenient.

This patch add the support for compound assignment for `ChangeStatus`. Compound
assignment is usually implemented as a member function, and binary arithmetic
operator is therefore implemented using compound assignment. However, unlike
regular C++ class, enum class doesn't support member functions. As a result, they
can only be implemented in the way shown in the patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106109
2021-07-15 23:51:46 -04:00
Shilei Tian ca662297d5 [AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible
In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode`
to query the execution mode of current kernels. In many cases, user programs
only contain target region executing in one mode. As a consequence, those runtime
function calls will only return one value. If we can get rid of these function
calls during compliation, it can potentially improve performance.

In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for
each kernel (device) function `F`, we collect all kernel entries `K` that can
reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each
iteration, it will check all reaching kernel entries, and update the folded value
accordingly.

In the future we will support more function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105787
2021-07-15 18:23:23 -04:00
Shilei Tian a70ef3f568 Revert "[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible"
This reverts commit 1100e4aafe.
2021-07-15 11:19:28 -04:00
Kuter Dinel ade190c5ea [Attributor] AACallEdges, Add a way to ask nonasm unknown callees
This patch adds a feature to AACallEdges AbstractAttribute that allows
users to ask if there is a unknown callee that isn't a inline assembly.
This feature is needed by some of it's users.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105992
2021-07-15 06:10:42 +03:00
Shilei Tian 1100e4aafe [AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible
In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode`
to query the execution mode of current kernels. In many cases, user programs
only contain target region executing in one mode. As a consequence, those runtime
function calls will only return one value. If we can get rid of these function
calls during compliation, it can potentially improve performance.

In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for
each kernel (device) function `F`, we collect all kernel entries `K` that can
reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each
iteration, it will check all reaching kernel entries, and update the folded value
accordingly.

In the future we will support more function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105787
2021-07-13 22:28:35 -04:00
Arthur Eubanks ab5693aa4a [OpaquePtr] Use byval type more 2021-07-13 09:34:34 -07:00
Arthur Eubanks 693bc04bf6 [OpaquePtr] Use GlobalValue::getValueType() more 2021-07-13 09:34:34 -07:00
Jeroen Dobbelaere 1d8030053d [NFC] Do not track calls to inlined intrinsics in IFI.
Just like intrinsics are not tracked for IFI.InlinedCalls, they should not be tracked for IFI.InlinedCallSites.

In the current top-of-tree this change is a NFC, but the full restrict patches (D68484) potentially trigger an read-after-free
if intrinsics are also added to the InlindeCallSites, due to a late optimization potentially removing some of the inlined intrinsics.

Also see https://lists.llvm.org/pipermail/llvm-dev/2021-July/151722.html for a discussion about the problem.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D105805
2021-07-13 10:18:23 +02:00
Nikita Popov 7ed3e87825 [Attributes] Determine attribute properties from TableGen data
Continuing from D105763, this allows placing certain properties
about attributes in the TableGen definition. In particular, we
store whether an attribute applies to fn/param/ret (or a combination
thereof). This information is used by the Verifier, as well as the
ForceFunctionAttrs pass. I also plan to use this in LLParser,
which also duplicates info on which attributes are valid where.

This keeps metadata about attributes in one place, and makes it
more likely that it stays in sync, rather than in various
functions spread across the codebase.

Differential Revision: https://reviews.llvm.org/D105780
2021-07-12 22:13:38 +02:00
Liqiang Tao 6ebeb7f8ba [llvm][Inliner] Templatize PriorityInlineOrder
The patch templatize PriorityInlinerOrder so that it can accept any type priority metric.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D104972
2021-07-12 10:38:22 +08:00
Johannes Doerfert 792aac9897 [Attributor][NFCI] Add UsedAssumedInformation to more interfaces
As with other Attributor interfaces we often want to know if assumed
information was used to answer a query. This is important if only
known information is allowed or if known information can lead to an
early fixpoint. The users have been adjusted but none of them utilizes
the new information yet.
2021-07-11 19:18:03 -05:00
Johannes Doerfert 2e7e2994a9 [Attributor][FIX] Destroy bump allocator objects to avoid leaks
AllocationInfo and DeallocationInfo objects themselves are allocated
with the Attributor bump allocator and do not need to be deallocated.
That said, the sets in AllocationInfo and DeallocationInfo need to be
destroyed to avoid memory leaks.
2021-07-10 18:53:37 -05:00
Johannes Doerfert 514c033db1 [OpenMP] Detect SPMD compatible kernels and execute them as such
In the spirit of TRegions [0], this patch analyzes a kernel and tracks
if it can be executed in SPMD-mode. If so, we flip the arguments of
the __kmpc_target_init and deinit call to enable the mode. We also
update the `<kernel>_exec_mode` flag to indicate to the runtime we
changed the mode to SPMD.

The code analysis is done interprocedurally by extending the
AAKernelInfo abstract attribute to track SPMD compatibility as well.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D102307
2021-07-10 18:44:25 -05:00
Johannes Doerfert 8cb7d71355 [OpenMP][FIX] Add missing `)` to remark 2021-07-10 18:40:32 -05:00
Johannes Doerfert d9659bf6a0 [OpenMP] Create custom state machines for generic target regions
In the spirit of TRegions [0], this patch creates a custom state
machine for a generic target region based on the potentially called
parallel regions.

The code analysis is done interprocedurally via an abstract attribute
(AAKernelInfo). All outermost parallel regions are collected and we
check if there might be unknown outermost parallel regions for which
we need an indirect call. Other AAKernelInfo extensions are expected.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D101977
2021-07-10 17:57:08 -05:00
Johannes Doerfert e2cfbfcc0c [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 17:53:56 -05:00
Johannes Doerfert 4761d29633 [Attributor][FIX] Sanitize queries to LVI and ScalarEvolution
When we talk to outside analyse, e.g., LVI and ScalarEvolution, we need
to be careful with the query. The particular error occurred because we
folded a PHI node before the LVI query but the context location was now
not dominated by the value anymore. This is not supported by LVI so we
have to filter these situations before we query the outside analyses.
2021-07-10 16:45:19 -05:00
Johannes Doerfert c1d53a316d [Attributor] Look through selects in genericValueTraversal
If we can simplify the select condition we can avoid one value in the
traversal.

Differential Revision: https://reviews.llvm.org/D103861
2021-07-10 16:44:05 -05:00
Johannes Doerfert c1c1fe9385 [Attributor] Reorganize AAHeapToStack
In order to simplify future extensions, e.g., the merge of
AAHeapToShared in to AAHeapToStack, we reorganize AAHeapToStack and the
state we keep for each malloc-like call. The result is also less
confusing as we only track malloc-like calls, not all calls. Further, we
only perform the updates necessary for a malloc-like to argue it can go
to the stack, e.g., we won't check all uses if we moved on to the
"must-be-freed" argument.

This patch also uses Attributor helps to simplify the allocated size,
alignment, and the potentially freed objects.

Overall, this is mostly a reorganization and only the use of the
optimistic helpers should change (=improve) the capabilities a bit.

Differential Revision: https://reviews.llvm.org/D104993
2021-07-10 16:32:24 -05:00
Johannes Doerfert dbb3a65f5b [Attributor][FIX] Do not replace a value with a non-dominating instruction
We have to be careful when we replace values to not use a non-dominating
instruction. It makes sense that simplification offers those as
"simplified values" but we can't manifest them in the IR without PHI
nodes. In the future we should consider potentially adding those PHI
nodes.
2021-07-10 16:09:30 -05:00
Johannes Doerfert 5ef18e2421 [Attributor] Use AAValueSimplify to simplify returned values
We should use AAValueSimplify for all value simplification, however
there was some leftover logic that predates AAValueSimplify in
AAReturnedValues. This remove the AAReturnedValues part and provides a
replacement by making AAValueSimplifyReturned strong enough to handle
all previously covered cases. Further, this improve
AAValueSimplifyCallSiteReturned to handle returned arguments.

AAReturnedValues is now much easier and the collected returned
values/instructions are now from the associated function only, making it
much more sane. We also do not have the brittle logic anymore that looks
for unresolved calls. Instead, we use AAValueSimplify to handle
recursion.

Useful code has been split into helper functions, e.g., an Attributor
interface to get a simplified value.

Differential Revision: https://reviews.llvm.org/D103860
2021-07-10 15:52:36 -05:00
Johannes Doerfert 0aab13aaf9 [Attributor] Introduce an optimistic getUnderlyingObjects helper
As the `llvm::getUnderlyingObjects` helper, the optimistic version
collects objects that might be the base of a given pointer. In contrast
to the llvm variant, the optimistic one will use assumed information,
e.g., about select conditions or dead blocks, to provide a more precise
result.

Differential Revision: https://reviews.llvm.org/D103859
2021-07-10 15:47:30 -05:00
Johannes Doerfert 5b12cf3e65 [Attributor][FIX] Traverse uses even if a value is assumed constant
Not all attributes are able to handle the interprocedural step and
follow the uses into a call site. Let them be able to combine call site
uses instead. This might result in some unused values/arguments being
leftover but it removes problems where we misused "is dead" even though
it was actually "is simplified/replaced".

We explicitly check for dead values due to constant propagation in
`AAIsDeadValueImpl::areAllUsesAssumedDead` instead.

Differential Revision: https://reviews.llvm.org/D103858
2021-07-10 15:47:20 -05:00
Nico Weber d3e7491333 Revert Attributor patch series
Broke check-clang, see https://reviews.llvm.org/D102307#2869065
Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`
2021-07-10 16:15:55 -04:00
Johannes Doerfert 269416d419 [Attributor][NFCI] Add UsedAssumedInformation to more interfaces
As with other Attributor interfaces we often want to know if assumed
information was used to answer a query. This is important if only
known information is allowed or if known information can lead to an
early fixpoint. The users have been adjusted but none of them utilizes
the new information yet.
2021-07-10 12:32:51 -05:00
Johannes Doerfert d39179d7fa [OpenMP] Detect SPMD compatible kernels and execute them as such
In the spirit of TRegions [0], this patch analyzes a kernel and tracks
if it can be executed in SPMD-mode. If so, we flip the arguments of
the __kmpc_target_init and deinit call to enable the mode. We also
update the `<kernel>_exec_mode` flag to indicate to the runtime we
changed the mode to SPMD.

The code analysis is done interprocedurally by extending the
AAKernelInfo abstract attribute to track SPMD compatibility as well.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D102307
2021-07-10 12:32:51 -05:00
Johannes Doerfert 966342790e [Attributor][FIX] Sanitize queries to LVI and ScalarEvolution
When we talk to outside analyse, e.g., LVI and ScalarEvolution, we need
to be careful with the query. The particular error occurred because we
folded a PHI node before the LVI query but the context location was now
not dominated by the value anymore. This is not supported by LVI so we
have to filter these situations before we query the outside analyses.
2021-07-10 12:32:51 -05:00
Johannes Doerfert ae08df87df [Attributor][FIX] Do not replace a value with a non-dominating instruction
We have to be careful when we replace values to not use a non-dominating
instruction. It makes sense that simplification offers those as
"simplified values" but we can't manifest them in the IR without PHI
nodes. In the future we should consider potentially adding those PHI
nodes.
2021-07-10 12:32:50 -05:00
Johannes Doerfert f0628c6ff7 [OpenMP] Create custom state machines for generic target regions
In the spirit of TRegions [0], this patch creates a custom state
machine for a generic target region based on the potentially called
parallel regions.

The code analysis is done interprocedurally via an abstract attribute
(AAKernelInfo). All outermost parallel regions are collected and we
check if there might be unknown outermost parallel regions for which
we need an indirect call. Other AAKernelInfo extensions are expected.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D101977
2021-07-10 12:32:50 -05:00
Johannes Doerfert 1d5711c3ee [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 12:32:50 -05:00
Johannes Doerfert 5003ba2542 [Attributor] Look through selects in genericValueTraversal
If we can simplify the select condition we can avoid one value in the
traversal.

Differential Revision: https://reviews.llvm.org/D103861
2021-07-10 12:32:50 -05:00
Johannes Doerfert 1eb31d6de3 [Attributor] Reorganize AAHeapToStack
In order to simplify future extensions, e.g., the merge of
AAHeapToShared in to AAHeapToStack, we reorganize AAHeapToStack and the
state we keep for each malloc-like call. The result is also less
confusing as we only track malloc-like calls, not all calls. Further, we
only perform the updates necessary for a malloc-like to argue it can go
to the stack, e.g., we won't check all uses if we moved on to the
"must-be-freed" argument.

This patch also uses Attributor helps to simplify the allocated size,
alignment, and the potentially freed objects.

Overall, this is mostly a reorganization and only the use of the
optimistic helpers should change (=improve) the capabilities a bit.

Differential Revision: https://reviews.llvm.org/D104993
2021-07-10 12:32:50 -05:00
Johannes Doerfert 374e573cfc [Attributor] Use AAValueSimplify to simplify returned values
We should use AAValueSimplify for all value simplification, however
there was some leftover logic that predates AAValueSimplify in
AAReturnedValues. This remove the AAReturnedValues part and provides a
replacement by making AAValueSimplifyReturned strong enough to handle
all previously covered cases. Further, this improve
AAValueSimplifyCallSiteReturned to handle returned arguments.

AAReturnedValues is now much easier and the collected returned
values/instructions are now from the associated function only, making it
much more sane. We also do not have the brittle logic anymore that looks
for unresolved calls. Instead, we use AAValueSimplify to handle
recursion.

Useful code has been split into helper functions, e.g., an Attributor
interface to get a simplified value.

Differential Revision: https://reviews.llvm.org/D103860
2021-07-10 12:32:50 -05:00
Johannes Doerfert 93a279a67d [Attributor] Introduce an optimistic getUnderlyingObjects helper
As the `llvm::getUnderlyingObjects` helper, the optimistic version
collects objects that might be the base of a given pointer. In contrast
to the llvm variant, the optimistic one will use assumed information,
e.g., about select conditions or dead blocks, to provide a more precise
result.

Differential Revision: https://reviews.llvm.org/D103859
2021-07-10 12:32:49 -05:00
Johannes Doerfert be5d46e9bb [Attributor][FIX] Traverse uses even if a value is assumed constant
Not all attributes are able to handle the interprocedural step and
follow the uses into a call site. Let them be able to combine call site
uses instead. This might result in some unused values/arguments being
leftover but it removes problems where we misused "is dead" even though
it was actually "is simplified/replaced".

We explicitly check for dead values due to constant propagation in
`AAIsDeadValueImpl::areAllUsesAssumedDead` instead.

Differential Revision: https://reviews.llvm.org/D103858
2021-07-10 12:32:49 -05:00
Kazu Hirata 49d66d9f9f [AFDO] Merge function attributes after inlining
This patch teaches the sample profile loader to merge function
attributes after inlining functions.

Without this patch, the compiler could inline a function requiring the
512-bit vector width into its caller without merging function
attributes, triggering a failure during instruction selection.

Differential Revision: https://reviews.llvm.org/D105729
2021-07-09 16:47:12 -07:00
Varun Gandhi 92dcb1d2db [Clang] Introduce Swift async calling convention.
This change is intended as initial setup. The plan is to add
more semantic checks later. I plan to update the documentation
as more semantic checks are added (instead of documenting the
details up front). Most of the code closely mirrors that for
the Swift calling convention. Three places are marked as
[FIXME: swiftasynccc]; those will be addressed once the
corresponding convention is introduced in LLVM.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D95561
2021-07-09 11:50:10 -07:00
Arthur Eubanks 4e6013250d [NFC][OpaquePtr] Use GlobalValue::getValueType() more
Instead of getType()->getElementType().
2021-07-09 09:55:41 -07:00
Bjorn Pettersson 472462c472 [NewPM] Consistently use 'simplifycfg' rather than 'simplify-cfg'
There was an alias between 'simplifycfg' and 'simplify-cfg' in the
PassRegistry. That was the original reason for this patch, which
effectively removes the alias.

This patch also replaces all occurrances of 'simplify-cfg'
by 'simplifycfg'. Reason for choosing that form for the name is
that it matches the DEBUG_TYPE for the pass, and the legacy PM name
and also how it is spelled out in other passes such as
'loop-simplifycfg', and in other options such as
'simplifycfg-merge-cond-stores'.

I for some reason the name should be changed to 'simplify-cfg' in
the future, then I think such a renaming should be more widely done
and not only impacting the PassRegistry.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D105627
2021-07-09 09:47:03 +02:00
Michael Liao 4e5d9c8803 [Internalize] Preserve variables externally initialized.
- ``externally_initialized`` variables would be initialized or modified
  elsewhere. Particularly, CUDA or HIP may have host code to initialize
  or modify ``externally_initialized`` device variables, which may not
  be explicitly referenced on the device side but may still be used
  through the host side interfaces. Not preserving them triggers the
  elimination of them in the GlobalDCE and breaks the user code.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D105135
2021-07-08 10:48:19 -04:00
Johannes Doerfert 168a9234d7 [Attributor][FIX] Replace uses first, then values
Before we replaced value by registering all their uses. However, as we
replace a value old uses become stale. We now replace values explicitly
and keep track of "new values" when doing so to avoid replacing only
uses in stale/old values but not their replacements.
2021-07-06 22:43:51 -05:00
Johannes Doerfert aa3768278d [Attributor] Introduce a helper function to deal with undef + none
We often need to deal with the value lattice that contains none and
undef as special values. A simple helper makes this much nicer.

Differential Revision: https://reviews.llvm.org/D103857
2021-07-06 22:41:21 -05:00
Johannes Doerfert fc82409b5c [Attributor] Simplify operands inside of simplification AAs first
When we do simplification via AAPotentialValues or AAValueConstantRange
we need to simplify the operands of an instruction we deconstruct first.
This does not only improve the result, see for example range.ll, but is
required as we allow outside AAs to provide simplification rules via
callbacks. If we do ignore the simplification rules and base other
simplifications on the IR instead we can create an inconsistent state.
2021-07-06 22:41:18 -05:00
Roman Lebedev 333d3a3cdf
[NFC][PassBuilder] addVectorPasses(): clarify that 'IsLTO' is actually 'IsFullLTO'
I.e. it will be `false` for thin lto.
2021-07-01 10:09:24 +03:00
Joseph Huber ecabc6684f [OpenMP] Change analysis remarks to not emit on cold functions
The remarks will trigger on some functions that are marked cold, such as the
`__muldc3` intrinsic functions. Change the remarks to avoid these functions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105196
2021-06-30 11:54:24 -04:00
Joseph Huber 0edb87773b [OpenMP] Add additional remarks for OpenMPOpt
This patch adds additional remarks, suggesting the use of `noescape` for failed
globalization and indicating when internalization failed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105150
2021-06-30 09:49:25 -04:00
Chuanqi Xu 801c2b9bba [FuncSpec] Add an option to specializing literal constant
Now the option is off by default. Since we are not sure if this option
would make the compile time increase aggressively. Although we tested it
on SPEC2017, we may need to test more to make it on by default.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D104365
2021-06-30 11:26:44 +08:00
Johannes Doerfert 7af91a2b8f [Attributor][NFCI] Make the state of AAValueSimplify explicit
As we have done with other states we want the AAValueSimplify state to
be explicit to use it more easily in our helpers.
2021-06-29 09:38:22 -05:00
Johannes Doerfert dcbe58d94c [Attributor][NFCI] Remove unneeded namespace 2021-06-29 09:38:20 -05:00
Johannes Doerfert 457bd5c8d5 [Attributor] Teach AAPotentialValues about constant select conditions
There was a TODO but now we actually check if the select condition is
assumed constant and only look at the relevant operand.
2021-06-29 09:38:18 -05:00
Johannes Doerfert 8dc9bb6d85 [Attributor][NFC] Clang format 2021-06-29 09:38:15 -05:00
Joseph Huber 57ad2e1067 [OpenMP] Prevent OpenMPOpt from internalizing uncalled functions
Currently OpenMPOpt will only check if a function is a kernel before deciding not to internalize it. Any uncalled function that gets internalized will be trivially dead in the module so this is unnnecessary.

Depends on D102423

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104890
2021-06-28 16:47:53 -04:00
Joseph Huber 13b2fba239 [OpenMP][NFC] Fix typo in OpenMPOpt 2021-06-28 09:49:14 -04:00
Joseph Huber 4024087731 [OpenMP][NFC] Fix missing argument 2021-06-28 09:15:01 -04:00
Joseph Huber 4a6bd8e3e7 [OpenMP] Increase attributor iterations on the GPU
Increase the number of attributor iterations on a GPU target. I forgot to
change this in D104416.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104920
2021-06-28 08:50:49 -04:00
Joseph Huber 5ccb7424fa [OpenMP] Change OpenMPOpt to check openmp metadata
The metadata added in D102361 introduces a module flag that we can check
to determine if the module was compiled with `-fopenmp` enables. We can
now check for the precense of this instead of scanning the call graph
for OpenMP runtime functions.

Depends on D102361

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102423
2021-06-25 16:34:22 -04:00
Arthur Eubanks 7110510eca [WPD] Don't optimize calls more than once
WPD currently assumes that there is a one to one correspondence between
type test assume sequences and virtual calls. However, with
-fstrict-vtable-pointers this may not be true. This ends up causing
crashes when we try to optimize a virtual call more than once (
applyUniformRetValOpt()/applyUniqueRetValOpt()/applyVirtualConstProp()/applySingleImplDevirt()).

applySingleImplDevirt() actually didn't previous crash because it would
replace the devirtualized call with the same direct call. Adding an
assert that the call is indirect causes the corresponding test to crash
with the rest of the patch.

This makes Chrome successfully build with -fstrict-vtable-pointers + WPD.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D104798
2021-06-24 13:28:09 -07:00
Zequan Wu 9393894331 Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This casues compiler crash: Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"'

This reverts commit e3d24b45b8.
2021-06-23 19:24:56 -07:00
Sami Tolvanen e3d24b45b8 ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

This relands commit 4474958d3a
with a fix to a use-of-uninitialized-value error that tripped
MemorySanitizer.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-06-23 10:56:13 -07:00
Kuter Dinel 5d44d56f7d [Attributor] Derive AAFunctionReachability attribute.
This attribute uses Attributor's internal 'optimistic' call graph
information to answer queries about function call reachability.

Functions can become reachable over time as new call edges are
discovered.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104599
2021-06-23 20:43:10 +03:00
Liqiang Tao a0d96fdd3a [llvm][Inliner] Make PriorityInlineOrder lazily updated
This patch makes PriorityInlineOrder lazily updated.
The PriorityInlineOrder would lazily update the desirability of a call site if it's decreasing.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D104654
2021-06-23 08:59:53 +08:00
Joseph Huber 1cfdcae653 [Attributor] Fix AAExecutionDomain returning true on invalid states
This patch fixes a problem with the AAExecutionDomain attributor not
checking if it is in a valid state. This can cause it to incorrectly
return that a block is executed in a single threaded context after the
attributor failed for any reason.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103186
2021-06-22 18:12:43 -04:00
Joseph Huber 44feacc736 [OpenMP] Change remaining globalization from an analysis remark to missed
After landing the globalization optimizations, the precense of globalization on
the device that was not put in shared or stack memory is a failed optimization
with performance consequences so it should indicate a missed remark.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104735
2021-06-22 16:52:06 -04:00
Sami Tolvanen 33c9438f11 Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This reverts commit 4474958d3a.

Breaks check-llvm on Mac.
2021-06-22 12:10:58 -07:00
Joseph Huber ca1560da72 [OpenMP][NFC] Add new optimizations to OpenMPOpt comment header
Summary:
Adds mentions to the new globalization optimizations added to the OpenMPOpt comment header.
2021-06-22 14:40:31 -04:00
Joseph Huber b54ccab509 [Attributor] Add an option to increase the max number of iterations
Right now the Attributor defaults to 32 fixed point iterations unless it is set
explicitly by a command line flag. This patch allows this to be configured when
the attributor instance is created. The maximum is then increased in OpenMPOpt
if the target is a kernel. This is because the globalization analysis can result
in larger iteration counts due to many dependent instances running at once.

Depends on D102444

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104416
2021-06-22 14:38:25 -04:00
Joseph Huber 30e36c9b3c [Attributor] Add interface to emit remarks in Attributor
Summary:
This patch adds support for the Attributor to emit remarks on behalf of some
other pass. The attributor can now optionally take a callback function that
returns an OptimizationRemarkEmitter object when given a Function pointer. If
this is availible then a remark will be emitted for the corresponding pass
name.

Depends on D102197

Reviewed By: sstefan1 thegameg

Differential Revision: https://reviews.llvm.org/D102444
2021-06-22 14:12:46 -04:00
Joseph Huber 7d69da71dd [OpenMP] Enable HeapToStack conversion in OpenMPOpt for new RTL globalization calls
Summary:
The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087.

Depends on D97818

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102197
2021-06-22 13:23:05 -04:00
Sami Tolvanen 4474958d3a ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-06-22 10:01:55 -07:00
Joseph Huber 03d7e61c87 [OpenMP] Internalize functions in OpenMPOpt to improve IPO passes
Summary:
Currently the attributor needs to give up if a function has external linkage.
This means that the optimization introduced in D97818 will only apply to static
functions. This change uses the Attributor to internalize OpenMP device
routines by making a copy of each function with private linkage and replacing
the uses in the module with it. This allows for the optimization to be applied
to any regular function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102824
2021-06-22 12:38:10 -04:00
Joseph Huber 6fc51c9f7d [OpenMP] Replace GPU globalization calls with shared memory in the middle-end
Summary:
The changes introduced in D97680 create a simpler interface to code that needs
to be globalized. This interface is used to simplify the globalization calls in
the middle end. We can check any globalization call that is only called by a
single thread in the team and replace it with a static shared memory buffer.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97818
2021-06-22 11:55:44 -04:00
Joseph Huber 68d133a3e8 [OpenMP] Simplify GPU memory globalization
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.

This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97680
2021-06-22 10:52:46 -04:00
Sjoerd Meijer 342bbb7832 [FuncSpec] Don't specialise functions with NoDuplicate instructions.
getSpecializationCost was returning INT_MAX for a case when specialisation
shouldn't happen, but this wasn't properly checked if specialisation was
forced.

Differential Revision: https://reviews.llvm.org/D104461
2021-06-21 09:02:11 +01:00
Liqiang Tao 671a87104b [llvm][Inliner] Add an optional PriorityInlineOrder
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D104028
2021-06-19 10:17:32 +08:00
Hongtao Yu bd52495518 [CSSPGO] Undoing the concept of dangling pseudo probe
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.

I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104477
2021-06-18 15:14:11 -07:00
Liqiang Tao 93183a41b9 Revert D104028 "[llvm][Inliner] Add an optional PriorityInlineOrder" 2021-06-18 18:52:00 +08:00
Haojian Wu 3f5d53a525 [Attributor] Fix UB behavior on uninitalized bool variables.
Found by ASAN.
2021-06-18 11:49:42 +02:00
Liqiang Tao a740b707d1 [llvm][Inliner] Add an optional PriorityInlineOrder
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D104028
2021-06-18 16:55:38 +08:00
Haojian Wu 7670938bba [Attributor] Don't print the call-graph in a hard-coded file.
This looks like not a practical pattern in our codebase (it could fail
in some sandbox environement).

Instead we print it via standard output, and it is controled by the
-attributor-print-call-graph, this follows a similiar pattern of attributor-print-dep.
2021-06-18 09:38:07 +02:00
Johannes Doerfert 30c9d68ad9 [Attributor][FIX] Arguments of unknown functions can be undef
This should fix PR50683. The wrong assumption was that we
could always know what the callee is when we replace a call site
argument with undef. We wanted to know that to remove the `noundef`
that might be attached to the argument. Since no callee means we
did the propagation on the caller site, there is no need to remove
an attribute. It is only needed if we replace all uses and therefore
pass `undef` instead of the value that was passed in otherwise.
2021-06-18 01:07:53 -05:00
Johannes Doerfert 666dc6f126 [Attributor] Use a centralized value simplification interface
To allow outside AAs that simplify values we need to ensure all value
simplification goes through the Attributor, not AAValueSimplify (or any
of the other AAs we have already like AAPotentialValues). This patch
also introduces an interface for the outside AAs to register
simplification callbacks for an IRPosition. To make this work as
expected we have to pass IRPositions instead of Values in
AAValueSimplify, which makes sense by itself.
2021-06-18 01:07:53 -05:00
Johannes Doerfert d9194b6efb [Attributor] Introduce a helper do deal with constant type mismatches
If we simplify values we sometimes end up with type mismatches. If the
value is a constant we can often cast it though to still allow
propagation. The logic is now put into a helper and it replaces some
ad hoc things we did before.

This also introduces the AA namespace for abstract attribute related
functions and types.

Differential Revision: https://reviews.llvm.org/D103856
2021-06-18 01:07:52 -05:00
Johannes Doerfert 9959eee001 [Attributor] Make sure Heap2Stack works properly on a GPU target
If the target stack is not accessible between different running
"threads" we have to make sure not to create allocas for mallocs
that might be used by multiple "threads". The "use check" is
sufficient to prevent this but if we apply the "free check" we have
to make sure the pointer is not communicated to others before
the free is reached.

Differential Revision: https://reviews.llvm.org/D98608
2021-06-18 01:07:52 -05:00
Johannes Doerfert 9a23e673ca [OpenMP][NFC] Expose AAExecutionDomain and rename its getter
The initial use for AAExecutionDomain was to determine if a single
thread executes a block. While this is sometimes informative most
of the time, and for other reasons, we actually want to know if it
is the "initial thread". Thus, the thread that started execution on
the current device. The deduction needs to be adjusted in a follow
up as the methods we use right not are looking for the OpenMP thread
id which is resets whenever a thread enters a parallel region. What
we basically want is to look for `llvm.nvvm.read.ptx.sreg.ntid.x` and
equivalent functions.
2021-06-18 01:07:52 -05:00
Johannes Doerfert 8d7bace3b5 [Attributor][NFC] AAReachability is currently stateless, don't invalidate it
We invalidated AAReachabilityImpl directly which is not helpful and
confusing as we still used it regardless. We now avoid invalidating it
(not needed anyway) and add checks for the state. This has by itself no
actual effect but prepares for later extensions.
2021-06-18 01:07:51 -05:00
Kuter Dinel eaf1b6810c [Attributor] Derive AACallEdges attribute
This attribute computes the optimistic live call edges using the attributor
liveness information. This attribute will be used for deriving a
inter-procedural function reachability attribute.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104059
2021-06-18 03:29:22 +03:00
Sjoerd Meijer dcd23d875a [FuncSpec] Don't specialise functions with attribute NoDuplicate.
Differential Revision: https://reviews.llvm.org/D104378
2021-06-17 10:32:29 +01:00
Sjoerd Meijer 49ab3b1735 [FuncSpec] Statistics
Adds some bookkeeping for collecting the number of specialised functions and a
test for that.

Differential Revision: https://reviews.llvm.org/D104102
2021-06-16 09:11:51 +01:00
Chuanqi Xu 86906304d8 [FuncSpec] Use std::pow instead of operator^
The original implementation calculating UserBonus uses operator ^, which means XOR in C++
language.
At the first glance of reviewing, I thought it should be power, my bad.
It doesn't make sense to use XOR here. So I believe it should be a
carelessness as I made.

Test Plan: check-all

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D104282
2021-06-16 10:13:21 +08:00
Andrew Litteken 2c21278e74 [IROutliner] Adding DebugInfo handling for IR Outlined Functions
This adds support for functions outlined by the IR Outliner to be
recognized by the debugger.  The expected behavior is that it will
skip over the instructions included in that section.  This is due to the
fact that we can not say which of the original locations the
instructions originated from.

These functions will show up in the call stack, but you cannot step
through them.

Reviewers: paquette, vsk, djtodoro

Differential Revision: https://reviews.llvm.org/D87302
2021-06-15 10:57:08 -05:00
Aditya Kumar dcbbc69cc5 Calculate getTerminator only when necessary
Differential Revision: https://reviews.llvm.org/D104202
2021-06-13 20:16:07 -07:00
Simon Pilgrim 2477b498f2 ArgumentPromotion.cpp - remove unused <string> include. NFCI. 2021-06-13 13:03:47 +01:00
Sjoerd Meijer 9907746f5d Move Function Specialization to its correct location. NFC.
As a follow up of rGc4a0969b9c14, and as part of D104102, move it to
the IPO transformations directory.
2021-06-11 15:00:10 +01:00
Simon Pilgrim 61cdaf66fe [ADT] Remove APInt/APSInt toString() std::string variants
<string> is currently the highest impact header in a clang+llvm build:

https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html

One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.

This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.

Differential Revision: https://reviews.llvm.org/D103888
2021-06-11 13:19:15 +01:00
Sjoerd Meijer c4a0969b9c Function Specialization Pass
This adds a function specialization pass to LLVM. Constant parameters
like function pointers and constant globals are propagated to the callee by
specializing the function.

This is a first version with a number of limitations:
- The pass is off by default, so needs to be enabled on the command line,
- It does not handle specialization of recursive functions,
- It does not yet handle constants and constant ranges,
- Only 1 argument per function is specialised,
- The cost-model could be further looked into, and perhaps related,
- We are not yet caching analysis results.

This is based on earlier work by Matthew Simpson (D36432) and Vinay Madhusudan.
More recently this was also discussed on the list, see:

https://lists.llvm.org/pipermail/llvm-dev/2021-March/149380.html.

The motivation for this work is that function specialisation often comes up as
a reason for performance differences of generated code between LLVM and GCC,
which has this enabled by default from optimisation level -O3 and up. And while
this certainly helps a few cpu benchmark cases, this also triggers in real
world codes and is thus a generally useful transformation to have in LLVM.

Function specialisation has great potential to increase compile-times and
code-size.  The summary from some investigations with this patch is:
- Compile-time increases for short compile jobs is high relatively, but the
  increase in absolute numbers still low.
- For longer compile-jobs, the extra compile time is around 1%, and very much
  in line with GCC.
- It is difficult to blame one thing for compile-time increases: it looks like
  everywhere a little bit more time is spent processing more functions and
  instructions.
- But the function specialisation pass itself is not very expensive; it doesn't
  show up very high in the profile of the optimisation passes.

The goal of this work is to reach parity with GCC which means that eventually
we would like to get this enabled by default. But first we would like to address
some of the limitations before that.

Differential Revision: https://reviews.llvm.org/D93838
2021-06-11 09:11:29 +01:00
Joseph Huber 4c9471581f [Attributor] Set floating point loads and stores as nofree in AANoFreeFloating
Summary:
The current implementation of AANoFreeFloating will incorrectly list floating
point loads and stores as may-free. This prevents other attributor instances
like HeapToStack from pushing some allocations to the stack.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103975
2021-06-09 16:16:37 -04:00
Liqiang Tao 4a0de622c3 [llvm] Add interface to order inlining
This patch abstract Calls in Inliner:run() to InlineOrder.
With this patch, it's possible to customize the inlining order,
e.g. use queue or priority queue.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D103315
2021-06-07 18:27:55 +08:00
Liqiang Tao 48252d7570 Revert "[llvm] Add interface to order inlining" 2021-06-06 14:45:03 +08:00
Liqiang Tao 478dc47292 [llvm] Add interface to order inlining
This patch abstract Calls in Inliner:run() to InlineOrder.
With this patch, it's possible to customize the inlining order, i.e. use queue or priority queue.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D103315
2021-06-06 12:03:02 +08:00
Rong Xu 8d581857d7 [SampleFDO] New hierarchical discriminator for FS SampleFDO (llvm-profdata part)
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is for llvm-profdata part of change. It sets the bit masks for the
profile reader in llvm-profdata. Also add an internal option
"-fs-discriminator-pass" for show and merge command to process the profile
offline.

This patch also moved setDiscriminatorMaskedBitFrom() to
SampleProfileReader::create() to simplify the interface.

Differential Revision: https://reviews.llvm.org/D103550
2021-06-04 11:22:06 -07:00
Joseph Huber 4a08163c73 [Attributor] Check HeapToStack's state for isKnownHeapToStack
This patch changes the `isKnownHeapToStack` and `isAssumedHeapToStack`
member functions to return if a function call is going to be altered by
HeapToStack.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103574
2021-06-04 12:38:33 -04:00
Rong Xu 6745ffe4fa [SampleFDO] New hierarchical discriminator for FS SampleFDO (ProfileData part)
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is mainly for ProfileData part of change. It will load
FS Profile when such profile is detected. For an extbinary format profile,
create_llvm_prof tool will add a flag to profile summary section.
For other format profiles, the users need to use an internal option
(-profile-isfs) to tell the compiler that the profile uses FS discriminators.

This patch also simplified the bit API used by FS discriminators.

Differential Revision: https://reviews.llvm.org/D103041
2021-06-02 10:32:52 -07:00
Fangrui Song 38dbdde792 [Internalize] Simplify comdat renaming with noduplicates after D103043
I realized that we can use `comdat noduplicates` which is available on ELF.
Add a special case for wasm which doesn't support the feature.
2021-05-28 16:58:38 -07:00
Sanjay Patel 403cfe5d70 [PassManager] unify late simplifycfg options between regular and LTO pipelines
This is split off from D102002, and I think it is clear that
the difference in behavior was not intended. Options were
added to SimplifyCFG over time, but different chunks of
the pass pipelines were not kept in sync.
2021-05-28 13:06:49 -04:00
Jinsong Ji b2581196eb [AIX] Enable stackprotect feature
AIX use `__ssp_canary_word` instead of `__stack_chk_guard`.
This patch update the target hook to use correct symbol,
so that the basic stackprotect feature can work.

The traceback will be handled in follow up patch.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D103100
2021-05-28 02:18:15 +00:00
Teresa Johnson d35fe04fa3 [LTT] Handle merged llvm.assume when dropping type tests
When the lower type test pass is invoked a second time with
DropTypeTests set to true, it expects that all remaining type tests feed
assume instructions, which are removed along with the type tests.

In some cases the llvm.assume might have been merged with another one,
i.e. from a builtin_assume instruction, in which case the type test
would actually feed a phi that in turn feeds the merged assume
instruction. In this case we can simply replace that operand of the phi
with "true" before removing the type test.

Differential Revision: https://reviews.llvm.org/D103073
2021-05-25 17:02:13 -07:00
Fangrui Song b426b45d10 [Internalize] Rename instead of removal if a to-be-internalized comdat has more than one member
Beside the `comdat any` deduplication feature, instrumentations use comdat to
establish dependencies among a group of sections, to prevent section based
linker garbage collection from discarding some members without discarding all.
LangRef acknowledges this usage with the following wording:

> All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key.

On ELF, for PGO instrumentation, a `__llvm_prf_cnts` section and its associated
`__llvm_prf_data` section are placed in the same GRP_COMDAT group.  A
`__llvm_prf_data` is usually not referenced and expects the liveness of its
associated `__llvm_prf_cnts` to retain it.

The `setComdat(nullptr)` code (added by D10679) in InternalizePass can break the
use case (a `__llvm_prf_data` may be dropped with its associated `__llvm_prf_cnts` retained).
The main goal of this patch is to fix the dependency relationship.

I think it makes sense for InternalizePass to internalize a comdat and thus
suppress the deduplication feature, e.g. a relocatable link of a regular LTO can
create an object file affected by InternalizePass.
If a non-internal comdat in a.o is prevailed by an internal comdat in b.o, the
a.o references to the comdat definitions will be non-resolvable (references
cannot bind to STB_LOCAL definitions in b.o).

On PE-COFF, for a non-external selection symbol, deduplication is naturally
suppressed with link.exe and lld-link. However, this is fuzzy on ELF and I tend
to believe the spec creator has not thought about this use case (see D102973).

GNU ld and gold are still using the "signature is name based" interpretation.
So even if D102973 for ld.lld is accepted, for portability, a better approach is
to rename the comdat. A comdat with one single member is the common case,
leaving the comdat can waste (sizeof(Elf64_Shdr)+4*2) bytes, so we optimize by
deleting the comdat; otherwise we rename the comdat.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103043
2021-05-25 14:15:27 -07:00
Marco Elver 280333021e [SanitizeCoverage] Add support for NoSanitizeCoverage function attribute
We really ought to support no_sanitize("coverage") in line with other
sanitizers. This came up again in discussions on the Linux-kernel
mailing lists, because we currently do workarounds using objtool to
remove coverage instrumentation. Since that support is only on x86, to
continue support coverage instrumentation on other architectures, we
must support selectively disabling coverage instrumentation via function
attributes.

Unfortunately, for SanitizeCoverage, it has not been implemented as a
sanitizer via fsanitize= and associated options in Sanitizers.def, but
rolls its own option fsanitize-coverage. This meant that we never got
"automatic" no_sanitize attribute support.

Implement no_sanitize attribute support by special-casing the string
"coverage" in the NoSanitizeAttr implementation. To keep the feature as
unintrusive to existing IR generation as possible, define a new negative
function attribute NoSanitizeCoverage to propagate the information
through to the instrumentation pass.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035

Reviewed By: vitalybuka, morehouse

Differential Revision: https://reviews.llvm.org/D102772
2021-05-25 12:57:14 +02:00
Johannes Doerfert 6caea8a7fa [Attributor] Introduce a helper do deal with constant type mismatches
If we simplify values we sometimes end up with type mismatches. If the
value is a constant we can often cast it though to still allow
propagation. The logic is now put into a helper and it replaces some
ad hoc things we did before.

This also introduces the AA namespace for abstract attribute related
functions and types.
2021-05-23 23:00:40 -05:00
Johannes Doerfert 55e9c28212 [Attributor] Teach AAIsDead about undef values
Not only if the branch or switch condition is dead but also if it is
assumed `undef` we can delay AAIsDead exploration.
2021-05-23 23:00:40 -05:00
Johannes Doerfert 4878d73419 [Attributor] Deal with address spaces gracefully
When we do value propagation we need to cast address spaces properly.
2021-05-23 23:00:39 -05:00
Johannes Doerfert 1ba2929bb8 [Attributor] Be more careful to not disturb the CG outside the SCC
We have seen various problems when the call graph was not updated or
the updated did not succeed because it involved functions outside the
SCC. This patch adds assertions and checks to avoid accidentally
changing something outside the SCC that would impact the call graph.
It also prevents us from reanalyzing functions outside the current
SCC which could cause problems on its own. Note that the transformations
we do might cause the CG to be "more precise" but the original one would
always be a super set of the most precise one. Since the call graph is
by nature an approximation, it is good enough to have a super set of all
call edges.
2021-05-23 23:00:39 -05:00
Johannes Doerfert e93ac1e2de [Attributor][FIX] Account for undef in the constant value lattice
The constant value lattice looks like this

```
  <None>
     |
  <undef>
  /  |   \
... <0>  ...
 \   |   /
 <unknown>
```
We did not account for the undef and assumed a value meant we could not
change anymore. Now we actually check if we have the same value as
before, which will signal CHANGED to the users when we go from undef to
a specific constant.

This fixes, among other things, the bug exposed by @ipccp4 in
`value-simplify.ll`.
2021-05-23 20:47:06 -05:00
Johannes Doerfert 5cdc29f795 [Attributor][FIX] Ensure we replace undef if we see the first "real" value
The state of AAPotentialValues tracks if undef is contained. It should
fold undef into the first non-undef value. However we missed a case
before. There was also a shadowing definition of two variables that
caused trouble. The test exposes both problems.
2021-05-23 20:47:06 -05:00
Johannes Doerfert 2bc51d39db [Attributor][NFC] Add helpful debug outputs 2021-05-23 20:47:05 -05:00
Johannes Doerfert cb511531b9 [Attributor][NFC] Clang format the Attributor source files 2021-05-23 20:47:05 -05:00
Yaxun (Sam) Liu bf6124580d [HIP] support ThinLTO
Add options -[no-]offload-lto and -foffload-lto=[thin,full] for controlling
LTO for offload compilation. Allow LTO for AMDGPU target.

AMDGPU target does not support codegen of object files containing
call of external functions, therefore the LLVM module passed to
AMDGPU backend needs to contain definitions of all the callees.
An LLVM option is added to allow function importer to import
functions with noinline attribute.

HIP toolchain passes proper LLVM options to lld to make sure
function importer imports definitions of all the callees.

Reviewed by: Teresa Johnson, Artem Belevich

Differential Revision: https://reviews.llvm.org/D99683
2021-05-22 10:48:34 -04:00
Arthur Eubanks f7788e1bff Revert "[NewPM] Only invalidate modified functions' analyses in CGSCC passes"
This reverts commit d14d84af2f.

Causes unacceptable memory regressions.
2021-05-21 16:38:03 -07:00
Sanjay Patel f34311c402 [GlobalOpt] recompute alignments for loads and stores of updated globals
GlobalOpt can slice structs/arrays and change GEPs in the process,
but it was not updating alignments for load/store users. This
eventually causes the crashing seen in:
https://llvm.org/PR49661
https://llvm.org/PR50253

On x86, this required SLP+codegen to create an aligned vector
store on an invalid address. The bugs would be easier to
demonstrate on a target with stricter alignment requirements.

I'm not sure if this is a complete solution. The alignment
updating code is adapted from InstCombine, so I assume that
part is tested and good.

Differential Revision: https://reviews.llvm.org/D102552
2021-05-20 12:12:21 -04:00
Joseph Huber 2db182ff8d [Diagnostics] Allow emitting analysis and missed remarks on functions
Summary:
Currently, only `OptimizationRemarks` can be emitted using a Function.
Add constructors to allow this for `OptimizationRemarksAnalysis` and
`OptimizationRemarkMissed` as well.

Reviewed By: jdoerfert thegameg

Differential Revision: https://reviews.llvm.org/D102784
2021-05-19 15:10:20 -04:00
Hongtao Yu 4ca6e37b98 [CSSPGO] Overwrite branch weight annotated in previous pass.
Sample profile loader can be run in both LTO prelink and postlink. Currently the counts annoation in postilnk doesn't fully overwrite what's done in prelink. I'm adding a switch (`-overwrite-existing-weights=1`) to enable a full overwrite, which includes:

1. Clear old metadata for calls when their parent block has a zero count. This could be caused by prelink code duplication.

2. Clear indirect call metadata if somehow all the rest targets have a sum of zero count.

3. Overwrite branch weight for basic blocks.

With a CS profile, I was seeing #1 and #2 help reduce code size by preventing post-sample ICP and CGSCC inliner working on obsolete metadata, which come from a partial global inlining in prelink.  It's not expected to work well for non-CS case with a less-accurate post-inline count quality.

It's worth calling out that some prelink optimizations can damage counts quality in an irreversible way. One example is the loop rotate optimization. Due to lack of exact loop entry count (profiling can only give loop iteration count and loop exit count), moving one iteration out of the loop body leaves the rest iteration count unknown. We had to turn off prelink loop rotate to achieve a better postlink counts quality. A even better postlink counts quality can be archived by turning off prelink CGSCC inlining which is not context-sensitive.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D102537
2021-05-19 09:12:24 -07:00
Joseph Huber 68abc3d264 [Attributor] Change AAExecutionDomain to only accept intrinsics
Summary:
The OpenMP runtime functions don't always provide unique thread ID's to
determine if a basic block is truly single-threaded. Change the implementation
to only check NVPTX intrinsics for now.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102700
2021-05-18 21:19:26 -04:00
Hongtao Yu f28ee1a2b3 [CSSPGO] Update pseudo probe distribution factor based on inline context.
With prelink inlining, pseudo probes with same ID can come from different inline contexts. Such probes should not share samples and their factors should be fixed up separately.

I'm seeing 0.3% speedup for SPEC2017 overall. Benchmark 631.deepsjeng_s benefits the most, about 4%.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D102429
2021-05-16 23:11:36 -07:00
Kuter Dinel 64ef29bc66 [Attributor] Call site specific AAValueSimplification and AAIsDead.
This patch makes it possible to do call site specific deductions
for AAValueSimplification and AAIsDead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84722
2021-05-15 21:39:07 +00:00
Simon Pilgrim e30540a603 SampleProfileLoader::inlineHotFunctionsWithPriority - Fix uninitialized variable warning. NFCI.
findIndirectCallFunctionSamples will leave Sum uninitialized if it returns an empty vector, we don't really use Sum in this case (but we do make a copy that isn't used either) - so ensure we initialize the value to zero to at least silence the static analysis warning.
2021-05-15 15:02:52 +01:00
wlei e475d4d69f [CSSPGO] Fix return value of getProbeWeight
Currently we didn't support multiple return type, we work around to use error_code to represent:

1)  The dangling probe.
2)  Ignore the weight of non-probe instruction

While merging the instructions' weight for the whole BB, it will filter out the error code. But If all instructions of the BB give error_code, the outside logic will mark it as a BB requiring the inference algorithm to infer its weight. This is different from the zero value which will be treated as a cold block.

Fix one place that if we can't find the FunctionSamples in the profile data which indicates the BB is cold, we choose to return zero.

Also refine the comments.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D102007
2021-05-14 14:06:09 -07:00
Joseph Huber 8b57ed09bd [OpenMP] Prevent Attributor from deleting functions in OpenMPOptCGSCC pass
Summary:
This patch prevents the Attributor instances made in the CGSCC pass from
deleting functions. This prevents the attributor from changing the call
graph while OpenMPOpt is working with it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102363
2021-05-13 16:35:23 -04:00
Fangrui Song 129f466e22 [GlobalOpt] Remove heap SROA
GlobalOpt implements a heap SROA (SROA for an malloc allocatated struct or array
of structs) which is largely undertested (heap-sra-[1234].ll are basically the
same test with very little difference) and does not trigger at all when
bootstrapping clang (it only supports the case of one single store).

The heap SROA implementation causes PR50027 (GEP is not properly handled; crash or miscompile).
Just drop the implementation. I have deleted some obviously duplicated tests
but kept `heap-sra-[12]{,-no-nullopt}.ll`.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102257
2021-05-11 11:34:37 -07:00
Eli Friedman 61cbbba7a6 [ArgumentPromotion] Fix byval alignment handling.
Make sure the alignment of the generated operations matches the
alignment of the byval argument.  Previously, we were just ignoring
alignment and getting lucky.

While I'm here, also delete the unnecessary "tail" handling.
Passing a pointer to a byval argument to a "tail" call is UB, so
rewriting to an alloca doesn't require any special handling.

Differential Revision: https://reviews.llvm.org/D89819
2021-05-11 11:22:18 -07:00
Sanjay Patel 88d8f10baf [PassManager] add helper function to hold set of vector passes (2nd try)
This is better no-functional-change-intended than the 1st attempt.
As noted in D102002, there were at least 2 diffs that went
unchecked in pass manager regressions tests: different pass
parameters (SimplifyCFG) and an extension point/callback.
Those should be lifted from the original code blocks correctly
now.
2021-05-10 14:43:00 -04:00
Sanjay Patel 822be4bec8 Revert "[PassManager] add helper function to hold set of vector passes"
This reverts commit fefcb1f878.
It was supposed to be NFC, but as noted in the post-commit
comments in D102002, that was not true: SimplifyCFG uses
different parameters and there's a difference in an
extension point / callback.
2021-05-10 10:59:30 -04:00
Arthur Eubanks 34a8a437bf [NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose
Printing pass manager invocations is fairly verbose and not super
useful.

This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.

This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D101797
2021-05-07 21:51:47 -07:00
Adrian Prantl c6ddf669dc Fix the module-enabled build by removing a redundant type definition. 2021-05-07 14:45:17 -07:00
Fangrui Song d8aba75a76 Internalize some cl::opt global variables or move them under namespace llvm 2021-05-07 11:15:43 -07:00
Sanjay Patel fefcb1f878 [PassManager] add helper function to hold set of vector passes
This is no-functional-change-intended (NFC) and split off from
D102002 (which proposes to eliminate the LTO-based differences).
2021-05-06 15:36:15 -04:00
Arthur Eubanks d14d84af2f [NewPM] Only invalidate modified functions' analyses in CGSCC passes
Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved.

So far this only touches the inliner, argpromotion, funcattrs, and
updateCGAndAnalysisManager(), since they are the most used.

Slight compile time improvements:
http://llvm-compile-time-tracker.com/compare.php?from=326da4adcb8def2abdd530299d87ce951c0edec9&to=8942c7669f330082ef159f3c6c57c3c28484f4be&stat=instructions

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D100917
2021-05-03 17:21:44 -07:00
Joseph Huber 182831258b [Attributor] Add AAExecutionDomainInfo interface to OpenMPOpt
Summary:
Add the AAExecutionDomainInfo attributor instance to OpenMPOpt.
This will infer information relating to domain information that an
instruction might be expecting in. Right now this only includes a very
crude check for instructions that will be executed by the master thread
by comparing a thread-id function with a constant zero.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101578
2021-05-03 19:24:19 -04:00
Florian Hahn ed9df5bd2f
[Passes] Run sinking/hoisting in SimplifyCFG earlier.
Hoisting and sinking instructions out of conditional blocks enables
additional vectorization by:

1. Executing memory accesses unconditionally.
2. Reducing the number of instructions that need predication.

After disabling early hoisting / sinking, we miss out on a few
vectorization opportunities. One of those is causing a ~10% performance
regression in one of the Geekbench benchmarks on AArch64.

This patch tires to recover the regression by running hoisting/sinking
as part of a SimplifyCFG run after LoopRotate and before LoopVectorize.

Note that in the legacy pass-manager, we run LoopRotate just before
vectorization again and there's no SimplifyCFG run in between, so the
sinking/hoisting may impact the later run on LoopRotate. But the impact
should be limited and the benefit of hosting/sinking at this stage
should outweigh the risk of not rotating.

Compile-time impact looks slightly positive for most cases.
http://llvm-compile-time-tracker.com/compare.php?from=2ea7fb7b1c045a7d60fcccf3df3ebb26aa3699e5&to=e58b4a763c691da651f25996aad619cb3d946faf&stat=instructions

NewPM-O3: geomean -0.19%
NewPM-ReleaseThinLTO: geoman -0.54%
NewPM-ReleaseLTO-g: geomean -0.03%

With a few benchmarks seeing a notable increase, but also some
improvements.

Alternative to D101290.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101468
2021-04-30 12:23:57 +01:00
Amanieu d'Antras ad9ce8142d [ConstantMerge] Don't merge thread_local constants with non-thread_local constants
Fixes PR49932

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D100322
2021-04-28 23:44:20 +01:00
Benjamin Kramer 7e5682ee62 [ADT] Make TrackingStatistic's ctor constexpr
This lets clang diagnose unused statistics, so remove them.
2021-04-28 12:00:17 +02:00
Arthur Eubanks fd1ff5ee03 [Inliner] Make ModuleInlinerWrapperPass return PreservedAnalyses::all()
The ModulePassManager should already have taken care of all analysis
invalidation. Without this change, upcoming changes will cause more
invalidation than necessary.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D101320
2021-04-26 17:22:35 -07:00
Hongtao Yu 5f2d730073 [CSSPGO] Fix incorrect prorating indirect call distribution factor that leads to target count loss.
Pseudo probe distribution factor is used to scale down profile samples to avoid misleading the counts inference due to the usage of "maximum" in `getBlockWeight`. For callsites, the scaling down can come from code duplication prior to the sample profile loader (prelink or postlink), or due to the indirect call promotion in sample loader inliner. This patch fixes an issue in sample loader ICP where the leftover indirect callsite scaling down causes the loss of non-promoted call target samples unexpectedly. While the scaling down is to favor BFI/BPI with accurate an callsite count, it doesn't fit in the current distribution factor that represents code duplication changes. Ideally,  we would need two factors, one is for code duplication, the other is for ICP. However this seems over complicated. I'm going to trade one usage (callsite counts) for the other (call target counts).

Seeing perf win on one benchmark (mcf) of SPEC2017 with others unchanged.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D100993
2021-04-23 11:09:22 -07:00
Arthur Eubanks 16ff1a7023 [GlobalOpt] Don't replace alias with aliasee if aliasee is interposable
Both the alias and aliasee linkage are important.

PR27866 provides some background.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99629
2021-04-22 13:12:34 -07:00
Giorgis Georgakoudis a2dbfb6b72 [OpenMP] Simplify offloading parallel call codegen
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions.

Reviewed By: jdoerfert, Meinersbur

Differential Revision: https://reviews.llvm.org/D95976
2021-04-21 18:46:07 -07:00
sstefan1 62cdcd6c5a [FuncAttrs] Don't infer willreturn for nonexact definitions
Discovered during attributor testing comparing stats with
and without the attributor. Willreturn should not be inferred
for nonexact definitions.

Differential Revision: https://reviews.llvm.org/D100988
2021-04-21 21:26:09 +02:00
Arthur Eubanks 326da4adcb [FuncAttrs] Always preserve FunctionAnalysisManagerCGSCCProxy
FunctionAnalysisManagerCGSCCProxy should not be preserved if any of its
keys may be invalid. Since we are not removing/adding functions in
FuncAttrs, it's fine to preserve it.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D100893
2021-04-20 16:37:45 -07:00
Joseph Huber b2ad63d3cf [OpenMP] Add OpenMPOpt as a Module pass
Summary:
This patch registers OpenMPOpt as a Module pass in addition to a CGSCC
pass. This is so certain optimzations that are sensitive to intact
call-sites can happen before inlining. The old `openmpopt` pass name is
changed to `openmp-opt-cgscc` and `openmp-opt` calls the Module pass.
The current module pass only runs a single check but will be expanded in
the future.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D99202
2021-04-20 12:28:58 -04:00
Arthur Eubanks 5e71b9fa93 Explicitly pass type to cast load constant folding result
Previously we would use the type of the pointee to determine what to
cast the result of constant folding a load. To aid with opaque pointer
types, we should explicitly pass the type of the load rather than
looking at pointee types.

ConstantFoldLoadThroughBitcast() converts the const prop'd value to the
proper load type (e.g. [1 x i32] -> i32). Instead of calling this in
every intermediate step like bitcasts, we only call this when we
actually see the global initializer value.

In some existing uses of this API, we don't know the exact type we're
loading from immediately (e.g. first we visit a bitcast, then we visit
the load using the bitcast). In those cases we have to manually call
ConstantFoldLoadThroughBitcast() when simplifying the load to make sure
that we cast to the proper type.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D100718
2021-04-20 00:53:21 -07:00
Zequan Wu e28435caf6 [ThinLTO] Copy UnnamedAddr when spliting module.
The unnamedaddr property of a function is lost when using
`-fwhole-program-vtables` and thinlto which causes size increase under linker's
safe icf mode.

The size increase of chrome on Linux when switching from all icf to safe icf
drops from 5 MB to 3 MB after this change, and from 6 MB to 4 MB on Windows.

There is a repro:
```
# a.h
struct A {
  virtual int f();
  virtual int g();
};

# a.cpp
#include "a.h"
int A::f() { return 10; }
int A::g() { return 10; }

# main.cpp
#include "a.h"

int g(A* a) {
  return a->f();
}

int main(int argv, char** args) {
  A a;
  return g(&a);
}

$ clang++ -O2 -ffunction-sections -flto=thin -fwhole-program-vtables -fsplit-lto-unit -c main.cpp -o main.o  && clang++ -Wl,--icf=safe -fuse-ld=lld  -flto=thin main.o -o a.out && llvm-readobj -t a.out | grep -A 1 -e _ZN1A1fEv -e _ZN1A1gEv
    Name: _ZN1A1fEv (480)
    Value: 0x201830
--
    Name: _ZN1A1gEv (490)
    Value: 0x201840
```

Differential Revision: https://reviews.llvm.org/D100498
2021-04-19 14:04:58 -07:00
Philip Reames 3c54762226 [funcattrs] Consistently check call site attributes
This is mostly stylistic cleanup after D100226, but not entirely. When skimming the code, I found one case where we weren't accounting for attributes on the callsite at all. I'm also suspicious we had some latent bugs related to operand bundles (which are supposed to be able to *override* attributes on declarations), but I don't have concrete test cases for those, just suspicions.

Aside: The only case left in the file which directly checks attributes on the declaration is the norecurse logic. I left that because I didn't understand it; it looks obviously wrong, so I suspect I'm misinterpreting the intended semantics of the attribute.

Differential Revision: https://reviews.llvm.org/D100689
2021-04-19 13:20:50 -07:00
Philip Reames 11707435cc [inferattrs] Don't infer lib func attributes for nobuiltin functions
If we have a nobuiltin function, we can't assume we know anything about the implementation.

I noticed this when tracing through a log from an in the wild miscompile (https://github.com/emscripten-core/emscripten/issues/9443) triggered after 8666463.  We were incorrectly assuming that a custom allocator could not free.  (It's not clear yet this is the only problem in said issue.)

I also noticed something similiar mentioned in the commit message of ab243e when scrolling back through history.  Through, from what I can tell, that commit fixed symptom not root cause.

The interface we have for library function detection is extremely error prone, but given the interaction between ``nobuiltin`` decls and ``builtin`` callsites, it's really hard to imagine something much cleaner.  I may iterate on that, but it'll be invasive enough I didn't want to hold an obvious functional fix on it.
2021-04-16 15:36:15 -07:00
Philip Reames f549176ad9 [funcattrs] Add the maximal set of implied attributes to definitions
Have funcattrs expand all implied attributes into the IR. This expands the infrastructure from D100400, but for definitions not declarations this time.

Somewhat subtly, this mostly isn't semantic. Because the accessors did the inference, any client which used the accessor was already getting the stronger result. Clients that directly checked presence of attributes (there are some), will see a stronger result now.

The old behavior can end up quite confusing for two reasons:
* Without this change, we have situations where function-attrs appears to fail when inferring an attribute (as seen by a human reading IR), but that consuming code will see that it should have been implied. As a human trying to sanity check test results and study IR for optimization possibilities, this is exceeding error prone and confusing. (I'll note that I wasted several hours recently because of this.)
* We can have transforms which trigger without the IR appearing (on inspection) to meet the preconditions. This change doesn't prevent this from happening (as the accessors still involve multiple checks), but it should make it less frequent.

I'd argue in favor of deleting the extra checks out of the accessors after this lands, but I want that in it's own review as a) it's purely stylistic, and b) I already know there's some disagreement.

Once this lands, I'm also going to do a cleanup change which will delete some now redundant duplicate predicates in the inference code, but again, that deserves to be a change of it's own.

Differential Revision: https://reviews.llvm.org/D100226
2021-04-16 14:22:19 -07:00
Philip Reames dd985551c2 Reapply "[InferAttributes] Materialize all infered attributes for declaration"" and follow on patches.
This reverts commit ab98f2c712 and 98eea392cd.

It includes a fix for the clang test which triggered the revert.  I failed to notice this one because there was another AMDGPU llvm test with a similiar name and the exact same text in the error message.  Odd.  Since only one build bot reported the clang test, I didn't notice that one.
2021-04-14 16:38:07 -07:00
Nico Weber ab98f2c712 Revert "[InferAttributes] Materialize all infered attributes for declaration"
Breaks check-clang, see comments on D100400

Also revert follow-up "[NFC] Move a recently added utility into a location to enable reuse"

This reverts commit 3ce61fb6d6.
This reverts commit 61a85da882.
2021-04-14 18:41:20 -04:00
Philip Reames 3ce61fb6d6 [NFC] Move a recently added utility into a location to enable reuse
About to refresh a patch that uses this in FunctionAtrrs, doing the move seperately to control build times.
2021-04-14 15:05:16 -07:00
Philip Reames 61a85da882 [InferAttributes] Materialize all infered attributes for declaration
We have some cases today where attributes can be inferred from another on access, but the result is not explicitly materialized in IR. This change is a step towards changing that.

Why? Two main reasons:

* Human clarity. It's really confusing trying to figure out why a transform is triggering when the IR doesn't appear to have the required attributes.
* This avoids the need to special case declarations in e.g. functionattrs. Since we can assume the attribute is present, we can work directly from attributes (and only attributes) without also needing to query accessors on Function to avoid missing cases due to unannotated (but infered on use) declarations. (This piece will appear must easier to follow once D100226 also lands.)

Differential Revision: https://reviews.llvm.org/D100400
2021-04-14 14:45:24 -07:00
Sterling Augustine 32e264921b Revert "[GlobalOpt] Revert valgrind hacks"
This reverts commit dbc16ed199.
2021-04-13 17:47:07 -07:00
Evgeny Leviant dbc16ed199 [GlobalOpt] Revert valgrind hacks
Differential revision: https://reviews.llvm.org/D69428
2021-04-13 19:11:10 +03:00
Sanjay Patel 661cc71a1c [PassManager][PhaseOrdering] lower expects before running simplifyCFG
Retry of 330619a3a6 that includes a clang test update.

Original commit message:

If we run passes before lowering llvm.expect intrinsics to metadata,
then those passes have no way to act on the hints provided by llvm.expect.
SimplifyCFG is the known offender, and we made it smarter about profile
metadata in D98898 <https://reviews.llvm.org/D98898>.

In the motivating example from https://llvm.org/PR49336 , this means we
were ignoring the recommended method for a programmer to tell the compiler
that a compare+branch is expensive. This change appears to solve that case -
the metadata survives to the backend, the compare order is as expected in IR,
and the backend does not do anything to reverse it.

We make the same change to the old pass manager to keep things synchronized.

Differential Revision: https://reviews.llvm.org/D100213
2021-04-12 15:07:53 -04:00
Sanjay Patel 23ac9d1e6e Revert "[PassManager][PhaseOrdering] lower expects before running simplifyCFG"
This reverts commit 330619a3a6.
There are clang tests that also need to be updated.
2021-04-12 13:58:54 -04:00
Sanjay Patel 330619a3a6 [PassManager][PhaseOrdering] lower expects before running simplifyCFG
If we run passes before lowering llvm.expect intrinsics to metadata,
then those passes have no way to act on the hints provided by llvm.expect.
SimplifyCFG is the known offender, and we made it smarter about profile
metadata in D98898.

In the motivating example from https://llvm.org/PR49336 , this means we
were ignoring the recommended method for a programmer to tell the compiler
that a compare+branch is expensive. This change appears to solve that case -
the metadata survives to the backend, the compare order is as expected in IR,
and the backend does not do anything to reverse it.

We make the same change to the old pass manager to keep things synchronized.

Differential Revision: https://reviews.llvm.org/D100213
2021-04-12 12:23:31 -04:00
Roman Lebedev e8c7f43e2c
[NFC][ConstantRange] Add 'icmp' helper method
"Does the predicate hold between two ranges?"

Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
2021-04-10 19:38:55 +03:00
Roman Lebedev 7b12c8c59d
Revert "[NFC][ConstantRange] Add 'icmp' helper method"
This reverts commit 17cf2c9423.
2021-04-10 19:37:53 +03:00
Roman Lebedev 17cf2c9423
[NFC][ConstantRange] Add 'icmp' helper method
"Does the predicate hold between two ranges?"

Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
2021-04-10 19:09:52 +03:00
Philip Reames 35393c865c [funcattrs] Infer nosync from instruction walk
Pretty straightforward use of existing infrastructure and port of the attributor inference rules for nosync.

A couple points of interest:
* I deliberately switched from "monotonic or better" to "unordered or better". This is simply me being conservative and is better in line with the rest of the optimizer. We treat monotonic conservatively pretty much everywhere.
* The operand bundle test change is suspicious. It looks like we might have missed something here, but if so, it's an issue with the existing nofree inference as well. I'm going to take a closer look at that separately.
* I needed to keep the previous inference from readnone. This surprised me, but made sense once I realized readonly inference goes to lengths to reason about local vs non-local memory and that writes to local memory are okay. This is fine for the purpose of nosync, but would e.g. prevent us from inferring nofree from readnone - which is slightly surprising.

Differential Revision: https://reviews.llvm.org/D99769
2021-04-08 14:05:00 -07:00
wlei 6d5132b426 [CSSPGO] Fix incorrect probe distribution factor computation in top-down inliner
We see a regression related to low probe factor(0.01) which prevents some callsites being promoted in ICPPass and later cause the missing inline in CGSCC inliner. The root cause is due to redundant(the second) multiplication of the probe factor and this change try to fix it.

`Sum` does multiply a factor right after findCallSamples but later when using as the parameter in setProbeDistributionFactor, it multiplies one again.

This change could get ~2% perf back on mcf benchmark. In mcf, previously the corresponding factor is 1 and it's the recent feature introducing the <1 factor then trigger this bug.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D99787
2021-04-07 08:48:59 -07:00
Philip Reames 908215b346 Use AssumeInst in a few more places [nfc]
Follow up to a6d2a8d6f5.  These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
2021-04-06 13:18:53 -07:00
Philip Reames 9ef6aa020b Plumb AssumeInst through operand bundle apis [nfc]
Follow up to a6d2a8d6f5.  This covers all the public interfaces of the bundle related code.  I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.
2021-04-06 12:53:53 -07:00
Jan Svoboda fb6a5237aa Revert "[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction"
This reverts commit 167ea67d

This causes a bunch of build failures:
* http://lab.llvm.org:8011/#/builders/121/builds/6287
* http://green.lab.llvm.org/green/job/clang-stage1-RA/19915
2021-04-06 16:33:28 +02:00
Abhina Sreeskantharajan 82b3e28e83 [SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text
Problem:
On SystemZ we need to open text files in text mode. On Windows, files opened in text mode adds a CRLF '\r\n' which may not be desirable.

Solution:
This patch adds two new flags

  - OF_CRLF which indicates that CRLF translation is used.
  - OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation.

Developers should now use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF.

So this is the behaviour per platform with my patch:

z/OS:
OF_None: open in binary mode
OF_Text : open in text mode
OF_TextWithCRLF: open in text mode

Windows:
OF_None: open file with no carriage return
OF_Text: open file with no carriage return
OF_TextWithCRLF: open file with carriage return

The Major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set.
```
  if (Flags & OF_CRLF)
    CrtOpenFlags |= _O_TEXT;
```

These following files are the ones that still use OF_Text which I left unchanged. I modified all these except raw_ostream.cpp in recent patches so I know these were previously in Binary mode on Windows.
./llvm/lib/Support/raw_ostream.cpp
./llvm/lib/TableGen/Main.cpp
./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
./llvm/unittests/Support/Path.cpp
./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
./clang/lib/Frontend/CompilerInstance.cpp
./clang/lib/Driver/Driver.cpp
./clang/lib/Driver/ToolChains/Clang.cpp

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99426
2021-04-06 07:23:31 -04:00
madhur13490 167ea67d76 [IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction
This patch enhances hasAddressTaken() to ignore bitcasts as a
callee in callbase instruction. Such bitcast usage doesn't really take
the address in a useful meaningful way.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D98884
2021-04-06 09:23:46 +00:00
Roman Lebedev a26f1bf67e
[PassManager] Run additional LICM before LoopRotate
Loop rotation often has to perform code duplication
from header into preheader, which introduces PHI nodes.

>>! In D99204, @thopre wrote:
>
> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will leads to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the load can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issue. In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this'
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp

Now, we could enhance LoopRotate to avoid duplicating code when not needed,
but instead hoist loop-invariant code, but isn't that a code duplication? (*sic*)
We have LICM, and in fact we already run it right after LoopRotation.

We could try to move it to before LoopRotation,
that is basically free from compile-time perspective:
https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions
But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular:
| statistic name                                   | LoopRotate-LICM | LICM-LoopRotate |     Δ |       % | abs(%) |
| asm-printer.EmittedInsts                         | 9015930         | 9015799         |  -131 |   0.00% |  0.00% |
| indvars.NumElimCmp                               | 3536            | 3544            |     8 |   0.23% |  0.23% |
| indvars.NumElimExt                               | 36725           | 36580           |  -145 |  -0.39% |  0.39% |
| indvars.NumElimIV                                | 1197            | 1187            |   -10 |  -0.84% |  0.84% |
| indvars.NumElimIdentity                          | 143             | 136             |    -7 |  -4.90% |  4.90% |
| indvars.NumElimRem                               | 4               | 5               |     1 |  25.00% | 25.00% |
| indvars.NumLFTR                                  | 29842           | 29890           |    48 |   0.16% |  0.16% |
| indvars.NumReplaced                              | 2293            | 2227            |   -66 |  -2.88% |  2.88% |
| indvars.NumSimplifiedSDiv                        | 6               | 8               |     2 |  33.33% | 33.33% |
| indvars.NumWidened                               | 26438           | 26329           |  -109 |  -0.41% |  0.41% |
| instcount.TotalBlocks                            | 1178338         | 1173840         | -4498 |  -0.38% |  0.38% |
| instcount.TotalFuncs                             | 111825          | 111829          |     4 |   0.00% |  0.00% |
| instcount.TotalInsts                             | 9905442         | 9896139         | -9303 |  -0.09% |  0.09% |
| lcssa.NumLCSSA                                   | 425871          | 423961          | -1910 |  -0.45% |  0.45% |
| licm.NumHoisted                                  | 378357          | 378753          |   396 |   0.10% |  0.10% |
| licm.NumMovedCalls                               | 2193            | 2208            |    15 |   0.68% |  0.68% |
| licm.NumMovedLoads                               | 35899           | 31821           | -4078 | -11.36% | 11.36% |
| licm.NumPromoted                                 | 11178           | 11154           |   -24 |  -0.21% |  0.21% |
| licm.NumSunk                                     | 13359           | 13587           |   228 |   1.71% |  1.71% |
| loop-delete.NumDeleted                           | 8547            | 8402            |  -145 |  -1.70% |  1.70% |
| loop-instsimplify.NumSimplified                  | 12876           | 11890           |  -986 |  -7.66% |  7.66% |
| loop-peel.NumPeeled                              | 1008            | 925             |   -83 |  -8.23% |  8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize         | 368             | 365             |    -3 |  -0.82% |  0.82% |
| loop-rotate.NumRotated                           | 42015           | 42003           |   -12 |  -0.03% |  0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted            | 240             | 242             |     2 |   0.83% |  0.83% |
| loop-simplifycfg.NumLoopExitsDeleted             | 497             | 20              |  -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded            | 618             | 336             |  -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled                | 11028           | 11032           |     4 |   0.04% |  0.04% |
| loop-unroll.NumUnrolled                          | 12608           | 12529           |   -79 |  -0.63% |  0.63% |
| mem2reg.NumDeadAlloca                            | 10222           | 10221           |    -1 |  -0.01% |  0.01% |
| mem2reg.NumPHIInsert                             | 192110          | 192106          |    -4 |   0.00% |  0.00% |
| mem2reg.NumSingleStore                           | 637650          | 637643          |    -7 |   0.00% |  0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814             | 812             |    -2 |  -0.25% |  0.25% |
| scalar-evolution.NumTripCountsComputed           | 283108          | 282934          |  -174 |  -0.06% |  0.06% |
| scalar-evolution.NumTripCountsNotComputed        | 106712          | 106718          |     6 |   0.01% |  0.01% |
| simple-loop-unswitch.NumBranches                 | 5178            | 4752            |  -426 |  -8.23% |  8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped    | 914             | 503             |  -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches                 | 20              | 18              |    -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial                  | 183             | 95              |   -88 | -48.09% | 48.09% |

... but that actually regresses LICM (-12% `licm.NumMovedLoads`),
loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`),
simple-loop-unswitch (`NumTrivial`).

What if we instead have LICM both before and after LoopRotate?
| statistic name                                | LoopRotate-LICM | LICM-LoopRotate-LICM |     Δ |       % | abs(%) |
| asm-printer.EmittedInsts                      | 9015930         | 9014474              | -1456 |  -0.02% |  0.02% |
| indvars.NumElimCmp                            | 3536            | 3546                 |    10 |   0.28% |  0.28% |
| indvars.NumElimExt                            | 36725           | 36681                |   -44 |  -0.12% |  0.12% |
| indvars.NumElimIV                             | 1197            | 1185                 |   -12 |  -1.00% |  1.00% |
| indvars.NumElimIdentity                       | 143             | 146                  |     3 |   2.10% |  2.10% |
| indvars.NumElimRem                            | 4               | 5                    |     1 |  25.00% | 25.00% |
| indvars.NumLFTR                               | 29842           | 29899                |    57 |   0.19% |  0.19% |
| indvars.NumReplaced                           | 2293            | 2299                 |     6 |   0.26% |  0.26% |
| indvars.NumSimplifiedSDiv                     | 6               | 8                    |     2 |  33.33% | 33.33% |
| indvars.NumWidened                            | 26438           | 26404                |   -34 |  -0.13% |  0.13% |
| instcount.TotalBlocks                         | 1178338         | 1173652              | -4686 |  -0.40% |  0.40% |
| instcount.TotalFuncs                          | 111825          | 111829               |     4 |   0.00% |  0.00% |
| instcount.TotalInsts                          | 9905442         | 9895452              | -9990 |  -0.10% |  0.10% |
| lcssa.NumLCSSA                                | 425871          | 425373               |  -498 |  -0.12% |  0.12% |
| licm.NumHoisted                               | 378357          | 383352               |  4995 |   1.32% |  1.32% |
| licm.NumMovedCalls                            | 2193            | 2204                 |    11 |   0.50% |  0.50% |
| licm.NumMovedLoads                            | 35899           | 35755                |  -144 |  -0.40% |  0.40% |
| licm.NumPromoted                              | 11178           | 11163                |   -15 |  -0.13% |  0.13% |
| licm.NumSunk                                  | 13359           | 14321                |   962 |   7.20% |  7.20% |
| loop-delete.NumDeleted                        | 8547            | 8538                 |    -9 |  -0.11% |  0.11% |
| loop-instsimplify.NumSimplified               | 12876           | 12041                |  -835 |  -6.48% |  6.48% |
| loop-peel.NumPeeled                           | 1008            | 924                  |   -84 |  -8.33% |  8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize      | 368             | 365                  |    -3 |  -0.82% |  0.82% |
| loop-rotate.NumRotated                        | 42015           | 42005                |   -10 |  -0.02% |  0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted         | 240             | 241                  |     1 |   0.42% |  0.42% |
| loop-simplifycfg.NumTerminatorsFolded         | 618             | 619                  |     1 |   0.16% |  0.16% |
| loop-unroll.NumCompletelyUnrolled             | 11028           | 11029                |     1 |   0.01% |  0.01% |
| loop-unroll.NumUnrolled                       | 12608           | 12525                |   -83 |  -0.66% |  0.66% |
| mem2reg.NumPHIInsert                          | 192110          | 192073               |   -37 |  -0.02% |  0.02% |
| mem2reg.NumSingleStore                        | 637650          | 637652               |     2 |   0.00% |  0.00% |
| scalar-evolution.NumTripCountsComputed        | 283108          | 282998               |  -110 |  -0.04% |  0.04% |
| scalar-evolution.NumTripCountsNotComputed     | 106712          | 106691               |   -21 |  -0.02% |  0.02% |
| simple-loop-unswitch.NumBranches              | 5178            | 5185                 |     7 |   0.14% |  0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914             | 925                  |    11 |   1.20% |  1.20% |
| simple-loop-unswitch.NumTrivial               | 183             | 179                  |    -4 |  -2.19% |  2.19% |
| simple-loop-unswitch.NumBranches              | 5178            | 4752                 |  -426 |  -8.23% |  8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914             | 503                  |  -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches              | 20              | 18                   |    -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial               | 183             | 95                   |   -88 | -48.09% | 48.09% |

I.e. we end up with less instructions, less peeling, more LICM activity,
also note how none of those 4 regressions are here. Namely:

| statistic name                                   | LICM-LoopRotate | LICM-LoopRotate-LICM |     Δ |        % |   abs(%) |
| asm-printer.EmittedInsts                         | 9015799         | 9014474              | -1325 |   -0.01% |    0.01% |
| indvars.NumElimCmp                               | 3544            | 3546                 |     2 |    0.06% |    0.06% |
| indvars.NumElimExt                               | 36580           | 36681                |   101 |    0.28% |    0.28% |
| indvars.NumElimIV                                | 1187            | 1185                 |    -2 |   -0.17% |    0.17% |
| indvars.NumElimIdentity                          | 136             | 146                  |    10 |    7.35% |    7.35% |
| indvars.NumLFTR                                  | 29890           | 29899                |     9 |    0.03% |    0.03% |
| indvars.NumReplaced                              | 2227            | 2299                 |    72 |    3.23% |    3.23% |
| indvars.NumWidened                               | 26329           | 26404                |    75 |    0.28% |    0.28% |
| instcount.TotalBlocks                            | 1173840         | 1173652              |  -188 |   -0.02% |    0.02% |
| instcount.TotalInsts                             | 9896139         | 9895452              |  -687 |   -0.01% |    0.01% |
| lcssa.NumLCSSA                                   | 423961          | 425373               |  1412 |    0.33% |    0.33% |
| licm.NumHoisted                                  | 378753          | 383352               |  4599 |    1.21% |    1.21% |
| licm.NumMovedCalls                               | 2208            | 2204                 |    -4 |   -0.18% |    0.18% |
| licm.NumMovedLoads                               | 31821           | 35755                |  3934 |   12.36% |   12.36% |
| licm.NumPromoted                                 | 11154           | 11163                |     9 |    0.08% |    0.08% |
| licm.NumSunk                                     | 13587           | 14321                |   734 |    5.40% |    5.40% |
| loop-delete.NumDeleted                           | 8402            | 8538                 |   136 |    1.62% |    1.62% |
| loop-instsimplify.NumSimplified                  | 11890           | 12041                |   151 |    1.27% |    1.27% |
| loop-peel.NumPeeled                              | 925             | 924                  |    -1 |   -0.11% |    0.11% |
| loop-rotate.NumRotated                           | 42003           | 42005                |     2 |    0.00% |    0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted            | 242             | 241                  |    -1 |   -0.41% |    0.41% |
| loop-simplifycfg.NumLoopExitsDeleted             | 20              | 497                  |   477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded            | 336             | 619                  |   283 |   84.23% |   84.23% |
| loop-unroll.NumCompletelyUnrolled                | 11032           | 11029                |    -3 |   -0.03% |    0.03% |
| loop-unroll.NumUnrolled                          | 12529           | 12525                |    -4 |   -0.03% |    0.03% |
| mem2reg.NumDeadAlloca                            | 10221           | 10222                |     1 |    0.01% |    0.01% |
| mem2reg.NumPHIInsert                             | 192106          | 192073               |   -33 |   -0.02% |    0.02% |
| mem2reg.NumSingleStore                           | 637643          | 637652               |     9 |    0.00% |    0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812             | 814                  |     2 |    0.25% |    0.25% |
| scalar-evolution.NumTripCountsComputed           | 282934          | 282998               |    64 |    0.02% |    0.02% |
| scalar-evolution.NumTripCountsNotComputed        | 106718          | 106691               |   -27 |   -0.03% |    0.03% |
| simple-loop-unswitch.NumBranches                 | 4752            | 5185                 |   433 |    9.11% |    9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped    | 503             | 925                  |   422 |   83.90% |   83.90% |
| simple-loop-unswitch.NumSwitches                 | 18              | 20                   |     2 |   11.11% |   11.11% |
| simple-loop-unswitch.NumTrivial                  | 95              | 179                  |    84 |   88.42% |   88.42% |

{F15983613} {F15983615} {F15983616}
(this is vanilla llvm testsuite + rawspeed + darktable)

As an example of the code where early LICM only is bad, see:
https://godbolt.org/z/GzEbacs4K

This does have an observable compile-time regression of +~0.5% geomean
https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions
but i think that's basically nothing, and there's potential that it might
be avoidable in the future by fixing clang to produce alignment information
on function arguments, thus making the second run unneeded.

Differential Revision: https://reviews.llvm.org/D99249
2021-04-02 11:11:42 +03:00
Philip Reames b23a314146 [funcattrs] Respect nofree attribute on callsites (not just callee) 2021-04-01 14:45:49 -07:00
Philip Reames 1e69a5af92 [Attributor] Cleanup detection of non-relaxed atomics in nosync inference
The code was checking for cases which are disallowed by the verifier.  Delete dead code and adjust style.
2021-04-01 12:01:29 -07:00
Philip Reames 8e596f7e27 [Attributor] Cleanup intrinsic handling in nosync inference [mostly NFC]
Mostly stylistic adjustment, but the old code didn't handle the memcpy.inline intrinsic.  By using the matcher class, we now do.
2021-04-01 11:49:59 -07:00
Philip Reames 6ef4505298 [funcattrs] Infer nosync from readnone and non-convergent
This implements the most basic possible nosync inference. The choice of inference rule is taken from the comments in attributor and the discussion on the review of the change which introduced the nosync attribute (0626367202).

This is deliberately minimal. As noted in code comments, I do plan to add a more robust inference which actually scans the function IR directly, but a) I need to do some refactoring of the attributor code to use common interfaces, and b) I wanted to get something in. I also wanted to minimize the "interesting" analysis discussion since that's time intensive.

Context: This combines with existing nofree attribute inference to help prove dereferenceability in the ongoing deref-at-point semantics work.

Differential Revision: https://reviews.llvm.org/D99749
2021-04-01 11:37:34 -07:00
Philip Reames ffa15e9463 Extract isVolatile helper on Instruction [NFCI]
We have this logic duplicated in several cases, none of which were exhaustive.  Consolidate it in one place.

I don't believe this actually impacts behavior of the callers.  I think they all filter their inputs such that their partial implementations were correct.  If not, this might be fixing a cornercase bug.
2021-04-01 11:24:02 -07:00
Philip Reames 6b05d753e0 Mark unordered memset/memmove/memcpy as nosync
Mostly a means to remove a bit of code from attributor in advance of implementing a FuncAttr inference for nosync.
2021-04-01 10:38:54 -07:00
spupyrev 22998738e8 [SamplePGO] Keeping prof metadata for IndirectBrInst
Currently prof metadata with branch counts is added only for BranchInst and SwitchInst, but not for IndirectBrInst. As a result, BPI/BFI make incorrect inferences for indirect branches, which can be very hot.
This diff adds metadata for IndirectBrInst, in addition to BranchInst and SwitchInst.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D99550
2021-03-30 10:44:48 -07:00
Hongtao Yu 3e3fc431df [CSSPGO] Top-down processing order based on full profile.
Use profiled call edges to augment the top-down order. There are cases that the top-down order computed based on the static call graph doesn't reflect real execution order. For example:

1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller processed before them.

2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, thus may cause potential inlining to be overlooked. The function order in one SCC is being adjusted to a top-down order based on the profile to favor more inlining.

3. Transitive indirect call edges due to inlining. When a callee function is inlined into into a caller function in LTO prelink, every call edge originated from the callee will be transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink since the inlined callee is gone from the static call graph.

4. #3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times but only one definition survives due to ODR. Therefore, the LTO prelink inlining done on those dropped definitions can be useless based on a local file scope. More importantly, the inlinee, once fully inlined to a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function which may be called from external modules. while this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the survived copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions.

Considering those cases, a profiled call graph completely independent of the static call graph is constructed based on profile data, where function objects are not even needed to handle case #3 and case 4.

I'm seeing an average 0.4% perf win out of SPEC2017. For certain benchmark such as Xalanbmk and GCC, the win is bigger, above 2%.

The change is an enhancement to https://reviews.llvm.org/D95988.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D99351
2021-03-30 10:42:22 -07:00
Krasimir Georgiev c51e91e046 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 5178ffc7cf.

Compiling `llvm-profdata` with a compiler build from this produces a
crashing binary.
2021-03-30 14:13:37 +02:00
Sander de Smalen f71ed5dfe2 NFC: Migrate PartialInlining to work on InstructionCost
This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97382
2021-03-30 11:59:45 +01:00
Huihui Zhang ca721042f1 [IPO][SampleContextTracker] Use SmallVector to track context profiles to prevent non-determinism.
Use SmallVector instead of SmallSet to track the context profiles mapped. Doing this
can help avoid non-determinism caused by iterating over unordered containers.

This bug was found with reverse iteration turning on,
--extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON".
Failing LLVM test profile-context-tracker-debug.ll .

Reviewed By: MaskRay, wenlei

Differential Revision: https://reviews.llvm.org/D99547
2021-03-29 16:37:10 -07:00
Gulfem Savrun Yeniceri 5178ffc7cf [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-29 21:53:32 +00:00
Wenlei He 30b0232336 [CSSPGO][llvm-profgen] Context-sensitive global pre-inliner
This change sets up a framework in llvm-profgen to estimate inline decision and adjust context-sensitive profile based on that. We call it a global pre-inliner in llvm-profgen.

It will serve two purposes:
  1) Since context profile for not inlined context will be merged into base profile, if we estimate a context will not be inlined, we can merge the context profile in the output to save profile size.
  2) For thinLTO, when a context involving functions from different modules is not inined, we can't merge functions profiles across modules, leading to suboptimal post-inline count quality. By estimating some inline decisions, we would be able to adjust/merge context profiles beforehand as a mitigation.

Compiler inline heuristic uses inline cost which is not available in llvm-profgen. But since inline cost is closely related to size, we could get an estimate through function size from debug info. Because the size we have in llvm-profgen is the final size, it could also be more accurate than the inline cost estimation in the compiler.

This change only has the framework, with a few TODOs left for follow up patches for a complete implementation:
  1) We need to retrieve size for funciton//inlinee from debug info for inlining estimation. Currently we use number of samples in a profile as place holder for size estimation.
  2) Currently the thresholds are using the values used by sample loader inliner. But they need to be tuned since the size here is fully optimized machine code size, instead of inline cost based on not yet fully optimized IR.

Differential Revision: https://reviews.llvm.org/D99146
2021-03-29 09:46:14 -07:00
Hans Wennborg c6e5c4654b Don't use $ as suffix for symbol names in ThinLTOBitcodeWriter and other places
Using $ breaks demangling of the symbols. For example,

$ c++filt _Z3foov\$123
_Z3foov$123

This causes problems for developers who would like to see nice stack traces
etc., but also for automatic crash tracking systems which try to organize
crashes based on the stack traces.

Instead, use the period as suffix separator, since Itanium demanglers normally
ignore such suffixes:

$ c++filt _Z3foov.123
foo() [clone .123]

This is already done in some places; try to do it everywhere.

Differential revision: https://reviews.llvm.org/D97484
2021-03-29 13:03:52 +02:00
Hongtao Yu 12ac0403b1 [CSSPGO][NFC] Fix a debug dump issue.
During context promotion, intermediate nodes that are on a call path but do not come with a profile can be promoted together with their parent nodes. Do not print sample context string for such nodes since they do not have profile.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D99441
2021-03-26 16:06:56 -07:00
Chris Lattner 62c41cfba1 Add a missing file header comment, NFC. 2021-03-26 15:34:04 -07:00
Nikita Popov 4622648a06 Revert "[ArgPromotion] Copy additional metadata for loads."
This reverts commit 166620a4f0.

A miscompile has been reported in https://reviews.llvm.org/D93927#2653480
and following.
2021-03-26 21:34:54 +01:00
Wenlei He 5f59f407f5 [CSSPGO] Minor tweak for inline candidate priority tie breaker
When prioritize call site to consider for inlining in sample loader, use number of samples as a first tier breaker before using name/guid comparison. This would favor smaller functions when hotness is the same (from the same block). We could try to retrieve accurate function size if this turns out to be more important.

Differential Revision: https://reviews.llvm.org/D99370
2021-03-25 21:15:36 -07:00
Guozhi Wei 3240910f00 [DAE] Adjust param/arg attributes when changing parameter to undef
In DeadArgumentElimination pass, if a function's argument is never used, corresponding caller's parameter can be changed to undef. If the param/arg has attribute noundef or other related attributes, LLVM LangRef(https://llvm.org/docs/LangRef.html#parameter-attributes) says its behavior is undefined. SimplifyCFG(D97244) takes advantage of this behavior and does bad transformation on valid code.

To avoid this undefined behavior when change caller's parameter to undef, this patch removes noundef attribute and other attributes imply noundef on param/arg.

Differential Revision: https://reviews.llvm.org/D98899
2021-03-25 14:53:22 -07:00
Gulfem Savrun Yeniceri 5fbe1fdf17 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 5fd001a5ff
because it broke clang-with-thin-lto-ubuntu bot.
2021-03-24 18:59:33 +00:00
Gulfem Savrun Yeniceri 5fd001a5ff [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-24 17:31:18 +00:00
Gulfem Savrun Yeniceri e3a6d70c68 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 78a65cd945 which
caused buildbot failures.
2021-03-23 00:43:16 +00:00
Gulfem Savrun Yeniceri 78a65cd945 [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-22 22:09:02 +00:00
Bradley Smith 48f5a392cb [IR] Add vscale_range IR function attribute
This attribute represents the minimum and maximum values vscale can
take. For now this attribute is not hooked up to anything during
codegen, this will be added in the future when such codegen is
considered stable.

Additionally hook up the -msve-vector-bits=<x> clang option to emit this
attribute.

Differential Revision: https://reviews.llvm.org/D98030
2021-03-22 12:05:06 +00:00
Philip Reames 5698537f81 Update basic deref API to account for possiblity of free [NFC]
This patch is plumbing to support work towards the goal outlined in the recent llvm-dev post "[llvm-dev] RFC: Decomposing deref(N) into deref(N) + nofree".

The point of this change is purely to simplify iteration on other pieces on way to making the switch. Rebuilding with a change to Value.h is slow and painful, so I want to get the API change landed. Once that's done, I plan to more closely audit each caller, add the inference rules in their own patch, then post a patch with the langref changes and test diffs. The value of the command line flag is that we can exercise the inference logic in standalone patches without needing the whole switch ready to go just yet.

Differential Revision: https://reviews.llvm.org/D98908
2021-03-19 11:17:19 -07:00
Wei Mi 14756b70ee [SampleFDO] Don't mix up the existing indirect call value profile with the new
value profile annotated after inlining.

In https://reviews.llvm.org/D96806 and https://reviews.llvm.org/D97350, we
use the magic number -1 in the value profile to avoid repeated indirect call
promotion to the same target for an indirect call. Function updateIDTMetaData
is used to mark an target as being promoted in the value profile with the
magic number. updateIDTMetaData is also used to update the value profile
when an indirect call is inlined and new inline instance profile should be
applied. For the second case, currently updateIDTMetaData mixes up the
existing value profile of the indirect call with the new profile, leading
to the problematic senario that a target count is larger than the total count
in the value profile.

The patch fixes the problem. When updateIDTMetaData is used to update the
value profile after inlining, all the values in the existing value profile
will be dropped except the values with the magic number counts.

Differential Revision: https://reviews.llvm.org/D98835
2021-03-18 09:54:34 -07:00
Mircea Trofin 4b1c8070bb [NFC][ArgumentPromotion] Clear FAM cached results of erased function.
Not doing it here can lead to subtle bugs - the analysis results are
associated by the Function object's address. Nothing stops the memory
allocator from allocating new functions at the same address.
2021-03-18 09:17:32 -07:00
Wenlei He a5d30421a6 [CSSPGO] Load context profile for external functions in PreLink and populate ThinLTO import list
For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to function's entry count metadata. This enables ThinLink to treat such cross module callee as hot in summary index, and later helps postlink to import them for profile guided cross module inlining.

For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since profile is flatterned, a few things need to happen for it to work:

 - When loading input profile in extended binary format, we need to load all child context profile whose parent is in current module, so context trie for current module includes potential cross module inlinee.
 - In order to make the above happen, we need to know whether input profile is CSSPGO profile before start reading function profile, hence a flag for profile summary section is added.
 - When searching for cross module inline candidate, we need to walk through the context trie instead of nested inlinee profile (callsite sample of AutoFDO profile).
 - Now that we have more accurate counts with CSSPGO, we swtiched to use entry count instead of total count to decided if an external callee is potentially beneficial to inline. This make it consistent with how we determine whether call tagert is potential inline candidate.

Differential Revision: https://reviews.llvm.org/D98590
2021-03-15 12:22:15 -07:00
Hongtao Yu beea06c106 [NFC][Inliner] Debugging support to print funtion size after each inlining.
Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D98439
2021-03-14 22:11:53 -07:00
Chenguang Wang 166620a4f0
[ArgPromotion] Copy additional metadata for loads.
Current ArgPromotion implementation does not copy it: https://godbolt.org/z/zzTKof

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D93927
2021-03-14 21:28:14 +00:00
Johannes Doerfert ff256c1376 [Attributor] Derive `willreturn` based on `mustprogress`
Since D86233 we have `mustprogress` which, in combination with
`readonly`, implies `willreturn`. The idea is that every side-effect
has to be modeled as a "write". Consequently, `readonly` means there
is no side-effect, and `mustprogress` guarantees that we cannot "loop"
forever without side-effect.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D94125
2021-03-11 23:31:44 -06:00
Nikita Popov 2fe85dd289 [Attributor] Don't access pointer elem type in constructPointer (NFC)
Splitting this out as the change is non-trivial: The way this code
handled pointer types doesn't really make sense, as GEPs can only
apply an offset to the outermost pointer, but can't drill down
into interior pointer types (which would require dereferencing
memory).

Instead give special treatment to the first (pointer) index.
I've hardcoded it to zero as that's the only way the function is
used right now, but handling non-zero indexes would be
straightforward.

The original goal here was to have an element type for CreateGEP.
2021-03-11 21:36:40 +01:00
Wenlei He 051f2c144e [SamplePGO] Skip inlinee profile scaling for sample loader inlining
For CGSCC inline, we need to scale down a function's branch weights and entry counts when thee it's inlined at a callsite. This is done through updateCallProfile. Additionally, we also scale the weigths for the inlined clone based on call site count in updateCallerBFI. Neither is needed for inlining during sample profile loader as it's using context profile that is separated from inlinee's own profile. This change skip the inlinee profile scaling for sample loader inlining.

Differential Revision: https://reviews.llvm.org/D98187
2021-03-11 10:18:26 -08:00
kuterd d75c9e61a5 [Attributor] Attributor call site specific AAValueConstantRange
This patch makes uses of the context bridges introduced in D83299 to make
AAValueConstantRange call site specific.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83744
2021-03-11 01:19:44 +03:00
Wei Mi ee35784a90 [SampleFDO] Support enabling -funique-internal-linkage-name.
now -funique-internal-linkage-name flag is available, and we want to flip
it on by default since it is beneficial to have separate sample profiles
for different internal symbols with the same name. As a preparation, we
want to avoid regression caused by the flip.

When we flip -funique-internal-linkage-name on, the profile is collected
from binary built without -funique-internal-linkage-name so it has no uniq
suffix, but the IR in the optimized build contains the suffix. This kind of
mismatch may introduce transient regression.

To avoid such mismatch, we introduce a NameTable section flag indicating
whether there is any name in the profile containing uniq suffix. Compiler
will decide whether to keep uniq suffix during name canonicalization
depending on the NameTable section flag. The flag is only available for
extbinary format. For other formats, by default compiler will keep uniq
suffix so they will only experience transient regression when
-funique-internal-linkage-name is just flipped.

Another type of regression is caused by places where we miss to call
getCanonicalFnName. Those places are fixed.

Differential Revision: https://reviews.llvm.org/D96932
2021-03-09 21:41:40 -08:00
Hongtao Yu 4c3d759d00 [CSSPGO] Always use callsite samples as callsite probe counts.
For CS profile, the callsite count of previously inlined callees is populated with the entry count of the callees. Therefore when trying to get a weight for calliste probe after inlinining, the callsite count should always be used. The same fix has already been made for non-probe case.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D98094
2021-03-08 22:52:36 -08:00
Fangrui Song fb2cf0dd60 [FunctionImport] Delete unneeded setLive. NFC
ValueInfo's in Worklist are guaranteed to be live.
2021-03-06 14:09:54 -08:00
Ta-Wei Tu 8a003861a3 [NPM] Add -enable-loopinterchange option to NPM
We have the `enable-loopinterchange` option in legacy pass manager but not in NPM.
Add `LoopInterchange` pass to the optimization pipeline (at the same position as before)
when `enable-loopinterchange` is turned on.

Reviewed By: aeubanks, fhahn

Differential Revision: https://reviews.llvm.org/D98116
2021-03-07 02:39:28 +08:00
William S. Moses d163e75c81 [Attributor] Enable heap-to-stack of any size
Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1

Differential Revision: https://reviews.llvm.org/D97873
2021-03-06 12:57:32 -05:00
Michael Kruse b119120673 [clang][OpenMP] Use OpenMPIRBuilder for workshare loops.
Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to:
 * Only the worksharing-loop directive.
 * Recognizes only the nowait clause.
 * No loop nests with more than one loop.
 * Untested with templates, exceptions.
 * Semantic checking left to the existing infrastructure.

This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics:
 * The distance function: How many loop iterations there will be before entering the loop nest.
 * The loop variable function: Conversion from a logical iteration number to the loop variable.

These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang.

The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does.

For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting.

Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.

Reviewed By: jdenny

Differential Revision: https://reviews.llvm.org/D94973
2021-03-04 22:52:59 -06:00
Wei Mi 2357d29335 [SampleFDO] Another fix to prevent repeated indirect call promotion in
sample loader pass.

In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9,
to prevent repeated indirect call promotion for the same indirect call
and the same target, we used zero-count value profile to indicate an
indirect call has been promoted for a certain target. We removed
PromotedInsns cache in the same patch. However, there was a problem in
that patch described below, and that problem led me to add PromotedInsns
back as a mitigation in
https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf.

When we get value profile from metadata by calling getValueProfDataFromInst,
we need to specify the maximum possible number of values we expect to read.
We uses MaxNumPromotions in the last patch so the maximum number of value
information extracted from metadata is MaxNumPromotions. If we have many
values including zero-count values when we write the metadata, some of them
will be dropped when we read them because we only read MaxNumPromotions
values. It will allow repeated indirect call promotion again. We need to
make sure if there are values indicating promoted targets, those values need
to be saved in metadata with higher priority than other values.

The patch fixed that problem. We change to use -1 to represent the count
of a promoted target instead of 0 so it is easier to sort the values.
When we prepare to update the metadata in updateIDTMetaData, we will sort
the values in the descending count order and extract only MaxNumPromotions
values to write into metadata. Since -1 is the max uint64_t number, if we
have equal to or less than MaxNumPromotions of -1 count values, they will
all be kept in metadata. If we have more than MaxNumPromotions of -1 count
values, we will only save MaxNumPromotions such values maximally. In such
case, we have logic in place in doesHistoryAllowICP to guarantee no more
promotion in sample loader pass will happen for the indirect call, because
it has been promoted enough.

With this change, now we can remove PromotedInsns without problem.

Differential Revision: https://reviews.llvm.org/D97350
2021-03-04 18:44:12 -08:00
William S. Moses 2b896e39bf Revert "[Attributor] Enable heap-to-stack of any size"
This reverts commit 51bd42ef9b.
2021-03-04 17:24:56 -05:00
William S. Moses 51bd42ef9b [Attributor] Enable heap-to-stack of any size
Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1

Differential Revision: https://reviews.llvm.org/D97873
2021-03-04 17:17:23 -05:00
Hongtao Yu 8985515822 [CSSPGO] Unblocking optimizations by dangling pseudo probes.
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.

The optimizations being unblocked are:

	1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
	2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
	3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.

Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D97481
2021-03-03 22:44:42 -08:00
Johannes Doerfert 5b70c12f3e [Attributor] Make DepClass a required argument
We often used a sub-optimal dependence class in the past because we
didn't see the argument. Let's make it explicit so we remember to think
about it.
2021-03-04 00:35:52 -06:00
Johannes Doerfert e592dad82e [Attributor] Fold "TrackDependence" into the DepClassTy enum
We don't need a bool and an enum to express the three options we
currently have. This makes the interface nicer and much easier to
use optional dependencies. Also avoids mistakes where the bool is
false and enum ignored.
2021-03-04 00:35:52 -06:00
Johannes Doerfert c8c93fdf0a [Attributor] Avoid work for GEPs and wait till the users are visited 2021-03-04 00:35:52 -06:00
Johannes Doerfert f3f88287c5 [Attributor] Use known alignment as lower bound to avoid work
If we know already more than available from a use, we don't need to
invest time on it.
2021-03-04 00:35:52 -06:00
Johannes Doerfert c14213e030 [Attributor][NFC] Move some trivial checks up 2021-03-04 00:35:52 -06:00
Johannes Doerfert 09c3eebf5f [Attributor] Use sensible initialization in AANoCaptureCallSiteReturned 2021-03-04 00:35:51 -06:00