Commit Graph

240 Commits

Author SHA1 Message Date
Arthur Eubanks 55cf09ae26 [ValueTracking] Simplify llvm::isPointerOffset()
We still need the code after stripAndAccumulateConstantOffsets() since
it doesn't handle GEPs of scalable types and non-constant but identical
indexes.

Differential Revision: https://reviews.llvm.org/D120523
2022-03-14 09:32:36 -07:00
Florian Hahn 7662d1687b
[MemCpyOpt] Check all access for MemoryUses in writtenBetween.
Currently writtenBetween can miss clobbers of Loc between End and Start,
if End is a MemoryUse.

To guarantee we see all write clobbers of Loc between Start and End
for MemoryUses, restrict to Start and End being in the same block
and check all accesses between them.

This fixes 2 mis-compiles illustrated in
llvm/test/Transforms/MemCpyOpt/memcpy-byval-forwarding-clobbers.ll

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D119929
2022-02-21 16:54:30 +00:00
Florian Hahn 3ba42a564a
[MemCpyOpt] Add non-local memcpy test with memory phi. 2022-02-18 11:59:24 +00:00
Nikita Popov 97c151de3d [MemCpyOpt] Fix broken check lines (NFC)
These are leftovers from when there were separate MSSA/non-MSSA
check lines.
2022-02-16 14:39:43 +01:00
Florian Hahn 85fd97e3b9
[MemCpyOpt] Add tests with incorrect memcpy->byval forwarding.
Add a few test cases with clobbers (writes and lifetime.end) that should
prevent memcpy->byval forwarding, but those clobbers are missed at the
moment.
2022-02-16 11:35:14 +00:00
Arthur Eubanks a9029a33ff [OpaquePtr][ValueTracking] Check GEP source element type in isPointerOffset()
Fixes a MemCpyOpt miscompile with opaque pointers.
This function can be further cleaned up, but let's just fix the miscompile first.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D119652
2022-02-13 10:35:38 -08:00
Arthur Eubanks d050010ea2 [test][MemCpyOpt] Rename test function 2022-02-12 17:44:51 -08:00
Arthur Eubanks 12ba0659b4 [test][MemCpyOpt] Precommit test 2022-02-12 17:38:45 -08:00
Nikita Popov 6b69985da4 [MemCpyOpt] Use helper for unwind check
This extends support to byval arguments. It would be further
extended to handle the case of non-captured noalias returns.
2022-01-26 12:43:31 +01:00
Nikita Popov ed4efee2a3 [MemCpyOpt] Add additiona call slot unwind tests (NFC)
Test a possibly unwinding call with a byval and sret argument.
2022-01-26 11:49:24 +01:00
Nikita Popov 0d20407d1a Reapply [MemCpyOpt] Look through pointer casts when checking capture
This is a recommit of the patch without changes. The reason for
the revert has been addressed in D117679.

-----

The user scanning loop above looks through pointer casts, so we
also need to strip pointer casts in the capture check. Previously
the source was incorrectly considered not captured if a bitcast
was passed to the call.
2022-01-20 09:30:21 +01:00
Nikita Popov 655a7024db Reapply [MemCpyOpt] Make capture check during call slot optimization more precise
This is a recommit of the patch without changes. The reason for
the revert has been addressed in D117679.

-----

Call slot optimization is currently supposed to be prevented if
the call can capture the source pointer. Due to an implementation
bug, this check currently doesn't trigger if a bitcast of the source
pointer is passed instead. I'm somewhat afraid of the fallout of
fixing this bug (due to heavy reliance on call slot optimization
in rust), so I'd like to strengthen the capture reasoning a bit first.

In particular, I believe that the capture is fine as long as a)
the call itself cannot depend on the pointer identity, because
neither dest has been captured before/at nor src before the
call and b) there is no potential use of the captured pointer
before the lifetime of the source alloca ends, either due to
lifetime.end or a return from a function. At that point the
potentially captured pointer becomes dangling.

Differential Revision: https://reviews.llvm.org/D115615
2022-01-20 09:30:20 +01:00
Nikita Popov d7bff2e9d2 [MemCpyOpt] Fix metadata merging during call slot optimization
Call slot optimization currently merges the metadata between the
call and the load. However, we also need to merge in the metadata
of the store.

Part of the reason why we might have gotten away with this
previously is that usually the load and the store are the same
instruction (a memcpy), this can only happen if call slot
optimization occurs on an actual load/store pair.

This addresses the issue reported in
https://reviews.llvm.org/D115615#3251386.

Differential Revision: https://reviews.llvm.org/D117679
2022-01-20 09:25:13 +01:00
Nikita Popov 0db30adcfb [MemCpyOpt] Test invalid noalias metadata after call slot opt (NFC) 2022-01-19 15:51:10 +01:00
Hans Wennborg 53a51acc36 Revert "[MemCpyOpt] Make capture check during call slot optimization more precise"
This casued a miscompile due to call slot optimization replacing a call
argument without considering the call's !noalias metadata, see discussion on
the code review.

> Call slot optimization is currently supposed to be prevented if
> the call can capture the source pointer. Due to an implementation
> bug, this check currently doesn't trigger if a bitcast of the source
> pointer is passed instead. I'm somewhat afraid of the fallout of
> fixing this bug (due to heavy reliance on call slot optimization
> in rust), so I'd like to strengthen the capture reasoning a bit first.
>
> In particular, I believe that the capture is fine as long as a)
> the call itself cannot depend on the pointer identity, because
> neither dest has been captured before/at nor src before the
> call and b) there is no potential use of the captured pointer
> before the lifetime of the source alloca ends, either due to
> lifetime.end or a return from a function. At that point the
> potentially captured pointer becomes dangling.
>
> Differential Revision: https://reviews.llvm.org/D115615

Also reverting the dependent commit:

> [MemCpyOpt] Look through pointer casts when checking capture
>
> The user scanning loop above looks through pointer casts, so we
> also need to strip pointer casts in the capture check. Previously
> the source was incorrectly considered not captured if a bitcast
> was passed to the call.

This reverts commit 487a34ed9d
and 00e6869463.
2022-01-18 17:41:49 +01:00
Nikita Popov 00e6869463 [MemCpyOpt] Look through pointer casts when checking capture
The user scanning loop above looks through pointer casts, so we
also need to strip pointer casts in the capture check. Previously
the source was incorrectly considered not captured if a bitcast
was passed to the call.
2022-01-05 09:50:33 +01:00
Nikita Popov 487a34ed9d [MemCpyOpt] Make capture check during call slot optimization more precise
Call slot optimization is currently supposed to be prevented if
the call can capture the source pointer. Due to an implementation
bug, this check currently doesn't trigger if a bitcast of the source
pointer is passed instead. I'm somewhat afraid of the fallout of
fixing this bug (due to heavy reliance on call slot optimization
in rust), so I'd like to strengthen the capture reasoning a bit first.

In particular, I believe that the capture is fine as long as a)
the call itself cannot depend on the pointer identity, because
neither dest has been captured before/at nor src before the
call and b) there is no potential use of the captured pointer
before the lifetime of the source alloca ends, either due to
lifetime.end or a return from a function. At that point the
potentially captured pointer becomes dangling.

Differential Revision: https://reviews.llvm.org/D115615
2022-01-05 09:39:25 +01:00
Nikita Popov c2e77c9122 [MemCpyOpt] Add additional call slot capture tests (NFC) 2022-01-05 09:33:04 +01:00
Nikita Popov 396370e889 [MemCpyOpt] Add additional call slot capture tests (NFC)
One test shows a miscompile when bitcasts are involved, the others
cases where we can perform the optimization despite a capture.
2021-12-13 10:57:06 +01:00
Nikita Popov 90ec6dff86 [OpaquePtr] Forbid mixing typed and opaque pointers
Currently, opaque pointers are supported in two forms: The
-force-opaque-pointers mode, where all pointers are opaque and
typed pointers do not exist. And as a simple ptr type that can
coexist with typed pointers.

This patch removes support for the mixed mode. You either get
typed pointers, or you get opaque pointers, but not both. In the
(current) default mode, using ptr is forbidden. In -opaque-pointers
mode, all pointers are opaque.

The motivation here is that the mixed mode introduces additional
issues that don't exist in fully opaque mode. D105155 is an example
of a design problem. Looking at D109259, it would probably need
additional work to support mixed mode (e.g. to generate GEPs for
typed base but opaque result). Mixed mode will also end up
inserting many casts between i8* and ptr, which would require
significant additional work to consistently avoid.

I don't think the mixed mode is particularly valuable, as it
doesn't align with our end goal. The only thing I've found it to
be moderately useful for is adding some opaque pointer tests in
between typed pointer tests, but I think we can live without that.

Differential Revision: https://reviews.llvm.org/D109290
2021-09-10 15:18:23 +02:00
Fraser Cormack 7fb66d4035 [MemCpyOpt] Fix a variety of scalable-type crashes
This patch fixes a variety of crashes resulting from the `MemCpyOptPass`
casting `TypeSize` to a constant integer, whether implicitly or
explicitly.

Since the `MemsetRanges` requires a constant size to work, all but one
of the fixes in this patch simply involve skipping the various
optimizations for scalable types as cleanly as possible.

The optimization of `byval` parameters, however, has been updated to
work on scalable types in theory. In practice, this optimization is only
valid when the length of the `memcpy` is known to be larger than the
scalable type size, which is currently never the case. This could
perhaps be done in the future using the `vscale_range` attribute.

Some implicit casts have been left as they were, under the knowledge
they are only called on aggregate types. These should never be
scalably-sized.

Reviewed By: nikic, tra

Differential Revision: https://reviews.llvm.org/D109329
2021-09-08 11:21:36 +01:00
Nikita Popov 88003cea1c [MemCpyOpt] Remove MemDepAnalysis-based implementation
The MemorySSA-based implementation has been enabled for a few months
(since D94376). This patch drops the old MDA-based implementation
entirely.

I've kept this to only the basic cleanup of dropping various
conditions -- the code could be further cleaned up now that there
is only one implementation.

Differential Revision: https://reviews.llvm.org/D102113
2021-08-07 22:35:44 +02:00
Artem Belevich 6a9cf21f5a [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA.
Attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact that
there are users that do not expect LLVM to materialize `memset` intrinsic.

While other passes can do that, too, MemCpyOpt triggers it more frequently and
breaks sanitizers and some downstream users.

For now introduce a flag to force-enable the flag and opt-in only CUDA
compilation with NVPTX back-end.

Differential Revision: https://reviews.llvm.org/D106401
2021-08-06 11:13:52 -07:00
Michael Liao d1cacd5928 [MemCpyOpt] Teach memcpyopt to handle loads from the constant memory.
- Loads from the constant memory (either explicit one or as the source
  of memory transfer intrinsics) won't alias any stores.

Reviewed By: asbirlea, efriedma

Differential Revision: https://reviews.llvm.org/D107605
2021-08-06 12:43:52 -04:00
Nikita Popov bb15861e14 [MemCpyOpt] Relax libcall checks
Rather than blocking the whole MemCpyOpt pass if the libcalls are
not available, only disable creation of new memset/memcpy intrinsics
where only load/stores were used previously. This only affects the
store merging and load-store conversion optimization. Other
optimizations are derived from existing intrinsics, which are
well-defined in the absence of libcalls -- not having the libcalls
just means that call simplification won't convert them to intrinsics.

This is a weaker variation of D104801, which dropped these checks
entirely. Ideally we would not couple emission of intrinsics to
libcall availability at all, but as the intrinsics may be legalized
to libcalls we need to be a bit careful right now.

Differential Revision: https://reviews.llvm.org/D106769
2021-08-04 21:17:51 +02:00
Philip Reames e75a2dfe20 [tests] Stablize tests for possible change in deref semantics
There's a potential change in dereferenceability attribute semantics in the nearish future.  See llvm-dev thread "RFC: Decomposing deref(N) into deref(N) + nofree" and D99100 for context.

This change simply adds appropriate attributes to tests to keep transform logic exercised under both old and new/proposed semantics.  Note that for many of these cases, O3 would infer exactly these attributes on the test IR.

This change handles the idiomatic pattern of a dereferenceable object being passed to a call which can not free that memory.  There's a couple other tests which need more one-off attention, they'll be handled in another change.
2021-07-14 13:05:43 -07:00
Jon Roelofs 37b6e03c18 [Intrinsics] Make MemCpyInlineInst a MemCpyInst
This opens up more optimization opportunities in passes that already handle MemCpyInst's.

Differential revision: https://reviews.llvm.org/D105247
2021-07-02 10:25:24 -07:00
Nikita Popov 9aa951e80e [MemCpyOpt] Preserve address space
Preserve address space when generating the cast to i8*.
2021-06-27 20:21:19 +02:00
Nikita Popov f025053977 [MemCpyOpt] Handle unusual memcpy element type
Apparently, it is legal to use memcpy/memset with pointer types
other than i8*. Prior to 81fcdae68c
this case was silently miscompiled, as the i8 offset calculation
was performed on some other type. Now it would crash due to a
type mismatch. Fix this by inserting an explicit bitcast to i8*.
2021-06-27 16:21:44 +02:00
Nikita Popov 81fcdae68c [MemCpyOpt] Support opaque pointers 2021-06-27 15:52:38 +02:00
Nicolai Hähnle a888e492f6 [IR] Memory intrinsics are not unconditionally `nosync`
Remove the `nosync` attribute from the memory intrinsic definitions
(i.e. memset, memcpy, memmove).

Like native memory accesses, memory intrinsics can be volatile. This is
indicated by an immarg in the intrinsic call. All else equal, a volatile
memory intrinsic is `sync`, so we cannot annotate the intrinsic functions
themselves as `nosync`. The attributor and function-attr passes know to
take the volatile bit into account.

Since `nosync` is a default attribute, this means we have to stop using
the DefaultAttrIntrinsic tablegen class for memory intrinsics, and
specify all default attributes other than `nosync` explicitly.

Most of the test changes are trivial churn, but one test case
(in nosync.ll) was in fact incorrect before this change.

Differential Revision: https://reviews.llvm.org/D102295
2021-05-21 03:40:59 +02:00
Nikita Popov 656296b1c2 Reapply [CaptureTracking] Do not check domination
Reapply after adjusting the synchronized.m test case, where the
TODO is now resolved. The pointer is only captured on the exception
handling path.

-----

For the CapturesBefore tracker, it is sufficient to check that
I can not reach BeforeHere. This does not necessarily require
that BeforeHere dominates I, it can also occur if the capture
happens on an entirely disjoint path.

This change was previously accepted in D90688, but had to be
reverted due to large compile-time impact in some cases: It
increases the number of reachability queries that are performed.

After recent changes, the compile-time impact is largely mitigated,
so I'm reapplying this patch. The remaining compile-time impact
is largely proportional to changes in code-size.
2021-05-16 15:46:31 +02:00
Nikita Popov 541c2845de Revert "[CaptureTracking] Do not check domination"
This reverts commit 6b8b43e7af.

This causes clang test to fail (CodeGenObjC/synchronized.m).
Revert until I can figure out whether that's an expected change.
2021-05-16 11:04:45 +02:00
Nikita Popov 6b8b43e7af [CaptureTracking] Do not check domination
For the CapturesBefore tracker, it is sufficient to check that
I can not reach BeforeHere. This does not necessarily require
that BeforeHere dominates I, it can also occur if the capture
happens on an entirely disjoint path.

This change was previously accepted in D90688, but had to be
reverted due to large compile-time impact in some cases: It
increases the number of reachability queries that are performed.

After recent changes, the compile-time impact is largely mitigated,
so I'm reapplying this patch. The remaining compile-time impact
is largely proportional to changes in code-size.
2021-05-16 10:49:36 +02:00
Nikita Popov aaf5fd4316 [MemCpyOpt] Add test for unreachable capture (NFC)
This is based on the test from D90688, without the argmemonly
attribute. The argmemonly attribute would guaranteed no modref
by itself and the question of captures would not arise in the
first place.
2021-05-16 10:48:52 +02:00
Olle Fredriksson f5446b769a [MemCpyOpt] Allow variable lengths in memcpy optimizer
This makes the memcpy-memcpy and memcpy-memset optimizations work for
variable sizes as long as they are equal, relaxing the old restriction
that they are constant integers. If they're not equal, the old
requirement that they are constant integers with certain size
restrictions is used.

The implementation works by pushing the length tests further down in the
code, which reveals some places where it's enough that the lengths are
equal (but not necessarily constant).

Differential Revision: https://reviews.llvm.org/D100870
2021-04-21 23:23:38 +02:00
Philip Reames 854de7c4d0 [tests] Refresh a bunch of autogen test to adjust for format changes 2021-03-22 10:41:39 -07:00
Nikita Popov 5556660971 [MemCpyOpt] Handle read from lifetime.start with offset
This fixes a regression from the MemDep-based implementation:
MemDep completely ignores lifetime.start intrinsics that aren't
MustAlias -- this is probably unsound, but it does mean that the
MemDep based implementation successfully eliminated memcpy's from
lifetime.start if the memcpy happens at an offset, rather than
the base address of the alloca.

Add a special case for the case where the lifetime.start spans the
whole alloca (which is pretty much the only kind of lifetime.start
that frontends ever emit), as we don't need to figure out our exact
aliasing relationship in that case, the whole alloca is dead prior
to the call.

If this doesn't cover all practically relevant cases, then it
would be possible to make use of the recently added PartialAlias
clobber offsets to make this more precise.
2021-03-13 20:38:09 +01:00
Nikita Popov a10bf5572d [MemCpyOpt] Add additional tests for memcpy of undef (NFC) 2021-03-13 20:38:09 +01:00
Nikita Popov 2902bdeea1 [MemCpyOpt] Use AA to check for MustAlias between memset and memcpy
Rather than checking for simple equality, check for MustAlias, as
we do in other transforms. This catches equivalent GEPs.
2021-03-13 11:41:15 +01:00
Nikita Popov 9080444f33 [MemCpyOpt] Don't generate zero-size memset
If a memset destination is overwritten by a memcpy and the sizes
are exactly the same, then the memset is simply dead. We can
directly drop it, instead of replacing it with a memset of zero
size, which is particularly ugly for the case of a dynamic size.
2021-03-13 11:41:15 +01:00
Nikita Popov dabd6abbcd [MemCpyOpt] Add additional tests for memset+memcpy overwrite (NFC) 2021-03-13 11:32:24 +01:00
Nikita Popov b2f933a6ce [MemorySSA] Don't bail on phi starting access
When calling getClobberingMemoryAccess() with MemoryLocation on a
MemoryPHI starting access, the walker currently immediately bails
and returns the starting access. This makes sense for the API that
does not accept a location (as we wouldn't know what clobber we
should be checking for), but doesn't make sense for the
MemoryLocation-based API. This means that it can't look through
a MemoryPHI if it's the starting access, but can if there is one
more non-clobbering def in between. This patch removes the limitation.

Differential Revision: https://reviews.llvm.org/D98557
2021-03-13 10:53:13 +01:00
Nikita Popov dfd27ebbd0 [MemCpyOpt] Add test for memcpy in loop (NFC)
This is currently not being optimized.
2021-03-12 22:54:24 +01:00
Nikita Popov 4125afc357 [MemCpyOpt] Fix handling of readnone byval arguments
If the call is readnone, then there may not be any MemoryAccess
associated with the call. Bail out in that case.

This fixes the issue reported at
https://reviews.llvm.org/D94376#2578312.
2021-02-22 18:48:31 +01:00
Dávid Bolvanský cd54c57919 Reland "[Libcalls, Attrs] Annotate libcalls with noundef"
Fixed Clang tests.
2021-02-20 06:18:48 +01:00
Dávid Bolvanský 94d034fb86 Revert "[Libcalls, Attrs] Annotate libcalls with noundef"
This reverts commit 33b0c63775. Bots are failing. Some Clang tests need to be updated too.
2021-02-20 04:18:42 +01:00
Dávid Bolvanský 33b0c63775 [Libcalls, Attrs] Annotate libcalls with noundef
I think we can use here same logic as for nonnull.

strlen(X) - X must be noundef => valid pointer.

for libcalls with size arg, we add noundef only if size is known and greater than 0 - so pointers must be noundef (valid ones)

Reviewed By: jdoerfert, aqjune

Differential Revision: https://reviews.llvm.org/D95122
2021-02-20 04:10:07 +01:00
Nikita Popov be9889b350 [MemorySSA] Don't treat lifetime.end as NoAlias
MemorySSA currently treats lifetime.end intrinsics as not aliasing
anything. This breaks MemorySSA-based MemCpyOpt, because we'll happily
move a read of a pointer below a lifetime.end intrinsic, as no clobber
is reported.

I think the MemorySSA modelling here isn't correct: lifetime.end(p)
has approximately the same effect as doing a memcpy(p, undef), and
should be treated as a clobber.

This patch removes the special handling of lifetime.end, leaving
alias analysis to handle it appropriately.

Differential Revision: https://reviews.llvm.org/D95763
2021-02-04 20:58:28 +01:00
Nikita Popov 6e52eebc2a [MemCpyOpt] Add test for incorrect optimization across lifetime (NFC)
This only affects the MemorySSA-based implementation.
2021-01-29 12:57:02 +01:00