To avoid incorrectly merging GEPs with different source types
under opaque pointers.
To avoid increasing the Expression structure size, this reuses the
existing type member. The code does not rely on this being the
expression result type; it is only used as a disambiguator.
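As a rough illustration of the disambiguation idea, the sketch below models a value-numbering expression whose equality includes a type member. The struct layout, the string stand-in for a type pointer, the opcode value and the makeGEP helper are all hypothetical simplifications, not the actual GVN data structure.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical model of a value-numbering expression.
struct Expression {
  std::uint32_t opcode;
  std::string type;                    // stands in for a type pointer
  std::vector<std::uint32_t> operands; // value numbers of the operands

  bool operator==(const Expression &other) const {
    return opcode == other.opcode && type == other.type &&
           operands == other.operands;
  }
};

// For a GEP, store the source element type in the type member so that
// GEPs over different source types never compare equal.
Expression makeGEP(const std::string &sourceElementType,
                   std::vector<std::uint32_t> operandVNs) {
  const std::uint32_t kGEPOpcode = 34; // arbitrary illustrative value
  return Expression{kGEPOpcode, sourceElementType, std::move(operandVNs)};
}
```

With this model, two GEPs with identical operand value numbers but different source element types get distinct expressions, so they are not merged.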
This patch extends the available-value logic to detect loads
of pointer-selects that can be replaced by a value select.
For example, consider the code below:
loop:
%sel.phi = phi i32* [ %start, %ph ], [ %sel, %loop ]
%l = load %ptr
%l.sel = load %sel.phi
%sel = select cond, %ptr, %sel.phi
...
exit:
%res = load %sel
use(%res)
The load of the pointer phi can be replaced by a load of the start value
outside the loop and a new phi/select chain based on the loaded values,
as illustrated below:
%l.start = load %start
loop:
%sel.phi.prom = phi i32 [ %l.start, %ph ], [ %sel.prom, %loop ]
%l = load %ptr
%sel.prom = select cond, %l, %sel.phi.prom
...
exit:
use(%sel.prom)
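In source terms, the rewrite is roughly the difference between tracking a pointer to the running minimum and tracking the minimum value itself. The following is a minimal C++ sketch with hypothetical function names, assuming a non-empty array that nothing stores to while the loop runs:

```cpp
#include <cstddef>

// Before: the loop carries a pointer to the selected element and the
// result is loaded through that pointer after the loop.
int minViaPointer(const int *data, std::size_t n) {
  const int *sel = data;                 // %start
  for (std::size_t i = 1; i < n; ++i) {
    const int *ptr = data + i;           // %ptr
    sel = (*ptr < *sel) ? ptr : sel;     // select cond, %ptr, %sel.phi
  }
  return *sel;                           // %res = load %sel
}

// After: the start value is loaded once up front and the loop carries
// the selected value instead of the selected pointer.
int minViaValue(const int *data, std::size_t n) {
  int selProm = data[0];                 // %l.start = load %start
  for (std::size_t i = 1; i < n; ++i) {
    int l = data[i];                     // %l = load %ptr
    selProm = (l < selProm) ? l : selProm; // select cond, %l, %sel.phi.prom
  }
  return selProm;                        // use(%sel.prom)
}
```

Both functions return the same result; the second form has no pointer-typed loop-carried value.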
This is a first step towards allowing vectorization of loops that use common libc++
library functions, like std::min_element (https://clang.godbolt.org/z/6czGzzqbs):
#include <vector>
#include <algorithm>
int foo(const std::vector<int> &V) {
return *std::min_element(V.begin(), V.end());
}
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D118143
Allocation functions should be marked with onlyAccessesInaccessibleMemory (when that is correct for the given function), which is checked elsewhere, so this check is no longer needed.
Differential Revision: https://reviews.llvm.org/D117180
In D115311, we're looking to modify clang to emit i constraints rather
than X constraints for callbr's indirect destinations. Prior to doing
so, update all of the existing tests in llvm/ to match.
Reviewed By: void, jyknight
Differential Revision: https://reviews.llvm.org/D115410
We currently have two similar implementations of this concept:
- isNoAliasCall() only checks for the noalias return attribute.
- isNoAliasFn() also checks for allocation functions.
We should switch to only checking the attribute. SLC is responsible
for inferring the noalias return attribute for non-new allocation
functions (with a missing case fixed in
348bc76e35).
For new, clang is responsible for setting the attribute,
if -fno-assume-sane-operator-new is not passed.
Differential Revision: https://reviews.llvm.org/D116800
This is a recurring pattern; we can consolidate three copies into one. The main motivation is to reduce usages of isMallocLike.
The original commit (which was quickly reverted) didn't account for the fact that the allocation call could be an invoke; test coverage for that case is added in this commit.
Removed it because it runs indvars after GVN and
it's obvious it's an indvars test, not GVN.
Also the removed test file includes a 'target triple' with aarch64 specified,
which is needed for indvars to eliminate the Phis produced by GVN,
and the test directory doesn't have a lit config to exclude lit from
running the test if aarch64 is not supported by a build.
To be more consistent with other pass struct names.
There are still more passes that don't end with "Pass", but these are the important ones.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D112935
This reapplies patch db289340c8.
The test failures on builds with expensive checks happened because
we sorted loop Phis in replaceCongruentIVs using llvm::sort,
which shuffles the given container when expensive checks are enabled,
so equivalent Phis in the sorted vector had a different relative order from run
to run. replaceCongruentIVs tries to replace narrow Phis with truncations
of wide ones. In some test cases there were several Phis with the same
width, so if their order differs from run to run, the narrow Phis would
be replaced with a different Phi, depending on the shuffling result.
The patch ae14fae0ff fixed this issue by
replacing llvm::sort with llvm::stable_sort.
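The effect of a stable sort on equal-width entries can be sketched with the standard library. The Phi struct, its fields and the orderByWidth helper are hypothetical stand-ins, and the widest-first comparator is an assumption made for illustration:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a loop Phi: a name and a bit width.
struct Phi {
  std::string name;
  unsigned width;
};

// Sort widest first. std::stable_sort keeps entries with equal width
// in their original relative order, so the result is deterministic.
std::vector<std::string> orderByWidth(std::vector<Phi> phis) {
  std::stable_sort(phis.begin(), phis.end(),
                   [](const Phi &a, const Phi &b) { return a.width > b.width; });
  std::vector<std::string> names;
  for (const Phi &phi : phis)
    names.push_back(phi.name);
  return names;
}
```

An unstable sort gives no such guarantee for the equal-width entries, which is why a pre-sort shuffle can change which Phi ends up first among equals.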
In IndVarSimplify after simplifying and extending loop IVs we call 'replaceCongruentIVs'.
This function optionally takes a TTI argument so that it can replace uses of narrow IVs
with truncates of the widest one.
For some reason the TTI wasn't passed to the function, so it couldn't perform this
transform.
This patch fixes that.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D113024
This is a fix for tests added by 96591a14cd.
A function called in the tests wasn't marked 'readonly', so
GVN performed PRE on the loads, although they were supposed to be non-local.
This adds 'readonly' to the called function.
When we eliminate a non-local load in a loop, we create a new PHI for the loaded value,
even though the exact same PHI may already exist in the loop.
IndVarSimplify can currently handle this case by eliminating the duplicate PHIs.
However, if the loop PHI has the type of the load and there is also
a use of a zext/sext of it, IndVarSimplify won't eliminate the duplicate PHI.
It just replaces the IV with a widened one, leaving the GVN-generated PHI as is.
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
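The backward-compatible read described above can be sketched as follows. The record struct, its field names and the log2-plus-one encoding (with 0 meaning "not set") are assumptions made for illustration; they are not the actual bitcode layout.

```cpp
#include <cstdint>

// Hypothetical view of the two alignment fields of an alloca record.
struct AllocaRecord {
  std::uint8_t oldAlignField; // legacy 5-bit field, 0 = not set
  std::uint8_t newAlignField; // wider field added later, 0 = not set
};

// Assumed encoding: each field stores log2(alignment) + 1.
// Returns the alignment in bytes, or 0 if neither field is set.
std::uint64_t decodeAlignment(const AllocaRecord &record) {
  // Prefer the old field when it has a value, otherwise use the new one.
  std::uint8_t encoded = record.oldAlignField != 0 ? record.oldAlignField
                                                   : record.newAlignField;
  if (encoded == 0)
    return 0;
  return std::uint64_t{1} << (encoded - 1);
}
```

Under these assumptions, an older record that only fills the legacy field decodes as before, while a 4GB alignment (exponent 32) only fits in the wider field.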
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
In LLVM IR, `AlignmentBitfieldElementT` is 5 bits wide.
But that means that the maximal alignment exponent is `(1<<5)-2`,
which is `30`, not `29`. And indeed, alignment of `1073741824`
roundtrips IR serialization-deserialization.
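The arithmetic can be checked with a few compile-time assertions. The helper names are hypothetical, and the reading that one encoding is reserved for "no alignment" (so the field holds log2 of the alignment plus one) is an assumption used to explain the `-2`:

```cpp
#include <cstdint>

// Assumption: the field stores log2(alignment) + 1, with 0 reserved for
// "no alignment", so the largest exponent is (1 << bits) - 2.
constexpr unsigned maxAlignExponent(unsigned fieldBits) {
  return (1u << fieldBits) - 2;
}

constexpr std::uint64_t maxAlignment(unsigned fieldBits) {
  return std::uint64_t{1} << maxAlignExponent(fieldBits);
}

static_assert(maxAlignExponent(5) == 30, "5-bit field allows exponent 30");
static_assert(maxAlignment(5) == 1073741824ull, "2^30 bytes, i.e. 1GiB");
static_assert(maxAlignment(5) == 2 * (std::uint64_t{1} << 29),
              "twice the previous 512MiB limit");
```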
While this doesn't seem all that important, this doubles
the maximal supported alignment from 512MiB to 1GiB,
and there's actually one noticeable use-case for that;
On X86, the huge pages can have sizes of 2MiB and 1GiB (!).
So while this doesn't add support for truly huge alignments,
which I think we can easily-ish do if wanted, I think this adds
zero-cost support for a case that is not trivially dismissible.
I don't believe we need any upgrade infrastructure,
and since we don't explicitly record the IR version,
we don't need to bump one either.
As @craig.topper speculates in D108661#2963519,
this might be an artificial limit imposed by the original implementation
of the `getAlignment()` functions.
Differential Revision: https://reviews.llvm.org/D108661
This test was not modifying the pointer in the loop, so the loads
just ended up as undef, without relation to loop load PRE.
Pass the alloca to the called function, so the memory is
potentially modified.
There's a potential change in dereferenceability attribute semantics in the nearish future. See llvm-dev thread "RFC: Decomposing deref(N) into deref(N) + nofree" and D99100 for context.
This change simply adds appropriate attributes to tests to keep transform logic exercised under both old and new/proposed semantics. Note that for many of these cases, O3 would infer exactly these attributes on the test IR.
This change handles the idiomatic pattern of a dereferenceable object being passed to a call which cannot free that memory. There are a couple of other tests which need more one-off attention; they'll be handled in another change.
This patch updates InstCombine to use a poison constant to represent the resulting value of (either semantically or syntactically) unreachable instructions, or a don't-care value of an unreachable store instruction.
This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll .
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104602
This change tries to handle multiple dominating users of the pointer operand
by choosing the most immediately dominating one, if possible. While making
this change I also found that the previous implementation had a missing break
statement, making all loads with an odd number of dominating users emit an
OtherAccess value, so that has also been fixed.
Patch by Henrik G Olsson!
Differential Revision: https://reviews.llvm.org/D79097
This patch allows PRE of the following type of loads:
```
preheader:
br label %loop
loop:
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
br label %merge
merge:
...
br i1 ..., label %loop, label %exit
```
Into
```
preheader:
%x0 = load %p
br label %loop
loop:
%x.pre = phi(%x0, %x2)
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
%x1 = load %p
br label %merge
merge:
%x2 = phi(%x.pre, %x1)
...
br i1 ..., label %loop, label %exit
```
So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern this tries to address is a hot loop with all code inlined and
provably free of side effects, plus some side-effecting calls on a cold path.
The worst-case overhead is that, if we always take the clobber block, we make one more load
overall (in the preheader). This only matters if the loop has very few iterations. If the clobber block is skipped
at least once, the transform is neutral or profitable.
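The load counts discussed above can be checked with a small simulation. This is only a sketch: the Counter struct, the load helper and both loop functions are hypothetical, the clobbering call is modeled as an increment of `*p`, and the original load is assumed to sit after the merge point.

```cpp
#include <vector>

struct Counter {
  int loads = 0;
};

// Counts every load from p.
static int load(const int *p, Counter &counter) {
  ++counter.loads;
  return *p;
}

// Before PRE: load from p on every iteration.
int runBefore(int *p, const std::vector<bool> &takeClobber, Counter &counter) {
  int sum = 0;
  for (bool clobber : takeClobber) {
    if (clobber)
      *p += 1;               // stands in for call foo(), which clobbers %p
    sum += load(p, counter); // load after the merge point
  }
  return sum;
}

// After PRE: load once in the preheader, reload only after a clobber.
int runAfter(int *p, const std::vector<bool> &takeClobber, Counter &counter) {
  int sum = 0;
  int x = load(p, counter);  // %x0 in the preheader
  for (bool clobber : takeClobber) {
    if (clobber) {
      *p += 1;               // clobber
      x = load(p, counter);  // %x1
    }
    sum += x;                // %x2 = phi(%x.pre, %x1)
  }
  return sum;
}
```

With one clobber in four iterations the second form performs two loads instead of four; if every iteration clobbers, it performs exactly one extra load, matching the worst case described above.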
This opens up several prospects for improvement:
- We can sometimes be smarter in loop-exiting blocks by splitting critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
we don't know if their sum is colder than the header.
Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames