Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320510
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320499
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320496
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320488
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320483
Summary:
Currently, in InstCombineLoadStoreAlloca, we have simplification
rules for the following cases:
1. load off a null
2. load off a GEP with null base
3. store to a null
This patch adds support for the fourth case which is store into a
GEP with null base. Since this is UB as well (and directly analogous to
the load off a GEP with null base), we can substitute the stored val
with undef in instcombine, so that SimplifyCFG can optimize this code
into unreachable code.
Note: Right now, simplifyCFG hasn't been taught about optimizing
this to unreachable and adding an llvm.trap (this is already done for
the above 3 cases).
Reviewers: majnemer, hfinkel, sanjoy, davide
Reviewed by: sanjoy, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41026
llvm-svn: 320480
The tests fail (opt asserts) on Windows.
> Summary:
> If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
> &V2)))), bitcast)`, but the load is used in other instructions, it leads
> to looping in InstCombiner. Patch adds additional check that all users
> of the load instructions are stores and then replaces all uses of load
> instruction by the new one with new type.
>
> Reviewers: RKSimon, spatel, majnemer
>
> Subscribers: llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320421
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Reviewers: RKSimon, spatel, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41072
llvm-svn: 320407
Summary:
If we have the code like this:
```
float a, b;
a = std::max(a ,b);
```
it is converted into something like this:
```
%call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float* nonnull dereferenceable(4) %a.addr, float* nonnull dereferenceable(4) %b.addr)
%1 = bitcast float* %call to i32*
%2 = load i32, i32* %1, align 4
%3 = bitcast float* %a.addr to i32*
store i32 %2, i32* %3, align 4
```
After inlinning this code is converted to the next:
```
%1 = load float, float* %a.addr
%2 = load float, float* %b.addr
%cmp.i = fcmp fast olt float %1, %2
%__b.__a.i = select i1 %cmp.i, float* %a.addr, float* %b.addr
%3 = bitcast float* %__b.__a.i to i32*
%4 = load i32, i32* %3, align 4
%5 = bitcast float* %arrayidx to i32*
store i32 %4, i32* %5, align 4
```
This pattern is not recognized as minmax pattern.
Patch solves this problem by converting sequence
```
store (bitcast, (load bitcast (select ((cmp V1, V2), &V1, &V2))))
```
to a sequence
```
store (,load (select((cmp V1, V2), &V1, &V2)))
```
After this the code is recognized as minmax pattern.
Reviewers: RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40304
llvm-svn: 320157
Summary: If the merged instruction is call instruction, we need to set the scope to the closes common scope between 2 locations, otherwise it will cause trouble when the call is getting inlined.
Reviewers: dblaikie, aprantl
Reviewed By: dblaikie, aprantl
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D37877
llvm-svn: 314694
Summary: Currently, when GVN creates a load and when InstCombine creates a new store for unreachable Load, the DebugLoc info gets lost.
Reviewers: dberlin, davide, aprantl
Reviewed By: aprantl
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34639
llvm-svn: 308404
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
llvm-svn: 307722
Previously the InstCombiner class contained a pointer to an IR builder that had been passed to the constructor. Sometimes this would be passed to helper functions as either a pointer or the pointer would be dereferenced to be passed by reference.
This patch makes it a reference everywhere including the InstCombiner class itself so there is more inconsistency. This a large, but mechanical patch. I've done very minimal formatting changes on it despite what clang-format wanted to do.
llvm-svn: 307451
Summary:
As discussed on the mailing list it is legal to propagate TBAA to loads/stores
from/to smaller regions of a larger load tagged with TBAA. Do so for
(load->extractvalue)=>(gep->load) and similar foldings.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D31954
llvm-svn: 306615
metadata out of InstCombine and into helpers.
NFC, this just exposes the logic used by InstCombine when propagating
metadata from one load instruction to another. The plan is to use this
in SROA to address PR32902.
If anyone has better ideas about how to factor this or name variables,
I'm all ears, but this seemed like a pretty good start and lets us make
progress on the PR.
This is based on a patch by Ariel Ben-Yehuda (D34285).
llvm-svn: 306267
Summary:
InstCombine replaces large allocas with small globals consts causing buffer overflows
on valid code, see PR33372.
This fix permits this optimization only if the global is dereference for alloca size.
Fixes PR33372
Reviewers: eugenis, majnemer, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34311
llvm-svn: 306194
This optimisation was crashing when there was a chain of more than one bitcast
instruction to replace, as a result of the changes in D27283.
Patch by James Price.
Differential Revision: https://reviews.llvm.org/D30347
llvm-svn: 296163
This is necessary to avoid warnings from GCC.
InstCombineLoadStoreAlloca.cpp:238:7: error: 'PointerReplacer' declared
with greater visibility than the type of its field 'PointerReplacer::IC'
llvm-svn: 294794
For function-scope variables with large initialisation list, FE usually
generates a global variable to hold the initializer, then generates
memcpy intrinsic to initialize the alloca. InstCombiner::visitAllocaInst
identifies such allocas which are accessed only by reading and replaces
them with the global variable. This is done by casting the global variable
to the type of the alloca and replacing all references.
However, when the global variable is in a different address space which
is disjoint with addr space 0 (e.g. for IR generated from OpenCL,
global variable cannot be in private addr space i.e. addr space 0), casting
the global variable to addr space 0 results in invalid IR for certain
targets (e.g. amdgpu).
To fix this issue, when the global variable is not in addr space 0,
instead of casting it to addr space 0, this patch chases down the uses
of alloca until reaching the load instructions, then replaces load from
alloca with load from the global variable. If during the chasing
bitcast and GEP are encountered, new bitcast and GEP based on the global
variable are generated and used in the load instructions.
Differential Revision: https://reviews.llvm.org/D27283
llvm-svn: 294786
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
llvm-svn: 289756
The instcombine code which folds loads and stores into their use types can trip up if the use is a bitcast to a type which we can't directly load or store in the IR. In principle, such types shouldn't exist, but in practice they do today. This is a workaround to avoid a bug while we work towards the long term goal.
Differential Revision: https://reviews.llvm.org/D24365
llvm-svn: 288415
When combining an integer load with !range metadata that does not include 0 to a pointer load, make sure emit !nonnull metadata on the newly-created pointer load. This prevents the !nonnull metadata from being dropped during a ptrtoint/inttoptr pair.
This fixes PR30597.
Patch by Ariel Ben-Yehuda!
Differential Revision: https://reviews.llvm.org/D25215
llvm-svn: 283836
Summary:
The correctness fix here is that when we CSE a load with another load,
we need to combine the metadata on the two loads. This matches the
behavior of other passes, like instcombine and GVN.
There's also a minor optimization improvement here: for load PRE, the
aliasing metadata on the inserted load should be the same as the
metadata on the original load. Not sure why the old code was throwing
it away.
Issue found by inspection.
Differential Revision: http://reviews.llvm.org/D21460
llvm-svn: 277977
Original Commit Message
Extend load/store type canonicalization to handle unordered operations
Extend the type canonicalization logic to work for unordered atomic loads and stores. Note that while this change itself is fairly simple and low risk, there's a reasonable chance this will expose problems in the backends by suddenly generating IR they wouldn't have seen before. Anything of this nature will be an existing bug in the backend (you could write an atomic float load), but this will definitely change the frequency with which such cases are encountered. If you see problems, feel free to revert this change, but please make sure you collect a test case.
Note that the concern about lowering is now much less likely. PR27490 proved that we already *were* mucking with the types of ordered atomics and volatiles. As a result, this change doesn't introduce as much new behavior as originally thought.
llvm-svn: 268809
This is required to use this function from isSafeToSpeculativelyExecute
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D16231
llvm-svn: 267692
This patch is what was the "instcombine" portion of D14185, with an additional
test added (see julia_pseudovec in test/Transforms/InstCombine/insert-val-extract-elem.ll).
The patch causes instcombine to replace sequences of extractelement-insertvalue-store
that act essentially like a bitcast followed by a store.
Differential review: http://reviews.llvm.org/D14260
llvm-svn: 267482
Extend the type canonicalization logic to work for unordered atomic loads and stores. Note that while this change itself is fairly simple and low risk, there's a reasonable chance this will expose problems in the backends by suddenly generating IR they wouldn't have seen before. Anything of this nature will be an existing bug in the backend (you could write an atomic float load), but this will definitely change the frequency with which such cases are encountered. If you see problems, feel free to revert this change, but please make sure you collect a test case.
llvm-svn: 267210
This builds on 266999 which made FindAvailableValue do the right thing. Tests included show the newly enabled transforms and those which disabled either due to conservatism or correctness requirements.
llvm-svn: 267006
I accidentally replaced `mayBeOverridden` with `!isInterposable`.
Remove the negation and add a test case that would've caught this.
Many thanks to Håkan Hjort for spotting this!
llvm-svn: 266551
Summary:
Fixes PR26774.
If you're aware of the issue, feel free to skip the "Motivation"
section and jump directly to "This patch".
Motivation:
I define "refinement" as discarding behaviors from a program that the
optimizer has license to discard. So transforming:
```
void f(unsigned x) {
unsigned t = 5 / x;
(void)t;
}
```
to
```
void f(unsigned x) { }
```
is refinement, since the behavior went from "if x == 0 then undefined
else nothing" to "nothing" (the optimizer has license to discard
undefined behavior).
Refinement is a fundamental aspect of many mid-level optimizations done
by LLVM. For instance, transforming `x == (x + 1)` to `false` also
involves refinement since the expression's value went from "if x is
`undef` then { `true` or `false` } else { `false` }" to "`false`" (by
definition, the optimizer has license to fold `undef` to any non-`undef`
value).
Unfortunately, refinement implies that the optimizer cannot assume
that the implementation of a function it can see has all of the
behavior an unoptimized or a differently optimized version of the same
function can have. This is a problem for functions with comdat
linkage, where a function can be replaced by an unoptimized or a
differently optimized version of the same source level function.
For instance, FunctionAttrs cannot assume a comdat function is
actually `readnone` even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store. As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal. The optimizer is allowed to refine
this function by first CSE'ing the two loads, and the folding the
comparision to always report that the two values are equal. Such a
refined variant will look like it is `readonly`. However, the
unoptimized version of the function can still write to memory (since
the two loads //can// result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.
Note: this is not just a problem with atomics or with linking
differently optimized object files. See PR26774 for more realistic
examples that involved neither.
This patch:
This change introduces a new set of linkage types, predicated as
`GlobalValue::mayBeDerefined` that returns true if the linkage type
allows a function to be replaced by a differently optimized variant at
link time. It then changes a set of IPO passes to bail out if they see
such a function.
Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18634
llvm-svn: 265762
Since the names are used in a loop this does more work in debug builds. In
release builds value names are generally discarded so we don't have to do
the concatenation at all. It's also simpler code, no functional change
intended.
llvm-svn: 263215
Summary: This is the last step toward supporting aggregate memory access in instcombine. This explodes stores of arrays into a serie of stores for each element, allowing them to be optimized.
Reviewers: joker.eph, reames, hfinkel, majnemer, mgrang
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17828
llvm-svn: 262530
Summary: This is another step toward improving fca support. This unpack load of array in a series of load to array's elements.
Reviewers: chandlerc, joker.eph, majnemer, reames, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15890
llvm-svn: 262521
Summary: Store and loads unpacked by instcombine do not always have the right alignement. This explicitely compute the alignement and set it.
Reviewers: dblaikie, majnemer, reames, hfinkel, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17326
llvm-svn: 261139
Original commit message:
[InstCombine] Fold IntToPtr and PtrToInt into preceding loads.
Currently we only fold a BitCast into a Load when the BitCast is its
only user.
Do the same for any no-op cast.
Patch by Philip Pfaffe!
Differential Revision: http://reviews.llvm.org/D9152
llvm-svn: 260612
When optimizing a extractvalue(load), we generate a load from the
aggregate type. This load didn't have alignment set and so would
get the alignment of the type. This breaks when the type is packed
and so the alignment should be lower.
For example, loading { int, int } would give us alignment of 4, but
the original load from this type may have an alignment of 1 if packed.
Reviewed by David Majnemer
Differential revision: http://reviews.llvm.org/D17158
llvm-svn: 260587
According to git bisect, this is the root cause of a miscompile for Regex in
libLLVMSupport. I am still working on reducing a test case.
The actual bug may be elsewhere and this commit just exposed it.
Anyway, at the moment, to reproduce, follow these steps:
1. Build clang and libLTO in release mode.
2. Create a new build directory <stage2> and cd into it.
3. Use clang and libLTO from #1 to build llvm-extract in Release mode + asserts
using -O2 -flto
4. Run llvm-extract -ralias '.*bar' -S test/Other/extract-alias.ll
Result:
program doesn't contain global named '.*bar'!
Expected result:
@a0a0bar = alias void ()* @bar
@a0bar = alias void ()* @bar
declare void @bar()
Note: In step #3, if you don't use lto or asserts, the miscompile disappears.
llvm-svn: 259674
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.
GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Reviewers: mjacob, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16275
llvm-svn: 258145
This extends the same line of reasoning used in EarlyCSE w/http://reviews.llvm.org/D15352 to the DSE implementation in InstCombine.
Key points:
* We only remove unordered or simple stores.
* The loads producing values consumed by dead stores don't influence whether the store is dead.
Differential Revision: http://reviews.llvm.org/D15354
llvm-svn: 255932
For non padded structs, we can just proceed and deaggregate them.
We don't want ot do this when there is padding in the struct as to not
lose information about this padding (the subsequents passes would then
try hard to preserve the padding, which is undesirable).
Also update extractvalue.ll and cast.ll so that they use structs with padding.
Remove the FIXME in the extractvalue of laod case as the non padded case is
handled when processing the load, and we don't want to do it on the padded
case.
Patch by: Amaury SECHET <deadalnix@gmail.com>
Differential Revision: http://reviews.llvm.org/D14483
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255600
The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.
http://reviews.llvm.org/D12992
llvm-svn: 249196
Summary: This fixes a variety of typos in docs, code and headers.
Subscribers: jholewinski, sanjoy, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D12626
llvm-svn: 247495
and make it always preserve debug locations, since all callers wanted this
behavior anyway.
This is addressing a post-commit review feedback for r245589.
NFC (inside the LLVM tree).
llvm-svn: 245622
Instruction::dropUnknownMetadata(KnownSet) is supposed to preserve all
metadata in KnownSet, but the condition for DebugLocs was inverted.
Most users of dropUnknownMetadata() actually worked around this by not
adding LLVMContext::MD_dbg to their list of KnowIDs.
This is now made explicit.
llvm-svn: 245589
Not doing this can lead to misoptimizations down the line, e.g. because
of range metadata on the replacing load excluding values that are valid
for the load that is being replaced.
llvm-svn: 241886
Currently we only fold a BitCast into a Load when the BitCast is its
only user.
Do the same for any no-op cast.
Differential Revision: http://reviews.llvm.org/D9152
llvm-svn: 238452
We already had a method to iterate over all the incoming values of a PHI. This just changes all eligible code to use it.
Ineligible code included anything which cared about the index, or was also trying to get the i'th incoming BB.
llvm-svn: 237169
Summary:
One step further getting aggregate loads and store being optimized
properly. This will only handle struct with one element at this point.
Test Plan: Added unit tests for the new supported cases.
Reviewers: chandlerc, joker-eph, joker.eph, majnemer
Reviewed By: majnemer
Subscribers: pete, llvm-commits
Differential Revision: http://reviews.llvm.org/D8339
Patch by Amaury Sechet.
From: Amaury Sechet <amaury@fb.com>
llvm-svn: 236695
CallSite roughly behaves as a common base CallInst and InvokeInst. Bring
the behavior closer to that model by making upcasts explicit. Downcasts
remain implicit and work as before.
Following dyn_cast as a mental model checking whether a Value *V isa
CallSite now looks like this:
if (auto CS = CallSite(V)) // think dyn_cast
instead of:
if (CallSite CS = V)
This is an extra token but I think it is slightly clearer. Making the
ctor explicit has the advantage of not accidentally creating nullptr
CallSites, e.g. when you pass a Value * to a function taking a CallSite
argument.
llvm-svn: 234601
This pushes the use of PointerType::getElementType up into several
callers - I'll essentially just have to keep pushing that up the stack
until I can eliminate every call to it...
llvm-svn: 233604
Summary: This is a first step toward getting proper support for aggregate loads and stores.
Test Plan: Added unittests
Reviewers: reames, chandlerc
Reviewed By: chandlerc
Subscribers: majnemer, joker.eph, chandlerc, llvm-commits
Differential Revision: http://reviews.llvm.org/D7780
Patch by Amaury Sechet
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 232284
As a follow-up to r232200, add an `-instcombine` to canonicalize scalar
allocations to `i32 1`. Since r232200, `iX 1` (for X != 32) are only
created by RAUWs, so this shouldn't fire too often. Nevertheless, it's
a cheap check and a nice cleanup.
llvm-svn: 232202
Move type promotion of the size of the array allocation to the end of
`simplifyAllocaArraySize()`. This avoids promoting the type of the
array size if it's a `ConstantInt`, since the next -instcombine
iteration will drop it to a scalar allocation anyway. Similarly, this
avoids promoting the type if it's an `UndefValue`, in which case the
alloca gets RAUW'ed.
This is NFC when considered over the lifetime of -instcombine, since
it's just reducing the number of iterations needed to reach fixed point.
llvm-svn: 232201
Write the `alloca` array size explicitly when it's non-canonical.
Previously, if the array size was `iX 1` (where X is not 32), the type
would mutate to `i32` when round-tripping through assembly.
The testcase I added fails in `verify-uselistorder` (as well as
`FileCheck`), since the use-lists for `i32 1` and `i64 1` change.
(Manman Ren came across this when running `verify-uselistorder` on some
non-trivial, optimized code as part of PR5680.)
The type mutation started with r104911, which allowed array sizes to be
something other than an `i32`. Starting with r204945, we
"canonicalized" to `i64` on 64-bit platforms -- and then on every
round-trip through assembly, mutated back to `i32`.
I bundled a fixup for `-instcombine` to avoid r204945 on scalar
allocations. (There wasn't a clean way to sequence this into two
commits, since the assembly change on its own caused testcase churn, and
the `-instcombine` change can't be tested without the assembly changes.)
An obvious alternative fix -- change `AllocaInst::AllocaInst()`,
`AsmWriter` and `LLParser` to treat `intptr_t` as the canonical type for
scalar allocations -- was rejected out of hand, since this required
teaching them each about the data layout.
A follow-up commit will add an `-instcombine` to canonicalize the scalar
allocation array size to `i32 1` rather than leaving `iX 1` alone.
rdar://problem/20075773
llvm-svn: 232200