The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267231
Summary:
We can fold compares to false when two distinct allocations within a
function are compared for equality.
Patch by Anna Thomas!
Reviewers: majnemer, reames, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19390
llvm-svn: 267214
Extend the type canonicalization logic to work for unordered atomic loads and stores. Note that while this change itself is fairly simple and low risk, there's a reasonable chance this will expose problems in the backends by suddenly generating IR they wouldn't have seen before. Anything of this nature will be an existing bug in the backend (you could write an atomic float load), but this will definitely change the frequency with which such cases are encountered. If you see problems, feel free to revert this change, but please make sure you collect a test case.
llvm-svn: 267210
Summary:
When optimizing PHIs which have inputs floating point binary
operators, we preserve all IR flags except the fast math
flags.
This change removes the logic which tracked some of the IR flags
(no wrap, exact) and replaces it by doing an and on the IR flags of
all inputs to the PHI - which will also handle the fast math
flags.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19370
llvm-svn: 267139
Summary:
If we know that the pointer allocated within a function does not escape,
we can fold away comparisons that are done with global pointers
Patch by Anna Thomas!
Reviewers: reames, majnemer, sanjoy
Subscribers: mgrang, mcrosier, majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D19276
llvm-svn: 267035
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.
The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267022
This builds on 266999 which made FindAvailableValue do the right thing. Tests included show the newly enabled transforms and those which disabled either due to conservatism or correctness requirements.
llvm-svn: 267006
The functionality contained within getIntrinsicIDForCall is two-fold: it
checks if a CallInst's callee is a vectorizable intrinsic. If it isn't
an intrinsic, it attempts to map the call's target to a suitable
intrinsic.
Move the mapping functionality into getIntrinsicForCallSite and rename
getIntrinsicIDForCall to getVectorIntrinsicIDForCall while
reimplementing it in terms of getIntrinsicForCallSite.
llvm-svn: 266801
Removed some unused headers, replaced some headers with forward class declarations.
Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
Patch by Eugene Kosov <claprix@yandex.ru>
Differential Revision: http://reviews.llvm.org/D19219
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
I accidentally replaced `mayBeOverridden` with `!isInterposable`.
Remove the negation and add a test case that would've caught this.
Many thanks to Håkan Hjort for spotting this!
llvm-svn: 266551
InstCombine wants to optimize compares of calls to fabs with zero.
However, we didn't have the necessary legality checking to verify that
the function call had the same behavior as fabs.
llvm-svn: 266452
https://llvm.org/bugs/show_bug.cgi?id=27105
We can check if all bits outside of a constant mask are set with a
single constant.
As noted in the bug report, although this form should be considered the
canonical IR, backends may want to transform this into an 'andn' / 'andc'
comparison against zero because that could be a single machine instruction.
Differential Revision: http://reviews.llvm.org/D18842
llvm-svn: 266362
Remove an ad-hoc transform in InstCombine and replace it with more
general machinery (ValueTracking, InstructionSimplify and VectorUtils).
This fixes PR27332.
llvm-svn: 266175
This bug was introduced with:
http://reviews.llvm.org/rL262269
AVX masked loads are specified to set vector lanes to zero when the high bit of the mask
element for that lane is zero:
"If the mask is 0, the corresponding data element is set to zero in the load form of these
instructions, and unmodified in the store form." --Intel manual
Differential Revision: http://reviews.llvm.org/D19017
llvm-svn: 266148
`allocsize` is a function attribute that allows users to request that
LLVM treat arbitrary functions as allocation functions.
This patch makes LLVM accept the `allocsize` attribute, and makes
`@llvm.objectsize` recognize said attribute.
The review for this was split into two patches for ease of reviewing:
D18974 and D14933. As promised on the revisions, I'm landing both
patches as a single commit.
Differential Revision: http://reviews.llvm.org/D14933
llvm-svn: 266032
We need just a couple of logic tweaks to consolidate the shl and lshr cases.
This is step 5 of refactoring to solve PR26760:
https://llvm.org/bugs/show_bug.cgi?id=26760
llvm-svn: 265965
This is the straightforward fix for PR26760:
https://llvm.org/bugs/show_bug.cgi?id=26760
But we still need to make some changes to generalize this helper function
and then send the lshr case into here.
llvm-svn: 265960
We had a select of a cast of a select but attempted to replace the outer
select with the inner select dispite their incompatible types.
Patch by Anton Korobeynikov!
This fixes PR27236.
llvm-svn: 265805
Two or more identical assumes are occasionally next to each other in a
basic block.
While our generic machinery will turn a redundant assume into a no-op,
it is not super cheap.
We can perform a simpler check to achieve the same result for this case.
llvm-svn: 265801
Summary:
Fixes PR26774.
If you're aware of the issue, feel free to skip the "Motivation"
section and jump directly to "This patch".
Motivation:
I define "refinement" as discarding behaviors from a program that the
optimizer has license to discard. So transforming:
```
void f(unsigned x) {
unsigned t = 5 / x;
(void)t;
}
```
to
```
void f(unsigned x) { }
```
is refinement, since the behavior went from "if x == 0 then undefined
else nothing" to "nothing" (the optimizer has license to discard
undefined behavior).
Refinement is a fundamental aspect of many mid-level optimizations done
by LLVM. For instance, transforming `x == (x + 1)` to `false` also
involves refinement since the expression's value went from "if x is
`undef` then { `true` or `false` } else { `false` }" to "`false`" (by
definition, the optimizer has license to fold `undef` to any non-`undef`
value).
Unfortunately, refinement implies that the optimizer cannot assume
that the implementation of a function it can see has all of the
behavior an unoptimized or a differently optimized version of the same
function can have. This is a problem for functions with comdat
linkage, where a function can be replaced by an unoptimized or a
differently optimized version of the same source level function.
For instance, FunctionAttrs cannot assume a comdat function is
actually `readnone` even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store. As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal. The optimizer is allowed to refine
this function by first CSE'ing the two loads, and the folding the
comparision to always report that the two values are equal. Such a
refined variant will look like it is `readonly`. However, the
unoptimized version of the function can still write to memory (since
the two loads //can// result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.
Note: this is not just a problem with atomics or with linking
differently optimized object files. See PR26774 for more realistic
examples that involved neither.
This patch:
This change introduces a new set of linkage types, predicated as
`GlobalValue::mayBeDerefined` that returns true if the linkage type
allows a function to be replaced by a differently optimized variant at
link time. It then changes a set of IPO passes to bail out if they see
such a function.
Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18634
llvm-svn: 265762
Summary:
Previously we had a notion of convergent functions but not of convergent
calls. This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.
Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent. As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.
Originally landed as r261544, then reverted in r261544 for (incidental)
build breakage. Re-landed here with no changes.
Reviewers: chandlerc, jingyue
Subscribers: llvm-commits, tra, jhen, hfinkel
Differential Revision: http://reviews.llvm.org/D17739
llvm-svn: 263481
This follows up on the related AVX instruction transforms, but this
one is too strange to do anything more with. Intel's behavioral
description of this instruction in its Software Developer's Manual
is tragi-comic.
llvm-svn: 263340
commit ae14bf6488e8441f0f6d74f00455555f6f3943ac
Author: Mehdi Amini <mehdi.amini@apple.com>
Date: Fri Mar 11 17:15:50 2016 +0000
Remove PreserveNames template parameter from IRBuilder
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18023
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258
91177308-0d34-0410-b5e6-96231b3b80d8
until we can figure out what to do about clang and Release build testing.
This reverts commit 263258.
llvm-svn: 263321
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18023
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263258
This was originally a pointer to support pass managers which didn't use
AnalysisManagers. However, that doesn't realistically come up much and
the complexity of supporting it doesn't really make sense.
In fact, *many* parts of the pass manager were just assuming the pointer
was never null already. This at least makes it much more explicit and
clear.
llvm-svn: 263219
Since the names are used in a loop this does more work in debug builds. In
release builds value names are generally discarded so we don't have to do
the concatenation at all. It's also simpler code, no functional change
intended.
llvm-svn: 263215
When checking whether an smin is positive, we can move the comparison to one of the inputs if the other is known positive. If the known positive one is the min, then the other can't be negative. If the other is the min, then we compute the min.
Differential Revision: http://reviews.llvm.org/D17873
llvm-svn: 263059
As part of r251146 InstCombine was extended to call computeKnownBits on
every value in the function to determine whether it happens to be
constant. This increases typical compiletime by 1-3% (5% in irgen+opt
time) in my measurements. On the other hand this case did not trigger
once in the whole llvm-testsuite.
This patch introduces the notion of ExpensiveCombines which are only
enabled for OptLevel > 2. I removed the check in InstructionSimplify as
that is called from various places where the OptLevel is not known but
given the rarity of the situation I think a check in InstCombine is
enough.
Differential Revision: http://reviews.llvm.org/D16835
llvm-svn: 263047
Original commit message:
calculate builtin_object_size if argument is a removable pointer
This patch fixes calculating correct value for builtin_object_size function
when pointer is used only in builtin_object_size function call and never
after that.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D17337
Reland the original change with a small modification (first do a null check
and then do the cast) to satisfy ubsan.
llvm-svn: 263011
Given that we're not actually reducing the instruction count in the included
regression tests, I think we would call this a canonicalization step.
The motivation comes from the example in PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702
If we hoist the bitwise logic ahead of the bitcast, the previously unoptimizable
example of:
define <4 x i32> @is_negative(<4 x i32> %x) {
%lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
%not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%bc = bitcast <4 x i32> %not to <2 x i64>
%notnot = xor <2 x i64> %bc, <i64 -1, i64 -1>
%bc2 = bitcast <2 x i64> %notnot to <4 x i32>
ret <4 x i32> %bc2
}
Simplifies to the expected:
define <4 x i32> @is_negative(<4 x i32> %x) {
%lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
ret <4 x i32> %lobit
}
Differential Revision: http://reviews.llvm.org/D17583
llvm-svn: 262645
Summary: This is the last step toward supporting aggregate memory access in instcombine. This explodes stores of arrays into a serie of stores for each element, allowing them to be optimized.
Reviewers: joker.eph, reames, hfinkel, majnemer, mgrang
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17828
llvm-svn: 262530
Summary: This is another step toward improving fca support. This unpack load of array in a series of load to array's elements.
Reviewers: chandlerc, joker.eph, majnemer, reames, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15890
llvm-svn: 262521
As noted in the code comment, I don't think we can do the same transform that we do for
*scalar* integers comparisons to *vector* integers comparisons because it might pessimize
the general case.
Exhibit A for an incomplete integer comparison ISA remains x86 SSE/AVX: it only has EQ and GT
for integer vectors.
But we should now recognize all the variants of this construct and produce the optimal code
for the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701
llvm-svn: 262424
Summary: SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17742
llvm-svn: 262419
This patch fixes calculating correct value for builtin_object_size function
when pointer is used only in builtin_object_size function call and never
after that.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D17337
llvm-svn: 262337
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392http://reviews.llvm.org/rL260145
is that customers using the AVX intrinsics in C will benefit from combines when
the load mask is constant:
__m128 mload_zeros(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(0));
}
__m128 mload_fakeones(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(1));
}
__m128 mload_ones(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(0x80000000));
}
__m128 mload_oneset(float *f) {
return _mm_maskload_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0));
}
...so none of the above will actually generate a masked load for optimized code.
This is the masked load counterpart to:
http://reviews.llvm.org/rL262064
llvm-svn: 262269
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392http://reviews.llvm.org/rL260145
is that customers using the AVX intrinsics in C will benefit from combines when
the store mask is constant:
void mstore_zero_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(0), v);
}
void mstore_fake_ones_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(1), v);
}
void mstore_ones_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(0x80000000), v);
}
void mstore_one_set_elt_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0), v);
}
...so none of the above will actually generate a masked store for optimized code.
Differential Revision: http://reviews.llvm.org/D17485
llvm-svn: 262064
This is part of the payoff for the refactoring in:
http://reviews.llvm.org/rL261649http://reviews.llvm.org/rL261707
In addition to removing a pile of duplicated code, the xor case was
missing the optimization for vector types because it checked
"SrcTy->isIntegerTy()" rather than "SrcTy->isIntOrIntVectorTy()"
like 'and' and 'or' were already doing.
This solves part of:
https://llvm.org/bugs/show_bug.cgi?id=26702
llvm-svn: 261750
This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated.
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D16180
llvm-svn: 261736
Note: The 'and' case in foldCastedBitwiseLogic() is inheriting one extra
check from the nearly identical 'or' case:
if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src))
But I'm not sure how to expose that difference in a regression test.
Without that check, the 'or' path will infinite loop on:
test/Transforms/InstCombine/zext-or-icmp.ll
because the zext-or-icmp fold is attempting a reverse transform.
The refactoring should extend to the 'xor' case next to solve part of
PR26702.
llvm-svn: 261707
This is a straight cut and paste of the existing code and is intended to
be the first step in solving part of PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702
We should be able to reuse most of this and delete the nearly identical
existing code in visitOr(). Then, we can enhance visitXor() to use the
same code too.
llvm-svn: 261649
Summary:
Previously we had a notion of convergent functions but not of convergent
calls. This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.
Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent. As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.
Reviewers: chandlerc, jingyue
Subscribers: hfinkel, jhen, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17317
llvm-svn: 261544
more places to prevent gratuitous re-"runs" of these passes.
The passes themselves don't do any work when run, but we keep spending
time scheduling and running these needlessly when we really don't need
to do so.
This is the first patch towards fixing the really horrible loop pass
pipeline fragmentation pointed out by Sanjoy in PR24804.
llvm-svn: 261302
Summary: Store and loads unpacked by instcombine do not always have the right alignement. This explicitely compute the alignement and set it.
Reviewers: dblaikie, majnemer, reames, hfinkel, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17326
llvm-svn: 261139
For some cases, InstCombine replaces the sequence of xor/sub instruction
followed by cmp instruction into a single cmp instruction.
However, this replacement may result suboptimal result especially when
the xor/sub has more than one use, as discussed in
bug 26465 (https://llvm.org/bugs/show_bug.cgi?id=26465).
This patch make the replacement happen only when xor/sub has only one
use.
Differential Revision: http://reviews.llvm.org/D16915
Patch by Taewook Oh!
llvm-svn: 260695
Original commit message:
[InstCombine] Fold IntToPtr and PtrToInt into preceding loads.
Currently we only fold a BitCast into a Load when the BitCast is its
only user.
Do the same for any no-op cast.
Patch by Philip Pfaffe!
Differential Revision: http://reviews.llvm.org/D9152
llvm-svn: 260612
When optimizing a extractvalue(load), we generate a load from the
aggregate type. This load didn't have alignment set and so would
get the alignment of the type. This breaks when the type is packed
and so the alignment should be lower.
For example, loading { int, int } would give us alignment of 4, but
the original load from this type may have an alignment of 1 if packed.
Reviewed by David Majnemer
Differential revision: http://reviews.llvm.org/D17158
llvm-svn: 260587
Summary:
When a PHI is used only to be compared with zero, it is possible to replace an
incoming value with any non-zero constant if the incoming value can be proved as
a known nonzero value. For example, in below code, we can replace the incoming value %v with
any non-zero constant based on the fact that the PHI is only used to be compared with zero
and %v is a known non-zero value:
%v = select %cond, 1, 2
%p = phi [%v, BB] ...
%c = icmp eq, %p, 0
Reviewers: mcrosier, jmolloy, sanjoy
Subscribers: hfinkel, mcrosier, majnemer, llvm-commits, haicheng, bmakam, mssimpso, gberry
Differential Revision: http://reviews.llvm.org/D16240
llvm-svn: 260530
We introduced gc.relocates of vector-of-pointer types a couple of weeks back. Somehow, I missed updating the InstCombine rule to account for this. If we hit this code path with a vector-of-pointers gc.relocate, we'd crash on a cast<PointerType>.
I also took the chance to do a bit of code style cleanup.
llvm-svn: 260279
According to git bisect, this is the root cause of a miscompile for Regex in
libLLVMSupport. I am still working on reducing a test case.
The actual bug may be elsewhere and this commit just exposed it.
Anyway, at the moment, to reproduce, follow these steps:
1. Build clang and libLTO in release mode.
2. Create a new build directory <stage2> and cd into it.
3. Use clang and libLTO from #1 to build llvm-extract in Release mode + asserts
using -O2 -flto
4. Run llvm-extract -ralias '.*bar' -S test/Other/extract-alias.ll
Result:
program doesn't contain global named '.*bar'!
Expected result:
@a0a0bar = alias void ()* @bar
@a0bar = alias void ()* @bar
declare void @bar()
Note: In step #3, if you don't use lto or asserts, the miscompile disappears.
llvm-svn: 259674
A masked store with a zero mask means there's no store.
A masked store with an allOnes mask means it's a normal vector store.
This is a continuation of:
http://reviews.llvm.org/rL259369
llvm-svn: 259392
This miscompile came about because we tried to use a transform which was
only appropriate for xor operators when addition was present.
This fixes PR26407.
llvm-svn: 259375
A masked load with a zero mask means there's no load.
A masked load with an allOnes mask means it's a normal vector load.
Differential Revision: http://reviews.llvm.org/D16691
llvm-svn: 259369
We would infinite loop because we created a shufflevector that was wider than
needed and then failed to combine that with the insertelement. When subsequently
visiting the extractelement from that shuffle, we see that it's unnecessary,
delete it, and trigger another visit to the insertelement.
llvm-svn: 259236
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html
"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi
Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark
Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16471
llvm-svn: 258861
InstCombine and SCCP both want to remove dead code in a very particular
way but using identical means to do so. Share the code between the two.
No functionality change is intended.
llvm-svn: 258653
The intrinsic target prefix should match the target name
as it appears in the triple.
This is not yet complete, but gets most of the important ones.
llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatability for now.
llvm-svn: 258557
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.
GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Reviewers: mjacob, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16275
llvm-svn: 258145
This contains a fix for the issue that caused the revert:
we no longer assume that we can insert instructions after the
instruction that produces the base pointer. We previously
assumed that this would be ok, because the instruction produces
a value and therefore is not a terminator. This is false for invoke
instructions. We will now insert these new instruction directly
at the location of the users.
Original commit message:
[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.
InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden to the optimizer and this transformation will be disabled.
This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.
Reviewers: majnemer, jmolloy
Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits
Differential Revision: http://reviews.llvm.org/D15146
llvm-svn: 257897
There are several requirements that ended up with this design;
1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early.
2. Bitreversals and byteswaps are very related in their matching logic.
3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses.
4. Bswaps are best matched early in InstCombine.
The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals.
We can then extend the matching logic in one place only.
llvm-svn: 257875
In setInsertionPoint if the value is not a PHI, Instruction or
Argument it should be a Constant, not a ConstantExpr.
Original commit message:
[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.
InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden to the optimizer and this transformation will be disabled.
This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.
Reviewers: majnemer, jmolloy
Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits
Differential Revision: http://reviews.llvm.org/D15146
llvm-svn: 257164
Limit this transform to a basic block and guard against PHIs.
Hopefully, this fixes the remaining failures in PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999
llvm-svn: 257133
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.
InstCombine was already doing this when comparing two GEPs if the
base pointers were the same. However, in the case where we have
complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs,
conversions to or from integers, etc) the value of the original
base pointer will be hidden to the optimizer and this transformation
will be disabled.
This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the
relevant uses of GEPs to GEPs with a common base pointer. The
GEP comparison will be converted to a comparison done on indices.
Reviewers: majnemer, jmolloy
Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits
Differential Revision: http://reviews.llvm.org/D15146
llvm-svn: 257064
Summary:
This commit renames GCRelocateOperands to GCRelocateInst and makes it an
intrinsic wrapper, similar to e.g. MemCpyInst. Also, all users of
GCRelocateOperands were changed to use the new intrinsic wrapper instead.
Reviewers: sanjoy, reames
Subscribers: reames, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D15762
llvm-svn: 256811
Summary: This patch fixes a bug in prepareICWorklistFromFunction, where the loop becomes infinite with instructions of token type. The patch checks if the instruction is token type, and if so it updates EndInst with the current instruction.
Reviewers: reames, majnemer
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D15859
llvm-svn: 256792
This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229
The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.
The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109
For that example, the IR becomes:
%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2
And x86 SSE output improves from:
movq (%rdi), %xmm1 ## xmm1 = mem[0],zero
movdqa %xmm1, %xmm2
shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3]
shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0]
retq
To the almost optimal:
movhpd (%rdi), %xmm0
Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases.
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.
Differential Revision: http://reviews.llvm.org/D15096
llvm-svn: 256394
This extends the same line of reasoning used in EarlyCSE w/http://reviews.llvm.org/D15352 to the DSE implementation in InstCombine.
Key points:
* We only remove unordered or simple stores.
* The loads producing values consumed by dead stores don't influence whether the store is dead.
Differential Revision: http://reviews.llvm.org/D15354
llvm-svn: 255932
For non padded structs, we can just proceed and deaggregate them.
We don't want ot do this when there is padding in the struct as to not
lose information about this padding (the subsequents passes would then
try hard to preserve the padding, which is undesirable).
Also update extractvalue.ll and cast.ll so that they use structs with padding.
Remove the FIXME in the extractvalue of laod case as the non padded case is
handled when processing the load, and we don't want to do it on the padded
case.
Patch by: Amaury SECHET <deadalnix@gmail.com>
Differential Revision: http://reviews.llvm.org/D14483
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255600
This change was discussed in D15392. It allows us to remove the fold that was added
in:
http://reviews.llvm.org/r255261
...and it will allow us to generalize this fold:
http://reviews.llvm.org/rL112232
while preserving the order of bitcast + extract that it produces and testing shows
is better handled by the backend.
Note that the existing check for "isVectorTy()" wasn't strong enough in general
and specifically because: x86_mmx. It's not a vector, but it's not vectorizable
either. So here we check VectorType::isValidElementType() directly before
proceeding with the transform.
llvm-svn: 255433
MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse.
llvm-svn: 255334
This is a redo of r255137 (reverted at r255227) which was a redo of
r255124 (reverted at r255126) with a fixed check for a scalar source
type and an added test for the failure that caused the revert.
Original commit message:
Example:
bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
--->
extractelement <2 x float> %X, i32 1
This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543
The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)
Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232
with 2 less specific transforms to catch the case in the bug report.
Differential Revision: http://reviews.llvm.org/D14879
llvm-svn: 255261
This is a redo of r255124 (reverted at r255126) with an added check for a
scalar destination type and an added test for the failure seen in Clang's
test/CodeGen/vector.c. The extra test shows a different missing optimization.
Original commit message:
Example:
bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
--->
extractelement <2 x float> %X, i32 1
This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543
The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)
Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232
with 2 less specific transforms to catch the case in the bug report.
Differential Revision: http://reviews.llvm.org/D14879
llvm-svn: 255137