In the future, AttributeWithIndex won't be used anymore. Besides, it exposes the
internals of the AttributeSet to outside users, which isn't goodness.
llvm-svn: 173602
The 'getSlot' function and its ilk allow introspection into the AttributeSet
class. However, that class should be opaque. Allow access through accessor
methods instead.
llvm-svn: 173522
Collections of attributes are handled via the AttributeSet class now. This
finally frees us up to make significant changes to how attributes are structured.
llvm-svn: 173228
This is more code to isolate the use of the Attribute class to that of just
holding one attribute instead of a collection of attributes.
llvm-svn: 173094
Because the Attribute class is going to stop representing a collection of
attributes, limit the use of it as an aggregate in favor of using AttributeSet.
This replaces some of the uses for querying the function attributes.
llvm-svn: 172844
some optimization opportunities (in the enclosing supper-expressions).
rule 1. (-0.0 - X ) * Y => -0.0 - (X * Y)
if expression "-0.0 - X" has only one reference.
rule 2. (0.0 - X ) * Y => -0.0 - (X * Y)
if expression "0.0 - X" has only one reference, and
the instruction is marked "noSignedZero".
2. Eliminate negation (The compiler was already able to handle these
opt if the 0.0s are replaced with -0.0.)
rule 3: (0.0 - X) * (0.0 - Y) => X * Y
rule 4: (0.0 - X) * C => X * -C
if the expr is flagged "noSignedZero".
3.
Rule 5: (X*Y) * X => (X*X) * Y
if X!=Y and the expression is flagged with "UnsafeAlgebra".
The purpose of this transformation is two-fold:
a) to form a power expression (of X).
b) potentially shorten the critical path: After transformation, the
latency of the instruction Y is amortized by the expression of X*X,
and therefore Y is in a "less critical" position compared to what it
was before the transformation.
4. Remove the InstCombine code about simplifiying "X * select".
The reasons are following:
a) The "select" is somewhat architecture-dependent, therefore the
higher level optimizers are not able to precisely predict if
the simplification really yields any performance improvement
or not.
b) The "select" operator is bit complicate, and tends to obscure
optimization opportunities. It is btter to keep it as low as
possible in expr tree, and let CodeGen to tackle the optimization.
llvm-svn: 172551
---------------------------------------------------------------------------
C_A: reassociation is allowed
C_R: reciprocal of a constant C is appropriate, which means
- 1/C is exact, or
- reciprocal is allowed and 1/C is neither a special value nor a denormal.
-----------------------------------------------------------------------------
rule1: (X/C1) / C2 => X / (C2*C1) (if C_A)
=> X * (1/(C2*C1)) (if C_A && C_R)
rule 2: X*C1 / C2 => X * (C1/C2) if C_A
rule 3: (X/Y)/Z = > X/(Y*Z) (if C_A && at least one of Y and Z is symbolic value)
rule 4: Z/(X/Y) = > (Z*Y)/X (similar to rule3)
rule 5: C1/(X*C2) => (C1/C2) / X (if C_A)
rule 6: C1/(X/C2) => (C1*C2) / X (if C_A)
rule 7: C1/(C2/X) => (C1/C2) * X (if C_A)
llvm-svn: 172488
o. X/C1 * C2 => X * (C2/C1) (if C2/C1 is neither special FP nor denormal)
o. X/C1 * C2 -> X/(C1/C2) (if C2/C1 is either specical FP or denormal, but C1/C2 is a normal Fp)
Let MDC denote multiplication or dividion with one & only one operand being a constant
o. (MDC ± C1) * C2 => (MDC * C2) ± (C1 * C2)
(so long as the constant-folding doesn't yield any denormal or special value)
llvm-svn: 171793
turning a code like this:
if (foo)
free(foo)
into that:
free(foo)
Move a call to free from basic block FB into FB's predecessor, P,
when the path from P to FB is taken only if the argument of free is
not equal to NULL.
Some restrictions apply on P and FB to be sure that this code motion
is profitable. Namely:
1. FB must have only one predecessor P.
2. FB must contain only the call to free plus an unconditional
branch to S.
3. P's successors are FB and S.
Because of 1., we will not increase the code size when moving the call
to free from FB to P.
Because of 2., FB will be empty after the move.
Because of 2. and 3., P's branch instruction becomes useless, so as FB
(simplifycfg will do the job).
llvm-svn: 171762
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
The later API is nicer than the former, and is correct regarding wrap-around offsets (if anyone cares).
There are a few more places left with duplicated code, which I'll remove soon.
llvm-svn: 171259
such as by a compiler warning, a check in clang -fsanitizer=undefined, being
optimized to unreachable, or a combination of the above. PR14722.
llvm-svn: 171119
When the backend is used from clang, it should produce proper diagnostics
instead of just printing messages to errs(). Other clients may also want to
register their own error handlers with the LLVMContext, and the same handler
should work for warnings in the same way as the existing emitError methods.
llvm-svn: 171041
When the least bit of C is greater than V, (x&C) must be greater than V
if it is not zero, so the comparison can be simplified.
Although this was suggested in Target/X86/README.txt, it benefits any
architecture with a directly testable form of AND.
Patch by Kevin Schoedel
llvm-svn: 170576
This assumes (1 << n) is always not zero. Consider n is greater than word size.
Although I know it is undefined, this transforms undefined behavior hidden.
This led clang unexpected behavior with some failures. I will investigate to fix undefined shl in clang.
llvm-svn: 170128
In a previous thread it was pointed out that isPowerOfTwo is not a very precise
name since it can return false for powers of two if it is unable to show that
they are powers of two.
llvm-svn: 170093
Provides m_Argument that allows matching against a CallSite's specified argument. Provides m_Intrinsic pattern that can be templatized over the intrinsic id and bind/match arguments similarly to other pattern matchers. Implementations provided for 0 to 4 arguments, though it's very simple to extend for more. Also provides example template specialization for bswap (m_BSwap) and example of code cleanup for its use.
llvm-svn: 170091
been used in the first place. It simply was passed to the function and to the
recursive invocations. Simply drop the parameter and update the callers for the
new signature.
Patch by Saleem Abdulrasool!
llvm-svn: 169988
This change attempts to simplify (X^Y) -> X or Y in the user's context if we know that
only bits from X or Y are demanded.
A minimized case is provided bellow. This change will simplify "t>>16" into "var1 >>16".
=============================================================
unsigned foo (unsigned val1, unsigned val2) {
unsigned t = val1 ^ 1234;
return (t >> 16) | t; // NOTE: t is used more than once.
}
=============================================================
Note that if the "t" were used only once, the expression would be finally optimized as well.
However, with with this change, the optimization will take place earlier.
Reviewed by Nadav, Thanks a lot!
llvm-svn: 169317
missed in the first pass because the script didn't yet handle include
guards.
Note that the script is now able to handle all of these headers without
manual edits. =]
llvm-svn: 169224
This change tries to simmplify E1 = " X >> C1 << C2" into :
- E2 = "X << (C2 - C1)" if C2 > C1, or
- E2 = "X >> (C1 - C2)" if C1 > C2, or
- E2 = X if C1 == C2.
Reviewed by Nadav. Thanks!
llvm-svn: 169182
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
The simplify-libcalls pass maintained a statistic to count the number
of library calls that have been simplified. Now that library call
simplification is being carried out in instcombine the statistic should
be moved to there.
llvm-svn: 168975
depends on the IR infrastructure, there is no sense in it being off in
Support land.
This is in preparation to start working to expand InstVisitor into more
special-purpose visitors that are still generic and can be re-used
across different passes. The expansion will go into the Analylis tree
though as nothing in VMCore needs it.
llvm-svn: 168972
My commit to migrate the printf simplifiers from the simplify-libcalls
in r168604 introduced a regression reported by Duncan [1]. The problem
is that in some cases the library call simplifier can return a new value
that has no uses and the new value's type is different than the old value's
type (which is fine because there are no uses). The specific case that
triggered the bug looked something like:
declare void @printf(i8*, ...)
...
call void (i8*, ...)* @printf(i8* %fmt)
Which we want to optimized into:
call i32 @putchar(i32 104)
However, the code was attempting to replace all uses of the printf with
the putchar and the types differ, hence a crash. This is fixed by *just*
deleting the original instruction when there are no uses. The old
simplify-libcalls pass is already doing something similar.
[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-November/056338.html
llvm-svn: 168716
InstCombineLoadStoreAlloca.cpp, which had many issues.
(At least two bugs were noted on llvm-commits, and it was overly conservative.)
Instead, use getOrEnforceKnownAlignment.
llvm-svn: 168629
Enhancement to InstCombine. Try to catch this opportunity:
---------------------------------------------------------------
((X^C1) >> C2) ^ C3 => (X>>C2) ^ ((C1>>C2)^C3)
where the subexpression "X ^ C1" has more than one uses, and
"(X^C1) >> C2" has single use.
----------------------------------------------------------------
Reviewed by Nadav (with minor change per his request).
llvm-svn: 168615
When code deletes the context, the AttributeImpls that the AttrListPtr points to
are now invalid. Therefore, instead of keeping a separate managed static for the
AttrListPtrs that's reference counted, move it into the LLVMContext and delete
it when deleting the AttributeImpls.
llvm-svn: 168354
replaced by this patch is equivalent to the new logic, but you'd be wrong, and
that's exactly where the bug was. There's a similar bug in instsimplify which
manifests itself as instsimplify failing to simplify this, rather than doing it
wrong, see next commit.
llvm-svn: 168181
This patch migrates the math library call simplifications from the
simplify-libcalls pass into the instcombine library call simplifier.
I have typically migrated just one simplifier at a time, but the math
simplifiers are interdependent because:
1. CosOpt, PowOpt, and Exp2Opt all depend on UnaryDoubleFPOpt.
2. CosOpt, PowOpt, Exp2Opt, and UnaryDoubleFPOpt all depend on
the option -enable-double-float-shrink.
These two factors made migrating each of these simplifiers individually
more of a pain than it would be worth. So, I migrated them all together.
llvm-svn: 167815
In some cases the library call simplifier may need to replace instructions
other than the library call being simplified. In those cases it may be
necessary for clients of the simplifier to override how the replacements
are actually done. As such, a new overrideable method for replacing
instructions was added to LibCallSimplifier.
A new subclass of LibCallSimplifier is also defined which overrides
the instruction replacement method. This is because the instruction
combiner defines its own replacement method which updates the worklist
when instructions are replaced.
llvm-svn: 167681
r165941: Resubmit the changes to llvm core to update the functions to
support different pointer sizes on a per address space basis.
Despite this commit log, this change primarily changed stuff outside of
VMCore, and those changes do not carry any tests for correctness (or
even plausibility), and we have consistently found questionable or flat
out incorrect cases in these changes. Most of them are probably correct,
but we need to devise a system that makes it more clear when we have
handled the address space concerns correctly, and ideally each pass that
gets updated would receive an accompanying test case that exercises that
pass specificaly w.r.t. alternate address spaces.
However, from this commit, I have retained the new C API entry points.
Those were an orthogonal change that probably should have been split
apart, but they seem entirely good.
In several places the changes were very obvious cleanups with no actual
multiple address space code added; these I have not reverted when
I spotted them.
In a few other places there were merge conflicts due to a cleaner
solution being implemented later, often not using address spaces at all.
In those cases, I've preserved the new code which isn't address space
dependent.
This is part of my ongoing effort to clean out the partial address space
code which carries high risk and low test coverage, and not likely to be
finished before the 3.2 release looms closer. Duncan and I would both
like to see the above issues addressed before we return to these
changes.
llvm-svn: 167222
getIntPtrType support for multiple address spaces via a pointer type,
and also introduced a crasher bug in the constant folder reported in
PR14233.
These commits also contained several problems that should really be
addressed before they are re-committed. I have avoided reverting various
cleanups to the DataLayout APIs that are reasonable to have moving
forward in order to reduce the amount of churn, and minimize the number
of commits that were reverted. I've also manually updated merge
conflicts and manually arranged for the getIntPtrType function to stay
in DataLayout and to be defined in a plausible way after this revert.
Thanks to Duncan for working through this exact strategy with me, and
Nick Lewycky for tracking down the really annoying crasher this
triggered. (Test case to follow in its own commit.)
After discussing with Duncan extensively, and based on a note from
Micah, I'm going to continue to back out some more of the more
problematic patches in this series in order to ensure we go into the
LLVM 3.2 branch with a reasonable story here. I'll send a note to
llvmdev explaining what's going on and why.
Summary of reverted revisions:
r166634: Fix a compiler warning with an unused variable.
r166607: Add some cleanup to the DataLayout changes requested by
Chandler.
r166596: Revert "Back out r166591, not sure why this made it through
since I cancelled the command. Bleh, sorry about this!
r166591: Delete a directory that wasn't supposed to be checked in yet.
r166578: Add in support for getIntPtrType to get the pointer type based
on the address space.
llvm-svn: 167221
%V = mul i64 %N, 4
%t = getelementptr i8* bitcast (i32* %arr to i8*), i32 %V
into
%t1 = getelementptr i32* %arr, i32 %N
%t = bitcast i32* %t1 to i8*
incorporating the multiplication into the getelementptr.
This happens all the time in dragonegg, for example for
int foo(int *A, int N) {
return A[N];
}
because gcc turns this into byte pointer arithmetic before it hits the plugin:
D.1590_2 = (long unsigned int) N_1(D);
D.1591_3 = D.1590_2 * 4;
D.1592_5 = A_4(D) + D.1591_3;
D.1589_6 = *D.1592_5;
return D.1589_6;
The D.1592_5 line is a POINTER_PLUS_EXPR, which is turned into a getelementptr
on a bitcast of A_4 to i8*, so this becomes exactly the kind of IR that the
transform fires on.
An analogous transform (with no testcases!) already existed for bitcasts of
arrays, so I rewrote it to share code with this one.
llvm-svn: 166474
An obfuscated splat is where the frontend poorly generates code for a splat
using several different shuffles to create the splat, i.e.,
%A = load <4 x float>* %in_ptr, align 16
%B = shufflevector <4 x float> %A, <4 x float> undef, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>
%C = shufflevector <4 x float> %B, <4 x float> %A, <4 x i32> <i32 0, i32 1, i32 4, i32 undef>
%D = shufflevector <4 x float> %C, <4 x float> %A, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
llvm-svn: 166061
Convert the internal representation of the Attributes class into a pointer to an
opaque object that's uniqued by and stored in the LLVMContext object. The
Attributes class then becomes a thin wrapper around this opaque
object. Eventually, the internal representation will be expanded to include
attributes that represent code generation options, etc.
llvm-svn: 165917
This patch implements the new LibCallSimplifier class as outlined in [1].
In addition to providing the new base library simplification infrastructure,
all the fortified library call simplifications were moved over to the new
infrastructure. The rest of the library simplification optimizations will
be moved over with follow up patches.
NOTE: The original fortified library call simplifier located in the
SimplifyFortifiedLibCalls class was not removed because it is still
used by CodeGenPrepare. This class will eventually go away too.
[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-August/052283.html
llvm-svn: 165873
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.
llvm-svn: 165488
This disables malloc-specific optimization when -fno-builtin (or -ffreestanding)
is specified. This has been a problem for a long time but became more severe
with the recent memory builtin improvements.
Since the memory builtin functions are used everywhere, this required passing
TLI in many places. This means that functions that now have an optional TLI
argument, like RecursivelyDeleteTriviallyDeadFunctions, won't remove dead
mallocs anymore if the TLI argument is missing. I've updated most passes to do
the right thing.
Fixes PR13694 and probably others.
llvm-svn: 162841
No test case, undefined shifts get folded early, but can occur when other
transforms generate a constant. Thanks to Duncan for bringing this up.
llvm-svn: 162755
This optimization is really just replacing allocas wholesale with
globals, there is no scalarization.
The underlying motivation for this patch is to simplify the SROA pass
and focus it on splitting and promoting allocas.
llvm-svn: 162271
- memcpy size is wrongly truncated into 32-bit and treat 8GB memcpy is
0-sized memcpy
- as 0-sized memcpy/memset is already removed before SimplifyMemTransfer
and SimplifyMemSet in visitCallInst, replace 0 checking with
assertions.
- replace getZExtValue() with getLimitedValue() according to
Eli Friedman
llvm-svn: 161923
An unsigned value converted to floating-point will always be greater than
a negative constant. Unfortunately InstCombine reversed the check so that
unsigned values were being optimized to always be greater than all positive
floating-point constants. <rdar://problem/12029145>
llvm-svn: 161452
This can happen as long as the instruction is not reachable. Instcombine does generate these unreachable malformed selects when doing RAUW
llvm-svn: 160874
%shr = lshr i64 %key, 3
%0 = load i64* %val, align 8
%sub = add i64 %0, -1
%and = and i64 %sub, %shr
ret i64 %and
to:
%shr = lshr i64 %key, 3
%0 = load i64* %val, align 8
%sub = add i64 %0, 2305843009213693951
%and = and i64 %sub, %shr
ret i64 %and
The demanded bit optimization is actually a pessimization because add -1 would
be codegen'ed as a sub 1. Teach the demanded constant shrinking optimization
to check for negated constant to make sure it is actually reducing the width
of the constant.
rdar://11793464
llvm-svn: 160101
This patch removes ~70 lines in InstCombineLoadStoreAlloca.cpp and makes both functions a bit more aggressive than before :)
In theory, we can be more aggressive when removing an alloca than a malloc, because an alloca pointer should never escape, but we are not taking advantage of this anyway
llvm-svn: 159952
This means we can do cheap DSE for heap memory.
Nothing is done if the pointer excapes or has a load.
The churn in the tests is mostly due to objectsize, since we want to make sure we
don't delete the malloc call before evaluating the objectsize (otherwise it becomes -1/0)
llvm-svn: 159876
This was always part of the VMCore library out of necessity -- it deals
entirely in the IR. The .cpp file in fact was already part of the VMCore
library. This is just a mechanical move.
I've tried to go through and re-apply the coding standard's preferred
header sort, but at 40-ish files, I may have gotten some wrong. Please
let me know if so.
I'll be committing the corresponding updates to Clang and Polly, and
Duncan has DragonEgg.
Thanks to Bill and Eric for giving the green light for this bit of cleanup.
llvm-svn: 159421
// C - zext(bool) -> bool ? C - 1 : C
if (ZExtInst *ZI = dyn_cast<ZExtInst>(Op1))
if (ZI->getSrcTy()->isIntegerTy(1))
return SelectInst::Create(ZI->getOperand(0), SubOne(C), C);
This ends up forming sext i1 instructions that codegen to terrible code. e.g.
int blah(_Bool x, _Bool y) {
return (x - y) + 1;
}
=>
movzbl %dil, %eax
movzbl %sil, %ecx
shll $31, %ecx
sarl $31, %ecx
leal 1(%rax,%rcx), %eax
ret
Without the rule, llvm now generates:
movzbl %sil, %ecx
movzbl %dil, %eax
incl %eax
subl %ecx, %eax
ret
It also helps with ARM (and pretty much any target that doesn't have a sext i1 :-).
The transformation was done as part of Eli's r75531. He has given the ok to
remove it.
rdar://11748024
llvm-svn: 159230
merge all zero-sized alloca's into one, fixing c43204g from the Ada ACATS
conformance testsuite. What happened there was that a variable sized object
was being allocated on the stack, "alloca i8, i32 %size". It was then being
passed to another function, which tested that the address was not null (raising
an exception if it was) then manipulated %size bytes in it (load and/or store).
The optimizers cleverly managed to deduce that %size was zero (congratulations
to them, as it isn't at all obvious), which made the alloca zero size, causing
the optimizers to replace it with null, which then caused the check mentioned
above to fail, and the exception to be raised, wrongly. Note that no loads
and stores were actually being done to the alloca (the loop that does them is
executed %size times, i.e. is not executed), only the not-null address check.
llvm-svn: 159202
- simplifycfg: invoke undef/null -> unreachable
- instcombine: invoke new -> invoke expect(0, 0) (an arbitrary NOOP intrinsic; only done if the allocated memory is unused, of course)
- verifier: allow invoke of intrinsics (to make the previous step work)
llvm-svn: 159146
This fixes PR5997.
These transforms were disabled because codegen couldn't deal with other
uses of trunc(x). This is now handled by the peephole pass.
This causes no regressions on x86-64.
llvm-svn: 159003