Harden r235258 to support any integer bitwidth. The quick glance at
the reference made me think only i32 and i64 were valid types, but
they're not special, so any overload is legal.
Thanks to David Majnemer for noticing!
llvm-svn: 235261
Followup to r235232, which caused PR23278.
We can't assume the memset and memcpy sizes have the same type, as
nothing in the language reference prevents that.
Instead, zext both to i64 if they disagree.
While there, robustify tests by using i8 %c rather than i8 0 for the
memset character.
llvm-svn: 235258
A common idiom in some code is to do the following:
memset(dst, 0, dst_size);
memcpy(dst, src, src_size);
Some of the memset is redundant; instead, we can do:
memcpy(dst, src, src_size);
memset(dst + src_size, 0,
dst_size <= src_size ? 0 : dst_size - src_size);
Original patch by: Joel Jones
Differential Revision: http://reviews.llvm.org/D498
llvm-svn: 235232
Summary:
An alternative is to use a worklist approach. However, that approach
would break the traversing order so that we couldn't lookup SeenExprs
efficiently. I don't see a clear winner here, so I picked the easier approach.
Along with two minor improvements:
1. preserves ScalarEvolution by forgetting instructions replaced
2. removes dead code locally avoiding the need of running DCE afterwards
Test Plan: add to slsr-add.ll a test that requires multiple iterations
Reviewers: broune, dberlin, atrick, meheff
Reviewed By: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9058
llvm-svn: 235151
See r230786 and r230794 for similar changes to gep and load
respectively.
Call is a bit different because it often doesn't have a single explicit
type - usually the type is deduced from the arguments, and just the
return type is explicit. In those cases there's no need to change the
IR.
When that's not the case, the IR usually contains the pointer type of
the first operand - but since typed pointers are going away, that
representation is insufficient so I'm just stripping the "pointerness"
of the explicit type away.
This does make the IR a bit weird - it /sort of/ reads like the type of
the first operand: "call void () %x(" but %x is actually of type "void
()*" and will eventually be just of type "ptr". But this seems not too
bad and I don't think it would benefit from repeating the type
("void (), void () * %x(" and then eventually "void (), ptr %x(") as has
been done with gep and load.
This also has a side benefit: since the explicit type is no longer a
pointer, there's no ambiguity between an explicit type and a function
that returns a function pointer. Previously this case needed an explicit
type (eg: a function returning a void() function was written as
"call void () () * @x(" rather than "call void () * @x(" because of the
ambiguity between a function returning a pointer to a void() function
and a function returning void).
No ambiguity means even function pointer return types can just be
written alone, without writing the whole function's type.
This leaves /only/ the varargs case where the explicit type is required.
Given the special type syntax in call instructions, the regex-fu used
for migration was a bit more involved in its own unique way (as every
one of these is) so here it is. Use it in conjunction with the apply.sh
script and associated find/xargs commands I've provided in rr230786 to
migrate your out of tree tests. Do let me know if any of this doesn't
cover your cases & we can iterate on a more general script/regexes to
help others with out of tree tests.
About 9 test cases couldn't be automatically migrated - half of those
were functions returning function pointers, where I just had to manually
delete the function argument types now that we didn't need an explicit
function type there. The other half were typedefs of function types used
in calls - just had to manually drop the * from those.
import fileinput
import sys
import re
pat = re.compile(r'((?:=|:|^|\s)call\s(?:[^@]*?))(\s*$|\s*(?:(?:\[\[[a-zA-Z0-9_]+\]\]|[@%](?:(")?[\\\?@a-zA-Z0-9_.]*?(?(3)"|)|{{.*}}))(?:\(|$)|undef|inttoptr|bitcast|null|asm).*$)')
addrspace_end = re.compile(r"addrspace\(\d+\)\s*\*$")
func_end = re.compile("(?:void.*|\)\s*)\*$")
def conv(match, line):
if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)):
return line
return line[:match.start()] + match.group(1)[:match.group(1).rfind('*')].rstrip() + match.group(2) + line[match.end():]
for line in sys.stdin:
sys.stdout.write(conv(re.search(pat, line), line))
llvm-svn: 235145
Summary:
This fixes a left-over efficiency issue in D8950.
As Andrew and Daniel suggested, we can store the candidates in a stack
and pop the top element when it does not dominate the current
instruction. This reduces the worst-case time complexity to O(n).
Test Plan: a new test in nary-add.ll that exercises this optimization.
Reviewers: broune, dberlin, meheff, atrick
Reviewed By: atrick
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D9055
llvm-svn: 235129
This is very similar to D8486 / r232852 (vperm2). If we treat insertps intrinsics
as shufflevectors, we can optimize them better.
I've left all but the full zero case of the zero mask variants out of this patch.
I don't think those can be converted into a single shuffle in all cases, but I'd
be happy to be proven wrong as I was for vperm2f128.
Either way, we'd need to support whatever sequence we come up with for those cases
in the backend before converting them here.
Differential Revision: http://reviews.llvm.org/D8833
llvm-svn: 235124
Remove 'inlinedAt:' from MDLocalVariable. Besides saving some memory
(variables with it seem to be single largest `Metadata` contributer to
memory usage right now in -g -flto builds), this stops optimization and
backend passes from having to change local variables.
The 'inlinedAt:' field was used by the backend in two ways:
1. To tell the backend whether and into what a variable was inlined.
2. To create a unique id for each inlined variable.
Instead, rely on the 'inlinedAt:' field of the intrinsic's `!dbg`
attachment, and change the DWARF backend to use a typedef called
`InlinedVariable` which is `std::pair<MDLocalVariable*, MDLocation*>`.
This `DebugLoc` is already passed reliably through the backend (as
verified by r234021).
This commit removes the check from r234021, but I added a new check
(that will survive) in r235048, and changed the `DIBuilder` API in
r235041 to require a `!dbg` attachment whose 'scope:` is in the same
`MDSubprogram` as the variable's.
If this breaks your out-of-tree testcases, perhaps the script I used
(mdlocalvariable-drop-inlinedat.sh) will help; I'll attach it to PR22778
in a moment.
llvm-svn: 235050
Add missing `!dbg` attachments to `@llvm.dbg.*` intrinsics. I updated
these using a script (add-dbg-to-intrinsics.sh) that I'll attach to
PR22778 for posterity.
llvm-svn: 235040
Summary:
With this patch, SLSR may rewrite
S1: X = B + i * S
S2: Y = B + i' * S
to
S2: Y = X + (i' - i) * S
A secondary improvement: if (i' - i) is a power of 2, emit Y as X + (S << log(i' - i)). (S << log(i' -i)) is in a canonical form and thus more likely GVN'ed than (i' - i) * S.
Test Plan: slsr-add.ll
Reviewers: hfinkel, sanjoy, meheff, broune, eliben
Reviewed By: eliben
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8983
llvm-svn: 235019
Inlining such intrinsics is very difficult, since you need to
simultaneously transform many calls to llvm.framerecover and potentially
duplicate the functions containing them. Normally this intrinsic isn't
added until EH preparation, which is part of the backend pass pipeline
after inlining. However, if it were to get fed through the inliner,
this change will ensure that it doesn't break the code.
llvm-svn: 234937
Summary:
This transformation reassociates a n-ary add so that the add can partially reuse
existing instructions. For example, this pass can simplify
void foo(int a, int b) {
bar(a + b);
bar((a + 2) + b);
}
to
void foo(int a, int b) {
int t = a + b;
bar(t);
bar(t + 2);
}
saving one add instruction.
Fixes PR22357 (https://llvm.org/bugs/show_bug.cgi?id=22357).
Test Plan: nary-add.ll
Reviewers: broune, dberlin, hfinkel, meheff, sanjoy, atrick
Reviewed By: sanjoy, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8950
llvm-svn: 234855
Summary:
Runtime unrolling of loops needs to emit an expression to compute the
loop's runtime trip-count. Avoid runtime unrolling if this computation
will be expensive.
Depends on D8993.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8994
llvm-svn: 234846
Summary:
Teach `isHighCostExpansion` to consider divisions by power-of-two
constants as cheap and add a test case. This change is needed for a new
user of `isHighCostExpansion` that will be added in a subsequent change.
Depends on D8995.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8993
llvm-svn: 234845
Clean up a predicate I added in r229731, fix the relevant comment and
add a test case. The earlier version is confusing to read and was also
buggy (probably not a coincidence) till Alexey fixed it in r233881.
llvm-svn: 234701
When rewriting statepoints to make relocations explicit, we need to have a conservative but consistent notion of where a particular pointer is live at a particular site. The old code just used dominance, which is correct, but decidedly more conservative then it needed to be. This patch implements a simple dataflow algorithm that's run one per function (well, twice counting fixup after base pointer insertion). There's still lots of room to make this faster, but it's fast enough for all practical purposes today.
Differential Revision: http://reviews.llvm.org/D8674
llvm-svn: 234657
Two related small changes:
Various dominance based queries about liveness can get confused if we're talking about unreachable blocks. To avoid reasoning about such cases, just remove them before rewriting statepoints.
Remove single entry phis (likely left behind by LCSSA) to reduce the number of live values.
Both of these are motivated by http://reviews.llvm.org/D8674 which will be submitted shortly.
Differential Revision: http://reviews.llvm.org/D8675
llvm-svn: 234651
This patch adds limited support for inserting explicit relocations when there's a vector of pointers live over the statepoint. This doesn't handle the case where the vector contains a mix of base and non-base pointers; that's future work.
The current implementation just scalarizes the vector over the gc.statepoint before doing the explicit rewrite. An alternate approach would be to plumb the vector all the way though the backend lowering, but doing that appears challenging. In particular, the size of the indirect spill slot is currently assumed to be sizeof(pointer) throughout the backend.
In practice, this is enough to allow running the SLP and Loop vectorizers before RewriteStatepointsForGC.
Differential Revision: http://reviews.llvm.org/D8671
llvm-svn: 234647
Summary:
This change moves creating calls to `llvm.uadd.with.overflow` from
InstCombine to CodeGenPrep. Combining overflow check patterns into
calls to the said intrinsic in InstCombine inhibits optimization because
it introduces an intrinsic call that not all other transforms and
analyses understand.
Depends on D8888.
Reviewers: majnemer, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8889
llvm-svn: 234638
InstCombine didn't realize that it needs to use DataLayout to determine
how wide pointers are. This lead to assertion failures.
This fixes PR23113.
llvm-svn: 234046
Check that the `MDLocalVariable::getInlinedAt()` in a debug info
intrinsic's variable always matches the `MDLocation::getInlinedAt()` of
its `!dbg` attachment.
The goal here is to get rid of `MDLocalVariable::getInlinedAt()`
entirely (PR22778), since it's expensive and unnecessary, but I'll let
this verifier check bake for a while (a week maybe?) first. I've
updated the testcases that had the wrong value for `inlinedAt:`.
This checks that things are sane in the IR, but currently things go out
of whack in a few places in the backend. I'll follow shortly with
assertions in the backend (with code fixes).
If you have out-of-tree testcases that just started failing, here's how
I updated these ones:
1. The verifier check gives you the basic block, function, instruction,
and relevant metadata arguments (metadata numbering doesn't
necessarily match the source file, unfortunately).
2. Look at the `@llvm.dbg.*()` instruction, and compare the
`inlinedAt:` fields of the variable argument (second `metadata`
argument) and the `!dbg` attachment.
3. Figure out based on the variable `scope:` chain and the functions in
the file whether the variable has been inlined (and into what), so
you can determine which `inlinedAt:` is actually correct. In all of
the in-tree testcases, the `!MDLocation()` was correct and the
`!MDLocalVariable()` was wrong, but YMMV.
4. Duplicate the metadata that you're going to change, and add/drop the
`inlinedAt:` field from one of them. Be careful that the other
references to the same metadata node point at the correct one.
llvm-svn: 234021
Summary:
The old requirement on GEP candidates being in bounds is unnecessary.
For off-bound GEPs, we still have
&B[i * S] = B + (i * S) * e = B + (i * e) * S
Test Plan: slsr_offbound_gep in slsr-gep.ll
Reviewers: meheff
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8809
llvm-svn: 233949
We used to do this before refactorings around r225640.
Some clang users checked for _chk libcall availability using:
__has_builtin(__builtin___memcpy_chk)
When compiling with -fno-builtin, this is always true.
When passing -ffreestanding/-mkernel, which both imply -fno-builtin, we
end up with fortified libcalls, which isn't acceptable in a freestanding
environment which only provides their non-fortified counterparts.
Until we change clang and/or teach external users to check for availability
differently, disregard the "nobuiltin" attribute and TLI::has.
Workaround for PR23093.
llvm-svn: 233776
PPC_FP128 is really the sum of two consecutive doubles, where the first double
is always stored first in memory, regardless of the target endianness. The
memory layout of i128, however, depends on the target endianness, and so we
can't fold this without target endianness information. As a result, we must not
do this folding in lib/IR/ConstantFold.cpp (it could be done instead in
Analysis/ConstantFolding.cpp, but that's not done now).
Fixes PR23026.
llvm-svn: 233481
Fix testcases that don't pass the verifier after a WIP patch to check
`MDSubprogram` operands more effectively. I found the following issues:
- When `isDefinition: false`, the `variables:` field might point at
`!{i32 786468}`, or at a tuple that pointed at an empty tuple with
the comment "previously: invalid DW_TAG_base_type" (I vaguely recall
adding those comments during an upgrade script). In these cases, I
just dropped the array.
- The `variables:` field might point at something like `!{!{!8}}`,
where `!8` was an `MDLocation`. I removed the extra layer of
indirection.
- Invalid `type:` (not an `MDSubroutineType`).
llvm-svn: 233466
Change `llc` and `opt` to run `verifyModule()`. This ensures that we
check the full module before `FunctionPass::doInitialization()` ever
gets called (I was getting crashes in `DwarfDebug` instead of verifier
failures when testing a WIP patch that checks operands of compile
units). In `opt`, also move up debug-info-stripping so that it still
runs before verification.
There was a fair bit of broken code that was sitting in tree.
Interestingly, some were cases of a `select` that referred to itself in
`-instcombine` tests (apparently an intermediate result). I split them
off to `*-noverify.ll` tests with RUN lines like this:
opt < %s -S -disable-verify -instcombine | opt -S | FileCheck %s
This avoids verifying the input file (so we can get the broken code into
`-instcombine), but still verifies the output with a second call to
`opt` (to verify that `-instcombine` will clean it up like it should).
llvm-svn: 233432
Fix debug info in these tests, which started failing with a WIP patch to
verify compile units and types. The problems look like they were all
caused by bitrot. They fell into these categories:
- Using `!{i32 0}` instead of `!{}`.
- Using `!{null}` instead of `!{}`.
- Using `!MDExpression()` instead of `!{}`.
- Using `!8` instead of `!{!8}`.
- `file:` references that pointed at `MDCompileUnit`s instead of the
same `MDFile` as the compile unit.
- `file:` references that were numerically off-by-one or (off-by-ten).
llvm-svn: 233415
This re-adds float2int to the tree, after fixing PR23038. It turns
out the argument to APSInt() is true-if-unsigned, rather than
true-if-signed :(. Added testcase and explanatory comment.
llvm-svn: 233370
This was discussed a while back and I left it optional for migration. Since it's been far more than the 'week or two' that was discussed, time to actually make this manditory.
llvm-svn: 233357
Fix testcases whose variables are invalid. I'm working on a patch that
adds `Verifier` checks for `MDLocalVariable` (and `MDGlobalVariable`),
and these failed because:
- `scope:` fields need to point at `MDLocalScope` and can't be null.
- `file:` fields need to point at `MDFile`.
- `inlinedAt:` fields need to point at `MDLocation`.
llvm-svn: 233349
Summary:
This patch enhances SLSR to handle another candidate form &B[i * S]. If
we found two candidates
S1: X = &B[i * S]
S2: Y = &B[i' * S]
and S1 dominates S2, we can replace S2 with
Y = &X[(i' - i) * S]
Test Plan:
slsr-gep.ll
X86/no-slsr.ll: verify that we do not run SLSR on GEPs that already fit into
an addressing mode
Reviewers: eliben, atrick, meheff, hfinkel
Reviewed By: hfinkel
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D7459
llvm-svn: 233286
A load from an invariant location is assumed to not alias any otherwise potentially aliasing stores. Our implementation only applied this rule to store instructions themselves whereas they it should apply for any memory accessing instruction. This results in both FRE and PRE becoming more effective at eliminating invariant loads.
Note that as a follow on change I will likely move this into AliasAnalysis itself. That's where the TBAA constant flag is handled and the semantics are essentially the same. I'd like to separate the semantic change from the refactoring and thus have extended the hack that's already in MemoryDependenceAnalysis for this change.
Differential Revision: http://reviews.llvm.org/D8591
llvm-svn: 233140
This patch tries to merge duplicate landing pads when they branch to a common shared target.
Given IR that looks like this:
lpad1:
%exn = landingpad {i8*, i32} personality i32 (...)* @__gxx_personality_v0
cleanup
br label %shared_resume
lpad2:
%exn2 = landingpad {i8*, i32} personality i32 (...)* @__gxx_personality_v0
cleanup
br label %shared_resume
shared_resume:
call void @fn()
ret void
}
We can rewrite the users of both landing pad blocks to use one of them. This will generally allow the shared_resume block to be merged with the common landing pad as well.
Without this change, tail duplication would likely kick in - creating N (2 in this case) copies of the shared_resume basic block.
Differential Revision: http://reviews.llvm.org/D8297
llvm-svn: 233125
Assert that this doesn't fire - I'll remove all of this later, but just
leaving it in for a while in case this is firing & we just don't have
test coverage.
llvm-svn: 233116
This is the IR optimizer follow-on patch for D8563: the x86 backend patch
that converts this kind of shuffle back into a vperm2.
This is also a continuation of the transform that started in D8486.
In that patch, Andrea suggested that we could convert vperm2 intrinsics that
use zero masks into a single shuffle.
This is an implementation of that suggestion.
Differential Revision: http://reviews.llvm.org/D8567
llvm-svn: 233110
This caused PR23008, compiles failing with: "Use still stuck around after Def is
destroyed: %.sroa.speculated"
Also reverting follow-up r233064.
llvm-svn: 233105
It is possible to have code that converts from integer to float, performs operations then converts back, and the result is provably the same as if integers were used.
This can come from different sources, but the most obvious is a helper function that uses floats but the arguments given at an inlined callsites are integers.
This pass considers all integers requiring a bitwidth less than or equal to the bitwidth of the mantissa of a floating point type (23 for floats, 52 for doubles) as exactly representable in floating point.
To reduce the risk of harming efficient code, the pass only attempts to perform complete removal of inttofp/fptoint operations, not just move them around.
llvm-svn: 233062
Continue to simplify the `DIDescriptor` subclasses, so that they behave
more like raw pointers. Remove `getRaw()`, replace it with an
overloaded `get()`, and overload the arrow and cast operators. Two
testcases started to crash on the arrow operators with this change
because of `scope:` references that weren't real scopes. I fixed them.
Soon I'll add verifier checks for them too.
This also adds explicit dereference operators. Previously, the builtin
dereference against `operator MDNode *()` would have worked, but now the
builtins are ambiguous.
llvm-svn: 233030
strchr("123!", C) != nullptr is a common pattern to check if C is one
of 1, 2, 3 or !. If the largest element of the string is smaller than
the target's register size we can easily create a bitfield and just
do a simple test for set membership.
int foo(char C) { return strchr("123!", C) != nullptr; } now becomes
cmpl $64, %edi ## range check
sbbb %al, %al
movabsq $0xE000200000001, %rcx
btq %rdi, %rcx ## bit test
sbbb %cl, %cl
andb %al, %cl ## and the two conditions
andb $1, %cl
movzbl %cl, %eax ## returning an int
ret
(imho the backend should expand this into a series of branches, but
that's a different story)
The code is currently limited to bit fields that fit in a register, so
usually 64 or 32 bits. Sadly, this misses anything using alpha chars
or {}. This could be fixed by just emitting a i128 bit field, but that
can generate really ugly code so we have to find a better way. To some
degree this is also recreating switch lowering logic, but we can't
simply emit a switch instruction and thus change the CFG within
instcombine.
llvm-svn: 232902
r216771 introduced a change to MemoryDependenceAnalysis that allowed it
to reason about acquire/release operations. However, this change does
not ensure that the acquire/release operations pair. Unfortunately,
this leads to miscompiles as we won't see an acquire load as properly
memory effecting. This largely reverts r216771.
This fixes PR22708.
llvm-svn: 232889
vperm2* intrinsics are just shuffles.
In a few special cases, they're not even shuffles.
Optimizing intrinsics in InstCombine is better than
handling this in the front-end for at least two reasons:
1. Optimizing custom-written SSE intrinsic code at -O0 makes vector coders
really angry (and so I have regrets about some patches from last week).
2. Doing mask conversion logic in header files is hard to write and
subsequently read.
There are a couple of TODOs in this patch to complete this optimization.
Differential Revision: http://reviews.llvm.org/D8486
llvm-svn: 232852
When estimating SROA savings, we want to see if an address is derived
off an alloca in the caller. For store instructions, operand 1 is the
address operand, but the current code uses operand 0. Use
getPointerOperand for loads and stores to fix this.
Patch by Easwaran Raman.
http://reviews.llvm.org/D8425
llvm-svn: 232827
Each use of the byte array uses a different alias. This makes the
backend less likely to reuse previously computed byte array addresses,
improving the security of the CFI mechanism based on this pass.
Differential Revision: http://reviews.llvm.org/D8455
llvm-svn: 232770
Also, add several entries to vectorizable functions table, and
corresponding tests. The table isn't complete, it'll be populated later.
Review: http://reviews.llvm.org/D8131
llvm-svn: 232531
Re-commit the test cases added in r232444. These now use
-irce-print-changed-loops and -irce-print-range-checks so they run
correctly on a without asserts build of llvm.
llvm-svn: 232452
I accidentally checked in two tests that used -debug-only -- these fail
on a release LLVM build. Temporarily delete these from the repo to keep
the bots green while I fix this locally.
llvm-svn: 232446
This change to IRCE gets it to recognize "half" range checks. Half
range checks are range checks that only either check if the index is
`slt` some positive integer ("length") or if the index is `sge` `0`.
The range solver does not try to be clever / aggressive about solving
half-range checks -- it transforms "I < L" to "0 <= I < L" and "0 <= I"
to "0 <= I < INT_SMAX". This is safe, but not always optimal.
llvm-svn: 232444
By default we want our gcov emission to stay 4.2 compatible, which
means we need to continue emit the exit block last by default. We add
an option to emit it before the body for users that need it.
llvm-svn: 232438
As part of PR22777, fix testcases that fail the debug info verifier.
The changes fall into the following categories:
- Empty `filename:` fields in `MDFile`s. Compile units and some types
require non-empty filenames. A number of testcases have empty
filenames, probably due to hand-reduction of testcases.
- Not-quite empty arrays: `!{i32 0}`. This used to be equivalent in
the debug info schema to `!{}`. They cause problems for
`!MDSubroutineType`'s `types:` array, since it requires all operands
to be valid types. (Note that `!{null}` is the correct type array
for functions that take no arguments and return `void`.)
- Significantly bitrotted testcases. Nodes got left behind a few
upgrades ago because of missing or invalid tags.
llvm-svn: 232415
The problem here is the infamous one direction known safe. I was
hesitant to turn it off before b/c of the potential for regressions
without an actual bug from users hitting the problem. This is that bug ;
).
The main performance impact of having known safe in both directions is
that often times it is very difficult to find two releases without a use
in-between them since we are so conservative with determining potential
uses. The one direction known safe gets around that problem by taking
advantage of many situations where we have two retains in a row,
allowing us to avoid that problem. That being said, the one direction
known safe is unsafe. Consider the following situation:
retain(x)
retain(x)
call(x)
call(x)
release(x)
Then we know the following about the reference count of x:
// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
retain(x)
// rc(x) == N+2
call A(x)
call B(x)
// rc(x) >= 1 (since we can not release a deallocated pointer).
release(x)
// rc(x) >= 0
That is all the information that we can know statically. That means that
we know that A(x), B(x) together can release (x) at most N+1 times. Lets
say that we remove the inner retain, release pair.
// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
call A(x)
call B(x)
// rc(x) >= 1
release(x)
// rc(x) >= 0
We knew before that A(x), B(x) could release x up to N+1 times meaning
that rc(x) may be zero at the release(x). That is not safe. On the other
hand, consider the following situation where we have a must use of
release(x) that x must be kept alive for after the release(x)**. Then we
know that:
// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
retain(x)
// rc(x) == N+2
call A(x)
call B(x)
// rc(x) >= 2 (since we know that we are going to release x and that that release can not be the last use of x).
release(x)
// rc(x) >= 1 (since we can not deallocate the pointer since we have a must use after x).
…
// rc(x) >= 1
use(x)
Thus we know that statically the calls to A(x), B(x) can together only
release rc(x) N times. Thus if we remove the inner retain, release pair:
// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
call A(x)
call B(x)
// rc(x) >= 1
…
// rc(x) >= 1
use(x)
We are still safe unless in the final … there are unbalanced retains,
releases which would have caused the program to blow up anyways even
before optimization occurred. The simplest form of must use is an
additional release that has not been paired up with any retain (if we
had paired the release with a retain and removed it we would not have
the additional use). This fits nicely into the ARC framework since
basically what you do is say that given any nested releases regardless
of what is in between, the inner release is known safe. This enables us to get
back the lost performance.
<rdar://problem/19023795>
llvm-svn: 232351
Verify that debug info intrinsic arguments are valid. (These checks
will not recurse through the full debug info graph, so they don't need
to be cordoned of in `DebugInfoVerifier`.)
With those checks in place, changing the `DbgIntrinsicInst` accessors to
downcast to `MDLocalVariable` and `MDExpression` is natural (added isa
specializations in `Metadata.h` to support this).
Added tests to `test/Verifier` for the new -verify checks, and fixed the
debug info in all the in-tree tests.
If you have out-of-tree testcases that have started to fail to -verify,
hopefully the verify checks are helpful. The most likely problem is
that the expression argument is `!{}` (instead of `!MDExpression()`).
llvm-svn: 232296
Summary: This is a first step toward getting proper support for aggregate loads and stores.
Test Plan: Added unittests
Reviewers: reames, chandlerc
Reviewed By: chandlerc
Subscribers: majnemer, joker.eph, chandlerc, llvm-commits
Differential Revision: http://reviews.llvm.org/D7780
Patch by Amaury Sechet
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 232284
The linker on that platform may re-order symbols or strip dead symbols, which
will break bit set checks. Avoid this by hiding the symbols from the linker.
llvm-svn: 232235
This reapplies the patch previously committed at revision 232190. This was
reverted at revision 232196 as it caused test failures in tests that did not
expect operands to be commuted. I have made the tests more resilient to
reassociation in revision 232206.
llvm-svn: 232209
As a follow-up to r232200, add an `-instcombine` to canonicalize scalar
allocations to `i32 1`. Since r232200, `iX 1` (for X != 32) are only
created by RAUWs, so this shouldn't fire too often. Nevertheless, it's
a cheap check and a nice cleanup.
llvm-svn: 232202
Write the `alloca` array size explicitly when it's non-canonical.
Previously, if the array size was `iX 1` (where X is not 32), the type
would mutate to `i32` when round-tripping through assembly.
The testcase I added fails in `verify-uselistorder` (as well as
`FileCheck`), since the use-lists for `i32 1` and `i64 1` change.
(Manman Ren came across this when running `verify-uselistorder` on some
non-trivial, optimized code as part of PR5680.)
The type mutation started with r104911, which allowed array sizes to be
something other than an `i32`. Starting with r204945, we
"canonicalized" to `i64` on 64-bit platforms -- and then on every
round-trip through assembly, mutated back to `i32`.
I bundled a fixup for `-instcombine` to avoid r204945 on scalar
allocations. (There wasn't a clean way to sequence this into two
commits, since the assembly change on its own caused testcase churn, and
the `-instcombine` change can't be tested without the assembly changes.)
An obvious alternative fix -- change `AllocaInst::AllocaInst()`,
`AsmWriter` and `LLParser` to treat `intptr_t` as the canonical type for
scalar allocations -- was rejected out of hand, since this required
teaching them each about the data layout.
A follow-up commit will add an `-instcombine` to canonicalize the scalar
allocation array size to `i32 1` rather than leaving `iX 1` alone.
rdar://problem/20075773
llvm-svn: 232200
This patch adds initial support for vector instructions to the reassociation
pass. It enables most parts of the pass to work with vectors but to keep the
size of the patch small, optimization of Xor trees, canonicalization of
negative constants and converting shifts to muls, etc., have been left out.
This will be handled in later patches.
The patch is based on an initial patch by Chad Rosier.
Differential Revision: http://reviews.llvm.org/D7566
llvm-svn: 232190
Similar to gep (r230786) and load (r230794) changes.
Similar migration script can be used to update test cases, which
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manually changes in Clang.
(this script will read the contents of stdin and massage it into stdout
- wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases)
import fileinput
import sys
import re
rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)
def conv(match):
line = match.group(1)
line += match.group(4)
line += ", "
line += match.group(2)
return line
line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
sys.stdout.write(line[off:match.start()])
sys.stdout.write(conv(match))
off = match.end()
sys.stdout.write(line[off:])
llvm-svn: 232184
Constant folding for shift IR instructions ignores all bits above 32 of
second argument (shift amount).
Because of that, some undef results are not recognized and APInt can
raise an assert failure if second argument has more than 64 bits.
Patch by Paweł Bylica!
Differential Revision: http://reviews.llvm.org/D7701
llvm-svn: 232176
It's firstly committed at r231630, and reverted at r231635.
Function pass InstructionSimplifier is inserted as barrier to
make sure loop unroll pass won't affect on LICM pass.
llvm-svn: 232011
The CallGraphNode function "addCalledFunction()" asserts that edges are not to intrinsics.
This patch makes sure that the Inliner does not add such an edge to the callgraph.
Fix for clang crash by assertion: https://llvm.org/bugs/show_bug.cgi?id=22857
Differential Revision: http://reviews.llvm.org/D8231
llvm-svn: 231927
Given that large parts of inst combine is restricted to instructions which have one use, getting rid of a use on the condition can help the effectiveness of the optimizer. Also, it allows the condition to potentially be deleted by instcombine rather than waiting for another pass.
I noticed this completely by accident in another test case. It's not anything that actually came from a real workload.
p.s. We should probably do the same thing for switch instructions.
Differential Revision: http://reviews.llvm.org/D8220
llvm-svn: 231881
This patch adds limited support in ValueTracking for inferring known bits of a value from conditional expressions which must be true to reach the instruction we're trying to optimize. At this time, the feature is off by default. Once landed, I'm hoping for feedback from others on both profitability and compile time impact.
Forms of conditional value propagation have been tried in LLVM before and have failed due to compile time problems. In an attempt to side step that, this patch only considers conditions where the edge leaving the branch dominates the context instruction. It does not attempt full dataflow. Even with that restriction, it handles many interesting cases:
* Early exits from functions
* Early exits from loops (for context instructions in the loop and after the check)
* Conditions which control entry into loops, including multi-version loops (such as those produced during vectorization, IRCE, loop unswitch, etc..)
Possible applications include optimizing using information provided by constructs such as: preconditions, assumptions, null checks, & range checks.
This patch implements two approaches to the problem that need further benchmarking. Approach 1 is to directly walk the dominator tree looking for interesting conditions. Approach 2 is to inspect other uses of the value being queried for interesting comparisons. From initial benchmarking, it appears that Approach 2 is faster than Approach 1, but this needs to be further validated.
Differential Revision: http://reviews.llvm.org/D7708
llvm-svn: 231879
ReplaceInstUsesWith needs to return nullptr when the input has no users,
because in that case it does not mutate the program. Otherwise, we can
get stuck in an infinite loop of repeatedly attempting to constant fold
and instruction with no users.
llvm-svn: 231755
For inner one of nested loops, it is more likely to be a hot loop,
and the runtime check can be promoted out from patch 0001, so the
overhead is less, we can try a doubled threshold to unroll more loops.
llvm-svn: 231632
Runtime unrolling is an expensive optimization which can bring benefit
only if the loop is hot and iteration number is relatively large enough.
For some loops, we know they are not worth to be runtime unrolled.
The scalar loop from vectorization is one of the cases.
llvm-svn: 231631
Runtime unrollng will introduce a runtime check in loop prologue.
If the unrolled loop is a inner loop, then the proglogue will be inside
the outer loop. LICM pass can help to promote the runtime check out if
the checked value is loop invariant.
llvm-svn: 231630
Summary:
See the two test cases.
; Can fold fcmp with undef on one side by choosing NaN for the undef
; Can fold fcmp with undef on both side
; fcmp u_pred undef, undef -> true
; fcmp o_pred undef, undef -> false
; because whatever you choose for the first undef
; you can choose NaN for the other undef
Reviewers: hfinkel, chandlerc, majnemer
Reviewed By: majnemer
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D7617
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231626
This pass interchanges loops to provide a more cache-friendly memory access.
For e.g. given a loop like -
for(int i=0;i<N;i++)
for(int j=0;j<N;j++)
A[j][i] = A[j][i]+B[j][i];
is interchanged to -
for(int j=0;j<N;j++)
for(int i=0;i<N;i++)
A[j][i] = A[j][i]+B[j][i];
This pass is currently disabled by default.
To give a brief introduction it consists of 3 stages-
LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time.
LoopInterchangeTransform : Which does the actual transform.
LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks.
TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
4) Improve compile time regression found in llvm lnt due to this pass.
5) Fix issues in Dependency Analysis module.
A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499
llvm-svn: 231458