Summary:
At least for CoreCLR, a catchpad which immediately executes an
`unreachable` instruction indicates that the exception can never have a
matching type, and so such catchpads can be removed, and so can their
catchswitches if the catchswitch becomes empty.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15846
llvm-svn: 256809
Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM to merge different gc.statepoint nodes into PHI nodes and cause further problems with gc relocations. The patch also changes the way on how gc.relocate and gc.result look for their corresponding gc.statepoint on unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint.
Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob
Subscribers: reames, mjacob, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D15662
llvm-svn: 256443
In conditional store merging, we were creating PHIs when we didn't
need to. If the value to be predicated isn't defined in the block
we're predicating, then it doesn't need a PHI at all (because we only
deal with triangles and diamonds, any value not in the predicated BB
must dominate the predicated BB).
This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite.
Now with a fix (and fixed tests) for the conformance issue seen in Chromium.
llvm-svn: 255767
This is the last general step to allow more IR-level speculation with a safety harness in place in CodeGenPrepare.
The intent is to restore the behavior enabled by:
http://reviews.llvm.org/rL228826
but prevent bad performance such as:
https://llvm.org/bugs/show_bug.cgi?id=24818
Earlier patches in this sequence:
D12882 (disable SimplifyCFG speculation for expensive instructions)
D13297 (have CGP despeculate expensive ops)
D14630 (have CGP despeculate special versions of cttz/ctlz)
As shown in the test cases, we only have two instructions currently affected: ctz for some x86 and fdiv generally.
Allowing exactly one expensive instruction is a bit of a hack, but it lines up with what is currently implemented
in CGP. If we make the despeculation more general in CGP, we can make the speculation here more liberal.
A follow-up patch will adjust the cost for sqrt and possibly other typically expensive math intrinsics (currently
everything is cheap by default). GPU targets would likely want to override those expensive default costs (just as
they probably should already override the cost of div/rem) because just about any math is cheaper than control-flow
on those targets.
Differential Revision: http://reviews.llvm.org/D15213
llvm-svn: 255660
It turns out that terminatepad gives little benefit over a cleanuppad
which calls the termination function. This is not sufficient to
implement fully generic filters but MSVC doesn't support them which
makes terminatepad a little over-designed.
Depends on D15478.
Differential Revision: http://reviews.llvm.org/D15479
llvm-svn: 255522
In conditional store merging, we were creating PHIs when we didn't
need to. If the value to be predicated isn't defined in the block
we're predicating, then it doesn't need a PHI at all (because we only
deal with triangles and diamonds, any value not in the predicated BB
must dominate the predicated BB).
This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite.
llvm-svn: 255489
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
N.B. The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
llvm-svn: 255422
MIPS32 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any MIPS32
implementation.
The net result of allowing this speculation for the regression tests in this patch is
that we get this code:
ctlz:
jr $ra
clz $2, $4
cttz:
addiu $1, $4, -1
not $2, $4
and $1, $2, $1
clz $1, $1
addiu $2, $zero, 32
jr $ra
subu $2, $2, $1
Instead of:
ctlz:
beqz $4, $BB0_2
addiu $2, $zero, 32
clz $2, $4
$BB0_2:
jr $ra
nop
cttz:
beqz $4, $BB1_2
addiu $2, $zero, 32
addiu $1, $4, -1
not $2, $4
and $1, $2, $1
clz $1, $1
addiu $2, $zero, 32
subu $2, $2, $1
$BB1_2:
jr $ra
nop
See D14469 for the larger motivation.
Differential Revision: http://reviews.llvm.org/D14500
llvm-svn: 252755
ARM V6T2 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any ARM V6T2
implementation.
The net result of allowing this speculation for the regression tests in this patch is
that we get this code:
ctlz:
clz r0, r0
bx lr
cttz:
rbit r0, r0
clz r0, r0
bx lr
Instead of:
ctlz:
cmp r0, #0
moveq r0, #32
clzne r0, r0
bx lr
cttz:
cmp r0, #0
moveq r0, #32
rbitne r0, r0
clzne r0, r0
bx lr
This will help solve a general speculation/despeculation problem noted in PR24818:
https://llvm.org/bugs/show_bug.cgi?id=24818
Differential Revision: http://reviews.llvm.org/D14469
llvm-svn: 252639
AArch64 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any AArch64
implementation.
The net result of allowing this speculation for the regression tests in this
patch is that we get this code:
ctlz:
clz w0, w0
ret
cttz:
rbit w8, w0
clz w0, w8
ret
Instead of:
ctlz:
cbz w0, .LBB0_2
clz w0, w0
ret
.LBB0_2:
orr w0, wzr, #0x20
ret
cttz:
cbz w0, .LBB1_2
rbit w8, w0
clz w0, w8
ret
.LBB1_2:
orr w0, wzr, #0x20
ret
See D14469 for the larger motivation.
Differential Revision: http://reviews.llvm.org/D14505
llvm-svn: 252625
This is fix for PR24059.
When we are hoisting instruction above some condition it may turn out
that metadata on this instruction was control dependant on the condition.
This metadata becomes invalid and we need to drop it.
This patch should cover most obvious places of speculative execution (which
I have found by greping isSafeToSpeculativelyExecute). I think there are more
cases but at least this change covers the severe ones.
Differential Revision: http://reviews.llvm.org/D14398
llvm-svn: 252604
Previously, subprograms contained a metadata reference to the function they
described. Because most clients need to get or set a subprogram for a given
function rather than the other way around, this created unneeded inefficiency.
For example, many passes needed to call the function llvm::makeSubprogramMap()
to build a mapping from functions to subprograms, and the IR linker needed to
fix up function references in a way that caused quadratic complexity in the IR
linking phase of LTO.
This change reverses the direction of the edge by storing the subprogram as
function-level metadata and removing DISubprogram's function field.
Since this is an IR change, a bitcode upgrade has been provided.
Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is
attached to the PR.
Differential Revision: http://reviews.llvm.org/D14265
llvm-svn: 252219
We can often end up with conditional stores that cannot be speculated. They can come from fairly simple, idiomatic code:
if (c & flag1)
*a = x;
if (c & flag2)
*a = y;
...
There is no dominating or post-dominating store to a, so it is not legal to move the store unconditionally to the end of the sequence and cache the intermediate result in a register, as we would like to.
It is, however, legal to merge the stores together and do the store once:
tmp = undef;
if (c & flag1)
tmp = x;
if (c & flag2)
tmp = y;
if (c & flag1 || c & flag2)
*a = tmp;
The real power in this optimization is that it allows arbitrary length ladders such as these to be completely and trivially if-converted. The typical code I'd expect this to trigger on often uses binary-AND with constants as the condition (as in the above example), which means the ending condition can simply be truncated into a single binary-AND too: 'if (c & (flag1|flag2))'. As in the general case there are bitwise operators here, the ladder can often be optimized further too.
This optimization involves potentially increasing register pressure. Even in the simplest case, the lifetime of the first predicate is extended. This can be elided in some cases such as using binary-AND on constants, but not in the general case. Threading 'tmp' through all branches can also increase register pressure.
The optimization as in this patch is enabled by default but kept in a very conservative mode. It will only optimize if it thinks the resultant code should be if-convertable, and additionally if it can thread 'tmp' through at least one existing PHI, so it will only ever in the worst case create one more PHI and extend the lifetime of a predicate.
This doesn't trigger much in LNT, unfortunately, but it does trigger in a big way in a third party test suite.
llvm-svn: 252051
The most common use case is when eliminating redundant range checks in an example like the following:
c = a[i+1] + a[i];
Note that all the smarts of the transform (the implication engine) is already in ValueTracking and is tested directly through InstructionSimplify.
Differential Revision: http://reviews.llvm.org/D13040
llvm-svn: 251596
CatchReturnInst has side-effects: it runs a destructor. This destructor
could conceivably run forever/call exit/etc. and should not be removed.
llvm-svn: 251461
Summary: Currently SimplifyResume can convert an invoke instruction to a call instruction if its landing pad is trivial. In practice we could have several invoke instructions with trivial landing pads and share a common rethrow block, and in the common rethrow block, all the landing pads join to a phi node. The patch extends SimplifyResume to check the phi of landing pad and their incoming blocks. If any of them is trivial, remove it from the phi node and convert the invoke instruction to a call instruction.
Reviewers: hfinkel, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13718
llvm-svn: 251061
SimplifyTerminatorOnSelect didn't consider the possibility that the
condition might be related to one of PHI nodes.
This fixes PR25267.
llvm-svn: 250922
Turns out this approach is buggy. In discussion about follow on work, Sanjoy pointed out that we could be subject to circular logic problems.
Consider:
if (i u< L) leave()
if ((i + 1) u< L) leave()
print(a[i] + a[i+1])
If we know that L is less than UINT_MAX, we could possible prove (in a control dependent way) that i + 1 does not overflow. This gives us:
if (i u< L) leave()
if ((i +nuw 1) u< L) leave()
print(a[i] + a[i+1])
If we now do the transform this patch proposed, we end up with:
if ((i +nuw 1) u< L) leave_appropriately()
print(a[i] + a[i+1])
That would be a miscompile when i==-1. The problem here is that the control dependent nuw bits got used to prove something about the first condition. That's obviously invalid.
This won't happen today, but since I plan to enhance LVI/CVP with exactly that transform at some point in the not too distant future...
llvm-svn: 250430
Summary:
Factor the code that rewrites invokes to calls and rewrites WinEH
terminators to their "unwind to caller" equivalents into a helper in
Utils/Local, and use it in the three places I'm aware of that need to do
this.
Reviewers: andrew.w.kaylor, majnemer, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13152
llvm-svn: 248677
...because that's what the cost model was intended to do.
As discussed in D12882, this fix has a temporary unintended consequence for
SimplifyCFG: it causes us to not speculate an fdiv. However, two wrongs make
PR24818 right, and two wrongs make PR24343 act right even though it's really
still wrong.
I intend to correct SimplifyCFG and add to CodeGenPrepare to account for this
cost model change and preserve the righteousness for the bug report cases.
https://llvm.org/bugs/show_bug.cgi?id=24818https://llvm.org/bugs/show_bug.cgi?id=24343
Differential Revision: http://reviews.llvm.org/D12882
llvm-svn: 248439
Clang now passes the adjectives as an argument to catchpad.
Getting the CatchObj working is simply a matter of threading another
static alloca through codegen, first as an alloca, then as a frame
index, and finally as a frame offset.
llvm-svn: 247844
The rest of the EH pads are fine, since they have at most one label and
take fewer operands for the personality.
Old catchpad vs. new:
%5 = catchpad [i8* bitcast (i32 ()* @"\01?filt$0@0@main@@" to i8*)] to label %__except.ret.10 unwind label %catchendblock.9
-----
%5 = catchpad [i8* bitcast (i32 ()* @"\01?filt$0@0@main@@" to i8*)]
to label %__except.ret.10 unwind label %catchendblock.9
llvm-svn: 247433
This is a follow up to http://reviews.llvm.org/D11995 implementing the suggestion by Hans.
If we know some of the bits of the value being switched on, we know that the maximum number of unique cases covers the unknown bits. This allows to eliminate switch defaults for large integers (i32) when most bits in the value are known.
Note that I had to make the transform contingent on not having any dead cases. This is conservatively correct with the old code, but required for the new code since we might have a dead case which varies one of the known bits. Counting that towards our number of covering cases would be bad. If we do have dead cases, we'll eliminate them first, then revisit the possibly dead default.
Differential Revision: http://reviews.llvm.org/D12497
llvm-svn: 247309
This reverts isSafeToSpeculativelyExecute's use of ReadNone until we
split ReadNone into two pieces: one attribute which reasons about how
the function reasons about memory and another attribute which determines
how it may be speculated, CSE'd, trap, etc.
llvm-svn: 246331
As a follow-up to r246098, require `DISubprogram` definitions
(`isDefinition: true`) to be 'distinct'. Specifically, add an assembler
check, a verifier check, and bitcode upgrading logic to combat testcase
bitrot after the `DIBuilder` change.
While working on the testcases, I realized that
test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its
purpose was to check for a corner case in PR22792 where two subprogram
definitions match exactly and share the same metadata node. The new
verifier check, requiring that subprogram definitions are 'distinct',
precludes that possibility.
I updated almost all the IR with the following script:
git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' |
grep -v test/Bitcode |
xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/'
Likely some variant of would work for out-of-tree testcases.
llvm-svn: 246327
Any call which is side effect free is trivially OK to speculate. We
already had similar logic in EarlyCSE and GVN but we were missing it
from isSafeToSpeculativelyExecute.
This fixes PR24601.
llvm-svn: 246232
As Sanjoy pointed out over in http://reviews.llvm.org/D11819, a switch on an icmp should always be able to become a branch instruction. This patch generalizes that notion slightly to prove that the default case of a switch is unreachable if the cases completely cover all possible bit patterns in the condition. Once that's done, the switch to branch conversion kicks in just fine.
Note: Duplicate case values are disallowed by the LangRef and verifier.
Differential Revision: http://reviews.llvm.org/D11995
llvm-svn: 246125
Instruction::dropUnknownMetadata(KnownSet) is supposed to preserve all
metadata in KnownSet, but the condition for DebugLocs was inverted.
Most users of dropUnknownMetadata() actually worked around this by not
adding LLVMContext::MD_dbg to their list of KnowIDs.
This is now made explicit.
llvm-svn: 245589
Summary: llvm::ConstantFoldTerminator function can convert SwitchInst with single case (and default) to a conditional BranchInst. This patch adds support to preserve make.implicit metadata on this conversion.
Reviewers: sanjoy, weimingz, chenli
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D11841
llvm-svn: 244348
Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s.
The backend is liable to start relying on that (if it hasn't already),
so make uniquable `DICompileUnit`s illegal and automatically upgrade old
bitcode. This is a nice cleanup, since we can remove an unnecessary
`DenseSet` (and the associated uniquing info) from `LLVMContextImpl`.
Almost all the testcases were updated with this script:
git grep -e '= !DICompileUnit' -l -- test |
grep -v test/Bitcode |
xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,'
I imagine something similar should work for out-of-tree testcases.
llvm-svn: 243885
Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags,
using `DW_TAG_variable` in their place Stop exposing the `tag:` field at
all in the assembly format for `DILocalVariable`.
Most of the testcase updates were generated by the following sed script:
find test/ -name "*.ll" -o -name "*.mir" |
xargs grep -l 'DILocalVariable' |
xargs sed -i '' \
-e 's/tag: DW_TAG_arg_variable, //' \
-e 's/tag: DW_TAG_auto_variable, //'
There were only a handful of tests in `test/Assembly` that I needed to
update by hand.
(Note: a follow-up could change `DILocalVariable::DILocalVariable()` to
set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable`
(as appropriate), instead of having that logic magically in the backend
in `DbgVariable`. I've added a FIXME to that effect.)
llvm-svn: 243774
The personality routine currently lives in the LandingPadInst.
This isn't desirable because:
- All LandingPadInsts in the same function must have the same
personality routine. This means that each LandingPadInst beyond the
first has an operand which produces no additional information.
- There is ongoing work to introduce EH IR constructs other than
LandingPadInst. Moving the personality routine off of any one
particular Instruction and onto the parent function seems a lot better
than have N different places a personality function can sneak onto an
exceptional function.
Differential Revision: http://reviews.llvm.org/D10429
llvm-svn: 239940
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`. The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.
Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one. It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs. YMMV of
course.
Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py. I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three. It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).
Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.
llvm-svn: 236120
When using bit tests for hole checks, we call AddPredecessorToBlock to give the
phi node a value from the bit test block. This would break if we've
previously called removePredecessor on the default destination because the
switch is fully covered.
Test case by Mark Lacey.
llvm-svn: 235771
See r230786 and r230794 for similar changes to gep and load
respectively.
Call is a bit different because it often doesn't have a single explicit
type - usually the type is deduced from the arguments, and just the
return type is explicit. In those cases there's no need to change the
IR.
When that's not the case, the IR usually contains the pointer type of
the first operand - but since typed pointers are going away, that
representation is insufficient so I'm just stripping the "pointerness"
of the explicit type away.
This does make the IR a bit weird - it /sort of/ reads like the type of
the first operand: "call void () %x(" but %x is actually of type "void
()*" and will eventually be just of type "ptr". But this seems not too
bad and I don't think it would benefit from repeating the type
("void (), void () * %x(" and then eventually "void (), ptr %x(") as has
been done with gep and load.
This also has a side benefit: since the explicit type is no longer a
pointer, there's no ambiguity between an explicit type and a function
that returns a function pointer. Previously this case needed an explicit
type (eg: a function returning a void() function was written as
"call void () () * @x(" rather than "call void () * @x(" because of the
ambiguity between a function returning a pointer to a void() function
and a function returning void).
No ambiguity means even function pointer return types can just be
written alone, without writing the whole function's type.
This leaves /only/ the varargs case where the explicit type is required.
Given the special type syntax in call instructions, the regex-fu used
for migration was a bit more involved in its own unique way (as every
one of these is) so here it is. Use it in conjunction with the apply.sh
script and associated find/xargs commands I've provided in rr230786 to
migrate your out of tree tests. Do let me know if any of this doesn't
cover your cases & we can iterate on a more general script/regexes to
help others with out of tree tests.
About 9 test cases couldn't be automatically migrated - half of those
were functions returning function pointers, where I just had to manually
delete the function argument types now that we didn't need an explicit
function type there. The other half were typedefs of function types used
in calls - just had to manually drop the * from those.
import fileinput
import sys
import re
pat = re.compile(r'((?:=|:|^|\s)call\s(?:[^@]*?))(\s*$|\s*(?:(?:\[\[[a-zA-Z0-9_]+\]\]|[@%](?:(")?[\\\?@a-zA-Z0-9_.]*?(?(3)"|)|{{.*}}))(?:\(|$)|undef|inttoptr|bitcast|null|asm).*$)')
addrspace_end = re.compile(r"addrspace\(\d+\)\s*\*$")
func_end = re.compile("(?:void.*|\)\s*)\*$")
def conv(match, line):
if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)):
return line
return line[:match.start()] + match.group(1)[:match.group(1).rfind('*')].rstrip() + match.group(2) + line[match.end():]
for line in sys.stdin:
sys.stdout.write(conv(re.search(pat, line), line))
llvm-svn: 235145
Change `llc` and `opt` to run `verifyModule()`. This ensures that we
check the full module before `FunctionPass::doInitialization()` ever
gets called (I was getting crashes in `DwarfDebug` instead of verifier
failures when testing a WIP patch that checks operands of compile
units). In `opt`, also move up debug-info-stripping so that it still
runs before verification.
There was a fair bit of broken code that was sitting in tree.
Interestingly, some were cases of a `select` that referred to itself in
`-instcombine` tests (apparently an intermediate result). I split them
off to `*-noverify.ll` tests with RUN lines like this:
opt < %s -S -disable-verify -instcombine | opt -S | FileCheck %s
This avoids verifying the input file (so we can get the broken code into
`-instcombine), but still verifies the output with a second call to
`opt` (to verify that `-instcombine` will clean it up like it should).
llvm-svn: 233432
Fix debug info in these tests, which started failing with a WIP patch to
verify compile units and types. The problems look like they were all
caused by bitrot. They fell into these categories:
- Using `!{i32 0}` instead of `!{}`.
- Using `!{null}` instead of `!{}`.
- Using `!MDExpression()` instead of `!{}`.
- Using `!8` instead of `!{!8}`.
- `file:` references that pointed at `MDCompileUnit`s instead of the
same `MDFile` as the compile unit.
- `file:` references that were numerically off-by-one or (off-by-ten).
llvm-svn: 233415
This patch tries to merge duplicate landing pads when they branch to a common shared target.
Given IR that looks like this:
lpad1:
%exn = landingpad {i8*, i32} personality i32 (...)* @__gxx_personality_v0
cleanup
br label %shared_resume
lpad2:
%exn2 = landingpad {i8*, i32} personality i32 (...)* @__gxx_personality_v0
cleanup
br label %shared_resume
shared_resume:
call void @fn()
ret void
}
We can rewrite the users of both landing pad blocks to use one of them. This will generally allow the shared_resume block to be merged with the common landing pad as well.
Without this change, tail duplication would likely kick in - creating N (2 in this case) copies of the shared_resume basic block.
Differential Revision: http://reviews.llvm.org/D8297
llvm-svn: 233125
Verify that debug info intrinsic arguments are valid. (These checks
will not recurse through the full debug info graph, so they don't need
to be cordoned of in `DebugInfoVerifier`.)
With those checks in place, changing the `DbgIntrinsicInst` accessors to
downcast to `MDLocalVariable` and `MDExpression` is natural (added isa
specializations in `Metadata.h` to support this).
Added tests to `test/Verifier` for the new -verify checks, and fixed the
debug info in all the in-tree tests.
If you have out-of-tree testcases that have started to fail to -verify,
hopefully the verify checks are helpful. The most likely problem is
that the expression argument is `!{}` (instead of `!MDExpression()`).
llvm-svn: 232296
Similar to gep (r230786) and load (r230794) changes.
Similar migration script can be used to update test cases, which
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manually changes in Clang.
(this script will read the contents of stdin and massage it into stdout
- wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases)
import fileinput
import sys
import re
rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)
def conv(match):
line = match.group(1)
line += match.group(4)
line += ", "
line += match.group(2)
return line
line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
sys.stdout.write(line[off:match.start()])
sys.stdout.write(conv(match))
off = match.end()
sys.stdout.write(line[off:])
llvm-svn: 232184
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464. I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned. Let me know if I'm wrong :).
The code changes are fairly mechanical:
- Bumped the "Debug Info Version".
- `DIBuilder` now creates the appropriate subclass of `MDNode`.
- Subclasses of DIDescriptor now expect to hold their "MD"
counterparts (e.g., `DIBasicType` expects `MDBasicType`).
- Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
for printing comments.
- Big update to LangRef to describe the nodes in the new hierarchy.
Feel free to make it better.
Testcase changes are enormous. There's an accompanying clang commit on
its way.
If you have out-of-tree debug info testcases, I just broke your build.
- `upgrade-specialized-nodes.sh` is attached to PR22564. I used it to
update all the IR testcases.
- Unfortunately I failed to find way to script the updates to CHECK
lines, so I updated all of these by hand. This was fairly painful,
since the old CHECKs are difficult to reason about. That's one of
the benefits of the new hierarchy.
This work isn't quite finished, BTW. The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro). Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers. I
also expect to make a few schema changes now that it's easier to reason
about everything.
llvm-svn: 231082
Essentially the same as the GEP change in r230786.
A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)
import fileinput
import sys
import re
pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")
for line in sys.stdin:
sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7649
llvm-svn: 230794
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.
This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.
* This doesn't modify gep operators, only instructions (operators will be
handled separately)
* Textual IR changes only. Bitcode (including upgrade) and changing the
in-memory representation will be in separate changes.
* geps of vectors are transformed as:
getelementptr <4 x float*> %x, ...
->getelementptr float, <4 x float*> %x, ...
Then, once the opaque pointer type is introduced, this will ultimately look
like:
getelementptr float, <4 x ptr> %x
with the unambiguous interpretation that it is a vector of pointers to float.
* address spaces remain on the pointer, not the type:
getelementptr float addrspace(1)* %x
->getelementptr float, float addrspace(1)* %x
Then, eventually:
getelementptr float, ptr addrspace(1) %x
Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.
update.py:
import fileinput
import sys
import re
ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile( r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
def conv(match, line):
if not match:
return line
line = match.groups()[0]
if len(match.groups()[5]) == 0:
line += match.groups()[2]
line += match.groups()[3]
line += ", "
line += match.groups()[1]
line += "\n"
return line
for line in sys.stdin:
if line.find("getelementptr ") == line.find("getelementptr inbounds"):
if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
line = conv(re.match(ibrep, line), line)
elif line.find("getelementptr ") != line.find("getelementptr ("):
line = conv(re.match(normrep, line), line)
sys.stdout.write(line)
apply.sh:
for name in "$@"
do
python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
rm -f "$name.tmp"
done
The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh
After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).
The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7636
llvm-svn: 230786
This patch adds the isProfitableToHoist API. For AArch64, we want to prevent a
fmul from being hoisted in cases where it is more profitable to form a
fmsub/fmadd.
Phabricator Review: http://reviews.llvm.org/D7299
Patch by Lawrence Hu <lawrence@codeaurora.org>
llvm-svn: 230241
SimplifyCFG now knows how to speculate calls to intrinsic cttz/ctlz that are
'cheap' for the target. Therefore, some of the logic in CodeGenPrepare
that was originally added at revision 224899 can now be removed.
This patch is basically a no functional change. It removes the duplicated
logic in CodeGenPrepare and converts all the existing target specific tests
for cttz/ctlz into SimplifyCFG tests.
Differential Revision: http://reviews.llvm.org/D7608
llvm-svn: 229105
Now that SimplifyCFG uses TTI for the cost heuristic, we can teach BasicTTIImpl
how to query TLI in order to get a more accurate cost for truncates and
zero-extends.
Before this patch, the basic cost heuristic in TargetTransformInfoImplCRTPBase
would have conservatively returned a 'default' TCC_Basic for all zero-extends,
and TCC_Free for truncates on native types.
This patch improves the heuristic so that we query TLI (if available) to get
more accurate answers. If TLI is available, then methods 'isZExtFree' and
'isTruncateFree' can be used to check if a zext/trunc is free for the target.
Added more test cases to SimplifyCFG/X86/speculate-cttz-ctlz.ll.
With this change, SimplifyCFG is now able to speculate a 'cheap' cttz/ctlz
immediately followed by a free zext/trunc.
Differential Revision: http://reviews.llvm.org/D7585
llvm-svn: 228923
This patch is a follow-up of r228826 (see code-review: D7506).
Now that SimplifyCFG uses TargetTransformInfo for cost analysis, we
have to fix the cost heuristic for intrinsic calls to cttz/ctlz.
This patch defines method 'getIntrinsicCost' in BasicTTIImpl: now, BasicTTIImpl
queries TLI to check if a call to cttz/ctlz is cheap for the target.
Added test cases in Transforms/SimplifyCFG/X86 to verify that on x86,
SimplifyCFG only speculates a call to cttz/ctlz if it is cheap.
Differential Revision: http://reviews.llvm.org/D7554
llvm-svn: 228829
analysis.
We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness"
heuristic and use TTI directly. Generally NFC intended, but we're using a slightly
different heuristic now so there is a slight test churn.
Test changes:
* combine-comparisons-by-cse.ll: Removed unneeded branch check.
* 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
* coalesce-subregs.ll: Superfluous block check.
* 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
* PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
* select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.
llvm-svn: 228826
If the landingpad of the invoke is using a personality function that
catches asynch exceptions, then it can catch a trap.
Also add some landingpads to invalid LLVM IR test cases that lack them.
Over-the-shoulder reviewed by David Majnemer.
llvm-svn: 228782
This should be sufficient to replace the initial (minor) function pass
pipeline in Clang with the new pass manager. I'll probably add an (off
by default) flag to do that just to ensure we can get extra testing.
llvm-svn: 227726
An unreachable default destination can be exploited by other optimizations and
allows for more efficient lowering. Both the SDag switch lowering and
LowerSwitch can exploit unreachable defaults.
Also make TurnSwitchRangeICmp handle switches with unreachable default.
This is kind of separate change, but it cannot be tested without the change
above, and I don't want to land the change above without this since that would
regress other tests.
Differential Revision: http://reviews.llvm.org/D6471
llvm-svn: 227125
This reverts commit r176827.
Björn Steinbrink pointed out that this didn't actually fix the bug
(PR15555) it was attempting to fix.
With this reverted, we can now remove landingpad cleanups that
immediately resume unwinding, converting the invoke to a call.
llvm-svn: 226850
This commit moves `MDLocation`, finishing off PR21433. There's an
accompanying clang commit for frontend testcases. I'll attach the
testcase upgrade script I used to PR21433 to help out-of-tree
frontends/backends.
This changes the schema for `DebugLoc` and `DILocation` from:
!{i32 3, i32 7, !7, !8}
to:
!MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)
Note that empty fields (line/column: 0 and inlinedAt: null) don't get
printed by the assembly writer.
llvm-svn: 226048
The previous code assumed that such instructions could not have any uses
outside CaseDest, with the motivation that the instruction could not
dominate CommonDest because CommonDest has phi nodes in it. That simply
isn't true; e.g., CommonDest could have an edge back to itself.
llvm-svn: 225552
- Fix the case where more than 1 common instructions derived from the same
operand cannot be sunk. When a pair of value has more than 1 derived values
in both branches, only 1 derived value could be sunk.
- Replace BB1 -> (BB2, PN) map with joint value map, i.e.
map of (BB1, BB2) -> PN, which is more accurate to track common ops.
llvm-svn: 224757
Now that `Metadata` is typeless, reflect that in the assembly. These
are the matching assembly changes for the metadata/value split in
r223802.
- Only use the `metadata` type when referencing metadata from a call
intrinsic -- i.e., only when it's used as a `Value`.
- Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
when referencing it from call intrinsics.
So, assembly like this:
define @foo(i32 %v) {
call void @llvm.foo(metadata !{i32 %v}, metadata !0)
call void @llvm.foo(metadata !{i32 7}, metadata !0)
call void @llvm.foo(metadata !1, metadata !0)
call void @llvm.foo(metadata !3, metadata !0)
call void @llvm.foo(metadata !{metadata !3}, metadata !0)
ret void, !bar !2
}
!0 = metadata !{metadata !2}
!1 = metadata !{i32* @global}
!2 = metadata !{metadata !3}
!3 = metadata !{}
turns into this:
define @foo(i32 %v) {
call void @llvm.foo(metadata i32 %v, metadata !0)
call void @llvm.foo(metadata i32 7, metadata !0)
call void @llvm.foo(metadata i32* @global, metadata !0)
call void @llvm.foo(metadata !3, metadata !0)
call void @llvm.foo(metadata !{!3}, metadata !0)
ret void, !bar !2
}
!0 = !{!2}
!1 = !{i32* @global}
!2 = !{!3}
!3 = !{}
I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines). I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.
This is part of PR21532.
llvm-svn: 224257
An unreachable default destination can be exploited by other optimizations, and
SDag lowering is now prepared to handle them efficiently.
For example, branches to the unreachable destination will be optimized away,
such as in the case of range checks for switch lookup tables.
On 64-bit Linux, this reduces the size of a clang bootstrap by 80 kB (and
Chromium by 30 kB).
llvm-svn: 223050
Fixed missing dominance check.
Original commit message:
This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
if (idx < tablesize)
r = table[idx]; // table does not contain default_value
else
r = default_value;
if (r != default_value)
...
Is optimized to:
cond = idx < tablesize;
if (cond)
r = table[idx];
else
r = default_value;
if (cond)
...
Jump threading will then eliminate the second if(cond).
llvm-svn: 222891
This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
if (idx < tablesize)
r = table[idx]; // table does not contain default_value
else
r = default_value;
if (r != default_value)
...
Is optimized to:
cond = idx < tablesize;
if (cond)
r = table[idx];
else
r = default_value;
if (cond)
...
\endcode
Jump threading will then eliminate the second if(cond).
llvm-svn: 222872
When converting a switch to a lookup table we might have to generate a bitmaks
to encode and check for holes in the original switch statement.
The type of this mask depends on the number of switch statements, which can
result in illegal types for pretty much all architectures.
To avoid unnecessary type legalization and help FastISel this commit increases
the size of the bitmask to next power-of-2 value when necessary.
This fixes rdar://problem/18984639.
llvm-svn: 222168
This is a simple optimization for switch table lookup:
It computes the output value directly with an (optional) mul and add if there is a linear mapping between index and output.
Example:
int f1(int x) {
switch (x) {
case 0: return 10;
case 1: return 11;
case 2: return 12;
case 3: return 13;
}
return 0;
}
generates:
define i32 @f1(i32 %x) #0 {
entry:
%0 = icmp ult i32 %x, 4
br i1 %0, label %switch.lookup, label %return
switch.lookup:
%switch.offset = add i32 %x, 10
ret i32 %switch.offset
return:
ret i32 0
}
llvm-svn: 222121
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which return NaN if either operand is
to propagate errors. These implement the IEEE-754 semantics
of returning the other operand if either is a NaN representing
missing data.
llvm-svn: 220341
This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block
and a test to check that this condition is handled.
llvm-svn: 219656
instead
We used to transform this:
define void @test6(i1 %cond, i8* %ptr) {
entry:
br i1 %cond, label %bb1, label %bb2
bb1:
br label %bb2
bb2:
%ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
store i8 2, i8* %ptr.2, align 8
ret void
}
into this:
define void @test6(i1 %cond, i8* %ptr) {
%ptr.2 = select i1 %cond, i8* null, i8* %ptr
store i8 2, i8* %ptr.2, align 8
ret void
}
because the simplifycfg transformation into selects would happen to happen
before the simplifycfg transformation that removes unreachable control flow
(We have 'unreachable control flow' due to the store to null which is undefined
behavior).
The existing transformation that removes unreachable control flow in simplifycfg
is:
/// If BB has an incoming value that will always trigger undefined behavior
/// (eg. null pointer dereference), remove the branch leading here.
static bool removeUndefIntroducingPredecessor(BasicBlock *BB)
Now we generate:
define void @test6(i1 %cond, i8* %ptr) {
store i8 2, i8* %ptr.2, align 8
ret void
}
I did not see any impact on the test-suite + externals.
rdar://18596215
llvm-svn: 219462
This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block
to a select or a couple of selects (depending if the default block is reachable or not).
The typical case this optimization wants to be able to optimize is this one:
Example:
switch (a) {
case 10: %0 = icmp eq i32 %a, 10
return 10; %1 = select i1 %0, i32 10, i32 4
case 20: ----> %2 = icmp eq i32 %a, 20
return 2; %3 = select i1 %2, i32 2, i32 %1
default:
return 4;
}
It also sets the base for further optimizations that are planned and being reviewed.
llvm-svn: 219223
This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash. The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).
Original commit message follows.
--
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
llvm-svn: 219010
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
llvm-svn: 218914
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
llvm-svn: 218778
Summary:
This patch adds a threshold that controls the number of bonus instructions
allowed for folding branches with common destination. The original code allows
at most one bonus instruction. With this patch, users can customize the
threshold to allow multiple bonus instructions. The default threshold is still
1, so that the code behaves the same as before when users do not specify this
threshold.
The motivation of this change is that tuning this threshold significantly (up
to 25%) improves the performance of some CUDA programs in our internal code
base. In general, branch instructions are very expensive for GPU programs.
Therefore, it is sometimes worth trading more arithmetic computation for a more
straightened control flow. Here's a reduced example:
__global__ void foo(int a, int b, int c, int d, int e, int n,
const int *input, int *output) {
int sum = 0;
for (int i = 0; i < n; ++i)
sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
*output = sum;
}
The select statement in the loop body translates to two branch instructions "if
((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
With the default threshold, SimplifyCFG is unable to fold them, because
computing the condition of the second branch "(i | c) ^ d > e" requires two
bonus instructions. With the threshold increased, SimplifyCFG can fold the two
branches so that the loop body contains only one branch, making the code
conceptually look like:
sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];
Increasing the threshold significantly improves the performance of this
particular example. In the configuration where both conditions are guaranteed
to be true, increasing the threshold from 1 to 2 improves the performance by
18.24%. Even in the configuration where the first condition is false and the
second condition is true, which favors shortcuts, increasing the threshold from
1 to 2 still improves the performance by 4.35%.
We are still looking for a good threshold and maybe a better cost model than
just counting the number of bonus instructions. However, according to the above
numbers, we think it is at least worth adding a threshold to enable more
experiments and tuning. Let me know what you think. Thanks!
Test Plan: Added one test case to check the threshold is in effect
Reviewers: nadav, eliben, meheff, resistor, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D5529
llvm-svn: 218711
The lifetime intrinsics need some work in order to make it clear which
optimizations are or are not valid.
For now dropping this optimization avoids a miscompilation.
Patch by Björn Steinbrink.
llvm-svn: 214336
This is the first commit in a series that add an @llvm.assume intrinsic which
can be used to provide the optimizer with a condition it may assume to be true
(when the control flow would hit the intrinsic call). Some basic properties are added here:
- llvm.invariant(true) is dead.
- llvm.invariant(false) is unreachable (this directly corresponds to the
documented behavior of MSVC's __assume(0)), so is llvm.invariant(undef).
The intrinsic is tagged as writing arbitrarily, in order to maintain control
dependencies. BasicAA has been updated, however, to return NoModRef for any
particular location-based query so that we don't unnecessarily block code
motion.
llvm-svn: 213973
We use gep to access the global array "switch.table", and the table index
should be treated as unsigned. When the highest bit is 1, this commit
zero-extends the index to an integer type with larger size.
For a switch on i2, we used to generate:
%switch.tableidx = sub i2 %0, -2
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx
It is incorrect when %switch.tableidx is 2 or 3. The fix is to generate
%switch.tableidx = sub i2 %0, -2
%switch.tableidx.zext = zext i2 %switch.tableidx to i3
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext
rdar://17735071
llvm-svn: 213815
This patch adds to an existing loop over phi nodes in SimplifyCondBranchToCondBranch() to check for trapping ops and bails out of the optimization if we find one of those.
The test cases verify that trapping ops are not hoisted and non-trapping ops are still optimized as expected.
llvm-svn: 212490
We would previously put dllimport variables in switch lookup tables, which
doesn't work because the address cannot be used in a constant initializer.
This is basically the same problem that we have in PR19955.
Putting TLS variables in switch tables also desn't work, because the
address of such a variable is not constant.
Differential Revision: http://reviews.llvm.org/D4220
llvm-svn: 211331
Since ExtractValue is not included in ComputeSpeculationCost CFGs containing
ExtractValueInsts cannot be simplified. In particular this interacts with
InstCombineCompare's tendency to insert add.with.overflow intrinsics for
certain idiomatic math operations, preventing optimization.
This patch adds ExtractValue to the ComputeSpeculationCost. Test case included
rdar://14853450
llvm-svn: 208434
This allows us to generate table lookups for code such as:
unsigned test(unsigned x) {
switch (x) {
case 100: return 0;
case 101: return 1;
case 103: return 2;
case 105: return 3;
case 107: return 4;
case 109: return 5;
case 110: return 6;
default: return f(x);
}
}
Since cases 102, 104, etc. are not constants, the lookup table has holes
in those positions. We therefore guard the table lookup with a bitmask check.
Patch by Jasper Neumann!
llvm-svn: 203694
The syntax for "cmpxchg" should now look something like:
cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
rdar://problem/15996804
llvm-svn: 203559
When simplifycfg moves an instruction, it must drop metadata it doesn't know
is still valid with the preconditions changes. In particular, it must drop
the range and tbaa metadata.
The patch implements this with an utility function to drop all metadata not
in a white list.
llvm-svn: 200322
uint32.
When folding branches to common destination, the updated branch weights
can exceed uint32 by more than factor of 2. We should keep halving the
weights until they can fit into uint32.
llvm-svn: 200262
There has been an old FIXME to find the right cut-off for when it's worth
analyzing and potentially transforming a switch to a lookup table.
The switches always have two or more cases. I could not measure any speed-up
by transforming a switch with two cases. A switch with three cases gets a nice
speed-up, and I couldn't measure any compile-time regression, so I think this
is the right threshold.
In a Clang self-host, this causes 480 new switches to be transformed,
and reduces the final binary size with 8 KB.
llvm-svn: 199294
case when the lookup table doesn't have any holes.
This means we can build a lookup table for switches like this:
switch (x) {
case 0: return 1;
case 1: return 2;
case 2: return 3;
case 3: return 4;
default: exit(1);
}
The default case doesn't yield a constant result here, but that doesn't matter,
since a default result is only necessary for filling holes in the lookup table,
and this table doesn't have any holes.
This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB
off the resulting clang binary.
llvm-svn: 199025
If we happen to eliminate every case in a switch that has branch
weights, we currently try to create metadata for the one remaining
branch, triggering an assert. Instead, we need to check that the
metadata we're trying to create is sensible.
llvm-svn: 197791
This avoids creating branch weight metadata of length one when we fold
cases into the default of a switch instruction, which was triggering
an assert.
llvm-svn: 196845
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.
Make tests more robust by removing hard-coded metadata numbers in CHECK lines.
llvm-svn: 195535
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.
llvm-svn: 195504
One optimization simplify-cfg performs is the converting of switches to
lookup tables if the switch has > 4 cases. This is done by:
1. Finding the max/min case value and calculating the switch case range.
2. Create a lookup table basic block.
3. Perform a check in the switch's BB to see if the input value is in
the switch's case range. If the input value satisfies said predicate
branch to the lookup table BB, otherwise branch to the switch's default
destination BB using the default value as the result.
The conditional check consists of subtracting the min case value of the
table from any input iN value and then ensuring that said value is
unsigned less than the size of the lookup table represented as an iN
value.
If the lookup table is a covered lookup table, the size of the table will be N
which is 0 as an iN value. Thus the comparison will be an `icmp ult` of an iN
value against 0 which is always false yielding the incorrect result.
This patch fixes this problem by recognizing if we have a covered lookup table
and if we do, unconditionally jumps to the lookup table BB since the covering
property of the lookup table implies no input values could not be handled by
said BB.
rdar://15268442
llvm-svn: 193045
Field 2 of DIType (Context), field 9 of DIDerivedType (TypeDerivedFrom),
field 12 of DICompositeType (ContainingType), fields 2, 7, 12 of DISubprogram
(Context, Type, ContainingType).
llvm-svn: 190205
DICompositeType will have an identifier field at position 14. For now, the
field is set to null in DIBuilder.
For DICompositeTypes where the template argument field (the 13th field)
was optional, modify DIBuilder to make sure the template argument field is set.
Now DICompositeType has 15 fields.
Update DIBuilder to use NULL instead of "i32 0" for null value of a MDNode.
Update verifier to check that DICompositeType has 15 fields and the last
field is null or a MDString.
Update testing cases to include an extra field for DICompositeType.
The identifier field will be used by type uniquing so a front end can
genearte a DICompositeType with a unique identifer.
llvm-svn: 189282
- Instead of setting the suffixes in a bunch of places, just set one master
list in the top-level config. We now only modify the suffix list in a few
suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
- Aside from removing the need for a bunch of lit.local.cfg files, this enables
4 tests that were inadvertently being skipped (one in
Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
XFAILED).
- This commit also fixes a bunch of config files to use config.root instead of
older copy-pasted code.
llvm-svn: 188513
Function attributes are the future! So just query whether we want to realign the
stack directly from the function instead of through a random target options
structure.
llvm-svn: 187618
Also always add DIType, DISubprogram and DIGlobalVariable to the list
in DebugInfoFinder without checking them, so we can verify them later
on.
llvm-svn: 187285
This conversion was done with the following bash script:
find test/Transforms -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)define\([^@]*\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3define\4@$FUNC(/g" $TEMP
done
mv $TEMP $NAME
fi
done
llvm-svn: 186269
This update was done with the following bash script:
find test/Transforms -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
done
mv $TEMP $NAME
fi
done
llvm-svn: 186268
predecessors of the two blocks it is attempting to merge supply the
same incoming values to any phi in the successor block. This change
allows merging in the case where there is one or more incoming values
that are undef. The undef values are rewritten to match the non-undef
value that flows from the other edge. Patch by Mark Lacey.
llvm-svn: 186069
This allows us to create switches even if instcombine has munged two of the
incombing compares into one and some bit twiddling. This was motivated by enum
compares that are common in clang.
llvm-svn: 185632
The problem this time seems to be a thinko. We were assuming that in the CFG
A
| \
| B
| /
C
speculating the basic block B would cause only the phi value for the B->C edge
to be speculated. That is not true, the phi's are semantically in the edges, so
if the A->B->C path is taken, any code needed for A->C is not executed and we
have to consider it too when deciding to speculate B.
llvm-svn: 183226
PR16069 is an interesting case where an incoming value to a PHI is a
trap value while also being a 'ConstantExpr'.
We do not consider this case when performing the 'HoistThenElseCodeToIf'
optimization.
Instead, make our modifications more conservative if we detect that we
cannot transform the PHI to a select.
llvm-svn: 183152
This resurrects r179957, but adds code that makes sure we don't touch
atomic/volatile stores:
This transformation will transform a conditional store with a preceeding
uncondtional store to the same location:
a[i] =
may-alias with a[i] load
if (cond)
a[i] = Y
into an unconditional store.
a[i] = X
may-alias with a[i] load
tmp = cond ? Y : X;
a[i] = tmp
We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outway the potential case where the branch would be correctly predicted
and the cost of the executing the second store would be noticably reflected in
performance.
hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2 % performance improvement on a ARM swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducable below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.
llvm-svn: 180731
There is the temptation to make this tranform dependent on target information as
it is not going to be beneficial on all (sub)targets. Therefore, we should
probably do this in MI Early-Ifconversion.
This reverts commit r179957. Original commit message:
"SimplifyCFG: If convert single conditional stores
This transformation will transform a conditional store with a preceeding
uncondtional store to the same location:
a[i] =
may-alias with a[i] load
if (cond)
a[i] = Y
into an unconditional store.
a[i] = X
may-alias with a[i] load
tmp = cond ? Y : X;
a[i] = tmp
We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outway the potential case were the branch would be correctly predicted
and the cost of the executing the second store would be noticably reflected in
performance.
hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2 % performance improvement on a ARM swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducable below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.
I am going to watch performance numbers across the builtbots and will revert
this if anything unexpected comes up."
llvm-svn: 179980
This transformation will transform a conditional store with a preceeding
uncondtional store to the same location:
a[i] =
may-alias with a[i] load
if (cond)
a[i] = Y
into an unconditional store.
a[i] = X
may-alias with a[i] load
tmp = cond ? Y : X;
a[i] = tmp
We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outway the potential case were the branch would be correctly predicted
and the cost of the executing the second store would be noticably reflected in
performance.
hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2 % performance improvement on a ARM swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducable below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.
I am going to watch performance numbers across the builtbots and will revert
this if anything unexpected comes up.
llvm-svn: 179957
If a switch instruction has a case for every possible value of its type,
with the same successor, SimplifyCFG would replace it with an icmp ult,
but the computation of the bound overflows in that case, which inverts
the test.
Patch by Jed Davis!
llvm-svn: 179587
Fixes rdar:13349374.
Volatile loads and stores need to be preserved even if the language
standard says they are undefined. "volatile" in this context means "get
out of the way compiler, let my platform handle it".
Additionally, this is the only way I know of with llvm to write to the
first page (when hardware allows) without dropping to assembly.
llvm-svn: 176599
Listing all of the attributes for the callee of a call/invoke instruction is way
too much and makes the IR unreadable. Use references to attributes instead.
llvm-svn: 175877
loops over instructions in the basic block or the use-def list of the
value, neither of which are really efficient when repeatedly querying
about values in the same basic block.
What's more, we already know that the CondBB is small, and so we can do
a much more efficient test by counting the uses in CondBB, and seeing if
those account for all of the uses.
Finally, we shouldn't blanket fail on any such instruction, instead we
should conservatively assume that those instructions are part of the
cost.
Note that this actually fixes a bug in the pass because
isUsedInBasicBlock has a really terrible bug in it. I'll fix that in my
next commit, but the fix for it would make this code suddenly take the
compile time hit I thought it already was taking, so I wanted to go
ahead and migrate this code to a faster & better pattern.
The bug in isUsedInBasicBlock was also causing other tests to test the
wrong thing entirely: for example we weren't actually disabling
speculation for floating point operations as intended (and tested), but
the test passed because we failed to speculate them due to the
isUsedInBasicBlock failure.
llvm-svn: 173417
Original commit message:
Plug TTI into the speculation logic, giving it a real cost interface
that can be specialized by targets.
The goal here is not to be more aggressive, but to just be more accurate
with very obvious cases. There are instructions which are known to be
truly free and which were not being modeled as such in this code -- see
the regression test which is distilled from an inner loop of zlib.
Everywhere the TTI cost model is insufficiently conservative I've added
explicit checks with FIXME comments to go add proper modelling of these
cost factors.
If this causes regressions, the likely solution is to make TTI even more
conservative in its cost estimates, but test cases will help here.
llvm-svn: 173357
that can be specialized by targets.
The goal here is not to be more aggressive, but to just be more accurate
with very obvious cases. There are instructions which are known to be
truly free and which were not being modeled as such in this code -- see
the regression test which is distilled from an inner loop of zlib.
Everywhere the TTI cost model is insufficiently conservative I've added
explicit checks with FIXME comments to go add proper modelling of these
cost factors.
If this causes regressions, the likely solution is to make TTI even more
conservative in its cost estimates, but test cases will help here.
llvm-svn: 173342
a cost fuction that seems both a bit ad-hoc and also poorly suited to
evaluating constant expressions.
Notably, it is missing any support for trivial expressions such as
'inttoptr'. I could fix this routine, but it isn't clear to me all of
the constraints its other users are operating under.
The core protection that seems relevant here is avoiding the formation
of a select instruction wich a further chain of select operations in
a constant expression operand. Just explicitly encode that constraint.
Also, update the comments and organization here to make it clear where
this needs to go -- this should be driven off of real cost measurements
which take into account the number of constants expressions and the
depth of the constant expression tree.
llvm-svn: 173340
By propagating the value for the switch condition, LLVM can now build
lookup tables for code such as:
switch (x) {
case 1: return 5;
case 2: return 42;
case 3: case 4: case 5:
return x - 123;
default:
return 123;
}
Given that x is known for each case, "x - 123" becomes a constant for
cases 3, 4, and 5.
llvm-svn: 167115
When the switch-to-lookup tables transform landed in SimplifyCFG, it
was pointed out that this could be inappropriate for some targets.
Since there was no way at the time for the pass to know anything about
the target, an awkward reverse-transform was added in CodeGenPrepare
that turned lookup tables back into switches for some targets.
This patch uses the new TargetTransformInfo to determine if a
switch should be transformed, and removes
CodeGenPrepare::ConvertLoadToSwitch.
llvm-svn: 167011
The isValueEqualityComparison() guard at the top of SimplifySwitch()
only applies to some of the possible transformations.
The newer transformations work just fine on large switches, and the
check on predecessor count is nonsensical.
llvm-svn: 166710
We conservatively only check the first use to avoid walking long use chains.
This catches the common case of having both a load and a store to a pointer
supplied by a PHI node.
llvm-svn: 165232
If the width is very large it gets truncated from uint64_t to uint32_t when
passed to TD->fitsInLegalInteger. The truncated value can fit in a register.
This manifested in massive memory usage or crashes (PR13946).
llvm-svn: 164784
- Put statistics in alphabetical order
- Don't use getZextValue when building TableInt, just use APInts
- Introduce Create{Z,S}ExtOrTrunc in IRBuilder.
llvm-svn: 164696
tables in bitmaps when they fit in a target-legal register.
This saves some space, and it also allows for building tables that would
otherwise be deemed too sparse.
One interesting case that this hits is example 7 from
http://blog.regehr.org/archives/320. We currently generate good code
for this when lowering the switch to the selection DAG: we build a
bitmask to decide whether to jump to one block or the other. My patch
will result in the same bitmask, but it removes the need for the jump,
as the return value can just be retrieved from the mask.
llvm-svn: 164684
We already have HoistThenElseCodeToIf, this patch implements
SinkThenElseCodeToEnd. When END block has only two predecessors and each
predecessor terminates with unconditional branches, we compare instructions in
IF and ELSE blocks backwards and check whether we can sink the common
instructions down.
rdar://12191395
llvm-svn: 164325
two variables where the first variable is returned and the second
ignored.
I don't think this occurs in practice (other passes should have cleaned
up the unused phi node), but it should still be handled correctly.
Also make the logic for determining if we should return early less
sketchy.
llvm-svn: 164225
destination.
Updated previous implementation to fix a case not covered:
// PBI: br i1 %x, TrueDest, BB
// BI: br i1 %y, TrueDest, FalseDest
The other case was handled correctly.
// PBI: br i1 %x, BB, FalseDest
// BI: br i1 %y, TrueDest, FalseDest
Also tried to use 64-bit arithmetic instead of APInt with scale to simplify the
computation. Let me know if you have other opinions about this.
llvm-svn: 163954
a pair of switch/branch where both depend on the value of the same variable and
the default case of the first switch/branch goes to the second switch/branch.
Code clean up and fixed a few issues:
1> handling the case where some cases of the 2nd switch are invalidated
2> correctly calculate the weight for the 2nd switch when it is a conditional eq
Testing case is modified from Alastair's original patch.
llvm-svn: 163635
The lookup tables did not get built in a deterministic order.
This makes them get built in the order that the corresponding phi nodes
were found.
llvm-svn: 163305
This adds a transformation to SimplifyCFG that attemps to turn switch
instructions into loads from lookup tables. It works on switches that
are only used to initialize one or more phi nodes in a common successor
basic block, for example:
int f(int x) {
switch (x) {
case 0: return 5;
case 1: return 4;
case 2: return -2;
case 5: return 7;
case 6: return 9;
default: return 42;
}
This speeds up the code by removing the hard-to-predict jump, and
reduces code size by removing the code for the jump targets.
llvm-svn: 163302
another mechanical change accomplished though the power of terrible Perl
scripts.
I have manually switched some "s to 's to make escaping simpler.
While I started this to fix tests that aren't run in all configurations,
the massive number of tests is due to a really frustrating fragility of
our testing infrastructure: things like 'grep -v', 'not grep', and
'expected failures' can mask broken tests all too easily.
Essentially, I'm deeply disturbed that I can change the testsuite so
radically without causing any change in results for most platforms. =/
llvm-svn: 159547
This was done through the aid of a terrible Perl creation. I will not
paste any of the horrors here. Suffice to say, it require multiple
staged rounds of replacements, state carried between, and a few
nested-construct-parsing hacks that I'm not proud of. It happens, by
luck, to be able to deal with all the TCL-quoting patterns in evidence
in the LLVM test suite.
If anyone is maintaining large out-of-tree test trees, feel free to poke
me and I'll send you the steps I used to convert things, as well as
answer any painful questions etc. IRC works best for this type of thing
I find.
Once converted, switch the LLVM lit config to use ShTests the same as
Clang. In addition to being able to delete large amounts of Python code
from 'lit', this will also simplify the entire test suite and some of
lit's architecture.
Finally, the test suite runs 33% faster on Linux now. ;]
For my 16-hardware-thread (2x 4-core xeon e5520): 36s -> 24s
llvm-svn: 159525