Avoid running forever by checking we are not reassociating an expression into
the same form.
Tested with @avoid_infinite_loops in nary-add.ll
llvm-svn: 237269
This patch uses the new function profile metadata "function_entry_count"
to annotate entry counts from sample profiles.
In a sampling profile, the total samples collected at the function entry
are an approximation for the number of times that function was invoked.
llvm-svn: 237265
Summary:
This change adds two new parameters to the statepoint intrinsic, `i64 id`
and `i32 num_patch_bytes`. `id` gets propagated to the ID field
in the generated StackMap section. If the `num_patch_bytes` is
non-zero then the statepoint is lowered to `num_patch_bytes` bytes of
nops instead of a call (the spill and reload code remains unchanged).
A non-zero `num_patch_bytes` is useful in situations where a language
runtime requires complete control over how a call is lowered.
This change brings statepoints one step closer to patchpoints. With
some additional work (that is not part of this patch) it should be
possible to get rid of `TargetOpcode::STATEPOINT` altogether.
PlaceSafepoints generates `statepoint` wrappers with `id` set to
`0xABCDEF00` (the old default value for the ID reported in the stackmap)
and `num_patch_bytes` set to `0`. This can be made more sophisticated
later.
Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9546
llvm-svn: 237214
Summary:
If the branch that leads to the PHI node and the Select instruction
depend on correlated conditions, we might be able to directly use the
corresponding value from the Select instruction as the incoming value
for the PHI node, allowing later removal of the select instruction.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9051
llvm-svn: 237201
When relocating a pointer, we need to determine a base pointer for the derived pointer being relocated. We have limited support for handling a pointer extracted from a vector; the current code only handled the case where the entire vector was known to contain base pointers. This patch extends the reasoning to handle chains of insertelements where the indices are constants. This case turns out to be fairly common in vectorized code. We can now handle vectors which contains mixtures of base and derived pointers provided the insertelements use constant indices.
Note that this doesn't solve the general problem. To handle variable indexed insertelements, we'd need to scalarize and introduce conditional branching based on the index. Alternatively, we could eagerly scalarize, but the code structure doesn't currently make either fix easy. The patch also doesn't handle shufflevector or other vector manipulation for much the same reasons. I plan to defer this work until I have a motivating test case.
Differential Revision: http://reviews.llvm.org/D9676
llvm-svn: 237200
As a step towards getting rid of internal pass manager hack entirely, remove the need for loop simplify to run in the inner pass manager. The new code does produce slightly different loop structures, so this isn't technically NFC.
Differential Revision: http://reviews.llvm.org/D9585
llvm-svn: 237172
Summary:
This patch reimplements heuristic that tries to estimate optimization beneftis
from complete loop unrolling.
In this patch I kept the minimal changes - e.g. I removed code handling
branches and folding compares. That's a promising area, but now there
are too many questions to discuss before we can enable it.
Test Plan: Tests are included in the patch.
Reviewers: hfinkel, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8816
llvm-svn: 237156
This fixes another miscompile introduced by r235232: when there was a
dependency on the memcpy destination other than the memset, we would
ignore it, because we only looked at the source dependency.
It was a mistake to use SrcDepInfo. Instead, just use DepInfo.
llvm-svn: 237066
Summary:
In RewriteStatepointsForGC pass, we create a gc_relocate intrinsic for
each relocated pointer, and the gc_relocate has the same type with the
pointer. During the creation of gc_relocate intrinsic, llvm requires to
mangle its type. However, llvm does not support mangling of all possible
types. RewriteStatepointsForGC will hit an assertion failure when it
tries to create a gc_relocate for pointer to vector of pointers because
mangling for vector of pointers is not supported.
This patch changes the way RewriteStatepointsForGC pass creates
gc_relocate. For each relocated pointer, we erase the type of pointers
and create an unified gc_relocate of type i8 addrspace(1)*. Then a
bitcast is inserted to convert the gc_relocate to the correct type. In
this way, gc_relocate does not need to deal with different types of
pointers and the unsupported type mangling is no longer a problem. This
change would also ease further merge when LLVM erases types of pointers
and introduces an unified pointer type.
Some minor changes are also introduced to gc_relocate related part in
InstCombineCalls, CodeGenPrepare, and Verifier accordingly.
Patch by Chen Li!
Reviewers: reames, AndyAyers, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9592
llvm-svn: 237009
The QPX single-precision load/store intrinsics have implied
truncation/extension from/to the declared value type of <4 x double> to the
memory type of <4 x float>. When we can prove the alignment of the pointer
argument, and thus replace the intrinsic with a regular load or store, we need
to load or store the correct data type (<4 x float>) instead of (<4 x double>).
llvm-svn: 236973
This changes the shape of the statepoint intrinsic from:
@llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args)
to:
@llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args)
This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back.
In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation.
Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops.
Differential Revision: http://reviews.llvm.org/D9501
llvm-svn: 236888
Summary:
I noticed this bug when deubging a WIP on LSR. I wonder whether and how we
should add a regression test for this.
Test Plan: no tests failed.
Reviewers: atrick
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D9536
llvm-svn: 236887
Summary:
One step further getting aggregate loads and store being optimized
properly. This will only handle struct with one element at this point.
Test Plan: Added unit tests for the new supported cases.
Reviewers: chandlerc, joker-eph, joker.eph, majnemer
Reviewed By: majnemer
Subscribers: pete, llvm-commits
Differential Revision: http://reviews.llvm.org/D8339
Patch by Amaury Sechet.
From: Amaury Sechet <amaury@fb.com>
llvm-svn: 236695
If we have recognized that a conditional is constant at a particular location in the code (while trying to decide if we can simplify a conditional branch), we can eagerly replace that condition with a constant if it's definition is post dominated by the branch in question.
In practice, this ends up being a compile time savings at most. JumpThreading would have visited each using branch anyways. CVP would have visited the cmp itself again. Unless LVI gives up early, we shouldn't gain any addition power by doing this transformation early. What we do gain is simplicity and compile time.
Differential Revision: http://reviews.llvm.org/D9312
llvm-svn: 236684
options.
This commit fixes a bug in llc and opt where "-mcpu" and "-mattr" wouldn't
override function attributes "-target-cpu" and "-target-features" in the IR.
Differential Revision: http://reviews.llvm.org/D9537
llvm-svn: 236677
Renames the original CreateGCStatepoint to CreateGCStatepointCall, and
moves invoke creating functionality from PlaceSafepoints.cpp to
IRBuilder.cpp.
This changes the labels generated for PlaceSafepoints/invokes.ll so use
a regex there to make the basic block labels more resilient.
llvm-svn: 236672
Summary:
When computing branch weights in BPI, we used to disallow branches with
weight 0. This is a minor nuisance, because a branch with weight 0 is
different to "don't have information". In the context of
instrumentation, it may mean "never executed", in the context of
sampling, it means "never or seldom executed".
In allowing 0 weight branches, I ran into issues with the switch
expansion code in selection DAG. It is currently hardwired to not handle
branches with weight 0. To maintain the current behaviour, I changed it
to use 1 when it finds 0, but perhaps the algorithm needs changes to
tolerate branches with weight zero.
Reviewers: hansw
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9533
llvm-svn: 236617
The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture,
by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce
the cost of overflow check, memory boundary check and extra prologue/epilogue code when
regular unroller will unroll the loop another time. Disable it when VF==1 remove the
unnecessary cost on x86. The same can be done for other platforms after verifying
interleaving/memory bound checking to be not perf critical on those platforms.
Differential Revision: http://reviews.llvm.org/D9515
llvm-svn: 236613
COMDAT groups which have become rendered unused because of inline are
discardable if we can prove that we've made the group empty.
This fixes PR22285.
llvm-svn: 236539
It got this in some cases (if one of them was an identified object), but not in all cases.
This caused stores to undef to block load-forwarding in some cases, etc.
Added test to Transforms/GVN to verify optimization occurs as expected.
llvm-svn: 236511
When optimizing demanded bits of the operands of an Add we have to
remove the nsw/nuw flags as we have no guarantee anymore that we don't
wrap. This is legal here because the top bit is not demanded. In fact
this operaion was already performed but missed in the case of an Add
with a constant on the right side. To fix this this patch refactors the
code to unify the code paths in SimplifyDemandedUseBits() handling of
Add/Sub:
- The transformation of Add->Or is removed from the simplify demand
code because the equivalent transformation exists in
InstCombiner::visitAdd()
- KnownOnes/KnownZero are not adjusted for Add x, C anymore as
computeKnownBits() already performs these computations.
- The simplification of the operands is unified. In this new version
constant on the right side of a Sub are shrunk now as I could not find
a reason why not to do so.
- The special case for clearing nsw/nuw in ShrinkDemandedConstant() is
not necessary anymore as the caller does that already.
Differential Revision: http://reviews.llvm.org/D9415
llvm-svn: 236269
Summary:
Optimizing these well are especially interesting for IRCE since it
"clamps" values by generating this sort of pattern through SCEV
expressions.
Depends on D9352.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9353
llvm-svn: 236203
Summary:
After this change `MatchSelectPattern` recognizes the following form
of SMIN:
Y >s C ? ~Y : ~C == ~Y <s ~C ? ~Y : ~C = SMIN(~Y, ~C)
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9352
llvm-svn: 236202
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`. The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.
Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one. It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs. YMMV of
course.
Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py. I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three. It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).
Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.
llvm-svn: 236120
Summary:
This patch adds constant folding of insertelement instruction to undef value when index operand is constant and is not less than vector size or is undef.
InstCombine does not support this case, but I'm happy to add it there also if this change is accepted.
Test Plan: Unittests and regression tests for ConstProp pass.
Reviewers: majnemer
Reviewed By: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9287
llvm-svn: 235854
There can be various constant pointers in the IR which do not get relocated at a safepoint. One example is the address of a global variable. Another example is a pointer created via inttoptr. Note that the optimizer itself likes to create such inttoptrs when locally propagating constants through dynamically dead code.
To deal with this, we need to exclude uses of constants from contributing to the liveness of a safepoint which might reach that use. At some later date, it might be worth exploring what could be done to support the relocation of various special types of "constants", but that's future work.
Differential Revision: http://reviews.llvm.org/D9236
llvm-svn: 235821
llvm.frameescape() intrinsic is not a real call. The intrinsic can only exist in the entry block. Inserting a gc.statepoint() before llvm.frameescape() may split the entry block, and push the intrinsic out of the entry block.
Patch by: Swaroop.Sridhar@microsoft.com
Differential Revision: http://reviews.llvm.org/D8910
llvm-svn: 235820
This is a follow-on to D8833 (insertps optimization when the zero mask is not used).
In this patch, we check for the case where the zmask is used, but both input vectors
to the insertps intrinsic are the same operand or the zmask overrides the destination
lane. This lets us replace the 2nd shuffle input operand with the zero vector.
Differential Revision: http://reviews.llvm.org/D9257
llvm-svn: 235810
When using bit tests for hole checks, we call AddPredecessorToBlock to give the
phi node a value from the bit test block. This would break if we've
previously called removePredecessor on the default destination because the
switch is fully covered.
Test case by Mark Lacey.
llvm-svn: 235771
Same as r235145 for the call instruction - the justification, tradeoffs,
etc are all the same. The conversion script worked the same without any
false negatives (after replacing 'call' with 'invoke').
llvm-svn: 235755
Summary:
We pick this order because SeparateConstOffsetFromGEP may create more
opportunities for SLSR.
Test Plan:
reassociate-geps-and-slsr.ll
no performance regression on internal benchmarks
Reviewers: meheff
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D9230
llvm-svn: 235632
Only clear out the NSW/NUW flags if we are optimizing 'add'/'sub' while
taking advantage that the sign bit is not set. We do this optimization
to further shrink the mask but shrinking the mask isn't NSW/NUW
preserving in this case.
llvm-svn: 235558
An nsw/nuw operation relies on the values feeding into it to not
overflow if 'poison' is not to be produced. This means that
optimizations which make modifications to the bottom of a chain (like
SimplifyDemandedBits) must strip out nsw/nuw if they cannot ensure that
they will be preserved.
This fixes PR23309.
llvm-svn: 235544
https://llvm.org/bugs/show_bug.cgi?id=23163.
Gep merging sometimes behaves like a reverse CSE/LICM optimization,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.
The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better result, and we havn't notice them yet. We will be ready
to further improve it once we see the cases.
Differential Revision: http://reviews.llvm.org/D8911
llvm-svn: 235455
https://llvm.org/bugs/show_bug.cgi?id=23163.
Gep merging sometimes behaves like a reverse CSE/LICM optimizations,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.
The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better result, and we havn't notice them yet. We will be ready
to further improve it once we see the cases.
Differential Revision: http://reviews.llvm.org/D9007
llvm-svn: 235451
MemIntrinsic::getDest() looks through pointer casts, and using it
directly when building the new GEP+memset results in stuff like:
%0 = getelementptr i64* %p, i32 16
%1 = bitcast i64* %0 to i8*
call ..memset(i8* %1, ...)
instead of the correct:
%0 = bitcast i64* %p to i8*
%1 = getelementptr i8* %0, i32 16
call ..memset(i8* %1, ...)
Instead, use getRawDest, which just gives you the i8* value.
While there, use the memcpy's dest, as it's live anyway.
In most cases, when the optimization triggers, the memset and memcpy
sizes are the same, so the built memset is 0-sized and eliminated.
The problem occurs when they're different.
Fixes a regression caused by r235232: PR23300.
llvm-svn: 235419
Summary:
After we rewrite a candidate, the instructions used by the old form may
become unused. This patch cleans up these unused instructions so that we
needn't run DCE after SLSR.
Test Plan: removed -dce in all the SLSR tests
Reviewers: broune, meheff
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9101
llvm-svn: 235410
Summary: so that we needn't run DCE after this pass.
Test Plan: removed -dce from the commandline in split-gep.ll and split-gep-and-gvn.ll
Reviewers: meheff
Subscribers: llvm-commits, HaoLiu, hfinkel, jholewinski
Differential Revision: http://reviews.llvm.org/D9096
llvm-svn: 235409
This commit fixes the code which adds lifetime markers in InlineFunction to skip
zero-sized allocas instead of asserting on them.
rdar://problem/20531155
llvm-svn: 235312
Harden r235258 to support any integer bitwidth. The quick glance at
the reference made me think only i32 and i64 were valid types, but
they're not special, so any overload is legal.
Thanks to David Majnemer for noticing!
llvm-svn: 235261
Followup to r235232, which caused PR23278.
We can't assume the memset and memcpy sizes have the same type, as
nothing in the language reference prevents that.
Instead, zext both to i64 if they disagree.
While there, robustify tests by using i8 %c rather than i8 0 for the
memset character.
llvm-svn: 235258
A common idiom in some code is to do the following:
memset(dst, 0, dst_size);
memcpy(dst, src, src_size);
Some of the memset is redundant; instead, we can do:
memcpy(dst, src, src_size);
memset(dst + src_size, 0,
dst_size <= src_size ? 0 : dst_size - src_size);
Original patch by: Joel Jones
Differential Revision: http://reviews.llvm.org/D498
llvm-svn: 235232
Summary:
An alternative is to use a worklist approach. However, that approach
would break the traversing order so that we couldn't lookup SeenExprs
efficiently. I don't see a clear winner here, so I picked the easier approach.
Along with two minor improvements:
1. preserves ScalarEvolution by forgetting instructions replaced
2. removes dead code locally avoiding the need of running DCE afterwards
Test Plan: add to slsr-add.ll a test that requires multiple iterations
Reviewers: broune, dberlin, atrick, meheff
Reviewed By: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9058
llvm-svn: 235151
See r230786 and r230794 for similar changes to gep and load
respectively.
Call is a bit different because it often doesn't have a single explicit
type - usually the type is deduced from the arguments, and just the
return type is explicit. In those cases there's no need to change the
IR.
When that's not the case, the IR usually contains the pointer type of
the first operand - but since typed pointers are going away, that
representation is insufficient so I'm just stripping the "pointerness"
of the explicit type away.
This does make the IR a bit weird - it /sort of/ reads like the type of
the first operand: "call void () %x(" but %x is actually of type "void
()*" and will eventually be just of type "ptr". But this seems not too
bad and I don't think it would benefit from repeating the type
("void (), void () * %x(" and then eventually "void (), ptr %x(") as has
been done with gep and load.
This also has a side benefit: since the explicit type is no longer a
pointer, there's no ambiguity between an explicit type and a function
that returns a function pointer. Previously this case needed an explicit
type (eg: a function returning a void() function was written as
"call void () () * @x(" rather than "call void () * @x(" because of the
ambiguity between a function returning a pointer to a void() function
and a function returning void).
No ambiguity means even function pointer return types can just be
written alone, without writing the whole function's type.
This leaves /only/ the varargs case where the explicit type is required.
Given the special type syntax in call instructions, the regex-fu used
for migration was a bit more involved in its own unique way (as every
one of these is) so here it is. Use it in conjunction with the apply.sh
script and associated find/xargs commands I've provided in rr230786 to
migrate your out of tree tests. Do let me know if any of this doesn't
cover your cases & we can iterate on a more general script/regexes to
help others with out of tree tests.
About 9 test cases couldn't be automatically migrated - half of those
were functions returning function pointers, where I just had to manually
delete the function argument types now that we didn't need an explicit
function type there. The other half were typedefs of function types used
in calls - just had to manually drop the * from those.
import fileinput
import sys
import re
pat = re.compile(r'((?:=|:|^|\s)call\s(?:[^@]*?))(\s*$|\s*(?:(?:\[\[[a-zA-Z0-9_]+\]\]|[@%](?:(")?[\\\?@a-zA-Z0-9_.]*?(?(3)"|)|{{.*}}))(?:\(|$)|undef|inttoptr|bitcast|null|asm).*$)')
addrspace_end = re.compile(r"addrspace\(\d+\)\s*\*$")
func_end = re.compile("(?:void.*|\)\s*)\*$")
def conv(match, line):
if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)):
return line
return line[:match.start()] + match.group(1)[:match.group(1).rfind('*')].rstrip() + match.group(2) + line[match.end():]
for line in sys.stdin:
sys.stdout.write(conv(re.search(pat, line), line))
llvm-svn: 235145
Summary:
This fixes a left-over efficiency issue in D8950.
As Andrew and Daniel suggested, we can store the candidates in a stack
and pop the top element when it does not dominate the current
instruction. This reduces the worst-case time complexity to O(n).
Test Plan: a new test in nary-add.ll that exercises this optimization.
Reviewers: broune, dberlin, meheff, atrick
Reviewed By: atrick
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D9055
llvm-svn: 235129
This is very similar to D8486 / r232852 (vperm2). If we treat insertps intrinsics
as shufflevectors, we can optimize them better.
I've left all but the full zero case of the zero mask variants out of this patch.
I don't think those can be converted into a single shuffle in all cases, but I'd
be happy to be proven wrong as I was for vperm2f128.
Either way, we'd need to support whatever sequence we come up with for those cases
in the backend before converting them here.
Differential Revision: http://reviews.llvm.org/D8833
llvm-svn: 235124
Remove 'inlinedAt:' from MDLocalVariable. Besides saving some memory
(variables with it seem to be single largest `Metadata` contributer to
memory usage right now in -g -flto builds), this stops optimization and
backend passes from having to change local variables.
The 'inlinedAt:' field was used by the backend in two ways:
1. To tell the backend whether and into what a variable was inlined.
2. To create a unique id for each inlined variable.
Instead, rely on the 'inlinedAt:' field of the intrinsic's `!dbg`
attachment, and change the DWARF backend to use a typedef called
`InlinedVariable` which is `std::pair<MDLocalVariable*, MDLocation*>`.
This `DebugLoc` is already passed reliably through the backend (as
verified by r234021).
This commit removes the check from r234021, but I added a new check
(that will survive) in r235048, and changed the `DIBuilder` API in
r235041 to require a `!dbg` attachment whose 'scope:` is in the same
`MDSubprogram` as the variable's.
If this breaks your out-of-tree testcases, perhaps the script I used
(mdlocalvariable-drop-inlinedat.sh) will help; I'll attach it to PR22778
in a moment.
llvm-svn: 235050
Add missing `!dbg` attachments to `@llvm.dbg.*` intrinsics. I updated
these using a script (add-dbg-to-intrinsics.sh) that I'll attach to
PR22778 for posterity.
llvm-svn: 235040
Summary:
With this patch, SLSR may rewrite
S1: X = B + i * S
S2: Y = B + i' * S
to
S2: Y = X + (i' - i) * S
A secondary improvement: if (i' - i) is a power of 2, emit Y as X + (S << log(i' - i)). (S << log(i' -i)) is in a canonical form and thus more likely GVN'ed than (i' - i) * S.
Test Plan: slsr-add.ll
Reviewers: hfinkel, sanjoy, meheff, broune, eliben
Reviewed By: eliben
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8983
llvm-svn: 235019
Inlining such intrinsics is very difficult, since you need to
simultaneously transform many calls to llvm.framerecover and potentially
duplicate the functions containing them. Normally this intrinsic isn't
added until EH preparation, which is part of the backend pass pipeline
after inlining. However, if it were to get fed through the inliner,
this change will ensure that it doesn't break the code.
llvm-svn: 234937
Summary:
This transformation reassociates a n-ary add so that the add can partially reuse
existing instructions. For example, this pass can simplify
void foo(int a, int b) {
bar(a + b);
bar((a + 2) + b);
}
to
void foo(int a, int b) {
int t = a + b;
bar(t);
bar(t + 2);
}
saving one add instruction.
Fixes PR22357 (https://llvm.org/bugs/show_bug.cgi?id=22357).
Test Plan: nary-add.ll
Reviewers: broune, dberlin, hfinkel, meheff, sanjoy, atrick
Reviewed By: sanjoy, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8950
llvm-svn: 234855
Summary:
Runtime unrolling of loops needs to emit an expression to compute the
loop's runtime trip-count. Avoid runtime unrolling if this computation
will be expensive.
Depends on D8993.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8994
llvm-svn: 234846
Summary:
Teach `isHighCostExpansion` to consider divisions by power-of-two
constants as cheap and add a test case. This change is needed for a new
user of `isHighCostExpansion` that will be added in a subsequent change.
Depends on D8995.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8993
llvm-svn: 234845
Clean up a predicate I added in r229731, fix the relevant comment and
add a test case. The earlier version is confusing to read and was also
buggy (probably not a coincidence) till Alexey fixed it in r233881.
llvm-svn: 234701
When rewriting statepoints to make relocations explicit, we need to have a conservative but consistent notion of where a particular pointer is live at a particular site. The old code just used dominance, which is correct, but decidedly more conservative then it needed to be. This patch implements a simple dataflow algorithm that's run one per function (well, twice counting fixup after base pointer insertion). There's still lots of room to make this faster, but it's fast enough for all practical purposes today.
Differential Revision: http://reviews.llvm.org/D8674
llvm-svn: 234657
Two related small changes:
Various dominance based queries about liveness can get confused if we're talking about unreachable blocks. To avoid reasoning about such cases, just remove them before rewriting statepoints.
Remove single entry phis (likely left behind by LCSSA) to reduce the number of live values.
Both of these are motivated by http://reviews.llvm.org/D8674 which will be submitted shortly.
Differential Revision: http://reviews.llvm.org/D8675
llvm-svn: 234651
This patch adds limited support for inserting explicit relocations when there's a vector of pointers live over the statepoint. This doesn't handle the case where the vector contains a mix of base and non-base pointers; that's future work.
The current implementation just scalarizes the vector over the gc.statepoint before doing the explicit rewrite. An alternate approach would be to plumb the vector all the way though the backend lowering, but doing that appears challenging. In particular, the size of the indirect spill slot is currently assumed to be sizeof(pointer) throughout the backend.
In practice, this is enough to allow running the SLP and Loop vectorizers before RewriteStatepointsForGC.
Differential Revision: http://reviews.llvm.org/D8671
llvm-svn: 234647
Summary:
This change moves creating calls to `llvm.uadd.with.overflow` from
InstCombine to CodeGenPrep. Combining overflow check patterns into
calls to the said intrinsic in InstCombine inhibits optimization because
it introduces an intrinsic call that not all other transforms and
analyses understand.
Depends on D8888.
Reviewers: majnemer, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8889
llvm-svn: 234638
InstCombine didn't realize that it needs to use DataLayout to determine
how wide pointers are. This lead to assertion failures.
This fixes PR23113.
llvm-svn: 234046
Check that the `MDLocalVariable::getInlinedAt()` in a debug info
intrinsic's variable always matches the `MDLocation::getInlinedAt()` of
its `!dbg` attachment.
The goal here is to get rid of `MDLocalVariable::getInlinedAt()`
entirely (PR22778), since it's expensive and unnecessary, but I'll let
this verifier check bake for a while (a week maybe?) first. I've
updated the testcases that had the wrong value for `inlinedAt:`.
This checks that things are sane in the IR, but currently things go out
of whack in a few places in the backend. I'll follow shortly with
assertions in the backend (with code fixes).
If you have out-of-tree testcases that just started failing, here's how
I updated these ones:
1. The verifier check gives you the basic block, function, instruction,
and relevant metadata arguments (metadata numbering doesn't
necessarily match the source file, unfortunately).
2. Look at the `@llvm.dbg.*()` instruction, and compare the
`inlinedAt:` fields of the variable argument (second `metadata`
argument) and the `!dbg` attachment.
3. Figure out based on the variable `scope:` chain and the functions in
the file whether the variable has been inlined (and into what), so
you can determine which `inlinedAt:` is actually correct. In all of
the in-tree testcases, the `!MDLocation()` was correct and the
`!MDLocalVariable()` was wrong, but YMMV.
4. Duplicate the metadata that you're going to change, and add/drop the
`inlinedAt:` field from one of them. Be careful that the other
references to the same metadata node point at the correct one.
llvm-svn: 234021
Summary:
The old requirement on GEP candidates being in bounds is unnecessary.
For off-bound GEPs, we still have
&B[i * S] = B + (i * S) * e = B + (i * e) * S
Test Plan: slsr_offbound_gep in slsr-gep.ll
Reviewers: meheff
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8809
llvm-svn: 233949
We used to do this before refactorings around r225640.
Some clang users checked for _chk libcall availability using:
__has_builtin(__builtin___memcpy_chk)
When compiling with -fno-builtin, this is always true.
When passing -ffreestanding/-mkernel, which both imply -fno-builtin, we
end up with fortified libcalls, which isn't acceptable in a freestanding
environment which only provides their non-fortified counterparts.
Until we change clang and/or teach external users to check for availability
differently, disregard the "nobuiltin" attribute and TLI::has.
Workaround for PR23093.
llvm-svn: 233776
PPC_FP128 is really the sum of two consecutive doubles, where the first double
is always stored first in memory, regardless of the target endianness. The
memory layout of i128, however, depends on the target endianness, and so we
can't fold this without target endianness information. As a result, we must not
do this folding in lib/IR/ConstantFold.cpp (it could be done instead in
Analysis/ConstantFolding.cpp, but that's not done now).
Fixes PR23026.
llvm-svn: 233481
Fix testcases that don't pass the verifier after a WIP patch to check
`MDSubprogram` operands more effectively. I found the following issues:
- When `isDefinition: false`, the `variables:` field might point at
`!{i32 786468}`, or at a tuple that pointed at an empty tuple with
the comment "previously: invalid DW_TAG_base_type" (I vaguely recall
adding those comments during an upgrade script). In these cases, I
just dropped the array.
- The `variables:` field might point at something like `!{!{!8}}`,
where `!8` was an `MDLocation`. I removed the extra layer of
indirection.
- Invalid `type:` (not an `MDSubroutineType`).
llvm-svn: 233466
Change `llc` and `opt` to run `verifyModule()`. This ensures that we
check the full module before `FunctionPass::doInitialization()` ever
gets called (I was getting crashes in `DwarfDebug` instead of verifier
failures when testing a WIP patch that checks operands of compile
units). In `opt`, also move up debug-info-stripping so that it still
runs before verification.
There was a fair bit of broken code that was sitting in tree.
Interestingly, some were cases of a `select` that referred to itself in
`-instcombine` tests (apparently an intermediate result). I split them
off to `*-noverify.ll` tests with RUN lines like this:
opt < %s -S -disable-verify -instcombine | opt -S | FileCheck %s
This avoids verifying the input file (so we can get the broken code into
`-instcombine), but still verifies the output with a second call to
`opt` (to verify that `-instcombine` will clean it up like it should).
llvm-svn: 233432
Fix debug info in these tests, which started failing with a WIP patch to
verify compile units and types. The problems look like they were all
caused by bitrot. They fell into these categories:
- Using `!{i32 0}` instead of `!{}`.
- Using `!{null}` instead of `!{}`.
- Using `!MDExpression()` instead of `!{}`.
- Using `!8` instead of `!{!8}`.
- `file:` references that pointed at `MDCompileUnit`s instead of the
same `MDFile` as the compile unit.
- `file:` references that were numerically off-by-one or (off-by-ten).
llvm-svn: 233415
This re-adds float2int to the tree, after fixing PR23038. It turns
out the argument to APSInt() is true-if-unsigned, rather than
true-if-signed :(. Added testcase and explanatory comment.
llvm-svn: 233370
This was discussed a while back and I left it optional for migration. Since it's been far more than the 'week or two' that was discussed, time to actually make this manditory.
llvm-svn: 233357
Fix testcases whose variables are invalid. I'm working on a patch that
adds `Verifier` checks for `MDLocalVariable` (and `MDGlobalVariable`),
and these failed because:
- `scope:` fields need to point at `MDLocalScope` and can't be null.
- `file:` fields need to point at `MDFile`.
- `inlinedAt:` fields need to point at `MDLocation`.
llvm-svn: 233349
Summary:
This patch enhances SLSR to handle another candidate form &B[i * S]. If
we found two candidates
S1: X = &B[i * S]
S2: Y = &B[i' * S]
and S1 dominates S2, we can replace S2 with
Y = &X[(i' - i) * S]
Test Plan:
slsr-gep.ll
X86/no-slsr.ll: verify that we do not run SLSR on GEPs that already fit into
an addressing mode
Reviewers: eliben, atrick, meheff, hfinkel
Reviewed By: hfinkel
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D7459
llvm-svn: 233286
A load from an invariant location is assumed to not alias any otherwise potentially aliasing stores. Our implementation only applied this rule to store instructions themselves whereas they it should apply for any memory accessing instruction. This results in both FRE and PRE becoming more effective at eliminating invariant loads.
Note that as a follow on change I will likely move this into AliasAnalysis itself. That's where the TBAA constant flag is handled and the semantics are essentially the same. I'd like to separate the semantic change from the refactoring and thus have extended the hack that's already in MemoryDependenceAnalysis for this change.
Differential Revision: http://reviews.llvm.org/D8591
llvm-svn: 233140
This patch tries to merge duplicate landing pads when they branch to a common shared target.
Given IR that looks like this:
lpad1:
%exn = landingpad {i8*, i32} personality i32 (...)* @__gxx_personality_v0
cleanup
br label %shared_resume
lpad2:
%exn2 = landingpad {i8*, i32} personality i32 (...)* @__gxx_personality_v0
cleanup
br label %shared_resume
shared_resume:
call void @fn()
ret void
}
We can rewrite the users of both landing pad blocks to use one of them. This will generally allow the shared_resume block to be merged with the common landing pad as well.
Without this change, tail duplication would likely kick in - creating N (2 in this case) copies of the shared_resume basic block.
Differential Revision: http://reviews.llvm.org/D8297
llvm-svn: 233125
Assert that this doesn't fire - I'll remove all of this later, but just
leaving it in for a while in case this is firing & we just don't have
test coverage.
llvm-svn: 233116
This is the IR optimizer follow-on patch for D8563: the x86 backend patch
that converts this kind of shuffle back into a vperm2.
This is also a continuation of the transform that started in D8486.
In that patch, Andrea suggested that we could convert vperm2 intrinsics that
use zero masks into a single shuffle.
This is an implementation of that suggestion.
Differential Revision: http://reviews.llvm.org/D8567
llvm-svn: 233110
This caused PR23008, compiles failing with: "Use still stuck around after Def is
destroyed: %.sroa.speculated"
Also reverting follow-up r233064.
llvm-svn: 233105
It is possible to have code that converts from integer to float, performs operations then converts back, and the result is provably the same as if integers were used.
This can come from different sources, but the most obvious is a helper function that uses floats but the arguments given at an inlined callsites are integers.
This pass considers all integers requiring a bitwidth less than or equal to the bitwidth of the mantissa of a floating point type (23 for floats, 52 for doubles) as exactly representable in floating point.
To reduce the risk of harming efficient code, the pass only attempts to perform complete removal of inttofp/fptoint operations, not just move them around.
llvm-svn: 233062
Continue to simplify the `DIDescriptor` subclasses, so that they behave
more like raw pointers. Remove `getRaw()`, replace it with an
overloaded `get()`, and overload the arrow and cast operators. Two
testcases started to crash on the arrow operators with this change
because of `scope:` references that weren't real scopes. I fixed them.
Soon I'll add verifier checks for them too.
This also adds explicit dereference operators. Previously, the builtin
dereference against `operator MDNode *()` would have worked, but now the
builtins are ambiguous.
llvm-svn: 233030
strchr("123!", C) != nullptr is a common pattern to check if C is one
of 1, 2, 3 or !. If the largest element of the string is smaller than
the target's register size we can easily create a bitfield and just
do a simple test for set membership.
int foo(char C) { return strchr("123!", C) != nullptr; } now becomes
cmpl $64, %edi ## range check
sbbb %al, %al
movabsq $0xE000200000001, %rcx
btq %rdi, %rcx ## bit test
sbbb %cl, %cl
andb %al, %cl ## and the two conditions
andb $1, %cl
movzbl %cl, %eax ## returning an int
ret
(imho the backend should expand this into a series of branches, but
that's a different story)
The code is currently limited to bit fields that fit in a register, so
usually 64 or 32 bits. Sadly, this misses anything using alpha chars
or {}. This could be fixed by just emitting a i128 bit field, but that
can generate really ugly code so we have to find a better way. To some
degree this is also recreating switch lowering logic, but we can't
simply emit a switch instruction and thus change the CFG within
instcombine.
llvm-svn: 232902
r216771 introduced a change to MemoryDependenceAnalysis that allowed it
to reason about acquire/release operations. However, this change does
not ensure that the acquire/release operations pair. Unfortunately,
this leads to miscompiles as we won't see an acquire load as properly
memory effecting. This largely reverts r216771.
This fixes PR22708.
llvm-svn: 232889
vperm2* intrinsics are just shuffles.
In a few special cases, they're not even shuffles.
Optimizing intrinsics in InstCombine is better than
handling this in the front-end for at least two reasons:
1. Optimizing custom-written SSE intrinsic code at -O0 makes vector coders
really angry (and so I have regrets about some patches from last week).
2. Doing mask conversion logic in header files is hard to write and
subsequently read.
There are a couple of TODOs in this patch to complete this optimization.
Differential Revision: http://reviews.llvm.org/D8486
llvm-svn: 232852
When estimating SROA savings, we want to see if an address is derived
off an alloca in the caller. For store instructions, operand 1 is the
address operand, but the current code uses operand 0. Use
getPointerOperand for loads and stores to fix this.
Patch by Easwaran Raman.
http://reviews.llvm.org/D8425
llvm-svn: 232827
Each use of the byte array uses a different alias. This makes the
backend less likely to reuse previously computed byte array addresses,
improving the security of the CFI mechanism based on this pass.
Differential Revision: http://reviews.llvm.org/D8455
llvm-svn: 232770
Also, add several entries to vectorizable functions table, and
corresponding tests. The table isn't complete, it'll be populated later.
Review: http://reviews.llvm.org/D8131
llvm-svn: 232531
Re-commit the test cases added in r232444. These now use
-irce-print-changed-loops and -irce-print-range-checks so they run
correctly on a without asserts build of llvm.
llvm-svn: 232452
I accidentally checked in two tests that used -debug-only -- these fail
on a release LLVM build. Temporarily delete these from the repo to keep
the bots green while I fix this locally.
llvm-svn: 232446
This change to IRCE gets it to recognize "half" range checks. Half
range checks are range checks that only either check if the index is
`slt` some positive integer ("length") or if the index is `sge` `0`.
The range solver does not try to be clever / aggressive about solving
half-range checks -- it transforms "I < L" to "0 <= I < L" and "0 <= I"
to "0 <= I < INT_SMAX". This is safe, but not always optimal.
llvm-svn: 232444
By default we want our gcov emission to stay 4.2 compatible, which
means we need to continue emit the exit block last by default. We add
an option to emit it before the body for users that need it.
llvm-svn: 232438
As part of PR22777, fix testcases that fail the debug info verifier.
The changes fall into the following categories:
- Empty `filename:` fields in `MDFile`s. Compile units and some types
require non-empty filenames. A number of testcases have empty
filenames, probably due to hand-reduction of testcases.
- Not-quite empty arrays: `!{i32 0}`. This used to be equivalent in
the debug info schema to `!{}`. They cause problems for
`!MDSubroutineType`'s `types:` array, since it requires all operands
to be valid types. (Note that `!{null}` is the correct type array
for functions that take no arguments and return `void`.)
- Significantly bitrotted testcases. Nodes got left behind a few
upgrades ago because of missing or invalid tags.
llvm-svn: 232415
The problem here is the infamous one direction known safe. I was
hesitant to turn it off before b/c of the potential for regressions
without an actual bug from users hitting the problem. This is that bug ;
).
The main performance impact of having known safe in both directions is
that often times it is very difficult to find two releases without a use
in-between them since we are so conservative with determining potential
uses. The one direction known safe gets around that problem by taking
advantage of many situations where we have two retains in a row,
allowing us to avoid that problem. That being said, the one direction
known safe is unsafe. Consider the following situation:
retain(x)
retain(x)
call(x)
call(x)
release(x)
Then we know the following about the reference count of x:
// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
retain(x)
// rc(x) == N+2
call A(x)
call B(x)
// rc(x) >= 1 (since we can not release a deallocated pointer).
release(x)
// rc(x) >= 0
That is all the information that we can know statically. That means that
we know that A(x), B(x) together can release (x) at most N+1 times. Lets
say that we remove the inner retain, release pair.
// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
call A(x)
call B(x)
// rc(x) >= 1
release(x)
// rc(x) >= 0
We knew before that A(x), B(x) could release x up to N+1 times meaning
that rc(x) may be zero at the release(x). That is not safe. On the other
hand, consider the following situation where we have a must use of
release(x) that x must be kept alive for after the release(x)**. Then we
know that:
// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
retain(x)
// rc(x) == N+2
call A(x)
call B(x)
// rc(x) >= 2 (since we know that we are going to release x and that that release can not be the last use of x).
release(x)
// rc(x) >= 1 (since we can not deallocate the pointer since we have a must use after x).
…
// rc(x) >= 1
use(x)
Thus we know that statically the calls to A(x), B(x) can together only
release rc(x) N times. Thus if we remove the inner retain, release pair:
// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
call A(x)
call B(x)
// rc(x) >= 1
…
// rc(x) >= 1
use(x)
We are still safe unless in the final … there are unbalanced retains,
releases which would have caused the program to blow up anyways even
before optimization occurred. The simplest form of must use is an
additional release that has not been paired up with any retain (if we
had paired the release with a retain and removed it we would not have
the additional use). This fits nicely into the ARC framework since
basically what you do is say that given any nested releases regardless
of what is in between, the inner release is known safe. This enables us to get
back the lost performance.
<rdar://problem/19023795>
llvm-svn: 232351
Verify that debug info intrinsic arguments are valid. (These checks
will not recurse through the full debug info graph, so they don't need
to be cordoned of in `DebugInfoVerifier`.)
With those checks in place, changing the `DbgIntrinsicInst` accessors to
downcast to `MDLocalVariable` and `MDExpression` is natural (added isa
specializations in `Metadata.h` to support this).
Added tests to `test/Verifier` for the new -verify checks, and fixed the
debug info in all the in-tree tests.
If you have out-of-tree testcases that have started to fail to -verify,
hopefully the verify checks are helpful. The most likely problem is
that the expression argument is `!{}` (instead of `!MDExpression()`).
llvm-svn: 232296
Summary: This is a first step toward getting proper support for aggregate loads and stores.
Test Plan: Added unittests
Reviewers: reames, chandlerc
Reviewed By: chandlerc
Subscribers: majnemer, joker.eph, chandlerc, llvm-commits
Differential Revision: http://reviews.llvm.org/D7780
Patch by Amaury Sechet
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 232284
The linker on that platform may re-order symbols or strip dead symbols, which
will break bit set checks. Avoid this by hiding the symbols from the linker.
llvm-svn: 232235
This reapplies the patch previously committed at revision 232190. This was
reverted at revision 232196 as it caused test failures in tests that did not
expect operands to be commuted. I have made the tests more resilient to
reassociation in revision 232206.
llvm-svn: 232209
As a follow-up to r232200, add an `-instcombine` to canonicalize scalar
allocations to `i32 1`. Since r232200, `iX 1` (for X != 32) are only
created by RAUWs, so this shouldn't fire too often. Nevertheless, it's
a cheap check and a nice cleanup.
llvm-svn: 232202
Write the `alloca` array size explicitly when it's non-canonical.
Previously, if the array size was `iX 1` (where X is not 32), the type
would mutate to `i32` when round-tripping through assembly.
The testcase I added fails in `verify-uselistorder` (as well as
`FileCheck`), since the use-lists for `i32 1` and `i64 1` change.
(Manman Ren came across this when running `verify-uselistorder` on some
non-trivial, optimized code as part of PR5680.)
The type mutation started with r104911, which allowed array sizes to be
something other than an `i32`. Starting with r204945, we
"canonicalized" to `i64` on 64-bit platforms -- and then on every
round-trip through assembly, mutated back to `i32`.
I bundled a fixup for `-instcombine` to avoid r204945 on scalar
allocations. (There wasn't a clean way to sequence this into two
commits, since the assembly change on its own caused testcase churn, and
the `-instcombine` change can't be tested without the assembly changes.)
An obvious alternative fix -- change `AllocaInst::AllocaInst()`,
`AsmWriter` and `LLParser` to treat `intptr_t` as the canonical type for
scalar allocations -- was rejected out of hand, since this required
teaching them each about the data layout.
A follow-up commit will add an `-instcombine` to canonicalize the scalar
allocation array size to `i32 1` rather than leaving `iX 1` alone.
rdar://problem/20075773
llvm-svn: 232200
This patch adds initial support for vector instructions to the reassociation
pass. It enables most parts of the pass to work with vectors but to keep the
size of the patch small, optimization of Xor trees, canonicalization of
negative constants and converting shifts to muls, etc., have been left out.
This will be handled in later patches.
The patch is based on an initial patch by Chad Rosier.
Differential Revision: http://reviews.llvm.org/D7566
llvm-svn: 232190
Similar to gep (r230786) and load (r230794) changes.
Similar migration script can be used to update test cases, which
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manually changes in Clang.
(this script will read the contents of stdin and massage it into stdout
- wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases)
import fileinput
import sys
import re
rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)
def conv(match):
line = match.group(1)
line += match.group(4)
line += ", "
line += match.group(2)
return line
line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
sys.stdout.write(line[off:match.start()])
sys.stdout.write(conv(match))
off = match.end()
sys.stdout.write(line[off:])
llvm-svn: 232184
Constant folding for shift IR instructions ignores all bits above 32 of
second argument (shift amount).
Because of that, some undef results are not recognized and APInt can
raise an assert failure if second argument has more than 64 bits.
Patch by Paweł Bylica!
Differential Revision: http://reviews.llvm.org/D7701
llvm-svn: 232176
It's firstly committed at r231630, and reverted at r231635.
Function pass InstructionSimplifier is inserted as barrier to
make sure loop unroll pass won't affect on LICM pass.
llvm-svn: 232011
The CallGraphNode function "addCalledFunction()" asserts that edges are not to intrinsics.
This patch makes sure that the Inliner does not add such an edge to the callgraph.
Fix for clang crash by assertion: https://llvm.org/bugs/show_bug.cgi?id=22857
Differential Revision: http://reviews.llvm.org/D8231
llvm-svn: 231927
Given that large parts of inst combine is restricted to instructions which have one use, getting rid of a use on the condition can help the effectiveness of the optimizer. Also, it allows the condition to potentially be deleted by instcombine rather than waiting for another pass.
I noticed this completely by accident in another test case. It's not anything that actually came from a real workload.
p.s. We should probably do the same thing for switch instructions.
Differential Revision: http://reviews.llvm.org/D8220
llvm-svn: 231881
This patch adds limited support in ValueTracking for inferring known bits of a value from conditional expressions which must be true to reach the instruction we're trying to optimize. At this time, the feature is off by default. Once landed, I'm hoping for feedback from others on both profitability and compile time impact.
Forms of conditional value propagation have been tried in LLVM before and have failed due to compile time problems. In an attempt to side step that, this patch only considers conditions where the edge leaving the branch dominates the context instruction. It does not attempt full dataflow. Even with that restriction, it handles many interesting cases:
* Early exits from functions
* Early exits from loops (for context instructions in the loop and after the check)
* Conditions which control entry into loops, including multi-version loops (such as those produced during vectorization, IRCE, loop unswitch, etc..)
Possible applications include optimizing using information provided by constructs such as: preconditions, assumptions, null checks, & range checks.
This patch implements two approaches to the problem that need further benchmarking. Approach 1 is to directly walk the dominator tree looking for interesting conditions. Approach 2 is to inspect other uses of the value being queried for interesting comparisons. From initial benchmarking, it appears that Approach 2 is faster than Approach 1, but this needs to be further validated.
Differential Revision: http://reviews.llvm.org/D7708
llvm-svn: 231879
ReplaceInstUsesWith needs to return nullptr when the input has no users,
because in that case it does not mutate the program. Otherwise, we can
get stuck in an infinite loop of repeatedly attempting to constant fold
and instruction with no users.
llvm-svn: 231755
For inner one of nested loops, it is more likely to be a hot loop,
and the runtime check can be promoted out from patch 0001, so the
overhead is less, we can try a doubled threshold to unroll more loops.
llvm-svn: 231632
Runtime unrolling is an expensive optimization which can bring benefit
only if the loop is hot and iteration number is relatively large enough.
For some loops, we know they are not worth to be runtime unrolled.
The scalar loop from vectorization is one of the cases.
llvm-svn: 231631
Runtime unrollng will introduce a runtime check in loop prologue.
If the unrolled loop is a inner loop, then the proglogue will be inside
the outer loop. LICM pass can help to promote the runtime check out if
the checked value is loop invariant.
llvm-svn: 231630
Summary:
See the two test cases.
; Can fold fcmp with undef on one side by choosing NaN for the undef
; Can fold fcmp with undef on both side
; fcmp u_pred undef, undef -> true
; fcmp o_pred undef, undef -> false
; because whatever you choose for the first undef
; you can choose NaN for the other undef
Reviewers: hfinkel, chandlerc, majnemer
Reviewed By: majnemer
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D7617
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231626
This pass interchanges loops to provide a more cache-friendly memory access.
For e.g. given a loop like -
for(int i=0;i<N;i++)
for(int j=0;j<N;j++)
A[j][i] = A[j][i]+B[j][i];
is interchanged to -
for(int j=0;j<N;j++)
for(int i=0;i<N;i++)
A[j][i] = A[j][i]+B[j][i];
This pass is currently disabled by default.
To give a brief introduction it consists of 3 stages-
LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time.
LoopInterchangeTransform : Which does the actual transform.
LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks.
TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
4) Improve compile time regression found in llvm lnt due to this pass.
5) Fix issues in Dependency Analysis module.
A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499
llvm-svn: 231458
At this point, we should have decent coverage of the involved code. I've got a few more test cases to cleanup and submit, but what's here is already reasonable.
I've got a collection of liveness tests which will be posted for review along with a decent liveness algorithm in the next few days. Once those are in, the code in this file should be well tested and I can start renaming things without risk of serious breakage.
llvm-svn: 231414
isNormalFp and isFiniteNonZeroFp should not assume vector operands can not be constant expressions.
Patch by Pawel Jurek <pawel.jurek@intel.com>
Differential Revision: http://reviews.llvm.org/D8053
llvm-svn: 231359
Summary:
DataLayout keeps the string used for its creation.
As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().
Get rid of DataLayoutPass: the DataLayout is in the Module
The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.
Make DataLayout Non-Optional in the Module
Module->getDataLayout() will never returns nullptr anymore.
Reviewers: echristo
Subscribers: resistor, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D7992
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
RewriteStatepointsForGC pass emits an alloca for each GC pointer which will be relocated. It then inserts stores after def and all relocations, and inserts loads before each use as well. In the end, mem2reg is used to update IR with relocations in SSA form.
However, there is a problem with inserting stores for values defined by invoke instructions. The code didn't expect a def was a terminator instruction, and inserting instructions after these terminators resulted in malformed IR.
This patch fixes this problem by handling invoke instructions as a special case. If the def is an invoke instruction, the store will be inserted at the beginning of the normal destination block. Since return value from invoke instruction does not dominate the unwind destination block, no action is needed there.
Patch by: Chen Li
Differential Revision: http://reviews.llvm.org/D7923
llvm-svn: 231183
Selection conditions may be vectors or scalars. Make sure InstCombine
doesn't indiscriminately assume that a select which is value dependent
on another select have identical select condition types.
This fixes PR22773.
llvm-svn: 231156
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464. I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned. Let me know if I'm wrong :).
The code changes are fairly mechanical:
- Bumped the "Debug Info Version".
- `DIBuilder` now creates the appropriate subclass of `MDNode`.
- Subclasses of DIDescriptor now expect to hold their "MD"
counterparts (e.g., `DIBasicType` expects `MDBasicType`).
- Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
for printing comments.
- Big update to LangRef to describe the nodes in the new hierarchy.
Feel free to make it better.
Testcase changes are enormous. There's an accompanying clang commit on
its way.
If you have out-of-tree debug info testcases, I just broke your build.
- `upgrade-specialized-nodes.sh` is attached to PR22564. I used it to
update all the IR testcases.
- Unfortunately I failed to find way to script the updates to CHECK
lines, so I updated all of these by hand. This was fairly painful,
since the old CHECKs are difficult to reason about. That's one of
the benefits of the new hierarchy.
This work isn't quite finished, BTW. The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro). Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers. I
also expect to make a few schema changes now that it's easier to reason
about everything.
llvm-svn: 231082
By loading from indexed offsets into a byte array and applying a mask, a
program can test bits from the bit set with a relatively short instruction
sequence. For example, suppose we have 15 bit sets to lay out:
A (16 bits), B (15 bits), C (14 bits), D (13 bits), E (12 bits),
F (11 bits), G (10 bits), H (9 bits), I (7 bits), J (6 bits), K (5 bits),
L (4 bits), M (3 bits), N (2 bits), O (1 bit)
These bits can be laid out in a 16-byte array like this:
Byte Offset
0123456789ABCDEF
Bit
7 HHHHHHHHHIIIIIII
6 GGGGGGGGGGJJJJJJ
5 FFFFFFFFFFFKKKKK
4 EEEEEEEEEEEELLLL
3 DDDDDDDDDDDDDMMM
2 CCCCCCCCCCCCCCNN
1 BBBBBBBBBBBBBBBO
0 AAAAAAAAAAAAAAAA
For example, to test bit X of A, we evaluate ((bits[X] & 1) != 0), or to
test bit X of I, we evaluate ((bits[9 + X] & 0x80) != 0). This can be done
in 1-2 machine instructions on x86, or 4-6 instructions on ARM.
This uses the LPT multiprocessor scheduling algorithm to lay out the bits
efficiently.
Saves ~450KB of instructions in a recent build of Chromium.
Differential Revision: http://reviews.llvm.org/D7954
llvm-svn: 231043
There's really no reason to have them have entries in the symbol table
anymore. Old versions of ld64 had some bugs in this area but those have
been fixed long ago.
llvm-svn: 231041
This re-lands change r230921. r230921 was reverted because it broke a
clang test; a checkin fixing the clang test will be commited shortly.
Summary:
As far as I can tell, the real bug causing the issue was fixed in
r230533. SCEVExpander should mark an increment operation as nuw or nsw
only if it can *prove* that the operation does not overflow. There
shouldn't be any situation where we have to do something different
because of no-wrap flags generated by SCEVExpander.
Revert "IndVarSimplify: Allow LFTR to fire more often"
This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213).
Revert "IndVarSimplify: Don't let LFTR compare against a poison value"
This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102).
Reviewers: majnemer, atrick, spatel
Differential Revision: http://reviews.llvm.org/D7979
llvm-svn: 231018
Summary:
As far as I can tell, the real bug causing the issue was fixed in
r230533. SCEVExpander should mark an increment operation as nuw or nsw
only if it can *prove* that the operation does not overflow. There
shouldn't be any situation where we have to do something different
because of no-wrap flags generated by SCEVExpander.
Revert "IndVarSimplify: Allow LFTR to fire more often"
This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213).
Revert "IndVarSimplify: Don't let LFTR compare against a poison value"
This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102).
Reviewers: majnemer, atrick, spatel
Differential Revision: http://reviews.llvm.org/D7979
llvm-svn: 230921
r228631 stopped using `DW_OP_piece` inside `DIExpression`s in the IR,
but it apparently missed updating these testcases. Caught by verifier
checks for `MDExpression` while working on moving the new hierarchy into
place.
llvm-svn: 230882
Leaving empty blocks around just opens up a can of bugs like PR22704. Deleting
them early also slightly simplifies code.
Thanks to Sanjay for the IR test case.
llvm-svn: 230856
It turns out the naming of inserted phis and selects is sensative to the order in which two sets are iterated. We need to nail this down to avoid non-deterministic output and possible test failures.
The modified test is the one I first noticed something odd in. The change is making it more strict to report the error. With the test change, but without the code change, the test fails roughly 1 in 5. With the code change, I've run ~30 runs without error.
Long term, the right fix here is to adjust the naming scheme. I'm checking in this hack to avoid any possible non-determinism in the tests over the weekend. HJust because I only noticed one case doesn't mean it's actually the only case. I hope to get to the right change Monday.
std->llvm data structure changes bugfix change #3
llvm-svn: 230835
These tests cover the 'base object' identification and rewritting portion of RewriteStatepointsForGC. These aren't completely exhaustive, but they've proven to be reasonable effective over time at finding regressions.
In the process of porting these tests over, I found my first "cleanup per llvm code style standards" bug. We were relying on the order of iteration when testing the base pointers found for a derived pointer. When we switched from std::set to DenseSet, this stopped being a safe assumption. I'm suspecting I'm going to find more of those. In particular, I'm now really wondering about the main iteration loop for this algorithm. I need to go take a closer look at the assumptions there.
I'm not really happy with the fact these are testing what is essentially debug output (i.e. enabled via command line flags). Suggestions for how to structure this better are very welcome.
llvm-svn: 230818
Essentially the same as the GEP change in r230786.
A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)
import fileinput
import sys
import re
pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")
for line in sys.stdin:
sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7649
llvm-svn: 230794
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.
This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.
* This doesn't modify gep operators, only instructions (operators will be
handled separately)
* Textual IR changes only. Bitcode (including upgrade) and changing the
in-memory representation will be in separate changes.
* geps of vectors are transformed as:
getelementptr <4 x float*> %x, ...
->getelementptr float, <4 x float*> %x, ...
Then, once the opaque pointer type is introduced, this will ultimately look
like:
getelementptr float, <4 x ptr> %x
with the unambiguous interpretation that it is a vector of pointers to float.
* address spaces remain on the pointer, not the type:
getelementptr float addrspace(1)* %x
->getelementptr float, float addrspace(1)* %x
Then, eventually:
getelementptr float, ptr addrspace(1) %x
Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.
update.py:
import fileinput
import sys
import re
ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile( r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
def conv(match, line):
if not match:
return line
line = match.groups()[0]
if len(match.groups()[5]) == 0:
line += match.groups()[2]
line += match.groups()[3]
line += ", "
line += match.groups()[1]
line += "\n"
return line
for line in sys.stdin:
if line.find("getelementptr ") == line.find("getelementptr inbounds"):
if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
line = conv(re.match(ibrep, line), line)
elif line.find("getelementptr ") != line.find("getelementptr ("):
line = conv(re.match(normrep, line), line)
sys.stdout.write(line)
apply.sh:
for name in "$@"
do
python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
rm -f "$name.tmp"
done
The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh
After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).
The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7636
llvm-svn: 230786
InstCombine has long had logic to convert aligned Altivec load/store intrinsics
into regular loads and stores. This mirrors that functionality for QPX vector
load/store intrinsics.
llvm-svn: 230660
InstCombine has logic to convert aligned Altivec load/store intrinsics into
regular loads and stores. Unfortunately, there seems to be no regression test
covering this behavior. Adding one...
llvm-svn: 230632
Use the IRBuilder helpers for gc.statepoint and gc.result, instead of
coding the construction by hand. Note that the gc.statepoint IRBuilder
handles only CallInst, not InvokeInst; retain that part of hand-coding.
Differential Revision: http://reviews.llvm.org/D7518
llvm-svn: 230591
This is a follow-on to r227491 which tightens the check for propagating FP
values. If a non-constant value happens to be a zero, we would hit the same
bug as before.
Bug noted and patch suggested by Eli Friedman.
llvm-svn: 230564
Summary: SROA generates code that isn't quite as easy to optimize and contains unusual-sized shuffles, but that code is generally correct. As discussed in D7487 the right place to clean things up is InstCombine, which will pick up the type-punning pattern and transform it into a more obvious bitcast+extractelement, while leaving the other patterns SROA encounters as-is.
Test Plan: make check
Reviewers: jvoung, chandlerc
Subscribers: llvm-commits
llvm-svn: 230560
This change aligns globals to the next highest power of 2 bytes, up to a
maximum of 128. This makes it more likely that we will be able to compress
bit sets with a greater alignment. In many more cases, we can now take
advantage of a new optimization also introduced in this patch that removes
bit set checks if the bit set is all ones.
The 128 byte maximum was found to provide the best tradeoff between instruction
overhead and data overhead in a recent build of Chromium. It allows us to
remove ~2.4MB of instructions at the cost of ~250KB of data.
Differential Revision: http://reviews.llvm.org/D7873
llvm-svn: 230540
(The change was landed in r230280 and caused the regression PR22674.
This version contains a fix and a test-case for PR22674).
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
Apart from the attached test case, another (more realistic)
manifestation of the bug can be seen in
Transforms/IndVarSimplify/pr20680.ll.
Differential Revision: http://reviews.llvm.org/D7778
llvm-svn: 230533
With a diabolically crafted test case, we could recurse
through this code and return true instead of false.
The larger engineering crime is the use of magic numbers.
Added FIXME comments for those.
llvm-svn: 230515
Summary:
This change fixes the FIXME that you recently added when you committed
(a modified version of) my patch. When `InstCombine` combines a load and
store of an pointer to those of an equivalently-sized integer, it currently
drops any `!nonnull` metadata that might be present. This change replaces
`!nonnull` metadata with `!range !{ 1, -1 }` metadata instead.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7621
llvm-svn: 230462
The builder is based on a layout algorithm that tries to keep members of
small bit sets together. The new layout compresses Chromium's bit sets to
around 15% of their original size.
Differential Revision: http://reviews.llvm.org/D7796
llvm-svn: 230394
This case is interesting because ScalarEvolutionExpander lowers min(a,
b) as ~max(~a,~b). I think the profitability heuristics can be made
more clever/aggressive, but this is a start.
Differential Revision: http://reviews.llvm.org/D7821
llvm-svn: 230285
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
Apart from the attached test case, another (more realistic) manifestation
of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll.
NOTE: this change was landed with an incorrect commit message in
rL230275 and was reverted for that reason in rL230279. This commit
message is the correct one.
Differential Revision: http://reviews.llvm.org/D7778
llvm-svn: 230280
230275 got committed with an incorrect commit message due to a mixup
on my side. Will re-land in a few moments with the correct commit
message.
llvm-svn: 230279
The bug was a result of getPreStartForExtend interpreting nsw/nuw
flags on an add recurrence more strongly than is legal. {S,+,X}<nsw>
implies S+X is nsw only if the backedge of the loop is taken at least
once.
Differential Revision: http://reviews.llvm.org/D7808
llvm-svn: 230275
This patch adds the isProfitableToHoist API. For AArch64, we want to prevent a
fmul from being hoisted in cases where it is more profitable to form a
fmsub/fmadd.
Phabricator Review: http://reviews.llvm.org/D7299
Patch by Lawrence Hu <lawrence@codeaurora.org>
llvm-svn: 230241
calculations. Semantically non-functional change.
This gets rid of some of the SCEV -> Value -> SCEV round tripping and
the Construct(SMin|SMax)Of and MaybeSimplify helper routines.
llvm-svn: 230150
Previously, this pass ran over every function in the Module if added to the pass order. With this change, it runs only over those with a GC attribute where the GC explicitly opts in. A GC can also choose which of entry safepoint polls, backedge safepoint polls, and call safepoints it wants. I hope to get these exposed as checks on the GCStrategy at some point, but for now, the checks are manual string comparisons.
llvm-svn: 230097
Yet another chapter in the endless story. While this looks like we leave
the loop in a non-canonical state this replicates the logic in
LoopSimplify so it doesn't diverge from the canonical form in any way.
PR21968
llvm-svn: 230058
This patch introduces a new mechanism that allows IR modules to co-operatively
build pointer sets corresponding to addresses within a given set of
globals. One particular use case for this is to allow a C++ program to
efficiently verify (at each call site) that a vtable pointer is in the set
of valid vtable pointers for the class or its derived classes. One way of
doing this is for a toolchain component to build, for each class, a bit set
that maps to the memory region allocated for the vtables, such that each 1
bit in the bit set maps to a valid vtable for that class, and lay out the
vtables next to each other, to minimize the total size of the bit sets.
The patch introduces a metadata format for representing pointer sets, an
'@llvm.bitset.test' intrinsic and an LTO lowering pass that lays out the globals
and builds the bitsets, and documents the new feature.
Differential Revision: http://reviews.llvm.org/D7288
llvm-svn: 230054
Before calling Function::getGC to test for enablement, we need to make sure there's actually a GC at all via Function::hasGC. Otherwise, we'd crash on functions without a GC. Thankfully, this only mattered if you manually scheduled the pass, but still, oops. :(
llvm-svn: 230040
This change addresses a deficiency pointed out in PR22629. To copy from the bug
report:
[from the bug report]
Consider this code:
int f(int x) {
int a[] = {12};
return a[x];
}
GCC knows to optimize this to
movl $12, %eax
ret
The code generated by recent Clang at -O3 is:
movslq %edi, %rax
movl .L_ZZ1fiE1a(,%rax,4), %eax
retq
.L_ZZ1fiE1a:
.long 12 # 0xc
[end from the bug report]
This definitely seems worth fixing. I've also seen this kind of code before (as
the base case of generic vector wrapper templates with one element).
The general idea is to look at the GEP feeding a load or a store, which has
some variable as its first non-zero index, and determine if that index must be
zero (or else an out-of-bounds access would occur). We can do this for allocas
and globals with constant initializers where we know the maximum size of the
underlying object. When we find such a GEP, we create a new one for the memory
access with that first variable index replaced with a constant zero.
Even if we can't eliminate the memory access (and sometimes we can't), it is
still useful because it removes unnecessary indexing calculations.
llvm-svn: 229959