Summary:
This is a pass for speculative execution of instructions for simple if-then (triangle) control flow. It's aimed at GPUs, but could perhaps be used in other contexts. Enabling this pass gives us a 1.0% geomean improvement on Google benchmark suites, with one benchmark improving 33%.
Credit goes to Jingyue Wu for writing an earlier version of this pass.
Patched by Bjarke Roune.
Test Plan:
This patch adds a set of tests in test/Transforms/SpeculativeExecution/spec.ll
The pass is controlled by a flag which defaults to having the pass not run.
Reviewers: eliben, dberlin, meheff, jingyue, hfinkel
Reviewed By: jingyue, hfinkel
Subscribers: majnemer, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9360
llvm-svn: 237459
This reverts r237453 - it was causing timeouts on some bots. Reverting
while I investigate (it's probably InstCombine fighting itself...)
llvm-svn: 237458
I intended this loop to only unwrap SplitVector actions, but it
was more broad than that, such as unwrapping WidenVector actions,
which makes operations seem legal when they're not.
llvm-svn: 237457
Summary:
Consider (B | i) * S as (B + i) * S if B and i have no bits set in
common.
Test Plan: @or in slsr-mul.ll
Reviewers: broune, meheff
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9788
llvm-svn: 237456
This patch introduces a canonical form for min/max idioms where one operand
is extended or truncated. This often happens when the other operand is a
constant. For example:
%1 = icmp slt i32 %a, i32 0
%2 = sext i32 %a to i64
%3 = select i1 %1, i64 %2, i64 0
Would now be canonicalized into:
%1 = icmp slt i32 %a, i32 0
%2 = select i1 %1, i32 %a, i32 0
%3 = sext i32 %2 to i64
This builds upon a patch posted by David Majenemer
(https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
passively stopped instcombine from ruining canonical patterns. This
patch additionally actively makes instcombine canonicalize too.
Canonicalization of expressions involving a change in type from int->fp
or fp->int are not yet implemented.
llvm-svn: 237453
This teaches the min/max idiom detector in ValueTracking to see through
casts such as SExt/ZExt/Trunc. SCEV can already do this, so we're bringing
non-SCEV analyses up to the same level.
The returned LHS/RHS will not match the type of the original SelectInst
any more, so a CastOp is returned too to inform the caller how to
convert to the SelectInst's type.
No in-tree users yet; this will be used by InstCombine in a followup.
llvm-svn: 237452
This has caused some local failures. Updating the test case to be more
like the majority of the similar test cases.
Committing on behalf of Hubert Tong (hstong@ca.ibm.com).
llvm-svn: 237449
collectUpperBound hits an assertion when the back edge count is wider then the desired type.
If that happens, truncate the backedge count.
Patch by Philip Pfaffe!
llvm-svn: 237439
Summary:
To maintain compatibility with GAS, we need to stop treating negative 32-bit immediates as 64-bit values when expanding LI/DLI.
This currently happens because of sign extension.
To do this we need to choose the 32-bit value expansion for values which use their upper 33 bits only for sign extension (i.e. no 0's, only 1's).
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8662
llvm-svn: 237428
This adds new SDNodes for signed/unsigned min/max. These nodes are built from
select/icmp pairs matched at SDAGBuilder stage.
This patch adds the nodes, as well as legalization support and sets them to
be "expand" for all targets.
NFC for now; this will be tested when I switch AArch64 to using these new
nodes.
llvm-svn: 237423
Transfer the calling convention from the invoke being replaced by
PlaceStatepoints to the new invoke to gc.statepoint created. Add a test
case that would have caught this issue.
llvm-svn: 237414
rL236672 would generate all invoke statepoints with deopt args set to a
list containing the single element "0", instead of an empty list.
Also add a test case that would have caught this.
llvm-svn: 237413
Instead of doing that, create a temporary copy of MCTargetOptions and reset its
SanitizeAddress field based on the function's attribute every time an InlineAsm
instruction is emitted in AsmPrinter::EmitInlineAsm.
This is part of the work to remove TargetMachine::resetTargetOptions (the FIXME
added to TargetMachine.cpp in r236009 explains why this function has to be
removed).
Differential Revision: http://reviews.llvm.org/D9570
llvm-svn: 237412
Summary:
Extract method haveNoCommonBitsSet so that we don't have to duplicate this logic in
InstCombine and SeparateConstOffsetFromGEP.
This patch also makes SeparateConstOffsetFromGEP more precise by passing
DominatorTree to computeKnownBits.
Test Plan: value-tracking-domtree.ll that tests ValueTracking indeed leverages dominating conditions
Reviewers: broune, meheff, majnemer
Reviewed By: majnemer
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9734
llvm-svn: 237407
This commit gives the users of the YAML Traits I/O library
the ability to serialize scalars using the YAML literal block
scalar notation by allowing them to implement a specialization
of the `BlockScalarTraits` struct for their custom types.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D9613
llvm-svn: 237404
Summary:
This adds a SHA1 implementation taken from public domain code.
The change is trivial, but as it involves third-party code I'd like
a second pair of eyes before commit.
LibFuzzer can not use SHA1 from openssl because openssl may not be available
and because we may be fuzzing openssl itself.
Using sha1sum via a pipe is too slow.
Test Plan: n/a
Reviewers: chandlerc
Reviewed By: chandlerc
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D9733
llvm-svn: 237400
This is to cleanup some redundency generated by LoopUnroll pass. Such redundency may not be cleaned up by existing passes after LoopUnroll.
Differential Revision: http://reviews.llvm.org/D9777
llvm-svn: 237395
The commit r237314 that implements YAML block parsing
introduced a leak that was caught by the ASAN linux buildbot.
YAML Parser stores its tokens in an ilist, and allocates
tokens using a BumpPtrAllocator, but doesn't call the
destructor for the allocated tokens. R237314 added an
std::string field to a Token which leaked as the Token's
destructor wasn't called. This commit fixes this leak
by calling the Token's destructor when a Token is being
removed from an ilist of tokens.
llvm-svn: 237389
The induction variable in the vectorized loop wasn't
recognized properly, so a hardware loop wasn't generated.
Differential Revision: http://reviews.llvm.org/D9722
llvm-svn: 237388
Function 'ConstantFoldScalarCall' (in ConstantFolding.cpp) works under the
wrong assumption that a call to 'convert.from.fp16' returns a value of
type 'float'.
However, intrinsic 'convert.from.fp16' can be overloaded; for example, we
can call 'convert.from.fp16.f64' to convert from half to double; etc.
Before this patch, the following example would have triggered an assertion
failure in opt (with -constprop):
```
define double @foo() {
entry:
%0 = call double @llvm.convert.from.fp16.f64(i16 0)
ret double %0
}
```
This patch fixes the problem in ConstantFolding.cpp. When folding a call to
convert.from.fp16, we perform a different kind of conversion based on the call
return type.
Added test 'Transform/ConstProp/convert-from-fp16.ll'.
Differential Revision: http://reviews.llvm.org/D9771
llvm-svn: 237377
After converting a loop to a hardware loop, the pass should remove
any unnecessary instructions from the old compare-and-branch
code. This patch removes a dead constant assignment that was
used in the compare instruction.
Differential Revision: http://reviews.llvm.org/D9720
llvm-svn: 237373
If the loop trip count may underflow or wrap, the compiler should
not generate a hardware loop since the trip count will be
incorrect.
llvm-svn: 237365
Summary:
When we are trying to fill the delay slot of a call instruction, we must avoid
filler instructions that use the $ra register. This fixes the test
MultiSource/Applications/JM/lencod when we enable the forward delay slot filler.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9670
llvm-svn: 237362
Summary:
This implements the initial version as was proposed earlier this year
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080462.html).
Since then Loop Access Analysis was split out from the Loop Vectorizer
and was made into a separate analysis pass. Loop Distribution becomes
the second user of this analysis.
The pass is off by default and can be enabled
with -enable-loop-distribution. There is currently no notion of
profitability; if there is a loop with dependence cycles, the pass will
try to split them off from other memory operations into a separate loop.
I decided to remove the control-dependence calculation from this first
version. This and the issues with the PDT are actively discussed so it
probably makes sense to treat it separately. Right now I just mark all
terminator instruction required which keeps identical CFGs for each
distributed loop. This seems to be working pretty well for 456.hmmer
where even though there is an empty if-then block in the distributed
loop initially, it gets completely removed.
The pass keeps DominatorTree and LoopInfo updated. I've tested this
with -loop-distribute-verify with the testsuite where we distribute ~90
loops. SimplifyLoop is violated in some cases and I have a FIXME
covering this.
Reviewers: hfinkel, nadav, aschwaighofer
Reviewed By: aschwaighofer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8831
llvm-svn: 237358
Summary:
If we only pass the necessary operands, we don't have to determine the position of the symbol operand when entering expandLoadAddressSym().
This simplifies the expandLoadAddressSym() code.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9291
llvm-svn: 237355
i1 type is a legal type on AVX-512 and can be passed as parameter or return value.
i1 is promoted to i8 on return and to i32 for call arguments (i8 is also promoted to i32 here).
The result code is similar to the previous X86 targets, where i1 is allways promoted to i8.
llvm-svn: 237350
Found by ubsan. This was taking a bool and left shifting by 32 - the
result is 64 bit, so we should really do the math in a type it fits
in.
llvm-svn: 237345
The outer if had 3 separate conditions ORed together and then the inner ifs detected which of the three conditions it was by using only a portion of the specific condition. Just put the whole condition in each inner if and remove the outer if.
llvm-svn: 237343
Other targets probably should as well. Since r237161, compiler-rt has
both, but I don't see why anything other than gnueabi would use a
gnueabi naming scheme.
llvm-svn: 237324
This commit implements the parsing of YAML block scalars.
Some code existed for it before, but it couldn't parse block
scalars.
This commit adds a new yaml node type to represent the block
scalar values.
This commit also deletes the 'spec-09-27' and 'spec-09-28' tests
as they are identical to the test file 'spec-09-26'.
This commit introduces 3 new utility functions to the YAML scanner
class: `skip_s_space`, `advanceWhile` and `consumeLineBreakIfPresent`.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D9503
llvm-svn: 237314
ArrayRef already has a SFINAE constructor which can construct ArrayRef<const T*> from ArrayRef<T*>.
This adds methods to do the same directly from SmallVector and std::vector. This avoids an intermediate step through the use of makeArrayRef.
Also update the users of this in LICM and SROA to remove the now unnecessary makeArrayRef call.
Reviewed by David Blaikie.
llvm-svn: 237309
llvm-cov was truncating numbers that were larger than a particular
fixed width, which is as confusing as it is useless. Instead, we use
engineering notation with SI prefix for magnitude.
llvm-svn: 237307
This version doesn't need begin/end but can instead just take a type which has begin/end methods.
Use this to replace an eligible foreach loop in LoopInfo found by David Blaikie in r237224.
Reviewed by David Blaikie.
llvm-svn: 237301
If we have a coverage mapping but no profile data for a function,
calling it mismatched is misleading. This can just as easily be
unreachable code that was stripped from the binary. Instead, treat
these the same as functions where we have an explicit "zero" coverage
map by setting the count to zero for each mapped region.
llvm-svn: 237298
There's no need to manually pass modifier strings around to tell an operand how
to print now, that information is encoded in the operand itself since the MC
layer came along.
llvm-svn: 237295
We were creating and propagating two separate indices for each jump table (from
back in the mists of time). However, the generic index used by other backends
is sufficient to emit a unique symbol so this was unneeded.
llvm-svn: 237294
The previous logic mixed 2 separate questions:
+ Can we form a TBB/TBH instruction?
+ Can we remove the jump-table calculation before it?
It then performed a bunch of random tests on the instructions earlier in the
basic block, which were probably sufficient to answer 2 but only because of the
very limited ways in which a t2BR_JT can actually be created.
For example there's no reason to expect the LeaInst to define the same base
register as the following indexing calulation. In practice this means we might
have missed opportunities to form TBB/TBH, in theory you could end up
misidentifying a sequence and removing the wrong LEA:
%R1 = t2LEApcrelJT ...
%R2 = t2LEApcrelJT ...
<... using and killing %R2 ...>
%R2 = t2ADDr %R1, $Ridx
Before we would have looked for an LEA defining %R2 and found the wrong one. We
just got lucky that jump table setup was (almost?) always confined to a single
basic block and there was only one jump table per block.
llvm-svn: 237293
Summary:
This patch teaches the PlaceSafepoints pass about two `CallSite`
function attributes:
* "statepoint-id": if the string value of this attribute can be parsed
as an integer, then it is propagated to the ID parameter of the
statepoint created.
* "statepoint-num-patch-bytes": if the string value of this attribute
can be parsed as an integer, then it is propagated to the `num patch
bytes` parameter of the statepoint created.
This change intentionally does not assert on a malformed value for these
attributes, given that they're not "official" attributes.
Reviewers: reames, pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9735
llvm-svn: 237286
-dump mode normally omits blob data that contains unprintable characters.
When -show-binary-blobs is passed, it unilaterally escapes all blobs,
allowing those with binary data to be displayed.
llvm-svn: 237276
Avoid running forever by checking we are not reassociating an expression into
the same form.
Tested with @avoid_infinite_loops in nary-add.ll
llvm-svn: 237269
This patch uses the new function profile metadata "function_entry_count"
to annotate entry counts from sample profiles.
In a sampling profile, the total samples collected at the function entry
are an approximation for the number of times that function was invoked.
llvm-svn: 237265
Some compilers warn about using the ternary operator with an unsigned variable
and enum.
I haven't seen this trigger in the llvm.org buildbots yet, but it probably will
at some point.
Reported by Daniel Sanders.
llvm-svn: 237262
Summary:
This adds three Function methods to handle function entry counts:
setEntryCount() and getEntryCount().
Entry counts are stored under the MD_prof metadata node with the name
"function_entry_count". They are unsigned 64 bit values set by profilers
(instrumentation and sample profiler changes coming up).
Added documentation for new profile metadata and tests.
Reviewers: dexonsmith, bogner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9628
llvm-svn: 237260
The hardware loop pass should try to generate a hardware loop
instruction when the original loop has a critical edge.
Differential Revision: http://reviews.llvm.org/D9678
llvm-svn: 237258
Summary: A side-effect of this is that LA gains proper handling of unsigned and positive signed 16-bit immediates and more accurate error messages.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9290
llvm-svn: 237255
Summary: Also did some minor reformatting in the resulting test.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9702
llvm-svn: 237242