Commit Graph

8537 Commits

Author SHA1 Message Date
George Burgess IV 56c7e88c2c Let llvm.objectsize be conservative with null pointers
This adds a parameter to @llvm.objectsize that makes it return
conservative values if it's given null.

This fixes PR23277.

Differential Revision: https://reviews.llvm.org/D28494

llvm-svn: 298430
2017-03-21 20:08:59 +00:00
Sanjay Patel 00ece756c3 [InstCombine] auto-generate better checks; NFC
llvm-svn: 298377
2017-03-21 14:04:44 +00:00
Matt Arsenault 6b00d40900 InstCombine: Check source value precision when reducing cast intrinsic
Missed this check when porting from the libcall version.

llvm-svn: 298312
2017-03-20 21:59:24 +00:00
Daniel Berlin fa42a23cfc Add missing updated test from VN coercion changes. Instructions were renamed. NFC
llvm-svn: 298280
2017-03-20 18:04:19 +00:00
Dehao Chen e593049fb0 Updates branch_weights annotation for call instructions during inlining.
Summary: Inliner should update the branch_weights annotation to scale it to proper value.

Reviewers: davidxl, eraman

Reviewed By: eraman

Subscribers: zzheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D30767

llvm-svn: 298270
2017-03-20 16:40:44 +00:00
Craig Topper ff2283ec0e [InstCombine] Use update_test_checks.py to regenerate a test. NFC
llvm-svn: 298227
2017-03-19 17:04:52 +00:00
Xin Tong 967e313078 Remove unused arguments. NFCI
llvm-svn: 298218
2017-03-19 15:31:16 +00:00
Xin Tong d67fb1b66e [JumpThreading] Perform phi-translation in SimplifyPartiallyRedundantLoad.
Summary:
In case we are loading on a phi-load in SimplifyPartiallyRedundantLoad.
Try to phi translate it into incoming values in the predecessors before
we search for available loads.

This needs https://reviews.llvm.org/D30524

Reviewers: davide, sanjoy, efriedma, dberlin, rengolin

Reviewed By: dberlin

Subscribers: junbuml, llvm-commits

Differential Revision: https://reviews.llvm.org/D30543

llvm-svn: 298217
2017-03-19 15:30:53 +00:00
Brian Gesiak 1640e68728 [Analysis] bitreverse(undef) returns undef
Summary:
The reverse of an artbitrary bitpattern is also an arbitrary
bitpattern.

Reviewers: trentxintong, arsenm, majnemer

Reviewed By: majnemer

Subscribers: majnemer, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D31118

llvm-svn: 298201
2017-03-19 04:40:42 +00:00
Daniel Berlin 41b39169e2 NewGVN: Fix PHI evaluation bug exposed by new verifier. We were checking whether the incoming block was reachable instead of whether the specific edge was reachable
llvm-svn: 298187
2017-03-18 15:41:36 +00:00
Rong Xu 661ffe104e [PGO] Add omitted test cases.
llvm-svn: 298115
2017-03-17 20:05:13 +00:00
Rong Xu e60343d6b0 [PGO] Value profile for size of memory intrinsic calls
This patch annotates the valuesites profile to memory intrinsics.

Differential Revision: http://reviews.llvm.org/D31002

llvm-svn: 298110
2017-03-17 18:07:26 +00:00
Stanislav Mekhanoshin ee2dd785f6 Only unswitch loops with uniform conditions
Loop unswitching can be extremely harmful for a SIMT target. In case
if hoisted condition is not uniform a SIMT machine will execute both
clones of a loop sequentially. Therefor LoopUnswitch checks if the
condition is non-divergent.

Since DivergenceAnalysis adds an expensive PostDominatorTree analysis
not needed for non-SIMT targets a new option is added to avoid unneded
analysis initialization. The method getAnalysisUsage is called when
TargetTransformInfo is not yet available and we cannot use it here.
For that reason a new field DivergentTarget is added to PassManagerBuilder
to control the behavior and set this field from a target.

Differential Revision: https://reviews.llvm.org/D30796

llvm-svn: 298104
2017-03-17 17:13:41 +00:00
Sanjoy Das c4e4dcdf64 [RSForGC] Handle vector GEPs
We were not handling getelemenptr instructions of vector type before.
Since getelemenptr instructions for vector types follow the same rule as
getelementptr instructions for non-vector types, we can just handle them
in the same way.

llvm-svn: 298028
2017-03-17 00:55:53 +00:00
Rong Xu 60faea19f8 Resubmit r297897: [PGO] Value profile for size of memory intrinsic calls
R297897 inadvertently enabled annotation for memop profiling. This new patch
fixed it.

llvm-svn: 297996
2017-03-16 21:15:48 +00:00
Adrian Prantl 47ea6478ed Salvage debug info from instructions about to be deleted
[Reapplies r297971 and punting on finding a better API for findDbgValues()]

This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value instrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.

In the example in the testcase (which was extracted from XNU) there is a sequence of

 %4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41
 %5 = bitcast %struct.entry* %4 to i8*, !dbg !42
 %add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43
 %6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44
 call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg 34

When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:

- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic

The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.

rdar://problem/30725338

Differential Revision: https://reviews.llvm.org/D30919

llvm-svn: 297994
2017-03-16 21:14:09 +00:00
Michael Kuperstein 2da2bfa088 [LoopUnroll] Don't peel loops where the latch isn't the exiting block
Peeling assumed this doesn't happen, but didn't check it.
This fixes PR32178.

Differential Revision: https://reviews.llvm.org/D30757

llvm-svn: 297993
2017-03-16 21:07:48 +00:00
Sanjay Patel 6105bb5eaf [InstCombine] avoid breaking up bitcasted vector min/max patterns (PR32306)
As the related tests show, we're not canonicalizing to this form for scalars or vectors yet,
but this solves the immediate problem in:
https://bugs.llvm.org/show_bug.cgi?id=32306

llvm-svn: 297989
2017-03-16 20:42:45 +00:00
Sanjay Patel 634e622069 [InstCombine] add tests for PR32306 and missed min/max canonicalization; NFC
llvm-svn: 297986
2017-03-16 20:31:38 +00:00
Adrian Prantl fa9e84eb6d Revert commit r297971 because of issues reported by msan.
llvm-svn: 297982
2017-03-16 20:11:54 +00:00
Adrian Prantl 4377314a98 Salvage debug info from instructions about to be deleted
This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value instrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.

In the example in the testcase (which was extracted from XNU) there is a sequence of

  %4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41
  %5 = bitcast %struct.entry* %4 to i8*, !dbg !42
  %add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43
  %6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44
  call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg 34

When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:

- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic

The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.

rdar://problem/30725338

Differential Revision: https://reviews.llvm.org/D30919

llvm-svn: 297971
2017-03-16 18:22:52 +00:00
Eric Liu 971de62291 Revert "[PGO] Value profile for size of memory intrinsic calls"
This commit reverts r297897 and r297909.

llvm-svn: 297951
2017-03-16 13:16:35 +00:00
Chandler Carruth 814e0df1c5 [PM/Inliner] Fix a bug in r297374 where we would leave stale calls in
the work queue and crash when trying to visit them after deleting the
function containing those calls.

llvm-svn: 297940
2017-03-16 10:45:42 +00:00
Chandler Carruth 6ef42cc6bb [PM/Inliner] Add a test case that encapsulates the core issue addressed
in r297374.

I've extracted a small version of this from the C++ metaprogram Richard
came up with to exercise these kinds of issues and written comments to
describe both how to reproduce a fresh version of the test case and what
likely failure modes are.

The test case is still a bit brittle as it depends on the particular
inline cost modeling and SCC visitation order, but it definitely would
have caught the bug right away when developing things so it seems
a really valuable test case to have.

llvm-svn: 297935
2017-03-16 10:13:55 +00:00
Rong Xu 4ed52798ce [PGO] Value profile for size of memory intrinsic calls
This patch adds the value profile support to profile the size parameter of
memory intrinsic calls: memcpy, memcmp, and memmov.

Differential Revision: http://reviews.llvm.org/D28965

llvm-svn: 297897
2017-03-15 21:47:27 +00:00
Simon Pilgrim 06c70adcf0 [X86] Add missing BITREVERSE costs for SSE2 vectors and i8/i16/i32/i64 scalars
Prep work for PR31810

llvm-svn: 297876
2017-03-15 19:34:55 +00:00
Fiona Glaser a9bd572b6f MemCpyOptimizer: don't create new addrspace casts
This isn't safe on all targets, and since we don't have a way
to know it's safe, avoid doing it for now.

llvm-svn: 297788
2017-03-14 22:37:38 +00:00
Dehao Chen 4a435e0896 SamplePGO ThinLTO ICP fix for local functions.
Summary:
In SamplePGO, if the profile is collected from non-LTO binary, and used to drive ThinLTO, the indirect call promotion may fail because ThinLTO adjusts local function names to avoid conflicts. There are two places of where the mismatch can happen:

1. thin-link prepends SourceFileName to front of FuncName to build the GUID (GlobalValue::getGlobalIdentifier). Unlike instrumentation FDO, SamplePGO does not use the PGOFuncName scheme and therefore the indirect call target profile data contains a hash of the OriginalName.
2. backend compiler promotes some local functions to global and appends .llvm.{$ModuleHash} to the end of the FuncName to derive PromotedFunctionName

This patch tries at the best effort to find the GUID from the original local function name (in profile), and use that in ICP promotion, and in SamplePGO matching that happens in the backend after importing/inlining:

1. in thin-link, it builds the map from OriginalName to GUID so that when thin-link reads in indirect call target profile (represented by OriginalName), it knows which GUID to import.
2. in backend compiler, if sample profile reader cannot find a profile match for PromotedFunctionName, it will try to find if there is a match for OriginalFunctionName.
3. in backend compiler, we build symbol table entry for OriginalFunctionName and pointer to the same symbol of PromotedFunctionName, so that ICP can find the correct target to promote.

Reviewers: mehdi_amini, tejohnson

Reviewed By: tejohnson

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D30754

llvm-svn: 297757
2017-03-14 17:33:01 +00:00
Sanjay Patel 1c8c6a457d [InstCombine] consolidate rem tests and update checks; NFC
llvm-svn: 297747
2017-03-14 16:27:46 +00:00
Sanjay Patel 9deec85c34 [InstCombine] regenerate checks; NFC
llvm-svn: 297746
2017-03-14 16:16:40 +00:00
Oliver Stannard 062041113f [ValueTracking] Out of range shifts might be undef
If it is possible for the RHS of a shift operation to be greater than or equal
to the bit-width, then the result might be undef, and we can't report any known
bits.

In some cases, this was allowing a transformation in instcombine which widened
an undef value from i1 to i32, increasing the range of values that a function
could return.

Differential revision: https://reviews.llvm.org/D30781

llvm-svn: 297724
2017-03-14 10:13:17 +00:00
Jonas Paulsson a48ea231c0 [TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.

Tests updates:

Analysis/CostModel/X86/arith-fp.ll
Transforms/LoopVectorize/AArch64/interleaved_cost.ll
Transforms/LoopVectorize/ARM/interleaved_cost.ll

The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.

Review: Hal Finkel
https://reviews.llvm.org/D29540

llvm-svn: 297705
2017-03-14 06:35:36 +00:00
Daniel Berlin 620f86ff2b Add missing condprop-xfail.ll that contains the remaining xfail'd tests
llvm-svn: 297699
2017-03-14 01:46:51 +00:00
Daniel Berlin 2aa23e8881 NewGVN: We pass rle-nonlocal, we just perform the replacement in a way that keeps the old name instead of the new one
llvm-svn: 297683
2017-03-13 22:43:30 +00:00
Sanjay Patel caf369bd03 [SimplifyCFG] move tests for PR31028 from CGP
Hopefully, this will make sense with a forthcoming patch. If not, we can move these back.

llvm-svn: 297660
2017-03-13 19:59:14 +00:00
Matt Arsenault d81f557fe2 AMDGPU: Fold icmp/fcmp into icmp intrinsic
The typical use is a library vote function which
compares to 0. Fold the user condition into the intrinsic.

llvm-svn: 297650
2017-03-13 18:14:02 +00:00
Sanjay Patel 6023a2501c [CGP] add tests for PR31028; NFC
llvm-svn: 297629
2017-03-13 15:45:37 +00:00
Sanjoy Das 3f1e8e0102 Use a WeakVH for UnknownInstructions in AliasSetTracker
Summary:
This change solves the same problem as D30726, except that this only
throws out the bathwater.

AST was not correctly tracking and deleting UnknownInstructions via
handles.  The existing code only tracks "pointers" in its
`ASTCallbackVH`, so an UnknownInstruction (that isn't also def'ing a
pointer used by another memory instruction) never gets a
`ASTCallbackVH`.

There are two other ways to solve this problem:

 - Use the `PointerRec` scheme for both known and unknown instructions.
 - Use a `CallbackVH` that erases the offending Instruction from the
   UnknownInstruction list.

Both of the above changes seemed to be significantly (and unnecessarily
IMO) more complex than this.

Reviewers: chandlerc, dberlin, hfinkel, reames

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D30849

llvm-svn: 297539
2017-03-11 01:15:48 +00:00
Peter Collingbourne 14dcf02fcb WholeProgramDevirt: Implement export/import support for VCP.
Differential Revision: https://reviews.llvm.org/D30017

llvm-svn: 297503
2017-03-10 20:13:58 +00:00
Peter Collingbourne 59675ba0f8 WholeProgramDevirt: Implement export/import support for unique ret val opt.
Differential Revision: https://reviews.llvm.org/D29917

llvm-svn: 297502
2017-03-10 20:09:11 +00:00
Michael Kuperstein 5fb39a7966 [SLP] Revert everything that has to do with memory access sorting.
This reverts r293386, r294027, r294029 and r296411.

Turns out the SLP tree isn't actually a "tree" and we don't handle
accessing the same packet of loads in several different orders well,
causing miscompiles.

Revert until we can fix this properly.

llvm-svn: 297493
2017-03-10 18:59:07 +00:00
Matt Arsenault a3bdd8f27b AMDGPU: Fix insertion point when reducing load intrinsics
The insertion point may be later than the next instruction,
so it is necessary to set it when replacing the call.

llvm-svn: 297439
2017-03-10 05:25:49 +00:00
Daniel Berlin e3e69e1680 NewGVN: Rewrite DCE during elimination so we do it as well as old GVN did.
llvm-svn: 297428
2017-03-10 00:32:33 +00:00
Sanjay Patel 962a8431ea [InstSimplify] allow folds for bool vector div/rem
llvm-svn: 297411
2017-03-09 21:56:03 +00:00
Sanjay Patel 7e56366204 [ConstantFold] vector div/rem with any zero element in divisor is undef
Follow-up for:
https://reviews.llvm.org/D30665
https://reviews.llvm.org/rL297390

llvm-svn: 297409
2017-03-09 20:42:30 +00:00
Matt Arsenault efe949cc67 AMDGPU: Support for SimplifyDemandedVectorElts for load intrinsics
llvm-svn: 297408
2017-03-09 20:34:27 +00:00
Sanjay Patel bb47616aef [InstSimplify] add tests for vector constant folding div/rem-by-0; NFC
llvm-svn: 297407
2017-03-09 20:31:20 +00:00
Sanjay Patel 2b1f6f4b92 [InstSimplify] vector div/rem with any zero element in divisor is undef
This was suggested as a DAG simplification in the review for rL297026 :
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435253.html
...but let's start with IR since we have actual docs for IR (LangRef).

Differential Revision:
https://reviews.llvm.org/D30665

llvm-svn: 297390
2017-03-09 16:20:52 +00:00
Chandler Carruth 20e588e1af [PM/Inliner] Make the new PM's inliner process call edges across an
entire SCC before iterating on newly-introduced call edges resulting
from any inlined function bodies.

This more closely matches the behavior of the old PM's inliner. While it
wasn't really clear to me initially, this behavior is actually essential
to the inliner behaving reasonably in its current design.

Because the inliner is fundamentally a bottom-up inliner and all of its
cost modeling is designed around that it often runs into trouble within
an SCC where we don't have any meaningful bottom-up ordering to use. In
addition to potentially cyclic, infinite inlining that we block with the
inline history mechanism, it can also take seemingly simple call graph
patterns within an SCC and turn them into *insanely* large functions by
accidentally working top-down across the SCC without any of the
threshold limitations that traditional top-down inliners use.

Consider this diabolical monster.cpp file that Richard Smith came up
with to help demonstrate this issue:
```
template <int N> extern const char *str;

void g(const char *);

template <bool K, int N> void f(bool *B, bool *E) {
  if (K)
    g(str<N>);
  if (B == E)
    return;
  if (*B)
    f<true, N + 1>(B + 1, E);
  else
    f<false, N + 1>(B + 1, E);
}
template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }

extern bool *arr, *end;
void test() { f<false, 0>(arr, end); }
```

When compiled with '-DMAX=N' for various values of N, this will create an SCC
with a reasonably large number of functions. Previously, the inliner would try
to exhaust the inlining candidates in a single function before moving on. This,
unfortunately, turns it into a top-down inliner within the SCC. Because our
thresholds were never built for that, we will incrementally decide that it is
always worth inlining and proceed to flatten the entire SCC into that one
function.

What's worse, we'll then proceed to the next function, and do the exact same
thing except we'll skip the first function, and so on. And at each step, we'll
also make some of the constant factors larger, which is awesome.

The fix in this patch is the obvious one which makes the new PM's inliner use
the same technique used by the old PM: consider all the call edges across the
entire SCC before beginning to process call edges introduced by inlining. The
result of this is essentially to distribute the inlining across the SCC so that
every function incrementally grows toward the inline thresholds rather than
allowing the inliner to grow one of the functions vastly beyond the threshold.
The code for this is a bit awkward, but it works out OK.

We could consider in the future doing something more powerful here such as
prioritized order (via lowest cost and/or profile info) and/or a code-growth
budget per SCC. However, both of those would require really substantial work
both to design the system in a way that wouldn't break really useful
abstraction decomposition properties of the current inliner and to be tuned
across a reasonably diverse set of code and workloads. It also seems really
risky in many ways. I have only found a single real-world file that triggers
the bad behavior here and it is generated code that has a pretty pathological
pattern. I'm not worried about the inliner not doing an *awesome* job here as
long as it does *ok*. On the other hand, the cases that will be tricky to get
right in a prioritized scheme with a budget will be more common and idiomatic
for at least some frontends (C++ and Rust at least). So while these approaches
are still really interesting, I'm not in a huge rush to go after them. Staying
even closer to the existing PM's behavior, especially when this easy to do,
seems like the right short to medium term approach.

I don't really have a test case that makes sense yet... I'll try to find a
variant of the IR produced by the monster template metaprogram that is both
small enough to be sane and large enough to clearly show when we get this wrong
in the future. But I'm not confident this exists. And the behavior change here
*should* be unobservable without snooping on debug logging. So there isn't
really much to test.

The test case updates come from two incidental changes:
1) We now visit functions in an SCC in the opposite order. I don't think there
   really is a "right" order here, so I just update the test cases.
2) We no longer compute some analyses when an SCC has no call instructions that
   we consider for inlining.

llvm-svn: 297374
2017-03-09 11:35:40 +00:00
Peter Collingbourne 0152c8156b WholeProgramDevirt: Implement importing for uniform ret val opt.
Differential Revision: https://reviews.llvm.org/D29854

llvm-svn: 297350
2017-03-09 01:11:15 +00:00