Commit Graph

22597 Commits

Author SHA1 Message Date
Guillaume Chatelet 17380227e8 [Alignment][NFC] Remove LoadInst::setAlignment(unsigned)
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jdoerfert

Subscribers: hiraditya, asbirlea, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68142

llvm-svn: 373195
2019-09-30 09:37:05 +00:00
Aditya Kumar a6d9d31279 [LLVM-C][Ocaml] Add MergeFunctions and DCE pass
MergeFunctions and DCE pass are missing from OCaml/C-api. This patch
adds them.

Differential Revision: https://reviews.llvm.org/D65071

Reviewers: whitequark, hiraditya, deadalnix

Reviewed By: whitequark

Subscribers: llvm-commits

Tags: #llvm

Authored by: kren1

llvm-svn: 373170
2019-09-29 16:06:22 +00:00
Roman Lebedev d30093bb8a [DivRemPairs] Don't assert that we won't ever get expanded-form rem pairs in different BB's (PR43500)
If we happen to have the same div in two basic blocks,
and in one of those we also happen to have the rem part,
we'd match the div-rem pair, but the wrong ones.
So let's drop overly-ambiguous assert.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43500

llvm-svn: 373167
2019-09-29 15:25:24 +00:00
Alexey Bataev 8b1eeafb91 [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.

Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel

Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D29641

llvm-svn: 373166
2019-09-29 14:18:06 +00:00
Aditya Kumar 2adae76cc6 [NFC] Move hot cold splitting class to header file
Summary:  This is to facilitate unittests

Reviewers: compnerd, vsk, tejohnson, sebpop, brzycki, SirishP

Reviewed By: tejohnson

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68079

llvm-svn: 373151
2019-09-28 18:13:33 +00:00
Wei Mi f0c4e70e95 [SampleFDO] Create a separate flag profile-accurate-for-symsinlist to handle
profile symbol list.

Currently many existing users using profile-sample-accurate want to reduce
code size as much as possible. Their use cases are different from the scenario
profile symbol list tries to handle -- the major motivation of adding profile
symbol list is to get the major memory/code size saving without introduce
performance regression. So to keep the behavior of profile-sample-accurate
unchanged, we think decoupling these two things and using a new flag to
control the handling of profile symbol list may be better.

When profile-sample-accurate and the new flag profile-accurate-for-symsinlist
are both present, since profile-sample-accurate is a user assertion we let it
have a higher precedence.

Differential Revision: https://reviews.llvm.org/D68047

llvm-svn: 373133
2019-09-27 22:33:59 +00:00
Roman Lebedev 269f1bea0d [InstCombine] Simplify shift-by-sext to shift-by-zext
Summary:
This is valid for any `sext` bitwidth pair:
```
Processing /tmp/opt.ll..

----------------------------------------
  %signed = sext %y
  %r = shl %x, %signed
  ret %r
=>
  %unsigned = zext %y
  %r = shl %x, %unsigned
  ret %r
  %signed = sext %y

Done: 2016
Optimization is correct!
```

(This isn't so for funnel shifts, there it's illegal for e.g. i6->i7.)

Main motivation is the C++ semantics:
```
int shl(int a, char b) {
    return a << b;
}
```
ends as
```
  %3 = sext i8 %1 to i32
  %4 = shl i32 %0, %3
```
https://godbolt.org/z/0jgqUq
which is, as this shows, too pessimistic.

There is another problem here - we can only do the fold
if sext is one-use. But we can trivially have cases
where several shifts have the same sext shift amount.
This should be resolved, later.

Reviewers: spatel, nikic, RKSimon

Reviewed By: spatel

Subscribers: efriedma, hiraditya, nlopes, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68103

llvm-svn: 373106
2019-09-27 18:12:15 +00:00
Simon Pilgrim 2e0de86808 ModuleUtils - silence static analyzer dyn_cast<> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<> directly and if not assert will fire for us.

llvm-svn: 373099
2019-09-27 16:55:49 +00:00
Simon Pilgrim f71f23d14d FunctionImportGlobalProcessing::processGlobalForThinLTO - silence static analyzer dyn_cast<FunctionSummary> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<FunctionSummary> directly and if not assert will fire for us.

llvm-svn: 373097
2019-09-27 15:49:19 +00:00
Simon Pilgrim 623b0e6963 SCCP - silence static analyzer dyn_cast<StructType> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<StructType> directly and if not assert will fire for us.

llvm-svn: 373095
2019-09-27 15:49:10 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
Guillaume Chatelet d886f391af [Alignment][NFC] MaybeAlign in GVNExpression
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67922

llvm-svn: 373054
2019-09-27 08:56:43 +00:00
Peter Collingbourne c336557f02 hwasan: Compatibility fixes for short granules.
We can't use short granules with stack instrumentation when targeting older
API levels because the rest of the system won't understand the short granule
tags stored in shadow memory.

Moreover, we need to be able to let old binaries (which won't understand
short granule tags) run on a new system that supports short granule
tags. Such binaries will call the __hwasan_tag_mismatch function when their
outlined checks fail. We can compensate for the binary's lack of support
for short granules by implementing the short granule part of the check in
the __hwasan_tag_mismatch function. Unfortunately we can't do anything about
inline checks, but I don't believe that we can generate these by default on
aarch64, nor did we do so when the ABI was fixed.

A new function, __hwasan_tag_mismatch_v2, is introduced that lets code
targeting the new runtime avoid redoing the short granule check. Because tag
mismatches are rare this isn't important from a performance perspective; the
main benefit is that it introduces a symbol dependency that prevents binaries
targeting the new runtime from running on older (i.e. incompatible) runtimes.

Differential Revision: https://reviews.llvm.org/D68059

llvm-svn: 373035
2019-09-27 01:02:10 +00:00
Jordan Rupprecht f98d2c099a Revert [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
This reverts r372626 (git commit 6a278d9073)

llvm-svn: 373019
2019-09-26 22:09:17 +00:00
Kit Barton 50bc610460 [LoopFusion] Add ability to fuse guarded loops
Summary:
This patch extends the current capabilities in loop fusion to fuse guarded loops
(as defined in https://reviews.llvm.org/D63885). The patch adds the necessary
safety checks to ensure that it safe to fuse the guarded loops (control flow
equivalent, no intervening code, and same guard conditions). It also provides an
alternative method to perform the actual fusion of guarded loops. The mechanics
to fuse guarded loops are slightly different then fusing non-guarded loops, so I
opted to keep them separate methods. I will be cleaning this up in later
patches, and hope to converge on a single method to fuse both guarded and
non-guarded loops, but for now I think the review will be easier to keep them
separate.

Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65464

llvm-svn: 373018
2019-09-26 21:42:45 +00:00
Zhaoshi Zheng 1128fa0924 [Unroll] Do NOT unroll a loop with small runtime upperbound
For a runtime loop if we can compute its trip count upperbound:

Don't unroll if:
1. loop is not guaranteed to run either zero or upperbound iterations; and
2. trip count upperbound is less than UnrollMaxUpperBound
Unless user or TTI asked to do so.

If unrolling, limit unroll factor to loop's trip count upperbound.

Differential Revision: https://reviews.llvm.org/D62989

Change-Id: I6083c46a9d98b2e22cd855e60523fdc5a4929c73
llvm-svn: 373017
2019-09-26 21:40:27 +00:00
Craig Topper 46721bb7f5 [InstCombine] Use m_Zero instead of isNullValue() when checking if a GEP index is all zeroes to prevent an infinite loop.
The test case here previously infinite looped. Only one element from the GEP is used so SimplifyDemandedVectorElts would replace the other lanes in each index with undef leading to the first index being <0, undef, undef, undef>. But there's a GEP transform that tries to replace an index into a 0 sized type with a zero index. But the zero index check only works on ConstantInt 0 or ConstantAggregateZero so it would turn the index back to zeroinitializer. Resulting in a loop.

The fix is to use m_Zero() to allow a vector of zeroes and undefs.

Differential Revision: https://reviews.llvm.org/D67977

llvm-svn: 373000
2019-09-26 17:20:50 +00:00
Jakub Kuderski d98cb81cd1 Handle successor's PHI node correctly when flattening CFG merges two if-regions
Summary:
FlattenCFG merges two 'if' basicblocks by inserting one basicblock
to another basicblock. The inserted basicblock can have a successor
that contains a PHI node whoes incoming basicblock is the inserted
basicblock. Since the existing code does not handle it, it becomes
a badref.

if (cond1)
  statement
if (cond2)
  statement
successor - contains PHI node whose predecessor is cond2

-->
if (cond1 || cond2)
  statement
(BB for cond2 was deleted)
successor - contains PHI node whose predecessor is cond2 --> bad ref!

Author: Jaebaek Seo

Reviewers: asbirlea, kuhar, tstellar, chandlerc, davide, dexonsmith

Reviewed By: kuhar

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68032

llvm-svn: 372989
2019-09-26 15:20:17 +00:00
Simon Pilgrim c15cd009ac [FlattenCFG] Silence static analyzer dyn_cast<BranchInst> null dereference warnings. NFCI.
The static analyzer is warning about a potential null dereferences, but we should be able to use cast<BranchInst> directly and if not assert will fire for us.

llvm-svn: 372977
2019-09-26 13:33:15 +00:00
Bjorn Pettersson 163c54d288 [InstCombine] Don't assume CmpInst has been visited in getFlippedStrictnessPredicateAndConstant
Summary:
Removing an assumption (assert) that the CmpInst already has been
simplified in getFlippedStrictnessPredicateAndConstant. Solution is
to simply bail out instead of hitting the assertion. Instead we
assume that any profitable rewrite will happen in the next iteration
of InstCombine.

The reason why we can't assume that the CmpInst already has been
simplified is that the worklist does not guarantee such an ordering.

Solves https://bugs.llvm.org/show_bug.cgi?id=43376

Reviewers: spatel, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68022

llvm-svn: 372972
2019-09-26 12:16:01 +00:00
Simon Pilgrim 6b794dfd3d MemorySanitizer - silence static analyzer dyn_cast<> null dereference warnings. NFCI.
The static analyzer is warning about a potential null dereferences, but we should be able to use cast<> directly and if not assert will fire for us.

llvm-svn: 372960
2019-09-26 10:56:14 +00:00
Simon Pilgrim faa5b39e4e PGOMemOPSizeOpt - silence static analyzer dyn_cast<MemIntrinsic> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<MemIntrinsic> directly and if not assert will fire for us.

llvm-svn: 372959
2019-09-26 10:56:07 +00:00
Roman Lebedev a2fa03af3a [InstCombine] foldUnsignedUnderflowCheck(): one last pattern with 'sub' (PR43251)
https://rise4fun.com/Alive/0j9

llvm-svn: 372930
2019-09-25 22:59:59 +00:00
Eli Friedman 69dddfe268 [LICM] Don't verify domtree/loopinfo unless EXPENSIVE_CHECKS is enabled.
For large functions, verifying the whole function after each loop takes
non-linear time.

Differential Revision: https://reviews.llvm.org/D67571

llvm-svn: 372924
2019-09-25 22:35:47 +00:00
Roman Lebedev 23646952e2 [InstCombine] Fold (A - B) u>=/u< A --> B u>/u<= A iff B != 0
https://rise4fun.com/Alive/KtL

This also shows that the fold added in D67412 / r372257
was too specific, and the new fold allows those test cases
to be handled more generically, therefore i delete now-dead code.

This is yet again motivated by
D67122 "[UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour"

llvm-svn: 372912
2019-09-25 19:06:40 +00:00
Florian Hahn f3ab99dcf8 [InstCombine] Limit FMul constant folding for fma simplifications.
As @reames pointed out post-commit, rL371518 adds additional rounding
in some cases, when doing constant folding of the multiplication.
This breaks a guarantee llvm.fma makes and must be avoided.

This patch reapplies rL371518, but splits off the simplifications not
requiring rounding from SimplifFMulInst as SimplifyFMAFMul.

Reviewers: spatel, lebedev.ri, reames, scanon

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D67434

llvm-svn: 372899
2019-09-25 17:03:20 +00:00
Florian Hahn 5c3bc3c930 [PatternMatch] Make m_Br more flexible, add matchers for BB values.
Currently m_Br only takes references to BasicBlock*, which limits its
flexibility. For example, you have to declare a variable, even if you
ignore the result or you have to have additional checks to make sure the
matched BB matches an expected one.

This patch adds m_BasicBlock and m_SpecificBB matchers, which can be
used like the existing matchers for constants or values.

I also had a look at the existing uses and updated a few. IMO it makes
the code a bit more explicit.

Reviewers: spatel, craig.topper, RKSimon, majnemer, lebedev.ri

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D68013

llvm-svn: 372885
2019-09-25 15:05:08 +00:00
Philip Reames d9629b88ff [GCRelocate] Add a peephole to canonicalize base pointer relocation
If we generate the gc.relocate, and then later prove two arguments to the statepoint are equivalent, we should canonicalize the gc.relocate to the form we would have produced if this had been known before rewriting.

llvm-svn: 372771
2019-09-24 17:24:16 +00:00
Roman Lebedev 45fd1e9d50 [InstCombine] (a+b) < a && (a+b) != 0 -> (0-b) < a iff a/b != 0 (PR43259)
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.

For
```
#include <cassert>
char* test(char& base, signed long offset) {
  __builtin_assume(offset < 0);
  return &base + offset;
}
```
We produce

https://godbolt.org/z/r40U47

and again those two icmp's can be merged:
```
Name: 0
Pre: C != 0
  %adjusted = add i8 %base, C
  %not_null = icmp ne i8 %adjusted, 0
  %no_underflow = icmp ult i8 %adjusted, %base
  %r = and i1 %not_null, %no_underflow
=>
  %neg_offset = sub i8 0, C
  %r = icmp ugt i8 %base, %neg_offset
```
https://rise4fun.com/Alive/ALap
https://rise4fun.com/Alive/slnN

There are 3 other variants of this pattern,
i believe they all will go into InstSimplify.

https://bugs.llvm.org/show_bug.cgi?id=43259

Reviewers: spatel, xbolva00, nikic

Reviewed By: spatel

Subscribers: efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67849

llvm-svn: 372768
2019-09-24 16:10:50 +00:00
Roman Lebedev 5b881f356c [InstCombine] (a+b) <= a && (a+b) != 0 -> (0-b) < a (PR43259)
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.

This pattern isn't exactly what we get there
(strict vs. non-strict predicate), but this pattern does not
require known-bits analysis, so it is best to handle it first.

```
Name: 0
  %adjusted = add i8 %base, %offset
  %not_null = icmp ne i8 %adjusted, 0
  %no_underflow = icmp ule i8 %adjusted, %base
  %r = and i1 %not_null, %no_underflow
=>
  %neg_offset = sub i8 0, %offset
  %r = icmp ugt i8 %base, %neg_offset
```
https://rise4fun.com/Alive/knp

There are 3 other variants of this pattern,
they all will go into InstSimplify:
https://rise4fun.com/Alive/bIDZ

https://bugs.llvm.org/show_bug.cgi?id=43259

Reviewers: spatel, xbolva00, nikic

Reviewed By: spatel

Subscribers: hiraditya, majnemer, vsk, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67846

llvm-svn: 372767
2019-09-24 16:10:38 +00:00
Simon Pilgrim 934f18144d LoopVectorize - silence static analyzer dyn_cast<CmpInst> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<CmpInst> directly and if not assert will fire for us.

llvm-svn: 372732
2019-09-24 11:27:38 +00:00
Simon Pilgrim b6d11def37 [SimplifyCFG] FoldTwoEntryPHINode - silence static analyzer null dereference warning. NFCI.
Assert that we've found the DomBlock.

llvm-svn: 372728
2019-09-24 11:17:20 +00:00
Simon Pilgrim 9e8076b219 SimplifyCFG - silence static analyzer dyn_cast<LandingPadInst> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<LandingPadInst> directly and if not assert will fire for us.

llvm-svn: 372727
2019-09-24 11:17:13 +00:00
Simon Pilgrim bc58230e29 SimplifyCFG - silence static analyzer dyn_cast<Instruction> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<Instruction> directly and if not assert will fire for us.

llvm-svn: 372726
2019-09-24 11:17:06 +00:00
Alexey Lapshin 49f3c2b604 [Debuginfo] dbg.value points to undef value after Induction Variable Simplification.
Induction Variable Simplification pass does not update dbg.value intrinsic.

Before:

%add = add nuw nsw i32 %ArgIndex.06, 1
call void @llvm.dbg.value(metadata i32 %add, metadata !17, metadata !DIExpression())

After:

%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
call void @llvm.dbg.value(metadata i64 undef, metadata !17, metadata !DIExpression())

There should be:

%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
call void @llvm.dbg.value(metadata i64 %indvars.iv.next, metadata !17, metadata !DIExpression())

Differential Revision: https://reviews.llvm.org/D67770

llvm-svn: 372703
2019-09-24 08:47:03 +00:00
Sjoerd Meijer 0fcb3afb40 [LV] Forced vectorization with runtime checks and OptForSize
When vectorisation is forced with a pragma, we optimise for min size, and we
need to emit runtime memory checks, then allow this code growth and don't run
in an assert like we currently do.

This is the result of D65197 and D66803, and was a use-case not really
considered before. If this now happens, we emit an optimisation remark warning
about the code-size expansion, which can be avoided by not forcing
vectorisation or possibly source-code modifications.

Differential Revision: https://reviews.llvm.org/D67764

llvm-svn: 372694
2019-09-24 08:03:34 +00:00
Huihui Zhang a4dd98f2e9 [InstCombine] Fold a shifty implementation of clamp-to-allones.
Summary:
Fold
or(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X)
into
X s> Y ? -1 : X

https://rise4fun.com/Alive/d8Ab

clamp255 is a common operator in image processing, can be implemented
in a shifty way "(255 - X) >> 31 | X & 255". Fold shift into select
enables more optimization, e.g., vmin generation for ARM target.

Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon

Reviewed By: lebedev.ri

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67800

llvm-svn: 372678
2019-09-24 00:30:09 +00:00
Huihui Zhang 8952199715 [InstCombine] Fold a shifty implementation of clamp-to-zero.
Summary:
Fold
and(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X)
into
X s> Y ? X : 0

https://rise4fun.com/Alive/lFH

Fold shift into select enables more optimization,
e.g., vmax generation for ARM target.

Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon

Reviewed By: lebedev.ri

Subscribers: xbolva00, andreadb, craig.topper, RKSimon, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67799

llvm-svn: 372676
2019-09-24 00:15:03 +00:00
Saleem Abdulrasool 082f895b1a HotColdSplitting: invalidate the AssumptionCache on split
When a cold path is outlined, the value tracking in the assumption cache may be
invalidated due to the code motion.  We would previously trip an assertion in
subsequent passes (but required the passes to happen in a single run as the
assumption cache is shared across the passes).  Invalidating the cache ensures
that we get the correct information when needed with the legacy pass manager as
well.

llvm-svn: 372667
2019-09-23 22:23:01 +00:00
Wei Mi 22fd88530b [SampleFDO] Treat names in profile as not cold only when profile symbol list
is available

In rL372232, we treated names showing up in profile as not cold when
profile-sample-accurate is enabled. This caused 70k size regression in
Chrome/Android. The patch put a guard and only enable the change when
profile symbol list is available, i.e., keep the old behavior when profile
symbol list is not available.

Differential Revision: https://reviews.llvm.org/D67931

llvm-svn: 372665
2019-09-23 22:11:35 +00:00
Roman Lebedev 23aac95a32 [InstCombine] foldOrOfICmps(): Acquire SimplifyQuery with set CxtI
Extracted from https://reviews.llvm.org/D67849#inline-610377

llvm-svn: 372654
2019-09-23 20:40:47 +00:00
Roman Lebedev 595cfda059 [InstCombine] foldAndOfICmps(): Acquire SimplifyQuery with set CxtI
Extracted from https://reviews.llvm.org/D67849#inline-610377

llvm-svn: 372653
2019-09-23 20:40:40 +00:00
David Bolvansky 48db0272d6 [InstCombine] Annotate strndup calls with dereferenceable_or_null
"Implementations are free to malloc() a buffer containing either (size + 1) bytes or (strnlen(s, size) + 1) bytes. Applications should not assume that strndup() will allocate (size + 1) bytes when strlen(s) is smaller than size."

llvm-svn: 372647
2019-09-23 19:55:45 +00:00
Roman Lebedev 47e1ce4abe [IR] Add getExtendedType() to IntegerType and Type (dispatching to IntegerType or VectorType)
llvm-svn: 372638
2019-09-23 18:21:33 +00:00
Roman Lebedev 1972327d63 [InstCombine] dropRedundantMaskingOfLeftShiftInput(): improve comment
llvm-svn: 372637
2019-09-23 18:21:14 +00:00
David Bolvansky 8d52016155 [SLC] Convert some strndup calls to strdup calls
Summary:
Motivation:
- If we can fold it to strdup, we should (strndup does more things than strdup).
- Annotation mechanism. (Works for strdup well).

strdup and strndup are part of C 20 (currently posix fns), so we should optimize them.

Reviewers: efriedma, jdoerfert

Reviewed By: jdoerfert

Subscribers: lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67679

llvm-svn: 372636
2019-09-23 18:20:01 +00:00
Roman Lebedev 0a51e1f66d [InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. c/d/e with mask (PR42563)
Summary:
If we have a pattern `(x & (-1 >> maskNbits)) << shiftNbits`,
we already know (have a fold) that will drop the `& (-1 >> maskNbits)`
mask iff `(shiftNbits-maskNbits) s>= 0` (i.e. `shiftNbits u>= maskNbits`).

So even if `(shiftNbits-maskNbits) s< 0`, we can still
fold, we will just need to apply a **constant** mask afterwards:
```
Name: c, normal+mask
  %t0 = lshr i32 -1, C1
  %t1 = and i32 %t0, %x
  %r = shl i32 %t1, C2
=>
  %n0 = shl i32 %x, C2
  %n1 = i32 ((-(C2-C1))+32)
  %n2 = zext i32 %n1 to i64
  %n3 = lshr i64 -1, %n2
  %n4 = trunc i64 %n3 to i32
  %r = and i32 %n0, %n4
```
https://rise4fun.com/Alive/gslRa

Naturally, old `%masked` will have to be one-use.
This is not valid for pattern f - where "masking" is done via `ashr`.

https://bugs.llvm.org/show_bug.cgi?id=42563

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67725

llvm-svn: 372630
2019-09-23 17:04:28 +00:00
Roman Lebedev b4a1d8a84c [InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. a/b with mask (PR42563)
Summary:
And this is **finally** the interesting part of that fold!

If we have a pattern `(x & (~(-1 << maskNbits))) << shiftNbits`,
we already know (have a fold) that will drop the `& (~(-1 << maskNbits))`
mask iff `(maskNbits+shiftNbits) u>= bitwidth(x)`.
But that is actually ignorant, there's more general fold here:

In this pattern, `(maskNbits+shiftNbits)` actually correlates
with the number of low bits that will remain in the final value.
So even if `(maskNbits+shiftNbits) u< bitwidth(x)`, we can still
fold, we will just need to apply a **constant** mask afterwards:
```
Name: a, normal+mask
  %onebit = shl i32 -1, C1
  %mask = xor i32 %onebit, -1
  %masked = and i32 %mask, %x
  %r = shl i32 %masked, C2
=>
  %n0 = shl i32 %x, C2
  %n1 = add i32 C1, C2
  %n2 = zext i32 %n1 to i64
  %n3 = shl i64 -1, %n2
  %n4 = xor i64 %n3, -1
  %n5 = trunc i64 %n4 to i32
  %r = and i32 %n0, %n5
```
https://rise4fun.com/Alive/F5R

Naturally, old `%masked` will have to be one-use.
Similar fold exists for patterns c,d,e, will post patch later.

https://bugs.llvm.org/show_bug.cgi?id=42563

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67677

llvm-svn: 372629
2019-09-23 17:04:14 +00:00
Alexey Bataev 6a278d9073 [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
Summary:
Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.

Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel

Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D29641

llvm-svn: 372626
2019-09-23 16:25:03 +00:00
Roman Lebedev 01ac23ca62 [InstCombine] foldUnsignedUnderflowCheck(): s/Subtracted/ZeroCmpOp/
llvm-svn: 372625
2019-09-23 16:04:32 +00:00
David Bolvansky d90fd41f7e [FunctionAttrs] Enable nonnull arg propagation
Enable flag introduced in rL294998. Security concerns are no longer valid, since function signatures for mentioned libc functions has no nonnull attribute (Clang does not generate them? I see no nonnull attr in LLVM IR for these functions) and since rL372091 we carefully annotate the callsites where we know that size is static, non zero. So let's enable this flag again..

llvm-svn: 372573
2019-09-23 09:58:02 +00:00
Simon Pilgrim 2441455bc8 [LSR] Silence static analyzer null dereference warnings with assertions. NFCI.
Add assertions to make it clear that GenerateIVChain / NarrowSearchSpaceByPickingWinnerRegs should succeed in finding non-null values

llvm-svn: 372518
2019-09-22 17:59:24 +00:00
Simon Pilgrim db05a482bc ConstantHoisting - Silence static analyzer dyn_cast<PointerType> null dereference warning. NFCI.
llvm-svn: 372517
2019-09-22 17:45:05 +00:00
Sanjay Patel eb8d39e113 [InstCombine] allow icmp+binop folds before min/max bailout (PR43310)
This has the potential to uncover missed analysis/folds as shown in the
min/max code comment/test, but fewer restrictions on icmp folds should
be better in general to solve cases like:
https://bugs.llvm.org/show_bug.cgi?id=43310

llvm-svn: 372510
2019-09-22 14:31:53 +00:00
Simon Pilgrim a56bd6c51e [VPlan] Silence static analyzer dyn_cast null dereference warning. NFCI.
llvm-svn: 372502
2019-09-22 13:02:00 +00:00
Suyog Sarda cd629ea0a8 SROA: Check Total Bits of vector type
While Promoting alloca instruction of Vector Type, 
Check total size in bits of its slices too.
If they don't match, don't promote the alloca instruction.

Bug : https://bugs.llvm.org/show_bug.cgi?id=42585

llvm-svn: 372480
2019-09-21 18:16:37 +00:00
Suyog Sarda c62136e674 Test mail. NFC.
Testing commit acces. NFC.

llvm-svn: 372479
2019-09-21 18:03:30 +00:00
Hideto Ueno 63f6066b53 [Attributor] Implement "norecurse" function attribute deduction
Summary:
This patch introduces `norecurse` function attribute deduction.

`norecurse` will be deduced if the following conditions hold:
* The size of SCC in which the function belongs equals to 1.
* The function doesn't have self-recursion.
* We have `norecurse` for all call site.

To avoid a large change, SCC is calculated using scc_iterator in InfoCache initialization for now.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67751

llvm-svn: 372475
2019-09-21 15:13:19 +00:00
Simon Pilgrim 2e0c95edfe [AddressSanitizer] Don't dereference dyn_cast<ConstantInt> results. NFCI.
The static analyzer is warning about potential null dereference, but we can use cast<ConstantInt> directly and if not assert will fire for us.

llvm-svn: 372429
2019-09-20 20:52:21 +00:00
Akira Hatanaka 75fbb171c3 [ObjC][ARC] Skip debug instructions when computing the insert point of
objc_release calls

This fixes a bug where the presence of debug instructions would cause
ARC optimizer to change the order of retain and release calls.

rdar://problem/55319419

llvm-svn: 372352
2019-09-19 20:58:51 +00:00
Jakub Kuderski e6b2164723 Don't use invalidated iterators in FlattenCFGPass
Summary:
FlattenCFG may erase unnecessary blocks, which also invalidates iterators to those erased blocks.
Before this patch, `iterativelyFlattenCFG` could try to increment a BB iterator after that BB has been removed and crash.

This patch makes FlattenCFGPass use `WeakVH` to skip over erased blocks.

Reviewers: dblaikie, tstellar, davide, sanjoy, asbirlea, grosser

Reviewed By: asbirlea

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67672

llvm-svn: 372347
2019-09-19 19:39:42 +00:00
Roman Lebedev 7a67ed5795 [InstCombine] Simplify @llvm.usub.with.overflow+non-zero check (PR43251)
Summary:
This is again motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.

In this particular case, given
```
char* test(char& base, unsigned long offset) {
  return &base - offset;
}
```
it will end up producing something like
https://godbolt.org/z/luGEju
which after optimizations reduces down to roughly
```
declare void @use64(i64)
define i1 @test(i8* dereferenceable(1) %base, i64 %offset) {
  %base_int = ptrtoint i8* %base to i64
  %adjusted = sub i64 %base_int, %offset
  call void @use64(i64 %adjusted)
  %not_null = icmp ne i64 %adjusted, 0
  %no_underflow = icmp ule i64 %adjusted, %base_int
  %no_underflow_and_not_null = and i1 %not_null, %no_underflow
  ret i1 %no_underflow_and_not_null
}
```
Without D67122 there was no `%not_null`,
and in this particular case we can "get rid of it", by merging two checks:
Here we are checking: `Base u>= Offset && (Base u- Offset) != 0`, but that is simply `Base u> Offset`

Alive proofs:
https://rise4fun.com/Alive/QOs

The `@llvm.usub.with.overflow` pattern itself is not handled here
because this is the main pattern, that we currently consider canonical.

https://bugs.llvm.org/show_bug.cgi?id=43251

Reviewers: spatel, nikic, xbolva00, majnemer

Reviewed By: xbolva00, majnemer

Subscribers: vsk, majnemer, xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67356

llvm-svn: 372341
2019-09-19 17:25:19 +00:00
Sanjay Patel 13e71ce693 [Float2Int] avoid crashing on unreachable code (PR38502)
In the example from:
https://bugs.llvm.org/show_bug.cgi?id=38502
...we hit infinite looping/crashing because we have non-standard IR -
an instruction operand is used before defined.
This and other unusual constructs are allowed in unreachable blocks,
so avoid the problem by using DominatorTree to step around landmines.

Differential Revision: https://reviews.llvm.org/D67766

llvm-svn: 372339
2019-09-19 16:31:17 +00:00
Serguei Katkov a44768858c [Unroll] Add an option to control complete unrolling
Add an ability to specify the max full unroll count for LoopUnrollPass pass
in pass options.

Reviewers: fhahn, fedor.sergeev
Reviewed By: fedor.sergeev
Subscribers: hiraditya, zzheng, dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D67701

llvm-svn: 372305
2019-09-19 06:57:29 +00:00
Roman Lebedev feea722cf3 [SimplifyCFG] mergeConditionalStoreToAddress(): try to pacify MSAN
MSAN bot complains that there is use-of-uninitialized-value
of this FreeStores later in IsWorthwhile().
Perhaps FreeStores needs to be stored in a vector?

llvm-svn: 372262
2019-09-18 21:04:39 +00:00
Roman Lebedev b646dd92c2 [InstCombine] foldUnsignedUnderflowCheck(): handle last few cases (PR43251)
Summary:
I don't have a direct motivational case for this,
but it would be good to have this for completeness/symmetry.

This pattern is basically the motivational pattern from
https://bugs.llvm.org/show_bug.cgi?id=43251
but with different predicate that requires that the offset is non-zero.

The completeness bit comes from the fact that a similar pattern (offset != zero)
will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259,
so it'd seem to be good to not overlook very similar patterns..

Proofs: https://rise4fun.com/Alive/21b

Also, there is something odd with `isKnownNonZero()`, if the non-zero
knowledge was specified as an assumption, it didn't pick it up (PR43267)

With this, i see no other missing folds for
https://bugs.llvm.org/show_bug.cgi?id=43251

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67412

llvm-svn: 372257
2019-09-18 20:10:07 +00:00
Roman Lebedev dd0170ab24 [SimplifyCFG] mergeConditionalStoreToAddress(): consider cost, not instruction count
Summary:
As it can be see in the changed test, while `div` is really costly,
we were speculating it. This does not seem correct.

Also, the old code would run for every single insturuction in BB,
instead of eagerly bailing out as soon as there are too many instructions.

This function still has a problem that `PHINodeFoldingThreshold` is
per-basic-block, while it should be for all the basic blocks.

Reviewers: efriedma, craig.topper, dmgreen, jmolloy

Reviewed By: jmolloy

Subscribers: xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67315

llvm-svn: 372255
2019-09-18 19:46:57 +00:00
Roman Lebedev ba4cad9039 [InstCombine] dropRedundantMaskingOfLeftShiftInput(): some cleanup before upcoming patch
llvm-svn: 372245
2019-09-18 18:38:40 +00:00
Wei Mi 5f7e822dc7 [SampleFDO] Minimize performance impact when profile-sample-accurate
is enabled.

We can save memory and reduce binary size significantly by enabling
ProfileSampleAccurate. However when ProfileSampleAccurate is true,
function without sample will be regarded as cold and this could
potentially cause performance regression.

To minimize the potential negative performance impact, we want to be
a little conservative here saying if a function shows up in the profile,
no matter as outline instance, inline instance or call targets, treat
the function as not being cold. This will handle the cases such as most
callsites of a function are inlined in sampled binary (thus outline copy
don't get any sample) but not inlined in current build (because of source
code drift, imprecise debug information, or the callsites are all cold
individually but not cold accumulatively...), so that the outline function
showing up as cold in sampled binary will actually not be cold after current
build. After the change, such function will be treated as not cold even
profile-sample-accurate is enabled.

At the same time we lower the hot criteria of callsiteIsHot check when
profile-sample-accurate is enabled. callsiteIsHot is used to determined
whether a callsite is hot and qualified for early inlining. When
profile-sample-accurate is enabled, functions without profile will be
regarded as cold and much less inlining will happen in CGSCC inlining pass,
so we can worry less about size increase and be aggressive to allow more
early inlining to happen for warm callsites and it is helpful for performance
overall.

Differential Revision: https://reviews.llvm.org/D67561

llvm-svn: 372232
2019-09-18 16:06:28 +00:00
Sanjay Patel d46bf63fbb [SimplifyLibCalls] fix crash with empty function name (PR43347)
...and improve some variable names while here.

https://bugs.llvm.org/show_bug.cgi?id=43347

llvm-svn: 372227
2019-09-18 14:33:40 +00:00
Teresa Johnson fd2044f299 [PGO] Change hardcoded thresholds for cold/inlinehint to use summary
Summary:
The PGO counter reading will add cold and inlinehint (hot) attributes
to functions that are very cold or hot. This was using hardcoded
thresholds, instead of the profile summary cutoffs which are used in
other hot/cold detection and are more dynamic and adaptable. Switch
to using the summary-based cold/hot detection.

The hardcoded limits were causing some code that had a medium level of
hotness (per the summary) to be incorrectly marked with a cold
attribute, blocking inlining.

Reviewers: davidxl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67673

llvm-svn: 372189
2019-09-17 23:12:13 +00:00
Reid Kleckner 23e872a3d0 [PGO] Don't use comdat groups for counters & data on COFF
For COFF, a comdat group is really a symbol marked
IMAGE_COMDAT_SELECT_ANY and zero or more other symbols marked
IMAGE_COMDAT_SELECT_ASSOCIATIVE. Typically the associative symbols in
the group are not external and are not referenced by other TUs, they are
things like debug info, C++ dynamic initializers, or other section
registration schemes. The Visual C++ linker reports a duplicate symbol
error for symbols marked IMAGE_COMDAT_SELECT_ASSOCIATIVE even if they
would be discarded after handling the leader symbol.

Fixes coverage-inline.cpp in check-profile after r372020.

llvm-svn: 372182
2019-09-17 21:10:49 +00:00
Roman Lebedev 97bc5ae993 [NFC][InstCombine] dropRedundantMaskingOfLeftShiftInput(): some NFC diff shaving
llvm-svn: 372171
2019-09-17 19:32:26 +00:00
David Bolvansky 0c0de794f1 Reland "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)"
llvm-svn: 372142
2019-09-17 17:12:24 +00:00
Krasimir Georgiev bdff164e0e Revert "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)"
Summary:
This reverts commit r372101.

Causes ASAN build bot failures:

http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176
From http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176/steps/64-bit%20check-asan/logs/stdio:

```
[ RUN      ] AddressSanitizer.StrNCatOOBTest
/home/buildbots/ppc64be-sanitizer/sanitizer-ppc64be/build/llvm-project/compiler-rt/lib/asan/tests/asan_str_test.cpp:462: Failure
Death test: strncat(to - 1, from, 0)
    Result: failed to die.
```

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67658

llvm-svn: 372125
2019-09-17 14:15:23 +00:00
Simon Pilgrim a2719f38c1 [LoopVectorize] Don't dereference a dyn_cast result. NFCI.
The static analyzer is warning about potential null dereferences of dyn_cast<> results, we can use cast<> directly as we know that these cases should all be CastInst, which is why its working atm and anyway cast<> will assert if they aren't.

llvm-svn: 372116
2019-09-17 13:24:54 +00:00
Benjamin Kramer df4b9a3f4f Hide implementation details in namespaces.
llvm-svn: 372113
2019-09-17 12:56:29 +00:00
Johannes Doerfert 3ab9e8b818 [Attributor][Fix] Initialize the cache prior to using it
Summary:
There were segfaults as we modified and iterated the instruction maps in
the cache at the same time. This was happening because we created new
instructions while we populated the cache. This fix changes the order
in which we perform these actions. First, the caches for the whole
module are created, then we start to create abstract attributes.

I don't have a unit test but the LLVM test suite exposes this problem.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67232

llvm-svn: 372105
2019-09-17 10:52:41 +00:00
David Bolvansky ded48e93e6 [SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)
llvm-svn: 372101
2019-09-17 10:25:38 +00:00
David Bolvansky be2487a2ba [InstCombine] Annotate strdup with deref_or_null
llvm-svn: 372098
2019-09-17 10:12:48 +00:00
David Bolvansky 3a3dddd9d7 [NFCI] Fixed buildbots
llvm-svn: 372097
2019-09-17 10:03:45 +00:00
Fangrui Song 8351763709 [SimplifyLibCalls] Fix -Wunused-result after D53342/r372091
llvm-svn: 372096
2019-09-17 09:56:55 +00:00
David Bolvansky e80fcf0340 [SimplifyLibCalls] Mark known arguments with nonnull
Reviewers: efriedma, jdoerfert

Reviewed By: jdoerfert

Subscribers: ychen, rsmith, joerg, aaron.ballman, lebedev.ri, uenoku, jdoerfert, hfinkel, javed.absar, spatel, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D53342

llvm-svn: 372091
2019-09-17 09:32:52 +00:00
Florian Hahn 1bd58870e5 [LoopUnroll] Use LoopSize+1 as threshold, to allow unrolling loops matching LoopSize.
We use `< UP.Threshold` later on, so we should use LoopSize + 1, to
allow unrolling if the result won't exceed to loop size.

Fixes PR43305.

Reviewers: efriedma, dmgreen, paquette

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D67594

llvm-svn: 372084
2019-09-17 09:02:48 +00:00
Hideto Ueno 30d86f1858 [Attributor] Use Alias Analysis in noalias callsite argument deduction
Summary: This patch adds a check of alias analysis in `noalias` callsite argument deduction.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67604

llvm-svn: 372075
2019-09-17 06:53:27 +00:00
Hideto Ueno 3bb5cbc20b [Attributor] Create helper struct for handling analysis getters
Summary: This patch introduces a helper struct `AnalysisGetter` to put together analysis getters. In this patch, a getter for `AAResult` is also added for  `noalias`.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67603

llvm-svn: 372072
2019-09-17 05:45:18 +00:00
Reid Kleckner 32837a0c93 [PGO] Use linkonce_odr linkage for __profd_ variables in comdat groups
This fixes relocations against __profd_ symbols in discarded sections,
which is PR41380.

In general, instrumentation happens very early, and optimization and
inlining happens afterwards. The counters for a function are calculated
early, and after inlining, counters for an inlined function may be
widely referenced by other functions.

For C++ inline functions of all kinds (linkonce_odr &
available_externally mainly), instr profiling wants to deduplicate these
__profc_ and __profd_ globals. Otherwise the binary would be quite
large.

I made __profd_ and __profc_ comdat in r355044, but I chose to make
__profd_ internal. At the time, I was only dealing with coverage, and in
that case, none of the instrumentation needs to reference __profd_.
However, if you use PGO, then instrumentation passes add calls to
__llvm_profile_instrument_range which reference __profd_ globals. The
solution is to make these globals externally visible by using
linkonce_odr linkage for data as was done for counters.

This is safe because PGO adds a CFG hash to the names of the data and
counter globals, so if different TUs have different globals, they will
get different data and counter arrays.

Reviewers: xur, hans

Differential Revision: https://reviews.llvm.org/D67579

llvm-svn: 372020
2019-09-16 18:49:09 +00:00
Roman Lebedev 10151f6618 [SimplifyCFG] FoldTwoEntryPHINode(): consider *total* speculation cost, not per-BB cost
Summary:
Previously, if the threshold was 2, we were willing to speculatively
execute 2 cheap instructions in both basic blocks (thus we were willing
to speculatively execute cost = 4), but weren't willing to speculate
when one BB had 3 instructions and other one had no instructions,
even thought that would have total cost of 3.

This looks inconsistent to me.
I don't think `cmov`-like instructions will start executing
until both of it's inputs are available: https://godbolt.org/z/zgHePf
So i don't see why the existing behavior is the correct one.

Also, let's add it's own `cl::opt` for this threshold,
with default=4, so it is not stricter than the previous threshold:
will allow to fold when there are 2 BB's each with cost=2.
And since the logic has changed, it will also allow to fold when
one BB has cost=3 and other cost=1, or there is only one BB with cost=4.

This is an alternative solution to D65148:
This fix is mainly motivated by `signbit-like-value-extension.ll` test.
That pattern comes up in JPEG decoding, see e.g.
`Figure F.12 – Extending the sign bit of a decoded value in V`
of `ITU T.81` (JPEG specification).
That branch is not predictable, and it is within the innermost loop,
so the fact that that pattern ends up being stuck with a branch
instead of `select` (i.e. `CMOV` for x86) is unlikely to be beneficial.

This has great results on the final assembly (vanilla test-suite + RawSpeed): (metric pass - D67240)
| metric                                 |     old |     new | delta |      % |
| x86-mi-counting.NumMachineFunctions    |   37720 |   37721 |     1 |  0.00% |
| x86-mi-counting.NumMachineBasicBlocks  |  773545 |  771181 | -2364 | -0.31% |
| x86-mi-counting.NumMachineInstructions | 7488843 | 7486442 | -2401 | -0.03% |
| x86-mi-counting.NumUncondBR            |  135770 |  135543 |  -227 | -0.17% |
| x86-mi-counting.NumCondBR              |  423753 |  422187 | -1566 | -0.37% |
| x86-mi-counting.NumCMOV                |   24815 |   25731 |   916 |  3.69% |
| x86-mi-counting.NumVecBlend            |      17 |      17 |     0 |  0.00% |

We significantly decrease basic block count, notably decrease instruction count,
significantly decrease branch count and very significantly increase `cmov` count.

Performance-wise, unsurprisingly, this has great effect on
target RawSpeed benchmark. I'm seeing 5 **major** improvements:
```
Benchmark                                                                                             Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue                                 0.0000          0.0000      U Test, Repetitions: 49 vs 49
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean                                  -0.3064         -0.3064      226.9913      157.4452      226.9800      157.4384
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median                                -0.3057         -0.3057      226.8407      157.4926      226.8282      157.4828
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev                                -0.4985         -0.4954        0.3051        0.1530        0.3040        0.1534
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 49 vs 49
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean                                   -0.1747         -0.1747       80.4787       66.4227       80.4771       66.4146
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median                                 -0.1742         -0.1743       80.4686       66.4542       80.4690       66.4436
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev                                 +0.6089         +0.5797        0.0670        0.1078        0.0673        0.1062
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue                                 0.0000          0.0000      U Test, Repetitions: 49 vs 49
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean                                  -0.1598         -0.1598      171.6996      144.2575      171.6915      144.2538
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median                                -0.1598         -0.1597      171.7109      144.2755      171.7018      144.2766
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev                                +0.4024         +0.3850        0.0847        0.1187        0.0848        0.1175
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 49 vs 49
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean                                   -0.0550         -0.0551      280.3046      264.8800      280.3017      264.8559
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median                                 -0.0554         -0.0554      280.2628      264.7360      280.2574      264.7297
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev                                 +0.7005         +0.7041        0.2779        0.4725        0.2775        0.4729
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 49 vs 49
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_mean                                   -0.0354         -0.0355      316.7396      305.5208      316.7342      305.4890
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_median                                 -0.0354         -0.0356      316.6969      305.4798      316.6917      305.4324
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_stddev                                 +0.0493         +0.0330        0.3562        0.3737        0.3563        0.3681
```

That being said, it's always best-effort, so there will likely
be cases where this worsens things.

Reviewers: efriedma, craig.topper, dmgreen, jmolloy, fhahn, Carrot, hfinkel, chandlerc

Reviewed By: jmolloy

Subscribers: xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67318

llvm-svn: 372009
2019-09-16 16:18:24 +00:00
Sanjay Patel 3961a143e1 [InstCombine] remove unneeded one-use checks for icmp fold
Related folds were added in:
rL125734
...the code comment about register pressure is discussed in
more detail in:
https://bugs.llvm.org/show_bug.cgi?id=2698

But 10 years later, perf testing bzip2 with this change now
shows a slight (0.2% average) improvement on Haswell although
that's probably within test noise.

Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.

This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.

rL371940 and rL371981 are related patches in this series.

llvm-svn: 372007
2019-09-16 16:15:25 +00:00
Sanjay Patel c5cd808156 [InstCombine] remove unneeded one-use checks for icmp fold
This fold and several others were added in:
rL125734 <https://reviews.llvm.org/rL125734>
...with no explanation for the one-use checks other than the code
comments about register pressure.

Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.

This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.

rL371940 is a related patch in this series.

llvm-svn: 371981
2019-09-16 12:54:34 +00:00
Sanjay Patel 91c2cd0691 [InstCombine] fix comments to match code; NFC
This blob was written before match() existed, so it
could probably be reduced significantly.

But I suspect it isn't well tested, so tests would have
to be added to reduce risk from logic changes.

llvm-svn: 371978
2019-09-16 12:12:05 +00:00
Simon Pilgrim 1aaefbca24 [VPlanSLP] Don't dereference a cast_or_null<VPInstruction> result. NFCI.
The static analyzer is warning about a potential null dereference of the cast_or_null result, I've split the cast_or_null check from the ->getUnderlyingInstr() call to avoid this, but it appears that we weren't seeing any null pointers in the dumped bundles in the first place.

llvm-svn: 371975
2019-09-16 11:22:44 +00:00
Simon Pilgrim bfe6b35c70 [SLPVectorizer] Assert that we find a LastInst to silence analyzer null dereference warning. NFCI.
llvm-svn: 371974
2019-09-16 10:48:16 +00:00
Simon Pilgrim ae625d70cd [SLPVectorizer] Don't dereference a dyn_cast result. NFCI.
The static analyzer is warning about potential null dereferences of dyn_cast<> results - in these cases we can safely use cast<> directly as we know that these cases should all be the correct type, which is why its working atm and anyway cast<> will assert if they aren't.

llvm-svn: 371973
2019-09-16 10:35:09 +00:00
Stefan Stipanovic 431141c5cc [Attributor] Heap-To-Stack Conversion
D53362 gives a prototype heap-to-stack conversion pass. With addition of new attributes in the attributor, this can now be revisted and improved. This will place it in the Attributor to make it easier to use new attributes (eg. nofree, nosync, willreturn, etc.) and other attributor features.

Reviewers: jdoerfert, uenoku, hfinkel, efriedma

Subscribers: lebedev.ri, xbolva00, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D65408

llvm-svn: 371942
2019-09-15 21:47:41 +00:00
Sanjay Patel 3daf168fa9 [InstCombine] remove unneeded one-use checks for icmp fold
This fold and several others were added in:
rL125734
...with no explanation for the one-use checks other than the code
comments about register pressure.

Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.

There are similar checks as noted with the TODO comments. I'm
hoping to remove those restrictions too, but if any of these
does cause a regression, it should be easier to correct by making
small, individual commits.

This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.

llvm-svn: 371940
2019-09-15 20:56:34 +00:00
Simon Pilgrim 4e46ea3946 [LoadStoreVectorizer] vectorizeLoadChain - ensure we find a valid Type down the load chain. NFCI.
Silence static analyzer uninitialized variable warning by setting the LoadTy to null and then asserting we find a real value.

llvm-svn: 371936
2019-09-15 16:44:35 +00:00
Sanjay Patel b6a0faaa0c [SLP] limit vectorization of Constant subclasses (PR33958)
This is a fix for:
https://bugs.llvm.org/show_bug.cgi?id=33958

It seems universally true that we would not want to transform this kind of
sequence on any target, but if that's not correct, then we could view this
as a target-specific cost model problem. We could also white-list ConstantInt,
ConstantFP, etc. rather than blacklist Global and ConstantExpr.

Differential Revision: https://reviews.llvm.org/D67362

llvm-svn: 371931
2019-09-15 13:03:24 +00:00
Johannes Doerfert e7c6f97039 [Attributor][Fix] Use right type to replace expressions
Summary: This should be obsolete once the functionality in D66967 is integrated.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67231

llvm-svn: 371915
2019-09-14 02:57:50 +00:00
Florian Hahn cde8343d85 [BasicBlockUtils] Add optional BBName argument, in line with BB:splitBasicBlock
Reviewers: spatel, asbirlea, craig.topper

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D67521

llvm-svn: 371819
2019-09-13 08:03:32 +00:00
Alina Sbirlea 6943472d45 [MemorySSA] Pass (for update) MSSAU when hoisting instructions.
Summary: Pass MSSAU to makeLoopInvariant in order to properly update MSSA.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, uabelho, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67470

llvm-svn: 371748
2019-09-12 17:12:51 +00:00
Sanjay Patel 2bfb955c51 [InstCombine] rename variable for readability; NFC
There's more that can be done here, but "OpI"
doesn't convey that we casted to BinaryOperator.

llvm-svn: 371682
2019-09-11 22:31:34 +00:00
Eli Friedman 403e08d4cf [ConstantHoisting] Fix non-determinism.
Differential Revision: https://reviews.llvm.org/D66114

llvm-svn: 371644
2019-09-11 18:55:00 +00:00
Petr Hosek 7bdad08429 Reland "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM"
This patch contains the basic functionality for reporting potentially
incorrect usage of __builtin_expect() by comparing the developer's
annotation against a collected PGO profile. A more detailed proposal and
discussion appears on the CFE-dev mailing list
(http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a
prototype of the initial frontend changes appear here in D65300

We revised the work in D65300 by moving the misexpect check into the
LLVM backend, and adding support for IR and sampling based profiles, in
addition to frontend instrumentation.

We add new misexpect metadata tags to those instructions directly
influenced by the llvm.expect intrinsic (branch, switch, and select)
when lowering the intrinsics. The misexpect metadata contains
information about the expected target of the intrinsic so that we can
check against the correct PGO counter when emitting diagnostics, and the
compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight.
We use these branch weight values to determine when to emit the
diagnostic to the user.

A future patch should address the comment at the top of
LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and
UnlikelyBranchWeight values into a shared space that can be accessed
outside of the LowerExpectIntrinsic pass. Once that is done, the
misexpect metadata can be updated to be smaller.

In the long term, it is possible to reconstruct portions of the
misexpect metadata from the existing profile data. However, we have
avoided this to keep the code simple, and because some kind of metadata
tag will be required to identify which branch/switch/select instructions
are influenced by the use of llvm.expect

Patch By: paulkirth
Differential Revision: https://reviews.llvm.org/D66324

llvm-svn: 371635
2019-09-11 16:19:50 +00:00
Florian Hahn 51de22c8ee Revert [InstCombine] Use SimplifyFMulInst to simplify multiply in fma.
This introduces additional rounding error in some cases. See D67434.

This reverts r371518 (git commit 18a1f0818b)

llvm-svn: 371634
2019-09-11 16:17:03 +00:00
Whitney Tsang 1ccba7c1a1 LLVM: Optimization Pass: Remove conflicting attribute, if any, before
adding new read attribute to an argument
Summary: Update optimization pass to prevent adding read-attribute to an
argument without removing its conflicting attribute.

A read attribute, based on the result of the attribute deduction
process, might be added to an argument. The attribute might be in
conflict with other read/write attribute currently associated with the
argument. To ensure the compatibility of attributes, conflicting
attribute, if any, must be removed before a new one is added.

The following snippet shows the current behavior of the compiler, where
the compilation process is aborted due to incompatible attributes.

$ cat x.ll
; ModuleID = 'x.bc'

%_type_of_d-ccc = type <{ i8*, i8, i8, i8, i8 }>

@d-ccc = internal global %_type_of_d-ccc <{ i8* null, i8 1, i8 13, i8 0,
i8 -127 }>, align 8

define void @foo(i32* writeonly %.aaa) {
foo_entry:
  %_param_.aaa = alloca i32*, align 8
  store i32* %.aaa, i32** %_param_.aaa, align 8
  store i8 0, i8* getelementptr inbounds (%_type_of_d-ccc,
%_type_of_d-ccc* @d-ccc, i32 0, i32 3)
  ret void
}

$ opt -O3 x.ll
Attributes 'readnone and writeonly' are incompatible!
void (i32*)* @foo
in function foo
LLVM ERROR: Broken function found, compilation aborted!
The purpose of this changeset is to fix the above error. This fix is
based on a suggestion from Johannes @jdoerfert (many thanks!!!)
Authored By: anhtuyen
Reviewer: nicholas, rnk, chandlerc, jdoerfert
Reviewed By: rnk
Subscribers: hiraditya, jdoerfert, llvm-commits, anhtuyen, LLVM
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D58694

llvm-svn: 371622
2019-09-11 14:26:22 +00:00
Sanjay Patel 80bea345d1 [InstCombine] fold sign-bit compares of srem
(srem X, pow2C) sgt/slt 0 can be reduced using bit hacks by masking
off the sign bit and the module (low) bits:
https://rise4fun.com/Alive/jSO
A '2' divisor allows slightly more folding:
https://rise4fun.com/Alive/tDBM

Any chance to remove an 'srem' use is probably worthwhile, but this is limited
to the one-use improvement case because doing more may expose other missing
folds. That means it does nothing for PR21929 yet:
https://bugs.llvm.org/show_bug.cgi?id=21929

Differential Revision: https://reviews.llvm.org/D67334

llvm-svn: 371610
2019-09-11 12:04:26 +00:00
David Bolvansky 4dae283cd3 [InstCombine] Fixed handling of isOpNewLike (PR11748)
llvm-svn: 371602
2019-09-11 10:37:03 +00:00
Florian Hahn e79381c3f7 [LoopInterchange] Drop unused splitInnerLoopHeader declaration.
llvm-svn: 371601
2019-09-11 10:32:15 +00:00
Dmitri Gribenko 57256af307 Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM"
This reverts commit r371584. It introduced a dependency from compiler-rt
to llvm/include/ADT, which is problematic for multiple reasons.

One is that it is a novel dependency edge, which needs cross-compliation
machinery for llvm/include/ADT (yes, it is true that right now
compiler-rt included only header-only libraries, however, if we allow
compiler-rt to depend on anything from ADT, other libraries will
eventually get used).

Secondly, depending on ADT from compiler-rt exposes ADT symbols from
compiler-rt, which would cause ODR violations when Clang is built with
the profile library.

llvm-svn: 371598
2019-09-11 09:16:17 +00:00
Florian Hahn e4961218fd [LoopInterchange] Properly move condition, induction increment and ops to latch.
Currently we only rely on the induction increment to come before the
condition to ensure the required instructions get moved to the new
latch.

This patch duplicates and moves the required instructions to the
newly created latch. We move the condition to the end of the new block,
then process its operands. We stop at operands that are defined
outside the loop, or are the induction PHI.

We duplicate the instructions and update the uses in the moved
instructions, to ensure other users remain intact. See the added
test2 for such an example.

Reviewers: efriedma, mcrosier

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D67367

llvm-svn: 371595
2019-09-11 08:23:23 +00:00
Hideto Ueno 1d68ed8c24 [Attributor] Implement "noalias" callsite argument deduction
Summary: Now, `nocapture` is deduced in Attributor therefore, this patch introduces deduction for `noalias` callsite argument using `nocapture`.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67286

llvm-svn: 371590
2019-09-11 07:00:33 +00:00
Hideto Ueno 3736764657 [Attributor][Fix] Manifest nocapture only in CSArgument or Argument
Summary:
We can query to Attributor whether the value is captured in the scope or not on the following way:

```
    const auto & NoCapAA = A.getAAFor<AANoCapture>(*this, IRPosition::value(V));
```
And if V is CallSiteReturned then `getDeducedAttribute` will add `nocatpure` to the callsite returned value. It is not valid.
This patch checks the position is an argument or call site argument.

This is tested in D67286.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67342

llvm-svn: 371589
2019-09-11 06:52:11 +00:00
Alexey Lapshin 6b1c6c1287 [Debuginfo][Instcombiner] Do not clone dbg.declare.
TryToSinkInstruction() has a bug: While updating debug info for
sunk instruction, it could clone dbg.declare intrinsic.
That is wrong. There could be only one dbg.declare.
The fix is to not clone dbg.declare intrinsic and to update
it`s arguments, to not to point to sunk instruction.

Differential Revision: https://reviews.llvm.org/D67217

llvm-svn: 371587
2019-09-11 06:07:16 +00:00
Petr Hosek 394a8ed8f1 clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM
This patch contains the basic functionality for reporting potentially
incorrect usage of __builtin_expect() by comparing the developer's
annotation against a collected PGO profile. A more detailed proposal and
discussion appears on the CFE-dev mailing list
(http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a
prototype of the initial frontend changes appear here in D65300

We revised the work in D65300 by moving the misexpect check into the
LLVM backend, and adding support for IR and sampling based profiles, in
addition to frontend instrumentation.

We add new misexpect metadata tags to those instructions directly
influenced by the llvm.expect intrinsic (branch, switch, and select)
when lowering the intrinsics. The misexpect metadata contains
information about the expected target of the intrinsic so that we can
check against the correct PGO counter when emitting diagnostics, and the
compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight.
We use these branch weight values to determine when to emit the
diagnostic to the user.

A future patch should address the comment at the top of
LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and
UnlikelyBranchWeight values into a shared space that can be accessed
outside of the LowerExpectIntrinsic pass. Once that is done, the
misexpect metadata can be updated to be smaller.

In the long term, it is possible to reconstruct portions of the
misexpect metadata from the existing profile data. However, we have
avoided this to keep the code simple, and because some kind of metadata
tag will be required to identify which branch/switch/select instructions
are influenced by the use of llvm.expect

Patch By: paulkirth
Differential Revision: https://reviews.llvm.org/D66324

llvm-svn: 371584
2019-09-11 01:09:16 +00:00
Alina Sbirlea f9cc0393b3 [MemorySSA] MemorySSA should not model debuginfo, and need not update it.
Reverts the change in r371084, but keeps the test.
After r371565, debuginfo cannot be modelled in MemorySSA, even with a
non-standard AA pipeline.

llvm-svn: 371573
2019-09-10 23:36:43 +00:00
Philip Reames cffa630c80 [Loads] Move generic code out of vectorizer into a location it might be reused [NFC]
llvm-svn: 371558
2019-09-10 21:33:53 +00:00
Philip Reames 1e1db80048 [ValueTracking] Factor our common speculation suppression logic [NFC]
Expose a utility function so that all places which want to suppress speculation (when otherwise legal) due to ordering and/or sanitizer interaction can do so.

llvm-svn: 371556
2019-09-10 21:12:29 +00:00
Florian Hahn 18a1f0818b [InstCombine] Use SimplifyFMulInst to simplify multiply in fma.
This allows us to fold fma's that multiply with 0.0. Also, the
multiply by 1.0 case is handled there as well. The fneg/fabs cases
are not handled by SimplifyFMulInst, so we need to keep them.

Reviewers: spatel, anemet, lebedev.ri

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D67351

llvm-svn: 371518
2019-09-10 13:10:28 +00:00
Dmitri Gribenko 2bf8d77453 Revert "Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline.""
This reverts commit r371502, it broke tests
(clang/test/CodeGenCXX/auto-var-init.cpp).

llvm-svn: 371507
2019-09-10 10:39:09 +00:00
Clement Courbet 612c260ec3 Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline."
With a fix for sanitizer breakage (see explanation in D60318).

llvm-svn: 371502
2019-09-10 09:18:00 +00:00
Petr Hosek 7d1757aba8 Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM"
This reverts commit r371484: this broke sanitizer-x86_64-linux-fast bot.

llvm-svn: 371488
2019-09-10 06:25:13 +00:00
Petr Hosek a10802fd73 clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM
This patch contains the basic functionality for reporting potentially
incorrect usage of __builtin_expect() by comparing the developer's
annotation against a collected PGO profile. A more detailed proposal and
discussion appears on the CFE-dev mailing list
(http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a
prototype of the initial frontend changes appear here in D65300

We revised the work in D65300 by moving the misexpect check into the
LLVM backend, and adding support for IR and sampling based profiles, in
addition to frontend instrumentation.

We add new misexpect metadata tags to those instructions directly
influenced by the llvm.expect intrinsic (branch, switch, and select)
when lowering the intrinsics. The misexpect metadata contains
information about the expected target of the intrinsic so that we can
check against the correct PGO counter when emitting diagnostics, and the
compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight.
We use these branch weight values to determine when to emit the
diagnostic to the user.

A future patch should address the comment at the top of
LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and
UnlikelyBranchWeight values into a shared space that can be accessed
outside of the LowerExpectIntrinsic pass. Once that is done, the
misexpect metadata can be updated to be smaller.

In the long term, it is possible to reconstruct portions of the
misexpect metadata from the existing profile data. However, we have
avoided this to keep the code simple, and because some kind of metadata
tag will be required to identify which branch/switch/select instructions
are influenced by the use of llvm.expect

Patch By: paulkirth
Differential Revision: https://reviews.llvm.org/D66324

llvm-svn: 371484
2019-09-10 03:11:39 +00:00
Philip Reames 7403569be7 [LoopVectorize] Leverage speculation safety to avoid masked.loads
If we're vectorizing a load in a predicated block, check to see if the load can be speculated rather than predicated.  This allows us to generate a normal vector load instead of a masked.load.

To do so, we must prove that all bytes accessed on any iteration of the original loop are dereferenceable, and that all loads (across all iterations) are properly aligned.  This is equivelent to proving that hoisting the load into the loop header in the original scalar loop is safe.

Note: There are a couple of code motion todos in the code.  My intention is to wait about a day - to be sure this sticks - and then perform the NFC motion without furthe review.

Differential Revision: https://reviews.llvm.org/D66688

llvm-svn: 371452
2019-09-09 20:54:13 +00:00
Sanjay Patel aff5bee35f [InstCombine] fold extract+insert into identity shuffle
This is similar to the existing fold for splats added with:
rL365379

If we can adjust the shuffle mask to include another element
in an identity mask (if it changes vector length, that's an
extract/insert subvector operation in the backend), then that
can eliminate extractelement/insertelement pairs in IR.

All targets are expected to lower shuffles with identity masks
efficiently.

llvm-svn: 371340
2019-09-08 19:03:01 +00:00
Simon Pilgrim 879ed20bde Fix typo. NFCI
llvm-svn: 371317
2019-09-07 18:09:09 +00:00
Roman Lebedev 45ba26599b [SimplifyCFG] SpeculativelyExecuteBB(): It's SpeculatedInstructions, not SpeculationCost
It counts the number of instructions we are ok speculating
(at most 1 there), not their cost, so rename accordingly.

llvm-svn: 371294
2019-09-07 09:06:06 +00:00
Hideto Ueno f2b9dc4758 [Attributor] ValueSimplify Abstract Attribute
Summary:
This patch introduces initial `AAValueSimplify` which simplifies a value in a context.

example
- (for function returned) If all the return values are the same and constant, then we can replace callsite returned with the constant.
- If an internal function takes the same value(constant) as an argument in the callsite, then we can replace the argument with that constant.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66967

llvm-svn: 371291
2019-09-07 07:03:05 +00:00
Teresa Johnson 9c27b59cec Change TargetLibraryInfo analysis passes to always require Function
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.

This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.

Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.

There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.

Reviewers: chandlerc, hfinkel

Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66428

llvm-svn: 371284
2019-09-07 03:09:36 +00:00
Evandro Menezes 7d677adf2d [InstCombine] Refactor substitution of instruction in the parent BB (NFC)
Add the new method `LibCallSimplifier::substituteInParent()` that calls
`LibCallSimplifier::replaceAllUsesWith()' and
`LibCallSimplifier::eraseFromParent()` back to back, simplifying the
resulting code.

llvm-svn: 371264
2019-09-06 22:07:11 +00:00
Sanjay Patel 4f0e429acc [SimplifyLibCalls] handle pow(x,-0.0) before it can assert (PR43233)
https://bugs.llvm.org/show_bug.cgi?id=43233

llvm-svn: 371221
2019-09-06 16:10:18 +00:00
Matt Arsenault 524a9d5774 InstCombine: Fix crash on icmp of gep with addrspacecasted null
llvm-svn: 371146
2019-09-05 23:39:21 +00:00
Vitaly Buka 9020f11377 [SimplifyCFG] Don't SimplifyBranchOnICmpChain with ExtraCase
Summary:
Here we try to avoid issues with "explicit branch" with SimplifyBranchOnICmpChain
which can check on undef. Msan by design reports branches on uninitialized
memory and undefs, so we have false report here.

In general msan does not like when we convert

```
// If at least one of them is true we can MSAN is ok if another is undefs
if (a || b)
  return;
```
into
```
// If 'a' is undef MSAN will complain even if 'b' is true
if (a)
  return;
if (b)
  return;
```

Example

Before optimization we had something like this:
```
while (true) {
  bool maybe_undef = doStuff();

  while (true) {
    char c = getChar();
    if (c != 10 && c != 13)
     continue
    break;
  }

  // we know that c == 10 || c == 13 if we get here,
  // so msan know that branch is not affected by maybe_undef
  if (maybe_undef || c == 10 || c == 13)
    continue;
  return;
}
```

SimplifyBranchOnICmpChain will convert that into
```
while (true) {
  bool maybe_undef = doStuff();

  while (true) {
    char c = getChar();
    if (c != 10 && c != 13)
      continue;
    break;
  }

  // however msan will complain here:
  if (maybe_undef)
    continue;

  // we know that c == 10 || c == 13, so either way we will get continue
  switch(c) {
    case 10: continue;
    case 13: continue;
  }
  return;
}
```

Reviewers: eugenis, efriedma

Reviewed By: eugenis, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67205

llvm-svn: 371138
2019-09-05 22:49:34 +00:00
Roman Lebedev 8360c42e25 [InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned sub overflow' check
A follow-up for r329011.
This may be changed to produce @llvm.sub.with.overflow in a later patch,
but for now just make things more consistent overall.

A few observations stem from this:
* There does not seem to be a similar one-instruction fold for uadd-overflow
* I'm not sure we'll want to canonicalize `B u> A` as `usub.with.overflow`,
  so since the `icmp` here no longer refers to `sub`,
  reconstructing `usub.with.overflow` will be problematic,
  and will likely require standalone pass (similar to DivRemPairs).

https://rise4fun.com/Alive/Zqs

Name: (A - B) u> A --> B u> A
  %t0 = sub i8 %A, %B
  %r = icmp ugt i8 %t0, %A
=>
  %r = icmp ugt i8 %B, %A

Name: (A - B) u<= A --> B u<= A
  %t0 = sub i8 %A, %B
  %r = icmp ule i8 %t0, %A
=>
  %r = icmp ule i8 %B, %A

Name: C u< (C - D) --> C u< D
  %t0 = sub i8 %C, %D
  %r = icmp ult i8 %C, %t0
=>
  %r = icmp ult i8 %C, %D

Name: C u>= (C - D) --> C u>= D
  %t0 = sub i8 %C, %D
  %r = icmp uge i8 %C, %t0
=>
  %r = icmp uge i8 %C, %D

llvm-svn: 371101
2019-09-05 17:41:02 +00:00
Roman Lebedev ecb7ea1ae7 [InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned add overflow' check
A follow-up for r342004.
This will be changed to produce @llvm.add.with.overflow in a later patch,
but for now just make things more consistent overall.

https://rise4fun.com/Alive/qxE

Name: (Op1 + X) u< Op1 --> ~Op1 u< X
  %t0 = add i8 %Op1, %X
  %r = icmp ult i8 %t0, %Op1
=>
  %n = xor i8 %Op1, -1
  %r = icmp ult i8 %n, %X

Name: (Op1 + X) u>= Op1 --> ~Op1 u>= X
  %t0 = add i8 %Op1, %X
  %r = icmp uge i8 %t0, %Op1
=>
  %n = xor i8 %Op1, -1
  %r = icmp uge i8 %n, %X

;-------------------------------------------------------------------------------

Name: Op0 u> (Op0 + X) --> X u> ~Op0
  %t0 = add i8 %Op0, %X
  %r = icmp ugt i8 %Op0, %t0
=>
  %n = xor i8 %Op0, -1
  %r = icmp ugt i8 %X, %n

Name: Op0 u<= (Op0 + X) --> X u<= ~Op0
  %t0 = add i8 %Op0, %X
  %r = icmp ule i8 %Op0, %t0
=>
  %n = xor i8 %Op0, -1
  %r = icmp ule i8 %X, %n

llvm-svn: 371100
2019-09-05 17:40:49 +00:00
Denis Bakhvalov 58f172f05a [MergedLoadStoreMotion] Sink stores to BB with more than 2 predecessors
If we have:

bb5:
  br i1 %arg3, label %bb6, label %bb7

bb6:
  %tmp = getelementptr inbounds i32, i32* %arg1, i64 2
  store i32 3, i32* %tmp, align 4
  br label %bb9

bb7:
  %tmp8 = getelementptr inbounds i32, i32* %arg1, i64 2
  store i32 3, i32* %tmp8, align 4
  br label %bb9

bb9:  ; preds = %bb4, %bb6, %bb7
  ...

We can't sink stores directly into bb9.
This patch creates new BB that is successor of %bb6 and %bb7
and sinks stores into that block.

SplitFooterBB is the parameter to the pass that controls
that behavior.

Change-Id: I7fdf50a772b84633e4b1b860e905bf7e3e29940f
Differential: https://reviews.llvm.org/D66234
llvm-svn: 371089
2019-09-05 17:00:32 +00:00
Alina Sbirlea 2ac69aadb5 [MemorySSA] Verify MSSAUpdater exists.
llvm-svn: 371087
2019-09-05 16:58:15 +00:00
Hiroshi Yamauchi d842f2eec4 [PGO][CHR] Speed up following long, interlinked use-def chains.
Summary:
Avoid visiting an instruction more than once by using a map.

This is similar to https://reviews.llvm.org/rL361416.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67198

llvm-svn: 371086
2019-09-05 16:56:55 +00:00
Alina Sbirlea ae900d3882 [MemorySSA] Update MemorySSA when removing debug.value calls.
llvm-svn: 371084
2019-09-05 16:25:24 +00:00
Guillaume Chatelet 33671ceffa [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67223

llvm-svn: 371063
2019-09-05 13:09:42 +00:00
Johannes Doerfert 56e9b608ad [Attributor][Stats] Use the right statistics macro
llvm-svn: 370976
2019-09-04 20:34:57 +00:00
Johannes Doerfert 7ab5253704 [Attributor][Fix] Make sure we do not delete live code
Summary: Liveness needs to mark edges, not blocks as dead.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67191

llvm-svn: 370975
2019-09-04 20:34:52 +00:00
Leonard Chan eca01b031d [NewPM][Sancov] Make Sancov a Module Pass instead of 2 Passes
This patch merges the sancov module and funciton passes into one module pass.

The reason for this is because we ran into an out of memory error when
attempting to run asan fuzzer on some protobufs (pc.cc files). I traced the OOM
error to the destructor of SanitizerCoverage where we only call
appendTo[Compiler]Used which calls appendToUsedList. I'm not sure where precisely
in appendToUsedList causes the OOM, but I am able to confirm that it's calling
this function *repeatedly* that causes the OOM. (I hacked sancov a bit such that
I can still create and destroy a new sancov on every function run, but only call
appendToUsedList after all functions in the module have finished. This passes, but
when I make it such that appendToUsedList is called on every sancov destruction,
we hit OOM.)

I don't think the OOM is from just adding to the SmallSet and SmallVector inside
appendToUsedList since in either case for a given module, they'll have the same
max size. I suspect that when the existing llvm.compiler.used global is erased,
the memory behind it isn't freed. I could be wrong on this though.

This patch works around the OOM issue by just calling appendToUsedList at the
end of every module run instead of function run. The same amount of constants
still get added to llvm.compiler.used, abd we make the pass usage and logic
simpler by not having any inter-pass dependencies.

Differential Revision: https://reviews.llvm.org/D66988

llvm-svn: 370971
2019-09-04 20:30:29 +00:00
Alina Sbirlea 6da79ce1fe [MemorySSA] Re-enable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370957
2019-09-04 19:16:04 +00:00
Johannes Doerfert a7a3b3aa43 [Attributor][Fix] Ensure the attribute names are created properly
The names of the attributes were not always created properly which
caused problems with the yaml output.

llvm-svn: 370956
2019-09-04 19:01:08 +00:00
Philip Reames 4228245e41 [NFC] Switch last couple of invariant_load checks to use hasMetadata
llvm-svn: 370948
2019-09-04 18:27:31 +00:00
David Bolvansky 420cbb6190 [InstCombine] sub(xor(x, y), or(x, y)) -> neg(and(x, y))
Summary:
```
Name: sub(xor(x, y), or(x, y)) -> neg(and(x, y))
%or = or i32 %y, %x
%xor = xor i32 %x, %y
%sub = sub i32 %xor, %or
  =>
%sub1 = and i32 %x, %y
%sub = sub i32 0, %sub1

Optimization: sub(xor(x, y), or(x, y)) -> neg(and(x, y))
Done: 1
Optimization is correct!
```

https://rise4fun.com/Alive/8OI

Reviewers: lebedev.ri

Reviewed By: lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67188

llvm-svn: 370945
2019-09-04 18:03:21 +00:00
David Bolvansky 0e07248704 [InstCombine] Fold sub (and A, B) (or A, B)) to neg (xor A, B)
Summary:
```
Name: sub(and(x, y), or(x, y)) -> neg(xor(x, y))
%or = or i32 %y, %x
%and = and i32 %x, %y
%sub = sub i32 %and, %or
  =>
%sub1 = xor i32 %x, %y
%sub = sub i32 0, %sub1

Optimization: sub(and(x, y), or(x, y)) -> neg(xor(x, y))
Done: 1
Optimization is correct!
```

https://rise4fun.com/Alive/VI6

Found by @lebedev.ri. Also author of the proof.

Reviewers: lebedev.ri, spatel

Reviewed By: lebedev.ri

Subscribers: llvm-commits, lebedev.ri

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67155

llvm-svn: 370934
2019-09-04 17:30:53 +00:00
Philip Reames 27820f9909 [Instruction] Add hasMetadata(Kind) helper [NFC]
It's a common idiom, so let's add the obvious wrapper for metadata kinds which are basically booleans.

llvm-svn: 370933
2019-09-04 17:28:48 +00:00
Johannes Doerfert 2f6220633c [Attributor] Look at internal functions only on-demand
Summary:
Instead of building attributes for internal functions which we do not
update as long as we assume they are dead, we now do not create
attributes until we assume the internal function to be live. This
improves the number of required iterations, as well as the number of
required updates, in real code. On our tests, the results are mixed.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66914

llvm-svn: 370924
2019-09-04 16:35:20 +00:00