Commit Graph

8670 Commits

Author SHA1 Message Date
Philip Reames b2c9fc02d5 Fix a bug in SCEV's isSafeToExpand around speculation safety
isSafeToExpand was making a common, but dangerously wrong, mistake in assuming that if any instruction within a basic block executes, then all instructions within that block must execute.  This can be trivially shown to be false by considering the following small example:
bb:
  add x, y  <-- InsertionPoint
  call @throws()
  udiv x, y <-- SCEV* S
  br ...

It's clearly not legal to expand S above the throwing call, but the previous logic would do so since S dominates (but not properlyDominates) the block containing the InsertionPoint.

Since iterating over the instructions within a block is expensive, this change special-cases two situations: 1) S is an operand of InsertionPoint, and 2) InsertionPoint is the terminator of its block.  These two together are enough to keep all current optimizations triggering while fixing the latent correctness issue.

As best I can tell, this is a silent bug in current ToT.  Given that, there are no tests with this change.  It was noticed in an upcoming optimization change (D60093), and was reviewed as part of that.  That change will include the test which caused me to notice the issue.  I'm submitting this separately so that anyone bisecting a problem gets a clear explanation.

llvm-svn: 358680
2019-04-18 16:10:21 +00:00
Nikita Popov 2039581002 [LVI][CVP] Constrain values in with.overflow branches
If a branch is conditional on extractvalue(op.with.overflow(%x, C), 1)
then we can constrain the value of %x inside the branch based on
makeGuaranteedNoWrapRegion(). We do this by extending the edge-value
handling in LVI. This allows CVP to then fold comparisons against %x,
as illustrated in the tests.
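A rough sketch of the range machinery involved (illustrative only; the constant and the helper name below are invented, not taken from the patch): makeGuaranteedNoWrapRegion() yields the set of %x values for which the operation cannot wrap, which is what LVI can assume on the no-overflow edge.

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Operator.h"
using namespace llvm;

// All i32 values %x for which sadd.with.overflow(%x, 42) does not overflow,
// i.e. the region LVI may intersect into %x's lattice value on the edge
// where the overflow bit is known false.
static ConstantRange noWrapRegionForExample() {
  ConstantRange C(APInt(32, 42)); // the constant operand
  return ConstantRange::makeGuaranteedNoWrapRegion(
      Instruction::Add, C, OverflowingBinaryOperator::NoSignedWrap);
}
```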

Differential Revision: https://reviews.llvm.org/D60650

llvm-svn: 358597
2019-04-17 16:57:42 +00:00
Nikita Popov 79dffc67b5 [IR] Add WithOverflowInst class
This adds a WithOverflowInst class with a few helper methods to get
the underlying binop, signedness and nowrap type and makes use of it
where sensible. There will be two more uses in D60650/D60656.

The refactorings are all NFC, though I left some TODOs where things
could be improved. In particular we have two places where add/sub are
handled but mul isn't.
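A usage sketch; the accessor names below (getBinaryOp/isSigned/getLHS/getRHS) are assumed from the description above and may not match the patch exactly.

```cpp
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

// Query an {s,u}{add,sub,mul}.with.overflow call through the new wrapper
// class instead of switching over the raw intrinsic ID.
static void describeOverflowCall(Value *V) {
  if (auto *WO = dyn_cast<WithOverflowInst>(V)) {
    auto Opc = WO->getBinaryOp(); // Instruction::Add / Sub / Mul
    bool Signed = WO->isSigned(); // sadd/ssub/smul vs. uadd/usub/umul
    Value *LHS = WO->getLHS(), *RHS = WO->getRHS();
    (void)Opc; (void)Signed; (void)LHS; (void)RHS;
  }
}
```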

Differential Revision: https://reviews.llvm.org/D60668

llvm-svn: 358512
2019-04-16 18:55:16 +00:00
Alina Sbirlea f9f073a861 [MemorySSA] Add previous def to cache when found, even if trivial.
Summary:
When inserting a new Def, MemorySSA may have a non-minimal number of Phis.
While inserting, the walk to find the previous definition may clean up minimal Phis.
When the last definition is trivial to obtain, we do not cache it.

It is possible while getting the previous definition for a Def to get two different answers:
- one that was straightforward to find when walking the first path (a trivial phi in this case), and
- another that follows a cleanup of the trivial phi: it determines it may need additional Phi nodes, inserts them, and returns a new phi in the same position as the former trivial one.
While the Phis added for the second path are all redundant, they are not complete (the walk is only done upwards), and they are not properly cleaned up afterwards.

A way to fix this problem is to cache the straightforward answer we got on the first walk.
The caching is only kept for the duration of a getPreviousDef call, and for Phis we use TrackingVH, so removing the trivial phi will lead to replacing it with the next dominating phi in the cache.
Resolves PR40749.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60634

llvm-svn: 358313
2019-04-12 21:58:52 +00:00
Alina Sbirlea 2312a06c87 [SCEV] Add option to forget everything in SCEV.
Summary:
Create a method to forget everything in SCEV.
Add a cl::opt and PassManagerBuilder option to use this in LoopUnroll.

Motivation: Certain Halide applications spend a very long time compiling in forgetLoop, and prefer to forget everything and rebuild SCEV from scratch.
Sample difference in compile time reduction: 21.04 to 14.78 using current ToT release build.
Testcase showcasing this cannot be opensourced and is fairly large.

The option is disabled by default, but it may be desirable to enable it by
default. Evidence in favor (two different runs on different days/ToT states):

File Before (s) After (s)
clang-9.bc 7267.91 6639.14
llvm-as.bc 194.12 194.12
llvm-dis.bc 62.50 62.50
opt.bc 1855.85 1857.53

File Before (s) After (s)
clang-9.bc 8588.70 7812.83
llvm-as.bc 196.20 194.78
llvm-dis.bc 61.55 61.97
opt.bc 1739.78 1886.26
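A hedged sketch of how the knob could be wired into LoopUnroll; the flag name and the method name forgetAllLoops() below are assumptions for illustration, not verbatim from the patch.

```cpp
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Support/CommandLine.h"
using namespace llvm;

// Illustrative option; the real cl::opt name may differ.
static cl::opt<bool> UnrollForgetAllSCEV(
    "unroll-forget-all-scev", cl::init(false), cl::Hidden,
    cl::desc("Forget all of SCEV after unrolling instead of just the loop"));

static void invalidateSCEVAfterUnroll(ScalarEvolution &SE, Loop &L) {
  if (UnrollForgetAllSCEV)
    SE.forgetAllLoops(); // drop everything; SCEV is rebuilt lazily on demand
  else
    SE.forgetLoop(&L);   // the existing, per-loop invalidation
}
```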

Reviewers: sanjoy

Subscribers: mehdi_amini, jlebar, zzheng, javed.absar, dmgreen, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60144

llvm-svn: 358304
2019-04-12 19:16:07 +00:00
Alina Sbirlea 57769382b1 [MemorySSA] Small fix for the clobber limit.
Summary:
After introducing the limit for clobber walking, `walkToPhiOrClobber` would assert that the limit is at least 1 on entry.
The test included triggered that assert.

The callsite in `tryOptimizePhi` making the calls to `walkToPhiOrClobber` is structured like this:
```
while (true) {
   if (getBlockingAccess()) { // calls walkToPhiOrClobber
   }
   for (...) {
     walkToPhiOrClobber();
   }
}
```

The cleanest fix is to check if the limit was reached inside `walkToPhiOrClobber`, and give an allowance of 1.
This approach does not make any alias() calls (no calls to instructionClobbersQuery), so the performance condition is enforced.
The limit is set back to 0 if not used, as this provides info on the fact that we stopped before reaching a true clobber.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60479

llvm-svn: 358303
2019-04-12 18:48:46 +00:00
Fangrui Song cecc435250 Use llvm::lower_bound. NFC
This reapplies rL358161. That commit inadvertently reverted an exegesis file to an old version.
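For context, the helper is just a range-based wrapper around std::lower_bound; a minimal sketch:

```cpp
#include "llvm/ADT/STLExtras.h"
#include <vector>

// Before: std::lower_bound(Vec.begin(), Vec.end(), X)
// After:  llvm::lower_bound(Vec, X) -- same result, less boilerplate.
static int *firstNotLess(std::vector<int> &Vec, int X) {
  auto It = llvm::lower_bound(Vec, X);
  return It == Vec.end() ? nullptr : &*It;
}
```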

llvm-svn: 358246
2019-04-12 02:02:06 +00:00
Ali Tamur 7822b46188 Revert "Use llvm::lower_bound. NFC"
This reverts commit rL358161.

This patch has broken the test:
llvm/test/tools/llvm-exegesis/X86/uops-CMOV16rm-noreg.s

llvm-svn: 358199
2019-04-11 17:35:20 +00:00
Sander de Smalen 4f5d2df48d [ValueTracking] Change if-else chain into switch in computeKnownBitsFromAssume
This is a follow-up patch to D60504 to further address compile-time
performance issues in computeKnownBitsFromAssume.
    
The patch is NFC, but may improve compile-time performance
if the compiler isn't clever enough to do the optimization
itself.

llvm-svn: 358163
2019-04-11 13:02:19 +00:00
Fangrui Song 71cce580b9 Use llvm::lower_bound. NFC
llvm-svn: 358161
2019-04-11 10:25:41 +00:00
Erik Pilkington cb5c7bd9eb Fix a hang when lowering __builtin_dynamic_object_size
If the ObjectSizeOffsetEvaluator fails to fold the object size call, then it may
litter some unused instructions in the function. When done repeatedly in
InstCombine, this results in an infinite loop. Fix this by tracking the set of
instructions that were inserted, then removing them on failure.
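For reference, a minimal example of the kind of call whose lowering looped (illustrative, not the reproducer from the radar):

```cpp
#include <cstddef>

// __builtin_dynamic_object_size is like __builtin_object_size, but may emit
// runtime code when the size isn't a compile-time constant. When the
// evaluator failed to fold the call, the stray instructions it had inserted
// kept re-triggering InstCombine.
static std::size_t remaining(char *Buf, std::size_t Off) {
  return __builtin_dynamic_object_size(Buf + Off, 0);
}
```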

rdar://49172227

Differential revision: https://reviews.llvm.org/D60298

llvm-svn: 358146
2019-04-10 23:42:11 +00:00
Sander de Smalen 0e66db5d77 Improve compile-time performance in computeKnownBitsFromAssume.
This patch changes the order of pattern matching by first testing
a compare instruction's predicate, before doing the pattern
match for the whole expression tree.
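A simplified sketch of the reordering (the real code matches many more patterns; the helper below is invented for illustration):

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// Test the cheap-to-read predicate first and bail out early, instead of
// walking the whole expression tree with the matcher for every assume.
static bool assumeImpliesNonNegative(ICmpInst *Cmp, Value *V) {
  if (Cmp->getPredicate() != ICmpInst::ICMP_SGE)
    return false; // cheap reject, no tree walk
  ICmpInst::Predicate Pred;
  return match(Cmp, m_ICmp(Pred, m_Specific(V), m_Zero()));
}
```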

Patch by Paul Walker.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D60504

llvm-svn: 358097
2019-04-10 16:24:48 +00:00
Nikita Popov 4b2323d1a3 [ValueTracking] Use computeConstantRange() for signed sub overflow determination
This is the same change as D60420 but for signed sub rather than
signed add: Range information is intersected into the known bits
result, allowing us to detect more no/always-overflow conditions.

Differential Revision: https://reviews.llvm.org/D60469

llvm-svn: 358020
2019-04-09 17:01:49 +00:00
Nikita Popov 10edd2b79d [ValueTracking] Use computeConstantRange() in signed add overflow determination
This is D59386 for the signed add case. The computeConstantRange()
result is now intersected into the existing known bits information,
allowing us to detect additional no-overflow/always-overflow conditions
(though the latter isn't used yet).

This (finally...) covers the motivating case from D59071.

Differential Revision: https://reviews.llvm.org/D60420

llvm-svn: 358014
2019-04-09 16:12:59 +00:00
Nemanja Ivanovic 820b90318f NFC: Refactor library-specific mappings of scalar maths functions to their vector counterparts
This patch factors out mappings of scalar maths functions to their vector
counterparts from TargetLibraryInfo.cpp to a separate VecFuncs.def file. Such
mappings are currently available for the Accelerate framework and the SVML library.

This is in support of the follow-up: https://reviews.llvm.org/D59881

Patch by pjeeva01

Differential revision: https://reviews.llvm.org/D60211

llvm-svn: 358001
2019-04-09 13:21:11 +00:00
Nikita Popov 6e9157d588 [ValueTracking] Use ConstantRange methods; NFC
Switch part of the computeOverflowForSignedAdd() implementation to
use Range.isAllNegative() rather than KnownBits.isNegative() and
similar. They do the same thing, but using the ConstantRange methods
allows dropping the KnownBits variables more easily in D60420.

llvm-svn: 357969
2019-04-09 07:13:09 +00:00
Nikita Popov 7bd7878d22 [ValueTracking] Explicitly specify intersection type; NFC
Preparation for D60420.

llvm-svn: 357968
2019-04-09 07:13:03 +00:00
Nikita Popov 3db93ac5d6 Reapply [ValueTracking] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.

Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.
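A rough sketch of the recognition step (simplified; the actual change derives a range from the min/max flavor and its constant operand):

```cpp
#include "llvm/Analysis/ValueTracking.h"
using namespace llvm;

// Recognize min/max-shaped selects via matchSelectPattern(). For example,
// umax(%x, 10) is known to be >= 10, which lets InstSimplify fold
// comparisons such as `icmp ult (umax %x, 10), 5` to false.
static bool isMinOrMaxSelect(Value *V) {
  Value *LHS, *RHS;
  SelectPatternResult SPR = matchSelectPattern(V, LHS, RHS);
  return SelectPatternResult::isMinOrMax(SPR.Flavor);
}
```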

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 357870
2019-04-07 17:22:16 +00:00
Guozhi Wei 36fc9c3107 [LCG] Add aliased functions as LCG roots
The current LCG doesn't check aliased functions, so if an internal function has a public alias it will not be added to a CG SCC, but it is still reachable from outside through the alias.
So this patch adds aliased functions to the SCC.

Differential Revision: https://reviews.llvm.org/D59898

llvm-svn: 357795
2019-04-05 18:51:08 +00:00
Nick Lewycky a6ed16c98f An unreachable block may have a route to a reachable block, don't fast-path return that it can't.
A block reachable from the entry block can't have any route to a block that's not reachable from the entry block (if it did, that route would make the latter reachable from the entry block). That is the intended performance optimization in isPotentiallyReachable. For the case where we ask whether a block that is unreachable from entry has a route to a block that is reachable from entry, we can't conclude one way or the other. Fix a bug where we claimed there could be no such route.

The fix in rL357425 ironically reintroduced the very bug it was fixing but only when a DominatorTree is provided. This fixes the remaining bug.

llvm-svn: 357734
2019-04-04 23:09:40 +00:00
Evandro Menezes 85bd3978ae [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00
Evandro Menezes 7c711ccf36 [IR] Create new method in `Function` class (NFC)
Create the method `optForNone()`, which tests for the function-level equivalent of
`-O0`, and refactor appropriately.
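A plausible implementation sketch, assuming the method simply queries the `optnone` attribute (inferred from the description, not copied from the patch):

```cpp
#include "llvm/IR/Function.h"
using namespace llvm;

// Function-level equivalent of -O0: the function carries `optnone`.
static bool optForNoneSketch(const Function &F) {
  return F.hasFnAttribute(Attribute::OptimizeNone);
}
// Call sites that spelled out the attribute check can now read
//   if (F.optForNone()) ...
```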

Differential revision: https://reviews.llvm.org/D59852

llvm-svn: 357638
2019-04-03 21:27:03 +00:00
Alon Zakai b4f9991f38 [WebAssembly] Add Emscripten OS definition + small_printf
The Emscripten OS provides a definition of __EMSCRIPTEN__, and also indicates that it
supports iprintf optimizations.

Also define small_printf optimizations, which is a printf with float support
but not long double (which in wasm can be useful since long doubles are 128
bit and force linking of float128 emulation code). This part is based on
sunfish's https://reviews.llvm.org/D57620 (which can't land yet since
the WASI integration isn't ready yet).

Differential Revision: https://reviews.llvm.org/D60167

llvm-svn: 357552
2019-04-03 01:08:35 +00:00
Matt Arsenault 03e7492876 InstSimplify: Fold round intrinsics from sitofp/uitofp
https://godbolt.org/z/gEMRZb

llvm-svn: 357549
2019-04-03 00:25:06 +00:00
Philip Reames d3d5d76a7b [WideableCond] Fix a nasty bug in detection of "explicit guards"
The code was failing to actually check for the presence of the call to widenable_condition.  The whole point of specifying the widenable_condition intrinsic was allowing widening transforms.  A normal branch is not widenable.  A normal branch leading to a deopt is not widenable (in general).

I added a test case via LoopPredication, but GuardWidening has an analogous bug.  Those are the only two passes actually using this utility just yet. Noticed while working on LoopPredication for non-widenable branches; POC in D60111.
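For reference, a hedged sketch of the missing check: a branch is only widenable if its condition is an `and` with a call to llvm.experimental.widenable.condition, so the detection must actually look for that call.

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// Widenable: br i1 (and %cond, %wc), where %wc is produced by
// llvm.experimental.widenable.condition(). A plain branch, even one
// leading to a deopt block, is not widenable.
static bool isWidenableBranchSketch(const BranchInst *BI) {
  if (!BI->isConditional())
    return false;
  Value *Cond;
  return match(BI->getCondition(),
               m_c_And(m_Value(Cond),
                       m_Intrinsic<Intrinsic::experimental_widenable_condition>()));
}
```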

llvm-svn: 357493
2019-04-02 16:51:43 +00:00
Nick Lewycky c0ebfbe3f3 Add an optional list of blocks to avoid when looking for a path in isPotentiallyReachable.
This leads to some ambiguous overloads, so update three callers.

Differential Revision: https://reviews.llvm.org/D60085

llvm-svn: 357447
2019-04-02 01:05:48 +00:00
Nick Lewycky 66d7eb9704 Not all blocks are reachable from entry. Don't assume they are.
Fixes a bug in isPotentiallyReachable, noticed by inspection.

llvm-svn: 357425
2019-04-01 20:03:16 +00:00
Alina Sbirlea c8d6e0496d [MemorySSA] Temporary fix assert when reaching 0 limit.
llvm-svn: 357327
2019-03-29 22:55:59 +00:00
Sanjoy Das 53a5952a93 Try to fix buildbot error
Error is:

llvm/lib/Analysis/ScalarEvolution.cpp:3534:10: error: chosen constructor is explicit in copy-initialization
  return {UniqueSCEVs.FindNodeOrInsertPos(ID, IP), std::move(ID), IP};
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/bin/../lib/gcc/aarch64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here
        constexpr tuple(_UElements&&... __elements)
                  ^
1 error generated.

llvm-svn: 357324
2019-03-29 22:27:10 +00:00
Sanjoy Das 32fd32bc6f [SCEV] Check the cache in get{S|U}MaxExpr before doing any work
Summary:
This lets us avoid e.g. checking if A >=s B in getSMaxExpr(A, B) if we've
already established that (A smax B) is the best we can do.

Fixes PR41225.

Reviewers: asbirlea

Subscribers: mcrosier, jlebar, bixia, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60010

llvm-svn: 357320
2019-03-29 22:00:12 +00:00
Alina Sbirlea f085cc5aa7 [MemorySSA] Limit clobber walks.
Summary: This patch limits all getClobberingMemoryAccess() walks to MaxCheckLimit.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59569

llvm-svn: 357319
2019-03-29 21:56:09 +00:00
Alina Sbirlea e589067e61 [MemorySSA] Don't optimize incomplete phis.
Summary:
MemoryPhis cannot be optimized out until they are complete.
Resolves PR41254.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59966

llvm-svn: 357315
2019-03-29 21:16:31 +00:00
Mircea Trofin b27d0fd0bf [llvm][NFC] Factor out logic for getting incoming & back Loop edges
Reviewers: davidxl

Reviewed By: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59967

llvm-svn: 357284
2019-03-29 17:39:17 +00:00
Florian Hahn 9b41a7320d Recommit "[DSE] Preserve basic block ordering using OrderedBasicBlock."
Updated to use DenseMap::insert instead of [] operator for insertion, to
avoid a crash caused by epoch checks.

This reverts commit 2b85de4383.

llvm-svn: 357257
2019-03-29 14:10:24 +00:00
Florian Hahn 2b85de4383 Revert Recommit "[DSE] Preserve basic block ordering using OrderedBasicBlock."
Another buildbot failure

http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/20402

clang-9: /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/llvm/include/llvm/ADT/DenseMap.h:1228: llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::value_type* llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::operator->() const [with KeyT = const llvm::Instruction*; ValueT = unsigned int; KeyInfoT = llvm::DenseMapInfo<const llvm::Instruction*>; Bucket = llvm::detail::DenseMapPair<const llvm::Instruction*, unsigned int>; bool IsConst = false; llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::pointer = llvm::detail::DenseMapPair<const llvm::Instruction*, unsigned int>*; llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::value_type = llvm::detail::DenseMapPair<const llvm::Instruction*, unsigned int>]: Assertion `isHandleInSync() && "invalid iterator access!"' failed.

0.	Program arguments: /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/stage1.install/bin/clang-9 -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -disable-free -main-file-name ArchiveCommandLine.cpp -mrelocation-model static -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu skylake-avx512 -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -coverage-notes-file /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/sandbox/build/MultiSource/Benchmarks/7zip/Output/ArchiveCommandLine.llvm.gcno -resource-dir /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/stage1.install/lib/clang/9.0.0 -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/sandbox/build/MultiSource/Benchmarks/7zip -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/include -I ../../../include -D _GNU_SOURCE -D __STDC_LIMIT_MACROS -D NDEBUG -D BREAK_HANDLER -D UNICODE -D _UNICODE -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/C -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP/myWindows -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP/include_windows -I /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP -I . -D _FILE_OFFSET_BITS=64 -D _LARGEFILE_SOURCE -D NDEBUG -D _REENTRANT -D ENV_UNIX -D _7ZIP_LARGE_PAGES -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/x86_64-linux-gnu/c++/5.4.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/backward -internal-isystem /usr/local/include -internal-isystem /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/stage1.install/lib/clang/9.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -std=gnu++98 -fdeprecated-macro -fdebug-compilation-dir /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/sandbox/build/MultiSource/Benchmarks/7zip -ferror-limit 19 -fmessage-length 0 -pthread -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -vectorize-loops -vectorize-slp -o Output/ArchiveCommandLine.llvm.o -x c++ /home/ssglocal/clang-cmake-x86_64-sde-avx512-linux/clang-cmake-x86_64-sde-avx512-linux/test/test-suite/MultiSource/Benchmarks/7zip/CPP/7zip/UI/Common/ArchiveCommandLine.cpp -faddrsig

This reverts r357222 (git commit 64cccfcc72)

llvm-svn: 357227
2019-03-29 00:22:26 +00:00
Florian Hahn 64cccfcc72 Recommit "[DSE] Preserve basic block ordering using OrderedBasicBlock."
Recommitting after addressing a buildbot failure.

This reverts commit c87869ebea.

llvm-svn: 357222
2019-03-28 23:11:00 +00:00
Florian Hahn c87869ebea Revert [DSE] Preserve basic block ordering using OrderedBasicBlock.
This reverts r357208 (git commit c0bfd37d38)

This causes a buildbot failure:  http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/16124

FAILED: lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o
/home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/install/stage2/bin/clang++   -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/IR -I/home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/lib/IR -Iinclude -I/home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/include -fPIC -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -flto=thin -O3    -UNDEBUG  -fno-exceptions -fno-rtti -MD -MT lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o -MF lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o.d -o lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o -c /home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/lib/IR/IRBuilder.cpp
clang-9: /home/buildslave/ps4-buildslave1/clang-with-thin-lto-ubuntu/llvm.src/lib/Analysis/OrderedBasicBlock.cpp:38: bool llvm::OrderedBasicBlock::comesBefore(const llvm::Instruction *, const llvm::Instruction *): Assertion `!(LastInstFound == BB->end() && NextInstPos != 0) && "Instruction supposed to be in NumberedInsts"' failed.

llvm-svn: 357211
2019-03-28 20:36:24 +00:00
Florian Hahn c0bfd37d38 [DSE] Preserve basic block ordering using OrderedBasicBlock.
By extending OrderedBB to allow removing and replacing cached
instructions, we can preserve OrderedBBs in DSE easily. This eliminates
one source of quadratic compile time in DSE.

Fixes PR38829.

Reviewers: rnk, efriedma, hfinkel

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59789

llvm-svn: 357208
2019-03-28 20:02:33 +00:00
Florian Hahn 6c3024368c [MemDepAnalysis] Allow caller to pass in an OrderedBasicBlock.
If the caller can preserve the OBB, we can avoid recomputing the order
for each getDependency call.

Reviewers: efriedma, rnk, hfinkel

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D59788

llvm-svn: 357206
2019-03-28 19:17:31 +00:00
Chandler Carruth 923ff550b9 [NewPM] Fix a nasty bug with analysis invalidation in the new PM.
The issue here is that we actually allow CGSCC passes to mutate IR (and
therefore invalidate analyses) outside of the current SCC. At a minimum,
we need to support mutating parent and ancestor SCCs to support the
ArgumentPromotion pass which rewrites all calls to a function.

However, the analysis invalidation infrastructure is heavily based
around not needing to invalidate the same IR-unit at multiple levels.
With Loop passes for example, they don't invalidate other Loops. So we
need to customize how we handle CGSCC invalidation. Doing this without
gratuitously re-running analyses is even harder. I've avoided most of
these by using an out-of-band preserved set to accumulate the cross-SCC
invalidation, but it still isn't perfect in the case of re-visiting the
same SCC repeatedly when it comes back off the worklist. It's unclear how
important this use case really is, but I wanted to call it out.

Another wrinkle is that in order for this to successfully propagate to
function analyses, we have to make sure we have a proxy from the SCC to
the Function level. That requires pre-creating the necessary proxy.

The motivating test case now works cleanly and is added for
ArgumentPromotion.

Thanks for the review from Philip and Wei!

Differential Revision: https://reviews.llvm.org/D59869

llvm-svn: 357137
2019-03-28 00:51:36 +00:00
Hans Wennborg 5519cb2d94 Fix the build with GCC 4.8 after r356783
llvm-svn: 356875
2019-03-25 09:27:42 +00:00
Nikita Popov 977934f00f [ConstantRange] Add getFull() + getEmpty() named constructors; NFC
This adds ConstantRange::getFull(BitWidth) and
ConstantRange::getEmpty(BitWidth) named constructors as more readable
alternatives to the current ConstantRange(BitWidth, /* full */ false)
and similar. Additionally private getFull() and getEmpty() member
functions are added which return a full/empty range with the same bit
width -- these are commonly needed inside ConstantRange.cpp.

The IsFullSet argument in the ConstantRange(BitWidth, IsFullSet)
constructor is now mandatory for the few usages that still make use of it.
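A before/after sketch of the readability change:

```cpp
#include "llvm/IR/ConstantRange.h"
using namespace llvm;

static void namedConstructorsSketch() {
  // Before: the bool argument is easy to misread at call sites.
  ConstantRange FullOld(32, /*isFullSet=*/true);
  ConstantRange EmptyOld(32, /*isFullSet=*/false);

  // After: the intent is explicit in the name.
  ConstantRange Full = ConstantRange::getFull(32);
  ConstantRange Empty = ConstantRange::getEmpty(32);
  (void)FullOld; (void)EmptyOld; (void)Full; (void)Empty;
}
```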

Differential Revision: https://reviews.llvm.org/D59716

llvm-svn: 356852
2019-03-24 09:34:40 +00:00
Nikita Popov 280a6b01c8 [ValueTracking] Avoid redundant known bits calculation in computeOverflowForSignedAdd()
We're already computing the known bits of the operands here. If the
known bits of the operands can determine the sign bit of the result,
we'll already catch this in signedAddMayOverflow(). The only other
way (and as the comment already indicates) we'll get new information
from computing known bits on the whole add, is if there's an assumption
on it.

As such, we change the code to only compute known bits from assumptions,
instead of computing full known bits on the add (which would unnecessarily
recompute the known bits of the operands as well).

Differential Revision: https://reviews.llvm.org/D59473

llvm-svn: 356785
2019-03-22 17:51:40 +00:00
Alina Sbirlea bfc779e491 [AliasAnalysis] Second prototype to cache BasicAA / anyAA state.
Summary:
Adding contained caching to AliasAnalysis. BasicAA is currently the only one using it.

AA changes:
- This patch is pulling the caches from BasicAAResults to AAResults, meaning the getModRefInfo call benefits from the IsCapturedCache as well when in "batch mode".
- All AAResultBase implementations add the QueryInfo member to all APIs. AAResults APIs maintain wrapper APIs such that all alias()/getModRefInfo call sites are unchanged.
- AA now provides a BatchAAResults type as a wrapper to AAResults. It keeps the AAResults instance and a QueryInfo instantiated to batch mode. It delegates all work to the AAResults instance with the batched QueryInfo. More API wrappers may be needed in BatchAAResults; only the minimum needed is currently added.

MemorySSA changes:
- All walkers are now templated on the AA used (AliasAnalysis=AAResults or BatchAAResults).
- At build time, we optimize uses; now we create a local walker (lives only as long as OptimizeUses does) using BatchAAResults.
- All Walkers have an internal AA and only use that now, never the AA in MemorySSA. The Walkers receive the AA they will use when built.

- The walker we use for queries after the build is instantiated on AliasAnalysis and is built after building MemorySSA and setting AA.
- All static methods doing walking are now templated on AliasAnalysisType if they are used both during build and after. If used only during build, the method now only takes a BatchAAResults. If used only after build, the method now takes an AliasAnalysis.

Subscribers: sanjoy, arsenm, jvesely, nhaehnle, jlebar, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59315

llvm-svn: 356783
2019-03-22 17:22:19 +00:00
Bixia Zheng bdf0230cff [ConstantFolding] Fix GetConstantFoldFPValue to avoid cast overflow.
Summary:
In C++, the behavior of casting a double value that is beyond the range
of a single precision floating-point to a float value is undefined. This
change replaces such a cast with APFloat::convert to convert the value,
which is consistent with how we convert a double value to a half value.
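A sketch of the defined-behavior narrowing via APFloat::convert (as the summary describes), rather than a raw C++ cast:

```cpp
#include "llvm/ADT/APFloat.h"
using namespace llvm;

// Narrow a double to single precision without UB for out-of-range values:
// APFloat handles rounding and overflow explicitly (producing +/-inf),
// unlike a plain `(float)V` cast.
static float narrowToFloat(double V) {
  APFloat F(V);
  bool LosesInfo = false;
  F.convert(APFloat::IEEEsingle(), APFloat::rmNearestTiesToEven, &LosesInfo);
  return F.convertToFloat();
}
```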

Reviewers: sanjoy

Subscribers: lebedev.ri, sanjoy, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59500

llvm-svn: 356781
2019-03-22 16:37:37 +00:00
Craig Topper 9f0b17a248 [ScalarizeMaskedMemIntrin] Add support for scalarizing expandload and compressstore intrinsics.
This adds support for scalarizing these intrinsics as well as the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle.

I've omitted handling special cases for constant masks for this first pass. Though CodeGenPrepare can constant fold the branch conditions and remove some of the control flow anyway.

Fixes PR40994 and covers most of PR3666. Might want to implement constant masks to close that.

Differential Revision: https://reviews.llvm.org/D59180

llvm-svn: 356687
2019-03-21 17:38:52 +00:00
Nikita Popov 3af5b28f47 [ValueTracking] Use ConstantRange based overflow check for signed sub
This is D59450, but for signed sub. This case is not NFC, because
the overflow logic in ConstantRange is more powerful than the existing
check. This resolves the TODO in the function.

I've added two tests to show that this indeed catches more cases than
the previous logic, but the main correctness test coverage here is in
the existing ConstantRange unit tests.

Differential Revision: https://reviews.llvm.org/D59617

llvm-svn: 356685
2019-03-21 17:23:51 +00:00
Fangrui Song ebfb7852be [BasicAA] Use DenseMap::try_emplace after D59151. NFC
llvm-svn: 356651
2019-03-21 08:47:40 +00:00
Alina Sbirlea 4fdbd822fc [BasicAA] Reduce no of map seaches [NFCI].
Summary:
This is a refactoring patch.
- Reduce the number of map searches by reusing the iterator.
- Add asserts to check that the entry is in the cache, as this is something BasicAA relies on to avoid infinite recursion.

Reviewers: chandlerc, aschwaighofer

Subscribers: sanjoy, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59151

llvm-svn: 356644
2019-03-21 05:02:05 +00:00
George Burgess IV ae84e9ab49 [MSSA] Delete move ctor; remove dynamic never-moved verification
Code archaeology in D59315 revealed that MSSA should never be moved.
Rather than trying to check dynamically that this hasn't happened in the
verify() functions of Walkers, it's likely best to just delete its move
constructor.

Since all these verify() functions did was check that MSSA hasn't moved,
this allows us to remove these verify functions.

I can readd the verification checks if someone's super concerned about
us trying to `memcpy` MemorySSA or something somewhere, but I imagine we
have other problems if we're trying anything like that...

llvm-svn: 356641
2019-03-21 03:11:34 +00:00
Nikita Popov 00b5ecab5d [ValueTracking] Compute range for abs without nsw
This is a small followup to D59511. The code that was moved into
computeConstantRange() there is a bit overly conservative: If the
abs is not nsw, it does not compute any range. However, abs without
nsw still has a well-defined contiguous unsigned range from 0 to
SIGNED_MIN. This is a lot less useful than the usual 0 to SIGNED_MAX
range, but if we're already here we might as well specify it...
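A sketch of the range in question (helper name invented): without nsw, abs(INT_MIN) wraps back to INT_MIN, so the result still fits the unsigned range [0, SIGNED_MIN].

```cpp
#include "llvm/IR/ConstantRange.h"
using namespace llvm;

// Unsigned range of abs(x) when the negation is NOT known nsw:
// [0, SIGNED_MIN] inclusive, i.e. the half-open range [0, SIGNED_MIN + 1).
static ConstantRange absRangeWithoutNSW(unsigned BitWidth) {
  return ConstantRange(APInt::getNullValue(BitWidth),
                       APInt::getSignedMinValue(BitWidth) + 1);
}
// With nsw, abs(INT_MIN) is poison, so the tighter [0, SIGNED_MAX] applies.
```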

Differential Revision: https://reviews.llvm.org/D59563

llvm-svn: 356586
2019-03-20 18:16:02 +00:00
Nikita Popov 208381953b [ValueTracking] Use computeConstantRange() for unsigned add/sub overflow
Improve computeOverflowForUnsignedAdd/Sub in ValueTracking by
intersecting the computeConstantRange() result into the ConstantRange
created from computeKnownBits(). This allows us to detect some
additional never/always overflows conditions that can't be determined
from known bits.

This revision also adds basic handling for constants to
computeConstantRange(). Non-splat vectors will be handled in a followup.

The signed case will also be handled in a followup, as it needs some
more groundwork.
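A hedged sketch of the combination described above; the exact signatures of computeConstantRange() and the ConstantRange overflow helpers are assumed from the surrounding patches.

```cpp
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/ConstantRange.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// For an unsigned add LHS + RHS: known bits give one ConstantRange,
// computeConstantRange() another, and the overflow check runs on their
// intersection.
static ConstantRange::OverflowResult
unsignedAddOverflowSketch(const Value *LHS, const Value *RHS,
                          const KnownBits &LHSKnown,
                          const KnownBits &RHSKnown) {
  ConstantRange LHSRange =
      ConstantRange::fromKnownBits(LHSKnown, /*IsSigned=*/false)
          .intersectWith(computeConstantRange(LHS));
  ConstantRange RHSRange =
      ConstantRange::fromKnownBits(RHSKnown, /*IsSigned=*/false)
          .intersectWith(computeConstantRange(RHS));
  return LHSRange.unsignedAddMayOverflow(RHSRange);
}
```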

Differential Revision: https://reviews.llvm.org/D59386

llvm-svn: 356489
2019-03-19 17:53:56 +00:00
Simon Pilgrim a56f2822d0 [SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect
These changes are related to PR37743 and include:

    SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.

    Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.

    Add promoting the integer ABS node in the LegalizeIntegerType.

    Expand-based legalization of integer result for the ABS nodes.

    Expand-based legalization of ABS vector operations.

    Add some integer abs testcases for different typesizes for Thumb arch

    Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to:
        tmp = (SRA, Hi, 31)
        Lo = (UADDO tmp, Lo)
        Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
        Lo = (XOR tmp, Lo)

    The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern:
        (ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).

    Change integer abs testcases for codegen with the ABS node support for AArch64.
        Indicate that the ABS is legal for the i64 type when the NEON is supported.
        Change the integer abs testcases to show changing of codegen.

    Add combine and legalization of ABS nodes for Thumb arch.

    Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.

For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743

Patch by: @ikulagin (Ivan Kulagin)

Differential Revision: https://reviews.llvm.org/D49837

llvm-svn: 356468
2019-03-19 16:24:55 +00:00
Simon Pilgrim 8ee477a2ab [InstSimplify] SimplifyICmpInst - icmp eq/ne %X, undef -> undef
As discussed on PR41125 and D59363, we have a mismatch between icmp eq/ne cases with an undef operand:

- When the other operand is constant we fold to undef (handled in ConstantFoldCompareInstruction).
- When the other operand is non-constant we fold to a bool constant based on isTrueWhenEqual (handled in SimplifyICmpInst).

Neither is really wrong, but this patch changes the logic in SimplifyICmpInst to consistently fold to undef.

The NewGVN test change is annoying (as with most heavily reduced tests) but AFAICT I have kept the purpose of the test based on rL291968.

Differential Revision: https://reviews.llvm.org/D59541

llvm-svn: 356456
2019-03-19 14:08:23 +00:00
Nikita Popov 3e9770d2dc Revert "[ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()"
This reverts commit 106f0cdefb.

This change impacts the AMDGPU smed3.ll and umed3.ll codegen tests.

llvm-svn: 356424
2019-03-18 22:26:27 +00:00
Nikita Popov 106f0cdefb [ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This was suggested by spatel as an alternative
approach to D59378. I've also added the infinite looping test from
that revision here.

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 356415
2019-03-18 21:35:19 +00:00
Nikita Popov f89343bc47 [ValueTracking][InstSimplify] Move abs handling into computeConstantRange(); NFC
This is preparation for D59506. The InstructionSimplify abs handling
is moved into computeConstantRange(), which is the general place for
such calculations. This is NFC and doesn't affect the existing tests
in test/Transforms/InstSimplify/icmp-abs-nabs.ll.

Differential Revision: https://reviews.llvm.org/D59511

llvm-svn: 356409
2019-03-18 21:20:03 +00:00
Warren Ristow ad7d0ded2e [SCEV] Guard movement of insertion point for loop-invariants
This reinstates r347934, along with a tweak to address a problem with
PHI node ordering that that commit created (or exposed). (That commit
was reverted at r348426, due to the PHI node issue.)

Original commit message:

r320789 suppressed moving the insertion point of SCEV expressions with
div/rem operations to the loop header in non-loop-invariant situations.
This, and similar, hoisting is also unsafe in the loop-invariant case,
since there may be a guard against a zero denominator. This is an
adjustment to the fix of r320789 to suppress the movement even in the
loop-invariant case.

This fixes PR30806.

Differential Revision: https://reviews.llvm.org/D57428

llvm-svn: 356392
2019-03-18 18:52:35 +00:00
Nikita Popov 322e2dbee1 [ValueTracking] Use ConstantRange overflow check for signed add; NFC
This is the same change as rL356290, but for signed add. It replaces
the existing ripple logic with the overflow logic in ConstantRange.

This is NFC in that it should return NeverOverflow in exactly the
same cases as the previous implementation. However, it does make
computeOverflowForSignedAdd() more powerful by now also determining
AlwaysOverflows conditions. As none of its consumers handle this yet,
this has no impact on optimization. Making use of AlwaysOverflows
in with.overflow folding will be handled as a followup.

Differential Revision: https://reviews.llvm.org/D59450

llvm-svn: 356345
2019-03-17 21:25:26 +00:00
Nikita Popov ef2d979943 [ConstantRange] Add fromKnownBits() method
Following the suggestion in D59450, I'm moving the code for constructing
a ConstantRange from KnownBits out of ValueTracking, which also allows us
to test this code independently.

I'm adding this method to ConstantRange rather than KnownBits (which
would have been a bit nicer API wise) to avoid creating a dependency
from Support to IR, where ConstantRange lives.

Differential Revision: https://reviews.llvm.org/D59475

llvm-svn: 356339
2019-03-17 20:24:02 +00:00
Nikita Popov 614b1bea97 [ValueTracking] Use ConstantRange overflow checks for unsigned add/sub; NFC
Use the methods introduced in rL356276 to implement the
computeOverflowForUnsigned(Add|Sub) functions in ValueTracking, by
converting the KnownBits into a ConstantRange.

This is NFC: The existing KnownBits based implementation uses the same
logic as the ConstantRange based one. This is not the case for the
signed equivalents, so I'm only changing unsigned here.

This is in preparation for D59386, which will also intersect the
computeConstantRange() result into the range determined from KnownBits.

llvm-svn: 356290
2019-03-15 18:37:45 +00:00
Teresa Johnson 70ec64cb72 [ThinLTO] Restructure AliasSummary to contain ValueInfo of Aliasee
Summary:
The AliasSummary previously contained the AliaseeGUID, which was only
populated when reading the summary from bitcode. This patch changes it
to instead hold the ValueInfo of the aliasee, and always populates it.
This enables more efficient access to the ValueInfo (specifically in the
recent patch r352438 which needed to perform an index hash table lookup
using the aliasee GUID).

As noted in the comments in AliasSummary, we no longer technically need
to keep a pointer to the corresponding aliasee summary, since it could
be obtained by walking the list of summaries on the ValueInfo looking
for the summary in the same module. However, I am concerned that this
would be inefficient when walking through the index during the thin
link for various analyses. That can be reevaluated in the future.

By always populating this new field, we can remove the guard and special
handling for a 0 aliasee GUID when dumping the dot graph of the summary.

An additional improvement in this patch is when reading the summaries
from LLVM assembly we now set the AliaseeSummary field to the aliasee
summary in that same module, which makes it consistent with the behavior
when reading the summary from bitcode.

Reviewers: pcc, mehdi_amini

Subscribers: inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D57470

llvm-svn: 356268
2019-03-15 15:11:38 +00:00
Sanjay Patel de1d5d3675 [InstCombine] canonicalize funnel shift constant shift amount to be modulo bitwidth
The shift argument is defined to be modulo the bitwidth, so if that argument
is a constant, we can always reduce the constant to its minimal form to allow
better CSE and other follow-on transforms.

We need to be careful to ignore constant expressions here, or we will likely
infinite loop. I'm adding a general vector constant query for that case.
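A sketch of the reduction (illustrative; the real patch also handles vector splats and skips constant expressions):

```cpp
#include "llvm/ADT/APInt.h"
using namespace llvm;

// The fshl/fshr shift amount is defined modulo the bit width, so a constant
// amount can always be canonicalized into [0, BitWidth), e.g.
//   fshl i32 %x, %y, 33  ==>  fshl i32 %x, %y, 1
static APInt canonicalFunnelShiftAmount(const APInt &ShAmt, unsigned BitWidth) {
  return ShAmt.urem(APInt(ShAmt.getBitWidth(), BitWidth));
}
```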

Differential Revision: https://reviews.llvm.org/D59374

llvm-svn: 356192
2019-03-14 19:22:08 +00:00
Alina Sbirlea 2d7458a351 [MemorySSA] Remove redundant walker assignment [NFC].
Subscribers: llvm-commits
llvm-svn: 356189
2019-03-14 18:45:17 +00:00
Teresa Johnson 4ab0a9f0a4 [SCEV] Use depth limit for trunc analysis
Summary:
This fixes an extremely long compile time caused by recursive analysis
of truncs, which were not previously subject to any depth limits unlike
some of the other ops. I decided to use the same control used for
sext/zext, since the routines analyzing these are sometimes mutually
recursive with the trunc analysis.

Reviewers: mkazantsev, sanjoy

Subscribers: sanjoy, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58994

llvm-svn: 355949
2019-03-12 18:28:05 +00:00
Sjoerd Meijer 31ff647c1d [TTI] Enable analysis of clib functions in getIntrinsicCosts. NFCI.
This is addressing the issue that we're not modeling the cost of clib functions
in TTI::getIntrinsicCosts and thus we're basically addressing this fixme:
    
// FIXME: This is wrong for libc intrinsics.

To enable analysis of clib functions, we not only need an intrinsic ID and
formal arguments, but also the actual user of that function so that we can e.g.
look at alignment and values of arguments. So, this is the initial plumbing to
pass the user of an intrinsic on to getCallCosts, which queries
getIntrinsicCosts.

Differential Revision: https://reviews.llvm.org/D59014

llvm-svn: 355901
2019-03-12 09:48:02 +00:00
Sanjoy Das 3f5ce18658 Reland "Relax constraints for reduction vectorization"
Change from original commit: move test (that uses an X86 triple) into the X86
subdirectory.

Original description:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355889
2019-03-12 01:31:44 +00:00
Sanjoy Das 2136a5bc49 Revert "Relax constraints for reduction vectorization"
This reverts commit r355868.  Breaks hexagon.

llvm-svn: 355873
2019-03-11 22:37:31 +00:00
Sanjoy Das 93f8cc186a Relax constraints for reduction vectorization
Summary:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355868
2019-03-11 21:36:41 +00:00
Nikita Popov 490975979b [ValueTracking] Move constant range computation into ValueTracking; NFC
InstructionSimplify currently has some code to determine the constant
range of integer instructions for some simple cases. It is used to
simplify icmps.

This change moves the relevant code into ValueTracking as
llvm::computeConstantRange(), so it can also be reused for other
purposes.

In particular this is with the optimization of overflow checks in
mind (ref D59071), where constant ranges cover some cases that
known bits don't.

llvm-svn: 355781
2019-03-09 21:17:42 +00:00
Michael Kruse 65c5821e3f [RegionPass] Fix forgotten "!".
Commit r355068 "Fix IR/Analysis layering issue with OptBisect" uses the
template

   return Gate.isEnabled() && !Gate.shouldRunPass(this, getDescription(...));

for all pass kinds. For the RegionPass, it left out the not operator,
causing region passes to be skipped as soon as a pass gate is used.

llvm-svn: 355733
2019-03-08 21:03:06 +00:00
George Burgess IV 4ea679f1f4 [CFLAnders] Fix typo in comment; NFC
Patch by Enna1!

Differential Revision: https://reviews.llvm.org/D58756

llvm-svn: 355715
2019-03-08 19:28:55 +00:00
Clement Courbet 8e16d73346 [SelectionDAG] Allow the user to specify a memeq function.
Summary:
Right now, when we encounter a string equality check,
e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a
small compile-time constant, and otherwise fall back on calling `memcmp()`.

This is sub-optimal because memcmp has to compute much more than
equality.

This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms
that support `bcmp`.

`bcmp` can be made much more efficient than `memcmp` because equality
compare is trivially parallel while lexicographic ordering has a chain
dependency.
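Conceptually (the transform happens on IR, not source, but this is the shape of it):

```cpp
#include <cstddef>
#include <cstring>   // memcmp
#include <strings.h> // bcmp (POSIX)

// Before: memcmp computes an ordering, which is more work than equality needs.
static bool equalViaMemcmp(const void *A, const void *B, size_t N) {
  return memcmp(A, B, N) == 0;
}

// After (on platforms providing bcmp): equality only, which the library can
// implement without the lexicographic chain dependency.
static bool equalViaBcmp(const void *A, const void *B, size_t N) {
  return bcmp(A, B, N) == 0;
}
```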

Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits

Differential Revision: https://reviews.llvm.org/D56593

llvm-svn: 355672
2019-03-08 09:07:45 +00:00
Fangrui Song 9ade843ccb [IDF] Delete a redundant J-edge test
In the DJ-graph based computation of iterated dominance frontiers,
SuccNode->getIDom() == Node is one of the tests to check if (Node,Succ)
is a J-edge. If it is true, since Node is dominated by Root,

  SuccLevel = level(Node)+1 > RootLevel

which means the next test SuccLevel > RootLevel will also be true. Thus
the check is redundant and can be deleted, as it also involves one
indirection and provides no speed-up.

llvm-svn: 355589
2019-03-07 11:42:59 +00:00
David Green 4511f3fa86 [SCEV] Ensure that isHighCostExpansion takes into account what is being divided
A SCEV is not low-cost just because you can divide it by a power of 2. We need to also
check what we are dividing to make sure it too is not a high-cost expansion. This helps
to not expand the exit value of certain loops, helping not to bloat the code.

The change in no-iv-rewrite.ll is reverting back to what it was testing before rL194116,
and looks a lot like the other tests in replace-loop-exit-folds.ll.

Differential Revision: https://reviews.llvm.org/D58435

llvm-svn: 355393
2019-03-05 12:12:18 +00:00
Sanjoy Das 719e78631d PHI nodes are not `FPMathOperator`s
Reviewers: chandlerc, arsenm

Reviewed By: arsenm

Subscribers: wdng, arsenm, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58887

llvm-svn: 355362
2019-03-05 01:15:08 +00:00
Sanjay Patel 2a70703770 [ValueTracking] do not try to peek through bitcasts in computeKnownBitsFromAssume()
There are no tests for this case, and I'm not sure how it could ever work,
so I'm just removing this option from the matcher. This should fix PR40940:
https://bugs.llvm.org/show_bug.cgi?id=40940

llvm-svn: 355292
2019-03-03 18:59:33 +00:00
Fangrui Song 5fa53d1593 [DemandedBits] Remove some redundancy in the work list
InputIsKnownDead check is shared by all operands. Compute it once.

For non-integer instructions, use Visited.insert(I).second to replace a
find() and an insert().
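For reference, the idiom being adopted (a generic sketch, not the patch's exact code):

```cpp
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

static void visitOnceSketch(SmallPtrSetImpl<Instruction *> &Visited,
                            Instruction *I) {
  // Before: two hash lookups.
  //   if (Visited.find(I) == Visited.end()) { Visited.insert(I); /* work */ }
  // After: one lookup; insert() reports whether the element was new.
  if (Visited.insert(I).second) {
    // work(I);
  }
}
```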

llvm-svn: 355290
2019-03-03 14:50:01 +00:00
Fangrui Song 981f216d1d [DemandedBits] Optimize a find()+insert pattern with try_emplace and APInt::operator|=
llvm-svn: 355284
2019-03-03 11:12:57 +00:00
Florian Hahn 98f11a7d75 [SCEV] Handle case where MaxBECount is less precise than ExactBECount for OR.
In some cases, MaxBECount can be less precise than ExactBECount for AND
and OR (the AND case was PR26207). In the OR test case, both ExactBECounts are
undef, but the MaxBECounts are different, so we hit the assertion below. This
patch uses the same solution the AND case already uses.

Assertion failed:
   ((isa<SCEVCouldNotCompute>(ExactNotTaken) || !isa<SCEVCouldNotCompute>(MaxNotTaken))
     && "Exact is not allowed to be less precise than Max"), function ExitLimit

This patch also consolidates test cases for both AND and OR in a single
test case.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13245

Reviewers: sanjoy, efriedma, mkazantsev

Reviewed By: sanjoy

Differential Revision: https://reviews.llvm.org/D58853

llvm-svn: 355259
2019-03-02 02:31:44 +00:00
Florian Hahn 3c7e92b5d6 [SCEV] Remove undef check for SCEVConstant (NFC)
The value stored in SCEVConstant is of type ConstantInt*, which can
never be UndefValue. So we should never hit that code.

Reviewers: mkazantsev, sanjoy

Reviewed By: sanjoy

Differential Revision: https://reviews.llvm.org/D58851

llvm-svn: 355257
2019-03-02 01:57:28 +00:00
Nikita Popov ed3ca9272f [ValueTracking] Known bits support for unsigned saturating add/sub
We have two sources of known bits:

1. For adds, leading ones of either operand are preserved. For subs,
leading zeros of LHS and leading ones of RHS become leading zeros in
the result.

2. The saturating math is a select between add/sub and an all-ones/
zero value. As such we can carry out the add/sub known bits
calculation, and only preserve the known one/zero bits respectively.

Differential Revision: https://reviews.llvm.org/D58329

llvm-svn: 355223
2019-03-01 20:07:04 +00:00
Jonas Hahnfeld e071cd86df Hide two unused debugging methods, NFCI.
GCC correctly moans that PlainCFGBuilder::isExternalDef(llvm::Value*) and
StackSafetyDataFlowAnalysis::verifyFixedPoint() are defined but not used
in Release builds. Hide them behind 'ifndef NDEBUG'.

llvm-svn: 355205
2019-03-01 17:15:21 +00:00
Rong Xu a6ff69f6dd [PGO] Context sensitive PGO (part 2)
Part 2 of CSPGO changes (mostly related to ProfileSummary).
Note that I use a default parameter in setProfileSummary() and getSummary().
This is to break the dependency in clang. I will make the parameter explicit
after changing clang in a separate patch.

Differential Revision: https://reviews.llvm.org/D54175

llvm-svn: 355131
2019-02-28 19:55:07 +00:00
Nikita Popov af2b0bef43 [ValueTracking] More accurate unsigned sub overflow detection
Second part of D58593.

Compute precise overflow conditions based on all known bits, rather
than just the sign bits. Unsigned a - b overflows iff a < b, and we
can determine whether this always/never happens based on the minimal
and maximal values achievable for a and b subject to the known bits
constraint.
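A sketch of that min/max reasoning using the KnownBits members directly (the enum and helper name are invented for the example):

```cpp
#include "llvm/Support/KnownBits.h"
using namespace llvm;

enum class SubOverflow { Always, Never, Maybe };

// a - b overflows (unsigned) iff a < b. Under a known-bits constraint, the
// smallest value a can take is AKnown.One (all unknown bits 0) and the
// largest is ~AKnown.Zero (all unknown bits 1); likewise for b.
static SubOverflow unsignedSubOverflowSketch(const KnownBits &AKnown,
                                             const KnownBits &BKnown) {
  APInt AMin = AKnown.One, AMax = ~AKnown.Zero;
  APInt BMin = BKnown.One, BMax = ~BKnown.Zero;
  if (AMax.ult(BMin))
    return SubOverflow::Always; // even the largest a is below the smallest b
  if (AMin.uge(BMax))
    return SubOverflow::Never;  // even the smallest a is >= the largest b
  return SubOverflow::Maybe;
}
```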

llvm-svn: 355109
2019-02-28 18:04:20 +00:00
Bjorn Pettersson d30f308a9f Add support for computing "zext of value" in KnownBits. NFCI
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly claimed
that the operation is equivalent to zero-extending the
value we're tracking. That has not been true; instead,
the user has been forced to explicitly set the extended
bits as known zero afterwards.

This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.
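A usage sketch; the parameter name below is invented, but the shape follows the description of the new second argument.

```cpp
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// Before: callers had to zext and then mark the new high bits known-zero by
// hand. After: the second argument says whether the extended bits are known
// zero (true for a real zext of the tracked value) or unknown (e.g. anyext).
static KnownBits zextKnownBits(const KnownBits &Known, unsigned NewWidth) {
  return Known.zext(NewWidth, /*ExtendedBitsAreKnownZero=*/true);
}
```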

Reviewers: craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58650

llvm-svn: 355099
2019-02-28 15:45:29 +00:00
Nikita Popov 6c57395fb4 [ValueTracking] More accurate unsigned add overflow detection
Part of D58593.

Compute precise overflow conditions based on all known bits, rather
than just the sign bits. Unsigned a + b overflows iff a > ~b, and we
can determine whether this always/never happens based on the minimal
and maximal values achievable for a and ~b subject to the known bits
constraint.

llvm-svn: 355072
2019-02-28 08:11:20 +00:00
Richard Trieu b37a70f40e Fix IR/Analysis layering issue with OptBisect
OptBisect is in IR due to LLVMContext using it.  However, it uses IR units from
Analysis as well.  This change moves getDescription functions from OptBisect
to their respective IR units.  Generating names for IR units will now be up
to the callers, keeping the Analysis IR units in Analysis.  To prevent
unnecessary string generation, an isEnabled function is added so that callers know
when the description needs to be generated.

Differential Revision: https://reviews.llvm.org/D58406

llvm-svn: 355068
2019-02-28 04:00:55 +00:00
Alina Sbirlea fcfa7c5f92 [MemorySSA] Make insertDef insert corresponding phi nodes.
Summary:
The original assumption for the insertDef method was that it would not
materialize Defs out of nowhere, and hence would not insert phis needed
after inserting a Def.

However, when cloning an instruction (use case used in LICM), we do
materialize Defs "out of no-where". If the block receiving a Def has at
least one other Def, then no processing is needed. If the block just
received its first Def, we must check where Phi placement is needed.
The only new usage of insertDef is in LICM, hence the trigger for the bug.

But the original goal of the method also fails to apply for the move()
method. If we move a Def from the entry point of a diamond to either the
left or right blocks, then the merge block must add a phi.
While this use case does not currently occur, or may be viewed as an
incorrect transformation, MSSA must behave correctly given the scenario.

Resolves PR40749 and PR40754.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58652

llvm-svn: 355040
2019-02-27 22:20:22 +00:00
Sanjay Patel 9dada83d6c [InstSimplify] remove zero-shift-guard fold for general funnel shift
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-February/130491.html

We can't remove the compare+select in the general case because
we are treating funnel shift like a standard instruction (as
opposed to a special instruction like select/phi).

That means that if one of the operands of the funnel shift is
poison, the result is poison regardless of whether we know that
the operand is actually unused based on the instruction's
particular semantics.

The motivating case for this transform is the more specific
rotate op (rather than funnel shift), and we are preserving the
fold for that case because there is no chance of introducing
extra poison when there is no anonymous extra operand to the
funnel shift.

llvm-svn: 354905
2019-02-26 18:26:56 +00:00
Simon Pilgrim a066f1f9e6 [Vectorizer] Add vectorization support for fixed smul/umul intrinsics
This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix it's the third argument.

Differential Revision: https://reviews.llvm.org/D58616

llvm-svn: 354790
2019-02-25 15:42:02 +00:00
Jordan Rupprecht 6387fa2715 [NFC] Fix typos: preceeding -> preceding
llvm-svn: 354715
2019-02-23 01:28:32 +00:00
Chijun Sima 70e97163e0 [DTU] Refine the interface and logic of applyUpdates
Summary:
This patch separates two semantics of `applyUpdates`:
1. User provides an accurate CFG diff and the dominator tree is updated according to the difference of `the number of edge insertions` and `the number of edge deletions` to infer the status of an edge before and after the update.
2. User provides a sequence of hints. Updates mentioned in this sequence might never have happened and may even be duplicated.

Logic changes:

Previously, removing invalid updates was considered a side-effect of deduplication and was not guaranteed to be reliable. To handle the second semantic, `applyUpdates` does validity checking before deduplication, which can cause updates that have already been applied to be submitted again. Then, different calls to `applyUpdates` might cause unintended consequences, for example,
```
DTU(Lazy) and Edge A->B exists.
1. DTU.applyUpdates({{Delete, A, B}, {Insert, A, B}}) // User expects these 2 updates result in a no-op, but {Insert, A, B} is queued
2. Remove A->B
3. DTU.applyUpdates({{Delete, A, B}}) // DTU cancels this update with {Insert, A, B} mentioned above together (Unintended)
```
But by adding the precondition that updates of an edge must be submitted in the same order as the corresponding CFG changes were made, we can infer the initial status of this edge and resolve this issue.

Interface changes:
The second semantic of `applyUpdates` is separated out into `applyUpdatesPermissive`.
These changes enable DTU(Lazy) to use the first semantic if needed, which is quite useful in `transforms/utils`.

Reviewers: kuhar, brzycki, dmgreen, grosser

Reviewed By: brzycki

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58170

llvm-svn: 354669
2019-02-22 13:48:38 +00:00
Sanjay Patel 68171e3cd6 [InstSimplify] use any-zero matcher for fcmp folds
The m_APFloat matcher does not work with anything but strict
splat vector constants, so we could miss these folds and then
trigger an assertion in instcombine:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13201

The previous attempt at this in rL354406 had a logic bug that
actually triggered a regression test failure, but I failed to
notice it the first time.

llvm-svn: 354467
2019-02-20 14:34:00 +00:00
Sanjay Patel 49f97395ab Revert "[InstSimplify] use any-zero matcher for fcmp folds"
This reverts commit 058bb83513.
Forgot to update another test affected by this change.

llvm-svn: 354408
2019-02-20 00:20:38 +00:00
Sanjay Patel 058bb83513 [InstSimplify] use any-zero matcher for fcmp folds
The m_APFloat matcher does not work with anything but strict
splat vector constants, so we could miss these folds and then
trigger an assertion in instcombine:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13201

llvm-svn: 354406
2019-02-20 00:09:50 +00:00
Sam Parker 0b53e8454b [BPI] Look through bitcasts in calcZeroHeuristic
Constant hoisting may have hidden a constant behind a bitcast so that
it isn't folded into its users. However, this prevents BPI from
calculating some of its heuristics that are based upon constant
values. So, I've added a simple helper function to look through these
casts.
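
A hypothetical shape of the problem (illustrative only; the constant and types
are made up): constant hoisting materializes the compared constant as a
same-type bitcast instruction, e.g.

    %const = bitcast i32 65535 to i32
    %cmp = icmp eq i32 %x, %const
    br i1 %cmp, label %rare, label %common

The helper lets BPI see the underlying constant operand of the compare when
evaluating its constant-based heuristics.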

Differential Revision: https://reviews.llvm.org/D58166

llvm-svn: 354119
2019-02-15 11:50:21 +00:00
Nick Desaulniers 6a84cd3b8e Revert "[INLINER] allow inlining of address taken blocks"
This reverts commit 19e95fe611.

llvm-svn: 354082
2019-02-14 23:42:21 +00:00
Nick Desaulniers 19e95fe611 [INLINER] allow inlining of address taken blocks
as long as their uses do not contain calls to functions that capture
the argument (potentially allowing the blockaddress to "escape" the
lifetime of the caller).

TODO:
- add more tests
- fix crash in llvm::updateCGAndAnalysisManagerForFunctionPass when
  invoking Transforms/Inline/blockaddress.ll

llvm-svn: 354079
2019-02-14 23:35:53 +00:00
Max Kazantsev 24383cd7bb Make widenable condition transparent for MemoryWriteTracking
Side effects of widenable condition intrinsic are modelled via
InaccessibleMemOnly, and there is no way to say that it isn't
really writing any memory. This patch teaches MemoryWriteTracking to
ignore this intrinsic.

llvm-svn: 354021
2019-02-14 11:10:29 +00:00
Max Kazantsev b3168a400f Teach isGuaranteedToTransferExecutionToSuccessor about widenable conditions
The widenable condition intrinsic is guaranteed to return a value; notify
the isGuaranteedToTransferExecutionToSuccessor function about it.

llvm-svn: 354020
2019-02-14 11:10:21 +00:00
Max Kazantsev 4a1c02987e [NFC] Simplify code & reduce nest slightly
llvm-svn: 353832
2019-02-12 11:31:46 +00:00
Evandro Menezes f4a369596f [TargetLibraryInfo] Update run time support for Windows
It seems that, since VC19, the `float` C99 math functions are supported for all
targets, unlike the C89 ones.

According to the discussion at https://reviews.llvm.org/D57625.

llvm-svn: 353758
2019-02-11 22:12:01 +00:00
Alina Sbirlea d77edc00a8 [MemorySSA] Remove verifyClobberSanity.
Summary:
This verification may fail after certain transformations due to
BasicAA's fragility. Added a small explanation and a testcase that
triggers the assert in checkClobberSanity (before its removal).
Addresses PR40509.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, llvm-commits, Prazek

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57973

llvm-svn: 353739
2019-02-11 19:51:21 +00:00
Michael Kruse 77a614a6e1 Refactor setAlreadyUnrolled() and setAlreadyVectorized().
Loop::setAlreadyUnrolled() and
LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that
stops the same loop from being transformed multiple times. This patch
merges both implementations.

In doing so we fix 3 potential issues:

 * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.*
   metadata even though it will not be used anymore. This already caused
   problems such as http://llvm.org/PR40546. Change the behavior to the
   one of setAlreadyUnrolled which deletes this loop metadata.

 * setAlreadyUnrolled() used to create a new LoopID by calling
   MDNode::get with nullptr as the first operand, then replacing it by
   the returned references using replaceOperandWith. It is possible
   that MDNode::get would instead return an existing node (due to
   de-duplication) that then gets modified. To avoid this, use a fresh
   TempMDNode that does not get uniqued with anything else before
   replacing it with replaceOperandWith.

 * LoopVectorizeHints::matchesHintMetadataName() only compares the
   suffix of the attribute to set the new value for. That is, when
   called with "enable", would erase attributes such as
   "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and
   "llvm.loop.distribute.enable" instead of the one to replace.
   Fortunately, the function was only called with "isvectorized".

Differential Revision: https://reviews.llvm.org/D57566

llvm-svn: 353738
2019-02-11 19:45:44 +00:00
Evandro Menezes 4b86c474ff [TargetLibraryInfo] Update run time support for Windows
It seems that the run time for Windows has changed and supports more math
functions than it used to, especially on AArch64, ARM, and AMD64.

Fixes PR40541.

Differential revision: https://reviews.llvm.org/D57625

llvm-svn: 353733
2019-02-11 19:02:28 +00:00
Chandler Carruth 9beadff6a5 Move CFLGraph and the AA summary code over to the new `CallBase`
instruction base class rather than the `CallSite` wrapper.

llvm-svn: 353676
2019-02-11 09:25:41 +00:00
Chandler Carruth 2d2a4359a2 Remove `CallSite` from the CodeMetrics analysis, moving it to the new
`CallBase` and simpler APIs therein.

llvm-svn: 353673
2019-02-11 09:03:32 +00:00
Chandler Carruth dac20a8254 [CallSite removal] Port InstSimplify over to use `CallBase` both in its
interface and implementation.

Port code with: `cast<CallBase>(CS.getInstruction())`.

llvm-svn: 353662
2019-02-11 07:54:10 +00:00
Chandler Carruth 751d95fb9b [CallSite removal] Migrate ConstantFolding APIs and implementation to
`CallBase`.

Users have been updated. You can see how to update any out-of-tree
usages: pass `cast<CallBase>(CS.getInstruction())`.

llvm-svn: 353661
2019-02-11 07:51:44 +00:00
Craig Topper 784929d045 Implementation of asm-goto support in LLVM
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc's, as used by the Linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
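
A hand-written sketch of what the new instruction looks like (illustrative
only; see the LangRef and tests for the authoritative syntax):

    define void @foo() {
    entry:
      callbr void asm sideeffect "", "X"(i8* blockaddress(@foo, %err))
              to label %cont [label %err]

    cont:
      ret void

    err:
      ret void
    }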

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563
2019-02-08 20:48:56 +00:00
Sergey Dmitriev 807960e6ef [CodeExtractor] Update function's assumption cache after extracting blocks from it
Summary: Assumption cache's self-updating mechanism does not correctly handle the case when blocks are extracted from the function by the CodeExtractor. As a result, the function's assumption cache may have stale references to the llvm.assume calls that were moved to the outlined function. This patch fixes this problem by removing extracted llvm.assume calls from the function’s assumption cache.

Reviewers: hfinkel, vsk, fhahn, davidxl, sanjoy

Reviewed By: hfinkel, vsk

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57215

llvm-svn: 353500
2019-02-08 06:55:18 +00:00
Sam Parker 67756c09f2 [LSR] Generate cross iteration indexes
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve performance for DSP-like loops.

Differential Revision: https://reviews.llvm.org/D55373

llvm-svn: 353403
2019-02-07 13:32:54 +00:00
Alina Sbirlea 6cba96ed52 [LICM/MSSA] Add promotion to scalars by building an AliasSetTracker with MemorySSA.
Summary:
Experimentally we found that promotion to scalars carries less benefits
than sinking and hoisting in LICM. When using MemorySSA, we build an
AliasSetTracker on demand in order to reuse the current infrastructure.
We only build it if fewer than AccessCapForMSSAPromotion accesses exist in the
loop, a cap that is by default set to 250. This value ensures there are
no runtime regressions, and there are small compile time gains for
pathological cases. A much lower value (20) was found to yield a single
regression in the llvm-test-suite and much higher benefits for compile
times. Conservatively we set the current cap to a high value, but we will
explore lowering it when MemorySSA is enabled by default.

Reviewers: sanjoy, chandlerc

Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D56625

llvm-svn: 353339
2019-02-06 20:25:17 +00:00
Alina Sbirlea 910c6bef3e [AliasSetTracker] Pass MustAlias to addPointer more often.
Summary:
Pass the alias info to addPointer when available. Will save an alias()
call for must sets when adding a known Must or May alias.
[Part of a series of cleanup patches]

Reviewers: reames, mkazantsev

Subscribers: sanjoy, jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D56613

llvm-svn: 353335
2019-02-06 19:55:12 +00:00
Philip Reames 00ae46ba52 [AliasSetTracker] Minor style tweak to avoid a variable w/two distinct live ranges [NFC]
llvm-svn: 353267
2019-02-06 03:46:40 +00:00
Richard Trieu 5f436fc57a Move DomTreeUpdater from IR to Analysis
DomTreeUpdater depends on headers from Analysis, but is in IR.  This is a
layering violation since Analysis depends on IR.  Relocate this code from IR
to Analysis to fix the layering violation.

llvm-svn: 353265
2019-02-06 02:52:52 +00:00
Alina Sbirlea b9c1bc6d3c [BasicAA] Cache nonEscapingLocalObjects for alias() calls.
Summary:
Use a small cache for Values tested by nonEscapingLocalObject().
Since the calls to PointerMayBeCaptured are fairly expensive, this saves
a good amount of compile time for anything relying heavily on
BasicAA.alias() calls.

This uses the same approach as the AliasCache, i.e. the cache is reset
after each alias() call. The cache is not used or updated by modRefInfo
calls since it's harder to know when to reset the cache.

Testcases that show improvements with this patch are too large to
include. Example compile time improvement: 7s to 6s.

Reviewers: chandlerc, sunfish

Subscribers: sanjoy, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57627

llvm-svn: 353245
2019-02-05 23:52:08 +00:00
Evandro Menezes e5bb58b115 [TargetLibraryInfo] Regroup run time functions for Windows (NFC)
Regroup supported and unsupported functions by precision and C standard.

llvm-svn: 353213
2019-02-05 20:24:21 +00:00
Hiroshi Inoue 02a2bb2f54 [NFC] fix trivial typos in comments
llvm-svn: 353147
2019-02-05 08:30:48 +00:00
Evandro Menezes 98f356cd74 Revert "[PATCH] [TargetLibraryInfo] Update run time support for Windows"
This reverts accidental commit ff5527718d.

llvm-svn: 353118
2019-02-04 23:34:50 +00:00
Evandro Menezes ff5527718d [PATCH] [TargetLibraryInfo] Update run time support for Windows
It seems that the run time for Windows has changed and supports more math
functions than before.  Since LLVM requires at least VS2015, I assume that
this is the run time that would be redistributed with programs built with
Clang.  Thus, I based this update on the header file `math.h` that
accompanies it.

This patch addresses the PR40541.  Unfortunately, I have no access to a
Windows development environment to validate it.

llvm-svn: 353114
2019-02-04 23:29:41 +00:00
David Callahan fd3e7a9320 Adjust cardinality of internal inliner thresholds
Summary:
While compiling openJDK11 (and other workloads), some makefiles would pass both CFLAGS and LDFLAGS at the link step, resulting in duplicate options on the command line when one is using LTO and trying to influence the inliner. Most of the internal flags are ZeroOrMore; this diff changes the remaining ones.

Reviewers: david2050, twoh, modocache

Reviewed By: twoh

Subscribers: mehdi_amini, dexonsmith, eraman, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57537

Patch by: Abdoul-Kader Keita

llvm-svn: 353071
2019-02-04 18:46:25 +00:00
Max Kazantsev 437ee05885 [SCEV] Do not bother creating separate SCEVUnknown for unreachable nodes
Currently, SCEV creates SCEVUnknown for every node of unreachable code. If we
have a huge amount of such code, we will be littering SE with these nodes. We could
just state that they all are undef and save some memory.

Differential Revision: https://reviews.llvm.org/D57567
Reviewed By: sanjoy

llvm-svn: 353017
2019-02-04 05:04:19 +00:00
Philip Pfaffe 9438585fe4 [DA][NewPM] Handle transitive dependencies in the new-pm version of DA
Summary:
The analysis result of DA caches pointers to AA, SCEV, and LI, but it
never checks for their invalidation. Fix that.

Reviewers: chandlerc, dmgreen, bogner

Reviewed By: dmgreen

Subscribers: hiraditya, bollu, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D56381

llvm-svn: 352986
2019-02-03 12:25:41 +00:00
Dmitry Venikov aaa709f2ec [InstSimplify] Missed optimization in math expression: log10(pow(10.0,x)) == x, log2(pow(2.0,x)) == x
Summary: This patch enables folding the following expressions under the -ffast-math flag: log10(pow(10.0,x)) -> x, log2(pow(2.0,x)) -> x
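
A rough sketch of the kind of IR this now folds (illustrative; the exact
fast-math flag requirements follow the implementation, not this sketch):

    %p = call fast double @llvm.pow.f64(double 1.000000e+01, double %x)
    %r = call fast double @llvm.log10.f64(double %p)
    ; %r simplifies to %x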

Reviewers: hfinkel, spatel, efriedma, craig.topper, zvi, majnemer, lebedev.ri

Reviewed By: spatel, lebedev.ri

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D41940

llvm-svn: 352981
2019-02-03 03:48:30 +00:00
Yevgeny Rouban 15b17d0a7c Provide reason messages for unviable inlining
InlineCost's isInlineViable() is changed to return InlineResult
instead of bool. This provides messages for failure reasons and
allows to get more specific messages for cases where callsites
are not viable for inlining.

Reviewed By: xbolva00, anemet

Differential Revision: https://reviews.llvm.org/D57089

llvm-svn: 352849
2019-02-01 10:44:43 +00:00
Alina Sbirlea 240a90a57e [MemorySSA] Extend removeMemoryAccess API to optimize MemoryPhis.
Summary:
EarlyCSE needs to optimize MemoryPhis after an access is removed and has
special handling for it. This should be handled by MemorySSA instead.
The default remains that MemoryPhis are *not* optimized after an access
is removed.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D57199

llvm-svn: 352787
2019-01-31 20:13:47 +00:00
Yevgeny Rouban ae29857d64 Test commit. NFCI.
llvm-svn: 352738
2019-01-31 08:49:20 +00:00
Max Kazantsev b37419ef66 [SCEV] Prohibit SCEV transformations for huge SCEVs
Currently SCEV attempts to limit transformations so that they do not work with
big SCEVs (that may take almost infinite compile time). But for this, it uses heuristics
such as recursion depth and number of operands, which do not give us a guarantee
that we don't actually have big SCEVs. This situation is still possible, though it is not
likely to happen. However, the bug PR33494 showed a bunch of simple corner case
tests where we still produce huge SCEVs, even not reaching big recursion depth etc.

This patch introduces a concept of 'huge' SCEVs. A SCEV is huge if its expression
size (introduced in D35989) exceeds some threshold value. We prohibit optimizing
transformations if any of the SCEVs we are dealing with is huge. This gives us a reliable
check that we don't spend too much time working with them.

As the next step, we can possibly get rid of old limiting mechanisms, such as recursion
depth thresholds.

Differential Revision: https://reviews.llvm.org/D35990
Reviewed By: reames

llvm-svn: 352728
2019-01-31 06:19:25 +00:00
Erik Pilkington 600e9deacf Add a 'dynamic' parameter to the objectsize intrinsic
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.
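
A hedged sketch of the extended call (the comment names for the i1 arguments
are descriptive, not official):

    ; min = false, nullunknown = true, dynamic = true (the new parameter)
    %size = call i64 @llvm.objectsize.i64.p0i8(i8* %ptr, i1 false, i1 true, i1 true)

With the last argument set to true, the intrinsic may be lowered to a run-time
size computation instead of a compile-time constant.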

rdar://32212419

Differential revision: https://reviews.llvm.org/D56761

llvm-svn: 352664
2019-01-30 20:34:35 +00:00
Hiroshi Inoue c437f310a5 [NFC] fix trivial typos in comments
llvm-svn: 352602
2019-01-30 05:26:31 +00:00
Max Kazantsev 23e642248d [NFC] Use ArrayRef instead of SmallVectorImpl where possible
llvm-svn: 352466
2019-01-29 09:39:15 +00:00
Max Kazantsev 468ad52213 [SCEV] Take correct loop in AddRec simplification. PR40420
The code of AddRec simplification is using wrong loop when it creates a new
AddRecExpr. It should be using AddRecLoop which we have saved and against which
all gate checks are made, and not calling AddRec->getLoop() over and over
again because AddRec may change and become an AddRecurrency from outer loop
during the transform iterations.

Considering this change trivial, committing for post-commit review.

llvm-svn: 352451
2019-01-29 05:37:59 +00:00
Teresa Johnson 2f616e479b [ThinLTO] Add option to dump per-module summary dot graph
Summary:
I found that there currently isn't a way to invoke exportToDot from
the command line for a per-module summary index, and therefore no
testing of that case. Add an internal option and use it to test dumping
of per module summary indexes.

In particular, I am looking at fixing the limitation that causes the
aliasee GUID in the per-module summary to be 0, and want to be able to
test that change.

Reviewers: evgeny777

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D57206

llvm-svn: 352441
2019-01-28 23:43:26 +00:00
Alina Sbirlea 8e1d65771a [AliasSetTracker] Cleanup more comments. [NFCI]
llvm-svn: 352416
2019-01-28 19:38:03 +00:00
Alina Sbirlea d8c829bc22 [AliasSetTracker] Cleanup comments. [NFCI]
llvm-svn: 352406
2019-01-28 19:01:32 +00:00
Alina Sbirlea 3d1d95ca55 [AliasSetTracker] Update signature to aliasesPointer [NFCI].
llvm-svn: 352399
2019-01-28 18:30:05 +00:00
Johannes Doerfert 00102c7d95 [ValueTracking] Look through casts when determining non-nullness
Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the
value of their operand and can therefore be looked through when we
determine non-nullness.

Differential Revision: https://reviews.llvm.org/D54956

llvm-svn: 352293
2019-01-26 23:40:35 +00:00
Eli Friedman 525ef0159d [Analysis] Fix isSafeToLoadUnconditionally handling of volatile.
A volatile operation cannot be used to prove an address points to normal
memory.  (LangRef was recently updated to state it explicitly.)

Differential Revision: https://reviews.llvm.org/D57040

llvm-svn: 352109
2019-01-24 21:31:13 +00:00
Simon Pilgrim 0e08b6f017 Move saturated arithmetic intrinsics to other integer intrinsics. NFCI.
They were in the floating point group.

llvm-svn: 351953
2019-01-23 13:49:10 +00:00
Max Kazantsev ca47f1f72a [NFC] Add function to parse widenable conditional branches
llvm-svn: 351803
2019-01-22 11:21:32 +00:00
Max Kazantsev bd374b27cc [NFC] Add detector for guards expressed as branch by widenable conditions
This patch adds a function to detect guards expressed in explicit control
flow form as branch by `and` with widenable condition intrinsic call:

    %wc = call i1 @llvm.experimental.widenable.condition()
    %guard_cond = and i1 %some_cond, %wc
    br i1 %guard_cond, label %guarded, label %deopt

  deopt:
    <maybe some non-side-effecting instructions>
    deoptimize()

This form can be used as an alternative to the implicit control flow guard
representation expressed by the `experimental_guard` intrinsic.

Differential Revision: https://reviews.llvm.org/D56074
Reviewed By: reames

llvm-svn: 351791
2019-01-22 09:36:22 +00:00
Philip Reames 390c0e2f72 [CVP] Use LVI to constant fold deopt operands
Deopt operands are generally intended to record information about a site in code with minimal perturbation of the surrounding code. Idiomatically, they also tend to appear down rare paths. Putting these together, we have an obvious case for extending CVP w/deopt operand constant folding. Arguably, we should be doing this for all operands on all instructions, but that's definitely a much larger and riskier change.

Differential Revision: https://reviews.llvm.org/D55678

llvm-svn: 351774
2019-01-22 01:34:33 +00:00
Max Kazantsev 85c988388a [SCEV][NFC] Introduces expression sizes estimation
This patch introduces the field `ExpressionSize` in SCEV. This field is
calculated only once on SCEV creation, and it represents the complexity of
this SCEV from an arithmetical point of view (not from the point of view of the
number of actual different SCEV nodes that are used in the expression). Roughly
speaking, it is the number of operand and operation symbols when we print this
SCEV.

A formal definition is following: if SCEV `X` has operands
  `Op1`, `Op2`, ..., `OpN`,
then
  Size(X) = 1 + Size(Op1) + Size(Op2) + ... + Size(OpN).
Size of SCEVConstant and SCEVUnknown is one.
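
For instance (purely illustrative), a SCEV printed as ((4 * %b) + %a) has
Size = 1 + (1 + 1 + 1) + 1 = 5: one for the add, three for the multiply and
its two operands, and one for %a.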

Expression size may be used as a universal way to limit SCEV transformations
for huge SCEVs. Currently, we have a bunch of options that represents various
limits (such as recursion depth limit) that may not make any sense from the
point of view of an LLVM user who is not familiar with SCEV internals, and all
these different options pursue one goal. A more general rule that may
potentially allow us to get rid of this redundancy in options is "do not make
transformations with SCEVs of huge size". It can apply to all SCEV traversals
and transformations that may need to visit a SCEV node more than once, hence
they are prone to combinatorial explosions.

This patch only introduces SCEV sizes calculation as NFC, its utilization will
be introduced in follow-up patches.

Differential Revision: https://reviews.llvm.org/D35989
Reviewed By: reames

llvm-svn: 351725
2019-01-21 06:19:50 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Teresa Johnson 8d86f1ba47 Revert "[ThinLTO] Add summary entries for index-based WPD"
Mistaken commit of something still under review!

This reverts commit r351453.

llvm-svn: 351455
2019-01-17 16:05:04 +00:00
Teresa Johnson 4fcf3b1621 [ThinLTO] Add summary entries for index-based WPD
Summary:
If LTOUnit splitting is disabled, the module summary analysis computes
the summary information necessary to perform single implementation
devirtualization during the thin link with the index and no IR. The
information collected from the regular LTO IR in the current hybrid WPD
algorithm is summarized, including:
1) For vtable definitions, record the function pointers and their offset
within the vtable initializer (subsumes the information collected from
IR by tryFindVirtualCallTargets).
2) A record for each type metadata summarizing the vtable definitions
decorated with that metadata (subsumes the TypeIdentiferMap collected
from IR).

Also added are the necessary bitcode records, and the corresponding
assembly support.

The index-based WPD will be sent as a follow-on.

Depends on D53890.

Reviewers: pcc

Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54815

llvm-svn: 351453
2019-01-17 15:49:03 +00:00
Tom Stellard 3d36e5c3e6 Only promote args when function attributes are compatible
Summary:
Check to make sure that the caller and the callee have compatible
function arguments before promoting arguments.  This uses the same
TargetTransformInfo queries that are used to determine if attributes
are compatible for inlining.

The goal here is to avoid breaking ABI when a called function's ABI
depends on a target feature that is not enabled in the caller.

This is a very conservative fix for PR37358.  Ideally we would have a more
sophisticated check for ABI compatibility rather than checking if the
attributes are compatible for inlining.

Reviewers: echristo, chandlerc, eli.friedman, craig.topper

Reviewed By: echristo, chandlerc

Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits

Differential Revision: https://reviews.llvm.org/D53554

llvm-svn: 351296
2019-01-16 05:15:31 +00:00
Nikita Popov 5f393eb5da Reapply "[DemandedBits] Use SetVector for Worklist"
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions to be recomputed very often. To avoid this, switch
to a SetVector.

Reapplying with a smaller number of inline elements in the
SmallSetVector, to avoid running into the SmallDenseMap issue
described in D56455.

Differential Revision: https://reviews.llvm.org/D56362

llvm-svn: 350997
2019-01-12 09:09:15 +00:00
Nikita Popov 9f6e9cf71b [ConstantFolding] Fold undef for integer intrinsics
This fixes https://bugs.llvm.org/show_bug.cgi?id=40110.

This implements handling of undef operands for integer intrinsics in
ConstantFolding, in particular for the bitcounting intrinsics (ctpop,
cttz, ctlz), the with.overflow intrinsics, the saturating math
intrinsics and the funnel shift intrinsics.

The undef behavior follows what InstSimplify does for the general case
of non-constant operands. For the bitcount intrinsics (where
InstSimplify doesn't do undef handling -- there cannot be a combination
of an undef + non-constant operand) I'm using a 0 result if the intrinsic
is defined for zero and undef otherwise.
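
For instance (illustrative IR, not from the patch's tests):

    %a = call i8 @llvm.ctpop.i8(i8 undef)          ; defined for zero, folds to 0
    %b = call i8 @llvm.cttz.i8(i8 undef, i1 true)  ; undefined for zero, folds to undef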

Differential Revision: https://reviews.llvm.org/D55950

llvm-svn: 350971
2019-01-11 21:18:00 +00:00
Teresa Johnson 290a839891 [LTO] Record whether LTOUnit splitting is enabled in index
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).

The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.

This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.

Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.

Reviewers: pcc

Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D53890

llvm-svn: 350948
2019-01-11 18:31:57 +00:00
Alina Sbirlea e41f4b39e5 [MemorySSA] Disable checkClobberSanity for SkipSelfWalker.
The sanity check will fail for this, since we're exploring getting a clobber
further than the sanity check expects.
Ideally we need to teach the sanity check to differentiate between the
two walkers based on the SkipSelf bool in the query.

llvm-svn: 350895
2019-01-10 21:47:15 +00:00
Easwaran Raman b45994b843 Refactor synthetic profile count computation. NFC.
Summary:
Instead of using two separate callbacks to return the entry count and the
relative block frequency, use a single callback to return callsite
count. This would allow better supporting hybrid mode in the future as
the count of a callsite need not always be derived from the entry count (as in
sample PGO).

Reviewers: davidxl

Subscribers: mehdi_amini, steven_wu, dexonsmith, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D56464

llvm-svn: 350755
2019-01-09 20:10:27 +00:00
Easwaran Raman ed279752f0 [Inliner] Assert that the computed inline threshold is non-negative.
Reviewers: chandlerc

Subscribers: haicheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D56409

llvm-svn: 350751
2019-01-09 19:26:17 +00:00
David Callahan 3ef0f4447d refactor BlockFrequencyInfo::view to take a title parameter
Summary: Allow a non-default title for this debugging aid.

Reviewers: twoh, Kader, modocache

Reviewed By: twoh

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56499

llvm-svn: 350749
2019-01-09 19:12:38 +00:00
Max Kazantsev 4615a505f8 [IPT] Drop cache less eagerly in GVN and LoopSafetyInfo
The current strategy for dropping the `InstructionPrecedenceTracking` cache is to
invalidate the entire basic block whenever we change its contents. In fact,
`InstructionPrecedenceTracking` has 2 internal structures: `OrderedInstructions`,
which needs to be invalidated whenever the contents change, and the map of the
first special instruction in each block. This second map does not need an
update if we add/remove a non-special instruction because it cannot
affect the contents of this map.

This patch changes the API of `InstructionPrecedenceTracking` so that it now
accounts for the reasons for which we invalidate blocks. This should lead
to far fewer recalculations of the map and should save us some compile time
because in practice we don't typically add/remove special instructions.

Differential Revision: https://reviews.llvm.org/D54462
Reviewed By: efriedma

llvm-svn: 350694
2019-01-09 07:28:13 +00:00
Philip Pfaffe efb5ad1c58 [DA][NewPM] Add a printerpass and port the testsuite
The new-pm version of DA is untested. Testing requires a printer, so
add that and use it in the existing DA tests.

Differential Revision: https://reviews.llvm.org/D56386

llvm-svn: 350624
2019-01-08 14:06:58 +00:00
Alina Sbirlea 12bbb4fe8d [MemorySSA] Add SkipSelfWalker.
Summary: Add implementation of SkipSelfWalker.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D56285

llvm-svn: 350561
2019-01-07 19:38:47 +00:00
Alina Sbirlea bc8aa24c2f [MemorySSA] Refactor CachingWalker.
Summary:
Refactor caching walker to make creating a walker that skips the
starting access straightforward.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits, jfb

Differential Revision: https://reviews.llvm.org/D55957

llvm-svn: 350558
2019-01-07 19:22:37 +00:00
Alina Sbirlea f723020456 [MemorySSA] Extend the clobber walker with the option to skip the starting access.
Summary:
The option enables loop transformations to hoist accesses that do not
have clobbers in the loop. If the clobber query skips the starting
access, the result may be outside the loop instead of the header Phi.

Adding the walker that uses this option in a separate patch.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D55944

llvm-svn: 350551
2019-01-07 18:40:27 +00:00
Nikita Popov 8dd19ed3ec Revert "[DemandedBits] Use SetVector for Worklist"
This reverts commit r350547.

Seeing assertion failures on clang tests.

llvm-svn: 350549
2019-01-07 18:15:11 +00:00
Nikita Popov 353d92decb [DemandedBits] Use SetVector for Worklist
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions to be recomputed very often. To avoid this, switch
to a SetVector.

Differential Revision: https://reviews.llvm.org/D56362

llvm-svn: 350547
2019-01-07 18:03:36 +00:00
Chandler Carruth 57578aaf96 [CallSite removal] Port `IndirectCallSiteVisitor` to use `CallBase` and
update client code.

Also rename it to use the more generic term `call` instead of something
that could be confused with a particular type.

Differential Revision: https://reviews.llvm.org/D56183

llvm-svn: 350508
2019-01-07 07:15:51 +00:00
Chandler Carruth 363ac68374 [CallSite removal] Migrate all Alias Analysis APIs to use the newly
minted `CallBase` class instead of the `CallSite` wrapper.

This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a slightly shallower migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.

Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.

Thanks for the review from Saleem!

Differential Revision: https://reviews.llvm.org/D55641

llvm-svn: 350503
2019-01-07 05:42:51 +00:00
Nikita Popov 6658fce4fc [BDCE] Remove dead uses of arguments
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.

I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.

Differential Revision: https://reviews.llvm.org/D56247

llvm-svn: 350435
2019-01-04 21:21:43 +00:00
Florian Hahn 7902405c42 [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset
GetPointerBaseWithConstantOffset include this code, where ByteOffset
and GEPOffset are both of type llvm::APInt :

  ByteOffset += GEPOffset.getSExtValue();

The problem with this line is that getSExtValue() returns an int64_t, but
the += matches an overload for uint64_t. The consequence is that the resulting
APInt is no longer considered to be signed. That in turn causes assertion
failures later on if the relevant pointer type is > 64 bits in width and
the GEPOffset was negative.

Changing it to

  ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());

resolves the issue and explicitly performs the sign extension
or truncation. Additionally, instead of asserting later if the result
is > 64 bits, it breaks out of the loop in that case.

See also
 https://reviews.llvm.org/D24729
 https://reviews.llvm.org/D24772

This commit must be merged after D38662 in order for the test to pass.

Patch by Michael Ferguson <mpfergu@gmail.com>.

Reviewers: reames, sanjoy, hfinkel

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D38501

llvm-svn: 350395
2019-01-04 14:53:22 +00:00
Simon Pilgrim c2aadfaaad [SLPVectorizer] Flag ADD/SUB SSAT/USAT intrinsics trivially vectorizable (PR40123)
Enables SLP vectorization for the SSE2 PADDS/PADDUS/PSUBS/PSUBUS style intrinsics

llvm-svn: 350300
2019-01-03 12:18:23 +00:00
Hal Finkel 4f2381440d [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug)
Motivated by the discussion in D38499, this patch updates BasicAA to support
arbitrary pointer sizes by switching most remaining non-APInt calculations to
use APInt. The size of these APInts is set to the maximum pointer size (maximum
over all address spaces described by the data layout string).

Most of this translation is straightforward, but this patch contains a fix for
a bug that revealed itself during this translation process. In order for
test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
pointers, the intermediate calculations must be performed using 64-bit
integers. This is because, as noted in the patch, when GetLinearExpression
decomposes an expression into C1*V+C2, and we then multiply this by Scale, and
distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even
though C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can
overflow. If this happens, later logic will draw invalid conclusions from the
(base) offset value. Thus, when initially applying the APInt conversion,
because the maximum pointer size in this test is 32 bits, it started failing.
Suspicious, I created a 64-bit version of this test (included here), and that
failed (miscompiled) on trunk for a similar reason (the multiplication can
overflow).
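
As a purely illustrative instance of the issue: with 32-bit intermediate
values, C1 = 1, C2 = 2^30 and Scale = 8 keep C1*V + C2 representable for the
relevant small values of V, but C2*Scale = 2^33 wraps in 32 bits, so the
decomposed base offset becomes wrong.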

After fixing this overflow bug, the first test case (at least) in
Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was
relying on having 64-bit intermediate values to have BasicAA return an accurate
result. In order to fix this problem, and because I believe that it is not
uncommon to use i64 indexing expressions in 32-bit code (especially portable
code using int64_t), it seems reasonable to always use at least 64-bit
integers. In this way, we won't regress our analysis capabilities (and there's
a command-line option added, so experimenting with this should be easy).

As pointed out by Eli during the review, there are other potential overflow
conditions that this patch does not address. Fixing those is left to follow-up
work.

Patch by me with contributions from Michael Ferguson (mferguson@cray.com).

Differential Revision: https://reviews.llvm.org/D38662

llvm-svn: 350220
2019-01-02 16:28:09 +00:00
Nikita Popov bc9986e9ad Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
whose user is also dead. This case is checked separately instead.

The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demonstrates such an
instance.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 350188
2019-01-01 10:05:26 +00:00
George Burgess IV 69952979da [MemoryLocation] Use LocationSize instead of ints; NFC
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.

This one sadly isn't *super* small, but all of the changes here are
either to:
- libfuncs that are passed a constant size (memcpy, memset, ...)
- instructions that store/load a constant size

So they have to be precise

llvm-svn: 350017
2018-12-23 03:36:44 +00:00
George Burgess IV 8c5413f3f7 [Loads] Use LocationSize instead of ints; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

This tries to find literal loads/stores of the given type, so this has
to be precise.

llvm-svn: 350016
2018-12-23 03:10:56 +00:00
George Burgess IV 1329cf1791 [Lint] Use LocationSize instead of ints; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350015
2018-12-23 02:50:08 +00:00
George Burgess IV 685e781d55 [AAEval] Use LocationSize instead of ints; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350014
2018-12-23 02:39:58 +00:00
George Burgess IV 640be69249 [Analysis] More LocationSize cleanup; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350008
2018-12-22 18:23:21 +00:00
Vedant Kumar b264d69de7 [IR] Add Instruction::isLifetimeStartOrEnd, NFC
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.

This was suggested as a cleanup in D55967.

Differential Revision: https://reviews.llvm.org/D56019

llvm-svn: 349964
2018-12-21 21:49:40 +00:00
Reid Kleckner 84a2d29681 [BasicAA] Fix AA bug on dynamic allocas and stackrestore
Summary:
BasicAA has special logic for unescaped allocas, which normally applies
equally well to dynamic and static allocas. However, llvm.stackrestore
has the power to end the lifetime of dynamic allocas, without referring
to them directly.

stackrestore is already marked with the most conservative memory
modification attributes, but because the alloca is not escaped, the
normal logic produces incorrect results. I think BasicAA needs a special
case here to teach it about the relationship between dynamic allocas and
stackrestore.
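
An illustrative sketch of the pattern (not taken from the patch's tests):

    %ss = call i8* @llvm.stacksave()
    %p  = alloca i32, i32 %n               ; dynamic alloca that never escapes
    store i32 1, i32* %p
    call void @llvm.stackrestore(i8* %ss)  ; can end %p's lifetime without mentioning %p

Even though %p is unescaped, BasicAA needs to treat the stackrestore
conservatively with respect to it.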

Fixes PR40118

Reviewers: gbiv, efriedma, george.burgess.iv

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55969

llvm-svn: 349945
2018-12-21 19:59:03 +00:00
Florian Hahn ef307b8c26 [LAA] Avoid generating RT checks for known deps preventing vectorization.
If we found unsafe dependences other than 'unknown', we already know at
compile time that they are unsafe and the runtime checks should always
fail. So we can avoid generating them in those cases.

This should have no negative impact on performance as the runtime checks
that would be created previously should always fail. As a sanity check,
I measured the test-suite, spec2k and spec2k6 and there were no regressions.

Reviewers: Ayal, anemet, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D55798

llvm-svn: 349794
2018-12-20 18:49:09 +00:00
Clement Courbet d4bd3eb85d [NFC] Fix trailing comma after function.
lib/Analysis/VectorUtils.cpp:482:2: warning: extra ‘;’ [-Wpedantic]

llvm-svn: 349732
2018-12-20 09:20:07 +00:00
Michael Kruse 978ba61536 Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it is just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on an instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem of ever needing to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carried by this loop), as sketched below.
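
A minimal sketch of the new encoding (illustrative; the metadata numbering is
made up):

    for.body:
      %v = load i32, i32* %p, align 4, !llvm.access.group !2
      ...
      br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !1

    !1 = distinct !{!1, !3}                    ; LoopID
    !2 = distinct !{}                          ; an access group
    !3 = !{!"llvm.loop.parallel_accesses", !2}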

This intentionally avoids any kind of "ID". Loops that are cloned/have
their attributes modified retain the llvm.loop.parallel_accesses
attribute. Access instructions that are cloned point to the same access
group. It is not necessary for each access to have its own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.

The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

llvm-svn: 349725
2018-12-20 04:58:07 +00:00
Nikita Popov 3817ee7908 Revert "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This reverts commit r349674. It causes a failure in
test-suite enc-3des.execution_time.

llvm-svn: 349684
2018-12-19 22:09:02 +00:00
Nikita Popov 649e125451 [BDCE][DemandedBits] Detect dead uses of undead instructions
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
whose user is also dead. This case is checked separately instead.

The test case has a couple of cases that are not simplified yet. In
particular, we're only looking at uses of instructions right now. I think
it would make sense to also extend this to arguments. Furthermore
DemandedBits doesn't yet know some of the tricks that InstCombine does
for the demanded bits or bitwise or/and/xor in combination with known
bits information.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 349674
2018-12-19 19:56:21 +00:00
Sanjay Patel 798c5982a0 [ValueTracking] remove unused parameters from helper functions; NFC
llvm-svn: 349641
2018-12-19 16:49:18 +00:00
Florian Hahn 485f2826ba [LAA] Introduce enum for vectorization safety status (NFC).
This patch adds a VectorizationSafetyStatus enum, which will be extended
in a follow up patch to distinguish between 'safe with runtime checks'
and 'known unsafe' dependences.

Reviewers: anemet, anna, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D54892

llvm-svn: 349556
2018-12-18 22:25:11 +00:00
Pete Cooper be4f571107 Change the objc ARC optimizer to use the new objc.* intrinsics
We're moving ARC optimisation and ARC emission in clang away from runtime methods
and towards intrinsics.  This is the part which actually uses the intrinsics in the ARC
optimizer when both analyzing the existing calls and emitting new ones.

Differential Revision: https://reviews.llvm.org/D55348

Reviewers: ahatanak
llvm-svn: 349534
2018-12-18 20:32:49 +00:00
Artur Pilipenko 2a0146e0fd [CaptureTracking] Pass MaxUsesToExplore from wrappers to the actual implementation
This is a follow up for rL347910. In the original patch I somehow forgot to pass
the limit from wrappers to the function which actually does the job.

llvm-svn: 349438
2018-12-18 03:32:33 +00:00
Nikita Popov 221f3fc750 [InstSimplify] Simplify saturating add/sub + icmp
If a saturating add/sub has one constant operand, then we can
determine the possible range of outputs it can produce, and simplify
an icmp comparison based on that.

The implementation is based on a similar existing mechanism for
simplifying binary operator + icmps.
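
A small illustrative example (not from the patch's tests):

    %a = call i8 @llvm.uadd.sat.i8(i8 %x, i8 100)
    %c = icmp ult i8 %a, 50
    ; %a is always >= 100, so %c simplifies to false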

Differential Revision: https://reviews.llvm.org/D55735

llvm-svn: 349369
2018-12-17 17:45:18 +00:00
Wei Mi 66c6c5abea [SampleFDO] handle ProfileSampleAccurate when initializing function entry count
ProfileSampleAccurate is used to indicate that the profile has an exact match to the
code to be optimized.

Previously ProfileSampleAccurate is handled in ProfileSummaryInfo::isColdCallSite
and ProfileSummaryInfo::isColdBlock. A better solution is to initialize function
entry count to 0 when ProfileSampleAccurate is true, so we don't have to handle
ProfileSampleAccurate in multiple places.

Differential Revision: https://reviews.llvm.org/D55660

llvm-svn: 349088
2018-12-13 21:51:42 +00:00
Easwaran Raman 5a7056fa03 [ThinLTO] Compute synthetic function entry count
Summary:
This patch computes the synthetic function entry count on the whole
program callgraph (based on module summary) and writes the entry counts
to the summary. After function importing, this count gets attached to
the IR as metadata. Since it adds a new field to the summary, this bumps
up the version.

Reviewers: tejohnson

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D43521

llvm-svn: 349076
2018-12-13 19:54:27 +00:00
David L. Jones 54c01ad6a9 Revert r348645 - "[MemCpyOpt] memset->memcpy forwarding with undef tail"
This revision caused truncated memsets for structs with padding. See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610520.html

llvm-svn: 349002
2018-12-13 03:15:11 +00:00
Michael Kruse 7244852557 [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
When multiple loop transformations are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance,

    #pragma clang loop unroll_and_jam(enable)
    #pragma clang loop distribute(enable)

is the same as

    #pragma clang loop distribute(enable)
    #pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after the UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also, the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll_and_jam.enable"}
    !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
    !3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, by using Polly to perform these transformations, or by adding a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible for the responsible pass to emit such a warning, because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patch introduces a WarnMissedTransformations pass to warn about orphaned transformations.

Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.

To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

Reviewed By: hfinkel, dmgreen

Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288

llvm-svn: 348944
2018-12-12 17:32:52 +00:00
Wei Mi 7da5a08e1a [SampleFDO] Extend profile-sample-accurate option to cover isFunctionColdInCallGraph
For SampleFDO, when a callsite doesn't appear in the profile, it will not be marked as a cold callsite unless the option -profile-sample-accurate is specified.

But profile-sample-accurate doesn't cover the function isFunctionColdInCallGraph, which is used to decide whether a function should be put into the text.unlikely section, so even if the user knows the profile is accurate and specifies profile-sample-accurate, functions not appearing in the sample profile are still not put into the text.unlikely section right now.

The patch fixes that.

Differential Revision: https://reviews.llvm.org/D55567

llvm-svn: 348940
2018-12-12 17:09:27 +00:00
Nikita Popov 79c994d976 [ConstantFolding] Handle leading zero-size elements in load folding
Struct types may have leading zero-size elements like [0 x i32], in
which case the "real" element at offset 0 will not necessarily coincide
with the 0th element of the aggregate. ConstantFoldLoadThroughBitcast()
wants to drill down to the element at offset 0, but currently always picks
the 0th aggregate element to do so. This patch changes the code to find
the first non-zero-size element instead, for the struct case.
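
A rough sketch of the kind of constant this affects (illustrative):

    @g = constant { [0 x i32], i64 } { [0 x i32] zeroinitializer, i64 42 }
    %v = load i64, i64* bitcast ({ [0 x i32], i64 }* @g to i64*)

The element at offset 0 is really the i64, not the zero-size array, so the
load can now be folded to 42.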

The motivation behind this change is https://github.com/rust-lang/rust/issues/48627.
Rust is fond of emitting [0 x iN] separators between struct elements to
enforce alignment, which prevents constant folding in this particular case.

The additional tests with [4294967295 x [0 x i32]] check that we don't
end up unnecessarily looping over a large number of zero-size elements
of a zero-size array.

Differential Revision: https://reviews.llvm.org/D55169

llvm-svn: 348895
2018-12-11 20:29:16 +00:00
Fedor Sergeev a1d95c3fc4 [NewPM] fixing asserts on deleted loop in -print-after-all
IR-printing AfterPass instrumentation might be called on a loop
that has just been invalidated. We should skip printing it to
avoid spurious asserts.

Reviewed By: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D54740

llvm-svn: 348887
2018-12-11 19:05:35 +00:00
Nikita Popov 94b8e2ea4e [MemCpyOpt] memset->memcpy forwarding with undef tail
Currently memcpyopt optimizes cases like

    memset(a, byte, N);
    memcpy(b, a, M);

to

    memset(a, byte, N);
    memset(b, byte, M);

if M <= N. Often this allows further simplifications down the line,
which drop the first memset entirely.

This patch extends this optimization for the case where M > N, but we
know that the bytes a[N..M] are undef due to alloca/lifetime.start.
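A hedged C illustration of the new case (M > N), where the tail of the copied region was never initialized:

    #include <string.h>

    void example(char *b) {
      char a[16];              // lifetime starts here; a[8..16) is never written
      memset(a, 0, 8);         // N = 8
      memcpy(b, a, 16);        // M = 16 > N; bytes a[8..16) are undef
      // With this patch the memcpy can become memset(b, 0, 16): storing the fill
      // byte into the undef tail of b is as good as copying undef bytes.
    }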

This situation arises relatively often for Rust code, because Rust does
not initialize trailing structure padding and loves to insert redundant
memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.

For the implementation, I'm reusing a bit of code for a similar existing
optimization (direct memcpy of undef). I've also added memset support to
MemDepAnalysis GetLocation -- instead, getPointerDependencyFrom could be
used, but it seems to make more sense to add this to GetLocation and thus
make the computation cacheable.

Differential Revision: https://reviews.llvm.org/D55120

llvm-svn: 348645
2018-12-07 21:16:58 +00:00
Nikita Popov 110cf05203 Reapply "[DemandedBits][BDCE] Support vectors of integers"
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.
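A small sketch of that rule (not the analysis code itself): the single mask tracked for a vector operation is the union of whatever would be demanded for each element.

    #include "llvm/ADT/APInt.h"
    #include "llvm/ADT/ArrayRef.h"
    using namespace llvm;

    // Collapse per-element demanded masks into the one mask the analysis tracks:
    // a bit is demanded if any element demands it.
    static APInt unionDemandedBits(ArrayRef<APInt> PerElement, unsigned ScalarBits) {
      APInt Demanded(ScalarBits, 0);
      for (const APInt &Elem : PerElement)
        Demanded |= Elem;
      return Demanded;
    }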

Unlike the previous iteration of this patch, getDemandedBits() can now
again be called on arbitrary (sized) instructions, even if they don't
have integer or vector of integer type. (For vector types the size of the
returned mask will now be the scalar size in bits though.)

The added LoopVectorize test case shows a case which triggered an
assertion failure with the previous attempt, because getDemandedBits()
was called on a pointer-typed instruction.

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348602
2018-12-07 15:38:13 +00:00
Nikita Popov 14ca9a8355 Revert "[DemandedBits][BDCE] Support vectors of integers"
This reverts commit r348549. Causing assertion failures during
clang build.

llvm-svn: 348558
2018-12-07 00:42:03 +00:00
Nikita Popov cf65b9207b [DemandedBits][BDCE] Support vectors of integers
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.

The getDemandedBits() method can now only be called on instructions that
have integer or vector of integer type. Previously it could be called on
any sized instruction (even if it was not particularly useful). The size
of the return value is now always the scalar size in bits (while
previously it was the type size in bits).

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348549
2018-12-06 23:50:32 +00:00
David L. Jones 5ff7b8a04a Revert r347934 "[SCEV] Guard movement of insertion point for loop-invariants"
This change caused SEGVs in instcombine. (The r347934 change seems to me to be a
precipitating cause, not a root cause. Details are on the llvm-commits thread
for r347934.)

llvm-svn: 348426
2018-12-05 23:13:50 +00:00
Sanjay Patel 3d5bb15a1d [CmpInstAnalysis] fix function signature for ICmp code to predicate; NFC
The old function underspecified the return type, took an unused parameter,
and had a misleading name.

llvm-svn: 348292
2018-12-04 18:53:27 +00:00
Sanjay Patel 472652ef68 [CmpInstAnalysis] fix formatting; NFC
There are potential improvements to the structure of this API
raised by D54994, but remove some cosmetic blemishes before
making any functional changes.

llvm-svn: 348149
2018-12-03 15:48:30 +00:00
Fedor Sergeev 7254d3c51c Fixing -print-module-scope for legacy SCC passes
It appears that print-module-scope was not implemented for legacy SCC passes.
Fixed to print a whole module instead of just the current SCC.

Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D54793

llvm-svn: 348144
2018-12-03 14:48:15 +00:00
Nikita Popov 687b92cd9c [ValueTracking] Support funnel shifts in computeKnownBits()
If the shift amount is known, we can determine the known bits of the
output based on the known bits of two inputs.
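A minimal sketch of that rule for fshl with a known shift amount (assumptions: this is not the ValueTracking code itself, just the underlying math):

    #include "llvm/Support/KnownBits.h"
    using namespace llvm;

    // fshl(X, Y, C) == (X << C) | (Y >> (BW - C)), so for a known C the known
    // bits of the result are the shifted known bits of X merged with those of Y.
    static KnownBits fshlKnownBits(const KnownBits &X, const KnownBits &Y, unsigned C) {
      unsigned BW = X.getBitWidth();
      C %= BW;
      if (C == 0)
        return X;            // fshl(X, Y, 0) is just X
      KnownBits Known(BW);
      Known.One  = X.One.shl(C)  | Y.One.lshr(BW - C);
      Known.Zero = X.Zero.shl(C) | Y.Zero.lshr(BW - C);
      return Known;
    }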

This is essentially the same functionality as implemented in D54869,
but for ValueTracking rather than InstCombine SimplifyDemandedBits.

Differential Revision: https://reviews.llvm.org/D55140

llvm-svn: 348091
2018-12-02 14:14:11 +00:00
Sanjay Patel 7d82d37854 [ValueTracking] add helper function for testing implied condition; NFCI
We were duplicating code around the existing isImpliedCondition() that
checks for a predecessor block/dominating condition, so make that a
wrapper call.

llvm-svn: 348088
2018-12-02 13:26:03 +00:00
Teresa Johnson 5b8ff375c8 [ThinLTO] Allow importing of functions with var args
Summary:
Follow up to D54270, which allowed importing of var args functions
unless they called va_start. As pointed out in the post-commit comments
on that patch, the inliner can handle functions that call va_start in
certain situations as well. Go ahead and enable importing of all var
args functions. Measurements on a large binary show that this increases
imports and binary size by an insignificant amount.

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D54607

llvm-svn: 348068
2018-12-01 05:11:46 +00:00
Nicolai Haehnle 413f8691ab LegacyDivergenceAnalysis: fix uninitialized value
Change-Id: I014502e431a68f7beddf169f6a3d19dac5dd2c26
llvm-svn: 348051
2018-11-30 23:07:49 +00:00
Nicolai Haehnle 56d0ed2a50 [DA] GPUDivergenceAnalysis for unstructured GPU kernels
Summary:
This is patch #3 of the new DivergenceAnalysis

  <https://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html>

The GPUDivergenceAnalysis is intended to eventually supersede the existing
LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces
incorrect results on unstructured Control-Flow Graphs:

  <https://bugs.llvm.org/show_bug.cgi?id=37185>

This patch adds the option -use-gpu-divergence-analysis to the
LegacyDivergenceAnalysis to turn it into a transparent wrapper for the
GPUDivergenceAnalysis.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D53493

llvm-svn: 348048
2018-11-30 22:55:20 +00:00
Renato Golin de4b88e5ac Fix parenthesis warning in IVDescriptors
llvm-svn: 347990
2018-11-30 13:54:36 +00:00
Renato Golin 135e72e1b9 Add a new reduction pattern match
Adding a new reduction pattern match for vectorizing code similar
to TSVC s3111:

for (int i = 0; i < N; i++)
  if (a[i] > b)
    sum += a[i];

This patch adds support for fadd, fsub and fmul, as well as multiple
branches and different (but compatible) instructions (ex. add+sub) in
different branches.
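A hedged example of the "different but compatible instructions in different branches" shape this aims to cover (the exact pattern recognized is an assumption):

    // Conditional reduction mixing add and sub across branches; as noted below,
    // fp reductions of this form additionally require fast-math.
    float s3111_like(const float *a, float b, int n) {
      float sum = 0.0f;
      for (int i = 0; i < n; i++) {
        if (a[i] > b)
          sum += a[i];
        else
          sum -= a[i];
      }
      return sum;
    }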

The differences from the previous patch (https://reviews.llvm.org/D49168)
are as follows:
 - Added a check of the fast-math property of fp instructions to the
   previous patch
 - Fixed/added some patterns in if-reduction.ll


Differential Revision: https://reviews.llvm.org/D54464

Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org>
     and Masakazu Ueno <masakazu.ueno@linaro.org>

llvm-svn: 347989
2018-11-30 13:40:10 +00:00
Warren Ristow 72d1f3a285 [SCEV] Guard movement of insertion point for loop-invariants
r320789 suppressed moving the insertion point of SCEV expressions with
div/rem operations to the loop header in non-loop-invariant situations.
This, and similar, hoisting is also unsafe in the loop-invariant case,
since there may be a guard against a zero denominator. This is an
adjustment to the fix of r320789 to suppress the movement even in the
loop-invariant case.

This fixes PR30806.

Differential Revision: https://reviews.llvm.org/D54713

llvm-svn: 347934
2018-11-30 00:02:54 +00:00
Artur Pilipenko eba2365f23 Introduce MaxUsesToExplore argument to capture tracking
Currently CaptureTracker gives up if it encounters a value with more than 20 
uses. The motivation for this cap is to keep it relatively cheap for 
BasicAliasAnalysis use case, where the results can't be cached. However, other
clients of CaptureTracker might be OK with a higher cost. This patch introduces an
argument for the PointerMayBeCaptured functions to specify the max number of uses to
explore. The motivation for this change is a downstream user of CaptureTracker,
but I believe upstream clients of CaptureTracker might also benefit from a more
fine-grained cap.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D55042

llvm-svn: 347910
2018-11-29 20:08:12 +00:00
Sanjay Patel d802270808 [InstSimplify] fold select with implied condition
This is an almost direct move of the functionality from InstCombine to 
InstSimplify. There's no reason not to do this in InstSimplify because 
we never create a new value with this transform.
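An illustrative C example (hedged, not taken from the tests) of the kind of select this folds:

    int pick(int x, int a, int b) {
      if (x > 10) {
        // The branch condition x > 10 implies x > 5, so the select's condition
        // is known true here and the whole expression simplifies to 'a'.
        return (x > 5) ? a : b;
      }
      return b;
    }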

(There's a question of whether any dominance-based transform belongs in
either of these passes, but that's a separate issue.)

I've changed 1 of the conditions for the fold (1 of the blocks for the 
branch must be the block we started with) into an assert because I'm not 
sure how that could ever be false.

We need 1 extra check to make sure that the instruction itself is in a
basic block because passes other than InstCombine may be using InstSimplify
as an analysis on values that are not wired up yet.

The 3-way compare changes show that InstCombine has some kind of 
phase-ordering hole. Otherwise, we would have already gotten the intended
final result that we now show here.

llvm-svn: 347896
2018-11-29 18:44:39 +00:00
Artur Pilipenko 8b92c1d142 NFC. Use unsigned type for uses counter in CaptureTracking
llvm-svn: 347826
2018-11-29 02:15:35 +00:00
Nikita Popov cf596a8c26 [ValueTracking] Determine always-overflow condition for unsigned sub
Always-overflow was already determined for unsigned addition, but
not subtraction. This patch establishes parity.
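A tiny sketch of the rule being established for unsigned subtraction (plain C++, not the ValueTracking code):

    #include <cstdint>

    // a - b always wraps (overflows) when every possible value of a is below
    // every possible value of b, i.e. a's maximum is less than b's minimum.
    static bool usubAlwaysOverflows(uint64_t AMax, uint64_t BMin) {
      return AMax < BMin;
    }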

This allows us to perform some additional simplifications for
signed saturating subtractions.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347771
2018-11-28 16:37:04 +00:00
Vitaly Buka 44abeb5c3c [stack-safety] Update comment
llvm-svn: 347626
2018-11-27 01:56:44 +00:00
Vitaly Buka 7792f5f145 [stack-safety] Fix and uncomment assert
llvm-svn: 347625
2018-11-27 01:56:35 +00:00
Vitaly Buka 769bff18be [stack-safety] Fix build on gcc 5.4
llvm-svn: 347624
2018-11-27 01:56:26 +00:00
JF Bastien 3c242438ec Fix debug build break
Comment out an assertion from D54543 which failed with error: no member named 'Range' in '(anonymous namespace)::PassAsArgInfo'.

llvm-svn: 347616
2018-11-26 23:48:47 +00:00
Vitaly Buka 42b050673e [stack-safety] Inter-Procedural Analysis implementation
Summary:
IPA is implemented as a module pass which produces a map from Function or Alias to
StackSafetyInfo for a single function.

From prototype by Evgenii Stepanov and Vlad Tsyrklevich.

Reviewers: eugenis, vlad.tsyrklevich, pcc, glider

Subscribers: hiraditya, mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D54543

llvm-svn: 347611
2018-11-26 23:05:58 +00:00
Vitaly Buka b8e6fa6638 [stack-safety] Empty local passes for Stack Safety Global Analysis
Reviewers: eugenis, vlad.tsyrklevich

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D54541

llvm-svn: 347610
2018-11-26 23:05:48 +00:00
Vitaly Buka fa98c074b7 [stack-safety] Local analysis implementation
Summary:
The analysis produces StackSafetyInfo, which contains information about how allocas
and parameters are used in functions.

From prototype by Evgenii Stepanov and  Vlad Tsyrklevich.

Reviewers: eugenis, vlad.tsyrklevich, pcc, glider

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D54504

llvm-svn: 347603
2018-11-26 21:57:59 +00:00
Vitaly Buka 4493fe1c1b [stack-safety] Empty local passes for Stack Safety Local Analysis
Reviewers: eugenis, vlad.tsyrklevich

Subscribers: mgorny, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D54502

llvm-svn: 347602
2018-11-26 21:57:47 +00:00
Nikita Popov f94c8f0d1b [DemandedBits] Add support for funnel shifts
Add support for funnel shifts to the DemandedBits analysis. The
demanded bits of the first two operands can be determined if the
shift amount is constant. The demanded bits of the third operand
(shift amount) can be determined if the bitwidth is a power of two.

This is basically the same functionality as implemented in D54869
and D54478, but for DemandedBits rather than InstCombine.

Differential Revision: https://reviews.llvm.org/D54876

llvm-svn: 347561
2018-11-26 15:36:57 +00:00
Joel Jones 7459398a43 Revert unapproved commit
llvm-svn: 347511
2018-11-24 07:26:55 +00:00
Joel Jones 5f533c5fe1 [AArch64] Enable libm vectorized functions via SLEEF
This changeset is modeled after Intel's submission for SVML. It enables
trigonometry functions vectorization via SLEEF: http://sleef.org/.

 * A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF.
 * A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF.
 * A comprehensive test case is included in this changeset.
 * In a separate changeset (for clang), a new vectorization library argument is
   added to -fveclib: -fveclib=SLEEF.

Trigonometry functions that are vectorized by sleef:

acos
asin
atan
atanh
cos
cosh
exp
exp2
exp10
lgamma
log10
log2
log
sin
sinh
sqrt
tan
tanh
tgamma

Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D53927

llvm-svn: 347510
2018-11-24 06:41:39 +00:00
John Regehr 3a1c9d55cc [LVI] run transfer function for binary operator even when the RHS isn't a constant
LVI was symbolically executing binary operators only when the RHS was
constant, missing the case where we have a ConstantRange for the RHS,
but not an actual constant. Tested using check-all and by
bootstrapping. Compile time is not impacted measurably.
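A hedged C illustration of what the extra transfer-function case buys:

    void use(int);

    void lvi_example(int x, int y) {
      if (y >= 1 && y <= 3) {     // y is not a constant, but has range [1, 4)
        int z = x & 7;            // z has range [0, 8)
        int w = z + y;            // with this change LVI can conclude w is in [1, 11)
        use(w);
      }
    }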

Differential Revision: https://reviews.llvm.org/D19859

llvm-svn: 347379
2018-11-21 05:24:12 +00:00
Sanjay Patel 14ab9170b8 [InstSimplify] fold funnel shifts with undef operands
Splitting these off from the D54666.

Patch by: nikic (Nikita Popov)

llvm-svn: 347332
2018-11-20 17:34:59 +00:00
Sanjay Patel eea21da12a [InstructionSimplify] Add support for saturating add/sub
Add support for saturating add/sub in InstructionSimplify. In particular, the following simplifications are supported:

    sat(X + 0) -> X
    sat(X + undef) -> -1
    sat(X uadd MAX) -> MAX
    (and commutative variants)

    sat(X - 0) -> X
    sat(X - X) -> 0
    sat(X - undef) -> 0
    sat(undef - X) -> 0
    sat(0 usub X) -> 0
    sat(X usub MAX) -> 0

Patch by: @nikic (Nikita Popov)

Differential Revision: https://reviews.llvm.org/D54532

llvm-svn: 347330
2018-11-20 17:20:26 +00:00
Sanjay Patel efc3d1dfaa [ConstantFolding] Add support for saturating add/sub
Support saturating add/sub in constant folding, based on the APInt methods introduced in D54332.
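A hedged illustration of the folds this enables, using the APInt saturating helpers the message refers to (the uadd_sat/usub_sat names are assumptions):

    #include "llvm/ADT/APInt.h"
    using namespace llvm;

    static void saturatingFoldExamples() {
      APInt A(8, 200), B(8, 100);
      APInt S1 = A.uadd_sat(B);  // llvm.uadd.sat.i8(200, 100) folds to 255 (clamped)
      APInt S2 = B.usub_sat(A);  // llvm.usub.sat.i8(100, 200) folds to 0
      (void)S1; (void)S2;
    }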

Patch by: @nikic (Nikita Popov)

Differential Revision: https://reviews.llvm.org/D54531

llvm-svn: 347328
2018-11-20 17:05:55 +00:00
Vedant Kumar 4de31bba51 [IR] Add hasNPredecessors, hasNPredecessorsOrMore to BasicBlock
Add methods to BasicBlock which make it easier to efficiently check
whether a block has N (or more) predecessors.

This can be more efficient than using pred_size(), which is a linear
time operation.
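A minimal sketch of why such a check can stop early (an assumption, not the actual BasicBlock implementation):

    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/CFG.h"
    using namespace llvm;

    // Answers "does BB have at least N predecessors?" after visiting at most N
    // of them, instead of counting the whole list the way pred_size() would.
    static bool hasNPredecessorsOrMoreSketch(const BasicBlock *BB, unsigned N) {
      unsigned Seen = 0;
      for (const BasicBlock *Pred : predecessors(BB)) {
        (void)Pred;
        if (++Seen >= N)
          return true;
      }
      return N == 0;
    }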

We might consider adding similar methods for successors. I haven't done
so in this patch because succ_size() is already O(1).

With this patch applied, I measured a 0.065% compile-time reduction in
user time for running `opt -O3` on the sqlite3 amalgamation (30 trials).
The change in mergeStoreIntoSuccessor alone saves 45 million linked list
iterations in a stage2 Release build of llc.

See llvm.org/PR39702 for a harder but more general way of achieving
similar results.

Differential Revision: https://reviews.llvm.org/D54686

llvm-svn: 347256
2018-11-19 19:54:27 +00:00
Anna Thomas 5e9215f02b [LV] Avoid vectorizing unsafe dependencies in uniform address
Summary:
Currently, when vectorizing stores to uniform addresses, the only
case in which we prevent vectorization is when there are multiple stores to the
same uniform address causing an unsafe dependency.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.
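A hedged C sketch of the shape now rejected (the exact PR39653 test is not reproduced here):

    void uniform_dep(int *a, int *p, int n) {
      for (int i = 0; i < n; i++) {
        int t = *p;        // load from a loop-invariant (uniform) address
        a[i] = t + 1;
        *p = a[i];         // store to the same uniform address: the value loaded
                           // in the next iteration depends on this store, so
                           // vectorizing the loop would be unsafe
      }
    }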

Fixes PR39653.

Reviewers: Ayal, efriedma

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D54538

llvm-svn: 347220
2018-11-19 15:39:59 +00:00
Fedor Sergeev 3a3d688cc8 [LoopPass] fixing 'Modification' messages in -debug-pass=Executions for loop passes
The legacy loop pass manager issues a "Made Modification" message after each loop pass
run; however, the condition for issuing it is accumulated across all of the runs.
That leads to confusing 'modification' messages as soon as the first modification is made.

Change the condition to "the current pass made modifications", similar to how
it is done in all other pass managers.

llvm-svn: 347215
2018-11-19 15:10:59 +00:00
Vedant Kumar e7b789b529 [ProfileSummary] Standardize methods and fix comment
Every Analysis pass has a get method that returns a reference of the Result of
the Analysis, for example, BlockFrequencyInfo
&BlockFrequencyInfoWrapperPass::getBFI().  I believe that
ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning
a pointer.

Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock,
respectively.  Argument and variable names commonly use BB, but method names
usually refer to basic blocks as Block rather than BB.  For example,
Function::getEntryBlock, Loop::getExitBlock, etc.

I also fixed one of the comments.

Patch by Rodrigo Caetano Rocha!

Differential Revision: https://reviews.llvm.org/D54669

llvm-svn: 347182
2018-11-19 05:23:16 +00:00
Fangrui Song 7570932977 Use llvm::copy. NFC
llvm-svn: 347126
2018-11-17 01:44:25 +00:00
Eugene Leviant bf46e7410c [ThinLTO] Internalize readonly globals
An attempt to recommit r346584 after failure on OSX build bot.
Fixed cache key computation in ThinLTOCodeGenerator and added
test case

llvm-svn: 347033
2018-11-16 07:08:00 +00:00
Sanjay Patel e98ec77a95 [InstSimplify] delete shift-of-zero guard ops around funnel shifts
This is a problem seen in common rotate idioms as noted in:
https://bugs.llvm.org/show_bug.cgi?id=34924
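As a hedged C illustration of the guarded rotate idiom in question:

    // The 'n == 0' guard avoids an out-of-range shift by 32 in plain C. Once
    // SimplifyCFG turns the branch into a select feeding fshl(x, x, n), the
    // guard is redundant, because fshl already returns x for a zero shift.
    unsigned rotl32(unsigned x, unsigned n) {
      return n == 0 ? x : (x << n) | (x >> (32 - n));
    }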

Note that we are not canonicalizing standard IR (shifts and logic) to the intrinsics yet. 
(Although I've written this before...) I think this is the last step before we enable 
that transform. Ie, we could regress code by doing that transform without this 
simplification in place.

In PR34924, I questioned whether this is a valid transform for target-independent IR, 
but I convinced myself this is ok. If we're speculating a funnel shift by turning cmp+br 
into select, then SimplifyCFG has already determined that the transform is justified. 
It's possible that SimplifyCFG is not taking into account profile or other metadata, 
but if that's true, then it's a bug independent of funnel shifts.

Also, we do have CGP code to restore a guard like this around an intrinsic if it can't 
be lowered cheaply. But that isn't necessary for funnel shift because the default 
expansion in SelectionDAGBuilder includes this same cmp+select.

Differential Revision: https://reviews.llvm.org/D54552

llvm-svn: 346960
2018-11-15 14:53:37 +00:00
Teresa Johnson 32dc5b9bf1 [ThinLTO] Update handling of vararg functions to match inliner
Summary:
Previously we marked all vararg functions as non-inlinable in the
function summary, which prevented their importing. However, the
corresponding inliner restriction was loosened in r321940/r342675
to only apply to functions calling va_start. Adjust the summary
flag computation to match.

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D54270

llvm-svn: 346883
2018-11-14 19:30:13 +00:00
Simon Pilgrim 2b166c5044 [TTI] getOperandInfo - a broadcast shuffle means the result is OK_UniformValue
llvm-svn: 346868
2018-11-14 15:04:08 +00:00
Alina Sbirlea b4d088d090 [MemorySSA] Create query after checking if instruction is a fence.
The alternative is checking if I is a fence in the Query constructor, so
as to not attempt to get a non-existent MemoryLocation.

llvm-svn: 346798
2018-11-13 21:12:49 +00:00
Steven Wu fa43892d6f Revert "[ThinLTO] Internalize readonly globals"
This reverts commit 10c84a8f35cae4a9fc421648d9608fccda3925f2.

llvm-svn: 346768
2018-11-13 17:35:04 +00:00
Florian Hahn 86ed347bcd [VectorUtils] Use namespace for InterleaveGroup template specialization.
llvm-svn: 346759
2018-11-13 16:26:34 +00:00
Florian Hahn a4dc7feeea [VPlan] VPlan version of InterleavedAccessInfo.
This patch turns InterleaveGroup into a template with the instruction type
being a template parameter. It also adds a VPInterleavedAccessInfo class, which
only contains a mapping from VPInstructions to their respective InterleaveGroup.
As we do not have access to scalar evolution in VPlan, we re-use the
existing InterleavedAccessInfo by converting it to VPInterleavedAccessInfo.


Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D49489

llvm-svn: 346758
2018-11-13 15:58:18 +00:00
Simon Pilgrim 077a42ca9f [TTI] Make TargetTransformInfo::getOperandInfo static. NFCI.
It has no member dependencies and this makes it easier to reuse in other cost analysis code.

llvm-svn: 346755
2018-11-13 13:45:10 +00:00
Sanjay Patel 1456fd7614 [VectorUtils] add funnel-shifts to the list of vectorizable intrinsics
This just identifies the intrinsics as candidates for vectorization.
It does not mean we will attempt to vectorize under normal conditions
(the test file is forcing vectorization). 

The cost model must be fixed to show that the transform is profitable 
in general.

Allowing vectorization with these intrinsics is required to avoid
potential regressions from canonicalizing to the intrinsics from
generic IR:
https://bugs.llvm.org/show_bug.cgi?id=37417

llvm-svn: 346661
2018-11-12 15:20:14 +00:00
Sanjay Patel 0f4f4806b3 [VectorUtils] reorder list of vectorizable intrinsics; NFC
We need to add funnel-shifts to this list, so clean up
the random order before it gets worse.

llvm-svn: 346660
2018-11-12 15:10:30 +00:00
Max Kazantsev 7d49a3a816 [LICM] Hoist guards from non-header blocks
This patch relaxes overconservative checks on whether or not we could write
memory before we execute an instruction. This allows us to hoist guards out of
loops even if they are not in the header block.

Differential Revision: https://reviews.llvm.org/D50891
Reviewed By: fedor.sergeev

llvm-svn: 346643
2018-11-12 09:29:58 +00:00
Eugene Leviant be8d19967a [ThinLTO] Internalize readonly globals
This patch allows internalising globals if all accesses to them
(from live functions) are from non-volatile load instructions.

Differential revision: https://reviews.llvm.org/D49362

llvm-svn: 346584
2018-11-10 08:31:21 +00:00
Simon Pilgrim 26e1c887f5 [TTI] Flip vector types in getShuffleCost SK_ExtractSubvector call
For SK_ExtractSubvector, the default 'Ty' type is the source operand type and 'SubTy' is the destination subvector type

I got this the wrong way around when I added rL346510

llvm-svn: 346534
2018-11-09 18:30:59 +00:00
Simon Pilgrim d0c71609c5 [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368)
Add ShuffleVectorInst::isExtractSubvectorMask helper to match shuffle masks.

llvm-svn: 346510
2018-11-09 16:28:19 +00:00
Max Kazantsev 65cb9d79a2 [SCEV][NFC] Verify IR in isLoop[Entry,Backedge]GuardedByCond
We have a lot of various bugs that are caused by misuse of SCEV (in particular in LV);
all of them can simply be described as "we ask SCEV to prove some fact on invalid IR".
Some of examples of those are PR36311, PR37221, PR39160.

The problem is that these failures manifest differently (what we saw was failure of various
asserts across SCEV, but there can also be miscompiles). This patch adds an assert into two
SCEV methods that strongly rely on correctness of the IR and are involved in known failures.
This will at least allow us to have a clear indication of what was wrong in this case.

This patch also fixes a unit test with incorrect IR that fails this verification.

Differential Revision: https://reviews.llvm.org/D52930
Reviewed By: fhahn

llvm-svn: 346389
2018-11-08 05:07:58 +00:00