Commit Graph

8892 Commits

Author SHA1 Message Date
Victor Campos e64863d192 [SCEV] Removing deprecated comment in ScalarEvolutionExpander
Removing a comment in the ScalarEvolutionExpander.cpp file that was about the
class SCEVSDivExpr, which has been long gone from LLVM.

llvm-svn: 375232
2019-10-18 13:33:45 +00:00
Oliver Stannard 3b598b9c86 Reland: Dead Virtual Function Elimination
Remove dead virtual functions from vtables with
replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up
correctly.

Original commit message:

Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.

This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.

To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.

The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.

This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.

To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.

I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.

On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.

I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.

I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).

Differential revision: https://reviews.llvm.org/D63932

llvm-svn: 375094
2019-10-17 09:58:57 +00:00
Guillaume Chatelet bae629b966 [Alignment][NFC] Value::getPointerAlignment returns MaybeAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68398

llvm-svn: 374889
2019-10-15 13:58:22 +00:00
Jorge Gorbe Moya b052331bd6 Revert "Dead Virtual Function Elimination"
This reverts commit 9f6a873268.

llvm-svn: 374844
2019-10-14 23:25:25 +00:00
Sam Parker 527a35e155 [NFC][TTI] Add Alignment for isLegalMasked[Load/Store]
Add an extra parameter so the backend can take the alignment into
consideration.

Differential Revision: https://reviews.llvm.org/D68400

llvm-svn: 374763
2019-10-14 10:00:21 +00:00
Zi Xuan Wu 9802268ad3 recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374634
2019-10-12 02:53:04 +00:00
Oliver Stannard 9f6a873268 Dead Virtual Function Elimination
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.

This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.

To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.

The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.

This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.

To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.

I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.

On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.

I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.

I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).

Differential revision: https://reviews.llvm.org/D63932

llvm-svn: 374539
2019-10-11 11:59:55 +00:00
Florian Hahn 77fbf069f6 [SCEV] Add stricter verification option.
Currently -verify-scev only fails if there is a constant difference
between two BE counts. This misses a lot of cases.

This patch adds a -verify-scev-strict options, which fails for any
non-zero differences, if used together with -verify-scev.

With the stricter checking, some unit tests fail because
of mis-matches, especially around IndVarSimplify.

If there is no reason I am missing for just checking constant deltas, I
am planning on looking into the various failures.

Reviewers: efriedma, sanjoy.google, reames, atrick

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D68592

llvm-svn: 374535
2019-10-11 11:46:40 +00:00
Alina Sbirlea 6442b56974 [MemorySSA] Update Phi simplification.
When simplifying a Phi to the unique value found incoming, check that
there wasn't a Phi already created to break a cycle. If so, remove it.
Resolves PR43541.

Some additional nits included.

llvm-svn: 374471
2019-10-10 23:27:21 +00:00
Rong Xu 686fa4bbfb [ValueTracking] Improve pointer offset computation for cases of same base
This patch improves the handling of pointer offset in GEP expressions where
one argument is the base pointer. isPointerOffset() is being used by memcpyopt
where current code synthesizes consecutive 32 bytes stores to one store and
two memset intrinsic calls. With this patch, we convert the stores to one
memset intrinsic.

Differential Revision: https://reviews.llvm.org/D67989

llvm-svn: 374454
2019-10-10 21:30:43 +00:00
Alina Sbirlea 67f0c5c085 [MemorySSA] Additional handling of unreachable blocks.
Summary:
Whenever we get the previous definition, the assumption is that the
recursion starts ina  reachable block.
If the recursion starts in an unreachable block, we may recurse
indefinitely. Handle this case by returning LoE if the block is
unreachable.

Resolves PR43426.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68809

llvm-svn: 374447
2019-10-10 20:43:06 +00:00
David Greene 7c562f1286 [System Model] [TTI] Move default cache/prefetch implementations
Move the default implementations of cache and prefetch queries to
TargetTransformInfoImplBase and delete them from NoTIIImpl.  This brings these
interfaces in line with how other TTI interfaces work.

Differential Revision: https://reviews.llvm.org/D68804

llvm-svn: 374446
2019-10-10 20:39:27 +00:00
Guillaume Chatelet 837a1b84ce [Alignment][NFC] Make VectorUtils uas llvm::Align
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, rogfer01, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68784

llvm-svn: 374330
2019-10-10 12:35:04 +00:00
David Greene 2e6f6b4dad [System Model] [TTI] Update cache and prefetch TTI interfaces
Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon.  This involved
moving defaults from TargetTransformInfoImplBase to MCSubtargetInfo.

Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model.  Changes include:

- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
  implementation
- Moving the default TargetTransformInfoImplBase implementation to a default
  MCSubtarget implementation

Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC
and SystemZ.  AArch64 already has a custom subtarget implementation, so its
custom TTI implementation is migrated to use the new facilities in BasicTTIImpl
to invoke its custom subtarget implementation.  The custom TTI implementations
continue to exist for the other targets with this change.  They are not moved
over to subtarget-based implementations.

The end goal is to have the default subtarget implementation defer to the system
model defined by the target.  With this change, the default MCSubtargetInfo
implementation essentially returns the defaults TargetTransformInfoImplBase used
to return.  Existing users of TTI defaults will hit the defaults now in
MCSubtargetInfo.  Targets that define their own custom TTI implementations won't
use the BasicTTIImpl implementations that route to the subtarget.

Once system models are in place for the targets that use these interfaces, their
custom TTI implementations can be removed.

Differential Revision: https://reviews.llvm.org/D63614

llvm-svn: 374205
2019-10-09 19:51:48 +00:00
Alina Sbirlea 7faa14a98b [MemorySSA] Make the use of moveAllAfterMergeBlocks consistent.
Summary:
The rule for the moveAllAfterMergeBlocks API si for all instructions
from `From` to have been moved to `To`, while keeping the CFG edges (and
block terminators) unchanged.
Update all the callsites for moveAllAfterMergeBlocks to follow this.

Pending follow-up: since the same behavior is needed everytime, merge
all callsites into one. The common denominator may be the call to
`MergeBlockIntoPredecessor`.

Resolves PR43569.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68659

llvm-svn: 374177
2019-10-09 15:54:24 +00:00
Jinsong Ji 9912232b46 Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize"
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"

This reverts commit 9f41deccc0.
This reverts commit 18b6fe07bc.

The patch is breaking PowerPC internal build, checked with author, reverting
on behalf of him for now due to timezone.

llvm-svn: 374091
2019-10-08 17:32:56 +00:00
Graham Hunter b302561b76 [SVE][IR] Scalable Vector size queries and IR instruction support
* Adds a TypeSize struct to represent the known minimum size of a type
  along with a flag to indicate that the runtime size is a integer multiple
  of that size
* Converts existing size query functions from Type.h and DataLayout.h to
  return a TypeSize result
* Adds convenience methods (including a transparent conversion operator
  to uint64_t) so that most existing code 'just works' as if the return
  values were still scalars.
* Uses the new size queries along with ElementCount to ensure that all
  supported instructions used with scalable vectors can be constructed
  in IR.

Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen

Reviewed By: rovka, sdesmalen

Differential Revision: https://reviews.llvm.org/D53137

llvm-svn: 374042
2019-10-08 12:53:54 +00:00
Zi Xuan Wu 9f41deccc0 [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374017
2019-10-08 03:28:33 +00:00
Jordan Rose fdaa742174 Second attempt to add iterator_range::empty()
Doing this makes MSVC complain that `empty(someRange)` could refer to
either C++17's std::empty or LLVM's llvm::empty, which previously we
avoided via SFINAE because std::empty is defined in terms of an empty
member rather than begin and end. So, switch callers over to the new
method as it is added.

https://reviews.llvm.org/D68439

llvm-svn: 373935
2019-10-07 18:14:24 +00:00
Martin Storsjo dfc1aee25b Revert "[SLP] avoid reduction transform on patterns that the backend can load-combine"
This reverts SVN r373833, as it caused a failed assert "Non-zero loop
cost expected" on building numerous projects, see PR43582 for details
and reproduction samples.

llvm-svn: 373882
2019-10-07 08:21:37 +00:00
Whitney Tsang dcb75bf843 [LOOPGUARD] Remove asserts in getLoopGuardBranch
Summary: The assertion in getLoopGuardBranch can be a 'return nullptr'
under if condition.
Authored By: DTharun
Reviewer: Whitney, fhahn
Reviewed By: Whitney, fhahn
Subscribers: fhahn, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D66084

llvm-svn: 373857
2019-10-06 16:39:43 +00:00
Sanjay Patel e2321bb448 [SLP] avoid reduction transform on patterns that the backend can load-combine
I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146

We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but not do the optimization itself (
it's not a vector combine anyway, so it's probably out-of-scope for SLP).

Here, we add a scalar cost model adjustment with a conservative pattern match and cost
summation for a multi-instruction sequence that can probably be reduced later.
This should prevent SLP from creating a vector reduction unless that sequence is
extremely cheap.

In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:

  movbe   rax, qword ptr [rdi]

or:

  mov     rax, qword ptr [rdi]

Not some (half) vector monstrosity as we currently do using SLP:

  vpmovzxbq       ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
  vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
  movzx   eax, byte ptr [rdi]
  movzx   ecx, byte ptr [rdi + 5]
  shl     rcx, 40
  movzx   edx, byte ptr [rdi + 6]
  shl     rdx, 48
  or      rdx, rcx
  movzx   ecx, byte ptr [rdi + 7]
  shl     rcx, 56
  or      rcx, rdx
  or      rcx, rax
  vextracti128    xmm1, ymm0, 1
  vpor    xmm0, xmm0, xmm1
  vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
  vpor    xmm0, xmm0, xmm1
  vmovq   rax, xmm0
  or      rax, rcx
  vzeroupper
  ret

Differential Revision: https://reviews.llvm.org/D67841

llvm-svn: 373833
2019-10-05 18:03:58 +00:00
Alina Sbirlea 24ae5ce54b [MemorySSA] Update Phi creation when inserting a Def.
MemoryPhis should be added in the IDF of the blocks newly gaining Defs.
This includes the blocks that gained a Phi and the block gaining a Def,
if the block did not have one before.
Resolves PR43427.

llvm-svn: 373505
2019-10-02 18:42:33 +00:00
Aditya Kumar 0cacf136fc Fix: Actually erase remove the elements from AssumeHandles
Reviewers: sdmitriev, tejohnson

Reviewed by: tejohnson

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68318

llvm-svn: 373494
2019-10-02 17:35:06 +00:00
Simon Pilgrim b635964abc MemorySSAUpdater::applyInsertUpdates - silence static analyzer dyn_cast<MemoryAccess> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<MemoryAccess> directly and if not assert will fire for us.

llvm-svn: 373467
2019-10-02 13:09:12 +00:00
Simon Pilgrim 65e1150988 MemorySSA tryOptimizePhi - assert that we've found a DefChainEnd. NFCI.
Silences static analyzer null dereference warning.

llvm-svn: 373466
2019-10-02 13:09:04 +00:00
Simon Pilgrim e2ded3d131 LoopAccessAnalysis isConsecutiveAccess() - silence static analyzer dyn_cast<SCEVConstant> null dereference warning. NFCI.
The static analyzer is warning about potential null dereferences, but in these cases we should be able to use cast<SCEVConstant> directly and if not assert will fire for us.

llvm-svn: 373465
2019-10-02 13:08:56 +00:00
Florian Hahn 067ed96e8e [InstCombine] Simplify fma multiplication to nan for undef or nan operands.
In similar fashion to D67721, we can simplify FMA multiplications if any
of the operands is NaN or undef. In instcombine, we will simplify the
FMA to an fadd with a NaN operand, which in turn gets folded to NaN.

Note that this just changes SimplifyFMAFMul, so we still not catch the
case where only the Add part of the FMA is Nan/Undef.

Reviewers: cameron.mcinally, mcberg2017, spatel, arsenm

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D68265

llvm-svn: 373459
2019-10-02 12:32:52 +00:00
Sanjay Patel be21ceb565 [InstSimplify] fold fma/fmuladd with a NaN or undef operand
This is intended to be similar to the constant folding results from
D67446
and earlier, but not all operands are constant in these tests, so the
responsibility for folding is left to InstSimplify.

Differential Revision: https://reviews.llvm.org/D67721

llvm-svn: 373455
2019-10-02 12:12:02 +00:00
Jay Foad c38188c5fe Remove an unnecessary cast. NFC.
llvm-svn: 373434
2019-10-02 08:56:33 +00:00
Bardia Mahjour 91b62d5c89 [DDG] Data Dependence Graph - Root Node
Summary:
This patch adds Root Node to the DDG. The purpose of the root node is to create a single entry node that allows graph walk iterators to iterate through all nodes of the graph, making sure that no node is left unvisited during a graph walk (eg. SCC or DFS). Once the DDG is fully constructed it will have exactly one root node. Every node in the graph is reachable from the root. The algorithm for connecting the root node is based on depth-first-search that keeps track of visited nodes to try to avoid creating unnecessary edges.

Authored By: bmahjour

Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

Reviewed By: Meinersbur

Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack

Tag: #llvm

Differential Revision: https://reviews.llvm.org/D67970

llvm-svn: 373386
2019-10-01 19:32:42 +00:00
Alina Sbirlea 890090f7f5 [MemorySSA] Check for unreachable blocks when getting last definition.
If a single predecessor is found, still check if the block is
unreachable. The test that found this had a self loop unreachable block.
Resolves PR43493.

llvm-svn: 373383
2019-10-01 19:09:50 +00:00
Alina Sbirlea ae40dfc1e3 [MemorySSA] Update last_access_in_block check.
The check for "was there an access in this block" should be: is the last
access in this block and is it not a newly inserted phi.
Resolves new test in PR43438.

Also fix a typo when simplifying trivial Phis to match the comment.

llvm-svn: 373380
2019-10-01 18:34:39 +00:00
Sam Parker 95aee9da4c [NFC][HardwareLoops] Update some iterators
llvm-svn: 373309
2019-10-01 07:53:28 +00:00
Evandro Menezes 22cb3d2e58 [ConstantFolding] Fold constant calls to log2()
Somehow, folding calls to `log2()` with a constant was missing.

Differential revision: https://reviews.llvm.org/D67300

llvm-svn: 373262
2019-09-30 20:53:23 +00:00
Tim Northover 58e8c793d0 Revert "[SCEV] add no wrap flag for SCEVAddExpr."
This reverts r366419 because the analysis performed is within the context of
the loop and it's only valid to add wrapping flags to "global" expressions if
they're always correct.

llvm-svn: 373184
2019-09-30 07:46:52 +00:00
Sanjay Patel 8cecc30c99 [InstSimplify] generalize FP folds with undef/NaN; NFC
We can reuse this logic for things like fma.

llvm-svn: 373119
2019-09-27 20:09:09 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
Wei Mi 9c8efeda5c Revert "[LoopInfo] Limit the iterations to check whether a loop has dedicated
exits"

Get a better approach in https://reviews.llvm.org/D68107 to solve the problem.
Revert the initial patch and will commit the new one soon.

This reverts commit rL372990.

llvm-svn: 373044
2019-09-27 05:43:30 +00:00
Whitney Tsang 9c5fbcf920 [LOOPGUARD] Disable loop with multiple loop exiting blocks.
Summary: As discussed in the loop group meeting. With the current
definition of loop guard, we should not allow multiple loop exiting
blocks. For loops that has multiple loop exiting blocks, we can simply
unable to find the loop guard.
When getUniqueExitBlock() obtains a vector size not equals to one, that
means there is either no exit blocks or there exists more than one
unique block the loop exit to.
If we don't disallow loop with multiple loop exit blocks, then with our
current implementation, there can exist exit blocks don't post dominated
by the non pre-header successor of the guard block.
Reviewer: reames, Meinersbur, kbarton, etiotto, bmahjour
Reviewed By: Meinersbur, kbarton
Subscribers: fhahn, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D66529

llvm-svn: 373011
2019-09-26 20:20:42 +00:00
Simon Pilgrim 514e6b6e6e ConstantFold - silence static analyzer dyn_cast<ExtractValueInst> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<ExtractValueInst> directly and if not assert will fire for us.

llvm-svn: 372993
2019-09-26 16:30:36 +00:00
Wei Mi 67d93f0d91 [LoopInfo] Limit the iterations to check whether a loop has dedicated exits
for extreme large case.

We had a case that a single loop which has 4000 exits and the average number
of predecessors of each exit is > 1000, and we found compiling the case spent
a significant amount of time on checking whether a loop has dedicated exits.
This patch adds a limit for the iterations to the check. With the patch, the
time to compile our testcase reduced from 1000s to 200s (clang release build).

Differential Revision: https://reviews.llvm.org/D67359

llvm-svn: 372990
2019-09-26 15:36:25 +00:00
Simon Pilgrim 7573845061 Remove local shadow constant. NFCI.
ValueTracking.cpp already has a local static MaxDepth = 6 constant - this one seems to have been missed when rL124183 landed.

llvm-svn: 372964
2019-09-26 11:30:35 +00:00
Simon Pilgrim 2dcee966ad [ValueTracking] Silence static analyzer dyn_cast<Operator> null dereference warnings. NFCI.
The static analyzer is warning about a potential null dereferences, but since the pointer is only used in a switch statement for Operator::getOpcode() (with an empty default) then its easiest just to wrap this in a null test as the dyn_cast might return null here.

llvm-svn: 372962
2019-09-26 11:09:08 +00:00
Keno Fischer cea8882254 [ConstantFolding] Use FoldBitCast correctly
Previously we might attempt to use a BitCast to turn bits into vectors of pointers,
but that requires an inttoptr cast to be legal. Add an assertion to detect the formation of illegal bitcast attempts
early (in the tests, we often constant-fold away the result before getting to this assertion check),
while being careful to still handle the early-return conditions without adding extra complexity in the result.

Patch by Jameson Nash <jameson@juliacomputing.com>.

Differential Revision: https://reviews.llvm.org/D65057

llvm-svn: 372940
2019-09-26 02:07:51 +00:00
Alina Sbirlea 6720ed851b [MemorySSA] Avoid adding Phis in the presence of unreachable blocks.
Summary:
If a block has all incoming values with the same MemoryAccess (ignoring
incoming values from unreachable blocks), then use that incoming
MemoryAccess and do not create a Phi in the first place.

Revert IDF work-around added in rL372673; it should not be required unless
the Def inserted is the first in its block.

The patch also cleans up a series of tests, added during the many
iterations on insertDef.

The patch also fixes PR43438.
The same issue that occurs in insertDef with "adding phis, hence the IDF of
Phis is needed", can also occur in fixupDefs: the `getPreviousRecursive`
call only adds Phis walking on the predecessor edges, which means there
may be the case of a Phi added walking the CFG "backwards" which
triggers the needs for an additional Phi in successor blocks.
Such Phis are added during fixupDefs only in the presence of unreachable
blocks.
Hence this highlights the need to avoid adding Phis in blocks with
unreachable predecessors in the first place.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67995

llvm-svn: 372932
2019-09-25 23:24:39 +00:00
Roman Lebedev 914a3d1cf2 [InstSimplify] Handle more 'A </>/>=/<= B &&/|| (A - B) !=/== 0' patterns (PR43251)
https://rise4fun.com/Alive/sl9s
https://rise4fun.com/Alive/2plN

https://bugs.llvm.org/show_bug.cgi?id=43251

llvm-svn: 372928
2019-09-25 22:59:41 +00:00
Florian Hahn d663efe23a [InstSimplify] Match 1.0 and 0.0 for both operands in SimplifyFMAMul
Because we do not constant fold multiplications in SimplifyFMAMul,
we match 1.0 and 0.0 for both operands, as multiplying by them
is guaranteed to produce an exact result (if it is allowed to do so).

Note that it is not enough to just swap the operands to ensure a
constant is on the RHS, as we want to also cover the case with
2 constants.

Reviewers: lebedev.ri, spatel, reames, scanon

Reviewed By: lebedev.ri, reames

Differential Revision: https://reviews.llvm.org/D67553

llvm-svn: 372915
2019-09-25 19:33:26 +00:00
Florian Hahn f3ab99dcf8 [InstCombine] Limit FMul constant folding for fma simplifications.
As @reames pointed out post-commit, rL371518 adds additional rounding
in some cases, when doing constant folding of the multiplication.
This breaks a guarantee llvm.fma makes and must be avoided.

This patch reapplies rL371518, but splits off the simplifications not
requiring rounding from SimplifFMulInst as SimplifyFMAFMul.

Reviewers: spatel, lebedev.ri, reames, scanon

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D67434

llvm-svn: 372899
2019-09-25 17:03:20 +00:00
Florian Hahn 5c3bc3c930 [PatternMatch] Make m_Br more flexible, add matchers for BB values.
Currently m_Br only takes references to BasicBlock*, which limits its
flexibility. For example, you have to declare a variable, even if you
ignore the result or you have to have additional checks to make sure the
matched BB matches an expected one.

This patch adds m_BasicBlock and m_SpecificBB matchers, which can be
used like the existing matchers for constants or values.

I also had a look at the existing uses and updated a few. IMO it makes
the code a bit more explicit.

Reviewers: spatel, craig.topper, RKSimon, majnemer, lebedev.ri

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D68013

llvm-svn: 372885
2019-09-25 15:05:08 +00:00
Sanjay Patel 6d4ea22e70 [IR] allow fast-math-flags on phi of FP values (2nd try)
The changes here are based on the corresponding diffs for allowing FMF on 'select':
D61917 <https://reviews.llvm.org/D61917>

As discussed there, we want to have fast-math-flags be a property of an FP value
because the alternative (having them on things like fcmp) leads to logical
inconsistency such as:
https://bugs.llvm.org/show_bug.cgi?id=38086

The earlier patch for select made almost no practical difference because most
unoptimized conditional code begins life as a phi (based on what I see in clang).
Similarly, I don't expect this patch to do much on its own either because
SimplifyCFG promptly drops the flags when converting to select on a minimal
example like:
https://bugs.llvm.org/show_bug.cgi?id=39535

But once we have this plumbing in place, we should be able to wire up the FMF
propagation and start solving cases like that.

The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a
regression in a LoopVectorize test. We are intersecting the FMF of any
FPMathOperator there, so if a phi is not properly annotated, new math
instructions may not be either. Once we fix the propagation in SimplifyCFG, it
may be safe to remove that hack.

Differential Revision: https://reviews.llvm.org/D67564

llvm-svn: 372878
2019-09-25 14:35:02 +00:00
Sanjay Patel 2cec4b58f5 Revert [IR] allow fast-math-flags on phi of FP values
This reverts r372866 (git commit dec03223a9)

llvm-svn: 372868
2019-09-25 13:29:09 +00:00
Sanjay Patel dec03223a9 [IR] allow fast-math-flags on phi of FP values
The changes here are based on the corresponding diffs for allowing FMF on 'select':
D61917

As discussed there, we want to have fast-math-flags be a property of an FP value
because the alternative (having them on things like fcmp) leads to logical
inconsistency such as:
https://bugs.llvm.org/show_bug.cgi?id=38086

The earlier patch for select made almost no practical difference because most
unoptimized conditional code begins life as a phi (based on what I see in clang).
Similarly, I don't expect this patch to do much on its own either because
SimplifyCFG promptly drops the flags when converting to select on a minimal
example like:
https://bugs.llvm.org/show_bug.cgi?id=39535

But once we have this plumbing in place, we should be able to wire up the FMF
propagation and start solving cases like that.

The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a
regression in a LoopVectorize test. We are intersecting the FMF of any
FPMathOperator there, so if a phi is not properly annotated, new math
instructions may not be either. Once we fix the propagation in SimplifyCFG, it
may be safe to remove that hack.

Differential Revision: https://reviews.llvm.org/D67564

llvm-svn: 372866
2019-09-25 13:14:12 +00:00
Artur Pilipenko 5c1447cd43 [SCEV] Disable canonical expansion for non-affine addrecs.
Reviewed By: apilipenko

Differential Revision: https://reviews.llvm.org/D65276

Patch by Evgeniy Brevnov (ybrevnov@azul.com)

llvm-svn: 372789
2019-09-24 23:21:07 +00:00
Hiroshi Yamauchi 857424d185 [PGO][PGSO] ProfileSummary changes.
(Split of off D67120)

ProfileSummary changes for profile guided size optimization.

Differential Revision: https://reviews.llvm.org/D67377

llvm-svn: 372783
2019-09-24 22:17:51 +00:00
Alina Sbirlea 2c5e6646ef [MemorySSA] Update Phi insertion.
Summary:
MemoryPhis may be needed following a Def insertion inthe IDF of all the
new accesses added (phis + potentially a def). Ensure this also  occurs when
only the new MemoryPhis are the defining accesses.

Note: The need for computing IDF here is because of new Phis added with
edges incoming from unreachable code, Phis that had previously been
simplified. The preferred solution is to not reintroduce such Phis.
This patch is the needed fix while working on the preferred solution.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67927

llvm-svn: 372673
2019-09-23 23:50:16 +00:00
Simon Pilgrim f62293e8fe [ValueTracking] Fix uninitialized variable warnings in matchSelectPattern const wrapper. NFCI.
Static analyzer complains about const_cast uninitialized variables, we should explicitly set these to null.

Ideally that const wrapper would go away though.......

llvm-svn: 372603
2019-09-23 13:15:52 +00:00
Roman Lebedev baf809811b [InstSimplify] simplifyUnsignedRangeCheck(): X >= Y && Y == 0 --> Y == 0
https://rise4fun.com/Alive/v9Y4

llvm-svn: 372491
2019-09-21 22:27:39 +00:00
Roman Lebedev e94f156f77 [InstSimplify][NFC] Reorganize simplifyUnsignedRangeCheck() to emphasize and/or symmetry
Only a single `X >= Y && Y == 0  -->  Y == 0` fold appears to be missing.

llvm-svn: 372490
2019-09-21 22:27:28 +00:00
Teresa Johnson 2f32e5d84d [Inliner] Remove incorrect early exit during switch cost computation
Summary:
The CallAnalyzer::visitSwitchInst has an early exit when the estimated
lower bound of the switch cost will put the overall cost of the inline
above the threshold. However, this code is not correctly estimating the
lower bound for switches that can be transformed into bit tests, leading
to unnecessary lost inlines, and also differing behavior with
optimization remarks enabled.

First, the early exit is controlled by whether ComputeFullInlineCost is
enabled or not, and that in turn is disabled by default but enabled when
enabling -pass-remarks=missed. This by itself wouldn't lead to a
problem, except that as described below, the lower bound can be above
the real lower bound, so we can sometimes get different inline decisions
with inline remarks enabled, which is problematic.

The early exit was added in along with a new switch cost model in D31085.
The reason why this early exit was added is due to a concern one reviewer
raised about compile time for large switches:
https://reviews.llvm.org/D31085?id=94559#inline-276200

However, the code just below there calls
getEstimatedNumberOfCaseClusters, which in turn immediately calls
BasicTTIImpl getEstimatedNumberOfCaseClusters, which in the worst case
does a linear scan of the cases to get the high and low values. The
bit test handling in particular is guarded by whether the number of
cases fits into the max bit width. There is no suggestion that anyone
measured a compile time issue, it appears to be theoretical.

The problem is that the reviewer's comment about the lower bound
calculation is incorrect, specifically in the case of a switch that can
be lowered to a bit test. This isn't followed up on the comment
thread, but the author does add a FIXME to that effect above the early
exit added when they subsequently revised the patch.

As a result, we were incorrectly early exiting and not inlining
functions with switch statements that would be lowered to bit tests in
cases where we were nearing the threshold. Combined with the fact that
this early exit was skipped with opt remarks enabled, this caused
different inlining decisions to be made when -pass-remarks=missed is
enabled to debug the missing inline.

Remove the early exit for the above reasons.

I also copied over an existing AArch64 inlining test to X86, and
adjusted the threshold so that the bit test inline only occurs with the
fix in this patch.

Reviewers: davidxl

Subscribers: eraman, kristof.beyls, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67716

llvm-svn: 372440
2019-09-20 23:29:17 +00:00
Shoaib Meenai d89f2d872d [Analysis] Allow -scalar-evolution-max-iterations more than once
At present, `-scalar-evolution-max-iterations` is a `cl::Optional`
option, which means it demands to be passed exactly zero or one times.
Our build system makes it pretty tricky to guarantee this. We often
accidentally pass the flag more than once (but always with the same
value) which results in an error, after which compilation fails:

```
clang (LLVM option parsing): for the -scalar-evolution-max-iterations option: may only occur zero or one times!
```

It seems reasonable to allow -scalar-evolution-max-iterations to be
passed more than once. Quoting the [[ http://llvm.org/docs/CommandLine.html#controlling-the-number-of-occurrences-required-and-allowed | documentation ]]:

> The cl::ZeroOrMore modifier ... indicates that your program will allow the option to be specified zero or more times.
> ...
> If an option is specified multiple times for an option of the cl::opt class, only the last value will be retained.

Original patch by: Enrico Bern Hardy Tanuwidjaja <etanuwid@fb.com>

Differential Revision: https://reviews.llvm.org/D67512

llvm-svn: 372346
2019-09-19 18:21:32 +00:00
Francesco Petrogalli cb032aa2c7 [SVFS] Vector Function ABI demangling.
This patch implements the demangling functionality as described in the
Vector Function ABI. This patch will be used to implement the
SearchVectorFunctionSystem (SVFS) as described in the RFC:

http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html

A fuzzer is added to test the demangling utility.

Patch by Sumedh Arani <sumedh.arani@arm.com>

Differential revision: https://reviews.llvm.org/D66024

llvm-svn: 372343
2019-09-19 17:47:32 +00:00
Bardia Mahjour db800c267d Data Dependence Graph Basics
Summary:
This is the first patch in a series of patches that will implement data dependence graph in LLVM. Many of the ideas used in this implementation are based on the following paper:
D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS.
This patch contains support for a basic DDGs containing only atomic nodes (one node for each instruction). The edges are two fold: def-use edges and memory-dependence edges.
The implementation takes a list of basic-blocks and only considers dependencies among instructions in those basic blocks. Any dependencies coming into or going out of instructions that do not belong to those basic blocks are ignored.

The algorithm for building the graph involves the following steps in order:

  1. For each instruction in the range of basic blocks to consider, create an atomic node in the resulting graph.
  2. For each node in the graph establish def-use edges to/from other nodes in the graph.
  3. For each pair of nodes containing memory instruction(s) create memory edges between them. This part of the algorithm goes through the instructions in lexicographical order and creates edges in reverse order if the sink of the dependence occurs before the source of it.

Authored By: bmahjour

Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

Reviewed By: Meinersbur, fhahn, myhsu

Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto

Tag: #llvm

Differential Revision: https://reviews.llvm.org/D65350

llvm-svn: 372238
2019-09-18 17:43:45 +00:00
Jay Foad c92e51d84b [SDA] Don't stop divergence propagation at the IPD.
Summary:
This fixes B42473 and B42706.

This patch makes the SDA propagate branch divergence until the end of the RPO traversal. Before, the SyncDependenceAnalysis propagated divergence only until the IPD in rpo order. RPO is incompatible with post dominance in the presence of loops. This made the SDA crash because blocks were missed in the propagation.

Reviewers: foad, nhaehnle

Reviewed By: foad

Subscribers: jvesely, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65274

llvm-svn: 372223
2019-09-18 13:40:22 +00:00
Bardia Mahjour 6476d7cf0b Revert "Data Dependence Graph Basics"
This reverts commit c98ec60993, which broke the sphinx-docs build.

llvm-svn: 372168
2019-09-17 19:22:01 +00:00
Bardia Mahjour c98ec60993 Data Dependence Graph Basics
Summary:
This is the first patch in a series of patches that will implement data dependence graph in LLVM. Many of the ideas used in this implementation are based on the following paper:
D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS.
This patch contains support for a basic DDGs containing only atomic nodes (one node for each instruction). The edges are two fold: def-use edges and memory-dependence edges.
The implementation takes a list of basic-blocks and only considers dependencies among instructions in those basic blocks. Any dependencies coming into or going out of instructions that do not belong to those basic blocks are ignored.

The algorithm for building the graph involves the following steps in order:

  1. For each instruction in the range of basic blocks to consider, create an atomic node in the resulting graph.
  2. For each node in the graph establish def-use edges to/from other nodes in the graph.
  3. For each pair of nodes containing memory instruction(s) create memory edges between them. This part of the algorithm goes through the instructions in lexicographical order and creates edges in reverse order if the sink of the dependence occurs before the source of it.

Authored By: bmahjour

Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

Reviewed By: Meinersbur, fhahn, myhsu

Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto

Tag: #llvm

Differential Revision: https://reviews.llvm.org/D65350

llvm-svn: 372162
2019-09-17 18:55:44 +00:00
Alina Sbirlea 4e9082ef95 [MemorySSA] Fix phi insertion when inserting a def.
Summary:
When inserting a Def, the current algorithm is walking edges backward
and inserting new Phis where needed. There may be additional Phis needed
in the IDF of the newly inserted Def and Phis.
Adding Phis in the IDF of the Def was added ina  previous patch, but we
may also need other Phis in the IDF of the newly added Phis.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67637

llvm-svn: 372138
2019-09-17 16:33:35 +00:00
Alina Sbirlea 6b2d1346d8 [MemorySSA] Update MSSA for non-conventional AA.
Summary:
Regularly when moving an instruction that may not read or write memory,
the instruction is not modelled in MSSA, so not action is necessary.
For a non-conventional AA pipeline, MSSA needs to explicitly check when
creating accesses, so as to not model instructions that may not read and
write memory.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67562

llvm-svn: 372137
2019-09-17 16:31:37 +00:00
Simon Pilgrim c52a7093df InterleavedAccessInfo - Don't dereference a dyn_cast result. NFCI.
llvm-svn: 372117
2019-09-17 13:25:56 +00:00
David Bolvansky be2487a2ba [InstCombine] Annotate strdup with deref_or_null
llvm-svn: 372098
2019-09-17 10:12:48 +00:00
Guillaume Chatelet 98cb8db836 [NFC] remove unused functions
Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67616

llvm-svn: 371994
2019-09-16 14:48:58 +00:00
Roman Lebedev 9c5a4a4527 [InstSimplify] simplifyUnsignedRangeCheck(): handle few tautological cases (PR43251)
Summary:
This is split off from D67356, since these cases produce a constant,
no real need to keep them in instcombine.

Alive proofs:
https://rise4fun.com/Alive/u7Fk
https://rise4fun.com/Alive/4lV

https://bugs.llvm.org/show_bug.cgi?id=43251

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67498

llvm-svn: 371921
2019-09-14 13:47:27 +00:00
Philip Reames bdf608477e [SCEV] Add smin support to getRangeRef
We were failing to compute trip counts (both exact and maximum) for any loop which involved a comparison against either an umin or smin. It looks like this simply got missed when we added smin/umin to SCEV.  (Note: umin was submitted separately earlier today.  Turned out two folks hit this at the same time.)

Differential Revision: https://reviews.llvm.org/D67514

llvm-svn: 371776
2019-09-12 21:32:27 +00:00
Evandro Menezes 08df6e64d5 [ConstantFolding] Expand folding of some library functions
Expanding the folding of `nearbyint()`, `rint()` and `trunc()` to library
functions, in addition to the current support for intrinsics.

Differential revision: https://reviews.llvm.org/D67468

llvm-svn: 371774
2019-09-12 21:23:22 +00:00
Florian Hahn a31ee37624 [SCEV] Support SCEVUMinExpr in getRangeRef.
This patch adds support for SCEVUMinExpr to getRangeRef,
similar to the support for SCEVUMaxExpr.

Reviewers: sanjoy.google, efriedma, reames, nikic

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D67177

llvm-svn: 371768
2019-09-12 20:03:32 +00:00
Alina Sbirlea 18f5204db4 [LICM/AST] Check if the AliasAny set is removed from the tracker.
Summary:
Resolves PR38513.
Credit to @bjope for debugging this.

Reviewers: hfinkel, uabelho, bjope

Subscribers: sanjoy.google, bjope, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67417

llvm-svn: 371752
2019-09-12 18:09:47 +00:00
Philip Reames b90f94f42e [LV] Support invariant addresses in speculation logic
Implement a TODO from rL371452, and handle loop invariant addresses in predicated blocks. If we can prove that the load is safe to speculate into the header, then we can avoid using a masked.load in favour of a normal load.

This is mostly about vectorization robustness. In the common case, it's generally expected that LICM/LoadStorePromotion would have eliminated such loads entirely.

Differential Revision: https://reviews.llvm.org/D67372

llvm-svn: 371745
2019-09-12 16:49:10 +00:00
Sanjay Patel 3f5a808365 [ConstProp] allow folding for fma that produces NaN
Folding for fma/fmuladd was added here:
rL202914
...and as seen in existing/unchanged tests, that works to propagate NaN
if it's already an input, but we should fold an fma() that creates NaN too.

From IEEE-754-2008 7.2 "Invalid Operation", there are 2 clauses that apply
to fma, so I added tests for those patterns:

  c) fusedMultiplyAdd: fusedMultiplyAdd(0, ∞, c) or fusedMultiplyAdd(∞, 0, c)
     unless c is a quiet NaN; if c is a quiet NaN then it is implementation
     defined whether the invalid operation exception is signaled
  d) addition or subtraction or fusedMultiplyAdd: magnitude subtraction of
     infinities, such as: addition(+∞, −∞)

Differential Revision: https://reviews.llvm.org/D67446

llvm-svn: 371735
2019-09-12 14:10:50 +00:00
Roman Lebedev f1286621eb [InstSimplify] simplifyUnsignedRangeCheck(): handle more cases (PR43251)
Summary:
I don't have a direct motivational case for this,
but it would be good to have this for completeness/symmetry.

This pattern is basically the motivational pattern from
https://bugs.llvm.org/show_bug.cgi?id=43251
but with different predicate that requires that the offset is non-zero.

The completeness bit comes from the fact that a similar pattern (offset != zero)
will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259,
so it'd seem to be good to not overlook very similar patterns..

Proofs: https://rise4fun.com/Alive/21b

Also, there is something odd with `isKnownNonZero()`, if the non-zero
knowledge was specified as an assumption, it didn't pick it up (PR43267)

Reviewers: spatel, nikic, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67411

llvm-svn: 371718
2019-09-12 09:26:17 +00:00
Evandro Menezes ed5f452645 [ConstantFolding] Refactor math functions to use LLVM ones (NFC)
When possible, replace calls to library routines on the host with equivalent
ones in LLVM.

Differential revision: https://reviews.llvm.org/D67459

llvm-svn: 371677
2019-09-11 21:46:57 +00:00
Roman Lebedev 00c1ee48e4 [InstSimplify] Pass SimplifyQuery into simplifyUnsignedRangeCheck() and use it for isKnownNonZero()
This was actually the original intention in D67332,
but i messed up and forgot about it.
This patch was originally part of D67411, but precommitting this.

llvm-svn: 371630
2019-09-11 15:32:46 +00:00
Tim Renouf c26b3940c3 [TLI][AMDGPU] AMDPAL does not have library functions
Configure TLI to say that r600/amdgpu does not have any library
functions, such that InstCombine does not do anything like turn sin/cos
into the library function @tan with sufficient fast math flags.

Differential Revision: https://reviews.llvm.org/D67406

Change-Id: I02f907d3e64832117ea9800e9f9285282856e5df
llvm-svn: 371592
2019-09-11 07:26:39 +00:00
Alina Sbirlea f7b4022db1 [MemorySSA] Do not create memoryaccesses for debug info intrinsics.
Summary:
Do not model debuginfo intrinsics in MemorySSA.
Regularly these are non-memory modifying instructions. With -disable-basicaa, they were being modelled as Defs.

Reviewers: george.burgess.iv

Subscribers: aprantl, Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67307

llvm-svn: 371565
2019-09-10 22:35:27 +00:00
Philip Reames cffa630c80 [Loads] Move generic code out of vectorizer into a location it might be reused [NFC]
llvm-svn: 371558
2019-09-10 21:33:53 +00:00
Philip Reames 1e1db80048 [ValueTracking] Factor our common speculation suppression logic [NFC]
Expose a utility function so that all places which want to suppress speculation (when otherwise legal) due to ordering and/or sanitizer interaction can do so.

llvm-svn: 371556
2019-09-10 21:12:29 +00:00
Guozhi Wei b329e0728b [BPI] Adjust the probability for floating point unordered comparison
Since NaN is very rare in normal programs, so the probability for floating point unordered comparison should be extremely small. Current probability is 3/8, it is too large, this patch changes it to a tiny number.

Differential Revision: https://reviews.llvm.org/D65303

llvm-svn: 371541
2019-09-10 17:25:11 +00:00
Roman Lebedev 6e2c5c8710 [InstSimplify] simplifyUnsignedRangeCheck(): if we know that X != 0, handle more cases (PR43246)
Summary:
This is motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.

In this particular case, given
```
char* test(char& base, unsigned long offset) {
  return &base + offset;
}
```
it will end up producing something like
https://godbolt.org/z/LK5-iH
which after optimizations reduces down to roughly
```
define i1 @t0(i8* nonnull %base, i64 %offset) {
  %base_int = ptrtoint i8* %base to i64
  %adjusted = add i64 %base_int, %offset
  %non_null_after_adjustment = icmp ne i64 %adjusted, 0
  %no_overflow_during_adjustment = icmp uge i64 %adjusted, %base_int
  %res = and i1 %non_null_after_adjustment, %no_overflow_during_adjustment
  ret i1 %res
}
```
Without D67122 there was no `%non_null_after_adjustment`,
and in this particular case we can get rid of the overhead:

Here we add some offset to a non-null pointer,
and check that the result does not overflow and is not a null pointer.
But since the base pointer is already non-null, and we check for overflow,
that overflow check will already catch the null pointer,
so the separate null check is redundant and can be dropped.

Alive proofs:
https://rise4fun.com/Alive/WRzq

There are more patterns of "unsigned-add-with-overflow", they are not handled here,
but this is the main pattern, that we currently consider canonical,
so it makes sense to handle it.

https://bugs.llvm.org/show_bug.cgi?id=43246

Reviewers: spatel, nikic, vsk

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits, reames

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67332

llvm-svn: 371349
2019-09-08 20:14:15 +00:00
Bjorn Pettersson 5e331e4ce8 [Intrinsic] Add the llvm.umul.fix.sat intrinsic
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.

This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.

Patch by: leonardchan, bjope

Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel

Reviewed By: leonardchan

Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57836

llvm-svn: 371308
2019-09-07 12:16:14 +00:00
Nikita Popov fdc6977ff3 [LVI] Look through extractvalue of insertvalue
This addresses the issue mentioned on D19867. When we simplify
with.overflow instructions in CVP, we leave behind extractvalue
of insertvalue sequences that LVI no longer understands. This
means that we can not simplify any instructions based on the
with.overflow anymore (until some over pass like InstCombine
cleans them up).

This patch extends LVI extractvalue handling by calling
SimplifyExtractValueInst (which doesn't do anything more than
constant folding + looking through insertvalue) and using the block
value of the simplification.

A possible alternative would be to do something similar to
SimplifyIndVars, where we instead directly try to replace
extractvalue users of the with.overflow. This would need some
additional structural changes to CVP, as it's currently not legal
to remove anything but the current instruction -- we'd have to
introduce a worklist with instructions scheduled for deletion or similar.

Differential Revision: https://reviews.llvm.org/D67035

llvm-svn: 371306
2019-09-07 12:03:59 +00:00
Teresa Johnson 9c27b59cec Change TargetLibraryInfo analysis passes to always require Function
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.

This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.

Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.

There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.

Reviewers: chandlerc, hfinkel

Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66428

llvm-svn: 371284
2019-09-07 03:09:36 +00:00
Evandro Menezes 7feb812ccd [ConstantFolding] Refactor functions not available before C99 (NFC)
Note the cases when calling a function at compile time may fail if the host
does not support the C99 run time library.

llvm-svn: 371236
2019-09-06 18:24:21 +00:00
Evandro Menezes 6f1369755d [ConstantFolding] Refactor function match for better speed (NFC)
Use an `enum` instead of string comparison to match the candidate function.

llvm-svn: 371228
2019-09-06 16:49:49 +00:00
Guillaume Chatelet 33671ceffa [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67223

llvm-svn: 371063
2019-09-05 13:09:42 +00:00
Alina Sbirlea 6da79ce1fe [MemorySSA] Re-enable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370957
2019-09-04 19:16:04 +00:00
Philip Reames 4228245e41 [NFC] Switch last couple of invariant_load checks to use hasMetadata
llvm-svn: 370948
2019-09-04 18:27:31 +00:00
Evandro Menezes 3b705ef712 [TargetLibraryInfo] Define enumerator for no library function (NFC)
Add a null enumerator do designate no library function.

llvm-svn: 370947
2019-09-04 18:15:58 +00:00
Philip Reames 27820f9909 [Instruction] Add hasMetadata(Kind) helper [NFC]
It's a common idiom, so let's add the obvious wrapper for metadata kinds which are basically booleans.

llvm-svn: 370933
2019-09-04 17:28:48 +00:00
Alina Sbirlea 594f0e0927 [MemorySSA] Move two verify calls under expensive checks.
llvm-svn: 370831
2019-09-04 00:44:54 +00:00
Alina Sbirlea ccb1862bc9 [MemorySSA] Disable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370821
2019-09-03 21:20:46 +00:00
Alina Sbirlea e331d50534 [MemorySSA] Re-enable MemorySSA use.
Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370811
2019-09-03 19:28:37 +00:00
Roman Lebedev ff21e3f055 [ConstantFolding] Fix 'undef' folding for @llvm.[us]{add,sub}.with.overflow ops (PR43188)
As we have already established/fixed in
  https://bugs.llvm.org/show_bug.cgi?id=42209
  https://reviews.llvm.org/D63065
  https://reviews.llvm.org/rL363522
the InstSimplify handling for @llvm.with.overflow ops with undefs
is correct. Therefore if ConstantFolding produces different results,
then it is wrong.

This duplication of code hints at the need for some refactoring,
but for now address the brokenness of ConstantFolding by
copying the known-good handling from rL363522.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43188

llvm-svn: 370608
2019-09-01 11:56:52 +00:00
Nikita Popov ac5821395b [LVI] Extract solveBlockValueExtractValue(); NFC
Extract this method in preparation for additional extractvalue
support.

llvm-svn: 370575
2019-08-31 09:58:50 +00:00
Alina Sbirlea 3d03769ba0 [MemorySSA] Rename all phi entries.
When renaming Phis incoming values, there may be multiple edges incoming
from the same block (switch). Rename all.

llvm-svn: 370548
2019-08-30 23:02:53 +00:00
Alina Sbirlea 4b87023bae Revert enabling MemorySSA.
Breaks sanitizers bots.

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370397
2019-08-29 19:01:23 +00:00
Alina Sbirlea 6289ee941d [MemorySSA & LoopPassManager] Enable MemorySSA as loop dependency. Update tests.
Summary:
I'm not planning to check this in at the moment, but feedback is very welcome, in particular how this affects performance.
The feedback obtains here will guide the next steps towards enabling this.

This patch enables the use of MemorySSA in the loop pass manager.

Passes that currently use MemorySSA:
 - EarlyCSE
Passes that use MemorySSA after this patch:
 - EarlyCSE
 - LICM
 - SimpleLoopUnswitch
Loop passes that update MemorySSA (and do not use it yet, but could use it after this patch):
 - LoopInstSimplify
 - LoopSimplifyCFG
 - LoopUnswitch
 - LoopRotate
 - LoopSimplify
 - LCSSA
Loop passes that do *not* update MemorySSA:
 - IndVarSimplify
 - LoopDelete
 - LoopIdiom
 - LoopSink
 - LoopUnroll
 - LoopInterchange
 - LoopUnrollAndJam
 - LoopVectorize
 - LoopReroll
 - IRCE

Reviewers: chandlerc, george.burgess.iv, davide, sanjoy, gberry

Subscribers: jlebar, Prazek, dmgreen, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58311

llvm-svn: 370384
2019-08-29 17:08:13 +00:00
Joerg Sonnenberger 799c96693f Allow replaceAndRecursivelySimplify to list unsimplified visitees.
This is part of D65280 and split it to avoid ABI changes on the 9.0
release branch.

llvm-svn: 370355
2019-08-29 13:22:30 +00:00
Roman Lebedev c584786854 [InstSimplify] Drop leftover "division-by-zero guard" around `@llvm.umul.with.overflow` inverted overflow bit
Summary:
Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`,
and with D65147 we've flattened the CFG, we now can see that
the guard may have been there to prevent division by zero is redundant.
We can simply drop it:
```
----------------------------------------
Name: no overflow or zero
  %iszero = icmp eq i4 %y, 0
  %umul = smul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %umul.ov.not = xor %umul.ov, -1
  %retval.0 = or i1 %iszero, %umul.ov.not
  ret i1 %retval.0
=>
  %iszero = icmp eq i4 %y, 0
  %umul = smul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %umul.ov.not = xor %umul.ov, -1
  %retval.0 = or i1 %iszero, %umul.ov.not
  ret i1 %umul.ov.not

Done: 1
Optimization is correct!
```
Note that this is inverted from what we have in a previous patch,
here we are looking for the inverted overflow bit.
And that inversion is kinda problematic - given this particular
pattern we neither hoist that `not` closer to `ret` (then the pattern
would have been identical to the one without inversion,
and would have been handled by the previous patch), neither
do the opposite transform. But regardless, we should handle this too.
I've filled [[ https://bugs.llvm.org/show_bug.cgi?id=42720 | PR42720 ]].

Reviewers: nikic, spatel, xbolva00, RKSimon

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65151

llvm-svn: 370351
2019-08-29 12:48:04 +00:00
Roman Lebedev aaf6ab4410 [InstSimplify] Drop leftover "division-by-zero guard" around `@llvm.umul.with.overflow` overflow bit
Summary:
Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`,
and with D65147 we've flattened the CFG, we now can see that
the guard may have been there to prevent division by zero is redundant.
We can simply drop it:
```
----------------------------------------
Name: no overflow and not zero
  %iszero = icmp ne i4 %y, 0
  %umul = umul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %retval.0 = and i1 %iszero, %umul.ov
  ret i1 %retval.0
=>
  %iszero = icmp ne i4 %y, 0
  %umul = umul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %retval.0 = and i1 %iszero, %umul.ov
  ret %umul.ov

Done: 1
Optimization is correct!
```

Reviewers: nikic, spatel, xbolva00

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65150

llvm-svn: 370350
2019-08-29 12:47:50 +00:00
Roman Lebedev cc95a45f8a [CostModel] Model all `extractvalue`s as free.
Summary:
As disscussed in https://reviews.llvm.org/D65148#1606412,
`extractvalue` don't actually generate any code,
so we should treat them as free.

Reviewers: craig.topper, RKSimon, jnspaulsson, greened, asb, t.p.northover, jmolloy, dmgreen

Reviewed By: jmolloy

Subscribers: javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66098

llvm-svn: 370339
2019-08-29 11:50:30 +00:00
David Bolvansky 05bda8b4e5 Annotate return values of allocation functions with dereferenceable_or_null
Summary:
Example
define dso_local noalias i8* @_Z6maixxnv() local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(64) i8* @malloc(i64 64) #6
  ret i8* %call
}


Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: aaron.ballman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66651

llvm-svn: 370168
2019-08-28 08:28:20 +00:00
Philip Reames 93a26ec98d [NFC] Assert preconditions and merge all users into one codepath in Loads.cpp
llvm-svn: 370128
2019-08-27 23:36:31 +00:00
Philip Reames 2694522f13 [Loads/SROA] Remove blatantly incorrect code and fix a bug revealed in the process
The code we had isSafeToLoadUnconditionally was blatantly wrong. This function takes a "Size" argument which is supposed to describe the span loaded from. Instead, the code use the size of the pointer passed (which may be unrelated!) and only checks that span. For any Size > LoadSize, this can and does lead to miscompiles.

Worse, the generic code just a few lines above correctly handles the cases which *are* valid. So, let's delete said code.

Removing this code revealed two issues:
1) As noted by jdoerfert the removed code incorrectly handled external globals.  The test update in SROA is to stop testing incorrect behavior.
2) SROA was confusing bytes and bits, but this wasn't obvious as the Size parameter was being essentially ignored anyway.  Fixed.

Differential Revision: https://reviews.llvm.org/D66778

llvm-svn: 370102
2019-08-27 19:34:43 +00:00
Philip Reames 20650eda99 [NFC] Replace the FIXME I added in rL369989 with a comment clarifying the current code
The current approach is restrictive (as all of geps must be multiples of the alignment), but correct.  

llvm-svn: 370013
2019-08-27 04:52:35 +00:00
Alina Sbirlea 228ffac678 [MemorySSA] Fix insertUse.
Actually call the renamePass on inserted Phis.
Fixes PR42940.

Subscribers: llvm-commits
llvm-svn: 369997
2019-08-27 00:34:47 +00:00
Philip Reames 2f858c2e91 Reorganize code and add a fixme to point out a bug in existing code [NFC]
llvm-svn: 369989
2019-08-26 23:57:27 +00:00
Benjamin Kramer d5e60669c4 [TLI] Simplify code. NFCI.
llvm-svn: 369854
2019-08-24 17:30:12 +00:00
Benjamin Kramer 7043477042 Fix some accidental global initializers by using StringLiteral instead of StringRef
llvm-svn: 369850
2019-08-24 15:24:25 +00:00
Johannes Doerfert 22e6e108e1 [BasicAA] Use dereferenceability to reason about aliasing
Summary:
We already use the fact that an object with known size X does not alias
another objection of size Y > X before. With this commit, we use
dereferenceability information to determine a lower bound for Y and not
only rely on the user provided query size.

The result for @global_and_deref_arg_2() and @local_and_deref_ret_2()
in test/Analysis/BasicAA/dereferenceable.ll improved with this patch.

Reviewers: asbirlea, chandlerc, hfinkel, sanjoy

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66157

llvm-svn: 369786
2019-08-23 17:56:10 +00:00
Johannes Doerfert a5b10b464e [MustExec] Add a generic "must-be-executed-context" explorer
Given an instruction I, the MustBeExecutedContextExplorer allows to
easily traverse instructions that are guaranteed to be executed whenever
I is. For now, these instruction have to be statically "after" I, in
the same or different basic blocks.

This patch also adds a pass which prints the must-be-executed-context
for each instruction in a module. It is used to test the
MustBeExecutedContextExplorer, for now on the examples given in the
class comment of the MustBeExecutedIterator.

Differential Revision: https://reviews.llvm.org/D65186

llvm-svn: 369765
2019-08-23 15:17:27 +00:00
Peter Collingbourne 2452d7030b IR. Change strip* family of functions to not look through aliases.
I noticed another instance of the issue where references to aliases were
being replaced with aliasees, this time in InstCombine. In the instance that
I saw it turned out to be only a QoI issue (a symbol ended up being missing
from the symbol table due to the last reference to the alias being removed,
preventing HWASAN from symbolizing a global reference), but it could easily
have manifested as incorrect behaviour.

Since this is the third such issue encountered (previously: D65118, D65314)
it seems to be time to address this common error/QoI issue once and for all
and make the strip* family of functions not look through aliases.

Includes a test for the specific issue that I saw, but no doubt there are
other similar bugs fixed here.

As with D65118 this has been tested to make sure that the optimization isn't
load bearing. I built Clang, Chromium for Linux, Android and Windows as well
as the test-suite and there were no size regressions.

Differential Revision: https://reviews.llvm.org/D66606

llvm-svn: 369697
2019-08-22 19:56:14 +00:00
Alina Sbirlea 7425179fee [LoopPassManager + MemorySSA] Only enable use of MemorySSA for LPMs known to preserve it.
Summary:
Add a flag to the FunctionToLoopAdaptor that allows enabling MemorySSA only for the loop pass managers that are known to preserve it.

If an LPM is known to have only loop transforms that *all* preserve MemorySSA, then use MemorySSA if `EnableMSSALoopDependency` is set.
If an LPM has loop passes that do not preserve MemorySSA, then the flag passed is `false`, regardless of the value of `EnableMSSALoopDependency`.

When using a custom loop pass pipeline via `passes=...`, use keyword `loop` vs `loop-mssa` to use MemorySSA in that LPM. If a loop that does not preserve MemorySSA is added while using the `loop-mssa` keyword, that's an error.

Add the new `loop-mssa` keyword to a few tests where a difference occurs when enabling MemorySSA.

Reviewers: chandlerc

Subscribers: mehdi_amini, Prazek, george.burgess.iv, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66376

llvm-svn: 369548
2019-08-21 17:00:57 +00:00
Guillaume Chatelet 1c18a9cb9e [LLVM][Alignment] Introduce Alignment In MachineFrameInfo
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: jfb

Subscribers: hiraditya, dexonsmith, llvm-commits, courbet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65800

llvm-svn: 369531
2019-08-21 14:29:30 +00:00
Alina Sbirlea 2863721f05 [MemorySSA] Make Phi cleanups consistent.
Summary:
Make Phi cleanups consistent: remove self as a trivial Phi and
recurse to potentially remove other trivial phis.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66454

llvm-svn: 369466
2019-08-20 22:47:58 +00:00
Alina Sbirlea 1c528e8f1b [MemorySSA] Fix existing phis when inserting defs.
Summary:
When inserting a new Def, and inserting Phis in the IDF when needed,
also mark the already existing Phis in the IDF as non-optimized, since
these may need fixing as well.
In the test attached, there is a Phi in the IDF that happens to be
trivial, and is wrongfully removed by the call to getLastDef that
follows. This is a valid situation and the existing IDF Phis need to
marked as "may need fixing" as well.
Resolves PR43044.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66495

llvm-svn: 369464
2019-08-20 22:29:06 +00:00
Johannes Doerfert 8b962f2814 [CaptureTracker] Let subclasses provide dereferenceability information
Summary:
CaptureTracker subclasses might have better dereferenceability
information which allows null pointer checks to be no-capturing.
The first user will be D59922.

Reviewers: sanjoy, hfinkel, aykevl, sstefan1, uenoku, xbolva00

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66371

llvm-svn: 369305
2019-08-19 21:56:38 +00:00
Evgeniy Stepanov 55ccd16354 Refactor isPointerOffset (NFC).
Summary:
Simplify the API using Optional<> and address comments in
         https://reviews.llvm.org/D66165

Reviewers: vitalybuka

Subscribers: hiraditya, llvm-commits, ostannard, pcc

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66317

llvm-svn: 369300
2019-08-19 21:08:04 +00:00
Alina Sbirlea 1a3fdaf6a6 [MemorySSA] Rename uses when inserting memory uses.
Summary:
When inserting uses from outside the MemorySSA creation, we don't
normally need to rename uses, based on the assumption that there will be
no inserted Phis (if  Def existed that required a Phi, that Phi already
exists). However, when dealing with unreachable blocks, MemorySSA will
optimize away Phis whose incoming blocks are unreachable, and these Phis end
up being re-added when inserting a Use.
There are two potential solutions here:
1. Analyze the inserted Phis and clean them up if they are unneeded
(current method for cleaning up trivial phis does not cover this)
2. Leave the Phi in place and rename uses, the same way as whe inserting
defs.
This patch use approach 2.

Resolves first test in PR42940.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66033

llvm-svn: 369291
2019-08-19 18:57:40 +00:00
Johannes Doerfert 17cb918536 [CaptureTracking] Allow null to be in either icmp operand
Summary:
Before we required the comparison against null to be "canonical", hence
null to be operand #1. This patch allows null to be in either operand,
similar to the handling of loaded globals that follows.

Reviewers: sanjoy, hfinkel, aykevl, sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66321

llvm-svn: 369158
2019-08-16 21:53:49 +00:00
Benjamin Kramer 31a47f9890 Revert "[CallGraph] Refine call graph for indirect calls with !callees metadata"
This reverts commit r369025. Crashes clang, test case is on the mailing
list.

llvm-svn: 369096
2019-08-16 10:59:18 +00:00
Tim Northover 22970d66be AssumptionCache: remove old affected values after RAUW.
If they're left in the cache then they can't be removed efficiently when the
cache is notified to unlink a @llvm.assume call, and that can lead to values
from different functions entirely remaining there.

llvm-svn: 369091
2019-08-16 09:34:27 +00:00
Florian Hahn 75be1a9e58 [ValueTracking] Fix recurrence detection to check both PHI operands.
Summary:
Currently we fail to compute known bits for recurrences where the
first incoming value is the start value of the recurrence.

Instead of exiting the loop when the first incoming value is not
the step of the recurrence, continue to check the second incoming
value.

The original code uses a loop to handle both cases, but incorrectly
exits instead of continuing.

Reviewers: lebedev.ri, spatel, nikic

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66216

llvm-svn: 369088
2019-08-16 09:15:02 +00:00
Evgeniy Stepanov 75344955fc Move isPointerOffset function to ValueTracking (NFC).
Summary: To be reused in MemTag sanitizer.

Reviewers: pcc, vitalybuka, ostannard

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66165

llvm-svn: 369062
2019-08-15 22:58:28 +00:00
Alina Sbirlea 79ff20428e [MemorySSA] Remove restrictive asserts.
The verification I added has overly restrictive asserts.
Unreachable blocks can have any incoming value in practice, after an
update due to a "replaceAllUses" call when the repalced entry is
LiveOnEntry.

llvm-svn: 369050
2019-08-15 21:20:08 +00:00
Florian Hahn 3f2850bc60 [ValueTracking] Look through ptrmask intrinsics during getUnderlyingObject.
Reviewers: nlopes, efriedma, hfinkel, sanjoy, aqjune, jdoerfert

Reviewed By: jdoerfert

Subscribers: jdoerfert, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61669

llvm-svn: 369036
2019-08-15 18:39:56 +00:00
Mark Lacey 626ed22fbe [CallGraph] Refine call graph for indirect calls with !callees metadata
For indirect call sites having a small set of possible callees,
!callees metadata can be used to indicate what those callees are.
This patch updates the call graph and lazy call graph analyses so
that they consider this metadata when encountering call sites. For
the call graph, it adds a new external call graph node to the graph
for each unique !callees metadata node. A call graph edge connects
an indirect call site with the external node associated with the
!callees metadata that is attached to it. And there is an edge from
this external node to each of the callees indicated by the metadata.
Similarly, for the lazy call graph, the patch adds Ref edges from a
caller to the possible callees indicated by the metadata.

The primary purpose of the patch is to facilitate iterating over the
functions in a module such that all of the callees indicated by a
given !callees metadata node will be visited prior to the functions
containing call sites annotated by that node. This property is
required by optimizations performing a bottom-up traversal of the
SCC DAG. For example, the inliner can be made to inline through an
indirect call. If the call site is annotated with !callees metadata,
this patch ensures that the inliner will have visited all of the
callees prior to the caller, allowing it to reliably compute the
cost of inlining one or more of the potential callees.

Original patch by @mssimpso. I've made some small changes to get it
to apply, build, and pass tests on the top of tree, as well as
some minor tweaks to formatting and functionality.

Subscribers: mehdi_amini, hiraditya, llvm-commits, mssimpso

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D39339

llvm-svn: 369025
2019-08-15 17:47:53 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Florian Hahn fd72bf21c9 [ValueTracking] Add MustPreserveNullness arg to functions analyzing calls. (NFC)
Some uses of getArgumentAliasingToReturnedPointer and
isIntrinsicReturningPointerAliasingArgumentWithoutCapturing require the
calls/intrinsics to preserve the nullness of the argument.

For alias analysis, the nullness property does not really come into
play.

This patch explicitly sets it to true. In D61669, the alias analysis
uses will be switched to not require preserving nullness.

Reviewers: nlopes, efriedma, hfinkel, sanjoy, aqjune, jdoerfert

Reviewed By: jdoerfert

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64150

llvm-svn: 368993
2019-08-15 12:13:02 +00:00
Philip Reames 7b0515176b [SCEV] Rename getMaxBackedgeTakenCount to getConstantMaxBackedgeTakenCount [NFC]
llvm-svn: 368930
2019-08-14 21:58:13 +00:00
Matt Arsenault dbc1f207fa InferAddressSpaces: Move target intrinsic handling to TTI
I'm planning on handling intrinsics that will benefit from checking
the address space enums. Don't bother moving the address collection
for now, since those won't need th enums.

llvm-svn: 368895
2019-08-14 18:13:00 +00:00
Nikita Popov 2a4f26b4c2 [ValueTracking] Improve reverse assumption inference
Use isGuaranteedToTransferExecutionToSuccessor() instead of
isSafeToSpeculativelyExecute() when seeing whether we can propagate
the information in an assume backwards in isValidAssumeForContext().
The latter is more general - it also allows arbitrary loads/stores -
and is also the condition we want: if our assume is guaranteed to
execute, its condition not holding would be UB.

Original patch by arielb1.

Differential Revision: https://reviews.llvm.org/D37215

llvm-svn: 368723
2019-08-13 17:15:42 +00:00
Stanislav Mekhanoshin 4c9c98f36b [AMDGPU] Printf runtime binding pass
This pass is a port of the according pass from the HSAIL compiler.
It parses printf calls and setup runtime printf buffer.
After that it copies printf arguments to the buffer and fills in
module metadata for runtime.

Differential Revision: https://reviews.llvm.org/D24035

llvm-svn: 368592
2019-08-12 17:12:29 +00:00
Fedor Sergeev 92e160abab [MemDep] allow to select block-scan-limit when constructing MemoryDependenceAnalysis
Introducing non-global control for default block-scan-limit in MemDep analysis.
Useful when there are many compilations per initialized LLVM instance (e.g. JIT).

Reviewed By: asbirlea
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65806

llvm-svn: 368502
2019-08-10 01:23:38 +00:00
Whitney Tsang dd3b6498b0 Title: Loop Cache Analysis
Summary: Implement a new analysis to estimate the number of cache lines
required by a loop nest.
The analysis is largely based on the following paper:

Compiler Optimizations for Improving Data Locality
By: Steve Carr, Katherine S. McKinley, Chau-Wen Tseng
http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf
The analysis considers temporal reuse (accesses to the same memory
location) and spatial reuse (accesses to memory locations within a cache
line). For simplicity the analysis considers memory accesses in the
innermost loop in a loop nest, and thus determines the number of cache
lines used when the loop L in loop nest LN is placed in the innermost
position.

The result of the analysis can be used to drive several transformations.
As an example, loop interchange could use it determine which loops in a
perfect loop nest should be interchanged to maximize cache reuse.
Similarly, loop distribution could be enhanced to take into
consideration cache reuse between arrays when distributing a loop to
eliminate vectorization inhibiting dependencies.

The general approach taken to estimate the number of cache lines used by
the memory references in the inner loop of a loop nest is:

Partition memory references that exhibit temporal or spatial reuse into
reference groups.
For each loop L in the a loop nest LN: a. Compute the cost of the
reference group b. Compute the 'cache cost' of the loop nest by summing
up the reference groups costs
For further details of the algorithm please refer to the paper.
Authored By: etiotto
Reviewers: hfinkel, Meinersbur, jdoerfert, kbarton, bmahjour, anemet,
fhahn
Reviewed By: Meinersbur
Subscribers: reames, nemanjai, MaskRay, wuzish, Hahnfeld, xusx595,
venkataramanan.kumar.llvm, greened, dmgreen, steleman, fhahn, xblvaOO,
Whitney, mgorny, hiraditya, mgrang, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63459

llvm-svn: 368439
2019-08-09 13:56:29 +00:00
Craig Topper 66c08430f6 [ValueTracking] When calculating known bits for integer abs, make sure we're looking at a negate and not just any instruction with the nsw flag set.
The matchSelectPattern code can match patterns like (x >= 0) ? x : -x
for absolute value. But it can also match ((x-y) >= 0) ? (x-y) : (y-x).
If the latter form was matched we can only use the nsw flag if its
set on both subtracts.

This match makes sure we're looking at the former case only.

Differential Revision: https://reviews.llvm.org/D65692

llvm-svn: 368195
2019-08-07 18:28:16 +00:00
Nikolai Bozhenov 03edcd68dd [SCEV] Return zero from computeConstantDifference(X, X)
Without this patch computeConstantDifference returns None for cases like
these:

  computeConstantDifference(%x, %x)
  computeConstantDifference({%x,+,16}, {%x,+,16})

Differential Revision: https://reviews.llvm.org/D65474

llvm-svn: 368193
2019-08-07 17:38:38 +00:00
Alex Lorenz 8d5c280316 TLI: darwin does not support _bcmp
Not all Darwin targets support _bcmp in all circumstances.

Differential Revision: https://reviews.llvm.org/D65834

llvm-svn: 368113
2019-08-07 00:03:37 +00:00
Fangrui Song cb4327d7db Change two unnecessary uses of llvm::size(C) to C.size()
llvm-svn: 368011
2019-08-06 10:24:36 +00:00
David Bolvansky ef72cded32 [TLI][NFC] Fixed typo
llvm-svn: 367827
2019-08-05 10:14:09 +00:00
Fangrui Song d9b948b6eb Rename F_{None,Text,Append} to OF_{None,Text,Append}. NFC
F_{None,Text,Append} are kept for compatibility since r334221.

llvm-svn: 367800
2019-08-05 05:43:48 +00:00
Sanjay Patel 9ce5f41851 [InstCombine] fold cmp+select using select operand equivalence
As discussed in PR42696:
https://bugs.llvm.org/show_bug.cgi?id=42696
...but won't help that case yet.

We have an odd situation where a select operand equivalence fold was
implemented in InstSimplify when it could have been done more generally
in InstCombine if we allow dropping of {nsw,nuw,exact} from a binop operand.

Here's an example:
https://rise4fun.com/Alive/Xplr

  %cmp = icmp eq i32 %x, 2147483647
  %add = add nsw i32 %x, 1
  %sel = select i1 %cmp, i32 -2147483648, i32 %add
  =>
  %sel = add i32 %x, 1

I've left the InstSimplify code in place for now, but my guess is that we'd
prefer to remove that as a follow-up to save on code duplication and
compile-time.

Differential Revision: https://reviews.llvm.org/D65576

llvm-svn: 367695
2019-08-02 17:39:32 +00:00