Commit Graph

8833 Commits

Author SHA1 Message Date
Rachel Craik f897d087d0 [LoopCacheAnalysis]: Fix assertion failure during cost computation
Ensure the stride and trip count have the same type before multiplying them during reference cost calculation

Reviewed By: jdoefert

Differential Revision: https://reviews.llvm.org/D70192
2019-11-15 14:56:26 -05:00
Francesco Petrogalli d6de5f12d4 [SVFS] Inject TLI Mappings in VFABI attribute.
This patch introduces a function pass to inject the scalar-to-vector
mappings stored in the TargetLIbraryInfo (TLI) into the Vector
Function ABI (VFABI) variants attribute.

The test is testing the injection for three vector libraries supported
by the TLI (Accelerate, SVML, MASSV).

The pass does not change any of the analysis associated to the
function.

Differential Revision: https://reviews.llvm.org/D70107
2019-11-15 18:42:56 +00:00
Reid Kleckner 4c1a1d3cf9 Add missing includes needed to prune LLVMContext.h include, NFC
These are a pre-requisite to removing #include "llvm/Support/Options.h"
from LLVMContext.h: https://reviews.llvm.org/D70280
2019-11-14 15:23:15 -08:00
Reid Kleckner 05da2fe521 Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles    touches affected_files  header
  342380        95      3604    llvm/include/llvm/ADT/STLExtras.h
  314730        234     1345    llvm/include/llvm/InitializePasses.h
  307036        118     2602    llvm/include/llvm/ADT/APInt.h
  213049        59      3611    llvm/include/llvm/Support/MathExtras.h
  170422        47      3626    llvm/include/llvm/Support/Compiler.h
  162225        45      3605    llvm/include/llvm/ADT/Optional.h
  158319        63      2513    llvm/include/llvm/ADT/Triple.h
  140322        39      3598    llvm/include/llvm/ADT/StringRef.h
  137647        59      2333    llvm/include/llvm/Support/Error.h
  131619        73      1803    llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 16:34:37 -08:00
Hans Wennborg 6ea4775900 Revert 57dd4b0 "[ValueTracking] Allow context-sensitive nullness check for non-pointers"
This caused miscompiles of Chromium (https://crbug.com/1023818). The reduced
repro is small enough to fit here:

  $ cat /tmp/a.c
  unsigned char f(unsigned char *p) {
    unsigned char result = 0;
    for (int shift = 0; shift < 1; ++shift)
      result |= p[0] << (shift * 8);
    return result;
  }
  $ bin/clang -O2 -S -o - /tmp/a.c | grep -A4 f:
  f:                                      # @f
          .cfi_startproc
  # %bb.0:                                # %entry
          xorl    %eax, %eax
          retq

That's nicely optimized, but I don't think it's the right result :-)

> Same as D60846 but with a fix for the problem encountered there which
> was a missing context adjustment in the handling of PHI nodes.
>
> The test that caused D60846 to be reverted was added in e15ab8f277.
>
> Reviewers: nikic, nlopes, mkazantsev,spatel, dlrobertson, uabelho, hakzsam
>
> Subscribers: hiraditya, bollu, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D69571

This reverts commit 57dd4b03e4.
2019-11-13 12:19:02 +01:00
Francesco Petrogalli d8b6b11143 [VFABI] Add LLVM internal mangling for vector functions.
Summary:
This patch adds a custom ISA for vector functions for internal use
in LLVM. The <isa> token is set to "_LLVM_", and it is not attached
to any specific instruction Vector ISA, or Vector Function ABI.

The ISA is used as a token for handling Vector Function ABI-style
vectorization for those vector functions that are not directly
associated to any existing Vector Function ABI (for example, some of
the vector functions exposed by TargetLibraryInfo). The demangling
function for this ISA in a Vector Function ABI context is set to be
the same as the common one shared between X86 and AArch64.

Reviewers: jdoerfert, sdesmalen, simoll

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70089
2019-11-13 03:26:39 +00:00
Eric Christopher 7a3ad48d6d Temporarily Revert "Reapply [LVI] Normalize pointer behavior" as it's broken python 3.6.
Reverting to figure out if it's a problem in python or the compiler for now.

This reverts commit 885a05f48a.
2019-11-12 15:51:51 -08:00
Alina Sbirlea db69f1b229 [GlobalsAA] Restrict ModRef result if any internal method has its address taken.
Summary:
If there are any internal methods whose address was taken, conclude there is nothing known in relation of any other internal method and a global.

Reviewers: nlopes, sanjoy.google

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69690
2019-11-12 14:24:56 -08:00
Diana Picus 7f1dcc8952 [InstCombine] Skip scalable vectors in combineLoadToOperationType
Don't try to canonicalize loads to scalable vector types to loads
of integers.

This removes one assertion when trying to use a TypeSize as a parameter
to DataLayout::isLegalInteger. It does not handle the second part of the
function (which looks at bitcasts).

This patch also contains a NFC fix for Load Analysis, where a variable
initialization that would cause the same assertion is moved closer to
its use. This allows us to run the new test for InstCombine without
having to teach LocationSize to play nicely with scalable vectors.

Differential Revision: https://reviews.llvm.org/D70075
2019-11-12 12:27:09 +01:00
Francesco Petrogalli e9a06e0606 [VFABI] Read/Write functions for the VFABI attribute.
The attribute is stored at the `FunctionIndex` attribute set, with the
name "vector-function-abi-variant".

The get/set methods of the attribute have assertion to verify that:

1. Each name in the attribute is a valid VFABI mangled name.

2. Each name in the attribute correspond to a function declared in the
   module.

Differential Revision: https://reviews.llvm.org/D69976
2019-11-12 03:40:42 +00:00
aqjune 4187cb138b Add InstCombine/InstructionSimplify support for Freeze Instruction
Summary:
- Add llvm::SimplifyFreezeInst
- Add InstCombiner::visitFreeze
- Add llvm tests

Reviewers: majnemer, sanjoy, reames, lebedev.ri, spatel

Reviewed By: reames, lebedev.ri

Subscribers: reames, lebedev.ri, filcab, regehr, trentxintong, llvm-commits

Differential Revision: https://reviews.llvm.org/D29013
2019-11-12 12:13:26 +09:00
Simon Pilgrim 0040c4ba1e Fix -Wparentheses warning. NFCI. 2019-11-11 11:13:32 +00:00
Tsang Whitney W.H 89453d186d [NFC]: Fix PVS Studio warning in LoopNestAnalysis
Summary:This patch fixes the following warnings uncovered by PVS
Studio:

/home/xbolva00/LLVM/llvm-project/llvm/lib/Analysis/LoopCacheAnalysis.cpp
353 warn V612 An unconditional 'return' within a loop.
/home/xbolva00/LLVM/llvm-project/llvm/lib/Analysis/LoopCacheAnalysis.cpp
456 err V502 Perhaps the '?:' operator works in a different way than it
was expected. The '?:' operator has a lower priority than the '=='
operator.
Authored By:etiotto
Reviewer:Meinersbur, kbarton, bmahjour, Whitney, xbolva00
Reviewed By:xbolva00
Subscribers:hiraditya, llvm-commits
Tag:LLVM
Differential Revision:https://reviews.llvm.org/D69821
2019-11-10 05:39:40 +00:00
Teresa Johnson b11391bb47 ThinLTO : Import always_inline functions irrespective of the threshold
Summary: A user can force a function to be inlined by specifying the always_inline attribute. Currently, thinlto implementation is not aware of always_inline functions and does not guarantee import of such functions, which in turn can prevent inlining of such functions.

Patch by Bharathi Seshadri <bseshadr@cisco.com>

Reviewers: tejohnson

Reviewed By: tejohnson

Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70014
2019-11-08 17:02:01 -08:00
bmahjour f0af11d86f [DDG] Data Dependence Graph - Pi Block
Summary:
    This patch adds Pi Blocks to the DDG. A pi-block represents a group of DDG
    nodes that are part of a strongly-connected component of the graph.
    Replacing all the SCCs with pi-blocks results in an acyclic representation
    of the DDG. For example if we have:
       {a -> b}, {b -> c, d}, {c -> a}
    the cycle a -> b -> c -> a is abstracted into a pi-block "p" as follows:
       {p -> d} with "p" containing: {a -> b}, {b -> c}, {c -> a}
    In this implementation the edges between nodes that are part of the pi-block
    are preserved. The crossing edges (edges where one end of the edge is in the
    set of nodes belonging to an SCC and the other end is outside that set) are
    replaced with corresponding edges to/from the pi-block node instead.

    Authored By: bmahjour

    Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

    Reviewed By: Meinersbur

    Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack

    Tag: #llvm

    Differential Revision: https://reviews.llvm.org/D68827
2019-11-08 15:46:08 -05:00
Nikita Popov 885a05f48a Reapply [LVI] Normalize pointer behavior
Fix cache invalidation by not guarding the dereferenced pointer cache
erasure by SeenBlocks. SeenBlocks is only populated when actually
caching a value in the block, which doesn't necessarily have to happen
just because dereferenced pointers were calculated.

-----

Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.

The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.

This patch changes the treatment of dereferencability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this change not fully NFC, because we can now fold
terminator icmps based on the dereferencability information in the
same block. This is the reason why I changed one JumpThreading test
(it would optimize the condition away without the change).

Of course, we do not want to recompute dereferencability on each
intersectAssume call, so we need a new cache for this. The
dereferencability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.

I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.

Differential Revision: https://reviews.llvm.org/D69914
2019-11-08 20:13:55 +01:00
Nikita Popov 43ae5f4386 Revert "[LVI] Normalize pointer behavior"
This reverts commit 15bc4dc9a8.

clang-cmake-x86_64-sde-avx512-linux buildbot reported quite a few
compile-time regressions in test-suite, will investigate.
2019-11-08 18:22:34 +01:00
Nikita Popov 15bc4dc9a8 [LVI] Normalize pointer behavior
Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.

The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.

This patch changes the treatment of dereferencability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this change not fully NFC, because we can now fold
terminator icmps based on the dereferencability information in the
same block. This is the reason why I changed one JumpThreading test
(it would optimize the condition away without the change).

Of course, we do not want to recompute dereferencability on each
intersectAssume call, so we need a new cache for this. The
dereferencability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.

I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.

Differential Revision: https://reviews.llvm.org/D69914
2019-11-08 17:57:14 +01:00
Hans Wennborg eaff300401 Revert f0c2a5a "[LV] Generalize conditions for sinking instrs for first order recurrences."
It broke Chromium, causing "Instruction does not dominate all uses!" errors.
See https://bugs.chromium.org/p/chromium/issues/detail?id=1022297#c1 for a
reproducer.

> If the recurrence PHI node has a single user, we can sink any
> instruction without side effects, given that all users are dominated by
> the instruction computing the incoming value of the next iteration
> ('Previous'). We can sink instructions that may cause traps, because
> that only causes the trap to occur later, but not on any new paths.
>
> With the relaxed check, we also have to make sure that we do not have a
> direct cycle (meaning PHI user == 'Previous), which indicates a
> reduction relation, which potentially gets missed by
> ReductionDescriptor.
>
> As follow-ups, we can also sink stores, iff they do not alias with
> other instructions we move them across and we could also support sinking
> chains of instructions and multiple users of the PHI.
>
> Fixes PR43398.
>
> Reviewers: hsaito, dcaballe, Ayal, rengolin
>
> Reviewed By: Ayal
>
> Differential Revision: https://reviews.llvm.org/D69228
2019-11-07 11:00:02 +01:00
Philip Reames 686f449e3d [WC] Fix a subtle bug in our definition of widenable branch
We had a subtle, but nasty bug in our definition of a widenable branch, and thus in the transforms which used that utility. Specifically, we returned true for any branch which included a widenable condition within it's condition, regardless of whether that widenable condition also had other uses.

The problem is that the result of the WC() call is defined to be one particular value. As such, all users must agree as to what that value is. If we widen a branch without also updating *all other users* of the WC in the same way, we have broken the required semantics.

Most of the textual diff is updating existing transforms not to leave dead uses hanging around. They're largely NFC as the dead instructions would be immediately deleted by other passes. The reason to make these changes is so that the transforms preserve the widenable branch form.

In practice, we don't get bitten by this only because it isn't profitable to CSE WC() calls and the lowering pass from guards uses distinct WC calls per branch.

Differential Revision: https://reviews.llvm.org/D69916
2019-11-06 14:16:34 -08:00
Philip Reames 9bfa5ab3d1 [LoopPred] Fix two subtle issues found by inspection
This patch fixes two issues noticed by inspection when going to enable the loop predication code in IndVarSimplify.

Issue 1 - Both the LoopPredication transform, and the already on by default optimizeLoopExits transform, modify the exit count of the exits they modify. (either to 0 or Infinity) Looking at the code more closely, this was not reflected into SCEV and we were instead running later transforms with incorrect SCEVs. Fixing this requires forgetting the loop, weakening a too strong assert, and updating SCEV to not pessimize results when a loop is provable untaken. I haven't been able to find a test case to demonstrate the miscompile.

Issue 2 - For modules without a data layout, we can end up with unsized pointer typed exit counts. Just bail out of this case.

I think these are the last two issues which need addressed before we enable this by default. The code has already survived a decent amount of fuzzing without revealing either of the above.

Differential Revision: https://reviews.llvm.org/D69695
2019-11-06 14:04:45 -08:00
Sjoerd Meijer 6c2a4f5ff9 [TTI][LV] preferPredicateOverEpilogue
We have two ways to steer creating a predicated vector body over creating a
scalar epilogue. To force this, we have 1) a command line option and 2) a
pragma available. This adds a third: a target hook to TargetTransformInfo that
can be queried whether predication is preferred or not, which allows the
vectoriser to make the decision without forcing it.

While this change behaves as a non-functional change for now, it shows the
required TTI plumbing, usage of this new hook in the vectoriser, and the
beginning of an ARM MVE implementation. I will follow up on this with:
- a complete MVE implementation, see D69845.
- a patch to disable this, i.e. we should respect "vector_predicate(disable)"
  and its corresponding loophint.

Differential Revision: https://reviews.llvm.org/D69040
2019-11-06 10:14:20 +00:00
Sanjay Patel 659bd73d13 [InstSimplify] use FMF to improve fcmp+select fold
This is part of a series of patches needed to solve PR39535:
https://bugs.llvm.org/show_bug.cgi?id=39535
2019-11-04 08:29:56 -05:00
Dávid Bolvanský decd8c4844 [SCEV] Fixed 'Uninitialized variable 'ContainsAddRec' used.' warning. NFCI. 2019-11-03 20:29:49 +01:00
Dávid Bolvanský 717965ae57 [MemorySSA] Fixed null check after dereferencing warning. NFCI. 2019-11-03 20:27:40 +01:00
Florian Hahn f0c2a5af76 [LV] Generalize conditions for sinking instrs for first order recurrences.
If the recurrence PHI node has a single user, we can sink any
instruction without side effects, given that all users are dominated by
the instruction computing the incoming value of the next iteration
('Previous'). We can sink instructions that may cause traps, because
that only causes the trap to occur later, but not on any new paths.

With the relaxed check, we also have to make sure that we do not have a
direct cycle (meaning PHI user == 'Previous), which indicates a
reduction relation, which potentially gets missed by
ReductionDescriptor.

As follow-ups, we can also sink stores, iff they do not alias with
other instructions we move them across and we could also support sinking
chains of instructions and multiple users of the PHI.

Fixes PR43398.

Reviewers: hsaito, dcaballe, Ayal, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D69228
2019-11-02 22:08:27 +01:00
Hiroshi Yamauchi 0d987e411a [PGO][PGSO] TargetLowering/TargetTransformationInfo/SwitchLoweringUtils part.
Summary:
(Split of off D67120)

TargetLowering/TargetTransformationInfo/SwitchLoweringUtils changes for profile
guided size optimization.

Reviewers: davidxl

Subscribers: eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69580
2019-10-31 13:22:56 -07:00
Johannes Doerfert 57dd4b03e4 [ValueTracking] Allow context-sensitive nullness check for non-pointers
Same as D60846 but with a fix for the problem encountered there which
was a missing context adjustment in the handling of PHI nodes.

The test that caused D60846 to be reverted was added in e15ab8f277.

Reviewers: nikic, nlopes, mkazantsev,spatel, dlrobertson, uabelho, hakzsam

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69571
2019-10-31 14:37:38 -05:00
Johannes Doerfert cb19ea45a7 [FIX] Make LSan happy by *not* leaking memory
I left a memory leak in a printer pass which made LSan sad so I remove
the memory leak now to make LSan happy.

Reported and tested by vlad.tsyrklevich.
2019-10-31 12:16:54 -05:00
Mikael Holmen c950495405 [MustExecute] Silence clang warning about unused captured 'this'
New code introduced in fe799c97fa caused clang to complain with

../lib/Analysis/MustExecute.cpp:360:34: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
  GetterTy<LoopInfo> LIGetter = [this](const Function &F) {
                                 ^~~~
../lib/Analysis/MustExecute.cpp:365:44: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
  GetterTy<PostDominatorTree> PDTGetter = [this](const Function &F) {
                                           ^~~~
2 errors generated.
2019-10-31 09:41:05 +01:00
Johannes Doerfert fe799c97fa [MustExecute] Forward iterate over conditional branches
Summary:
If a conditional branch is encountered we can try to find a join block
where the execution is known to continue. This means finding a suitable
block, e.g., the immediate post dominator of the conditional branch, and
proofing control will always reach that block.

This patch implements different techniques that work with and without
provided analysis.

Reviewers: uenoku, sstefan1, hfinkel

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68933
2019-10-31 00:06:43 -05:00
Guillaume Chatelet a4783ef58d [Alignment][NFC] getMemoryOpCost uses MaybeAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69307
2019-10-25 21:26:59 +02:00
Philip Reames 34f68253ca [SCEV] Expose and use maximum constant exit counts for individual loop exits
We were already going to all of the trouble of computing maximum constant exit counts for each loop exit, we might as well expose them through the API.  The change in IndVars is mostly to demonstrate that the wired up code works, but it als very slightly strengthens the transform.  The strengthened case is rather narrow though: it requires one exactly analyzeable exit, one imprecisely analyzeable exit (with the upper bound less than the precise one), and one unanalyzeable exit.  I coudn't construct a reasonably stable test case.

This does increase the memory usage of the BackedgeTakenCount by a factor of 2 in the worst case.

I also noticed the loop in IndVars is O(#Exits ^ 2).  This doesn't change with this patch.  A future patch will cache this result inside of SCEV to avoid requering.
2019-10-24 19:07:33 -07:00
David Blaikie 0e8fc21c2e Fix Clang -Wcovered-switch-default warning by moving llvm_unreachable default to after the switch 2019-10-24 18:56:45 -07:00
Philip Reames c27010ef76 [SCEV] Start reworking backedge taken count APIs to unify max handling [NFC]
This is a first step in figuring out a proper API for maximum (non constant) exit counts.  This may evolve a bit as we get experience with the API needs; suggestions very welcome.  This patch just tried to provide a framework that we can later add maximum too in a clean and obvious way.
2019-10-24 18:21:55 -07:00
Simon Pilgrim c39ba0429c Fix MSVC "switch statement contains 'default' but no 'case' labels" warning. NFCI. 2019-10-24 13:40:13 -07:00
Roman Lebedev 8eda8f8ce8
[LVI][NFC] Factor solveBlockValueSaturatingIntrinsic() out of solveBlockValueIntrinsic()
Now that there's SaturatingInst class, this is cleaner.
2019-10-23 18:17:33 +03:00
Roman Lebedev 1f665046fb
[LVI][CVP] LazyValueInfoImpl::solveBlockValueBinaryOp(): use no-wrap flags from `add` op
Summary:
This was suggested in https://reviews.llvm.org/D69277#1717210
In this form (this is what was suggested, right?), the results aren't staggering
(especially since given LVI cross-block focus)
this does catch some things (as per test-suite), but not too much:

| statistic                                        |       old |       new | delta | % change |
| correlated-value-propagation.NumAddNSW           |      4981 |      4982 |     1 |  0.0201% |
| correlated-value-propagation.NumAddNW            |     12125 |     12126 |     1 |  0.0082% |
| correlated-value-propagation.NumCmps             |      1199 |      1202 |     3 |  0.2502% |
| correlated-value-propagation.NumDeadCases        |       112 |       111 |    -1 | -0.8929% |
| correlated-value-propagation.NumMulNSW           |       275 |       278 |     3 |  1.0909% |
| correlated-value-propagation.NumMulNUW           |      1323 |      1326 |     3 |  0.2268% |
| correlated-value-propagation.NumMulNW            |      1598 |      1604 |     6 |  0.3755% |
| correlated-value-propagation.NumNSW              |      7158 |      7167 |     9 |  0.1257% |
| correlated-value-propagation.NumNUW              |     13304 |     13310 |     6 |  0.0451% |
| correlated-value-propagation.NumNW               |     20462 |     20477 |    15 |  0.0733% |
| correlated-value-propagation.NumOverflows        |         4 |         7 |     3 | 75.0000% |
| correlated-value-propagation.NumPhis             |     15366 |     15381 |    15 |  0.0976% |
| correlated-value-propagation.NumSExt             |      6273 |      6277 |     4 |  0.0638% |
| correlated-value-propagation.NumShlNSW           |      1172 |      1171 |    -1 | -0.0853% |
| correlated-value-propagation.NumShlNUW           |      2793 |      2794 |     1 |  0.0358% |
| correlated-value-propagation.NumSubNSW           |       730 |       736 |     6 |  0.8219% |
| correlated-value-propagation.NumSubNUW           |      2044 |      2046 |     2 |  0.0978% |
| correlated-value-propagation.NumSubNW            |      2774 |      2782 |     8 |  0.2884% |
| instcount.NumAddInst                             |    277586 |    277569 |   -17 | -0.0061% |
| instcount.NumAndInst                             |     66056 |     66054 |    -2 | -0.0030% |
| instcount.NumBrInst                              |    709147 |    709146 |    -1 | -0.0001% |
| instcount.NumCallInst                            |    528579 |    528576 |    -3 | -0.0006% |
| instcount.NumExtractValueInst                    |     18307 |     18301 |    -6 | -0.0328% |
| instcount.NumOrInst                              |    102660 |    102665 |     5 |  0.0049% |
| instcount.NumPHIInst                             |    318008 |    318007 |    -1 | -0.0003% |
| instcount.NumSelectInst                          |     46373 |     46370 |    -3 | -0.0065% |
| instcount.NumSExtInst                            |     79496 |     79488 |    -8 | -0.0101% |
| instcount.NumShlInst                             |     40654 |     40657 |     3 |  0.0074% |
| instcount.NumTruncInst                           |     62251 |     62249 |    -2 | -0.0032% |
| instcount.NumZExtInst                            |     68211 |     68221 |    10 |  0.0147% |
| instcount.TotalBlocks                            |    843910 |    843909 |    -1 | -0.0001% |
| instcount.TotalInsts                             |   7387448 |   7387423 |   -25 | -0.0003% |

Reviewers: nikic, reames

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69321
2019-10-23 18:17:32 +03:00
Guillaume Chatelet 301b4128ac [Alignment][NFC] Finish transition for `Loads`
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69253

llvm-svn: 375419
2019-10-21 15:10:26 +00:00
Guillaume Chatelet 3cc4835c00 Use Align for TFL::TransientStackAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69216

llvm-svn: 375398
2019-10-21 08:31:25 +00:00
Philip Reames 1d509201e2 [SCEV] Simplify umin/max of zext and sext of the same value
This is a common idiom which arises after induction variables are widened, and we have two or more exit conditions.  Interestingly, we don't have instcombine or instsimplify support for this either.

Differential Revision: https://reviews.llvm.org/D69006

llvm-svn: 375349
2019-10-19 17:23:02 +00:00
Victor Campos e64863d192 [SCEV] Removing deprecated comment in ScalarEvolutionExpander
Removing a comment in the ScalarEvolutionExpander.cpp file that was about the
class SCEVSDivExpr, which has been long gone from LLVM.

llvm-svn: 375232
2019-10-18 13:33:45 +00:00
Oliver Stannard 3b598b9c86 Reland: Dead Virtual Function Elimination
Remove dead virtual functions from vtables with
replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up
correctly.

Original commit message:

Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.

This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.

To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.

The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.

This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.

To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.

I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.

On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.

I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.

I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).

Differential revision: https://reviews.llvm.org/D63932

llvm-svn: 375094
2019-10-17 09:58:57 +00:00
Guillaume Chatelet bae629b966 [Alignment][NFC] Value::getPointerAlignment returns MaybeAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68398

llvm-svn: 374889
2019-10-15 13:58:22 +00:00
Jorge Gorbe Moya b052331bd6 Revert "Dead Virtual Function Elimination"
This reverts commit 9f6a873268.

llvm-svn: 374844
2019-10-14 23:25:25 +00:00
Sam Parker 527a35e155 [NFC][TTI] Add Alignment for isLegalMasked[Load/Store]
Add an extra parameter so the backend can take the alignment into
consideration.

Differential Revision: https://reviews.llvm.org/D68400

llvm-svn: 374763
2019-10-14 10:00:21 +00:00
Zi Xuan Wu 9802268ad3 recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374634
2019-10-12 02:53:04 +00:00
Oliver Stannard 9f6a873268 Dead Virtual Function Elimination
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.

This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.

To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.

The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.

This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.

To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.

I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.

On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.

I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.

I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).

Differential revision: https://reviews.llvm.org/D63932

llvm-svn: 374539
2019-10-11 11:59:55 +00:00
Florian Hahn 77fbf069f6 [SCEV] Add stricter verification option.
Currently -verify-scev only fails if there is a constant difference
between two BE counts. This misses a lot of cases.

This patch adds a -verify-scev-strict options, which fails for any
non-zero differences, if used together with -verify-scev.

With the stricter checking, some unit tests fail because
of mis-matches, especially around IndVarSimplify.

If there is no reason I am missing for just checking constant deltas, I
am planning on looking into the various failures.

Reviewers: efriedma, sanjoy.google, reames, atrick

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D68592

llvm-svn: 374535
2019-10-11 11:46:40 +00:00
Alina Sbirlea 6442b56974 [MemorySSA] Update Phi simplification.
When simplifying a Phi to the unique value found incoming, check that
there wasn't a Phi already created to break a cycle. If so, remove it.
Resolves PR43541.

Some additional nits included.

llvm-svn: 374471
2019-10-10 23:27:21 +00:00