Commit Graph

21547 Commits

Author SHA1 Message Date
Alina Sbirlea 605b21739d [LICM&MSSA] Limit store hoisting.
Summary:
If there is no clobbering access for a store inside the loop, that store
can only be hoisted if there are no interfering loads.
A more general check is introduced here: verify that there are no loads
that are not optimized to an access outside the loop.
Addresses PR40586.
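
A minimal IR sketch of the situation being checked, assuming a noalias
pointer and no loads inside the loop (function and values are illustrative):

```
define void @f(i32* noalias %p, i32 %n) {
entry:
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  ; no clobbering access and no interfering loads for this store,
  ; so hoisting it out of the loop remains legal
  store i32 42, i32* %p
  %i.next = add i32 %i, 1
  %c = icmp slt i32 %i.next, %n
  br i1 %c, label %loop, label %exit
exit:
  ret void
}
```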

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57967

llvm-svn: 353734
2019-02-11 19:07:15 +00:00
Chandler Carruth 1f5550326f Update more files added with the old header to the new one.
llvm-svn: 353667
2019-02-11 08:25:56 +00:00
Chandler Carruth b53f0e1145 Update files that were mistakenly added with the old file header to the
new one.

llvm-svn: 353665
2019-02-11 08:07:38 +00:00
Chandler Carruth 751d95fb9b [CallSite removal] Migrate ConstantFolding APIs and implementation to
`CallBase`.

Users have been updated. You can see how to update any out-of-tree
usages: pass `cast<CallBase>(CS.getInstruction())`.
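
A hedged C++ sketch of such an out-of-tree update (the helper name is
hypothetical; the folding entry point is assumed to now take a `CallBase`):

```
#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/CallSite.h"
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

// A caller still holding a CallSite unwraps it via cast<CallBase>.
static Constant *foldCallSite(CallSite CS, Function *F,
                              ArrayRef<Constant *> Ops,
                              const TargetLibraryInfo *TLI) {
  return ConstantFoldCall(cast<CallBase>(CS.getInstruction()), F, Ops, TLI);
}
```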

llvm-svn: 353661
2019-02-11 07:51:44 +00:00
Chandler Carruth 3160734af1 [CallSite removal] Migrate the statepoint GC infrastructure to use the
`CallBase` class rather than `CallSite` wrappers.

I pushed this change down through most of the statepoint infrastructure,
completely removing the use of CallSite where I could reasonably do so.
I ended up making a couple of cut-points: generic call handling
(instcombine, TLI, SDAG). As soon as it hit truly generic handling with
users outside the immediate code, I simply transitioned into or out of
a `CallSite` to make this a reasonably sized chunk.

Differential Revision: https://reviews.llvm.org/D56122

llvm-svn: 353660
2019-02-11 07:42:30 +00:00
Fangrui Song 709a3e7488 [Local] Delete a redundant check. NFC
isInstructionTriviallyDead also performs the use_empty() check.

llvm-svn: 353637
2019-02-10 09:25:56 +00:00
Craig Topper a97857b5b5 [InstCombine] Fix an unused variable warning.
llvm-svn: 353630
2019-02-10 02:21:29 +00:00
Fangrui Song 6e679f8ba5 [GlobalOpt] Simplify __cxa_atexit elimination
cxxDtorIsEmpty checks callers recursively to determine if the
__cxa_atexit-registered function is empty, and eliminates the
__cxa_atexit call accordingly.

This recursive check is unnecessary as redundant instructions and
function calls can be removed by early-cse and inliner. In addition,
cxxDtorIsEmpty does not mark visited functions and may visit a
function an exponential number of times (multiplication principle).

llvm-svn: 353603
2019-02-09 09:18:37 +00:00
Gabor Buella 53980b24b7 Extra processing for BitCast + PHI in InstCombine
For some specific cases of bitcast A->B->A with intervening PHI nodes, the InstCombiner::optimizeBitCastFromPhi transformation creates extra PHI nodes which are copies of already-created PHIs, i.e., they are redundant. These extra PHI nodes can lead to extra move instructions generated after the DeSSA transformation. This happens when several conditions are met:

 - SROA kicks in and creates a new alloca;
 - there is a simple assignment L = R, which falls under 'canonicalize loads' done by combineLoadToOperationType (this transformation is on by default). Exactly this transformation is the reason the bitcasts are generated;
 - the alloca is then used in a A->B->A + PHI chain;
 - the loop is unrolled.

As a result, optimizeBitCastFromPhi creates as many PHI nodes for each new SROA alloca as the loop unrolling factor. All but one of these new PHI nodes are redundant and should not be created. Moreover, the idea of optimizeBitCastFromPhi is to get rid of the cast (when possible), but that doesn't happen under these conditions.

The proposed fix is to do the cast replacement for the whole calculated/accumulated PHI closure, not only for the one cast that is the argument to optimizeBitCastFromPhi. This helps accomplish several things: 1) avoid the extra PHI nodes, as all casts which may trigger the optimizeBitCastFromPhi transformation are replaced; 2) replace the bitcasts themselves; and 3) create more opportunities to remove dead code, which appears after the replacement.

A new test case shows that it's possible to get rid of all bitcasts completely and get quite good code reduction.

Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>

Reviewed By: Carrot

Differential Revision: https://reviews.llvm.org/D57053

llvm-svn: 353595
2019-02-09 01:44:28 +00:00
Sergey Dmitriev afd612ece9 [NFC] Avoid passing blocks vector to the OutlineRegionInfo constructor by value.
Reviewers: vsk, fhahn, davidxl

Reviewed By: vsk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57957

llvm-svn: 353582
2019-02-08 23:52:15 +00:00
Craig Topper 784929d045 Implementation of asm-goto support in LLVM
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
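
A hedged IR sketch of the new instruction (asm text and constraint string
are illustrative):

```
define void @foo() {
entry:
  ; may fall through to %normal or transfer to %err, like gcc asm goto
  callbr void asm sideeffect "", "X"(i8* blockaddress(@foo, %err))
      to label %normal [label %err]
normal:
  ret void
err:
  ret void
}
```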

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563
2019-02-08 20:48:56 +00:00
Vedant Kumar 0e5dd512aa [CodeExtractor] Restore outputs after creating exit stubs
When CodeExtractor saves the result of an InvokeInst at the first insertion
point of the 'normal destination' basic block, this block can be omitted
from the outlined region, so the store is placed outside of the function.
The suggested solution is to process saving outputs after creating exit
stubs for the new function; the stores are then placed in those blocks,
before the return.

Patch by Sergei Kachkov!

Fixes llvm.org/PR40455.

Differential Revision: https://reviews.llvm.org/D57919

llvm-svn: 353562
2019-02-08 20:48:04 +00:00
Reid Kleckner 987d331fab [InstrProf] Implement static profdata registration
Summary:
The motivating use case is eliminating duplicate profile data registered
for the same inline function in two object files. Before this change,
users would observe multiple symbol definition errors with VC link, but
links with LLD would succeed.

Users (Mozilla) have reported that PGO works well with clang-cl and LLD,
but when using LLD without this static registration, we would get into a
"relocation against a discarded section" situation. I'm not sure what
happens in that situation, but I suspect that duplicate, unused profile
information was retained. If so, this change will reduce the size of
such binaries with LLD.

Now, Windows uses static registration and is in line with all the other
platforms.

Reviewers: davidxl, wmi, inglorion, void, calixte

Subscribers: mgorny, krytarowski, eraman, fedor.sergeev, hiraditya, #sanitizers, dmajor, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D57929

llvm-svn: 353547
2019-02-08 19:03:50 +00:00
Teresa Johnson 3ce8112dad ArgumentPromotion should copy all metadata to new Function
Summary:
ArgumentPromotion had code to specifically move the dbg metadata over to
the new function, but other metadata such as the function_entry_count
!prof metadata was not. Replace code that moved dbg metadata with a call
to copyMetadata. The old metadata is automatically removed when the old
Function is removed.

Reviewers: davidxl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57846

llvm-svn: 353537
2019-02-08 17:08:27 +00:00
Carlos Alberto Enciso 08dc50f2fb [DWARF] LLVM ERROR: Broken function found, while removing Debug Intrinsics.
Check that when SimplifyCFG is flattening a 'br', all associated debug intrinsic instructions are removed, including any dbg.label referencing a label associated with the basic blocks being removed.

Differential Revision: https://reviews.llvm.org/D57444

llvm-svn: 353511
2019-02-08 10:57:26 +00:00
Max Kazantsev 6b63d3a277 [LoopSimplifyCFG] Use DTU.applyUpdates instead of insert/deleteEdge
`insert/deleteEdge` methods in DTU can make updates incorrectly in some cases
(see https://bugs.llvm.org/show_bug.cgi?id=40528), and it is recommended to
use `applyUpdates` methods instead when it is needed to make a mass update in CFG.

Differential Revision: https://reviews.llvm.org/D57316
Reviewed By: kuhar

llvm-svn: 353502
2019-02-08 08:12:41 +00:00
Sergey Dmitriev 807960e6ef [CodeExtractor] Update function's assumption cache after extracting blocks from it
Summary: Assumption cache's self-updating mechanism does not correctly handle the case when blocks are extracted from the function by the CodeExtractor. As a result function's assumption cache may have stale references to the llvm.assume calls that were moved to the outlined function. This patch fixes this problem by removing extracted llvm.assume calls from the function’s assumption cache.

Reviewers: hfinkel, vsk, fhahn, davidxl, sanjoy

Reviewed By: hfinkel, vsk

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57215

llvm-svn: 353500
2019-02-08 06:55:18 +00:00
Quentin Colombet 96f54de8ff [InstCombine] Optimize `atomicrmw <op>, 0` into `load atomic` when possible
This commit teaches InstCombine how to replace an atomicrmw operation
into a simple load atomic.
For a given `atomicrmw <op>`, this is possible when:
1. The ordering of that operation is compatible with a load (i.e.,
   anything that doesn't have a release semantic).
2. <op> does not modify the value being stored
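
A hedged IR example of the fold (`or 0` leaves the stored value unchanged,
and monotonic ordering is load-compatible):

```
%old = atomicrmw or i32* %p, i32 0 monotonic
; =>
%old = load atomic i32, i32* %p monotonic, align 4
```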

Differential Revision: https://reviews.llvm.org/D57854

llvm-svn: 353471
2019-02-07 21:27:23 +00:00
Florian Hahn f557a94aa3 [LV] Remove unnecessary assignment to UserIC.
llvm-svn: 353469
2019-02-07 21:23:37 +00:00
Sanjay Patel 781d883862 [InstCombine] Fix crashing from (icmp (bitcast ([su]itofp X)), Y)
This fixes a class of bugs introduced by D44367, 
which transforms various cases of icmp (bitcast ([su]itofp X)), Y to icmp X, Y. 
If the bitcast is between vector types with a different number of elements, 
the current code will produce bad IR along the lines of: icmp <N x i32> ..., <M x i32> <...>.

This patch suppresses the transform if the bitcast changes the number of vector elements.

Patch by: @AndrewScheidecker (Andrew Scheidecker)

Differential Revision: https://reviews.llvm.org/D57871

llvm-svn: 353467
2019-02-07 21:12:01 +00:00
Sanjay Patel e7f46c3db3 [InstCombine] refactor folds for (icmp (bitcast X), Y); NFCI
llvm-svn: 353462
2019-02-07 20:54:09 +00:00
Florian Hahn ba5acbc4fe [LV] Prevent interleaving if computeMaxVF returned None.
As discussed in D57382, interleaving should be avoided if computeMaxVF
returns None, same as we currently do for vectorization.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477

Reviewers: Ayal, dcaballe, hsaito, mkuper, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D57837

llvm-svn: 353461
2019-02-07 20:49:10 +00:00
Reid Kleckner f21c022380 [InstrProf] Avoid reconstructing Triple, NFC
llvm-svn: 353439
2019-02-07 18:16:22 +00:00
Teresa Johnson c36c10ddfb [HotColdSplit] With PGO add profile entry metadata to split cold function
Summary:
When compiling with profile data, ensure the split cold function gets
cold function_entry_count metadata (just use 0 since it should be cold).
Otherwise with function sections it will not be placed in the unlikely
text section with other cold code.

Reviewers: vsk

Subscribers: sebpop, hiraditya, davidxl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57900

llvm-svn: 353434
2019-02-07 17:50:35 +00:00
Sam Parker 67756c09f2 [LSR] Generate cross iteration indexes
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve performance in DSP-like loops.

Differential Revision: https://reviews.llvm.org/D55373

llvm-svn: 353403
2019-02-07 13:32:54 +00:00
Alina Sbirlea 6cba96ed52 [LICM/MSSA] Add promotion to scalars by building an AliasSetTracker with MemorySSA.
Summary:
Experimentally we found that promotion to scalars carries less benefit
than sinking and hoisting in LICM. When using MemorySSA, we build an
AliasSetTracker on demand in order to reuse the current infrastructure.
We only build it if fewer than AccessCapForMSSAPromotion accesses exist in the
loop, a cap that is by default set to 250. This value ensures there are
no runtime regressions, and there are small compile time gains for
pathological cases. A much lower value (20) was found to yield a single
regression in the llvm-test-suite and much higher benefits for compile
times. Conservatively we set the current cap to a high value, but we will
explore lowering it when MemorySSA is enabled by default.

Reviewers: sanjoy, chandlerc

Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D56625

llvm-svn: 353339
2019-02-06 20:25:17 +00:00
Sanjay Patel 68bc5fb0ad [InstCombine] X | C == C --> (X & ~C) == 0
We should canonicalize to one of these forms,
and compare-with-zero could be more conducive
to follow-on transforms. This also leads to
generally better codegen as shown in PR40611:
https://bugs.llvm.org/show_bug.cgi?id=40611
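
A hedged example of the canonicalization with C = 8 (note -9 == ~8):

```
%or  = or i32 %x, 8
%cmp = icmp eq i32 %or, 8
; =>
%and = and i32 %x, -9
%cmp = icmp eq i32 %and, 0
```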

llvm-svn: 353313
2019-02-06 16:43:54 +00:00
Max Kazantsev cd48ac3661 [NFC] Simplify check in guard widening
llvm-svn: 353290
2019-02-06 11:27:00 +00:00
Max Kazantsev 36b392cbe4 [NFC] Factor out detachment of dead blocks from their erasing
llvm-svn: 353277
2019-02-06 07:56:36 +00:00
Max Kazantsev a4ccfc1841 [LoopSimplifyCFG] Do not count dead exit blocks twice, make CFG simpler
llvm-svn: 353276
2019-02-06 07:49:17 +00:00
Max Kazantsev 0d7ad3c9a3 [NFC] Revert rL353274
llvm-svn: 353275
2019-02-06 06:33:02 +00:00
Max Kazantsev 61e6ffc398 [NFC] Extend API of DeleteDeadBlock(s) to collect updates without DTU
llvm-svn: 353274
2019-02-06 06:00:02 +00:00
Max Kazantsev bad4db8b1a [NFC] Replace readonly SmallVectorImpl with ArrayRef
llvm-svn: 353273
2019-02-06 05:40:31 +00:00
Teresa Johnson 716abbeb43 [HotColdSplit] Move splitting after instrumented PGO use
Summary:
Follow up to D57082 which moved splitting earlier in the pipeline, in
order to perform it before inlining. However, it was moved too early,
before the IR is annotated with instrumented PGO data. This caused the
splitting to incorrectly determine cold functions.

Move it to just after PGO annotation (still before inlining), in both
pass managers.

Reviewers: vsk, hiraditya, sebpop

Subscribers: mehdi_amini, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57805

llvm-svn: 353270
2019-02-06 04:29:39 +00:00
Richard Trieu 5f436fc57a Move DomTreeUpdater from IR to Analysis
DomTreeUpdater depends on headers from Analysis, but is in IR.  This is a
layering violation since Analysis depends on IR.  Relocate this code from IR
to Analysis to fix the layering violation.

llvm-svn: 353265
2019-02-06 02:52:52 +00:00
Vedant Kumar bd94b4287c [HotColdSplit] Do not split out `resume` instructions
Resumes that are not reachable from a cleanup landing pad are considered
to be unreachable. It’s not safe to split them out.

rdar://47808235

llvm-svn: 353242
2019-02-05 23:39:02 +00:00
Sanjay Patel cddb1e5469 [InstCombine] limit extracting shuffle transform based on uses
As discussed in D53037, this can lead to worse codegen, and we
don't generally expect the backend to be able to optimize
arbitrary shuffles. If there's only one use of the 1st shuffle,
that means it's getting removed, so that should always be
safe.

llvm-svn: 353235
2019-02-05 22:58:45 +00:00
Rong Xu ce10d5ead4 [PGO] Use a function for creating variable for profile file name. NFC.
Factored out the code for creating variable for profile file name to
a function.

llvm-svn: 353230
2019-02-05 22:34:45 +00:00
Jeremy Morse 84ca706be1 [DebugInfo][NFCI] Split salvageDebugInfo into helper functions
Some use cases are appearing where salvaging is needed that does not
correspond to an instruction being deleted -- for example an instruction
being sunk, or a Value not being available in a block being isel'd.

Enable more fine-grained control over how salvaging occurs by splitting
the logic into helper functions, separating things that are specific to
working on DbgVariableIntrinsics from those specific to interpreting IR
and building DIExpressions.

Differential Revision: https://reviews.llvm.org/D57696

llvm-svn: 353156
2019-02-05 11:11:28 +00:00
Max Kazantsev d5e595b7a6 [LSR] Check SCEV on isZero() after extend. PR40514
When LSR first adds SCEVs to BaseRegs, it only does it if `isZero()` has
returned false. In the end, in invocation of `InsertFormula`, it asserts that
all values there are still not zero constants. However between these two
points, it makes some transformations, in particular extends them to wider
type.

SCEV does not give us guarantee that if `S` is not a constant zero, then
`sext(S)` is also not a constant zero. It might have missed some optimizing
transforms when it was calculating `S` and then made them when it took `sext`.
For example, it may happen if previously optimizing transforms were limited
by depth or somehow else.

This patch adds a bailout when we may end up with a zero SCEV after extension.

Differential Revision: https://reviews.llvm.org/D57565
Reviewed By: samparker

llvm-svn: 353136
2019-02-05 04:30:37 +00:00
Richard Trieu a9354b2f33 Fix narrowing issue from r353129
llvm-svn: 353134
2019-02-05 02:26:03 +00:00
Wei Mi 4901f371a2 [SamplePGO][NFC] Minor improvement to replace a temporary vector with a
brace-enclosed init list.

Differential Revision: https://reviews.llvm.org/D57726

llvm-svn: 353129
2019-02-05 00:57:50 +00:00
Teresa Johnson 4bdf82ce79 [SamplePGO] Minor efficiency improvement in samplePGO ICP
Summary:
When attaching prof metadata to promoted direct calls in SamplePGO
mode, no need to construct and use a SmallVector to pass a single count
to the ArrayRef parameter, we can simply use a brace-enclosed init list.

This made a small but consistent improvement for a ThinLTO backend
compile I was measuring.

Reviewers: wmi

Subscribers: mehdi_amini, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57706

llvm-svn: 353123
2019-02-05 00:18:38 +00:00
Julian Lettner 29ac3a5b82 [SanitizerCoverage] Clang crashes if user declares `__sancov_lowest_stack` variable
Summary:
If the user declares or defines `__sancov_lowest_stack` with an
unexpected type, then `getOrInsertGlobal` inserts a bitcast and the
following cast fails:
```
Constant *SanCovLowestStackConstant =
       M.getOrInsertGlobal(SanCovLowestStackName, IntptrTy);
SanCovLowestStack = cast<GlobalVariable>(SanCovLowestStackConstant);
```

This variable is a SanitizerCoverage implementation detail and the user
should generally never have a need to access it, so we emit an error
now.

rdar://problem/44143130

Reviewers: morehouse

Differential Revision: https://reviews.llvm.org/D57633

llvm-svn: 353100
2019-02-04 22:06:30 +00:00
Nicolai Haehnle a69146e67e [InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded
Summary:
The fix added in r352904 is not quite correct, or rather misleading:

1. When the texfailctrl (TFC) argument was non-constant, the fix assumed
   non-TFE/LWE, which is incorrect.

2. Regardless, this code path cannot even be hit for correct
   TFE/LWE-enabled calls, because those return a struct. Added
   a test case for those for completeness.

Change-Id: I92d314dbc67a2670f6d7adaab765ef45f56a49cf

Reviewers: hliao, dstuttard, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57681

llvm-svn: 353097
2019-02-04 21:24:19 +00:00
Philip Pfaffe 0ee6a933ce [NewPM][MSan] Add Options Handling
Summary: This patch enables passing options to msan via the passes pipeline, e.g., -passes=msan<recover;kernel;track-origins=4>.

Reviewers: chandlerc, fedor.sergeev, leonardchan

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57640

llvm-svn: 353090
2019-02-04 21:02:49 +00:00
Michael Kruse 70560a0a2c [WarnMissedTransforms] Do not warn about already vectorized loops.
LoopVectorize adds llvm.loop.isvectorized, but leaves
llvm.loop.vectorize.enable. Do not consider such a loop for user-forced
vectorization since vectorization already happened -- by prioritizing
llvm.loop.isvectorized except for TM_SuppressedByUser.

Fixes http://llvm.org/PR40546

Differential Revision: https://reviews.llvm.org/D57542

llvm-svn: 353082
2019-02-04 19:55:59 +00:00
Max Kazantsev 56b57e3f53 [NFC] Make a check in GuardWidening more obvious
llvm-svn: 353038
2019-02-04 10:41:17 +00:00
Max Kazantsev 09802f41cc [NFC] Rename variables to reflect the actual status of GuardWidening
llvm-svn: 353036
2019-02-04 10:31:18 +00:00
Max Kazantsev 13ab5cbb64 [NFC] Remove redundant parameters for better readability
llvm-svn: 353034
2019-02-04 10:20:51 +00:00
Max Kazantsev 65970aa24d [NFC] Replace equivalent condition for better readability
llvm-svn: 353032
2019-02-04 09:55:18 +00:00
Davide Italiano 73929c4d24 [LoopIdiomRecognize] @llvm.dbg values shouldn't affect the transformation.
Summary: PR40564

Reviewers: aprantl, rnk

Subscribers: llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57629

llvm-svn: 353007
2019-02-03 20:33:20 +00:00
Florian Hahn dd2ef0af46 [LCSSA] Handle case with single new PHI faster.
If there is only a single available value, all uses must be dominated by
the single value and there is no need to search for a reaching
definition.

This drastically speeds up LCSSA in some cases. For the test case
from PR37202, it speeds up LCSSA construction by 4 times.

Time-passes without this patch for test case from PR37202:

    Total Execution Time: 29.9285 seconds (29.9276 wall clock)

    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
    5.2786 ( 17.7%)   0.0021 (  1.2%)   5.2806 ( 17.6%)   5.2808 ( 17.6%)  Unswitch loops
    4.3739 ( 14.7%)   0.0303 ( 18.1%)   4.4042 ( 14.7%)   4.4042 ( 14.7%)  Loop-Closed SSA Form Pass
    4.2658 ( 14.3%)   0.0192 ( 11.5%)   4.2850 ( 14.3%)   4.2851 ( 14.3%)  Loop-Closed SSA Form Pass #2
    2.2307 (  7.5%)   0.0013 (  0.8%)   2.2320 (  7.5%)   2.2318 (  7.5%)  Loop Invariant Code Motion
    2.0888 (  7.0%)   0.0012 (  0.7%)   2.0900 (  7.0%)   2.0897 (  7.0%)  Unroll loops
    1.6761 (  5.6%)   0.0013 (  0.8%)   1.6774 (  5.6%)   1.6774 (  5.6%)  Value Propagation
    1.3686 (  4.6%)   0.0029 (  1.8%)   1.3716 (  4.6%)   1.3714 (  4.6%)  Induction Variable Simplification
    1.1457 (  3.8%)   0.0010 (  0.6%)   1.1468 (  3.8%)   1.1468 (  3.8%)  Loop-Closed SSA Form Pass #4
    1.1384 (  3.8%)   0.0005 (  0.3%)   1.1389 (  3.8%)   1.1389 (  3.8%)  Loop-Closed SSA Form Pass #6
    1.1360 (  3.8%)   0.0027 (  1.6%)   1.1387 (  3.8%)   1.1387 (  3.8%)  Loop-Closed SSA Form Pass #5
    1.1331 (  3.8%)   0.0010 (  0.6%)   1.1341 (  3.8%)   1.1340 (  3.8%)  Loop-Closed SSA Form Pass #3

Time passes with this patch

  Total Execution Time: 19.2802 seconds (19.2813 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   4.4234 ( 23.2%)   0.0038 (  2.0%)   4.4272 ( 23.0%)   4.4273 ( 23.0%)  Unswitch loops
   2.3828 ( 12.5%)   0.0020 (  1.1%)   2.3848 ( 12.4%)   2.3847 ( 12.4%)  Unroll loops
   1.8714 (  9.8%)   0.0020 (  1.1%)   1.8734 (  9.7%)   1.8735 (  9.7%)  Loop Invariant Code Motion
   1.7973 (  9.4%)   0.0022 (  1.2%)   1.7995 (  9.3%)   1.8003 (  9.3%)  Value Propagation
   1.4010 (  7.3%)   0.0033 (  1.8%)   1.4043 (  7.3%)   1.4044 (  7.3%)  Induction Variable Simplification
   0.9978 (  5.2%)   0.0244 ( 13.1%)   1.0222 (  5.3%)   1.0224 (  5.3%)  Loop-Closed SSA Form Pass #2
   0.9611 (  5.0%)   0.0257 ( 13.8%)   0.9868 (  5.1%)   0.9868 (  5.1%)  Loop-Closed SSA Form Pass
   0.5856 (  3.1%)   0.0015 (  0.8%)   0.5871 (  3.0%)   0.5869 (  3.0%)  Unroll loops #2
   0.4132 (  2.2%)   0.0012 (  0.7%)   0.4145 (  2.1%)   0.4143 (  2.1%)  Loop Invariant Code Motion #3

Reviewers: efriedma, davide, mzolotukhin

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D57033

llvm-svn: 352960
2019-02-02 15:26:05 +00:00
Florian Hahn 509b48a64a [LCSSA] Add expensive verification of LCSSA form for sub-loops.
This assertion makes sure all sub-loops are in LCSSA form before
bringing their parent in LCSSA form. This precondition was added to
formLCSSA in D56848.

Reviewers: davide, efriedma, mzolotukhin

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D56921

llvm-svn: 352958
2019-02-02 14:42:27 +00:00
Julian Lettner f82d8924ef [ASan] Do not instrument other runtime functions with `__asan_handle_no_return`
Summary:
Currently, ASan inserts a call to `__asan_handle_no_return` before every
`noreturn` function call/invoke. This is unnecessary for calls to other
runtime functions. This patch changes ASan to skip instrumentation for
function calls marked with `!nosanitize` metadata.
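
A hedged IR sketch (handler signature and metadata id are illustrative):
the call below carries `!nosanitize`, so ASan no longer inserts
`__asan_handle_no_return` before it.

```
  call void @__ubsan_handle_builtin_unreachable(i8* %data), !nosanitize !0
  unreachable
; at module level:
!0 = !{}
```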

Reviewers: TODO

Differential Revision: https://reviews.llvm.org/D57489

llvm-svn: 352948
2019-02-02 02:05:16 +00:00
James Y Knight 291f791ef1 [opaque pointer types] Pass function type for CallBase::setCalledFunction.
Differential Revision: https://reviews.llvm.org/D57174

llvm-svn: 352914
2019-02-01 20:44:54 +00:00
James Y Knight 7716075a17 [opaque pointer types] Pass value type to GetElementPtr creation.
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57173

llvm-svn: 352913
2019-02-01 20:44:47 +00:00
James Y Knight 14359ef1b6 [opaque pointer types] Pass value type to LoadInst creation.
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57172

llvm-svn: 352911
2019-02-01 20:44:24 +00:00
James Y Knight d9e85a0861 [opaque pointer types] Pass function types to InvokeInst creation.
This cleans up all InvokeInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57171

llvm-svn: 352910
2019-02-01 20:43:34 +00:00
James Y Knight 7976eb5838 [opaque pointer types] Pass function types to CallInst creation.
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57170

llvm-svn: 352909
2019-02-01 20:43:25 +00:00
Michael Liao 8b323f53eb [InstCombine] Extra null-checking on TFE/LWE support
- If that operand is not ConstantInt, skip enabling TFE/LWE.

Differential Revision: https://reviews.llvm.org/D57539

llvm-svn: 352904
2019-02-01 19:53:44 +00:00
Sanjay Patel fbcbac7174 [InstCombine] reduce duplicate code; NFC
An unused variable problem was introduced with rL352870 
and stubbed out with rL352871, but we can make a better
fix by actually using the local variable in code rather 
than just the assert.  

llvm-svn: 352873
2019-02-01 14:37:49 +00:00
Fangrui Song 8495aabec2 [InstCombine] Fix -Wunused-variable when -DLLVM_ENABLE_ASSERTIONS=off
llvm-svn: 352871
2019-02-01 14:22:02 +00:00
Sanjay Patel be23a91fcd [InstCombine] try to reduce x86 addcarry to generic uaddo intrinsic
If we can reduce the x86-specific intrinsic to the generic op, it allows existing 
simplifications and value tracking folds. AFAICT, this always results in identical 
x86 codegen in the non-reduced case...which should be true because we semi-generically 
(too aggressively IMO) convert to llvm.uadd.with.overflow in CGP, so the DAG/isel must 
already combine/lower this intrinsic as expected.

This isn't quite what was requested in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but we want to have these kinds of folds early for efficiency and to enable greater 
simplifications. For the case in the bug report where we have:
_addcarry_u64(0, ahi, 0, &ahi)
...this gets completely simplified away in IR.
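
A hedged IR sketch of the reduction when the carry-in is zero (the x86
intrinsic's exact signature is an assumption here):

```
%r = call { i8, i64 } @llvm.x86.addcarry.64(i8 0, i64 %a, i64 %b)
; =>
%u = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
```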

Differential Revision: https://reviews.llvm.org/D57453

llvm-svn: 352870
2019-02-01 14:14:47 +00:00
Yevgeny Rouban 15b17d0a7c Provide reason messages for unviable inlining
InlineCost's isInlineViable() is changed to return InlineResult
instead of bool. This provides messages for failure reasons and
allows getting more specific messages for cases where callsites
are not viable for inlining.

Reviewed By: xbolva00, anemet

Differential Revision: https://reviews.llvm.org/D57089

llvm-svn: 352849
2019-02-01 10:44:43 +00:00
Yevgeny Rouban 4cdd783955 [SLPVectorizer] Get rid of IndexQueue array from vectorizeStores. NFCI.
Indices are checked as they are generated. No need to fill the whole array of indices.

Differential Revision: https://reviews.llvm.org/D57144

llvm-svn: 352839
2019-02-01 06:44:08 +00:00
James Y Knight 13680223b9 [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.

Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
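
A hedged C++ sketch of a caller after the change (the helper and
runtime-function names are hypothetical):

```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// getOrInsertFunction now returns a FunctionCallee ({FunctionType*, Value*}),
// which CreateCall accepts directly; no cast to Function* is needed.
static CallInst *emitRuntimeCall(Module &M, IRBuilder<> &B, Value *Arg) {
  FunctionType *FnTy = FunctionType::get(
      B.getVoidTy(), {B.getInt8PtrTy()}, /*isVarArg=*/false);
  FunctionCallee Fn = M.getOrInsertFunction("my_runtime_helper", FnTy);
  return B.CreateCall(Fn, {Arg});
}
```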

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352827
2019-02-01 02:28:03 +00:00
Kostya Serebryany a78a44d480 [sanitizer-coverage] prune trace-cmp instrumentation for CMP instructions that feed into the backedge branch. Instrumenting these CMP instructions is almost always useless (and harmful) for fuzzing
llvm-svn: 352818
2019-01-31 23:43:00 +00:00
James Y Knight fadf25068e Revert "[opaque pointer types] Add a FunctionCallee wrapper type, and use it."
This reverts commit f47d6b38c7 (r352791).

Seems to run into compilation failures with GCC (but not clang, where
I tested it). Reverting while I investigate.

llvm-svn: 352800
2019-01-31 21:51:58 +00:00
Alina Sbirlea e271889291 [EarlyCSE & MSSA] Cleanup special handling for removing MemoryAccesses.
Summary: Moving special handling to MemorySSAUpdater in D57199.

Reviewers: gberry, george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D57200

llvm-svn: 352794
2019-01-31 21:12:41 +00:00
James Y Knight f47d6b38c7 [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaration with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352791
2019-01-31 20:35:56 +00:00
Craig Topper c1892ec15a [CallSite removal] Remove CallSite uses from InstCombine.
Reviewers: chandlerc

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57494

llvm-svn: 352771
2019-01-31 17:23:29 +00:00
Teresa Johnson f59242e5ff Recommit "[ThinLTO] Rename COMDATs for COFF when promoting/renaming COMDAT leader"
Recommit of r352763 with fix for use after free.

llvm-svn: 352770
2019-01-31 17:18:11 +00:00
Teresa Johnson 4877715ee6 Revert "[ThinLTO] Rename COMDATs for COFF when promoting/renaming COMDAT leader"
This reverts commit r352763.

Causing a couple bot failures, root cause pointed to by sanitizer bot:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/28909/steps/annotate/logs/stdio

Use after free. I understand the issue but will revert and test with fix
before recommitting.

llvm-svn: 352768
2019-01-31 16:46:14 +00:00
Teresa Johnson 992b53fd16 [ThinLTO] Rename COMDATs for COFF when promoting/renaming COMDAT leader
Summary:
COFF requires that COMDAT name match that of the leader. When we promote
and rename an internal leader in ThinLTO due to an import, ensure we
subsequently rename the associated COMDAT. Similar to D31963 which did
this during ThinLTO module splitting.

Fixes PR40414.

Reviewers: pcc, inglorion

Subscribers: mehdi_amini, dexonsmith, dmajor, llvm-commits

Differential Revision: https://reviews.llvm.org/D57395

llvm-svn: 352763
2019-01-31 16:00:15 +00:00
Max Kazantsev f392bc846f Default lowering for experimental.widenable.condition
Introduces a pass that provides a default lowering strategy for the
`experimental.widenable.condition` intrinsic, replacing all its uses with
`i1 true`.
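
A hedged IR sketch of the default lowering (the guard structure is
illustrative):

```
%wc = call i1 @llvm.experimental.widenable.condition()
%guard = and i1 %cond, %wc
br i1 %guard, label %guarded, label %deopt
; => after replacing %wc with i1 true, %guard folds to %cond
```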

Differential Revision: https://reviews.llvm.org/D56096
Reviewed By: reames

llvm-svn: 352739
2019-01-31 09:10:17 +00:00
Dmitry Venikov 8817658836 [InstCombine] Missed optimization in math expression: simplify calls exp functions
Summary: This patch enables folding the following expressions under the -ffast-math flag: exp(X) * exp(Y) -> exp(X + Y), exp2(X) * exp2(Y) -> exp2(X + Y). Motivation: https://bugs.llvm.org/show_bug.cgi?id=35594
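
A hedged IR example of the first fold under fast-math flags:

```
%e1 = call fast double @llvm.exp.f64(double %x)
%e2 = call fast double @llvm.exp.f64(double %y)
%r  = fmul fast double %e1, %e2
; =>
%s = fadd fast double %x, %y
%r = call fast double @llvm.exp.f64(double %s)
```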

Reviewers: hfinkel, spatel, efriedma, lebedev.ri

Reviewed By: spatel, lebedev.ri

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D41342

llvm-svn: 352730
2019-01-31 06:28:10 +00:00
Erik Pilkington 600e9deacf Add a 'dynamic' parameter to the objectsize intrinsic
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.
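
A hedged sketch of a call with the new parameter (the argument order is
assumed; the trailing i1 is the 'dynamic' flag):

```
%size = call i64 @llvm.objectsize.i64.p0i8(i8* %ptr, i1 false, i1 true, i1 true)
```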

rdar://32212419

Differential revision: https://reviews.llvm.org/D56761

llvm-svn: 352664
2019-01-30 20:34:35 +00:00
Philip Reames c71e996aed SimplifyDemandedVectorElts for all intrinsics
The point is that this simplifies integration of new intrinsics into SimplifyDemandedVectorElts, and ensures we don't miss any existing ones.

This is intended to be NFC-ish, but as seen from the diffs, can produce slightly different output.  This is due to order of transforms w/in instcombine resulting in two slightly different fixed points.  That's something we should fix, but isn't a problem w/this patch per se.

Differential Revision: https://reviews.llvm.org/D57398

llvm-svn: 352653
2019-01-30 19:21:11 +00:00
Max Kazantsev 365021cc15 Properly use DT.verify in LoopSimplifyCFG
llvm-svn: 352621
2019-01-30 12:32:19 +00:00
Max Kazantsev 34eeeec3ae Enable IRCE for narrow latch by default
llvm-svn: 352619
2019-01-30 11:25:12 +00:00
Alina Sbirlea f9027e554a Check bool attribute value in getOptionalBoolLoopAttribute.
Summary:
Check the bool value of the attribute in getOptionalBoolLoopAttribute,
not just its existence.
This eliminates the warning noise generated when vectorization is explicitly disabled.
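
A hedged IR sketch: with this change the explicit `i1 false` below is
treated as "disabled" rather than as a forced-vectorization request
(metadata ids are illustrative):

```
  br i1 %cond, label %loop, label %exit, !llvm.loop !0
; at module level:
!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.vectorize.enable", i1 false}
```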

Reviewers: Meinersbur, hfinkel, dmgreen

Subscribers: jlebar, sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D57260

llvm-svn: 352555
2019-01-29 22:33:20 +00:00
Sanjay Patel 18db56209c [InstCombine] canonicalize cmp/select form of uadd saturate with constant
I'm circling back around to a loose end from D51929.

The backend (either CGP or DAG) doesn't recognize this pattern, so we end up with different asm for these IR variants.

Regardless of any future changes to canonicalize to saturation/overflow intrinsics, we want to get raw IR variations 
into the minimal number of raw IR forms. If/when we can canonicalize to intrinsics, that will make that step easier.

  Pre: C2 == ~C1
  %a = add i32 %x, C1
  %c = icmp ugt i32 %x, C2
  %r = select i1 %c, i32 -1, i32 %a
  =>
  %a = add i32 %x, C1
  %c2 = icmp ult i32 %x, C2
  %r = select i1 %c2, i32 %a, i32 -1

  https://rise4fun.com/Alive/pkH

Differential Revision: https://reviews.llvm.org/D57352

llvm-svn: 352536
2019-01-29 20:02:45 +00:00
Bjorn Pettersson d014d576a9 [IPCP] Don't crash due to arg count/type mismatch between caller/callee
Summary:
This patch avoids an assert in IPConstantPropagation when
there is an argument count/type mismatch between the caller and
the callee.

While this is actually UB on C-level (clang emits a warning),
the IR verifier seems to accept it. I'm not sure what other
frontends/languages might think about this, so simply bailing out
to avoid hitting an assert (in CallSiteBase<>::getArgOperand or
Value::doRAUW) seems like a simple solution.

The problem is exposed by the fact that AbstractCallSites will look
through a bitcast at the callee position of a call/invoke.

Reviewers: jdoerfert, reames, efriedma

Reviewed By: jdoerfert, efriedma

Subscribers: eli.friedman, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D57052

llvm-svn: 352469
2019-01-29 10:19:44 +00:00
Philip Reames 6c5341bc5a Demanded elements support for vector GEPs
GEPs can produce either scalar or vector results. If we're extracting only a subset of the vector lanes, simplifying the operands is helpful in eliminating redundant computation, and (eventually) allowing further optimizations

Differential Revision: https://reviews.llvm.org/D57177

llvm-svn: 352440
2019-01-28 23:24:49 +00:00
Teresa Johnson 5b2f6a1bc2 [ThinLTO] Refine reachability check to fix compile time increase
Summary:
A recent fix to the ThinLTO whole program dead code elimination (D56117)
increased the thin link time on a large MSAN'ed binary by 2x.
It's likely that the time increased elsewhere, but was more noticeable
here since it was already large and ended up timing out.

That change made it so we would repeatedly scan all copies of linkonce
symbols for liveness every time they were encountered during the graph
traversal. This was needed since we only mark one copy of an aliasee as
live when we encounter a live alias. This patch fixes the issue in a
more efficient manner by simply proactively visiting the aliasee (thus
marking all copies live) when we encounter a live alias.

Two notes: One, this requires a hash table lookup (finding the aliasee
summary in the index based on aliasee GUID). However, the impact of this
seems to be small compared to the original pre-D56117 thin link time. It
could be addressed if we keep the aliasee ValueInfo in the alias summary
instead of the aliasee GUID, which I am exploring in a separate patch.

Second, we only populate the aliasee GUID field when reading summaries
from bitcode (whether we are reading individual summaries and merging on
the fly to form the compiled index, or reading in a serialized combined
index). Thankfully, that's currently the only way we can get to this
code as we don't yet support reading summaries from LLVM assembly
directly into a tool that performs the thin link (they must be converted
to bitcode first). I added a FIXME, however I have the fix under test
already. The easiest fix is to simply populate this field always, which
isn't hard, but more likely the change I am exploring to store the
ValueInfo instead as described above will subsume this. I don't want to
hold up the regression fix for this though.

Reviewers: trentxintong

Subscribers: mehdi_amini, inglorion, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D57203

llvm-svn: 352438
2019-01-28 22:27:05 +00:00
Vedant Kumar 1c3694a4d4 [CodeExtractor] Add support for the `swifterror` attribute
When passing a `swifterror` argument or alloca as an input to an
extraction region, mark the input parameter `swifterror`.

llvm-svn: 352408
2019-01-28 19:13:37 +00:00
Alina Sbirlea 932108703a [SimpleLoopUnswitch] Early check exit for trivial unswitch with MemorySSA.
Summary:
If MemorySSA is available, we can skip checking all instructions if the block has any Defs.
(volatile loads are also Defs).
We still need to check all instructions for "canThrow", even if no Defs are found.

Reviewers: chandlerc

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D57129

llvm-svn: 352393
2019-01-28 17:48:45 +00:00
Alina Sbirlea a34bcbf335 Revert rL352238.
llvm-svn: 352241
2019-01-25 21:12:08 +00:00
Alina Sbirlea 890a8e575f [WarnMissedTransforms] Set default to 1.
Summary:
Set default value for retrieved attributes to 1, since the check is against 1.
Eliminates the warning noise generated when the attributes are not present.

Reviewers: sanjoy

Subscribers: jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D57253

llvm-svn: 352238
2019-01-25 20:51:55 +00:00
Vedant Kumar db3f9774ee [HotColdSplit] Introduce a cost model to control splitting behavior
The main goal of the model is to avoid *increasing* function size, as
that would eradicate any memory locality benefits from splitting. This
happens when:

  - There are too many inputs or outputs to the cold region. Argument
    materialization and reloads of outputs have a cost.

  - The cold region has too many distinct exit blocks, causing a large
    switch to be formed in the caller.

  - The code size cost of the split code is less than the cost of a
    set-up call.

A secondary goal is to prevent excessive overall binary size growth.

With the cost model in place, I experimented to find a splitting
threshold that works well in practice. To make warm & cold code easily
separable for analysis purposes, I moved split functions to a "cold"
section. I experimented with thresholds between [0, 4] and set the
default to the threshold which minimized geomean __text size.

Experiment data from building LNT+externals for X86 (N = 639 programs,
all sizes in bytes):

| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os**       | 1736.3           | 0, n=0           | 10961.6        |
| -Os, thresh=0 | 1740.53          | 124.482, n=134   | 11014          |
| -Os, thresh=1 | 1734.79          | 57.8781, n=90    | 10978.6        |
| -Os, thresh=2 | ** 1733.85 **    | 65.6604, n=61    | 10977.6        |
| -Os, thresh=3 | 1733.85          | 65.3071, n=61    | 10977.6        |
| -Os, thresh=4 | 1735.08          | 67.5156, n=54    | 10965.7        |
| **-Oz**       | 1554.4           | 0, n=0           | 10153          |
| -Oz, thresh=2 | ** 1552.2 **     | 65.633, n=61     | 10176          |
| **-O3**       | 2563.37          | 0, n=0           | 13105.4        |
| -O3, thresh=2 | ** 2559.49 **    | 71.1072, n=61    | 13162.4        |

Picking thresh=2 reduces the geomean __text section size by 0.14% at
-Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that
TEXT size is page-aligned, whereas section sizes are byte-aligned.

Experiment data from building LNT+externals for ARM64 (N = 558 programs,
all sizes in bytes):

| Configuration | __text geom size | __cold geom size | TEXT geom size |
| **-Os**       | 1763.96          | 0, n=0           | 42934.9        |
| -Os, thresh=2 | ** 1760.9 **     | 76.6755, n=61    | 42934.9        |

Picking thresh=2 reduces the geomean __text section size by 0.17% at
-Os and causes no growth in the TEXT segment.

Measurements were done with D57082 (r352080) applied.

Differential Revision: https://reviews.llvm.org/D57125

llvm-svn: 352228
2019-01-25 18:30:37 +00:00
Max Kazantsev 38cd9acbb9 [LoopSimplifyCFG] Fix inconsistency in blocks in loop markup
2nd part of D57095, for the same reason, just in another place. We never
fold branches that are not immediately in the current loop, but this check
is missing in `IsEdgeLive`. As a result, it may think that an edge in a subloop
is dead while it's live. This is a pessimization as things currently stand.

Differential Revision: https://reviews.llvm.org/D57147
Reviewed By: rupprecht	

llvm-svn: 352170
2019-01-25 05:05:02 +00:00
Vedant Kumar 9d70f2b939 [HotColdSplit] Describe the pass in more detail, NFC
llvm-svn: 352161
2019-01-25 03:22:38 +00:00
Vedant Kumar 65de025d64 [HotColdSplit] Split more aggressively before/after cold invokes
While a cold invoke itself and its unwind destination can't be
extracted, code which unconditionally executes before/after the invoke
may still be profitable to extract.

With cost model changes from D57125 applied, this gives a 3.5% increase
in split text across LNT+externals on arm64 at -Os.

llvm-svn: 352160
2019-01-25 03:22:23 +00:00
Peter Collingbourne 1a8acfb768 hwasan: If we split the entry block, move static allocas back into the entry block.
Otherwise they are treated as dynamic allocas, which ends up increasing
code size significantly. This reduces size of Chromium base_unittests
by 2MB (6.7%).

Differential Revision: https://reviews.llvm.org/D57205

llvm-svn: 352152
2019-01-25 02:08:46 +00:00
Haojian Wu b9613a39b8 Fix a compiler error introduced in r352093.
llvm-svn: 352098
2019-01-24 20:30:48 +00:00
Alina Sbirlea 0a4367209c [LICM] Cleanup duplicated code. [NFCI]
llvm-svn: 352093
2019-01-24 19:57:30 +00:00
Alina Sbirlea 52f6e2a173 [MemorySSA +LICM CFHoist] Solve PR40317.
Summary:
MemorySSA needs updating each time an instruction is moved.
LICM and control flow hoisting re-hoists instructions, thus needing another update when re-moving those instructions.
Pending cleanup: the MSSA update is duplicated, should be moved inside moveInstructionBefore.

Reviewers: jnspaulsson

Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits

Differential Revision: https://reviews.llvm.org/D57176

llvm-svn: 352092
2019-01-24 19:48:35 +00:00
Vedant Kumar ef1ebed1c6 [HotColdSplit] Move splitting earlier in the pipeline
Performing splitting early has several advantages:

  - Inhibiting inlining of cold code early improves code size. Compared
    to scheduling splitting at the end of the pipeline, this cuts code
    size growth in half within the iOS shared cache (0.69% to 0.34%).

  - Inhibiting inlining of cold code improves compile time. There's no
    need to inline split cold functions, or to inline as much *within*
    those split functions as they are marked `minsize`.

  - During LTO, extra work is only done in the pre-link step. Less code
    must be inlined during cross-module inlining.

An additional motivation here is that the most common cold regions
identified by the static/conservative splitting heuristic can (a) be
found before inlining and (b) do not grow after inlining. E.g.
__assert_fail, os_log_error.

The disadvantages are:

  - Some opportunities for splitting out cold code may be missed. This
    gap can potentially be narrowed by adding a worklist algorithm to the
    splitting pass.

  - Some opportunities to reduce code size may be lost (e.g. store
    sinking, when one side of the CFG diamond is split). This does not
    outweigh the code size benefits of splitting earlier.

On net, splitting early in the pipeline has substantial code size
benefits, and no major effects on memory locality or performance. We
measured memory locality using ktrace data, and consistently found that
10% fewer pages were needed to capture 95% of text page faults in key
iOS benchmarks. We measured performance on frequency-stabilized iOS
devices using LNT+externals.

This reverses course on the decision made to schedule splitting late in
r344869 (D53437).

Differential Revision: https://reviews.llvm.org/D57082

llvm-svn: 352080
2019-01-24 18:55:49 +00:00
Julian Lettner b62e9dc46b Revert "[Sanitizers] UBSan unreachable incompatible with ASan in the presence of `noreturn` calls"
This reverts commit cea84ab93a.

llvm-svn: 352069
2019-01-24 18:04:21 +00:00
Philip Reames 4d683ee7e3 [RS4GC] Be slightly less conservative for gep vector_base, scalar_idx
After submitting https://reviews.llvm.org/D57138, I realized it was slightly more conservative than needed. The scalar indices don't appear to be a problem on a vector gep, we even had a test for that.

Differential Revision: https://reviews.llvm.org/D57161

llvm-svn: 352061
2019-01-24 16:34:00 +00:00
Philip Reames a657510eb7 [RS4GC] Avoid crashing on gep scalar_base, vector_idx
This is an alternative to https://reviews.llvm.org/D57103.  After discussion, we dedicided to check this in as a temporary workaround, and pursue a true fix under the original thread.

The issue at hand is that the base rewriting algorithm doesn't consider the fact that GEPs can turn a scalar input into a vector of outputs. We had handling for scalar GEPs and fully vector GEPs (i.e. all vector operands), but not the scalar-base + vector-index forms. A true fix here requires treating GEP analogously to extractelement or shufflevector.

This patch is merely a workaround. It simply hides the crash at the cost of some ugly code gen for this presumable very rare pattern.

Differential Revision: https://reviews.llvm.org/D57138

llvm-svn: 352059
2019-01-24 16:08:18 +00:00
Florian Hahn bed7f9eab2 Revert "[HotColdSplitting] Get DT and PDT from the pass manager."
This reverts commit a6982414ed (llvm-svn: 352036),
because it causes a memory leak in the pass manager. Failing bot

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/10351/steps/check-llvm%20asan/logs/stdio

llvm-svn: 352041
2019-01-24 11:22:08 +00:00
Florian Hahn a6982414ed [HotColdSplitting] Get DT and PDT from the pass manager.
Instead of manually computing DT and PDT, we can get them from the pass
manager, which ideally has them already cached. With the new pass
manager, we could even preserve DT/PDT on a per-function basis in a
module pass.

I think this also addresses the TODO about re-using the computed DTs for
BFI. IIUC, GetBFI will fetch the DT from the pass manager, and we will
then fetch the cached version later.

Reviewers: vsk, hiraditya, tejohnson, thegameg, sebpop

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D57092

llvm-svn: 352036
2019-01-24 09:44:52 +00:00
Max Kazantsev 56515a2c76 [LoopSimplifyCFG] Fix inconsistency in live blocks markup
When we choose whether or not we should mark a block as dead, we have
inconsistent logic in the markup of live blocks.
- We take a candidate IF its terminator branches on a constant AND it is immediately
  in the current loop;
- We mark a successor live IF its terminator doesn't branch on a constant OR it branches
  on a constant and the successor is its always-taken block.

What we are missing here is that when the terminator branches on a constant but is
not taken as a candidate because it is not immediately in the current loop, we will
mark only one (always taken) successor as live. Therefore, we do NOT do the actual
folding but may NOT mark one of the successors as live. So the result of the markup is
wrong in this case, and we may then hit various asserts.

Thanks Jordan Rupprecht for reporting this!

Differential Revision: https://reviews.llvm.org/D57095
Reviewed By: rupprecht

llvm-svn: 352024
2019-01-24 05:20:29 +00:00
Julian Lettner cea84ab93a [Sanitizers] UBSan unreachable incompatible with ASan in the presence of `noreturn` calls
Summary:
UBSan wants to detect when unreachable code is actually reached, so it
adds instrumentation before every `unreachable` instruction. However,
the optimizer will remove code after calls to functions marked with
`noreturn`. To avoid this UBSan removes `noreturn` from both the call
instruction as well as from the function itself. Unfortunately, ASan
relies on this annotation to unpoison the stack by inserting calls to
`_asan_handle_no_return` before `noreturn` functions. This is important
for functions that do not return but access the stack memory, e.g.,
unwinder functions *like* `longjmp` (`longjmp` itself is actually
"double-proofed" via its interceptor). The result is that when ASan and
UBSan are combined, the `noreturn` attributes are missing and ASan
cannot unpoison the stack, so it has false positives when stack
unwinding is used.

Changes:
  # UBSan now adds the `expect_noreturn` attribute whenever it removes
    the `noreturn` attribute from a function
  # ASan additionally checks for the presence of this attribute

Generated code:
```
call void @__asan_handle_no_return    // Additionally inserted to avoid false positives
call void @longjmp
call void @__asan_handle_no_return
call void @__ubsan_handle_builtin_unreachable
unreachable
```

The second call to `__asan_handle_no_return` is redundant. This will be
cleaned up in a follow-up patch.

rdar://problem/40723397

Reviewers: delcypher, eugenis

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D56624

llvm-svn: 352003
2019-01-24 01:06:19 +00:00
David Callahan d2eeb2516d Update entry count for cold calls
Summary:
Profile sample files include the number of times each entry or inlined
call site is sampled. This is translated into the entry count metadata
on functions.

When sample data is being read, if a call site that was inlined
in the sample program is considered cold and not inlined, then
the entry count of the out-of-line functions does not reflect
the current compilation.

In this patch, we note call sites where the function was not inlined
and as a last action of the sample profile loading, we update the
called function's entry count to reflect the calls from these
call sites which are not included in the profile file.

Reviewers: danielcdh, wmi, Kader, modocache

Reviewed By: wmi

Subscribers: davidxl, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D52845

llvm-svn: 352001
2019-01-24 00:55:23 +00:00
Mircea Trofin ec02630278 [llvm] Clarify responsibility of some of DILocation discriminator APIs
Summary:
Renamed setBaseDiscriminator to cloneWithBaseDiscriminator, to match
similar APIs. Also changed its behavior to copy over the other
discriminator components, instead of eliding them.

Renamed cloneWithDuplicationFactor to
cloneByMultiplyingDuplicationFactor, which more closely matches what
this API does.

Reviewers: dblaikie, wmi

Reviewed By: dblaikie

Subscribers: zzheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D56220

llvm-svn: 351996
2019-01-24 00:10:25 +00:00
Hideki Saito 4e4ecae028 [LV][VPlan] Change to implement VPlan based predication for
VPlan-native path

Context: Patch Series #2 for outer loop vectorization support in LV
using VPlan. (RFC:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).

Patch series #2 checks that inner loops are still trivially lock-step
among all vector elements. Non-loop branches are blindly assumed as
divergent.

Changes here implement VPlan based predication algorithm to compute
predicates for blocks that need predication. Predicates are computed
for the VPLoop region in reverse post order. A block's predicate is
computed as OR of the masks of all incoming edges. The mask for an
incoming edge is computed as AND of predecessor block's predicate and
either predecessor's Condition bit or NOT(Condition bit) depending on
whether the edge from predecessor block to the current block is true
or false edge.
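
As a sketch of this rule on a hypothetical diamond-shaped region:
```
define void @diamond(i1 %cond) {
entry:                ; Predicate(entry) = true
  br i1 %cond, label %then, label %else
then:                 ; Mask(entry->then) = Predicate(entry) AND %cond
  br label %merge
else:                 ; Mask(entry->else) = Predicate(entry) AND NOT(%cond)
  br label %merge
merge:                ; Predicate(merge) = Mask(then->merge) OR Mask(else->merge)
  ret void
}
```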

Reviewers: fhahn, rengolin, hsaito, dcaballe

Reviewed By: fhahn

Patch by Satish Guggilla, thanks!

Differential Revision: https://reviews.llvm.org/D53349

llvm-svn: 351990
2019-01-23 22:43:12 +00:00
Peter Collingbourne 020ce3f026 hwasan: Read shadow address from ifunc if we don't need a frame record.
This saves a cbz+cold call in the interceptor ABI, as well as a realign
in both ABIs, trading off a dcache entry against some branch predictor
entries and some code size.

Unfortunately the functionality is hidden behind a flag because ifunc is
known to be broken on static binaries on Android.

Differential Revision: https://reviews.llvm.org/D57084

llvm-svn: 351989
2019-01-23 22:39:11 +00:00
Florian Hahn 68cea130df [HotColdSplitting] Remove unused SSAUpdater.h include (NFC).
llvm-svn: 351945
2019-01-23 11:51:38 +00:00
Max Kazantsev d9aee3c0d1 [IRCE] Support narrow latch condition for wide range checks
This patch relaxes restrictions on types of latch condition and range check.
In current implementation, they should match. This patch allows to handle
wide range checks against narrow condition. The motivating example is the
following:

  int N = ...
  for (long i = 0; (int) i < N; i++) {
    if (i >= length) deopt;
  }

In this patch, the option that enables this support is turned off by
default. We'll wait until it is switched to true.

Differential Revision: https://reviews.llvm.org/D56837
Reviewed By: reames

llvm-svn: 351926
2019-01-23 07:20:56 +00:00
Peter Collingbourne 73078ecd38 hwasan: Move memory access checks into small outlined functions on aarch64.
Each hwasan check requires emitting a small piece of code like this:
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses

The problem with this is that these code blocks typically bloat code
size significantly.

An obvious solution is to outline these blocks of code. In fact, this
has already been implemented under the -hwasan-instrument-with-calls
flag. However, as currently implemented this has a number of problems:
- The functions use the same calling convention as regular C functions.
  This means that the backend must spill all temporary registers as
  required by the platform's C calling convention, even though the
  check only needs two registers on the hot path.
- The functions take the address to be checked in a fixed register,
  which increases register pressure.
Both of these factors can diminish the code size effect and increase
the performance hit of -hwasan-instrument-with-calls.

The solution that this patch implements is to involve the aarch64
backend in outlining the checks. An intrinsic and pseudo-instruction
are created to represent a hwasan check. The pseudo-instruction
is register allocated like any other instruction, and we allow the
register allocator to select almost any register for the address to
check. A particular combination of (register selection, type of check)
triggers the creation in the backend of a function to handle the check
for specifically that pair. The resulting functions are deduplicated by
the linker. The pseudo-instruction (really the function) is specified
to preserve all registers except for the registers that the AAPCS
specifies may be clobbered by a call.
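
Conceptually, each inline check collapses into a single call into a linker-deduplicated helper. A sketch follows; the helper symbol name and signature are illustrative only, not the actual ABI:
```
declare void @__hwasan_check_x0_load8(i64)

define void @f(i64 %addr) {
  ; one compact call replaces the inline shadow arithmetic for this access;
  ; the helper preserves all registers except those the AAPCS allows a call
  ; to clobber
  call void @__hwasan_check_x0_load8(i64 %addr)
  ret void
}
```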

To measure the code size and performance effect of this change, I
took a number of measurements using Chromium for Android on aarch64,
comparing a browser with inlined checks (the baseline) against a
browser with outlined checks.

Code size: Size of .text decreases from 243897420 to 171619972 bytes,
or a 30% decrease.

Performance: Using Chromium's blink_perf.layout microbenchmarks I
measured a median performance regression of 6.24%.

The fact that a perf/size tradeoff is evident here suggests that
we might want to make the new behaviour conditional on -Os/-Oz.
But for now I've enabled it unconditionally, my reasoning being that
hwasan users typically expect a relatively large perf hit, and ~6%
isn't really adding much. We may want to revisit this decision in
the future, though.

I also tried experimenting with varying the number of registers
selectable by the hwasan check pseudo-instruction (which would result
in fewer variants being created), on the hypothesis that creating
fewer variants of the function would expose another perf/size tradeoff
by reducing icache pressure from the check functions at the cost of
register pressure. Although I did observe a code size increase with
fewer registers, I did not observe a strong correlation between the
number of registers and the performance of the resulting browser on the
microbenchmarks, so I conclude that we might as well use ~all registers
to get the maximum code size improvement. My results are below:

Regs | .text size | Perf hit
-----+------------+---------
~all | 171619972  | 6.24%
  16 | 171765192  | 7.03%
   8 | 172917788  | 5.82%
   4 | 177054016  | 6.89%

Differential Revision: https://reviews.llvm.org/D56954

llvm-svn: 351920
2019-01-23 02:20:10 +00:00
Vedant Kumar cde65c0fac [HotColdSplit] Calculate BFI lazily to reduce compile-time, NFC
The splitting pass does not need BFI unless the Module actually has a profile
summary. Do not calcualte BFI unless the summary is present.

For the sqlite3 amalgamation, this reduces time spent in the splitting pass
from 0.4% of the total to under 0.1%.

llvm-svn: 351894
2019-01-22 22:49:22 +00:00
Vedant Kumar 38874c8f7b [HotColdSplit] Calculate domtrees lazily to reduce compile-time, NFC
The splitting pass does not need (post)domtrees until after it's found a
cold block. Defer domtree calculation until a cold block is found.

For the sqlite3 amalgamation, this reduces time spent in the splitting
pass from 0.8% of the total to 0.4%.

llvm-svn: 351892
2019-01-22 22:49:08 +00:00
Jordan Rupprecht 0efe27e62b Revert r351520, "Re-enable terminator folding in LoopSimplifyCFG"
This is still causing compilation crashes in some targets. Will follow up shortly with a repro.

llvm-svn: 351845
2019-01-22 17:39:02 +00:00
Max Kazantsev feb475f4cf [LoopPredication] Support guards expressed as branches by widenable condition
This patch adds support of guards expressed as branches by widenable
conditions in Loop Predication.
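
The branch form of a guard looks roughly like this (a sketch; %range.check and the labels are illustrative):
```
  %wc = call i1 @llvm.experimental.widenable.condition()
  %guard.cond = and i1 %range.check, %wc
  br i1 %guard.cond, label %guarded, label %deopt
```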

Differential Revision: https://reviews.llvm.org/D56081
Reviewed By: reames

llvm-svn: 351805
2019-01-22 11:49:06 +00:00
Max Kazantsev ca45087826 [NFC] Factor out some reusable logic
llvm-svn: 351794
2019-01-22 10:13:36 +00:00
Philip Reames 390c0e2f72 [CVP] Use LVI to constant fold deopt operands
Deopt operands are generally intended to record information about a site in code with minimal perturbation of the surrounding code. Idiomatically, they also tend to appear down rare paths. Putting these together, we have an obvious case for extending CVP w/deopt operand constant folding. Arguably, we should be doing this for all operands on all instructions, but that's definitely a much larger and risky change.
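
A sketch of the intended effect, with hypothetical values:
```
; if LVI proves %x is the constant 42 at this site, fold it into the bundle:
call void @foo() [ "deopt"(i32 %x) ]
; becomes
call void @foo() [ "deopt"(i32 42) ]
```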

Differential Revision: https://reviews.llvm.org/D55678

llvm-svn: 351774
2019-01-22 01:34:33 +00:00
Serge Guelton be88539b85 Replace llvm::isPodLike<...> by llvm::is_trivially_copyable<...>
As noted in https://bugs.llvm.org/show_bug.cgi?id=36651, the specialization for
isPodLike<std::pair<...>> did not match the expectation of
std::is_trivially_copyable which makes the memcpy optimization invalid.

This patch renames the llvm::isPodLike trait into llvm::is_trivially_copyable.
Unfortunately std::is_trivially_copyable is not portable across compiler / STL
versions. So a portable version is provided too.

Note that the following specialization were invalid:

    std::pair<T0, T1>
    llvm::Optional<T>

Tests have been added to assert that former specialization are respected by the
standard usage of llvm::is_trivially_copyable, and that when a decent version
of std::is_trivially_copyable is available, llvm::is_trivially_copyable is
compared to std::is_trivially_copyable.

As of this patch, llvm::Optional is no longer considered trivially copyable,
even if T is. This is to be fixed in a later patch, as it has impact on a
long-running bug (see r347004)

Note that GCC warns about this UB, but this got silenced by https://reviews.llvm.org/D50296.

Differential Revision: https://reviews.llvm.org/D54472

llvm-svn: 351701
2019-01-20 21:19:56 +00:00
Simon Pilgrim e1143c1322 [X86] Auto upgrade VPCOM/VPCOMU intrinsics to generic integer comparisons
This causes a couple of changes in the upgrade tests, as signed/unsigned eq/ne are equivalent and we constant fold true/false codes; these changes are the same as what we already do for avx512 cmp/ucmp.

Noticed while cleaning up vector integer comparison costs for PR40376.

llvm-svn: 351697
2019-01-20 19:27:40 +00:00
Vedant Kumar 857cacd9de [ConstantMerge] Factor out check for un-mergeable globals, NFC
llvm-svn: 351671
2019-01-20 02:44:43 +00:00
Nikita Popov 6515db205a [InstCombine] Simplify cttz/ctlz + icmp ugt/ult
Followup to D55745, this time handling comparisons with ugt and ult
predicates (which are the canonical forms for non-equality predicates).

For ctlz we can convert into a simple icmp, for cttz we can convert
into a mask check.
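
For example, a sketch on i32 (the precise output form is up to the transform):
```
; ctlz: at least 28 leading zeros means %x < 16
%lz = call i32 @llvm.ctlz.i32(i32 %x, i1 false)
%c1 = icmp ugt i32 %lz, 27        ; -> icmp ult i32 %x, 16

; cttz: at least 4 trailing zeros means the low 4 bits are clear
%tz = call i32 @llvm.cttz.i32(i32 %x, i1 false)
%c2 = icmp ugt i32 %tz, 3         ; -> icmp eq (and i32 %x, 15), 0
```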

Differential Revision: https://reviews.llvm.org/D56355

llvm-svn: 351645
2019-01-19 09:56:01 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Johannes Doerfert 36872b5db9 Enable IPConstantPropagation to work with abstract call sites
This modification of the currently unused inter-procedural constant
propagation pass (IPConstantPropagation) shows how abstract call sites
enable optimization of callback calls alongside direct and indirect
calls. Through minimal changes, mostly dealing with the partial mapping
of callbacks, inter-procedural constant propagation was enabled for
callbacks, e.g., OpenMP runtime calls or pthreads_create.

Differential Revision: https://reviews.llvm.org/D56447

llvm-svn: 351628
2019-01-19 05:19:12 +00:00
Vedant Kumar b537b946b8 [MergeFunc] Allow merging identical vararg functions using aliases
Thanks to Nikita Popov for pointing out this missed case.

This is a follow-up to r351411, which disabled function merging for
vararg functions outright due to a miscompile (see llvm.org/PR40345).
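
A sketch of why the alias form is safe where a thunk is not: an alias forwards the entire call, variadic arguments included, with no thunk body that would have to re-forward them (typed-pointer syntax of the time, hypothetical names):
```
@g = alias i32 (i32, ...), i32 (i32, ...)* @f
```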

Differential Revision: https://reviews.llvm.org/D56865

llvm-svn: 351624
2019-01-19 02:46:22 +00:00
Vedant Kumar b755a2df51 [HotColdSplit] Mark inherently cold functions as such
If an inherently cold function is found, mark it as cold. For now this
means applying the `cold` and `minsize` attributes.
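
A sketch of the resulting IR for a function deemed inherently cold:
```
define void @cold_path() #0 {
  ret void
}
attributes #0 = { cold minsize }
```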

As a drive-by, revisit and clean up the criteria for considering a
function for splitting. Add tests.

llvm-svn: 351623
2019-01-19 02:38:47 +00:00
Vedant Kumar 4de1962bb9 [HotColdSplit] Remove a set which tracked split functions (NFC)
Use the begin/end iterator idiom to avoid visiting split functions,
instead of doing a set lookup.

llvm-svn: 351622
2019-01-19 02:38:17 +00:00
Vedant Kumar 17d9f14bff [CodeExtractor] Emit lifetime markers around reloads of outputs
CodeExtractor permits extracting a region of blocks from a function even
when values defined within the region are used outside of it.

This is typically done by creating an alloca in the original function
and reloading the alloca after a call to the extracted function.

Wrap the reload in lifetime start/end markers to promote stack coloring.
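
A sketch of the emitted pattern, with illustrative types and names:
```
%out = alloca i32
%cast = bitcast i32* %out to i8*
call void @llvm.lifetime.start.p0i8(i64 4, i8* %cast)
call void @extracted(i32* %out)
%v = load i32, i32* %out
call void @llvm.lifetime.end.p0i8(i64 4, i8* %cast)
```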

Suggested by Sergei Kachkov!

Differential Revision: https://reviews.llvm.org/D56045

llvm-svn: 351621
2019-01-19 02:37:59 +00:00
Florian Hahn be7cbe3f70 [LCSSA] Skip blocks in sub-loops when scanning for uses.
Summary:
Scanning blocks in sub-loops for uses is unnecessary, as they were
already handled while dealing with the containing sub-loop.

This speeds up LCSSA for highly nested loops. For the test case in PR37202, it
halves the time spent in LCSSA. In cases were we won't be able to skip
any blocks, the additional lookup should be negligible.

Time-passes without this patch for test case from PR37202:

  Total Execution Time: 48.5505 seconds (48.5511 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  10.0822 ( 21.0%)   0.1406 ( 27.0%)  10.2228 ( 21.1%)  10.2228 ( 21.1%)  Loop-Closed SSA Form Pass
  10.0417 ( 20.9%)   0.1467 ( 28.2%)  10.1884 ( 21.0%)  10.1890 ( 21.0%)  Loop-Closed SSA Form Pass #2
   4.2703 (  8.9%)   0.0040 (  0.8%)   4.2742 (  8.8%)   4.2742 (  8.8%)  Unswitch loops
   2.7376 (  5.7%)   0.0229 (  4.4%)   2.7605 (  5.7%)   2.7611 (  5.7%)  Loop-Closed SSA Form Pass #5
   2.7332 (  5.7%)   0.0214 (  4.1%)   2.7546 (  5.7%)   2.7546 (  5.7%)  Loop-Closed SSA Form Pass #3
   2.7088 (  5.6%)   0.0230 (  4.4%)   2.7319 (  5.6%)   2.7324 (  5.6%)  Loop-Closed SSA Form Pass #4
   2.6855 (  5.6%)   0.0236 (  4.5%)   2.7091 (  5.6%)   2.7090 (  5.6%)  Loop-Closed SSA Form Pass #6
   2.1648 (  4.5%)   0.0018 (  0.4%)   2.1666 (  4.5%)   2.1664 (  4.5%)  Unroll loops
   1.8371 (  3.8%)   0.0009 (  0.2%)   1.8379 (  3.8%)   1.8380 (  3.8%)  Value Propagation
   1.8149 (  3.8%)   0.0021 (  0.4%)   1.8170 (  3.7%)   1.8169 (  3.7%)  Loop Invariant Code Motion
   1.6755 (  3.5%)   0.0226 (  4.3%)   1.6981 (  3.5%)   1.6980 (  3.5%)  Loop-Closed SSA Form Pass #7

Time-passes with this patch

  Total Execution Time: 29.9285 seconds (29.9276 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   5.2786 ( 17.7%)   0.0021 (  1.2%)   5.2806 ( 17.6%)   5.2808 ( 17.6%)  Unswitch loops
   4.3739 ( 14.7%)   0.0303 ( 18.1%)   4.4042 ( 14.7%)   4.4042 ( 14.7%)  Loop-Closed SSA Form Pass
   4.2658 ( 14.3%)   0.0192 ( 11.5%)   4.2850 ( 14.3%)   4.2851 ( 14.3%)  Loop-Closed SSA Form Pass #2
   2.2307 (  7.5%)   0.0013 (  0.8%)   2.2320 (  7.5%)   2.2318 (  7.5%)  Loop Invariant Code Motion
   2.0888 (  7.0%)   0.0012 (  0.7%)   2.0900 (  7.0%)   2.0897 (  7.0%)  Unroll loops
   1.6761 (  5.6%)   0.0013 (  0.8%)   1.6774 (  5.6%)   1.6774 (  5.6%)  Value Propagation
   1.3686 (  4.6%)   0.0029 (  1.8%)   1.3716 (  4.6%)   1.3714 (  4.6%)  Induction Variable Simplification
   1.1457 (  3.8%)   0.0010 (  0.6%)   1.1468 (  3.8%)   1.1468 (  3.8%)  Loop-Closed SSA Form Pass #4
   1.1384 (  3.8%)   0.0005 (  0.3%)   1.1389 (  3.8%)   1.1389 (  3.8%)  Loop-Closed SSA Form Pass #6
   1.1360 (  3.8%)   0.0027 (  1.6%)   1.1387 (  3.8%)   1.1387 (  3.8%)  Loop-Closed SSA Form Pass #5
   1.1331 (  3.8%)   0.0010 (  0.6%)   1.1341 (  3.8%)   1.1340 (  3.8%)  Loop-Closed SSA Form Pass #3

Reviewers: davide, efriedma, mzolotukhin

Reviewed By: davide, efriedma

Subscribers: hiraditya, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D56848

llvm-svn: 351567
2019-01-18 17:36:22 +00:00
Max Kazantsev b4cd50be56 Re-enable terminator folding in LoopSimplifyCFG: underlying bugs fixed
llvm-svn: 351520
2019-01-18 04:57:32 +00:00
Vedant Kumar f529b50702 [HotColdSplit] Allow outlining with live outputs
Prior to r348205, extracting code regions with live output values was
disabled because of a miscompilation (PR39433). Lift the restriction as
PR39433 has been addressed.

Tested on LNT+externals, on a run of check-llvm in a stage2 build, and
with a full build of iOS (with hot/cold splitting enabled).

As a drive-by, remove an errant TODO.

llvm-svn: 351492
2019-01-17 22:36:05 +00:00
Vedant Kumar b70e20db62 [HotColdSplit] Consider resume instructions to be cold
Resuming exception unwinding is roughly as unlikely as throwing an
exception.

Tested on LNT+externals (in particular, the C++ EH regression tests
provide end-to-end test coverage), as well as with a full build of iOS.

llvm-svn: 351491
2019-01-17 22:35:47 +00:00
Vedant Kumar 4541be0686 [HotColdSplit] Relax requirement that the cold sink block be extractable
Relaxing this requirement creates opportunities to split code dominated
by an EH pad.

Tested on LNT+externals.

llvm-svn: 351483
2019-01-17 21:42:36 +00:00
Vedant Kumar 32a014d048 [HotColdSplit] Simplify tests by lowering their splitting thresholds
This gets rid of the brittle/mysterious calls to @sink()/@sideeffect()
peppered throughout the test cases. They are no longer needed to force
splitting to occur.

llvm-svn: 351480
2019-01-17 21:29:34 +00:00
Wei Mi 3bcccdfe38 [SampleFDO] Skip profile reading when flattened profile used in ThinLTO postlink
If the sample profile includes no inlining hierarchy information, we say the
sample profile is flattened. For a flattened profile, in the ThinLTO postlink
phase, SampleProfileLoader's hot function inlining and profile annotation will
do nothing, so it is better to save the effort of reading in the profile and
running the sample profile loader pass. This helps reduce compile time when
the flattened profile is huge.

Differential Revision: https://reviews.llvm.org/D54819

llvm-svn: 351476
2019-01-17 20:48:34 +00:00
Reid Kleckner edd653bc07 [InstCombine] Don't sink dynamic allocas
Summary:
InstCombine's sinking algorithm only thinks about memory. It doesn't
think about non-memory constraints like stack object lifetime. It can
sink dynamic allocas across a stacksave call, which may be used with
stackrestore, which can incorrectly reduce the lifetime of the dynamic
alloca.
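
A sketch of the hazard:
```
%buf = alloca i8, i32 %n          ; dynamic alloca
%ss = call i8* @llvm.stacksave()
; ...
call void @llvm.stackrestore(i8* %ss)
; sinking %buf below the stacksave would let the stackrestore deallocate it
; while it may still be in use
```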

Fixes PR40365

Reviewers: hfinkel, efriedma

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56872

llvm-svn: 351475
2019-01-17 20:46:53 +00:00
Teresa Johnson 8d86f1ba47 Revert "[ThinLTO] Add summary entries for index-based WPD"
Mistaken commit of something still under review!

This reverts commit r351453.

llvm-svn: 351455
2019-01-17 16:05:04 +00:00
Teresa Johnson 4fcf3b1621 [ThinLTO] Add summary entries for index-based WPD
Summary:
If LTOUnit splitting is disabled, the module summary analysis computes
the summary information necessary to perform single implementation
devirtualization during the thin link with the index and no IR. The
information collected from the regular LTO IR in the current hybrid WPD
algorithm is summarized, including:
1) For vtable definitions, record the function pointers and their offset
within the vtable initializer (subsumes the information collected from
IR by tryFindVirtualCallTargets).
2) A record for each type metadata summarizing the vtable definitions
decorated with that metadata (subsumes the TypeIdentiferMap collected
from IR).

Also added are the necessary bitcode records, and the corresponding
assembly support.

The index-based WPD will be sent as a follow-on.

Depends on D53890.

Reviewers: pcc

Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54815

llvm-svn: 351453
2019-01-17 15:49:03 +00:00
Max Kazantsev 61a8d3fb33 [LoopSimplifyCFG] Form LCSSA when a parent loop becomes a sibling
During the transforms in LoopSimplifyCFG, when we remove a dead exiting edge, the
parent loop may stop being reachable from the child loop, and therefore they become
siblings. If the former child loop had uses of some values from its former parent loop,
now such uses will require LCSSA Phis, even if they weren't needed before. So we must
form LCSSA for all loops that stopped being ancestors of the current loop in this case.

Differential Revision: https://reviews.llvm.org/D56144
Reviewed By: fedor.sergeev

llvm-svn: 351434
2019-01-17 12:51:10 +00:00
Max Kazantsev 8b134169f5 [LoopSimplifyCFG] Fix order of deletion of complex dead subloops
Function `DeleteDeadBlock` requires that all predecessors of a block
being deleted have already been deleted, with the exception of a
single-block loop. When we use it for removal of dead subloops that
contain more than one block, we may not fulfill this requirement and
fail an assertion.

This patch replaces invocation of `DeleteDeadBlock` with a generalized
version `DeleteDeadBlocks` that is able to deal with multiple dead blocks,
even if they contain some cycles.

Differential Revision: https://reviews.llvm.org/D56121
Reviewed By: fedor.sergeev

llvm-svn: 351433
2019-01-17 12:25:40 +00:00
Max Kazantsev ee61308595 [NFC] Factor out some local vars
llvm-svn: 351416
2019-01-17 06:20:42 +00:00
Vedant Kumar a9906c1e5e [MergeFunc] Prevent silent miscompile of vararg functions
The function merging pass miscompiles identical vararg functions. The
forwarding thunk it emits doesn't forward the full variable-length list
of arguments. Disable merging for vararg functions for now.
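
A sketch of the miscompile, with hypothetical names: the forwarding thunk passed only the fixed parameters, silently dropping the variable arguments:
```
define i32 @thunk(i32 %a, ...) {
  ; the "..." arguments are not forwarded to the merged body
  %r = tail call i32 (i32, ...) @merged(i32 %a)
  ret i32 %r
}
```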

I've filed llvm.org/PR40345 to track the issue.

rdar://47326238

llvm-svn: 351411
2019-01-17 02:15:05 +00:00
Vedant Kumar e21ab22115 [FunctionComparator] Consider tail call kinds
Essentially, do not treat `call` and `musttail call` as the same thing.
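
In other words, these two otherwise-identical calls must no longer compare equal; a sketch:
```
%r1 = call i32 @f(i32 %x)
%r2 = musttail call i32 @f(i32 %x)
ret i32 %r2
```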

As a drive-by, fold CallInst and InvokeInst handling together using the
CallSite helper.

Differential Revision: https://reviews.llvm.org/D56815

llvm-svn: 351405
2019-01-17 00:29:14 +00:00
Wei Mi 79c4408aa2 Fix a mistake in rL351392.
PGOInstrGen should be initialized to "" instead of false.

llvm-svn: 351397
2019-01-16 23:31:40 +00:00
Wei Mi c876e3d42b [PGO] Make pgo related options in opt more consistent.
Currently we have pgo options defined in PassManagerBuilder.cpp only for
instrument pgo, but not for sample pgo. We also have pgo options defined
in NewPMDriver.cpp in opt only for new pass manager and for all kinds of
pgo. They have some inconsistency.

To make the options more consistent and make tests writing easier, the
patch let old pass manager to share the same pgo options with new pass
manager in opt, and removes the options in PassManagerBuilder.cpp.

Differential Revision: https://reviews.llvm.org/D56749

llvm-svn: 351392
2019-01-16 23:19:02 +00:00
Alexey Bataev 18809a6bbb [SLP] Fix PR40310: The reduction nodes should stay scalar.
Summary:
Sometimes the SLP vectorizer tries to vectorize the horizontal reduction
nodes during regular vectorization. This may happen inside of the loops,
when there are some vectorizable PHIs. The patch fixes this by checking whether
the node is the reduction node; in that case it must not be vectorized, it must
be gathered.

Reviewers: RKSimon, spatel, hfinkel, fedor.sergeev

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56783

llvm-svn: 351349
2019-01-16 15:39:52 +00:00
Gabor Buella 3ec170c85a Assertion in isAllocaPromotable due to extra bitcast goes into lifetime marker
For the given test, SROA detects a possible replacement and creates a correct alloca. After that, SROA adds lifetime markers for this new alloca. The function getNewAllocaSlicePtr tries to deduce the pointer type based on the original alloca, which is split, to use it later in the lifetime intrinsic.

For the test we ended up with the following code (rA is the initial alloca [10 x float], which is split, and rA.sroa.0.0 is a new split allocation):

```
%rA.sroa.0.0.rA.sroa_cast = bitcast i32* %rA.sroa.0 to [10 x float]*    <----- this one causing the assertion and is an extra bitcast
%5 = bitcast [10 x float]* %rA.sroa.0.0.rA.sroa_cast to i8*
call void @llvm.lifetime.start.p0i8(i64 4, i8* %5)
```

The isAllocaPromotable code assumes that a user of an alloca may feed a lifetime marker through a bitcast, but it must be a single bitcast to i8*. In the test it is not an i8* type, so the check returns false and the assertion fires.

As we are creating a pointer that will be used in lifetime markers only, the proposed fix is to create a bitcast to i8* immediately, avoiding the creation of an extra bitcast.

The test is greatly simplified to just reproduce the assertion.

Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>

Reviewers: chandlerc, craig.topper

Reviewed By: chandlerc

Differential Revision: https://reviews.llvm.org/D55934

llvm-svn: 351325
2019-01-16 12:06:17 +00:00
Philip Pfaffe 81101de585 [MSan] Apply the ctor creation scheme of TSan
Summary: To avoid adding an extern function to the global ctors list, apply the changes of D56538 also to MSan.

Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan

Subscribers: hiraditya, bollu, llvm-commits

Differential Revision: https://reviews.llvm.org/D56734

llvm-svn: 351322
2019-01-16 11:14:07 +00:00
Philip Pfaffe 685c76d7a3 [NewPM][TSan] Reiterate the TSan port
Summary:
Second iteration of D56433 which got reverted in rL350719. The problem
in the previous version was that we dropped the thunk calling the tsan init
function. The new version keeps the thunk which should appease dyld, but is not
actually OK wrt. the current semantics of function passes. Hence, add a
helper to insert the functions only on the first time. The helper
allows hooking into the insertion to be able to append them to the
global ctors list.

Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan

Subscribers: hiraditya, bollu, llvm-commits

Differential Revision: https://reviews.llvm.org/D56538

llvm-svn: 351314
2019-01-16 09:28:01 +00:00
Tom Stellard 3d36e5c3e6 Only promote args when function attributes are compatible
Summary:
Check to make sure that the caller and the callee have compatible
function arguments before promoting arguments.  This uses the same
TargetTransformInfo queries that are used to determine if attributes
are compatible for inlining.

The goal here is to avoid breaking ABI when a called function's ABI
depends on a target feature that is not enabled in the caller.

This is a very conservative fix for PR37358.  Ideally we would have a more
sophisticated check for ABI compatibility rather than checking if the
attributes are compatible for inlining.

Reviewers: echristo, chandlerc, eli.friedman, craig.topper

Reviewed By: echristo, chandlerc

Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits

Differential Revision: https://reviews.llvm.org/D53554

llvm-svn: 351296
2019-01-16 05:15:31 +00:00
Serguei Katkov a5b0e5585b [InstCombine]Avoid introduction of unaligned mem access
InstCombine is able to transform a mem transfer intrinsic into a lone store or a store/load pair.
This might result in the generation of an unaligned atomic load/store, which the backend
will later transform into a libcall. That is not an evident gain, so it is better to keep the intrinsic as-is
and handle it in the backend.

Reviewers: reames, anna, apilipenko, mkazantsev
Reviewed By: reames
Subscribers: t.p.northover, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D56582

llvm-svn: 351295
2019-01-16 04:36:26 +00:00
David Callahan d129d3e93f treat invoke like call
Summary:
InvokeInst should be treated like CallInst and
assigned a separate discriminator. This is particularly
important when an Invoke is converted to a Call
during compilation and so can invalidate sample profile
data collected with different link-time optimizations.

Reviewers: twoh, Kader, danielcdh, wmi

Reviewed By: wmi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56491

llvm-svn: 351251
2019-01-15 21:26:51 +00:00
Matt Morehouse 19ff35c481 [SanitizerCoverage] Don't create comdat for interposable functions.
Summary:
Comdat groups override weak symbol behavior, allowing the linker to keep
the comdats for weak symbols in favor of comdats for strong symbols.

Fixes the issue described in:
https://bugs.chromium.org/p/chromium/issues/detail?id=918662

Reviewers: eugenis, pcc, rnk

Reviewed By: pcc, rnk

Subscribers: smeenai, rnk, bd1976llvm, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56516

llvm-svn: 351247
2019-01-15 21:21:01 +00:00
David Callahan dee001207e We can improve the performance (generally) by memo-izing the action to map a debug location to its function summary.
Summary:
Here are timings (as reported by "opt -time-passes") for
sample-profile pass for some files holding hot functions from a major
service©r. Average 17% reduction. Delta column is 100*(old-new)/old.

```
Old    New    Delta
0.0537 0.0538 -0.2%
0.8155 0.6522 20.0%
0.0779 0.0751  3.6%
0.0727 0.0913 -25.6%
0.1622 0.1302 19.7%
0.0627 0.0594  5.3%
0.0766 0.0744  2.9%
0.6426 0.4387 31.7%
0.3521 0.2776 21.2%
0.3549 0.2721 23.3%
0.0912 0.0904  0.9%
0.1236 0.1059 14.3%
0.0854 0.0866 -1.4%
0.0757 0.0722  4.6%
0.1293 0.1147 11.3%
0.1354 0.1122 17.1%
0.0767 0.0770 -0.4%
0.1135 0.0968 14.7%
0.0524 0.0608 -16.0%
0.1279 0.1106 13.5%
==========
3.6820 3.0520 17.1% Total
```

Reviewers: twoh, Kader, danielcdh, wmi

Reviewed By: wmi

Subscribers: dblaikie, llvm-commits

Differential Revision: https://reviews.llvm.org/D56435

llvm-svn: 351211
2019-01-15 17:45:54 +00:00
Zaara Syeda b7dff9c9af [SimpleLoopUnswitch] Increment stats counter for unswitching switch instruction
Increment statistics counter NumSwitches at unswitchNontrivialInvariants() for
unswitching a non-trivial switch instruction. This fixes a bug where
NumBranches was incremented even for switch instructions.
There is no functional change in this patch.

Differential Revision: https://reviews.llvm.org/D56408

llvm-svn: 351193
2019-01-15 15:08:01 +00:00
Florian Hahn 4094f34f78 [InstCombine] Don't undo 0 - (X * Y) canonicalization when combining subs.
Otherwise instcombine gets stuck in a cycle. The canonicalization was
added in D55961.

This patch fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=12400

llvm-svn: 351187
2019-01-15 11:18:21 +00:00
Max Kazantsev f8a0e0ddf0 [NFC] Remove some code duplication
llvm-svn: 351185
2019-01-15 11:16:14 +00:00
Max Kazantsev 80242ee87e [NFC] Remove obsolete enum RangeCheckKind
llvm-svn: 351183
2019-01-15 10:48:45 +00:00
Max Kazantsev 78a5435284 [NFC] Decrease if nest
llvm-svn: 351180
2019-01-15 10:01:46 +00:00
Max Kazantsev a78dc4d6c8 [NFC] Move some functions to LoopUtils
llvm-svn: 351179
2019-01-15 09:51:34 +00:00
Marek Olsak 33eb4d947d AMDGPU: Add a fast path for icmp.i1(src, false, NE)
Summary:
This allows moving the condition from the intrinsic to the standard ICmp
opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic
is an identity for retrieving the SGPR mask.

And we can also get the mask from and i1, or i1, xor i1.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52060

llvm-svn: 351150
2019-01-15 02:13:18 +00:00
Jonathan Metzman e159a0dd1a [SanitizerCoverage][NFC] Use appendToUsed instead of include
Summary:
Use appendToUsed instead of include to ensure that
SanitizerCoverage's constructors are not stripped.

Also, use isOSBinFormatCOFF() to determine if target
binary format is COFF.

Reviewers: pcc

Reviewed By: pcc

Subscribers: hiraditya

Differential Revision: https://reviews.llvm.org/D56369

llvm-svn: 351118
2019-01-14 21:02:02 +00:00
David Callahan 957795973b Ignore PhiNodes when mapping sample profile data
Summary: Like branch instructions, phi nodes frequently do not have debug information related to the block they are in and so they should be ignored.

Reviewers: danielcdh, twoh, Kader, wmi

Reviewed By: wmi

Subscribers: aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D55094

llvm-svn: 351102
2019-01-14 19:05:59 +00:00
David Callahan b1853a6a94 Revert "Merge branch 'arcpatch-D55094'"
This reverts commit a9788dd6587d67c856df74eedff5a6ad34ce8320, reversing
changes made to f1309ffebf718d16aec4fab83380556c660e2825.

unintended merge pushed

llvm-svn: 351095
2019-01-14 18:49:27 +00:00
David Callahan 1b46231764 Merge branch 'arcpatch-D55094'
llvm-svn: 351092
2019-01-14 18:35:43 +00:00
Tom Stellard d0a7676087 cmake: Don't install plugins used for examples or tests
Summary:
This patch drops install targets for LLVMHello.so,
TestPlugin.so, and BugpointPasses.so.

Reviewers: chandlerc, beanz, thakis, philip.pfaffe

Reviewed By: chandlerc

Subscribers: SquallATF, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D55965

llvm-svn: 351087
2019-01-14 18:25:35 +00:00
Jeremy Morse f216da7ee0 [DebugInfo] Remove unnecessary logic from HoistThenElseCodeToIf
Following PR39807, the way in which SimplifyCFG hoists common code on
branch paths was fixed in r347782. However this left extra code hanging
around HoistThenElseCodeToIf that wasn't necessary and needlessly
complicated matters -- we no longer need to look up through the 'if'
basic block to find a location for hoisted 'select' insts, we can instead
use the location chosen by applyMergedLocation.

This patch deletes that extra logic, and updates a regression test to
reflect the new logic (selects get the merged location, not a previous
insts location).

Differential Revision: https://reviews.llvm.org/D55272

llvm-svn: 351058
2019-01-14 12:13:12 +00:00
David Stuttard f77079f892 [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accommodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

This re-submit of the change also includes a slight modification in
SIISelLowering.cpp to work-around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda


Work around for ppcle compiler bug

Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
llvm-svn: 351054
2019-01-14 11:55:24 +00:00
Max Kazantsev 1f73310e1e [BasicBlockUtils] Generalize DeleteDeadBlock to deal with multiple dead blocks
Utility function `DeleteDeadBlock` expects that all predecessors of a block being
deleted are already deleted, with the exception of single-block loop. It makes it
hard to use for deletion of a set of blocks that may contain cyclic dependencies.
There is no correct order of invocations of this function that does not produce
dangling pointers on already deleted blocks.

This patch introduces a generalized version of this function `DeleteDeadBlocks`
that allows us to remove multiple blocks at once, even if there are cycles among
them. The only requirement is that no block being deleted should have a predecessor
that is not being deleted. 

The logic of `DeleteDeadBlocks` is following:
  for each block
    create relevant DT updates;
    remove all instructions (replace with undef if needed);
    replace terminator with unreacheable;
  apply DT updates;
  for each block
    delete block;

Therefore, `DeleteDeadBlock` becomes a particular case of
the general algorithm called for a single block.

Differential Revision: https://reviews.llvm.org/D56120
Reviewed By: skatkov

llvm-svn: 351045
2019-01-14 10:26:26 +00:00
Benjamin Kramer b17d2136ea Give helper classes/functions local linkage. NFC.
llvm-svn: 351016
2019-01-12 18:36:22 +00:00
Sanjay Patel 7d65fe5cd5 [LoopVectorizer] give more advice in remark about failure to vectorize call
Something like this is requested by:
https://bugs.llvm.org/show_bug.cgi?id=40265
...and it seems like a common enough case that we should acknowledge it.

Differential Revision: https://reviews.llvm.org/D56551

llvm-svn: 351010
2019-01-12 15:27:15 +00:00
Teresa Johnson 290a839891 [LTO] Record whether LTOUnit splitting is enabled in index
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).

The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.
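
A sketch of what the module flag looks like in IR; the value shown is hypothetical for a particular module:
```
!llvm.module.flags = !{!0}
!0 = !{i32 1, !"EnableSplitLTOUnit", i32 1}
```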

This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.

Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.

Reviewers: pcc

Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D53890

llvm-svn: 350948
2019-01-11 18:31:57 +00:00
Vedant Kumar ee10ef737e [MergeFunc] Erase unused duplicate functions if they are discardable
MergeFunc only deletes unused duplicate functions if they have local
linkage, but it should be safe to relax this to any "discardable if
unused" linkage type.

Differential Revision: https://reviews.llvm.org/D56574

llvm-svn: 350939
2019-01-11 17:56:35 +00:00
Vedant Kumar 08fe7e02fb [MergeFunc] Use Instruction::getFunction as a cleanup, NFC
llvm-svn: 350938
2019-01-11 17:56:21 +00:00
Ehsan Amiri f452f116d2 [Jump Threading] Unfold a select insn that feeds a switch via a phi node
Currently, when a select has a constant value in one branch and the select feeds
a conditional branch (via a compare, or via a phi and a compare), we unfold the select
statement. This results in threading the conditional branch later on. A similar
opportunity exists when a select (with a constant in one branch) feeds a
switch (via a phi node). The patch unfolds the select under this condition.
A testcase is provided.
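
A sketch of the shape being unfolded, with hypothetical values:
```
bb:
  %s = select i1 %flag, i32 7, i32 %v
  br label %merge
merge:
  %p = phi i32 [ %s, %bb ], [ %w, %other ]
  switch i32 %p, label %default [ i32 7, label %case7 ]
```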

llvm-svn: 350931
2019-01-11 15:52:57 +00:00
Matt Davis 9cd9f41f0e [GVN] Update BlockRPONumber prior to use.
Summary:
The original patch addressed the use of BlockRPONumber by forcing a sequence point when accessing that map in a conditional. In short, we found cases where that map was being accessed with blocks that had not yet been added to that structure. For context, I've kept the wall of text below describing what we are trying to fix by always ensuring an updated BlockRPONumber.

== Backstory ==

I was investigating an ICE (segfault accessing a DenseMap item).  This failure happened non-deterministically, with no apparent reason and only on a Windows build of LLVM (from October 2018).

After looking into the crashes (multiple core files) and running DynamoRio, the cores and DynamoRio (DR) log pointed to the same code in `GVN::performScalarPRE()`. The values in the map are unsigned integers, the keys are `llvm::BasicBlock*`.  Our test case that triggered this warning and periodic crash is rather involved.  But the problematic line looks to be:

GVN.cpp: Line 2197

```
     if (BlockRPONumber[P] >= BlockRPONumber[CurrentBlock] &&
```

To test things out, I cooked up a patch that accessed the items in the map outside of the condition, by forcing a sequence point between accesses. DynamoRio stopped warning of the issue, and the test didn't seem to crash after 1000+ runs.

My investigation was on an older version of LLVM (source from October this year). What appears to be occurring is the following, and the assembly from the latest pull of llvm in December seems to confirm this might still be an issue; however, I have not witnessed the crash on more recent builds. Of course the asm in question is generated from the host compiler on that Windows box (not clang), but it hints that we might want to consider how we access the BlockRPONumber map in this conditional (line 2197, listed above). In any case, I don't think the host compiler is wrong; rather, I think it is pointing out a possibly latent bug in llvm.

1) There is no sequence point for the `>=` operation.

2) A call to a `DenseMapBase::operator[]` can have the side effect of the map reallocating a larger store (more Buckets, via a call to `DenseMap::grow`).

3) It seems perfectly legal for a host compiler to generate assembly that stores the result of a call to `operator[]` on the stack (that's what my host compile of GVN.cpp is doing) .  A second call to `operator[]` //might// encourage the map to 'grow' thus making any pointers to the map's store invalid.  The `>=` compares the first and second values. If the first happens to be a pointer produced from operator[], it could be invalid when dereferenced at the time of comparison.

The assembly generated from the Window's host compiler does show the result of the first access to the map via `operator[]` produces a pointer to an unsigned int.  And that pointer is being stored on  the stack.  If a second call to the map (which does occur) causes the map to grow, that address (on the stack) is now invalid. 

Reviewers: t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D55974

llvm-svn: 350880
2019-01-10 19:56:03 +00:00
Alina Sbirlea cae12edaaa Use MemorySSA in LICM to do sinking and hoisting.
Summary:
Step 2 in using MemorySSA in LICM:
Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag.
Promotion is disabled.

Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which
relied on promotion, in order to test all sinking tests.

Reviewers: sanjoy, davide, gberry, george.burgess.iv

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D40375

llvm-svn: 350879
2019-01-10 19:29:04 +00:00
James Y Knight 62df5eed16 [opaque pointer types] Remove some calls to generic Type subtype accessors.
That is, remove many of the calls to Type::getNumContainedTypes(),
Type::subtypes(), and Type::getContainedType(N).

I'm not intending to remove these accessors -- they are
useful/necessary in some cases. However, removing the pointee type
from pointers would potentially break some uses, and reducing the
number of calls makes it easier to audit.

llvm-svn: 350835
2019-01-10 16:07:20 +00:00
Eli Friedman d4e7a0d83c [SimplifyLibCalls] Fix memchr expansion for constant strings.
The C standard says "The memchr function locates the first
occurrence of c (converted to an unsigned char)[...]".  The expansion
was missing the conversion to unsigned char.
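
A sketch of the corrected comparison: the needle is masked to unsigned char range before being compared against each constant string byte (simplified; %byte and names are illustrative):
```
%c.uc = and i32 %c, 255            ; "converted to an unsigned char"
%cmp  = icmp eq i32 %byte, %c.uc
```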

Fixes https://bugs.llvm.org/show_bug.cgi?id=39041 .

Differential Revision: https://reviews.llvm.org/D55947

llvm-svn: 350775
2019-01-09 23:39:26 +00:00
Easwaran Raman b45994b843 Refactor synthetic profile count computation. NFC.
Summary:
Instead of using two separate callbacks to return the entry count and the
relative block frequency, use a single callback to return callsite
count. This would allow better supporting hybrid mode in the future as
the count of callsite need not always be derived from entry count (as in
sample PGO).

Reviewers: davidxl

Subscribers: mehdi_amini, steven_wu, dexonsmith, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D56464

llvm-svn: 350755
2019-01-09 20:10:27 +00:00
Florian Hahn 9697d2a764 Revert r350647: "[NewPM] Port tsan"
This patch breaks thread sanitizer on some macOS builders, e.g.
http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/52725/

llvm-svn: 350719
2019-01-09 13:32:16 +00:00
Max Kazantsev 4615a505f8 [IPT] Drop cache less eagerly in GVN and LoopSafetyInfo
The current strategy of dropping the `InstructionPrecedenceTracking` cache is to
invalidate the entire basic block whenever we change its contents. In fact,
`InstructionPrecedenceTracking` has 2 internal structures: `OrderedInstructions`,
which needs to be invalidated whenever the contents change, and the map
of first special instructions per block. This second map does not need an
update if we add/remove a non-special instruction, because that cannot
affect the contents of this map.

This patch changes API of `InstructionPrecedenceTracking` so that it now
accounts for reasons under which we invalidate blocks. This should lead
to much less recalculations of the map and should save us some compile time
because in practice we don't typically add/remove special instructions.

Differential Revision: https://reviews.llvm.org/D54462
Reviewed By: efriedma

llvm-svn: 350694
2019-01-09 07:28:13 +00:00
Sanjay Patel d023dd60e9 [InstCombine] canonicalize another raw IR rotate pattern to funnel shift
This is matching the equivalent of the DAG expansion, 
so it should never end up with worse perf than the 
original code even if the target doesn't have a rotate
instruction.
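
The kind of raw IR pattern being canonicalized; a sketch on i32 (the real matcher also handles the rotate-by-zero masking):
```
%shl = shl i32 %x, %amt
%sub = sub i32 32, %amt
%shr = lshr i32 %x, %sub
%or  = or i32 %shl, %shr
; -> %or = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 %amt)
```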

llvm-svn: 350672
2019-01-08 22:39:55 +00:00
Philip Pfaffe 82f995db75 [NewPM] Port tsan
A straightforward port of tsan to the new PM, following the same path
as D55647.

Differential Revision: https://reviews.llvm.org/D56433

llvm-svn: 350647
2019-01-08 19:21:57 +00:00
Anna Thomas 2dfa412efe [UnrollRuntime] Fix domTree failures in multiexit unrolling
Summary:
This fixes the IDom for exit blocks and all blocks reachable from the exit blocks when runtime unrolling in the multi-exit/exiting case.
We initially had a restrictive check that the IDom is only updated when
it is the header of the loop.
However, we also need to update the IDom to the correct one when the
IDom is any block within the original loop. See added test cases (which
fail dom tree verification without the patch).

Reviewers: reames, mzolotukhin, mkazantsev, hfinkel

Reviewed by: brzycki, kuhar

Subscribers: zzheng, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D56284

llvm-svn: 350640
2019-01-08 17:16:25 +00:00
Chandler Carruth 57578aaf96 [CallSite removal] Port `IndirectCallSiteVisitor` to use `CallBase` and
update client code.

Also rename it to use the more generic term `call` instead of something
that could be confused with a particular type.

Differential Revision: https://reviews.llvm.org/D56183

llvm-svn: 350508
2019-01-07 07:15:51 +00:00
Chandler Carruth 363ac68374 [CallSite removal] Migrate all Alias Analysis APIs to use the newly
minted `CallBase` class instead of the `CallSite` wrapper.

This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.

Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.

Thanks for the review from Saleem!

Differential Revision: https://reviews.llvm.org/D55641

llvm-svn: 350503
2019-01-07 05:42:51 +00:00
Nikita Popov 65038515ee [InstCombine] Relax cttz/ctlz with select on zero
The cttz/ctlz intrinsics have a parameter specifying whether the
result is undefined for zero. cttz(x, false) can be relaxed to
cttz(x, true) if x is known non-zero, and in fact such an optimization
is already performed. However, this currently doesn't work if x is
non-zero as a result of a select rather than an explicit branch.
This patch adds handling for this case, thus allowing
x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y.
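
In IR form, a sketch:
```
%nz = icmp ne i32 %x, 0
%tz = call i32 @llvm.cttz.i32(i32 %x, i1 false)
%r  = select i1 %nz, i32 %tz, i32 %y
; -> the i1 false (is_zero_undef) flag can be relaxed to true, since the
;    cttz result is only used when %x is known non-zero
```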

Differential Revision: https://reviews.llvm.org/D55786

llvm-svn: 350463
2019-01-05 09:48:16 +00:00
Easwaran Raman 366a873f14 [Inliner] Optimize shouldBeDeferred
This has some minor optimizations to shouldBeDeferred. This is not
strictly NFC because the early exit inside the loop assumes
TotalSecondaryCost is monotonically non-decreasing, which is not true if
the threshold used by CostAnalyzer is negative. AFAICT the thresholds do
not go below 0 for the default values of the various options we use.

llvm-svn: 350456
2019-01-05 02:26:29 +00:00
Evgeniy Stepanov 0184c53cbd Revert "Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)""
This reapplies commit r348983.

llvm-svn: 350448
2019-01-05 00:44:58 +00:00
Nikita Popov 6658fce4fc [BDCE] Remove dead uses of arguments
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.

I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.

Differential Revision: https://reviews.llvm.org/D56247

llvm-svn: 350435
2019-01-04 21:21:43 +00:00
Peter Collingbourne 87f477b5e4 hwasan: Implement lazy thread initialization for the interceptor ABI.
The problem is similar to D55986 but for threads: a process with the
interceptor hwasan library loaded might have some threads started by
instrumented libraries and some by uninstrumented libraries, and we
need to be able to run instrumented code on the latter.

The solution is to perform per-thread initialization lazily. If a
function needs to access shadow memory or add itself to the per-thread
ring buffer its prologue checks to see whether the value in the
sanitizer TLS slot is null, and if so it calls __hwasan_thread_enter
and reloads from the TLS slot. The runtime does the same thing if it
needs to access this data structure.

This change means that the code generator needs to know whether we
are targeting the interceptor runtime, since we don't want to pay
the cost of lazy initialization when targeting a platform with native
hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform}
has been introduced for selecting the runtime ABI to target. The
default ABI is set to interceptor since it's assumed that it will
be more common that users will be compiling application code than
platform code.

Because we can no longer assume that the TLS slot is initialized,
the pthread_create interceptor is no longer necessary, so it has
been removed.

Ideally, lazy initialization should only cost one instruction in the
hot path, but at present the call may cause us to spill arguments
to the stack, which means more instructions in the hot path (or
theoretically in the cold path if the spills are moved with shrink
wrapping). With an appropriately chosen calling convention for
the per-thread initialization function (TODO) the hot path should
always need just one instruction and the cold path should need two
instructions with no spilling required.

Differential Revision: https://reviews.llvm.org/D56038

llvm-svn: 350429
2019-01-04 19:27:04 +00:00
Teresa Johnson 853b962416 [ThinLTO] Handle chains of aliases
At -O0, globalopt is not run during the compile step, and we can have a
chain of an alias having an immediate aliasee of another alias. The
summaries are constructed assuming aliases in a canonical form
(flattened chains), and as a result only the base object but no
intermediate aliases were preserved.

Fix by adding a pass that canonicalize aliases, which ensures each
alias is a direct alias of the base object.
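
A sketch of the canonicalization, in the typed-pointer syntax of the time:
```
; before: @a2's immediate aliasee is another alias
@a1 = alias void (), void ()* @base
@a2 = alias void (), void ()* @a1
; after: every alias refers directly to the base object
@a2 = alias void (), void ()* @base
```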

Reviewers: pcc, davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54507

llvm-svn: 350423
2019-01-04 19:04:54 +00:00
Vedant Kumar a1778df474 [CodeExtractor] Do not extract unsafe lifetime markers
Lifetime markers which reference inputs to the extraction region are not
safe to extract. Example ('rhs' will be extracted):

```
               entry:
              +------------+
              | x = alloca |
              | y = alloca |
              +------------+
             /              \
   lhs:                      rhs:
  +-------------------+     +-------------------+
  | lifetime_start(x) |     | lifetime_start(x) |
  | use(x)            |     | lifetime_start(y) |
  | lifetime_end(x)   |     | use(x, y)         |
  | lifetime_start(y) |     | lifetime_end(y)   |
  | use(y)            |     | lifetime_end(x)   |
  | lifetime_end(y)   |     +-------------------+
  +-------------------+
```

Prior to extraction, the stack coloring pass sees that the slots for 'x'
and 'y' are in-use at the same time. After extraction, the coloring pass
infers that 'x' and 'y' are *not* in-use concurrently, because markers
from 'rhs' are no longer available to help decide otherwise.

This leads to a miscompile, because the stack slots actually are in-use
concurrently in the extracted function.

Fix this by moving lifetime start/end markers for memory regions defined
in the calling function around the call to the extracted function.

Fixes llvm.org/PR39671 (rdar://45939472).

Differential Revision: https://reviews.llvm.org/D55967

llvm-svn: 350420
2019-01-04 17:43:22 +00:00
Sanjay Patel 722466e1f1 [InstCombine] reduce raw IR narrowing rotate patterns to funnel shift
Similar to rL350199 - there are no known analysis/codegen holes for
funnel shift intrinsics now, so we can canonicalize the 6+ regular
instructions to funnel shift to improve vectorization, inlining,
unrolling, etc.

llvm-svn: 350419
2019-01-04 17:38:12 +00:00
John Brawn 39ac159c24 [LICM] Adjust how moving the re-hoist point works
In some cases the order that we hoist instructions in means that when rehoisting
(which uses the same order as hoisting) we can rehoist to a block A, then a
block B, then block A again. This currently causes an assertion failure as it
expects that when changing the hoist point it only ever moves to a block that
dominates the hoist point being moved from.

Fix this by moving the re-hoist point when it doesn't dominate the dominator of
hoisted instruction, or in other words when it wouldn't dominate the uses of
the instruction being rehoisted.

Differential Revision: https://reviews.llvm.org/D55266

llvm-svn: 350408
2019-01-04 17:12:09 +00:00
Xin Tong 47beee2f3f [memcpyopt] Remove a few unnecessary isVolatile() checks. NFC
We already checked for isSimple() on the store.

llvm-svn: 350378
2019-01-04 02:13:22 +00:00
Anna Thomas a470aa6701 [UnrollRuntime] Move the DomTree verification under expensive checks
Suggested by Hal as done in r349871.

llvm-svn: 350349
2019-01-03 19:43:33 +00:00
Anna Thomas 0785e7307e [UnrollRuntime] Add DomTree verification under debug mode
NFC: This adds the dom tree verification under debug mode at a point
just before we start unrolling the loop. This allows us to verify the dom
tree in a state where it is much smaller, and before the unrolling
actually happens.
This also implies we do not need to run -verify-dom-info every time to
see if the DT is in a valid state when we transform the loop for runtime
unrolling.

llvm-svn: 350334
2019-01-03 17:44:44 +00:00
Philip Pfaffe b39a97c8f6 [NewPM] Port Msan
Summary:
Keeping msan a function pass requires replacing the module-level initialization:
that means we don't define a ctor function which calls __msan_init; instead we just
declare the init function at the first access, and add it to the global ctors
list.
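
A rough sketch of the shape of the change (not necessarily the literal pass output; the ctor name is illustrative):

```llvm
; before: a module ctor defined in the instrumented module
define internal void @msan.module_ctor() {
  call void @__msan_init()
  ret void
}

; after: only declare the runtime init function and register it directly
declare void @__msan_init()
@llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }]
    [{ i32, void ()*, i8* } { i32 0, void ()* @__msan_init, i8* null }]
```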

Changes:
- Pull the actual sanitizer and the wrapper pass apart.
- Add a newpm msan pass. The function pass inserts calls to runtime
  library functions, for which it inserts declarations as necessary.
- Update tests.

Caveats:
- There is one test that I dropped, because it specifically tested the
  definition of the ctor.

Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka

Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji

Differential Revision: https://reviews.llvm.org/D55647

llvm-svn: 350305
2019-01-03 13:42:44 +00:00
Pete Cooper 697281df42 Teach ObjCARC optimizer about equivalent PHIs when eliminating autoreleaseRV/retainRV pairs
OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it
sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have
equivalent arguments (either identical or equivalent PHIs). It then assumes that
ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead.

The trouble is that ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs,
so it optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue.

This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence.
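
A hedged sketch of the kind of IR involved (block and value names hypothetical):

```llvm
entry:
  %ar = call i8* @llvm.objc.autoreleaseReturnValue(i8* %obj)
  br i1 %c, label %bb1, label %bb2
bb1:
  br label %merge
bb2:
  br label %merge
merge:
  ; %phi is equivalent to %ar: every incoming value is %ar
  %phi = phi i8* [ %ar, %bb1 ], [ %ar, %bb2 ]
  %rv = call i8* @llvm.objc.retainAutoreleasedReturnValue(i8* %phi)
```

With PHI equivalence understood on the retainRV side as well, the pair can be eliminated.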

rdar://problem/47005143

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D56235

llvm-svn: 350284
2019-01-03 01:38:08 +00:00
Xin Tong 33e3b4b9b3 [ThinLTO] Scan all variants of vague symbol for reachability.
Summary:
An alias can make one variant (but not all of them) live; we still need to scan all the others
if this symbol is reachable from somewhere else.

Reviewers: tejohnson, grimar

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D56117

llvm-svn: 350269
2019-01-02 23:18:20 +00:00
Pete Cooper 8d58048024 Fix assert in ObjCARC optimizer when deleting retainBlock of null or undef.
The caller to EraseInstruction had this conditional:

    // ARC calls with null are no-ops. Delete them.
    if (IsNullOrUndef(Arg))

but the assert inside EraseInstruction only allowed ConstantPointerNull and not
undef or bitcasts.

This adds support for both of these cases.
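
A minimal sketch of the no-op calls involved (assumed shapes):

```llvm
; ARC calls with null or undef arguments are no-ops and get deleted:
%a = call i8* @llvm.objc.retainBlock(i8* null)
%b = call i8* @llvm.objc.retainBlock(i8* undef)
```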

rdar://problem/47003805

llvm-svn: 350261
2019-01-02 21:00:02 +00:00
Nikita Popov cc6ef7f153 [BDCE] Remove instructions without demanded bits
If an instruction has no demanded bits, remove it directly during BDCE,
instead of leaving it for something else to clean up.

Differential Revision: https://reviews.llvm.org/D56185

llvm-svn: 350257
2019-01-02 20:02:14 +00:00
Pawel Bylica 119aa8fa5f Format AggressiveInstCombine.cpp. NFC
llvm-svn: 350255
2019-01-02 19:51:46 +00:00
Sanjay Patel 654e6aabb9 [InstCombine] canonicalize raw IR rotate patterns to funnel shift
The final piece of IR-level analysis to allow this was committed with:
rL350188

Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.

The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal or better to the original code even if the
target does not have an actual rotate instruction.
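
As a hedged sketch (value names hypothetical), the raw rotate pattern and its canonical form:

```llvm
; rotate-left built from 6 shift/logic instructions:
%neg  = sub i32 0, %amt
%lamt = and i32 %amt, 31
%ramt = and i32 %neg, 31
%shl  = shl i32 %x, %lamt
%shr  = lshr i32 %x, %ramt
%rot  = or i32 %shl, %shr

; canonical form (the funnel shift masks the amount itself):
%rot2 = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 %amt)
```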

llvm-svn: 350199
2019-01-01 21:51:39 +00:00
Nikita Popov bc9986e9ad Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
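
A hedged sketch of such a dead use (hypothetical IR; assumes the shl-by-constant rule that a use is dead when none of the demanded result bits trace back to the operand):

```llvm
define i32 @f(i32 %a, i32 %b) {
  %x   = mul i32 %a, %b      ; live: also used by %ret below
  %shl = shl i32 %x, 16      ; %x only feeds bits 16..31 here
  %or  = or i32 %shl, %a
  %low = and i32 %or, 65535  ; ...but only bits 0..15 of %or are demanded
  %ret = add i32 %x, %low
  ret i32 %ret
}
; the use of %x in %shl has no demanded bits, so that single use can be
; replaced by zero even though %x itself stays live
```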

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
whose user is also dead; that case is checked separately instead.

The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now, and the added test case demonstrates such an
instance.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 350188
2019-01-01 10:05:26 +00:00
Chen Zheng 4952e668f8 [InstCombine] canonicalize MUL with NEG operand
-X * Y --> -(X * Y)
X * -Y --> -(X * Y)
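
In IR terms (sketch):

```llvm
%neg = sub i32 0, %x
%mul = mul i32 %neg, %y
; becomes:
%tmp = mul i32 %x, %y
%res = sub i32 0, %tmp
```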

Differential Revision: https://reviews.llvm.org/D55961

llvm-svn: 350185
2019-01-01 01:09:20 +00:00
Alexander Potapenko cea4f83371 [MSan] Handle llvm.is.constant intrinsic
MSan used to report false positives in the case the argument of
llvm.is.constant intrinsic was uninitialized.
In fact checking this argument is unnecessary, as the intrinsic is only
used at compile time, and its value doesn't depend on the value of the
argument.

llvm-svn: 350173
2018-12-31 09:42:23 +00:00
Max Kazantsev 201534d753 Drop SE cache early because loop parent can change in LoopSimplifyCFG
llvm-svn: 350145
2018-12-29 04:26:22 +00:00
Anna Thomas 98743fa77a [UnrollRuntime] NFC: Add comment and verify LCSSA
Added -verify-loop-lcssa to test cases.
Updated comments in ConnectProlog.

llvm-svn: 350131
2018-12-28 18:52:16 +00:00
Max Kazantsev 530ff8f3cc Temporarily disable term folding in LoopSimplifyCFG, add tests
llvm-svn: 350117
2018-12-28 06:22:39 +00:00
Max Kazantsev 80e4b40f3e [LoopSimplifyCFG] Delete dead blocks in RPO
Deletion of dead blocks in arbitrary order may lead to failure
of assertion in `DeleteDeadBlock` that requires that we have
deleted all predecessors before we can delete the current block.
We should instead delete them in RPO order.

llvm-svn: 350116
2018-12-28 06:08:51 +00:00
Craig Topper c9a6000755 [LoopIdiomRecognize] Add CTTZ support
Summary:
Existing LIR recognizes CTLZ in loops that shift the input variable right until it is zero (the Shift-Until-Zero idiom).

This commit:
1. Augments the Shift-Until-Zero idiom to also recognize CTTZ, where the input variable is shifted left (a sketch follows this list).
2. Prepares for BitScan idiom recognition.
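
A rough sketch of the left-shift flavor (hypothetical IR; the final count relates to cttz of the input):

```llvm
define i32 @count(i32 %x0) {
entry:
  br label %loop
loop:
  %x    = phi i32 [ %x0, %entry ], [ %shl, %loop ]
  %cnt  = phi i32 [ 0, %entry ], [ %inc, %loop ]
  %shl  = shl i32 %x, 1
  %inc  = add i32 %cnt, 1
  %done = icmp eq i32 %shl, 0
  br i1 %done, label %exit, label %loop
exit:
  ret i32 %inc      ; == 32 - cttz(%x0) for nonzero %x0
}
```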

Patch by Yuanfang Chen (tabloid.adroit)

Reviewers: craig.topper, evstupac

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55876

llvm-svn: 350074
2018-12-26 21:59:48 +00:00
Max Kazantsev 28298e9647 [NFC] Use utility function for guards detection
llvm-svn: 350064
2018-12-26 08:22:25 +00:00
Max Kazantsev 9b25bf3960 [NFC] Reuse variables instead of re-calling getParent
llvm-svn: 350062
2018-12-25 07:20:06 +00:00
Eugene Leviant 4dc3a3f746 [HWASAN] Instrument memory intrinsics by default
Differential revision: https://reviews.llvm.org/D55926

llvm-svn: 350055
2018-12-24 16:02:48 +00:00
Max Kazantsev edabb9ae56 [LoopSimplifyCFG] Delete dead exiting edges
This patch teaches LoopSimplifyCFG to remove dead exiting edges
from loops.

Differential Revision: https://reviews.llvm.org/D54025
Reviewed By: fedor.sergeev

llvm-svn: 350049
2018-12-24 07:41:33 +00:00
Max Kazantsev 347c583772 Return "[LoopSimplifyCFG] Delete dead in-loop blocks"
The underlying bug that caused the revert should be fixed by rL348567.

Differential Revision: https://reviews.llvm.org/D54023

llvm-svn: 350045
2018-12-24 06:06:17 +00:00
George Burgess IV 7e12875c89 [LoopIdioms] More LocationSize::precise annotations; NFC
Both of these places reference memset-like loops. Memset is precise.

Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350044
2018-12-24 05:55:50 +00:00
George Burgess IV 5e4a03a089 [MemCpyOpt] Use LocationSize instead of ints; NFC
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.

srcSize is derived from the size of an alloca, and we quit out if the
size of that is > the size of the thing we're copying to. Hence, we
should always copy everything over, so these sizes are precise.

Don't make srcSize itself a LocationSize, since optionality isn't
helpful, and we do some comparisons against other sizes elsewhere in
that function.

llvm-svn: 350019
2018-12-23 06:40:39 +00:00
Mircea Trofin b53eeb6f4c [llvm] API for encoding/decoding DWARF discriminators.
Summary:
Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling)

The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues:
- communicates overflow back to the user
- supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported.

Reviewers: davidxl, danielcdh, wmi, dblaikie

Reviewed By: dblaikie

Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D55681

llvm-svn: 349973
2018-12-21 22:48:50 +00:00
Vedant Kumar b264d69de7 [IR] Add Instruction::isLifetimeStartOrEnd, NFC
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.

This was suggested as a cleanup in D55967.

Differential Revision: https://reviews.llvm.org/D56019

llvm-svn: 349964
2018-12-21 21:49:40 +00:00
Anna Thomas 18be3cb606 [RuntimeUnrolling] NFC: Add TODO and comments in connectProlog
Currently, runtime unrolling does not support loops where multiple
exiting blocks exit to the latchExit. Added TODO and other code
clarifications for ConnectProlog code.

llvm-svn: 349944
2018-12-21 19:45:05 +00:00
Simon Pilgrim 5d403f6bf8 [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm)
This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics.
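
A minimal before/after sketch for one of the upgraded intrinsics:

```llvm
%r = call <8 x i16> @llvm.x86.sse2.padds.w(<8 x i16> %a, <8 x i16> %b)
; upgraded to:
%r2 = call <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16> %a, <8 x i16> %b)
```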

Clang counterpart: https://reviews.llvm.org/D55890

Differential Revision: https://reviews.llvm.org/D55894

llvm-svn: 349892
2018-12-21 09:04:14 +00:00
Reid Kleckner b894ecf903 [memcpyopt] Add debug logs when forwarding memcpy src to dst
llvm-svn: 349873
2018-12-21 01:41:20 +00:00
Eli Friedman 3af2f53456 [LoopUnroll] Don't verify domtree by default with +Asserts.
This verification is linear in the size of the function, so it can cause
a quadratic compile-time explosion in a function with many loops to
unroll.

Differential Revision: https://reviews.llvm.org/D54732

llvm-svn: 349871
2018-12-21 01:28:49 +00:00
Tom Stellard 2f44fbe936 cmake: Remove add_llvm_loadable_module()
Summary:
This function is very similar to add_llvm_library(),  so this patch merges it
into add_llvm_library() and replaces all calls to add_llvm_loadable_module(lib ...)
with add_llvm_library(lib MODULE ...)

Reviewers: philip.pfaffe, beanz, chandlerc

Reviewed By: philip.pfaffe

Subscribers: chapuni, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D51748

llvm-svn: 349839
2018-12-20 22:04:08 +00:00
Michael Kruse 199427100b [InstCombine] Preserve access-group metadata.
Preserve llvm.access.group metadata when combining store instructions.
This was forgotten in r349725.

Fixes llvm.org/PR40117

llvm-svn: 349774
2018-12-20 17:11:02 +00:00
Piotr Sobczak deaacc17fe [InstCombine][AMDGPU] Handle more buffer intrinsics
Summary:
Include the following intrinsics in the InstCombine
simplification:

* amdgcn_raw_buffer_load
* amdgcn_raw_buffer_load_format
* amdgcn_struct_buffer_load
* amdgcn_struct_buffer_load_format

Change-Id: I14deceff74bcb21179baf6aa6e94bf39e7d63d5d

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55882

llvm-svn: 349735
2018-12-20 10:08:18 +00:00
Alexander Potapenko 0e3b85a730 [MSan] Don't emit __msan_instrument_asm_load() calls
LLVM treats void* pointers passed to assembly routines as pointers to
sized types.
We used to emit calls to __msan_instrument_asm_load() for every such
void*, which sometimes led to false positives.
A less error-prone (and truly "conservative") approach is to unpoison
only assembly output arguments.

llvm-svn: 349734
2018-12-20 10:05:00 +00:00
Eugene Leviant 2d98eb1b2e [HWASAN] Add support for memory intrinsics
Differential revision: https://reviews.llvm.org/D55117

llvm-svn: 349728
2018-12-20 09:04:33 +00:00
Michael Kruse 978ba61536 Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. A LoopID, unfortunately, is not a loop identifier: it is
neither unique (there's even a regression test assigning the same LoopID
to multiple loops; this can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it is just to mark that a loop should not be processed again, as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).

This intentionally avoids any kind of "ID". Loops that are cloned or have
their attributes modified retain the llvm.loop.parallel_accesses
attribute. Access instructions that are cloned point to the same access
group. It is not necessary for each access to have its own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.
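
A minimal sketch of the new encoding (metadata numbering arbitrary):

```llvm
; a parallel access points at a distinct, operand-less access group:
%v = load i32, i32* %p, !llvm.access.group !2

; the loop lists the access groups that are parallel with respect to it:
br i1 %cond, label %header, label %exit, !llvm.loop !0

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.parallel_accesses", !2}
!2 = distinct !{}
```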

The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

llvm-svn: 349725
2018-12-20 04:58:07 +00:00
Vitaly Buka 07a55f27dc [asan] Undo special treatment of linkonce_odr and weak_odr
Summary:
On non-Windows these are already removed by ShouldInstrumentGlobal.
On Windows we will wait until we get actual issues with that.

Reviewers: pcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55899

llvm-svn: 349707
2018-12-20 00:30:27 +00:00
Vitaly Buka d414e1bbb5 [asan] Prevent folding of globals with redzones
Summary:
ICF is prevented by removing unnamed_addr and local_unnamed_addr for all sanitized
globals.
Also, in general unnamed_addr is not valid here, as the address is now important for
the ODR violation detector and for redzone poisoning.

Before the patch, ICF on globals caused:
1. false ODR reports when we register a global at the same address more than once
2. a global-buffer-overflow if we fold variables of a smaller type inside of a larger
type; the smaller one will then poison a redzone which overlaps with the larger one.

Reviewers: eugenis, pcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55857

llvm-svn: 349706
2018-12-20 00:30:18 +00:00
Nikita Popov 3817ee7908 Revert "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This reverts commit r349674. It causes a failure in
test-suite enc-3des.execution_time.

llvm-svn: 349684
2018-12-19 22:09:02 +00:00
Nikita Popov 649e125451 [BDCE][DemandedBits] Detect dead uses of undead instructions
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
whose user is also dead; that case is checked separately instead.

The test case has a couple of cases that are not simplified yet. In
particular, we're only looking at uses of instructions right now. I think
it would make sense to also extend this to arguments. Furthermore
DemandedBits doesn't yet know some of the tricks that InstCombine does
for the demanded bits or bitwise or/and/xor in combination with known
bits information.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 349674
2018-12-19 19:56:21 +00:00
Anton Afanasyev ce28791e20 Test commit
Fix typos.

llvm-svn: 349644
2018-12-19 17:18:40 +00:00
Vitaly Buka 4e4920694c [asan] Restore ODR-violation detection on vtables
Summary:
unnamed_addr is still useful for detecting ODR violations on vtables.

unnamed_addr with lld and --icf=safe or --icf=all can still trigger false
reports, which can be avoided with --icf=none or by using private aliases
with -fsanitize-address-use-odr-indicator.

Reviewers: eugenis

Reviewed By: eugenis

Subscribers: kubamracek, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55799

llvm-svn: 349555
2018-12-18 22:23:30 +00:00
Kuba Mracek 3760fc9f3d [asan] In llvm.asan.globals, allow entries to be non-GlobalVariable and skip over them
Looks like there are valid reasons why we need to allow bitcasts in llvm.asan.globals, see discussion at https://github.com/apple/swift-llvm/pull/133. Let's look through bitcasts when iterating over entries in the llvm.asan.globals list.

Differential Revision: https://reviews.llvm.org/D55794

llvm-svn: 349544
2018-12-18 21:20:17 +00:00
Pete Cooper be4f571107 Change the objc ARC optimizer to use the new objc.* intrinsics
We're moving ARC optimisation and ARC emission in clang away from runtime methods
and towards intrinsics.  This is the part which actually uses the intrinsics in the ARC
optimizer when both analyzing the existing calls and emitting new ones.

Differential Revision: https://reviews.llvm.org/D55348

Reviewers: ahatanak
llvm-svn: 349534
2018-12-18 20:32:49 +00:00
Nikita Popov 20853a7807 [InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check
Checking whether a number has a certain number of trailing / leading
zeros means checking whether it is of the form XXXX1000 / 0001XXXX,
which can be done with an and+icmp.
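
For example (a sketch, equality predicate only):

```llvm
; cttz(x) == 3 means x has the form XXXX1000:
%ct  = call i32 @llvm.cttz.i32(i32 %x, i1 false)
%cmp = icmp eq i32 %ct, 3
; becomes:
%low  = and i32 %x, 15
%cmp2 = icmp eq i32 %low, 8
```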

Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next
step, this can be extended to non-equality predicates.

Differential Revision: https://reviews.llvm.org/D55745

llvm-svn: 349530
2018-12-18 19:59:50 +00:00
Florian Hahn 5c014037b3 [SCCP] Get rid of redundant call for getPredicateInfoFor (NFC).
We can use the result fetched a few lines above.

llvm-svn: 349527
2018-12-18 19:37:07 +00:00
Sanjay Patel e51d5bdb3c [InstCombine] refactor isCheapToScalarize(); NFC
As the FIXME indicates, this has the potential to go
overboard. So I'm not sure if it's even worth keeping 
this vs. iteratively doing simple matches, but we might 
as well clean it up.

llvm-svn: 349523
2018-12-18 19:07:38 +00:00
Michael Kruse d4eb13c880 [LoopVectorize] Rename pass options. NFC.
Rename:
NoUnrolling to InterleaveOnlyWhenForced
and
AlwaysVectorize to !VectorizeOnlyWhenForced

Contrary to what the name 'AlwaysVectorize' suggests, it does not
unconditionally vectorize all loops, but applies a cost model to
determine whether vectorization is profitable for each loop. Hence,
passing false will disable the cost model, except when a loop is marked
with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested
by @hfinkel in D55716) better matches this behavior.

Similarly, 'NoUnrolling' disables the profitability cost model for
interleaving (a term to distinguish it from unrolling by the
LoopUnrollPass); rename it for consistency.

Differential Revision: https://reviews.llvm.org/D55785

llvm-svn: 349513
2018-12-18 17:46:09 +00:00
Michael Kruse 3284775b70 [LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops.
When using clang with `-fno-unroll-loops` (implicitly added with `-O1`),
the LoopUnrollPass is not added to the (legacy) pass pipeline. This
also means that it will not process any loop metadata such as
llvm.loop.unroll.enable (which is generated by #pragma unroll), and
WarnMissedTransformationsPass then emits a warning that a forced
transformation has not been applied (see
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html).
Such explicit transformations should take precedence over disabling
heuristics.

This patch unconditionally adds LoopUnrollPass to the optimizing
pipeline (that is, it is still not added with `-O0`), but passes a flag
indicating whether automatic unrolling is dis-/enabled. This is the same
approach as LoopVectorize uses.

The new pass manager's pipeline builder has no option to disable
unrolling, hence the problem does not apply.

Differential Revision: https://reviews.llvm.org/D55716

llvm-svn: 349509
2018-12-18 17:16:05 +00:00
Dylan McKay f920da009e [IPO][AVR] Create new Functions in the default address space specified in the data layout
This modifies the IPO pass so that it respects any explicit function
address space specified in the data layout.

In targets with nonzero program address spaces, all functions should, by
default, be placed into the default program address space.

This is required for Harvard architectures like AVR. Without this, the
functions will be marked as residing in data space, and thus not be
callable.

This has no effect on any in-tree official backends, as none use an
explicit program address space in their data layouts.

Patch by Tim Neumann.

llvm-svn: 349469
2018-12-18 09:52:52 +00:00
Tim Northover 856628f707 SROA: preserve alignment tags on loads and stores.
When splitting up an alloca's uses we were dropping any explicit
alignment tags, which means they default to the ABI-required default
alignment and this can cause miscompiles if the real value was smaller.

Also refactor the TBAA metadata into a parent class since it's shared by
both children anyway.

llvm-svn: 349465
2018-12-18 09:29:39 +00:00
Peter Collingbourne d3a3e4b46d hwasan: Move ctor into a comdat.
Differential Revision: https://reviews.llvm.org/D55733

llvm-svn: 349413
2018-12-17 22:56:34 +00:00
Sanjay Patel 200885e654 [AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924)
Now, that we have funnel shift intrinsics, it should be safe to convert this form of rotate to it. 
In the worst case (a target that doesn't have rotate instructions), we will expand this into a 
branch-less sequence of ALU ops (neg/and/and/lshr/shl/or) in the backend, so it's still very 
likely to be a perf improvement over the original code.

The motivating source code pattern for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=34924
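
A hedged sketch of the guarded-rotate shape being matched (the branch exists only to avoid shifting by the full bit width; names hypothetical):

```llvm
entry:
  %cmp = icmp eq i32 %n, 0
  br i1 %cmp, label %end, label %rotbb
rotbb:
  %sub = sub i32 32, %n
  %shr = lshr i32 %x, %sub
  %shl = shl i32 %x, %n
  %or  = or i32 %shl, %shr
  br label %end
end:
  %r = phi i32 [ %x, %entry ], [ %or, %rotbb ]
; collapses to a single branch-free intrinsic:
  %r2 = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 %n)
```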

Background:
I looked at several different options before deciding where to try this - instcombine, simplifycfg, 
CGP - because it doesn't fit cleanly anywhere AFAIK.

The backend (CGP, SDAG, GlobalIsel?) is too late for what we're trying to accomplish. We want to 
have the IR converted before we reach things like vectorization because the reduced code can make a 
loop much simpler to transform.

Technically, this could be included in instcombine, but it's a large pattern match that includes 
control-flow, so it just felt wrong to stuff into there (although I have a draft of that patch). 
Similarly, this could be part of simplifycfg, but all of this pattern matching is a stretch.

So we're left with our relatively new dumping ground for homeless transforms: aggressive-instcombine. 
This only runs at -O3, but that seems like a reasonable limitation given that source code has many 
options to avoid this pattern (including the recently added clang intrinsics for rotates).

I'm including a PhaseOrdering test because we require the teamwork of 3 passes (aggressive-instcombine, 
instcombine, simplifycfg) to get this into the minimal IR form that we want. That test shows a bug
with the new pass manager that's independent of this change (but it will be masked if we canonicalize
harder to funnel shift intrinsics in instcombine).

Differential Revision: https://reviews.llvm.org/D55604

llvm-svn: 349396
2018-12-17 21:14:51 +00:00
Sanjay Patel 1a6e9ec434 [InstCombine] don't widen an arbitrary sequence of vector ops (PR40032)
The problem is shown specifically for a case with vector multiply here:
https://bugs.llvm.org/show_bug.cgi?id=40032
...and this might mask the original backend bug for ARM shown in:
https://bugs.llvm.org/show_bug.cgi?id=39967

As the test diffs here show, we were (and probably still aren't) doing 
these kinds of transforms in a principled way. We are producing more or 
equal wide instructions than we started with in some cases, so we still 
need to restrict/correct other transforms from overstepping.

If there are perf regressions from this change, we can either carve out 
exceptions to the general IR rules, or improve the backend to do these 
transforms when we know the transform is profitable. That's probably 
similar to a change like D55448.

Differential Revision: https://reviews.llvm.org/D55744

llvm-svn: 349389
2018-12-17 20:27:43 +00:00
Davide Italiano e41e1d015f [EarlyCSE] If DI can't be salvaged, mark it as unavailable.
Fixes PR39874.

llvm-svn: 349323
2018-12-17 01:42:39 +00:00
Kamil Rytarowski 21e270a479 Add NetBSD support in needsRuntimeRegistrationOfSectionRange.
Use linker script magic to get data/cnts/name start/end.

llvm-svn: 349277
2018-12-15 16:51:35 +00:00
Kamil Rytarowski 15ae738bc8 Register kASan shadow offset for NetBSD/amd64
The NetBSD x86_64 kernel uses the 0xdfff900000000000 shadow
offset.

llvm-svn: 349276
2018-12-15 16:32:41 +00:00
Florian Hahn c214bc2b8d [NewGVN] Update use counts for SSA copies when replacing them by their operands.
The current code relies on LeaderUseCount to determine if we can remove
an SSA copy, but the LeaderUseCount does not actually refer to the SSA
copy. If an SSA copy is a dominating leader, we use its operand as the dominating
leader instead. This means we removed a user of the SSA copy and should
decrement its use count, so we can remove the SSA copy once it becomes dead.

Fixes PR38804.

Reviewers: efriedma, davide

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D51595

llvm-svn: 349217
2018-12-15 00:32:38 +00:00
Vedant Kumar 9d1827331f [Util] Refer to [s|z]exts of args when converting dbg.declares (fix PR35400)
When converting dbg.declares, if the described value is a [s|z]ext,
refer to the ext directly instead of referring to its operand.

This fixes a narrowing bug (the debugger got the sign of a variable
wrong, see llvm.org/PR35400).

The main reason to refer to the ext's operand was that an optimization
may remove the ext itself, leading to a dropped variable. Now that
InstCombine has been taught to use replaceAllDbgUsesWith (r336451), this
is less of a concern. Other passes can/should adopt this API as needed
to fix dropped variable bugs.

Differential Revision: https://reviews.llvm.org/D51813

llvm-svn: 349214
2018-12-15 00:03:33 +00:00
Michael Kruse ea9ef34558 [TransformWarning] Do not warn missed transformations in optnone functions.
Optimization transformations are intentionally disabled by the 'optnone'
function attribute. Therefore do not warn if transformation metadata is
still present.

Using the legacy pass manager structure, the `skipFunction` method takes
care of the optnone attribute (it was already called before this patch). For
the new pass manager, there is no equivalent, so we check for the
'optnone' attribute manually.

Differential Revision: https://reviews.llvm.org/D55690

llvm-svn: 349184
2018-12-14 19:45:43 +00:00
Michael Kruse 5948b7f30f [Transforms] Preserve metadata when converting invoke to call.
The `changeToCall` function did not preserve the invoke's metadata.
Currently, there is probably no metadata that depends on being applied
on a CallInst or InvokeInst. Therefore we can replace the instruction's
metadata.

This fixes http://llvm.org/PR39994

Suggested-by: Moritz Kreutzer <moritz.kreutzer@siemens.com>

Differential Revision: https://reviews.llvm.org/D55666

llvm-svn: 349170
2018-12-14 18:15:11 +00:00
Evgeniy Stepanov eb238ecf0f Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)"
Breaks sanitizer-android buildbot.

This reverts commit af8443a984c3b491c9ca2996b8d126ea31e5ecbe.

llvm-svn: 349092
2018-12-13 23:47:50 +00:00
Wei Mi 66c6c5abea [SampleFDO] handle ProfileSampleAccurate when initializing function entry count
ProfileSampleAccurate is used to indicate that the profile is an exact match for
the code to be optimized.

Previously ProfileSampleAccurate is handled in ProfileSummaryInfo::isColdCallSite
and ProfileSummaryInfo::isColdBlock. A better solution is to initialize function
entry count to 0 when ProfileSampleAccurate is true, so we don't have to handle
ProfileSampleAccurate in multiple places.

Differential Revision: https://reviews.llvm.org/D55660

llvm-svn: 349088
2018-12-13 21:51:42 +00:00
Nikita Popov dc73a6edde Reapply "[MemCpyOpt] memset->memcpy forwarding with undef tail"
Currently memcpyopt optimizes cases like

    memset(a, byte, N);
    memcpy(b, a, M);

to

    memset(a, byte, N);
    memset(b, byte, M);

if M <= N. Often this allows further simplifications down the line,
which drop the first memset entirely.

This patch extends this optimization for the case where M > N, but we
know that the bytes a[N..M] are undef due to alloca/lifetime.start.

This situation arises relatively often for Rust code, because Rust does
not initialize trailing structure padding and loves to insert redundant
memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.

The previous version of this patch did not perform dependency checking
properly: While the dependency is checked at the position of the memset,
the used size must be that of the memcpy. Previously the size of the
memset was used, which missed modification in the region
MemSetSize..CopySize, resulting in miscompiles. The added tests cover
variations of this issue.

Differential Revision: https://reviews.llvm.org/D55120

llvm-svn: 349078
2018-12-13 20:04:27 +00:00
Easwaran Raman 5a7056fa03 [ThinLTO] Compute synthetic function entry count
Summary:
This patch computes the synthetic function entry count on the whole
program callgraph (based on module summary) and writes the entry counts
to the summary. After function importing, this count gets attached to
the IR as metadata. Since it adds a new field to the summary, this bumps
up the version.

Reviewers: tejohnson

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D43521

llvm-svn: 349076
2018-12-13 19:54:27 +00:00
Davide Italiano 9737096bb1 [LoopUtils] Use i32 instead of `void`.
The actual type of the first argument of the @dbg intrinsic
doesn't really matter as we're setting it to `undef`, but the
bitcode reader is picky about `void` types.

llvm-svn: 349069
2018-12-13 18:37:23 +00:00
Vitaly Buka a257639a69 [asan] Don't check ODR violations for particular types of globals
Summary:
private and internal: should not trigger ODR at all.
unnamed_addr: the current ODR checking approach fails and reports false violations if
a linker merges such globals.
linkonce_odr, weak_odr: could cause similar problems and they are already not
instrumented for ELF.

Reviewers: eugenis, kcc

Subscribers: kubamracek, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55621

llvm-svn: 349015
2018-12-13 09:47:39 +00:00
David L. Jones 54c01ad6a9 Revert r348645 - "[MemCpyOpt] memset->memcpy forwarding with undef tail"
This revision caused truncated memsets for structs with padding. See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610520.html

llvm-svn: 349002
2018-12-13 03:15:11 +00:00
Davide Italiano 8ee59ca653 [LoopUtils] Prefer a set over a map. NFCI.
llvm-svn: 348999
2018-12-13 01:11:52 +00:00
Davide Italiano 744c3c327f [LoopDeletion] Update debug values after loop deletion.
When loops are deleted, we don't keep track of variables modified inside
the loops, so the DI will contain the wrong value for these.

e.g.

int b() {
  int i;
  for (i = 0; i < 2; i++)
    ;
  patatino();
  return a;
}
int main() { b(); }

Stopped in the debugger at the call to patatino() after the loop has
been deleted, lldb shows a stale value for the loop variable:

-> 6    patatino();
   7    return a;
   8  }
   9  int main() { b(); }
(lldb) frame var i
(int) i = 0

We instead mark these values as unavailable by inserting a
@llvm.dbg.value with an undef operand, to make sure we don't end up
printing an incorrect value in the debugger. We could consider doing
something fancier, e.g. for constants, in the future.

PR39868.
rdar://problem/46418795

Differential Revision: https://reviews.llvm.org/D55299

llvm-svn: 348988
2018-12-12 23:32:35 +00:00
Nikita Popov 36e03ac6ee [InstCombine] Fix negative GEP offset evaluation for 32-bit pointers
This fixes https://bugs.llvm.org/show_bug.cgi?id=39908.

The evaluateGEPOffsetExpression() function simplifies GEP offsets for
use in comparisons against zero, basically by converting X*Scale+Offset==0
to X+Offset/Scale==0 if Scale divides Offset. However, before this is done,
Offset is masked down to the pointer size. This results in incorrect
results for negative Offsets, because we basically end up dividing the
32-bit offset *zero*-extended to 64 bits (rather than sign-extended).

Fix this by explicitly sign extending the truncated value.

Differential Revision: https://reviews.llvm.org/D55449

llvm-svn: 348987
2018-12-12 23:19:03 +00:00
Ryan Prichard e028c818f5 [hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)
Summary:
The change is needed to support ELF TLS in Android. See D55581 for the
same change in compiler-rt.

Reviewers: srhines, eugenis

Reviewed By: eugenis

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D55592

llvm-svn: 348983
2018-12-12 22:45:06 +00:00
Michael Kruse 7244852557 [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
When multiple loop transformations are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance,

    #pragma clang loop unroll_and_jam(enable)
    #pragma clang loop distribute(enable)

is the same as

    #pragma clang loop distribute(enable)
    #pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after the UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also, the order of passes in the pipeline is subject to change between versions of LLVM, optimization options, and which pass manager is used.

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll_and_jam.enable"}
    !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
    !3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, by using Polly to perform these transformations, or by adding a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible for the responsible pass to emit such a warning, because the transformation might be 'hidden' in a followup attribute when it is executed, or the pass might not be present in the pipeline at all. For this reason, this patch introduces a WarnMissedTransformations pass to warn about orphaned transformations.

Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.

To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

Reviewed By: hfinkel, dmgreen

Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288

llvm-svn: 348944
2018-12-12 17:32:52 +00:00
Mikael Holmen c06b01cb22 Fix compiler warning about unused variable [NFC]
llvm-svn: 348913
2018-12-12 06:33:45 +00:00
Gor Nishanov 20d833d5e3 [coroutines] Improve suspend point simplification
Summary:
Enable suspend point simplification for cases where:
* coro.save and coro.suspend are in different basic blocks
* where there are intervening intrinsics

Reviewers: modocache, tks2103, lewissbaker

Reviewed By: modocache

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D55160

llvm-svn: 348897
2018-12-11 21:23:09 +00:00
Fedor Sergeev a1d95c3fc4 [NewPM] fixing asserts on deleted loop in -print-after-all
IR-printing AfterPass instrumentation might be called on a loop
that has just been invalidated. We should skip printing it to
avoid spurious asserts.

Reviewed By: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D54740

llvm-svn: 348887
2018-12-11 19:05:35 +00:00
Vedant Kumar b3a7cae045 [HotColdSplitting] Disable outlining landingpad instructions (PR39917)
It's currently not safe to outline landingpad instructions (see
llvm.org/PR39917). Like @llvm.eh.typeid.for, the order and content of
previous landingpad instructions in a function alters the lowering of
subsequent landingpads by renumbering type info ID's. Outlining a
landingpad therefore breaks exception handling & unwinding.

llvm-svn: 348870
2018-12-11 18:05:31 +00:00
Sanjay Patel 2aa2dc76c2 [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927)
call iM movmsk(sext <N x i1> X) --> zext (bitcast <N x i1> X to iN) to iM

This has the potential to create less-than-8-bit scalar types as shown in 
some of the test diffs, but it looks like the backend knows how to deal 
with that in these patterns. This is the simple part of the fix suggested in:
https://bugs.llvm.org/show_bug.cgi?id=39927
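
A sketch for one movmsk variant (assumed vector width; names hypothetical):

```llvm
%sext = sext <16 x i1> %cmp to <16 x i8>
%msk  = call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %sext)
; becomes:
%bc   = bitcast <16 x i1> %cmp to i16
%msk2 = zext i16 %bc to i32
```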

Differential Revision: https://reviews.llvm.org/D55529

llvm-svn: 348862
2018-12-11 16:38:03 +00:00
David Stenberg 2474ce5862 [DeadArgElim] Fixes for dbg.values using dead arg/return values
Summary:
When eliminating a dead argument or return value in a function with
local linkage, all uses, including in dbg.value intrinsics, would be
replaced with null constants. This would mean that, for example for an
integer argument, the debug info would incorrectly express that the
value is 0. Instead, replace all uses with undef to indicate that the
argument/return value is optimized out.

Also, make sure that metadata uses of return values are rewritten even
if there are no non-metadata uses of the value.

As a bit of historical curiosity, the code that emitted null constants
was introduced in the initial check-in of the pass in 2003, before
'undef' values even existed in LLVM.

This fixes PR23260.

Reviewers: dblaikie, aprantl, vsk, djtodoro

Reviewed By: aprantl

Subscribers: llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D55513

llvm-svn: 348837
2018-12-11 10:33:38 +00:00
Davide Italiano 8ec7709f58 [Local] Promote an utility that could be used elsewhere. NFCI.
llvm-svn: 348804
2018-12-10 22:17:04 +00:00
Matt Arsenault 9ccde61f81 InstCombine: Scalarize single use icmp/fcmp
llvm-svn: 348801
2018-12-10 21:50:54 +00:00
Nikita Popov 94b8e2ea4e [MemCpyOpt] memset->memcpy forwarding with undef tail
Currently memcpyopt optimizes cases like

    memset(a, byte, N);
    memcpy(b, a, M);

to

    memset(a, byte, N);
    memset(b, byte, M);

if M <= N. Often this allows further simplifications down the line,
which drop the first memset entirely.

This patch extends this optimization for the case where M > N, but we
know that the bytes a[N..M] are undef due to alloca/lifetime.start.

This situation arises relatively often for Rust code, because Rust does
not initialize trailing structure padding and loves to insert redundant
memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.

For the implementation, I'm reusing a bit of code for a similar existing
optimization (direct memcpy of undef). I've also added memset support to
MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be
used, but it seems to make more sense to add this to GetLocation and thus
make the computation cachable.

Differential Revision: https://reviews.llvm.org/D55120

llvm-svn: 348645
2018-12-07 21:16:58 +00:00
Vedant Kumar 03f9f15b16 [HotColdSplitting] Refine definition of unlikelyExecuted
The splitting pass uses its 'unlikelyExecuted' predicate to statically
decide which blocks are cold.

- Do not treat noreturn calls as if they are cold unless they are actually
  marked cold. This is motivated by functions like exit() and longjmp(), which
  are not beneficial to outline.

- Do not treat inline asm as an outlining barrier. In practice asm("") is
  frequently used to inhibit basic block merging; enabling outlining in this case
  results in substantial memory savings.

- Treat invokes of cold functions as cold.

As a drive-by, remove the 'exceptionHandlingFunctions' predicate, because it's
no longer needed. The pass can identify & outline blocks dominated by EH pads,
so there's no need to special-case __cxa_begin_catch etc.

Differential Revision: https://reviews.llvm.org/D54244

llvm-svn: 348640
2018-12-07 20:24:04 +00:00
Vedant Kumar 03aaa3e2aa [HotColdSplitting] Outline more than once per function
Algorithm: Identify maximal cold regions and put them in a worklist. If
a candidate region overlaps with another, discard it. While the worklist
is not empty, remove a single-entry sub-region from the worklist and attempt
to outline it. By the non-overlap property, this should not invalidate
parts of the domtree pertaining to other outlining regions.

Testing: LNT results on X86 are clean. With test-suite + externals, llvm
outlines 134KB pre-patch, and 352KB post-patch (+ ~2.6x). The file
483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm
outlines over 100 times in some functions (mostly EH paths). There was
not a significant performance impact pre vs. post-patch.

Differential Revision: https://reviews.llvm.org/D53887

llvm-svn: 348639
2018-12-07 20:23:52 +00:00
Nikita Popov 110cf05203 Reapply "[DemandedBits][BDCE] Support vectors of integers"
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.

Unlike the previous iteration of this patch, getDemandedBits() can now
again be called on arbitrary (sized) instructions, even if they don't
have integer or vector of integer type. (For vector types the size of the
returned mask will now be the scalar size in bits though.)

The added LoopVectorize test case shows a case which triggered an
assertion failure with the previous attempt, because getDemandedBits()
was called on a pointer-typed instruction.

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348602
2018-12-07 15:38:13 +00:00
Max Kazantsev b9e65cbddf Introduce llvm.experimental.widenable_condition intrinsic
This patch introduces a new intrinsic `@llvm.experimental.widenable_condition`
that allows explicit representation for guards. It is an alternative to using
`@llvm.experimental.guard` intrinsic that does not contain implicit control flow.

We keep finding places where `@llvm.experimental.guard` is not supported or
treated too conservatively, and there are 2 reasons for that:

- `@llvm.experimental.guard` has a memory write side effect to model implicit control flow,
  and this sometimes confuses passes and analyses that work with memory;
- Not all passes and analyses are aware of the semantics of guards. These passes treat them
  as regular throwing calls and have no idea that the condition of a guard may be used to prove
  something. One well-known place which has caused us trouble in the past is explicit loop
  iteration count calculation in SCEV. Another example is new loop unswitching which is not
  aware of guards. Whenever a new pass appears, we potentially have this problem there.

Rather than go and fix all these places (and commit to keep track of them and add support
in future), it seems more reasonable to leverage the existing optimizer's logic as much as possible.
The only significant difference between guards and regular explicit branches is that a guard's condition
can be widened. It means that a guard contains (explicitly or implicitly) a `deopt` block successor,
and it is always legal to go there no matter what the guard condition is. The other successor is
a guarded block, and it is only legal to go there if the condition is true.

This patch introduces a new explicit form of guards alternative to `@llvm.experimental.guard`
intrinsic. Now a widenable guard can be represented in the CFG explicitly like this:


    %widenable_condition = call i1 @llvm.experimental.widenable.condition()
    %new_condition = and i1 %cond, %widenable_condition
    br i1 %new_condition, label %guarded, label %deopt

  guarded:
    ; Guarded instructions

  deopt:
    call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]

The new intrinsic `@llvm.experimental.widenable.condition` has semantics of an
`undef`, but the intrinsic prevents the optimizer from folding it early. This form
should exploit all optimization boons provided to the `br` instruction, and it can still be
widened by replacing the result of `@llvm.experimental.widenable.condition()`
with `and` with any arbitrary boolean value (as long as the branch that is taken when
it is `false` has a deopt and has no side-effects).

For more motivation, please check llvm-dev discussion "[llvm-dev] Giving up using
implicit control flow in guards".

This patch introduces this new intrinsic with respective LangRef changes and a pass
that converts old-style guards (expressed as intrinsics) into the new form.

The naming discussion is still ongoing. Merging this to unblock further items. We can
later change the name of this intrinsic.

Reviewed By: reames, fedor.sergeev, sanjoy
Differential Revision: https://reviews.llvm.org/D51207

llvm-svn: 348593
2018-12-07 14:39:46 +00:00
Markus Lavin 4dc4ebd606 [PM] Port LoadStoreVectorizer to the new pass manager.
Differential Revision: https://reviews.llvm.org/D54848

llvm-svn: 348570
2018-12-07 08:23:37 +00:00
Max Kazantsev a523a21175 [LoopSimplifyCFG] Do not deal with loops with irreducible CFG inside
The current algorithm that collects live/dead/inloop blocks relies on some invariants
related to RPO and PO traversals. In particular, the important fact it requires is that
the loop's only latch is the first block in the PO traversal. It also relies on the fact that during
RPO we visit all predecessors of a block before we visit the block itself (backedges ignored).

If a loop has an irreducible non-loop cycle inside, both of these assumptions may break.
This patch adds detection for this situation and prohibits the terminator folding
for loops with irreducible CFG.

We can in theory support this later; some algorithmic changes are needed for that.
Besides, an irreducible CFG is not a frequent situation and we can just not bother.

Thanks @uabelho for finding this!

Differential Revision: https://reviews.llvm.org/D55357
Reviewed By: skatkov

llvm-svn: 348567
2018-12-07 05:44:45 +00:00
Vedant Kumar b2a6f8e505 [CodeExtractor] Store outputs at the first valid insertion point
When CodeExtractor outlines values which are used by the original
function, it must store those values in some in-out parameter. This
store instruction must not be inserted in between a PHI and an EH pad
instruction, as that results in invalid IR.

This fixes the following verifier failure seen while outlining within
ObjC methods with live exit values:

  The unwind destination does not have an exception handling instruction!
    %call35 = invoke i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %exn.adjusted, i8* %1)
            to label %invoke.cont34 unwind label %lpad33, !dbg !4183
  The unwind destination does not have an exception handling instruction!
    invoke void @objc_exception_throw(i8* %call35) #12
            to label %invoke.cont36 unwind label %lpad33, !dbg !4184
  LandingPadInst not the first non-PHI instruction in the block.
    %3 = landingpad { i8*, i32 }
            catch i8* null, !dbg !1411

rdar://46540815

llvm-svn: 348562
2018-12-07 03:01:54 +00:00
Nikita Popov 14ca9a8355 Revert "[DemandedBits][BDCE] Support vectors of integers"
This reverts commit r348549. Causing assertion failures during
clang build.

llvm-svn: 348558
2018-12-07 00:42:03 +00:00
Nikita Popov cf65b9207b [DemandedBits][BDCE] Support vectors of integers
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.

The getDemandedBits() method can now only be called on instructions that
have integer or vector of integer type. Previously it could be called on
any sized instruction (even if it was not particularly useful). The size
of the return value is now always the scalar size in bits (while
previously it was the type size in bits).

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348549
2018-12-06 23:50:32 +00:00
Adrian Prantl fbeeac0e1e Reapply "Adapt gcov to changes in CFE."
This reverts commit r348203 and reapplies D55085 with an additional
GCOV bugfix to make the change NFC for relative file paths in .gcno files.

Thanks to Ilya Biryukov for additional testing!

Original commit message:

    Update Diagnostic handling for changes in CFE.

    The clang frontend no longer emits the current working directory for
    DIFiles containing an absolute path in the filename: and will move the
    common prefix between current working directory and the file into the
    directory: component.

    https://reviews.llvm.org/D55085

llvm-svn: 348512
2018-12-06 18:44:48 +00:00
Alexandros Lamprineas e4c91f5c4c [GVN] Don't perform scalar PRE on GEPs
Partial Redundancy Elimination of GEPs prevents CodeGenPrepare from
sinking the addressing mode computation of memory instructions back
to its uses. The problem comes from the insertion of PHIs, which
confuse CGP and make it bail.

I've autogenerated the check lines of an existing test and added a
store instruction to demonstrate the motivation behind this change.
The store is now using the gep instead of a phi.

Differential Revision: https://reviews.llvm.org/D55009

llvm-svn: 348496
2018-12-06 16:11:58 +00:00
Ilya Biryukov cb5331eb93 Revert "[LoopSimplifyCFG] Delete dead in-loop blocks"
This reverts commit r348457.
The original commit causes clang to crash when doing an instrumented
build with a new pass manager. Reverting to unbreak our integrate.

llvm-svn: 348484
2018-12-06 13:21:01 +00:00
Roman Lebedev 98cb1216a6 [InstCombine] foldICmpWithLowBitMaskedVal(): don't miscompile -1 vector elts
I was finally able to quantify what I thought was missing in the fix:
it was vector constants. If we have a scalar (and %x, -1),
it will be instsimplified before we reach this code,
but if it is a vector, we may still have a -1 element.

Thus, we want to avoid the fold if *at least one* element is -1.
Or in other words, ignoring the undef elements, no sign bits
should be set. Thus, m_NonNegative().

A follow-up for rL348181
https://bugs.llvm.org/show_bug.cgi?id=39861

llvm-svn: 348462
2018-12-06 08:14:24 +00:00
Max Kazantsev 0b1d069d64 [LoopSimplifyCFG] Delete dead in-loop blocks
This patch teaches LoopSimplifyCFG to delete loop blocks that have
become unreachable after terminator folding has been done.

Differential Revision: https://reviews.llvm.org/D54023
Reviewed By: anna

llvm-svn: 348457
2018-12-06 05:45:02 +00:00
Sanjay Patel 998ececef0 [InstCombine] remove dead code from visitExtractElement
Extracting from a splat constant is always handled by InstSimplify.
Move the test for this from InstCombine to InstSimplify to make
sure that stays true.

llvm-svn: 348423
2018-12-05 23:09:33 +00:00
Sanjay Patel 47b3b4b5aa [InstCombine] reduce duplication in visitExtractElementInst; NFC
llvm-svn: 348418
2018-12-05 21:57:51 +00:00
Vedant Kumar 09415a850e [CodeExtractor] Do not mark outlined calls which may resume EH as noreturn
Treat terminators which resume exception propagation as returning instructions
(at least, for the purposes of marking outlined functions `noreturn`). This is
to avoid inserting traps after calls to outlined functions which unwind.

rdar://46129950

llvm-svn: 348404
2018-12-05 19:35:37 +00:00
Christian Bruel 4ead99b3ac Allow norecurse attribute on functions that have debug infos.
Summary: debug intrinsics might be marked norecurse to enable the caller function to be marked norecurse and optimized if needed. This avoids codegen optimisation differences when -g is used, as in the globalOpt.cpp:processInternalGlobal checks.

Reviewers: chandlerc, jmolloy, aprantl

Reviewed By: aprantl

Subscribers: aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D55187

llvm-svn: 348381
2018-12-05 16:48:00 +00:00
Sanjay Patel baffae91b2 [InstCombine] simplify icmps with same operands based on dominating cmp
The tests here are based on the motivating cases from D54827.

More background:
1. We don't get these cases in general with SimplifyCFG because the root
   of the pattern match is an icmp, not a branch. I'm not sure how often
   we encounter this pattern vs. the seemingly more likely case with 
   branches, but I don't see evidence to leave the minimal pattern
   unoptimized.

2. This has a chance of increasing compile-time because we're using a
   ValueTracking call to handle the match. The motivating cases could be
   handled with a simpler pair of calls to isImpliedTrueByMatchingCmp/
   isImpliedFalseByMatchingCmp, but I saw that we have a more 
   comprehensive wrapper around those, so we might as well use it here
   unless there's evidence that it's significantly slower.

3. Ideally, we'd handle the fold to constants in InstSimplify, but as
   with the existing code here, we could extend this to handle cases
   where the result is not a constant, but a new combined predicate.
   That would mean splitting the logic across the 2 passes and possibly
   duplicating the pattern-matching cost.

4. As mentioned in D54827, this seems like the kind of thing that should
   be handled in Correlated Value Propagation, but that pass is currently
   limited to dealing with instructions with constant operands, so extending
   this bit of InstCombine is the smallest/easiest way to get these patterns 
   optimized.
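
A minimal sketch of the kind of fold this enables (hypothetical IR):

```llvm
define i1 @f(i32 %x, i32 %y) {
entry:
  %cmp1 = icmp slt i32 %x, %y
  br i1 %cmp1, label %t, label %f
t:
  ; dominated by the true edge of %cmp1, so x > y is known false
  %cmp2 = icmp sgt i32 %x, %y   ; folds to false
  ret i1 %cmp2
f:
  ret i1 false
}
```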

llvm-svn: 348367
2018-12-05 15:04:00 +00:00
Alina Sbirlea 0e216854f9 [LICM] *Actually* disable ControlFlowHoisting.
Summary:
The remaining code paths that ControlFlowHoisting introduced, and that were
not disabled, increased compile time by 3x for some benchmarks.
The time is spent in DominatorTree updates.

Reviewers: john.brawn, mkazantsev

Subscribers: sanjoy, jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D55313

llvm-svn: 348345
2018-12-05 10:16:21 +00:00