Commit Graph

8358 Commits

Author SHA1 Message Date
Nikita Popov 353d92decb [DemandedBits] Use SetVector for Worklist
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions too be recomputed very often. To avoid this, switch
to a SetVector.

Differential Revision: https://reviews.llvm.org/D56362

llvm-svn: 350547
2019-01-07 18:03:36 +00:00
Chandler Carruth 57578aaf96 [CallSite removal] Port `IndirectCallSiteVisitor` to use `CallBase` and
update client code.

Also rename it to use the more generic term `call` instead of something
that could be confused with a praticular type.

Differential Revision: https://reviews.llvm.org/D56183

llvm-svn: 350508
2019-01-07 07:15:51 +00:00
Chandler Carruth 363ac68374 [CallSite removal] Migrate all Alias Analysis APIs to use the newly
minted `CallBase` class instead of the `CallSite` wrapper.

This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.

Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.

Thanks for the review from Saleem!

Differential Revision: https://reviews.llvm.org/D55641

llvm-svn: 350503
2019-01-07 05:42:51 +00:00
Nikita Popov 6658fce4fc [BDCE] Remove dead uses of arguments
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.

I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.

Differential Revision: https://reviews.llvm.org/D56247

llvm-svn: 350435
2019-01-04 21:21:43 +00:00
Florian Hahn 7902405c42 [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset
GetPointerBaseWithConstantOffset include this code, where ByteOffset
and GEPOffset are both of type llvm::APInt :

  ByteOffset += GEPOffset.getSExtValue();

The problem with this line is that getSExtValue() returns an int64_t, but
the += matches an overload for uint64_t. The problem is that the resulting
APInt is no longer considered to be signed. That in turn causes assertion
failures later on if the relevant pointer type is > 64 bits in width and
the GEPOffset was negative.

Changing it to

  ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());

resolves the issue and explicitly performs the sign-extending
or truncation. Additionally, instead of asserting later if the result
is > 64 bits, it breaks out of the loop in that case.

See also
 https://reviews.llvm.org/D24729
 https://reviews.llvm.org/D24772

This commit must be merged after D38662 in order for the test to pass.

Patch by Michael Ferguson <mpfergu@gmail.com>.

Reviewers: reames, sanjoy, hfinkel

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D38501

llvm-svn: 350395
2019-01-04 14:53:22 +00:00
Simon Pilgrim c2aadfaaad [SLPVectorizer] Flag ADD/SUB SSAT/USAT intrinsics trivially vectorizable (PR40123)
Enables SLP vectorization for the SSE2 PADDS/PADDUS/PSUBS/PSUBUS style intrinsics

llvm-svn: 350300
2019-01-03 12:18:23 +00:00
Hal Finkel 4f2381440d [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug)
Motivated by the discussion in D38499, this patch updates BasicAA to support
arbitrary pointer sizes by switching most remaining non-APInt calculations to
use APInt. The size of these APInts is set to the maximum pointer size (maximum
over all address spaces described by the data layout string).

Most of this translation is straightforward, but this patch contains a fix for
a bug that revealed itself during this translation process. In order for
test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
pointers, the intermediate calculations must be performed using 64-bit
integers. This is because, as noted in the patch, when GetLinearExpression
decomposes an expression into C1*V+C2, and we then multiply this by Scale, and
distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even
through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can
overflow. If this happens, later logic will draw invalid conclusions from the
(base) offset value. Thus, when initially applying the APInt conversion,
because the maximum pointer size in this test is 32 bits, it started failing.
Suspicious, I created a 64-bit version of this test (included here), and that
failed (miscompiled) on trunk for a similar reason (the multiplication can
overflow).

After fixing this overflow bug, the first test case (at least) in
Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was
relying on having 64-bit intermediate values to have BasicAA return an accurate
result. In order to fix this problem, and because I believe that it is not
uncommon to use i64 indexing expressions in 32-bit code (especially portable
code using int64_t), it seems reasonable to always use at least 64-bit
integers. In this way, we won't regress our analysis capabilities (and there's
a command-line option added, so experimenting with this should be easy).

As pointed out by Eli during the review, there are other potential overflow
conditions that this patch does not address. Fixing those is left to follow-up
work.

Patch by me with contributions from Michael Ferguson (mferguson@cray.com).

Differential Revision: https://reviews.llvm.org/D38662

llvm-svn: 350220
2019-01-02 16:28:09 +00:00
Nikita Popov bc9986e9ad Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.

The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 350188
2019-01-01 10:05:26 +00:00
George Burgess IV 69952979da [MemoryLocation] Use LocationSize instead of ints; NFC
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.

This one sadly isn't *super* small, but all of the changes here are
either to:
- libfuncs that are passed a constant size (memcpy, memset, ...)
- instructions that store/load a constant size

So they have to be precise

llvm-svn: 350017
2018-12-23 03:36:44 +00:00
George Burgess IV 8c5413f3f7 [Loads] Use LocationSize instead of ints; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

This tries to find literal loads/stores of the given type, so this has
to be precise.

llvm-svn: 350016
2018-12-23 03:10:56 +00:00
George Burgess IV 1329cf1791 [Lint] Use LocationSize instead of ints; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350015
2018-12-23 02:50:08 +00:00
George Burgess IV 685e781d55 [AAEval] Use LocationSize instead of ints; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350014
2018-12-23 02:39:58 +00:00
George Burgess IV 640be69249 [Analysis] More LocationSize cleanup; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350008
2018-12-22 18:23:21 +00:00
Vedant Kumar b264d69de7 [IR] Add Instruction::isLifetimeStartOrEnd, NFC
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.

This was suggested as a cleanup in D55967.

Differential Revision: https://reviews.llvm.org/D56019

llvm-svn: 349964
2018-12-21 21:49:40 +00:00
Reid Kleckner 84a2d29681 [BasicAA] Fix AA bug on dynamic allocas and stackrestore
Summary:
BasicAA has special logic for unescaped allocas, which normally applies
equally well to dynamic and static allocas. However, llvm.stackrestore
has the power to end the lifetime of dynamic allocas, without referring
to them directly.

stackrestore is already marked with the most conservative memory
modification attributes, but because the alloca is not escaped, the
normal logic produces incorrect results. I think BasicAA needs a special
case here to teach it about the relationship between dynamic allocas and
stackrestore.

Fixes PR40118

Reviewers: gbiv, efriedma, george.burgess.iv

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55969

llvm-svn: 349945
2018-12-21 19:59:03 +00:00
Florian Hahn ef307b8c26 [LAA] Avoid generating RT checks for known deps preventing vectorization.
If we found unsafe dependences other than 'unknown', we already know at
compile time that they are unsafe and the runtime checks should always
fail. So we can avoid generating them in those cases.

This should have no negative impact on performance as the runtime checks
that would be created previously should always fail. As a sanity check,
I measured the test-suite, spec2k and spec2k6 and there were no regressions.

Reviewers: Ayal, anemet, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D55798

llvm-svn: 349794
2018-12-20 18:49:09 +00:00
Clement Courbet d4bd3eb85d [NFC] Fix trailing comma after function.
lib/Analysis/VectorUtils.cpp:482:2: warning: extra ‘;’ [-Wpedantic]

llvm-svn: 349732
2018-12-20 09:20:07 +00:00
Michael Kruse 978ba61536 Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).

This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.

The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

llvm-svn: 349725
2018-12-20 04:58:07 +00:00
Nikita Popov 3817ee7908 Revert "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This reverts commit r349674. It causes a failure in
test-suite enc-3des.execution_time.

llvm-svn: 349684
2018-12-19 22:09:02 +00:00
Nikita Popov 649e125451 [BDCE][DemandedBits] Detect dead uses of undead instructions
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.

The test case has a couple of cases that are not simplified yet. In
particular, we're only looking at uses of instructions right now. I think
it would make sense to also extend this to arguments. Furthermore
DemandedBits doesn't yet know some of the tricks that InstCombine does
for the demanded bits or bitwise or/and/xor in combination with known
bits information.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 349674
2018-12-19 19:56:21 +00:00
Sanjay Patel 798c5982a0 [ValueTracking] remove unused parameters from helper functions; NFC
llvm-svn: 349641
2018-12-19 16:49:18 +00:00
Florian Hahn 485f2826ba [LAA] Introduce enum for vectorization safety status (NFC).
This patch adds a VectorizationSafetyStatus enum, which will be extended
in a follow up patch to distinguish between 'safe with runtime checks'
and 'known unsafe' dependences.

Reviewers: anemet, anna, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D54892

llvm-svn: 349556
2018-12-18 22:25:11 +00:00
Pete Cooper be4f571107 Change the objc ARC optimizer to use the new objc.* intrinsics
We're moving ARC optimisation and ARC emission in clang away from runtime methods
and towards intrinsics.  This is the part which actually uses the intrinsics in the ARC
optimizer when both analyzing the existing calls and emitting new ones.

Differential Revision: https://reviews.llvm.org/D55348

Reviewers: ahatanak
llvm-svn: 349534
2018-12-18 20:32:49 +00:00
Artur Pilipenko 2a0146e0fd [CaptureTracking] Pass MaxUsesToExplore from wrappers to the actual implementation
This is a follow up for rL347910. In the original patch I somehow forgot to pass
the limit from wrappers to the function which actually does the job.

llvm-svn: 349438
2018-12-18 03:32:33 +00:00
Nikita Popov 221f3fc750 [InstSimplify] Simplify saturating add/sub + icmp
If a saturating add/sub has one constant operand, then we can
determine the possible range of outputs it can produce, and simplify
an icmp comparison based on that.

The implementation is based on a similar existing mechanism for
simplifying binary operator + icmps.

Differential Revision: https://reviews.llvm.org/D55735

llvm-svn: 349369
2018-12-17 17:45:18 +00:00
Wei Mi 66c6c5abea [SampleFDO] handle ProfileSampleAccurate when initializing function entry count
ProfileSampleAccurate is used to indicate the profile has exact match to the
code to be optimized.

Previously ProfileSampleAccurate is handled in ProfileSummaryInfo::isColdCallSite
and ProfileSummaryInfo::isColdBlock. A better solution is to initialize function
entry count to 0 when ProfileSampleAccurate is true, so we don't have to handle
ProfileSampleAccurate in multiple places.

Differential Revision: https://reviews.llvm.org/D55660

llvm-svn: 349088
2018-12-13 21:51:42 +00:00
Easwaran Raman 5a7056fa03 [ThinLTO] Compute synthetic function entry count
Summary:
This patch computes the synthetic function entry count on the whole
program callgraph (based on module summary) and writes the entry counts
to the summary. After function importing, this count gets attached to
the IR as metadata. Since it adds a new field to the summary, this bumps
up the version.

Reviewers: tejohnson

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D43521

llvm-svn: 349076
2018-12-13 19:54:27 +00:00
David L. Jones 54c01ad6a9 Revert r348645 - "[MemCpyOpt] memset->memcpy forwarding with undef tail"
This revision caused trucated memsets for structs with padding. See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610520.html

llvm-svn: 349002
2018-12-13 03:15:11 +00:00
Michael Kruse 7244852557 [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.

    #pragma clang loop unroll_and_jam(enable)
    #pragma clang loop distribute(enable)

is the same as

    #pragma clang loop distribute(enable)
    #pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll_and_jam.enable"}
    !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
    !3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.

Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.

To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

Reviewed By: hfinkel, dmgreen

Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288

llvm-svn: 348944
2018-12-12 17:32:52 +00:00
Wei Mi 7da5a08e1a [SampleFDO] Extend profile-sample-accurate option to cover isFunctionColdInCallGraph
For SampleFDO, when a callsite doesn't appear in the profile, it will not be marked as cold callsite unless the option -profile-sample-accurate is specified.

But profile-sample-accurate doesn't cover function isFunctionColdInCallGraph which is used to decide whether a function should be put into text.unlikely section, so even if the user knows the profile is accurate and specifies profile-sample-accurate, those functions not appearing in the sample profile are still not be put into text.unlikely section right now.

The patch fixes that.

Differential Revision: https://reviews.llvm.org/D55567

llvm-svn: 348940
2018-12-12 17:09:27 +00:00
Nikita Popov 79c994d976 [ConstantFolding] Handle leading zero-size elements in load folding
Struct types may have leading zero-size elements like [0 x i32], in
which case the "real" element at offset 0 will not necessarily coincide
with the 0th element of the aggregate. ConstantFoldLoadThroughBitcast()
wants to drill down the element at offset 0, but currently always picks
the 0th aggregate element to do so. This patch changes the code to find
the first non-zero-size element instead, for the struct case.

The motivation behind this change is https://github.com/rust-lang/rust/issues/48627.
Rust is fond of emitting [0 x iN] separators between struct elements to
enforce alignment, which prevents constant folding in this particular case.

The additional tests with [4294967295 x [0 x i32]] check that we don't
end up unnecessarily looping over a large number of zero-size elements
of a zero-size array.

Differential Revision: https://reviews.llvm.org/D55169

llvm-svn: 348895
2018-12-11 20:29:16 +00:00
Fedor Sergeev a1d95c3fc4 [NewPM] fixing asserts on deleted loop in -print-after-all
IR-printing AfterPass instrumentation might be called on a loop
that has just been invalidated. We should skip printing it to
avoid spurious asserts.

Reviewed By: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D54740

llvm-svn: 348887
2018-12-11 19:05:35 +00:00
Nikita Popov 94b8e2ea4e [MemCpyOpt] memset->memcpy forwarding with undef tail
Currently memcpyopt optimizes cases like

    memset(a, byte, N);
    memcpy(b, a, M);

to

    memset(a, byte, N);
    memset(b, byte, M);

if M <= N. Often this allows further simplifications down the line,
which drop the first memset entirely.

This patch extends this optimization for the case where M > N, but we
know that the bytes a[N..M] are undef due to alloca/lifetime.start.

This situation arises relatively often for Rust code, because Rust does
not initialize trailing structure padding and loves to insert redundant
memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.

For the implementation, I'm reusing a bit of code for a similar existing
optimization (direct memcpy of undef). I've also added memset support to
MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be
used, but it seems to make more sense to add this to GetLocation and thus
make the computation cachable.

Differential Revision: https://reviews.llvm.org/D55120

llvm-svn: 348645
2018-12-07 21:16:58 +00:00
Nikita Popov 110cf05203 Reapply "[DemandedBits][BDCE] Support vectors of integers"
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.

Unlike the previous iteration of this patch, getDemandedBits() can now
again be called on arbirary (sized) instructions, even if they don't
have integer or vector of integer type. (For vector types the size of the
returned mask will now be the scalar size in bits though.)

The added LoopVectorize test case shows a case which triggered an
assertion failure with the previous attempt, because getDemandedBits()
was called on a pointer-typed instruction.

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348602
2018-12-07 15:38:13 +00:00
Nikita Popov 14ca9a8355 Revert "[DemandedBits][BDCE] Support vectors of integers"
This reverts commit r348549. Causing assertion failures during
clang build.

llvm-svn: 348558
2018-12-07 00:42:03 +00:00
Nikita Popov cf65b9207b [DemandedBits][BDCE] Support vectors of integers
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.

The getDemandedBits() method can now only be called on instructions that
have integer or vector of integer type. Previously it could be called on
any sized instruction (even if it was not particularly useful). The size
of the return value is now always the scalar size in bits (while
previously it was the type size in bits).

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348549
2018-12-06 23:50:32 +00:00
David L. Jones 5ff7b8a04a Revert r347934 "[SCEV] Guard movement of insertion point for loop-invariants"
This change caused SEGVs in instcombine. (The r347934 change seems to me to be a
precipitating cause, not a root cause. Details are on the llvm-commits thread
for r347934.)

llvm-svn: 348426
2018-12-05 23:13:50 +00:00
Sanjay Patel 3d5bb15a1d [CmpInstAnalysis] fix function signature for ICmp code to predicate; NFC
The old function underspecified the return type, took an unused parameter,
and had a misleading name.

llvm-svn: 348292
2018-12-04 18:53:27 +00:00
Sanjay Patel 472652ef68 [CmpInstAnalysis] fix formatting; NFC
There are potential improvements to the structure of this API
raised by D54994, but remove some cosmetic blemishes before
making any functional changes.

llvm-svn: 348149
2018-12-03 15:48:30 +00:00
Fedor Sergeev 7254d3c51c Fixing -print-module-scope for legacy SCC passes
It appears that print-module-scope was not implemented for legacy SCC passes.
Fixed to print a whole module instead of just current SCC.

Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D54793

llvm-svn: 348144
2018-12-03 14:48:15 +00:00
Nikita Popov 687b92cd9c [ValueTracking] Support funnel shifts in computeKnownBits()
If the shift amount is known, we can determine the known bits of the
output based on the known bits of two inputs.

This is essentially the same functionality as implemented in D54869,
but for ValueTracking rather than InstCombine SimplifyDemandedBits.

Differential Revision: https://reviews.llvm.org/D55140

llvm-svn: 348091
2018-12-02 14:14:11 +00:00
Sanjay Patel 7d82d37854 [ValueTracking] add helper function for testing implied condition; NFCI
We were duplicating code around the existing isImpliedCondition() that
checks for a predecessor block/dominating condition, so make that a
wrapper call.

llvm-svn: 348088
2018-12-02 13:26:03 +00:00
Teresa Johnson 5b8ff375c8 [ThinLTO] Allow importing of functions with var args
Summary:
Follow up to D54270, which allowed importing of var args functions
unless they called va_start. As pointed out in the post-commit comments
on that patch, the inliner can handle functions that call va_start in
certain situations as well. Go ahead and enable importing of all var
args functions. Measurements on a large binary show that this increases
imports and binary size by an insignificant amount.

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D54607

llvm-svn: 348068
2018-12-01 05:11:46 +00:00
Nicolai Haehnle 413f8691ab LegacyDivergenceAnalysis: fix uninitialized value
Change-Id: I014502e431a68f7beddf169f6a3d19dac5dd2c26
llvm-svn: 348051
2018-11-30 23:07:49 +00:00
Nicolai Haehnle 56d0ed2a50 [DA] GPUDivergenceAnalysis for unstructured GPU kernels
Summary:
This is patch #3 of the new DivergenceAnalysis

  <https://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html>

The GPUDivergenceAnalysis is intended to eventually supersede the existing
LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces
incorrect results on unstructured Control-Flow Graphs:

  <https://bugs.llvm.org/show_bug.cgi?id=37185>

This patch adds the option -use-gpu-divergence-analysis to the
LegacyDivergenceAnalysis to turn it into a transparent wrapper for the
GPUDivergenceAnalysis.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D53493

llvm-svn: 348048
2018-11-30 22:55:20 +00:00
Renato Golin de4b88e5ac Fix parenthesis warning in IVDescriptors
llvm-svn: 347990
2018-11-30 13:54:36 +00:00
Renato Golin 135e72e1b9 Add a new reduction pattern match
Adding a new reduction pattern match for vectorizing code similar
to TSVC s3111:

for (int i = 0; i < N; i++)
  if (a[i] > b)
    sum += a[i];

This patch adds support for fadd, fsub and fmull, as well as multiple
branches and different (but compatible) instructions (ex. add+sub) in
different branches.

The difference from the previous patch(https://reviews.llvm.org/D49168)
is as follows:
 - Added check of fast-math property of fp-instruction to the
   previous patch
 - Fix/add some pattern for if-reduction.ll


Differential Revision: https://reviews.llvm.org/D54464

Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org>
     and Masakazu Ueno <masakazu.ueno@linaro.org>

llvm-svn: 347989
2018-11-30 13:40:10 +00:00
Warren Ristow 72d1f3a285 [SCEV] Guard movement of insertion point for loop-invariants
r320789 suppressed moving the insertion point of SCEV expressions with
dev/rem operations to the loop header in non-loop-invariant situations.
This, and similar, hoisting is also unsafe in the loop-invariant case,
since there may be a guard against a zero denominator. This is an
adjustment to the fix of r320789 to suppress the movement even in the
loop-invariant case.

This fixes PR30806.

Differential Revision: https://reviews.llvm.org/D54713

llvm-svn: 347934
2018-11-30 00:02:54 +00:00
Artur Pilipenko eba2365f23 Introduce MaxUsesToExplore argument to capture tracking
Currently CaptureTracker gives up if it encounters a value with more than 20 
uses. The motivation for this cap is to keep it relatively cheap for 
BasicAliasAnalysis use case, where the results can't be cached. Although, other 
clients of CaptureTracker might be ok with higher cost. This patch introduces an 
argument for PointerMayBeCaptured functions to specify the max number of uses to 
explore. The motivation for this change is a downstream user of CaptureTracker, 
but I believe upstream clients of CaptureTracker might also benefit from more 
fine grained cap.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D55042

llvm-svn: 347910
2018-11-29 20:08:12 +00:00
Sanjay Patel d802270808 [InstSimplify] fold select with implied condition
This is an almost direct move of the functionality from InstCombine to 
InstSimplify. There's no reason not to do this in InstSimplify because 
we never create a new value with this transform.

(There's a question of whether any dominance-based transform belongs in
either of these passes, but that's a separate issue.)

I've changed 1 of the conditions for the fold (1 of the blocks for the 
branch must be the block we started with) into an assert because I'm not 
sure how that could ever be false.

We need 1 extra check to make sure that the instruction itself is in a
basic block because passes other than InstCombine may be using InstSimplify
as an analysis on values that are not wired up yet.

The 3-way compare changes show that InstCombine has some kind of 
phase-ordering hole. Otherwise, we would have already gotten the intended
final result that we now show here.

llvm-svn: 347896
2018-11-29 18:44:39 +00:00
Artur Pilipenko 8b92c1d142 NFC. Use unsigned type for uses counter in CaptureTracking
llvm-svn: 347826
2018-11-29 02:15:35 +00:00
Nikita Popov cf596a8c26 [ValueTracking] Determine always-overflow condition for unsigned sub
Always-overflow was already determined for unsigned addition, but
not subtraction. This patch establishes parity.

This allows us to perform some additional simplifications for
signed saturating subtractions.

This change is part of https://reviews.llvm.org/D54534.

llvm-svn: 347771
2018-11-28 16:37:04 +00:00
Vitaly Buka 44abeb5c3c [stack-safety] Update comment
llvm-svn: 347626
2018-11-27 01:56:44 +00:00
Vitaly Buka 7792f5f145 [stack-safety] Fix and uncomment assert
llvm-svn: 347625
2018-11-27 01:56:35 +00:00
Vitaly Buka 769bff18be [stack-safety] Fix build on gcc 5.4
llvm-svn: 347624
2018-11-27 01:56:26 +00:00
JF Bastien 3c242438ec Fix debug build break
Comment out an assertion from D54543 which failed with error: no member named 'Range' in '(anonymous namespace)::PassAsArgInfo'.

llvm-svn: 347616
2018-11-26 23:48:47 +00:00
Vitaly Buka 42b050673e [stack-safety] Inter-Procedural Analysis implementation
Summary:
IPA is implemented as module pass which produce map from Function or Alias to
StackSafetyInfo for a single function.

From prototype by Evgenii Stepanov and Vlad Tsyrklevich.

Reviewers: eugenis, vlad.tsyrklevich, pcc, glider

Subscribers: hiraditya, mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D54543

llvm-svn: 347611
2018-11-26 23:05:58 +00:00
Vitaly Buka b8e6fa6638 [stack-safety] Empty local passes for Stack Safety Global Analysis
Reviewers: eugenis, vlad.tsyrklevich

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D54541

llvm-svn: 347610
2018-11-26 23:05:48 +00:00
Vitaly Buka fa98c074b7 [stack-safety] Local analysis implementation
Summary:
Analysis produces StackSafetyInfo which contains information with how allocas
and parameters were used in functions.

From prototype by Evgenii Stepanov and  Vlad Tsyrklevich.

Reviewers: eugenis, vlad.tsyrklevich, pcc, glider

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D54504

llvm-svn: 347603
2018-11-26 21:57:59 +00:00
Vitaly Buka 4493fe1c1b [stack-safety] Empty local passes for Stack Safety Local Analysis
Reviewers: eugenis, vlad.tsyrklevich

Subscribers: mgorny, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D54502

llvm-svn: 347602
2018-11-26 21:57:47 +00:00
Nikita Popov f94c8f0d1b [DemandedBits] Add support for funnel shifts
Add support for funnel shifts to the DemandedBits analysis. The
demanded bits of the first two operands can be determined if the
shift amount is constant. The demanded bits of the third operand
(shift amount) can be determined if the bitwidth is a power of two.

This is basically the same functionality as implemented in D54869
and D54478, but for DemandedBits rather than InstCombine.

Differential Revision: https://reviews.llvm.org/D54876

llvm-svn: 347561
2018-11-26 15:36:57 +00:00
Joel Jones 7459398a43 Revert unapproved commit
llvm-svn: 347511
2018-11-24 07:26:55 +00:00
Joel Jones 5f533c5fe1 [AArch64] Enable libm vectorized functions via SLEEF
This changeset is modeled after Intel's submission for SVML. It enables
trigonometry functions vectorization via SLEEF: http://sleef.org/.

 * A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF.
 * A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF.
 * A comprehensive test case is included in this changeset.
 * In a separate changeset (for clang), a new vectorization library argument is
   added to -fveclib: -fveclib=SLEEF.

Trigonometry functions that are vectorized by sleef:

acos
asin
atan
atanh
cos
cosh
exp
exp2
exp10
lgamma
log10
log2
log
sin
sinh
sqrt
tan
tanh
tgamma

Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D53927

llvm-svn: 347510
2018-11-24 06:41:39 +00:00
John Regehr 3a1c9d55cc [LVI] run transfer function for binary operator even when the RHS isn't a constant
LVI was symbolically executing binary operators only when the RHS was
constant, missing the case where we have a ConstantRange for the RHS,
but not an actual constant. Tested using check-all and by
bootstrapping. Compile time is not impacted measurably.

Differential Revision: https://reviews.llvm.org/D19859

llvm-svn: 347379
2018-11-21 05:24:12 +00:00
Sanjay Patel 14ab9170b8 [InstSimplify] fold funnel shifts with undef operands
Splitting these off from the D54666.

Patch by: nikic (Nikita Popov)

llvm-svn: 347332
2018-11-20 17:34:59 +00:00
Sanjay Patel eea21da12a [InstructionSimplify] Add support for saturating add/sub
Add support for saturating add/sub in InstructionSimplify. In particular, the following simplifications are supported:

    sat(X + 0) -> X
    sat(X + undef) -> -1
    sat(X uadd MAX) -> MAX
    (and commutative variants)

    sat(X - 0) -> X
    sat(X - X) -> 0
    sat(X - undef) -> 0
    sat(undef - X) -> 0
    sat(0 usub X) -> 0
    sat(X usub MAX) -> 0

Patch by: @nikic (Nikita Popov)

Differential Revision: https://reviews.llvm.org/D54532

llvm-svn: 347330
2018-11-20 17:20:26 +00:00
Sanjay Patel efc3d1dfaa [ConstantFolding] Add support for saturating add/sub
Support saturating add/sub in constant folding, based on the APInt methods introduced in D54332.

Patch by: @nikic (Nikita Popov)

Differential Revision: https://reviews.llvm.org/D54531

llvm-svn: 347328
2018-11-20 17:05:55 +00:00
Vedant Kumar 4de31bba51 [IR] Add hasNPredecessors, hasNPredecessorsOrMore to BasicBlock
Add methods to BasicBlock which make it easier to efficiently check
whether a block has N (or more) predecessors.

This can be more efficient than using pred_size(), which is a linear
time operation.

We might consider adding similar methods for successors. I haven't done
so in this patch because succ_size() is already O(1).

With this patch applied, I measured a 0.065% compile-time reduction in
user time for running `opt -O3` on the sqlite3 amalgamation (30 trials).
The change in mergeStoreIntoSuccessor alone saves 45 million linked list
iterations in a stage2 Release build of llc.

See llvm.org/PR39702 for a harder but more general way of achieving
similar results.

Differential Revision: https://reviews.llvm.org/D54686

llvm-svn: 347256
2018-11-19 19:54:27 +00:00
Anna Thomas 5e9215f02b [LV] Avoid vectorizing unsafe dependencies in uniform address
Summary:
Currently, when vectorizing stores to uniform addresses, the only
instance we prevent vectorization is if there are multiple stores to the
same uniform address causing an unsafe dependency.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.

Fixes PR39653.

Reviewers: Ayal, efriedma

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D54538

llvm-svn: 347220
2018-11-19 15:39:59 +00:00
Fedor Sergeev 3a3d688cc8 [LoopPass] fixing 'Modification' messages in -debug-pass=Executions for loop passes
Legacy loop pass manager is issuing "Made Modification" message after each Loop Pass
run, however condition for issuing it is accumulated among all the runs.
That leads to confusing 'modification' messages as soon as the first modification is done.

Changing condition to be "current pass made modifications", similar to how
it is being done in all other pass managers.

llvm-svn: 347215
2018-11-19 15:10:59 +00:00
Vedant Kumar e7b789b529 [ProfileSummary] Standardize methods and fix comment
Every Analysis pass has a get method that returns a reference of the Result of
the Analysis, for example, BlockFrequencyInfo
&BlockFrequencyInfoWrapperPass::getBFI().  I believe that
ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning
a pointer.

Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock,
respectively.  Most methods use BB as the argument of variable names while
methods usually refer to Basic Blocks as Blocks, instead of BB.  For example,
Function::getEntryBlock, Loop:getExitBlock, etc.

I also fixed one of the comments.

Patch by Rodrigo Caetano Rocha!

Differential Revision: https://reviews.llvm.org/D54669

llvm-svn: 347182
2018-11-19 05:23:16 +00:00
Fangrui Song 7570932977 Use llvm::copy. NFC
llvm-svn: 347126
2018-11-17 01:44:25 +00:00
Eugene Leviant bf46e7410c [ThinLTO] Internalize readonly globals
An attempt to recommit r346584 after failure on OSX build bot.
Fixed cache key computation in ThinLTOCodeGenerator and added
test case

llvm-svn: 347033
2018-11-16 07:08:00 +00:00
Sanjay Patel e98ec77a95 [InstSimplify] delete shift-of-zero guard ops around funnel shifts
This is a problem seen in common rotate idioms as noted in:
https://bugs.llvm.org/show_bug.cgi?id=34924

Note that we are not canonicalizing standard IR (shifts and logic) to the intrinsics yet. 
(Although I've written this before...) I think this is the last step before we enable 
that transform. Ie, we could regress code by doing that transform without this 
simplification in place.

In PR34924, I questioned whether this is a valid transform for target-independent IR, 
but I convinced myself this is ok. If we're speculating a funnel shift by turning cmp+br 
into select, then SimplifyCFG has already determined that the transform is justified. 
It's possible that SimplifyCFG is not taking into account profile or other metadata, 
but if that's true, then it's a bug independent of funnel shifts.

Also, we do have CGP code to restore a guard like this around an intrinsic if it can't 
be lowered cheaply. But that isn't necessary for funnel shift because the default 
expansion in SelectionDAGBuilder includes this same cmp+select.

Differential Revision: https://reviews.llvm.org/D54552

llvm-svn: 346960
2018-11-15 14:53:37 +00:00
Teresa Johnson 32dc5b9bf1 [ThinLTO] Update handling of vararg functions to match inliner
Summary:
Previously we marked all vararg functions as non-inlinable in the
function summary, which prevented their importing. However, the
corresponding inliner restriction was loosened in r321940/r342675
to only apply to functions calling va_start. Adjust the summary
flag computation to match.

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D54270

llvm-svn: 346883
2018-11-14 19:30:13 +00:00
Simon Pilgrim 2b166c5044 [TTI] getOperandInfo - a broadcast shuffle means the result is OK_UniformValue
llvm-svn: 346868
2018-11-14 15:04:08 +00:00
Alina Sbirlea b4d088d090 [MemorySSA] Create query after checking if instruction is a fence.
The alternative is checking if I is a fence in the Query constructor, so
as to not attempt to get a non-existent MemoryLocation.

llvm-svn: 346798
2018-11-13 21:12:49 +00:00
Steven Wu fa43892d6f Revert "[ThinLTO] Internalize readonly globals"
This reverts commit 10c84a8f35cae4a9fc421648d9608fccda3925f2.

llvm-svn: 346768
2018-11-13 17:35:04 +00:00
Florian Hahn 86ed347bcd [VectorUtils] Use namespace for InterleaveGroup template specialization.
llvm-svn: 346759
2018-11-13 16:26:34 +00:00
Florian Hahn a4dc7feeea [VPlan] VPlan version of InterleavedAccessInfo.
This patch turns InterleaveGroup into a template with the instruction type
being a template parameter. It also adds a VPInterleavedAccessInfo class, which
only contains a mapping from VPInstructions to their respective InterleaveGroup.
As we do not have access to scalar evolution in VPlan, we can re-use
convert InterleavedAccessInfo to VPInterleavedAccess info.


Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D49489

llvm-svn: 346758
2018-11-13 15:58:18 +00:00
Simon Pilgrim 077a42ca9f [TTI] Make TargetTransformInfo::getOperandInfo static. NFCI.
It has no member dependencies and this makes it easier to reuse in other cost analysis code.

llvm-svn: 346755
2018-11-13 13:45:10 +00:00
Sanjay Patel 1456fd7614 [VectorUtils] add funnel-shifts to the list of vectorizable intrinsics
This just identifies the intrinsics as candidates for vectorization.
It does not mean we will attempt to vectorize under normal conditions
(the test file is forcing vectorization). 

The cost model must be fixed to show that the transform is profitable 
in general.

Allowing vectorization with these intrinsics is required to avoid
potential regressions from canonicalizing to the intrinsics from
generic IR:
https://bugs.llvm.org/show_bug.cgi?id=37417

llvm-svn: 346661
2018-11-12 15:20:14 +00:00
Sanjay Patel 0f4f4806b3 [VectorUtils] reorder list of vectorizable intrinsics; NFC
We need to add funnel-shifts to this list, so clean up
the random order before it gets worse.

llvm-svn: 346660
2018-11-12 15:10:30 +00:00
Max Kazantsev 7d49a3a816 [LICM] Hoist guards from non-header blocks
This patch relaxes overconservative checks on whether or not we could write
memory before we execute an instruction. This allows us to hoist guards out of
loops even if they are not in the header block.

Differential Revision: https://reviews.llvm.org/D50891
Reviewed By: fedor.sergeev

llvm-svn: 346643
2018-11-12 09:29:58 +00:00
Eugene Leviant be8d19967a [ThinLTO] Internalize readonly globals
This patch allows internalising globals if all accesses to them
(from live functions) are from non-volatile load instructions

Differential revision: https://reviews.llvm.org/D49362

llvm-svn: 346584
2018-11-10 08:31:21 +00:00
Simon Pilgrim 26e1c887f5 [TTI] Flip vector types in getShuffleCost SK_ExtractSubvector call
For SK_ExtractSubvector, the default 'Ty' type is the source operand type and 'SubTy' is the destination subvector type

I got this the wrong way around when I added rL346510

llvm-svn: 346534
2018-11-09 18:30:59 +00:00
Simon Pilgrim d0c71609c5 [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368)
Add ShuffleVectorInst::isExtractSubvectorMask helper to match shuffle masks.

llvm-svn: 346510
2018-11-09 16:28:19 +00:00
Max Kazantsev 65cb9d79a2 [SCEV][NFC] Verify IR in isLoop[Entry,Backedge]GuardedByCond
We have a lot of various bugs that are caused by misuse of SCEV (in particular in LV),
all of them can simply be described as "we ask SCEV to prove some fact on invalid IR".
Some of examples of those are PR36311, PR37221, PR39160.

The problem is that these failues manifest differently (what we saw was failure of various
asserts across SCEV, but there can also be miscompiles). This patch adds an assert into two
SCEV methods that strongly rely on correctness of the IR and are involved in known failues.
This will at least allow us to have a clear indication of what was wrong in this case.

This patch also fixes a unit test with incorrect IR that fails this verification.

Differential Revision: https://reviews.llvm.org/D52930
Reviewed By: fhahn

llvm-svn: 346389
2018-11-08 05:07:58 +00:00
Matt Arsenault 8ba740a5a8 Allow subclassing ExternalAA
This allows testing AMDGPU alias analysis like any
other alias analysis pass. This fixes the existing
test pointlessly running opt -O3 when it really
just wants to run the one analysis.

Before there was no way to test this using -aa-eval
with opt, since the default constructed pass
is run. The wrapper subclass allows the
default constructor to pass the necessary callback.

llvm-svn: 346353
2018-11-07 20:26:42 +00:00
James Y Knight 72f76bf230 Add support for llvm.is.constant intrinsic (PR4898)
This adds the llvm-side support for post-inlining evaluation of the
__builtin_constant_p GCC intrinsic.

Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call
to a function where canConstantFoldTo returns true, and one of the
arguments is a struct.

Updated from patch initially by Janusz Sobczak.

Differential Revision: https://reviews.llvm.org/D4276

llvm-svn: 346322
2018-11-07 15:24:12 +00:00
Calixte Denizet 8f07efc7c5 Fix unit tests after patch https://reviews.llvm.org/rL346313
Summary: Tests are broken so fix them.

Reviewers: marco-c

Reviewed By: marco-c

Subscribers: sylvestre.ledru, llvm-commits

Differential Revision: https://reviews.llvm.org/D54208

llvm-svn: 346318
2018-11-07 14:46:26 +00:00
Calixte Denizet c3bed1e8e6 [GCOV] Flush counters before to avoid counting the execution before fork twice and for exec** functions we must flush before the call
Summary:
This is replacement for patch in https://reviews.llvm.org/D49460.
When we fork, the counters are duplicate as they're and so the values are finally wrong when writing gcda for parent and child.
So just before to fork, we flush the counters and so the parent and the child have new counters set to zero.
For exec** functions, we need to flush before the call to have some data.

Reviewers: vsk, davidxl, marco-c

Reviewed By: marco-c

Subscribers: llvm-commits, sylvestre.ledru, marco-c

Differential Revision: https://reviews.llvm.org/D53593

llvm-svn: 346313
2018-11-07 13:49:17 +00:00
Teresa Johnson cb397461e1 [ThinLTO] Split NotEligibleToImport into legality and inlinability flags
Summary:
The NotEligibleToImport flag on the GlobalValueSummary was set if it
isn't legal to import (e.g. because it references unpromotable locals)
and when it can't be inlined (in which case importing is pointless).

I split out the inlinable piece into a separate flag on the
FunctionSummary (doesn't make sense for aliases or global variables),
because in the future we may want to import for reasons other than
inlining.

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D53345

llvm-svn: 346261
2018-11-06 19:41:35 +00:00
Eli Friedman e3a5fc6d80 Disable calls to *_finite and other glibc-only functions on Musl.
Non-GNU environments don't have __finite_*, so treat them as
unavailable.

Differential Revision: https://reviews.llvm.org/D51282

llvm-svn: 346250
2018-11-06 18:23:32 +00:00
Max Kazantsev 4855b74f8b [NFC] Turn collectTransitivePredecessors into a static function
llvm-svn: 346217
2018-11-06 09:07:03 +00:00
Sanjay Patel 1440107821 [InstSimplify] fold select (fcmp X, Y), X, Y
This is NFCI for InstCombine because it calls InstSimplify, 
so I left the tests for this transform there. As noted in
the code comment, we can allow this fold more often by using
FMF and/or value tracking.

llvm-svn: 346169
2018-11-05 21:51:39 +00:00
David Green ba9f245b0d [Inliner] Penalise inlining of calls with loops at Oz
We currently seem to underestimate the size of functions with loops in them,
both in terms of absolute code size and in the difficulties of dealing with
such code. (Calls, for example, can be tail merged to further reduce
codesize). At -Oz, we can then increase code size by inlining small loops
multiple times.

This attempts to penalise functions with loops at -Oz by adding a CallPenalty
for each top level loop in the function. It uses LI (and hence DT) to calculate
the number of loops. As we are dealing with minsize, the inline threshold is
small and functions at this point should be relatively small, making the
construction of these cheap.

Differential Revision: https://reviews.llvm.org/D52716

llvm-svn: 346134
2018-11-05 14:54:34 +00:00
Sanjay Patel e7c94ef1de [ValueTracking] determine sign of 0.0 from select when matching min/max FP
In PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475
..we may fail to recognize/simplify fabs() in some cases because we do not 
canonicalize fcmp with a -0.0 operand.

Adding that canonicalization can cause regressions on min/max FP tests, so 
that's this patch: for the purpose of determining whether something is min/max, 
let the value returned by the select determine how we treat a 0.0 operand in the fcmp.

This patch doesn't actually change the -0.0 to +0.0. It just changes the analysis, so 
we don't fail to recognize equivalent min/max patterns that only differ in the 
signbit of 0.0.

Differential Revision: https://reviews.llvm.org/D54001

llvm-svn: 346097
2018-11-04 14:28:48 +00:00
Sanjay Patel cac28b452e [ValueTracking] peek through 2-input shuffles in ComputeNumSignBits
This patch gives the IR ComputeNumSignBits the same functionality as the 
DAG version (the code is derived from the existing code).

This an extension of the single input shuffle analysis added with D53659.

Differential Revision: https://reviews.llvm.org/D53987

llvm-svn: 346071
2018-11-03 13:18:55 +00:00
Easwaran Raman c5e1506ec8 [ProfileSummary] Add options to override hot and cold count thresholds.
Summary:
The hot and cold count thresholds are derived from the summary, but for
debugging purposes it is convenient to provide the actual thresholds.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54040

llvm-svn: 346005
2018-11-02 17:39:31 +00:00