Commit Graph

22955 Commits

Author SHA1 Message Date
Djordje Todorovic 9b9e995819 [DebugInfo][EarlyCSE] Use salvageDebugInfoOrMarkUndef(); NFC
Use the newest API.

Differential Revision: https://reviews.llvm.org/D71061
2019-12-09 13:57:35 +01:00
David Green be7a107070 [ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the Arm cost model that code such as:
  %s = shl i32 %a, 3
  %r = and i32 %s, %b
can, under Arm or Thumb2, become:
  and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instructions that the
shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
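
For illustration, a minimal self-contained IR function of the kind this change targets (the function and value names below are invented, not taken from the patch or its tests):

```llvm
define i32 @shift_into_and(i32 %a, i32 %b) {
  ; The shift has a single user (the 'and'), so on Arm/Thumb2 the backend
  ; can fold it into the operand: and r0, r1, r2, lsl #3. With this patch
  ; the cost model can report the shift as essentially free.
  %s = shl i32 %a, 3
  %r = and i32 %s, %b
  ret i32 %r
}
```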

Differential Revision: https://reviews.llvm.org/D70966
2019-12-09 10:24:33 +00:00
Florian Hahn c491949694 [LV] Pick correct BB as insert point when fixing PHI for FORs.
Currently we fail to pick the right insertion point when
PreviousLastPart of a first-order-recurrence is a PHI node not in the
LoopVectorBody. This can happen when PreviousLastPart is produced in a
predicated block. In that case, we should pick the insertion point in
the BB the PHI is in.

Fixes PR44020.

Reviewers: hsaito, fhahn, Ayal, dorit

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D71071
2019-12-07 19:32:00 +00:00
Florian Hahn c25de56905 [SimplifyCFG] Account for N being null.
Fixes a crash, e.g.
  http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15119/
2019-12-07 17:23:42 +00:00
Rodrigo Caetano Rocha d714aa0dfd [SimplifyCFG] Handle AssumptionCache being null.
AssumptionCache can be null in SimplifyCFGOptions. However, FoldCondBranchOnPHI() was not properly handling that when passing a null AssumptionCache to simplifyCFG.

Patch by Rodrigo Caetano Rocha <rcor.cs@gmail.com>

Reviewers: fhahn, lebedev.ri, spatel

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D69963
2019-12-07 16:54:49 +00:00
Florian Hahn e60b36cf92 [VPlan] Rename VPlanHCFGTransforms to VPlanTransforms (NFC).
The file is intended to gather various VPlan transformations, not only
CFG-related transforms. In fact, the only transformation currently there
is not CFG-related.

Reviewers: Ayal, gilr, hsaito, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D70732
2019-12-07 08:56:35 +00:00
Teresa Johnson c8e36862f5 [WPD] Remove unused parameter (NFC)
Remove unused parameter.
2019-12-06 13:14:21 -08:00
Wenlei He 7b61ae68ec [AutoFDO] Inline replay for cold/small callees from sample profile loader
Summary:
Sample profile loader of AutoFDO tries to replay previous inlining using a context-sensitive profile. The replay only repeats inlining if the call site block is hot. As a result it punts inlining of small functions, some of which can be beneficial for size, and which will still be inlined by the CGSCC inliner later. The oscillation between the sample profile loader's inlining and regular CGSCC inlining causes unnecessary loss of context-sensitive profile. It doesn't have much impact on the inline decision itself, but it negatively affects post-inline profile quality, as the CGSCC inliner has to scale counts, which is not as accurate as the original context-sensitive profile, and a bad post-inline profile can misguide code layout.

This change adds regular inline cost calculation to the sample profile loader, so we can inline small functions upfront under the switch -sample-profile-inline-size. In addition, -sample-profile-cold-inline-threshold is added so we can tune the separate size threshold - currently the default is chosen to be the same as the regular inliner's cold call-site threshold.

Reviewers: wmi, davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70750
2019-12-06 11:44:45 -08:00
Sanjay Patel 43e2a901e1 Revert "[InstCombine] reduce code duplication; NFC"
This reverts commit db57396584.
At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.
2019-12-06 14:24:14 -05:00
Sanjay Patel b6d6f5470f Revert "[InstCombine] improve readability; NFC"
This reverts commit 7250ef3613.
At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.
2019-12-06 14:20:44 -05:00
Sanjay Patel 142a75a9b1 Revert "[InstCombine] reduce indentation; NFC"
This reverts commit 8bf8ef7116.
At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.
2019-12-06 14:19:02 -05:00
Sanjay Patel 8bf8ef7116 [InstCombine] reduce indentation; NFC 2019-12-06 13:26:45 -05:00
Sanjay Patel 7250ef3613 [InstCombine] improve readability; NFC
CreateIntCast returns the input if its type matches, so there's no need to duplicate that check.
2019-12-06 13:26:45 -05:00
Sanjay Patel db57396584 [InstCombine] reduce code duplication; NFC 2019-12-06 13:26:45 -05:00
Sanjay Patel 6bb62a9d97 [InstCombine] improve readability; NFC 2019-12-06 13:26:44 -05:00
Gil Rapaport 39ccc099c9 [LV] Record GEP widening decisions in recipe (NFCI)
InnerLoopVectorizer's code called during VPlan execution still relies on
the original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations' ability to modify def-use relations and still
have ILV generate the vector code.
This commit moves GEP operand queries controlling how GEPs are widened to a
dedicated recipe and extracts GEP widening code to its own ILV method taking
those recorded decisions as arguments. This reduces ingredient def-use usage by
ILV as a step towards full VPlan-based def-use relations.

Differential revision: https://reviews.llvm.org/D69067
2019-12-06 13:41:19 +02:00
Daniil Suchkov c4d8c6319f [LCSSA] Don't use VH callbacks to invalidate SCEV when creating LCSSA phis
In general ValueHandleBase::ValueIsRAUWd shouldn't be called when not
all uses of the value were actually replaced; however, currently
formLCSSAForInstructions calls it when it inserts LCSSA phis.

Calls to ValueHandleBase::ValueIsRAUWd were added to LCSSA specifically
to update/invalidate SCEV. In the best case these calls duplicate some
of the work already done by SE->forgetValue; but when the SCEV of the
value is a SCEVUnknown, SCEV replaces the underlying value of the
SCEVUnknown with the new value (i.e. acts as if the LCSSA phi actually
fully replaces the value it is created for), which leads to SCEV being
corrupted, because an LCSSA phi rarely dominates all uses of its inputs.

Fixes bug https://bugs.llvm.org/show_bug.cgi?id=44058.

Reviewers: fhahn, efriedma, reames, sanjoy.google

Reviewed By: fhahn

Subscribers: hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70593
2019-12-06 13:21:49 +07:00
Teresa Johnson 54a3c2a81e [ThinLTO] Add option to disable readonly/writeonly attribute propagation
Summary:
Add an option to allow the attribute propagation on the index to be
disabled, to allow a workaround for issues (such as that fixed by
D70977).

Also move the setting of the WithAttributePropagation flag on the index
into propagateAttributes(), and remove some old stale code that predated
this flag and cleared the maybe-read/write-only bits when we need to
disable the propagation (previously only when importing is disabled, now
also when the new option disables it).

Reviewers: evgeny777, steven_wu

Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70984
2019-12-05 16:33:54 -08:00
Wenlei He 532196d811 [AutoFDO] Top-down Inlining for specialization with context-sensitive profile
Summary:
AutoFDO's sample profile loader processes functions in arbitrary source code order, so if I change the order of two functions in source code, the inline decision can change. This also prevented the use of context-sensitive profile to do specialization while inlining. This commit enforces SCC top-down order for the sample profile loader. With this change, we can now do specialization, as illustrated by the added test case:

Say we have the call paths A->B->C and D->B->C: we want to inline C into B when the root inliner is B, but not when the root inliner is A or D; this is not possible without enforcing top-down order. E.g. once C is inlined into B, A and D can only choose to inline (B->C) as a whole or nothing, but what we want is to inline only B into A and D, not its recursive callee C. If we process functions in top-down order, this is no longer a problem, which is what this commit is doing.

This change is guarded with a new switch "-sample-profile-top-down-load" for tuning, and it depends on D70653. Eventually, top-down can be the default order for sample profile loader.

Reviewers: wmi, davidxl

Subscribers: hiraditya, llvm-commits, tejohnson

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70655
2019-12-05 16:07:01 -08:00
Wenlei He e503fd85d3 [AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function
Summary:
When the sample profile loader decides not to inline a previously inlined call-site, we adjust the profile of the outlined function simply by scaling up its profile counts by the call-site count. This means the context-sensitive profile of that inlined instance will be thrown away. This commit tries to keep the context-sensitive profile for such cases:

 - Instead of scaling the outlined function's profile, we now properly merge the FunctionSamples of the inlined instance into the outlined function, including all recursively inlined profiles.
 - Instead of adjusting the profile for a negative inline decision at the end of the sample profile loader pass, we do the profile merge right after processing each function. This change, paired with top-down ordering of annotation/inline-replay (a separate diff), will make sure we recursively merge profiles back before the profile is used for annotation and inline replay.

A new switch -sample-profile-merge-inlinee is added to enable the new profile merge for tuning. It should be the default behavior eventually.

Reviewers: wmi, davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70653
2019-12-05 15:57:55 -08:00
Florian Hahn 19071173fc Revert "[DSE] Fix for a dangling point bug in DeadStoreElimination."
The commit causes a failure:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/20911

This reverts commit 1847fd9d85.
2019-12-05 19:29:21 +00:00
Evgenii Stepanov 6f89cbc429 LowerDbgDeclare: look through bitcasts.
Summary:
Emit a value debug intrinsic (with OP_deref) when an alloca address is
passed to a function call after going through a bitcast.

This generates an FP or SP-relative location for the local variable in
the following case:
  int x;
  use((void *)&x);

Reviewers: aprantl, vsk, pcc

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70752
2019-12-05 11:19:07 -08:00
Bob Haarman 055779a9ac Revert "[InstCombine] keep assumption before sinking calls"
Summary:
This reverts commit c3b06d0c39.

Reason for revert: Caused miscompiles when inserting assume for undef.

Also adds a test to prevent similar breakage in future.

Fixes PR44154.

Reviewers: rnk, jdoerfert, efriedma, xbolva00

Reviewed By: rnk

Subscribers: thakis, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70933
2019-12-05 10:39:34 -08:00
Roman Lebedev 796fa662f1
[InstCombine] Invert `add A, sext(B) --> sub A, zext(B)` canonicalization (to `sub A, zext B -> add A, sext B`)
Summary:
D68408 proposes to greatly improve our negation sinking abilities.
But under the current canonicalization, we produce `sub A, zext(B)`,
which we will consider non-canonical and try to sink that negation,
undoing the existing canonicalization.
So unless we explicitly stop producing the previous canonicalization,
we will have two conflicting folds and will end up endlessly looping.

This inverts the canonicalization and adds back the obvious fold
that we'd otherwise miss:
* `sub [nsw] Op0, sext/zext (bool Y) -> add [nsw] Op0, zext/sext (bool Y)`
  https://rise4fun.com/Alive/xx4
* `sext(bool) + C -> bool ? C - 1 : C`
  https://rise4fun.com/Alive/fBl

It is obvious that `@ossfuzz_9880()` / `@lshr_out_of_range()`/`@ashr_out_of_range()`
(oss-fuzz 4871) are no longer folded as much, though those aren't really worrying.
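
As a sketch of the new canonical direction (names invented for illustration), the `sub` form below is now rewritten into the `add`-of-`sext` form, rather than the other way around:

```llvm
define i32 @canonical_direction(i32 %a, i1 %b) {
  ; Previously canonical: sub %a, (zext i1 %b).
  ; After this patch it becomes: %s = sext i1 %b to i32 ; add i32 %a, %s
  %z = zext i1 %b to i32
  %r = sub i32 %a, %z
  ret i32 %r
}
```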

Reviewers: spatel, efriedma, t.p.northover, hfinkel

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71064
2019-12-05 21:21:30 +03:00
Ankit 1847fd9d85 [DSE] Fix for a dangling point bug in DeadStoreElimination.
The patch makes sure that the LastThrowing pointer does not point to any instruction deleted by a call to DeleteDeadInstruction.

While iterating through the instructions, the pass maintains a pointer to the last throwing instruction. A call to DeleteDeadInstruction deletes a dead store and other instructions feeding the original dead instruction which also become dead. The instruction pointed to by the LastThrowing pointer could also be deleted by the call to DeleteDeadInstruction, and thus it becomes a dangling pointer. Because of this, we see an error in the next iteration.

In the patch, we maintain a list of throwing instructions encountered previously and use the last non-deleted throwing instruction from the container.

Patch by Ankit <quic_aankit@quicinc.com>

Reviewers: fhahn, bcahoon, efriedma

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D65326
2019-12-05 17:53:58 +00:00
Sanjay Patel 3c6b5d3674 [InstCombine] narrow select with FP casts
Select doesn't change values, so a truncate of an extended operand cancels out.
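
A minimal sketch of the pattern, assuming the usual fpext/select/fptrunc chain (names invented):

```llvm
define float @narrow_select(i1 %c, float %x, float %y) {
  ; The extends and the truncate cancel around the select, so this
  ; should reduce to: select i1 %c, float %x, float %y
  %ex = fpext float %x to double
  %ey = fpext float %y to double
  %s = select i1 %c, double %ex, double %ey
  %t = fptrunc double %s to float
  ret float %t
}
```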
2019-12-05 11:12:44 -05:00
Sanjay Patel 51e420c27e [InstCombine] add FMF guard to builder in fptrunc transform; NFC
This makes no difference currently because we don't apply FMF
to FP casts, but that may change.

This could also be a place to add a fold for select with fptrunc,
so it will make that patch easier/smaller.
2019-12-05 10:55:07 -05:00
Roman Lebedev 09311459e3
[InstCombine] Extend `0 - (X sdiv C) -> (X sdiv -C)` fold to non-splat vectors
Split off from https://reviews.llvm.org/D68408
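
For reference, a non-splat example of the extended fold (constants made up for illustration):

```llvm
define <2 x i32> @neg_of_sdiv(<2 x i32> %x) {
  ; 0 - (%x sdiv <5, 7>) --> %x sdiv <-5, -7>
  %d = sdiv <2 x i32> %x, <i32 5, i32 7>
  %r = sub <2 x i32> zeroinitializer, %d
  ret <2 x i32> %r
}
```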
2019-12-05 15:48:29 +03:00
Teresa Johnson e420c0c78e [ThinLTO] Fix importing of writeonly variables in distributed ThinLTO
Summary:
D69561/dde5893 enabled importing of readonly variables with references;
however, it introduced a bug relating to importing/internalization of
writeonly variables with references.

A fix for this was added in D70006/7f92d66. But this didn't work in
distributed ThinLTO mode. The reason is that the fix (importing the
writeonly var with a zeroinitializer) was only applied when there were
references on the writeonly var summary. In distributed ThinLTO mode,
where we only have a small slice of the index, we will not have the
references on the importing side if we are not importing those
referenced values. Rather than changing this handshaking (which will
require a lot of other changes, since that's how we know what to import
in the distributed backend clang invocation), we can simply always give
the writeonly variable a zero initializer.

Reviewers: evgeny777, steven_wu

Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70977
2019-12-04 14:59:27 -08:00
Tozer 8b0780f795 Revert "[DebugInfo] Recover debug intrinsics when killing duplicated/empty basic blocks"
This reverts commit 72ce759928.

Reverted due to build failure.
2019-12-04 18:47:08 +00:00
Vedant Kumar f208b70fbc Revert "[Coverage] Revise format to reduce binary size"
This reverts commit e18531595b.

On Windows, there is an error:

http://lab.llvm.org:8011/builders/sanitizer-windows/builds/54963/steps/stage%201%20check/logs/stdio

error: C:\b\slave\sanitizer-windows\build\stage1\projects\compiler-rt\test\profile\Profile-x86_64\Output\instrprof-merging.cpp.tmp.v1.o: Failed to load coverage: Malformed coverage data
2019-12-04 10:35:14 -08:00
Vedant Kumar e18531595b [Coverage] Revise format to reduce binary size
Revise the coverage mapping format to reduce binary size by:

1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.

This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).

Rationale for changes to the format:

- With the current format, most coverage function records are discarded.
  E.g., more than 97% of the records in llc are *duplicate* placeholders
  for functions visible-but-not-used in TUs. Placeholders *are* used to
  show under-covered functions, but duplicate placeholders waste space.

- We reached general consensus about giving (1) a try at the 2017 code
  coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
  duplicates is simpler than alternatives like teaching build systems
  about a coverage-aware database/module/etc on the side.

- Revising the format is expensive due to the backwards compatibility
  requirement, so we might as well compress filenames while we're at it.
  This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).

See CoverageMappingFormat.rst for the details on what exactly has
changed.

Fixes PR34533 [2], hopefully.

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533

Differential Revision: https://reviews.llvm.org/D69471
2019-12-04 10:10:55 -08:00
Florian Hahn e8a5c17211 [LoopInterchange] Improve inner exit loop safety checks.
The PHI node checks for inner loop exits are too permissive currently.
As indicated by an existing comment, we should only allow LCSSA PHI
nodes that are part of reductions or are only used outside of the loop
nest. We ensure this by checking the users of the LCSSA PHIs.
Specifically, it is not safe to use an exiting value from the inner
loop in the latch of the outer loop.

It also moves the inner loop exit check before the outer loop exit
check.

Fixes PR43473.

Reviewers: efriedma, mcrosier

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D68144
2019-12-04 17:46:01 +00:00
Francesco Petrogalli a249551bb2 [llvm][Transform] Remove unused variable. [NFCI]
The unused variable prevented compilation when building with -Werror=unused-variable.
2019-12-04 17:40:30 +00:00
Hiroshi Yamauchi 62d429972e [PGO][PGSO] Distinguish queries from unit tests and explicitly enable for the existing IR passes only. NFC.
Summary:
This is one more prep step needed before the codegen pass instrumentation
code can go in.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70988
2019-12-04 09:35:50 -08:00
stozer 72ce759928 [DebugInfo] Recover debug intrinsics when killing duplicated/empty basic blocks
When basic blocks are killed, either due to being empty or to being an if.then
or if.else block whose complement contains identical instructions, some of the
debug intrinsics in that block are lost. This patch sinks those intrinsics
into the single successor block, setting them to undef if necessary to
prevent debug info from falling out of date.

Differential Revision: https://reviews.llvm.org/D70318
2019-12-04 16:01:49 +00:00
Florian Hahn 4a9cde5a79 [SimpleLoopUnswitch] Invalidate the topmost loop with ExitBB as exiting.
SCEV caches the exiting blocks when computing exit counts. In
SimpleLoopUnswitch, we split the exit block of the loop to unswitch.

Currently we only invalidate the loop containing that exit block, but if
that block is the exiting block for a parent loop, we have stale cache
entries. We have to invalidate the top-most loop that contains the exit
block as exiting block. We might also be able to skip invalidating the
loop containing the exit block, if the exit block is not an exiting
block of that loop.

There are also two more places in SimpleLoopUnswitch that use a similar
problematic approach to get the loop to invalidate. If this patch makes
sense, I will also update those places to a similar approach (they deal
with multiple exit blocks, so we cannot directly re-use
getTopMostExitingLoop).

Fixes PR43972.

Reviewers: skatkov, reames, asbirlea, chandlerc

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D70786
2019-12-04 11:32:09 +00:00
Ehud Katz 2b6b8cb10c [APFloat] Prevent construction of APFloat with Semantics and FP value
Constructor invocations such as `APFloat(APFloat::IEEEdouble(), 0.0)`
may seem like they accept an FP (floating point) value, but the overload
they reach is actually the `integerPart` one, not a `float` or `double`
overload (which only exists when `fltSemantics` isn't passed).

This may lead to loss of data in the conversion from `float`
or `double` to `integerPart`.

To prevent future mistakes, a new constructor overload that accepts
any FP value has been added and marked `delete`, to prevent its usage.

Fixes PR34095.

Differential Revision: https://reviews.llvm.org/D70425
2019-12-04 12:02:04 +02:00
Craig Topper 5ebbabc1af [InstCombine] Revert aafde063aa and 6749dc3446 related to bitcast handling of x86_mmx
This reverts these two commits
[InstCombine] Turn (extractelement <1 x i64/double> (bitcast (x86_mmx))) into a single bitcast from x86_mmx to i64/double.
[InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement

We're seeing at least one internal test failure related to a
bitcast that previously appeared before an inline assembly block
containing emms now being placed after it. This leads to the mmx
state not being empty after the emms. IR has no way to
make any specific guarantees about this. Reverting these patches
to get back to the previous behavior, which at least worked for this
test.
2019-12-03 14:02:22 -08:00
Ayal Zaks 6ed9cef25f [LV] Scalar with predication must not be uniform
Fix PR40816: avoid considering scalar-with-predication instructions as also
uniform-after-vectorization.

Instructions identified as "scalar with predication" will be "vectorized" using
a replicating region. If such instructions are also optimized as "uniform after
vectorization", namely when only the first of VF lanes is used, such a
replicating region becomes erroneous - only the first instance of the region can
and should be formed. Fix such cases by not considering such instructions as
"uniform after vectorization".

Differential Revision: https://reviews.llvm.org/D70298
2019-12-03 19:50:24 +02:00
Anton Afanasyev a315519c17 [SLP] Enhance SLPVectorizer to vectorize different combinations of aggregates
Summary:
Make SLPVectorizer recognize homogeneous aggregates like
`{<2 x float>, <2 x float>}`, `{{float, float}, {float, float}}`,
`[2 x {float, float}]` and so on.
It's a follow-up of https://reviews.llvm.org/D70068.
Merged `findBuildVector()` and `findBuildAggregate()` into
one `findBuildAggregate()` function, making it recursive
so that it recognizes multidimensional aggregates. Aggregates are required
to be homogeneous.
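
A small illustrative IR sketch (names and shapes invented) of the kind of homogeneous aggregate build the recursive findBuildAggregate() is meant to recognize:

```llvm
define { <2 x float>, <2 x float> } @build_pair(float %a, float %b, float %c, float %d) {
  ; Two insertelement chains feeding an insertvalue chain form one
  ; homogeneous aggregate build sequence that SLP can now treat as a whole.
  %v0 = insertelement <2 x float> undef, float %a, i32 0
  %v1 = insertelement <2 x float> %v0, float %b, i32 1
  %w0 = insertelement <2 x float> undef, float %c, i32 0
  %w1 = insertelement <2 x float> %w0, float %d, i32 1
  %r0 = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> %v1, 0
  %r1 = insertvalue { <2 x float>, <2 x float> } %r0, <2 x float> %w1, 1
  ret { <2 x float>, <2 x float> } %r1
}
```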

Reviewers: RKSimon, ABataev, dtemirbulatov, spatel, vporpo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70587
2019-12-03 19:29:27 +03:00
Florian Hahn e9c68422de [VPlan] Add dump function to VPlan class.
This adds a dump() function to VPlan, which uses the existing
operator<<.

This method provides a convenient way to dump a VPlan while debugging,
e.g. from lldb.

Reviewers: hsaito, Ayal, gilr, rengolin

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D70920
2019-12-03 11:59:10 +00:00
Johannes Altmanninger 09667bc192 [asan] Remove debug locations from alloca prologue instrumentation
Summary:
This fixes https://llvm.org/PR26673
"Wrong debugging information with -fsanitize=address"
where asan instrumentation causes the prologue end to be computed
incorrectly: findPrologueEndLoc looks for the first instruction
with a debug location to determine the prologue end. Since the asan
instrumentation instructions had debug locations, that prologue end was
at some instruction where the stack frame is still being set up.

There seems to be no good reason for extra debug locations for the
asan instrumentations that set up the frame; they don't have a natural
source location.  In the debugger they are simply located at the start
of the function.

For certain other instrumentations like -fsanitize-coverage=trace-pc-guard
the same problem persists - that might be more work to fix, since it
looks like they rely on locations of the tracee functions.

This partly reverts aaf4bb2394
"[asan] Set debug location in ASan function prologue"
whose motivation was to give debug location info to the coverage callback.
Its test only ensures that the call to @__sanitizer_cov_trace_pc_guard is
given the correct source location; as the debug location is still set in
ModuleSanitizerCoverage::InjectCoverageAtBlock, the test does not break.
So -fsanitize-coverage is hopefully unaffected - I don't think it should
rely on the debug locations of asan-generated allocas.

Related revision: 3c6c14d14b
"ASAN: Provide reliable debug info for local variables at -O0."

Below is how the X86 assembly version of the added test case changes.
We get rid of some .loc lines and put prologue_end where the user code starts.

```diff
--- 2.master.s	2019-12-02 12:32:38.982959053 +0100
+++ 2.patch.s	2019-12-02 12:32:41.106246674 +0100
@@ -45,8 +45,6 @@
 	.cfi_offset %rbx, -24
 	xorl	%eax, %eax
 	movl	%eax, %ecx
- .Ltmp2:
- 	.loc	1 3 0 prologue_end      # 2.c:3:0
 	cmpl	$0, __asan_option_detect_stack_use_after_return
 	movl	%edi, 92(%rbx)          # 4-byte Spill
 	movq	%rsi, 80(%rbx)          # 8-byte Spill
@@ -57,9 +55,7 @@
 	callq	__asan_stack_malloc_0
 	movq	%rax, 72(%rbx)          # 8-byte Spill
 .LBB1_2:
- 	.loc	1 0 0 is_stmt 0         # 2.c:0:0
 	movq	72(%rbx), %rax          # 8-byte Reload
- 	.loc	1 3 0                   # 2.c:3:0
 	cmpq	$0, %rax
 	movq	%rax, %rcx
 	movq	%rax, 64(%rbx)          # 8-byte Spill
@@ -72,9 +68,7 @@
 	movq	%rax, %rsp
 	movq	%rax, 56(%rbx)          # 8-byte Spill
 .LBB1_4:
- 	.loc	1 0 0                   # 2.c:0:0
 	movq	56(%rbx), %rax          # 8-byte Reload
- 	.loc	1 3 0                   # 2.c:3:0
 	movq	%rax, 120(%rbx)
 	movq	%rax, %rcx
 	addq	$32, %rcx
@@ -99,7 +93,6 @@
 	movb	%r8b, 31(%rbx)          # 1-byte Spill
 	je	.LBB1_7
 # %bb.5:
- 	.loc	1 0 0                   # 2.c:0:0
 	movq	40(%rbx), %rax          # 8-byte Reload
 	andq	$7, %rax
 	addq	$3, %rax
@@ -118,7 +111,8 @@
 	movl	%ecx, (%rax)
 	movq	80(%rbx), %rdx          # 8-byte Reload
 	movq	%rdx, 128(%rbx)
-	.loc	1 4 3 is_stmt 1         # 2.c:4:3
+.Ltmp2:
+	.loc	1 4 3 prologue_end      # 2.c:4:3
 	movq	%rax, %rdi
 	callq	f
 	movq	48(%rbx), %rax          # 8-byte Reload
```

Reviewers: eugenis, aprantl

Reviewed By: eugenis

Subscribers: ormris, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70894
2019-12-03 11:24:17 +01:00
Bill Wendling 87f146767e Place the "cold" code piece into the same section as the original function
Summary:
This cropped up in the Linux kernel where cold code was placed in an
incompatible section.

Reviewers: compnerd, vsk, tejohnson

Reviewed By: vsk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70925
2019-12-02 15:24:59 -08:00
Hiroshi Yamauchi 8cdfdfeee6 [PGO][PGSO] Add an optional query type parameter to shouldOptimizeForSize.
Summary:
In case of a need to distinguish different query sites for gradual commit or
debugging of PGSO. NFC.

Reviewers: davidxl

Subscribers: hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70510
2019-12-02 13:54:13 -08:00
Florian Hahn fe459ce65a [VPlan] Move graph traits (NFC).
By defining the graph traits right after the VPBlockBase definitions, we
can make use of them earlier in the file.

Reviewers: hsaito, Ayal, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D70733
2019-12-02 18:23:11 +00:00
Sanjay Patel af4e59949c [InstCombine] fix undef propagation for vector urem transform (PR44186)
As described here:
https://bugs.llvm.org/show_bug.cgi?id=44186

The match() code safely allows undef values, but we can't safely
propagate a vector constant that contains an undef to the new
compare instruction.
2019-12-02 12:17:38 -05:00
Simon Tatham 01aefae4a1 [ARM,MVE] Add an InstCombine rule permitting VPNOT.
Summary:
If a user writing C code using the ACLE MVE intrinsics generates a
predicate and then complements it, then the resulting IR will use the
`pred_v2i` IR intrinsic to turn some `<n x i1>` vector into a 16-bit
integer; complement that integer; and convert back. This will generate
machine code that moves the predicate out of the `P0` register,
complements it in an integer GPR, and moves it back in again.

This InstCombine rule replaces `i2v(~v2i(x))` with a direct complement
of the original predicate vector, which we can already instruction-
select as the VPNOT instruction which complements P0 in place.
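
A rough IR sketch of the matched pattern; the intrinsic names below are recalled from memory and should be treated as an assumption, not copied verbatim:

```llvm
; Assumed names of the MVE predicate conversion intrinsics.
declare i32 @llvm.arm.mve.pred.v2i.v4i1(<4 x i1>)
declare <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32)

define <4 x i1> @complement_pred(<4 x i1> %p) {
  ; i2v(~v2i(p)): round-trip through the 16-bit predicate integer just to
  ; complement it. The new rule folds this to a direct complement of %p,
  ; which instruction-selects to VPNOT.
  %int = call i32 @llvm.arm.mve.pred.v2i.v4i1(<4 x i1> %p)
  %not = xor i32 %int, 65535
  %res = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 %not)
  ret <4 x i1> %res
}
```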

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70484
2019-12-02 16:20:30 +00:00
Roman Lebedev 0f22e783a0
[InstCombine] Revert rL341831: relax one-use check in foldICmpAddConstant() (PR44100)
rL341831 moved the one-use check higher up, restricting a few folds
that produced a single instruction from two instructions to the case
where the inner instruction would go away.

Original commit message:
> InstCombine: move hasOneUse check to the top of foldICmpAddConstant
>
> There were two combines not covered by the check before now,
> neither of which actually differed from normal in the benefit analysis.
>
> The most recent seems to be because it was just added at the top of the
> function (naturally). The older is from way back in 2008 (r46687)
> when we just didn't put those checks in so routinely, and has been
> diligently maintained since.

From the commit message alone, there doesn't seem to be a
deeper motivation, a deeper problem it was trying to solve,
other than 'fixing the wrong one-use check'.

As I have briefly discussed on IRC with Tim, the original motivation
can no longer be recovered, too much time has passed.

However I believe that the original fold was doing the right thing:
we should be performing such a transformation even if the inner `add`
will not go away - that will still unchain the comparison from `add`,
so it will no longer need to wait for `add` to compute.

Doing so doesn't seem to break any particular idioms,
at least as far as I can see.

References https://bugs.llvm.org/show_bug.cgi?id=44100
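
A sketch (names invented) of the shape this re-enables: the compare is rewritten even though the inner `add` has another use and therefore does not go away:

```llvm
declare void @use(i32)

define i1 @cmp_of_add(i32 %x) {
  ; %a stays alive because of the extra use, but the compare no longer has
  ; to wait for it: icmp eq i32 %a, 20 can become icmp eq i32 %x, 10.
  %a = add i32 %x, 10
  call void @use(i32 %a)
  %c = icmp eq i32 %a, 20
  ret i1 %c
}
```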
2019-12-02 18:06:15 +03:00
Sanjay Patel af0babc90a [InstCombine] fold copysign with constant sign argument to (fneg+)fabs
If the sign of the sign argument is known (this could be extended to use ValueTracking),
then we can use fneg+fabs to clear/set the sign bit of the magnitude argument.
http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic

This transform is already done in DAGCombiner, but we can do it sooner in IR as
suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153

We have effectively no analysis for copysign in IR, so we are taking the unusual step
of increasing the number of IR instructions for the negative constant case.
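
A minimal sketch of the negative-constant case, assuming the llvm.copysign intrinsic with a constant sign operand (function name invented):

```llvm
declare double @llvm.copysign.f64(double, double)

define double @force_negative(double %mag) {
  ; The sign operand is the constant -1.0, so its sign bit is known to be
  ; set. The fold rewrites this as fneg(fabs(%mag)): clear the sign bit of
  ; the magnitude, then flip it, trading one call for two IR instructions.
  %r = call double @llvm.copysign.f64(double %mag, double -1.0)
  ret double %r
}
```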

Differential Revision: https://reviews.llvm.org/D70792
2019-12-02 09:23:12 -05:00