Summary:
This factors cost and reporting out of the inlining workflow, thus
making it easier to reuse when driving inlining from the upcoming
InliningAdvisor.
Depends on: D79215
Reviewers: davidxl, echristo
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79275
Since every AbstractAttribute so far, and for the foreseeable future,
corresponds to a single IRPosition we can simplify the class structure.
We already did this for IRAttribute but there is no reason to stop
there.
When the shufflevector mask operand was converted into special
instruction data, the FunctionComparator was not updated to
account for this. As such, MergeFuncs will happily merge
shufflevectors with different masks.
This fixes https://bugs.llvm.org/show_bug.cgi?id=45773.
Differential Revision: https://reviews.llvm.org/D79261
Summary:
shouldInline makes a decision based on the InlineCost of a call site, as
well as an evaluation on whether the site should be deferred. This means
it's possible for the decision to be not to inline, even for an
InlineCost that would otherwise allow it.
Both uses of shouldInline performed the exact same logic after calling
it. In addition, the decision on whether to inline or not was
communicated through two values of the Option<InlineCost> return value:
None, or an InlineCost evaluating to false.
Simplified by:
- encapsulating the decision in the return object. The bool it evaluates
to communicates unambiguously the decision. The InlineCost is also
available.
- encapsulated the common post-shouldInline code into shouldInline.
Reviewers: davidxl, echristo, eraman
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79215
Summary:
Make foldVectorBinop return null if the instruction type is a scalable
vector. It is unclear what, if any, of this function works with scalable
vectors.
Identified by test LLVM.Transforms/InstCombine::nsw.ll
Reviewers: efriedma, david-arm, fpetrogalli, spatel
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79196
This fixes potential reference invalidations, when no lattice value is
assigned for CopyOf. As the state of CopyOf won't change while in
handleCallResult, we can get a copy once and use that.
Should fix PR45749.
In D74183 clang started emitting alignment for sret parameters
unconditionally. This caused a 1.5% compile-time regression on
tramp3d-v4. The reason is that we now generate many instance of IR like
%ptrint = ptrtoint %class.GuardLayers* %guards_m to i64
%maskedptr = and i64 %ptrint, 3
%maskcond = icmp eq i64 %maskedptr, 0
tail call void @llvm.assume(i1 %maskcond)
to preserve the alignment information during inlining. Based on IR
analysis, these assumptions also regress optimization. The attached
phase ordering test case illustrates two issues: One are instruction
count based optimization heuristics, which are affected by the four
additional instructions of the assumption. The other is blocking of
SROA due to ptrtoint casts (PR45763).
We already encountered the same problem in Rust, where we (unlike
Clang) generally prefer to emit alignment information absolutely
everywhere it is available. We were only able to do this after
hardcoding -preserve-alignment-assumptions-during-inlining=false,
because we were seeing significant optimization and compile-time
regressions otherwise.
This patch disables -preserve-alignment-assumptions-during-inlining
by default, because we should not be punishing people for adding
more alignment annotations.
Once the assume bundle work shakes out and we can represent (and use)
alignment assumptions using assume bundles, it should be possible to
re-enable this with reduced overhead.
Differential Revision: https://reviews.llvm.org/D76886
Summary:
Refactor getInterestingMemoryOperands() so that information about the
pointer operand is returned through an array of structures instead of
passing each piece of information separately by-value.
This is in preparation for returning information about multiple pointer
operands from a single instruction.
A side effect is that, instead of repeatedly generating the same
information through isInterestingMemoryAccess(), it is now simply collected
once and then passed around; that's probably more efficient.
HWAddressSanitizer has a bunch of copypasted code from AddressSanitizer,
so these changes have to be duplicated.
This is patch 3/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments
[glider: renamed llvm::InterestingMemoryOperand::Type to OpType to fix
GCC compilation]
Reviewers: kcc, glider
Reviewed By: glider
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77618
Summary:
In both AddressSanitizer and HWAddressSanitizer, we first collect
instructions whose operands should be instrumented and memory intrinsics,
then instrument them. Both during collection and when inserting
instrumentation, they are handled separately.
Collect them separately and instrument them separately. This is a bit
more straightforward, and prepares for collecting operands instead of
instructions in a future patch.
This is patch 2/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments
Reviewers: kcc, glider
Reviewed By: glider
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77617
Summary:
A following commit will split the loop over ToInstrument into two.
To avoid having to duplicate the condition for suppressing instrumentation
sites based on ClDebug{Min,Max}, refactor it out into a new function.
While we're at it, we can also avoid the indirection through
NumInstrumented for setting FunctionModified.
This is patch 1/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments
Reviewers: kcc, glider
Reviewed By: glider
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77616
I couldn't make arc land the changes properly, for some reason they all got
squashed. Reverting them now to land cleanly.
Summary: This reverts commit cfb5f89b62.
Reviewers: kcc, thejh
Subscribers:
Summary:
A following commit will split the loop over ToInstrument into two.
To avoid having to duplicate the condition for suppressing instrumentation
sites based on ClDebug{Min,Max}, refactor it out into a new function.
While we're at it, we can also avoid the indirection through
NumInstrumented for setting FunctionModified.
This is patch 1/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments
Reviewers: kcc, glider
Reviewed By: glider
Subscribers: jfb, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77616
Summary: There is no need to create BPI explicitly. It should be requested through AM in a normal way.
Reviewers: skatkov
Reviewed By: skatkov
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79080
Summary: Currenlty BPI unconditionally creates post dominator tree each time. While this is not incorrect we can save compile time by reusing existing post dominator tree (when it's valid) provided by analysis manager.
Reviewers: skatkov, taewookoh, yrouban
Reviewed By: skatkov
Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78987
Summary:
Renamed 'CS' to 'CB', and, in one case, to a more specific name to avoid
naming collision with outer scope (a maintainability/readability reason,
not correctness)
Also updated comments.
Reviewers: davidxl, dblaikie, jdoerfert
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79101
The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the available of legal INSERT_VECTOR_ELT ops is more limited.
This patch does 2 things:
1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of a ISD::BUILD_VECTOR pattern.
2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.
This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing.
A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.
Reviewed By: @craig.topper
Differential Revision: https://reviews.llvm.org/D78216
The crash that caused the original revert has been fixed in
a3c964a278. I also added a reduced version of the crash reproducer.
This reverts the revert commit 2107af9ccf.
This allows forward declarations of PointerCheck, which in turn reduce
the number of times LoopAccessAnalysis needs to be included.
Ultimately this helps with moving runtime check generation to
Transforms/Utils/LoopUtils.h, without having to include it there.
Reviewers: anemet, Ayal
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D78458
Summary:
Refactored the parameter and return type where they are too generally
typed as Instruction.
Reviewers: dblaikie, wmi, craig.topper
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79027
In `InstCombiner::visitAdd()`, we have
```
// A+B --> A|B iff A and B have no bits set in common.
if (haveNoCommonBitsSet(LHS, RHS, DL, &AC, &I, &DT))
return BinaryOperator::CreateOr(LHS, RHS);
```
so we should handle such `or`'s here, too.
There are several different types of cost that TTI tries to provide
explicit information for: throughput, latency, code size along with
a vague 'intersection of code-size cost and execution cost'.
The vectorizer is a keen user of RecipThroughput and there's at least
'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
help with this cost. The latency cost has a single use and a single
implementation. The intersection cost appears to cover most of the
rest of the API.
getUserCost is explicitly called from within TTI when the user has
been explicit in wanting the code size (also only one use) as well
as a few passes which are concerned with a mixture of size and/or
a relative cost. In many cases these costs are closely related, such
as when multiple instructions are required, but one evident diverging
cost in this function is for div/rem.
This patch adds an argument so that the cost required is explicit,
so that we can make the important distinction when necessary.
Differential Revision: https://reviews.llvm.org/D78635
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Add llvm.call.preallocated.{setup,arg} instrinsics.
Add "preallocated" operand bundle which takes a token produced by llvm.call.preallocated.setup.
Add "preallocated" parameter attribute, which is like byval but without the copy.
Verifier changes for these IR constructs.
See https://github.com/rnk/llvm-project/blob/call-setup-docs/llvm/docs/CallSetup.md
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74651
We may want to identify sequences that are not
reductions, but still qualify as load-combines
in the back-end, so make most of the body a
helper function.
When folding tail, branch taken count is computed during initial VPlan execution
and recorded to be used by the compare computing the loop's mask. This recording
should directly set the State, instead of reusing Value2VPValue mapping which
serves original Values present prior to vectorization.
The branch taken count may be a constant Value, which may be used elsewhere in
the loop; trying to employ Value2VPValue for both leads to the issue reported in
https://reviews.llvm.org/D76992#inline-721028
Differential Revision: https://reviews.llvm.org/D78847
Integer ranges can be used for loaded/stored values. Note that widening
can be disabled for loads/stores, as we only rely on instructions that
cause continued increases to ranges to be widened (like binary
operators).
Reviewers: efriedma, mssimpso, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78433
Summary:
Missing error mangling is noticed in
https://bugs.llvm.org/show_bug.cgi?id=45636
where inconsistent profiling input caused
llvm/lld to crash as:
```
Program aborted due to an unhandled Error:
linking module flags 'ProfileSummary':
IDs have conflicting values in 'Mutex_posix.o' and 'nsBrowserApp.o'
```
The change does not change the fact that LLVM crashes
but changes error output to say what was incorrect:
```
LLVM ERROR: Function Import: link error:
linking module flags 'ProfileSummary':
IDs have conflicting values in 'Mutex_posix.o' and 'nsBrowserApp.o'
```
Actual crash has yet to be fixed.
Reviewers: lattner
Reviewed By: lattner
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78676
(X | MaskC) == C --> (X & ~MaskC) == C ^ MaskC
(X | MaskC) != C --> (X & ~MaskC) != C ^ MaskC
We have more analyis for 'and' patterns and already lean this way
in the existing code, so this should be neutral or better in IR.
If this does not do as well in codegen, the problem already exists
and we should fix that based on target costs/heuristics.
http://volta.cs.utah.edu:8080/z/oP3ecL
define void @src(i8 %x, i8 %OrC, i8 %C, i1* %p0, i1* %p1) {
%or = or i8 %x, %OrC
%eq = icmp eq i8 %or, %C
store i1 %eq, i1* %p0
%ne = icmp ne i8 %or, %C
store i1 %ne, i1* %p1
ret void
}
define void @tgt(i8 %x, i8 %OrC, i8 %C, i1* %p0, i1* %p1) {
%NotOrC = xor i8 %OrC, -1
%a = and i8 %x, %NotOrC
%NewC = xor i8 %C, %OrC
%eq = icmp eq i8 %a, %NewC
store i1 %eq, i1* %p0
%ne = icmp ne i8 %a, %NewC
store i1 %ne, i1* %p1
ret void
}
Using the existing NumFastStores statistic can be misleading when
comparing the impact of DSE patches.
For example, consider the case where a store gets removed from a
function before it is inlined into another function. A less
powerful DSE might only remove the store from functions it has
been inlined into, which will result in more stores being removed, but
no difference in the actual number of stores after DSE.
The new stat provides the absolute number of stores surviving after
DSE.
Reviewers: dmgreen, bryant, asbirlea, jfb
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D78830
Summary:
refactor assume bulider for the next patch.
the assume builder now generate only one assume per attribute kind and per value they are on. to do this it takes the highest. this is desirable because currently, for all attributes the higest value is the most valuable.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78013
We should only skip `lifetime` and `dbg` intrinsics when searching for users.
Other intrinsics are legit users that can't be ignored.
Without this fix, the testcase would result in an invalid IR. `memcpy`
will have a reference to the, now, external value (local to the
extracted loop function).
Fix PR42194
Differential Revision: https://reviews.llvm.org/D78749
The CallSite and ImmutableCallSite were removed in a previous
commit. So rename the file to match the remaining class and
the name of the cpp that implements it.
This patch slightly improves the formatting of the debug output, adds a
few missing outputs and makes some existing outputs more consistent with
the rest.
As discussed in PR45478:
https://bugs.llvm.org/show_bug.cgi?id=45478
...propagating FMF from the outer (second) call is not correct,
so intersect them instead.
I suspect we could do better (see TODO comment), but mismatched
FMF is probably too rare to care about.
Differential Revision: https://reviews.llvm.org/D78631
One of transforms the loop vectorizer makes is LCSSA formation. In some cases it
is the only transform it makes. We should not drop CFG analyzes if only LCSSA was
formed and no actual CFG changes was made.
We should think of expanding this logic to other passes as well, and maybe make
it a part of PM framework.
Reviewed By: Florian Hahn
Differential Revision: https://reviews.llvm.org/D78360
This reverts commit 9245c7ac13.
This is triggering a segfault in XLA downstream, we'll follow-up with
a reproducer, it is likely influenced by TTI/TLI settings or other
options as a simple `opt -loop-vectorize` invocation on the IR
before the crash does not reproduce immediately.
While we can do that, it doesn't increase instruction count,
if the old `sub` sticks around then the transform is not only
not a unlikely win, but a likely regression, since we likely
now extended live range and use count of both of the `sub` operands,
as opposed to just the result of `sub`.
As Kostya Serebryany notes in post-commit review in
https://reviews.llvm.org/D68408#1998112
this indeed can degrade final assembly,
increase register pressure, and spilling.
This isn't what we want here,
so at least for now let's guard it with an use check.
The motivation is to be able to play with the option and change if it is required.
Reviewers: fedor.sergeev, apilipenko, rnk, jdoerfert
Reviewed By: fedor.sergeev
Subscribers: hiraditya, dantrushin, llvm-commits
Differential Revision: https://reviews.llvm.org/D78624
This patch adds VPValue version of the instruction operands to
VPWidenRecipe and uses them during code-generation.
Similar to D76373 this reduces ingredient def-use usage by ILV as
a step towards full VPlan-based def-use relations.
Reviewers: rengolin, Ayal, gilr
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D76992
I don't believe this pass deals with vectors of pointers. I think
this getScalarType() was added during a mechanical opaque pointer
change of the interface to GetElementPtrInst::getIndexedType.
Summary:
Teach MachineDebugify how to insert DBG_VALUE instructions. This can
help find bugs causing CodeGen differences when debug info is present.
DBG_VALUE instructions are only emitted when -debugify-level is set to
locations+variables.
There is essentially no attempt made to match up DBG_VALUE register
operands with the local variables they ought to correspond to. I'm not
sure how to improve the situation. In some cases (MachineMemOperand?)
it's possible to find the IR instruction a MachineInstr corresponds to,
but in general this seems to call for "undoing" the work done by ISel.
Reviewers: dsanders, aprantl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78135
Summary:
This is RFC for fixes in poison-related functions of ValueTracking.
These functions assume that a value can be poison bitwisely, but the semantics
of bitwise poison is not clear at the moment.
Allowing a value to have bitwise poison adds complexity to reasoning about
correctness of optimizations.
This patch makes the analysis functions simply assume that a value is
either fully poison or not, which has been used to understand the correctness
of a few previous optimizations.
The bitwise poison semantics seems to be only used by these functions as well.
In terms of implementation, using value-wise poison concept makes existing
functions do more precise analysis, which is what this patch contains.
Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr
Reviewed By: nikic
Subscribers: fhahn, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78503
Summary:
This is RFC for fixes in poison-related functions of ValueTracking.
These functions assume that a value can be poison bitwisely, but the semantics
of bitwise poison is not clear at the moment.
Allowing a value to have bitwise poison adds complexity to reasoning about
correctness of optimizations.
This patch makes the analysis functions simply assume that a value is
either fully poison or not, which has been used to understand the correctness
of a few previous optimizations.
The bitwise poison semantics seems to be only used by these functions as well.
In terms of implementation, using value-wise poison concept makes existing
functions do more precise analysis, which is what this patch contains.
Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr
Reviewed By: nikic
Subscribers: fhahn, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78503
visitExtractValueInst uses mergeInValue, so it already can handle
constant ranges. Initially the early exit was using isOverdefined to
keep things as NFC during the initial move to ValueLatticeElement.
As the function already supports constant ranges, it can just use
ValueState[&I].isOverdefined.
Reviewers: efriedma, mssimpso, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78393
Add support to optionally emit different instrumentation for accesses to
volatile variables. While the default TSAN runtime likely will never
require this feature, other runtimes for different environments that
have subtly different memory models or assumptions may require
distinguishing volatiles.
One such environment are OS kernels, where volatile is still used in
various places for various reasons, and often declare volatile to be
"safe enough" even in multi-threaded contexts. One such example is the
Linux kernel, which implements various synchronization primitives using
volatile (READ_ONCE(), WRITE_ONCE()). Here the Kernel Concurrency
Sanitizer (KCSAN) [1], is a runtime that uses TSAN instrumentation but
otherwise implements a very different approach to race detection from
TSAN.
While in the Linux kernel it is generally discouraged to use volatiles
explicitly, the topic will likely come up again, and we will eventually
need to distinguish volatile accesses [2]. The other use-case is
ignoring data races on specially marked variables in the kernel, for
example bit-flags (here we may hide 'volatile' behind a different name
such as 'no_data_race').
[1] https://github.com/google/ktsan/wiki/KCSAN
[2] https://lkml.kernel.org/r/CANpmjNOfXNE-Zh3MNP=-gmnhvKbsfUfTtWkyg_=VqTxS4nnptQ@mail.gmail.com
Author: melver (Marco Elver)
Reviewed-in: https://reviews.llvm.org/D78554
The number of different access location kinds we track is relatively
small (8 so far). With this patch we replace the DenseMap that mapped
from index (0-7) to the access set pointer with an array of access set
pointers. This reduces memory consumption.
No functional change is intended.
---
Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):
Before:
```
calls to allocation functions: 472499 (215654/s)
temporary memory allocations: 77794 (35506/s)
peak heap memory consumption: 35.28MB
peak RSS (including heaptrack overhead): 125.46MB
total memory leaked: 269.04KB
```
After:
```
calls to allocation functions: 472270 (308673/s)
temporary memory allocations: 77578 (50704/s)
peak heap memory consumption: 32.70MB
peak RSS (including heaptrack overhead): 121.78MB
total memory leaked: 269.04KB
```
Difference:
```
calls to allocation functions: -229 (346/s)
temporary memory allocations: -216 (326/s)
peak heap memory consumption: -2.58MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```
---
Summary:
When an irreducible SCC is converted into a new natural loop, existing
loops included in that SCC now become children of the new loop. The
logic that moves these loops from the parent loop to the new loop
invoked undefined behaviour when it modified the container that it was
iterating over. Fixed this by first extracting all the loops that are
to be removed from the parent.
Fixes bug 45623.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D78544
If we have a dependence between an abstract attribute A to an abstract
attribute B such hat changes in A should trigger an update of B, we do
not need to keep the dependence around once the update was triggered. If
the dependence is still required the update will reinsert it into the
dependence map, if it is not we avoid triggering B in the future. This
replaces the "recompute interval" mechanism we used before to prune
stale dependences.
Number of required iterations is generally down, compile time for the
module pass (not really the CGSCC pass) is down quite a bit.
There is one test change which looks like an artifact in the undefined
behavior AA that needs to be looked at.
The old command line option `-attributor-disable` was too coarse grained
as we want to measure the effects of the module or cgscc pass without
the other as well.
Since `none` is the default there is no real functional change.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D78571
Summary:
As we have discussed previously (e.g. in D63992 / D64090 / [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]]), `sub` instruction
can almost be considered non-canonical. While we do convert `sub %x, C` -> `add %x, -C`,
we sparsely do that for non-constants. But we should.
Here, i propose to interpret `sub %x, %y` as `add (sub 0, %y), %x` IFF the negation can be sinked into the `%y`
This has some potential to cause endless combine loops (either around PHI's, or if there are some opposite transforms).
For former there's `-instcombine-negator-max-depth` option to mitigate it, should this expose any such issues
For latter, if there are still any such opposing folds, we'd need to remove the colliding fold.
In any case, reproducers welcomed!
Reviewers: spatel, nikic, efriedma, xbolva00
Reviewed By: spatel
Subscribers: xbolva00, mgorny, hiraditya, reames, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68408
AbstractAttribute::initialize is used to initialize the deduction and
the object we do not always call it. To make sure we have the option to
initialize the object even if initialize is not called we pass the
Attributor to AbstractAttribute constructors now.
The individual tryTo* helpers do not need to be public. Also, the
builder contained two consecutive public: sections, which is not
necessary. Moved the remaining public methods after the constructor.
Also make some of the tryTo* helpers const.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed by: gilr
Differential Revision: https://reviews.llvm.org/D78288
Before we kept the first applicable `ident_t*` during deduplication of
runtime calls. The problem is that "first" is dependent on the iteration
order of a DenseMap. Since the proper solution, which is to combine the
information from all `ident_t*`, should be deterministic on its own, we
will not try to make the iteration order deterministic. Instead, we will
create a fresh `ident_t*` if there is not a unique existing `ident_t*`
to pick.
We now also use the BumpPtrAllocator from the Attributor in the
InformationCache. The lifetime of objects in either is pretty much the
same and it should result in consistently good performance regardless of
the allocator.
Doing so requires to call more constructors manually but so far that
does not seem to be problematic or messy.
---
Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):
Before:
```
calls to allocation functions: 615359 (368257/s)
temporary memory allocations: 83315 (49859/s)
peak heap memory consumption: 75.64MB
peak RSS (including heaptrack overhead): 163.43MB
total memory leaked: 269.04KB
```
After:
```
calls to allocation functions: 613042 (359555/s)
temporary memory allocations: 83322 (48869/s)
peak heap memory consumption: 75.64MB
peak RSS (including heaptrack overhead): 162.92MB
total memory leaked: 269.04KB
```
Difference:
```
calls to allocation functions: -2317 (-68147/s)
temporary memory allocations: 7 (205/s)
peak heap memory consumption: 2.23KB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
---
With clang option -funique-internal-linkage-symbols, symbols with
internal linkage get names with the module hash appended.
Differential Revision: https://reviews.llvm.org/D78243
Use Instruction::comesBefore() instead of OrderedInstructions
inside InstructionPrecedenceTracking. This also removes the
dominator tree dependency.
Differential Revision: https://reviews.llvm.org/D78461
Summary:
The indexing operator in Scatterer may result in building new
instructions. When using multiple such operators in a function
argument list the order in which we build instructions depend on
argument evaluation order (which is undefined in C++).
This patch avoid such problems by expanding the components using
the [] operator prior to the function call.
Problem was seen when comparing output, while builing LLVM with
different compilers (clang vs gcc).
Reviewers: foad, cameron.mcinally, uabelho
Reviewed By: foad
Subscribers: hiraditya, mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78455
This patch includes some clean-ups to tryToCreateRecipe, suggested in
D77973.
It includes:
* Renaming tryToCreateRecipe to tryToCreateWidenRecipe.
* Move VPBB insertion logic to caller of tryToCreateWidenRecipe.
* Hoists instruction checks to tryToCreateWidenRecipe, making it
clearer which instructions are handled by which recipe, simplifying
the checks by using early exits.
* Split up handling of induction PHIs and truncates using inductions.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D78287
The recently added Instruction::comesBefore can be used instead of
OrderedInstructions.
Reviewers: rnk, nikic, efriedma
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D78452
The API for shuffles and reductions uses generic Type parameters,
instead of VectorType, and so assertions and casts are used a lot.
This patch makes those types explicit, which means that the clients
can't be lazy, but results in less ambiguity, and that can only be a
good thing.
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562
Differential Revision: https://reviews.llvm.org/D78357
Some includes are not required and forward declarations can be used
instead. This also exposed a few places that were not directly including
required files.
Most of the includes in LoopUtils.h are not required in the header and
they can be replaced by forward declarations.
Unfortunately includes of TargetTransformInfo.h and IVDescriptors.h pull
in a bunch of additional things, but there is no easy way to get rid of
them at the moment I think.
bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'
This is the widen shuffle elements enhancement to D76727.
It builds on the analysis and simplifications in
D77881 and rG6a7e958a423e.
The phase ordering tests show that we can simplify inverse
shuffles across a binop in both directions (widen/narrow or
narrow/widen) now.
There's another potential transform visible in some of the
remaining TODOs - move a bitcasted operand of a shuffle
after the shuffle.
Differential Revision: https://reviews.llvm.org/D78371
This makes it easier to extend the merge options in the future and also
reduces the risk of accidentally setting a wrong option.
Reviewers: efriedma, nikic, reames, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78368
First-order recurrences require special treatment when they are live-out;
such treatment is provided by fixFirstOrderRecurrence(), so they should be
included in AllowedExit set.
(Should probably have been included originally in D16197.)
Fixes PR45526: AllowedExit set is used by prepareToFoldTailByMasking() to
check whether the treatment for live-outs also holds when folding the tail,
which is not (yet) the case for first-order recurrences.
Differential Revision: https://reviews.llvm.org/D78210
When running IPSCCP on a module with many small functions, memory
usage is dominated by PredicateInfo, which is a huge structure
(partially due to some unfortunate nested SmallVector use). However,
most of it is actually only temporary state needed to build
predicate info, and does not need to be retained after initial
construction.
This patch factors out the predicate building logic and state
into a separate PrediceInfoBuilder, with the extra bonus that
it does not need to live in the header anymore.
Differential Revision: https://reviews.llvm.org/D78326
We previously clamped the trailing zero count to 31 bits. And
then clamped the final alignment to MaximumAlignment which is
1 << 29.
This patch simplifies this to just clamp the trailing zero to
29 using MaxAlignmentExponent.
I was looking into changing this function to use Align/MaybeAlign
and noticed this.
Differential Revision: https://reviews.llvm.org/D78418
Cost-modeling decisions are tied to the compute interleave groups
(widening decisions, scalar and uniform values). When invalidating the
interleave groups, those decisions also need to be invalidated.
Otherwise there is a mis-match during VPlan construction.
VPWidenMemoryRecipes created initially are left around w/o converting them
into VPInterleave recipes. Such a conversion indeed should not take place,
and these gather/scatter recipes may in fact be right. The crux is leaving around
obsolete CM_Interleave (and dependent) markings of instructions along with
their costs, instead of recalculating decisions, costs, and recipes.
Alternatively to forcing a complete recompute later on, we could try
to selectively invalidate the decisions connected to the interleave
groups. But we would likely need to run the uniform/scalar value
detection parts again anyways and the extra complexity is probably not
worth it.
Fixes PR45572.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D78298
There are also some adjustments to use MaybeAlign in here due
to CallBase::getParamAlignment() being deprecated. It would
be a little cleaner if getOrEnforceKnownAlignment was migrated
to Align/MaybeAlign.
Differential Revision: https://reviews.llvm.org/D78345
There are also some adjustments to use MaybeAlign in here due
to CallBase::getParamAlignment() being deprecated. It would
be cleaner if getOrEnforceKnownAlignment was migrated
to Align/MaybeAlign.
Differential Revision: https://reviews.llvm.org/D78345
Users of ValueLatticeElement currently have to ensure constant ranges
are not extended indefinitely. For example, in SCCP, mergeIn goes to
overdefined if a constantrange value is repeatedly merged with larger
constantranges. This is a simple form of widening.
In some cases, this leads to an unnecessary loss of information and
things can be improved by allowing a small number of extensions in the
hope that a fixed point is reached after a small number of steps.
To make better decisions about widening, it is helpful to keep track of
the number of range extensions. That state is tied directly to a
concrete ValueLatticeElement and some unused bits in the class can be
used. The current patch preserves the existing behavior by default:
CheckWiden defaults to false and if CheckWiden is true, a single change
to the range is allowed.
Follow-up patches will slightly increase the threshold for widening.
Reviewers: efriedma, davide, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78145
Removing CallSite left us with a bunch of explicit casts from
Instruction to CallBase. This moves the casts earlier so that
function arguments and data structure types are CallBase so
we don't have to cast when we use them.
Differential Revision: https://reviews.llvm.org/D78246
CallSite will likely be removed soon, but AbstractCallSite serves a different purpose and won't be going away.
This patch switches it to internally store a CallBase* instead of a
CallSite. The only interface changes are the removal of the getCallSite
method and getCallBackUses now takes a CallBase&. These methods had only
a few callers that were easy enough to update without needing a
compatibility shim.
In the future once the other CallSites are gone, the CallSite.h
header should be renamed to AbstractCallSite.h
Differential Revision: https://reviews.llvm.org/D78322
Summary:
Changes the type of the @__typeid_.*_unique_member imports we generate
for unique return value optimization from i8 to [0 x i8]. This
prevents assuming that these imports do not alias, such as when
two unique return values occur in the same vtable.
Fixes PR45393.
Reviewers: tejohnson, pcc
Reviewed By: pcc
Subscribers: aganea, hiraditya, rnk, george.burgess.iv, dblaikie, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77421
The Float2IntPass got a class member called Roots, but Roots
was also passed around to member function as a reference. This
patch simply remove those references.
Since we use the fact that some uses are droppable in the Attributor we
need to handle them explicitly when we replace uses. As an example, an
assumed dead value can have live droppable users. In those we cannot
replace the value simply by an undef. Instead, we either drop the uses
(via `dropDroppableUses`) or keep them as they are. In this patch we do
both, depending on the situation. For values that are dead but not
necessarily removed we keep droppable uses around because they contain
information we might be able to use later. For values that are removed
we drop droppable uses explicitly to avoid replacement with undef.
The handling of the `returned` attribute in D75815 did miss the case
where the argument is (bit)casted to a different type. This is
explicitly allowed by the language reference and exposed by the
Attributor.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D77977
The check if globals were accessed was not always working because two
bits are set for NO_GLOBAL_MEM. The new check works also if only on kind
of globals (internal/external) is accessed.
Running the verifier is expensive so we want to avoid it even in runs
that enable assertions. As we move closer to enabling the Attributor
this code will be executed by some buildbots but not cause overhead for
most people.
Before, we eagerly analyzed all the functions to collect information
about them, e.g. what instructions may read/write memory. This had
multiple drawbacks:
- In CGSCC-mode we can end up looking at a callee which is not in the
SCC but for which we need an initialized cache.
- We end up looking at functions that we deem dead and never need to
analyze in the first place.
- We have a implicit dependence which is easy to break.
This patch moves the function analysis into the information cache and
makes it lazy. There is no real functional change expected except due to
the first reason above.
The CallGraphUpdater allows to directly alter call site information and
we should do so. This might appease the windows buildbot that crashes
during the SCC traversal.
I uploaded the old version accidentally instead of the one with these
minor adjustments requested by the reviewers.
Differential Revision: https://reviews.llvm.org/D77855
Now Reassociate Pass invalidates the analysis results of AAManager and BasicAA,
but it saves GlobalsAA, although it seems that it should preserve them, since
it affects only Unary and Binary operators.
Author: kpolushin (Kirill)
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D77137
Summary:
We can and should remove deleted nodes from their respective SCCs. We
did not do this before and this was a potential problem even though I
couldn't locally trigger an issue. Since the `DeleteNode` would assert
if the node was not in the SCC, we know we only remove nodes from their
SCC and only once (when run on all the Attributor tests).
Reviewers: lebedev.ri, hfinkel, fhahn, probinson, wristow, loladiro, sstefan1, uenoku
Subscribers: hiraditya, bollu, uenoku, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77855
Summary:
While it is uncommon that the ExternalCallingNode needs to be updated,
it can happen. It is uncommon because most functions listed as callees
have external linkage, modifying them is usually not allowed. That said,
there are also internal functions that have, or better had, their
"address taken" at construction time. We conservatively assume various
uses cause the address "to be taken". Furthermore, the user might have
become dead at some point. As a consequence, transformations, e.g., the
Attributor, might be able to replace a function that is listed
as callee of the ExternalCallingNode.
Since there is no function corresponding to the ExternalCallingNode, we
did just remove the node from the callee list if we replaced it (so
far). Now it would be preferable to replace it if needed and remove it
otherwise. However, removing the node has implications on the CGSCC
iteration. Locally, that caused some other nodes to be never visited
but it is for sure possible other (bad) side effects can occur. As it
seems conservatively safe to keep the new node in the callee list we
will do that for now.
Reviewers: lebedev.ri, hfinkel, fhahn, probinson, wristow, loladiro, sstefan1, uenoku
Subscribers: hiraditya, bollu, uenoku, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77854
Summary:
The old code did eliminate references from and to functions that were
about to be deleted only just before we deleted them. This can cause
references from other functions that are supposed to be deleted to still
exist, depending on the order. If the functions form a strongly
connected component the problem manifests regardless of the order in
which we try to actually delete the functions.
This patch introduces a two step deletion. First we remove all
references and then we delete the function. Note that this only affects
the old call graph. There should not be any functional changes if no old
style call graph was given.
To test this we delete two strongly connected functions instead of one
in an existing test.
Reviewers: hfinkel
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77975
The current strategy LICM uses when sinking for debuginfo is
that of picking the debug location of one of the uses.
This causes stepping to be wrong sometimes, see, e.g. PR45523.
This patch introduces a generalization of getMergedLocation(),
that operates on a vector of locations instead of two, and try
to merge all them together, and use the new API in LICM.
<rdar://problem/61750950>
After introducing VPWidenSelectRecipe, the duplicated logic can be
shared.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D77973
We can eliminate MemoryDefs of objects not accessible after the function
returns (e.g. alloca), if there are no reads between the MemoryDef and
any function exits. We can stop traversing paths that completely
overwrite the memory location of the MemoryDef.
This patch was split off D73763.
Reviewers: dmgreen, bryant, asbirlea, Tyker, efriedma, george.burgess.iv
Reviewed By: asbirlea, george.burgess.iv
Differential Revision: https://reviews.llvm.org/D77736
This is related to commit 8c11bc0cd0
which introduces the FixIrreducible pass. The warning seems hard to
reproduce locally. The latest attempt ought to work.
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.
This is a useful workaround for a limitation in the structurizer
which, which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D77198
This restores commit 2ada8e2525.
Originally reverted with commit 44e09b59b8.
Handling LoadInst and StoreInst in tryToWiden seems a bit
counter-intuitive, as there is only an assertion for them and in no
case VPWidenRefipes are created for them.
I think it makes sense to move the assertion to handleReplication, where
the non-widened loads and store are handled.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D77972
Fix an assert introduced in 41ed5d856c1: a phi with a single predecessor and a
mask is a valid case which is already supported by the code.
Differential Revision: https://reviews.llvm.org/D78115
This reverts commit 2ada8e2525.
Buildbots produced compilation errors which I was not able to quickly
reproduce locally. Need more time to investigate.
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.
This is a useful workaround for a limitation in the structurizer
which, which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D77198
Summary:
Currently, the internal options -vectorize-loops, -vectorize-slp, and
-interleave-loops do not have much practical effect. This is because
they are used to initialize the corresponding flags in the pass
managers, and those flags are then unconditionally overwritten when
compiling via clang or via LTO from the linkers. The only exception was
-vectorize-loops via opt because of some special hackery there.
While vectorization could still be disabled when compiling via clang,
using -fno-[slp-]vectorize, this meant that there was no way to disable
it when compiling in LTO mode via the linkers. This only affected
ThinLTO, since for regular LTO vectorization is done during the compile
step for scalability reasons. For ThinLTO it is invoked in the LTO
backends. See also the discussion on PR45434.
This patch makes it so the internal options can actually be used to
disable these optimizations. Ultimately, the best long term solution is
to mark the loops with metadata (similar to the approach used to fix
-fno-unroll-loops in D77058), but this enables a shorter term
workaround, and actually makes these internal options useful.
I constant propagated the initial values of these internal flags into
the pass manager flags (for some reasons vectorize-loops and
interleave-loops were initialized to true, while vectorize-slp was
initialized to false). As mentioned above, they are overwritten
unconditionally so this doesn't have any real impact, and these initial
values aren't particularly meaningful.
I then changed the passes to check the internl values and return without
performing the associated optimization when false (I changed the default
of -vectorize-slp to true so the options behave similarly). I was able
to remove the hackery in opt used to get -vectorize-loops=false to work,
as well as a special option there used to disable SLP vectorization.
Finally, I changed thinlto-slp-vectorize-pm.c to:
a) Only test SLP (moved the loop vectorization checking to a new test).
b) Use code that is slp vectorized when it is enabled, and check that
instead of whether the pass is enabled.
c) Test the new behavior of -vectorize-slp.
d) Test both pass managers.
The loop vectorization (and associated interleaving) testing I moved to
a new thinlto-loop-vectorize-pm.c test, with several changes:
a) Changed the flags on the interleaving testing so that it will
actually interleave, and check that.
b) Test the new behavior of -vectorize-loops and -interleave-loops.
c) Test both pass managers.
Reviewers: fhahn, wmi
Subscribers: hiraditya, steven_wu, dexonsmith, cfe-commits, davezarzycki, llvm-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77989
Summary:
This patch fix the following issues in InstCombiner::visitGetElementPtrInst
1. Skip for scalable type if transformation requires fixed size number of
vector element.
2. Skip for scalable type if transformation relies on compile-time known type
alloc size.
3. Use VectorType::getElementCount when scalable property is used to construct
new VectorType.
4. Use TypeSize::getKnownMinSize when minimal size of a scalable type is valid to determine GEP 'inbounds'.
5. Explicitly call TypeSize::getFixedSize to avoid implicit type conversion to uint64_t.
Reviewers: sdesmalen, efriedma, spatel, ctetreau
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78081
This patch fixes 2 related bugs in ADCE:
- `performDeadCodeElimination` does not report changes if it did ONLY
CFG changes (affects both old and new pass managers);
- When control flow removal is enabled, new pass manager does not
drop CFG analyses.
Both can lead to incorrect loop info after ADCE that does only CFG changes.
Differential Revision: https://reviews.llvm.org/D78103
Reviewed By: Denis Antrushin
It can be used to avoid passing the begin and end of a range.
This makes the code shorter and it is consistent with another
wrappers we already have.
Differential revision: https://reviews.llvm.org/D78016
Summary:
Following up on the comments on D77638.
Not undoing rGd6525eff5ebfa0ef1d6cd75cb9b40b1881e7a707 here at the moment, since I don't know how to test mac builds. Please let me know if I should include that here too.
Reviewers: vitalybuka
Reviewed By: vitalybuka
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77889
ModuleSummaryAnalysis is the only file in libAnalysis that brings a
dependency on the CodeGen layer from libAnalysis, moving it breaks this
dependency.
Differential Revision: https://reviews.llvm.org/D77994
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: rriddle, sdesmalen, efriedma
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77259
Summary:
Share logic to strip debugify metadata between the IR and MIR level
debugify passes. This makes it simpler to hunt for bugs by diffing IR
with vs. without -debugify-each turned on.
As a drive-by, fix an issue causing CallGraphNodes to become invalid
when a dead llvm.dbg.value prototype is deleted.
Reviewers: dsanders, aprantl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77915
Pass from the calling recipe the interleave group itself instead of passing the
group's insertion position and having the function query CM for its interleave
group and making sure that given instruction is the insertion point of.
Differential Revision: https://reviews.llvm.org/D78002
Summary: change assumption cache to store an assume along with an index to the operand bundle containing the knowledge.
Reviewers: jdoerfert, hfinkel
Reviewed By: jdoerfert
Subscribers: hiraditya, mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77402
Widening a selects depends on whether the condition is loop invariant or
not. Rather than checking during codegen-time, the information can be
recorded at the VPlan construction time.
This was suggested as part of D76992, to reduce the reliance on
accessing the original underlying IR values.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D77869
Summary:
Updated CallPromotionUtils and impacted sites. Parameters that are
expected to be non-null, and return values that are guranteed non-null,
were replaced with CallBase references rather than pointers.
Left FIXME in places where more changes are facilitated by CallBase, but
aren't CallSites: Instruction* parameters or return values, for example,
where the contract that they are actually CallBase values.
Reviewers: davidxl, dblaikie, wmi
Reviewed By: dblaikie
Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77930
As proposed in D77881, we'll have the related widening operation,
so this name becomes too vague.
While here, change the function signature to take an 'int' rather
than 'size_t' for the scaling factor, add an assert for overflow of
32-bits, and improve the documentation comments.
Default visibility for classes is private, so the private: at the top of
various class definitions is redundant.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D77810
Revision a1c05fe <https://reviews.llvm.org/rGa1c05fe20f3def1f1be9f50d2adefc6b6f1578ad>
removed bitcast from the list of problematic transformations, however:
%97 = fptrunc ppc_fp128 %2 to double // we need to check ppc_fp128 here to prevent the transformation
%98 = bitcast double %97 to i64 // a1c05fe checks ppc_fp128 at here
%99 = icmp slt i64 %98, 0
%100 = zext i1 %99 to i8
store i8 %100, i8* %7, align 1
so this patch does that. I'm also disabling it in the presence of extend just in case.
I verified separately that the hash of -std::infinity and std::infinity don't match now.
Differential Revision: https://reviews.llvm.org/D77911
Summary:
The inline history is associated with a call site. There are two locations
we fetch inline history. In one, we fetch it together with the call
site. In the other, we initialize it under certain conditions, use it
later under same conditions (different if check), and otherwise is
uninitialized. Although currently there is no uninitialized use, the
code is more challenging to maintain correctly, than if the value were
always initialized.
Changed to the upfront initialization pattern already present in this
file.
Reviewers: davidxl, dblaikie
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77877
Summary:
This commit adds two command-line options to clang.
These options let the user decide which functions will receive SanitizerCoverage instrumentation.
This is most useful in the libFuzzer use case, where it enables targeted coverage-guided fuzzing.
Patch by Yannis Juglaret of DGA-MI, Rennes, France
libFuzzer tests its target against an evolving corpus, and relies on SanitizerCoverage instrumentation to collect the code coverage information that drives corpus evolution. Currently, libFuzzer collects such information for all functions of the target under test, and adds to the corpus every mutated sample that finds a new code coverage path in any function of the target. We propose instead to let the user specify which functions' code coverage information is relevant for building the upcoming fuzzing campaign's corpus. To this end, we add two new command line options for clang, enabling targeted coverage-guided fuzzing with libFuzzer. We see targeted coverage guided fuzzing as a simple way to leverage libFuzzer for big targets with thousands of functions or multiple dependencies. We publish this patch as work from DGA-MI of Rennes, France, with proper authorization from the hierarchy.
Targeted coverage-guided fuzzing can accelerate bug finding for two reasons. First, the compiler will avoid costly instrumentation for non-relevant functions, accelerating fuzzer execution for each call to any of these functions. Second, the built fuzzer will produce and use a more accurate corpus, because it will not keep the samples that find new coverage paths in non-relevant functions.
The two new command line options are `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist`. They accept files in the same format as the existing `-fsanitize-blacklist` option <https://clang.llvm.org/docs/SanitizerSpecialCaseList.html#format>. The new options influence SanitizerCoverage so that it will only instrument a subset of the functions in the target. We explain these options in detail in `clang/docs/SanitizerCoverage.rst`.
Consider now the woff2 fuzzing example from the libFuzzer tutorial <https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md>. We are aware that we cannot conclude much from this example because mutating compressed data is generally a bad idea, but let us use it anyway as an illustration for its simplicity. Let us use an empty blacklist together with one of the three following whitelists:
```
# (a)
src:*
fun:*
# (b)
src:SRC/*
fun:*
# (c)
src:SRC/src/woff2_dec.cc
fun:*
```
Running the built fuzzers shows how many instrumentation points the compiler adds, the fuzzer will output //XXX PCs//. Whitelist (a) is the instrument-everything whitelist, it produces 11912 instrumentation points. Whitelist (b) focuses coverage to instrument woff2 source code only, ignoring the dependency code for brotli (de)compression; it produces 3984 instrumented instrumentation points. Whitelist (c) focuses coverage to only instrument functions in the main file that deals with WOFF2 to TTF conversion, resulting in 1056 instrumentation points.
For experimentation purposes, we ran each fuzzer approximately 100 times, single process, with the initial corpus provided in the tutorial. We let the fuzzer run until it either found the heap buffer overflow or went out of memory. On this simple example, whitelists (b) and (c) found the heap buffer overflow more reliably and 5x faster than whitelist (a). The average execution times when finding the heap buffer overflow were as follows: (a) 904 s, (b) 156 s, and (c) 176 s.
We explain these results by the fact that WOFF2 to TTF conversion calls the brotli decompression algorithm's functions, which are mostly irrelevant for finding bugs in WOFF2 font reconstruction but nevertheless instrumented and used by whitelist (a) to guide fuzzing. This results in longer execution time for these functions and a partially irrelevant corpus. Contrary to whitelist (a), whitelists (b) and (c) will execute brotli-related functions without instrumentation overhead, and ignore new code paths found in them. This results in faster bug finding for WOFF2 font reconstruction.
The results for whitelist (b) are similar to the ones for whitelist (c). Indeed, WOFF2 to TTF conversion calls functions that are mostly located in SRC/src/woff2_dec.cc. The 2892 extra instrumentation points allowed by whitelist (b) do not tamper with bug finding, even though they are mostly irrelevant, simply because most of these functions do not get called. We get a slightly faster average time for bug finding with whitelist (b), which might indicate that some of the extra instrumentation points are actually relevant, or might just be random noise.
Reviewers: kcc, morehouse, vitalybuka
Reviewed By: morehouse, vitalybuka
Subscribers: pratyai, vitalybuka, eternalsakura, xwlin222, dende, srhines, kubamracek, #sanitizers, lebedev.ri, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D63616
Summary:
Function names: camel case, lower case first letter.
Variable names: start with upper letter. For iterators that were 'i',
renamed with a descriptive name, as 'I' is 'Instruction&'.
Lambda captures simplification.
Opportunistic boolean return simplification.
Reviewers: davidxl, dblaikie
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77837
Summary:
This patch establishes memory layout and adds instrumentation. It does
not add runtime support and does not enable MSan, which will be done
separately.
Memory layout is based on PPC64, with the exception that XorMask
is not used - low and high memory addresses are chosen in a way that
applying AndMask to low and high memory produces non-overlapping
results.
VarArgHelper is based on AMD64. It might be tempting to share some
code between the two implementations, but we need to keep in mind that
all the ABI similarities are coincidental, and therefore any such
sharing might backfire.
copyRegSaveArea() indiscriminately copies the entire register save area
shadow, however, fragments thereof not filled by the corresponding
visitCallSite() invocation contain irrelevant data. Whether or not this
can lead to practical problems is unclear, hence a simple TODO comment.
Note that the behavior of the related copyOverflowArea() is correct: it
copies only the vararg-related fragment of the overflow area shadow.
VarArgHelper test is based on the AArch64 one.
s390x ABI requires that arguments are zero-extended to 64 bits. This is
particularly important for __msan_maybe_warning_*() and
__msan_maybe_store_origin_*() shadow and origin arguments, since non
zeroed upper parts thereof confuse these functions. Therefore, add ZExt
attribute to the corresponding parameters.
Add ZExt attribute checks to msan-basic.ll. Since with
-msan-instrumentation-with-call-threshold=0 instrumentation looks quite
different, introduce the new CHECK-CALLS check prefix.
Reviewers: eugenis, vitalybuka, uweigand, jonpa
Reviewed By: eugenis
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits, stefansf, Andreas-Krebbel
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76624
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: sdesmalen, rriddle, efriedma
Reviewed By: sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77262
For non-integer constants/expressions and overdefined, I think we can
just use SimplifyBinOp to do common folds. By just passing a context
with the DL, SimplifyBinOp should not try to get additional information
from looking at definitions.
For overdefined values, it should be enough to just pass the original
operand.
Note: The comment before the `if (isconstant(V1State)...` was wrong
originally: isConstant() also matches integer ranges with a single
element. It is correct now.
Reviewers: efriedma, davide, mssimpso, aartbik
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D76459
Loop simplify form should always be checked because logic of
propagateStoredValueToLoadUsers relies on it (in particular, it
requires preheader).
Reviewed By: Fedor Sergeev, Florian Hahn
Differential Revision: https://reviews.llvm.org/D77775
Summary:
*Almost* all uses are replaced. Left FIXMEs for the two sites that
require refactoring outside of Inliner, to scope this patch.
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77817
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: efriedma, sdesmalen, rriddle
Reviewed By: sdesmalen
Subscribers: hiraditya, dantrushin, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77261
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: rriddle, sdesmalen, efriedma
Reviewed By: sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77260
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit introduces VPValues for VPBlendRecipe to use as the values to
blend. The recipe is generated with VPValues wrapping the phi's incoming values
of the scalar phi. This reduces ingredient def-use usage by ILV as a step
towards full VPlan-based def-use relations.
Differential Revision: https://reviews.llvm.org/D77539
Introduce a new VPWidenCanonicalIVRecipe to generate a canonical vector
induction for use in fold-tail-with-masking, if a primary induction is absent.
The canonical scalar IV having start = 0 and step = VF*UF, created during code
-gen to control the vector loop, is widened into a canonical vector IV having
start = {<Part*VF, Part*VF+1, ..., Part*VF+VF-1> for 0 <= Part < UF} and
step = <VF*UF, VF*UF, ..., VF*UF>.
Differential Revision: https://reviews.llvm.org/D77635
When building a VPlan, BasicBlock::instructionsWithoutDebug() is used to
iterate over the instructions in a block. This means that no recipes
should be created for debug info intrinsics already and we can turn the
early exit into an assertion.
Reviewers: Ayal, gilr, rengolin, aprantl
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D77636
This patch adds VPValue versions for the arguments of the call to
VPWidenCallRecipe and uses them during code-generation.
Similar to D76373 this reduces ingredient def-use usage by ILV as
a step towards full VPlan-based def-use relations.
Reviewers: Ayal, gilr, rengolin
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D77655
Summary:
There are at least three clients for KnownBits calculations:
ValueTracking, SelectionDAG and GlobalISel. To reduce duplication the
common logic should be moved out of these clients and into KnownBits
itself.
This patch does this for AND, OR and XOR calculations by implementing
and using appropriate operator overloads KnownBits::operator& etc.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74060
Now compiler defines 5 sets of constants to represent rounding mode.
These are:
1. `llvm::APFloatBase::roundingMode`. It specifies all 5 rounding modes
defined by IEEE-754 and is used in `APFloat` implementation.
2. `clang::LangOptions::FPRoundingModeKind`. It specifies 4 of 5 IEEE-754
rounding modes and a special value for dynamic rounding mode. It is used
in clang frontend.
3. `llvm::fp::RoundingMode`. Defines the same values as
`clang::LangOptions::FPRoundingModeKind` but in different order. It is
used to specify rounding mode in in IR and functions that operate IR.
4. Rounding mode representation used by `FLT_ROUNDS` (C11, 5.2.4.2.2p7).
Besides constants for rounding mode it also uses a special value to
indicate error. It is convenient to use in intrinsic functions, as it
represents platform-independent representation for rounding mode. In this
role it is used in some pending patches.
5. Values like `FE_DOWNWARD` and other, which specify rounding mode in
library calls `fesetround` and `fegetround`. Often they represent bits
of some control register, so they are target-dependent. The same names
(not values) and a special name `FE_DYNAMIC` are used in
`#pragma STDC FENV_ROUND`.
The first 4 sets of constants are target independent and could have the
same numerical representation. It would simplify conversion between the
representations. Also now `clang::LangOptions::FPRoundingModeKind` and
`llvm::fp::RoundingMode` do not contain the value for IEEE-754 rounding
direction `roundTiesToAway`, although it is supported natively on
some targets.
This change defines all the rounding mode type via one `llvm::RoundingMode`,
which also contains rounding mode for IEEE rounding direction `roundTiesToAway`.
Differential Revision: https://reviews.llvm.org/D77379
Summary:
New SanitizerCoverage feature `inline-bool-flag` which inserts an
atomic store of `1` to a boolean (which is an 8bit integer in
practice) flag on every instrumented edge.
Implementation-wise it's very similar to `inline-8bit-counters`
features. So, much of wiring and test just follows the same pattern.
Reviewers: kcc, vitalybuka
Reviewed By: vitalybuka
Subscribers: llvm-commits, hiraditya, jfb, cfe-commits, #sanitizers
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D77244
Dead constants might be left when a function is replaced, we can
gracefully handle this case and avoid complexity for the users who would
see an assertion otherwise.
Passing a Value * to CreateCall has to call getPointerElementType
to find the type of the pointer.
In this case we can rely on the fact that Intrinsic::getDeclaration
returns a Function * and use that version of CreateCall.
Attributor.cpp became quite big and we need to start provide structure.
The Attributor code is now in Attributor.cpp and the classes derived
from AbstractAttribute are in AttributorAttributes.cpp. Minor changes
were required but no intended functional changes.
We also minimized includes as part of this.
Reviewed By: baziotis
Differential Revision: https://reviews.llvm.org/D76873
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: sdesmalen, rriddle, efriedma
Reviewed By: sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77263
callCapturesBefore always returns ModRef , if UseInst isn't a call. As
we only call it if we already know Mod is set, this only destroys the
Must bit for non-calls.
Based on the post-commit comments for rG0f56bbc, there might
be a problem with this transform:
(bitcast (fpext/fptrunc X)) to iX) < 0 --> (bitcast X to iY) < 0
...and the ppc_fp128 data type, so conservatively bypass if we
are bitcasting a ppc_fp128.
We might be able to account for endian or other differences to
enable this for PowerPC again if that is useful.
Differential Revision: https://reviews.llvm.org/D77642
Summary:
ComputeValueKnownInPredecessorsImpl is the main folding mechanism in
JumpThreading.cpp. To avoid potential infinite recursion while
chasing use-def chains, it uses:
DenseSet<std::pair<Value *, BasicBlock *>> &RecursionSet
to keep track of Value-BB pairs that we've processed.
Now, when ComputeValueKnownInPredecessorsImpl recursively calls
itself, it always passes BB as is, so the second element is always BB.
This patch simplifes the function by dropping "BasicBlock *" from
RecursionSet.
Reviewers: wmi, efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77699
Summary:
Re-used the IR-level debugify for the most part. The MIR-level code then
adds locations to the MachineInstrs afterwards based on the LLVM-IR debug
info.
It's worth mentioning that the resulting locations make little sense as
the range of line numbers used in a Function at the MIR level exceeds that
of the equivelent IR level function. As such, MachineInstrs can appear to
originate from outside the subprogram scope (and from other subprogram
scopes). However, it doesn't seem worth worrying about as the source is
imaginary anyway.
There's a few high level goals this pass works towards:
* We should be able to debugify our .ll/.mir in the lit tests without
changing the checks and still pass them. I.e. Debug info should not change
codegen. Combining this with a strip-debug pass should enable this. The
main issue I ran into without the strip-debug pass was instructions with MMO's and
checks on both the instruction and the MMO as the debug-location is
between them. I currently have a simple hack in the MIRPrinter to
resolve that but the more general solution is a proper strip-debug pass.
* We should be able to test that GlobalISel does not lose debug info. I
recently found that the legalizer can be unexpectedly lossy in seemingly
simple cases (e.g. expanding one instr into many). I have a verifier
(will be posted separately) that can be integrated with passes that use
the observer interface and will catch location loss (it does not verify
correctness, just that there's zero lossage). It is a little conservative
as the line-0 locations that arise from conflicts do not track the
conflicting locations but it can still catch a fair bit.
Depends on D77439, D77438
Reviewers: aprantl, bogner, vsk
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77446
dso_local leads to direct access even if the definition is not within this compilation unit (it is
still in the same linkage unit). On ELF, such a relocation (e.g. R_X86_64_PC32) referencing a
STB_GLOBAL STV_DEFAULT object can cause a linker error in a -shared link.
If the linkage is changed to available_externally, the dso_local flag should be dropped, so that no
direct access will be generated.
The current behavior is benign, because -fpic does not assume dso_local
(clang/lib/CodeGen/CodeGenModule.cpp:shouldAssumeDSOLocal).
If we do that for -fno-semantic-interposition (D73865), there will be an
R_X86_64_PC32 linker error without this patch.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D74751
This patch updates the code that deals with conditions from predicate
info to make use of constant ranges.
For ssa_copy instructions inserted by PredicateInfo, we have 2 ranges:
1. The range of the original value.
2. The range imposed by the linked condition.
1. is known, 2. can be determined using makeAllowedICmpRegion. The
intersection of those ranges is the range for the copy.
With this patch, we get a nice increase in the number of instructions
eliminated by both SCCP and IPSCCP for some benchmarks:
For MultiSource, SPEC2000 & SPEC2006:
Tests: 237
Same hash: 170 (filtered out)
Remaining: 67
Metric: sccp.NumInstRemoved
Program base patch diff
test-suite...Source/Benchmarks/sim/sim.test 10.00 71.00 610.0%
test-suite...CFP2000/177.mesa/177.mesa.test 361.00 1626.00 350.4%
test-suite...encode/alacconvert-encode.test 141.00 602.00 327.0%
test-suite...decode/alacconvert-decode.test 141.00 602.00 327.0%
test-suite...CI_Purple/SMG2000/smg2000.test 1639.00 4093.00 149.7%
test-suite...peg2/mpeg2dec/mpeg2decode.test 75.00 163.00 117.3%
test-suite...T2006/401.bzip2/401.bzip2.test 358.00 513.00 43.3%
test-suite...rks/FreeBench/pifft/pifft.test 11.00 15.00 36.4%
test-suite...langs-C/unix-tbl/unix-tbl.test 4.00 5.00 25.0%
test-suite...lications/sqlite3/sqlite3.test 541.00 667.00 23.3%
test-suite.../CINT2000/254.gap/254.gap.test 243.00 299.00 23.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 25.00 29.00 16.0%
test-suite...marks/7zip/7zip-benchmark.test 1135.00 1304.00 14.9%
test-suite...lications/ClamAV/clamscan.test 1105.00 1268.00 14.8%
test-suite...urce/Applications/lua/lua.test 398.00 436.00 9.5%
Metric: sccp.IPNumInstRemoved
Program base patch diff
test-suite...C/CFP2000/179.art/179.art.test 1.00 3.00 200.0%
test-suite...006/447.dealII/447.dealII.test 429.00 1056.00 146.2%
test-suite...nch/fourinarow/fourinarow.test 3.00 7.00 133.3%
test-suite...CI_Purple/SMG2000/smg2000.test 818.00 1748.00 113.7%
test-suite...ks/McCat/04-bisect/bisect.test 3.00 5.00 66.7%
test-suite...CFP2000/177.mesa/177.mesa.test 165.00 255.00 54.5%
test-suite...ediabench/gsm/toast/toast.test 18.00 27.00 50.0%
test-suite...telecomm-gsm/telecomm-gsm.test 18.00 27.00 50.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 24.00 35.00 45.8%
test-suite...TimberWolfMC/timberwolfmc.test 43.00 62.00 44.2%
test-suite...encode/alacconvert-encode.test 46.00 66.00 43.5%
test-suite...decode/alacconvert-decode.test 46.00 66.00 43.5%
test-suite...langs-C/unix-tbl/unix-tbl.test 12.00 17.00 41.7%
test-suite...peg2/mpeg2dec/mpeg2decode.test 31.00 41.00 32.3%
test-suite.../CINT2000/254.gap/254.gap.test 117.00 154.00 31.6%
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D76611
Now that we have scalable vectors, there's a distinction that isn't
getting captured in the original SequentialType: some vectors don't have
a known element count, so counting the number of elements doesn't make
sense.
In some cases, there's a better way to express the commonality using
other methods. If we're dealing with GEPs, there's GEP methods; if we're
dealing with a ConstantDataSequential, we can query its element type
directly.
In the relatively few remaining cases, I just decided to write out
the type checks. We're talking about relatively few places, and I think
the abstraction doesn't really carry its weight. (See thread "[RFC]
Refactor class hierarchy of VectorType in the IR" on llvmdev.)
Differential Revision: https://reviews.llvm.org/D75661
Summary:
In some cases, ASan may insert instrumentation before function arguments
have been stored into their allocas. This causes two issues:
1) The argument value must be spilled until it can be stored into the
reserved alloca, wasting a stack slot.
2) Until the store occurs in a later basic block, the debug location
will point to the wrong frame offset, and backtraces will show an
uninitialized value.
The proposed solution is to move instructions which initialize allocas
for arguments up into the entry block, before the position where ASan
starts inserting its instrumentation.
For the motivating test case, before the patch we see:
```
| 0033: movq %rdi, 0x68(%rbx) | | DW_TAG_formal_parameter |
| ... | | DW_AT_name ("a") |
| 00d1: movq 0x68(%rbx), %rsi | | DW_AT_location (RBX+0x90) |
| 00d5: movq %rsi, 0x90(%rbx) | | ^ not correct ... |
```
and after the patch we see:
```
| 002f: movq %rdi, 0x70(%rbx) | | DW_TAG_formal_parameter |
| | | DW_AT_name ("a") |
| | | DW_AT_location (RBX+0x70) |
```
rdar://61122691
Reviewers: aprantl, eugenis
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77182
Summary:
It can be helpful to test behaviour w.r.t locations without having DEBUG_VALUE
around. In particular, because DEBUG_VALUE has the potential to change CodeGen
behaviour (e.g. hasOneUse() vs hasOneNonDbgUse()) while locations generally
don't.
Reviewers: aprantl, bogner
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77438
The patch introduces the system to distinctively store the information
needed for the Control Flow Graph as well as the instrumentary needed for
the follow-up changes: BlockFrequencyInfo and BranchProbabilityInfo.
The patch is a part of sequence of three patches, related to graphs Heat Coloring.
Reviewers: rcorcs, apilipenko, davidxl, sfertile, fedor.sergeev, eraman, bollu
Differential Revision: https://reviews.llvm.org/D76820
This patch moves calls to their own recipe, to simplify the transition
to VPUser for operands of VPWidenRecipe, as discussed in D76992.
Subsequently additional information can be added to the recipe rather
than computing it during the execute step.
Reviewers: rengolin, Ayal, gilr, hsaito
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D77467
Summary:
In D77454 we explain that `LoadInst` and `StoreInst` always have their alignment defined.
This allows to work backward here and to infer that `getNewAlignment` does not need to return `0` in case of failure.
Returning `1` also works since it needs to be greater than the Load/Store alignment which is a least `1`.
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77538
Summary:
This is a roll forward of D77394 minus AlignmentFromAssumptions (which needs to be addressed separately)
Differences from D77394:
- DebugStr() now prints the alignment value or `None` and no more `Align(x)` or `MaybeAlign(x)`
- This is to keep Warning message consistent (CodeGen/SystemZ/alloca-04.ll)
- Removed a few unneeded headers from Alignment (since it's included everywhere it's better to keep the dependencies to a minimum)
Reviewers: courbet
Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77537
This patch adds a -matrix-default-layout option which can be used to
set the default matrix layout to row-major or column-major (default).
The initial patch updates codegen for loads, stores, binary operators
and matrix multiply.
Reviewers: anemet, Gerolf, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D76325
This patch adds initial fusion for load/multiply/store chains of matrix
operations.
The patch contains roughly two parts:
1. Code generation for a fused load/multiply/store chain (LowerMatrixMultiplyFused).
First, we ensure that both loads of the multiply operands do not alias the store.
If they do, we create new non-aliasing copies of the operands. Note that this
may introduce new basic block. Finally we process TileSize x TileSize blocks.
That is: load tiles from the input operands, multiply and store them.
2. Identify fusion candidates & matrix instructions.
As a first step, collect all instructions with shape info and fusion candidates
(currently @llvm.matrix.multiply calls). Next, try to fuse candidates and
collect instructions eliminated by fusion. Finally iterate over all matrix
instructions, skip the ones eliminated by fusion and lower the rest as usual.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D75566
The new and old pass managers (PassManagerBuilder.cpp and
PassBuilder.cpp) are exposed to an `extern` declaration of
`attributor-disable` option which will guard the addition of the
attributor passes to the pass pipelines.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D76871
This patch builds upon D76140 by updating metadata on pointer typed
loads in inlined functions, when the load is the return value, and the
callsite contains return attributes which can be updated as metadata on
the load.
Added test cases show this for nonnull, dereferenceable,
dereferenceable_or_null
Reviewed-By: jdoerfert
Differential Revision: https://reviews.llvm.org/D76792
As discussed in D76983, that patch can turn a chain of insert/extract
with scalar trunc ops into bitcast+extract and existing instcombine
vector transforms end up creating a shuffle out of that (see the
PhaseOrdering test for an example). Currently, that process requires
at least this sequence: -instcombine -early-cse -instcombine.
Before D76983, the sequence of insert/extract would reach the SLP
vectorizer and become a vector trunc there.
Based on a small sampling of public targets/types, converting the
shuffle to a trunc is better for codegen in most cases (and a
regression of that form is the reason this was noticed). The trunc is
clearly better for IR-level analysis as well.
This means that we can induce "spontaneous vectorization" without
invoking any explicit vectorizer passes (at least a vector cast op
may be created out of scalar casts), but that seems to be the right
choice given that we started with a chain of insert/extract, and the
backend would expand back to that chain if a target does not support
the op.
Differential Revision: https://reviews.llvm.org/D77299
Query AAValueSimplify on pointers in memory accessing instructions to take
advantage of the constant propagation (or any other value simplification) of such values.
After 49d00824bb, VPWidenRecipe only stores a single instruction.
tryToWiden can simply return the widen recipe, like other helpers in
VPRecipeBuilder.