Summary:
While it is uncommon that the ExternalCallingNode needs to be updated,
it can happen. It is uncommon because most functions listed as callees
have external linkage, modifying them is usually not allowed. That said,
there are also internal functions that have, or better had, their
"address taken" at construction time. We conservatively assume various
uses cause the address "to be taken". Furthermore, the user might have
become dead at some point. As a consequence, transformations, e.g., the
Attributor, might be able to replace a function that is listed
as callee of the ExternalCallingNode.
Since there is no function corresponding to the ExternalCallingNode, we
did just remove the node from the callee list if we replaced it (so
far). Now it would be preferable to replace it if needed and remove it
otherwise. However, removing the node has implications on the CGSCC
iteration. Locally, that caused some other nodes to be never visited
but it is for sure possible other (bad) side effects can occur. As it
seems conservatively safe to keep the new node in the callee list we
will do that for now.
Reviewers: lebedev.ri, hfinkel, fhahn, probinson, wristow, loladiro, sstefan1, uenoku
Subscribers: hiraditya, bollu, uenoku, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77854
Summary:
The renaming is necessary to make the naming scheme uniform with other
gather/scatter load/stores SVE intrinsics.
The naming of variables and functions have been adapted to make it
explicit whether we are dealing with a scalar offset (which is
unscaled) or an index (which is scaled according to the data type of
the lanes of the vector).
Reviewers: andwar, sdesmalen, rengolin
Reviewed By: andwar
Subscribers: tschuett, hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77839
This commit fixes using functions in `IRObjectFile` to load bitcode from
wasm objects by recognizing the file magic for wasm and also inheriting
the default implementation of classifying sections as bitcode.
Patch By: alexcrichton
Differential Revision: https://reviews.llvm.org/D78199
The current strategy LICM uses when sinking for debuginfo is
that of picking the debug location of one of the uses.
This causes stepping to be wrong sometimes, see, e.g. PR45523.
This patch introduces a generalization of getMergedLocation(),
that operates on a vector of locations instead of two, and try
to merge all them together, and use the new API in LICM.
<rdar://problem/61750950>
MCExpr has a bunch of free space that is currently going to waste.
Repurpose it as 24 bits of subclass data, which is enough to reduce
the size of all subclasses by 8 bytes. This gives us some respectable
savings for debuginfo builds. Here are the max-rss reductions for the
fat LTO link step:
kc.link 238MiB 231MiB (-2.82%)
sqlite3.link 258MiB 250MiB (-3.27%)
consumer-typeset.link 152MiB 148MiB (-2.51%)
bullet.link 197MiB 192MiB (-2.30%)
tramp3d-v4.link 578MiB 567MiB (-1.92%)
pairlocalalign.link 92MiB 90MiB (-1.98%)
clamscan.link 230MiB 223MiB (-2.81%)
lencod.link 242MiB 235MiB (-2.67%)
SPASS.link 235MiB 230MiB (-2.23%)
7zip-benchmark.link 450MiB 435MiB (-3.25%)
Differential Revision: https://reviews.llvm.org/D77939
We generally only combine starting from users to defs in the artifact combiner,
but this doesn't catch cases where at the point of combining a G_UNMERGE we don't
yet have the opposite G_MERGE on input yet since we haven't legalized that far.
This change adds the users of a G_MERGE to the artifact combiner worklist if one
of the uses is a G_UNMERGE or G_TRUNC.
Differential Revision: https://reviews.llvm.org/D77931
Summary:
The combine for unmerge(cast(merge)) is only valid for vectors, but was
missing a corresponding check. Add a check that the operands are vectors
to avoid an invalid combine.
Without this check, the combiner would emit incorrect code for scalars
and pointers because the artifact cast (trunc/ext) only affects bits at
the end of the type, while this combine assumes that the casted bits
appear between meaningful bits.
This also uncovered a segmentation fault in the AMDGPU
InstructionSelector. The tests triggering this bug have been moved to
their own file and a check for the segmentation fault has been added.
Reviewers: arsenm, dsanders, aemerson, paquette, aditya_nandakumar
Reviewed By: arsenm
Subscribers: tpr, jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78191
Summary:
As a follow up to https://reviews.llvm.org/D29014, add translation
support for freeze.
Introduce a new generic instruction G_FREEZE and translate freeze to it.
Reviewers: dsanders, aqjune, arsenm, aditya_nandakumar, t.p.northover, lebedev.ri, paquette, aemerson
Reviewed By: aqjune, arsenm
Subscribers: fhahn, lebedev.ri, wdng, rovka, hiraditya, jfb, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77795
Summary:
This patch is to fix the parsing of long double literals encoded with the e prefix on PowerPC and S390. For both PowerPC and S390, type code e is used for 64-bit long double literals and g is used for 128-bit long double literals. libcxxabi test case test_demangle.pass.cpp fails without the fix.
Authored by: xingxue-ibm
Reviewers: hubert.reinterpretcast, jasonliu, erik.pilkington, uweigand, mclow.li
sts, libc++abi
Reviewed by: hubert.reinterpretcast, erik.pilkington
Differential Revision: https://reviews.llvm.org/D74163
Summary:
No error or warning is emitted when specific reserved registers are
written to in inline assembly. Therefore, writes to the program counter
or to the frame pointer, for instance, were permitted, which could have
led to undesirable behaviour.
Example:
int foo() {
register int a __asm__("r7"); // r7 = frame-pointer in M-class ARM
__asm__ __volatile__("mov %0, r1" : "=r"(a) : : );
return a;
}
In contrast, GCC issues an error in the same scenario.
This patch detects writes to specific reserved registers in inline
assembly for ARM and emits an error in such case. The detection works
for output and input operands. Clobber operands are not handled here:
they are already covered at a later point in
AsmPrinter::emitInlineAsm(const MachineInstr *MI). The registers
covered are: program counter, frame pointer and base pointer.
This is ARM only. Therefore the implementation of other targets'
counterparts remain open to do.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76848
To simplify future work on statepoint representation, hide
direct access to statepoint field indices and provide getters
for them. Add getters for couple more statepoint fields.
This also fixes two bugs in MachineVerifier for statepoint:
First, the `break` statement was falling out of `if` statement
scope, thus disabling following checks.
Second, it was incorrectly accessing some fields like CallingConv -
StatepointOpers gives index to their value directly, not to
preceeding field type encoding.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D78119
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.
This is a useful workaround for a limitation in the structurizer
which, which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D77198
This restores commit 2ada8e2525.
Originally reverted with commit 44e09b59b8.
This reverts commit 2ada8e2525.
Buildbots produced compilation errors which I was not able to quickly
reproduce locally. Need more time to investigate.
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.
This is a useful workaround for a limitation in the structurizer
which, which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D77198
This is a minor NFC change to make the code more clear. We have the NegatibleCost that
has cheaper, neutral, and expensive. Typically, the smaller one means the less cost.
It is inverse for current implementation, which makes following code not easy to read.
If (CostX > CostY) negate(X)
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D77993
Summary:
This revision adds two utilities currently present in MLIR to LLVM StringExtras:
* convertToSnakeFromCamelCase
Convert a string from a camel case naming scheme, to a snake case scheme
* convertToCamelFromSnakeCase
Convert a string from a snake case naming scheme, to a camel case scheme
Differential Revision: https://reviews.llvm.org/D78167
Summary:
Currently, the internal options -vectorize-loops, -vectorize-slp, and
-interleave-loops do not have much practical effect. This is because
they are used to initialize the corresponding flags in the pass
managers, and those flags are then unconditionally overwritten when
compiling via clang or via LTO from the linkers. The only exception was
-vectorize-loops via opt because of some special hackery there.
While vectorization could still be disabled when compiling via clang,
using -fno-[slp-]vectorize, this meant that there was no way to disable
it when compiling in LTO mode via the linkers. This only affected
ThinLTO, since for regular LTO vectorization is done during the compile
step for scalability reasons. For ThinLTO it is invoked in the LTO
backends. See also the discussion on PR45434.
This patch makes it so the internal options can actually be used to
disable these optimizations. Ultimately, the best long term solution is
to mark the loops with metadata (similar to the approach used to fix
-fno-unroll-loops in D77058), but this enables a shorter term
workaround, and actually makes these internal options useful.
I constant propagated the initial values of these internal flags into
the pass manager flags (for some reasons vectorize-loops and
interleave-loops were initialized to true, while vectorize-slp was
initialized to false). As mentioned above, they are overwritten
unconditionally so this doesn't have any real impact, and these initial
values aren't particularly meaningful.
I then changed the passes to check the internl values and return without
performing the associated optimization when false (I changed the default
of -vectorize-slp to true so the options behave similarly). I was able
to remove the hackery in opt used to get -vectorize-loops=false to work,
as well as a special option there used to disable SLP vectorization.
Finally, I changed thinlto-slp-vectorize-pm.c to:
a) Only test SLP (moved the loop vectorization checking to a new test).
b) Use code that is slp vectorized when it is enabled, and check that
instead of whether the pass is enabled.
c) Test the new behavior of -vectorize-slp.
d) Test both pass managers.
The loop vectorization (and associated interleaving) testing I moved to
a new thinlto-loop-vectorize-pm.c test, with several changes:
a) Changed the flags on the interleaving testing so that it will
actually interleave, and check that.
b) Test the new behavior of -vectorize-loops and -interleave-loops.
c) Test both pass managers.
Reviewers: fhahn, wmi
Subscribers: hiraditya, steven_wu, dexonsmith, cfe-commits, davezarzycki, llvm-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77989
Previously, getWithOffset() would drop the offset if the base was null.
Because of this, MachineMemOperand would return the wrong result from
getAlign() in these cases. MachineMemOperand stores the alignment of
the pointer without the offset.
A bunch of MIR tests changed because we print the offset now.
Split off from D77687.
Differential Revision: https://reviews.llvm.org/D78049
This class implements a switch-like dispatch statement for a value of 'T' using dyn_cast functionality. Each `Case<T>` takes a callable to be invoked if the root value isa<T>, the callable is invoked with the result of dyn_cast<T>() as a parameter.
Differential Revision: https://reviews.llvm.org/D78070
These have proved incredibly useful for interleaving values between a range w.r.t to streams. After this revision, the mlir/Support/STLExtras.h is empty. A followup revision will remove it from the tree.
Differential Revision: https://reviews.llvm.org/D78067
This revision moves the various range utilities present in MLIR to LLVM to enable greater reuse. This revision moves the following utilities:
* indexed_accessor_*
This is set of utility iterator/range base classes that allow for building a range class where the iterators are represented by an object+index pair.
* make_second_range
Given a range of pairs, returns a range iterating over the `second` elements.
* hasSingleElement
Returns if the given range has 1 element. size() == 1 checks end up being very common, but size() is not always O(1) (e.g., ilist). This method provides O(1) checks for those cases.
Differential Revision: https://reviews.llvm.org/D78064
This revision moves several type_trait utilities from MLIR into LLVM. Namely, this revision adds:
is_detected - This matches the experimental std::is_detected
is_invocable - This matches the c++17 std::is_invocable
function_traits - A utility traits class for getting the argument and result types of a callable type
Differential Revision: https://reviews.llvm.org/D78059
This revision adds a DenseMapInfo overload for std::tuples whose elements all have a DenseMapInfo. The implementation is similar to that of std::pair, and has been used within MLIR for over a year.
Differential Revision: https://reviews.llvm.org/D78057
This should make both static and dynamic NewPM plugins work with LTO.
And as a bonus, it makes static linking of OldPM plugins more reliable
for plugins with both an OldPM and NewPM interface.
I only implemented the command-line flag to specify NewPM plugins in
llvm-lto2, to show it works. Support can be added for other tools later.
Differential Revision: https://reviews.llvm.org/D76866
GCC emits this new form along with others forms(supported in llvm-dwardump)
and since it's support was missing in llvm-dwarfdump, it was not
able to correctly dump the content a debug_macro section for GCC
generated binaries.
This patch extends llvm-dwarfdump to support this form,
now GCC generated debug_macro section can be correctly dumped
using llvm-dwarfdump.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D78006
It can be used to avoid passing the begin and end of a range.
This makes the code shorter and it is consistent with another
wrappers we already have.
Differential revision: https://reviews.llvm.org/D78016
ModuleSummaryAnalysis is the only file in libAnalysis that brings a
dependency on the CodeGen layer from libAnalysis, moving it breaks this
dependency.
Differential Revision: https://reviews.llvm.org/D77994
This is equivalent in terms of LLVM IR semantics, but we want to
transition away from using MaybeAlign to represent the alignment of
these instructions.
Differential Revision: https://reviews.llvm.org/D77984
This patch extracts the RTTI part of llvm::ErrorInfo into its own class
(RTTIExtends) so that it can be used in other non-error hierarchies, and makes
it compatible with the existing LLVM RTTI function templates (isa, cast,
dyn_cast, dyn_cast_or_null) by adding the classof method.
Differential Revision: https://reviews.llvm.org/D39111
In the case of more than one SDep between two successor SUnits in the Nodeset, the current implementation sums the latencies of the dependencies, which could create a larger RecMII than necessary.
for example, in case there is both a data dependency and an output dependency (with latency > 0) between successor nodes:
SU(1) inst1:
successors:
SU(2): out latency = 1
SU(2): data latency = 1
SU(2) inst2:
successors:
SU(3): out latency = 1
SU(3): data latency = 1
SU(3) inst3:
successors:
SU(1): out latency = 1
SU(1): data latency = 1
the NodeSet latency returned would be 6, whereas it could be 3 if we take the max for each successor SUnit.
In general this can be extended to finding the shortest path in the recurrence..
thoughts?
Unfortunately I had a hard time creating a test for this in Hexagon/PowerPC, so help would be appreciated.
Reviewed By: bcahoon
Differential Revision: https://reviews.llvm.org/D75918