D75801 removed the last and only user of this option, so we can
drop it now. The original idea behind this was to only run expensive
transforms under -O3, but apart from the one known bits transform,
this has never really taken off. I believe nowadays the recommendation
is to put expensive transforms in AggressiveInstCombine instead,
though that isn't terribly popular either :)
Differential Revision: https://reviews.llvm.org/D76540
This is the Waymarking algorithm implemented as an independent utility.
The utility is operating on a range of sequential elements.
First we "tag" the elements, by calling `fillWaymarks`.
Then we can "follow" the tags from every element inside the tagged
range, and reach the "head" (the first element), by calling
`followWaymarks`.
Differential Revision: https://reviews.llvm.org/D74415
Summary:
DataLayout::getTypeStoreSize() returns TypeSize.
For cases where it can not be scalable vector (e.g., GlobalVariable),
explicitly call TypeSize::getFixedSize().
For cases where scalable property doesn't matter, (e.g., check for
zero-sized type), use TypeSize::isNonZero().
Reviewers: sdesmalen, efriedma, apazos, reames
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76454
advanceToLowerBound moves an iterator to the first bit set at, or after,
the given index. This can be faster than doing IntervalMap::find.
rdar://60046261
Differential Revision: https://reviews.llvm.org/D76466
Avoid making a heap allocation when constructing a CoalescingBitVector.
This reduces time spent in LiveDebugValues when compiling sqlite3 by
700ms (0.5% of the total User Time).
rdar://60046261
Differential Revision: https://reviews.llvm.org/D76465
The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range.
This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).
Summary:
I've implemented them as target-specific IR intrinsics rather than
using `@llvm.experimental.vector.reduce.add`, on the grounds that the
'experimental' intrinsic doesn't currently have much code generation
benefit, and my replacements encapsulate the sign- or zero-extension
so that you don't expose the illegal MVE vector type (`<4 x i64>`) in
IR.
The machine instructions come in two versions: with and without an
input accumulator. My new IR intrinsics, like the 'experimental' one,
don't take an accumulator parameter: we represent that by just adding
on the input value using an ordinary i32 or i64 add. So if you write
the `vaddvaq` C-language intrinsic with an input accumulator of zero,
it can be optimised to VADDV, and conversely, if you write something
like `x += vaddvq(y)` then that can be combined into VADDVA.
Most of this is achieved in isel lowering, by converting these IR
intrinsics into the existing `ARMISD::VADDV` family of custom SDNode
types. For the difficult case (64-bit accumulators), isel lowering
already implements the optimization of folding an addition into a
VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is
already done, except that I had to introduce a parallel set of ARMISD
nodes for the //predicated// forms of VADDLV.
For the simpler VADDV, we handle the predicated form by just leaving
the IR intrinsic alone and matching it in an ordinary dag pattern.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76491
Summary:
I've implemented these as target-specific IR intrinsics, because
they're not //quite// enough like @llvm.experimental.vector.reduce.min
(which doesn't take the extra scalar parameter). Also this keeps the
predicated and unpredicated versions looking similar, and the
floating-point minnm/maxnm versions fold into the same schema.
We had a couple of min/max reductions already implemented, from the
initial pathfinding exercise in D67158. Those were done by having
separate IR intrinsic names for the signed and unsigned integer
versions; as part of this commit, I've changed them to use a flag
parameter indicating signedness, which is how we ended up deciding
that the rest of the MVE intrinsics family ought to work. So now
hopefully the ewhole lot is consistent.
In the new llc test, the output code from the `v8f16` test functions
looks quite unpleasant, but most of it is PCS lowering (you can't pass
a `half` directly in or out of a function). In other circumstances,
where you do something else with your `half` in the same function, it
doesn't look nearly as nasty.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76490
Summary:
This patch implements the following intrinsics:
uint8x16_t __arm_vcx1q_u8 (int coproc, uint32_t imm);
T __arm_vcx1qa(int coproc, T acc, uint32_t imm);
T __arm_vcx2q(int coproc, T n, uint32_t imm);
uint8x16_t __arm_vcx2q_u8(int coproc, T n, uint32_t imm);
T __arm_vcx2qa(int coproc, T acc, U n, uint32_t imm);
T __arm_vcx3q(int coproc, T n, U m, uint32_t imm);
uint8x16_t __arm_vcx3q_u8(int coproc, T n, U m, uint32_t imm);
T __arm_vcx3qa(int coproc, T acc, U n, V m, uint32_t imm);
Most of them are polymorphic. Furthermore, some intrinsics are
polymorphic by 2 or 3 parameter types, such polymorphism is not
supported by the existing MVE/CDE tablegen backends, also we don't
really want to have a combinatorial explosion caused by 1000 different
combinations of 3 vector types. Because of this some intrinsics are
implemented as macros involving a cast of the polymorphic arguments to
uint8x16_t.
The IR intrinsics are even more restricted in terms of types: all MVE
vectors are cast to v16i8.
Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76299
Summary:
This change implements ACLE CDE intrinsics that translate to
instructions working with general-purpose registers.
The specification is available at
https://static.docs.arm.com/101028/0010/ACLE_2019Q4_release-0010.pdf
Each ACLE intrinsic gets a corresponding LLVM IR intrinsic (because
they have distinct function prototypes). Dual-register operands are
represented as pairs of i32 values. Because of this the instruction
selection for these intrinsics cannot be represented as TableGen
patterns and requires custom C++ code.
Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76296
This reverts commit e9f22fd429.
When building with -DLLVM_USE_SANITIZER="Thread", check-llvm has 70
failing tests with this revision, and 29 without this revision.
(would be nice to revisit the CFG traits and change them to use ranges
rather than begin/end - if anyone wants to do that refactor)
Also use more auto because writing the names of range utilty iterators
isn't helping readability here - they're sort of implementation details
for the most part, especially once you nest a few different filtering
and adapting iterators.
The fix (shooting from the hip since I couldn't reproduce this locally)
was to capture by value in a lambda used in a filtering iterator -
because the iterator would persist beyond the lifetime of the function
(as the iterators are returned to callers).
Originally committed in 79a7ed92a9.
This was reverted in 4a7f2032a3.
Summary:
1. FileLineInfoSpecifier::Default isn't the default for anything.
Rename to RawValue, which accurately reflects its role.
2. Most functions that take a part of a FileLineInfoSpecifier end up
constructing a full one later or plumb two values through. Make them
all just take a complete FileLineInfoSpecifier.
3. Printing basenames only was handled differently from all other
variants, make it parallel to all the other variants.
Reviewers: jhenderson
Subscribers: hiraditya, MaskRay, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76394
Port over the following:
- shuffle undef, undef, any_mask -> undef
- shuffle anything, anything, undef_mask -> undef
This sort of thing shows up a lot when you try to bugpoint code containing
shufflevector.
Differential Revision: https://reviews.llvm.org/D76382
Updates the object buffer ownership scheme in jitLinkForOrc and related
functions: Ownership of both the object::ObjectFile and underlying
MemoryBuffer is passed into jitLinkForOrc and passed back to the onEmit
callback once linking is complete. This avoids the use-after-free errors
that were seen in 98f2bb4461.
In MachOPlatform, obtaining the link-order for a JITDylib requires locking the
session, but also needs to be part of a larger atomic operation that collates
initializer symbols tracked by the platform. Trying to do this under a separate
platform mutex leads to potential locking order issues, e.g.
T1 locks session then tries to lock platform to register a new init symbol
meanwhile
T2 locks platform then tries to lock session to obtain link order.
Removing the platform lock and performing all these operations under the session
lock eliminates this possibility.
At the same time we also need to collate init pointers from the
MachOPlatform::InitScraperPlugin, and we don't need or want to lock the session
for that. The new InitSeqMutex has been added to guard these init pointers, and
the session mutex is never obtained while the InitSeqMutex is held.
Check the path length limit against the length of the UTF-16 version of
the input rather than the UTF-8 equivalent, as the UTF-16 length may be
shorter. Move widenPath from the llvm::sys::path namespace in Path.h to
the llvm::sys::windows namespace in WindowsSupport.h. Only use the
reduced path length limit for create directory. Canonicalize using
sys::path::remove_dots().
Differential Revision: https://reviews.llvm.org/D75372
Summary:
In order to keep the names consistent with other SVE gather loads, the
intrinsics for gather prefetch are renamed as follows:
* @llvm.aarch64.sve.gather.prfb -> @llvm.aarch64.sve.prfb.gather
Reviewed by: fpetrogalli
Differential Revision: https://reviews.llvm.org/D76421
Summary:
* Remove a bunch of asserts checking for unsupported scalable types and
add some more now that they are supported.
* Propagate the scalable flag where necessary.
* Add another `EVT::getExtendedVectorVT` method that takes an
ElementCount parameter.
* Add `EVT::isExtendedScalableVector` and
`EVT::getExtendedVectorElementCount` - latter is currently unused.
Reviewers: sdesmalen, efriedma, rengolin, craig.topper, huntergr
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75672
Summary:
Related to D75672, this patch adds EVT::isFixedLengthVector to determine
if the underlying vector type is of fixed length.
An assert is introduced in EVT::getVectorNumElements that triggers for
types that aren't fixed length. This is currently guarded by a flag
added D75297 that is off by default and has been renamed to the more
generic ENABLE_STRICT_FIXED_SIZE_VECTORS.
Ideally we want to get rid of getVectorNumElements but a quick grep
shows there are >350 uses in lib/CodeGen and 75 in lib/Target/AArch64
alone. All of these probably aren't EVT::getVectorNumElements (some may
be the MVT equivalent), but there are many places to fixup and having
the assert on by default would make the SVE upstreaming effort
difficult.
Reviewers: sdesmalen, efriedma, ctetreau, huntergr, rengolin
Reviewed By: efriedma
Subscribers: mgorny, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76376
(would be nice to revisit the CFG traits and change them to use ranges
rather than begin/end - if anyone wants to do that refactor)
Also use more auto because writing the names of range utilty iterators
isn't helping readability here - they're sort of implementation details
for the most part, especially once you nest a few different filtering
and adapting iterators.
The existence of the class is more confusing than helpful, I think; the
commonality is mostly just "GEP is legal", which can be queried using
APIs on GetElementPtrInst.
Differential Revision: https://reviews.llvm.org/D75660
SelectionDAG CSEs nodes based on their result type and operands, but not their flags. The flags are expected to be intersected when they are CSEd. In SelectionDAGBuilder, for FP nodes we manage both the fast math flags and the nofpexcept flag after the nodes have already been CSEd when they were created with getNode. The management of the fastmath flags before the constrained nodes prevents the nofpexcept management from working correctly.
This commit moves the FMF handling for constrained intrinsics into their visitor and disables the common FMF handling for these nodes.
Differential Revision: https://reviews.llvm.org/D75224
This patch generates TableGen descriptions for the specified register
banks which contain a list of register sizes corresponding to the
available HwModes. The appropriate size is used during codegen according
to the current HwMode. As this HwMode was not available on generation,
it is set upon construction of the RegisterBankInfo class. Targets
simply need to provide the HwMode argument to the
<target>GenRegisterBankInfo constructor.
The RISC-V RegisterBankInfo constructor has been updated accordingly
(plus an unused argument removed).
Differential Revision: https://reviews.llvm.org/D76007
This is fixing up various places that use the implicit
TypeSize->uint64_t conversion.
The new overloads in MemoryLocation.h are already used in various places
that construct a MemoryLocation from a TypeSize, including MemorySSA.
(They were using the implicit conversion before.)
Differential Revision: https://reviews.llvm.org/D76249
This ports some combines from DAGCombiner.cpp which perform some trivial
transformations on instructions with undef operands.
Not having these can make it extremely annoying to find out where we differ
from SelectionDAG by looking at existing lit tests. Without them, we tend to
produce pretty bad code generation when we run into instructions which use
undef operands.
Also remove the nonpow2_store_narrowing testcase from arm64-fallback.ll, since
we no longer fall back on the add.
Differential Revision: https://reviews.llvm.org/D76339
Summary:
This is another set of instructions too complicated to be sensibly
expressed in IR by anything short of a target-specific intrinsic.
Given input vectors a,b, the instruction generates intermediate values
2*(a[0]*b[0]+a[1]+b[1]), 2*(a[2]*b[2]+a[3]+b[3]), etc; takes the high
half of each double-width values, and overwrites half the lanes in the
output vector c, which you therefore have to provide the input value
of. Optionally you can swap the elements of b so that the are things
like a[0]*b[1]+a[1]*b[0]; optionally you can round to nearest when
taking the high half; and optionally you can take the difference
rather than sum of the two products. Finally, saturation is applied
when converting back to a single-width vector lane.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76359
Summary:
As noted in [[ https://bugs.llvm.org/show_bug.cgi?id=45201 | PR45201 ]],
[[ https://bugs.llvm.org/show_bug.cgi?id=10090 | PR10090 ]] SCEV doesn't
always avoid recursive algorithms, and that causes issues with
large expression depths and/or smaller stack sizes.
In `SCEVExpander::isHighCostExpansion*()` case, the refactoring to avoid
recursion is rather idiomatic. We simply need to place the root expr
into a vector, and iterate over vector elements accounting for the cost
of each one, adding new exprs at the end of the vector,
thus achieving recursion-less traversal.
The order in which we will visit exprs doesn't matter here,
so we will be fine with the most basic approach of using SmallVector
and inserting/extracting from the back, which accidentally is the same
depth-first traversal that we were doing previously recursively.
Reviewers: mkazantsev, reames, wmi, ekatz
Reviewed By: mkazantsev
Subscribers: hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76273
When optimising for code size at the expense of performance, it is often
worth saving and restoring some of r0-r3, if IPRA will be able to take
advantage of them. This doesn't cost any extra code size if we already
have a PUSH/POP pair, and increases the number of available registers
across any calls to the function.
We already have an optimisation which tries fold the subtract/add of the
SP into the PUSH/POP by using extra registers, which somewhat conflicts
with this. I've made the new optimisation less aggressive in cases where
the existing one is likely to trigger, which gives better results than
either of these optimisations by themselves.
Differential revision: https://reviews.llvm.org/D69936
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76348
Summary:
This fixes a discrepancy between the non-temporal loads/store
intrinsics and other SVE load intrinsics (such as nf/ff), so
that Clang can use the same code to generate these intrinsics.
Reviewers: andwar, kmclaughlin, rengolin, efriedma
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76237
Summary:
In D67768/D67492 I added support for entry values having blocks larger
than one byte, but I now noticed that the DIE implementation I added there
was broken. The takeNodes() function, that moves the entry value block
from a temporary buffer to the output buffer, would destroy the input
iterator when transferring the first node, meaning that only that node
was moved.
In practice, this meant that when emitting a call site value using a
DW_OP_entry_value operation with a DWARF register number larger than 31,
that multi-byte DW_OP_regx expression would be truncated.
Reviewers: djtodoro, aprantl, vsk
Reviewed By: djtodoro
Subscribers: llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D76279
Summary:
These are complicated integer multiply+add instructions with extra
saturation, taking the high half of a double-width product, and
optional rounding. There's no sensible way to represent that in
standard IR, so I've converted the clang builtins directly to
target-specific intrinsics.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76123
Summary:
These instructions compute multiply+add in integers, with one of the
operands being a splat of a scalar. (VMLA and VMLAS differ in whether
the splat operand is a multiplier or the addend.)
I've represented these in IR using existing standard IR operations for
the unpredicated forms. The predicated forms are done with target-
specific intrinsics, as usual.
When operating on n-bit vector lanes, only the bottom n bits of the
i32 scalar operand are used. So we have to tell that to isel lowering,
to allow it to remove a pointless sign- or zero-extension instruction
on that input register. That's done in `PerformIntrinsicCombine`, but
first I had to enable `PerformIntrinsicCombine` for MVE targets
(previously all the intrinsics it handled were for NEON), and make it
a method of `ARMTargetLowering` so that it can get at
`SimplifyDemandedBits`.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76122
RDF is designed to be target agnostic. Therefore it would be useful to have it available for other targets, such as X86.
Based on a previous patch by Krzysztof Parzyszek
Differential Revision: https://reviews.llvm.org/D75932
Summary: Prevent InstCombine from removing llvm.assume for which the arguement is true when they have operand bundles with usefull information.
Reviewers: jdoerfert, nikic, lebedev.ri
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76147
MCTargetOptionsCommandFlags.inc and CommandFlags.inc are headers which contain
cl::opt with static storage.
These headers are meant to be incuded by tools to make it easier to parametrize
codegen/mc.
However, these headers are also included in at least two libraries: lldCommon
and handle-llvm. As a result, when creating DYLIB, clang-cpp holds a reference
to the options, and lldCommon holds another reference. Linking the two in a
single executable, as zig does[0], results in a double registration.
This patch explores an other approach: the .inc files are moved to regular
files, and the registration happens on-demand through static declaration of
options in the constructor of a static object.
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1756977#c5
Differential Revision: https://reviews.llvm.org/D75579
With -fstack-protector-strong we check if a non-array variable has its address
taken in a way that could cause a potential out-of-bounds access. However what
we don't catch is when the address is directly used to create an out-of-bounds
memory access.
Fix this by examining the offsets of GEPs that are ultimately derived from
allocas and checking if the resulting address is out-of-bounds, and by checking
that any memory operations using such addresses are not over-large.
Fixes PR43478.
Differential revision: https://reviews.llvm.org/D75695
This patch makes `Relocation::Addend` to be `ELFYAML::YAMLIntUInt` and not `int64_t`.
`ELFYAML::YAMLIntUInt` it is a new type and it has the following benefits/features:
1) For an 64-bit object any hex/decimal addends
in the range [INT64_MIN, UINT64_MAX] is accepted.
2) For an 32-bit object any hex/decimal addends
in range [INT32_MIN, UINT32_MAX] is accepted.
3) Negative hex numbers like -0xffffffff are not accepted.
4) It is printed as decimal. I.e. obj2yaml will print
something like "Addend: 125", this matches the current behavior.
This fixes all FIXMEs in `relocation-addend.yaml`.
Differential revision: https://reviews.llvm.org/D75527
Summary:
Run StackSafetyAnalysis at the end of the IR pipeline and annotate
proven safe allocas with !stack-safe metadata. Do not instrument such
allocas in the AArch64StackTagging pass.
Reviewers: pcc, vitalybuka, ostannard
Reviewed By: vitalybuka
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, gilang, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73513
This is the second patch in a series of patches to enable basic block
sections support.
This patch adds support for:
* Creating direct jumps at the end of basic blocks that have fall
through instructions.
* New pass, bbsections-prepare, that analyzes placement of basic blocks
in sections.
* Actual placing of a basic block in a unique section with special
handling of exception handling blocks.
* Supports placing a subset of basic blocks in a unique section.
* Support for MIR serialization and deserialization with basic block
sections.
Parent patch : D68063
Differential Revision: https://reviews.llvm.org/D73674
I used the implementation for floor instead of round. It also turns
out the OpenCL builtin library wasn't using the round builtin, but
implemented the expanded form.
Follow-up for D74433
What the function returns are almost standard BFD names, except that "ELF" is
in uppercase instead of lowercase.
This patch changes "ELF" to "elf" and changes ARM/AArch64 to use their BFD names.
MIPS and PPC64 have endianness differences as well, but this patch does not intend to address them.
Advantages:
* llvm-objdump: the "file format " line matches GNU objdump on ARM/AArch64 objects
* "file format " line can be extracted and fed into llvm-objcopy -O literally.
(https://github.com/ClangBuiltLinux/linux/issues/779 has such a use case)
Affected tools: llvm-readobj, llvm-objdump, llvm-dwarfdump, MCJIT (internal implementation detail, not exposed)
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D76046
Summary:
Truncating the result of a merge means that most likely we could have done without merge in the first place and just used the input merge inputs directly. This can be done in three cases:
1. If the truncation result is smaller than the merge source, we can use the source in the trunc directly
2. If the sizes are the same, we can replace the register or use a copy
3. If the truncation size is a multiple of the merge source size, we can build a smaller merge
This gets rid of most of the larger, hard-to-legalize merges.
Reviewers: qcolombet, aditya_nandakumar, aemerson, paquette, arsenm, Petar.Avramovic
Reviewed By: arsenm
Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, jrtc27, atanasyan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75915
This adds the --debug-vars option to llvm-objdump, which prints
locations (registers/memory) of source-level variables alongside the
disassembly based on DWARF info. A vertical line is printed for each
live-range, with a label at the top giving the variable name and
location, and the position and length of the line indicating the program
counter range in which it is valid.
Currently, this only works for object files, not executables or shared
libraries.
Differential revision: https://reviews.llvm.org/D70720
alignBranches is X86 specific, change the name in a
more general one since other target can do some state
chang before and after emitting the instruction.
Enable use of ExecutionEngine JITEventListeners in RTDyldObjectLinkingLayer.
This allows existing MCJIT clients to more easily migrate to LLJIT / ORCv2.
Example usage in llvm/examples/OrcV2Examples/LLJITWithGDBRegistrationListener.
Differential Revision: https://reviews.llvm.org/D75838
This patch removes compiler runtime assertions that ensure the implicit
conversion are only guaranteed to work for fixed-width vectors.
With the assert it would be impossible to get _anything_ to build until
the
entire codebase has been upgraded, even when the indiscriminate uses of
the size as uint64_t would work fine for both scalable and fixed-width
types.
This issue will need to be addressed differently, with build-time errors
rather than assertion failures, but that effort falls beyond the scope
of this patch.
Returning the scalable size and avoiding the assert in getFixedSize()
is a temporary stop-gap in order to use LLVM for compiling and using
the SVE ACLE intrinsics.
Reviewers: efriedma, huntergr, rovka, ctetreau, rengolin
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75297
This patch adds a new singlecrfromundef lattice value, indicating a
single element constant range which was merge with undef at some point.
Merging it with another constant range results in overdefined, as we
won't be able to replace all users with a single value.
This patch uses a ConstantRange instead of a Constant*, because regular
integer constants are represented as single element constant ranges as
well and this allows the existing code working without additional
changes.
Reviewers: efriedma, nikic, reames, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75845
This is the first in a series of patches to enable Basic Block Sections
in LLVM.
We introduce a new compiler option, -fbasicblock-sections=, which places every
basic block in a unique ELF text section in the object file along with a
symbol labeling the basic block. The linker can then order the basic block
sections in any arbitrary sequence which when done correctly can encapsulate
block layout, function layout and function splitting optimizations. However,
there are a couple of challenges to be addressed for this to be feasible:
1) The compiler must not allow any implicit fall-through between any two
adjacent basic blocks as they could be reordered at link time to be
non-adjacent. In other words, the compiler must make a fall-through
between adjacent basic blocks explicit by retaining the direct jump
instruction that jumps to the next basic block. These branches can only
be removed later by the linker after the blocks have been reordered.
2) All inter-basic block branch targets would now need to be resolved by
the linker as they cannot be calculated during compile time. This is
done using static relocations which bloats the size of the object files.
Further, the compiler tries to use short branch instructions on some ISAs
for branch offsets that can be accommodated in one byte. This is not
possible with basic block sections as the offset is not determined at
compile time, and long branch instructions have to be used everywhere.
3) Each additional section bloats object file sizes by tens of bytes. The
number of basic blocks can be potentially very large compared to the
size of functions and can bloat object sizes significantly. Option
fbasicblock-sections= also takes a file path which can be used to
specify a subset of basic blocks that needs unique sections to keep
the bloats small.
4) Debug Info and CFI need special handling and will be presented as
separate patches.
Basic Block Labels
With -fbasicblock-sections=labels, or when a basic block is placed in a
unique section, it is labelled with a symbol. This allows easy mapping of
virtual addresses from PMU profiles back to the corresponding basic blocks.
Since the number of basic blocks is large, the labeling bloats the symbol
table sizes and the string table sizes significantly. While the binary size
does increase, it does not affect performance as the symbol table is not
loaded in memory during run-time. The string table size bloat is kept very
minimal using a unary naming scheme that uses string suffix compression.
The basic blocks for function foo are named "a.BB.foo", "aa.BB.foo", ...
This turns out to be very good for string table sizes and the bloat in the
string table size for a very large binary is ~8 %. The naming also allows
using the --symbol-ordering-file option in LLD to arbitrarily reorder the
sections.
Differential Revision: https://reviews.llvm.org/D68063
Renames the llvm/examples/LLJITExamples directory to llvm/examples/OrcV2Examples
since it is becoming a home for all OrcV2 examples, not just LLJIT.
See http://llvm.org/PR31103.
This patch adds a new undef lattice state, which is used to represent
UndefValue constants or instructions producing undef.
The main difference to the unknown state is that merging undef values
with constants (or single element constant ranges) produces the
constant/constant range, assuming all uses of the merge result will be
replaced by the found constant.
Contrary, merging non-single element ranges with undef needs to go to
overdefined. Using unknown for UndefValues currently causes mis-compiles
in CVP/LVI (PR44949) and will become problematic once we use
ValueLatticeElement for SCCP.
Reviewers: efriedma, reames, davide, nikic
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75120
Patch based on https://reviews.llvm.org/D75912 by Alexander Shishkin. Thanks
Alexander!
To minimize disruption to existing clients, who may be relying on the fact that
unused references to unresolved symbols do not generate an error, this patch
makes error checking opt-in: Clients can call ExecutionEngine::hasError or
LLVMExecutionEngineGetError to check whether and error has occurred.
Differential revision: https://reviews.llvm.org/D75912
For context, the proposed RISC-V bit manipulation extension has a subset
of instructions which require one of two SubtargetFeatures to be
enabled, 'zbb' or 'zbp', and there is no defined feature which both of
these can imply to use as a constraint either (see comments in D65649).
AssemblerPredicates allow multiple SubtargetFeatures to be declared in
the "AssemblerCondString" field, separated by commas, and this means
that the two features must both be enabled. There is no equivalent to
say that _either_ feature X or feature Y must be enabled, short of
creating a dummy SubtargetFeature for this purpose and having features X
and Y imply the new feature.
To solve the case where X or Y is needed without adding a new feature,
and to better match a typical TableGen style, this replaces the existing
"AssemblerCondString" with a dag "AssemblerCondDag" which represents the
same information. Two operators are defined for use with
AssemblerCondDag, "all_of", which matches the current behaviour, and
"any_of", which adds the new proposed ORing features functionality.
This was originally proposed in the RFC at
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139138.html
Changes to all current backends are mechanical to support the replaced
functionality, and are NFCI.
At this stage, it is illegal to combine features with ands and ors in a
single AssemblerCondDag. I suspect this case is sufficiently rare that
adding more complex changes to support it are unnecessary.
Differential Revision: https://reviews.llvm.org/D74338
When emitting PDBs, the TypeStreamMerger class is used to merge .debug$T records from the input .OBJ files into the output .PDB stream.
Records in .OBJs are not required to be aligned on 4-bytes, and "The Netwide Assembler 2.14" generates non-aligned records.
When compiling with -DLLVM_ENABLE_ASSERTIONS=ON, an assert was triggered in MergingTypeTableBuilder when non-ghash merging was used.
With ghash merging there was no assert.
As a result, LLD could potentially generate a non-aligned TPI stream.
We now align records on 4-bytes when record indices are remapped, in TypeStreamMerger::remapIndices().
Differential Revision: https://reviews.llvm.org/D75081
This patch add mayContainUnboundedCycle helper function which checks whether a function has any cycle which we don't know if it is bounded or not.
Loops with maximum trip count are considered bounded, any other cycle not.
It also contains some fixed tests and some added tests contain bounded and
unbounded loops and non-loop cycles.
Reviewed By: jdoerfert, uenoku, baziotis
Differential Revision: https://reviews.llvm.org/D74691
Calculating SCEVs can be cumbersome, and may take very long time (even
hours, for very long expressions). To prevent recalculating expressions
over and over again, we cache them.
This change add cache queries to key positions, to prevent recalculation
of the expressions.
Fix PR43571.
Differential Revision: https://reviews.llvm.org/D70097
Summary:
This intrinsic implements the unpredicated duplication of scalar values
and is mapped to (through ISD::SPLAT_VECTOR):
* DUP <Zd>.<T>, #<imm>
* DUP <Zd>.<T>, <R><n|SP>
Reviewed by: sdesmalen
Differential Revision: https://reviews.llvm.org/D75900
After a crash catched by the CrashRecoveryContext, this patch prevents from accessing dangling pointers in TimerGroup structures before the clang tool exits. Previously, the default TimerGroup had internal linked lists which were still pointing to old Timer or TimerGroup instances, which lived in stack frames released by the CrashRecoveryContext.
Fixes PR45164.
Differential Revision: https://reviews.llvm.org/D76099
LLVM currently supports CSK_MD5 and CSK_SHA1 source file checksums in
debug info. This change adds support for CSK_SHA256 checksums.
The SHA256 checksums are supported by the CodeView debug format.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D75785
These may be accessed from multiple threads if concurrent materialization is
enabled in ORC.
Testcase coming in a follow-up patch that enables eh-frame registration for
LLJIT.
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.
This change is needed for D73753.
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
Summary:
Using the default DAG.UnrollVectorOp on v16i8 and v8i16 vectors
results in i8 or i16 nodes being inserted into the SelectionDAG. Since
those are illegal types, this causes a legalization assertion failure
for some code patterns, as uncovered by PR45178. This change unrolls
shifts manually to avoid this issue by adding and using a new optional
EVT argument to DAG.ExtractVectorElements to control the type of the
extract_element nodes.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76043