We have a vector compare reduction problem seen in PR39665 comment 2:
https://bugs.llvm.org/show_bug.cgi?id=39665#c2
Or slightly reduced here:
define i1 @cmp2(<2 x double> %a0) {
%a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0>
%b = extractelement <2 x i1> %a, i32 0
%c = extractelement <2 x i1> %a, i32 1
%d = and i1 %b, %c
ret i1 %d
}
SLP would not attempt to turn this into a vector reduction because there is an
artificial lower limit on that transform. We can not completely remove that limit
without inducing regressions though, so this patch just hacks an extra attempt at
creating a 2-way reduction to the end of the analysis.
As shown in the test file, we are still not getting some of the motivating cases,
so follow-on patches will be needed to solve those cases.
Differential Revision: https://reviews.llvm.org/D59710
"[SLP] Generalization of stores vectorization."
"[SLP] Fix -Wunused-variable. NFC"
"[SLP] Vectorize jumbled stores."
As they're causing significant (10-30x) compile time regressions on
vectorizable code.
The primary cause of the compile-time regression is f228b53716.
This reverts commits:
f228b537165503455ccb21d498c9c0
Summary:
Much like D67339, adds ConstantRange handling for
when we know no-wrap behavior of the `sub`.
Unlike addWithNoWrap(), we only get lucky re returning empty set
for signed wrap. For unsigned, we must perform overflow check manually.
A patch that makes use of this in LVI (CVP) to be posted later.
Reviewers: nikic, shchenz, efriedma
Reviewed By: nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69918
Some targets (E.g. MachO/arm64) use relocations to fix some CFI record fields
in the eh-frame section. When relocations are used the initial (pre-relocation)
content of the eh-frame section can no longer be interpreted by following the
eh-frame specification. This causes errors in the existing eh-frame parser.
This patch moves eh-frame handling into two LinkGraph passes that are run after
relocations have been parsed (but before they are applied). The first] pass
breaks up blocks in the eh-frame section into per-CFI-record blocks, and the
second parses blocks of (potentially multiple) CFI records and adds the
appropriate edges to any CFI fields that do not have existing relocations.
These passes can be run independently of one another. By handling eh-frame
splitting/fixing with LinkGraph passes we can both re-use existing relocations
for CFI record fields and avoid applying eh-frame fixups before parsing the
section (which would complicate the linker and require extra temporary
allocations of working memory).
Summary:
This patch factors out code to clone instructions -- partly for
readability and partly to facilitate an upcoming patch of my own.
Reviewers: wmi
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69861
The combine G_UNMERGE_VALUES with G_CONCAT_VECTORS used to only be performed
when the result type of the G_UNMERGE_VALUES was a vector type.
In other words, we were expecting that the G_UNMERGE_VALUES was effectively
the exact opposite of the G_CONCAT_VECTORS.
Lift that constraint by allowing any G_UNMERGE_VALUES to be combined
with any G_CONCAT_VECTORS (as long as the size of the different pieces
that we merge/unmerge match).
Differential Revision: https://reviews.llvm.org/D69288
Summary:
This patch stems from the discussion D68270 (including some offline
talks). The idea is to provide an "incremental" api for parsing location
lists, which will avoid caching or materializing parsed data. An
additional goal is to provide a high level location list api, which
abstracts the differences between different encoding schemes, and can be
used by users which don't care about those (such as LLDB).
This patch implements the first part. It implements a call-back based
"visitLocationList" api. This function parses a single location list,
calling a user-specified callback for each entry. This is going to be
the base api, which other location list functions (right now, just the
dumping code) are going to be based on.
Future patches will do something similar for the v4 location lists, and
add a mechanism to translate raw entries into concrete address ranges.
Reviewers: dblaikie, probinson, JDevlieghere, aprantl, SouraVX
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69672
We have two ways to steer creating a predicated vector body over creating a
scalar epilogue. To force this, we have 1) a command line option and 2) a
pragma available. This adds a third: a target hook to TargetTransformInfo that
can be queried whether predication is preferred or not, which allows the
vectoriser to make the decision without forcing it.
While this change behaves as a non-functional change for now, it shows the
required TTI plumbing, usage of this new hook in the vectoriser, and the
beginning of an ARM MVE implementation. I will follow up on this with:
- a complete MVE implementation, see D69845.
- a patch to disable this, i.e. we should respect "vector_predicate(disable)"
and its corresponding loophint.
Differential Revision: https://reviews.llvm.org/D69040
This patch adds two new families of intrinsics, both of which are
memory accesses taking a vector of locations to load from / store to.
The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of
base addresses, and an immediate offset to be added consistently to
each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar
base address, and a vector of offsets to add to it. The
'shifted_offset' variants also multiply each offset by the element
size type, so that the vector is effectively of array indices.
At the IR level, these operations are represented by a single set of
four IR intrinsics: {gather,scatter} × {base,offset}. The other
details (signed/unsigned, shift, and memory element size as opposed to
vector element size) are all specified by IR intrinsic polymorphism
and immediate operands, because that made the selection job easier
than making a huge family of similarly named intrinsics.
I considered using the standard IR representations such as
llvm.masked.gather, but they're not a good fit. In order to use
llvm.masked.gather to represent a gather_offset load with element size
smaller than a pointer, you'd have to expand the <8 x i16> vector of
offsets into an <8 x i16*> vector of pointers, which would be split up
during legalization, so you'd spend most of your time undoing the mess
it had made. Also, ISel support for llvm.masked.gather would be easy
enough in a trivial way (you can expand it into a gather-base load
with a zero immediate offset), but instruction-selecting lots of
fiddly idioms back into all the _other_ MVE load instructions would be
much more work. So I think dedicated IR intrinsics are the more
sensible approach, at least for the moment.
On the clang tablegen side, I've added two new features to the
Tablegen source accepted by MveEmitter: a 'CopyKind' type node for
defining a type that varies with the parameter type (it lets you ask
for an unsigned integer type of the same width as the parameter), and
an 'unsignedflag' value node for passing an immediate IR operand which
is 0 for a signed integer type or 1 for an unsigned one. That lets me
write each kind of intrinsic just once and get all its subtypes and
immediate arguments generated automatically.
Also I've tweaked the handling of pointer-typed values in the code
generation part of MveEmitter: they're generated as Address rather
than Value (i.e. including an alignment) so that they can be given to
the ordinary IR load and store operations, but I'd omitted the code to
convert them back to Value when they're going to be used as an
argument to an IR intrinsic.
On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you
not only the full assembly-language suffix for a given vector type
(like 's32' or 'u16') but also the numeric-only one used by store
instructions (just '32' or '16').
Reviewers: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69791
Summary:
Rework the GMIR documentation to focus more on the end user than the
implementation and tie it in to the MIR document. There was also some
out-of-date information which has been removed.
The quality of the GenericOpcode reference is highly variable and drops
sharply as I worked through them all but we've got to start somewhere :-).
It would be great if others could expand on this too as there is an awful
lot to get through.
Also fix a typo in the definition of G_FLOG. Previously, the comments said
we had two base-2's (G_FLOG and G_FLOG2).
Reviewers: aemerson, volkan, rovka, arsenm
Reviewed By: rovka
Subscribers: wdng, arphaman, jfb, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69545
In an optimization to improve performance (rL375240) we added a std::shared_ptr
around the main table map. This is safe, but we also ended up making the
transcriber object a std::shared_ptr too. This has mutable state, so must be
copied when we copy the Automaton object. This is very cheap; the main optimization
was about the map `M` only.
Reported by Dan Palermo. No test as triggering this is rather hard from a unit test.
llvm-objdump -D this file:
int a[100000];
int main() { return 0; }
Will produce an error: "The end of the file was unexpectedly encountered".
This happens because of a check in Binary.h checkOffset. (Addr + Size > M.getBufferEnd()).
The sh_offset and sh_size fields can be ignored for SHT_NOBITS sections.
Fix the error by changing ELFObjectFile<ELFT>::getSectionContents to use
the file base for SHT_NOBITS sections.
Reviewed By: grimar, MaskRay
Differential Revision: https://reviews.llvm.org/D69192
Summary:
G_GEP is rather poorly named. It's a simple pointer+scalar addition and
doesn't support any of the complexities of getelementptr. I therefore
propose that we rename it. There's a G_PTR_MASK so let's follow that
convention and go with G_PTR_ADD
Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm
Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69734
Summary:
This patch factors out code to merge a basic block with its sole
successor -- partly for readability and partly to facilitate an
upcoming patch of my own.
Reviewers: wmi
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69852
Summary:
To drive the automaton we used a uint64_t as an action type. This
contained the transition's resource requirements as a conjunction:
(a OR b) AND (b OR c)
We encoded this conjunction as a sequence of four 16-bit bitmasks.
This limited the number of addressable functional units to 16, which
is quite low and has bitten many people in the past.
Instead, the DFAEmitter now generates a lookup table from InstrItinerary
class (index of the ItinData inside the ProcItineraries) to an internal
action index which is essentially a dense embedding of the conjunctive
form. Because we never materialize the conjunctive form, we no longer
have the 16 FU restriction.
In this patch we limit to 64 functional units due to using a uint64_t
bitmask in the DFAEmitter. Now that we've decoupled these representations
we can increase this in future.
Reviewers: ThomasRaoux, kparzysz, majnemer
Reviewed By: ThomasRaoux
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69110
This recommits 2be17087f8 (reverted in
d3ec06d219 for heap-use-after-free) with a fix
in IAI's reset() which was not clearing the set of interleave groups after
deleting them.
Summary:
This patch factors out common code to update the SSA form in
JumpThreading.cpp -- partly for readability and partly to facilitate
an coming patch of my own.
Reviewers: wmi
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69811
In the ARM backend, for historical reasons we have only some targets
using Machine Scheduling. The rest use the old list scheduler as they
are using itinaries and the list scheduler seems to produce better code
(and not crash running out of register on v6m codes). So whether to use
the MIScheduler or not is checked at runtime from the subtarget
features.
This is fine, except for post-ra scheduling. Whether to use the old
post-ra list scheduler or the post-ra machine schedule is decided as the
pass manager is set up, in arms case from a newly constructed subtarget.
Under some situations, like LTO, this won't include the correct cpu so
can pick the wrong option. This can have a surprising effect on
performance.
To fix that, this patch overrides targetSchedulesPostRAScheduling and
addPreSched2 in the ARM backend, adding _both_ post-ra schedulers and
picking at runtime which to execute. To pick between the two I've had to
add a enablePostRAMachineScheduler() method that normally returns
enableMachineScheduler() && enablePostRAScheduler(), which can be
overridden to enable just one of PostRAMachineScheduler vs
PostRAScheduler.
Thanks to David Penry for the identifying this problem.
Differential Revision: https://reviews.llvm.org/D69775
Summary:
Handling relocations was not needed when the loclists section was a
DWO-only thing. But since DWARF5, it is possible to use it in regular
objects too, and the standard permits embedding addresses into the
section directly. These addresses need to be relocated in unlinked
files.
Reviewers: JDevlieghere, dblaikie, probinson
Subscribers: aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68271
This class was a bit overengineered, and was triggering some PVS warnings.
Instead, put strings into a NameType and let clients unconditionally treat it
as a Node.
This patch (second of two patches) lowers the generic PowerPC vector
entries to PowerPC subtarget-specific entries.
For instance, the PowerPC generic entry 'cbrtd2_massv' is lowered to
'cbrtd2_P9' or Power9 subtarget.
The first patch enables the vectorizer to recognize the IBM MASS vector
library routines. This patch specifically adds support for recognizing
the '-vector-library=MASSV' option, and defines mappings from IEEE
standard scalar math functions to generic PowerPC MASS vector
counterparts.
For instance, the generic PowerPC MASS vector entry for double-precision
'cbrt' function is '__cbrtd2_massv'
The overall support for MASS vector library is presented as such in two
patches for ease of review.
Patch by pjeeva01 (Jeeva P.)
Differential Revision: https://reviews.llvm.org/D59883
This reverts commit 004ed2b0d1.
Original commit hash 6d03890384
Summary:
This adds a clang option to disable inline line tables. When it is used,
the inliner uses the call site as the location of the inlined function instead of
marking it as an inline location with the function location.
https://reviews.llvm.org/D67723
MachineVerifier::visitMachineFunctionAfter() is extended to check the
live-through case for live-in lists. This is only done for registers without
aliases and that are neither allocatable or reserved, such as the SystemZ::CC
register.
The MachineVerifier earlier only catched the case of a live-in use without
an entry in the live-in list (as "using an undefined physical register").
A comment in LivePhysRegs.h has been added stating a guarantee that
addLiveOuts() can be trusted for a full register both before and after
register allocation.
Review: Quentin Colombet
https://reviews.llvm.org/D68267
Summary:
These were the only remaining users of the GetElementPtrInst::getGEPReturnType
method that gets the element type from the pointer type.
Remove that method since its now dead.
Reviewers: jyknight, t.p.northover, arsenm
Reviewed By: arsenm
Subscribers: wdng, arsenm, arphaman, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69756
Dependences between two abstract attributes SRC and TRG come naturally in
two flavors:
Either (1) "some" information of SRC is *required* for TRG to derive
information, or (2) SRC is just an *optional* way for TRG to derive
information.
While it is not strictly necessary to distinguish these types
explicitly, it can help us to converge faster, in terms of iterations,
and also cut down the number of `AbstractAttribute::update` calls.
As far as I can tell, we only use optional dependences for liveness so
far but that might change in the future. With this change the Attributor
can be informed about the "dependence class" and it will perform
appropriate actions when an Attribute is set to an invalid state, thus
one that cannot be used by others to derive information from.
We already have the FunctionType we can call getReturnType on.
I think this was due to a bad rebase of the CallBr patch while
it was in development when CallInst and InvokeInst were updated.
When the Attributor run on the IPConstantProp test case for multiple
callbacks it triggered a faulty assertion in the AbstractCallSite
implementation. The callee can well be at argument position 0.
Summary:
This instruction is not merged to the spec proposal, but we need it to
be implemented in the toolchain to experiment with it. It is available
only on an opt-in basis through a clang builtin.
Defined in https://github.com/WebAssembly/simd/pull/127.
Depends on D69696.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69697
Summary:
Introduces a clang builtins and LLVM intrinsics representing integer
min/max instructions. These instructions have not been merged to the
SIMD spec proposal yet, so they are currently opt-in only via builtins
and not produced by general pattern matching. If these instructions
are accepted into the spec proposal the builtins and intrinsics will
be replaced with normal pattern matching.
Defined in https://github.com/WebAssembly/simd/pull/27.
Reviewers: aheejin
Reviewed By: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69696
Add support for continuously syncing profile counter updates to a file.
The motivation for this is that programs do not always exit cleanly. On
iOS, for example, programs are usually killed via a signal from the OS.
Running atexit() handlers after catching a signal is unreliable, so some
method for progressively writing out profile data is necessary.
The approach taken here is to mmap() the `__llvm_prf_cnts` section onto
a raw profile. To do this, the linker must page-align the counter and
data sections, and the runtime must ensure that counters are mapped to a
page-aligned offset within a raw profile.
Continuous mode is (for the moment) incompatible with the online merging
mode. This limitation is lifted in https://reviews.llvm.org/D69586.
Continuous mode is also (for the moment) incompatible with value
profiling, as I'm not sure whether there is interest in this and the
implementation may be tricky.
As I have not been able to test extensively on non-Darwin platforms,
only Darwin support is included for the moment. However, continuous mode
may "just work" without modification on Linux and some UNIX-likes. AIUI
the default value for the GNU linker's `--section-alignment` flag is set
to the page size on many systems. This appears to be true for LLD as
well, as its `no_nmagic` option is on by default. Continuous mode will
not "just work" on Fuchsia or Windows, as it's not possible to mmap() a
section on these platforms. There is a proposal to add a layer of
indirection to the profile instrumentation to support these platforms.
rdar://54210980
Differential Revision: https://reviews.llvm.org/D68351
Remarks are usually emitted per-TU, and for generating a standalone
remark file that can be shipped with the linked binary we need some kind
of tool to merge everything together.
The remarks::RemarkLinker class takes care of this and:
* Deduplicates remarks
* Filters remarks with no debug location
* Merges string tables from all the entries
As an output, it provides an iterator range that can be used to
serialize the remarks to a file.
Differential Revision: https://reviews.llvm.org/D69141
Move TargetLoweringBase::isSuitableForJumpTable from
llvm/CodeGen/TargetLowering.h to .cpp, to avoid the undefined reference
from all LLVM${Target}ISelLowering.cpp.
Another fix is to add a dependency on TransformUtils to all
lib/Target/$Target/LLVMBuild.txt, but that is too disruptive.
Summary:
In order to get context sensitivity from isKnownNonZero we need to
provide a context instruction *and* a dominator tree. The latter is
passed now to which actually allows to remove some initialization code.
Tests taken from PR43833.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69595
For AMDGPU this depends on whether denormals are enabled in the
default FP mode for the function. Currently this is treated as a
subtarget feature, so FMAD is selectively legal based on that. I want
to move this out of the subtarget features so this can be controlled
with a denormal mode attribute. Additionally, this will allow folding
based on a future ftz fast math flag.
Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods
to return optional machine operand pair of destination and source
registers.
Patch by Nikola Prica
Differential Revision: https://reviews.llvm.org/D69622
Used in D69245, these add pattern matchers for the WithOverflowInst
(capturing the result) and the ExtractValue instructions taking a
template parameter specifying the element being extracted.
Since SCEV can cache information about location of an instruction, it should be invalidated when the instruction is moved.
There should be similar bug in code sinking part of LICM, it will be fixed in a follow-up change.
Patch Author: Daniil Suchkov
Reviewers: asbirlea, mkazantsev, reames
Reviewed By: asbirlea
Subscribers: hiraditya, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D69370
This is the "official" constant for arm64. We also have another constant
for arm64 (called BP_ARM64), which was used by breakpad while there was
no official constant for arm64 available.
This adds a flag to LLVM and clang to always generate a .debug_frame
section, even if other debug information is not being generated. In
situations where .eh_frame would normally be emitted, both .debug_frame
and .eh_frame will be used.
Differential Revision: https://reviews.llvm.org/D67216
Summary:
This patch introduces liveness (AAIsDead) for all positions, thus for
all kinds of values. For now, we say an instruction is dead if it would
be removed assuming all users are dead. A call site return is different
as we just look at the users. If all call site returns have been
eliminated, the return values can return undef instead of their original
value, eliminating uses.
We try to recursively delete dead instructions now and we introduce a
simple check interface for use-traversal.
This is the idea tried out in D68626 but implemented in the right way.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68925
Summary:
If a conditional branch is encountered we can try to find a join block
where the execution is known to continue. This means finding a suitable
block, e.g., the immediate post dominator of the conditional branch, and
proofing control will always reach that block.
This patch implements different techniques that work with and without
provided analysis.
Reviewers: uenoku, sstefan1, hfinkel
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68933
Summary:
If there is a unique free of the allocated that has to be reached from
the malloc, we can apply the heap-2-stack transformation even if the
pointer escapes.
Reviewers: hfinkel, sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68958
If an attribute did not query any optimistic (=non-fixed) information to
justify its state, we know the attribute state will not change anymore.
Thus, we can indicate an optimistic fixpoint.
We pretended IRPosition came either as mutable or immutable objects
while they are basically always immutable, with a single (existing)
unfortunate exceptions. This patch cleans up the uses to deal with the
immutable version.
For (almost) all IRAttribute we can derive whatever we want for undef
values so it makes sense to provide this functionality in the base
class. At the same time, we probably do not want to annotate them.
Summary:
This adds a clang option to disable inline line tables. When it is used,
the inliner uses the call site as the location of the inlined function instead of
marking it as an inline location with the function location.
See https://bugs.llvm.org/show_bug.cgi?id=42344
Reviewers: rnk
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D67723
LinkGraph::splitBlock will split a block at a given index, returning a new
block covering the range [ 0, index ) and modifying the original block to
cover the range [ index, original-block-size ). Block addresses, content,
edges and symbols will be updated as necessary. This utility will be used
in upcoming improvements to JITLink's eh-frame support.
Summary:
Delete the BasicBlockPass and BasicBlockManager, all its dependencies and update documentation.
The BasicBlockManager was improperly tested and found to be potentially broken, and was deprecated as of rL373254.
In light of the switch to the new pass manager coming before the next release, this patch is a first cleanup of the LegacyPassManager.
Reviewers: chandlerc, echristo
Subscribers: mehdi_amini, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69121
I am using it in https://reviews.llvm.org/D69399.
This change changes how obj2yaml dumps arrays of `llvm::yaml::Hex8/llvm::yaml::Hex16/llvm::yaml::Hex32`
from:
```
PayloadBytes:
- 0x01
- 0x02
...
```
To
```
PayloadBytes: [ 0x01, 0x02, ... ]
```
The latter way is shorter and looks better for arrays.
Differential revision: https://reviews.llvm.org/D69558
Summary:
This extends the rules for when a call instruction is deemed to be an
FPMathOperator, which is based on the type of the call (i.e. the return
type of the function being called). Previously we only allowed
floating-point and vector-of-floating-point types. Now we also allow
arrays (nested to any depth) of floating-point and
vector-of-floating-point types.
This was motivated by llpc, the pipeline compiler for AMD GPUs
(https://github.com/GPUOpen-Drivers/llpc). llpc has many math library
functions that operate on vectors, typically represented as <4 x float>,
and some that operate on matrices, typically represented as
[4 x <4 x float>], and it's useful to be able to decorate calls to all
of them with fast math flags.
Reviewers: spatel, wristow, arsenm, hfinkel, aemerson, efriedma, cameron.mcinally, mcberg2017, jmolloy
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69161
This is a follow-up to D67448.
Split live intervals with multiple dead defs during the initial
execution of the live interval analysis, but do it outside of the
function createAndComputeVirtRegInterval.
Differential Revision: https://reviews.llvm.org/D68666
The architecture enum contains two kinds of contstants: the "official" ones
defined by Microsoft, and unofficial constants added by breakpad to cover the
architectures not described by the first ones.
Up until now, there was no big need to differentiate between the two. However,
now that Microsoft has defined
https://docs.microsoft.com/en-us/windows/win32/api/sysinfoapi/ns-sysinfoapi-system_info
a constant for ARM64, we have a name clash.
This patch renames all breakpad-defined constants with to include the prefix
"BP_". This frees up the name "ARM64", which I'll re-introduce with the new
"official" value in a follow-up patch.
Reviewers: amccarth, clayborg
Subscribers: lldb-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D69285
Extend the describeLoadedValue() with support for target specific ARM and
AArch64 instructions interpretation. The patch provides specialization for
ADD and SUB operations that include a register and an immediate/offset
operand. Some of the instructions can operate with global string addresses
or constant pool indexes but such cases are omitted since we currently lack
flexible support for processing such operands at DWARF production stage.
Patch by Nikola Prica
Differential Revision: https://reviews.llvm.org/D67556
Summary:
When createing an ORC remote JIT target the current library split forces the target process to link large portions of LLVM (Core, Execution Engine, JITLink, Object, MC, Passes, RuntimeDyld, Support, Target, and TransformUtils). This occurs because the ORC RPC interfaces rely on the static globals the ORC Error types require, which starts a cycle of pulling in more and more.
This patch breaks the ORC RPC Error implementations out into an "OrcError" library which only depends on LLVM Support. It also pulls the ORC RPC headers into their own subdirectory.
With this patch code can include the Orc/RPC/*.h headers and will only incur link dependencies on LLVMOrcError and LLVMSupport.
Reviewers: lhames
Reviewed By: lhames
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68732
Summary:
Add a flag `F_no_mmap` to `FileOutputBuffer` to support
`--[no-]mmap-output-file` in ELF LLD. LLD currently explicitly ignores
this flag for compatibility with GNU ld and gold.
We need this flag to speed up link time for large binaries in certain
scenarios. When we link some of our larger binaries we find that LLD
takes 50+ GB of memory, which causes memory pressure. The memory
pressure causes the VM to flush dirty pages of the output file to disk.
This is normally okay, since we should be flushing cold pages. However,
when using BtrFS with compression we need to write 128KB at a time when
we flush a page. If any page in that 128KB block is written again, then
it must be flushed a second time, and so on. Since LLD doesn't write
sequentially this causes write amplification. The same 128KB block will
end up being flushed multiple times, causing the linker to many times
more IO than necessary. We've observed 3-5x faster builds with
-no-mmap-output-file when we hit this scenario.
The bad scenario only applies to compressed filesystems, which group
together multiple pages into a single compressed block. I've tested
BtrFS, but the problem will be present for any compressed filesystem
on Linux, since it is caused by the VM.
Silently ignoring --no-mmap-output-file caused a silent regression when
we switched from gold to lld. We pass --no-mmap-output-file to fix this
edge case, but since lld silently ignored the flag we didn't realize it
wasn't being respected.
Benchmark building a 9 GB binary that exposes this edge case. I linked 3
times with --mmap-output-file and 3 times with --no-mmap-output-file and
took the average. The machine has 24 cores @ 2.4 GHz, 112 GB of RAM,
BtrFS mounted with -compress-force=zstd, and an 80% full disk.
| Mode | Time |
|---------|-------|
| mmap | 894 s |
| no mmap | 126 s |
When compression is disabled, BtrFS performs just as well with and
without mmap on this benchmark.
I was unable to reproduce the regression with any binaries in
lld-speed-test.
Reviewed By: ruiu, MaskRay
Differential Revision: https://reviews.llvm.org/D69294
This patch adds support for deleted C++ special member functions in
clang and llvm. Also added Defaulted member encodings for future
support for defaulted member functions.
Patch by Sourabh Singh Tomar!
Differential Revision: https://reviews.llvm.org/D69215
Adding patten matching for two SVE intrinsics: frecps and frsqrts.
Also added patterns for fsub and fmul - these SDNodes directly correspond
to machine instructions.
Review: https://reviews.llvm.org/D68476
Patch authored by mgudim (Mikhail Gudim).
llvm/test/DebugInfo/MIR/X86/live-debug-values-reg-copy.mir failed with
EXPENSIVE_CHECKS enabled, causing the patch to be reverted in
rG2c496bb5309c972d59b11f05aee4782ddc087e71.
This patch relands the patch with a proper fix to the
live-debug-values-reg-copy.mir tests, by ensuring the MIR encodes the
callee-saves correctly so that the CalleeSaved info is taken from MIR
directly, rather than letting it be recalculated by the PEI pass. I've
done this by running `llc -stop-before=prologepilog` on the LLVM
IR as captured in the test files, adding the extra MOV instructions
that were manually added in the original test file, then running `llc
-run-pass=prologepilog` and finally re-added the comments for the MOV
instructions.
Stores are vectorized with maximum vectorization factor of 16. Patch
tries to improve the situation and use maximal vectorization factor.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Differential Revision: https://reviews.llvm.org/D43582
Define BitVector::BitWord as uintptr_t instead of unsigned long, as long does not necessarily translates to a pointer size (especially on 64-bit Visual Studio).
Committed on behalf of @ekatz (Ehud Katz)
Differential Revision: https://reviews.llvm.org/D69336
Summary:
Currently we only forget the loop we added LCSSA phis for. But SCEV
expressions in other loops could also depend on the instruction we added
a PHI for and currently we do not invalidate those expressions. This can
happen when we use ScalarEvolution before converting a function to LCSSA
form. The SCEV expressions will refer to the non-LCSSA value. If this
SCEV expression is then used with the expander, we do not preserve LCSSA
form.
This patch properly forgets the values we created PHIs for. Those need
to be recomputed again. This patch fixes PR43458.
Currently SCEV::verify does not catch this mismatch and any test would
need to run multiple passes to trigger the error (e.g. -loop-reduce
-loop-unroll). I will also look into catching this kind of mismatch in
the verifier. Also, we currently forget the whole loop in LCSSA and I'll
check if we can be more surgical.
Reviewers: efriedma, sanjoy.google, reames
Reviewed By: efriedma
Subscribers: zzheng, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68194
Currently, when we do not specify "Info" field in a YAML description
for SHT_GROUP section, yaml2obj reports an error:
"error: unknown symbol referenced: '' by YAML section '.group1'"
Also, we do not link it with a symbol table by default,
though it is what we do for AddrsigSection, HashSection, RelocationSection.
(http://www.sco.com/developers/gabi/latest/ch4.sheader.html#sh_link)
The patch fixes missings mentioned.
Differential revision: https://reviews.llvm.org/D69299
To make IntegerState more flexible but also less error prone we split it
up into (1) incrementing, (2) decrementing, and (3) bit-tracking states.
This adds functionality compared to before and disallows misuse, e.g.,
"incrementing" updates on a bit-tracking state.
Part of the change is a single operator in the base class which
simplifies helper functions that deal with states.
There are certain functional changes but all of which should actually be
corrections.
Summary:
Fixes some things from original commit at https://reviews.llvm.org/D69136. The main
change is that the heap alloc marker is always stored as ExtraInfo in the machine
instruction instead of in the PointerSumType because it cannot hold more than
4 pointer types.
Add instruction marker to MachineInstr ExtraInfo. This does almost the
same thing as Pre/PostInstrSymbols, except that it doesn't create a label until
printing instructions. This allows for labels to be put around instructions that
are deleted/duplicated somewhere.
Use this marker to track heap alloc site call instructions.
Reviewers: rnk
Subscribers: MatzeB, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69536
Summary:
(Split of off D67120)
SizeOpts/MachineSizeOpts changes for profile guided size optimization.
(A second try after previously committed as r375254 and reverted as r375375.)
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69409
Emit a remarks section by default for the following formats:
* bitstream
* yaml-strtab
while still providing -remarks-section=<bool> to override the defaults.
If IRBuilder is constructed using the NoFolder constant folder, we should use the Unary FNeg to match the non-constant part of IRBuilder.
Differential Revision: https://reviews.llvm.org/D69396
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.
Reviewers: thakis, rnk, theraven, pcc
Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65761
Also mention "basename" and "dirname" in Path.h since I tried
to find these functions by looking for these strings. It might
help others find them faster if the comments contain these strings.
No behavior change.
Differential Revision: https://reviews.llvm.org/D69458
Summary:
Writing support for three ACLE functions:
unsigned int __cls(uint32_t x)
unsigned int __clsl(unsigned long x)
unsigned int __clsll(uint64_t x)
CLS stands for "Count number of leading sign bits".
In AArch64, these two intrinsics can be translated into the 'cls'
instruction directly. In AArch32, on the other hand, this functionality
is achieved by implementing it in terms of clz (count number of leading
zeros).
Reviewers: compnerd
Reviewed By: compnerd
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69250
Summary:
Compare two values, and if they are different, return the position of the
most significant bit that is different in the values.
Needed for D69387.
Reviewers: nikic, spatel, sanjoy, RKSimon
Reviewed By: nikic
Subscribers: xbolva00, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69439
I've accidentally reverted one of my previous patches.
It was not catched by bots because (I guess) they do not
build in debug (we have a test case which triggers an assert
in MSVS when runs without this change).
More info: https://reviews.llvm.org/D68983#inline-624235
Reported by Jordan Rupprecht.
Using `?` as an optional marker is very useful in Clang's AST-node
emitters because otherwise we need a separate class just to encode
the presence or absence of a base node reference.
MSVC supports it. Fixes the major MSVC compile time regression
introduced in r369961. Now
clang/lib/StaticAnalyzer/Frontend/CheckerRegistry.cpp compiles in 18s
instead of 7+ minutes.
Fixes PR43369
The IRBuilder needs to add the strictfp attribute to function
definitions and calls when constrained floating point is enabled.
Since so far all front ends have had to do is flip the constrained
switch, I've made this patch always add the required attributes
when said constrained switch is enabled. This continues to keep
changes to front ends minimal.
Differential Revision: D69312
Summary:
Add instruction marker to MachineInstr ExtraInfo. This does almost the
same thing as Pre/PostInstrSymbols, except that it doesn't create a label until
printing instructions. This allows for labels to be put around instructions that
are deleted/duplicated somewhere.
Also undo the workaround in r375137.
Reviewers: rnk
Subscribers: MatzeB, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69136
Summary:
There are `*_ov()` functions already, so at least for consistency it may be good to also have saturating variants.
These may or may not be needed for `ConstantRange`'s `shlWithNoWrap()`
Reviewers: spatel, nikic
Reviewed By: nikic
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69398
Summary:
There are `*_ov()` functions already, so at least for consistency it may be good to also have saturating variants.
These may or may not be needed for `ConstantRange`'s `mulWithNoWrap()`
Reviewers: spatel, nikic
Reviewed By: nikic
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69397
SHT_NOTE is the section that consists of
namesz, descsz, type, name + padding, desc + padding data.
This patch teaches yaml2obj, obj2yaml to dump and parse them.
This patch implements the section how it is described here:
https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-18048.html
Which says: "For 64–bit objects and 32–bit objects, each entry is an array of 4-byte words in
the format of the target processor"
The official specification is different
http://www.sco.com/developers/gabi/latest/ch5.pheader.html#note_section
And says: "n 64-bit objects (files with e_ident[EI_CLASS] equal to ELFCLASS64), each entry is an array
of 8-byte words in the format of the target processor. In 32-bit objects (files with e_ident[EI_CLASS]
equal to ELFCLASS32), each entry is an array of 4-byte words in the format of the target processor"
Since LLVM uses the first, 32-bit way, this patch follows it.
Differential revision: https://reviews.llvm.org/D68983
SHT_SYMTAB_SHNDX should have the same number of entries as the symbol table
associated (https://www.sco.com/developers/gabi/latest/ch4.sheader.html)
We currently can report the following message:
"SHT_SYMTAB_SHNDX section has sh_size (24) which is not equal to the number of symbols (2)"
It is just broken. This patch refines/fixes it.
Differential revision: https://reviews.llvm.org/D69305
We were already going to all of the trouble of computing maximum constant exit counts for each loop exit, we might as well expose them through the API. The change in IndVars is mostly to demonstrate that the wired up code works, but it als very slightly strengthens the transform. The strengthened case is rather narrow though: it requires one exactly analyzeable exit, one imprecisely analyzeable exit (with the upper bound less than the precise one), and one unanalyzeable exit. I coudn't construct a reasonably stable test case.
This does increase the memory usage of the BackedgeTakenCount by a factor of 2 in the worst case.
I also noticed the loop in IndVars is O(#Exits ^ 2). This doesn't change with this patch. A future patch will cache this result inside of SCEV to avoid requering.
This is a first step in figuring out a proper API for maximum (non constant) exit counts. This may evolve a bit as we get experience with the API needs; suggestions very welcome. This patch just tried to provide a framework that we can later add maximum too in a clean and obvious way.
This reverts commit 32ce14e55e.
In post-commit review, Pavel pointed out that there's a simpler way to
ignore SIGPIPE in lldb that doesn't rely on llvm's handlers.
The VST2 and VST4 instructions take two or four vector registers as
input, and store part of each register to memory in an interleaved
pattern. They come in variants indicating which part of each register
they store (VST20 and VST21; VST40 to VST43 inclusive); the intention
is that issuing each of those variants in turn has the combined effect
of loading or storing the whole set of registers to a memory block of
equal size. The corresponding VLD2 and VLD4 instructions load from
memory in the same interleaved format: each one overwrites only part
of its output register set, and again, the idea is that if you use
VLD4{0,1,2,3} or VLD2{0,1} together, you end up having written to the
whole of each register.
I've implemented the stores and loads quite differently. The loads
were easiest to implement as a single intrinsic that expands to all
four VLD4x instructions or both VLD2x, delivering four complete output
registers. (Implementing each individual load as a separate
instruction taking four input registers to partially overwrite is
possible in theory, but pointless, and when I tried it, I found it
would need extra work to get the register allocation not to be
horrible.) Since that intrinsic delivers multiple outputs, it has to
be instruction-selected in custom C++.
But the store instructions are easier to model individually, because
they don't overwrite any register at all and you can write a DAG Isel
pattern in Tablegen for each one.
Hence, my new intrinsic `int_arm_mve_vld4q` expands to four load
instructions, delivers four full output vectors, and is handled by C++
code, whereas `int_arm_mve_vst4q` expands to just one store
instruction, takes four input vectors and a constant indicating which
lanes to store, and is handled entirely in Tablegen. (And similarly
for vld2q/vst2q.) This is asymmetric, but it was the easiest way to do
each one.
Reviewers: dmgreen, miyuki, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68700
This adds some initial example IR intrinsics for MVE instructions that
deliver multiple output values, and hence, have to be instruction-
selected by custom C++ code instead of Tablegen patterns.
I've added the writeback gather load instructions (taking a vector of
base addresses and a single common offset, returning a vector of
loaded values and an updated vector of base addresses); one example
from the long shift family (taking and returning a 64-bit value in two
GPRs); and the VADC instruction (which propagates a carry bit from
each vector-lane addition to the next, taking an input carry flag in
FPSCR and outputting the final one in FPSCR as well).
To support the VPT-predicated forms of these instructions, I've
written some helper functions to add the cluster of MVE predicate
operands to the end of a MachineInstr. `AddMVEPredicateToOps` is used
when the instruction actually is predicated (so it takes a predicate
mask argument), and `AddEmptyMVEPredicateToOps` is for when the
instruction is unpredicated (so it fills in $noreg for the mask). Each
one comes in a form suitable for `vpred_n`, and one for `vpred_r`
which takes the extra 'inactive' parameter.
For VADC, the representation of the carry flag in the IR intrinsic is
a word intended to be moved directly to and from `FPSCR_nzcvqc`, i.e.
with the carry flag in bit 29 of the word. (The user-facing ACLE
intrinsic will want it to be in bit 0, but I'll do that on the clang
side.)
Reviewers: dmgreen, miyuki, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68699
This commit, together with the next few, will add a representative
sample of the kind of IR intrinsics that we'll need in order to
implement the user-facing ACLE intrinsics for MVE. Supporting all of
them will take more work; the intention of this initial series of
commits is to implement an intrinsic or two from lots of different
categories, as examples and proofs of concept.
This initial commit introduces a small number of IR intrinsics for
instructions simple enough that they can use Tablegen ISel patterns:
the predicated versions of the VADD and VSUB instructions (both
integer and FP), VMIN and VMAX, and the float->half VCVT instruction
(predicated and unpredicated).
When using VPT-predicated instructions in automatic code generation,
it will be convenient to specify the predicate value as a vector of
the appropriate number of i1. To make it easy to specify all sizes of
an instruction in one go and give each one the matching predicate
vector type, I've added a system of Tablegen informational records
describing MVE's vector types: each one gives the underlying LLVM IR
ValueType (which may not be the same if the MVE vector is of
explicitly signed or unsigned integers) and an appropriate vNi1 to use
as the predicate vector.
(Also, those info records include the usual encoding for the types, so
that as we add associations between each instruction encoding and one
of the new `MVEVectorVTInfo` records, we can remove some of the
existing template parameters and replace them with references to the
vector type info's fields.)
The user-facing ACLE intrinsics will receive a predicate mask as a
16-bit integer, so I've also provided a pair of intrinsics i2v and
v2i, to convert between an integer and a vector of i1 by just changing
the register class.
Reviewers: dmgreen, miyuki, ostannard
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67158
This broke various Windows builds, see comments on the Phabricator
review.
This also reverts the follow-up 20bf0cf.
> Summary:
> This fold, helps recover from the rest of the D62266 ARM regressions.
> https://rise4fun.com/Alive/TvpC
>
> Note that while the fold is quite flexible, i've restricted it
> to the single interesting pattern at the moment.
>
> Reviewers: efriedma, craig.topper, spatel, RKSimon, deadalnix
>
> Reviewed By: deadalnix
>
> Subscribers: javed.absar, kristof.beyls, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D62450
This roughly mimics `std::thread(...).detach()` except it allows to
customize the stack size. Required for https://reviews.llvm.org/D50993.
I've decided against reusing the existing `llvm_execute_on_thread` because
it's not obvious what to do with the ownership of the passed
function/arguments:
1. If we pass possibly owning functions data to `llvm_execute_on_thread`,
we'll lose the ability to pass small non-owning non-allocating functions
for the joining case (as it's used now). Is it important enough?
2. If we use the non-owning interface in the new use case, we'll force
clients to transfer ownership to the spawned thread manually, but
similar code would still have to exist inside
`llvm_execute_on_thread(_async)` anyway (as we can't just pass the same
non-owning pointer to pthreads and Windows implementations, and would be
forced to wrap it in some structure, and deal with its ownership.
Patch by Dmitry Kozhevnikov!
Differential Revision: https://reviews.llvm.org/D51103
MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64
regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo
we can find out Mips ABI and pick appropriate prefix.
Tags: #llvm, #clang, #lldb
Differential Revision: https://reviews.llvm.org/D66795
Summary:
This fold, helps recover from the rest of the D62266 ARM regressions.
https://rise4fun.com/Alive/TvpC
Note that while the fold is quite flexible, i've restricted it
to the single interesting pattern at the moment.
Reviewers: efriedma, craig.topper, spatel, RKSimon, deadalnix
Reviewed By: deadalnix
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62450
Teach the CombinerHelper how to turn shuffle_vectors, that
concatenate vectors, into concat_vectors and add this combine
to the AArch64 pre-legalizer combiner.
Differential Revision: https://reviews.llvm.org/D69149
llvm-svn: 375452
Summary:
Reduce include dependencies by no longer including Pass.h from
DataLayout.h. That include seemed irrelevant to DataLayout, as
well as being irrelevant to several users of DataLayout.
Reviewers: rnk
Reviewed By: rnk
Subscribers: mehdi_amini, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69261
llvm-svn: 375436
This header doesn't seem to be parsable on its own and breaks the module build therefore with
the following error:
While building module 'LLVM_Backend' imported from llvm-project/llvm/lib/CodeGen/MachineScheduler.cpp:14:
In file included from <module-includes>:62:
llvm-project/llvm/include/llvm/CodeGen/MachinePipeliner.h:91:20: error: declaration of 'AAResultsWrapperPass' must be imported from module 'LLVM_Analysis.AliasAnalysis' before it is required
AU.addRequired<AAResultsWrapperPass>();
^
llvm-project/llvm/include/llvm/Analysis/AliasAnalysis.h:1157:7: note: previous declaration is here
class AAResultsWrapperPass : public FunctionPass {
^
llvm-project/llvm/lib/CodeGen/MachineScheduler.cpp:14:10: fatal error: could not build module 'LLVM_Backend'
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.
llvm-svn: 375433
Commit message from D66935:
This patch fixes a bug exposed by D65653 where a subsequent invocation
of `determineCalleeSaves` ends up with a different size for the callee
save area, leading to different frame-offsets in debug information.
In the invocation by PEI, `determineCalleeSaves` tries to determine
whether it needs to spill an extra callee-saved register to get an
emergency spill slot. To do this, it calls 'estimateStackSize' and
manually adds the size of the callee-saves to this. PEI then allocates
the spill objects for the callee saves and the remaining frame layout
is calculated accordingly.
A second invocation in LiveDebugValues causes estimateStackSize to return
the size of the stack frame including the callee-saves. Given that the
size of the callee-saves is added to this, these callee-saves are counted
twice, which leads `determineCalleeSaves` to believe the stack has
become big enough to require spilling an extra callee-save as emergency
spillslot. It then updates CalleeSavedStackSize with a larger value.
Since CalleeSavedStackSize is used in the calculation of the frame
offset in getFrameIndexReference, this leads to incorrect offsets for
variables/locals when this information is recalculated after PEI.
This patch fixes the lldb unit tests in `functionalities/thread/concurrent_events/*`
Changes after D66935:
Ensures AArch64FunctionInfo::getCalleeSavedStackSize does not return
the uninitialized CalleeSavedStackSize when running `llc` on a specific
pass where the MIR code has already been expected to have gone through PEI.
Instead, getCalleeSavedStackSize (when passed the MachineFrameInfo) will try
to recalculate the CalleeSavedStackSize from the CalleeSavedInfo. In debug
mode, the compiler will assert the recalculated size equals the cached
size as calculated through a call to determineCalleeSaves.
This fixes two tests:
test/DebugInfo/AArch64/asan-stack-vars.mir
test/DebugInfo/AArch64/compiler-gen-bbs-livedebugvalues.mir
that otherwise fail when compiled using msan.
Reviewed By: omjavaid, efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68783
llvm-svn: 375425
This is designed to change the bitwidth of a type without altering the number
of vector lanes. Also useful in D68651. Otherwise an NFC.
Differential Revision: https://reviews.llvm.org/D69139
llvm-svn: 375417
In some cases using the return value of setFPAttrs simplifies the code.
In other cases it complicates the code with ugly casts, so stop doing
it. NFC.
llvm-svn: 375409
It returns just a section_iterator currently and have a report_fatal_error call inside.
This change adds a way to return errors and handle them on caller sides.
The patch also changes/improves current users and adds test cases.
Differential revision: https://reviews.llvm.org/D69167
llvm-svn: 375408
We have a following code to find quote type:
if (isspace(S.front()) || isspace(S.back()))
...
Problem is that:
"int isspace( int ch ): The behavior is undefined if the value of
ch is not representable as unsigned char and is not equal to EOF."
(https://en.cppreference.com/w/cpp/string/byte/isspace)
This patch shows how this UB can be triggered and fixes an issue.
Differential revision: https://reviews.llvm.org/D69160
llvm-svn: 375404
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69216
llvm-svn: 375398
This patch tries to resolve problems faced in D68943
and uses some of the code written by Konrad Wilhelm Kleine
in that patch.
Previously, yaml2obj tool always created a .symtab section.
This patch changes that. With it we only create it when
have a "Symbols:" tag in the YAML document or when
we need to create it because it is used by another section(s).
obj2yaml follows the new behavior and does not print "Symbols:"
anymore when there is no symbol table.
Differential revision: https://reviews.llvm.org/D69041
llvm-svn: 375361
Provides a TLI hook to allow targets to relax the emission of shifts, thus enabling
codegen improvements on targets with no multiple shift instructions and cheap selects
or branches.
Contributes to a Fix for PR43559:
https://bugs.llvm.org/show_bug.cgi?id=43559
Patch by: @joanlluch (Joan LLuch)
Differential Revision: https://reviews.llvm.org/D69116
llvm-svn: 375347
Works on this dependency chain:
ArrayRef.h ->
Hashing.h -> --CUT--
Host.h ->
StringMap.h / StringRef.h
ArrayRef is very popular, but Host.h is rarely needed. Move the
IsBigEndianHost constant to SwapByteOrder.h. Clients of that header are
more likely to need it.
llvm-svn: 375316
MachineInstr.h included AliasAnalysis.h, which includes a world of IR
constructs mostly unneeded in CodeGen. Prune it. Same for
DebugInfoMetadata.h.
Noticed with -ftime-trace.
llvm-svn: 375311
by ExtBinary format profile
Profile on-demand loading was added for ExtBinary format profile in rL374233,
but currently profile on-demand loading doesn't work well with profile
remapping. The patch adds the support.
Suppose a function in the current module has outline instance in the profile.
The function name in the module is different from the name of the outline
instance, but remapper knows the two names are equal. When loading profile
on-demand, the outline instance has to be loaded with remapper's help.
At the same time SampleProfileReaderItaniumRemapper is changed from a proxy
of SampleProfileReader to a helper member in SampleProfileReader.
Differential Revision: https://reviews.llvm.org/D68901
llvm-svn: 375295
Occasionally, during test teardown, LLDB writes to a closed pipe.
Sometimes the communication is inherently unreliable, so LLDB tries to
avoid being killed due to SIGPIPE (it calls `signal(SIGPIPE, SIG_IGN)`).
However, LLVM's default SIGPIPE behavior overrides LLDB's, causing it to
exit with IO_ERR.
Opt LLDB out of the default SIGPIPE behavior. I expect that this will
resolve some LLDB test suite flakiness (tests randomly failing with
IO_ERR) that we've seen since r344372.
rdar://55750240
Differential Revision: https://reviews.llvm.org/D69148
llvm-svn: 375288
Summary:
Also changes the wasm YAML format to reflect the possibility of having
multiple return types and to put the returns after the params for
consistency with the binary encoding.
Reviewers: aheejin, sbc100
Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, arphaman, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69156
llvm-svn: 375283
The default implementation of isIncomingArgumentHandler could lead
to generating incorrect code.
Make it a pure virtual method, so that targets know they have to
override it to produce correct code.
NFC
Differential Revision: https://reviews.llvm.org/D69187
llvm-svn: 375277
Summary:
This makes it much easier to verify that the implementation matches the
documentation. It uncovered a bug in the unit tests where we were
accidentally setting fast math flags on a load instruction.
Reviewers: spatel, wristow, arsenm, hfinkel, aemerson, efriedma, cameron.mcinally, mcberg2017, jmolloy
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69176
llvm-svn: 375252
D68992 / rL375086 refactored the packetizer and removed a bunch of logic. Unfortunately it creates an Automaton object whenever a DFAPacketizer is required. These objects have no longevity, and in particular on a debug build the population of the Automaton's transition map from the underlying table is very slow (because it is called ~10 times per MachineFunction, in the testcase I'm looking at).
This patch changes Automaton to wrap its underlying constant data in std::shared_ptr, which allows trivial copy construction. The DFAPacketizer creation function now creates a static archetypical Automaton and copies that whenever a new DFAPacketizer is required.
This takes a testcase down from ~20s to ~0.5s in debug mode.
llvm-svn: 375240
Summary:
This will allow updating MinidumpYAML and LLDB to use this common
definition.
Reviewers: labath, jhenderson, clayborg
Reviewed By: labath
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68656
llvm-svn: 375239
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.
Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.
At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.
I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.
Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy
Reviewed By: jmolloy
Differential Revision: https://reviews.llvm.org/D47775
llvm-svn: 375222
Summary:
Implements the following intrinsics:
- int_aarch64_sve_sunpkhi
- int_aarch64_sve_sunpklo
- int_aarch64_sve_uunpkhi
- int_aarch64_sve_uunpklo
This patch also adds AArch64ISD nodes for UNPK instead of implementing
the intrinsics directly, as they are required for a future patch which
implements the sign/zero extension of legal vectors.
This patch includes tests for the Subdivide2Argument type added by D67549
Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka
Reviewed By: greened
Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D67550
llvm-svn: 375210
Summary:
The current implementation eats the current errors and just outputs
the message parameter passed to llvm::cantFail. This change appends
the original error message(s), so the user can see exactly why
cantFail failed. New logic is conditional on NDEBUG.
Reviewed By: lhames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69057
llvm-svn: 375176
Header64.offset/Header64.size are uint64_t, thus we should not
truncate them to unit32_t. Moreover, there are a number of places
where we sum the offset and the size (e.g. in various checks in MachOUniversal.cpp),
the truncation causes issues since the offset/size can perfectly fit into uint32_t,
while the sum overflows.
Differential revision: https://reviews.llvm.org/D69126
Test plan: make check-all
llvm-svn: 375154
Reland r375051 (reverted in r375052) after fixing lld tests on Windows in r375126 and r375131.
Original description: Update GlobPattern in libSupport to handle a few more cases. It does not fully match the `fnmatch` used by GNU objcopy since named character classes (e.g. `[[:digit:]]`) are not supported, but this should support most existing use cases (mostly just `*` is what's used anyway).
This will be used to implement the `--wildcard` flag in llvm-objcopy to be more compatible with GNU objcopy.
This is split off of D66613 to land the libSupport changes separately. The llvm-objcopy part will land soon.
Reviewers: jhenderson, MaskRay, evgeny777, espindola, alexshap
Reviewed By: MaskRay
Subscribers: nickdesaulniers, emaste, arichardson, hiraditya, jakehehrlich, abrachet, seiya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66613
llvm-svn: 375149
This patch provides support for peudo ops including ADDIStocHA8, ADDIStocHA, LWZtocL,
LDtoc, LDtocL for AIX, lowering them from MIR to assembly.
Differential Revision: https://reviews.llvm.org/D68341
llvm-svn: 375113
Since GNU ar 2.31, the 't' operation prints member offsets beside file
names if the 'O' modifier is specified. 'O' is ignored for thin
archives.
Reviewed By: gbreynoo, ruiu
Differential Revision: https://reviews.llvm.org/D69087
llvm-svn: 375106
Remove dead virtual functions from vtables with
replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up
correctly.
Original commit message:
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.
This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.
To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.
The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.
This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.
To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.
I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.
On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.
I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.
I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).
Differential revision: https://reviews.llvm.org/D63932
llvm-svn: 375094
Summary:
Currently when computing a GEP offset using the function EmitGEPOffset
for the following instruction
getelementptr inbounds i32, i32* %p, i64 %offs
we get
mul nuw i64 %offs, 4
Unfortunately we cannot assume that unsigned wrapping won't happen
here because %offs is allowed to be negative.
Making such assumptions can lead to miscompilations: see the new test
test24_neg_offs in InstCombine/icmp.ll. Without the patch InstCombine
would generate the following comparison:
icmp eq i64 %offs, 4611686018427387902; 0x3ffffffffffffffe
Whereas the correct value to compare with is -2.
This patch replaces the NUW flag with NSW in the multiplication
instructions generated by EmitGEPOffset and adjusts the test suite.
https://bugs.llvm.org/show_bug.cgi?id=42699
Reviewers: chandlerc, craig.topper, ostannard, lebedev.ri, spatel, efriedma, nlopes, aqjune
Reviewed By: lebedev.ri
Subscribers: reames, lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68342
llvm-svn: 375089