Previously, the CodeExtractor created exit stubs, and assigned the corresponding return values of the outlined function, based on the order of out-of-region blocks after splitting any phi nodes and collecting the blocks to be outlined. This could cause the order to differ if the exit block phi nodes differed between two regions. This patch moves the collection of the output target blocks to before the splitting occurs, so that the assignment of target block to output value is the same regardless of the contents of the output block.
Reviewers: paquette, roelofs
Differential Revision: https://reviews.llvm.org/D108657
The implementation is mostly copied from MemDepAnalysis. We want to look
at all loads and stores to the same pointer operand. Bitcasts and zero
GEPs of a pointer are considered the same pointer value. We choose the
most dominating instruction.
Since updating MemorySSA with invariant.group is non-trivial, for now
handling of invariant.group is not cached in any way, so it's part of
the walker. The number of loads/stores with invariant.group is small for
now anyway. We can revisit if this actually noticeably affects compile
times.
To avoid invariant.group affecting optimized uses, we need to have
optimizeUsesInBlock() not use invariant.group in any way.
Co-authored-by: Piotr Padlewski <prazek@google.com>
Reviewed By: asbirlea, nikic, Prazek
Differential Revision: https://reviews.llvm.org/D109134
Use the `HBuilder` interface to provide default implementations of `llvm::hash_value`.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D109024
On some architectures, such as Arm and X86, the encoding for a nop may
change depending on the subtarget in operation at the time of
encoding. This change replaces the per-module MCSubtargetInfo retained
by the target's AsmBackend with the local MCSubtargetInfo in operation
at the time, which is now passed through.
On Arm using the architectural NOP instruction can have a performance
benefit on some implementations.
For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to
limit the chances of this causing problems in the future. I've not
done this for other targets such as X86 as there is more frequent use
of the MCSubtargetInfo and it looks to be for stable properties that
we would not expect to vary per function.
This change required threading STI through MCNopsFragment and
MCBoundaryAlignFragment.
I've attempted to take into account the in tree experimental backends.
Differential Revision: https://reviews.llvm.org/D45962
In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI in operation when an MCAlignFragment may write nops as padding. The
STI is currently unused; a further patch will pass it through to
writeNops.
There are many places that can create an MCAlignFragment; in most cases
we can find out the STI in operation at the time. In a few places this
isn't possible, as we are in initialisation or finalisation, or are
emitting constant pools. When possible I've tried to find the most
appropriate existing fragment to obtain the STI from; when none is
available, the per-module STI is used.
For constant pools we don't actually need to use EmitCodeAlign as the
constant pools are data anyway so falling through into it via an
executable NOP is no better than falling through into data padding.
This is a prerequisite for D45962, which uses the STI to emit the
appropriate NOP for the STI, which can differ per fragment.
Note that this involves an interface change to InitSections. It is now
called initSections and requires a SubtargetInfo as a parameter.
Differential Revision: https://reviews.llvm.org/D45961
Add KnownBits handling and unit tests for X*X self-multiplication cases which guarantee that bit1 of their results will be zero - see PR48683.
https://alive2.llvm.org/ce/z/NN_eaR
The next step will be to add suitable test coverage so this can be enabled in ValueTracking/DAG/GlobalISel - currently only a single Analysis/ScalarEvolution test is affected.
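The bit-1 property can be sanity-checked by brute force over a small bit width. The sketch below is plain Python, not the `KnownBits` API; `self_mul_bit1_is_zero` is a hypothetical helper and the 8-bit width is an arbitrary choice. The underlying reason is that an even `x` gives a square divisible by 4, while an odd `x` gives a square congruent to 1 modulo 8.

```python
def self_mul_bit1_is_zero(bit_width: int) -> bool:
    """Exhaustively check that bit 1 of x*x is zero for every x of this width."""
    mask = (1 << bit_width) - 1
    for x in range(1 << bit_width):
        product = (x * x) & mask  # wrap around like a fixed-width multiply
        if product & 0b10:        # bit 1 set would be a counterexample
            return False
    return True


print(self_mul_bit1_is_zero(8))
```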
Differential Revision: https://reviews.llvm.org/D108992
In the case of no tied variables, we pick random defs, and then random uses that don't alias with defs we just picked.
Sounds good, except that an X86 instruction may have implicit reg uses,
e.g. for `MULX` it's `EDX`/`RDX`: `Intel SDM, 4-162 Vol. 2B MULX — Unsigned Multiply Without Affecting Flags`
> Performs an unsigned multiplication of the implicit source operand (EDX/RDX) and the specified source operand
> (the third operand) and stores the low half of the result in the second destination (second operand), the high half
> of the result in the first destination operand (first operand), without reading or writing the arithmetic flags.
And indeed, every once in a while `llvm-exegesis` happened to pick EDX as a def while measuring throughput,
and produced garbage output:
```
$ ./bin/llvm-exegesis -num-repetitions=1000000 -mode=inverse_throughput -repetition-mode=min --loop-body-size=4096 -dump-object-to-disk=false -opcode-name=MULX32rr --max-configs-per-opcode=65536
---
mode: inverse_throughput
key:
instructions:
- 'MULX32rr EDX R11D R12D'
config: ''
register_initial_values:
- 'R12D=0x0'
- 'EDX=0x0'
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 4.00014, per_snippet_value: 4.00014 }
error: ''
info: instruction has no tied variables picking Uses different from defs
assembled_snippet: 415441BC00000000BA00000000C4C223F6D4C4C223F6D4C4C223F6D4C4C223F6D4415CC3415441BC00000000BA0000000049B80200000000000000C4C223F6D4C4C223F6D44983C0FF75F0415CC3
...
```
```
$ ./bin/llvm-exegesis -num-repetitions=1000000 -mode=inverse_throughput -repetition-mode=min --loop-body-size=4096 -dump-object-to-disk=false -opcode-name=MULX32rr --max-configs-per-opcode=65536
---
mode: inverse_throughput
key:
instructions:
- 'MULX32rr R13D EDX ECX'
config: ''
register_initial_values:
- 'ECX=0x0'
- 'EDX=0x0'
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 3.00013, per_snippet_value: 3.00013 }
error: ''
info: instruction has no tied variables picking Uses different from defs
assembled_snippet: 4155B900000000BA00000000C4626BF6E9C4626BF6E9C4626BF6E9C4626BF6E9415DC34155B900000000BA0000000049B80200000000000000C4626BF6E9C4626BF6E94983C0FF75F0415DC3
...
```
Oops! Not only does that not look fun, I did hit that pitfall during AMD Zen 3 enablement.
While I have since then addressed this in rGd4d459e7475b4bb0d15280f12ed669342fa5edcd,
I suspect there may be other buggy results lying around, so we should at least stop producing them.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D109275
The current IRSimilarityIdentifier does not try to find similarity across blocks. This patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them.
This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether the branching to other relative locations in the same region is the same between branches. If it is, we consider them similar.
We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter.
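The comparison rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the IRSimilarityIdentifier code: blocks are represented by integer labels, a region is a set of those labels, the relative location is taken to be the target label minus the branching block's label, and `relative_targets` and `branches_similar` are hypothetical names.

```python
def relative_targets(block, targets, region):
    """Offset of each target from the branching block, or None if it leaves the region."""
    return [t - block if t in region else None for t in targets]


def branches_similar(branch_a, region_a, branch_b, region_b):
    """Branches match when in-region targets sit at the same relative offsets
    and out-of-region targets line up with out-of-region targets."""
    block_a, targets_a = branch_a
    block_b, targets_b = branch_b
    return (relative_targets(block_a, targets_a, region_a)
            == relative_targets(block_b, targets_b, region_b))


region_a = {0, 1, 2}
region_b = {10, 11, 12}
# Both branch to the next block in the region and to some block outside it.
print(branches_similar((0, [1, 5]), region_a, (10, [11, 40]), region_b))
# The in-region target is at a different relative offset.
print(branches_similar((0, [1, 5]), region_a, (10, [12, 40]), region_b))
```

Mapping every out-of-region target to the same placeholder is what makes the exact exit location irrelevant, as described above.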
Reviewers: paquette, yroux
Differential Revision: https://reviews.llvm.org/D106989
Recommit of 707ce34b06. Don't introduce a
dependency to the LLVMPasses component, instead register the required
passes individually.
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:
* `unrollLoopFull`
* `unrollLoopPartial`
* `unrollLoopHeuristic`
`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is, if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heuristics as used by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic`, allows greater flexibility.
With full unrolling and partial unrolling with a known unroll factor, instead of duplicating instructions in the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In the case of partial unrolling, the loop is first tiled using the existing `tileLoops` methods, then the inner loop is fully unrolled using the same mechanism.
Reviewed By: jdoerfert, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107764
This reapplies 71d7fed3bc which was
reverted by 3e2bd82f02. This change
includes the fix for breaking the sanitizer bots.
As seen in https://bugs.llvm.org/show_bug.cgi?id=48880 the current
implementation for parsing grouped short options can return unclear
error messages. This change fixes the example given in the ticket in
which a flag is incorrectly given an argument. Also when parsing a
group we now keep reading past the first incorrect option and output
errors for all incorrect options in the group.
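The "keep reading past the first incorrect option" behaviour can be illustrated with a small Python sketch. It is not the actual option parser: `parse_short_group`, the `=` syntax for a stray argument, and the error strings are all assumptions made for this example.

```python
def parse_short_group(arg, known_flags):
    """Parse a '-abc' style group, collecting one error per bad option
    instead of stopping at the first one."""
    assert arg.startswith("-")
    body, has_value, value = arg[1:].partition("=")
    seen, errors = [], []
    for letter in body:
        if letter in known_flags:
            seen.append(letter)
        else:
            errors.append(f"unknown option '-{letter}' in group '{arg}'")
    if has_value:
        # Every option in this sketch is a plain flag, so any argument is an error.
        errors.append(f"flag group '-{body}' does not take an argument, got '{value}'")
    return seen, errors


flags, errors = parse_short_group("-axbz", {"a", "b"})
print(flags)
for message in errors:
    print(message)
```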
Differential Revision: https://reviews.llvm.org/D108770
Add support for ordered directive in the OpenMPIRBuilder.
This patch also modifies clang to use the ordered directive when the
option -fopenmp-enable-irbuilder is enabled.
Also fix an ICE when parsing a canonical for loop with the relational
operator LE or GE in an OpenMP region, by replacing the unary increment
of the expression "Expr A" minus "Expr B" (++(Expr A - Expr B)) with a
binary addition of that expression and the constant value "1"
(Expr A - Expr B + "1").
Reviewed By: Meinersbur, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107430
All ExecutorProcessControl subclasses must provide a JITLinkMemoryManager object
that can be used to allocate memory in the executor process. The
EPCGenericJITLinkMemoryManager class provides an off-the-shelf
JITLinkMemoryManager implementation for JITs that do not need (or cannot
provide) a specialized JITLinkMemoryManager implementation. This simplifies the
process of creating new ExecutorProcessControl implementations.
Looks like the MS STL wants StringMapKeyIterator::operator*() to be const.
Return the result by copy instead of reference to do that.
Assigning to a hash map key iterator doesn't make sense anyways.
Also reverts 123f811fe5 which is now hopefully no longer needed.
Differential Revision: https://reviews.llvm.org/D109167
Now prints the list of known archs. This requires plumbing a Driver
arg through a few functions.
Also add two more convenience insert() overloads to StringMap.
Differential Revision: https://reviews.llvm.org/D109105
Breaks build with -DBUILD_SHARED_LIBS=ON
```
CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
"LLVMFrontendOpenMP" of type SHARED_LIBRARY
depends on "LLVMPasses" (weak)
"LLVMipo" of type SHARED_LIBRARY
depends on "LLVMFrontendOpenMP" (weak)
"LLVMCoroutines" of type SHARED_LIBRARY
depends on "LLVMipo" (weak)
"LLVMPasses" of type SHARED_LIBRARY
depends on "LLVMCoroutines" (weak)
depends on "LLVMipo" (weak)
At least one of these targets is not a STATIC_LIBRARY. Cyclic dependencies are allowed only among static libraries.
CMake Generate step failed. Build files cannot be regenerated correctly.
```
This reverts commit 707ce34b06.
llvm.vp.select extends the regular select instruction with an explicit
vector length (%evl).
All lanes with indexes at and above %evl are
undefined. Lanes below %evl are taken from the first input where the
mask is true and from the second input otherwise.
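The lane semantics described above can be modelled in Python as follows. This is a sketch of the stated behaviour only, not of the intrinsic's implementation; `vp_select` is a hypothetical helper and `None` stands in for an undefined lane.

```python
from typing import List, Optional, Sequence


def vp_select(mask: Sequence[bool], on_true: Sequence[int],
              on_false: Sequence[int], evl: int) -> List[Optional[int]]:
    """Model the per-lane result; None marks a lane at or above evl."""
    assert len(mask) == len(on_true) == len(on_false)
    result: List[Optional[int]] = []
    for lane in range(len(mask)):
        if lane >= evl:
            result.append(None)            # undefined lane
        elif mask[lane]:
            result.append(on_true[lane])   # mask true: first input
        else:
            result.append(on_false[lane])  # mask false: second input
    return result


print(vp_select([True, False, True, False], [1, 2, 3, 4], [10, 20, 30, 40], evl=3))
# prints [1, 20, 3, None]
```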
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D105351
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:
* `unrollLoopFull`
* `unrollLoopPartial`
* `unrollLoopHeuristic`
`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is, if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heuristics as used by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic`, allows greater flexibility.
With full unrolling and partial unrolling with a known unroll factor, instead of duplicating instructions in the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In the case of partial unrolling, the loop is first tiled using the existing `tileLoops` methods, then the inner loop is fully unrolled using the same mechanism.
Reviewed By: jdoerfert, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107764
This is used by BOLT to patch the DebugInfo section and the line table, both directly by using find and through getAttrFieldOffsetForUnit.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D107874
As seen in https://bugs.llvm.org/show_bug.cgi?id=48880 the current
implementation for parsing grouped short options can return unclear
error messages. This change fixes the example given in the ticket in
which a flag is incorrectly given an argument. Also when parsing a
group we now keep reading past the first incorrect option and output
errors for all incorrect options in the group.
Differential Revision: https://reviews.llvm.org/D108770
Currently context strings contain a lot of duplicated function names, and that significantly increases the profile size. This change splits the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by an index into the name table, which significantly reduces the size consumed by contexts.
A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings, which is time- and memory-consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previously prevalent profile map, which was keyed by `StringRef`, is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to use an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to using a `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For the compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings are generated only for debugging and printing.
When it comes to profile format, nothing has changed in the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, where each element is an offset pointing into `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset into `SecCSNameTable`.
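The size saving from splitting contexts can be illustrated with a small Python sketch. It is not the profile reader or writer: the `name:offset.discriminator @ ...` context syntax is assumed here for illustration, and `parse_frame` and `encode_contexts` are hypothetical helpers.

```python
def parse_frame(text):
    """Split 'name:offset.discriminator' into a tuple; missing parts become 0."""
    name, _, location = text.partition(":")
    if not location:
        return (name, 0, 0)
    offset, _, discriminator = location.partition(".")
    return (name, int(offset), int(discriminator or 0))


def encode_contexts(contexts):
    """Replace each function name by its index into a shared name table."""
    name_table, index_of, encoded = [], {}, []
    for context in contexts:
        frames = []
        for part in context.split(" @ "):
            name, offset, discriminator = parse_frame(part)
            if name not in index_of:          # store each name only once
                index_of[name] = len(name_table)
                name_table.append(name)
            frames.append((index_of[name], offset, discriminator))
        encoded.append(frames)
    return name_table, encoded


table, encoded = encode_contexts(["main:3 @ foo:5.1 @ bar", "main:3 @ foo:7 @ baz"])
print(table)
print(encoded)
```

Each function name is stored once in the table no matter how many contexts mention it, which is where the reduction comes from.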
Testing
This is a no-diff change in terms of code quality and profile content (for text profile).
For our internal large service (aka ads), the profile generation is cut in half, with a 20x smaller string-based extbinary format generated.
The compile time of ads dropped by 25%.
Differential Revision: https://reviews.llvm.org/D107299
When the initial relationship between two pairs of values between
similar sections is ambiguous to commutativity, arguments to the
outlined functions can be passed in such that the order is incorrect,
causing miscompilations. This adds a canonical mapping to each
similarity section, so that we can maintain the relationship of global
value numbering from one section to another.
Added Tests:
Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll
unittests/Analysis/IRSimilarityIdentifierTest.cpp - IRSimilarityCandidate:CanonicalNumbering
Reviewers: jroelofs, jpaquette, yroux
Differential Revision: https://reviews.llvm.org/D104143
Generate btf_tag annotations for function parameters.
A field "annotations" is introduced to DILocalVariable, and
annotations are represented as a DINodeArray, similar to
DIComposite elements. The following example illustrates how
annotations are encoded in IR:
distinct !DILocalVariable(name: "info", arg: 1, ..., annotations: !10)
!10 = !{!11, !12}
!11 = !{!"btf_tag", !"a"}
!12 = !{!"btf_tag", !"b"}
Differential Revision: https://reviews.llvm.org/D106620
Generate btf_tag annotations for DIGlobalVariable.
A field "annotations" is introduced to DIGlobalVariable, and
annotations are represented as a DINodeArray, similar to
DIComposite elements. The following example illustrates how
annotations are encoded in IR:
distinct !DIGlobalVariable(..., annotations: !10)
!10 = !{!11, !12}
!11 = !{!"btf_tag", !"a"}
!12 = !{!"btf_tag", !"b"}
Differential Revision: https://reviews.llvm.org/D106619
The Code Extractor does not provide an easy mechanism for determining the
inputs and outputs after extraction has occurred, this patch gives the
ability to pass in empty SetVectors to be filled with the inputs and
outputs if they need to be analyzed.
Added Tests:
- InputOutputMonitoring in unittests/Transforms/Utils/CodeExtractorTests.cpp
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D106991
The `HashBuilder` interface allows conveniently building hashes of various data
types, without relying on the underlying hasher type to know about hashed data
types.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D106910
In LLVM IR, `AlignmentBitfieldElementT` is 5 bits wide.
But that means that the maximal alignment exponent is `(1<<5)-2`,
which is `30`, not `29`. And indeed, an alignment of `1073741824`
roundtrips IR serialization-deserialization.
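The arithmetic can be spelled out in a few lines of Python. This only restates the numbers from the message; the constant names are made up for the example.

```python
FIELD_BITS = 5  # width of AlignmentBitfieldElementT, per the message

max_exponent = (1 << FIELD_BITS) - 2  # the new limit
old_exponent = 29                     # the previous limit

MIB = 1 << 20
GIB = 1 << 30

print(max_exponent)                  # 30
print(1 << max_exponent)             # 1073741824
print((1 << old_exponent) // MIB)    # 512, i.e. 512MiB
print((1 << max_exponent) // GIB)    # 1, i.e. 1GiB
```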
While this doesn't seem all that important, this doubles
the maximal supported alignment from 512MiB to 1GiB,
and there's actually one noticeable use-case for that;
On X86, the huge pages can have sizes of 2MiB and 1GiB (!).
So while this doesn't add support for truly huge alignments,
which I think we can easily-ish do if wanted, I think this adds
zero-cost support for a not-trivially-dismissable case.
I don't believe we need any upgrade infrastructure,
and since we don't explicitly record the IR version,
we don't need to bump one either.
As @craig.topper speculates in D108661#2963519,
this might be an artificial limit imposed by the original implementation
of the `getAlignment()` functions.
Differential Revision: https://reviews.llvm.org/D108661
When the Src and Dst used in buildAnyExtOrTrunc or buildSExtOrTrunc
have the same type (which creates a COPY), use the Src register
directly or use replaceRegOrBuildCopy instead.
Differential Revision: https://reviews.llvm.org/D108306
WrapperFunctionResult no longer supports wrapping constant data, so this patch
adds a non-const data method. Since data can now be written through the data
method, the allocate method can be simplified to return a WrapperFunctionResult.
DWARFDie::getDeclFile(...) previously only supported getting the DW_AT_decl_file if the DIE itself contained the DW_AT_decl_file attribute, or if the DIE had a DW_AT_abstract_origin that pointed to another DIE that had a DW_AT_decl_file. This patch allows the function to get the right attribute value if there is a DW_AT_specification that points to another DIE. We also test that if a DW_AT_abstract_origin or DW_AT_specification points to a DIE in another CU with a DW_FORM_ref_addr, the right line table is used to extract the file index.
Full tests were added for the following cases:
- DIE has a DW_AT_decl_file attribute
- DIE has a DW_AT_abstract_origin that points to another die in the same CU
- DIE has a DW_AT_abstract_origin that points to another die in another CU
- DIE has a DW_AT_specification that points to another die in the same CU
- DIE has a DW_AT_specification that points to another die in another CU
Differential Revision: https://reviews.llvm.org/D108480
Renames the blobSerializationRoundTrip test helper function to
spsSerializationRoundTrip ('blob' was the placeholder name for the serialization
scheme during prototyping, this function was missed when renaming everything
for the mainline). Also drops explicit template arguments at call sites where
they can be inferred (and are obvious) from the call argument type.
All ExecutorProcessControl subclasses must provide an
ExecutorProcessControl::MemoryAccess object that can be used to access executor
memory from the JIT process. The EPCGenericMemoryAccess class provides an
off-the-shelf MemoryAccess implementation for JITs that do not need (or cannot
provide) a specialized MemoryAccess implementation. This simplifies the process
of creating new ExecutorProcessControl implementations.
Accepts a vector of (SymbolStringPtr, ExecutorAddress*) pairs, looks up all the
symbols, then writes their address to each of the corresponding
ExecutorAddresses.
This idiom (looking up and recording addresses into a specific set of variables)
is used in MachOPlatform and the (temporarily reverted) ELFNixPlatform, and is
likely to be used in other places in the near future, so wrapping it in a
utility function should save us some boilerplate.
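The look-up-and-record idiom can be sketched in Python as below. This is a loose model, not the ORC API: `AddressSlot` stands in for an `ExecutorAddress*`, `lookup_and_record_addrs` and `fake_lookup` are hypothetical, and the symbol names and addresses are invented.

```python
class AddressSlot:
    """Mutable holder standing in for a pointer to an address variable."""

    def __init__(self):
        self.value = 0


def lookup_and_record_addrs(lookup, pairs):
    """Resolve every symbol in one batch, then write each address to its slot."""
    names = [name for name, _ in pairs]
    addresses = lookup(names)  # expected to return addresses in the same order
    for (_, slot), address in zip(pairs, addresses):
        slot.value = address


# Hypothetical symbol table and batch lookup function for the demonstration.
SYMBOLS = {"_init": 0x1000, "_fini": 0x2000}


def fake_lookup(names):
    return [SYMBOLS[name] for name in names]  # raises KeyError on a missing symbol


init_addr, fini_addr = AddressSlot(), AddressSlot()
lookup_and_record_addrs(fake_lookup, [("_init", init_addr), ("_fini", fini_addr)])
print(hex(init_addr.value), hex(fini_addr.value))
```

Because all names are resolved before any slot is written, a failed lookup in this sketch leaves every slot untouched.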
Clang patch D106614 added attribute btf_tag support. This patch
generates btf_tag annotations for DIComposite types.
A field "annotations" is introduced to DIComposite, and the
annotations are represented as a DINodeArray, similar to
DIComposite elements. The following example illustrates
how annotations are encoded in IR:
distinct !DICompositeType(..., annotations: !10)
!10 = !{!11, !12}
!11 = !{!"btf_tag", !"a"}
!12 = !{!"btf_tag", !"b"}
Each btf_tag annotation is represented as a 2D array of
meta strings. Each record may have more than one
btf_tag annotation, as in the above example.
Reland with additional fixes for llvm/unittests/IR/DebugTypeODRUniquingTest.cpp.
Differential Revision: https://reviews.llvm.org/D106615
This information is necessary for clients of DebugInfo that
do not want to process a DWARF expression, but just treat it as a blob
of data. In BOLT, for example, we need to read these expressions in
CFIs and write them back to the binary, unchanged, so having access to
the original expression encoding is a shortcut to avoid the need to
re-encode the entire expression when re-writing exception handling
info (CFIs).
This patch is an alternative to https://reviews.llvm.org/D98301, in
which we implement the support to re-encode these expressions. But
since we don't really need to change anything in these expressions,
we can just copy their bytes.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D107515
MSSA-based LICM has been enabled by default for a few years now.
This drops the old AST-based implementation. Using loop(licm) will
result in a fatal error, the use of loop-mssa(licm) is required
(or just licm, which defaults to loop-mssa).
Note that the core canSinkOrHoistInst() logic has to retain AST
support for now, because it is shared with LoopSink.
Differential Revision: https://reviews.llvm.org/D108244
Nest from being perfect
Expand LoopNestAnalysis to return the full list of instructions that
cause a loop nest to be imperfect. This is useful for other passes to
know if they should continue into the inner loops.
Added a new function, getInterveningInstructions,
that returns a small vector with the instructions that prevent a loop
nest from being perfect. Also added a couple of helper functions to reduce
code duplication.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D107773
This patch adds vector-predicated ("VP") reduction intrinsics corresponding to
each of the existing unpredicated `llvm.vector.reduce.*` versions. Unlike the
unpredicated reductions, all VP reductions have a start value. This start value
is returned when no vector element is active.
Support for expansion on targets without native vector-predication support is
included.
This patch is based on the ["reduction
slice"](https://reviews.llvm.org/D57504#1732277) of the LLVM-VP reference patch
(https://reviews.llvm.org/D57504).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D104308
This option has been enabled by default for quite a while now.
The practical impact of removing the option is that MSSA use
cannot be disabled in default pipelines (both LPM and NPM) and
in manual LPM invocations. NPM can still choose to enable/disable
MSSA using loop vs loop-mssa.
The next step will be to require MSSA for LICM and drop the
AST-based implementation entirely.
Differential Revision: https://reviews.llvm.org/D108075
Reset cl::Positional, cl::Sink and cl::ConsumeAfter options as well in cl::ResetCommandLineParser().
Reviewed By: rriddle, sammccall
Differential Revision: https://reviews.llvm.org/D103356
Improves maintainability (edit/modify the tests without recompiling) and
error messages (previously the failure would be a gtest failure
mentioning nothing of the input or desired text) and the option to
improve tests with more checks.
(maybe these tests shouldn't all be in separate files - we could
probably have DWARF yaml that contains multiple errors while still being
fairly maintainable - the various invalid offsets (ref_addr, rnglists,
ranges, etc) could probably be all in one test, but for the simple sake
of the migration I just did the mechanical thing here)
AttributeList::hasAttribute() is confusing, use clearer methods like
hasParamAttr()/hasRetAttr().
Add hasRetAttr() since it was missing from AttributeList.
Add in-source documentation on how CanonicalLoopInfo is intended to be used. In particular, clarify which parts of a CanonicalLoopInfo are considered part of the loop, that those parts must be side-effect free, and that InsertPoints to instructions outside those parts can be expected to be preserved after method calls implementing loop-associated directives.
A CanonicalLoopInfo is now invalidated once it no longer describes a canonical loop, and asserts when trying to use it afterwards.
In addition, rename `createXYZWorkshareLoop` to `applyXYZWorkshareLoop` and remove the update location to avoid the impression that they insert something from scratch at that location, when in reality its InsertPoint is ignored. createStaticWorkshareLoop does not return a CanonicalLoopInfo anymore. First, it was not a canonical loop in the clarified sense (it contains side effects in the form of calls to the OpenMP runtime). Second, it is ambiguous which of the two possible canonical loops it should actually return. It will not be needed before a feature expected to be introduced in OpenMP 6.0.
Also see discussion in D105706.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D107540
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.
Differential Revision: https://reviews.llvm.org/D107528
If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be
shifted individually by the constant shift amount.
However in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization
code was used, producing intermediate shifts that are potentially illegal on some targets.
This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs.
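Why a constant shift amount lets the two halves be handled individually can be shown with a small Python model. It is not the narrowScalarShift code: `narrow_shl_by_constant` is a hypothetical helper and the 64-bit value split into 32-bit halves is an arbitrary choice.

```python
HALF_BITS = 32
HALF_MASK = (1 << HALF_BITS) - 1


def narrow_shl_by_constant(lo, hi, amount):
    """Shift a 64-bit value, given as 32-bit halves, left by a known constant."""
    assert 0 <= amount < 2 * HALF_BITS
    if amount == 0:
        return lo, hi
    if amount >= HALF_BITS:
        # All low bits move into (or past) the high half.
        return 0, (lo << (amount - HALF_BITS)) & HALF_MASK
    new_lo = (lo << amount) & HALF_MASK
    # The high half receives the bits shifted out of the low half.
    new_hi = ((hi << amount) | (lo >> (HALF_BITS - amount))) & HALF_MASK
    return new_lo, new_hi


value = 0x0123456789ABCDEF
lo, hi = narrow_shl_by_constant(value & HALF_MASK, value >> HALF_BITS, 12)
print(hex((hi << HALF_BITS) | lo))
```

Since the amount is known up front, the case split happens once at legalization time, so no shift by a variable or out-of-range amount needs to be emitted.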
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D89100
1) add some self-diagnosis (when asserts are enabled) to check that all
features have the same number of entries
2) avoid storing pointers to mutable fields because the proto API
contract doesn't actually guarantee those stay fixed even if no further
mutation of the object occurs.
Differential Revision: https://reviews.llvm.org/D107594
It's entirely possible (because it actually happened) for a bool
variable to end up with a 256-bit DW_AT_const_value. This came about
when a local bool variable was initialized from a bitfield in a
32-byte struct of bitfields, and after inlining and constant
propagation, the variable did have a constant value. The sequence of
optimizations had it carrying "i256" values around, but once the
constant made it into the llvm.dbg.value, no further IR changes could
affect it.
Technically the llvm.dbg.value did have a DIExpression to reduce it
back down to 8 bits, but the compiler is in no way ready to emit an
oversized constant *and* a DWARF expression to manipulate it.
Depending on the circumstances, we had either just the very fat bool
value, or an expression with no starting value.
The sequence of optimizations that led to this state did seem pretty
reasonable, so the solution I came up with was to invent a DWARF
constant expression folder. Currently it only does convert ops, but
there's no reason it couldn't do other ops if that became useful.
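What folding a convert op on a constant amounts to can be sketched in Python. This is an assumption-laden model rather than the folder added by the patch: `fold_convert` is a hypothetical helper that treats a convert as truncation to the target width followed by an optional signed reinterpretation.

```python
def fold_convert(value: int, to_bits: int, signed: bool) -> int:
    """Fold a width conversion applied to a constant into a new constant."""
    truncated = value & ((1 << to_bits) - 1)   # keep only the low to_bits bits
    if signed and truncated >> (to_bits - 1):  # sign bit set: reinterpret as negative
        truncated -= 1 << to_bits
    return truncated


wide_true = 1                  # a bool carried around as a 256-bit constant
all_ones = (1 << 256) - 1      # a 256-bit all-ones pattern

print(fold_convert(wide_true, 8, signed=False))
print(fold_convert(all_ones, 8, signed=False))
print(fold_convert(all_ones, 8, signed=True))
```

After folding, only a narrow constant remains, so there is no oversized constant paired with an expression to emit.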
This broke three tests that depended on having convert ops survive
into the DWARF, so I added an operator that would abort the folder to
each of those tests.
Differential Revision: https://reviews.llvm.org/D106915
This allows users accessing options in libSupport before invoking
`cl::ParseCommandLineOptions`, and also matches the behavior before
D105959.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D106334
ValueTracking should allow for value ranges that may satisfy
llvm.assume, instead of restricting the ranges only to values that
will always satisfy the condition.
Differential Revision: https://reviews.llvm.org/D107298
When we build with split DWARF in single mode, the .o files contain both "normal" debug sections and dwo sections, along with relocation sections for the "normal" debug sections.
When we create a DWARF context in DWARFObjInMemory, we process relocations and store them in a map for the .debug_info, etc. sections.
For the DWO context we also do this for non-dwo DWARF sections, which I believe is not necessary. This leads to a lot of memory being wasted. We observed 70GB of extra memory being used.
I went with a context-sensitive approach, where a flag is passed in. I am not sure if it's always safe not to process relocations for regular debug sections if Obj contains .dwo sections.
If it is, an alternative might be to just scan the sections in the constructor and, if there are .dwo sections, not process the regular debug ones.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D106624
This introduces a builder function for emitting IR performing reductions in
OpenMP. Reduction variable privatization and initialization to the
reduction-neutral value is expected to be handled separately. The caller
provides the reduction functions. Further commits can provide implementation of
reduction functions for the reduction operators defined in the OpenMP
specification.
This implementation was tested on an MLIR fork targeting OpenMP from C and
produced correct executable code.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D104928
D106850 introduced a simplification for llvm.vscale by looking at the
surrounding function's vscale_range attributes. The call that's being
simplified may not yet have been inserted into the IR. This happens for
example during function cloning.
This patch fixes the issue by checking if the instruction is in a
parent basic block.
This takes two ranges and invokes a predicate on the element-wise pair in the
ranges. It returns true if all the pairs are matching the predicate and the ranges
have the same size.
It is useful with containers that don't provide random-access iterators, where
we can't check the sizes in O(1).
Differential Revision: https://reviews.llvm.org/D106605
Wrapper function call and dispatch handler helpers are moved to
ExecutionSession, and existing EPC-based tools are re-written to take an
ExecutionSession argument instead.
Requiring an ExecutorProcessControl instance simplifies existing EPC based
utilities (which only need to take an ES now), and should encourage more
utilities to use the EPC interface. It also simplifies process termination,
since the session can automatically call ExecutorProcessControl::disconnect
(previously this had to be done manually, and carefully ordered with the
rest of JIT tear-down to work correctly).
These tests access private symbols in the backends, so they cannot link
against libLLVM.so and must be statically linked. Linking these tests
can be slow and with debug builds the resulting binaries use a lot of
disk space.
Merging them into a single test binary means we now only need to
statically link 1 test instead of 6, which helps reduce build
times and saves disk space.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D106464
This patch adds support for the next-generation arch14
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Detection of arch14 as host processor.
- Assembler/disassembler support for new instructions.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10304.
Note: No currently available Z system supports the arch14
architecture. Once new systems become available, the
official system name will be added as supported -march name.
checkForAllInstructions was not handling declarations correctly.
It should have been returning false when it gets called on a declaration.
The patch also fixes a test case for AAFunctionReachability so that it
passes after the changes to checkForAllInstructions.
Differential Revision: https://reviews.llvm.org/D106625
Avoid buffering just to copy the buffered data, in 'development
mode', when logging. Instead, just populate the underlying protobuf.
Differential Revision: https://reviews.llvm.org/D106592
Opaque values (of zero size) can be stored in memory with the
implementation of reference types in the WebAssembly backend. Since
MachineMemOperand uses LLTs we need to be able to support
zero-sized scalar types in LLTs.
Differential Revision: https://reviews.llvm.org/D105423
This patch changes `__kmpc_free_shared` to take an additional argument
corresponding to the associated allocation's size. This makes it easier to
implement the allocator in the runtime.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106496
fixed fields with highly-aligned flexible fields.
The code was not considering the possibility that aligning
the current offset to the alignment of a queue might push
us past the end of the gap. Subtracting the offsets to
figure out the maximum field size for the gap then overflowed,
making us think that we had nearly unbounded space to fill.
Fixes PR 51131.
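The overflow described above can be illustrated with a small standalone
sketch. The helper names `alignUp` and `maxFieldSizeInGap` are hypothetical
and do not mirror the actual layout code; the sketch only assumes unsigned
64-bit offsets and power-of-two alignments.

```cpp
#include <cstdint>

// Round Offset up to the next multiple of Align (Align must be a power of 2).
inline uint64_t alignUp(uint64_t Offset, uint64_t Align) {
  return (Offset + Align - 1) & ~(Align - 1);
}

// Hypothetical helper: the largest field that fits in [Offset, GapEnd) once
// Offset has been aligned. Returns 0 when alignment alone overshoots the end
// of the gap, instead of letting the unsigned subtraction wrap around to a
// huge value.
inline uint64_t maxFieldSizeInGap(uint64_t Offset, uint64_t GapEnd,
                                  uint64_t Align) {
  uint64_t Aligned = alignUp(Offset, Align);
  if (Aligned >= GapEnd)
    return 0; // Nothing fits; GapEnd - Aligned would underflow here.
  return GapEnd - Aligned;
}
```

For example, with an offset of 4, a gap ending at 12, and a 16-byte alignment,
the aligned offset is 16, which is already past the gap, so the guarded
version reports that no space is available.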
Make it easier to initialize small maps inline. Note that DenseMap already has an initializer_list constructor.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D106363
This patch allows iterating typed enum via the ADT/Sequence utility.
It also changes the original design to better separate concerns:
- `StrongInt` only deals with safe `intmax_t` operations,
- `SafeIntIterator` presents the iterator and reverse iterator
interface but only deals with safe `StrongInt` internally.
- `iota_range` only deals with `SafeIntIterator` internally.
This design ensures that operations are always valid. In particular,
"Out of bounds" assertions fire when:
- the `value_type` is not representable as an `intmax_t`
- iterator operations make internal computation underflow/overflow
- the internal representation cannot be converted back to `value_type`
Differential Revision: https://reviews.llvm.org/D106279
LinkGraph::transferBlock can be used to move a block and all associated symbols
from one section to another.
LinkGraph::mergeSections moves all blocks and symbols from a source section to
a destination section.
After rGbbbc4f110e35ac709b943efaa1c4c99ec073da30, we can move
any string type that has convenient pointer and length fields
into the PtrAndLengthKind, reducing the amount of code.
Differential Revision: https://reviews.llvm.org/D106381
This was placing sret/byval attributes without type argument on
non-pointer arguments. Make this valid IR by using pointer
arguments and passing the corresponding attribute type argument.
This is a follow-up to https://reviews.llvm.org/D103935
A Twine's internal layout should not depend on which version of the
C++ standard is in use. Dynamically linking binaries compiled with two
different layouts (e.g., --std=c++14 vs --std=c++17) ends up being
problematic.
This change avoids that issue by immediately converting a
string_view to a pointer-and-length at the cost of an extra eight bytes
in Twine.
Differential Revision: https://reviews.llvm.org/D106186
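The idea of converting eagerly to a pointer-and-length can be sketched as
below. This is a simplified stand-in and not Twine's real layout: the
`PtrAndLength` struct and the `toPtrAndLength` and `materialize` helpers are
hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a stored string piece; not Twine's actual layout.
// The stored representation is the same whichever C++ standard is in use.
struct PtrAndLength {
  const char *Ptr;
  std::size_t Length;
};

// Convert at the API boundary. Any string-like type with data() and size()
// (std::string, or std::string_view when available) collapses to the same
// two fields, so no standard-dependent type is ever stored.
template <typename StringLike>
PtrAndLength toPtrAndLength(const StringLike &S) {
  return {S.data(), S.size()};
}

// Rebuild an owning string from the stored pieces.
inline std::string materialize(PtrAndLength P) {
  return std::string(P.Ptr, P.Length);
}
```

The stored pair does not own the characters, so, as with a string_view, the
source string must outlive any use of the converted value.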
- This patch adds in the GOFF format to the file magic identification logic in LLVM
- Currently, for the object file support, GOFF is marked as an error
- However, this is only temporary until https://reviews.llvm.org/D98437 is merged in
Reviewed By: abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D105993
Code in getCPUNameFromS390Model currently assumes that the
numerical value of the model number always increases with
future hardware. While this has happened to be the case
with the last few machines, it is not guaranteed -- that
assumption was violated with (much) older machines, and
it can be violated again with future machines.
Fix by explicitly listing model numbers for all supported
machine models.
It turns out that during training, the time required to parse the
textual protobuf of a training log is about the same as the time it
takes to compile the module generating that log. Using binary protobufs
instead elides that cost almost completely.
Differential Revision: https://reviews.llvm.org/D106157
This diff changes llvm-ifs to use unified IFS file format
and perform other renaming changes in preparation for the
merging between elfabi/ifs.
Differential Revision: https://reviews.llvm.org/D99810
This change implements unified text stub format and command line
interface proposed in the elfabi/ifs merge plan.
Differential Revision: https://reviews.llvm.org/D99399
Adds support for MachO static initializers/deinitializers and eh-frame
registration via the ORC runtime.
This commit introduces cooperative support code into the ORC runtime and ORC
LLVM libraries (especially the MachOPlatform class) to support macho runtime
features for JIT'd code. This commit introduces support for static
initializers, static destructors (via cxa_atexit interposition), and eh-frame
registration. Near-future commits will add support for MachO native
thread-local variables, and language runtime registration (e.g. for Objective-C
and Swift).
The llvm-jitlink tool is updated to use the ORC runtime where available, and
regression tests for the new MachOPlatform support are added to compiler-rt.
Notable changes on the ORC runtime side:
1. The new macho_platform.h / macho_platform.cpp files contain the bulk of the
runtime-side support. This includes eh-frame registration; jit versions of
dlopen, dlsym, and dlclose; a cxa_atexit interpose to record static destructors,
and an '__orc_rt_macho_run_program' function that defines running a JIT'd MachO
program in terms of the jit- dlopen/dlsym/dlclose functions.
2. Replaces JITTargetAddress (and casting operations) with ExecutorAddress
(copied from LLVM) to improve type-safety of address management.
3. Adds serialization support for ExecutorAddress and unordered_map types to
the runtime-side Simple Packed Serialization code.
4. Adds orc-runtime regression tests to ensure that static initializers and
cxa-atexit interposes work as expected.
Notable changes on the LLVM side:
1. The MachOPlatform class is updated to:
1.1. Load the ORC runtime into the ExecutionSession.
1.2. Set up standard aliases for macho-specific runtime functions. E.g.
___cxa_atexit -> ___orc_rt_macho_cxa_atexit.
1.3. Install the MachOPlatformPlugin to scrape LinkGraphs for information
needed to support MachO features (e.g. eh-frames, mod-inits), and
communicate this information to the runtime.
1.4. Provide entry-points that the runtime can call to request initializers,
perform symbol lookup, and request deinitializers (the latter is
implemented as an empty placeholder as macho object deinits are rarely
used).
1.5. Create a MachO header object for each JITDylib (defining the __mh_header
and __dso_handle symbols).
2. The llvm-jitlink tool (and llvm-jitlink-executor) are updated to use the
runtime when available.
3. A `lookupInitSymbolsAsync` method is added to the Platform base class. This
can be used to issue an async lookup for initializer symbols. The existing
`lookupInitSymbols` method is retained (the GenericIRPlatform code is still
using it), but is deprecated and will be removed soon.
4. JIT-dispatch support code is added to ExecutorProcessControl.
The JIT-dispatch system allows handlers in the JIT process to be associated with
'tag' symbols in the executor, and allows the executor to make remote procedure
calls back to the JIT process (via __orc_rt_jit_dispatch) using those tags.
The primary use case is ORC runtime code that needs to call back to handlers in
orc::Platform subclasses. E.g. __orc_rt_macho_jit_dlopen calling back to
MachOPlatform::rt_getInitializers using __orc_rt_macho_get_initializers_tag.
(The system is generic however, and could be used by non-runtime code).
The new ExecutorProcessControl::JITDispatchInfo struct provides the address
(in the executor) of the jit-dispatch function and a jit-dispatch context
object, and implementations of the dispatch function are added to
SelfExecutorProcessControl and OrcRPCExecutorProcessControl.
5. OrcRPCTPCServer is updated to support JIT-dispatch calls over ORC-RPC.
6. Serialization support for StringMap is added to the LLVM-side Simple Packed
Serialization code.
7. A JITLink::allocateBuffer operation is introduced to allocate writable memory
attached to the graph. This is used by the MachO header synthesis code, and will
be generically useful for other clients who want to create new graph content
from scratch.
At most these use the StringRef/Twine wrappers and don't have any implicit uses of std::string.
Move the include down to any cpp implementation where std::string is actually used.
llvm::KnownBits::byteSwap() and reverse() don't modify in-place, so
we weren't actually computing anything. This was causing a miscompile on an
arm64 stage2 bootstrap clang build.
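The bug pattern described above (calling a value-returning method and
discarding the result) can be shown with a small standalone sketch.
`MiniKnownBits` is a hypothetical miniature for illustration only and is not
llvm::KnownBits.

```cpp
#include <cstdint>

// Hypothetical miniature of a known-bits value; not llvm::KnownBits itself.
struct MiniKnownBits {
  uint32_t Zero = 0; // Bits known to be 0.
  uint32_t One = 0;  // Bits known to be 1.

  // Returns a byte-swapped copy; *this is left untouched.
  MiniKnownBits byteSwap() const {
    auto Swap = [](uint32_t V) {
      return (V >> 24) | ((V >> 8) & 0x0000FF00u) |
             ((V << 8) & 0x00FF0000u) | (V << 24);
    };
    MiniKnownBits R;
    R.Zero = Swap(Zero);
    R.One = Swap(One);
    return R;
  }
};

// Buggy pattern: the returned value is dropped, so Known is unchanged.
inline MiniKnownBits swapBuggy(MiniKnownBits Known) {
  Known.byteSwap();
  return Known;
}

// Fixed pattern: assign the result back.
inline MiniKnownBits swapFixed(MiniKnownBits Known) {
  Known = Known.byteSwap();
  return Known;
}
```

In code of this shape, marking the method `[[nodiscard]]` (C++17) is one way
to have the compiler flag the buggy call; whether that suits a given API is a
separate design decision.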
We can build it with -Werror=global-constructors now. This helps
in situations where libSupport is embedded as a shared library,
potentially with a dlopen/dlclose scenario, and when command-line
parsing or other facilities may not be involved. Avoiding the
implicit construction of these cl::opt can avoid double-registration
issues and other kinds of unwanted behavior.
Reviewed By: lattner, jpienaar
Differential Revision: https://reviews.llvm.org/D105959
The ceiling variant was recently added (due to the work towards D105216), and we're spending a lot of time trying to find optimizations for the expression. This patch brute forces the space of i8 unsigned divides and checks that we get a correct (well consistent with APInt) result for both udiv and udiv ceiling.
(This is basically what I've been doing locally in a hand-rolled C++ program, and I realized there's no good reason not to check it in as a unit test which directly exercises the logic on constants.)
Differential Revision: https://reviews.llvm.org/D106083
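The brute-force approach can be sketched in plain C++ without any LLVM types.
This is only an illustration of the testing technique: `udivCeil8` is a
hypothetical candidate implementation, and a wider-integer formula stands in
for the APInt reference used by the real unit test.

```cpp
#include <cstdint>

// Hypothetical candidate under test: ceiling division done entirely in
// 8-bit unsigned terms, avoiding the N + D - 1 form that could wrap in i8.
inline uint8_t udivCeil8(uint8_t N, uint8_t D) {
  return static_cast<uint8_t>(N / D + (N % D != 0 ? 1 : 0));
}

// Exhaustively compare the candidate against a reference computed in a
// wider type, where N + D - 1 cannot overflow. Returns the mismatch count.
inline unsigned countUDivCeilMismatches() {
  unsigned Mismatches = 0;
  for (unsigned N = 0; N < 256; ++N) {
    for (unsigned D = 1; D < 256; ++D) { // Skip division by zero.
      unsigned Expected = (N + D - 1) / D;
      unsigned Actual =
          udivCeil8(static_cast<uint8_t>(N), static_cast<uint8_t>(D));
      if (Actual != Expected)
        ++Mismatches;
    }
  }
  return Mismatches;
}
```

The i8 input space is only 256 x 255 pairs, so checking every pair is cheap
enough to run as a regular unit test.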
SME introduces the ZA array, a new piece of architectural register state
consisting of a matrix of [SVLb x SVLb] bytes, where SVL is the
implementation defined Streaming SVE vector length and SVLb is the
number of 8-bit elements in a vector of SVL bits.
SME instructions consist of three types of matrix operands:
* Tiles: a ZA tile is a square, two-dimensional sub-array of elements
within the ZA array. These tiles make up the larger accumulator array
and the granularity varies based on the element size, i.e.
- ZAQ0..ZAQ15 (smallest tile granule)
- ZAD0..ZAD7
- ZAS0..ZAS3
- ZAH0..ZAH1
or ZAB0 (largest tile granule, single tile)
* Tile vectors: similar to regular tiles, but have an extra 'h' or 'v'
to tell how the vector at [reg+offset] is laid out in the tile,
horizontally or vertically. E.g. za1h.h or za15v.q, which corresponds
to vectors in registers ZAH1 and ZAQ15, respectively.
* Accumulator matrix: this is the entire accumulator array ZA.
This patch adds the register classes and related operands and parsing
for SME instructions operating on the accumulator array.
The ADDHA and ADDVA instructions which operate on tiles are also added
in this patch to make some use of the code added; later patches will
make use of the other operands introduced here.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Co-authored by: Sander de Smalen (@sdesmalen)
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D105570
Continuing from D105763, this allows placing certain properties
about attributes in the TableGen definition. In particular, we
store whether an attribute applies to fn/param/ret (or a combination
thereof). This information is used by the Verifier, as well as the
ForceFunctionAttrs pass. I also plan to use this in LLParser,
which also duplicates info on which attributes are valid where.
This keeps metadata about attributes in one place, and makes it
more likely that it stays in sync, rather than in various
functions spread across the codebase.
Differential Revision: https://reviews.llvm.org/D105780
First patch in a series adding MC layer support for the Arm Scalable
Matrix Extension.
This patch adds the following features:
sme, sme-i64, sme-f64
The sme-i64 and sme-f64 flags are for the optional I16I64 and F64F64
features.
If a target supports I16I64 then the following instructions are
implemented:
* 64-bit integer ADDHA and ADDVA variants (D105570).
* SMOPA, SMOPS, SUMOPA, SUMOPS, UMOPA, UMOPS, USMOPA, and USMOPS
instructions that accumulate 16-bit integer outer products into 64-bit
integer tiles.
If a target supports F64F64 then the FMOPA and FMOPS instructions that
accumulate double-precision floating-point outer products into
double-precision tiles are implemented.
Outer products are implemented in D105571.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06
Reviewed By: CarolineConcatto
Differential Revision: https://reviews.llvm.org/D105569
This patch makes the operations on InstructionCost saturate, so that when
costs are accumulated they saturate to <max value>.
One of the compelling reasons for wanting to have saturation support
is because in various places, arbitrary values are used to represent
a 'high' cost, but when accumulating the cost of some set of operations
or a loop, overflow is not taken into account, which may lead to unexpected
results. By defining the operations to saturate, we can express the cost
of something 'very expensive' as InstructionCost::getMax().
Reviewed By: kparzysz, dmgreen
Differential Revision: https://reviews.llvm.org/D105108
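Saturating accumulation of this kind can be sketched as follows. `MiniCost` is
a hypothetical miniature and not llvm::InstructionCost; the choice of a signed
64-bit value and the saturation at the minimum as well as the maximum are
assumptions made for the sketch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical miniature cost type; not llvm::InstructionCost itself.
struct MiniCost {
  int64_t Value;

  static MiniCost getMax() {
    return MiniCost{std::numeric_limits<int64_t>::max()};
  }
  static MiniCost getMin() {
    return MiniCost{std::numeric_limits<int64_t>::min()};
  }
};

// Saturating addition: clamp to the representable range instead of
// overflowing. The bounds are checked before adding, so no signed overflow
// (undefined behaviour in C++) ever occurs.
inline MiniCost operator+(MiniCost A, MiniCost B) {
  const int64_t Max = std::numeric_limits<int64_t>::max();
  const int64_t Min = std::numeric_limits<int64_t>::min();
  if (B.Value > 0 && A.Value > Max - B.Value)
    return MiniCost::getMax();
  if (B.Value < 0 && A.Value < Min - B.Value)
    return MiniCost::getMin();
  return MiniCost{A.Value + B.Value};
}
```

With this definition, adding anything non-negative to `MiniCost::getMax()`
stays at the maximum, so a 'very expensive' marker survives accumulation over
a loop or a sequence of operations.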
Rules:
1. SCEVUnknown is a pointer if and only if the LLVM IR value is a
pointer.
2. SCEVPtrToInt is never a pointer.
3. If any other SCEV expression has no pointer operands, the result is
an integer.
4. If a SCEVAddExpr has exactly one pointer operand, the result is a
pointer.
5. If a SCEVAddRecExpr's first operand is a pointer, and it has no other
pointer operands, the result is a pointer.
6. If every operand of a SCEVMinMaxExpr is a pointer, the result is a
pointer.
7. Otherwise, the SCEV expression is invalid.
I'm not sure how useful rule 6 is in practice. If we exclude it, we can
guarantee that ScalarEvolution::getPointerBase always returns a
SCEVUnknown, which might be a helpful property. Anyway, I'll leave that
for a followup.
This is basically mop-up at this point; all the changes with significant
functional effects have landed. Some of the remaining changes could be
split off, but I don't see much point.
Differential Revision: https://reviews.llvm.org/D105510
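The seven rules listed above can be sketched as a small recursive classifier
over a toy expression tree. Everything here is hypothetical: `Expr`, `Kind`,
`Type`, and `classify` are illustrative names and not ScalarEvolution's API.
The sketch also assumes that an invalid operand makes the enclosing expression
invalid, which the rules do not state explicitly.

```cpp
#include <utility>
#include <vector>

// Toy expression kinds standing in for the SCEV classes named in the rules.
enum class Kind { Unknown, PtrToInt, Add, AddRec, MinMax, Other };
enum class Type { Integer, Pointer, Invalid };

struct Expr {
  Kind K;
  bool ValueIsPointer; // Only meaningful for Kind::Unknown.
  std::vector<Expr> Ops;

  explicit Expr(Kind K, bool ValueIsPointer = false,
                std::vector<Expr> Ops = {})
      : K(K), ValueIsPointer(ValueIsPointer), Ops(std::move(Ops)) {}
};

inline Type classify(const Expr &E) {
  // Rule 1: an unknown is a pointer iff the underlying IR value is one.
  if (E.K == Kind::Unknown)
    return E.ValueIsPointer ? Type::Pointer : Type::Integer;
  // Rule 2: ptrtoint is never a pointer.
  if (E.K == Kind::PtrToInt)
    return Type::Integer;

  unsigned NumPtrOps = 0;
  for (const Expr &Op : E.Ops) {
    Type T = classify(Op);
    if (T == Type::Invalid)
      return Type::Invalid; // Assumption: invalidity propagates upward.
    if (T == Type::Pointer)
      ++NumPtrOps;
  }

  // Rule 3: no pointer operands means an integer result.
  if (NumPtrOps == 0)
    return Type::Integer;
  // Rule 4: an add with exactly one pointer operand is a pointer.
  if (E.K == Kind::Add && NumPtrOps == 1)
    return Type::Pointer;
  // Rule 5: an addrec whose first (and only pointer) operand is a pointer.
  if (E.K == Kind::AddRec && NumPtrOps == 1 &&
      classify(E.Ops.front()) == Type::Pointer)
    return Type::Pointer;
  // Rule 6: a min/max whose operands are all pointers.
  if (E.K == Kind::MinMax && NumPtrOps == E.Ops.size())
    return Type::Pointer;
  // Rule 7: anything else is invalid.
  return Type::Invalid;
}
```

For instance, an add of two pointer operands falls through to rule 7 and is
rejected, while an add of a pointer and an integer is a pointer under rule 4.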
C++23 will make these conversions ambiguous - so fix them to make the
codebase forward-compatible with C++23 (& a follow-up change I've made
will make this ambiguous/invalid even in <C++23 so we don't regress
this & it generally improves the code anyway)
SelectionDAG's equivalents in ISD::InputArg/OutputArg track the
original argument index. Mips relies on this, and it is currently
reinventing its own parallel CallLowering infrastructure which tracks
these indexes on the side. Add this to help move towards deleting the
custom mips handling.
This adds a new llvm::thread class with the same interface as std::thread
except there is an extra constructor that allows us to set the new thread's
stack size. On Darwin even the default size is boosted to 8MB to match the main
thread.
It also switches all users of the older C-style `llvm_execute_on_thread` API
family over to `llvm::thread` followed by either a `detach` or `join` call and
removes the old API.
Moved definition of DefaultStackSize into the .cpp file to hopefully
fix the build on some (GCC-6?) machines.
These currently always require a type parameter. The bitcode reader
already upgrades old bitcode without the type parameter to use the
pointee type.
In cases where the caller does not have byval but the callee does, we
need to follow CallBase::paramHasAttr() and also look at the callee for
the byval type so that CallBase::isByValArgument() and
CallBase::getParamByValType() are in sync. Do the same for preallocated.
While we're here add a corresponding version for inalloca since we'll
need it soon.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104663
The build system was linking the PluginsTests unittest against libLLVM.so
and LLVMAsmParser which was causing the test to fail with this error:
LLVM ERROR: inconsistency in registered CommandLine options
We need to add llvm libraries to LLVM_LINK_COMPONENTS so that
they are dropped from the linker arguments when linking with
LLVM_LINK_LLVM_DYLIB=ON
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D105523
This change yields an additional 2% size reduction on an internal search
binary, and an additional 0.5% size reduction on fuchsia.
Differential Revision: https://reviews.llvm.org/D104751
Address mistakenly comparing the pointer values of two C-style strings
rather than comparing their contents in the unit tests for makeVisitor,
added in 6d6f35eb7b
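The difference between the two comparisons can be shown with a short
standalone sketch. The helper names `samePointer` and `sameContents` are
hypothetical and only serve to contrast the two behaviours.

```cpp
#include <cstring>

// Compares addresses: true only if both arguments point at the same storage.
inline bool samePointer(const char *A, const char *B) { return A == B; }

// Compares contents: true if both NUL-terminated strings hold the same
// characters, wherever they are stored.
inline bool sameContents(const char *A, const char *B) {
  return std::strcmp(A, B) == 0;
}
```

Two distinct character arrays holding "abc" compare unequal with
`samePointer` but equal with `sameContents`, which is the behaviour a test on
string values wants. Comparing two identical string literals by pointer may
appear to work when the compiler merges them, which can hide the mistake.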
This patch adds intrinsic definitions and SDNodes for predicated
load/store/gather/scatter, based on the work done in D57504.
Reviewed By: simoll, craig.topper
Differential Revision: https://reviews.llvm.org/D99355
Adds support for both synchronous and asynchronous calls to wrapper functions
using SPS (Simple Packed Serialization). Also adds support for wrapping
functions on the JIT side in SPS-based wrappers that can be called from the
executor.
These new methods simplify calls between the JIT and Executor, and will be used
in upcoming ORC runtime patches to enable communication between ORC and the
runtime.
This enables proper lowering of non-byte sized loads. We still aren't
faithfully preserving memory types everywhere, so the legality checks
still only consider the size.
This will currently accept the old number of bytes syntax, and convert
it to a scalar. This should be removed in the near future (I think I
converted all of the tests already, but likely missed a few).
Not sure what the exact syntax and policy should be. We can continue
printing the number of bytes for non-generic instructions to avoid
test churn and only allow non-scalar types for generic instructions.
This will currently print the LLT in parentheses, but accept parsing
the existing integers and implicitly converting to scalar. The
parentheses are a bit ugly, but the parser logic seems unable to deal
without either parentheses or some keyword to indicate the start of a
type.
Dynamically loaded plugins for the new pass manager are initialised by
calling llvmGetPassPluginInfo. This is defined as a weak symbol so that
it is continually redefined by each plugin that is loaded. When loading
a plugin from a shared library, the intention is that
llvmGetPassPluginInfo will be resolved to the definition in the most
recent plugin. However, using a global search for this resolution can
fail in situations where multiple plugins are loaded.
Currently:
* If a plugin does not define llvmGetPassPluginInfo, then it will be
silently resolved to the previous plugin's definition.
* If loading the same plugin twice with another in between, e.g. plugin
A/plugin B/plugin A, then the second load of plugin A will resolve to
llvmGetPassPluginInfo in plugin B.
* The previous case can also occur when a dynamic library defines both
NPM and legacy plugins. The legacy plugins are loaded first, so
with `-fplugin=A -fpass-plugin=B -fpass-plugin=A`: A will be loaded as
a legacy plugin and define llvmGetPassPluginInfo; B will be loaded
and redefine it; and finally when A is loaded as an NPM plugin it will
be resolved to the definition from B.
Instead of searching globally, restrict the symbol lookup to the library
that is currently being loaded.
Differential Revision: https://reviews.llvm.org/D104916
Symbol tables can have symbols with no size in mach-o files that were failing to get combined into a single entry. This resulted in many duplicate entries for the same address and made gsym files larger.
Differential Revision: https://reviews.llvm.org/D105068
Relands patch reverted by 61242c0add
The original patch mistakenly included unrelated tests.
Adds a utility to combine multiple Callables into a single Callable.
This is useful to make constructing a visitor for `std::visit`-like
functions more natural; functions like this will be added in future
patches.
Intended to supersede https://reviews.llvm.org/D99560 by
perfectly-forwarding the combined Callables.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D100670
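Combining several callables into one can be sketched with the common C++17
"overloaded" idiom. This is not the implementation from the patch: the names
`Overloaded` and `combineCallables` are hypothetical, the sketch assumes
C++17, and it only handles class-type callables such as lambdas (function
pointers cannot be used as base classes).

```cpp
#include <type_traits>
#include <utility>

// Inherit from every callable and expose all of their call operators, so
// ordinary overload resolution picks the matching one.
template <typename... Callables>
struct Overloaded : Callables... {
  using Callables::operator()...;
};

// Hypothetical factory: perfectly forwards each callable into the combined
// object, so lvalues are copied and rvalues are moved.
template <typename... Callables>
Overloaded<std::decay_t<Callables>...> combineCallables(Callables &&...Cs) {
  return {std::forward<Callables>(Cs)...};
}
```

A visitor built this way can be written inline at the call site, for example
`combineCallables([](int) { ... }, [](const char *) { ... })`, and then passed
to a `std::visit`-like function.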
On AIX the alignment implementation has the storage aligned to the
preferred alignment instead of the alignment of a type. Macro guard
these tests for AIX and have them pass when the "reference alignment" is
less than or equal to the alignment observed. In other words, the
alignment applied is at least as strict as the required alignment.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D104786
This patch relands https://reviews.llvm.org/D104454, but fixes some failing
builds on macOS, which apparently has a different definition for size_t
that caused an 'ambiguous operator overload' error for the implicit
conversion of TypeSize to a scalar value.
This reverts commit b732e6c9a8.
To reflect that the size may be scalable, a TypeSize is returned
instead of an unsigned. In places where the result is used,
it currently relies on an implicit cast of TypeSize -> uint64_t,
which asserts that the type is not scalable.
This patch is NFC for fixed-width vectors.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D104454
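The implicit conversion that asserts on scalable sizes can be sketched with a
small standalone class. `MiniTypeSize` is a hypothetical miniature and not
llvm::TypeSize; the member names are chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature of a possibly-scalable size; not llvm::TypeSize.
class MiniTypeSize {
  uint64_t MinValue; // Exact size if fixed, minimum size if scalable.
  bool Scalable;

  MiniTypeSize(uint64_t MinValue, bool Scalable)
      : MinValue(MinValue), Scalable(Scalable) {}

public:
  static MiniTypeSize fixed(uint64_t V) { return MiniTypeSize(V, false); }
  static MiniTypeSize scalable(uint64_t V) { return MiniTypeSize(V, true); }

  bool isScalable() const { return Scalable; }
  uint64_t getKnownMinValue() const { return MinValue; }

  // Implicit conversion kept for existing callers that expect a plain
  // integer. It is only meaningful for fixed sizes, so it asserts otherwise.
  operator uint64_t() const {
    assert(!Scalable && "implicit conversion of a scalable size");
    return MinValue;
  }
};
```

Existing code such as `uint64_t Bits = MiniTypeSize::fixed(128);` keeps
working unchanged, while a scalable value reaching the same code trips the
assertion in builds with assertions enabled rather than silently being
treated as a fixed size.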