Revert "Delete llvm::is_trivially_copyable and CMake variable HAVE_STD_IS_TRIVIALLY_COPYABLE"
This reverts commit 4d4bd40b57.
This reverts commit 557b00e0af.
This is yet another attempt at providing support for epilogue
vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
and reviews D30247 and D88819.
Similar to D88819, this patch achieve epilogue vectorization by
executing a single vplan twice: once on the main loop and a second
time on the epilogue loop (using a different VF). However it's able
to handle more loops, and generates more optimal control flow for
cases where the trip count is too small to execute any code in vector
form.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D89566
In this patch I have added support for a new loop hint called
vectorize.scalable.enable that says whether we should enable scalable
vectorization or not. If a user wants to instruct the compiler to
vectorize a loop with scalable vectors they can now do this as
follows:
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2
...
!2 = !{!2, !3, !4}
!3 = !{!"llvm.loop.vectorize.width", i32 8}
!4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
Setting the hint to false simply reverts the behaviour back to the
default, using fixed width vectors.
Differential Revision: https://reviews.llvm.org/D88962
This is yet another attempt at providing support for epilogue
vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
and reviews D30247 and D88819.
Similar to D88819, this patch achieve epilogue vectorization by
executing a single vplan twice: once on the main loop and a second
time on the epilogue loop (using a different VF). However it's able
to handle more loops, and generates more optimal control flow for
cases where the trip count is too small to execute any code in vector
form.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D89566
This does the same as `--mcpu=help` but was only
documented in the user guide.
* Added a test for both options.
* Corrected the single dash in `-mcpu=help` text.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D92305
llvm-symbolizer used to use the DIA SDK for symbolization on
Windows; this patch switches to using native symbolization, which was
implemented recently.
Users can still make the symbolizer use DIA by adding the `-dia` flag
in the LLVM_SYMBOLIZER_OPTS environment variable.
Differential Revision: https://reviews.llvm.org/D91814
- Document that the kernel descriptor defined is for code object V3.
Document that it also applies to earlier code object formats for CP.
- Document the deprecated bits in kernel descriptor.
Differential Revision: https://reviews.llvm.org/D91458
This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.
This change enables disassembling the text sections to build various address maps that are potentially used by the virtual unwinder. A switch `--show-disassembly` is being added to print the disassembly code.
Like the llvm-objdump tool, this change leverages existing LLVM components to parse and disassemble ELF binary files. So far X86 is supported.
Test Plan:
ninja check-llvm
Reviewed By: wmi, wenlei
Differential Revision: https://reviews.llvm.org/D89712
This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.
As a starter, this change sets up an entry point by introducing PerfReader to load profiled binaries and perf traces(including perf events and perf samples). For the event, here it parses the mmap2 events from perf script to build the loader snaps, which is used to retrieve the image load address in the subsequent perf tracing parsing.
As described in llvm-profgen.rst, the tool being built aims to support multiple input perf data (preprocessed by perf script) as well as multiple input binary images. It should also support dynamic reload/unload shared objects by leveraging the loader snaps being built by this change
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D89707
This is similar to the existing alloca and program address spaces (D37052)
and should be used when creating/accessing global variables.
We need this in our CHERI fork of LLVM to place all globals in address space 200.
This ensures that values are accessed using CHERI load/store instructions
instead of the normal MIPS/RISC-V ones.
The problem this is trying to fix is that most of the time the type of
globals is created using a simple PointerType::getUnqual() (or ::get() with
the default address-space value of 0). This does not work for us and we get
assertion/compilation/instruction selection failures whenever a new call
is added that uses the default value of zero.
In our fork we have removed the default parameter value of zero for most
address space arguments and use DL.getProgramAddressSpace() or
DL.getGlobalsAddressSpace() whenever possible. If this change is accepted,
I will upstream follow-up patches to use DL.getGlobalsAddressSpace() instead
of relying on the default value of 0 for PointerType::get(), etc.
This patch and the follow-up changes will not have any functional changes
for existing backends with the default globals address space of zero.
A follow-up commit will change the default globals address space for
AMDGPU to 1.
Reviewed By: dylanmckay
Differential Revision: https://reviews.llvm.org/D70947
This patch implements out of line atomics for LSE deployment
mechanism. Details how it works can be found in llvm/docs/Atomics.rst
Options -moutline-atomics and -mno-outline-atomics to enable and disable it
were added to clang driver. This is clang and llvm part of out-of-line atomics
interface, library part is already supported by libgcc. Compiler-rt
support is provided in separate patch.
Differential Revision: https://reviews.llvm.org/D91157
The `dso_local_equivalent` constant is a wrapper for functions that represents a
value which is functionally equivalent to the global passed to this. That is, if
this accepts a function, calling this constant should have the same effects as
calling the function directly. This could be a direct reference to the function,
the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the
resolved function as a call target.
When lowered, the returned address must have a constant offset at link time from
some other symbol defined within the same binary. The address of this value is
also insignificant. The name is leveraged from `dso_local` where use of a function
or variable is resolved to a symbol in the same linkage unit.
In this patch:
- Addition of `dso_local_equivalent` and handling it
- Update Constant::needsRelocation() to strip constant inbound GEPs and take
advantage of `dso_local_equivalent` for relative references
This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959)
which makes vtables readonly. This works by replacing the dynamic relocations for
function pointers in them with static relocations that represent the offset between
the vtable and virtual functions. If a function is externally defined,
`dso_local_equivalent` can be used as a generic wrapper for the function to still
allow for this static offset calculation to be done.
See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details.
Differential Revision: https://reviews.llvm.org/D77248
This patch introduces a new VPDef class, which can be used to
manage VPValues defined by recipes/VPInstructions.
The idea here is to mirror VPUser for values defined by a recipe. A
VPDef can produce either zero (e.g. a store recipe), one (most recipes)
or multiple (VPInterleaveRecipe) result VPValues.
To traverse the def-use chain from a VPDef to its users, one has to
traverse the users of all values defined by a VPDef.
VPValues now contain a pointer to their corresponding VPDef, if one
exists. To traverse the def-use chain upwards from a VPValue, we first
need to check if the VPValue is defined by a VPDef. If it does not have
a VPDef, this means we have a VPValue that is not directly defined
iniside the plan and we are done.
If we have a VPDef, it is defined inside the region by a recipe, which
is a VPUser, and the upwards def-use chain traversal continues by
traversing all its operands.
Note that we need to add an additional field to to VPVAlue to link them
to their defs. The space increase is going to be offset by being able to
remove the SubclassID field in future patches.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D90558
- In certain cases, a generic pointer could be assumed as a pointer to
the global memory space or other spaces. With a dedicated target hook
to query that address space from a given value, infer-address-space
pass could infer and propagate that to all its users.
Differential Revision: https://reviews.llvm.org/D91121
Describe in the BackEnd Developer's Guide. Instrument a few backends.
Remove an old unused timing facility. Add a null backend for timing
the parser.
Differential Revision: https://reviews.llvm.org/D91388
Clarify the semantics of GEP inbounds, in particular with regard
to what it means for wrapping. This cleans up some confusion on
when it is legal to apply nuw/nsw flags to various parts of the
GEP calculation.
Differential Revision: https://reviews.llvm.org/D90708
Use exact component name in add_ocaml_library.
Make expand_topologically compatible with new architecture.
Fix quoting in is_llvm_target_library.
Fix LLVMipo component name.
Write release note.
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.
It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.
The intended uses cases for this new metadata is annotating
'interesting' instructions and the remarks should provide additional
insight into transformations applied to a program.
To motivate this, consider these specific questions we would like to get answered:
* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?
Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
Reviewed By: thegameg, jdoerfert
Differential Revision: https://reviews.llvm.org/D91188
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
Adding new paragraphs under "Introducing New Components" section to
check the different levels of support we have, to help introduction of
smaller set of changes without overwhelming new collaborators and
potentially losing the contribution.
Differential Revision: D91013
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR
This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.
This is a fairly simple change in itself, but leads to a number of other
required alterations.
- The hardware loop pass, if UsePhi is set, now generates loops of the
form:
%start = llvm.start.loop.iterations(%N)
loop:
%p = phi [%start], [%dec]
%dec = llvm.loop.decrement.reg(%p, 1)
%c = icmp ne %dec, 0
br %c, loop, exit
- For this a new llvm.start.loop.iterations intrinsic was added, identical
to llvm.set.loop.iterations but produces a value as seen above, gluing
the loop together more through def-use chains.
- This new instrinsic conceptually produces the same output as input,
which is taught to SCEV so that the checks in MVETailPredication are not
affected.
- Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
been left mostly as before. We should now more reliably be able to tell
that the t2DoLoopStart is correct without having to prove it, but
t2WhileLoopStart and tail-predicated loops will remain the same.
- And all the tests have been updated. There are a lot of them!
This patch on it's own might cause more trouble that it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.
Differential Revision: https://reviews.llvm.org/D89881
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.
Differential Revision: https://reviews.llvm.org/D88540
As discussed in the mailing list [1-4], we need a separation of support
tiers when requiring support from the whole community versus a
sub-community. Essentially, if a sub-community is active enough and
takes maintenance into their own internal costs without affecting other
parts of the community's maintenance costs, then code that is not
immediately relevant to all parts (ie. not released, actively tested,
etc) can still find its way into the LLVM main repository without major
pain points.
The main benefit is to reduce the maintenance cost that those
sub-communities have outside of LLVM (for example, in duplicating common
code, applying the same patches on top of multiple user repositories or
downstream projects).
This document outlines the components and responsibilities of the
sub-communities with regards to maintenance costs and how they affect
the rest of the community.
It also adds an addendum on removal policies, which expand the existing
"new target removal" policy into something more generic, to encompass
any piece of code, scripts or documents in the repository.
[1] http://lists.llvm.org/pipermail/llvm-dev/2020-October/146249.html
[2] http://lists.llvm.org/pipermail/llvm-dev/2020-November/146335.html
[3] http://lists.llvm.org/pipermail/llvm-dev/2020-October/146138.html
[4] http://lists.llvm.org/pipermail/llvm-dev/2020-November/146298.html
The `llvm.coro.suspend.async` intrinsic takes a function pointer as its
argument that describes how-to restore the current continuation's
context from the context argument of the continuation function. Before
we assumed that the current context can be restored by loading from the
context arguments first pointer field (`first_arg->caller_context`).
This allows for defining suspension points that reuse the current
context for example.
Also:
llvm.coro.id.async lowering: Add llvm.coro.preprare.async intrinsic
Blocks inlining until after the async coroutine was split.
Also, change the async function pointer's context size position
struct async_function_pointer {
uint32_t relative_function_pointer_to_async_impl;
uint32_t context_size;
}
And make the position of the `async context` argument configurable. The
position is specified by the `llvm.coro.id.async` intrinsic.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D90783
Update the Programmer's Reference document.
Add a test. Update a couple of tests with an improved error message.
Differential Revision: https://reviews.llvm.org/D90635
This patch adds the llvm.loop.mustprogress loop metadata. This is to be
added to loops where the frontend language requires that the loop makes
observable interactions with the environment. This is the loop-level
equivalent to the function attribute `mustprogress` defined in D86233.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D88464
This patch adds the `async` lowering of coroutines.
This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.
This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.
rdar://70097093
Reapply with fix for memory sanitizer failure and sphinx failure.
Differential Revision: https://reviews.llvm.org/D90612
This patch adds the `async` lowering of coroutines.
This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.
This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D90612
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.
Differential Revision: https://reviews.llvm.org/D90419
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
Only the aliases 'xzr' and 'sp' exist for the physical register x31.
The reason for wanting to remove the alias 'x31' is because it allows users
to write invalid asm that is not accepted by the GNU assembler.
Is there any objection to removing this alias? Or do we want to keep
this for compatibility with existing code that uses w31/x31?
Differential Revision: https://reviews.llvm.org/D90153
This patch mainly made the following changes:
1. Support AVX-VNNI instructions;
2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpbusds/vpdpbusds instructions only use vex-encoding when user explicity add {vex} prefix.
Differential Revision: https://reviews.llvm.org/D89105
Make all of the "AMDGPU Machine Code GFX*" columns in the Memory Model
table a consistent width of 32-characters.
Best viewed with something like --word-diff
Differential Revision: https://reviews.llvm.org/D89977
Mostly NFC, but some changes are "bug fixes" rather than just e.g.
formatting changes or typo corrections.
- Fix typo "competing" -> "completing".
- Document why waintcnt is added to stores and not loads for
sequentially consistent ordering.
- Lowercase some mentions of `buffer_gl{0,1}_inv`.
- Make mentions of `*cnt(0)` consistently include the `(0)` count.
- Remove some mentions of instructions for incorrect address spaces. For
example, remove mention of `flat_load` from
`load atomic acquire workgroup global`.
- Re-flow some text to get all the target columns to fit in a
32-character wide column. Makes a future NFC patch to make these columns
both 32-character wide more straightforward.
Modified cherry-pick of patch by Tony Tye
Reviewed By: t-tye
Differential Revision: https://reviews.llvm.org/D89596
The neutral value is -0.0, not 0.0. This doesn't matter for "fast"
reductions due to nsz, but does matter for reassoc-only and seq
reductions.
Change tests to mostly use -0.0 where the neutral value was intended,
and add some additional test coverage in some places. Also update
LangRef to use the right value.
- AMDGPUUsage.rst: Correct AMD GPU DWARF address space table address
sizes which are in bits and not bytes.
- clang/.../Options.td: Improve description of AMD GPU options.
- Re-generate ClangComamndLineReference.rst from clang/.../Options.td .
Differential Revision: https://reviews.llvm.org/D90364
Add a few cross-references among TableGen documents.
Differential Revision: https://reviews.llvm.org/D90186
Add cross-references between TableGen documents.
If `null_pointer_is_valid` is present, `dereferenceable` does not imply
`nonnull`, make it clear.
Came up in D17993.
Reviewed By: aqjune
Differential Revision: https://reviews.llvm.org/D89417
--section-details/-t is a GNU readelf option that produce
an output that is an alternative to --sections.
Differential revision: https://reviews.llvm.org/D89304
Allow overriding the default set of flags used to enable UBSan when
building llvm.
This can be used to test new checks or opt out of certain checks.
Differential Revision: https://reviews.llvm.org/D89439
This change introduces a GC parseable lowering for element atomic
memcpy/memmove intrinsics. This way runtime can provide an
implementation which can take a safepoint during copy operation.
See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev
for the background and details:
https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ
Differential Revision: https://reviews.llvm.org/D88861
It's currently ambiguous in IR whether the source language explicitly
did not want a stack a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.
It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.
While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u
Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining. Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.
Fixes pr/47479.
Reviewed By: void
Differential Revision: https://reviews.llvm.org/D87956
In preparation for potential future concurrency, a FunctionPass
shouldn't modify anything at the module level that other FunctionPasses
can also modify.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89890
Recently [1], there was an upgrade to the version of buildbot being
deployed. The new setup will still work with old buildslaves but I
thought it might be a good idea to update the documentation to reflect,
that you now can use a newer buildbot version to when setting up your
worker (formely known as slave).
The upgrade from buildbot 0.8.5 to 2.8.5 went a long with a transition
to a new "worker" terminology [2] which is also reflected by this
change.
[1]: http://lists.llvm.org/pipermail/llvm-dev/2020-October/145629.html
[2]: http://docs.buildbot.net/0.9.12/manual/worker-transition.html
Reviewed By: gkistanova
Differential Revision: https://reviews.llvm.org/D89230
LLVM IR currently assumes some form of forward progress. This form is
not explicitly defined anywhere, and is the cause of miscompilations
in most languages that are not C++11 or later. This implicit forward progress
guarantee can not be opted out of on a function level nor on a loop
level. Languages such as C (C11 and later), C++ (pre-C++11), and Rust
have different forward progress requirements and this needs to be
evident in the IR.
Specifically, C11 and onwards (6.8.5, Paragraph 6) states that "An
iteration statement whose controlling expression is not a constant
expression, that performs no input/output operations, does not access
volatile objects, and performs no synchronization or atomic operations
in its body, controlling expression, or (in the case of for statement)
its expression-3, may be assumed by the implementation to terminate."
C++11 and onwards does not have this assumption, and instead assumes
that every thread must make progress as defined in [intro.progress] when
it comes to scheduling.
This was initially brought up in [0] as a bug, a solution was presented
in [1] which is the current workaround, and the predecessor to this
change was [2].
After defining a notion of forward progress for IR, there are two
options to address this:
1) Set the default to assuming Forward Progress and provide an opt-out for functions and an opt-in for loops.
2) Set the default to not assuming Forward Progress and provide an opt-in for functions, and an opt-in for loops.
Option 2) has been selected because only C++11 and onwards have a
forward progress requirement and it makes sense for them to opt-into it
via the defined `mustprogress` function attribute. The `mustprogress`
function attribute indicates that the function is required to make
forward progress as defined. This is sharply in contrast to the status
quo where this is implicitly assumed. In addition, `willreturn` implies `mustprogress`.
The background for why this definition was chosen is in [3] and for why
the option was chosen is in [4] and the corresponding thread(s). The implementation is in D85393, the
clang patch is in D86841, the LoopDeletion patch is in D86844, the
Inliner patches are in D87180 and D87262, and there will be more
incoming.
[0] https://bugs.llvm.org/show_bug.cgi?id=965#c25
[1] https://lists.llvm.org/pipermail/llvm-dev/2017-October/118558.html
[2] https://reviews.llvm.org/D65718
[3] https://lists.llvm.org/pipermail/llvm-dev/2020-September/144919.html
[4] https://lists.llvm.org/pipermail/llvm-dev/2020-September/145023.html
Reviewed By: jdoerfert, efriedma, nikic
Differential Revision: https://reviews.llvm.org/D86233
The langref description for llvm.test.set.loop.iterations.* were
missing the i1 return type.
Differential Revision: https://reviews.llvm.org/D89564
Patch by: Janek van Oirschot
This patch updates the Kaleidoscope and BuildingAJIT tutorial series (chapter
1-4) to OrcV2. Chapter 5 of the BuildingAJIT series is removed -- it will be
re-instated once we have in-tree support for out-of-process JITing.
This patch only updates the tutorial code, not the text. Patches welcome for
that, otherwise I will try to update it in a few weeks.