Add builtins required to implement vcmla and rotated variants from
the ACLE
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D92929
This method previously always recursively checked both the left-hand
side and right-hand side of binary operations for splatted (broadcast)
vector values to determine if the parent DAG node is a splat.
Like several other SelectionDAG methods, limit the recursion depth to
MaxRecursionDepth (6). This prevents stack overflow.
See also https://issuetracker.google.com/173785481
Patch by Nicolas Capens. Thanks!
Differential Revision: https://reviews.llvm.org/D92421
Add vfmk intrinsic instructions, a few pseudo instructions to expand
vfmk intrinsic using VM512 correctly, and regression tests.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D92758
* Steps are scaled by `vscale`, a runtime value.
* Changes to circumvent the cost-model for now (temporary)
so that the cost-model can be implemented separately.
This can vectorize the following loop [1]:
void loop(int N, double *a, double *b) {
#pragma clang loop vectorize_width(4, scalable)
for (int i = 0; i < N; i++) {
a[i] = b[i] + 1.0;
}
}
[1] This source-level example is based on the pragma proposed
separately in D89031. This patch only implements the LLVM part.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D91077
Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode.
Also updated SelectionDAGDumper::print_details so that details of the gather
load (is signed, is scaled & extension type) are printed.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D91084
This commit adds two new intrinsics.
- llvm.experimental.vector.insert: used to insert a vector into another
vector starting at a given index.
- llvm.experimental.vector.extract: used to extract a subvector from a
larger vector starting from a given index.
The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D91362
Add an overload of `RedirectingFileSystem::create` that builds a
redirecting filesystem off of a simple vector of string pairs. This is
intended to be used to support `clang::arcmt::FileRemapper` and
`clang::PreprocessorOptions::RemappedFiles`.
Differential Revision: https://reviews.llvm.org/D91317
Uniformly return uniquely-owned filesystems from VFS creation APIs. The
one exception is `getRealFileSystem`, which has a single instance and
needs to be shared.
This is almost NFC, except that it fixes a memory leak in
`vfs::collectVFSFromYAML()`.
Depends on https://reviews.llvm.org/D92888
Differential Revision: https://reviews.llvm.org/D92890
Allow a `std::unique_ptr` to be moved into the an `IntrusiveRefCntPtr`,
and remove a couple of now-unnecessary `release()` calls.
Differential Revision: https://reviews.llvm.org/D92888
MD5 is used.
Currently during sample profile loading, NameTable has to be loaded entirely
up front before any name string is retrieved. That is because NameTable is
stored using ULEB128 encoding and cannot be directly accessed like an array.
However, if MD5 is used to represent name in the NameTable, it has fixed
length. If MD5 names are stored in uint64_t type instead of ULEB128, NameTable
can be accessed like an array then in many cases only part of the NameTable
has to be read. This is helpful for reducing compile time especially when
small source file is compiled. We find that after this change, the elapsed
time to build a large application distributively is reduced by 5% and the
accumulative cpu time used for building is also reduced by 5%. The size of
the profile is slightly reduced with this change by ~0.2%, and that also
indicates encoding MD5 in ULEB128 doesn't save the storage space.
Differential Revision: https://reviews.llvm.org/D92621
FEntryInserter prepends FENTRY_CALL to the first basic block. In case
there are other instructions, PostRA Machine Instruction Scheduler can
move FENTRY_CALL call around. This actually occurs on SystemZ (see the
testcase). This is bad for the following reasons:
* FENTRY_CALL clobbers registers.
* Linux Kernel depends on whatever FENTRY_CALL expands to to be the very
first instruction in the function.
Fix by adding isCall attribute to FENTRY_CALL, which prevents reordering
by making it a scheduling boundary for PostRA Machine Instruction
Scheduler.
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D91218
Add a `hash_value` for Optional so that other data structures with
optional fields can easily hash them. I have a use for this in an
upcoming patch.
Differential Revision: https://reviews.llvm.org/D92676
This patch adds new PM support for the pass and the pass can be now used
during middle-end transforms. The old pass is remamed to
ScalarizeMaskedMemIntrinLegacyPass.
Reviewed-By: skatkov, aeubanks
Differential Revision: https://reviews.llvm.org/D92743
This adds a set of metaprogramming helpers to help define records and
serialize them out. This is motivated by API Notes which use the
bitcode format to serialize out a binary representation of the data.
These helpers are generically useful though and could help simplify some
of the existing bitcode consumers as well.
This is extracted from the code contributed by Apple at
https://github.com/llvm/llvm-project-staging/tree/staging/swift/apinotes.
Differential Revision: https://reviews.llvm.org/D88582
Update all reference from the specification to the new OpenACC 3.1
document.
Reviewed By: SouraVX
Differential Revision: https://reviews.llvm.org/D92120
clang was complaing about this code:
llvm/include/llvm/IR/PassManager.h:715:17: error: ISO C++20 considers use of overloaded operator '!=' to be ambiguous despite there being a unique best viable function with non-reversed arguments [-Werror,-Wambiguous-reversed-operator]
if (IMapI != IsResultInvalidated.end())
~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~
llvm/include/llvm/ADT/DenseMap.h:1253:8: note: candidate function with non-reversed arguments
bool operator!=(const ConstIterator &RHS) const {
^
llvm/include/llvm/ADT/DenseMap.h:1246:8: note: ambiguous candidate function with reversed arguments
bool operator==(const ConstIterator &RHS) const {
^
The warning is triggered when the DenseMapIterator (lhs) is not const and so
the == operator is applied to different types on lhs/rhs.
Using a template allows the function to be available for both const/non-const
iterator types and gets rid of the warning
Currently, -ftime-report + new pass manager emits one line of report for each
pass run. This potentially causes huge output text especially with regular LTO
or large single file (Obeserved in private tests and was reported in D51276).
The behaviour of -ftime-report + legacy pass manager is
emitting one line of report for each pass object which has relatively reasonable
text output size. This patch adds a flag `-ftime-report=` to control time report
aggregation for new pass manager.
The flag is for new pass manager only. Using it with legacy pass manager gives
an error. It is a driver and cc1 flag. `per-pass` is the new default so
`-ftime-report` is aliased to `-ftime-report=per-pass`. Before this patch,
functionality-wise `-ftime-report` is aliased to `-ftime-report=per-pass-run`.
* Adds an boolean variable TimePassesHandler::PerRun to control per-pass vs per-pass-run.
* Adds a new clang CodeGen flag CodeGenOptions::TimePassesPerRun to work with the existing CodeGenOptions::TimePasses.
* Remove FrontendOptions::ShowTimers, its uses are replaced by the existing CodeGenOptions::TimePasses.
* Remove FrontendTimesIsEnabled (It was introduced in D45619 which was largely reverted.)
Differential Revision: https://reviews.llvm.org/D92436
This is a rework of D85812, which didn't land.
When callee coroutine function is inlined into caller coroutine function before coro-split pass, llvm will emits "coroutine should have exactly one defining @llvm.coro.begin". It seems that coro-early pass can not handle this quiet well.
So we believe that unsplited coroutine function should not be inlined.
This patch fix such issue by not inlining function if it has attribute "coroutine.presplit" (it means the function has not been splited) to fix this issue
test plan: check-llvm, check-clang
In D85812, there was suggestions on moving the macros to Attributes.td to avoid circular header dependency issue.
I believe it's not worth doing just to be able to use one constant string in one place.
Today, there are already 3 possible attribute values for "coroutine.presplit": c6543cc6b8/llvm/lib/Transforms/Coroutines/CoroInternal.h (L40-L42)
If we move them into Attributes.td, we would be adding 3 new attributes to EnumAttr, just to support this, which I think is an overkill.
Instead, I think the best way to do this is to add an API in Function class that checks whether this function is a coroutine, by checking the attribute by name directly.
Differential Revision: https://reviews.llvm.org/D92706
Add couple of clause validity tests for the update directive and check for
the restriction where at least self, host or device clause must appear on the directive.
Reviewed By: sameeranjoshi
Differential Revision: https://reviews.llvm.org/D92447
Use lambdas with captures to replace the redundant infrastructure for marshalling of two boolean flags that control the same keypath.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D92773
Text section prefix is created in CodeGenPrepare, it's file format independent implementation, text section name is written into object file in TargetLoweringObjectFile, it's file format dependent implementation, port code of adding text section prefix to text section name from ELF to COFF.
Different with ELF that use '.' as concatenation character, COFF use '$' as concatenation character. That is, concatenation character is variable, so split concatenation character from text section prefix.
Text section prefix is existing feature of ELF, it can help to reduce icache and itlb misses, it's also make possible aggregate other compilers e.g. v8 created same prefix sections. Furthermore, the recent feature Machine Function Splitter (basic block level text prefix section) is based on text section prefix.
Reviewed By: pengfei, rnk
Differential Revision: https://reviews.llvm.org/D92073
It was removed back in 2013 (f63dfbb) by Matt Arsenault but then
reverted since DragonEgg used it, but that project is no longer
maintained.
Reviewed By: ldionne, dexonsmith
Differential Revision: https://reviews.llvm.org/D92571
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
Instruction darn was introduced in ISA 3.0. It means 'Deliver A Random
Number'. The immediate number L means:
- L=0, the number is 32-bit (higher 32-bits are all-zero)
- L=1, the number is 'conditioned' (processed by hardware to reduce bias)
- L=2, the number is not conditioned, directly from noise source
GCC implements them in three separate intrinsics: __builtin_darn,
__builtin_darn_32 and __builtin_darn_raw. This patch implements the
same intrinsics. And this change also addresses Bugzilla PR39800.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92465
This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.
This change supports context-sensitive profile data generation into llvm-profgen. With simultaneous sampling for LBR and call stack, we can identify leaf of LBR sample with calling context from stack sample . During the process of deriving fall through path from LBR entries, we unwind LBR by replaying all the calls and returns (including implicit calls/returns due to inlining) backwards on top of the sampled call stack. Then the state of call stack as we unwind through LBR always represents the calling context of current fall through path.
we have two types of virtual unwinding 1) LBR unwinding and 2) linear range unwinding.
Specifically, for each LBR entry which can be classified into call, return, regular branch, LBR unwinding will replay the operation by pushing, popping or switching leaf frame towards the call stack and since the initial call stack is most recently sampled, the replay should be in anti-execution order, i.e. for the regular case, pop the call stack when LBR is call, push frame on call stack when LBR is return. After each LBR processed, it also needs to align with the next LBR by going through instructions from previous LBR's target to current LBR's source, which we named linear unwinding. As instruction from linear range can come from different function by inlining, linear unwinding will do the range splitting and record counters through the range with same inline context.
With each fall through path from LBR unwinding, we aggregate each sample into counters by the calling context and eventually generate full context sensitive profile (without relying on inlining) to driver compiler's PGO/FDO.
A breakdown of noteworthy changes:
- Added `HybridSample` class as the abstraction perf sample including LBR stack and call stack
* Extended `PerfReader` to implement auto-detect whether input perf script output contains CS profile, then do the parsing. Multiple `HybridSample` are extracted
* Speed up by aggregating `HybridSample` into `AggregatedSamples`
* Added VirtualUnwinder that consumes aggregated `HybridSample` and implements unwinding of calls, returns, and linear path that contains implicit call/return from inlining. Ranges and branches counters are aggregated by the calling context. Here calling context is string type, each context is a pair of function name and callsite location info, the whole context is like `main:1 @ foo:2 @ bar`.
* Added PorfileGenerater that accumulates counters by ranges unfolding or branch target mapping, then generates context-sensitive function profile including function body, inferring callee's head sample, callsite target samples, eventually records into ProfileMap.
* Leveraged LLVM build-in(`SampleProfWriter`) writer to support different serialization format with no stop
- `getCanonicalFnName` for callee name and name from ELF section
- Added regression test for both unwinding and profile generation
Test Plan:
ninja & ninja check-llvm
Reviewed By: hoy, wenlei, wmi
Differential Revision: https://reviews.llvm.org/D89723
Introduce a function that creates a statically-scheduled workshare loop
out of a canonical loop created earlier by the OpenMPIRBuilder. This
basically amounts to injecting runtime calls to the preheader and the
after block and updating the trip count. Static scheduling kind is
currently hardcoded and needs to be extracted from the runtime library
into common TableGen definitions.
Differential Revision: https://reviews.llvm.org/D92476
Added a trivial destructor in release mode and in debug mode a destructor that asserts RefCount is indeed zero.
This ensure people aren't manually (maybe accidentally) destroying these objects like in this contrived example.
```lang=c++
{
std::unique_ptr<SomethingRefCounted> Object;
holdIntrusiveOwnership(Object.get());
// Object Destructor called here will assert.
}
```
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D92480
Allow sections to be placed into COMDAT groups, in addtion to functions and data
segments.
Also make section symbols unnamed, which allows sections with identical names
(section names are independent of their section symbols, but previously we
gave the symbols the same name as their sections, which results in collisions
when sections are identically-named).
Differential Revision: https://reviews.llvm.org/D92691
This rewrites the logic to get rid of "ELFSymbolRef" API where possible.
This allowed to handle possible errors better, improve warnings reported and add new ones.
Also 'reportWarning' was replaced with 'reportUniqueWarning'
Differential revision: https://reviews.llvm.org/D92545
Since the length of the llvm::SmallVector shufflemask is related to the
minimum number of elements in a scalable vector, it is fine to just get
the Min field of the ElementCount. This is already done for the similar
function changesLength, tests have been added for both.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D92472
Summary: This patch added support for the intrinsics llvm.ppc.dcbfps and llvm.ppc.dcbstps.
dcbfps and dcbstps are actually extended mnemonics of dcbf.
dcbfps RA,RB ---> dcbf RA,RB,4
dcbstps RA,RB ---> dcbf RA,RB,6
Reviewed By: amyk, steven.zhang
Differential Revision: https://reviews.llvm.org/D91323
Notes:
* llvm::createAsmStreamer: it has been moved to TargetRegistry.h
* (anon ns)::WasmObjectWriter::updateCustomSectionRelocations: remnant of D46335
* COFFAsmParser::ParseSEHRegisterNumber: remnant of D66625
* llvm::CodeViewContext::isValidCVFileNumber: accidentally added by r279847
The file was added in 2007 but the functions have never been implemented.
Having the file can only cause confusion to existing C API (llvm-c/lto.h) users.
Notes about a few declarations:
* LiveVariables::RegisterDefIsDead: deleted by r47927
* createForwardControlFlowIntegrityPass, createJumpInstrTablesPass: deleted by r230780
* RegScavenger::setLiveInsUsed: deleted by r292543
* ScheduleDAGInstrs::{toggleKillFlag,startBlockForKills}: deleted by r304055
* Localizer::shouldLocalize: remnant of D75207
* DwarfDebug::addSectionLabel: deleted by r373273