This patch introduces the conversions from math function calls
to MASS library calls. To resolves calls generated with these conversions, one
need to link libxlopt.a library. This patch is tested on PowerPC Linux and AIX.
Differential: https://reviews.llvm.org/D101759
Reviewer: bmahjour
Adds new optimization remarks when vectorization fails.
More specifically, new remarks are added for following 4 cases:
- Backward dependency
- Backward dependency that prevents Store-to-load forwarding
- Forward dependency that prevents Store-to-load forwarding
- Unknown dependency
It is important to note that only one of the sources
of failures (to vectorize) is reported by the remarks.
This source of failure may not be first in program order.
A regression test has been added to test the following cases:
a) Loop can be vectorized: No optimization remark is emitted
b) Loop can not be vectorized: In this case an optimization
remark will be emitted for one source of failure.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D108371
None of the external users actual touch these (they're purely used internally down the recursive call) - its trivial to add another wrapper if anything ever does want to track known elements.
This is to fix build errors "Cannot open include file:
'sanitizer/asan_interface.h'" when building LLVM with MSVC and
LLVM_USE_SANITIZER=Address.
asan_interface.h is not available in MSVC's search path, instead it is
located under %VCToolsInstallDir%/crt/src/sanitizer.
This is an alternate solution to https://reviews.llvm.org/D118159, to
avoid adding all internal crt sources to the header search paths.
Tested with visual studio 2019 v16.9.6 and visual studio 2022 v17.0.5
Reviewed By: aaron.ballman, rnk
Differential Revision: https://reviews.llvm.org/D118624
The new LEGALAVL node annotates that the AVL refers to packs of 64bit.
We use a two-stage lowering approach with LEGALAVL:
First, standard SDNodes are translated into illegal VVP layer nodes.
Regardless of source (VP or standard), all VVP nodes have a mask and AVL
parameter. The AVL parameter refers to the element position (just as in
VP intrinsics).
Second, we legalize the AVL usage in VVP layer nodes. If the element
size is < 64bit, the EVL parameter has to be adjusted to refer to packs
of 64bits. We wrap the legalized AVL in a LEGALAVL node to track this.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D118321
Based on the output of include-what-you-use.
This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden ehader dependencies, something
the LLVM codebase doesn't do that well :-/
I've tried to summarize the biggest change below:
- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h
And the usual count of preprocessed lines:
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after: 6189948
200k lines less to process is no that bad ;-)
Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D118652
Cleanup code in peelLoop API. We already have usage of DT without guarding
against a null DT, so this change constant folds the remaining null DT
checks.
Also make the argument a reference so that it is clear the argument is
a nonnull DT.
Extracted from D118472.
The namespace llvm::swift is causing errors to pop up in the apple/llvm-project build when cherry-picking 4ce1f3d47c into apple/llvm-project
Differential Review: https://reviews.llvm.org/D118716
While prepending lines to the copied source files is functional, it
disturbs the line numbering between the original and the copy. That
makes development more awkward than necessary, as it is the copy that
generally gets compiled first and emits compiler errors.
This uses sed to alter the first two lines, and also emits better
emacs mode setting, getting both C++ mode and read-only mode.
While here, also update and clarify documentation.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D118135
I have updated TargetLowering::isConstTrueVal to also consider
SPLAT_VECTOR nodes with constant integer operands. This allows the
optimisation to also work for targets that support scalable vectors.
Differential Revision: https://reviews.llvm.org/D117210
To make usage easier (compared to the many reachability related AAs),
this patch introduces a helper API, `AA::isPotentiallyReachable`, which
performs all the necessary steps. It also does the "backwards"
reachability (see D106720) as that simplifies the AA a lot (backwards
queries were somewhat different from the other query resolvers), and
ensures we use cached values in every stage.
To test inter-procedural reachability in a reasonable way this patch
includes an extension to `AAPointerInfo::forallInterferingWrites`.
Basically, we can exclude writes if they cannot reach a load "during the
lifetime" of the allocation. That is, we need to go up the call graph to
determine reachability until we can determine the allocation would be
dead in the caller. This leads to new constant propagations (through
memory) in `value-simplify-pointer-info-gpu.ll`.
Note: The new code contains plenty debug output to determine how
reachability queries are resolved.
Parts extracted from D110078.
Differential Revision: https://reviews.llvm.org/D118673
D106720 introduced features that did not work properly as we could add
new queries after a fixpoint was reached and which could not be answered
by the information gathered up to the fixpoint alone.
As an alternative to D110078, which forced eager computation where we
want to continue to be lazy, this patch fixes the problem.
QueryAAs are AAs that allow lazy queries during their lifetime. They are
never fixed if they have no outstanding dependences and always run as
part of the updates in an iteration. To determine if we are done, all
query AAs are asked if they received new queries, if not, we only need
to consider updated AAs, as before. If new queries are present we go for
another iteration.
Differential Revision: https://reviews.llvm.org/D118669
This patch implement instruction reachability for AAFunctionReachability
attribute. It is used to tell if a certain instruction can reach a function
transitively.
NOTE: I created a new commit based of D106720 and set the author back to
Kuter. Other metadata, etc. is wrong. I also addressed the
remaining review comments and fixed the unit test.
Differential Revision: https://reviews.llvm.org/D106720
We missed out on AANoRecurse in the module pass because we had no call
graph. With AAFunctionReachability we can simply ask if the function may
reach itself.
Differential Revision: https://reviews.llvm.org/D110099
genericValueTraversal can look through arguments and allow value
simplification across function boundaries. In fact, the latter already
happened unchecked. With this change we allow the user of
genericValueTraversal to opt-out of interprocedural traversal if
required. We explicitly look through arguments now which helps to do
various things, incl. the propagation of constants into OpenMP parallel
regions (on the host).
This fixes a conceptual problem with our AAIsDead usage which conflated
call site liveness with call site return value liveness. Without the
fix tests would obviously miscompile as we make genericValueTraversal
more powerful (in a follow up). The effects on the tests are mixed but
mostly marginal. The most prominent one is the lack of `noreturn` for
functions. The reason is that we make entire blocks live at the same
time (for time reasons). Now that we actually look at the block
liveness, which we need to do, the return instructions are live and
will survive. As an example, `noreturn_async.ll` has been modified
to retain the `noreturn` even with block granularity. We could address
this easily but there is little need in practice.
We have two attributes that can answer readnone queries. While there is
a dependence between them, it seems best to not force the users to know
what AA to ask. The helpers also allow to check for readonly nicely.
Test changes show where we now deduce readnone but haven't before,
mostly because we only asked AAMemoryBehavior and not AAMemoryLocation.
AANoAlias has not been ported to the new API yet.
Since D104432 we can look through memory by analyzing all writes that
might interfere with a load. This patch provides some logic to exclude
writes that cannot interfere with a location, due to CFG reasoning.
We make sure to avoid multi-thread write-read situations properly while
we ignore writes that cannot reach a load or writes that will be
overwritten before the load is reached.
Differential Revision: https://reviews.llvm.org/D106397
No-sync is a property that we need in more places as complex
transformations emerge. To simplify the query we provide an
`AA::isNoSyncInst` helper now and expose two existing helpers through
the `AANoSync` class.
Summary:
Add code object v5 support (deafult is still v4)
Generate metadata for implicit kernel args for the new ABI
Set the metadata version to be 1.2
Reviewers:
t-tye, b-sumner, arsenm, and bcahoon
Fixes:
SWDEV-307188, SWDEV-307189
Differential Revision:
https://reviews.llvm.org/D118272
D116542 adds EmbedBufferInModule which introduces a layer violation
(https://llvm.org/docs/CodingStandards.html#library-layering).
See 2d5f857a1e for detail.
EmbedBufferInModule does not use BitcodeWriter functionality and should be moved
LLVMTransformsUtils. While here, change the function case to the prevailing
convention.
It seems that EmbedBufferInModule just follows the steps of
EmbedBitcodeInModule. EmbedBitcodeInModule calls WriteBitcodeToFile but has IR
update operations which ideally should be refactored to another library.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D118666
The logic for getProfileKind for RawInstrProfReader and
InstrProfReaderIndex is similar. To avoid duplication, move the logic
from the header to InstrProfReader.cpp and introduce a static method
which implements the common code.
Differential Revision: https://reviews.llvm.org/D118656
The definition of the MemInfoBlock is shared between the memprof
compiler-rt runtime and llvm/lib/ProfileData/. This change removes the
memprof_meminfoblock header and moves the struct to the shared include
file. To enable this sharing, the Print method is moved to the
memprof_allocator (the only place it is used) and the remaining uses are
updated to refer to the MemInfoBlock defined in the MemProfData.inc
file.
Also a couple of other minor changes which improve usability of the
types in MemProfData.inc.
* Update the PACKED macro to handle commas.
* Add constructors and equality operators.
* Don't initialize the buildid field.
Differential Revision: https://reviews.llvm.org/D116780
When upgrading a loop of load/store to a memcpy, the existing pass does not keep existing aliasing information. This patch allows existing aliasing information to be kept.
Reviewed By: jeroen.dobbelaere
Differential Revision: https://reviews.llvm.org/D108221
This patch adds support for a flag `-fembed-offload-binary` to embed a
file as an ELF section in the output by placing it in a global variable.
This can be used to bundle offloading files with the host binary so it
can be accessed by the linker. The section is named using the
`-fembed-offload-section` option.
Depends on D116541
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D116542
This implements codegen for Armv8.8/9.3 Memory Operations extension (MOPS).
Any memcpy/memset/memmov intrinsics will always be emitted as a series
of three consecutive instructions P, M and E which perform the
operation. The SelectionDAG implementation is split into a separate
patch.
AArch64LegalizerInfo will now consider the following generic opcodes
if +mops is available, instead of legalising by expanding them to
libcalls: G_BZERO, G_MEMCPY_INLINE, G_MEMCPY, G_MEMMOVE, G_MEMSET
The s8 value of memset is legalised to s64 to match the pseudos.
AArch64O0PreLegalizerCombinerInfo will still be able to combine
G_MEMCPY_INLINE even if +mops is present, as it is unclear whether it is
better to generate fixed length copies or MOPS instructions for the
inline code of small or zero-sized memory operations, so we choose to be
conservative for now.
AArch64InstructionSelector will select the above as new pseudo
instructions: AArch64::MOPSMemory{Copy/Move/Set/SetTagging} These are
each expanded to a series of three instructions (e.g. SETP/SETM/SETE)
which must be emitted together during code emission to avoid scheduler
reordering.
This is part 3/4 of a series of patches split from
https://reviews.llvm.org/D117405 to facilitate reviewing.
Patch by Tomas Matheson and Son Tuan Vu
Differential Revision: https://reviews.llvm.org/D117763
We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements.
This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.
Based on the output of include-what you-use.
Most notably, llvm/Remarks/Remark.h is no longer automatically included by
llvm/Remarks/RemarkParser.h, so client code may need to include explicitly.
clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Remarks/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 770253
after: 759347
Related discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D118506
Both IDFCalculatorBase and its accompanying DominatorTreeBase only supports pointer nodes. The template argument is the block type itself and any uses of GraphTraits is therefore done via a pointer to the node type.
However, the ChildrenGetterTy type of IDFCalculatorBase has a use on just the node type instead of a pointer to the node type. Various parts of the monorepo has worked around this issue by providing specializations of GraphTraits for the node type directly, or not been affected by using specializations instead of the generic case. These are unnecessary however and instead the generic code should be fixed instead.
An example from within Tree is eg. A use of IDFCalculatorBase in InstrRefBasedImpl.cpp. It basically instantiates a IDFCalculatorBase<MachineBasicBlock, false> but due to the bug above then goes on to specialize GraphTraits<MachineBasicBlock> although GraphTraits<MachineBasicBlock*> exists (and should be used instead).
Similar dead code exists in clang which defines redundant GraphTraits to work around this bug.
This patch fixes both the original issue and removes the dead code that was used to work around the issue.
Differential Revision: https://reviews.llvm.org/D118386
Due to the SmallVector hierarchy, N==0 cannot be leveraged by functions defined
in base classes. This patch special cases N==0 for SmallVector to save code size
and be slightly more efficient.
In a Release build of x86 only clang, .text is -3.34KiB smaller. In lld .text is
7.17KiB smaller.
Reviewed By: lichray
Differential Revision: https://reviews.llvm.org/D117976
As raised here: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153881.html
Now that VS2022 is on general release, LLVM is expected to build on VS2017, VS2019 and VS2022, which is proving hazardous to maintain due to changes in behaviour including preprocessor and constexpr changes. Plus of the few developers that work with VS, many have already moved to VS2019/22.
This patch proposes to raise the minimum supported version to VS2019 (16.x) - I've made the hard limit 16.0 or later, with the soft limit VS2019 16.7 - older versions of VS2019 are "allowed" (at your own risk) via the LLVM_FORCE_USE_OLD_TOOLCHAIN cmake flag.
Differential Revision: https://reviews.llvm.org/D114639
With opaque pointers, it's possible to directly call a function with
a different signature, without an intermediate bitcast. However,
lot's of code using getCalledFunction() reasonably assumes that the
signatures match (which is always true without opaque pointers).
Add an explicit check to that effect.
The test case is from D105313, where I ran into the problem, but on
further investigation this also affects lots of other code, we just
have little coverage with mismatching signatures. The change from
D105313 is still desirable for other reasons, but this patch
addresses the root problem when it comes to opaque pointers.
Differential Revision: https://reviews.llvm.org/D105733
Few header file don't have file tag in them. This patch can be help
in viewing doxygen documentation. When we hover on the included header
file, small description will display.
Reviewed By: aaron.ballman, xgupta
Differential Revision: https://reviews.llvm.org/D116004
Extend scalar evolution to handle >= and <= if a loop is known to be finite and the induction variable guards the condition. Specifically, with these assumptions lhs <= rhs is equivalent to lhs < rhs + 1 and lhs >= rhs to lhs > rhs -1.
In the case of lhs <= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmax, which would have resulted in an infinite loop.
In the case of lhs >= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmin, which would again have resulted in an infinite loop.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D118090
Currently, the clang.arc.attachedcall bundle takes an optional function
argument. Depending on whether the argument is present, calls with this
bundle have the following semantics:
- on x86, with the argument present, the call is lowered to:
call _target
mov rax, rdi
call _objc_retainAutoreleasedReturnValue
- on AArch64, without the argument, the call is lowered to:
bl _target
mov x29, x29
and the objc runtime call is expected to be emitted separately.
That's because, on x86, the objc runtime checks for both the mov and
the call on x86, and treats the combination as the ARC autorelease elision
marker.
But on AArch64, it only checks for the dedicated NOP marker, as that's
historically been sufficiently unique. Thanks to that, the runtime call
wasn't required to be adjacent to the NOP marker, so it wasn't emitted
as part of the bundle sequence.
This patch unifies both architectures: on AArch64, we now emit all
3 instructions for the bundle. This guarantees that the runtime call
is adjacent to the marker in the sequence, and that's information the
runtime can use to further optimize this.
This helps simplify some of the handling, in particular
BundledRetainClaimRVs, which no longer needs to know whether the bundle
is sufficient or not: it now always should be.
Note that this does not include an AutoUpgrade for the nullary bundles,
as they are only produced in ObjCContract as part of the obj/asm emission
pipeline, and are not expected to be in bitcode.
Differential Revision: https://reviews.llvm.org/D118214
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.
This also adds an additional command line flag debug option to disable outlining intrinsics.
Recommit of: 8de76bd569
Adds extra checking of intrinsic function calls names to avoid taking the address of intrinsic calls when extracting function calls.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D109450
Close#52781: for LTO, the inline asm diagnostic uses `<inline asm>` as the file
name (lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp) and it is unclear which
module has the issue.
With this patch, we will see the module name (say `asm.o`) before `<inline asm>` with ThinLTO.
```
% clang -flto=thin -c asm.c && myld.lld asm.o -e f
ld.lld: error: asm.o <inline asm>:1:2: invalid instruction mnemonic 'invalid'
invalid
^~~~~~~
```
For regular LTO, unfortunately the original module name is lost and we only get
ld-temp.o.
Reviewed By: #lld-macho, ychen, Jez Ng
Differential Revision: https://reviews.llvm.org/D118434
This file was added in https://reviews.llvm.org/D74415. There was no
justification as to why it was added, and after about a year of being
in-tree, it's still unused, so this removes it.
Add support for Swift reflection metadata to dsymutil.
This patch adds support for copying Swift reflection metadata (__swift5_.* sections) from .o files to into the symbol-rich binary in the output .dSYM. The functionality is automatically enabled only if a .o file has reflection metadata sections and the binary doesn't. When copying dsymutil moves the section from the __TEXT segment to the __DWARF segment.
rdar://76973336
Differential Revision: https://reviews.llvm.org/D115007
`createIRLevelProfileFlagVar()` seems to be only used in
`PGOInstrumentation.cpp` so we move it to that file. Then it can also
take advantage of directly using options rather than passing them as
arguments.
Reviewed By: kyulee, phosek
Differential Revision: https://reviews.llvm.org/D118097
This moves the dependency of several files on include/llvm/ADT/STLExtras.h to
the much shorter llvm/ADT/STLArrayExtras.h
Differential Revision: https://reviews.llvm.org/D118342
Branch protection in M-class is supported by
- Armv8.1-M.Main
- Armv8-M.Main
- Armv7-M
Attempting to enable this for other architectures, either by
command-line (e.g -mbranch-protection=bti) or by target attribute
in source code (e.g. __attribute__((target("branch-protection=..."))) )
will generate a warning.
In both cases function attributes related to branch protection will not
be emitted. Regardless of the warning, module level attributes related to
branch protection will be emitted when it is enabled via the command-line.
The following people also contributed to this patch:
- Victor Campos
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D115501
With opaque pointers, we can no longer derive this from the pointer
type, so we need to explicitly provide the element type the atomic
operation should work with.
Differential Revision: https://reviews.llvm.org/D118359
Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because
* We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
* We use a single byte per function rather than 8 bytes per block
The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code.
When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).
[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D116180
This change refactors the ProfileKind enum into a bitset enum to
represent the different attributes a profile can have. This change
simplifies the logic in the instrprof writer when multiple profiles are
merged together. In the future we plan on introducing a new memory
profile section which will extend the enum by one additional entry.
Without this change when accounting for memory profiles will have to be
maintained separately and will make the logic more complex.
Differential Revision: https://reviews.llvm.org/D115393
Since it's introduction, the qrdmlah has been represented as a qrdmulh
and a sadd_sat. This doesn't produce the same result for all input
values though. This patch fixes that by introducing a qrdmlah (and
qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah
instructions. The old test cases will now produce a qrdmulh and sqadd,
as expected.
Fixes#53120 and #50905 and #51761.
Differential Revision: https://reviews.llvm.org/D117592
Use the `llvm-profdata show` command to verify debug info for profile correlation using the `--debug-info` option.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D118181
These flags aren't used and we shouldn't add more flags for new
ratified extensions. So clear out the unused flags to avoid any
confusion.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D118294
Add support for Swift reflection metadata to dsymutil.
This patch adds support for copying Swift reflection metadata (__swift5_.* sections) from .o files to into the symbol-rich binary in the output .dSYM. The functionality is automatically enabled only if a .o file has reflection metadata sections and the binary doesn't. When copying dsymutil moves the section from the __TEXT segment to the __DWARF segment.
rdar://76973336
Differential Revision: https://reviews.llvm.org/D115007
Improve the error messages when using `llvm-profdata` to correlate profiles with debug info.
Reviewed By: kyulee, phosek
Differential Revision: https://reviews.llvm.org/D118166
DIStringType is used to encode the debug info of a character object
in Fortran. A Fortran deferred-length character object is typically
implemented as a pair of the following two pieces of info: An address
of the raw storage of the characters, and the length of the object.
The stringLocationExp field contains the DIExpression to get to the
raw storage.
This patch also enables the emission of DW_AT_data_location attribute
in a DW_TAG_string_type debug info entry based on stringLocationExp
in DIStringType.
A test is also added to ensure that the bitcode reader is backward
compatible with the old DIStringType format.
Differential Revision: https://reviews.llvm.org/D117586
This reverts commit ef82063207.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
Based on the output of iwyu. A full rebuild of llvm-project doesn't exhibit any
significant false dependencies.
The impact on preprocessed output is larger than expected, given the small
amount of changes
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/TextAPI/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 635319
After: 643716
Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
Code generating the special substitutions in std is a switch statement
with each case block containing the same conststruction template. It
is more efficient to commonize that after the switch, having
determined which SubKind to create. Also, let's sort the cases.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D118131
A few header removal, some forward declarations. As usual, this can
break your build due to false dependencies, the most notable change are:
- "llvm/BinaryFormat/AMDGPUMetadataVerifier.h" no longer includes "llvm/BinaryFormat/MsgPackDocument.h"
The impact on generated preprocessed lines for LLVMBinaryFormat is
pretty nice:
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/BinaryFormat/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before this patch: 705281
after this patch: 751456
Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
Currently not (xor_one_use) pattern is always selected to S_XNOR irrelative od the node divergence.
This relies on further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 ( V_XOR_B32)
on those targets which have no V_XNOR.
Current change enables the patterns which explicitly select the not (xor_one_use) to appropriate form.
We assume that xor (not) is already turned into the not (xor) by the combiner.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D116270
LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.
This patch replaces the use of `std::vector` with `llvm::BitVector` in LLVM's YAML traits and replaces the call to `Vec.insert(Vec.begin(), N, false)` on empty `Vec` with `Vec.resize(N)`, which has the same semantics but avoids using `insert` and iterators, which `llvm::BitVector` doesn't possess.
Reviewed By: dexonsmith, dblaikie
Differential Revision: https://reviews.llvm.org/D118111
A few more forward-declarations, a few less headers. the impact on number of
preprocessed lines for LLVMSupport is negligible (-3K lines) but it's always
good to remove dependencies.
Related discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
This extract a common isNotVisibleOnUnwind() helper into
AliasAnalysis, which handles allocas, byval arguments and noalias
calls. After D116998 this could also handle sret arguments. We
have similar logic in DSE and MemCpyOpt, which will be switched
to use this helper as well.
The noalias call case is a bit different from the others, because
it also requires that the object is not captured. The caller is
responsible for doing the appropriate check.
Differential Revision: https://reviews.llvm.org/D117000
Summary:
This patch modifies code generation in OpenMPIRBuilder to pass arguments
to the parallel region outlined function in an aggregate (struct),
besides the global_tid and bound_tid arguments. It depends on the
updated CodeExtractor (see D96854) for support. It mirrors functionality
of Clang codegen (see D102107).
Differential Revision: https://reviews.llvm.org/D110114
Summary:
Enable CodeExtractor to construct output functions that partially
aggregate inputs/outputs in their argument list. A use case is the
OMPIRBuilder to create outlined functions for parallel regions that
aggregate in a struct the payload variables for the region while passing
as scalars thread and bound identifiers.
Differential Revision: https://reviews.llvm.org/D96854
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.
Co-Authored-by: Hsiangkai Wang <Hsiangkai@gmail.com>
Reviewers: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D117647
We use the same similarity scheme we used for branch instructions for phi nodes, and allow them to be outlined. There is not a lot of special handling needed for these phi nodes when outlining, as they simply act as outputs. The code extractor does not currently allow for non entry blocks within the extracted region to have predecessors, so there are not conflicts to handle with respect to predecessors no longer contained in the function.
Recommit of 515eec3553
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D106997