Add a bit more logic into the 'FalseLaneZeros' tracking to enable
horizontal reductions and also make the VADDV variants
validForTailPredication.
Differential Revision: https://reviews.llvm.org/D76708
Aligned_alloc is a standard lib function and has been in glibc since
2.16 and in the C11 standard. It has semantics similar to malloc/calloc
for several analyses/transforms. This patch introduces aligned_alloc
in target library info and memory builtins. Subsequent ones will
make other passes aware and fix https://bugs.llvm.org/show_bug.cgi?id=44062
This change will also be useful to LLVM generators that need to allocate
buffers of vector elements larger than 16 bytes (for eg. 256-bit ones),
element boundary alignment for which is not typically provided by glibc malloc.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76970
This patch changes VPWidenRecipe to only store a single original IR
instruction. This is the first required step towards modeling it's
operands as VPValues and also towards breaking it up into a
VPInstruction.
Discussed as part of D74695.
Reviewers: Ayal, gilr, rengolin
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D76988
Extend the FileCollector's API with addDirectory which adds a directory
and its contents to the VFS mapping.
Differential revision: https://reviews.llvm.org/D76671
Without this some instances of copy construction would use the
converting constructor & lead to the destination function_ref referring
to the source function_ref instead of the underlying functor.
Discovered in feedback from 857bf5da35
Thanks to Johannes Doerfert, Arthur O'Dwyer, and Richard Smith for the
discussion and debugging.
The current implementation of the JSONWriter does not support writing
out directory entries. Earlier today I added a unit test to illustrate
the problem. When an entry is added to the YAMLVFSWriter and the path is
a directory, it will incorrectly emit the directory as a file, and any
files inside that directory will not be found by the VFS.
It's possible to partially work around the issue by only adding "leaf
nodes" (files) to the YAMLVFSWriter. However, this doesn't work for
representing empty directories. This is a problem for clients of the VFS
that want to iterate over a directory. The directory not being there is
not the same as the directory being empty.
This is not just a hypothetical problem. The FileCollector for example
does not differentiate between file and directory paths. I temporarily
worked around the issue for LLDB by ignoring directories, but I suspect
this will prove problematic sooner rather than later.
This patch fixes the issue by extending the JSONWriter to support
writing out directory entries. We store whether an entry should be
emitted as a file or directory.
Differential revision: https://reviews.llvm.org/D76670
This flag can be used to mark a symbol as existing only for the purpose of
enabling materialization. Such a symbol can be looked up to trigger
materialization with the lookup returning only once materialization is
complete. Symbols with this flag will never resolve however (to avoid
permanently polluting the symbol table), and should only be looked up using
the SymbolLookupFlags::WeaklyReferencedSymbol flag. The primary use case for
this flag is initialization symbols.
Add a flag for those instructions which read from the top/bottom
halves of their inputs and produce a vector of results with double
width elements.
Differential Revision: https://reviews.llvm.org/D76762
Summary:
The PC_32 DWARF register is for a 32-bit process address space which we
don't implement in AMDGCN; another way of putting this is that the size
of the PC register is not a function of the wavefront size. If we ever
implement a 32-bit process address space we will need to add two more
DwarfFlavours i.e. we will need to represent the product of (wave32,
wave64) x (64-bit address space, 32-bit address space).
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76732
Summary:
The existing helper function can only create a libcall to functions available in
RTLIB. Add a helper function that can create a libcall to a given function name
using the provided calling convention.
Reviewers: aditya_nandakumar, t.p.northover, rovka, arsenm, dsanders
Reviewed By: arsenm
Subscribers: wdng, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76845
Summary:
This patch introduces command-line support for the Armv8.6-a architecture and assembly support for BFloat16. Details can be found
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
in addition to the GCC patch for the 8..6-a CLI:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02647.html
In detail this patch
- march options for armv8.6-a
- BFloat16 assembly
This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.
Based on work by:
- labrinea
- MarkMurrayARM
- Luke Cheeseman
- Javed Asbar
- Mikhail Maltsev
- Luke Geeson
Reviewers: SjoerdMeijer, craig.topper, rjmccall, jfb, LukeGeeson
Reviewed By: SjoerdMeijer
Subscribers: stuij, kristof.beyls, hiraditya, dexonsmith, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D76062
Summary:
Rename `succ_const_iterator` to `const_succ_iterator` and
`succ_const_range` to `const_succ_range` for consistency with the
predecessor iterators, and the corresponding iterators in
MachineBasicBlock.
Reviewers: nicholas, dblaikie, nlewycky
Subscribers: hiraditya, bmahjour, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75952
Add a target flag for instructions that reduce into one, or more,
scalar reg(s), including variants of:
- VADDV
- VABAV
- VMINV/VMAXV
- VMLADAV
Differential Revision: https://reviews.llvm.org/D76683
For some operations, the type is unimportant and only the number of
bits matters. For example I don't want to treat <4 x s8> as a legal
type, but I also don't want to decompose loads of this into smaller
pieces to get legal register types.
On AMDGPU in SelectionDAG, we legalize a number of operations (most
notably load and store) by coercing all types to vectors of i32. For
GlobalISel, I'm trying very hard to avoid doing this for every type,
but I don't think this strategy can be completely avoided. I'm trying
to avoid bitcasts for any legitimately legal type we can operate on,
since the intervening bitcasts have proven to be a hassle.
For loads, I think I can get away without ever casting the result
type, and handling any arbitrary bitwidth during selection (I will
eventually want new tablegen support to help with this, rather than
having to add every possible type as legal). The unmerge required to
do anything with the value should expand to the expected shifts. This
is trickier for stores, since it would now require handling a wide
array of truncates during selection which I don't want.
Future potentially interesting case are for vector indexing, where
sub-dword type should be indexed in s32 pieces.
This patch integrates operand bundle llvm.assumes [0] with the
Attributor. Most IRAttributes will now look at uses of the associated
value and if there are llvm.assume operand bundle uses with the right
tag we will check if they are in the must-be-executed-context (around
the context instruction). Droppable users, which is currently only
llvm::assume, are handled special in some places now as well.
[0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D74888
The initial implementation just delegates to APInt's implementation of
XOR for single element ranges and conservatively returns the full set
otherwise.
Reviewers: nikic, spatel, lebedev.ri
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D76453
Add a flag, 'RetainsPreviousHalfElement', for operations that operate
on top/bottom halves of their input and only write to half of their
destination, leaving the other half to retain its previous value.
Differential Revision: https://reviews.llvm.org/D76608
The algorithm supports both assigning a fixed offset to a field prior to
layout and allowing fields to have sizes that aren't multiples of their
required alignments. This means that the well-known algorithm of sorting
by decreasing alignment isn't always good enough. Still, we start with
that, and only if that leaves padding around do we fall back on a greedy
padding-minimizing algorithm.
There is no known efficient algorithm for producing a guaranteed-minimal
layout in all cases. In fact, allowing arbitrary fixed-offset fields means
there's a straightforward reduction from bin-packing, making this NP-hard.
But as usual with such problems, we can still efficiently produce adequate
solutions to the cases that matter most to us.
I intend to use this in coroutine frame layout, where the retcon lowerings
very badly want to minimize total space usage, and where the switch lowering
can indeed produce a header with interior padding if the promise field is
highly-aligned. But it may be useful in a much wider variety of situations.
Add a unit test for vfs::YAMLVFSWriter.
This patch exposes an issue in the writer: when we call addFileMapping
with a directory, the VFS writer will emit it as a regular file, causing
any of the nested files or directories to not be found.
Since intrinsics can now specify when an argument is required to be
constant, it is now OK to replace arguments with variables if they
aren't. This means intrinsics must now be accurately marked with
immarg.
The only reason we export symbols from these tools is to support
plugins; if we don't have plugins, exporting symbols just bloats the
executable and makes LTO less effective.
See review of D75879 for the discussion that led to this patch.
Differential Revision: https://reviews.llvm.org/D76527
FAILED: unittests/Target/AMDGPU/AMDGPUTests
...
ld.lld: error: undefined symbol: llvm::MCRegisterInfo::getLLVMRegNum(unsigned int, bool) const
>>> referenced by DwarfRegMappings.cpp:60 (/usr/local/google/home/maskray/llvm/llvm/unittests/Target/AMDGPU/DwarfRegMappings.cpp:60)
>>> unittests/Target/AMDGPU/CMakeFiles/AMDGPUTests.dir/DwarfRegMappings.cpp.o:(AMDGPUDwarfRegMappingTests_TestWave64DwarfRegMapping_Test::TestBody())
>>> referenced by DwarfRegMappings.cpp:82 (/usr/local/google/home/maskray/llvm/llvm/unittests/Target/AMDGPU/DwarfRegMappings.cpp:82)
>>> unittests/Target/AMDGPU/CMakeFiles/AMDGPUTests.dir/DwarfRegMappings.cpp.o:(AMDGPUDwarfRegMappingTests_TestWave32DwarfRegMapping_Test::TestBody())
A -DBUILD_SHARED_LIBS=off build is good because AMDGPUCodeGen pulls in MC.
A -DBUILD_SHARED_LIBS=on build requires all direct dependencies (MC) to be listed becuase llvm/cmake/modules/HandleLLVMOptions.cmake uses -Wl,-z,defs
This is NFC-ish. The results should be identical, but perf is hopefully
better with the fast-path for no scaling. Added a unit test for that.
The code is adapted from what used to be the DAGCombiner equivalent
function before D76508 (rG0eeee83d7513).
Implement the DWARF register mapping described in llvm/docs/AMDGPUUsage.rst.
This enables generating appropriate DWARF register numbers for wave64 and
wave32 modes.
We have some long-standing missing shuffle optimizations that could
use this transform via VectorCombine now:
https://bugs.llvm.org/show_bug.cgi?id=35454
(and we still don't get that case in the backend either)
This function is apparently templated because there's existing code
in IR that treats mask values as unsigned and backend code that
treats masks values as signed.
The mask values are not endian-dependent (as shown by the existing
bitcast transform from DAGCombiner).
Differential Revision: https://reviews.llvm.org/D76508
Summary:
Widening G_UNMERGE_VALUES to a type which is larger than the
original source type is the same as widening it to the same
type as the source type: in both cases, G_UNMERGE_VALUES has
to be replaced with bit arithmetic which. Although the arithmetic
itself is independent of whether the source type is smaller
or equal to the widen type, widening the source type to the
widen type should result in less artifacts being emitted,
since this is the type that the user explicitly requested.
Reviewers: arsenm, dsanders, aemerson, aditya_nandakumar
Reviewed By: arsenm, dsanders
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76494
This is the Waymarking algorithm implemented as an independent utility.
The utility is operating on a range of sequential elements.
First we "tag" the elements, by calling `fillWaymarks`.
Then we can "follow" the tags from every element inside the tagged
range, and reach the "head" (the first element), by calling
`followWaymarks`.
Differential Revision: https://reviews.llvm.org/D74415
MSVC insists on using the deleted move constructor instead of the copy
constructor:
http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/41203
C:\ps4-buildslave2\lld-x86_64-win7\llvm-project\llvm\unittests\ADT\CoalescingBitVectorTest.cpp(193):
error C2280: 'llvm::CoalescingBitVector<unsigned
int,16>::CoalescingBitVector(llvm::CoalescingBitVector<unsigned int,16>
&&)': attempting to reference a deleted function
advanceToLowerBound moves an iterator to the first bit set at, or after,
the given index. This can be faster than doing IntervalMap::find.
rdar://60046261
Differential Revision: https://reviews.llvm.org/D76466
Avoid making a heap allocation when constructing a CoalescingBitVector.
This reduces time spent in LiveDebugValues when compiling sqlite3 by
700ms (0.5% of the total User Time).
rdar://60046261
Differential Revision: https://reviews.llvm.org/D76465
Summary:
1. FileLineInfoSpecifier::Default isn't the default for anything.
Rename to RawValue, which accurately reflects its role.
2. Most functions that take a part of a FileLineInfoSpecifier end up
constructing a full one later or plumb two values through. Make them
all just take a complete FileLineInfoSpecifier.
3. Printing basenames only was handled differently from all other
variants, make it parallel to all the other variants.
Reviewers: jhenderson
Subscribers: hiraditya, MaskRay, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76394
Check the path length limit against the length of the UTF-16 version of
the input rather than the UTF-8 equivalent, as the UTF-16 length may be
shorter. Move widenPath from the llvm::sys::path namespace in Path.h to
the llvm::sys::windows namespace in WindowsSupport.h. Only use the
reduced path length limit for create directory. Canonicalize using
sys::path::remove_dots().
Differential Revision: https://reviews.llvm.org/D75372
The latest improvements to VPValue printing make this mapping clear when
printing the operand. Printing the mapping separately is not required
any longer.
Reviewers: rengolin, hsaito, Ayal, gilr
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D76375
When the an underlying value is available, we can use its name for
printing, as discussed in D73078.
Reviewers: rengolin, hsaito, Ayal, gilr
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D76200
MCTargetOptionsCommandFlags.inc and CommandFlags.inc are headers which contain
cl::opt with static storage.
These headers are meant to be incuded by tools to make it easier to parametrize
codegen/mc.
However, these headers are also included in at least two libraries: lldCommon
and handle-llvm. As a result, when creating DYLIB, clang-cpp holds a reference
to the options, and lldCommon holds another reference. Linking the two in a
single executable, as zig does[0], results in a double registration.
This patch explores an other approach: the .inc files are moved to regular
files, and the registration happens on-demand through static declaration of
options in the constructor of a static object.
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1756977#c5
Differential Revision: https://reviews.llvm.org/D75579
I used the implementation for floor instead of round. It also turns
out the OpenCL builtin library wasn't using the round builtin, but
implemented the expanded form.
Add a unit test for the compact DWARF expression printer which will be
used by the llvm-objdump --debug-vars option.
Differential revision: https://reviews.llvm.org/D75250
Summary:
This patch will filter attributes to only preserve those that are usefull.
In the case of NoAlias it is filtered out not because it isn't usefull
but because it is incorrect to preserve it as it is only valdi for the
duration of the function.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: jdoerfert, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75828
Summary:
during inling Create and insert an llvm.assume with attributes to preserve them.
to prevent any changes for now generation of llvm.assume is under a flag disabled by default.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75825
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.
This change is needed for D73753.
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
Summary: This patch adds the basic utilities to deal with dropable uses. dropable uses are uses that we rather drop than prevent transformations, for now they are limited to uses in llvm.assume.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: uenoku, lebedev.ri, mgorny, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73404
Summary: When narrowing a scalar G_EXTRACT where the destination lines up perfectly with a single result of the emitted G_UNMERGE_VALUES a COPY should be emitted instead of unconditionally trying to emit a G_MERGE_VALUES.
Reviewers: arsenm, dsanders
Reviewed By: arsenm
Subscribers: wdng, rovka, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75743
In order for dsymutil to collect .apinotes files (which capture
attributes such as nullability, Swift import names, and availability),
I want to propose adding an apinotes: field to DIModule that gets
translated into a DW_AT_LLVM_apinotes (path) nested inside
DW_TAG_module. This will be primarily used by LLDB to indirectly
extract the Swift names of Clang declarations that were deserialized
from DWARF.
<rdar://problem/59514626>
Differential Revision: https://reviews.llvm.org/D75585
This is part of PR44213 https://bugs.llvm.org/show_bug.cgi?id=44213
When importing (system) Clang modules, LLDB needs to know which SDK
(e.g., MacOSX, iPhoneSimulator, ...) they came from. While the sysroot
attribute contains the absolute path to the SDK, this doesn't work
well when the debugger is run on a different machine than the
compiler, and the SDKs are installed in different directories. It thus
makes sense to just store the name of the SDK instead of the absolute
path, so it can be found relative to LLDB.
rdar://problem/51645582
Differential Revision: https://reviews.llvm.org/D75646
The archive library truncated the size of archive members whose size was
greater than max uint32_t. This patch fixes the issue and adds some unit
tests to verify.
Reviewed by: ruiu, MaskRay, grimar, rupprecht
Differential Revision: https://reviews.llvm.org/D75742
Behavior of IEEEFloat::roundToIntegral is aligned with IEEE-754
operation roundToIntegralExact. In partucular this function now:
- returns opInvalid for signaling NaNs,
- returns opInexact if the result of rounding differs from argument.
Differential Revision: https://reviews.llvm.org/D75246
Summary:
Follow up on D75283.
Also remove the test code that was moved to another test and was to be removed.
Reviewers: davidxl
Subscribers: eraman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75630
If the minimum_instruction_length of a debug line program is 0, no
address advancing via special opcodes, DW_LNS_const_add_pc, and
DW_LNS_advance_pc can occur, since the minimum_instruction_length is
used in a multiplication. This patch adds a warning reporting when this
issue occurs.
Reviewed by: probinson
Differential Revision: https://reviews.llvm.org/D75189
The line_range value of a debug line program header is used in divisions
related to special opcodes and DW_LNS_const_add_pc opcodes. As such, a
value of 0 cannot be used. This change introduces a new warning, if such
a situation is identified, and does not perform the relevant
calculations.
Reviewed by: probinson, aprantl
Differential Revision: https://reviews.llvm.org/D43470
This patch adds a check which reports an unsupported value of the
maximum_operations_per_instruction field in a debug line table header.
This is reported once per line table, at most, and only if the tablet
would otherwise need to use it (i.e. never for tables with version 3 or
less, or for tables which don't use DW_LNS_const_add_pc or special
opcodes). Unsupported values are currently any apart from 1.
Reviewed by: probinson, MaskRay
Differential Revision: https://reviews.llvm.org/D74819
Summary:
Assume bundles need to be usable by Analysis and Transforms/Utils isn't.
so this commit moves utilities to deal with asusme bundles to IR.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75618
Summary: Finding what information is know about a value from a use is generally useful and can be done quickly.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75616
Summary:
These implement the usual IEEE-style floating point comparison
semantics, e.g. +0.0 == -0.0 and all operators except != return false
if either argument is NaN.
Subscribers: arsenm, jvesely, nhaehnle, hiraditya, dexonsmith, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75237
Summary:
We already have overloaded binary arithemetic operators so you can write
A+B etc. This patch lets you write -A instead of neg(A).
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75236
* Delete boilerplate
* Change functions to return `Error`
* Test parsing errors
* Update callers of ARMAttributeParser::parse() to check the `Error` return value.
Since this patch touches nearly everything in the file, I apply
http://llvm.org/docs/Proposals/VariableNames.html and change variable
names to lower case.
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D75015
Currently when printing VPValues we use the object address, which makes
it hard to distinguish VPValues as they usually are large numbers with
varying distance between them.
This patch adds a simple slot tracker, similar to the ModuleSlotTracker
used for IR values. In order to dump a VPValue or anything containing a
VPValue, a slot tracker for the enclosing VPlan needs to be created. The
existing VPlanPrinter can take care of that for the existing code. We
assign consecutive numbers to each VPValue we encounter in a reverse
post order traversal of the VPlan.
Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D73078
This fixes printing long values that might reside in CIE and FDE,
including offsets, lengths, and addresses.
Differential Revision: https://reviews.llvm.org/D73887
Summary:
getInitialLength is a *DWARF*DataExtractor method so I had to "upgrade"
some DataExtractors to be able to make use of it.
Reviewers: ikudrin, jhenderson, probinson
Subscribers: aprantl, hiraditya, llvm-commits, dblaikie
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75535
Summary: This patch adds an analysis pass to collect loop nests and
summarize properties of the nest (e.g the nest depth, whether the nest
is perfect, what's the innermost loop, etc...).
The motivation for this patch was discussed at the latest meeting of the
LLVM loop group (https://ibm.box.com/v/llvm-loop-nest-analysis) where we
discussed
the unimodular loop transformation framework ( “A Loop Transformation
Theory and an Algorithm to Maximize Parallelism”, Michael E. Wolf and
Monica S. Lam, IEEE TPDS, October 1991). The unimodular framework
provides a convenient way to unify legality checking and code generation
for several loop nest transformations (e.g. loop reversal, loop
interchange, loop skewing) and their compositions. Given that the
unimodular framework is applicable to perfect loop nests this is one
property of interest we expose in this analysis. Several other utility
functions are also provided. In the future other properties of interest
can be added in a centralized place.
Authored By: etiotto
Reviewer: Meinersbur, bmahjour, kbarton, Whitney, dmgreen, fhahn,
reames, hfinkel, jdoerfert, ppc-slack
Reviewed By: Meinersbur
Subscribers: bryanpkc, ppc-slack, mgorny, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D68789
Summary: This patch adds an analysis pass to collect loop nests and
summarize properties of the nest (e.g the nest depth, whether the nest
is perfect, what's the innermost loop, etc...).
The motivation for this patch was discussed at the latest meeting of the
LLVM loop group (https://ibm.box.com/v/llvm-loop-nest-analysis) where we
discussed
the unimodular loop transformation framework ( “A Loop Transformation
Theory and an Algorithm to Maximize Parallelism”, Michael E. Wolf and
Monica S. Lam, IEEE TPDS, October 1991). The unimodular framework
provides a convenient way to unify legality checking and code generation
for several loop nest transformations (e.g. loop reversal, loop
interchange, loop skewing) and their compositions. Given that the
unimodular framework is applicable to perfect loop nests this is one
property of interest we expose in this analysis. Several other utility
functions are also provided. In the future other properties of interest
can be added in a centralized place.
Authored By: etiotto
Reviewer: Meinersbur, bmahjour, kbarton, Whitney, dmgreen, fhahn,
reames, hfinkel, jdoerfert, ppc-slack
Reviewed By: Meinersbur
Subscribers: bryanpkc, ppc-slack, mgorny, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D68789
Summary: This patch adds a new way to query operand bundles of an llvm.assume that is much better suited to some users like the Attributor that need to do many queries on the operand bundles of llvm.assume. Some modifications of the IR like replaceAllUsesWith can cause information in the map to be outdated, so this API is more suited to analysis passes and passes that don't make modification that could invalidate the map.
Reviewers: jdoerfert, sstefan1, uenoku
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75020
This patch adds a getPlan accessor to VPBlockBase, which finds the entry
block of the plan containing the block and returns the plan set for this
block.
VPBlockBase contains a VPlan pointer, but it should only be set for
the entry block of a plan. This allows moving blocks without updating
the pointer for each moved block and in the future we might introduce a
parent relationship between plans and blocks, similar to the one in LLVM IR.
Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr
Reviewed By: gilr
Differential Revision: https://reviews.llvm.org/D74445
llvm/Support/Base64, fix its implementation and provide a decent test suite.
Previous implementation code was using + operator instead of | to combine
results, which is a problem when shifting signed values. (0xFF << 16) is
implicitly converted to a (signed) int, and thus results in 0xffff0000,
h is
negative. Combining negative numbers with a + in that context is not what we
want to do.
This is a recommit of 5a1958f267 with UB removved.
This fixes https://github.com/llvm/llvm-project/issues/149.
Differential Revision: https://reviews.llvm.org/D75057
and follow-ups:
a2ca1c2d "build: disable zlib by default on Windows"
2181bf40 "[CMake] Link against ZLIB::ZLIB"
1079c68a "Attempt to fix ZLIB CMake logic on Windows"
This changed the output of llvm-config --system-libs, and more
importantly it broke stand-alone builds. Instead of piling on more fix
attempts, let's revert this to reduce the risk of more breakages.
getReductionVars, getInductionVars and getFirstOrderRecurrences were all
being returned from LoopVectorizationLegality as pointers to lists. This
just changes them to be references, cleaning up the interface slightly.
Differential Revision: https://reviews.llvm.org/D75448
This patch upstreams support for the ARM Armv8.1m cpu Cortex-M55.
In detail adding support for:
- mcpu option in clang
- Arm Target Features in clang
- llvm Arm TargetParser definitions
details of the CPU can be found here:
https://developer.arm.com/ip-products/processors/cortex-m/cortex-m55
Reviewers: chill
Reviewed By: chill
Subscribers: dmgreen, kristof.beyls, hiraditya, cfe-commits,
llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74966
in C++ templates."
This was reverted in 802b22b5c8 due to
missing .bc file and a chromium bot failure.
https://bugs.chromium.org/p/chromium/issues/detail?id=1057559#c1
This revision address both of them.
Summary:
This patch adds support for debuginfo generation for defaulted
parameters in clang and also extends corresponding DebugMetadata/IR to support this feature.
Reviewers: probinson, aprantl, dblaikie
Reviewed By: aprantl, dblaikie
Differential Revision: https://reviews.llvm.org/D73462
Summary:
In this patch I've done a slightly bigger rewrite to also remove the
hardcoded header lengths.
Reviewers: jhenderson, dblaikie, ikudrin
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75119
Move Base64 implementation from clangd/SemanticHighlighting to
llvm/Support/Base64, fix its implementation and provide a decent test suite.
Previous implementation code was using + operator instead of | to combine some
results, which is a problem when shifting signed values. (0xFF << 16) is
implicitly converted to a (signed) int, and thus results in 0xffff0000, which is
negative. Combining negative numbers with a + in that context is not what we
want to do.
This fixes https://github.com/llvm/llvm-project/issues/149.
Differential Revision: https://reviews.llvm.org/D75057
The Bitcode/DITemplateParameter-5.0.ll test is failing:
FAIL: LLVM :: Bitcode/DITemplateParameter-5.0.ll (5894 of 36324)
******************** TEST 'LLVM :: Bitcode/DITemplateParameter-5.0.ll' FAILED ********************
Script:
--
: 'RUN: at line 1'; /usr/local/google/home/thakis/src/llvm-project/out/gn/bin/llvm-dis -o - /usr/local/google/home/thakis/src/llvm-project/llvm/test/Bitcode/DITemplateParameter-5.0.ll.bc | /usr/local/google/home/thakis/src/llvm-project/out/gn/bin/FileCheck /usr/local/google/home/thakis/src/llvm-project/llvm/test/Bitcode/DITemplateParameter-5.0.ll
--
Exit Code: 2
Command Output (stderr):
--
It looks like the Bitcode/DITemplateParameter-5.0.ll.bc file was never checked in.
This reverts commit c2b437d53d.
in C++ templates.
Summary:
This patch adds support for debuginfo generation for defaulted
parameters in clang and also extends corresponding DebugMetadata/IR to support this feature.
Reviewers: probinson, aprantl, dblaikie
Reviewed By: aprantl, dblaikie
Differential Revision: https://reviews.llvm.org/D73462
Lots of headers pass around MemoryBuffer objects, but very few open
them. Let those that do include FileSystem.h.
Saves ~250 includes of Chrono.h & FileSystem.h:
$ diff -u thedeps-before.txt thedeps-after.txt | grep '^[-+] ' | sort | uniq -c | sort -nr
254 - ../llvm/include/llvm/Support/FileSystem.h
253 - ../llvm/include/llvm/Support/Chrono.h
237 - ../llvm/include/llvm/Support/NativeFormatting.h
237 - ../llvm/include/llvm/Support/FormatProviders.h
192 - ../llvm/include/llvm/ADT/StringSwitch.h
190 - ../llvm/include/llvm/Support/FormatVariadicDetails.h
...
This requires duplicating the file_t typedef, which is unfortunate. I
sunk the choice of mapping mode down into the cpp file using variable
template specializations instead of class members in headers.
Try again with an up-to-date version of D69471 (99317124 was a stale
revision).
---
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
DenseMap requires two sentinel values for keys: empty and tombstone
values. To avoid undefined behavior, LLVM aligns the two sentinel
pointers to alignof(T). This requires T to be complete, which is
needlessly restrictive.
Instead, assume that DenseMap pointer keys have a maximum alignment of
4096, and use the same sentinel values for all pointer keys. The new
sentinels are:
empty: static_cast<uintptr_t>(-1) << 12
tombstone: static_cast<uintptr_t>(-2) << 12
These correspond to the addresses of -4096 and -8192. Hopefully, such a
key is never inserted into a DenseMap.
I encountered this while looking at making clang's SourceManager not
require FileManager.h, but it has several maps keyed on classes defined
in FileManager.h. FileManager depends on various LLVM FS headers, which
cumulatively take ~200ms to parse, and are generally not needed.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D75301
Summary: This fixes the crash that led to the revert of D69591.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75307
Way back in D24994, the combination of LexicalScopes::dominates and
LiveDebugValues was identified as having worst-case quadratic complexity,
but it wasn't triggered by any code path at the time. I've since run into a
scenario where this occurs, in a very large basic block where large numbers
of inlined DBG_VALUEs are present.
The quadratic-ness comes from LiveDebugValues::join calling "dominates" on
every variable location, and LexicalScopes::dominates potentially touching
every instruction in a block to test for the presence of a scope. We have,
however, already computed the presence of scopes in blocks, in the
"InstrRanges" of each scope. This patch switches the dominates method to
examine whether a block is present in a scope's InsnRanges, avoiding
walking through the whole block.
At the same time, fix getMachineBasicBlocks to account for the fact that
InsnRanges can cover multiple blocks, and add some unit tests, as Lexical
Scopes didn't have any.
Differential revision: https://reviews.llvm.org/D73725
Summary: Include the offset at which this happened.
Reviewers: dblaikie, jhenderson
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75265
MathExtras.h was just wrapping SwapByteOrder.h functionality, so have
the callers use it directly. Use the MathExtras.h name (ByteSwap_NN) as
the standard naming, since it appears to be the most popular.
Hopefully fixes compile errors on some bots, like:
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/13383/steps/ninja%20check%201/logs/stdio
/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/llvm/unittests/ADT/CoalescingBitVectorTest.cpp:452:3: required from here
/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/llvm/utils/unittest/googletest/include/gtest/gtest-printers.h:377:56: error: ‘const class llvm::CoalescingBitVector<long unsigned int>::const_iterator’ has no member named ‘begin’
for (typename C::const_iterator it = container.begin();
^
/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/llvm/utils/unittest/googletest/include/gtest/gtest-printers.h:378:11: error: ‘const class llvm::CoalescingBitVector<long unsigned int>::const_iterator’ has no member named ‘end’
it != container.end(); ++it, ++count) {
^
On some bots, using gtest asserts to compare iterators does not compile,
and I'm not sure why (this certainly compiles with clang). Disable the
checks for now :/.
```
C:\buildbot\as-builder-3\llvm-clang-x86_64-win-fast\llvm-project\llvm\utils\unittest\googletest\include\gtest/gtest-printers.h(377): error C2039: 'begin': is not a member of 'llvm::CoalescingBitVector<unsigned int,16>::const_iterator'
C:\buildbot\as-builder-3\llvm-clang-x86_64-win-fast\llvm-project\llvm\include\llvm/ADT/CoalescingBitVector.h(243): note: see declaration of 'llvm::CoalescingBitVector<unsigned int,16>::const_iterator'
C:\buildbot\as-builder-3\llvm-clang-x86_64-win-fast\llvm-project\llvm\utils\unittest\googletest\include\gtest/gtest-printers.h(478): note: see reference to function template instantiation 'void testing::internal::DefaultPrintTo<T>(testing::internal::IsContainer,testing::internal::false_type,const C &,std::ostream *)' being compiled
with
[
T=T1,
C=T1
]
```
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-win-fast/builds/12006/steps/test-check-llvm-unit/logs/stdiohttp://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/34521/steps/ninja%20check%201/logs/stdio
Add CoalescingBitVector to ADT. This is part 1 of a 3-part series to
address a compile-time explosion issue in LiveDebugValues.
---
CoalescingBitVector is a bitvector that, under the hood, relies on an
IntervalMap to coalesce elements into intervals.
CoalescingBitVector efficiently represents sets which predominantly
contain contiguous ranges (e.g. the VarLocSets in LiveDebugValues,
which are very long sequences that look like {1, 2, 3, ...}). OTOH,
CoalescingBitVector isn't good at representing sets with lots of gaps
between elements. The first N coalesced intervals of set bits are stored
in-place (in the initial heap allocation).
Compared to SparseBitVector, CoalescingBitVector offers more predictable
performance for non-sequential find() operations. This provides a
crucial speedup in LiveDebugValues.
Differential Revision: https://reviews.llvm.org/D74984
Summary:
Error reporting in DebugInfoDWARF library currently done in three ways :
1. Direct calls to WithColor::error()/WithColor::warning()
2. ErrorPolicy defaultErrorHandler(Error E);
3. void dumpWarning(Error Warning);
additionally, other locations could have more variations:
lld/ELF/SyntheticSection.cpp
if (Error e = cu->tryExtractDIEsIfNeeded(false)) {
error(toString(sec) + ": " + toString(std::move(e)));
DebugInfo/DWARF/DWARFUnit.cpp
if (Error e = tryExtractDIEsIfNeeded(CUDieOnly))
WithColor::error() << toString(std::move(e));
Thus error reporting could look inconsistent. To have a consistent error
messages it is necessary to have a possibility to redefine error
reporting functions. This patch creates two handlers and allows to
redefine them. It also patches all places inside DebugInfoDWARF
to use these handlers.
The intention is always to use following handlers for error reporting
purposes inside DebugInfoDWARF:
DebugInfo/DWARF/DWARFContext.h
std::function<void(Error E)> RecoverableErrorHandler = WithColor::defaultErrorHandler;
std::function<void(Error E)> WarningHandler = WithColor::defaultWarningHandler;
This is last patch from series of patches: D74481, D74635, D75118.
Reviewers: jhenderson, dblaikie, probinson, aprantl, JDevlieghere
Reviewed By: jhenderson
Subscribers: grimar, hiraditya, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D74308
Summary:
Current LLVM code base does not use error handler with ErrorPolicy.
This patch removes ErrorPolicy from DWARFContext.
This patch is extracted from the D74308.
Reviewers: jhenderson, dblaikie, grimar, aprantl, JDevlieghere
Reviewed By: grimar
Subscribers: hiraditya, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D75118
Summary:
This patch introduces a function to house the code needed to do the
DWARF64 detection dance. The function decodes the initial length field
and returns it as a pair containing the actual length, and the DWARF
encoding.
This patch does _not_ attempt to handle the problem of detecting lengths
which extend past the size of the section, or cases when reads of a
single contribution accidentally escape beyond its specified length, but
I think it's useful in its own right.
Reviewers: dblaikie, jhenderson, ikudrin
Subscribers: hiraditya, probinson, aprantl, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74560
Unlike what I claimed in my previous commit. The caching is
actually not NFC on PHIs.
When we put a big enough max depth, we end up simulating loops.
The cache is effectively cutting the simulation short and we
get less information as a result.
E.g.,
```
v0 = G_CONSTANT i8 0xC0
jump
v1 = G_PHI i8 v0, v2
v2 = G_LSHR i8 v1, 1
```
Let say we want the known bits of v1.
- With cache:
Set v1 cache to we know nothing
v1 is v0 & v2
v0 gives us 0xC0
v2 gives us known bits of v1 >> 1
v1 is in the cache
=> v1 is 0, thus v2 is 0x80
Finally v1 is v0 & v2 => 0x80
- Without cache and enough depth to do two iteration of the loop:
v1 is v0 & v2
v0 gives us 0xC0
v2 gives us known bits of v1 >> 1
v1 is v0 & v2
v0 is 0xC0
v2 is v1 >> 1
Reach the max depth for v1...
unwinding
v1 is know nothing
v2 is 0x80
v0 is 0xC0
v1 is 0x80
v2 is 0xC0
v0 is 0xC0
v1 is 0xC0
Thus now v1 is 0xC0 instead of 0x80.
I've added a unittest demonstrating that.
NFC
Summary:
Future patches will make use of TTI to perform cost-model-driven `SCEVExpander::isHighCostExpansionHelper()`
This is a fully NFC patch to make things reviewable.
Reviewers: reames, mkazantsev, wmi, sanjoy
Reviewed By: mkazantsev
Subscribers: hiraditya, zzheng, javed.absar, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73704
Summary:
These modificaitons will be used in D74883.
Fixed length C strings can have trailing NULLs or sometimes spaces (BSD archive files), so the fixed length C string defaults to stripping trailing NULLs, but can have the arguments specify to remove one or more kinds of spaces if needed. This is used to extract fixed length C strings from ELF NOTEs in D74883.
Reviewers: labath, dblaikie, aprantl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74991
Summary:
This should produce slightly better error messages in case of failures.
Only slightly, because this code was pretty careful about that to begin
with -- I've seen code which does much worse.
Reviewers: jhenderson, dblaikie
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74899
Summary:
We already have a "Failed" matcher, which can be used to check any
property of the Error object. However, most frequently one just wants to
check the error message, and while this is possible with the "Failed"
matcher, it is also very convoluted
(Failed<ErrorInfoBase>(testing::Property(&ErrorInfoBase::message, "the
message"))).
Now, one can just write: FailedWithMessage("the message"). I expect that
most of the usages will remain this simple, but the argument of the
matcher is not limited to simple strings -- the argument of the matcher
can be any other matcher, so one can write more complicated assertions
if needed (FailedWithMessage(ContainsRegex("foo|bar"))). If one wants to
match multiple error messages, he can pass multiple arguments to the
matcher.
If one wants to match the message list as a whole (perhaps to check the
message count), I've also included a FailedWithMessageArray matcher,
which takes a single matcher receiving a vector of error message
strings.
Reviewers: sammccall, dblaikie, jhenderson
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74898
This change is relevant when embedding the llvm cmake project into
another project. It should not change the build behavior of a normal
llvm build.
In the case where llvm is embedded as a cmake subproject,
CMAKE_SOURCE_DIR does not point to the expected directory and building
the tests fails.
Using CMAKE_CURRENT_SOURCE_DIR fixes this problem, as it will always
point to the same directory.
Differential Revision: https://reviews.llvm.org/D73466
When analyzing PHIs, we gather the known bits for every operand and
merge them together to get the known bits of the result of the PHI.
It is not unusual that merging the information leads to know nothing
on the result (e.g., phi a: i8 3, b: i8 unknown, ..., after looking at the
second argument we know we will know nothing on the result), thus, as
soon as we reach that state, stop analyzing the following operand (i.e.,
on the previous example, we won't process anything after looking at `b`).
This improves compile time in particular with PHIs with a large number
of operands.
NFC.
Summary:
The Offset provides the offset within the function in a SourceLocation struct. This allows us to show the byte offset within a function. We also track offsets within inline functions as well. Updated the lookup tests to verify the offset for functions and inline functions.
0x1000: main + 32 @ /tmp/main.cpp:45
Reviewers: labath, aadsm, serhiy.redko, jankratochvil, xiaobai, wallace, aprantl, JDevlieghere
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74680
Initializers and deinitializers are used to implement C++ static constructors
and destructors, runtime registration for some languages (e.g. with the
Objective-C runtime for Objective-C/C++ code) and other tasks that would
typically be performed when a shared-object/dylib is loaded or unloaded by a
statically compiled program.
MCJIT and ORC have historically provided limited support for discovering and
running initializers/deinitializers by scanning the llvm.global_ctors and
llvm.global_dtors variables and recording the functions to be run. This approach
suffers from several drawbacks: (1) It only works for IR inputs, not for object
files (including cached JIT'd objects). (2) It only works for initializers
described by llvm.global_ctors and llvm.global_dtors, however not all
initializers are described in this way (Objective-C, for example, describes
initializers via specially named metadata sections). (3) To make the
initializer/deinitializer functions described by llvm.global_ctors and
llvm.global_dtors searchable they must be promoted to extern linkage, polluting
the JIT symbol table (extra care must be taken to ensure this promotion does
not result in symbol name clashes).
This patch introduces several interdependent changes to ORCv2 to support the
construction of new initialization schemes, and includes an implementation of a
backwards-compatible llvm.global_ctor/llvm.global_dtor scanning scheme, and a
MachO specific scheme that handles Objective-C runtime registration (if the
Objective-C runtime is available) enabling execution of LLVM IR compiled from
Objective-C and Swift.
The major changes included in this patch are:
(1) The MaterializationUnit and MaterializationResponsibility classes are
extended to describe an optional "initializer" symbol for the module (see the
getInitializerSymbol method on each class). The presence or absence of this
symbol indicates whether the module contains any initializers or
deinitializers. The initializer symbol otherwise behaves like any other:
searching for it triggers materialization.
(2) A new Platform interface is introduced in llvm/ExecutionEngine/Orc/Core.h
which provides the following callback interface:
- Error setupJITDylib(JITDylib &JD): Can be used to install standard symbols
in JITDylibs upon creation. E.g. __dso_handle.
- Error notifyAdding(JITDylib &JD, const MaterializationUnit &MU): Generally
used to record initializer symbols.
- Error notifyRemoving(JITDylib &JD, VModuleKey K): Used to notify a platform
that a module is being removed.
Platform implementations can use these callbacks to track outstanding
initializers and implement a platform-specific approach for executing them. For
example, the MachOPlatform installs a plugin in the JIT linker to scan for both
__mod_inits sections (for C++ static constructors) and ObjC metadata sections.
If discovered, these are processed in the usual platform order: Objective-C
registration is carried out first, then static initializers are executed,
ensuring that calls to Objective-C from static initializers will be safe.
This patch updates LLJIT to use the new scheme for initialization. Two
LLJIT::PlatformSupport classes are implemented: A GenericIR platform and a MachO
platform. The GenericIR platform implements a modified version of the previous
llvm.global-ctor scraping scheme to provide support for Windows and
Linux. LLJIT's MachO platform uses the MachOPlatform class to provide MachO
specific initialization as described above.
Reviewers: sgraenitz, dblaikie
Subscribers: mgorny, hiraditya, mgrang, ributzka, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74300
Summary: Commit 63bb9fee52 was reverted in
7603bfb4b0 because it broke builds that treat
warnings as errors.
This commit updates the calls to `assembleToStream()` in tests to check that
the return value is valid.
Original commit message:
Followup to D74084.
Replace the use of `report_fatal_error()` with returning the error to
`llvm-exegesis.cpp` and handling it there.
Differential Revision: https://reviews.llvm.org/D74325
This test was getting a bit long. Before adding more checks, group the
existing checks according to the matcher used, and break it up into
smaller tests.
After having committed https://reviews.llvm.org/D72226, 2 buildbots
running GCC 5.4.0 began failing. The cause was the order in which those
compilers evaluated the left- and right-hand sides of the expression
`RC.SCCIndices[C] = RC.SCCIndices.size();`. This commit splits the
expression into multiple statements to avoid ambiguity, and adds a test
case that exercises the code that caused the test failures on those
older compilers (which was originally included in the reviewed patch,
https://reviews.llvm.org/D72226).
Essentially, fold OrderedBasicBlock into BasicBlock, and make it
auto-invalidate the instruction ordering when new instructions are
added. Notably, we don't need to invalidate it when removing
instructions, which is helpful when a pass mostly delete dead
instructions rather than transforming them.
The downside is that Instruction grows from 56 bytes to 64 bytes. The
resulting LLVM code is substantially simpler and automatically handles
invalidation, which makes me think that this is the right speed and size
tradeoff.
The important change is in SymbolTableTraitsImpl.h, where the numbering
is invalidated. Everything else should be straightforward.
We probably want to implement a fancier re-numbering scheme so that
local updates don't invalidate the ordering, but I plan for that to be
future work, maybe for someone else.
Reviewed By: lattner, vsk, fhahn, dexonsmith
Differential Revision: https://reviews.llvm.org/D51664
This patch upstreams support for the AArch64 Armv8-A cpu Cortex-A34.
In detail adding support for:
- mcpu option in clang
- AArch64 Target Features in clang
- llvm AArch64 TargetParser definitions
details of the cpu can be found here:
https://developer.arm.com/ip-products/processors/cortex-a/cortex-a34
Reviewers: SjoerdMeijer
Reviewed By: SjoerdMeijer
Subscribers: SjoerdMeijer, kristof.beyls, hiraditya, cfe-commits,
llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74483
Change-Id: Ida101fc544ca183a0a0e61a1277c8957855fde0b
Summary:
I noticed a small regression in a toy project of mine after applying
D73835, in which instruction names weren't being set properly. In the
example test case included with this patch,
`llvm::IRBuilderBase::CreateAdd` returns an `llvm::Value *` that is then
passed as an argument to `llvm::IRBuilderBase::Insert`. The overloaded
function that is selected for that call then ignores the `Name`
parameter that is given. This patch addresses that issue.
Reviewers: nikic, Meinersbur, nhaehnle, fhahn, thakis, teemperor
Reviewed By: nikic, fhahn
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74754
Summary:
Depends on https://reviews.llvm.org/D70927.
`LazyCallGraph::addNewFunctionIntoSCC` allows users to insert a new
function node into a call graph, into a specific, existing SCC.
Extend this interface such that functions can be added even when they do
not belong in any existing SCC, but instead in a new SCC within an
existing RefSCC.
The ability to insert new functions as part of a RefSCC is necessary for
outlined functions that do not form a strongly connected cycle with the
function they are outlined from. An example of such a function would be the
coroutine funclets 'f.resume', etc., which are outlined from a coroutine 'f'.
Coroutine 'f' only references the funclets' addresses, it does not call
them directly.
Reviewers: jdoerfert, chandlerc, wenlei, hfinkel
Reviewed By: jdoerfert
Subscribers: hfinkel, JonChesterfield, mehdi_amini, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72226
Currently we always return true, when markConstantRange is used on an
object already containing a constant range. If NewR is equal to the
existing constant range however, nothing changes and we should return
false.
I also went ahead and added a clarifying comment and improved the
assertion.
Reviewers: efriedma, davide, nikic
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D73240
As noted on D74621, the bswap intrinsic has a self imposed limitation that the type's bitwidth must be divisible by 16, but there's no reason that APInt::byteSwap must have the same limitation, given that it can already handle any byte width.
Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well.
Also this patch modifies clang to use the new directives when `-fopenmp-enable-irbuilder` commandline option is passed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D72304
This patch implements an almost complete handling of OpenMP
contexts/traits such that we can reuse most of the logic in Flang
through the OMPContext.{h,cpp} in llvm/Frontend/OpenMP.
All but construct SIMD specifiers, e.g., inbranch, and the device ISA
selector are define in `llvm/lib/Frontend/OpenMP/OMPKinds.def`. From
these definitions we generate the enum classes `TraitSet`,
`TraitSelector`, and `TraitProperty` as well as conversion and helper
functions in `llvm/lib/Frontend/OpenMP/OMPContext.{h,cpp}`.
The above enum classes are used in the parser, sema, and the AST
attribute. The latter is not a collection of multiple primitive variant
arguments that contain encodings via numbers and strings but instead a
tree that mirrors the `match` clause (see `struct OpenMPTraitInfo`).
The changes to the parser make it more forgiving when wrong syntax is
read and they also resulted in more specialized diagnostics. The tests
are updated and the core issues are detected as before. Here and
elsewhere this patch tries to be generic, thus we do not distinguish
what selector set, selector, or property is parsed except if they do
behave exceptionally, as for example `user={condition(EXPR)}` does.
The sema logic changed in two ways: First, the OMPDeclareVariantAttr
representation changed, as mentioned above, and the sema was adjusted to
work with the new `OpenMPTraitInfo`. Second, the matching and scoring
logic moved into `OMPContext.{h,cpp}`. It is implemented on a flat
representation of the `match` clause that is not tied to clang.
`OpenMPTraitInfo` provides a method to generate this flat structure (see
`struct VariantMatchInfo`) by computing integer score values and boolean
user conditions from the `clang::Expr` we keep for them.
The OpenMP context is now an explicit object (see `struct OMPContext`).
This is in anticipation of construct traits that need to be tracked. The
OpenMP context, as well as the `VariantMatchInfo`, are basically made up
of a set of active or respectively required traits, e.g., 'host', and an
ordered container of constructs which allows duplication. Matching and
scoring is kept as generic as possible to allow easy extension in the
future.
---
Test changes:
The messages checked in `OpenMP/declare_variant_messages.{c,cpp}` have
been auto generated to match the new warnings and notes of the parser.
The "subset" checks were reversed causing the wrong version to be
picked. The tests have been adjusted to correct this.
We do not print scores if the user did not provide one.
We print spaces to make lists in the `match` clause more legible.
Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim
Subscribers: merge_guards_bot, rampitec, mgorny, hiraditya, aheejin, fedor.sergeev, simoncook, bollu, guansong, dexonsmith, jfb, s.egerton, llvm-commits, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71830
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.
== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.
This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".
== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).
When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.
When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
Differential Revision: https://reviews.llvm.org/D71775
This patch adds DenseMapInfo<> support for BitVector and SmallBitVector.
This is part of https://reviews.llvm.org/D71775, where a BitVector is used as a thread affinity mask.
Prior to this patch, if a DW_LNE_set_address opcode was parsed with an
address size (i.e. with a length after the opcode) of anything other 1,
2, 4, or 8, an llvm_unreachable would be hit, as the data extractor does
not support other values. This patch introduces a new error check that
verifies the address size is one of the supported sizes, in common with
other places within the DWARF parsing.
This patch also fixes calculation of a generated line table's size in
unit tests. One of the tests in this patch highlighted a bug introduced
in 1271cde474, when non-byte operands were used as arguments for
extended or standard opcodes.
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D73962
replaceDbgDeclare is used to update the descriptions of stack variables
when they are moved (e.g. by ASan or SafeStack). A side effect of
replaceDbgDeclare is that it moves dbg.declares around in the
instruction stream (typically by hoisting them into the entry block).
This behavior was introduced in llvm/r227544 to fix an assertion failure
(llvm.org/PR22386), but no longer appears to be necessary.
Hoisting a dbg.declare generally does not create problems. Usually,
dbg.declare either describes an argument or an alloca in the entry
block, and backends have special handling to emit locations for these.
In optimized builds, LowerDbgDeclare places dbg.values in the right
spots regardless of where the dbg.declare is. And no one uses
replaceDbgDeclare to handle things like VLAs.
However, there doesn't seem to be a positive case for moving
dbg.declares around anymore, and this reordering can get in the way of
understanding other bugs. I propose getting rid of it.
Testing: stage2 RelWithDebInfo sanitized build, check-llvm
rdar://59397340
Differential Revision: https://reviews.llvm.org/D74517
Same as D73328 but for TBD_V4. One notable tidbit is that the swift abi
version for swift 1 & 2 is emitted as a float which is considered
invalid input.
Differential revision: https://reviews.llvm.org/D73330
Summary:
The DWARF transformer is added as a class so it can be unit tested fully.
The DWARF is converted to GSYM format and handles many special cases for functions:
- omit functions in compile units with 4 byte addresses whose address is UINT32_MAX (dead stripped)
- omit functions in compile units with 8 byte addresses whose address is UINT64_MAX (dead stripped)
- omit any functions whose high PC is <= low PC (dead stripped)
- StringTable builder doesn't copy strings, so we need to make backing copies of strings but only when needed. Many strings come from sections in object files and won't need to have backing copies, but some do.
- When a function doesn't have a mangled name, store the fully qualified name by creating a string by traversing the parent decl context DIEs and then. If we don't do this, we end up having cases where some function might appear in the GSYM as "erase" instead of "std::vector<int>::erase".
- omit any functions whose address isn't in the optional TextRanges member variable of DwarfTransformer. This allows object file to register address ranges that are known valid code ranges and can help omit functions that should have been dead stripped, but just had their low PC values set to zero. In this case we have many functions that all appear at address zero and can omit these functions by making sure they fall into good address ranges on the object file. Many compilers do this when the DWARF has a DW_AT_low_pc with a DW_FORM_addr, and a DW_AT_high_pc with a DW_FORM_data4 as the offset from the low PC. In this case the linker can't write the same address to both the high and low PC since there is only a relocation for the DW_AT_low_pc, so many linkers tend to just zero it out.
Reviewers: aprantl, dblaikie, probinson
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74450
Finalization can introduce new blocks we need to outline as well so it
makes sense to identify the blocks that need to be outlined after
finalization happened. There was also a minor unit test adjustment to
account for the fact that we have a single outlined exit block now.
Reapply 8a56d64d76 with minor fixes.
The problem was that cancellation can cause new edges to the parallel
region exit block which is not outlined. The CodeExtractor will encode
the information which "exit" was taken as a return value. The fix is to
ensure we do not return any value from the outlined function, to prevent
control to value conversion we ensure a single exit block for the
outlined region.
This reverts commit 3aac953afa.
In order to fix PR44560 and to prepare for loop transformations we now
finalize a function late, which will also do the outlining late. The
logic is as before but the actual outlining step happens now after the
function was fully constructed. Once we have loop transformations we
can apply them in the finalize step before the outlining.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D74372
function_ref is non-owning, so if we get it as a parameter in constructor,
our reference goes out-of-scope as soon as constructor returns.
Instead, let's just take it as a parameter to the actual `generate()` call
Summary:
Currently, we only have nice exploration for LEA instruction,
while for the rest, we rely on `randomizeUnsetVariables()`
to sometimes generate something interesting.
While that works, it isn't very reliable in coverage :)
Here, i'm making an assumption that while we may want to explore
multi-instruction configs, we are most interested in the
characteristics of the main instruction we were asked about.
Which we can do, by taking the existing `randomizeMCOperand()`,
and turning it on it's head - instead of relying on it to randomly fill
one of the interesting values, let's pregenerate all the possible interesting
values for the variable, and then generate as much `InstructionTemplate`
combinations of these possible values for variables as needed/possible.
Of course, that requires invasive changes to no longer pass just the
naked `Instruction`, but sometimes partially filled `InstructionTemplate`.
As it can be seen from the test, this allows us to explore
`X86::OperandType::OPERAND_COND_CODE` for instructions
that take such an operand.
I'm hoping this will greatly simplify exploration.
Reviewers: courbet, gchatelet
Reviewed By: gchatelet
Subscribers: orodley, mgorny, sdardis, tschuett, jrtc27, atanasyan, mstojanovic, andreadb, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74156
The DWARFv2-4 specification for the line table header states that the
include directories and file name tables both end with a single null
byte. Prior to this change, the parser did not detect if this byte was
missing, because it also stopped reading the tables once it reached the
prologue end, as claimed by the header_length field. This change adds a
check that the terminator has been seen at the end of each table.
Reviewed by: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D74413
Also remove some test duplication and add a test case that shows the
maximum version is rejected (this also shows that the value in the error
message is actually in decimal, and not just missing an 0x prefix).
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D74403
Summary:
Dwarf stores source-file names the three parts:
<compilation_directory><include_directory><filename>
Prior to this change, the code only allowed retrieving either all
three as the absolute path, or just the filename. But many
compile-command lines--especially those in hermetic build systems
don't specify an absolute path, nor just the filename, but rather the
path relative to the compilation directory. This features allows
retrieving them in that style.
Add tests for path printing styles.
Modify createBasicPrologue to handle include directories.
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73383
Summary:
* for <= tbd_v3, simulator platforms appear the same as the real
platform and we distinct the difference from the architecture.
fixes: rdar://problem/59161559
Reviewers: ributzka, steven_wu
Reviewed By: ributzka
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74416
Summary:
Simplifies the C++11-style "-> decltype(...)" return-type deduction.
Note that you have to be careful about whether the function return type
is `auto` or `decltype(auto)`. The difference is that bare `auto`
strips const and reference, just like lambda return type deduction. In
some cases that's what we want (or more likely, we know that the return
type is a value type), but whenever we're wrapping a templated function
which might return a reference, we need to be sure that the return type
is decltype(auto).
No functional change.
Subscribers: dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74383
Summary: It attempts to devirtualize a call on alloca through vtable loads.
Reviewers: davidxl
Subscribers: mgorny, Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71308
The plugin expects to have undefined references to symbols exported
by the loading process, which isn't supported by shared libraries
on windows.
Differential Revision: https://reviews.llvm.org/D74042
If a debug line section with version of greater than 5 is encountered,
prior to this change the parser would accept it and treat it as version
5. This might work to some extent, but then it might not at all, as it
really depends on the format of the unspecified future version, which
will be different (otherwise there would be no point in changing the
version number). Any information we could provide has a good chance of
being invalid, so we should just refuse to parse such tables.
Reviewed by: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D74204