Current dsymutil implementation of hasLiveMemoryLocation()/hasLiveAddressRange()
and applyValidRelocs() assume that calls should be done in certain order
(from first Dies to last). Multi-thread implementation might call these methods
in other order(it might process compilation units in order other than they are physically
located), so we remove restriction that searching for relocations should be done
in ascending order. This change does not introduce noticable performance degradation.
The testing results for clang binary:
golden-dsymutil/dsymutil 23787992
clang MD5: 5efa8fd9355ebf81b65f24db5375caa2
elapsed time=91sec
build-Release/bin/dsymutil 23855616
clang MD5: 5efa8fd9355ebf81b65f24db5375caa2
elapsed time=91sec
Differential Revision: https://reviews.llvm.org/D93106
After committing D92214 it was noticed libc++ no longer builds with
C++17. For now reenable building with C++17. This is intended to be a
temporary measure in the future a C++20 capable compiler will be
required.
The result values of vp2intersect are vectors of bits, i.e.,
vector<8xi1> or vector<16xi8> (instead of i8 or i16).
Differential Revision: https://reviews.llvm.org/D95678
Analyze the shape of the result of TRANSFER(ptr,array) correctly
when "ptr" is an array of deferred shape. Fixing this bug led to
some refactoring and concentration of common code in TypeAndShape
member functions with code in general shape and character length
analysis, and this led to some regression test failures that have
all been cleaned up.
Differential Revision: https://reviews.llvm.org/D95744
We already used emplace_back in at least one other place so be
consistent.
AddPatternToMatch already took PTM as an rvalue reference, but
we need to use std::move again to move it into the PatternToMatch
vector.
Use vector::swap instead of copying to a local vector and clearing
the original. We can just swap into the just created local vector
instead which will move the pointers and not the data.
Use std::move in another place to avoid a copy.
This patch refines the logic to choose compute capabilites via the
environment variable `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES`. It supports the
following values (all case insensitive):
- "all": Build `deviceRTLs` for all supported compute capabilites;
- "auto": Only build for the compute capability auto detected. Note that this
requires CUDA. If CUDA is not found, a CMake fatal error will be raised.
- "xx,yy" or "xx;yy": Build for compute capabilities `xx` and `yy`.
If `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES` is not set, it is equivalent to set
it to `all`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95687
AMDGPUTargetTransformInfo.h needs AMDGPUTargetMachine but relies on a
forward declaration of AMDGPUTargetMachine in AMDGPU.h. This patch
adds a forward declaration right in AMDGPUTargetTransformInfo.h.
While we are at it, this patch removes the one in
AMDGPU.h, where it is unnecessary.
This demonstrates a missed optimization: the `vmv.x.s` instruction is
used to extract the element from the vector, and this instruction
already sign-extends the value to XLEN.
Fixes https://bugs.llvm.org/show_bug.cgi?id=48882.
If the input file does not exist (or has a reading error), the
following code will crash if there are two or more input addresses.
```
auto ResOrErr = Symbolizer.symbolizeInlinedCode(
ModuleName, {Offset, object::SectionedAddress::UndefSection});
Printer << (error(ResOrErr) ? DILineInfo() : ResOrErr.get().getFrame(0));
```
For the first address, `symbolizeInlinedCode` returns an error.
For the second address, `symbolizeInlinedCode` returns an empty result
(not an error) and `.getFrame(0)` will crash.
Differential revision: https://reviews.llvm.org/D95609
This patch fixes updating MemorySSA if the header contains memory
defs that do not clobber a duplicated instruction. We need to find the
first defining access outside the loop body and use that as defining
access of the duplicated instruction.
This fixes a crash caused by bee486851c.
D36116 refactored the logic of tests and removed the definition of TARGET_FLAGS, but left one use of it. Restore its definition for that one use, so that an x86_64 test is compiled with -m64.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D93634
This patch adds an option to enable the new pass manager in
LTOCodeGenerator. It also updates a few tests with legacy PM specific
tests, which started failing after 6a59f05606 when
LLVM_ENABLE_NEW_PASS_MANAGER=true.
This patch updates LTOCodeGenerator to use the utilities provided by
LTOBackend to run middle-end optimizations and backend code generation.
This is a first step towards unifying the code used by libLTO's C API
and the newer, C++ interface (see PR41541).
The immediate motivation is to allow using the new pass manager when
doing LTO using libLTO's C API, which is used on Darwin, among others.
With the changes, there are no codegen/stats differences when building
MultiSource/SPEC2000/SPEC2006 on Darwin X86 with LTO, compared
to without the patch.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D94487
Compact unwind entries have 8 bits for the encoding-table offset:
* offsets 0..126 reference the global commmon-encodings table, while
* offsets 127..255 reference a per-second-level-page table.
This diff teaches `llvm-objdump` to print this per-page encodings table.
Differential Revision: https://reviews.llvm.org/D93265
This is an optimized approach for D94155.
Previous code build the model that tile config register is the user of
each AMX instruction. There is a problem for the tile config register
spill. When across function, the ldtilecfg instruction may be inserted
on each AMX instruction which use tile config register. This cause all
tile data register clobber.
To fix this issue, we remove the model of tile config register. Instead,
we analyze the AMX instructions between one call to another. We will
insert ldtilecfg after the first call if we find any AMX instructions.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D95136
Source Drift happens when the sources are updated after profiling the binary
but before building the final optimized binary. If the source has changed since
the profiles were obtained, optimizing basic blocks might be sub-optimal. This
only applies to BasicBlockSection::List as it creates clusters of basic blocks
using basic block ids. Source drift can invalidate these groupings leading to
sub-optimal code generation with regards to performance.
PGO source drift for a particular function can be detected using function
metadata added in D95495.
When source drift is deected, disable basic block clusters by default
which can be re-enabled with -mllvm option
bbsections-detect-source-drift=false.
Differential Revision: https://reviews.llvm.org/D95593
Tuples can occupy quite a lot of space, instead of printing out tuple type
everywhere, just use the type alias if larger (arbitrarily chose a bound for
now).
Differential Revision: https://reviews.llvm.org/D95707
As a fixme notes, both of these directory iterator implementations are
conceptually similar and duplicate the functionality of returning and uniquing
entries across two or more directories. This patch combines them into a single
class 'CombiningDirIterImpl'.
This also drops the 'Redirecting' prefix from RedirectingDirEntry and
RedirectingFileEntry to save horizontal space. There's no loss of clarity as
they already have to be prefixed with 'RedirectingFileSystem::' whenever
they're referenced anyway.
rdar://problem/72485443
Differential Revision: https://reviews.llvm.org/D94857
Update ElementsAttr::isValidIndex to handle ElementsAttr with a scalar. Scalar will have rank 0.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D95663
Legacy Fortran implementations support an alternative form of the
PARAMETER statement; it differs syntactically from the standard's
PARAMETER statement by lacking parentheses, and semantically by
using the type and shape of the initialization expression to define
the attributes of the named constant. (GNU Fortran gets that part
wrong; Intel Fortran and nvfortran have full support.)
This patch disables the old style PARAMETER statement by default, as
it is syntactically ambiguous with conforming assignment statements;
adds a new "-falternative-parameter-statement" option to enable it;
and implements it correctly when enabled.
Fixes https://bugs.llvm.org/show_bug.cgi?id=48774, in which a user
tripped over the syntactic ambiguity.
Differential Revision: https://reviews.llvm.org/D95697
Not all combinations of SEW and LMUL we need to support. For example, we
only need to support [M1, M2, M4, M8] for SEW = 64. There is no need to
define pseudos for PseudoVLSE64MF8, PseudoVLSE64MF4, and PseudoVLSE64MF2.
Differential Revision: https://reviews.llvm.org/D95667