Summary: There is a typo in DeviceRTLTy::getNumOfDevices that the type of its return value is bool. It will lead to a problem of wrong device number returned from omp_get_num_devices.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: yaxunl, guansong, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D79255
If SimplifyInstruction() does not succeed in simplifying the
instruction, it will compute the known bits of the instruction
in the hope that all bits are known and the instruction can be
folded to a constant. I have removed a similar optimization
from InstCombine in D75801, and would like to drop this one as well.
On average, we spend ~1% of total compile-time performing this
known bits calculation. However, if we introduce some additional
statistics for known bits computations and how many of them succeed
in simplifying the instruction we get (on test-suite):
instsimplify.NumKnownBits: 216
instsimplify.NumKnownBitsComputed: 13828375
valuetracking.NumKnownBitsComputed: 45860806
Out of ~14M known bits calculations (accounting for approximately
one third of all known bits calculations), only 0.0015% succeed in
producing a constant. Those cases where we do succeed to compute
all known bits will get folded by other passes like InstCombine
later. On test-suite, only lencod.test and GCC-C-execute-pr44858.test
show a hash difference after this change. On lencod we see an
improvement (a loop phi is optimized away), on the GCC torture
test a regression (a function return value is determined only
after IPSCCP, preventing propagation from a noinline function.)
There are various regressions in InstSimplify tests. However, all
of these cases are already handled by InstCombine, and corresponding
tests have already been added there.
Differential Revision: https://reviews.llvm.org/D79294
Summary:
This factors cost and reporting out of the inlining workflow, thus
making it easier to reuse when driving inlining from the upcoming
InliningAdvisor.
Depends on: D79215
Reviewers: davidxl, echristo
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79275
Since every AbstractAttribute so far, and for the foreseeable future,
corresponds to a single IRPosition we can simplify the class structure.
We already did this for IRAttribute but there is no reason to stop
there.
LLD calls this on every source file string in every object file when
writing PDBs, so it is somewhat hot.
Avoid rewriting paths that do not contain path traversal components
(./..). Use find_first_not_of(separators) directly instead of using the
path iterators. The path component iterators appear to be slow, and
directly searching for slashes makes it easier to find double separators
that need to be canonicalized.
I discovered that the VFS relies on remote_dots to not canonicalize
early slashes (/foo or C:/foo) on Windows, so I had to leave that
behavior behind with unit tests for it. This is undesirable, but I claim
that my change is NFC.
Summary:
Current implementation of DWARFDie::getName(DINameKind Kind) could
lead to double call to DWARFDie::find(DW_AT_name) in following
scenario:
getName(LinkageName);
getName(ShortName);
getName(LinkageName) calls find(DW_AT_name) if linkage name is not
found. Then, it is called again in getName(ShortName). This patch
alows to request LinkageName and ShortName separately
to avoid extra call to find(DW_AT_name).
It helps D74169 to parse clang debuginfo faster(~1%).
Reviewers: clayborg, dblaikie
Differential Revision: https://reviews.llvm.org/D79173
The splitVector helper uses extractSubVector which splits build vectors like we do here, so avoid reimplementing it.
splitVector could easily be extended to peek through bitcasts as well but I'd prefer to keep this commit NFC.
The number of public symbols is very large, and each deserialization
does a few heap allocations. The public symbols are serialized by the
linker, so we can assume they have the expected layout and use it
directly.
Saves O(#publics) temporary heap allocations and shrinks some data
structures.
Don't use $noreg for instructions that take register inputs.
Only allow $noreg for parts of memory operands.
Don't use index register with $rip base.
Use RETQ instead of the RET pseudo. This pass is after the
ExpandPseudo pass that converts RET to RETQ.
This accounts for a large portion of the memory allocations in LLD.
This DebugSubsectionRecordBuilder object can be stored directly in
C13Builders, it mostly wraps other subsections.
Remove the container kind field from the object. It is always the same
for all elements in the vector, and we can pass it in during writing.
This particular overload allocates memory, and we do this for every
S_[GL]PROC32_ID record. Instead, hardcode the offset of the typeindex
that we are looking for in the LF_[MEM]FUNC_ID record. We already
assumed that looking up the item index already found a record of this
kind.
Summary:
FileCheck documentation contains an example of a numeric variable
defined and used on the same line. This is not currently supported by
FileCheck so this commit fixes the example to use CHECK-SAME for the
variable use.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D79253
The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces.
The technique employed by `ExpandLoad` is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines.
Differential Revision: https://reviews.llvm.org/D79096
Otherwise an ArgumentParser is constructed for every directive section,
and that involves copying the entire table of options into a vector.
There is no need for this, just have one option table.
Summary:
Lld test ELF/linkerscript/thunk-gen-mips.s was accidentally disabled due
to the use of wrong FileCheck directives. As a result the test seems to
have bitrotted as it fails to pass if fixing the directive. To ease
updates to the test in case of change of the __start address the checks
have been changed to use numeric variables to express all the addresses
based on the __start address.
Reviewed By: atanasyan
Differential Revision: https://reviews.llvm.org/D79270
This makes the previously unaccessible AST nodes for C++17 "if with
init statements" accessible to consumers of libclang.
Differential Revision: https://reviews.llvm.org/D78214
This makes the BindingDecl accessible to consumers of libclang
as CXCursor_UnexposedDecl where previously these AST nodes were
not visited at all from the libclang API.
Differential Revision: https://reviews.llvm.org/D78213
This is useful for several reasons:
* In some situations the user can guarantee that thread-safety isn't necessary and don't want to pay the cost of synchronization, e.g., when parsing a very large module.
* For things like logging threading is not desirable as the output is not guaranteed to be in stable order.
This flag also subsumes the pass manager flag for multi-threading.
Differential Revision: https://reviews.llvm.org/D79266
rL368553 added SimplifyMultipleUseDemandedBits handling for ISD::TRUNCATE to SimplifyDemandedBits so we don't need to duplicate this (and it gets rid of another GetDemandedBits call which is slowly being replaced with SimplifyMultipleUseDemandedBits anyhow).