Add support for lowering the schedule modifiers (simd, monotonic,
non-monotonic) in worksharing loops.
Note: This is part of upstreaming from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D127311
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Introduce transform ops for "for" loops, in particular for peeling, software
pipelining and unrolling, along with a couple of "IR navigation" ops. These ops
are intended to be generalized to different kinds of loops when possible and
therefore use the "loop" prefix. They currently live in the SCF dialect as
there is no clear place to put transform ops that may span across several
dialects, this decision is postponed until the ops actually need to handle
non-SCF loops.
Additionally refactor some common utilities for transform ops into trait or
interface methods, and change the loop pipelining to be a returning pattern.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D127300
This is a small follow-up for https://reviews.llvm.org/D120051. It makes
sure that tables with "run-time type information for derived types" are
generated for code-gen actions. Originally, only non-code-gen actions
were updated (i.e. actions that were fully supported at that time).
Differential Revision: https://reviews.llvm.org/D127307
glrReduce maintains two priority queues (one for bases, and the other
for Sequence), these queues are in parallel with each other, corresponding to a
single family. They can be folded into one.
This patch removes the bases priority queue, which improves the glrParse by
10%.
ASTReader.cpp: 2.03M/s (before) vs 2.26M/s (after)
Differential Revision: https://reviews.llvm.org/D127283
As pointed out in the previous review section, having a dedicated accept
action doesn't seem to be necessary. This patch implements the the same behavior
without accept acction, which will save some code complexity.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D125677
After f06abbb393 I have been seeing build
failures due to the obj.clang target missing a dependency on
tools/clang/clang-tablegen-targets.
This appears to be due to the fact that LLVM_COMMON_DEPENDS are not added
as dependencies to the object library.
This patch uses the same logic as llvm_add_library to register
dependencies for object libraries.
Reviewed By: beanz, abrachet, steven_wu
Differential Revision: https://reviews.llvm.org/D127318
Existing condition for
fold icmp ugt (ashr X, ShAmtC), C --> icmp ugt X, ((C + 1) << ShAmtC) - 1
missed some boundary. It cause this fold don't work for some cases, and the
reason is due to signed number overflow.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D127188
The IV widening code currently asserts that terminators aren't SCEVable
-- however, this is not the case for invokes with a returned attribute.
As far as I can tell, this assertions is not necessary -- even if we
have a critical edge (the second test case), the trunc gets inserted
in a legal position.
Fixes https://github.com/llvm/llvm-project/issues/55925.
Differential Revision: https://reviews.llvm.org/D127288
D83592 added comments to be part of skipped regions, and as part of that, it
also shrinks a skipped range if it spans a line that contains a non-comment
token. This is done by `adjustSkippedRange`.
The `adjustSkippedRange` currently runs on skipped regions that are not
comments, causing a 5min regression while building a big C++ files without any
comments.
Fix the compile time introduced in D83592 by tagging SkippedRange with kind
information and use that to decide what needs additional processing.
Differential Revision: https://reviews.llvm.org/D127338
Supports on Android but also from Linux 5.17
Reviewers: vitalybuka, eugenis
Reviewed-By: vitalybuka
Differential Revision: https://reviews.llvm.org/D127326
For arm64, llvm-mc emits relocations for the target function
address like so:
ltmp:
<CIE start>
...
<CIE end>
... multiple FDEs ...
<FDE start>
<target function address - (ltmp + pcrel offset)>
...
If any of the FDEs in `multiple FDEs` get dead-stripped, then `FDE start`
will move to an earlier address, and `ltmp + pcrel offset` will no longer
reflect an accurate pcrel value. To avoid this problem, we "canonicalize"
our relocation by adding an `EH_Frame` symbol at `FDE start`, and updating
the reloc to be `target function address - (EH_Frame + new pcrel offset)`.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D124561
== Background ==
`llvm-mc` generates unwind info in both compact unwind and DWARF
formats. LLD already handles the compact unwind format; this diff gets
us close to handling the DWARF format properly.
== Caveats ==
It's not quite done yet, but I figure it's worth getting this reviewed
and landed first as it's shaping up to be a fairly large code change.
**Known limitations of the current code:**
* Only works for x86_64, for which `llvm-mc` emits "abs-ified"
relocations as described in 618def651b.
`llvm-mc` emits regular relocations for ARM EH frames, which we do not
yet handle correctly.
Since the feature is not ready for real use yet, I've gated it behind a
flag that only gets toggled on during test suite runs. With most of the
new code disabled, we see just a hint of perf regression, so I don't
think it'd be remiss to land this as-is:
base diff difference (95% CI)
sys_time 1.926 ± 0.168 1.979 ± 0.117 [ -1.2% .. +6.6%]
user_time 3.590 ± 0.033 3.606 ± 0.028 [ +0.0% .. +0.9%]
wall_time 7.104 ± 0.184 7.179 ± 0.151 [ -0.2% .. +2.3%]
samples 30 31
== Design ==
Like compact unwind entries, EH frames are also represented as regular
ConcatInputSections that get pointed to via `Defined::unwindEntry`. This
allows them to be handled generically by e.g. the MarkLive and ICF
code. (But note that unlike compact unwind subsections, EH frame
subsections do end up in the final binary.)
In order to make EH frames "look like" a regular ConcatInputSection,
some processing is required. First, we need to split the `__eh_frame`
section along EH frame boundaries rather than along symbol boundaries.
We do this by decoding the length field of each EH frame. Second, the
abs-ified relocations need to be turned into regular Relocs.
== Next Steps ==
In order to support EH frames on ARM targets, we will either have to
teach LLD how to handle EH frames with explicit relocs, or we can try to
make `llvm-mc` emit abs-ified relocs for ARM as well. I'm hoping to do
the latter as I think it will make the LLD implementation both simpler
and faster to execute.
== Misc ==
The `obj-file-with-stabs.s` test had to be updated as the previous
version would trip assertion errors in the code. It appears that in our
attempt to produce a minimal YAML test input, we created a file with
invalid EH frame data. I've fixed this by re-generating the YAML and not
doing any hand-pruning of it.
Reviewed By: #lld-macho, Roger
Differential Revision: https://reviews.llvm.org/D123435
The flexibility around extern template instantiation declarations in
libc++ result in a very complicated model, especially when support for
slightly different configurations (like the debug mode or assertions
in the dylib) are taken into account. That results in unexpected bugs
like http://llvm.org/PR50534 (and there have been multiple similar
bugs in the past, notably around the debug mode).
This patch gets rid of the _LIBCPP_DISABLE_EXTERN_TEMPLATE knob, which
I don't think is fundamental. Indeed, the motivation for that knob was to
avoid taking a dependency on the library, however that can be done better
by linking against the static library instead. And in fact, some parts of
the headers will always depend on things defined in the library, which
defeats the original goal of _LIBCPP_DISABLE_EXTERN_TEMPLATE.
Differential Revision: https://reviews.llvm.org/D103960
pthread_getaffinity_np (Linux `kernel/sched/core.c:sched_getaffinity`) fails
with EINVAL if 8*cpusetsize (constant in glibc: 1024) is smaller than
`nr_cpu_ids` (CONFIG_NR_CPUS, which is 2048 for several arch/powerpc/configs
configurations).
The build bot clang-ppc64le-linux-lnt seems to have a larger `nr_cpu_ids`.
Differential Revision: https://reviews.llvm.org/D127368
This reduces linking time by ~8% for my project (1.19s -> 0.53s for
writeSections()). writeTo is const, which bodes well for it being
parallelizable, and I've looked through the different overridden versions and
can't see any race conditions. It produces the same byte-for-byte output for my
project.
Differential Revision: https://reviews.llvm.org/D126800
1. If array element type is a tag decl, complete it.
2. Fix few places where `asTag` should be used instead of `asClass()`.
3. Handle the case that `PdbAstBuilder::CreateFunctionDecl` return nullptr mainly due to an existing workaround (`m_cxx_record_map`).
4. `FindMembersSize` should never return error as this would cause early exiting in `CVTypeVisitor::visitFieldListMemberStream` and then cause assertion failure.
5. In some pdbs from C++ runtime libraries have S_LPROC32 followed directly by S_LOCAL and the local variable location is a S_DEFRANGE_FRAMEPOINTER_REL. There is no information about base frame register in this case, ignoring it by returning RegisterId::NONE.
6. Add a TODO when S_DEFRANGE_SUBFIELD_REGISTER describes the variable location of a pointer type. For now, just ignoring it if the variable is pointer.
This patch precedes a future patch to make PointeeLoc for PointerValue possibly empty (for nullptr), by using a pointer instead of a reference type.
ReferenceValue should maintain a non-empty PointeeLoc reference.
Reviewed By: gribozavr2
Differential Revision: https://reviews.llvm.org/D127312
Add a function to make it easier to debug a test failure caused by an
unexpected state.
Currently, tests are using assertEqual which results in a cryptic error
message: "AssertionError: 5 != 10". Even when a test provides a message
to make it clear why a particular state is expected, you still have to
figure out which of the two was the expected state, and what the other
value corresponds to.
We have a function in lldbutil that helps you convert the state number
into a user readable string. This patch adds a wrapper around
assertEqual specifically for comparing states and reporting better error
messages.
The aforementioned error message now looks like this: "AssertionError:
stopped (5) != exited (10)". If the user provided a message, that
continues to get printed as well.
Differential revision: https://reviews.llvm.org/D127355
When some headers are not available because we removed features like
localization or threads, the compiler should not try to include these
headers when building modules. To avoid that from happening, add a
requires-declaration that is never satisfied when the configuration
in use doesn't support a header.
rdar://93777687
Differential Revision: https://reviews.llvm.org/D127127
On macOS Ventura and later, dyld and the main binary will be loaded
again when dyld moves itself into the shared cache. Update the test
accordingly.
Differential revision: https://reviews.llvm.org/D127331
Adds a check to ensure that a process exists before attempting to get
its ABI to prevent lldb from crashing due to trying to read from page zero.
Differential revision: https://reviews.llvm.org/D127016