Summary:
The old code made some incorrect assumptions about the order in which
basic blocks are laid out in a function. This could lead to incorrect
early-exits, especially when kills occurred inside of loops.
The new approach is to check whether the point where the conditional
kill occurs dominates all reachable code. If that is the case, there
cannot be any other threads in the wave that are waiting to rejoin
at a later point in the CFG, i.e. if exec=0 at that point, then all
threads really are dead and we can exit the wave.
Make some other minor cleanups to the pass while we're at it.
v2: preserve the dominator tree
Reviewers: arsenm, cdevadas, foad, critson
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74908
Change-Id: Ia0d2b113ac944ad642d1c622b6da1b20aa1aabcc
This revision performs some basic refactoring towards more easily defining Linalg "named" ops. Such named ops form the backbone of operations that are ubiquitous in the ML application domain.
Add atomic types and builtins operating on those atomic types to
`-fdeclare-opencl-builtins`. The _explicit variants are left out of
this commit, as these take enum arguments that are not yet supported
by the `-fdeclare-opencl-builtins` machinery.
Summary:
Though we don't have new changes to the index format, we have changes to
symbol collector, e.g. collect marcos, spelled references. Bump the
version to force background-index to rebuild.
Reviewers: kadircet
Reviewed By: kadircet
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74127
See discussion on PR44792.
This reverts commit 02ce9d8ef5.
It also reverts the follow-up commits
8f46269f0 "[profile] Don't dump counters when forking and don't reset when calling exec** functions"
62c7d8402 "[profile] gcov_mutex must be static"
Generally we ignore interceptors coming from called_from_lib-suppressed libraries.
However, we must not ignore critical interceptors like e.g. pthread_create,
otherwise runtime will lost track of threads.
pthread_detach is one of these interceptors we should not ignore as it affects
thread states and behavior of pthread_join which we don't ignore as well.
Currently we can produce very obscure false positives. For more context see:
https://groups.google.com/forum/#!topic/thread-sanitizer/ecH2P0QUqPs
The added test captures this pattern.
While we are here rename ThreadTid to ThreadConsumeTid to make it clear that
it's not just a "getter", it resets user_id to 0. This lead to confusion recently.
Reviewed in https://reviews.llvm.org/D74828
Summary:
Implements the following intrinsics:
- @llvm.aarch64.sve.bdep.x
- @llvm.aarch64.sve.bext.x
- @llvm.aarch64.sve.bgrp.x
- @llvm.aarch64.sve.tbl2
- @llvm.aarch64.sve.tbx
The SelectTableSVE2 function in this patch is used to select the TBL2
intrinsic & ensures that the vector registers allocated are consecutive.
Reviewers: sdesmalen, andwar, dancgr, cameron.mcinally, efriedma, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74912
Add getUniqueReachingMIDef to RDA which performs a global search for
a machine instruction that produces a unique definition of a given
register at a given point. Also add two helper functions
(getMIOperand) that wrap around this functionality to get the
incoming definition uses of a given instruction. These now replace
the uses of getReachingMIDef in ARMLowOverheadLoops. getReachingMIDef
has been renamed to getReachingLocalMIDef and has been made private
along with getInstFromId.
Differential Revision: https://reviews.llvm.org/D74605
The examples for different options were inconsistently indented in
the HTML display. As they are tied to the options, this change
normalises to indent them the same as the option description body.
"--functions none" and "--functions=none" are not the same. One is the
option "--functions" with its default value of "linkage", followed by an
input address of "none", and the other is "--functions" with the value
"none". This patch fixes the doc to match the actual behaviour by adding
an extra '=' sign in the allowed values description.
Summary:
This packet is necessary to make lldb work with the remote-gdb stub in
user mode qemu when running position-independent binaries. It reports
the relative position (load bias) of the loaded executable wrt. the
addresses in the file itself.
Lldb needs to know this information in order to correctly set the load
address of the executable. Normally, lldb would be able to find this out
on its own by following the breadcrumbs in the process auxiliary vector,
but we can't do this here because qemu does not support the
qXfer:auxv:read packet.
This patch does not implement full scope of the qOffsets packet (it only
supports packets with identical code, data and bss offsets), because it
is not fully clear how should the different offsets be handled and I am
not aware of a producer which would make use of this feature (qemu will
always
<https://github.com/qemu/qemu/blob/master/linux-user/elfload.c#L2436>
return the same value for code and data offsets). In fact, even gdb
ignores the offset for the bss sections, and uses the "data" offset
instead. So, until the we need more of this packet, I think it's best
to stick to the simplest solution possible. This patch simply rejects
replies with non-uniform offsets.
Reviewers: clayborg, jasonmolenda
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D74598
Summary: Fixes https://github.com/clangd/clangd/issues/279. We were removing the color options but not the preceeding -Xclang which causes errors since the -Xclang would now apply to the next option in the list of options. Now, when removing a color option, we check if there was a preceeding -Xclang and remove it as well.
Patch By @DaanDeMeyer !
Reviewers: sammccall, kadircet
Reviewed By: sammccall
Subscribers: ilya-biryukov, usaxena95
Differential Revision: https://reviews.llvm.org/D75019
Summary:
This patch creates the llvm-gsymutil binary that can convert object files to GSYM using the --convert <path> option. It can also dump and lookup addresses within GSYM files that have been saved to disk.
To dump a file:
llvm-gsymutil /path/to/a.gsym
To perform address lookups, like with atos, on GSYM files:
llvm-gsymutil --address 0x1000 --address 0x1100 /path/to/a.gsym
To convert a mach-o or ELF file, including any DWARF debug info contained within the object files:
llvm-gsymutil --convert /path/to/a.out --out-file /path/to/a.out.gsym
Conversion highlights:
- convert DWARF debug info in mach-o or ELF files to GSYM
- convert symbols in symbol table to GSYM and don't convert symbols that overlap with DWARF debug info
- extract UUID from object files
- extract .text (read + execute) section address ranges and filter out any DWARF or symbols that don't fall in those ranges.
- if .text sections are extracted, and if the last gsym::FunctionInfo object has no size, cap the size to the end of the section the function was contained in
Dumping GSYM files will dump all sections of the GSYM file in textual format.
Reviewers: labath, aadsm, serhiy.redko, jankratochvil, xiaobai, wallace, aprantl, JDevlieghere, jdoerfert
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74883
Summary:
Loop unswitch hoists branches on loop-invariant conditions. However, if this
condition is poison/undef and the branch wasn't originally reachable, loop
unswitch introduces UB (since the optimized code will branch on poison/undef and
the original one didn't)).
We fix this problem by freezing the condition to ensure we don't introduce UB.
We will now transform the following:
while (...) {
if (C) { A }
else { B }
}
Into:
C' = freeze(C)
if (C') {
while (...) { A }
} else {
while (...) { B }
}
This patch fixes the root cause of the following bug reports (which use the old loop unswitch, but can be reproduced with minor changes in the code and -enable-nontrivial-unswitch):
- https://llvm.org/bugs/show_bug.cgi?id=27506
- https://llvm.org/bugs/show_bug.cgi?id=31652
Reviewers: reames, majnemer, chenli, sanjoy, hfinkel
Reviewed By: reames
Subscribers: hiraditya, jvesely, nhaehnle, filcab, regehr, trentxintong, nlopes, llvm-commits, mzolotukhin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D29015
Summary:
The patch D62993 : `[PowerPC] Emit scalar min/max instructions with unsafe fp math`
has modified the functionality when `Subtarget.hasP9Vector() && (!HasNoInfs || !HasNoNaNs)`,
this modification is not expected.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D74701
Only MCAsmStreamer (assembly output) needs to keep names of temporary labels created by
MCContext::createTempSymbol().
This change made the rL236642 optimization available for cc2as and
probably some other users.
This eliminates a behavior difference between llvm-mc -filetype=obj and cc1as, which caused
https://reviews.llvm.org/D74006#1890487
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75097
Fixed an issue exposed by D74006.
In clang cc1as, MCContext::UseNamesOnTempLabels is true.
When parsing a .fnstart directive, FnStart gets redefined to a temporary symbol of a different name (.Ltmp0, .Ltmp1, ...).
MCContext::getELFSection() called by SwitchToEHSection() will create a different .ARM.exidx each time.
llvm-mc uses `Ctx.setUseNamesOnTempLabels(false);` and FnStart is unnamed.
MCContext::getELFSection() called by SwitchToEHSection() will reuse the same .ARM.exidx .
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75095
Instead, use `using namespace lld(::coff)`, and fully qualify the names
of free functions where they are defined in cpp files.
This effectively reverts d79c3be618 to follow the new style guide added
in 236fcbc21a.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D74882
This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order.
This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've update all in-tree targets to connect their chain through their lowering code.
Differential Revision: https://reviews.llvm.org/D75132
std::shared_ptr::allocate_shared isn't in the standard. This commit removes it from libc++. It updates std::allocate_shared to use __create_with_cntrl_block.
Differential Revision: https://reviews.llvm.org/D66178
Unlike what I claimed in my previous commit. The caching is
actually not NFC on PHIs.
When we put a big enough max depth, we end up simulating loops.
The cache is effectively cutting the simulation short and we
get less information as a result.
E.g.,
```
v0 = G_CONSTANT i8 0xC0
jump
v1 = G_PHI i8 v0, v2
v2 = G_LSHR i8 v1, 1
```
Let say we want the known bits of v1.
- With cache:
Set v1 cache to we know nothing
v1 is v0 & v2
v0 gives us 0xC0
v2 gives us known bits of v1 >> 1
v1 is in the cache
=> v1 is 0, thus v2 is 0x80
Finally v1 is v0 & v2 => 0x80
- Without cache and enough depth to do two iteration of the loop:
v1 is v0 & v2
v0 gives us 0xC0
v2 gives us known bits of v1 >> 1
v1 is v0 & v2
v0 is 0xC0
v2 is v1 >> 1
Reach the max depth for v1...
unwinding
v1 is know nothing
v2 is 0x80
v0 is 0xC0
v1 is 0x80
v2 is 0xC0
v0 is 0xC0
v1 is 0xC0
Thus now v1 is 0xC0 instead of 0x80.
I've added a unittest demonstrating that.
NFC
Summary: bfloat16 is stored internally as a double, so we can't direct use Type::getIntOrFloatBitWidth.
Differential Revision: https://reviews.llvm.org/D75133
Updated the patch to only fetch $pc on a Return Address-using
target only if we're in a trap frame *and* if there is a saved
location for $pc in the trap frame's unwind rules. If not,
we fall back to fetching the Return Address register (eg $lr).
Original commit msg:
Unwind past an interrupt handler correctly on arm or at pc==0
Fix RegisterContextLLDB::InitializeNonZerothFrame so that it
will fetch a FullUnwindPlan instead of falling back to the
architectural default unwind plan -- GetFullUnwindPlan knows
how to spot a jmp 0x0 that results in a fault, which may be
the case when we see a trap handler on the stack.
Fix RegisterContextLLDB::SavedLocationForRegister so that when
the pc value is requested from a trap handler frame, where we
have a complete register context available to us, don't provide
the Return Address register (lr) instead of the pc. We have
an actual pc value here, and it's pointing to the instruction
that faulted.
Differential revision: https://reviews.llvm.org/D75007
<rdar://problem/59416588>
It turns out that <semaphore.h> is not well-behaved, as it transitively
includes <sys/param.h>, and that one defines several non-reserved macros
that clash with some downstream projects in modular builds. For the time
being, using <sys/semaphore.h> instead gives us the declarations we need
without the macros.
rdar://59744472
This reverts commit eee22ec3c3.
This is not the correct fix, the root cause seems to be a bug in the
stage1 host clang compiler. See https://reviews.llvm.org/D75091 for more
discussion.