Currently, OR1K architecture put the program counter at offset 0x128 of
the current `or1k_thread_state_t`. However, the PC is restored after
updating the thread pointer in `r3`, which causes the PC to be fetched
incorrectly.
This patch swaps the order of restoration of `r9` and `r3`, such that
the PC is restored to `r9` using the current thread state.
Patch by Oi Chee Cheung!
Reviewed By: whitequark, compnerd
Differential Revision: https://reviews.llvm.org/D107042
In some cases, when the execution path of the diagnostic
goes back and forth, arrows can overlap and create a mess.
Dimming arrows that are not relevant at the moment, solves this issue.
They are still visible, but don't draw too much attention.
Differential Revision: https://reviews.llvm.org/D92928
This commit adds a very first version of this feature.
It is off by default and has to be turned on by checking the
corresponding box. For this reason, HTML reports still keep
control notes (aka grey bubbles).
Further on, we plan on attaching arrows to events and having all arrows
not related to a currently selected event barely visible. This will
help with reports where control flow goes back and forth (eg in loops).
Right now, it can get pretty crammed with all the arrows.
Differential Revision: https://reviews.llvm.org/D92639
With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.
Reviewed By: JonChesterfield, jdoerfert, scchan
Differential Revision: https://reviews.llvm.org/D104904
For some reason, Microsoft declares _m_prefetch to take a const void*,
but _m_prefetchw to take a /volatile/ const void*.
Do the same for compatibility.
Differential revision: https://reviews.llvm.org/D106790
Definition of `__cpp_threadsafe_static_init` macro is controlled by
language option Opts.ThreadsafeStatics. This patch sets language
option to false by default in OpenCL mode, resulting in macro
`__cpp_threadsafe_static_init` being undefined. Default value can be
overridden using command line option -fthreadsafe-statics.
Change is supposed to address portability because not all OpenCL
vendors support thread safe implementation of static initialization.
Fixes llvm.org/PR48012
Differential Revision: https://reviews.llvm.org/D107163
This patch fixes a bug in the existing implementation of detectAsFloorDiv,
where floordivs with numerator with non-zero constant term and floordivs with
numerator only consisting of a constant term were not being detected.
Reviewed By: vinayaka-polymage
Differential Revision: https://reviews.llvm.org/D107214
The next step is to be able to benchmark several implementations at once and compare which one performs best on a particular machine.
Differential Revision: https://reviews.llvm.org/D107265
Add new fixed-size vector clock for the new tsan runtime.
For now it's unused.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107167
Currently we save the creation stack for sync objects always.
But it's not needed to some sync objects, most notably atomics.
We simply don't use atomic creation stack anywhere.
Allow callers to control saving of the creation stack
and don't save it for atomics.
Depends on D107257.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107258
MetaMap::GetSync shows up in profiles,
so add LIKELY/UNLIKELY annotations.
Depends on D107256.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107257
Don't lock the sync object inside of MetaMap methods.
This has several advantages:
- the new interface does not confuse thread-safety analysis
so we can remove a bunch of NO_THREAD_SAFETY_ANALYSIS attributes
- this allows use of scoped lock objects
- this allows more flexibility, e.g. locking some other mutex
between searching and locking the sync object
Also prefix the methods with GetSync to be consistent with GetBlock method.
Also make interface wrappers inlinable, otherwise we either end up with
2 copies of the method, or with an additional call.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107256
ArtifactValueFinder keeps trying to combine g_unmerge_values in some cases.
Fix is to skip combine attempt for dead defs.
Differential Revision: https://reviews.llvm.org/D106879
As discussed on D107228, widening a subvector by inserting the whole subvector into the bottom a larger undef vector should always be cheap enough that we can treat it as zero cost.
NOTE: If this proves to cause issues we have the option of introducing a "SK_WidenSubvector" shuffle kind enum that targets could override the zero cost, but that doesn't seem necessary atm.
Differential Revision: https://reviews.llvm.org/D107228
This patch adds an initial ShuffleVectorInst::isInsertSubvectorMask helper to recognize 2-op shuffles where the lowest elements of one of the sources are being inserted into the "in-place" other operand, this includes "concat_vectors" patterns as can be seen in the Arm shuffle cost changes. This also helped fix a x86 issue with irregular/length-changing SK_InsertSubvector costs - I'm hoping this will help with D107188
This doesn't currently attempt to work with 1-op shuffles that could either be a "widening" shuffle or a self-insertion.
The self-insertion case is tricky, but we currently always match this with the existing SK_PermuteSingleSrc logic.
The widening case will be addressed in a follow up patch that treats the cost as 0.
Masks with a high number of undef elts will still struggle to match optimal subvector widths - its currently bounded by minimum-width possible insertion, whilst some cases would benefit from wider (pow2?) subvectors.
Differential Revision: https://reviews.llvm.org/D107228
When the limit of the inner loop is a known integer, the InstCombine
pass now causes the transformation e.g. imcp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.
Differential Revision: https://reviews.llvm.org/D105802
If the block target for a WLSTP instruction is known to be out of range,
and cannot be fixed by the ARMBlockPlacementPass, we can relax it to a
DLSTP (and cmp/branch) to still allow the creation of tail predicated
loops. That is what this patch does, adding extra revert code to the
fallback path of ARMBlockPlacementPass.
Due to the code produced when reverting, this creates a DLSTP between a
Bcc and a Br. As a DLS isn't necessarily a terminator we need to split
the block to move the DLS/Br into.
Differential Revision: https://reviews.llvm.org/D104709
The encoding used for opening files depends on the OS and might be different
from UTF-8 (e.g. on Windows it can be CP-1252). The documentation files use
UTF-8 and might be incompatible with other encodings. For example, right now
`clang-tools-extra/docs/clang-tidy/checks/abseil-no-internal-dependencies.rst`
has non-ASCII quotes and running `add_new_check.py` fails on Windows, because
it tries to read the file with incompatible encoding.
Use `io.open` for compatibility with both Python 2 and Python 3.
Reviewed By: kbobyrev
Differential Revision: https://reviews.llvm.org/D106792
ProcessPendingSignals is called in all interceptors
and user atomic operations. Make the fast-path check
(no pending signals) inlinable.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D107217
We might want to use info from GC strategy in middle end analysis.
The motivation for this is provided in D99135: we may want to ask
a GC if it's going to work with a given pointer (currently this code
makes naive check by the method name).
Differetial Revision: https://reviews.llvm.org/D100559
Reviewed By: reames
Happens when DestContext is LinkageSpecDecl and hense CurContext happens to be
both not TagDecl and NamespaceDecl.
Minimal reproducer: trigger define outline in
```
namespace ns {
extern "C" {
typedef int foo;
}
foo Fo^o(int id) { return id; }
}
```
Reviewed By: kadircet
Differential Revision: https://reviews.llvm.org/D107047
When compiling on ZLinux, we got this error:
/llvm-project/llvm/examples/OrcV2Examples/LLJITWithRemoteDebugging/ \
RemoteJITUtils.h:80:65: required from here...
/usr/include/c++/7/bits/unique_ptr.h:76:22: error: invalid application of
'sizeof' to incomplete type 'llvm::orc::RemoteExecutorProcessControl'
static_assert(sizeof(_Tp)>0,
This patch just removes nullptr from the initialization of
std::unique_ptr<RemoteExecutorProcessControl> to avoid the issue.
Patch by Tung D. Le (tung@jp.ibm.com). Thanks Tung!
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D107247
Application of default mapping to BVH intrinsics was missing.
Copy parts of SelectionDAG test to GlobalISel test as these would
have indicated this error.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D107211
Test dependent on pexpect fail randomly with timeouts on Arm/AArch64 Linux
buildbots. I am setting pexpect timeout from 30 to 60.
I will revert this back if this doesnt improve random failures.
This reverts commit fd18f0e84c.
I reverted this change to see its effect on failing GUI tests on LLDB
Arm/AArch64 Linux buildbots. I could not find any evidence against this
particular change so reverting it back.
Differential Revision: https://reviews.llvm.org/D100243