The problem scenario is the following:
A dynamic library, libfoo.so, depends on libomp.so (it creates a parallel region
and calls some omp functions). An application has a loop where it dynamically
loads libfoo.so, calls a function from it, and unloads libfoo.so. After several
loop iterations the application crashes with a message about a lack of resources:
OMP: Error #34: System unable to allocate necessary resources for OMP thread:
The problem is that pthread_kill() was not followed by pthread_join() when the
thread had already terminated, so the thread's resources were never released.
This patch fixes this problem for both worker and monitor threads.
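The following is a minimal sketch of the pattern described above, not the runtime's actual code. The names worker and spawn_and_reap are hypothetical. It probes each thread with pthread_kill(tid, 0) and then joins it unconditionally, assuming a joinable POSIX thread that exits on its own.

```cpp
#include <cassert>
#include <csignal>
#include <pthread.h>

// Hypothetical worker: returns immediately, so the thread terminates by itself.
static void *worker(void *) { return nullptr; }

// Creates and reaps `count` threads one after another and returns how many
// were joined successfully.
int spawn_and_reap(int count) {
  int joined = 0;
  for (int i = 0; i < count; ++i) {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, worker, nullptr) != 0)
      break;  // out of resources or another failure
    // Signal 0 delivers nothing; it only probes the thread. Whatever the
    // result is, a joinable thread still has to be joined to be reclaimed.
    (void)pthread_kill(tid, 0);
    if (pthread_join(tid, nullptr) == 0)
      ++joined;
  }
  return joined;
}
```

Without the pthread_join() call, each iteration would leave an unreclaimed thread behind, which matches the resource exhaustion described in the scenario.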
Differential Revision: http://reviews.llvm.org/D21200
llvm-svn: 272567
These changes remove the calls to hwloc_topology_ignore_type, a function which
doesn't exist in the hwloc 2.0 API. In the existing code, the topology extracted
from hwloc has the cache levels stripped out, and the code then assumes the final
stripped topology follows the typical three-level topology:
packages -> cores -> HW threads.
But the code is doing unclean manipulations to determine at what level those
resources are located and also assumes too much about what hwloc is detecting
(there could be intermediate levels in between socket and core, for instance).
This new way of extracting the topology doesn't strip out any hardware objects
that hwloc detects. It does not assume the three level topology, and instead
searches for the relevant three levels within the topology for each bit of
information using hwloc interface functions. That is, the three-level topology
subset that our affinity code is interested in is extracted from the hwloc
topology tree directly.
For example, the new __kmp_hwloc_get_nobjs_under_obj function gives the user the
number of cores under a socket reliably without worrying if there are unexpected
objects between the socket object and core object in the hwloc topology
structure. Also, now that all topology information is kept, it becomes
possible to use the caches/NUMA nodes to determine more sophisticated
affinity settings in the future.
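As an illustration of counting objects under an ancestor without assuming fixed levels, here is a small sketch on a hypothetical tree type. Kind, Node, add_child, count_under and demo_cores_in_package are invented for this example; this is neither the hwloc API nor the implementation of __kmp_hwloc_get_nobjs_under_obj.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for topology object types.
enum class Kind { Machine, Package, NumaNode, Cache, Core, Thread };

struct Node {
  Kind kind;
  std::vector<std::unique_ptr<Node>> children;
};

// Appends a child of the given kind and returns a reference to it.
Node &add_child(Node &parent, Kind kind) {
  parent.children.push_back(std::make_unique<Node>());
  parent.children.back()->kind = kind;
  return *parent.children.back();
}

// Counts descendants of `root` whose kind is `wanted`, descending through
// any number of intermediate levels. It does not descend below a match.
int count_under(const Node &root, Kind wanted) {
  int n = 0;
  for (const auto &child : root.children) {
    if (child->kind == wanted)
      ++n;
    else
      n += count_under(*child, wanted);
  }
  return n;
}

// Builds package -> NUMA node -> 2 caches -> 2 cores each and counts cores.
int demo_cores_in_package() {
  Node package{Kind::Package, {}};
  Node &numa = add_child(package, Kind::NumaNode);
  for (int c = 0; c < 2; ++c) {
    Node &cache = add_child(numa, Kind::Cache);
    for (int k = 0; k < 2; ++k)
      add_child(add_child(cache, Kind::Core), Kind::Thread);
  }
  return count_under(package, Kind::Core);
}
```

The count is the same whether or not NUMA or cache objects sit between the package and its cores.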
There is also some cleanup code added for the destruction of the
__kmp_hwloc_topology object.
Differential Revision: http://reviews.llvm.org/D21195
llvm-svn: 272565
There is no need to use a target-specific intrinsic to implement
_bit_scan_forward or _bit_scan_reverse; reimplementing them using
generic intrinsics makes it more likely that the middle end will
understand what's going on.
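A rough sketch of what such a generic formulation can look like, using the GCC/Clang builtins __builtin_ctz and __builtin_clz. The function names are hypothetical and the actual header may differ. It assumes a 32-bit unsigned int and a nonzero argument, since both builtins are undefined for zero.

```cpp
#include <cassert>

// Index of the lowest set bit (hypothetical stand-in for _bit_scan_forward).
inline int bit_scan_forward_sketch(unsigned x) {
  return __builtin_ctz(x);  // count trailing zeros
}

// Index of the highest set bit (hypothetical stand-in for _bit_scan_reverse).
inline int bit_scan_reverse_sketch(unsigned x) {
  return 31 - __builtin_clz(x);  // 31 minus the number of leading zeros
}
```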
llvm-svn: 272564
Differential Revision: http://reviews.llvm.org/D19843
Corresponding LLVM change: http://reviews.llvm.org/D19842
Re-commit after addressing issues with generating too many warnings for Windows and asan test failures.
Patch by Eric Niebler
llvm-svn: 272562
The bitmask complement operation doesn't consider the max proc id, which means
something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a
Linux system even though there aren't 600 processors on said system. This
change ANDs the complement bitmask with the fullmask so that it will only
contain valid processors.
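The idea can be sketched with std::bitset. The names Mask, full_mask and complement_within are hypothetical, and the 1024-bit width is only chosen to mirror the example above; the runtime uses its own mask type.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaskBits = 1024;  // assumed mask width for illustration
using Mask = std::bitset<kMaskBits>;

// Mask with one bit set for each processor id in [0, num_procs).
Mask full_mask(std::size_t num_procs) {
  Mask m;
  for (std::size_t i = 0; i < num_procs; ++i)
    m.set(i);
  return m;
}

// Complement restricted to the processors that actually exist.
Mask complement_within(const Mask &m, const Mask &full) {
  return ~m & full;
}
```

With 8 processors, the plain complement of {0} has 1023 bits set, while the restricted complement contains only ids 1 through 7.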
Differential Revision: http://reviews.llvm.org/D21245
llvm-svn: 272561
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1
This makes one particular compute shader 8x faster.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D21136
llvm-svn: 272556
Differential Revision: http://reviews.llvm.org/D19842
Corresponding clang patch: http://reviews.llvm.org/D19843
Re-commit after addressing issues with generating too many warnings for Windows and asan test failures.
Patch by Eric Niebler
llvm-svn: 272555
The condition register of the cndmask_b64 expansion can't be killed by
the first expanded instruction, and the implicit def of the super-register is needed.
llvm-svn: 272554
Summary:
Adds a version of sigaction that uses a raw system call, to avoid circular
dependencies and support calling sigaction prior to setting up
interceptors. The new sigaction relies on an assembly sigreturn routine
for its restorer, which is Linux x86_64-only for now.
Uses the new sigaction to initialize the working set tool's shadow fault
handler before the libc interceptors are set up. This is required to
support instrumentation invoked during interceptor setup, which happens
with an instrumented tcmalloc or other allocator compiled with esan.
Adds a test that emulates an instrumented allocator.
Reviewers: aizatsky
Subscribers: vitalybuka, tberghammer, zhaoqin, danalbert, kcc, srhines, eugenis, llvm-commits, kubabrecka
Differential Revision: http://reviews.llvm.org/D21083
llvm-svn: 272553
This patch implements PR#22821.
Taking the address of a packed member is dangerous since the reduced
alignment of the pointee is lost. This can lead to memory alignment
faults in some architectures if the pointer value is dereferenced.
This change adds a new warning to clang emitted when taking the address
of a packed member. A packed member is either a field/data member
declared as __attribute__((packed)) or belonging to a struct/class
declared as such. The associated flag is -Waddress-of-packed-member.
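A small sketch of the situation the warning targets, using a hypothetical Header struct. The commented-out line shows the kind of expression that would be diagnosed; reading the member by value is fine because the compiler knows about the reduced alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical packed struct: `length` sits at offset 1, so it is not
// 4-byte aligned.
struct __attribute__((packed)) Header {
  char tag;
  std::uint32_t length;
};

static_assert(offsetof(Header, length) == 1, "length is misaligned by design");

// Direct member access: the compiler emits an access that is safe for the
// reduced alignment.
std::uint32_t read_length(const Header &h) {
  return h.length;
}

// The pattern the warning is about (left commented out on purpose):
//   std::uint32_t *p = &h.length;  // the pointer type forgets the packing
//   *p = 0;                        // may fault on strict-alignment targets
```

A common way to avoid the problem is to copy the member into a local variable and take the address of the local instead.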
Differential Revision: http://reviews.llvm.org/D20561
llvm-svn: 272552
This enables use of the 'R' and 'T' memory constraints for inline ASM
operands on SystemZ, which allow an index register as well as an
immediate displacement. This patch includes corresponding documentation
and test case updates.
As with the last patch of this kind, I moved the 'm' constraint to the
most general case, which is now 'T' (base + 20-bit signed displacement +
index register).
Author: colpell
Differential Revision: http://reviews.llvm.org/D21239
llvm-svn: 272547
The MRRC/MRRC2 instructions write to two registers. The
intrinsic definition returns a single uint64_t to
represent the write. This is a compact way of
representing a write to two 32-bit registers;
the alternative might have been to return a
struct of two uint32_t's, but this isn't as nice.
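To illustrate the trade-off, the sketch below packs two 32-bit values into one uint64_t and splits them again. RegPair, pack and unpack are hypothetical helpers, and placing the first register in the low half is an assumption made for this example, not something stated in the commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct form of the result (the alternative that was not chosen).
struct RegPair {
  std::uint32_t first;
  std::uint32_t second;
};

// Assumption: first register in bits 0-31, second register in bits 32-63.
constexpr std::uint64_t pack(RegPair r) {
  return (static_cast<std::uint64_t>(r.second) << 32) | r.first;
}

constexpr RegPair unpack(std::uint64_t v) {
  return RegPair{static_cast<std::uint32_t>(v),
                 static_cast<std::uint32_t>(v >> 32)};
}
```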
Differential Revision:
llvm-svn: 272544
We can now use __builtin_nontemporal_store instead of target-specific builtins for naturally aligned nontemporal stores, which avoids the need for handling in CGBuiltin.cpp.
The scalar integer nontemporal (unaligned) store builtins will have to wait, as __builtin_nontemporal_store currently assumes natural alignment and doesn't accept the 'packed struct' trick that we use for normal unaligned load/stores.
The nontemporal loads require further backend support before we can safely convert them to __builtin_nontemporal_load.
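A minimal sketch of calling the generic builtin on a naturally aligned object. store_streaming is a hypothetical wrapper, and the plain-store fallback exists only so the example also builds with compilers that lack the builtin; it is not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

#ifndef __has_builtin
#define __has_builtin(x) 0  // compilers without __has_builtin take the fallback
#endif

// Stores `value` to `dst`, which must be naturally aligned for its type.
void store_streaming(std::uint64_t *dst, std::uint64_t value) {
#if __has_builtin(__builtin_nontemporal_store)
  __builtin_nontemporal_store(value, dst);  // hint: bypass the cache if possible
#else
  *dst = value;  // ordinary store as a fallback
#endif
}
```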
Differential Revision: http://reviews.llvm.org/D21272
llvm-svn: 272540
Summary:
The "-Werror=enum-compare" shows that the statement is using two different enums:
enumeral mismatch in conditional expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType'
A follow-up fix on D21235.
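The class of problem can be sketched with two hypothetical unscoped enums standing in for llvm::ISD::NodeType and llvm::X86ISD::NodeType. Casting both arms of the conditional to a common integer type is one way to avoid the mismatch; the actual fix in the patch may be different.

```cpp
#include <cassert>

// Hypothetical stand-ins for the generic and target-specific opcode enums.
namespace generic {
enum NodeType { ADD = 1, SUB = 2, BUILTIN_OP_END = 100 };
}
namespace target {
enum NodeType { FIRST_NUMBER = generic::BUILTIN_OP_END, ADDX, SUBX };
}

// Writing `use_target ? target::ADDX : generic::ADD` mixes two enum types in
// one conditional expression, which -Werror=enum-compare rejects. Casting
// both arms to the same integer type sidesteps that.
unsigned select_opcode(bool use_target) {
  return use_target ? static_cast<unsigned>(target::ADDX)
                    : static_cast<unsigned>(generic::ADD);
}
```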
Reviewers: klimek
Subscribers: spatel, cfe-commits
Differential Revision: http://reviews.llvm.org/D21278
llvm-svn: 272539
Before (when aligning & to the right):
SomeType MemberFunction(const Deleted &) const&;
After:
SomeType MemberFunction(const Deleted &) const &;
This also applies to variable declarations, e.g.:
int const * a;
However, this form is very uncommon (most people would write
"const int* a" instead) and contracting to "const*" might actually send
the wrong signal of what the const binds to.
llvm-svn: 272537
Create a special visualizer for OpaquePtr<QualType> because the
standard visualizer doesn't work with OpaquePtr<QualType>
due to QualType being heavily dependent on traits to be pointer-like.
Also created an identical visualizer for UnionOpaquePtr.
llvm-svn: 272531
This is a speculative attempt to fix the compiler error: "list initialization inside
member initializer list or non-static data member initializer is not implemented" with
r272529.
llvm-svn: 272530
This commit adds a static analysis checker to verify the correct usage of the MPI API in C
and C++. This version updates the reverted r271981 to fix a memory corruption found by the
ASan bots.
Three path-sensitive checks are included:
- Double nonblocking: Double request usage by nonblocking calls without intermediate wait.
- Missing wait: Nonblocking call without matching wait.
- Unmatched wait: Waiting for a request that was never used by a nonblocking call.
Examples of how to use the checker can be found at https://github.com/0ax1/MPI-Checker
A patch by Alexander Droste!
Reviewers: zaks.anna, dcoughlin
Differential Revision: http://reviews.llvm.org/D21081
llvm-svn: 272529
When visualizing small vectors in VS2015, show the first few elements in the DisplayString instead of the size. For example, a SmallVector of DeclAccessPair will visualize like
{public typename ...Ts, public typename U}
The visualization in VS2013 remains the same because we continue to include the old visualizer with a lower-than-default priority of MediumLow, and the same SmallVector would continue to be visualized as
{size = 2}
llvm-svn: 272525
Summary:
Do not insert whitespace preceding the "!" postfix operator. This is an
incomplete fix, but should cover common usage.
Reviewers: djasper
Subscribers: cfe-commits, klimek
Differential Revision: http://reviews.llvm.org/D21204
llvm-svn: 272524
Does a good job with type and non-type template arguments
and lays the groundwork for template template arguments to
visualize well once there is a TemplateName visualizer.
Also fixed what looks like an incorrect comment in the
header for ParsedTemplate.h.
llvm-svn: 272521
td_type is std::pair<std::string, std::string>, but the map returns
elements of std::pair<const std::string, std::string>. In well-designed
languages like C++ that yields an implicit copy perfectly hidden by
the const reference's lifetime extension. Just use auto; the typedef obscured
the real type anyway.
Found with a little help from clang-tidy's
performance-implicit-cast-in-loop.
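The hidden copy can be made visible by comparing addresses. The sketch below uses hypothetical function names and a std::map<std::string, std::string>; the container in the patched code is assumed to behave the same way.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using StringMap = std::map<std::string, std::string>;

// True if a loop variable of the mismatched pair type refers to the map's own
// elements. It does not: each iteration binds to a converted temporary.
bool mismatched_type_refers_to_map(const StringMap &m) {
  using td_type = std::pair<std::string, std::string>;  // key is not const
  for (const td_type &entry : m)
    if (&entry.second != &m.at(entry.first))
      return false;  // different address, so `entry` is a copy
  return true;
}

// Same check with auto, which deduces std::pair<const std::string, std::string>.
bool auto_refers_to_map(const StringMap &m) {
  for (const auto &entry : m)
    if (&entry.second != &m.at(entry.first))
      return false;
  return true;
}
```

For a non-empty map, the first function reports a copy and the second one does not.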
llvm-svn: 272519