Summary:
This is required for any users who call exit() after creating
thread-specific data, as TLS destructors are only called when
pthread_exit() or pthread_cancel() is used. This should also
match TLS behavior on Linux.
Getting the base address of the TLS section is straightforward,
as it's stored as a section offset in %gs. The size is trickier
to work out, as there doesn't appear to be any official documentation
or source code referring to it. The size used in this patch was determined
by taking the difference between the base address and the address of the
subsequent memory region returned by vm_region_recurse_64, which was
1024 * sizeof(uptr) on all threads except the main thread, where it was
larger. Since the section must be the same size on all of the threads,
1024 * sizeof(uptr) seemed to be a reasonable size to use, barring
a more programmatic way to get it.
1024 seems like a reasonable number, given that PTHREAD_KEYS_MAX
is 512 on Darwin, so the pthread keys will fit inside the region while
leaving space for other TLS data. A larger size would overflow the
memory region returned by vm_region_recurse_64, and a smaller size
wouldn't leave room for all the pthread keys. In addition, the
stress test added here passes, which means that we are scanning at
least the full set of possible pthread keys, and probably
the full TLS section.
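A minimal sketch of the pieces described above (illustrative; the inline
assembly reads the %gs-based TLS base, and uptr stands in for the
sanitizers' integer type):

  typedef unsigned long uptr;

  // On Darwin the TLS base is stored as a section offset in %gs.
  static uptr TlsBaseAddr() {
    uptr segbase = 0;
  #if defined(__x86_64__)
    asm("movq %%gs:0,%0" : "=r"(segbase));
  #elif defined(__i386__)
    asm("movl %%gs:0,%0" : "=r"(segbase));
  #endif
    return segbase;
  }

  // Empirically determined size: big enough for PTHREAD_KEYS_MAX (512)
  // keys plus other TLS data, small enough to stay inside the memory
  // region reported by vm_region_recurse_64.
  static const uptr kTlsSize = 1024 * sizeof(uptr);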
Reviewers: alekseyshl, kubamracek
Subscribers: krytarowski, llvm-commits
Differential Revision: https://reviews.llvm.org/D33215
llvm-svn: 303887
The existing implementation ran CHECKs to assert that the thread state
was stored inside the TLS. However, the Mac implementation of TSan doesn't
store the thread state in the TLS, so these checks fail once Darwin TLS
support is added to the sanitizers. Only run these checks on platforms
where the thread state is expected to be contained in the TLS.
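A sketch of the resulting guard (the macro and variable names follow the
sanitizer conventions but are illustrative, not the exact committed lines):

  // The Mac TSan runtime allocates the thread state outside the TLS
  // segment, so only assert containment where that layout actually holds.
  #if !SANITIZER_MAC
    CHECK_GE(thr_beg, tls_addr);
    CHECK_LE(thr_end, tls_addr + tls_size);
  #endif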
llvm-svn: 303886
Summary:
Apparently Windows's `UnmapOrDie` doesn't support partial unmapping, which
makes the new region allocation technique non-compliant on Windows.
Reviewers: alekseyshl, dvyukov
Reviewed By: alekseyshl
Subscribers: llvm-commits, kubamracek
Differential Revision: https://reviews.llvm.org/D33554
llvm-svn: 303883
Summary:
Currently, AllocateRegion has a tendency to fragment memory: it allocates
`2*kRegionSize`, and if the memory is aligned, will unmap `kRegionSize` bytes,
thus creating a hole, which can't itself be reused for another region. This
is exacerbated by the fact that if 2 regions get allocated one after another
without any `mmap` in between, the second will be aligned due to mappings
generally being contiguous.
An idea, suggested by @alekseyshl, to prevent such a behavior is to have a
stash of regions: if the `2*kRegionSize` allocation is properly aligned, split
it in two, and stash the second part to be returned next time a region is
requested.
At this point, I thought about a couple of ways to implement this:
- either an `IntrusiveList` of region candidates, storing `next` at the
beginning of the region;
- a small array of region candidates existing in the Primary.
While the second option is more constrained in terms of size, it offers several
advantages:
- security-wise, with the list approach a pointer in a region candidate
could be overflowed into and abused when popping an element; the array
keeps no pointers inside the regions;
- we do not dirty the first page of the region by storing something in it;
- unless several threads request regions simultaneously from different size
classes, the stash rarely goes above 1 entry.
I am not certain about the Windows impact of this change, as `sanitizer_win.cc`
has its own version of `MmapAlignedOrDie`; maybe someone could chime in on this.
MmapAlignedOrDie is effectively unused after this change and could be removed
at a later point. I didn't notice any sizeable performance gain, even though we
are saving a few `mmap`/`munmap` syscalls.
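A rough sketch of the stash approach (simplified and illustrative, assuming
MmapOrDie/UnmapOrDie helpers; the real implementation also guards the stash
with a mutex for the multi-threaded case mentioned above):

  typedef unsigned long uptr;
  static const uptr kRegionSize = 1 << 20;  // example value
  static const uptr kMaxStash = 4;          // example capacity

  extern uptr MmapOrDie(uptr size);         // assumed platform hooks
  extern void UnmapOrDie(uptr addr, uptr size);

  static uptr stash[kMaxStash];
  static uptr stash_size = 0;

  uptr AllocateRegion() {
    if (stash_size)
      return stash[--stash_size];           // reuse a stashed half first
    uptr map = MmapOrDie(2 * kRegionSize);
    uptr region = (map + kRegionSize - 1) & ~(kRegionSize - 1);
    uptr end = map + 2 * kRegionSize;
    if (region == map && stash_size < kMaxStash) {
      // Aligned: keep the surplus half for the next request instead of
      // punching an unusable hole with a partial unmap.
      stash[stash_size++] = region + kRegionSize;
    } else {
      // Unaligned (or stash full): trim the slack around the region.
      if (region != map)
        UnmapOrDie(map, region - map);
      if (end != region + kRegionSize)
        UnmapOrDie(region + kRegionSize, end - region - kRegionSize);
    }
    return region;
  }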
Reviewers: alekseyshl, kcc, dvyukov
Reviewed By: alekseyshl
Subscribers: llvm-commits, kubamracek
Differential Revision: https://reviews.llvm.org/D33454
llvm-svn: 303879
Also comes with a cmake cache for building the runtime bits:
$ cmake <normal cmake flags> \
-DBAREMETAL_ARMV6M_SYSROOT=/path/to/sysroot \
-DBAREMETAL_ARMV7M_SYSROOT=/path/to/sysroot \
-DBAREMETAL_ARMV7EM_SYSROOT=/path/to/sysroot \
-C /path/to/clang/cmake/caches/BaremetalARM.cmake \
/path/to/llvm
https://reviews.llvm.org/D33259
llvm-svn: 303873
MSVC doesn't support C++ operator names (using 'or' instead of ||,
'not' instead of '!', etc.), so this was disabled in MSVC mode in r303798.
This fixes the regression noticed on the buildbots.
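For reference, the operator names in question are standard C++ alternative
tokens:

  // Equivalent spellings in ISO C++; MSVC rejects the keyword forms by
  // default (without /permissive-).
  bool both(bool a, bool b)   { return a and b; }  // a && b
  bool either(bool a, bool b) { return a or b; }   // a || b
  bool negate(bool a)         { return not a; }    // !a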
llvm-svn: 303872
Summary: This patch attempts to make `git-clang-format` both Python 2 and Python 3 compatible. Currently it only works with Python 2.
Reviewers: modocache, compnerd, djasper, jbcoe, srhines, ddunbar
Reviewed By: jbcoe
Subscribers: kimgr, mgorny, llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D30773
llvm-svn: 303871
Summary:
Dmitry, seeking your expertise. I believe the proper way to implement
Lock/Unlock here would be to use acquire/release semantics. Am I missing
something?
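For context, the standard acquire/release pairing being proposed, as a
generic spinlock sketch (illustrative, not the sanitizer's actual code):

  #include <atomic>

  struct SpinLock {
    std::atomic<bool> locked{false};
    void Lock() {
      // Acquire: operations in the critical section cannot be reordered
      // before the successful exchange.
      while (locked.exchange(true, std::memory_order_acquire)) {
      }
    }
    void Unlock() {
      // Release: operations in the critical section cannot be reordered
      // after this store, and become visible to the next acquirer.
      locked.store(false, std::memory_order_release);
    }
  };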
Reviewers: dvyukov
Subscribers: llvm-commits, kubamracek
Differential Revision: https://reviews.llvm.org/D33521
llvm-svn: 303869
Summary:
According to the PDTS it's perfectly legal to have a promise type that defines neither `return_value` nor `return_void`. However, a coroutine that uses such a promise type will almost always have UB, because it can never `co_return`.
This patch changes Clang to diagnose such cases as an error. It also cleans up some of the diagnostic messages relating to member lookup in the promise type.
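For illustration, a hypothetical promise type that falls into this case
(sketched against C++20's <coroutine> for brevity; the patch itself targets
the Coroutines TS):

  #include <coroutine>

  struct task {
    struct promise_type {
      task get_return_object() { return {}; }
      std::suspend_never initial_suspend() { return {}; }
      std::suspend_never final_suspend() noexcept { return {}; }
      void unhandled_exception() {}
      // Neither return_value nor return_void is declared, so a coroutine
      // using this promise type can never co_return; Clang now rejects it.
    };
  };

  task broken() {
    co_await std::suspend_never{};  // error: promise type declares neither
                                    // return_value nor return_void
  }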
Reviewers: GorNishanov, rsmith
Reviewed By: GorNishanov
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D33534
llvm-svn: 303868
Summary: This patch is needed so that Libc++ can actually test whether Clang supports coroutines, instead of just paying lip service with a partial implementation. Otherwise the libc++ test suite will fail against older versions of Clang.
Reviewers: GorNishanov, rsmith
Reviewed By: GorNishanov
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D33536
llvm-svn: 303867
Summary:
In FreeBSD we needed to add generic implementations for `__bswapdi2` and
`__bswapsi2`, since gcc 6.x for mips is emitting calls to these. See:
https://reviews.freebsd.org/D10838 and https://reviews.freebsd.org/rS318601
The actual mips code generated for these generic C versions is pretty
OK, as can be seen in the (FreeBSD) review.
I checked over gcc sources, and it seems that it can emit these calls on
more architectures, so maybe it's best to simply always add them to the
compiler-rt builtins library.
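Generic byte-swap implementations in the spirit of what was added (a
sketch, not the committed compiler-rt source):

  #include <stdint.h>

  // __bswapsi2: reverse the bytes of a 32-bit value.
  uint32_t __bswapsi2(uint32_t u) {
    return ((u & 0xff000000u) >> 24) |
           ((u & 0x00ff0000u) >> 8)  |
           ((u & 0x0000ff00u) << 8)  |
           ((u & 0x000000ffu) << 24);
  }

  // __bswapdi2: reverse the bytes of a 64-bit value by swapping the
  // byte-swapped 32-bit halves.
  uint64_t __bswapdi2(uint64_t u) {
    return ((uint64_t)__bswapsi2((uint32_t)u) << 32) |
           (uint64_t)__bswapsi2((uint32_t)(u >> 32));
  }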
Reviewers: howard.hinnant, compnerd, petarj, emaste
Reviewed By: compnerd, emaste
Subscribers: mgorny, llvm-commits, arichardson
Differential Revision: https://reviews.llvm.org/D33516
llvm-svn: 303866
This test case occasionally fails when run on powerpc64 big-endian.
asan/TestCases/Posix/halt_on_error-torture.cc
The failure causes false problem reports to be sent to developers whose
code had nothing to do with the failures. Reactivate it when the real
problem is fixed.
This could also be related to the same problems as with the tests
ThreadedOneSizeMallocStressTest, ThreadedMallocStressTest, ManyThreadsTest,
and several others that do not run reliably on powerpc.
llvm-svn: 303864
There's probably a lot more like this (see also comments in D33338 about responsibility),
but I suspect we don't usually get a visible manifestation.
Given the recent interest in improving InstCombine efficiency, another potential micro-opt
that could be repeated several times in this function: morph the existing icmp pred/operands
instead of creating a new instruction.
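A sketch of that micro-opt against the LLVM API (a hypothetical helper;
setPredicate and setOperand are the real CmpInst/User methods):

  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // Reuse the existing instruction instead of allocating a replacement:
  // mutate the predicate and operands in place and return it.
  static Value *morphICmp(ICmpInst *Cmp, ICmpInst::Predicate NewPred,
                          Value *LHS, Value *RHS) {
    Cmp->setPredicate(NewPred);
    Cmp->setOperand(0, LHS);
    Cmp->setOperand(1, RHS);
    return Cmp;  // no RAUW or erase needed
  }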
llvm-svn: 303860
AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the LLVM side of the addition of two new intrinsic-based instructions (vpopcntd and vpopcntq).
Differential Revision: https://reviews.llvm.org/D33169
llvm-svn: 303858
AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the Clang side of the addition of six intrinsics for two new machine instructions (vpopcntd and vpopcntq).
It also includes the addition of the new feature set.
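Illustrative use of two of the resulting intrinsics (compile with
-mavx512vpopcntdq; the masked variants are omitted):

  #include <immintrin.h>

  // Per-element population count on 512-bit vectors.
  __m512i popcnt_d(__m512i v) { return _mm512_popcnt_epi32(v); } // vpopcntd
  __m512i popcnt_q(__m512i v) { return _mm512_popcnt_epi64(v); } // vpopcntq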
Differential Revision: https://reviews.llvm.org/D33170
llvm-svn: 303857
This reverts commit r303847 as it introduces a number of regressions.
Investigation has shown that we are parsing the CIE entries in the
debug_frame section incorrectly -- we are parsing them the same way as
eh_frame, but the CIEs in debug_frame have a couple of extra fields
which have not been taken into account.
llvm-svn: 303854
This patch provides an initial prototype for a pass that sinks instructions based on GVN information, similar to GVNHoist. It is not yet ready for committing, but I've uploaded it to gather some initial thoughts.
This pass attempts to sink instructions into successors, reducing static
instruction count and enabling if-conversion.
We use a variant of global value numbering to decide what can be sunk.
Consider:
   [ %a1 = add i32 %b, 1 ]    [ %c1 = add i32 %d, 1 ]
   [ %a2 = xor i32 %a1, 1 ]   [ %c2 = xor i32 %c1, 1 ]
                   \           /
             [ %e = phi i32 %a2, %c2 ]
             [ add i32 %e, 4 ]
GVN would number %a1 and %c1 differently because they compute different
results - the VN of an instruction is a function of its opcode and the
transitive closure of its operands. This is the key property for hoisting
and CSE.
What we want when sinking however is for a numbering that is a function of
the *uses* of an instruction, which allows us to answer the question "if I
replace %a1 with %c1, will it contribute in an equivalent way to all
successive instructions?". The (new) PostValueTable class in GVN provides this
mapping.
This pass has already shown some really impressive improvements, especially for code size, on internal benchmarks, so I have high hopes it can replace all the sinking logic in SimplifyCFG.
Differential revision: https://reviews.llvm.org/D24805
llvm-svn: 303850
This is a resubmit of r303732, which was reverted due to a regression.
The original patch caused a regression in TestLoadUnload, which has only showed
up when running the remote test suite. The problem there was that we interrupted
the target just as it has hit the rendezvous breakpoint in the dlopen call. This
meant that the stop reason was set to "breakpoint" even though the event would
not have been broadcast if we had not stopped the process. I fix this by
checking StopInfo->ShouldNotify() before stopping.
I also add a new test for the handling of conditional breakpoints in
expressions, which I noticed to be broken (pr33164).
Differential Revision: https://reviews.llvm.org/D33283
llvm-svn: 303848
There are some differences between the eh_frame and debug_frame formats
that are not considered by DWARFCallFrameInfo::GetFDEIndex. An FDE entry
in debug_frame contains a CIE_pointer in the same place as cie_id in
eh_frame. As described in the DWARF standard (section 6.4.1), CIE_pointer
is an "offset into the .debug_frame section", so for debug_frame the
variable cie_offset should simply be set to cie_id.
FDE entries whose CIE pointer is zero (the value that actually ends up in
the cie_id variable) also shouldn't be ignored.
I have also added a little change which allows using the debug_frame
section when eh_frame is absent. This case really does occur on some
platforms.
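A sketch of the distinction (an illustrative helper, not the committed
code):

  #include <cstdint>

  // In eh_frame the FDE's cie_id field is a self-relative back-offset,
  // while in debug_frame it is an absolute offset into the section.
  uint64_t CieOffset(bool is_eh_frame, uint64_t current_entry,
                     uint64_t cie_id) {
    if (is_eh_frame)
      return current_entry + 4 - cie_id;  // relative to the field itself
    return cie_id;                        // direct .debug_frame offset
  }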
Patch from tatyana-krasnukha.
https://reviews.llvm.org/D33504
llvm-svn: 303847
instrumenting code.
This is important in the new pass manager. The old pass manager's
inliner has a small DCE routine embedded within it. The new pass manager
relies on the actual GlobalDCE pass for this.
Without this patch, instrumentation profiling with the new PM results in
massive code bloat in the object files because the instrumentation
itself ends up preventing DCE from working to remove the code.
We should probably change the instrumentation (and/or DCE) so that we
can eliminate dead code even if instrumented, but we shouldn't even
spend the time generating instrumentation for that code so this still
seems like a good patch.
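A sketch of the intended ordering in the new pass manager (GlobalDCEPass
and PGOInstrumentationGen are the real passes; the helper and header paths
are illustrative):

  #include "llvm/IR/PassManager.h"
  #include "llvm/Transforms/IPO/GlobalDCE.h"
  #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
  using namespace llvm;

  void addPGOGenPasses(ModulePassManager &MPM) {
    MPM.addPass(GlobalDCEPass());          // drop dead code first...
    MPM.addPass(PGOInstrumentationGen());  // ...so it never gets counters
  }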
Differential Revision: https://reviews.llvm.org/D33535
llvm-svn: 303845
pass.
The original logic only considered direct successors of the hoisted
domtree nodes, but that isn't really enough. If there are other basic
blocks that are completely within the subtree, their successors could
just as easily be impacted by the hoisting.
The more I think about it, the more I think the correct update here is
to hoist every block on the dominance frontier which has an idom in the
chain we hoist across. However, this is subtle enough that I'd
definitely appreciate some more eyes on it.
Sadly, if this is the correct algorithm, it requires computing a (highly
localized) dominance frontier. I've done this in the simplest (i.e., least
code) way I could come up with, but that may be too naive. Suggestions
welcome here; dominance update algorithms are not an area I've studied
much, so I don't have strong opinions.
In good news, with this patch, turning on simple unswitch passes the
LLVM test suite for me with asserts enabled.
Differential Revision: https://reviews.llvm.org/D32740
llvm-svn: 303843
If Op is equal to array_lengthof, the lookup would be out of bounds, but we were only checking for greater than. I suspect nothing ever passes in the equal value, because it's a sentinel to mark the end of the builtin opcodes and not a real opcode.
So really this fix is just so that the code looks right and makes sense.
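The off-by-one pattern in question, as a generic sketch (hypothetical
table and names):

  #include <cstddef>

  static const char *Table[] = {"a", "b", "c"};
  static const size_t TableLen = sizeof(Table) / sizeof(Table[0]);

  const char *lookup(size_t Op) {
    // 'Op > TableLen' would still let Op == TableLen through and read one
    // past the end; the guard must be '>='.
    if (Op >= TableLen)
      return nullptr;
    return Table[Op];
  }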
llvm-svn: 303840