This follows the approach in r263208 (for GVN) pretty closely:
- move the bulk of the body of the function to the new PM class.
- expose a runImpl method on the new-PM class that takes the IRUnitT and
pointers/references to any analyses and use that to implement the
old-PM class.
- use a private namespace in the header for stuff that used to be file
scope
llvm-svn: 272597
Save machine function pointer so that
the reference does not need to be passed around.
This also gives other methods access to machine
function for information such as entry count etc.
llvm-svn: 272594
This is a bit gnarly since LVI is maintaining its own cache.
I think this port could be somewhat cleaner, but I'd rather not spend
too much time on it while we still have the old pass hanging around and
limiting how much we can clean things up.
Once the old pass is gone it will be easier (less time spent) to clean
it up anyway.
This is the last dependency needed for porting JumpThreading which I'll
do in a follow-up commit (there's no printer pass for LVI or anything to
test it, so porting a pass that depends on it seems best).
I've been mostly following:
r269370 / D18834 which ported Dependence Analysis
r268601 / D19839 which ported BPI
llvm-svn: 272593
Summary:
Adds a version of sigaction that uses a raw system call, to avoid circular
dependencies and support calling sigaction prior to setting up
interceptors. The new sigaction relies on an assembly sigreturn routine
for its restorer, which is Linux x86_64-only for now.
Uses the new sigaction to initialize the working set tool's shadow fault
handler prior to libc interceptor being set up. This is required to
support instrumentation invoked during interceptor setup, which happens
with an instrumented tcmalloc or other allocator compiled with esan.
Adds a test that emulates an instrumented allocator.
Reviewers: aizatsky
Subscribers: vitalybuka, tberghammer, zhaoqin, danalbert, kcc, srhines, eugenis, llvm-commits, kubabrecka
Differential Revision: http://reviews.llvm.org/D21083
llvm-svn: 272591
Fix for bugzilla https://llvm.org/bugs/show_bug.cgi?id=26602. Removed functions
body consisted of the only KMP_ASSERT(0) statement. Thus possible runtime crash
converted to compile-time error, which looks preferable (faster possible error
detection).
TODO: consider C++11 static assert as an alternative, that could
make the diagnostics better.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21304
llvm-svn: 272590
Remove static specifier from var fullMask and remove kmp_get_fullMask() routine.
When iterating through procs in a mask, always check if proc is in fullMask
(this check was missing in a few places).
Patch by Brian Bliss.
Differential Revision: http://reviews.llvm.org/D21300
llvm-svn: 272589
ActOnBinOp corrects delayed typos when in C mode; don't correct them in that
case. Fixes PR26700.
Differential Revision: http://reviews.llvm.org/D20490
llvm-svn: 272587
This is third patch to clean up the code.
Included in this patch:
1. Further unclutter trace/chain formation main routine;
2. Isolate the logic to compute global cost/conflict detection
into its own method;
3. Heavily document the selection algorithm;
4. Added helper hook to allow PGO specific logic to be
added in the future.
llvm-svn: 272582
Summary:
AAResults::callCapturesBefore would previously ignore operand
bundles. It was possible for a later instruction to miss its memory
dependency on a call site that would only access the pointer through a
bundle.
Patch by Oscar Blumberg!
Reviewers: sanjoy
Differential Revision: http://reviews.llvm.org/D21286
llvm-svn: 272580
Summary:
we should only deduplicate symbols after matching symbols with the
undefined identifier. This patch tries to fix the situation where matching
symbols are deleted when it has the same header as a non-matching symbol, which
can prevent finding the correct header even if there is a matched symbol.
Reviewers: bkramer
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D21290
llvm-svn: 272576
Summary:
This patch introduces the concept of offloading tool chain and offloading kind. Each tool chain may have associated an offloading kind that marks it as used in a given programming model that requires offloading.
It also adds the logic to iterate on the tool chains based on the kind. Currently, only CUDA is supported, but in general a programming model (an offloading kind) may have associated multiple tool chains that require supporting offloading.
This patch does not add tests - its goal is to keep the existing functionality.
This patch is the first of a series of three that attempts to make the current support of CUDA more generic and easier to extend to other programming models, namely OpenMP. It tries to capture the suggestions/improvements/concerns on the initial proposal in http://lists.llvm.org/pipermail/cfe-dev/2016-February/047547.html. It only tackles the more consensual part of the proposal, i.e.does not address the problem of intermediate files bundling yet.
Reviewers: ABataev, jlebar, echristo, hfinkel, tra
Subscribers: guansong, Hahnfeld, andreybokhanko, tcramer, mkuron, cfe-commits, arpith-jacob, carlo.bertolli, caomhin
Differential Revision: http://reviews.llvm.org/D18170
llvm-svn: 272571
If either current_task or new_task is untied then skip task scheduling
constraint checks, because untied tasks are not affected by the task
scheduling constraints.
Differential Revision: http://reviews.llvm.org/D21196
llvm-svn: 272570
The problem scenario is the following:
A dynamic library, libfoo.so, depends on libomp.so (it creates parallel region
and calls some omp functions). An application has a loop where it dynamically
loads libfoo.so, calls the function from it, unloads libfoo.so. After several
loop iterations application crashes with the message about lack of resources
OMP: Error #34: System unable to allocate necessary resources for OMP thread:
The problem is that pthread_kill() was not followed by pthread_join() in case
of terminated thread. This patch fixes this problem for both worker and monitor
threads.
Differential Revision: http://reviews.llvm.org/D21200
llvm-svn: 272567
These changes remove the hwloc_topology_ignore_type function which doesn't exist
in the hwloc 2.0 API. In the existing code, the topology extracted from hwloc
has the cache levels stripped out and then assumes the final stripped topology
follows the typical three-level topology: packages -> cores -> HW threads.
But the code is doing unclean manipulations to determine at what level those
resources are located and also assumes too much about what hwloc is detecting
(there could be intermediate levels in between socket and core for instance).
This new way of extracting the topology doesn't strip out any hardware objects
that hwloc detects. It does not assume the three level topology, and instead
searches for the relevant three levels within the topology for each bit of
information using hwloc interface functions. i.e., the three level topology
subset that our affinity code is interested in is extracted from the hwloc
topology tree directly.
For example, the new __kmp_hwloc_get_nobjs_under_obj function gives the user the
number of cores under a socket reliably without worrying if there are unexpected
objects between the socket object and core object in the hwloc topology
structure. Also, now that all topology information is kept, there are also
possibilities of using the caches/numa nodes to determine more sophisticated
affinity settings in the future.
There is also some cleanup code added for the destruction of the
__kmp_hwloc_topology object.
Differential Revision: http://reviews.llvm.org/D21195
llvm-svn: 272565
There is no need to use a target-specific intrinsic to implement
_bit_scan_forward or _bit_scan_reverse, reimplementing them using
generic intrinsics makes it more likely that the middle end will
understand what's going on.
llvm-svn: 272564
Differential Revision: http://reviews.llvm.org/D19843
Corresponding LLVM change: http://reviews.llvm.org/D19842
Re-commit after addressing issues with of generating too many warnings for Windows and asan test failures.
Patch by Eric Niebler
llvm-svn: 272562
The bitmask complement operation doesn't consider the max proc id which means
something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a
Linux system even though there aren't 600 processors on said system. This
change has the complement bitmask and-ed with the fullmask so that it will only
contain valid processors.
Differential Revision: http://reviews.llvm.org/D21245
llvm-svn: 272561
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1
This makes one particular compute shader 8x faster.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D21136
llvm-svn: 272556
Differential Revision: http://reviews.llvm.org/D19842
Corresponding clang patch: http://reviews.llvm.org/D19843
Re-commit after addressing issues with of generating too many warnings for Windows and asan test failures
Patch by Eric Niebler
llvm-svn: 272555
The condition reg of the cndmask_b64 expansion can't be killed by
the first one, and the implicit super register implicit def is needed.
llvm-svn: 272554
Summary:
Adds a version of sigaction that uses a raw system call, to avoid circular
dependencies and support calling sigaction prior to setting up
interceptors. The new sigaction relies on an assembly sigreturn routine
for its restorer, which is Linux x86_64-only for now.
Uses the new sigaction to initialize the working set tool's shadow fault
handler prior to libc interceptor being set up. This is required to
support instrumentation invoked during interceptor setup, which happens
with an instrumented tcmalloc or other allocator compiled with esan.
Adds a test that emulates an instrumented allocator.
Reviewers: aizatsky
Subscribers: vitalybuka, tberghammer, zhaoqin, danalbert, kcc, srhines, eugenis, llvm-commits, kubabrecka
Differential Revision: http://reviews.llvm.org/D21083
llvm-svn: 272553
This patch implements PR#22821.
Taking the address of a packed member is dangerous since the reduced
alignment of the pointee is lost. This can lead to memory alignment
faults in some architectures if the pointer value is dereferenced.
This change adds a new warning to clang emitted when taking the address
of a packed member. A packed member is either a field/data member
declared as attribute((packed)) or belonging to a struct/class
declared as such. The associated flag is -Waddress-of-packed-member
Differential Revision: http://reviews.llvm.org/D20561
llvm-svn: 272552
This enables use of the 'R' and 'T' memory constraints for inline ASM
operands on SystemZ, which allow an index register as well as an
immediate displacement. This patch includes corresponding documentation
and test case updates.
As with the last patch of this kind, I moved the 'm' constraint to the
most general case, which is now 'T' (base + 20-bit signed displacement +
index register).
Author: colpell
Differential Revision: http://reviews.llvm.org/D21239
llvm-svn: 272547