Summary:
I have discovered this because i wanted to experiment with
building static libomp (with openmp-4.0 support only)
for debugging purposes.
There are three kinds of problems here:
1. `__kmp_compare_and_store_acq()` simply does not exist.
It was added in D47903 by @jlpeyton.
I'm guessing `__kmp_atomic_compare_store_acq()` was meant.
2. In `__kmp_is_ticket_lock_initialized()`,
`lck->lk.initialized` is `std::atomic<bool>`,
while `lck` is `kmp_ticket_lock_t *`.
Naturally, they can't be equality-compared.
Either, it should return the value read from `lck->lk.initialized`,
or do what `__kmp_is_queuing_lock_initialized()` does,
compare the passed pointer with the field in the struct
pointed by the pointer. I think the latter is correct-er choice here.
3. Tests were not versioned.
They assume that `LIBOMP_OMP_VERSION` is at the latest version.
This does not touch LIBOMP_OMP_VERSION=30. That is still broken.
Reviewers: jlpeyton, Hahnfeld, AndreyChurbanov
Reviewed By: AndreyChurbanov
Subscribers: guansong, jfb, openmp-commits, jlpeyton
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D55496
llvm-svn: 349260
The value returned by __kmp_now_nsec() can overflow 32-bit values causing
incorrect values to be returned. The overflow can end up causing a divide
by zero error because in __kmp_initialize_system_tick(), the value
(__kmp_now_nsec() - nsec) can end up being much larger than the numerator:
1e6 * (delay + (now - goal))
during a pathological timing where the current time calculated is much larger
than nsec. When this happens, the value of __kmp_ticks_per_msec is set to zero
which is then used as the denominator in the KMP_NOW_MSEC() macro leading to
the divide by zero error.
Differential Revision: https://reviews.llvm.org/D55300
llvm-svn: 349090
This patch adds the affinity format functionality introduced in OpenMP 5.0.
This patch adds: Two new environment variables:
OMP_DISPLAY_AFFINITY=TRUE|FALSE
OMP_AFFINITY_FORMAT=<string>
and Four new API:
1) omp_set_affinity_format()
2) omp_get_affinity_format()
3) omp_display_affinity()
4) omp_capture_affinity()
The affinity format functionality has two ICV's associated with it:
affinity-display-var (bool) and affinity-format-var (string).
The affinity-display-var enables/disables the functionality through the
envirable OMP_DISPLAY_AFFINITY. The affinity-format-var is a formatted
string with the special field types beginning with a '%' character
similar to printf
For example, the affinity-format-var could be:
"OMP: host:%H pid:%P OStid:%i num_threads:%N thread_num:%n affinity:{%A}"
The affinity-format-var is displayed by every thread implicitly at the beginning
of a parallel region when any thread's affinity has changed (including a brand
new thread being spawned), or explicitly using the omp_display_affinity() API.
The omp_capture_affinity() function can capture the affinity-format-var in a
char buffer. And omp_set|get_affinity_format() allow the user to set|get the
affinity-format-var explicitly at runtime. omp_capture_affinity() and
omp_get_affinity_format() both return the number of characters needed to hold
the entire string it tried to make (not including NULL character). If not
enough buffer space is available,
both these functions truncate their output.
Differential Revision: https://reviews.llvm.org/D55148
llvm-svn: 349089
Disable KMP_HAVE_QUAD when building via gcc on NetBSD system,
as the build fails due to unimplemented builtins:
.../kmp_atomic.cpp.o: In function `__kmpc_atomic_cmplx16_mul':
.../kmp_atomic.cpp:1332: undefined reference to `__multc3'
.../kmp_atomic.cpp.o: In function `__kmpc_atomic_cmplx16_div':
.../kmp_atomic.cpp:1334: undefined reference to `__divtc3'
...
Differential Revision: https://reviews.llvm.org/D55478
llvm-svn: 348886
Switch NetBSD from reading /proc (which is broken) to getloadavg()
(which is already used by Darwin). NetBSD discourages using procfs
in favor of system API calls.
Differential Revision: https://reviews.llvm.org/D55486
llvm-svn: 348885
Summary:
Use the sysctl(3) function to check whether an address is mapped
into the address space.
Reviewers: mgorny, joerg, #openmp
Reviewed By: mgorny
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D55549
llvm-svn: 348874
Summary: _lwp_self() returns current Thread Id in a numeric version on NetBSD.
Reviewers: joerg, mgorny, #openmp
Reviewed By: mgorny
Subscribers: llvm-commits, openmp-commits, #openmp
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D55497
llvm-svn: 348873
Increase the range for omp_get_wtick() test to allow for 0.01
(from <0.01). This is needed for NetBSD where it returns exactly that
value due to CLOCKS_PER_SEC being 100. This should not cause
a significant difference from e.g. FreeBSD where it is 128,
and especially from Linux where CLOCKS_PER_SEC is apparently meaningless
and sysconf(_SC_CLK_TCK) gives 100 as well.
Differential Revision: https://reviews.llvm.org/D55493
llvm-svn: 348857
On NetBSD, alloca() is in stdlib.h and there is no alloca.h. Adjust
the includes appopriately.
Differential Revision: https://reviews.llvm.org/D55487
llvm-svn: 348856
Pass `-n -s` instead of `--numeric --stable` to sort(1), as long options
are not supported by NetBSD sort implementation. `-n` is defined
by POSIX, so it should be fully portable. `-s` is used consistently
at least in GNU sort and FreeBSD sort, and I honestly doubt it would
cause issues with any other implementation supporting `--stable`.
Differential Revision: https://reviews.llvm.org/D55479
llvm-svn: 348855
Prefer using '-std=gnu++11' over '-std=c++11' when available, as NetBSD
exposes the correct alloca() implementation only with gnu* C/C++
standards.
Differential Revision: https://reviews.llvm.org/D55477
llvm-svn: 348854
Fix two build issues:
1) Recent commit 348756 accidentally included Unix clang compilers
to use immintrin.h when only clang-cl should be using it leading
to the following error:
openmp-llvm/runtime/src/kmp_lock.cpp:2035:25: error: always_
inline function '_xbegin' requires target feature 'rtm', but would be inlined into function
'__kmp_test_adaptive_lock_only' that is compiled without support for 'rtm'
kmp_uint32 status = _xbegin();
This patch changes the guard to use immintrin.h to only use clang-cl instead of all clang
2) gcc-8 gives a warning about multiline comment in kmp_runtime.cpp:
This patch just changes it to a two line comment
openmp-llvm/runtime/src/kmp_runtime.cpp:7697:8: warning: multi-line comment [-Wcomment]
#endif // KMP_OS_LINUX || KMP_OS_DRAGONFLY || KMP_OS_FREEBSD || KMP_OS_NETBSD \
llvm-svn: 348783
Summary:
Use the original shuffle implementation for __kmpc_shuffle_int64 since
default implementation uses the same implementation.
Reviewers: gtbercea
Subscribers: guansong, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D55514
llvm-svn: 348772
Summary:
Shuffle on 64bit data is allowed only for CUDA >= 9.0. Also, fixed the
constant for the mask, need one extra L in the end.
Reviewers: gtbercea, kkwli0
Subscribers: guansong, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D55440
llvm-svn: 348758
Summary: This patch permits OpenMP to build and work (with both gcc and clang) on OpenBSD. It mostly follows what was done for FreeBSD and NetBSD, except OpenBSD does not have pthread_getattr_np support, so it follows OS X in that one instance.
Reviewers: #openmp, krytarowski
Reviewed By: krytarowski
Subscribers: guansong, jfb, emaste, mgorny, krytarowski, #openmp
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D34280
llvm-svn: 348726
Summary:
Additions mostly follow FreeBSD and NetBSD and are not intrusive.
There is similar patch for OpenBSD: https://reviews.llvm.org/D34280
The -lm was being omitted due to -Wl,--as-needed in cmake rule, similar patch is in freebsd-ports/devel/llvm-devel port.
Simple OpenMP programs compile and work as expected:
$ clang-devel ~/omp_hello.c -fopenmp -I/usr/local/llvm-devel/include
$ LD_LIBRARY_PATH=/usr/local/llvm-devel/lib OMP_NUM_THREADS=100 ./a.out
The assertion in LLVMgold.so when -fopenmp was used together with -flto in 20170524 snapshot is no longer triggered on current svn-trunk and works fine as in llvm-4.0 with our local patches.
Reviewers: #openmp, krytarowski
Reviewed By: krytarowski
Subscribers: dexonsmith, jfb, krytarowski, guansong, gregrodgers, emaste, mgorny, mehdi_amini
Differential Revision: https://reviews.llvm.org/D35129
llvm-svn: 348725
Summary:
Introduced special noinline function log that allows to save some
registers for optimized builds but with enabled logging. Also, it
increases the stability of the optimized builds with inlined runtime.
Reviewers: gtbercea, kkwli0
Reviewed By: gtbercea
Subscribers: caomhin, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D55436
llvm-svn: 348606
Summary:
According to the standard, after memory flushing the changes in the
memory must be visible to all the threads in all teams. Patch fixes
this.
Reviewers: gtbercea, kkwli0
Subscribers: guansong, jfb, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D55370
llvm-svn: 348491
Summary:
Reworked runtime to make it compatible with the requirements of the
original runtime library. Also, simplified some code to reduce number of
function calls.
Reviewers: gtbercea, kkwli0
Subscribers: guansong, jfb, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D55130
llvm-svn: 348003
There is a conflict between libomptarget and libomp concerning some of the
standard OpenMP device API which needs further intestigation.
llvm-svn: 347932
This patch adds __kmpc_omp_reg_task_with_affinity to register affinity
information for tasks. For now, the affinity information is not used,
and the function always succeeds. This also adds the kmp_task_affinity_info_t
structure to store the task affinity information.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D55026
llvm-svn: 347907
This change renames ompt_mutex_impl_unknown to ompt_mutex_impl_none,
following the name change in the specification.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D54347
llvm-svn: 347802
* Fix calculation of string length.
* Remove NULL-check of pointer which has been dereferenced.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D54948
llvm-svn: 347801
There is low probability that array th_hot_teams can be
accessed out of bound (when many nested levels are requested
to keep hot teams via KMP_HOT_TEAMS_MAX_LEVEL). The patch
adds the check of index that fixes the problem.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D54950
llvm-svn: 347800
Add omp_get_device_num() function for 5.0 which returns the number of the device
the current thread is running on. Also, did some cleanup and updating of device
API functions to make them into weak functions that should be replaced with
libomptarget functions when libomptarget is present.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D54342
llvm-svn: 347799
Summary: To enable the compiler to optimize parts of the function that are not needed when runtime can be omitted, a new version of the SPMD deinit kernel function is needed. This function takes the runtime required flag as an argument.
Reviewers: ABataev, kkwli0, caomhin
Reviewed By: ABataev
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D54969
llvm-svn: 347714
Summary:
Added functions __kmpc_nvptx_teams_reduce_nowait_simple and
__kmpc_nvptx_teams_end_reduce_nowait_simple to implement basic support
for reductions across the teams.
Reviewers: gtbercea, kkwli0
Subscribers: guansong, jfb, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D54967
llvm-svn: 347710
Summary: Refactor the checking for SPMD mode and whether the runtime is initialized or not. This uses constant flags which enables the runtime to optimize out unused sections of code that depend on these flags.
Reviewers: ABataev, caomhin
Reviewed By: ABataev
Subscribers: guansong, jfb, openmp-commits
Differential Revision: https://reviews.llvm.org/D54960
llvm-svn: 347698
Summary:
The base pointer for the lambda mapping must point to the lambda capture
placement and pointer must point to the captured variable itself. Patch
fixes this problem.
Reviewers: gtbercea
Subscribers: guansong, openmp-commits, kkwli0, caomhin
Differential Revision: https://reviews.llvm.org/D54260
llvm-svn: 346407
Summary:
The previously used combination `PTR_AND_OBJ | PRIVATE` could be used
for mapping of some data in Fortran. Changed it to `PTR_AND_OBJ |
LITERAL`.
Reviewers: gtbercea
Subscribers: guansong, caomhin, openmp-commits
Differential Revision: https://reviews.llvm.org/D54035
llvm-svn: 345981
Summary:
Current globalization scheme works correctly only for SPMD+lightweight
runtime mode and does not work for full runtime. Patch improves support
for the globalization scheme + reduces global memory consumption in
lightweight runtime mode.
Patch adds runtime functions to work with the statically allocated
global memory. It allows to improve performance and memory consumption.
This global memory must be allocated by the compiler.
Reviewers: grokos, kkwli0, gtbercea, caomhin
Subscribers: guansong, jfb, openmp-commits
Differential Revision: https://reviews.llvm.org/D53943
llvm-svn: 345976
Summary: In the case of coalesced global records, we need to push the exact data size passed in. This patch fixes this by outlining the common functionality of the previous push function and by adding a separate entry point for coalesced pushes. The pop function remains unchanged.
Reviewers: ABataev, grokos, caomhin
Reviewed By: ABataev, grokos
Subscribers: jholewinski, cfe-commits, Hahnfeld, guansong, jfb, openmp-commits
Differential Revision: https://reviews.llvm.org/D53141
llvm-svn: 345867
Summary:
Added support for correct mapping of variables captured by reference in
lambdas. That kind of mapping may appear only in target-executable
regions and must follow the original lambda or another lambda capture
for the same lambda.
The expected data: base address - the address of the lambda, begin
pointer - pointer to the address of the lambda capture, size - size of
the captured variable.
When OMP_TGT_MAPTYPE_PTR_AND_OBJ mapping type is seen in
target-executable region, the target address of the last processed item
is taken as the address of the original lambda `tgt_lambda_ptr`. Then,
the pointer to capture on the device is calculated like `tgt_lambda_ptr
+ (host_begin_pointer - host_begin_base)` and the target-based address
of the original variable (which host address is
`*(void**)begin_pointer`) is written to that pointer.
Reviewers: kkwli0, gtbercea, grokos
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D51107
llvm-svn: 345608
Initializing an ompt_data_t object using the pointer union member is potentially
unsafe in 32-bit programs. This change fixes the issue
by using the constant, ompt_data_none.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D52046
llvm-svn: 343785
On Windows, child workers are terminated by the parent during the normal
program exit process (ExitProcess()) and they are not able to finish generating
their OpenMP events. We can force manual library shut down in __kmpc_end() to
fix this at least for the cases where __kmpc_end() is properly inserted.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D52628
llvm-svn: 343619
If the user requested LIBOMPTARGET_NVPTX_DEBUG, include asserts in
the bitcode library. Everything else will have very unpleasent
effects because asserts will appear when falling back to the static
library libomptarget-nvptx.a.
Differential Revision: https://reviews.llvm.org/D52701
llvm-svn: 343477
Pass in the correct value of isRuntimeUninitialized() which solves
parallel reductions as reported on the mailing list.
For reference: r333285 did the same for loop scheduling.
Differential Revision: https://reviews.llvm.org/D52725
llvm-svn: 343476
Patch suggested by Kelvin Li: removed optional "kind=" part of kind-selector
for variables with long names and kind names.
Differential Revision: https://reviews.llvm.org/D52712
llvm-svn: 343475
NVPTX requires addresses of pointer locations to be 8-byte aligned
or there will be an exception during runtime.
This could happen without this patch as shown in the added test:
getId() requires 4 byte of stack and putValueInParallel() uses 16
bytes to store the addresses of the captured variables.
Differential Revision: https://reviews.llvm.org/D52655
llvm-svn: 343402