Due to num_threads (probably also other reasons) we cannot assume
explicit barriers are always executed by all threads in an aligned
fashion. We can optimize them if that property can be proven but
that is different.
No-sync is a property that we need in more places as complex
transformations emerge. To simplify the query we provide an
`AA::isNoSyncInst` helper now and expose two existing helpers through
the `AANoSync` class.
Patch originally by Giorgis Georgakoudis (@ggeorgakoudis), typos and
bugs introduced later by me.
This patch allows us to remove redundant barriers if they are part
of a "consecutive" pair of barriers in a basic block with no impacted
memory effect (read or write) in-between them. Memory accesses to
local (=thread private) or constant memory are allowed to appear.
Technically we could also allow any other memory that is not used to
share information between threads, e.g., the result of a malloc that
is also not captured. However, it will be easier to do more reasoning
once the code is put into an AA. That will also allow us to look through
phis/selects reasonably. At that point we should also deal with calls,
barriers in different blocks, and other complexities.
Differential Revision: https://reviews.llvm.org/D118002
We used to remove noinline from known OpenMP runtime functions (which
are declared in OMPKinds.td). Now we remove noinline from all functions
with the proper prefixes: __kmpc, _ZN4_OMP (= namespace omp), omp_
Only enable --emit-relocs linker option for merge-fdata target if tests are enabled.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D118580
PointerToBase is a mapping between potentially derived pointer to its base.
As soon as we are in SSA form if there is a base of derived pointer and it
is available at def of derived pointer, the same base will be available at any
point where derived pointer is alive.
So the mapping of derived pointer to base pointer is not a property
of a call site but the same on function level.
Reviewers: reames, yrouban
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D118604
This patch adds a new target to the tests to run using the new driver as
the method for generating offloading code.
Depends on D116541
Differential Revision: https://reviews.llvm.org/D118637
The LTO support for OpenMP offloading allows us to run the OpenMPOpt
pass during the LTO pipeline. This patch introduces an early run of the
Module pass and a late run of the CGSCC pass. These are quick no-ops if
there is no OpenMP in the module.
Depends on D118198
Differential Revision: https://reviews.llvm.org/D118611
Summary:
This patch removes the system call to the `clang-offload-wrapper` tool
by replicating its functionality in a new file. This improves
performance and makes the future wrapping functionality easier to
change.
Differential Revision: https://reviews.llvm.org/D118198
Summary:
This patch replaces the system call to the `llc` binary with a library
call to the target machine interface. This should be faster than
relying on an external system call to compile the final wrapper binary.
Differential Revision: https://reviews.llvm.org/D118197
Summary:
This parses the executable name out of the linker arguments so we can
use it to give more informative temporary file names and so we don't
accidentally use it for device linking.
Summary:
This patch implements the `-save-temps` flag for the linker wrapper.
This allows the user to inspect the intermeditary outpout that the
linker wrapper creates.
Summary:
Various changes to the linker wrapper, and the bitcode embedding is not
done after the optimizations have run rather than after linking is done.
This saves time when doing JIT.
This patch improves the symbol resolution done for LTO with offloading
applications. The symbol resolution done here allows the LTO backend to
internalize more functions. The symbol resoltion done is a simplified
view that does not take into account various options like `--wrap` or
`--dyanimic-list` and always assumes we are creating a shared object.
The actual target may be an executable, but semantically it is used as a
shared object because certain objects need to be visible outside of the
executable when they are read by the OpenMP plugin.
Depends on D117246
Differential Revision: https://reviews.llvm.org/D118155
This patch adds support for linking AMDGPU images using the LLD binary.
AMDGPU files are always bitcode images and will always use the LTO
backend. Additionally we now pass the default architecture found with
the `amdgpu-arch` tool to the argument list.
Depends on D117156
Differential Revision: https://reviews.llvm.org/D117246
This patch adds support for a few extra flags in the linker wrapper,
such as debugging flags, verbose output, and passing arguments to ptxas. We also
now forward pass remarks to the LLVM backend so they will show up in the LTO
passes.
Depends on D117049
Differential Revision: https://reviews.llvm.org/D117156
Summary;
This patch adds support for embedding device images in the linker
wrapper tool. This will be used for performing JIT functionality in the
future.
Depends on D117048
Differential Revision: https://reviews.llvm.org/D117049
Summary:
This patch adds support for linking the OpenMP device bitcode library
late when doing LTO. This simply passes it in as an additional device
file when doing the final device linking phase with LTO. This has the
advantage that we don't link it multiple times, and the device
references do not get inlined and prevent us from doing needed OpenMP
optimizations when we have visiblity of the whole module.
Fix some failings where the implicit conversion of an Error to an
Expected triggered the deleted copy constructor.
Depends on D116675
Differential revision: https://reviews.llvm.org/D117048
This patch implements the fist support for handling LTO in the
offloading pipeline. The flag `-foffload-lto` is used to control if
bitcode is embedded into the device. If bitcode is found in the device,
the extracted files will be sent to the LTO pipeline to be linked and
sent to the backend. This implementation does not separately link the
device bitcode libraries yet.
Depends on D116675
Differential Revision: https://reviews.llvm.org/D116975
This patch adds support for searching through the linker library paths
to identify static libraries that may contain device code. If device
code is present it will be extracted. This should ideally fully support
static linking with OpenMP offloading.
Depends on D116627
Differential Revision: https://reviews.llvm.org/D116675
This patch adds the initial support for linking NVPTX offloading code
using the clang-linker-wrapper tool. This uses the extracted device
files and runs `nvlink` on them. Currently this is then passed to the
existing toolchain for creating linkable OpenMP offloading programs
using `clang-offload-wrapper` and compiling it manually using `llc`.
More work is required to support LTO, Bitcode linking, AMDGPU, and x86
offloading.
Depends on D116545
Differential Revision: https://reviews.llvm.org/D116627
This patchs add support for extracting device offloading code from the
linker's input files. If the file contains a section with the name
`.llvm.offloading.<triple>.<arch>` it will be extracted to a new
temporary file to be linked. Addtionally, the host file containing it
will have the section stripped so it does not remain in the executable
once linked.
Depends on D116544
Differential Revision: https://reviews.llvm.org/D116545
This matches the same API usage as attributes/ops/types. For example:
```c++
Dialect *dialect = ...;
// Instead of this:
if (auto *interface = dialect->getRegisteredInterface<DialectInlinerInterface>())
// You can do this:
if (auto *interface = dyn_cast<DialectInlinerInterface>(dialect))
```
Differential Revision: https://reviews.llvm.org/D117859
Build failures are not a particularly helpful way to enforce not using
deprecated APIs and that isn't the point of the Bazel build.
At the same time, this removes `-Wno-unused` this is a check that we do
enforce in the Google internal build and so are ok maintaining in our
maintenance of the upstream Bazel build (the comment about not wanting
to do so was from a time when this was in a separate repository and I was
the only one maintaining it).
Differential Revision: https://reviews.llvm.org/D118671
Summary:
Add code object v5 support (deafult is still v4)
Generate metadata for implicit kernel args for the new ABI
Set the metadata version to be 1.2
Reviewers:
t-tye, b-sumner, arsenm, and bcahoon
Fixes:
SWDEV-307188, SWDEV-307189
Differential Revision:
https://reviews.llvm.org/D118272
Some functions can end up non-externally visible despite not being
declared "static" or in an unnamed namespace in C++ - such as by having
parameters that are of non-external types.
Such functions aren't mistakenly intended to be defining some function
that needs a declaration. They could be maybe more legible (except for
the `operator new` example) with an explicit static, but that's a
stylistic thing outside what should be addressed by a warning.
Make sure that the shell tests use the same python interpreter as the
rest of the build instead of picking up `python` from the PATH.
It would be nice if we could use the _disallow helper, but that triggers
on invocations that specify python as the scripting language.
D116542 adds EmbedBufferInModule which introduces a layer violation
(https://llvm.org/docs/CodingStandards.html#library-layering).
See 2d5f857a1e for detail.
EmbedBufferInModule does not use BitcodeWriter functionality and should be moved
LLVMTransformsUtils. While here, change the function case to the prevailing
convention.
It seems that EmbedBufferInModule just follows the steps of
EmbedBitcodeInModule. EmbedBitcodeInModule calls WriteBitcodeToFile but has IR
update operations which ideally should be refactored to another library.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D118666
Summary:
The changes introduced in D116542 added a dependency on TransformUtils
to use the `appendToCompilerUsed` method. This created a circular
dependency. This patch simply copies the needed function locally to
remove the dependency.
Previously you would get this error:
```
error: unsupported load command (cmd=0x2d)
```
If the binary you were redefining the symbols of contained a
LC_LINKER_OPTION load command. This command does not need to be changed
when redefining symbols so we can ignore it like many others.
Differential Revision: https://reviews.llvm.org/D118526
There is a layer violation and can break clang -fmodule-name=X -fmodules-strict-decluse builds:
* LLVMTransformUtils has `#include "llvm/Bitcode/BitcodeWriterPass.h"`
* LLVMBitWriter depends on LLVMTransformUtils after D116542
Temporarily work around the issue.
When the stepsize does not evenly divide the range's end, round-up to ensure that that last multiple of the stepsize before the reaching the upper boud is reached. For instance, the trip count of
for (int i = 0; i < 7; i+=5)
is two (i=0 and i=5), not (7-0)/5 == 1.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D118542
Use a bit-set to manage runtime-generated I/O unit numbers, recycle
them after they're closed, and use a range of values that fits in
a minimal-sized integer.
Differential Revision: https://reviews.llvm.org/D118651
Factoring it out so we can subsequently cache it. This should be a NFC,
however, for the float quantities, we see small errors in the least
significant digits. This is because, before, we were summing up one by
one. Now, we sum up results of sums.
This shouldn't matter for ML, and will require rework when we do
quantization (avoiding floats altogether), but meanwhile, it did require
an update to the reference file used for testing.
The patch also bumps the precision of the variables involved in this, to
reduce the error (note they are casted back to float at the end by the
SET macro, since we only work with float and not double in TF)
Differential Revision: https://reviews.llvm.org/D118659
The logic for getProfileKind for RawInstrProfReader and
InstrProfReaderIndex is similar. To avoid duplication, move the logic
from the header to InstrProfReader.cpp and introduce a static method
which implements the common code.
Differential Revision: https://reviews.llvm.org/D118656
The definition of the MemInfoBlock is shared between the memprof
compiler-rt runtime and llvm/lib/ProfileData/. This change removes the
memprof_meminfoblock header and moves the struct to the shared include
file. To enable this sharing, the Print method is moved to the
memprof_allocator (the only place it is used) and the remaining uses are
updated to refer to the MemInfoBlock defined in the MemProfData.inc
file.
Also a couple of other minor changes which improve usability of the
types in MemProfData.inc.
* Update the PACKED macro to handle commas.
* Add constructors and equality operators.
* Don't initialize the buildid field.
Differential Revision: https://reviews.llvm.org/D116780
When reallocating an I/O buffer to accommodate a large record,
ensure that the amount of growth is at least as large as the
minimum initial record size (64KiB). The previous policy was
causing input buffer reallocation for each byte after the minimum
buffer size when scanning input data for record termination
newlines.
Differential Revision: https://reviews.llvm.org/D118649