This change adds initial support for both of these preference helper
functions. The loop unrolling preferences start with initial settings,
with some tuning already done, that control the thresholds, size, and
attributes of loops to unroll. The peeling preferences may also need
some tuning, since the initial support looks much like what other
architectures use.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D113798
Prepare the amdgpu plugin for an asynchronous implementation. This patch switches to using the HSA API for asynchronous memory copy.
Moving away from hsa_memory_copy means that the plugin is responsible for locking and unlocking host memory pointers.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115279
This patch is part of the upstreaming effort from the fir-dev branch.
Address review comments
- move CHECK blocks to after the mlir code in the test file
- fix style with respect to anonymous namespaces: only include class definitions in the namespace and make functions static and outside the namespace
- fix a few nits
- remove TODO in favor of notifyMatchFailure
- removed unnecessary CHECK line from convert-to-llvm.fir
- rebase on main
- add TODO back in
- get successful test of TODO in AllocMemOp conversion of derived type with LEN params
- clearer comments and reduced use of auto
- move definition of computeDerivedTypeSize to fix build error
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Reviewed By: awarzynski, clementval, kiranchandramohan, schweitz
Differential Revision: https://reviews.llvm.org/D114104
This does mostly the same as D112126, but for the runtimes cmake files.
Most of that is straightforward, but the interdependency between
libcxx and libunwind is tricky:
Libunwind is built at the same time as libcxx, but libunwind is not
installed yet. LIBCXXABI_USE_LLVM_UNWINDER makes libcxx link directly
against the just-built libunwind, but the compiler implicit -lunwind
isn't found. This patch avoids that by adding --unwindlib=none if
supported, if we are going to link explicitly against a newly built
unwinder anyway.
Reapplying this after
db32c4f456, which should fix the issues
that were reported last time this was applied.
Differential Revision: https://reviews.llvm.org/D113253
The XRebox Op is formed by the codegen rewrite which makes it easier to
convert the operation to LLVM. The XRebox op includes the information
from the rebox op and the associated slice, shift, and shape ops.
During the conversion process a new descriptor is created for reboxing.
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Val Donaldson <vdonaldson@nvidia.com>
Reviewed By: clementval
Differential Revision: https://reviews.llvm.org/D114709
clang has `= default` as an extension in c++03, so just use it.
Reviewed By: ldionne, Quuxplusone, #libc
Spies: libcxx-commits
Differential Revision: https://reviews.llvm.org/D115275
At present, the amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version.
This patch prepares the plugin for asynchronous implementation by:
- Privatizing actual kernel launch code (valid in both cases) into an anonymous namespace base function
Actual separation of kernel launch code (async vs sync) will follow in a later patch.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115267
We avoid this fold in the more general cases where we use `FoldOpIntoSelect`.
That's because -- unlike most binary opcodes -- 'rem' can't usually be
speculated with a variable divisor since it can have immediate UB. But in
the case where both arms of the select are constants, we can safely evaluate
both sides and eliminate 'rem' completely.
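
As a minimal source-level sketch of the fold, assuming the dividend is also a constant (the values 42, 5 and 7 and both function names are illustrative, not taken from the patch or its tests):

```cpp
#include <cassert>

// Before the fold: the divisor is a select between two non-zero constants.
// The constants 42, 5 and 7 are illustrative only.
constexpr unsigned rem_before(bool cond) {
  return 42u % (cond ? 5u : 7u);
}

// After the fold: each arm is evaluated ahead of time (42 % 5 == 2 and
// 42 % 7 == 0), so only the select remains and no 'rem' is executed.
constexpr unsigned rem_after(bool cond) {
  return cond ? 2u : 0u;
}

static_assert(rem_before(true) == rem_after(true), "true arm matches");
static_assert(rem_before(false) == rem_after(false), "false arm matches");
```

Because both divisors are known non-zero constants, evaluating both arms cannot introduce the immediate UB that a variable divisor could.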
This should fix:
https://llvm.org/PR52102
The same optimization for 'div' is planned as a follow-up patch.
Differential Revision: https://reviews.llvm.org/D115173
In the "runtimes" setup, the runtime (e.g. OpenMP) can be built for
a target entirely different from the current host build (where LLVM
and Clang are built). If profiling is enabled, libomptarget links
against LLVMSupport (which only has been built for the host).
Thus, don't enable profiling by default in this setup.
This should allow relanding D113253.
Differential Revision: https://reviews.llvm.org/D114083
This patch adds the runtime function to allocate and
deallocate ragged arrays.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D114534
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
A series of unary operators and casts may obscure the variable we're
trying to analyze. Ignore them for the uninitialized value analysis.
Other checks determine if the unary operators result in a valid l-value.
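
The idea can be sketched on a toy expression tree. This is not Clang's AST or the code in this patch; `ToyExpr` and `stripUnaryOpsAndCasts` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <string>

// Toy expression node; not Clang's AST. All names are hypothetical.
struct ToyExpr {
  enum Kind { VarRef, UnaryOp, Cast } kind;
  std::string name;        // variable name for VarRef, otherwise unused
  const ToyExpr *operand;  // wrapped expression for UnaryOp and Cast
};

// Walk through any chain of unary operators and casts and return the
// innermost expression, which is what the analysis wants to inspect.
inline const ToyExpr *stripUnaryOpsAndCasts(const ToyExpr *e) {
  while (e && (e->kind == ToyExpr::UnaryOp || e->kind == ToyExpr::Cast) &&
         e->operand)
    e = e->operand;
  return e;
}
```

For a chain such as a dereference of a cast of an address-of of `x`, the helper returns the node for `x` itself.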
Link: https://github.com/ClangBuiltLinux/linux/issues/1521
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D114848
Before this patch, the new test's `CountedInvocable<int*, int*>`
would hard-error instead of SFINAEing and cleanly returning false.
Notice that views::counted specifically does NOT work with pipes;
`counted(42)` is ill-formed. This is because `counted`'s first argument
is supposed to be an iterator, not a range.
Also, mark `views::counted(it, n)` as [[nodiscard]], and test that.
(We have a general policy now that range adaptors are consistently
marked [[nodiscard]], so that people don't accidentally think that
they have side effects. This matters mostly for `reverse` and
`transform`, arguably `drop`, and just generally let's be consistent.)
Differential Revision: https://reviews.llvm.org/D115177
This revision implements sparse outputs (from scratch) in all cases where
the loops can be reordered so that all but one of the parallel loops are
outermost. If the inner parallel loop appears inside one or more reduction
loops, then an access pattern expansion is required (a.k.a. workspaces in
TACO speak).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D115091
This patch applies the lint rules described in the previous patch. There
was also a significant amount of effort put into manually fixing things,
since all of the templated functions, or structs defined in /spec, were
not updated and had to be handled manually.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D114302
This commit changes the clang-tidy rules for LLVM-libc to follow the new
format. The next commit applies these rules to the codebase.
The rules are as follows:
CamelCase for classes
lower_case for variables
lower_case for functions
UPPER_CASE for constexpr variables
There are also some exceptions, but the most important one is that any
function or variable that starts with an underscore is exempt from the
formatting.
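
A short sketch of what code following these rules might look like. All identifiers here are made up for illustration and do not come from the LLVM-libc sources:

```cpp
#include <cassert>

namespace example {

// UPPER_CASE for constexpr variables.
constexpr int MAX_DIGITS = 10;

// CamelCase for classes.
class DigitCounter {
public:
  // lower_case for functions and variables.
  int count_digits(unsigned value) const {
    int digit_count = 1;
    while (value >= 10) {
      value /= 10;
      ++digit_count;
    }
    return digit_count;
  }
};

// Names that start with an underscore are exempt from the rules.
inline int _exemptHelper() { return MAX_DIGITS; }

} // namespace example
```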
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D114301
This is a split of D113724. It calls `TypeSystemClang::AddMethodToCXXRecordType`
to create function decls for class methods.
Differential Revision: https://reviews.llvm.org/D113930
It is required for the [Leak Sanitizer port to Windows](https://reviews.llvm.org/D115103).
The currently used `unsigned long` type is 64 bits wide on UNIX-like systems but only 32 bits wide on Windows.
Because of that, the literal `8UL << 30` causes an integer overflow on Windows.
By changing the type of the literals to `unsigned long long`, we have consistent behavior and no overflows on all platforms.
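
The effect can be illustrated with a minimal sketch. It is not taken from the sanitizer sources: the names are hypothetical, and a fixed-width 32-bit type stands in for the Windows `unsigned long` so the result is the same on any host.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a 32-bit `unsigned long` (as on Windows). Hypothetical name.
using ulong32 = std::uint32_t;

// 8 << 30 is 2^33, which does not fit in 32 bits; converting it to the
// 32-bit stand-in wraps modulo 2^32 and leaves 0.
constexpr ulong32 kWrapped = static_cast<ulong32>(8ULL << 30);

// `unsigned long long` is at least 64 bits on every platform, so the same
// literal keeps its full value (8 GiB).
constexpr unsigned long long kEightGiB = 8ULL << 30;

static_assert(kWrapped == 0, "2^33 wraps to 0 in 32 bits");
static_assert(kEightGiB == 8589934592ULL, "2^33 fits in 64 bits");
```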
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D115186
Quantized case needs to include zero-point corrections before the tosa.mul.
Disabled for the quantized use-case.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D115264
The AsmParser checks the range of a PC-relative operand, but only if it is
an immediate.
This patch adds range checks for operands in applyFixup(), at which point the
offset to a label is known.
The diagnostic message for an operand that is out of range is explicit (with
given value and min/max limits). This is now also done for displacement
fixups.
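
A rough sketch of such a check, assuming a hypothetical helper `checkFixupInRange`; the message wording and the limits used in the test are illustrative and not copied from the backend:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns an empty string when Value lies in [Min, Max]; otherwise returns a
// diagnostic that names the value and both limits. Hypothetical helper; the
// message wording is illustrative.
inline std::string checkFixupInRange(std::int64_t Value, std::int64_t Min,
                                     std::int64_t Max) {
  if (Value >= Min && Value <= Max)
    return std::string();
  return "operand out of range (" + std::to_string(Value) + " not between " +
         std::to_string(Min) + " and " + std::to_string(Max) + ")";
}
```

Running the check at fixup time rather than parse time is what makes label operands checkable, since only then is the offset a concrete number.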
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D114194
DataEncoder was previously made to modify data within an existing buffer. As the code progressed, new clients started using DataEncoder to create binary data. In these cases the use of this class was possible, but only if you knew exactly how large your buffer would be ahead of time. This patch adds the ability for DataEncoder to own a buffer that can be dynamically resized as data is appended to the buffer.
Changes in this patch:
- Allow a DataEncoder object to be created that owns a DataBufferHeap object that can dynamically grow as data is appended
- Add new methods that start with "Append" to append data to the buffer and grow it as needed
- Adds full testing of the API to ensure modifications don't regress any functionality
- Has two constructors: one that uses caller owned data and one that creates an object with object owned data
- "Append" methods only work if the object owns its own data
- Removes the ability to specify a shared memory buffer as no one was using this functionality. This allows us to switch to a case where the object owns its own data in a DataBufferHeap that can be resized as data is added
"Put" methods work on both caller and object owned data.
"Append" methods work on only object owned data where we can grow the buffer. These methods will return false if called on a DataEncoder object that has caller owned data.
The main reason for these modifications is to be able to use the DataEncoder objects instead of llvm::gsym::FileWriter in https://reviews.llvm.org/D113789. That patch wants to add the ability to create symbol table caching to LLDB and the code needs to build binary caches and save them to disk.
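
The split between "Put" and "Append" can be sketched with a toy class. This is not the LLDB DataEncoder API: `ToyEncoder`, its method signatures, and the use of std::vector in place of DataBufferHeap are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of the two ownership modes; NOT the LLDB DataEncoder API.
class ToyEncoder {
public:
  // Caller-owned mode: write into a fixed buffer that cannot grow.
  ToyEncoder(std::uint8_t *data, std::size_t size)
      : data_(data), size_(size), owns_data_(false) {}

  // Object-owned mode: start empty and grow as data is appended.
  ToyEncoder() : data_(nullptr), size_(0), owns_data_(true) {}

  // Copying would leave data_ pointing at another object's storage.
  ToyEncoder(const ToyEncoder &) = delete;
  ToyEncoder &operator=(const ToyEncoder &) = delete;

  // "Put" overwrites a byte at an existing offset; works in both modes.
  bool PutU8(std::size_t offset, std::uint8_t value) {
    if (offset >= size_)
      return false;
    data_[offset] = value;
    return true;
  }

  // "Append" grows the buffer; it fails on caller-owned data.
  bool AppendU8(std::uint8_t value) {
    if (!owns_data_)
      return false;
    storage_.push_back(value);
    data_ = storage_.data(); // push_back may reallocate
    size_ = storage_.size();
    return true;
  }

  std::size_t size() const { return size_; }

  std::uint8_t at(std::size_t i) const {
    assert(i < size_);
    return data_[i];
  }

private:
  std::vector<std::uint8_t> storage_; // used only in object-owned mode
  std::uint8_t *data_;
  std::size_t size_;
  bool owns_data_;
};
```

Returning false from the "Append" path on caller-owned data mirrors the behavior described above, where the encoder cannot grow a buffer it does not own.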
Reviewed By: labath
Differential Revision: https://reviews.llvm.org/D115073
In error cases it is possible to CLOSE a unit that has not
been successfully connected, so don't crash when the file descriptor
is negative.
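
A minimal sketch of the guard, using a toy class rather than the Fortran runtime's own file type (`ToyOpenFile` and its members are hypothetical, and no OS call is made):

```cpp
#include <cassert>

// Toy stand-in for an I/O unit's file handle; not the Fortran runtime class.
class ToyOpenFile {
public:
  void set_fd(int fd) { fd_ = fd; }
  bool is_connected() const { return fd_ >= 0; }

  // Returns true if a descriptor was actually released. A unit that never
  // connected still has fd_ == -1, so Close() is a harmless no-op.
  bool Close() {
    if (fd_ < 0)
      return false;
    // A real implementation would call the OS close routine here.
    fd_ = -1;
    return true;
  }

private:
  int fd_ = -1;
};
```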
Differential Revision: https://reviews.llvm.org/D115165
When closing all open units, don't hold the unit map lock
over the actual close operations; if one of those aborts,
CloseAll() may be called and then deadlock.
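
The locking pattern can be sketched as follows. `ToyUnitMap` is a hypothetical stand-in built on std::mutex and std::map, not the runtime's unit map; it only shows the idea of detaching the units under the lock and closing them after releasing it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical stand-in for the runtime's unit map.
class ToyUnitMap {
public:
  void Open(int unit) {
    std::lock_guard<std::mutex> guard(lock_);
    units_[unit] = true;
  }

  std::size_t OpenCount() {
    std::lock_guard<std::mutex> guard(lock_);
    return units_.size();
  }

  // Detach all units while holding the lock, then close them after the lock
  // is released. A close that re-enters CloseAll() can then take the lock
  // again without deadlocking.
  template <typename CloseFn> std::size_t CloseAll(CloseFn close) {
    std::map<int, bool> detached;
    {
      std::lock_guard<std::mutex> guard(lock_);
      detached.swap(units_);
    }
    for (const auto &entry : detached)
      close(entry.first);
    return detached.size();
  }

private:
  std::mutex lock_;
  std::map<int, bool> units_;
};
```

In the usage below, the close callback itself calls CloseAll(), which would self-deadlock on a non-recursive mutex if the lock were still held during the close.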
Differential Revision: https://reviews.llvm.org/D115184