Semantic checks added to check the worksharing 'single' region closely nested inside a worksharing 'do' region. And also to check whether the 'do' iteration variable is a variable in 'Firstprivate' clause.
Files:
check-directive-structure.h
check-omp-structure.h
check-omp-structure.cpp
Testcases:
omp-do01-positivecase.f90
omp-do01.f90
omp-do05-positivecase.f90
omp-do05.f90
Reviewed by: Kiran Chandramohan @kiranchandramohan , Valentin Clement @clementval
Differential Revision: https://reviews.llvm.org/D93205
Use `memcpy` rather than copying bytes one by one, for there might be large
size structs to move.
Reviewed By: gchatelet, sivachandra
Differential Revision: https://reviews.llvm.org/D93195
In order to import patterns for these, we need to define new ops that can map to
the AArch64ISD::[SU]ITOF nodes. We then transform fpr->fpr variants of the generic
opcodes to these custom opcodes in preisel-lowering. We have to do it here and
not the PostLegalizer combiner because this has to run after regbankselect.
Differential Revision: https://reviews.llvm.org/D94702
This patch changes the default range used to anchor the include insertion to use
an expansion loc. This ensures that the location is valid, when the user relies
on the default range.
Driveby: extend a FIXME for a problem that was emphasized by this change; fix some spellings.
Differential Revision: https://reviews.llvm.org/D93703
[libomptarget][nvptx] Reduce calls to cuda header
Remove use of clock_t in favour of a builtin. Drop a preprocessor branch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94731
G_[US]ITOFP users of loads on AArch64 can operate on both gpr and fpr banks for scalars.
Because of this, if their source is a load, then that load can be assigned to an fpr
bank and therefore avoid having to do a cross bank copy via a gpr->fpr conversion.
Differential Revision: https://reviews.llvm.org/D94701
When a use-associated procedure was included in a generic, we weren't
correctly recording that fact. The ultimate symbol was added rather than
the local symbol.
Also, improve the message emitted for the specific procedure by
mentioning the module it came from.
This fixes one of the problems in https://bugs.llvm.org/show_bug.cgi?id=48648.
Differential Revision: https://reviews.llvm.org/D94696
[libomptarget][nvptx][nfc] Move target_impl functions out of header
This removes most of the differences between the two target_impl.h.
Also change name mangling from C to C++ for __kmpc_impl_*_lock.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D94728
With the recent changes to linalg on tensor semantics, the tiling
operations works out-of-the-box for generic operations. Add a test to
verify that and some minor refactoring.
Differential Revision: https://reviews.llvm.org/D93077
Add canonicalization to replace use of the result of a linalg
operation on tensors in a dim operation, to use one of the operands of
the linalg operations instead. This allows the linalg op itself to be
deleted when all its non-dim uses are removed (say through tiling, etc.)
Differential Revision: https://reviews.llvm.org/D93076
`omptarget-nvptx` is still a dependence for `check-libomptarget-nvtpx`
although it has been removed by D94573.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D94725
This is NFC-intended. I'm still trying to figure out
how the loop where this is used works. It does not
seem like we require this data at all, but it's
hard to confirm given the complicated predicates.
linalg.generic/indexed_generic operations on tensors whose body is
just yielding the (non-induction variable) arguments of the operation
can be canonicalized by replacing uses of the result with the
corresponding arguments.
Differential Revision: https://reviews.llvm.org/D94581
The standard and gpu dialect both have `alloc` operations which use the
memory effect `MemAlloc`. In both cases, it is specified on both the
operation itself and on the result. This results in two memory effects
being created for these operations. When `MemAlloc` is defined on an
operation, it represents some background effect which the compiler
cannot reason about, and inhibits the ability of the compiler to
remove dead `std.alloc` operations. This change removes the uneeded
`MemAlloc` effect from these operations and leaves the effect on the
result, which allows dead allocs to be erased.
There is the same problem, but to a lesser extent, with MemFree, MemRead
and MemWrite. Over-specifying these traits is not currently inhibiting
any optimization.
Differential Revision: https://reviews.llvm.org/D94662
This regenerates these tests using utils/update_llc_test_checks.py so
that future changes in this area don't have the noise of lots of `@plt`
lines being added.
I also removed the `nounwind`s from the stack-realignment.ll test to
increase coverage on the generated call frame information.
The number of hardware threads available to a ThreadPool can be limited if setting an affinity mask.
For example:
> start /B /AFFINITY 0xF lld-link.exe ...
Would let LLD only use 4 hyper-threads.
Previously, there was an outstanding issue on Windows Server 2019 on dual-CPU machines, which was preventing from using both CPU sockets. In normal conditions, when no affinity mask was set, ProcessorGroup::AllThreads was different from ProcessorGroup::UsableThreads. The previous code in llvm/lib/Support/Windows/Threading.inc L201 was improperly assuming those two values to be equal, and consequently was limiting the execution to only one CPU socket.
Differential Revision: https://reviews.llvm.org/D92419
Some FP compares expand to a sequence ending with (xor X, 1) to invert the result. If
the consumer is a select_cc we can likely get rid of this xor by fixing
up the select_cc condition.
This patch combines (select_cc (xor X, 1), 0, setne, trueV, falseV) -
(select_cc X, 0, seteq, trueV, falseV) if we can prove X is 0/1.
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D94546
Standard C allows all standard headers to declare macros for all
their functions. So after possibly including any standard header
like <ctype.h>, it's perfectly normal for any and all of the
functions it declares to be defined as macros. Standard C requires
explicit `#undef` before using that identifier in a way that is not
compatible with function-like macro definitions.
The C standard's rules for this are extended to POSIX as well for
the interfaces it defines, and it's the expected norm for
nonstandard extensions declared by standard C library headers too.
So far the only place this has come up for llvm-libc's code is with
the isascii function in Fuchsia's libc. But other cases can arise
for any standard (or common extension) function names that source
code in llvm-libc is using in nonstandard ways, i.e. as C++
identifiers.
The only correct and robust way to handle the possible inclusion of
standard C library headers when building llvm-libc source code is to
use `#undef` explicitly for each identifier before using it. The
easy and obvious place to do that is in the per-function header.
This requires that all code, such as test code, that might include
any standard C library headers, e.g. via utils/UnitTest/Test.h, make
sure to include those *first* before the per-function header.
This change does that for isascii and its test. But it should be
done uniformly for all the code and documented as a consistent
convention so new implementation files are sure to get this right.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D94642
Legacy AIX assembly might not support all extended mnes,
add one feature bit to control the generation in MC,
and avoid generating them by default on AIX.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D94458
This spilts out BufferDeallocationInternals.md, since buffer
deallocation is not part of bufferization per se.
Differential Revision: https://reviews.llvm.org/D94351
TosaToLinalg was depending on its header file indirectly through
Passes.h rather than directly. This removes that indirection.
Differential Revision: https://reviews.llvm.org/D94706
This revision adds a new `replaceOpWithIf` hook that replaces uses of an operation that satisfy a given functor. If all uses are replaced, the operation gets erased in a similar manner to `replaceOp`. DialectConversion support will be added in a followup as this requires adjusting how replacements are tracked there.
Differential Revision: https://reviews.llvm.org/D94632
MCTargetDesc includes headers from Utils and Utils includes headers
from MCTargetDesc. So from a library layering perspective it makes sense
for them to be in the same library. I guess the other option might be to
move the tablegen includes from RISCVMCTargetDesc.h to RISCVBaseInfo.h
so that RISCVBaseInfo.h didn't need to include RISCVMCTargetDesc.h.
Everything else that depends on Utils also depends on MCTargetDesc so
having one library seemed simpler.
Differential Revision: https://reviews.llvm.org/D93168
This generalizes D94647 to IR input, as suggested by @tejohnson.
Ideally the driver should just forward split dwarf options, but doing this currently will cause `clang -gsplit-dwarf -c a.c` to create a .dwo with just `.strtab`.
Reviewed By: dblaikie, tejohnson
Differential Revision: https://reviews.llvm.org/D94655
In the overwhelmingly common case, enum attribute case strings represent valid identifiers in MLIR syntax. This revision updates the format generator to format as a keyword in these cases, removing the need to wrap values in a string. The parser still retains the ability to parse the string form, but the printer will use the keyword form when applicable.
Differential Revision: https://reviews.llvm.org/D94575
This is a variant of TypesMatchWith that provides support for variadic arguments. This is necessary because ranges generally can't use the default operator== comparators for checking equality.
Differential Revision: https://reviews.llvm.org/D94574
An alias query currently works out roughly like this:
* Look up location pair in cache.
* Perform BasicAA logic (including cache lookup and insertion...)
* Perform a recursive query using BestAAResults.
* Look up location pair in cache (and thus do not recurse into BasicAA)
* Query all the other AA providers.
* Query all the other AA providers.
This is a lot of unnecessary work, all ultimately caused by the
BestAAResults query at the end of aliasCheck(). The reason we perform
it, is that aliasCheck() is getting called recursively, and we of
course want those recursive queries to also make use of other AA
providers, not just BasicAA. We can solve this by making the recursive
queries directly use BestAAResults (which will check both BasicAA
and other providers), rather than recursing into aliasCheck().
There are some tradeoffs:
* We can no longer pass through the precomputed underlying object
to aliasCheck(). This is not a major concern, because nowadays
getUnderlyingObject() is quite cheap.
* Results from other AA providers are no longer cached inside
BasicAA. The way this worked was already a bit iffy, in that a
result could be cached, but if it was MayAlias, we'd still end
up re-querying other providers anyway. If we want to cache
non-BasicAA results, we should do that in a more principled manner.
In any case, despite those tradeoffs, this works out to be a decent
compile-time improvment. I think it also simplifies the mental model
of how BasicAA works. It took me quite a while to fully understand
how these things interact.
Differential Revision: https://reviews.llvm.org/D90094
Initial commit to add support for lowering from TOSA to Linalg. The focus is on
the essential infrastructure for these lowerings and integration with existing
passes.
Includes lowerings for a subset of operations including:
abs, add, sub, pow, and, or, xor, left shift, right shift, tanh
Lit tests are used to validate correctness.
Differential Revision: https://reviews.llvm.org/D94247
The intent presumably is to avoid generating 'opaque' in the IR, but the
header contains the filename. Thus, having the workspace in a directory
with opaque in it causes this test to fail.
This just adds a 'CHECK' line on target-triple, which is the last line
of the IR-header.
This patch rename the tablegen generated file ACC.cpp.inc to ACC.inc in order
to match what was done in D92955. This file is included in header file as well as .cpp
file so it make more sense.
Reviewed By: sameeranjoshi
Differential Revision: https://reviews.llvm.org/D93485
In preparation for the inbuilt options parser, this is a minor refactor
of optional components including:
- Putting certain optional elements in the right header files,
according to their function and their dependencies.
- Cleaning up some old and mostly-dead code.
- Moving some functions into anonymous namespaces to prevent symbol
export.
Reviewed By: cryptoad, eugenis
Differential Revision: https://reviews.llvm.org/D94117
The comment said CUDA 9 header files use the `nv_weak` attribute which
`clang` is not yet prepared to handle. It's three years ago and now things have
changed. Based on my test, removing the definition doesn't have any problem on
my machine with CUDA 11.1 installed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94700
Note -x86-use-fsrm-for-memcpy is still disabled by default and there's no
default behavior change.
Differential Revision: https://reviews.llvm.org/D94436
For NVPTX target, OpenMP provides a static library `libomptarget-nvptx`
built by NVCC, and another bitcode `libomptarget-nvptx-sm_{$sm}.bc` generated by
Clang. When compiling an OpenMP program, the `.bc` file will be fed to `clang`
in the second run on the program that compiles the target part. Then the generated
PTX file will be fed to `ptxas` to generate the object file, and finally the driver
invokes `nvlink` to generate the binary, where the static library will be appened
to `nvlink`.
One question is, why do we need two libraries? The only difference is, the static
library contains `omp_data.cu` and the bitcode library doesn't. It's unclear why
they were implemented in this way, but per D94565, there is no issue if we also
include the file into the bitcode library. Therefore, we can safely drop the
static library.
This patch is about the change in OpenMP. The driver will be updated as well if
this patch is accepted.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D94573
With this, we have complete support for emptiness checks. This also paves the way for future support to check if two FlatAffineConstraints are equal.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D94272
When a program maps one of its own modules for reading, and then
crashes, breakpad can emit two entries for that module in the
ModuleList. We have logic to identify this case by checking permissions
on mapped memory regions and report just the module with an executable
region. As currently written, though, the check is asymmetric -- the
entry with the executable region must be the second one encountered for
the preference to kick in.
This change makes the logic symmetric, so that the first-encountered
module will similarly be preferred if it has an executable region but
the second-encountered module does not. This happens for example when
the module in question is the executable itself, which breakpad likes to
report first -- we need to ignore the other entry for that module when
we see it later, even though it may be mapped at a lower virtual
address.
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D94629
Even if we know nothing about LHS, it can still be useful to know that
smax(LHS, RHS) >= RHS and smin(LHS, RHS) <= RHS.
Differential Revision: https://reviews.llvm.org/D87145
Restore control of kernel launch tracing to be >= 1 as it was before
export LIBOMPTARGET_KERNEL_TRACE=1
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D94695