This patch extends affine data copy optimization utility with an
optional memref filter argument. When the memref filter is used, data
copy optimization will only generate copies for such a memref.
Note: this patch is just porting the memref filter feature from Uday's
'hop' branch: https://github.com/bondhugula/llvm-project/tree/hop.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D74342
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.
== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.
This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".
== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).
When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.
When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
Differential Revision: https://reviews.llvm.org/D71775
This pass would currently build, but fail to run when this backend isn't
linked in. On the other hand, we'd like it to initialize only the NVPTX
backend, which isn't possible if we continue to build it without the
backend available. Instead of building a broken configuration, let's
skip building the pass entirely.
Differential Revision: https://reviews.llvm.org/D74592
The commit switching the calling convention for memrefs (5a1778057)
inadvertently introduced a bug in the function argument attribute conversion:
due to incorrect indexing of function arguments it was not assigning the
attributes to the arguments beyond those generated from the first original
argument. This was not caught in the commit since the test suite does have a
test for converting multi-argument functions with argument attributes. Fix the
bug and add relevant tests.
Summary: This revision adds support to the declarative parser for formatting enum attributes in the symbolized form. It uses this new functionality to port several of the SPIRV parsers over to the declarative form.
Differential Revision: https://reviews.llvm.org/D74525
Summary:
This sets the basic framework for lowering vector.contract progressively
into simpler vector.contract operations until a direct vector.reduction
operation is reached. More details will be filled out progressively as well.
Reviewers: nicolasvasilache
Reviewed By: nicolasvasilache
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74520
Implement a pass to convert gpu.launch_func op into a sequence of
Vulkan runtime calls. The Vulkan runtime API surface is huge so currently we
don't expose separate external functions in IR for each of them, instead we
expose a few external functions to wrapper libraries which manages
Vulkan runtime.
Differential Revision: https://reviews.llvm.org/D74549
Summary:
To unblock other work, this implements basic lowering based on mapping
attributes that have to be provided on all loop.parallel. The lowering
does not yet support reduce.
Differential Revision: https://reviews.llvm.org/D73893
Summary:
On some platforms the build fails "std::function is not found". The include is used in
PassManager::IRPrinterConfig::enableIRPrinting.
Differential Revision: https://reviews.llvm.org/D74469
In code generators, one can automate the translation of typed ArrayAttrs
if element attribute translators are already implemented. However, the
type of the element attribute is lost at the construction of
TypedArrayAttrBase. With this change one can inspect the element type
and generate the translation logic automatically, which will reduce the
code repetition.
Differential Revision: https://reviews.llvm.org/D73579
Summary:
As discussed in https://llvm.discourse.group/t/rfc-add-affine-parallel/350, this is the first in a series of patches to bring in support for the `affine.parallel` operation.
This first patch adds the IR representation along with custom printer/parser implementations.
Reviewers: bondhugula, herhut, mehdi_amini, nicolasvasilache, rriddle, earhart, jbruestle
Reviewed By: bondhugula, nicolasvasilache, rriddle, earhart, jbruestle
Subscribers: jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74288
This patch adapts the method MemRefDescriptor::fromStaticShape to
support static non-zero offsets. The updated method uses the
getStridesAndOffset method to extract strides and offset. The patch also
adapts the test cases since sizes and strides are now set in forward
instead of reverse order.
Differential Revision: https://reviews.llvm.org/D74474
This revision prepares the ground for declaratively defining Linalg "named" ops.
Such named ops form the backbone of operations that are ubiquitous in the ML
application domain.
This revision closely related to the definition of a "Tensor Computation
Primitives Dialect" and demonstrates that ops can be expressed as declarative
configurations of the `linalg.generic` op.
Differential Revision: https://reviews.llvm.org/D74491
Summary:
This revision allows model builder to create a linalg_matmul whose body
is a vector.contract. This shows the abstractions compose nicely.
Differential Revision: https://reviews.llvm.org/D74457
Summary: This was a missed case when ValueRange was originally added, and allows for constructing a ValueRange from the arguments of a block.
Differential Revision: https://reviews.llvm.org/D74363
A memref_cast casting to a memref with a non identity map can't be
lowered to llvm. Take the following case:
```
func @invalid_memref_cast(%arg0: memref<?x?xf64>) {
%c1 = constant 1 : index
%c0 = constant 0 : index
%5 = memref_cast %arg0 : memref<?x?xf64> to memref<?x?xf64, #map1>
%25 = std.subview %5[%c0, %c0][%c1, %c1][] : memref<?x?xf64, #map1> to memref<?x?xf64, #map1>
return
}
```
When lowering the subview mlir was assuming `%5` to have an llvm type
(which is not the case as mlir failed to lower the memref_cast).
Differential Revision: https://reviews.llvm.org/D74466
Summary:
This was broken recently when moving from dialect registration via
static initializers to explicit intialization.
Differential Revision: https://reviews.llvm.org/D74480
Thus far we have been using builtin func op to model SPIR-V functions.
It was because builtin func op used to have special treatment in
various parts of the core codebase (e.g., pass pipelines, etc.) and
it's easy to bootstrap the development of the SPIR-V dialect. But
nowadays with general op concepts and region support we don't have
such limitations and it's time to tighten the SPIR-V dialect for
completeness.
This commits introduces a spv.func op to properly model SPIR-V
functions. Compared to builtin func op, it can provide the following
benefits:
* We can control the full op so we can integrate SPIR-V information
bits (e.g., function control) in a more integrated way and define
our own assembly form and enforcing better verification.
* We can have a better dialect and library boundary. At the current
moment only functions are modelled with an external op. With this
change, all ops modelling SPIR-V concpets will be spv.* ops and
registered to the SPIR-V dialect.
* We don't need to special-case func op anymore when creating
ConversionTarget declaring SPIR-V dialect as legal. This is quite
important given we'll see more and more conversions in the future.
In the process, bumps a few FuncOp methods to the FunctionLike trait.
Differential Revision: https://reviews.llvm.org/D74226
In the previous state, we were relying on forcing the linker to include
all libraries in the final binary and the global initializer to self-register
every piece of the system. This change help moving away from this model, and
allow users to compose pieces more freely. The current change is only "fixing"
the dialect registration and avoiding relying on "whole link" for the passes.
The translation is still relying on the global registry, and some refactoring
is needed to make this all more convenient.
Differential Revision: https://reviews.llvm.org/D74461
* Rename CMake target MLIROptMain to MLIROptLib:
The target provides the main library
* Rename CMake target MLIRMlirOptLib to MLIRMlirOptMain:
The target provides the main() entry function
At the moment, the Bazel configuration of TenorFlow maps the target
MlirOptLib to "lib/Support/MlirOptMain.cpp" and MlirOptMain to
"tools/mlir-opt/mlir-opt.cpp". This is the other way around in the CMake
configuration. As discussed in the context of the pull request
https://github.com/tensorflow/tensorflow/pull/36301, it seems useful to
revise the naming in the MLIR repo.
Differential Revision: https://reviews.llvm.org/D73778
Summary:
Adds affine loop fusion transformation function to LoopFusionUtils.
Updates TestLoopFusion utility to run loop fusion transformation until a fixed point is reached.
Adds unit tests to test the transformation.
Includes ASAN bug fix for D73190.
Reviewers: bondhugula, dcaballe
Reviewed By: bondhugula, dcaballe
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74330
Follow-up on D72802. Turn -convert-std-to-llvm-use-alloca and
-convert-std-to-llvm-bare-ptr-memref-call-conv into pass flags
of LLVMLoweringPass.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D73912
Defines a tablegen class RankedIntElementsAttr. This is an integer
version of RankedFloatElementsAttr.
Differential Revision: https://reviews.llvm.org/D73764
Summary:
The lowering to NVVM and ROCm handles tanh operations differently by
mapping them to NVVM/ROCm specific intrinsics. This conflicts with
the lowering to LLVM, which uses the default llvm intrinsic. This change
declares the LLVM intrinsics to be illegal, hence disallowing the
correspondign rewrite.
Differential Revision: https://reviews.llvm.org/D74389
We have spv.entry_point_abi for specifying the local workgroup size.
It should be decorated onto input gpu.func ops to drive the SPIR-V
CodeGen to generate the proper SPIR-V module execution mode. Compared
to using command-line options for specifying the configuration, using
attributes also has the benefits that 1) we are now able to use
different local workgroup for different entry points and 2) the
tests contains the configuration directly.
Differential Revision: https://reviews.llvm.org/D74012
Summary:
After D72555 has been landed, `linalg.indexed_generic` also accepts ranked
tensor as input and output. Add a test for it.
Differential Revision: https://reviews.llvm.org/D74267
Summary:
This revision adds EDSC support for VectorOps to enable the creation of a `vector_matmul` declaratively. The `vector_matmul` is a simple configuration
of the `vector.contract` op that follows the StructuredOps abstraction.
Differential Revision: https://reviews.llvm.org/D74284
This CL refactors EDSCs to layer them better and break unnecessary
dependencies. After this refactoring, the top-level EDSC target only
depends on IR but not on Dialects anymore and each dialect has its
own EDSC directory.
This simplifies the layering and breaks cyclic dependencies.
In particular, the declarative builder + folder are made explicit and
are now confined to Linalg.
As the refactoring occurred, certain classes and abstractions that were not
paying for themselves have been removed.
Differential Revision: https://reviews.llvm.org/D74302