Documentation for #pragma clang loop directive and options vectorize and interleave.
Reviewed by: Aaron Ballman and Dmitri Gribenko
llvm-svn: 211135
parent a95ebc6801
commit db2668a1da

@@ -1764,3 +1764,68 @@ The ``container`` function is also in the region and will not be optimized, but
it causes the instantiation of ``twice`` and ``thrice`` with an ``int`` type; of
these two instantiations, ``twice`` will be optimized (because its definition
was outside the region) and ``thrice`` will not be optimized.

Extensions for loop hint optimizations
======================================

The ``#pragma clang loop`` directive specifies hints for optimizing the
subsequent ``for``, ``while``, ``do-while``, or C++11 range-based ``for`` loop.
The directive provides options for vectorization and interleaving. Loop hints
can be specified before any loop and are ignored if the optimization is not
safe to apply.
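
As a rough sketch of where the directive may appear, the functions below place a
hint directly before each of the four loop forms listed above. The function
names are hypothetical and chosen only for illustration; the hints are requests
and do not change what the functions compute.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers; only the placement of the pragma matters here.

// Hint before a classic for loop.
int sum_for(const std::vector<int> &v) {
  int sum = 0;
#pragma clang loop vectorize(enable)
  for (std::size_t i = 0; i < v.size(); ++i)
    sum += v[i];
  return sum;
}

// Hint before a while loop.
int sum_while(const std::vector<int> &v) {
  int sum = 0;
  std::size_t i = 0;
#pragma clang loop vectorize(enable)
  while (i < v.size()) {
    sum += v[i];
    ++i;
  }
  return sum;
}

// Hint before a do-while loop (the caller must pass a non-empty vector).
int sum_do_while(const std::vector<int> &v) {
  assert(!v.empty());
  int sum = 0;
  std::size_t i = 0;
#pragma clang loop interleave(enable)
  do {
    sum += v[i];
    ++i;
  } while (i < v.size());
  return sum;
}

// Hint before a C++11 range-based for loop.
int sum_range(const std::vector<int> &v) {
  int sum = 0;
#pragma clang loop interleave(enable)
  for (int x : v)
    sum += x;
  return sum;
}
```

All four functions return the same sum for the same input; whether any of the
loops is actually vectorized or interleaved is left to the compiler.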

A vectorized loop performs multiple iterations of the original loop
in parallel using vector instructions. The instruction set of the target
processor determines which vector instructions are available and their vector
widths. This restricts the types of loops that can be vectorized. The vectorizer
automatically determines whether the loop is safe and profitable to vectorize. A
vector instruction cost model is used to select the vector width.

Interleaving multiple loop iterations allows modern processors to further
improve instruction-level parallelism (ILP) using advanced hardware features,
such as multiple execution units and out-of-order execution. The vectorizer uses
a cost model that depends on the register pressure and generated code size to
select the interleave count.

Vectorization is enabled by ``vectorize(enable)`` and interleaving is enabled
by ``interleave(enable)``. This is useful for manually enabling vectorization or
interleaving when compiling with ``-Os``.

.. code-block:: c++

  #pragma clang loop vectorize(enable)
  #pragma clang loop interleave(enable)
  for(...) {
    ...
  }
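
A minimal, compilable version of the schematic loop above might look like the
following. The ``saxpy`` function and its parameters are hypothetical, picked
only because an element-wise update is a common vectorization candidate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: y[i] += a * x[i] for every element.
void saxpy(float a, const std::vector<float> &x, std::vector<float> &y) {
  assert(x.size() == y.size());
  const float *xs = x.data();
  float *ys = y.data();
  const std::size_t n = x.size();
  // Ask for both optimizations; the compiler still decides whether
  // applying them is safe.
#pragma clang loop vectorize(enable)
#pragma clang loop interleave(enable)
  for (std::size_t i = 0; i < n; ++i)
    ys[i] += a * xs[i];
}
```

Compilers that do not recognize the pragma typically ignore it, possibly with a
warning, so the function behaves the same either way.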

The vector width is specified by ``vectorize_width(_value_)`` and the interleave
count is specified by ``interleave_count(_value_)``, where
``_value_`` is a positive integer. This is useful for specifying the optimal
width/count for the set of target architectures supported by your application.

.. code-block:: c++

  #pragma clang loop vectorize_width(2)
  #pragma clang loop interleave_count(2)
  for(...) {
    ...
  }
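
One way to read the width hint, sketched under an assumption that is not stated
in this text: on a target with 128-bit vector registers, two 64-bit elements fit
in one vector, so a width of 2 would be a natural request. The ``dot`` function
below is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical integer dot product over 64-bit elements.
std::int64_t dot(const std::vector<std::int64_t> &a,
                 const std::vector<std::int64_t> &b) {
  assert(a.size() == b.size());
  std::int64_t sum = 0;
  // Request 2 elements per vector operation and 2 interleaved iterations.
  // The 128-bit register assumption behind these values is illustrative.
#pragma clang loop vectorize_width(2)
#pragma clang loop interleave_count(2)
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += a[i] * b[i];
  return sum;
}
```

The values are hints only; the result of ``dot`` does not depend on them.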

Specifying a width or count of 1 disables the corresponding optimization and is
equivalent to ``vectorize(disable)`` or ``interleave(disable)``, respectively.
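
The two spellings can be put side by side as follows. Both functions are
hypothetical and differ only in how the request to skip vectorization and
interleaving is written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: copy the first n elements, disabling via width/count of 1.
void copy_prefix_width1(const std::vector<int> &src, std::vector<int> &dst,
                        std::size_t n) {
  assert(n <= src.size() && n <= dst.size());
#pragma clang loop vectorize_width(1) interleave_count(1)
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i];
}

// Hypothetical: the same loop, disabling via the explicit keywords.
void copy_prefix_disabled(const std::vector<int> &src, std::vector<int> &dst,
                          std::size_t n) {
  assert(n <= src.size() && n <= dst.size());
#pragma clang loop vectorize(disable) interleave(disable)
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i];
}
```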

For convenience, multiple loop hints can be specified on a single line.

.. code-block:: c++

  #pragma clang loop vectorize_width(4) interleave_count(8)
  for(...) {
    ...
  }

If an optimization cannot be applied, any hints that apply to it will be ignored.
For example, the hint ``vectorize_width(4)`` is ignored if the loop is not
proven safe to vectorize. To identify and diagnose optimization issues, use the
``-Rpass``, ``-Rpass-missed``, and ``-Rpass-analysis`` command line options. See
the user guide for details.
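
To see how a hint and the remark flags might be used together, consider the
sketch below. The function ``add_offset``, the file name ``example.cpp``, and
the pass name ``loop-vectorize`` passed to the flags are assumptions made for
illustration and are not taken from this text. Because ``dst`` and ``src`` are
raw pointers that could overlap, the compiler may have to reason about aliasing
before honoring the hint, which is the kind of decision the remarks report.

```cpp
#include <cassert>
#include <cstddef>

// Possible build line (file name and pass name are illustrative):
//   clang++ -O2 -c example.cpp -Rpass=loop-vectorize \
//       -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize

// Hypothetical: dst[i] = src[i] + offset for the first n elements.
void add_offset(int *dst, const int *src, std::size_t n, int offset) {
  assert(dst != nullptr && src != nullptr);
  // Both hints on a single line; they are ignored if the loop cannot be
  // shown safe to vectorize.
#pragma clang loop vectorize_width(4) interleave_count(8)
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i] + offset;
}
```

The exact wording of the emitted remarks depends on the compiler version, so
none is reproduced here.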

@@ -97,6 +97,14 @@ passes via three new flags: `-Rpass`, `-Rpass-missed` and `-Rpass-analysis`.
These flags take a POSIX regular expression which indicates the name
of the pass (or passes) that should emit optimization remarks.

New Pragmas in Clang
--------------------

Loop optimization hints can be specified using the new `#pragma clang loop`
directive just prior to the desired loop. The directive allows vectorization
and interleaving to be enabled or disabled, and the vector width and interleave
count to be manually specified. See language extensions for details.

C Language Changes in Clang
---------------------------

@@ -1812,5 +1812,5 @@ def LoopHint : Attr {
}
}];

-let Documentation = [Undocumented];
+let Documentation = [LoopHintDocs];
}

@@ -1012,3 +1012,14 @@ This attribute is incompatible with the ``always_inline`` attribute.
}];
}

def LoopHintDocs : Documentation {
let Category = DocCatStmt;
let Content = [{
The ``#pragma clang loop`` directive allows loop optimization hints to be
specified for the subsequent loop. The directive allows vectorization
and interleaving to be enabled or disabled, and the vector width and interleave
count to be manually specified. See `language extensions
<http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations>`_
for details.
}];
}