Fix bugs relating to support for characters of different kinds. Lowering
was creating bad FIR and MLIR that crashed in conversion to LLVM IR.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128723
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
This attribute is similar to DenseElementsAttr but does not support
splat. As such it has a much simpler API and does not need any smart
iterator: it exposes direct ArrayRef access.
A new syntax is introduced so that the generic printing/parsing looks
like:
[:i64 1, -2, 3]
This attribute beings like an ArrayAttr but has a `:` token after the
opening square brace to introduce the element type (supported are I8,
I16, I32, I64, F32, F64) and the comma separated list for the data.
This is particularly convenient for attributes intended to be small,
like those referring to shapes.
For example a `transpose` operation with a `dims` attribute could be
defined as such:
let arguments = (ins AnyTensor:$input, DenseI64ArrayAttr:$dims);
let assemblyFormat = "$input `dims` `=` $dims attr-dict : type($input)";
And printed this way (the element type is elided in this case):
transpose %input dims = [0, 2, 1] : tensor<2x3x4xf32>
The C++ API for dims would just directly return an ArrayRef<int64>
RFC: https://discourse.llvm.org/t/rfc-introduce-a-new-dense-array-attribute/63279
Recommit with a custom DenseArrayBaseAttrStorage class to ensure
over-alignment of the storage to the largest type.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D123774
For the rapid triage push, just add a TODO for the degenerate POINTER
assignment case. The LHD ought to be a variable of type !fir.box, but it
is currently returning a shadow variable for the raw data pointer. More
investigation is needed there.
Make sure that conversions are applied in FORALL degenerate contexts.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128724
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Add lowering tests left behind during the upstreaming.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128721
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
The gold linker veneers are written between functions without symbols,
so we to handle it specially in BOLT.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D128082
Migrate extractelement, insertelement and shufflevector to use the
FoldXYZ rather than CreateXYZ APIs.
This is probably NFC in practice, because the places using
InstSimplifyFolder probably aren't using vector operations.
It makes sense to handle byval promotion in the same way as non-byval
but also allowing `store` instructions. However, these should
use the same checks as the `load` instructions do, i.e. be part of the
`ArgsToPromote` collection. For these instructions, the check for
interfering modifications can be disabled, though. The promotion
algorithm itself has been modified a lot: all the accesses (i.e. loads
and stores) are rewritten to the emitted `alloca` instructions. To
optimize these new `alloca`s out, the `PromoteMemToReg` function from
`Transforms/Utils/PromoteMemoryToRegister.cpp` file is invoked after
promotion.
In order to let the `PromoteMemToReg` promote as many `alloca`s as it
is possible, there should be no `GEP`s from the `alloca`s. To
eliminate the `GEP`s, its own `alloca` is generated for every argument
part because a single `alloca` for the whole argument (that
significantly simplifies the code of the pass though) unfortunately
cannot be used.
The idea comes from the following discussion:
https://reviews.llvm.org/D124514#3479676
Differential Revision: https://reviews.llvm.org/D125485
This attribute is similar to DenseElementsAttr but does not support
splat. As such it has a much simpler API and does not need any smart
iterator: it exposes direct ArrayRef access.
A new syntax is introduced so that the generic printing/parsing looks
like:
[:i64 1, -2, 3]
This attribute beings like an ArrayAttr but has a `:` token after the
opening square brace to introduce the element type (supported are I8,
I16, I32, I64, F32, F64) and the comma separated list for the data.
This is particularly convenient for attributes intended to be small,
like those referring to shapes.
For example a `transpose` operation with a `dims` attribute could be
defined as such:
let arguments = (ins AnyTensor:$input, DenseI64ArrayAttr:$dims);
let assemblyFormat = "$input `dims` `=` $dims attr-dict : type($input)";
And printed this way (the element type is elided in this case):
transpose %input dims = [0, 2, 1] : tensor<2x3x4xf32>
The C++ API for dims would just directly return an ArrayRef<int64>
RFC: https://discourse.llvm.org/t/rfc-introduce-a-new-dense-array-attribute/63279
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D123774
opportunities
There are straight forward splat load opportunities blocked by
getNormalLoadInput(), since those cases involve consecutive bitcasts.
Improve by looking through bitcasts.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D128703
This was previous implemented as part of the BufferizableOpInterface of ForEachThreadOp. Moving the implementation to ParallelInsertSliceOp to be consistent with the remaining ops and to have a nice example op that can serve as a blueprint for other ops.
Differential Revision: https://reviews.llvm.org/D128666
Add basic canonicalization for consecutive complex.add and sub operations.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D128702
This patch moves the code for recipe implementations to a separate file.
The benefits are:
* Keep VPlan.cpp smaller => faster compile-time during parallel builds.
* Keep code for logical units together
As a follow-up I am also planning on moving all ::execute
implemetnations from LoopVectorize.cpp over to the new file, which
should help to reduce the size of the file a bit.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D127965
This helps ISel decompose the generic offset for the tile into a base + offset.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D128508
When the SME feature is enabled we also gain access to a few extra
SVE2 instructions. This patch adds LLVM IR intrinsics to make use
of these new instructions:
@llvm.aarch64.sve.psel
@llvm.aarch64.sve.revd
@llvm.aarch64.sve.sclamp
@llvm.aarch64.sve.uclamp
Differential Revision: https://reviews.llvm.org/D128332
This patch is a subpart of D125768 intented to make the review easier.
This patch introduces the same algorithms as in `libc/src/string/memory_utils/elements.h` but using the new API.
Differential Revision: https://reviews.llvm.org/D128335
This implements an autoupgrade from constant expressions to
instructions, which is needed for
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
The basic approach is that constant expressions (CST_CODE_CE_*
records) now initially only create a BitcodeConstant value that
holds opcode, flags and operands IDs. Then, when the value actually
gets used, it can be converted either into a constant expression
(if that expression type is still supported) or into a sequence of
instructions. As currently all expressions are still supported,
-expand-constant-exprs is added for testing purposes, to force
expansion.
PHI nodes require special handling, because the constant expression
needs to be evaluated on the incoming edge. We do this by putting
it into a temporary block and then wiring it up appropriately
afterwards (for non-critical edges, we could also move the
instructions into the predecessor).
This also removes the need for the forward referenced constants
machinery, as the BitcodeConstants only hold value IDs. At the
point where the value is actually materialized, no forward
references are needed anymore.
Differential Revision: https://reviews.llvm.org/D127729
These intrinsics should be able to use opaque pointers, because the
load/store type is already encoded in their names and return/operand type.
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D128505
This patch adds the following intrinsics to support the SME ACLE:
* @llvm.aarch64.sme.mopa: Non-widening outer product + accumulate
* @llvm.aarch64.sme.mops: Non-widening outer product + subtract
* @llvm.aarch64.sme.mopa.wide: Widening outer product + accumulate
* @llvm.aarch64.sme.mops.wide: Widening outer product + subtract
* @llvm.aarch64.sme.smopa.wide: Widening signed sum of outer product + accumulate
* @llvm.aarch64.sme.smops.wide: Widening signed sum of outer product + subtract
* @llvm.aarch64.sme.umopa.wide: Widening unsigned sum of outer product + accumulate
* @llvm.aarch64.sme.umops.wide: Widening unsigned sum of outer product + subtract
* @llvm.aarch64.sme.sumopa.wide: Widening signed by unsigned sum of outer product + accumulate
* @llvm.aarch64.sme.sumops.wide: Widening signed by unsigned sum of outer product + subtract
* @llvm.aarch64.sme.usmopa.wide: Widening unsigned by signed sum of outer product + accumulate
* @llvm.aarch64.sme.usmops.wide: Widening unsigned by signed sum of outer product + subtract
Differential Revision: https://reviews.llvm.org/D127956
This removes the extractvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
extractvalue is already not supported in bitcode, so we do not need
to worry about bitcode auto-upgrade.
Uses of ConstantExpr::getExtractValue() should be replaced with
IRBuilder::CreateExtractValue() (if the fact that the result is
constant is not important) or ConstantFoldExtractValueInstruction()
(if it is). Though for this particular case, it is also possible
and usually preferable to use getAggregateElement() instead.
The C API function LLVMConstExtractValue() is removed, as the
underlying constant expression no longer exists. Instead,
LLVMBuildExtractValue() should be used (which will constant fold
or create an instruction). Depending on the use-case,
LLVMGetAggregateElement() may also be used instead.
Differential Revision: https://reviews.llvm.org/D125795
A tile slice offset of '0' is the default and by moving this into
SelectSMETileSlice we can remove some redundant patterns.
Reviewed By: kmclaughlin
Differential Revision: https://reviews.llvm.org/D128506
`commonAlignment` is a shortcut to pick the smallest of two `Align`
objects. As-is it doesn't bring much value compared to `std::min`.
Differential Revision: https://reviews.llvm.org/D128345
Also make the output of -emit-ast end up where /o points.
The same with .plist files from the static analyzer.
These are changes needed to make it possible to do CTU static
analysing work with clang-cl.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D128409
Before D126061, Clang would warn about this code
```
struct X {
[[deprecated]] struct Y {};
};
```
with the warning
attribute 'deprecated' is ignored, place it after "struct" to apply attribute to type declaration
D126061 inadvertently caused this warning to no longer be emitted. This patch
restores the previous behavior.
The reason for the bug is that after D126061, C++11 attributes applied to a
member declaration are no longer placed in `DS.getAttributes()` but are instead
tracked in a separate list (`DeclAttrs`). In the case of a free-standing
decl-specifier-seq, we would simply ignore the contents of this list. Instead,
we now pass the list on to `Sema::ParsedFreeStandingDeclSpec()` so that it can
issue the appropriate warning.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D128499
This is the followup patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Before the promotion and merging of the context is based on the SampleContext(the array of frame), this causes a lot of cost to the memory. This patch detaches the tracker from using the array ref instead to use the context trie itself. This can save a lot of memory usage and benefit both the compiler's CS inliner and llvm-profgen's pre-inliner.
One structure needs to be specially treated is the `FuncToCtxtProfiles`, this is used to get all the functionSamples for one function to do the merging and promoting. Before it search each functions' context and traverse the trie to get the node of the context. Now we don't have the context inside the profile, instead we directly use an auxiliary map `ProfileToNodeMap` for profile , it initialize to create the FunctionSamples to TrieNode relations and keep updating it during promoting and merging the node.
Moreover, I was expecting the results before and after remain the same, but I found that the order of FuncToCtxtProfiles matter and affect the results. This can happen on recursive context case, but the difference should be small. Now we don't have the context, so I just used a vector for the order, the result is still deterministic.
Measured on one huge size(12GB) profile from one of our internal service. The profile similarity difference is 99.999%, and the running time is improved by 3X(debug mode) and the memory is reduced from 170GB to 90GB.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D127031
Our investigation showed ProfileMap's key is the bottleneck of the memory consumption for CS profile generation on some large services. This patch tries to optimize it by storing the CS function samples using the context trie tree structure instead of the context frame array ref. Parts of code in `ContextTrieNode` are reused.
Our experiment on one internal service showed that the context key's memory can be reduced from 80GB to 300MB.
To be compatible with non-CS profiles, the profile writer still needs to use ProfileMap as input, so rebuild the ProfileMap using the context trie in `postProcessProfiles`.
The optimization is not complete yet, next step is to reimplement Pre-inliner or profile trimmer, after that, ProfileMap should be small to be written.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D125246
We already remove dots from collected paths and path mappings. This
makes it difficult to match paths inside the profile which contain
dots. For example, we would never match /path/to/../file.c because
the collected path is always be normalized to /path/file.c. This
change enables dot removal for paths inside the profile to address
the issue.
Differential Revision: https://reviews.llvm.org/D122750
Adding the accumulator value after the `vector.contract` changes the
precision of the operation. This makes sure the accumulator is carried
through to `vector.reduce` (and down to LLVM).
Differential Revision: https://reviews.llvm.org/D128674
This is another attempt to land this patch.
The patch proposed to use a new cost model for loop interchange,
which is obtained from loop cache analysis.
Given a loopnest, what loop cache analysis returns is a vector of
loops [loop0, loop1, loop2, ...] where loop0 should be replaced as
the outermost loop, loop1 should be placed one more level inside, and
loop2 one more level inside, etc. What loop cache analysis does is not
only more comprehensive than the current cost model, it is also a "one-shot"
query which means that we only need to query it once during the entire
loop interchange pass, which is better than the current cost model where
we query it every time we check whether it is profitable to interchange
two loops. Thus complexity is reduced, especially after D120386 where we
do more interchanges to get the globally optimal loop access pattern.
Updates made to test cases are mostly minor changes and some
corrections. One change that applies to all tests is that we added an option
`-cache-line-size=64` to the RUN lines. This is ensure that loop
cache analysis receives a valid number of cache line size for correct
analysis. Test coverage for loop interchange is not reduced.
Currently we did not completely remove the legacy cost model, but
keep it as fall-back in case the new cost model did not run successfully.
This is because currently we have some limitations in delinearization, which
sometimes makes loop cache analysis bail out. The longer term goal is to
enhance delinearization and eventually remove the legacy cost model
compeletely.
Reviewed By: bmahjour, #loopoptwg
Differential Revision: https://reviews.llvm.org/D124926
Including the following opcode:
Select_FPR16_Using_CC_GPR
Select_FPR32_Using_CC_GPR
Select_FPR64_Using_CC_GPR
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D127871
For example, when parsing Zbpbo0p911, an error will be reported:
"multi-character extensions must be separated by underscores"
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D128644