This patch adds further optimization techniques to RVV BUILD_VECTOR
lowering. It teaches the compiler to find splats of larger vector
element types "hidden" in smaller ones. For example, a v4i8 build_vector
(0x1, 0x2, 0x1, 0x2) could be splat as v2i16 0x0201. This is generally
more optimal than the dominant-element BUILD_VECTORs and so takes
priority.
This optimization is currently limited to all-constant-or-undef
BUILD_VECTORs as those were found to be the most common. There's no
reason this couldn't be extended to other BUILD_VECTORs, but the
additional bit-manipulation instructions may require more sophisticated
heuristics.
There are some cases where the materialization of the larger constant
takes more scalar instructions than it does to build the vector with
vector instructions. We could add heuristics to try and catch this.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99195
Until AVX512 we don't have any vector truncation instructions, and always lower using shuffles instead.
combineVectorTruncation performs this earlier than lowering as it makes it easier to use any sign/zero-extended bits in the truncated bits with PACKSS/PACKUS to perform the shuffle.
We currently don't attempt to use combineVectorTruncation on AVX2 targets as in the past 256-bit PACKSS/PACKUS tended to cause 128-bit lane shuffle regressions - but these should now be all resolved with combineHorizOpWithShuffle and in all cases we now reduce the amount of cross-lane shuffling and variable shuffle mask usage.
Differential Revision: https://reviews.llvm.org/D96609
LowerVSETCC calls splitIntVSETCC after canonicalizing certain patterns, in particular (X & CPow2 != 0) -> (X & CPow2 == CPow2).
Unfortunately if we're splitting for AVX1/non-AVX512BW cases, we lose these canonicalizations as we call the split with the original SetCC node, and when the split nodes are later lowered in LowerVSETCC the patterns are lost behind extract_subvector etc. But if we pass the canonicalized operands for splitting we retain the optimizations.
Differential Revision: https://reviews.llvm.org/D99256
Commit
f7f9f94b2e
changed the indent of ObjC method arguments from +4 to +2, if the method
occurs after a block statement. I believe this was unintentional and there
was insufficient ObjC test coverage to catch this.
Example: `clang-format -style=google test.mm`
before:
```
void aaaaaaaaaaaaaaaaaaaaa(int c) {
if (c) {
f();
}
[dddddddddddddddddddddddddddddddddddddddddddddddddddddddd
eeeeeeeeeeeeeeeeeeeeeeeeeeeee:^(fffffffffffffff gggggggg) {
f(SSSSS, c);
}];
}
```
after:
```
void aaaaaaaaaaaaaaaaaaaaa(int c) {
if (c) {
f();
}
[dddddddddddddddddddddddddddddddddddddddddddddddddddddddd
eeeeeeeeeeeeeeeeeeeeeeeeeeeee:^(fffffffffffffff gggggggg) {
f(SSSSS, c);
}];
}
```
Differential Revision: https://reviews.llvm.org/D99063
In case an operation in a global initializer region refers to another
global variable defined afterwards in the module of itself, translation
to LLVM IR was currently crashing because it could not find the LLVM IR global
when going through the initializer block.
To solve this problem, split global conversion to LLVM IR into two passes. A
first pass that creates LLVM IR global variables, and a second one that converts
the initializer, if any, and adds it to the llvm global.
Differential Revision: https://reviews.llvm.org/D99246
This safeguards against cases if some of the env vars contain chars
that are problematic for shells, e.g. if called with --env "X=Y;Z".
(In cases of cross testing for windows, the PATH variable can end up
specified with semicolon separators - even if specifying a PATH when
cross testing in such differing environments might not make sense or
do anything - but this makes ssh.py not break on such a variable.)
Differential Revision: https://reviews.llvm.org/D99242
Don't run the 'tar' tool in a cleared environment with only the
LANG variable set, just set LANG on top of the existing environment.
If the 'tar' tool is an MSYS based tool, running it in a Windows
Container hangs if all environment variables are cleared - in
particular, the USERPROFILE variable needs to be kept intact.
This is the same issue fixed as was fixed in other places in
9de63b2e05, but contrary to running
the actual tests, running with an as-cleared-as-possible environment
here is less important.
Differential Revision: https://reviews.llvm.org/D99304
Summary: Add -fno-split-stack and rename CC1 option from `-split-stacks`
to `-fsplit-stack`.
Test Plan: check-all
Differential Revision: https://reviews.llvm.org/D99245
This may occur when swifterror codegen in the translator generates these,
but we shouldn't try to handle them since they should have regclasses anyway.
rdar://75784009
Differential Revision: https://reviews.llvm.org/D99287
This implements various idioms using ctlz/cttz like Log2, Log2_Ceil,
findFirstSetBit, etc.
Some of these demonstrate that we fail to use clzw because the
idiom breaks the isel patterns we use. The isel pattern we use
is (add (cttz (and X, 0xffffffff)), -32). Some of the idioms
cause the constant on the add to be different.
In particular for Graph Regions, the terminator needs is just a
historical artifact of the generalization of MLIR from CFG region.
Operations like Module don't need a terminator, and before Module
migrated to be an operation with region there wasn't any needed.
To validate the feature, the ModuleOp is migrated to use this trait and
the ModuleTerminator operation is deleted.
This patch is likely to break clients, if you're in this case:
- you may iterate on a ModuleOp with `getBody()->without_terminator()`,
the solution is simple: just remove the ->without_terminator!
- you created a builder with `Builder::atBlockTerminator(module_body)`,
just use `Builder::atBlockEnd(module_body)` instead.
- you were handling ModuleTerminator: it isn't needed anymore.
- for generic code, a `Block::mayNotHaveTerminator()` may be used.
Differential Revision: https://reviews.llvm.org/D98468
The objc_debug_isa_class_mask magic value that the objc runtime vends
is now initialized using a static initializer instead of a constant
value. The runtime plugin itself will be initialized before the value
is computed and as a result, the cache will get the wrong value.
Making the creation of the NonPointerIsaCache fully lazy fixes this.
Before the conversion to LLVM-IR dialect and ultimately LLVM IR, FIR is
partially rewritten into a codegen form. This patch adds that pass, the
fircg dialect, and the small set of Ops in the fircg (sub) dialect.
Fircg is not part of the FIR dialect and should never be used outside of
the (closed) conversion to LLVM IR.
Authors: Eric Schweitz, Jean Perier, Rajan Walia, et.al.
Differential Revision: https://reviews.llvm.org/D98063
Lowering of bitwise_not to linalg dialect using a xor operation with a constant
of all-bits-one.
Differential Revision: https://reviews.llvm.org/D99221
This implements a subset of the initial set of inference rules proposed in the llvm-dev thread "RFC: Decomposing deref(N) into deref(N) + nofree". The nolias one got moved to a separate review as there was some concerns raised which require further discussion.
Differential Revision: https://reviews.llvm.org/D99135
With cost-benefit analysis for inlining, we bypass the cost-threshold by returning inline result from call analyzer early.
However the cost and threshold are still available from call analyzer, and when cost is actually higher than threshold, we incorrect set the reason.
The change makes the decision from cost-benefit analysis explicit. It's mostly NFC, except that it allows the priority-based sample loader inliner used by CSSPGO to use cost-benefit heuristic.
Differential Revision: https://reviews.llvm.org/D99302
```
Warn when a function pointer is cast to an incompatible function
pointer. In a cast involving function types with a variable argument
list only the types of initial arguments that are provided are
considered. Any parameter of pointer-type matches any other
pointer-type. Any benign differences in integral types are ignored, like
int vs. long on ILP32 targets. Likewise type qualifiers are ignored. The
function type void (*) (void) is special and matches everything, which
can be used to suppress this warning. In a cast involving pointer to
member types this warning warns whenever the type cast is changing the
pointer to member type. This warning is enabled by -Wextra.
```
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D97831
This patch enables the cost-benefit-analysis-based inliner by default
if we have instrumentation profile.
- SPEC CPU 2017 shows a 0.4% improvement.
- An internal large benchmark shows a 0.9% reduction in the cycle
count along with 14.6% reduction in the number of call instructions
executed.
Differential Revision: https://reviews.llvm.org/D98213
This reverts commit aae84b8e39.
The chromium goma folks want to use a Debian sysroot without
lib/x86_64-linux-gnu to perform `clang -c` but no link action. The previous
commit has removed D.getVFS().exists check to make such usage work.
Not only can this save unneeded filesystem stats, it can make `clang
--sysroot=/path/to/debian-sysroot -c a.cc` work (get `-internal-isystem
$sysroot/usr/include/x86_64-linux-gnu`) even without `lib/x86_64-linux-gnu/`.
This should make thakis happy.
For such op chains, we can create new linalg.fill ops
with the result type of the linalg.tensor_reshape op.
Differential Revision: https://reviews.llvm.org/D99116
init tensor operands also has indexing map and generally follow
the same constraints we expect for non-init-tensor operands.
Differential Revision: https://reviews.llvm.org/D99115
This commit exposes an option to the pattern
FoldWithProducerReshapeOpByExpansion to allow
folding unit dim reshapes. This gives callers
more fine-grained controls.
Differential Revision: https://reviews.llvm.org/D99114
This identifies a pattern where the producer affine min/max op
is bound to a dimension/symbol that is used as a standalone
expression in the consumer affine op's map. In that case the
producer affine min/max op can be merged into its consumer.
For example, a pattern like the following:
```
%0 = affine.min affine_map<()[s0] -> (s0 + 16, s0 * 8)> ()[%sym1]
%1 = affine.min affine_map<(d0)[s0] -> (s0 + 4, d0)> (%0)[%sym2]
```
Can be turned into:
```
%1 = affine.min affine_map<
()[s0, s1] -> (s0 + 4, s1 + 16, s1 * 8)> ()[%sym2, %sym1]
```
Differential Revision: https://reviews.llvm.org/D99016
If there are multiple identical expressions in an affine
min/max op's map, we can just keep one.
Differential Revision: https://reviews.llvm.org/D99015
Until now Linalg fusion only allow fusing producers whose operands
are all permutation indexing maps. It's easier to deduce the
subtensor/subview but it is an unnecessary constraint, as in tiling
we have more advanced logic to deduce the subranges even when the
operand is not of permutation indexing maps, e.g., the input operand
for convolution ops.
This patch uses the logic on tiling side to deduce subranges for
fusion. This enables fusing convolution with its consumer ops
when possible.
Along the way, we are now generating proper affine.min ops to guard
against size boundaries, if we cannot be certain they won't be
out of bounds.
Differential Revision: https://reviews.llvm.org/D99014
This is a preparation step to reuse makeTiledShapes in tensor
fusion. Along the way, did some lightweight cleanups.
Differential Revision: https://reviews.llvm.org/D99013
This path would unblock the build of libc++ library on AIX:
1. Add _AIX guard for _LIBCPP_HAS_THREAD_API_PTHREAD
2. Use uselocale to actually take the locale setting
into account.
3. extract_mtime and extract_atime mod needed for AIX. As stat
structure on AIX uses internal structure st_timespec to store
time for binary compatibility reason. So we need to convert it
back to timespec here.
4. Do not build cxa_thread_atexit.cpp for libcxxabi on AIX.
Differential Revision: https://reviews.llvm.org/D97558
This is similar to the select logic just ahead of the new code.
Min/max choose exactly one value from the inputs, so if both of
those are a power-of-2, then the result must be a power-of-2.
This might help with D98152, but we likely still need other
pieces of the puzzle to avoid regressions.
The change in PatternMatch.h is needed to build with clang.
It's possible there is a better way to deal with the 'const'
incompatibities.
Differential Revision: https://reviews.llvm.org/D99276