No longer go through an external model. Also put BufferizableOpInterface into the same build target as the BufferizationDialect. This allows for some code reuse between BufferizationOps canonicalizers and BufferizableOpInterface implementations.
Differential Revision: https://reviews.llvm.org/D117987
This can show up during when bitreverse is expanded to bswap and
swap of bits within a byte. If the input is already a bswap, we
should cancel them out before we further transform them in a way
that makes it harder to see the redundancy.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D118007
Add might be faster than shift. We can't do this earlier without
using a Freeze instruction.
This is the intrinsic version of D106689.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D118013
Both insertion points are valid. This is to make BufferizableOpInteface-based bufferization compatible with existing partial bufferization test cases. (So less changes are necessary to unit tests.)
Differential Revision: https://reviews.llvm.org/D117986
Patch originally by oktal3000: https://github.com/mikael-s-persson/templight/pull/40
When a template parameter is unnamed, the name of -templight-dump might return
an empty string. This is fine, they are unnamed after all, but it might be more
user friendly to at least describe what entity is unnamed.
Differential Revision: https://reviews.llvm.org/D115521
Add the MemoryAllocation pass into the pipeline. Add
the possibilty to pass the options directly within the tool (tco).
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D117886
This is the only op that is not supported via BufferizableOpInterfaceImpl bufferization. Once this op is supported we can switch `tensor-bufferize` over to the new unified bufferization.
Differential Revision: https://reviews.llvm.org/D117985
A number of the filesystem tests create a directory that contains a bad
symlink. On AIX recursively setting permissions on said directory will
return a non-zero value because of the bad symlink, however the
following rm -r still completes successfully. Avoid the assertion on
AIX, and rely on the return value of the remove command to detect
problems.
Differential Revision: https://reviews.llvm.org/D112086
This is in preparation of unifying the existing bufferization with One-Shot bufferization.
A subsequent commit will replace `tensor-bufferize`'s implementation with the BufferizableOpInterface-based implementation and move over missing test cases.
Differential Revision: https://reviews.llvm.org/D117984
These have negative / out of bounds frame index values and would
assert when trying to set the BitVector. Fixed stack objects can't be
colored away so ignore them.
- Fixes https://github.com/llvm/llvm-project/issues/53227 that wrongly
indents multiline comments
- Fixes wrong detection of single-line opening braces when used along
with those only opening scopes, causing crashes due to duplicated
replacements on the same token:
void foo()
{
{
int x;
}
}
- Fixes wrong recognition of first line of definition when the line
starts with block comment, causing crashes due to duplicated
replacements on the same token for this leads toward skipping the line
starting with inline block comment:
/*
Some descriptions about function
*/
/*inline*/ void bar() {
}
- Fixes wrong recognition of enum when used as a type name rather than
starting definition block, causing crashes due to duplicated
replacements on the same token since both actions for enum and for
definition blocks were taken place:
void foobar(const enum EnumType e) {
}
- Change to use function keyword for JavaScript instead of comparing
strings
- Resolves formatting conflict with options EmptyLineAfterAccessModifier
and EmptyLineBeforeAccessModifier (prompts with --dry-run (-n) or
--output-replacement-xml but no observable change)
- Recognize long (len>=5) uppercased name taking a single line as return
type and fix the problem of adding newline below it, with adding new
token type FunctionLikeOrFreestandingMacro and marking tokens in
UnwrappedLineParser:
void
afunc(int x) {
return;
}
TYPENAME
func(int x, int y) {
// ...
}
- Remove redundant and repeated initialization
- Do no change to newlines before EOF
Reviewed By: MyDeveloperDay, curdeius, HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D117520
This patch introduces new intrinsics that enable the use of vsetvli in
contexts where only the returned vector length is of interest. The
pre-existing intrinsics are marked with side-effects, which prevents
even trivial optimizations on/across them.
These intrinsics are intended to be used in situations where the vector
length is fed in turn to RVV intrinsics or to vector-predication
intrinsics during loop vectorization, for example. Those codegen paths
ensure that instructions are generated with their own implicit vsetvli,
so the vector length and vtype can be relied upon to be correct.
No corresponding C builtins are planned at this stage, though that is a
possibility for the future if the need arises.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D117910
Together with the previous commit which mainly documents better LoopFlatten's
overall strategy, this addresses a concern added as a FIXME comment in D110587;
the code refactoring (NFC) introduces functions (also for the SCEV usage) to
make this clearer.
There's some unnecessary code duplication in the parser. This
refactors that and deploys boolean variables to avoid the duplication.
These also happen to help adding module demangling (with an updated
mangling scheme).
1a) The grammar requires some lookahead concerning <template-args>. We
may discover an <unscoped-name> is actually <unscoped-template-name>
<template-args>. (When <unscoped-name> was a substitution, there must
be a following <template-args>.) Refactor parseName to only have one
code path looking for the 'I' indicating <template-args>.
1b) While there I altered the control flow to hold the result in a
variable, rather than tail call. Made it easier to debug (and of
course an optimizer will DTRT here anyway).
2a) An <unscoped-name> can have an St or StL prefix. No need for
completely separate code paths handling the following unqualified-name
though.
2b) Also no need to look for both 'St' and 'StL' separately. Look for
'St' and then conditionally swallow an 'L'.
3) We get a similar issue as #1a when parsing a typeName. Here I just
change the control flow slightly to bring the 'break' out to the end
of the 'S' block and embed the early return inside an if. That's more
in keeping with the code style.
4) Although NFC, there's a new testcase as that's not covered by the
existing demangler tests and is significant in the #1a case above.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D117879
To try and avoid undesired changes to the non-canonical demangler
sources, change the cp-to-llvm script to (a) write-protect the target
files and (b) prepend 'do not edit' comments that are significant to
emacs[*], and hopefully humans.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D118008
Recent commits changed llvm/include/llvm/Demangle without also
changing libcxxabi/src/Demangle, which is the canonical source
location. This resyncs those commits to the libcxxabi directory.
Commits:
* 1f9e18b656
* f53d359816
* 065044c443
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D117990.diff
tco is a tool to test the FIR to LLVM IR pipeline of the Flang compiler.
This patch update tco pipelines and adds the translation to LLVM IR.
A simple test is added to make sure the tool is working with a simple
FIR program.
More tests will be upstream in follow up patch from the fir-dev branch.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: kiranchandramohan, awarzynski, schweitz, mehdi_amini
Differential Revision: https://reviews.llvm.org/D117781
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Andrzej Warzynski <andrzej.warzynski@arm.com>
Only using that change in StringRef already decreases the number of
preoprocessed lines from 7837621 to 7776151 for LLVMSupport
Perhaps more interestingly, it shows that many files were relying on the
inclusion of StringRef.h to have the declaration from STLExtras.h. This
patch tries hard to patch relevant part of llvm-project impacted by this
hidden dependency removal.
Potential impact:
- "llvm/ADT/StringRef.h" no longer includes <memory>,
"llvm/ADT/Optional.h" nor "llvm/ADT/STLExtras.h"
Related Discourse thread:
https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831
New test checks results of BasicAA for llvm.memcpy.*/llvm.memmove.* intrinsics in presence of deopt bundle. By specification expected result for unrelated global memory should be Ref. Currently this is not the case and will be fixed in upcoming patches.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D118031
This patch moves merging of duplicate divisions to presburger utility
functions. This is required to support division merging in structures other
than IntegerPolyhedron.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D118001
This patch adds support for zbkx extension from K extension(v1.0.0) in MC layer.
Instructions with same functionality and same encoding is defined in the bitmanip extension.
It defines {Xperm8, Xperm4} as instruction aliases for xperm.* in Zbp extension. When Zbkx is enabled while Zbp is not, xperm.h will not be available. When Zbkx and Zbp are both enabled, the instructions will be decoded in Zbp format.
[[ https://reviews.llvm.org/D94999 | D94999 ]] this is the patch that introduces xperm.* instructions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D117889
isCandidateForEpilogueVectorization will currently return false for loops
which contain reductions. This patch removes this restriction and makes
the following changes to support epilogue vectorisation with reductions:
- `fixReduction`: If fixReduction is being called during vectorisation of the
epilogue, the phi node it creates will need to additionally carry incoming
values from the middle block of the main loop.
- `createEpilogueVectorizedLoopSkeleton`: The incoming values of the phi
created by fixReduction are updated after the vec.epilog.iter.check block
is added. The phi is also moved to the preheader of the epilogue.
- `processLoop`: The start value of any VPReductionPHIRecipes are updated before
vectorising the epilogue loop. The getResumeInstr function added to the ILV
will return the resume instruction associated with the recurrence descriptor.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D116928
None of these have any reordering issues, and they still emit the same reduction intrinsics without any change in the existing test coverage:
llvm-project\clang\test\CodeGen\X86\avx512-reduceIntrin.c
llvm-project\clang\test\CodeGen\X86\avx512-reduceMinMaxIntrin.c
Differential Revision: https://reviews.llvm.org/D117881
This commit introduces a new check `readability-container-contains` which finds
usages of `container.count()` and `container.find() != container.end()` and
instead recommends the `container.contains()` method introduced in C++20.
For containers which permit multiple entries per key (`multimap`, `multiset`,
...), `contains` is more efficient than `count` because `count` has to do
unnecessary additional work.
While this this performance difference does not exist for containers with only
a single entry per key (`map`, `unordered_map`, ...), `contains` still conveys
the intent better.
Reviewed By: xazax.hun, whisperity
Differential Revision: http://reviews.llvm.org/D112646
D111985 added the generic `__builtin_elementwise_max` and `__builtin_elementwise_min` intrinsics with the same integer behaviour as the SSE/AVX instructions
This patch removes the `__builtin_ia32_pmax/min` intrinsics and just uses `__builtin_elementwise_max/min` - the existing tests see no changes:
```
__m256i test_mm256_max_epu32(__m256i a, __m256i b) {
// CHECK-LABEL: test_mm256_max_epu32
// CHECK: call <8 x i32> @llvm.umax.v8i32(<8 x i32> %{{.*}}, <8 x i32> %{{.*}})
return _mm256_max_epu32(a, b);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).
Sibling patch to D117791
Differential Revision: https://reviews.llvm.org/D117798
Pass a ValueRange instead of an ArrayRef<Value> for better compatibility. Also provide an additional function overload that automatically deallocates the buffer if specified.
Differential Revision: https://reviews.llvm.org/D118025
D111986 added the generic `__builtin_elementwise_abs()` intrinsic with the same integer absolute behaviour as the SSE/AVX instructions (abs(INT_MIN) == INT_MIN)
This patch removes the `__builtin_ia32_pabs*` intrinsics and just uses `__builtin_elementwise_abs` - the existing tests see no changes:
```
__m256i test_mm256_abs_epi8(__m256i a) {
// CHECK-LABEL: test_mm256_abs_epi8
// CHECK: [[ABS:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %{{.*}}, i1 false)
return _mm256_abs_epi8(a);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).
Differential Revision: https://reviews.llvm.org/D117791
In code review for D117104 two slightly weird checks were found
in DAGCombiner::reduceLoadWidth. They were typically checking
if BitsA was a mulitple of BitsB by looking at (BitsA & (BitsB - 1)),
but such a comparison actually only make sense if BitsB is a power
of two.
The checks were related to the code that attempted to shrink a load
based on the fact that the loaded value would be right shifted.
Afaict the legality of the value types is checked later (typically in
isLegalNarrowLdSt), so the existing checks were both overly
conservative as well as being wrong whenever ExtVTBits wasn't a
power of two. The latter was a situation triggered by a number of
lit tests so we could not just assert on ExtVTBIts being a power of
two).
When attempting to simply remove the checks I found some problems,
that seems to have been guarded by the checks (maybe just out of
luck). A typical example would be a pattern like this:
t1 = load i96* ptr
t2 = srl t1, 64
t3 = truncate t2 to i64
When DAGCombine is visiting the truncate reduceLoadWidth is called
attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then
the SRL is detected and we set ShAmt to 64.
In the past we've bailed out due to i96 not being a multiple of 64.
If we simply remove that check then we would end up replacing the
load with a new load that would read 64 bits but with a base pointer
adjusted by 64 bits. So we would read 32 bits the wasn't accessed by
the original load.
This patch will instead utilize the fact that the logical left shift
can be folded away by using a zextload. Thus, the pattern above will
now be combined into
t3 = load i32* ptr+offset, zext to i64
Another case is shown in the X86/shift-folding.ll test case:
t1 = load i32* ptr
t2 = srl i32 t1, 8
t3 = truncate t2 to i16
In the past we bailed out due to the shift count (8) not being a
multiple of 16. Now the narrowing kicks in and we get
t3 = load i16* ptr+offset
Differential Revision: https://reviews.llvm.org/D117406
This is a pre-commit of test cases relevant for D117406.
@srl_load_narrowing1 is showing a pattern that could be folded into
a more narrow load.
@srl_load_narrowing2 is showing a similar pattern that happens to
be optimized already, but that happens in two steps (first triggering
a combine based on SRL and later another combine based on TRUNCATE).
Differential Revision: https://reviews.llvm.org/D117588