Allows shuffle combining through per-element shift nodes
This exposed a number of issues with shuffle combining with target intrinsics that are lowered to nodes later during legalization - in particular shuffle combining and SimplifyDemandedVectorElts were being called after canonicalizeShuffleWithBinOps, meaning that shuffles didn't have a chance to be combined away before the shuffle(binop(x,y)) -> binop(shuffle(x),shuffle(y)) fold.
I believe this is unexploitable because in either case the result
will be 'couldn't allocate register for constraint' error message,
but error code checking is clearly wrong.
Differential Revision: https://reviews.llvm.org/D117189
The PlatformKind/PlatformType enums contain the same information, which requires
them to be kept in-sync. This commit changes over to PlatformType as the sole
source of truth, which allows the removal of the redundant PlatformKind.
The majority of the changes were in LLD and TextAPI.
Reviewed By: cishida
Differential Revision: https://reviews.llvm.org/D117163
Shockingly we weren't doing this already. We should probably have this
be done earlier in the IR too, but it's still helpful to have the
lowering guarantee it so that we can modify the ABI implicit inputs
based on it.
If we know we we aren't using a component from the kernel, we can save
a few bit packing instructions.
We're still enabling the VGPR input to the kernel though.
As per https://github.com/flang-compiler/f18-llvm-project/issues/1344,
the `flang` bash script works fine with 4.4.19 and requiring
4.4.23 is too restrictive. Rather than keep updating the patch level,
this patch removes this particular check (so that it will only check the
major and minor versions instead).
As this is both rather straightforward and urgent, I'm merging this
without a review.
This patch adds the `weak` identifier to the openmp device environment
variable. The changes introduced in https://reviews.llvm.org/D117211
result in multiply defined symbols. Because the symbol is potentially
included multiple times for each offloading file we will get symbol
colisions, and because it needs to have external visiblity it should be
weak.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D117231
This completes the implementation of
WG14 N2412 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf),
which standardizes C on a twos complement representation for integer
types. The only work that remained there was to define the correct
macros in the standard headers, which this patch does.
Functions pointers should be created with program address space. This
patch introduces program address space in TargetInfo. Targets with
non-default (default is 0) address space for functions should explicitly
set this value. This patch fixes a crash on lvalue reference to function
pointer (in device code) when using oneAPI DPC++ compiler.
Differential Revision: https://reviews.llvm.org/D111566
to repro error.
As mentioned yesterday, I've got a problem that I can only reproduce on
Godbolt (none of the build configs on my local machine!), so this is at
least somewhat usable until I figure out a cause.
This was left over from when I had used some pointer authentication
instructions to sign the pointer. Then I realised that simply setting
the top byte is enough to prove the ABI plugin is being called.
Top byte ignore is a feature of the armv8-a architecure and doesn't
need any extra compiler flags.
The AST doesn't track their locations, and the default behavior of attributing
them to the lexically-enclosing node is sloppy and often inaccurate.
Also add a couple of passing test cases for declarators that weren't obvious.
Differential Revision: https://reviews.llvm.org/D117185
When searching for AST nodes that may overlap the selection, mayHit() was only
attempting to prune nodes whose begin/end are both in the main file.
While failing to prune never gives wrong results, it hurts performance.
In GTest unit-tests, `TEST()` macros at the top level declare classes.
These were never pruned and we traversed *every* such class for any selection.
We fix this by reasoning about what tokens such a node might claim.
They must lie within its ultimate macro expansion range, so if this doesn't
overlap with the selection, we can prune the node.
Differential Revision: https://reviews.llvm.org/D116978
Partial element rotate patterns (e.g. for element insertion on Issue #53124) were being split if every lane wasn't crossing, but really there's a good repeated mask hiding in there.
Without this patch when using CMAKE_CXX_STANDARD=20
and MSVC 19.30.30705.0 compilation fails with
clang\lib\Tooling\Syntax\Tree.cpp(347): error C2666: 'clang::syntax::Tree::ChildIteratorBase<clang::syntax::Tree::ChildIterator,clang::syntax::Node>::operator ==': 4 overloads have similar conversions
clang\lib\Tooling\Syntax\Tree.cpp(392): error C2666: 'clang::syntax::Tree::ChildIteratorBase<clang::syntax::Tree::ChildIterator,clang::syntax::Node>::operator ==': 4 overloads have similar conversions
Note that removed comment that
"iterator_facade_base requires == to be a member"
was made obsolete by change https://reviews.llvm.org/D78938
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D116904
A lot of neon intrinsics work lane-wise, meaning that non-demanded
elements in and not demanded out. This teaches that to
AArch64TTIImpl::simplifyDemandedVectorEltsIntrinsic for some simple
single-input truncate intrinsics, which can help remove unnecessary
instructions.
Differential Revision: https://reviews.llvm.org/D117097
LLVM Dialect Constant Op translations assume that if the attribute is a
vector, it's a fixed length one, generating an invalid translation for
constant scalable vector initializations.
Differential Revision: https://reviews.llvm.org/D117125
Since 26c6a3e736, LLVM's inliner will "upgrade" the caller's stack protector
attribute based on the callee. This lead to surprising results with Clang's
no_stack_protector attribute added in 4fbf84c173 (D46300). Consider the
following code compiled with clang -fstack-protector-strong -Os
(https://godbolt.org/z/7s3rW7a1q).
extern void h(int* p);
inline __attribute__((always_inline)) int g() {
return 0;
}
int __attribute__((__no_stack_protector__)) f() {
int a[1];
h(a);
return g();
}
LLVM will inline g() into f(), and f() would get a stack protector, against the
users explicit wishes, potentially breaking the program e.g. if h() changes the
value of the stack cookie. That's a miscompile.
More recently, bc044a88ee (D91816) addressed this problem by preventing
inlining when the stack protector is disabled in the caller and enabled in the
callee or vice versa. However, the problem remained if the callee is marked
always_inline as in the example above. This affected users, see e.g.
http://crbug.com/1274129 and http://llvm.org/pr52886.
One way to fix this would be to prevent inlining also in the always_inline
case. Despite the name, always_inline does not guarantee inlining, so this
would be legal but potentially surprising to users.
However, I think the better fix is to not enable the stack protector in a
caller based on the callee. The motivation for the old behaviour is unclear, it
seems counter-intuitive, and causes real problems as we've seen.
This commit implements that fix, which means in the example above, g() gets
inlined into f() (also without always_inline), and f() is emitted without stack
protector. I think that matches most developers' expectations, and that's also
what GCC does.
Another effect of this change is that a no_stack_protector function can now be
inlined into a stack protected function, e.g. (https://godbolt.org/z/hafP6W856):
extern void h(int* p);
inline int __attribute__((__no_stack_protector__)) __attribute__((always_inline)) g() {
return 0;
}
int f() {
int a[1];
h(a);
return g();
}
I think that's fine. Such code would be unusual since no_stack_protector is
normally applied to a program entry point which sets up the stack canary. And
even if such code exists, inlining doesn't change the semantics: there is still
no stack cookie setup/check around entry/exit of the g() code region, but there
may be in the surrounding context, as there was before inlining. This also
matches GCC.
See also the discussion at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722
Differential revision: https://reviews.llvm.org/D116589
IR:
- globals (and functions, ifuncs, aliases) can have a partition
- catchret has a `to` before the label
- the sint/int types do not exist
- signext comes after the type
- a variable was missing its type
TableGen:
- The second value after a `#` concatenation is optional
See e.g. llvm/lib/Target/X86/X86InstrAVX512.td:L3351
- IncludeDirective and PreprocessorDirective were never referenced in
the grammar
- Add some missing ;
- Parent classes of multiclasses can have generic arguments.
Reuse the `ParentClassList` that is already used in other places.
MIR:
- liveins only allows physical registers, which start with a $
Differential Revision: https://reviews.llvm.org/D116674