Summary:
The code was assuming in a few places that if there was only one exit
from the function that it was a normal return, which is invalid. It
could be an infinite loop, in which case we still need to insert the
usual fake edge so that the null export happens. This fixes shaders that
end with an infinite loop that discards.
Reviewers: arsenm, nhaehnle, critson
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71192
Summary:
Due to the fact that kill is just a normal intrinsic, even though it's
supposed to terminate the thread, we can end up with provably infinite
loops that are actually supposed to end successfully. The
AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because
there's no obvious place to make the loop branch to, it just makes it
return immediately, which skips the exports that are supposed to happen
at the end and hangs the GPU if all the threads end up being killed.
While it would be nice if the fact that kill terminates the thread were
modeled in the IR, I think that the structurizer as-is would make a mess if we
did that when the kill is inside control flow. For now, we just add a null
export at the end to make sure that it always exports something, which fixes
the immediate problem without penalizing the more common case. This means that
we sometimes do two "done" exports when only some of the threads enter the
discard loop, but from tests the hardware seems ok with that.
This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70781
cmp (splat V1, M), SplatC --> splat (cmp V1, SplatC'), M
As discussed in PR44588:
https://bugs.llvm.org/show_bug.cgi?id=44588
...we try harder to push shuffles after binops than after compares.
This patch handles the special (but presumably most common case) of
splat shuffles. If both operands are splats, then we can do the
comparison on the non-splat inputs followed by splat of the compare.
That should take care of the regression noted in D73411.
There's another potential fold requested in PR37463 to scalarize the
compare, but that's another patch (and it's not clear if we can do
that without the ability to undo it later):
https://bugs.llvm.org/show_bug.cgi?id=37463
Differential Revision: https://reviews.llvm.org/D73575
Summary:
Currently, sqdmulh_lane and friends from the ACLE (implemented in arm_neon.h),
are represented in LLVM IR as a (by vector) sqdmulh and a vector of (repeated)
indices, like so:
%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)
When %v's values are known, the shufflevector is optimized away and we are no
longer able to select the lane variant of sqdmulh in the backend.
This defeats a (hand-coded) optimization that packs several constants into a
single vector and uses the lane intrinsics to reduce register pressure and
trade-off materialising several constants for a single vector load from the
constant pool, like so:
int16x8_t v = {2,3,4,5,6,7,8,9};
a = vqdmulh_laneq_s16(a, v, 0);
b = vqdmulh_laneq_s16(b, v, 1);
c = vqdmulh_laneq_s16(c, v, 2);
d = vqdmulh_laneq_s16(d, v, 3);
[...]
In one microbenchmark from libjpeg-turbo this accounts for a 2.5% to 4%
performance difference.
We could teach the compiler to recover the lane variants, but this would likely
require its own pass. (Alternatively, "volatile" could be used on the constants
vector, but this is a bit ugly.)
This patch instead implements the following LLVM IR intrinsics for AArch64 to
maintain the original structure through IR optmization and into instruction
selection:
- sqdmulh_lane
- sqdmulh_laneq
- sqrdmulh_lane
- sqrdmulh_laneq.
These 'lane' variants need an additional register class. The second argument
must be in the lower half of the 64-bit NEON register file, but only when
operating on i16 elements.
Note that the existing patterns for shufflevector and sqdmulh into sqdmulh_lane
(etc.) remain, so code that does not rely on NEON intrinsics to generate these
instructions is not affected.
This patch also changes clang to emit these IR intrinsics for the corresponding
NEON intrinsics (AArch64 only).
Reviewers: SjoerdMeijer, dmgreen, t.p.northover, rovka, rengolin, efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, hiraditya, jdoerfert, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71469
This adds some missing MVE opcodes to evaluateBranch, which results in
llvm-objdump being able to print the PC relative branch target as an
annotation.
Differential Revision: https://reviews.llvm.org/D73553
Summary:
This change adds an option to insert trailing commas into container
literals. For example, in JavaScript:
const x = [
a,
b,
^~~~~ inserted if missing.
]
This is implemented as a seperate post-processing pass after formatting
(because formatting might change whether the container literal does or
does not wrap). This keeps the code relatively simple and orthogonal,
though it has the notable drawback that the newly inserted comma is not
taken into account for formatting decisions (e.g. it might exceed the 80
char limit). To avoid exceeding the ColumnLimit, a comma is only
inserted if it fits into the limit.
Trailing comma insertion conceptually conflicts with argument
bin-packing: inserting a comma disables bin-packing, so we cannot do
both. clang-format rejects FormatStyle configurations that do both with
this change.
Reviewers: krasimir, MyDeveloperDay
Subscribers: cfe-commits
Tags: #clang
Summary:
there is a slight behavior change in this patch:
- before: `in^t a;`, returns our internal error message (no symbol at given location)
- after: `in^t a, returns null, and client displays their message (e.g.
e.g. "the element can't be renamed" in vscode).
both are sensible according LSP, and we'd save one `rename` call in the later case.
Reviewers: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73610
D73474 disabled the generation of interworking thunks for branch
relocations to non STT_FUNC symbols. This patch handles the case of BL and
BLX instructions to non STT_FUNC symbols. LLD would normally look at the
state of the caller and the callee and write a BL if the states are the
same and a BLX if the states are different.
This patch disables BL/BLX substitution when the destination symbol does
not have type STT_FUNC. This brings our behavior in line with GNU ld which
may prevent difficult to diagnose runtime errors when switching to lld.
Differential Revision: https://reviews.llvm.org/D73542
Summary:
Barrier is a simple operation that takes no arguments and returns
nothing, but implies a side effect (synchronization of all threads)
Reviewers: jdoerfert
Subscribers: mgorny, guansong, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72400
Summary:
No need to pass fno-delayed-template-parsing as the opposite flag is
only passed to cc1 when abi is set to msvc. Sending as a follow-up to D73613 to
keep changes in the release branch minimal.
Reviewers: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73615
Many of the debug line prologue errors are not inherently fatal. In most
cases, we can make reasonable assumptions and carry on. This patch does
exactly that. In the case of length problems, the approach of "assume
stated length is correct" is taken which means the offset might need
adjusting.
This is a relanding of b94191fe, fixing an LLD test and the LLDB build.
Reviewed by: dblaikie, labath
Differential Revision: https://reviews.llvm.org/D72158
Summary:
This removes a couple of unnecessary isReg checks, now that
memOpsHaveSameBasePtr can handle FI operands, but is otherwise NFC.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73485
Summary:
This patch adds a simple mechanism to disallow global rename
on std symbols. We might extend it to other symbols, e.g. protobuf.
Reviewers: kadircet
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73450
Add several new helpers to RDA:
- hasLocalDefBefore
- isRegDefinedAfter
- isSafeToDefRegAt
And move two bits of logic from ARMLowOverheadLoops into RDA:
- isSafeToMove
- isSafeToRemove
Both of these have some wrappers too to make them more convienent to
use.
Differential Revision: https://reviews.llvm.org/D73460
Summary:
I noticed this strange line in `ASTImporterDelegate::ImportDefinitionTo` which doesn't make a lot of sense:
```
to_tag->setCompleteDefinition(from_tag->isCompleteDefinition());
```
It forcibly sets the imported TagDecl to be defined if the source TagDecl was defined. This doesn't make any
sense as in this code we already forced the ASTImporter to import the definition so this should always be
a no-op.
Turns out this is hiding two bugs:
1. The way we handle forward declarations in the debug info that might be completed later is that we
import them and then mark them as having external lexical storage. This makes Clang ask for the definition
later when it needs it (at which point we hopefully have the definition around and can complete it). However,
this is currently not completing the forward decls with external storage but instead creates a duplicated decl
in the target AST which is then defined. The forward decl is kept incomplete after the import and we just
forcibly make it a definition of the record without any content with our workaround. The TestSharedLib* tests
is only passing because of this.
2. Minimal import of lambdas is broken and never imports the definition it seems. That appears to be a bug
in the ASTImporter which gives the definition of lambda's some special treatment. TestLambdas.py is actually
broken but is passing because of this workaround.
This patch fixes the first bug by forcing the ASTImporter to import to the target forward declaration. We can't
delete the workaround as the second bug is still around but that will be a follow up review for the ASTImporter.
However it will get rid of all the duplicated RecordDecls that are in our expression AST that are strangely defined
but don't have any of the fields they are supposed to have.
Reviewers: shafik, labath
Reviewed By: shafik
Subscribers: aprantl, abidh, JDevlieghere, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D73345
Summary:
Currently we crash in Clang's CodeGen when we call functions with covariant return types with this assert:
```
Assertion failed: (DD && "queried property of class with no definition"), function data, file clang/include/clang/AST/DeclCXX.h, line 433.
```
when calling `clang::CXXRecordDecl::isDerivedFrom` from the `ItaniumVTableBuilder`.
Clang seems to assume that the underlying record decls of covariant return types are already completed.
This is true during a normal Clang invocation as there the type checker will complete both decls when
checking if the overloaded function is valid (i.e., the return types are covariant).
When we minimally import our AST into the expression in LLDB we don't do this type checking (which
would complete the record decls) and we end up trying to access the invalid record decls from CodeGen
which makes us trigger the assert.
This patch just completes the underlying types of ptr/ref return types of virtual function so that the
underlying records are complete and we behave as Clang expects us to do.
Fixes rdar://38048657
Reviewers: lhames, shafik
Reviewed By: shafik
Subscribers: abidh, JDevlieghere, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D73024
Any REPL client should just move to CompletionRequest instead of relying on
the translation code from the old API, so let's remove that translation code.
For `ret i64 add (i64 ptrtoint (i32* @foo to i64), i64 1701208431)`,
```
X86DAGToDAGISel::matchAdd
...
// AM.setBaseReg(CurDAG->getRegister(X86::RIP, MVT::i64));
if (!matchAddressRecursively(N.getOperand(0), AM, Depth+1) &&
// Try folding offset but fail; there is a symbolic displacement, so offset cannot be too large
!matchAddressRecursively(Handle.getValue().getOperand(1), AM, Depth+1))
return false;
...
// Try again after commuting the operands.
// AM.Disp = Val; foldOffsetIntoAddress() does not know there will be a symbolic displacement
if (!matchAddressRecursively(Handle.getValue().getOperand(1), AM, Depth+1) &&
// AM.setBaseReg(CurDAG->getRegister(X86::RIP, MVT::i64));
!matchAddressRecursively(Handle.getValue().getOperand(0), AM, Depth+1))
// Succeeded! Produced leaq sym+disp(%rip),...
return false;
```
`foldOffsetIntoAddress()` currently does not know there is a symbolic
displacement and can fold a large offset.
The produced `leaq sym+disp(%rip), %rax` instruction is relocated by
an R_X86_64_PC32. If disp is large and sym+disp-rip>=2**31, there
will be a relocation overflow.
This approach is still not elegant. Unfortunately the isRIPRelative
interface is a bit clumsy. I tried several solutions and eventually
picked this one.
Differential Revision: https://reviews.llvm.org/D73606
There was a TODO in AAValueConstantRangeArgument to reuse
AAArgumentFromCallSiteArguments. We now do this by allowing new States
to be build from the bestState.
If we invalidate an attribute we need to inform all dependent ones even
if the fixpoint state is not invalid. Before we only continued
invalidation if the fixpoint state was invalid, now we signal a change
in case the fixpoint state is valid.
The test case was already included in D71620 but the problem was hiding
because it only manifested with the old PM (for that input).
This patch modularizes the way we check for no-alias call site arguments
by putting the existing logic into helper functions. The reasoning was
not changed but special cases for readonly/readnone were added.
If `null` is not defined we cannot access it, hence the pointer is
`noalias`. While this is not helpful on it's own it simplifies later
deductions that can skip over already known `noalias` pointers in
certain situations.