If getClobberingMemoryAccess() is called with an explicit
MemoryLocation, but the starting access happens to be a call, the
provided location is currently ignored, and alias analysis queries
will be performed against the call instruction instead. Something
similar happens if the starting access is a load with a MemoryDef.
Change the implementation to not set Q.Inst in the first place if
we want to perform a MemoryLocation-based query, to make sure it
can't be turned into an Instruction-based query along the way...
Additionally, remove the special handling that lifetime.start
intrinsics currently get. They simply report NoAlias for clobbers
between lifetime.start and other calls, but that's obviously not
right if the other call is something like a memset or memcpy. The
default behavior we get from getModRefInfo() will already do the
right thing here.
Differential Revision: https://reviews.llvm.org/D88782
`mftb` and `mftbl` are equivalent, there is no need to have two names for doing the same thing, rename `mftbl` to only have `mftb`.
Differential Revision: https://reviews.llvm.org/D89506
fsl/fsr take their shift amount in $rs2 or an immediate. The
sources are $rs1 and $rs3.
fshl/fshr ISD opcodes both concatenate operand 0 in the high bits and
operand 1 in the lower bits. fshl returns the high bits after
shifting and fshr returns the low bits. So a shift amount of 0
returns operand 0 for fshl and operand 1 for fshr.
fsl/fsr concatenate their operands in different orders such that
$rs1 will be returned for a shift amount of 0. So $rs1 needs to
come from operand 0 of fshl and operand 1 of fshr.
Differential Revision: https://reviews.llvm.org/D90735
Move `DirectoryEntry` and `DirectoryEntryRef` into their own header,
similar to the creation of FileEntry.h. No functionality change here,
just preparing to include it in more places to allow wider adoption of
`DirectoryEntryRef` without requiring all of `FileManager.h`.
Differential Revision: https://reviews.llvm.org/D90478
Summary:
For vector element types which are not byte-sized, we would generate
incorrect scalar offsets and produce incorrect codegen.
This optimization could potentially be supported in the future, e.g. by
loading in bytes, then shifting and masking out the remaining bits of
the vector element. However, without an upstream target to test against
it's best to avoid the bad codegen in the simplest possible way.
Related to this bug:
https://bugs.llvm.org/show_bug.cgi?id=27600
Reviewed by: foad
Differential Revision: https://reviews.llvm.org/D78568
Clang now asserts for the below case:
```
void clang::CodeGen::CGOpenMPRuntime::createOffloadEntriesAndInfoMetadata(): Assertion `std::get<0>(E) && "All ordered entries must exist!"' failed.
```
The reason why Clang hit the assert is because in
`emitTargetDataCalls`, both `BeginThenGen` and `BeginElseGen` call
`registerTargetRegionEntryInfo` and try to register the Entry in
OffloadEntriesTargetRegion with same key. If changing the expression in
if clause to any constant expression, then the assert disappear. (https://godbolt.org/z/TW7haj)
The assert itself is to avoid
user from accessing elements out of bound inside `OrderedEntries` in
`createOffloadEntriesAndInfoMetadata`.
In this patch, I add a check in `registerTargetRegionEntryInfo` to avoid
register the target region more than once.
A test case that triggers assert: https://godbolt.org/z/4cnGW8
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D90704
riscv_sllw/srlw only reads the lower 32 bits of the first operand.
And the lower 5 bits of the second operands. Whether the upper
32 bits of the input are sign bits or not doesn't matter.
Also use ineg and not to shorten the patterns.
Differential Revision: https://reviews.llvm.org/D90668
- Eliminate duplicated information about mapping from memref -> its descriptor fields
by consolidating that mapping in two functions: getMemRefDescriptorFields and
getUnrankedMemRefDescriptorFields.
- Change convertMemRefType() and convertUnrankedMemRefType() to use these
functions.
- Remove convertMemrefSignature and convertUnrankedMemrefSignature.
Differential Revision: https://reviews.llvm.org/D90707
This patch adds the `async` lowering of coroutines.
This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.
This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.
rdar://70097093
Reapply with fix for memory sanitizer failure and sphinx failure.
Differential Revision: https://reviews.llvm.org/D90612
Previously, linalg-bufferize was a "finalizing" bufferization pass (it
did a "full" conversion). This wasn't great because it couldn't be used
composably with other bufferization passes like std-bufferize and
scf-bufferize.
This patch makes linalg-bufferize a composable bufferization pass.
Notice that the integration tests are switched over to using a pipeline
of std-bufferize, linalg-bufferize, and (to finalize the conversion)
func-bufferize. It all "just works" together.
While doing this transition, I ran into a nasty bug in the 1-use special
case logic for forwarding init tensors. That logic, while
well-intentioned, was fundamentally flawed, because it assumed that if
the original tensor value had one use, then the converted memref could
be mutated in place. That assumption is wrong in many cases. For
example:
```
%0 = some_tensor : tensor<4xf32>
br ^bb0(%0, %0: tensor<4xf32>, tensor<4xf32>)
^bb0(%bbarg0: tensor<4xf32>, %bbarg1: tensor<4xf32>)
// %bbarg0 is an alias of %bbarg1. We cannot safely write
// to it without analyzing uses of %bbarg1.
linalg.generic ... init(%bbarg0) {...}
```
A similar example can happen in many scenarios with function arguments.
Even more sinister, if the converted memref is produced by a
`std.get_global_memref` of a constant global memref, then we might
attempt to write into read-only statically allocated storage! Not all
memrefs are writable!
Clearly, this 1-use check is not a local transformation that we can do
on the fly in this pattern, so I removed it.
The test is now drastically shorter and I basically rewrote the CHECK
lines from scratch because:
- the new composable linalg-bufferize just doesn't do as much, so there
is less to test
- a lot of the tests were related to the 1-use check, which is now gone,
so there is less to test
- the `-buffer-hoisting -buffer-deallocation` is no longer mixed in, so
the checks related to that had to be rewritten
Differential Revision: https://reviews.llvm.org/D90657
We need to ensure the upper 32 bits of the mask are zero.
So that the srl shifts zeroes into the lower 32 bits.
Differential Revision: https://reviews.llvm.org/D90585
Silence warning Undefined Behavior Sanitzer warning:
runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D90710
We don't need custom matching, we just a need a predicate to check
the immediate is greater than 32. We can use the existing ImmSub32
to adjust the immediate.
I've also used the new predicate in the other location that used
ImmSub32. I tried to create a test case where we would break without
the greater than 32 check on that pattern, but DAG combine defeated me.
Still seemed safer to have it.
Differential Revision: https://reviews.llvm.org/D90546
Pseudo-registers allow different register encodings
between gpu generations. Make sure we resolve the
pseudo regs to real regs whenever we get their
hardware encoding.
Using the correct encodings revealed a register
bank conflict and an unnecessary write dependency.
Tests have been updated to match.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D90721
Change-Id: I73c154cd24aecc820993b50bebaf4df97a5710ca
This allows those instrumentation to log when they decide to skip a
pass. This provides extra helpful info for optnone functions and also
will help with opt-bisect.
Have OptNoneInstrumentation print when it skips due to seeing optnone.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D90545
The insertion of waterfall loops splits the current basic block into
three blocks. So the basic block that we iterate over must be updated.
This failed assert(!NodePtr->isKnownSentinel()) in ilist_iterator for
divergent calls in branches before.
Differential Revision: https://reviews.llvm.org/D90596
This matches behavior GNU objcopy and can simplify clang-offload-bundler
(which currently works around the issue by invoking llvm-objcopy twice).
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D90438
Adds visual studio debugger support to dexter via option --debugger vs2019
Differential Revision: https://reviews.llvm.org/D89803
Author: Nabeel Omer <nabeel.omer@sony.com>
adds a test that checks debugging experience in functions that are marked
__attribute__((optnone)) and have loops inside them.
Differential Revision: https://reviews.llvm.org/D89873
Author: Nabeel Omer <nabeel.omer@sony.com>
Those are part of the library, and shipping them just adds a tiny bit of
size to the distribution. This was originally added in b422ecc7de to
make it possible to match the Makefile build, which doesn't exist anymore.
The upside is build system simplification.
RemoteIndexClient implementations only depends on clangdSupport for
logging functionality and has no dependence on clangDeamon itself. This clears
out that link time dependency and enables depending on it in clangDeamon itself,
so that we can have other index implementations that makes use of the
RemoteIndex.
Differential Revision: https://reviews.llvm.org/D90746
SIG30-C. Call only asynchronous-safe functions within signal handlers
First version of this check, only minimal list of functions is allowed
("strictly conforming" case), for C only.
Differential Revision: https://reviews.llvm.org/D87449
This patch adds the `async` lowering of coroutines.
This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.
This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D90612
Allows the MACRO directive to define macro procedures with parameters and macro-local symbols.
Supports required and optional parameters (including default values), and matches ml64.exe for its macro-local symbol handling (up to 65536 macro-local symbols in any translation unit).
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D89729
This patch enhances the clang driver to link the runtime profile
library on AIX when the -fprofile-generate option is used.
Reviewed By: phosek
Differentail Revision: https://reviews.llvm.org/D90641