If there were no free VGPRs we would need two emergency spill slots for register
scavenging during PEI/frame index elimination. Reuse 'ResultReg' for scale
calculation so that only one spill is needed.
Differential Revision: https://reviews.llvm.org/D76387
Apart from the argument registers, set the CostPerUse
value as per the ratio reg_index/allocation_granularity.
It is a pre-commit for introducing the scratch registers
in the ABI. This change should help in a balanced
register allocation.
Differential Revision: https://reviews.llvm.org/D76417
For now, when final suspend can be simplified by simplifySuspendPoint,
handleFinalSuspend is executed as well to remove last case in switch
instruction. This patch fixes it.
Differential Revision: https://reviews.llvm.org/D76345
Passing small data limit to RISCVELFTargetObjectFile by module flag,
So the backend can set small data section threshold by the value.
The data will be put into the small data section if the data smaller than
the threshold.
Differential Revision: https://reviews.llvm.org/D57497
OperationFolder::tryToFold was running the pre-replacement
action even when there was no constant folding, i.e., when the operation
was just being updated in place but was not going to be replaced. This
led to nested ops being unnecessarily removed from the worklist and only
being processed in the next outer iteration of the greedy pattern
rewriter, which is also why this didn't affect the final output IR but
only the convergence rate. It also led to an op's results' users to be
unnecessarily added to the worklist.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76268
The VMO size is always page-rounded, but Zircon now provides
a way to publish the precise intended size.
Patch By: mcgrathr
Differential Revision: https://reviews.llvm.org/D76437
This essentially drops the change by r288021 (discussed with Georgii Rymar
and Peter Smith and noted down in the release note of lld 10).
GNU ld>=2.31 enables -z separate-code by default for Linux x86. By
default (in the absence of a PHDRS command) a readonly PT_LOAD is
created, which is different from its traditional behavior.
Not emulating GNU ld's traditional behavior is good for us because it
improves code consistency (we create a readonly PT_LOAD in the absence
of a SECTIONS command).
Users can add --no-rosegment to restore the previous behavior (combined
readonly and read-executable sections in a single RX PT_LOAD).
While the VMO size is always page aligned, we can record the content
size as a property and then use this metadata when writing the data to
a file.
Differential Revision: https://reviews.llvm.org/D76462
Because MLIR_HAS_EXPORTS is not set, MLIRTarget.cmake is not delivered
to the install area. When this happens, the delivered MLIRConfig.cmake
should not reference it. Independently, we need to determine under what
conditions MLIR_HAS_EXPORTS should be set. Probably we are not exporting
all the libraries correctly.
(would be nice to revisit the CFG traits and change them to use ranges
rather than begin/end - if anyone wants to do that refactor)
Also use more auto because writing the names of range utilty iterators
isn't helping readability here - they're sort of implementation details
for the most part, especially once you nest a few different filtering
and adapting iterators.
The fix (shooting from the hip since I couldn't reproduce this locally)
was to capture by value in a lambda used in a filtering iterator -
because the iterator would persist beyond the lifetime of the function
(as the iterators are returned to callers).
Originally committed in 79a7ed92a9.
This was reverted in 4a7f2032a3.
Summary:
Swift ABI is based on basic C ABI described here https://github.com/WebAssembly/tool-conventions/blob/master/BasicCABI.md
Swift Calling Convention on WebAssembly is a little deffer from swiftcc
on another architectures.
On non WebAssembly arch, swiftcc accepts extra parameters that are
attributed with swifterror or swiftself by caller. Even if callee
doesn't have these parameters, the invocation succeed ignoring extra
parameters.
But WebAssembly strictly checks that callee and caller signatures are
same. https://github.com/WebAssembly/design/blob/master/Semantics.md#calls
So at WebAssembly level, all swiftcc functions end up extra arguments
and all function definitions and invocations explicitly have additional
parameters to fill swifterror and swiftself.
This patch support signature difference for swiftself and swifterror cc
is swiftcc.
e.g.
```
declare swiftcc void @foo(i32, i32)
@data = global i8* bitcast (void (i32, i32)* @foo to i8*)
define swiftcc void @bar() {
%1 = load i8*, i8** @data
%2 = bitcast i8* %1 to void (i32, i32, i32)*
call swiftcc void %2(i32 1, i32 2, i32 swiftself 3)
ret void
}
```
For swiftcc, emit additional swiftself and swifterror parameters
if there aren't while lowering. These additional parameters are added
for both callee and caller.
They are necessary to match callee and caller signature for direct and
indirect function call.
Differential Revision: https://reviews.llvm.org/D76049
Summary:
These were merged to the SIMD proposal in
https://github.com/WebAssembly/simd/pull/128.
Depends on D76397 to avoid merge conflicts.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76399
Summary:
1. FileLineInfoSpecifier::Default isn't the default for anything.
Rename to RawValue, which accurately reflects its role.
2. Most functions that take a part of a FileLineInfoSpecifier end up
constructing a full one later or plumb two values through. Make them
all just take a complete FileLineInfoSpecifier.
3. Printing basenames only was handled differently from all other
variants, make it parallel to all the other variants.
Reviewers: jhenderson
Subscribers: hiraditya, MaskRay, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76394
Port over the following:
- shuffle undef, undef, any_mask -> undef
- shuffle anything, anything, undef_mask -> undef
This sort of thing shows up a lot when you try to bugpoint code containing
shufflevector.
Differential Revision: https://reviews.llvm.org/D76382
The Interface libraries were moved from Analysis, but declared in
cmake using add_llvm_library(). This breaks LLVM_BUILD_LLVM_DYLIB
builds.
Differential Revision: https://reviews.llvm.org/D76463
Updates the object buffer ownership scheme in jitLinkForOrc and related
functions: Ownership of both the object::ObjectFile and underlying
MemoryBuffer is passed into jitLinkForOrc and passed back to the onEmit
callback once linking is complete. This avoids the use-after-free errors
that were seen in 98f2bb4461.
While the VMO size is always page aligned, we can record the content
size as a property and then use this metadata when writing the profile
to a file.
Differential Revision: https://reviews.llvm.org/D76402
This makes toolchain independent of the path it was built in by
rewriting all absolute paths embedded in sources and debug info
into relative ones.
Differential Revision: https://reviews.llvm.org/D76189
This handles not paths embedded in debug info, but also in sources.
Since the use of this flag is controlled by an option, rather than
replacing the new option, we add a new option.
Differential Revision: https://reviews.llvm.org/D76018
Add pseudo instructions for ldrsbt/ldrht/ldrsht with implicit immediate
and add fall back C++ code to transform the instruction to the
equivalent LDRSBTi/LDRHTi/LDRSHTi form.
This is similar to how it has been done in commit
fb3950ec63
This fixes:
https://bugs.llvm.org/show_bug.cgi?id=45070
Summary:
The expected pattern is for subclasses to initialize through
computeDependence, which needs only setDependence.
The few places that still use addDependence can be simulated with get+set.
Reviewers: hokein
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76392
Summary:
Some kernels can provide 16EiB worth of mappings to each process, which
causes mmap test to run for a very long time. In order to make it stop
after a few seconds, make mmap_interceptor() fail when the original
mmap() returns an address which is outside of the application range.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: #sanitizers, Andreas-Krebbel, stefansf, jonpa, uweigand
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D76426
This logic can be shared with the tiled code generation.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D75565
Summary:
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44611 by
preventing an infinite loop in the jump threading pass when
-jump-threading-across-loop-headers is on. Specifically, without this
patch, jump threading through two basic blocks would trigger on the
same area of the CFG over and over, resulting in an infinite loop.
Consider testcase PR44611-across-header-hang.ll in this patch. The
first opportunity to thread through two basic blocks is:
from bb_body2 through bb_header and bb_body1 to bb_body2.
The pass duplicates bb_header and bb_body1 as, say, bb_header.thread1
and bb_body1.thread1. Since bb_header contains a successor edge back
to itself, bb_header.thread1 also contains a successor edge to
bb_header, immediately giving rise to the next jump threading
opportunity:
from bb_header.thread1 through bb_header and bb_body1 to bb_body2.
After that, we repeatedly thread an incoming edge into bb_header
through bb_header and bb_body1 to bb_body2. In other words, we keep
peeling one iteration from bb_header's self loop.
The patch fixes the problem by preventing the pass from duplicating a
basic block containing a self loop.
Reviewers: wmi, junparser, efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76390
Remove the gap left between the stack pointer (s32) and frame pointer
(s34) now that the scratch wave offset is no longer a part of the
calling convention ABI.
Update llvm/docs/AMDGPUUsage.rst to reflect the change.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75657
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us to removes the scratch wave
offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138
Remove dead code and factor repeated conditions out into a single check.
Rename and move code to make it more obvious what is running only for
entry functions. Simplify function arguments to make it clearer what the
relevant inputs are. Make flat scratch init accept an MBB iterator and
move it to where it was logically being emitted within the prologue.
These changes will make a future update to the calling convention
simpler.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75092
This patch slightly generalizes the code to emit loads and stores of a
matrix and adds helpers to load/store a tile of a larger matrix.
This will be used in a follow-up patch introducing initial tiling.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D75564
Builder::get{I32,I64}VectorAttr are actually of limited applicability since
vector types can't have zero elements, whereas many uses of this kind of
attribute (such as dimension lists for "transpose"-like and other tensor
ops) often can result in empty lists.
Differential Revision: https://reviews.llvm.org/D76403
If we know the SSE shift amount is out of range then we can simplify to zero value (logical) or a 'signsplat' bitwidth-1 shift (arithmetic). This allows us to remove the equivalent ConstantInt constant folding path from simplifyX86immShift.
Support prefixing destructive operations, with the MOVPRFX instruction, to build constructive operations.
Differential Revision: https://reviews.llvm.org/D75064
Along the same lines as eb918d8daf1: This code also had to acquire the session
mutex, and this could cause a deadlock under the wrong circumstances. This
patch updates GenericLLVMIRPlatformSupport to just use the session lock for
everything.
In MachOPlatform, obtaining the link-order for a JITDylib requires locking the
session, but also needs to be part of a larger atomic operation that collates
initializer symbols tracked by the platform. Trying to do this under a separate
platform mutex leads to potential locking order issues, e.g.
T1 locks session then tries to lock platform to register a new init symbol
meanwhile
T2 locks platform then tries to lock session to obtain link order.
Removing the platform lock and performing all these operations under the session
lock eliminates this possibility.
At the same time we also need to collate init pointers from the
MachOPlatform::InitScraperPlugin, and we don't need or want to lock the session
for that. The new InitSeqMutex has been added to guard these init pointers, and
the session mutex is never obtained while the InitSeqMutex is held.