The FixupStatepoints pass does not take into account that an undef use
it skips may have a tied def. When defs are handled, the pass then
considers that the tied use should have been spilled and triggers an assert.
FixupStatepoints should skip the tied def of an undef use as well.
Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D95858
GCC warning:
```
In file included from /llvm-project/llvm/lib/Support/VirtualFileSystem.cpp:13:
/llvm-project/llvm/include/llvm/Support/VirtualFileSystem.h: In static member function ‘static bool llvm::vfs::RedirectingFileSystem::RemapEntry::classof(const llvm::vfs::RedirectingFileSystem::Entry*)’:
/llvm-project/llvm/include/llvm/Support/VirtualFileSystem.h:681:5: warning: control reaches end of non-void function [-Wreturn-type]
681 | }
| ^
```
It seems that recording fundamental return types is bogus.
This can trigger asserts when running a test with reproducers so this
patch updates the `SBTarget::IsLoaded` test to stop recording them.
Differential Revision: https://reviews.llvm.org/D95686
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
We don't register i128 as a legal type with addRegisterClass, but it
appears in the list of legal register types. This inconsistency
resulted in the asm constraint lowering trying to use 2 128-bit
registers for these operands. This would leave behind a dead def that
would waste registers.
Regresses GlobalISel tests for i128 load/store, but these aren't very
important right now. Ideally these would not depend on the list of
register types.
This should only consider whether the pressure impact of the bundle at
the given point in the program will decrease the occupancy. High VGPR
pressure was incorrectly blocking the formation of scalar bundles, and
vice versa. This was also blocking bundling from high pressure
situations at other points in the program.
If the G_BR + G_BRCOND in this combine use the same MBB, then the combine will
loop forever. Don't allow that to happen.
Differential Revision: https://reviews.llvm.org/D95895
This change also introduces a new source layout for adding machine
specific and generic implementations. To keep the scope of this change
small, this new pattern is only applied for ceil, ceilf and ceill.
Follow-up changes will switch all math functions over to the new pattern.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D95850
We should check whether lb + step >= ub to determine
whether this is a single iteration. Previously we were
checking lb + lb >= ub.
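A minimal sketch of the corrected check, assuming constant integer bounds, a positive step, and a loop that is already known to run at least once. The helper names `isSingleIteration` and `isSingleIterationOld` are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical helper: for a loop `for (i = lb; i < ub; i += step)` that is
// known to execute at least once (lb < ub, step > 0), the loop has a single
// iteration when the second induction value is already out of range.
bool isSingleIteration(int64_t lb, int64_t ub, int64_t step) {
  return lb + step >= ub;
}

// The previous form described above: it adds lb to itself instead of adding
// the step, so it misclassifies loops whose lower bound is large.
bool isSingleIterationOld(int64_t lb, int64_t ub) {
  return lb + lb >= ub;
}
```

For example, with lb = 4, ub = 8 and step = 1 the loop runs four times, yet the old form reports a single iteration because 4 + 4 >= 8.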
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D95440
Add the conversion pattern for vector.bitcast to lower it to
the LLVM Dialect.
Reviewed By: ThomasRaoux, aartbik
Differential Revision: https://reviews.llvm.org/D95579
cl::ZeroOrMore allows the option to be specified multiple times, which makes it
possible for downstream projects to specify a default value in their lit
configuration while individual tests override the value.
Split up MeasureSizeInBytes() so that array element sizes can be
calculated accurately; use the new API in some places where
DynamicType::MeasureSizeInBytes() was being used but the new
API performs better due to TypeAndShape having precise CHARACTER
length information.
Differential Revision: https://reviews.llvm.org/D95897
tut-simplify-cfg hasn't been ported to the new PM.
llvm-lto2's -enable-new-pm defaults to the CMake flag, so the legacy PM extension test needs to be pinned.
Reviewed By: MaskRay, ychen
Differential Revision: https://reviews.llvm.org/D95898
Second land attempt. MachineVerifier DefRegState expensive check errors fixed.
Prologs and epilogs handle callee-save registers and tend to be irregular with
different immediate offsets that are not often handled by the MachineOutliner.
Commit D18619/a5335647d5e8 (combining stack operations) stretched irregularity
further.
This patch tries to emit homogeneous stores and loads with the same offset for
prologs and epilogs respectively. We have observed that this canonicalizes
(homogenizes) prologs and epilogs significantly and results in a greatly
increased chance of outlining, resulting in a code size reduction.
Despite the above results, there are still size wins to be had that the
MachineOutliner does not provide due to the special handling of X30/LR. To handle
the LR case, this patch custom-outlines prologs and epilogs in place. It does
this by doing the following:
* Injects HOM_Prolog and HOM_Epilog pseudo instructions during a Prolog and
  Epilog Injection Pass.
* Lowers and optimizes said pseudos in an AArch64LowerHomogeneousPrologEpilog Pass.
* Outlined helpers are created on demand. Identical helpers are merged by the linker.
* An opt-in flag is introduced to enable this feature. Another threshold flag
  is also introduced to control the aggressiveness of outlining to suit the application's needs.
This reduced code size by an average of 4% on LLVM-TestSuite/CTMark targeting arm64/-Oz.
Differential Revision: https://reviews.llvm.org/D76570
Unwinders (like libc's backtrace()) can take their own locks (like the
libdl lock). We need to let the unwinder release the locks before
forking. Wrap a new lock around the unwinder for atfork protection.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D95889
Previously we'd hit UB due to an invalid left shift operand.
Also fix the WASM emitter to properly use SLEB128 encoding instead of
ULEB128 encoding for signed fields so that negative numbers don't
result in overly-large values that we can't read back any more.
In passing, don't diagnose a non-canonical ULEB128 that fits in a uint64_t but
has redundant trailing zero bytes.
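To illustrate why the encoding choice matters for signed fields, here is a generic sketch of the two LEB128 encoders. It is not the code from the patch; the function names are hypothetical, and the signed encoder assumes an arithmetic right shift on negative values (guaranteed from C++20 and the behaviour of mainstream compilers before that).

```cpp
#include <cstdint>
#include <vector>

// Unsigned LEB128: 7 payload bits per byte, with the high bit set on every
// byte except the last one.
std::vector<uint8_t> encodeULEB128(uint64_t value) {
  std::vector<uint8_t> out;
  do {
    uint8_t byte = static_cast<uint8_t>(value & 0x7f);
    value >>= 7;
    if (value != 0)
      byte |= 0x80; // more bytes follow
    out.push_back(byte);
  } while (value != 0);
  return out;
}

// Signed LEB128: stops as soon as the remaining bits are pure sign extension
// of bit 6 of the byte just produced.
std::vector<uint8_t> encodeSLEB128(int64_t value) {
  std::vector<uint8_t> out;
  bool more = true;
  while (more) {
    uint8_t byte = static_cast<uint8_t>(value & 0x7f);
    value >>= 7; // assumed to be an arithmetic shift
    bool signBit = (byte & 0x40) != 0;
    if ((value == 0 && !signBit) || (value == -1 && signBit))
      more = false;
    else
      byte |= 0x80; // more bytes follow
    out.push_back(byte);
  }
  return out;
}
```

With these encoders, -1 takes a single byte as SLEB128, whereas reinterpreting it as an unsigned 64-bit value and emitting ULEB128 produces ten bytes.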
Reviewed By: dblaikie, aardappel
Differential Revision: https://reviews.llvm.org/D95510
DFSan uses TLS to pass metadata of arguments and return values. If a signal
callback fires while an instrumented function is accessing the TLS, and the
callback calls other instrumented functions that update the same TLS, the TLS
is left in an inconsistent state after the callback ends. This may cause
either under-tainting or over-tainting.
This fix follows MSan's workaround.
cb22c67a21
It simply resets TLS at restore. This prevents over-tainting. Although
under-tainting may still happen, a taint flow can be found eventually if we
run a DFSan-instrumented program multiple times. The alternative option is
saving the entire TLS. However, the TLS storage takes 2k bytes, and signal calls
could be nested, so it does not seem worth it.
This diff fixes sigaction. A following diff will fix signal.
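The reset-on-return idea can be sketched as follows. This is only an illustration under stated assumptions: `g_labelTls`, `g_userHandler` and `signalHandlerWrapper` are hypothetical names, the buffer merely stands in for the real label TLS, and the actual runtime interposes on sigaction rather than using a single global handler slot.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the TLS that carries argument and return-value
// labels; the size follows the "2k bytes" figure quoted above.
constexpr std::size_t kTlsSize = 2048;
thread_local unsigned char g_labelTls[kTlsSize];

using Handler = void (*)(int);

// Hypothetical slot holding the handler the user originally registered.
Handler g_userHandler = nullptr;

// Wrapper installed in place of the user's handler. The user handler may call
// instrumented code that overwrites the label TLS, so the TLS is cleared once
// the handler returns instead of being saved and restored. Clearing can lose
// labels (under-tainting) but never leaves stale ones (over-tainting).
void signalHandlerWrapper(int signo) {
  if (g_userHandler)
    g_userHandler(signo);
  std::memset(g_labelTls, 0, sizeof(g_labelTls));
}
```

Resetting costs one fixed-size clear per signal delivery, whereas saving would need a 2k copy per nesting level.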
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D95642
This will allow running back-deployment testing on macOS only on systems
running the right version of macOS. For the time being, we're cheating
because we don't have actual machines running older than 10.15.
Convert `assertTrue(a == b)` to `assertEqual(a, b)` to produce better failure messages.
These were mostly done via regex search & replace, with some manual fixes.
Differential Revision: https://reviews.llvm.org/D95813
This revision adds two new classes, RewriterBase and IRRewriter. RewriterBase is a new shared base class between IRRewriter and PatternRewriter. PatternRewriter will continue to be the base class used to perform rewrites within a rewrite pattern. IRRewriter on the other hand, is a new class that allows for tracking IR rewrites from outside of a rewrite pattern. In this revision all of the old API from PatternRewriter is moved to RewriterBase, but the distinction between IRRewriter and PatternRewriter is kept on the chance that a necessary API divergence happens in the future.
Currently if you want to have some utility that transforms a piece of IR and share it between pattern and non-pattern code, you have to duplicate it. This revision enables the creation of utilities that can be invoked from rewrite patterns and normal transformation code:
```c++
void someSharedUtility(RewriterBase &rewriter, ...) {
  // Some interesting IR mutation here.
}

// Some RewritePattern
LogicalResult MyPattern::matchAndRewrite(Operation *op, PatternRewriter &rewriter) {
  ...
  someSharedUtility(rewriter, ...);
  ...
}

// Some Pass
void MyPass::runOnOperation() {
  ...
  IRRewriter rewriter(...);
  someSharedUtility(rewriter, ...);
}
```
Differential Revision: https://reviews.llvm.org/D94638
Sample re-annotation is required at LTO time to achieve reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by pre-LTO code duplication done by passes such as loop unrolling, jump threading, and indirect call promotion, where samples corresponding to a source location are aggregated multiple times because of the duplicates. This change introduces the concept of a distribution factor for pseudo probes so that samples can be distributed across duplicated probes, scaled by a factor. We hope that optimizations that duplicate code maintain the branch frequency information (BFI) well, since probe distribution factors are calculated from it. Distribution factors are updated at the end of the pre-LTO pipeline to reflect an estimated portion of the real execution count.
This change also introduces a pseudo probe verifier that can be run after each IR pass to detect duplicated pseudo probes.
A saturated distribution factor stands for 1.0. A pseudo probe carries a factor ranging from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated with each block probe. Unfortunately this cannot be done for callsite probes because of the size limit of a 32-bit DWARF discriminator; a 7-bit distribution factor is used instead.
Changes are also needed in the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by pre-LTO passes, when later inlined at LTO time, should have the callee's probes prorated based on the prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. In addition, indirect call promotion results in multiple callsites, and the original samples should be distributed across them. This is fixed by adjusting the callsites' distribution factors.
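As a rough illustration of how an integral factor can stand for a fraction in [0.0, 1.0], consider the sketch below. The concrete encodings are assumptions made for the example, not details confirmed by this description: the saturated 64-bit value is taken to be all bits set, and the 7-bit callsite factor is taken to be a percentage. All names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Assumption: the saturated 64-bit block factor (all bits set) stands for 1.0.
constexpr uint64_t kFullBlockFactor = std::numeric_limits<uint64_t>::max();

// Assumption: the 7-bit callsite factor is a percentage, so 100 stands for 1.0
// (100 fits in 7 bits, whose maximum is 127).
constexpr uint32_t kFullCallsiteFactor = 100;

// Convert a 64-bit block factor to a fraction in [0.0, 1.0].
double blockFactorToFraction(uint64_t factor) {
  return static_cast<double>(factor) / static_cast<double>(kFullBlockFactor);
}

// Prorate the samples of one source location onto one duplicated callsite
// probe. Assumes samples * factor does not overflow 64 bits.
uint64_t prorateCallsiteSamples(uint64_t samples, uint32_t callsiteFactor) {
  return samples * callsiteFactor / kFullCallsiteFactor;
}
```

Under these assumptions, a call site with 1000 samples that was duplicated into two copies with factors 25 and 75 would be credited with 250 and 750 samples respectively, instead of 1000 each.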
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D93264
Previously, operator== would consider the actual equality of the pairs
(lhs.Value, lhs.State) == (rhs.Value, rhs.State). However, if an invalid
cost was involved in a call to operator<, only the state would be
compared. Thus, it was not the case that ({2, Invalid} < {3, Invalid} ||
{2, Invalid} > {3, Invalid} || {2, Invalid} == {3, Invalid}).
This patch implements a true total ordering, where cost state is
considered first, then value. While it's not really important that
{2, Invalid} be considered to be less than {3, Invalid}, it's not a
problem either. This patch also implements operator== in terms of
operator<, so the two definitions will be kept in sync.
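A minimal sketch of such an ordering, using a simplified stand-in type. `Cost` and `CostState` are hypothetical names, and the choice that valid costs sort before invalid ones is an assumption of the example.

```cpp
#include <tuple>

// Assumption: Valid sorts before Invalid.
enum class CostState { Valid, Invalid };

// Simplified stand-in for the cost type discussed above.
struct Cost {
  int Value;
  CostState State;
};

// Lexicographic comparison: state first, then value.
bool operator<(const Cost &lhs, const Cost &rhs) {
  return std::tie(lhs.State, lhs.Value) < std::tie(rhs.State, rhs.Value);
}

bool operator>(const Cost &lhs, const Cost &rhs) { return rhs < lhs; }

// Equality is derived from operator<, so the two cannot drift apart.
bool operator==(const Cost &lhs, const Cost &rhs) {
  return !(lhs < rhs) && !(rhs < lhs);
}
```

With this definition exactly one of `a < b`, `a > b`, `a == b` holds for any two costs, including {2, Invalid} and {3, Invalid}.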
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D95803
Implement IEEE_SUPPORT_DATATYPE() and other inquiry intrinsic
functions from the intrinsic module IEEE_ARITHMETIC, folding all of
their results to .TRUE.
Differential Revision: https://reviews.llvm.org/D95830
The MLIR Async runtime uses different namespacing for the header file,
and the definitions of its C API. The header file places the extern "C"
functions inside namespace mlir::runtime, and the definitions are not
in a namespace. This causes issues in cl.exe. It treats the declaration
and definition as different, and thus does not apply dllexport to the
definition, which leads to the mlir_async_runtime.dll containing no
definitions, and the mlir_async_runtime.lib not being generated.
This patch moves the namespace to cover the definitions, and thus
generates the dll correctly on Windows with cl.exe.
This was tested with Visual Studio C++ 19.28.29336.
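The consistent layout can be sketched like this: the declaration and the definition of an extern "C" function sit in the same namespace, so a compiler that matches them by qualified name sees one entity and carries the export attribute over to the definition. All names here (`DEMO_EXPORT`, `demo::runtime`, `demoAddRef`) are hypothetical and are not the runtime's actual API.

```cpp
// Hypothetical export macro; it expands to nothing outside Windows.
#if defined(_WIN32)
#define DEMO_EXPORT __declspec(dllexport)
#else
#define DEMO_EXPORT
#endif

// What the header would contain: the exported C API declared in a namespace.
namespace demo {
namespace runtime {
extern "C" DEMO_EXPORT int demoAddRef(int count);
} // namespace runtime
} // namespace demo

// What the source file would contain: the definition placed in the same
// namespace as the declaration, rather than at global scope.
namespace demo {
namespace runtime {
extern "C" int demoAddRef(int count) { return count + 1; }
} // namespace runtime
} // namespace demo
```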
Differential Revision: https://reviews.llvm.org/D95386
Add the necessary bits to CMakeLists to make it possible to configure
MLIR against an installed LLVM, and build it with minimal need for the LLVM
source tree. The latter is only necessary to run unittests, and if it
is missing then unittests are skipped with a warning.
This change includes the necessary changes to tests, in particular
adding some missing substitutions and defining missing variables
for lit.site.cfg.py substitution.
Reviewed By: stephenneuendorffer
Differential Revision: https://reviews.llvm.org/D85464
Co-authored-by: Isuru Fernando <isuruf@gmail.com>
The __resume function trips up LLVM's 'X86 DAG->DAG Instruction Selection' unless optimizations are disabled.
Only adding the __resume function when it's needed allows lowering through AsyncToLLVM and LLVM without '-O0' as long as the coroutine functionality is not used.
Reviewed By: ezhulenev
Differential Revision: https://reviews.llvm.org/D95868
LLVM_TARGETS_TO_BUILD accepts either "host" or "Native" for auto-selecting
the target from the environment. However, the way "Native" was plumbed
would lead to the JIT environment being disabled. This patch makes
"Native" work just like "host".
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D95837
Due to a clerical error, the sdiv operation was mapping to vdivu and
udiv to vdiv, when the opposite mapping is the correct one.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95869
This allows additional manipulation of the newly created Operation,
for example propagating or changing custom attributes.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D95525