Freeze the condition of the newly introduced conditional branch,
to avoid immediate undefined behavior if the input to ctlz/cttz
was originally poison.
Differential Revision: https://reviews.llvm.org/D125887
This patch adds an AArch64 specific PostRA MachineScheduler to try to schedule
STP Q's to the same base-address in ascending order of offsets. We have found
this to improve performance on Neoverse N1 and should not hurt other AArch64
cores.
Differential Revision: https://reviews.llvm.org/D125377
Currently reassociating add expressions can lead to failing to select
(u|s)mlal. Implement isReassocProfitable to skip reassociating
expressions that can be lowered to (u|s)mlal.
The same issue exists for the *mlsl variants as well, but the DAG
combiner doesn't use the isReassocProfitable hook before reassociating.
To be fixed in a follow-up commit as this requires DAGCombiner changes
as well.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D125895
Previously, `getRegUsageForType` was implemented using
`getTypeLegalizationCost`. `getRegUsageForType` is used by the loop
vectorizer to estimate the register pressure caused by using a vector
type. However, `getTypeLegalizationCost` currently only appears to
understand splitting and not scalarization, so significantly
underestimates the register requirements.
Instead, use `getNumRegisters`, which understands when scalarization
can occur (via computeRegisterProperties).
This was discovered while investigating D118979 (Set maximum VF with
shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the
loop vectorizer previously ends up costing an v128i1 as 2 v64i*
registers where it actually occupies 128 i32 registers.
I'm sending this patch early for comment, I'm still doing some sanity checking
with LNT. I note that getRegisterClassForType appears to return VectorRC even
though the type in question (large vNi1 types) end up occupying scalar
registers. That might be worth fixing too.
Differential Revision: https://reviews.llvm.org/D125918
Without the change llvm build fails on this week's gcc-13 snapshot as:
[ 91%] Building CXX object unittests/Support/CMakeFiles/SupportTests.dir/Base64Test.cpp.o
In file included from llvm/unittests/Support/Base64Test.cpp:14:
llvm/include/llvm/Support/Base64.h: In function 'std::string llvm::encodeBase64(const InputBytes&)':
llvm/include/llvm/Support/Base64.h:29:5: error: 'uint32_t' was not declared in this scope
29 | uint32_t x = ((unsigned char)Bytes[i] << 16) |
| ^~~~~~~~
Without the change llvm build fails on this week's gcc-13 snapshot as:
[ 0%] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/Signals.cpp.o
In file included from llvm/lib/Support/Signals.cpp:14:
llvm/include/llvm/Support/Signals.h:119:8: error: variable or field 'CleanupOnSignal' declared void
119 | void CleanupOnSignal(uintptr_t Context);
| ^~~~~~~~~~~~~~~
Depends on D125892. There might be efficiency and performance
implications by using a lambda. Thus, I am going to conduct measurements
to see if there is any noticeable impact.
I've been thinking about two more alternatives:
1) Make `assumeDualImpl` a variadic template and (perfect) forward the
arguments for the used `assume` function.
2) Use a macros.
I have concerns though, whether these alternatives would deteriorate the
readability of the code.
Differential Revision: https://reviews.llvm.org/D125954
Depends on D124758. This is the very same thing we have done for
assumeDual, but this time we do it for assumeInclusiveRange. This patch
is basically a no-brainer copy of that previous patch.
Differential Revision:
https://reviews.llvm.org/D125892
This diff modifies `mlir-tblgen` to generate Python Operation class `__init__()`
functions that use Python keyword-only arguments.
Previously, all `__init__()` function arguments were positional. Python code to
create MLIR Operations was required to provide values for ALL builder arguments,
including optional arguments (attributes and operands). Callers that did not
provide, for example, an optional attribute would be forced to provide `None`
as an argument for EACH optional attribute. Proposed changes in this diff use
`tblgen` record information (as provided by ODS) to generate keyword arguments
for:
- optional operands
- optional attributes (which includes unit attributes)
- default-valued attributes
These `__init__()` function keyword arguments have default `None` values (i.e.
the argument form is `optionalAttr=None`), allowing callers to create Operations
more easily.
Note that since optional arguments become keyword-only arguments (since they are
placed after the bare `*` argument), this diff will require ALL optional
operands and attributes to be provided using explicit keyword syntax. This may,
in the short term, break any out-of-tree Python code that provided values via
positional arguments. However, in the long term, it seems that requiring
keywords for optional arguments will be more robust to operation changes that
add arguments.
Tests were modified to reflect the updated Operation builder calling convention.
This diff partially addresses the requests made in the github issue below.
https://github.com/llvm/llvm-project/issues/54932
Reviewed By: stellaraccident, mikeurbach
Differential Revision: https://reviews.llvm.org/D124717
The arch or cpu has its default fpu features and versions such as fpuv2_sf/fpuv3_sf.
And there is also -mfpu option to specify and override fpu version and features.
For example, C860 has fpuv3_sf/fpuv3_df feature as default, when
-mfpu=fpv2 is given, fpuv3_sf/fpuv3_df is replaced with fpuv2_sf/fpuv2_df.
I had initially assumed this was the problem with
https://github.com/llvm/llvm-project/issues/55271#issuecomment-1133426243
But it turns out that was a simpler issue. This patch is still
more correct than what we were doing before so figured I'd submit
it anyway.
No test case because I'm not sure how to get an undef around
until expansion.
Looking at the test deltas I wonder if it be valid to combine
(sext_inreg (freeze (aextload X))) -> (freeze (sextload X)).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D126175
abs should only produce a positive value or the signed minimum
value. This means we can't fold abs(undef) to undef as that would
allow more values. Fold to 0 instead to match InstSimplify.
Fixes test mentioned in comment on pr55271.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D126174
The latch may not be the exiting block. Use the exiting block instead
when looking up the incoming value of the LCSSA phi node. This fixes a
crash with early-exit loops.
validateTree() is instantiated with __FILE__.
It will be pruned at link time due to -ffunction-sections but be left in
object files.
Its user is only GenericCycleInfo::compute() with assert(validateTree());
Therefore I think validateTree() may be hidden with NDEBUG.
This is a fixup for https://reviews.llvm.org/D112696
This is the specific pattern seen in #53432, but it can be extended
in multiple ways:
1. The 'zext' could be an 'and'
2. The 'sub' could be some other binop with a similar ==0 property (udiv).
There might be some way to generalize using knownbits, but that
would require checking that the 'bool' value is created with
some instruction that can be replaced with new icmp+logic.
https://alive2.llvm.org/ce/z/-KCfpa
Properly handle the case where only the second operand of e.g. an MVC
instruction uses a fixup for the displacement.
Reviewed By: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D125982
Current codegen only supports scalarization of pointer inductions for
scalable VFs if they are uniform. After 3bebec659 we now may enter the
scalarization code path in VPWidenPointerInductionRecipe::execute for
scalable vectors.
Fall back to widening for scalable vectors if necessary.
This should fix a build failure when bootstrapping LLVM with SVE, e.g.
https://lab.llvm.org/buildbot/#/builders/176/builds/1723
This diff fixes decoding conflict between move instructions and
their tail-call counterpart
Reviewed By: myhsu
Differential Revision: https://reviews.llvm.org/D125948
Do not remove braces if the conditional of if/for/while might not
fit on a single line even after the opening brace is removed.
Examples:
// ColumnLimit: 20
// 45678901234567890
if (a) { /* Remove. */
foo();
}
if (-b >= c) { // Keep.
bar();
}
Differential Revision: https://reviews.llvm.org/D126052
This code previously used cantFail, but both steps (resolution and emission)
can fail if the resource tracker associated with the
AbsoluteSymbolsMaterializationUnit is removed. Checking these errors is
necessary for correct error propagation.
Reviewing the code again, I believe the sext is needed on the LHS
or RHS for ICmp and only on the RHS for Add.
Add an opcode check before checking the operand number.
Fixes PR55627.
Differential Revision: https://reviews.llvm.org/D125654
Idiomatic llvm::Error usage can result in a FailedToMaterialize error tearing
down an ExecutionSession instance. Since the FailedToMaterialize error holds
SymbolStringPtrs and JITDylib references this leads to crashes when accessing
or logging the error.
This patch modifies FailedToMaterialize to retain the SymbolStringPool and
JITDylibs involved in the failure so that we can safely report an error message
to the client, even if the error tears down the session.
The contract for JITDylibs allows the getName method to be used even after the
session has been torn down, but no other JITDylib fields should be accessed via
the FailedToMaterialize error if the ssesion has been torn down. Logging the
error is guaranteed to be safe in all cases.
I and @whisperity spent some time debugging a LIT test case using the
`-std=c++11-or-later` `check_clang_tidy.py` flag when the test had
fixits.
It turns out if the test wants to report a diagnostic into a header
file AND into the test file as well, one needs to first copy the header
somewhere under the build directory.
It needs to be copied since `clang-tidy` sorts the reports into
alphabetical order, thus to have a deterministic order relative to the
diagnostic in the header AND the diagnostic in the test cpp file.
There is more to this story.
The `-std=c++11-or-later` turns out executes the test with multiple
`-std=XX` version substitution, and each execution will also have the
`-fix` tidy parameter. This means that the freshly copied header file I
stated in the previous paragraph gets fixed up and the very next tidy
execution will fail miserably.
Following @whisperity's advice, I'm leaving a reminder about such
//shared// state in the related doc comment.
Reviewed By: LegalizeAdulthood
Differential Revision: https://reviews.llvm.org/D125771
This reverts commit fc9c59c355.
The patch triggers an assertion when building SPEC on X86. Reduced
reproducer shared at D107966.
Also reverts follow-up commit 11a09af76d.