Add a shape.func op for use (primarily) in the shape.function_library op. This
allows setting a default dialect for somewhat simpler authoring. It is a
minimal version of the ops needed.
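A hedged sketch of the intended authoring style (the op and type syntax here
is assumed from the description above and may differ in detail):

  shape.function_library @lib {
    shape.func @same(%arg0 : !shape.shape) -> !shape.shape {
      return %arg0 : !shape.shape
    }
  }

With a default dialect set on the library, ops in the body could be written
with less qualification.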
Differential Revision: https://reviews.llvm.org/D124055
If there is only a single element in the vector, then we can
just extract that element to compute the final result.
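One plausible instance (pseudo-IR; the exact op and syntax depend on the
pattern in question):

  // A reduction over a one-element vector...
  %r = vector.reduction <add>, %v : vector<1xf32> into f32
  // ...is just that element.
  %r = vector.extract %v[0] : vector<1xf32>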
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D124129
Backwards search
The sext.w removal pass (before the new patch) checks whether the input to sext.w is already in sign-extended form, so it can eliminate it. It does that by checking that every definition/source that reaches the sext.w is an instruction that produces a sign-extended value, either by definition (e.g. ADDW) or by propagating sign extension (e.g. OR), in which case we check its sources recursively.
Forward search
Sometimes, one of the sources is an instruction that doesn't always produce a sign-extended value but has a W-version that does (e.g. ADD / ADDW). If we transform the ADD to ADDW, the sext.w can be removed (assuming the other def paths are satisfied), but this transformation is sound only if every use of this ADD/ADDW requires only the lower 32 bits, either directly (like sll %x, 32) or by propagating the dependency (the lower word of the output depends only on the lower word of the input), so we check its uses recursively.
When searching backwards, if an instruction that can be replaced with its W-variant is encountered, the pass runs the forward search to verify it can be replaced, then adds it to a list of fixable instructions. After verifying all paths, it replaces the instructions and removes the sext.w.
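As an illustration (register names arbitrary), given:

  add    a0, a1, a2
  sext.w a0, a0

if every use of the add result needs only the lower 32 bits, the pass
rewrites this to:

  addw   a0, a1, a2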
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D119928
At some point in instruction selection, A2_tfrsi Constant:i32<...> was
created, where the "Constant" came from SelectAnyInt. Since it wasn't
a TargetConstant, it was selected again, leading to
%vreg = A2_tfrsi ...
... = A2_tfrsi %vreg
which is not valid code.
The functionality of restoreStatOnFile may be reused. Move it into
FileUtilities.cpp and create a helper class, FilePermissionsApplier,
to store and apply permissions.
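A hedged usage sketch (the create/apply method names are assumed from the
description, not copied from the header):

  // Capture the permissions of the input file...
  Expected<FilePermissionsApplier> PA = FilePermissionsApplier::create(InputPath);
  if (!PA)
    return PA.takeError();
  // ...then, after writing the output, apply them to it.
  if (Error E = PA->apply(OutputPath))
    return E;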
Differential Revision: https://reviews.llvm.org/D123821
NamedDecl::getIdentifier can return a nullptr when
DeclarationName::isIdentifier is false, which leads to a null pointer
dereference when TypePrinter::printTemplateId calls ->getName().
NamedDecl::getName does the same thing in the successful case and
returns an empty string in the failure case.
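Roughly, the change amounts to the following (simplified; OS and TD are
stand-ins for the printer's stream and the printed declaration):

  // Before: getIdentifier() returns nullptr for non-identifier names.
  OS << TD->getIdentifier()->getName();
  // After: getName() returns an empty string in that case.
  OS << TD->getName();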
This crash affects the llvm 14 packages on llvm.org.
> Includes regression test for problem noted by @hans.
> This reverts commit 973de71.
>
> Differential Revision: https://reviews.llvm.org/D106898
The feature as implemented is fairly expensive and hasn't been used by
libc++. A reimplementation is possible if libc++ becomes
interested in this feature again.
Differential Revision: https://reviews.llvm.org/D123885
This patch adds custom MIR operand comments to VTYPE immediate operands
in VSETVLI instructions and SEW/LMUL operands in vector codegen pseudo
instructions. The result is intended to be more human-readable and
hopefully maintainable when working with MIR, particularly when
writing or reading test cases.
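For example (operand values illustrative), a VSETVLI immediate now prints
with its decoded VTYPE fields as a comment:

  dead $x0 = PseudoVSETVLI killed renamable $x10, 80 /* e32, m1, ta, mu */, implicit-def $vl, implicit-def $vtype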
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D124187
This makes the API easier to use. Also allows us to check for incorrect API usage for easier debugging.
Differential Revision: https://reviews.llvm.org/D124265
Reduce peak memory usage by tearing down the intermediate representation
as we build the final one, rather than deleting it at the end.
Differential Revision: https://reviews.llvm.org/D124240
This is a very specific fold to fix a poor codegen issue reported upstream.
InstCombine has the much more flexible pushFreezeToPreventPoisonFromPropagating, but I don't think we're quite there with DAG/TLI handling for canCreateUndefOrPoison / isGuaranteedNotToBeUndefOrPoison value tracking yet.
Fixes #54911
Differential Revision: https://reviews.llvm.org/D124185
The `hasFilter` field is not needed. Instead, the filter accepts ops by default if no ALLOW rule was specified.
Differential Revision: https://reviews.llvm.org/D124264
Currently metadata is inserted in a late pass which is lowered
to an AssertZext. The metadata would be more useful if it was
inserted earlier after inlining, but before codegen.
Probably shouldn't change anything now. Fully replacing the
late metadata annotation needs more work, since we lose
out on optimizations after these are lowered to CopyFromReg.
Seems to be slightly better than relying on the AssertZext from the
metadata. The test change in cvt_f32_ubyte.ll is a quirk from it using
-start-before=amdgpu-isel instead of running the usual codegen
pipeline.
The most common situation where G_ASSERT_ZEXT appears for AMDGPU is a
copy from a physical register, which happens to set the actual
register class on the virtual register. After copy coalescing, the
assert's source operand had a vreg with a set class. The verifier was
strictly rejecting cases where the set class/bank weren't an exact
match. Additionally, RegBankSelect was also expecting a register bank
to be set on the register, not a class.
This is much stricter than for regular copies, so relax this behavior. This
now allows these 2 cases:
1. Source register has either a class or a bank, and the result does not
2. Source register has a register class, and the result is a register
with a matching bank.
This should avoid needing some kind of special handling to avoid
violating this constraint when folding copies.
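For example (sketch; the class and bank names are illustrative), the
verifier now accepts cases like:

  %src:sgpr_32 = COPY $sgpr0
  %dst:sgpr(s32) = G_ASSERT_ZEXT %src, 16

where the source carries a register class and the result only a matching
bank (case 2 above).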
The original patch (https://reviews.llvm.org/D121354) targets x86 and adjusts
the lookahead score of splat loads, as they can be done by the `movddup`
instruction, which combines the load and the broadcast and is cheap to execute.
A similar issue shows up on AArch64. The `ld1r` instruction performs a broadcast
load and is cheap to execute.
This patch implements the TargetTransformInfo hooks for AArch64.
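For illustration (hedged AArch64 snippets), a scalar load plus broadcast:

  ldr  s0, [x0]
  dup  v0.4s, v0.s[0]

becomes a single broadcast load:

  ld1r { v0.4s }, [x0]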
Differential Revision: https://reviews.llvm.org/D123638
This reverts part of https://reviews.llvm.org/D124224 that causes
an assert because the register allocator triggers a pathological
situation where there's no safe way to insert a zeroing MOVPFRX
instruction.
These command-line flags are alternatives to providing the -x
c++-*-header indicators that we are building a header unit.
Act on -fmodule-header= for headers on the command line:
If we have x.hh -fmodule-header, then we should treat that header
as a header unit input (equivalent to -xc++-header-unit-header x.hh).
Likewise, for -fmodule-header={user,system} the source should now be
recognised as a header unit input (since this can affect the job list
that we need).
It's not practical to recognise a header without any suffix, so
-fmodule-header=system foo isn't going to happen, although
-fmodule-header=system foo.hh will work OK. However, we can make it
work if the user indicates that the item without a suffix is a valid
header (so -fmodule-header=system -xc++-header vector).
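Illustrative invocations (assuming a C++20 compile; flag spellings as above):

  clang -std=c++20 -fmodule-header x.hh
  clang -std=c++20 -fmodule-header=system -xc++-header vector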
Differential Revision: https://reviews.llvm.org/D121589
vector.broadcast can inject all the size-one dimensions. If it's
followed by a vector.shape_cast back to the original type, we can
cancel the op pair, like cancelling consecutive shape_cast ops.
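For example (hedged MLIR sketch):

  %b = vector.broadcast %v : vector<4xf32> to vector<1x1x4xf32>
  %c = vector.shape_cast %b : vector<1x1x4xf32> to vector<4xf32>

Here %c can be replaced by %v directly.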
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D124094
This patch extends cc1as to export the build version load command with
LC_VERSION_MIN_MACOSX.
This is especially important for Mac Catalyst, as Mac Catalyst uses
macOS's compiler-rt builtins.
Differential Revision: https://reviews.llvm.org/D121868
Add DestructiveBinaryComm* patterns for ORR, EOR, AND and BIC.
The above instructions require that the source and destination registers are
equal, so using movprfx should be beneficial to performance.
Note: BIC (i.e. A & ~B) is not a commutative operation.
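For example (illustrative registers), when the destination differs from the
first source:

  movprfx z0, z1
  orr     z0.d, p0/m, z0.d, z2.d

This gives the effect of a non-destructive z0 = z1 | z2 for the active
lanes, with the movprfx expected to fuse with the orr.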
Reviewed By: paulwalker-arm, david-arm
Differential Revision: https://reviews.llvm.org/D124224
The autiasp and autibsp instructions are the counterparts of paciasp/pacibsp,
so let's emit .cfi_negate_ra_state for these too.
With the Armv8.3 instruction set, retaa/retab perform the return and the
authentication in one step; there we can't emit .cfi_negate_ra_state because
it would point after the ret* instruction.
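A sketch of the intended output (unwind directives interleaved with the
prologue/epilogue):

  paciasp
  .cfi_negate_ra_state
  ...
  autiasp
  .cfi_negate_ra_state
  ret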
Reviewed By: nickdesaulniers, MaskRay
Differential Revision: https://reviews.llvm.org/D111780
`__attribute__((__aligned__))` is present but ignored.
In the original code, the 'getDeclAlignIfRequired' function is used.
That function returns the max alignment of all aligned attributes
if the type has any, but it doesn't consider the type itself at all.
The 'getTypeAlignIfRequired' function uses the type's alignment value,
which is also what 'alignof' uses. I think we should use
'getTypeAlignIfRequired'.
Reviewed By: dblaikie, jmorse, wolfgangp
Differential Revision: https://reviews.llvm.org/D124006
Folds are supposed to always be added in conjugated pairs for `and`
and `or`. Merge the two functions to make the folds for which this is
currently not the case more obvious.
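For example, a known conjugated pair:

  and: (X == 0) && (Y == 0)  ->  (X | Y) == 0
  or:  (X != 0) || (Y != 0)  ->  (X | Y) != 0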
* Move Module Bufferization to the bufferization dialect. The implementation is split into `OneShotModuleBufferize.cpp` and `FuncBufferizableOpInterfaceImpl.cpp`, so that the external model implementation can be easily moved to the func dialect in the future.
* Split and clean up test cases. A few test cases are still remaining in Linalg and will be updated separately.
* `linalg.inplaceable` is renamed to `bufferization.writable` to accurately reflect its current usage; see the sketch after this list.
* Attributes and their verifiers are moved from the Linalg dialect to the Bufferization dialect.
* Expand documentation.
* Add a new flag to One-Shot Bufferize to allow for function boundary bufferization.
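A hedged sketch of the renamed attribute in use (exact syntax may differ):

  func @fn(%t: tensor<?xf32> {bufferization.writable = true}) -> tensor<?xf32>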
Differential Revision: https://reviews.llvm.org/D122229
1d90e53044 switched this code to store
the predicates and operands in variables, but retained a
swapOperands() call here. Thus the commuted cases were no longer
folded. Additionally, as the change was not reported, the next
InstCombine iteration would not pick it up either.
The layout postprocessing step was removed and is now part of the FuncOp bufferization. If the user specified a certain layout map for a tensor function arg, use that layout map directly when bufferizing the function signature. Previously, the bufferization used a generic layout map for every tensor function arg and then updated function signatures and CallOps in a separate step.
Differential Revision: https://reviews.llvm.org/D122228
FuncOps are now less special. They must still be analyzed + bufferized in a certain order, but they are now bufferized the same way as other ops that have a region: bufferize the op first (`bufferize` interface method), then bufferize the region body with other bufferization patterns. In the case of FuncOps, the function signature is bufferized together with ReturnOps, similar to how, e.g., scf.for ops are bufferized together with scf.yield ops.
This change is essentially a reimplementation of the FuncOp bufferization, but mostly NFC from a user's perspective (apart from error messages). This change is in preparation of moving the code to the bufferization dialect.
Differential Revision: https://reviews.llvm.org/D123214