Depthwise convolution should support kernel dilation and non-dilation should
not be a special case. Updated op definition to include a dilation attribute.
This also adds a tosa.depthwise_conv2d lowering to linalg to support the new
linalg behavior.
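
To illustrate why the non-dilated case does not need special handling, here is a minimal Python sketch of a 1-D depthwise convolution (valid padding, stride 1). It is not the TOSA or linalg definition; the function name and the list-of-channels layout are assumptions made for illustration. A dilation of 1 is just the ordinary case of the same loop.

```python
def depthwise_conv1d(channels, kernels, dilation=1):
    """Convolve each channel with its own kernel (hypothetical helper).

    channels: list of per-channel input lists
    kernels:  list of per-channel kernel lists (same length as channels)
    """
    outputs = []
    for signal, kernel in zip(channels, kernels):
        # Input extent covered by the dilated kernel.
        span = (len(kernel) - 1) * dilation + 1
        out_len = max(len(signal) - span + 1, 0)
        outputs.append([
            sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
            for i in range(out_len)
        ])
    return outputs
```

With dilation=1 the index `i + k * dilation` reduces to `i + k`, so no separate code path is required.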
Differential Revision: https://reviews.llvm.org/D103219
It's still in use in a few places, so we can't delete it yet, but there aren't
many left at this point.
Differential Revision: https://reviews.llvm.org/D103352
This gives a nice message about the location of errors in a large
tablegen file, which is much more useful for users.
Differential Revision: https://reviews.llvm.org/D102740
This omits load commands for unreferenced dylibs if:
- the dylib was loaded implicitly,
- it is marked MH_DEAD_STRIPPABLE_DYLIB
- or -dead_strip_dylibs is passed
This matches ld64.
Currently, the "is dylib referenced" state is computed before dead code
stripping and is not updated after dead code stripping. This too matches ld64.
We should do better here.
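
The rule listed above can be sketched as a small predicate. This is an illustrative Python model, not lld's code; the function and parameter names are hypothetical.

```python
def omit_load_command(referenced, implicit, dead_strippable, dead_strip_dylibs):
    """Return True if the dylib's load command should be dropped.

    referenced:        some symbol from the dylib is used (computed before
                       dead code stripping, per the description above)
    implicit:          the dylib was loaded implicitly
    dead_strippable:   the dylib is marked MH_DEAD_STRIPPABLE_DYLIB
    dead_strip_dylibs: -dead_strip_dylibs was passed
    """
    if referenced:
        return False
    return implicit or dead_strippable or dead_strip_dylibs
```

For the clang-format case described below, libobjc.A.dylib would correspond to `referenced=False, implicit=True`.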
With this, clang-format linked with lld (like with ld64) no longer has
libobjc.A.dylib in `otool -L` output. (It was implicitly loaded as a reexport
of CoreFoundation.framework, but it's not needed.)
Differential Revision: https://reviews.llvm.org/D103430
WindowsSupport.h is a public header, but including it causes a compile error saying that llvm/Config/config.h cannot be found, because config.h is a private header. Nothing in WindowsSupport.h actually depends on the private definitions, so it can include the public config header instead.
Reviewed By: amccarth
Differential Revision: https://reviews.llvm.org/D103370
- A lot of lit tests simply specify the arch minus the triple. On z/OS, this can produce a triple of the form some-other-triple-unknown-ibm-zos, which is an incorrect triple + arch combination.
- To prevent this, the isOSzOS check is replaced with isOSBinFormatGOFF.
- This works because the GOFF format is set only if the architecture is systemz and the operating system is z/OS. Currently, no other architecture or OS uses the GOFF file format.
- An argument could be made that the problematic tests should be fixed to explicitly specify the arch-vendor-triple string, but there is a large number of these tests, and adding this stricter scope ensures that we aren't instantiating the incorrect instance of the AsmParser for other platforms when run on z/OS.
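
A simplified Python model of why the object-format check is stricter than the OS check. The helper names mirror the Triple predicates mentioned above but are toy functions, and "ELF" is only a stand-in for "any non-GOFF default"; both are assumptions for illustration.

```python
def default_object_format(arch, os):
    # Simplified model: GOFF only for systemz on z/OS.
    # "ELF" is a placeholder for every other combination.
    return "GOFF" if (arch == "systemz" and os == "zos") else "ELF"

def is_os_zos(arch, os):
    # Looks at the OS component only.
    return os == "zos"

def is_os_bin_format_goff(arch, os):
    # Requires the arch and the OS to agree.
    return default_object_format(arch, os) == "GOFF"
```

A test that names only another arch while running on a z/OS host passes the first predicate but not the second.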
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D103343
This addresses pr50497. The argument of a typeid expression is
unevaluated, *except* when it's a polymorphic type. We handle this by
parsing as unevaluated and then transforming to evaluated if we
discover it should have been an evaluated context.
We do the same in TreeTransform<Derived>::TransformCXXTypeidExpr,
entering unevaluated context before transforming and rebuilding the
typeid. But that's incorrect and can lead us to converting to
evaluated context twice -- and hitting an assert.
During normal template instantiation we're always cloning the
expression, but during generic lambda processing we do not necessarily
AlwaysRebuild, and end up with TransformDeclRefExpr unconditionally
calling MarkDeclRefReferenced around line 10226. That triggers the
assert.
// Mark it referenced in the new context regardless.
// FIXME: this is a bit instantiation-specific.
SemaRef.MarkDeclRefReferenced(E);
This patch makes 2 changes.
a) TreeTransform<Derived>::TransformCXXTypeidExpr only enters
unevaluated context if the typeid's operand is not a polymorphic
glvalue. If it is, it keeps the same evaluation context.
b) Sema::BuildCXXTypeId is altered to transform to evaluated only if
the current context is unevaluated.
Differential Revision: https://reviews.llvm.org/D103258
loadDylib() keeps a name->DylibFile cache, but it only writes
to the cache once the DylibFile constructor has completed.
So dylib loads done recursively from the DylibFile constructor
wouldn't use the cache.
Now, we load additional dylibs after writing to the cache,
which means the cache now gets used for dylibs loaded because
they're referenced from other dylibs.
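
A minimal Python sketch of the ordering change, assuming a dictionary cache and a hypothetical `deps` map from a dylib name to the names it references. It is not lld's implementation; it only shows that writing to the cache before recursing lets nested loads hit the cache.

```python
class DylibFile:
    def __init__(self, name):
        self.name = name
        self.reexports = []

def load_dylib(name, deps, cache, load_count):
    """Load `name`, reusing the cache for anything already constructed."""
    if name in cache:
        return cache[name]
    dylib = DylibFile(name)
    load_count[name] = load_count.get(name, 0) + 1
    # Publish to the cache first, then load the referenced dylibs.
    cache[name] = dylib
    dylib.reexports = [
        load_dylib(dep, deps, cache, load_count) for dep in deps.get(name, [])
    ]
    return dylib
```

Because the entry is cached before the recursive loads, a file that reexports itself resolves to the cached object instead of recursing forever.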
Related to PR49514 and PR50101, but no dramatic behavior change in itself.
(Technically we no longer crash when a tbd file reexports itself,
but that doesn't happen in practice. We now accept it silently instead
of crashing; ld64 has a diag for the reexport cycle.)
Differential Revision: https://reviews.llvm.org/D103423
As the existing test unreachable.ll shows, we should be doing more
work to avoid entering unreachable blocks: we should not stop
vectorization just because a PHI incoming value from an unreachable
block cannot be vectorized. We know that particular value will never
be used so we can just replace it with poison.
On KNL, when an L2 or Tile layer is detected, manually add
the corresponding equivalent layer.
Differential Revision: https://reviews.llvm.org/D102865
We now make up a TypeLoc for the class receiver to simplify visiting,
notably for indexing, availability, and clangd.
Differential Revision: https://reviews.llvm.org/D101645
Each var argument to an attach or detach clause must be a
Fortran variable or array with the pointer or allocatable attribute.
This patch enforces this restriction.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D103279
WrapperFunctionResult is a C++ wrapper for __orc_rt_CWrapperFunctionResult
that automatically manages the underlying struct.
The Simple Packed Serialization (SPS) utilities support a simple serialization
scheme for wrapper function argument and result buffers:
Primitive types (bool, char, int8_t, uint8_t, int16_t, uint16_t, int32_t,
uint32_t, int64_t, uint64_t) are serialized in little-endian form.
SPSTuples are serialized by serializing each of the tuple members in order
without padding.
SPSSequences are serialized by serializing a sequence length (as a uint64_t)
followed by each of the elements of the sequence in order without padding.
Serialization/deserialization always involves a pair: an SPS type tag (a tag
representing the serialized format to use, e.g. uint32_t, or
SPSTuple<bool, SPSString>) and a concrete type to be serialized from or
deserialized to (uint32_t, std::pair<bool, std::string>). Serialization for new
types can be implemented by specializing the SPSSerializationTraits type.
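
As a rough illustration of the byte layout described above, the following Python sketch serializes a (bool, string) pair. It is not the ORC runtime's C++ API: every function name is hypothetical, and treating SPSString as a sequence of chars is an assumption.

```python
import struct

def sps_bool(value):
    return struct.pack("<?", value)      # one byte

def sps_uint32(value):
    return struct.pack("<I", value)      # four bytes, little-endian

def sps_sequence(items, serialize_item):
    # uint64_t length, then each element in order, no padding.
    header = struct.pack("<Q", len(items))
    return header + b"".join(serialize_item(item) for item in items)

def sps_string(text):
    # Assumption: a string is a sequence of single-byte chars.
    return sps_sequence(text.encode("ascii"), lambda byte: bytes([byte]))

def sps_tuple(values, serializers):
    # Members in order, no padding between them.
    return b"".join(ser(val) for ser, val in zip(serializers, values))
```

Usage: `sps_tuple((True, "hi"), (sps_bool, sps_string))` yields one byte for the bool, eight bytes for the length, and two bytes for the characters.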
When applying the changes in 8edd3464af,
it seems that this bit got merged incorrectly and no test coverage
caught the issue. This fixes the diagnostic and adds a test.
When completing an Objective-C method declaration by name, we need to
preserve the leading text as a `qualifier` so we insert it properly
before the first typed text chunk.
Differential Revision: https://reviews.llvm.org/D100798
This patch transforms the sequence
lea (reg1, reg2), reg3
sub reg3, reg4
to two sub instructions
sub reg1, reg4
sub reg2, reg4
A similar optimization can also be applied to the LEA/ADD sequence.
The modifications to TwoAddressInstructionPass ensure that the operands of the ADD
instruction have the expected order (the dest register of the LEA should be the src register of the ADD).
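
The LEA/SUB rewrite relies on subtraction of a sum being the same as two successive subtractions under wrap-around arithmetic. A small Python check, modelling 64-bit registers with a mask; the function names are hypothetical and flag effects are deliberately ignored.

```python
MASK = (1 << 64) - 1  # model 64-bit registers

def lea_then_sub(reg1, reg2, reg4):
    reg3 = (reg1 + reg2) & MASK      # lea (reg1, reg2), reg3
    return (reg4 - reg3) & MASK      # sub reg3, reg4

def two_subs(reg1, reg2, reg4):
    reg4 = (reg4 - reg1) & MASK      # sub reg1, reg4
    return (reg4 - reg2) & MASK      # sub reg2, reg4
```

Both functions return the same value for any inputs, including ones where the intermediate sum or difference wraps.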
Differential Revision: https://reviews.llvm.org/D101970
When we're remapping an AddRec, the AddRec constructed by a partial
rewrite might not make sense. This triggers an assertion complaining
it's not loop-invariant.
Instead of constructing the partially rewritten AddRec, just skip
straight to calling evaluateAtIteration.
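
For reference, evaluating an add recurrence {A,+,B,+,C,...} at iteration n amounts to a sum of the operands weighted by binomial coefficients. The Python sketch below shows that closed form on plain integers; it ignores bit widths and wrapping, and the function name only echoes the SCEV helper rather than reproducing it.

```python
from math import comb

def evaluate_at_iteration(operands, n):
    """Value of the recurrence {op0,+,op1,+,op2,...} at iteration n."""
    return sum(op * comb(n, k) for k, op in enumerate(operands))
```

For an affine recurrence {start,+,step} this reduces to `start + step * n`.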
Testcase was automatically reduced using llvm-reduce, so it's a little
messy, but hopefully makes sense.
Differential Revision: https://reviews.llvm.org/D102959
As suggested in https://bugs.llvm.org/show_bug.cgi?id=50527, this
moves the DenseMapInfo for APInt and APSInt into the respective
headers, removing the need to include APInt.h and APSInt.h from
DenseMapInfo.h.
We could probably do the same for StringRef and ArrayRef as well.
Differential Revision: https://reviews.llvm.org/D103422
This guarantees they meet this overlap exception:
"The destination EEW is smaller than the source EEW and the overlap
is in the lowest-numbered part of the source register group"
Being a single register guarantees the overlap is always in the
lowest-numbered part of the group.
Reviewed By: frasercrmck, khchen
Differential Revision: https://reviews.llvm.org/D103351
Compares are considered a narrowing operation for register overlap.
I believe for LMUL<=1 they meet this exception to allow overlap:
"The destination EEW is smaller than the source EEW and the overlap is in the
lowest-numbered part of the source register group"
Both the result and the sources will occupy a single register for
LMUL<=1 so the overlap would always be in the "lowest-numbered part".
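
The reasoning can be checked with a toy Python model of the quoted exception only. It assumes the destination EEW is already smaller than the source EEW and that the destination occupies a single register; the function name and register numbering are hypothetical.

```python
def narrowing_overlap_allowed(dest_reg, src_base_reg, src_group_size):
    """Model of: overlap must be in the lowest-numbered part of the source group."""
    src_regs = range(src_base_reg, src_base_reg + src_group_size)
    if dest_reg not in src_regs:
        return True                   # no overlap at all
    return dest_reg == src_base_reg   # overlap only in the lowest-numbered register
```

When the source group is a single register (LMUL<=1), any overlap is necessarily with its lowest-numbered (and only) register, so the predicate is always true.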
Reviewed By: frasercrmck, HsiangKai
Differential Revision: https://reviews.llvm.org/D103336
This function was added in D67589 and returns an internal CommandReturnObject
which isn't allowed in the SB API. This patch just makes it private as all uses
of this function are inside SBCommandReturnObject.
Reviewed By: jankratochvil
Differential Revision: https://reviews.llvm.org/D103390
D101841 added this test. It appears to produce different results on different platforms.
Make it call only -coro-split instead of the entire O2 pipeline to simplify the test flow.
Hopefully this will make the test more robust.
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D103418
Implemented a better scheme for perfect/shuffled matches of the gather
nodes, which fixes the performance regressions introduced by
earlier patches. Also started detecting matches for broadcast nodes and
extractelement gathering.
Differential Revision: https://reviews.llvm.org/D102920
This change adds tests specifically for --parent-recurse-depth, --quiet
and -o. The test for -o found a typo in an error message which is also
fixed in this change.
Differential Revision: https://reviews.llvm.org/D103250
This is a re-application of dc67299 which was reverted in f63adf5b because
it broke the build. The issue should now be fixed.
Attribution note: The original author of this patch is Erik Pilkington.
I'm only trying to land it after rebasing.
Differential Revision: https://reviews.llvm.org/D91630
Because half is only available when the `cl_khr_fp16` extension is enabled,
`DefaultLvalueConversion` can fail when it's not enabled.
The original assumption that it will never fail is therefore wrong now.
Fixes: PR47976
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D103175
It turns out the only purpose of this class was to verify whether the device ID
was in range, which can easily be done using g_atl_machine.
Getting rid of g_atl_machine itself is still pending and will be done in
a later patch.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D103443
InstCombine didn't perform the transformations when fmul's operands were
the same instruction, because it required each operand to have one use,
which is false in that case. This patch fixes that, adds tests for these cases,
and introduces a new function, isOnlyUserOfAnyOperand, to check them
in a single place.
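
The distinction between "one use" and "one user" can be shown with a toy Python model of values and their use lists. The classes are hypothetical, and the method body reflects one reading of the helper (true if the instruction is the sole user of at least one operand), which is an assumption rather than the LLVM source.

```python
class Value:
    def __init__(self, name):
        self.name = name
        self.uses = []  # one entry per use; a user appears twice if it uses the value twice

    def has_one_use(self):
        return len(self.uses) == 1

    def has_one_user(self):
        # Counts distinct users rather than individual uses.
        return len(self.uses) > 0 and len({id(user) for user in self.uses}) == 1

class Instruction(Value):
    def __init__(self, name, operands):
        super().__init__(name)
        self.operands = list(operands)
        for operand in self.operands:
            operand.uses.append(self)

    def is_only_user_of_any_operand(self):
        return any(operand.has_one_user() for operand in self.operands)
```

For `fmul x, x` the value x has two uses but only one user, so a one-use check rejects it while the one-user check accepts it.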
This patch is a result of discussion in D102574.
Differential Revision: https://reviews.llvm.org/D102698
When a toolchain supports all of the arm, armhf and armv6m architectures, the compiler-rt
libraries won't compile because architecture-specific flags are appended to a single
BUILTIN_CFLAGS variable.
Differential revision: https://reviews.llvm.org/D103363
The current loop or any of its sub-loops may be infinite. Unless the
function or the loops are marked as mustprogress, this in itself makes
the loop *not* dead.
This patch moves the logic to check whether the current loop is finite
or mustprogress to `isLoopDead` and also extends it to check the
sub-loops. This should fix PR50511.
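
A minimal Python sketch of the extended check, assuming each loop carries two hypothetical flags: whether a finite trip count is known, and whether it is covered by mustprogress (the function-level attribute is folded into the per-loop flag here for simplicity). It is not the LoopDeletion code.

```python
from dataclasses import dataclass, field

@dataclass
class Loop:
    has_finite_trip_count: bool
    mustprogress: bool = False
    subloops: list = field(default_factory=list)

def may_be_infinite(loop):
    """True if this loop or any nested loop could run forever."""
    if not (loop.has_finite_trip_count or loop.mustprogress):
        return True
    return any(may_be_infinite(sub) for sub in loop.subloops)
```

A loop for which `may_be_infinite` returns True must not be treated as dead, even if the outer loop on its own looks finite.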
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D103382
Previously, this assumed use of ModuleOp and FuncOp. There is no need to
restrict this, and using interfaces allows these patterns to be used
during dialect conversion to LLVM.
Some assertions were removed due to inconsistent implementation of
FunctionLikeOps.
Differential Revision: https://reviews.llvm.org/D103447
If the index itself is already poison, the poison propagates through
instructions clamping the index to a valid range. This still results in
a load of poison, as flagged by Alive2 and pointed out
at 575e2aff55.
This patch updates the code to freeze the index, unless it is proven to
not be poison.
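
The effect of freezing can be modelled in a few lines of Python, with poison represented by a sentinel object. This is a toy model of the semantics, not the transform itself; the helper names and the choice of 0 as the frozen value are assumptions.

```python
POISON = object()  # sentinel standing in for an LLVM poison value

def clamp_index(index, num_elements):
    # Poison propagates through the clamping arithmetic.
    if index is POISON:
        return POISON
    return min(max(index, 0), num_elements - 1)

def freeze(value, arbitrary=0):
    # freeze turns poison into some arbitrary but fixed value.
    return arbitrary if value is POISON else value
```

Clamping a frozen index always yields an in-range index, whereas clamping a poison index yields poison again.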
Reviewed By: nlopes
Differential Revision: https://reviews.llvm.org/D103378
This patch extends the RISC-V lowering of the 'fastcc' calling
convention to vector types, both fixed-length and scalable. Without this
patch, any function passing or returning vector types by value would
throw a compiler error.
Vectors are handled in 'fastcc' much as they are in the default calling
convention, the noticeable difference being the extended set of scalar
GPR registers that can be used to pass vectors indirectly.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D102505