This patch adds a new ShuffleKind, SK_Splice, and handles its cost in
getShuffleCost, as was done for experimental.vector.reverse.
Differential Revision: https://reviews.llvm.org/D104630
This patch fixes PR50823.
The shuffle mask must be twisted twice to obtain the correct one, because the inner and outer HOPs differ.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D104903
We already have a fold for variable index with constant vector,
but if we can determine a scalar splat value, then it does not
matter whether that value is constant or not.
We overlooked this fold in D102404 and earlier patches,
but the fixed vector variant is shown in:
https://llvm.org/PR50817
Alive2 agrees on that:
https://alive2.llvm.org/ce/z/HpijPC
The same logic applies to scalable vectors.
Differential Revision: https://reviews.llvm.org/D104867
The function vectorizeChainsInBlock does not support scalable vectors,
because functions such as canReuseExtract and isCommutative on that code
path assert when given scalable vectors.
This patch avoids vectorizing blocks that have extract instructions with scalable
vectors.
Differential Revision: https://reviews.llvm.org/D104809
Ignore top-level qualifiers in casts, which fixes issues in reinterpret_cast.
This rule comes from [expr.type]/8.2.2 which explains that casting to a
cv-qualified type should actually cast to the unqualified type. In C++
this is only done for types that aren't classes or arrays.
Fixes: PR49221
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D102689
Improve codegen when lowering the common vector shuffle case from the
vectorizer (op1[last]:op2[0:last-1]). This patch only handles this
common case as it is difficult to handle this more generally when using
fixed length vectors, due to being unable to use the SVE ext instruction.
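To make the `op1[last]:op2[0:last-1]` notation concrete, here is a minimal scalar sketch of that shuffle in C++. The function name `spliceLastWithPrefix` is hypothetical and only models the element order of the result; it says nothing about which instructions the backend selects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of the shuffle op1[last]:op2[0:last-1].
// Both inputs must have the same, non-zero length.
std::vector<int> spliceLastWithPrefix(const std::vector<int> &op1,
                                      const std::vector<int> &op2) {
  assert(!op1.empty() && op1.size() == op2.size());
  std::vector<int> result;
  result.reserve(op1.size());
  result.push_back(op1.back());   // op1[last]
  for (std::size_t i = 0; i + 1 < op2.size(); ++i)
    result.push_back(op2[i]);     // op2[0 .. last-1]
  return result;
}
```

For inputs `{1, 2, 3, 4}` and `{5, 6, 7, 8}` the model yields `{4, 5, 6, 7}`.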
Differential Revision: https://reviews.llvm.org/D105289
Loads of <4 x i8> vectors were modeled as extremely expensive. And while we
don't have a load instruction that supports this, it isn't that expensive to
create a vector of i8 elements. The codegen for this was fixed/optimised in
D105110. This now tweaks the cost model and enables SLP vectorisation of my
motivating case loadi8.ll.
Differential Revision: https://reviews.llvm.org/D103629
Opaque attributes that contain string literals currently can't be roundtripped properly because they are not printed as escaped strings. This produces incorrect tokens and almost certainly makes the parser fail. This patch uses llvm::printEscapedString from LLVM, which escapes all non-printable characters and quotes to \xx hex literals, and backslashes to two backslashes. MLIR's Lexer also supports this syntax. The same function is already used for the same purpose in printSymbolReference, in printAttribute for StringAttr, and in many other places in AsmPrinter.cpp.
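The escaping rules described above can be sketched in standalone C++ as follows. This is an illustration only, not the LLVM implementation: `escapeString` and `hexDigit` are hypothetical names, the code works on `std::string` instead of `StringRef`, and the use of uppercase hex digits is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Map a 4-bit value to a hexadecimal digit (uppercase is an assumption).
static char hexDigit(unsigned nibble) {
  assert(nibble < 16 && "expected a 4-bit value");
  return "0123456789ABCDEF"[nibble];
}

// Standalone illustration of the escaping rules; not the LLVM code itself.
std::string escapeString(const std::string &input) {
  std::string out;
  for (unsigned char c : input) {
    if (c == '\\') {
      out += "\\\\";                // backslash -> two backslashes
    } else if (std::isprint(c) && c != '"') {
      out += static_cast<char>(c);  // printable characters pass through
    } else {
      out += '\\';                  // quotes and non-printables -> \xx
      out += hexDigit(c >> 4);
      out += hexDigit(c & 0x0F);
    }
  }
  return out;
}
```

With these rules, a string containing a double quote is printed with `\22` in its place, so the surrounding quotes of the printed literal stay unambiguous.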
Differential Revision: https://reviews.llvm.org/D105405
This patch fixes an issue which occurred in CodeGenPrepare and
HWAddressSanitizer, which both at some point create a map of Old->New
instructions and update dbg.value uses of these. They did this by
iterating over the dbg.value's location operands, and if an instance of
the old instruction was found, replaceVariableLocationOp would be
called on that dbg.value. This would cause an error if the same operand
appeared multiple times as a location operand, as the first call to
replaceVariableLocationOp would update all uses of the old instruction,
invalidating the old iterator and eventually hitting an assertion.
This has been fixed by no longer iterating over the dbg.value's location
operands directly, but by first collecting them into a set and then
iterating over that, ensuring that we never attempt to replace a
duplicated operand multiple times.
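A minimal sketch of the "collect into a set first" approach, using a toy model in which location operands are plain integers. `ToyDbgValue` and `remapLocationOps` are hypothetical names; only the behaviour that one replace call rewrites every use of the old operand is taken from the description above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy stand-in for a dbg.value: location operands are plain integers here.
struct ToyDbgValue {
  std::vector<int> locationOps;
  int replaceCalls = 0;

  // As described above, a single call rewrites *all* uses of oldOp.
  void replaceVariableLocationOp(int oldOp, int newOp) {
    ++replaceCalls;
    for (int &op : locationOps)
      if (op == oldOp)
        op = newOp;
  }
};

// Collect the operands into a set, then replace each distinct operand at
// most once, so a duplicated operand never triggers a second replacement.
void remapLocationOps(ToyDbgValue &dbg, const std::map<int, int> &oldToNew) {
  std::set<int> uniqueOps(dbg.locationOps.begin(), dbg.locationOps.end());
  for (int op : uniqueOps) {
    auto it = oldToNew.find(op);
    if (it == oldToNew.end())
      continue;
    // Toy-model precondition: a new value is never itself remapped.
    assert(oldToNew.count(it->second) == 0);
    dbg.replaceVariableLocationOp(it->first, it->second);
  }
}
```

Because the loop runs over a separate set, rewriting `locationOps` during the loop cannot invalidate the iteration.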
Differential Revision: https://reviews.llvm.org/D105129
Add a check for whether the architecture supports s_mulhi, and use it as part of
the decision whether to use the VALU 24-bit mul (if the mulhi gets
transformed to a VALU op anyway, we may as well use it).
This is an extension of the work in D97063
Differential Revision: https://reviews.llvm.org/D103321
Change-Id: I80b1323de640a52623d69ac005a97d06a5d42a14
Update CMakeLists.txt in the tutorial to reflect the latest changes in
LLVM. The demo project cannot be linked without added libraries.
Reviewed By: xgupta
Differential Revision: https://reviews.llvm.org/D105409
FeatureBitset is 4 64-bit values in an array. It's better passed by
reference rather than copying it.
I may be adding FeatureBitset as an argument to another function
and noticed this while working on that.
clang and gcc both seem to emit relocations in reverse order of
address. That means we can match relocations to their containing
subsections in `O(relocs + subsections)` rather than the `O(relocs *
log(subsections))` that our previous binary search implementation
required.
Unfortunately, `ld -r` can still emit unsorted relocations, so we have a
fallback code path for that (less common) case.
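The linear matching idea can be sketched as below. All names are hypothetical and the data model is simplified to bare offsets. The fallback shown here (sorting unsorted input first) is an assumption made for illustration; the text above does not say how the real fallback path works.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Returns (relocation address, index of containing subsection) pairs.
// subsecOffsets must be sorted ascending, and its first entry must not
// exceed any relocation address.
std::vector<std::pair<uint64_t, std::size_t>>
matchRelocsToSubsections(const std::vector<uint64_t> &subsecOffsets,
                         std::vector<uint64_t> relocAddrs) {
  assert(!subsecOffsets.empty());
  assert(std::is_sorted(subsecOffsets.begin(), subsecOffsets.end()));

  // Assumed fallback for the less common unsorted case: sort descending.
  if (!std::is_sorted(relocAddrs.begin(), relocAddrs.end(),
                      std::greater<uint64_t>()))
    std::sort(relocAddrs.begin(), relocAddrs.end(), std::greater<uint64_t>());

  std::vector<std::pair<uint64_t, std::size_t>> result;
  result.reserve(relocAddrs.size());
  std::size_t subsec = subsecOffsets.size() - 1;
  for (uint64_t addr : relocAddrs) {
    // Addresses only decrease, so the subsection index only moves down.
    while (subsec > 0 && subsecOffsets[subsec] > addr)
      --subsec;
    result.emplace_back(addr, subsec);
  }
  return result;
}
```

Each relocation and each subsection is visited at most once in the sorted case, which gives the `O(relocs + subsections)` bound.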
Numbers for linking chromium_framework on my 3.2 GHz 16-Core Intel Xeon W:
        N           Min           Max        Median           Avg        Stddev
    x  20          4.04          4.11         4.075        4.0775   0.018027756
    +  20          3.95          4.02          3.98         3.985   0.020900768
    Difference at 95.0% confidence
            -0.0925 +/- 0.0124919
            -2.26855% +/- 0.306361%
            (Student's t, pooled s = 0.0195172)
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D105410
This patch adds string table dumping to
llvm-readobj. Currently only XCOFF is supported.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D104613
Different constraints may share the same predicate, in which case we
generate duplicate ODS verification functions.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D104369
Remove `getDynOperands` and `createOrFoldDimOp` from MemRef.h to decouple MemRef a bit from Tensor. These two functions are used in other dialects/transforms.
Differential Revision: https://reviews.llvm.org/D105260
Two bugs:
1. This tries to take the address of the last symbol plus the length
of the last symbol. However, the sorted vector is cuPtrVector,
not cuVector. Also, cuPtrVector has tombstone values removed
and cuVector doesn't. If there was a stripped value at the end,
the "last" element's value was UINT64_MAX, which meant the
sentinel value was one less than the length of that "last"
dead symbol.
2. We have to subtract in.header->addr. For 64-bit binaries that's
(1 << 32) and functionAddress is 32-bit so this is a no-op, but
for 32-bit binaries the sentinel's value was too large.
I believe this has no effect in practice since the first-level
binary search code in libunwind (in UnwindCursor.hpp) does:
    uint32_t low = 0;
    uint32_t high = sectionHeader.indexCount();
    uint32_t last = high - 1;
    while (low < high) {
      uint32_t mid = (low + high) / 2;
      if ((mid == last) ||
          (topIndex.functionOffset(mid + 1) > targetFunctionOffset)) {
        low = mid;
        break;
      } else {
        low = mid + 1;
      }
    }
So the address of the last entry in the first-level table isn't really
checked -- except for the very end, but the check against `last` means
we just run the loop once more than necessary. Still, the fix makes `unwinddump` output
look less confusing, and it looks like that was the intention here.
(No test since I can't think of a way to make FileCheck check that one
number is larger than another.)
Differential Revision: https://reviews.llvm.org/D105404
"bad second level page" and "second level compressed unwind table"
can now be grepped for.
(Also remove one of the two spaces between "second" and "level"
in the second message.)
Almost every parseOptional* method in DialectAsmParser has a corresponding parse* method that emits an error if the requested token is not found. The odd one out is parseOptionalString, which has no corresponding parseString method.
This patch adds that method and implements it in the same fashion as parseKeyword: it first goes through parseOptionalString and emits an error on failure.
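The optional/required pairing can be sketched with a toy parser. This is not the MLIR API: `ToyParser` is a hypothetical class, it returns `true` on success rather than a ParseResult, and it only recognises `"..."` literals without escape sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy parser, not the MLIR API: it only recognises "..." without escapes.
class ToyParser {
public:
  explicit ToyParser(std::string text) : input(std::move(text)), pos(0) {}

  // Optional form: returns false and consumes nothing if no string is next.
  bool parseOptionalString(std::string &result) {
    assert(pos <= input.size());
    if (pos == input.size() || input[pos] != '"')
      return false;
    std::size_t end = input.find('"', pos + 1);
    if (end == std::string::npos)
      return false;
    result = input.substr(pos + 1, end - pos - 1);
    pos = end + 1;
    return true;
  }

  // Required form: delegates to the optional form and records an error
  // when no string literal was found.
  bool parseString(std::string &result) {
    if (parseOptionalString(result))
      return true;
    errors.push_back("expected string");
    return false;
  }

  std::vector<std::string> errors;

private:
  std::string input;
  std::size_t pos;
};
```

Keeping the required form as a thin wrapper means the two methods cannot drift apart in what they accept.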
Differential Revision: https://reviews.llvm.org/D105406
This API is not compatible with opaque pointers; the method
accepting an explicit pointer element type should be used instead.
Thankfully there were few in-tree users. The BPF case still ends
up using the pointer element type for now and needs something like
D105407 to avoid doing so.
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.
Differential Revision: https://reviews.llvm.org/D105395
Compiling LLVM with Clang modules and libc++ identified that
`Support/Printable.h` and `ADT/SmallVector.h` were using features that
live in these headers.
Differential Revision: https://reviews.llvm.org/D105402
Compiling clangd with Clang modules and libc++ revealed that
`support/Threading.h` uses `std::atomic` but wasn't including the
correct header.
Differential Revision: https://reviews.llvm.org/D105400
Fix offset calculation routines in the padding checker to avoid the assertion
errors described in Bugzilla issue 50426. Fields that are subobjects
of zero size, marked with [[no_unique_address]] or empty bitfields, are now
excluded from the padding calculation routines.
Reviewed By: NoQ
Differential Revision: https://reviews.llvm.org/D104097
Specifically the CreateMaskedStore and CreateMaskedScatter APIs.
The CreateMaskedLoad and CreateMaskedGather APIs will need an
additional type argument.
The bash wrapper script, `flang`, calls `flang-new -fc1` under the hood,
which does not support `--version` (this is consistent with `clang -cc1
--version`). This change is needed for `flang --version` to work as
expected.
Note that `flang --version` (the Flang bash wrapper script for the
compiler driver) gives rather minimal output compared to `flang-new
--version` (the Flang compiler driver). As the wrapper script is just a
temporary solution for us, this should be sufficient.
Differential Revision: https://reviews.llvm.org/D105352
This replaces the current ad-hoc implementation,
by syncing the code from InstCombine's implementation in `InstCombinerImpl::visitUnreachableInst()`,
with one exception that here in SimplifyCFG we are allowed to remove EH instructions.
Effectively, this now allows SimplifyCFG to remove calls (iff they won't throw and will return),
arithmetic/logic operations, etc.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D105374
This adds simple patterns for signed and unsigned saturating extract
narrow instructions. They combine a min/max/truncate into a single
instruction, provided that the immediates on the min/max are correct
for the saturation type. This is just handled in tablegen with some
extra patterns.
v2i64->v2i32 is not handled here as the min/max nodes are not legal,
making the lowering quite different.
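As a scalar illustration of why the min/max immediates must match the saturation type, the sketch below models a 32-bit to 16-bit narrowing in C++. The element widths and the function names are assumptions chosen for the example; the patterns themselves operate on vector types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Clamp x to [lo, hi]; this is the max-then-min pair from the pattern.
static int32_t clampTo(int32_t x, int32_t lo, int32_t hi) {
  assert(lo <= hi);
  return std::min(std::max(x, lo), hi);
}

// Signed saturating narrow: the bounds must be exactly INT16_MIN and
// INT16_MAX so that the following truncate loses no information.
int16_t signedSatNarrow(int32_t x) {
  return static_cast<int16_t>(clampTo(x, INT16_MIN, INT16_MAX));
}

// Unsigned saturating narrow: only the upper bound UINT16_MAX is needed.
uint16_t unsignedSatNarrow(uint32_t x) {
  return static_cast<uint16_t>(std::min<uint32_t>(x, UINT16_MAX));
}
```

If the clamp bounds were anything other than the limits of the narrow type, the min/max/truncate sequence would no longer be equivalent to a single saturating narrow.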
Differential Revision: https://reviews.llvm.org/D103263
Allocate non-volatile registers in order, to be compatible with the ABI with regard to gpr_save.
Quoted from https://www.ibm.com/docs/en/ssw_aix_72/assembler/assembler_pdf.pdf, page 55:
> The preferred method of using GPRs is to use the volatile registers first. Next, use the nonvolatile registers
> in descending order, starting with GPR31.
This patch is based on @jsji 's initial draft.
Tested on test-suite and SPEC, found no degradation.
Reviewed By: jsji, ZarkoCA, xingxue
Differential Revision: https://reviews.llvm.org/D100167