Currently, BPF only contains three relocations:
R_BPF_NONE for no relocation
R_BPF_64_64 for LD_imm64 and normal 64-bit data relocation
R_BPF_64_32 for call insn and normal 32-bit data relocation
Also .BTF and .BTF.ext sections contain symbols in allocated
program and data sections. These two sections reserved 32bit
space to hold the offset relative to the symbol's section.
When LLVM JIT is used, the LLVM ExecutionEngine RuntimeDyld
may attempt to resolve relocations for .BTF and .BTF.ext,
which we want to prevent. So we used R_BPF_NONE for such relocations.
This all works fine until when we try to do linking of
multiple objects.
. R_BPF_64_64 handling of LD_imm64 vs. normal 64-bit data
is different, so lld target->relocate() needs more context
to do a correct job.
. The same for R_BPF_64_32. More context is needed for
lld target->relocate() to differentiate call insn vs.
normal 32-bit data relocation.
. Since relocations in .BTF and .BTF.ext are set to R_BPF_NONE,
they will not be relocated properly when multiple .BTF/.BTF.ext
sections are merged by lld.
This patch intends to address this issue by adding additional
relocation kinds:
R_BPF_64_ABS64 for normal 64-bit data relocation
R_BPF_64_ABS32 for normal 32-bit data relocation
R_BPF_64_NODYLD32 for .BTF and .BTF.ext style relocations.
The old R_BPF_64_{64,32} semantics:
R_BPF_64_64 for LD_imm64 relocation
R_BPF_64_32 for call insn relocation
The existing R_BPF_64_64/R_BPF_64_32 mapping to numeric values
is maintained. They are the most common use cases for
bpf programs and we want to maintain backward compatibility
as much as possible.
ExecutionEngine RuntimeDyld BPF relocations are adjusted as well.
R_BPF_64_{ABS64,ABS32} relocations will be resolved properly and
other relocations will be ignored.
Two tests are added for RuntimeDyld. Not handling R_BPF_64_NODYLD32 in
RuntimeDyldELF.cpp will result in "Relocation type not implemented yet!"
fatal error.
FK_SecRel_4 usages in BPFAsmBackend.cpp and BPFELFObjectWriter.cpp
are removed as they are not triggered in BPF backend.
BPF backend used FK_SecRel_8 for LD_imm64 instruction operands.
Differential Revision: https://reviews.llvm.org/D102712
cxx20_iterator_traits.compile.pass.cpp actually depends on
implementation details of libc++, which is not great;
but I just left a comment and moved on.
NFC, since no instructions have their AsmMatchConverter
changed, but prepares for that to happen.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103046
Change-Id: I6afefad899076de7b9a412374d09b95b29e012fa
rG1ad4f887bd7692a9e63fb42586f0ece366f2fe01 incorrectly assumed that vXi64 non-uniform shifts were slow like vXi32 were - but llvm-mca (+Agner) both confirm that Haswell/Broadwell are full rate.
This relands commit 13dd65b3a1.
The original commit contained a test, which failed when compiled
for a MACH-O target.
This patch changes the test to run for x86_64-linux instead of
`%itanium_abi_triple`, to avoid having invalid syntax for MACH-O
sections. The patch itself does not care about section attribute
syntax and a x86 backend does not even need to be included in the
build.
Differential Revision: https://reviews.llvm.org/D102693
Since 4468e5b899 clang will prefer
the last one it finds of "-mimplicit-it" or "-Wa,-mimplicit-it".
Due to a mistake in that patch the compiler argument "-mimplicit-it"
was never marked as used, even if it was the last one and was passed
to llvm.
Move the Claim call back to the start of the loop and update
the testing to check we don't get any unused argument warnings.
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D103086
NFC. Renames the variable in the dpp input operand generators
from DstRC to OldRC, because that is what it actually sets.
Also documents the importance of setting HasModifiers = 0 in the
dpp8 asm string.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D103047
Change-Id: Ice69ae38f644de7f228a75ca47c43e88b1f7d9e1
We can only scalarize memory accesses if we know the index is valid.
This patch adjusts canScalarizeAcceess to fall back to
computeConstantRange to check if the index is known to be valid.
Reviewed By: nlopes
Differential Revision: https://reviews.llvm.org/D102476
This patch is the first in a series of patches fixing markdown links and references inside the mlir documentation. I chose to split it in a few reviews to be able to iterate quicker and to ease review.
This patch addresses all broken references to other markdown files and sections inside the Dialects folder.
One change that was also done was to insert '/' between the markdown files and section:
Example:
Builtin.md#integertype
was changed to:
Builtin.md/#integertype
After compilation, hugo then translates the later to jump directly to the integer type section, but not the former. Not inserting the slash would simply jump to just the Builtin page, instead of the integertype section. I therefore changed occurrences of the former version to the later as well.
Differential Revision: https://reviews.llvm.org/D103011
This patch is the second in a series of patches fixing markdown links and references inside the mlir documentation.
This patch addresses all broken references to other markdown files and sections inside the Rationale folder.
In addition to fixing the links and references like in the previous patch, I also changed references which are URLs to the mlir.llvm.org/docs website, to proper relative markdown references instead.
Differential Revision: https://reviews.llvm.org/D103013
We could go either direction on this transform. VectorCombine already goes this
way for bitcasts (and handles more complicated cases using the cost model), so
let's try cast-first.
Deferring completely to VectorCombine is another possibility. But the backend
should be able to invert this easily when the vectors have the same shape, so
it doesn't seem like a transform that we need to avoid.
The motivating example from https://llvm.org/PR49081 has an int-to-float
sandwiched between 2 shuffles, and the backend currently does not reduce that,
so on x86, we get something like:
pshufd $249, %xmm0, %xmm0]
cvtdq2ps %xmm0, %xmm0
shufps $144, %xmm0, %xmm0
...instead of just a single conversion instruction.
Differential Revision: https://reviews.llvm.org/D103038
Disallow transfer ops that change the element type of the transfer. Such transfers could be supported in the future, if needed.
Differential Revision: https://reviews.llvm.org/D102746
We want to use `DexDeclareFile` to specify paths relative to a project root
directory. The option `--source-root-dir`, prior to this patch, causes dexter
to strip the path prefix from commands before passing them to a debugger, and
appends the prefix to file paths returned from a debugger. This patch changes
the behviour of `--source-root-dir`. Relative paths in commands, made possible
with `DexDeclareFile(relative/path)`, are appended to the `--source-root-dir`
directory.
A new option, `--debugger-use-relative-paths`, can be used alongside
`--source-root-dir` to reproduce the old behaviour: all paths passed to the
debugger will be made relative to `--source-root-dir`.
I've added a regression test source_root_dir.dex for this new behaviour, and
modified the existing `--source-root-dir` regression and unit tests to use
`--debugger-use-relative-paths`.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D100307
This adds support for the "count active bits" pattern, i.e.:
```
int countBits(unsigned val) {
int cnt = 0;
for( ; (val << cnt) != 0; ++cnt)
;
return cnt;
}
```
but a somewhat more general one:
```
int countBits(unsigned val, int start, int off) {
int cnt;
for (cnt = start; val << (cnt + off); cnt++)
;
return cnt;
}
```
alive2 is happy with all the tests there.
Note that, again, much like with the right-shift cases,
we don't require the `val != 0` guard.
This is the last pattern that was supported by
`detectShiftUntilZeroIdiom()`, which now becomes obsolete.
DexDeclareFile allows test producers to write test files with .dex extensions
that contain pure dexter commands.
.dex file commands do not need to be commented out like they do when written
inline within test source files.
DexDeclareFile commands are declarative in behaviour, they state that any
Dexter command seen from this point on will have its path attribute set to the
path declared in the DexDeclareFile command.
Differential Revision: https://reviews.llvm.org/D99651
It seems std::ostringstream ignores NaN signs on Darwin while it prints them on
Linux. This causes that LLDB behaves differently on those platforms which is
both confusing for users and it also means we have to deal with that in our
tests.
This patch manually implements the NaN/Inf printing (which are apparently
implementation defined) to make LLDB print the same thing on all platforms. The
only output difference in practice seems to be that we now print negative NaNs
as `-nan`, but this potentially also changes the output on other systems I
haven't tested this on.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D102845
This adds support for the "count active bits" pattern, i.e.:
```
int countActiveBits(signed val) {
int cnt = 0;
for( ; (val >> cnt) != 0; ++cnt)
;
return cnt;
}
```
but a somewhat more general one:
```
int countActiveBits(signed val, int start, int off) {
int cnt;
for (cnt = start; val >> (cnt + off); cnt++)
;
return cnt;
}
```
This directly matches the existing 'logical right-shift until zero' idiom.
alive2 is happy with all the tests there.
Note that, again, much like with the original unsigned case,
we don't require the `val != 0` guard.
The old `detectShiftUntilZeroIdiom()` already supports this pattern,
the idea here is that the `val` must be positive (have at least one
leading zero), because otherwise the loop is non-terminating,
but since it is not `while(1)`, that would have been UB.
IRForTarget is never used by a pass manager or any other interface that requires
this class to inherit from `Pass`.
Also IRForTarget doesn't implement the current interface correctly because it
uses the `runOnModule` return value to indicate success/failure instead of
changed/not-changed, so if this ever ends up being used as a pass it would most
likely not work as intended.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D102677
This was originally failed because of llvm.org/pr21765 which describes that
LLDB can't call a debugee's functions, but I removed the (unnecessary)
function call in the rewrite. It seems that the actual bug here is that we
can't lookup static members at all, so let's X-FAIL the test for the right
reason.
We really ought to support no_sanitize("coverage") in line with other
sanitizers. This came up again in discussions on the Linux-kernel
mailing lists, because we currently do workarounds using objtool to
remove coverage instrumentation. Since that support is only on x86, to
continue support coverage instrumentation on other architectures, we
must support selectively disabling coverage instrumentation via function
attributes.
Unfortunately, for SanitizeCoverage, it has not been implemented as a
sanitizer via fsanitize= and associated options in Sanitizers.def, but
rolls its own option fsanitize-coverage. This meant that we never got
"automatic" no_sanitize attribute support.
Implement no_sanitize attribute support by special-casing the string
"coverage" in the NoSanitizeAttr implementation. To keep the feature as
unintrusive to existing IR generation as possible, define a new negative
function attribute NoSanitizeCoverage to propagate the information
through to the instrumentation pass.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035
Reviewed By: vitalybuka, morehouse
Differential Revision: https://reviews.llvm.org/D102772
Determined from llvm-mca analysis, AVX2+ capable targets have a higher throughput for VPBLENDVB and VPMOVZX ops, making it cheaper to perform shift+select patterns for vXi8 shifts or extend/shift/truncate for vXi16 shifts. Similarly AVX512BW can perform vXi8 as extend/shift/truncate patterns.
Prevent users of `iter_args` of an affine for loop from being hoisted
out of it. Otherwise, LICM leads to a violation of the SSA dominance
(as demonstrated in the added test case).
Fixes: https://bugs.llvm.org/show_bug.cgi?id=50103
Reviewed By: bondhugula, ayzhuang
Differential Revision: https://reviews.llvm.org/D102984
This previously handled memref::SubviewOp, but this can be extended to
all ops implementing the interface.
Differential Revision: https://reviews.llvm.org/D103076
Currently the vector load + extract gets lowered to a single scalar
store, not accounting for the fact that the index could be
out-of-bounds, which is poison, not UB.
See PR50382.
Clang adds a Decl in two phases to a DeclContext. First it adds it invisible and
then it makes it visible (which will add it to the lookup data structures). It's
important that we can't do lookups into the DeclContext we are currently adding
the Decl to during this process as once the Decl has been added, any lookup will
automatically build a new lookup map and add the added Decl to it. The second
step would then add the Decl a second time to the lookup which will lead to
weird errors later one. I made adding a Decl twice to a lookup an assertion
error in D84827.
In the first step Clang also does some computations on the added Decl if it's
for example a FieldDecl that is added to a RecordDecl.
One of these computations is checking if the FieldDecl is of a record type
and the record type has a deleted constexpr destructor which will delete
the constexpr destructor of the record that got the FieldDecl.
This can lead to a bug with the way we implement MinimalImport in LLDB
and the following code:
```
struct Outer {
typedef int HookToOuter;
struct NestedClass {
HookToOuter RefToOuter;
} NestedClassMember; // We are adding this.
};
```
1. We just imported `Outer` minimally so far.
2. We are now asked to add `NestedClassMember` as a FieldDecl.
3. We import `NestedClass` minimally.
4. We add `NestedClassMember` and clang does a lookup for a constexpr dtor in
`NestedClass`. `NestedClassMember` hasn't been added to the lookup.
5. The lookup into `NestedClass` will now load the members of `NestedClass`.
6. We try to import the type of `RefToOuter` which will try to import the `HookToOuter` typedef.
7. We import the typedef and while importing we check for conflicts in `Outer` via a lookup.
8. The lookup into `Outer` will cause the invisible `NestedClassMember` to be added to the lookup.
9. We continue normally until we get back to the `addDecl` call in step 2.
10. We now add `NestedClassMember` to the lookup even though we already did that in step 8.
The fix here is disabling the minimal import for RecordTypes from FieldDecls. We
actually already did this, but so far we only force the definition of the type
to be imported *after* we imported the FieldDecl. This just moves that code
*before* we import the FieldDecl so prevent the issue above.
Reviewed By: shafik, aprantl
Differential Revision: https://reviews.llvm.org/D102993
It's not clear why the whole test got disabled, but the linked bug report
has since been fixed and the only part of it that still fails is the test
for the too permissive lookup. This re-enables the test, rewrites it to use
the modern test functions we have and splits the failing part into its
own test that we can skip without disabling the rest.
The change is currently NFC, but exploited by the depending D102954.
Code to handle constants is borrowed from the general implementation
of Value::doRAUW().
Differential Revision: https://reviews.llvm.org/D103051
I really needed this, like, factually, yesterday,
when verifying dependency breaking idioms for AMD Zen 3 scheduler model.
Consider the following example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o
---
mode: inverse_throughput
key:
instructions:
- 'VPXORYrr YMM0 YMM0 YMM0'
config: ''
register_initial_values: []
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 }
error: ''
info: ''
assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3
...
```
What does it tell us?
So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle?
That doesn't seem right. That's even less than there are pipes supporting this type of op.
Now, second example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o
---
mode: inverse_throughput
key:
instructions:
- 'VPXORYrr YMM0 YMM0 YMM0'
config: ''
register_initial_values: []
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 }
error: ''
info: ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```
Now that's just worse. Due to the looping, the throughput completely plummeted,
and now we can only do a single instruction/cycle!?
That's not great.
And final example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o
---
mode: inverse_throughput
key:
instructions:
- 'VPXORYrr YMM0 YMM0 YMM0'
config: ''
register_initial_values: []
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 }
error: ''
info: ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```
So if we merge the previous two approaches, do duplicate this single-instruction snippet 1000x
(loop-body-size/instruction count in snippet), and run a loop with 1000 iterations
over that duplicated/unrolled snippet, the measured throughput goes through the roof,
up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle!
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D102522
I cannot find documentation on this CPU, and it
is not supported by the Arm Compiler 5 product either.
It was likely a mistake or a different name for the
"ep9312", which is an Arm based Cirrus Logic chip.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D103024