SemaBuiltinConstantArg has an early exit for that case that doesn't
produce an error and doesn't update the APInt. We need to detect that
case and not use the APInt value.
While there, delete the overload of CheckX86BuiltinTileArgumentsRange
that takes a single argument index to check. There's another version
that takes an ArrayRef, and a single value is convertible to an ArrayRef.
It's annoying to have to maintain multiple, nearly identical chains of if
statements which all set the same attributes.
Add a helper function, `addFlagsUsingAttrFn`, which performs the attribute
setting.
Then, use wrappers for that function in `lowerCall` and `setArgFlags`.
(Note that the flag-setting code in `setArgFlags` was missing the `returned`
attribute. There's no selection for this yet, so no test. It's an example of
the kind of thing this lets us avoid, though.)
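For illustration, here is a minimal sketch of the shape such a helper and one
of its wrappers can take. The specific flags and the wrapper shown here are
assumptions for the example, not the patch's exact code:
```
#include "llvm/CodeGen/TargetCallingConv.h"
#include "llvm/IR/Function.h"
#include <functional>

using namespace llvm;

// Sketch only: forward each attribute query through a callable, so every
// caller shares a single chain of checks instead of duplicating it.
static void addFlagsUsingAttrFn(
    ISD::ArgFlagsTy &Flags,
    const std::function<bool(Attribute::AttrKind)> &AttrFn) {
  if (AttrFn(Attribute::SExt))
    Flags.setSExt();
  if (AttrFn(Attribute::ZExt))
    Flags.setZExt();
  if (AttrFn(Attribute::InReg))
    Flags.setInReg();
  if (AttrFn(Attribute::ByVal))
    Flags.setByVal();
}

// A wrapper then only has to say where the attributes come from, e.g. a
// function's formal parameter.
static void setArgFlagsSketch(ISD::ArgFlagsTy &Flags, const Function &F,
                              unsigned OpIdx) {
  addFlagsUsingAttrFn(Flags, [&](Attribute::AttrKind Attr) {
    return F.getArg(OpIdx)->hasAttribute(Attr);
  });
}
```
The point is that the wrappers in `lowerCall` and `setArgFlags` only differ in
where the attributes come from, so the flag-setting chain lives in one place.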
Differential Revision: https://reviews.llvm.org/D86159
Similar to this commit:
faf8065a99
The testcase is pretty much the same as
test/CodeGen/AArch64/tailcall-explicit-sret.ll,
except it uses i64 (since we don't handle i1024 return values yet), and
doesn't have indirect tail call testcases (because we can't translate those
yet).
Differential Revision: https://reviews.llvm.org/D86148
Parsing DWARFv5 debug_loclist offsets when a CU is parsed is weighing
down memory usage of symbolizers that don't need to parse this data at
all. There's not much benefit to caching these anyway, since they are
O(1) to look up and read once you know where the offset list starts (and
the offset list size allows bounds checking too).
In general, I think it might be time to start paying down some of the
technical debt of loc/loclist/range/rnglist parsing to try to unify it a
bit more.
eg:
* Currently DWARFUnit has: RangeSection, RangeSectionBase, LocSection,
LocSectionBase, LocTable, RngListTable, LoclistTableHeader (it'd be nice if
these were all wrapped up in two variables - one for loclists, one for
rnglists)
* rnglists and loclists are handled differently (see:
LoclistTableHeader, but no RnglistTableHeader)
* maybe all these types could be less stateful - lazily parse what they
need to, even reparsing rather than caching because it doesn't seem
too expensive, for instance. (Though admittedly, so long as it's a
constant cost/overhead per compilation, that's probably adequate.)
* Maybe implementing and using a DWARFDataExtractor that can be
sub-ranged (so we could slice it up to just the single contribution) -
though maybe that's not so useful because loc/ranges need to refer to
it by absolute, not contribution-relative mechanisms
Differential Revision: https://reviews.llvm.org/D86110
When a procedure name was used on the RHS of an assignment, we were not
reporting the error. When one was used in an expression, the error
message wasn't very good (e.g. "Operands of + must be numeric; have
INTEGER(4) and untyped").
Detect these cases in ArgumentAnalyzer and emit better messages,
depending on whether the named procedure is a function or subroutine.
Procedure names may appear as actual arguments to function and
subroutine calls, so don't report errors in those cases. That is the same
case in which assumed-type arguments are allowed, so rename `isAssumedType_`
to `isProcedureCall_` and use that to decide if it is an error.
Differential Revision: https://reviews.llvm.org/D86107
This is restricted to single-use loads; if we fold them to sextloads, we can
find more optimal addressing modes on AArch64.
This also fixes an overload of the MachineFunction::getMachineMemOperand() method
which was incorrectly using the MF alignment instead of the MMO alignment.
Differential Revision: https://reviews.llvm.org/D85966
By detecting this sign extend pattern early, we can uncover opportunities for
more optimizations.
Differential Revision: https://reviews.llvm.org/D85965
There should be an equivalent std.floor op to std.ceil. This includes
matching lowerings for SPIRV, NVVM, ROCDL, and LLVM.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D85940
Create a reduction pass that accepts an optimization pass as argument
and only replaces the golden module in the pipeline if the output of the
optimization pass is smaller than the input and still exhibits the
interesting behavior.
Add a -test-pass option to test individual passes in the MLIR Reduce
tool.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D84783
We're (temporarily) disabling ExtInt for the '__atomic' builtins so we can better design their behavior later. The idea is until we do an audit/design for the way atomic builtins are supposed to work with _ExtInt, we should leave them restricted so they don't limit our future options, such as by binding us to a sub-optimal implementation via ABI.
Example after this change:
$ cat test.c
void f(_ExtInt(64) *ptr) {
  __atomic_fetch_add(ptr, 1, 0);
}
$ clang -c test.c
test.c:2:22: error: argument to atomic builtin of type '_ExtInt' is not supported
  __atomic_fetch_add(ptr, 1, 0);
                     ^
1 error generated.
Differential Revision: https://reviews.llvm.org/D84049
VLD2/4 instructions cannot be predicated, so we cannot tail predicate
them from autovec. From intrinsics though, they should be valid as they
will just end up loading extra values into off vector lanes, not
affecting the on lanes. The same is true for loads in general: so
long as we are not using the other vector lanes, an unpredicated load
can be converted to a predicated one.
This marks VLD2 and VLD4 instructions as validForTailPredication and
allows any unpredicated load in a tail predicated loop, which seems to be
valid given the other checks we have.
Differential Revision: https://reviews.llvm.org/D86022
D85968 started to use `split-file`, and while buildbots run fine, when
doing `make check-lldb` by hand I get:
.../llvm-monorepo-clangassert/tools/lldb/test/SymbolFile/DWARF/Output/DW_AT_declaration-with-children.s.script: line 2: split-file: command not found
failed:
lldb-shell :: SymbolFile/DWARF/DW_AT_declaration-with-children.s
Differential Revision: https://reviews.llvm.org/D86144
There are some cases where the instruction that sets up the iteration
count for a tail predicated loop cannot be moved before the dlstp,
stopping tail predication entirely. This patch checks if the mov operand
can be used and if so, uses that instead.
Differential Revision: https://reviews.llvm.org/D86087
This patch adds more op/type conversion support
necessary for `spirv-runner`:
- EntryPoint/ExecutionMode: currently removed since we assume
there is only one kernel function in the kernel module.
- StorageBuffer storage class is now supported. We are not
concerned with multithreading so this is fine for now.
- Type conversion enhanced, now regular offsets and strides
for structs and arrays are supported (based on
`VulkanLayoutUtils`).
- Support for `spv.AccessChain`, which is modelled with the GEP op
in the LLVM dialect.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D86109
The CrossOver mutator is meant to cross over two given buffers (referred to as
the first/second buffer henceforth). Previously InsertPartOf/CopyPartOf calls
used in the CrossOver mutator incorrectly inserted/copied part of the second
buffer into a "scratch buffer" (MutateInPlaceHere of the size
CurrentMaxMutationLen), rather than the first buffer. This is not the intended
behavior, because the scratch buffer does not always (i) contain the content of
the first buffer, and (ii) have the same size as the first buffer;
CurrentMaxMutationLen is typically a lot larger than the size of the first
buffer. This patch fixes the issue by using the first buffer instead of the
scratch buffer in InsertPartOf/CopyPartOf calls.
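To make the intended behavior concrete, here is a small standalone sketch of
inserting part of the second buffer into the first buffer. This is not
libFuzzer's actual MutationDispatcher code; the function name and signature
are purely illustrative:
```
#include <cstdint>
#include <vector>

// Sketch: build a mutant by inserting Second[From, From+Len) into First at
// offset At. Preconditions: At <= First.size(), From + Len <= Second.size().
static std::vector<uint8_t> insertPartOf(const std::vector<uint8_t> &First,
                                         const std::vector<uint8_t> &Second,
                                         size_t From, size_t Len, size_t At) {
  std::vector<uint8_t> Out(First.begin(), First.begin() + At);
  Out.insert(Out.end(), Second.begin() + From, Second.begin() + From + Len);
  Out.insert(Out.end(), First.begin() + At, First.end());
  return Out;
}
```
The old code effectively applied this to the scratch buffer in place of the
first buffer, which is what this patch corrects.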
A FuzzBench experiment was run to make sure that this change does not
inadvertently degrade the performance. The performance is largely the same; more
details can be found at:
https://storage.googleapis.com/fuzzer-test-suite-public/fixcrossover-report/index.html
This patch also adds two new tests, namely "cross_over_insert" and
"cross_over_copy", which specifically target InsertPartOf and CopyPartOf,
respectively.
- cross_over_insert.test checks if the fuzzer can use InsertPartOf to trigger
the crash.
- cross_over_copy.test checks if the fuzzer can use CopyPartOf to trigger the
crash.
These newly added tests were designed to pass with the current patch, but not
without it (with 790878f291 these tests do not
pass). To achieve this, -max_len was intentionally given a high value. Without
this patch, InsertPartOf/CopyPartOf will generate larger inputs, possibly with
unpredictable data in it, thereby failing to trigger the crash.
The test pass condition for these new tests is narrowed down by (i) limiting
mutation depth to 1 (i.e., a single CrossOver mutation should be able to trigger
the crash) and (ii) checking whether the mutation sequence of "CrossOver-" leads
to the crash.
Also note that these newly added tests and an existing test (cross_over.test)
all use "-reduce_inputs=0" flags to prevent reducing inputs; it's easier to
force the fuzzer to keep original input string this way than tweaking
cov-instrumented basic blocks in the source code of the fuzzer executable.
Differential Revision: https://reviews.llvm.org/D85554
There is an untested but useful case: `this` (even if not written) is counted as a
source variable.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D86044
This is a non-functional change that generalizes the printIR routines so that
the output can be saved and manipulated rather than being directly output
to dbgs(). This is a prerequisite change for many upcoming changes that
allow new ways of examining changes made to the IR in the new pass manager.
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D85999
* GNU ld places non-SHF_ALLOC sections after SHF_ALLOC sections. This has the
advantage that the file offsets of a non-SHF_ALLOC section cannot be contained in
a PT_LOAD. This patch matches that behavior.
* For non-SHF_ALLOC non-orphan sections, GNU ld may assign non-zero sh_addr and
treat them similar to SHT_NOBITS (not advance location counter). This
is an alternative approach to what we have done in D85100.
By placing non-SHF_ALLOC sections at the end, we can drop special
cases in createSection and findOrphanPos added by D85100.
Unlike GNU ld, we set sh_addr to 0 for non-SHF_ALLOC sections. 0 is
arguably better because non-SHF_ALLOC sections don't appear in the memory
image.
ELF spec says:
> sh_addr - If the section will appear in the memory image of a process, this
> member gives the address at which the section's first byte should
> reside. Otherwise, the member contains 0.
D85100 appeared to take a detour. If we take a combined view of D85100 and this
patch, the overall complexity slightly increases (one more 3-line loop) and
compatibility with GNU ld improves.
The behavior we don't want to match is the special treatment of .symtab
.shstrtab .strtab: they can be matched in LLD but not in GNU ld.
Reviewed By: jhenderson, psmith
Differential Revision: https://reviews.llvm.org/D85867
We weren't looking through the parameters on calls at all.
E.g., say you had
```
declare i32 @zext(i32 zeroext %x)
...
%y = call i32 @zext(i32 %something)
...
```
At the point of the call, we wouldn't know that %something should have the
zeroext attribute.
This sets flags in about the same way as
TargetLoweringBase::ArgListEntry::setAttributes.
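As a rough sketch (illustrative only, not the exact code from this patch; the
helper name is made up), the idea is to query the call site's parameter
attributes when building the argument flags:
```
#include "llvm/CodeGen/TargetCallingConv.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

// Hypothetical helper: mirror a few of the call site's parameter attributes
// into the GlobalISel argument flags for argument ArgIdx.
static void addCallSiteArgFlags(ISD::ArgFlagsTy &Flags, const CallBase &CB,
                                unsigned ArgIdx) {
  if (CB.paramHasAttr(ArgIdx, Attribute::ZExt))
    Flags.setZExt();
  if (CB.paramHasAttr(ArgIdx, Attribute::SExt))
    Flags.setSExt();
  if (CB.paramHasAttr(ArgIdx, Attribute::InReg))
    Flags.setInReg();
}
```
Because `CallBase::paramHasAttr` also consults the callee's attributes, the
zeroext on `@zext`'s parameter in the example above is visible at the call
site even though the call instruction itself doesn't repeat it.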
Differential Revision: https://reviews.llvm.org/D86125
Summary:
This is a follow-up for D82481. For the .lcomm directive, although it's
not necessary to have .rename emitted, it's still desirable to do
so, so that we do not see the internal 'Rename..' getting printed out in the
symbol table. It also gives consistent naming between the TC entry
and .lcomm, and consistent naming between the IR and the final
object file.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D86075
When the operand to the linalg.tensor_reshape op is a splat constant,
the result can be replaced with a splat constant of the same value but
different type.
Differential Revision: https://reviews.llvm.org/D86117
Allow non-VLX targets to use 512-bit VPERMV/VPERMV3 for 128/256-bit shuffles.
TBH I'm not sure these targets actually exist in the wild, but we're testing for them and it's good test coverage for shuffle lowering/combines across different subvector widths.
Previously, it would successfully select and assert if not HSA or PAL
when expanding the pseudoinstruction. We don't need the
pseudoinstruction anymore since we know the total size after
legalization.
The code to determine the value size was overcomplicated and only
correct in the case where the result register already had a register
class assigned. We can always take the size directly from the
register's type.
This uses the modern `split-file` tool to merge 5 `packed-relocs-error*.s` tests into a
new `packed-relocs-errors.s` and adds testing for the GNU style.
Differential revision: https://reviews.llvm.org/D85835
We currently call `llvm_unreachable` for the following YAML:
```
--- !ELF
FileHeader:
Class: ELFCLASS32
Data: ELFDATA2LSB
Type: ET_REL
Machine: EM_NONE
Flags: [ ]
```
It happens because the `Flags` key is present even though `EM_NONE` is a
machine type that has no known `EF_*` values, and we call `llvm_unreachable` by mistake.
Differential revision: https://reviews.llvm.org/D86138
If the declaration is used in the reduction clause, it is captured by
reference by default. But if the declaration is a pointer and it is a
base for array-like reduction, this declaration can be captured by
value, since the pointee is reduced but not the original declaration.
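For illustration (this example is not from the patch itself), here is a
pointer used as the base of an array-section reduction, where only the
pointed-to elements are reduced:
```
// 'p' is a pointer that serves as the base of an array-section reduction.
// The reduction operates on the pointed-to elements, so capturing 'p' itself
// by value is sufficient.
void accumulate(int *p, int n, const int *vals, int m) {
  #pragma omp parallel for reduction(+ : p[0:n])
  for (int i = 0; i < m; ++i)
    p[i % n] += vals[i];
}
```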
Differential Revision: https://reviews.llvm.org/D85321
In this process we also create some other tests, in order not to lose
coverage when focusing on the annotated code.
Differential Revision: https://reviews.llvm.org/D85962
We add the method `SyntaxTreeTest::treeDumpEqualOnAnnotations`, which
allows us to compare the treeDump of only the annotated code. This will reduce a
lot of noise from our `BuildTreeTest` and make the tests shorter and easier to
read.
The AMDGPU ISA isn't backwards compatible, and hence -mcpu must always be specified during disassembly.
However, the AMDGPU target CPU is stored in e_flags in the ELF object.
This patch allows targets to implement CPU string detection, and also implements it for AMDGPU by looking at e_flags.
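A hedged sketch of the idea (not the patch's actual implementation; the
function name is made up and only a subset of machine values is shown),
deriving a CPU string from the AMDGPU machine field of e_flags:
```
#include "llvm/ADT/StringRef.h"
#include "llvm/BinaryFormat/ELF.h"

using namespace llvm;

// Illustrative only: map the AMDGPU machine field of e_flags to an -mcpu name.
static StringRef cpuFromAMDGPUEFlags(unsigned EFlags) {
  switch (EFlags & ELF::EF_AMDGPU_MACH) {
  case ELF::EF_AMDGPU_MACH_AMDGCN_GFX900:
    return "gfx900";
  case ELF::EF_AMDGPU_MACH_AMDGCN_GFX906:
    return "gfx906";
  default:
    return ""; // Unknown or unhandled machine; caller falls back to -mcpu.
  }
}
```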
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D84519