canonicalizeShuffleMaskWithHorizOp currently only supports shuffles with 1 or 2 sources, but PR41813 will require us to support higher numbers of sources.
This patch just generalizes the initial setup stages to ensure all src ops are the same type and opcode and then will continue to early out if we have more than 2 sources.
This reverts commit e5553b9a6a.
Tests are not allowed to modify the source. Please figure out a way to
use %t rather than dynamically modifying the inputs.
For some reason some builds dont like the arrow operator access. using the deref then access should fix the issue.
/home/buildbots/ppc64le-flang-mlir-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/llvm/include/llvm/ADT/iterator.h:171:34: error: taking the address of a temporary object of type 'llvm::StringRef' [-Waddress-of-temporary]
PointerT operator->() { return &static_cast<DerivedT *>(this)->operator*(); }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/buildbots/ppc64le-flang-mlir-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/llvm/include/llvm/ADT/StringExtras.h:387:13: note: in instantiation of member function 'llvm::iterator_facade_base<llvm::mapped_iterator<mlir::tblgen::TypeParameter *, (lambda at /home/buildbots/ppc64le-flang-mlir-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/mlir/tools/mlir-tblgen/TypeDefGen.cpp:414:19), llvm::StringRef>, std::random_access_iterator_tag, llvm::StringRef, long, llvm::StringRef *, llvm::StringRef &>::operator->' requested here
Len += I->size();
When adding this function in https://reviews.llvm.org/D68794 I did not
notice that internal_prctl has the API of the syscall to prctl rather
than the API of the glibc (posix) wrapper.
This means that the error return value is not necessarily -1 and that
errno is not set by the call.
For InitPrctl this means that the checks do not catch running on a
kernel *without* the required ABI (not caught since I only tested this
function correctly enables the ABI when it exists).
This commit updates the two calls which check for an error condition to
use `internal_iserror`. That function sets a provided integer to an
equivalent errno value and returns a boolean to indicate success or not.
Tested by running on a kernel that has this ABI and on one that does
not. Verified that running on the kernel without this ABI the current
code prints the provided error message and does not attempt to run the
program. Verified that running on the kernel with this ABI the current
code does not print an error message and turns on the ABI.
All tests done on an AArch64 Linux machine.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D94425
rG73a44f437bf1 result in 256-bit packss/packus ops with additional shuffles that shuffle combining can sometimes try to convert back into a truncation.
This commit extends SVEIntrinsicOpts::optimizeConvertFromSVBool to
identify and remove longer chains of redundant SVE reintepret
intrinsics. For example, the following chain of redundant SVE
reinterprets is now recognised as redundant:
%a = <vscale x 2 x i1>
%1 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 2 x i1> %a)
%2 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %1)
%3 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %2)
%4 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %3)
%5 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %4)
%6 = <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %5)
ret <vscale x 2 x i1> %6
and will be replaced with:
ret <vscale x 2 x i1> %a
Eliminating these can sometimes mean emitting fewer unnecessary
loads/stores when lowering to assembly.
Differential Revision: https://reviews.llvm.org/D94074
In places where we calculate costs using TTI.getXXXCost() interfaces
I have changed the code to use InstructionCost instead of unsigned.
The change is non functional since InstructionCost behaves in the
same way as an integer for valid costs. Currently the getXXXCost()
functions used in this file do not return invalid costs.
See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Differential revision: https://reviews.llvm.org/D94484
Only class declarations should be inside anonymous namespaces
(https://llvm.org/docs/CodingStandards.html#anonymous-namespaces)
Instead of using a anonymous namespace, just mark the functions in it as
static (some of them already were).
This simplifies the diff for D94486.
The help text for `-I` was recently expanded in [1]. The expanded
version focuses on explaining the semantics of `-I` in Clang. We are now
in the process of adding support for `-I` in Flang and this new
description is incompatible with the semantics of `-I` in Flang. This
was brought up in this review:
* https://reviews.llvm.org/D93453
This patch reverts the original change in Options.td. This way the help
text for `-I` remains generic enough so that it applies to both Clang
and Flang.
The expanded description of `-I` from [1] is moved to the
`DocBrief` field for `-I`. This field is prioritised over the help text
when generating ClangCommandLineReference.rst, so the user facing
documentation for Clang retains the expanded description:
* https://clang.llvm.org/docs/ClangCommandLineReference.html
`DocBrief` fields are currently not used in Flang.
As requested in the reviews, the help text and the expanded description
are slightly refined.
[1] Commit: 8dd4e3ceb8
Differential Revision: https://reviews.llvm.org/D94169
This reuses the code from yaml2obj (moves it to ELFYAML.h).
With it we can set the `sh_entsize` in a single place in `obj2yaml`.
Note that it also fixes a bug of `yaml2obj`: we do not
set the `sh_entsize` field for the `SHT_ARM_EXIDX` section properly.
Differential revision: https://reviews.llvm.org/D93858
The isVMOVNOriginalMask was previously only checking for two input
shuffles that could be better expanded as vmovn nodes. This expands that
to single input shuffles that will later be legalized to multiple
vectors.
Differential Revision: https://reviews.llvm.org/D94189
Currently we don't support multiple SHT_SYMTAB_SHNDX sections
and the DT_SYMTAB_SHNDX tag currently.
This patch implements it and fixes the
https://bugs.llvm.org/show_bug.cgi?id=43991.
I had to introduce the `struct DataRegion` to ELF.h,
it is used to represent a region that might have no known size.
It is needed, because we don't know the size of the extended
section indices table when it is located via DT_SYMTAB_SHNDX.
In this case we still want to validate that we don't read
past the end of the file.
Differential revision: https://reviews.llvm.org/D92923
Currently dsymutil will silently fail when processing binaries with
Dwarf 5 debug info. This patch adds rudimentary support for Dwarf 5 in
dsymutil.
- Recognize relocations in the debug_addr section.
- Recognize (a subset of) Dwarf 5 form values.
- Emits valid Dwarf 5 compile unit header chains.
To simplify things (and avoid having to emit indexed sections) I decided
to emit the relocated addresses directly in the debug info section.
- DW_FORM_strx gets relocated and rewritten to DW_FORM_strp
- DW_FORM_addrx gets relocated and rewritten to DW_FORM_addr
Obviously there's a lot of work left, but this should be a step in the
right direction.
rdar://62345491
Differential revision: https://reviews.llvm.org/D94323
Also old mir tests are updated to meet last changes in STATEPOINT format.
Reviewers: reames, dantrushin
Reviewed By: reames, dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D94482
Default value is not changed, so it is NFC actually.
The option allows to use gc values on registers in landing pads.
Reviewers: reames, dantrushin
Reviewed By: reames, dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D94469
Add pseudo instruction to allow early termination of pixel shader
anywhere based on the value of SCC. The intention is to use this
when a mask of live lanes is updated, e.g. live lanes in WQM pass.
This facilitates early termination of shaders even when EXEC is
incomplete, e.g. in non-uniform control flow.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D88777
Copy over the __eh_frame from the binary into the dSYM. This helps
kernel developers that are working with only dSYMs (i.e. no binaries)
when debugging a core file. This only kicks in when the __eh_frame
exists in the linked binary. Most of the time ld64 will remove the
section in favor of compact unwind info. When it is emitted, it's
generally small enough and should not bloat the dSYM.
rdar://69774935
Differential revision: https://reviews.llvm.org/D94460
InlineSpiller::foldMemoryOperand unties registers before an attempt to fold and
does not restore tied-ness in case of failure.
I do not have a particular test for demo of invalid behavior.
This is something of clean-up.
It is better to keep the behavior correct in case some time in future it happens.
Reviewers: reames, dantrushin
Reviewed By: dantrushin, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D94389
Add a warning when the timestmap doesn't match between the object file
and the debug map entry. We were already emitting such warnings for
archive members and swift interface files. This patch also unifies the
warning across all three.
rdar://65614640
Differential revision: https://reviews.llvm.org/D94536
This change enables volatile use of persistent memory for omp_large_cap_mem*
on supported systems. It depends on libmemkind's support for persistent memory,
and requirements/details can be found at the following url.
https://pmem.io/2020/01/20/memkind-dax-kmem.html
Differential Revision: https://reviews.llvm.org/D94353
Contrary to the current visibility macro documentation, it appears that
gcc does handle visibility attribute on extern templates correctly, e.g.
https://godbolt.org/g/EejuV7. We need this so that extern template
instantiations of classes not marked _LIBCPP_TEMPLATE_VIS (e.g.
__vector_base_common) are correctly exported with gcc when building with
hidden visibility.
Reviewed By: ldionne
Differential Revision: https://reviews.llvm.org/D35388
- Merge 6706342f48 -- no more libcxx_needs_site_config, we now
always need it
- Since it was always off in practice, write_config bitrot. Unbitrot
it so that it works
- Remove copy step and let concat step write to final location
immediately -- and fix copy destination directory
As a side effect, libcxx/include/BUILD.gn now has only a single
sources list, which means the cmake sync script should be able to
automatically sync additions and removals of .h files. On the flipside,
this means this file now must be updated after most changes to
libcxx/include/__config_site.in, and looking through the last few months
of changes this looks like it's going to be a wash.
This is a pretty classic optimization. Instead of processing symbol
records and copying them to temporary storage, do a first pass to
measure how large the module symbol stream will be, and then copy the
data into place in the PDB file. This requires defering relocation until
much later, which accounts for most of the complexity in this patch.
This patch avoids copying the contents of all live .debug$S sections
into heap memory, which is worth about 20% of private memory usage when
making PDBs. However, this is not an unmitigated performance win,
because it can be faster to read dense, temporary, heap data than it is
to iterate symbol records in object file backed memory a second time.
Results on release chrome.dll:
peak mem: 5164.89MB -> 4072.19MB (-1,092.7MB, -21.2%)
wall-j1: 0m30.844s -> 0m32.094s (slightly slower)
wall-j3: 0m20.968s -> 0m20.312s (slightly faster)
wall-j8: 0m19.062s -> 0m17.672s (meaningfully faster)
I gathered similar numbers for a debug, component build of content.dll
in Chrome, and the performance impact of this change was in the noise.
The memory usage reduction was visible and similar.
Because of the new parallelism in the PDB commit phase, more cores makes
the new approach faster. I'm assuming that most C++ developer machines
these days are at least quad core, so I think this is a win.
Differential Revision: https://reviews.llvm.org/D94267
promise is a header field but it is not guaranteed that it would be the third
field of the frame due to `performOptimizedStructLayout`.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D94137
The load/store instruction will be transformed to amx intrinsics in the
pass of AMX type lowering. Prohibiting the pointer cast make that pass
happy.
Differential Revision: https://reviews.llvm.org/D94372
An invalid permutation will trigger a C++ assertion when attempting to create an AffineMap from the permutation.
This patch adds an `isPermutation` function to check the given permutation before creating the AffineMap.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D94492
There could be some mis-alignments when copying origins not aligned.
I believe inaligned memcpy is rare so the cases do not matter too much
in practice.
1) About the change at line 50
Let dst be (void*)5,
then d=5, beg=4
so we need to write 3 (4+4-5) bytes from 5 to 7.
2) About the change around line 77.
Let dst be (void*)5,
because of lines 50-55, the bytes from 5-7 were already writen.
So the aligned copy is from 8.
Reviewed-by: eugenis
Differential Revision: https://reviews.llvm.org/D94552
This patch resolves the suboptimal codegen described in http://llvm.org/pr47873 .
When CodeGenPrepare lowers select into a conditional branch, a freeze instruction is inserted.
It is then translated to `BRCOND(FREEZE(SETCC))` in SelDag.
The `FREEZE` in the middle of `SETCC` and `BRCOND` was causing a suboptimal code generation however.
This patch adds `BRCOND(FREEZE(cond))` -> `BRCOND(cond)` fold to DAGCombiner to remove the `FREEZE`.
To make this optimization sound, `BRCOND(UNDEF)` simply should nondeterministically jump to the branch or not, rather than raising UB.
It wasn't clear what happens when the condition was undef according to the comments in ISDOpcodes.h, however.
I updated the comments of `BRCOND` to make it explicit (as well as `BR_CC`, which is also a conditional branch instruction).
Note that it diverges from the semantics of `br` instruction in IR, which is explicitly UB.
Since the UB semantics was necessary to explain optimizations that use branching conditions, and SelDag doesn't seem to have such optimization, I think this divergence is okay.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D92015
This is a small patch stating that a nocapture pointer cannot be returned.
Discussed in D93189.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94386