Previously, the `SValBuilder` could not encounter expressions of the
following kind:
NonLoc OP Loc
Loc OP NonLoc
Where the `Op` is other than `BO_Add`.
As of now, due to the smarter simplification and the fixedpoint
iteration, it turns out we can.
It can happen if the `Loc` was perfectly constrained to a concrete
value (`nonloc::ConcreteInt`), thus the simplifier can do
constant-folding in these cases as well.
Unfortunately, this could cause assertion failures, since we assumed
that the operator must be `BO_Add`, causing a crash.
---
In the patch, I decided to preserve the original behavior (aka. swap the
operands (if the operator is commutative), but if the `RHS` was a
`loc::ConcreteInt` call `evalBinOpNN()`.
I think this interpretation of the arithmetic expression is closer to
reality.
I also tried naively introducing a separate handler for
`loc::ConcreteInt` RHS, before doing handling the more generic `Loc` RHS
case. However, it broke the `zoo1backwards()` test in the `nullptr.cpp`
file. This highlighted for me the importance to preserve the original
behavior for the `BO_Add` at least.
PS: Sorry for introducing yet another branch into this `evalBinOpXX`
madness. I've got a couple of ideas about refactoring these.
We'll see if I can get to it.
The test file demonstrates the issue and makes sure nothing similar
happens. The `no-crash` annotated lines show, where we crashed before
applying this patch.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D115149
The immediate size check on StepNumerator did not take into account
that vmul.vi does not exist. It also did not account for power of 2
constants that can be done with vshl.vi.
This patch fixes this by moving the conversion from mul to shift
further up. Then we can consider the immediates separately for MUL
vs SHL. For MUL I've allowed simm12 which requires a single addi
before a vmul.vx. For SHL I've allowed any uimm5 which works with
vshl.vi. We could relax these further in the future. This is a
starting point that allows us to emit the same number of instructions
we were already using for smaller numerators.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D115081
Reduction functions were guarded before which was wrong, these are SPMD
compatible.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D115159
Before SPMDzation it was sufficient to add an incompatible instruction
to the SPMDCompatibilityTracker. However, now adding instructions means
they need guarding. As calls cannot be guarded in general we need to
explicitly prevent SPMD mode.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D115158
Without this patch when using CMAKE_CXX_STANDARD=20
and MSVC 19.30.30705.0 compilation of unit tests
fails with
llvm\utils\unittest\googlemock\include\gmock/gmock-actions.h(828): error C2039: 'result_of': is not a member of 'std'
Patch is taken from Google Test:
61f010d703
Do not use std::result_of as it was removed in C++20.
Differential Revision: https://reviews.llvm.org/D115163
A new basic block ordering improving existing MachineBlockPlacement.
The algorithm tries to find a layout of nodes (basic blocks) of a given CFG
optimizing jump locality and thus processor I-cache utilization. This is
achieved via increasing the number of fall-through jumps and co-locating
frequently executed nodes together. The name follows the underlying
optimization problem, Extended-TSP, which is a generalization of classical
(maximum) Traveling Salesmen Problem.
The algorithm is a greedy heuristic that works with chains (ordered lists)
of basic blocks. Initially all chains are isolated basic blocks. On every
iteration, we pick a pair of chains whose merging yields the biggest increase
in the ExtTSP value, which models how i-cache "friendly" a specific chain is.
A pair of chains giving the maximum gain is merged into a new chain. The
procedure stops when there is only one chain left, or when merging does not
increase ExtTSP. In the latter case, the remaining chains are sorted by
density in decreasing order.
An important aspect is the way two chains are merged. Unlike earlier
algorithms (e.g., based on the approach of Pettis-Hansen), two
chains, X and Y, are first split into three, X1, X2, and Y. Then we
consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y,
X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score.
This improves the quality of the final result (the search space is larger)
while keeping the implementation sufficiently fast.
Differential Revision: https://reviews.llvm.org/D113424
f526c600c0 had a concern raised because of an invalid typesize request
on a scalable vector, which this patch addresses.
Prevent shouldReduceLoadWidth from attempting to query the bit size, and
add a regression test in sve-extract-fixed-vector.ll.
Differential Revision: https://reviews.llvm.org/D115156
Minor fix to the lit.cfg. Currently, nvptx runs the tests twice on the new runtime.
Soon, amdgpu will run them on the new runtime as well as the old.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D115150
Added TTI queries for the cost of a VP Memory operation, and added Opcode,
DataType and Alignment to the hasActiveVectorLength() interface.
Reviewed By: Roland Froese
Differential Revision: https://reviews.llvm.org/D109416
Clang trunk rejects the new test case, but this is a Clang bug
(PR47414, 47509, 50864, 44833).
```
In module 'std' imported from /Users/aodwyer/llvm-project/libcxx/test/std/ranges/range.adaptors/range.transform/general.pass.cpp:17:
/Users/aodwyer/llvm-project/build2/include/c++/v1/__ranges/transform_view.h:85:44: error: constraints not satisfied for alias template 'range_reference_t' [with _Rp = const NonConstView]
regular_invocable<const _Fn&, range_reference_t<const _View>>
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/aodwyer/llvm-project/build2/include/c++/v1/__ranges/transform_view.h:416:25: note: in instantiation of template class 'std::ranges::transform_view<NonConstView, (lambda at /Users/aodwyer/llvm-project/libcxx/test/std/ranges/range.adaptors/range.transform/general.pass.cpp:73:71)>' requested here
-> decltype( transform_view(_VSTD::forward<_Range>(__range), _VSTD::forward<_Fn>(__f)))
^
```
We can work around this by adding a layer of indirection: put the
problematic constraint into a named concept and Clang becomes more
amenable to SFINAE'ing instead of hard-erroring.
Drive-by simplify `range.transform/general.pass.cpp` to make it clearer
what it's actually testing in this area.
Differential Revision: https://reviews.llvm.org/D115116
This patch adds the conversion pattern for the fircg.ext_array_coor
operation. It applies the address arithmetic on a dynamically shaped, shifted
and/or sliced array.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D113968
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
This patch adds `emitXCOFFSymbolLinkageWithVisibility` to MCNullStreamer to fix llvm_unreachable getting reached when using option `-filetype=null` on AIX.
Reviewed By: DiggerLin
Differential Revision: https://reviews.llvm.org/D114876
Lldb uses a pty to read/write to the standard input and output of the
debugged process. For host processes this would be automatically set up
by Target::FinalizeFileActions. The Qemu platform is in a unique
position of not really being a host platform, but not being remote
either. It reports IsHost() = false, but it is sufficiently host-like
that we can use the usual pty mechanism.
This patch adds the necessary glue code to enable pty redirection. It
includes a small refactor of Target::FinalizeFileActions and
ProcessLaunchInfo::SetUpPtyRedirection to reduce the amount of
boilerplate that would need to be copied.
I will note that qemu is not able to separate output from the emulated
program from the output of the emulator itself, so the two will arrive
intertwined. Normally this should not be a problem since qemu should not
produce any output during regular operation, but some output can slip
through in case of errors. This situation should be pretty obvious (to a
human), and it is the best we can do anyway.
For testing purposes, and inspired by lldb-server tests, I have extended
the mock emulator with the ability "program" the behavior of the
"emulated" program via command-line arguments.
Differential Revision: https://reviews.llvm.org/D114796
Recognize FreeBSD vmcores (kernel core dumps) through OS ABI = 0xFF
+ ELF version = 0, and do not process them via the elf-core plugin.
While these files use ELF as a container format, they contain raw memory
dump rather than proper VM segments and therefore are not usable
to the elf-core plugin.
Differential Revision: https://reviews.llvm.org/D114967
These tests tend to hang or crash on hardware that doesn't
support USM. Disabling them helps diagnose other issues. To safely
enable we require a means of testing whether USM is expected to work.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D115144
Conversion of LLVM named structs leads to them being renamed since we cannot
modify the body of the struct type once it is set. Previously, this applied to
all named struct types, even if their element types were not affected by the
conversion. Make this behvaior only applicable when element types are changed.
This requires making the LLVM dialect type-compatibility check recursively look
at the element types (arguably, it should have been doing than since the moment
the LLVM dialect type system stopped being closed). In addition, have a more
lax check for outer types only to avoid repeated check when necessary (e.g.,
parser, verifiers that are going to also look at the inner type).
Reviewed By: wsmoses
Differential Revision: https://reviews.llvm.org/D115037
Before, the CLANG_DEFAULT_LINKER cmake option was a global override for
the linker that shall be used on all toolchains. The linker binary
specified that way may not be available on toolchains with custom
linkers. Eg, the only linker for VE is named 'nld' - any other linker
invalidates the toolchain.
This patch removes the hard override and instead lets the generic
toolchain implementation default to CLANG_DEFAULT_LINKER. Toolchains
can now deviate with a custom linker name or deliberatly default to
CLANG_DEFAULT_LINKER.
Reviewed By: MaskRay, phosek
Differential Revision: https://reviews.llvm.org/D115045
If the condition of a select is a compare, pass its predicate to
TTI::getCmpSelInstrCost to get a more accurate cost value instead
of passing BAD_ICMP_PREDICATE.
I noticed that the commit message from D90070 had a comment about the
vectorized select predicate possibly being composed of other compares with
different predicate values, but I wasn't able to construct an example
where this was an actual issue. If this is an issue, I guess we could
add another check that the block isn't predicated for any reason.
Reviewed By: dmgreen, fhahn
Differential Revision: https://reviews.llvm.org/D114646
This documentation supports the dataflow analysis framework (see "[RFC]
A dataflow analysis framework for Clang AST" on cfe-dev).
Since the implementation of the framework has not been committed yet,
right now the doc describes dataflow analysis in general.
Since this is the first markdown document in clang/docs, I added support
for Markdown to clang/docs/conf.py in the same way as it is done in
llvm/docs.
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D114231
Change to use R_VE_SREL32 for relative branch instructions instead of
R_VE_PC_LO32 in order to check ranges of relative branch isntructions
at link time correctly.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D115097
The v8.1-m ARMARM uses the vrinta.f16.f16 names, as opposed to
vrinta.f16. This adds an alias for it in the same way that we have for
f32 and f64.
Differential Revision: https://reviews.llvm.org/D68127
Change C++ header files placement to support multiple LLVM_RUNTIME_TARGETS
build. Also modifies regression test for it.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D114527
This is a cleanup of ModuleBufferization. Instead of storing information about writable function arguments in BufferizationAliasInfo, we can use isWritable and make the decision there, based on dialect-specifc bufferization state.
Differential Revision: https://reviews.llvm.org/D114930
Some projects [1,2,3] have flex-generated files besides bison-generated
ones.
Unfortunately, the comment `"/* A lexical scanner generated by flex */"`
generated by the tools is not necessarily at the beginning of the file,
thus we need to quickly skim through the file for this needle string.
Luckily, StringRef can do this operation in an efficient way.
That being said, now the bison comment is not required to be at the very
beginning of the file. This allows us to detect a couple more cases
[4,5,6].
Alternatively, we could say that we only allow whitespace characters
before matching the bison/flex header comment. That would prevent the
(probably) unnecessary string search in the buffer. However, I could not
verify that these tools would actually respect this assumption.
Additionally to this, e.g. the Twin project [1] has other non-whitespace
characters (some preprocessor directives) before the flex-generated
header comment. So the heuristic in the previous paragraph won't work
with that.
Thus, I would advocate the current implementation.
According to my measurement, this patch won't introduce measurable
performance degradation, even though we will do 2 linear scans.
I introduce the ignore-bison-generated-files and
ignore-flex-generated-files to disable skipping these files.
Both of these options are true by default.
[1]: https://github.com/cosmos72/twin/blob/master/server/rcparse_lex.cpp#L7
[2]: 22362cdcf9/sandbox/count-words/lexer.c (L6)
[3]: 11abdf6462/lab1/lex.yy.c (L6)
[4]: 47f5b2cfe2/B_yacc/1/y1.tab.h (L2)
[5]: 71d1bf9b1e/src/VBox/Additions/x11/x11include/xorg-server-1.8.0/parser.h (L2)
[6]: 3f773ceb13/Framework/OpenEars.framework/Versions/A/Headers/jsgf_parser.h (L2)
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D114510
Remove all function calls related to buffer equivalence from bufferize implementations.
Add a new PostAnalysisStep for scf.for that ensures that yielded values are equivalent to the corresponding BBArgs. (This was previously checked in `bufferize`.) This will be relaxed in a subsequent commit.
Note: This commit changes two test cases. These were broken by design
and should not have passed. With the new scf.for PostAnalysisStep, this
bug was fixed.
Differential Revision: https://reviews.llvm.org/D114927
This patch implements the intrinsic for ref.null.
In the process of implementing int_wasm_ref_null_func() and
int_wasm_ref_null_extern() intrinsics, it removes the redundant
HeapType.
This also causes the textual assembler syntax for ref.null to
change. Instead of receiving an argument: `func` or `extern`, the
instruction mnemonic is either ref.null_func or ref.null_extern,
without the need for a further operand.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D114979
Adding the default implementation of `getLoopIteratorTypes` and
`getLoopBounds` allows ExternalModels to override these methods.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115101
Collect equivalent BBArgs right after the equivalence analysis of the FuncOp and before bufferizing. This is in preparation of decoupling bufferization from aliasInfo.
Also gather equivalence info for CallOps, which was missing in the
previous commit.
Differential Revision: https://reviews.llvm.org/D114847
This adds support for header deprecation using
LLVM_ATTRIBUTE_C_DEPRECATED (note that we can't use
LLVM_ATTRIBUTE_DEPRECATED, which is C++ specific). This will not
help people using the FFI interface, but may help people using the
C headers.
Differential Revision: https://reviews.llvm.org/D114936
To support creating both a mask with just a single `true` and `false` values,
I had to relax the restriction in the verifier that the rank is always equal to
the length of the attribute array, in other words, we now allow:
- `vector.constant_mask [0] : vector<i1>` which gets lowered to
`arith.constant dense<false> : vector<i1>`
- `vector.constant_mask [1] : vector<i1>` which gets lowered to
`arith.constant dense<true> : vector<i1>`
(the attribute list for the 0-D case must be a singleton containing
either `0` or `1`)
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D115023