Migrate some MOVE instruction MC tests from test/CodeGen/M68k.
Unfortunately the tests touched in this commit were failed due to
lacking of the `abs.W` operand, which forces any memory address parsed
from assembly being represented in 32-bits.
We're temporarily allowing these unwanted widening in the tests until
the support for `abs.W` is there.
These functions will be used in a future patch to implement
trigonometric functions. Unit tests have been added but to the
libc-long-running-tests suite. The unit tests long running because we
compare against MPFR computations performed at 1280 bits of precision.
Some cleanups or elimination of repeated patterns can be done as follow
up changes.
Differential Revision: https://reviews.llvm.org/D104817
This is the first step to enable PPC64 support huge frame size(>2G). Also fix an assertion error for frame size, i.e.,`int x; !isInt<32>(x);` should be always evaluated false, so the guard code for frame size is impossible to hit.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D107435
The new pass manager does not allow adding module passes at the
-polly-position=before-vectorizer extension point. Introduce a
DumpFunctionPass that dumps only current function. In contrast to the
legacy pass manager's -polly-dump-before, each function will be dumped
into its own file. -polly-dump-before-file is still not supported.
The DumpFunctionPass uses llvm::CloneModule to copy the current function
into a new module and then write it into a file.
* This is the native data layout for PyTorch and npcomp was using the prior version before cleanup.
Differential Revision: https://reviews.llvm.org/D108527
* Resolves a TODO by making this configurable by downstreams.
* This seems to be the last thing allowing full use of the Python bindings as a library within another project (i.e. be embedding them).
Differential Revision: https://reviews.llvm.org/D108523
This test was not modifying the pointer in the loop, so the loads
just ended up as undef, without relation to loop load PRE.
Pass the alloca to the called function, so the memory is
potentially modified.
4ad41902e8 changed this code to
propagate Changed if scalar GEP PRE is performed. However, as
implemented this would skip the load PRE entirely if GEP indices
were PREd. Make sure load PRE runs even if Changed is already
true.
This likely has no functional effect as load PRE would then
occur on a later GVN iteration.
This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability. I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.
verifyDieRanges function checks for the intersected address ranges.
It adds child DieRangeInfo into parent DieRangeInfo to check
whether children have overlapping address ranges. It is safe to not add
DieRangeInfo with empty address range into parent's children list.
This decreases the number of children which should be navigated and as a result
decreases execution time(parents having a lot of children with empty ranges
spend much time navigating them). For this command: "llvm-dwarfdump --verify clang-repl"
execution time decreased from 220 sec till 75 sec.
Differential Revision: https://reviews.llvm.org/D107554
When `_Compare` is a function parameter already (so it's not `void`
and it's not an abominable function type), `add_lvalue_reference_t<_Compare>`
is simply a synonym for `_Compare&`. We don't need to pull in `<type_traits>`
and instantiate a template trait to figure that out.
Differential Revision: https://reviews.llvm.org/D108400
Extend matchShuffleAsBlend to not only match against known in-place elements for BLEND shuffles, but use isElementEquivalent to determine if the shuffle mask's referenced element is the same as the in-place element.
This allows us to replace a number of insertps instructions with more general blendps instructions (better opportunities for commutation, concatenation etc.).
Before lowering shuffles, see if we can merge horizontal ops or canonicalize the shuffle mask to point to the same LHS/RHS of the HOps when an HOp's args are repeated.
Broadwell is mainly a die shrink of Haswell, but the model had many of the scheduling classes in different orders, making side-by-side comparisons very difficult.
The InstRW overrides are still quite different, but at least that part of the side-by-side diff is now in the same position.
This was noticed while I was trying to investigate diffs between llvm-mca and other perf analyzers in https://uica.uops.info/ - we used to be able to do diffs between most of the models very easily, but we seem to have lost that simplicity as classes have been altered, models have been refined and other models have rotted.
The motivation was to get min/max intrinsics to parity
with cmp+select idioms, but this unlocks a few more
folds because isFreeToInvert recognizes add/sub with
constants too.
In the min/max example, we have too many extra uses
for smaller folds to improve things, but this fold
is able to eliminate uses even though we can't reduce
the number of instructions.
Adjusting the reduction recipes still relies on references to the
original IR, which can become outdated by the first-order recurrence
handling. Until reduction recipe construction does not require IR
references, move it before first-order recurrence handling, to prevent a
crash as exposed by D106653.
This patch supported the R_X86_64_32S relocation and add the Pointer32Signed generic edge kind.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D108446
Renames the blobSerializationRoundTrip test helper function to
spsSerializationRoundTrip ('blob' was the placeholder name for the serialization
scheme during prototyping, this function was missed when renaming everything
for the mainline). Also drops explicit template arguments at call sites where
they can be inferred (and are obvious) from the call argument type.
This may overlap partially with the reassociate pass,
but it seems simple enough that we should try it here
in InstCombine to enable other folds.
This shows up as an opportunity and potential regression
if we improve a subtract fold with 'not' ops to be more
general.
Add a variant of mve-vqdmulh tests that uses min/max intrinsics
directly, including a scalar test that shows it misbehaving for min
intrinsics and a fix for the combine to prevent it from misbehaving.
This patch cleans-up the file generation code in Flang's frontend
driver. It improves the layering between
`CompilerInstance::CreateDefaultOutputFile`,
`CompilerInstance::CreateOutputFile` and their various clients.
* Rename `CreateOutputFile` as `CreateOutputFileImpl` and make it
private. This method is an implementation detail.
* Instead of passing an `std::error_code` out parameter into
`CreateOutputFileImpl`, have it return Expected<>. This is a bit shorter
and idiomatic LLVM.
* Make `CreateDefaultOutputFile` (which calls `CreateOutputFileImpl`)
issue an error when file creation fails. The error code from
`CreateOutputFileImpl` is used to generate a meaningful diagnostic
message.
* Remove error reporting from `PrintPreprocessedAction::ExecuteAction`.
This is only for cases when output file generation fails. This is
handled in `CreateDefaultOutputFile` instead (see the previous point).
* Inline `AddOutputFile` into its only caller,
`CreateDefaultOutputFile`.
* Switch from `lvm::buffer_ostream` to `llvm::buffer_unique_ostream>`
for non-seekable output streams. This simplifies the logic in the driver
and was introduced for this very reason in [1]
* Moke sure that the diagnostics from the prescanner when running `-E`
(`PrintPreprocessedAction::ExecuteAction`) are printed before the actual
output is generated.
* Update comments, add test.
NOTE: This patch relands [2]. As suggested by Michael Kruse in the
post-commit/post-revert review, I've added the following:
```
config.errc_messages = "@LLVM_LIT_ERRC_MESSAGES@"
```
in Flang's `lit.site.cfg.py.in`. This way, `%errc_ENOENT` in
output-paths.f90 gets the correct value on Windows as well as on Linux.
[1] https://reviews.llvm.org/D93260
[2] fd21d1e198
Reviewed By: ashermancinelli
Differential Revision: https://reviews.llvm.org/D108390
All ExecutorProcessControl subclasses must provide an
ExecutorProcessControl::MemoryAccess object that can be used to access executor
memory from the JIT process. The EPCGenericMemoryAccess class provides an
off-the-shelf MemoryAccess implementation for JITs that do not need (or cannot
provide) a specialized MemoryAccess implementation. This simplifies the process
of creating new ExecutorProcessControl implementations.