We can sometimes get code that does:
xe = zext i16 x to i32
ye = zext i16 y to i32
m = mul i32 xe, ye
me = zext i32 m to i64
r = vecreduce.add(me)
This "double extend" can trip up the reduction identification, but
should give identical results.
This extends the pattern matching to handle them.
Differential Revision: https://reviews.llvm.org/D87276
This is a followup to D86834, which partially fixed this issue in
InstSimplify. However, InstCombine repeats the same transform while
dropping poison flags -- which does not cover cases where poison is
introduced in some other way.
The fix here is a bit more comprehensive, because things are quite
entangled, and it's hard to only partially address it without
regressing optimization. There are really two changes here:
* Export the SimplifyWithOpReplaced API from InstSimplify, with an
added AllowRefinement flag. For replacements inside the TrueVal
we don't actually care whether refinement occurs or not, the
replacement is always legal. This part of the transform is now
done in InstSimplify only. (It should be noted that the current
AllowRefinement check is not sufficient -- that's an issue we
need to address separately.)
* Change the InstCombine fold to work by temporarily dropping
poison generating flags, running the fold and then restoring the
flags if it didn't work out. This will ensure that the InstCombine
fold is correct as long as the InstSimplify fold is correct.
Differential Revision: https://reviews.llvm.org/D87445
Follow up to D86429 to handle the remaining regressions.
This patch generalizes lowerShuffleAsDecomposedShuffleBlend to lowerShuffleAsDecomposedShuffleMerge, and attempts to use an UNPCKL shuffle mask instead of a blend for the cases where the inputs are coming from alternating vXi8/vXi16 sources. Technically they don't have to be alternating (just as long as they can fit into a lower lane half for the unpack) but I didn't find as many general cases and it needed a lot more of the function to be altered.
For vXi32/vXi64 cases this could still be beneficial but in most cases the existing permute+blend approach was better.
Differential Revision: https://reviews.llvm.org/D87405
This change allow a CastExpr to have optional FPOptionsOverride object,
stored in trailing storage. Of all cast nodes only ImplicitCastExpr,
CStyleCastExpr, CXXFunctionalCastExpr and CXXStaticCastExpr are allowed
to have FPOptions.
Differential Revision: https://reviews.llvm.org/D85960
with P9 Model
Enable the pre-ra and post-ra scheduler strategy for Power10 as we want
to customize the heuristic later. And switch the scheduler model with P9
model before P10 Model is available. The NoSchedModel is modelled as
in-order cpu and the pre-ra scheduler is not bi-directional which will
have big impact on the scheduler.
Reviewed By: jji
Differential Revision: https://reviews.llvm.org/D86865
From ISA, fcmpu will raise the Floating-Point Invalid Operation
Exception (SNaN) if either of the operands is a Signaling NaN by setting
the bit VXSNAN. But the instruction description didn't set the
mayRaiseFPException which might have impact on the scheduling or some
backend optimization.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D83937
Tablegen does not have link time dependencies on MC. Having llvm-tblgen
depend on it causes it to be rebuilt in the gn build every time somebody
touches any cpp file in llvm/lib/MC* or llvm/lib/DebugInfo/Codeview*.
Touching tablegen invalidates most of the rest of the build, and
re-running it takes a while. This is is annoying for me when swapping
between branches that touch CodeView logic.
This dep was added to LLVMBuild.txt back in 2018, and presumably it was
carried over into the gn build.
Differential Revision: https://reviews.llvm.org/D87553
Because why would that be necessary? (I joke - I hadn't actually
expected this to be an issue but a content-hash-named filesystem means
the clang binary's just a bunch of numbers, and doesn't have 'clang'
anywhere in the name)
On macOS Big Sur the class descriptor contains the NSKVONotifying_
prefix. This is covered by TestDataFormatterObjCKVO.
Differential revision: https://reviews.llvm.org/D87545
In particular, we shouldn't make assumptions about globals which are
unnamed_addr: we can fold them together with other globals.
Also while I'm here, use isInterposable() instead of trying to
explicitly name all the different kinds of weak linkage.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47090
Differential Revision: https://reviews.llvm.org/D87123
Following up on D67687.
Please refer to the RFC here http://lists.llvm.org/pipermail/llvm-dev/2020-July/143309.html
`CodeGenPassBuilder` is the NPM counterpart of `TargetPassConfig` with below differences.
- Debugging features (MIR print/verify, disable pass, start/stop-before/after, etc.) living in `TargetPassConfig` are moved to use PassInstrument as much as possible. (Implementation also lives in `TargetPassConfig.cpp`)
- `TargetPassConfig` is a polymorphic base (virtual inheritance) to build the target-dependent pipeline whereas `CodeGenPassBuilder` is the CRTP base/helper to implement the target-dependent pipeline. The motivation is flexibility for targets to customize the pipeline, inlining opportunity, and fits the overall NPM value semantics design.
- `TargetPassConfig` is a legacy immutable pass to declare hooks for targets to customize some target-independent codegen layer behavior. This is partially ported to TargetMachine::options. The rest, such as `createMachineScheduler/createPostMachineScheduler`, are left out for now. They should be implemented in LLVMTargetMachine in the future.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D83608
Building on Mac OS with clang 12:
```
jhemphill@jhemphill-mbp build % clang --version
Apple clang version 12.0.0 (clang-1200.0.26.2)
Target: x86_64-apple-darwin19.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
```
yields one warning:
```
/Users/jhemphill/oss/llvm-project/clang/lib/Tooling/Syntax/BuildTree.cpp:1126:22: warning: loop variable 'Arg' is always a copy because the range of type 'llvm::iterator_range<clang::Stmt::CastIterator<clang::Expr, clang::Expr *, clang::Stmt *> >' does not return a reference [-Wrange-loop-analysis]
for (const auto &Arg : Args) {
^
/Users/jhemphill/oss/llvm-project/clang/lib/Tooling/Syntax/BuildTree.cpp:1126:10: note: use non-reference type 'clang::Expr *'
for (const auto &Arg : Args) {
```
It appears that `Arg` is an `Expr*`, passed by value rather than by const reference.
Reviewed By: eduucaldas, gribozavr2
Differential Revision: https://reviews.llvm.org/D87482
Introduce a new attribute that is used to indicate the error handling
convention used by a function. This is used to translate the error
semantics from the decorated interface to a compatible Swift interface.
The supported error convention is one of:
- none: no error handling
- nonnull_error: a non-null error parameter indicates an error signifier
- null_result: a return value of NULL is an error signifier
- zero_result: a return value of 0 is an error signifier
- nonzero_result: a non-zero return value is an error signifier
Since this is the first of the attributes needed to support the semantic
annotation for Swift, this change also includes the necessary supporting
infrastructure for a new category of attributes (Swift).
This is based on the work of the original changes in
8afaf3aad2
Differential Revision: https://reviews.llvm.org/D87331
Reviewed By: John McCall, Aaron Ballman, Dmitri Gribenko
gcc translates -gz=zlib to --compress-debug-options=zlib for both assembler and linker
but clang only does this for assembler.
The linker needs --compress-debug-options=zlib option to compress the debug sections
in the generated executable or shared library.
Due to this bug, -gz=zlib has no effect on the generated executable or shared library.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D87321
In a future patch
* Implement helper function to generate Trees for tests
* and test Tree methods, namely `findFirstLeaf` and `findLastLeaf`
Differential Revision: https://reviews.llvm.org/D87533
In generating the code for symmetric transfer, a temporary object is created to store the returned handle from await_suspend() call of the awaiter. Previously this temp won't be cleaned up until very later, which ends up causing this temp to be spilled to the heap. However, we know that this temp will no longer be needed after the coro_resume call. We can clean it up right after.
Differential Revision: https://reviews.llvm.org/D87470
The current behavior of -lto-embed-bitcode is not quite the same as that
of -fembed-bitcode. While both populate .llvmbc with bitcode, the latter
populates it with pre-optimized bitcode(*), while the former with
post-optimized. The scenarios driving them are different - the latter's
goal is to allow re-compilation, while the former, IIUC, is execution.
I plan to add a third mode for thinlto cases, closely-related to
-fembed-bitcode's scenario: adding the bitcode pre-optimization, but
post-merging. This would allow re-compilation without requiring the
other .bc files that were merged (akin to how -fembed-bitcode allows
recompilation without all the .h files)
The third mode can't co-exist with the current -lto-embed-bitcode mode,
because the latter would overwrite it. For clarity, we change
-lto-embed-bitcode to be an enum.
(*) That's the compiler semantics. The driver splits compilation in 2
phases, so if -fembed-bitcode is given to the driver, the .llvmbc is
optimized bitcode; if the option is passed to the compiler (after -cc1),
the section is pre-optimized.
Differential Revision: https://reviews.llvm.org/D87477
A type name in an IMPLICIT declaration that was later used in a PARAMETER
statement caused problems because the default symbol scope had not yet been
initialized. I avoided dereferencing in the situation where the default scope
was uninitialized and added a test that triggers the problem.
Differential Revision: https://reviews.llvm.org/D87535
This adds and optional ", immutable" to the end of a `.globaltype`
declaration. I would have prefered to match the `.wat` syntax
where immutable is the default and `mut` is the signifier for
mutable globals. Sadly changing the default would break backwards
compat with existing assembly in the wild so I think its best
to stick with this approach.
Differential Revision: https://reviews.llvm.org/D87515
This patch adds a way to fetch breakpoint metadatas as a serialized
`Structured` Data format (JSON). This can be used by IDEs to update
their UI when a breakpoint is set or modified from the console.
rdar://11013798
Differential Revision: https://reviews.llvm.org/D87491
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
This introduces a builder for the more general case that supports zero
elements (where the element type can't be inferred from the ValueRange,
since it might be empty).
Also, fix up some cases in ShapeToStandard lowering that hit this. It
happens very easily when dealing with shapes of 0-D tensors.
The SameOperandsAndResultElementType is redundant with the new
TypesMatchWith and prevented having zero elements.
Differential Revision: https://reviews.llvm.org/D87492