This revision adds a helper function to hoist vector.transfer_read /
vector.transfer_write pairs out of immediately enclosing scf::ForOp
iteratively, if the following conditions are true:
1. The 2 ops access the same memref with the same indices.
2. All operands are invariant under the enclosing scf::ForOp.
3. No uses of the memref either dominate the transfer_read or are
dominated by the transfer_write (i.e. no aliasing between the write and
the read across the loop)
To improve hoisting opportunities, call the `moveLoopInvariantCode` helper
function on the candidate loop above which to hoist. Hoisting the transfers
results in scf::ForOp yielding the value that originally transited through
memory.
This revision additionally exposes `moveLoopInvariantCode` as a helper in
LoopUtils.h and updates SliceAnalysis to support return scf::For values and
allow hoisting across multiple scf::ForOps.
Differential Revision: https://reviews.llvm.org/D81199
We can simplify
```
icmp <pred> phi(C1, C2, ...), C
```
with
```
phi(icmp(C1, C), icmp(C2, C), ...)
```
provided that all comparison of constants are constants themselves.
Differential Revision: https://reviews.llvm.org/D81151
Reviewed By: lebedev.ri
Forward declaring llvm::errs is not enough, as it is used as a default
parameter with a type that references the base class. So the class
hierarchy must be visible.
Summary:
Add regression tests of asmparser, mccodeemitter, and disassembler for
fixed-point operation instructions. In order to support them, we add
MImm parser to asmparser. Also add a new MPD instruction which is one
of multiply instructions.
Differential Revision: https://reviews.llvm.org/D81207
Summary:
This patch upstreams support for a new storage only bfloat16 C type.
This type is used to implement primitive support for bfloat16 data, in
line with the Bfloat16 extension of the Armv8.6-a architecture, as
detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
In detail this patch:
- introduces an opaque, storage-only C-type __bf16, which introduces a new bfloat IR type.
This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.
The following people contributed to this patch:
- Luke Cheeseman
- Momchil Velikov
- Alexandros Lamprineas
- Luke Geeson
- Simon Tatham
- Ties Stuij
Reviewers: SjoerdMeijer, rjmccall, rsmith, liutianle, RKSimon, craig.topper, jfb, LukeGeeson, fpetrogalli
Reviewed By: SjoerdMeijer
Subscribers: labrinea, majnemer, asmith, dexonsmith, kristof.beyls, arphaman, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76077
Use getMemoryOpCost from the generic implementation of getUserCost
and have getInstructionThroughput return the result of that for loads
and stores.
This also means that the X86 implementation of getUserCost can be
removed with the functionality folded into its getMemoryOpCost.
Differential Revision: https://reviews.llvm.org/D80984
Summary:
This will inline the region to a shape.assuming in the case that the
input witness is found to be statically true.
Differential Revision: https://reviews.llvm.org/D80302
In the case of all inputs being constant and equal, cstr_eq will be
replaced with a true_witness.
Differential Revision: https://reviews.llvm.org/D80303
This allows replacing of this op with a true witness in the case of both
inputs being const_shapes and being found to be broadcastable.
Differential Revision: https://reviews.llvm.org/D80304
This allows assuming_all to be replaced when all inputs are known to be
statically passing witnesses.
Differential Revision: https://reviews.llvm.org/D80306
This will later be used during canonicalization and folding steps to replace
statically known passing constraints.
Differential Revision: https://reviews.llvm.org/D80307
After a180d54 the build was failing with:
In file included from /work/llvm.monorepo/clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp:9:0:
/work/llvm.monorepo/clang/unittests/ASTMatchers/ASTMatchersTest.h:
In function ‘llvm::ArrayRef<clang::TestLanguage> clang::ast_matchers::langCxx11OrLater()’:
/work/llvm.monorepo/clang/unittests/ASTMatchers/ASTMatchersTest.h:64:10:
error: could not convert ‘(const clang::TestLanguage*)(& Result)’ from
‘const clang::TestLanguage*’ to ‘llvm::ArrayRef<clang::TestLanguage>’
return Result;
^
Update linalg to affine lowering for convop to use affine load for input
whenever there is no padding. It had always been using std.loads because
max in index functions (needed for non-zero padding if not materializing
zeros) couldn't be represented in the non-zero padding cases.
In the future, the non-zero padding case could also be made to use
affine - either by materializing or using affine.execute_region. The
latter approach will not impact the scf/std output obtained after
lowering out affine.
Differential Revision: https://reviews.llvm.org/D81191
This patch addresses the comment in [D80972](https://reviews.llvm.org/D80972#inline-744217).
Before this patch, the initial length field of .debug_aranges section should be declared as:
```
## 32-bit DWARF
debug_aranges:
- Length:
TotalLength: 0x20
Version: 2
...
## 64-bit DWARF
debug_aranges:
- Length:
TotalLength: 0xffffffff
TotalLength64: 0x20
Version: 2
...
```
After this patch:
```
## 32-bit DWARF
debug_aranges:
- [[Format: DWARF32]] ## Optional
Length: 0x20
Version: 2
...
## 64-bit DWARF
debug_aranges:
- Format: DWARF64
Length: 0x20
Version: 2
```
Current implementation of generating DWARF64 .debug_aranges section is buggy. A follow-up patch will improve it and add test cases for DWARF64.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D81063
It should not be necessary to use weak linkage for these. Doing so
implies interposablity and thus PIC generates indirections and
dynamic relocations, which are unnecessary and suboptimal. Aside
from this, ASan instrumentation never introduces GOT indirection
relocations where there were none before--only new absolute relocs
in RELRO sections for metadata, which are less problematic for
special linkage situations that take pains to avoid GOT generation.
Patch By: mcgrathr
Differential Revision: https://reviews.llvm.org/D80605
trivial.
We previously took a shortcut by assuming that if a subobject had a
trivial copy assignment operator (with a few side-conditions), we would
always invoke it, and could avoid going through overload resolution.
That turns out to not be correct in the presenve of ref-qualifiers (and
also won't be the case for copy-assignments with requires-clauses
either). Use the same logic for lazy declaration of copy-assignments
that we use for all other special member functions.
Summary:
Cache the results from getMachineBasicBlocks in LexicalScopes to speed
up UserValueScopes::dominates queries. This replaces the caching done
in UserValueScopes. Compared to the old caching method, this reduces
memory traffic when a VarLoc is copied (e.g. when a VarLocMap grows),
and enables caching across basic blocks.
When compiling sqlite 3.5.7 (CTMark version), this patch reduces the
number of calls to getMachineBasicBlocks from 10,207 to 1,093. I also
measured a small compile-time reduction (~ 0.1% of total wall time, on
average, on my machine).
As a drive-by, I made the DebugLoc in UserValueScopes a const reference
to cut down on MetadataTracking traffic.
Reviewers: jmorse, Orlando, aprantl, nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80957
This simplifies a lot of handling of BoolAttr/IntegerAttr. For example, a lot of places currently have to handle both IntegerAttr and BoolAttr. In other places, a decision is made to pick one which can lead to surprising results for users. For example, DenseElementsAttr currently uses BoolAttr for i1 even if the user initialized it with an Array of i1 IntegerAttrs.
Differential Revision: https://reviews.llvm.org/D81047
As a followup to D62922, add a sysroot command-line option to this test
to ensure that the output is independent of any default sysroot options,
and adjust the reactor test to be more consistent with the command test.
This revision adds a helper function to hoist alloc/dealloc pairs and
alloca op out of immediately enclosing scf::ForOp if both conditions are true:
1. all operands are defined outside the loop.
2. all uses are ViewLikeOp or DeallocOp.
This is now considered Linalg-specific and will be generalized on a per-need basis.
Differential Revision: https://reviews.llvm.org/D81152
Now that we have an operand based form for the GC arguments to a statepoint intrinsic, update RS4GC to use it and update tests to reflect. This is pretty straight forward. I nearly landed without review, but figured a second set of eyes didn't hurt.
Differential Revision: https://reviews.llvm.org/D81121
Follow the model used on Linux, where the clang driver passes the
linker a -u switch to force the profile runtime to be linked in,
rather than having every TU emit a dead function with a reference.
Differential Revision: https://reviews.llvm.org/D79835