Added support for incremental mode for frontend actions 8 and 28, i.e. `frontend::EmitBC` and `frontend::PrintPreprocessedInput`.
Added supporting clang tests that exercise clang-repl mode.
Reviewed By: v.g.vassilev
Differential Revision: https://reviews.llvm.org/D125946
Previously, when adding module initializers, we forgot to handle header
units. As a result, we couldn't compile a Hello World example with
header units. This patch fixes that.
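For illustration, a minimal sketch of the kind of program this unblocks, assuming the standard library header has been built as a header unit (exact build flags vary by setup, e.g. `-fmodule-header`):

```cpp
// Hypothetical "Hello World with header units" example; assumes <iostream>
// has been precompiled as a header unit beforehand.
import <iostream>;  // header unit import instead of a textual #include

int main() {
  std::cout << "Hello World\n";
  return 0;
}
```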
Reviewed By: iains
Differential Revision: https://reviews.llvm.org/D130871
This patch adds a constant folder for TanOp that supports only single- and double-precision floating-point values.
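As a rough sketch of what such a folder can look like, assuming MLIR's `constFoldUnaryOpConditional` helper (the actual patch may be structured differently):

```cpp
#include <cmath>

// Hedged sketch, not the exact patch: fold tan(x) for constant f32/f64
// operands and bail out (return no value) for any other float semantics.
OpFoldResult math::TanOp::fold(ArrayRef<Attribute> operands) {
  return constFoldUnaryOpConditional<FloatAttr>(
      operands, [](const APFloat &a) -> Optional<APFloat> {
        switch (APFloat::SemanticsToEnum(a.getSemantics())) {
        case APFloat::S_IEEEsingle:
          return APFloat(std::tan(a.convertToFloat()));   // single precision
        case APFloat::S_IEEEdouble:
          return APFloat(std::tan(a.convertToDouble()));  // double precision
        default:
          return {};  // other widths are not folded
        }
      });
}
```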
Differential Revision: https://reviews.llvm.org/D130873
Previously, DenseArrayAttr used VectorType for its shaped type.
VectorType is problematic for arrays because it doesn't support zero
dimensions, meaning that an empty array would have `vector<i32>` as its
type. ElementsAttr would think that an empty dense array is size 1, not
0. This patch switches over to TensorType, which does support zero
dimensions.
Fixes #56860
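A small, hedged illustration of the observable difference (the accessor names follow the dense-array API of the time and may differ in detail):

```cpp
// With VectorType, an empty dense array's shaped type degenerated to
// vector<i32> and ElementsAttr reported size 1; with TensorType the type can
// be tensor<0xi32>, so an empty array reports size 0 as expected.
auto empty = DenseI32ArrayAttr::get(&context, /*content=*/{});
assert(empty.size() == 0 && "an empty dense array should report size 0");
```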
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D130921
Adds an optional attribute to support tensor cores on the F32 datatype by lowering to `mma.sync` with TF32 operands. Since TF32 is not a native datatype in LLVM, we add `tf32Enabled` as an attribute so the IR is aware of the `MmaSyncOp` datatype. Additionally, this patch adds placeholders for nvgpu-to-nvgpu transformations targeting higher-precision tf32x3.
For `mma.sync` on f32 input using tensor cores there are two possibilities:
(a) tf32 (1 `mma.sync` per warp-level matrix-multiply-accumulate)
(b) tf32x3 (3 `mma.sync` per warp-level matrix-multiply-accumulate)
Typically, tf32 tensor core acceleration comes at a cost in accuracy from missing precision bits. While f32 has 23 precision bits, tf32 has only 10. tf32x3 aims to recover the lost precision bits by splitting each operand into two tf32 values and issuing three `mma.sync` tensor core operations, as sketched below.
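A hedged scalar sketch of the tf32x3 idea (not code from this patch; `roundToTF32` is a hypothetical helper that truncates instead of rounding like the hardware does):

```cpp
#include <cstdint>
#include <cstring>

// Truncate an f32 value to TF32 precision by clearing the low 13 of the
// 23 mantissa bits, leaving TF32's 10 precision bits.
static float roundToTF32(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits &= ~uint32_t((1u << 13) - 1);
  std::memcpy(&x, &bits, sizeof(bits));
  return x;
}

// Each operand is split into a "big" tf32 part plus a tf32 residual; the
// three products below correspond to the three mma.sync operations
// (the small*small term is dropped).
float tf32x3Product(float a, float b) {
  float aBig = roundToTF32(a), aSmall = roundToTF32(a - aBig);
  float bBig = roundToTF32(b), bSmall = roundToTF32(b - bBig);
  return aBig * bBig + aSmall * bBig + aBig * bSmall;
}
```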
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D130294
Reflect in the pointer's offset the length of the leading part of the
consumed string that precedes the first converted digit.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D130912
As progress towards having FileEntryRef contain the requested name of
the file, this commit narrows the "remap" hack to only apply to paths
that were remapped to an external contents path by a VFS. That was
always the original intent of this code, and the fact it was making
relative paths absolute was an unintended side effect.
Differential Revision: https://reviews.llvm.org/D130935
It's possible we have:

  lui  a0, %hi(sym)
  addi a0, a0, %lo(sym)
  addi a0, a0, <offset1>
  lw   a0, <offset2>(a0)

We want to arrive at:

  lui a0, %hi(sym+offset1+offset2)
  lw  a0, %lo(sym+offset1+offset2)(a0)
We currently fail to do this because we only consider loads/stores
if we didn't find any arithmetic.
This patch splits arithmetic folding and load/store folding into
two separate phases. The load/store folding can no longer assume
the offset in hi/lo is 0 so we must combine the offsets. I've applied
the same simm32 limit that we applied in the arithmetic folding.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D130931
The patch replaces SPIRVBaseInfo.*, previously generated with macros, with
the TableGen approach. There are many small changes in other files due to
differences in namespaces. Also, functions in SPIRVUtils are moved into
the llvm namespace.
Differential Revision: https://reviews.llvm.org/D130518
Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
This test had to be disabled because ps4 targets don't support
-fuse-ld. Preferably, this should just be unsupported for ps4
targets; however, no such lit feature exists, so I have just gone
ahead and set the target explicitly. Moreover, the test needs
to create a terminal link step, either an executable or a shared
object, to get the link error. With the change to an explicit
target I've also had to add -nostartfiles -nostdlib so that
clang doesn't pull crt files, which may not be present, into the
link. Again, this would likely be solved if this test were
unsupported for the one platform that disables -fuse-ld.
This attribute is technical debt from the early stages of MLIR, before
ElementsAttr was an interface and when it was more difficult for
dialects to define their own types of attributes. At present it isn't
used at all in tree (aside from being convenient for eliding other
ElementsAttr), and has had little to no evolution in the past three years.
Differential Revision: https://reviews.llvm.org/D129917
It's an accident that we started returning absolute paths from
FileEntry::getName for all relative paths. Prepare for getName to get
(closer to) returning the requested path. Note: conceptually it might make
sense for the dependency scanner to allow relative paths and let the
DependencyConsumer decide whether it wants to make them absolute, but we
currently document that the path is absolute and I didn't want to change
behaviour here.
Differential Revision: https://reviews.llvm.org/D130934
We were not correctly handling multiple DW_OP_addrx operations in the location expression.
This was exposed by a clang-15 build in release mode with debug information.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D130812
The use of `std::unique_ptr` with `TraceCursor` adds unnecessary complexity to adding `SBTraceCursor` bindings.
Specifically, since `TraceCursor` is an abstract class, there's no clean
way to provide "deep clone" semantics for `TraceCursorUP` short of
creating a pure virtual `clone()` method (afaict).
After discussing with @wallace, we decided there is no strong reason to
favor wrapping `TraceCursor` with `std::unique_ptr` over `std::shared_ptr`, so this diff
replaces all usages of `std::unique_ptr<TraceCursor>` with `std::shared_ptr<TraceCursor>`.
This sets the stage for future diffs to introduce `SBTraceCursor`
bindings in a cleaner fashion, as sketched below.
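A hedged sketch of why the shared-pointer representation helps (member and parameter names are hypothetical, not the actual SB API):

```cpp
#include <memory>

class TraceCursor;  // abstract class from the lldb Trace plug-in

// Copying an SB-layer cursor handle only bumps a reference count; with a
// unique_ptr member the same copy would require a virtual clone() on the
// abstract TraceCursor.
class SBTraceCursor {
public:
  explicit SBTraceCursor(std::shared_ptr<TraceCursor> cursor)
      : m_opaque_sp(std::move(cursor)) {}
  // Default copy semantics are fine: both copies share one TraceCursor.
private:
  std::shared_ptr<TraceCursor> m_opaque_sp;
};
```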
Differential Revision: https://reviews.llvm.org/D130925
GCC and that specific build bot issued warnings-turned-errors due to a narrowing conversion from `unsigned` to `int32_t`. Silence these via a static_cast.
2xi64 is the legalized type for wide reductions (like 16xi64), and setting the
cost to 2 makes `load-reduce` and `load-zext-reduce` patterns profitable.
The few performance measurements that I did on an AArch64 machine confirm that
these patterns are actually faster when vectorized.
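As a source-level illustration (not taken from the patch), this is the kind of loop a `load-zext-reduce` pattern corresponds to:

```cpp
#include <cstdint>

// 32-bit loads are zero-extended to 64 bits and summed; the vectorizer can
// lower this to wide i64 reductions, now costed at 2 per legalized 2xi64 step.
uint64_t sumWidened(const uint32_t *data, int n) {
  uint64_t sum = 0;
  for (int i = 0; i < n; ++i)
    sum += data[i];  // zext i32 -> i64, then reduce
  return sum;
}
```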
Differential Revision: https://reviews.llvm.org/D130740
The runtime makes some use of `std::vector` data structures. We should
be able to replace these trivially with `llvm::SmallVector` instead.
This should allow us to avoid heap allocations in the majority of cases
now.
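A minimal sketch of the pattern (element type and inline capacity are made up for illustration):

```cpp
#include "llvm/ADT/SmallVector.h"

// The first N elements live inline in the object, so small cases never hit
// the heap; a std::vector would allocate on its first push_back.
llvm::SmallVector<int32_t, 8> DeviceIds;  // hypothetical use, 8 inline slots
for (int32_t I = 0; I < 4; ++I)
  DeviceIds.push_back(I);                 // stays in inline storage
```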
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D130927