Temporary fix for missing entry offset when creating address
translation tables (BAT) after D127935 landed. Will later work on
assigning a more reasonable, nonzero offset.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D128092
This adds weak versions of the truncation libcalls in case the runtime
environment doesn't have them.
Differential Revision: https://reviews.llvm.org/D128091
Doing so lets the post-mutation pass leverage the demanded info to rewrite vsetvlis before a store/mask-op to eliminate later vsetvlis.
Sorry for the lack of store test change; all of my attempts to write something reasonable have been handled through existing logic.
This reverts commit 7aa8a67882.
This version includes fixes to address issues uncovered after
the commit landed and discussed at D11448.
Those include:
* Limit select-traversal to selects inside the loop.
* Freeze pointers resulting from looking through selects to avoid
branch-on-poison.
The added section and table here list the object file formats LLVM MC
supports and which targets support each format.
Differential Revision: https://reviews.llvm.org/D127645
This document is a work in progress to begin fleshing out documentation
for the DirectX backend and related changes in the LLVM project.
This is not intended to be exhaustive or complete; it is intended as a
starting point so that future changes have a place for documentation to land.
Differential Revision: https://reviews.llvm.org/D127640
This adds a parser for the log symbolizer markup format discussed in
https://discourse.llvm.org/t/rfc-log-symbolizer/61282. The parser
operates in a line-by-line fashion with minimal memory requirements.
This doesn't yet include support for multi-line tags or specific parsing
for ANSI X3.64 SGR control sequences, but it can be extended to do so.
The latter can also be relatively easily handled by examining the
resulting text elements.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D124686
The upstreamed code was not incrementing the sliceOffset in multiples
of 3. This issue is fixed by using Offsets and incrementing by 3 during
every iteration.
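As a rough illustration of stepping through slice operands three at a time, here is a minimal sketch. It assumes the operands form a flat list of (lower, upper, step) triples; the `Triple` struct and `read_triples` function are hypothetical names and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation of one slice triple.
struct Triple {
  long lower;
  long upper;
  long step;
};

// Walk a flat operand list, advancing the offset by 3 on every iteration
// so that each step lands on the start of the next triple.
inline std::vector<Triple> read_triples(const std::vector<long> &flat) {
  std::vector<Triple> out;
  for (std::size_t offset = 0; offset + 3 <= flat.size(); offset += 3)
    out.push_back(Triple{flat[offset], flat[offset + 1], flat[offset + 2]});
  return out;
}
```

Advancing by 1 instead of 3 would read overlapping windows and mix the fields of neighbouring triples.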
In the conversion pattern, we were comparing the definingOp of an
operand with an FIR::UndefOp. Use LLVM::UndefOp for conversion.
Reviewed By: clementval, Leporacanthicus
Differential Revision: https://reviews.llvm.org/D128017
This mostly copies the `<experimental/functional>` stuff and updates the code to current libc++ style.
Reviewed By: ldionne, #libc
Spies: nlopes, adamdebreceni, arichardson, libcxx-commits, mgorny
Differential Revision: https://reviews.llvm.org/D121074
This code should be dead. A simple whole register copy of an IMPLICIT_DEF is simply an IMPLICIT_DEF of its own. (This would not be true for freeze, but is for copy.) If we find a case which gets here with a vector operand copy of an IMPLICIT_DEF, we most likely have an earlier missed optimization anyway. (The most recent case of this was e6c7a3a, found by Craig during review of this patch.) There might be others, and if so, we'll revisit them individually as regressions are reported.
Differential Revision: https://reviews.llvm.org/D127996
If a lazyCompoundVal to a struct is bound to the store, there is a policy which decides
whether a copy gets created instead.
This patch introduces a similar policy for arrays, which is required to model structured
binding to arrays without false negatives.
Differential Revision: https://reviews.llvm.org/D128064
In the sign writer, a size_t was being compared to an int. This patch
casts the size_t to an int so that the comparison doesn't cause a sign
comparison warning.
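The kind of cast described above might look like the following sketch. The function and parameter names are hypothetical and do not come from the patch, and the sketch assumes the length always fits in an `int`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many padding characters are needed to reach
// `min_width` when `chars_written` characters have already been produced.
// Casting the size_t to int keeps both operands of the comparison signed,
// which avoids a -Wsign-compare warning. Assumes chars_written fits in int.
inline int padding_needed(int min_width, std::size_t chars_written) {
  int written = static_cast<int>(chars_written);
  return min_width > written ? min_width - written : 0;
}
```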
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D127984
Simplify the implementation of `std::copy` and `std::move` by using `__unwrap_iter` and `__rewrap_iter` to unwrap and rewrap `reverse_iterator<reverse_iterator<Iter>>` instead of specializing `__copy_impl` and `__move_impl`.
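To show why a doubly reversed iterator can be unwrapped safely, here is a small sketch. `unwrap_double_reverse` is a hypothetical stand-in written only for illustration; it is not libc++'s `__unwrap_iter` and makes no claim about how that function is implemented.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: reversing an iterator twice walks the range in the
// original direction, so calling base() twice recovers the underlying
// iterator at the same position.
template <class Iter>
Iter unwrap_double_reverse(
    std::reverse_iterator<std::reverse_iterator<Iter>> it) {
  return it.base().base();
}
```

An algorithm can then run on the plain iterators and wrap the result again afterwards, instead of carrying a dedicated specialization for the doubly reversed case.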
Reviewed By: ldionne, #libc
Spies: wenlei, libcxx-commits
Differential Revision: https://reviews.llvm.org/D127049
The memcmp simplifier is limited to folding to constants calls with constant
arrays and constant sizes. This change adds the ability to simplify
memcmp(A, B, N) calls with constant A and B and variable N to the pseudocode
equivalent of
N <= Pos ? 0 : (A < B ? -1 : B < A ? +1 : 0)
where Pos is the offset of the first mismatch between A and B.
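The fold can be modelled outside the compiler with the sketch below. It reads the `A < B` in the pseudocode as a comparison of the bytes at `Pos`, and assumes both arrays hold `len` bytes with `n <= len`. The function names are hypothetical and are not those used by the simplifier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Offset of the first differing byte within the first `len` bytes,
// or `len` when the two arrays are identical.
inline std::size_t first_mismatch(const unsigned char *a,
                                  const unsigned char *b, std::size_t len) {
  std::size_t pos = 0;
  while (pos < len && a[pos] == b[pos])
    ++pos;
  return pos;
}

// Hypothetical model of the folded result for constant `a` and `b` and a
// variable size `n` (assumed to satisfy n <= len).
inline int folded_memcmp(const unsigned char *a, const unsigned char *b,
                         std::size_t len, std::size_t n) {
  std::size_t pos = first_mismatch(a, b, len);
  if (pos == len || n <= pos)
    return 0; // the compared prefix contains no mismatch
  return a[pos] < b[pos] ? -1 : 1;
}

// Reduce a memcmp result to -1, 0, or +1 for comparison with the model.
inline int sign_of(int v) { return (v > 0) - (v < 0); }
```

Because `Pos` is known at compile time, the only runtime work left is the single comparison of `n` against it.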
Differential Revision: https://reviews.llvm.org/D127766
For DecompositionDecl, the array being decomposed was not present in the
CFG, which led the liveness analysis to falsely detect it as a dead symbol.
Differential Revision: https://reviews.llvm.org/D127993
llvm::formatv expects the parameter indexes to start with 0.
Unfortunately it doesn't detect out-of-bounds accesses in the format
string at compile-time, of which we had several inside ClangExpressionDeclMap.
This patch fixes these out-of-bounds format accesses.
Example output
Before
ClangExpressionDeclMap::FindExternalVisibleDecls for '$__lldb_class' in a
'TranslationUnit'
CEDM::FEVD Searching the root namespace
CEDM::FEVD Adding type for $__lldb_class: 1
After
ClangExpressionDeclMap::FindExternalVisibleDecls for '$__lldb_class' in
a 'TranslationUnit'
CEDM::FEVD Searching the root namespace
CEDM::FEVD Adding type for $__lldb_class: class (lambda)
Patch by Michael Buch!
Differential Revision: https://reviews.llvm.org/D128063
This fixes all sorts of ABI issues due to passing by-value
(using by-reference with memref's exclusively).
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D128018
This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by
`newColIdx = col % vecSize + perm[row](col/vecSize,row)`
where `perm` is a permutation function indexed by `row` and `vecSize`
is the vector access size in elements (currently assumes 128-bit
vectorized accesses, but this can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized accesses
common to loading matrix multiplication operands to/from shared memory.
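The commit message does not spell out `perm`, so the sketch below fills it in with an assumption: an XOR of the vector index with the row, scaled back to element units. It also assumes `vecsPerRow` is a power of two. It illustrates the general shape of such a swizzle and is not the pass's actual implementation; all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical swizzle: keep the position inside the vector (col % vecSize)
// and permute which vector slot of the row is used. The XOR permutation and
// the power-of-two `vecsPerRow` are assumptions made for this sketch.
inline std::int64_t swizzled_col(std::int64_t row, std::int64_t col,
                                 std::int64_t vecSize,
                                 std::int64_t vecsPerRow) {
  std::int64_t vecIdx = col / vecSize;
  std::int64_t permuted = vecIdx ^ (row % vecsPerRow);
  return col % vecSize + permuted * vecSize;
}
```

With this choice, each row remains a permutation of its own columns, while accesses to the same column in consecutive rows land in different vector slots, which is the property that helps avoid bank conflicts.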
Differential Revision: https://reviews.llvm.org/D127457
This patch adds the instructions `MTVSCR` and `MFVSCR` as not swappable to the
PPCVSXSwapRemoval pass because they are not lane-insensitive. This will prevent
the compiler from optimizing out required swaps when using `lxvd2x` and
`stxvd2x`.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D128062
Splats of the values 0 and -1 as sign-extended 12-bit immediates always produce the same bit pattern regardless of the etype used to perform the operation. As a result, we can sometimes avoid introducing a vsetvli just for the purposes of a splat.
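The bit-pattern claim can be checked on the host with a small sketch. `splat_bytes` is a hypothetical helper that fills a buffer with a sign-extended immediate at a chosen element width; it models the written bytes only, not vector length or any RISC-V instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of a splat: write `imm`, sign-extended to the element
// type T, into every element of a buffer of `bytes` bytes.
template <typename T>
std::vector<unsigned char> splat_bytes(int imm, std::size_t bytes) {
  std::vector<unsigned char> out(bytes);
  T elem = static_cast<T>(imm);
  for (std::size_t off = 0; off + sizeof(T) <= bytes; off += sizeof(T))
    std::memcpy(out.data() + off, &elem, sizeof(T));
  return out;
}
```

For 0 and -1 every byte is 0x00 or 0xFF, so the element width does not matter; for any other immediate, such as 1, the byte layout depends on the width.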
Looking at the diffs, we don't get a huge amount of immediate value out of this. We mostly push the vsetvli one instruction down, usually in front of a vmerge. We also don't get the corresponding fixed length vector cases because VL typically is changed despite the actual bits written being the same. Both of these are areas I plan to explore in future patches.
Interestingly, this makes a great example of why we need the forward and backward implementations to be consistent. Before we merged the demanded field handling, implementing only the forward direction lost the ability to mutate a prior vsetvli and eliminate a later one entirely. This resulted in practical regressions instead of improvements. It's always nice when practice matches theory. :)
Differential Revision: https://reviews.llvm.org/D128006
Dependency scanning does not care about the order of submodules for
correctness, so sort the submodules so that we get the same
command-lines to build the module across different TUs. The order of
inferred submodules can vary depending on the order of #includes in the
including TU.
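The idea of canonicalizing the order can be sketched as follows. The function name and the use of plain strings for submodule names are assumptions made for illustration; the actual change operates on Clang's own module data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: return the submodule names in a canonical (sorted)
// order so that the discovery order in a given TU does not affect anything
// derived from the list.
inline std::vector<std::string>
canonical_submodule_order(std::vector<std::string> names) {
  std::sort(names.begin(), names.end());
  return names;
}
```

Two TUs that include the same headers in a different order then produce the same list, and therefore the same derived command line.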
Differential Revision: https://reviews.llvm.org/D128008
When the mask is a power-of-2 constant and op0 is a shifted-power-of-2
constant, test if the shift amount equals the offset bit index:
(ShiftC << X) & C --> X == (log2(C) - log2(ShiftC)) ? C : 0
(ShiftC >> X) & C --> X == (log2(ShiftC) - log2(C)) ? C : 0
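The two folds can be sanity-checked on 32-bit values with the sketch below. It assumes `ShiftC` and `C` are powers of two and `X < 32`; the function names are hypothetical. When the expected shift amount would be negative, the single set bit can never reach the mask bit, so the result is always 0.

```cpp
#include <cassert>
#include <cstdint>

// log2 of a power-of-two value.
inline int log2_pow2(std::uint32_t v) {
  int n = 0;
  while (v >>= 1)
    ++n;
  return n;
}

// Original forms: shift a power-of-two constant, then mask with C.
inline std::uint32_t shl_and(std::uint32_t shiftC, unsigned x,
                             std::uint32_t c) {
  return (shiftC << x) & c;
}
inline std::uint32_t lshr_and(std::uint32_t shiftC, unsigned x,
                              std::uint32_t c) {
  return (shiftC >> x) & c;
}

// Folded forms: compare the shift amount against the bit-index difference.
inline std::uint32_t shl_and_folded(std::uint32_t shiftC, unsigned x,
                                    std::uint32_t c) {
  int target = log2_pow2(c) - log2_pow2(shiftC);
  return (target >= 0 && x == static_cast<unsigned>(target)) ? c : 0;
}
inline std::uint32_t lshr_and_folded(std::uint32_t shiftC, unsigned x,
                                     std::uint32_t c) {
  int target = log2_pow2(shiftC) - log2_pow2(c);
  return (target >= 0 && x == static_cast<unsigned>(target)) ? c : 0;
}
```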
This is an alternative to D127610 with a more general pattern.
We match only shift+and instead of the trailing xor, so we see a few
more test diffs. I think we discussed this initially in D126617.
Here are proofs for shifts in both directions:
https://alive2.llvm.org/ce/z/CFrLs4
The test diffs look equal or better for IR, and this makes the
patterns more uniform in IR. The backend can partially invert this
in both cases if that is profitable. It is not trivially reversible,
however, so if we find perf regressions that are not easy to undo,
then we may want to revert this.
Differential Revision: https://reviews.llvm.org/D127801
This reverts commit 7aac15d5df.
Only updates the tests, as these statements are still part of the CFG
and it's just the pretty printer policy that changes. Hopefully this
shouldn't affect any analysis.
This patch removes usage of `-mllvm -combiner-global-alias-analysis`
and relies on compiler builtin to implement `memcpy`.
Note that `-mllvm -combiner-global-alias-analysis` is actually only useful for
functions where buffers can alias (namely `memcpy` and `memmove`). The other
memory functions were not benefiting from the flag anyway.
The upside is that the memory functions can now be compiled from source with
thinlto (thinlto would not be able to carry the flag over when doing inlining).
The downside is that for compilers other than clang (i.e. not providing
`__builtin_memcpy_inline`) the codegen may be worse.
Differential Revision: https://reviews.llvm.org/D128051
This patch is part of the upstreaming effort from fir-dev branch.
It also ensures all descriptors created inline comply with the LBOUND
requirement that the lower bound is `1` when the related dimension
extent is zero.
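The LBOUND rule itself is simple enough to sketch. The `Dim` struct and `normalize_dim` function are hypothetical and exist only to illustrate the rule; they do not mirror the descriptor layout used by the lowering code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-dimension record of a descriptor.
struct Dim {
  std::int64_t lower_bound;
  std::int64_t extent;
};

// Apply the rule: a dimension with zero extent reports a lower bound of 1,
// whatever lower bound was requested.
inline Dim normalize_dim(std::int64_t requested_lower, std::int64_t extent) {
  return Dim{extent == 0 ? 1 : requested_lower, extent};
}
```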
Reviewed By: jeanPerier, PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128047
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Simplify logic in `__config` by assuming that we are using Clang in C++03 mode. Also, use standardized feature-test macros instead of compiler-specific checks (like `__has_feature`) in a couple of places.
Reviewed By: ldionne, #libc
Spies: libcxx-commits
Differential Revision: https://reviews.llvm.org/D127606