This patch moves the logic to clone and check a new chunk into a new
function, to allow re-use in a follow-up patch that implements parallel
reductions.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D113856
Calls to MMA builtins that take pointer to void
do not accept other pointers/arrays whereas normal
functions with the same parameter do. This patch
allows MMA built-ins to accept non-void pointers
and arrays.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D113306
Precisely specifying the unused parts of the bitfield is critical for
performance. If we don't specify them, compiler will generate code to load
the old value and shuffle it to extract the unused bits to apply to the new
value. If we specify the unused part and store 0 in there, all that
unnecessary code goes away (store of the 0 const is combined with other
constant parts).
I don't see a good way to ensure we cover all of u64 bits with fields.
So at least introduce named kUnusedBits consts and check that bits
sum up to 64.
Depends on D113978.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D113979
In the old runtime we used to use different number of trace parts
for C++ and Go to reduce trace memory consumption for Go.
But now it's easier and better to use smaller parts because
we already use minimal possible number of parts for C++ (3).
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D113978
Places `format_to_n_result` to its own file. While working on D112361 it
turns out the type will be used outside the format header.
Reviewed By: #libc, Quuxplusone, Mordante
Differential Revision: https://reviews.llvm.org/D113831
Limit the backtracking along def-use chains when a prefix is encountered as it would generate incorrect foldings.
Differential Revision: https://reviews.llvm.org/D113975
Global ctor/dtor can be an empty array, which is a Constant not a
ConstantArray. The cast<ConstantArray> therefore asserts / crashes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D113800
No need to count the final shuffle cost for the constants, gathering of
the constants is just a constant vector + extra inserts, if required.
Differential Revision: https://reviews.llvm.org/D113770
This test explicitly checks for .file directives, which is not currently supported on AIX. This patch sets this test to XFAIL on AIX for now.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D113640
This is an NFC diff that prepares for pruning & relocating `__eh_frame`.
Along the way, I made the following changes to ...
* clarify usage of `section` vs. `subsection`
* remove `map` & `vec` from type names
* disambiguate class `Section` from template parameter `SectionHeader`.
Differential Revision: https://reviews.llvm.org/D113241
1. To avoid overwriting the part of the record read in the non advancing read,
the furtherPositionInRecord field must be set to the max of the
furtherPositionInRecord and the positionInRecord at the beginning of the
IO write.
2. To allow any further read to succeed after the write, the unit
beganReadingRecord_ must be set to false when resetting the recordLength
during the write, otherwise, recordLength will not be computed in further
read and an assert is hit (at unit.cpp(398)).
The added unit test exercises both of these scenarios.
Differential Revision: https://reviews.llvm.org/D113740
This patch fixes the warnings which shows up when libcxx library started to be compiled in 32-bit mode on z/OS.
More specifically, the assignment from unsigned int to time_t aka long was flags as follows:
```
libcxx/include/c++/v1/__support/ibm/nanosleep.h:31:11: warning: implicit conversion changes signedness: 'unsigned int' to 'time_t' (aka 'long') [-Wsign-conversion]
__sec = sleep(static_cast<unsigned int>(__sec));
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
libcxx/include/c++/v1/__support/ibm/nanosleep.h:36:36: warning: implicit conversion changes signedness: 'unsigned int' to 'long' [-Wsign-conversion]
__rem->tv_nsec = __micro_sec * 1000;
~ ~~~~~~~~~~~~^~~~~~
libcxx/include/c++/v1/__support/ibm/nanosleep.h:47:36: warning: implicit conversion changes signedness: 'unsigned int' to 'long' [-Wsign-conversion]
__rem->tv_nsec = __micro_sec * 1000;
~ ~~~~~~~~~~~~^~~~~~
3 warnings generated.
```
Here is a small test case illustrating the issue:
```
typedef long time_t ;
unsigned int sleep(unsigned int );
int main() {
time_t sec = 0;
#ifdef FIX
sec = static_cast<time_t>(sleep(static_cast<unsigned int>(sec)));
#else
sec = sleep(static_cast<unsigned int>(sec));
#endif
}
```
clang++ -c -Wsign-conversion -m32 t.C
```
t.C:8:9: warning: implicit conversion changes signedness: 'unsigned int' to 'time_t' (aka 'long') [-Wsign-conversion]
sec = sleep(static_cast<unsigned int>(sec));
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reviewed By: ldionne, #libc, Quuxplusone, Mordante
Differential Revision: https://reviews.llvm.org/D112837
This patch extends the `FIRToLLVMLowering` pass in Flang by adding a
hook to transform `fir.boxchar_len` to a sequence of LLVM MLIR
instructions.
This is part of the upstreaming effort from the `fir-dev` branch in [1].
[1] https://github.com/flang-compiler/f18-llvm-project
Differential Revision: https://reviews.llvm.org/D113763
Originally written by:
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Need to adjust the types of GEPs indices when building the tree
entries/operands. Otherwise some of the nodes might differ and
vectorizer is unable to correctly find them and count their cost.
Differential Revision: https://reviews.llvm.org/D113792
Add conversion pattern for the GenTypeDescOp.
Convert to a global constant with an addressof.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D113766
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
The RAII class used for debugging RTL entry used a shared variable to
keep track of the current depth. This used a global initializer, which
isn't supported on AMDGPU. This patch removes the initializer and
instead sets it to zero when the state is initialized in the runtime.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D113963
Cleaning up unused "using" declarations.
This patch was generated from automatically applyning clang-tidy fixes.
Differential Revision: https://reviews.llvm.org/D113891
rGf39978b84f1d3a1da6c32db48f64c8daae64b3ad led to and/or exposed
an issue with IndVarSimplification for a loop where a i32 phi node is
no longer replaced by a widened (i64) phi node, because the SCEVs of a
sign-extend no longer folded the same way. I'm unsure how to properly
explain this because it's all rather complicated, but in short: SCEVs
don't fold as nicely as they used to and this caused a difference.
While investigating this, I found that IndVarSimplify can actually
optimise the case in the way we want to if it chooses the widened IV to
be 'signed' (the i32 IV is both sign and zero-extended). Oddly enough,
there is some level of indeterminism in the way the algorithm works,
it just picks the sign of the 'first' zext/sext user, where the order of
the users-iterator is not guaranteed to be the same on each invocation
of the pass (e.g. shown by first running loop-rotate, which puts the
users in a different order).
While I think the fix is valid in the sense that consistently picking
_any_ order is better than having an nondeterministic order, I can
use a bit of advice from people more familiar in this area of the
code-base.
For example, I'm not sure if this fix is hiding another issue where the
IndVarSimplify pass could actually draw the same conclusions (i.e. that
it only needs an i64 phi node) if it does a bit more work, regardless
of whether it chooses the induction variable to be signed or unsigned.
I'm also not sure if choosing signed is better than unsigned, or whether
that just happens to be beneficial only in this individual case.
Any feedback would be much appreciated!
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D112573
Textual LLVM IR files are much bigger and take longer to write to disk.
To avoid the extra cost incurred by serializing to text, this patch adds
an option to save temporary files as bitcode instead.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D113858
This patch adds the codegen for fir.cmpc. The real and imaginary parts
are extracted and compared separately. For the .EQ. predicate the
results are AND'd, for the .NE. predicate the results are OR'd, and for
other predicates we keep only the result on the real parts.
This patch is part of the upstreaming effort from fir-dev.
Differential Revision: https://reviews.llvm.org/D113976
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Fixes PR#52400. The tests for bugprone-throw-keyword-missing actually
already contain exceptions as class members, but not as members with
initializers, which was probably just an oversight.
Module resolution is probably the most complex piece of lldb [citation
needed], with numerous levels of abstraction, each one implementing
various retry and fallback strategies.
It is also a very repetitive, with only small differences between
"host", "remote-and-connected" and "remote-but-not-(yet)-connected"
scenarios.
The goal of this patch (first in series) is to reduce the number of
abstractions, and deduplicate the code.
One of the reasons for this complexity is the tension between the desire
to offload the process of module resolution to the remote platform
instance (that's how most other platform methods work), and the desire
to keep it local to the outer platform class (its easier to subclass the
outer class, and it generally makes more sense).
This patch resolves that conflict in favour of doing everything in the
outer class. The gdb-remote (our only remote platform) implementation of
ResolveExecutable was not doing anything gdb-specific, and was rather
similar to the other implementations of that method (any divergence is
most likely the result of fixes not being applied everywhere rather than
intentional).
It does this by excising the remote platform out of the resolution
codepath. The gdb-remote implementation of ResolveExecutable is moved to
Platform::ResolveRemoteExecutable, and the (only) call site is
redirected to that. On its own, this does not achieve (much), but it
creates new opportunities for layer peeling and code sharing, since all
of the code now lives closer together.
Differential Revision: https://reviews.llvm.org/D113487
So far, applying loop guard information has been restricted to
SCEVUnknown. In a few cases, like PR40961 and PR52464, this leads to
SCEV failing to determine tight upper bounds for the backedge taken
count.
This patch adjusts SCEVLoopGuardRewriter and applyLoopGuards to support
re-writing ZExt expressions.
This is a first step towards fixing PR40961 and PR52464.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D113577
InstCombine AArch64 LD1/ST1 to llvm.masked.load/llvm.masked.store
and LD1/ST1 to load/store when a ptrue all predicate pattern operand
is present.
This allows existing IR optimizations such as dead-load removal to
occur.
Differential Revision: https://reviews.llvm.org/D113489
The GetSupportedArchitectureAtIndex pattern forces the use of
complicated patterns in both the implementations of the function and in
the various callers.
This patch creates a new method (GetSupportedArchitectures), which
returns a list (vector) of architectures. The
GetSupportedArchitectureAtIndex is kept in order to enable incremental
rollout. Base Platform class contains implementations of both of these
methods, using the other method as the source of truth. Platforms
without infinite stacks should implement at least one of them.
This patch also ports Linux, FreeBSD and NetBSD platforms to the new
API. A new helper function (CreateArchList) is added to simplify the
common task of creating a list of ArchSpecs with the same OS but
different architectures.
Differential Revision: https://reviews.llvm.org/D113608
This infrastructure has proven proven its worth, so give it a more
prominent place.
My immediate motivation for this is the desire to reuse this
infrastructure for qemu platform testing, but I believe this move makes
sense independently of that. Moving this code to the packages tree will
allow as to add more structure to the gdb client tests -- currently they
are all crammed into the same test folder as that was the only way they
could access this code.
I'm splitting the code into two parts while moving it. The first once
contains just the generic gdb protocol wrappers, while the other one
contains the unit test glue. The reason for that is that for qemu
testing, I need to run the gdb code in a separate process, so I will
only be using the first part there.
Differential Revision: https://reviews.llvm.org/D113893
* It works similar to scf.for coversion, but convert condition and yield ops as part of scf.whille pattern so it don't need to maintain external state
Differential Revision: https://reviews.llvm.org/D113007
For the semi affine expressions, whenever rhs of a floordiv, ceildiv, mod
or product expression is a symbolic expression, we introduce a local variable
representing the result, and store the floordiv/ceildiv, mod or product
affine expression in LocalExprs. In this way the expression is flattened,
and trivial addition and subtraction related simplifications are performed.
Also rule based matching for detecting and transforming "expr - q * (expr floordiv q)"
to "expr mod q", where q is a symbolic exxpression, in simplifyAdd function.
Differential Revision: https://reviews.llvm.org/D112808
This patch extends the existing functionality of computing an explicit
representation for local variables, to also get the explicit representation,
instead of only the inequality pairs.
This is required for a future patch to remove redundant local ids based on
their explicit representation.
Reviewed By: arjunp
Differential Revision: https://reviews.llvm.org/D113814