Copying sequences around as the heap resized is significantly expensive.
This speeds up glrParse by ~35% (2.4 => 3.25 MB/s)
Differential Revision: https://reviews.llvm.org/D128307
All bufferizable ops that bufferize to an allocation receive a `bufferization.escape` attribute during TensorCopyInsertion.
Differential Revision: https://reviews.llvm.org/D128137
While it is indeed an error to use SIZE, SHAPE, or UBOUND on an
assumed-shape dummy argument without also supplying a DIM= argument
to the intrinsic function, it is *not* an error to use these intrinsic
functions on sections or expressions of such arrays. Refine the test
used for the error message.
Differential Revision: https://reviews.llvm.org/D128391
This is a ~5% speedup, we no longer have to allocate the priority queues and
other collections for each reduction step where we use them.
It's also IMO easier to understand the structure of a class with methods vs a
function with nested lambdas.
Differential Revision: https://reviews.llvm.org/D128301
A forward abstract state can be in the special SEWLMULRatioOnly state which means we're not allowed to inspect its fields. The scalar to vector move case was mising a guard, and we'd crash on an assert. Test cases included.
If there are multiple constraints in the same block, at the moment the
order they are processed may be different depending on the sort
implementation.
Use stable_sort to ensure consistent ordering.
The target features are necessary for correctly compiling most programs
in LTO mode. Currently, these are derived in clang at link time and
passed as an arguemnt to the linker wrapper. This is problematic because
it requires knowing the required toolchain at link time, which should
not be necessry. Instead, these features should be embedded into the
offloading binary so we can unify them in the linker wrapper for LTO.
This also required changing the offload packager to interpret multiple
arguments as concatenation with a comma. This is so we can still use the
`,` separator for the argument list.
Depends on D127246
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D127686
When a READ statement reads into a CHARACTER(2 or 4) variable from a
unit whose encoding is not UTF-8, don't copy bytes directly; they must
each be zero-extended.
Differential Revision: https://reviews.llvm.org/D128390
The check for the PAD= setting should examine the mutable modes
of the current I/O statement, not the persistent modes of the
I/O unit.
Differential Revision: https://reviews.llvm.org/D128389
In general we split a reduce into pop/push, so concurrently-available reductions
can run in the correct order. The data structures for this are expensive.
When only one reduction is possible at a time, we need not do this: we can pop
and immediately push instead.
Strictly this is correct whenever we yield one concurrent PushSpec.
This patch recognizes a trivial but common subset of these cases:
- there must be no pending pushes and only one head available to pop
- the head must have only one reduction rule
- the reduction path must be a straight line (no multiple parents)
On my machine this speeds up by 2.12 -> 2.30 MB/s = 8%
Differential Revision: https://reviews.llvm.org/D128299
First, ExternalFileUnit::SetPosition was being used both as a utility
within the class' member functions as well as an API from I/O statement
processing. Make it private, and add APIs for SetStreamPos and SetDirectRec.
Second, ensure that SetStreamPos for POS= positioning in a stream
doesn't leave the current record number and endfile record number
in an arbitrary state. In stream I/O they are used only to manage
end-of-file detection, and shouldn't produce false positive results
from IsAtEnd() after repositioning.
Differential Revision: https://reviews.llvm.org/D128388
If the target has chosen to expand a scalable vector type, BasicTTI tries to scalarize and we'd crash. As a minimum, we should return an invalid cost instead.
The added test provide coverage for the moment, but given they show a number of gaps in RISCV costing, they're likely not to cover this code path long term.
Drop the thread-safe flag and make the locking strategy the
responsibility of the individual log handler.
Previously we got away with a non-thread safe mode because we were using
unbuffered streams that rely on the underlying syscalls/OS for
synchronization. With the introduction of log handlers, we can have
arbitrary logic involved in writing out the logs. With this patch the
log handlers can pick the most appropriate locking strategy for their
particular implementation.
Differential revision: https://reviews.llvm.org/D127922
This patch adds a buffered logging mode to lldb. A buffer size can be
passed to `log enable` with the -b flag. If no buffer size is specified,
logging is unbuffered.
Differential revision: https://reviews.llvm.org/D127986
Character conversion requires memory storage as it operates on a
sequence of code points.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128438
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
* Make Semantics test doconcurrent01.f90 an expected failure pending a fix
for a problem in recognizing a PURE prefix specifier for a specific procedure
that occurs in new intrinsic module source code,
* review update
* review update
* Increase support for intrinsic module procedures
The f18 standard defines 5 intrinsic modules that define varying numbers
of procedures, including several operators:
2 iso_fortran_env
55 ieee_arithmetic
10 ieee_exceptions
0 ieee_features
6 iso_c_binding
There are existing fortran source files for each of these intrinsic modules.
This PR adds generic procedure declarations to these files for procedures
that do not already have them, together with associated specific procedure
declarations. It also adds the capability of recognizing intrinsic module
procedures in lowering code, making it possible to use existing language
intrinsic code generation for intrinsic module procedures for both scalar
and elemental calls. Code can then be generated for intrinsic module
procedures using existing options, including front end folding, direct
inlining, and calls to runtime support routines. Detailed code generation
is provided for several procedures in this PR, with others left to future PRs.
Procedure calls that reach lowering and don't have detailed implementation
support will generate a "not yet implemented" message with a recognizable name.
The generic procedures in these modules may each have as many as 36 specific
procedures. Most specific procedures are generated via macros that generate
type specific interface declarations. These specific declarations provide
detailed argument information for each individual procedure call, similar
to what is done via other means for standard language intrinsics. The
modules only provide interface declarations. There are no procedure
definitions, again in keeping with how language intrinsics are processed.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier, PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128431
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
According to the vector spec, mf8 is not supported for i8 if ELEN
is 32. Similarily mf4 is not suported for i16/f16 or mf2 for i32/f32.
Since RVVBitsPerBlock is 64 and LMUL is calculated as
((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we
need to disable any type with MinNumElements==1.
For generic IR, these types will now be widened in type legalization.
For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan
to work on disabling the intrinsics in the riscv_vector.h header.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D128286
Identical literal folding takes ~1.4% of the time, and was missing
from the trace.
Signature computation still needs ~2.2% of the time, so probably worth
explicitly marking its contribution to "Write output file" (9.1%)
Differential Revision: https://reviews.llvm.org/D128343
This adds the macrofusion plumbing and support fusing LUI+ADDI(W).
This is similar to D73643, but handles a different case. Other cases
can be added in the future.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D128393
IMO this model is simpler to understand (borrowed from the LR0 patch D127357).
It also makes error recovery easier to implement, as we have a simple list of
head nodes lying around to recover from when needed.
(It's not quite as nice as LR0 in this respect though).
It's slightly slower (2.24 -> 2.12 MB/S on my machine = 5%) but nothing close
to as bad as LR0.
However
- I think we'd have to eat a litle performance loss otherwise to implement
error recovery.
- this frees up some complexity budget for optimizations like fastpath push/pop
(this + fastpath is already faster than head)
- I haven't changed the data structure here and it's now pretty dumb, we can
make it faster
Differential Revision: https://reviews.llvm.org/D128297
This is a helper patch to ease the reviewing of D128139.
The originals will be removed at a later time when all formatters are
converted to the new style. (Floating-point and pointer aren't up for
review yet.)
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D128367
Similarly to how undefined symbol diagnostics were changed in D128184,
we now show where in the source file duplicate symbols are defined at:
ld64.lld: error: duplicate symbol: _foo
>> defined in bar.c:42
>> /path/to/bar.o
>> defined in baz.c:1
>> /path/to/libbaz.a(baz.o)
For objects that don't contain DWARF data, the format is unchanged.
A slight difference to undefined symbol diagnostics is that we don't
print the name of the symbol on the third line, as it's already
contained on the first line.
Differential Revision: https://reviews.llvm.org/D128425
waitcnt vmcnt instructions are currently generated in loop bodies before using
values loaded outside of the loop. In some cases, it is better to flush the
vmcnt counter in a loop preheader before entering the loop body. This patch
detects these cases and generates waitcnt instructions to flush the counter.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D115747
This reverts commit 719658d078.
Breaks a few things, see comments on https://reviews.llvm.org/D128437
There's disagreement about the best fix.
So let's keep HEAD green while discussions are happening.
Summary:
When writing the offload binary, we use a SmallVector. We already know
the size that we expect the buffer to take up so we should reserve all
that memory up-front to improve performance. Also this patch adds some
extra sanity checks for the binary format for safety.
A llvm.vscale will always be at least 1, never zero. Teaching that to
isKnownNonZero can help fold away some statically known compares.
Differential Revision: https://reviews.llvm.org/D128217
Fixes#54629.
The crash is is caused by the double template instantiation.
See the added test. Here is what happens:
- Template arguments for the partial specialization get instantiated.
- This causes instantitation into the corrensponding requires
expression.
- `TemplateInsantiator` correctly handles instantiation of parameters
inside `RequiresExprBody` and instantiates the constraint expression
inside the `NestedRequirement`.
- To build the substituted `NestedRequirement`, `TemplateInsantiator`
calls `Sema::BuildNestedRequirement` calls
`CheckConstraintSatisfaction`, which results in another template
instantiation (with empty template arguments). This seem to be an
implementation detail to handle constraint satisfaction and is not
required by the standard.
- The recursive template instantiation tries to find the parameter
inside `RequiresExprBody` and fails with the corresponding assertion.
Note that this only happens as both instantiations happen with the class
partial template specialization set as `Sema.CurContext`, which is
considered a dependent `DeclContext`.
To fix the assertion, avoid doing the recursive template instantiation
and instead evaluate resulting expressions in-place.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D127487
I expect to eliminate this ambiguity at the grammar level by use of guards,
because it interferes with brace-based error recvoery.
Differential Revision: https://reviews.llvm.org/D127400
In GFX11 ShaderType is determined by the hardware and should no longer
be written into bits[3:2] of the ds_ordered_count offset field.
Differential Revision: https://reviews.llvm.org/D128196
MODULE FUNCTION and MODULE SUBROUTINE currently cause lowering crash:
"symbol is not mapped to any IR value" because special care is needed
to handle their interface.
Add a TODO for now.
Example of program that crashed and will hit the TODO:
```
module mod
interface
module subroutine sub
end subroutine
end interface
contains
module subroutine sub
x = 42
end subroutine
end module
```
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128412
Co-authored-by: Jean Perier <jperier@nvidia.com>