The comments in the existing code appear to pre-exist the standardization of the +v extension. In particular, the specification *does* provide a bound on the maximum value VLEN can take. From what I can tell, the LMUL comment was simply a misunderstanding of what this API returns.
This API returns the maximum value that vscale can take at runtime. This is used in the vectorizer to bound the largest scalable VF (e.g. LMUL in RISCV terms) which can be used without violating memory dependence.
Differential Revision: https://reviews.llvm.org/D128538
This patch introduces a new feature that allows InstrBuilder to reuse
mca::Instruction recycled from IncrementalSourceMgr. This significantly
reduces the memory footprint.
Note that we're only recycling instructions that have static InstrDesc
and no variadic operands.
Differential Revision: https://reviews.llvm.org/D127084
The new resumable mca::Pipeline capability introduced in this patch
allows users to save the current state of pipeline and resume from the
very checkpoint.
It is better (but not require) to use with the new IncrementalSourceMgr,
where users can add mca::Instruction incrementally rather than having a
fixed number of instructions ahead-of-time.
Note that we're using unit tests to test these new features. Because
integrating them into the `llvm-mca` tool will make too many churns.
Differential Revision: https://reviews.llvm.org/D127083
`equivalentBoolValues` compares equivalence between two booleans. The current implementation does not consider constraints imposed by flow conditions on the booleans and its subvalues.
Depends On D128520
Reviewed By: gribozavr2, xazax.hun
Differential Revision: https://reviews.llvm.org/D128521
Given a set of `Constraints`, `querySolver` adds common background information across queries (`TrueVal` is always true and `FalseVal` is always false) and passes the query to the solver.
`checkUnsatisfiable` is a simple wrapper around `querySolver` for checking that the solver returns an unsatisfiable result.
Depends On D128519
Reviewed By: gribozavr2, xazax.hun
Differential Revision: https://reviews.llvm.org/D128520
Just force the aarch64 target compilation (after making sure the test
only runs if that target is available).
Because global metadata isn't target-specific, just selecting a target
here is fine.
Should fix https://reviews.llvm.org/D127544#3609312
Fix-forward for https://reviews.llvm.org/D127544#3609312
IR pass has some target-specific inline asm lowering that check-fails
for non-x86 non-aarch64 targets.
For now, just run these tests only on those targets.
Summary:
Patch D123580 changed to use bit fields for strings in long and short mode. As a result, this changes the layout of these strings on AIX because bit fields on AIX are 4 bytes, which breaks the ABI compatibility with earlier strings before the change on AIX. This patch uses the attribute 'packed' and anonymous structure to make string layout compatible. This patch will also make test cases alignof.compile.pass.cpp and sizeof.compile.pass.cpp introduced in D127672 pass on AIX.
Reviewed by: philnik, Mordante, hubert.reinterpretcast, libc++
Differential Revision: https://reviews.llvm.org/D128285
To keep functionality of creating boolean expressions in a consistent location.
Depends On D128357
Reviewed By: gribozavr2, sgatev, xazax.hun
Differential Revision: https://reviews.llvm.org/D128519
This is a implementation of find remainder fmod function from standard libm.
The underline algorithm is developed by myself, but probably it was first
invented before.
Some features of the implementation:
1. The code is written on more-or-less modern C++.
2. One general implementation for both float and double precision numbers.
3. Spitted platform/architecture dependent and independent code and tests.
4. Tests covers 100% of the code for both float and double numbers. Tests cases with NaN/Inf etc is copied from glibc.
5. The new implementation in general 2-4 times faster for “regular” x,y values. It can be 20 times faster for x/y huge value, but can also be 2 times slower for double denormalized range (according to perf tests provided).
6. Two different implementation of division loop are provided. In some platforms division can be very time consuming operation. Depend on platform it can be 3-10 times slower than multiplication.
Performance tests:
The test is based on core-math project (https://gitlab.inria.fr/core-math/core-math). By Tue Ly suggestion I took hypot function and use it as template for fmod. Preserving all test cases.
`./check.sh <--special|--worst> fmodf` passed.
`CORE_MATH_PERF_MODE=rdtsc ./perf.sh fmodf` results are
```
GNU libc version: 2.35
GNU libc release: stable
21.166 <-- FPU
51.031 <-- current glibc
37.659 <-- this fmod version.
```
getRealMaxVLen returns an upper bound on the value of VLEN. We can use this upper bound (which unless explicitly set at command line is going to result in a e8 MaxVLMax of much greater than 256) instead of explicitly handling the unknown case separately from the bounded by number greater than 256 case.
Note as well that this code already implicitly depends on a capped value for VLEN. If infinite VLEN were possible, than 16 bit indices wouldn't be enough.
A flow condition is represented with an atomic boolean token, and it is bound to a set of constraints: `(FC <=> C1 ^ C2 ^ ...)`. \
This was internally represented as `(FC v !C1 v !C2 v ...) ^ (C1 v !FC) ^ (C2 v !FC) ^ ...` and tracked by 2 maps:
- `FlowConditionFirstConjunct` stores the first conjunct `(FC v !C1 v !C2 v ...)`
- `FlowConditionRemainingConjuncts` stores the remaining conjuncts `(C1 v !FC) ^ (C2 v !FC) ^ ...`
This patch simplifies the tracking of the constraints by using a single `FlowConditionConstraints` map which stores `(C1 ^ C2 ^ ...)`, eliminating the use of two maps.
Reviewed By: gribozavr2, sgatev, xazax.hun
Differential Revision: https://reviews.llvm.org/D128357
DWARF 5 added two new attributes DW_AT_call_pc and DW_AT_call_return_pc.
Adding support for them.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D128526
When creating a scf.for without argument a scf.yield is automatically
created. Make sure we don't create a second one.
Differential Revision: https://reviews.llvm.org/D128405
This doesn't change behavior, it just makes it slightly more obvious what's
going on. Note that getRealMinVLen is always >= getMinRVVVectorSizeInBits.
The first case is a bit tricky, as you have to know that
getMinRVVVectorSizeInBits returns 0 when not set, and thus is equivalent
to the else value clause. The new code structure makes it more obvious we
return 0 unless using RVV for fixed length vectors.
Lower the `parallel loop` contrsuct and refactor some of the code
of parallel and loop lowering to be reused.
Also add tests for loop and parallel since they were not upstreamed.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128510
In merge FSOURCE and TSOURCE must have the same Fortran dynamic types,
but this does not imply that FSOURCE and TSOURCE will be lowered to the
same MLIR types. For instance, TSOURCE may be a character expression
with a compile type constant length (!fir.char<1,4>) while FSOURCE may
have dynamic length (!fir.char<1,?>).
Cast FSOURCE to TSOURCE MLIR types to handle these cases.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128507
Co-authored-by: Jean Perier <jperier@nvidia.com>
Currently, `__attribute__((no_sanitize('hwaddress')))` is not possible. Add this piece of plumbing, and now that we properly support copying attributes between an old and a new global variable, add a regression test for the GlobalOpt bug that previously lost the attribute.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D127544
Globals that shouldn't be sanitized are currently communicated to HWASan
through the use of the llvm.asan.globals IR metadata. Now that we have
an on-GV attribute, use it.
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D127543
Explicitly map host associated symbols in DoConcurrent with shared
locality-spec, clauses in OpenMP/OpenACC. The mapping of host-assoc
symbols is set to their parent SymbolBox. This is achieved through
a new interface function in the AbstractConverter.
This was already upstream for OpenMP.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128518
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
LBOUND with a non constant DIM argument use the runtime to allow runtime
verification of DIM <= RANK. The interface uses a descriptor. This caused
undefined behavior because the runtime believed it was seeing an explicit
shape arrays with zero extent and returned `1` (the runtime descriptor
does not allow making a difference between an explicit shape and an
assumed size. Assumed size are not meant to be described by runtime
descriptors).
Fix the issue by setting the last extent of assumed size to `1` when
creating the descriptor to inquire about the LBOUND with the runtime.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D128509
Co-authored-by: Jean Perier <jperier@nvidia.com>
We currently have a costing bug around the etype == ELEN case, so add otherwise duplicate tests to show test diffs as I work on other parts of costing.
The requirements for "thread until <line number>" are:
a) If any code contributed by <line number> or the nearest subsequent of <line number> is executed before leaving the function, stop
b) If you end up leaving the function w/o triggering (a), then stop
In case of (a), since the <line number> may have multiple entries in the line table and the compiler might have scheduled/moved the relevant code across, and the lldb does not know the control flow, set breakpoints on all the line table entries of best match of <line number> i.e. exact or the nearest subsequent line.
Along with the above, currently, CommandObjectThreadUntil is also setting the breakpoints on all the subsequent line numbers after the best match and this latter part is wrong.
This issue is discussed at http://lists.llvm.org/pipermail/lldb-dev/2018-August/013979.html.
In fact, currently `TestStepUntil.py` is not actually testing step until scenarios and `test_missing_one` test fails without this patch if tests are made to run. Fixed the test as well.
Reviewed By: jingham
Differential Revision: https://reviews.llvm.org/D50304
Putting some direct use restrictions on tensor allocations in the
sparse case enables the use of simplifying assumptions in the
bufferization analysis.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D128463
Because the diagnostic events are processed by the default event handler
in its own thread, tests cannot rely on output ordering. Split stdout
and stderr to make the test reliable again.
Add a system log handler that emits log messages to the operating system
log. In addition to the log handler itself, this patch also introduces a
new Host::SystemLog helper function to abstract over writing to the
system log.
Differential revision: https://reviews.llvm.org/D128321
This patch gives basic parsing and semantic support for "masked taskloop"
construct introduced in OpenMP 5.1 (section 2.16.7)
Differential Revision: https://reviews.llvm.org/D128478
This fixes a bug in clang where it emits the following diagnostic when
compiling the test case:
"argument to 'sizeof' in 'memset' call is the same pointer type 'S' as
the destination"
The code that merges __auto_type with other types was committed in
https://reviews.llvm.org/D122029.
Differential Revision: https://reviews.llvm.org/D128373
As it exists today, Host::SystemLog is used exclusively for error
reporting. With the introduction of diagnostic events, we have a better
way of reporting those. Instead of printing directly to stderr, these
messages now get printed to the debugger's error stream (when using the
default event handler). Alternatively, if someone is listening for these
events, they can decide how to display them, for example in the context
of an IDE such as Xcode.
This change also means we no longer write these messages to the system
log on Darwin. As far as I know, nobody is relying on this, but I think
this is something we could add to the diagnostic event mechanism.
Differential revision: https://reviews.llvm.org/D128480
Improved/fixed cost modeling for shuffles by providing masks, improved
cost model for non-identity insertelements.
Differential Revision: https://reviews.llvm.org/D115462
The result of pointer subtraction is of type ptrdiff_t, which is not necessarily the same underlying type as ssize_t. This can lead to a compilation error since std::min requires both parameters to be the same type.
Fixes: https://github.com/llvm/llvm-project/issues/54846
Reviewed By: alexander-shaposhnikov, drodriguez, jhenderson
Differential Revision: https://reviews.llvm.org/D128117