Profiling a basic internal real input read benchmark shows some
hot spots in the code used to prepare input for decimal-to-binary
conversion, which is of course where the time should be spent.
The library that implements decimal to/from binary conversions has
been optimized, but not the code in the Fortran runtime that calls it,
and there are some obvious lightweight changes worth making here.
Move some member functions from *.cpp files into the class definitions
of Descriptor and IoStatementState to enable inlining and specialization.
Make GetNextInputBytes() the new basic input API within the
runtime, replacing GetCurrentChar() -- which is now rewritten in terms
of GetNextInputBytes() -- so that input routines can acquire more than
one input character at a time and amortize per-call overhead.
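As a rough sketch of the new API's shape (hypothetical, simplified types
and member functions, not the runtime's actual declarations):

  #include <cstddef>
  #include <cstdio>

  // Toy stand-in for an input connection's buffered state.
  struct InputBuffer {
    const char *data;
    std::size_t length;
    std::size_t position;

    // Returns a pointer to up to 'n' contiguous pending bytes and reports
    // in 'got' how many are actually available, so callers can scan several
    // characters per call and amortize the per-call overhead.
    const char *GetNextInputBytes(std::size_t n, std::size_t &got) {
      std::size_t remaining{length - position};
      got = n < remaining ? n : remaining;
      return data + position;
    }

    // The old single-character primitive can be layered on top of it.
    bool GetCurrentChar(char &ch) {
      std::size_t got{0};
      const char *p{GetNextInputBytes(1, got)};
      if (got == 0) {
        return false; // no pending input
      }
      ch = *p;
      return true;
    }

    void Advance(std::size_t bytes) { position += bytes; }
  };

  int main() {
    InputBuffer in{"3.14 2.718", 10, 0};
    std::size_t got{0};
    const char *p{in.GetNextInputBytes(4, got)};
    std::printf("got %zu bytes starting with '%c'\n", got, p[0]);
  }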
These changes reduce the time to read 1M random reals
using internal I/O from a character array from 1.29s to 0.54s
on my machine, which is on par with Intel Fortran and much faster
than GNU Fortran.
Differential Revision: https://reviews.llvm.org/D113697
The algorithm for Fw.d output will drive binary to decimal conversion for
an initial fixed number of digits, then adjust that number based on the
result's exponent. For values close to a power of ten, this adjustment
process wouldn't terminate; e.g., formatting 9.999 as F10.2 would start
with 1e2, boost the digits to 2, get 9.99e1, decrease the digits, and loop.
Solve the problem by refusing to boost the digit count a second time.
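A minimal model of the digit-adjustment loop and of the one-time-boost fix,
using a hypothetical snprintf-based stand-in for the real binary->decimal
conversion (this only models the digit-count selection, not the actual
Fw.d editing):

  #include <cmath>
  #include <cstdio>
  #include <cstdlib>
  #include <cstring>

  struct Decimal {
    int exponent; // decimal exponent of the leading digit after rounding
  };

  // Hypothetical stand-in: round x to 'digits' significant decimal digits.
  Decimal ConvertToDecimal(double x, int digits) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*e", digits - 1, std::fabs(x));
    return {std::atoi(std::strchr(buf, 'e') + 1)};
  }

  // For 9.999 under F10.2: the first conversion rounds up to the next
  // decade, so the digit count is boosted; reconverting with more digits
  // no longer rounds up, so the count shrinks again, and without the
  // 'boosted' flag the two adjustments would alternate forever.
  int ChooseDigits(double x, int fracDigits) {
    int digits{1 + fracDigits};
    bool boosted{false};
    for (;;) {
      Decimal d{ConvertToDecimal(x, digits)};
      int needed{d.exponent + 1 + fracDigits};
      if (needed > digits && !boosted) {
        digits = needed;
        boosted = true; // the fix: never boost a second time
      } else if (needed > 0 && needed < digits) {
        digits = needed;
      } else {
        return digits;
      }
    }
  }

  int main() {
    std::printf("digits chosen for 9.999 under F10.2: %d\n",
        ChooseDigits(9.999, 2));
  }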
Differential Revision: https://reviews.llvm.org/D107490
A recent patch exposed an assumption that "long double" is (at least)
an 80-bit floating-point type, which of course it is not in MSVC.
Also get it right for non-x87 floating-point.
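For reference, the host "long double" width can be checked portably with
LDBL_MANT_DIG (an illustrative sketch, not the contents of the new
Common/long-double header):

  #include <cfloat>
  #include <cstdio>

  int main() {
  #if LDBL_MANT_DIG == 64
    std::puts("long double is the x87 80-bit extended format");
  #elif LDBL_MANT_DIG == 113
    std::puts("long double is IEEE binary128");
  #else
    // e.g. MSVC, where long double is the same as double (53 bits),
    // or PowerPC double-double (106 bits)
    std::printf("long double has %d significand bits\n", LDBL_MANT_DIG);
  #endif
  }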
Add runtime APIs, implementations, and tests for ALL, ANY, COUNT,
MAXLOC, MAXVAL, MINLOC, MINVAL, PRODUCT, and SUM reduction
transformational intrinsic functions for all relevant argument
and result types and kinds, both without DIM= arguments
(total reductions) and with (partial reductions).
Complex-valued reductions have their APIs in C so that
C's _Complex types can be used for their results.
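To make the total vs. DIM= distinction concrete, here is a toy sketch over
a hard-coded 2-D array of doubles (hypothetical helpers, not the runtime's
descriptor-based interface):

  #include <cstdio>
  #include <vector>

  // Toy stand-in for a rank-2 descriptor; column-major like Fortran.
  struct Array2D {
    std::vector<double> data;
    int rows, cols;
    double at(int i, int j) const { return data[i + j * rows]; } // 0-based
  };

  // Total reduction: no DIM=, the whole array collapses to one scalar.
  double SumTotal(const Array2D &a) {
    double s{0.0};
    for (double x : a.data) {
      s += x;
    }
    return s;
  }

  // Partial reduction: DIM=1 removes the first dimension, so SUM(a, DIM=1)
  // of an (m x n) array yields a rank-1 result with n elements.
  std::vector<double> SumDim1(const Array2D &a) {
    std::vector<double> result(a.cols, 0.0);
    for (int j{0}; j < a.cols; ++j) {
      for (int i{0}; i < a.rows; ++i) {
        result[j] += a.at(i, j);
      }
    }
    return result;
  }

  int main() {
    Array2D a{{1, 2, 3, 4, 5, 6}, 2, 3}; // columns (1,2), (3,4), (5,6)
    std::printf("SUM(a) = %g\n", SumTotal(a)); // 21
    std::vector<double> byColumn = SumDim1(a);
    std::printf("SUM(a,DIM=1) = %g %g %g\n", byColumn[0], byColumn[1],
        byColumn[2]); // 3 7 11
  }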
Some infrastructure work was also necessary or noticed:
* Usage of "long double" in the compiler was cleaned up a
bit, and host dependences on x86 / MSVC have been isolated
in a new Common/long-double header.
* Character comparison has been exposed via an extern template
so that reductions could use it.
* Mappings from Fortran type category/kind to host C++ types
and vice versa have been isolated into runtime/cpp-type.h and
then used throughout the runtime as appropriate (see the sketch
after this list).
* The portable 128-bit integer package in Common/uint128.h
was generalized to support signed comparisons.
* Bugs in descriptor indexing code were fixed.
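A minimal sketch of the category/kind-to-C++-type mapping mentioned in the
list above (illustrative only, loosely modeled on what such a header might
contain):

  #include <cstdint>
  #include <cstdio>
  #include <type_traits>

  enum class TypeCategory { Integer, Real }; // toy subset

  // Primary template left undefined so that unsupported combinations
  // fail to compile.
  template <TypeCategory CAT, int KIND> struct CppType;
  template <> struct CppType<TypeCategory::Integer, 4> {
    using type = std::int32_t;
  };
  template <> struct CppType<TypeCategory::Integer, 8> {
    using type = std::int64_t;
  };
  template <> struct CppType<TypeCategory::Real, 4> { using type = float; };
  template <> struct CppType<TypeCategory::Real, 8> { using type = double; };

  template <TypeCategory CAT, int KIND>
  using CppTypeFor = typename CppType<CAT, KIND>::type;

  // Reduction code can then be written once and instantiated per type/kind.
  static_assert(std::is_same_v<CppTypeFor<TypeCategory::Real, 8>, double>);

  int main() {
    CppTypeFor<TypeCategory::Integer, 8> big{1};
    std::printf("sizeof integer(8) element: %zu\n", sizeof big); // 8
  }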
Differential Revision: https://reviews.llvm.org/D99666
Tweak binary->decimal conversions to avoid an integer multiplication
in a hot loop, improving readability and gaining a minor (~5%) speed-up.
Use native integer division by constants for more readability, too,
since current build compilers seem to optimize it correctly now.
Delete the now needless temporary work-around facility in
Common/unsigned-const-division.h.
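For illustration (hypothetical helper, not the actual conversion code),
dividing by a literal constant lets the compiler do the strength reduction
itself, turning the division into a multiply-high-and-shift sequence:

  #include <cstdint>
  #include <cstdio>

  // Each element of a big-radix decimal value holds eight decimal digits
  // (radix 10**8); emit them with constant divisors.
  void EmitElement(char out[9], std::uint32_t element) {
    out[8] = '\0';
    for (int i{7}; i >= 0; --i) {
      out[i] = static_cast<char>('0' + element % 10u); // constant divisor
      element /= 10u;                                  // constant divisor
    }
  }

  int main() {
    char buffer[9];
    EmitElement(buffer, 1234567u);
    std::printf("%s\n", buffer); // prints 01234567
  }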
Differential Revision: https://reviews.llvm.org/D88604
Use short division of big-radix values by powers of two when
converting values with negative unbiased exponents rather than
multiplication by smaller powers of five; this reduces the overall
outer iteration count. This change is a win across the entire range
of inputs.
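A sketch of short division of a radix-10**8 value by a power of two
(illustrative only; the real code also has to fold the remainder bits back
into additional digits and choose the shift per pass):

  #include <cstdint>
  #include <cstdio>
  #include <vector>

  using BigDecimal = std::vector<std::uint32_t>; // little-endian, radix 10**8
  constexpr std::uint64_t radix{100000000u};

  // Divides 'value' by 2**shift in place and returns the remainder; one
  // pass handles shifts up to ~37 here, since remainder*radix must fit
  // in 64 bits.
  std::uint64_t DivideByPowerOfTwo(BigDecimal &value, int shift) {
    std::uint64_t remainder{0};
    for (auto it = value.rbegin(); it != value.rend(); ++it) { // MSD first
      std::uint64_t numerator{remainder * radix + *it};
      *it = static_cast<std::uint32_t>(numerator >> shift);
      remainder = numerator & ((std::uint64_t{1} << shift) - 1);
    }
    return remainder;
  }

  int main() {
    BigDecimal v{0, 1}; // represents 1 * 10**8
    std::uint64_t rem{DivideByPowerOfTwo(v, 5)}; // divide by 32
    std::printf("high=%u low=%08u rem=%llu\n", v[1], v[0],
        static_cast<unsigned long long>(rem));
    // prints: high=0 low=03125000 rem=0  (100000000 / 32 == 3125000)
  }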
Reviewed By: tskeith
Differential Revision: https://reviews.llvm.org/D83806
There were dependences upon LLVM libraries in the Fortran
runtime support library binaries due to some indirect #includes
of llvm/Support/raw_ostream.h, which caused some kind of internal
ABI version consistency checking to get pulled in. Fixed by
cleaning up some includes.
Reviewed By: tskeith, PeteSteinfeld, sscalpone
Differential Revision: https://reviews.llvm.org/D83060
Summary: Apply the workaround done in D81179 only for GCC < 8. As @klausler mentioned in D81179, we want to avoid additional checks for other compilers that do not need them.
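For illustration only (a hypothetical guard, not the literal change),
gating a workaround on the GCC version looks like this; Clang also defines
__GNUC__, so it is excluded explicitly:

  #include <cstdio>

  #if defined(__GNUC__) && !defined(__clang__) && __GNUC__ < 8
  constexpr bool useWorkaround{true};  // old GCC needs the extra checks
  #else
  constexpr bool useWorkaround{false}; // other compilers skip them
  #endif

  int main() {
    std::printf("workaround enabled: %s\n", useWorkaround ? "yes" : "no");
  }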
Reviewers: DavidTruby, klausler, jdoerfert, sscalpone
Reviewed By: klausler, sscalpone
Subscribers: llvm-commits, tskeith, isuruf, klausler
Tags: #flang, #llvm
Differential Revision: https://reviews.llvm.org/D81208
Summary:
Fix decimal formatting of 80-bit x87 values; the calculation of nearest neighbor values failed to account for the explicit most significant bit in that format.
Replace MultiplyByRounded with MultiplyBy in binary->decimal conversions,
since rounding won't happen and the name was misleading; then remove
dead code, and migrate LoseLeastSignificantDigit() from one source file
to another where it's still needed.
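For context, the 80-bit x87 format stores its leading significand bit
explicitly, unlike IEEE binary32/binary64 where it is implicit, so code
that derives nearest-neighbor values from the raw significand field must
treat that top bit differently (an illustrative sketch, not the patched
code):

  #include <cfloat>
  #include <cstdio>

  int main() {
    constexpr int binary32Field{23}; // +1 implicit bit -> 24 significant bits
    constexpr int binary64Field{52}; // +1 implicit bit -> 53 significant bits
    constexpr int x87Field{64};      // explicit top bit -> 64 significant bits
    std::printf("binary32=%d binary64=%d x87=%d significant bits\n",
        binary32Field + 1, binary64Field + 1, x87Field);
    std::printf("this host's long double: %d significand bits\n",
        LDBL_MANT_DIG);
  }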
Reviewers: tskeith, sscalpone, jdoerfert, DavidTruby
Reviewed By: tskeith
Subscribers: llvm-commits, flang-commits
Tags: #flang, #llvm
Differential Revision: https://reviews.llvm.org/D79345
In general all the basic functionality seems to work; the new configuration
removes some redundancy and some of the more complicated features in favor of
borrowing infrastructure from LLVM build configurations. Here's a quick
summary of details and remaining issues:
* Testing has spanned Ubuntu 18.04 & 19.10, CentOS 7, RHEL 8, and
MacOS/darwin. Architectures include x86_64 and Arm. Without
access to Windows, nothing has been tested there yet.
* As we change file and directory naming schemes (i.e.,
capitalization) some odd things can occur on MacOS systems with
case preserving but not case sensitive file system configurations.
This can be painful and certainly something to watch out for as
any such changes continue.
* Testing infrastructure still needs to be tuned up and worked on.
Note that there do appear to be cases of some tests hanging (on
MacOS in particular). They appear unrelated to the build
process.
* Shared library configurations need testing (and probably fixing).
* Tested both standalone and 'in-mono-repo' builds. Supporting the
mono-repo builds will require LLVM-level changes that should be
straightforward when the time comes.
* The configuration contains a work-around for LLVM's C++ standard mode
passing down into Flang/F18 builds (i.e., the LLVM CMake configuration would
force a -std=c++11 flag to show up in command line arguments). The
current configuration removes that automatically and is stricter in
following newer CMake guidelines for enforcing C++17 mode across all the
CMake files.
* Cleaned up a lot of repetition in the command line arguments. It
is likely that more work is still needed both to allow for
customization and to work around CMake defaults (or those
inherited from LLVM's configuration files). On some platforms aggressive
optimization flags (e.g. -O3) can actually break builds due to the inlining
of templates in .cpp source files that are then no longer available for
uses outside those source files (this shows up as link errors; see the
sketch after this list). Sticking at -O2 appears to fix this. Currently this
CMake configuration forces -O2 in release mode, but at the cost of stomping
on any CMake, or user-customized, settings for the release flags.
* Made the lit tests non-source directory dependent where appropriate. This is
done by configuring certain test shell files to refer to the correct paths
whether an in or out of tree build is being performed. These configured
files are output in the build directory. A %B substitution is introduced in
lit to refer to the build directory, mirroring the %S substitution for the
source directory, so that the tests can refer to the configured shell scripts.
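To illustrate the template/link-error interaction mentioned in the
optimization-flags item above (hypothetical file and function names):

  // --- thing.h: only a declaration is visible to other .cpp files --------
  template <int KIND> int Encode(int x);

  // --- thing.cpp: the definition and the only in-tree use live here ------
  // Under -O3 the call in LocalUse() can be inlined and the vague-linkage
  // instantiation of Encode<4> dropped from this object file, so any other
  // .cpp calling Encode<4>(...) fails to link with an undefined reference.
  template <int KIND> int Encode(int x) { return x * KIND; }
  int LocalUse() { return Encode<4>(10); }

  // An explicit instantiation would force the symbol to be emitted at any
  // optimization level; the configuration here instead stays at -O2.
  template int Encode<4>(int);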
Co-authored-by: David Truby <david.truby@arm.com>
Original-commit: flang-compiler/f18@d1c7184159
Reviewed-on: https://github.com/flang-compiler/f18/pull/1045