Commit Graph

425658 Commits

Author SHA1 Message Date
Peter Klausler 562fd2c99b [flang][runtime] Emit error message rather than crashing for MOD(ULO)(x,P=0)
Add extra arguments and checks to the runtime support library so that
a call to the intrinsic functions MOD and MODULO with "denominator"
argument P of zero will cause a crash with a source location rather
than an uninformative floating-point error or integer division by
zero signal.

Additional work is required in lowering to (1) pass source file path and
source line number arguments and (2) actually call these runtime
library APIs instead of emitting inline code for MOD &/or MODULO.

Differential Revision: https://reviews.llvm.org/D127034
2022-06-04 11:02:48 -07:00
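The check described in 562fd2c99b can be sketched roughly as follows. This is a minimal Python illustration, not the flang runtime API: the names `FortranRuntimeError`, `checked_mod`, and `checked_modulo`, and the message format, are assumptions made for this example. It covers only integer arguments.

```python
class FortranRuntimeError(Exception):
    """Hypothetical stand-in for a runtime crash message with a source location."""


def checked_mod(a, p, source_file, source_line):
    """Integer MOD(a, p): the result takes the sign of a."""
    if p == 0:
        # Report where the call came from instead of dividing by zero.
        raise FortranRuntimeError(f"{source_file}({source_line}): MOD with P==0")
    remainder = abs(a) % abs(p)
    return remainder if a >= 0 else -remainder


def checked_modulo(a, p, source_file, source_line):
    """Integer MODULO(a, p): the result takes the sign of p."""
    if p == 0:
        raise FortranRuntimeError(f"{source_file}({source_line}): MODULO with P==0")
    return a % p  # Python's % already follows the sign of the divisor
```

The extra `source_file` and `source_line` parameters mirror the "extra arguments" mentioned in the message; as the message notes, lowering still has to pass them.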
Peter Klausler 11f928af9b [flang][runtime] Fix deadlock in error recovery
When an external I/O statement is in a recoverable error
state before any data transfers take place (for example,
an unformatted transfer with ERR=/IOSTAT=/IOMSG= attempted on
a formatted unit), ensure that the unit's mutex is still
released at the end of the statement.

Differential Revision: https://reviews.llvm.org/D127032
2022-06-04 09:55:53 -07:00
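The locking rule from 11f928af9b can be illustrated with a small Python sketch. The `Unit` class, the `unformatted_write` function, and the error code are hypothetical and exist only for this example; the point is that the unit's mutex is released on the early error path as well as on the normal path.

```python
import threading

IOSTAT_OK = 0
IOSTAT_UNFORMATTED_ON_FORMATTED = 1  # hypothetical error code for this sketch


class Unit:
    """Toy external unit guarded by a mutex."""

    def __init__(self, formatted):
        self.formatted = formatted
        self.mutex = threading.Lock()


def unformatted_write(unit, data):
    """Return an IOSTAT-like code; the mutex is released on every path."""
    unit.mutex.acquire()
    try:
        if unit.formatted:
            # Recoverable error detected before any data transfer.
            return IOSTAT_UNFORMATTED_ON_FORMATTED
        return IOSTAT_OK  # a real runtime would transfer `data` here
    finally:
        unit.mutex.release()
```

Without the `finally` clause, a second statement on the same unit would block forever after the first one failed.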
Peter Klausler ed71a0b45b [flang] When folding FINDLOC, convert operands to a common type
For example, FINDLOC(A,X) should convert both A and X to COMPLEX(8)
if the operands are REAL(8) and COMPLEX(4), so that comparisons
can be done without losing information.  The current implementation
unconditionally converts X to the type of the array A.

Differential Revision: https://reviews.llvm.org/D127030
2022-06-04 09:26:13 -07:00
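Why the conversion direction matters in ed71a0b45b can be seen in a small Python sketch. Python has only one real and one complex precision, so this example shows only the loss of the imaginary part, not the REAL(8) versus COMPLEX(4) precision issue. Both function names are made up for illustration.

```python
def findloc_convert_to_array_type(array, x):
    """Behavior described as the old one: X is converted to the real type of A."""
    x_as_real = x.real  # the imaginary part is silently dropped
    for position, element in enumerate(array, start=1):
        if element == x_as_real:
            return position
    return 0  # FINDLOC returns 0 when nothing matches


def findloc_common_type(array, x):
    """Both operands are converted to a common (complex) type before comparing."""
    for position, element in enumerate(array, start=1):
        if complex(element) == complex(x):
            return position
    return 0
```

For `array = [1.0, 2.0, 3.0]` and `x = complex(2.0, 1.0)`, the first function reports a match at position 2 even though no element equals `x`; the second reports no match.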
Peter Klausler 9a163ffe1a [flang][runtime] Fix WRITE after OPEN(.., ACCESS="APPEND")
The initial size of the file was not being captured as the file position
on which the first output buffer should be framed.

Differential Revision: https://reviews.llvm.org/D127029
2022-06-04 09:18:25 -07:00
Peter Klausler dfcccc6dee [flang][runtime] Fix edge case discrepancies with EN output editing
The "engineering" ENw.d output editing descriptor has some difficult
edge case behavior for values that might format into a bunch of 9's
or round up to a 1 for a given scale factor.  Fix the algorithm,
and add tests to protect against regressions.

Differential Revision: https://reviews.llvm.org/D127028
2022-06-04 09:14:05 -07:00
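The edge case mentioned in dfcccc6dee arises when rounding pushes the significand up to 1000, at which point the exponent has to move to the next multiple of three. The Python sketch below shows one way to handle that; it is not the flang algorithm, it ignores the field width `w` and scale factors, and `en_format` is a name invented for this example.

```python
import math


def en_format(x, d):
    """Format x in engineering style with d digits after the decimal point."""
    if x == 0.0:
        return f"{0.0:.{d}f}E+00"
    exp10 = math.floor(math.log10(abs(x)))
    exp3 = 3 * (exp10 // 3)  # largest multiple of 3 not above exp10
    significand = round(x / 10.0 ** exp3, d)
    if abs(significand) >= 1000.0:
        # Rounding produced 1000.0 (e.g. from 999.96): renormalize to 1.0E+03.
        significand /= 1000.0
        exp3 += 3
    return f"{significand:.{d}f}E{exp3:+03d}"
```

Here `en_format(999.96, 1)` yields `1.0E+03` rather than `1000.0E+00`, while `en_format(999.94, 1)` stays at `999.9E+00`.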
Peter Klausler d484fe93d4 [flang] Don't crash on initialization with a zero-sized derived type
Avoid calls to memcpy with zero byte counts if their address argument
calculations may not be valid expressions.

Differential Revision: https://reviews.llvm.org/D127027
2022-06-04 08:58:16 -07:00
Peter Klausler ea5b205bb8 [flang][runtime] Don't crash after surviving internal output overflow
After the program has survived its attempt to overflow the output buffer
with an internal WRITE using ERR=, IOSTAT=, &/or IOMSG=, don't crash
by accidentally blank-filling the next record that usually doesn't exist.

Differential Revision: https://reviews.llvm.org/D127024
2022-06-04 08:47:13 -07:00
Peter Klausler ea1a69d66d [flang][runtime] Don't let random seed queries change the sequence
When the current seed of the pseudo-random generator is queried
with CALL RANDOM_SEED(GET=n), that query should not change the
stream of pseudo-random numbers produced by CALL RANDOM_NUMBER().

Differential Revision: https://reviews.llvm.org/D127023
2022-06-04 08:01:46 -07:00
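The requirement in ea1a69d66d is that querying the seed is a pure read. A toy Python generator makes the property concrete; the linear congruential generator and the `Lcg` class are assumptions for this sketch and are unrelated to the generator the flang runtime actually uses.

```python
class Lcg:
    """Toy linear congruential generator used only to illustrate the property."""

    MODULUS = 2 ** 31

    def __init__(self, seed):
        self.state = seed % self.MODULUS

    def random_number(self):
        """Advance the state and return a value in [0, 1)."""
        self.state = (1103515245 * self.state + 12345) % self.MODULUS
        return self.state / self.MODULUS

    def get_seed(self):
        """Analogue of RANDOM_SEED(GET=n): reads the state without advancing it."""
        return self.state
```

Interleaving `get_seed()` calls with `random_number()` calls must produce the same stream as calling `random_number()` alone.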
Mehdi Amini 369ce54bb3 Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."
This reverts commit bcfc0a9051.

The build is broken with shared library enabled.
2022-06-04 08:35:45 +00:00
Fangrui Song 36c7d79dc4 Remove unneeded cl::ZeroOrMore for cl::opt options
Similar to 557efc9a8b.
This commit handles options where cl::ZeroOrMore is more than one line below
cl::opt.
2022-06-04 00:10:42 -07:00
Christian Sigg bcfc0a9051 [MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration.
This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% on average, sometimes more) because:

- it performs fewer Newton iterations
- it avoids the slow path for e.g. denormals
- it allows reuse of the reciprocal for multiple divisions by the same divisor

Test program:
```
#include <stdio.h>
#include "cuda_fp16.h"

// This is a variant of CUDA's own __hdiv which is faster than hdiv_promote below
// and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values.
__device__ half hdiv_newton(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float rcp;
  asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb));

  float result = fa * rcp;
  auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000;
  if (exponent != 0 && exponent != 0x7f800000) {
    float err = __fmaf_rn(-fb, result, fa);
    result = __fmaf_rn(rcp, err, result);
  }

  return __float2half(result);
}

// Surprisingly, this is faster than CUDA's own __hdiv.
__device__ half hdiv_promote(half a, half b) {
  return __float2half(__half2float(a) / __half2float(b));
}

// This is an approximation that is accurate up to 1 ulp.
__device__ half hdiv_approx(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float result;
  asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb));
  return __float2half(result);
}

__global__ void CheckCorrectness() {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  half x = reinterpret_cast<const half&>(i);
  for (int j = 0; j < 65536; ++j) {
    half y = reinterpret_cast<const half&>(j);
    half d1 = hdiv_newton(x, y);
    half d2 = hdiv_promote(x, y);
    auto s1 = reinterpret_cast<const short&>(d1);
    auto s2 = reinterpret_cast<const short&>(d2);
    if (s1 != s2) {
      printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n",
             __half2float(x), i, __half2float(y), j, __half2float(d1), s1,
             __half2float(d2), s2);
      //__trap();
    }
  }
}

__device__ half dst;

__global__ void ProfileBuiltin(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = x / x;
  }
  dst = x;
}

__global__ void ProfilePromote(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_promote(x, x);
  }
  dst = x;
}

__global__ void ProfileNewton(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_newton(x, x);
  }
  dst = x;
}

__global__ void ProfileApprox(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_approx(x, x);
  }
  dst = x;
}

int main() {
  CheckCorrectness<<<256, 256>>>();
  half one = __float2half(1.0f);
  ProfileBuiltin<<<1, 1>>>(one);  // 1.001s
  ProfilePromote<<<1, 1>>>(one);  // 0.560s
  ProfileNewton<<<1, 1>>>(one);   // 0.508s
  ProfileApprox<<<1, 1>>>(one);   // 0.304s
  auto status = cudaDeviceSynchronize();
  printf("%s\n", cudaGetErrorString(status));
}
```

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D126158
2022-06-04 08:03:29 +02:00
Peter Klausler 9c54d76251 [flang][runtime] Signal new I/O error on floating-point input overflow
Besides raising the IEEE floating-point overflow exception, treat
a floating-point overflow on input as an I/O error catchable with
ERR=, IOSTAT=, &/or IOMSG=.

Differential Revision: https://reviews.llvm.org/D127022
2022-06-03 22:55:03 -07:00
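The behavior described in 9c54d76251 can be sketched in Python, where converting an out-of-range decimal string also yields infinity. The function name and the status codes are hypothetical, and the sketch accepts only Python's own floating-point syntax, not the full Fortran input syntax.

```python
import math

IOSTAT_OK = 0
IOSTAT_REAL_INPUT_OVERFLOW = 1  # hypothetical code for this sketch


def read_real(field):
    """Convert an input field; report overflow as an I/O error status."""
    text = field.strip()
    value = float(text)
    names_infinity = text.lstrip("+-").lower() in ("inf", "infinity")
    if math.isinf(value) and not names_infinity:
        # The digits were finite but too large for the type: overflow on input.
        return value, IOSTAT_REAL_INPUT_OVERFLOW
    return value, IOSTAT_OK
```

A field that literally spells infinity is not an overflow, so it is excluded from the check.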
Amir Ayupov b346af6d44 [BOLT][UTILS] Usability improvements for nfc-check-setup
# Stash local changes before checkout.
# Print a message that the source repository revision has been changed, with
  instructions to switch back.
# Make the script executable.
# Print sample instructions how to run bolt tests.
# Assume that llvm-bolt-wrapper script is in the same source directory.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D126941
2022-06-03 22:54:56 -07:00
Peter Klausler 08c6a32381 [flang] Don't discard lower bounds of implicit-shape named constants
F18 preserves lower bounds of explicit-shape named constant arrays, but
failed to also do so for implicit-shape named constants.  Fix.

Differential Revision: https://reviews.llvm.org/D127021
2022-06-03 22:45:12 -07:00
Peter Klausler f3278e0f3c [flang][runtime] Ensure that 0. <= RANDOM_NUMBER() < 1.
It was possible for RANDOM_NUMBER() to return 1.0.

Differential Revision: https://reviews.llvm.org/D127020
2022-06-03 22:44:19 -07:00
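One common way a generator ends up returning 1.0, as in the bug fixed by f3278e0f3c, is rounding when random bits are scaled into a narrower floating-point type. The Python sketch below demonstrates that mechanism for 32-bit reals; it is an assumption about the general class of bug, not a description of the actual flang code or of its fix. All function names are invented here.

```python
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_uniform32(bits32):
    """Scale all 32 random bits; large inputs can round up to exactly 1.0."""
    return to_float32(bits32 / 2.0 ** 32)


def safe_uniform32(bits32):
    """Keep only 24 bits so that every result is exactly representable and < 1."""
    return to_float32((bits32 >> 8) / 2.0 ** 24)
```

With `bits32 = 2**32 - 1`, the naive version returns 1.0, while the 24-bit version returns the largest single-precision value below 1.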
Fangrui Song 025b309631 Revert D126950 "[lld][WebAssembly] Retain data segments referenced via __start/__stop"
This reverts commit dcf3368e33.

It breaks -DLLVM_ENABLE_ASSERTIONS=on builds. In addition, the description is
incorrect about ld.lld behavior. For wasm, there should be justification to add
the new mode.
2022-06-03 22:18:06 -07:00
Peter Klausler 15faac900d [flang] Distinguish intrinsic module USE in module files; correct search paths
In the USE statements that f18 emits to module files, ensure that symbols
from intrinsic modules are marked as such on their USE statements.  And
ensure that the current working directory (".") cannot override the intrinsic
module search path when trying to locate an intrinsic module.

Differential Revision: https://reviews.llvm.org/D127019
2022-06-03 22:07:44 -07:00
Fangrui Song 72f9c69421 [Hexagon][bolt] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Similar to 557efc9a8b
2022-06-03 22:04:57 -07:00
Fangrui Song 734c223445 [clang-link-wrapper] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Similar to 557efc9a8b
2022-06-03 22:02:11 -07:00
Fangrui Song 557efc9a8b [llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.
2022-06-03 21:59:05 -07:00
LiaoChunyu f14d18c7a9 [RISCV] Add more patterns for FNMADD
D54205 handles fnmadd: -rs1 * rs2 - rs3
This patch adds fnmadd: -(rs1 * rs2 + rs3) (the nsz flag on the FMA)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D126852
2022-06-04 12:31:45 +08:00
varconst 7c63cc198b [libc++][ranges][NFC] Fix a patch link in ranges status. 2022-06-03 20:39:00 -07:00
varconst faf43ad7ae [libc++][ranges][NFC] Mark range algorithms that are in progress. 2022-06-03 20:02:46 -07:00
Yuta Saito dcf3368e33 [lld][WebAssembly] Retain data segments referenced via __start/__stop
As the ELF linker does, retain all data segments named X referenced
through `__start_X` or `__stop_X`.

For example, `FOO_MD` should not be stripped in the case below, but it is currently mis-stripped:

```llvm
@FOO_MD  = global [4 x i8] c"bar\00", section "foo_md", align 1
@__start_foo_md = external constant i8*
@__stop_foo_md = external constant i8*
@llvm.used = appending global [1 x i8*] [i8* bitcast (i32 ()* @foo_md_size to i8*)], section "llvm.metadata"

define i32 @foo_md_size()  {
entry:
  ret i32 sub (
    i32 ptrtoint (i8** @__stop_foo_md to i32),
    i32 ptrtoint (i8** @__start_foo_md to i32)
  )
}
```

This fixes https://github.com/llvm/llvm-project/issues/55839

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D126950
2022-06-04 02:28:31 +00:00
Peter Klausler e0adee8481 [flang] Correct folding of CSHIFT and EOSHIFT for DIM>1
The algorithm was wrong for higher dimensions, and so were
the expected test results.  Rework.

Differential Revision: https://reviews.llvm.org/D127018
2022-06-03 18:59:44 -07:00
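For reference, the semantics that the folding in e0adee8481 has to reproduce can be sketched for a rank-2 array in Python. This is an illustration of CSHIFT with a scalar shift only, not the flang folding code; `cshift_2d` is a made-up name, and `array[i][j]` stands for the Fortran element `A(i+1, j+1)`.

```python
def cshift_2d(array, shift, dim):
    """Circularly shift a rank-2 array along dimension dim (1 or 2)."""
    rows, cols = len(array), len(array[0])
    if dim == 1:
        # Shift along the first subscript: whole rows rotate.
        return [[array[(i + shift) % rows][j] for j in range(cols)]
                for i in range(rows)]
    if dim == 2:
        # Shift along the second subscript: elements rotate within each row.
        return [[array[i][(j + shift) % cols] for j in range(cols)]
                for i in range(rows)]
    raise ValueError("dim must be 1 or 2 for a rank-2 array")
```

A positive shift moves elements toward lower subscripts, so shifting `[[1, 2, 3], [4, 5, 6]]` by 1 along `dim=2` gives `[[2, 3, 1], [5, 6, 4]]`.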
Fangrui Song 47ec8b5574 [pseudo] Fix leaks after D126731
Array Operator new Cookies help lsan find allocations, while std::array
can't.
2022-06-03 18:43:16 -07:00
Peter Klausler aa77cf90aa [flang][runtime] Signal format error when input field width is zero
A data edit descriptor for input may not have a zero field width.

Differential Revision: https://reviews.llvm.org/D127017
2022-06-03 18:11:00 -07:00
Peter Klausler e5a4f730da [flang][runtime] OPEN write-only files
If a file being opened with no ACTION= is write-only then cope with
it rather than defaulting prematurely to treating it as read-only.

Differential Revision: https://reviews.llvm.org/D127015
2022-06-03 18:09:40 -07:00
Craig Topper cc3bd43533 [RISCV] Support LUI+ADDIW in doPeepholeLoadStoreADDI.
This fixes an inconsistency between RV32 and RV64. Still considering
trying to do this peephole during isel, but wanted to fix the
inconsistency first.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126986
2022-06-03 18:06:56 -07:00
Peter Klausler 9878facfd0 [flang][runtime] INQUIRE(FILE="...",SIZE=nbytes)
Implement inquire-by-file SIZE= specifier.

Differential Revision: https://reviews.llvm.org/D127014
2022-06-03 18:05:27 -07:00
Jake Egan c3c75d805c [clang][test] Mark test arm-float-abi-lto.c unsupported on AIX
This test is failing after the introduction of opaque pointers (https://reviews.llvm.org/D125847). The test is flaky and fails with a segmentation fault, but it's unclear why. So, mark this test unsupported while it's investigated.
2022-06-03 21:04:56 -04:00
Paul Pluzhnikov 490990bb1f [test] Modify test to verify D126396 (Clean "./" from __FILE__ expansion)
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D127009
2022-06-03 17:54:03 -07:00
Peter Klausler da63fee0d0 [flang][runtime] Allow extra character for E0.0 output editing
When the digit count ('d') is zero in E0 editing, allow for one more
output character; otherwise, any - or + sign in the output causes
an output field overflow.

Differential Revision: https://reviews.llvm.org/D127013
2022-06-03 17:41:22 -07:00
wren romano 3cf03f1c56 [mlir][sparse] Adding IsSparseTensorPred and updating ops to use it
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D126994
2022-06-03 17:15:31 -07:00
Peter Klausler 604016dbe4 [flang][runtime] Fix bug with extra leading zero in octal output
Octal (O) output editing often emits an extra leading 0 digit
due to the total digit count being off by one since word sizes
aren't multiples of three bits.

Differential Revision: https://reviews.llvm.org/D127012
2022-06-03 17:02:07 -07:00
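The digit-count arithmetic behind 604016dbe4 is easy to check: a word of n bits needs ceil(n/3) octal digits, and for common word sizes n is not a multiple of three. The Python sketch below only verifies that count; `octal_digit_count` is a name chosen for this example and is not taken from the runtime.

```python
def octal_digit_count(bits):
    """Number of octal digits needed for any unsigned value of the given width."""
    return (bits + 2) // 3  # integer form of ceil(bits / 3)


def widest_octal(bits):
    """Octal text of the largest unsigned value that fits in `bits` bits."""
    return format((1 << bits) - 1, "o")
```

For 32 bits this gives 11 digits, and the leading digit can only be 0 to 3 because it covers just two bits.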
Peter Klausler 66a871b973 [flang] Fix crash in IsSaved()
Code was accessing ProcEntityDetails in a symbol that didn't have them.

Differential Revision: https://reviews.llvm.org/D127011
2022-06-03 17:00:01 -07:00
Florian Mayer 53c1584063 [NFC] [libunwind] turn assert into static_assert
Reviewed By: #libunwind, MaskRay

Differential Revision: https://reviews.llvm.org/D126987
2022-06-03 16:32:42 -07:00
Clemens Wasser 42c7f494d9 [tools] Forward declare classes & remove includes
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D120208
2022-06-03 16:32:04 -07:00
Christopher Bate 9f819f4c62 [mlir][linalg] fix crash in vectorization of elementwise operations
The current vectorization logic implicitly expects "elementwise"
linalg ops to have projected permutations for indexing maps, but
the precondition logic misses this check. This can result in a
crash when executing the generic vectorization transform on an op
with a non-projected permutation input indexing map. This change
fixes the logic and adds a test (which crashes without this fix).

Differential Revision: https://reviews.llvm.org/D127000
2022-06-03 16:38:13 -06:00
Florian Mayer f60875254b [DWARF] Show which augmentation character was unrecognized.
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D127003
2022-06-03 15:35:33 -07:00
Brad Smith a0bc67e555 [Hexagon] Enable IAS in the Hexagon backend
Reviewed By: kparzysz

Differential Revision: https://reviews.llvm.org/D123096
2022-06-03 18:15:12 -04:00
Anders Waldenborg dd2362a8ba [clang] Allow const variables with weak attribute to be overridden
A variable with `weak` attribute signifies that it can be replaced with
a "strong" symbol at link time. Therefore it must not be emitted with
"weak_odr" linkage, as that allows the backend to use its value in
optimizations.

The frontend already considers weak const variables as
non-constant (note_constexpr_var_init_weak diagnostic) so this change
makes frontend and backend consistent.

This commit reverses the
  f49573d1 weak globals that are const should get weak_odr linkage.
commit from 2009-08-05 which introduced this behavior. Unfortunately
that commit doesn't provide any details on why the change was made.

This was discussed in
https://discourse.llvm.org/t/weak-attribute-semantics-on-const-variables/62311

Differential Revision: https://reviews.llvm.org/D126324
2022-06-03 23:44:15 +02:00
Joseph Huber 1257fe193a [Clang] Change the offload packager build to be a clang tool
Summary:
This patch changes the CMake build configuration for the
`clang-offload-packager` to be a clang tool rather than an executable.
2022-06-03 17:35:26 -04:00
Diego Caballero 9a79b1b04c [mlir] Add peeling xform to Codegen Strategy
This patch adds the knobs to use peeling in the codegen strategy
infrastructure.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D126842
2022-06-03 21:31:43 +00:00
Huan Nguyen 5ac26156fe [BOLT][NFC] Warning for deprecated option '-reorder-blocks=cache+'
Emit warning when using deprecated option '-reorder-blocks=cache+'.
Auto switch to option '-reorder-blocks=ext-tsp'.

Test Plan:
```
ninja check-bolt
```
Added a new test cache+-deprecated.test.
Ran the upstream tests and verified that they pass.

Reviewed By: rafauler, Amir, maksfb

Differential Revision: https://reviews.llvm.org/D126722
2022-06-03 14:16:55 -07:00
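The option handling described in 5ac26156fe amounts to warning and then substituting the new spelling. A minimal Python sketch of that pattern follows; the function name and the warning text are invented for illustration and do not reproduce BOLT's actual message.

```python
import warnings


def normalize_reorder_blocks(value):
    """Map the deprecated 'cache+' spelling to 'ext-tsp', warning the user."""
    if value == "cache+":
        warnings.warn(
            "'-reorder-blocks=cache+' is deprecated, using 'ext-tsp' instead",
            DeprecationWarning,
        )
        return "ext-tsp"
    return value
```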
Jacob Weightman 814a0abcce AMDGPU: allow reordering of functions in AMDGPUResourceUsageAnalysis
The AMDGPUResourceUsageAnalysis was previously a CGSCC pass, and assumed
that a function's callees were always analyzed prior to their callers.
Since it was refactored into a module pass, this assumption no longer
always holds. As a result, calls are erroneously identified as indirect
and private segment space is reserved for them, which significantly
increases kernel launch latency.

This patch changes the order in which the module's functions are analyzed
from the order in which they occur in the module to a post-order traversal
of the call graph. Perhaps Clang always generates the module's functions
in such an order, but this is not the case for the Cray Fortran compiler.

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D126025
2022-06-03 15:55:54 -05:00
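The traversal order adopted in 814a0abcce can be sketched in Python. This is a generic post-order depth-first walk over a hypothetical call graph stored as a dictionary, not the LLVM implementation; it assumes direct calls only and simply stops at already-visited functions, so recursion does not loop.

```python
def post_order(call_graph, functions_in_module_order):
    """Return functions so that each callee appears before its callers."""
    visited = set()
    order = []

    def visit(function):
        if function in visited:
            return
        visited.add(function)
        for callee in call_graph.get(function, ()):
            visit(callee)
        order.append(function)  # emitted only after all of its callees

    for function in functions_in_module_order:
        visit(function)
    return order
```

Even when the module lists a kernel before its helpers, the helpers are analyzed first, so their resource usage is known when the kernel is processed.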
Craig Topper 8da5d5dbdc [RISCV] Pre-commit test cases for D126986. NFC 2022-06-03 13:31:45 -07:00
Reid Kleckner d82b4fe50d [bazel] Update build for config.h.cmake change 2022-06-03 12:58:04 -07:00
Tue Ly 484319f497 [libc] Make expm1f correctly rounded when the targets have no FMA instructions.
Add another exceptional value and fix the case when |x| is small.

Performance tests with CORE-MATH project scripts:
With FMA instructions on Ryzen 1700:
```
$ ./perf.sh expm1f
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
CORE-MATH reciprocal throughput   : 15.362
System LIBC reciprocal throughput : 53.194
LIBC reciprocal throughput        : 14.595
$ ./perf.sh expm1f --latency
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
CORE-MATH latency   : 57.755
System LIBC latency : 147.020
LIBC latency        : 60.269
```
Without FMA instructions:
```
$ ./perf.sh expm1f
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
CORE-MATH reciprocal throughput   : 15.362
System LIBC reciprocal throughput : 53.300
LIBC reciprocal throughput        : 18.020
$ ./perf.sh expm1f --latency
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
CORE-MATH latency   : 57.758
System LIBC latency : 147.025
LIBC latency        : 70.304
```

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D123440
2022-06-03 15:57:48 -04:00
Joe Loser 4fc502368a [libc++][test] Skip string_view tests for other vendors on older modes
`string_view` is supported all the way back to C++03 as an extension in
`libc++`, and so many of the tests run in all standards modes for all vendors.
This is unlikely to be desired by other standard library vendors using our test suite.
So, disable the tests for vendors other than `libc++` in these older standards
modes.

Differential Revision: https://reviews.llvm.org/D126850
2022-06-03 13:51:49 -06:00