Commit Graph

425669 Commits

Author SHA1 Message Date
Kazu Hirata e0039b8d6a Use llvm::less_second (NFC) 2022-06-04 22:48:32 -07:00
Kazu Hirata 9a8e65de8c [Target] Use MachineBasicBlock::erase (NFC) 2022-06-04 22:41:24 -07:00
Kazu Hirata bcf4fa458a [CodeGen] Use a range-based for loop (NFC) 2022-06-04 22:26:55 -07:00
Kazu Hirata 8cc9fa6f78 Use static_cast from SmallString to std::string (NFC) 2022-06-04 22:09:27 -07:00
Kazu Hirata 4969a6924d Use llvm::less_first (NFC) 2022-06-04 21:23:18 -07:00
Kazu Hirata 32ce076d78 [CodeGen] Use StringRef::contains (NFC) 2022-06-04 20:58:58 -07:00
Kazu Hirata f83a88a179 [Transforms] Use llvm::is_contained (NFC) 2022-06-04 20:48:26 -07:00
LemonBoy 700eadca5f [SPARC] Fix type for i64 inline asm operands
Differential Revision: https://reviews.llvm.org/D101694
2022-06-04 18:32:16 -04:00
Florian Hahn 416a5080d8
[VPlan] Update vector latch terminator edge to exit block after execution.
Instead of setting the successor to the exit using CFG.ExitBB, set it to
nullptr initially. The successor to the exit block is later set either
through createEmptyBasicBlock or after VPlan execution (because at the
moment, no block is created by VPlan for the exit block, the existing
one is reused).

This also enables BranchOnCond to be used as terminator for the exiting
block of the topmost vector region.

Depends on D126618.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D126679
2022-06-04 21:22:32 +01:00
Jacques Pienaar 29794ab0fa [mlir] Use context provided rather than getContext
Avoids "pass state was never initialized" assertion failure.
2022-06-04 12:18:51 -07:00
Peter Klausler 03c066ab13 [flang][runtime] Catch OPEN of connected file
Diagnose OPEN(FILE=f) when f is already connected by the same name to
a distinct external I/O unit.

Differential Revision: https://reviews.llvm.org/D127035
2022-06-04 11:06:37 -07:00
Peter Klausler 562fd2c99b [flang][runtime] Emit error message rather than crashing for MOD(ULO)(x,P=0)
Add extra arguments and checks to the runtime support library so that
a call to the intrinsic functions MOD and MODULO with "denominator"
argument P of zero will cause a crash with a source location rather
than an uninformative floating-point error or integer division by
zero signal.

Additional work is required in lowering to (1) pass source file path and
source line number arguments and (2) actually call these runtime
library APIs instead of emitting inline code for MOD &/or MODULO.

Differential Revision: https://reviews.llvm.org/D127034
2022-06-04 11:02:48 -07:00
Peter Klausler 11f928af9b [flang][runtime] Fix deadlock in error recovery
When an external I/O statement is in a recoverable error
state before any data transfers take place (for example,
an unformatted transfer with ERR=/IOSTAT=/IOMSG= attempted on
a formatted unit), ensure that the unit's mutex is still
released at the end of the statement.

Differential Revision: https://reviews.llvm.org/D127032
2022-06-04 09:55:53 -07:00
Peter Klausler ed71a0b45b [flang] When folding FINDLOC, convert operands to a common type
For example, FINDLOC(A,X) should convert both A and X to COMPLEX(8)
if the operands are REAL(8) and COMPLEX(4), so that comparisons
can be done without losing inforation.  The current implementation
unconditionally converts X to the type of the array A.

Differential Revision: https://reviews.llvm.org/D127030
2022-06-04 09:26:13 -07:00
Peter Klausler 9a163ffe1a [flang][runtime] Fix WRITE after OPEN(.., ACCESS="APPEND")
The initial size of the file was not being captured as the file position
on which the first output buffer should be framed.

Differential Revision: https://reviews.llvm.org/D127029
2022-06-04 09:18:25 -07:00
Peter Klausler dfcccc6dee [flang][runtime] Fix edge case discrepancies with EN output editing
The "engineering" ENw.d output editing descriptor has some difficult
edge case behavior for values that might format into a bunch of 9's
or round up to a 1 for a given scale factor.  Fix the algorithm,
and add tests to protect against regressions.

Differential Revision: https://reviews.llvm.org/D127028
2022-06-04 09:14:05 -07:00
Peter Klausler d484fe93d4 [flang] Don't crash on initialization with a zero-sized derived type
Avoid calls to memcpy with zero byte counts if their address argument
calculations may not be valid expressions.

Differential Revision: https://reviews.llvm.org/D127027
2022-06-04 08:58:16 -07:00
Peter Klausler ea5b205bb8 [flang][runtime] Don't crash after surviving internal output overflow
After the program has survived its attempt to overflow the output buffer
with an internal WRITE using ERR=, IOSTAT=, &/or IOMSG=, don't crash
by accidentally blank-filling the next record that usually doesn't exist.

Differential Revision: https://reviews.llvm.org/D127024
2022-06-04 08:47:13 -07:00
Peter Klausler ea1a69d66d [flang][runtime] Don't let random seed queries change the sequence
When the current seed of the pseudo-random generator is queried
with CALL RANDOM_SEED(GET=n), that query should not change the
stream of pseudo-random numbers produced by CALL RANDOM_NUMBER().

Differential Revision: https://reviews.llvm.org/D127023
2022-06-04 08:01:46 -07:00
Mehdi Amini 369ce54bb3 Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."
This reverts commit bcfc0a9051.

The build is broken with shared library enabled.
2022-06-04 08:35:45 +00:00
Fangrui Song 36c7d79dc4 Remove unneeded cl::ZeroOrMore for cl::opt options
Similar to 557efc9a8b.
This commit handles options where cl::ZeroOrMore is more than one line below
cl::opt.
2022-06-04 00:10:42 -07:00
Christian Sigg bcfc0a9051 [MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration.
This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% in average, sometimes more) because:

- it performs less Newton iterations
- it avoids the slow path for e.g. denormals
- it allows reuse of the reciprocal for multiple divisions by the same divisor

Test program:
```
#include <stdio.h>
#include "cuda_fp16.h"

// This is a variant of CUDA's own __hdiv which is fast than hdiv_promote below
// and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values.
__device__ half hdiv_newton(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float rcp;
  asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb));

  float result = fa * rcp;
  auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000;
  if (exponent != 0 && exponent != 0x7f800000) {
    float err = __fmaf_rn(-fb, result, fa);
    result = __fmaf_rn(rcp, err, result);
  }

  return __float2half(result);
}

// Surprisingly, this is faster than CUDA's own __hdiv.
__device__ half hdiv_promote(half a, half b) {
  return __float2half(__half2float(a) / __half2float(b));
}

// This is an approximation that is accurate up to 1 ulp.
__device__ half hdiv_approx(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float result;
  asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb));
  return __float2half(result);
}

__global__ void CheckCorrectness() {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  half x = reinterpret_cast<const half&>(i);
  for (int j = 0; j < 65536; ++j) {
    half y = reinterpret_cast<const half&>(j);
    half d1 = hdiv_newton(x, y);
    half d2 = hdiv_promote(x, y);
    auto s1 = reinterpret_cast<const short&>(d1);
    auto s2 = reinterpret_cast<const short&>(d2);
    if (s1 != s2) {
      printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n",
             __half2float(x), i, __half2float(y), j, __half2float(d1), s1,
             __half2float(d2), s2);
      //__trap();
    }
  }
}

__device__ half dst;

__global__ void ProfileBuiltin(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = x / x;
  }
  dst = x;
}

__global__ void ProfilePromote(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_promote(x, x);
  }
  dst = x;
}

__global__ void ProfileNewton(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_newton(x, x);
  }
  dst = x;
}

__global__ void ProfileApprox(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_approx(x, x);
  }
  dst = x;
}

int main() {
  CheckCorrectness<<<256, 256>>>();
  half one = __float2half(1.0f);
  ProfileBuiltin<<<1, 1>>>(one);  // 1.001s
  ProfilePromote<<<1, 1>>>(one);  // 0.560s
  ProfileNewton<<<1, 1>>>(one);   // 0.508s
  ProfileApprox<<<1, 1>>>(one);   // 0.304s
  auto status = cudaDeviceSynchronize();
  printf("%s\n", cudaGetErrorString(status));
}
```

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D126158
2022-06-04 08:03:29 +02:00
Peter Klausler 9c54d76251 [flang][runtime] Signal new I/O error on floating-point input overflow
Besides raising the IEEE floating-point overflow exception, treat
a floating-point overflow on input as an I/O error catchable with
ERR=, IOSTAT=, &/or IOMSG=.

Differential Revision: https://reviews.llvm.org/D127022
2022-06-03 22:55:03 -07:00
Amir Ayupov b346af6d44 [BOLT][UTILS] Usability improvements for nfc-check-setup
# Stash local changes before checkout.
# Print a message that the source repository revision has been changed, with
  instructions to switch back.
# Make the script executable.
# Print sample instructions how to run bolt tests.
# Assume that llvm-bolt-wrapper script is in the same source directory.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D126941
2022-06-03 22:54:56 -07:00
Peter Klausler 08c6a32381 [flang] Don't discard lower bounds of implicit-shape named constants
F18 preserves lower bounds of explicit-shape named constant arrays, but
failed to also do so for implicit-shape named constants.  Fix.

Differential Revision: https://reviews.llvm.org/D127021
2022-06-03 22:45:12 -07:00
Peter Klausler f3278e0f3c [flang][runtime] Ensure that 0. <= RANDOM_NUMBER() < 1.
It was possible for RANDOM_NUMBER() to return 1.0.

Differential Revision: https://reviews.llvm.org/D127020
2022-06-03 22:44:19 -07:00
Fangrui Song 025b309631 Revert D126950 "[lld][WebAssembly] Retain data segments referenced via __start/__stop"
This reverts commit dcf3368e33.

It breaks -DLLVM_ENABLE_ASSERTIONS=on builds. In addition, the description is
incorrect about ld.lld behavior. For wasm, there should be justification to add
the new mode.
2022-06-03 22:18:06 -07:00
Peter Klausler 15faac900d [flang] Distinguish intrinsic module USE in module files; correct search paths
In the USE statements that f18 emits to module files, ensure that symbols
from intrinsic modules are marked as such on their USE statements.  And
ensure that the current working directory (".") cannot override the intrinsic
module search path when trying to locate an intrinsic module.

Differential Revision: https://reviews.llvm.org/D127019
2022-06-03 22:07:44 -07:00
Fangrui Song 72f9c69421 [Hexagon][bolt] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Similar to 557efc9a8b
2022-06-03 22:04:57 -07:00
Fangrui Song 734c223445 [clang-link-wrapper] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Similar to 557efc9a8b
2022-06-03 22:02:11 -07:00
Fangrui Song 557efc9a8b [llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.
2022-06-03 21:59:05 -07:00
LiaoChunyu f14d18c7a9 [RISCV] Add more patterns for FNMADD
D54205 handles fnmadd: -rs1 * rs2 - rs3
This patch add fnmadd: -(rs1 * rs2 + rs3) (the nsz flag on the FMA)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D126852
2022-06-04 12:31:45 +08:00
varconst 7c63cc198b [libc++][ranges][NFC] Fix a patch link in ranges status. 2022-06-03 20:39:00 -07:00
varconst faf43ad7ae [libc++][ranges][NFC] Mark range algorithms that are in progress. 2022-06-03 20:02:46 -07:00
Yuta Saito dcf3368e33 [lld][WebAssembly] Retain data segments referenced via __start/__stop
As well as ELF linker does, retain all data segments named X referenced
through `__start_X` or `__stop_X`.

For example, `FOO_MD` should not be stripped in the below case, but it's currently mis-stripped

```llvm
@FOO_MD  = global [4 x i8] c"bar\00", section "foo_md", align 1
@__start_foo_md = external constant i8*
@__stop_foo_md = external constant i8*
@llvm.used = appending global [1 x i8*] [i8* bitcast (i32 ()* @foo_md_size to i8*)], section "llvm.metadata"

define i32 @foo_md_size()  {
entry:
  ret i32 sub (
    i32 ptrtoint (i8** @__stop_foo_md to i32),
    i32 ptrtoint (i8** @__start_foo_md to i32)
  )
}
```

This fixes https://github.com/llvm/llvm-project/issues/55839

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D126950
2022-06-04 02:28:31 +00:00
Peter Klausler e0adee8481 [flang] Correct folding of CSHIFT and EOSHIFT for DIM>1
The algorithm was wrong for higher dimensions, and so were
the expected test results.  Rework.

Differential Revision: https://reviews.llvm.org/D127018
2022-06-03 18:59:44 -07:00
Fangrui Song 47ec8b5574 [pseudo] Fix leaks after D126731
Array Operator new Cookies help lsan find allocations, while std::array
can't.
2022-06-03 18:43:16 -07:00
Peter Klausler aa77cf90aa [flang][runtime] Signal format error when input field width is zero
A data edit descriptor for input may not have a zero field width.

Differential Revision: https://reviews.llvm.org/D127017
2022-06-03 18:11:00 -07:00
Peter Klausler e5a4f730da [flang][runtime] OPEN write-only files
If a file being opened with no ACTION= is write-only then cope with
it rather than defaulting prematurely to treating it as read-only.

Differential Revision: https://reviews.llvm.org/D127015
2022-06-03 18:09:40 -07:00
Craig Topper cc3bd43533 [RISCV] Support LUI+ADDIW in doPeepholeLoadStoreADDI.
This fixes an inconsistency between RV32 and RV64. Still considering
trying to do this peephole during isel, but wanted to fix the
inconsistency first.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D126986
2022-06-03 18:06:56 -07:00
Peter Klausler 9878facfd0 [flang][runtime] INQUIRE(FILE="...",SIZE=nbytes)
Implement inquire-by-file SIZE= specifier.

Differential Revision: https://reviews.llvm.org/D127014
2022-06-03 18:05:27 -07:00
Jake Egan c3c75d805c [clang][test] Mark test arm-float-abi-lto.c unsupported on AIX
This test is failing after the introduction of opaque pointers (https://reviews.llvm.org/D125847). The test is flaky and fails from segmentation fault, but it's unclear why. So, mark this test unsupported while it's investigated.
2022-06-03 21:04:56 -04:00
Paul Pluzhnikov 490990bb1f [test] Modify test to verify D126396 (Clean "./" from __FILE__ expansion)
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D127009
2022-06-03 17:54:03 -07:00
Peter Klausler da63fee0d0 [flang][runtime] Allow extra character for E0.0 output editing
When the digit count ('d') is zero in E0 editing, allow for one more
output character; otherwise, any - or + sign in the output causes
an output field overflow.

Differential Revision: https://reviews.llvm.org/D127013
2022-06-03 17:41:22 -07:00
wren romano 3cf03f1c56 [mlir][sparse] Adding IsSparseTensorPred and updating ops to use it
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D126994
2022-06-03 17:15:31 -07:00
Peter Klausler 604016dbe4 [flang][runtime] Fix bug with extra leading zero in octal output
Octal (O) output editing often emits an extra leading 0 digit
due to the total digit count being off by one since word sizes
aren't multiples of three bits.

Differential Revision: https://reviews.llvm.org/D127012
2022-06-03 17:02:07 -07:00
Peter Klausler 66a871b973 [flang] Fix crash in IsSaved()
Code was accessing ProcEntityDetails in a symbol that didn't have them.

Differential Revision: https://reviews.llvm.org/D127011
2022-06-03 17:00:01 -07:00
Florian Mayer 53c1584063 [NFC] [libunwind] turn assert into static_assert
Reviewed By: #libunwind, MaskRay

Differential Revision: https://reviews.llvm.org/D126987
2022-06-03 16:32:42 -07:00
Clemens Wasser 42c7f494d9 [tools] Forward declare classes & remove includes
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D120208
2022-06-03 16:32:04 -07:00
Christopher Bate 9f819f4c62 [mlir][linalg] fix crash in vectorization of elementwise operations
The current vectorization logic implicitly expects "elementwise"
linalg ops to have projected permutations for indexing maps, but
the precondition logic misses this check. This can result in a
crash when executing the generic vectorization transform on an op
with a non-projected permutation input indexing map. This change
fixes the logic and adds a test (which crashes without this fix).

Differential Revision: https://reviews.llvm.org/D127000
2022-06-03 16:38:13 -06:00