llvm-project

Commit Graph

Author	SHA1	Message	Date
Peter Klausler	562fd2c99b	[flang][runtime] Emit error message rather than crashing for MOD(ULO)(x,P=0) Add extra arguments and checks to the runtime support library so that a call to the intrinsic functions MOD and MODULO with "denominator" argument P of zero will cause a crash with a source location rather than an uninformative floating-point error or integer division by zero signal. Additional work is required in lowering to (1) pass source file path and source line number arguments and (2) actually call these runtime library APIs instead of emitting inline code for MOD &/or MODULO. Differential Revision: https://reviews.llvm.org/D127034	2022-06-04 11:02:48 -07:00
Peter Klausler	11f928af9b	[flang][runtime] Fix deadlock in error recovery When an external I/O statement is in a recoverable error state before any data transfers take place (for example, an unformatted transfer with ERR=/IOSTAT=/IOMSG= attempted on a formatted unit), ensure that the unit's mutex is still released at the end of the statement. Differential Revision: https://reviews.llvm.org/D127032	2022-06-04 09:55:53 -07:00
Peter Klausler	ed71a0b45b	[flang] When folding FINDLOC, convert operands to a common type For example, FINDLOC(A,X) should convert both A and X to COMPLEX(8) if the operands are REAL(8) and COMPLEX(4), so that comparisons can be done without losing information. The current implementation unconditionally converts X to the type of the array A. Differential Revision: https://reviews.llvm.org/D127030	2022-06-04 09:26:13 -07:00
Peter Klausler	9a163ffe1a	[flang][runtime] Fix WRITE after OPEN(.., ACCESS="APPEND") The initial size of the file was not being captured as the file position on which the first output buffer should be framed. Differential Revision: https://reviews.llvm.org/D127029	2022-06-04 09:18:25 -07:00
Peter Klausler	dfcccc6dee	[flang][runtime] Fix edge case discrepancies with EN output editing The "engineering" ENw.d output editing descriptor has some difficult edge case behavior for values that might format into a bunch of 9's or round up to a 1 for a given scale factor. Fix the algorithm, and add tests to protect against regressions. Differential Revision: https://reviews.llvm.org/D127028	2022-06-04 09:14:05 -07:00
Peter Klausler	d484fe93d4	[flang] Don't crash on initialization with a zero-sized derived type Avoid calls to memcpy with zero byte counts if their address argument calculations may not be valid expressions. Differential Revision: https://reviews.llvm.org/D127027	2022-06-04 08:58:16 -07:00
Peter Klausler	ea5b205bb8	[flang][runtime] Don't crash after surviving internal output overflow After the program has survived its attempt to overflow the output buffer with an internal WRITE using ERR=, IOSTAT=, &/or IOMSG=, don't crash by accidentally blank-filling the next record that usually doesn't exist. Differential Revision: https://reviews.llvm.org/D127024	2022-06-04 08:47:13 -07:00
Peter Klausler	ea1a69d66d	[flang][runtime] Don't let random seed queries change the sequence When the current seed of the pseudo-random generator is queried with CALL RANDOM_SEED(GET=n), that query should not change the stream of pseudo-random numbers produced by CALL RANDOM_NUMBER(). Differential Revision: https://reviews.llvm.org/D127023	2022-06-04 08:01:46 -07:00
Mehdi Amini	369ce54bb3	Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration." This reverts commit `bcfc0a9051`. The build is broken with shared library enabled.	2022-06-04 08:35:45 +00:00
Fangrui Song	36c7d79dc4	Remove unneeded cl::ZeroOrMore for cl::opt options Similar to `557efc9a8b`. This commit handles options where cl::ZeroOrMore is more than one line below cl::opt.	2022-06-04 00:10:42 -07:00
Christian Sigg	bcfc0a9051	[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration. This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% on average, sometimes more) because:
- it performs fewer Newton iterations
- it avoids the slow path for e.g. denormals
- it allows reuse of the reciprocal for multiple divisions by the same divisor

Test program:
```
#include <stdio.h>

#include "cuda_fp16.h"

// This is a variant of CUDA's own __hdiv which is faster than hdiv_promote below
// and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values.
__device__ half hdiv_newton(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float rcp;
  asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb));

  float result = fa * rcp;
  auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000;
  if (exponent != 0 && exponent != 0x7f800000) {
    float err = __fmaf_rn(-fb, result, fa);
    result = __fmaf_rn(rcp, err, result);
  }

  return __float2half(result);
}

// Surprisingly, this is faster than CUDA's own __hdiv.
__device__ half hdiv_promote(half a, half b) {
  return __float2half(__half2float(a) / __half2float(b));
}

// This is an approximation that is accurate up to 1 ulp.
__device__ half hdiv_approx(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float result;
  asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb));
  return __float2half(result);
}

__global__ void CheckCorrectness() {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  half x = reinterpret_cast<const half&>(i);
  for (int j = 0; j < 65536; ++j) {
    half y = reinterpret_cast<const half&>(j);
    half d1 = hdiv_newton(x, y);
    half d2 = hdiv_promote(x, y);
    auto s1 = reinterpret_cast<const short&>(d1);
    auto s2 = reinterpret_cast<const short&>(d2);
    if (s1 != s2) {
      printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n",
             __half2float(x), i, __half2float(y), j, __half2float(d1), s1,
             __half2float(d2), s2);
      //__trap();
    }
  }
}

__device__ half dst;

__global__ void ProfileBuiltin(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = x / x;
  }
  dst = x;
}

__global__ void ProfilePromote(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_promote(x, x);
  }
  dst = x;
}

__global__ void ProfileNewton(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_newton(x, x);
  }
  dst = x;
}

__global__ void ProfileApprox(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_approx(x, x);
  }
  dst = x;
}

int main() {
  CheckCorrectness<<<256, 256>>>();
  half one = __float2half(1.0f);
  ProfileBuiltin<<<1, 1>>>(one);  // 1.001s
  ProfilePromote<<<1, 1>>>(one);  // 0.560s
  ProfileNewton<<<1, 1>>>(one);   // 0.508s
  ProfileApprox<<<1, 1>>>(one);   // 0.304s
  auto status = cudaDeviceSynchronize();
  printf("%s\n", cudaGetErrorString(status));
}
```
Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D126158	2022-06-04 08:03:29 +02:00
Peter Klausler	9c54d76251	[flang][runtime] Signal new I/O error on floating-point input overflow Besides raising the IEEE floating-point overflow exception, treat a floating-point overflow on input as an I/O error catchable with ERR=, IOSTAT=, &/or IOMSG=. Differential Revision: https://reviews.llvm.org/D127022	2022-06-03 22:55:03 -07:00
Amir Ayupov	b346af6d44	[BOLT][UTILS] Usability improvements for nfc-check-setup # Stash local changes before checkout. # Print a message that the source repository revision has been changed, with instructions to switch back. # Make the script executable. # Print sample instructions how to run bolt tests. # Assume that llvm-bolt-wrapper script is in the same source directory. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D126941	2022-06-03 22:54:56 -07:00
Peter Klausler	08c6a32381	[flang] Don't discard lower bounds of implicit-shape named constants F18 preserves lower bounds of explicit-shape named constant arrays, but failed to also do so for implicit-shape named constants. Fix. Differential Revision: https://reviews.llvm.org/D127021	2022-06-03 22:45:12 -07:00
Peter Klausler	f3278e0f3c	[flang][runtime] Ensure that 0. <= RANDOM_NUMBER() < 1. It was possible for RANDOM_NUMBER() to return 1.0. Differential Revision: https://reviews.llvm.org/D127020	2022-06-03 22:44:19 -07:00
Fangrui Song	025b309631	Revert D126950 "[lld][WebAssembly] Retain data segments referenced via __start/__stop" This reverts commit `dcf3368e33`. It breaks -DLLVM_ENABLE_ASSERTIONS=on builds. In addition, the description is incorrect about ld.lld behavior. For wasm, there should be justification to add the new mode.	2022-06-03 22:18:06 -07:00
Peter Klausler	15faac900d	[flang] Distinguish intrinsic module USE in module files; correct search paths In the USE statements that f18 emits to module files, ensure that symbols from intrinsic modules are marked as such on their USE statements. And ensure that the current working directory (".") cannot override the intrinsic module search path when trying to locate an intrinsic module. Differential Revision: https://reviews.llvm.org/D127019	2022-06-03 22:07:44 -07:00
Fangrui Song	72f9c69421	[Hexagon][bolt] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Similar to `557efc9a8b`	2022-06-03 22:04:57 -07:00
Fangrui Song	734c223445	[clang-link-wrapper] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Similar to `557efc9a8b`	2022-06-03 22:02:11 -07:00
Fangrui Song	557efc9a8b	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines.	2022-06-03 21:59:05 -07:00
LiaoChunyu	f14d18c7a9	[RISCV] Add more patterns for FNMADD D54205 handles fnmadd: -rs1 * rs2 - rs3. This patch adds fnmadd: -(rs1 * rs2 + rs3) (the nsz flag on the FMA) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126852	2022-06-04 12:31:45 +08:00
varconst	7c63cc198b	[libc++][ranges][NFC] Fix a patch link in ranges status.	2022-06-03 20:39:00 -07:00
varconst	faf43ad7ae	[libc++][ranges][NFC] Mark range algorithms that are in progress.	2022-06-03 20:02:46 -07:00
Yuta Saito	dcf3368e33	[lld][WebAssembly] Retain data segments referenced via __start/__stop As the ELF linker does, retain all data segments named X referenced through `__start_X` or `__stop_X`. For example, `FOO_MD` should not be stripped in the case below, but it is currently mis-stripped:
```llvm
@FOO_MD = global [4 x i8] c"bar\00", section "foo_md", align 1

@__start_foo_md = external constant i8*
@__stop_foo_md = external constant i8*
@llvm.used = appending global [1 x i8] [i8 bitcast (i32 ()* @foo_md_size to i8)], section "llvm.metadata"

define i32 @foo_md_size() {
entry:
  ret i32 sub (
    i32 ptrtoint (i8* @__stop_foo_md to i32),
    i32 ptrtoint (i8** @__start_foo_md to i32)
  )
}
```
This fixes https://github.com/llvm/llvm-project/issues/55839 Reviewed By: sbc100 Differential Revision: https://reviews.llvm.org/D126950	2022-06-04 02:28:31 +00:00
Peter Klausler	e0adee8481	[flang] Correct folding of CSHIFT and EOSHIFT for DIM>1 The algorithm was wrong for higher dimensions, and so were the expected test results. Rework. Differential Revision: https://reviews.llvm.org/D127018	2022-06-03 18:59:44 -07:00
Fangrui Song	47ec8b5574	[pseudo] Fix leaks after D126731 Array Operator new Cookies help lsan find allocations, while std::array can't.	2022-06-03 18:43:16 -07:00
Peter Klausler	aa77cf90aa	[flang][runtime] Signal format error when input field width is zero A data edit descriptor for input may not have a zero field width. Differential Revision: https://reviews.llvm.org/D127017	2022-06-03 18:11:00 -07:00
Peter Klausler	e5a4f730da	[flang][runtime] OPEN write-only files If a file being opened with no ACTION= is write-only then cope with it rather than defaulting prematurely to treating it as read-only. Differential Revision: https://reviews.llvm.org/D127015	2022-06-03 18:09:40 -07:00
Craig Topper	cc3bd43533	[RISCV] Support LUI+ADDIW in doPeepholeLoadStoreADDI. This fixes an inconsistency between RV32 and RV64. Still considering trying to do this peephole during isel, but wanted to fix the inconsistency first. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126986	2022-06-03 18:06:56 -07:00
Peter Klausler	9878facfd0	[flang][runtime] INQUIRE(FILE="...",SIZE=nbytes) Implement inquire-by-file SIZE= specifier. Differential Revision: https://reviews.llvm.org/D127014	2022-06-03 18:05:27 -07:00
Jake Egan	c3c75d805c	[clang][test] Mark test arm-float-abi-lto.c unsupported on AIX This test is failing after the introduction of opaque pointers (https://reviews.llvm.org/D125847). The test is flaky and fails from segmentation fault, but it's unclear why. So, mark this test unsupported while it's investigated.	2022-06-03 21:04:56 -04:00
Paul Pluzhnikov	490990bb1f	[test] Modify test to verify D126396 (Clean "./" from __FILE__ expansion) Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D127009	2022-06-03 17:54:03 -07:00
Peter Klausler	da63fee0d0	[flang][runtime] Allow extra character for E0.0 output editing When the digit count ('d') is zero in E0 editing, allow for one more output character; otherwise, any - or + sign in the output causes an output field overflow. Differential Revision: https://reviews.llvm.org/D127013	2022-06-03 17:41:22 -07:00
wren romano	3cf03f1c56	[mlir][sparse] Adding IsSparseTensorPred and updating ops to use it Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D126994	2022-06-03 17:15:31 -07:00
Peter Klausler	604016dbe4	[flang][runtime] Fix bug with extra leading zero in octal output Octal (O) output editing often emits an extra leading 0 digit due to the total digit count being off by one since word sizes aren't multiples of three bits. Differential Revision: https://reviews.llvm.org/D127012	2022-06-03 17:02:07 -07:00
Peter Klausler	66a871b973	[flang] Fix crash in IsSaved() Code was accessing ProcEntityDetails in a symbol that didn't have them. Differential Revision: https://reviews.llvm.org/D127011	2022-06-03 17:00:01 -07:00
Florian Mayer	53c1584063	[NFC] [libunwind] turn assert into static_assert Reviewed By: #libunwind, MaskRay Differential Revision: https://reviews.llvm.org/D126987	2022-06-03 16:32:42 -07:00
Clemens Wasser	42c7f494d9	[tools] Forward declare classes & remove includes Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D120208	2022-06-03 16:32:04 -07:00
Christopher Bate	9f819f4c62	[mlir][linalg] fix crash in vectorization of elementwise operations The current vectorization logic implicitly expects "elementwise" linalg ops to have projected permutations for indexing maps, but the precondition logic misses this check. This can result in a crash when executing the generic vectorization transform on an op with a non-projected permutation input indexing map. This change fixes the logic and adds a test (which crashes without this fix). Differential Revision: https://reviews.llvm.org/D127000	2022-06-03 16:38:13 -06:00
Florian Mayer	f60875254b	[DWARF] Show which augmentation character was unrecognized. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D127003	2022-06-03 15:35:33 -07:00
Brad Smith	a0bc67e555	[Hexagon] Enable IAS in the Hexagon backend Reviewed By: kparzysz Differential Revision: https://reviews.llvm.org/D123096	2022-06-03 18:15:12 -04:00
Anders Waldenborg	dd2362a8ba	[clang] Allow const variables with weak attribute to be overridden A variable with the `weak` attribute signifies that it can be replaced with a "strong" symbol at link time. Therefore it must not be emitted with "weak_odr" linkage, as that allows the backend to use its value in optimizations. The frontend already considers weak const variables as non-constant (note_constexpr_var_init_weak diagnostic) so this change makes frontend and backend consistent. This commit reverses commit `f49573d1` ("weak globals that are const should get weak_odr linkage.") from 2009-08-05, which introduced this behavior. Unfortunately that commit doesn't provide any details on why the change was made. This was discussed in https://discourse.llvm.org/t/weak-attribute-semantics-on-const-variables/62311 Differential Revision: https://reviews.llvm.org/D126324	2022-06-03 23:44:15 +02:00
Joseph Huber	1257fe193a	[Clang] Change the offload packager build to be a clang tool Summary: This patch changes the CMake build configuration for the `clang-offload-packager` to be a clang tool rather than an executable.	2022-06-03 17:35:26 -04:00
Diego Caballero	9a79b1b04c	[mlir] Add peeling xform to Codegen Strategy This patch adds the knobs to use peeling in the codegen strategy infrastructure. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D126842	2022-06-03 21:31:43 +00:00
Huan Nguyen	5ac26156fe	[BOLT][NFC] Warning for deprecated option '-reorder-blocks=cache+' Emit warning when using deprecated option '-reorder-blocks=cache+'. Auto switch to option '-reorder-blocks=ext-tsp'. Test Plan: ``` ninja check-bolt ``` Added a new test cache+-deprecated.test. Run and verify that the upstream tests are passed. Reviewed By: rafauler, Amir, maksfb Differential Revision: https://reviews.llvm.org/D126722	2022-06-03 14:16:55 -07:00
Jacob Weightman	814a0abcce	AMDGPU: allow reordering of functions in AMDGPUResourceUsageAnalysis The AMDGPUResourceUsageAnalysis was previously a CGSCC pass, and assumed that a function's callees were always analyzed prior to their callers. Since it was refactored into a module pass, this assumption no longer always holds. This results in calls being erroneously identified as indirect, and private segment space being reserved for them, which in turn results in significantly slower kernel launch latency. This patch changes the order in which the module's functions are analyzed from the order in which they occur in the module to a post-order traversal of the call graph. Perhaps Clang always generates the module's functions in such an order, but this is not the case for the Cray Fortran compiler. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D126025	2022-06-03 15:55:54 -05:00
Craig Topper	8da5d5dbdc	[RISCV] Pre-commit test cases for D126986. NFC	2022-06-03 13:31:45 -07:00
Reid Kleckner	d82b4fe50d	[bazel] Update build for config.h.cmake change	2022-06-03 12:58:04 -07:00
Tue Ly	484319f497	[libc] Make expm1f correctly rounded when the targets have no FMA instructions. Add another exceptional value and fix the case when |x| is small. Performance tests with CORE-MATH project scripts: With FMA instructions on Ryzen 1700:
```
$ ./perf.sh expm1f
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
CORE-MATH reciprocal throughput : 15.362
System LIBC reciprocal throughput : 53.194
LIBC reciprocal throughput : 14.595

$ ./perf.sh expm1f --latency
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
CORE-MATH latency : 57.755
System LIBC latency : 147.020
LIBC latency : 60.269
```
Without FMA instructions:
```
$ ./perf.sh expm1f
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
CORE-MATH reciprocal throughput : 15.362
System LIBC reciprocal throughput : 53.300
LIBC reciprocal throughput : 18.020

$ ./perf.sh expm1f --latency
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
CORE-MATH latency : 57.758
System LIBC latency : 147.025
LIBC latency : 70.304
```
Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D123440	2022-06-03 15:57:48 -04:00
Joe Loser	4fc502368a	[libc++][test] Skip string_view tests for other vendors on older modes `string_view` is supported all the way back to C++03 as an extension in `libc++`, and so many of the tests run in all standards modes for all vendors. This is unlikely desired by other standard library vendors using our test suite. So, disable the tests for vendors other than `libc++` in these older standards modes. Differential Revision: https://reviews.llvm.org/D126850	2022-06-03 13:51:49 -06:00

425658 Commits

All Branches