llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	791a7ae1ba	[AArch64] Add big-endian tests for trunc-to-tbl.ll Extra tests for D133495.	2022-09-15 15:12:34 +01:00
Dmitry Preobrazhensky	0e868aff43	[AMDGPU][MC][GFX11] Add validation of constant bus limitations for VOPD Differential Revision: https://reviews.llvm.org/D133881	2022-09-15 16:36:19 +03:00
Dmitry Preobrazhensky	c89e60bf1f	[AMDGPU][MC][GFX11] Add VOPD literals validation Differential Revision: https://reviews.llvm.org/D133864	2022-09-15 16:29:53 +03:00
Dmitry Preobrazhensky	8bb5c89205	[AMDGPU][MC][NFC] Refactor AMDGPUAsmParser::validateVOPLiteral Differential Revision: https://reviews.llvm.org/D133861	2022-09-15 16:26:14 +03:00
Mehdi Amini	0b7f05e1a1	Apply clang-tidy fixes for llvm-include-order in TypeTest.cpp (NFC)	2022-09-15 13:23:35 +00:00
Mehdi Amini	41630c6d3f	Apply clang-tidy fixes for bugprone-argument-comment in LLVMTypeTest.cpp (NFC)	2022-09-15 13:23:34 +00:00
Mehdi Amini	7637f3b850	Apply clang-tidy fixes for readability-simplify-boolean-expr in OpFormatGen.cpp (NFC)	2022-09-15 13:23:34 +00:00
Tue Ly	1c89ae71ea	[libc][math] Improve sinhf and coshf performance. Optimize `sinhf` and `coshf` by computing exp(x) and exp(-x) simultaneously. Currently `sinhf` and `coshf` are implemented using the following formulas: ``` sinh(x) = 0.5 (exp(x) - 1) - 0.5(exp(-x) - 1) cosh(x) = 0.5exp(x) + 0.5exp(-x) ``` where `exp(x)` and `exp(-x)` are calculated separately using the formula: ``` exp(x) ~ 2^hi * 2^mid * exp(dx) ~ 2^hi * 2^mid * P(dx) ``` By expanding the polynomial `P(dx)` into even and odd parts ``` P(dx) = P_even(dx) + dx * P_odd(dx) ``` we can see that the computations of `exp(x)` and `exp(-x)` have many things in common, namely: ``` exp(x) ~ 2^(hi + mid) * (P_even(dx) + dx * P_odd(dx)) exp(-x) ~ 2^(-(hi + mid)) * (P_even(dx) - dx * P_odd(dx)) ``` Expanding `sinh(x)` and `cosh(x)` with respect to the above formulas, we can compute these two functions as follows in order to maximize the shared parts: ``` sinh(x) = (e^x - e^(-x)) / 2 ~ 0.5 * (P_even * (2^(hi + mid) - 2^(-(hi + mid))) + dx * P_odd * (2^(hi + mid) + 2^(-(hi + mid)))) cosh(x) = (e^x + e^(-x)) / 2 ~ 0.5 * (P_even * (2^(hi + mid) + 2^(-(hi + mid))) + dx * P_odd * (2^(hi + mid) - 2^(-(hi + mid)))) ``` So in this patch, we perform the following optimizations for `sinhf` and `coshf`: # Use the above formulas to maximize sharing intermediate results, # Apply similar optimizations from https://reviews.llvm.org/D133870 Performance benchmark using `perf` tool from the CORE-MATH project on Ryzen 1700: For `sinhf`: ``` $ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinhf GNU libc version: 2.35 GNU libc release: stable CORE-MATH reciprocal throughput : 16.718 System LIBC reciprocal throughput : 63.151 BEFORE: LIBC reciprocal throughput : 90.116 LIBC reciprocal throughput : 28.554 (with `-msse4.2` flag) LIBC reciprocal throughput : 22.577 (with `-mfma` flag) AFTER: LIBC reciprocal throughput : 36.482 LIBC reciprocal throughput : 16.955 (with `-msse4.2` flag) LIBC reciprocal throughput : 13.943 (with `-mfma` flag) $ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinhf --latency GNU libc version: 2.35 GNU libc release: stable CORE-MATH latency : 48.821 System LIBC latency : 137.019 BEFORE LIBC latency : 97.122 LIBC latency : 84.214 (with `-msse4.2` flag) LIBC latency : 71.611 (with `-mfma` flag) AFTER LIBC latency : 54.555 LIBC latency : 50.865 (with `-msse4.2` flag) LIBC latency : 48.700 (with `-mfma` flag) ``` For `coshf`: ``` $ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh coshf GNU libc version: 2.35 GNU libc release: stable CORE-MATH reciprocal throughput : 16.939 System LIBC reciprocal throughput : 19.695 BEFORE: LIBC reciprocal throughput : 52.845 LIBC reciprocal throughput : 29.174 (with `-msse4.2` flag) LIBC reciprocal throughput : 22.553 (with `-mfma` flag) AFTER: LIBC reciprocal throughput : 37.169 LIBC reciprocal throughput : 17.805 (with `-msse4.2` flag) LIBC reciprocal throughput : 14.691 (with `-mfma` flag) $ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh coshf --latency GNU libc version: 2.35 GNU libc release: stable CORE-MATH latency : 48.478 System LIBC latency : 48.044 BEFORE LIBC latency : 99.123 LIBC latency : 85.595 (with `-msse4.2` flag) LIBC latency : 72.776 (with `-mfma` flag) AFTER LIBC latency : 57.760 LIBC latency : 53.967 (with `-msse4.2` flag) LIBC latency : 50.987 (with `-mfma` flag) ``` Reviewed By: orex, zimmermann6 Differential Revision: https://reviews.llvm.org/D133913	2022-09-15 09:20:39 -04:00
Alexey Lapshin	adabfb5e32	[DWARFLinker][NFC] Set the target DWARF version explicitly. Currently, DWARFLinker determines the target DWARF version internally. It examines incoming object files, detects the maximal DWARF version and uses that version for the output file. This patch allows the consumer of DWARFLinker to set the output DWARF version explicitly, so that DWARFLinker uses the specified version instead of the autodetected one. It allows consumers to use different logic for setting the target DWARF version. For example, instead of the maximally used version, someone could set a higher version to convert from DWARFv4 to DWARFv5 (this possibility is not supported yet, but it would be good for the interface to support it). Another variant is to set the target version through the command line. In this patch, the autodetection is moved into the consumers (DwarfLinkerForBinary.cpp, DebugInfoLinker.cpp). Differential Revision: https://reviews.llvm.org/D132755	2022-09-15 16:06:10 +03:00
Simon Pilgrim	0ec028fe10	[CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops Vector shift by const uniform is the cheapest shift instruction we have; non-const uniform has a marginally higher cost - some targets 'splat' the amount internally to use the shift-per-element instruction, others see a higher cost for the explicit zeroing of the upper bits for the (64-bit) shift amount. This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)	2022-09-15 14:05:30 +01:00
Florian Hahn	8f19de848b	[AArch64] Add big-endian tests for zext-to-tbl.ll Extra tests for D120571.	2022-09-15 14:01:27 +01:00
wanglei	a65557d4b3	[LoongArch] Fixup value adjustment in applyFixup A complete implementation of `applyFixup` for D132323. Makes `LoongArchAsmBackend::shouldForceRelocation` determine whether the relocation types must be forced. This patch also adds range and alignment checks for `b*` instructions' operands, at which point the offset to a label is known. Differential Revision: https://reviews.llvm.org/D132818	2022-09-15 21:00:22 +08:00
Aleksandr Platonov	3ce7d256f2	[clang][RecoveryExpr] Don't perform alignment check if parameter type is dependent This patch fixes a crash caused by calling getTypeAlignInChars() with a dependent type. Reviewed By: hokein Differential Revision: https://reviews.llvm.org/D133886	2022-09-15 15:51:43 +03:00
Ivan Kosarev	693f816288	[AMDGPU][SILoadStoreOptimizer] Merge SGPR_IMM scalar buffer loads. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D133787	2022-09-15 13:48:51 +01:00
Jez Ng	5b8da10b87	[lld-macho] Add support for N_INDR symbols This is similar to the `-alias` CLI option, but it gives finer-grained control in that it allows the aliased symbols to be treated as private externs. While working on this, I realized that our `-alias` handling did not cover the cases where the aliased symbol is a common or dylib symbol, nor the case where we have an undefined that gets treated specially and converted to a defined later on. My N_INDR handling neglects this too for now; I've added checks and TODO messages for these. `N_INDR` symbols cropped up as part of our attempt to link swift-stdlib. Reviewed By: #lld-macho, thakis, thevinster Differential Revision: https://reviews.llvm.org/D133825	2022-09-15 08:35:24 -04:00
Haojian Wu	e488ce29ec	[mlir] Remove the unused source file. It seems to be a missing file in `84d07d0213` Differential Revision: https://reviews.llvm.org/D133937	2022-09-15 14:30:40 +02:00
Arjun P	edd173201c	[MLIR][Presburger] clarify why -0 is used instead of 0 (NFC)	2022-09-15 13:23:48 +01:00
Ilia Diachkov	3544d200d9	[SPIRV] add IR regularization pass The patch adds the regularization pass that prepares LLVM IR for the IR translation. It also contains the following changes: - reduce indentation, make getNonParametrizedType, getSamplerType, getPipeType, getImageType, getSampledImageType static in SPIRVBuiltins, - rename mayBeOclOrSpirvBuiltin to getOclOrSpirvBuiltinDemangledName, - move isOpenCLBuiltinType, isSPIRVBuiltinType, isSpecialType from SPIRVGlobalRegistry.cpp to SPIRVUtils.cpp, renaming isSpecialType to isSpecialOpaqueType, - implement getTgtMemIntrinsic() in SPIRVISelLowering, - add hasSideEffects = 0 in Pseudo (SPIRVInstrFormats.td), - add legalization rule for G_MEMSET, correct G_BRCOND rule, - add capability processing for OpBuildNDRange in SPIRVModuleAnalysis, - don't correct types of registers holding constants and used in G_ADDRSPACE_CAST (SPIRVPreLegalizer.cpp), - lower memset/bswap intrinsics to functions in SPIRVPrepareFunctions, - change TargetLoweringObjectFileELF to SPIRVTargetObjectFile in SPIRVTargetMachine.cpp, - correct comments. 5 LIT tests are added to show the improvement. Differential Revision: https://reviews.llvm.org/D133253 Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com> Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com> Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com> Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>	2022-09-15 15:53:44 +03:00
Michael Platings	f0c234d2a6	[NFC] Don't assume llvm directory is CMake root This makes the file consistent with ARM/CMakeLists.txt	2022-09-15 13:06:54 +01:00
Haojian Wu	10250c5a2a	Fix bazel build after `84d07d0213`.	2022-09-15 13:52:46 +02:00
Nicolas Vasilache	a8645a3c2d	[mlir][Linalg] Post submit addressed comments missed in f0cdc5bcd3f25192f12bfaff072ce02497b59c3c Differential Revision: https://reviews.llvm.org/D133936	2022-09-15 04:47:41 -07:00
Tobias Hieta	24c10abe83	[NFC] Fix exception in version-check.py script	2022-09-15 13:34:29 +02:00
Aaron Ballman	e076680bd5	Add a "Potentially Breaking Changes" section to the Clang release notes Sometimes we make changes to the compiler that we expect may cause disruption for users. For example, we may strengthen a warning to default to be an error, or fix an accepts-invalid bug that's been around for a long time, etc which may cause previously accepted code to now be rejected. Rather than hope users discover that information by reading all of the release notes, it's better that we call these out in one location at the top of the release notes. Based on feedback collected in the discussion at: https://discourse.llvm.org/t/configure-script-breakage-with-the-new-werror-implicit-function-declaration/65213/ Differential Revision: https://reviews.llvm.org/D133771	2022-09-15 07:29:49 -04:00
Groverkss	84d07d0213	[MLIR][Presburger] Improve unittest parsing This patch adds better functions for parsing MultiAffineFunctions and PWMAFunctions in Presburger unittests. A PWMAFunction can now be parsed as: ``` PWMAFunction result = parsePWMAF({ {"(x, y) : (x >= 10, x <= 20, y >= 1)", "(x, y) -> (x + y)"}, {"(x, y) : (x >= 21)", "(x, y) -> (x + y)"}, {"(x, y) : (x <= 9)", "(x, y) -> (x - y)"}, {"(x, y) : (x >= 10, x <= 20, y <= 0)", "(x, y) -> (x - y)"}, }); ``` which is much more readable than the old format since the output can be described as an AffineMap, instead of coefficients. This patch also adds support for parsing divisions in MultiAffineFunctions and PWMAFunctions which was previously not possible. Reviewed By: arjunp Differential Revision: https://reviews.llvm.org/D133654	2022-09-15 12:09:00 +01:00
esmeyi	6e0e926c2f	[PowerPC] Convert to comparison against zero even when the optimization doesn't happen in the peephole optimizer. Summary: Converting a comparison against 1 or -1 into a comparison against 0 can exploit record-form instructions for comparison optimization. The conversion will happen only when a record-form instruction can be used to replace the comparison during the peephole optimizer (see function optimizeCompareInstr). In post-RA, we also want to optimize the comparison by using the record form (see D131873) and it requires additional dataflow analysis to reliably find uses of the CR register set. It's reasonable to share the conversion between the peephole optimizer and the post-RA optimizer. Converting to a comparison against zero even when the optimization doesn't happen in the peephole optimizer may create additional opportunities for the post-RA optimization. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D131374	2022-09-15 06:06:25 -04:00
Guray Ozen	5279e11f06	[mlir][linalg] Retire Linalg's Vectorization Pattern This revision retires the LinalgCodegenStrategy vectorization pattern. Please see the context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785. This revision improves the transform dialect's VectorizeOp in different ways below: - Adds LinalgDialect as a dependent dialect. When `transform.structured.vectorize` vectorizes `tensor.pad`, it generates `linalg.init_tensor`. In this case, linalg dialect must be registered. - Inserts CopyVectorizationPattern in order to vectorize `memref.copy`. - Creates two attributes: `disable_multi_reduction_to_contract_patterns` and `disable_transfer_permutation_map_lowering_patterns`. They are limiting the power of vectorization and are currently intended for testing purposes. It also removes some of the "CHECK: vector.transfer_write" in the vectorization.mlir test. They are redundant writes, at the end of the code there is a rewrite to the same place. Transform dialect no longer generates them. Depends on D133684 that retires the LinalgCodegenStrategy vectorization pass. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D133699	2022-09-15 11:23:46 +02:00
Guray Ozen	51e0946591	[mlir][linalg] Retire Linalg's StrategyVectorizePass We retire linalg's strategy vectorize pass. Our goal is to use transform dialect instead of passes. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D133684	2022-09-15 11:12:18 +02:00
Tom Praschan	220d850823	[clangd] Fix hover on symbol introduced by using declaration This fixes https://github.com/clangd/clangd/issues/1284. The example there was C++20's "using enum", but I noticed that we had the same issue for other using-declarations. Differential Revision: https://reviews.llvm.org/D133664	2022-09-15 13:02:58 +02:00
Marco Elver	72e7575ffe	[GlobalISel][AArch64] Fix pcsections for expanded atomics and add more tests Add fix for propagation of !pcsections metadata for expanded atomics, together with more tests for interesting atomic instructions (based on llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D133710	2022-09-15 10:36:11 +02:00
Christian Sigg	70ac466676	[Bazel] Add lit tests to bazel builds. Add BUILD.bazel files for most of the MLIR tests and lit tests itself. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D133455	2022-09-15 07:49:37 +02:00
Evgeniy Brevnov	03a102e3b2	[JumpThreading][NFC] Reuse existing DT instead of recomputation (newPM) This is the same change as `503d5771b6` with the same intent but for new pass manager.	2022-09-15 12:27:57 +07:00
Fangrui Song	057fb8153a	[IRBuilder] Fix -Wunused-variable in non-assertion build. NFC	2022-09-14 22:14:36 -07:00
Dhruva Chakrabarti	839ac62c50	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `7539e9cf81`.	2022-09-15 03:08:46 +00:00
Vitaly Buka	f221720e82	[nfc][msan] getShadowOriginPtr on <N x ptr> Some vector instructions can benefit from Addr as <N x ptr>. Differential Revision: https://reviews.llvm.org/D133681	2022-09-14 19:18:52 -07:00
Vitaly Buka	72b776168c	[IRBuilder] Add CreateMaskedExpandLoad and CreateMaskedCompressStore	2022-09-14 19:18:52 -07:00
Vitaly Buka	f404169f24	[NFC][msan] Rename variables to match definition	2022-09-14 19:16:27 -07:00
Vitaly Buka	2209be15a5	[NFC][msan] Convert some code to early returns Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133673	2022-09-14 19:16:11 -07:00
Vitaly Buka	bcf3d666b4	[NFC][msan] Simplify llvm.masked.load origin code Reviewed By: kda Differential Revision: https://reviews.llvm.org/D133652	2022-09-14 19:14:29 -07:00
Vitaly Buka	d421223e25	[msan] Resolve FIXME from D133880 We don't need to change tests: we convertToBool unconditionally only before OR.	2022-09-14 18:55:57 -07:00
Vitaly Buka	2cf9b25aa9	[test][msan] Use implicit-check-not	2022-09-14 18:44:22 -07:00
Sheng	bea33f75e2	[M68k] Fix the crash of fast register allocator `MOVEM` is used to spill the register, which causes problems with 1-byte data, since it only supports word (2 bytes) and long (4 bytes) sizes. We change to use the normal `move` instruction to spill 1-byte data. Fixes #57660 Reviewed By: myhsu Differential Revision: https://reviews.llvm.org/D133636	2022-09-15 09:24:22 +08:00
Jeff Niu	9c7ba57e70	[mlir] Allow `Attribute::print` to elide the type This patch adds a flag to `Attribute::print` that prints the attribute without its type. Fixes #57689 Reviewed By: rriddle, lattner Differential Revision: https://reviews.llvm.org/D133822	2022-09-14 18:17:30 -07:00
Jeff Niu	9eec5284c7	[mlir][ods] Add cppClassName to ConfinedType So ODS can generate `OneTypedResult` when a ConfinedType is used as a result type. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D133893	2022-09-14 18:16:07 -07:00
Giorgis Georgakoudis	7539e9cf81	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert, jhuber6, ABataev Differential Revision: https://reviews.llvm.org/D102107	2022-09-15 00:54:05 +00:00
Stanislav Mekhanoshin	ef4b9c33f5	Fix crash while printing MMO target flags MachineMemOperand::print can dereference a NULL pointer if TII is not passed from the printMemOperand. This does not happen while dumping the DAG/MIR from llc but crashes the debugger if a dump() method is called from gdb. Differential Revision: https://reviews.llvm.org/D133903	2022-09-14 17:29:48 -07:00
Craig Topper	5888c157a7	[RISCV] Simplify some code in RISCVInstrInfo::verifyInstruction. NFCI This code was written as if it lived in the MC layer instead of the CodeGen layer. We get the MCInstrDesc directly from MachineInstr. And we can use RISCVSubtarget::is64Bit instead of going to the Triple. Differential Revision: https://reviews.llvm.org/D133905	2022-09-14 17:07:21 -07:00
Sam Clegg	8273ca1421	[MC] Fix typo in getSectionAddressSize comment. NFC The comment was referring to a now nonexistent function that was removed in `93e3cf0ebd`. Differential Revision: https://reviews.llvm.org/D133098	2022-09-14 15:15:41 -07:00
Craig Topper	50a699e362	[IR][VP] Remove IntrArgMemOnly from vp.gather/scatter. IntrArgMemOnly is only valid for intrinsics that use a scalar pointer argument. These intrinsics use a vector of pointer. Alias analysis will try to find a scalar pointer argument and will return incorrect alias results when it doesn't find one. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133898	2022-09-14 15:00:07 -07:00
Craig Topper	6384044df4	[GVN][VP] Add test case for incorrect removal of a vp.gather. NFC Pre-commit for D133898 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133899	2022-09-14 15:00:07 -07:00
Jez Ng	118bfde90a	[lld-macho] Have ICF dedup explicitly-defined selrefs This is what ld64 does (though it doesn't use ICF to do this; instead it always dedups selrefs by default). We'll want to dedup implicitly-defined selrefs as well, but I will leave that for future work. Additionally, I'm not super happy with the current LLD implementation because I think it is rather janky and inefficient. But at least it moves us toward the goal of closing the size gap with ld64. I've described ideas for cleaning up our implementation here: https://github.com/llvm/llvm-project/issues/57714 Differential Revision: https://reviews.llvm.org/D133780	2022-09-14 17:59:22 -04:00
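
The `sinhf`/`coshf` commit (1c89ae71ea) in the table above describes sharing the even and odd parts of the `exp` polynomial between `exp(x)` and `exp(-x)`. A rough Python sketch of that idea follows. It is not the libc implementation: it assumes `hi + mid` is collapsed into a single integer `k`, it uses plain Taylor coefficients rather than whatever polynomial the patch uses, and the names `sinh_cosh` and `_even_odd_parts` are hypothetical.

```python
import math

LN2 = math.log(2.0)


def _even_odd_parts(dx):
    """Return (P_even(dx), P_odd(dx)) with exp(dx) ~ P_even + dx * P_odd.

    Assumption: truncated Taylor series, adequate for |dx| <= ln(2)/2.
    """
    dx2 = dx * dx
    # Even part approximates cosh(dx): 1 + dx^2/2! + dx^4/4! + ...
    p_even = 1.0 + dx2 * (1 / 2 + dx2 * (1 / 24 + dx2 * (1 / 720 + dx2 / 40320)))
    # Odd part approximates sinh(dx)/dx: 1 + dx^2/3! + dx^4/5! + ...
    p_odd = 1.0 + dx2 * (1 / 6 + dx2 * (1 / 120 + dx2 * (1 / 5040 + dx2 / 362880)))
    return p_even, p_odd


def sinh_cosh(x):
    """Compute (sinh(x), cosh(x)) from one shared range reduction."""
    # Range reduction: x = k * ln(2) + dx, so exp(x) = 2^k * exp(dx).
    k = round(x / LN2)
    dx = x - k * LN2
    p_even, p_odd = _even_odd_parts(dx)

    two_k = math.ldexp(1.0, k)       # 2^k
    two_neg_k = math.ldexp(1.0, -k)  # 2^(-k)
    scale_sum = two_k + two_neg_k
    scale_diff = two_k - two_neg_k

    # Same combination as in the commit message, with k standing in for hi + mid.
    sinh_x = 0.5 * (p_even * scale_diff + dx * p_odd * scale_sum)
    cosh_x = 0.5 * (p_even * scale_sum + dx * p_odd * scale_diff)
    return sinh_x, cosh_x
```

Both results reuse `p_even`, `p_odd`, and the two power-of-two scale factors, so the polynomial is evaluated once per call instead of once for `exp(x)` and once for `exp(-x)`.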

436098 Commits