llvm-project

Commit Graph

Author	SHA1	Message	Date
Joe Nash	f1cfaa956d	[AMDGPU] Use GFX11 S_PACK_HL instruction in more cases Differential Revision: https://reviews.llvm.org/D128527	2022-06-28 14:35:19 +01:00
Valentin Clement	ae35635f34	[flang] Fix bugs relating to support for characters of different kinds Fix bugs relating to support for characters of different kinds. Lowering was creating bad FIR and MLIR that crashed in conversion to LLVM IR. This patch is part of the upstreaming effort from fir-dev branch. Reviewed By: jeanPerier Differential Revision: https://reviews.llvm.org/D128723 Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>	2022-06-28 15:28:58 +02:00
Mehdi Amini	7faf75bb3e	Introduce a new Dense Array attribute This attribute is similar to DenseElementsAttr but does not support splat. As such it has a much simpler API and does not need any smart iterator: it exposes direct ArrayRef access. A new syntax is introduced so that the generic printing/parsing looks like: [:i64 1, -2, 3] This attribute begins like an ArrayAttr but has a `:` token after the opening square brace to introduce the element type (supported are I8, I16, I32, I64, F32, F64) and the comma-separated list for the data. This is particularly convenient for attributes intended to be small, like those referring to shapes. For example a `transpose` operation with a `dims` attribute could be defined as such: let arguments = (ins AnyTensor:$input, DenseI64ArrayAttr:$dims); let assemblyFormat = "$input `dims` `=` $dims attr-dict : type($input)"; And printed this way (the element type is elided in this case): transpose %input dims = [0, 2, 1] : tensor<2x3x4xf32> The C++ API for dims would just directly return an ArrayRef<int64> RFC: https://discourse.llvm.org/t/rfc-introduce-a-new-dense-array-attribute/63279 Recommit with a custom DenseArrayBaseAttrStorage class to ensure over-alignment of the storage to the largest type. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D123774	2022-06-28 13:28:06 +00:00
Valentin Clement	e2f313df8f	[flang] Make sure that conversions are applied in FORALL degenerate contexts For the rapid triage push, just add a TODO for the degenerate POINTER assignment case. The LHD ought to be a variable of type !fir.box, but it is currently returning a shadow variable for the raw data pointer. More investigation is needed there. Make sure that conversions are applied in FORALL degenerate contexts. This patch is part of the upstreaming effort from fir-dev branch. Reviewed By: jeanPerier Differential Revision: https://reviews.llvm.org/D128724 Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>	2022-06-28 15:24:55 +02:00
Valentin Clement	3348c08359	[flang] Add lowering tests Add lowering tests left behind during the upstreaming. This patch is part of the upstreaming effort from fir-dev branch. Reviewed By: jeanPerier Differential Revision: https://reviews.llvm.org/D128721 Co-authored-by: Jean Perier <jperier@nvidia.com> Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>	2022-06-28 15:22:21 +02:00
Vladislav Khmelevsky	425dda76e9	[BOLT][AArch64] Handle gold linker veneers The gold linker veneers are written between functions without symbols, so we need to handle them specially in BOLT. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei Differential Revision: https://reviews.llvm.org/D128082	2022-06-28 16:14:05 +03:00
Nikita Popov	278a47cc92	[IRBuilder] Migrate vector operations to fold infrastructure Migrate extractelement, insertelement and shufflevector to use the FoldXYZ rather than CreateXYZ APIs. This is probably NFC in practice, because the places using InstSimplifyFolder probably aren't using vector operations.	2022-06-28 15:11:15 +02:00
Mehdi Amini	744d06e4f2	Revert "Introduce a new Dense Array attribute" This reverts commit `508eb41d82`. UBSAN indicates some pointer mis-alignment I need to investigate	2022-06-28 12:47:15 +00:00
Yi Kong	b83b82f9f4	[lldb] Fix build on older Linux kernel versions PERF_COUNT_SW_DUMMY is introduced in Linux 3.12. Differential Revision: https://reviews.llvm.org/D128707	2022-06-28 20:23:33 +08:00
Pavel Samolysov	170c4d21bd	[ArgPromotion] Unify byval promotion with non-byval It makes sense to handle byval promotion in the same way as non-byval but also allowing `store` instructions. However, these should use the same checks as the `load` instructions do, i.e. be part of the `ArgsToPromote` collection. For these instructions, the check for interfering modifications can be disabled, though. The promotion algorithm itself has been modified a lot: all the accesses (i.e. loads and stores) are rewritten to the emitted `alloca` instructions. To optimize these new `alloca`s out, the `PromoteMemToReg` function from `Transforms/Utils/PromoteMemoryToRegister.cpp` file is invoked after promotion. In order to let the `PromoteMemToReg` promote as many `alloca`s as it is possible, there should be no `GEP`s from the `alloca`s. To eliminate the `GEP`s, its own `alloca` is generated for every argument part because a single `alloca` for the whole argument (that significantly simplifies the code of the pass though) unfortunately cannot be used. The idea comes from the following discussion: https://reviews.llvm.org/D124514#3479676 Differential Revision: https://reviews.llvm.org/D125485	2022-06-28 15:19:58 +03:00
Mehdi Amini	508eb41d82	Introduce a new Dense Array attribute This attribute is similar to DenseElementsAttr but does not support splat. As such it has a much simpler API and does not need any smart iterator: it exposes direct ArrayRef access. A new syntax is introduced so that the generic printing/parsing looks like: [:i64 1, -2, 3] This attribute begins like an ArrayAttr but has a `:` token after the opening square brace to introduce the element type (supported are I8, I16, I32, I64, F32, F64) and the comma-separated list for the data. This is particularly convenient for attributes intended to be small, like those referring to shapes. For example a `transpose` operation with a `dims` attribute could be defined as such: let arguments = (ins AnyTensor:$input, DenseI64ArrayAttr:$dims); let assemblyFormat = "$input `dims` `=` $dims attr-dict : type($input)"; And printed this way (the element type is elided in this case): transpose %input dims = [0, 2, 1] : tensor<2x3x4xf32> The C++ API for dims would just directly return an ArrayRef<int64> RFC: https://discourse.llvm.org/t/rfc-introduce-a-new-dense-array-attribute/63279 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D123774	2022-06-28 12:08:25 +00:00
Ting Wang	88b6d22791	[PowerPC] Improve getNormalLoadInput to reach more splat load opportunities There are straightforward splat load opportunities blocked by getNormalLoadInput(), since those cases involve consecutive bitcasts. Improve by looking through bitcasts. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D128703	2022-06-28 08:02:49 -04:00
Alex Bradbury	7bcfcabbd1	[RISCV] Implement support for the Zicbop extension Implements the ratified RISC-V Base Cache Management Operation ISA Extension: Zicbop, as described in https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf. This is implemented in a separate patch to Zicbom and Zicboz due to it requiring a new ASM operand type to be defined. Differential Revision: https://reviews.llvm.org/D117433	2022-06-28 12:43:26 +01:00
Alex Bradbury	4f40ca53ce	[RISCV] Implement support for the Zicbom and Zicboz extensions Implements the ratified RISC-V Base Cache Management Operation ISA Extensions: Zicbom and Zicboz, as described in https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf. Zicbop is implemented in a separate patch due to it requiring a new ASM operand type to be defined. As discussed in the relevant issue in the upstream spec https://github.com/riscv/riscv-CMOs/issues/47, the cbo.* instructions use the format (rs1) or 0(rs1) for their operand, similar to the AMOs. Differential Revision: https://reviews.llvm.org/D117432	2022-06-28 12:43:25 +01:00
Nikita Popov	f5bab24afe	[ValueList] Include Error.h (NFC) Hopefully fixes clang-ppc64-aix. Apparently std::function can't be instantiated with a forward declared type in some environments.	2022-06-28 13:26:20 +02:00
Mehdi Amini	2d70faa299	Apply clang-tidy fixes for readability-simplify-boolean-expr in TosaToLinalg.cpp (NFC)	2022-06-28 11:21:43 +00:00
Mehdi Amini	cf3f477d30	Apply clang-tidy fixes for readability-simplify-boolean-expr in Utils.cpp (NFC)	2022-06-28 11:21:37 +00:00
Tim Northover	4aafebce52	SelectionDAG: allow FP extensions when folding extract/insert. Before, we were trying to sign extend half -> float, and asserted in getNode.	2022-06-28 12:08:35 +01:00
Ting Wang	22b8f3511a	[PowerPC] Add base test case for load splat opportunity Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D128718	2022-06-28 06:55:23 -04:00
Matthias Springer	04dac2ca7c	[mlir][SCF][bufferize][NFC] Implement resolveConflicts for ParallelInsertSliceOp This was previously implemented as part of the BufferizableOpInterface of ForEachThreadOp. Moving the implementation to ParallelInsertSliceOp to be consistent with the remaining ops and to have a nice example op that can serve as a blueprint for other ops. Differential Revision: https://reviews.llvm.org/D128666	2022-06-28 12:18:22 +02:00
Guillaume Chatelet	f6f53e990d	[libc] Disable use of inlined builtins for tests	2022-06-28 10:17:46 +00:00
LLVM GN Syncbot	403466860b	[gn build] Port `03975b7f0e`	2022-06-28 09:52:16 +00:00
Guillaume Chatelet	81863dd303	[libc] Fix missing static_cast	2022-06-28 09:50:54 +00:00
Mikhail Goncharov	c6c124ca80	Fixed unused variable warning.	2022-06-28 11:44:16 +02:00
lewuathe	036a699675	[mlir][complex] Canonicalization for consecutive complex.add and sub Add basic canonicalization for consecutive complex.add and sub operations. Reviewed By: pifon2a Differential Revision: https://reviews.llvm.org/D128702	2022-06-28 11:41:16 +02:00
Guillaume Chatelet	5ae9b42efb	[libc] Use ASSERT_ instead of EXPECT_ in memcmp tests	2022-06-28 09:36:04 +00:00
Florian Hahn	03975b7f0e	[VPlan] Move recipe implementations to separate file (NFC). This patch moves the code for recipe implementations to a separate file. The benefits are: * Keep VPlan.cpp smaller => faster compile-time during parallel builds. * Keep code for logical units together As a follow-up I am also planning on moving all ::execute implementations from LoopVectorize.cpp over to the new file, which should help to reduce the size of the file a bit. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D127965	2022-06-28 10:34:30 +01:00
Sander de Smalen	fbefc62a96	[AArch64][SME] Sink tile offset operands into the loop for load/store instructions. This helps ISel decompose the generic offset for the tile into a base + offset. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D128508	2022-06-28 10:28:36 +01:00
David Sherwood	054faac9f9	[AArch64][SME] Add SVE2 psel, uclamp, sclamp and revd IR intrinsics When the SME feature is enabled we also gain access to a few extra SVE2 instructions. This patch adds LLVM IR intrinsics to make use of these new instructions: @llvm.aarch64.sve.psel @llvm.aarch64.sve.revd @llvm.aarch64.sve.sclamp @llvm.aarch64.sve.uclamp Differential Revision: https://reviews.llvm.org/D128332	2022-06-28 10:25:06 +01:00
Guillaume Chatelet	7f5d7bc827	[libc][mem*] Introduce Algorithms for new mem framework This patch is a subpart of D125768 intended to make the review easier. This patch introduces the same algorithms as in `libc/src/string/memory_utils/elements.h` but using the new API. Differential Revision: https://reviews.llvm.org/D128335	2022-06-28 09:23:49 +00:00
Nikita Popov	941c8e0ea5	[Bitcode] Support expanding constant expressions into instructions This implements an autoupgrade from constant expressions to instructions, which is needed for https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179. The basic approach is that constant expressions (CST_CODE_CE_* records) now initially only create a BitcodeConstant value that holds opcode, flags and operands IDs. Then, when the value actually gets used, it can be converted either into a constant expression (if that expression type is still supported) or into a sequence of instructions. As currently all expressions are still supported, -expand-constant-exprs is added for testing purposes, to force expansion. PHI nodes require special handling, because the constant expression needs to be evaluated on the incoming edge. We do this by putting it into a temporary block and then wiring it up appropriately afterwards (for non-critical edges, we could also move the instructions into the predecessor). This also removes the need for the forward referenced constants machinery, as the BitcodeConstants only hold value IDs. At the point where the value is actually materialized, no forward references are needed anymore. Differential Revision: https://reviews.llvm.org/D127729	2022-06-28 11:09:46 +02:00
Sander de Smalen	180cc74de9	[AArch64] Update SME load/store intrinsics to work on opaque pointers. These intrinsics should be able to use opaque pointers, because the load/store type is already encoded in their names and return/operand type. Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D128505	2022-06-28 09:50:11 +01:00
David Sherwood	f916ee0fb1	[AArch64][SME] Add SME outer product intrinsics This patch adds the following intrinsics to support the SME ACLE: * @llvm.aarch64.sme.mopa: Non-widening outer product + accumulate * @llvm.aarch64.sme.mops: Non-widening outer product + subtract * @llvm.aarch64.sme.mopa.wide: Widening outer product + accumulate * @llvm.aarch64.sme.mops.wide: Widening outer product + subtract * @llvm.aarch64.sme.smopa.wide: Widening signed sum of outer product + accumulate * @llvm.aarch64.sme.smops.wide: Widening signed sum of outer product + subtract * @llvm.aarch64.sme.umopa.wide: Widening unsigned sum of outer product + accumulate * @llvm.aarch64.sme.umops.wide: Widening unsigned sum of outer product + subtract * @llvm.aarch64.sme.sumopa.wide: Widening signed by unsigned sum of outer product + accumulate * @llvm.aarch64.sme.sumops.wide: Widening signed by unsigned sum of outer product + subtract * @llvm.aarch64.sme.usmopa.wide: Widening unsigned by signed sum of outer product + accumulate * @llvm.aarch64.sme.usmops.wide: Widening unsigned by signed sum of outer product + subtract Differential Revision: https://reviews.llvm.org/D127956	2022-06-28 09:41:44 +01:00
Nikita Popov	5548e807b5	[IR] Remove support for extractvalue constant expression This removes the extractvalue constant expression, as part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179. extractvalue is already not supported in bitcode, so we do not need to worry about bitcode auto-upgrade. Uses of ConstantExpr::getExtractValue() should be replaced with IRBuilder::CreateExtractValue() (if the fact that the result is constant is not important) or ConstantFoldExtractValueInstruction() (if it is). Though for this particular case, it is also possible and usually preferable to use getAggregateElement() instead. The C API function LLVMConstExtractValue() is removed, as the underlying constant expression no longer exists. Instead, LLVMBuildExtractValue() should be used (which will constant fold or create an instruction). Depending on the use-case, LLVMGetAggregateElement() may also be used instead. Differential Revision: https://reviews.llvm.org/D125795	2022-06-28 10:40:17 +02:00
Sander de Smalen	ab7218277c	[AArch64][SME] NFC: Extend tile_slice ComplexPattern to match default case. A tile slice offset of '0' is the default and by moving this into SelectSMETileSlice we can remove some redundant patterns. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D128506	2022-06-28 09:15:52 +01:00
Lian Wang	96ab083622	[RISCV] Support VECTOR_REVERSE mask operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128627	2022-06-28 07:48:51 +00:00
Guillaume Chatelet	3c126d5fe4	[Alignment] Replace commonAlignment with std::min `commonAlignment` is a shortcut to pick the smallest of two `Align` objects. As-is it doesn't bring much value compared to `std::min`. Differential Revision: https://reviews.llvm.org/D128345	2022-06-28 07:15:02 +00:00
Tobias Hieta	3f0578dd87	[clang-cl] Add -emit-ast to clang-cl driver Also make the output of -emit-ast end up where /o points. The same with .plist files from the static analyzer. These are changes needed to make it possible to do CTU static analysing work with clang-cl. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D128409	2022-06-28 09:11:34 +02:00
Martin Boehme	86866107b8	[Clang] Fix: Restore warning inadvertently removed by D126061. Before D126061, Clang would warn about this code ``` struct X { [[deprecated]] struct Y {}; }; ``` with the warning attribute 'deprecated' is ignored, place it after "struct" to apply attribute to type declaration D126061 inadvertently caused this warning to no longer be emitted. This patch restores the previous behavior. The reason for the bug is that after D126061, C++11 attributes applied to a member declaration are no longer placed in `DS.getAttributes()` but are instead tracked in a separate list (`DeclAttrs`). In the case of a free-standing decl-specifier-seq, we would simply ignore the contents of this list. Instead, we now pass the list on to `Sema::ParsedFreeStandingDeclSpec()` so that it can issue the appropriate warning. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D128499	2022-06-28 08:52:58 +02:00
Phoebe Wang	527ef8ca98	Reland "[X86] Support `_Float16` on SSE2 and up" Enable `COMPILER_RT_HAS_FLOAT16` to solve the lit fail. This is split from D113107 to address #56204 and https://discourse.llvm.org/t/how-to-build-compiler-rt-for-new-x86-half-float-abi/63366 Reviewed By: zahiraam, rjmccall, bkramer Differential Revision: https://reviews.llvm.org/D128571	2022-06-28 14:38:56 +08:00
wlei	7e86b13c63	[CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie This is the follow-up patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Previously, the promotion and merging of contexts was based on the SampleContext (the array of frames), which costs a lot of memory. This patch detaches the tracker from the array ref and uses the context trie itself instead. This can save a lot of memory and benefits both the compiler's CS inliner and llvm-profgen's pre-inliner. One structure that needs special treatment is `FuncToCtxtProfiles`, which is used to get all the functionSamples for one function to do the merging and promoting. Previously, it searched each function's context and traversed the trie to get the node of the context. Now that we don't have the context inside the profile, we directly use an auxiliary map, `ProfileToNodeMap`, which is initialized with the FunctionSamples to TrieNode relations and kept up to date while nodes are promoted and merged. Moreover, I was expecting the results before and after to remain the same, but I found that the order of FuncToCtxtProfiles matters and affects the results. This can happen in the recursive context case, but the difference should be small. Since we no longer have the context, I just used a vector for the order, so the result is still deterministic. Measured on one huge (12GB) profile from one of our internal services. The profile similarity difference is 99.999%, the running time is improved by 3X (debug mode), and the memory is reduced from 170GB to 90GB. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D127031	2022-06-27 23:22:21 -07:00
wlei	aa58b7b1e3	[CSSPGO][llvm-profgen] Reimplement computeSummaryAndThreshold using context trie Follow-up patch to https://reviews.llvm.org/D125246, support `computeSummaryAndThreshold` based on context trie. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D127026	2022-06-27 23:22:21 -07:00
wlei	eba5749262	[CSSPGO][llvm-profgen] Reimplement CS profile generator using context trie Our investigation showed ProfileMap's key is the bottleneck of the memory consumption for CS profile generation on some large services. This patch tries to optimize it by storing the CS function samples using the context trie tree structure instead of the context frame array ref. Parts of code in `ContextTrieNode` are reused. Our experiment on one internal service showed that the context key's memory can be reduced from 80GB to 300MB. To be compatible with non-CS profiles, the profile writer still needs to use ProfileMap as input, so rebuild the ProfileMap using the context trie in `postProcessProfiles`. The optimization is not complete yet, next step is to reimplement Pre-inliner or profile trimmer, after that, ProfileMap should be small to be written. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D125246	2022-06-27 23:22:21 -07:00
Petr Hosek	834a38bbcb	Revert "[CoverageMapping] Remove dots from paths inside the profile" This reverts commit `d1b098fc82` since it is failing on Windows builders.	2022-06-27 23:20:54 -07:00
Petr Hosek	d1b098fc82	[CoverageMapping] Remove dots from paths inside the profile We already remove dots from collected paths and path mappings. This makes it difficult to match paths inside the profile which contain dots. For example, we would never match /path/to/../file.c because the collected path is always normalized to /path/file.c. This change enables dot removal for paths inside the profile to address the issue. Differential Revision: https://reviews.llvm.org/D122750	2022-06-27 23:09:37 -07:00
Mahesh Ravishankar	fa596c6921	[mlir][Vector] Fix reordering of floating point adds during lower of `vector.contract`. Adding the accumulator value after the `vector.contract` changes the precision of the operation. This makes sure the accumulator is carried through to `vector.reduce` (and down to LLVM). Differential Revision: https://reviews.llvm.org/D128674	2022-06-28 05:26:39 +00:00
Congzhe Cao	b941857b40	[LoopInterchange] New cost model for loop interchange This is another attempt to land this patch. The patch proposes to use a new cost model for loop interchange, which is obtained from loop cache analysis. Given a loopnest, what loop cache analysis returns is a vector of loops [loop0, loop1, loop2, ...] where loop0 should be placed as the outermost loop, loop1 should be placed one more level inside, and loop2 one more level inside, etc. What loop cache analysis does is not only more comprehensive than the current cost model, it is also a "one-shot" query which means that we only need to query it once during the entire loop interchange pass, which is better than the current cost model where we query it every time we check whether it is profitable to interchange two loops. Thus complexity is reduced, especially after D120386 where we do more interchanges to get the globally optimal loop access pattern. Updates made to test cases are mostly minor changes and some corrections. One change that applies to all tests is that we added an option `-cache-line-size=64` to the RUN lines. This is to ensure that loop cache analysis receives a valid cache line size for correct analysis. Test coverage for loop interchange is not reduced. Currently we did not completely remove the legacy cost model, but keep it as fall-back in case the new cost model did not run successfully. This is because currently we have some limitations in delinearization, which sometimes makes loop cache analysis bail out. The longer term goal is to enhance delinearization and eventually remove the legacy cost model completely. Reviewed By: bmahjour, #loopoptwg Differential Revision: https://reviews.llvm.org/D124926	2022-06-28 00:08:37 -04:00
Michał Górny	f1dcc6af30	[lldb] [test] Mark test_vCont_supports_t llgs-only Sponsored by: The FreeBSD Foundation	2022-06-28 06:06:54 +02:00
LiaoChunyu	1178992c72	[RISCV] Optimize 2x SELECT for floating-point types Including the following opcode: Select_FPR16_Using_CC_GPR Select_FPR32_Using_CC_GPR Select_FPR64_Using_CC_GPR Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D127871	2022-06-28 12:02:05 +08:00
Shao-Ce SUN	1919adb19b	[RISCV] Fix the problem of parsing long version numbers For example, when parsing Zbpbo0p911, an error will be reported: "multi-character extensions must be separated by underscores" Reviewed By: asb Differential Revision: https://reviews.llvm.org/D128644	2022-06-28 11:48:14 +08:00
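
Note on 1919adb19b: the commit message gives `Zbpbo0p911` as an extension string that used to be rejected. As a rough illustration only, and not the LLVM parser, a trailing `<major>p<minor>` suffix with a multi-digit minor number can be split off a name by scanning from the end of the string. The struct `ExtVersion` and the function `splitExtVersion` are hypothetical names invented for this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result type for this sketch (not an LLVM type).
struct ExtVersion {
  std::string name;
  unsigned long majorVer = 0;
  unsigned long minorVer = 0;
};

static bool isDigitAt(const std::string &s, std::size_t i) {
  return std::isdigit(static_cast<unsigned char>(s[i])) != 0;
}

// Splits "<name><major>" or "<name><major>p<minor>" by scanning backwards.
// Returns std::nullopt when there is no version suffix or no name.
// Assumption: version numbers are small enough for std::stoul.
std::optional<ExtVersion> splitExtVersion(const std::string &ext) {
  std::size_t pos = ext.size();
  // Consume the trailing run of digits (the minor number, or the major
  // number if no "p" separator precedes it).
  while (pos > 0 && isDigitAt(ext, pos - 1))
    --pos;
  if (pos == ext.size() || pos == 0)
    return std::nullopt; // no digits at the end, or nothing but digits
  const std::string last = ext.substr(pos);

  // A 'p' counts as the version separator only if a digit precedes it;
  // otherwise it is part of the name (for example the final 'p' in "zicbop").
  if (pos >= 2 && ext[pos - 1] == 'p' && isDigitAt(ext, pos - 2)) {
    const std::size_t majorEnd = pos - 1;
    std::size_t majorBegin = majorEnd;
    while (majorBegin > 0 && isDigitAt(ext, majorBegin - 1))
      --majorBegin;
    if (majorBegin == 0)
      return std::nullopt; // version without a name
    ExtVersion v;
    v.name = ext.substr(0, majorBegin);
    v.majorVer = std::stoul(ext.substr(majorBegin, majorEnd - majorBegin));
    v.minorVer = std::stoul(last);
    return v;
  }

  ExtVersion v;
  v.name = ext.substr(0, pos);
  v.majorVer = std::stoul(last);
  return v;
}
```

With this sketch, `splitExtVersion("Zbpbo0p911")` yields the name `Zbpbo`, major version 0 and minor version 911, while a string with no trailing digits yields no result.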
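
Note on d1b098fc82 (reverted by 834a38bbcb): the commit message describes normalizing /path/to/../file.c to /path/file.c. The following is a minimal sketch of that kind of purely lexical dot removal, written against the C++17 standard library rather than LLVM's own path utilities. The helper name `removeDots` is an assumption made for this example, and the sketch does not consult the file system, so symbolic links are not resolved.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: lexically removes "." and ".." components from a path.
// Uses std::filesystem::path::lexically_normal(), which never touches the
// file system, and returns the result with '/' separators.
std::string removeDots(const std::string &path) {
  return std::filesystem::path(path).lexically_normal().generic_string();
}
```

For the example in the commit message, `removeDots("/path/to/../file.c")` returns `/path/file.c`, so a profile path and a collected path that differ only in dot components compare equal after both are normalized.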

428290 Commits

All Branches