llvm-project

Commit Graph

Author	SHA1	Message	Date
Eduardo Caldas	3b75f65e6b	[SyntaxTree] Fix C++ versions on tests of `BuildTreeTest.cpp` Differential Revision: https://reviews.llvm.org/D86591	2020-08-26 07:19:49 +00:00
Eduardo Caldas	2de2ca348d	[SyntaxTree] Add support for `CallExpression` * Generate `CallExpression` syntax node for all semantic nodes inheriting from `CallExpr` with call-expression syntax - except `CUDAKernelCallExpr`. * Implement all the accessors * Arguments of `CallExpression` have their own syntax node which is based on the `List` base API Differential Revision: https://reviews.llvm.org/D86544	2020-08-26 07:03:49 +00:00
Roman Lebedev	1f90d45b9e	[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad While since D86306 we do its sibling fold for `insertvalue`, we should also do this for `extractvalue`'s. And unlike that one, the results here are, quite honestly, shocking, as can be observed here on vanilla llvm test-suite + RawSpeed results: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|----------------------------------------------------\|-----------\|-----------\|--------:\|--------:\|-------:\| \| asm-printer.EmittedInsts \| 7945095 \| 7942507 \| -2588 \| -0.03% \| 0.03% \| \| assembler.ObjectBytes \| 273209920 \| 273069800 \| -140120 \| -0.05% \| 0.05% \| \| early-cse.NumCSE \| 2183363 \| 2183398 \| 35 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 541847 \| 550017 \| 8170 \| 1.51% \| 1.51% \| \| instcombine.NumAggregateReconstructionsSimplified \| 2139 \| 108 \| -2031 \| -94.95% \| 94.95% \| \| instcombine.NumCombined \| 3601364 \| 3635448 \| 34084 \| 0.95% \| 0.95% \| \| instcombine.NumConstProp \| 27153 \| 27157 \| 4 \| 0.01% \| 0.01% \| \| instcombine.NumDeadInst \| 1694521 \| 1765022 \| 70501 \| 4.16% \| 4.16% \| \| instcombine.NumPHIsOfExtractValues \| 0 \| 37546 \| 37546 \| 0.00% \| 0.00% \| \| instcombine.NumSunkInst \| 63158 \| 63686 \| 528 \| 0.84% \| 0.84% \| \| instcount.NumBrInst \| 874304 \| 871857 \| -2447 \| -0.28% \| 0.28% \| \| instcount.NumCallInst \| 1757657 \| 1758402 \| 745 \| 0.04% \| 0.04% \| \| instcount.NumExtractValueInst \| 45623 \| 11483 \| -34140 \| -74.83% \| 74.83% \| \| instcount.NumInsertValueInst \| 4983 \| 580 \| -4403 \| -88.36% \| 88.36% \| \| instcount.NumInvokeInst \| 61018 \| 59478 \| -1540 \| -2.52% \| 2.52% \| \| instcount.NumLandingPadInst \| 35334 \| 34215 \| -1119 \| -3.17% \| 3.17% \| \| instcount.NumPHIInst \| 344428 \| 331116 \| -13312 \| -3.86% \| 3.86% \| \| instcount.NumRetInst \| 100773 \| 100772 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1081154 \| 1077166 \| -3988 \| -0.37% \| 0.37% \| \| instcount.TotalFuncs \| 101443 \| 101442 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 8890201 \| 8833747 \| -56454 \| -0.64% \| 0.64% \| \| instsimplify.NumSimplified \| 75822 \| 75707 \| -115 \| -0.15% \| 0.15% \| \| simplifycfg.NumHoistCommonCode \| 24203 \| 24197 \| -6 \| -0.02% \| 0.02% \| \| simplifycfg.NumHoistCommonInstrs \| 48201 \| 48195 \| -6 \| -0.01% \| 0.01% \| \| simplifycfg.NumInvokes \| 2785 \| 4298 \| 1513 \| 54.33% \| 54.33% \| \| simplifycfg.NumSimpl \| 997332 \| 1018189 \| 20857 \| 2.09% \| 2.09% \| \| simplifycfg.NumSinkCommonCode \| 7088 \| 6464 \| -624 \| -8.80% \| 8.80% \| \| simplifycfg.NumSinkCommonInstrs \| 15117 \| 14021 \| -1096 \| -7.25% \| 7.25% \| ``` ... which tells us that this new fold fires a whopping 38k times, increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time), decreasing total instruction count by -0.64% (-56454), and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less) and `extractvalue`'s (-74.83%, i.e. four times less). This causes a geomean -0.01% binary size decrease http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions The other thing this tells us is that, while this is a massive win for the `invoke`->`call` transform, the `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold, which is supposed to be dealing with such aggregate reconstructions, fires a lot less now. There are two reasons why: 1. After this fold, as can be seen in tests, we may (will) end up with trivially redundant PHI nodes. We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun. 2. But then, EarlyCSE not only manages to fold such redundant PHI's, it also sees that the extract-insert chain recreates the original aggregate, and replaces it with the original aggregate. The take-aways are 1. We should maybe do the most trivial, same-BB PHI CSE in InstCombine 2. I need to check what other patterns remain, and how they can be resolved. (i.e. I wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away) This is a reland of the original commit `fcb51d8c24`, because originally I forgot to ensure that the base aggregate types match. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86530	2020-08-26 09:57:50 +03:00
Roman Lebedev	451b1bd894	[NFC][InstCombine] Add a PHI-of-insertvalues test with different base aggregate types	2020-08-26 09:57:50 +03:00
Mehdi Amini	0b7c184c2d	Add assertion in PatternRewriter::create<> to defend the same way as OpBuilder::create<> against missing dialect registration (NFC) The code would have failed a few lines later, but this way the error message is clearer and friendlier to debug.	2020-08-26 06:57:23 +00:00
Mehdi Amini	5a6ff2bb3e	Adjust assertion when casting to an unregistered operation This assertion does not achieve what it was originally meant to do, as it would fire only when applied to an unregistered operation, which is a fairly rare circumstance (it needs a dialect or context allowing unregistered operations in the input in the first place). Instead we relax it to only fire when it should have matched but didn't because of the misconfiguration. Differential Revision: https://reviews.llvm.org/D86588	2020-08-26 06:57:22 +00:00
Siva Chandra Reddy	fe44992b79	[libc][NFC] For remquo quotient, compare only 3 bits of MPFR and libc results.	2020-08-25 23:42:06 -07:00
Mateusz Mikuła	c82078b5d7	[LLD][MinGW] Handle allow-multiple-definition flag Basically copied from ELF driver. Differential Revision: https://reviews.llvm.org/D86512	2020-08-26 09:38:11 +03:00
Mateusz Mikuła	dcb1ce61b8	[LLD][MinGW] Cleanup Options.td file. NFC. Based on ELF driver Options.td. Differential Revision: https://reviews.llvm.org/D86509	2020-08-26 09:38:11 +03:00
Martin Storsjö	b07d78bcf9	[MC] [Win64EH] Update the AArch64/seh.s test slightly. NFC. Update the comment stating the aim of the test - this is currently only checking that these assembler directives don't cause the assembler to fail, but the results of the testcase aren't particularly correct yet. Remove bits of the testcase that are even less likely to be found in the wild (the .seh_startchained/.seh_endchained block), where the testcase currently doesn't really generate anything interesting anyway. Differential Revision: https://reviews.llvm.org/D86524	2020-08-26 09:38:11 +03:00
Martin Storsjö	db259fe38b	[llvm-readobj] Fix arm64 unwind opcode disassembly printing Add a missing minus, fix vertical alignment of instructions for one opcode. Differential Revision: https://reviews.llvm.org/D86523	2020-08-26 09:38:11 +03:00
Thomas Raoux	6a3c69e918	[mlir][spirv] Infer converted type of scf.for from the init value Instead of using the TypeConverter, infer the type of the created alloca from the init value. This will allow some ambiguous types like multidimensional vectors to be converted correctly. Differential Revision: https://reviews.llvm.org/D86582	2020-08-25 23:35:01 -07:00
Roman Lebedev	c295c6f2c0	Revert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad" This reverts commit `fcb51d8c24`. As buildbots report, there's apparently some missing check to ensure that the types of incoming values match the type of PHI. Let's revert for a moment.	2020-08-26 09:23:22 +03:00
Roman Lebedev	fcb51d8c24	[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad While since D86306 we do its sibling fold for `insertvalue`, we should also do this for `extractvalue`'s. And unlike that one, the results here are, quite honestly, shocking, as can be observed here on vanilla llvm test-suite + RawSpeed results: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|----------------------------------------------------\|-----------\|-----------\|--------:\|--------:\|-------:\| \| asm-printer.EmittedInsts \| 7945095 \| 7942507 \| -2588 \| -0.03% \| 0.03% \| \| assembler.ObjectBytes \| 273209920 \| 273069800 \| -140120 \| -0.05% \| 0.05% \| \| early-cse.NumCSE \| 2183363 \| 2183398 \| 35 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 541847 \| 550017 \| 8170 \| 1.51% \| 1.51% \| \| instcombine.NumAggregateReconstructionsSimplified \| 2139 \| 108 \| -2031 \| -94.95% \| 94.95% \| \| instcombine.NumCombined \| 3601364 \| 3635448 \| 34084 \| 0.95% \| 0.95% \| \| instcombine.NumConstProp \| 27153 \| 27157 \| 4 \| 0.01% \| 0.01% \| \| instcombine.NumDeadInst \| 1694521 \| 1765022 \| 70501 \| 4.16% \| 4.16% \| \| instcombine.NumPHIsOfExtractValues \| 0 \| 37546 \| 37546 \| 0.00% \| 0.00% \| \| instcombine.NumSunkInst \| 63158 \| 63686 \| 528 \| 0.84% \| 0.84% \| \| instcount.NumBrInst \| 874304 \| 871857 \| -2447 \| -0.28% \| 0.28% \| \| instcount.NumCallInst \| 1757657 \| 1758402 \| 745 \| 0.04% \| 0.04% \| \| instcount.NumExtractValueInst \| 45623 \| 11483 \| -34140 \| -74.83% \| 74.83% \| \| instcount.NumInsertValueInst \| 4983 \| 580 \| -4403 \| -88.36% \| 88.36% \| \| instcount.NumInvokeInst \| 61018 \| 59478 \| -1540 \| -2.52% \| 2.52% \| \| instcount.NumLandingPadInst \| 35334 \| 34215 \| -1119 \| -3.17% \| 3.17% \| \| instcount.NumPHIInst \| 344428 \| 331116 \| -13312 \| -3.86% \| 3.86% \| \| instcount.NumRetInst \| 100773 \| 100772 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1081154 \| 1077166 \| -3988 \| -0.37% \| 0.37% \| \| instcount.TotalFuncs \| 101443 \| 101442 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 8890201 \| 8833747 \| -56454 \| -0.64% \| 0.64% \| \| instsimplify.NumSimplified \| 75822 \| 75707 \| -115 \| -0.15% \| 0.15% \| \| simplifycfg.NumHoistCommonCode \| 24203 \| 24197 \| -6 \| -0.02% \| 0.02% \| \| simplifycfg.NumHoistCommonInstrs \| 48201 \| 48195 \| -6 \| -0.01% \| 0.01% \| \| simplifycfg.NumInvokes \| 2785 \| 4298 \| 1513 \| 54.33% \| 54.33% \| \| simplifycfg.NumSimpl \| 997332 \| 1018189 \| 20857 \| 2.09% \| 2.09% \| \| simplifycfg.NumSinkCommonCode \| 7088 \| 6464 \| -624 \| -8.80% \| 8.80% \| \| simplifycfg.NumSinkCommonInstrs \| 15117 \| 14021 \| -1096 \| -7.25% \| 7.25% \| ``` ... which tells us that this new fold fires a whopping 38k times, increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time), decreasing total instruction count by -0.64% (-56454), and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less) and `extractvalue`'s (-74.83%, i.e. four times less). This causes a geomean -0.01% binary size decrease http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions The other thing this tells us is that, while this is a massive win for the `invoke`->`call` transform, the `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold, which is supposed to be dealing with such aggregate reconstructions, fires a lot less now. There are two reasons why: 1. After this fold, as can be seen in tests, we may (will) end up with trivially redundant PHI nodes. We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun. 2. But then, EarlyCSE not only manages to fold such redundant PHI's, it also sees that the extract-insert chain recreates the original aggregate, and replaces it with the original aggregate. The take-aways are 1. We should maybe do the most trivial, same-BB PHI CSE in InstCombine 2. I need to check what other patterns remain, and how they can be resolved. (i.e. I wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away) Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86530	2020-08-26 09:08:24 +03:00
Jianzhou Zhao	4784987027	Fix a 32-bit overflow issue when reading LTO-generated bitcode files whose strtab are of size > 2^29 This happens when using -flto and -Wl,--plugin-opt=emit-llvm to create a linked LTO bitcode file, and the bitcode file has a strtab with size > 2^29. All the issues relate to a pattern like this size_t x64 = y64 + z32 * C When z32 is >= (2^32)/C, z32 * C overflows. Reviewed-by: MaskRay Differential Revision: https://reviews.llvm.org/D86500	2020-08-26 05:47:22 +00:00
Mehdi Amini	a3ef1054fd	Remove the use of global dialect registration from the standalone-translate.cpp example (NFC)	2020-08-26 05:14:04 +00:00
Siva Chandra Reddy	1948acb61b	[libc][obvious] Add back the accidentally removed MPFRNumber destructor.	2020-08-25 21:57:46 -07:00
Siva Chandra Reddy	3f4674a557	[libc] Extend MPFRMatcher to handle multiple-input-multiple-output functions. Tests for frexp[f\|l] now use the new capability. Not all input-output combinations have been addressed by this change. Support for newer combinations can be added in future as needed. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D86506	2020-08-25 21:42:49 -07:00
Xing GUO	75e0b58668	[DWARFYAML] Use writeDWARFOffset() to write the prologue_length field. NFC. Use writeDWARFOffset() to simplify the logic. NFC.	2020-08-26 12:34:02 +08:00
Adrien Guinet	c6f7ac0071	[llvm-lipo] Add support for bitcode files A Mach-O universal binary may contain bitcode as a slice. This diff adds proper handling of such binaries to llvm-lipo. Test plan: make check-all Differential revision: https://reviews.llvm.org/D85740	2020-08-25 21:11:18 -07:00
Jason Molenda	b1e856d3a9	Ah, one test too many updated. This one should be unmodified.	2020-08-25 21:03:39 -07:00
Jason Molenda	99d187a003	Update UnwindPlan dump to list if it is a trap handler func; also Command Update the "image show-unwind" command output to show if the function being shown is listed as a user-setting or platform trap handler. Update the individual UnwindPlan dumps to show whether the unwind plan is registered as a trap handler.	2020-08-25 20:53:59 -07:00
Teresa Johnson	72bdb41a06	[Docs] Document --lto-whole-program-visibility Summary: Documents interaction of linker option added in D71913 with LTO visibility. Reviewers: pcc Subscribers: inglorion, hiraditya, steven_wu, dexonsmith, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D75655	2020-08-25 19:44:54 -07:00
Mikhail R. Gadelha	30967e51da	Add Z3 to system libraries list if enabled Without this, trying to link static LLVM libraries (built with Z3 enabled) fails because `llvm-config` doesn't print `-lz3`. We are already using this patch at MSYS2: https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-clang/0013-Add-Z3-to-system-libraries-list-if-enabled.patch Reviewed By: mikhail.ramalho Differential Revision: https://reviews.llvm.org/D85195	2020-08-25 22:32:36 -04:00
Craig Topper	1d1515a9e2	[X86] Add an isel pattern for (i8 (trunc (i16 (bitconvert (v16i1 X))))) to avoid an extra EXTRACT_SUBREG Since we can only copy to GR32 we had to EXTRACT from GR32, but we would first go to GR16 and then the truncate would extract again to GR8. This adds a special case to go directly from GR32 to GR8. This would eventually get cleaned up, but I thought maybe we should avoid doing it in the first place. Our k-register handling is weird and we could probably stand to have some more special ISD nodes for the conversions so the i32 type would be explicit.	2020-08-25 18:20:43 -07:00
Volodymyr Sapsai	8839e278ff	[Modules] Improve error message when cannot find parent module for submodule definition. Before the change the diagnostic for module unknown.submodule {} was "error: expected module name" which is incorrect and misleading because both "unknown" and "submodule" are valid module names. We already have a better error message when a parent module is a submodule itself and is missing. Make the error for a missing top-level module more like the one for a submodule. rdar://problem/64424407 Reviewed By: bruno Differential Revision: https://reviews.llvm.org/D84458	2020-08-25 16:31:27 -07:00
Mehdi Amini	1e13372bc8	Remove global registration from the test dialect in MLIR (NFC)	2020-08-25 23:30:53 +00:00
Craig Topper	b8ec8f5776	[X86] Remove extra getOperand(0) call from recently introduced store(extract_element(vtrunc)) to truncated store combine. The IsExtractedElement already called getOperand(0) so Extract here is the source vector. We shouldn't call getOperand(0). This worked for the original test cases because the result was a bitcast so the getOperand(0) accidentally peeked through the bitcast which is what we wanted. In the failing case here, the operand turns out to be undef so the getOperand(0) asserts because undef has no operands. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25184 Differential Revision: https://reviews.llvm.org/D86428	2020-08-25 16:16:54 -07:00
Mehdi Amini	49c371b319	Add llvm_unreachable after fully covered switch to silence some warnings from GCC (NFC)	2020-08-25 23:09:11 +00:00
Zequan Wu	9500a72091	Revert "[Coverage] Enable emitting gap area between macros" This reverts commit `a31c89c1b7`.	2020-08-25 15:28:42 -07:00
Craig Topper	ba319ac47e	[X86] Remove a redundant COPY_TO_REGCLASS for VK16 after a KMOVWkr in an isel output pattern. KMOVWkr produces VK16, there's no reason to copy it to VK16 again. Test changes are presumably because we were scheduling based on the COPY that is no longer there.	2020-08-25 15:19:27 -07:00
Shoaib Meenai	22cd6bee4a	[llvm-libtool-darwin] Address post-commit feedback Address James Henderson's comments on https://reviews.llvm.org/D86359.	2020-08-25 15:04:23 -07:00
Dave Lee	66c4880291	Remove unused/misnamed SetObjectModificationTime Remove `SetObjectModificationTime` which is not currently used, and assigns to the wrong member. Differential Revision: https://reviews.llvm.org/D86493	2020-08-25 14:49:34 -07:00
Mircea Trofin	7cfcecece0	[MLInliner] Simplify TFUTILS_SUPPORTED_TYPES We only need the C++ type and the corresponding TF Enum. The other parameter was used for the output spec json file, but we can just standardize on the C++ type name there. Differential Revision: https://reviews.llvm.org/D86549	2020-08-25 14:19:39 -07:00
Stanislav Mekhanoshin	b7760c3e5d	[AMDGPU] Remove unsound dependency on ISA version in waitcnt Differential Revision: https://reviews.llvm.org/D86566	2020-08-25 14:01:42 -07:00
Fangrui Song	82d0749749	[TargetLoweringObjectFileImpl] Make .llvmbc and .llvmcmd non-SHF_ALLOC There are two ways .llvmbc can be produced: * clang -c -fembed-bitcode=all (which also produces .llvmcmd) * LTO backend: ld.lld -mllvm -lto-embed-bitcode or -plugin-opt=-lto-embed-bitcode .llvmbc and .llvmcmd have the SHF_ALLOC flag, so they can be dropped by --gc-sections. This patch sets SectionKind::Metadata to drop the SHF_ALLOC flag. This is conceptually correct: the two sections are not part of the process image, so SHF_ALLOC is not appropriate. `test/LTO/X86/embed-bitcode.ll`: changed `llvm-objcopy -O binary --only-section` to `llvm-objcopy --dump-section`. `-O binary` does not dump non-SHF_ALLOC sections. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D86374	2020-08-25 13:37:29 -07:00
Krzysztof Parzyszek	514d6e9a8d	[SDAG] Improve MemSDNode::getBasePtr It returned getOperand(1), except for STORE for which it returned getOperand(2). Handle MSTORE, MGATHER, and MSCATTER as well.	2020-08-25 15:19:52 -05:00
aartbik	66e536bc36	[mlir] [LLVMIR] Mark reductions as side-effect free Attribute was missing from original base class. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D86569	2020-08-25 13:09:19 -07:00
Shilei Tian	0775c1dfbc	[OpenMP] Pack first-private arguments to improve efficiency of data transfer In this patch, we pack all small first-private arguments, then allocate and transfer them all at once to reduce the number of data transfers, which are very expensive. Let's take the test case as an example. ``` int main() { int data1[3] = {1}, data2[3] = {2}, data3[3] = {3}; int sum[16] = {0}; #pragma omp target teams distribute parallel for map(tofrom: sum) firstprivate(data1, data2, data3) for (int i = 0; i < 16; ++i) { for (int j = 0; j < 3; ++j) { sum[i] += data1[j]; sum[i] += data2[j]; sum[i] += data3[j]; } } } ``` Here `data1`, `data2`, and `data3` are three first-private arguments of the target region. In the previous `libomptarget`, it called data allocation and data transfer three times, each of which allocated and transferred 12 bytes. With this patch, it only calls allocation and transfer once. The size is `(12+4)*3=48` where 12 is the size of each array and 4 is the padding to keep the address aligned with 8. It is implemented in this way: 1. First collect all information for those first-private arguments. Private arguments are excluded because they don't need to be mapped to the target device; they only need a data allocation. With the patch for memory manager, the data allocation could be very cheap, especially for the small size. For each qualified argument, push a placeholder pointer `nullptr` to the `vector` for kernel arguments, and we will update them later. 2. After we have all information, create a buffer that can accommodate all arguments plus their paddings. Copy the arguments to the buffer at the right place, i.e. aligned address. 3. Allocate a target memory with the same size as the host buffer, transfer the host buffer to the target device, and finally update all placeholder pointers in the arguments `vector`. We only consider small arguments because the data transfer is asynchronous: for a large argument, we can continue to do things on the host side while, hopefully, the data is being transferred. "Small" means that the argument size is less than a predefined value. Currently it is 1024. I'm not sure whether it is a good one, and that is an open question. Another question is, do we need to make it configurable via an environment variable? Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D86307	2020-08-25 16:06:29 -04:00
Stanislav Mekhanoshin	817c831f02	[AMDGPU] Switch to named simm16 in vscnt insertion Differential Revision: https://reviews.llvm.org/D86568	2020-08-25 13:05:27 -07:00
Ankit Aggarwal	2da1eefb58	[Hexagon] Check if EVT is simple type in HVX lowering	2020-08-25 15:02:44 -05:00
Jonas Devlieghere	521220690a	[lldb] Make Reproducer compatible with SubsystemRAII (NFC) Make Reproducer compatible with SubsystemRAII and use it in LocateSymbolFileTest.	2020-08-25 13:00:04 -07:00
Abhina Sreeskantharajan	97ccf93b36	[SystemZ][z/OS] Add z/OS Target and define macros This patch adds the z/OS target and defines macros as a stepping stone towards enabling a native build on z/OS. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D85324	2020-08-25 15:51:59 -04:00
Juneyoung Lee	f753f5b050	[ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands. Instead of special-casing llvm.assume, I think it is also a viable option to add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86477	2020-08-26 04:40:21 +09:00
Juneyoung Lee	8e51bb249b	[ValueTracking] Add a noundef test for D86477; NFC	2020-08-26 04:40:21 +09:00
Amy Huang	b1009ee84f	Reland "[DebugInfo] Move constructor homing case in shouldOmitDefinition." For some reason the ctor homing case was before the template specialization case, and could have returned false too early. I moved the code out into a separate function to avoid this. This reverts commit `05777ab941`.	2020-08-25 12:36:11 -07:00
Nikita Popov	3a54b6a4b7	[MemDep] Use BatchAA when computing pointer dependencies We're not changing IR while running a single MemDep query, so it's safe to cache alias analysis results using BatchAA. This adds BatchAA usage to getSimplePointerDependencyFrom(), which is non-intrusive -- covering larger parts (like a whole processNonLocalLoad query) is also possible, but requires threading BatchAA through a bunch of APIs. For the ThinLTO configuration, this is a 1% geomean improvement on CTMark. Differential Revision: https://reviews.llvm.org/D85583	2020-08-25 21:34:34 +02:00
aartbik	84fdc33f47	[mlir] [LLVMIR] Add get active lane mask intrinsic Provides fast, generic way of setting a mask up to a certain point. Potential use cases that may benefit are create_mask and transfer_read/write operations in the vector dialect. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D86501	2020-08-25 12:19:17 -07:00
Wolfgang Pieb	e02920fe55	[llvm-mca][NFC] Refactor handling of views that examine individual instructions, including printing them. Reviewers: andreadb, lebedev.ri Differential Review: https://reviews.llvm.org/D86390 Introduces a new base class "InstructionView" that such views derive from. Other views still use the "View" base class.	2020-08-25 12:12:37 -07:00
clementval	4d69bcb12f	[mlir][openacc][NFC] Fix comment about OpenACCExecMapping	2020-08-25 15:11:05 -04:00
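
Commit `4784987027` above describes the overflow pattern `size_t x64 = y64 + z32 * C`. The following is a minimal sketch of that pattern and one common way to avoid it, not the code from the patch. The function names are hypothetical, and the scale factor `C = 8` is an assumption chosen so that the wrap-around starts at 2^29, matching the size mentioned in the commit message. It also assumes `unsigned int` is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Assumed scale factor; any constant C wraps once z32 >= 2^32 / C.
constexpr uint32_t C = 8;

// Problematic pattern: z32 * C is computed in 32 bits and wraps around
// before the result is widened for the 64-bit addition.
uint64_t offsetNarrow(uint64_t y64, uint32_t z32) {
  return y64 + z32 * C;
}

// One possible fix: widen z32 first so the multiplication happens in 64 bits.
uint64_t offsetWide(uint64_t y64, uint32_t z32) {
  return y64 + static_cast<uint64_t>(z32) * C;
}

// Small self-check: below 2^29 both variants agree.
void demoOverflow() {
  assert(offsetNarrow(16, 1000) == offsetWide(16, 1000));
}
```

With `z32 = 1u << 29`, the product `z32 * C` wraps to 0 in the narrow version, so `offsetNarrow(16, 1u << 29)` returns 16, while `offsetWide` returns 16 + 2^32.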

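
The packing steps described in commit `0775c1dfbc` above (collect the first-private arguments, copy them into one host buffer at 8-byte-aligned offsets, then transfer that buffer once) can be sketched as follows. This is an illustration under stated assumptions, not the `libomptarget` implementation: `FirstPrivateArg`, `alignUp`, and `packArgs` are hypothetical names, and the device allocation and transfer are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of one first-private argument on the host.
struct FirstPrivateArg {
  const void *HostPtr;
  std::size_t Size;
};

// Alignment taken from the commit message ("aligned with 8").
constexpr std::size_t Alignment = 8;

// Round N up to the next multiple of Alignment.
constexpr std::size_t alignUp(std::size_t N) {
  return (N + Alignment - 1) / Alignment * Alignment;
}

// Copy all arguments into one contiguous buffer. Offsets[I] receives the
// aligned position of argument I, which a real runtime would add to the
// device base address to patch the placeholder pointers.
std::vector<char> packArgs(const std::vector<FirstPrivateArg> &Args,
                           std::vector<std::size_t> &Offsets) {
  std::size_t Total = 0;
  Offsets.clear();
  for (const FirstPrivateArg &A : Args) {
    Offsets.push_back(Total);
    Total += alignUp(A.Size); // argument size plus its padding
  }
  std::vector<char> Buffer(Total, 0);
  for (std::size_t I = 0; I < Args.size(); ++I)
    std::memcpy(Buffer.data() + Offsets[I], Args[I].HostPtr, Args[I].Size);
  return Buffer;
}
```

For the three 12-byte arrays in the commit's test case, each argument occupies 16 bytes in the buffer, so the offsets are 0, 16, and 32 and the total is the `(12+4)*3=48` bytes quoted in the message.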
364984 Commits
