llvm-project

Commit Graph

Author	SHA1	Message	Date
Raphael Isemann	f5f22f0448	[lldb] Skip TestSimulatorPlatform with sanitized builds The test executable crashes when run on a simulator. Skipping until this is fixed. rdar://67238668	2020-08-17 15:06:48 +02:00
Kai Nacke	c2ae7934c8	[SystemZ/ZOS]__(de)register_frame are not available on z/OS. The functions `__register_frame`/`__deregister_frame` are not available on z/OS, so add a guard to not use them. Reviewed By: lhames, abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D84787	2020-08-17 09:00:09 -04:00
Sam Parker	dad04e62f1	[NFC] run update test script On Transforms/LoopUnroll/runtime-small-upperbound.ll	2020-08-17 13:54:28 +01:00
Luís Marques	687e7d3425	[NFC] Tweak a comment about the lock-free builtins	2020-08-17 13:43:53 +01:00
Raphael Isemann	cfb773c676	[lldb][NFC] Use StringRef in CreateFunctionDeclaration/GetDeclarationName CreateFunctionDeclaration should just take a StringRef. GetDeclarationName is (only) used by CreateFunctionDeclaration, so it now also takes a StringRef.	2020-08-17 14:17:20 +02:00
Georgii Rymar	bc902191d3	[llvm-readobj] - Remove unwrapOrError calls from GNUStyle<ELFT>::printRelocations. This fixes existing FIXMEs: we should not error out when unable to find the number of relocations. Differential revision: https://reviews.llvm.org/D85891	2020-08-17 15:16:36 +03:00
Sam Elliott	3f7068ad98	[RISCV] Enable the use of the old mucounteren name The RISC-V Privileged Specification 1.11 defines `mcountinhibit`, which has the same numeric CSR value as `mucounteren` from 1.09.1. This patch enables the use of the old `mucounteren` name. Patch by Yuichi Sugiyama. Reviewed By: lenary, jrtc27, pzheng Differential Revision: https://reviews.llvm.org/D85067	2020-08-17 13:11:49 +01:00
Sam Elliott	5f9ecc5d85	[RISCV] Indirect branch generation in position independent code This fixes the "Unable to insert indirect branch" fatal error sometimes seen when generating position-independent code. Patch by msizanoen1 Reviewed By: jrtc27 Differential Revision: https://reviews.llvm.org/D84833	2020-08-17 13:09:26 +01:00
LLVM GN Syncbot	e0eb4f204a	[gn build] Port `c1f6ce0c73`	2020-08-17 12:02:24 +00:00
Sanjay Patel	e6b6787d01	[InstCombine] fold abs(X)/X to cmp+select The backend can convert the select-of-constants to bit-hack shift+logic if desirable. https://alive2.llvm.org/ce/z/pgJT6E define i8 @src(i8 %x) { %0: %a = abs i8 %x, 1 %d = sdiv i8 %x, %a ret i8 %d } => define i8 @tgt(i8 %x) { %0: %cond = icmp sgt i8 %x, 255 %r = select i1 %cond, i8 1, i8 255 ret i8 %r } Transformation seems to be correct!	2020-08-17 08:01:28 -04:00
Sanjay Patel	61512ddd2d	[InstCombine] add tests for sdiv-of-abs; NFC	2020-08-17 08:01:27 -04:00
Sanjay Patel	6cd4a6f6b2	[InstCombine] reduce code duplication; NFC	2020-08-17 08:01:27 -04:00
Georgii Rymar	6567f82216	[llvm-readobj/elf] - Refine the warning about the broken PT_DYNAMIC segment. Split out from D85519. Currently we report "PT_DYNAMIC segment offset + size exceeds the size of the file", this changes it to "PT_DYNAMIC segment offset (0x1234) + file size (0x5678) exceeds the size of the file (0x68ab)" Differential revision: https://reviews.llvm.org/D85654	2020-08-17 14:57:19 +03:00
Simon Pilgrim	c1f6ce0c73	[DemandedBits] Improve accuracy of Add propagator The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits. I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback. This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read. Known A: 0_______0_______0_______0_______ Known B: 0_______0_______0_______0_______ AOut: 00000000001000000000000000000000 AB, current: 00000000001111111111111111111111 AB, patch: 00000000001111111000000000000000 Committed on behalf of: @rrika (Erika) Differential Revision: https://reviews.llvm.org/D72423	2020-08-17 12:54:09 +01:00
Simon Pilgrim	79d9e2cd93	[DemandedBits] Reorder addition test checks. NFC. As suggested on D72423 we should try to keep the same order as the original IR	2020-08-17 12:54:09 +01:00
Sam Parker	613d8f2953	[NFC] Run update script on test Update IndVarSimplify/no-iv-rewrite.ll	2020-08-17 12:53:14 +01:00
Georgii Rymar	c135a68d42	[LLD][ELF] - Do not produce an invalid dynamic relocation order with --shuffle-sections. Normally (when not on android with android relocation packing enabled), we put IRelative relocations to ".rel[a].dyn", after other relocations, to ensure that IRelatives are processed last by the dynamic loader. To achieve that we add the `in.relaIplt` after the `part.relaDyn`: https://github.com/llvm/llvm-project/blob/master/lld/ELF/Writer.cpp#L540 The problem is that `--shuffle-sections` might break the sections order. This patch fixes it. Fixes https://bugs.llvm.org/show_bug.cgi?id=47056. Differential revision: https://reviews.llvm.org/D85651	2020-08-17 14:46:52 +03:00
Raphael Isemann	7e6c437fb4	[lldb][NFC] Remove name parameter from CreateFunctionTemplateDecl It's unused and not documented.	2020-08-17 13:44:10 +02:00
Raphael Isemann	42b9a68352	[lldb][NFC] Use expect_expr in more tests	2020-08-17 13:14:57 +02:00
Simon Pilgrim	1d2ede87ea	[X86][AVX] Move lowerShuffleWithVPMOV inside explicit shuffle lowering cases Perform lowerShuffleWithVPMOV as part of the v16i8/v8i16 shuffle lowering stages, which are the only types that are currently supported. We need to expand support for lowering shuffles as truncations to fix the remaining regressions in D66004	2020-08-17 11:58:51 +01:00
Raphael Isemann	cd2139a527	[lldb][NFC] Use the proper type for the 'storage' parameter of CreateFunctionDeclaration All the callers pass an enum and we cast the int anyway back to the actual type, so we might as well just use the type for the parameter.	2020-08-17 12:53:58 +02:00
Cullen Rhodes	2ccde3c96b	[InlineCost] Fix scalable vectors in visitAlloca Discovered as part of the VLS type work (see D85128). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85848	2020-08-17 10:34:27 +00:00
Vitaly Buka	3b348d9102	[NFC][StackSafety] Move out sort from the loop	2020-08-17 03:30:14 -07:00
Raphael Isemann	6b97fa0bfe	[lldb] Remove OS-specific string from TestInvalidArgsLog This is the error message from the OS, so we shouldn't check against the OS-specific part of the string. Fixes the test on Windows which returns a different error message.	2020-08-17 11:57:43 +02:00
Raphael Isemann	24c74f5e8c	[lldb] Don't delete orphaned shared modules in SBDebugger::DeleteTarget In D83876 the consensus seems to be that LLDB should never delete orphaned modules implicitly. However, SBDebugger::DeleteTarget is currently doing exactly that. This code was added in `753406221b` but I don't see any explanation in the commit, so I think we should delete it. Reviewed By: clayborg Differential Revision: https://reviews.llvm.org/D83933	2020-08-17 11:30:56 +02:00
Pavel Labath	67cdb899c6	[lldb/Utility] Simplify and generalize Scalar class The class contains an enum listing all host integer types as well as some non-host types. This setup is a remnant of a time when this class was actually implemented in terms of host integer types. Now that we are using llvm::APInt, they are mostly useless and mean that each function needs to enumerate all of these cases even though it treats most of them identically. I only leave e_sint and e_uint to denote the integer signedness, but I want to remove that in a follow-up as well. Removing these cases simplifies most of these functions, with the only exception being PromoteToMaxType, which can no longer rely on a simple enum comparison to determine what needs to be promoted. This also makes the class ready to work with arbitrary integer sizes, so it does not need to be modified when someone needs to add a larger integer size. Differential Revision: https://reviews.llvm.org/D85836	2020-08-17 11:09:56 +02:00
Pavel Labath	2d89a3ba12	[lldb] Forcefully complete a type when adding nested classes With -flimit-debug-info, we can run into cases when we only have a class as a declaration, but we do have a definition of a nested class. In this case, clang will hit an assertion when adding a member to an incomplete type (but only if it's adding a c++ class, and not C struct). It turns out we already had code to handle a similar situation arising in the -gmodules scenario. This extends the code to handle -flimit-debug-info as well, and reorganizes bits of other code handling completion of types to move functions doing similar things closer together. Differential Revision: https://reviews.llvm.org/D85968	2020-08-17 11:09:13 +02:00
Raphael Isemann	c2f9454a16	[lldb] Add SBModule::GarbageCollectAllocatedModules and clear modules after each test run Right now the only places in the SB API where lldb::ModuleSP instances are destroyed are in SBDebugger::MemoryPressureDetected (where it's just attempted but not guaranteed) and in SBDebugger::DeleteTarget (which will be removed in D83933). Tests that directly create an lldb::ModuleSP and never create a target therefore currently leak lldb::Module instances. This triggers the sanity checks in lldbtest that make sure that the global module list is empty after a test. This patch adds SBModule::GarbageCollectAllocatedModules as an explicit way to clean orphaned lldb::ModuleSP instances. Also we now start calling this method at the end of each test run and move the sanity check behind that call to make this work. This way even tests that don't create targets can pass the sanity check. This fixes TestUnicodeSymbols.py when D83865 is applied (which makes the sanity checks actually fail the test). Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D83876	2020-08-17 11:00:19 +02:00
Raphael Isemann	867c347c32	[lldb] Fix that log enable's -f parameter causes LLDB to crash when it can't open the log file We didn't do anything with the llvm::Error we get from `Open`, so when we end up in the error case we just crash due to the llvm::Error sanity check. Also add the missing newline behind the error message so it no longer messes with the next (lldb) prompt. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D85970	2020-08-17 10:43:00 +02:00
Raphael Isemann	c57ea1b48f	[lldb] Get lldb-server platform's --socket-file working again `lldb-server platform --socket-file /any/path` currently always fails to create the socket file. This stopped working after D67424 which changed the input variables of `writeFileAtomically` slightly. We're expected to pass in a temporary path template (`/tmp/foo-%%%%%`) and the final path we want to write. Instead we currently pass in the never set `temp_file_path` as the temporary path (which will make this function always fail) and pass in the temp_file_spec's path as the final path (which is actually the template path such as `/tmp/foo-%%%%%`) instead of the actual path we want to write (e.g. `/tmp/foo`). Reviewed By: labath Differential Revision: https://reviews.llvm.org/D85890	2020-08-17 10:29:06 +02:00
Kazushi (Jam) Marukawa	40f1e7e804	[VE] Support f128 Support f128 using VE instructions. Update regression tests. I've noticed there is no load or store i128 test, so I add them too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D86035	2020-08-17 17:26:52 +09:00
Raphael Isemann	5913f2591c	[lldb][NFC] Remove stride parameter from GetArrayElementType This parameter isn't used anywhere in LLDB nor the Swift downstream branch. It also doesn't really fit into the TypeSystem APIs that usually don't return additional related functionality via some output parameters. Also the implementation already states that the calculated value there is wrong. Let's remove it. If we need this functionality at some point then Swift's much nicer `GetByteStride` function seems like the way to go. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D84299	2020-08-17 10:19:51 +02:00
Kadir Cetinkaya	53c593c2c8	[clang] Make signature help work with dependent args Fixes https://github.com/clangd/clangd/issues/490 Differential Revision: https://reviews.llvm.org/D85826	2020-08-17 10:06:36 +02:00
Raphael Isemann	24fc3177c1	[lldb] Print the exception traceback when hitting cleanup errors Right now if the test suite encounters a cleanup error it just prints "CLEANUP ERROR:" but not any additional information. This patch just prints the exception that caused the cleanup error. This should make debugging the failing tests for D83865 easier (and seems in general nice to have). Reviewed By: labath Differential Revision: https://reviews.llvm.org/D83874	2020-08-17 09:53:52 +02:00
Craig Topper	a206f85091	[X86] Reject dirflag in inline asm constraints other than clobber. Fixes the crash from PR47195.	2020-08-16 23:33:45 -07:00
Chen Zheng	4d52ebb9b9	[PowerPC] Make StartMI ignore COPY like instructions. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D85659	2020-08-17 02:12:30 -04:00
Yonghong Song	aa61e43040	[InstCombine] Fix a compilation bug With gcc 6.3.0, I hit the following compilation bug. ../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic] }; ^ cc1plus: all warnings being treated as errors The error is introduced by Commit `ae7f08812e` ("[InstCombine] Aggregate reconstruction simplification (PR47060)")	2020-08-16 21:56:42 -07:00
Yonghong Song	000ad1a976	[clang] fix a compilation bug With gcc 6.3.0, I hit the following compilation bug: /home/yhs/work/llvm-project/clang/lib/Frontend/CompilerInvocation.cpp: In function ‘bool ParseCodeGenArgs(clang::CodeGenOptions&, llvm::opt::ArgList&, clang::InputKind, clang::DiagnosticsEngine&, const clang::TargetOptions&, const clang::FrontendOptions&)’: /home/yhs/work/llvm-project/clang/lib/Frontend/CompilerInvocation.cpp:780:12: error: unused variable ‘A’ [-Werror=unused-variable] if (Arg *A = Args.getLastArg(OPT_fuse_ctor_homing)) ^ cc1plus: all warnings being treated as errors The bug is introduced by Commit `ae6523cd62` ("[DebugInfo] Add -fuse-ctor-homing cc1 flag so we can turn on constructor homing only if limited debug info is already on.")	2020-08-16 21:53:37 -07:00
zhanghb97	fcd2969da9	Initial MLIR python bindings based on the C API. * Basic support for context creation, module parsing and dumping. Differential Revision: https://reviews.llvm.org/D85481	2020-08-16 19:34:25 -07:00
Vitaly Buka	e10e7829bf	[StackSafety] Skip ambiguous lifetime analysis If we can't identify the alloca used in a lifetime marker we need to assume the worst-case scenario. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D84630	2020-08-16 18:05:52 -07:00
Richard Smith	948219d109	Replace setter named 'getAsOpaqueInt' with a real getter. Clean up a bunch of places where the opaque forms of FPOptions and FPOptionsOverride were being used inappropriately.	2020-08-16 16:38:33 -07:00
Richard Smith	ae500e4d09	Always keep unset fields in FPOptionsOverride zeroed. There are three fields that the FPOptions default constructor sets to non-zero values; those fields previously could have been zero or non-zero depending on whether they'd been explicitly removed from the FPOptionsOverride set. However, that doesn't seem to ever actually happen, so this is NFC, except that it makes the AST file representation of FPOptionsOverride make more sense.	2020-08-16 15:44:51 -07:00
Richard Smith	ae3067055b	Use consistent code for setting FPFeatures from operator constructors.	2020-08-16 15:40:38 -07:00
Richard Smith	9860e68450	Don't leave the FPOptions in a UnaryOperator uninitialized. We don't appear to use these FPOptions for anything right now, but they shouldn't be uninitialized because that makes our AST file output nondeterministic.	2020-08-16 15:16:12 -07:00
Mehdi Amini	de71b46a51	Add missing parsing for attributes to std.generic_atomic_rmw op Fix llvm.org/pr47182 Differential Revision: https://reviews.llvm.org/D86030	2020-08-16 22:13:58 +00:00
Roman Lebedev	0ec1f0f332	[NFCI][InstCombine] Pacify GCC builds - don't name variable and enum class identically	2020-08-16 23:37:36 +03:00
Roman Lebedev	ae7f08812e	[InstCombine] Aggregate reconstruction simplification (PR47060)
This pattern happens in clang C++ exception lowering code, on unwind branch. We end up having a `landingpad` block after each `invoke`, where RAII cleanup is performed, and the elements of an aggregate `{i8, i32}` holding exception info are `extractvalue`'d, and we then branch to common block that takes extracted `i8` and `i32` elements (via `phi` nodes), form a new aggregate, and finally `resume`'s the exception.
The problem is that, if the cleanup block is effectively empty, it shouldn't be there, there shouldn't be that `landingpad` and `resume`, said `invoke` should be a `call`. Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`. But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft, while it is pointless, does not look like "empty cleanup block". So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception has a higher cost than it could have on the unwind branch :S
This doesn't happen that often, but it will basically happen once per C++ function with complex CFG that called more than one other function that isn't known to be `nounwind`. I think, this is a missing fold in InstCombine, so i've implemented it. I think, the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted. If it was extracted from the aggregate with identical type, from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate, then we can just use said original source aggregate directly, instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block, we need to be PHI-aware - we might have different source aggregate when coming from each predecessor.
I'm not sure if this already handles everything, and there are some FIXME's, i'll deal with all that later in followups. I'd be fine with going with post-commit review here code-wise, but just in case there are thoughts, i'm posting this.
On RawSpeed, for example, this has the following effect:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |     1253 |  1253 |   0.00% |  0.00% |
| simplifycfg.NumInvokes                            |      948 |     1355 |   407 |  42.93% | 42.93% |
| instcount.NumInsertValueInst                      |     4382 |     3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode                     |      574 |      458 |  -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs                   |     1154 |      921 |  -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst                     |    29017 |    26397 | -2620 |  -9.03% |  9.03% |
| instcombine.NumDeadInst                           |   166618 |   174705 |  8087 |   4.85% |  4.85% |
| instcount.NumPHIInst                              |    51526 |    50678 |  -848 |  -1.65% |  1.65% |
| instcount.NumLandingPadInst                       |    20865 |    20609 |  -256 |  -1.23% |  1.23% |
| instcount.NumInvokeInst                           |    34023 |    33675 |  -348 |  -1.02% |  1.02% |
| simplifycfg.NumSimpl                              |   113634 |   114708 |  1074 |   0.95% |  0.95% |
| instcombine.NumSunkInst                           |    15030 |    14930 |  -100 |  -0.67% |  0.67% |
| instcount.TotalBlocks                             |   219544 |   219024 |  -520 |  -0.24% |  0.24% |
| instcombine.NumCombined                           |   644562 |   645805 |  1243 |   0.19% |  0.19% |
| instcount.TotalInsts                              |  2139506 |  2135377 | -4129 |  -0.19% |  0.19% |
| instcount.NumBrInst                               |   156988 |   156821 |  -167 |  -0.11% |  0.11% |
| instcount.NumCallInst                             |  1206144 |  1207076 |   932 |   0.08% |  0.08% |
| instcount.NumResumeInst                           |     5193 |     5190 |    -3 |  -0.06% |  0.06% |
| asm-printer.EmittedInsts                          |   948580 |   948299 |  -281 |  -0.03% |  0.03% |
| instcount.TotalFuncs                              |    11509 |    11507 |    -2 |  -0.02% |  0.02% |
| inline.NumDeleted                                 |    97595 |    97597 |     2 |   0.00% |  0.00% |
| inline.NumInlined                                 |   210514 |   210522 |     8 |   0.00% |  0.00% |
```
So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half, and there is a very apparent decrease in instruction and basic block count.
On vanilla llvm-test-suite:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |      744 |   744 |   0.00% |  0.00% |
| instcount.NumInsertValueInst                      |     2705 |     2053 |  -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes                            |     1212 |     1424 |   212 |  17.49% | 17.49% |
| instcount.NumExtractValueInst                     |    21681 |    20139 | -1542 |  -7.11% |  7.11% |
| simplifycfg.NumSinkCommonInstrs                   |    14575 |    14361 |  -214 |  -1.47% |  1.47% |
| simplifycfg.NumSinkCommonCode                     |     6815 |     6743 |   -72 |  -1.06% |  1.06% |
| instcount.NumLandingPadInst                       |    14851 |    14712 |  -139 |  -0.94% |  0.94% |
| instcount.NumInvokeInst                           |    27510 |    27332 |  -178 |  -0.65% |  0.65% |
| instcombine.NumDeadInst                           |  1438173 |  1443371 |  5198 |   0.36% |  0.36% |
| instcount.NumResumeInst                           |     2880 |     2872 |    -8 |  -0.28% |  0.28% |
| instcombine.NumSunkInst                           |    55187 |    55076 |  -111 |  -0.20% |  0.20% |
| instcount.NumPHIInst                              |   321366 |   320916 |  -450 |  -0.14% |  0.14% |
| instcount.TotalBlocks                             |   886816 |   886493 |  -323 |  -0.04% |  0.04% |
| instcount.TotalInsts                              |  7663845 |  7661108 | -2737 |  -0.04% |  0.04% |
| simplifycfg.NumSimpl                              |   886791 |   887171 |   380 |   0.04% |  0.04% |
| instcount.NumCallInst                             |   553552 |   553733 |   181 |   0.03% |  0.03% |
| instcombine.NumCombined                           |  3200512 |  3201202 |   690 |   0.02% |  0.02% |
| instcount.NumBrInst                               |   741794 |   741656 |  -138 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                  |    14443 |    14445 |     2 |   0.01% |  0.01% |
| asm-printer.EmittedInsts                          |  7978085 |  7977916 |  -169 |   0.00% |  0.00% |
| inline.NumDeleted                                 |    73188 |    73189 |     1 |   0.00% |  0.00% |
| inline.NumInlined                                 |   291959 |   291968 |     9 |   0.00% |  0.00% |
```
Roughly similar effect, fewer instructions and blocks total. See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.
Compile-time wise, this appears to be roughly geomean-neutral: http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions
And this is a win size-wise in general: http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text
See https://bugs.llvm.org/show_bug.cgi?id=47060 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D85787	2020-08-16 23:27:56 +03:00
Johannes Doerfert	5272d29e2c	[OpenMP][CUDA] Keep one kernel list per device, not globally. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D86039	2020-08-16 14:38:35 -05:00
Johannes Doerfert	aa27cfc1e7	[OpenMP][CUDA] Cache the maximal number of threads per block (per kernel) Instead of calling `cuFuncGetAttribute` with `CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation, we can do it for the first one and cache the result as part of the `KernelInfo` struct. The only functional change is that we now expect `cuFuncGetAttribute` to succeed and otherwise propagate the error. Ignoring any error seems like a slippery slope... Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D86038	2020-08-16 14:38:33 -05:00
Johannes Doerfert	95a25e4c32	[OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156 When we implement OpenMP GPU reductions we use type punning a lot during the shuffle and reduce operations. This is not always compatible with language rules on aliasing. So far we generated TBAA which later allowed to remove some of the reduce code as accesses and initialization were "known to not alias". With this patch we avoid TBAA in this step, hopefully for all accesses that we need to. Verified on the reproducer of PR46156 and QMCPack. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D86037	2020-08-16 14:38:31 -05:00
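
Note on `e6b6787d01` ([InstCombine] fold abs(X)/X to cmp+select): the Alive2 proof quoted in that commit message can be spot-checked by brute force over every `i8` value. The following is a minimal sketch in Python, not LLVM code. The helper names (`to_i8`, `src`, `tgt`, `fold_is_sound`) are hypothetical, and the sketch assumes the `1` flag on `abs` means the result is poison for the minimum signed value and that `sdiv` by zero is undefined, so those two inputs are skipped.

```python
def to_i8(v):
    """Wrap a Python int into the signed 8-bit range [-128, 127]."""
    v &= 0xFF
    return v - 256 if v >= 128 else v


def src(x):
    """Model of sdiv(x, abs(x)) on i8; None marks UB or poison inputs."""
    if x == 0 or x == -128:
        # x == 0: division by zero (assumed UB).
        # x == -128: abs with the int-min-is-poison flag set (assumed poison).
        return None
    a = abs(x)
    # sdiv truncates toward zero; compute the magnitude, then restore the sign.
    q = abs(x) // a
    if x < 0:
        q = -q
    return to_i8(q)


def tgt(x):
    """Model of select(icmp sgt x, -1), 1, -1; 255 in the IR is -1 as i8."""
    return 1 if x > -1 else -1


def fold_is_sound():
    """True if tgt matches src on every input where src is defined."""
    return all(src(x) is None or src(x) == tgt(x) for x in range(-128, 128))
```

Calling `fold_is_sound()` walks all 256 inputs. The target may return any value where the source is undefined, which is why those inputs are excluded from the comparison.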

