Identical dangling probes are redundant, since they all carry the same semantics: rely on the counts inference tool to derive a reasonable count for the same original block. There is therefore no need to keep multiple copies of them. I've seen jump threading create tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also produce redundant probes, though without an observed impact so far.
This change removes block-wise redundant dangling probes specifically introduced by jump threading. To also remove redundant dangling probes caused by all other passes, a final function-wise deduplication is added, sketched below.
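For illustration, a minimal sketch of such a function-wise pass, assuming a dangling probe is fully identified by its (function GUID, probe index) pair; `isDanglingProbe` is a hypothetical predicate, not in-tree code:
```
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IntrinsicInst.h"

static bool isDanglingProbe(const llvm::PseudoProbeInst &Probe); // hypothetical

// Keep only the first dangling probe seen for each (GUID, index) pair;
// later copies carry no extra information and can be erased.
static void dedupDanglingProbes(llvm::Function &F) {
  using namespace llvm;
  DenseSet<std::pair<uint64_t, uint64_t>> Seen;
  for (BasicBlock &BB : F)
    for (Instruction &I : make_early_inc_range(BB))
      if (auto *Probe = dyn_cast<PseudoProbeInst>(&I))
        if (isDanglingProbe(*Probe) &&
            !Seen.insert({Probe->getFuncGuid()->getZExtValue(),
                          Probe->getIndex()->getZExtValue()}).second)
          Probe->eraseFromParent();
}
```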
An 18% size reduction of the .pseudo_probe section was seen on SPEC2017. No performance difference was observed.
Differential Revision: https://reviews.llvm.org/D97482
This change fixes a couple of places where the pseudo probe intrinsic blocks optimizations because it is not trivially removable. To unblock those optimizations, the blocking pseudo probes are moved out of their original blocks and tagged dangling, instead of being removed outright. The reason is that once the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all of its pseudo probes into another block and marking them dangling gives the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degrade in our experiments.
The optimizations being unblocked are:
1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded, so that they will not be over-counted.
2. Unblocking jump threading from removing empty blocks. A pseudo probe prevents jump threading from removing a logically empty block that contains only an unconditional jump instruction.
3. Unblocking SimplifyCFG and MIR tail duplication so they can thread empty blocks and blocks with redundant branch checks.
Since dangling probes are logically deleted, they should not consume any samples in the LTO postlink phase. This is achieved by setting their distribution factors to zero when they are dangled.
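A minimal sketch of what "dangling" a probe could look like, assuming the distribution factor is the intrinsic's last operand; the actual change uses its own helper:
```
#include "llvm/IR/Constants.h"
#include "llvm/IR/IntrinsicInst.h"

// Zero the probe's distribution factor so the postlink profile reader
// attributes no samples to it (the operand position is an assumption here).
static void markProbeDangling(llvm::PseudoProbeInst &Probe) {
  using namespace llvm;
  unsigned FactorIdx = Probe.arg_size() - 1; // assumed: factor is last
  Value *Factor = Probe.getArgOperand(FactorIdx);
  Probe.setArgOperand(FactorIdx, ConstantInt::get(Factor->getType(), 0));
}
```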
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D97481
Dangling probes are probes associated with an empty block. This usually happens when all real instructions are optimized away from the block. There is a problem with dangling probes during offline counts processing. The way the sample profiler works is that samples collected on the first physical instruction following a probe are counted towards the probe. This is logically equivalent to treating the instruction next to a probe as if it were from the same block as the probe. In the dangling probe case, the real instruction following a dangling probe actually starts a new block, and samples collected on the new block may cause issues when counted towards the empty block.
To mitigate this issue, we first try to move a dangling probe around inside its owning block. If there are still native instructions preceding the probe in the same block, we can use them as a placeholder to collect samples for the probe. A pass is added that walks each block backwards looking for probes not followed by any real instruction, and moves them before the first real instruction encountered, i.e., the last real instruction of the block. This is done right before object emission.
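A simplified sketch of that walk at the MachineInstr level (illustrative only, not the in-tree pass):
```
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/TargetOpcodes.h"

// Walk one block from the bottom: probes with no real instruction after them
// have nothing to collect samples on, so re-insert them just before the last
// real instruction, which then acts as their placeholder.
static void hoistTrailingProbes(llvm::MachineBasicBlock &MBB) {
  using namespace llvm;
  SmallVector<MachineInstr *, 4> Trailing;
  MachineInstr *LastReal = nullptr;
  for (MachineInstr &MI : reverse(MBB)) {
    if (MI.getOpcode() == TargetOpcode::PSEUDO_PROBE) {
      Trailing.push_back(&MI);
      continue;
    }
    if (!MI.isMetaInstruction()) { // first real instruction from the bottom
      LastReal = &MI;
      break;
    }
  }
  if (!LastReal)
    return; // nothing real in the block; the probes stay dangling
  for (MachineInstr *Probe : Trailing) {
    Probe->removeFromParent();
    MBB.insert(LastReal->getIterator(), Probe);
  }
}
```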
If we cannot find such in-block preceding instructions for a probe, the solution is to tag the probe as dangling so that the samples reported for it will not be trusted by the compiler. We leave it up to the counts inference algorithm to give such probes a reasonable count. The value `UINT64_MAX` is used to mark the sample count collected for a dangling probe.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D95962
Implements parts of:
- P0898R3 Standard Library Concepts
- P1754 Rename concepts to standard_case for C++20, while we still can
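For context, a minimal use of the library concepts these papers define (plain C++20 shown here; the specific concepts this revision adds are in the review itself):
```
#include <concepts>

// P0898R3 brought these library concepts into the standard; P1754 gave them
// their snake_case spellings.
static_assert(std::same_as<int, int>);

template <std::integral T> // constrains T to integer types
constexpr T twice(T x) { return x + x; }

static_assert(twice(21) == 42);
```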
Depends on D96660
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D96742
We don't need a bool and an enum to express the three options we
currently have. This makes the interface nicer and optional
dependencies much easier to use. It also avoids mistakes where the
bool is false and the enum is ignored.
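A minimal sketch of the shape of the change (identifiers are illustrative, not the actual interface):
```
// Before: a bool plus an enum could encode an invalid fourth state
// (e.g. the bool false while the enum said something meaningful).
// After: one enum whose values cover exactly the three valid options.
enum class Dependency { None, Optional, Required };

void addDependency(Dependency Dep) {
  switch (Dep) {
  case Dependency::None:     break; // nothing to record
  case Dependency::Optional: break; // record; tolerate absence
  case Dependency::Required: break; // record; fail if missing
  }
}
```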
This is an attempt to improve handling of partial overlaps in case of an unaligned begin/end.
The existing implementation just bails out if it encounters such cases. Even when it doesn't, I believe the existing code checking alignment constraints is not quite correct: it tries to ensure alignment of the "later" start/end offset, while it should be preserving the relative alignment between the earlier and later start/end.
The idea behind the change is simple. When the start/end is not aligned as we wish, instead of bailing out, adjust it as necessary to get the desired alignment, as sketched below.
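A simplified sketch of the adjustment idea (not the in-tree code):
```
#include <cstdint>

// Round the split offset down to the required alignment instead of bailing
// out. For store shortening, keeping a few extra bytes in the earlier store
// is harmless -- the later store overwrites them anyway -- and the rounding
// preserves the relative alignment between the two ranges.
static int64_t alignSplitDown(int64_t SplitOffset, int64_t Align) {
  int64_t Rem = ((SplitOffset % Align) + Align) % Align; // non-negative
  return SplitOffset - Rem;
}
```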
I'll update with performance results as measured by the test-suite...it's still running...
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D93530
Add clspv as a new target for libclc. clspv is an open-source compiler that compiles OpenCL C to Vulkan SPIR-V. The library is compiled for the spir target.
The clspv target differs from the spirv target in the following ways:
* fma is modified to use uint2 instead of ulong for mantissas. This results in a lower-performance fma, but provides an implementation that can be used on more Vulkan devices, where 64-bit integer support is less common.
* A software implementation of nextafter is used, because the generic implementation depends on nextafter being a defined builtin function, for which clspv has no definition.
* Full optimization of the library (-O3) and no conversion to SPIR-V
This library is close to what would be produced by running `opt -O3 < builtins.opt.spirv-mesa3d-.bc > builtins.opt.clspv--.bc` and continuing the build from that point.
Reviewer: jvesely
Differential Revision: https://reviews.llvm.org/D94013
This knob is useful for downstream users who want some of their
libc functions not to be intercepted.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D97740
Implements parts of:
- P0898R3 Standard Library Concepts
- P1754 Rename concepts to standard_case for C++20, while we still can
Depends on D96660
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D96683
The statepoint intrinsic can be used in an invoke context,
so it should be handled in visitCallBase to cover both call and invoke.
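The shape of the dispatch (a sketch, not the actual visitor code):
```
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Statepoint.h"

// CallBase is the common base class of CallInst and InvokeInst, so a check
// in visitCallBase covers the statepoint in both forms.
struct ExampleVisitor {
  void visitStatepoint(llvm::GCStatepointInst &SP); // existing handler
  void visitCallBase(llvm::CallBase &CB) {
    if (auto *SP = llvm::dyn_cast<llvm::GCStatepointInst>(&CB))
      visitStatepoint(*SP);
  }
};
```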
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97833
Add calls into LanguageRuntime when finding the unwind method to
use out of the 0th (currently executing) stack frame.
Allow the LanguageRuntimes to indicate whether a stack frame
should be treated like a zeroth frame -- symbolication should be
done based on the saved pc address, not decremented like normal ABI
function calls.
Add methods to RegisterContext and StackFrame to get a pc value
suitable for symbolication, to reduce the number of places in lldb
where we decrement the saved pc values before symbolication.
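Conceptually, the new accessors centralize a decision like the following (a standalone sketch; the lldb method names and signatures are not reproduced here):
```
#include <cstdint>

// A frame that behaves like frame zero reports the exact pc, so symbolicate
// it as-is. A normal saved return address points *after* the call, so back
// it up by one to land inside the call instruction for symbolication.
uint64_t pcForSymbolication(uint64_t saved_pc, bool behaves_like_zeroth_frame) {
  return behaves_like_zeroth_frame ? saved_pc : saved_pc - 1;
}
```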
<rdar://problem/70398009>
Differential Revision: https://reviews.llvm.org/D97644
This is just a shorter synonym for `__identity<T>::type`.
Use it consistently throughout, where possible.
There is still some metaprogramming in <memory> and <variant>
where `__identity` is being used _without_ immediately calling
`::type` on it; but this is the unusual case, and it will become
even less usual as we start deliberately protecting certain types
against deduction (e.g. D97742).
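A minimal sketch of the alias and the deduction-blocking use mentioned above (the exact libc++ spelling may differ):
```
template <class _Tp> struct __identity { typedef _Tp type; };
template <class _Tp> using __identity_t = typename __identity<_Tp>::type;

// Because _Tp appears only in a non-deduced context in the second parameter,
// the first argument alone drives deduction; the second cannot.
template <class _Tp>
void __fill(_Tp* __dest, __identity_t<_Tp> __value);
```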
Differential Revision: https://reviews.llvm.org/D97862
Implements parts of:
- P0898R3 Standard Library Concepts
- P1754 Rename concepts to standard_case for C++20, while we still can
Depends on D96657
Reviewed By: ldionne, Mordante, #libc
Differential Revision: https://reviews.llvm.org/D96660
Found with exhaustive testing: it is possible for a while loop
to appear in between chainable for loops. As long as we don't
scalarize reductions in while loops, this means we need to
terminate the chain at the while loop. This also refactors the
reduction code into more readable helper methods.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D97886
This patch updates LangRef to describe the lifetime intrinsics' behavior,
following the description of MIR's LIFETIME_START/LIFETIME_END markers
in StackColoring.cpp (eb44682d67/llvm/lib/CodeGen/StackColoring.cpp (L163)) and the discussion on llvm-dev.
In order to explicitly define the meaning of an object lifetime, an 'Object Lifetime' subsection is added.
Reviewed By: nlopes
Differential Revision: https://reviews.llvm.org/D94002
The support for attributes closely maps that of Types (basically 1-1), given that Attributes are defined in exactly the same way as Types. All of the current ODS TypeDef classes get an Attr equivalent. The attribute classes themselves are generated by the same generator as types.
Differential Revision: https://reviews.llvm.org/D97589
Use that to print the diagnostic in SemaChecking instead of
listing all of the builtins in a switch.
With the required features, IR generation will also be able
to error on this. Checking it here allows us to emit a
RISCV-focused error message.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D97826
IR symbol table does not parse inline asm. A symbol only referenced by inline
asm is not in the IR symbol table, so LTO does not know that the definition (in
another translation unit) is referenced and may internalize it, even if that
definition has `__attribute__((used))` (which lowers to `llvm.compiler.used` on
ELF targets since D97446).
```
// cabac.c
__attribute__((used)) const uint8_t ff_h264_cabac_tables[...] = {...};
// h264_cabac.c
asm("lea ff_h264_cabac_tables(%rip), %0" : ...);
```
`__attribute__((used))` is the recommended way to tell the compiler there may
be inline asm references, so the usage is perfectly fine. This patch
conservatively sets the `FB_used` bit on `llvm.compiler.used` symbols to work
around the IR symbol table limitation. Note: before D97446, Clang never emitted
symbols in the `llvm.compiler.used` list, so this change does not punish any
Clang emitted global object.
Without the patch, `ff_h264_cabac_tables` may be assigned to a non-external
partition and get internalized. Then we will get a linker error because the
`cabac.c` definition is not exposed.
Differential Revision: https://reviews.llvm.org/D97755
This better matches the actual IR concept that is being modeled, and is consistent with how the rest of PDL is structured.
Differential Revision: https://reviews.llvm.org/D95718
This type represents a range of positional values. It will be used in followup revisions to add support for variadic constructs to PDL, such as operand and result ranges.
Differential Revision: https://reviews.llvm.org/D95717
See PR46990 (https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks across coro.suspend intrinsics. This breaks the semantics of the coro.suspend intrinsic, which may return to the caller directly. It also leads to use-after-free if the coroutine is freed before control returns to the caller in a multithreaded environment.
This patch disables promotion by checking whether the loop contains coro.suspend intrinsics.
This is a resubmit of D86190.
Disabling LICM for loops with coroutine suspension is a better option not only for correctness but also for performance.
In most cases LICM sinks memory operations. In the case of a coroutine, sinking a memory operation out of the loop does not improve performance, since the coroutine needs to fetch the data from the frame anyway. In fact, LICM would hurt coroutine performance, since it adds more entries to the frame. The sketch below illustrates the correctness hazard.
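A minimal, self-contained illustration (the `task` scaffolding is generic C++20 coroutine boilerplate, not part of this patch):
```
#include <coroutine>

struct task {
  struct promise_type {
    task get_return_object() { return {}; }
    std::suspend_never initial_suspend() noexcept { return {}; }
    std::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };
};

task writer(int *p, int n) {
  for (int i = 0; i < n; ++i) {
    *p = i;                         // LICM must not sink this store out of
    co_await std::suspend_always{}; // the loop: control returns to the caller
  }                                 // at each suspend, and the caller may
}                                   // observe *p or even free the frame
```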
Differential Revision: https://reviews.llvm.org/D96928
The compiler needs to mark register $x0 as live-in for the following case:
```
$x1 = ADDXri $sp, 16, 0
BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def dead $x0
```
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D95267
As a preparation step for fast8 support, we need to update the tests
to pass in both modes. That requires generalizing the shadow width
and removing any hard-coded references that assume it is always 2 bytes.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D97884
The current implementation of Value involves a pointer-int pair with several different kinds of owners, i.e. BlockArgumentImpl*, Operation*, TrailingOpResult*. This design arose from the desire to save memory overhead for operations that have a very small number of results (generally 0-2). There are, unfortunately, many problematic aspects of the current implementation that make Values difficult to work with or just inefficient.
Operation result types are stored as a separate array on the Operation. This is very inefficient for many reasons: we use TupleType for multiple results, which can lead to huge amounts of memory usage if multi-result operations change types frequently (and they do). It also means that simple methods like Value::getType/Value::setType now require complex logic to get to the desired type.
Value only has one pointer bit free, severely limiting the ability to use it in things like PointerUnion/PointerIntPair. Given that we store the kind of a Value along with the "owner" pointer, we only leave one bit free for users of Value. This creates situations where we end up nesting PointerUnions to be able to use Value in one.
As noted above, most of the methods in Value need to branch on at least 3 different cases which is both inefficient, possibly error prone, and verbose. The current storage of results also creates problems for utilities like ValueRange/TypeRange, which want to efficiently store base pointers to ranges (of which Operation* isn't really useful as one).
This revision greatly simplifies the implementation of Value by the introduction of a new ValueImpl class. This class contains all of the state shared between all of the various derived value classes; i.e. the use list, the type, and the kind. This shared implementation class provides several large benefits:
* Most of the methods on value are now branchless, and often one-liners.
* The "kind" of the value is now stored in ValueImpl instead of Value
This frees up all of Value's pointer bits, allowing for users to take full advantage of PointerUnion/PointerIntPair/etc. It also allows for storing more operation results as "inline", 6 now instead of 2, freeing up 1 word per new inline result.
* Operation result types are now stored in the result, instead of a side array
This drops the size of zero-result operations by 1 word. It also removes the memory-crushing use of TupleType for operation results (which could lead to hundreds of megabytes of "dead" TupleTypes in the context). This also allowed ValueRange to be restructured, making it simpler and one word smaller.
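For illustration, a conceptual sketch of the new layout (not the actual MLIR classes, just the shape of the design):
```
#include <cstdint>

struct Type { /* opaque payload elided */ };

// All state shared by the derived value kinds lives in one impl class.
struct ValueImpl {
  Type type;      // result/argument types are stored on the value itself
  uint8_t kind;   // block argument, inline result #N, or out-of-line result
  // ... use-list head, etc.
};

// Value is now a single, fully-aligned pointer: every low bit is available
// to PointerUnion/PointerIntPair users, and accessors are branchless.
struct Value {
  ValueImpl *impl;
  Type getType() const { return impl->type; }
};
```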
This revision does come with two conceptual downsides:
* Operation::getResultTypes no longer returns an ArrayRef<Type>
This conceptually makes some usages slower, as the iterator increment is slightly more complex.
* OpResult::getOwner is slightly more expensive, as it now requires a little bit of arithmetic
From profiling, neither of the conceptual downsides has resulted in any perceivable hit to performance. Given the advantages of the new design, most compiles are slightly faster.
Differential Revision: https://reviews.llvm.org/D97804
This is a mess, but this is hopefully no-functional-change.
The 'Prev' descriptor is only used for min/max recurrences
or when starting a match from a phi, so it should not be a
factor when propagating FMF for fmul/fadd.
The API is confusing (and should be reduced in subsequent steps)
because the "UnsafeAlgebraInst" appears to actually be a placeholder
for a recurrence that does NOT have FMF, but we still want to
treat it as reassociative.
Apple back-deployment testing is currently failing because Green Dragon
is down. To avoid stalling the whole CI pipeline because of that, I am
temporarily disabling those jobs until Green Dragon is back, or even
better we have found a different way to store those small artifacts.
The SubTensorInsertOp has a requirement that the dest type and result
type match. Just folding the tensor.cast operation violates this and
creates verification errors during canonicalization. Also fix other
canonicalization methods that weren't inserting casts properly.
Differential Revision: https://reviews.llvm.org/D97800
Make sure we preserve info about passed arguments as implicit uses, so
that later passes still have access to this information.
This fixes a mis-compile where the machine-combiner would pick an
incorrect free register.
This reverts commit 900f076113 and attempts an actual fix: All failing tests for llvm-jitlink use the `-noexec` flag. The inputs they operate on are not meant for execution on the host system. Looking e.g. at the MachO_test_harness_harnesss.s test, llvm-mc generates input machine code with "x86_64-apple-macosx10.9".
My previous attempt in bbdb4c8c9b disabled the debug support plugin for Windows targets, but what we would actually want is to disable it on Windows HOSTS.
With the new patch here, I don't do exactly that, but instead follow the approach for the EH frame plugin and include the `-noexec` flag in the condition. It should have the desired effect when it comes to the test suite. It appears a little workaround-ish, but should work reliably for now. I will discuss the issue with Lang and see if we can do better. Thanks @thakis again for the temporary fix.
The unsigned variable 'IntNo' was declared but not defined inside the function
EmitWebAssemblyBuiltinExpr(). A static code analysis tool complains about the
uninitialized variable "IntNo", since control can enter the default branch
without setting any intrinsic and then call `Function *Callee =
CGM.getIntrinsic(IntNo)`.
This patch fixes the problem by adding default cases to the switch statements, as sketched below.
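The shape of the fix (builtin and intrinsic names below are placeholders, not the real WebAssembly tables):
```
unsigned IntNo;
switch (BuiltinID) {
case WebAssembly::BI__builtin_wasm_example: // placeholder builtin
  IntNo = Intrinsic::wasm_example;          // placeholder intrinsic
  break;
default:
  // The added default case: IntNo can no longer reach getIntrinsic()
  // uninitialized.
  llvm_unreachable("unexpected WebAssembly builtin ID");
}
llvm::Function *Callee = CGM.getIntrinsic(IntNo);
```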
We previously defaulted to x86_64 and an unknown platform, which was fine when
we only supported one arch and did no platform checks, but that will no longer
be true going forward. Therefore, we should require those flags to be specified
whenever the linker is invoked.
Note that LLD-ELF and ld64 both infer the arch from their input object files,
but the usefulness of that is questionable since clang will always specify these
flags, and most of the time `lld` will be invoked via clang.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D97799
The option-iterating loop should be reserved for options whose command-line
order is important. I think LLD-ELF follows a similar design.
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D97797