llvm-project

Commit Graph

Author	SHA1	Message	Date
Jon Chesterfield	bcb3f0f867	[libomptarget] Fix devicertl build The target specific functions in target_interface are extern C, but the implementations for nvptx were mostly C++ mangling. That worked out as a quirk of DEVICE macro expanding to nothing, except for shuffle.h which only forward declared the functions with C++ linkage. Also implements GetWarpSize, as used by shuffle, and includes target_interface in nvptx target_impl.cu to help catch future divergence between interface and implementation. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98651	2021-03-15 19:50:22 +00:00
Michael Kruse	9c486eb348	[Polly] Fix deprecation warning. NFC. IRBuilder::CreateLoad without type parameter was deprecated in `6312c538` to prepare for opaque pointers.	2021-03-15 14:31:16 -05:00
Wenlei He	a5d30421a6	[CSSPGO] Load context profile for external functions in PreLink and populate ThinLTO import list For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to function's entry count metadata. This enables ThinLink to treat such cross module callee as hot in summary index, and later helps postlink to import them for profile guided cross module inlining. For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since profile is flattened, a few things need to happen for it to work: - When loading input profile in extended binary format, we need to load all child context profile whose parent is in current module, so context trie for current module includes potential cross module inlinee. - In order to make the above happen, we need to know whether input profile is CSSPGO profile before start reading function profile, hence a flag for profile summary section is added. - When searching for cross module inline candidate, we need to walk through the context trie instead of nested inlinee profile (callsite sample of AutoFDO profile). - Now that we have more accurate counts with CSSPGO, we switched to use entry count instead of total count to decide if an external callee is potentially beneficial to inline. This makes it consistent with how we determine whether call target is potential inline candidate. Differential Revision: https://reviews.llvm.org/D98590	2021-03-15 12:22:15 -07:00
Jianzhou Zhao	9cf5220c5c	[dfsan] Updated check_custom_wrappers.sh to dedup function names The origin wrappers added by https://reviews.llvm.org/D98359 reuse those __dfsw_ functions.	2021-03-15 19:12:08 +00:00
Fangrui Song	5d44c92bf8	Change void getNoop(MCInst &NopInst) to MCInst getNop() Prefer (self-documenting) return values to output parameters (which are liable to be used). While here, rename Noop to Nop which is more widely used and improves consistency with hasEmitNops/setEmitNops/emitNop/etc.	2021-03-15 12:05:34 -07:00
Jez Ng	29d4676059	[lld-macho] Place LC_FUNCTION_STARTS data at the right position This pleases the codesign (Otherwise it complains about "function starts data out of place") Reviewed By: #lld-macho, smeenai Differential Revision: https://reviews.llvm.org/D98648	2021-03-15 14:56:31 -04:00
Jianzhou Zhao	57a532b3ac	[dfsan] Do not check dfsan_get_origin by check_custom_wrappers.sh It is implemented like dfsan_get_label, and does not have any code in dfsan_custom.cpp.	2021-03-15 18:55:34 +00:00
Craig Topper	41759c3d92	[RISCV] Add RISCVISD::BR_CC similar to RISCVISD::SELECT_CC. This allows me to introduce similar combines for branches as we have recently added for SELECT_CC. Some of them are less useful for standalone setccs and only help branch instructions. By having a BR_CC node it's easier to only affect branches. I'm using CondCodeSDNode to make isel patterns easier to write so we can refer to the codes by name. SELECT_CC uses a constant instead. I've translated the condition code just like SELECT_CC so we need fewer patterns for the swapped conditions. This includes special cases for X < 1 and X > -1 that get translated to blez and bgez by using a 0 constant. computeKnownBitsForTargetNode support for SELECT_CC is added to allow MaskedValueIsZero to work for cases where the true and false values of the SELECT_CC are setccs and the result of the SELECT_CC is used by a BR_CC. This was needed to avoid regressions in some of the overflow tests. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98159	2021-03-15 11:54:01 -07:00
Jon Chesterfield	f675b3df48	[libomptarget] Drop assert.h, use freestanding for amdgcn devicertl Promotes the runtime assert to a link time error for the unimplemented fallback functions. Enables amdgcn to build with only clang provided headers, which makes it less likely to break other builds when enabled. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98649	2021-03-15 18:50:09 +00:00
Philipp Tomsich	018e96f71f	[RISCV] Add isel-patterns to optimize (a < 1) into blez (a <= 0) The following code-sequence showed up in a testcase (isolated from SPEC2017) for if-conversion and vectorization when searching for the maximum in an array: addi a2, zero, 1 blt a1, a2, .LBB0_5 which can be expressed as `bge zero,a1,.LBB0_5`/`blez a1,.LBB0_5`. More generally, we want to express (a < 1) as (a <= 0). This adds the required isel-pattern and updates the testcases. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98449	2021-03-15 11:32:43 -07:00
Michael Kruse	3f170eb197	[Polly][Optimizer] Apply user-directed unrolling. Make Polly look for unrolling metadata (https://llvm.org/docs/TransformMetadata.html#loop-unrolling) that is usually only interpreted by the LoopUnroll pass and apply it to the SCoP's schedule. While not that useful by itself (there already is an unroll pass), it introduces a mechanism to apply arbitrary loop transformation directives in arbitrary order to the schedule. Transformations are applied until no more directives are found. ISL's rescheduling would discard the manual transformations, and it is assumed that when the user specifies the sequence of transformations, they do not want any other transformations to apply. Applying user-directed transformations can be controlled using the `-polly-pragma-based-opts` switch and is enabled by default. This does not influence the SCoP detection heuristic. As a consequence, loops that do not fulfill SCoP requirements or the initial profitability heuristic will be ignored. `-polly-process-unprofitable` can be used to disable the latter. Other than manually editing the IR, there is currently no way for the user to add loop transformations in an order other than the order in the default pipeline, or transformations other than the ones supported by clang's LoopHint. See the `unroll_double.ll` test as an example that clang currently is unable to emit. My own extension of `#pragma clang loop` allowing an arbitrary order and additional transformations is available here: https://github.com/meinersbur/llvm-project/tree/pragma-clang-loop. An effort to upstream this functionality as `#pragma clang transform` (because `#pragma clang loop` has an implicit transformation order defined by the loop pipeline) is D69088. Additional transformations from my downstream pragma-clang-loop branch are tiling, interchange, reversal, unroll-and-jam, thread-parallelization and array packing. Unroll was chosen because it uses already-defined metadata and does not require correctness checks. Reviewed By: sebastiankreutzer Differential Revision: https://reviews.llvm.org/D97977	2021-03-15 13:05:39 -05:00
Stelios Ioannou	ab86edbc88	[AArch64] Implement __rndr, __rndrrs intrinsics This patch implements the __rndr and __rndrrs intrinsics to provide access to the random number instructions introduced in Armv8.5-A. They are only defined for the AArch64 execution state and are available when __ARM_FEATURE_RNG is defined. These intrinsics store the random number in their pointer argument and return a status code indicating whether the generation succeeded. The difference between __rndr and __rndrrs is that the latter intrinsic reseeds the random number generator. The instructions write the NZCV flags indicating the success of the operation that we can then read with a CSET. [1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics [2] https://bugs.llvm.org/show_bug.cgi?id=47838 Differential Revision: https://reviews.llvm.org/D98264 Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc	2021-03-15 17:51:48 +00:00
Alex Zinenko	b868a3edad	[mlir] fix SPIR-V CPU and Vulkan runners after `e2310704d8` The commit in question changed the syntax but did not update the runner tests. This also required registering the MemRef dialect for custom parser to work correctly.	2021-03-15 18:36:58 +01:00
serge-sans-paille	4aa510be78	Allow __ieee128 as an alias to __float128 on ppc This matches gcc behavior. Differential Revision: https://reviews.llvm.org/D97846	2021-03-15 18:28:26 +01:00
serge-sans-paille	9628cb1fee	[NFC] Use higher level constructs to check for whitespace/newlines in the lexer It turns out that according to valgrind and perf, it's also slightly faster. Differential Revision: https://reviews.llvm.org/D98637	2021-03-15 18:27:19 +01:00
Luke Drummond	fcfd3fda71	[OpenCL] Respect calling convention for builtin `__translate_sampler_initializer` has a calling convention of `spir_func`, but clang generated calls to it using the default CC. Instruction Combining was lowering these mismatching calling conventions to `store i1* undef`, which itself was subsequently lowered to a trap instruction by simplifyCFG, resulting in a runtime `SIGILL`. There are arguably two bugs here, but whether there's any wisdom in converting an obviously invalid call into a runtime crash over aborting with a sensible error message will require further discussion. So for now it's enough to set the right calling convention on the runtime helper. Reviewed By: svenh, bader Differential Revision: https://reviews.llvm.org/D98411	2021-03-15 17:26:51 +00:00
Andrzej Warzynski	da408d98d7	[flang][docs] Fix the time for the new Flang driver call	2021-03-15 17:25:55 +00:00
Martin Storsjö	b5e228fc00	[libcxx] [test] Fix the temp_directory_path test for windows Check a different set of env vars, don't check the exact value of the fallback path. (GetTempPath falls back to returning the Windows folder if nothing better is available in env vars.) The test still fails one check on windows (due to relying on perms::none), which will be addressed separately. Differential Revision: https://reviews.llvm.org/D98139	2021-03-15 19:24:56 +02:00
Juneyoung Lee	edf634ebc2	[AssumeBundles] Add nonnull/align to op bundle if noundef exists This is a patch to add nonnull and align to assume's operand bundle only if noundef exists. Since nonnull and align in fn attr have poison semantics, they should be paired with noundef or noundef-implying attributes to be immediate UB. Reviewed By: jdoerfert, Tyker Differential Revision: https://reviews.llvm.org/D98228	2021-03-16 10:23:42 +09:00
Fraser Cormack	0035decae7	[CodeGen] Fix issues with scalable-vector INSERT/EXTRACT_SUBVECTORs This patch addresses a few issues when dealing with scalable-vector INSERT_SUBVECTOR and EXTRACT_SUBVECTOR nodes. When legalizing in DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR, we store the low and high halves to the stack separately. The offset for the high half was calculated incorrectly. Additionally, we can optimize this process when we can detect that the subvector is contained entirely within the low/high split vector type. While this optimization is valid on scalable vectors, when performing the 'high' optimization, the subvector must also be a scalable vector. Note that the 'low' optimization is still conservative: it may be possible to insert v2i32 into the low half of a split nxv1i32/nxv1i32, but we can't guarantee it. It is always possible to insert v2i32 into nxv2i32 or v2i32 into nxv4i32+2 as we know vscale is at least 1. Lastly, in SelectionDAG::isSplatValue, we early-exit on the extracted subvector value type being a scalable vector, forgetting that we can also extract a fixed-length vector from a scalable one. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98495	2021-03-15 17:04:21 +00:00
Kevin Zhou	b7df372cdc	[Polly] Refactoring astScheduleDimIsParallel to take the C++ wrapper object. NFC Polly currently needs to be slowly refactored to use the C++ wrapper objects to handle the reference counters automatically. I took the function astScheduleDimIsParallel and refactored it so that it uses the C++ wrapper functions as much as possible. There are some problems with IsParallel since it expects the C objects, so the C++ wrapper objects must first be converted with .release() and .get() before they can be used with IsParallel. When checking the ReductionDependencies Parallelism with the Build's Schedule, I opted to keep the union map as a C object rather than a C++ object. Eventually, changes will need to be made to IsParallel to refactor it to the C++ wrappers. When this is done, this function will also need to be slightly refactored to not use the C object. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D98455	2021-03-15 12:08:50 -05:00
Jon Chesterfield	156842937f	[libomptarget][amdgcn] Drop use of inttypes.h, moving closer to freestanding The glibc headers are a periodic source of problems compiling the devicertl. This patch resolves the following error run into while building llvm on a slightly different linux system. ``` In file included from .../lib/clang/13.0.0/include/inttypes.h:21: In file included from /usr/include/inttypes.h:25: /usr/include/features.h:461:12: fatal error: 'sys/cdefs.h' file not found # include <sys/cdefs.h> ^~~~~~~~~~~~~ ``` As a second patch, removing assert.h from shuffle will let amdgcn build as -ffreestanding, at which point only the headers that clang itself provides are used and interactions with the host glibc are eliminated. Doing the same for nvptx is complicated by printf handling but also seems worthwhile. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D98565	2021-03-15 16:54:58 +00:00
Martin Storsjö	d07e5c23b4	[libcxx] [test] Fix the get_temp_file_name() function for mingw Add the missing includes for getting the defines and functions used in the mingw version of get_temp_file_name(). This fixes 31 tests when built in a mingw configuration. Also remove a redundant ifdef; _WIN32 is defined in mingw targets too. Differential Revision: https://reviews.llvm.org/D97456	2021-03-15 18:52:49 +02:00
Martin Storsjö	f5f3a59837	[libcxx] [test] Disable some allocation checks in class.path tests on windows On windows, the path internal representation is wchar_t, and input/output often goes through utf8 in between, which causes extra allocations. MS STL also fails a number of strict allocation checks, so this shouldn't be a standards compliance issue. Differential Revision: https://reviews.llvm.org/D98398	2021-03-15 18:52:48 +02:00
Nico Weber	a431268668	[gn build] (semi-manually) port `b136a74efc`	2021-03-15 12:51:12 -04:00
Christopher Tetreault	39970764af	[CMake] Require python 3.6 if enabling LLVM test targets The lit test suite uses python 3.6 features. Rather than a strange python syntax error upon running the lit tests, we will require the correct version in CMake. Reviewed By: serge-sans-paille, yln Differential Revision: https://reviews.llvm.org/D95635	2021-03-15 09:50:39 -07:00
Craig Topper	3dc5b533e0	[RISCV] Improve legalization of i32 UADDO/USUBO on RV64. The default legalization uses zero extends that require pair of shifts on RISCV. Instead we can take advantage of the fact that unsigned compares work equally well on sign extended inputs. This allows us to use addw/subw and sext.w. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98233	2021-03-15 09:30:23 -07:00
Simon Pilgrim	772155793b	[X86][SSE] isHorizontalBinOp - ensure we clear any unused source operands to improve HADD/SUB matching Our shuffle matching for HADD/SUB patterns wasn't clearing repeated ops in 'fake unary' style shuffle masks (unpack(x,x) etc.), preventing matching of add(fakeunary(),fakeunary()) style patterns.	2021-03-15 16:24:29 +00:00
Alex Zinenko	0aceb61665	[mlir] make memref.cast implement ViewLikeOpInterface This was seemingly dropped in `e2310704d8`, potentially due to a misrebase. The absence of this trait makes aliasing analysis incorrect, leading to, e.g., buffer deallocation pass inserting deallocations too early.	2021-03-15 17:21:27 +01:00
Jianzhou Zhao	4e67ae7b6b	[dfsan] Add origin ABI wrappers for thread/signal/fork This is a part of https://reviews.llvm.org/D95835. See `bb91e02efd` about the similar issue of fork in MSan's origin tracking. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D98359	2021-03-15 16:18:00 +00:00
Zahira Ammarguellat	80ca4fd154	[NFC] Fix "unused parameter" error revealed in the Linux self-build.	2021-03-15 12:17:11 -04:00
Melanie Blower	33b1f3f42c	[clang][patch] Solve PR49479, File scope fp pragma should propagate to functions nested in struct, and initialization expressions Previously, the CurFPFeatures state was set to command line settings before semantic analysis of the nested member functions and initialization expressions. That is not correct; it should use the pragma state which is in effect at the lexical position. Reviewed By: Erich Keane, Aaron Ballman Differential Revision: https://reviews.llvm.org/D98211	2021-03-15 12:15:20 -04:00
Sanjay Patel	660728acd4	[InstSimplify] ctlz({signbit} >>u x) --> x The motivating pattern was handled in `0a2d69480d`, but we should have this for symmetry. But this really highlights that we could generalize for any shifted constant if we match this in instcombine. https://alive2.llvm.org/ce/z/MrmVNt	2021-03-15 12:03:35 -04:00
Sanjay Patel	3c93852a78	[InstSimplify] add tests for ctlz of shifted constant; NFC	2021-03-15 12:03:35 -04:00
Edward Jones	b136a74efc	[RISCV][compiler-rt] Add support for save-restore This adds the compiler-rt entry points required by the -msave-restore option. Differential Revision: https://reviews.llvm.org/D91717	2021-03-15 15:51:47 +00:00
Thomas Preud'homme	f60b35340f	Stop trapping on sNaN in __builtin_isinf __builtin_isinf currently generates a floating-point compare operation which triggers a trap when faced with a signaling NaN in StrictFP mode. This commit uses integer operations instead to not generate any trap in such a case. Reviewed By: mibintc Differential Revision: https://reviews.llvm.org/D97125	2021-03-15 15:38:08 +00:00
Martin Storsjö	995a128f07	[libcxx] [docs] Update docs about how to build for Windows Refresh the existing paragraphs on building in MSVC configurations, add a sample of one working configuration for MinGW, and add more details on what's necessary to run the tests these days. Differential Revision: https://reviews.llvm.org/D97166	2021-03-15 17:30:26 +02:00
LLVM GN Syncbot	fd9604c815	[gn build] Port `13e49dcee4`	2021-03-15 15:24:41 +00:00
Jon Chesterfield	13e49dcee4	[amdgpu] Implement lower function LDS pass Local variables are allocated at kernel launch. This pass collects global variables that are used from non-kernel functions, moves them into a new struct type, and allocates an instance of that type in every kernel. Uses are then replaced with a constantexpr offset. Prior to this pass, accesses from a function are compiled to trap. With this pass, most such accesses are removed before reaching codegen. The trap logic is left unchanged by this pass. It is still reachable for the cases this pass misses, notably the extern shared construct from hip and variables marked constant which survive the optimizer. This is of interest to the openmp project because the deviceRTL runtime library uses cuda shared variables from functions that cannot be inlined. Trunk llvm therefore cannot compile some openmp kernels for amdgpu. In addition to the unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled and the function pointer hashing scheme deleted passes the openmp suite. This lowering will use more LDS than strictly necessary. It is intended to be a functionally correct fallback for cases that are difficult to target from future optimisation passes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94648	2021-03-15 15:24:01 +00:00
Kostya Kortchinsky	752f477d67	[scudo][standalone] Add shared library to makefile Since we are looking to remove the old Scudo, we have to have a .so for parity purposes as some platforms use it. I tested this on Fuchsia & Linux, not on Android though. Differential Revision: https://reviews.llvm.org/D98456	2021-03-15 08:12:37 -07:00
Tim Keith	8e1c09ee5f	[flang] Build intrinsic .mod files in include/flang The build was putting .mod files for intrinsic modules in tools/flang/include/flang but the install puts them in include/flang, as does the out-of-tree build. This confused things for the driver. This change makes the build consistent with the install and simplifies the flang script accordingly. Also, clean up the cmake commands for building the .mod files. Differential Revision: https://reviews.llvm.org/D98522	2021-03-15 08:03:02 -07:00
Simon Pilgrim	814339454d	[X86][SSE] canonicalizeShuffleWithBinOps - handle target shuffles. Fold SHUFFLE(BINOP(SHUFFLE(X),SHUFFLE(Y))) -> BINOP(SHUFFLE'(X),SHUFFLE'(Y)) style patterns as well as the existing shuffles of constants.	2021-03-15 15:01:29 +00:00
Vy Nguyen	6f37d18d8c	[asan] Fixed test failing on windows due to different printf behaviour. %p reportedly prints upper case hex chars on Windows. The fix is to switch to using %#lx. Differential Revision: https://reviews.llvm.org/D98570	2021-03-15 10:58:40 -04:00
David Green	0b2aae42e5	[AArch64] Zero extended extract_vector_elt pattern This adds a pattern for i64 zext_inreg(i32 extract_vector_elt X), producing a single UMOVvi16 instruction that is already expected to clear the top bits. The exact pattern that this matches is and(anyext(vector_extract X, lane), 0xff), similar to the sext patterns higher up in the same file. Differential Revision: https://reviews.llvm.org/D98599	2021-03-15 14:56:20 +00:00
Dmitry Polukhin	da55af7f1d	[clang-tidy] Enable modernize-concat-nested-namespaces also on headers For some reason the initial implementation of the check had an explicit check for the main file to avoid being applied in headers. This diff removes this check and adds a test for the check on a header. A similar approach was proposed in D61989 but review there got stuck. Test Plan: added new test case Differential Revision: https://reviews.llvm.org/D97563	2021-03-15 07:32:45 -07:00
Nathan James	0333dde923	[clang-tidy] Fix readability-identifier-naming duplicating prefix or suffix for replacements. If an identifier has a correct prefix/suffix but a bad case, the fix won't strip them when computing the correct case, leading to duplication when they are added back. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D98521	2021-03-15 14:20:48 +00:00
Nathan James	74c270f33e	[ASTMatchers] Don't forward matchers in MapAnyOf Forwarding these means that if an r-value reference is passed, the matcher will be moved. However it appears this happens for each mapped node matcher, resulting in use-after-move issues. Reviewed By: steveire Differential Revision: https://reviews.llvm.org/D98497	2021-03-15 14:16:52 +00:00
Jan Svoboda	23cc8ebf59	[clang][lex] Speculative fix for buffer overrun on raw string parse This attempts to fix a (non-deterministic) buffer overrun when parsing raw string literals during modular build. Similar fix to `4e5b5c36f4`. Reviewed By: beccadax Differential Revision: https://reviews.llvm.org/D94950	2021-03-15 15:13:47 +01:00
Amy Kwan	e582c073d1	[NFC][PowerPC] Add additional load/store test cases This patch adds additional load/store test cases involving scalars, vectors, and PC-Rel in preparation for the refactored load and store implementation introduced in D93370. Differential Revision: https://reviews.llvm.org/D97391	2021-03-15 08:54:38 -05:00
Alex Zinenko	7aa6f3aa0c	[mlir] fix integration tests post `e2310704d8` The commit in question moved some ops across dialects but did not update some of the target-specific integration tests that use these ops, presumably because the corresponding target hardware was not available. Fix these tests.	2021-03-15 14:41:27 +01:00
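
The InstSimplify fold listed above in `660728acd4`, ctlz({signbit} >>u x) --> x, can be sanity-checked by brute force. The following is only a rough sketch in Python, not the LLVM implementation: the helper names `ctlz` and `fold_holds` are hypothetical, and shift amounts are assumed to stay below the bit width, since larger shifts are not meaningful for the fold.

```python
def ctlz(value: int, bits: int) -> int:
    """Count leading zeros of `value` viewed as an unsigned `bits`-wide integer."""
    value &= (1 << bits) - 1          # truncate to the chosen width
    return bits - value.bit_length()  # bit_length() of 0 is 0, so ctlz(0) == bits


def fold_holds(bits: int) -> bool:
    """Check ctlz(signbit >>u x) == x for every shift amount x below `bits`."""
    signbit = 1 << (bits - 1)         # only the most significant bit is set
    return all(ctlz(signbit >> x, bits) == x for x in range(bits))


if __name__ == "__main__":
    # Hypothetical driver: try a few common integer widths.
    for width in (8, 16, 32, 64):
        print(width, fold_holds(width))
```

Shifting the sign bit right by x leaves exactly x zero bits above the single set bit, which is why the count of leading zeros equals the shift amount.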

382757 Commits

All Branches