llvm-project

Commit Graph

Author	SHA1	Message	Date
Jessica Paquette	bc43ddf42f	[AArch64][GlobalISel] NFC: Refactor G_FCMP selection code Refactor this so it's similar to the existing integer comparison code. Also add some missing 64-bit testcases to select-fcmp.mir. Refactoring to prep for improving selection for G_FCMP-related conditional branches etc. Differential Revision: https://reviews.llvm.org/D88614	2020-09-30 16:50:39 -07:00
Ranjeet Singh	e4f50e587f	[ARM] Add missing target for Arm neon test case. This is a follow-up from https://reviews.llvm.org/D61717. Where Richard described the issue with compiling arm_neon.h under -flax-vector-conversions=none. It looks like the example reproducer does actually work but what was missing was a test entry for that target. Differential Revision: https://reviews.llvm.org/D88546	2020-10-01 00:32:33 +01:00
Joachim Protze	23419bfd1c	[OpenMP][libarcher] Allow all possible argument separators in TSAN_OPTIONS Currently, the parser used to tokenize the TSAN_OPTIONS in libomp uses only spaces as separators, even though TSAN in compiler-rt supports other separators like ':' or ','. CTest uses ':' to separate sanitizer options by default. The documentation for other sanitizers mentions ':' as separator, but TSAN only lists spaces, which is probably where this mismatch originated. Patch provided by upsj Differential Revision: https://reviews.llvm.org/D87144	2020-10-01 01:10:13 +02:00
Craig Topper	b23916504a	Patch IEEEFloat::isSignificandAllZeros and IEEEFloat::isSignificandAllOnes (bug 34579) Patch IEEEFloat::isSignificandAllZeros and IEEEFloat::isSignificandAllOnes to behave correctly in the case that the size of the significand is a multiple of the width of the integerParts making up the significand. The patch to IEEEFloat::isSignificandAllOnes fixes bug 34579, and the patch to IEEE:Float:isSignificandAllZeros fixes the unit test "APFloatTest.x87Next" I added here. I have included both in this diff since the changes are very similar. Patch by Andrew Briand	2020-09-30 16:07:15 -07:00
Ahsan Saghir	66d2e3f495	[PowerPC] Add outer product instructions for MMA This patch adds outer product instructions for MMA, including related infrastructure, and their tests. Depends on D84968. Reviewed By: #powerpc, bsaleil, amyk Differential Revision: https://reviews.llvm.org/D88043	2020-09-30 18:06:49 -05:00
Akira Hatanaka	21cf2e6c26	Handle unknown OSes in DarwinTargetInfo::getExnObjectAlignment rdar://problem/69727650	2020-09-30 16:05:17 -07:00
Joachim Protze	6104b30446	[OpenMP][OMPT] Update OMPT tests for newly added GOMP interface patches This patch updates the expected results for the GOMP interface patches: D87267, D87269, and D87271. The taskwait-depend test is changed to really use taskwait-depend and copied to an task_if0-depend test. To pass the tests, the handling of the return address was fixed. Differential Revision: https://reviews.llvm.org/D87680	2020-10-01 00:53:41 +02:00
Joachim Protze	55cff5b288	[OpenMP][libomptarget] make omp_get_initial_device 5.1 compliant OpenMP 5.1 defines omp_get_initial_device to return the same value as omp_get_num_devices. Since this change is also 5.0 compliant, no versioning is needed. Differential Revision: https://reviews.llvm.org/D88149	2020-10-01 00:51:11 +02:00
peter klausler	37b2e2b04c	[flang] Semantic analysis for FINAL subroutines Represent FINAL subroutines in the symbol table entries of derived types. Enforce constraints. Update tests that have inadvertent violations or modified messages. Added a test. The specific procedure distinguishability checking code for generics was used to enforce distinguishability of FINAL procedures. (Also cleaned up some confusion and redundancy noticed in the type compatibility infrastructure while digging into that area.) Differential revision: https://reviews.llvm.org/D88613	2020-09-30 15:46:15 -07:00
Reid Kleckner	5519e4da83	Re-land "[PDB] Merge types in parallel when using ghashing" Stored Error objects have to be checked, even if they are success values. This reverts commit `8d250ac3cd`. Relands commit 49b3459930655d879b2dc190ff8fe11c38a8be5f.. Original commit message: ----------------------------------------- This makes type merging much faster (-24% on chrome.dll) when multiple threads are available, but it slightly increases the time to link (+10%) when /threads:1 is passed. With only one more thread, the new type merging is faster (-11%). The output PDB should be identical to what it was before this change. To give an idea, here is the /time output placed side by side: BEFORE \| AFTER Input File Reading: 956 ms \| 968 ms Code Layout: 258 ms \| 190 ms Commit Output File: 6 ms \| 7 ms PDB Emission (Cumulative): 6691 ms \| 4253 ms Add Objects: 4341 ms \| 2927 ms Type Merging: 2814 ms \| 1269 ms -55%! Symbol Merging: 1509 ms \| 1645 ms Publics Stream Layout: 111 ms \| 112 ms TPI Stream Layout: 764 ms \| 26 ms trivial Commit to Disk: 1322 ms \| 1036 ms -300ms ----------------------------------------- -------- Total Link Time: 8416 ms 5882 ms -30% overall The main source of the additional overhead in the single-threaded case is the need to iterate all .debug$T sections up front to check which type records should go in the IPI stream. See fillIsItemIndexFromDebugT. With changes to the .debug$H section, we could pre-calculate this info and eliminate the need to do this walk up front. That should restore single-threaded performance back to what it was before this change. This change will cause LLD to be much more parallel than it used to, and for users who do multiple links in parallel, it could regress performance. However, when the user is only doing one link, it's a huge improvement. In the future, we can use NT worker threads to avoid oversaturating the machine with work, but for now, this is such an improvement for the single-link use case that I think we should land this as is. Algorithm ---------- Before this change, we essentially used a DenseMap<GloballyHashedType, TypeIndex> to check if a type has already been seen, and if it hasn't been seen, insert it now and use the next available type index for it in the destination type stream. DenseMap does not support concurrent insertion, and even if it did, the linker must be deterministic: it cannot produce different PDBs by using different numbers of threads. The output type stream must be in the same order regardless of the order of hash table insertions. In order to create a hash table that supports concurrent insertion, the table cells must be small enough that they can be updated atomically. The algorithm I used for updating the table using linear probing is described in this paper, "Concurrent Hash Tables: Fast and General(?)!": https://dl.acm.org/doi/10.1145/3309206 The GHashCell in this change is essentially a pair of 32-bit integer indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the TpiSource object, and it represents an input type stream. The typeIndex is the index of the type in the stream. Together, we have something like a ragged 2D array of ghashes, which can be looked up as: tpiSources[tpiSrcIndex]->ghashes[typeIndex] By using these side tables, we can omit the key data from the hash table, and keep the table cell small. There is a cost to this: resolving hash table collisions requires many more loads than simply looking at the key in the same cache line as the insertion position. However, most supported platforms should have a 64-bit CAS operation to update the cell atomically. To make the result of concurrent insertion deterministic, the cell payloads must have a priority function. Defining one is pretty straightforward: compare the two 32-bit numbers as a combined 64-bit number. This means that types coming from inputs earlier on the command line have a higher priority and are more likely to appear earlier in the final PDB type stream than types from an input appearing later on the link line. After table insertion, the non-empty cells in the table can be copied out of the main table and sorted by priority to determine the ordering of the final type index stream. At this point, item and type records must be separated, either by sorting or by splitting into two arrays, and I chose sorting. This is why the GHashCell must contain the isItem bit. Once the final PDB TPI stream ordering is known, we need to compute a mapping from source type index to PDB type index. To avoid starting over from scratch and looking up every type again by its ghash, we save the insertion position of every hash table insertion during the first insertion phase. Because the table does not support rehashing, the insertion position is stable. Using the array of insertion positions indexed by source type index, we can replace the source type indices in the ghash table cells with the PDB type indices. Once the table cells have been updated to contain PDB type indices, the mapping for each type source can be computed in parallel. Simply iterate the list of cell positions and replace them with the PDB type index, since the insertion positions are no longer needed. Once we have a source to destination type index mapping for every type source, there are no more data dependencies. We know which type records are "unique" (not duplicates), and what their final type indices will be. We can do the remapping in parallel, and accumulate type sizes and type hashes in parallel by type source. Lastly, TPI stream layout must be done serially. Accumulate all the type records, sizes, and hashes, and add them to the PDB. Differential Revision: https://reviews.llvm.org/D87805	2020-09-30 15:44:38 -07:00
Stanislav Mekhanoshin	722d792499	[AMDGPU] Reorganize VOP3P encoding This changes width of encoding and opcode fields to match the documentation. Differential Revision: https://reviews.llvm.org/D88619	2020-09-30 15:27:06 -07:00
Vitaly Buka	7475bd5411	[Msan] Add ptsname, ptsname_r interceptors Reviewed By: eugenis, MaskRay Differential Revision: https://reviews.llvm.org/D88547	2020-09-30 15:00:52 -07:00
MaheshRavishankar	c694588fc5	[mlir][Linalg] Add pattern to tile and fuse Linalg operations on buffers. The pattern is structured similar to other patterns like LinalgTilingPattern. The fusion patterns takes options that allows you to fuse with producers of multiple operands at once. - The pattern fuses only at the level that is known to be legal, i.e if a reduction loop in the consumer is tiled, then fusion should happen "before" this loop. Some refactoring of the fusion code is needed to fuse only where it is legal. - Since the fusion on buffers uses the LinalgDependenceGraph that is not mutable in place the fusion pattern keeps the original operations in the IR, but are tagged with a marker that can be later used to find the original operations. This change also fixes an issue with tiling and distribution/interchange where if the tile size of a loop were 0 it wasnt account for in these. Differential Revision: https://reviews.llvm.org/D88435	2020-09-30 14:56:58 -07:00
Reid Kleckner	8d250ac3cd	Revert "[PDB] Merge types in parallel when using ghashing" This reverts commit `49b3459930`.	2020-09-30 14:55:32 -07:00
Reid Kleckner	49b3459930	[PDB] Merge types in parallel when using ghashing This makes type merging much faster (-24% on chrome.dll) when multiple threads are available, but it slightly increases the time to link (+10%) when /threads:1 is passed. With only one more thread, the new type merging is faster (-11%). The output PDB should be identical to what it was before this change. To give an idea, here is the /time output placed side by side: BEFORE \| AFTER Input File Reading: 956 ms \| 968 ms Code Layout: 258 ms \| 190 ms Commit Output File: 6 ms \| 7 ms PDB Emission (Cumulative): 6691 ms \| 4253 ms Add Objects: 4341 ms \| 2927 ms Type Merging: 2814 ms \| 1269 ms -55%! Symbol Merging: 1509 ms \| 1645 ms Publics Stream Layout: 111 ms \| 112 ms TPI Stream Layout: 764 ms \| 26 ms trivial Commit to Disk: 1322 ms \| 1036 ms -300ms ----------------------------------------- -------- Total Link Time: 8416 ms 5882 ms -30% overall The main source of the additional overhead in the single-threaded case is the need to iterate all .debug$T sections up front to check which type records should go in the IPI stream. See fillIsItemIndexFromDebugT. With changes to the .debug$H section, we could pre-calculate this info and eliminate the need to do this walk up front. That should restore single-threaded performance back to what it was before this change. This change will cause LLD to be much more parallel than it used to, and for users who do multiple links in parallel, it could regress performance. However, when the user is only doing one link, it's a huge improvement. In the future, we can use NT worker threads to avoid oversaturating the machine with work, but for now, this is such an improvement for the single-link use case that I think we should land this as is. Algorithm ---------- Before this change, we essentially used a DenseMap<GloballyHashedType, TypeIndex> to check if a type has already been seen, and if it hasn't been seen, insert it now and use the next available type index for it in the destination type stream. DenseMap does not support concurrent insertion, and even if it did, the linker must be deterministic: it cannot produce different PDBs by using different numbers of threads. The output type stream must be in the same order regardless of the order of hash table insertions. In order to create a hash table that supports concurrent insertion, the table cells must be small enough that they can be updated atomically. The algorithm I used for updating the table using linear probing is described in this paper, "Concurrent Hash Tables: Fast and General(?)!": https://dl.acm.org/doi/10.1145/3309206 The GHashCell in this change is essentially a pair of 32-bit integer indices: <sourceIndex, typeIndex>. The sourceIndex is the index of the TpiSource object, and it represents an input type stream. The typeIndex is the index of the type in the stream. Together, we have something like a ragged 2D array of ghashes, which can be looked up as: tpiSources[tpiSrcIndex]->ghashes[typeIndex] By using these side tables, we can omit the key data from the hash table, and keep the table cell small. There is a cost to this: resolving hash table collisions requires many more loads than simply looking at the key in the same cache line as the insertion position. However, most supported platforms should have a 64-bit CAS operation to update the cell atomically. To make the result of concurrent insertion deterministic, the cell payloads must have a priority function. Defining one is pretty straightforward: compare the two 32-bit numbers as a combined 64-bit number. This means that types coming from inputs earlier on the command line have a higher priority and are more likely to appear earlier in the final PDB type stream than types from an input appearing later on the link line. After table insertion, the non-empty cells in the table can be copied out of the main table and sorted by priority to determine the ordering of the final type index stream. At this point, item and type records must be separated, either by sorting or by splitting into two arrays, and I chose sorting. This is why the GHashCell must contain the isItem bit. Once the final PDB TPI stream ordering is known, we need to compute a mapping from source type index to PDB type index. To avoid starting over from scratch and looking up every type again by its ghash, we save the insertion position of every hash table insertion during the first insertion phase. Because the table does not support rehashing, the insertion position is stable. Using the array of insertion positions indexed by source type index, we can replace the source type indices in the ghash table cells with the PDB type indices. Once the table cells have been updated to contain PDB type indices, the mapping for each type source can be computed in parallel. Simply iterate the list of cell positions and replace them with the PDB type index, since the insertion positions are no longer needed. Once we have a source to destination type index mapping for every type source, there are no more data dependencies. We know which type records are "unique" (not duplicates), and what their final type indices will be. We can do the remapping in parallel, and accumulate type sizes and type hashes in parallel by type source. Lastly, TPI stream layout must be done serially. Accumulate all the type records, sizes, and hashes, and add them to the PDB. Differential Revision: https://reviews.llvm.org/D87805	2020-09-30 14:22:48 -07:00
Sam McCall	85fc5bf341	[clangd] Remove dead variable. NFC	2020-09-30 23:19:15 +02:00
peter klausler	0c3c8f4ae6	[flang] Fix descriptor-based array data item I/O for list-directed CHARACTER & LOGICAL These types have to distinguish list-directed I/O from formatted I/O, and the subscript incrementation call was in the formatted branch of the if() rather than after the if(). Differential revision: https://reviews.llvm.org/D88606	2020-09-30 14:01:45 -07:00
Hubert Tong	ae4c400e02	[NFC] Fix spacing in clang/test/Driver/aix-ld.c Fix one line with mismatch in indentation after `afc277b0ed`.	2020-09-30 17:01:32 -04:00
Rainer Orth	8a1084a948	[asan][test] XFAIL Posix/no_asan_gen_globals.c on Solaris `Posix/no_asan_gen_globals.c` currently `FAIL`s on Solaris: $ nm no_asan_gen_globals.c.tmp.exe \| grep ___asan_gen_ 0809696a r .L___asan_gen_.1 0809a4cd r .L___asan_gen_.2 080908e2 r .L___asan_gen_.4 0809a4cd r .L___asan_gen_.5 0809a529 r .L___asan_gen_.7 0809a4cd r .L___asan_gen_.8 As detailed in Bug 47607, there are two factors here: - `clang` plays games by emitting some local labels into the symbol table. When instead one uses `-fno-integrated-as` to have `gas` create the object files, they don't land in the objects in the first place. - Unlike GNU `ld`, the Solaris `ld` doesn't support support `-X`/`--discard-locals` but instead relies on the assembler to follow its specification and not emit local labels. Therefore this patch `XFAIL`s the test on Solaris. Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`. Differential Revision: https://reviews.llvm.org/D88218	2020-09-30 22:58:07 +02:00
Craig Topper	d1d7fc9832	[X86] Canonicalize (x > 1) ? x : 1 -> (x >= 1) ? x : 1 for sign and unsigned to enable the use of test instructions for the compare. This will be further canonicalized to a compare involving 0 which will enable the use of test instructions. Either using cmovg for signed for cmovne for unsigned. Fixes more case for PR47049	2020-09-30 13:50:52 -07:00
Arthur Eubanks	ce5379f0f0	[NPM] Add target specific hook to add passes for New Pass Manager The patch adds a new TargetMachine member "registerPassBuilderCallbacks" for targets to add passes to the pass pipeline using the New Pass Manager (similar to adjustPassManager for the Legacy Pass Manager). Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D88138	2020-09-30 13:29:43 -07:00
Thomas Raoux	dd14e58252	[mlir][vector] First step of vector distribution transformation This is the first of several steps to support distributing large vectors. This adds instructions extract_map and insert_map that allow us to do incremental lowering. Right now the transformation only apply to simple pointwise operation with a vector size matching the multiplicity of the IDs used to distribute the vector. This can be used to distribute large vectors to loops or SPMD. Differential Revision: https://reviews.llvm.org/D88341	2020-09-30 13:14:55 -07:00
Christian Sigg	e9b3884161	Add GDB prettyprinters for a few more MLIR types. Reviewed By: dblaikie, jpienaar Differential Revision: https://reviews.llvm.org/D87159	2020-09-30 21:22:47 +02:00
Joseph Huber	1b60f63e4f	Revert "[OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def" Failing tests on Arm due to the tests automatically populating incomatible pointer width architectures. Reverting until the tests are updated. Failing tests: OpenMP/distribute_parallel_for_num_threads_codegen.cpp OpenMP/distribute_parallel_for_if_codegen.cpp OpenMP/distribute_parallel_for_simd_if_codegen.cpp OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp OpenMP/target_teams_distribute_parallel_for_simd_if_codegen.cpp OpenMP/teams_distribute_parallel_for_if_codegen.cpp OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp This reverts commit `90eaedda9b`.	2020-09-30 15:12:21 -04:00
Sanjay Patel	81921ebc43	[CodeGen] improve coverage for float (32-bit) type of NAN; NFC Goes with D88238	2020-09-30 15:10:25 -04:00
Joseph Huber	bdc85292fb	Revert "[OpenMP] Add Error Handling for Conflicting Pointer Sizes for Target Offload" Failing tests on Arm due to the tests automatically populating incomatible pointer width architectures. Reverting until the tests are updated. Failing tests: OpenMP/distribute_parallel_for_num_threads_codegen.cpp OpenMP/distribute_parallel_for_if_codegen.cpp OpenMP/distribute_parallel_for_simd_if_codegen.cpp OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp OpenMP/target_teams_distribute_parallel_for_simd_if_codegen.cpp OpenMP/teams_distribute_parallel_for_if_codegen.cpp OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp This reverts commit `9d2378b591`.	2020-09-30 15:08:22 -04:00
Louis Dionne	490b556a0f	[libc++] Make sure we don't attempt to run check-cxx-abilist when libc++ doesn't define new/delete That would make the test fail spuriously because we don't generate an ABI list for that configuration.	2020-09-30 14:59:03 -04:00
Arthur Eubanks	2d761a368c	[test][NewPM][SampleProfile] Fix more tests under NPM These all have separate legacy and new PM RUN lines.	2020-09-30 11:50:41 -07:00
Jim Ingham	afaeb6af79	Fix crash in SBStructuredData::GetDescription() when there's no StructuredDataPlugin. Also, use the StructuredData::Dump method to print the StructuredData if there is no plugin, rather than just returning an error. Differential Revision: https://reviews.llvm.org/D88266	2020-09-30 11:48:54 -07:00
Jordan Rupprecht	ad865d9d10	[lldb-vscode] Allow an empty 'breakpoints' field to clear breakpoints. Per the DAP spec for SetBreakpoints [1], the way to clear breakpoints is: `To clear all breakpoint for a source, specify an empty array.` However, leaving the breakpoints field unset is also a well formed request (note the `breakpoints?:` in the `SetBreakpointsArguments` definition). If it's unset, we have a couple choices: 1. Crash (current behavior) 2. Clear breakpoints 3. Return an error response that the breakpoints field is missing. I propose we do (2) instead of (1), and treat an unset breakpoints field the same as an empty breakpoints field. [1] https://microsoft.github.io/debug-adapter-protocol/specification#Requests_SetBreakpoints Reviewed By: wallace, labath Differential Revision: https://reviews.llvm.org/D88513	2020-09-30 11:32:06 -07:00
Eugene Zhulenev	655af658c9	[MLIR] Add async.value type to Async dialect Return values from async regions as !async.value<...>. Reviewed By: mehdi_amini, csigg Differential Revision: https://reviews.llvm.org/D88510	2020-09-30 11:30:06 -07:00
Jordan Rupprecht	c3193e464c	[lldb/ipv6] Support running lldb tests in an ipv6-only environment. When running in an ipv6-only environment where `AF_INET` sockets are not available, many lldb tests (mostly gdb remote tests) fail because things like `127.0.0.1` don't work there. Use `localhost` instead of `127.0.0.1` whenever possible, or include a fallback of creating `AF_INET6` sockets when `AF_INET` fails. Reviewed By: labath Differential Revision: https://reviews.llvm.org/D87333	2020-09-30 11:08:41 -07:00
Rahman Lavaee	8955950c12	Exception support for basic block sections This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf This patch provides exception support for basic block sections by splitting the call-site table into call-site ranges corresponding to different basic block sections. Still all landing pads must reside in the same basic block section (which is guaranteed by the the core basic block section patch D73674 (ExceptionSection) ). Each call-site table will refer to the landing pad fragment by explicitly specifying @LPstart (which is omitted in the normal non-basic-block section case). All these call-site tables will share their action and type tables. The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. In the case of basic block section where one section contains all the landing pads, the landing pad offset relative to LPStart could actually be zero. Thus, we avoid zero-offset landing pads by inserting a nop operation as the first non-CFI instruction in the exception section. Background on Exception Handling in C++ ABI https://github.com/itanium-cxx-abi/cxx-abi/blob/master/exceptions.pdf Compiler emits an exception table for every function. When an exception is thrown, the stack unwinding library queries the unwind table (which includes the start and end of each function) to locate the exception table for that function. The exception table includes a call site table for the function, which is used to guide the exception handling runtime to take the appropriate action upon an exception. Each call site record in this table is structured as follows: \| CallSite \| --> Position of the call site (relative to the function entry) \| CallSite length \| --> Length of the call site. \| Landing Pad \| --> Position of the landing pad (relative to the landing pad fragment’s begin label) \| Action record offset \| --> Position of the first action record The call site records partition a function into different pieces and describe what action must be taken for each callsite. The callsite fields are relative to the start of the function (as captured in the unwind table). The landing pad entry is a reference into the function and corresponds roughly to the catch block of a try/catch statement. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of the exception thrown, and executes similar to a switch-case statement. The landing pad field is relative to the beginning of the procedure fragment which includes all the landing pads (@LPStart). The C++ ABI requires all landing pads to be in the same fragment. Nonetheless, without basic block sections, @LPStart is the same as the function @Start (found in the unwind table) and can be omitted. The action record offset is an index into the action table which includes information about which exception types are caught. C++ Exceptions with Basic Block Sections Basic block sections break the contiguity of a function fragment. Therefore, call sites must be specified relative to the beginning of the basic block section. Furthermore, the unwinding library should be able to find the corresponding callsites for each section. To do so, the .cfi_lsda directive for a section must point to the range of call-sites for that section. This patch introduces a new CallSiteRange structure which specifies the range of call-sites which correspond to every section: `struct CallSiteRange { // Symbol marking the beginning of the precedure fragment. MCSymbol FragmentBeginLabel = nullptr; // Symbol marking the end of the procedure fragment. MCSymbol FragmentEndLabel = nullptr; // LSDA symbol for this call-site range. MCSymbol *ExceptionLabel = nullptr; // Index of the first call-site entry in the call-site table which // belongs to this range. size_t CallSiteBeginIdx = 0; // Index just after the last call-site entry in the call-site table which // belongs to this range. size_t CallSiteEndIdx = 0; // Whether this is the call-site range containing all the landing pads. bool IsLPRange = false; };` With N basic-block-sections, the call-site table is partitioned into N call-site ranges. Conceptually, we emit the call-site ranges for sections sequentially in the exception table as if each section has its own exception table. In the example below, two sections result in the two call site ranges (denoted by LSDA1 and LSDA2) placed next to each other. However, their call-sites will refer to records in the shared Action Table. We also emit the header fields (@LPStart and CallSite Table Length) for each call site range in order to place the call site ranges in separate LSDAs. We note that with -basic-block-sections, The CallSiteTableLength will not actually represent the length of the call site table, but rather the reference to the action table. Since the only purpose of this field is to locate the action table, correctness is guaranteed. Finally, every call site range has one @LPStart pointer so the landing pads of each section must all reside in one section (not necessarily the same section). To make this easier, we decide to place all landing pads of the function in one section (hence the `IsLPRange` field in CallSiteRange). \| @LPStart \| ---> Landing pad fragment ( LSDA1 points here) \| CallSite Table Length \| ---> Used to find the action table. \| CallSites \| \| … \| \| … \| \| @LPStart \| ---> Landing pad fragment ( LSDA2 points here) \| CallSite Table Length \| \| CallSites \| \| … \| \| … \| … … \| Action Table \| \| Types Table \| Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D73739	2020-09-30 11:05:55 -07:00
David Tenty	afc277b0ed	[AIX][Clang][Driver] Link libm in c++ mode since that is the normal behaviour of other compilers on the platform. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D88500	2020-09-30 14:02:17 -04:00
Joseph Huber	90eaedda9b	[OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def Summary: Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU for OpenMP device code generation with ones in OMPKinds.def and use OMPIRBuilder for generating runtime calls. This allows us to consolidate more OpenMP code generation into the OMPIRBuilder. This patch also invalidates specifying target architectures with conflicting pointer sizes. Reviewers: jdoerfert Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl Tags: #OpenMP #Clang #LLVM Differential Revision: https://reviews.llvm.org/D88430	2020-09-30 14:00:01 -04:00
Joseph Huber	9d2378b591	[OpenMP] Add Error Handling for Conflicting Pointer Sizes for Target Offload Summary: This patch adds an error to Clang that detects if OpenMP offloading is used between two architectures with incompatible pointer sizes. This ensures that the data mapping can be done correctly and solves an issue in code generation generating the wrong size pointer. Reviewer: jdoerfert Subscribers: Tags: #OpenMP #Clang Differential Revision:	2020-09-30 13:58:24 -04:00
Richard Smith	892df30a7f	Fix interaction of `constinit` and `weak`. We previously took a shortcut and said that weak variables never have constant initializers (because those initializers are never correct to use outside the variable). We now say that weak variables can have constant initializers, but are never usable in constant expressions.	2020-09-30 10:49:50 -07:00
Alexandre Rames	700e63293e	[Sema] Support Comma operator for fp16 vectors. The current half vector was enforcing an assert expecting "(LHS is half vector) == (RHS is half vector)" for comma. Reviewed By: ahatanak, fhahn Differential Revision: https://reviews.llvm.org/D88265	2020-09-30 18:23:09 +01:00
Sanjay Patel	187686bea3	[CodeGen] add test for NAN creation; NFC This goes with the APFloat change proposed in D88238. This is copied from the MIPS-specific test in builtin-nan-legacy.c to verify that the normal behavior is correct on other targets without the complication of an inverted quiet bit.	2020-09-30 13:22:12 -04:00
Congzhe Cao	8d8cb1ad80	[AArch64] Avoid pairing loads when the base reg is modified When pairing loads, we should check if in between the two loads the base register has been modified. If that is the case then avoid pairing them because the second load actually loads from a different address. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D86956	2020-09-30 13:06:51 -04:00
Rainer Orth	73fb9698c0	[asan][test] Several Posix/unpoison-alternate-stack.cpp fixes `Posix/unpoison-alternate-stack.cpp` currently `FAIL`s on Solaris/i386. Some of the problems are generic: - `clang` warns compiling the testcase: compiler-rt/test/asan/TestCases/Posix/unpoison-alternate-stack.cpp:83:7: warning: nested designators are a C99 extension [-Wc99-designator] .sa_sigaction = signalHandler, ^~~~~~~~~~~~~ compiler-rt/test/asan/TestCases/Posix/unpoison-alternate-stack.cpp:84:7: warning: ISO C++ requires field designators to be specified in declaration order; field '_funcptr' will be initialized after field 'sa_flags' [-Wreorder-init-list] .sa_flags = SA_SIGINFO \| SA_NODEFER \| SA_ONSTACK, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ and some more instances. This can all easily be avoided by initializing each field separately. - The test `SEGV`s in `__asan_memcpy`. The default Solaris/i386 stack size is only 4 kB, while `__asan_memcpy` tries to allocate either 5436 (32-bit) or 10688 bytes (64-bit) on the stack. This patch avoids this by requiring at least 16 kB stack size. - Even without `-fsanitize=address` I get an assertion failure: Assertion failed: !isOnSignalStack(), file compiler-rt/test/asan/TestCases/Posix/unpoison-alternate-stack.cpp, line 117 The fundamental problem with this testcase is that `longjmp` from a signal handler is highly unportable; XPG7 strongly warns against it and it is thus unspecified which stack is used when `longjmp`ing from a signal handler running on an alternative stack. So I'm `XFAIL`ing this testcase on Solaris. Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`. Differential Revision: https://reviews.llvm.org/D88501	2020-09-30 18:56:52 +02:00
Arthur Eubanks	2ab8770223	[test][SampleProfile][NewPM] Fix some tests under NPM	2020-09-30 09:44:29 -07:00
Peter Collingbourne	719ab7309e	scudo: Make it thread-safe to set some runtime configuration flags. Move some of the flags previously in Options, as well as the UseMemoryTagging flag previously in the primary allocator, into an atomic variable so that it can be updated while other threads are running. Relaxed accesses are used because we only have the requirement that the other threads see the new value eventually. The code is set up so that the variable is generally loaded once per allocation function call with the exception of some rarely used code such as error handlers. The flag bits can generally stay in a register during the execution of the allocation function which means that they can be branched on with minimal overhead (e.g. TBZ on aarch64). Differential Revision: https://reviews.llvm.org/D88523	2020-09-30 09:42:45 -07:00
Vy Nguyen	4fcd1a8e65	[llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking. This is mostly for the benefit of the LBR latency mode. Right now, it performs no checking. If this is run on non-supported hardware, it will produce all zeroes for latency. Differential Revision: https://reviews.llvm.org/D85254	2020-09-30 12:25:59 -04:00
Valentin Clement	dd4fb7c8cf	[mlir][openacc] Remove -allow-unregistred-dialect from ops and invalid tests Switch to a dummy op in the test dialect so we can remove the -allow-unregistred-dialect on ops.mlir and invalid.mlir. Change after comment on D88272. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D88587	2020-09-30 12:24:21 -04:00
Arthur Eubanks	4fbd83c716	[ObjCARCAA][NewPM] Add already ported objc-arc-aa to PassRegistry.def Also add missing AnalysisKey definition.	2020-09-30 08:50:44 -07:00
Kazushi (Jam) Marukawa	1034262e0a	[VE] Support TargetBlockAddress Change to handle TargetBlockAddress and add a regression test for it. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D88576	2020-10-01 00:48:21 +09:00
Simon Moll	05ae04c396	[DA][SDA] SyncDependenceAnalysis re-write This patch achieves two things: 1. It breaks up the `join_blocks` interface between the SDA to the DA to return two separate sets for divergent loops exits and divergent, disjoint path joins. 2. It updates the SDA algorithm to run in O(n) time and improves the precision on divergent loop exits. This fixes `https://bugs.llvm.org/show_bug.cgi?id=46372` (by virtue of the improved `join_blocks` interface) and revealed an imprecise expected result in the `Analysis/DivergenceAnalysis/AMDGPU/hidden_loopdiverge.ll` test. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D84413	2020-09-30 17:36:26 +02:00
Mircea Trofin	d6de40f886	[NFC][regalloc] Make VirtRegAuxInfo part of allocator state All the state of VRAI is allocator-wide, so we can avoid creating it every time we need it. In addition, the normalization function is allocator-specific. In a next change, we can simplify that design in favor of just having it as a virtual member. Differential Revision: https://reviews.llvm.org/D88499	2020-09-30 08:13:05 -07:00
Simon Pilgrim	f425418fc4	[InstCombine] Add tests for 'partial' bswap patterns As mentioned on PR47191, if we're bswap'ing some bytes and the zero'ing the remainder we can perform this as a bswap+mask which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.	2020-09-30 16:09:09 +01:00

1 2 3 4 5 ...

367767 Commits All Branches Search

367767 Commits

All Branches