Summary:
When using strict fp, the chain must be updated when performing integer
type promotion of an operand to an integer-to-floating-point conversion.
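A rough sketch of the pattern, assuming a DAGTypeLegalizer promotion hook for the strict node (the hook name and details here are illustrative, not the literal diff):
```
SDValue DAGTypeLegalizer::PromoteIntOp_STRICT_SINT_TO_FP(SDNode *N) {
  // Rebuild the strict node with the sign-extended, promoted integer
  // operand; operand 0 is the incoming chain.
  SDValue Res =
      DAG.getNode(N->getOpcode(), SDLoc(N), N->getVTList(), N->getOperand(0),
                  SExtPromotedInteger(N->getOperand(1)));
  // Strict FP nodes also produce a chain result; switch anything that
  // used the old chain over to the new one, or it is left dangling.
  ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
  return Res;
}
```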
Reviewers: craig.topper, john.brawn
Reviewed By: craig.topper
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74597
Follow-up for D74006.
When the integrated assembler is used, we use SHF_LINK_ORDER. The
linked-to symbol is part of ELFSectionKey, thus we can omit the unique
ID.
https://bugs.llvm.org/show_bug.cgi?id=44775
GNU as implements the same rule (binutils >= 2.35): https://sourceware.org/ml/binutils/2020-02/msg00028.html
It allows us to simplify
```
.section .foo,"o",foo,unique,0
.section .foo,"o",bar,unique,1 # different section
```
to
```
.section .foo,"o",foo
.section .foo,"o",bar # different section
```
We consider the two `.foo` different even if the linked-to symbols foo and bar
are defined in the same section. This is a deliberate choice so that we don't
need to know the section where foo and bar are defined beforehand.
Differential Revision: https://reviews.llvm.org/D74006
While updateImpl for function returns only looked at call sites, the
manifest method also looked at return values. If we don't do the latter
during updateImpl as well, we might create new abstract attributes
during manifest, which is a problem when it comes to liveness
information.
This caused an error when passes iterated over cached assumptions in
the tracker and assumed them to be `null` or an instruction. So far I
have failed to create a test case.
CallPreservedMask is used to describe the register liveness after a
function call. A function call in an interrupt handler should use the
same CallPreservedMask as in normal functions, so that only callee-saved
registers can live through the call.
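A hedged sketch of the idea; getCallPreservedMask() is the real TargetRegisterInfo hook, but the target class and the CSR_* mask name below are placeholders:
```
const uint32_t *
MyTargetRegisterInfo::getCallPreservedMask(const MachineFunction &MF,
                                           CallingConv::ID CC) const {
  // A call made *from* an interrupt handler still clobbers the usual
  // caller-saved registers, so return the normal-call mask rather than
  // a mask derived from the interrupt calling convention.
  return CSR_NormalCall_RegMask;
}
```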
In addition to memory behavior attributes (readonly/writeonly) we now
derive memory location attributes (argmemonly/inaccessiblememonly/...).
The former are part of AAMemoryBehavior and the latter part of
AAMemoryLocation. While they are similar in nature, it got messy when
they were put in a single AA. Location attributes for arguments and
floating values will follow later.
Note that both memory attribute kinds can derive readnone: if there are
no accesses, AAMemoryBehavior will derive readnone; if there are
accesses, but only to stack (=local) locations, AAMemoryLocation will
derive readnone.
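For illustration, a function of the second kind; this example is not from the tests, it just shows what AAMemoryLocation can reason about:
```
// Every access below touches only an alloca (stack/"local" memory), so
// AAMemoryLocation can conclude the function neither reads nor writes
// caller-visible memory, i.e. it is readnone.
int only_local_memory(int X) {
  int Buf[4] = {0, 0, 0, 0};
  Buf[X & 3] = X; // write to local stack memory only
  return Buf[0];  // read from local stack memory only
}
```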
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D73426
Summary:
Addresses PR44728, but with no tests because I have not yet attempted
to verify the correctness of the debug info.
Reviewers: sbc100, aardappel
Differential Revision: https://reviews.llvm.org/D74656
Due to the genericValueTraversal we might visit values for which we did
not create an AAValueConstantRange object, e.g., because they are behind
a PHI, select, or call with a `returned` argument. As a consequence, we
need to validate the types before we query AAValueConstantRange for
operands.
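A rough sketch of such a guard (not the literal change; `QueryingValue` is an illustrative name):
```
// Values handed to us by genericValueTraversal (e.g. from behind a PHI
// or select) may not have the type we are tracking, so check before
// asking for a range.
if (V.getType() != QueryingValue.getType())
  return false; // be conservative for mismatched types
auto &RangeAA =
    A.getAAFor<AAValueConstantRange>(*this, IRPosition::value(V));
```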
This patch implements an almost complete handling of OpenMP
contexts/traits such that we can reuse most of the logic in Flang
through the OMPContext.{h,cpp} in llvm/Frontend/OpenMP.
All but the construct SIMD specifiers, e.g., inbranch, and the device
ISA selector are defined in `llvm/lib/Frontend/OpenMP/OMPKinds.def`. From
these definitions we generate the enum classes `TraitSet`,
`TraitSelector`, and `TraitProperty` as well as conversion and helper
functions in `llvm/lib/Frontend/OpenMP/OMPContext.{h,cpp}`.
The above enum classes are used in the parser, sema, and the AST
attribute. The latter is not a collection of multiple primitive variant
arguments that contain encodings via numbers and strings but instead a
tree that mirrors the `match` clause (see `struct OpenMPTraitInfo`).
The changes to the parser make it more forgiving of incorrect syntax
and also produce more specialized diagnostics. The tests
are updated and the core issues are detected as before. Here and
elsewhere this patch tries to be generic, thus we do not distinguish
what selector set, selector, or property is parsed except if they do
behave exceptionally, as for example `user={condition(EXPR)}` does.
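For reference, the kind of `match` clause these traits model; this is a generic OpenMP 5.0 example, not one taken from the tests:
```
int base_gpu(void);

// TraitSet `device` with TraitSelector `kind` and TraitProperty `gpu`,
// plus the exceptional `user={condition(...)}` selector mentioned above.
#pragma omp declare variant(base_gpu) \
    match(device = {kind(gpu)}, user = {condition(1)})
int base(void);
```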
The sema logic changed in two ways: First, the OMPDeclareVariantAttr
representation changed, as mentioned above, and the sema was adjusted to
work with the new `OpenMPTraitInfo`. Second, the matching and scoring
logic moved into `OMPContext.{h,cpp}`. It is implemented on a flat
representation of the `match` clause that is not tied to clang.
`OpenMPTraitInfo` provides a method to generate this flat structure (see
`struct VariantMatchInfo`) by computing integer score values and boolean
user conditions from the `clang::Expr` we keep for them.
The OpenMP context is now an explicit object (see `struct OMPContext`).
This is in anticipation of construct traits that need to be tracked. The
OpenMP context, as well as the `VariantMatchInfo`, is basically made up
of a set of active (respectively, required) traits, e.g., 'host', and an
ordered container of constructs which allows duplication. Matching and
scoring is kept as generic as possible to allow easy extension in the
future.
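A hedged sketch of that flow; the types and functions live in llvm/Frontend/OpenMP and clang, but `TI`, `ASTCtx`, and `DeviceTriple` are illustrative names:
```
using namespace llvm::omp;

VariantMatchInfo VMI;
TI.getAsVariantMatchInfo(ASTCtx, VMI); // flatten the OpenMPTraitInfo tree

// The context holds the active traits of the current compilation.
OMPContext Ctx(/*IsDeviceCompilation=*/false, DeviceTriple);
if (isVariantApplicableInContext(VMI, Ctx)) {
  // ... this variant matches; pick the one with the best score ...
}
```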
---
Test changes:
The messages checked in `OpenMP/declare_variant_messages.{c,cpp}` have
been auto-generated to match the new warnings and notes of the parser.
The "subset" checks were reversed causing the wrong version to be
picked. The tests have been adjusted to correct this.
We do not print scores if the user did not provide one.
We print spaces to make lists in the `match` clause more legible.
Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim
Subscribers: merge_guards_bot, rampitec, mgorny, hiraditya, aheejin, fedor.sergeev, simoncook, bollu, guansong, dexonsmith, jfb, s.egerton, llvm-commits, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71830
In https://reviews.llvm.org/rG8b737688c21a9755cae14cb9343930e0882164ab I
switched the condition gating the creation of the descriptor symbol from
checking the MCAsmInfo if we need to support descriptors, to if the OS
was AIX. Technically the two should be interchangeable: if we are
targeting AIX then we need to emit XCOFF object files, and the MCAsmInfo
must return true for needing function descriptors.
This doesn't account for lit tests with RUN steps that only set the
arch, e.g. test/CodeGen/XCore/section-name.ll, which, when run natively
on AIX, ends up with the target xcore-ibm-aix, for which
needFunctionDescriptors is false.
This patch reverts to using the MCAsmInfo and adds an assert that the
target OS must be AIX since that is the only target using the descriptor
hook.
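A minimal sketch of the reverted condition plus the new assert (the surrounding emission code is elided and the exact wording is illustrative):
```
if (MAI->needsFunctionDescriptors()) {
  // Only AIX uses the function descriptor hook today; make that
  // assumption explicit instead of keying on the OS directly.
  assert(TM.getTargetTriple().isOSAIX() &&
         "function descriptors are only expected when targeting AIX");
  // ... emit the function descriptor symbol ...
}
```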
Differential Revision: https://reviews.llvm.org/D74622
The division expansions in AMDGPUCodeGenPrepare can't be relied on for
correctness, since they punt to later optimization and possibly
legalization in some cases. We still need a way to be able to write
tests for the legalizer versions of the expansion. This is mostly for
GlobalISel, since the optimizations it is expecting aren't
implemented.
The interaction with the flag to expand 64-bit division in the IR is
pretty confusing, but these flags have different purposes.
I didn't realize we were already expanding 24/32-bit division here. Use
the available IntegerDivision utilities instead. These use loops, so
they produce significantly smaller code than the inline DAG expansion.
This now requires width reduction of 64-bit divisions before
introducing the expanded loops.
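A hedged sketch of using the utility; expandDivision() is the real helper from llvm/Transforms/Utils/IntegerDivision.h, while the surrounding checks are illustrative:
```
#include "llvm/Transforms/Utils/IntegerDivision.h"

if (auto *Div = dyn_cast<BinaryOperator>(&I))
  if ((Div->getOpcode() == Instruction::UDiv ||
       Div->getOpcode() == Instruction::SDiv) &&
      Div->getType()->isIntegerTy(32))
    // Replaces the udiv/sdiv in place with a shift/subtract loop in IR.
    llvm::expandDivision(Div);
```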
This helps work around missing legalization in GlobalISel for division,
one of the only remaining core instructions that didn't work at all.
I think this is plausibly a better implementation than exists in the
DAG, although turning it on by default misses out on the constant
value optimizations and also needs benchmarking.
Summary:
Potential fix for: https://bugs.llvm.org/show_bug.cgi?id=44889 and https://bugs.llvm.org/show_bug.cgi?id=44408
In the legacy pass manager, loop rotate need not compute MemorySSA when
it is not in the same loop pass manager as other loop passes.
There isn't currently a way to differentiate between the two cases, so
this patch limits the usage in LoopRotate to only updating MemorySSA
when the analysis is already available.
The side-effect of this is that it will split the Loop pipeline.
This issue does not apply to the new pass manager, where we have a flag specifying if all loop passes in that loop pass manager preserve MemorySSA.
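A rough sketch of the legacy-PM pattern described above (pass boilerplate omitted):
```
MemorySSAUpdater *MSSAU = nullptr;
Optional<MemorySSAUpdater> MSSAUpdater;
// Only update MemorySSA if a neighboring loop pass already computed it;
// requiring it unconditionally would split the loop pipeline anyway.
if (auto *WrapperPass = getAnalysisIfAvailable<MemorySSAWrapperPass>()) {
  MSSAUpdater.emplace(&WrapperPass->getMSSA());
  MSSAU = MSSAUpdater.getPointer();
}
```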
Reviewers: dmgreen, fedor.sergeev, nikic
Subscribers: Prazek, hiraditya, george.burgess.iv, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74574
This is more or less directly ported from the AMDGPU custom lowering
for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES
instead of creating shift/trunc to extract the two halves, and zexting
an inverted compare instead of select_cc).
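Illustrative fragments of the two fixups; `B` is a MachineIRBuilder and the registers/types are placeholders:
```
// Split the packed 32-bit value with an unmerge instead of shift+trunc.
auto Unmerge = B.buildUnmerge(S16, PackedS32);
Register Lo = Unmerge.getReg(0);
Register Hi = Unmerge.getReg(1);

// Zero-extend an inverted compare rather than materializing a select.
auto Cmp = B.buildICmp(CmpInst::ICMP_NE, S1, LHS, RHS);
auto Bit = B.buildZExt(S32, Cmp);
```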
This also does not include the fast math expansion in the DAG, which
converts to f32 and then to f16. I think that belongs in a pre-legalize
combine instead.
Like the COPY instructions discussed in D70616, we don't check the
constraints when combining G_UNMERGE_VALUES. Use the same logic as in
D70616 to check whether registers can be replaced directly, or whether a
COPY instruction needs to be built.
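A hedged sketch of that check; `canReplaceRegWith()` stands in for whatever constraint test the combiner actually uses:
```
if (canReplaceRegWith(MRI, DstReg, SrcReg))
  MRI.replaceRegWith(DstReg, SrcReg); // same class/bank constraints hold
else
  Builder.buildCopy(DstReg, SrcReg);  // otherwise keep an explicit COPY
```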
https://reviews.llvm.org/D70564
The assembler now permits pairs like 'v0:1', which are encoded
differently from the odd-first pairs like 'v1:0'.
The compiler will require more work to leverage these new register
pairs.
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.
== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, a hyper-thread if SMT is active; a core otherwise. There's a limit of 32 processors on older 32-bit versions of Windows, which was later raised to 64 processors with 64-bit versions of Windows. This limit comes from the affinity mask, whose width historically matches the pointer size (sizeof(void*)). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.
This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* is used. Once that API returns, the original intention is lost; only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std::thread::hardware_concurrency() -- which can only return processors from the current "processor group".
== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment when the std::threads are created (ThreadPool in the case of ThinLTO).
When using hardware_concurrency(), setting ThreadCount to 0 now means using all possible hardware CPU (SMT) threads. Providing a ThreadCount above the maximum number of threads has no effect; the maximum is used instead.
heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.
When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
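A short usage sketch of the new API as described above (the pool names are illustrative):
```
#include "llvm/Support/ThreadPool.h"
#include "llvm/Support/Threading.h"
using namespace llvm;

// One thread per physical core, e.g. for ThinLTO backends.
ThreadPool HeavyPool(heavyweight_hardware_concurrency());

// All hardware (SMT) threads, spanning Windows processor groups.
ThreadPool FullPool(hardware_concurrency());

// An explicit request; values above the hardware maximum are clamped.
ThreadPool CappedPool(hardware_concurrency(8));
```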
Differential Revision: https://reviews.llvm.org/D71775
Summary:
Reenables importing of constants by default, which was disabled in
D73724 due to excessive thin link times. These inefficiencies were
fixed in D73851.
I re-measured thin link times for a number of binaries that had compile
time explosions with importing of constants previously and confirmed
they no longer have any notable increases with it enabled.
Reviewers: wmi, evgeny777
Subscribers: hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74512
This patch adds generation of the SIMD bitwise insert BIT/BIF instructions.
In the absence of GCC-like functionality for optimal constraint
satisfaction during register allocation, the bitwise insert and select
patterns are matched to a pseudo bitwise select instruction, BSP, whose
def is not tied. It is expanded after register allocation to
BSL/BIT/BIF, with a tied def, depending on the operands' registers.
This allows us to get rid of redundant moves.
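A very rough sketch of such a post-RA expansion; the opcode names are real AArch64 instructions, but `rewriteTo()` and the exact operand-to-opcode mapping are illustrative:
```
bool expandBSP(MachineInstr &MI) {
  Register Dst = MI.getOperand(0).getReg();
  // Pick the real instruction whose tied def matches whichever source
  // register the allocator assigned to the destination, so no extra
  // register move is needed.
  if (Dst == MI.getOperand(1).getReg())
    return rewriteTo(MI, AArch64::BSLv16i8);
  if (Dst == MI.getOperand(2).getReg())
    return rewriteTo(MI, AArch64::BITv16i8);
  return rewriteTo(MI, AArch64::BIFv16i8);
}
```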
Reviewers: t.p.northover, samparker, dmgreen
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D74147
Without PSHUFB we are better off using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering, but if we can widen to v8i16 or more then the existing shuffles are still the better option.
REAPPLIED: Original commit rG11c16e71598d was reverted at rGde1d90299b16 as it wasn't accounting for later lowering. This version emits ROTLI or the OR(VSHLI/VSRLI) directly to avoid the issue.
Prior to this patch, if a DW_LNE_set_address opcode was parsed with an
address size (i.e. with a length after the opcode) of anything other
than 1, 2, 4, or 8, an llvm_unreachable would be hit, as the data
extractor does not support other values. This patch introduces a new
error check that verifies the address size is one of the supported
sizes, in common with other places within the DWARF parsing.
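A hedged sketch of the new check; the exact error text and plumbing differ in the real patch:
```
// The opcode length includes the sub-opcode byte, so the address itself
// is Len - 1 bytes. The data extractor only supports 1/2/4/8-byte reads.
uint64_t AddrSize = Len - 1;
if (AddrSize != 1 && AddrSize != 2 && AddrSize != 4 && AddrSize != 8)
  return createStringError(errc::invalid_argument,
                           "address size %" PRIu64
                           " of DW_LNE_set_address opcode is unsupported",
                           AddrSize);
```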
This patch also fixes the calculation of a generated line table's size
in the unit tests. One of the tests in this patch highlighted a bug introduced
in 1271cde474, when non-byte operands were used as arguments for
extended or standard opcodes.
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D73962