llvm-project

Commit Graph

Author	SHA1	Message	Date
alex-t	1a33294652	[AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC Normally, given that the DA results are kept consistent over the selection DAG, uniform comparisons get selected to S_CMP_* but divergent to V_CMP_*. Sometimes, for the sake of efficiency, SSA subgraphs may be converted to VALU to avoid repeatedly copying data back and forth. Hence we have to be able to sustain the correctness passing the i1 from VALU to SALU context and vice versa. VALU operations only process the active lanes of the VGPR and ignore inactive ones. Active lanes correspond to 1 bit in the EXEC mask register. SALU represents i1 as just one bit but VALU as 64bits: 0/1 and 0/(0xffffffffffffffff & EXEC) respectively. SALU uses one-bit conditional flag SCC but VALU - VCC that is a pair of 32-bit SGPRs To expose SCC to the VALU context we need to convert the one-bit boolean value to the appropriate 64bit. To return back to the SALU context we need to do the opposite. To correctly convert 64bit VALU boolean to either 0 or 1 we need to filter out the bits corresponding to the inactive lanes. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D109900	2021-09-21 21:19:31 +03:00
Owen Anderson	b5fbbdd202	Teach InstCombine to eliminate malloc-realloc-free triplets. Reviewed By: majnemer Differential Revision: https://reviews.llvm.org/D109988	2021-09-21 18:07:49 +00:00
Brendon Cahoon	cbdf624bb8	[AMDGPU] Correctly merge alias.scope and noalias metadata for memops When adding alias.scope and noalias metadata to a memcpy function, the alias.scope and noalias metadata from the operands are merged. The rule for merging alias.scope is to take the intersection of the domains and the union of the scopes within those domains. The rule for merging noalias is to take the intersection. The bug is that AMDGPULowerModuleLDS was using concatenation for both alias.scope and noalias. For example, suppose f1 and f2 are added to the LDS structure and there is a memcpy(f2, f1, sizeof(f1)). Then concatenation creates noalias metadata for the memcpy that includes both {f1, f2}. That means that the memcpy is assumed not to alias a prior load of f2, which enables the optimizer to remove a load of f2 that occurs after the memcpy. The function MDNode::getMostGenericAliasScope defines the semantics for alias.scope. There is a function, combineMetadata in Local.cpp, that uses intersect for noalias. Differential Revision: https://reviews.llvm.org/D110049	2021-09-21 13:02:01 -05:00
Craig Topper	7c975665b4	[RISCV] Make some arrays of constants 'static const'. NFC This helps the compiler generate better code.	2021-09-21 10:52:47 -07:00
Danila Malyutin	78b51c7a2c	[LSR] Make sure that Factor fits into Base type Fixes pr42770 Differential Revision: https://reviews.llvm.org/D108772	2021-09-21 20:50:50 +03:00
Giorgis Georgakoudis	1d66649adf	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert, jhuber6 Differential Revision: https://reviews.llvm.org/D102107	2021-09-21 10:50:04 -07:00
Amy Kwan	2af57b6099	[PowerPC] Add prefix load pattern for fpext to v2f64 This patch adds a prefixed load pattern involving v2f32 fpext v2f64, where we are dealing with a value with an offset that fits into a 34-bit signed immediate. A reduced test case is also added to the patch; the pattern is exercised by the big-endian CHECKs of the newly added test. Differential Revision: https://reviews.llvm.org/D109887	2021-09-21 12:45:24 -05:00
Ayal Zaks	ab6a69dfea	[LV] Fix crash for reverse interleaved loads with gap under fold-tail. This patch fixes the crash found by PR51614: whenever doing tail folding, interleave groups must be considered under mask. Another fix D108900 follows for targets that support masked loads and stores: when deciding to vectorize with masked interleave groups, check if the access is reverse - which is currently not supported; rather than (only) asserting when computing cost and generating code. Differential Revision: https://reviews.llvm.org/D108891	2021-09-21 20:13:32 +03:00
Craig Topper	aeb63d464f	[RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for and/or/xor. This requires a minor change to CodeGenPrepare to ensure that shouldSinkOperands will be called for And. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D110106	2021-09-21 10:07:29 -07:00
Nico Weber	908c115442	[lldb/win] Default to native PDB reader when LLVM_ENABLE_DIA_SDK=NO Trying to use the DIA SDK reader only to fail with "DIA SDK wasn't enabled" isn't very useful. The native PDB reader is missing some stuff, but it's still better than nothing. Reduces number of lldb-check-shell test failures with LLVM_ENABLE_DIA_SDK=NO from 27 to 15. Differential Revision: https://reviews.llvm.org/D110172	2021-09-21 13:02:52 -04:00
LLVM GN Syncbot	a3bb4f1455	[gn build] Port `a04a6ce772`	2021-09-21 16:34:07 +00:00
Mark de Wever	a04a6ce772	[libc++][format] Adds parser std-format-spec. This implements the generic std.format.spec framework for all types. The Unicode support will be added in a separate patch. Implements parts of: - P0645 Text Formatting Completes: - LWG-3242 std::format: missing rules for arg-id in width and precision - P1892 Extended locale-specific presentation specifiers for std::format Reviewed By: #libc, ldionne, vitaut Differential Revision: https://reviews.llvm.org/D103368	2021-09-21 18:29:58 +02:00
cchen	8c68bd480f	[OpenMP][NFC] Add declare variant and metadirective to support page	2021-09-21 11:28:13 -05:00
Aaron Ballman	73a8bcd789	Revert "Diagnose -Wunused-value based on CFG reachability" This reverts commit `63e0d038fc`. It causes test failures: http://lab.llvm.org:8011/#/builders/119/builds/5612 https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8835548361443044001/+/u/clang/test/stdout	2021-09-21 12:25:13 -04:00
Dávid Bolvanský	c0fdfc9af2	[InstCombine] powi(x, y) * powi(x, z) -> powi(x, y + z) We already have pow(x, y) * pow(x, z) -> pow(x, y + z) transformation, but we are missing same transformation for powi (power is integer). Requires reassoc. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D109954	2021-09-21 18:20:46 +02:00
Quinn Pham	5793930950	[PowerPC] Fix signature of lxvp and stxvp builtins This patch changes the signature of the load and store vector pair builtins to match their documentation. The type of the `signed long long` argument is changed to `signed long`. This patch also changes existing testcases to match the signature change. Reviewed By: lei, Conanap Differential Revision: https://reviews.llvm.org/D109996	2021-09-21 11:19:29 -05:00
Kazu Hirata	54229cd9e4	[CodeGen] Remove redundant declaration getFileType (NFC)	2021-09-21 09:12:30 -07:00
Sanjay Patel	08ef71ca92	[InstCombine] move/add tests for trunc-of-lshr; NFC Planning to reframe a proposed transform in terms of demanded bits as suggested in D110170. The new tests end with an 'or'.	2021-09-21 12:11:25 -04:00
Kostya Serebryany	11c533e1ea	[sanitizer coverage] write the pc-table at the process exit The current code writes the pc-table at the process startup, which may happen before the common_flags() are initialized. Move writing to the process end. This is consistent with how we write the counters and avoids the problem with the uninitialized flags. Add prints if verbosity>=1. Reviewed By: kostik Differential Revision: https://reviews.llvm.org/D110119	2021-09-21 09:09:25 -07:00
Florian Hahn	5131037ea9	[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange. isValidAssumeForContext can provide better results with access to the dominator tree in some cases. This patch adjusts computeConstantRange to allow passing through a dominator tree. The use in VectorCombine is updated to pass through the DT to enable additional scalarization. Note that similar APIs like computeKnownBits already accept optional dominator tree arguments. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D110175	2021-09-21 16:54:47 +01:00
Michael Liao	5fb3ae525f	[SelectionDAG] Re-calculate scoped AA metadata when merging stores. Reviewed By: jeroen.dobbelaere Differential Revision: https://reviews.llvm.org/D102821	2021-09-21 11:41:17 -04:00
Aleksandr Bezzubikov	624e4d087e	[GlobalISel] Support ConstantAsMetadata in IRTranslator When using instructions which have a MetadataAsValue argument (e.g. some target-specific intrinsics), MD canonicalization strips internal MDNodes with a single ConstantAsMetadata child. That prevented IRTranslator from properly translating such calls.	2021-09-21 11:24:56 -04:00
Tobias Gysi	8b5236def5	[mlir][linalg] Simplify slice dim computation for fusion on tensors (NFC). Compute the tiled producer slice dimensions directly starting from the consumer not using the producer at all. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D110147	2021-09-21 15:09:46 +00:00
Tobias Gysi	9072f1b5f8	[mlir][linalg] Add isPermutation helper (NFC). Add a helper method to check if an index vector contains a permutation of its indices. Additionally, refactor applyPermutationToVector to take int64_t. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D110135	2021-09-21 15:07:39 +00:00
Dmitry Preobrazhensky	3500e7d2b0	[AMDGPU][MC][GFX7][GFX10] Corrected image_atomic_fcmpswap Differential Revision: https://reviews.llvm.org/D109616	2021-09-21 18:06:02 +03:00
Petar Avramovic	f3366983f0	AMDGPU/GlobalISel: Restore run line erased in D109154 by mistake	2021-09-21 17:03:46 +02:00
Andy Wingo	9ae4275557	[clang][NFC] Fix needless double-parenthesisation Strip a layer of parentheses in TreeTransform::RebuildQualifiedType. Differential Revision: https://reviews.llvm.org/D108359	2021-09-21 17:03:23 +02:00
David Green	a502294b2d	[AArch64] Regenerate test lines in and-mask-removal.ll	2021-09-21 15:37:00 +01:00
Nicolas Vasilache	101d017a64	[mlir][Linalg] Revisit heuristic ordering of tensor.insert_slice in comprehensive bufferize. It was previously assumed that tensor.insert_slice should be bufferized first in a greedy fashion to avoid out-of-place bufferization of the large tensor. This heuristic does not hold upon further inspection. This CL removes the special handling of such ops and adds a test that exhibits better behavior and appears in real use cases. The only test adversely affected is an artificial test which results in a returned memref: this pattern is not allowed by comprehensive bufferization in real scenarios anyway and the offending test is deleted. Differential Revision: https://reviews.llvm.org/D110072	2021-09-21 14:22:45 +00:00
Nicolas Vasilache	0d2c54e851	[mlir][Linalg] Revisit RAW dependence interference in comprehensive bufferize. Previously, comprehensive bufferize would consider all aliasing reads and writes to the result buffer and matching operand. This resulted in spurious dependences being considered and in too many unnecessary copies. Instead, this revision revisits the gathering of read and write alias sets. This results in fewer allocs and copies. An exhaustive test case is added that considers all possible permutations of `matmul(extract_slice(fill), extract_slice(fill), ...)`.	2021-09-21 14:22:22 +00:00
Tobias Gysi	c8eed8f9a7	[mlir][linalg] Assert tile loop nest invariants in fusion. Assert the tile loop nest invariants are satisfied instead of failing silently. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D110137	2021-09-21 14:20:57 +00:00
Chris Bieneman	744ec74b30	[NFC] `goto fail` has failed us in the past... This patch replaces reliance on `goto failure` pattern with `llvm::scope_exit`. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D109865	2021-09-21 09:18:37 -05:00
Ben Shi	b3052013b4	[RISCV] Optimize (add (mul x, c0), c1) Optimize (add (mul x, c0), c1) -> (ADDI (MUL (ADDI, c1/c0), c0), c1%c0), if c1/c0 and c1%c0 are simm12, while c1 is not. Optimize (add (mul x, c0), c1) -> (MUL (ADDI, c1/c0), c0), if c1%c0 is zero, and c1/c0 is simm12 while c1 is not. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D108607	2021-09-21 14:13:14 +00:00
Justas Janickas	32b994bca6	[OpenCL] Defines helper function for OpenCL default address space Helper function `getDefaultOpenCLPointeeAddrSpace()` introduced to `ASTContext` class. It returns default OpenCL address space depending on language version and enabled features. If generic address space is supported, the helper function returns value `LangAS::opencl_generic`. Otherwise, value `LangAS::opencl_private` is returned. Code refactoring changes performed in several suitable places. Differential Revision: https://reviews.llvm.org/D109874	2021-09-21 15:12:08 +01:00
Anna Thomas	69921f6f45	[InstCombine] Improve TryToSinkInstruction with multiple uses This patch allows sinking an instruction which can have multiple uses in a single user. We were previously over-restrictive by looking for exactly one use, rather than one user. Also added an API for retrieving a unique undroppable user. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109700	2021-09-21 10:04:04 -04:00
Saiyedul Islam	ee31ad0ab5	[clang-offload-bundler][docs][NFC] Add archive unbundling documentation Add documentation of unbundling of heterogeneous device archives to create device specific archives, as introduced by D93525. Also, add documentation for supported text file formats. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D110083	2021-09-21 19:24:44 +05:30
OGINO Masanori	17a26f5851	[NFC] Update the list of subprojects in docs. The updated list is based on the output of cmake -G Ninja -S llvm -B build -DLLVM_ENABLE_PROJECTS='foo'. Differential Revision: https://reviews.llvm.org/D110124	2021-09-21 17:27:13 +02:00
Sanjay Patel	af1c5312d7	[InstCombine] add tests for mask-shift with trunc; NFC	2021-09-21 09:41:41 -04:00
Dmitry Preobrazhensky	b8e7f53208	[AMDGPU][MC][GFX10] Enabled dlc for FLAT and GLOBAL atomics Differential Revision: https://reviews.llvm.org/D109614	2021-09-21 16:23:20 +03:00
hyeongyu kim	043733d677	[IR] Add the constructor of ShuffleVector for one-input-vector. One of the two inputs of the Shufflevector is often a placeholder. Previously, there were cases where the placeholder was undef, and there were cases where it was poison. I added these constructors to create a placeholder consistently. Changing to use the newly added constructor will be written in a separate patch. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110146	2021-09-21 22:06:07 +09:00
Nico Weber	e9ea03c62c	[llvm] Pass LLVM_CHECK_ENABLED_PROJECTS through in cross builds	2021-09-21 09:01:37 -04:00
Jonas Paulsson	a48b43f981	[SystemZ] Emit EXRL target instructions before text section is ended. SystemZ adds the EXRL target instructions in the end of each file. This must be done before debug info emission since that may end the text section, and therefore this is now done in emitConstantPools() (instead of in emitEndOfAsmFile). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D109513	2021-09-21 14:32:28 +02:00
Florian Hahn	ea27dd7497	[VectorCombine] Add tests which require DT to use info from assumes.	2021-09-21 13:07:06 +01:00
Nicholas Guy	9e4d72675f	[AArch64] Improve schedule modelling on the Cortex-A55 Enables the FuseAddress feature in the Cortex-A55 scheduling model Differential Revision: https://reviews.llvm.org/D109323	2021-09-21 13:03:34 +01:00
Simon Pilgrim	fc8f1e4419	[InstCombine] foldConstantInsEltIntoShuffle - bail if we fail to find constant element (PR51824) If getAggregateElement() returns null for any element, early out as otherwise we will assert when creating a new constant vector Fixes PR51824 + ; OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057	2021-09-21 13:01:09 +01:00
Simon Pilgrim	20b58855e0	[CodeGen] SelectionDAGBuilder - Use const-ref iterator in for-range loops. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-21 13:01:08 +01:00
Simon Pilgrim	f5d23d36de	RewriteStatepointsForGC - Use const-ref iterator in for-range loops. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-21 13:01:08 +01:00
Simon Pilgrim	0f83456cf5	[CodeGen] SDDbgValue::getSDNodes() - use const-ref to avoid unnecessary copies. NFCI. Reported by MSVC static analyzer.	2021-09-21 13:01:08 +01:00
Dmitry Vyukov	9d7b7350c9	tsan: simplify thread context setting Currently we set thr->tctx after OnStarted callback taking thread registry mutex again and searching for the context. But OnStarted already runs under the thread registry mutex and has access to the context, so set it in the OnStarted. This makes code simpler and faster. Depends on D110132. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110133	2021-09-21 13:26:55 +02:00
Dmitry Vyukov	908256b0ea	tsan: rearrange thread state callbacks (NFC) Thread state functions are split into 2 parts: tsan entry function (e.g. ThreadStart) and thread registry state change callback (e.g. OnStart). Currently these pairs of functions are located far from each other and in reverse order. This makes it hard to read and follow the logic. Reorder the code so that OnFoo directly follows ThreadFoo. No other code changes. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D110132	2021-09-21 13:26:36 +02:00
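
The rewrite described in commit b3052013b4 ([RISCV] Optimize (add (mul x, c0), c1)) rests on the identity x*c0 + c1 = (x + c1/c0)*c0 + c1%c0. The Python sketch below only checks that arithmetic and the simm12 conditions quoted in the message. The helper names (`is_simm12`, `split_add_mul`, `original`, `rewritten`) are hypothetical, and Python's floor-based `divmod` stands in for whatever division the actual patch uses. It is not the LLVM implementation.

```python
def is_simm12(v: int) -> bool:
    """True if v fits a 12-bit signed immediate (the ADDI range)."""
    return -2048 <= v <= 2047


def split_add_mul(c0: int, c1: int):
    """Return (q, r) with c1 == q*c0 + r when the rewrite looks worthwhile.

    Mirrors the conditions in the commit message: c1 itself is not simm12,
    while both c1/c0 and c1%c0 are. Returns None otherwise.
    """
    if c0 == 0 or is_simm12(c1):
        return None  # nothing to gain: c1 already fits an ADDI, or no division
    q, r = divmod(c1, c0)  # assumption: any consistent div/mod pair works
    if not (is_simm12(q) and is_simm12(r)):
        return None
    return q, r


def original(x: int, c0: int, c1: int) -> int:
    # (add (mul x, c0), c1)
    return x * c0 + c1


def rewritten(x: int, c0: int, q: int, r: int) -> int:
    # (ADDI (MUL (ADDI x, q), c0), r); when r == 0 the outer ADDI is dropped
    return (x + q) * c0 + r
```

For example, with c0 = 5 and c1 = 10001 the split is q = 2000 and r = 1, so `rewritten(x, 5, 2000, 1)` equals `original(x, 5, 10001)` for every integer x. The identity also holds modulo 2^64, so wrap-around in fixed-width registers does not break it.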

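The boolean conversion described in commit 1a33294652 ([AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC) can be illustrated with plain integer masks. This is a minimal sketch assuming a 64-lane wave. The function names `scc_to_vcc` and `vcc_to_scc` are hypothetical and model only the arithmetic stated in the message (0/1 on the SALU side, 0/(0xffffffffffffffff & EXEC) on the VALU side), not the instructions the backend actually emits.

```python
MASK64 = (1 << 64) - 1  # all 64 lane bits set


def scc_to_vcc(scc: int, exec_mask: int) -> int:
    """Widen a one-bit SALU boolean to a 64-bit VALU lane mask.

    True becomes all-ones restricted to the active lanes, false becomes 0.
    """
    return (MASK64 if scc else 0) & exec_mask


def vcc_to_scc(vcc: int, exec_mask: int) -> int:
    """Narrow a 64-bit VALU lane mask to a one-bit SALU boolean.

    Bits belonging to inactive lanes are filtered out first, so stale
    values left there cannot turn a false result into a true one.
    """
    return 1 if (vcc & exec_mask) != 0 else 0
```

With `exec_mask = 0b0011`, a mask of `0b1100` has bits set only in inactive lanes, so `vcc_to_scc` returns 0. Testing the unfiltered mask against zero would wrongly report 1, which is the case the commit message says must be avoided.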
399500 Commits