llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	c986d476cd	AMDGPU: Update reqd-work-group-size optimization for umin intrinsic This code was pattern matching the ID computation expression as it appears in the library. This was a compare and select, but now that umin is canonical, we were no longer matching. Update to match the intrinsic instead.	2022-04-12 20:03:02 -04:00
Muhammad Omair Javaid	42ebfa8269	Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth" This reverts commit `64b6192e81`. This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage: https://lab.llvm.org/buildbot/#/builders/176/builds/1515 llvm-tblgen crashes after applying this patch.	2022-04-13 04:53:07 +05:00
Arthur Eubanks	32f3633171	[test][DSE] Precommit test	2022-04-12 16:21:04 -07:00
Matt Arsenault	d4b1be20f6	RegAllocGreedy: Fix illegal eviction assert for urgent evictions The condition in canEvictInterferenceBasedOnCost is slightly different from the assertion in evictInteference. canEvictInterferenceBasedOnCost uses a <= check for the cascade number for legality, but the assert was checking for <. For equal cascade numbers for an urgent eviction, canEvictInterferenceBasedOnCost could return success. The actual eviction would then hit this assert. Avoid ever returning true for equivalent cascade numbers. The resulting failed allocation seems a bit off to me. e.g. in illegal-eviction-assert.mir, I wuold assume %0 gets allocated starting at $vgpr0. That was its initial allocation choice, but was later evicted. In this example no evictions can help improve anything.	2022-04-12 19:16:56 -04:00
Stanislav Mekhanoshin	f6462a26f0	[AMDGPU] Split unaligned 4 DWORD DS operations Similarly to 3 DWORD operations it is better for performance to split unlaligned operations as long a these are at least DWORD alignmened. Performance data: ``` Using platform: AMD Accelerated Parallel Processing Using device: gfx900:xnack- ds_write_b128 aligned by 16: 4.9 sec ds_write2_b64 aligned by 16: 5.1 sec ds_write2_b32 * 2 aligned by 16: 5.5 sec ds_write_b128 aligned by 1: 8.1 sec ds_write2_b64 aligned by 1: 8.7 sec ds_write2_b32 * 2 aligned by 1: 14.0 sec ds_write_b128 aligned by 2: 8.1 sec ds_write2_b64 aligned by 2: 8.7 sec ds_write2_b32 * 2 aligned by 2: 14.0 sec ds_write_b128 aligned by 4: 5.6 sec ds_write2_b64 aligned by 4: 8.7 sec ds_write2_b32 * 2 aligned by 4: 5.6 sec ds_write_b128 aligned by 8: 5.6 sec ds_write2_b64 aligned by 8: 5.1 sec ds_write2_b32 * 2 aligned by 8: 5.6 sec ds_read_b128 aligned by 16: 3.8 sec ds_read2_b64 aligned by 16: 3.8 sec ds_read2_b32 * 2 aligned by 16: 4.0 sec ds_read_b128 aligned by 1: 4.6 sec ds_read2_b64 aligned by 1: 8.1 sec ds_read2_b32 * 2 aligned by 1: 14.0 sec ds_read_b128 aligned by 2: 4.6 sec ds_read2_b64 aligned by 2: 8.1 sec ds_read2_b32 * 2 aligned by 2: 14.0 sec ds_read_b128 aligned by 4: 4.6 sec ds_read2_b64 aligned by 4: 8.1 sec ds_read2_b32 * 2 aligned by 4: 4.0 sec ds_read_b128 aligned by 8: 4.6 sec ds_read2_b64 aligned by 8: 3.8 sec ds_read2_b32 * 2 aligned by 8: 4.0 sec Using platform: AMD Accelerated Parallel Processing Using device: gfx1030 ds_write_b128 aligned by 16: 6.2 sec ds_write2_b64 aligned by 16: 7.1 sec ds_write2_b32 * 2 aligned by 16: 7.6 sec ds_write_b128 aligned by 1: 24.1 sec ds_write2_b64 aligned by 1: 25.2 sec ds_write2_b32 * 2 aligned by 1: 43.7 sec ds_write_b128 aligned by 2: 24.1 sec ds_write2_b64 aligned by 2: 25.1 sec ds_write2_b32 * 2 aligned by 2: 43.7 sec ds_write_b128 aligned by 4: 14.4 sec ds_write2_b64 aligned by 4: 25.1 sec ds_write2_b32 * 2 aligned by 4: 7.6 sec ds_write_b128 aligned by 8: 14.4 sec ds_write2_b64 aligned by 8: 7.1 sec ds_write2_b32 * 2 aligned by 8: 7.6 sec ds_read_b128 aligned by 16: 6.2 sec ds_read2_b64 aligned by 16: 6.3 sec ds_read2_b32 * 2 aligned by 16: 7.5 sec ds_read_b128 aligned by 1: 12.5 sec ds_read2_b64 aligned by 1: 24.0 sec ds_read2_b32 * 2 aligned by 1: 43.6 sec ds_read_b128 aligned by 2: 12.5 sec ds_read2_b64 aligned by 2: 24.0 sec ds_read2_b32 * 2 aligned by 2: 43.6 sec ds_read_b128 aligned by 4: 12.5 sec ds_read2_b64 aligned by 4: 24.0 sec ds_read2_b32 * 2 aligned by 4: 7.5 sec ds_read_b128 aligned by 8: 12.5 sec ds_read2_b64 aligned by 8: 6.3 sec ds_read2_b32 * 2 aligned by 8: 7.5 sec ``` Differential Revision: https://reviews.llvm.org/D123634	2022-04-12 16:07:13 -07:00
Lang Hames	db8469c4d7	[docs][ORC] Fix RST error in `dfffb7df24`.	2022-04-12 16:06:11 -07:00
owenca	0cde8bdb0b	Revert "[clang-format] Allow empty .clang-format file" This reverts commit `4e814a6f2d`.	2022-04-12 16:05:39 -07:00
Matt Arsenault	eefed1dbf0	RegAllocGreedy: Roll back successful recolorings on failure This is a replacement for the original fix attempted in `c46aab01c0`. This fixes "overlapping insert" assertion failures when trying to unwind an unsuccessful recoloring attempt. The problem would occur when there are multiple recoloring candidates which recursively required recoloring. If one recoloring candidate was successfully recolored at one level, and the next recoloring candidate was unsuccessful, we would not roll back the first candidates successful recoloring. The forgotten successful recoloring may have been assigned to something that conflicts with a register that needs to be restored in a parent recoloring attempt. See the testcase added in issue48473 for a more concrete example with explanation.	2022-04-12 19:02:48 -04:00
Lang Hames	dfffb7df24	[docs] Update OrcV2 doc to include some notes on code removal.	2022-04-12 15:58:25 -07:00
owenca	4e814a6f2d	[clang-format] Allow empty .clang-format file Differential Revision: https://reviews.llvm.org/D123535	2022-04-12 15:45:22 -07:00
Yuanfang Chen	81b51b61f8	Fix libcxx build after `cd0a5889d7`	2022-04-12 15:42:56 -07:00
Arthur Eubanks	51561b5e80	[ArgPromo][OpaquePointer] Don't promote mismatched function types Mismatched call/callee function types is considered an indirect call. Fixes crash in https://reviews.llvm.org/D123300#3446023.	2022-04-12 15:17:45 -07:00
Lang Hames	ebdc60a232	[examples][ORC] Add a new example showing the ORCv2 removable code APIs.	2022-04-12 15:05:07 -07:00
Nikita Popov	0adadfa68f	[MSan] Ensure argument shadow initialized on memcpy We need to explicitly query the shadow here, because it is lazily initialized for byval arguments. Without opaque pointers this used to mostly work out, because there would be a bitcast to `i8*` present, and that would query, and copy in case of byval, the argument shadow. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D123602	2022-04-12 14:53:02 -07:00
Vitaly Buka	efdc90baaa	Revert "[MSan] Ensure argument shadow initialized on memcpy" Invalid author. This reverts commit `163a9f4552`.	2022-04-12 14:53:02 -07:00
Yuanfang Chen	cd0a5889d7	[Reland][lit] Use sharding for GoogleTest format This helps lit unit test performance by a lot, especially on windows. The performance gain comes from launching one gtest executable for many subtests instead of one (this is the current situation). The shards are executed by the test runner and the results are stored in the json format supported by the GoogleTest. Later in the test reporting stage, all test results in the json file are retrieved to continue the test results summary etc. On my Win10 desktop, before this patch: `check-clang-unit`: 177s, `check-llvm-unit`: 38s; after this patch: `check-clang-unit`: 37s, `check-llvm-unit`: 11s. On my Linux machine, before this patch: `check-clang-unit`: 46s, `check-llvm-unit`: 8s; after this patch: `check-clang-unit`: 7s, `check-llvm-unit`: 4s. Reviewed By: yln, rnk, abrachet Differential Revision: https://reviews.llvm.org/D122251	2022-04-12 14:51:12 -07:00
Vitaly Buka	163a9f4552	[MSan] Ensure argument shadow initialized on memcpy We need to explicitly query the shadow here, because it is lazily initialized for byval arguments. Without opaque pointers this used to mostly work out, because there would be a bitcast to `i8*` present, and that would query, and copy in case of byval, the argument shadow. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D123602	2022-04-12 14:49:52 -07:00
Johannes Doerfert	9dc7da3f9c	[GlobalsModRef][FIX] Ensure we honor synchronizing effects of intrinsics This is a long standing problem that resurfaces once in a while [0]. There might actually be two problems because I'm not 100% sure if the issue underlying https://reviews.llvm.org/D115302 would be solved by this or not. Anyway. In 2008 we thought intrinsics do not read/write globals passed to them: `d4133ac315` This is not correct given that intrinsics can synchronize threads and cause effects to effectively become visible. NOTE: I did not yet modify any tests but only tried out the reproducer of https://github.com/llvm/llvm-project/issues/54851. Fixes: https://github.com/llvm/llvm-project/issues/54851 [0] https://discourse.llvm.org/t/bug-gvn-memdep-bug-in-the-presence-of-intrinsics/59402 Differential Revision: https://reviews.llvm.org/D123531	2022-04-12 16:42:50 -05:00
Johannes Doerfert	0f070bee82	[NVPTX][FIX] Allow __nvvm_reflect in the presence of opaque pointers Differential Revision: https://reviews.llvm.org/D123522	2022-04-12 16:42:50 -05:00
Johannes Doerfert	a3a42c3ca2	[OpenMP][FIX] Ensure to set the context for wait events if necessary Differential Revision: https://reviews.llvm.org/D123445	2022-04-12 16:42:50 -05:00
Matt Arsenault	788f94f731	AMDGPU: Don't use unreachable on stores to unhandled address space For stores to constant address space, this will now consistently hit a selection error instead of hitting unreachable in an asserts build. I'm not sure what we should really do here. We could either just codegen as if it were global, delete the instruction, or declare the IR invalid (we really should have a target IR verifier to enforce it).	2022-04-12 17:31:50 -04:00
owenca	c80eaa919f	Revert "[clang-format] Allow empty .clang-format file" This reverts commit `6eafda0ef0`.	2022-04-12 14:28:02 -07:00
owenca	6eafda0ef0	[clang-format] Allow empty .clang-format file Differential Revision: https://reviews.llvm.org/D123535	2022-04-12 13:54:12 -07:00
Matt Arsenault	3754f60112	GlobalISel: Implement MoreElements for select of vector conditions	2022-04-12 16:54:04 -04:00
Matt Arsenault	6009122250	AArch64/GlobalISel: Remove pointless s1 legalize rules These have no net effect on the legalize rules.	2022-04-12 16:54:04 -04:00
Matt Arsenault	3f2cc7cc2b	GlobalISel: Fix lowerSelect handling of boolean high bits This was making several invalid assumptions about the incoming select. First, it was assuming the incoming condition was either s1 or already sign extended, not accounting for different boolean high bits behavior between scalar and vector conditions. We only had a vector boolean due to the intermediate step vector select, which is now avoided. Second, it was assuming it can use the result vector type as a boolean mask. These types don't have anything to do with other, and only makes sense in the context of the expansion to bit operations. Since these logically are part of the same lowering, do the complete expansion in a single step. The added select_v4s1_s1 test does fail to legalize, since it seems AArch64's vector legalization support is pretty incomplete.	2022-04-12 16:54:03 -04:00
Matt Arsenault	0e489926be	GlobalISel: Handle widening addo/subo booleans This will be tested in a future patch	2022-04-12 16:54:03 -04:00
Matt Arsenault	95c2bcbf8b	GlobalISel: Handle widening umulo/smulo condition outputs	2022-04-12 16:54:03 -04:00
Matt Arsenault	abe171df06	GlobalISel: Update mutationIsSane assert for scalable vectors	2022-04-12 16:54:03 -04:00
Matt Arsenault	120c5115b8	Mips/GlobalISel: Add test for atomic load	2022-04-12 16:54:03 -04:00
Craig Topper	057c063c9b	[RISCV] Add a encodeLMUL function to RISCVVType. NFC This moves the encoding handling out of the assembly parser. Reviewed By: khchen, frasercrmck Differential Revision: https://reviews.llvm.org/D123553	2022-04-12 13:39:47 -07:00
Quinn Pham	7d7022fb0c	[PowerPC] Fix EmitPPCBuiltinExpr to emit arguments once This patch changes `EmitPPCBuiltinExpr` in `CGBuiltin.cpp` to remove the loop at the beginning of the function that emits the arguments and to delay emitting the arguments until inside the switch statement. These changes will put `EmitPPCBuiltinExpr` in line with the strategy of the target independent function `EmitBuiltinExpr`. Also, this patch ensures that arguments are only emitted once. Tests that included builtins affected by these changes have been modified to match expected behaviour. Reviewed By: #powerpc, nemanjai, amyk Differential Revision: https://reviews.llvm.org/D121637	2022-04-12 15:33:20 -05:00
Fangrui Song	d10c091683	lit.cfg.py: remove obsoleted feature clang-driver	2022-04-12 13:31:07 -07:00
Fangrui Song	63fbc77121	[Driver][test] Remove unused/obsoleted REQUIRES: clang-driver It (introduced by `556d713c70`) appears to be related to the removed dragonegg project. In addition, the feature was a bit misnamed and may lur users to unnecessarily use it.	2022-04-12 13:29:46 -07:00
Walter Erquinigo	44103c96fa	[trace][intelpt] Remove code smell when printing the raw trace size Something ugly I did was to report the trace buffer size to the DecodedThread, which is later used as part of the `dump info` command. Instead of doing that, we can just directly ask the trace for the raw buffer and print its size. I thought about not asking for the entire trace but instead just for its size, but in this case, as our traces as not extremely big, I prefer to ask for the entire trace, ensuring it could be fetched, and then print its size. Differential Revision: https://reviews.llvm.org/D123358	2022-04-12 13:08:03 -07:00
Walter Erquinigo	bdf3e7e5b8	[trace][intelpt] Add task timer classes I'm adding two new classes that can be used to measure the duration of long tasks as process and thread level, e.g. decoding, fetching data from lldb-server, etc. In this first patch, I'm using it to measure the time it takes to decode each thread, which is printed out with the `dump info` command. In a later patch I'll start adding process-level tasks and I might move these classes to the upper Trace level, instead of having them in the intel-pt plugin. I might need to do that anyway in the future when we have to measure HTR. For now, I want to keep the impact of this change minimal. With it, I was able to generate the following info of a very big trace: ``` (lldb) thread trace dump info Trace technology: intel-pt thread #1: tid = 616081 Total number of instructions: 9729366 Memory usage: Raw trace size: 1024 KiB Total approximate memory usage (excluding raw trace): 123517.34 KiB Average memory usage per instruction (excluding raw trace): 13.00 bytes Timing: Decoding instructions: 1.62s Errors: Number of TSC decoding errors: 0 ``` As seen above, it took 1.62 seconds to decode 9.7M instructions. This is great news, as we don't need to do any optimization work in this area. Differential Revision: https://reviews.llvm.org/D123357	2022-04-12 13:08:03 -07:00
Fangrui Song	9f526057d6	[ubsan][test] Unsupport Android for new test diag-stacktrace.cpp https://reviews.llvm.org/D123562#3446485 reported that the test failed on arm-linux-android.	2022-04-12 12:55:44 -07:00
Daniel Grumberg	7443a504bf	[clang][extract-api] Add support for true anonymous enums Anonymous enums without a typedef should have a "(anonymous)" identifier. Differential Revision: https://reviews.llvm.org/D123533	2022-04-12 20:42:17 +01:00
Changpeng Fang	8edaf25986	AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally Summary: Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default. We use implicitarg_ptr + offset to check whether the multigrid synchronization pointer is used. If yes, we remove this attribute and also remove amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function. Reviewers: arsenm, sameerds, b-sumner and foad Differential Revision: https://reviews.llvm.org/D123548	2022-04-12 12:36:30 -07:00
Stanislav Mekhanoshin	65b8a43243	[AMDGPU] Update ds-alignment.ll test checks. NFC.	2022-04-12 12:06:02 -07:00
Aart Bik	28063a281b	[mlir][sparse] refactored python setup of sparse compiler Reviewed By: bixia Differential Revision: https://reviews.llvm.org/D123419	2022-04-12 11:58:41 -07:00
Mahesh Ravishankar	b40e901333	[mlir][Linalg] Allow collapsing subset of the reassociations when fusing by collapsing. This change generalizes the fusion of `tensor.expand_shape` -> `linalg.generic` op by collapsing to handle cases where only a subset of the reassociations specified in the `tensor.expand_shape` are valid to be collapsed. The method that does the collapsing is refactored to allow it to be a generic utility when required. Reviewed By: gysit Differential Revision: https://reviews.llvm.org/D123153	2022-04-12 18:56:32 +00:00
Simon Pilgrim	f061c1050b	[SLP][X86] Add ray_sphere intersection methods from c-ray benchmark We're failing to vectorize several comparison reduction patterns. Issue #43090 was based off this, but while that simplified test case is now folding, the original still fails due to poor cost model values for vXi1 extractions	2022-04-12 19:51:27 +01:00
Jonas Devlieghere	a66ff2316e	[lldb] Re-enable fixed on-device tests These tests were fixed by `833882b327`.	2022-04-12 11:39:25 -07:00
Nick Desaulniers	23ec5782c3	[Bitcode] materialize Functions early when BlockAddress taken IRLinker builds a work list of functions to materialize, then moves them from a source module to a destination module one at a time. This is a problem for blockaddress Constants, since they need not refer to the function they are used in; IPSCCP is quite good at sinking these constants deep into other functions when passed as arguments. This would lead to curious errors during LTO: ld.lld: error: Never resolved function from blockaddress ... based on the ordering of function definitions in IR. The problem was that IRLinker would basically do: for function f in worklist: materialize f splice f from source module to destination module in one pass, with Functions being lazily added to the running worklist. This confuses BitcodeReader, which cannot disambiguate whether a blockaddress is referring to a function which has not yet been parsed ("materialized") or is simply empty because its body was spliced out. This causes BitcodeReader to insert Functions into its BasicBlockFwdRefs list incorrectly, as it will never re-materialize an already materialized (but spliced out) function. Because of the possibility that blockaddress Constants may appear in Functions other than the ones they reference, this patch adds a new bitcode function code FUNC_CODE_BLOCKADDR_USERS that is a simple list of Functions that contain BlockAddress Constants that refer back to this Function, rather then the Function they are scoped in. We then materialize those functions when materializing `f` from the example loop above. This might over-materialize Functions should the user of BitcodeReader ultimately decide not to link those Functions, but we can at least now we can avoid this ordering related issue with blockaddresses. Fixes: https://github.com/llvm/llvm-project/issues/52787 Fixes: https://github.com/ClangBuiltLinux/linux/issues/1215 Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D120781	2022-04-12 11:38:35 -07:00
Shraiysh Vaishay	b18e82186f	[mlir][OpenMP] Added omp.task This patch adds tasking construct according to Section 2.10.1 of OpenMP 5.0 Reviewed By: peixin, kiranchandramohan, abidmalikwaterloo Differential Revision: https://reviews.llvm.org/D123575	2022-04-12 23:55:47 +05:30
Fangrui Song	fdd424e37a	[ubsan] Fix print_stacktrace=1:fast_unwind_on_fatal=0 to correctly fallback to fast unwinder ubsan_GetStackTrace (from `52b751088b`) called by ~ScopeReport leaves top/bottom zeroes in the `!WillUseFastUnwind(request_fast_unwind)` code path. When BufferedStackTrace::Unwind falls back to UnwindFast, `if (stack_top < 4096) return;` will return early, leaving just one frame in the stack trace. Fix this by always initializing top/bottom like `261d6e05d5`. Reviewed By: eugenis, yln Differential Revision: https://reviews.llvm.org/D123562	2022-04-12 11:24:19 -07:00
Martin Sebor	deadda749a	[InstCombine] Add more memrchr tests (NFC).	2022-04-12 11:55:33 -06:00
Jonathan Peyton	d49ce7c356	[OpenMP][libomp] Replace global variable references with local object Remove references to global __kmp_topology within a kmp_topology_t object method. There should just be implicit references to the private object.	2022-04-12 12:50:41 -05:00
Arthur Eubanks	9faab435a3	[docs] Mention that we are in the process of removing the legacy PM for the optimization pipeline And remove references to flags to turn it off. Reviewed By: nikic, MaskRay Differential Revision: https://reviews.llvm.org/D123547	2022-04-12 10:47:58 -07:00

1 2 3 4 5 ...

420879 Commits All Branches Search

420879 Commits

All Branches