llvm-project

Commit Graph

Author	SHA1	Message	Date
Richard Smith	849c60541b	PR47792: Include the type of a pointer or reference non-type template parameter in its notion of template argument identity. We already did this for all the other kinds of non-type template argument. We're still missing the type from the mangling, so we continue to be able to see collisions at link time; that's an open ABI issue.	2020-10-11 15:59:49 -07:00
Craig Topper	9e72d3eaf3	[ValueTracking] Use KnownBits::countMaxLeadingZeros/countMaxTrailingZeros to make code more readable. NFC	2020-10-11 14:26:18 -07:00
Richard Smith	c25da4b04a	Fix arc lint's clang-format rule: only format the file we were asked to format. This avoids diffs being applied in the work tree to files that are supposed to be excluded (clang tests), allows arc to properly provide interactive feedback for the formatting fixes, and reduces the number of files that we format, in a change affecting N files, from N^2 to N.	2020-10-11 14:24:23 -07:00
Christian Iversen	a9cefc3dee	[ELF] Fix broken bitstream linking with lld when e_machine > 255 In ELF/InputFiles.cpp, getBitcodeMachineKind() is limited to uint8_t return type. This works as long as EM_xxx is < 256, which is true for common architectures, but not for some newly assigned or unofficial EM_* values. The corresponding ELF field (e_machine) can hold uint16_t. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D89185	2020-10-11 14:19:25 -07:00
Tres Popp	8178e41dc1	[mlir] Type erase inputs to select statements in shape.broadcast lowering. This is required or broadcasting with operands of different ranks will lead to failures as the select op requires both possible outputs and its output type to be the same. Differential Revision: https://reviews.llvm.org/D89134	2020-10-11 21:58:06 +02:00
Nathan Ridge	f82346fd73	[clangd] Avoid relations being overwritten in a header shard Fixes https://github.com/clangd/clangd/issues/510 Differential Revision: https://reviews.llvm.org/D87256	2020-10-11 15:32:54 -04:00
Roman Lebedev	544a6aa267	[InstCombine] combineLoadToOperationType(): don't fold int<->ptr cast into load And another step towards transforms not introducing inttoptr and/or ptrtoint casts that weren't there already. As we've been establishing (see D88788/D88789), if there is an int<->ptr cast, it basically must stay as-is, we can't do much with it. I've looked, and the main source of new such casts being introduced, as far as I can tell, is this transform, which, ironically, tries to reduce the count of casts. On vanilla llvm test-suite + RawSpeed, @ `-O3`, this results in -33.58% less `IntToPtr`s (19014 -> 12629) and +76.20% more `PtrToInt`s (18589 -> 32753), which is an increase of +20.69% in total. However just on RawSpeed, where I know there are basically no `IntToPtr` in the original source code, this results in -99.27% less `IntToPtr`s (2724 -> 20) and +82.92% more `PtrToInt`s (4513 -> 8255), which is again an increase of 14.34% in total. To me this does seem like a step in the right direction, we end up with strictly less `IntToPtr`, but strictly more `PtrToInt`, which seems like a reasonable trade-off. See https://reviews.llvm.org/D88860 / https://reviews.llvm.org/D88995 for some more discussion on the subject. (Eventually, `CastInst::isNoopCast()`/`CastInst::isEliminableCastPair` should be taught about this, yes) Reviewed By: nlopes, nikic Differential Revision: https://reviews.llvm.org/D88979	2020-10-11 20:24:28 +03:00
Fangrui Song	cbe4d973ed	[X86] Define __LAHF_SAHF__ if feature 'sahf' is set or 32-bit mode GCC 11 will define this macro. In LLVM, the feature flag only applies to 64-bit mode and we always define the macro in 32-bit mode. This is different from GCC -m32 in which -mno-sahf can suppress the macro. The discrepancy is unlikely to cause trouble. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D89198	2020-10-11 09:46:00 -07:00
David Green	be6e8e50f4	[LV] Tail folded inloop reductions. This expands upon the inloop reductions added in e9761688e41cb9e976, allowing them to be inserted into tail folded loops. Reductions are generated with the form: x = select(mask, vecop, zero); v = vecreduce.add(x); c = add chain, v. Here zero is chosen as the identity value for add reductions. The backend is then expected to fold the select and the vecreduce into a single predicated instruction. Most of the code is fairly straightforward, except for the creation of blockmasks, which need to be created in dominance order. The order they are added is altered to be after any phis, keeping the requirements for the underlying IR. Differential Revision: https://reviews.llvm.org/D84451	2020-10-11 16:58:34 +01:00
Zinovy Nis	32d565b461	[clang-tidy] Fix crash in readability-function-cognitive-complexity on weak refs Fix for https://bugs.llvm.org/show_bug.cgi?id=47779 Differential Revision: https://reviews.llvm.org/D89194	2020-10-11 18:52:38 +03:00
Nikita Popov	d7186fe371	[MemCpyOpt] Add lifetime may alias test (NFC) Test the case where a lifetime intrinsic may alias the memcpy source. Other cases test must or no alias.	2020-10-11 17:08:28 +02:00
David Green	8f2cacae67	[LV] Extra predicated inloop reduction tests. NFC	2020-10-11 15:06:21 +01:00
Nikita Popov	bdb193a6ed	[MemCpyOpt] Add additional byval tests (NFC) Test read/write clobbers and the non-local case.	2020-10-11 15:22:31 +02:00
Sanjay Patel	3f3356bdd9	[InstCombine] allow vector splats for add+xor --> shifts	2020-10-11 09:04:24 -04:00
Sanjay Patel	f81200ae99	[InstCombine] add one-use check to add+xor transform As shown in the affected test, we could increase instruction count without this limitation. There's another test with extra use that shows we still convert directly to a real "sext" if possible.	2020-10-11 09:04:24 -04:00
Sanjay Patel	85c7653d92	[InstCombine] add tests with extra uses for add+xor transform; NFC	2020-10-11 09:04:24 -04:00
Sanjay Patel	c5138e61e1	[InstCombine] add/adjust tests for add+xor -> shifts; NFC	2020-10-11 09:04:24 -04:00
Kazushi (Jam) Marukawa	86f69689f9	[VE][NFC] Clean VEISelLowering.cpp Clean the order of setOperationActions and others. Differential Revision: https://reviews.llvm.org/D89203	2020-10-11 21:47:50 +09:00
Simon Pilgrim	c7f3bc87d3	Fix Wdocumentation warning. NFCI. Add a space after /param names before any commas otherwise the doxygen parsers get confused.	2020-10-11 11:25:22 +01:00
Simon Pilgrim	913d7a110e	[X86][SSE2] Use smarter instruction patterns for lowering UMIN/UMAX with v8i16. This is my first LLVM patch, so please tell me if there are any process issues. The main observation for this patch is that we can lower UMIN/UMAX with v8i16 by using unsigned saturated subtractions in a clever way. Previously this operation was lowered by turning the signbit of both inputs and the output which turns the unsigned minimum/maximum into a signed one. We could use this trick in reverse for lowering SMIN/SMAX with v16i8 instead. In terms of latency/throughput this is the needs one large move instruction. It's just that the sign bit turning has an increased chance of being optimized further. This is particularly apparent in the "reduce" test cases. However due to the slight regression in the single use case, this patch no longer proposes this. Unfortunately this argument also applies in reverse to the new lowering of UMIN/UMAX with v8i16 which regresses the "horizontal-reduce-umax", "horizontal-reduce-umin", "vector-reduce-umin" and "vector-reduce-umax" test cases a bit with this patch. Maybe some extra casework would be possible to avoid this. However independent of that I believe that the benefits in the common case of just 1 to 3 chained min/max instructions outweighs the downsides in that specific case. Patch By: @TomHender (Tom Hender) ActuallyaDeviloper Differential Revision: https://reviews.llvm.org/D87236	2020-10-11 11:21:23 +01:00
Simon Pilgrim	7c71b44980	[InstCombine] Remove accidental unnecessary ConstantExpr qualification added in rGb752daa26b64155 MSVC didn't complain but everything else did....	2020-10-11 10:39:51 +01:00
Simon Pilgrim	b97093e520	[InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139). Differential Revision: https://reviews.llvm.org/D88783	2020-10-11 10:37:20 +01:00
Simon Pilgrim	b752daa26b	[InstCombine] Replace getLogBase2 internal helper with ConstantExpr::getExactLogBase2. NFCI. This exposes the helper for other power-of-2 instcombine folds that I'm intending to add vector support to. The helper only operated on power-of-2 constants so getExactLogBase2 is a more accurate name.	2020-10-11 10:31:17 +01:00
Tobias Gysi	93377888ae	[mlir] add scf.if op canonicalization pattern that removes unused results The patch adds a canonicalization pattern that removes the unused results of scf.if operation. As a result, cse may remove unused computations in the then and else regions of the scf.if operation. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D89029	2020-10-11 10:40:28 +02:00
Xun Li	667dfe39ca	[Coroutines] Refactor/Rewrite Spill and Alloca processing This patch is a refactoring of how we process spills and allocas during CoroSplit. In the previous implementation, everything that needs to go to the heap is put into Spills, including all the values defined by allocas. And the way to identify a Spill is to check whether there exists a use-def relationship that crosses suspension points. This approach is fundamentally confusing, and unfortunately, incorrect. First of all, allocas are always processed differently than spills, hence it's quite confusing to put them together. It's much cleaner to separate them and process them separately. Doing so simplifies lots of code and makes the logic clearer and easier to reason about. Secondly, use-def relationship is insufficient to decide whether a value defined by AllocaInst needs to go to the heap. There are many cases where a value defined by AllocaInst can implicitly be used across suspension points without a direct use-def relationship. For example, you can store the address of an alloca into the heap, and load that address after suspension. Or you can escape the address into an object through a function call. Or you can have a PHINode that takes two allocas, and this PHINode is used across suspension point (when this happens, the existing implementation will spill the PHINode, a.k.a. a stack address, to the heap!). All these issues suggest that we need to separate spill and alloca in order to properly implement this. This patch does not yet fix these bugs, however it sets up the code in a better shape so that we can start fixing them in the next patch. The core idea of this patch is to add a new struct called FrameDataInfo, which contains all Spills, all Allocas, and a map from each definition to its layout index in the frame (FieldIndexMap). Spills and Allocas are identified, stored and processed independently. When they are initially added to the frame, we record their field index through FieldIndexMap. When the frame layout is finalized, we update each index into their final layout index. In doing so, I also cleaned up a few things and also discovered a few other bugs. Cleanups: 1. Found out that PromiseFieldId is not used, delete it. 2. Previously, SpillInfo is a vector, which is strange because every def can have multiple users. This patch cleans it up by turning it into a map from def to users. 3. Previously, a frame Field struct contains a list of Spills that field corresponds to. This isn't necessary since we only need the layout index for each given definition. This patch removes that list. Instead, we connect each field and definition using the FieldIndexMap. 4. All the loops that process Spills are simplified now because we use a map instead of a vector. Bugs: It seems that we are only keeping llvm.dbg.declare intrinsics in the .resume part of the function. The ramp function will no longer have it. This means we are dropping some debug information in the ramp function. The next step is to start fixing the bugs where the implementation fails to identify some allocas that should live on the frame. Differential Revision: https://reviews.llvm.org/D88872	2020-10-10 22:21:34 -07:00
Craig Topper	9895327914	[X86] Redefine X86ISD::PEXTRB/W and X86ISD::PINSRB/PINSRW to use a i8 TargetConstant for the immediate instead of a ptr constant. This is more consistent with other target specific ISD opcodes that require immediates.	2020-10-10 21:50:58 -07:00
Craig Topper	7f1b2a6125	[X86] AMX intrinsics should have ImmArg for the register numbers and use timm in isel patterns.	2020-10-10 20:12:28 -07:00
Craig Topper	375849518d	[X86] Add a X86ISD::BEXTRI to distinguish the case where the control must be a constant. The bextri intrinsic has a ImmArg attribute which will be converted in SelectionDAG using TargetConstant. We previously converted this to a plain Constant to allow X86ISD::BEXTR to call SimplifyDemandedBits on it. But while trying to decide if D89178 was safe, I realized that this conversion of TargetConstant to Constant would be one case where that would break. So this patch adds a new opcode specifically for the immediate case. And then teaches computeKnownBits and SimplifyDemandedBits to also handle it, but not try to SimplifyDemandedBits on it. To make up for that, I immediately masked the constant to 16 bits when converting from the intrinsic node to the X86ISD node.	2020-10-10 19:18:06 -07:00
Krzysztof Parzyszek	9237e73ae8	[Hexagon] Replace HexagonISD::VSPLAT with ISD::SPLAT_VECTOR This removes VSPLAT and VZERO. VZERO is now SPLAT_VECTOR of (i32 0). Included is also a testcase for the previous (target-independent) commit.	2020-10-10 19:49:47 -05:00
Krzysztof Parzyszek	61eaa2e14a	[SDAG] Remember to set UndefElts in isSplatValue for SPLAT_VECTOR	2020-10-10 19:42:24 -05:00
Fangrui Song	a8682554c6	[X86] Delete redundant 'static' from namespace scope 'static constexpr'. NFC This decreases 7 lines as the result of packing more bits on one line.	2020-10-10 14:05:49 -07:00
Simon Pilgrim	702ccb40e2	[InstCombine] getLogBase2(undef) -> 0. Move the undef element handling into the getLogBase2 helper instead of pre-empting with replaceUndefsWith.	2020-10-10 20:29:03 +01:00
Alex Denisov	d0c8d58527	Fix CMake configuration error when run with -Werror/-Wall The following code doesn't compile uint64_t i = x.load(std::memory_order_relaxed); return 0; when CMAKE_C_FLAGS set to -Werror -Wall, thus incorrectly breaking the CMake configuration step: -- Looking for __atomic_load_8 in atomic -- Looking for __atomic_load_8 in atomic - not found CMake Error at cmake/modules/CheckAtomic.cmake:79 (message): Host compiler appears to require libatomic for 64-bit operations, but cannot find it. Call Stack (most recent call first): cmake/config-ix.cmake:360 (include) CMakeLists.txt:671 (include)	2020-10-10 21:22:40 +02:00
Simon Pilgrim	3aab3cbd4a	[InstCombine] getLogBase2 - no need to specify Type. NFCI. In all the getLogBase2 uses, the specified Type is always the same as the constant being folded.	2020-10-10 20:09:55 +01:00
Simon Pilgrim	f68d174c16	Remove %tmp variables from test cases to appease update_test_checks.py	2020-10-10 19:13:16 +01:00
Simon Pilgrim	2c3e4a21f9	[PowerPC] ReplaceNodeResults - bail on funnel shifts and let generic legalizers deal with it Fixes regression raised on D88834 for 32-bit triple + 64-bit cpu cases (which apparently is a thing).	2020-10-10 19:13:16 +01:00
Krzysztof Parzyszek	4af6c6bf3c	Define splat_vector for ISD::SPLAT_VECTOR in TargetSelectionDAG.td	2020-10-10 13:12:20 -05:00
Martin Storsjö	5d330f435e	[lldb] [Windows] Remove unused functions. NFC. These became unused in `51117e3c51`.	2020-10-10 20:47:40 +03:00
Martin Storsjö	abaca237c5	[lldb] [Windows] Add missing 'override', silencing warnings. NFC. Also remove superfluous 'virtual' in overridden methods.	2020-10-10 20:47:40 +03:00
Simon Pilgrim	f2e08c688e	[PowerPC] Add ppc32 funnel shift test coverage	2020-10-10 18:19:42 +01:00
Simon Pilgrim	803b712330	[InstCombine] Add test case showing rotate intrinsic being split by SimplifyDemandedBits Noticed while triaging regression report on D88834	2020-10-10 18:19:42 +01:00
Michał Górny	8dc2faf642	[lldb] [Process/FreeBSDRemote] Fix double semicolon	2020-10-10 18:54:52 +02:00
Michał Górny	9a37587ee3	[lldb] [Process/FreeBSDRemote] Kill process via PT_KILL Use PT_KILL to kill the stopped process. This ensures that the process termination is reported properly and fixes delay/error on killing it. Differential Revision: https://reviews.llvm.org/D89182	2020-10-10 18:54:05 +02:00
Michał Górny	d83cd73e9d	[lldb] [Process/FreeBSD] Mark methods override in RegisterContext* Differential Revision: https://reviews.llvm.org/D89181	2020-10-10 18:52:23 +02:00
Philip Reames	d89de5a14e	Step down from security group Resigning from security group as Azul representative as I have left Azul. Previously communicated via email with security group. Differential Revision: https://reviews.llvm.org/D88933	2020-10-10 09:48:02 -07:00
Tim Renouf	666ef0db20	[AMDGPU] Add gfx602, gfx705, gfx805 targets At AMD, in an internal audit of our code, we found some corner cases where we were not quite differentiating targets enough for some old hardware. This commit is part of fixing that by adding three new targets: * The "Oland" and "Hainan" variants of gfx601 are now split out into gfx602. LLPC (in the GPUOpen driver) and other front-ends could use that to avoid using the shaderZExport workaround on gfx602. * One variant of gfx703 is now split out into gfx705. LLPC and other front-ends could use that to avoid using the shaderSpiCsRegAllocFragmentation workaround on gfx705. * The "TongaPro" variant of gfx802 is now split out into gfx805. TongaPro has a faster 64-bit shift than its former friends in gfx802, and a subtarget feature could be set up for that to take advantage of it. This commit does not make that change; it just adds the target. V2: Add clang changes. Put TargetParser list in order. V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order, so fix the GPUKind order. Differential Revision: https://reviews.llvm.org/D88916 Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d	2020-10-10 17:22:22 +01:00
Florian Hahn	d48b249b71	[SCEV] Add test cases where the max BTC is imprecise, due to step != 1. Add a test case where we fail to compute a tight max backedge taken count, due to the step being != 1. This is part of the issue with PR40961.	2020-10-10 16:39:48 +01:00
Florian Hahn	2e9fd754b4	[SCEV] Handle ULE in applyLoopGuards. Handle ULE predicate in similar fashion to ULT predicate in applyLoopGuards.	2020-10-10 16:26:28 +01:00
Florian Hahn	2c6fc28aba	[SCEV] Add a test case with ULE loop guard.	2020-10-10 15:58:26 +01:00
Nikita Popov	329dbdaaaf	[MemCpyOpt] Add test for incorrect memset DSE (NFC) We can't shorten the memset if there's a throwing call in between and the destination is non-local.	2020-10-10 16:11:14 +02:00
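
Note on be6e8e50f4 ([LV] Tail folded inloop reductions): the form given in that commit message (a select against the mask, a vector add reduction, then an add onto the running chain) can be emulated lane by lane in plain C++. The following is only a rough sketch of the idea, not the vectorizer's code: the names `VF`, `sumScalar` and `sumTailFolded` and the fixed width of 4 lanes are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vectorization factor (number of lanes per vector iteration).
constexpr std::size_t VF = 4;
static_assert(VF > 0, "need at least one lane");

// Scalar reference: plain sum over all elements.
std::int64_t sumScalar(const std::vector<std::int32_t> &data) {
  std::int64_t acc = 0;
  for (std::int32_t v : data)
    acc += v;
  return acc;
}

// Emulates a tail-folded in-loop add reduction: there is no scalar
// remainder loop; lanes past the end of the data are masked off instead.
std::int64_t sumTailFolded(const std::vector<std::int32_t> &data) {
  const std::size_t n = data.size();
  std::int64_t chain = 0;
  for (std::size_t base = 0; base < n; base += VF) {
    std::array<std::int64_t, VF> x{};
    for (std::size_t lane = 0; lane < VF; ++lane) {
      const bool mask = base + lane < n; // lane is active only inside the data
      // x = select(mask, vecop, zero); zero is the identity for add.
      x[lane] = mask ? data[base + lane] : 0;
    }
    std::int64_t v = 0;
    for (std::int64_t e : x) // v = vecreduce.add(x)
      v += e;
    chain += v;              // c = add chain, v
  }
  return chain;
}
```

Because masked-off lanes contribute the identity value, `sumTailFolded` should agree with `sumScalar` for any input length, including lengths that are not a multiple of `VF`. A reduction with a different operation would need that operation's own identity in place of zero.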

368628 Commits All Branches Search
