llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	3c6fe0fc77	[BasicAA] Fix stale comment (NFC) DataLayout is always around...	2020-10-17 23:58:58 +02:00
Dávid Bolvanský	2a75e956e5	Revert "[InferAttrs] Add argmemonly attribute to string libcalls" This reverts commit `b77dd32a6f`. Sanitizer tests are broken.	2020-10-17 23:29:02 +02:00
Dávid Bolvanský	b77dd32a6f	[InferAttrs] Add argmemonly attribute to string libcalls Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89602	2020-10-17 22:42:36 +02:00
Roman Lebedev	ec54867df5	[SCEV] Model `ashr exact x, C` as `(abs(x) EXACT/u (1<<C)) * signum(x)` It's not pretty, but probably better than modelling it as an opaque SCEVUnknown, i guess. It is relevant e.g. for the loop that was brought up in https://bugs.llvm.org/show_bug.cgi?id=46786#c26 as an example of what we'd be able to better analyze once SCEV handles `ptrtoint` (D89456). But as it is evident, even if we deal with `ptrtoint` there, we also fail to model such an `ashr`. Also, modeling of mul-of-exact-shr/div could use improvement. As per alive2: https://alive2.llvm.org/ce/z/tnfZKd ``` define i8 @src(i8 %0) { %2 = ashr exact i8 %0, 4 ret i8 %2 } declare i8 @llvm.abs(i8, i1) declare i8 @llvm.smin(i8, i8) declare i8 @llvm.smax(i8, i8) define i8 @tgt(i8 %x) { %abs_x = call i8 @llvm.abs(i8 %x, i1 false) %div = udiv exact i8 %abs_x, 16 %t0 = call i8 @llvm.smax(i8 %x, i8 -1) %t1 = call i8 @llvm.smin(i8 %t0, i8 1) %r = mul nsw i8 %div, %t1 ret i8 %r } ``` Transformation seems to be correct!	2020-10-17 21:22:24 +03:00
Roman Lebedev	130cc662b5	[NFC][SCEV] Refactor getAbsExpr() out of createSCEV()	2020-10-17 21:21:02 +03:00
Roman Lebedev	be1678bdb9	[NFC][SCEV] Add 'getMinusOne()' method	2020-10-17 21:20:58 +03:00
Sanjay Patel	53e92b4c0e	[InstCombine] (~A & B) ^ A -> A \| B Differential Revision: https://reviews.llvm.org/D86395	2020-10-17 12:20:18 -04:00
Nikita Popov	50cc9a0e61	[MemCpyOpt] Extract common function for unwinding check These two cases should be using the same logic. Not NFC, as this resolves the TODO regarding use of the underlying object.	2020-10-17 15:30:39 +02:00
Pedro Tammela	60b19424bb	[NFC] fix some typos in LoopUnrollPass This patch fixes a couple of typos in the LoopUnrollPass.cpp comments Differential Revision: https://reviews.llvm.org/D89603	2020-10-17 14:20:55 +01:00
David Green	b93d74ac9c	[ARM] Basic getArithmeticReductionCost reduction costs This adds some basic costs for MVE reductions - currently just costing the simple legal add vectors as a single MVE instruction. More complex costing can be added in the future when the framework more readily allows it. Differential Revision: https://reviews.llvm.org/D88980	2020-10-17 10:29:00 +01:00
David Green	d79ee3a807	[ARM] Add a very basic active_lane_mask cost This adds a very basic cost for active_lane_mask under MVE - making the assumption that they will be free and then apologizing for that in a comment. In reality they may either be free (by being nicely folded into a tail predicated loop), cost the same as a VCTP or be expanded into vdup's, adds and cmp's. It is difficult to detect the difference from a single getIntrinsicInstrCost call, so makes the assumption that the vectorizer is adding them, and only added them where it makes sense. We may need to change this in the future to better model predicate costs in the vectorizer, especially at -Os or non-tail predicated loops. The vectorizer currently does not query the cost of these instructions but that will change in the future and a zero cost there probably makes the most sense at the moment. Differential Revision: https://reviews.llvm.org/D88989	2020-10-17 10:09:42 +01:00
Juneyoung Lee	62a0ec1612	Add support for !noundef metatdata on loads This patch adds metadata !noundef and makes load instructions can optionally have it. A load with !noundef always return a well-defined value (has no undef bit or isn't poison). If the loaded value isn't well defined, the behavior is undefined. This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values. It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has well-defined value, and showing that a load from arbitrary location is well-defined is usually hard otherwise. The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead. The existing codebase already is stripping unknown metadata when doing code motion, so using metadata is UB-safe as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89050	2020-10-17 13:50:10 +09:00
Alok Kumar Sharma	0538353b3b	[DebugInfo] Support for DWARF operator DW_OP_over LLVM rejects DWARF operator DW_OP_over. This DWARF operator is needed for Flang to support assumed rank array. Summary: Currently LLVM rejects DWARF operator DW_OP_over. Below error is produced when llvm finds this operator. [..] invalid expression !DIExpression(151, 20, 16, 48, 30, 35, 80, 34, 6) warning: ignoring invalid debug info in over.ll [..] There were some parts missing in support of this operator, which are now completed. Testing -added a unit testcase -check-debuginfo -check-llvm Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D89208	2020-10-17 08:42:28 +05:30
Craig Topper	278bd06891	[TargetLowering] Extract simplifySetCCs ctpop into a separate function. NFCI As requested in D89346. This allows us to add some early outs. I reordered some checks a little bit to make the more common bail outs happen earlier. Like checking opcode before checking hasOneUse. And I moved the bit width check to make sure it was safe to look through a truncate to the spot where we look through truncates instead of after. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D89494	2020-10-16 19:47:56 -07:00
Alina Sbirlea	dc97138123	[MemorySSA] Verify clobbering within reachable blocks. Resolves PR45976.	2020-10-16 17:46:28 -07:00
Amara Emerson	4ad459997e	[AArch64][GlobalISel] Select csinc if a select has a 1 on RHS. Differential Revision: https://reviews.llvm.org/D89513	2020-10-16 16:49:52 -07:00
Albion Fung	d30155feaa	[PowerPC] Implementation of 128-bit Binary Vector Rotate builtins This patch implements 128-bit Binary Vector Rotate builtins for PowerPC10. Differential Revision: https://reviews.llvm.org/D86819	2020-10-16 18:03:22 -04:00
Jameson Nash	4242df1470	Revert "make the AsmPrinterHandler array public" I messed up one of the tests.	2020-10-16 17:22:07 -04:00
Jameson Nash	ac2def2d8d	make the AsmPrinterHandler array public This lets external consumers customize the output, similar to how AssemblyAnnotationWriter lets the caller define callbacks when printing IR. The array of handlers already existed, this just cleans up the code so that it can be exposed publically. Differential Revision: https://reviews.llvm.org/D74158	2020-10-16 16:27:31 -04:00
Artem Belevich	c36c0fabd1	[VectorCombine] Avoid crossing address space boundaries. We can not bitcast pointers across different address spaces, and VectorCombine should be careful when it attempts to find the original source of the loaded data. Differential Revision: https://reviews.llvm.org/D89577	2020-10-16 13:19:31 -07:00
Stanislav Mekhanoshin	874524ab88	[AMDGPU] Drop array size in AMDGCNGPUs and R600GPUs Differential Revision: https://reviews.llvm.org/D89568	2020-10-16 12:37:22 -07:00
Nikita Popov	74c8c2d903	Revert "Recommit "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs"" This reverts commit `32b72c3165`. While better than before, this change still introduces a large compile-time regression (>3% on mafft): https://llvm-compile-time-tracker.com/compare.php?from=fbd62fe60fb2281ca33da35dc25ca3c87ec0bb51&to=32b72c3165bf65cca2e8e6197b59eb4c4b60392a&stat=instructions Additionally, the logic here doesn't look quite right to me, I will comment in more detail on the differential revision.	2020-10-16 21:36:33 +02:00
Arthur Eubanks	faf5210420	[CGSCC] Add -abort-on-max-devirt-iterations-reached option Aborts if we hit the max devirtualization iteration. Will be useful for testing that changes to devirtualization don't cause devirtualization to repeat passes more times than necessary. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D89519	2020-10-16 12:34:52 -07:00
Austin Kerbow	978fbd8268	[AMDGPU] Run hazard recognizer pass later If instructions were removed in peephole passes after the hazard recognizer was run it is possible that new hazards could be introduced. Fixes: SWDEV-253090 Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D89077	2020-10-16 12:15:51 -07:00
Amara Emerson	39c05a1a71	[AArch64][GlobalISel] Add selection support for v2s32 and v2s64 reductions for FADD/ADD. We'll need legalizer lower() support for the other types to work. Differential Revision: https://reviews.llvm.org/D89159	2020-10-16 11:41:57 -07:00
Benjamin Kramer	b740899c50	[Indvars][NFCI] Simplify assertion. This should be semantically identical. Also avoids unused variable warnings in Release builds.	2020-10-16 19:58:55 +02:00
Amara Emerson	32f77eea2d	[AArch64][GlobalISel] Regbankselect reductions to use FPR bank for scalars. Differential Revision: https://reviews.llvm.org/D89075	2020-10-16 10:42:15 -07:00
Amara Emerson	9190411fcf	[AArch64][GlobalISel] Add basic legalizer rules for supported add/fadd reductions. NEON is pretty limited in it's reduction support. As a first step add some basic rules for the legal types we can select. Differential Revision: https://reviews.llvm.org/D89070	2020-10-16 10:35:46 -07:00
Amara Emerson	6042c25b0a	[GlobalISel] Add translation support for vector reduction intrinsics. In order to prevent the ExpandReductions pass from expanding some intrinsics before they get to codegen, I had to add a -disable-expand-reductions flag for testing purposes. Differential Revision: https://reviews.llvm.org/D89028	2020-10-16 10:17:53 -07:00
Jay Foad	1417abe54c	[AMDGPU] Add new llvm.amdgcn.fma.legacy intrinsic Differential Revision: https://reviews.llvm.org/D89558	2020-10-16 17:10:21 +01:00
Jay Foad	0c1381d795	[llc] Use -filetype=null to disable MIR printing If you use -stop-after or similar options, llc will normally print MIR. This patch checks for -filetype=null as a special case to disable MIR printing. As the comment says, "The Null output is intended for use for performance analysis ...", and I found this useful for timing a subset of the passes that llc runs without the significant overhead of printing MIR just to send it to /dev/null. Differential Revision: https://reviews.llvm.org/D89476	2020-10-16 16:51:56 +01:00
Matt Arsenault	0a7cd99a70	Reapply "OpaquePtr: Add type to sret attribute" This reverts commit `eb9f7c28e5`. Previously this was incorrectly handling linking of the contained type, so this merges the fixes from D88973.	2020-10-16 11:05:02 -04:00
Krzysztof Parzyszek	97533b10b2	[Hexagon] Fix license headers in some .td files, NFC	2020-10-16 10:03:05 -05:00
Simon Pilgrim	83ae625f0c	[InstCombine] visitAnd - pull out repeated I.getType() calls. NFCI.	2020-10-16 15:43:11 +01:00
Simon Pilgrim	253f24cf4c	[InstCombine] Remove custom and(trunc(and(x,c1)),c2) fold This is more correctly handled by canEvaluateTruncated (one use checks etc.) and covers all the tests cases that were added for this fold.	2020-10-16 15:43:10 +01:00
Matt Arsenault	ce16b6835b	AMDGPU: Don't kill super-register with overlapping copy This would end up killing part of the result super-register, resulting in a verifier error on a later use of the overlapping registers. We could add kills of any non-aliasing registers, but we should be moving away from relying on kill flags.	2020-10-16 09:34:35 -04:00
Michael Liao	98f254960f	[globalopt] Teach to look through `addrspacecast`. - so that global variables in numbered address spaces could be properly analyzed. Differential Revision: https://reviews.llvm.org/D89140	2020-10-16 08:43:09 -04:00
Max Kazantsev	0857029011	[Indvars][NFC] Merge two functions together Logic of widenWithVariantUse is split into check and transform part, unlike any other transform in IndVars. We want to pass some extra flags from analysis to transform part and standartize the code at once, so merging them together.	2020-10-16 19:21:57 +07:00
Simon Pilgrim	981fdf01d5	[InstCombine] foldSelectRotate - canonicalize to OR(SHL,LSHR). NFCI. Match the canonicalization code that was added to matchFunnelShift at rG02295e6d1a15	2020-10-16 13:18:53 +01:00
Max Kazantsev	bb39372e5e	[Indvars][NFCI] Remove meaningless restrictive code in IndVars Variable ExtendOperExpr only exists to check whether it is a SCEV ext. We create it as SCEV ext right here, so semantically this check is trivially true. In theory, it may fail if SCEV is smart enough and can simplify the expression. However, no matter whether it is an ext or not, we never use this fact for further reasoning. So this code is currently useless and in theory may become harmful with SCEV's development. We do not expect any behavior changes with removing it. If it caused negative changes, the patch should be reverted.	2020-10-16 18:04:31 +07:00
Sebastian Neubauer	dd3f7a494a	[AMDGPU] Add a message to an assert	2020-10-16 13:03:20 +02:00
Max Kazantsev	0ee0c7dcc3	[Indvars][NFC] Remove duplicating checks Some facts have already been checked in widenWithVariantUse and then checked again in widenWithVariantUseCodegen. The latter is redundant, we can replace it with asserts.	2020-10-16 17:35:14 +07:00
Max Kazantsev	32b72c3165	Recommit "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs" It was reverted because of negative compile time impact. In this version, less powerful proof methods are used (non-recursive reasoning only), and scope limited to constant End values to avoid explision of complex proofs. Differential Revision: https://reviews.llvm.org/D89381	2020-10-16 17:35:13 +07:00
Cullen Rhodes	fbd62fe60f	[ValueTracking] Clarify TypeSize comparisons TypeSize comparisons using overloaded operators should be replaced by the new isKnownXY comparators when the operands can be fixed-length or scalable vectors. In ValueTracking there are several uses of the overloaded operators in `isKnownNonZero` and `ComputeMultiple`. In the former we already bail out on scalable vectors since we currently have no way to represent DemandedElts, and the latter is operating on scalar integers, so we can assume fixed-size in both instances. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D89387	2020-10-16 10:31:12 +00:00
Simon Pilgrim	1cf347e48b	[InstCombine] narrowRotate - minor refactoring for funnel shift support. NFC. Prep work for PR35155 - renamed narrowRotate to narrowFunnelShift, rewrote some comments and adjusted code to collect separate shift values, although we bail if they don't match (still only rotations are only actually folded). I'm trying to match matchFunnelShift as much as possible in case we finally get to merge these one day.	2020-10-16 11:27:28 +01:00
Simon Pilgrim	55991b44b7	[InstCombine] foldAndOrOfICmpsOfAndWithPow2 - add vector support Support vector cases for folding: (iszero(A & K1) \| iszero(A & K2)) -> (A & (K1 \| K2)) != (K1 \| K2) (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 \| K2)) == (K1 \| K2)	2020-10-16 10:41:40 +01:00
Dávid Bolvanský	28691cdd71	[MemLoc] Support memchr/memccpy in MemoryLocation::getForArgument Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D89321	2020-10-16 11:37:29 +02:00
Tony	e2af9bd611	[AMDGPU] Correct comment typo in AMDGPUSubtarget.h	2020-10-16 08:49:02 +00:00
Florian Hahn	51ff04567b	Recommit "[DSE] Switch to MemorySSA-backed DSE by default." After investigation by @asbirlea, the issue that caused the revert appears to be an issue in the original source, rather than a problem with the compiler. This patch enables MemorySSA DSE again. This reverts commit `915310bf14`.	2020-10-16 09:02:53 +01:00
Nikita Popov	7d3b475810	Revert "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs" This reverts commit `905101c360`. This causes a large compile-time regression: https://llvm-compile-time-tracker.com/compare.php?from=cc175c2cc8e638462bab74e0781e06f9b6eb5017&to=905101c36025fe1c8ecdf9a20cd59db036676073&stat=instructions	2020-10-16 09:47:38 +02:00
Georgii Rymar	3cfd9384bf	[lib/ObjectYAML] - Simplify the code that handles Content/Size fields. This is a follow-up for D89039 patch, which adds a support for `Content`/`Size` for all sections. Assuming that all of sections have a support of these 2 fields, we can simplify and generalize the code. Depends on D89039 Differential revision: https://reviews.llvm.org/D89120	2020-10-16 09:57:27 +03:00
Max Kazantsev	1eb2c6d23f	[SCEV][NFC] Split out type balancing in implication engine We plan to introduce more advanced ways of dealing with different types.	2020-10-16 13:40:24 +07:00
Kito Cheng	cfa7094e49	[RISCV] Add -mtune support - The goal of this patch is improve option compatible with RISCV-V GCC, -mcpu support on GCC side will sent patch in next few days. - -mtune only affect the pipeline model and non-arch/extension related target feature, e.g. instruction fusion; in td file it called TuneFeatures, which is introduced by X86 back-end[1]. - -mtune accept all valid option for -mcpu and extra alias processor option, e.g. `generic`, `rocket` and `sifive-7-series`, the purpose is option compatible with RISCV-V GCC. - Processor alias for -mtune will resolve according the current target arch, rv32 or rv64, e.g. `rocket` will resolve to `rocket-rv32` or `rocket-rv64`. - Interaction between -mcpu and -mtune: * -mtune has higher priority than -mcpu for pipeline model and TuneFeatures. [1] https://reviews.llvm.org/D85165 Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D89025	2020-10-16 13:55:08 +08:00
Max Kazantsev	905101c360	[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs We can sharpen the range of a AddRec if we know that it does not self-wrap and know the symbolic iteration count in the loop. If we can evaluate the value of AddRec on the last iteration and prove that at least one its intermediate value lies between start and end, then no-wrap flag allows us to conclude that all of them also lie between start and end. So the estimate of range can be improved to union of ranges of start and end. Differential Revision: https://reviews.llvm.org/D89381 Reviewed By: efriedma	2020-10-16 12:00:39 +07:00
Vedant Kumar	273c299d5d	[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting This patch adds -f[no-]split-cold-code CC1 options to clang. This allows the splitting pass to be toggled on/off. The current method of passing `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose correctly (say, with `-O0` or `-Oz`). To implement the -fsplit-cold-code option, an attribute is applied to functions to indicate that they may be considered for splitting. This removes some complexity from the old/new PM pipeline builders, and behaves as expected when LTO is enabled. Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org> Differential Revision: https://reviews.llvm.org/D57265 Reviewed By: Aditya Kumar, Vedant Kumar Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar	2020-10-15 23:13:33 +00:00
Jessica Paquette	609d765cd3	[AArch64][GlobalISel] NFC: Refactor emitIntegerCompare Simplify emitIntegerCompare and improve comments + asserts. Mostly making the code a little easier to follow. Also, this code is only used for G_ICMP. The legalizer ensures that the LHS/RHS for every G_ICMP is either a s32 or s64. So, there's no need to handle anything else. This lets us remove a bunch of checks for whether or not we successfully emitted the compare. Differential Revision: https://reviews.llvm.org/D89433	2020-10-15 16:04:08 -07:00
Amara Emerson	c2551c1f40	[GlobalISel] Remove scalar src from non-sequential fadd/fmul reductions. It's probably better to split these into separate G_FADD/G_FMUL + G_VECREDUCE operations in the translator rather than carrying the scalar around. The majority of the time it'll get simplified away as the scalars are probably identity values. Differential Revision: https://reviews.llvm.org/D89150	2020-10-15 15:51:44 -07:00
Kazushi (Jam) Marukawa	a91dd3d37d	[VE] Add VGT/VSC/PFCHV instructions Add VGT/VSC/PFCHV vector instructions and regression tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89471	2020-10-16 06:28:22 +09:00
Kazushi (Jam) Marukawa	410e5b17cf	[VE] Support fabs/fcos/fsin/fsqrt math functions VE doesn't have instruction for fabs/fcos/fsin/fsqrt, so expand them. Add regression tests also. Update fcopysign regression test, also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89457	2020-10-16 06:27:38 +09:00
Thomas Lively	1992e30c2d	[WebAssembly] Prototype i8x16.popcnt As proposed at https://github.com/WebAssembly/simd/pull/379. Use a target builtin and intrinsic rather than normal codegen patterns to make the instruction opt-in until it is merged to the proposal and stabilized in engines. Differential Revision: https://reviews.llvm.org/D89446	2020-10-15 21:18:22 +00:00
Jameson Nash	122d92dfc3	fix symbol printing on windows Similar to MCSymbol::print in `3d6c8ebb58` (llvm-svn: 81682, PR4966), these symbols may need to be quoted to be handled by the linker correctly. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D87099	2020-10-15 17:14:55 -04:00
Florian Hahn	89c0124273	[LoopVersion] Unify SCEVChecks and alias check handling (NFC). This is an initial cleanup of the way LoopVersioning interacts with LAA. Currently LoopVersioning has 2 ways of initializing things: 1. Passing LAI and passing UseLAIChecks = true 2. Passing UseLAIChecks = false, followed by calling setSCEVChecks and setAliasChecks. Both ways of initializing lead to the same result and the duplication seems more complicated than necessary. This patch removes the UseLAIChecks flag from the constructor and the setSCEVChecks & setAliasChecks helpers and move initialization exclusively to the constructor. This simplifies things, by providing a single way to initialize LoopVersioning and reducing duplication. Reviewed By: Meinersbur, lebedev.ri Differential Revision: https://reviews.llvm.org/D84406	2020-10-15 22:02:17 +01:00
alex-t	42ed388120	[AMDGPU] SILowerControlFlow::removeMBBifRedundant should not try to change MBB layout if it can fallthrough removeMBBifRedundant normally tries to keep predecessors fallthrough when removing redundant MBB. It has to change MBBs layout to keep the new successor to immediately follow the predecessor of removed MBB. It only may be allowed in case the new successor itself has no successors to which it fall through. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89397	2020-10-15 23:20:54 +03:00
Evgenii Stepanov	2e794a46b5	[AArch64] Stack frame reordering. Implement stack frame reordering in the AArch64 backend. Unlike the X86 implementation, AArch64 does not seem to benefit from "access density" based frame reordering, mainly because it has a much smaller variety of addressing modes, and the fact that all instructions are 4 bytes so each frame object is either in range of an instruction (and then the access is "free") or not (and that has a code size cost of 4 bytes). This change improves Memory Tagging codegen by * Placing an object that has been chosen as the base tagged pointer of the function at SP + 0. This saves one instruction to setup the pointer (IRG does not have an offset immediate), and more because that object can now be referenced without materializing its tagged address in a scratch register. * Placing objects that go out of scope simultaneously together. This exposes opportunities for instruction merging in tryMergeAdjacentSTG. Differential Revision: https://reviews.llvm.org/D72366	2020-10-15 12:50:16 -07:00
Evgenii Stepanov	2f63e57fa5	[MTE] Pin the tagged base pointer to one of the stack slots. Summary: Pin the tagged base pointer to one of the stack slots, and (if necessary) rewrite tag offsets so that an object that occupies that slot has both address and tag offsets of 0. This allows ADDG instructions for that object to be eliminated and their uses replaced with the tagged base pointer itself. This optimization must be done in machine instructions and not in the IR instrumentation pass, because referring to a stack slot through an IRG pointer would confuse the stack coloring pass. The optimization makes a (pretty naive) attempt to find the slot that would benefit the most by counting the uses of stack slots in the function. Reviewers: ostannard, pcc Subscribers: merge_guards_bot, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72365	2020-10-15 12:50:16 -07:00
Stanislav Mekhanoshin	d1beb95d12	[AMDGPU] gfx1032 target Differential Revision: https://reviews.llvm.org/D89487	2020-10-15 12:41:18 -07:00
Thomas Lively	3f738d1f5e	Reland "[WebAssembly] v128.load{8,16,32,64}_lane instructions" This reverts commit `7c8385a352` with a typing fix to an instruction selection pattern.	2020-10-15 19:32:34 +00:00
Anh Tuyen Tran	224fd6ff48	[NFC][CaptureTracking] Move static function isNonEscapingLocalObject to llvm namespace Function isNonEscapingLocalObject is a static one within BasicAliasAnalysis.cpp. It wraps around PointerMayBeCaptured of CaptureTracking, checking whether a pointer is to a function-local object, which never escapes from the function. Although at the moment, isNonEscapingLocalObject is used only by BasicAliasAnalysis, its functionality can be used by other pass(es), one of which I will put up for review very soon. Instead of copying the contents of this static function, I move it to llvm scope, and place it amongst other functions with similar functionality in CaptureTracking. The rationale for the location are: - Pointer escape and pointer being captured are actually two sides of the same coin - isNonEscapingLocalObject is wrapping around another function in CaptureTracking Reviewed By: jdoerfert (Johannes Doerfert) Differential Revision: https://reviews.llvm.org/D89465	2020-10-15 18:37:29 +00:00
David Green	13ec3dd66f	[LV] Add a getRecurrenceBinOp and make use of it. NFC	2020-10-15 18:21:41 +01:00
Hiroshi Yamauchi	1ebee7adf8	[PGO] Remove the old memop value profiling buckets. Following up D81682 and D83903, remove the code for the old value profiling buckets, which have been replaced with the new, extended buckets and disabled by default. Also syncing InstrProfData.inc between compiler-rt and llvm. Differential Revision: https://reviews.llvm.org/D88838	2020-10-15 10:09:49 -07:00
Thomas Lively	7c8385a352	Revert "[WebAssembly] v128.load{8,16,32,64}_lane instructions" This reverts commit `7c6bfd90ab`.	2020-10-15 15:49:36 +00:00
Thomas Lively	7c6bfd90ab	[WebAssembly] v128.load{8,16,32,64}_lane instructions Prototype the newly proposed load_lane instructions, as specified in https://github.com/WebAssembly/simd/pull/350. Since these instructions are not available to origin trial users on Chrome stable, make them opt-in by only selecting them from intrinsics rather than normal ISel patterns. Since we only need rough prototypes to measure performance right now, this commit does not implement all the load and store patterns that would be necessary to make full use of the offset immediate. However, the full suite of offset tests is included to make it easy to track improvements in the future. Since these are the first instructions to have a memarg immediate as well as an additional immediate, the disassembler needed some additional hacks to be able to parse them correctly. Making that code more principled is left as future work. Differential Revision: https://reviews.llvm.org/D89366	2020-10-15 15:33:10 +00:00
sunshaoce	2de693756f	[RISCV] fix a mistake in RISCVInstrInfoV.td A commit of VALUVVNoVm was wrong, fixed it. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D88142	2020-10-15 23:16:53 +08:00
Simon Pilgrim	23f1616626	[InstCombine] Use m_SpecificInt instead of m_APInt + comparison. NFCI.	2020-10-15 16:06:27 +01:00
Simon Pilgrim	b3330ae42c	[InstCombine] SimplifyDemandedUseBits - xor - refactor cast<ConstantInt> usage to PatternMatch. NFCI. First step towards replacing these to add full vector support.	2020-10-15 16:06:23 +01:00
Simon Pilgrim	2b45639ea0	[InstCombine] InstCombineAndOrXor - refactor cast<ConstantInt> usages to PatternMatch. NFCI. First step towards replacing these to add full vector support.	2020-10-15 16:06:17 +01:00
Paul C. Anagnostopoulos	4767bb2c0c	[TableGen] Add the !not and !xor operators. Update the TableGen Programmer's Reference.	2020-10-15 10:12:59 -04:00
Matt Arsenault	663f16684d	AMDGPU: Fix verifier error on killed spill of partially undef register This does unfortunately end up with extra waitcnts getting inserted that were avoided before. Ideally we would avoid the spills of these undef components in the first place.	2020-10-15 09:45:44 -04:00
Simon Pilgrim	09be7623e4	[InstCombine] visitXor - refactor ((X^C1)>>C2)^C3 -> (X>>C2)^((C1>>C2)^C3) fold. NFCI. This is still ConstantInt-only (scalar) but is refactored to use PatternMatch to make adding vector support in the future relatively trivial.	2020-10-15 14:38:15 +01:00
Carl Ritson	b70cb50204	[AMDGPU] Minimize number of s_mov generated by copyPhysReg Generate the minimal set of s_mov instructions required when expanding a SGPR copy operation in copyPhysReg. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D89187	2020-10-15 22:35:02 +09:00
Denis Antrushin	8f0ddd4a1a	[Statepoints] Remove MI limit on number of tied operands. After D87915 statepoint can have more than 15 tied operands. Remove this restriction from statepoint lowering code.	2020-10-15 19:02:38 +07:00
Adrian Kuegel	ead2aa7098	Fix unused variable warning when compiling with asserts disabled. Differential Revision: https://reviews.llvm.org/D89454	2020-10-15 12:50:19 +02:00
Jeremy Morse	c521e44def	[DebugInstrRef] Support recording of instruction reference substitutions Add a table recording "substitutions" between pairs of <instruction, operand> numbers, from old pairs to new pairs. Post-isel optimizations are able to record the outcome of an optimization in this way. For example, if there were a divide instruction that generated the quotient and remainder, and it were replaced by one that only generated the quotient: $rax, $rcx = DIV-AND-REMAINDER $rdx, $rsi, debug-instr-num 1 DBG_INSTR_REF 1, 0 DBG_INSTR_REF 1, 1 Became: $rax = DIV $rdx, $rsi, debug-instr-num 2 DBG_INSTR_REF 1, 0 DBG_INSTR_REF 1, 1 We could enter a substitution from <1, 0> to <2, 0>, and no substitution for <1, 1> as it's no longer generated. This approach means that if an instruction or value is deleted once we've left SSA form, all variables that used the value implicitly become "optimized out", something that isn't true of the current DBG_VALUE approach. Differential Revision: https://reviews.llvm.org/D85749	2020-10-15 11:30:14 +01:00
Simon Pilgrim	fadd152317	[AggressiveInstCombine] foldAnyOrAllBitsSet - add uniform vector support Replace m_ConstantInt with m_APInt to support uniform vectors (with no undef elements) Adding non-undef support would involve some refactoring of the MaskOps struct but this might still be worth it.	2020-10-15 11:02:35 +01:00
Denis Antrushin	8c2b69d53a	[Statepoints] Unlimited tied operands. Current limit on amount of tied operands (15) sometimes is too low for statepoint. We may get couple dozens of gc pointer operands on statepoint. Review D87154 changed format of statepoint to list every gc pointer only once, which makes it trivial to find tiedness relation between statepoint operands: defs are mapped 1-1 to gc pointer operands passed on registers. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D87915	2020-10-15 16:16:11 +07:00
Georgii Rymar	d8bb30c551	[yaml2obj] - Allow specifying no tags to create empty sections in few cases. Currently we have a few sections that does not support specifying no keys for them. E.g. it is required that one of "Content", "Size" or "Entries" key is present. There is no reason to have this restriction. We can allow this and emit an empty section instead. This opens road for a simplification and generalization of the code in `validate()` that is discussed in the D89039 thread. Depends on D89039. Differential revision: https://reviews.llvm.org/D89391	2020-10-15 11:22:02 +03:00
Georgii Rymar	400103f3d5	[yaml2obj/obj2yaml] - Add support of 'Size' and 'Content' keys for all sections. Many sections either do not have a support of `Size`/`Content` or support just a one of them, e.g only `Content`. `Section` is the base class for sections. This patch adds `Content` and `Size` members to it and removes similar members from derived classes. This allows to cleanup and generalize the code and adds a support of these keys for all sections (`SHT_MIPS_ABIFLAGS` is a only exception, it requires unrelated specific changes to be done). I had to update/add many tests to test the new functionality properly. Differential revision: https://reviews.llvm.org/D89039	2020-10-15 11:11:41 +03:00
Craig Topper	50c9f1e11d	[TargetLowering] Replace Log2_32_Ceil with Log2_32 in SimplifySetCC ctpop combine. This combine can look through (trunc (ctpop X)). When doing this it tries to make sure the trunc doesn't lose any information from the ctpop. It does this by checking that the truncated type has more bits that Log2_32_Ceil of the ctpop type. The Ceil is unnecessary and pessimizes non-power of 2 types. For example, ctpop of i256 requires 9 bits to represent the max value of 256. But ctpop of i255 only requires 8 bits to represent the max result of 255. Log2_32_Ceil of 256 and 255 both return 8 while Log2_32 returns 8 for 256 and 7 for 255 The code with popcnt enabled is a regression for this test case, but it does match what already happens with i256 truncated to i9. Since power of 2 is more likely, I don't think it should block this change. Differential Revision: https://reviews.llvm.org/D89412	2020-10-15 01:05:07 -07:00
David Sherwood	47f2dc7e5f	[SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets In most of lib/Target we know that we are not dealing with scalable types so it's perfectly fine to replace TypeSize comparison operators with their fixed width equivalents, making use of getFixedSize() and so on. Differential Revision: https://reviews.llvm.org/D89101	2020-10-15 09:01:21 +01:00
David Blaikie	a7b209a6d4	llvm-symbolizer: Exit non-zero when DWARF parsing errors have been rendered	2020-10-14 23:42:00 -07:00
Vinay Madhusudan	159b2d8e62	[AArch64] Combine UADDVs to generate vector add ADD(UADDV a, UADDV b) --> UADDV(ADD a, b) This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888 Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929 Differential Revision: https://reviews.llvm.org/D88731	2020-10-15 09:56:31 +05:30
Tony	b3a38bc2dc	[AMDGPU] Correct typos in SIMemoryLegalizer.cpp comments	2020-10-15 02:07:56 +00:00
Kazushi (Jam) Marukawa	94c18d91d2	[VE] Add vector load/store instructions Add vector registers and vector load/store instructions. Add regression tests for vector load/store instructions too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89183	2020-10-15 09:26:55 +09:00
Kazushi (Jam) Marukawa	8e7b108e80	[VE] Change to expand SHL_PARTS/SRA_PARTS/SRL_PARTS VE doesn't have SHL_PARTS/SRA_PARTS/SRL_PARTS instructions, so need to expand them. Add regression tests too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89396	2020-10-15 09:04:34 +09:00
Amara Emerson	78ccb0359d	[AArch64][GlobalISel] Don't use explicit zero registers for compare results. These cause problems for later optimizations, just using an unused vreg like SelectionDAG generates better code in the end, and obviates the need for some GISel specific flag optimizations. Differential Revision: https://reviews.llvm.org/D89419	2020-10-14 16:49:33 -07:00
Snehasish Kumar	24bf6ff4e0	[llvm] Update default cutoff threshold for machine function splitter. Based on internal testing at Google we found that setting the profile summary cutoff threshold to 999950 yields the best results in terms of itlb and icache metrics (as observed on Intel CPUs). default = Split out code if no profile count available for block size-% = The fraction of bytes split out of .text and .text.hot itlb = Misses per kilo instructions (MPKI) for itlb icache = Misses per kilo instructions (MPKI) for L1 icache Search1 \| cutoff \| size-% \| itlb \| icache \| \|---------\|---------\|-----------\|---------\| \| default \| 42.5861 \| 0.0822151 \| 2.46363 \| \| 999999 \| 44.9350 \| 0.0767194 \| 2.44416 \| \| 999950 \| 50.0660 \| 0.075744 \| 2.4091 \| \| 999500 \| 56.9158 \| 0.082564 \| 2.4188 \| \| 995000 \| 63.8625 \| 0.0814927 \| 2.42832 \| \| 990000 \| 71.7314 \| 0.106906 \| 2.57785 \| Search2 \| cutoff \| size-% \| itlb \| icache \| \|---------\|--------\|----------\|---------\| \| default \| 2.8845 \| 0.626712 \| 4.73245 \| \| 999999 \| 3.3291 \| 0.602309 \| 4.70045 \| \| 999950 \| 3.8577 \| 0.587842 \| 4.71632 \| \| 999500 \| 4.4170 \| 0.63577 \| 4.68351 \| \| 995000 \| 5.1020 \| 0.657969 \| 4.82272 \| \| 990000 \| 5.7153 \| 0.719122 \| 5.39496 \| Differential Revision: https://reviews.llvm.org/D89085	2020-10-14 12:48:10 -07:00
Snehasish Kumar	77638a5343	[llvm] Set the default for -bbsections-cold-text-prefix to .text.split. After using this for a while, we find that it is generally useful to have it set to .text.split. by default, removing the need for an additional -mllvm option. Differential Revision: https://reviews.llvm.org/D88997	2020-10-14 12:16:36 -07:00
Guozhi Wei	adfb541501	[MBP] Add whole chain to BlockFilterSet instead of individual BB Currently we add individual BB to BlockFilterSet if its frequency satisfies LoopFreq / Freq <= LoopToColdBlockRatio LoopFreq is edge frequency from outside to loop header. LoopToColdBlockRatio is a command line parameter. It doesn't make sense since we always layout whole chain, not individual BBs. It may also cause a tricky problem. Sometimes it is possible that the LoopFreq of an inner loop is smaller than LoopFreq of outer loop. So a BB can be in BlockFilterSet of inner loop, but not in BlockFilterSet of outer loop, like .cold in the test case. So it is added to the chain of inner loop. When work on the outer loop, .cold is not added to BlockFilterSet, so the edge to successor .problem is not counted in UnscheduledPredecessors of .problem chain. But other blocks in the inner loop are added BlockFilterSet, so the whole inner loop chain can be layout, and markChainSuccessors is called to decrease UnscheduledPredecessors of following chains. markChainSuccessors calls markBlockSuccessors for every BB, even it is not in BlockFilterSet, like .cold, so .problem chain's UnscheduledPredecessors is decreased, but this edge was not counted on in fillWorkLists, so .problem chain's UnscheduledPredecessors becomes 0 when it still has an unscheduled predecessor .pred! And it causes problems in following various successor BB selection algorithms. Differential Revision: https://reviews.llvm.org/D89088	2020-10-14 11:55:10 -07:00
Justin Lebar	e9ac1869a8	Preserve param alignment in NVPTXLowerArgs pass. NVPTXLowerArgs works as follows. * Create a regular alloca with alignment identical to arg. * Copy arg from param space (and ASC'ing it from generic AS first) to the alloca (it's still in generic AS). * Replace loads of arg with loads of alloca. The bug here is that we did not preserve the arg's alignment when loading from the alloca. The impact of this bug is that sometimes param loads would be lowered as a series of u8 loads, because we're incorrectly assuming everything has alignment 1. Differential Revision: https://reviews.llvm.org/D89404	2020-10-14 11:15:30 -07:00
Krzysztof Parzyszek	670cd3c6e3	[Hexagon] Generate better splat code on v62+	2020-10-14 12:55:20 -05:00
Simon Pilgrim	60ba9233d1	Revert rG25a97c3a43d7 - "[InstCombine] visitCallInst - retain undefs in vector funnel shift amounts" This reverts commit `25a97c3a43`. We have other constant folds that fold undef funnel shift amounts to 0 - so we need to be consistent. If we end up with regressions where we lose a splat shift amount pattern we'll have to investigate other canonicalizations, but matchFunnelShift currently protects us from that.	2020-10-14 18:14:37 +01:00
Konstantin Zhuravlyov	3fdf3b1539	AMDGPU: Update AMDHSA code object version handling Differential Revision: https://reviews.llvm.org/D89076	2020-10-14 13:04:27 -04:00
Matt Arsenault	6a9484f4bf	InstCombine: Fix losing load properties in copy-constant-to-alloca Preserve the alignment and metadata. Atomic loads are skipped for this, but pass along the properties for consistency.	2020-10-14 12:55:25 -04:00
Matt Arsenault	6da31fa4a6	InstCombine: Fix infinite loop in copy-constant-to-alloca transform This was broken by `16295d521e`, when instructions started being handled and not just constant expressions. This was re-inserting an equivalent bitcast to the original memcpy operand, which made a non-functional IR change on every iteration. This also fixes a secondary problem where it was inserting addrspacecasts which may not have been legal (i.e. it changed the source address space). Start visiting all pointer users and fail out if we can't process them. Also start handling the relevant memory intrinsic users. These cases can be dealt with by running InferAddressSpaces separately.	2020-10-14 12:55:25 -04:00
Florian Hahn	93f6c6b79c	Recommit "[VPlan] Use VPValue def for VPMemoryInstructionRecipe." This reverts the revert commit `710aceb645` and includes a fix for a memsan failure. Original message: This patch turns VPMemoryInstructionRecipe into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible.	2020-10-14 17:41:23 +01:00
jasonliu	f85bcc21dd	[AIX] Turn -fdata-sections on by default in Clang Summary: This patch does the following: 1. Make InitTargetOptionsFromCodeGenFlags() accepts Triple as a parameter, because some options' default value is triple dependant. 2. DataSections is turned on by default on AIX for llc. 3. Test cases change accordingly because of the default behaviour change. 4. Clang Driver passes in -fdata-sections by default on AIX. Reviewed By: MaskRay, DiggerLin Differential Revision: https://reviews.llvm.org/D88737	2020-10-14 15:58:31 +00:00
Simon Pilgrim	89657b3a3b	[InstCombine] narrowRotate - canonicalize to OR(SHL,LSHR). NFCI. Match the canonicalization code that was added to matchFunnelShift at rG02295e6d1a15	2020-10-14 16:45:00 +01:00
Mircea Trofin	c8fcffe775	[NFC][MC] Use MCRegister in Machine{Sink\|Pipeliner}.cpp Differential Revision: https://reviews.llvm.org/D89328	2020-10-14 08:42:17 -07:00
Michael Liao	ae40d2858e	Fix an apparent typo. `assert()` must not contain side-effects. NFC.	2020-10-14 11:33:34 -04:00
Simon Pilgrim	89a2a47870	[InstCombine] Add m_SpecificIntAllowUndef pattern matcher m_SpecificInt doesn't accept undef elements in a vector splat value - tweak specific_intval to optionally allow undefs and add the m_SpecificIntAllowUndef variants. Allows us to remove the m_APIntAllowUndef + comparison hack inside matchFunnelShift	2020-10-14 16:15:53 +01:00
Cameron McInally	421f1b7294	[SVE] Lower fixed length VECREDUCE_FADD operation Differential Revision: https://reviews.llvm.org/D89263	2020-10-14 09:41:11 -05:00
Simon Pilgrim	25a97c3a43	[InstCombine] visitCallInst - retain undefs in vector funnel shift amounts By always performing a modulo on the shift amount constants this was causing undef amounts being replaced with zero, meaning we were losing funnel shift by splat (with undef) patterns. Tweaked the shift amount bounds check to support (passthrough) undefs, and use Constant::mergeUndefsWith to preserve the undefs after folding.	2020-10-14 14:38:21 +01:00
Jonas Paulsson	6756d43af9	[SystemZ] Bugfix in SystemZVectorConstantInfo In order to correctly load an all-ones FP NaN value into a floating point register with a VGBM, the analyzed 32/64 FP bits must first be shifted left (into element 0 of the vector register). SystemZVectorConstantInfo has so far relied on element replication which has bypassed the need to do this shift, but now it is clear that this must be done in order to handle NaNs. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D89389	2020-10-14 15:34:40 +02:00
Jeremy Morse	c4e7857d4e	[DebugInstrRef] Create DBG_INSTR_REFs in SelectionDAG When given the -experimental-debug-variable-locations option (via -Xclang or to llc), have SelectionDAG generate DBG_INSTR_REF instructions instead of DBG_VALUE. For now, this only happens in a limited circumstance: when the value referred to is not a PHI and is defined in the current block. Other situations introduce interesting problems, addresed in later patches. Practically, this patch hooks into InstrEmitter and if it can find a defining instruction for a value, gives it an instruction number, and points the DBG_INSTR_REF at that <instr, operand> pair. Differential Revision: https://reviews.llvm.org/D85747	2020-10-14 14:24:08 +01:00
Roman Lebedev	7ee6c40247	Revert "Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"" and it's follow-ups While we haven't encountered an earth-shattering problem with this yet, by now it is pretty evident that trying to model the ptr->int cast implicitly leads to having to update every single place that assumed no such cast could be needed. That is of course the wrong approach. Let's back this out, and re-attempt with some another approach, possibly one originally suggested by Eli Friedman in https://bugs.llvm.org/show_bug.cgi?id=46786#c20 which should hopefully spare us this pain and more. This reverts commits `1fb6104293`, `7324616660`, `aaafe350bb`, `e92a8e0c74`. I've kept&improved the tests though.	2020-10-14 16:09:18 +03:00
Carl Ritson	01549dd976	[AMDGPU] Base getSubRegFromChannel on TableGen data Generate (at runtime) the table used to drive getSubRegFromChannel, base on AMDGPUSubRegIdxRanges from TableGen data. The is a step closer to it being staticly generated by TableGen and allows getSubRegFromChannel handle all bitwidths in the mean time. Reviewed By: rampitec, arsenm, foad Differential Revision: https://reviews.llvm.org/D89217	2020-10-14 20:25:09 +09:00
Juneyoung Lee	9b3c2a72e4	[ValueTracking] Use assume's noundef operand bundle This patch updates `isGuaranteedNotToBeUndefOrPoison` to use `llvm.assume`'s `noundef` operand bundle. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89219	2020-10-14 20:16:33 +09:00
Tim Northover	630d264798	Analysis: only query size of sized objects. Recently we started looking into sret parameters, though the issue could crop up elsewhere. If the pointee type is opaque, we should not try to compute its size because that leads to an assertion failure.	2020-10-14 12:16:05 +01:00
Rainer Orth	3b956a58f3	Reland "[Support][unittests] Enforce alignment in ConvertUTFTest" This relands commit `53b3873cf4`. The failure of `ConvertUTFTest.UTF16WrappersForConvertUTF16ToUTF8String` detected the first time is fixed. Differential Revision: https://reviews.llvm.org/D88824	2020-10-14 12:02:27 +02:00
Jeremy Morse	2c5f3d54c5	[DebugInstrRef] Parse debug instruction-references from/to MIR This patch defines the MIR format for debug instruction references: it's an integer trailing an instruction, marked out by "debug-instr-number", much like how "debug-location" identifies the DebugLoc metadata of an instruction. The instruction number is stored directly in a MachineInstr. Actually referring to an instruction comes in a later patch, but is done using one of these instruction numbers. I've added a round-trip test and two verifier checks: that we don't label meta-instructions as generating values, and that there are no duplicates. Differential Revision: https://reviews.llvm.org/D85746	2020-10-14 10:57:09 +01:00
Evgeniy Brevnov	d0c95808e5	[LV] Unroll factor is expected to be > 0 LV fails with assertion checking that UF > 0. We already set UF to 1 if it is 0 except the case when IC > MaxInterleaveCount. The fix is to set UF to 1 for that case as well. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D87679	2020-10-14 16:48:17 +07:00
Simon Pilgrim	1e4d882f9a	[InstCombine] matchFunnelShift - add support for non-uniform vectors containing undefs. Replace m_SpecificInt with m_APIntAllowUndef to matching splats containing undefs, then use ConstantExpr::mergeUndefsWith to merge the undefs together in the result. The undef funnel shift amounts are getting replaced with zero later on - I'll address this in a later patch, otherwise we lose potential shift by splat value patterns.	2020-10-14 10:42:27 +01:00
David Sherwood	af57a0838e	[SVE] Add fatal error when running out of registers for SVE tuple call arguments When passing SVE types as arguments to function calls we can run out of hardware SVE registers. This is normally fine, since we switch to an indirect mode where we pass a pointer to a SVE stack object in a GPR. However, if we switch over part-way through processing a SVE tuple then part of it will be in registers and the other part will be on the stack. This is wrong and we'd like to avoid any silent ABI compatibility issues in future. For now, I've added a fatal error when this happens until we can get a proper fix. Differential Revision: https://reviews.llvm.org/D89326	2020-10-14 09:31:41 +01:00
sstefan1	ce16be253c	[Attributor][NFC] Make `createShallowWrapper()` available outside of Attributor D85703 will need to create shallow wrappers in order to track the spmd icv. We need to make it available. Differential Revision: https://reviews.llvm.org/D89342	2020-10-14 10:08:59 +02:00
Evgeny Leviant	2ad82b0ed1	[ARM.td] Make instruction definitions visible to sched models Differential revision: https://reviews.llvm.org/D89308	2020-10-14 09:58:45 +03:00
Tony	907d799070	[AMDGPU] Cleanup memory legalizer interfaces - Rename interfaces to be in terms of acquire and release. - Improve comments. Differential Revision: https://reviews.llvm.org/D89355	2020-10-14 06:07:51 +00:00
Arthur Eubanks	518ec05a10	[LoopExtract][NewPM] Port -loop-extract to NPM -loop-extract-single is just -loop-extract on one loop. -loop-extract depended on -break-crit-edges and -loop-simplify in the legacy PM, but the NPM doesn't allow specifying pass dependencies like that, so manually add those passes to the RUN lines where necessary. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D89016	2020-10-13 22:55:42 -07:00
David Blaikie	9670a45c98	libDebugInfoDWARF: Don't try to parse loclist[.dwo] headers when parsing debug_info[.dwo] There's no way to know whether there's a loclist contribution to parse if there's no loclistx encoding - and if there is one, there's no need to walk back from the loclist_base (or, uin the case of info.dwo/loclist.dwo - starting at 0 in the contribution) to parse the header, instead rely on the DWARF32/64 and address size in the CU that's already available. This would come up in split DWARF (non-split wouldn't try to read a loclist header in the absence of a loclist_base) when one unit had location lists and another does not (because the loclists.dwo section would be non-empty in that case - in the case where it's empty the parsing would silently skip). Simplify the testing a bit, rather than needing a whole dwp, etc - by creating a malformed loclists.dwo section (and use single file Split DWARF) that would trip up any attempt to parse it - but no attempt should be made.	2020-10-13 22:28:59 -07:00
Alexandre Ganea	617d64f6c5	Re-land [ThinLTO] Re-order modules for optimal multi-threaded processing This reverts `9b5b305023` and fixes the unwanted re-ordering when generating ThinLTO indexes. The goal of this patch is to better balance thread utilization during ThinLTO in-process linking (in llvm-lto2 or in LLD). Before this patch, large modules would often be scheduled late during execution, taking a long time to complete, thus starving the thread pool. We now sort modules in descending order, based on each module's bitcode size, so that larger modules are processed first. By doing so, smaller modules have a better chance to keep the thread pool active, and thus avoid starvation when the bitcode compilation is almost complete. In our case (on dual Intel Xeon Gold 6140, Windows 10 version 2004, two-stage build), this saves 15 sec when linking `clang.exe` with LLD & -flto=thin, /opt:lldltojobs=all, no ThinLTO cache, -DLLVM_INTEGRATED_CRT_ALLOC=d:\git\rpmalloc. Before patch: 100 sec After patch: 85 sec Inspired by the work done by David Callahan in D60495. Differential Revision: https://reviews.llvm.org/D87966	2020-10-13 21:54:15 -04:00
Aditya Nandakumar	ef3d17482f	[GISel] Add combine for constant G_PTR_ADD offsets. https://reviews.llvm.org/D88865 This adds a single combine for GlobalISel to fold: ptradd (inttoptr C1) C2 Into: C1 + C2 Additionally, a small test for AArch64 is added. Patch by pnappa.	2020-10-13 17:26:12 -07:00
Vedant Kumar	7fafaa07bc	[llvm-cov] Warn when -arch spec is missing/invalid for universal binary (reland) llvm-cov reports a poor error message when the -arch specifier is missing or invalid, and a binary has multiple slices. Make the error message more specific. (This version of the patch avoids using llvm::none_of -- the way I used the utility caused compile errors on many bots, possibly because the wrong overload of `none_of` was selected.) rdar://40312677	2020-10-13 16:46:03 -07:00
Vedant Kumar	10b6d0901f	Revert "[llvm-cov] Warn when -arch spec is missing/invalid for universal binary" This reverts commit `b81d4bfb44`. It's causing some bots to fail to build due to: "error: no matching function for call to ‘__iterator_category".	2020-10-13 16:32:31 -07:00
Vedant Kumar	b81d4bfb44	[llvm-cov] Warn when -arch spec is missing/invalid for universal binary llvm-cov reports a poor error message when the -arch specifier is missing or invalid, and a binary has multiple slices. Make the error message more specific. rdar://40312677	2020-10-13 16:29:26 -07:00
Jay Foad	edc37baca6	[AMDGPU] Add MC layer support for v_fmac_legacy_f32 This instruction was introduced in GFX10.3, reusing the opcode of v_mac_legacy_f32 from GFX10.1. Differential Revision: https://reviews.llvm.org/D89247	2020-10-13 21:57:33 +01:00
Duncan P. N. Exon Smith	f2b7d9f7fa	Support: Allow use of MemoryBufferRef with line_iterator Split out from https://reviews.llvm.org/D66782, use `Optional<MemoryBufferRef>` in `line_iterator` so you don't need access to a `MemoryBuffer*`. Follow up patches in `clang/` will leverage this. Differential Revision: https://reviews.llvm.org/D89280	2020-10-13 16:43:49 -04:00
Duncan P. N. Exon Smith	f087597124	Support: Add operator== for MemoryBufferRef and split out MemoryBufferRef.h As preparation for changing `LineIterator` to work with `MemoryBufferRef`: - Add an `operator==` that uses buffer pointer identity to ensure two buffers are equivalent. - Split out `MemoryBufferRef.h`, to avoid polluting `LineIterator.h` includers with everything from `MemoryBuffer.h`. This also means moving the `MemoryBuffer` constructor to a source file. Differential Revision: https://reviews.llvm.org/D89279	2020-10-13 16:42:24 -04:00
Andrew Paverd	cfd8481da1	Reland [CFGuard] Add address-taken IAT tables and delay-load support This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table. Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D87544	2020-10-13 13:20:52 -07:00
Nikita Popov	3b31f05372	[LICM] Don't require AST in LoopPromoter (NFC) While promotion currently always has an AST available, it is only relevant for invalidation purposes in LoopPromoter, so we do not need to have it as a hard dependency.	2020-10-13 22:08:49 +02:00
Nikita Popov	cd6f40f432	[MemCpyOpt] Add test scaffolding for MSSA based MemCpyOpt This adds an -enable-memcpyopt-memoryssa option that currently does nothing apart from requiring MSSA as a dependency. The tests are split to run both with the option disabled and enabled. I went with this rather than the separate directory DSE uses, as I found it convenient to have a direct side-by-side comparison of differences. Differential Revision: https://reviews.llvm.org/D89206	2020-10-13 21:45:05 +02:00
Nikita Popov	e79ca751fc	[MemCpyOpt] Fix MemorySSA preservation moveUp() moves instructions, so we should move the corresponding memory accesses as well. We should also move the store instruction itself: Even though we'll end up removing it later, this gives us a correct MemoryDef to replace. The implementation is somewhat more complicated than it should be, because we also handle the case where P does not have a memory access due to a degnerate AA pipeline. Hopefully, the need for this will go away in the future, when the rest of the pass is based on MSSA. Differential Revision: https://reviews.llvm.org/D88778	2020-10-13 21:39:09 +02:00
Roman Lebedev	e92a8e0c74	[SCEV] BuildConstantFromSCEV(): actually properly handle SExt-of-pointer case As being pointed out by @efriedma in https://reviews.llvm.org/rGaaafe350bb65#inline-4883 of course we can't just call ptrtoint in sign-extending case and be done with it, because it will zero-extend. I'm not sure what i was thinking there. This is very much not an NFC, however looking at the user of BuildConstantFromSCEV() i'm not sure how to actually show that it results in a different constant expression.	2020-10-13 22:22:30 +03:00
Nikita Popov	baa3b87015	[MemCpyOpt] Don't shorten memset if memcpy operands may be the same If the memcpy operands are the same (which is allowed since D86815) then the memcpy is effectively a no-op and the partially overlapping memset is not dead. Differential Revision: https://reviews.llvm.org/D89192	2020-10-13 21:19:19 +02:00
Nikita Popov	39c39e8a7f	[MemCpyOpt] Don't shorten memset if destination observable through unwinding MemCpyOpt can shorten a memset if it is later partially overwritten by a memcpy. It checks that the destination is not read in between, but we also need to make sure that the destination cannot be observed via unwinding. Differential Revision: https://reviews.llvm.org/D89190	2020-10-13 21:12:19 +02:00
Ahsan Saghir	f3202b30b8	[PowerPC] Add assemble disassemble intrinsics for MMA This patch adds support for assemble disassemble intrinsics for MMA. Reviewed By: bsaleil, #powerpc Differential Revision: https://reviews.llvm.org/D88739	2020-10-13 13:21:58 -05:00
Xun Li	0ccf9263cc	[ASAN] Make sure we are only processing lifetime markers with offset 0 to alloca This patch addresses https://bugs.llvm.org/show_bug.cgi?id=47787 (and hence https://bugs.llvm.org/show_bug.cgi?id=47767 as well). In latter instrumentation code, we always use the beginning of the alloca as the base for instrumentation, ignoring any offset into the alloca. Because of that, we should only instrument a lifetime marker if it's actually pointing to the beginning of the alloca. Differential Revision: https://reviews.llvm.org/D89191	2020-10-13 10:21:45 -07:00
Nikita Popov	6713332fdd	[LoopVersioningLICM] Fix noalias metadata emission The previous code added the scope on each iteration, so that the same scope was represented many times in the same !noalias metadata. That's legal, and semantically equivalent to only storing the scope once, but it's also wasteful and may pessimize further optimization if AATags get intersected naively, as done by the AliasSetTracker.	2020-10-13 18:58:05 +02:00
Mircea Trofin	08097fc6a9	[NFC][Regalloc] Use MCRegister in MachineCopyPropagation Differential Revision: https://reviews.llvm.org/D89250	2020-10-13 09:05:08 -07:00
Jay Foad	b59d8d7c72	[AMDGPU][GlobalISel] Compute known bits for zero-extending loads Implement computeKnownBitsForTargetInstr for G_AMDGPU_BUFFER_LOAD_UBYTE and G_AMDGPU_BUFFER_LOAD_USHORT. This allows generic combines to remove some unnecessary G_ANDs. Differential Revision: https://reviews.llvm.org/D89316	2020-10-13 16:22:00 +01:00
Paulo Matos	388fb67b0d	[WebAssembly] Added .tabletype to asm and multiple table support in obj files Adds more testing in basic-assembly.s and a new test tables.s. Adds support to yaml reading and writing of tables as well. Differential Revision: https://reviews.llvm.org/D88815	2020-10-13 07:52:23 -07:00
Simon Pilgrim	9c3138bd6d	[InstCombine] visitTrunc - pass through undefs for trunc(shift(trunc/ext(x),c)) patterns Based on the recent patches D88475 and D88429 where we are losing undef values due to extension/comparisons. I've added a Constant::mergeUndefsWith method that merges the undef scalar/elements from another Constant into a specific Constant. Differential Revision: https://reviews.llvm.org/D88687	2020-10-13 14:35:18 +01:00
Simon Pilgrim	2e604d23b4	[Analysis] findAffectedValues - remove unused ConstantInt argument. NFCI. We can use m_ConstantInt without a result value as we don't ever use it.	2020-10-13 14:35:18 +01:00
Mirko Brkusanin	52ba4fa6aa	[GlobalISel] Avoid making G_PTR_ADD with nullptr When the first operand is a null pointer we can avoid making a G_PTR_ADD and make a G_INTTOPTR with the offset operand. This helps us avoid making add with 0 later on for targets such as AMDGPU. Differential Revision: https://reviews.llvm.org/D87140	2020-10-13 13:02:55 +02:00
Vinay Madhusudan	37dce7475b	[AArch64] Identify SAD pattern (ABS (SUB (EXTEND a), (EXTEND b))) to ZERO_EXTEND((UABD a, b)) (ABS (SUB (SIGN_EXTEND a), (SIGN_EXTEND b))) to ZERO_EXTEND((SABD a, b)) This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888 Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929 Differential Revision: https://reviews.llvm.org/D88742	2020-10-13 15:50:54 +05:30
Vitaly Buka	710aceb645	Revert "[VPlan] Use VPValue def for VPMemoryInstructionRecipe." It introduced a memory leak. This reverts commit `525b085a65`.	2020-10-13 03:14:08 -07:00
Cullen Rhodes	c87bd2d8eb	[AArch64] Implement .variant_pcs directive A dynamic linker with lazy binding support may need to handle variant PCS function symbols specially, so an ELF symbol table marking STO_AARCH64_VARIANT_PCS [1] was added to address this. Function symbols that follow the vector PCS are marked via the .variant_pcs assembler directive, which takes a single parameter specifying the symbol name and sets the STO_AARCH64_VARIANT_PCS st_other flag in the object file. [1] https://github.com/ARM-software/abi-aa/blob/master/aaelf64/aaelf64.rst#st-other-values Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D89138	2020-10-13 10:06:27 +00:00
Paul Walker	981b31c282	[SVE] Add ISel patterns for "insert undef_nxv#f##, f##, 0" Differential Revision: https://reviews.llvm.org/D89235	2020-10-13 10:49:18 +01:00
Roman Lebedev	aaafe350bb	[SCEV] BuildConstantFromSCEV(): properly handle SCEVSignExtend from ptr Much similar to the ZExt/Trunc handling. Thanks goes to Alexander Richardson for nudging towards noticing this one proactively. The appropriate (currently crashing) test coverage added.	2020-10-13 12:19:59 +03:00
Jay Foad	cdf0214845	[AMDGPU] v_mac_legacy_f32 does not support DPP Differential Revision: https://reviews.llvm.org/D89245	2020-10-13 10:03:00 +01:00
Roman Lebedev	7324616660	[SCEV] BuildConstantFromSCEV(): properly handle SCEVZeroExtend from ptr As being reported in https://reviews.llvm.org/D88806#2326944, this is pretty much the sibling problem of https://reviews.llvm.org/D88806#2325340, with root cause being that SCEV now models `ptrtoint` as trunc/zext/self of unknown. The appropriate (currently crashing) test coverage added.	2020-10-13 11:47:44 +03:00
Simon Pilgrim	5df61724a1	[InstCombine] Support uniform vector splats in ((((X >> C) & CC) + Y) << C) folds. Add support for uniform vector splats (no undefs).	2020-10-13 09:28:39 +01:00
Craig Topper	1687a8d83b	[X86][SelectionDAG] Add SADDO_CARRY and SSUBO_CARRY to support multipart signed add/sub overflow legalization. This passes existing X86 test but I'm not sure if it handles all type legalization cases it needs to. Alternative to D89200 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D89222	2020-10-12 23:18:29 -07:00
Thomas Lively	72c628e835	Reland "[WebAssembly] Emulate v128.const efficiently"" This reverts commit `432e4e56d3`, which reverted `542523a61a`. Two issues from the original commit have been fixed. First, MSVC does not like when std::array is initialized with only single braces, so this commit switches to using the more portable double braces. Second, there was a subtle endianness bug that prevented the original commit from working correctly on big-endian machines, which has been fixed by switching to using endianness-agnostic bit twiddling instead of type punning. Differential Revision: https://reviews.llvm.org/D88773	2020-10-13 04:36:59 +00:00
Wang, Pengfei	412cdcf2ed	[X86] Add HRESET instruction. For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D89102	2020-10-13 08:47:26 +08:00
Ruiling Song	b215a26628	[AMDGPU] Update LiveVariables in convertToThreeAddress() This can fix an asan failure like below. ==15856==ERROR: AddressSanitizer: use-after-poison on address ... READ of size 8 at 0x6210001a3cb0 thread T0 #0 llvm::MachineInstr::getParent() #1 llvm::LiveVariables::VarInfo::findKill() #2 TwoAddressInstructionPass::rescheduleMIBelowKill() #3 TwoAddressInstructionPass::tryInstructionTransform() #4 TwoAddressInstructionPass::runOnMachineFunction() We need to update the Kills if we replace instructions. The Kills may be later accessed within TwoAddressInstruction pass. Differential Revision: https://reviews.llvm.org/D89092	2020-10-13 08:12:20 +08:00
Craig Topper	a184c758b7	[BitCodeAnalyzer] Add a few missing TYPE_CODES and MODULE_CODE_COMDAT to GetCodeName Happened to notice some of these printing as UnknownCode while running llvm-bcanalyzer on a bc file I had. Differential Revision: https://reviews.llvm.org/D86900	2020-10-12 15:43:12 -07:00
Paul C. Anagnostopoulos	350fafabe9	[TableGen] Add overload of RecordKeeper::getAllDerivedDefinitions() and use in PseudoLowering backend. Now the two getAllDerivedDefinitions() use StringRef and Arrayref. Use all_of() in getAllDerivedDefinitions().	2020-10-12 16:40:09 -04:00
Roman Lebedev	1fb6104293	Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown" This relands commit `1c021c64ca` which was reverted in commit `17cec6a11a` because an assertion was being triggered, since `BuildConstantFromSCEV()` wasn't updated to handle the case where the constant we want to truncate is actually a pointer. I was unsuccessful in coming up with a test case where we'd end there with constant zext/sext of a pointer, so i didn't handle those cases there until there is a test case. Original commit message: While we indeed can't treat them as no-ops, i believe we can/should do better than just modelling them as `unknown`. `inttoptr` story is complicated, but for `ptrtoint`, it seems straight-forward to model it just as a zext-or-trunc of unknown. This may be important now that we track towards making inttoptr/ptrtoint casts not no-op, and towards preventing folding them into loads/etc (see D88979/D88789/D88788) Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D88806	2020-10-12 23:02:55 +03:00
Simon Pilgrim	4ff7136268	[InstCombine] FoldShiftByConstant - create Scalar/Vector constant with ConstantInt::get(). NFCI. There's no need to create constant vector splats manually - missed this one in rG24dd0cd1edd5	2020-10-12 18:39:45 +01:00
Fangrui Song	012dd42e02	[X86] Support -march=x86-64-v[234] PR47686. These micro-architecture levels are defined in the x86-64 psABI: https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9 GCC 11 will support these levels. Note, -mtune=x86-64-v[234] are invalid and __builtin_cpu_is cannot be used on them. Reviewed By: craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D89197	2020-10-12 10:29:46 -07:00
Simon Pilgrim	24dd0cd1ed	[InstCombine] FoldShiftByConstant - create Scalar/Vector constant with ConstantInt::get(). NFCI. There's no need to create constant vector splats manually.	2020-10-12 18:17:20 +01:00
Simon Pilgrim	2de368f6a7	[InstCombine] FoldShiftByConstant - merge equivalent types. NFCI. Consistently use the original shift instruction's Type/BitWidth instead of the operands, casted values etc.	2020-10-12 18:17:19 +01:00
Teresa Johnson	c27ab339ad	Restore "[ThinLTO] Avoid temporaries when loading global decl attachment metadata" This restores commit `ab1b4810b5` which was reverted in `01b9deba76`, with a fix for the issue it caused. We should use a temporary BitstreamCursor when loading the global decl attachment records so that the abbrev ids held in the lazy loading IndexCursor are not clobbered. Enhanced the test so that the issue is exposed there. Original description: When performing ThinLTO importing, the metadata loader attempts to lazy load, by building an index. However, module level global decl attachment metadata was being parsed early while building the index, since the associated (module level) global values aren't materialized on demand. This results in the creation of forward reference temporary metadatas, which are expensive. Normally, these module level global values don't have much attached metadata. However, in the case of -fwhole-program-vtables (e.g. for whole program devirtualization), the vtables may have many attached type metadatas. This was resulting in very slow performance when performing ThinLTO importing with the default lazy loading. This patch restructures the handling of these global decl attachment records, delaying their parsing until after the lazy loading index has been built. Then the parser can use the interface that loads from the index, which resolves forward references immediately instead of creating expensive temporaries. For one ThinLTO backend that imports from modules containing huge numbers of vtables and associated types, I measured the following compile times for the metadata materialization during function importing, rounded to nearest second: No -fwhole-program-vtables: Lazy loading on (head): 1s Lazy loading off (head): 3s Lazy loading on (patch): 1s With -fwhole-program-vtables: Lazy loading on (head): 440s Lazy loading off (head): 4s Lazy loading on (patch): 2s Differential Revision: https://reviews.llvm.org/D87970	2020-10-12 10:11:56 -07:00
Florian Hahn	525b085a65	[VPlan] Use VPValue def for VPMemoryInstructionRecipe. This patch turns VPMemoryInstructionRecipe into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84680	2020-10-12 18:02:33 +01:00
Hans Wennborg	17cec6a11a	Revert `1c021c64c` "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown" > While we indeed can't treat them as no-ops, i believe we can/should > do better than just modelling them as `unknown`. `inttoptr` story > is complicated, but for `ptrtoint`, it seems straight-forward > to model it just as a zext-or-trunc of unknown. > > This may be important now that we track towards > making inttoptr/ptrtoint casts not no-op, > and towards preventing folding them into loads/etc > (see D88979/D88789/D88788) > > Reviewed By: mkazantsev > > Differential Revision: https://reviews.llvm.org/D88806 It caused the following assert during Chromium builds: llvm/lib/IR/Constants.cpp:1868: static llvm::Constant llvm::ConstantExpr::getTrunc(llvm::Constant , llvm::Type *, bool): Assertion `C->getType()->isIntOrIntVectorTy() && "Trunc operand must be integer"' failed. See code review for a link to a reproducer. This reverts commit `1c021c64ca`.	2020-10-12 18:39:35 +02:00
Konstantin Schwarz	7341123439	[GlobalISel][KnownBits] Early return on out of bound shift amounts If the known shift amount is bigger than or equal to the bitwidth of the type of the value to be shifted, the result is target dependent, so don't try to infer any bits. This fixes a crash we've seen in one of our internal test suites. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D89232	2020-10-12 18:39:19 +02:00
Mircea Trofin	43d347995c	[NFC][MC] Use MCRegister in LiveRangeMatrix The change starts from LiveRangeMatrix and also checks the users of the APIs are typed accordingly. Differential Revision: https://reviews.llvm.org/D89145	2020-10-12 08:54:36 -07:00
Florian Hahn	ea058d289c	[VPlan] Use operands for printing of VPWidenMemoryInstructionRecipe. Now that operands of the recipe are managed through VPUser, we can simplify the printing by just using the operands.	2020-10-12 16:51:54 +01:00
Mircea Trofin	596a9f6b89	[NFC][Regalloc] Pass VirtRegMap by reference. It's never null - the reason it's modeled as a pointer is because the pass can't init it in its ctor. Passing by ref simplifies the code, too, as the null checks were unnecessary complexity. Differential Revision: https://reviews.llvm.org/D89171	2020-10-12 08:32:30 -07:00
Florian Hahn	ad5541045a	[LoopDeletion] Remove over-eager SCEV verification. `60b852092c` introduced SCEV verification to deleteDeadLoop, but it appears this check is currently a bit over-eager and some users of deleteDeadLoop appear to only patch up SE after calling it (e.g. PR47753). Remove the extra check for now. We can consider adding it back after we tracked down the source of the inconsistency for PR47753.	2020-10-12 16:18:30 +01:00
Sebastian Neubauer	7f2a641aad	[AMDGPU] Insert waterfall loops for divergent calls Extend loadSRsrcFromVGPR to allow moving a range of instructions into the loop. The call instruction is surrounded by copies into physical registers which should be part of the waterfall loop. Differential Revision: https://reviews.llvm.org/D88291	2020-10-12 17:16:11 +02:00
Cameron McInally	974ddb54c9	[SVE] Lower fixed length VECREDUCE_XOR operation Differential Revision: https://reviews.llvm.org/D88974	2020-10-12 10:12:15 -05:00
Simon Pilgrim	bbf3925879	[InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw (REAPPLIED) If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139). Reapplied after the shift canonicalization in rG02295e6d1a15 which removed the need to flip the shift values. Differential Revision: https://reviews.llvm.org/D88783	2020-10-12 16:06:41 +01:00
Simon Pilgrim	fa56623370	[InstCombine] matchFunnelShift - remove shift value commutation. NFCI. After rG02295e6d1a15 we no longer need to invert the shift values for fshr - this is just hidden at the moment as funnel shifts only ever match for constant values so never use the fshr "Sub on SHL" path.	2020-10-12 15:55:18 +01:00
Simon Pilgrim	02295e6d1a	[InstCombine] matchFunnelShift - canonicalize to OR(SHL,LSHR). NFCI. Simplify the shift amount matching code by canonicalizing the shift ops first.	2020-10-12 15:10:59 +01:00
Max Kazantsev	28237c33d9	[NFC] Remove redundant isFullSet checks Full set case is handled inside intersection, no need to litter the code with duplicating them outside.	2020-10-12 20:41:16 +07:00
Simon Pilgrim	c252200e4d	[DAG][ARM][MIPS][RISCV] Improve funnel shift promotion to use 'double shift' patterns Based on a discussion on D88783, if we're promoting a funnel shift to a width at least twice the size as the original type, then we can use the 'double shift' patterns (shifting the concatenated sources). Differential Revision: https://reviews.llvm.org/D89139	2020-10-12 14:11:02 +01:00
Kazushi (Jam) Marukawa	66be2e00ef	[VE] Support copysign math function VE doesn't have instruction for copysign, so expand it. Add a regression test also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89228	2020-10-12 21:06:19 +09:00
Simon Pilgrim	45d785e22b	Revert rGb97093e520036f8 - "[InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw" This reverts commit `b97093e520`. Funnel shift argument commutation isn't working correctly	2020-10-12 11:38:52 +01:00
Kazushi (Jam) Marukawa	9d6d4b07a2	[VE] Support fneg and frem VE doesn't have fneg or frem instruction, so change them to expand. Add regression tests also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89205	2020-10-12 19:19:29 +09:00
Kazushi (Jam) Marukawa	6c32bc4875	[VE] Change to expand BRCOND VE doesn't have BRCOND instruction, so need to expand it. Also add a regression test. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89173	2020-10-12 19:18:37 +09:00
Roman Lebedev	1c021c64ca	[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown While we indeed can't treat them as no-ops, i believe we can/should do better than just modelling them as `unknown`. `inttoptr` story is complicated, but for `ptrtoint`, it seems straight-forward to model it just as a zext-or-trunc of unknown. This may be important now that we track towards making inttoptr/ptrtoint casts not no-op, and towards preventing folding them into loads/etc (see D88979/D88789/D88788) Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D88806	2020-10-12 11:04:03 +03:00
David Sherwood	c5ba0d33cc	[SVE] Make ElementCount and TypeSize use a new PolySize class I have introduced a new template PolySize class, where the template parameter determines the type of quantity, i.e. for an element count this is just an unsigned value. The ElementCount class is now just a simple derivation of PolySize<unsigned>, whereas TypeSize is more complicated because it still needs to contain the uint64_t cast operator, since there are still many places in the code that rely upon this implicit cast. As such the class also still needs some of it's own operators. I've tried to minimise the amount of code in the base PolySize class, which led to a couple of changes: 1. In some places we were relying on '==' operator comparisons between ElementCounts and the scalar value 1. I didn't put this operator in the new PolySize class, and thought it was actually clearer to use the isScalar() function instead. 2. I removed the isByteSized function and replaced it with calls to isKnownMultipleOf(8). I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so that it's more consistent with coefficientDivideBy. Differential Revision: https://reviews.llvm.org/D88409	2020-10-12 08:23:38 +01:00
Fangrui Song	cddb49bcc0	[SchedDAGInstrs] Delete redundant contains(). NFC	2020-10-11 20:58:30 -07:00
Craig Topper	9e72d3eaf3	[ValueTracking] Use KnownBits::countMaxLeadingZeros/countMaxTrailingZeros to make code more readable. NFC	2020-10-11 14:26:18 -07:00
Roman Lebedev	544a6aa267	[InstCombine] combineLoadToOperationType(): don't fold int<->ptr cast into load And another step towards transforms not introducing inttoptr and/or ptrtoint casts that weren't there already. As we've been establishing (see D88788/D88789), if there is a int<->ptr cast, it basically must stay as-is, we can't do much with it. I've looked, and the most source of new such casts being introduces, as far as i can tell, is this transform, which, ironically, tries to reduce count of casts.. On vanilla llvm test-suite + RawSpeed, @ `-O3`, this results in -33.58% less `IntToPtr`s (19014 -> 12629) and +76.20% more `PtrToInt`s (18589 -> 32753), which is an increase of +20.69% in total. However just on RawSpeed, where i know there are basically none `IntToPtr` in the original source code, this results in -99.27% less `IntToPtr`s (2724 -> 20) and +82.92% more `PtrToInt`s (4513 -> 8255). which is again an increase of 14.34% in total. To me this does seem like the step in the right direction, we end up with strictly less `IntToPtr`, but strictly more `PtrToInt`, which seems like a reasonable trade-off. See https://reviews.llvm.org/D88860 / https://reviews.llvm.org/D88995 for some more discussion on the subject. (Eventually, `CastInst::isNoopCast()`/`CastInst::isEliminableCastPair` should be taught about this, yes) Reviewed By: nlopes, nikic Differential Revision: https://reviews.llvm.org/D88979	2020-10-11 20:24:28 +03:00
David Green	be6e8e50f4	[LV] Tail folded inloop reductions. This expands upon the inloop reductions added in e9761688e41cb9e976, allowing them to be inserted into tail folded loops. Reductions are generates with the form: x = select(mask, vecop, zero) v = vecreduce.add(x) c = add chain, v Where zero here is chosen as the identity value for add reductions. The backend is then expected to fold the select and the vecreduce into a single predicated instruction. Most of the code is fairly straight forward, except for the creation of blockmasks which need to ensure they are created in dominance order. The order they are added is altered to be after any phis, keeping the requirements for the underlying IR. Differential Revision: https://reviews.llvm.org/D84451	2020-10-11 16:58:34 +01:00
Sanjay Patel	3f3356bdd9	[InstCombine] allow vector splats for add+xor --> shifts	2020-10-11 09:04:24 -04:00
Sanjay Patel	f81200ae99	[InstCombine] add one-use check to add+xor transform As shown in the affected test, we could increase instruction count without this limitation. There's another test with extra use that shows we still convert directly to a real "sext" if possible.	2020-10-11 09:04:24 -04:00
Kazushi (Jam) Marukawa	86f69689f9	[VE][NFC] Clean VEISelLowering.cpp Clean the order of setOperationActions and others. Differential Revision: https://reviews.llvm.org/D89203	2020-10-11 21:47:50 +09:00
Simon Pilgrim	913d7a110e	[X86][SSE2] Use smarter instruction patterns for lowering UMIN/UMAX with v8i16. This is my first LLVM patch, so please tell me if there are any process issues. The main observation for this patch is that we can lower UMIN/UMAX with v8i16 by using unsigned saturated subtractions in a clever way. Previously this operation was lowered by turning the signbit of both inputs and the output which turns the unsigned minimum/maximum into a signed one. We could use this trick in reverse for lowering SMIN/SMAX with v16i8 instead. In terms of latency/throughput this is the needs one large move instruction. It's just that the sign bit turning has an increased chance of being optimized further. This is particularly apparent in the "reduce" test cases. However due to the slight regression in the single use case, this patch no longer proposes this. Unfortunately this argument also applies in reverse to the new lowering of UMIN/UMAX with v8i16 which regresses the "horizontal-reduce-umax", "horizontal-reduce-umin", "vector-reduce-umin" and "vector-reduce-umax" test cases a bit with this patch. Maybe some extra casework would be possible to avoid this. However independent of that I believe that the benefits in the common case of just 1 to 3 chained min/max instructions outweighs the downsides in that specific case. Patch By: @TomHender (Tom Hender) ActuallyaDeviloper Differential Revision: https://reviews.llvm.org/D87236	2020-10-11 11:21:23 +01:00

... 2 3 4 5 6 ...

140299 Commits