llvm-project

Commit Graph

Author	SHA1	Message	Date
Vladislav Dzhidzhoev	6cf11f4462	[GlobalISel][DebugInfo] Salvage trivially dead instructions Use salvageDebugInfo for instructions erased as trivially dead in GlobalISel. It would be helpful to implement support of G_PTR_ADD and G_FRAME_INDEX in salvageDebugInfo in future in order to preserve more variable location. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D133986	2022-09-17 03:54:55 +03:00
Sotiris Apostolakis	b827e7c600	[SelectOpti] Restrict load sinking This is a follow-up to D133777, which resolved a use-after-free case but did not cover all possible memory bugs due to misplacement of loads. In short, the overall problem was that sinked loads could be moved after state-modifying instructions leading to memory bugs. The solution is to restrict load sinking unless it is found to be sound. i) Within a basic block (to-be-sinked load and select-user are in the same BB), loads can be sinked only if there is no intervening state-modifying instruction. This is a conservative approach to avoid resorting to alias analysis to detect potential memory overlap. ii) Across basic blocks, sinking of loads is avoided. This is because going over multiple basic blocks looking for memory conflicts could be computationally expensive and also unlikely to allow loads to sink. Further, experiments showed that not sinking these loads has a slight positive performance effect. Maybe for some of these loads, having some separation allows enough time for the load to be executed in time for its user. This is not the case for floating point operations that benefit more from sinking. The solution in D133777 was essentially undone in this patch, since the latter is a complete solution to the observed problem. Overall, the performance impact of this patch is minimal. Tested on two internal Google workloads with instrPGO. Search application showed <0.05% perf difference, while the database one showed a slight improvement, but not statistically significant. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D133999	2022-09-16 20:50:46 +00:00
Jessica Paquette	1076b31da8	[GlobalISel] Combine select + fcmp to fminnum/fmaxnum/fminimum/fmaximum This is a partial port of the code used by the SelectionDAGBuilder to translate selects. In particular, see matchSelectPattern in ValueTracking.cpp. This is a GISel-equivalent of the portion which handles fminnum/fmaxnum/fminimum/fmaximum. I tried to set it up so it'd be easy to add the non-FP cases. Those are simpler. On the AArch64-end, it seems like the FP cases are more important for perf right now, so I bit the bullet and went at the more complicated problem. :) I elected to do this as a post-legalize combine rather than in the IRTranslator because Deciding which fmax/fmin to use can depend on legalization rules Philosophically-speaking (TM), putting it in a combine just feels cleaner Being able to enable/disable the combine is handy Another option would be to use the ValueTracking code in the IRTranslator and match what SelectionDAGBuilder::visitSelect does. I think that may be somewhat annoying since we'd need to write lowerings back into the selects in the legalizer. I'm not strongly opposed to the approach. We'd also want to be careful with vector selects once that's implemented, which explicitly check if a vector select is legal on the target. That'd probably need a hook. From what I can tell, doing this as a combine is probably a cleaner option long-term. Differential Revision: https://reviews.llvm.org/D116702	2022-09-16 13:35:46 -07:00
Pengxuan Zheng	59365f33e2	[MachineCSE] Add a threshold to avoid spending too much time in isProfitableToCSE Currently, it can become extremely costly to compute MayIncreasePressure if the size of CSUses turns out to be very large. In that case, it's no longer cost effective to keep computing MayIncreasePressure. Therefore, to limit the amount of time spent in isProfitableToCSE, we simply conservatively assume MayIncreasePressure if the size of CSUses is too large. This can reduce overall compile time by 30% for some benchmarks. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134003	2022-09-16 13:35:17 -07:00
Craig Topper	1121eca685	[VP][VE] Default VP_SREM/UREM to Expand and add generic expansion using VP_SDIV/UDIV+VP_MUL+VP_SUB. I want to default all VP operations to Expand. These 2 were blocking because VE doesn't support them and the tests were expecting them to fail a specific way. Using Expand caused them to fail differently. Seemed better to emulate them using operations that are supported. @simoll mentioned on Discord that VE has some expansion downstream. Not sure if its done like this or in the VE target. Reviewed By: frasercrmck, efocht Differential Revision: https://reviews.llvm.org/D133514	2022-09-16 13:19:02 -07:00
David Majnemer	8a868d8859	Revert "Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView"" This reverts commit `cd20a18286` and adds a "let Heading" to NoStackProtectorDocs.	2022-09-16 19:39:48 +00:00
Stanislav Mekhanoshin	a0c8f5fefa	[SDAG] Print divergence in SDNode::dump If target does not support divergence the field is set to false and not printed. Differential Revision: https://reviews.llvm.org/D133984	2022-09-16 11:43:34 -07:00
Juan Manuel MARTINEZ CAAMAÑO	e438f2d821	[DAGCombine] Do not fold SRA/SRL of MUL into MULH when MUL's LSB are used, and MUL_LOHI is available Folding into a sra(mul) / srl(mul) into a mulh introduces an extra multiplication to compute the high half of the multiplication, while it is more profitable to compute the high and lower halfs with a single mul_lohi. Differential Revision: https://reviews.llvm.org/D133768	2022-09-16 15:48:36 +00:00
Liqiang Tao	2e37557fde	StackProtector: ensure stack checks are inserted before the tail call The IR stack protector pass should insert stack checks before the tail calls not only the musttail calls. So that the attributes `ssqreq` and `tail call`, which are emited by llvm-opt, could be both enabled by llvm-llc. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D133860	2022-09-16 22:24:46 +08:00
Florian Hahn	6b86b481e3	[AArch64] Use tbl for truncating vector FPtoUI conversions. On AArch64, doing the vector truncate separately after the fptoui conversion can be lowered more efficiently using tbl.4, building on D133495. https://alive2.llvm.org/ce/z/T538CC Depends on D133495 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133496	2022-09-16 14:57:43 +01:00
Florian Hahn	8491d01cc3	[AArch64] Lower vector trunc using tbl. Similar to using tbl to lower vector ZExts, tbl4 can be used to lower vector truncates. The initial version support i32->i8 conversions. Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133495	2022-09-16 12:42:49 +01:00
Nikita Popov	b4309800e9	[CodeGen] Don't zero callee-save registers with zero-call-used-regs (PR57692) Callee save registers must be preserved, so -fzero-call-used-regs should not be zeroing them. The previous implementation only did not zero callee save registers that were saved&restored inside the function, but we need preserve all of them. Fixes https://github.com/llvm/llvm-project/issues/57692. Differential Revision: https://reviews.llvm.org/D133946	2022-09-16 11:52:29 +02:00
Florian Hahn	5871f18827	[AArch64] Lower extending uitofp using tbl. On AArch64, doing the zero-extend separately first can be lowered more efficiently using tbl, building on D120571. https://alive2.llvm.org/ce/z/8Je595 Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133494	2022-09-16 10:20:25 +01:00
Yuta Mukai	116838b151	[MachinePipeliner] Fix the interpretation of the scheduling model The method of counting resource consumption is modified to be based on "Cycles" value when DFA is not used. The calculation of ResMII is modified to total "Cycles" and divide it by the number of units for each resource. Previously, ResMII was excessive because it was assumed that resources were consumed for the cycles of "Latency" value. The method of resource reservation is modified similarly. When a value of "Cycles" is larger than 1, the resource is considered to be consumed by 1 for cycles of its length from the scheduled cycle. To realize this, ResourceManager maintains a resource table for all slots. Previously, resource consumption was always 1 for 1 cycle regardless of the value of "Cycles" or "Latency". In addition, the number of micro operations per cycle is modified to be constrained by "IssueWidth". To disable the constraint, --pipeliner-force-issue-width=100 can be used. For the case of using DFA, the scheduling results are unchanged. Reviewed By: dpenry Differential Revision: https://reviews.llvm.org/D133572	2022-09-16 09:51:48 +09:00
Alexander Timofeev	fbdea5a2e9	[AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D133593	2022-09-15 22:03:56 +02:00
Florian Hahn	81a11da762	[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl. This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops using a wide shuffle creating a v64i8 vector, selecting groups of 3 zero elements and an element from the input. This is profitable on AArch64 where such shuffles can be lowered to tbl instructions, but only in loops, because it requires materializing 4 masks, which can be done in the loop preheader. This is the only reason the transform is part of CGP. If there's a better alternative I missed, please let me know. The same goes for the shouldReplaceZExtWithShuffle hook which guards this. I am not sure if this transform will be beneficial on other targets, but it seems like there is no way other convenient way. This improves the generated code for loops like the one below in combination with D96522. int foo(uint8_t p, int N) { unsigned long long sum = 0; for (int i = 0; i < N ; i++, p++) { unsigned int v = p; sum += (v < 127) ? v : 256 - v; } return sum; } https://clang.godbolt.org/z/Wco866MjY Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D120571	2022-09-15 19:18:13 +01:00
Sergei Barannikov	c6acb4eb0f	[SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s All in-tree targets pass pointer-sized ConstantSDNodes to the method. This overload reduced amount of boilerplate code a bit. This also makes getCALLSEQ_END consistent with getCALLSEQ_START, which already takes uint64_ts.	2022-09-15 14:02:12 -04:00
Matt Arsenault	63d1d37d35	RegAllocGreedy: Avoid overflowing priority bitfields The class priority is expected to be at most 5 bits before it starts clobbering bits used for other fields. Also clamp the instruction distance in case we have millions of instructions. AMDGPU was accidentally overflowing into the global priority bit in some cases. I think in principal we would have wanted this, but in the cases I've looked at, it had the counter intuitive effect and de-prioritized the large register tuple. Avoid using weird bit hack PPC uses for global priority. The AllocationPriority field is really 5 bits, and PPC was relying on overflowing this to 6-bits to forcibly set the global priority bit. Split this out as a separate flag to avoid having magic behavior for values above 31.	2022-09-15 10:38:40 -04:00
Stanislav Mekhanoshin	ef4b9c33f5	Fix crash while printing MMO target flags MachineMemOperand::print can dereference a NULL pointer if TII is not passed from the printMemOperand. This does not happen while dumping the DAG/MIR from llc but crashes the debugger if a dump() method is called from gdb. Differential Revision: https://reviews.llvm.org/D133903	2022-09-14 17:29:48 -07:00
Roland Froese	207228c1d6	[DAGCombiner] More load-store forwarding for big-endian Get some load-store forwarding cases for big-endian where a larger store covers a smaller load, and the offset would be 0 and handled on little-endian but on big-endian the offset is adjusted to be non-zero. The idea is just to shift the data to make it look like the offset 0 case. Differential Revision: https://reviews.llvm.org/D130115	2022-09-14 15:36:35 -04:00
Marco Elver	4627a30acf	[MIR] Support printing and parsing pcsections Adds support for printing and parsing PC sections metadata in MIR. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D133785	2022-09-14 10:30:25 +02:00
YongKang Zhu	5fa6b24354	Address feedback in https://reviews.llvm.org/D133637 https://reviews.llvm.org/D133637 fixes the problem where we should hash raw content of register mask instead of the pointer to it. Fix the same issue in `llvm::hash_value()`. Remove the added API `MachineOperand::getRegMaskSize()` to avoid potential confusion. Add an assert to emphasize that we probably should hash a machine operand iff it has associated machine function, but keep the fallback logic in the original change. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D133747	2022-09-13 16:12:41 -07:00
Matt Arsenault	d00b0aab31	RegisterCoalescer: Fix verifier error when merging copy of undef There's no real read of the register, so the copy introduced a new live value. Make sure we introduce a replacement implicit_def instead of just erasing the copy. Found from llvm-reduce since it tries to set undef on everything.	2022-09-13 18:40:28 -04:00
Sotiris Apostolakis	eda61fb656	[SelectOpti] Fix lifetime intrinsic bug When a select is converted to a branch and load instructions are sinked to the true/false blocks, lifetime intrinsics (if present) could be made unsound if not moved. This conservatively moves all lifetime intrinsics in a transformed BB to the end block to ensure preserved lifetime semantics. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D133777	2022-09-13 19:00:18 +00:00
Craig Topper	efd5acf120	[LegalizeTypes][NVPTX] Remove extra compare from fallback code for ISD::ADD in ExpandIntRes_ADDSUB. This is the ultimate fallback code if UADDO isn't supported. If the target uses 0/1 we used one compare, but if the target doesn't use 0/1 we emitted two compares. Regardless of boolean constants we should only need to check that the Result is less than one of the original operands. So we only need one compare. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D133708	2022-09-13 09:07:56 -07:00
Hendrik Greving	393a17b5d1	[ValueTypes] Define MVTs for v256i2/v128i4. Adds MVT::v256i2, MVT::v128i4. Differential Revision: https://reviews.llvm.org/D133603	2022-09-13 09:02:23 -07:00
Matt Arsenault	740f920a1f	LiveRegUnits: Break register loop when a clobber is encountered	2022-09-13 10:15:08 -04:00
Matt Arsenault	b7dae832e6	DeadMachineInstructionElim: Don't repeat per-function init This was happening for every iteration but only needs to be done once.	2022-09-13 08:19:54 -04:00
Matt Arsenault	243632c63e	LiveRegUnits: Cleanup isReg checks This is the common case and should be checked first. Provides a very marginal compile time improvement on the example I'm looking at.	2022-09-13 08:19:53 -04:00
Pavel Samolysov	02aaf8e3d6	[NFC][ScheduleDAGInstrs] Use structure bindings and emplace_back Some uses of std::make_pair and the std::pair's first/second members in the ScheduleDAGInstrs.[cpp\|h] files were replaced with using of the vector's emplace_back along with structure bindings from C++17.	2022-09-13 12:49:04 +03:00
Sylvestre Ledru	cd20a18286	Revert "[clang, llvm] Add __declspec(safebuffers), support it in CodeView" Causing: https://github.com/llvm/llvm-project/issues/57709 This reverts commit `ab56719acd`.	2022-09-13 10:53:59 +02:00
David Green	f124e59b2e	[TypePromotionPass] Don't treat phi's as ToPromote This attempts to stop the type promotion pass transforming where it is not profitable, by not marking PhiNodes as ToPromote and being more aggressive about pulling extends out of loops. Differential Revision: https://reviews.llvm.org/D133203	2022-09-13 08:57:15 +01:00
Matt Arsenault	920b2e65fc	DeadMachineInstructionElim: Fix typo	2022-09-12 20:10:33 -04:00
Aiden Grossman	eec183c171	[nfc] Refactor SlotIndex::getInstrDistance to better reflect actual functionality This patch refactors SlotIndex::getInstrDistance to SlotIndex::getApproxInstrDistance to better describe the actual functionality of this function. This patch also adds in some additional comments better documenting the assumptions that this function makes to increase clarity. Based on discussion on the LLVM Discourse: https://discourse.llvm.org/t/odd-behavior-in-slotindex-getinstrdistance/64934/5 Reviewed By: mtrofin, foad Differential Revision: https://reviews.llvm.org/D133386	2022-09-12 23:33:35 +00:00
Amara Emerson	f24f469223	[GlobalISel] Fix crash when lowering G_SELECT of pointer vectors. The bit masking lowering only works for vectors of scalars, so for pointer element types we need to add some casting. Differential Revision: https://reviews.llvm.org/D133672	2022-09-13 00:01:37 +01:00
Matt Arsenault	d90f7cb559	LiveRegUnits: Do not use phys_regs_and_masks Somehow DeadMachineInstructionElim is about 3x slower when using it. Hopefully this reverses the compile time regression reported for `b5041527c7`.	2022-09-12 17:21:24 -04:00
David Majnemer	ab56719acd	[clang, llvm] Add __declspec(safebuffers), support it in CodeView __declspec(safebuffers) is equivalent to __attribute__((no_stack_protector)). This information is recorded in CodeView. While we are here, add support for strict_gs_check.	2022-09-12 21:15:34 +00:00
Kazu Hirata	9606608474	[llvm] Use x.empty() instead of llvm::empty(x) (NFC) I'm planning to deprecate and eventually remove llvm::empty. I thought about replacing llvm::empty(x) with std::empty(x), but it turns out that all uses can be converted to x.empty(). That is, no use requires the ability of std::empty to accept C arrays and std::initializer_list. Differential Revision: https://reviews.llvm.org/D133677	2022-09-12 13:34:35 -07:00
YongKang Zhu	481a32f587	Bug fix on stable hash calculation for machine operands RegisterMask and RegisterLiveOut MachineOperand::getRegMask() returns a pointer to register mask. We should hash the raw content of register mask instead of its pointer. Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D133637	2022-09-12 13:25:04 -07:00
Craig Topper	38ffa2bb96	[LegalizeTypes] Improve splitting for urem/udiv by constant for some constants. For remainder: If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer type, this urem will be expand by DAGCombiner using multiply by magic constant. We do have to take into account that adding high and low together can produce a carry, making it a (BitWidth / 2)+1 bit number. So we need to also add back in the carry from the first addition. For division: We can use the above trick to compute the remainder, subtract that remainder from the dividend, then multiply by the multiplicative inverse of the Divisor modulo (1 << BitWidth). This is based on the section "Remainder by Summing Digits" in Hacker's delight. The remainder trick is similar to a trick you may have learned for determining if a decimal number is divisible by 3. You can add all the digits together and see if the sum is divisible by 3. If you're not sure if the sum is divisible by 3, you can add its digits together. This can be repeated until you have a single decimal digit. If that digit is 3, 6, or 9, then the original number is divisible by 3. This works because 10 % 3 == 1. gcc already does this same trick. There are additional tricks gcc does urem as well as srem, udiv, and sdiv that I plan to add in future patches. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130862	2022-09-12 10:34:52 -07:00
Matthias Gehre	c1502425ba	Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth Also remove new-pass-manager version of ExpandLargeDivRem because there is no way yet to access TargetLowering in the new pass manager. Differential Revision: https://reviews.llvm.org/D133691	2022-09-12 17:06:16 +01:00
Jay Foad	210e6a993d	[GlobalISel] Simplify extended add/sub to add/sub with carry Simplify extended add/sub (with carry-in and carry-out) to add/sub with carry (with carry-out only) if carry-in is known to be zero. Differential Revision: https://reviews.llvm.org/D133702	2022-09-12 17:05:44 +01:00
Matt Arsenault	e30271169f	RegAllocGreedy: Try local instruction splitting with subranges This was only trying this to relax register class constraints, but this can also help if there are subranges involved. This solves a compilation failure for AMDGPU when there is high pressure created by large register tuples. If one virtual register is using most of the available budget, we need to be able to evict subranges. This solves the immediate failure, but this solution leaves a lot to be desired. In the relevant testcases, we have 32-element tuples but most of the uses are operations on 1 element subranges of it. What we're now getting is a spill and restore of the full 1024 bits and an extract of the used 32-bits. It would be far better if we introduced a copy to a new virtual register with a smaller register class and used narrower spills. Furthermore, we could probably do a better job if the allocator were to introduce new subranges where none previously existed in the highest pressure scenarios. The block and region splits should also try to split specific subranges out. The mve-vst3.ll test changes looks like noise to me, but instruction count increased by one. mve-vst4.ll looks like a solid improvement with several 16-byte spills eliminated. splitkit-copy-live-lanes.mir also shows a solid reduction in total spill count. This could use more tests but it's pretty tiring to come up with cases that fail on this.	2022-09-12 09:03:55 -04:00
Pavel Samolysov	f045d0c392	[NFC][ScheduleDAG] Use a reference to iterate over NodeSuccs/ChainSuccs	2022-09-12 15:54:48 +03:00
Pavel Samolysov	05946c144d	[NFC][ScheduleDAG] Use structure bindings and emplace_back Some uses of std::make_pair and the std::pair's first/second members in the ScheduleDAGRRList.cpp file were replaced with using of the vector's emplace_back along with structure bindings from C++17.	2022-09-12 15:54:48 +03:00
Matt Arsenault	c34679b2e8	DAG: Sink some getter code closer to uses	2022-09-12 08:38:35 -04:00
Matt Arsenault	bb70b5d406	CodeGen: Set MODereferenceable from isDereferenceableAndAlignedPointer Previously this was assuming piontsToConstantMemory implies dereferenceable.	2022-09-12 08:38:35 -04:00
Pavel Samolysov	354a3d9c02	[NFC][ScheduleDAG] Use Register and MCPhysReg instead of unsigned	2022-09-12 15:18:11 +03:00
Matt Arsenault	b5041527c7	DeadMachineInstructionElim: Switch to using LiveRegUnits Theoretically improves compile time for targets with many overlapping registers	2022-09-12 07:55:14 -04:00
Matt Arsenault	6c44a7179f	RegAlloc: Use SmallSet instead of std::set There shouldn't be more than a small handful of hints at most.	2022-09-12 07:55:10 -04:00

1 2 3 4 5 ...

32955 Commits