llvm-project

Commit Graph

Author	SHA1	Message	Date
Sam Parker	e3056ae9a0	[NFC][TTI] Explicit use of VectorType The API for shuffles and reductions uses generic Type parameters, instead of VectorType, and so assertions and casts are used a lot. This patch makes those types explicit, which means that the clients can't be lazy, but results in less ambiguity, and that can only be a good thing. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562 Differential Revision: https://reviews.llvm.org/D78357	2020-04-20 09:16:52 +01:00
Jonas Paulsson	36d4421f50	[LoopDataPrefetch + SystemZ] Let target decide on prefetching for each loop. This patch adds - New arguments to getMinPrefetchStride() to let the target decide on a per-loop basis if software prefetching should be done even with a stride within the limit of the hw prefetcher. - New TTI hook enableWritePrefetching() to let a target do write prefetching by default (defaults to false). - In LoopDataPrefetch: - A search through the whole loop to gather information before emitting any prefetches. This way the target can get information via new arguments to getMinPrefetchStride() and emit prefetches more selectively. Collected information includes: Does the loop have a call, how many memory accesses, how many of them are strided, how many prefetches will cover them. This is NFC to before as long as the target does not change its definition of getMinPrefetchStride(). - If a previous access to the same exact address was 'read', and the current one is 'write', make it a 'write' prefetch. - If two accesses that are covered by the same prefetch do not dominate each other, put the prefetch in a block that dominates both of them. - If a ConstantMaxTripCount is less than ItersAhead, then skip the loop. - A SystemZ implementation of getMinPrefetchStride(). Review: Ulrich Weigand, Michael Kruse Differential Revision: https://reviews.llvm.org/D70228	2020-04-02 14:57:46 +02:00
Anna Welker	a6d3bec83f	[TTI][ARM][MVE] Refine gather/scatter cost model Refines the gather/scatter cost model, but also changes the TTI function getIntrinsicInstrCost to accept an additional parameter which is needed for the gather/scatter cost evaluation. This did require trivial changes in some non-ARM backends to adopt the new parameter. Extending gathers and truncating scatters are now priced cheaper. Differential Revision: https://reviews.llvm.org/D75525	2020-03-11 10:23:41 +00:00
Reid Kleckner	85ba5f637a	Rename TTI::getIntImmCost for instructions and intrinsics Soon Intrinsic::ID will be a plain integer, so this overload will not be possible. Rename both overloads to ensure that downstream targets observe this as a build failure instead of a runtime failure. Split off from D71320 Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D71381	2019-12-11 18:00:20 -08:00
David Green	be7a107070	[ARM] Teach the Arm cost model that a Shift can be folded into other instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instruction that the shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966	2019-12-09 10:24:33 +00:00
Guillaume Chatelet	a4783ef58d	[Alignment][NFC] getMemoryOpCost uses MaybeAlign Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69307	2019-10-25 21:26:59 +02:00
Zi Xuan Wu	9802268ad3	recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374634	2019-10-12 02:53:04 +00:00
David Greene	2e6f6b4dad	[System Model] [TTI] Update cache and prefetch TTI interfaces Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon. This involved moving defaults from TargetTransformInfoImplBase to MCSubtargetInfo. Rework the TTI cache and software prefetching APIs to prepare for the introduction of a general system model. Changes include: - Marking existing interfaces const and/or override as appropriate - Adding comments - Adding BasicTTIImpl interfaces that delegate to a subtarget implementation - Moving the default TargetTransformInfoImplBase implementation to a default MCSubtarget implementation Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget implementation, so its custom TTI implementation is migrated to use the new facilities in BasicTTIImpl to invoke its custom subtarget implementation. The custom TTI implementations continue to exist for the other targets with this change. They are not moved over to subtarget-based implementations. The end goal is to have the default subtarget implementation defer to the system model defined by the target. With this change, the default MCSubtargetInfo implementation essentially returns the defaults TargetTransformInfoImplBase used to return. Existing users of TTI defaults will hit the defaults now in MCSubtargetInfo. Targets that define their own custom TTI implementations won't use the BasicTTIImpl implementations that route to the subtarget. Once system models are in place for the targets that use these interfaces, their custom TTI implementations can be removed. Differential Revision: https://reviews.llvm.org/D63614 llvm-svn: 374205	2019-10-09 19:51:48 +00:00
Jinsong Ji	9912232b46	Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize" Also Revert "[LoopVectorize] Fix non-debug builds after rL374017" This reverts commit `9f41deccc0`. This reverts commit `18b6fe07bc`. The patch is breaking PowerPC internal build, checked with author, reverting on behalf of him for now due to timezone. llvm-svn: 374091	2019-10-08 17:32:56 +00:00
Zi Xuan Wu	9f41deccc0	[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374017	2019-10-08 03:28:33 +00:00
David Greene	d300a493df	Revert "[System Model] [TTI] Update cache and prefetch TTI interfaces" This broke some PPC prefetching tests. This reverts commit `9fdfb045ae`. llvm-svn: 365680	2019-07-10 18:25:58 +00:00
David Greene	9fdfb045ae	[System Model] [TTI] Update cache and prefetch TTI interfaces Rework the TTI cache and software prefetching APIs to prepare for the introduction of a general system model. Changes include: - Marking existing interfaces const and/or override as appropriate - Adding comments - Adding BasicTTIImpl interfaces that delegate to a subtarget implementation - Adding a default "no information" subtarget implementation Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget implementation, so its custom TTI implementation is migrated to use the new facilities in BasicTTIImpl to invoke its custom subtarget implementation. The custom TTI implementations continue to exist for the other targets with this change. They are not moved over to subtarget-based implementations. The end goal is to have the default subtarget implementation defer to the system model defined by the target. With this change, the default subtarget implementation essentially returns "no information" for these interfaces. None of the existing users of TTI will hit that implementation because they define their own custom TTI implementations and won't use the BasicTTIImpl implementations. Once system models are in place for the targets that use these interfaces, their custom TTI implementations can be removed. Differential Revision: https://reviews.llvm.org/D63614 llvm-svn: 365676	2019-07-10 18:07:01 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Jonas Paulsson	96782c2c0b	[SystemZTTIImpl] Give correct cost values for vector bswap intrinsics. Implement getIntrinsicInstrCost() and return costs reflecting that bswap can be done with a vperm per vector register. Review: Ulrich Weigand https://reviews.llvm.org/D54789 llvm-svn: 347445	2018-11-22 07:17:29 +00:00
Jonas Paulsson	f15a53bc81	[SystemZ::TTI] Accurate costs for i1->double vector conversions This factors out a new method getBoolVecToIntConversionCost() containing the code for vector sext/zext of i1, in order to reuse it for i1 to double vector conversions. Review: Ulrich Weigand https://reviews.llvm.org/D53923 llvm-svn: 345817	2018-11-01 09:01:51 +00:00
Dorit Nuzman	34da6dd696	[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705	2018-10-31 09:57:56 +00:00
Jonas Paulsson	b7caa809e1	[SystemZ] Improve getMemoryOpCost() to find foldable loads that are converted. The SystemZ backend can do arithmetic of memory by loading and then extending one of the operands. Similarly, a load + truncate can be folded into an operand. This patch improves the SystemZ TTI cost function to recognize this. Review: Ulrich Weigand https://reviews.llvm.org/D52692 llvm-svn: 345327	2018-10-25 22:28:25 +00:00
Dorit Nuzman	38bbf81ade	recommit 344472 after fixing build failure on ARM and PPC. llvm-svn: 344475	2018-10-14 08:50:06 +00:00
Dorit Nuzman	5118c68cde	revert 344472 due to failures. llvm-svn: 344473	2018-10-14 07:21:20 +00:00
Dorit Nuzman	8174368955	[IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave- groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/ stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 llvm-svn: 344472	2018-10-14 07:06:16 +00:00
Jonas Paulsson	5ffb27b166	[SystemZ] Increase the amount of inlining. Implement getInliningThresholdMultiplier() and have it return 3. Review: Ulrich Weigand llvm-svn: 339563	2018-08-13 13:31:30 +00:00
Jonas Paulsson	e54cc1a436	[SystemZ] implement hasDivRemOp() SystemZ can do division and remainder in a single instruction for scalar integer types, which are now reflected by returning true in this hook for those cases. Review: Ulrich Weigand llvm-svn: 317477	2017-11-06 13:10:31 +00:00
Jonas Paulsson	024e319489	[SystemZ, LoopStrengthReduce] This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory. In order to achieve this, the following common code changes were made: * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers. * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores. SystemZ changes: * isLSRCostLess() implemented with Insns first, and without ImmCost. * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D35262 https://reviews.llvm.org/D35049 llvm-svn: 308729	2017-07-21 11:59:37 +00:00
Jonas Paulsson	89ca10de33	[SystemZ] Enable LoopDataPrefetch pass. Loop data prefetching has shown some improvements on benchmarks, and is enabled at -O1 and above. Review: Ulrich Weigand llvm-svn: 308024	2017-07-14 13:52:38 +00:00
Geoff Berry	66d9bdbca8	[LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI. Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D34531 llvm-svn: 306554	2017-06-28 15:53:17 +00:00
Daniel Neilson	c0112ae8da	Const correctness for TTI::getRegisterBitWidth Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation. Reviewers: chandlerc, rnk, reames Reviewed By: reames Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D33903 llvm-svn: 305189	2017-06-12 14:22:21 +00:00
Jonas Paulsson	8624b7e1ce	[LoopVectorizer] Let target prefer scalar addressing computations. The loop vectorizer usually vectorizes any instruction it can and then extracts the elements for a scalarized use. On SystemZ, all elements containing addresses must be extracted into address registers (GRs). Since this extraction is not free, it is better to have the address in a suitable register to begin with. By forcing address arithmetic instructions and loads of addresses to be scalar after vectorization, two benefits result: * No need to extract the register * LSR optimizations trigger (LSR isn't handling vector addresses currently) Benchmarking show improvements on SystemZ with this new behaviour. Any other target could try this by returning false in the new hook prefersVectorizedAddressing(). Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand https://reviews.llvm.org/D32422 llvm-svn: 303744	2017-05-24 13:42:56 +00:00
Jonas Paulsson	da74ed42da	[LoopVectorizer, TTI] New method supportsEfficientVectorElementLoadStore() Since SystemZ supports vector element load/store instructions, there is no need for extracts/inserts if a vector load/store gets scalarized. This patch lets Target specify that it supports such instructions by means of a new TTI hook that defaults to false. The use for this is in the LoopVectorizer getScalarizationOverhead() method, which will with this patch produce a smaller sum for a vector load/store on SystemZ. New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll Review: Adam Nemet https://reviews.llvm.org/D30680 llvm-svn: 300056	2017-04-12 12:41:37 +00:00
Jonas Paulsson	fccc7d66c3	[SystemZ] TargetTransformInfo cost functions implemented. getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented. Interleaved access vectorization enabled. BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0. Review: Ulrich Weigand, Renato Golin. https://reviews.llvm.org/D29631 llvm-svn: 300052	2017-04-12 11:49:08 +00:00
Benjamin Kramer	2a8bef8769	Do a sweep over move ctors and remove those that are identical to the default. All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go. No functionality change intended. llvm-svn: 284721	2016-10-20 12:20:28 +00:00
Jonas Paulsson	58c5a7f55a	[SystemZ] Implementation of getUnrollingPreferences(). This commit enables more unrolling for SystemZ by implementing the SystemZTargetTransformInfo::getUnrollingPreferences() method. It has been found that it is better to only unroll moderately, so the DefaultUnrollRuntimeCount has been moved into UnrollingPreferences in order to set this to a lower value for SystemZ (4). Reviewers: Evgeny Stupachenko, Ulrich Weigand. https://reviews.llvm.org/D24451 llvm-svn: 282570	2016-09-28 09:41:38 +00:00
Eric Christopher	a4e5d3cf8e	constify the Function parameter to the TTI creation callback and propagate to all callers/users/etc. llvm-svn: 247864	2015-09-16 23:38:13 +00:00
Chandler Carruth	93205eb966	[TTI] Make the cost APIs in TargetTransformInfo consistently use 'int' rather than 'unsigned' for their costs. For something like costs in particular there is a natural "negative" value, that of savings or saved cost. As a consequence, there is a lot of code that subtracts or creates negative values based on cost, all of which is prone to awkwardness or bugs when dealing with an unsigned type. Similarly, we never want these values to wrap, as that would cause Very Bad code generation (likely percieved as an infinite loop as we try to emit over 2^32 instructions or some such insanity). All around 'int' seems a much better fit for these basic metrics. I've added asserts to ensure that at least the TTI interface never returns negative numbers here. If we ever have a use case for negative numbers, we can remove this, but this way a bug where someone used '-1' to produce a 'very large' cost will be caught by the assert. This passes all tests, and is also UBSan clean. No functional change intended. Differential Revision: http://reviews.llvm.org/D11741 llvm-svn: 244080	2015-08-05 18:08:10 +00:00
Mehdi Amini	5010ebf181	Make TargetTransformInfo keeping a reference to the Module DataLayout DataLayout is no longer optional. It was initialized with or without a DataLayout, and the DataLayout when supplied could have been the one from the TargetMachine. Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11021 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241774	2015-07-09 02:08:42 +00:00
Ulrich Weigand	ce4c109585	[SystemZ] Add CodeGen support for integer vector types This the first of a series of patches to add CodeGen support exploiting the instructions of the z13 vector facility. This patch adds support for the native integer vector types (v16i8, v8i16, v4i32, v2i64). When the vector facility is present, we default to the new vector ABI. This is characterized by two major differences: - Vector types are passed/returned in vector registers (except for unnamed arguments of a variable-argument list function). - Vector types are at most 8-byte aligned. The reason for the choice of 8-byte vector alignment is that the hardware is able to efficiently load vectors at 8-byte alignment, and the ABI only guarantees 8-byte alignment of the stack pointer, so requiring any higher alignment for vectors would require dynamic stack re-alignment code. However, for compatibility with old code that may use vector types, when not using the vector facility, the old alignment rules (vector types are naturally aligned) remain in use. These alignment rules are not only implemented at the C language level (implemented in clang), but also at the LLVM IR level. This is done by selecting a different DataLayout string depending on whether the vector ABI is in effect or not. Based on a patch by Richard Sandiford. llvm-svn: 236521	2015-05-05 19:25:42 +00:00
Ulrich Weigand	b401218ca2	[SystemZ] Use POPCNT instruction on z196 We already exploit a number of instructions specific to z196, but not yet POPCNT. Add support for the population-count facility, MC support for the POPCNT instruction, CodeGen support for using POPCNT, and implement the getPopcntSupport TargetTransformInfo hook. llvm-svn: 233689	2015-03-31 12:56:33 +00:00
Ulrich Weigand	1f6666a49c	[SystemZ] Provide basic TargetTransformInfo implementation This hooks up the TargetTransformInfo machinery for SystemZ, and provides an implementation of getIntImmCost. In addition, the patch adds the isLegalICmpImmediate and isLegalAddImmediate TargetLowering overrides, and updates a couple of test cases where we now generate slightly better code. llvm-svn: 233688	2015-03-31 12:52:27 +00:00

37 Commits