llvm-project

Commit Graph

Author	SHA1	Message	Date
Tyker	b7338fb1a6	[AssumeBundles] add cannonicalisation to the assume builder Summary: this reduces significantly the number of assumes generated without aftecting too much the information that is preserved. this improves the compile-time cost of enable-knowledge-retention significantly. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, asbirlea, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79650	2020-06-19 10:32:26 +02:00
Vitaly Buka	fcd67665a8	[StackSafety] Add "Must Live" logic Summary: Extend StackLifetime with option to calculate liveliness where alloca is only considered alive on basic block entry if all non-dead predecessors had it alive at terminators. Depends on D82043. Reviewers: eugenis Reviewed By: eugenis Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82124	2020-06-18 16:53:37 -07:00
Vitaly Buka	f672791e08	[StackSafety] Add pass for StackLifetime testing Summary: lifetime.ll is a copy of SafeStack/X86/coloring2.ll Reviewers: eugenis Reviewed By: eugenis Subscribers: hiraditya, mgrang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82043	2020-06-18 16:34:18 -07:00
Paul Walker	4612f39120	[SVE] Add flag to specify SVE register size, using this to calculate legal vector types. Adds aarch64-sve-vector-bits-{min,max} to allow the size of SVE data registers (in bits) to be specified. This allows the code generator to make assumptions it normally couldn't. As a starting point this information is used to mark fixed length vector types that can fit within the specified size as legal. Reviewers: rengolin, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80384	2020-06-18 12:11:16 +00:00
Sameer Sahasrabuddhe	7aad220795	[DA] conservatively mark the join of every divergent branch For a loop, a join block is a block that is reachable along multiple disjoint paths from the exiting block of a loop. If the exit condition of the loop is divergent, then such join blocks must also be marked divergent. This currently fails in some cases because not all join blocks are identified correctly. The workaround is to conservatively mark every join block of any branch (not necessarily the exiting block of a loop) as divergent. https://bugs.llvm.org/show_bug.cgi?id=46372 Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D81806	2020-06-18 17:39:20 +05:30
Christopher Tetreault	8819202dfd	[SVE] Eliminate bad VectorType::getNumElements() calls from ConstantFold Summary: Assume all usages of this function are explicitly fixed-width operations and cast to FixedVectorType Reviewers: efriedma, sdesmalen, c-rhodes, majnemer, dblaikie Reviewed By: sdesmalen Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80262	2020-06-17 14:19:56 -07:00
Sameer Sahasrabuddhe	d3963b3a5f	[DA] propagate loop live-out values that get used in a branch Values that are uniform within a loop but appear divergent to uses outside the loop are "tainted" so that such uses are marked divergent. But if such a use is a branch, then it's divergence needs to be propagated. The simplest way to do that is to put the branch back in the main worklist so that it is processed appropriately. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D81822	2020-06-17 09:21:00 +05:30
Tyker	d7deef1206	Revert "[AssumeBundles] add cannonicalisation to the assume builder" This reverts commit `90c50cad19`.	2020-06-16 14:34:55 +02:00
Tyker	90c50cad19	[AssumeBundles] add cannonicalisation to the assume builder Summary: this reduces significantly the number of assumes generated without aftecting too much the information that is preserved. this improves the compile-time cost of enable-knowledge-retention significantly. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, asbirlea, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79650	2020-06-16 13:12:35 +02:00
Sam Parker	2596da3174	[CostModel] getCFInstrCost in getUserCost. Have BasicTTI call the base implementation so that both agree on the default behaviour, which the default being a cost of '1'. This has required an X86 specific implementation as it seems to be very reliant on those instructions being free. Changes are also made to AMDGPU so that their implementations distinguish between cost kinds, so that the unrolling isn't affected. PowerPC also has its own implementation to prevent changes to the reg-usage vectorizer test. The cost model test changes now reflect that ret instructions are not generally free. Differential Revision: https://reviews.llvm.org/D79164	2020-06-15 09:28:46 +01:00
David Green	7507186b94	[ARM] Additional cast cost tests. This adds additional cast cpst tests useful for MVE, notably around half types.	2020-06-14 14:30:07 +01:00
Vitaly Buka	c1e47b47f8	[StackSafety] Run ThinLTO Summary: ThinLTO linking runs dataflow processing on collected function parameters. Then StackSafetyGlobalInfoWrapperPass in ThinLTO backend will run as usual looking up to external symbol in the summary if needed. Depends on D80985. Reviewers: eugenis, pcc Reviewed By: eugenis Subscribers: inglorion, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D81242	2020-06-12 18:11:29 -07:00
Vitaly Buka	e6ce0dc5de	[StackSafety,NFC] Extract addOverflowNever	2020-06-12 17:42:32 -07:00
Vitaly Buka	999307323a	[StackSafety] Fix byval handling We don't need process paramenters which marked as byval as we are not going to pass interested allocas without copying. If we pass value into byval argument, we just handle that as Load of corresponding type and stop that branch of analysis.	2020-06-11 20:58:36 -07:00
Alina Sbirlea	519b019a0a	Verify MemorySSA after all updates. Verify after completing all updates. Resolves PR46275.	2020-06-11 18:48:41 -07:00
Simon Pilgrim	28947bc23c	[CostModel][X86] Add broadcast costs for vXi1 bool vectors Doesn't mean much on non-AVX512 targets but better to keep with the other shuffles	2020-06-10 15:27:15 +01:00
Roman Lebedev	c868335e24	[SCEV] ScalarEvolution::createSCEV(): clarify no-wrap flag propagation for shift by bitwidth-1 Summary: There was this comment here previously: ``` - // It is currently not resolved how to interpret NSW for left - // shift by BitWidth - 1, so we avoid applying flags in that - // case. Remove this check (or this comment) once the situation - // is resolved. See - // http://lists.llvm.org/pipermail/llvm-dev/2015-April/084195.html - // and http://reviews.llvm.org/D8890 . ``` But langref was fixed in rL286785, and the behavior is pretty obvious: http://volta.cs.utah.edu:8080/z/MM4WZP ^ nuw can always be propagated. nsw can be propagated if either nuw is specified, or the shift is by less than bitwidth-1. This mimics similar D81189 Reassociate change, alive2 is happy about that one. I'm not sure `NUW` isn't being printed, but that seems unrelated. Reviewers: mkazantsev, reames, sanjoy, nlopes, craig.topper, efriedma Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81243	2020-06-06 13:02:07 +03:00
Philip Reames	32c09d527c	[Tests] Migrate a number of tests to gc-live bundle representation	2020-06-05 16:44:04 -07:00
Roman Lebedev	39e3683534	[NFC][SCEV] Add test with 'or' with no common bits set	2020-06-05 12:18:15 +03:00
Roman Lebedev	39e3c92410	[NFC][SCEV] Some tests for shifts by bitwidth-2/bitwidth-1 w/ no-wrap flags	2020-06-05 11:45:09 +03:00
Vitaly Buka	6dd738e2f0	[StackSafety,NFC] Switch tests to aarch64	2020-06-05 00:24:02 -07:00
Yevgeny Rouban	dcfa78a4cc	Extend InvokeInst !prof branch_weights metadata to unwind branches Allow InvokeInst to have the second optional prof branch weight for its unwind branch. InvokeInst is a terminator with two successors. It might have its unwind branch taken many times. If so the BranchProbabilityInfo unwind branch heuristic can be inaccurate. This patch allows a higher accuracy calculated with both branch weights set. Changes: - A new section about InvokeInst is added to the BranchWeightMetadata page. It states the old information that missed in the doc and adds new about the second branch weight. - Verifier is changed to allow either 1 or 2 branch weights for InvokeInst. - A new test is written for BranchProbabilityInfo to demonstrate the main improvement of the simple fix in calcMetadataWeights(). - Several new testcases are created for Inliner. Those check that both weights are accounted for invoke instruction weight calculation. - PGOUseFunc::setBranchWeights() is fixed to be applicable to InvokeInst. Reviewers: davidxl, reames, xur, yamauchi Tags: #llvm Differential Revision: https://reviews.llvm.org/D80618	2020-06-04 15:37:15 +07:00
Jay Foad	c27214c234	[AMDGPU] Fold llvm.amdgcn.cos and llvm.amdgcn.sin intrinsics (fix tests) Try to fix Windows buildbots.	2020-06-03 11:40:52 +01:00
Jay Foad	c823cfde21	[AMDGPU] Fold llvm.amdgcn.cos and llvm.amdgcn.sin intrinsics Differential Revision: https://reviews.llvm.org/D80702	2020-06-03 09:34:22 +01:00
Vitaly Buka	d3b7f90d00	[StackSafety] Skip non-pointer parameters Summary: Depends on D80908. Reviewers: eugenis, pcc Reviewed By: eugenis Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80956	2020-06-03 01:16:39 -07:00
Vitaly Buka	232d348c6e	[MTE] Convert StackSafety into analysis This lets us to remove !stack-safe metadata and better controll when to perform StackSafety analysis. Reviewers: eugenis Subscribers: hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D80771	2020-06-02 16:08:14 -07:00
Vitaly Buka	fc07c1af69	[StackSafety] Delete useless test	2020-06-02 16:08:14 -07:00
Sam Parker	e70cf280f8	[NFC][ARM][AArch64] Test runs Add code size tests runs for memory ops for both architectures.	2020-06-02 09:05:30 +01:00
Yevgeny Rouban	07239c736a	[BrachProbablityInfo] Proportional distribution of reachable probabilities When fixing probability of unreachable edges in BranchProbabilityInfo::calcMetadataWeights() proportionally distribute remainder probability over the reachable edges. The old implementation distributes the remainder probability evenly. See examples in the fixed tests. Reviewers: yamauchi, ebrevnov Tags: #llvm Differential Revision: https://reviews.llvm.org/D80611	2020-06-02 12:06:52 +07:00
Florian Hahn	d99a1848c4	[BasicAA] Use known lower bounds for index values for size based check. Currently, BasicAA does not exploit information about value ranges of indexes. For example, consider the 2 pointers %a = %base and %b = %base + %stride below, assuming they are used to access 4 elements. If we know that %stride >= 4, we know the accesses do not alias. If %stride is a constant, BasicAA currently gets that. But if the >= 4 constraint is encoded using an assume, it misses the NoAlias. This patch extends DecomposedGEP to include an additional MinOtherOffset field, which tracks the constant offset similar to the existing OtherOffset, which the difference that it also includes non-negative lower bounds on the range of the index value. When checking if the distance between 2 accesses exceeds the access size, we can use this improved bound. For now this is limited to using non-negative lower bounds for indices, as this conveniently skips cases where we do not have a useful lower bound (because it is not constrained). We potential miss out in cases where the lower bound is constrained but negative, but that can be exploited in the future. Reviewers: sanjoy, hfinkel, reames, asbirlea Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D76194	2020-05-30 16:20:42 +01:00
David Green	a01c0049b1	[ConstantFolding] Constant folding for integer vector reduce intrinsics This add constant folding for all the integer vector reduce intrinsics, providing that the argument is a constant vector. zeroinitializer always produces 0 for all intrinsics, and other values can be handled with APInt operators. Differential Revision: https://reviews.llvm.org/D80516	2020-05-29 17:58:42 +01:00
Philip Reames	27304b1737	[Tests] Switch a few statepoint tests to using operand bundles We've started (D80598) the process of migrating away from the inline operand lists in statepoints to using explicit operand bundles. Update a few tests to reflect the new preference. More to come, these were simply the ones outside any obvious grouping.	2020-05-28 14:36:05 -07:00
Vitaly Buka	892c71a5bb	[StackSafety] Don't run datafow on allocas We need to process only parameters. Allocas access can be calculated afterwards. Also don't create fake function for aliases and just resolve them on initialization.	2020-05-28 13:32:57 -07:00
Vitaly Buka	f6383643d9	[StackSafety] Bailout on some function calls Don't miss values used in calls outside regular argument list.	2020-05-27 02:48:42 -07:00
Vitaly Buka	06a07dd608	[StackSafety] Fix formatting in the test	2020-05-27 02:48:41 -07:00
Vitaly Buka	b101c6251a	[StackSafety] Ignore some use of values We should ignore value used in MemTransferInst as other then src/dst argument.	2020-05-27 02:48:41 -07:00
Vitaly Buka	32a1f60d11	[StackSafety] Use SCEV to find mem operation length	2020-05-26 23:22:37 -07:00
Vitaly Buka	d0f1f5adfa	[StackSafety] Use getSignedRange for offsets	2020-05-26 23:22:36 -07:00
Vitaly Buka	b5ae70046b	[StackSafety] Simplify SCEVRewriteVisitor Probably NFC.	2020-05-26 18:09:43 -07:00
Sam Parker	792575ff32	[NFC][ARM][AArch64] More code size tests Add analysis runs for icmp, fcmp and select instructions.	2020-05-26 14:47:02 +01:00
Sam Parker	c5bbc8dd6d	[NFC][ARM] Fix for previous commit Actually analyse code-size for the size runs...	2020-05-26 10:45:35 +01:00
Sam Parker	48cdbd081c	[NFC][ARM] Add code size analysis tests Add code size runs for the cast costs.	2020-05-26 10:30:43 +01:00
Sam Parker	64cfb8a864	[NFC][ARM] Add intrinsic code size runs Add code size analysis of arithmetic intrinsics.	2020-05-26 09:41:54 +01:00
Sam Parker	1f72d5880e	[CostModel] Check for free intrinsics in BasicTTI Recommitting part of "[CostModel] Unify Intrinsic Costs." `de71def3f5` Now that the 'free' intrinsic information has been sunk to the lowest level, query the base implementation in BasicTTI before doing anything else. I suspect this is the change that was causing the main changes, particularly the large effects on debug builds. Differential Revision: https://reviews.llvm.org/D80012	2020-05-26 08:37:13 +01:00
Denis Antrushin	5451289aba	[SCEV] Constant fold MultExpr before applying depth limit. Summary: Users of SCEV reasonably assume that multiplication of two constant SCEVs will in turn be constant. However, that is not always the case: First, we can get here with reached depth limit, and will create MultExpr SCEV `C1 * C2` and cache it. Then, we can get here with the same operands, but with small depth level. But this time we will find existing MultExpr SCEV and return it, instead of expected constant SCEV. This patch changes getMultExpr to not apply depth limit to all constant operands expression, allowing them to be folded. Reviewers: reames, mkazantsev Subscribers: hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79893	2020-05-22 18:34:32 +03:00
Sam Parker	fb3ba38021	[CostModel] Remove getExtCost This has not been implemented by any backends which appear to cover the functionality through getCastInstrCost. Sink what there is in the default implementation into BasicTTI. Differential Revision: https://reviews.llvm.org/D78922	2020-05-21 07:18:06 +01:00
Yevgeny Rouban	8138487468	[BrachProbablityInfo] Set edge probabilities at once and fix calcMetadataWeights() Hide the method that allows setting probability for particular edge and introduce a public method that sets probabilities for all outgoing edges at once. Setting individual edge probability is error prone. More over it is difficult to check that the total probability is 1.0 because there is no easy way to know when the user finished setting all the probabilities. Related bug is fixed in BranchProbabilityInfo::calcMetadataWeights(). Changing unreachable branch probabilities to raw(1) and distributing the rest (oldProbability - raw(1)) over the reachable branches could introduce total probability inaccuracy bigger than 1/numOfBranches. Reviewers: yamauchi, ebrevnov Tags: #llvm Differential Revision: https://reviews.llvm.org/D79396	2020-05-21 12:52:37 +07:00
Eli Friedman	f26bdb539e	Make Value::getPointerAlignment() return an Align, not a MaybeAlign. If we don't know anything about the alignment of a pointer, Align(1) is still correct: all pointers are at least 1-byte aligned. Included in this patch is a bugfix for an issue discovered during this cleanup: pointers with "dereferenceable" attributes/metadata were assumed to be aligned according to the type of the pointer. This wasn't intentional, as far as I can tell, so Loads.cpp was fixed to stop making this assumption. Frontends may need to be updated. I updated clang's handling of C++ references, and added a release note for this. Differential Revision: https://reviews.llvm.org/D80072	2020-05-20 16:37:20 -07:00
Nikita Popov	5fae613a4f	[LVI] Don't require DominatorTree in LVI (NFC) After D76797 the dominator tree is no longer used in LVI, so we can remove it as a pass dependency, and also get rid of the dominator tree enabling/disabling logic in JumpThreading. Apart from cleaning up the code, this also clarifies LVI cache consistency, in that the LVI cache can no longer depend on whether the DT was or wasn't enabled due to pending DT updates at any given time. Differential Revision: https://reviews.llvm.org/D76985	2020-05-19 20:21:46 +02:00
Eli Friedman	11aa3707e3	StoreInst should store Align, not MaybeAlign This is D77454, except for stores. All the infrastructure work was done for loads, so the remaining changes necessary are relatively small. Differential Revision: https://reviews.llvm.org/D79968	2020-05-15 12:26:58 -07:00
Nikita Popov	f89f7da999	[IR] Convert null-pointer-is-valid into an enum attribute The "null-pointer-is-valid" attribute needs to be checked by many pointer-related combines. To make the check more efficient, convert it from a string into an enum attribute. In the future, this attribute may be replaced with data layout properties. Differential Revision: https://reviews.llvm.org/D78862	2020-05-15 19:41:07 +02:00
Sam Parker	0ef62fc25d	[NFC][ARM] Intrinsic CostModel Tests Add throughput tests for saturating, overflowing and reduction operations.	2020-05-15 13:38:42 +01:00
Stanislav Mekhanoshin	184b383457	Add v16f64 value type We need to use it to handle <16 x double> indirect indexes in the AMDGPU BE. The only visible change from adding it is in ARM cost model. To me it looks reasonable. With doubling a vector size it quadruples the cost up to the size 8 and then it did only double it. Now it also quadruples, which seems a logical progression to me. Actual AMDGPU code is to follow, this is a common part, plus load/store legalization in the AMDGPU BE not to break what works now. Differential Revision: https://reviews.llvm.org/D79952	2020-05-14 14:28:00 -07:00
Eli Friedman	4532a50899	Infer alignment of unmarked loads in IR/bitcode parsing. For IR generated by a compiler, this is really simple: you just take the datalayout from the beginning of the file, and apply it to all the IR later in the file. For optimization testcases that don't care about the datalayout, this is also really simple: we just use the default datalayout. The complexity here comes from the fact that some LLVM tools allow overriding the datalayout: some tools have an explicit flag for this, some tools will infer a datalayout based on the code generation target. Supporting this properly required plumbing through a bunch of new machinery: we want to allow overriding the datalayout after the datalayout is parsed from the file, but before we use any information from it. Therefore, IR/bitcode parsing now has a callback to allow tools to compute the datalayout at the appropriate time. Not sure if I covered all the LLVM tools that want to use the callback. (clang? lli? Misc IR manipulation tools like llvm-link?). But this is at least enough for all the LLVM regression tests, and IR without a datalayout is not something frontends should generate. This change had some sort of weird effects for certain CodeGen regression tests: if the datalayout is overridden with a datalayout with a different program or stack address space, we now parse IR based on the overridden datalayout, instead of the one written in the file (or the default one, if none is specified). This broke a few AVR tests, and one AMDGPU test. Outside the CodeGen tests I mentioned, the test changes are all just fixing CHECK lines and moving around datalayout lines in weird places. Differential Revision: https://reviews.llvm.org/D78403	2020-05-14 13:03:50 -07:00
Sam Parker	6bbad7285c	[CostModel] Modify BasicTTI getCastInstrCost Fix the assumption that all bitcasts of the same type sizes are free. We now only assume that bitcasts between ints and ptrs of the same size are free. This allows TTImpl to just call the concrete implementation of getCastInstrCost. Differential Revision: https://reviews.llvm.org/D78918	2020-05-13 07:26:08 +01:00
Juneyoung Lee	e5f602d82c	[ValueTracking] Let propagatesPoison support binops/unaryops/cast/etc. Summary: This patch makes propagatesPoison be more accurate by returning true on more bin ops/unary ops/casts/etc. The changed test in ScalarEvolution/nsw.ll was introduced by `a19edc4d15` . IIUC, the goal of the tests is to show that iv.inc's SCEV expression still has no-overflow flags even if the loop isn't in the wanted form. It becomes more accurate with this patch, so think this is okay. Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, sanjoy Reviewed By: spatel, nikic Subscribers: regehr, nlopes, efriedma, fhahn, javed.absar, llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D78615	2020-05-13 02:51:42 +09:00
Sam Parker	f1f8cffce4	[NFC][AArch64] More casts tests... Don't use truncs are users because sometimes they're free too.	2020-05-12 13:06:17 +01:00
Sam Parker	e114bdf072	[NFC][AArch64] More cast cost tests Add truncating stores and casts with users.	2020-05-12 11:32:52 +01:00
Sam Parker	b4a8091a11	[ARM][CostModel] Improve getCastInstrCost - Specifically check for sext/zext users which have 'long' form NEON instructions. - Add more entries to the table for sext/zexts so that we can report more accurately the number of vmovls required for NEON. - Pass the instruction to the pass implementation. Differential Revision: https://reviews.llvm.org/D79561	2020-05-12 10:32:20 +01:00
Sam Parker	1952c86d61	[AArch64][CostModel] getCastInstrCost Pass the instruction to the base implementation. Differential Revision: https://reviews.llvm.org/D79562	2020-05-12 10:02:29 +01:00
Sam Parker	494c7ecef9	[NFC][AArch64] Update tests Add cost model tests for extending loads.	2020-05-12 08:49:05 +01:00
Tyker	821a0f23d8	[AssumeBundles] Prevent generation of some redundant assumes Summary: with this patch the assume salvageKnowledge will not generate assume if all knowledge is already available in an assume with valid context. assume bulider can also in some cases update an existing assume with better information. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78014	2020-05-10 19:23:59 +02:00
zoecarver	f65f566aeb	Re-commit: Mark values as trivially dead when their only use is a start or end lifetime intrinsic. Summary: If the only use of a value is a start or end lifetime intrinsic then mark the intrinsic as trivially dead. This should allow for that value to then be removed as well. Currently, this only works for allocas, globals, and arguments. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79355	2020-05-08 12:24:10 -07:00
Nikita Popov	5a2265647e	Reapply [InstSimplify] Remove known bits constant folding No changes relative to last time, but after a mitigation for an AMDGPU regression landed. --- If SimplifyInstruction() does not succeed in simplifying the instruction, it will compute the known bits of the instruction in the hope that all bits are known and the instruction can be folded to a constant. I have removed a similar optimization from InstCombine in D75801, and would like to drop this one as well. On average, we spend ~1% of total compile-time performing this known bits calculation. However, if we introduce some additional statistics for known bits computations and how many of them succeed in simplifying the instruction we get (on test-suite): instsimplify.NumKnownBits: 216 instsimplify.NumKnownBitsComputed: 13828375 valuetracking.NumKnownBitsComputed: 45860806 Out of ~14M known bits calculations (accounting for approximately one third of all known bits calculations), only 0.0015% succeed in producing a constant. Those cases where we do succeed to compute all known bits will get folded by other passes like InstCombine later. On test-suite, only lencod.test and GCC-C-execute-pr44858.test show a hash difference after this change. On lencod we see an improvement (a loop phi is optimized away), on the GCC torture test a regression (a function return value is determined only after IPSCCP, preventing propagation from a noinline function.) There are various regressions in InstSimplify tests. However, all of these cases are already handled by InstCombine, and corresponding tests have already been added there. Differential Revision: https://reviews.llvm.org/D79294	2020-05-08 10:24:53 +02:00
Sam Parker	751da4d596	[NFC][AArch64] Add test Add cost model test for cast operations.	2020-05-07 13:16:03 +01:00
Alina Sbirlea	8e911545d6	[MemorySSA] Make MemoryLocation unknown when phi translation cannot be performed. Summary: When phi translation cannot be performed, be conservative and make the MemoryLocation unknown. Reviewers: george.burgess.iv Subscribers: Prazek, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79386	2020-05-05 13:32:32 -07:00
Nikita Popov	46ee652c70	Revert "[InstSimplify] Remove known bits constant folding" This reverts commit `08556afc54`. This breaks some AMDGPU tests.	2020-05-03 20:45:10 +02:00
Nikita Popov	08556afc54	[InstSimplify] Remove known bits constant folding If SimplifyInstruction() does not succeed in simplifying the instruction, it will compute the known bits of the instruction in the hope that all bits are known and the instruction can be folded to a constant. I have removed a similar optimization from InstCombine in D75801, and would like to drop this one as well. On average, we spend ~1% of total compile-time performing this known bits calculation. However, if we introduce some additional statistics for known bits computations and how many of them succeed in simplifying the instruction we get (on test-suite): instsimplify.NumKnownBits: 216 instsimplify.NumKnownBitsComputed: 13828375 valuetracking.NumKnownBitsComputed: 45860806 Out of ~14M known bits calculations (accounting for approximately one third of all known bits calculations), only 0.0015% succeed in producing a constant. Those cases where we do succeed to compute all known bits will get folded by other passes like InstCombine later. On test-suite, only lencod.test and GCC-C-execute-pr44858.test show a hash difference after this change. On lencod we see an improvement (a loop phi is optimized away), on the GCC torture test a regression (a function return value is determined only after IPSCCP, preventing propagation from a noinline function.) There are various regressions in InstSimplify tests. However, all of these cases are already handled by InstCombine, and corresponding tests have already been added there. Differential Revision: https://reviews.llvm.org/D79294	2020-05-03 20:26:58 +02:00
Nikita Popov	7cf0f8568c	[ValueTracking] Convert test to unit test (NFC) Test this directly, rather than going through InstSimplify.	2020-05-03 12:23:57 +02:00
Craig Topper	e39c7ab2b9	[CostModel][X86][ARM] Teach default implementation of getCastInstrCost to not add a split/join cost if source type and the destination type both have a SplitVector action If both the source and the destination need to be split then the two halves of the split operation are completely independent and don't need to be split or joined. So we don't need to assess a cost for the split or join. Differential Revision: https://reviews.llvm.org/D79111	2020-05-01 18:55:23 -07:00
Craig Topper	b938168aef	[X86] Lower the cost of v4i64->v4i32 truncate with avx512. We use the vpmovqd instruction which is a single uop. So the cost should be 1.	2020-05-01 11:09:37 -07:00
Craig Topper	6a1ad76dab	[X86] Don't return true from isTruncateFree for vectors Also fix some cost tables for vXi1 types to match the costs entries for the types they will be promoted to. Differential Revision: https://reviews.llvm.org/D79045	2020-04-30 16:43:35 -07:00
Craig Topper	ff66919020	[X86][CostModel] Bump the cost of vpermw/vpermt2b/vperm2w vpermw is 2 uops. vpermt2b/vpermt2w are two shuffle uops and a port 015 uop. Weirdly vpermb is a single uop. This patch bumps the cost to 2 for these operations. Maybe should go to 3 for the vpermt2*, but I've started conservative. I've also removed a few entries that were now the same as earlier subtargets or that I didn't think we really did. Like I don't think we extend v32i8 to v32i16, shuffle, and then truncate. Differential Revision: https://reviews.llvm.org/D79148	2020-04-30 11:32:25 -07:00
Evgeniy Brevnov	bb0842a3f1	[BPI] Incorrect probability reported in case of mulptiple edges. Summary: By design 'BranchProbabilityInfo:: getEdgeProbability(const BasicBlock Src, const BasicBlock Dst) const' should return sum of probabilities over all edges from Src to Dst. Current implementation is buggy and returns 1/num_of_successors if probabilities are not explicitly set. Note current implementation of BPI printing has an issue as well and annotates each edge with sum of probabilities over all ages from one basic block to another. That's why 30% probability reported (instead of 10%) in the lit test. This is not urgent issue since only printing is affected. Note also current implementation assumes that either all or none edges have probabilities set. This is not the only place which uses such assumption. At least we should assert that in verifier. In addition we can think on a more robust API of BPI which would prevent situations. Reviewers: skatkov, yrouban, taewookoh Reviewed By: skatkov Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79071	2020-04-30 11:41:03 +07:00
Alina Sbirlea	161ccfe5ba	[MemorySSA] Pass DT to the upward iterator for proper PhiTranslation. Summary: A valid DominatorTree is needed to do PhiTranslation. Before this patch, a MemoryUse could be optimized to an access outside a loop, while the address it loads from is modified in the loop. This can lead to a miscompile. Reviewers: george.burgess.iv Subscribers: Prazek, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79068	2020-04-29 14:28:31 -07:00
Craig Topper	cff6686532	[X86] Lower the cost of v4i64->v4i32 and v8i64->v8i32 truncate with AVX We generate much better code these days than we used to. And we use the same sequence for AVX1 and AVX2 for these For v4i64->v4i32 we generate: vextractf128 xmm1, ymm0, 1 vshufps xmm0, xmm0, xmm1, 136 # xmm0 = xmm0[0,2],xmm1[0,2] And for v8i64->v8i32 we generate: vperm2f128 ymm2, ymm0, ymm1, 49 # ymm2 = ymm0[2,3],ymm1[2,3] vinsertf128 ymm0, ymm0, xmm1, 1 vshufps ymm0, ymm0, ymm2, 136 # ymm0 = ymm0[0,2],ymm2[0,2],ymm0[4,6],ymm2[4,6] Differential Revision: https://reviews.llvm.org/D79109	2020-04-29 13:21:44 -07:00
Sam Parker	e9d0f1c8ea	[NFC][ARM] Modify cost model test	2020-04-29 12:42:47 +01:00
Sam Parker	850bdefa65	[NFC][ARM] Add two cost model tests	2020-04-29 12:36:05 +01:00
Simon Pilgrim	090cae8491	[TTI] Add DemandedElts to getScalarizationOverhead The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the available of legal INSERT_VECTOR_ELT ops is more limited. This patch does 2 things: 1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of a ISD::BUILD_VECTOR pattern. 2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs. This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing. A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well. Reviewed By: @craig.topper Differential Revision: https://reviews.llvm.org/D78216	2020-04-29 12:00:38 +01:00
Craig Topper	59b9e6fe76	[X86] Update costs for truncates from less than 128-bit vectors to vXi1 on pre-avx512 targets vXi1 types are legalized by promoting, but the narrow vectors are legalized by widening. This results in some truncates turning into any_extends.	2020-04-28 11:35:41 -07:00
Craig Topper	d42192c50f	[X86][CostModel] Correct the costs for truncate to a mask register with avx512 I've modified isTruncateFree to get an accurate cost for types that need to be split. I'm planning to look into fixing it for all vectors, but need more cost cleanups first. Differential Revision: https://reviews.llvm.org/D78973	2020-04-28 10:39:36 -07:00
Craig Topper	9ea5cc8a25	[X86][CostModel] Add vXiY->vXi1 truncate tests to min-legal-vector-width.ll. NFC	2020-04-27 15:48:11 -07:00
Craig Topper	37ec709233	[X86][CostModel] Update truncate costs for some narrow vector cases to match their wider version. This updates v4i16->v4i8 with sse2 to match v8i16->v8i8. Update v2i16->v2i8 and v4i16->v4i8 with sse 4.1 to match v8i16->v8i8.	2020-04-27 13:47:48 -07:00
Craig Topper	bdbbed115f	[X86][CostModel] Update costs for vector truncate with avx512f/avx512bw. All avx512 truncate instructions except vXi64->vXi32 are 2 uops on port 5. So raise their costs to 2. Except when we have an earlier faster sequence like pshufb for 128 bit input vectors. Add a lower cost of 3 v16i16->v16i8 with avx512f where we can extend to v16i32 then truncate. And a cost of 2 for avx512bw with and without avx512vl. There we can use vpmovwb with either a ymm or zmm input. Both of these beat masking, splitting, and using packuswb which is our avx/avx2 codegen.	2020-04-27 12:00:24 -07:00
Craig Topper	5eff75d86a	[X86][CostModel] Improve costs for fp_to_uint/fp_to_sint for vXi8/vXi16/v2i32 results. Differential Revision: https://reviews.llvm.org/D78893	2020-04-27 10:35:15 -07:00
Craig Topper	8296bcf76f	[X86][CostModel] Fix typos in test. NFC	2020-04-26 21:17:38 -07:00
Craig Topper	5f2ea70980	[X86] Add cost model tests for conversions between <2 x float> and integers. For all but 2 x i32 we were starting from 4 x float.	2020-04-26 19:59:01 -07:00
Craig Topper	b9de62c2b6	[X86] Fix the cost of v16i1->v16i16 sext/zext on avx targets. Previously we were hitting the scalarization case in the default implementation.	2020-04-25 23:16:20 -07:00
Craig Topper	19cb26f517	[X86][CostModel] Improve costs for vXi1 sign_extend/zero_extend with avx512. With avx512 vXi1 is legal and uses k-registers with many custom cases for extending.	2020-04-25 23:16:20 -07:00
Craig Topper	084433702d	[X86][CostModel] Add sext/zext from vXi1 tests to min-legal-vector-width.ll. NFC We aren't properly costing extends from k-registers. I also added command lines without avx512bw to be able to show all the different extending strategies we have.	2020-04-25 23:15:40 -07:00
Craig Topper	061f330d7e	[X86] Add avx512vl to the truncate cost model test. NFC	2020-04-25 12:59:10 -07:00
Craig Topper	999058ba5e	[X86] Add cost model tests for truncating from v2i8/v4i8/v8i8/v16i8 to vXi1. NFC	2020-04-24 23:11:17 -07:00
Craig Topper	7664a0d282	[X86] Improve accuracy of cost for v16i64->v16i8 truncate with avx512. The 2 vpmovqds are only 1 uop each.	2020-04-24 19:13:55 -07:00
Craig Topper	03aa967c0d	[CostModel][X86][ARM] Teach getCastInstrCost to include the splitting factor when handling operations that type legalize to the same number of subvectors or scalar components Previously, we just always returned 1. But that ignores that we have to do the operation for each subvector or scalar component. Differential Revision: https://reviews.llvm.org/D78824	2020-04-24 13:36:26 -07:00
Tyker	42431da895	[AssumeBundles] Use assume bundles in isKnownNonZero Summary: Use nonnull and dereferenceable from an assume bundle in isKnownNonZero Reviewers: jdoerfert, nikic, lebedev.ri, reames, fhahn, sstefan1 Reviewed By: jdoerfert Subscribers: fhahn, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76149	2020-04-24 20:41:51 +02:00
Craig Topper	4cf73a3fc6	[CostModel][X86] Account for splitting cost when vector zext/sext type legalize to the same size vector.	2020-04-24 09:59:23 -07:00
Juneyoung Lee	aca335955c	[ValueTracking] Let analyses assume a value cannot be partially poison Summary: This is RFC for fixes in poison-related functions of ValueTracking. These functions assume that a value can be poison bitwisely, but the semantics of bitwise poison is not clear at the moment. Allowing a value to have bitwise poison adds complexity to reasoning about correctness of optimizations. This patch makes the analysis functions simply assume that a value is either fully poison or not, which has been used to understand the correctness of a few previous optimizations. The bitwise poison semantics seems to be only used by these functions as well. In terms of implementation, using value-wise poison concept makes existing functions do more precise analysis, which is what this patch contains. Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr Reviewed By: nikic Subscribers: fhahn, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78503	2020-04-23 08:08:53 +09:00
Juneyoung Lee	5ceef26350	Revert "RFC: [ValueTracking] Let analyses assume a value cannot be partially poison" This reverts commit `80faa8c3af`.	2020-04-23 08:07:09 +09:00
Juneyoung Lee	80faa8c3af	RFC: [ValueTracking] Let analyses assume a value cannot be partially poison Summary: This is RFC for fixes in poison-related functions of ValueTracking. These functions assume that a value can be poison bitwisely, but the semantics of bitwise poison is not clear at the moment. Allowing a value to have bitwise poison adds complexity to reasoning about correctness of optimizations. This patch makes the analysis functions simply assume that a value is either fully poison or not, which has been used to understand the correctness of a few previous optimizations. The bitwise poison semantics seems to be only used by these functions as well. In terms of implementation, using value-wise poison concept makes existing functions do more precise analysis, which is what this patch contains. Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr Reviewed By: nikic Subscribers: fhahn, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78503	2020-04-23 07:57:12 +09:00
Sam Parker	04ef154124	[NFC] Test changes Add some more targets for the ARM cost model tests and add some tests for icmps and bitcasts.	2020-04-22 08:28:52 +01:00
Eli Friedman	9b9454af8a	Require "target datalayout" to be at the beginning of an IR file. This will allow us to use the datalayout to disambiguate other constructs in IR, like load alignment. Split off from D78403. Differential Revision: https://reviews.llvm.org/D78413	2020-04-20 11:55:49 -07:00
Craig Topper	8dfb9627b7	[X86] Make v32i16/v64i8 legal types without avx512bw. Use custom splitting instead. This moves v32i16/v64i8 to a model consistent with how we treat integer types with avx1. This does change the ABI for types vXi16/vXi8 vectors larger than 512 bits to pass in multiple zmms instead of multiple ymms. We'd already hacked some code to make v64i8/v32i16 pass in zmm. Cost model is still a bit of a mess. In some place I tried to match existing behavior. But really we need to account for splitting and concating costs. Cost model for shuffles is especially pessimistic. Differential Revision: https://reviews.llvm.org/D76212	2020-04-15 12:17:18 -07:00
Simon Pilgrim	2f951e99c6	[CostModel][X86] Regenerate load_store.ll costs tests Add SSE + AVX512 targets Add some illegal type store tests	2020-04-15 11:54:39 +01:00
Tyker	1d2b76a8fc	[AssumeBundles] adapte GVN to assume bundles Summary: prevent GVN from removing assume bundles make GVN preserve information from removed instructions Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77405	2020-04-14 12:48:14 +02:00
Craig Topper	2f60fbce6c	[X86] Use a more realisitic cost for truncate v16i64->v16i8 with avx512f. Still not great and we could probably codegen this better, but 11 was clearly ridiculous.	2020-04-13 21:09:43 -07:00
Craig Topper	071c64d68d	[X86] Add a more accurate truncate cost for v8i64->v8i8	2020-04-13 21:09:41 -07:00
Craig Topper	b37b1840eb	[X86] Add truncate cost model tests to min-legal-vector-width.ll for when we're avoiding 512 bit vectors.	2020-04-13 21:09:40 -07:00
Simon Pilgrim	353347288b	[CostModel][X86] Remove comments that begin with a filecheck prefix. Stop filecheck from confusing a general comment with a check.	2020-04-13 18:39:24 +01:00
Huihui Zhang	6c989d0248	[BasicAA] Fix aliasGEP/DecomposeGEPExpression for scalable type. Summary: Don't attempt to analyze the decomposed GEP for scalable type. GEP index scale is not compile-time constant for scalable type. Be conservative, return MayAlias. Explicitly call TypeSize::getFixedSize() to assert on places where scalable type doesn't make sense. Add unit tests to check functionality of -basicaa for scalable type. This patch is needed for D76944. Reviewers: sdesmalen, efriedma, spatel, bjope, ctetreau Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77828	2020-04-10 16:58:26 -07:00
Simon Pilgrim	91bc50c0d7	[CostModel][X86] Improve InsertElement costs for sub-128bit vectors If we're inserting into v2i8/v4i8/v8i8/v2i16/v4i16 style sub-128bit vectors ensure we don't use the SK_PermuteTwoSrc cost of the legalized value type - this is a followup to rG12c629ec6c59 which added equivalent sub-128bit shuffle costs	2020-04-10 14:55:46 +01:00
Craig Topper	5625e6ab37	[X86] Improve min/max reduction costs. This is similar to what I recently did for getArithmeticReductionCost. I'm trying to account for the narrowing from 512->256->128 as we go. I've also added a new helper method getMinMaxCost that tries to handle the cases where we have native min/max instructions and fall back to cmp+select when we don't. Differential Revision: https://reviews.llvm.org/D76634	2020-04-09 17:28:50 -07:00
Simon Pilgrim	12c629ec6c	[CostModel][X86] Add shuffle costs for some common sub-128bit vectors v2i8/v4i8/v8i8 + v2i16/v4i16 all show up in vectorizer code and by just using the legalized types (v16i8/v8i16) we're highly exaggerating the actual cost of the shuffle.	2020-04-09 19:57:06 +01:00
Simon Pilgrim	898e22908c	[MemorySSA] invariant-groups.ll - add missing check to fix issue reported on D77354	2020-04-08 15:18:04 +01:00
Sanjay Patel	a2bb19ca42	[x86] add size cost tests for casts and binops; NFC Shows bugs for div/rem/fdiv and possibly others.	2020-04-06 12:38:15 -04:00
Jonathan Roelofs	7c5d2bec76	[llvm] Fix missing FileCheck directive colons https://reviews.llvm.org/D77352	2020-04-06 09:59:08 -06:00
Simon Pilgrim	be84d2b5b7	[CostModel][X86] Add some insert subvector cost tests for vXf32/vXi32/vXi16/vXi8 types	2020-04-04 22:46:57 +01:00
Simon Pilgrim	6a57ba17c0	[CostModel][X86] Add shuffle cost tests for sub-128bit vectors	2020-04-04 13:08:25 +01:00
Simon Pilgrim	87fd686f6f	[CostModel][X86] Add insert/extract cost tests for sub-128bit vXi8/vXi16 vectors	2020-04-04 13:08:25 +01:00
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Denis Antrushin	06c58f11a9	[SCEV] Use backedge SCEV of PHI only if its input is loop invariant For the PHI node %1 = phi [%A, %entry], [%X, %latch] it is incorrect to use SCEV of backedge val %X as an exit value of PHI unless %X is loop invariant. This is because exit value of %1 is value of %X at one-before-last iteration of the loop. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D73181	2020-03-31 18:39:24 +07:00
Sebastian Neubauer	5d3a69feca	[AMDGPU] New llvm.amdgcn.ballot intrinsic Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its boolean argument in all active lanes, and zero in all inactive lanes. This is intended to replace the existing llvm.amdgcn.icmp and llvm.amdgcn.fcmp intrinsics after a suitable transition period. Use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D65088	2020-03-31 10:35:39 +02:00
Craig Topper	b4695351cb	[TTI][X86] Fix the value passed to IsUnsigned for cost modeling of experimental.vector.reduce.smin/smax/umin/umax. We were passing true for smax/smin and false for umax/umin.	2020-03-29 23:34:22 -07:00
Craig Topper	d74533a18b	[X86] Add sse4.1 RUNs lines to the min/max reduction cost model tests. Mostly this matches the sse4.2 we already had command lines for. Except in the i64 case since sse4.1 doesn't have pcmpgtq.	2020-03-29 16:05:35 -07:00
Craig Topper	c0aa97b632	[X86] Add cost model test cases for fmin/fmax reduction.	2020-03-28 17:12:49 -07:00
Craig Topper	f4c67dfa92	[X86] More accurately model the cost of horizontal reductions. This patch attempts to more accurately model the reduction of power of 2 vectors of types we natively support. This takes into account the narrowing of vectors that occur as we go from 512 bits to 256 bits, to 128 bits. It also takes into account the use of wider elements in the shuffles for the first 2 steps of a reduction from 128 bits. And uses a v8i16 shift for the final step of vXi8 reduction. The default implementation uses the legalized type for the arithmetic for all levels. And uses the single source permute cost of the legalized type for all levels. This penalizes things like lack of v16i8 pshufb on pre-sse3 targets and the splitting and joining that needs to be done for integer types on AVX1. We never need v16i8 shuffle for a reduction and we only need split AVX1 ops when type the type wide and needs to be split. I think we're still over costing splits and joins for AVX1, but we're closer now. I've also removed all pairwise special casing because I don't think we ever want to generate that on X86. I've also adjusted the add handling to more accurately account for any type splitting that occurs before we reach a legal type. Differential Revision: https://reviews.llvm.org/D76478	2020-03-22 14:20:15 -07:00
Bjorn Pettersson	d077d678d3	[ValueTracking] Avoid blind cast from Operator to Instruction Summary: Avoid blind cast from Operator to ExtractElementInst in computeKnownBitsFromOperator. This resulted in some crashes in downstream fuzzy testing. Instead we use getOperand directly on the Operator when accessing the vector/index operands. Haven't seen any problems with InsertElement and ShuffleVector, but I believe those could be used in constant expressions as well. So the same kind of fix as for ExtractElement was also applied for InsertElement. When it comes to ShuffleVector we now simply bail out if a dynamic cast of the Operator to ShuffleVectorInst fails. I've got no reproducer indicating problems for ShuffleVector, and a fix would be slightly more complicated as getShuffleDemandedElts is involved. Reviewers: RKSimon, nikic, spatel, efriedma Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76564	2020-03-22 14:45:31 +01:00
Nikita Popov	ce6c95aaca	[InstCombine] Move test to instcombine; NFC This test uses -instcombine, so move it into the appropriate directory. Also fork it for expensive checks enabled/disabled.	2020-03-20 12:41:19 +01:00
Nikita Popov	a09ff56b5b	[Tests] Regenerate some test checks; NFC	2020-03-20 12:06:53 +01:00
Craig Topper	c13aa36bb7	[X86] Attempt to more accurately model the cost of a bool reduction of wide vector type. Previously we multiplied the cost for the table entries by the number of splits needed. But that implies that each split goes through a reduction to scalar independently. I think what really happens is that the we AND/OR the split pieces until we're down to a single value with a legal type and then do special reduction sequence on that. So to model that this patch takes the number of splits minus one multiplied by the cost of a AND/OR at the legal element count and adds that on top of the table lookup. Differential Revision: https://reviews.llvm.org/D76400	2020-03-19 09:31:05 -07:00
Eli Friedman	ebec984e14	[AliasAnalysis] Misc fixes for checking aliasing with scalable types. This is fixing up various places that use the implicit TypeSize->uint64_t conversion. The new overloads in MemoryLocation.h are already used in various places that construct a MemoryLocation from a TypeSize, including MemorySSA. (They were using the implicit conversion before.) Differential Revision: https://reviews.llvm.org/D76249	2020-03-18 12:28:47 -07:00
Sam Parker	ef56b55e12	[NFC][ARM] Add thumb triple to test Test the costs of selects for thumbv8m.base too.	2020-03-18 09:15:19 +00:00
Evgenii Stepanov	2a3723ef11	[memtag] Plug in stack safety analysis. Summary: Run StackSafetyAnalysis at the end of the IR pipeline and annotate proven safe allocas with !stack-safe metadata. Do not instrument such allocas in the AArch64StackTagging pass. Reviewers: pcc, vitalybuka, ostannard Reviewed By: vitalybuka Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, gilang, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73513	2020-03-16 16:35:25 -07:00
Nico Weber	623cb95eb3	Revert "[InstSimplify] Simplify calls with "returned" attribute" This reverts commit `45555c3819`. Causes clang crashes in some causes, see comments on https://reviews.llvm.org/D75815 for details (including repro steps).	2020-03-16 15:21:30 -04:00
Craig Topper	b2da1ddaef	[X86] Add a non-zero cost for truncating v32i16->v32i8 on avx512bw.	2020-03-15 17:18:46 -07:00
Florian Hahn	b8b8f04c0d	[ValueLattice] Go to overdefined in getRange() for full ranges. This is was split off `4878aa36d4`, as it can go in separately.	2020-03-14 19:50:15 +00:00
Eli Friedman	65fc706ddf	[SCEV] Add support for GEPs over scalable vectors. Because we have to use a ConstantExpr at some point, the canonical form isn't set in stone, but this seems reasonable. The pretty sizeof(<vscale x 4 x i32>) dumping is a relic of ancient LLVM; I didn't have to touch that code. :) Differential Revision: https://reviews.llvm.org/D75887	2020-03-13 16:12:45 -07:00
Simon Pilgrim	a2db388dce	[CostModel][X86] Improve ISD::CTTZ costs accounting for BSF/TZCNT implementations	2020-03-13 16:51:13 +00:00
Huihui Zhang	f4f2706572	[ConstantFold][SVE] Fix constant folding for scalable vector compare instruction. Summary: Do not iterate on scalable vector. Also do not return constant scalable vector from ConstantInt::get(). Fix result type by using getElementCount() instead of getNumElements(). Reviewers: sdesmalen, efriedma, apazos, huntergr, willlovett Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73753	2020-03-12 16:15:38 -07:00
Anna Welker	a6d3bec83f	[TTI][ARM][MVE] Refine gather/scatter cost model Refines the gather/scatter cost model, but also changes the TTI function getIntrinsicInstrCost to accept an additional parameter which is needed for the gather/scatter cost evaluation. This did require trivial changes in some non-ARM backends to adopt the new parameter. Extending gathers and truncating scatters are now priced cheaper. Differential Revision: https://reviews.llvm.org/D75525	2020-03-11 10:23:41 +00:00
Simon Pilgrim	5cbddf7cbc	[X86][SSE] Add more accurate costs for fmaxnum/fminnum codegen Based off llvm-mca reports on codegen in llvm\test\CodeGen\X86\fmaxnum.ll + llvm\test\CodeGen\X86\fminnum.ll	2020-03-10 11:59:40 +00:00
Simon Pilgrim	0b1dc6016f	[CostModel][X86] Add fmaxnum/fminnum costs tests	2020-03-10 11:18:27 +00:00
Nikita Popov	45555c3819	[InstSimplify] Simplify calls with "returned" attribute If a call argument has the "returned" attribute, we can simplify the call to the value of that argument. The "-inst-simplify" pass already handled this for the constant integer argument case via known bits, which is invoked in SimplifyInstruction. However, non-constant (or non-int) arguments are not handled at all right now. This addresses one of the regressions from D75801. Differential Revision: https://reviews.llvm.org/D75815	2020-03-09 18:53:47 +01:00
Jay Foad	596446623b	[AMDGPU][ConstantFolding] Fold llvm.amdgcn.cube* intrinsics Summary: This folds the following family of intrinsics: llvm.amdgcn.cubeid (face id) llvm.amdgcn.cubema (major axis) llvm.amdgcn.cubesc (S coordinate) llvm.amdgcn.cubetc (T coordinate) Reviewers: nhaehnle, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75187	2020-03-06 16:42:53 +00:00
David Green	587feec07e	[ARM] Change all tests from "thumbv8.1-m.main" to "thumbv8.1m.main". NFC	2020-03-04 13:47:35 +00:00
Evgeniy Brevnov	e60c28746b	Lost regression test from commit `5a63813dc7`.	2020-03-04 19:52:42 +07:00
Whitney Tsang	c84532a70a	[LoopNest]: Analysis to discover properties of a loop nest. Summary: This patch adds an analysis pass to collect loop nests and summarize properties of the nest (e.g the nest depth, whether the nest is perfect, what's the innermost loop, etc...). The motivation for this patch was discussed at the latest meeting of the LLVM loop group (https://ibm.box.com/v/llvm-loop-nest-analysis) where we discussed the unimodular loop transformation framework ( “A Loop Transformation Theory and an Algorithm to Maximize Parallelism”, Michael E. Wolf and Monica S. Lam, IEEE TPDS, October 1991). The unimodular framework provides a convenient way to unify legality checking and code generation for several loop nest transformations (e.g. loop reversal, loop interchange, loop skewing) and their compositions. Given that the unimodular framework is applicable to perfect loop nests this is one property of interest we expose in this analysis. Several other utility functions are also provided. In the future other properties of interest can be added in a centralized place. Authored By: etiotto Reviewer: Meinersbur, bmahjour, kbarton, Whitney, dmgreen, fhahn, reames, hfinkel, jdoerfert, ppc-slack Reviewed By: Meinersbur Subscribers: bryanpkc, ppc-slack, mgorny, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D68789	2020-03-03 18:25:19 +00:00
Whitney Tsang	613f791131	Revert "[LoopNest]: Analysis to discover properties of a loop nest." This reverts commit `3a063d68e3`. Broke the build with modules enabled: http://green.lab.llvm.org/green/job/lldb-cmake/10655/console .	2020-03-03 14:07:49 +00:00
Whitney Tsang	3a063d68e3	[LoopNest]: Analysis to discover properties of a loop nest. Summary: This patch adds an analysis pass to collect loop nests and summarize properties of the nest (e.g the nest depth, whether the nest is perfect, what's the innermost loop, etc...). The motivation for this patch was discussed at the latest meeting of the LLVM loop group (https://ibm.box.com/v/llvm-loop-nest-analysis) where we discussed the unimodular loop transformation framework ( “A Loop Transformation Theory and an Algorithm to Maximize Parallelism”, Michael E. Wolf and Monica S. Lam, IEEE TPDS, October 1991). The unimodular framework provides a convenient way to unify legality checking and code generation for several loop nest transformations (e.g. loop reversal, loop interchange, loop skewing) and their compositions. Given that the unimodular framework is applicable to perfect loop nests this is one property of interest we expose in this analysis. Several other utility functions are also provided. In the future other properties of interest can be added in a centralized place. Authored By: etiotto Reviewer: Meinersbur, bmahjour, kbarton, Whitney, dmgreen, fhahn, reames, hfinkel, jdoerfert, ppc-slack Reviewed By: Meinersbur Subscribers: bryanpkc, ppc-slack, mgorny, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D68789	2020-03-03 13:25:28 +00:00
Simon Pilgrim	174cb7c695	[CostModel][X86] Add vXi1 extract/insert cost tests	2020-03-02 11:41:20 +00:00
Simon Pilgrim	168a44a70e	[CostModel][X86] Improve extract/insert element costs (PR43605) This tries to improve the accuracy of extract/insert element costs by accounting for subvector extraction/insertion for >128-bit vectors and the shuffling of elements to/from the 0'th index. It also adds INSERTPS for f32 types and PINSR/PEXTR costs for integer types (at the moment we assume the same cost as MOVD/MOVQ - which isn't always true). Differential Revision: https://reviews.llvm.org/D74976	2020-02-27 15:54:13 +00:00

1 2 3 4 5 ...

2186 Commits