llvm-project

Commit Graph

Author	SHA1	Message	Date
Nico Weber	a852ee199c	Reland "[MachineDebugify] Insert synthetic DBG_VALUE instructions" This reverts commit `841f9c937f`. The change landed many months ago; something else broke those tests.	2020-12-14 22:34:23 -05:00
Nico Weber	841f9c937f	Revert "[MachineDebugify] Insert synthetic DBG_VALUE instructions" This reverts commit `2a5675f11d`. The tests it adds fail: https://reviews.llvm.org/D78135#2453736	2020-12-14 22:14:48 -05:00
Reid Kleckner	d2ed9d6b7e	Revert "ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC" We determined that the MSVC implementation of std::aligned* isn't suited to our needs. It doesn't support 16 byte alignment or higher, and it doesn't really guarantee 8 byte alignment. See https://github.com/microsoft/STL/issues/1533 Also reverts "ADT: Change AlignedCharArrayUnion to an alias of std::aligned_union_t, NFC" Also reverts "ADT: Remove AlignedCharArrayUnion, NFC" to bring back AlignedCharArrayUnion. This reverts commit `4d8bf870a8`. This reverts commit `d10f9863a5`. This reverts commit `4b5dc150b9`.	2020-12-14 17:04:06 -08:00
Rong Xu	54e03d03a7	[PGO] Verify BFI counts after loading profile data This patch adds the functionality to compare BFI counts with real profile counts right after reading the profile. It will print remarks under -Rpass-analysis=pgo, or the internal option -pass-remarks-analysis=pgo. Differential Revision: https://reviews.llvm.org/D91813	2020-12-14 15:56:10 -08:00
Gulfem Savrun Yeniceri	7c0e3a77bc	[clang][IR] Add support for leaf attribute This patch adds support for leaf attribute as an optimization hint in Clang/LLVM. Differential Revision: https://reviews.llvm.org/D90275	2020-12-14 14:48:17 -08:00
Sanjay Patel	d399f870b5	[VectorCombine] make load transform poison-safe As noted in D93229, the transform from scalar load to vector load potentially leaks poison from the extra vector elements that are being loaded. We could use freeze here (and x86 codegen at least appears to be the same either way), but we already have a shuffle in this logic to optionally change the vector size, so let's allow that instruction to serve both purposes. Differential Revision: https://reviews.llvm.org/D93238	2020-12-14 17:42:01 -05:00
Craig Topper	25067f179f	[LoopIdiomRecognize] Teach detectShiftUntilZeroIdiom to recognize loops where the counter is decrementing. This adds support for loops like unsigned clz(unsigned x) { unsigned w = sizeof (x) * CHAR_BIT; while (x) { w--; x >>= 1; } return w; } and unsigned clz(unsigned x) { unsigned w = sizeof (x) * CHAR_BIT - 1; while (x >>= 1) { w--; } return w; } To support these we look for add x, -1 as well as add x, 1 that we already matched. If the value was -1 we need to subtract from the initial counter value instead of adding to it. Fixes PR48404. Differential Revision: https://reviews.llvm.org/D92745	2020-12-14 14:25:05 -08:00
Philip Reames	f5fe8493e5	[LAA] Relax restrictions on early exits in loop structure his is a preparation patch for supporting multiple exits in the loop vectorizer, by itself it should be mostly NFC. This patch moves the loop structure checks from LAA to their respective consumers (where duplicates don't already exist). Moving the checks does end up changing some of the optimization warnings and debug output slightly, but nothing that appears to be a regression. Why do this? Well, after auditing the code, I can't actually find anything in LAA itself which relies on having all instructions within a loop execute an equal number of times. This patch simply makes this explicit so that if one consumer - say LV in the near future (hopefully) - wants to handle a broader class of loops, it can do so. Differential Revision: https://reviews.llvm.org/D92066	2020-12-14 12:44:01 -08:00
Roman Lebedev	59560e8589	[SimplifyCFG] FoldBranchToCommonDest(): temporairly put back restrictions on liveout uses of bonus instructions (PR48450) Even though `d38205144f` was mostly a correct fix for the external non-PHI users, it's not a generally correct fix, because the 'placeholder' values in those trivial PHI's we create shouldn't be always 'undef', but the PHI itself for the backedges, else we end up with wrong value, as the `@pr48450_2` test shows. But we can't just do that, because we can't check that the PHI can be it's own incoming value when coming from certain predecessor, because we don't have a dominator tree. So until we can address this correctness problem properly, ensure that we don't perform the transformation if there are such problematic external uses. Making dominator tree available there is going to be involved, since `-simplifycfg` pass currently does not preserve/update domtree...	2020-12-14 20:14:31 +03:00
Roman Lebedev	e8360a8e1e	[NFC][SimplifyCFG] FoldBranchToCommonDest(): pull out 'common successor' into a variable Makes it easier to use it elsewhere	2020-12-14 20:14:31 +03:00
Stanislav Mekhanoshin	87d7757bbe	[SLP] Control maximum vectorization factor from TTI D82227 has added a proper check to limit PHI vectorization to the maximum vector register size. That unfortunately resulted in at least a couple of regressions on SystemZ and x86. This change reverts PHI handling from D82227 and replaces it with a more general check in SLPVectorizerPass::tryToVectorizeList(). Moved to tryToVectorizeList() it allows to restart vectorization if initial chunk fails. However, this function is more general and handles not only PHI but everything which SLP handles. If vectorization factor would be limited to maximum vector register size it would limit much more vectorization than before leading to further regressions. Therefore a new TTI callback getMaximumVF() is added with the default 0 to preserve current behavior and limit nothing. Then targets can decide what is better for them. The callback gets ElementSize just like a similar getMinimumVF() function and the main opcode of the chain. The latter is to avoid regressions at least on the AMDGPU. We can have loads and stores up to 128 bit wide, and <2 x 16> bit vector math on some subtargets, where the rest shall not be vectorized. I.e. we need to differentiate based on the element size and operation itself. Differential Revision: https://reviews.llvm.org/D92059	2020-12-14 08:49:40 -08:00
Markus Lavin	2a6782bb9f	Reland [DebugInfo] Improve dbg preservation in LSR. Use SCEV to salvage additional @llvm.dbg.value that have turned into referencing undef after transformation (and traditional salvageDebugInfo). Before rewrite (but after introduction of new induction variables) use SCEV to compute an equivalent set of values for each @llvm.dbg.value in the loop body (among the loop header PHI-nodes). After rewrite (and dead PHI elimination) update those @llvm.dbg.value now referencing undef by picking a remaining value from its equivalence set. Allow match with offset by inserting compensation code in the DIExpression. Fixes : PR38815 Differential Revision: https://reviews.llvm.org/D87494	2020-12-14 16:15:18 +01:00
Florian Hahn	e42e5263bd	[VPlan] Make VPWidenMemoryInstructionRecipe a VPDef. This patch updates VPWidenMemoryInstructionRecipe to use VPDef to manage the value it produces instead of inheriting from VPValue. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90563	2020-12-14 14:13:59 +00:00
Anton Afanasyev	fac7c7ec3c	[SLP] Fix vector element size for the store chains Vector element size could be different for different store chains. This patch prevents wrong computation of maximum number of elements for that case. Differential Revision: https://reviews.llvm.org/D93192	2020-12-14 15:51:43 +03:00
Kazu Hirata	5891ad4e22	[Transforms] Use llvm::erase_value (NFC)	2020-12-13 09:48:47 -08:00
Florian Hahn	533f85767c	[VPlan] Use interleaveComma in printOperands() (NFC).	2020-12-13 16:29:16 +00:00
Roman Lebedev	d38205144f	[SimplifyCFG] FoldBranchToCommonDest(): bonus instrns must only be used by PHI nodes in successors (PR48450) In particular, if the successor block, which is about to get a new predecessor block, currently only has a single predecessor, then the bonus instructions will be directly used within said successor, which is fine, since the block with bonus instructions dominates that successor. But once there's a new predecessor, the IR is no longer valid, and we don't fix it, because we only update PHI nodes. Which means, the live-out bonus instructions must be exclusively used by the PHI nodes in successor blocks. So we have to form trivial PHI nodes. which will then be successfully updated to recieve cloned bonus instns. This all works fine, except for the fact that we don't have access to the dominator tree, and we don't ignore unreachable code, so we sometimes do end up having to deal with some weird IR. Fixes https://bugs.llvm.org/show_bug.cgi?id=48450	2020-12-13 00:06:57 +03:00
Nikita Popov	afbb6d97b5	[CVP] Simplify and generalize switch handling CVP currently handles switches by checking an equality predicate on all edges from predecessor blocks. Of course, this can only work if the value being switched over is defined in a different block. Replace this implementation with a call to getPredicateAt(), which also does the predecessor edge predicate check (if not defined in the same block), but can also do quite a bit more: It can reason about phi-nodes by checking edge predicates for incoming values, it can reason about assumes, and it can reason about block values. As such, this makes the implementation both simpler and more powerful. The compile-time impact on CTMark is in the noise.	2020-12-12 21:12:27 +01:00
Kazu Hirata	215c1b1935	[Transforms] Use is_contained (NFC)	2020-12-12 09:37:49 -08:00
David Green	ab97c9bdb7	[LV] Fix scalar cost for tail predicated loops When it comes to the scalar cost of any predicated block, the loop vectorizer by default regards this predication as a sign that it is looking at an if-conversion and divides the scalar cost of the block by 2, assuming it would only be executed half the time. This however makes no sense if the predication has been introduced to tail predicate the loop. Original patch by Anna Welker Differential Revision: https://reviews.llvm.org/D86452	2020-12-12 14:21:40 +00:00
Fangrui Song	b5ad32ef5c	Migrate deprecated DebugLoc::get to DILocation::get This migrates all LLVM (except Kaleidoscope and CodeGen/StackProtector.cpp) DebugLoc::get to DILocation::get. The CodeGen/StackProtector.cpp usage may have a nullptr Scope and can trigger an assertion failure, so I don't migrate it. Reviewed By: #debug-info, dblaikie Differential Revision: https://reviews.llvm.org/D93087	2020-12-11 12:45:22 -08:00
Marco Elver	c28b18af19	[KernelAddressSanitizer] Fix globals exclusion for indirect aliases GlobalAlias::getAliasee() may not always point directly to a GlobalVariable. In such cases, try to find the canonical GlobalVariable that the alias refers to. Link: https://github.com/ClangBuiltLinux/linux/issues/1208 Reviewed By: dvyukov, nickdesaulniers Differential Revision: https://reviews.llvm.org/D92846	2020-12-11 12:20:40 +01:00
David Sherwood	9b76160e53	[Support] Introduce a new InstructionCost class This is the first in a series of patches that attempts to migrate existing cost instructions to return a new InstructionCost class in place of a simple integer. This new class is intended to be as light-weight and simple as possible, with a full range of arithmetic and comparison operators that largely mirror the same sets of operations on basic types, such as integers. The main advantage to using an InstructionCost is that it can encode a particular cost state in addition to a value. The initial implementation only has two states - Normal and Invalid - but these could be expanded over time if necessary. An invalid state can be used to represent an unknown cost or an instruction that is prohibitively expensive. This patch adds the new class and changes the getInstructionCost interface to return the new class. Other cost functions, such as getUserCost, etc., will be migrated in future patches as I believe this to be less disruptive. One benefit of this new class is that it provides a way to unify many of the magic costs in the codebase where the cost is set to a deliberately high number to prevent optimisations taking place, e.g. vectorization. It also provides a route to represent the extremely high, and unknown, cost of scalarization of scalable vectors, which is not currently supported. Differential Revision: https://reviews.llvm.org/D91174	2020-12-11 08:12:54 +00:00
Hongtao Yu	705a4c149d	[CSSPGO] Pseudo probe encoding and emission. This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. ELF object emission The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool. The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index. Assembling Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file. A example assembly looks like: ``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq %rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ``` With inlining turned on, the assembly may look different around %bb2 with an inlined probe: ``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ``` Disassembling* We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file. An example disassembly looks like: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878	2020-12-10 17:29:28 -08:00
Mitch Phillips	7ead5f5aa3	Revert "[CSSPGO] Pseudo probe encoding and emission." This reverts commit `b035513c06`. Reason: Broke the ASan buildbots: http://lab.llvm.org:8011/#/builders/5/builds/2269	2020-12-10 15:53:39 -08:00
Zequan Wu	b5216b2950	[PGO] Enable preinline and cleanup when optimize for size Differential Revision: https://reviews.llvm.org/D91673	2020-12-10 12:29:17 -08:00
Sanjay Patel	4f051fe374	[InstCombine] avoid crash sinking to unreachable block The test is reduced from the example in D82005. Similar to `94f6d365e`, the test here would assert in the DomTree when we tried to convert a select to a phi with an unreachable block operand. We may want to add some kind of guard code in DomTree itself to avoid this sort of problem.	2020-12-10 13:10:26 -05:00
Sanjay Patel	12b684ae02	[VectorCombine] improve readability; NFC If we are going to allow adjusting the pointer for GEPs, rearranging the code a bit will make it easier to follow.	2020-12-10 13:10:26 -05:00
Hongtao Yu	b035513c06	[CSSPGO] Pseudo probe encoding and emission. This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. ELF object emission The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool. The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index. Assembling Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file. A example assembly looks like: ``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq %rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ``` With inlining turned on, the assembly may look different around %bb2 with an inlined probe: ``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ``` Disassembling* We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file. An example disassembly looks like: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878	2020-12-10 09:50:08 -08:00
Jun Ma	137674f882	[TruncInstCombine] Remove scalable vector restriction Differential Revision: https://reviews.llvm.org/D92819	2020-12-10 18:00:19 +08:00
Jianzhou Zhao	ea981165a4	[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char, int> get_token(char p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************ Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. ************************* * An potential implementation plan ************************* The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. ************* * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert\|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261	2020-12-09 19:38:35 +00:00
Sanjay Patel	b2ef264096	[VectorCombine] allow peeking through an extractelt when creating a vector load This is an enhancement to load vectorization that is motivated by a pattern in https://llvm.org/PR16739. Unfortunately, it's still not enough to make a difference there. We will have to handle multi-use cases in some better way to avoid creating multiple overlapping loads. Differential Revision: https://reviews.llvm.org/D92858	2020-12-09 10:36:14 -05:00
Roman Lebedev	e6f2a79d7a	[InstCombine] canonicalizeSaturatedAdd(): last fold is only valid for strict comparison (PR48390) We could create uadd.sat under incorrect circumstances if a select with -1 as the false value was canonicalized by swapping the T/F values. Unlike the other transforms in the same function, it is not invariant to equality. Some alive proofs: https://alive2.llvm.org/ce/z/emmKKL Based on original patch by David Green! Fixes https://bugs.llvm.org/show_bug.cgi?id=48390 Differential Revision: https://reviews.llvm.org/D92717	2020-12-09 18:19:09 +03:00
Anton Afanasyev	e5bf2e8989	[SLP] Use the width of value truncated just before storing For stores chain vectorization we choose the size of vector elements to ensure we fit to minimum and maximum vector register size for the number of elements given. This patch corrects vector element size choosing the width of value truncated just before storing instead of the width of value stored. Fixes PR46983 Differential Revision: https://reviews.llvm.org/D92824	2020-12-09 16:38:45 +03:00
Sander de Smalen	d568cff696	[LoopVectorizer][SVE] Vectorize a simple loop with with a scalable VF. * Steps are scaled by `vscale`, a runtime value. * Changes to circumvent the cost-model for now (temporary) so that the cost-model can be implemented separately. This can vectorize the following loop [1]: void loop(int N, double a, double b) { #pragma clang loop vectorize_width(4, scalable) for (int i = 0; i < N; i++) { a[i] = b[i] + 1.0; } } [1] This source-level example is based on the pragma proposed separately in D89031. This patch only implements the LLVM part. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91077	2020-12-09 11:25:21 +00:00
Sander de Smalen	adc37145de	[LoopVectorizer] NFC: Remove unnecessary asserts that VF cannot be scalable. This patch removes a number of asserts that VF is not scalable, even though the code where this assert lives does nothing that prevents VF being scalable. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91060	2020-12-09 11:25:21 +00:00
Joe Ellis	80c33de2d3	[SelectionDAG] Add llvm.vector.{extract,insert} intrinsics This commit adds two new intrinsics. - llvm.experimental.vector.insert: used to insert a vector into another vector starting at a given index. - llvm.experimental.vector.extract: used to extract a subvector from a larger vector starting from a given index. The codegen work for these intrinsics has already been completed; this commit is simply exposing the existing ISD nodes to LLVM IR. Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D91362	2020-12-09 11:08:41 +00:00
Philip Reames	5171b7b40e	[indvars] Common a bit of code [NFC]	2020-12-08 15:25:48 -08:00
Anna Thomas	29356e3279	[ScalarizeMaskedMemIntrin] Add new PM support This patch adds new PM support for the pass and the pass can be now used during middle-end transforms. The old pass is remamed to ScalarizeMaskedMemIntrinLegacyPass. Reviewed-By: skatkov, aeubanks Differential Revision: https://reviews.llvm.org/D92743	2020-12-08 17:15:22 -05:00
Benjamin Kramer	5f18e2f31e	Move createScalarizeMaskedMemIntrinPass to Scalar.h	2020-12-08 19:08:09 +01:00
Benjamin Kramer	10987e30be	Remove unused include. NFC. This is also a layering violation.	2020-12-08 19:03:56 +01:00
Anna Thomas	09f2f9605f	[ScalarizeMaskedMemIntrinsic] Move from CodeGen into Transforms ScalarizeMaskedMemIntrinsic is currently a codeGen level pass. The pass is actually operating on IR level and does not use any code gen specific passes. It is useful to move it into transforms directory so that it can be more widely used as a mid-level transform as well (apart from usage in codegen pipeline). In particular, we have a usecase downstream where we would like to use this pass in our mid-level pipeline which operates on IR level. The next change will be to add support for new PM. Reviewers: craig.topper, apilipenko, skatkov Reviewed-By: skatkov Differential Revision: https://reviews.llvm.org/D92407	2020-12-08 12:25:58 -05:00
Xun Li	31e60b9133	[coroutine] should disable inline before calling coro split This is a rework of D85812, which didn't land. When callee coroutine function is inlined into caller coroutine function before coro-split pass, llvm will emits "coroutine should have exactly one defining @llvm.coro.begin". It seems that coro-early pass can not handle this quiet well. So we believe that unsplited coroutine function should not be inlined. This patch fix such issue by not inlining function if it has attribute "coroutine.presplit" (it means the function has not been splited) to fix this issue test plan: check-llvm, check-clang In D85812, there was suggestions on moving the macros to Attributes.td to avoid circular header dependency issue. I believe it's not worth doing just to be able to use one constant string in one place. Today, there are already 3 possible attribute values for "coroutine.presplit": `c6543cc6b8/llvm/lib/Transforms/Coroutines/CoroInternal.h (L40-L42)` If we move them into Attributes.td, we would be adding 3 new attributes to EnumAttr, just to support this, which I think is an overkill. Instead, I think the best way to do this is to add an API in Function class that checks whether this function is a coroutine, by checking the attribute by name directly. Differential Revision: https://reviews.llvm.org/D92706	2020-12-08 08:53:08 -08:00
Teresa Johnson	77b509710c	[ICP] Don't promote when target not defined in module This guards against cases where the symbol was dead code eliminated in the binary by ThinLTO, and we have a sample profile collected for one binary but used to optimize another. Most of the benefit from ICP comes from inlining the target, which we can't do with only a declaration anyway. If this is in the pre-ThinLTO link step (e.g. for instrumentation based PGO), we will attempt the promotion again in the ThinLTO backend after importing anyway, and we don't need the early promotion to facilitate that. Differential Revision: https://reviews.llvm.org/D92804	2020-12-08 07:45:36 -08:00
Sjoerd Meijer	1e260f955d	[LICM][docs] Document that LICM is also a canonicalization transform. NFC. This documents that LICM is a canonicalization transform, which we discussed recently in: http://lists.llvm.org/pipermail/llvm-dev/2020-December/147184.html but which was also discused earlier, e.g. in: http://lists.llvm.org/pipermail/llvm-dev/2019-September/135058.html	2020-12-08 11:56:35 +00:00
Evgeniy Brevnov	2d1b024d06	[DSE][NFC] Need to be carefull mixing signed and unsigned types Currently in some places we use signed type to represent size of an access and put explicit casts from unsigned to signed. For example: int64_t EarlierSize = int64_t(Loc.Size.getValue()); Even though it doesn't loos bits (immidiatly) it may overflow and we end up with negative size. Potentially that cause later code to work incorrectly. A simple expample is a check that size is not negative. I think it would be safer and clearer if we use unsigned type for the size and handle it appropriately. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D92648	2020-12-08 16:53:37 +07:00
Valentin Churavy	700cf7dcc9	[VNCoercion] Disallow coercion between different ni addrspaces I'm not sure if it would be legal by the IR reference to introduce an addrspacecast here, since the IR reference is a bit vague on the exact semantics, but at least for our usage of it (and I suspect for many other's usage) it is not. For us, addrspacecasts between non-integral address spaces carry frontend information that the optimizer cannot deduce afterwards in a generic way (though we have frontend specific passes in our pipline that do propagate these). In any case, I'm sure nobody is using it this way at the moment, since it would have introduced inttoptrs, which are definitely illegal. Fixes PR38375 Co-authored-by: Keno Fischer <keno@alumni.harvard.edu> Reviewed By: reames Differential Revision: https://reviews.llvm.org/D50010	2020-12-07 20:19:48 -05:00
Sanjay Patel	5fe1a49f96	[SLP] fix typo in debug string; NFC	2020-12-07 15:09:21 -05:00
Bardia Mahjour	4db9b78c81	[LV] Epilogue Vectorization with Optimal Control Flow - Default Enablement This patch enables epilogue vectorization by default per reviewer requests. Differential Revision: https://reviews.llvm.org/D89566	2020-12-07 14:29:36 -05:00
Florian Hahn	32825e8636	[ConstraintElimination] Tweak placement in pipeline. This patch adds the ConstraintElimination pass to the LTO pipeline and also runs it after SCCP in the function simplification pipeline. This increases the number of cases we can elimination. Pending further tuning.	2020-12-07 19:08:40 +00:00
Simon Pilgrim	50dd1dba6e	[IPO] Fix operator precedence warning. NFCI. Check the entire assertion condition before && with the message.	2020-12-07 18:23:54 +00:00
Alexey Bataev	438682de6a	[SLP]Merge reorder and reuse shuffles. It is possible to merge reuse and reorder shuffles and reduce the total cost of the ivectorization tree/number of final instructions. Differential Revision: https://reviews.llvm.org/D92668	2020-12-07 07:50:00 -08:00
Jun Ma	216689ace7	[Coroutines] Add DW_OP_deref for transformed dbg.value intrinsic. Differential Revision: https://reviews.llvm.org/D92462	2020-12-07 10:24:44 +08:00
Craig Topper	305fcc9122	[LoopIdiomRecognize] Merge a conditional operator with an earlier if and remove an extra temporary variable. NFC The CountPrev variable was only used to forward a value from the if statement to the conditional operator under the same condition. While there move some variable declarations to their first assignment.	2020-12-06 15:23:18 -08:00
Fangrui Song	2832f3528c	[Transforms] Delete unused declarations from NewGVN/CoroSplit/ValueMapper	2020-12-06 13:04:01 -08:00
Wenlei He	6b989a1710	[CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining This change adds the context-senstive sample PGO infracture described in CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s). It introduced an abstraction between input profile and profile loader that queries input profile for functions. Specifically, there's now the notion of base profile and context profile, and they are managed by the new SampleContextTracker for adjusting and merging profiles based on inline decisions. It works with top-down profiled guided inliner in profile loader (https://reviews.llvm.org/D70655) for better inlining with specialization and better post-inline profile fidelity. In the future, we can also expose this infrastructure to CGSCC inliner in order for it to take advantage of context-sensitive profile. This change is the consumption part of context-sensitive profile (The generation part is in this stack: https://reviews.llvm.org/D89707). We've seen good results internally in conjunction with Pseudo-probe (https://reviews.llvm.org/D86193). Pacthes for integration with Pseudo-probe coming up soon. Currently the new infrastructure kick in when input profile contains the new context-sensitive profile; otherwise it's no-op and does not affect existing AutoFDO. Interface There're two sets of interfaces for query and tracking respectively exposed from SampleContextTracker. For query, now instead of simply getting a profile from input for a function, we can explicitly query base profile or context profile for given call path of a function. For tracking, there're separate APIs for marking context profile as inlined, or promoting and merging not inlined context profile. - Query base profile (`getBaseSamplesFor`) Base profile is the merged synthetic profile for function's CFG profile from any outstanding (not inlined) context. We can query base profile by function. - Query context profile (`getContextSamplesFor`) Context profile is a function's CFG profile for a given calling context. We can query context profile by context string. - Track inlined context profile (`markContextSamplesInlined`) When a function is inlined for given calling context, we need to mark the context profile for that context as inlined. This is to make sure we don't include inlined context profile when synthesizing base profile for that inlined function. - Track not-inlined context profile (`promoteMergeContextSamplesTree`) When a function is not inlined for given calling context, we need to promote the context profile tree so the not inlined context becomes top-level context. This preserve the sub-context under that function so later inline decision for that not inlined function will still have context profile for its call tree. Note that profile will be merged if needed when promoting a context profile tree if any of the node already exists at its promoted destination. Implementation Implementation-wise, `SampleContext` is created as abstraction for context. Currently it's a string for call path, and we can later optimize it to something more efficient, e.g. context id. Each `SampleContext` also has a `ContextState` indicating whether it's raw context profile from input, whether it's inlined or merged, whether it's synthetic profile created by compiler. Each `FunctionSamples` now has a `SampleContext` that tells whether it's base profile or context profile, and for context profile what is the context and state. On top of the above context representation, a custom trie tree is implemented to track and manager context profiles. Specifically, `SampleContextTracker` is implemented that encapsulates a trie tree with `ContextTireNode` as node. Each node of the trie tree represents a frame in calling context, thus the path from root to a node represents a valid calling context. We also track `FunctionSamples` for each node, so this trie tree can serve efficient query for context profile. Accordingly, context profile tree promotion now becomes moving a subtree to be under the root of entire tree, and merge nodes for subtree if this move encounters existing nodes. Integration `SampleContextTracker` is now also integrated with AutoFDO, `SampleProfileReader` and `SampleProfileLoader`. When we detected input profile contains context-sensitive profile, `SampleContextTracker` will be used to track profiles, and all profile query will go to `SampleContextTracker` instead of `SampleProfileReader` automatically. Tracking APIs are called automatically for each inline decision from `SampleProfileLoader`. Differential Revision: https://reviews.llvm.org/D90125	2020-12-06 11:49:18 -08:00
Kazu Hirata	ddb002d7c7	[InstCombine] Remove replacePointer (NFC) The declaration was introduced on Feb 10, 2017 in commit `ba01ed00fe` without a corresponding definition.	2020-12-06 10:24:08 -08:00
Sanjay Patel	94f6d365e4	[InstCombine] avoid crash on phi with unreachable incoming block (PR48369)	2020-12-06 09:31:47 -05:00
Fangrui Song	204d0d51b3	[MemProf] Make __memprof_shadow_memory_dynamic_address dso_local in static relocation model The x86-64 backend currently has a bug which uses a wrong register when for the GOTPCREL reference. The program will crash without the dso_local specifier.	2020-12-05 21:36:31 -08:00
Florian Hahn	4ceecc820b	[ConstraintElimination] Handle constraints with all zero var coeffs. Constraints where all variable coefficients are 0 do not add any useful information. When checking, we can check if they are always true/false.	2020-12-05 12:06:53 +00:00
Kazu Hirata	8006043b13	[IRCE] Remove unused IsSigned and its accessor (NFC) IsSigned and its accessor, isSigned, were introduced on Oct 25, 2017 in commit `9ac7021a25`. The last use was removed on Nov 20, 2017 in commit `268467869b`.	2020-12-04 21:26:12 -08:00
Jianzhou Zhao	a28db8b27a	[dfsan] Add empty APIs for field-level shadow This is a child diff of D92261. This diff adds APIs that return shadow type/value/zero from origin objects. For the time being these APIs simply returns primitive shadow type/value/zero. The following diff will be implementing the conversion. As D92261 explains, some cases still use primitive shadow during the incremential changes. The cases include 1) alloca/load/store 2) custom function IO 3) vectors At the cases this diff does not use the new APIs, but uses primitive shadow objects explicitly. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92629	2020-12-04 21:42:07 +00:00
Duncan P. N. Exon Smith	d10f9863a5	ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC Prepare to delete `AlignedCharArrayUnion` by migrating its users over to `std::aligned_union_t`. I will delete `AlignedCharArrayUnion` and its tests in a follow-up commit so that it's easier to revert in isolation in case some downstream wants to keep using it. Differential Revision: https://reviews.llvm.org/D92516	2020-12-04 12:34:49 -08:00
Duncan P. N. Exon Smith	5b267fb796	ADT: Stop peeking inside AlignedCharArrayUnion, NFC Update all the users of `AlignedCharArrayUnion` to stop peeking inside (to look at `buffer`) so that a follow-up patch can replace it with an alias to `std::aligned_union_t`. This was reviewed as part of https://reviews.llvm.org/D92512, but I'm splitting this bit out to commit first to reduce churn in case the change to `AlignedCharArrayUnion` needs to be reverted for some unexpected reason.	2020-12-04 11:07:42 -08:00
Hiroshi Yamauchi	f9c3954a6e	Fix for Bug 48055. Differential Revision: https://reviews.llvm.org/D92599	2020-12-04 11:05:01 -08:00
Arthur Eubanks	7f6f9f4cf9	[NewPM] Make pass adaptors less templatey Currently PassBuilder.cpp is by far the file that takes longest to compile. This is due to tons of templates being instantiated per pass. Follow PassManager by using wrappers around passes to avoid making the adaptors templated on the pass type. This allows us to move various adaptors' run methods into .cpp files. This reduces the compile time of PassBuilder.cpp on my machine from 66 to 39 seconds. It also reduces the size of opt from 685M to 676M. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D92616	2020-12-04 08:30:50 -08:00
Evgeniy Brevnov	061cebb46f	[NFC][NARY-REASSOCIATE] Restructure code to aviod isPotentiallyReassociatable Currently we have to duplicate the same checks in isPotentiallyReassociatable and tryReassociate. With simple pattern like add/mul this may be not a big deal. But the situation gets much worse when I try to add support for min/max. Min/Max may be represented by several instructions and can take different forms. In order reduce complexity for upcoming min/max support we need to restructure the code a bit to avoid mentioned code duplication. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D88286	2020-12-04 16:19:43 +07:00
Evgeniy Brevnov	f61c29b3a7	[NARY-REASSOCIATE] Simplify traversal logic by post deleting dead instructions Currently we delete optimized instructions as we go. That has several negative consequences. First it complicates traversal logic itself. Second if newly generated instruction has been deleted the traversal is repeated from scratch. But real motivation for the change is upcoming change with support for min/max reassociation. Here we employ SCEV expander to generate code. As a result newly generated instructions may be inserted not right before original instruction (because SCEV may do hoisting) and there is no way to know 'next' instruction. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D88285	2020-12-04 16:17:50 +07:00
Kazu Hirata	e2fc11cf9f	[JumpThreading] Call eraseBlock when folding a conditional branch This patch teaches the jump threading pass to call BPI->eraseBlock when it folds a conditional branch. Without this patch, BranchProbabilityInfo could end up with stale edge probabilities for the basic block containing the conditional branch -- one edge probability with less than 1.0 and the other for a removed edge. Differential Revision: https://reviews.llvm.org/D92608	2020-12-03 23:50:17 -08:00
Max Kazantsev	12b6c5e682	Return "[IndVars] ICmpInst should not prevent IV widening" This reverts commit `4bd35cdc3a`. The patch was reverted during the investigation. The investigation shown that the patch did not cause any trouble, but just exposed the existing problem that is addressed by the previous patch "[IndVars] Quick fix LHS/RHS bug". Returning without changes.	2020-12-04 12:34:43 +07:00
Max Kazantsev	3df0daceb2	[IndVars] Quick fix LHS/RHS bug The code relies on fact that LHS is the NarrowDef but never really checks it. Adding the conservative restrictive check, will follow-up with handling of case where RHS is a NarrowDef.	2020-12-04 12:34:42 +07:00
Jianzhou Zhao	80e326a8c4	[dfsan] Support passing non-i16 shadow values in TLS mode This is a child diff of D92261. It extended TLS arg/ret to work with aggregate types. For a function t foo(t1 a1, t2 a2, ... tn an) Its arguments shadow are saved in TLS args like a1_s, a2_s, ..., an_s TLS ret simply includes r_s. By calculating the type size of each shadow value, we can get their offset. This is similar to what MSan does. See __msan_retval_tls and __msan_param_tls from llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp. Note that this change does not add test cases for overflowed TLS arg/ret because this is hard to test w/o supporting aggregate shdow types. We will be adding them after supporting that. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92440	2020-12-04 02:45:07 +00:00
Philip Reames	0c866a3d6a	[LoopVec] Support non-instructions as argument to uniform mem ops The initial step of the uniform-after-vectorization (lane-0 demanded only) analysis was very awkwardly written. It would revisit use list of each pointer operand of a widened load/store. As a result, it was in the worst case O(N^2) where N was the number of instructions in a loop, and had restricted operand Value types to reduce the size of use lists. This patch replaces the original algorithm with one which is at most O(2N) in the number of instructions in the loop. (The key observation is that each use of a potentially interesting pointer is visited at most twice, once on first scan, once in the use list of it's operand. Only instructions within the loop have their uses scanned.) In the process, we remove a restriction which required the operand of the uniform mem op to itself be an instruction. This allows detection of uniform mem ops involving global addresses. Differential Revision: https://reviews.llvm.org/D92056	2020-12-03 14:51:44 -08:00
dfukalov	2ce38b3f03	[NFC] Reduce include files dependency. 1. Removed #include "...AliasAnalysis.h" in other headers and modules. 2. Cleaned up includes in AliasAnalysis.h. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92489	2020-12-03 18:25:05 +03:00
Max Kazantsev	4bd35cdc3a	Revert "[IndVars] ICmpInst should not prevent IV widening" This reverts commit `0c9c6ddf17`. We are seeing some failures with this patch locally. Not clear if it's causing them or just triggering a problem in another place. Reverting while investigating.	2020-12-03 18:01:41 +07:00
modimo	c1ba991e8d	[NFC] Fix typo	2020-12-02 22:23:57 -08:00
Jianzhou Zhao	bd726d2796	[dfsan] Rename ShadowTy/ZeroShadow with prefix Primitive This is a child diff of D92261. After supporting field/index-level shadow, the existing shadow with type i16 works for only primitive types. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92459	2020-12-03 05:31:01 +00:00
Florian Hahn	2304528bb5	[ConstraintElimination] Make sure arguments of std:pow match. This should fix a build failure on some systems, e.g. solaris11-sparcv9 http://lab.llvm.org:8014/#/builders/22	2020-12-02 22:23:26 +00:00
Hongtao Yu	24d4291ca7	[CSSPGO] Pseudo probes for function calls. An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work. One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place , leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution. With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a calliste probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst. To avoid collision with the baseline AutoFDO in various places that handles dwarf discriminators where a check against the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reversed for pseudo probe use. Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`. Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91756	2020-12-02 13:45:20 -08:00
Jianzhou Zhao	dad5d95883	[dfsan] Rename CachedCombinedShadow to be CachedShadow At D92261, this type will be used to cache both combined shadow and converted shadow values. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92458	2020-12-02 21:39:16 +00:00
jasonliu	a65d8c5d72	[XCOFF][AIX] Generate LSDA data and compact unwind section on AIX Summary: AIX uses the existing EH infrastructure in clang and llvm. The major differences would be 1. AIX do not have CFI instructions. 2. AIX uses a new personality routine, named __xlcxx_personality_v1. It doesn't use the GCC personality rountine, because the interoperability is not there yet on AIX. 3. AIX do not use eh_frame sections. Instead, it would use a eh_info section (compat unwind section) to store the information about personality routine and LSDA data address. Reviewed By: daltenty, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D91455	2020-12-02 18:42:44 +00:00
Bardia Mahjour	a7e2c26939	[LV] Epilogue Vectorization with Optimal Control Flow (Recommit) This is yet another attempt at providing support for epilogue vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none and reviews D30247 and D88819. Similar to D88819, this patch achieve epilogue vectorization by executing a single vplan twice: once on the main loop and a second time on the epilogue loop (using a different VF). However it's able to handle more loops, and generates more optimal control flow for cases where the trip count is too small to execute any code in vector form. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D89566	2020-12-02 10:09:56 -05:00
Sanjay Patel	56fd29e93b	[SLP] use 'match' for binop/select; NFC This might be a small improvement in readability, but the real motivation is to make it easier to adapt the code to deal with intrinsics like 'maxnum' and/or integer min/max. There is potentially help in doing that with D92086, but we might also just add specialized wrappers here to deal with the expected patterns.	2020-12-02 09:04:08 -05:00
Alex Zinenko	240dd92432	[OpenMPIRBuilder] forward arguments as pointers to outlined function OpenMPIRBuilder::createParallel outlines the body region of the parallel construct into a new function that accepts any value previously defined outside the region as a function argument. This function is called back by OpenMP runtime function __kmpc_fork_call, which expects trailing arguments to be pointers. If the region uses a value that is not of a pointer type, e.g. a struct, the produced code would be invalid. In such cases, make createParallel emit IR that stores the value on stack and pass the pointer to the outlined function instead. The outlined function then loads the value back and uses as normal. Reviewed By: jdoerfert, llitchev Differential Revision: https://reviews.llvm.org/D92189	2020-12-02 14:59:41 +01:00
David Sherwood	71bd59f0cb	[SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute In this patch I have added support for a new loop hint called vectorize.scalable.enable that says whether we should enable scalable vectorization or not. If a user wants to instruct the compiler to vectorize a loop with scalable vectors they can now do this as follows: br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2 ... !2 = !{!2, !3, !4} !3 = !{!"llvm.loop.vectorize.width", i32 8} !4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true} Setting the hint to false simply reverts the behaviour back to the default, using fixed width vectors. Differential Revision: https://reviews.llvm.org/D88962	2020-12-02 13:23:43 +00:00
Chen Zheng	3cb7d62452	[LSR][NFC] don't collect chains when isNumRegsMajorCostOfLSR is false. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D92159	2020-12-01 22:29:33 -05:00
Jianzhou Zhao	405ea2b93d	[msan] Replace 8 by kShadowTLSAlignment Reviewed-by: eugenis Differential Revision: https://reviews.llvm.org/D92275	2020-12-02 01:09:49 +00:00
Fangrui Song	a5309438fe	static const char *const foo => const char foo[] By default, a non-template variable of non-volatile const-qualified type having namespace-scope has internal linkage, so no need for `static`.	2020-12-01 10:33:18 -08:00
Bardia Mahjour	c94af03f7f	Revert "[LV] Epilogue Vectorization with Optimal Control Flow" This reverts commit `9c5504adce`. Reverting to investigate build failure in http://lab.llvm.org:8011/#/builders/98/builds/1461/steps/9	2020-12-01 12:50:36 -05:00
Bardia Mahjour	9c5504adce	[LV] Epilogue Vectorization with Optimal Control Flow This is yet another attempt at providing support for epilogue vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none and reviews D30247 and D88819. Similar to D88819, this patch achieve epilogue vectorization by executing a single vplan twice: once on the main loop and a second time on the epilogue loop (using a different VF). However it's able to handle more loops, and generates more optimal control flow for cases where the trip count is too small to execute any code in vector form. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D89566	2020-12-01 12:04:29 -05:00
Nikita Popov	624af932a8	[MemCpyOpt] Port to MemorySSA This is a straightforward port of MemCpyOpt to MemorySSA following the approach of D26739. MemDep queries are replaced with MSSA queries without changing the overall structure of the pass. Some care has to be taken to account for differences between these APIs (MemDep also returns reads, MSSA doesn't). Differential Revision: https://reviews.llvm.org/D89207	2020-12-01 17:57:41 +01:00
Clement Courbet	735e6c888e	[MergeICmps] Fix missing split. We were not correctly splitting a blocks for chains of length 1. Before that change, additional instructions for blocks in chains of length 1 were not split off from the block before removing (this was done correctly for chains of longer size). If this first block contained an instruction referenced elsewhere, deleting the block, would result in invalidation of the produced value. This caused a miscompile which motivated D92297 (before D17993, nonnull and dereferenceable attributed were not added so MergeICmps were not triggered.) The new test gep-references-bb.ll demonstrate the issue. The regression was introduced in rG0efadbbcdeb82f5c14f38fbc2826107063ca48b2. This supersedes D92364. Test case by MaskRay (Fangrui Song). Differential Revision: https://reviews.llvm.org/D92375	2020-12-01 16:50:55 +01:00
Sanjay Patel	9f60b8b3d2	[InstCombine] canonicalize sign-bit-shift of difference to ext(icmp) icmp is the preferred spelling in IR because icmp analysis is expected to be better than any other analysis. This should lead to more follow-on folding potential. It's difficult to say exactly what we should do in codegen to compensate. For example on AArch64, which of these is preferred: sub w8, w0, w1 lsr w0, w8, #31 vs: cmp w0, w1 cset w0, lt If there are perf regressions, then we should deal with those in codegen on a case-by-case basis. A possible motivating example for better optimization is shown in: https://llvm.org/PR43198 but that will require other transforms before anything changes there. Alive proof: https://rise4fun.com/Alive/o4E Name: sign-bit splat Pre: C1 == (width(%x) - 1) %s = sub nsw %x, %y %r = ashr %s, C1 => %c = icmp slt %x, %y %r = sext %c Name: sign-bit LSB Pre: C1 == (width(%x) - 1) %s = sub nsw %x, %y %r = lshr %s, C1 => %c = icmp slt %x, %y %r = zext %c	2020-12-01 09:58:11 -05:00
Florian Hahn	7a4f1d59b8	[ConstraintElimination] Decompose GEP %ptr, ZEXT(SHL()). Add support to decompose a GEP with a ZEXT(SHL()) operand.	2020-12-01 14:23:21 +00:00
Bhramar Vatsa	fd679107d6	[InstCombine] Optimize away the unnecessary multi-use sign-extend C.f. https://bugs.llvm.org/show_bug.cgi?id=47765 Added a case for handling the sign-extend (Shl+AShr) for multiple uses, to optimize it away for an individual use, when the demanded bits aren't affected by sign-extend. https://rise4fun.com/Alive/lgf Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D91343	2020-12-01 16:54:00 +03:00
Roman Lebedev	94ead0190f	[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold, 2 If the shift amount was undef for some lane, the shift amount in opposite shift is irrelevant for that lane, and the new shift amount for that lane can be undef.	2020-12-01 16:54:00 +03:00
Roman Lebedev	52533b52b8	Revert "[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold" It seems i have missed checklines, temporairly reverting, will reland momentairly.. This reverts commit `aa1aa13509`.	2020-12-01 15:47:04 +03:00
Roman Lebedev	aa1aa13509	[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold If the shift amount was undef for some lane, the shift amount in opposite shift is irrelevant for that lane, and the new shift amount for that lane can be undef.	2020-12-01 15:13:08 +03:00
Roman Lebedev	8e29e20e0d	[InstCombine] Evaluate new shift amount for sext(ashr(shl(trunc()))) fold in wide type (PR48343) It is not correct to compute that new shift amount in it's narrow type and only then extend it into the wide type: ---------------------------------------- Optimization: PR48343 good Precondition: (width(%X) == width(%r)) %o0 = trunc %X %o1 = shl %o0, %Y %o2 = ashr %o1, %Y %r = sext %o2 => %n0 = sext %Y %n1 = sub width(%o0), %n0 %n2 = sub width(%X), %n1 %n3 = shl %X, %n2 %r = ashr %n3, %n2 Done: 2016 Optimization is correct! ---------------------------------------- Optimization: PR48343 bad Precondition: (width(%X) == width(%r)) %o0 = trunc %X %o1 = shl %o0, %Y %o2 = ashr %o1, %Y %r = sext %o2 => %n0 = sub width(%o0), %Y %n1 = sub width(%X), %n0 %n2 = sext %n1 %n3 = shl %X, %n2 %r = ashr %n3, %n2 Done: 1 ERROR: Domain of definedness of Target is smaller than Source's for i9 %r Example: %X i9 = 0x000 (0) %Y i4 = 0x3 (3) %o0 i4 = 0x0 (0) %o1 i4 = 0x0 (0) %o2 i4 = 0x0 (0) %n0 i4 = 0x1 (1) %n1 i4 = 0x8 (8, -8) %n2 i9 = 0x1F8 (504, -8) %n3 i9 = 0x000 (0) Source value: 0x000 (0) Target value: undef I.e. we should be computing it in the wide type from the beginning. Fixes https://bugs.llvm.org/show_bug.cgi?id=48343	2020-12-01 15:13:07 +03:00
Roman Lebedev	15f8060f6f	[SimplifyCFG] FoldBranchToCommonDest: don't require that cmp of br is last instruction There is no correctness need for that, and since we allow live-out uses, this could theoretically happen, because currently nothing will move the cond to right before the branch in those tests. But regardless, lifting that restriction even makes the transform easier to understand. This makes the transform happen in 81 more cases (+0.55%) )	2020-12-01 15:13:06 +03:00
Cullen Rhodes	cba4accda0	[LV] Clamp VF hint when unsafe In the following loop the dependence distance is 2 and can only be vectorized if the vector length is no larger than this. void foo(int a, int b, int N) { #pragma clang loop vectorize(enable) vectorize_width(4) for (int i=0; i<N; ++i) { a[i + 2] = a[i] + b[i]; } } However, when specifying a VF of 4 via a loop hint this loop is vectorized. According to [1][2], loop hints are ignored if the optimization is not safe to apply. This patch introduces a check to bail of vectorization if the user specified VF is greater than the maximum feasible VF, unless explicitly forced with '-force-vector-width=X'. [1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave [2] https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations Reviewed By: sdesmalen, fhahn, Meinersbur Differential Revision: https://reviews.llvm.org/D90687	2020-12-01 11:30:34 +00:00
Caroline Concatto	4b0ef2b075	[NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type This patch replaces the attribute `unsigned VF` in the class IntrinsicCostAttributes by `ElementCount VF`. This is a non-functional change to help upcoming patches to compute the cost model for scalable vector inside this class. Differential Revision: https://reviews.llvm.org/D91532	2020-12-01 11:12:51 +00:00
Florian Hahn	efa9728a50	[ConstraintElimination] Decompose GEP %ptr, SHL(). Add support the decompose a GEP with an SHL operand.	2020-12-01 10:58:36 +00:00
Sjoerd Meijer	f44ba25135	ExtractValue instruction costs Instruction ExtractValue wasn't handled in LoopVectorizationCostModel::getInstructionCost(). As a result, it was modeled as a mul which is not really accurate. Since it is free (most of the times), this now gets a cost of 0 using getInstructionCost. This is a follow-up of D92208, that required changing this regression test. In a follow up I will look at InsertValue which also isn't handled yet. Differential Revision: https://reviews.llvm.org/D92317	2020-12-01 10:42:23 +00:00
Greg Parker	bcc802fa36	[DSE] Remove a redundant call to getLocForWriteEx() Differential Revision: https://reviews.llvm.org/D92263	2020-11-30 21:12:24 -08:00
Mircea Trofin	5fe10263ab	[llvm][inliner] Reuse the inliner pass to implement 'always inliner' Enable performing mandatory inlinings upfront, by reusing the same logic as the full inliner, instead of the AlwaysInliner. This has the following benefits: - reduce code duplication - one inliner codebase - open the opportunity to help the full inliner by performing additional function passes after the mandatory inlinings, but before th full inliner. Performing the mandatory inlinings first simplifies the problem the full inliner needs to solve: less call sites, more contextualization, and, depending on the additional function optimization passes run between the 2 inliners, higher accuracy of cost models / decision policies. Note that this patch does not yet enable much in terms of post-always inline function optimization. Differential Revision: https://reviews.llvm.org/D91567	2020-11-30 12:03:39 -08:00
Hongtao Yu	64fa8cce22	[CSSPGO] Pseudo probe instrumentation pass This change introduces a pseudo probe instrumentation pass for block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. Given the following LLVM IR: ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 br i1 %cmp, label %bb1, label %bb2 bb1: br label %bb3 bb2: br label %bb3 bb3: ret void } ``` The instrumented IR will look like below. Note that each llvm.pseudoprobe intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID. ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 call void @llvm.pseudoprobe(i64 837061429793323041, i64 1) br i1 %cmp, label %bb1, label %bb2 bb1: call void @llvm.pseudoprobe(i64 837061429793323041, i64 2) br label %bb3 bb2: call void @llvm.pseudoprobe(i64 837061429793323041, i64 3) br label %bb3 bb3: call void @llvm.pseudoprobe(i64 837061429793323041, i64 4) ret void } ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86499	2020-11-30 10:16:54 -08:00
Florian Hahn	fe83adb05a	[VPlan] Use VPUser to manage VPPredInstPHIRecipe operand (NFC). VPPredInstPHIRecipe is one of the recipes that was missed during the initial conversion. This patch adjusts the recipe to also manage its operand using VPUser.	2020-11-30 13:09:58 +00:00
Roman Lebedev	b0e9b7c59f	[NFC][SimplifyCFG] Add STATISTIC() to the FoldValueComparisonIntoPredecessors() fold	2020-11-30 12:27:16 +03:00
Max Kazantsev	0c9c6ddf17	[IndVars] ICmpInst should not prevent IV widening If we decided to widen IV with zext, then unsigned comparisons should not prevent widening (same for sext/sign comparisons). The result of comparison in wider type does not change in this case. Differential Revision: https://reviews.llvm.org/D92207 Reviewed By: nikic	2020-11-30 10:51:31 +07:00
Fangrui Song	5408fdcd78	[VPlan] Fix -Wunused-variable after `a813090072`	2020-11-29 10:38:01 -08:00
Florian Hahn	4bc9b909d7	[VPlan] Use VPValue and VPUser ops to print VPReplicateRecipe.	2020-11-29 18:28:27 +00:00
Florian Hahn	a813090072	[VPlan] Manage stored values of interleave groups using VPUser (NFC) Interleave groups also depend on the values they store. Manage the stored values as VPUser operands. This is currently a NFC, but is required to allow VPlan transforms and to manage generated vector values exclusively in VPTransformState.	2020-11-29 17:24:36 +00:00
Andrew Litteken	a8a43b6338	Revert "[IRSim][IROutliner] Adding the extraction basics for the IROutliner." Reverting commit due to address sanitizer errors. > Extracting the similar regions is the first step in the IROutliner. > > Using the IRSimilarityIdentifier, we collect the SimilarityGroups and > sort them by how many instructions will be removed. Each > IRSimilarityCandidate is used to define an OutlinableRegion. Each > region is ordered by their occurrence in the Module and the regions that > are not compatible with previously outlined regions are discarded. > > Each region is then extracted with the CodeExtractor into its own > function. > > We test that correctly extract in: > test/Transforms/IROutliner/extraction.ll > test/Transforms/IROutliner/address-taken.ll > test/Transforms/IROutliner/outlining-same-globals.ll > test/Transforms/IROutliner/outlining-same-constants.ll > test/Transforms/IROutliner/outlining-different-structure.ll > > Reviewers: paquette, jroelofs, yroux > > Differential Revision: https://reviews.llvm.org/D86975 This reverts commit `bf899e8913`.	2020-11-27 19:55:57 -06:00
Andrew Litteken	bf899e8913	[IRSim][IROutliner] Adding the extraction basics for the IROutliner. Extracting the similar regions is the first step in the IROutliner. Using the IRSimilarityIdentifier, we collect the SimilarityGroups and sort them by how many instructions will be removed. Each IRSimilarityCandidate is used to define an OutlinableRegion. Each region is ordered by their occurrence in the Module and the regions that are not compatible with previously outlined regions are discarded. Each region is then extracted with the CodeExtractor into its own function. We test that correctly extract in: test/Transforms/IROutliner/extraction.ll test/Transforms/IROutliner/address-taken.ll test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Reviewers: paquette, jroelofs, yroux Differential Revision: https://reviews.llvm.org/D86975	2020-11-27 19:08:29 -06:00
Florian Hahn	ae008798a4	[VPlan] Use VPTransformState::set in widenGEP. This patch updates widenGEP to manage the resulting vector values using the VPValue of VPWidenGEP recipe.	2020-11-27 17:01:55 +00:00
Francesco Petrogalli	8e0148dff7	[AllocaInst] Update `getAllocationSizeInBits` to return `TypeSize`. Reviewed By: peterwaller-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D92020	2020-11-27 16:39:10 +00:00
Sjoerd Meijer	10ad64aa3b	[SLP] Dump Tree costs. NFC. This adds LLVM_DEBUG messages to dump the (intermediate) tree cost calculations, which is useful to trace and see how the final cost is calculated.	2020-11-27 11:37:33 +00:00
Roman Lebedev	b33fbbaa34	Reland [SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions This was orginally committed in `2245fb8aaa`. but was immediately reverted in `f3abd54958` because of a PHI handling issue. Original commit message: 1. It doesn't make sense to enforce that the bonus instruction is only used once in it's basic block. What matters is whether those user instructions fit within our budget, sure, but that is another question. 2. It doesn't make sense to enforce that said bonus instructions are only used within their basic block. Perhaps the branch condition isn't using the value computed by said bonus instruction, and said bonus instruction is simply being calculated to be used in successors? So iff we can clone bonus instructions, to lift these restrictions, we just need to carefully update their external uses to use the new cloned instructions. Notably, this transform (even without this change) appears to be poison-unsafe as per alive2, but is otherwise (including the patch) legal. We don't introduce any new PHI nodes, but only "move" the instructions around, i'm not really seeing much potential for extra cost modelling for the transform, especially since now we allow at most one such bonus instruction by default. This causes the fold to fire +11.4% more (13216 -> 14725) as of vanilla llvm test-suite + RawSpeed. The motivational pattern is IEEE-754-2008 Binary16->Binary32 extension code: `ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)` ^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v That being said, even thought this seemed like this would fix it: https://godbolt.org/z/xGq3TM apparently that fold is happening somewhere else afterall, so something else also has a similar 'artificial' restriction.	2020-11-27 12:47:15 +03:00
Wang, Pengfei	8dcf8d1da5	[msan] Fix bugs when instrument x86.avx512_cvt intrinsics. Scalar intrinsics x86.avx512_cvt have an extra rounding mode operand. We can directly ignore it to reuse the SSE/AVX math. This fix the bug https://bugs.llvm.org/show_bug.cgi?id=48298. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D92206	2020-11-27 16:33:14 +08:00
Markus Lavin	808fcfe594	Revert "[DebugInfo] Improve dbg preservation in LSR." This reverts commit `06758c6a61`. Bug: https://bugs.llvm.org/show_bug.cgi?id=48166 Additional discussion in: https://reviews.llvm.org/D91711	2020-11-27 08:52:32 +01:00
Max Kazantsev	faf183874c	[IndVars] LCSSA Phi users should not prevent widening When widening an IndVar that has LCSSA Phi users outside the loop, we can safely widen it as usual and then truncate the result outside the loop without hurting the performance. Differential Revision: https://reviews.llvm.org/D91593 Reviewed By: skatkov	2020-11-27 11:19:54 +07:00
Roman Lebedev	f3abd54958	Revert "[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions" Many bots are unhappy, at the very least missed a few codegen tests, and possibly this has a logic hole inducing a miscompile (will be really awesome to have ready reproducer..) Need to investigate. This reverts commit `2245fb8aaa`.	2020-11-26 23:13:43 +03:00
Roman Lebedev	2245fb8aaa	[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions 1. It doesn't make sense to enforce that the bonus instruction is only used once in it's basic block. What matters is whether those user instructions fit within our budget, sure, but that is another question. 2. It doesn't make sense to enforce that said bonus instructions are only used within their basic block. Perhaps the branch condition isn't using the value computed by said bonus instruction, and said bonus instruction is simply being calculated to be used in successors? So iff we can clone bonus instructions, to lift these restrictions, we just need to carefully update their external uses to use the new cloned instructions. Notably, this transform (even without this change) appears to be poison-unsafe as per alive2, but is otherwise (including the patch) legal. We don't introduce any new PHI nodes, but only "move" the instructions around, i'm not really seeing much potential for extra cost modelling for the transform, especially since now we allow at most one such bonus instruction by default. This causes the fold to fire +11.4% more (13216 -> 14725) as of vanilla llvm test-suite + RawSpeed. The motivational pattern is IEEE-754-2008 Binary16->Binary32 extension code: `ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)` ^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v That being said, even thought this seemed like this would fix it: https://godbolt.org/z/xGq3TM apparently that fold is happening somewhere else afterall, so something else also has a similar 'artificial' restriction.	2020-11-26 22:51:22 +03:00
Roman Lebedev	65db7d38e0	[NFC][SimplifyCFG] Add statistic to `FoldBranchToCommonDest()` fold	2020-11-26 22:51:21 +03:00
Nikita Popov	4df8efce80	[AA] Split up LocationSize::unknown() Currently, we have some confusion in the codebase regarding the meaning of LocationSize::unknown(): Some parts (including most of BasicAA) assume that LocationSize::unknown() only allows accesses after the base pointer. Some parts (various callers of AA) assume that LocationSize::unknown() allows accesses both before and after the base pointer (but within the underlying object). This patch splits up LocationSize::unknown() into LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer() to make this completely unambiguous. I tried my best to determine which one is appropriate for all the existing uses. The test changes in cs-cs.ll in particular illustrate a previously clearly incorrect AA result: We were effectively assuming that argmemonly functions were only allowed to access their arguments after the passed pointer, but not before it. I'm pretty sure that this was not intentional, and it's certainly not specified by LangRef that way. Differential Revision: https://reviews.llvm.org/D91649	2020-11-26 18:39:55 +01:00
Florian Hahn	bd0b1311db	[VPlan] Turn VPReplicateRecipe into a VPValue. Update VPReplicateRecipe to inherit from VPValue. This still does not update scalarizeInstruction to set the result for the VPValue of VPReplicateRecipe, because this first requires tracking scalar values in VPTransformState. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D91500	2020-11-26 13:50:24 +00:00
David Stenberg	384996f9e1	[IndVarSimplify] Fix Modified status when handling dead PHI nodes When bailing out in rewriteLoopExitValues() you could be left with PHI nodes in the DeadInsts vector. Those would be not handled by the use of RecursivelyDeleteTriviallyDeadInstructions() in IndVarSimplify. This resulted in the IndVarSimplify pass returning an incorrect modified status. This was caught by the expensive check introduced in D86589. This patches changes IndVarSimplify so that it deletes those PHI nodes, using RecursivelyDeleteDeadPHINode(). This fixes PR47486. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D91153	2020-11-26 14:28:21 +01:00
Zhengyang Liu	345fcccb33	Fix use-of-uninitialized-value in rG75f50e15bf8f Differential Revision: https://reviews.llvm.org/D71126	2020-11-26 01:39:22 -07:00
Max Kazantsev	664e1da485	[LoopLoadElim] Make sure all loops are in simplify form. PR48150 LoopLoadElim may end up expanding an AddRec from a loop which is not the current loop. This loop may not be in simplify form. We figure it out after the no-return point, so cannot bail in this case. AddRec requires simplify form to expand. The only way to ensure this does not crash is to simplify all loops beforehand. The issue only exists in new PM. Old PM requests LoopSimplify required pass and it simplifies all loops before the opt begins. Differential Revision: https://reviews.llvm.org/D91525 Reviewed By: asbirlea, aeubanks	2020-11-26 10:51:11 +07:00
Roman Lebedev	a8d74517dc	[PassManager] Run Induction Variable Simplification pass after Recognize loop idioms pass, not before Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does. I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand, but i'm very sure that `-indvars` requires `-loop-idiom` to run afterwards, as it can be seen in the phase-ordering test. LoopIdiom runs on two types of loops: countable ones, and uncountable ones. For uncountable ones, IndVars obviously didn't make any change to them, since they are uncountable, so for them the order should be irrelevant. For countable ones, well, they should have been countable before IndVars for IndVars to make any change to them, and since SCEV is used on them, it shouldn't matter if IndVars have already canonicalized them. So i don't really see why we'd want the current ordering. Should this cause issues, it will give us a reproducer test case that shows flaws in this logic, and we then could adjust accordingly. While this is quite likely beneficial in-the-wild already, it's a required part for the full motivational pattern behind `left-shift-until-bittest` loop idiom (D91038). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91800	2020-11-25 19:20:07 +03:00
Cullen Rhodes	1ba4b82f67	[LAA] NFC: Rename [get]MaxSafeRegisterWidth -> [get]MaxSafeVectorWidthInBits MaxSafeRegisterWidth is a misnomer since it actually returns the maximum safe vector width. Register suggests it relates directly to a physical register where it could be a vector spanning one or more physical registers. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91727	2020-11-25 13:06:26 +00:00
Florian Hahn	ad5b83ddcf	[VPlan] Add VPReductionSC to VPUser::classof, unify VPValue IDs. This is a follow-up to `00a6601136` to make isa<VPReductionRecipe> work and unifies the VPValue ID names, by making sure they all consistently start with VPV*.	2020-11-25 11:08:25 +00:00
David Green	e0c479cd0e	[VPlan] Switch VPWidenRecipe to be a VPValue Similar to other patches, this makes VPWidenRecipe a VPValue. Because of the way it interacts with the reduction code it also slightly alters the way that VPValues are registered, removing the up front NeedDef and using getOrAddVPValue to create them on-demand if needed instead. Differential Revision: https://reviews.llvm.org/D88447	2020-11-25 08:25:06 +00:00
David Green	00a6601136	[VPlan] Turn VPReductionRecipe into a VPValue This converts the VPReductionRecipe into a VPValue, like other VPRecipe's in preparation for traversing def-use chains. It also makes it a VPUser, now storing the used VPValues as operands. It doesn't yet change how the VPReductionRecipes are created. It will need to call replaceAllUsesWith from the original recipe they replace, but that is not done yet as VPWidenRecipe need to be created first. Differential Revision: https://reviews.llvm.org/D88382	2020-11-25 08:25:05 +00:00
Kazu Hirata	1c82d32089	[CHR] Use pred_size (NFC)	2020-11-24 22:52:30 -08:00
Max Kazantsev	28d7ba1543	[IndVars] Use more precise context when eliminating narrowing When deciding to widen narrow use, we may need to prove some facts about it. For proof, the context is used. Currently we take the instruction being widened as the context. However, we may be more precise here if we take as context the point that dominates all users of instruction being widened. Differential Revision: https://reviews.llvm.org/D90456 Reviewed By: skatkov	2020-11-25 11:47:39 +07:00
Philip Reames	10ddb927c1	[SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC] Some older code - and code copied from older code - still directly tested against the singelton result of SE::getCouldNotCompute. Using the isa<SCEVCouldNotCompute> form is both shorter, and more readable.	2020-11-24 18:47:49 -08:00
Sanjay Patel	678b9c5dde	[InstCombine] try difference-of-shifts factorization before negator We need to preserve wrapping flags to allow better folds. The cases with geps may be non-intuitive, but that appears to agree with Alive2: https://alive2.llvm.org/ce/z/JQcqw7 We create 'nsw' ops independent from the original wrapping on the sub.	2020-11-24 13:56:30 -05:00
Philip Reames	075468621c	[LoopVec] Add a minor clarifying comment	2020-11-24 10:45:06 -08:00
Teresa Johnson	6e4c1cf293	[ThinLTO/WPD] Enable -wholeprogramdevirt-skip in ThinLTO backends Previously this option could be used to skip devirtualizations of the given functions in regular LTO and in the ThinLTO indexing step. This change allows them to be skipped in the backend as well, which is useful when debugging WPD in a distributed ThinLTO backend. Differential Revision: https://reviews.llvm.org/D91812	2020-11-24 09:35:07 -08:00
Ayal Zaks	32d9a386bf	[LV] Keep Primary Induction alive when folding tail by masking Fix PR47390. The primary induction should be considered alive when folding tail by masking, because it will be used by said masking; even when it may otherwise appear useless: feeding only its own 'bump', which is correctly considered dead, and as the 'bump' of another induction variable, which may wrongfully want to consider its bump = the primary induction, dead. Differential Revision: https://reviews.llvm.org/D92017	2020-11-24 15:12:54 +02:00
Arthur Eubanks	932e4f8815	[FunctionAttrs][NPM] Fix handling of convergent The legacy pass didn't properly detect indirect calls. We can still remove the convergent attribute when there are indirect calls. The LangRef says: > When it appears on a call/invoke, the convergent attribute indicates that we should treat the call as though we’re calling a convergent function. This is particularly useful on indirect calls; without this we may treat such calls as though the target is non-convergent. So don't skip handling of convergent when there are unknown calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89826	2020-11-23 21:09:41 -08:00
Philip Reames	1a9c72f8a8	[LoopVec] Reuse a lambda [NFC] Minor code refactor to improve readability.	2020-11-23 21:07:34 -08:00
Philip Reames	b06a2ad94f	[LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE) A uniform load is one which loads from a uniform address across all lanes. As currently implemented, we cost model such loads as if we did a single scalar load + a broadcast, but the actual lowering replicates the load once per lane. This change tweaks the lowering to use the REPLICATE strategy by marking such loads (and the computation leading to their memory operand) as uniform after vectorization. This is a useful change in itself, but it's real purpose is to pave the way for a following change which will generalize our uniformity logic. In review discussion, there was an issue raised with coupling cost modeling with the lowering strategy for uniform inputs. The discussion on that item remains unsettled and is pending larger architectural discussion. We decided to move forward with this patch as is, and revise as warranted once the bigger picture design questions are settled. Differential Revision: https://reviews.llvm.org/D91398	2020-11-23 15:32:17 -08:00
Sanjay Patel	ab29f091eb	[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps This is a retry of `324a53205`. I cautiously reverted that at `6aa3fc4` because the rules about gep math were not clear. Since then, we have added this line to LangRef for gep inbounds: "The successive addition of offsets (without adding the base address) does not wrap the pointer index type in a signed sense (nsw)." See D90708 and post-commit comments on the revert patch for more details.	2020-11-23 16:50:09 -05:00
Arthur Eubanks	3c811ce4f3	[NPM] Share pass building options with legacy PM We should share options when possible. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D91741	2020-11-23 13:04:05 -08:00
Sjoerd Meijer	33b2c88fa8	[LoopFlatten] Widen IV, support ZExt. I disabled the widening in `fa5cb4b` because it run in an assert, which was related to replacing values with different types. I forgot that an extend could also be a zero-extend, which I have added now. This means that the approach now is to create and insert a trunc value of the outerloop for each user, and use that to replace IV values. Differential Revision: https://reviews.llvm.org/D91690	2020-11-23 08:57:19 +00:00
Kazu Hirata	df73b8c174	[ValueMapper] Remove unused declaration remapFunction (NFC) The function declaration with two parameters was introduced on Apr 16 2016 in commit `f0d73f95c1` without a corresponding definition.	2020-11-22 21:52:03 -08:00
Kazu Hirata	186d129320	[hwasan] Remove unused declaration shadowBase (NFC) The function was introduced on Jan 23, 2019 in commit `73078ecd38`. Its definition was removed on Oct 27, 2020 in commit `0930763b4b`, leaving the declaration unused.	2020-11-22 20:08:51 -08:00
Kazu Hirata	def7cfb7ff	[InstCombine] Use is_contained (NFC)	2020-11-21 15:47:11 -08:00
Alexey Bataev	0b420d674a	[SLP][NFC]Fix assert condition in newTreeEntry, NFC.	2020-11-20 13:25:21 -08:00
Hongtao Yu	f3c445697d	[CSSPGO] IR intrinsic for pseudo-probe block instrumentation This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues: 1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality. 2. The counter atomics may not be fully cleaned up from the code stream eventually. 3. Extra work is needed for re-targeting. We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality. Let's now look at an example. Given the following LLVM IR: ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 br i1 %cmp, label %bb1, label %bb2 bb1: br label %bb3 bb2: br label %bb3 bb3: ret void } ``` The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID. ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 call void @llvm.pseudoprobe(i64 837061429793323041, i64 1) br i1 %cmp, label %bb1, label %bb2 bb1: call void @llvm.pseudoprobe(i64 837061429793323041, i64 2) br label %bb3 bb2: call void @llvm.pseudoprobe(i64 837061429793323041, i64 3) br label %bb3 bb3: call void @llvm.pseudoprobe(i64 837061429793323041, i64 4) ret void } ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86490	2020-11-20 10:39:24 -08:00
Jamie Schmeiser	7f6360cdc6	Reland: Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug. Summary: Expand existing loopsink testing to also test loopsinking using new pass manager. Enable memoryssa for loopsink with new pass manager. This combination exposed a bug that was previously fixed for loopsink without memoryssa. When sinking an instruction into a loop, the source block may not be part of the loop but still needs to be checked for pointer invalidation. This is the fix for bugzilla #39695 (PR 54659) expanded to also work with memoryssa. Respond to review comments. Enable Memory SSA in legacy Loop Sink pass under EnableMSSALoopDependency option control. Update tests accordingly. Respond to review comments. Add options controlling whether memoryssa is used for loop sink, defaulting to off. Expand testing based on these options. Respond to review comments. Properly indicated preserved analyses. This relanding addresses a compile-time performance problem by moving test for profile data earlier to avoid unnecessary computations. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea) Differential Revision: https://reviews.llvm.org/D90249	2020-11-20 10:26:33 -05:00
Arthur Eubanks	b77436047a	[PGO] Make -disable-preinline work with NPM Fixes cspgo_profile_summary.ll under NPM. Reviewed By: xur Differential Revision: https://reviews.llvm.org/D91826	2020-11-19 22:58:55 -08:00
Arthur Eubanks	513d165b80	Port -lower-matrix-intrinsics-minimal to NPM This reuses the existing lower-matrix-intrinsics pass rather than going the legacy pass route of creating a new pass. Use this new variant in the NPM -O0 pipeline. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D91811	2020-11-19 17:42:48 -08:00
Florian Hahn	7fa14a7c69	[ConstraintElimination] Decompose GEP with arbitrary offsets. This patch decomposes `GEP %x, %offset` as 0 + 1 * %x + 1 * %off.	2020-11-19 22:49:21 +00:00
Geoffrey Martin-Noble	b156514f8d	Remove unused private fields Unused since https://reviews.llvm.org/D91762 and triggering -Wunused-private-field ``` llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:365:13: error: private field 'GetArgTLS' is not used [-Werror,-Wunused-private-field] Constant GetArgTLS; ^ llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:366:13: error: private field 'GetRetvalTLS' is not used [-Werror,-Wunused-private-field] Constant GetRetvalTLS; ``` Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D91820	2020-11-19 13:54:54 -08:00
Roman Lebedev	a91e96702a	[InstCombine] Fold `and(shl(zext(x), width(SIGNMASK) - width(%x)), SIGNMASK)` to `and(sext(%x), SIGNMASK)` One less instruction and reducing use count of zext. As alive2 confirms, we're fine with all the weird combinations of undef elts in constants, but unless the shift amount was undef for a lane, we must sanitize undef mask to zero, since sign bits are no longer zeros. https://rise4fun.com/Alive/d7r ``` ---------------------------------------- Optimization: zz Precondition: ((C1 == (width(%r) - width(%x))) && isSignBit(C2)) %o0 = zext %x %o1 = shl %o0, C1 %r = and %o1, C2 => %n0 = sext %x %r = and %n0, C2 Done: 2016 Optimization is correct! ```	2020-11-20 00:31:27 +03:00
Jianzhou Zhao	6c1c308c0e	Remove deadcode from DFSanFunction::getTLS() clean more deadcode after D84704 Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D91762	2020-11-19 21:10:37 +00:00
Nikita Popov	393b9e9db3	[MemLoc] Require LocationSize argument (NFC) When constructing a MemoryLocation by hand, require that a LocationSize is explicitly specified. D91649 will split up LocationSize::unknown() into two different states, and callers should make an explicit choice regarding the kind of MemoryLocation they want to have.	2020-11-19 21:45:52 +01:00
Sander de Smalen	41c9f4c1ce	[LoopVectorize] NFC: Fix unused variable warning for MaxSafeDepDist rGf571fe6df585127d8b045f8e8f5b4e59da9bbb73 led to a warning of an unused variable for MaxSafeDepDist (written but not used). It seems this variable and assignment can be safely removed.	2020-11-19 17:41:35 +00:00
Joseph Huber	da8bec47ab	[OpenMP] Add Location Fields to Libomptarget Runtime for Debugging Summary: Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values. Reviewers: jdoerfert grokos Differential Revision: https://reviews.llvm.org/D87946	2020-11-19 12:01:53 -05:00
Simon Moll	a1de391dae	[LV][NFC-ish] Allow vector widths over 256 elements The assertion that vector widths are <= 256 elements was hard wired in the LV code. Eg, VE allows for vectors up to 512 elements. Test again the TTI vector register bit width instead - this is an NFC for non-asserting builds. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D91518	2020-11-19 10:58:29 +01:00
Max Kazantsev	515105f46b	[NFC] Remove comment (commited ahead of time by mistake)	2020-11-19 16:28:34 +07:00
Max Kazantsev	7c601d09a7	[NFC] Move code earlier as preparation for further changes	2020-11-19 16:27:23 +07:00
Andrew Wei	ea7ab5a42c	[IndVarSimplify] Notify top most loop to drop cached exit counts Some nested loops may share the same ExitingBB, so after we finishing FoldExit, we need to notify OuterLoop and SCEV to drop any stored trip count. Patched by: guopeilin Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D91325	2020-11-19 15:37:54 +08:00
Kazu Hirata	43c0e4f665	[Transforms] Use llvm::is_contained (NFC)	2020-11-18 20:42:22 -08:00
Jamie Schmeiser	cff479b145	Revert "Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.""" This reverts commit `e29292969b`. This apparently causes a regression in compile time (ie, it slows down).	2020-11-18 16:07:16 -05:00
Roman Lebedev	7bf89c2174	[NFC][Reassociate] Delay checking isLoadCombineCandidate() until after ShouldConvertOrWithNoCommonBitsToAdd() but before haveNoCommonBitsSet() This appears to improve -O3 compile-time performance somewhat: https://llvm-compile-time-tracker.com/compare.php?from=87369c626114ae17f4c637635c119e6de0856a9a&to=c04b8271e1609b0dfb20609b40844b0c4324517e&stat=instructions It doesn't look like delaying it until after haveNoCommonBitsSet() is better: https://llvm-compile-time-tracker.com/compare.php?from=c04b8271e1609b0dfb20609b40844b0c4324517e&to=b2943d450eaf41b5f76d2dc7350f0a279f64cd99&stat=instructions	2020-11-18 23:57:12 +03:00
Jamie Schmeiser	e29292969b	Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug."" This reverts commit `562addba65`. Reverted change too quickly, the failing test cases passed on the next build. So reverting revert (to include the changes).	2020-11-18 15:33:02 -05:00
Florian Hahn	2fead1ac61	[ConstraintElimination] Decompose add nuw/sub nuw. Make use of the more flexible constraint handling added in `a8a79c9069` to decompose add nuw/sub nuw.	2020-11-18 20:29:30 +00:00
Joseph Huber	97e55cfef5	[OpenMP] Add Passing in Original Declaration Names To Mapper API Summary: This patch adds support for passing in the original delcaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information in passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;" Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D89802	2020-11-18 15:28:39 -05:00
Nikita Popov	f4a3969bff	[Inline] Fix incorrectly dropped noalias metadata This is the same fix as `23aeadb89d`, just for CloneScopedAliasMetadata rather than PropagateCallSiteMetadata. In this case the previous outcome was incorrectly dropped metadata, as it was not part of the computed metadata map. The real change in the test is that the first load now retains metadata, the rest of the changes are due to changes in metadata numbering.	2020-11-18 21:22:50 +01:00
Jamie Schmeiser	562addba65	Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug." This reverts commit `d4ba28bddc`.	2020-11-18 15:17:53 -05:00
Nikita Popov	23aeadb89d	[Inline] Fix incorrect noalias metadata application (PR48209) The VMap also contains a mapping from Argument => Instruction, where the instruction is part of the original function, not the inlined one. The code was assuming that all the instructions in the VMap were inlined. This was a pre-existing problem for the loop access metadata, but was extended to the more common noalias metadata by `27f647d117`, thus causing miscompiles. There is a similar assumption inside CloneAliasScopeMetadata(), so that one likely needs to be fixed as well.	2020-11-18 20:52:58 +01:00
Jamie Schmeiser	d4ba28bddc	Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug. Summary: Expand existing loopsink testing to also test loopsinking using new pass manager. Enable memoryssa for loopsink with new pass manager. This combination exposed a bug that was previously fixed for loopsink without memoryssa. When sinking an instruction into a loop, the source block may not be part of the loop but still needs to be checked for pointer invalidation. This is the fix for bugzilla #39695 (PR 54659) expanded to also work with memoryssa. Respond to review comments. Enable Memory SSA in legacy Loop Sink pass under EnableMSSALoopDependency option control. Update tests accordingly. Respond to review comments. Add options controlling whether memoryssa is used for loop sink, defaulting to off. Expand testing based on these options. Respond to review comments. Properly indicated preserved analyses. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea) Differential Revision: https://reviews.llvm.org/D90249	2020-11-18 14:08:42 -05:00
Piotr Sobczak	b3b9be4ae7	SpeculativeExecution: Allow speculating more instruction types Support more instructions in SpeculativeExecution pass: - ExtractValue - InsertValue - Trunc - Freeze Differential Revision: https://reviews.llvm.org/D91688	2020-11-18 17:00:19 +01:00
Roman Lebedev	34ff90ad5d	[Reassociate] Don't convert add-like-or's into add's if they appear to be part of load-combining idiom As Wei Mi is reporting in post-commit review https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20201116/853479.html teaching -reassociate about add-like-or's (`70472f3`) results in breaking apart load widening patterns, and reassociating them. For now, simply exclude any such `or` that appears to be a root of load widening idiom from the or->add transformation. Note that the heuristic is greedy, it doesn't ensure that loads can actually be widened into a single load.	2020-11-18 17:55:02 +03:00
Florian Hahn	a8a79c9069	[ConstraintElimination] Refactor constraint extraction (NFC). This patch generalizes the extraction of a constraint for a given condition. It allows decompose to return a vector of c * X pairs, which allows de-composing multiple instructions in the future. It also adds more clarifying comments.	2020-11-18 13:59:18 +00:00
Benjamin Kramer	4dbe12e866	[SLP] Use the minimum alignment of the load bundle when forming a masked.gather Instead of the first load. That works when vectorizing contiguous loads, but not for gathers. Fixes a miscompile introduced in `fcad8d3635`.	2020-11-18 12:53:39 +01:00
Max Kazantsev	f33118c61c	[IndVars] Support different types of ExitCount when optimizing exit conds In some cases we can handle IV and iter count of different types. It's a typical situation after IV have been widened. This patch adds support for such cases, when legal. Differential Revision: https://reviews.llvm.org/D88528 Reviewed By: skatkov	2020-11-18 18:20:05 +07:00
Piotr Sobczak	c173f1b8eb	SpeculativeExecution: Allow speculating more instruction types Support more instructions in SpeculativeExecution pass: - ExtractElement - InsertElement - ShuffleVector Differential Revision: https://reviews.llvm.org/D91633	2020-11-18 09:46:43 +01:00
Arthur Eubanks	9e3b4f4941	[JumpThreading] Make -print-lvi-after-jump-threading work with NPM	2020-11-17 23:15:20 -08:00
Arthur Eubanks	ee7d315cd9	[DCE] Always get TargetLibraryInfo I don't see any reason not to unconditionally retrieve TLI, it's fairly cheap. Fixes calls-errno.ll under NPM. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D91476	2020-11-17 20:41:05 -08:00
Nick Desaulniers	f4c6080ab8	Revert "[IR] add fn attr for no_stack_protector; prevent inlining on mismatch" This reverts commit `b7926ce6d7`. Going with a simpler approach.	2020-11-17 17:27:14 -08:00
Sanjay Patel	08834979e3	[SLP] avoid unreachable code crash/infloop Example based on the post-commit comments for D88735.	2020-11-17 15:10:23 -05:00
Sanjay Patel	4a66a1d17a	[InstCombine] allow vectors for masked-add -> xor fold https://rise4fun.com/Alive/I4Ge Name: add with pow2 mask Pre: isPowerOf2(C2) && (C1 & C2) != 0 && (C1 & (C2-1)) == 0 %a = add i8 %x, C1 %r = and i8 %a, C2 => %n = and i8 %x, C2 %r = xor i8 %n, C2	2020-11-17 13:36:08 -05:00
Simon Pilgrim	f7ebdec987	[InstCombine] visitAnd - remove unnecessary Value X, Y shadow variables. NFCI. Fixes a number of Wshadow warnings.	2020-11-17 17:59:21 +00:00
Simon Pilgrim	abf29d9862	[InstCombine] visitAnd - use m_SpecificInt instead of m_APInt + comparison. NFCI. m_SpecificInt has the same 'no undef element' behaviour as m_APInt so no change there, and anyway we have test coverage for undef elements in the fold. Noticed while fixing a Wshadow warning about shadow Value X, Y variables.	2020-11-17 17:37:10 +00:00
Sanjay Patel	f791ad7e1e	[InstCombine] remove scalar constraint for mask-of-add fold https://rise4fun.com/Alive/V6fP Name: add with low mask Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0 %a = add i8 %x, C1 %r = and i8 %a, C2 => %r = and i8 %x, C2	2020-11-17 12:13:45 -05:00
Sanjay Patel	433696911a	[InstCombine] relax constraints on mask-of-add There are 2 changes: 1. Remove the unnecessary one-use check. 2. Remove the unnecessary power-of-2 check. https://rise4fun.com/Alive/V6fP Name: add with low mask Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0 %a = add i8 %x, C1 %r = and i8 %a, C2 => %r = and i8 %x, C2	2020-11-17 12:13:44 -05:00
Florian Hahn	52f3714dae	[VPlan] Add VPDef class. This patch introduces a new VPDef class, which can be used to manage VPValues defined by recipes/VPInstructions. The idea here is to mirror VPUser for values defined by a recipe. A VPDef can produce either zero (e.g. a store recipe), one (most recipes) or multiple (VPInterleaveRecipe) result VPValues. To traverse the def-use chain from a VPDef to its users, one has to traverse the users of all values defined by a VPDef. VPValues now contain a pointer to their corresponding VPDef, if one exists. To traverse the def-use chain upwards from a VPValue, we first need to check if the VPValue is defined by a VPDef. If it does not have a VPDef, this means we have a VPValue that is not directly defined iniside the plan and we are done. If we have a VPDef, it is defined inside the region by a recipe, which is a VPUser, and the upwards def-use chain traversal continues by traversing all its operands. Note that we need to add an additional field to to VPVAlue to link them to their defs. The space increase is going to be offset by being able to remove the SubclassID field in future patches. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D90558	2020-11-17 16:18:11 +00:00
Matt Arsenault	c5ce6036c1	Linker: Fix linking of byref types This wasn't properly remapping the type like with the other attributes, so this would end up hitting a verifier error after linking different modules using byref.	2020-11-17 11:02:04 -05:00
Anton Afanasyev	0a1d315f9f	[SLPVectorizer] Fix assert	2020-11-17 18:46:31 +03:00
Anton Afanasyev	fcad8d3635	[SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic For the scattered operands of load instructions it makes sense to use gathering load intrinsic, which can lower to native instruction for X86/AVX512 and ARM/SVE. This also enables building vectorization tree with entries containing scattered operands. The next step is to add scattered store. Fixes PR47629 and PR47623 Differential Revision: https://reviews.llvm.org/D90445	2020-11-17 18:11:45 +03:00
Florian Hahn	13042da5cb	[ConstraintElimination] Add support for And. When processing conditional branches, if the condition is an AND of 2 compares and the true successor only has the current block as predecessor, queue both conditions for the true successor.	2020-11-17 14:12:15 +00:00
Sander de Smalen	f571fe6df5	Reland [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost. This relands https://reviews.llvm.org/D91059 and reverts commit `30fded75b4`. GetRegUsage now returns 0 when Ty is not a valid vector element type.	2020-11-17 13:45:10 +00:00
Yevgeny Rouban	a57fe210ff	[JumpThreading] Fix branch probabilities in DuplicateCondBranchOnPHIIntoPred() When instructions are cloned from block BB to PredBB in the method DuplicateCondBranchOnPHIIntoPred() number of successors of PredBB changes from 1 to number of successors of BB. So we have to copy branch probabilities from BB to PredBB. Reviewed By: Kazu Hirata Differential Revision: https://reviews.llvm.org/D90841	2020-11-17 14:40:50 +07:00
Max Kazantsev	63dd1734b2	[NFC] Collect ext users into vector instead of finding them twice	2020-11-17 14:01:43 +07:00
Kazu Hirata	1da60f1d44	[Transforms] Use pred_empty (NFC)	2020-11-16 22:09:14 -08:00
Kazu Hirata	5935952c31	[SanitizerCoverage] Use [&] for lambdas (NFC)	2020-11-16 21:45:21 -08:00
Arthur Eubanks	7de6dcd246	[Debugify] Skip debugifying on special/immutable passes With a function pass manager, it would insert debuginfo metadata before getting to function passes while processing the pass manager, causing debugify to skip while running the function passes. Skip special passes + verifier + printing passes. Compared to the legacy implementation of -debugify-each, this additionally skips verifier passes. Probably no need to update the legacy version since it will be obsolete soon. This fixes 2 instcombine tests using -debugify-each under NPM. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D91558	2020-11-16 20:39:46 -08:00
Sjoerd Meijer	fa5cb4b936	[LoopFlatten] Disable IV widening Disable widening of the IV in LoopFlatten while I investigate an assertion failures. Please note that the pass is also not yet enabled by default.	2020-11-16 22:30:52 +00:00
Michael Liao	f375885ab8	[InferAddrSpace] Teach to handle assumed address space. - In certain cases, a generic pointer could be assumed as a pointer to the global memory space or other spaces. With a dedicated target hook to query that address space from a given value, infer-address-space pass could infer and propagate that to all its users. Differential Revision: https://reviews.llvm.org/D91121	2020-11-16 17:06:33 -05:00
Florian Hahn	5a4ca8b550	[ConstraintElimination] Add support for Or. When processing conditional branches, if the condition is an OR of 2 compares and the false successor only has the current block as predecessor, queue both negated conditions for the false successor	2020-11-16 21:48:38 +00:00
Philip Reames	2240d3d054	[LoopVec] Introduce an api for detecting uniform memory ops Split off D91398 at request of reviewer.	2020-11-16 13:30:48 -08:00
Arnold Schwaighofer	d861cc0e43	[coro] Async coroutines: Make sure we can handle control flow in suspend point dispatch function Create a valid basic block with a terminator before we call InlineFunction. Differential Revision: https://reviews.llvm.org/D91547	2020-11-16 11:59:02 -08:00
Sanjay Patel	4e68bc0999	Revert "[InstCombine] add multi-use demanded bits fold for add with low-bit mask" This reverts commit `e56103d250`. There is a stage2 msan failure blamed on this commit: http://lab.llvm.org:8011/#/builders/74/builds/888/steps/9/logs/stdio	2020-11-16 14:48:09 -05:00
Arthur Eubanks	aeb0fdff35	[SimplifyCFG] Respect optforfuzzing in NPM pass Regression caused by refactoring in `cdd006eec9`. See discussion in https://reviews.llvm.org/D89917. Reviewed By: arsenm, morehouse Differential Revision: https://reviews.llvm.org/D91473	2020-11-16 09:56:37 -08:00
Xun Li	985c524001	[Coroutine] Allocas used by StoreInst does not always escape In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered as escaped. This is a bit too conservative. Specifically, in non-optimized build mode, it's often to have patterns of code that first store an alloca somewhere and then load it right away. These used should be handled without conservatively marking them escaped. This patch tracks how the memory location where an alloca pointer is stored into is being used. As long as we only try to load from that location and nothing else, we can still consider the original alloca not escaping and keep it on the stack instead of putting it on the frame. Differential Revision: https://reviews.llvm.org/D91305	2020-11-16 09:14:44 -08:00
Florian Hahn	8dbe44cb29	Add pass to add !annotate metadata from @llvm.global.annotations. This patch adds a new pass to add !annotation metadata for entries in @llvm.global.anotations, which is generated using __attribute__((annotate("_name"))) on functions in Clang. This has been discussed on llvm-dev as part of RFC: Combining Annotation Metadata and Remarks http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D91195	2020-11-16 14:57:11 +00:00
Benjamin Kramer	2e7455f00a	[LoopFlatten] Fold variable into assert. NFC.	2020-11-16 11:51:39 +01:00
Sjoerd Meijer	9aa773381b	[LoopFlatten] Widen the IV Widen the IV to the widest available and legal integer type, which makes this transformations always safe so that we can skip overflow checks. Motivation is to let this pass trigger on 64-bit targets too, and this is the last patch in a serie to achieve this: D90402 moves pass LoopFlatten to just before IndVarSimplify so that IVs are not already widened, D90421 factors out widening from IndVarSimplify into Utils/SimplifyIndVar so that we can also use it in LoopFlatten. Differential Revision: https://reviews.llvm.org/D90640	2020-11-16 10:20:13 +00:00
Max Kazantsev	b4624f65cf	Recommit "[NFC] Move code between functions as a preparation step for further improvement" The bug should be fixed now.	2020-11-16 14:30:34 +07:00
Kazu Hirata	147ccc848a	[JumpThreading] Call eraseBlock when folding a conditional branch This patch teaches the jump threading pass to call BPI->eraseBlock when it folds a conditional branch. Without this patch, BranchProbabilityInfo could end up with stale edge probabilities for the basic block containing the conditional branch -- one edge probability with less than 1.0 and the other for a removed edge. This patch is one of the steps before we can safely re-apply D91017. Differential Revision: https://reviews.llvm.org/D91511	2020-11-15 22:29:30 -08:00
Kazu Hirata	0888eaf3fd	[Loop Fusion] Use pred_empty and succ_empty (NFC)	2020-11-15 20:32:57 -08:00
Kazu Hirata	0c03d1328c	[ADCE] Use succ_empty (NFC)	2020-11-15 19:52:59 -08:00
Kazu Hirata	43a6a1e928	[TRE] Use successors(BB) (NFC)	2020-11-15 19:12:49 -08:00
Kazu Hirata	918e3439e2	[SanitizerCoverage] Use llvm::all_of (NFC)	2020-11-15 19:01:20 -08:00
Serguei Katkov	400f6edce7	[IRCE] Use the same min runtime iteration threshold for BPI and BFI checks In the last change to IRCE the BPI is ignored if BFI is present, however BFI and BPI have a different thresholds. Specifically BPI approach checks only latch exit probability so it is expected if the loop has only one exit block (latch) the behavior with BFI and BPI should be the same, BPI approach by default uses threshold 10, so it considers the loop with estimated number of iterations less then 10 should not be considered for IRCE optimization. BFI approach uses the default value 3 and this is inconsistent. The CL modifies the code to use the same threshold for both approaches.. The test is updated due to it has two side-exits (except latch) and each of them has a probability 1/16, so BFI estimates the number of runtime iteration is about to 7 (1/16 + 1/16 + some for latch) and test fails. Reviewers: mkazantsev, ebrevnov Reviewed By: mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D91230	2020-11-16 09:21:50 +07:00
Sanjay Patel	6ddc237766	[InstCombine] reduce code for flip of masked bit; NFC There are 1-2 potential follow-up NFC commits to reduce this further on the way to generalizing this for vectors. The operand replacing path should be dead code because demanded bits handles that more generally (D91415).	2020-11-15 15:43:34 -05:00
Sanjay Patel	e56103d250	[InstCombine] add multi-use demanded bits fold for add with low-bit mask I noticed an add example like the one from D91343, so here's a similar patch. The logic is based on existing code for the single-use demanded bits fold. But I only matched a constant instead of using compute known bits on the operands because that was the motivating patterni that I noticed. I think this will allow removing a special-case (but incomplete) dedicated fold within visitAnd(), but I need to untangle the existing code to be sure. https://rise4fun.com/Alive/V6fP Name: add with low mask Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0 %a = add i8 %x, C1 %r = and i8 %a, C2 => %r = and i8 %x, C2 Differential Revision: https://reviews.llvm.org/D91415	2020-11-15 15:09:49 -05:00
Florian Hahn	0c119ba8a8	[VPlan] Use VPValue def for VPWidenGEPRecipe. This patch turns VPWidenGEPRecipe into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84683	2020-11-15 15:12:47 +00:00
Arthur Eubanks	6e04da0a5a	[DCE] Port -redundant-dbg-inst-elim to NPM This is used to test RemoveRedundantDbgInstrs(), which is used by other passes. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D91477	2020-11-14 16:55:20 -08:00
Florian Hahn	a70b511e78	Recommit "[VPlan] Use VPValue def for VPWidenSelectRecipe." This reverts the revert commit `c8d73d939f`. It includes a fix for cases where we missed inserting VPValues for some selects, which should fix PR48142.	2020-11-14 20:00:25 +00:00
Arnold Schwaighofer	8fb73cecfd	[Coroutines] Make sure that async coroutine context size is a multiple of the alignment requirement This simplifies the code the allocator has to executed Differential Revision: https://reviews.llvm.org/D91471	2020-11-14 04:56:56 -08:00
Roman Lebedev	6861d938e5	Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM" See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485 the implementation is known-broken for certain inputs, the bugreport was up for a significant amount of timer, and there has been no activity to address it. Therefore, just completely rip out all of misexpect handling. I suspect, fixing it requires redesigning the internals of MD_misexpect. Should anyone commit to fixing the implementation problem, starting from clean slate may be better anyways. This reverts commit `7bdad08429`, and some of it's follow-ups, that don't stand on their own.	2020-11-14 13:12:38 +03:00
Akira Hatanaka	2ed3a76745	[ObjC][ARC] Add and use a function which finds and returns the single dependency. NFC Use findSingleDependency in place of FindDependencies and stop passing a set of Instructions around. Modify FindDependencies to return a boolean flag which indicates whether the dependencies it has found are all valid.	2020-11-13 14:02:58 -08:00
Akira Hatanaka	00d0974e62	Move variable declarations to functions in which they are used. NFC	2020-11-13 14:02:58 -08:00
Guozhi Wei	a20220d25b	[AlwaysInliner] Call mergeAttributesForInlining after inlining Like inlineCallIfPossible and InlinerPass, after inlining mergeAttributesForInlining should be called to merge callee's attributes to caller. But it is not called in AlwaysInliner, causes caller's attributes inconsistent with inlined code. Attached test case demonstrates that attribute "min-legal-vector-width"="512" is not merged into caller without this patch, and it causes failure in SelectionDAG when lowering the inlined AVX512 intrinsic. Differential Revision: https://reviews.llvm.org/D91446	2020-11-13 12:01:35 -08:00
Jianzhou Zhao	06c9b4aaa9	Extend the dfsan store/load callback with write/read address This helped debugging. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D91236	2020-11-13 19:46:32 +00:00
Nikita Popov	02dda1c659	[Local] Clean up EmitGEPOffset Handle the emission of the add in a single place, instead of three different ones. Don't emit an unnecessary add with zero to start with. It will get dropped by InstCombine, but we may as well not create it in the first place. This also means that InstCombine does not need to specially handle this extra add. This is conceptually NFC, but can affect worklist order etc.	2020-11-13 18:30:56 +01:00
David Zarzycki	5a327f3337	Revert "[NFC] Move code between functions as a preparation step for further improvement" This reverts commit `08016ac32b`. A bunch of tests are failing my local two stage builder.	2020-11-13 10:52:49 -05:00
serge-sans-paille	95537f4508	llvmbuildectomy - compatibility with ocaml bindings Use exact component name in add_ocaml_library. Make expand_topologically compatible with new architecture. Fix quoting in is_llvm_target_library. Fix LLVMipo component name. Write release note.	2020-11-13 14:35:52 +01:00
Florian Hahn	8bb6347939	Add !annotation metadata and remarks pass. This patch adds a new !annotation metadata kind which can be used to attach annotation strings to instructions. It also adds a new pass that emits summary remarks per function with the counts for each annotation kind. The intended uses cases for this new metadata is annotating 'interesting' instructions and the remarks should provide additional insight into transformations applied to a program. To motivate this, consider these specific questions we would like to get answered: * How many stores added for automatic variable initialization remain after optimizations? Where are they? * How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated? Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks' (http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html) Reviewed By: thegameg, jdoerfert Differential Revision: https://reviews.llvm.org/D91188	2020-11-13 13:24:10 +00:00
Max Kazantsev	08016ac32b	[NFC] Move code between functions as a preparation step for further improvement	2020-11-13 18:12:45 +07:00
Max Kazantsev	185cface2e	[NFC] Refactor lambda into static function	2020-11-13 17:42:23 +07:00
Max Kazantsev	68490aec4e	[NFC] Move lambdae into static functions	2020-11-13 17:07:25 +07:00
serge-sans-paille	9218ff50f9	llvmbuildectomy - replace llvm-build by plain cmake No longer rely on an external tool to build the llvm component layout. Instead, leverage the existing `add_llvm_componentlibrary` cmake function and introduce `add_llvm_component_group` to accurately describe component behavior. These function store extra properties in the created targets. These properties are processed once all components are defined to resolve library dependencies and produce the header expected by llvm-config. Differential Revision: https://reviews.llvm.org/D90848	2020-11-13 10:35:24 +01:00
Max Kazantsev	9224d322a2	[IndVars] Fix branches exiting by true with invariant conditions Forgot to invert the condition for them.	2020-11-13 15:52:00 +07:00
Max Kazantsev	0a1d394bf3	[NFC] Refactor loop-invariant getters to return Optional	2020-11-13 15:03:10 +07:00
Arthur Eubanks	b9406121a0	[NFC] Removed unused variable Obsolete as of https://reviews.llvm.org/D91046.	2020-11-12 22:24:57 -08:00
Akira Hatanaka	09266e4af0	[ObjC][ARC] Clear the lists of basic blocks and instructions before continuing the loop This fixes a bug introduced in `c6f1713c46`.	2020-11-12 22:20:02 -08:00
Max Kazantsev	77efb73c67	[IndVars] Replace checks with invariants if we cannot remove them If we cannot prove that the check is trivially true, but can prove that it either fails on the 1st iteration or never fails, we can replace it with first iteration check. Differential Revision: https://reviews.llvm.org/D88527 Reviewed By: skatkov	2020-11-13 12:23:12 +07:00
Sanjay Patel	0abde4bc92	[InstCombine] fold sub of low-bit masked value from offset of same value There might be some demanded/known bits way to generalize this, but I'm not seeing it right now. This came up as a regression when I was looking at a different demanded bits improvement. https://rise4fun.com/Alive/5fl Name: general Pre: ((-1 << countTrailingZeros(C1)) & C2) == 0 %a1 = add i8 %x, C1 %a2 = and i8 %x, C2 %r = sub i8 %a1, %a2 => %r = and i8 %a1, ~C2 Name: test 1 %a1 = add i8 %x, 192 %a2 = and i8 %x, 10 %r = sub i8 %a1, %a2 => %r = and i8 %a1, -11 Name: test 2 %a1 = add i8 %x, -108 %a2 = and i8 %x, 3 %r = sub i8 %a1, %a2 => %r = and i8 %a1, -4	2020-11-12 20:10:28 -05:00
Jianzhou Zhao	2d96859ea6	[msan] Break the getShadow loop after matching an argument Reviewed-by: eugenis Differential Revision: https://reviews.llvm.org/D91320	2020-11-12 19:48:59 +00:00
Alexander Kornienko	76b6cb515b	Fix unused variable warning in release builds	2020-11-12 18:14:06 +01:00
Jamie Schmeiser	f79b483385	[NFC intended] Refactor SinkAndHoistLICMFlags to allow others to construct without exposing internals Summary: Refactor SinkAdHoistLICMFlags from a struct to a class with accessors and constructors to allow other classes to construct flags with meaningful defaults while not exposing LICM internal details. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea) Differential Revision: https://reviews.llvm.org/D90482	2020-11-12 15:06:59 +00:00
Xun Li	94a45a8098	Revert "[Coroutine] Allocas used by StoreInst does not always escape" This reverts commit `8bc7b9278e`, which landed by accident.	2020-11-11 21:09:39 -08:00

... 3 4 5 6 7 ...

26132 Commits