llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	750bdda638	[X86] Call SimplifyDemandedBits in combineGatherScatter any time the mask element is wider than i1, not just when AVX512 is disabled. The AVX2 intrinsics can still be used when AVX512 is enabled and those go through this path. So we should simplify them. llvm-svn: 373108	2019-09-27 18:23:55 +00:00
Craig Topper	432a88bf04	[X86] Add test case to show failure to perform SimplifyDemandedBits on mask of avx2 gather intrinsics when avx512 is enabled. llvm-svn: 373107	2019-09-27 18:23:46 +00:00
Roman Lebedev	269f1bea0d	[InstCombine] Simplify shift-by-sext to shift-by-zext Summary: This is valid for any `sext` bitwidth pair: ``` Processing /tmp/opt.ll.. ---------------------------------------- %signed = sext %y %r = shl %x, %signed ret %r => %unsigned = zext %y %r = shl %x, %unsigned ret %r %signed = sext %y Done: 2016 Optimization is correct! ``` (This isn't so for funnel shifts, there it's illegal for e.g. i6->i7.) Main motivation is the C++ semantics: ``` int shl(int a, char b) { return a << b; } ``` ends as ``` %3 = sext i8 %1 to i32 %4 = shl i32 %0, %3 ``` https://godbolt.org/z/0jgqUq which is, as this shows, too pessimistic. There is another problem here - we can only do the fold if sext is one-use. But we can trivially have cases where several shifts have the same sext shift amount. This should be resolved, later. Reviewers: spatel, nikic, RKSimon Reviewed By: spatel Subscribers: efriedma, hiraditya, nlopes, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68103 llvm-svn: 373106	2019-09-27 18:12:15 +00:00
Jakub Kuderski	a524e630a7	XFAIL a codegen test AArch64/tailmerging_in_mbp.ll This test fails when machine dominator tree verifier is run. Needs more investigation, as this is not a new failure. llvm-svn: 373103	2019-09-27 17:41:17 +00:00
Kai Nacke	d8e38b9b88	Change -march=systemz to triple and fix test These two test cases use -march=systemz instead of a triple. In particular, the file format used is then based on the default host triple. This leads to different behaviour on different platforms. The SystemZ implementation has used the integrated assembler for a long time now. The mature-mc-support test can be fully enabled. Differential Revision: https://reviews.llvm.org/D68129 llvm-svn: 373098	2019-09-27 16:19:15 +00:00
Dmitry Preobrazhensky	436d5b335a	[AMDGPU][MC] Corrected parsing of registers Summary of changes: refactored code for better readability and future improvements; fixed bug 41281: https://bugs.llvm.org/show_bug.cgi?id=41281 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D65224 llvm-svn: 373094	2019-09-27 15:41:31 +00:00
Djordje Todorovic	eb4c98ca3d	[DebugInfo] Exclude memory location values as parameter entry values Abandon describing loaded values due to safety concerns. Loaded values are described as a dereferenced memory location at the caller. The callee can unintentionally change that memory location, which would lead to a different entry value being printed before and after the memory location is clobbered. This problem is described in llvm.org/PR43343. Patch by Nikola Prica Differential Revision: https://reviews.llvm.org/D67717 llvm-svn: 373089	2019-09-27 13:52:43 +00:00
Jesper Antonsson	39b81f1cbc	[CodeGenPrepare] Mend "avoid crashing from replacing a phi twice" fix. Summary: An erroneously negated if-statement by an earlier (March 2019) bugfix left phi replacement/simplification under optimizeMemoryInst() in CodeGenPrepare largely inactivated. The error was found when csmith found that the same assert as in the original bug report could still be triggered in a different way. This patch fixes the bugfix. The original bug was: https://bugs.llvm.org/show_bug.cgi?id=41052 ... and the previous fix was D59358. Reviewers: aprantl, skatkov Reviewed By: skatkov Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67838 llvm-svn: 373084	2019-09-27 13:01:37 +00:00
Clement Courbet	9431b72ce9	[llvm-exegesis] Add loop mode for repeating the snippet. Summary: Before this change the executable function was made by duplicating the snippet. This change adds a --repetition-mode={loop|duplicate} flag that allows choosing between this behaviour and wrapping the snippet instructions in a loop. The new mode can help measurements when the snippet fits in the DSB by short-circuiting decoding. The loop adds a dec + jmp to the measurements, but since these are not part of the critical path, they execute in parallel with the measured code and do not impact measurements in practice. Overview of the change: - New SnippetRepetitor abstraction that handles repeating the snippet. The assembler delegates repeating the instructions to this class. - ExegesisTarget learns how to decrement loop counter and jump. - Some refactoring of the assembler into FunctionFiller/BasicBlockFiller. Reviewers: gchatelet Subscribers: mgorny, tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68125 llvm-svn: 373083	2019-09-27 12:56:24 +00:00
Sam Parker	110607b284	[NFC][ARM] Add some tail-predication tests Use different data types for some simple loops. llvm-svn: 373064	2019-09-27 10:33:53 +00:00
Simon Pilgrim	756f5cfc2a	[SLPVectorizer][X86] Regenerate arith-fp tests llvm-svn: 373063	2019-09-27 10:04:25 +00:00
Hans Wennborg	3740ae3b8a	Revert r372893 "[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets" This caused severe compile-time regressions, see PR43455. > Modern processors predict the targets of an indirect branch regardless of > the size of any jump table used to glean its target address. Moreover, > branch predictors typically use resources limited by the number of actual > targets that occur at run time. > > This patch changes the semantics of the option `-max-jump-table-size` to limit > the number of different targets instead of the number of entries in a jump > table. Thus, it is now renamed to `-max-jump-table-targets`. > > Before, when `-max-jump-table-size` was specified, it could happen that > cluster jump tables could have targets used repeatedly, but each one was > counted and typically resulted in tables with the same number of entries. > With this patch, when specifying `-max-jump-table-targets`, tables may have > different lengths, since the number of unique targets is counted towards the > limit, but the number of unique targets in tables is the same, but for the > last one containing the balance of targets. > > Differential revision: https://reviews.llvm.org/D60295 llvm-svn: 373060	2019-09-27 09:54:26 +00:00
Roman Lebedev	0956480459	[NFC][InstCombine] Revisit shift-by-signext tests llvm-svn: 373055	2019-09-27 09:09:15 +00:00
Alexandros Lamprineas	c006b6f4cb	[MC][ARM] vscclrm disassembles as vldmia Happens only when the mve.fp subtarget feature is enabled: $ llvm-mc -triple thumbv8.1m.main -mattr=+mve.fp,+8msecext -disassemble <<< "0x9f,0xec,0x08,0x0b" .text vldmia pc, {d0, d1, d2, d3} $ llvm-mc -triple thumbv8.1m.main -mattr=+8msecext -disassemble <<< "0x9f,0xec,0x08,0x0b" .text vscclrm {d0, d1, d2, d3, vpr} Assembling returns the correct encoding with or without mve.fp: $ llvm-mc -triple thumbv8.1m.main -mattr=+mve.fp,+8msecext -show-encoding <<< "vscclrm {d0-d3, vpr}" .text vscclrm {d0, d1, d2, d3, vpr} @ encoding: [0x9f,0xec,0x08,0x0b] $ llvm-mc -triple thumbv8.1m.main -mattr=+8msecext -show-encoding <<< "vscclrm {d0-d3, vpr}" .text vscclrm {d0, d1, d2, d3, vpr} @ encoding: [0x9f,0xec,0x08,0x0b] The problem seems to be in the TableGen description of VSCCLRMD. The least significant bit should be set to zero. Differential Revision: https://reviews.llvm.org/D68025 llvm-svn: 373052	2019-09-27 08:22:24 +00:00
Wei Mi	9c8efeda5c	Revert "[LoopInfo] Limit the iterations to check whether a loop has dedicated exits" Get a better approach in https://reviews.llvm.org/D68107 to solve the problem. Revert the initial patch and will commit the new one soon. This reverts commit rL372990. llvm-svn: 373044	2019-09-27 05:43:30 +00:00
Thomas Lively	3fcdd25ad5	[WebAssembly] v128.andnot Summary: As specified at https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#bitwise-and-not Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68113 llvm-svn: 373041	2019-09-27 02:11:40 +00:00
Thomas Lively	81125f7362	[WebAssembly] SIMD Load and extend operations Summary: As specified at https://github.com/webassembly/simd/blob/master/proposals/simd/SIMD.md#load-and-extend. These instructions are behind the unimplemented-simd128 target feature for now because they have not been implemented in V8 yet. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68058 llvm-svn: 373040	2019-09-27 02:06:50 +00:00
Peter Collingbourne	c336557f02	hwasan: Compatibility fixes for short granules. We can't use short granules with stack instrumentation when targeting older API levels because the rest of the system won't understand the short granule tags stored in shadow memory. Moreover, we need to be able to let old binaries (which won't understand short granule tags) run on a new system that supports short granule tags. Such binaries will call the __hwasan_tag_mismatch function when their outlined checks fail. We can compensate for the binary's lack of support for short granules by implementing the short granule part of the check in the __hwasan_tag_mismatch function. Unfortunately we can't do anything about inline checks, but I don't believe that we can generate these by default on aarch64, nor did we do so when the ABI was fixed. A new function, __hwasan_tag_mismatch_v2, is introduced that lets code targeting the new runtime avoid redoing the short granule check. Because tag mismatches are rare this isn't important from a performance perspective; the main benefit is that it introduces a symbol dependency that prevents binaries targeting the new runtime from running on older (i.e. incompatible) runtimes. Differential Revision: https://reviews.llvm.org/D68059 llvm-svn: 373035	2019-09-27 01:02:10 +00:00
Craig Topper	d3f82b8b97	[X86] Add VMOVSSZrrk/VMOVSDZrrk/VMOVSSZrrkz/VMOVSDZrrkz to getUndefRegClearance. We have isel patterns that can put an IMPLICIT_DEF on one of the sources for these instructions. So we should make sure we break any dependencies there. This should be done by just using one of the other sources. llvm-svn: 373025	2019-09-26 22:56:06 +00:00
Craig Topper	c898724974	[X86] Add CodeGenOnly instructions for (f32 (X86selects $mask, (loadf32 addr), fp32imm0) to use masked MOVSS from memory. Similar for f64 and having a non-zero passthru value. We were previously not trying to fold the load at all. Using a CodeGenOnly instruction allows us to use FR32X/FR64X as the register class to avoid a bunch of COPY_TO_REGCLASS. llvm-svn: 373021	2019-09-26 22:23:09 +00:00
Jordan Rupprecht	f98d2c099a	Revert [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!") This reverts r372626 (git commit `6a278d9073`) llvm-svn: 373019	2019-09-26 22:09:17 +00:00
Kit Barton	50bc610460	[LoopFusion] Add ability to fuse guarded loops Summary: This patch extends the current capabilities in loop fusion to fuse guarded loops (as defined in https://reviews.llvm.org/D63885). The patch adds the necessary safety checks to ensure that it is safe to fuse the guarded loops (control flow equivalent, no intervening code, and same guard conditions). It also provides an alternative method to perform the actual fusion of guarded loops. The mechanics of fusing guarded loops are slightly different from fusing non-guarded loops, so I opted to keep them as separate methods. I will be cleaning this up in later patches, and hope to converge on a single method to fuse both guarded and non-guarded loops, but for now I think the review will be easier if they are kept separate. Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65464 llvm-svn: 373018	2019-09-26 21:42:45 +00:00
Zhaoshi Zheng	1128fa0924	[Unroll] Do NOT unroll a loop with small runtime upperbound For a runtime loop if we can compute its trip count upperbound: Don't unroll if: 1. loop is not guaranteed to run either zero or upperbound iterations; and 2. trip count upperbound is less than UnrollMaxUpperBound Unless user or TTI asked to do so. If unrolling, limit unroll factor to loop's trip count upperbound. Differential Revision: https://reviews.llvm.org/D62989 Change-Id: I6083c46a9d98b2e22cd855e60523fdc5a4929c73 llvm-svn: 373017	2019-09-26 21:40:27 +00:00
Roman Lebedev	3a5ca1c8b5	[DAGCombine][X86][AArch64][NFC] Add tests for shift-by-signext llvm-svn: 373014	2019-09-26 20:49:49 +00:00
Roman Lebedev	86b40b0bbf	[InstCombine][NFC] Add tests for shift-by-signext llvm-svn: 373013	2019-09-26 20:49:30 +00:00
Roman Lebedev	d1ef2e48fb	[InstCombine][NFC] Regenerate load-cmp.ll test llvm-svn: 373012	2019-09-26 20:49:21 +00:00
Xiangling Liao	3b808fb330	[AIX]Emit function descriptor csect in assembly This patch emits the function descriptor csect for functions with definitions under both 32-bit/64-bit mode on AIX. Differential Revision: https://reviews.llvm.org/D66724 llvm-svn: 373009	2019-09-26 19:38:32 +00:00
David Bolvansky	f1a5a93157	[NFC] Precommit tests for D68089 llvm-svn: 373006	2019-09-26 19:01:18 +00:00
Craig Topper	46721bb7f5	[InstCombine] Use m_Zero instead of isNullValue() when checking if a GEP index is all zeroes to prevent an infinite loop. The test case here previously infinite looped. Only one element from the GEP is used so SimplifyDemandedVectorElts would replace the other lanes in each index with undef leading to the first index being <0, undef, undef, undef>. But there's a GEP transform that tries to replace an index into a 0 sized type with a zero index. But the zero index check only works on ConstantInt 0 or ConstantAggregateZero so it would turn the index back to zeroinitializer. Resulting in a loop. The fix is to use m_Zero() to allow a vector of zeroes and undefs. Differential Revision: https://reviews.llvm.org/D67977 llvm-svn: 373000	2019-09-26 17:20:50 +00:00
Wei Mi	67d93f0d91	[LoopInfo] Limit the iterations to check whether a loop has dedicated exits for extremely large cases. We had a case in which a single loop has 4000 exits and the average number of predecessors of each exit is > 1000, and we found that compiling the case spent a significant amount of time on checking whether a loop has dedicated exits. This patch adds a limit for the iterations to the check. With the patch, the time to compile our testcase is reduced from 1000s to 200s (clang release build). Differential Revision: https://reviews.llvm.org/D67359 llvm-svn: 372990	2019-09-26 15:36:25 +00:00
Jakub Kuderski	d98cb81cd1	Handle successor's PHI node correctly when flattening CFG merges two if-regions Summary: FlattenCFG merges two 'if' basic blocks by inserting one basic block into another. The inserted basic block can have a successor that contains a PHI node whose incoming basic block is the inserted basic block. Since the existing code does not handle it, it becomes a badref. if (cond1) statement if (cond2) statement successor - contains PHI node whose predecessor is cond2 --> if (cond1 || cond2) statement (BB for cond2 was deleted) successor - contains PHI node whose predecessor is cond2 --> bad ref! Author: Jaebaek Seo Reviewers: asbirlea, kuhar, tstellar, chandlerc, davide, dexonsmith Reviewed By: kuhar Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68032 llvm-svn: 372989	2019-09-26 15:20:17 +00:00
Jinsong Ji	eaf6746db0	[PowerPC] Add missing pattern for VSX Scalar Negative Multiply-Subtract Single Precision Summary: This was found during review of https://reviews.llvm.org/D66050. In the simple test of fdiv, we fail to fold ``` fneg 2, 2 xsmaddasp 3, 2, 0 ``` to ``` xsnmsubasp 3, 2, 0 ``` We have the patterns for Double Precision and vectors; only Single Precision was missing, and this patch adds it. Reviewers: #powerpc, hfinkel, nemanjai, steven.zhang Reviewed By: #powerpc, steven.zhang Subscribers: wuzish, hiraditya, kbarton, MaskRay, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67595 llvm-svn: 372985	2019-09-26 15:11:33 +00:00
Owen Reynolds	b4e2d471f7	[llvm-ar][test] Move MRI tests from "llvm/test/Object/" llvm/test/Object/ contains tests for the ArchiveWriter library, however support for MRI scripts is found in llvm-ar and not the library. This diff moves the MRI related tests and removes those that are duplicates. Differential Revision: https://reviews.llvm.org/D68038 llvm-svn: 372973	2019-09-26 12:32:11 +00:00
Bjorn Pettersson	163c54d288	[InstCombine] Don't assume CmpInst has been visited in getFlippedStrictnessPredicateAndConstant Summary: Removing an assumption (assert) that the CmpInst already has been simplified in getFlippedStrictnessPredicateAndConstant. Solution is to simply bail out instead of hitting the assertion. Instead we assume that any profitable rewrite will happen in the next iteration of InstCombine. The reason why we can't assume that the CmpInst already has been simplified is that the worklist does not guarantee such an ordering. Solves https://bugs.llvm.org/show_bug.cgi?id=43376 Reviewers: spatel, lebedev.ri Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68022 llvm-svn: 372972	2019-09-26 12:16:01 +00:00
Petar Avramovic	ed3051917e	[MIPS GlobalISel] Lower aggregate structure return arguments Implement aggregate structure split to simpler types in splitToValueTypes. splitToValueTypes is used for return values. According to MipsABIInfo from clang/lib/CodeGen/TargetInfo.cpp, aggregate structure arguments for O32 always get simplified and thus will remain unsupported by the MIPS GlobalISel for the time being. For O32, aggregate structures can be encountered only for complex number returns e.g. 'complex float' or 'complex double' from <complex.h>. Differential Revision: https://reviews.llvm.org/D67963 llvm-svn: 372957	2019-09-26 10:48:07 +00:00
Simon Pilgrim	fc82c7a1b0	[SLPVectorizer][X86] Add SSE common check prefix to let us merge SSE2+SLM checks llvm-svn: 372955	2019-09-26 10:23:57 +00:00
Simon Pilgrim	d7f0207d73	[CostModel][X86] Fix SLM <2 x i64> icmp costs SLM is 2 x slower for <2 x i64> comparison ops than other vector types, we should account for this like we do for SLM <2 x i64> add/sub/mul costs. This should remove some of the SLM codegen diffs in D43582 llvm-svn: 372954	2019-09-26 10:14:38 +00:00
Jonas Paulsson	6e504d7706	[SystemZ] Recognize mnop-mcount in backend With -pg -mfentry -mnop-mcount, a nop is emitted instead of the call to fentry. Review: Ulrich Weigand https://reviews.llvm.org/D67765 llvm-svn: 372950	2019-09-26 08:38:07 +00:00
Mikael Holmen	957e090ac9	[IfConversion] Disallow TBB == FBB for valid triangles Summary: Previously the case in which EBB branches to both TBB and FBB, and TBB in turn leads to FBB, was treated as a valid triangle even when TBB and FBB were the same basic block. This could then lead to an invalid CFG when we removed the edge from EBB to TBB, since that meant we would also remove the edge from EBB to FBB. Since TBB == FBB is quite a degenerate case of a triangle, we no longer treat it as a valid triangle, and thus we avoid the trouble with updating the CFG. Reviewers: efriedma, dmgreen, kparzysz Reviewed By: efriedma Subscribers: bjope, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67832 llvm-svn: 372943	2019-09-26 06:35:55 +00:00
Craig Topper	ee78e44126	[X86] Mark the EVEX encoded PSADBW instructions as commutable to enable load folding of the other operand. The SSE and VEX versions are already correct. llvm-svn: 372941	2019-09-26 04:42:58 +00:00
Nick Lewycky	f57e968dd0	Improve C API support for atomicrmw and cmpxchg. atomicrmw and cmpxchg have a volatile flag, so allow it to be read and written with LLVM{Get,Set}Volatile. atomicrmw and fence have orderings, so allow them to be read and written with LLVM{Get,Set}Ordering. Add missing LLVMAtomicRMWBinOpFAdd and LLVMAtomicRMWBinOpFSub enum constants. AtomicCmpXchg also has a weak flag, add a getter/setter for that too. Add a getter/setter for the binary-op of an atomicrmw. Add LLVMIsA## for CatchSwitchInst, CallBrInst and FenceInst, as well as AtomicCmpXchgInst and AtomicRMWInst. Update llvm-c-test to include atomicrmw and fence, and to copy volatile for the four applicable instructions. Differential Revision: https://reviews.llvm.org/D67132 llvm-svn: 372938	2019-09-26 00:58:55 +00:00
Sam Clegg	079cba04bf	[MC][WebAssembly] Error on data symbols in the text section. Previously we had an assert but this can actually occur in valid user code so we need to handle this in release builds too. Differential Revision: https://reviews.llvm.org/D67997 llvm-svn: 372934	2019-09-25 23:33:16 +00:00
Alina Sbirlea	6720ed851b	[MemorySSA] Avoid adding Phis in the presence of unreachable blocks. Summary: If a block has all incoming values with the same MemoryAccess (ignoring incoming values from unreachable blocks), then use that incoming MemoryAccess and do not create a Phi in the first place. Revert IDF work-around added in rL372673; it should not be required unless the Def inserted is the first in its block. The patch also cleans up a series of tests, added during the many iterations on insertDef. The patch also fixes PR43438. The same issue that occurs in insertDef with "adding phis, hence the IDF of Phis is needed", can also occur in fixupDefs: the `getPreviousRecursive` call only adds Phis walking on the predecessor edges, which means there may be the case of a Phi added walking the CFG "backwards" which triggers the needs for an additional Phi in successor blocks. Such Phis are added during fixupDefs only in the presence of unreachable blocks. Hence this highlights the need to avoid adding Phis in blocks with unreachable predecessors in the first place. Reviewers: george.burgess.iv Subscribers: Prazek, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67995 llvm-svn: 372932	2019-09-25 23:24:39 +00:00
Roman Lebedev	a2fa03af3a	[InstCombine] foldUnsignedUnderflowCheck(): one last pattern with 'sub' (PR43251) https://rise4fun.com/Alive/0j9 llvm-svn: 372930	2019-09-25 22:59:59 +00:00
Roman Lebedev	ca524621d1	[NFC][InstCombine] Tests for 'base u<= offset && (base - offset) != 0' pattern (PR43251) llvm-svn: 372929	2019-09-25 22:59:48 +00:00
Roman Lebedev	914a3d1cf2	[InstSimplify] Handle more 'A </>/>=/<= B &&/|| (A - B) !=/== 0' patterns (PR43251) https://rise4fun.com/Alive/sl9s https://rise4fun.com/Alive/2plN https://bugs.llvm.org/show_bug.cgi?id=43251 llvm-svn: 372928	2019-09-25 22:59:41 +00:00
Roman Lebedev	26606bec9a	[NFC][InstSimplify] More exhaustive test coverage for 'A </>/>=/<= B &&/|| (A - B) !=/== 0' pattern (PR43251) llvm-svn: 372927	2019-09-25 22:59:24 +00:00
Nick Desaulniers	93d87260f1	[Verifier] add invariant check for callbr Summary: The list of indirect labels should ALWAYS have their blockaddresses as argument operands to the callbr (but not necessarily the other way around). Add an invariant that checks this. The verifier catches a bad test case that was added recently in r368478. I think that was a simple mistake, and the test was made less strict in regards to the precise addresses (as those weren't specifically the point of the test). This invariant will be used to find a reported bug. Link: https://www.spinics.net/lists/arm-kernel/msg753473.html Link: https://github.com/ClangBuiltLinux/linux/issues/649 Reviewers: craig.topper, void, chandlerc Reviewed By: void Subscribers: ychen, lebedev.ri, javed.absar, kristof.beyls, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D67196 llvm-svn: 372923	2019-09-25 22:28:27 +00:00
Florian Hahn	d663efe23a	[InstSimplify] Match 1.0 and 0.0 for both operands in SimplifyFMAFMul Because we do not constant fold multiplications in SimplifyFMAFMul, we match 1.0 and 0.0 for both operands, as multiplying by them is guaranteed to produce an exact result (if it is allowed to do so). Note that it is not enough to just swap the operands to ensure a constant is on the RHS, as we want to also cover the case with 2 constants. Reviewers: lebedev.ri, spatel, reames, scanon Reviewed By: lebedev.ri, reames Differential Revision: https://reviews.llvm.org/D67553 llvm-svn: 372915	2019-09-25 19:33:26 +00:00
Roman Lebedev	23646952e2	[InstCombine] Fold (A - B) u>=/u< A --> B u>/u<= A iff B != 0 https://rise4fun.com/Alive/KtL This also shows that the fold added in D67412 / r372257 was too specific, and the new fold allows those test cases to be handled more generically, therefore I delete now-dead code. This is yet again motivated by D67122 "[UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour" llvm-svn: 372912	2019-09-25 19:06:40 +00:00
Roman Lebedev	dfda7d2d90	[NFC][InstCombine] Add tests for (X - Y) < X --> Y <= X iff Y != 0 https://rise4fun.com/Alive/KtL This should go to InstCombiner::foldICmpBinOp(), next to "Convert sub-with-unsigned-overflow comparisons into a comparison of args." llvm-svn: 372911	2019-09-25 19:06:26 +00:00
Vadzim Dambrouski	efcad77431	[MSP430] Allow msp430_intrcc functions to not have interrupt attribute. Summary: Useful in case you want to have control over interrupt vector generation. For example, in the Rust language we have an arrangement where all unhandled ISR vectors get mapped to a single default handler function, which is hard to implement when LLVM tries to generate vectors on its own. Reviewers: asl, krisb Subscribers: hiraditya, JDevlieghere, awygle, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67313 llvm-svn: 372910	2019-09-25 18:58:07 +00:00
Stanislav Mekhanoshin	374c04e257	[AMDGPU] Improve fma.f64 test. NFC. llvm-svn: 372908	2019-09-25 18:50:34 +00:00
Stanislav Mekhanoshin	d3b2b97195	[AMDGPU] gfx10 v_fmac_f16 operand folding Fold immediates into v_fmac_f16. Differential Revision: https://reviews.llvm.org/D68037 llvm-svn: 372906	2019-09-25 18:40:20 +00:00
Florian Hahn	f3ab99dcf8	[InstCombine] Limit FMul constant folding for fma simplifications. As @reames pointed out post-commit, rL371518 adds additional rounding in some cases, when doing constant folding of the multiplication. This breaks a guarantee llvm.fma makes and must be avoided. This patch reapplies rL371518, but splits off the simplifications not requiring rounding from SimplifyFMulInst as SimplifyFMAFMul. Reviewers: spatel, lebedev.ri, reames, scanon Reviewed By: reames Differential Revision: https://reviews.llvm.org/D67434 llvm-svn: 372899	2019-09-25 17:03:20 +00:00
Jessica Paquette	8535a8672e	[AArch64][GlobalISel] Choose CCAssignFns per-argument for tail call lowering When checking for tail call eligibility, we should use the correct CCAssignFn for each argument, rather than just checking if the caller/callee is varargs or not. This is important for tail call lowering with varargs. If we don't check it, then basically any varargs callee with parameters cannot be tail called on Darwin, for one thing. If the parameters are all guaranteed to be in registers, this should be entirely safe. On top of that, not checking for this could potentially make it so that we have the wrong stack offsets when checking for tail call eligibility. Also refactor some of the stuff for CCAssignFnForCall and pull it out into a helper function. Update call-translator-tail-call.ll to show that we can now correctly tail call on Darwin. Also add two extra tail call checks. The first verifies that we still respect the caller's stack size, and the second verifies that we still don't tail call when a varargs function has a memory argument. Differential Revision: https://reviews.llvm.org/D67939 llvm-svn: 372897	2019-09-25 16:45:35 +00:00
Evandro Menezes	3bd8ba156b	[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets Modern processors predict the targets of an indirect branch regardless of the size of any jump table used to glean its target address. Moreover, branch predictors typically use resources limited by the number of actual targets that occur at run time. This patch changes the semantics of the option `-max-jump-table-size` to limit the number of different targets instead of the number of entries in a jump table. Thus, it is now renamed to `-max-jump-table-targets`. Before, when `-max-jump-table-size` was specified, it could happen that cluster jump tables could have targets used repeatedly, but each one was counted and typically resulted in tables with the same number of entries. With this patch, when specifying `-max-jump-table-targets`, tables may have different lengths, since the number of unique targets is counted towards the limit, but the number of unique targets in tables is the same, but for the last one containing the balance of targets. Differential revision: https://reviews.llvm.org/D60295 llvm-svn: 372893	2019-09-25 16:10:20 +00:00
Sanjay Patel	831a7e7068	[DAGCombiner] add one-use restriction to vector transform with cheap extract We might be able to do better on the example in the test, but in general, we should not scalarize a splatted vector binop if there are other uses of the binop. Otherwise, we can end up with code as we had - a scalar op that is redundant with a vector op. llvm-svn: 372886	2019-09-25 15:08:33 +00:00
Sanjay Patel	1aa09e0585	[x86] add test for multi-use scalarization of vector binop; NFC llvm-svn: 372883	2019-09-25 14:57:45 +00:00
Sanjay Patel	6d4ea22e70	[IR] allow fast-math-flags on phi of FP values (2nd try) The changes here are based on the corresponding diffs for allowing FMF on 'select': D61917 <https://reviews.llvm.org/D61917> As discussed there, we want to have fast-math-flags be a property of an FP value because the alternative (having them on things like fcmp) leads to logical inconsistency such as: https://bugs.llvm.org/show_bug.cgi?id=38086 The earlier patch for select made almost no practical difference because most unoptimized conditional code begins life as a phi (based on what I see in clang). Similarly, I don't expect this patch to do much on its own either because SimplifyCFG promptly drops the flags when converting to select on a minimal example like: https://bugs.llvm.org/show_bug.cgi?id=39535 But once we have this plumbing in place, we should be able to wire up the FMF propagation and start solving cases like that. The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a regression in a LoopVectorize test. We are intersecting the FMF of any FPMathOperator there, so if a phi is not properly annotated, new math instructions may not be either. Once we fix the propagation in SimplifyCFG, it may be safe to remove that hack. Differential Revision: https://reviews.llvm.org/D67564 llvm-svn: 372878	2019-09-25 14:35:02 +00:00
Jonas Paulsson	c5d90e4b5c	[SystemZ] Improve emitSelect() Merge more Select pseudo instructions in emitSelect() by allowing other instructions between them as long as they do not clobber CC. Debug value instructions are now moved down to below the new PHIs instead of erasing them. Review: Ulrich Weigand https://reviews.llvm.org/D67619 llvm-svn: 372873	2019-09-25 14:00:33 +00:00
Sanjay Patel	2cec4b58f5	Revert [IR] allow fast-math-flags on phi of FP values This reverts r372866 (git commit `dec03223a9`) llvm-svn: 372868	2019-09-25 13:29:09 +00:00
George Rimar	7915260853	[llvm-readobj/llvm-readelf] - .stack_sizes: demangle symbol names in warnings reported. I started this patch as a refactoring, tried to make a helper for getting symbol names, similar to how we get section names used in warning messages. So this patch cleans up the code and fixes an issue: symbol names in warning messages were not demangled. Differential revision: https://reviews.llvm.org/D68012 llvm-svn: 372867	2019-09-25 13:16:43 +00:00
Sanjay Patel	dec03223a9	[IR] allow fast-math-flags on phi of FP values The changes here are based on the corresponding diffs for allowing FMF on 'select': D61917 As discussed there, we want to have fast-math-flags be a property of an FP value because the alternative (having them on things like fcmp) leads to logical inconsistency such as: https://bugs.llvm.org/show_bug.cgi?id=38086 The earlier patch for select made almost no practical difference because most unoptimized conditional code begins life as a phi (based on what I see in clang). Similarly, I don't expect this patch to do much on its own either because SimplifyCFG promptly drops the flags when converting to select on a minimal example like: https://bugs.llvm.org/show_bug.cgi?id=39535 But once we have this plumbing in place, we should be able to wire up the FMF propagation and start solving cases like that. The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a regression in a LoopVectorize test. We are intersecting the FMF of any FPMathOperator there, so if a phi is not properly annotated, new math instructions may not be either. Once we fix the propagation in SimplifyCFG, it may be safe to remove that hack. Differential Revision: https://reviews.llvm.org/D67564 llvm-svn: 372866	2019-09-25 13:14:12 +00:00
George Rimar	8ce581f586	[llvm-readobj] - Simplify stack-sizes.test test case. This is a follow-up for D67757, which allows to describe .stack_sizes sections with a new YAML syntax. Differential revision: https://reviews.llvm.org/D67759 llvm-svn: 372855	2019-09-25 12:18:45 +00:00
George Rimar	cfc2bccfd8	[yaml2elf] - Support describing .stack_sizes sections using unique suffixes. Currently we can't use unique suffixes in section names to describe stack sizes sections. E.g. '.stack_sizes [1]' will be treated as a regular section. This happens because we recognize stack sizes section by name and do not yet drop the suffix before the check. The patch fixes it. Differential revision: https://reviews.llvm.org/D68018 llvm-svn: 372853	2019-09-25 12:09:30 +00:00
George Rimar	f302436a0a	[yaml2obj] - Add a Size field for StackSizesSection. It is a follow-up requested in the review comment for D67757. Allows using Content + Size or just Size when describing .stack_sizes sections in a YAML document. Differential revision: https://reviews.llvm.org/D67958 llvm-svn: 372845	2019-09-25 11:40:11 +00:00
David Green	10d10102a4	[ARM] Ensure we do not attempt to create lsll #0 During legalisation we can end up with some pretty strange nodes, like shifts of 0. We need to make sure we don't try to make long shifts of these, ending up with invalid assembly instructions. A long shift with a zero immediate actually encodes a shift by 32. Differential Revision: https://reviews.llvm.org/D67664 llvm-svn: 372839	2019-09-25 10:16:48 +00:00
George Rimar	5b9a408113	[llvm-readobj] - Don't crash when dumping .stack_sizes and unable to find a relocation resolver. The crash might happen when we have either a broken or unsupported object and try to resolve relocations when dumping the .stack_sizes section. For the test case I used a 32-bit ELF header and a 64-bit relocation. In this case a null pointer is returned by the code instead of the relocation resolver function and then we crash. Differential revision: https://reviews.llvm.org/D67962 llvm-svn: 372838	2019-09-25 10:14:50 +00:00
Florian Hahn	364a23427b	[AArch64] Convert neon_ushl and neon_sshl with positive constants to VSHL. I think we should be able to use shl instead of sshl and ushl for positive constant shift values, unless I am missing something. We already have the machinery in place to ensure we only replace nodes, if the shift value is positive and <= the element width. This is a generalization of an earlier patch rL372565. Reviewers: t.p.northover, samparker, dmgreen, anemet Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D67955 llvm-svn: 372824	2019-09-25 08:22:05 +00:00
Amara Emerson	f674d7dab1	[AArch64][GlobalISel] Tweak legalization rule for G_BSWAP to handle widening s16. llvm-svn: 372812	2019-09-25 04:52:42 +00:00
Fangrui Song	f2bbfa05fe	[llvm-objcopy][test] Clean up -B tests -B is ignored for GNU objcopy compatibility after D67215/r371914. * Delete mentions of -B from input-output-target.test - we have enough -B tests. * Merge binary-input-with-arch.test into binary-output-target.test. Reviewed By: rupprecht Differential Revision: https://reviews.llvm.org/D67693 llvm-svn: 372809	2019-09-25 03:41:01 +00:00
Yonghong Song	1487bf6c82	[BPF] Generate array dimension size properly for zero-size elements Currently, if an array element type size is 0, the number of array elements will be set to 0, regardless of what user specified. This implementation is done in the beginning where BTF is mostly used to calculate the member offset. For example, struct s {}; struct s1 { int b; struct s a[2]; }; struct s1 s1; The BTF will have struct "s1" member "a" with element count 0. Now BTF types are used for compile-once and run-everywhere relocations and we need more precise type representation for type comparison. Andrii reported the issue as there are differences between original structure and BTF-generated structure. This patch made the change to correctly assign "2" as the number of elements of member "a". Some dead code related to the ElemSize computation is also removed. Differential Revision: https://reviews.llvm.org/D67979 llvm-svn: 372785	2019-09-24 22:38:43 +00:00
Sean Fertile	b3a9320c08	Extends the expansion of the LWZtoc pseudo op for AIX. Differential Revision: https://reviews.llvm.org/D67853 llvm-svn: 372772	2019-09-24 18:04:51 +00:00
Philip Reames	d9629b88ff	[GCRelocate] Add a peephole to canonicalize base pointer relocation If we generate the gc.relocate, and then later prove two arguments to the statepoint are equivalent, we should canonicalize the gc.relocate to the form we would have produced if this had been known before rewriting. llvm-svn: 372771	2019-09-24 17:24:16 +00:00
Simon Pilgrim	a7f27f357d	[X86] Add MMX MOVD/MOVQ stores to folding tables to support stack folding llvm-svn: 372770	2019-09-24 16:15:32 +00:00
Roman Lebedev	45fd1e9d50	[InstCombine] (a+b) < a && (a+b) != 0 -> (0-b) < a iff a/b != 0 (PR43259) Summary: This is again motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds. For ``` #include <cassert> char* test(char& base, signed long offset) { __builtin_assume(offset < 0); return &base + offset; } ``` We produce https://godbolt.org/z/r40U47 and again those two icmp's can be merged: ``` Name: 0 Pre: C != 0 %adjusted = add i8 %base, C %not_null = icmp ne i8 %adjusted, 0 %no_underflow = icmp ult i8 %adjusted, %base %r = and i1 %not_null, %no_underflow => %neg_offset = sub i8 0, C %r = icmp ugt i8 %base, %neg_offset ``` https://rise4fun.com/Alive/ALap https://rise4fun.com/Alive/slnN There are 3 other variants of this pattern, i believe they all will go into InstSimplify. https://bugs.llvm.org/show_bug.cgi?id=43259 Reviewers: spatel, xbolva00, nikic Reviewed By: spatel Subscribers: efriedma, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67849 llvm-svn: 372768	2019-09-24 16:10:50 +00:00
Roman Lebedev	5b881f356c	[InstCombine] (a+b) <= a && (a+b) != 0 -> (0-b) < a (PR43259) Summary: This is again motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds. This pattern isn't exactly what we get there (strict vs. non-strict predicate), but this pattern does not require known-bits analysis, so it is best to handle it first. ``` Name: 0 %adjusted = add i8 %base, %offset %not_null = icmp ne i8 %adjusted, 0 %no_underflow = icmp ule i8 %adjusted, %base %r = and i1 %not_null, %no_underflow => %neg_offset = sub i8 0, %offset %r = icmp ugt i8 %base, %neg_offset ``` https://rise4fun.com/Alive/knp There are 3 other variants of this pattern, they all will go into InstSimplify: https://rise4fun.com/Alive/bIDZ https://bugs.llvm.org/show_bug.cgi?id=43259 Reviewers: spatel, xbolva00, nikic Reviewed By: spatel Subscribers: hiraditya, majnemer, vsk, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67846 llvm-svn: 372767	2019-09-24 16:10:38 +00:00
Simon Pilgrim	682d41a506	[X86] Add tests showing failure to stack fold MMX MOVD/MOVQ stores llvm-svn: 372766	2019-09-24 15:40:09 +00:00
George Rimar	1a219aa8df	[yaml2obj/obj2yaml] - Add support for .stack_sizes sections. .stack_sizes is a SHT_PROGBITS section that contains pairs of <address (4/8 bytes), stack size (uleb128)>. This patch teaches the tools to parse and dump it. Differential revision: https://reviews.llvm.org/D67757 llvm-svn: 372762	2019-09-24 14:22:37 +00:00
Ilya Biryukov	60e5e0b667	Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) Reason: this caused severe compile time regressions in JAX. See email thread of original revision on llvm-commits for details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html llvm-svn: 372756	2019-09-24 13:48:02 +00:00
David Green	2fb41fc70c	[ARM] Split large widening MVE loads Similar to rL372717, we can force the splitting of extends of vector loads in MVE, in order to use the better widening loads as opposed to going through expensive extends. This adds a combine to early-on detect extends of loads and split the load in two, from where normal legalisation will kick in and we get a series of widening loads. Differential Revision: https://reviews.llvm.org/D67909 llvm-svn: 372721	2019-09-24 10:53:09 +00:00
David Green	2462d421ee	[ARM] MVE sext and widen/narrow tests from larger types. NFC llvm-svn: 372719	2019-09-24 10:39:58 +00:00
David Green	49d851f403	[ARM] Split large truncating MVE stores MVE does not have a simple sign extend instruction that can move elements across lanes. We currently often end up moving each lane into and out of a GPR, in order to get elements into the correct places. When we have a store of a trunc (or an extend of a load), we can instead just split the store/load in two, using the narrowing/widening load/store instructions from each half of the vector. This does that for stores. It happens very early in a store combine, so as to easily detect the truncates. (It would be possible to do this later, but that would involve looking through a buildvector of extract elements. Not impossible but this way seemed simpler). By enabling store combines we also get a vmovdrr combine for free, helping some other tests. Differential Revision: https://reviews.llvm.org/D67828 llvm-svn: 372717	2019-09-24 10:10:41 +00:00
Pavel Labath	aaff1a631a	MCRegisterInfo: Merge getLLVMRegNum and getLLVMRegNumFromEH Summary: The functions differ in two ways: - getLLVMRegNum could return both "eh" and "other" dwarf register numbers, while getLLVMRegNumFromEH only returned the "eh" number. - getLLVMRegNum asserted if the register was not found, while the second function returned -1. The second distinction was pretty important, but it was very hard to infer that from the function name. Additionally, for the use case of dumping dwarf expressions, we needed a function which can work with both kinds of number, but does not assert. This patch solves both of these issues by merging the two functions into one, returning an Optional<unsigned> value. While the same thing could be achieved by adding an "IsEH" argument to the (renamed) getLLVMRegNumFromEH function, it seemed better to avoid the confusion of two functions and put the choice of asserting into the hands of the caller -- if he checks the Optional value, he can safely process "untrusted" input, and if he blindly dereferences the Optional, he gets the assertion. I've updated all call sites to the new API, choosing between the two options according to the function they were calling originally, except that I've updated the usage in DWARFExpression.cpp to use the "safe" method instead, and added a test case which would have previously triggered an assertion failure when processing (incorrect?) dwarf expressions. Reviewers: dsanders, arsenm, JDevlieghere Subscribers: wdng, aprantl, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67154 llvm-svn: 372710	2019-09-24 09:31:02 +00:00
Alexey Lapshin	49f3c2b604	[Debuginfo] dbg.value points to undef value after Induction Variable Simplification. Induction Variable Simplification pass does not update dbg.value intrinsic. Before: %add = add nuw nsw i32 %ArgIndex.06, 1 call void @llvm.dbg.value(metadata i32 %add, metadata !17, metadata !DIExpression()) After: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 call void @llvm.dbg.value(metadata i64 undef, metadata !17, metadata !DIExpression()) There should be: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 call void @llvm.dbg.value(metadata i64 %indvars.iv.next, metadata !17, metadata !DIExpression()) Differential Revision: https://reviews.llvm.org/D67770 llvm-svn: 372703	2019-09-24 08:47:03 +00:00
Sjoerd Meijer	0fcb3afb40	[LV] Forced vectorization with runtime checks and OptForSize When vectorisation is forced with a pragma, we optimise for min size, and we need to emit runtime memory checks, then allow this code growth and don't run into an assert like we currently do. This is the result of D65197 and D66803, and was a use-case not really considered before. If this now happens, we emit an optimisation remark warning about the code-size expansion, which can be avoided by not forcing vectorisation or possibly source-code modifications. Differential Revision: https://reviews.llvm.org/D67764 llvm-svn: 372694	2019-09-24 08:03:34 +00:00
Huihui Zhang	a4dd98f2e9	[InstCombine] Fold a shifty implementation of clamp-to-allones. Summary: Fold or(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) into X s> Y ? -1 : X https://rise4fun.com/Alive/d8Ab clamp255 is a common operator in image processing, can be implemented in a shifty way "(255 - X) >> 31 | X & 255". Fold shift into select enables more optimization, e.g., vmin generation for ARM target. Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon Reviewed By: lebedev.ri Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67800 llvm-svn: 372678	2019-09-24 00:30:09 +00:00
Huihui Zhang	8952199715	[InstCombine] Fold a shifty implementation of clamp-to-zero. Summary: Fold and(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) into X s> Y ? X : 0 https://rise4fun.com/Alive/lFH Fold shift into select enables more optimization, e.g., vmax generation for ARM target. Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon Reviewed By: lebedev.ri Subscribers: xbolva00, andreadb, craig.topper, RKSimon, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67799 llvm-svn: 372676	2019-09-24 00:15:03 +00:00
Amara Emerson	adec1209e6	[GlobalISel][IRTranslator] Fix switch table lowering to use signed LE not unsigned. We were miscompiling switch value comparisons with the wrong signedness, which shows up when we have things like switch case values with i1 types, which end up being legalized incorrectly. Fixes PR43383 llvm-svn: 372675	2019-09-24 00:09:23 +00:00
Alina Sbirlea	2c5e6646ef	[MemorySSA] Update Phi insertion. Summary: MemoryPhis may be needed following a Def insertion in the IDF of all the new accesses added (phis + potentially a def). Ensure this also occurs when only the new MemoryPhis are the defining accesses. Note: The need for computing IDF here is because of new Phis added with edges incoming from unreachable code, Phis that had previously been simplified. The preferred solution is to not reintroduce such Phis. This patch is the needed fix while working on the preferred solution. Reviewers: george.burgess.iv Subscribers: Prazek, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67927 llvm-svn: 372673	2019-09-23 23:50:16 +00:00
Huihui Zhang	5b5f1c8efd	[NFC][InstCombine] Add tests for shifty implementation of clamping. Summary: Clamp negative to zero and clamp positive to allOnes are common operation in image saturation. Add tests for shifty implementation of clamping, as prepare work for folding: and(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) --> X s> 0 ? X : 0; or(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) --> X s> Y ? allOnes : X. Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67798 llvm-svn: 372671	2019-09-23 23:48:32 +00:00
Saleem Abdulrasool	082f895b1a	HotColdSplitting: invalidate the AssumptionCache on split When a cold path is outlined, the value tracking in the assumption cache may be invalidated due to the code motion. We would previously trip an assertion in subsequent passes (but required the passes to happen in a single run as the assumption cache is shared across the passes). Invalidating the cache ensures that we get the correct information when needed with the legacy pass manager as well. llvm-svn: 372667	2019-09-23 22:23:01 +00:00
Alexander Shaposhnikov	2eef85e247	[llvm-lipo] Add support for archives Add support for creating universal binaries which can contain an archive. Differential revision: https://reviews.llvm.org/D67758 Test plan: make check-all llvm-svn: 372666	2019-09-23 22:22:55 +00:00
Wei Mi	22fd88530b	[SampleFDO] Treat names in profile as not cold only when profile symbol list is available In rL372232, we treated names showing up in profile as not cold when profile-sample-accurate is enabled. This caused 70k size regression in Chrome/Android. The patch puts a guard in place and only enables the change when profile symbol list is available, i.e., keeps the old behavior when profile symbol list is not available. Differential Revision: https://reviews.llvm.org/D67931 llvm-svn: 372665	2019-09-23 22:11:35 +00:00
Craig Topper	8a6916e6db	[X86] Reduce the number of unique check prefixes in memset-nonzero.ll. NFC The avx512 with prefer-256-bit generates the same code as AVX2 so just reuse that prefix. llvm-svn: 372661	2019-09-23 21:29:28 +00:00
Thomas Lively	99d3dd287a	[WebAssembly] vNxM.load_splat instructions Summary: Adds the new load_splat instructions as specified at https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#load-and-splat. DAGISel does not allow matching multiple copies of the same load in a single pattern, so we use a new node in WebAssemblyISD to wrap loads that should be splatted. Depends on D67783. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67784 llvm-svn: 372655	2019-09-23 20:42:12 +00:00
David Bolvansky	48db0272d6	[InstCombine] Annotate strndup calls with dereferenceable_or_null "Implementations are free to malloc() a buffer containing either (size + 1) bytes or (strnlen(s, size) + 1) bytes. Applications should not assume that strndup() will allocate (size + 1) bytes when strlen(s) is smaller than size." llvm-svn: 372647	2019-09-23 19:55:45 +00:00
Aditya Nandakumar	72a4621cdf	[TableGen] Emit OperandType enums for RegisterOperands/RegisterClasses https://reviews.llvm.org/D66773 The OpTypes::OperandType was creating an enum for all records that inherit from Operand, but in reality there are operands for instructions that inherit from other types too. In particular, RegisterOperand and RegisterClass. This commit adds those types to the list of operand types that are tracked by the OperandType enum. Patch by: nlguillemot llvm-svn: 372641	2019-09-23 18:51:00 +00:00
David Bolvansky	8d52016155	[SLC] Convert some strndup calls to strdup calls Summary: Motivation: - If we can fold it to strdup, we should (strndup does more things than strdup). - Annotation mechanism. (Works for strdup well). strdup and strndup are part of C 20 (currently posix fns), so we should optimize them. Reviewers: efriedma, jdoerfert Reviewed By: jdoerfert Subscribers: lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67679 llvm-svn: 372636	2019-09-23 18:20:01 +00:00
Roman Lebedev	0a51e1f66d	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. c/d/e with mask (PR42563) Summary: If we have a pattern `(x & (-1 >> maskNbits)) << shiftNbits`, we already know (have a fold) that will drop the `& (-1 >> maskNbits)` mask iff `(shiftNbits-maskNbits) s>= 0` (i.e. `shiftNbits u>= maskNbits`). So even if `(shiftNbits-maskNbits) s< 0`, we can still fold, we will just need to apply a constant mask afterwards: ``` Name: c, normal+mask %t0 = lshr i32 -1, C1 %t1 = and i32 %t0, %x %r = shl i32 %t1, C2 => %n0 = shl i32 %x, C2 %n1 = i32 ((-(C2-C1))+32) %n2 = zext i32 %n1 to i64 %n3 = lshr i64 -1, %n2 %n4 = trunc i64 %n3 to i32 %r = and i32 %n0, %n4 ``` https://rise4fun.com/Alive/gslRa Naturally, old `%masked` will have to be one-use. This is not valid for pattern f - where "masking" is done via `ashr`. https://bugs.llvm.org/show_bug.cgi?id=42563 Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67725 llvm-svn: 372630	2019-09-23 17:04:28 +00:00
Roman Lebedev	b4a1d8a84c	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. a/b with mask (PR42563) Summary: And this is finally the interesting part of that fold! If we have a pattern `(x & (~(-1 << maskNbits))) << shiftNbits`, we already know (have a fold) that will drop the `& (~(-1 << maskNbits))` mask iff `(maskNbits+shiftNbits) u>= bitwidth(x)`. But that is actually ignorant, there's more general fold here: In this pattern, `(maskNbits+shiftNbits)` actually correlates with the number of low bits that will remain in the final value. So even if `(maskNbits+shiftNbits) u< bitwidth(x)`, we can still fold, we will just need to apply a constant mask afterwards: ``` Name: a, normal+mask %onebit = shl i32 -1, C1 %mask = xor i32 %onebit, -1 %masked = and i32 %mask, %x %r = shl i32 %masked, C2 => %n0 = shl i32 %x, C2 %n1 = add i32 C1, C2 %n2 = zext i32 %n1 to i64 %n3 = shl i64 -1, %n2 %n4 = xor i64 %n3, -1 %n5 = trunc i64 %n4 to i32 %r = and i32 %n0, %n5 ``` https://rise4fun.com/Alive/F5R Naturally, old `%masked` will have to be one-use. Similar fold exists for patterns c,d,e, will post patch later. https://bugs.llvm.org/show_bug.cgi?id=42563 Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67677 llvm-svn: 372629	2019-09-23 17:04:14 +00:00
Sanjay Patel	7414151929	[BreakFalseDeps] ignore function with minsize attribute This came up in the x86-specific: https://bugs.llvm.org/show_bug.cgi?id=43239 ...but it is a general problem for the BreakFalseDeps pass. Dependencies may be broken by adding some other instruction, so that should be avoided if the overall goal is to minimize size. Differential Revision: https://reviews.llvm.org/D67363 llvm-svn: 372628	2019-09-23 17:01:01 +00:00
Alexey Bataev	6a278d9073	[SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!") Summary: Initially SLP vectorizer replaced all going-to-be-vectorized instructions with Undef values. It may break ScalarEvolution and may cause a crash. Reworked SLP vectorizer so that it does not replace vectorized instructions by UndefValue anymore. Instead vectorized instructions are marked for deletion inside the BoUpSLP class and deleted upon class destruction. Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D29641 llvm-svn: 372626	2019-09-23 16:25:03 +00:00
Dmitry Preobrazhensky	6784a3cd79	[AMDGPU][MC] Corrected handling of relocatable expressions See bug 43359: https://bugs.llvm.org//show_bug.cgi?id=43359 Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D67829 llvm-svn: 372622	2019-09-23 15:41:51 +00:00
Krzysztof Parzyszek	f97fdf5792	[Hexagon] Bitcast v4i16 to v8i8, unify no-op casts between scalar and HVX llvm-svn: 372616	2019-09-23 14:33:27 +00:00
Sanjay Patel	31b9dfe23f	[x86] fix assert with horizontal math + broadcast of vector (PR43402) https://bugs.llvm.org/show_bug.cgi?id=43402 llvm-svn: 372606	2019-09-23 13:30:23 +00:00
Nico Weber	da298aa913	llvm-undname: Add support for demangling typeinfo names typeinfo names aren't symbols but string constant contents stored in compiler-generated typeinfo objects, but llvm-cxxfilt can demangle these for Itanium names. In the MSVC ABI, these are just a '.' followed by a mangled type -- this means they don't start with '?' like all MS-mangled symbols do. Differential Revision: https://reviews.llvm.org/D67851 llvm-svn: 372602	2019-09-23 13:13:37 +00:00
Djordje Todorovic	ead96d73ac	Revert "Reland "[utils] Implement the llvm-locstats tool"" This reverts commit rL372554. llvm-svn: 372580	2019-09-23 11:04:11 +00:00
George Rimar	753f6cff2f	[llvm-readobj] - Stop treating ".stack_sizes." sections as stack sizes sections. llvm-readobj currently handles .stack_sizes. (e.g. .stack_sizes.foo) as a normal stack sizes section. Though MC does not produce sections with such names. Also, linkers do not combine .stack_sizes.* into .stack_sizes. A mini discussion about this correctness issue is here: https://reviews.llvm.org/D67757#inline-609274 This patch changes the implementation so that now only the '.stack_sizes' name is accepted as a real stack sizes section. Differential revision: https://reviews.llvm.org/D67824 llvm-svn: 372578	2019-09-23 10:43:09 +00:00
George Rimar	4e0faa338b	[llvm-readobj] - Implement LLVM-style dumping for .stack_sizes sections. D65313 implemented GNU-style dumping (llvm-readelf). This one implements LLVM-style dumping (llvm-readobj). Differential revision: https://reviews.llvm.org/D67834 llvm-svn: 372576	2019-09-23 10:33:19 +00:00
Sam Parker	9feb429a33	[ARM][MVE] Remove old tail predicates Remove any predicate that we replace with a vctp intrinsic, and try to remove their operands too. Also look into the exit block to see if there's any duplicates of the predicates that we've replaced and clone the vctp to be used there instead. Differential Revision: https://reviews.llvm.org/D67709 llvm-svn: 372567	2019-09-23 09:48:25 +00:00
Florian Hahn	3e2fdbee80	[AArch64] support neon_sshl and neon_ushl in performIntrinsicCombine. Try to generate ushll/sshll for aarch64_neon_ushl/aarch64_neon_sshl, if their first operand is extended and the second operand is a constant Also adds a few tests marked with FIXME, where we can further increase codegen. Reviewers: t.p.northover, samparker, dmgreen, anemet Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D62308 llvm-svn: 372565	2019-09-23 09:38:53 +00:00
Sam Parker	4ba6d0ded2	[ARM][LowOverheadLoops] Use subs during revert. Check whether there are any uses or defs between the LoopDec and LoopEnd. If there's not, then we can use a subs to set the cpsr and skip generating a cmp. Differential Revision: https://reviews.llvm.org/D67801 llvm-svn: 372560	2019-09-23 08:57:50 +00:00
Sam Parker	566127e376	[ARM][LowOverheadLoops] Use tBcc when reverting Check the branch target ranges and use a tBcc instead of t2Bcc when we can. Differential Revision: https://reviews.llvm.org/D67796 llvm-svn: 372557	2019-09-23 08:35:31 +00:00
Petar Avramovic	c063b0b0d3	[MIPS GlobalISel] VarArg argument lowering, select G_VASTART and vacopy CC_Mips doesn't accept vararg functions for O32, so we have to explicitly use CC_Mips_FixedArg. For lowerCall we now properly figure out whether callee function is vararg or not, this has no effect for O32 since we always use CC_Mips_FixedArg. For lower formal arguments we need to copy arguments in register to stack and save pointer to start for argument list into MipsMachineFunction object so that G_VASTART could use it during instruction select. For vacopy we need to copy content from one vreg to another, load and store are used for that purpose. Differential Revision: https://reviews.llvm.org/D67756 llvm-svn: 372555	2019-09-23 08:11:41 +00:00
Djordje Todorovic	0e490ae0a9	Reland "[utils] Implement the llvm-locstats tool" The tool reports verbose output for the DWARF debug location coverage. The llvm-locstats for each variable or formal parameter DIE computes what percentage from the code section bytes, where it is in scope, it has location description. The line 0 shows the number (and the percentage) of DIEs with no location information, but the line 100 shows the number (and the percentage) of DIEs where there is location information in all code section bytes (where the variable or parameter is in the scope). The line 50..59 shows the number (and the percentage) of DIEs where the location information is in between 50 and 59 percentage of its scope covered. Differential Revision: https://reviews.llvm.org/D66526 llvm-svn: 372554	2019-09-23 07:57:53 +00:00
Craig Topper	03b5a13ee3	[X86] Canonicalize all zeroes vector to RHS in X86DAGToDAGISel::tryVPTESTM. llvm-svn: 372544	2019-09-23 05:35:23 +00:00
Craig Topper	5e26064c40	[X86] Remove SETEQ/SETNE canonicalization code from LowerIntVSETCC_AVX512 to prevent an infinite loop. The attached test case would previously loop infinitely after r365711. I'm going to move this to X86ISelDAGToDAG.cpp to get the setcc to match VPTEST in 32-bit mode in a follow up commit. llvm-svn: 372543	2019-09-23 05:35:20 +00:00
Craig Topper	1f058538e0	[X86] Add 32-bit command line to avx512f-vec-test-testn.ll llvm-svn: 372542	2019-09-23 05:35:15 +00:00
David Zarzycki	a7a515cb77	Prefer AVX512 memcpy when applicable When AVX512 is available and the preferred vector width is 512-bits or more, we should prefer AVX512 for memcpy(). https://bugs.llvm.org/show_bug.cgi?id=43240 https://reviews.llvm.org/D67874 llvm-svn: 372540	2019-09-23 05:00:59 +00:00
Craig Topper	a533e87792	[X86][SelectionDAGBuilder] Move the hack for handling MMX shift by i32 intrinsics into the X86 backend. These intrinsics should be shift by immediate, but gcc allows any i32 scalar and clang needs to match that. So we try to detect the non-constant case and move the data from an integer register to an MMX register. Previously this was done by creating a v2i32 build_vector and bitcast in SelectionDAGBuilder. This had to be done early since v2i32 isn't a legal type. The bitcast+build_vector would be DAG combined to X86ISD::MMX_MOVW2D which isel will turn into a GPR->MMX MOVD. This commit just moves the whole thing to lowering and emits the X86ISD::MMX_MOVW2D directly to avoid the illegal type. The test changes just seem to be due to nodes being linearized in a different order. llvm-svn: 372535	2019-09-23 01:05:33 +00:00
Roman Lebedev	7c3d6f5a1b	[X86] X86DAGToDAGISel::matchBEXTRFromAndImm(): if can't use BEXTR, fallback to BZHI if profitable (PR43381) Summary: PR43381 notes that while we are good at matching `(X >> C1) & C2` as BEXTR/BEXTRI, we only do that if we either have BEXTRI (TBM), or if BEXTR is marked as being fast (`-mattr=+fast-bextr`). In all other cases we don't match. But that is mainly only true for AMD CPU's. However, for all the CPU's for which we have sched models, the BZHI is always fast (or the sched models are all bad.) So if we decide that it's unprofitable to emit BEXTR/BEXTRI, we should consider falling-back to BZHI if it is available, and follow-up with the shift. While it's really tempting to do something because it's cool it is wise to first think whether it actually makes sense to do. We shouldn't just use BZHI because we can, but only if it is beneficial. In particular, it isn't really worth it if the input is a register, mask is small, or we can fold a load. But it is worth it if the mask does not fit into 32-bits. (careful, i don't know much about intel cpu's, my choice of `-mcpu` may be bad here) Thus we manage to fold a load: https://godbolt.org/z/Er0OQz Or if we'd end up using BZHI anyways because the mask is large: https://godbolt.org/z/dBJ_5h But this isn't actually profitable in the general case, e.g. here we'd increase microop count (the register renaming is free, mca does not model that there it seems) https://godbolt.org/z/k6wFoz Likewise, not worth it if we just get load folding: https://godbolt.org/z/1M1deG https://bugs.llvm.org/show_bug.cgi?id=43381 Reviewers: RKSimon, craig.topper, davezarzycki, spatel Reviewed By: craig.topper, davezarzycki Subscribers: andreadb, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67875 llvm-svn: 372532	2019-09-22 22:04:29 +00:00
Roman Lebedev	24159592ca	[NFC][X86] Add BEXTR test with load and 33-bit mask (PR43381 / D67875) llvm-svn: 372524	2019-09-22 19:36:38 +00:00
Craig Topper	a1d86857ff	[X86] Update commutable EVEX vcmp patterns to use timm instead of imm. We need to match TargetConstant, not Constant. This was broken in r372338, but we lacked test coverage. llvm-svn: 372523	2019-09-22 19:06:13 +00:00
Craig Topper	ac84771261	[X86] Add more tests for commuting evex vcmp instructions during isel to fold a load. Some of the isel patterns were not updated to check for TargetConstant instead of Constant in r372338. llvm-svn: 372522	2019-09-22 19:06:08 +00:00
Simon Pilgrim	4d486156e7	[Cost][X86] Add more missing vector truncation costs The AVX512 cases still need some work to correctly recognise the PMOV truncation cases. llvm-svn: 372514	2019-09-22 16:46:15 +00:00
Sanjay Patel	eb8d39e113	[InstCombine] allow icmp+binop folds before min/max bailout (PR43310) This has the potential to uncover missed analysis/folds as shown in the min/max code comment/test, but fewer restrictions on icmp folds should be better in general to solve cases like: https://bugs.llvm.org/show_bug.cgi?id=43310 llvm-svn: 372510	2019-09-22 14:31:53 +00:00
Sanjay Patel	d2a524288d	[InstCombine] add tests for icmp fold hindered by min/max; NFC llvm-svn: 372509	2019-09-22 14:23:22 +00:00
Simon Pilgrim	665ccbff60	[Cost][X86] Add v2i64 truncation costs We are missing costs for a lot of truncation cases, I'm hoping to address all the 'zero cost' cases in trunc.ll I thought this was a vector widening side effect, but even before this we had some interesting LV decisions (notably over indvars) being made due to these zero costs. llvm-svn: 372498	2019-09-22 12:04:38 +00:00
Craig Topper	38014c553f	[X86] Add memset and memcpy test cases for D67874. NFC llvm-svn: 372494	2019-09-22 06:52:25 +00:00
Roman Lebedev	baf809811b	[InstSimplify] simplifyUnsignedRangeCheck(): X >= Y && Y == 0 --> Y == 0 https://rise4fun.com/Alive/v9Y4 llvm-svn: 372491	2019-09-21 22:27:39 +00:00
Roman Lebedev	ac4dda8052	[NFC][InstSimplify] Add exhaustive test coverage for simplifyUnsignedRangeCheck(). One case is not handled. llvm-svn: 372489	2019-09-21 22:27:18 +00:00
Suyog Sarda	cd629ea0a8	SROA: Check Total Bits of vector type While promoting an alloca instruction of vector type, check the total size in bits of its slices too. If they don't match, don't promote the alloca instruction. Bug : https://bugs.llvm.org/show_bug.cgi?id=42585 llvm-svn: 372480	2019-09-21 18:16:37 +00:00
Wei Mi	eee532cd5f	Recommit [SampleFDO] Expose an interface to return the size of a section or the size of the profile for profile in ExtBinary format. Fix a test failure on Mac. [SampleFDO] Expose an interface to return the size of a section or the size of the profile for profile in ExtBinary format. Sometimes we want to limit the size of the profile by stripping some functions with low sample count or by stripping some function names with small text size from profile symbol list. That requires the profile reader to have the interfaces returning the size of a section or the size of total profile. The patch adds those interfaces. At the same time, add some dump facility to show the size of each section. Differential revision: https://reviews.llvm.org/D67726 llvm-svn: 372478	2019-09-21 17:23:55 +00:00
Hideto Ueno	63f6066b53	[Attributor] Implement "norecurse" function attribute deduction Summary: This patch introduces `norecurse` function attribute deduction. `norecurse` will be deduced if the following conditions hold: * The size of the SCC to which the function belongs equals 1. * The function doesn't have self-recursion. * We have `norecurse` for all call sites. To avoid a large change, SCC is calculated using scc_iterator in InfoCache initialization for now. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67751 llvm-svn: 372475	2019-09-21 15:13:19 +00:00
Roman Lebedev	854b0f0f00	[NFC][X86] Adjust check prefixes in bmi.ll (PR43381) llvm-svn: 372468	2019-09-21 11:12:55 +00:00
Amara Emerson	9c7d599dec	[AArch64][GlobalISel] Implement selection for G_SHL of <2 x i64> Simple continuation of existing selection support. llvm-svn: 372467	2019-09-21 09:21:16 +00:00
Amara Emerson	a59a886832	[AArch64][GlobalISel] Selection support for G_ASHR of <2 x s64> Just add an extra case to the existing selection logic. llvm-svn: 372466	2019-09-21 09:21:13 +00:00
Amara Emerson	fae979bc68	[AArch64][GlobalISel] Make <4 x s32> G_ASHR and G_LSHR legal. llvm-svn: 372465	2019-09-21 09:21:10 +00:00
Amara Emerson	3bb56fa478	Revert "[SampleFDO] Expose an interface to return the size of a section or the size" This reverts commit `f118852046`. Broke the macOS build/greendragon bots. llvm-svn: 372464	2019-09-21 09:11:51 +00:00
James Molloy	8a74eca398	[MachinePipeliner] Improve the TargetInstrInfo API analyzeLoop/reduceLoopCount Recommit: fix asan errors. The way MachinePipeliner uses these target hooks is stateful - we reduce trip count by one per call to reduceLoopCount. It's a little overfit for hardware loops, where we don't have to worry about stitching a loop induction variable across prologs and epilogs (the induction variable is implicit). This patch introduces a new API: /// Analyze loop L, which must be a single-basic-block loop, and if the /// conditions can be understood enough produce a PipelinerLoopInfo object. virtual std::unique_ptr<PipelinerLoopInfo> analyzeLoopForPipelining(MachineBasicBlock LoopBB) const; The return value is expected to be an implementation of the abstract class: /// Object returned by analyzeLoopForPipelining. Allows software pipelining /// implementations to query attributes of the loop being pipelined. class PipelinerLoopInfo { public: virtual ~PipelinerLoopInfo(); /// Return true if the given instruction should not be pipelined and should /// be ignored. An example could be a loop comparison, or induction variable /// update with no users being pipelined. virtual bool shouldIgnoreForPipelining(const MachineInstr MI) const = 0; /// Create a condition to determine if the trip count of the loop is greater /// than TC. /// /// If the trip count is statically known to be greater than TC, return /// true. If the trip count is statically known to be not greater than TC, /// return false. Otherwise return nullopt and fill out Cond with the test /// condition. virtual Optional<bool> createTripCountGreaterCondition(int TC, MachineBasicBlock &MBB, SmallVectorImpl<MachineOperand> &Cond) = 0; /// Modify the loop such that the trip count is /// OriginalTC + TripCountAdjust. virtual void adjustTripCount(int TripCountAdjust) = 0; /// Called when the loop's preheader has been modified to NewPreheader. 
virtual void setPreheader(MachineBasicBlock *NewPreheader) = 0; /// Called when the loop is being removed. virtual void disposed() = 0; }; The Pipeliner (ModuloSchedule.cpp) can use this object to modify the loop while allowing the target to hold its own state across all calls. This API, in particular the disjunction of creating a trip count check condition and adjusting the loop, improves the code quality in ModuloSchedule.cpp. llvm-svn: 372463	2019-09-21 08:19:41 +00:00
Craig Topper	04682939eb	[X86] Use sse_load_f32/f64 and timm in patterns for memory form of vgetmantss/sd. Previously we only matched scalar_to_vector and scalar load, but we should be able to narrow a vector load or match vzload. Also need to match TargetConstant instead of Constant. The register patterns were previously updated, but not the memory patterns. llvm-svn: 372458	2019-09-21 06:44:29 +00:00
Craig Topper	4fa12ac92c	[X86] Add test case to show failure to fold load with getmantss due to isel pattern looking for Constant instead of TargetConstant The intrinsic has an immarg so it gets created with a TargetConstant instead of a Constant after r372338. The isel pattern was only updated for the register form, but not the memory form. llvm-svn: 372457	2019-09-21 06:44:24 +00:00
Matt Arsenault	eb6eb694e4	AMDGPU/GlobalISel: Allow selection of scalar min/max I believe all of the uniform/divergent pattern predicates are redundant and can be removed. The uniformity bit already influences the register class, and nothing has broken when I've removed this and others. llvm-svn: 372450	2019-09-21 02:37:33 +00:00
Amara Emerson	7ac1039957	[GlobalISel] Defer setting HasCalls on MachineFrameInfo to selection time. We currently always set the HasCalls on MFI during translation and legalization if we're handling a call or legalizing to a libcall. However, if that call is later optimized to a tail call then we don't need the flag. The flag being set to true causes frame lowering to always save and restore FP/LR, which adds unnecessary code. This change does the same thing as SelectionDAG and ports over some code that scans instructions after selection, using TargetInstrInfo to determine if target opcodes are known calls. Code size geomean improvements on CTMark: -O0 : 0.1% -Os : 0.3% Differential Revision: https://reviews.llvm.org/D67868 llvm-svn: 372443	2019-09-20 23:52:07 +00:00
Teresa Johnson	2f32e5d84d	[Inliner] Remove incorrect early exit during switch cost computation Summary: The CallAnalyzer::visitSwitchInst has an early exit when the estimated lower bound of the switch cost will put the overall cost of the inline above the threshold. However, this code is not correctly estimating the lower bound for switches that can be transformed into bit tests, leading to unnecessary lost inlines, and also differing behavior with optimization remarks enabled. First, the early exit is controlled by whether ComputeFullInlineCost is enabled or not, and that in turn is disabled by default but enabled when enabling -pass-remarks=missed. This by itself wouldn't lead to a problem, except that as described below, the lower bound can be above the real lower bound, so we can sometimes get different inline decisions with inline remarks enabled, which is problematic. The early exit was added in along with a new switch cost model in D31085. The reason why this early exit was added is due to a concern one reviewer raised about compile time for large switches: https://reviews.llvm.org/D31085?id=94559#inline-276200 However, the code just below there calls getEstimatedNumberOfCaseClusters, which in turn immediately calls BasicTTIImpl getEstimatedNumberOfCaseClusters, which in the worst case does a linear scan of the cases to get the high and low values. The bit test handling in particular is guarded by whether the number of cases fits into the max bit width. There is no suggestion that anyone measured a compile time issue, it appears to be theoretical. The problem is that the reviewer's comment about the lower bound calculation is incorrect, specifically in the case of a switch that can be lowered to a bit test. This isn't followed up on the comment thread, but the author does add a FIXME to that effect above the early exit added when they subsequently revised the patch. 
As a result, we were incorrectly early exiting and not inlining functions with switch statements that would be lowered to bit tests in cases where we were nearing the threshold. Combined with the fact that this early exit was skipped with opt remarks enabled, this caused different inlining decisions to be made when -pass-remarks=missed is enabled to debug the missing inline. Remove the early exit for the above reasons. I also copied over an existing AArch64 inlining test to X86, and adjusted the threshold so that the bit test inline only occurs with the fix in this patch. Reviewers: davidxl Subscribers: eraman, kristof.beyls, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67716 llvm-svn: 372440	2019-09-20 23:29:17 +00:00
Wei Mi	f118852046	[SampleFDO] Expose an interface to return the size of a section or the size of the profile for profile in ExtBinary format. Sometimes we want to limit the size of the profile by stripping some functions with low sample count or by stripping some function names with small text size from profile symbol list. That requires the profile reader to have the interfaces returning the size of a section or the size of total profile. The patch adds those interfaces. At the same time, add some dump facility to show the size of each section. llvm-svn: 372439	2019-09-20 23:24:50 +00:00
Ulrich Weigand	819c1651f7	[SystemZ] Support z15 processor name The recently announced IBM z15 processor implements the architecture already supported as "arch13" in LLVM. This patch adds support for "z15" as an alternate architecture name for arch13. The patch also uses z15 in a number of places where we used arch13 as long as the official name was not yet announced. llvm-svn: 372435	2019-09-20 23:04:45 +00:00
Sterling Augustine	4a58936716	Fix missed case of switching getConstant to getTargetConstant. Try 2. Summary: This fixes a crasher introduced by r372338. Reviewers: echristo, arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67850 llvm-svn: 372434	2019-09-20 22:26:55 +00:00
Jinsong Ji	216be996d6	[NFC][PowerPC] Consolidate testing of common linkage symbols Add a new file to test the code gen for common linkage symbol. Remove common linkage in some other testcases to avoid distraction. llvm-svn: 372426	2019-09-20 20:31:37 +00:00
Mitch Phillips	72a3d8597d	Revert "[MachinePipeliner] Improve the TargetInstrInfo API analyzeLoop/reduceLoopCount" This commit broke the ASan buildbot. See comments in rL372376 for more information. This reverts commit `15e27b0b6d`. llvm-svn: 372425	2019-09-20 20:25:16 +00:00
Craig Topper	c139d1e281	[Mips] Remove immarg test for intrinsics that no longer have an immarg after r372409. llvm-svn: 372420	2019-09-20 18:52:49 +00:00
Roman Lebedev	081eebc58f	[NFC][InstCombine] Fixup newly-added tests llvm-svn: 372413	2019-09-20 17:43:46 +00:00
Evgeniy Stepanov	c2bda3e422	[MTE] Handle MTE instructions in AArch64LoadStoreOptimizer. Summary: Generate pre- and post-indexed forms of ST*G and STGP when possible. Reviewers: ostannard, vitalybuka Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67741 llvm-svn: 372412	2019-09-20 17:36:27 +00:00
Sebastian Pop	f6398fb72c	[aarch64] add def-pats for dot product This patch adds the patterns to select the dot product instructions. Tested on aarch64-linux with make check-all. Differential Revision: https://reviews.llvm.org/D67645 llvm-svn: 372408	2019-09-20 16:33:33 +00:00
Stanislav Mekhanoshin	af77ca7e6e	Remove assert from MachineLoop::getLoopPredecessor() According to the documentation, the method returns the predecessor if the given loop's header has exactly one unique predecessor outside the loop, and otherwise returns null. In reality it asserts if there is no predecessor outside of the loop. The testcase has a loop where predecessors outside of the loop were not identified, as analyzeBranch() was unable to process the mask branch and returned true. It is also not correct to assert for truly dead loops. Differential Revision: https://reviews.llvm.org/D67634 llvm-svn: 372405	2019-09-20 15:26:10 +00:00
Krzysztof Parzyszek	2b5d7e93dd	[MVT] Add v256i1 to MachineValueType This type can show up when lowering some HVX vector code on Hexagon. llvm-svn: 372403	2019-09-20 15:19:20 +00:00
Roman Lebedev	d21087af95	[InstCombine] Tests for (a+b)<=a && (a+b)!=0 fold (PR43259) https://rise4fun.com/Alive/knp https://rise4fun.com/Alive/ALap llvm-svn: 372402	2019-09-20 15:06:47 +00:00
Oliver Cruickshank	c84722ff27	[ARM] Fix CTTZ not generating correct instructions MVE CTTZ intrinsic should have been set to Custom, not Expand llvm-svn: 372401	2019-09-20 15:03:44 +00:00
David Stenberg	b71d8d465a	Add a missing space in a MIR parser error message llvm-svn: 372398	2019-09-20 14:41:41 +00:00
Sanjay Patel	4896f7243d	[SLPVectorizer] add tests for bogus reductions; NFC https://bugs.llvm.org/show_bug.cgi?id=42708 https://bugs.llvm.org/show_bug.cgi?id=43146 llvm-svn: 372393	2019-09-20 14:17:00 +00:00
David Zarzycki	4fff87d2ee	[Testing] Python 3 requires `print` to use parens llvm-svn: 372392	2019-09-20 13:52:47 +00:00
David Tellenbach	2a47c77e72	[FastISel] Fix insertion of unconditional branches during FastISel The insertion of an unconditional branch during FastISel can differ depending on building with or without debug information. This happens because FastISel::fastEmitBranch emits an unconditional branch depending on the size of the current basic block without distinguishing between debug and non-debug instructions. This patch fixes this issue by ignoring debug instructions when getting the size of the basic block. Reviewers: aprantl Reviewed By: aprantl Subscribers: ormris, aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67703 llvm-svn: 372389	2019-09-20 13:22:59 +00:00
Nico Weber	03475adcf7	Revert r372366 "Use getTargetConstant for BLENDI, and add a test to catch it." This reverts commit `52621307bc`. Tests have been failing all night with [0/2] ACTION //llvm/test:check-llvm(//llvm/utils/gn/build/toolchain:unix) -- Testing: 33647 tests, 64 threads -- Testing: 0 .. 10.. UNRESOLVED: LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll (6943 of 33647) ****************** TEST 'LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll' FAILED **************** Test has no run line! ****************** Since there were other concerns on https://reviews.llvm.org/D67785, I'm just reverting for now. llvm-svn: 372383	2019-09-20 12:05:29 +00:00
George Rimar	4d69967f44	[yaml2obj/obj2yaml] - Do not trigger llvm_unreachable when dumping/parsing relocations and e_machine is unsupported. Currently when e_machine is set to something that is not supported by YAML lib, then tools fail with llvm_unreachable. In this patch I allow them to handle relocations in this case. It can be used to dump and create objects for broken or unsupported targets. Differential revision: https://reviews.llvm.org/D67657 llvm-svn: 372377	2019-09-20 09:15:36 +00:00
James Molloy	15e27b0b6d	[MachinePipeliner] Improve the TargetInstrInfo API analyzeLoop/reduceLoopCount The way MachinePipeliner uses these target hooks is stateful - we reduce trip count by one per call to reduceLoopCount. It's a little overfit for hardware loops, where we don't have to worry about stitching a loop induction variable across prologs and epilogs (the induction variable is implicit). This patch introduces a new API: /// Analyze loop L, which must be a single-basic-block loop, and if the /// conditions can be understood enough produce a PipelinerLoopInfo object. virtual std::unique_ptr<PipelinerLoopInfo> analyzeLoopForPipelining(MachineBasicBlock LoopBB) const; The return value is expected to be an implementation of the abstract class: /// Object returned by analyzeLoopForPipelining. Allows software pipelining /// implementations to query attributes of the loop being pipelined. class PipelinerLoopInfo { public: virtual ~PipelinerLoopInfo(); /// Return true if the given instruction should not be pipelined and should /// be ignored. An example could be a loop comparison, or induction variable /// update with no users being pipelined. virtual bool shouldIgnoreForPipelining(const MachineInstr MI) const = 0; /// Create a condition to determine if the trip count of the loop is greater /// than TC. /// /// If the trip count is statically known to be greater than TC, return /// true. If the trip count is statically known to be not greater than TC, /// return false. Otherwise return nullopt and fill out Cond with the test /// condition. virtual Optional<bool> createTripCountGreaterCondition(int TC, MachineBasicBlock &MBB, SmallVectorImpl<MachineOperand> &Cond) = 0; /// Modify the loop such that the trip count is /// OriginalTC + TripCountAdjust. virtual void adjustTripCount(int TripCountAdjust) = 0; /// Called when the loop's preheader has been modified to NewPreheader. 
virtual void setPreheader(MachineBasicBlock *NewPreheader) = 0; /// Called when the loop is being removed. virtual void disposed() = 0; }; The Pipeliner (ModuloSchedule.cpp) can use this object to modify the loop while allowing the target to hold its own state across all calls. This API, in particular the disjunction of creating a trip count check condition and adjusting the loop, improves the code quality in ModuloSchedule.cpp. llvm-svn: 372376	2019-09-20 08:57:46 +00:00
Owen Reynolds	25040f8dec	Reapply [llvm-ar] Include a line number when failing to parse an MRI script Reapply r372309 Errors that occur when reading an MRI script now include a corresponding line number. Differential Revision: https://reviews.llvm.org/D67449 llvm-svn: 372374	2019-09-20 08:10:14 +00:00
Craig Topper	a34f13f2ba	[X86] Use timm in MMX pinsrw/pextrw isel patterns. Add missing test cases. This fixes an isel failure after r372338. llvm-svn: 372371	2019-09-20 06:00:35 +00:00
Fangrui Song	c768ad94b7	[llvm-ar] Removes repetition in the error message As per bug 40244, fixed an error where the error message was repeated. Differential Revision: https://reviews.llvm.org/D67038 Patch by Yu Jian (wyjw) llvm-svn: 372370	2019-09-20 04:40:44 +00:00
Sterling Augustine	52621307bc	Use getTargetConstant for BLENDI, and add a test to catch it. Summary: This fixes a crasher introduced by r372338. Reviewers: echristo, arsenm Subscribers: jvesely, wdng, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67785 Tighten up the test case. llvm-svn: 372366	2019-09-20 02:29:16 +00:00
Matt Arsenault	dd74f4839b	MachineScheduler: Fix missing dependency with multiple subreg defs If an instruction had multiple subregister defs, and one of them was undef, this would improperly conclude all other lanes are killed. There could still be other defs of those read-undef lanes in other operands. This would improperly remove register uses from CurrentVRegUses, so the visitation of later operands would not find the necessary register dependency. This would also mean this would fail or not depending on how different subregister def operands were ordered. On an undef subregister def, scan the instruction for other subregister defs and avoid killing those. This possibly should be deferring removing anything from CurrentVRegUses until the entire instruction has been processed instead. llvm-svn: 372362	2019-09-20 00:09:15 +00:00
Akira Hatanaka	75fbb171c3	[ObjC][ARC] Skip debug instructions when computing the insert point of objc_release calls This fixes a bug where the presence of debug instructions would cause ARC optimizer to change the order of retain and release calls. rdar://problem/55319419 llvm-svn: 372352	2019-09-19 20:58:51 +00:00
Jakub Kuderski	e6b2164723	Don't use invalidated iterators in FlattenCFGPass Summary: FlattenCFG may erase unnecessary blocks, which also invalidates iterators to those erased blocks. Before this patch, `iterativelyFlattenCFG` could try to increment a BB iterator after that BB has been removed and crash. This patch makes FlattenCFGPass use `WeakVH` to skip over erased blocks. Reviewers: dblaikie, tstellar, davide, sanjoy, asbirlea, grosser Reviewed By: asbirlea Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67672 llvm-svn: 372347	2019-09-19 19:39:42 +00:00
Shoaib Meenai	d89f2d872d	[Analysis] Allow -scalar-evolution-max-iterations more than once At present, `-scalar-evolution-max-iterations` is a `cl::Optional` option, which means it demands to be passed exactly zero or one times. Our build system makes it pretty tricky to guarantee this. We often accidentally pass the flag more than once (but always with the same value) which results in an error, after which compilation fails: ``` clang (LLVM option parsing): for the -scalar-evolution-max-iterations option: may only occur zero or one times! ``` It seems reasonable to allow -scalar-evolution-max-iterations to be passed more than once. Quoting the [[ http://llvm.org/docs/CommandLine.html#controlling-the-number-of-occurrences-required-and-allowed \| documentation ]]: > The cl::ZeroOrMore modifier ... indicates that your program will allow the option to be specified zero or more times. > ... > If an option is specified multiple times for an option of the cl::opt class, only the last value will be retained. Original patch by: Enrico Bern Hardy Tanuwidjaja <etanuwid@fb.com> Differential Revision: https://reviews.llvm.org/D67512 llvm-svn: 372346	2019-09-19 18:21:32 +00:00
Jinsong Ji	ca4c5deae5	[NFC][PowerPC] Fast-isel VSX support test We have fixed most of the VSX limitations in Fast-isel, so we can remove the -mattr=-vsx for most testcases now. llvm-svn: 372345	2019-09-19 18:18:18 +00:00
Roman Lebedev	7a67ed5795	[InstCombine] Simplify @llvm.usub.with.overflow+non-zero check (PR43251) Summary: This is again motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds. In this particular case, given ``` char* test(char& base, unsigned long offset) { return &base - offset; } ``` it will end up producing something like https://godbolt.org/z/luGEju which after optimizations reduces down to roughly ``` declare void @use64(i64) define i1 @test(i8* dereferenceable(1) %base, i64 %offset) { %base_int = ptrtoint i8* %base to i64 %adjusted = sub i64 %base_int, %offset call void @use64(i64 %adjusted) %not_null = icmp ne i64 %adjusted, 0 %no_underflow = icmp ule i64 %adjusted, %base_int %no_underflow_and_not_null = and i1 %not_null, %no_underflow ret i1 %no_underflow_and_not_null } ``` Without D67122 there was no `%not_null`, and in this particular case we can "get rid of it", by merging two checks: Here we are checking: `Base u>= Offset && (Base u- Offset) != 0`, but that is simply `Base u> Offset` Alive proofs: https://rise4fun.com/Alive/QOs The `@llvm.usub.with.overflow` pattern itself is not handled here because this is the main pattern, that we currently consider canonical. https://bugs.llvm.org/show_bug.cgi?id=43251 Reviewers: spatel, nikic, xbolva00, majnemer Reviewed By: xbolva00, majnemer Subscribers: vsk, majnemer, xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67356 llvm-svn: 372341	2019-09-19 17:25:19 +00:00
Alexander Timofeev	e2f9bc3b11	[AMDGPU] Unnecessary -amdgpu-scalarize-global-loads=false flag removed from min/max lit tests. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D67712 llvm-svn: 372340	2019-09-19 16:44:38 +00:00
Sanjay Patel	13e71ce693	[Float2Int] avoid crashing on unreachable code (PR38502) In the example from: https://bugs.llvm.org/show_bug.cgi?id=38502 ...we hit infinite looping/crashing because we have non-standard IR - an instruction operand is used before defined. This and other unusual constructs are allowed in unreachable blocks, so avoid the problem by using DominatorTree to step around landmines. Differential Revision: https://reviews.llvm.org/D67766 llvm-svn: 372339	2019-09-19 16:31:17 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00
Andrea Di Biagio	e0900f285b	[MCA] Improved cost computation for loop carried dependencies in the bottleneck analysis. This patch introduces a cut-off threshold for dependency edge frequencies with the goal of simplifying the critical sequence computation. This patch also removes the cost normalization for loop carried dependencies. We didn't really need to artificially amplify the cost of loop-carried dependencies since it is already computed as the integral over time of the delay (in cycles). In the absence of backend stalls there is no need for computing a critical sequence. With this patch we early exit from the critical sequence computation if no bottleneck was reported during the simulation. llvm-svn: 372337	2019-09-19 16:05:11 +00:00
Matt Arsenault	7decdbf2db	X86: Add missing test for vshli SimplifyDemandedBitsForTargetNode This would have caught this regression which triggered the revert of r372285: https://bugs.chromium.org/p/chromium/issues/detail?id=1005750 llvm-svn: 372335	2019-09-19 15:44:00 +00:00
Simon Pilgrim	af6043557d	[DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks and includes a demonstration X86 implementation. The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it to work with target specific combines and opcodes (e.g. X86's FMA variants). Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations have to duplicate some checks (recursion depth etc.). I've only begun to replace X86's FNEG handling here, handling FMADDSUB/FMSUBADD negation and some low impact codegen changes (some FMA negation propagation). We can build on this in future patches. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 372333	2019-09-19 15:02:47 +00:00
Sanjay Patel	7592e3a81f	[Float2Int] auto-generate complete test checks; NFC llvm-svn: 372324	2019-09-19 13:58:15 +00:00
James Molloy	88a5fbfcea	[TableGen] Support encoding per-HwMode Much like ValueTypeByHwMode/RegInfoByHwMode, this patch allows targets to modify an instruction's encoding based on HwMode. When the EncodingInfos field is non-empty the Inst and Size fields of the Instruction are ignored and taken from EncodingInfos instead. As part of this promote getHwMode() from TargetSubtargetInfo to MCSubtargetInfo. This is NFC for all existing targets - new code is generated only if targets use EncodingByHwMode. llvm-svn: 372320	2019-09-19 13:39:54 +00:00
Hans Wennborg	13bdae8541	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > > Since intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. 
> > Most of the work here is to clean up target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, similar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_ instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372314	2019-09-19 12:33:07 +00:00
David Green	0cfb78e52a	[ARM] MVE i1 splat We needn't BFI each lane individually into a predicate register when each lane is the same. A simple sign extend and a vmsr will do. Differential Revision: https://reviews.llvm.org/D67653 llvm-svn: 372313	2019-09-19 12:17:41 +00:00
Owen Reynolds	aa03c14827	Revert [llvm-ar] Include a line number when failing to parse an MRI script Revert r372309 due to buildbot failures Differential Revision: https://reviews.llvm.org/D67449 llvm-svn: 372311	2019-09-19 11:22:59 +00:00
Owen Reynolds	04398c729b	[llvm-ar] Include a line number when failing to parse an MRI script Errors that occur when reading an MRI script now include a corresponding line number. Differential Revision: https://reviews.llvm.org/D67449 llvm-svn: 372309	2019-09-19 10:51:43 +00:00
Serguei Katkov	a44768858c	[Unroll] Add an option to control complete unrolling Add an ability to specify the max full unroll count for LoopUnrollPass pass in pass options. Reviewers: fhahn, fedor.sergeev Reviewed By: fedor.sergeev Subscribers: hiraditya, zzheng, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D67701 llvm-svn: 372305	2019-09-19 06:57:29 +00:00
Craig Topper	c2d25ed1b3	[X86] Prevent crash in LowerBUILD_VECTORvXi1 for v64i1 vectors on 32-bit targets when the vector is a mix of constants and non-constant. We need to materialize the constants as two 32-bit values that are casted to v32i1 and then concatenated. llvm-svn: 372304	2019-09-19 06:50:39 +00:00
Sam Parker	56aa691c41	[ARM] Fix for buildbots I had missed that massive.mir also needed updating. llvm-svn: 372303	2019-09-19 06:50:19 +00:00
Matt Arsenault	bffbeecb44	AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.ds.swizzle llvm-svn: 372297	2019-09-19 04:11:17 +00:00
Matt Arsenault	494243597b	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store.format This needs special handling due to some subtargets that have a nonstandard register layout for f16 vectors Also reject some illegal types on other targets. llvm-svn: 372293	2019-09-19 02:35:08 +00:00
Matt Arsenault	67f1f6ff8c	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store llvm-svn: 372292	2019-09-19 02:30:27 +00:00
Matt Arsenault	838ff36553	AMDGPU/GlobalISel: RegBankSelect struct buffer load/store llvm-svn: 372291	2019-09-19 02:26:53 +00:00
Matt Arsenault	a62ef58346	AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.raw.buffer.{load\|store} llvm-svn: 372290	2019-09-19 02:25:09 +00:00
Matt Arsenault	a30d022db6	AMDGPU/GlobalISel: Attempt to RegBankSelect image intrinsics Images should always have 2 consecutive, mandatory SGPR arguments. llvm-svn: 372289	2019-09-19 02:23:06 +00:00
Matt Arsenault	01213407c4	Fix typo llvm-svn: 372288	2019-09-19 02:15:29 +00:00
Matt Arsenault	c189f023ac	MachineScheduler: Fix assert from not checking subregs The assert would fail if there was a dead def of a subregister if there was a previous use of a different subregister. llvm-svn: 372287	2019-09-19 02:14:12 +00:00

65467 Commits