llvm-project

Author	SHA1	Message	Date
Sanjoy Das	7e36337935	[SCEV] A different fix for PR33494 Summary: I don't think rL309080 is the right fix for PR33494 -- caching ExitLimit only hides the problem[0]. The real issue is that because of how we forget SCEV expressions in ScalarEvolution::getBackedgeTakenInfo, in the test case for PR33494 computing the backedge for any loop invalidates the trip count for every other loop. This effectively makes the SCEV cache useless. I've instead made the SCEV expression invalidation in ScalarEvolution::getBackedgeTakenInfo less aggressive to fix this issue. [0]: One way to think about this is that rL309080 essentially augmented the backedge-taken-count cache with another equivalent exit-limit cache. The bug went away because we were explicitly not clearing the exit-limit cache in getBackedgeTakenInfo. But instead of doing all of that, we can just avoid clearing the backedge-taken-count cache. Reviewers: mkazantsev, mzolotukhin Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D39361 llvm-svn: 319678	2017-12-04 19:22:00 +00:00
Sanjoy Das	aa92cae14e	[BypassSlowDivision] Improve our handling of divisions by constants (This reapplies r314253. r314253 was reverted on r314482 because of a correctness regression on P100, but that regression was identified to be something else.) Summary: Don't bail out on constant divisors for divisions that can be narrowed without introducing control flow. This gives us a 32 bit multiply instead of an emulated 64 bit multiply in the generated PTX assembly. Reviewers: jlebar Subscribers: jholewinski, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38265 llvm-svn: 319677	2017-12-04 19:21:58 +00:00
Matthias Braun	7eae251bae	MachineVerifier: undef phi arg doesn't need to be live-out from predecessor Differential Revision: https://reviews.llvm.org/D40756 llvm-svn: 319674	2017-12-04 18:57:48 +00:00
Francis Visoiu Mistrih	25528d6de7	[CodeGen] Unify MBB reference format in both MIR and debug output As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g' * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665	2017-12-04 17:18:51 +00:00
Pablo Barrio	2b4385846c	Fix function pointer tail calls in armv8-M.base Summary: The compiler fails with the following error message: fatal error: error in backend: ran out of registers during register allocation Tail call optimization for Armv8-M.base fails to meet all the required constraints when handling calls to function pointers where the arguments take up r0-r3. This is because the pointer to the function to be called can only be stored in r0-r3, but these are all occupied by arguments. This patch makes sure that tail call optimization does not try to handle this type of calls. Reviewers: chill, MatzeB, olista01, rengolin, efriedma Reviewed By: olista01, efriedma Subscribers: efriedma, aemerson, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D40706 llvm-svn: 319664	2017-12-04 16:55:49 +00:00
Pavel Labath	f2fdc183b7	Revert "[cmake] Enable zlib support on windows" This reverts commit r319533 as it broke llvm-config --system-libs output and everything that depends on it (which is mostly out of tree or downstream folks, but includes a couple of llvm buildbots as well). I think I have a fix for this in D40779, but I want someone to review it first. In the meantime, I am reverting this change, as it seems to break a lot of people. llvm-svn: 319663	2017-12-04 16:46:20 +00:00
Sam Kolton	5f7f32c382	[AMDGPU] SDWA: add support for PRESERVE into SDWA peephole. Summary: Reviewers: arsenm, vpykhtin, rampitec Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D37817 llvm-svn: 319662	2017-12-04 16:22:32 +00:00
Anna Thomas	7b360434ff	[Loop Predication] Teach LP about reverse loops Summary: Currently, we only support predication for forward loops with step of 1. This patch enables loop predication for reverse or countdownLoops, which satisfy the following conditions: 1. The step of the IV is -1. 2. The loop has a single latch as B(X) = X <pred> latchLimit with pred as s> or u> 3. The IV of the guard is the decrement IV of the latch condition (Guard is: G(X) = X-1 u< guardLimit). This patch was downstream for a while and is the last series of patches that's from our LP implementation downstream. Reviewers: apilipenko, mkazantsev, sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40353 llvm-svn: 319659	2017-12-04 15:11:48 +00:00
Jonas Hahnfeld	5db24d7c22	[NVPTX] Assign valid global names PTX requires that identifiers consist only of [a-zA-Z0-9_$]. The existing pass already ensured this for globals and this patch adds the cleanup for functions with local linkage. However, there was a different problem in the case of collisions of the adjusted name: The ValueSymbolTable then automatically appended ".N" with increasing Ns to get a unique name while helping the ABI demangling. Special case this behavior to omit the dots and append N directly. This will always give us legal names according to the PTX requirements. Differential Revision: https://reviews.llvm.org/D40573 llvm-svn: 319657	2017-12-04 14:19:33 +00:00
Oliver Stannard	7ab60605f8	Revert r319649 - [Asm, ARM] Add fallback diag for multiple invalid operands This is causing a failure in the llvm-clang-x86_64-expensive-checks-win buildbot, and I can't reproduce it locally, so reverting until I can work out what is wrong. llvm-svn: 319654	2017-12-04 13:42:22 +00:00
Sam McCall	d0d43e6f14	Revert "[ValueTracking] Pass only a single lambda to computeKnownBitsFromShiftOperator by using KnownBits struct instead of separate APInts. NFCI" This reverts commit r319624, which seems to cause a miscompile (breaks the multistage PPC buildbots) llvm-svn: 319652	2017-12-04 12:51:49 +00:00
Tim Corringham	6c6d5e24cd	AMDGPU: fix missing s_waitcnt Summary: The pass that inserts s_waitcnt instructions where needed propagated info used to track dependencies for each block by iterating over the predecessor blocks. The iteration was terminated when a predecessor that had not yet been processed was encountered. Any info in blocks later in the list was therefore not processed, leading to the possibility of a required s_waitcnt not being inserted. The fix is simply to change the "break" to "continue" for the relevant loops, so that all visited blocks are processed. This is likely what was intended when the code was written. There is no test case provided for this fix because: 1) the only example that reproduces this is large and resistant to being reduced 2) the change is trivial Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D40544 llvm-svn: 319651	2017-12-04 12:30:49 +00:00
Oliver Stannard	7cd4db94f8	[Asm, ARM] Add fallback diag for multiple invalid operands This adds a "invalid operands for instruction" diagnostic for instructions where there is an instruction encoding with the correct mnemonic and which is available for this target, but where multiple operands do not match those which were provided. This makes it clear that there is some combination of operands that is valid for the current target, which the default diagnostic of "invalid instruction" does not. Since this is a very general error, we only emit it if we don't have a more specific error. Differential revision: https://reviews.llvm.org/D36747 llvm-svn: 319649	2017-12-04 12:02:32 +00:00
Jonas Paulsson	e86327f290	[TwoAddressInstructionPass] Bugfix in handling of sunk instructions. An instruction returned by TII->convertToThreeAddress() may contain a %noreg (undef) operand, which is not expected by tryInstructionTransform(). So if this MI is sunk to a lower point in MBB, it must be skipped when later encountered. A new set SunkInstrs is used for this purpose. Note: there is no test supplied here, as this was triggered on SystemZ while working on a review of instruction flags. A test case for this bugfix will be included in the upcoming SystemZ commit. Review: Quentin Colombet https://reviews.llvm.org/D40711 llvm-svn: 319646	2017-12-04 10:03:14 +00:00
Sam Parker	1e26d986aa	[DAGCombine] Remove isAndLoadExtLoad arguments Both LoadedVT and NarrowLoad are passed as references and neither of them are used by any of its callers. Differential Revision: https://reviews.llvm.org/D40713 llvm-svn: 319645	2017-12-04 09:48:26 +00:00
Martin Storsjo	eca862de07	[AArch64] Allow using emulated tls on platforms other than ELF This matches how it is done on X86. This allows using emulated tls on windows; in MinGW environments, native tls isn't supported at the moment. Set the right Data*bitsDirective for windows to match the existing tests for other platforms. Make parts of the existing tests a regex, to allow matching .section .rdata for windows, to avoid having to duplicate the rest of the tests for windows. Differential Revision: https://reviews.llvm.org/D40770 llvm-svn: 319644	2017-12-04 09:09:04 +00:00
Martin Storsjo	c85cc41801	[ARM] Allow using emulated tls on platforms other than ELF This matches how it is done on X86. This allows using emulated tls on windows; in MinGW environments, native tls isn't supported at the moment. Differential Revision: https://reviews.llvm.org/D40769 llvm-svn: 319643	2017-12-04 09:08:55 +00:00
Craig Topper	4520d4f8ad	[X86] Allow VPMAXUQ/VPMAXSQ/VPMINUQ/VPMINSQ to be used with 128/256 bit vectors when AVX512 is enabled. These instructions can be used by widening to 512-bits and extracting back to 128/256. We do similar to several other instructions already. llvm-svn: 319641	2017-12-04 07:21:01 +00:00
Craig Topper	1151facf76	[X86] Don't turn UINT_TO_FP into SINT_TO_FP during lowering. We already do this as a DAG combine. The version during lowering can only trigger if known bits changes something that improves known bits analysis. But this means we should be improving known bits analysis to work on the unlowered form instead. llvm-svn: 319640	2017-12-04 05:38:44 +00:00
Craig Topper	67217d7eb4	[SelectionDAG] Teach computeKnownBits some improvements to ISD::SRL with a non-splat constant shift amount. If we have a non-splat constant shift amount, the minimum shift amount can be used to infer the number of zero upper bits of the result. There's probably a lot more that we can do here, but this fixes a case where I wanted to infer the sign bit as zero when all the shift amounts are non-zero. llvm-svn: 319639	2017-12-04 05:38:42 +00:00
Simon Pilgrim	569e53b0f6	[X86][AVX512] Tag PH2PS/PS2PH conversion instructions scheduler classes llvm-svn: 319637	2017-12-03 21:43:54 +00:00
Simon Pilgrim	465a88bb92	[X86][AVX512] Tag packed F2I/I2F/F2F conversion instructions scheduler class llvm-svn: 319636	2017-12-03 21:16:12 +00:00
Simon Pilgrim	bc8d0223fb	[X86][SSE] Remove unused IIC_SSE_CVT_PI2PS_RR/IIC_SSE_CVT_PI2PS_RM itineraries llvm-svn: 319634	2017-12-03 20:57:04 +00:00
Yaxun Liu	30e4608cca	CodeGen: Fix SelectionDAGISel::LowerArguments for sret addr space SelectionDAGISel::LowerArguments assumes sret addr space is 0, which is not true for amdgcn---amdgiz target. This patch fixes that. Differential Revision: https://reviews.llvm.org/D40255 llvm-svn: 319630	2017-12-03 03:31:45 +00:00
Craig Topper	f3470e1ed4	[SelectionDAG] Use the inlined APInt shift methods since we've already bounds checked the shift. The version that takes APInt is out of line. The 'unsigned' version optimizes for the common case of single word APInts. llvm-svn: 319628	2017-12-03 03:07:09 +00:00
Sam Clegg	a2b35dac03	Reland "[WebAssembly] Add visibility flag to Wasm symbol flags"" Original change was rL319488. This was reverted rL319602 due to a gcc 7.1 warning. Differential Revision: https://reviews.llvm.org/D40772 llvm-svn: 319626	2017-12-03 01:19:23 +00:00
Craig Topper	199acd88e3	[ValueTracking] Pass only a single lambda to computeKnownBitsFromShiftOperator by using KnownBits struct instead of separate APInts. NFCI llvm-svn: 319624	2017-12-02 23:42:17 +00:00
Yaxun Liu	494770403a	CodeGen: Fix pointer info in SplitVecOp_EXTRACT_VECTOR_ELT/SplitVecRes_INSERT_VECTOR_ELT Two issues found when doing codegen for splitting vector with non-zero alloca addr space: DAGTypeLegalizer::SplitVecRes_INSERT_VECTOR_ELT/SplitVecOp_EXTRACT_VECTOR_ELT uses dummy pointer info for creating SDStore. Since one pointer operand contains multiply and add, InferPointerInfo is unable to infer the correct pointer info, which ends up with a dummy pointer info for the target to lower store and results in isel failure. The fix is to introduce MachinePointerInfo::getUnknownStack to represent MachinePointerInfo which is known in alloca address space but without other information. TargetLowering::getVectorElementPointer uses value type of pointer in addr space 0 for multiplication of index and then add it to the pointer. However the pointer may be in an addr space which has different size than addr space 0. The fix is to use the pointer value type for index multiplication. Differential Revision: https://reviews.llvm.org/D39758 llvm-svn: 319622	2017-12-02 22:13:22 +00:00
Simon Pilgrim	299a54c5b9	[X86][SSE] Cleanup float/int conversion scheduler itinerary classes Makes it easier to grok where each is supposed to be used, mainly useful for adding to the AVX512 instructions but hopefully can be used more in SSE/AVX as well. llvm-svn: 319614	2017-12-02 12:27:44 +00:00
Craig Topper	7d9a3b82c6	[X86] Teach the assembler to support %db8-%db15 as aliases for %dr8-%dr15. llvm-svn: 319612	2017-12-02 08:27:46 +00:00
Craig Topper	3e846ecb5b	[X86] Support %dr8-%dr15 in the assembler. Apparently I failed to make this work when I fixed it in the disassembler way back in r224862. llvm-svn: 319611	2017-12-02 08:27:45 +00:00
Tatyana Krasnukha	f665f6a279	[ARC] Add instruction subset for the ARC backend. Reviewers: petecoup, kparzysz Reviewed By: petecoup Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37983 llvm-svn: 319609	2017-12-02 05:25:17 +00:00
Nirav Dave	839ff79a8d	[DAG][AArch64] Disable post-legalization store Disable post-legalization store for AArch64 backend which is causing errors out-of-tree. llvm-svn: 319607	2017-12-02 04:01:26 +00:00
Heejin Ahn	e74a864cec	[WebAssembly] Revert r319488 "Add visibility flag to Wasm symbol flags" This patch reportedly broke one of LLVM bots (ubuntu-gcc7.1-werror). See http://lab.llvm.org:8011/builders/ubuntu-gcc7.1-werror/builds/3369 for details. llvm-svn: 319602	2017-12-02 02:05:06 +00:00
Matt Morehouse	9e658c974b	Revert "[X86] Improvement in CodeGen instruction selection for LEAs." This reverts r319543, due to ASan bot breakage. llvm-svn: 319591	2017-12-01 22:20:26 +00:00
Jessica Paquette	52df8015c5	[MachineOutliner] NFC: Throw out self-intersections on candidates early Currently, the outliner considers candidates that intersect with themselves in the candidate pruning step. That is, candidates of the form "AA" in ranges like "AAAAAA". In that range, it looks like there are 5 instances of "AA" that could possibly be outlined, and that's considered in the benefit calculation. However, only at most 3 instances of "AA" could ever be outlined in "AAAAAA". Thus, it's possible to pass through "AA" to the candidate selection step even though it's never the case that "AA" could be outlined. This makes it so that when we find candidates, we consider only non-overlapping occurrences of that candidate. llvm-svn: 319588	2017-12-01 21:56:56 +00:00
Nirav Dave	3e76e1e89e	[DAG][ARM] Revert "Reenable post-legalize store merge" due to failures in AArch and ARM code gen. llvm-svn: 319587	2017-12-01 21:55:47 +00:00
Jake Ehrlich	3da7982cca	[MC] Handle unknown literal register numbers in .cfi_* directives r230670 introduced a step to map EH register numbers to standard DWARF register numbers. This failed to consider the case when a user .cfi_* directive uses an integer literal rather than a register name, to specify a DWARF register number that has no corresponding LLVM register number (e.g. a special register that the compiler and assembler have no name for). Fixes PR34028. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D36493 llvm-svn: 319586	2017-12-01 21:44:27 +00:00
Philip Reames	6260cf71d3	[IndVars] Fix a bug introduced in r317012 Turns out we can have comparisons which are indirect users of the induction variable that we can make invariant. In this case, there is no loop invariant value contributing and we'd fail an assert. The test case was found by a java fuzzer and reduced. It's a real cornercase. You have to have a static loop which we've already proven only executes once, but haven't broken the backedge on, and an inner phi whose result can be constant folded by SCEV using exit count reasoning but not proven by isKnownPredicate. To my knowledge, only the fuzzer has hit this case. llvm-svn: 319583	2017-12-01 20:57:19 +00:00
Adam Nemet	9303f62255	[opt-remarks] If hotness threshold is set, ignore remarks without hotness These are blocks that have not been executed during training. For large projects this could make a significant difference. For the project I was looking at, I got an order of magnitude decrease in the size of the total YAML files with this and r319235. Differential Revision: https://reviews.llvm.org/D40678 Re-commit after fixing the failing testcase in rL319576, rL319577 and rL319578. llvm-svn: 319581	2017-12-01 20:41:38 +00:00
Eli Friedman	b34a8198a9	[DAGCombine] Simplify ISD::AND handling in ReduceLoadWidth Followup to D39595. Removes a bunch of redundant checks. Differential Revision: https://reviews.llvm.org/D40667 llvm-svn: 319573	2017-12-01 19:33:56 +00:00
Simon Pilgrim	031d8b71b3	[X86][AVX512] Tag subvector extract/insert instructions scheduler classes llvm-svn: 319568	2017-12-01 18:40:32 +00:00
Benjamin Kramer	094ac65d72	[IR] Avoid dangling else warning. NFC. llvm-svn: 319567	2017-12-01 18:39:58 +00:00
Fedor Sergeev	3b459c3847	IR printing improvement for loop passes - handle -print-module-scope Summary: Adding support for -print-module-scope similar to how it is being done for function passes. This option causes loop-pass printer to emit a whole-module IR instead of just a loop itself. Reviewers: sanjoy, silvas, weimingz Reviewed By: sanjoy Subscribers: apilipenko, skatkov, llvm-commits Differential Revision: https://reviews.llvm.org/D40247 llvm-svn: 319566	2017-12-01 18:33:58 +00:00
Paul Robinson	ab69b477a9	[DebugInfo] Bail out if making no progress dumping line tables. llvm-svn: 319564	2017-12-01 18:25:30 +00:00
Adam Nemet	57783730fd	Revert "[opt-remarks] If hotness threshold is set, ignore remarks without hotness" This reverts commit r319556. Something is not working with this when used with sample-based profiling. Investigating... llvm-svn: 319562	2017-12-01 18:12:29 +00:00
Fedor Sergeev	94dca7c7ea	IR printing improvement for function passes - introducing -print-module-scope Summary: When debugging function passes it happens to be rather useful to dump the whole module before the transformation and then use this dump to analyze this single transformation by running it separately on that particular module state. Introducing -print-module-scope debugging option that forces all the function-level IR dumps to become whole-module dumps. This option builds on top of normal dumping controls like -print-before/after -filter-print-funcs The plan is to eventually extend this option to cover other local passes (at least loop passes) but that should go as a separate change. Reviewers: sanjoy, weimingz, silvas, fedor.sergeev Reviewed By: weimingz Subscribers: apilipenko, skatkov, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D40245 llvm-svn: 319561	2017-12-01 17:42:46 +00:00
Simon Pilgrim	8d5e469c32	Fix line endings. NFCI. llvm-svn: 319559	2017-12-01 17:24:15 +00:00
Simon Pilgrim	fb01cb1b0c	[X86][AVX512] Tag VPERM2I/VPERM2T instructions scheduler class llvm-svn: 319558	2017-12-01 17:23:06 +00:00
Adam Nemet	8d1fc2b65b	[opt-remarks] If hotness threshold is set, ignore remarks without hotness These are blocks that have not been executed during training. For large projects this could make a significant difference. For the project I was looking at, I got an order of magnitude decrease in the size of the total YAML files with this and r319235. Differential Revision: https://reviews.llvm.org/D40678 llvm-svn: 319556	2017-12-01 17:02:04 +00:00
Simon Pilgrim	54c6083fb1	[X86][AVX512] Tag VFPCLASS instructions scheduler class llvm-svn: 319554	2017-12-01 16:51:48 +00:00
Simon Pilgrim	07b4c5917e	[X86][AVX512] Tag VPSHUFBITQMB instructions scheduler class llvm-svn: 319553	2017-12-01 16:35:57 +00:00
Simon Pilgrim	904d1a895c	[X86][AVX512] Tag VPCOMRESS/VPEXPAND instructions scheduler classes llvm-svn: 319551	2017-12-01 16:20:03 +00:00
Hans Wennborg	e2470b95da	Revert r319531 "[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops." It causes builds to fail with "Instruction does not dominate all uses" (PR35497). > Patch tries to improve vectorization of the following code: > > void add1(int * __restrict dst, const int * __restrict src) { > *dst++ = *src++; > *dst++ = *src++ + 1; > *dst++ = *src++ + 2; > *dst++ = *src++ + 3; > } > Allows to vectorize even if the very first operation is not a binary add, but just a load. > > Fixed issues related to previous commit. > > Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev > > Reviewed By: ABataev, RKSimon > > Subscribers: llvm-commits, RKSimon > > Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 319550	2017-12-01 16:17:24 +00:00
Nirav Dave	eb2b24fded	[ARM][DAG] Reenable post-legalize store merge Summary: Reenable post-legalize stores with constant merging computation and corresponding test case. Reviewers: eastig, efriedma Subscribers: aemerson, javed.absar, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40701 llvm-svn: 319547	2017-12-01 14:49:26 +00:00
Jatin Bhateja	328199ec26	[X86] Improvement in CodeGen instruction selection for LEAs. Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will now look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. 4/ Simplify LEA converts (lea (BASE,1,INDEX,0)) --> add (BASE, INDEX) which offers better throughput. PR32755 will be taken care of by this patch. Previous patch revisions : r313343 , r314886 Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy, jbhateja Reviewed By: lsaba, RKSimon, jbhateja Subscribers: jmolloy, spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 319543	2017-12-01 14:07:38 +00:00
Simon Pilgrim	2dc4ff1cde	[X86][AVX512] Tag vshift/vpermv/pshufd/pshufb instructions scheduler classes llvm-svn: 319540	2017-12-01 13:25:54 +00:00
Mikael Holmen	9c13c8b6ec	Revert r319537: Bail out of a SimplifyCFG switch table opt at undef values. Broke build bots so reverting. llvm-svn: 319539	2017-12-01 13:11:39 +00:00
Florian Hahn	30932a3c16	[InstSimplify] More fcmp cases when comparing against negative constants. Summary: For known positive non-zero value X: fcmp uge X, -C => true fcmp ugt X, -C => true fcmp une X, -C => true fcmp oeq X, -C => false fcmp ole X, -C => false fcmp olt X, -C => false Patch by Paul Walker. Reviewers: majnemer, t.p.northover, spatel, RKSimon Reviewed By: spatel Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D40012 llvm-svn: 319538	2017-12-01 12:34:16 +00:00
Mikael Holmen	9f047795fb	Bail out of a SimplifyCFG switch table opt at undef values. Summary: A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This is fixed by this patch by bailing out if we encounter undef. The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization. Patch by: JesperAntonsson Reviewers: spatel, eeckstein, hans Reviewed By: hans Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40639 llvm-svn: 319537	2017-12-01 12:30:49 +00:00
Nemanja Ivanovic	4364513cb2	Follow-up to r319434 to turn the pass on by default Now that the patch has gone through the buildbot cycle, turn it on by default. llvm-svn: 319535	2017-12-01 12:02:59 +00:00
Alexander Timofeev	c1425c9d6b	[AMDGPU] SiFixSGPRCopies should not modify non-divergent PHI Differential revision: https://reviews.llvm.org/D40556 llvm-svn: 319534	2017-12-01 11:56:34 +00:00
Pavel Labath	11ce6e6a83	[cmake] Enable zlib support on windows Summary: zlib support was hard-wired to off for (non-cygwin) windows targets. This disables some features, such as reading debug info from compressed dwarf sections. This has been this way since zlib support was added in 2013 (r180083), but there is no obvious reason for that. Zlib is perfectly capable of being compiled for windows (it even has a cmake file that works out of the box). This enables one to turn on zlib support on windows, if one has zlib available. Reviewers: rnk, beanz Subscribers: mgorny, aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D40655 llvm-svn: 319533	2017-12-01 11:41:07 +00:00
Dinar Temirbulatov	29e86584c6	[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops. Patch tries to improve vectorization of the following code: void add1(int * __restrict dst, const int * __restrict src) { *dst++ = *src++; *dst++ = *src++ + 1; *dst++ = *src++ + 2; *dst++ = *src++ + 3; } Allows to vectorize even if the very first operation is not a binary add, but just a load. Fixed issues related to previous commit. Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev Reviewed By: ABataev, RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 319531	2017-12-01 11:10:47 +00:00
Volkan Keles	a32ff00b00	GlobalISel: Enable the legalization of G_MERGE_VALUES and G_UNMERGE_VALUES Summary: LegalizerInfo assumes all G_MERGE_VALUES and G_UNMERGE_VALUES instructions are legal, so it is not possible to legalize vector operations on illegal vector types. This patch fixes the problem by removing the related check and adding default actions for G_MERGE_VALUES and G_UNMERGE_VALUES. Reviewers: qcolombet, ab, dsanders, aditya_nandakumar, t.p.northover, kristof.beyls Reviewed By: dsanders Subscribers: rovka, javed.absar, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D39823 llvm-svn: 319524	2017-12-01 08:19:10 +00:00
Hiroshi Inoue	48e4c7aae6	Recommit rL319407: [SROA] enable splitting for non-whole-alloca loads and stores Recommiting once reverted patch rL319407 after adding a check for bit vector size to avoid failures in some build bots. llvm-svn: 319522	2017-12-01 06:05:05 +00:00
Craig Topper	f8470a6399	[X86] Custom legalize v2i32 gathers via widening rather than promoting. The default legalization for v2i32 is promotion to v2i64. This results in a gather that reads 64-bit elements rather than 32. If one of the elements is near a page boundary this can cause an illegal access that can fault. We also miscalculate the scale for the gather which is an even worse problem, but we probably could have found a separate way to fix that. llvm-svn: 319521	2017-12-01 06:02:02 +00:00
Craig Topper	c261213abc	[X86][SelectionDAG] Make sure we explicitly sign extend the index when type promoting the index of scatter and gather. Type promotion makes no guarantee about the contents of the promoted bits. Since the gather/scatter instruction will use the bits to calculate addresses, we need to ensure they aren't garbage. llvm-svn: 319520	2017-12-01 06:02:00 +00:00
Craig Topper	11f733df9b	[X86] Add a DAG combine to simplify masks for AVX2 gather instructions. AVX2 gathers only use the upper bit of the mask allowing us to simplify sign_extend_inreg to a shift left. llvm-svn: 319514	2017-12-01 02:49:07 +00:00
Jake Ehrlich	1a468481c0	Add flag to ArchiveWriter to test GNU64 format more efficiently Even with the sparse file optimizations the SYM64 test can still be painfully slow. This unnecessarily slows down devs. It's critical that we test that the switch to the SYM64 format occurs at 4GB but there isn't any better of a way to fake the size of the file than sparse files. This change introduces a flag that allows the cutoff to be arbitrarily set to whatever power of two is desired. The flag is hidden as it really isn't meant to be used outside this one test. This is unfortunate but appears necessary, at least until the average hard drive is much faster. The changes to the test require some explanation. Prior to this change we knew that the SYM64 format was being used because the file was simply too large to have validly handled this case if the SYM64 format were not used. To ensure that the SYM64 format is still being used I am grepping the file for "SYM64". Without changing the filename however this would be pointless because "SYM64" would occur in the file either way. So the filename of the test is also changed in order to avoid this issue. Differential Revision: https://reviews.llvm.org/D40632 llvm-svn: 319507	2017-12-01 00:54:28 +00:00
Zachary Turner	8065f0b975	Mark all library options as hidden. These command line options are not intended for public use, and often don't even make sense in the context of a particular tool anyway. About 90% of them are already hidden, but when people add new options they forget to hide them, so if you were to make a brand new tool today, link against one of LLVM's libraries, and run tool -help you would get a bunch of junk that doesn't make sense for the tool you're writing. This patch hides these options. The real solution is to not have libraries defining command line options, but that's a much larger effort and not something I'm prepared to take on. Differential Revision: https://reviews.llvm.org/D40674 llvm-svn: 319505	2017-12-01 00:53:10 +00:00
Matt Arsenault	686d5c728f	AMDGPU: Use carry-less adds in FI elimination llvm-svn: 319501	2017-11-30 23:42:30 +00:00
Peter Collingbourne	1f03422610	ThinLTOBitcodeWriter: Try harder to discard unused references to the merged module. If the thin module has no references to an internal global in the merged module, we need to make sure to preserve that property if the global is a member of a comdat group, as otherwise promotion can end up adding global symbols to the comdat, which is not allowed. This situation can arise if the external global in the thin module has dead constant users, which would cause use_empty() to return false and would cause us to try to promote it. To prevent this from happening, discard the dead constant users before asking whether a global is empty. Differential Revision: https://reviews.llvm.org/D40593 llvm-svn: 319494	2017-11-30 23:05:52 +00:00
Zachary Turner	f0e4c6a819	Simplify the DenseSet used for hashing CodeView records. This was storing the hash alongside the key so that the hash doesn't need to be re-computed every time, but in doing so it was allocating a structure to keep the key size small in the DenseMap. This is a noble goal, but it also leads to a pointer indirection on every probe, and the cost of this pointer indirection ends up being higher than the cost of having a slightly larger entry in the hash table. Removing this not only simplifies the code, but yields a small but noticeable performance improvement in the type merging algorithm. llvm-svn: 319493	2017-11-30 23:00:30 +00:00
Matt Arsenault	84445dd13c	AMDGPU: Use gfx9 carry-less add/sub instructions llvm-svn: 319491	2017-11-30 22:51:26 +00:00
Reid Kleckner	ba4014e9dc	XOR the frame pointer with the stack cookie when protecting the stack Summary: This strengthens the guard and matches MSVC. Reviewers: hans, etienneb Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits Differential Revision: https://reviews.llvm.org/D40622 llvm-svn: 319490	2017-11-30 22:41:21 +00:00
Sam Clegg	9138b7b005	Add visibility flag to Wasm symbol flags The LLVM "hidden" flag needs to be passed through the Wasm intermediate objects in order for the linker to apply it to the final Wasm object. The corresponding change in LLD is here: https://github.com/WebAssembly/lld/pull/14 Patch by Nicholas Wilson Differential Revision: https://reviews.llvm.org/D40442 llvm-svn: 319488	2017-11-30 22:34:58 +00:00
Dan Gohman	59e4c0b938	[memcpyopt] Teach memcpyopt to optimize across basic blocks This teaches memcpyopt to make a non-local memdep query when a local query indicates that the dependency is non-local. This notably allows it to eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%. Fixes PR28958. Differential Revision: https://reviews.llvm.org/D38374 llvm-svn: 319482	2017-11-30 22:10:53 +00:00
Davide Italiano	9d939c8f19	[InlineCost] Prefer getFunction() to two calls to getParent(). Improves clarity, also slightly cheaper. NFCI. llvm-svn: 319481	2017-11-30 22:10:35 +00:00
Krzysztof Parzyszek	d76814200b	[Hexagon] Implement HexagonSubtarget::useAA() llvm-svn: 319477	2017-11-30 21:25:28 +00:00
Daniel Sanders	0c43b3a023	[globalisel][tablegen] Add support for relative AtomicOrderings No test yet because the relevant rules are blocked on the atomic_load, and atomic_store nodes. llvm-svn: 319475	2017-11-30 21:05:59 +00:00
Krzysztof Parzyszek	44555225a6	[Hexagon] Solo instructions cannot be used with new value jumps llvm-svn: 319470	2017-11-30 20:32:54 +00:00
Craig Topper	d4257565cf	[X86] Promote i8 CTPOP to i32 instead of i16 when we have the POPCNT instruction. The 32-bit version is shorter to encode and the zext we emit for the promotion is likely going to be a 32-bit zero extend anyway. llvm-svn: 319468	2017-11-30 20:15:31 +00:00
Daniel Sanders	aef1dfc690	[aarch64][globalisel] Legalize G_ATOMIC_CMPXCHG_WITH_SUCCESS and G_ATOMICRMW_* G_ATOMICRMW_* is generally legal on AArch64. The exception is G_ATOMICRMW_NAND. G_ATOMIC_CMPXCHG_WITH_SUCCESS needs to be lowered to G_ATOMIC_CMPXCHG with an external comparison. Note that IRTranslator doesn't generate these instructions yet. llvm-svn: 319466	2017-11-30 20:11:42 +00:00
Amara Emerson	d78d65c2a4	[GlobalISel][IRTranslator] Fix crash during translation of zero sized loads/stores/args/returns. This fixes PR35358. rdar://35619533 Differential Revision: https://reviews.llvm.org/D40604 llvm-svn: 319465	2017-11-30 20:06:02 +00:00
Xinliang David Li	c23d2c6883	[PGO] Skip counter promotion for infinite loops Differential Revision: http://reviews.llvm.org/D40662 llvm-svn: 319462	2017-11-30 19:16:25 +00:00
Zachary Turner	ca6dbf1440	Split TypeTableBuilder into two classes. llvm-svn: 319456	2017-11-30 18:39:50 +00:00
Dan Gohman	78c19d60a9	[WebAssembly] Revert r319186 "Support bitcasted function addresses with varargs." The patch broke Emscripten's EM_ASM macros, which utilize unprototyped functions. See https://bugs.llvm.org/show_bug.cgi?id=35385 for details. llvm-svn: 319452	2017-11-30 18:16:49 +00:00
Francis Visoiu Mistrih	c71cced0aa	[CodeGen] Always use `printReg` to print registers in both MIR and debug output As part of the unification of the debug format and the MIR format, always use `printReg` to print all kinds of registers. Updated the tests using '_' instead of '%noreg' until we decide which one we want to be the default one. Differential Revision: https://reviews.llvm.org/D40421 llvm-svn: 319445	2017-11-30 16:12:24 +00:00
Igor Laevsky	0cdf7fdc48	[FuzzMutate] Bailout from injecting into empty basic blocks. In rare cases we can receive a request to inject into a completely empty basic block. In the normal case all basic blocks contain at least a terminator instruction, but it is possible that the only instruction is a catchpad instruction, which is not part of the instruction iterator. This case seems rare enough to not care about it. Submitting without review, since it seems almost NFC. I couldn't come up with any reasonable way to test this. llvm-svn: 319444	2017-11-30 15:41:58 +00:00
Igor Laevsky	33031926b6	[FuzzMutate] Correctly handle vector types in the insertvalue operation Differential Revision: https://reviews.llvm.org/D40397 llvm-svn: 319442	2017-11-30 15:31:13 +00:00
Igor Laevsky	65902db279	[FuzzMutate] Don't use index operands as sinks Differential Revision: https://reviews.llvm.org/D40396 llvm-svn: 319441	2017-11-30 15:29:16 +00:00
Igor Laevsky	48147d012b	[FuzzMutate] Pick correct index for the insertvalue instruction Differential Revision: https://reviews.llvm.org/D40395 llvm-svn: 319440	2017-11-30 15:26:48 +00:00
Igor Laevsky	faacdf8d54	[FuzzMutate] Don't create load as a new source if it doesn't match with the descriptor Differential Revision: https://reviews.llvm.org/D40394 llvm-svn: 319439	2017-11-30 15:24:41 +00:00
Igor Laevsky	444afc82c0	[FuzzMutate] Don't crash when we can't remove instruction from empty function Differential Revision: https://reviews.llvm.org/D40393 llvm-svn: 319438	2017-11-30 15:07:38 +00:00
Nemanja Ivanovic	db7e77047c	[PowerPC] Recommit r314244 with refactoring and off by default This re-commits everything that was pulled in r314244. The transformation is off by default (patch to enable it to follow). The code is refactored to have a single entry-point and provide fine-grained control over patterns that it selects. This patch also fixes the bugs in the original code. Everything that failed with the original patch has been re-tested with this patch (with the transformation turned on). So the patch to turn this on is soon to follow. Differential Revision: https://reviews.llvm.org/D38575 llvm-svn: 319434	2017-11-30 13:39:10 +00:00
Simon Pilgrim	bb791b3dbd	[X86][AVX512] Tag fcmp/ptest/ternlog instructions scheduler classes llvm-svn: 319433	2017-11-30 13:18:06 +00:00
Sean Eveson	a6bcd53d52	[MC] Function stack size section. Re applying after fixing issues in the diff, sorry for any painful conflicts/merges! Original RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-August/117028.html This change adds a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. The section contains pairs of function symbol references (8 byte) and stack sizes (unsigned LEB128). The contents of this section can be used to measure changes to stack sizes between different versions of the compiler or a source base. The advantage of having a section is that we can extract this information when examining binaries that we didn't build, and it allows users and tools easy access to that information just by referencing the binary. There is a follow up change to add an option to clang. Thanks. Reviewers: hfinkel, MatzeB Reviewed By: MatzeB Subscribers: thegameg, asb, llvm-commits Differential Revision: https://reviews.llvm.org/D39788 llvm-svn: 319430	2017-11-30 13:05:14 +00:00
Sean Eveson	661e4fbf83	Revert r319423: [MC] Function stack size section. I messed up the diff. llvm-svn: 319429	2017-11-30 12:43:25 +00:00
Diana Picus	f003d9ff95	[ARM GlobalISel] Bail out for byval Fallback if we have a byval parameter or argument since we don't support them yet. llvm-svn: 319428	2017-11-30 12:23:44 +00:00
Francis Visoiu Mistrih	93ef145862	[CodeGen] Print "%vreg0" as "%0" in both MIR and debug output As part of the unification of the debug format and the MIR format, avoid printing "vreg" for virtual registers (which is one of the current MIR possibilities). Basically: * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g" * grep -nr '%vreg' . and fix if needed * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g" * grep -nr 'vreg[0-9]\+' . and fix if needed Differential Revision: https://reviews.llvm.org/D40420 llvm-svn: 319427	2017-11-30 12:12:19 +00:00
Simon Pilgrim	d1a7d0c3f1	[X86][AVX512] Tag binop/rounding/sae instructions scheduler classes llvm-svn: 319424	2017-11-30 12:01:52 +00:00
Sean Eveson	f77b4d2f38	[MC] Function stack size section. Summary: Original RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-August/117028.html I wasn't sure who to put as reviewers, so please add/remove people as appropriate. This change adds a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. The section contains pairs of function symbol references (8 byte) and stack sizes (unsigned LEB128). The contents of this section can be used to measure changes to stack sizes between different versions of the compiler or a source base. The advantage of having a section is that we can extract this information when examining binaries that we didn't build, and it allows users and tools easy access to that information just by referencing the binary. There is a follow up change to add an option to clang. Thanks. Reviewers: hfinkel, MatzeB Reviewed By: MatzeB Subscribers: thegameg, asb, llvm-commits Differential Revision: https://reviews.llvm.org/D39788 llvm-svn: 319423	2017-11-30 12:01:16 +00:00
Sam Parker	4bd776e001	[DAGCombine] Refactor ReduceLoadWidth visitAND attempts to narrow the width of extending loads that are then masked off. ReduceLoadWidth already exists for a similar purpose and handles shifts, so I've moved the code to handle AND nodes there. Differential Revision: https://reviews.llvm.org/D39595 llvm-svn: 319421	2017-11-30 11:49:11 +00:00
Serge Guelton	24386867b8	Support generic lowering of vector bswap llvm-svn: 319419	2017-11-30 11:06:22 +00:00
Simon Pilgrim	3e5987cf8d	[X86][AVX512] Tag RCP/RSQRT/GETEXP instructions scheduler classes llvm-svn: 319418	2017-11-30 10:48:47 +00:00
Hiroshi Inoue	21e8ded4d2	Revert rL319407: [SROA] enable splitting for non-whole-alloca loads and stores This reverts commit rL319407 due to failures in some buildbots. llvm-svn: 319410	2017-11-30 08:29:51 +00:00
Jonas Paulsson	b9a2467501	[SystemZ] Bugfix in adjustSubwordCmp. Csmith generated a program where a store after load to the same address did not get chained after the new load created during DAG legalizing, and so performed an illegal overwrite of the expected value. When the new zero-extending load is created, the chain users of the original load must be updated, which was not done previously. A similar case was also found and handled in lowerBITCAST. Review: Ulrich Weigand https://reviews.llvm.org/D40542 llvm-svn: 319409	2017-11-30 08:18:50 +00:00
Hiroshi Inoue	422e80aee2	[SROA] enable splitting for non-whole-alloca loads and stores Currently, SROA splits loads and stores only when they are accessing the whole alloca. This patch relaxes this limitation to allow splitting a load/store if all other loads and stores to the alloca are disjoint from or fully included in the current load/store. If there is no other load or store that crosses the boundary of the current load/store, the current splitting implementation works as is. The whole-alloca loads and stores meet this new condition and so they are still splittable. Here is a simplified motivating example. struct record { long long a; int b; int c; }; int func(struct record r) { for (int i = 0; i < r.c; i++) r.b++; return r.b; } When updating r.b (or r.c as well), LLVM generates redundant instructions on some platforms (such as x86_64, ppc64); here, r.b and r.c are packed into one 64-bit GPR when the struct is passed as a method argument. With this patch, the above example is compiled into only a few instructions without a loop. Without the patch, an unnecessary loop-carried dependency is introduced by SROA and the loop cannot be eliminated by the later optimizers. Differential Revision: https://reviews.llvm.org/D32998 llvm-svn: 319407	2017-11-30 07:44:46 +00:00
Craig Topper	a495744d2c	[X86] Optimize avx2 vgatherqps for v2f32 with v2i64 index type. Normal type legalization will widen everything. This requires forcing 0s into the mask register. We can instead choose the form that only reads 2 elements without zeroing the mask. llvm-svn: 319406	2017-11-30 07:01:40 +00:00
Craig Topper	321a8b9b63	[X86] Make sure we don't remove sign extends of masks with AVX2 masked gathers. We don't use k-registers and instead use the MSB so we need to make sure we sign extend the mask to the msb. llvm-svn: 319405	2017-11-30 06:31:31 +00:00
Graham Yiu	70293fa27a	- Removed unused lambda (IsReturnBlock) causing build bots to fail for r319398 - Added lit testcases that were supposed to be part of r319398 llvm-svn: 319399	2017-11-30 03:36:57 +00:00
Graham Yiu	8b1882c186	With PGO information, we can do more aggressive outlining of cold regions in the inline candidate function. This contrasts with the scheme of keeping only the 'early return' portion of the inline candidate and outlining the rest of the function as a single function call. Support for outlining multiple regions of each function is added, as well as some basic heuristics to determine which regions are good to outline. Outline candidates limited to regions that are single-entry & single-exit. We also avoid outlining regions that produce live-exit variables, which may inhibit some forms of code motion (like commoning). Fallback to the regular partial inlining scheme is retained when either i) no regions are identified for outlining in the function, or ii) the outlined function could not be inlined in any of its callers. Differential Revision: https://reviews.llvm.org/D38190 llvm-svn: 319398	2017-11-30 02:41:36 +00:00
Matt Arsenault	caf0ed4d74	AMDGPU: Allow negative MUBUF vaddr for gfx9 GFX9 does not enable bounds checking for the resource descriptors used for private access, so it should be OK to use vaddr with a potentially negative value. llvm-svn: 319393	2017-11-30 00:52:40 +00:00
Vedant Kumar	80fbb85555	[Coverage] Use the most-recent completed region count (PR35437) This is a fix for the coverage segment builder. If multiple regions must be popped off the active stack at once, and more than one of them end at the same location, emit a segment using the count from the most-recent completed region. Fixes PR35437, rdar://35760630 Testing: invoked llvm-cov on a stage2 build of clang, additional unit tests, check-profile llvm-svn: 319391	2017-11-30 00:28:23 +00:00
Peter Collingbourne	9e3175bb6b	LowerTypeTests: Deduplicate code. NFC. llvm-svn: 319390	2017-11-30 00:27:08 +00:00
Peter Collingbourne	943aca3c27	LowerTypeTests: Remove unnecessary cast. NFC. llvm-svn: 319387	2017-11-30 00:02:55 +00:00
Craig Topper	56a41d4b3a	[X86] Remove some questionable looking code that seems to be looking through a VZEXT to create a larger VSEXT. If the input to the vzext was signed this would do the wrong thing. Not sure how to test this. llvm-svn: 319382	2017-11-29 23:08:25 +00:00
Joerg Sonnenberger	4b1acff9b3	First step towards more human-friendly PPC assembler output: - add -ppc-reg-with-percent-prefix option to use %r3 etc as register names - split off logic for Darwinish verbose conditional codes into a helper function - be explicit about Darwin vs AIX vs GNUish assembler flavors Based on the patch from Alexandre Yukio Yamashita Differential Revision: https://reviews.llvm.org/D39016 llvm-svn: 319381	2017-11-29 23:05:56 +00:00
Sam Clegg	da8d83f911	[WebAssembly] Update test expectations for gcc torture tests I believe these were recently fixed by: https://reviews.llvm.org/rL319186 Differential Revision: https://reviews.llvm.org/D40619 llvm-svn: 319380	2017-11-29 23:05:50 +00:00
Zachary Turner	52d036e693	[CodeView] Factor some code out of TypeTableBuilder. This class had some code that would automatically remap type indices before hashing and serializing. The only caller of this method was the TypeStreamMerger anyway, and the method doesn't make general sense, and prevents making certain future improvements to the class. So, factoring this up one level into the TypeStreamMerger where it belongs. llvm-svn: 319377	2017-11-29 22:41:56 +00:00
Craig Topper	cf461a0a32	[SelectionDAG][X86] Teach promotion legalization for fp_to_sint/fp_to_uint to insert an assertsext/assertzext based on the original type If we put in an assertsext/zext here, we're able to generate better truncate code using pack on pre-avx512 targets. Similar is already done during type legalization. This is the equivalent for op legalization Differential Revision: https://reviews.llvm.org/D40591 llvm-svn: 319368	2017-11-29 22:15:43 +00:00
Dan Gohman	580c102ab8	[WebAssembly] Fix fptoui lowering bounds To fully avoid trapping on wasm, fptoui needs a second check to ensure that the operand isn't below the supported range. llvm-svn: 319354	2017-11-29 20:20:11 +00:00
Krzysztof Parzyszek	f4dcc42e7b	[Hexagon] Remove HexagonISD::PACKHL llvm-svn: 319352	2017-11-29 19:59:29 +00:00
Krzysztof Parzyszek	6a8e5f4b0f	[Hexagon] Create helpers extractVector and insertVector in lowering llvm-svn: 319351	2017-11-29 19:58:10 +00:00
Simon Pilgrim	4d2c703492	[X86][AVX512] Tag RCP/RSQRT/GETEXP instructions scheduler classes (REVERSION) Accidental commit of incomplete patch llvm-svn: 319346	2017-11-29 19:37:38 +00:00
Zachary Turner	3e3936da93	Make TypeTableBuilder inherit from TypeCollection. A couple of places in LLD were passing references to TypeTableCollections around, which makes it hard to change the implementation at runtime. However, these cases only needed to iterate over the types in the collection, and TypeCollection already provides a handy abstract interface for this purpose. By implementing this interface, we can get rid of the need to pass TypeTableBuilder references around, which should allow us to swap the implementation at runtime in subsequent patches. llvm-svn: 319345	2017-11-29 19:35:21 +00:00
Simon Pilgrim	87034cb498	[X86][AVX512] Tag RCP/RSQRT/GETEXP instructions scheduler classes llvm-svn: 319338	2017-11-29 19:19:59 +00:00
Simon Pilgrim	36be852cee	[X86][AVX512] Tag 3OP (shuffles, double-shifts and GFNI) instructions scheduler classes llvm-svn: 319337	2017-11-29 18:52:20 +00:00
Nirav Dave	bafaa53c4d	[ARM][DAG] Revert Disable post-legalization store merge for ARM Partially reverting enabling of post-legalization store merge (r319036) for just ARM backend as it is causing incorrect code in some Thumb2 cases. llvm-svn: 319331	2017-11-29 18:06:13 +00:00
Simon Pilgrim	6a00970ade	[X86][AVX512] Add itinerary argument to all AVX512_maskable_* wrappers. NFCI All default to NoItinerary llvm-svn: 319326	2017-11-29 17:21:15 +00:00
Sander de Smalen	6a3bf1f84a	Reverted r319315 because of unused functions (due to PPR not yet being used by any instructions). llvm-svn: 319321	2017-11-29 15:14:39 +00:00
Simon Pilgrim	1401a75341	[X86][AVX512] Tag VPERMILV instruction scheduler class llvm-svn: 319316	2017-11-29 14:58:34 +00:00
Sander de Smalen	2b6338b2bc	[AArch64][SVE] Asm: Add SVE predicate register definitions and parsing support Summary: Patch [1/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions. Reviewers: rengolin, kristof.beyls, fhahn, mcrosier, evandro, echristo, efriedma Reviewed By: fhahn Subscribers: aemerson, javed.absar, llvm-commits, tschuett Differential Revision: https://reviews.llvm.org/D40360 llvm-svn: 319315	2017-11-29 14:34:18 +00:00
Diana Picus	863b5b05f1	[ARM GlobalISel] Fix selecting G_BRCOND When lowering a G_BRCOND, we generate a TSTri of the condition against 1, which sets the flags, and then a Bcc which branches based on the value of the flags. Unfortunately, we were using the wrong condition code to check whether we need to branch (EQ instead of NE), which caused all our branches to do the opposite of what they were intended to do. This patch fixes the issue by using the correct condition code. llvm-svn: 319313	2017-11-29 14:20:06 +00:00
Simon Pilgrim	756348c1c9	[X86][AVX512] Setup unary (PABS/VPLZCNT/VPOPCNT/VPCONFLICT/VMOV*DUP) instruction scheduler classes llvm-svn: 319312	2017-11-29 13:49:51 +00:00
Dmitry Preobrazhensky	1ac7177abb	[AMDGPU][MC][GFX9] Corrected mapping of GFX9 v_add/sub/subrev_u32 When translating pseudo to MC, v_add/sub/subrev_u32 shall be mapped via a separate table as GFX8 has opcodes with the same names. These instructions shall also be labelled as renamed for pseudoToMCOpcode to handle them correctly. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D40550 llvm-svn: 319311	2017-11-29 13:33:40 +00:00
Simon Pilgrim	e3291de2b8	[X86][SSE] Merged sse2_unpack and sse2_unpack PUNPCK instruction templates. NFCI. llvm-svn: 319310	2017-11-29 12:12:27 +00:00
Simon Pilgrim	da95772230	[X86][SSE] Merged sse2_pack and sse2_pack_y PACKSS/PACKUS instruction templates. NFCI. llvm-svn: 319308	2017-11-29 11:35:45 +00:00
Max Kazantsev	9545a408b6	[SCEV][NFC] Break from loop after we found first non-Phi in getAddRecExprPHILiterally llvm-svn: 319306	2017-11-29 10:54:16 +00:00
Oliver Stannard	9ea2eaeb50	[ARM] Add support for armv7e-m to the .arch directive This will allow compilation of assembly files targeting armv7e-m without having to specify the Tag_CPU_arch attribute as a workaround. Differential revision: https://reviews.llvm.org/D40370 Patch by Ian Tessier! llvm-svn: 319303	2017-11-29 10:12:15 +00:00
Serguei Katkov	d4df744434	[CGP] Enable complex addr mode Enable complex addr modes after two critical fixes: rL319109 and rL319292 llvm-svn: 319302	2017-11-29 09:48:50 +00:00
Craig Topper	e3515001b9	[X86] Remove setOperationAction Promote for ISD::SINT_TO_FP MVT::v8i16/v16i8/v16i16. A DAG combine ensures these ops are always promoted to vXi32. llvm-svn: 319298	2017-11-29 08:19:36 +00:00
Max Kazantsev	1c3b622820	[SCEV][NFC] Remove condition that can never happen due to check few lines above llvm-svn: 319293	2017-11-29 06:10:36 +00:00
Serguei Katkov	5036459ae3	[CGP] Fix common type handling in optimizeMemoryInst If the common type is different we should bail out, because we will not be able to create a select or Phi of these values. Basically this is done in ExtAddrMode::compare; however, it does not work if we handle the null first and then two values of different types. So add a check in initializeMap as well. The check in ExtAddrMode::compare is used as an earlier bail out. Reviewers: reames, john.brawn Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40479 llvm-svn: 319292	2017-11-29 05:51:26 +00:00
Sean Fertile	aab3ef76d9	[PowerPC] Relax the checking on AND/AND8 in isSignOrZeroExtended. Separate the handling of AND/AND8 out from PHI/OR/ISEL checking. The reasoning is the others need all their operands to be sign/zero extended for their output to also be sign/zero extended. This is true for AND and sign-extension, but for zero-extension we only need at least one of the input operands to be zero extended for the result to also be zero extended. Differential Revision: https://reviews.llvm.org/D39078 llvm-svn: 319289	2017-11-29 04:09:29 +00:00
Matt Arsenault	b655fa9ce2	DAG: Add nuw when splitting loads and stores The object can't straddle the address space wrap around, so I think it's OK to assume any offsets added to the base object pointer can't overflow. Similar logic already appears to be applied in SelectionDAGBuilder when lowering aggregate returns. llvm-svn: 319272	2017-11-29 01:25:12 +00:00
Adrian Prantl	5da51f435a	llvm-dwarfdump: honor the --show-children option when dumping a specific DIE. llvm-svn: 319271	2017-11-29 01:12:22 +00:00
Matt Arsenault	3f71c0e3ee	AMDGPU: Select DS insts without m0 initialization GFX9 stopped using m0 for most DS instructions. Select a different instruction without the use. I think this will be less error prone than trying to manually maintain m0 uses as needed. llvm-svn: 319270	2017-11-29 00:55:57 +00:00
Craig Topper	fbf7b3bf3e	[X86] Promote fp_to_sint v16f32->v16i16/v16i8 to avoid scalarization. llvm-svn: 319266	2017-11-29 00:32:09 +00:00
Zachary Turner	4c1fa68590	Fix a warning. llvm-svn: 319263	2017-11-29 00:13:44 +00:00
Zachary Turner	29b081dcd1	[NFC] Minor cleanups in CodeView TypeTableBuilder. llvm-svn: 319260	2017-11-28 23:57:13 +00:00
Craig Topper	88ffb5d4d5	[X86] Mark ISD::FP_TO_UINT v16i8/v16i16 as Promote under AVX512 instead of legal. Fix infinite loop in op legalization when promotion requires 2 steps. Previously we had an isel pattern to add the truncate. Instead use Promote to add the truncate to the DAG before isel. The Promote legalization code had to be updated to prevent an infinite loop if promotion took multiple steps because it wasn't remembering the previously tried value. llvm-svn: 319259	2017-11-28 23:56:02 +00:00
Matt Arsenault	607a756651	AMDGPU: Enable IPRA llvm-svn: 319256	2017-11-28 23:40:12 +00:00
Simon Pilgrim	b9aa93cb93	[X86] Tag CLFLUSHOPT with same scheduling behaviour as CLFLUSH llvm-svn: 319253	2017-11-28 23:25:42 +00:00
Simon Pilgrim	f490c6efee	[X86][SSE] Add SSE_SHUFP OpndItins Update multi-classes to take the scheduling OpndItins instead of hard coding it. Will be reused in the AVX512 equivalents. llvm-svn: 319249	2017-11-28 23:09:18 +00:00
Simon Pilgrim	8f62394751	[X86][SSE] Add SSE_UNPCK/SSE_PUNPCK OpndItins Update multi-classes to take the scheduling OpndItins instead of hard coding it. Will be reused in the AVX512 equivalents. llvm-svn: 319245	2017-11-28 22:55:08 +00:00
Simon Pilgrim	1bc7b0e148	[X86][SSE] Use SSE_PACK OpndItins in PACKSS/PACKUS instruction definitions Update multi-classes to take the scheduling OpndItins instead of hard coding it. SSE_PACK will be reused in the AVX512 equivalents. llvm-svn: 319243	2017-11-28 22:47:45 +00:00
Simon Pilgrim	14d3fd29f8	Fix VS2017 narrowing conversion warning. NFCI llvm-svn: 319240	2017-11-28 22:32:43 +00:00
Craig Topper	ab9bfc904b	[X86] Remove unused variable. llvm-svn: 319239	2017-11-28 22:28:23 +00:00
Adam Nemet	2e92289014	Demote this opt remark to DEBUG. From a random opt-stat output: Top 10 remarks: tailcallelim/tailcall 53% inline/AlwaysInline 13% gvn/LoadClobbered 13% inline/Inlined 8% inline/TooCostly 2% inline/NoDefinition 2% licm/LoadWithLoopInvariantAddressInvalidated 2% licm/Hoisted 1% asm-printer/InstructionCount 1% prologepilog/StackSize 1% llvm-svn: 319235	2017-11-28 22:11:00 +00:00
Craig Topper	a27f1e675a	[X86] Remove code from combineUIntToFP that tried to favor UINT_TO_FP if legal when zero extending from vXi8/vXi16. The UINT_TO_FP is immediately converted to SINT_TO_FP when the node is re-evaluated because we'll detect that the sign bit is zero. llvm-svn: 319234	2017-11-28 22:08:51 +00:00
Craig Topper	3aaa71f222	[X86] Remove custom lowering for uint_to_fp from vXi8/vXi16. We have a DAG combine that uses a zero extend that should prevent this from ever occurring now. llvm-svn: 319233	2017-11-28 22:08:48 +00:00
Adrian Prantl	77d90b0c39	SROA: Don't create variable fragments that are outside of the variable. An alloca may be larger than a variable that is described to be stored there. Don't create a dbg.value for fragments that are outside of the variable. This fixes PR35447. https://bugs.llvm.org/show_bug.cgi?id=35447 llvm-svn: 319230	2017-11-28 21:30:38 +00:00
Mandeep Singh Grang	e0173664e9	[Hexagon] Use stable sort for HexagonShuffler to remove non-deterministic ordering Summary: This fixes failures in the following tests uncovered by D39245: LLVM :: CodeGen/Hexagon/args.ll LLVM :: CodeGen/Hexagon/constp-extract.ll LLVM :: CodeGen/Hexagon/expand-condsets-basic.ll LLVM :: CodeGen/Hexagon/gp-rel.ll LLVM :: CodeGen/Hexagon/packetize_cond_inst.ll LLVM :: CodeGen/Hexagon/simple_addend.ll LLVM :: CodeGen/Hexagon/swp-stages4.ll LLVM :: CodeGen/Hexagon/swp-vmult.ll LLVM :: CodeGen/Hexagon/swp-vsum.ll LLVM :: MC/Hexagon/align.s LLVM :: MC/Hexagon/asmMap.s LLVM :: MC/Hexagon/dis-duplex-p0.s LLVM :: MC/Hexagon/double-vector-producer.s LLVM :: MC/Hexagon/inst_select.ll LLVM :: MC/Hexagon/instructions/j.s Reviewers: colinl, kparzysz, adasgupt, slarin Reviewed By: kparzysz Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40227 llvm-svn: 319223	2017-11-28 20:48:10 +00:00
Sean Fertile	e200016ea9	[PowerPC] Allow tail calls of fastcc functions from C CallingConv functions. Allow fastcc callees to be tail-called from ccc callers. Differential Revision: https://reviews.llvm.org/D40355 llvm-svn: 319218	2017-11-28 20:25:58 +00:00
Daniel Sanders	7fe7acc6b1	[aarch64][globalisel] Define G_ATOMIC_CMPXCHG and G_ATOMICRMW_* and make them legal The IRTranslator cannot generate these instructions at the moment so there's no issue with not having implemented ISel for them yet. D40092 will add G_ATOMIC_CMPXCHG_WITH_SUCCESS and G_ATOMICRMW_* to the IRTranslator and a further patch will add support for lowering G_ATOMIC_CMPXCHG_WITH_SUCCESS into G_ATOMIC_CMPXCHG with an external success check via the `Lower` action. The separation of G_ATOMIC_CMPXCHG_WITH_SUCCESS and G_ATOMIC_CMPXCHG is to import SelectionDAG rules while still supporting targets that prefer to custom lower the original LLVM-IR-like operation. llvm-svn: 319216	2017-11-28 20:21:15 +00:00
Mandeep Singh Grang	230b0a1477	[SelectionDAG] Make sorting predicate stronger to remove non-deterministic ordering Summary: Recommitting this with the correct sorting predicate. The Low field of Clusters is a ConstantInt and cannot be directly compared. So we needed to invoke slt (signed less than) to compare correctly. This fixes failures in the following tests uncovered by D39245: LLVM :: CodeGen/ARM/ifcvt3.ll LLVM :: CodeGen/ARM/switch-minsize.ll LLVM :: CodeGen/X86/switch.ll LLVM :: CodeGen/X86/switch-bt.ll LLVM :: CodeGen/X86/switch-density.ll Reviewers: hans, fhahn Reviewed By: hans Subscribers: aemerson, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D40541 llvm-svn: 319210	2017-11-28 19:55:54 +00:00
Simon Pilgrim	d49bd0cd87	[X86][SSE] Add SSE_HADDSUB/SSE_PABS/SSE_PALIGN OpndItins Update multi-classes to take the scheduling OpndItins instead of hard coding it. Will be reused in the AVX512 equivalents. llvm-svn: 319209	2017-11-28 19:39:47 +00:00
Craig Topper	dd4295626b	[X86] In lowerVectorShuffleAsElementInsertion, if we're able to find a scalar i8 or i16 and need to zero extend it, make sure we use a vXi32 type of the full vector width. Previously, this was hardcoded to v4i32, but if the input type is 256 bits we need to use v8i32. Fixes PR35443 llvm-svn: 319208	2017-11-28 19:25:45 +00:00
Francis Visoiu Mistrih	3aa8eaa951	[CodeGen] Fix doxygen \file comment style llvm-svn: 319207	2017-11-28 19:23:39 +00:00
Francis Visoiu Mistrih	d4b340b460	[CodeGen] Fix doxygen llvm-svn: 319206	2017-11-28 19:15:46 +00:00
Krzysztof Parzyszek	081e458e90	[Hexagon] Make sure to zero-extend bytes before building a vector llvm-svn: 319204	2017-11-28 19:13:17 +00:00
Daniel Sanders	17d277b734	[mir] Print/Parse both MOLoad and MOStore when they occur together. Summary: They're not always mutually exclusive. read-modify-write atomics are both at the same time. One example of this is the SWP instructions on AArch64. Another example is GlobalISel's G_ATOMICRMW_* generic instructions which will be added in a later patch. Reviewers: arphaman, aemerson Reviewed By: aemerson Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D40157 llvm-svn: 319202	2017-11-28 18:57:02 +00:00
Rafael Espindola	bba7f862d8	Fix non assert build warnings. llvm-svn: 319200	2017-11-28 18:50:08 +00:00
Hans Wennborg	ca46db957d	EntryExitInstrumenter: set DebugLocs on the inserted call instructions (PR35412) Apparently the verifier requires that inlineable calls in a function with debug info have debug locations. llvm-svn: 319199	2017-11-28 18:44:26 +00:00
Zachary Turner	6900de1dfb	[CodeView] Refactor / Rewrite TypeSerializer and TypeTableBuilder. The motivation behind this patch is that future directions require us to be able to compute the hash value of records independently of actually using them for de-duplication. The current structure of TypeSerializer / TypeTableBuilder being a single entry point that takes an unserialized type record, and then hashes and de-duplicates it is not flexible enough to allow this. At the same time, the existing TypeSerializer is already extremely complex for this very reason -- it tries to be too many things. In addition to serializing, hashing, and de-duplicating, it also supports splitting up field list records and adding continuations. All of this functionality crammed into this one class makes it very complicated to work with and hard to maintain. To solve all of these problems, I've re-written everything from scratch and split the functionality into separate pieces that can easily be reused. The end result is that one class TypeSerializer is turned into 3 new classes SimpleTypeSerializer, ContinuationRecordBuilder, and TypeTableBuilder, each of which in isolation is simple and straightforward. A quick summary of these new classes and their responsibilities is: - SimpleTypeSerializer : Turns a non-FieldList leaf type into a series of bytes. Does not do any hashing. Every time you call it, it will re-serialize and return bytes again. The same instance can be re-used over and over to avoid re-allocations, and in exchange for this optimization the bytes returned by the serializer only live until the caller attempts to serialize a new record. - ContinuationRecordBuilder : Turns a FieldList-like record into a series of fragments. Does not do any hashing. Like SimpleTypeSerializer, returns references to privately owned bytes, so the storage is invalidated as soon as the caller tries to re-use the instance. 
Works equally well for LF_FIELDLIST as it does for LF_METHODLIST, solving a long-standing theoretical limitation of the previous implementation. - TypeTableBuilder : Accepts sequences of bytes that the user has already serialized, and inserts them by de-duplicating with a hash table. For the sake of convenience and efficiency, this class internally stores a SimpleTypeSerializer so that it can accept unserialized records. The same is not true of ContinuationRecordBuilder. The user is required to create their own instance of ContinuationRecordBuilder. Differential Revision: https://reviews.llvm.org/D40518 llvm-svn: 319198	2017-11-28 18:33:17 +00:00
Simon Pilgrim	4fecbd8871	[X86][X87] Tag FP_TO_INT_IN_MEM pseudos with hasNoSchedulingInfo We don't need scheduling info for pseudos llvm-svn: 319197	2017-11-28 18:10:29 +00:00
Francis Visoiu Mistrih	aa739695a4	[CodeGen] Separate MachineOperand implementation from MachineInstr Move the implementation to its own file. Differential Revision: https://reviews.llvm.org/D40419 llvm-svn: 319194	2017-11-28 17:58:43 +00:00
Francis Visoiu Mistrih	946e394e33	[CodeGen] Cleanup MachineOperand * clang-format * move doxygen from the implementation to headers * remove duplicate doxygen llvm-svn: 319193	2017-11-28 17:58:38 +00:00
Konstantin Zhuravlyov	06ae4ec78e	AMDGPU: Add num spilled s/vgprs to metadata This was requested by tools. Differential Revision: https://reviews.llvm.org/D40321 llvm-svn: 319192	2017-11-28 17:51:08 +00:00
Francis Visoiu Mistrih	9d7bb0cb40	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187	2017-11-28 17:15:09 +00:00
Dan Gohman	2803bfaf00	[WebAssembly] Support bitcasted function addresses with varargs. Generalize FixFunctionBitcasts to handle varargs functions. This in particular fixes the case where clang bitcasts away a varargs when calling a K&R-style function. This avoids interacting with tricky ABI details because it operates at the LLVM IR level before varargs ABI details are exposed. This fixes PR35385. llvm-svn: 319186	2017-11-28 17:15:03 +00:00
Matt Arsenault	e123aba94e	DAG: Legalize truncstores to illegal int types Truncate to a legal int type, and produce a new truncstore from a narrower type. llvm-svn: 319185	2017-11-28 17:11:30 +00:00
Simon Pilgrim	ece5bc358a	[X86][X87] Tag FTST x87 instruction scheduler class Looking through Agner, FTST is very similar to generic float compare behaviour, so I've added them to the existing IIC_FCOMI (WriteFAdd) tags. llvm-svn: 319184	2017-11-28 16:57:20 +00:00
Simon Pilgrim	0747a7e8c3	[X86][X87] Tag FABS/FCHS/FSQRT/FSIN/FCOS x87 instruction scheduler classes Atom's FABS/FCHS/FSQRT latencies taken from Agner. Note: I just added FSIN and FCOS to the existing IIC_FSINCOS itinerary, which is actually a more costly instruction. llvm-svn: 319175	2017-11-28 15:03:42 +00:00
Jonas Paulsson	f0ff20f1f0	Use getStoreSize() in various places instead of 'BitSize >> 3'. This is needed for cases when the memory access is not as big as the width of the data type. For instance, storing i1 (1 bit) would be done in a byte (8 bits). Using 'BitSize >> 3' (or '/ 8') would e.g. give the memory access of an i1 a size of 0, which for instance makes alias analysis return NoAlias even when it shouldn't. There are no tests as this was done as a follow-up to the bugfix for the case where this was discovered (r318824). This handles more similar cases. Review: Björn Petterson https://reviews.llvm.org/D40339 llvm-svn: 319173	2017-11-28 14:44:32 +00:00
Francis Visoiu Mistrih	26d6fc1f0e	[Support] Merge toLower / toUpper implementations Merge the ones from StringRef and StringExtras. llvm-svn: 319171	2017-11-28 14:22:27 +00:00
Francis Visoiu Mistrih	9d419d3b0c	[CodeGen] Rename functions PrintReg* to printReg* LLVM Coding Standards: Function names should be verb phrases (as they represent actions), and command-like function should be imperative. The name should be camel case, and start with a lower case letter (e.g. openFile() or isFoo()). Differential Revision: https://reviews.llvm.org/D40416 llvm-svn: 319168	2017-11-28 12:42:37 +00:00
Simon Pilgrim	8dc603b031	[X86][3DNow] Add instruction itinerary and scheduling classes for femms/prefetch/prefetchw llvm-svn: 319167	2017-11-28 12:37:35 +00:00
Peter Smith	a939257a42	[ARM][AArch64] Workaround ARM/AArch64 peculiarity in clearing icache. Certain ARM implementations treat icache clear instruction as a memory read, and CPU segfaults on trying to clear cache on !PROT_READ page. We work around this in Memory::protectMappedMemory by adding PROT_READ to affected pages, clearing the cache, and then setting desired protection. This fixes "AllocationTests/MappedMemoryTest.***/3" unit-tests on affected hardware. Reviewers: psmith, zatrazz, kristof.beyls, lhames Reviewed By: lhames Subscribers: llvm-commits, krytarowski, peter.smith, jgreenhalgh, aemerson, rengolin Patch by maxim-kuvrykov! Differential Revision: https://reviews.llvm.org/D40423 llvm-svn: 319166	2017-11-28 12:34:05 +00:00
Chandler Carruth	c34f789e38	Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable. The core idea is to (re-)introduce some redundancies where their cost is hidden by the cost of materializing immediates for constant operands of PHI nodes. When the cost of the redundancies is covered by this, avoiding materializing the immediate has numerous benefits: 1) Less register pressure 2) Potential for further folding / combining 3) Potential for more efficient instructions due to immediate operand As a motivating example, consider the remarkably different cost on x86 of a SHL instruction with an immediate operand versus a register operand. This pattern turns up surprisingly frequently, but is somewhat rarely obvious as a significant performance problem. The pass is entirely target independent, but it does rely on the target cost model in TTI to decide when to speculate things around the PHI node. I've included x86-focused tests, but any target that sets up its immediate cost model should benefit from this pass. There is probably more that can be done in this space, but the pass as-is is enough to get some important performance on our internal benchmarks, and should be generally performance neutral, but help with more extensive benchmarking is always welcome. One awkward part is that this pass has to be scheduled after everything that can eliminate these kinds of redundancies. This includes SimplifyCFG, GVN, etc. I'm open to suggestions about better places to put this. We could in theory make it part of the codegen pass pipeline, but there doesn't really seem to be a good reason for that -- it isn't "lowering" in any sense and only relies on pretty standard cost model based TTI queries, so it seems to fit well with the "optimization" pipeline model. Still, further thoughts on the pipeline position are welcome. I've also only implemented this in the new pass manager. 
If folks are very interested, I can try to add it to the old PM as well, but I didn't really see much point (my use case is already switched over to the new PM). I've tested this pretty heavily without issue. A wide range of benchmarks internally show no change outside the noise, and I don't see any significant changes in SPEC either. However, the size class computation in tcmalloc is substantially improved by this, which turns into a 2% to 4% win on the hottest path through tcmalloc for us, so there are definitely important cases where this is going to make a substantial difference. Differential revision: https://reviews.llvm.org/D37467 llvm-svn: 319164	2017-11-28 11:32:31 +00:00
Florian Hahn	25ea91a838	[TailRecursionElimination] Skip debug intrinsics. Summary: I think we do not need to analyze debug intrinsics here, as they should not impact codegen. This has 2 benefits: 1) slightly less work to do and 2) avoiding generating optimization remarks for converting calls to debug intrinsics to tail calls, which are not really helpful for users. Based on work by Sander de Smalen. Reviewers: davide, trentxintong, aprantl Reviewed By: aprantl Subscribers: llvm-commits, JDevlieghere Tags: #debug-info Differential Revision: https://reviews.llvm.org/D40440 llvm-svn: 319158	2017-11-28 09:32:25 +00:00
Nicolai Haehnle	b4f28deda0	AMDGPU: Re-organize the outer loop of SILoadStoreOptimizer Summary: The entire algorithm operates per basic-block, so for cache locality it should be better to re-optimize a basic-block immediately rather than in a separate loop. I don't have performance measurements. Change-Id: I85106570bd623c4ff277faaa50ee43258e1ddcc5 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D40344 llvm-svn: 319156	2017-11-28 08:42:46 +00:00
Nicolai Haehnle	39980dac0b	AMDGPU: Consistently check for immediates in SIInstrInfo::FoldImmediate Summary: The PeepholeOptimizer pass calls this function solely based on checking DefMI->isMoveImmediate(), which only checks the MoveImm bit of the instruction description. So it's up to FoldImmediate itself to properly check that DefMI actually moves from an immediate. I don't have a separate test case for this, but the next patch introduces a test case which happens to crash without this change. This error is caught by the assertion in MachineOperand::getImm(). Change-Id: I88e7cdbcf54d75e1a296822e6fe5f9a5f095bbf8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D40342 llvm-svn: 319155	2017-11-28 08:41:50 +00:00
Max Kazantsev	6e78ad35cc	[SCEV][NFC] More efficient caching in CompareValueComplexity Currently, we use a set of pairs to cache responses like `CompareValueComplexity(X, Y) == 0`. If we had proved that `CompareValueComplexity(S1, S2) == 0` and `CompareValueComplexity(S2, S3) == 0`, this cache does not allow us to prove that `CompareValueComplexity(S1, S3)` is also `0`. This patch replaces this set with `EquivalenceClasses` that merges Values into equivalence sets so that any two values from the same set are equal from the point of view of `CompareValueComplexity`. This, in particular, allows us to prove the fact from the example above. Differential Revision: https://reviews.llvm.org/D40429 llvm-svn: 319153	2017-11-28 08:26:43 +00:00
Martin Storsjo	04b68446eb	[COFF] Implement constructor priorities The priorities in the section name suffixes are zero padded, allowing the linker to just do a lexical sort. Add zero padding for .ctors sections in ELF as well. Differential Revision: https://reviews.llvm.org/D40407 llvm-svn: 319150	2017-11-28 08:07:18 +00:00
Max Kazantsev	cf9b1b24ce	[SCEV][NFC] More efficient caching in CompareSCEVComplexity Currently, we use a set of pairs to cache responses like `CompareSCEVComplexity(X, Y) == 0`. If we had proved that `CompareSCEVComplexity(S1, S2) == 0` and `CompareSCEVComplexity(S2, S3) == 0`, this cache does not allow us to prove that `CompareSCEVComplexity(S1, S3)` is also `0`. This patch replaces this set with `EquivalenceClasses`, in which any two values from the same set are equal from the point of view of `CompareSCEVComplexity`. This, in particular, allows us to prove the fact from the example above. Differential Revision: https://reviews.llvm.org/D40428 llvm-svn: 319149	2017-11-28 07:48:12 +00:00
Max Kazantsev	115607226a	[GVN] Prevent ScalarPRE from hoisting across instructions that don't pass control flow to successors This is to address a problem similar to those in D37460 for Scalar PRE. We should not PRE across an instruction that may not pass execution to its successor unless it is safe to speculatively execute it. Differential Revision: https://reviews.llvm.org/D38619 llvm-svn: 319147	2017-11-28 07:07:55 +00:00
Dan Gohman	3ff73cfbcd	[WebAssembly] Handle errors better in fast-isel. Fast-isel routines need to bail out in the case that fast-isel fails on the operands. This fixes https://bugs.llvm.org/show_bug.cgi?id=35064 llvm-svn: 319144	2017-11-28 05:36:42 +00:00
Craig Topper	640a3c1e2a	[X86] Remove some unused pattern fragments from td file. NFC llvm-svn: 319143	2017-11-28 05:23:57 +00:00
Simon Dardis	3aeb1a5404	[DAGCombine] Disable finding better chains for stores at O0 Unoptimized IR can have linear sequences of stores to an array, where the initial GEP for the first store is formed from the pointer to the array, and the GEP for each store after the first is formed from the previous GEP with some offset in an inductive fashion. The (large) resulting DAG when analyzed by DAGCombine undergoes an excessive number of combines as each store node is examined every time its offset node is combined with any child of the offset. One of the transformations is findBetterNeighborChains which assists MergeConsecutiveStores. The former relies on repeated chain walking to do its work, however MergeConsecutiveStores is disabled at O0 which makes the transformation redundant. Any optimization level other than O0 would invoke InstCombine which would resolve the chain of GEPs into flat base + offset GEP for each store which does not exhibit the repeated examination of each store to the array. Disabling this optimization fixes an excessive compile time issue (~30 minutes for the test case provided) at O0. Reviewers: niravd, craig.topper, t.p.northover Differential Revision: https://reviews.llvm.org/D40193 llvm-svn: 319142	2017-11-28 04:07:59 +00:00
Matthias Braun	eca985847c	MachineVerifier: Improve register operand checks This fixes cases where we wouldn't perform various register operand checks just because we didn't happen to have a definition in the MCInstrDesc. This changes the code to only skip the tests that actually depend on the MCInstrDesc definition. This makes the machine verifier spot the problem from https://llvm.org/PR33071 after the pass that actually caused it. llvm-svn: 319141	2017-11-28 03:54:20 +00:00
Matthias Braun	a6d5374ee6	MachineVerifier: Improve PHI operand checking Additional checks for phi operands: - first operand should be a virtual register def. It should not be tied, implicit, internalread, earlyclobber or a read. - The other operands should be register/mbb operands next to each other - The register operands should not be implicit, internalread, earlyclobber, debug or tied. - We can perform most of the PHI checks even for unreachable blocks. llvm-svn: 319140	2017-11-28 03:54:19 +00:00
Rafael Espindola	3ecd20430c	Use FILE_FLAG_DELETE_ON_CLOSE for TempFile on windows. We won't see the temp file any more. llvm-svn: 319137	2017-11-28 01:41:22 +00:00
Craig Topper	ddbc340c20	[X86] Make zero extend from v16i1/v8i1 to v16i8/v8i16/v16i16 not scalarize under AVX512. llvm-svn: 319136	2017-11-28 01:36:33 +00:00
Rafael Espindola	2c4e920f0c	Move code. NFC. This moves the TempFile implementation so that it can use system specific code. llvm-svn: 319134	2017-11-28 01:34:20 +00:00
Rafael Espindola	c06f55e1e8	This reverts commit r319096 and r319097. Revert "[SROA] Propagate !range metadata when moving loads." Revert "[Mem2Reg] Clang-format unformatted parts of this file. NFCI." Davide says they broke a bot. llvm-svn: 319131	2017-11-28 01:25:38 +00:00
Matthias Braun	5d01e708e1	ARM: Fix PR32578 https://llvm.org/PR32578 I simplified and converted the reproducer into a lit test. Patch by Vedant Kumar! llvm-svn: 319130	2017-11-28 01:17:52 +00:00
Dan Gohman	cdd48b8a6b	[WebAssembly] Fix trapping behavior in fptosi/fptoui. This adds code to protect WebAssembly's `trunc_s` family of opcodes from values outside their domain. Even though such conversions have full undefined behavior in C/C++, LLVM IR's `fptosi` and `fptoui` do not, and only return undef. This also implements the proposed non-trapping float-to-int conversion feature and uses that instead when available. llvm-svn: 319128	2017-11-28 01:13:40 +00:00
Adrian Prantl	d7f6f1636d	SROA: Avoid creating a fragment expression that covers the entire variable. Fixes PR35416. https://bugs.llvm.org/show_bug.cgi?id=35416 llvm-svn: 319126	2017-11-28 00:57:53 +00:00
Adrian Prantl	3e0e1d0934	Move getVariableSize from Verifier.cpp into DIVariable::getSize() (NFC) llvm-svn: 319125	2017-11-28 00:57:51 +00:00
Craig Topper	8b9cd03824	[X86] Remove unnecessary fp<->int setOperationAction lines from a hasVLX block. NFCI These lines all exist identically either under SSE2, AVX2 or AVX512. Given that VLX implies all of those, these aren't providing anything new. llvm-svn: 319124	2017-11-28 00:41:12 +00:00
Craig Topper	ce732e7c30	[X86] Remove duplicate calls to setOperationAction. NFCI These same calls exist a few lines down. llvm-svn: 319122	2017-11-28 00:16:42 +00:00
Rafael Espindola	bce112c9e9	Add an F_Delete flag. For now this only changes the handle Access. llvm-svn: 319121	2017-11-28 00:12:44 +00:00
Craig Topper	dbd4a7fecc	[DAGCombiner] Don't combine aext(setcc) if the setcc is already using the target's preferred result type. With AVX512 vXi1 types are legal so we shouldn't be extending them. This change is similar to existing code in the zext(setcc) combine. llvm-svn: 319120	2017-11-27 23:51:40 +00:00
Craig Topper	57c02d18b9	[DAGCombiner] Use EVT::changeVectorElementTypeToInteger() instead of implementing manually. llvm-svn: 319119	2017-11-27 23:51:31 +00:00
Rafael Espindola	d19c2e8126	Add OpenFlags to the create(Unique\|Temporary)File interfaces. This will allow a future F_Delete flag to be specified when we want the file to be automatically deleted on close. llvm-svn: 319117	2017-11-27 23:44:11 +00:00
Craig Topper	256cc48df6	[X86] Teach getSetCCResultType to handle more than just SimpleVTs when looking at larger than 512-bit vectors. Which VTs are considered simple is determined by the superset of the legal types of all targets in LLVM. If we're looking at VTs that are going to be split down to 512-bits we should allow any VT not just simple ones since the simple list changes over time as new targets are added. llvm-svn: 319110	2017-11-27 22:56:10 +00:00
Greg Clayton	d6b67eb15c	Fixed the ability to recursively get an attribute value from a DWARFDie. The previous implementation would only look 1 DW_AT_specification or DW_AT_abstract_origin deep. This means DWARFDie::getName() would fail in certain cases. I ran into such a case while creating a tool that used the LLVM DWARF parser to generate a symbolication format so I have seen this in the wild. Differential Revision: https://reviews.llvm.org/D40156 llvm-svn: 319104	2017-11-27 22:12:44 +00:00
Craig Topper	4aa519507d	[X86] Remove lines that set v8f32 FP_ROUND/FP_EXTEND to Legal under AVX512. NFCI We don't do this for narrow vectors under AVX or SSE features. We also don't set them to Expand like we do for many vector ops. Nor does TargetLoweringBase.cpp. This leads me to believe these default to Legal. llvm-svn: 319103	2017-11-27 22:01:17 +00:00
Davide Italiano	824d71a9c5	[Mem2Reg] Clang-format unformatted parts of this file. NFCI. llvm-svn: 319097	2017-11-27 21:25:52 +00:00
Davide Italiano	b5d59e73ee	[SROA] Propagate !range metadata when moving loads. This tries to propagate !range metadata to a pre-existing load when a load is optimized out. This is done instead of adding an assume because converting loads to and from assumes creates a lot of IR. Patch by Ariel Ben-Yehuda. Differential Revision: https://reviews.llvm.org/D37216 llvm-svn: 319096	2017-11-27 21:25:13 +00:00
Sanjay Patel	0de1a4bc2d	[PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend on arg rather than result This should fix PR31455: https://bugs.llvm.org/show_bug.cgi?id=31455 Differential Revision: https://reviews.llvm.org/D28314 llvm-svn: 319094	2017-11-27 21:15:43 +00:00
Zaara Syeda	f94d58d908	[PowerPC] Remove redundant TOC saves This patch adds a peephole optimization to remove any redundant toc save instructions added as part of the call sequence for indirect calls. It removes any toc saves within a function that are dominated by another toc save. Differential Revision: https://reviews.llvm.org/D39736 llvm-svn: 319087	2017-11-27 20:26:36 +00:00
Craig Topper	820ce04377	[SelectionDAG] Add a debug message when vector_shuffle nodes are created. We print a debug message when most nodes are created, but getVectorShuffle was missing. llvm-svn: 319085	2017-11-27 19:54:57 +00:00
Arnold Schwaighofer	d9e710984d	Inliner: Don't mark notail calls with the 'tail' attribute enum TailCallKind { TCK_None = 0, TCK_Tail = 1, TCK_MustTail = 2, TCK_NoTail = 3 }; TCK_NoTail is greater than TCK_Tail so taking the min does not do the correct thing. rdar://35639547 llvm-svn: 319075	2017-11-27 19:03:40 +00:00
Zachary Turner	96c6985b53	[BinaryStream] Support growable streams. The existing library assumed that a stream's length would never change. This makes some things simpler, but it's not flexible enough for what we need, especially for writable streams where what you really want is for each call to write to actually append. llvm-svn: 319070	2017-11-27 18:48:37 +00:00
Craig Topper	3decf89ccc	[X86] Remove an unused isel pattern that looked for pshufd with v4f32 type. I don't believe our current lowering/combining would ever produce such a node. We only produce integer typed pshufds. llvm-svn: 319068	2017-11-27 18:25:54 +00:00
Sanjay Patel	863d494730	[InstCombine] use 'auto' with 'dyn_cast'; NFC llvm-svn: 319067	2017-11-27 18:19:32 +00:00
Craig Topper	a4120fc42c	[X86] Teach combineX86ShuffleChain that AllowIntDomain requires at least SSE2. I don't have a good test case for this at the moment. I was playing around with a change in legalizing and triggered this code to produce a PSHUFD with sse1 only. llvm-svn: 319066	2017-11-27 18:15:14 +00:00
Simon Pilgrim	4ac95c9eba	[X86][AVX512] Tag AVX512 PACKSS/PACKUS/PMADDWD/PMADDUBSW instructions with SSE_PACK/SSE_PMADD schedule classes llvm-svn: 319065	2017-11-27 18:14:18 +00:00
Krzysztof Parzyszek	ac1966e15d	[Hexagon] Implement HexagonSubtarget::isHVXVectorType llvm-svn: 319064	2017-11-27 18:12:16 +00:00
Craig Topper	62189f7ab3	[X86] Make getSetCCResultType return vXi1 for any vXi32/vXi64 vector over 512 bits long when AVX512 is enabled. Similar for vXi16/vXi8 with BWI. Any vector larger than 512 bits will be split to 512 bits during legalization. But without this we will fold sexts with them before that making it difficult to recover leading to scalarization. llvm-svn: 319059	2017-11-27 17:51:55 +00:00
Simon Pilgrim	18fc7ff93a	[X86][SSE] Fix roundpd instructions to correctly use IIC_SSE_ROUNDPD_* itineraries llvm-svn: 319054	2017-11-27 17:29:49 +00:00
Dmitry Preobrazhensky	16608e67d3	[AMDGPU][MC][DISASSEMBLER][GFX9] Corrected decoding of GLOBAL/SCRATCH opcodes See bug 35433: https://bugs.llvm.org/show_bug.cgi?id=35433 Differential Revision: https://reviews.llvm.org/D40493 Reviewers: artem.tamazov, SamWot, arsenm llvm-svn: 319050	2017-11-27 17:14:35 +00:00
Zaara Syeda	48cb3c1557	[Power9] Improvements to vector extract with variable index exploitation This patch extends rL307174 to not use the power9 vector extract with variable index instructions when extracting word element 1. For such cases, the existing selection of MFVSRWZ provides a better sequence. Differential Revision: https://reviews.llvm.org/D38287 llvm-svn: 319049	2017-11-27 17:11:03 +00:00
Simon Pilgrim	647dd6a602	[X86][AVX512] Tag AVX512 sqrt instructions with SSE_SQRT schedule classes llvm-svn: 319045	2017-11-27 16:43:18 +00:00
Jonas Devlieghere	6a9c5929d4	[llvm-dwarfdump] Display DW_AT_high_pc as absolute value DWARF4 relative DW_AT_high_pc values are now displayed as absolute addresses. The relative value is only shown when explicitly dumping the forms, i.e. in show-form or verbose mode. ``` DW_AT_low_pc (0x0000000000000049) DW_AT_high_pc (0x00000019) ``` becomes ``` DW_AT_low_pc (0x0000000000000049) DW_AT_high_pc (0x0000000000000062) ``` Differential revision: https://reviews.llvm.org/D40317 rdar://35416943 llvm-svn: 319044	2017-11-27 16:40:46 +00:00
Sanjay Patel	4ca9968155	[InstSimplify] use m_APFloat to simplify fcmp folds; NFCI llvm-svn: 319043	2017-11-27 16:37:09 +00:00
Nirav Dave	db77e57ea8	[DAG] Do MergeConsecutiveStores again before Instruction Selection Summary: Now that store-merge only generates type-safe stores, do a second pass just before instruction selection to allow lowered intrinsics to be merged as well. Reviewers: jyknight, hfinkel, RKSimon, efriedma, rnk, jmolloy Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33675 llvm-svn: 319036	2017-11-27 15:28:15 +00:00
Simon Pilgrim	4164009b48	[X86] Add INVLPGA to the existing INVLPG scheduling llvm-svn: 319031	2017-11-27 14:39:50 +00:00
Petar Jovanovic	7745d2f02f	[mips] fix asmstring of Ext and Ins instructions and mips16 JALRC/JRC Make the print format consistent with other assembler instructions. Adding a tab character instead of space in asmstring of Ext and Ins instructions. Removing space around the tab character for JALRC and replacing space with tab in JRC. Patch by Milos Stojanovic. Differential Revision: https://reviews.llvm.org/D38144 llvm-svn: 319030	2017-11-27 14:25:36 +00:00
Jan Korous	c723f65709	[Support] Fix locking of shared variable in threadpool llvm-svn: 319027	2017-11-27 13:42:03 +00:00
Vedran Miletic	ad21f2687d	[AMDGPU] Add custom lowering for llvm.log{,10}.{f16,f32} intrinsics AMDGPU backend errors with "unsupported call to function" upon encountering a call to llvm.log{,10}.{f16,f32} intrinsics. This patch adds custom lowering to avoid that error on both R600 and SI. Reviewers: arsenm, jvesely Subscribers: tstellar Differential Revision: https://reviews.llvm.org/D29942 llvm-svn: 319025	2017-11-27 13:26:38 +00:00
John Brawn	4b476488ba	[CGP] Fix handling of null pointer values in optimizeMemoryInst The current way that trivial addressing modes are detected incorrectly thinks that null pointers are non-trivial, leading to an infinite loop where we keep duplicating the same select. Fix this by being aware of null when deciding if an addressing mode is trivial. Differential Revision: https://reviews.llvm.org/D40447 llvm-svn: 319019	2017-11-27 11:29:15 +00:00
Simon Pilgrim	97160be53d	[X86][FMA] Tag all FMA/FMA4 instructions with WriteFMA schedule class As mentioned on PR17367, many instructions are missing scheduling tags preventing us from setting 'CompleteModel = 1' for better instruction analysis. This patch deals with FMA/FMA4 which is one of the bigger offenders (along with AVX512 in general). Annoyingly all scheduler models need to define WriteFMA (now that it's actually used), even for older targets without FMA/FMA4 support, but that is an existing problem shared by other schedule classes. Differential Revision: https://reviews.llvm.org/D40351 llvm-svn: 319016	2017-11-27 10:41:32 +00:00
Momchil Velikov	bd2c7eb923	[ARM] Fix an off-by-one error when restoring LR for 16-bit Thumb The commit https://reviews.llvm.org/rL318143 incorrectly computes the offset to restore LR from. The number of tPOP operands is 2 (condition) + 2 (implicit def and use of SP) + count of the popped registers. We need to load LR from just past the last register, hence the correct offset should be either getNumOperands() - 4 or getNumExplicitOperands() - 2 (multiplied by 4). Differential revision: https://reviews.llvm.org/D40305 llvm-svn: 319014	2017-11-27 10:13:14 +00:00
Andrew V. Tischenko	26dde7719b	Update BTVER2 sched numbers for SSE42 string instructions. Differential Revision: https://reviews.llvm.org/D39846 llvm-svn: 319013	2017-11-27 09:58:00 +00:00
Craig Topper	400c32ecb9	[SelectionDAG] Teach SplitVecRes_SETCC to call GetSplitVector if the operands have already been split. llvm-svn: 319010	2017-11-27 05:52:54 +00:00
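
The r319173 entry above (use getStoreSize() instead of 'BitSize >> 3') turns on a small piece of arithmetic: shifting a bit width right by 3 truncates, so an i1 gets a size of 0 bytes, while rounding up to whole bytes gives 1. The following is a minimal sketch of that difference only. Both helper names are hypothetical and are not LLVM APIs; the rounding formula is an assumption about what "store size" means, taken from the i1 example in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the 'BitSize >> 3' pattern the commit replaces.
// Any width below 8 bits truncates to 0 bytes.
constexpr uint64_t truncatedByteSize(uint64_t BitSize) { return BitSize >> 3; }

// Hypothetical helper: round the bit width up to whole bytes, which is the
// behavior the commit message describes for a stored i1 (1 bit -> 1 byte).
constexpr uint64_t roundedUpByteSize(uint64_t BitSize) {
  return (BitSize + 7) / 8;
}

// An i1 access: truncation reports 0 bytes, rounding up reports 1 byte.
static_assert(truncatedByteSize(1) == 0, "shift truncates i1 to 0 bytes");
static_assert(roundedUpByteSize(1) == 1, "i1 occupies one byte when stored");

// For widths that are already a multiple of 8 the two computations agree.
static_assert(truncatedByteSize(32) == roundedUpByteSize(32), "same for i32");
```

A zero-sized access is what the commit message says can make alias analysis return NoAlias when it should not, which is why the rounded-up size matters.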

108717 Commits
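
The r319075 entry in the list above (Inliner: don't mark notail calls with the 'tail' attribute) quotes the TailCallKind enum and notes that taking the min does the wrong thing because TCK_NoTail is the largest value. The sketch below reproduces that ordering problem with a standalone copy of the enum. The two merge functions are hypothetical names, and the guarded version is only one plausible way to avoid the problem, not necessarily the change the commit made.

```cpp
#include <algorithm>
#include <cassert>

// Standalone copy of the enum values quoted in the commit message.
enum TailCallKind { TCK_None = 0, TCK_Tail = 1, TCK_MustTail = 2, TCK_NoTail = 3 };

// Hypothetical: combine the call site's kind with the inlined call's kind
// by taking the minimum. Because TCK_NoTail (3) > TCK_Tail (1), a notail
// call inlined through a tail call site comes out as TCK_Tail.
inline TailCallKind naiveMerge(TailCallKind CallSite, TailCallKind Child) {
  return std::min(CallSite, Child);
}

// Hypothetical guard: leave notail calls untouched and apply the minimum
// only to the remaining kinds.
inline TailCallKind guardedMerge(TailCallKind CallSite, TailCallKind Child) {
  if (Child == TCK_NoTail)
    return TCK_NoTail;
  return std::min(CallSite, Child);
}
```

With these definitions, naiveMerge(TCK_Tail, TCK_NoTail) yields TCK_Tail, while guardedMerge(TCK_Tail, TCK_NoTail) keeps TCK_NoTail.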