llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	67aa726f8c	[X86][AVX512DQ] Use packed instructions for scalar FP<->i64 conversions on 32-bit targets As i64 types are not legal on 32-bit targets, insert these into a suitable zero vector and use the packed vXi64<->FP conversion instructions instead. Fixes PR3163. Differential Revision: https://reviews.llvm.org/D43441 llvm-svn: 332498	2018-05-16 17:40:07 +00:00
Sanjay Patel	84caa9659e	[x86] add run with unsafe global param; NFC llvm-svn: 332486	2018-05-16 16:23:41 +00:00
Tony Tye	43259df44a	[AMDGPU] Change llvm.debugtrap to be a debug breakpoint that can resume execution. No longer require the queue pointer to be passed in in fixed SGPRs. Differential Revision: https://reviews.llvm.org/D46769 llvm-svn: 332485	2018-05-16 16:19:34 +00:00
Sanjay Patel	b3ac148cb4	[x86] add tests for DAG FP undef operands; NFC llvm-svn: 332484	2018-05-16 16:16:48 +00:00
Sirish Pande	cabe50a308	[AArch64] Gangup loads and stores for pairing. Keep loads and stores together (target defines how many loads and stores to gang up), such that it will help in pairing and vectorization. Differential Revision https://reviews.llvm.org/D46477 llvm-svn: 332482	2018-05-16 15:36:52 +00:00
Matt Arsenault	67a9815a5c	AMDGPU: Custom lower v4i16/v4f16 vector operations Avoids stack access. Also handle extract hi elt pattern from truncate + shift to avoid a couple test regressions. llvm-svn: 332453	2018-05-16 11:47:30 +00:00
Amara Emerson	0d6a26dffc	[GlobalISel][IRTranslator] Split aggregates during IR translation. We currently handle all aggregates by creating one large LLT, and letting the legalizer deal with splitting them up. However using this approach means that we can't support big endian code correctly. This patch changes the way that the IRTranslator deals with aggregate values, by splitting them up into their constituent element values. To do this, parts of the translator need to be modified to deal with multiple VRegs for a single Value. A new Value to VReg mapper is introduced to help keep compile time under control, currently there is no measurable impact on CTMark despite the extra code being generated in some cases. Patch is based on the original work of Tim Northover. Differential Revision: https://reviews.llvm.org/D46018 llvm-svn: 332449	2018-05-16 10:32:02 +00:00
Simon Dardis	5cf9de4b72	[mips] Add support for isBranchOffsetInRange and use it for MipsLongBranch Add support for this target hook, covering MIPS, microMIPS and MIPSR6, along with some tests. Also add missing getOppositeBranchOpc() cases exposed by the tests. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D46794 llvm-svn: 332446	2018-05-16 10:03:05 +00:00
Peter Smith	c811758da6	[AArch64] Support "S" inline assembler constraint This patch re-introduces the "S" inline assembler constraint. This matches an absolute symbolic address or a label reference. The primary use case is asm("adrp %0, %1\n\t" "add %0, %0, :lo12:%1" : "=r"(addr) : "S"(&var)); I say re-introduces as it seems like "S" was implemented in the original AArch64 backend, but it looks like it wasn't carried forward to the merged backend. The original implementation had A and L modifiers that could be used to print ":lo12:" to the string. It looks like gcc doesn't use these and :lo12: is expected to be written in the inline assembly string so I've not implemented A and L. Clang already supports the S modifier. Fixes PR37180 Differential Revision: https://reviews.llvm.org/D46745 llvm-svn: 332444	2018-05-16 09:33:25 +00:00
Alexander Richardson	8f44579d0b	Emit a left-shift instead of a power-of-two multiply for jump-tables Summary: SelectionDAGLegalize::ExpandNode() inserts an ISD::MUL when lowering a BR_JT opcode. While many backends optimize this multiply into a shift, e.g. the MIPS backend currently always lowers this into a sequence of load-immediate+multiply+mflo in MipsSETargetLowering::lowerMulDiv(). I initially changed the multiply to a shift in the MIPS backend but it turns out that would not have handled the MIPSR6 case and was a lot more code than doing it in LegalizeDAG. I believe performing this simple optimization in LegalizeDAG instead of each individual backend is the better solution since this also fixes other backends such as MSP430 which calls the multiply runtime function __mspabi_mpyi without this patch. Reviewers: sdardis, atanasyan, pftbest, asl Reviewed By: sdardis Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45760 llvm-svn: 332439	2018-05-16 08:58:26 +00:00
Simon Pilgrim	5df1ef7a8c	[X86][SSE] Fix tests for vector rotates by splat variable. We weren't correctly splatting the offset shift llvm-svn: 332435	2018-05-16 08:23:47 +00:00
Eli Friedman	25bef201c5	[MachineOutliner] Add optsize markings to outlined functions. It doesn't matter much this late in the pipeline, but one place that does check for it is the function alignment code. Differential Revision: https://reviews.llvm.org/D46373 llvm-svn: 332415	2018-05-15 23:36:46 +00:00
Simon Pilgrim	de13589625	[X86][SSE] Add tests for vector rotates by splat variable. llvm-svn: 332410	2018-05-15 22:11:51 +00:00
Stanislav Mekhanoshin	57d341c27a	[AMDGPU] Fix handling of void types in isLegalAddressingMode It is legal for the type passed to isLegalAddressingMode to be unsized or, more specifically, VoidTy. In this case, we must check the legality of load / stores for all legal types. Directly trying to call getTypeStoreSize is incorrect, and leads to breakage in e.g. Loop Strength Reduction. This change guards against that behaviour. Differential Revision: https://reviews.llvm.org/D40405 llvm-svn: 332409	2018-05-15 22:07:51 +00:00
Marek Olsak	37b9f55cc6	AMDGPU: Add a missing test for the 128-bit local addr space option This should have been pushed with: "AMDGPU: enable 128-bit for local addr space under an option" llvm-svn: 332404	2018-05-15 21:41:57 +00:00
Evandro Menezes	8d522d811a	[AArch64] Improve single vector lane unscaled stores When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead. In this case, also using the unscaled offset. Differential revision: https://reviews.llvm.org/D46762 llvm-svn: 332394	2018-05-15 20:41:12 +00:00
Chandler Carruth	5ecd81aab0	[x86][eflags] Fix PR37431 by teaching the EFLAGS copy lowering to specially handle SETB_C* pseudo instructions. Summary: While the logic here is somewhat similar to the arithmetic lowering, it is different enough that it made sense to have its own function. I actually tried a bunch of different optimizations here and none worked well so I gave up and just always do the arithmetic based lowering. Looking at code from the PR test case, we actually pessimize a bunch of code when generating these. Because SETB_C* pseudo instructions clobber EFLAGS, we end up creating a bunch of copies of EFLAGS to feed multiple SETB_C* pseudos from a single set of EFLAGS. This in turn causes the lowering code to ruin all the clever code generation that SETB_C* was hoping to achieve. None of this is needed. Whenever we're generating multiple SETB_C* instructions from a single set of EFLAGS we should instead generate a single maximally wide one and extract subregs for all the different desired widths. That would result in substantially better code generation. But this patch doesn't attempt to address that. The test case from the PR is included as well as more directed testing of the specific lowering pattern used for these pseudos. Reviewers: craig.topper Subscribers: sanjoy, mcrosier, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D46799 llvm-svn: 332389	2018-05-15 20:16:57 +00:00
Tom Stellard	e182b28ae4	AMDGPU/GlobalISel: Implement select() for G_FCONSTANT Summary: Also clean up G_CONSTANT selection. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46170 llvm-svn: 332379	2018-05-15 17:57:09 +00:00
Simon Pilgrim	be9a206883	[X86] Split WriteCvtF2F into F32->F64 and F64->F32 scheduler classes BtVer2 - Fixes schedules for (V)CVTPS2PD instructions A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first llvm-svn: 332376	2018-05-15 17:36:49 +00:00
Geoff Berry	32d07d59f5	[AArch64] Fix mir test case liveins info. The test case added in r332265 had incomplete livein information which was caught by the EXPENSIVE_CHECKS bot. Fix the livein information and add -verify-machineinstrs to the test case. llvm-svn: 332367	2018-05-15 16:27:34 +00:00
Krzysztof Parzyszek	8c389bd368	[Hexagon] Remove unused flag from subtarget and (non)corresponding test llvm-svn: 332365	2018-05-15 16:13:52 +00:00
Simon Dardis	f40eb03ce9	[mips] Mark select instructions correctly Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D46702 llvm-svn: 332364	2018-05-15 16:05:04 +00:00
Sanjay Patel	8652c53d29	[DAG] propagate FMF for all FPMathOperators This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups. It gets us most of the FMF functionality that we want without adding any state bits to the flags. It also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch. It should provide a superset of the functionality from D46563 - the extra tests show propagation and codegen diffs for fcmp, vecreduce, and FP libcalls. The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently, so that shouldn't make any difference. Differential Revision: https://reviews.llvm.org/D46854 llvm-svn: 332358	2018-05-15 14:16:24 +00:00
Simon Pilgrim	891ebcdbaa	[X86] Split off F16C WriteCvtPH2PS/WriteCvtPS2PH scheduler classes Btver2 - VCVTPH2PSYrm needs to double pump the AGU Broadwell - missing VCVTPS2PH*mr stores extra latency Allows us to remove the WriteCvtF2FSt conversion store class llvm-svn: 332357	2018-05-15 14:12:32 +00:00
Artur Gainullin	243a3d56d8	[X86] Improve unsigned saturation downconvert detection. Summary: New unsigned saturation downconvert patterns detection was implemented in X86 Codegen: (truncate (smin (smax (x, C1), C2)) to dest_type), where C1 >= 0 and C2 is unsigned max of destination type. (truncate (smax (smin (x, C2), C1)) to dest_type) where C1 >= 0, C2 is unsigned max of destination type and C1 <= C2. These two patterns are equivalent to: (truncate (umin (smax(x, C1), unsigned_max_of_dest_type)) to dest_type) Reviewers: RKSimon Subscribers: llvm-commits, a.elovikov Differential Revision: https://reviews.llvm.org/D45315 llvm-svn: 332336	2018-05-15 10:24:12 +00:00
Craig Topper	fadf8b8dec	[X86] Add fast isel tests for some of the avx512 truncate intrinsics to match current clang codegen. llvm-svn: 332326	2018-05-15 04:26:27 +00:00
Sanjay Patel	165587b424	[AArch64] enhance test to show FMF loss; NFC llvm-svn: 332301	2018-05-14 21:53:21 +00:00
Martin Storsjo	ace7ae935f	[ARM] Back up R4 and LR if calling the stack probe function Differential Revision: https://reviews.llvm.org/D46777 llvm-svn: 332298	2018-05-14 21:32:52 +00:00
Sanjay Patel	4c8a67a229	[PowerPC] add more tests for FMF propagation; NFC llvm-svn: 332295	2018-05-14 21:17:49 +00:00
Krzysztof Parzyszek	771f2422d0	[Hexagon] Add a target feature for memop generation llvm-svn: 332285	2018-05-14 20:09:07 +00:00
Sid Manning	d9f2873511	Hexagon: Put relocations after instructions not packets. Change relocation output so that relocation information follows individual instructions rather than clustering them at the end of packets. This change required shifting block of code but the actual change is in HexagonPrettyPrinter's PrintInst. Differential Revision: https://reviews.llvm.org/D46728 llvm-svn: 332283	2018-05-14 19:46:08 +00:00
Craig Topper	53ceb4805f	[X86] Remove and autoupgrade avx512.vbroadcast.ss/avx512.vbroadcast.sd intrinsics. llvm-svn: 332271	2018-05-14 18:21:22 +00:00
Simon Pilgrim	228d24a2d6	[X86][BtVer2] Fix MMX/YMM integer vector nt store schedules MMX was missing and YMM was tagged as a fp nt store llvm-svn: 332269	2018-05-14 18:07:28 +00:00
Geoff Berry	64a2ea41ea	[BranchFolding] Allow hoisting to block with a single conditional branch. Summary: The BranchFolding pass is currently missing opportunities to hoist common code if the hoisted-to block contains a single conditional branch that has register uses. This occurs somewhat frequently on AArch64 with CBZ/TBZ opcodes. This change also eliminates some code differences when debug info is present since the presence of e.g. DBG_VALUE instructions in the hoisted-to block can enable hoisting that wouldn't have occurred without them. Reviewers: MatzeB, rnk, kparzysz, twoh, aprantl, javed.absar Subscribers: kristof.beyls, JDevlieghere, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D46324 llvm-svn: 332265	2018-05-14 17:31:18 +00:00
Krzysztof Parzyszek	329c3e9a5f	[Hexagon] Avoid predicate copies to integer registers from store-locked llvm-svn: 332260	2018-05-14 16:41:40 +00:00
Evandro Menezes	14fa2e4fa5	[AArch64] Improve single vector lane stores When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead. Differential revision: https://reviews.llvm.org/D46655 llvm-svn: 332251	2018-05-14 15:26:35 +00:00
Craig Topper	f633f3eb67	[X86] Add fast isel test cases for the clang output for 512-bit cvtps2pd related intrinsics. llvm-svn: 332214	2018-05-14 05:09:41 +00:00
Craig Topper	0e71c6d5ca	[X86] Remove and autoupgrade the cvtusi2sd intrinsic. Use uitofp+insertelement instead. llvm-svn: 332206	2018-05-14 00:06:49 +00:00
Craig Topper	97e74b05ef	[X86] Add patterns for combining movss+uint_to_fp into the intrinsic instructions under AVX512. This matches what we do for sint_to_fp. llvm-svn: 332205	2018-05-13 23:24:21 +00:00
Craig Topper	12067185d4	[X86] Add fast-isel test cases for _mm_cvtu32_sd, _mm_cvtu64_sd, _mm_cvtu32_ss, and _mm_cvtu64_ss. llvm-svn: 332204	2018-05-13 23:24:19 +00:00
Craig Topper	85906cf041	[X86] Remove and autoupgrade masked vpermd/vpermps intrinsics. llvm-svn: 332198	2018-05-13 18:03:59 +00:00
Dimitry Andric	a39c409619	Follow-up to rL332176 by adding a test case for PR37264. Noticed by Simon Pilgrim. llvm-svn: 332197	2018-05-13 14:32:23 +00:00
Matt Arsenault	dfb88dfe30	AMDGPU: Make undef legal for v2i16/v2f16 This is apparently necessary to stop undef from being turned into a build_vector of 0s. llvm-svn: 332195	2018-05-13 10:04:38 +00:00
Puyan Lotfi	380a6f55ff	[NFC] MIR-Canon: switching to a stable string sorting of instructions. llvm-svn: 332191	2018-05-13 06:07:20 +00:00
Craig Topper	38b713d4a7	[X86] Add some load folding patterns for cvtsi2ss/sd into intrinsic instructions. llvm-svn: 332189	2018-05-13 01:54:33 +00:00
Craig Topper	28b85caea8	[X86] Remove some unused CHECK lines from tests. llvm-svn: 332188	2018-05-13 00:58:23 +00:00
Craig Topper	df3a9cedff	[X86] Remove and autoupgrade legacy cvtss2sd intrinsics. llvm-svn: 332187	2018-05-13 00:29:40 +00:00
Craig Topper	38ad7ddabc	[X86] Remove and autoupgrade cvtsi2ss/cvtsi2sd intrinsics to match what clang has used for a very long time. llvm-svn: 332186	2018-05-12 23:14:39 +00:00
Craig Topper	a288f241cd	[X86] Remove some unused masked conversion intrinsics that can be replaced with an older intrinsic and a select. This is what clang already uses. llvm-svn: 332170	2018-05-12 02:34:28 +00:00
Stanislav Mekhanoshin	7012c246c1	[AMDGPU] Fix amdgpu-waves-per-eu accounting in scheduler We cannot query this attribute from a subtarget given a machine function. At this point attribute itself is already unavailable and can only be obtained through MFI. Differential Revision: https://reviews.llvm.org/D46781 llvm-svn: 332166	2018-05-12 01:41:56 +00:00
Tom Stellard	655fdd3f82	AMDGPU/GlobalISel: Implement select() for >32-bit G_STORE Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D46153 llvm-svn: 332154	2018-05-11 23:12:49 +00:00
Changpeng Fang	f094885a9e	AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction. Summary: We have no logic to promote alloca to vector for an AddrSpaceCast instruction. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D45993 llvm-svn: 332147	2018-05-11 22:17:57 +00:00
Craig Topper	a17d627abb	[X86] Remove and autoupgrade a bunch of FMA intrinsics that are no longer used by clang. llvm-svn: 332146	2018-05-11 21:59:34 +00:00
Yaxun Liu	deba150c27	[AMDGPU] Fix compilation failure when IR contains comdat Remove a useless SwitchSection which also causes compilation failure when IR contains comdat. The SwitchSection is useless because the current section is already correct text section for the function therefore no need to switch. It causes compilation failure for comdat because functions with comdat has specific text section, not the default .text section. Since HIP uses comdat, this bug caused failures for HIP. Differential Revision: https://reviews.llvm.org/D46770 llvm-svn: 332137	2018-05-11 20:40:14 +00:00
Vedant Kumar	99d5c072f0	[DAGCombiner] Set the right SDLoc on extended SETCC uses (7/N) ExtendSetCCUses updates SETCC nodes which use a load (OriginalLoad) to reflect a simplification to the load (ExtLoad). Based on my reading, ExtendSetCCUses may create new nodes to extend a constant attached to a SETCC. It also creates fresh SETCC nodes which refer to any updated operands. ISTM that the location applied to the new constant and SETCC nodes should be the same as the location of the ExtLoad. This was suggested by Adrian in https://reviews.llvm.org/D45995. Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D46216 llvm-svn: 332119	2018-05-11 18:40:10 +00:00
Vedant Kumar	fd340a4047	[DAGCombiner] Set the right SDLoc on a newly-created sextload (6/N) This teaches tryToFoldExtOfLoad to set the right location on a newly-created extload. With that in place, the logic for performing a certain ([s\|z]ext (load ...)) combine becomes identical for sexts and zexts, and we can get rid of one copy of the logic. The test case churn is due to dependencies on IROrders inherited from the wrong SDLoc. Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D46158 llvm-svn: 332118	2018-05-11 18:40:08 +00:00
Simon Pilgrim	661ae7778d	[X86][BtVer2] Model ymm move as double pumped instructions We still need to handle mmx/xmm moves as 'decode-only' no-pipe instructions llvm-svn: 332109	2018-05-11 17:38:36 +00:00
Alex Bradbury	bca0c3cdb6	[RISCV] Support .option rvc and norvc assembler directives These directives allow the 'C' (compressed) extension to be enabled/disabled within a single file. Differential Revision: https://reviews.llvm.org/D45864 Patch by Kito Cheng llvm-svn: 332107	2018-05-11 17:30:28 +00:00
Geoff Berry	60460268c0	[AArch64] Fix performPostLD1Combine to check for constant lane index. Summary: performPostLD1Combine in AArch64ISelLowering looks for vector insert_vector_elt of a loaded value which it can optimize into a single LD1LANE instruction. The code checking for the pattern was not checking if the lane index was a constant which could cause two problems: - an assert when lowering the LD1LANE ISD node since it assumes an constant operand - an assert in isel if the lane index value depends on the post-incremented base register Both of these issues are avoided by simply checking that the lane index is a constant. Fixes bug 35822. Reviewers: t.p.northover, javed.absar Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D46591 llvm-svn: 332103	2018-05-11 16:25:06 +00:00
Sanjoy Das	82105e2a7d	Use iteration instead of recursion in CFIInserter Summary: This recursive step can overflow the stack. Reviewers: djokov, petarj Subscribers: mcrosier, jlebar, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D46671 llvm-svn: 332101	2018-05-11 15:54:46 +00:00
Simon Pilgrim	22dd72b995	[X86] Split WriteF/WriteVec Move/Load/Store scheduler classes by vector width Fixes a SNB issue that was missing vlddqu/vmovntdqa ymm instructions llvm-svn: 332094	2018-05-11 14:30:54 +00:00
Tom Stellard	dcc95e9385	AMDGPU/GlobalISel: Implement select() for 32-bit G_FPTOUI Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45883 llvm-svn: 332082	2018-05-11 05:44:16 +00:00
Craig Topper	1ee19ae126	[X86] Add new patterns for masked scalar load/store to match clang's codegen from r331958. Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets. So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened. We may be able to drop some of the old patterns, but I leave that for a future patch. llvm-svn: 332049	2018-05-10 21:49:16 +00:00
Tom Stellard	1e0edad4bb	AMDGPU/GlobalISel: Implement select() for G_BITCAST s32 <--> <2 x s16> Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45881 llvm-svn: 332042	2018-05-10 21:20:10 +00:00
Tom Stellard	1dc90204bf	AMDGPU/GlobalISel: Enable TableGen'd instruction selector Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, mgorny, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45994 llvm-svn: 332039	2018-05-10 20:53:06 +00:00
Sam Clegg	a5908009cd	[WebAssembly] Update default triple in test files to wasm32-unknown-unknown. Summary: The final -wasm component has been the default for some time now. Subscribers: jfb, dschuff, jgravelle-google, eraman, aheejin, JDevlieghere, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D46342 llvm-svn: 332007	2018-05-10 17:49:11 +00:00
Simon Pilgrim	38ac0e9c6b	[X86] Split WriteVecALU/WriteVecLogic/WriteShuffle/WriteVarShuffle/WritePSADBW/WritePHAdd scheduler classes Split off XMM classes from the default (MMX) classes. llvm-svn: 331999	2018-05-10 17:06:09 +00:00
Sanjay Patel	b4e7893ba8	[x86] fix fmaxnum/fminnum with nnan With nnan, there's no need for the masked merge / blend sequence (that probably costs much more than the min/max instruction). Somewhere between clang 5.0 and 6.0, we started producing these intrinsics for fmax()/fmin() in C source instead of libcalls or fcmp/select. The backend wasn't prepared for that, so we regressed perf in those cases. Note: it's possible that other targets have similar problems as seen here. Noticed while investigating PR37403 and related bugs: https://bugs.llvm.org/show_bug.cgi?id=37403 The IR FMF propagation cases still don't work. There's a proposal that might fix those cases in D46563. llvm-svn: 331992	2018-05-10 15:40:49 +00:00
Sanjay Patel	ec9b6be26f	[x86] fix test names; NFC llvm-svn: 331989	2018-05-10 14:58:47 +00:00
Sanjay Patel	be97134955	[x86] add tests for maxnum/minnum intrinsics with nnan; NFC Clang 6.0 was updated to create these intrinsics rather than libcalls or fcmp/select, but the backend wasn't prepared to handle that optimally. This bug is not the primary reason for PR37403: https://bugs.llvm.org/show_bug.cgi?id=37403 ...but it's probably more important for x86 perf. llvm-svn: 331988	2018-05-10 14:48:42 +00:00
Nirav Dave	a5ad417589	[DAG] Avoid using deleted node in rebuildSetCC Summary: The combine in rebuildSetCC may be combined to another node leaving our references stale. Keep a handle on it to avoid stale references. Fixes PR36602. Reviewers: dbabokin, RKSimon, eli.friedman, davide Subscribers: hiraditya, uabelho, JesperAntonsson, qcolombet, llvm-commits Differential Revision: https://reviews.llvm.org/D46404 llvm-svn: 331985	2018-05-10 14:28:54 +00:00
Gabor Buella	a832b22bae	[X86] ptwrite intrinsic Reviewers: craig.topper, RKSimon Reviewed By: craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D46539 llvm-svn: 331961	2018-05-10 07:26:05 +00:00
Artem Belevich	2f348ea1c7	[NVPTX] Added a feature to use short pointers for const/local/shared AS. Const/local/shared address spaces are all < 4GB and we can always use 32-bit pointers to access them. This has substantial performance impact on kernels that uses shared memory for intermediary results. The feature is disabled by default. Differential Revision: https://reviews.llvm.org/D46147 llvm-svn: 331941	2018-05-09 23:46:19 +00:00
Roman Tereshin	6d26638c90	[GlobalISel][Legalizer] Widening the second src op of shifts bug fix The second source operand of G_SHL, G_ASHR, and G_LSHR must preserve its value as a (small) unsigned integer, therefore its incorrect to widen it in any way but by zero extending it. G_SHL was using G_ANYEXT and G_ASHR - G_SEXT (which is correct for their destination and first source operands, but not the "number of bits to shift" operand). Generally, shifts aren't as similar to regular binary operations as it might seem, for instance, they aren't commutative nor associative and the second source operand usually requires a special treatment. Reviewers: bogner, javed.absar, aivchenk, rovka Reviewed By: bogner Subscribers: igorb, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D46413 llvm-svn: 331926	2018-05-09 21:43:30 +00:00
Farhana Aleen	e24f3ff8de	[AMDGPU] Support horizontal vectorization of min/max. Author: FarhanaAleen Reviewed By: rampitec Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D46604 llvm-svn: 331920	2018-05-09 21:18:34 +00:00
Matt Arsenault	eac81b2448	AMDGPU: Ignore any_extend in mul24 combine If a multiply is truncated, SimplifyDemandedBits sometimes turns a zero_extend of the inputs into an any_extend, which makes the known bits computation unhelpful. Ignore these and compute known bits for the underlying value, since we insert the correct extend type after. llvm-svn: 331919	2018-05-09 21:11:35 +00:00
Krzysztof Parzyszek	cff73a2118	[Hexagon] Add patterns for vector shift-and-accumulate llvm-svn: 331918	2018-05-09 21:10:41 +00:00
Matt Arsenault	74fd7600d2	AMDGPU: Handle partial shift reduction for variable shifts If the variable shift amount has known bits, we can still reduce the shift. llvm-svn: 331917	2018-05-09 20:52:54 +00:00
Matt Arsenault	b143d9a5ea	AMDGPU: Partially shrink 64-bit shifts if reduced to 16-bit This is an extension of an existing combine to reduce wider shls if the result fits in the final result type. This introduces the same combine, but reduces the shift to a middle sized type to avoid the slow 64-bit shift. llvm-svn: 331916	2018-05-09 20:52:43 +00:00
Matt Arsenault	762d498808	AMDGPU: Add combine for trunc of bitcast from build_vector If the truncate is only accessing the first element of the vector, we can use the original source value. This helps with some combine ordering issues after operations are lowered to integer operations between bitcasts of build_vector. In particular it stops unnecessarily materializing the unused top half of a vector in some cases. llvm-svn: 331909	2018-05-09 18:37:39 +00:00
Roman Tereshin	d5fa9fde58	Reapplying r331819 [GlobalISel][Legalizer] More concise and faster widenScalar, NFC The commit was a suspect for clang-cmake-aarch64-global-isel and clang-cmake-aarch64-quick bot failures, proved to be innocent. llvm-svn: 331898	2018-05-09 17:28:18 +00:00
Craig Topper	176ec8506f	[DAGCombiner] In visitBITCAST when trying to constant fold the bitcast, only call getBitcast if its an fp->int or int->fp conversion even when before legalize ops. Previously if !LegalOperations we would blindly call getBitcast and hope that getNode would constant fold it. But if the conversion is between a vector and a scalar, getNode has no simplification. This means we would just get back the original N. We would then return that N which would make the caller of visitBITCAST think that we used CombineTo and did our own worklist management. This prevents target specific optimizations from being called for vector/scalar bitcasts until after legal operations. llvm-svn: 331896	2018-05-09 17:14:27 +00:00
Simon Pilgrim	ab34aa8294	[X86] Cleanup WriteFStore/WriteVecStore schedules MOVNTPD/MOVNTPS should be WriteFStore Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default, its costs gets increased but is still nowhere near the real cost of that nasty instruction.... llvm-svn: 331864	2018-05-09 11:01:16 +00:00
Craig Topper	b9a473d186	[X86] Combine (vXi1 (bitcast (-1)))) and (vXi1 (bitcast (0))) to all ones or all zeros vXi1 vector. llvm-svn: 331847	2018-05-09 06:07:20 +00:00
Daniel Sanders	618437459c	Revert r331816 and r331820 - [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 Reverting this to see if the clang-cmake-aarch64-global-isel and clang-cmake-aarch64-quick bots are failing because of this commit. We know it wasn't r331819. llvm-svn: 331846	2018-05-09 05:00:17 +00:00
Shiva Chen	2c864551df	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841	2018-05-09 02:40:45 +00:00
Roman Tereshin	27bba4495a	Revert r331819 [GlobalISel][Legalizer] More concise and faster widenScalar, NFC Reverting this to see if the clang-cmake-aarch64-global-isel and clang-cmake-aarch64-quick bots are failing because of this commit llvm-svn: 331839	2018-05-09 01:43:12 +00:00
Roman Tereshin	25cbfe680e	[GlobalISel][Legalizer] More concise and faster widenScalar, NFC Refactoring LegalizerHelper::widenScalar member function reducing its size by approximately a factor of 2 and (hopefully) making it more straightforward and regular by introducing widenScalarSrc and widenScalarDst helper methods. The new widenScalar* methods mutate the instructions in place instead of recreating them from scratch and removing the originals. The compile time implications of this were measured on sqlite3 amalgamation, targeting AArch64 in -O0: LegalizerHelper::widenScalar: > 25% faster Legalizer::runOnMachineFunction: ~ 4.0 - 4.5% faster Also adding MachineOperand::setCImm and refactoring out MachineIRBuilder::recordInsertion methods to make the change possible. Reviewers: aditya_nandakumar, bogner, javed.absar, t.p.northover, ab, dsanders, arsenm Reviewed By: aditya_nandakumar Subscribers: wdng, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D46414 llvm-svn: 331819	2018-05-08 22:53:09 +00:00
Daniel Sanders	d24dcdd1f7	[globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 Summary: Depends on D45541 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson Reviewed By: aemerson Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45543 llvm-svn: 331816	2018-05-08 22:26:39 +00:00
Jessica Paquette	ec37c640dd	Revert "[X86][CET] Shadow stack fix for setjmp/longjmp" This reverts commit 30962eca38ef02666ebcdded72a94f2cd0292d68. This commit has been causing test asan failures on a build bot. http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/45108/ Original commit: https://reviews.llvm.org/D46181 llvm-svn: 331813	2018-05-08 22:00:57 +00:00
Daniel Neilson	65a7eb71f9	Changing constants in a test (NFC) Summary: Changing the lengths of the atomic memory intrinsics in a test to make sure that they don't get lowered into loads/stores if/when expansion of these occurs in selectiondag. llvm-svn: 331800	2018-05-08 19:08:12 +00:00
Lei Huang	e41e3d3237	[Power9]Legalize and emit code for truncate and convert QP to HW and Byte Legalize and emit code for truncate and convert float128 to (un)signed short and (un)signed char. Differential Revision: https://reviews.llvm.org/D46194 llvm-svn: 331797	2018-05-08 18:52:06 +00:00
Matt Arsenault	80fb05dc28	AMDGPU: Fix broken check lines in test llvm-svn: 331796	2018-05-08 18:43:44 +00:00
Matt Arsenault	3ec8803f53	AMDGPU: Don't use undef in a test llvm-svn: 331795	2018-05-08 18:43:34 +00:00
Matt Arsenault	869cbedc81	AMDGPU: Fix broken dynamic vector indexing for packed types The intention of this was to multiply by 16, not shift by 16. llvm-svn: 331793	2018-05-08 18:43:25 +00:00
Lei Huang	6364288dba	[Power9]Legalize and emit code for truncate and convert Quad-Precision to Word Legalize and emit code for: * xscvqpswz : VSX Scalar truncate & Convert Quad-Precision to Signed Word * xscvqpuwz : VSX Scalar truncate & Convert Quad-Precision to Unsigned Word Differential Revision: https://reviews.llvm.org/D45635 llvm-svn: 331790	2018-05-08 18:34:00 +00:00
Lei Huang	c517e95bc6	[Power9]Legalize and emit code for truncate and convert QP to DW Legalize and emit code for: * xscvqpsdz : VSX Scalar truncate & Convert Quad-Precision to Signed Dword * xscvqpudz : VSX Scalar truncate & Convert Quad-Precision to Unsigned Dword Differential Revision: https://reviews.llvm.org/D45553 llvm-svn: 331787	2018-05-08 18:23:31 +00:00
Guozhi Wei	1aea95a9ea	[CodeGenPrepare] Move Extension Instructions Through Logical And Shift Instructions CodeGenPrepare pass move extension instructions close to load instructions in different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in current implementation. This patch enables this enhancement, so we can eliminate more extension instructions. Differential Revision: https://reviews.llvm.org/D45537 llvm-svn: 331783	2018-05-08 17:58:32 +00:00
Lei Huang	c29229a644	[PowerPC] Unify handling for conversion of FP_TO_INT feeding a store Existing DAG combine only handles conversions for FP_TO_SINT: "{f32, f64} x { i32, i16 }" This patch simplifies the code to handle: "{ FP_TO_SINT, FP_TO_UINT } x { f64, f32 } x { i64, i32, i16, i8 }" Differential Revision: https://reviews.llvm.org/D46102 llvm-svn: 331778	2018-05-08 17:36:40 +00:00
Stefan Maksimovic	c7113cc9e4	[mips][msa] Pattern match the splat.d instruction Introduced a new pattern for matching splat.d explicitly. Both splat.d and splati.d can now be generated from the @llvm.mips.splat.d intrinsic depending on whether an immediate value has been passed. Differential Revision: https://reviews.llvm.org/D45683 llvm-svn: 331771	2018-05-08 15:12:29 +00:00
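
The pattern equivalence claimed in commit 243a3d56d8 above (unsigned saturation downconvert detection) can be sanity-checked with a small model. The sketch below is not LLVM code: it assumes a 16-bit signed source and an 8-bit destination, models the DAG nodes with plain Python integers, and uses hypothetical helper names (`pattern_a`, `pattern_b`, `reference`, `equivalent`) chosen only for illustration.

```python
# Illustrative model only; the bit widths and all names are assumptions.
SRC_BITS = 16                      # assumed signed source width
DST_BITS = 8                       # assumed destination width
UMAX_DST = (1 << DST_BITS) - 1     # unsigned max of the destination type (255)


def truncate(x, bits=DST_BITS):
    """Keep only the low `bits` bits, like a truncate node."""
    return x & ((1 << bits) - 1)


def as_unsigned(x, bits=SRC_BITS):
    """Reinterpret a signed value as an unsigned value of the same width."""
    return x & ((1 << bits) - 1)


def pattern_a(x, c1, c2=UMAX_DST):
    # (truncate (smin (smax (x, C1), C2)))
    return truncate(min(max(x, c1), c2))


def pattern_b(x, c1, c2=UMAX_DST):
    # (truncate (smax (smin (x, C2), C1)))
    return truncate(max(min(x, c2), c1))


def reference(x, c1):
    # (truncate (umin (smax (x, C1), unsigned_max_of_dest_type)))
    clamped = max(x, c1)           # signed max; non-negative because C1 >= 0
    return truncate(min(as_unsigned(clamped), UMAX_DST))


def equivalent(c1):
    """Exhaustively compare both patterns with the reference for one C1."""
    lo, hi = -(1 << (SRC_BITS - 1)), 1 << (SRC_BITS - 1)
    a_ok = all(pattern_a(x, c1) == reference(x, c1) for x in range(lo, hi))
    b_ok = all(pattern_b(x, c1) == reference(x, c1) for x in range(lo, hi))
    return a_ok, b_ok


if __name__ == "__main__":
    for c1 in (0, 1, 17, 255, 300):
        print(c1, equivalent(c1))
```

In this model both patterns agree with the reference whenever 0 <= C1 <= C2. With C1 = 300 (greater than C2) the second pattern no longer matches, which is consistent with the extra C1 <= C2 condition the commit message attaches to it.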

24537 Commits