llvm-project

Commit Graph

Author	SHA1	Message	Date
Guy Blank	de425ae753	[X86][AVX512] Add combine for TESTM Add an X86 combine for TESTM when one of the operands is a BUILD_VECTOR(0,0,...). TESTM op0, BUILD_VECTOR(0,0,...) -> BUILD_VECTOR(0,0,...) TESTM BUILD_VECTOR(0,0,...), op1 -> BUILD_VECTOR(0,0,...) Differential Revision: https://reviews.llvm.org/D36536 llvm-svn: 310787	2017-08-13 08:03:37 +00:00
Craig Topper	44cb1ffb6a	[X86] When handling addcarry intrinsic, create the flag result with the correct type so we don't crash if we use a memory instruction Summary: Previously we were creating the flag result with MVT::Other which is interpreted as a Chain node. If we used a memory form of the instruction we would end up with a copyToReg that consumed the chain result of the adcx instruction instead of the flag result. Pretty sure we should be using MVT::i32 here, that's what we do in other places we create these node types. We should probably consider this for 5.0 as well. Reviewers: RKSimon, zvi, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36645 llvm-svn: 310784	2017-08-12 20:19:44 +00:00
Simon Pilgrim	5a86f0e717	[DAGCombiner] Extending pattern detection for vector shuffle (REAPPLIED) If all the operands of a BUILD_VECTOR extract elements from same vector then split the vector efficiently based on the maximum vector access index. Reapplied with fix to only work with simple value types. Committed on behalf of @jbhateja (Jatin Bhateja) Differential Revision: https://reviews.llvm.org/D35788 llvm-svn: 310782	2017-08-12 17:43:25 +00:00
Simon Pilgrim	32546d1434	[X86] Regenerate merge store tests. NFCI. Gives us a much better idea of what is going on than just relying on a few checks. llvm-svn: 310780	2017-08-12 17:27:35 +00:00
Richard Smith	3704eba1d1	D36604: PR34148: Do not assume we can use a copy relocation for an `external_weak` global An `external_weak` global may be intended to resolve as a null pointer if it's not defined, so it doesn't make sense to use a copy relocation for it. Differential Revision: https://reviews.llvm.org/D36604 llvm-svn: 310773	2017-08-11 23:52:28 +00:00
Sanjay Patel	2b452c7192	[x86] add tests for rotate left/right with masked shifter; NFC As noted in the test comment, instcombine now produces the masked shift value even when it's not included in the source, so we should handle this. Although the AMD/Intel docs don't say it explicitly, over-rotating the narrow ops produces the same results. An existence proof that this works as expected on all x86 comes from gcc 4.9 or later: https://godbolt.org/g/K6rc1A llvm-svn: 310770	2017-08-11 22:38:40 +00:00
John Baldwin	eebcc47500	[MIPS] Use ABI to determine stack alignment. Summary: The stack alignment depends on the ABI (16 bytes for N32 and N64 and 8 bytes for O32), not the CPU type. Reviewers: sdardis Reviewed By: sdardis Subscribers: atanasyan, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D36326 llvm-svn: 310768	2017-08-11 22:07:56 +00:00
Sanjay Patel	5d6df36fde	[x86] regenerate test checks, add 64-bit run; NFC llvm-svn: 310767	2017-08-11 22:05:33 +00:00
Craig Topper	ac217b7aa3	[X86] Don't use fsin/fcos/fsincos instructions ever Summary: Previously we would use these instructions if sse was disabled and fastmath was enabled. As mentioned in D28335, this is a bad idea. Reviewers: efriedma, scanon, DavidKreitzer Reviewed By: DavidKreitzer Subscribers: zvi, llvm-commits Differential Revision: https://reviews.llvm.org/D36344 llvm-svn: 310762	2017-08-11 20:55:29 +00:00
Rafael Espindola	b8956a70d3	Fix access to undefined weak symbols in pic code When the access to a weak symbol is not a call, the access has to be able to produce the value 0 at runtime. We were sometimes producing code sequences where that was not possible if the code was loaded more than 4g away from 0. llvm-svn: 310756	2017-08-11 20:49:27 +00:00
Matt Arsenault	71bcbd451f	AMDGPU: Start adding tail call support Handle the sibling call cases. llvm-svn: 310753	2017-08-11 20:42:08 +00:00
Daniel Sanders	e6c216ed5b	Revert r310716 (and r310735): [globalisel][tablegen] Support zero-instruction emission. Two of the Windows bots are failing test\CodeGen\X86\GlobalISel\select-inc.mir which should not have been affected by the change. Reverting while I investigate. Also reverted r310735 because it builds on r310716. llvm-svn: 310745	2017-08-11 19:19:21 +00:00
Stanislav Mekhanoshin	7f37794ebd	[AMDGPU] Ported and adopted AMDLibCalls pass The pass does simplifications of well known AMD library calls. If given -amdgpu-prelink option it works in a pre-link mode which allows to reference new library functions which will be linked in later. In addition it also used to process traditional AMD option -fuse-native which allows to replace some of the functions with their fast native implementations from the library. The necessary glue to pass the prelink option and translate -fuse-native is to be added to the driver. Differential Revision: https://reviews.llvm.org/D36436 llvm-svn: 310731	2017-08-11 16:42:09 +00:00
Craig Topper	561092f233	[AVX512] Remove and autoupgrade many of the broadcast intrinsics Summary: This autoupgrades most of the broadcast intrinsics. They've been unused in clang for some time. This leaves the 32x2 intrinsics because they are still used in clang. Reviewers: RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36606 llvm-svn: 310725	2017-08-11 16:22:45 +00:00
Craig Topper	0f30fe9634	[x86] Enable some support for lowerVectorShuffleWithUndefHalf with AVX-512 Summary: This teaches 512-bit shuffles to detect unused halfs in order to reduce shuffle size. We may need to refine the 512-bit exit point. I couldn't remember if we had good cross lane shuffles for 8/16 bit with AVX-512 or not. I believe this is step towards being able to handle D36454 without a special case. From here we need to improve our ability to combine extract_subvector with insert_subvector and other extract_subvectors. And we need to support narrowing binary operations where we don't demand all elements. This may be improvements to DAGCombiner::narrowExtractedVectorBinOp(by recognizing an insert_subvector in addition to concat) or we may need a target specific combiner. Reviewers: RKSimon, zvi, delena, jbhateja Reviewed By: RKSimon, jbhateja Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36601 llvm-svn: 310724	2017-08-11 16:20:05 +00:00
Sanjay Patel	169dae70a6	[x86] use more shift or LEA for select-of-constants (2nd try) The previous rev (r310208) failed to account for overflow when subtracting the constants to see if they're suitable for shift/lea. This version adds a check for that, and more tests were added in r310490. We can convert any select-of-constants to math ops: http://rise4fun.com/Alive/d7d For this patch, I'm enhancing an existing x86 transform that uses fake multiplies (they always become shl/lea) to avoid cmov or branching. The current code misses cases where we have a negative constant and a positive constant, so this is just trying to plug that hole. The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start with a select in IR, create a select DAG node, convert it into a sext, convert it back into a select, and then lower it to sext machine code. Some notes about the test diffs: 1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR. 2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. We could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on a different reg? That's a post-DAG problem though. 3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if that's a regression, but those would always be nearly equivalent. 4. pr22338.ll and sext-i1.ll - These tests have undef operands, so we don't actually care about these diffs. 5. sbb.ll - This shows a win for what is likely a common case: choose -1 or 0. 6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again. 7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops. Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass. Differential Revision: https://reviews.llvm.org/D35340 llvm-svn: 310717	2017-08-11 15:44:14 +00:00
Daniel Sanders	1fb1ce0c87	[globalisel][tablegen] Support zero-instruction emission. Summary: Support the case where an operand of a pattern is also the whole of the result pattern. In this case the original result and all its uses must be replaced by the operand. However, register class restrictions can require a COPY. This patch handles both cases by always emitting the copy and leaving it for the register allocator to optimize. Depends on D35833 Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Subscribers: javed.absar, kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D36084 llvm-svn: 310716	2017-08-11 15:40:32 +00:00
Simon Dardis	ae5b53e7cd	[mips] Lift the assertion on the types that can be used with MipsGPRel Post commit review of rL308619 highlighted the need for handling N64 with -fno-pic. Testing revealed a stale assert when generating a GP relative addressing mode. This patch removes that assert and adds the necessary patterns for MIPS64 to perform gp relative addressing with -fno-pic (and the implicit -mno-abicalls + -mgpopt). Reviewers: atanasyan, nitesh.jain Differential Revision: https://reviews.llvm.org/D36472 llvm-svn: 310713	2017-08-11 14:36:05 +00:00
Nirav Dave	0a48e5d506	Improve handling of insert_subvector of bitcast values Fix insert_subvector / extract_subvector merges of bitcast values. Reviewers: efriedma, craig.topper, RKSimon Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D34571 llvm-svn: 310711	2017-08-11 13:21:41 +00:00
Nirav Dave	d1b3f09faa	[X86][DAG] Switch X86 Target to post-legalized store merge Move store merge to happen after intrinsic lowering to allow lowered stores to be merged. Some regressions due in MergeConsecutiveStores to missing insert_subvector that are addressed in follow up patch. Reviewers: craig.topper, efriedma, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34559 llvm-svn: 310710	2017-08-11 13:21:35 +00:00
Simon Pilgrim	83cf3a29b5	[DAGCombiner] Remove shuffle support from simplifyShuffleMask rL310372 enabled simplifyShuffleMask to support undef shuffle mask inputs, but its causing hangs. Removing support until I can triage the problem llvm-svn: 310699	2017-08-11 08:37:00 +00:00
Mikael Holmen	8b10680922	[IfConversion] Maintain the CFG when predicating/merging blocks in IfConvert* Summary: This fixes PR32721 in IfConvertTriangle and possible similar problems in IfConvertSimple, IfConvertDiamond and IfConvertForkedDiamond. In PR32721 we had a triangle EBB \| \ \| \| \| TBB \| / FBB where FBB didn't have any successors at all since it ended with an unconditional return. Then TBB and FBB were merged into EBB, but EBB would still keep its successors, and the use of analyzeBranch and CorrectExtraCFGEdges wouldn't help to remove them since the return instruction is not analyzable (at least not on ARM). The edge updating code and branch probability updating code is now pushed into MergeBlocks() which allows us to share the same update logic between more callsites. This lets us remove several dependencies on analyzeBranch and completely eliminate RemoveExtraEdges. One thing that showed up with this patch was that IfConversion sometimes left a successor with 0% probability even if there was no branch or fallthrough to the successor. One such example from the test case ifcvt_bad_zero_prob_succ.mir. The indirect branch tBRIND can only jump to bb.1, but without the patch we got: bb.0: successors: %bb.1(0x80000000) bb.1: successors: %bb.1(0x80000000), %bb.2(0x00000000) tBRIND %r1, 1, %cpsr B %bb.1 bb.2: There is no way to jump from bb.1 to bb.2, but still there is a 0% edge from bb.1 to bb.2. With the patch applied we instead get the expected: bb.0: successors: %bb.1(0x80000000) bb.1: successors: %bb.1(0x80000000) tBRIND %r1, 1, %cpsr B %bb.1 Since bb.2 had no predecessor at all, it was removed. Several testcases had to be updated due to this since the removed successor made the "Branch Probability Basic Block Placement" pass sometimes place blocks in a different order. Finally added a couple of new test cases: * PR32721_ifcvt_triangle_unanalyzable.mir: Regression test for the original problem described in PR 32721. * ifcvt_triangleWoCvtToNextEdge.mir: Regression test for problem that caused a revert of my first attempt to solve PR 32721. * ifcvt_simple_bad_zero_prob_succ.mir: Test case showing the problem where a wrong successor with 0% probability was previously left. * ifcvt_[diamond\|forked_diamond\|simple]_unanalyzable.mir Very simple test cases for the simple and (forked) diamond cases involving unanalyzable branches that can be nice to have as a base if wanting to write more complicated tests. Reviewers: iteratee, MatzeB, grosser, kparzysz Reviewed By: kparzysz Subscribers: kbarton, davide, aemerson, nemanjai, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34099 llvm-svn: 310697	2017-08-11 06:57:08 +00:00
Nirav Dave	4d28c0ff4f	[DAG] Relax type restriction for store merge Summary: Allow stores of bitcastable types to be merged by peeking through BITCAST nodes and recasting stored values constant and vector extract nodes as necessary. Reviewers: jyknight, hfinkel, efriedma, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34569 llvm-svn: 310655	2017-08-10 19:52:45 +00:00
Taewook Oh	f5040b9685	Make .file directive to have basename only Summary: Currently LLVM puts directory along with the filename in .file directive, but this behavior doesn't match gcc. There's no clear description about which one is right (https://sourceware.org/binutils/docs/as/File.html#File), but one document (https://sourceware.org/gdb/current/onlinedocs/stabs/ELF-Linker-Relocation.html) suggests that STT_FILE symbol in elf file is expected to have basename only, which should have the same string as the .file directive according to (https://docs.oracle.com/cd/E26502_01/html/E28388/eoiyg.html). This also badly affects build systems that use hashing, as the directory info could be different from developer to developer even when they're working on the same file. Reviewers: pcc, mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36018 llvm-svn: 310642	2017-08-10 18:17:11 +00:00
Nirav Dave	926e2d39bf	[X86] Keep dependencies when constructing loads in combineStore Summary: Preserve chain dependencies between old and new loads constructed to prevent loads from reordering below later stores. Fixes PR34088. Reviewers: craig.topper, spatel, RKSimon, efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36528 llvm-svn: 310604	2017-08-10 15:12:32 +00:00
Guy Blank	136b543745	[SelectionDAG] Allow constant folding for implicitly truncating BUILD_VECTOR nodes. In FoldConstantArithmetic, handle BUILD_VECTOR nodes that do implicit truncation on the elements. This is similar to what is done in FoldConstantVectorArithmetic. Differential Revision: https://reviews.llvm.org/D36506 llvm-svn: 310593	2017-08-10 14:09:50 +00:00
Zoran Jovanovic	f4f2d084c6	[mips][microMIPS] Extending size reduction pass with XOR16 Author: milena.vujosevic.janicic Reviewers: sdardis The patch extends size reduction pass for MicroMIPS. XOR instruction is transformed into 16-bit instruction XOR16, if possible. Differential Revision: https://reviews.llvm.org/D34239 llvm-svn: 310579	2017-08-10 10:27:29 +00:00
Elad Cohen	22ba97a0a6	[SelectionDAG] When scalarizing vselect, don't assert on a legal cond operand. When scalarizing the result of a vselect, the legalizer currently expects to already have scalarized the operands. While this is true for the true/false operands (which have the same type as the result), it is not the case for the condition operand. On X86 AVX512, v1i1 is legal - this leads to operations such as '< N x type> vselect < N x i1> < N x type> < N x type>' where < N x type > is illegal to hit an assertion during the scalarization. The handling is similar to r205625. This also exposes the fact that (v1i1 extract_subvector) should be legal and selectable on AVX512 - We do this by custom lowering to vector_extract_elt. This still leaves us in some cases with redundant dag nodes which will be combined in a separate soon to come patch. This fixes pr33349. Differential revision: https://reviews.llvm.org/D36511 llvm-svn: 310552	2017-08-10 07:44:23 +00:00
Matthias Braun	a88587ce0c	ARM: Fix CMP_SWAP expansion Clean up after my misguided attempt in r304267 to "fix" CMP_SWAP returning an uninitialized status value. - I was always using tMOVi8 to zero the status register which cannot encode higher register numbers (and llvm would silently miscompile) - Nobody was ever looking at that status value outside the expansion. ARMDAGToDAGISel::SelectCMP_SWAP() the only place creating CMP_SWAP instructions was not mapping anything to it. (The cmpxchg status value from llvm IR is lowered to a manual comparison after the CMP_SWAP) So this: - Renames the register from "status" to "temp" to make it obvious that it isn't used outside the expansion. - Remove the zeroing status/temp register. - Keep the live-in list improvements from r304267 Fixes http://llvm.org/PR34056 llvm-svn: 310534	2017-08-09 22:22:05 +00:00
Krzysztof Parzyszek	1966fd79a7	[Hexagon] Ignore DBG_VALUEs when counting instructions in hexagon-early-if llvm-svn: 310524	2017-08-09 21:22:05 +00:00
Matt Arsenault	36cd1859f3	AMDGPU: Fix assert on n inline asm constraint llvm-svn: 310515	2017-08-09 20:09:35 +00:00
Guy Blank	7f60c991ae	[X86][AVX512] Choose correct registers in vpbroadcastb/w Fixes the vpbroadcastb/w instructions which use GPRs as source operands, to use the correct registers. The full GPR should be used, and not the subregister, as it happens before the patch. Fixes pr33795 Differential Revision: https://reviews.llvm.org/D36479 llvm-svn: 310498	2017-08-09 17:21:01 +00:00
Sanjay Patel	6f80d6b46f	[x86] add more tests for select-of-constants; NFC This is to help recommit a fixed version of r310208. As shown in PR34097, we could miscompile if subtraction of the constants overflowed. llvm-svn: 310490	2017-08-09 15:57:02 +00:00
Florian Hahn	d68bc7ae8d	[ARM] Emit error when ARM exec mode is not available. Summary: A similar error message has been removed from the ARMTargetMachineBase constructor in r306939. With this patch, we generate an error message for the example below, compiled with -mcpu=cortex-m0, which does not have ARM execution mode. __attribute__((target("arm"))) int foo(int a, int b) { return a + b % a; } __attribute__((target("thumb"))) int bar(int a, int b) { return a + b % a; } By adding this error message to ARMBaseTargetMachine::getSubtargetImpl, we can deal with functions that set -thumb-mode in target-features. At the moment it seems like Clang does not have access to target-feature specific information, so adding the error message to the frontend will be harder. Reviewers: echristo, richard.barton.arm, t.p.northover, rengolin, efriedma Reviewed By: echristo, efriedma Subscribers: efriedma, aemerson, javed.absar, kristof.beyls Differential Revision: https://reviews.llvm.org/D35627 llvm-svn: 310486	2017-08-09 15:39:10 +00:00
Florian Hahn	fc4b3951e9	[ARM] Remove FeatureNoARM implies ModeThumb. Summary: By removing FeatureNoARM implies ModeThumb, we can detect cases where a function's target-features contain -thumb-mode (enables ARM codegen for the function), but the architecture does not support ARM mode. Previously, the implication caused the FeatureNoARM bit to be cleared for functions with -thumb-mode, making the assertion in ARMSubtarget::ARMSubtarget [1] pointless for such functions. This assertion is the only guard against generating ARM code for architectures without ARM codegen support. Is there a place where we could easily generate error messages for the user? At the moment, we would generate ARM code for Thumb-only architectures. X86 has the same behavior as ARM, as in it only has an assertion and no error message, but I think for ARM an error message would be helpful. What do you think? For the example below, `llc -mtriple=armv7m-eabi test.ll -o -` will generate ARM assembler (or fail with an assertion error with this patch). Note that if we run the resulting assembler through llvm-mc, we get an appropriate error message, but not when codegen is handled through clang. ``` define void @bar() #0 { entry: ret void } attributes #0 = { "target-features"="-thumb-mode" } ``` [1] `c1f7b54cef/lib/Target/ARM/ARMSubtarget.cpp (L147)` Reviewers: t.p.northover, rengolin, peter.smith, aadg, silviu.baranga, richard.barton.arm, echristo Reviewed By: rengolin, echristo Subscribers: efriedma, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35569 llvm-svn: 310476	2017-08-09 13:53:28 +00:00
Jonas Paulsson	6228aeda65	[LSR / TTI / SystemZ] Eliminate TargetTransformInfo::isFoldableMemAccess() isLegalAddressingMode() has recently gained the extra optional Instruction* parameter, and therefore it can now do the job that previously only isFoldableMemAccess() could do. The SystemZ implementation of isLegalAddressingMode() has gained the functionality of checking for offsets, which used to be done with isFoldableMemAccess(). The isFoldableMemAccess() hook has been removed everywhere. Review: Quentin Colombet, Ulrich Weigand https://reviews.llvm.org/D35933 llvm-svn: 310463	2017-08-09 11:28:01 +00:00
Serguei Katkov	6ea2e81cf6	[ImplicitNullCheck] Fix the bug when dependent instruction accesses memory It is possible that dependent instruction may access memory. In this case we must reject optimization because the memory change will be visible in null handler basic block. So we will execute an instruction which we must not execute if check fails. Reviewers: sanjoy, reames Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36392 llvm-svn: 310443	2017-08-09 05:17:02 +00:00
Jessica Paquette	d36945bf3a	[MachineOutliner] Ensure AArch64 outliner doesn't mess with W30 or LR Before, the outliner would mark all instructions that read from/modify LR as illegal. This doesn't handle W30, which overlaps with LR. This shouldn't be outlined. This commit fixes that by making modifiesRegister() and readsRegister() look at W30 + take in a TRI argument. This makes sure that modifiesRegister() and readsRegister() won't outline either of W30 and LR. https://reviews.llvm.org/D36435 llvm-svn: 310422	2017-08-08 21:51:26 +00:00
Connor Abbott	249fc7bd2a	[AMDGPU] Add llvm.amdgpu.update.dpp intrinsic Summary: Now that we've made all the necessary backend changes, we can add a new intrinsic which exposes the new capabilities to IR producers. Since llvm.amdgpu.update.dpp is a strict superset of llvm.amdgpu.mov.dpp, we should deprecate the former. We also add tests for all the functionality that was added in previous changes, now that we can access it via an IR construct. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34718 llvm-svn: 310399	2017-08-08 18:52:22 +00:00
Simon Pilgrim	91b7b991d4	[DAGCombiner] simplifyShuffleMask - handle UNDEF inputs from shuffles as well as BUILD_VECTOR Minor extension to D36393 llvm-svn: 310372	2017-08-08 16:10:33 +00:00
Nemanja Ivanovic	979dcb6f09	[PowerPC] Don't crash on larger splats achieved through 1-byte splats We've implemented a 1-byte splat using XXSPLTISB on P9. However, LLVM will produce a 1-byte splat even for wider element BUILD_VECTOR nodes. This patch prevents crashing in that situation. Differential Revision: https://reviews.llvm.org/D35650 llvm-svn: 310358	2017-08-08 13:52:45 +00:00
Amjad Aboud	6fa6813aec	[X86] Improved X86::CMOV to Branch heuristic. Resolved PR33954. This patch contains two more constraints that aim to reduce the noise cases where we convert CMOV into branch for small gain, and end up spending more cycles due to overhead. Differential Revision: https://reviews.llvm.org/D36081 llvm-svn: 310352	2017-08-08 12:17:56 +00:00
Nemanja Ivanovic	809fbfa6a1	[PowerPC] Eliminate compares - add i32 sext/zext handling for SETLE/SETGE Adds handling for SETLE/SETGE comparisons on i32 values. Furthermore, it adds the handling for the special case where RHS == 0. Differential Revision: https://reviews.llvm.org/D34048 llvm-svn: 310346	2017-08-08 11:20:44 +00:00
Simon Pilgrim	ef44228acb	[DAGCombiner] Simplify shuffle mask index if the referenced input element is UNDEF Fixes one of the cases in PR34041. Differential Revision: https://reviews.llvm.org/D36393 llvm-svn: 310344	2017-08-08 11:03:30 +00:00
Daniel Sanders	0554004698	[globalisel][tablegen] Add support for importing 'imm' operands. Summary: This patch enables the import of rules containing 'imm' operands that do not constrain the acceptable values using predicates. Support for ImmLeaf will arrive in a later patch. Depends on D35681 Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Reviewed By: rovka Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35833 llvm-svn: 310343	2017-08-08 10:44:31 +00:00
Simon Pilgrim	8c1167df5c	[X86][AVX] Added test for broadcast shuffle from binary sources with undefs (D36393) llvm-svn: 310317	2017-08-07 22:20:06 +00:00
Evgeny Stupachenko	c675290680	Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720). The root cause of reverting was fixed - PR33514. Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 310289	2017-08-07 19:56:34 +00:00
Connor Abbott	79f3ade51a	[AMDGPU] Add pseudo "old" source to all DPP instructions Summary: All instructions with the DPP modifier may not write to certain lanes of the output if bound_ctrl=1 is set or any bits in bank_mask or row_mask aren't set, so the destination register may be both defined and modified. The right way to handle this is to add a constraint that the destination register is the same as one of the inputs. We could tie the destination to the first source, but that would be too restrictive for some use-cases where we want the destination to be some other value before the instruction executes. Instead, add a fake "old" source and tie it to the destination. Effectively, the "old" source defines what value unwritten lanes will get. We'll expose this functionality to users with a new intrinsic later. Also, we want to use DPP instructions for computing derivatives, which means we need to set WQM for them. We also need to enable the entire wavefront when using DPP intrinsics to implement nonuniform subgroup reductions, since otherwise we'll get incorrect results in some cases. To accommodate this, add a new operand to all DPP instructions which will be interpreted by the SI WQM pass. This will be exposed with a new intrinsic later. We'll also add support for Whole Wavefront Mode later. I also fixed llvm.amdgcn.mov.dpp to overwrite the source and fixed up the test. However, I could also keep the old behavior (where lanes that aren't written are undefined) if people want it. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34716 llvm-svn: 310283	2017-08-07 19:10:56 +00:00
Matt Arsenault	36b4b0bed7	AMDGPU: Remove -mcpu=SI Leftover from before amdgcn/r600 split. llvm-svn: 310277	2017-08-07 18:30:35 +00:00
Simon Pilgrim	0242cead2c	[X86][AVX] Add full test coverage of subvector_broadcasts from registers X86SubVBroadcast is for memory subvector broadcasts, but we must test that it handles all cases without the load as well just in case. This was noticed while I was triaging the test cases from PR34041. llvm-svn: 310268	2017-08-07 16:49:09 +00:00
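
The select-of-constants commit in the table above (r310717) states that any select-of-constants can be converted to math ops, and that the earlier revision failed to account for overflow when subtracting the constants. The sketch below illustrates that arithmetic on 32-bit values. It is not LLVM's code: the function names, the 32-bit width, and the particular overflow check are assumptions made for illustration only.

```python
# Illustrative sketch only; names and the 32-bit width are assumptions,
# not taken from the LLVM sources.
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def select_as_math(cond, c_true, c_false):
    """Compute `cond ? c_true : c_false` as c_false + cond * (c_true - c_false),
    with 32-bit wraparound arithmetic."""
    diff = (c_true - c_false) & MASK32
    return to_signed32(c_false + int(bool(cond)) * diff)


def diff_fits_signed32(c_true, c_false):
    """Return True if c_true - c_false is representable as a signed 32-bit value.
    A transform that inspects the difference (for example to decide whether it
    is a shift/LEA-friendly multiplier) would need a check of this kind first."""
    d = c_true - c_false
    return -(1 << 31) <= d <= (1 << 31) - 1


if __name__ == "__main__":
    # "Choose -1 or 1", the case mentioned for memcmp.ll in the commit message.
    print(select_as_math(True, -1, 1), select_as_math(False, -1, 1))
    # The difference of these two constants overflows a signed 32-bit value.
    print(diff_fits_signed32(2**31 - 1, -(2**31)))
```

With wraparound arithmetic the select identity itself still holds when the difference overflows; what breaks is any decision made from the signed value of the difference, which is why a range check like the hypothetical `diff_fits_signed32` is shown separately.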


21168 Commits