llvm-project

Commit Graph

Author	SHA1	Message	Date
Zvi Rackover	48cdde0e59	[DAGCombine] Bail out if can't create a vector with at least two elements Summary: Fixes pr32278 Reviewers: igorb, craig.topper, RKSimon, spatel, hfinkel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30978 llvm-svn: 297878	2017-03-15 19:48:36 +00:00
Ahmed Bougacha	2fb8030748	[GlobalISel] Avoid translating synthetic constants to new G_CONSTANTS. Currently, we create a G_CONSTANT for every "synthetic" integer constant operand (for instance, for the G_GEP offset). Instead, share the G_CONSTANTs we might have created by going through the ValueToVReg machinery. When we're emitting synthetic constants, we do need to get Constants from the context. One could argue that we shouldn't modify the context at all (for instance, this means that we're going to use a tad more memory if the constant wasn't used elsewhere), but constants are mostly harmless. We currently do this for extractvalue and all. For constant fcmp, this does mean we'll emit an extra COPY, which is not necessarily more optimal than an extra materialized constant. But that preserves the current intended design of uniqued G_CONSTANTs, and the rematerialization problem exists elsewhere and should be resolved with a single coherent solution. llvm-svn: 297875	2017-03-15 19:21:11 +00:00
Ahmed Bougacha	62cd73d989	[GlobalISel][AArch64] Select ADDXri. We're now able to select ADDWri thanks to the new complex pattern support. Extend that to ADDXri. llvm-svn: 297874	2017-03-15 19:20:59 +00:00
Matt Arsenault	86e02ce2dc	AMDGPU: Fix unnecessary ands when packing f16 vectors computeKnownBits didn't handle fp_to_fp16 to report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0. llvm-svn: 297873	2017-03-15 19:04:26 +00:00
Tim Northover	0d98b03b9f	ARM: avoid clobbering register in v6 jump-table expansion. If we got unlucky with register allocation and actual constpool placement, we could end up producing a tTBB_JT with an index that's already been clobbered. Technically, we might be able to fix this situation up with a MOV, but I think the constant islands pass is complex enough without having to deal with more weird edge-cases. llvm-svn: 297871	2017-03-15 18:38:13 +00:00
Ahmed Bougacha	07f247b6c2	[GlobalISel] Insert translated switch icmp blocks after switch parent. Now that we preserve the IR layout, we would end up with all the newly synthesized switch comparison blocks at the end of the function. Instead, use a hopefully more reasonable layout, with the comparison blocks immediately following the switch comparison blocks. llvm-svn: 297869	2017-03-15 18:22:37 +00:00
Ahmed Bougacha	a61c214f51	[GlobalISel] Preserve IR block layout. It makes the output function layout more predictable; the layout has an effect on performance, we don't want it to be at the mercy of the translator's visitation order and such. The predictable output is also easier to digest. getOrCreateBB isn't appropriately named anymore, as it never needs to create anything. Rename it and extract the MBB creation logic out of it. A couple tests were sensitive to the order. Update them. llvm-svn: 297868	2017-03-15 18:22:33 +00:00
Ahmed Bougacha	1a6deeefe0	[GlobalISel][AArch64] Add back constant select tests. NFC. More of r297856. llvm-svn: 297859	2017-03-15 16:51:41 +00:00
Ahmed Bougacha	d691cf731c	[GlobalISel][AArch64] Use appropriate test function names. NFC. These FP tests are on FPR, not GPR. Don't lie in the name. llvm-svn: 297857	2017-03-15 16:29:40 +00:00
Ahmed Bougacha	170778f0db	[GlobalISel][AArch64] Split out select tests. NFC. The test has grown enough to be annoying to navigate. While there, remove unnecessary RUNs, and clean up a couple comments. llvm-svn: 297856	2017-03-15 16:29:37 +00:00
Peter Collingbourne	d44a01aae6	CodeGen: Use the source filename as the argument to .file, rather than the module ID. Using the module ID here is wrong for a couple of reasons: 1) The module ID is not persisted, so we can end up with different object file contents given the same input file (for example if the same file is accessed via different paths). 2) With ThinLTO the module ID field may contain the path to a bitcode file, which is incorrect, as the .file argument is supposed to contain the path to a source file. Differential Revision: https://reviews.llvm.org/D30584 llvm-svn: 297853	2017-03-15 16:24:52 +00:00
Simon Pilgrim	018eedd9a5	[SelectionDAG] Support BUILD_VECTOR implicit truncation in SelectionDAG::ComputeNumSignBits (PR32273) llvm-svn: 297852	2017-03-15 16:22:24 +00:00
Nemanja Ivanovic	ffcf0fb1cc	[PowerPC][Altivec] Add mfvrd and mffprd extended mnemonic mfvrd and mffprd are both aliases of mfvsrd. This patch enables correct parsing of the aliases, but we still emit a mfvsrd. Committing on behalf of brunoalr (Bruno Rosa). Differential Revision: https://reviews.llvm.org/D29177 llvm-svn: 297849	2017-03-15 16:04:53 +00:00
Simon Pilgrim	a5f332edd1	[SelectionDAG][AArch64] Add test case showing incorrect SelectionDAG::ComputeNumSignBits BUILD_VECTOR handling Reduced from a mixture of PR32273 and David Green's test cases showing SelectionDAG::ComputeNumSignBits not correctly handling BUILD_VECTOR implicit truncation of inputs. llvm-svn: 297847	2017-03-15 15:40:34 +00:00
Artyom Skrobov	e72e1ba434	Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648" This reverts r297820 which apparently fails on A15 hosts. llvm-svn: 297842	2017-03-15 14:50:43 +00:00
Eric Liu	8f49635500	Add 'REQUIRES: asserts' to pr32278.ll introduced in r297822 llvm-svn: 297835	2017-03-15 13:37:20 +00:00
Simon Pilgrim	493f4462bf	[X86][SSE] Fixed shuffle MOVSS/MOVSD combining of all zeroable inputs Turns out it can happen, so the assertion was too harsh Found during fuzz testing llvm-svn: 297833	2017-03-15 13:16:46 +00:00
Petar Jovanovic	b71386a4a4	[Mips] Add support to match more patterns for DEXT and CINS This patch adds support for recognizing more patterns to match to DEXT and CINS instructions. It finds cases where multiple instructions could be replaced with a single DEXT or CINS instruction. For example, for the following: define i64 @dext_and32(i64 zeroext %a) { entry: %and = and i64 %a, 4294967295 ret i64 %and } instead of generating: 0000000000000088 <dext_and32>: 88: 64010001 daddiu at,zero,1 8c: 0001083c dsll32 at,at,0x0 90: 6421ffff daddiu at,at,-1 94: 03e00008 jr ra 98: 00811024 and v0,a0,at 9c: 00000000 nop the following gets generated: 0000000000000068 <dext_and32>: 68: 03e00008 jr ra 6c: 7c82f803 dext v0,a0,0x0,0x20 Cases that are covered: DEXT: 1. and $src, mask where mask > 0xffff 2. zext $src zero extend from i32 to i64 CINS: 1. and (shl $src, pos), mask 2. shl (and $src, mask), pos 3. zext (shl $src, pos) zero extend from i32 to i64 Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D30464 llvm-svn: 297832	2017-03-15 13:10:08 +00:00
Zvi Rackover	4aacd5d3c4	Fix malformed XFAIL in previous commit llvm-svn: 297823	2017-03-15 11:44:14 +00:00
Zvi Rackover	81f7b88910	[DAGCombine] Add reproducer for pr32278 llvm-svn: 297822	2017-03-15 11:34:51 +00:00
Artyom Skrobov	3fa5fd1dd2	[Thumb1] Fix the bug when adding/subtracting -2147483648 Differential Revision: https://reviews.llvm.org/D30829 llvm-svn: 297820	2017-03-15 10:19:16 +00:00
Sam Parker	654cb8263a	[ARM] Enable SMLAL[B|T] isel Enable the selection of the 64-bit signed multiply accumulate instructions which operate on 16-bit operands. These are enabled for ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb architectures. Differential Revision: https://reviews.llvm.org/D30044 llvm-svn: 297809	2017-03-15 08:27:11 +00:00
Taewook Oh	fb1833efeb	[BranchFolding] Merge debug locations from common tail instead of removing Summary: D25742 improved the precision of debug locations for PGO by removing debug locations from common tail when tail-merging. However, if identical instructions that are merged into a common tail have the same debug locations, there's no need to remove them. This patch creates a merged debug location of identical instructions across SameTails and assigns it to the instruction in the common tail, so that the debug locations are maintained if they are the same across identical instructions. Reviewers: aprantl, probinson, MatzeB, rob.lougher Reviewed By: aprantl Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D30226 llvm-svn: 297805	2017-03-15 05:44:59 +00:00
Peter Collingbourne	7f6e2c97b8	Ensure that prefix data is preserved with subsections-via-symbols On MachO platforms that use subsections-via-symbols dead code stripping will drop prefix data. Unfortunately there is no great way to convey the relationship between a function and its prefix data to the linker. We are forced to use a bit of a hack: we give the prefix data its own symbol, and mark the actual function entry an .alt_entry. Patch by Moritz Angermann! Differential Revision: https://reviews.llvm.org/D30770 llvm-svn: 297804	2017-03-15 04:18:16 +00:00
Volkan Keles	4862c63594	[GlobalISel] IRTranslator: Return the scalar for <1 x Ty> constant vectors Summary: <1 x Ty> is not a legal vector type in LLT, we shouldn’t build G_MERGE_VALUES instruction for them. Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, ab, javed.absar Reviewed By: qcolombet Subscribers: dberris, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D30948 llvm-svn: 297792	2017-03-14 23:45:06 +00:00
Daniel Sanders	8a4bae9993	[globalisel][tblgen] Add support for ComplexPatterns Summary: Adds a new kind of MachineOperand: MO_Placeholder. This operand must not appear in the MIR and only exists as a way of creating an 'uninitialized' operand until a matcher function overwrites it. Depends on D30046, D29712 Reviewers: t.p.northover, ab, rovka, aditya_nandakumar, javed.absar, qcolombet Reviewed By: qcolombet Subscribers: dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D30089 llvm-svn: 297782	2017-03-14 21:32:08 +00:00
Simon Pilgrim	cf2da96c82	[SelectionDAG] Add a signed integer absolute ISD node Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom. Differential Revision: https://reviews.llvm.org/D29639 llvm-svn: 297780	2017-03-14 21:26:58 +00:00
Sanjay Patel	8dd99dce6c	[DAG] vector div/rem with any zero element in divisor is undef This is the backend counterpart to: https://reviews.llvm.org/rL297390 https://reviews.llvm.org/rL297409 and follow-up to: https://reviews.llvm.org/rL297384 It surprised me that we need to duplicate the check in FoldConstantArithmetic and FoldConstantVectorArithmetic, but one or the other doesn't catch all of the test cases. There is an existing code comment about merging those someday. Differential Revision: https://reviews.llvm.org/D30826 llvm-svn: 297762	2017-03-14 18:06:28 +00:00
Simon Pilgrim	3a196cbc4f	[X86] Add extra BITREVERSE tests Test on 32-bit and 64-bit targets. Add bitreverse tests for i64, i32 and i16 llvm-svn: 297741	2017-03-14 14:03:16 +00:00
Simon Pilgrim	e1a72a936f	[X86][MMX] Update FIXME comment. NFCI. llvm-svn: 297736	2017-03-14 12:13:41 +00:00
Sam Parker	916b1ba617	[ARM] Move SMULW[B|T] isel to DAG Combine Create nodes for smulwb and smulwt and move their selection from DAGToDAG to DAG combine. smlawb and smlawt can then be selected using tablegen. Added some helper functions to detect shift patterns as well as a wrapper around SimplifyDemandedBits. Added a couple of extra tests. Differential Revision: https://reviews.llvm.org/D30708 llvm-svn: 297716	2017-03-14 09:13:22 +00:00
Oren Ben Simhon	fe34c5e429	Disable Callee Saved Registers Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller. Some CCs use an additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list. The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee. The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee. Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span). The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments. The patch will also help implement the future no_caller_saved_registers attribute intended for interrupt handler CC. Differential Revision: https://reviews.llvm.org/D28566 llvm-svn: 297715	2017-03-14 09:09:26 +00:00
Craig Topper	7a5ee1c5ed	[AVX-512] Use iPTR instead of i64 in patterns for extract_subvector/insert_subvector index. llvm-svn: 297707	2017-03-14 06:40:04 +00:00
Craig Topper	b0a82eaea6	[AVX-512] Add test cases that demonstrate some patterns that don't work correctly in 32-bit mode. NFC llvm-svn: 297706	2017-03-14 06:40:00 +00:00
Nirav Dave	4fc8401abf	Recommitting Craig Topper's patch now that r296476 has been recommitted. When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node. This recovers a test case that was severely broken by r296476, by making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency. llvm-svn: 297698	2017-03-14 01:42:23 +00:00
Nirav Dave	54e22f33d9	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommitting with compiler time improvements Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 297695	2017-03-14 00:34:14 +00:00
Artyom Skrobov	bf19d4bc29	[Thumb1] combine ADDC/SUBC with a negative immediate Summary: This simple optimization has been split out of https://reviews.llvm.org/D30400 Reviewers: efriedma, jmolloy Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30829 llvm-svn: 297682	2017-03-13 22:36:14 +00:00
Craig Topper	784f241b59	[AVX-512] Fix another case where we are copying from a mask register using AH/BH/CH/DH with fastisel. Fixes PR32256. Still planning to do an audit for other possible cases. llvm-svn: 297678	2017-03-13 21:58:54 +00:00
Volkan Keles	38a91a0de6	GlobalISel: Translate ConstantDataVector Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, javed.absar, ab Reviewed By: qcolombet, dsanders, ab Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30216 llvm-svn: 297670	2017-03-13 21:36:19 +00:00
Tim Northover	55e6f10d69	Revert "GlobalISel: move vector extract/insert inside generic opcode region." I was writing against an earlier branch and Volkan had already fixed this. llvm-svn: 297668	2017-03-13 21:25:10 +00:00
Simon Pilgrim	9df7d08cb2	[X86][MMX] Fix folding of shift value loads to cover whole 64-bits rL230225 made the assumption that only the lower 32-bits of an MMX register load is used as a shift value, when in fact the whole 64-bits are reloaded and treated as a i64 to determine the shift value. This patch reverts rL230225 to ensure that the whole 64-bits of memory are folded and ensures that the upper 32-bit are zero'd for cases where the shift value has come from a scalar source. Found during fuzz testing. Differential Revision: https://reviews.llvm.org/D30833 llvm-svn: 297667	2017-03-13 21:23:29 +00:00
Tim Northover	0f1d32d557	GlobalISel: move vector extract/insert inside generic opcode region. Otherwise they won't be legalized or selected, causing instruction selection to fail horribly. llvm-svn: 297666	2017-03-13 21:18:59 +00:00
Andrew Kaylor	a11d020699	Revert r295004 (Add MXCSR) due to errors reported by MachineVerifier I am leaving the code in clang which filters mxcsr from the clobber list because that is still technically correct and will be useful again when the MXCSR register is reintroduced. llvm-svn: 297664	2017-03-13 20:35:10 +00:00
Matt Arsenault	971c85ebb4	AMDGPU: Treat 0 as private null pointer in addrspacecast lowering llvm-svn: 297658	2017-03-13 19:47:31 +00:00
Jessica Paquette	c984e21394	[Outliner] Add tail call support This commit adds tail call support to the MachineOutliner pass. This allows the outliner to insert jumps rather than calls in areas where tail calling is possible. Outlined tail calls include the return or terminator of the basic block being outlined from. Tail call support allows the outliner to take returns and terminators into consideration while finding candidates to outline. It also allows the outliner to save more instructions. For example, in the X86-64 outliner, a tail called outlined function saves one instruction since no return has to be inserted. llvm-svn: 297653	2017-03-13 18:39:33 +00:00
Craig Topper	616641632e	[X86] Lower AVX2 gather intrinsics similar to AVX-512. Apply the same input source optimizations to break execution dependencies. For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2. llvm-svn: 297652	2017-03-13 18:34:46 +00:00
Craig Topper	eb7ea28bdd	[AVX-512] If gather mask is all ones, force the input to a zero vector. We were already forcing undef inputs to become a zero vector, this now catches an all ones mask too. Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today. llvm-svn: 297651	2017-03-13 18:17:46 +00:00
Diana Picus	94db2e288b	[ARM] GlobalISel: Support SP in regbankselect We used to hit an unreachable in getRegBankFromRegClass when dealing with the stack pointer. This commit adds support for the GPRsp reg class. llvm-svn: 297621	2017-03-13 14:28:34 +00:00
Craig Topper	7746565754	[AVX-512] Add EVEX2VEX test cases for the cvt instructions fixed in r297599 and r297600. llvm-svn: 297603	2017-03-13 05:47:56 +00:00
Craig Topper	bb4089d260	Revert "[AVX-512] EVEX2VEX, don't reject intrinsic instructions when both have a memory operand. We should just continue to check other operands instead." This reverts r297596. There were other issues that were making this not work that have been fixed now. Reverting this results in a more accurate table. llvm-svn: 297602	2017-03-13 05:34:03 +00:00
Craig Topper	166085f0f2	[AVX-512] EVEX2VEX, don't reject intrinsic instructions when both have a memory operand. We should just continue to check other operands instead. This exposed that we have several intrinsic instructions that have identical TSFlags to other instructions. We should merge their patterns and kill off the duplicates. I'll fix that in a follow up patch. llvm-svn: 297596	2017-03-13 00:36:49 +00:00
Craig Topper	7d56c8315b	[AVX-512] Fix the valid immediates for the scatter/gather prefetch intrinsics. The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly. llvm-svn: 297591	2017-03-12 22:29:12 +00:00
Sanjay Patel	f06b963a2b	[x86] don't blindly transform SETB into SBB I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. This happens because we were transforming any 'setb' - even when we only wanted a single-bit result. This patch moves those transforms under visitAdd/visitSub, so we're only creating sbb/adc when it is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that existing behavior in this patch. Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files where this transform still fires. The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D30611 llvm-svn: 297586	2017-03-12 18:28:48 +00:00
Azharuddin Mohammed	473b75c3d5	Remove CRC32 instructions from AArch64InstrInfo::hasShiftedReg Summary: A53 scheduler causes an assertion failure on all CRC instructions: include/llvm/CodeGen/MachineInstr.h:280: const llvm::MachineOperand &llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i < getNumOperands() && "getOperand() out of range!"' failed. The case statements corresponding to CRC instructions are incorrect and should be removed. Also adding a testcase while on this. Reviewers: t.p.northover, javed.absar, apazos, rengolin Reviewed By: rengolin Subscribers: evandro, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30274 llvm-svn: 297582	2017-03-12 14:02:32 +00:00
Igor Breger	293dfb9768	[X86] Add vector zext tests. llvm-svn: 297581	2017-03-12 13:20:10 +00:00
Craig Topper	58647b16e5	[AVX-512] Fix a bad use of a high GR8 register after copying from a mask register during fast isel. This ends up extracting from bits 15:8 instead of the lower bits of the mask. I'm pretty sure there are more problems lurking here. But I think this fixes PR32241. I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again. llvm-svn: 297574	2017-03-12 03:37:37 +00:00
Craig Topper	e726cd0cd1	[AVX-512] Add test case for PR32241. Fix coming in another commit. llvm-svn: 297573	2017-03-12 03:37:34 +00:00
Simon Pilgrim	18debfa5b4	[X86][SSE] Improve extraction of elements from v16i8 (pre-SSE41) Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte. This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte. Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate? Differential Revision: https://reviews.llvm.org/D29841 llvm-svn: 297568	2017-03-11 20:42:31 +00:00
Craig Topper	d511c2ce04	[X86] Add avx2 gather test cases that show a failure to remove zeroing of the source when the mask is all ones. llvm-svn: 297564	2017-03-11 18:26:00 +00:00
Matt Arsenault	dd905b0e9b	AMDGPU: Remove packf16 intrinsic llvm-svn: 297557	2017-03-11 05:51:16 +00:00
Matt Arsenault	3cb9ff8863	AMDGPU: Keep track of modifiers when converting v_mac to v_mad Since v_max_f32_e64/v_max_f16_e64 can be folded if the target instruction supports the clamp bit, we also need to maintain modifiers when converting v_mac to v_mad. This fixes a rendering issue with Dirt Rally because a v_mac instruction with the clamp bit set was converted to a v_mad but that bit was lost during the conversion. Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit") Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 297556	2017-03-11 05:40:40 +00:00
Stanislav Mekhanoshin	79da2a7698	[AMDGPU] Remove getBidirectionalReasonRank This method inverts the Reason field of a scheduling candidate. It does right comparison between RegCritical and RegExcess, but everything else is broken. In fact it can prefer less strong reason such as Weak over RegCritical because Weak > -RegCritical. The CandReason enum is properly sorted, so just remove artificial ranking. Differential Revision: https://reviews.llvm.org/D30557 llvm-svn: 297536	2017-03-11 00:29:27 +00:00
Krzysztof Parzyszek	0e7b1f83b7	[RDF] Remove the map of reaching defs from copy propagation Use Liveness::getNearestAliasedRef to find the reaching def instead. llvm-svn: 297526	2017-03-10 22:44:24 +00:00
Simon Pilgrim	128a10a41d	[X86][SSE] Fix load folding for (V)CVTDQ2PD This only requires a 64-bit memory source, not the whole 128-bits. But the 128-bit case is still supported via X86InstrInfo::foldMemoryOperandImpl llvm-svn: 297523	2017-03-10 22:35:07 +00:00
Simon Pilgrim	9956661456	[X86][RTM] Regenerate RTM intrinsic tests for 32/64-bit targets. llvm-svn: 297518	2017-03-10 21:55:24 +00:00
Volkan Keles	970fee4bfe	GlobalISel: Translate ConstantAggregateZero vectors Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, javed.absar Reviewed By: qcolombet Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30259 llvm-svn: 297509	2017-03-10 21:23:13 +00:00
Volkan Keles	04cb08cc83	[GlobalISel] Translate insertelement and extractelement Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, javed.absar Reviewed By: qcolombet Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30761 llvm-svn: 297495	2017-03-10 19:08:28 +00:00
Simon Pilgrim	7dedbfa89d	[SelectionDAG] Add support for BUILD_VECTOR to ComputeNumSignBits llvm-svn: 297492	2017-03-10 18:36:46 +00:00
Simon Pilgrim	e54cd65399	[X86][SSE] Added tests showing missed truncations for sitofp conversion SelectionDAG::ComputeNumSignBits is poor at build_vector handling, meaning that we can't see that all the vXi64 sources are in fact sign extended i32 or smaller. llvm-svn: 297486	2017-03-10 18:01:53 +00:00
Amaury Sechet	62e0759d56	[SelectionDAG] Make SelectionDAG aware of the known bits in USUBO and SSUBO and SUBC. Summary: Depends on D30379 This improves the state of things for the sub class of operation. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30436 llvm-svn: 297482	2017-03-10 17:26:44 +00:00
Simon Pilgrim	ed655f09db	[X86][MMX] Add tests showing missed opportunities to use MMX sitofp conversions If we are transferring MMX registers to XMM for conversion we could use the MMX equivalents (CVTPI2PD + CVTPI2PS) without affecting rounding/exceptions etc. llvm-svn: 297481	2017-03-10 17:23:55 +00:00
Amaury Sechet	69fa16c810	[SelectionDAG] Make SelectionDAG aware of the known bits in UADDO and SADDO. Summary: As per title. This is extracted from D29872 and I threw SADDO in. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30379 llvm-svn: 297479	2017-03-10 17:06:52 +00:00
Simon Pilgrim	c6b55729a5	[X86][MMX] Add tests showing missed opportunities to use MMX fptosi conversions If we are transferring XMM conversion results to MMX registers we could use the MMX equivalents (CVTPD2PI/CVTTPD2PI + CVTPS2PI/CVTTPS2PI) without affecting rounding/exceptions etc. llvm-svn: 297476	2017-03-10 16:59:43 +00:00
Simon Pilgrim	b8856148d9	[X86][MMX] Updated bad stack spill shift value test to actually show the problem Cleaning up the ir had stopped showing the issue. llvm-svn: 297475	2017-03-10 16:18:50 +00:00
Simon Pilgrim	67d25b298a	[X86][MMX] Regenerate mmx bitcast tests llvm-svn: 297474	2017-03-10 16:07:39 +00:00
Simon Pilgrim	caa9172ba7	[X86][MMX] Add test showing bad stack spill of shift value i32 is spilled to stack but 64-bit mmx is reloaded - leaving garbage in the other half of the register llvm-svn: 297471	2017-03-10 15:53:41 +00:00
Simon Pilgrim	63ad95aee6	[X86][MMX] Regenerate mmx load folding tests llvm-svn: 297470	2017-03-10 15:41:05 +00:00
Simon Dardis	7090d145e8	[mips][msa] Accept more values for constant splats This patch teaches the MIPS backend to accept more values for constant splats. Previously, only 10 bit signed immediates or values that could be loaded using an ldi.[bhwd] instruction would be accepted. This patch relaxes that constraint so that any constant value that can be splatted is accepted. As a result, the constant pool is used less for vector operations, and the suite of bit manipulation instructions b(clr|set|neg)i can now be used with the full range of their immediate operand. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D30640 llvm-svn: 297457	2017-03-10 13:27:14 +00:00
Artyom Skrobov	0c93ceb5d8	For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes, same as already done for ARM and Thumb2. Reviewers: jmolloy, rogfer01, efriedma Subscribers: aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30400 llvm-svn: 297443	2017-03-10 07:40:27 +00:00
Sanjay Patel	65e2e6805a	[x86] add tests for vec div/rem with 0 element in divisor; NFC llvm-svn: 297433	2017-03-10 00:55:29 +00:00
Ahmed Bougacha	4ec6d5abed	[GlobalISel] Fallback when failing to translate invoke. We unintentionally stopped falling back in r293670. While there, change an unusual construct. llvm-svn: 297425	2017-03-10 00:25:35 +00:00
Tim Northover	aa995c98f4	GlobalISel: support trivial inlineasm calls. They're used for nefarious purposes by ObjC. llvm-svn: 297422	2017-03-09 23:36:26 +00:00
Amaury Sechet	e7d102cf02	[DAGCombiner] Do various combine on uaddo. Summary: This essentially does the same transform as for ADC. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30417 llvm-svn: 297416	2017-03-09 22:47:00 +00:00
Krzysztof Parzyszek	544210304f	[Hexagon] Fixes to the bitsplit generation - Fix the insertion point, which occasionally could have been incorrect. - Avoid creating multiple bitsplits with the same operands, if an old one could be reused. llvm-svn: 297414	2017-03-09 22:02:14 +00:00
Tim Northover	d1e951e5eb	GlobalISel: inform FrameLowering when we emit a function call. Amongst other things (I expect) this is necessary to ensure decent backtraces when an "unreachable" is involved. llvm-svn: 297413	2017-03-09 22:00:39 +00:00
Tim Northover	7a9ea8f628	GlobalISel: put debug info for static allocas in the MachineFunction. The good reason to do this is that static allocas are pretty simple to handle (especially at -O0) and avoiding tracking DBG_VALUEs throughout the pipeline should give some kind of performance benefit. The bad reason is that the debug pipeline is an unholy mess of implicit contracts, where determining whether "DBG_VALUE %reg, imm" actually implies a load or not involves the services of at least 3 soothsayers and the sacrifice of at least one chicken. And it still gets it wrong if the variable is at SP directly. llvm-svn: 297410	2017-03-09 21:12:06 +00:00
Amaury Sechet	10425de063	[DAGCombiner] Do various combine on usubo. Summary: This essentially does the same transform as for SUBC. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30437 llvm-svn: 297404	2017-03-09 19:28:00 +00:00
Krzysztof Parzyszek	78c4fcf12e	[Hexagon] Propagate zext of i1 into arithmetic code in selection DAG (op ... (zext i1 c) ...) -> (select c (op ... 1 ...), (op ... 0 ...)) llvm-svn: 297391	2017-03-09 16:29:30 +00:00
Sam Parker	b308b48d69	[ARM] Remove t2xtpk feature from tests I previously removed the T2XtPk feature from the ARM backend, but it looks like I missed some of the tests that were using the feature. Differential Revision: https://reviews.llvm.org/D30778 llvm-svn: 297386	2017-03-09 15:14:32 +00:00
Sanjay Patel	df21979db7	[DAG] recognize div/rem by 0 as undef before trying constant folding As discussed in the review thread for rL297026, this is actually 2 changes that would independently fix all of the test cases in the patch: 1. Return undef in FoldConstantArithmetic for div/rem by 0. 2. Move basic undef simplifications for div/rem (simplifyDivRem()) before foldBinopIntoSelect() as a matter of efficiency. I will handle the case of vectors with any zero element as a follow-up. That change is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic(). I'm deleting the test for PR30693 because it does not test for the actual bug any more (dangers of using bugpoint). Differential Revision: https://reviews.llvm.org/D30741 llvm-svn: 297384	2017-03-09 15:02:25 +00:00
Simon Dardis	7577ce2140	[mips] Revert fixes for PR32020. The fix introduces segfaults and clobbers the value to be stored when the atomic sequence loops. Revert "[Target/MIPS] Kill dead code, no functional change intended." This reverts commit r296153. Revert "Recommit "[mips] Fix atomic compare and swap at O0."" This reverts commit r296134. llvm-svn: 297380	2017-03-09 14:03:26 +00:00
Simon Dardis	158956c6cc	[mips] Fix return lowering Fix a machine verifier issue where an instruction was using an invalid register. The return pseudo is expanded and has the return address register added to it. The return register may have been spuriously marked as killed earlier. This partially resolves PR/27458. Thanks to Quentin Colombet for reporting the issue! llvm-svn: 297372	2017-03-09 11:19:48 +00:00
Adam Nemet	5361b82d54	[SSP] In opt remarks, stream Function directly With this, it shows up as an attribute in YAML and non-printable characters are properly removed by GlobalValue::getRealLinkageName. llvm-svn: 297362	2017-03-09 06:10:27 +00:00
Matt Arsenault	9a3fd87523	DAG: Check no signed zeros instead of unsafe math attribute llvm-svn: 297354	2017-03-09 01:36:39 +00:00
Tim Northover	7596bd7a27	GlobalISel: correctly handle trivial fcmp predicates. It makes sense to only do them once in IRTranslator rather than making everyone deal with them. llvm-svn: 297304	2017-03-08 18:49:54 +00:00
Volkan Keles	5698b2ae6e	[GlobalISel] Add default action for G_FNEG Summary: rL297171 introduced G_FNEG for floating-point negation instruction and IRTranslator started to translate `FSUB -0.0, X` to `FNEG X`. This patch adds a default action for G_FNEG to avoid breaking existing targets. Reviewers: qcolombet, ab, kristof.beyls, t.p.northover, aditya_nandakumar, dsanders Reviewed By: qcolombet Subscribers: dberris, rovka, llvm-commits Differential Revision: https://reviews.llvm.org/D30721 llvm-svn: 297301	2017-03-08 18:09:14 +00:00
Sanjay Patel	9f495695bb	[x86] regenerate checks; NFC This test could be reduced? The check fails for a seemingly unrelated change, so I'm adding full checks to see what is happening. llvm-svn: 297296	2017-03-08 17:19:56 +00:00
Krzysztof Parzyszek	1b7197e690	[Hexagon] Use correct offset when extracting from the high word When extracting a bitfield from the high register in a register pair, the final offset should be relative to the high register (for 32-bit extracts). llvm-svn: 297288	2017-03-08 15:46:28 +00:00
Daniel Cederman	9db582a656	[Sparc] Check register use with isPhysRegUsed() instead of reg_nodbg_empty() Summary: By using reg_nodbg_empty() to determine if a function can be treated as a leaf function or not, we miss the case when the register pair L0_L1 is used but not L0 by itself. This has the effect that use_all_i32_regs(), a test in reserved-regs.ll which tries to use all registers, gets treated as a leaf function. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: davide, RKSimon, sepavloff, llvm-commits Differential Revision: https://reviews.llvm.org/D27089 llvm-svn: 297285	2017-03-08 15:23:10 +00:00
Tim Shen	c7472d912b	Revert "Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size"" After inspection, it's UB in our code base. Someone cast a var-arg function pointer to a non-var-arg one. :/ Re-commit r296771 to continue testing on the patch. Sorry for the trouble! llvm-svn: 297256	2017-03-08 02:41:35 +00:00
Matt Arsenault	52d1b62a28	AMDGPU: Don't wait at end of block with a trivial successor If there is only one successor, and that successor only has one predecessor the wait can obviously be delayed until uses or the end of the next block. This avoids code quality regressions when there are trivial fallthrough blocks inserted for structurization. llvm-svn: 297251	2017-03-08 01:06:58 +00:00
Eli Friedman	c2c2e21d77	[DAGCombine] Simplify ISD::AND in GetDemandedBits. This helps in cases involving bitfields where an AND is exposed by legalization. Differential Revision: https://reviews.llvm.org/D30472 llvm-svn: 297249	2017-03-08 00:56:35 +00:00
Matt Arsenault	d8ed207a20	AMDGPU: Constant fold rcp node When doing arcp optimization with a constant denominator, this was leaving behind rcps with constant inputs. llvm-svn: 297248	2017-03-08 00:48:46 +00:00
Changpeng Fang	6b49fa4ca7	AMDGPU/SI: Do not insert EndCf in an unreachable block Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D22025 llvm-svn: 297243	2017-03-07 23:29:36 +00:00
Krzysztof Parzyszek	434d50a796	[Hexagon] Check for presence before looking registers up in bit tracker llvm-svn: 297240	2017-03-07 23:12:04 +00:00
Krzysztof Parzyszek	8e4d2e0512	[Hexagon] Generate bitsplit instruction llvm-svn: 297239	2017-03-07 23:08:35 +00:00
Tim Northover	542d1c1463	GlobalISel: use inserts for landingpad instead of sequences. llvm-svn: 297237	2017-03-07 23:04:06 +00:00
Tim Northover	2eb18d3c4b	GlobalISel: fix legalization of G_INSERT We were calculating incorrect extract/insert offsets by trying to be too tricksy with min/max. It's clearer to just split the logic up into "register starts before this segment" vs "after". llvm-svn: 297226	2017-03-07 21:24:33 +00:00
Ahmed Bougacha	55d10423a6	[GlobalISel] Don't translate intrinsics with metadata parameters. Some intrinsics take metadata parameters. These all need custom handling of some form, and cannot possibly be lowered generically to G_INTRINSIC calls with vreg operands. Reject them, instead of hitting an assert later in getOrCreateVReg. llvm-svn: 297209	2017-03-07 20:53:09 +00:00
Ahmed Bougacha	5c7924fca5	[GlobalISel] Avoid invalidating ValToVReg when translating no-op bitcast. When we translate a no-op (same type) bitcast, we try to be clever and only emit a COPY if we already assigned a vreg to the defined value. However, when we didn't, we tried to assign to a reference into the ValToVReg DenseMap, even though the RHS of the assignment (getOrCreateVReg) could potentially grow that DenseMap, invalidating the reference. Avoid that by getting the source vreg first. I audited the rest of the translator; this is the only tricky case. The test is quite unwieldy, as the problem is caused by the DenseMap growing, which happens after the 47th mapped value. llvm-svn: 297208	2017-03-07 20:53:06 +00:00
Ahmed Bougacha	38455ea8a6	[GlobalISel] Relax vector G_SELECT assertion. For vector operands, the `select` instruction supports both vector and non-vector conditions. The MIR builder had an overly restrictive assertion, that only accepted vector conditions for vector selects (in effect implementing ISD::VSELECT). Make it possible to express the full range of G_SELECTs. llvm-svn: 297207	2017-03-07 20:53:03 +00:00
Ahmed Bougacha	70dd6c2212	[GlobalISel] Add vector select translation test. NFC. llvm-svn: 297206	2017-03-07 20:53:00 +00:00
Ahmed Bougacha	c373262d52	[GlobalISel] Ignore %noreg when applying default regbank mapping. When computing the mapping for non-generic instructions, we skipped %noreg operands, because we can't always reason about their banks. Also skip them when applying the mapping. Otherwise, we could end up with mappings that we can't apply. While there, duplicate an assert to distinguish between the two error conditions. llvm-svn: 297201	2017-03-07 20:34:23 +00:00
Ahmed Bougacha	4826bae8b4	[GlobalISel] Emit DBG_VALUE %noreg for non-int/fp constant values. When a dbg_value has a constant operand that isn't representable in MI, there isn't much we can do. Use %noreg (0) for those situations. This matches the SelectionDAG behavior. llvm-svn: 297200	2017-03-07 20:34:20 +00:00
Ahmed Bougacha	ab50ecb1c7	[GlobalISel] Add constant dbg.value translation tests. NFC. llvm-svn: 297199	2017-03-07 20:34:13 +00:00
Artem Belevich	2524a22562	[NVPTX] Fixed lowering of unaligned loads/stores of f16 scalars and vectors. Differential Revision: https://reviews.llvm.org/D30672 llvm-svn: 297198	2017-03-07 20:33:38 +00:00
Arnold Schwaighofer	69e74b48f2	SjLjEHPrepare: Fix the pass for swifterror arguments We cannot leave the identity copies 'select true, arg, undef' that this pass inserts for arguments to simplify handling of values on swifterror arguments. swifterror arguments have restrictions on their uses. rdar://30839288 llvm-svn: 297197	2017-03-07 20:29:02 +00:00
Joel Jones	2852088126	[AArch64] Vulcan is now ThunderX2T99 Broadcom Vulcan is now Cavium ThunderX2T99. LLVM Bugzilla: http://bugs.llvm.org/show_bug.cgi?id=32113 Minor fixes for the alignments of loops and functions for ThunderX T81/T83/T88 (better performance). Patch was tested with SpecCPU2006. Patch by Stefan Teleman Differential Revision: https://reviews.llvm.org/D30510 llvm-svn: 297190	2017-03-07 19:42:40 +00:00
Volkan Keles	20d3c4200d	[GlobalISel] Translate floating-point negation Reviewers: qcolombet, javed.absar, aditya_nandakumar, dsanders, t.p.northover, ab Reviewed By: qcolombet Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30671 llvm-svn: 297171	2017-03-07 18:03:28 +00:00
Krzysztof Parzyszek	3cceffb752	[Hexagon] Do not insert instructions before PHI nodes llvm-svn: 297141	2017-03-07 14:20:19 +00:00
Ranjeet Singh	3d0af578cc	[ARM] Reapply r296865 "[ARM] fpscr read/write intrinsics not aware of each other" The original patch r296865 was reverted as it broke the chromium builds for Android (https://bugs.llvm.org/show_bug.cgi?id=32134). This patch reapplies r296865 with a fix to make sure it doesn't cause the build regression. The problem was that intrinsic selection on int_arm_get_fpscr was failing in ISel, because the code to manually select this intrinsic still thought it was the version with no side-effects (INTRINSIC_WO_CHAIN). That is wrong, as it doesn't semantically match the definition in the tablegen code, which says it does have side-effects. I've fixed this by updating the intrinsic type to INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on Hans's original reproducer. Differential Revision: https://reviews.llvm.org/D30645 llvm-svn: 297137	2017-03-07 11:17:53 +00:00
Jonas Paulsson	1d33cd3988	[SystemZ] Add check VT.isSimple() in canTreatAsByteVector() Since BB-vectorizer can produce vectors of, for example, 3 elements, this check is needed. Review: Ulrich Weigand llvm-svn: 297136	2017-03-07 09:49:31 +00:00
Artyom Skrobov	1388e2f792	In Thumb1, materialize a move between low registers as a `movs`, if CPSR isn't live. Summary: Previously, it had always been materialized as a push/pop sequence. Reviewers: labrinea, jroelofs Reviewed By: jroelofs Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30648 llvm-svn: 297134	2017-03-07 09:38:16 +00:00
Ayman Musa	ac5a2c43af	[X86][AVX512] Add missing entries to EVEX2VEX tables The evex2vex pass defines 2 tables which map EVEX instructions to their identical VEX counterparts when possible. This adds all missing entries. Differential Revision: https://reviews.llvm.org/D30501 llvm-svn: 297126	2017-03-07 08:05:53 +00:00
Tim Shen	70054bb827	Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size" This reverts commit r296771. We found some widespread test failures internally. I'm working on a testcase. Politely revert the patch in the meantime. :) llvm-svn: 297124	2017-03-07 07:40:10 +00:00
Konstantin Zhuravlyov	e8aaab8abe	Revert "AMDGPU: Set MCAsmInfo::PointerSize" It breaks line tables because the patch is not complete. Working on a complete one at the moment. This reverts commit r294031. llvm-svn: 297118	2017-03-07 04:44:33 +00:00
Tim Northover	c2c545b8f7	GlobalISel: restrict G_EXTRACT instruction to just one operand. A bit more painful than G_INSERT because it was more widely used, but this should simplify the handling of extract operations in most locations. llvm-svn: 297100	2017-03-06 23:50:28 +00:00
Jessica Paquette	596f483a5e	[Outliner] Fixed Asan bot failure in r296418 Fixed the asan bot failure which led to the last commit of the outliner being reverted. The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector is no longer initialized using reserve but just a standard constructor. llvm-svn: 297081	2017-03-06 21:31:18 +00:00
Chad Rosier	9a70c7c02a	[AArch64][Redundant Copy Elim] Add support for CMN and shifted imm. This patch extends the current functionality of the AArch64 redundant copy elimination pass to handle CMN instructions as well as shifted immediates. Differential Revision: https://reviews.llvm.org/D30576. llvm-svn: 297078	2017-03-06 21:20:00 +00:00
Krzysztof Parzyszek	9e60e51a71	Revert r297039, it's causing some mysterious buildbot failures llvm-svn: 297062	2017-03-06 20:24:21 +00:00
Jan Vesely	3ea1704434	AMDGPU/R600: Fix ALU clause markers use detection. Also exit early on kill instead of redefinition. Differential Revision: https://reviews.llvm.org/D30230 llvm-svn: 297060	2017-03-06 20:10:05 +00:00
Krzysztof Parzyszek	5b8fae5edd	[IfConversion] Only renormalize probabilities if branches are analyzable If a block has non-analyzable branches, the listed successors don't need to add up to one. For example, if a block has a conditional tail call, that tail call will not have a corresponding successor in the successor list, but will still be a possible branch. Differential Revision: https://reviews.llvm.org/D30556 llvm-svn: 297054	2017-03-06 19:12:42 +00:00
Tim Northover	95b6d5f2b1	GlobalISel: don't emit degenerate G_INSERT instructions. Before, we were producing G_INSERT instructions that were actually closer to a cast or even a COPY when both input and output sizes are the same. This doesn't really make sense and means that everything interpreting a G_INSERT also has to handle all these kinds of casts. So now we detect these degenerate cases and emit real casts instead. llvm-svn: 297051	2017-03-06 19:04:17 +00:00
Reid Kleckner	812191584f	[X86] Fix arg copy elision for illegal types Use the store size of the argument type, which will be a byte-sized quantity, rather than dividing the size in bits by 8. Fixes PR32136 and re-enables copy elision from i64 arguments. Reverts the workaround from r296950. llvm-svn: 297045	2017-03-06 18:39:39 +00:00
Tim Northover	75e0b91e59	GlobalISel: refactor legalization of G_INSERT. Now that G_INSERT instructions can only insert one register, this code was overly general. In another direction it didn't handle registers that crossed split boundaries properly, which needed to be fixed. llvm-svn: 297042	2017-03-06 18:23:04 +00:00
Krzysztof Parzyszek	03c5c21568	[TableGen] Ensure proper ordering of subtarget feature names llvm-svn: 297039	2017-03-06 18:08:37 +00:00
Krzysztof Parzyszek	8a4c601abc	[Hexagon] Early-if-convert branches that may exit the loop Merge the tail block into the loop in cases where the main loop body exits early, subject to profitability constraints. This will coalesce the loop body into fewer blocks. For example, the two-block loop "loop: // loop body; if (...) jump exit; more: // more body; jump loop" becomes the single block "loop: // loop body; // more body; if (...) jump exit; jump loop". llvm-svn: 297033	2017-03-06 17:24:04 +00:00
Krzysztof Parzyszek	e16ce15687	[Hexagon] Mark dead defs as <dead> in expand-condsets The code in updateDeadFlags removed unnecessary <dead> flags, but there can be cases where such a flag is not set, and yet a register has become dead. For example, if a mux with identical inputs is replaced with a COPY, the predicate register may no longer be used after that. llvm-svn: 297032	2017-03-06 17:09:06 +00:00
Krzysztof Parzyszek	143158b72e	[Hexagon] Pick a dot-old instruction that matches the architecture llvm-svn: 297031	2017-03-06 17:03:16 +00:00
Sanjay Patel	7f7947bf41	[DAGCombiner] simplify div/rem-by-0 Refactoring of duplicated code and more fixes to follow. This is motivated by the post-commit comments for r296699: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html I.e., we can crash if we're missing obvious simplifications like this that exist in the IR simplifier or if these occur later than expected. The x86 change for non-splat division shows a potential opportunity to improve vector codegen: we assumed that since only one lane had meaningful results, we should do the math in scalar. But that means moving back and forth from vector registers. llvm-svn: 297026	2017-03-06 16:36:42 +00:00
Sanjay Patel	9aad934710	[x86] add tests to show missing div/rem simplifications; NFC These are not x86-specific, but the problem is not visible for all targets because it is masked by other transforms. These can lead to compiler crashes. llvm-svn: 297017	2017-03-06 15:50:07 +00:00
Nemanja Ivanovic	12e67d868a	[PowerPC] Fix failure with STBRX when store is narrower than the bswap Fixes a crash caused by r296811 by truncating the input of the STBRX node when the bswap is wider than i32. Fixes https://bugs.llvm.org/show_bug.cgi?id=32140 Differential Revision: https://reviews.llvm.org/D30615 llvm-svn: 297001	2017-03-06 07:32:13 +00:00
Dean Michael Berris	7e8eea429f	[XRay] Allow logging the first argument of a function call. Summary: Functions with the "xray-log-args" attribute will have a special XRay sled kind emitted, for compiler-rt to copy any call arguments to your logging handler. For practical and performance reasons, only the first argument is supported, and only up to 64 bits. Reviewers: dberris Reviewed By: dberris Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29702 llvm-svn: 296998	2017-03-06 06:48:56 +00:00
Simon Pilgrim	584d6d9d91	[SelectionDAG] Fix vector splitting for *_EXTEND_VECTOR_INREG instructions Found by fuzz testing after rL296985 landed llvm-svn: 296989	2017-03-05 15:52:18 +00:00
Simon Pilgrim	9f5c251d57	[X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT. We're missing a couple of shuffle combines that will be added in a future patch for review. Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases. Differential Revision: https://reviews.llvm.org/D30549 llvm-svn: 296985	2017-03-05 09:57:20 +00:00
Sanjay Patel	b974be5ef4	[x86] don't require a zext when forming ADC/SBB The larger goal is to move the ADC/SBB transforms currently in combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're creating ADC/SBB in lots of places where we shouldn't. This was intended to be an NFC change, but avx-512 has something strange going on. It doesn't seem like any of the affected tests should really be using SET+TEST or ADC; a simple ADD could replace several instructions. But that's another bug... llvm-svn: 296978	2017-03-04 20:35:19 +00:00
Sanjay Patel	066f3208bf	[DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C) select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook. This is part of the ongoing process to obsolete D24480. The motivation is to canonicalize to select IR in InstCombine whenever possible, so we need to have a way to undo that easily in codegen. PowerPC is an obvious winner for this kind of transform because it has fast and complete bit-twiddling abilities but generally lousy conditional execution perf (although this might have changed in recent implementations). x86 also sees some wins, but the effect is limited because these transforms already mostly exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 changes just shows that that code is a mess of special-case holes. We may be able to remove some of that logic now. My guess is that other targets will want to enable this hook for most cases. The likely follow-ups would be to add value type and/or the constants themselves as parameters for the hook. As the tests in select_const.ll show, we can transform any select-of-constants to math/logic, but the general transform for any 2 constants needs one more instruction (multiply or 'and'). ARM is one target that I think may not want this for most cases. I see infinite loops there because it wants to use selects to enable conditionally executed instructions. Differential Revision: https://reviews.llvm.org/D30537 llvm-svn: 296977	2017-03-04 19:18:09 +00:00
Simon Pilgrim	40a0e66b37	[X86][SSE] Enable post-legalize vXi64 shuffle combining on 32-bit targets Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there doesn't appear to be any combines that can cause this any more as they all have their own legality checks. Differential Revision: https://reviews.llvm.org/D30213 llvm-svn: 296966	2017-03-04 12:50:47 +00:00
Florian Hahn	6406f98342	[legalize-types] Remove stale entries from SoftenedFloats. Summary: When replacing a SDValue, we should remove the replaced value from SoftenedFloats (and possibly the other maps as well?). When we revisit a Node because it needs analyzing again, we have to remove all result values from SoftenedFloats (and possibly other maps?). This fixes the fp128 test failures with expensive checks for X86. I think we probably should also remove the values from the other maps (PromotedIntegers and so on), let me know what you think. Reviewers: baldrick, bogner, davidxl, ab, arsenm, pirama, chh, RKSimon Reviewed By: chh Subscribers: danalbert, wdng, srhines, hfinkel, sepavloff, llvm-commits Differential Revision: https://reviews.llvm.org/D29265 llvm-svn: 296964	2017-03-04 12:00:35 +00:00
Matthias Braun	21f340fd25	X86ISelLowering: Only perform copy elision on legal types. This fixes cases where i1 types were not properly legalized yet and led to the creation of 0-sized stack slots. This fixes http://llvm.org/PR32136 llvm-svn: 296950	2017-03-04 01:40:40 +00:00
Sanjay Patel	a84fd041c6	[x86] check for commuted add pattern to find ADC/SBB llvm-svn: 296933	2017-03-04 00:18:31 +00:00
Sanjay Patel	71c1958fca	[x86] add test to show failed recognition of commuted pattern; NFC llvm-svn: 296931	2017-03-04 00:06:37 +00:00
Hans Wennborg	1c9d800fbc	Revert r296865 "[ARM] fpscr read/write intrinsics not aware of each other" It caused PR32134: "Cannot select: intrinsic %llvm.arm.get.fpscr". llvm-svn: 296926	2017-03-03 23:19:31 +00:00
Tim Northover	3e6a7afd81	GlobalISel: constrain G_INSERT to inserting just one value per instruction. It's much easier to reason about single-value inserts and no-one was actually using the variadic variants before. llvm-svn: 296923	2017-03-03 23:05:47 +00:00
Tim Northover	bf017293af	GlobalISel: add merge/unmerge nodes for legalization. These are simplified variants of the current G_SEQUENCE and G_EXTRACT, which assume the individual parts will be contiguous, homogeneous, and occupy the entirety of the larger register. This makes reasoning about them much easier since you only have to look at the first register being merged and the result to know what the instruction is doing. I intend to gradually replace all uses of the more complicated sequence/extract with these (or single-element insert/extracts), and then remove the older variants. For now we start with legalization. llvm-svn: 296921	2017-03-03 22:46:09 +00:00
Sanjay Patel	000d61acfd	[x86] regenerate checks; NFC llvm-svn: 296908	2017-03-03 20:48:54 +00:00
Sanjay Patel	1716aa45f1	[x86] regenerate checks; NFC llvm-svn: 296883	2017-03-03 16:58:51 +00:00
Sanjay Patel	9f5db7d4e0	[x86] regenerate checks; NFC llvm-svn: 296881	2017-03-03 16:45:57 +00:00
Sanjay Patel	872e0b86eb	[x86] regenerate checks; NFC llvm-svn: 296880	2017-03-03 16:42:43 +00:00
Sanjay Patel	6fa6316769	[x86] regenerate checks; NFC llvm-svn: 296877	2017-03-03 16:34:35 +00:00
Ranjeet Singh	7b60a9ed0c	[ARM] fpscr read/write intrinsics not aware of each other The intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr read and write to the fpscr (Floating-Point Status and Control Register) register. A bug exists in the __builtin_arm_get_fpscr intrinsic definition in llvm which treats this intrinsic as IntrNoMem, which means it's not a memory access and doesn't have any other side-effects. Having this property on this intrinsic means that various optimizations can be done on it, such as common sub-expression elimination with other reads. This can cause issues if there has been a write to this register, e.g. void foo(int *p) { p[0] = __builtin_arm_get_fpscr(); __builtin_arm_set_fpscr(1); p[1] = __builtin_arm_get_fpscr(); } In the above example the second read is currently CSE'd into the first read, because llvm isn't aware that the write done by __builtin_arm_set_fpscr affects the same register that __builtin_arm_get_fpscr reads from. To fix this problem I've removed the property IntrNoMem so that __builtin_arm_get_fpscr is treated as a memory access. Differential Revision: https://reviews.llvm.org/D30542 llvm-svn: 296865	2017-03-03 11:40:07 +00:00
Chandler Carruth	ce52b80744	[SDAG] Revert r296476 (and r296486, r296668, r296690). This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this. llvm-svn: 296862	2017-03-03 10:02:25 +00:00
Amjad Aboud	4f97751798	[X86] Generate VZEROUPPER for Skylake-avx512. VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859	2017-03-03 09:03:24 +00:00
Igor Breger	321cf3c650	[GlobalISel][X86] Support float/double and vector types. Summary: [GlobalISel][X86] Add support for f32/f64 and vector types in RegisterBank and InstructionSelector. Reviewers: delena, zvi Reviewed By: zvi Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30533 llvm-svn: 296856	2017-03-03 08:06:46 +00:00
Kyle Butt	1fa6030767	CodeGen: BlockPlacement: Precompute layout for chains of triangles. For chains of triangles with small join blocks that can be tail duplicated, a simple calculation of probabilities is insufficient. Tail duplication can be profitable in 3 different ways for these cases: 1) The post-dominators marked 50% are actually taken 56% (This shrinks with longer chains). 2) The chains are statically correlated. Branch probabilities have a very U-shaped distribution. [http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805] If the branches in a chain are likely to be from the same side of the distribution as their predecessor, but are independent at runtime, this transformation is profitable. (Because the cost of being wrong is a small fixed cost, unlike the standard triangle layout where the cost of being wrong scales with the # of triangles.) 3) The chains are dynamically correlated: the probability that a previous branch was taken positively influences whether the next branch will be taken. We believe that 2 and 3 are common enough to justify the small margin in 1. The code pre-scans a function's CFG to identify this pattern and marks the edges so that the standard layout algorithm can use the computed results. llvm-svn: 296845	2017-03-03 01:00:22 +00:00
Taewook Oh	96c6415697	[DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y) Summary: Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it makes more sense to take DebugLoc from 't1' than 't4'. For the code below ``` extern int bar(); extern int baz(); int foo(int x, int y) { if (x != y) return bar(); else return baz(); } ``` , following is the bitcode representation of 'foo' at the end of llvm-ir level optimization: ``` define i32 @foo(i32 %x, i32 %y) !dbg !4 { entry: tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12 tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13 %cmp = icmp ne i32 %x, %y, !dbg !14 br i1 %cmp, label %if.then, label %if.else, !dbg !16 if.then: ; preds = %entry %call = tail call i32 (...) @bar() #3, !dbg !17 br label %return, !dbg !18 if.else: ; preds = %entry %call1 = tail call i32 (...) @baz() #3, !dbg !19 br label %return, !dbg !20 return: ; preds = %if.else, %if.then %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ] ret i32 %retval.0, !dbg !21 } !14 = !DILocation(line: 5, column: 9, scope: !15) !16 = !DILocation(line: 5, column: 7, scope: !4) ``` As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLoc of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue. 
Reviewers: atrick, bogner, andreadb, craig.topper, aprantl Reviewed By: andreadb Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D29813 llvm-svn: 296825	2017-03-02 21:58:35 +00:00
Krzysztof Parzyszek	e720feb1c6	[Hexagon] Pick the right branch opcode depending on branch probabilities Specifically, pick the opcode with the correct branch prediction, i.e. jump:t or jump:nt. llvm-svn: 296821	2017-03-02 21:49:49 +00:00
Tobias Grosser	02d86b80ec	Revert "AMDGPU: Re-do update for branch-relaxation test" This commit also relied on r296812, which I just reverted. We should probably apply it again, after the r296812 has been discussed and been reapplied in some variant. llvm-svn: 296820	2017-03-02 21:47:51 +00:00
Kyle Butt	1393761e0c	CodeGen: MachineBlockPlacement: Remove the unused outlining heuristic. Outlining optional branches isn't a good heuristic, and it's never been on by default. Remove it to clean things up. llvm-svn: 296818	2017-03-02 21:44:24 +00:00
Eli Friedman	bb821276d0	[ARM] Fix insert point for store rescheduling. In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would move stores to the wrong insert point. Re-commit with a fix to increment NumMove in the right place. Differential Revision: https://reviews.llvm.org/D30124 llvm-svn: 296815	2017-03-02 21:39:39 +00:00
Guozhi Wei	ed28e742ee	[PPC] Fix code generation for bswap(int32) followed by store16 This patch fixes pr32063. Current code in PPCTargetLowering::PerformDAGCombine can transform bswap store into a single PPCISD::STBRX instruction. but it doesn't consider the case that the operand size of bswap may be larger than store size. When it occurs, we need 2 modifications, 1 For the last operand of PPCISD::STBRX, we should not use DAG.getValueType(N->getOperand(1).getValueType()), instead we should use cast<StoreSDNode>(N)->getMemoryVT(). 2 Before PPCISD::STBRX, we need to shift the original operand of bswap to the right side. Differential Revision: https://reviews.llvm.org/D30362 llvm-svn: 296811	2017-03-02 21:07:59 +00:00
Chad Rosier	ea25eca04a	[AArch64] Extend redundant copy elimination pass to handle non-zero stores. This patch extends the current functionality of the AArch64 redundant copy elimination pass to handle non-zero cases such as: BB#0: cmp x0, #1 b.eq .LBB0_1 .LBB0_1: orr x0, xzr, #0x1 ; <-- redundant copy; x0 known to hold #1. Differential Revision: https://reviews.llvm.org/D29344 llvm-svn: 296809	2017-03-02 20:48:11 +00:00
Vadzim Dambrouski	eafb805506	[MSP430] Add SRet support to MSP430 target This patch adds support for struct return values to the MSP430 target backend. It also reverses the order of argument and return registers in the calling convention to bring it into closer alignment with the published EABI from TI. Patch by Andrew Wygle (awygle). Differential Revision: https://reviews.llvm.org/D29069 llvm-svn: 296807	2017-03-02 20:25:10 +00:00
Simon Pilgrim	b3067dc374	[X86][MMX] Fixed i32 extraction on 32-bit targets MMX extraction often ends up as extract_i32(bitcast_v2i32(extract_i64(bitcast_v1i64(x86mmx v), 0)), 0) which fails to simplify on 32-bit targets as i64 isn't legal llvm-svn: 296782	2017-03-02 18:56:06 +00:00
Krzysztof Parzyszek	056c945a5d	[Hexagon] Skip blocks that define vector predicate registers in early-if llvm-svn: 296777	2017-03-02 18:10:59 +00:00
Krzysztof Parzyszek	fcbb7d10fe	[Hexagon] Properly handle 'q' constraint in 128-byte vector mode llvm-svn: 296772	2017-03-02 17:50:24 +00:00
Nemanja Ivanovic	db8425eff0	[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size This patch reduces the stack frame size by not allocating the parameter area if it is not required. In the current implementation LowerFormalArguments_64SVR4 already handles the parameter area, but LowerCall_64SVR4 does not (when calculating the stack frame size). What this patch does is make LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4. Committing on behalf of Hiroshi Inoue. Differential Revision: https://reviews.llvm.org/D29881 llvm-svn: 296771	2017-03-02 17:38:59 +00:00
Sanjay Patel	fffa179837	[DAGCombiner] avoid assertion when folding binops with opaque constants This bug was introduced with: https://reviews.llvm.org/rL296699 There may be a way to loosen the restriction, but for now just bail out on any opaque constant. The tests show that opacity is target-specific. This goes back to cost calculations in ConstantHoisting based on TTI->getIntImmCost(). llvm-svn: 296768	2017-03-02 17:18:56 +00:00
Tim Northover	e80d6d1360	GlobalISel: record correct stack usage for signext parameters. The CallingConv.td rules allocate 8 bytes for these kinds of arguments on AAPCS targets, but we were only recording the smaller amount. The difference is theoretical on AArch64 because we don't actually store more than the smaller amount, but it's still much better to have these two components in agreement. Based on Diana Picus's ARM equivalent patch (where it matters a lot more). llvm-svn: 296754	2017-03-02 15:34:18 +00:00
Andrew V. Tischenko	2855dc7ddc	Added special test covering a problem with PIC relocation model on SLM architecture. The fix will come in D26855. llvm-svn: 296746	2017-03-02 13:47:03 +00:00
Serge Pavlov	e2bf69715f	Do not verify MachineDominatorTree if it is not calculated If dominator tree is not calculated or is invalidated, set corresponding pointer in the pass state to nullptr. Such pointer value will indicate that operations with dominator tree are not allowed. In particular, it allows skipping verification for such pass state. The dominator tree is not calculated if the machine dominator pass was skipped, which occurs in the case of entities with linkage available_externally. The change fixes some test failures observed when expensive checks are enabled. Differential Revision: https://reviews.llvm.org/D29280 llvm-svn: 296742	2017-03-02 12:00:10 +00:00
Matthias Braun	dbcf9e2ee4	LiveRegMatrix: Fix some subreg interference checks Surprisingly, one of the three interference checks in LiveRegMatrix was using the main live range instead of the appropriate subregister range resulting in unnecessarily conservative results. llvm-svn: 296722	2017-03-02 00:35:08 +00:00
Eli Friedman	933863ce61	Revert r296708; causing test failures on ARM hosts. Original commit message: [ARM] Fix insert point for store rescheduling. In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would sink stores for no reason. llvm-svn: 296718	2017-03-02 00:08:50 +00:00
Amaury Sechet	71f511fd1e	[DAGCombiner] mulhi + 1 never overflow. Summary: This can be used to optimize large multiplications after legalization. Depends on D29565 Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29587 llvm-svn: 296711	2017-03-01 23:44:17 +00:00
Ahmed Bougacha	120ae22d70	[GlobalISel] Add a way for targets to enable GISel. Until now, we've had to use -global-isel to enable GISel. But using that on other targets that don't support it will result in an abort, as we can't build a full pipeline. Additionally, we want to experiment with enabling GISel by default for some targets: we can't just enable GISel by default, even among those target that do have some support, because the level of support varies. This first step adds an override for the target to explicitly define its level of support. For AArch64, do that using a new command-line option (I know..): -aarch64-enable-global-isel-at-O=<N> Where N is the opt-level below which GISel should be used. Default that to -1, so that we still don't enable GISel anywhere. We're not there yet! While there, remove a couple LLVM_UNLIKELYs. Building the pipeline is such a cold path that in practice that shouldn't matter at all. llvm-svn: 296710	2017-03-01 23:33:08 +00:00
Amaury Sechet	683f5743f6	Improve mulhi overflow test. NFC llvm-svn: 296709	2017-03-01 23:31:19 +00:00
Eli Friedman	1c9216b003	[ARM] Fix insert point for store rescheduling. In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would sink stores for no reason. Differential Revision: https://reviews.llvm.org/D30124 llvm-svn: 296708	2017-03-01 23:20:29 +00:00
Eli Friedman	28c2c0e311	[ARM] Check correct instructions for load/store rescheduling. This code starts from the high end of the sorted vector of offsets, and works backwards: it tries to find contiguous offsets, process them, then pops them from the end of the vector. Most of the code agrees with this order of processing, but one loop doesn't: it instead processes elements from the low end of the vector (which are nodes with unrelated offsets). Fix that loop to process the correct elements. This has a few implications. One, we don't incorrectly return early when processing multiple groups of offsets in the same block (which allows rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert point for loads, so they're correctly sorted (which affects the scheduling of vldm-liveness.ll). I think it might also impact some of the heuristics slightly. Differential Revision: https://reviews.llvm.org/D30368 llvm-svn: 296701	2017-03-01 22:56:20 +00:00
Sanjay Patel	92938657a0	[DAGCombiner] fold binops with constant into select-of-constants This is part of the ongoing attempt to improve select codegen for all targets and select canonicalization in IR (see D24480 for more background). The transform is a subset of what is done in InstCombine's FoldOpIntoSelect(). I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that hopes to convert more selects to basic math ops. This appears to be a general missing DAG transform though, so I added tests for all standard binops in rL296621 (PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86). The poor output for "sel_constants_shl_constant" is tracked with: https://bugs.llvm.org/show_bug.cgi?id=32105 Differential Revision: https://reviews.llvm.org/D30502 llvm-svn: 296699	2017-03-01 22:51:31 +00:00
Amaury Sechet	250b4a7491	Add test case for mulhi's overflow. NFC llvm-svn: 296696	2017-03-01 22:27:21 +00:00
Reid Kleckner	f7c0980c10	Elide argument copies during instruction selection Summary: Avoids tons of prologue boilerplate when arguments are passed in memory and left in memory. This can happen in a debug build or in a release build when an argument alloca is escaped. This will dramatically affect the code size of x86 debug builds, because X86 fast isel doesn't handle arguments passed in memory at all. It only handles the x86_64 case of up to 6 basic register parameters. This is implemented by analyzing the entry block before ISel to identify copy elision candidates. A copy elision candidate is an argument that is used to fully initialize an alloca before any other possibly escaping uses of that alloca. If an argument is a copy elision candidate, we set a flag on the InputArg. If the target generates loads from a fixed stack object that matches the size and alignment requirements of the alloca, the SelectionDAG builder will delete the stack object created for the alloca and replace it with the fixed stack object. The load is left behind to satisfy any remaining uses of the argument value. The store is now dead and is therefore elided. The fixed stack object is also marked as mutable, as it may now be modified by the user, and it would be invalid to rematerialize the initial load from it. Supersedes D28388 Fixes PR26328 Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D29668 llvm-svn: 296683	2017-03-01 21:42:00 +00:00
Sanjay Patel	f8edc3e870	[x86] add vector tests for more coverage of D30502; NFC llvm-svn: 296671	2017-03-01 20:31:23 +00:00
Nemanja Ivanovic	b223cfabcc	Improve scheduling with branch coalescing This patch adds a MachineSSA pass that coalesces blocks that branch on the same condition. Committing on behalf of Lei Huang. Differential Revision: https://reviews.llvm.org/D28249 llvm-svn: 296670	2017-03-01 20:29:34 +00:00
Nirav Dave	0a4703b5ec	[DAG] Prevent Stale nodes from entering worklist Add check that deleted nodes do not get added to worklist. This can occur when a node's operand is simplified to an existing node. This fixes PR32108. Reviewers: jyknight, hfinkel, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30506 llvm-svn: 296668	2017-03-01 20:19:38 +00:00
Nirav Dave	3de7fce3ac	Add test cases for merging stores of multiply used stores llvm-svn: 296667	2017-03-01 20:18:14 +00:00
Artur Pilipenko	e1b2d31468	[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336). Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 296651	2017-03-01 18:12:29 +00:00
Krzysztof Parzyszek	8f23dd6d68	[Hexagon] Fix lowering of formal arguments of type i1 On Hexagon, values of type i1 are passed in registers of type i32, even though i1 is not a legal value for these registers. This is a special case and needs special handling to maintain consistency of the lowering information. This fixes PR32089. llvm-svn: 296645	2017-03-01 17:30:10 +00:00
Diana Picus	9c52309b37	[ARM] GlobalISel: Lower call params that need extensions Lower i1, i8 and i16 call parameters by extending them before storing them on the stack. Also make sure we encode the correct, extended size in the corresponding memory operand, and that we compute the correct stack size in the end. The latter is a bit more complicated because we used to compute the stack size in the getStackAddress method, based on the Size and Offset of the parameters. However, if the last parameter is sign extended, we'd be using the wrong, non-extended size, and we'd end up with a smaller stack than we need to hold the extended value. Instead of hacking this up based on the value of Size in getStackAddress, we move our stack size handling logic to assignArg, where we have access to the CCState which knows everything we could possibly want to know about the stack. This way we don't need to duplicate any knowledge or resort to any ugly hacks. On this same occasion, update the IRTranslator test to check the sizes of the stores everywhere, not just for sign extended parameters. llvm-svn: 296631	2017-03-01 15:35:14 +00:00
Sanjay Patel	88a1b8b466	[x86] auto-generate checks; NFC llvm-svn: 296629	2017-03-01 14:46:59 +00:00
Sanjay Patel	f0496a6a5c	[x86] regenerate checks; NFC llvm-svn: 296628	2017-03-01 14:41:57 +00:00
Sanjay Patel	ffc6943011	[PPC] add tests for select-of-constants with binop; NFC llvm-svn: 296621	2017-03-01 14:26:49 +00:00
Matt Arsenault	103af90034	AMDGPU: Re-do update for branch-relaxation test Modify the test so that it is still testing something closer to what it was intended to originally. I think the original intent was to test the situation where there was a branch on execz and then unconditional branch required relaxing. With the change in r296539, there was no longer an execz branch. Change the test so that there is now an execz branch inserted. There is no longer an unconditional branch after the execz branch, so this might need to be tricked in some other way to keep that there. llvm-svn: 296574	2017-03-01 03:36:04 +00:00
Daniel Berlin	65f8cf945d	Only run the overloaded-intrinsic-name.ll test once, with FileCheck. llvm-svn: 296564	2017-03-01 01:56:41 +00:00
Daniel Berlin	3f91004ce7	Keep attributes, calling convention, etc, when remangling intrinsic Summary: Fix issue reported where intrinsic calling convention is dropped after r295253. Reviewers: sanjoy Subscribers: materi, llvm-commits Differential Revision: https://reviews.llvm.org/D30422 llvm-svn: 296563	2017-03-01 01:49:13 +00:00
Ahmed Bougacha	20b3e9a835	[CodeGen] Remove dead FastISel code after SDAG emitted a tailcall. When SDAGISel (top-down) selects a tail-call, it skips the remainder of the block. If, before that, FastISel (bottom-up) selected some of the (no-op) next few instructions, we can end up with dead instructions following the terminator (selected by SDAGISel). We need to erase them, as we know they aren't necessary (in addition to being incorrect). We already do this when FastISel falls back on the tail-call itself. Also remove the FastISel-emitted code if we fallback on the instructions between the tail-call and the return. llvm-svn: 296552	2017-03-01 00:43:42 +00:00
Ahmed Bougacha	67d1c7c3c2	[GlobalISel] Replace all combined G_EXTRACT uses. Iterating on the use-list we're modifying doesn't work: after the first iteration, the use-list iterator will point to a MachineOperand referencing the new register. This caused us to skip the other uses to replace. Instead, use MRI.replaceRegWith(), which accounts for this behavior. llvm-svn: 296551	2017-03-01 00:43:39 +00:00
Dan Gohman	7d7409e553	[WebAssembly] Convert the remaining unit tests to the new wasm-object-file target. To facilitate this, add a new hidden command-line option to disable the explicit-locals pass. That causes llc to emit invalid code that doesn't have all locals converted to get_local/set_local, however it simplifies testwriting in many cases. llvm-svn: 296540	2017-02-28 23:37:04 +00:00
Daniel Berlin	06f92e6dcb	Update AMDGPU test branch-relaxation.ll for changes after post-dom fixes llvm-svn: 296539	2017-02-28 23:35:24 +00:00
Eli Friedman	36795239f5	[ARM] Don't generate deprecated T1 STM. This prevents generating stm r1!, {r0, r1} on Thumb1, where the value stored for r1 is UNKNOWN. Patch by Zhaoshi Zheng. Differential Revision: https://reviews.llvm.org/D27910 llvm-svn: 296538	2017-02-28 23:32:55 +00:00
Krzysztof Parzyszek	33fd0bbbe8	[Hexagon] Generate extract instructions more aggressively llvm-svn: 296537	2017-02-28 23:27:33 +00:00
Krzysztof Parzyszek	f208681731	[Hexagon] Fix instruction selection for sign-extending i1 to i64 llvm-svn: 296532	2017-02-28 22:37:01 +00:00
Paul Robinson	cddd60445e	[DWARFv5] Emit new unit header format. Requesting DWARF v5 will now get you the new compile-unit and type-unit headers. llvm-dwarfdump will also recognize them. Differential Revision: http://reviews.llvm.org/D30206 llvm-svn: 296514	2017-02-28 20:24:55 +00:00
Sanjay Patel	74ca880749	[x86] add alternate IR tests for select of constants; NFC llvm-svn: 296496	2017-02-28 18:02:38 +00:00
David Bozier	3246aecb42	Fix issue with test case. Make test x86_64 specific llvm-svn: 296492	2017-02-28 17:25:38 +00:00
Craig Topper	419f145ebb	[DAGISel] When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node. This recovers a test case that was severely broken by r296476, by making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency. llvm-svn: 296486	2017-02-28 16:52:05 +00:00
David Bozier	5159968786	[Stack Protection] Add diagnostic information for why stack protection was applied to a function Stack Smash Protection is not completely free, so in hot code, the overhead it causes can cause performance issues. By adding diagnostic information for which functions have SSP and why, a user can quickly determine what they can do to stop SSP being applied to a specific hot function. This change adds a remark that is reported by the stack protection code when an instruction or attribute is encountered that causes SSP to be applied. Patch by: James Henderson Differential Revision: https://reviews.llvm.org/D29023 llvm-svn: 296483	2017-02-28 16:02:37 +00:00
Nirav Dave	f830dec3f2	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. 
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296476	2017-02-28 14:24:15 +00:00
Diana Picus	1ffca2aeaf	[ARM] GlobalISel: Lower i32 and fp call parameters on the stack Lower i32, float and double parameters that need to live on the stack. This boils down to creating some G_GEPs starting from the stack pointer and storing the values there. During the process we also keep track of the stack size and use the final value in the ADJCALLSTACKDOWN/UP instructions. We currently assert for smaller types, since they usually require extensions. They will be handled in a separate patch. llvm-svn: 296473	2017-02-28 14:17:53 +00:00
Diana Picus	5a7203a0af	[ARM] GlobalISel: Select 32-bit G_CONSTANT Put it into a register by means of a MOVi. llvm-svn: 296471	2017-02-28 13:05:42 +00:00
Diana Picus	5b8514559e	[ARM] GlobalISel: Add mapping for G_CONSTANT Like G_FRAME_INDEX, G_CONSTANT has one register operand and one non-register operand. llvm-svn: 296469	2017-02-28 12:13:58 +00:00
Diana Picus	e6beac6742	[ARM] GlobalISel: Legalize 32-bit constants llvm-svn: 296468	2017-02-28 11:33:46 +00:00
Diana Picus	9d07094913	[ARM] GlobalISel: Select G_GEP At this point, G_GEP is just an add, so we treat it exactly like a G_ADD. llvm-svn: 296462	2017-02-28 10:14:38 +00:00
Diana Picus	566a15d749	[ARM] GlobalISel: Add reg bank mapping for G_GEP This should be the same as the mapping for G_ADD etc. llvm-svn: 296455	2017-02-28 09:35:10 +00:00
Diana Picus	8598b17076	[ARM] GlobalISel: Legalize G_GEP with 32-bit offsets At the moment we're only interested in GEPs for putting call parameters on the stack, so we'll stick to 32-bit offsets. llvm-svn: 296452	2017-02-28 09:02:42 +00:00
Artyom Skrobov	24a593fd20	Relate the CHECK: lines to the functions that they're checking [NFC] llvm-svn: 296450	2017-02-28 08:58:40 +00:00
Sanjoy Das	eef785c1a5	[ImplicitNullCheck] Add alias analysis usage Summary: With this change ImplicitNullCheck optimization uses alias analysis and can use load/store memory access for implicit null check if there are other load/store before but memory accesses do not alias. Patch by Serguei Katkov! Reviewers: sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30331 llvm-svn: 296440	2017-02-28 07:04:49 +00:00
Matthias Braun	81f68ec3a9	Revert "Add MIR-level outlining pass" Revert Machine Outliner for now, as it breaks the asan bot. This reverts commit r296418. llvm-svn: 296426	2017-02-28 02:24:30 +00:00
Amaury Sechet	428f8f5a27	Add test case for usubo combine. NFC. llvm-svn: 296420	2017-02-28 01:16:39 +00:00
Matthias Braun	d36410945f	Add MIR-level outlining pass This is a patch for the outliner described in the RFC at: http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html The outliner is a code-size reduction pass which works by finding repeated sequences of instructions in a program, and replacing them with calls to functions. This is useful to people working in low-memory environments, where sacrificing performance for space is acceptable. This adds an interprocedural outliner directly before printing assembly. For reference on how this would work, this patch also includes X86 target hooks and an X86 test. The outliner is run like so: clang -mno-red-zone -mllvm -enable-machine-outliner file.c Patch by Jessica Paquette<jpaquette@apple.com>! rdar://29166825 Differential Revision: https://reviews.llvm.org/D26872 llvm-svn: 296418	2017-02-28 00:33:32 +00:00
Amaury Sechet	5a605fc6c6	Add test case for computing known bits of subtraction operations. NFC llvm-svn: 296417	2017-02-28 00:15:13 +00:00
Michael Kuperstein	13bf8a2684	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about ~100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about a 100 constants to stack. The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296416	2017-02-28 00:11:34 +00:00
Matt Arsenault	10268f93e8	AMDGPU: Use v_med3_{f16|i16|u16} llvm-svn: 296401	2017-02-27 22:40:39 +00:00
Matt Arsenault	eb522e68bc	AMDGPU: Support v2i16/v2f16 packed operations llvm-svn: 296396	2017-02-27 22:15:25 +00:00
Arnold Schwaighofer	b2605f31ed	ISel: We need to notify FastIS of the IMPLICIT_DEF we created in createSwiftErrorEntriesInEntryBlock Otherwise, it will insert instructions before it. rdar://30536186 llvm-svn: 296395	2017-02-27 22:12:06 +00:00
Sanjay Patel	ae7873fe55	[ARM] don't transform an add(ext Cond), C to select unless there's a setcc of the condition The transform in question claims to be doing: // fold (add (select cc, 0, c), x) -> (select cc, x, (add, x, c)) ...starting in PerformADDCombineWithOperands(), but it wasn't actually checking for a setcc node for the sext/zext patterns. This is exactly the opposite of a transform I'd like to add to DAGCombiner's foldSelectOfConstants(), so I was seeing infinite loops with my draft of a patch applied. The changes in select_const.ll look positive (less instructions). The change in arm-and-tst-peephole.ll is unrelated. We're changing the input IR in that test to preserve the intent of the test, but that's not affected by this code change. Differential Revision: https://reviews.llvm.org/D30355 llvm-svn: 296389	2017-02-27 21:30:54 +00:00
Simon Pilgrim	5c4efcdddf	[X86][SSE] Attempt to extract vector elements through target shuffles DAGCombiner already supports peeking through shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered. This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where it's worth it. Differential Revision: https://reviews.llvm.org/D30176 llvm-svn: 296381	2017-02-27 21:01:57 +00:00
Matt Arsenault	7596f13d15	AMDGPU: Support inlineasm for packed instructions Add packed types as legal so they may be used with inlineasm. Keep all operations expanded for now. llvm-svn: 296379	2017-02-27 20:52:10 +00:00
Matt Arsenault	2ed2193218	AMDGPU: Don't fold immediate if clamp/omod are set Doesn't fix any practical problems because clamp/omod are currently folded after peephole optimizer. llvm-svn: 296375	2017-02-27 20:21:31 +00:00
Matt Arsenault	3cb390498e	AMDGPU: Fold omod into instructions llvm-svn: 296372	2017-02-27 19:35:42 +00:00
Taewook Oh	a49eb8578a	[TailDuplicator] Maintain DebugLoc for branch instructions Summary: Existing implementation of duplicateSimpleBB function drops DebugLoc metadata of branch instructions during the transformation. This patch addresses this issue by making newly created branch instructions to keep the metadata of replaced branch instructions. Reviewers: qcolombet, craig.topper, aprantl, MatzeB, sanjoy, dblaikie Reviewed By: dblaikie Subscribers: dblaikie, llvm-commits Differential Revision: https://reviews.llvm.org/D30026 llvm-svn: 296371	2017-02-27 19:30:01 +00:00
Matt Arsenault	e2d1d3a940	AMDGPU: Add f16 to shader calling conventions Mostly useful for writing tests for f16 features. llvm-svn: 296370	2017-02-27 19:24:47 +00:00
Amaury Sechet	10c7fb4187	Refactor xaluo.ll and xmulo.ll tests. NFC llvm-svn: 296367	2017-02-27 18:32:54 +00:00
Amaury Sechet	00bf6f52dc	Remove an empty line in icmp-illegal.ll . NFC llvm-svn: 296350	2017-02-27 16:09:44 +00:00
Artur Pilipenko	f7196c8d9e	[DAGCombine] Fix for a load combine bug with non-zero offset patterns on BE targets This pattern is essentially a i16 load from p+1 address: %p1.i16 = bitcast i8* %p to i16* %p2.i8 = getelementptr i8, i8* %p, i64 2 %v1 = load i16, i16* %p1.i16 %v2.i8 = load i8, i8* %p2.i8 %v2 = zext i8 %v2.i8 to i16 %v1.shl = shl i16 %v1, 8 %res = or i16 %v1.shl, %v2 Current implementation would identify %v1 load as the first byte load and would mistakenly emit a i16 load from %p1.i16 address. This patch adds a check that the first byte is loaded from a non-zero offset of the first load address. This way this address can be used as the base address for the combined value. Otherwise just give up combining. llvm-svn: 296336	2017-02-27 13:04:23 +00:00
Amaury Sechet	681472cd0f	Do full codegen for various tests. NFC llvm-svn: 296305	2017-02-27 01:15:57 +00:00
Daniel Jasper	3ca4525612	Revert "[CGP] Split some critical edges coming out of indirect branches" This reverts commit r296149 as it leads to crashes when compiling for PPC. llvm-svn: 296295	2017-02-26 11:09:12 +00:00
Craig Topper	6028584d8c	[X86] Fix execution domain for cmpss/sd instructions. llvm-svn: 296293	2017-02-26 06:45:59 +00:00
Craig Topper	e70231be51	[AVX-512] Fix execution domain for vmovhpd/lpd/hps/lps. llvm-svn: 296291	2017-02-26 06:45:54 +00:00
Craig Topper	fe25988c68	[AVX-512] Fix the execution domain for AVX-512 integer broadcasts. llvm-svn: 296290	2017-02-26 06:45:51 +00:00
Craig Topper	6bf9b809ce	[AVX-512] Fix execution domain for VPMADD52 instructions. llvm-svn: 296288	2017-02-26 06:45:45 +00:00
Craig Topper	9ef7e44d2f	[AVX-512] Use update_llc_test_checks.py to regenerate a test. llvm-svn: 296287	2017-02-26 06:45:43 +00:00
Craig Topper	ed64904c74	[X86] Fix the execution domain for scalar SQRT intrinsic instruction. llvm-svn: 296284	2017-02-26 06:45:35 +00:00
Craig Topper	a87b40051d	[X86] Add an additional CHECK prefix to a test. Some of the cases used it, but it wasn't on the FileCheck command lines. llvm-svn: 296283	2017-02-26 06:45:32 +00:00
David L. Jones	d95c34abe1	[X86] Clean up test/CodeGen/X86/2006-03-02-InstrSchedBug.ll Summary: Migrated from grep to FileCheck. Re-indented code, removed boilerplate comments. Added 'entry' label at beginning of basic block. Patch by Jorge Gorbe! Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30320 llvm-svn: 296280	2017-02-26 01:32:35 +00:00
Nirav Dave	73cd0194cf	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r296252 until 256-bit operations are more efficiently generated in X86. llvm-svn: 296279	2017-02-26 01:27:32 +00:00
Craig Topper	2caa97c891	[AVX-512] Fix the execution domain for scalar FMA instructions. llvm-svn: 296271	2017-02-25 19:36:28 +00:00
Craig Topper	176f3310b6	[AVX-512] Fix the execution domain on some instructions. llvm-svn: 296270	2017-02-25 19:18:11 +00:00
Craig Topper	0524035fb4	[AVX-512] Add an additional test case to show the execution domain for vrqsrtsd is wrong. llvm-svn: 296269	2017-02-25 19:18:08 +00:00
Craig Topper	9b43d459bf	[AVX-512] Use update_llc_test_checks.py to regenerate the avx512er intrinsic test. llvm-svn: 296268	2017-02-25 19:18:04 +00:00
Nirav Dave	4a20711826	Re-enable accidentally disabled test. NFC. llvm-svn: 296266	2017-02-25 19:11:53 +00:00
Craig Topper	3b8aca2ecf	[ExecutionDepsFix] Don't make copies of LiveReg objects when collecting operands for soft instructions Summary: While collecting operands we make copies of the LiveReg objects which are stored in the LiveRegs array. If the instruction uses the same register multiple times we end up with multiple copies. Later we iterate through the collected list of LiveReg objects and merge DomainValues. In the process of doing this the merge function can change the contents of the original LiveReg object in the LiveRegs array, but not the copies that have been made. So when we get to the second usage of the register we end up seeing a stale copy of the LiveReg object. To fix this I've stopped copying and now just store a pointer to the original LiveReg object. Another option might be to avoid adding the same register to the Regs array twice, but this approach seemed simpler. The included test case exposes this bug due to an AVX-512 masked OR instruction using the same register for the passthru operand and one of the inputs to the OR operation. Fixes PR30284. Reviewers: RKSimon, stoklund, MatzeB, spatel, myatsina Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30242 llvm-svn: 296260	2017-02-25 18:12:25 +00:00
Amaury Sechet	09ecd3117e	Update codegen for various tests. NFC llvm-svn: 296257	2017-02-25 16:46:47 +00:00
Amaury Sechet	7bda8cdef2	Add test for known bits in uaddo and saddo. llvm-svn: 296255	2017-02-25 15:58:34 +00:00
Artyom Skrobov	2716910caf	The automatic CHECK: to CHECK-LABEL: conversion, back in 2013, had missed most labels in this test because they didn't end with a colon. llvm-svn: 296254	2017-02-25 15:17:16 +00:00
Nirav Dave	beabf456df	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear to require more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward load-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296252	2017-02-25 11:43:58 +00:00
Dan Gohman	82607f56bd	[WebAssembly] Add support for using a wasm global for the stack pointer. This replaces the __stack_pointer variable which was allocated in linear memory. llvm-svn: 296201	2017-02-24 23:46:05 +00:00
Krzysztof Parzyszek	0d67b10a3c	[Hexagon] Undo shift folding where it could simplify addressing mode For example, avoid (single shift): r0 = and(##536870908,lsr(r0,#3)) r0 = memw(r1+r0<<#0) in favor of (two shifts): r0 = lsr(r0,#5) r0 = memw(r1+r0<<#2) llvm-svn: 296196	2017-02-24 23:34:24 +00:00
Dan Gohman	d934cb8806	[WebAssembly] Basic support for Wasm object file encoding. With the "wasm32-unknown-unknown-wasm" triple, this allows writing out simple wasm object files, and is another step in a larger series toward migrating from ELF to general wasm object support. Note that this code and the binary format itself is still experimental. llvm-svn: 296190	2017-02-24 23:18:00 +00:00
Wei Ding	4d3d4ca1b3	AMDGPU : Replace FMAD with FMA when denormals are enabled. Differential Revision: http://reviews.llvm.org/D29958 llvm-svn: 296186	2017-02-24 23:00:29 +00:00
Stanislav Mekhanoshin	42259cf35e	Revert "Correct register pressure calculation in presence of subregs" This reverts commit r296009. It broke one out of tree target and also does not account for all partial lines added or removed when calculating PressureDiff. llvm-svn: 296182	2017-02-24 21:56:16 +00:00
Evgeniy Stepanov	00400d36c9	Disallow redefinition of section symbols. Differential Revision: https://reviews.llvm.org/D30235 llvm-svn: 296180	2017-02-24 21:44:58 +00:00
Sanjay Patel	ab08bb8da9	[ARM] add tests for alternate forms of select-of-constants; NFC llvm-svn: 296178	2017-02-24 21:36:34 +00:00
Tim Northover	ef29e7284b	GlobalISel: check for CImm rather than Imm on G_CONSTANTs. All G_CONSTANTS created by the MachineIRBuilder have an operand of type CImm (i.e. a ConstantInt), so that's what the selector needs to look for. llvm-svn: 296176	2017-02-24 21:21:38 +00:00
Sanjay Patel	cd72f156d6	[ARM] auto-generate complete checks; NFC The affected test may change with a patch I'm looking at for DAGCombiner, so I want to make sure it's not a regression. llvm-svn: 296175	2017-02-24 21:19:09 +00:00
Dan Gohman	6999c4fd28	[WebAssembly] Handle f16 in fast-isel. llvm-svn: 296172	2017-02-24 21:05:35 +00:00
Michael Kuperstein	46b131e3f8	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit ~100 defs of registers containing constants in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to the stack. The end result is that a clang-compiled python interpreter can be about 2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296149	2017-02-24 18:41:32 +00:00
Nemanja Ivanovic	195c5452d3	[PowerPC] Use subfic instruction for subtract from immediate Provide a 64-bit pattern to use SUBFIC for subtracting from a 16-bit immediate. The corresponding pattern already exists for 32-bit integers. Committing on behalf of Hiroshi Inoue. Differential Revision: https://reviews.llvm.org/D29387 llvm-svn: 296144	2017-02-24 18:16:06 +00:00
Nemanja Ivanovic	82d53ed492	[PowerPC] Use rldicr instruction for AND with an immediate if possible Emit clrrdi (extended mnemonic for rldicr) for AND-ing with masks that clear bits from the right-hand side. Committing on behalf of Hiroshi Inoue. Differential Revision: https://reviews.llvm.org/D29388 llvm-svn: 296143	2017-02-24 18:03:16 +00:00
Sanjay Patel	832b1622d8	[DAGCombiner] add missing folds for scalar select of {-1,0,1} The motivation for filling out these select-of-constants cases goes back to D24480, where we discussed removing an IR fold from add(zext) --> select. And that goes back to: https://reviews.llvm.org/rL75531 https://reviews.llvm.org/rL159230 The idea is that we should always canonicalize patterns like this to a select-of-constants in IR because that's the smallest IR and the best for value tracking. Note that we currently do the opposite in some cases (like the cases in this patch). I.e., the proposed folds in this patch already exist in InstCombine today: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151 As this patch shows, most targets generate better machine code for simple ext/add/not ops rather than a select of constants. So the follow-up steps to make this less of a patchwork of special-case folds and missing IR canonicalization: 1. Have DAGCombiner convert any select of constants into ext/add/not ops. 2. Have InstCombine canonicalize in the other direction (create more selects). Differential Revision: https://reviews.llvm.org/D30180 llvm-svn: 296137	2017-02-24 17:17:33 +00:00
Simon Dardis	ae6f2bcb25	Recommit "[mips] Fix atomic compare and swap at O0." This time with the missing files. Similar to PR/25526, fast-regalloc introduces spills at the end of basic blocks. When this occurs in between an ll and sc, the store can cause the atomic sequence to fail. This patch fixes the issue by introducing more pseudos to represent atomic operations and moving their lowering to after the expansion of postRA pseudos. This resolves PR/32020. Thanks to James Cowgill for reporting the issue! Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D30257 llvm-svn: 296134	2017-02-24 16:32:18 +00:00
Simon Dardis	3c58c18ff0	Revert "[mips] Fix atomic compare and swap at O0." This reverts r296132. I forgot to include the tests. llvm-svn: 296133	2017-02-24 16:30:27 +00:00
Simon Dardis	cf0e06d375	[mips] Fix atomic compare and swap at O0. Similar to PR/25526, fast-regalloc introduces spills at the end of basic blocks. When this occurs in between an ll and sc, the store can cause the atomic sequence to fail. This patch fixes the issue by introducing more pseudos to represent atomic operations and moving their lowering to after the expansion of postRA pseudos. This resolves PR/32020. Thanks to James Cowgill for reporting the issue! Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D30257 llvm-svn: 296132	2017-02-24 16:27:45 +00:00
Daniel Sanders	066ebbfd46	[globalisel] Decouple src pattern operands from dst pattern operands. Summary: This isn't testable for AArch64 by itself so this patch also adds support for constant immediates in the pattern and physical register uses in the result. The new IntOperandMatcher matches the constant in patterns such as '(set $rd:GPR32, (G_XOR $rs:GPR32, -1))'. It's always safe to fold immediates into an instruction so this is the first rule that will match across multiple BB's. The Renderer hierarchy is responsible for adding operands to the result instruction. Renderers can copy operands (CopyRenderer) or add physical registers (in particular %wzr and %xzr) to the result instruction in any order (OperandMatchers now import the operand names from SelectionDAG to allow renderers to access any operand). This allows us to emit the result instruction for: %1 = G_XOR %0, -1 --> %1 = ORNWrr %wzr, %0 %1 = G_XOR -1, %0 --> %1 = ORNWrr %wzr, %0 although the latter is untested since the matcher/importer has not been taught about commutativity yet. Added BuildMIAction which can build new instructions and mutate them where possible. W.r.t the mutation aspect, MatchActions are now told the name of an instruction they can recycle and BuildMIAction will emit mutation code when the renderers are appropriate. They are appropriate when all operands are rendered using CopyRenderer and the indices are the same as the matcher. This currently assumes that all operands have at least one matcher. Finally, this change also fixes a crash in AArch64InstructionSelector::select() caused by an immediate operand passing isImm() rather than isCImm(). This was uncovered by the other changes and was detected by existing tests. Depends on D29711 Reviewers: t.p.northover, ab, qcolombet, rovka, aditya_nandakumar, javed.absar Reviewed By: rovka Subscribers: aemerson, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29712 llvm-svn: 296131	2017-02-24 15:43:30 +00:00
Diana Picus	3b99c64ba1	[ARM] GlobalISel: Select G_STORE Same as selecting G_LOAD. llvm-svn: 296122	2017-02-24 14:01:27 +00:00
Diana Picus	b31a259198	Minor test fix The test was using a size of 8 for loading/storing pointers. It should be 4. llvm-svn: 296120	2017-02-24 13:27:55 +00:00
Diana Picus	1f432f995a	[ARM] GlobalISel: Add reg bank mappings for stores Same as the ones for loads. llvm-svn: 296115	2017-02-24 13:07:25 +00:00
Diana Picus	a2b632a353	[ARM] GlobalISel: Legalize stores Allow the same types that we allow for loads. llvm-svn: 296108	2017-02-24 11:28:24 +00:00
Diana Picus	c21d1e5d94	Revert "[ARM] GlobalISel: Legalize stores" This reverts commit r296103 because the test broke on one of the bots. Sorry! llvm-svn: 296104	2017-02-24 10:35:39 +00:00
Diana Picus	a5f1cfd1a7	[ARM] GlobalISel: Legalize stores Allow the same types that we allow for loads. llvm-svn: 296103	2017-02-24 10:19:23 +00:00
Craig Topper	f2529c188b	[AVX-512] Remove lzcnt intrinsics and autoupgrade them to generic ctlz intrinsics with select. Clang has been emitting ctlz intrinsics for a while now. llvm-svn: 296091	2017-02-24 05:35:04 +00:00
Craig Topper	dc13344150	[AVX-512] Move lzcnt and conflict intrinsic tests to avx512cd intrinsic test file since that's their feature. llvm-svn: 296090	2017-02-24 05:34:59 +00:00
Craig Topper	700edc3275	[AVX-512] Use update_llc_test_checks.py to generate a test. llvm-svn: 296089	2017-02-24 05:34:57 +00:00
Petr Hosek	a7d5916308	[Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack The Fuchsia ABI defines slots from the thread pointer where the stack-guard value for stack-protector, and the unsafe stack pointer for safe-stack, are stored. This parallels the Android ABI support. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D30237 llvm-svn: 296081	2017-02-24 03:10:10 +00:00
Eli Friedman	7e0ce82c4a	Add some testcases for bitfields with illegal widths. clang will generate IR like this for input using packed bitfields; very simple semantically, but it's a bit tricky to actually generate good code. llvm-svn: 296080	2017-02-24 03:04:11 +00:00
Eli Friedman	8f34746c72	Fix old testcase for dead store to match the original intent. The x86 backend has a special case for load+xor+store, which isn't really what this is trying to test. llvm-svn: 296077	2017-02-24 02:58:49 +00:00
Adam Nemet	f373e68bfc	[LazyMachineBFI] Add testcase This is based on Justin's testcase and checking whether BFI is not populated in case hotness is off. This is a patch meant on top of Justin's patch to enable Machine opt-remarks in the AsmPrinter (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170130/426595.html) Differential Revision: https://reviews.llvm.org/D29837 llvm-svn: 296065	2017-02-24 01:22:55 +00:00
Michael Kuperstein	581c9f4b20	Revert r296060 to pacify bots. llvm-svn: 296064	2017-02-24 01:22:19 +00:00
Justin Bogner	d75fd0988d	OptDiag: Add test for r296053 Forgot to commit this with the change. llvm-svn: 296061	2017-02-24 01:13:09 +00:00
Michael Kuperstein	12e79d5002	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit ~100 defs of registers containing constants in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to the stack. The end result is that a clang-compiled python interpreter can be about 2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296060	2017-02-24 00:56:21 +00:00
Artem Belevich	620db1f3dd	[NVPTX] Added support for .f16x2 instructions. This patch enables support for .f16x2 operations. Added new register type Float16x2. Added support for .f16x2 instructions. Added handling of vectorized loads/stores of v2f16 values. Differential Revision: https://reviews.llvm.org/D30057 Differential Revision: https://reviews.llvm.org/D30310 llvm-svn: 296032	2017-02-23 22:38:24 +00:00
Tim Northover	063a56e81c	ARM: make sure FastISel bails on f64 operations for Cortex-M4. FastISel wasn't checking the isFPOnlySP subtarget feature before emitting double-precision operations, so it got completely invalid CodeGen for doubles on Cortex-M4F. The normal ISel testing wasn't spectacular either so I added a second RUN line to improve that while I was in the area. llvm-svn: 296031	2017-02-23 22:35:00 +00:00
Krzysztof Parzyszek	128e191eac	[Hexagon] Handle saturations in Hexagon bit tracker llvm-svn: 296026	2017-02-23 22:11:52 +00:00
Evgeniy Stepanov	ee2d77f6d6	Disable TLS for stack protector on Android API<17. The TLS slot did not exist back then. llvm-svn: 296014	2017-02-23 21:06:35 +00:00
Ahmed Bougacha	ae9dadecf3	[GlobalISel] Emit opt remarks on isel fallbacks. Having more fine-grained information on the specific construct that caused us to fall back is valuable for large-scale data collection. We still have the fallback warning, that's also used for FastISel. We still need to remove the fallback warning, and teach FastISel to also emit remarks (it currently has a combination of the warning, stats, and debug prints: the remarks could unify all three). The abort-on-fallback path could also be better handled using remarks: one could imagine a "-Rpass-error", analogous to "-Werror", which would promote missed/failed remarks to errors. It's not clear whether that would be useful for other remarks though, so we're not there yet. llvm-svn: 296013	2017-02-23 21:05:42 +00:00
Stanislav Mekhanoshin	ce3ddd2de4	Correct register pressure calculation in presence of subregs If a subreg is used in an instruction it counts as a whole superreg for the purpose of register pressure calculation. This patch corrects improper register pressure calculation by examining operand's lane mask. Differential Revision: https://reviews.llvm.org/D29835 llvm-svn: 296009	2017-02-23 20:19:44 +00:00
Krzysztof Parzyszek	2cfc7a48de	[Hexagon] Avoid IMPLICIT_DEFs as new-value producers llvm-svn: 295997	2017-02-23 17:47:34 +00:00
Jan Vesely	70293a045b	AMDGPU/SI: Fix trunc i16 pattern Hit on ASICs that support 16bit instructions. Differential Revision: https://reviews.llvm.org/D30281 llvm-svn: 295990	2017-02-23 16:12:21 +00:00
Krzysztof Parzyszek	af5ff65d67	[Hexagon] Patterns for CTPOP, BSWAP and BITREVERSE llvm-svn: 295981	2017-02-23 15:02:09 +00:00
Diana Picus	a8cb0cd8f2	[ARM] GlobalISel: Lower call returns Introduce a common ValueHandler for call returns and formal arguments, and inherit two different versions for handling the differences (at the moment the only difference is the way physical registers are marked as used). llvm-svn: 295973	2017-02-23 14:18:41 +00:00
Diana Picus	a606713c33	[ARM] GlobalISel: Lower call parameters in regs Add support for lowering calls with parameters that can fit into regs. Use the same ValueHandler that we used for function returns, but rename it to match its new, extended purpose. llvm-svn: 295971	2017-02-23 13:25:43 +00:00
Ayman Musa	4b2c968c43	[X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register class of their first input when creating node in fast-isel. (Quick fix to buildbot failure after rL295940 commit). llvm-svn: 295970	2017-02-23 13:15:44 +00:00
Kristof Beyls	5ac6adbb6d	Fix assertion failure in ARMConstantIslandPass. The ARMConstantIslandPass didn't have support for handling accesses to constant island objects through ARM::t2LDRBpci instructions. This adds support for that. This fixes PR31997. llvm-svn: 295964	2017-02-23 12:24:55 +00:00
Ayman Musa	6e670cf44f	[X86][AVX512] Change VCVTSS2SD and VCVTSD2SS node types to keep consistency between VEX/EVEX versions. AVX versions of the converts work on f32/f64 types, while AVX512 versions work on vectors. Differential Revision: https://reviews.llvm.org/D29988 llvm-svn: 295940	2017-02-23 07:24:21 +00:00
Matt Arsenault	a9e16e6597	AMDGPU: Add another BFE pattern This is the pattern that falls out of the instruction's definition if offset == 0. llvm-svn: 295912	2017-02-23 00:23:43 +00:00
Matt Arsenault	79a45db7f5	AMDGPU: Use clamp with f64 llvm-svn: 295908	2017-02-22 23:53:37 +00:00
Matt Arsenault	d5c6515b68	AMDGPU: Fold FP clamp as modifier bit The manual is unclear on the details of this. It's not clear to me if denormals are not allowed with clamp, or if that is only omod. Not allowing denorms for fp16 or fp64 isn't useful so I also question if that is really a restriction. Same with whether this is valid without IEEE mode enabled. llvm-svn: 295905	2017-02-22 23:27:53 +00:00
Wei Ding	f2cce02eb2	AMDGPU : Update TrapCode based on Trap Handler ABI. Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295904	2017-02-22 23:22:19 +00:00
Matt Arsenault	f5262256a1	AMDGPU: Add replacement bfe intrinsics llvm-svn: 295899	2017-02-22 23:04:58 +00:00
Dylan McKay	19d9533496	[AVR] Disable integrated assembler for a few tests Fixes the build. llvm-svn: 295895	2017-02-22 22:41:13 +00:00
Krzysztof Parzyszek	ab57c2bad3	[Hexagon] Implement @llvm.readcyclecounter() llvm-svn: 295892	2017-02-22 22:28:47 +00:00
Matt Arsenault	7b6c5d28f5	AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPR This should avoid reporting any stack needs to be allocated in the case where no stack is truly used. An unused stack slot is still left around in other cases where there are real stack objects but no spilling occurs. llvm-svn: 295891	2017-02-22 22:23:32 +00:00
Krzysztof Parzyszek	65971d97b0	[Hexagon] Add intrinsics for masked vector stores Patch by Harsha Jagasia. llvm-svn: 295879	2017-02-22 21:23:09 +00:00
Matt Arsenault	93e65ea733	AMDGPU: Don't look at chain users when adjusting writemask Fixes not adjusting using new intrinsics with chains. llvm-svn: 295878	2017-02-22 21:16:41 +00:00
Matt Arsenault	707780b420	AMDGPU: Always allocate emergency stack slot at offset 0 This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. llvm-svn: 295877	2017-02-22 21:05:25 +00:00
Matt Arsenault	61ec6a03ca	AMDGPU: Change exp with compr bit printing llvm-svn: 295873	2017-02-22 20:37:12 +00:00
Wei Ding	6ade56e0a0	Revert "AMDGPU : Update TrapCode based on Trap Handler ABI." This reverts commit r295867. llvm-svn: 295871	2017-02-22 20:29:22 +00:00
Wei Ding	4991d3570f	AMDGPU : Update TrapCode based on Trap Handler ABI. Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295867	2017-02-22 20:05:06 +00:00
Matthias Braun	e8a0f5ef3b	Bring back 2>&1 redirection for this test llvm-svn: 295864	2017-02-22 19:16:33 +00:00
Geoff Berry	6bb79157dd	[AArch64] Extend AArch64RedundantCopyElimination to do simple copy propagation. Summary: Extend AArch64RedundantCopyElimination to catch cases where the register that is known to be zero is COPY'd in the predecessor block. Before this change, this pass would catch cases like: CBZW %W0, <BB#1> BB#1: %W0 = COPY %WZR // removed After this change, cases like the one below are also caught: %W0 = COPY %W1 CBZW %W1, <BB#1> BB#1: %W0 = COPY %WZR // removed This change results in a 4% increase in static copies removed by this pass when compiling the llvm test-suite. It also fixes regressions caused by doing post-RA copy propagation (a separate change to be put up for review shortly). Reviewers: junbuml, mcrosier, t.p.northover, qcolombet, MatzeB Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D30113 llvm-svn: 295863	2017-02-22 19:10:45 +00:00
Matthias Braun	f1141285eb	MIRTests: Remove unnecessary 2>&1 redirection llc mir output goes to stdout nowadays, so the 2>&1 is not necessary anymore for most tests. llvm-svn: 295859	2017-02-22 18:47:41 +00:00
Dan Gohman	a63e8eb138	[WebAssembly] Configure codegen to legalize f16 values. llvm-svn: 295850	2017-02-22 16:28:00 +00:00
Bill Seurer	8e48f416ad	[DAGCombiner] revert r295336 r295336 causes a bootstrapped clang to fail for many compilations on powerpc BE. See http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315 for example. Reverting as per the developer's request. llvm-svn: 295849	2017-02-22 16:27:33 +00:00
Igor Breger	f7359d893a	[X86][GlobalISel] Initial implementation, select G_ADD gpr, gpr Summary: Initial implementation for X86InstructionSelector. Handle selection of COPY and G_ADD/G_SUB gpr, gpr. Reviewers: qcolombet, rovka, zvi, ab Reviewed By: rovka Subscribers: mgorny, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29816 llvm-svn: 295824	2017-02-22 12:25:09 +00:00
Simon Pilgrim	07056a06a0	[X86] Regenerate CSE test with codegen instead of just the instruction count llvm-svn: 295819	2017-02-22 10:12:46 +00:00
Roger Ferrer Ibanez	56db97d4de	[ARM] Fix constant islands pass. The pass tries to fix a spill of LR that turns out to be unnecessary. So it removes the tPOP but forgets to remove the tPUSH. This causes the stack to be misaligned upon returning from the function. Thus, remove the tPUSH as well in this case. Differential Revision: https://reviews.llvm.org/D30207 llvm-svn: 295816	2017-02-22 09:06:21 +00:00
Javed Absar	b672722810	[ARM] Classification Improvements to ARM Sched-Models. NFCI. This patch adds missing sched classes for Thumb2 instructions. This has been missing so far, and as a consequence, machine scheduler models for individual sub-targets have tended to be larger than they needed to be. These patches should help write schedulers better and faster in the future for ARM sub-targets. Reviewer: Diana Picus Differential Revision: https://reviews.llvm.org/D29953 llvm-svn: 295811	2017-02-22 07:22:57 +00:00
Craig Topper	56d4022997	[AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION. I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others. Differential revision: https://reviews.llvm.org/D30186 llvm-svn: 295810	2017-02-22 06:54:18 +00:00
Matt Arsenault	1f17c66890	AMDGPU: Add cvt.pkrtz intrinsic Convert llvm.SI.packf16 test uses llvm-svn: 295797	2017-02-22 00:27:34 +00:00
Matt Arsenault	3ea06336fc	AMDGPU: Remove some uses of llvm.SI.export in tests Merge some of the old, smaller tests into more complete versions. llvm-svn: 295792	2017-02-22 00:02:21 +00:00
Matt Arsenault	9417505f7d	AMDGPU: Remove llvm.AMDGPU.clamp intrinsic llvm-svn: 295789	2017-02-21 23:46:04 +00:00
Matt Arsenault	2fdf2a1a18	AMDGPU: Redefine clamp node as clamp 0.0-1.0 Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788	2017-02-21 23:35:48 +00:00
Artem Belevich	29bbdc1c32	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. Original code only used vector loads/stores for explicit vector arguments. It could also do more loads/stores than necessary (e.g. v5f32 would touch 8 f32 values). Aggregate types were loaded one element at a time, even the vectors contained within. This change attempts to generalize (and simplify) parameter space loads/stores so that vector loads/stores can be used more broadly. Functionality of the patch has been verified by compiling the thrust test suite and manually checking the differences between PTX generated by llvm with and without the patch. General algorithm: * ComputePTXValueVTs() flattens input/output argument into a flat list of scalars to load/store and returns their types and offsets. * VectorizePTXValueVTs() uses that data to create a vectorization plan which returns an array of flags marking boundaries of vectorized load/stores. Scalars are represented as 1-element vectors. * Code that generates loads/stores implements a simple state machine that constructs a vector according to the plan. Differential Revision: https://reviews.llvm.org/D30011 llvm-svn: 295784	2017-02-21 22:56:05 +00:00
Evandro Menezes	bc9a13db0e	[AArch64] Add test case for fusion of literal generation Add test case from https://reviews.llvm.org/D28698 that was somehow lost in transit. llvm-svn: 295775	2017-02-21 22:16:09 +00:00
Evandro Menezes	ec330cc283	[AArch64] Add test case for fusion of AES crypto operations Add test case from https://reviews.llvm.org/D28491 that was somehow lost in transit. llvm-svn: 295774	2017-02-21 22:16:06 +00:00
Evgeniy Stepanov	1fd19c6e5d	Fix PR31896. Address of an alias of a global with offset is incorrectly lowered as an address of the global (i.e. ignoring offset). llvm-svn: 295762	2017-02-21 20:17:34 +00:00
Matt Arsenault	f3ffe75a1b	AMDGPU: Remove dead declarations in tests llvm-svn: 295757	2017-02-21 19:31:33 +00:00
Matt Arsenault	b2e6811ec1	AMDGPU: Remove dead declarations from MIR tests llvm-svn: 295755	2017-02-21 19:27:36 +00:00
Matt Arsenault	c2a44e4c3c	AMDGPU: Remove llvm.AMDGPU.flbit intrinsic llvm-svn: 295754	2017-02-21 19:27:33 +00:00
Matt Arsenault	e0bf7d02f0	AMDGPU: Don't use stack space for SGPR->VGPR spills Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. llvm-svn: 295753	2017-02-21 19:12:08 +00:00
Geoff Berry	5d534b6a11	[CodeGenPrepare] Sink and duplicate more 'and' instructions. Summary: Rework the code that was sinking/duplicating (icmp and, 0) sequences into blocks where they were being used by conditional branches to form more tbz instructions on AArch64. The new code is more general in that it just looks for 'and's that have all icmp 0's as users, with a target hook used to select which subset of 'and' instructions to consider. This change also enables 'and' sinking for X86, where it is more widely beneficial than on AArch64. The 'and' sinking/duplicating code is moved into the optimizeInst phase of CodeGenPrepare, where it can take advantage of the fact that OptimizeCmpExpression has already sunk/duplicated any icmps into the blocks where they are used. One minor complication from this change is that optimizeLoadExt needed to be updated to always mark 'and's it has determined should be in the same block as their feeding load in the InsertedInsts set to avoid an infinite loop of hoisting and sinking the same 'and'. This change fixes a regression on X86 in the tsan runtime caused by moving GVNHoist to a later place in the optimization pipeline (see PR31382). Reviewers: t.p.northover, qcolombet, MatzeB Subscribers: aemerson, mcrosier, sebpop, llvm-commits Differential Revision: https://reviews.llvm.org/D28813 llvm-svn: 295746	2017-02-21 18:53:14 +00:00
Simon Pilgrim	5afda30930	[X86][AVX512] Update VPBROADCASTQ test to combine from VPERMQ instead of VPERMI2Q. VPERMI2Q doesn't have shuffle decoding from re-materializable constants. llvm-svn: 295736	2017-02-21 17:04:11 +00:00
Simon Pilgrim	f321ab6dd2	[X86][AVX] Rename shuffle combine tests to show combined shuffle type. NFCI. llvm-svn: 295735	2017-02-21 16:45:31 +00:00
Simon Pilgrim	791955819c	[X86][AVX2] Fix VPBROADCASTQ folding on 32-bit targets. As i64 isn't a value type on 32-bit targets, we need to fold the VZEXT_LOAD into VPBROADCASTQ. llvm-svn: 295733	2017-02-21 16:41:44 +00:00
Simon Pilgrim	f98a32fa7f	[X86][AVX2] Add AVX512 test targets to AVX2 shuffle combines. llvm-svn: 295731	2017-02-21 16:29:28 +00:00
Simon Pilgrim	4cc6dd0cf6	[X86][AVX] Add tests showing missed VPBROADCASTQ folding on 32-bit targets. As i64 isn't a value type on 32-bit targets, we fail to fold the VZEXT_LOAD into VPBROADCASTQ. Also shows that we're not decoding VPERMIV3 shuffles very well.... llvm-svn: 295729	2017-02-21 16:05:35 +00:00
Simon Pilgrim	3546156122	[X86][SSE] Prefer to combine shuffles to VZEXT over VZEXT_MOVL. This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns. llvm-svn: 295723	2017-02-21 15:09:00 +00:00
Simon Pilgrim	0c094f504c	[X86][SSE] Added SSE41 shuffle combining test file. Currently just contains one case where we combine to VZEXT_MOVL instead of VZEXT which would avoid the need for a zero vector to be generated llvm-svn: 295721	2017-02-21 14:51:15 +00:00
Igor Breger	812f319794	[AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index. Differential Revision: https://reviews.llvm.org/D30189 llvm-svn: 295718	2017-02-21 14:01:25 +00:00
Diana Picus	613b65696a	[ARM] GlobalISel: Lower calls to void() functions For now, we hardcode a BLX instruction, and generate an ADJCALLSTACKDOWN/UP pair with amount 0. llvm-svn: 295716	2017-02-21 11:33:59 +00:00
Craig Topper	fe78d95a49	[X86] Remove ssse3 intrinsic tests from the avx intrinsics test file. They are all covered by the SSSE3 intrinsics test with SSSE3, AVX, and AVX512 command lines. llvm-svn: 295708	2017-02-21 08:06:08 +00:00
Craig Topper	55e2de869d	[X86] Remove sse4.2 intrinsic tests from the avx intrinsics test file. Fix some other consistency issues. They are all covered by the SSE4.2 intrinsics test with SSE4.2, AVX, and AVX512 command lines. Merge sse42.ll into the other intrinsics test. Rename sse42_64.ll to be named like other intrinsic tests. llvm-svn: 295707	2017-02-21 08:06:05 +00:00
Craig Topper	25191b4ac3	[X86] Remove sse4.1 intrinsic tests from the avx intrinsics test file. They are all covered by the SSE4.1 intrinsics test with SSE4.1, AVX, and AVX512 command lines. llvm-svn: 295706	2017-02-21 08:06:02 +00:00
Craig Topper	da8e6f1337	[X86] Remove sse3 intrinsic tests from the avx intrinsics test file. They are all covered by the SSE3 intrinsics test with SSE2, AVX, and AVX512 command lines. llvm-svn: 295705	2017-02-21 08:05:59 +00:00
Craig Topper	002549b8be	[X86] Remove aes intrinsic tests from the avx intrinsics test file. They are all covered by the AES intrinsics test with a legacy command line and an AVX command line. llvm-svn: 295702	2017-02-21 07:32:18 +00:00
Craig Topper	2a71fd95e8	[X86] Add an AVX command line and regenerate AES intrinsics test using the update_llc_test_checks.py llvm-svn: 295701	2017-02-21 07:32:14 +00:00
Craig Topper	dbf6f367e9	[X86] Remove sse2 intrinsic tests from the avx intrinsics test file. They are all covered by the SSE2 intrinsics test with SSE2, AVX, and AVX512 command lines. Also remove an unneeded lfence intrinsic test since it was already covered. llvm-svn: 295700	2017-02-21 07:32:11 +00:00
Craig Topper	0d47fdcf3f	[X86] Remove sse1 intrinsic tests from the avx intrinsics test file. They are all covered by the SSE intrinsics test with SSE, AVX, and AVX512 command lines. Also remove an unneeded sfence intrinsic test since it was already covered. llvm-svn: 295699	2017-02-21 07:32:03 +00:00
Craig Topper	d88389aa7e	[X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs Summary: Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency. This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do. Reviewers: zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30181 llvm-svn: 295697	2017-02-21 06:39:13 +00:00
Craig Topper	d9fe664868	[AVX-512] Use sse_load_f32/f64 in place of scalar_to_vector and scalar load in some patterns. llvm-svn: 295693	2017-02-21 04:26:10 +00:00
Craig Topper	63b7d71844	[AVX-512] Add test cases showing failure to fold zero extending scalar loads in scalar intrinsics without the peephole pass. llvm-svn: 295692	2017-02-21 04:26:07 +00:00
Taewook Oh	4cf5c1087c	[BranchFolding] Update debug location along with the update of branch instruction. Summary: Currently, BranchFolder drops DebugLoc for branch instructions in some places. For example, for the test code attached, the branch instruction of 'entry' block has a DILocation of ``` !12 = !DILocation(line: 6, column: 3, scope: !11) ``` , but this information is gone when the block is lowered because BranchFolder misses it. This patch is a fix for this issue. Reviewers: qcolombet, aprantl, craig.topper, MatzeB Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29902 llvm-svn: 295684	2017-02-21 00:12:38 +00:00
Craig Topper	e8beaff021	[X86] Add additional check lines to one of the rotate tests. llvm-svn: 295682	2017-02-20 23:38:51 +00:00
Craig Topper	a80f90e66b	[X86] FileCheckize one of the rotate tests. llvm-svn: 295681	2017-02-20 23:38:48 +00:00
Craig Topper	bb10c0f1ec	[X86] FileCheckize one of the rotate tests. llvm-svn: 295676	2017-02-20 19:44:10 +00:00
Craig Topper	2012dda9a0	[AVX-512] Add a few more patterns for selecting masked vpternlog with broadcast loads where the passthru operand is not operand 0. llvm-svn: 295673	2017-02-20 17:44:09 +00:00
Simon Pilgrim	e9a8145adb	[X86][SSE] Regenerate extracted bitcasted constant tests and add 32-bit test target llvm-svn: 295669	2017-02-20 15:57:14 +00:00
Simon Pilgrim	72d666e443	[X86][SSE] Regenerate re-materialized store tests and add 64-bit test target llvm-svn: 295666	2017-02-20 15:20:37 +00:00
Simon Pilgrim	5a33d1c266	[X86][SSE] Regenerate vselect widening tests and add 32-bit test target llvm-svn: 295665	2017-02-20 15:16:43 +00:00
Igor Breger	fda32d266a	[X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector. It's more profitable to go through memory (1 cycle throughput) than to use a VMOVD + VPERMV/PSHUFB sequence (2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index. The IACA tool was used to get a performance estimate (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer). For example, for the var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles. Removing the VINSERT node, we don't need it any more. Differential Revision: https://reviews.llvm.org/D29690 llvm-svn: 295660	2017-02-20 14:16:29 +00:00
Simon Pilgrim	5910ebe720	[X86][AVX512] Add ASHR v2i64/v4i64 support without VLX Use v8i64 ASHR instructions if we don't have VLX. Differential Revision: https://reviews.llvm.org/D28537 llvm-svn: 295656	2017-02-20 12:16:38 +00:00
Sanne Wouda	47eb9723de	[ARM] Add a div regression test for Cortex-M23 Summary: This file was missed in the commit for Cortex-M23 and Cortex-M33 support. See https://reviews.llvm.org/D29073?id=85814 . Reviewers: rengolin, javed.absar, samparker Reviewed By: samparker Subscribers: llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D30162 llvm-svn: 295655	2017-02-20 12:05:07 +00:00
Simon Pilgrim	50b958c07a	[SelectionDAG] Add scalarization support for ISD::*_EXTEND_VECTOR_INREG opcodes. Thanks to Mikael Holmén for the initial test case llvm-svn: 295652	2017-02-20 11:55:58 +00:00
Craig Topper	c6c68f5958	[AVX-512] Add more patterns to fold masked VPTERNLOG with load when the passthru isn't operand 0. llvm-svn: 295640	2017-02-20 07:00:40 +00:00
Craig Topper	5aef828ba7	[AVX-512] Add tests for missed opportunities to fold masked VPTERNLOG with load when the passthru op isn't operand 0. llvm-svn: 295639	2017-02-20 07:00:37 +00:00
Craig Topper	a5fa2e40f9	[AVX-512] Fix mistake in the immediate swizzle for some of the VPTERNLOG patterns. llvm-svn: 295638	2017-02-20 07:00:34 +00:00
Craig Topper	cb5b45cc36	[AVX-512] Use a better immediate in the VPTERNLOG commuting tests so it's easier to spot bad swizzling. llvm-svn: 295637	2017-02-20 07:00:31 +00:00
Craig Topper	5b4e36aafa	[AVX-512] Add more VPTERNLOG patterns to enable folding of broadcast loads that aren't in operand 2. llvm-svn: 295634	2017-02-20 02:47:42 +00:00
Craig Topper	c184b671d9	[X86] Use memory form of shift right by 1 when the rotl immediate is one less than the operation size. An earlier commit already did this for the register form. llvm-svn: 295626	2017-02-20 00:37:23 +00:00
Craig Topper	0f14411b57	[X86] Add test cases showing missed opportunities to use rotate right by 1 instructions when operation reads/writes memory. llvm-svn: 295625	2017-02-20 00:37:20 +00:00
Craig Topper	489057715e	[AVX-512] Disable peephole optimizations on the VPTERNLOG commute test. Add new patterns to enable isel to fold the loads on its own. llvm-svn: 295616	2017-02-19 21:32:15 +00:00
Simon Pilgrim	d590de2998	[X86][SSE] Use getTargetConstantBitsFromNode to find zeroable shuffle elements. Replaces existing approach that could only search BUILD_VECTOR nodes. Requires getTargetConstantBitsFromNode to discriminate cases with all/partial UNDEF bits in each element - this should also be useful when we get around to supporting getTargetShuffleMaskIndices with UNDEF elements. llvm-svn: 295613	2017-02-19 19:40:31 +00:00
Craig Topper	4e794c71a6	[AVX-512] Add patterns to recognize masked vpternlog when the passthrough operand is not operand 0. This uses a SDNodeXForm to swizzle the appropriate immediate bits to allow this to be matched. llvm-svn: 295612	2017-02-19 19:36:58 +00:00
Craig Topper	ab1afa85ba	[AVX-512] Add test cases that show failure to select masked VPTERNLOG when a select is used to force the passthru operand to be not operand 0. llvm-svn: 295611	2017-02-19 19:36:54 +00:00
Simon Pilgrim	4271186f9c	[X86][SSE] Enable initial support for domain crossing at high shuffle combine depths. As discussed on D27692, this permits another domain to be used to combine a shuffle at high depths. We currently set the required depth at 4 or more combined shuffles, this is probably too high for most targets but is a good starting point and already helps avoid a number of costly variable shuffles. llvm-svn: 295608	2017-02-19 17:19:38 +00:00
Craig Topper	218d1a020e	[AVX-512] Add broadcast VPTERNLOG instructions to special case commuting switch. The instructions are marked commutable, but without special handling we don't get the immediate correct. While here also remove the masked memory forms that aren't commutable. llvm-svn: 295602	2017-02-19 08:03:26 +00:00
Craig Topper	94de4b9330	[AVX-512] Add patterns to show missed opportunities for folding vpternlog with broadcast loads. Also demonstrates a bug in the commuting of broadcast vpternlog instructions when we are able to select them. llvm-svn: 295601	2017-02-19 08:03:23 +00:00
NAKAMURA Takumi	486dfe11af	llvm/test/CodeGen/AMDGPU/r600.alu-limits.ll should require +Asserts. This would run into infinite loop anyways with -Asserts. llvm-svn: 295591	2017-02-19 02:31:06 +00:00
Craig Topper	de10312bea	Recommit "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR." Clang has now been fixed to not use these intrinsics. llvm-svn: 295571	2017-02-18 21:50:58 +00:00
Sanjay Patel	dc8a24ea4c	[x86] remove stale comments from tests; NFC llvm-svn: 295569	2017-02-18 21:07:37 +00:00
Sanjay Patel	12c2093e1e	[x86] fold sext (xor Bool, -1) --> sub (zext Bool), 1 This is the same transform that is currently used for: select Bool, 0, -1 llvm-svn: 295568	2017-02-18 21:03:28 +00:00

19938 Commits