llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	2dca3b287b	[X86] Make the FMA3 instruction names consistent between VEX and EVEX encoded versions. This places the 132/213/231 form number in front of the SS/SD/PS/PD. Move the Y for 256-bit versions to be after the PS/PD. Change the AVX512 scalar forms to include a Z in their name. This new format should be consistent with the general naming of instructions. llvm-svn: 276559	2016-07-24 08:26:38 +00:00
Simon Pilgrim	9e99dd9e8b	[X86][SSE] Added float widened broadcast tests llvm-svn: 276535	2016-07-23 21:24:02 +00:00
Simon Pilgrim	8d18969716	[X86][SSE] Added more widened broadcast tests Added more vXi16 and vXi8 tests llvm-svn: 276534	2016-07-23 21:15:31 +00:00
Simon Pilgrim	b9e47a8cd0	[X86][SSE] Added tests where we should be trying to widen a load+splat into a broadcast llvm-svn: 276527	2016-07-23 16:19:17 +00:00
Simon Pilgrim	8aa6f34455	[X86][SSE] Regenerated uitofp <2 x i32> -> <2 x float> conversion tests Demonstrate difference in codegen discussed on PR14760 llvm-svn: 276526	2016-07-23 15:55:42 +00:00
Craig Topper	b6519db90d	[AVX512] Implement commuting support for EVEX encoded FMA3 instructions. llvm-svn: 276521	2016-07-23 07:16:56 +00:00
Tom Stellard	b8253c88b6	Revert "[AMDGPU] Emit read-only data to .rodata for hsa" This reverts commit r276298. Data stored in .rodata can have a negative offset from .text, but we don't support negative values in relocations yet. This caused a regression in one of the amp conformance tests: 5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01 llvm-svn: 276498	2016-07-22 23:46:40 +00:00
Tim Northover	98a56eb7f4	GlobalISel: allow multiple types on MachineInstrs. llvm-svn: 276481	2016-07-22 22:13:36 +00:00
Tim Northover	33b07d6725	GlobalISel: implement legalization pass, with just one transformation. This adds the actual MachineLegalizeHelper to do the work and a trivial pass wrapper that legalizes all instructions in a MachineFunction. Currently the only transformation supported is splitting up a vector G_ADD into one acting on smaller vectors. llvm-svn: 276461	2016-07-22 20:03:43 +00:00
Anna Thomas	0be4a0e6a4	Invariant start/end intrinsics overloaded for address space Summary: The llvm.invariant.start and llvm.invariant.end intrinsics currently support specifying invariant memory objects only in the default address space. With this change, these intrinsics are overloaded for any address space for memory objects and we can use these llvm invariant intrinsics in non-default address spaces. Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr) This overloaded intrinsic is needed for representing final or invariant memory in managed languages. Reviewers: apilipenko, reames Subscribers: llvm-commits llvm-svn: 276447	2016-07-22 17:49:40 +00:00
Matt Arsenault	e2fe67b951	AMDGPU: Remove redundant test llvm-svn: 276439	2016-07-22 17:01:36 +00:00
Matt Arsenault	3c07c813c0	AMDGPU: Fix groupstaticsize for large LDS The size can exceed s_movk_i32's limit, and we don't want to use it this early since it inhibits optimizations. This should probably be merged to the release branch. llvm-svn: 276438	2016-07-22 17:01:33 +00:00
Matt Arsenault	8d718dcfda	AMDGPU: Add HSA dispatch id intrinsic llvm-svn: 276437	2016-07-22 17:01:30 +00:00
Matt Arsenault	7fb961f3e6	AMDGPU: Fix i1 fp_to_int R600's i1 fp_to_uint selected but was incorrect according to what instcombine constant folds to. llvm-svn: 276435	2016-07-22 17:01:21 +00:00
Tim Northover	bd5054602e	GlobalISel: implement alloca instruction llvm-svn: 276433	2016-07-22 16:59:52 +00:00
Simon Pilgrim	820f87a72d	[SelectionDAG] Optimization of BITREVERSE legalization for power-of-2 integer scalar/vector types An extension of D19978, this patch replaces the default BITREVERSE evaluation of individual bit masks+shifts with block mask+shifts when we have integer elements of power-of-2 bits in size. After calling BSWAP to reverse the order of the constituent bytes (which typically follows a similar approach), every neighbouring 4-bits, 2-bits and finally 1-bit pairs are masked off and swapped over with shifts. In doing so we can significantly reduce the number of operations required. Differential Revision: https://reviews.llvm.org/D21578 llvm-svn: 276432	2016-07-22 16:46:25 +00:00
Krzysztof Parzyszek	d3d0a4bda3	[Hexagon] Use loop data prefetch on Hexagon llvm-svn: 276422	2016-07-22 14:22:43 +00:00
Simon Pilgrim	ea0d4f9962	[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 (reapplied) As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector. This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match. We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts). Reapplied with fix for PR28657 - removed intrinsic definitions (clang companion patch to be submitted shortly). Differential Revision: https://reviews.llvm.org/D22460 llvm-svn: 276416	2016-07-22 13:58:44 +00:00
Ahmed Bougacha	29333c9de6	[FastISel] Ignore @llvm.assume. llvm-svn: 276410	2016-07-22 12:54:53 +00:00
Benjamin Kramer	5ba0e20315	Revert "[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128" It caused PR28657. This reverts commit r276281. llvm-svn: 276405	2016-07-22 11:03:10 +00:00
Hrvoje Varga	2db00ce4b6	[mips][microMIPS] Implement SLT, SLTI, SLTIU, SLTU microMIPS32r6 instructions Differential Revision: https://reviews.llvm.org/D19906 llvm-svn: 276397	2016-07-22 07:18:33 +00:00
Craig Topper	52e2e8381b	[AVX512] Add ExeDomain to vector extend and truncate instructions. llvm-svn: 276394	2016-07-22 05:46:44 +00:00
Craig Topper	f4151bea72	[AVX512] Add initial support for the Execution Domain fixing pass to change some EVEX instructions. llvm-svn: 276393	2016-07-22 05:00:52 +00:00
Craig Topper	0b90756b0a	[AVX512] Add load folding for some AVX512VL logic and arithmetic instructions. llvm-svn: 276391	2016-07-22 05:00:39 +00:00
Craig Topper	ab13b33ded	[AVX512] Update X86InstrInfo::foldMemoryOperandCustom to handle the EVEX encoded instructions too. llvm-svn: 276390	2016-07-22 05:00:35 +00:00
Quentin Colombet	ecd81a3d1b	[MIRTesting] Abort when failing to parse a function. When we fail to parse a function in the MIR parser, we should abort the whole compilation instead of continuing in a weird state. Indeed, this was creating strange machine function pass failures that were hard to understand, until we noticed that the function actually did not get parsed correctly! llvm-svn: 276348	2016-07-21 22:25:57 +00:00
Michael Kuperstein	c523333bbf	[X86] Do not use AND8ri8 in AVX512 pattern This variant is (as documented in the TD) for disassembler use only, and should not be used in patterns - it is longer, and is broken on 64-bit. llvm-svn: 276347	2016-07-21 22:24:08 +00:00
Akira Hatanaka	b8d2873d93	[AArch64][Inline-Asm] Return the 32-bit floating point register class when constraint "w" is used on a 32-bit operand. This enables compiling the following code, which used to error out in the backend: void foo1(int a) { asm volatile ("sqxtn h0, %s0\n" : : "w"(a):); } Fixes PR28633. llvm-svn: 276344	2016-07-21 21:39:05 +00:00
Anna Thomas	c858faa244	Revert "Invariant start/end intrinsics overloaded for address space" This reverts commit r276316. llvm-svn: 276320	2016-07-21 19:06:28 +00:00
Anna Thomas	29b24dfe44	Invariant start/end intrinsics overloaded for address space Summary: The llvm.invariant.start and llvm.invariant.end intrinsics currently support specifying invariant memory objects only in the default address space. With this change, these intrinsics are overloaded for any address space for memory objects and we can use these llvm invariant intrinsics in non-default address spaces. Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr) This overloaded intrinsic is needed for representing final or invariant memory in managed languages. Reviewers: tstellarAMD, reames, apilipenko Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22519 llvm-svn: 276316	2016-07-21 18:41:44 +00:00
Quentin Colombet	2b59eab79f	[IRTranslator] Add G_SUB opcode. This commit adds a generic SUB opcode to global-isel. llvm-svn: 276308	2016-07-21 17:26:50 +00:00
Konstantin Zhuravlyov	3c0d8d22fe	[AMDGPU] Emit read-only data to .rodata for hsa Differential Revision: https://reviews.llvm.org/D22538 llvm-svn: 276298	2016-07-21 15:59:23 +00:00
Quentin Colombet	7bcc921dd8	[IRTranslator] Add G_AND opcode. This commit adds a generic AND opcode to global-isel. llvm-svn: 276297	2016-07-21 15:50:42 +00:00
Geoff Berry	4ff2e36d32	[AArch64] Load/store opt: Don't count transient instructions towards search limits. Summary: This change also changes findMatchingInsn and findMatchingUpdateInsnForward to take DBG_VALUE opcodes into account when tracking register defs and uses, which could potentially inhibit these optimizations in the presence of debug information. Reviewers: mcrosier Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22582 llvm-svn: 276293	2016-07-21 15:20:25 +00:00
Simon Pilgrim	88e0940d3b	[X86][SSE] Allow folding of store/zext with PEXTRW of 0'th element Under normal circumstances we prefer the higher performance MOVD to extract the 0'th element of a v8i16 vector instead of PEXTRW. But as detailed on PR27265, this prevents the SSE41 implementation of PEXTRW from folding the store of the 0'th element. Additionally it prevents us from making use of the fact that the (SSE2) reg-reg version of PEXTRW implicitly zero-extends the i16 element to the i32/i64 destination register. This patch only preferentially lowers to MOVD if we will not be zero-extending the extracted i16, nor prevent a store from being folded (on SSE41). Fix for PR27265. Differential Revision: https://reviews.llvm.org/D22509 llvm-svn: 276289	2016-07-21 14:54:17 +00:00
Simon Pilgrim	4caefdf834	Fixed line endings llvm-svn: 276287	2016-07-21 14:36:41 +00:00
Simon Pilgrim	c8e20b1150	[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector. This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match. We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts). Differential Revision: https://reviews.llvm.org/D22460 llvm-svn: 276281	2016-07-21 14:10:54 +00:00
Marina Yatsina	c1fa163392	ExecutionDepsFix - Fix bug in clearance calculation The clearance calculation did not take into account registers defined as outputs or clobbers in inline assembly machine instructions because these register defs are implicit. Differential Revision: http://reviews.llvm.org/D22580 llvm-svn: 276266	2016-07-21 12:37:07 +00:00
Matt Arsenault	f0ba86a4d5	AMDGPU: Fix phis from blocks split due to register indexing llvm-svn: 276257	2016-07-21 09:40:57 +00:00
Matthias Braun	d9fdad72ae	IPRA: Fix RegMask calculation for alias registers This patch fixes a very subtle bug in regmask calculation. Thanks to zan jyu Wong <zyfwong@gmail.com> for bringing this to notice. For example, if only CL is clobbered then CH should not be marked clobbered, but CX, RCX and ECX should be marked clobbered. Previously, for each modified register all of its aliases were marked clobbered by markRegClobbred() in RegUsageInfoCollector.cpp, but that is wrong: when CL is clobbered, MRI::isPhysRegModified() returns true for CL, CX, ECX and RCX, which is correct behavior, but then for CX, ECX and RCX we also mark CH clobbered, as CH is aliased to CX, ECX and RCX. So markRegClobbred() is not required, because isPhysRegModified() already takes care of register aliasing properly. A very simple test case has been added to verify this change. Please find the relevant bug report here: http://llvm.org/PR28567 Patch by Vivek Pandya <vivekvpandya@gmail.com> Differential Revision: https://reviews.llvm.org/D22400 llvm-svn: 276235	2016-07-21 03:50:39 +00:00
Justin Lebar	cd564c6b46	[NVPTX] Enable the load-store vectorizer on nvptx. Reviewers: tra Subscribers: jholewinski, arsenm, asbirlea Differential Revision: https://reviews.llvm.org/D22592 llvm-svn: 276196	2016-07-20 22:11:36 +00:00
Artem Belevich	7e9c9a6582	[NVPTX] Renamed NVPTXLowerKernelArgs -> NVPTXLowerArgs. NFC. After r276153 the pass applies to both kernels and regular functions. Differential Revision: https://reviews.llvm.org/D22583 llvm-svn: 276189	2016-07-20 21:44:07 +00:00
Ahmed Bougacha	a0cdd79070	[AArch64][FastISel] Select -O0 legal cmpxchg. At -O0, cmpxchg survives AtomicExpand: it's mostly straightforward to select it in fast-isel, and let the pseudo be expanded later. extractvalues on the result are the tricky part: the generic logic only works for legal types (and it would be painful to make it support illegal types), so we can only support i32/i64 cmpxchg. llvm-svn: 276183	2016-07-20 21:12:32 +00:00
Ahmed Bougacha	b0674d1143	[AArch64][FastISel] Select atomic stores into STLR. llvm-svn: 276182	2016-07-20 21:12:27 +00:00
Tim Northover	62ae568bbb	GlobalISel: implement low-level type with just size & vector lanes. This should be all the low-level instruction selection needs to determine how to implement an operation, with the remaining context taken from the opcode (e.g. G_ADD vs G_FADD) or other flags not based on type (e.g. fast-math). llvm-svn: 276158	2016-07-20 19:09:30 +00:00
Artem Belevich	74158b5061	[NVPTX] deal with all aggregate return types. Fixes a crash in llvm_unreachable when a function has array return type. Differential Revision: https://reviews.llvm.org/D22524 llvm-svn: 276154	2016-07-20 18:39:52 +00:00
Artem Belevich	b2e76a5e7a	[NVPTX] Improve lowering of byval args of device functions. Avoid unnecessary spills of byval arguments of device functions to local space on SASS level and subsequent pointer conversion to generic address space that follows. Instead, make a local copy in IR, provide a way to access arguments directly, and let LLVM optimize the copy away when possible. Differential Review: https://reviews.llvm.org/D21421 llvm-svn: 276153	2016-07-20 18:39:47 +00:00
Matt Arsenault	f14db7a933	AMDGPU: Add missing test coverage for control flow breaks None of the current lit tests hit si_break handling. llvm-svn: 276129	2016-07-20 15:20:35 +00:00
Yaxun Liu	4b1d9f7f18	AMDGPU: Fix bug causing crash due to invalid opencl version metadata. Differential Revision: https://reviews.llvm.org/D22526 llvm-svn: 276119	2016-07-20 14:38:06 +00:00
Diana Picus	f345d40ae2	[ARM] Skip inline asm memory operands in DAGToDAGISel Retry r275776 (no changes, we suspect the issue was with another commit). The current logic for handling inline asm operands in DAGToDAGISel interprets the operands by looking for constants, which should represent the flags describing the kind of operand we're dealing with (immediate, memory, register def etc). The operands representing actual data are skipped only if they are non-const, with the exception of immediate operands which are skipped explicitly when a flag describing an immediate is found. The oversight is that memory operands may be const too (e.g. for device drivers reading a fixed address), so we should explicitly skip the operand following a flag describing a memory operand. If we don't, we risk interpreting that constant as a flag, which is definitely not intended. Fixes PR26038 Differential Revision: https://reviews.llvm.org/D22103 llvm-svn: 276101	2016-07-20 09:48:24 +00:00
David Majnemer	5d26127752	Revert "Disable this-return argument forwarding on ARM/AArch64" Inference of the 'returned' attribute was fixed in r276008, let's try turning the backend support back on. This reverts commit r275677. llvm-svn: 276081	2016-07-20 04:13:01 +00:00
Matthias Braun	5b9722d6c7	Revert "RegScavenging: Add scavengeRegisterBackwards()" Reverting this commit for now as it seems to be causing failures on test-suite tests on the clang-ppc64le-linux-lnt bot. This reverts commit r276044. llvm-svn: 276068	2016-07-20 00:21:32 +00:00
Matt Arsenault	a1fe17c9ad	AMDGPU: Change fdiv lowering based on !fpmath metadata If 2.5 ulp is acceptable, denormals are not required, and the operation isn't a reciprocal (which will already be handled), replace with a faster fdiv. Simplify the lowering tests by using per function subtarget features. llvm-svn: 276051	2016-07-19 23:16:53 +00:00
Matthias Braun	84fd4bee6c	RegScavenging: Add scavengeRegisterBackwards() This is a variant of scavengeRegister() that works for enterBasicBlockEnd()/backward(). The benefit of the backward mode is that it is not affected by incomplete kill flags. This patch also changes PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register scavenger in backwards mode. Differential Revision: http://reviews.llvm.org/D21885 llvm-svn: 276044	2016-07-19 22:37:09 +00:00
Evandro Menezes	238fa76574	[AArch64] Properly validate the reciprocal estimation. Add check for legal data types when expanding into a Newton series. Differential Revision: https://reviews.llvm.org/D22267 llvm-svn: 276041	2016-07-19 22:31:11 +00:00
Ahmed Bougacha	5a59b24bdd	[GlobalISel] Mark newly-created gvregs as having a bank. Also verify that we never try to set the size of a vreg associated to a register class. Report an error when we encounter that in MIR. Fix a testcase that hit that error and had a size for no reason. llvm-svn: 276012	2016-07-19 19:48:36 +00:00
Simon Pilgrim	5366d0e0bc	[X86][AVX512] Added AVX512 subvector broadcast tests llvm-svn: 275994	2016-07-19 17:04:28 +00:00
Simon Pilgrim	f2d02cb0f6	[X86][AVX] Fixed typo in test names llvm-svn: 275992	2016-07-19 16:52:05 +00:00
Simon Pilgrim	0ea8d275cc	[X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead. It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match). This patch changes both scalar and packed versions back to using x86-specific builtins. It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding. A companion clang patch is at D22105 Differential Revision: https://reviews.llvm.org/D22106 llvm-svn: 275981	2016-07-19 15:07:43 +00:00
Sam Parker	6ca4bbb00d	[ARM] Refactor Thumb2 Mul and Mla instr descs Recommitting after r274347 was reverted. This patch introduces some classes to refactor the 3 and 4 register Thumb2 multiplication instruction descriptions, plus improved tests for some of those instructions. Differential Revision: https://reviews.llvm.org/D21929 llvm-svn: 275979	2016-07-19 14:44:05 +00:00
Simon Pilgrim	b87a21f1c3	[AARCH64] Fix linu triple typo As promised in D22191 llvm-svn: 275976	2016-07-19 14:12:45 +00:00
Simon Pilgrim	fc4d4b251d	[AARCH64] Enable AARCH64 lit tests on windows dev machines As discussed on PR27654, this patch fixes the triples of a lot of aarch64 tests and enables lit tests on windows This will hopefully help stop cases where windows developers break the aarch64 target Differential Revision: https://reviews.llvm.org/D22191 llvm-svn: 275973	2016-07-19 13:35:11 +00:00
Daniel Sanders	6a73883c48	[mips] Correct label prefixes for N32 and N64. Summary: N32 and N64 follow the standard ELF conventions (.L) whereas O32 uses its own ($). This fixes the majority of object differences between -fintegrated-as and -fno-integrated-as. Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: https://reviews.llvm.org/D22412 llvm-svn: 275967	2016-07-19 10:49:03 +00:00
Elena Demikhovsky	2c0780b8e5	AVX-512: Fixed BT instruction selection. The following condition expression (a >> n) & 1 is converted to the "bt a, n" instruction. It works on all Intel targets. But on AVX-512 it was broken because the expression is modified to (truncate (a >> n) to i1). I added the new sequence (truncate (a >> n) to i1) to the BT pattern. Differential Revision: https://reviews.llvm.org/D22354 llvm-svn: 275950	2016-07-19 07:14:21 +00:00
Craig Topper	d6ca1dc45e	[AVX512] Give priority to EVEX encoded PSHUFB over the VEX versions. llvm-svn: 275942	2016-07-19 02:00:38 +00:00
Matt Arsenault	cb540bc03c	AMDGPU: Expand register indexing pseudos in custom inserter This is to help move SILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32. The disadvantage is the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion. v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register. llvm-svn: 275934	2016-07-19 00:35:03 +00:00
Matt Arsenault	50b76399ed	AMDGPU: Fix test name and broken CHECK-LABEL llvm-svn: 275928	2016-07-18 23:09:51 +00:00
Artem Belevich	9f97dcb018	[NVPTX] Make sure we adjust alignment at all call sites .. including calls from kernel functions that were ignored by mistake before. llvm-svn: 275920	2016-07-18 21:58:48 +00:00
Artem Belevich	052b1ed2fd	[NVPTX] Force minimum alignment of 4 for byval arguments of device-side functions. Taking address of a byval variable in PTX is legal, but currently runs into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042). Work around the issue by enforcing minimum alignment on byval arguments of device functions. The change is a no-op on SASS level for sm_3x where ptxas already aligns local copy by at least 4. Differential Revision: https://reviews.llvm.org/D22428 llvm-svn: 275893	2016-07-18 19:54:56 +00:00
Vitaly Buka	c93e10fcbb	Revert "[ARM] Skip inline asm memory operands in DAGToDAGISel" Breaks asan, see https://reviews.llvm.org/D22103 This reverts commit r275776. llvm-svn: 275890	2016-07-18 19:44:01 +00:00
Vitaly Buka	fa474e3eb9	Revert "[ARM] Update test to use CHECK-LABEL. NFCI." Breaks asan, see https://reviews.llvm.org/D22103 This reverts commit r275777. llvm-svn: 275889	2016-07-18 19:43:58 +00:00
Simon Pilgrim	069c732f82	[X86][SSE] Regenerate extraction from promotion test Added tests for SSE2 as well as SSE41 llvm-svn: 275878	2016-07-18 18:53:15 +00:00
Simon Pilgrim	a68b8df3a7	[X86][SSE] Regenerate extraction+store memop tests Added tests for SSE2 as well as SSE41+AVX llvm-svn: 275876	2016-07-18 18:44:01 +00:00
Simon Pilgrim	b21b47ba61	[X86][SSE] Regenerate truncate+extension memop tests Added tests for SSE2 as well as SSE41 llvm-svn: 275875	2016-07-18 18:42:33 +00:00
Simon Pilgrim	600baaed89	Regenerate test llvm-svn: 275872	2016-07-18 18:38:51 +00:00
Matt Arsenault	c96e1deffa	AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32 llvm-svn: 275871	2016-07-18 18:35:05 +00:00
Matt Arsenault	4c519d3518	AMDGPU/R600: Replace barrier intrinsics llvm-svn: 275870	2016-07-18 18:34:59 +00:00
Matt Arsenault	efb24540b1	AMDGPU: Remove dead check in AMDGPUPromoteAlloca This is currently only called with GEP users. A direct alloca would only happen with current typed pointers for arrays which are a perverse case. Also fix crashes on 0 x and 1 x arrays. llvm-svn: 275869	2016-07-18 18:34:53 +00:00
Tim Northover	918f05063c	CodeGenPrep: use correct function to determine Global's alignment. Elsewhere (particularly computeKnownBits) we assume that a global will be aligned to the value returned by Value::getPointerAlignment. This is used to boost the alignment on memcpy/memset, so any target-specific request can only increase that value. llvm-svn: 275866	2016-07-18 18:28:52 +00:00
Simon Pilgrim	c941f6b329	[X86][AVX] Add target shuffle decode support for VBROADCAST Currently we only decode broadcasts from a vector of the same size. llvm-svn: 275823	2016-07-18 17:32:59 +00:00
Krzysztof Parzyszek	5948ea78b9	[Hexagon] Handle returning small structures by value This is compliant with the official ABI, but allows experimentation with calling conventions. llvm-svn: 275822	2016-07-18 17:30:41 +00:00
Chih-Hung Hsieh	4d9f2c154d	[X86] Accept SELECT op code for x86-64 fp128 type DAGTypeLegalizer::CanSkipSoftenFloatOperand should allow SELECT op code for x86_64 fp128 type for MME targets, so SoftenFloatOperand does not abort on SELECT op code. Differential Revision: http://reviews.llvm.org/D21758 llvm-svn: 275818	2016-07-18 17:20:09 +00:00
Simon Pilgrim	4ac7420618	[X86][AVX2] Added tests that demonstrate duplicate broadcasts We don't yet decode broadcasts as a target shuffle llvm-svn: 275808	2016-07-18 16:17:34 +00:00
Krzysztof Parzyszek	786333ffcc	[Hexagon] Enable .cur formation in MISched for Hexagon V60 Schedule a load and its use in the same packet in MISched. Previously, isResourceAvailable was returning false for dependences in the same packet, which prevented MISched from packetizing a load and its use in the same packet for v60. Patch by Ikhlas Ajbar. llvm-svn: 275804	2016-07-18 16:05:27 +00:00
Nemanja Ivanovic	d3c284f645	[PowerPC] Remove redundant direct moves when extracting integers and converting to FP This patch corresponds to review: https://reviews.llvm.org/D21354 We use direct moves for extracting integer elements from vectors. We also use direct moves when converting integers to FP. When these operations are chained, we get a direct move out of a VSR followed by a direct move back into a VSR. These are redundant - all we need to do is line up the element and convert. llvm-svn: 275796	2016-07-18 15:30:00 +00:00
Krzysztof Parzyszek	393b37937b	[Hexagon] Use timing class info as tie-breaker in machine scheduler Patch by Sirish Pande. llvm-svn: 275794	2016-07-18 15:17:10 +00:00
Krzysztof Parzyszek	3467e9d0a9	[Hexagon] HexagonMachineScheduler should account for resources The machine scheduler needs to account for available resources more accurately in order to avoid scheduling an instruction that forces a new packet to be created. This occurs in two ways: First, an instruction without an available resource may have a large priority due to other metrics and be scheduled when there are other instructions with available resources. Second, an instruction with a non-zero latency may become available prematurely. In both these cases, we attempt to change the priority in order to allow a better instruction to be scheduled. Patch by Brendon Cahoon. llvm-svn: 275793	2016-07-18 14:52:13 +00:00
Krzysztof Parzyszek	748d3efec6	[Hexagon] Fix zero latency instructions with multiple predecessors An instruction may have multiple predecessors that are candidates for using .cur. However, only one of them can use .cur in the packet. When this case occurs, we need to make sure that only one of the dependences gets a 0 latency value. Patch by Brendon Cahoon. llvm-svn: 275790	2016-07-18 14:23:10 +00:00
Simon Dardis	d32a2d30cb	[inlineasm] Propagate operand constraints to the backend When SelectionDAGISel transforms a node representing an inline asm block, memory constraint information is not preserved. This can cause constraints to be broken when a memory offset is of the form: offset + frame index when the frame is resolved. By propagating the constraints all the way to the backend, targets can enforce memory operands of inline assembly to conform to their constraints. For MIPSR6, some instructions had their offsets reduced to 9 bits from 16 bits such as ll/sc. This becomes problematic when using inline assembly to perform atomic operations, as an offset can be generated that is too big to encode in the instruction. Reviewers: dsanders, vkalintris Differential Review: https://reviews.llvm.org/D21615 llvm-svn: 275786	2016-07-18 13:17:31 +00:00
Nicolai Haehnle	bef1ceb815	AMDGPU: Disable AMDGPUPromoteAlloca pass for shader calling conventions. Summary: The work item intrinsics are not available for the shader calling conventions. And even if we did hook them up, most shader stages have some extra restrictions on the amount of available LDS. Reviewers: tstellarAMD, arsenm Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D20728 llvm-svn: 275779	2016-07-18 09:02:47 +00:00
Diana Picus	6731f13458	[ARM] Update test to use CHECK-LABEL. NFCI. llvm-svn: 275777	2016-07-18 07:48:42 +00:00
Diana Picus	73ed44d328	[ARM] Skip inline asm memory operands in DAGToDAGISel The current logic for handling inline asm operands in DAGToDAGISel interprets the operands by looking for constants, which should represent the flags describing the kind of operand we're dealing with (immediate, memory, register def etc). The operands representing actual data are skipped only if they are non-const, with the exception of immediate operands which are skipped explicitly when a flag describing an immediate is found. The oversight is that memory operands may be const too (e.g. for device drivers reading a fixed address), so we should explicitly skip the operand following a flag describing a memory operand. If we don't, we risk interpreting that constant as a flag, which is definitely not intended. Fixes PR26038 Differential Revision: https://reviews.llvm.org/D22103 llvm-svn: 275776	2016-07-18 07:35:14 +00:00
Craig Topper	a3c55f5915	[AVX512] Add EVEX versions of scalar ADD/SUB/MUL/DIV to load folding tables. llvm-svn: 275775	2016-07-18 06:49:32 +00:00
Craig Topper	83613bb436	[X86] Fix test checks to include leading 'v' on avx mnemonic names. llvm-svn: 275774	2016-07-18 06:49:29 +00:00
Diana Picus	774d157a5d	[ARM] Honour ABI for rem under -O0 for EABI, GNUEABI, Android and Musl At higher optimization levels, we generate the libcall for DIVREM_Ix, which is fine: aeabi_{u|i}divmod. At -O0 we generate the one for REM_Ix, which is the default {u}mod{q|h|s|d}i3. This commit makes sure that we don't generate REM_Ix calls for ABIs that don't support them (i.e. where we need to use DIVREM_Ix instead). This is achieved by bailing out of FastISel, which can't handle non-double multi-reg returns, and letting the legalization infrastructure expand the REM_Ix calls. It also updates the divmod-eabi.ll test to run under -O0 as well, and adds some Windows checks to it to make sure we don't break things for it. Fixes PR27068 Differential Revision: https://reviews.llvm.org/D21926 llvm-svn: 275773	2016-07-18 06:48:25 +00:00
Craig Topper	1af6cc00dc	[X86] Add VPADD instructions to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275769	2016-07-18 06:14:54 +00:00
Craig Topper	ba9b93d7f2	[X86] Add floating point packed logical ops to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275768	2016-07-18 06:14:50 +00:00
Craig Topper	3a99de4067	[X86] Add AVX512 instructions to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275767	2016-07-18 06:14:47 +00:00
Craig Topper	f7a06c29bc	[X86] Add AVX512 load opcodes and a couple AVX load opcodes to X86InstrInfo::areLoadsFromSameBasePtr. llvm-svn: 275765	2016-07-18 06:14:43 +00:00
Craig Topper	650a15e2b3	[X86] Add more opcodes to isFrameLoadOpcode/isFrameStoreOpcode. Mainly AVX-512 related. llvm-svn: 275764	2016-07-18 06:14:39 +00:00
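
The block mask+shift approach described in the BITREVERSE legalization commit above (820f87a72d, r276432) can be illustrated with a small sketch. This is not the LLVM implementation: it is a plain Python model for a 32-bit integer, and the function names bswap32, bitreverse32 and bitreverse32_naive are hypothetical names chosen for this example. It assumes the sequence the commit message describes: a byte swap first, then swapping neighbouring 4-bit, 2-bit and 1-bit blocks using masks and shifts.

```python
MASK32 = 0xFFFFFFFF


def bswap32(x: int) -> int:
    """Reverse the order of the four bytes of a 32-bit value."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


def bitreverse32(x: int) -> int:
    """Reverse all 32 bits: byte swap, then swap 4-, 2- and 1-bit blocks."""
    x = bswap32(x)
    # Swap the two nibbles inside every byte.
    x = ((x & 0xF0F0F0F0) >> 4) | ((x & 0x0F0F0F0F) << 4)
    # Swap the two 2-bit pairs inside every nibble.
    x = ((x & 0xCCCCCCCC) >> 2) | ((x & 0x33333333) << 2)
    # Swap the two bits inside every 2-bit pair.
    x = ((x & 0xAAAAAAAA) >> 1) | ((x & 0x55555555) << 1)
    return x & MASK32


def bitreverse32_naive(x: int) -> int:
    """Reference version that moves one bit at a time (32 mask+shift steps)."""
    result = 0
    for i in range(32):
        if x & (1 << i):
            result |= 1 << (31 - i)
    return result
```

The block version needs one byte swap plus three mask-and-shift rounds, whereas the bit-at-a-time reference touches each of the 32 bits individually; that difference is the reduction in operation count the commit message refers to.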


16742 Commits