llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	7823fd2535	[X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type. This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain..... llvm-svn: 259816	2016-02-04 19:27:51 +00:00
Chad Rosier	05f8020cdf	[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3). This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769 and r259790. The tramp3d failure was caused by an incorrect refactoring in the patch. Specifically, we weren't always properly clearing the SExtIdx flag. llvm-svn: 259812	2016-02-04 18:59:49 +00:00
Silviu Baranga	33b3bd17dd	[AArch64] Multiply extended 32-bit ints with `[U|S]MADDL' During instruction selection, the AArch64 backend can recognise the following pattern and generate an [U|S]MADDL instruction, i.e. a multiply of two 32-bit operands with a 64-bit result: (mul (sext i32), (sext i32)) However, when one of the operands is constant, the sign extension gets folded into the constant in SelectionDAG::getNode(). This means that the instruction selection sees this: (mul (sext i32), i64) ...which doesn't match the pattern. Sign-extension and 64-bit multiply instructions are generated, which are slower than one 32-bit multiply. Add a pattern to match this and generate the correct instruction, for both signed and unsigned multiplies. Patch by Chris Diamand! llvm-svn: 259800	2016-02-04 16:47:09 +00:00
Simon Pilgrim	6788f33cf2	[X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load. Differential Revision: http://reviews.llvm.org/D16729 llvm-svn: 259796	2016-02-04 16:12:56 +00:00
Chad Rosier	18896c0f5e	Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR." This reverts commit r259790. tramp3d-v4 is still having problems. llvm-svn: 259795	2016-02-04 16:01:40 +00:00
Elena Demikhovsky	86528270b9	AVX-512: Fixed a bug in FMA instruction selection on KNL The FMA instruction was selected from the AVX2 set instead of AVX-512 Differential Revision: http://reviews.llvm.org/D16884 llvm-svn: 259792	2016-02-04 15:11:11 +00:00
Chad Rosier	feec2aeb0f	[AArch64] Improve load/store optimizer to handle LDUR + LDR. This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769, which was reverted in r246782 due to a test-suite failure. I'm unable to reproduce the issue at this time. llvm-svn: 259790	2016-02-04 14:42:55 +00:00
Michael Zuckerman	7d73360479	[AVX512] add vfmadd132ss and vfmadd132sd Intrinsic Differential Revision: http://reviews.llvm.org/D16589 llvm-svn: 259789	2016-02-04 14:41:08 +00:00
Simon Pilgrim	1d2d6c5a57	[X86] Moved SEXT -> SIGN_EXTEND_VECTOR_INREG combine into helper. NFC. llvm-svn: 259771	2016-02-04 09:27:19 +00:00
Andrey Turetskiy	bca0f99224	[X86] Use hash table in LEA optimization pass. Use hash table (key is a memory operand) to store found LEA instructions to reduce compile time. Differential Revision: http://reviews.llvm.org/D16404 llvm-svn: 259770	2016-02-04 08:57:03 +00:00
Jingyue Wu	f650441b04	[NVPTX] Disable performance optimizations when OptLevel==None Reviewers: jholewinski, tra, eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16874 llvm-svn: 259749	2016-02-04 04:15:36 +00:00
Sanjay Patel	460ce9cd9b	clean up; NFC llvm-svn: 259720	2016-02-03 22:37:37 +00:00
Saleem Abdulrasool	f36005a358	ARM: support TLS for WoA Add support for TLS access for Windows on ARM. This generates a similar access to MSVC for ARM. The changes to the tablegen data are needed to support loading an external symbol global that is not for a call. The adjustments to the DAG to DAG transforms are needed to preserve the 32-bit move. llvm-svn: 259676	2016-02-03 18:21:59 +00:00
Renato Golin	6027dd38ef	[ARM] Move GNUEABI divmod to __aeabi_divmod* The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores which happens to be a lot faster than __divsi3/__modsi3 when the core has hardware divide instructions. Do the same here. Fixes PR26450. llvm-svn: 259657	2016-02-03 16:10:54 +00:00
Daniel Sanders	3b1a2dbffa	[mips] Remove redundant inclusions of MipsAnalyzeImmediate.h llvm-svn: 259655	2016-02-03 15:54:12 +00:00
Nemanja Ivanovic	82e1168989	Fix for PR 26381 Simple fix - Constant values were not being sign extended in FastIsel. llvm-svn: 259645	2016-02-03 12:53:38 +00:00
Simon Atanasyan	e774126c96	[mips] Add SHF_MIPS_GPREL flag to the MIPS .sbss and .sdata sections The MIPS ABI states that the .sbss and .sdata sections must have the SHF_MIPS_GPREL flag. See Figure 4–7 on page 69 in the following document: ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf. Differential Revision: http://reviews.llvm.org/D15740 llvm-svn: 259641	2016-02-03 11:50:22 +00:00
Simon Pilgrim	18bcf93efb	[X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to EltsFromConsecutiveLoads Follow up to D16217 and D16729 This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD Differential Revision: http://reviews.llvm.org/D16768 llvm-svn: 259635	2016-02-03 09:41:59 +00:00
Kyle Butt	d62d8b771d	Codegen: [PPC] Fix PPCVSXFMAMutate to handle duplicates. The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms on PPC. %vreg6<def> = COPY %vreg96 %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7 ;v6 = v6 + v5 * v7 is replaced by %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96 ;v5 = v5 * v7 + v96 This was broken in the case where the target register was also used as a multiplicand. Fix this case by checking for it and replacing both uses with the copied register. %vreg6<def> = COPY %vreg96 %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6 ;v6 = v6 + v5 * v6 is replaced by %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96 ;v5 = v5 * v96 + v96 llvm-svn: 259617	2016-02-03 01:41:09 +00:00
Yunzhong Gao	eb959722a7	Revert r259576: Disable the vzeroupper insertion pass on PS4. Will re-implement based on review feedback. llvm-svn: 259615	2016-02-03 01:25:12 +00:00
Yunzhong Gao	b76ccacfb1	Disable the vzeroupper insertion pass on PS4. See comments in test/CodeGen/X86/avx-vzeroupper.ll for more explanation. Original patch by: Sean Silva llvm-svn: 259576	2016-02-02 21:39:23 +00:00
Matt Arsenault	de4208122b	AMDGPU: Do not promote allocas with non-inbounds GEPs If we can't assume the pointer value is within the bounds of the object, it seems risky to try to replace the pointer calculations. llvm-svn: 259573	2016-02-02 21:16:12 +00:00
Matt Arsenault	7e747f1a38	AMDGPU: Handle promoting memmove Also add missing tests for the others. llvm-svn: 259558	2016-02-02 20:28:10 +00:00
Quentin Colombet	b8fb2ba1bb	[X86] Fix the merging of SP updates in prologue/epilogue insertions. When the merging was involving LEAs, we were taking the wrong immediate from the list of operands. rdar://problem/24446069 llvm-svn: 259553	2016-02-02 20:11:17 +00:00
Matt Arsenault	8b175672cb	AMDGPU: Skip promote alloca with no optimizations llvm-svn: 259551	2016-02-02 19:32:42 +00:00
Matt Arsenault	fb8cdbae0c	AMDGPU: Minor cleanups for AMDGPUPromoteAlloca Mostly convert to use range loops. llvm-svn: 259550	2016-02-02 19:32:35 +00:00
Matt Arsenault	e5737f7cac	AMDGPU: Report AMDGPUPromoteAlloca changed the function llvm-svn: 259547	2016-02-02 19:18:57 +00:00
Matt Arsenault	ad1348459f	AMDGPU: Whitelist handled intrinsics We shouldn't crash on unhandled intrinsics. Also simplify failure handling in loop. llvm-svn: 259546	2016-02-02 19:18:53 +00:00
Matt Arsenault	853a1fc6d9	AMDGPU: Use inbounds when calculating workitem offset When promoting allocas to LDS, we know we are indexing into a specific area just created, and the calculation will also never overflow. Also emit some of the muls as nsw nuw, because instcombine infers this already from the range metadata. I think putting this on the other adds and muls might be OK too, but I'm not 100% sure. llvm-svn: 259545	2016-02-02 19:18:48 +00:00
Eugene Zelenko	ecefe5a81f	Fix Clang-tidy readability-redundant-control-flow warnings; other minor fixes. Differential revision: http://reviews.llvm.org/D16793 llvm-svn: 259539	2016-02-02 18:20:45 +00:00
Derek Schuff	c6d8fd3f54	[MC] Enable eip-relative addressing on x86-64 for X32 ABI Summary: Enables eip-based addressing, e.g., lea constant(%eip), %rax and lea constant(%eip), %eax, in MC (used for the x32 ABI). EIP-based addressing is also valid in x86_64, so it is left enabled for that architecture as well. Patch by João Porto Differential Revision: http://reviews.llvm.org/D16581 llvm-svn: 259528	2016-02-02 17:20:04 +00:00
Chad Rosier	1142f3cf90	[AArch64] Add a FIXME comment. llvm-svn: 259515	2016-02-02 15:22:55 +00:00
Chad Rosier	bba881ef3d	[AArch64] Allocate the modified and used regs only once per function. llvm-svn: 259510	2016-02-02 15:02:30 +00:00
JF Bastien	926b189a81	WebAssembly: update expected GCC torture test failures The 3 programs used __attribute__((mode(?))) on enum, which clang r259497 fixed. llvm-svn: 259508	2016-02-02 14:27:34 +00:00
Oliver Stannard	7e7d983a87	Refactor backend diagnostics for unsupported features Re-commit of r258951 after fixing layering violation. The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. llvm-svn: 259498	2016-02-02 13:52:43 +00:00
Simon Pilgrim	96fe4ef5f7	[X86][AVX512] Add support for AVX512 VMOVQ (load) shuffle decoding llvm-svn: 259496	2016-02-02 13:32:56 +00:00
JF Bastien	dc1255f02f	WebAssembly: add option to disable register coloring Having this hidden option makes it easier to debug other issues. llvm-svn: 259482	2016-02-02 09:30:01 +00:00
Sjoerd Meijer	ffe19f5245	Removed FeatureVFPOnlySP from the Cortex-R7 processor model description and changed the regression test accordingly. The default configuration of a Cortex-R7 is to implement the VFPv3-D16 architecture and the feature line as it was is too restrictive. llvm-svn: 259480	2016-02-02 09:28:20 +00:00
Sanjoy Das	881de4d12a	[X86] Fix a bug in getMemOpBaseRegImmOfs Fix a crash in `getMemOpBaseRegImmOfs` that happens if the base of `MemOp` is a frame index memory operand. The fix is to have `getMemOpBaseRegImmOfs` bail out in such cases. We can possibly be more clever here, if needed. llvm-svn: 259456	2016-02-02 02:32:43 +00:00
Ahmed Bougacha	68a8efa374	[X86][FastISel] Don't force Nearest-Even rounding for VCVTPS2PH, use MXCSR. FastISel counterpart to r259448. llvm-svn: 259449	2016-02-02 01:44:03 +00:00
Ahmed Bougacha	55c6682ae2	[X86] Don't force Nearest-Even rounding for VCVTPS2PH, use MXCSR. Officially, we don't acknowledge non-default configurations of MXCSR, as getting there would require usage of the FENV_ACCESS pragma (at least insofar as rounding mode is concerned). We don't support the pragma, so we can assume that the default rounding mode - round to nearest, ties to even - is always used. However, it's inconsistent with the rest of the instruction set, where MXCSR is always effective (unless otherwise specified). Also, it's an unnecessary obstacle to the few brave souls that use fenv.h with LLVM. Avoid the hard-coded rounding mode for fp_to_f16; use MXCSR instead. llvm-svn: 259448	2016-02-02 01:32:50 +00:00
Sanjay Patel	c54600dbb1	fix typos; NFC llvm-svn: 259438	2016-02-01 23:53:35 +00:00
Simon Pilgrim	5be17b6e3e	[X86][AVX512] Add support for AVX512 VMOVD (load) shuffle decoding llvm-svn: 259430	2016-02-01 23:04:05 +00:00
Simon Pilgrim	f5c23ad3d7	[X86][AVX512] Add support for AVX512 VMOVSD/VMOVSS shuffle decoding llvm-svn: 259427	2016-02-01 22:26:28 +00:00
Simon Pilgrim	025a3d857a	[X86][AVX512] Add support for AVX512 VINSERTPS shuffle decoding llvm-svn: 259420	2016-02-01 22:05:50 +00:00
Matthias Braun	3f88eabe93	SmallSet/SmallPtrSet: Refuse huge Small numbers These sets do linear searching in small mode; It is not a good idea to use huge numbers as the small value here, save people from themselves by adding a static_assert. Differential Revision: http://reviews.llvm.org/D16706 llvm-svn: 259419	2016-02-01 22:05:16 +00:00
Chad Rosier	dbdb1d6eaf	Move comments a bit closer to associated code. NFC. llvm-svn: 259411	2016-02-01 21:38:31 +00:00
Chad Rosier	064261da16	Remove extra semicolon. NFC. llvm-svn: 259402	2016-02-01 20:54:36 +00:00
Balaram Makam	92431703d7	AArch64: Implement missed conditional compare sequences. Summary: This is an extension to the existing implementation of r242436, which is restricted to select inputs only. This version fixes missed opportunities in pr26084 by attempting to lower conditional compare sequences of and/or trees with setcc leafs. This will additionally handle the case when a tree with select input is not a conjunction-disjunction tree but some of the sub trees are conjunction-disjunction trees. Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry Differential Revision: http://reviews.llvm.org/D16291 llvm-svn: 259387	2016-02-01 19:13:07 +00:00
Geoff Berry	29d4a695f4	[AArch64] Simplify prolog/epilog callee save/restore. NFC. Summary: Factor out common code for callee-save register pair calculation. This is intended to simplify follow-on changes that reduce the number of registers saved/restored. Depends on D16732 Reviewers: mcrosier, jmolloy, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D16734 llvm-svn: 259384	2016-02-01 19:07:06 +00:00
Ulrich Weigand	4a4d4ab7a4	[SystemZ] Fix wrong-code generation for certain always-false conditions We've found another bug in the code generation logic for a certain class of always-false conditions, those of the form if ((a & 1) < 0) These only reach the back end when compiling without optimization. The bug was introduced by the choice of using TEST UNDER MASK to implement a check for if ((a & MASK) < VAL) as if ((a & MASK) == 0) where VAL is less than the lowest bit of MASK. This is correct in all cases except for VAL == 0, in which case the original condition is always false, but the replacement isn't. Fixed by excluding that particular case. llvm-svn: 259381	2016-02-01 18:31:19 +00:00
Colin LeMahieu	6fdfa3dc32	[NFC] Referencing manual for reason why subregbit is checked llvm-svn: 259380	2016-02-01 18:15:39 +00:00
Geoff Berry	04bf91a8c1	[AArch64] Simplify callee-save register save/restore. NFC. Summary: Simplify callee-save register save/restore code generation by remembering the size of the callee-save area when it is computed so we don't have to scan the prologue/epilogue instructions again later to reconstruct it. This is intended to simplify follow-on changes that reduce the number of registers saved/restored. Reviewers: mcrosier, jmolloy, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D16732 llvm-svn: 259365	2016-02-01 16:29:19 +00:00
Asaf Badouh	5a3a0231f4	[X86][AVX512VBMI] add encoding and intrinsics for Multishift Differential Revision: http://reviews.llvm.org/D16399 llvm-svn: 259363	2016-02-01 15:48:21 +00:00
Daniel Sanders	f8bb23e509	[mips] Range check uimm16 and fix several bugs this revealed. Summary: The bugs were: * teq and similar take 4-bit unsigned immediates on microMIPS. * teqi and similar have side-effects like teq do. * shll_s.w and shra_r.w take 5-bit unsigned immediates. * The various DSP ext* instructions take a 5-bit immediate. * repl.qh takes an 8-bit unsigned immediate. * repl.ph takes a 10-bit unsigned immediate. * rddsp/wrdsp take a 10-bit unsigned immediate. * teqi and similar take signed 16-bit immediates (10-bit for microMIPS). * Out-of-range immediate macros for or/xor take a simm32/simm64 depending on architecture. I'll fix the simm64 case properly when I reach simm32. lui is a bit more lenient than GAS and accepts signed immediates in addition to unsigned. This is because MipsMCExpr can produce signed values when constant folding and it currently lacks a way of knowing it should fold to an unsigned value. Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D15446 llvm-svn: 259360	2016-02-01 15:13:31 +00:00
JF Bastien	a5b8ea0d66	WebAssembly NFC: simplify control flow This should now be easier to read. llvm-svn: 259349	2016-02-01 10:46:16 +00:00
Igor Breger	56b039ea17	AVX512: fix mask handling for gather/scatter/prefetch intrinsics. Differential Revision: http://reviews.llvm.org/D16755 llvm-svn: 259346	2016-02-01 09:57:15 +00:00
Simon Pilgrim	1358d86659	[X86][SSE] Find source of the inserted element of INSERTPS Minor patch to trace back through target shuffles to the source of the inserted element in a (V)INSERTPS shuffle. Differential Revision: http://reviews.llvm.org/D16652 llvm-svn: 259343	2016-02-01 08:59:30 +00:00
Igor Breger	6cc9115cec	AVX512 : Fix SETCCE lowering for KNL 32 bit. Differential Revision: http://reviews.llvm.org/D16752 llvm-svn: 259342	2016-02-01 07:56:09 +00:00
David Majnemer	efb41741f2	[X86] Cleanup the WinEHState pass Remove unnecessary includes and class state. No functional change intended. llvm-svn: 259340	2016-02-01 04:28:59 +00:00
Craig Topper	3ef74f5956	Replace usages of llvm::utostr_32 with just llvm::utostr. While this is less efficient, it's unclear whether the few places that were using the _32 version were doing so for efficiency. llvm-svn: 259330	2016-01-31 20:00:24 +00:00
JF Bastien	578c8cde53	WebAssembly: more failures are gone llvm-svn: 259321	2016-01-31 08:19:40 +00:00
JF Bastien	ac9e8664a4	WebAssembly: update expected failures r259305 fixed a few assertions around FrameIndex, and I forgot to update these failures despite having run the torture tests. llvm-svn: 259320	2016-01-31 08:05:05 +00:00
Derek Schuff	c97ba939d1	[WebAssembly] Fix uses of FrameIndex as store values Previously the code assumed all uses of FI on loads and stores were as addresses. This checks whether the use is the address or a value and handles the latter case as it does for non-memory instructions. llvm-svn: 259306	2016-01-30 21:43:08 +00:00
JF Bastien	fbc89d21dd	WebAssembly: don't optimize frameindex store The previous code was incorrect (can't getReg a frameindex). We could instead optimize it to reduce tree height, but I'm not sure that's worthwhile yet because we then try to eliminate the frameindex. This patch also fixes frame index elimination for operations which may load or store: it used to assume the base was operand 2 and immediate offset operand 1. That's not true for stores, where they're 4 and 3. llvm-svn: 259305	2016-01-30 14:11:26 +00:00
JF Bastien	3ca3ea690f	WebAssembly NFC: fix build warning WebAssemblyFrameLowering.cpp:158:44: warning: enumeral and non-enumeral type in conditional expression [enabled by default] llvm-svn: 259303	2016-01-30 11:19:26 +00:00
Matt Arsenault	e013246462	AMDGPU: Fix emitting invalid workitem intrinsics for HSA The AMDGPUPromoteAlloca pass was emitting the read.local.size calls, which with HSA were incorrectly selected as reads from the offset Mesa uses off of the kernarg pointer. Error on intrinsics which aren't supported by HSA, and start emitting the correct IR to read the workgroup size out of the dispatch pointer. Also initialize the pass so it can be tested with opt, and start moving towards not depending on the subtarget as an argument. Start emitting errors for the intrinsics not handled with HSA. llvm-svn: 259297	2016-01-30 05:19:45 +00:00
Matt Arsenault	d0799df707	AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr Only the dispatch.ptr intrinsic is supposed to be used now to get the workgroup size, and the read.local.size intrinsics do not work correctly. llvm-svn: 259296	2016-01-30 05:10:59 +00:00
Dan Gohman	ed0f113885	[WebAssembly] Refine block placement to insert blocks between trees. Refine the test for whether an instruction is in an expression tree so that it detects when one tree ends and another begins, so we can place a block at that point, rather than continuing to find the first instruction not in a tree at all. llvm-svn: 259294	2016-01-30 05:01:06 +00:00
Matt Arsenault	43976df0da	AMDGPU: Add new amdgcn workitem intrinsics These use the correct prefix and follow the HSA naming convention rather than the config register option names. llvm-svn: 259293	2016-01-30 04:25:19 +00:00
Matthias Braun	b30f2f5141	Avoid overly large SmallPtrSet/SmallSet These sets perform linear searching in small mode so it is never a good idea to use SmallSize/N bigger than 32. llvm-svn: 259283	2016-01-30 01:24:31 +00:00
Justin Lebar	ead59f4765	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor. Summary: Previously we'd just silently skip these. Reviewers: tra, jholewinski Subscribers: llvm-commits, jhen, echristo, Differential Revision: http://reviews.llvm.org/D16739 llvm-svn: 259279	2016-01-30 01:07:38 +00:00
Yaron Keren	eb2a25467e	Annotate dump() methods with LLVM_DUMP_METHOD, addressing Richard Smith r259192 post commit comment. clang part in r259232, this is the LLVM part of the patch. llvm-svn: 259240	2016-01-29 20:50:44 +00:00
Tim Northover	c4093c3ced	ARM: don't mangle DAG constant if it has more than one use The basic optimisation was to convert (mul $LHS, $complex_constant) into roughly "(shl (mul $LHS, $simple_constant), $simple_amt)" when it was expected to be cheaper. The original logic checks that the mul only has one use (since we're mangling $complex_constant), but when used in even more complex addressing modes there may be an outer addition that can pick up the wrong value too. I think the ARM addressing-mode problem is actually unreachable at the moment, but that depends on complex assessments of the profitability of pre-increment addressing modes so I've put a real check in there instead of an assertion. llvm-svn: 259228	2016-01-29 19:18:46 +00:00
Derek Schuff	d91a12ec11	[WebAssembly] Update test expectations llvm-svn: 259223	2016-01-29 18:54:38 +00:00
Derek Schuff	6ea637af35	[WebAssembly] Support frame pointer Add support for frame pointer use in prolog/epilog. Supports dynamic allocas but not yet over-aligned locals. Target-independent CG generates SP updates, but we still need to write back the SP value to memory when necessary. llvm-svn: 259220	2016-01-29 18:37:49 +00:00
Zoran Jovanovic	d474ef3a3b	[mips] Absolute value macro expansion Author: obucina Reviewers: dsanders Differential Revision: http://reviews.llvm.org/D16323 llvm-svn: 259202	2016-01-29 16:18:34 +00:00
Alexandros Lamprineas	8c26e7c647	[ARM] Emit trap instruction using .inst directive The trap instruction is emitted as a data-in-text rather than an instruction. This patch uses the .inst directive for emitting trap. Differential Revision: http://reviews.llvm.org/D16684 llvm-svn: 259182	2016-01-29 10:23:32 +00:00
Matt Arsenault	295875efda	AMDGPU: Remove 24-bit intrinsics The known bit matching code seems to work reasonably well, so these shouldn't really be needed. llvm-svn: 259180	2016-01-29 10:05:16 +00:00
Eric Christopher	7d9b9b2d7d	Refactor common code for PPC fast isel load immediate selection. llvm-svn: 259178	2016-01-29 07:20:30 +00:00
Eric Christopher	5a2429e239	Since LI/LIS sign extend the constant passed into the instruction we should check that the sign extended constant fits into 16-bits if we want a zero extended value, otherwise go ahead and put it together piecemeal. Fixes PR26356. llvm-svn: 259177	2016-01-29 07:20:01 +00:00
Eric Christopher	80ba58a15c	Fix up conditional formatting. llvm-svn: 259176	2016-01-29 07:19:49 +00:00
David Majnemer	f2bb710da5	[WinEH] Don't perform state stores in cleanups Our cleanups do not support true lexical nesting of funclets which obviates the need to perform state stores. This fixes PR26361. llvm-svn: 259161	2016-01-29 05:33:15 +00:00
Ahmed Bougacha	53010a0d5b	[AArch64] Fix i64 nontemporal high-half extraction. Since we only have pair - not single - nontemporal store instructions, we have to extract the high part into a separate register to be able to use them. When the initial nontemporal codegen support was added, I wrote the extract using the nonsensical UBFX [0,32[. Use the correct LSR form instead. llvm-svn: 259134	2016-01-29 01:08:41 +00:00
Matt Arsenault	5b39b34ca5	AMDGPU: Match fmed3 patterns with legacy fmin/fmax llvm-svn: 259090	2016-01-28 20:53:48 +00:00
Matt Arsenault	f639c32739	AMDGPU: Match some med3 patterns llvm-svn: 259089	2016-01-28 20:53:42 +00:00
Matt Arsenault	7293f9895e	AMDGPU: Set DX10Clamp bit llvm-svn: 259088	2016-01-28 20:53:35 +00:00
Tom Stellard	3d2c852958	AMDGPU: waitcnt operand fixes Summary: Allow lgkmcnt up to 0xF (hardware allows that). Fix mask for ExpCnt in AMDGPUInstPrinter. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16314 Patch by: Nikolay Haustov llvm-svn: 259059	2016-01-28 17:13:44 +00:00
Mitch Bodart	e5cadbbcdd	[X86] Test commit, fixed typos in comments. NFC. llvm-svn: 259057	2016-01-28 16:40:51 +00:00
Tom Stellard	2ff726272a	AMDGPU: Move subtarget specific code out of AMDGPUInstrInfo.cpp Summary: Also delete all the stub functions that are identical to the implementations in TargetInstrInfo.cpp. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16609 llvm-svn: 259054	2016-01-28 16:04:37 +00:00
Chad Rosier	3ada75f7e8	[AArch64] Set MMOs on pre- and post-index instructions. Without the MMOs the MI scheduler is unable to reason about the dependencies of these instructions. llvm-svn: 259052	2016-01-28 15:38:24 +00:00
Simon Pilgrim	de16172d9d	[x86] Merge multiple calls to DAG.getTargetLoweringInfo(). NFC. llvm-svn: 259050	2016-01-28 15:29:11 +00:00
Oliver Stannard	02fa1c80c4	Revert r259035, it introduces a cyclic library dependency llvm-svn: 259045	2016-01-28 13:19:47 +00:00
Igor Breger	fca0a34398	AVX512: Fix truncate v32i8 to v32i1 lowering implementation. Enable truncate 128/256bit packed byte/word with AVX512BW but without AVX512VL, use 512bit instructions. Differential Revision: http://reviews.llvm.org/D16531 llvm-svn: 259044	2016-01-28 13:19:25 +00:00
Benjamin Kramer	16e0f147a9	Unbreak the wasm backend again after r259035. llvm-svn: 259040	2016-01-28 11:26:34 +00:00
Zoran Jovanovic	838eabcd46	[mips][microMIPS] Disable FastISel for microMIPS Author: milena.vujosevic.janicic Reviewers: dsanders FastIsel is not supported for microMIPS, thus it needs to be disabled. Test micromips-zero-mat-uses.ll is deleted since the tested sequence of instructions is not generated for microMIPS without FastISel. Differential Revision: http://reviews.llvm.org/D15892 llvm-svn: 259039	2016-01-28 11:08:03 +00:00
Oliver Stannard	b4b092ea1b	Add backend dignostic printer for unsupported features Re-commit of r258951 after fixing layering violation. The related LLVM patch adds a backend diagnostic type for reporting unsupported features, this adds a printer for them to clang. In the case where debug location information is not available, I've changed the printer to report the location as the first line of the function, rather than the closing brace, as the latter does not give the user any information. This also affects optimisation remarks. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 259035	2016-01-28 10:07:27 +00:00
Simon Pilgrim	d3b78430d1	[X86][SSE] Move setTargetShuffleZeroElements closer to getTargetShuffleMask. NFCI. Keep target shuffle mask helper functions closer together. llvm-svn: 259034	2016-01-28 09:45:01 +00:00
Asaf Badouh	42852d99e7	[X86][AVX512] small fix in ptestm intrinsics move ptestm{q|d} intrinsics from pattern form (in the td file) to the intrinsics table Differential Revision: http://reviews.llvm.org/D16633 llvm-svn: 259029	2016-01-28 08:33:22 +00:00
JF Bastien	1e02c70ba3	WebAssembly: fix build r259016 didn't also revert r258957, which broke the WebAssembly build. llvm-svn: 259020	2016-01-28 05:05:17 +00:00
NAKAMURA Takumi	628a7a0aef	Revert r258951 (and r258950), "Refactor backend diagnostics for unsupported features" It introduced a layering violation in LLVMIR. clang r258950 "Add backend dignostic printer for unsupported features" llvm r258951 "Refactor backend diagnostics for unsupported features" llvm-svn: 259016	2016-01-28 04:41:32 +00:00
Dan Gohman	fbfe5ec4a4	[WebAssembly] Don't stackify a register def past a get_local use in the same tree. llvm-svn: 259013	2016-01-28 03:59:09 +00:00
Dan Gohman	adf28177eb	[WebAssembly] Enhanced register stackification This patch revamps the RegStackifier pass with a new tree traversal mechanism, enabling three major new features: - Stackification of values with multiple uses, using the result value of set_local - More aggressive stackification of instructions with side effects - Reordering operands in commutative instructions to enable more stackification. llvm-svn: 259009	2016-01-28 01:22:44 +00:00
Adam Nemet	dadfbb52f7	[TTI] Add getPrefetchDistance from PPCLoopDataPrefetch, NFC This patch is part of the work to make PPCLoopDataPrefetch target-independent (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758). As it was discussed in the above thread, getPrefetchDistance is currently using instruction count which may change in the future. llvm-svn: 258995	2016-01-27 22:21:25 +00:00
Derek Schuff	4dd6778660	[WebAssembly] Implement byval arguments Summary: Just does the simple allocation of a stack object and passes a pointer to the callee. Differential Revision: http://reviews.llvm.org/D16610 llvm-svn: 258989	2016-01-27 21:17:39 +00:00
Tim Northover	042a6c1fe1	ARMv7k: base ABI decision on v7k Arch rather than watchos OS. Various bits we want to use the new ABI actually compile with "-arch armv7k -miphoneos-version-min=9.0". Not ideal, but also not ridiculous given how slices work. llvm-svn: 258975	2016-01-27 19:32:29 +00:00
Benjamin Kramer	391be792f2	One more batch of self-containing headers. llvm-svn: 258974	2016-01-27 19:29:56 +00:00
Benjamin Kramer	b32a5042bd	Don't put classes in headers into anonymous namespaces. You want ODR violations? That's how you get ODR violations. llvm-svn: 258973	2016-01-27 19:29:42 +00:00
Benjamin Kramer	c8be5be968	Unbreak wasm build after r258951. llvm-svn: 258957	2016-01-27 18:03:40 +00:00
Benjamin Kramer	45275a4d3c	Make more headers self-contained. A lot of this comes from the new complete type requirement of DenseMap. llvm-svn: 258956	2016-01-27 18:03:37 +00:00
Oliver Stannard	1e67a9f196	Refactor backend diagnostics for unsupported features The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. The implementation of DiagnosticInfoUnsupported::print must be in lib/Codegen rather than in the existing file in lib/IR/ to avoid introducing a dependency from IR to CodeGen. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 258951	2016-01-27 17:30:33 +00:00
Benjamin Kramer	f9172fd4ac	Rename TargetSelectionDAGInfo into SelectionDAGTargetInfo and move it to CodeGen/ It's a SelectionDAG thing, not a Target thing. llvm-svn: 258939	2016-01-27 16:32:26 +00:00
Benjamin Kramer	820f7548a1	Make some headers self-contained, remove unused includes that violate layering. llvm-svn: 258937	2016-01-27 16:05:37 +00:00
Tom Stellard	6e3b14de62	AMDGPU/SI: Fix commuting of 32-bit VOPC instructions Summary: We didn't have entries in the commuting table for the 32-bit instructions. I don't think we hit this problem now, but we will once uniform branching is enabled. Tests will come in a later commit. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16600 llvm-svn: 258936	2016-01-27 15:53:52 +00:00
Benjamin Kramer	d477e9e378	Revert "Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed." and "Add a missing test case for r258847." This reverts commit r258847, r258848. Causes miscompilations and backend errors. llvm-svn: 258927	2016-01-27 12:44:12 +00:00
Marek Olsak	e86f252209	AMDGPU/SI: Stoney has only 16 LDS banks Summary: This is a candidate for stable, along with all patches that add the "stoney" processor. Reviewers: tstellarAMD Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16485 llvm-svn: 258922	2016-01-27 11:19:45 +00:00
Benjamin Kramer	b3e8a6d2b8	Move MCTargetAsmParser.h to llvm/MC/MCParser where it belongs. llvm-svn: 258917	2016-01-27 10:01:28 +00:00
Igor Breger	b1bd47ca1a	AVX512: Fix vpmovzxbw predicate for AVX1/2 instructions. Differential Revision: http://reviews.llvm.org/D16595 llvm-svn: 258915	2016-01-27 08:57:46 +00:00
Igor Breger	d6c187b038	AVX512: Add store mask patterns. Differential Revision: http://reviews.llvm.org/D16596 llvm-svn: 258914	2016-01-27 08:43:25 +00:00
Matt Arsenault	b22828f2fb	AMDGPU: Fix default device handling When no device name is specified, default to kaveri for HSA since SI is not supported and it would fail. Default to "tahiti" instead of "SI" since these are effectively the same, and tahiti is an actual device. Move default device handling to the TargetMachine rather than the AMDGPUSubtarget. The module ISA version is computed from the device name provided with the target machine, so the attributes printed by the AsmPrinter were inconsistent with those computed in the subtarget. Also remove the DevName field from the subtarget since it's redundant with getCPU() in the superclass. llvm-svn: 258901	2016-01-27 02:17:49 +00:00
Reid Kleckner	5b4637141e	[llvm-tblgen] Avoid StringMatcher for GCC and MS builtin names This brings the compile time of Function.cpp from ~40s down to ~4s for me locally. It also shaves off about 400KB of object file size in a release+asserts build. I also realized that the AMDGPU backend does not have any GCC builtin names to match, so the extra lookup was a no-op. I removed it to silence a zero-length string table array warning. There should be no functional change here. This change really ends the story of PR11951. llvm-svn: 258897	2016-01-27 01:43:12 +00:00
Reid Kleckner	1c93b4cd7b	[llvm-tblgen] Stop emitting the intrinsic name matching code The AMDGPU backend was the last user of the old StringMatcher recognition code. Move it over to the new lookupLLVMIntrinsicName function, which is now improved to handle all of the interesting edge cases exposed by AMDGPU intrinsic names. llvm-svn: 258875	2016-01-26 23:01:21 +00:00
Derek Schuff	90d9e8d370	[WebAssembly] Omit no-op adds for non-mem uses of FrameIndex Differential Revision: http://reviews.llvm.org/D16554 llvm-svn: 258872	2016-01-26 22:47:43 +00:00
Sanjay Patel	06fe9183b0	[x86] make the subtarget member a const reference, not a pointer ; NFCI It's passed in as a reference; it's not optional; it's not a pointer. llvm-svn: 258867	2016-01-26 22:08:58 +00:00
Simon Pilgrim	00adc1e105	[X86] Add support for zeroed shuffle elements to getShuffleScalarElt Enable handling of SM_SentinelZero shuffle elements to getShuffleScalarElt. Improves VZEXT_LOAD matches in EltsFromConsecutiveLoads. llvm-svn: 258865	2016-01-26 21:39:25 +00:00
Chris Bieneman	e49730d4ba	Remove autoconf support Summary: This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html "I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened." - Obi Wan Kenobi Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D16471 llvm-svn: 258861	2016-01-26 21:29:08 +00:00
Derek Schuff	e7305cc4b3	[WebAssembly] Remove check for FrameIndex operands in WebAssemblyPeephole This pass runs after FrameIndex elimination, so it should never see FI operands. NFC llvm-svn: 258860	2016-01-26 21:08:27 +00:00
Sanjay Patel	3e1701da29	[x86] add materializeVectorConstant() helper function; NFC LowerBUILD_VECTOR is still over 300 lines long, but it's a start... llvm-svn: 258858	2016-01-26 21:05:00 +00:00
JF Bastien	43436716aa	WebAssembly NFC: update error message I forgot to update this one in my previous patch. llvm-svn: 258853	2016-01-26 20:24:51 +00:00
JF Bastien	1a6c7608b1	WebAssembly: don't optimize memcpy/memmove/memcpy to frame index r258781 optimized memcpy/memmove/memcpy so the intrinsic call can return its first argument, but missed the frame index case. Teach it to ignore that case so C code doesn't assert out in these cases. llvm-svn: 258851	2016-01-26 20:22:42 +00:00
Cong Hou	551a57f797	Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed. Currently, AnalyzeBranch() fails on non-equality comparisons between floating-point values on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this function can modify the branch by reversing the conditional jump and removing the unconditional jump if there is a proper fall-through. However, in the case of a non-equality comparison between floating-point values, this can turn the branch "unanalyzable". Consider the following case: jne .BB1 jp .BB1 jmp .BB2 .BB1: ... .BB2: ... AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be removed: jne .BB1 jnp .BB2 .BB1: ... .BB2: ... However, AnalyzeBranch() cannot analyze this branch anymore as there are two conditional jumps with different targets. This may disable some optimizations like block-placement: in this case the fall-through behavior is enforced even if the fall-through block is very cold, which is suboptimal. Actually this optimization is also done in the block-placement pass, which means we can remove this optimization from AnalyzeBranch(). However, currently X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there are no negated conditions defined for them. In order to reverse them, this patch defines two new CondCodes, X86::COND_E_AND_NP and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them. Here only the second conditional jump is reversed. This is valid as we only need them to do this "unconditional jump removal" optimization. Differential Revision: http://reviews.llvm.org/D11393 llvm-svn: 258847	2016-01-26 20:08:01 +00:00
Sanjay Patel	70fa79fdf2	[x86] simplify getOnesVector() ; NFCI Let DAG.getConstant() handle the splatting; there's no need to repeat that logic here. llvm-svn: 258833	2016-01-26 18:49:36 +00:00
Eugene Zelenko	6ac3f739ca	Fix Clang-tidy modernize-use-nullptr and modernize-use-override warnings; other minor fixes. Differential revision: reviews.llvm.org/D16568 llvm-svn: 258831	2016-01-26 18:48:36 +00:00
Benjamin Kramer	c50b89070c	Update wasm target for r258819. llvm-svn: 258827	2016-01-26 18:21:38 +00:00
Benjamin Kramer	f57c1977c1	Reflect the MC/MCDisassembler split on the include/ level. No functional change, just moving code around. llvm-svn: 258818	2016-01-26 16:44:37 +00:00
Dan Gohman	fb619e9686	[WebAssembly] Fix a typo in a comment. llvm-svn: 258810	2016-01-26 14:55:17 +00:00
Simon Pilgrim	46696ef93c	[X86][SSE] Add zero element and general 64-bit VZEXT_LOAD support to EltsFromConsecutiveLoads This patch adds support for trailing zero elements to VZEXT_LOAD loads (and checks that no zero elts occur within the consecutive load). It also generalizes the 64-bit VZEXT_LOAD load matching to work for loads other than 2x32-bit loads. After this patch it will also be easier to add support for other basic load patterns like 32-bit VZEXT_LOAD loads, PMOVZX and subvector load insertion. Differential Revision: http://reviews.llvm.org/D16217 llvm-svn: 258798	2016-01-26 09:30:08 +00:00
Craig Topper	b9c932f26e	[X86] Mark LDS/LES as not being allowed in 64-bit mode. Their opcodes are used as part of the VEX prefix in 64-bit mode. Clearly the disassembler implicitly decoded them as AVX instructions in 64-bit mode, but I think the AsmParser would have encoded them. llvm-svn: 258793	2016-01-26 06:10:15 +00:00
Matt Arsenault	bee7575e1a	AMDGPU: Move AMDGPU intrinsics only used by R600 llvm-svn: 258790	2016-01-26 04:49:24 +00:00
Matt Arsenault	382d945d16	AMDGPU: Tidy minor td file issues Make comments and indentation more consistent. Rearrange a few things to be in a more consistent order, such as organizing subtarget features from those describing an actual device property, and those used as options. llvm-svn: 258789	2016-01-26 04:49:22 +00:00
Matt Arsenault	c5f6152911	AMDGPU: Make v32i8/v64i8 illegal types Old intrinsics were forcing these, but they have now all been removed. This fixes large i8 vector operations generally being broken. llvm-svn: 258788	2016-01-26 04:43:48 +00:00
Matt Arsenault	018179fc46	AMDGPU: Remove old sample intrinsics I did my best to try to update all the uses in tests that just happened to use the old ones to the newer intrinsics. I'm not sure I got all of the immediate operand conversions correct, since the value seems to have been ignored by the old pattern but I don't think it really matters. llvm-svn: 258787	2016-01-26 04:38:08 +00:00
Matt Arsenault	051d6f9fde	AMDGPU: Add new amdgcn intrinsics for cube instructions More cleanup to try to get all intrinsics using the correct amdgcn prefix that are as close to the instruction as possible. llvm-svn: 258786	2016-01-26 04:29:56 +00:00
Matt Arsenault	9a10cea7fb	AMDGPU: Implement read_register and write_register intrinsics Some of the special intrinsics that now correspond to an instruction also set certain registers as a side effect, e.g. llvm.SI.sendmsg sets m0 as well as using s_sendmsg. Using these explicit register intrinsics may be a better option. Reading the exec mask and others may be useful for debugging. I'm not sure this is entirely correct, because we would want this to be convergent, although it's possible this is already treated sufficiently conservatively. llvm-svn: 258785	2016-01-26 04:29:24 +00:00
Matt Arsenault	0c3e2338fe	AMDGPU: Restore AMDGPU prefixed rsq intrinsic for now Also move into backend intrinsics to discourage use of the old name. llvm-svn: 258783	2016-01-26 04:14:16 +00:00
Dan Gohman	bdf08d5da6	[WebAssembly] Optimize memcpy/memmove/memcpy calls. These calls return their first argument, but because LLVM uses an intrinsic with a void return type, they can't use the returned attribute. Generalize the store results pass to optimize these calls too. llvm-svn: 258781	2016-01-26 04:01:11 +00:00
Dan Gohman	be6f196bff	[WebAssembly] Remove a completed entry from the README.txt. llvm-svn: 258780	2016-01-26 03:43:48 +00:00
Dan Gohman	bb3722430f	[WebAssembly] Implement unaligned loads and stores. Differential Revision: http://reviews.llvm.org/D16534 llvm-svn: 258779	2016-01-26 03:39:31 +00:00
Reid Kleckner	86ff2689a5	Sort intrinsics by LLVM intrinsic name, rather than tablegen def name Step one towards using a simple binary search to lookup intrinsic IDs instead of our crazy table generated switch+memcmp+startswith code that makes Function.cpp take about a minute to compile. See PR24785 and PR11951 for why we should do this. The X86 backend contains tables that need to be sorted on intrinsic ID, so reorder those. llvm-svn: 258757	2016-01-26 00:55:00 +00:00
Matthias Braun	4e67e5c91a	X86ISelLowering: Fix cmov(cmov) special lowering bug There's a special case in EmitLoweredSelect() that produces an improved lowering for cmov(cmov) patterns. However this special lowering is currently broken if the inner cmov has multiple users so this patch stops using it in this case. If you wonder why this wasn't fixed by continuing to use the special lowering and inserting a 2nd PHI for the inner cmov: I believe this would incur additional copies/register pressure so the special lowering does not improve upon the normal one anymore in this case. This fixes http://llvm.org/PR26256 (= rdar://24329747) llvm-svn: 258729	2016-01-25 22:08:25 +00:00
Simon Pilgrim	d1d118097d	[X86][AVX] Add commutation support for VPERM2X128 instructions Its main use is to allow memory folding of the 1st operand Differential Revision: http://reviews.llvm.org/D16521 llvm-svn: 258726	2016-01-25 21:51:34 +00:00
Dan Gohman	899cb5ab7b	[WebAssembly] Fix unbalanced register stack code in the case of late DCE. Instructions can be DCE'd after the RegStackify pass. If the instruction which would be the pop for what would be a push is removed, don't use a push. llvm-svn: 258694	2016-01-25 16:48:44 +00:00
Dan Gohman	ec977b07a8	[WebAssembly] Minor code formatting cleanups. NFC. llvm-svn: 258692	2016-01-25 15:12:05 +00:00
Michael Zuckerman	1bd7f993fc	[AVX512] Adding PTESTNMB/D/W/Q instruction Differential Revision: http://reviews.llvm.org/D16520 llvm-svn: 258688	2016-01-25 14:43:23 +00:00
Michael Zuckerman	19670d479a	[AVX512] Adding PTESTMB/W/D/Q instruction Differential Revision: http://reviews.llvm.org/D16519 llvm-svn: 258686	2016-01-25 13:27:32 +00:00
Bradley Smith	d27a6a7072	[ARM] Add DSP build attribute and extension targeting This patch was originally committed as r257885, but was reverted due to windows failures. The cause of these failures has been fixed under r258677, hence re-committing the original patch. llvm-svn: 258683	2016-01-25 11:26:11 +00:00
Bradley Smith	f277c8a5ea	[ARM] Add new system registers to ARMv8-M Baseline/Mainline This patch was originally committed as r257884, but was reverted due to windows failures. The cause of these failures has been fixed under r258677, hence re-committing the original patch. llvm-svn: 258682	2016-01-25 11:25:36 +00:00
Bradley Smith	fed3e4ac00	[ARM] Add ARMv8-M security extension instructions to ARMv8-M Baseline/Mainline This patch was originally committed as r257883, but was reverted due to windows failures. The cause of these failures has been fixed under r258677, hence re-committing the original patch. llvm-svn: 258681	2016-01-25 11:24:47 +00:00
Asaf Badouh	655822ab7e	[X86][IFMA] adding intrinsics and encoding for multiply and add of unsigned 52bit integer VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Unsigned Integers and Add High 52-bit Products to 64-bit Accumulators Differential Revision: http://reviews.llvm.org/D16407 llvm-svn: 258680	2016-01-25 11:14:24 +00:00
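A minimal per-lane model of the two IFMA operations named above, written from my reading of Intel's description (not LLVM code): each 64-bit lane multiplies the low 52 bits of the two sources into a 104-bit product, then adds either bits [51:0] (VPMADD52LUQ) or bits [103:52] (VPMADD52HUQ) of that product to the 64-bit accumulator, wrapping modulo 2**64. The function names are hypothetical and masking/vector widths are left out.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1


def _product52(b: int, c: int) -> int:
    # Only the low 52 bits of each source qword take part; the product fits in 104 bits.
    return (b & MASK52) * (c & MASK52)


def vpmadd52luq_lane(acc: int, b: int, c: int) -> int:
    """Hypothetical model of one VPMADD52LUQ lane: acc + product[51:0], wrapped to 64 bits."""
    return (acc + (_product52(b, c) & MASK52)) & MASK64


def vpmadd52huq_lane(acc: int, b: int, c: int) -> int:
    """Hypothetical model of one VPMADD52HUQ lane: acc + product[103:52], wrapped to 64 bits."""
    return (acc + (_product52(b, c) >> 52)) & MASK64


def apply_lanes(op, accs, bs, cs):
    # A packed instruction applies the same lane operation to every qword element.
    return [op(a, b, c) for a, b, c in zip(accs, bs, cs)]
```

Using the low and high forms with the same sources recovers the full 104-bit product split into two 52-bit halves, which is what makes these useful as building blocks for multi-precision arithmetic.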
Oliver Stannard	65b85382f6	[ARM] Add ARMv8.2-A FP16 scalar instructions This was originally committed as r255762, but reverted as it broke Windows bots. Re-committing the exact same patch, as the underlying cause was fixed by r258677. ARMv8.2-A adds 16-bit floating point versions of all existing VFP floating-point instructions. This is an optional extension, so all of these instructions require the FeatureFullFP16 subtarget feature. The assembly for these instructions uses S registers (AArch32 does not have H registers), but the instructions have ".f16" type specifiers rather than ".f32" or ".f64". The top 16 bits of each source register are ignored, and the top 16 bits of the destination register are set to zero. These instructions are mostly the same as the 32- and 64-bit versions, but they use coprocessor 9 rather than 10 and 11. Two new instructions, VMOVX and VINS, have been added to allow packing and extracting two 16-bit floats stored in the top and bottom halves of an S register. New fixup kinds have been added for the PC-relative load and store instructions, but no ELF relocations have been added as they have a range of 512 bytes. Differential Revision: http://reviews.llvm.org/D15038 llvm-svn: 258678	2016-01-25 10:26:26 +00:00
Oliver Stannard	7772f023b5	[TableGen] Fix sort order of asm operand classes This is a fix for https://llvm.org/bugs/show_bug.cgi?id=22796. The previous implementation of ClassInfo::operator< allowed cycles of classes such that x < y < z < x, meaning that a list of them cannot be correctly sorted, and the sort order could differ with different standard libraries. The original implementation sorted classes by ValueName if they were otherwise equal. This isn't strictly necessary, but some backends seem to accidentally rely on it. If I reverse this comparison I get 8 test failures spread across the AArch64, Mips and X86 backends, so I have left it in until those backends can be fixed. There was one case in the X86 backend where the observable behaviour of the assembler is changed by this patch. This was because some of the memory asm operands were not marked as children of X86MemAsmOperand. Differential Revision: http://reviews.llvm.org/D16141 llvm-svn: 258677	2016-01-25 10:20:19 +00:00
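The bug fixed above is a comparator that is not a strict weak ordering: it admits cycles (x < y < z < x), so no sorted order exists and the result depends on the library's sort algorithm. A minimal brute-force checker, sketched under the assumption of small item lists; the `(key, name)` items and both comparators are hypothetical stand-ins, not TableGen's ClassInfo.

```python
from itertools import product


def find_ordering_violation(items, less):
    """Return a tuple describing the first strict-weak-ordering violation, or None."""
    for a in items:
        if less(a, a):
            return ("irreflexivity", a)

    def equiv(x, y):
        return not less(x, y) and not less(y, x)

    for a, b, c in product(items, repeat=3):
        if less(a, b) and less(b, c) and not less(a, c):
            return ("transitivity", a, b, c)
        if equiv(a, b) and equiv(b, c) and not equiv(a, c):
            return ("incomparability", a, b, c)
    return None


def broken_less(x, y):
    # Hypothetical: compare by key only when both items have one, otherwise by name.
    # Mixing the two criteria is what allows cycles.
    if x[0] is not None and y[0] is not None:
        return x[0] < y[0]
    return x[1] < y[1]


def fixed_sort_key(x):
    # Hypothetical fix: one total order. Keyed items first (by key), then keyless ones,
    # with the name as the final tie-break.
    return (x[0] is None, x[0] if x[0] is not None else 0, x[1])


def fixed_less(x, y):
    return fixed_sort_key(x) < fixed_sort_key(y)


ITEMS = [(1, "c"), (None, "b"), (2, "a")]
```

With `broken_less`, `(1, "c") < (2, "a") < (None, "b") < (1, "c")`, so the checker reports a transitivity violation; deriving the comparison from a single tuple key, as `fixed_less` does, rules such cycles out by construction.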
Junmo Park	3ca3e192d0	Silence a -Wparentheses warning; NFC. llvm-svn: 258676	2016-01-25 10:17:17 +00:00
Igor Breger	6d421419db	AVX1 : Enable vector masked_load/store to AVX1. Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q). Differential Revision: http://reviews.llvm.org/D16528 llvm-svn: 258675	2016-01-25 10:17:11 +00:00
Michael Zuckerman	72b7223ae6	[AVX512] [CMPPS ][ CMPPD ] Adding full Comparison Predicate names X86AsmParser.cpp is missing full comparison predicate names for CMPPD and CMPPS Instructions. X86AsmParser.cpp defines only the short names of the Comparison predicate that you can find in the following pdf: https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf Page 5-61 table 5-3 Differential Revision: http://reviews.llvm.org/D16518 llvm-svn: 258671	2016-01-25 08:43:26 +00:00
Elena Demikhovsky	29cde35b43	Added Skylake client to X86 targets and features Changes in X86.td: I set features of Intel processors in incremental form: IVB = SNB + X; HSW = IVB + X; .. I added the Skylake client processor and defined its features. FeatureADX was missing on KNL. Added some new features to appropriate processors: SMAP, IFMA, PREFETCHWT1, VMFUNC and others. Differential Revision: http://reviews.llvm.org/D16357 llvm-svn: 258659	2016-01-24 10:41:28 +00:00
Igor Breger	1e5bafbc82	AVX512: VMOVDQU8/16/32/64 (load) intrinsic implementation. Differential Revision: http://reviews.llvm.org/D16137 llvm-svn: 258657	2016-01-24 08:04:33 +00:00
Simon Pilgrim	0423b382d3	[X86][SSE] Generalised TRUNC -> PACKSS/PACKUS code. NFC. Generalised mask generation / subvector extraction to use the input/output types directly instead of an if/else through all the currently accepted types. llvm-svn: 258645	2016-01-23 22:02:48 +00:00
Justin Lebar	3a5f5798a1	[CUDA] Die gracefully when trying to output an LLVM alias. Summary: Previously, we would just output "foo = bar" in the assembly, and then ptxas would choke. Now we die before emitting any invalid code. Reviewers: echristo Subscribers: jholewinski, llvm-commits, jhen, tra Differential Revision: http://reviews.llvm.org/D16490 llvm-svn: 258638	2016-01-23 21:12:20 +00:00
Justin Lebar	2a161f986f	[CUDA] Make empty parameter lists in nvptx function decls easier to read. Summary: Before: .func (.param .b32 func_retval0) _ZL21__nvvm_reflect_anchorv( ) { After: .func (.param .b32 func_retval0) _ZL21__nvvm_reflect_anchorv() { Reviewers: bkramer Subscribers: llvm-commits, tra, jhen, echristo, jholewinski Differential Revision: http://reviews.llvm.org/D16512 llvm-svn: 258637	2016-01-23 21:12:17 +00:00
Aaron Ballman	add830b5d1	Silence a -Wparentheses warning; NFC. llvm-svn: 258626	2016-01-23 15:42:21 +00:00
Simon Pilgrim	ead22d095e	Added missing comment. NFC. llvm-svn: 258624	2016-01-23 14:38:02 +00:00
Simon Pilgrim	fd66169341	[X86][SSE] Remove INSERTPS dependencies from unreferenced operands. If the INSERTPS zeroes out all the referenced elements from either of the 2 input vectors (and the input is not already UNDEF), then set that input to UNDEF to reduce dependencies. llvm-svn: 258622	2016-01-23 13:37:07 +00:00
Matthias Braun	327bca776c	Inline variable into assert Seems like some compilers still give unused variable warnings for bool var = ...; (void)var; so I have to inline the variable. llvm-svn: 258619	2016-01-23 06:49:29 +00:00
NAKAMURA Takumi	9974fa9c8c	AArch64ISelLowering.cpp: Fix a warning. [-Wunused-variable] llvm-svn: 258618	2016-01-23 06:34:59 +00:00
Manuel Jacob	45cc9bb581	Put space after pointer type in test. NFC. llvm-svn: 258615	2016-01-23 05:47:34 +00:00
Matt Arsenault	7713162c32	AMDGPU: Remove more unused intrinsics Replace tests with lrp with basic IR expansion llvm-svn: 258612	2016-01-23 05:42:38 +00:00
Matt Arsenault	f75257aaa6	AMDGPU: Move amdgcn intrinsic handling into SITargetLowering llvm-svn: 258608	2016-01-23 05:32:20 +00:00
Matt Arsenault	f1341406bf	AMDGPU: Remove IntrNoMem from llvm.SI.sendmsg This has side effects. llvm-svn: 258607	2016-01-23 05:32:18 +00:00
Matt Arsenault	2a93bb6365	AMDGPU: Remove Feature64BitPtr This is a leftover from AMDIL that doesn't do anything and doesn't belong here. llvm-svn: 258606	2016-01-23 05:32:14 +00:00
Matthias Braun	fdef49b183	AArch64ISel: Fix ccmp code selection matching deep expressions. Some of the conditions necessary to produce ccmp sequences were only checked in recursive calls to emitConjunctionDisjunctionTree() after some of the earlier expressions were already built. Move all checks over to isConjunctionDisjunctionTree() so they are all checked before we start emitting instructions. Also rename some variables to better reflect their usage. llvm-svn: 258605	2016-01-23 04:05:22 +00:00
Matthias Braun	985bdf9084	AArch64ISelLowering: Reduce maximum recursion depth of isConjunctionDisjunctionTree() This function will exhibit exponential runtime (2**n), so we should use a lower limit. llvm-svn: 258604	2016-01-23 04:05:18 +00:00
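A hypothetical model of why a recursion-depth cap matters here (not the LLVM code): a predicate that descends into both operands of every AND/OR node does work proportional to 2**depth when operands are shared, as they can be in a DAG. A cap of `max_depth` bounds the worst case at 2**(max_depth + 1) - 1 visits, and exceeding it only makes the check conservatively answer "no". `Node`, `build_shared_chain` and `check_tree` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    op: str                      # "and", "or", or "leaf"
    lhs: Optional["Node"] = None
    rhs: Optional["Node"] = None


def build_shared_chain(depth: int) -> Node:
    # Each level uses the same child for both operands, so the DAG has depth + 1 nodes
    # but 2**(depth + 1) - 1 root-to-node paths.
    node = Node("leaf")
    for i in range(depth):
        node = Node("and" if i % 2 else "or", node, node)
    return node


def check_tree(node: Node, max_depth: int, depth: int = 0, stats=None) -> bool:
    """Hypothetical analogue of a recursive tree-shape check with a depth cap."""
    if stats is not None:
        stats["visits"] += 1
    if node.op == "leaf":
        return True
    if depth >= max_depth:
        return False             # too deep: give up conservatively
    # Without memoization, shared operands are re-visited on every path.
    return (check_tree(node.lhs, max_depth, depth + 1, stats)
            and check_tree(node.rhs, max_depth, depth + 1, stats))
```

For a shared chain of depth 10 and a generous cap, the check visits 2047 nodes even though the DAG holds only 11; with a cap of 6, a depth-30 chain is rejected after a handful of visits instead of roughly two billion.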
Matthias Braun	fd13c14669	Fix wrong indentation llvm-svn: 258603	2016-01-23 04:05:16 +00:00
Derek Schuff	65194682e9	[WebAssembly] Fix RegNumbering for the stack pointer Previously it failed to add NumArgRegs to the offset and so clobbered an already-used register. Now just start the numbering after the arg regs and don't duplicate the add. Test coverage for this coming shortly with the implementation of byval. llvm-svn: 258597	2016-01-23 01:20:43 +00:00
Sanjay Patel	c4efadb665	fix typos; NFC llvm-svn: 258567	2016-01-22 22:09:41 +00:00
Matt Arsenault	10ca39ca8b	AMDGPU: Add new name for barrier intrinsic llvm-svn: 258558	2016-01-22 21:30:43 +00:00
Matt Arsenault	bef34e21c7	AMDGPU: Rename intrinsics to use amdgcn prefix The intrinsic target prefix should match the target name as it appears in the triple. This is not yet complete, but gets most of the important ones. llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled for compatibility for now. llvm-svn: 258557	2016-01-22 21:30:34 +00:00
Matt Arsenault	0b783ef076	AMDGPU: Fix crash with invariant markers The promote alloca pass didn't handle these intrinsics and crashed. These intrinsics should accept any address space, but for now just erase them to avoid breaking. llvm-svn: 258537	2016-01-22 19:47:54 +00:00
Jingyue Wu	585ec8671d	[NVPTX] expand mul_lohi to mul_lo and mul_hi Summary: Fixes PR26186. Reviewers: grosser, jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16479 llvm-svn: 258536	2016-01-22 19:47:26 +00:00
Ahmed Bougacha	78d6efdb93	[AArch64] Simplify emitConditionalCompare calls. NFC. Now that both callsites are identical, we can simplify the prototype and make it easier to reason about the 2-CC case. llvm-svn: 258534	2016-01-22 19:43:57 +00:00
Ahmed Bougacha	99209b90a4	[AArch64] Lower 2-CC FCCMPs (one/ueq) using AND'ed CCs. The current behavior is incorrect, as the two CCs returned by changeFPCCToAArch64CC, intended to be OR'ed, are instead used in an AND ccmp chain. Consider: define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) { %cc1 = fcmp one float %a, %b %cc2 = fcmp olt float %c, %d %and = and i1 %cc1, %cc2 %r = select i1 %and, i32 %e, i32 %f ret i32 %r } Assuming (%a < %b) and (%c < %d); we used to do: fcmp s0, s1 # nzcv <- 1000 orr w8, wzr, #0x1 # w8 <- 1 csel w9, w8, wzr, mi # w9 <- 1 csel w8, w8, w9, gt # w8 <- 1 fcmp s2, s3 # nzcv <- 1000 cset w9, mi # w9 <- 1 tst w8, w9 # (w8 & w9) == 1, so: nzcv <- 0000 csel w0, w0, w1, ne # w0 <- w0 We now do: fcmp s2, s3 # nzcv <- 1000 fccmp s0, s1, #0, mi # mi, so: nzcv <- 1000 fccmp s0, s1, #8, le # !le, so: nzcv <- 1000 csel w0, w0, w1, pl # !pl, so: w0 <- w1 In other words, we transformed: (c < d) && ((a < b) || (a > b)) into: (c < d) && (a u>= b) && (a u<= b) whereas, per De Morgan's, we wanted: (c < d) && !((a u>= b) && (a u<= b)) Note that this problem doesn't occur in the test-suite. changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt. We can't represent that in the fccmp chain; it can't express arbitrary OR sequences, as one comment explains: In general we can create code for arbitrary "... (and (and A B) C)" sequences. We can also implement some "or" expressions, because "(or A B)" is equivalent to "not (and (not A) (not B))" and we can implement some negation operations. [...] However there is no way to negate the result of a partial sequence. 
Instead, introduce changeFPCCToANDAArch64CC, which produces the conjunct cond codes: - (a one b) == ((a olt b) || (a ogt b)) == ((a ord b) && (a une b)) - (a ueq b) == ((a uno b) || (a oeq b)) == ((a ule b) && (a uge b)) Note that, at first, one might think that, when PushNegate is true, we should use the disjunct CCs, in effect doing: (a || b) = !(!a && !(b)) = !(!a && !(b1 || b2)) <- changeFPCCToAArch64CC(b, b1, b2) = !(!a && !b1 && !b2) However, we can take advantage of the fact that the CC is already negated, which lets us avoid special-casing PushNegate and doing the simpler to reason about: (a || b) = !(!a && (!b)) = !(!a && (b1 && b2)) <- changeFPCCToANDAArch64CC(!b, b1, b2) = !(!a && b1 && b2) This makes both emitConditionalCompare cases behave identically, and produces correct ccmp sequences for the 2-CC fcmps. llvm-svn: 258533	2016-01-22 19:43:54 +00:00
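The cond-code identities above can be checked exhaustively over a few representative floats, including NaN and the infinities. This is a minimal sketch assuming LLVM's fcmp convention (ordered predicates are false if either operand is NaN, unordered ones are true); the predicate functions are hypothetical Python stand-ins, not LLVM APIs.

```python
import itertools
import math


def uno(a, b):
    """Unordered: at least one operand is NaN."""
    return math.isnan(a) or math.isnan(b)


def ord_(a, b):
    return not uno(a, b)


# Ordered predicates: false whenever either operand is NaN.
def olt(a, b): return ord_(a, b) and a < b
def ogt(a, b): return ord_(a, b) and a > b
def oeq(a, b): return ord_(a, b) and a == b
def one(a, b): return ord_(a, b) and a != b


# Unordered predicates: true whenever either operand is NaN.
def une(a, b): return uno(a, b) or a != b
def ueq(a, b): return uno(a, b) or a == b
def ule(a, b): return uno(a, b) or a <= b
def uge(a, b): return uno(a, b) or a >= b


SAMPLES = [0.0, -0.0, 1.0, -1.0, math.inf, -math.inf, math.nan]


def identities_hold(samples=SAMPLES):
    """Disjunct (OR) and conjunct (AND) forms of one/ueq agree on every pair."""
    for a, b in itertools.product(samples, repeat=2):
        if one(a, b) != (olt(a, b) or ogt(a, b)) or one(a, b) != (ord_(a, b) and une(a, b)):
            return False
        if ueq(a, b) != (uno(a, b) or oeq(a, b)) or ueq(a, b) != (ule(a, b) and uge(a, b)):
            return False
    return True


def old_chain_was_negated(samples=SAMPLES):
    """The old AND chain (a u>= b) && (a u<= b) is exactly !(a one b)."""
    return all((uge(a, b) and ule(a, b)) == (not one(a, b))
               for a, b in itertools.product(samples, repeat=2))
```

Both checks returning True matches the commit's analysis: the conjunct form is a faithful replacement for the disjunct one, and the previous lowering computed the negation of `one`.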
Ahmed Bougacha	6345b9ecfa	[AArch64] Assert that CCMP isel didn't fail inconsistently. We verify that the op tree is eligible for CCMP emission in isConjunctionDisjunctionTree, but it's also possible that emitConjunctionDisjunctionTree fails later. The initial check is useful, as it avoids building nodes that will get discarded. Still, make sure that inconsistencies don't happen with an assert. llvm-svn: 258532	2016-01-22 19:43:43 +00:00
Krzysztof Parzyszek	7b413c6c63	[Hexagon] Use general purpose registers to spill pred/mod registers into Patch by Tobias Edler Von Koch. llvm-svn: 258527	2016-01-22 19:15:58 +00:00
Matt Arsenault	59bd3014f2	AMDGPU: Rename some r600 intrinsics to use correct TargetPrefix These aren't emitted directly by mesa; they are inserted by a pass. llvm-svn: 258523	2016-01-22 19:00:09 +00:00
Matt Arsenault	bb4ff5f5b6	AMDGPU: Remove unused R600 intrinsics llvm-svn: 258522	2016-01-22 18:52:14 +00:00
Matt Arsenault	7898b90ee1	AMDGPU: Change control flow intrinsics to use amdgcn prefix These aren't supposed to be used outside of the backend, so there aren't any users to worry about. llvm-svn: 258516	2016-01-22 18:42:55 +00:00
Matt Arsenault	8d903029e8	AMDGPU: Don't use separate mulhu/mulhs Pats llvm-svn: 258515	2016-01-22 18:42:49 +00:00
Matt Arsenault	ee0930821a	AMDGPU: Remove random TGSI intrinsic I don't think this was ever used. llvm-svn: 258514	2016-01-22 18:42:44 +00:00
Matt Arsenault	0cbaa1762b	AMDGPU: Remove AMDGPU.fract intrinsic Mesa doesn't use this, and this is pattern matched already from fsub x, (ffloor x) llvm-svn: 258513	2016-01-22 18:42:38 +00:00
JF Bastien	4383a34268	NFC WebAssembly: update links I got a vanity URL, and moved the github waterfall repo. llvm-svn: 258484	2016-01-22 04:21:49 +00:00
Pirama Arumuga Nainar	71e9a2a4c4	Do not lower VSETCC if operand is an f16 vector Summary: SETCC with f16 vectors has OperationAction set to Expand but still gets lowered to FCM* intrinsics based on its result type. This patch skips lowering of VSETCC if the operand is an f16 vector. v4 and v8 tests included. Reviewers: ab, jmolloy Subscribers: srhines, llvm-commits Differential Revision: http://reviews.llvm.org/D15361 llvm-svn: 258471	2016-01-22 01:16:57 +00:00
Simon Pilgrim	5ba1c127fc	[X86][SSE] Improve i16 splatting shuffles Better handling of the annoying pshuflw/pshufhw ops which only shuffle lower/upper halves of a vector. Added vXi16 unary shuffle support for cases where i16 elements (from the same half of the source) are being splatted to the whole of one of the halves. This avoids the general lowering case which must shuffle the 32-bit elements first - meaning that we used to end up with unnecessary duplicate pshuflw/pshufhw shuffles. Note this has the side effect of a lot of SSSE3 test cases no longer needing to use PSHUFB, as it falls below the 3 op combine threshold for when PSHUFB is typically worth it. I've raised PR26183 to discuss if the threshold should be changed and whether we need to make it more specific to the target CPU. Differential Revision: http://reviews.llvm.org/D14901 llvm-svn: 258440	2016-01-21 22:07:41 +00:00
Adam Nemet	af761104ba	[TTI] Add getCacheLineSize Summary: And use it in PPCLoopDataPrefetch.cpp. @hfinkel, please let me know if your preference would be to preserve the ppc-loop-prefetch-cache-line option in order to be able to override the value of TTI::getCacheLineSize for PPC. Reviewers: hfinkel Subscribers: hulx2000, mcrosier, mssimpso, hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D16306 llvm-svn: 258419	2016-01-21 18:28:36 +00:00
Scott Egerton	2455701117	[mips] Allowed dla instructions on 32-bit architectures. Summary: This is now the same as the behaviour of the GNU assembler. This was done as it is required in order to build the Linux kernel with the integrated assembler enabled. Reviewers: dsanders, vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D13594 llvm-svn: 258400	2016-01-21 15:11:01 +00:00
Igor Breger	7a000f5bb2	AVX512: Masked move intrinsic implementation. Implemented intrinsic for the follow instructions (reg move) : VMOVDQU8/16, VMOVDQA32/64, VMOVAPS/PD. Differential Revision: http://reviews.llvm.org/D16316 llvm-svn: 258398	2016-01-21 14:18:11 +00:00
Michael Zuckerman	21a30a42a9	[AVX512] Adding VPERMT2B and VPERMI2B Intrinsics Differential Revision: http://reviews.llvm.org/D16398 llvm-svn: 258397	2016-01-21 13:36:01 +00:00
Krzysztof Parzyszek	14f9535eec	PR26172: unnecessary indirection in HexagonCopyToCombine.cpp llvm-svn: 258395	2016-01-21 12:45:17 +00:00
Marina Yatsina	ff262fa807	[X86] - Removing warning on legal cases caused by commit r258132 There's an overloading of the "movsd" and "cmpsd" instructions, e.g. movsd can be either "Move Data from String to String" or "Move or Merge Scalar Double-Precision Floating-Point Value". The former should produce warnings when parsing a memory operand that is not ESI/EDI, but the latter should not. Fixed the code to produce warnings only after making sure we're dealing with the first case. Expanded the tests of the produced warnings + fixed RUN line of the test so that it would check both stdout and stderr Differential Revision: http://reviews.llvm.org/D16359 llvm-svn: 258393	2016-01-21 11:37:06 +00:00
Tom Stellard	de008d338c	AMDGPU/SI: Pass whether to use the SI scheduler via Target Attribute Summary: Currently the SI scheduler can be selected via command line option, but it turned out it would be better if it was selectable via a Target Attribute. This patch adds "si-scheduler" attribute to the backend. Reviewers: tstellarAMD, echristo Subscribers: echristo, arsenm Differential Revision: http://reviews.llvm.org/D16192 llvm-svn: 258386	2016-01-21 04:28:34 +00:00
Tom Stellard	d1efda8e9e	AMDGPU/SI: Promote i1 SETCC operations Summary: While working on uniform branching, I've hit a few cases where we emit i1 SETCC operations. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16233 llvm-svn: 258352	2016-01-20 21:48:24 +00:00
Matt Arsenault	7836f895fe	AMDGPU: Fix old comments that mention AMDIL llvm-svn: 258350	2016-01-20 21:22:21 +00:00
Matt Arsenault	7ba334a7d9	AMDGPU: Remove AMDGPU.trunc intrinsic llvm-svn: 258348	2016-01-20 21:05:53 +00:00
Matt Arsenault	15fbe49daf	AMDGPU: Remove AMDIL.fraction intrinsic llvm-svn: 258347	2016-01-20 21:05:49 +00:00
Matt Arsenault	7cccd2672e	AMDGPU: Remove AMDIL.round.nearest intrinsic llvm-svn: 258346	2016-01-20 21:05:40 +00:00
Matt Arsenault	1c9e4ef0df	AMDGPU: Remove abs intrinsic llvm-svn: 258343	2016-01-20 20:58:29 +00:00
Matt Arsenault	f7e6e89718	AMDGPU: Remove min/max intrinsics This removes support for mesa 11.0.x llvm-svn: 258342	2016-01-20 20:50:19 +00:00
Keith Walker	8c44bf1b89	Write AArch64 big endian data fixup entries as BE. There was support for writing the AArch64 big endian data fixup entries in the .eh_frame section in BE. This is changed to write all such fixup entries in BE with no restriction on the section. This is similar to the existing support for fixup entries for ARM. A test is added to check the length field in the .debug_line section as this is an example of where such a fixup occurs. Differential Revision: http://reviews.llvm.org/D16064 llvm-svn: 258320	2016-01-20 15:59:14 +00:00
Tom Stellard	77a177722f	Correctly initialize SIAnnotateControlFlow Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16304 llvm-svn: 258319	2016-01-20 15:48:27 +00:00
Michael Zuckerman	65c40afb03	[AVX512] Adding VPERMB Intrinsics Differential Revision: http://reviews.llvm.org/D16296 llvm-svn: 258316	2016-01-20 15:24:56 +00:00
Marina Yatsina	701938d64e	Fixing bug in rL258132: [X86] Adding support for missing variations of X86 string related instructions There was a bug in my rL258132 because there's an overloading of the "movsd" and "cmpsd" instructions, e.g. movsd can be either "Move Data from String to String" (the case I wanted to handle) or "Move or Merge Scalar Double-Precision Floating-Point Value" (the case that causes the asserts). Added code for escaping the unfamiliar scenarios and falling back to old behaviour. Also changed the asserts to llvm_unreachable. llvm-svn: 258312	2016-01-20 14:03:47 +00:00
Igor Breger	d3341f5021	AVX512: Store (MOVNTPD, MOVNTPS, MOVNTDQ) using non-temporal hint intrinsic implementation. Differential Revision: http://reviews.llvm.org/D16350 llvm-svn: 258309	2016-01-20 13:11:47 +00:00
Oliver Stannard	f7696f8267	[AArch64] Fix two bugs in the .inst directive The AArch64 .inst directive was implemented using EmitIntValue, which resulted in both $x and $d (code and data) mapping symbols being emitted at the same address. This fixes it to only emit the $x mapping symbol. EmitIntValue also emits the value in big-endian order when targeting big-endian systems, but instructions are always emitted in little-endian order for AArch64. Differential Revision: http://reviews.llvm.org/D16349 llvm-svn: 258308	2016-01-20 12:54:31 +00:00
Dan Gohman	8394756937	[WebAssembly] Minor code cleanups. NFC. llvm-svn: 258294	2016-01-20 05:54:22 +00:00
Dan Gohman	26cf4f3689	[WebAssembly] Remove the Relooper code, as it is not currently being used. llvm-svn: 258293	2016-01-20 05:50:29 +00:00
Dan Gohman	7e64917fd1	[WebAssembly] Don't stackify stores across instructions with side effects. llvm-svn: 258285	2016-01-20 04:21:16 +00:00
Eduard Burtescu	23c4d83aa3	[NFC] Replace several manual GEP loops with gep_type_iterator. Reviewers: dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16335 llvm-svn: 258262	2016-01-20 00:26:52 +00:00
Matthias Braun	5d458617aa	RegisterPressure: Make liveness tracking subregister aware Differential Revision: http://reviews.llvm.org/D14968 llvm-svn: 258258	2016-01-20 00:23:26 +00:00
Tom Stellard	2e045bbc5f	AMDGPU/SI: Prevent the DAGCombiner from creating setcc with i1 inputs Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15035 llvm-svn: 258256	2016-01-20 00:13:22 +00:00
Quentin Colombet	4cf56917ea	[X86] Do not run shrink-wrapping on function with split-stack attribute or HiPE calling convention. The implementation of the related callbacks in the x86 backend for such functions are not ready to deal with a prologue block that is not the entry block of the function. This fixes PR26107, but the longer term solution would be to fix those callbacks. llvm-svn: 258221	2016-01-19 23:29:03 +00:00
David Majnemer	ce10842036	[MC, COFF] Add .reloc support for WinCOFF This adds rudimentary support for a few relocations that we will use for the CodeView debug format. llvm-svn: 258216	2016-01-19 23:05:27 +00:00
Simon Pilgrim	4b919b2ab3	[X86][SSE] Add VZEXT_MOVL target shuffle decoding. Add support for decoding VZEXT_MOVL target shuffle masks, allowing it to be used as a source in target shuffle combines. llvm-svn: 258215	2016-01-19 23:04:56 +00:00
Simon Pilgrim	e74653b67a	[X86][SSE] Add INSERTPS target shuffle combines. As vector shuffles can only reference two inputs, many (V)INSERTPS patterns end up being split over two target shuffles. This patch adds combines to attempt to combine (V)INSERTPS nodes with input/output nodes that are just zeroing out these additional vector elements. Differential Revision: http://reviews.llvm.org/D16072 llvm-svn: 258205	2016-01-19 22:24:12 +00:00
Chad Rosier	5c72966ea3	[AArch64] Remove a bunch of useless FIXME comments. llvm-svn: 258193	2016-01-19 21:47:24 +00:00
Dan Gohman	cff798386e	[WebAssembly] Remove an unused data member. NFC. llvm-svn: 258192	2016-01-19 21:31:41 +00:00
Chad Rosier	b11c82d3e2	[AArch64] Remove more dead code after r258093. llvm-svn: 258191	2016-01-19 21:27:05 +00:00
JF Bastien	17999f20fa	WebAssembly: mark known failure caused by r258125 The following test program triggers the assertion: https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gcc.c-torture/execute/20030916-1.c llvm-svn: 258182	2016-01-19 20:53:12 +00:00
Michael Zuckerman	4582bdab12	[AVX512] Adding VPERMT2B and VPERMI2B instructions. Differential Revision: http://reviews.llvm.org/D16297 llvm-svn: 258161	2016-01-19 18:47:02 +00:00
Eduard Burtescu	19eb03106d	[opaque pointer types] [NFC] GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType. Summary: GEPOperator: provide getResultElementType alongside getSourceElementType. This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has. GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType. Reviewers: mjacob, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16275 llvm-svn: 258145	2016-01-19 17:28:00 +00:00
Michael Zuckerman	d9cac592f4	[AVX512] Adding VPERMB instruction Differential Revision: http://reviews.llvm.org/D16294 llvm-svn: 258144	2016-01-19 17:07:43 +00:00
Dan Gohman	b6fd39a3a7	[WebAssembly] Rematerialize constants rather than hold them live in registers. Teach the register stackifier to rematerialize constants that have multiple uses instead of leaving them in registers. In the WebAssembly encoding, it's the same code size to materialize most constants as it is to read a value from a register. llvm-svn: 258142	2016-01-19 16:59:23 +00:00
Chad Rosier	401a4ab8d8	Typo. llvm-svn: 258137	2016-01-19 16:50:45 +00:00
Marina Yatsina	d9658d16fd	[X86] Add support for "xlat m8" According to x86 spec "xlat m8" is a legal instruction and it is equivalent to "xlatb". Differential Revision: http://reviews.llvm.org/D15150 llvm-svn: 258135	2016-01-19 16:35:38 +00:00
Marina Yatsina	b9f4f62cfe	[X86] Adding support for missing variations of X86 string related instructions The following are legal according to X86 spec: ins mem, DX outs DX, mem lods mem stos mem scas mem cmps mem, mem movs mem, mem Differential Revision: http://reviews.llvm.org/D14827 llvm-svn: 258132	2016-01-19 15:37:56 +00:00
Dan Gohman	b13c91f159	[WebAssembly] Disable some WebAssembly-specific optimization passes at -O0. llvm-svn: 258127	2016-01-19 14:55:02 +00:00
Dan Gohman	3196650bf3	[WebAssembly] Use the templated form of MachineFunction::getSubtarget(). NFC. llvm-svn: 258126	2016-01-19 14:53:19 +00:00
Asaf Badouh	d4a0d9a78c	[X86][AVX512] fix dag & add intrinsics for fixupimm. Cover all widths and types (pd/ps/sd/ss) of the fixupimm instructions and intrinsics. Differential Revision: http://reviews.llvm.org/D16313 llvm-svn: 258124	2016-01-19 14:21:39 +00:00
Matt Arsenault	33e3ecee0c	AMDGPU: Reduce 64-bit SRAs llvm-svn: 258096	2016-01-18 22:09:04 +00:00
Matt Arsenault	6e3a45193a	AMDGPU: Split 64-bit and of constant up This breaks the tests that were meant for testing 64-bit inline immediates, so move those to shl where they won't be broken up. This should be repeated for the other related bit ops. llvm-svn: 258095	2016-01-18 22:01:13 +00:00
Chad Rosier	234bf6fe5c	[AArch64] Remove unused arguments. NFC. AFAICT, these have been unused since the initial backend import. llvm-svn: 258093	2016-01-18 21:56:40 +00:00
Matt Arsenault	3cbbc10488	AMDGPU: Generalize shl combine Reduce 64-bit shl with constant > 32. We already special cased this for the == 32 case, but this also works for any >= 32 constant. llvm-svn: 258092	2016-01-18 21:55:14 +00:00
Matt Arsenault	80edab99ff	AMDGPU: Reduce 64-bit lshr by constant to 32-bit 64-bit shifts are very slow on some subtargets. llvm-svn: 258090	2016-01-18 21:43:36 +00:00
Matt Arsenault	e83690c1cc	AMDGPU: Add subtarget feature for instruction rates llvm-svn: 258085	2016-01-18 21:13:50 +00:00
Simon Pilgrim	99c6c29c0c	Fixed MSVC Win64 warning of implicit conversion of 32-bit shift to 64-bits. llvm-svn: 258084	2016-01-18 21:11:19 +00:00
Simon Pilgrim	3e5fb61978	[X86][AVX2] Broadcast subvectors AVX2 can only broadcast from the zero'th element of a vector, but if the broadcastable element is the zero'th element of a 128-bit subvector, it's advantageous to extract the subvector, broadcast from that, and avoid loading the shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception is when the source type is 4f64 or 4i64, which can use the immediate shuffles VPERMPD/VPERMQ directly. Differential Revision: http://reviews.llvm.org/D16050 llvm-svn: 258081	2016-01-18 20:59:04 +00:00
Krzysztof Parzyszek	7aae9b3782	[Hexagon] Recognize more copy-equivalents in RDF optimizations llvm-svn: 258076	2016-01-18 20:45:51 +00:00
Krzysztof Parzyszek	adc64b7df0	[RDF] Improvements to copy propagation - Allow any instruction to define equality between registers. - Keep the DFG updated. llvm-svn: 258075	2016-01-18 20:43:57 +00:00
Krzysztof Parzyszek	e6b0662092	[RDF] Improve compile-time performance of dead code elimination llvm-svn: 258074	2016-01-18 20:42:47 +00:00
Krzysztof Parzyszek	69e670d5f9	[RDF] Allow unlinking ref nodes from data-flow chains only llvm-svn: 258073	2016-01-18 20:41:34 +00:00
Igor Breger	239fda676c	AVX512: Masked store intrinsic implementation. Implemented intrinsics for the following instructions (store): VMOVDQU8/16/32/64, VMOVDQA32/64, VMOVAPS/PD, VMOVUPS/PD. Differential Revision: http://reviews.llvm.org/D16271 llvm-svn: 258047	2016-01-18 13:52:57 +00:00
Elena Demikhovsky	9242ea87d6	Added Cannonlake processor to X86 Target Differential Revision: http://reviews.llvm.org/D16289 llvm-svn: 258046	2016-01-18 13:00:31 +00:00
Igor Breger	dd6522c653	AVX512: Change v8i1 bitconvert GR8 pattern, remove unnecessary movzbl instruction. Code example, previous implementation: movzbl %dil, %eax kmovw %eax, %k0 New code: kmovw %edi, %k0 Differential Revision: http://reviews.llvm.org/D16287 llvm-svn: 258045	2016-01-18 12:02:45 +00:00
Oliver Stannard	9f68749eba	[ARM] Operands for PKHTB alias should be swapped When the shift immediate is zero, PKHTB is an alias for PKHBT, but the order of the input operands needs to be swapped. Differential Revision: http://reviews.llvm.org/D16288 llvm-svn: 258044	2016-01-18 11:56:35 +00:00
Manuel Jacob	190577ac81	[opaque pointer types] [NFC] CallSite: use getFunctionType() instead of going through PointerType::getElementType. Patch by Eduard Burtescu. Reviewers: dblaikie, mjacob Subscribers: dsanders, llvm-commits, dblaikie Differential Revision: http://reviews.llvm.org/D16273 llvm-svn: 258023	2016-01-17 22:37:39 +00:00
Michael Zuckerman	97b6a6923e	[AVX512] adding AVXVBMI feature flag The feature flag is for VPERMB, VPERMI2B, VPERMT2B and VPMULTISHIFTQB instructions. More about the instructions can be found in: https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf Differential Revision: http://reviews.llvm.org/D16190 llvm-svn: 258012	2016-01-17 13:42:12 +00:00
Igor Breger	e1f273d900	AVX512: Use MemIntrinsicSDNode to implement load/store intrinsic. Differential Revision: http://reviews.llvm.org/D16184 llvm-svn: 258009	2016-01-17 12:10:24 +00:00
Michael Zuckerman	ac1b238b0a	[AVX512] Adding VPERMW/D/Q VPERMPS/D Intrinsics Differential Revision: http://reviews.llvm.org/D16189 llvm-svn: 258008	2016-01-17 11:33:29 +00:00
Michael Zuckerman	ede597c753	[AVX512] Adding VPERMQ VPERMPD Intrinsics Differential Revision: http://reviews.llvm.org/D16194 llvm-svn: 258006	2016-01-17 08:32:14 +00:00
Simon Pilgrim	20f31fa31a	[X86][AVX] Enable extraction of upper 128-bit subvectors for 'half undef' shuffle lowering Added support for the extraction of the upper 128-bit subvectors for lower/upper half undef shuffles if it would reduce the number of extractions/insertions or avoid loads of AVX2 permps/permd shuffle masks. Minor follow up to D15477. llvm-svn: 258000	2016-01-16 22:30:20 +00:00
Manuel Jacob	5f6eaac611	GlobalValue: use getValueType() instead of getType()->getPointerElementType(). Reviewers: mjacob Subscribers: jholewinski, arsenm, dsanders, dblaikie Patch by Eduard Burtescu. Differential Revision: http://reviews.llvm.org/D16260 llvm-svn: 257999	2016-01-16 20:30:46 +00:00
Manman Ren	53a54c41d7	CXX_FAST_TLS calling convention: fix issue on x86-64. %RBP can't be handled explicitly. We generate the following code: pushq %rbp movq %rsp, %rbp ... movq %rbx, (%rbp) ## 8-byte Spill where %rbp will be overwritten by the spilled value. The fix is to let PEI handle %RBP. PR26136 llvm-svn: 257997	2016-01-16 16:39:46 +00:00
NAKAMURA Takumi	33ff1dda6a	[Cygwin] Use -femulated-tls by default since r257718 introduced the new pass. FIXME: Add more targets to use emutls into clang/test/Driver/emulated-tls.cpp. FIXME: Add cygwin tests into llvm/test/CodeGen/X86. Work in progress. llvm-svn: 257984	2016-01-16 03:44:52 +00:00
Dan Gohman	7f86ca1803	[WebAssembly] Add some more README.txt entries. llvm-svn: 257969	2016-01-16 00:20:03 +00:00
Kevin B. Smith	c831a08fbf	[X86]: Make param names in header and body match for isCalleePop. Differential Revision: http://reviews.llvm.org/D16246 llvm-svn: 257965	2016-01-16 00:08:36 +00:00
Dan Gohman	2f301f3e92	[WebAssembly] Don't create a needless .note.GNU-stack section WebAssembly's stack will never be executable by default, so it isn't necessary to declare .note.GNU-stack sections to request a non-executable stack. Differential Revision: http://reviews.llvm.org/D15969 llvm-svn: 257962	2016-01-15 23:59:13 +00:00
Artem Belevich	5be0706ebe	[NVPTX] Do not emit .hidden or .protected directives as they are not allowed by PTX. llvm-svn: 257961	2016-01-15 23:57:53 +00:00
Manman Ren	e5f807f928	CXX_FAST_TLS calling convention: fix issue on ARM. When we have a single basic block, the explicit copy-back instructions should be inserted right before the terminator. Before this fix, they were wrongly placed at the beginning of the basic block. PR26136 llvm-svn: 257930	2016-01-15 20:24:11 +00:00
Manman Ren	4632e8e625	CXX_FAST_TLS calling convention: fix issue on AArch64. When we have a single basic block, the explicit copy-back instructions should be inserted right before the terminator. Before this fix, they were wrongly placed at the beginning of the basic block. I will commit fixes to other platforms as well. PR26136 llvm-svn: 257929	2016-01-15 20:13:28 +00:00
Manman Ren	4fe01bd8f9	CXX_FAST_TLS calling convention: fix issue on X86-64. When we have a single basic block, the explicit copy-back instructions should be inserted right before the terminator. Before this fix, they were wrongly placed at the beginning of the basic block. I will commit fixes to other platforms as well. PR26136 llvm-svn: 257925	2016-01-15 19:35:42 +00:00
Kyle Butt	132bf36161	Codegen: [PPC] Silence false-positive initialization warning. NFC Some compilers don't do exhaustive switch checking. For those compilers, add an initialization to prevent un-initialized variable warnings from firing. For compilers with exhaustive switch checking, we still get a guarantee that the switch is exhaustive, and hence the initializations are redundant, and a non-functional change. llvm-svn: 257923	2016-01-15 19:20:06 +00:00
Reid Kleckner	d4a0d18899	Revert "[ARM] Add ARMv8-M security extension instructions to ARMv8-M Baseline/Mainline" This reverts commit r257883. Somehow this didn't make it into r257916. llvm-svn: 257919	2016-01-15 18:55:12 +00:00
Reid Kleckner	47f2452da8	# This is a combination of 2 commits. # The first commit's message is: Revert "[ARM] Add DSP build attribute and extension targeting" This reverts commit b11cc50c0b4a7c8cdb628abc50b7dc226ff583dc. # This is the 2nd commit message: Revert "[ARM] Add new system registers to ARMv8-M Baseline/Mainline" This reverts commit 837d08454e3e5beb8581951ac26b22fa07df3cd5. llvm-svn: 257916	2016-01-15 18:31:29 +00:00
Krzysztof Parzyszek	2a3b2f9841	[Hexagon] Generate CONST64 when optimizing for size in copy-to-combine llvm-svn: 257891	2016-01-15 14:08:31 +00:00
Krzysztof Parzyszek	9b7320e621	[Hexagon] Handle DBG_VALUE instructions in copy-to-combine llvm-svn: 257890	2016-01-15 13:55:57 +00:00
Bradley Smith	48b93e1f21	[ARM] Add DSP build attribute and extension targeting llvm-svn: 257885	2016-01-15 10:28:25 +00:00
Bradley Smith	42f6e90a43	[ARM] Add new system registers to ARMv8-M Baseline/Mainline llvm-svn: 257884	2016-01-15 10:28:03 +00:00
Bradley Smith	618712df04	[ARM] Add ARMv8-M security extension instructions to ARMv8-M Baseline/Mainline llvm-svn: 257883	2016-01-15 10:27:14 +00:00
Bradley Smith	433c22e35c	[ARM] Add ARMv8-A semaphore/atomic instructions to ARMv8-M Baseline/Mainline llvm-svn: 257882	2016-01-15 10:26:51 +00:00
Bradley Smith	a1189106d5	[ARM] Add B.W and CBZ instructions to ARMv8-M Baseline llvm-svn: 257881	2016-01-15 10:26:17 +00:00
Bradley Smith	519563e371	[ARM] Add SDIV/UDIV instructions to ARMv8-M Baseline llvm-svn: 257880	2016-01-15 10:25:35 +00:00
Bradley Smith	d9a99ce53d	[ARM] Add MOVW/MOVT instructions to ARMv8-M Baseline/Mainline llvm-svn: 257879	2016-01-15 10:25:14 +00:00
Bradley Smith	e26f799422	[ARM] Add ARMv8-M Baseline/Mainline LLVM targeting llvm-svn: 257878	2016-01-15 10:24:39 +00:00
Bradley Smith	4c21cba72b	[ARM] Split out ARMv8-A semaphores and atomics and ARMv7 clrex as separate features llvm-svn: 257877	2016-01-15 10:23:46 +00:00
Jonas Paulsson	5b29e096ac	[SystemZ] Fix bad instruction name SLGBR -> SLBGR Reviewed by Ulrich Weigand llvm-svn: 257874	2016-01-15 07:12:09 +00:00
Pete Cooper	835594e627	Delete MCRelocationInfo::createExprForRelocation. This method has no callers. Also remove X86ELFRelocationInfo.cpp and X86MachORelocationInfo.cpp which only existed to provide an implementation of that method. Ok'd by Rafael and Jim. llvm-svn: 257859	2016-01-15 02:24:12 +00:00
Weiming Zhao	038393bba0	Fix AArch64ConditionOptimizer Summary: This pass may modify the Cmp operands. However, the flag reg may be used by both the branch and CSEL. Modifying CMP will have a side effect on CSEL. Reviewers: t.p.northover Subscribers: llvm-commits, aemerson, rengolin Differential Revision: http://reviews.llvm.org/D16147 llvm-svn: 257844	2016-01-15 00:06:58 +00:00
Krzysztof Parzyszek	0d11212f00	[Hexagon] Use S2_lsr_i_r instead of S2_extractu to obtain upper halfword llvm-svn: 257815	2016-01-14 21:59:22 +00:00
Krzysztof Parzyszek	5337a3e965	[Hexagon] Handle HVX registers in bit simplification llvm-svn: 257811	2016-01-14 21:45:43 +00:00
Rui Ueyama	da00f2fdf4	Update to use new name alignTo(). llvm-svn: 257804	2016-01-14 21:06:47 +00:00
Rafael Espindola	c897cdde70	Handle offsets larger than 32 bits. David Majnemer noticed that it was not obvious what the behavior would be if B.Offset - A.Offset could not fit in an int. llvm-svn: 257803	2016-01-14 21:03:06 +00:00
Rafael Espindola	56cb2734e3	Assert that a cmp function defines a total order. Thanks to David Blaikie for noticing it. llvm-svn: 257796	2016-01-14 20:28:25 +00:00
Krzysztof Parzyszek	237b96132d	[Hexagon] Expand pseudo instruction Insert4 llvm-svn: 257771	2016-01-14 15:37:16 +00:00
Krzysztof Parzyszek	b28ae10a16	[Hexagon] Handle branches with non-mbb operands llvm-svn: 257768	2016-01-14 15:05:27 +00:00
Benjamin Kramer	fc1f7d893e	[ARM] Use the efficient version of BitVector::set and a static_assert. No functional change intended. llvm-svn: 257766	2016-01-14 14:33:04 +00:00
Igor Breger	fc96331d88	AVX512: VMOVDQA32/64 (load) intrinsic implementation. Differential Revision: http://reviews.llvm.org/D16142 llvm-svn: 257749	2016-01-14 07:56:04 +00:00
Ahmed Bougacha	dfc77357a0	[AArch64] Don't assume extractelt constant index when matching shuffle. llvm-svn: 257735	2016-01-14 02:12:30 +00:00
JF Bastien	d1bd129d00	WebAssembly: mark a few new failures A recent change introduced this assertion failure in some corner cases. Repro: mkdir /s/wasm/torture-out ; time /s/wasm/waterfall/src/compile_torture_tests.py --c /s/llvm/out/bin/clang --cxx /s/llvm/out/bin/clang++ --testsuite /s/gcc/gcc/testsuite --fails /s/llvm/llvm/lib/Target/WebAssembly/known_gcc_test_failures.txt --out /s/wasm/torture-out Or look on the wasm integration bot: https://build.chromium.org/p/client.wasm.llvm/console llvm-svn: 257733	2016-01-14 01:49:22 +00:00
David Majnemer	3463e696fb	[X86] Don't alter HasOpaqueSPAdjustment after we've relied on it We rely on HasOpaqueSPAdjustment not changing after we've calculated things based on it. Things like whether or not we can use 'rep;movs' to copy bytes around, that sort of thing. If it changes, invariants in the backend will quietly break. This situation arose when we had a call to memcpy and a COPY of the FLAGS register where we would attempt to reference local variables using %esi, a register that was clobbered by the 'rep;movs'. This fixes PR26124. llvm-svn: 257730	2016-01-14 01:20:03 +00:00
JF Bastien	664fd461c2	WebAssembly: fix build break introduced by ELFObjectWriter churn llvm-svn: 257709	2016-01-13 23:36:00 +00:00
Rafael Espindola	8340f94df1	Convert a few assert failures into proper errors. Fixes PR25944. llvm-svn: 257697	2016-01-13 22:56:57 +00:00
Krzysztof Parzyszek	a61f7da6ba	[Hexagon] Fix the options controlling jump table generation llvm-svn: 257679	2016-01-13 21:43:13 +00:00
Changpeng Fang	c16be00313	AMDGPU/SI: Update ISA version for FIJI llvm-svn: 257666	2016-01-13 20:39:25 +00:00
Dan Gohman	a39ca60126	[WebAssembly] Add an assertion to catch unexpected MCFixupKindInfo flags. llvm-svn: 257657	2016-01-13 19:31:57 +00:00
Dan Gohman	938ff9f0aa	[WebAssembly] MCFixupKindInfo's TargetSize is in bits rather than bytes. llvm-svn: 257655	2016-01-13 19:29:37 +00:00
Hans Wennborg	81efb6b418	Fix struct/class mismatch for MachineSchedContext llvm-svn: 257648	2016-01-13 18:59:45 +00:00
Marek Olsak	46dadbfab2	AMDGPU/SI: Fix a GPU hang with POS_W_FLOAT enabled Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16037 llvm-svn: 257625	2016-01-13 17:23:20 +00:00
Marek Olsak	3c0ebc71f1	AMDGPU/SI: Remove ending s_endpgm from non-void functions Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16035 llvm-svn: 257623	2016-01-13 17:23:12 +00:00
Marek Olsak	8e9cc63bfb	AMDGPU/SI: Add s_waitcnt at the end of non-void functions Summary: v2: Make ReturnsVoid private, so that I can add another 8 lines of code and look more productive. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16034 llvm-svn: 257622	2016-01-13 17:23:09 +00:00
Marek Olsak	8a0f335ad6	AMDGPU/SI: Add support for non-void functions Summary: Return values can be stored in SGPRs (i32) and VGPRs (f32). This will be used by functions which expect some bytecode or other binary to be appended at the end. It allows defining in which registers the return values will be stored. v2: don't do this for compute shaders Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16033 llvm-svn: 257621	2016-01-13 17:23:04 +00:00
Derek Schuff	9c3bf3187a	[WebAssembly] Invalidate liveness in CFG stackifier WebAssemblyCFGStackify does not track liveness for EXPR_STACK, causing verifier failure if liveness has not already been invalidated. llvm-svn: 257620	2016-01-13 17:10:28 +00:00
Nicolai Haehnle	02c3291566	AMDGPU/SI: Add SI Machine Scheduler Summary: It is off by default, but can be used with --misched=si Patch by: Axel Davy Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D11885 llvm-svn: 257609	2016-01-13 16:10:10 +00:00
Michael Zuckerman	6b35f460ac	Fixing warning by adding the X86ISD::VROTRI case. Differential Revision: http://reviews.llvm.org/D16052 llvm-svn: 257607	2016-01-13 15:48:42 +00:00
Krzysztof Parzyszek	a3c5d44437	[Hexagon] Do not insert non-phis before phis in bit simplification llvm-svn: 257606	2016-01-13 15:48:18 +00:00
Michael Zuckerman	0e31b22487	[AVX512] Adding PMOVSXBD/W/Q , PMOVZSDQ and PMOVZSWD/Q Intrinsics. Differential Revision: http://reviews.llvm.org/D16111 llvm-svn: 257604	2016-01-13 14:59:19 +00:00
Michael Zuckerman	43cea85db9	[AVX512] Adding PMOVZXBD/W/Q , PMOVZXDQ and PMOVZXWD/Q Intrinsics Differential Revision: http://reviews.llvm.org/D16071 llvm-svn: 257601	2016-01-13 14:25:21 +00:00
Ulrich Weigand	46ff7ec317	[PowerPC] Fix large code model with the ELFv2 ABI The global entry point prologue currently assumes that the TOC associated with a function is less than 2GB away from the function entry point. This is always true when using the medium or small code model, but may not be the case when using the large code model. This patch adds a new variant of the ELFv2 global entry point prologue that lifts the 2GB restriction when building with -mcmodel=large. This works by emitting a quadword containing the distance from the function entry point to its associated TOC immediately before the entry point, and then using a prologue like: ld r2,-8(r12) add r2,r2,r12 Since creation of the entry point prologue is now split across two separate routines (PPCLinuxAsmPrinter::EmitFunctionEntryLabel emits the data word, PPCLinuxAsmPrinter::EmitFunctionBodyStart the prolog code), I've switched to using named labels instead of just temporaries to indicate the locations of the global and local entry points and the new TOC offset data word. These names are provided by new routines in PPCFunctionInfo modeled after the existing PPCFunctionInfo::getPICOffsetSymbol. Note that a corresponding change was committed to GCC here: https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00355.html Reviewers: hfinkel Differential Revision: http://reviews.llvm.org/D15500 llvm-svn: 257597	2016-01-13 13:12:23 +00:00
Michael Zuckerman	298a680c80	[AVX512] adding PRORQ , PRORD , PRORLVQ and PRORLVD Intrinsics Differential Revision: http://reviews.llvm.org/D16052 llvm-svn: 257594	2016-01-13 12:39:33 +00:00
Marek Olsak	4e99b6ec01	AMDGPU/SI: Allow more shader inputs Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16032 llvm-svn: 257593	2016-01-13 11:46:48 +00:00
Marek Olsak	b6c8c3d165	AMDGPU/SI: Allow any number of PS inputs Summary: With the ability to concatenate shader binaries, the limit of 15 no longer applies. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16031 llvm-svn: 257592	2016-01-13 11:46:10 +00:00
Marek Olsak	fccabaf57e	AMDGPU/SI: Add new target attribute InitialPSInputAddr Summary: This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM. The register assigns VGPR locations to PS inputs, while the ENA register determines whether or not they are loaded. Mesa needs to set some inputs as not-movable, so that a pixel shader prolog binary appended at the beginning can assume where some inputs are. v2: Make PSInputAddr private, because there are never enough silly getters and setters for people to read. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16030 llvm-svn: 257591	2016-01-13 11:45:36 +00:00
Marek Olsak	926c56f50c	AMDGPU/SI: Fix a bug in SIFoldOperands Summary: ret.ll will contain a test for this Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16029 llvm-svn: 257590	2016-01-13 11:44:29 +00:00
Andrey Turetskiy	1ce2c9973f	LEA code size optimization pass (Part 2): Remove redundant LEA instructions. Make x86 OptimizeLEAs pass remove LEA instruction if there is another LEA (in the same basic block) which calculates address differing only by a displacement. Works only for -Oz. Differential Revision: http://reviews.llvm.org/D13295 llvm-svn: 257589	2016-01-13 11:30:44 +00:00
James Y Knight	7699494f08	[SPARC] Revamp AnalyzeBranch and add ReverseBranchCondition. AnalyzeBranch on X86 (and, previously, SPARC, whose implementation was copied from X86) tries to modify the branches based on block layout (e.g. checking isLayoutSuccessor), when AllowModify is true. The rest of the architectures leave that up to the caller, which can call InsertBranch, RemoveBranch, and ReverseBranchCondition as appropriate. That appears to be the preferred way to do it nowadays. This commit makes SPARC like the rest: replaces AnalyzeBranch with an implementation cribbed from AArch64, and adds a ReverseBranchCondition implementation. Additionally, a test-case has been added (also cribbed from AArch64) demonstrating that redundant branch sequences no longer get emitted. E.g., it used to emit code like this: bne .LBB1_2 nop ba .LBB1_1 nop .LBB1_2: And now emits: cmp %i0, 42 be .LBB1_1 nop llvm-svn: 257572	2016-01-13 04:44:14 +00:00
Ana Pazos	359cab3bb3	Guard fabs to bfc convert with V6T2 flag Summary: BFC instructions are available in ARMv6T2 and above. Reviewers: t.p.northover Subscribers: aemerson Differential Revision: http://reviews.llvm.org/D16076 llvm-svn: 257546	2016-01-13 00:03:35 +00:00
Quentin Colombet	f8e3030794	[ARM] Mark VMOV with immediate: isAsCheapAsMove. VMOVs are not strictly speaking cheap, but they are as expensive as a vector copy (VORR), so we should prefer rematerialization over splitting when it applies. rdar://problem/23754176 llvm-svn: 257545	2016-01-13 00:02:40 +00:00
Derek Schuff	4377e2d713	[WebAssembly] Fix disassembler shared-libs build llvm-svn: 257536	2016-01-12 23:03:40 +00:00
Dan Gohman	0656f5f845	[WebAssembly] Register the MC register info. llvm-svn: 257525	2016-01-12 21:27:55 +00:00
Michael Zuckerman	2ddcbcf464	[AVX512] adding PROLQ and PROLD Intrinsics Differential Revision: http://reviews.llvm.org/D16048 llvm-svn: 257523	2016-01-12 21:19:17 +00:00
Kyle Butt	cec40806f1	Codegen: [PPC] Handle weighted comparisons when inserting selects. Only non-weighted predicates were handled in PPCInstrInfo::insertSelect. Handle the weighted predicates as well. This latent bug was triggered by r255398, because it added use of the branch-weighted predicates. While here, switch over an enum instead of an int to get the compiler to enforce totality in the future. llvm-svn: 257518	2016-01-12 21:00:43 +00:00
Dan Gohman	4635017176	[WebAssembly] Add a EM_WEBASSEMBLY value, and several bits of code that use it. A request has been made to the official registry, but an official value is not yet available. This patch uses a temporary value in order to support development. When an official value is received, the value of EM_WEBASSEMBLY will be updated. llvm-svn: 257517	2016-01-12 20:56:01 +00:00
Dan Gohman	3469ee120c	[WebAssembly] Introduce a WebAssemblyTargetStreamer class. Refactor .param, .result, .local, and .endfunc, as directives, using the proper MCTargetStreamer mechanism, rather than fake instructions. llvm-svn: 257511	2016-01-12 20:30:51 +00:00
Krzysztof Parzyszek	f62d44be28	Replace inherited constructor with an explicit one Some bots failed when the inherited constructor was used. llvm-svn: 257508	2016-01-12 19:27:59 +00:00
Dan Gohman	1d68e80f26	[WebAssembly] Make CFG stackification independent of basic-block labels. This patch changes the way labels are referenced. Instead of referencing the basic-block label name (e.g. .LBB0_0), instructions now just have an immediate which indicates the depth in the control-flow stack to find a label to jump to. This makes them much closer to what we expect to have in the binary encoding, and avoids the problem of basic-block label names not being explicit in the binary encoding. Also, it terminates blocks and loops with end_block and end_loop instructions, rather than basic-block label names, for similar reasons. This will also fix problems where two constructs appear to have the same label, because we no longer explicitly use labels, so consumers that need labels will presumably create their own labels, and presumably they won't reuse labels when they do. This patch does make the code a little more awkward to read; as a partial mitigation, this patch also introduces comments showing where the labels are, and comments on each branch showing where it's branching to. llvm-svn: 257505	2016-01-12 19:14:46 +00:00
Krzysztof Parzyszek	1279881315	[Hexagon] Implement RDF-based post-RA optimizations - Handle simple cases of register copies (what current RDF CP allows). - Hexagon-specific dead code elimination: handles dead address updates in post-increment instructions. llvm-svn: 257504	2016-01-12 19:09:01 +00:00
Krzysztof Parzyszek	c09d630e50	RDF: Copy propagation This is a very limited implementation of DFG-based copy propagation. It only handles actual COPY instructions (does not handle other equivalents such as add-immediate with a 0 operand). The major limitation is that it does not update the DFG: that will be the change required to make it more robust (hopefully coming up soon). llvm-svn: 257490	2016-01-12 17:23:48 +00:00
Tom Stellard	f421837250	AMDGPU: Emit note directive for HSA even if there are no functions Reviewers: arsenm, echristo Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16010 llvm-svn: 257488	2016-01-12 17:18:17 +00:00
Krzysztof Parzyszek	6f4000e763	RDF: Dead code elimination Utility class to perform DFG-based dead code elimination. llvm-svn: 257485	2016-01-12 17:01:16 +00:00
Krzysztof Parzyszek	8dca45efa8	Fix compiler warnings from r257477 llvm-svn: 257483	2016-01-12 16:51:55 +00:00
Krzysztof Parzyszek	acdff46a9c	RDF: Implement register liveness analysis Compute block live-ins and operand kill flags from the DFG. llvm-svn: 257480	2016-01-12 15:56:33 +00:00
Daniel Sanders	5e1d5a789a	[mips] Correct operand order in DSP's mthi/mtlo Summary: The result register is the second operand as per the other mt* instructions. Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D15993 llvm-svn: 257478	2016-01-12 15:15:14 +00:00
Krzysztof Parzyszek	b5b5a1d7ad	Register Data Flow: data flow graph Target independent, SSA-based data flow framework for representing data flow between physical registers. This commit implements the creation of the actual data flow graph. llvm-svn: 257477	2016-01-12 15:09:49 +00:00
Benjamin Kramer	ab8cc02ba5	[Hexagon] Make helper function static. NFC. llvm-svn: 257476	2016-01-12 14:58:49 +00:00


36314 Commits