llvm-project

Commit Graph

Author	SHA1	Message	Date
Richard Trieu	e0129e474d	Call the correct overload. Call the correct overload so a string literal does not get converted to a bool. Also fix the test case to match the names given. llvm-svn: 249183	2015-10-02 20:52:14 +00:00
Dan Gohman	baba8c648b	[WebAssembly] Add a resize_memory intrinsic. llvm-svn: 249178	2015-10-02 20:10:26 +00:00
Michael Zolotukhin	d57f4b9011	[Tests] Add one more case to LoopUnroll/pr18861.ll for better coverage. llvm-svn: 249174	2015-10-02 19:21:52 +00:00
Michael Zolotukhin	8df4bddd16	[Tests] Give meaningful names to blocks in LoopUnroll/pr18861.ll, add a description of what's going on. llvm-svn: 249173	2015-10-02 19:21:49 +00:00
Michael Zolotukhin	47eef7a3c9	[Tests] Slightly reduce test LoopUnroll/pr18861.ll. llvm-svn: 249172	2015-10-02 19:21:43 +00:00
Dan Gohman	72f1692a2c	[WebAssembly] Add a memory_size intrinsic. llvm-svn: 249171	2015-10-02 19:21:15 +00:00
Sanjoy Das	7d910f2b11	[SCEV] Try to prove predicates by splitting them Summary: This change teaches SCEV that to prove `A u< B` it is sufficient to prove each of these facts individually: - B >= 0 - A s< B - A >= 0 In practice, SCEV sometimes finds it easier to prove these facts individually than to prove `A u< B` as one atomic step. Reviewers: reames, atrick, nlewycky, hfinkel Subscribers: sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D13042 llvm-svn: 249168	2015-10-02 18:50:30 +00:00
Roman Divacky	4b5507a037	Actually switch the arch when we see .arch. PR21695 llvm-svn: 249165	2015-10-02 18:25:25 +00:00
Tim Northover	8d67b8e053	ARM: diagnose invalid local fixups on Thumb1 We previously stopped producing Thumb2 relaxations when they weren't supported, but only diagnosed the case where an actual relocation was produced. We should also tell people if local symbols aren't going to work rather than silently overflowing. llvm-svn: 249164	2015-10-02 18:07:18 +00:00
Tim Northover	956b008db6	ARM: correctly align constant pool value on Thumb1 targets. Since we're using tLDRpci to access it, the constant pool's address must be 0 (mod 4). llvm-svn: 249163	2015-10-02 18:07:13 +00:00
Andrea Di Biagio	77f62652c1	Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types." This patch teaches FastIsel the following two things: 1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types; 2) On AVX, no instructions are needed for bitcasts between 256-bit vector types. Example: %1 = bitcast <4 x i32> %V to <2 x i64> Before (-fast-isel -fast-isel-abort=1): FastIsel miss: %1 = bitcast <4 x i32> %V to <2 x i64> Now we don't fall back to SelectionDAG and we correctly fold that computation propagating the register associated to %V. Originally reviewed here: http://reviews.llvm.org/D13347 llvm-svn: 249147	2015-10-02 16:08:05 +00:00
Andrea Di Biagio	45874e67a1	Revert: [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types. r249121 caused a Clang test failure (avx2-builtins.c). Revert r249121 while I keep investigating why that test failed. llvm-svn: 249124	2015-10-02 13:06:19 +00:00
Zoran Jovanovic	9ffdfa5986	[mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend Differential Revision: http://reviews.llvm.org/D13235 llvm-svn: 249123	2015-10-02 13:06:02 +00:00
Andrea Di Biagio	cb33456122	[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types. This patch teaches FastIsel the following two things: 1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types; 2) On AVX, no instructions are needed for bitcasts between 256-bit vector types. Example: %1 = bitcast <4 x i32> %V to <2 x i64> Before (-fast-isel -fast-isel-abort=1): FastIsel miss: %1 = bitcast <4 x i32> %V to <2 x i64> Now we don't fall back to SelectionDAG and we correctly fold that computation propagating the register associated to %V. Differential Revision: http://reviews.llvm.org/D13347 llvm-svn: 249121	2015-10-02 12:45:37 +00:00
Adrian Prantl	42562c38f5	dsymutil: Also ignore the ByteSize when building the DeclContext cache for clang modules. Forward decls of ObjC interfaces don't have a bytesize. llvm-svn: 249110	2015-10-02 00:27:08 +00:00
Bruno Cardoso Lopes	b491a2d641	[SimplifyLibCalls] Fix instruction misplacement in string/memory libcall optimization When trying to optimize fortified library functions use the right location to insert new instructions in order to preserve correct def-use order. This fixes an issue where a misplaced instruction definition would happen to be after one of its use after a RAUW, forming invalid IR. This behavior was introduced by r227250. Differential Revision: http://reviews.llvm.org/D13301 rdar://problem/22802369 llvm-svn: 249092	2015-10-01 22:43:53 +00:00
Colin LeMahieu	665c9be489	[Hexagon] XFAILing test while diagnosing backend error. llvm-svn: 249088	2015-10-01 22:14:05 +00:00
Joerg Sonnenberger	c8d50d6347	Fix relocation used for GOT references in non-PIC mode. Fix relocations for "set" pseudo op in PIC mode. Differential Revision: http://reviews.llvm.org/D13173 llvm-svn: 249086	2015-10-01 22:08:20 +00:00
Davide Italiano	f070688ecf	[PATCH] D13360: [llvm-objdump] Teach -d about AArch64 mapping symbols AArch64 uses $d* and $x* to interleave between text and data. llvm-objdump didn't know about this so it ended up printing garbage. This patch is a first step towards a solution of the problem. Differential Revision: http://reviews.llvm.org/D13360 llvm-svn: 249083	2015-10-01 21:57:09 +00:00
Reid Kleckner	fc64fae6e3	[WinEH] Emit __C_specific_handler tables for the new IR We emit denormalized tables, where every range of invokes in the same state gets a complete list of EH action entries. This is significantly simpler than trying to infer the correct nested scoping structure from the MI. Fortunately, for SEH, the nesting structure is really just a size optimization. With this, some basic __try / __except examples work. llvm-svn: 249078	2015-10-01 21:38:24 +00:00
Colin LeMahieu	f92c175bdd	[Hexagon] XFAILing test while diagnosing backend error. llvm-svn: 249075	2015-10-01 21:19:03 +00:00
Tom Stellard	e9f8b24985	AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering pass Summary: Instead of asserting when the kernel metadata is different than we expect, we should just skip lowering that function. This fixes assertion failures with OpenCL argument metadata from older LLVM releases. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13356 llvm-svn: 249073	2015-10-01 21:16:05 +00:00
David Majnemer	4600c06434	[WinEH] Stop BranchFolding from merging across funclets BranchFolding would merge two funclets together, this is not OK. Disable this and strengthen the assertion in FuncletLayout. llvm-svn: 249069	2015-10-01 21:04:13 +00:00
David Majnemer	f828a0ccc7	[WinEH] Make FuncletLayout more robust against catchret Catchret transfers control from a catch funclet to an earlier funclet. However, it is not completely clear which funclet the catchret target is part of. Make this clear by stapling the catchret target's funclet membership onto the CATCHRET SDAG node. llvm-svn: 249052	2015-10-01 18:44:59 +00:00
Jonas Paulsson	12629324a4	[SystemZ] Add some generic (floating point support) load instructions. Add generic instructions for load complement, load negative and load positive for fp32 and fp64, and let isel prefer them. They do not clobber CC, and so give scheduler more freedom. SystemZElimCompare pass will convert them when it can to the CC-setting variants. Regression tests updated to expect the new opcodes in places where the old ones were used. New test case SystemZ/fp-cmp-05.ll checks that SystemZCompareElim.cpp can handle the new opcodes. README.txt updated (bullet removed). Note that fp128 is not yet handled, because it is relatively rare, and is a bit trickier, because of the fact that l.dfr would operate on the sign bit of one of the subregisters of a fp128, but we would not want to copy the other sub-reg in case src and dst regs are not the same. Reviewed by Ulrich Weigand. llvm-svn: 249046	2015-10-01 18:12:28 +00:00
Rafael Espindola	e883514736	Fix printing of 64 bit values and make test more strict. llvm-svn: 249043	2015-10-01 17:57:31 +00:00
Tom Stellard	e0e582c9aa	AMDGPU: Add MEM_RAT STORE_TYPED. v2: Add test (Matt). Fix capitalization of isEOP (Matt). Move pattern to class parameter (Matt). Make the instruction available to Cayman (Matt). Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED. Patch by: Zoltan Gilian llvm-svn: 249042	2015-10-01 17:51:34 +00:00
NAKAMURA Takumi	1ed20db720	Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64" It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll llvm-svn: 249032	2015-10-01 17:00:56 +00:00
Arnaud A. de Grandmaison	849f3bf8c9	[InstCombine] Remove trivially empty lifetime start/end ranges. Summary: Some passes may open up opportunities for optimizations, leaving empty lifetime start/end ranges. For example, with the following code: void foo(char *, char *); void bar(int Size, bool flag) { for (int i = 0; i < Size; ++i) { char text[1]; char buff[1]; if (flag) foo(text, buff); // BBFoo } } the loop unswitch pass will create 2 versions of the loop, one with flag==true, and the other one with flag==false, but always leaving the BBFoo basic block, with lifetime ranges covering the scope of the for loop. Simplify CFG will then remove BBFoo in the case where flag==false, but will leave the lifetime markers. This patch teaches InstCombine to remove trivially empty lifetime marker ranges, that is ranges ending right after they were started (ignoring debug info or other lifetime markers in the range). This fixes PR24598: excessive compile time after r234581. Reviewers: reames, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13305 llvm-svn: 249018	2015-10-01 14:54:31 +00:00
Ulrich Weigand	cf1670a095	[SystemZ] Add assembly instructions for obtaining clock values as well as CPU features Provide assembler support for STCK, STCKF, STCKE, and STFLE. Author: joncmu Differential Revision: http://reviews.llvm.org/D13299 llvm-svn: 249015	2015-10-01 14:43:48 +00:00
Zoran Jovanovic	2960f3a346	[mips][microMIPS] Implement CACHEE, WRPGPR and WSBH instructions Differential Revision: http://reviews.llvm.org/D10337 llvm-svn: 249004	2015-10-01 12:49:27 +00:00
Scott Douglass	290183d734	[ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizer Differential Revision: http://reviews.llvm.org/D13240 llvm-svn: 249002	2015-10-01 11:56:19 +00:00
Jingyue Wu	df1a1b113b	[NaryReassociate] SeenExprs records WeakVH Summary: The instructions SeenExprs records may be deleted during rewriting. FindClosestMatchingDominator should ignore these deleted instructions. Fixes PR24301. Reviewers: grosser Subscribers: grosser, llvm-commits Differential Revision: http://reviews.llvm.org/D13315 llvm-svn: 248983	2015-10-01 03:51:44 +00:00
Dehao Chen	7c41dd6498	Update sample profile propagation algorithm. http://reviews.llvm.org/D13218 llvm-svn: 248968	2015-10-01 00:26:56 +00:00
Ahmed Bougacha	23a0d1a1d6	[X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math. The custom code produces incorrect results if later reassociated. Since r221657, on x86, vNi32 uitofp is lowered using an optimized sequence: movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...] pand %xmm0, %xmm1 por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...] psrld $16, %xmm0 por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...] addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...] addps %xmm1, %xmm0 Since r240361, the machine combiner opportunistically reassociates 2-instruction sequences (with -ffast-math). In the new code sequence, the ADDPS' are eligible. In isolation, for simple examples (without reassociable users), this makes no performance difference (the goal being to enable reassociation of longer chains). In the trivial example (just one uitofp), the reassociation doesn't happen, because (I think) it would require the emission of a separate movaps for a constantpool load (instead of folding it into addps). However, when we have multiple uitofp sequences, and the constantpool loads are CSE'd earlier, the machine combiner can do the reassociation. When the ADDPS' are reassociated, the resulting sequence isn't correct anymore, as we'd be adding large (2^39) constants with comparatively smaller values (~2^23). Given that two of the three inputs are powers of 2 larger than 2^16, and that ulp(2^39) == 2^(39-24) == 2^15, the reassociated chain will produce 0 for any input in [0, 2^14[. In my testing, it also produces wrong results for 99.5% of [0, 2^32[. Avoid this by disabling the new lowering when -ffast-math. It does mean that we'll get slower code than without it, but at least we won't get egregiously incorrect code. One might argue that, considering -ffast-math is all but meaningless, uitofp producing wrong results isn't a compiler bug. But it really is. Fixes PR24512. ...though this is really more of a workaround. Ideally, we'd have some sort of Machine FMF, but that's a problem that's not worth tackling until we do more with machine IR. llvm-svn: 248965	2015-10-01 00:11:07 +00:00
Reid Kleckner	6dec87a8a0	[WinEH] Emit int3 after noreturn calls on Win64 The Win64 unwinder disassembles forwards from each PC to try to determine if this PC is in an epilogue. If so, it skips calling the EH personality function for that frame. Typically, this means you cannot catch an exception in the same frame that you threw it, because 'throw' calls a noreturn runtime function. Previously we avoided this problem with the TrapUnreachable TargetOption, but that's a much bigger hammer than we need. All we need is a 1 byte non-epilogue instruction right after the call. Instead, what we got was an unconditional branch to a shared block containing the ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which added TrapUnreachable, and replaces it with something better. The new code pattern matches for invoke/call followed by unreachable and inserts an int3 into the DAG. To be 100% watertight, we would need to insert SEH_Epilogue instructions into all basic blocks ending in a call with no terminators or successors, but in practice this is unlikely to come up. llvm-svn: 248959	2015-09-30 23:09:23 +00:00
Sanjay Patel	a114a10bbe	[x86] enable machine combiner reassociations for 256-bit vector logical integer insts llvm-svn: 248955	2015-09-30 22:25:55 +00:00
Chad Rosier	4c5a4646bf	[AArch64] Remove an unnecessary run line and other cleanup. NFC. Unscaled load/store combining has been enabled since the initial ARM64 port. No need for a redundant run. Also, add CHECK-LABEL directives. llvm-svn: 248945	2015-09-30 21:10:02 +00:00
Michael Zolotukhin	fc783e91e0	[SLP] Don't vectorize loads of non-packed types (like i1, i2). Summary: Given an array of i2 elements, 4 consecutive scalar loads will be lowered to i8-sized loads and thus will access 4 consecutive bytes in memory. If we vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in memory. Hence, we should prohibit vectorization in such cases. PS: Initial patch was proposed by Arnold. Reviewers: aschwaighofer, nadav, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13277 llvm-svn: 248943	2015-09-30 21:05:43 +00:00
Evgeniy Stepanov	422a61306e	Move dw_op_minus test to DebugInfo/X86. The test requires X86 target support, and checks the actual debug info contents, including register numbers which would be different on other platforms. llvm-svn: 248938	2015-09-30 20:23:24 +00:00
Evgeniy Stepanov	f608111d1b	Fix debug info with SafeStack. llvm-svn: 248933	2015-09-30 19:55:43 +00:00
Chad Rosier	11c825f7db	[AArch64] Remove an unnecessary restriction on pre-index instructions. Previously, the index was constrained to the size of the memory operation for no apparent reason. This change removes that constraint so that we can form pre-index instructions with any valid offset. llvm-svn: 248931	2015-09-30 19:44:40 +00:00
Hal Finkel	4c45775880	[PowerPC] Disable shrink wrapping Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for now until the problem can be fixed. llvm-svn: 248924	2015-09-30 17:29:03 +00:00
Erik Eckstein	91c49810f2	SLPVectorizer: add a test to check if the minimum region size works. This is an addition to rL248917. llvm-svn: 248923	2015-09-30 17:28:19 +00:00
Artyom Skrobov	72ca6b8f3f	[ARM] Support for ARMv6-Z / ARMv6-ZK missing As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121 TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK. In particular, there were no tests for TrustZone being supported in these architectures. The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes his test case which hadn't really been testing what it was claiming to test. Differential Revision: http://reviews.llvm.org/D13236 llvm-svn: 248921	2015-09-30 17:25:52 +00:00
Erik Eckstein	848c1aa452	SLPVectorizer: limit the scheduling region size per basic block. Usually large blocks are not a problem. But if a large block (> 10k instructions) contains many (potential) chains of vector instructions, and those chains are spread over a wide range of instructions, then scheduling becomes a compile time problem. This change introduces a limit for the accumulate scheduling region size of a block. For real-world functions this limit will never be exceeded (it's about 10x larger than the maximum value seen in the test-suite and external test suite). llvm-svn: 248917	2015-09-30 17:00:44 +00:00
Andrea Di Biagio	0594e2a1e9	[InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant. This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a builtin shuffle if the mask is constant. Converting byte shuffle intrinsic calls to builtin shuffles can help finding more opportunities for combining shuffles later on in selection dag. We may end up with byte shuffles with constant masks as the result of inlining. Differential Revision: http://reviews.llvm.org/D13252 llvm-svn: 248913	2015-09-30 16:44:39 +00:00
Jeroen Ketema	ab99b59e8c	[ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234], vst[234]lane ARM neon intrinsics and associates an address space with the pointer that these intrinsics take. This changes, e.g., <2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32) to <2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32) This change ensures that address spaces are fully taken into account in the ARM target during lowering of interleaved loads and stores. Differential Revision: http://reviews.llvm.org/D12985 llvm-svn: 248887	2015-09-30 10:56:37 +00:00
Simon Pilgrim	3d11c994f7	[X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes. Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases. Differential Revision: http://reviews.llvm.org/D8690 llvm-svn: 248878	2015-09-30 08:17:50 +00:00
Dehao Chen	aae9e1f2bd	Add unittest for new sample profile format. http://reviews.llvm.org/D13145 llvm-svn: 248870	2015-09-30 01:05:37 +00:00
Dehao Chen	6722688eaa	http://reviews.llvm.org/D13145 Support hierarchical sample profile format. llvm-svn: 248865	2015-09-30 00:42:46 +00:00
Evgeniy Stepanov	d3f544f271	[safestack] Fix a stupid mix-up in the direct-tls code path. llvm-svn: 248863	2015-09-30 00:01:47 +00:00
Reid Kleckner	a13dfd539b	[WinEH] Setup RBP correctly in Win64 funclet prologues Previously local variable captures just didn't work in 64-bit. Now we can access local variables more or less correctly. llvm-svn: 248857	2015-09-29 23:32:01 +00:00
David Majnemer	91b0ab9172	[WinEH] Ensure that funclets obey the x64 ABI The x64 ABI requires that epilogues do not contain code other than stack adjustments and some limited control flow. However, we'd insert code to initialize the return address after stack adjustments. Instead, insert EAX/RAX with the current value before we create the stack adjustments in the epilogue. llvm-svn: 248839	2015-09-29 22:33:36 +00:00
Maksim Panchenko	cce239c45d	HHVM calling conventions. HHVM calling convention, hhvmcc, is used by HHVM JIT for functions in translated cache. We currently support LLVM back end to generate code for X86-64 and may support other architectures in the future. In HHVM calling convention any GP register could be used to pass and return values, with the exception of R12 which is reserved for thread-local area and is callee-saved. Other than R12, we always pass RBX and RBP as args, which are our virtual machine's stack pointer and frame pointer respectively. When we enter translation cache via hhvmcc function, we expect the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed to standard ABI alignment. This affects stack object alignment and stack adjustments for function calls. One extra calling convention, hhvm_ccc, is used to call C++ helpers from HHVM's translation cache. It is almost identical to standard C calling convention with an exception of first argument which is passed in RBP (before we use RDI, RSI, etc.) Differential Revision: http://reviews.llvm.org/D12681 llvm-svn: 248832	2015-09-29 22:09:16 +00:00
Chad Rosier	1769d8505f	Fix test from r248825. llvm-svn: 248827	2015-09-29 20:50:15 +00:00
Chad Rosier	4315012769	[AArch64] Add support for pre- and post-index LDPSWs. llvm-svn: 248825	2015-09-29 20:39:55 +00:00
David Majnemer	a80c151286	[WinEH] Teach AsmPrinter about funclets Summary: Funclets have been turned into functions by the time they hit the object file. Make sure that they have decent names for the symbol table and CFI directives explaining how to reason about their prologues. Differential Revision: http://reviews.llvm.org/D13261 llvm-svn: 248824	2015-09-29 20:12:33 +00:00
Zachary Turner	4dddcc64d3	[llvm-pdbdump] Add include-only filters. PDB files have a lot of noise in them, with hundreds (or thousands) of symbols from system libraries and compiler generated types. If you're only looking for a specific type, this can be problematic. This CL allows you to display only types, variables, or compilands matching a particular pattern. These filters can even be combined with exclude filters. Include-only filters are given priority, so that first the set of items to display is limited only to those that match the include filters, and then the set of exclude filters is applied to those. If there are no include filters specified, then it means "display everything". llvm-svn: 248822	2015-09-29 19:49:06 +00:00
Chad Rosier	dabe2534ed	[AArch64] Add integer pre- and post-index halfword/byte loads and stores. llvm-svn: 248817	2015-09-29 18:26:15 +00:00
Dehao Chen	028e122ca9	Revert r248810 which breaks tests. llvm-svn: 248814	2015-09-29 18:18:49 +00:00
Dehao Chen	410a25aa7a	http://reviews.llvm.org/D13231 Change lookup functions to const functions. llvm-svn: 248810	2015-09-29 17:59:58 +00:00
James Molloy	897048bee3	[ValueTracking] Teach isKnownNonZero about monotonically increasing PHIs If a PHI starts at a non-negative constant, monotonically increases (only adds of a constant are supported at the moment) and that add does not wrap, then the PHI is known never to be zero. llvm-svn: 248796	2015-09-29 14:08:45 +00:00
Jeroen Ketema	740f9d79ca	Arguments spilled on the stack before a function call may have alignment requirements, for example in the case of vectors. These requirements are exploited by the code generator by using move instructions that have similar alignment requirements, e.g., movaps on x86. Although the code generator properly aligns the arguments with respect to the displacement of the stack pointer it computes, the displacement itself may cause misalignment. For example if we have %3 = load <16 x float>, <16 x float>* %1, align 64 call void @bar(<16 x float> %3, i32 0) the x86 back-end emits: movaps 32(%ecx), %xmm2 movaps (%ecx), %xmm0 movaps 16(%ecx), %xmm1 movaps 48(%ecx), %xmm3 subl $20, %esp <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards movaps %xmm3, (%esp) <-- movaps requires 16-byte alignment, while %esp is not aligned as such. movl $0, 16(%esp) calll __bar To solve this, we need to make sure that the computed value with which the stack pointer is changed is a multiple of the maximal alignment seen during its computation. With this change we get proper alignment: subl $32, %esp movaps %xmm3, (%esp) Differential Revision: http://reviews.llvm.org/D12337 llvm-svn: 248786	2015-09-29 10:12:57 +00:00
Simon Pilgrim	43f5e0848e	[InstCombine] Improve Vector Demanded Bits Through Bitcasts Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements. This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector). This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts. Differential Revision: http://reviews.llvm.org/D12935 llvm-svn: 248784	2015-09-29 08:19:11 +00:00
Dan Gohman	868e1c08d9	[WebAssembly] Rename test files to match platform naming conventions. llvm-svn: 248783	2015-09-29 08:13:58 +00:00
Chen Li	9f27fc0599	[LoopUnswitch] Add block frequency analysis to recognize hot/cold regions Summary: This patch adds block frequency analysis to LoopUnswitch pass to recognize hot/cold regions. For cold regions the pass only performs trivial unswitches since they do not increase code size, and for hot regions everything works as before. This helps to minimize code growth in cold regions and be more aggressive in hot regions. Currently the default cold regions are blocks with frequencies below 20% of function entry frequency, and it can be adjusted via -loop-unswitch-cold-block-frequency flag. The entire feature is controlled via -loop-unswitch-with-block-frequency flag and it is off by default. Reviewers: broune, silvas, dnovillo, reames Subscribers: davidxl, llvm-commits Differential Revision: http://reviews.llvm.org/D11605 llvm-svn: 248777	2015-09-29 05:03:32 +00:00
Evgeniy Stepanov	d8b86f7cdc	Move dbg.declare intrinsics when merging and replacing allocas. Place new and update dbg.declare calls immediately after the corresponding alloca. Current code in replaceDbgDeclareForAlloca puts the new dbg.declare at the end of the basic block. LLVM codegen has problems emitting debug info in a situation when dbg.declare appears after all uses of the variable. This usually kinda works for inlining and ASan (two users of this function) but not for SafeStack (see the pending change in http://reviews.llvm.org/D13178). llvm-svn: 248769	2015-09-29 00:30:19 +00:00
Reid Kleckner	c71d6275ca	[WinEH] Fix ip2state table emission with funclets Previously we were hijacking the old LandingPadInfo data structures to communicate our state numbers. Now we don't need that anymore. llvm-svn: 248763	2015-09-28 23:56:30 +00:00
Sanjoy Das	4f1c45952c	[SCEV] Don't crash on pointer comparisons `ScalarEvolution::isImpliedCondOperandsViaNoOverflow` tries to cast the operand type of the comparison it is given to an `IntegerType`. This is incorrect because it could actually be simplifying a comparison between two pointers. Switch it to using `getTypeSizeInBits` instead, which does the right thing for both pointers and integers. Fixed PR24956. llvm-svn: 248743	2015-09-28 21:14:32 +00:00
Matt Arsenault	73aa8f687a	AMDGPU: Fix splitting x16 SMRD loads When used recursively, this would set the kill flag on the intermediate step from first splitting x16 to x8. llvm-svn: 248741	2015-09-28 20:54:52 +00:00
Matt Arsenault	e5d042cd56	AMDGPU: Fix moving SMRD loads with literal offsets on CI llvm-svn: 248740	2015-09-28 20:54:46 +00:00
Matt Arsenault	b378f075a2	AMDGPU: Add testcases Make sure we are testing moving users of the moved and split SMRD loads. llvm-svn: 248738	2015-09-28 20:54:38 +00:00
Matt Arsenault	f3c91f573f	AMDGPU: Cleanup test Run instnamer on it, and rename check prefix. This is in preparation for adding new testcases to cover bugs on other subtargets. llvm-svn: 248737	2015-09-28 20:54:32 +00:00
Sean Silva	ace7818ce6	[GlobalOpt] Sort members of llvm.used deterministically Patch by Jake VanAdrighem! Summary: Fix the way we sort the llvm.used and llvm.compiler.used members. This bug seems to have been introduced in rL183756 through a set of improper casts to GlobalValue*. In subsequent patches this problem was missed and transformed into a getName call on a ConstantExpr. Reviewers: silvas Subscribers: silvas, llvm-commits Differential Revision: http://reviews.llvm.org/D12851 llvm-svn: 248728	2015-09-28 19:02:11 +00:00
Artur Pilipenko	b4d009042b	Introduce !align metadata for load instruction Reviewed By: hfinkel Differential Revision: http://reviews.llvm.org/D12853 llvm-svn: 248721	2015-09-28 17:41:08 +00:00
Philip Reames	13f023c09d	[InstSimplify] Fold simple known implications to true This was split off of http://reviews.llvm.org/D13040 to make it easier to test the correctness of the implication logic. For the moment, this only handles a single easy case which shows up when eliminating and combining range checks. In the (near) future, I plan to extend this for other cases which show up in range checks, but I wanted to make those changes incrementally once the framework was in place. At the moment, the implication logic will be used by three places. One in InstSimplify (this review) and two in SimplifyCFG (http://reviews.llvm.org/D13040 & http://reviews.llvm.org/D13070). Can anyone think of other locations this style of reasoning would make sense? Differential Revision: http://reviews.llvm.org/D13074 llvm-svn: 248719	2015-09-28 17:14:24 +00:00
Weiming Zhao	310770a90f	[LoopReroll] Ignore debug intrinsics Previously, debug intrinsics and annotation intrinsics could prevent the loop from being rerolled; now they are ignored. Differential Revision: http://reviews.llvm.org/D13150 llvm-svn: 248718	2015-09-28 17:03:23 +00:00
Dan Gohman	05a17aa82a	[WebAssembly] Support for direct call and call_indirect. llvm-svn: 248716	2015-09-28 16:22:39 +00:00
Zoran Jovanovic	cdb64566cc	[mips] Handling of immediates bigger than 16 bits Differential Revision: http://reviews.llvm.org/D10539 llvm-svn: 248706	2015-09-28 11:11:34 +00:00
Hal Finkel	bd582581b8	[DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking When AA is being used, non-aliasing stores are canonicalized to use the same chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of this by looking only at users of a store's chain operand. However, user iteration is not result-number specific, so we need to check that the use is as a chain operand, and not via some other operand. It is certainly possible to have another potentially-aliasing store, which shares the first's base pointer, and uses the first's chain's node via some other operand. Failure to catch this situation caused, at least in the included test case, an assert later because the relative sequence-number ordering caused later replacement to create a cycle in the DAG. llvm-svn: 248698	2015-09-28 08:02:14 +00:00
Sanjoy Das	f1090b6061	[SCEV] identical instructions don't compute equal values Before this change `HasSameValue` would return true for distinct `alloca` instructions if they happened to be allocating the same type (`alloca` instructions are not specified as reading memory). This change adds an explicit whitelist of instruction types for which "identical" instructions compute the same value. Fixes PR24952. llvm-svn: 248690	2015-09-27 21:09:48 +00:00
Sanjay Patel	9533407566	[InstCombine] fold zexts and constants into a phi (PR24766) This is one step towards solving PR24766: https://llvm.org/bugs/show_bug.cgi?id=24766 We were not producing the same IR for these two C functions because the store to the temp bool causes extra zexts: #include <stdbool.h> bool switchy(char x1, char x2, char condition) { bool conditionMet = false; switch (condition) { case 0: conditionMet = (x1 == x2); break; case 1: conditionMet = (x1 <= x2); break; } return conditionMet; } bool switchy2(char x1, char x2, char condition) { switch (condition) { case 0: return (x1 == x2); case 1: return (x1 <= x2); } return false; } As noted in the code comments, this test case manages to avoid the more general existing phi optimizations where there are only 2 phi inputs or where there are no constant phi args mixed in with the casts ops. It seems like a corner case, but if we don't catch it, then I don't think we can get SimplifyCFG to further optimize towards the canonical form for this function shown in the bug report. Differential Revision: http://reviews.llvm.org/D12866 llvm-svn: 248689	2015-09-27 20:34:31 +00:00
Joseph Tremoulet	09af67aba5	[EH] Create removeUnwindEdge utility Summary: Factor the code that rewrites invokes to calls and rewrites WinEH terminators to their "unwind to caller" equivalents into a helper in Utils/Local, and use it in the three places I'm aware of that need to do this. Reviewers: andrew.w.kaylor, majnemer, rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13152 llvm-svn: 248677	2015-09-27 01:47:46 +00:00
Simon Pilgrim	91717ee233	[InstCombine] Removed unnecessary meta attributes. llvm-svn: 248672	2015-09-26 17:49:04 +00:00
Chen Li	7452d95656	[Bug 24848] Use range metadata to constant fold comparisons between two values Summary: This is the second part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848. If both operands of a comparison have range metadata, they should be used to constant fold the comparison. Reviewers: sanjoy, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13177 llvm-svn: 248650	2015-09-26 03:26:47 +00:00
Matt Arsenault	86095b8dec	AMDGPU: Fix sched model for VOP2b instructions Trying to use the version with the explicit output operand would complain because of the missing WriteSALU. I'm not sure why it doesn't complain about this with the implicit VCC def. llvm-svn: 248646	2015-09-26 02:25:45 +00:00
Dan Gohman	d0bf981296	[WebAssembly] Rename several functions and types according to the new spec. llvm-svn: 248644	2015-09-26 01:09:44 +00:00
Ahmed Bougacha	e81610fabb	[ARM] Don't generate clrex for pre-v7 targets. Since r248294, we emit clrex, but it doesn't exist on v6. llvm-svn: 248640	2015-09-26 00:14:02 +00:00
Sanjoy Das	b174f9a316	[SCEV] Reapply 'Teach isLoopBackedgeGuardedByCond to exploit trip counts' Summary: If the trip count of a specific backedge is `N`, then we know that backedge is effectively guarded by the condition `{0,+,1} u< N`. This change teaches SCEV to use this condition to prove things in `isLoopBackedgeGuardedByCond`. Depends on D12948 Depends on D12949 The original checkin, r248608, had to be backed out due to an issue with an ObjCXX unit test. That issue is now fixed, so re-landing. Reviewers: atrick, reames, majnemer, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12950 llvm-svn: 248638	2015-09-25 23:53:50 +00:00
Sanjoy Das	96709c4854	[SCEV] Reapply 'Exploit A < B => (A+K) < (B+K) when possible' Summary: This change teaches SCEV's `isImpliedCond` two new identities: A u< B u< -C => (A + C) u< (B + C) A s< B s< INT_MIN - C => (A + C) s< (B + C) While these are useful on their own, they're really intended to support D12950. The original checkin, r248606, had to be backed out due to an issue with an ObjCXX unit test. That issue is now fixed, so re-landing. Reviewers: atrick, reames, majnemer, nlewycky, hfinkel Subscribers: aadg, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D12948 llvm-svn: 248637	2015-09-25 23:53:45 +00:00
Sanjay Patel	e1b09caaaf	[InstCombine] match De Morgan's Law hidden by zext ops (PR22723) This is a fix for PR22723: https://llvm.org/bugs/show_bug.cgi?id=22723 My first attempt at this was to change what I thought was the root problem: xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32 ...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop! My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would mean potentially returning a different size for the match than what was input. I think this would require all users of m_Not to check the size of the returned match, so I abandoned that idea. I settled on just fixing the exact case presented in the PR. This patch does allow the 2 functions in PR22723 to compile identically (x86): bool test(bool x, bool y) { return !x | !y; } bool test(bool x, bool y) { return !x || !y; } ... andb %sil, %dil xorb $1, %dil movb %dil, %al retq Differential Revision: http://reviews.llvm.org/D12705 llvm-svn: 248634	2015-09-25 23:21:38 +00:00
Cong Hou	15ea016346	Use fixed-point representation for BranchProbability. BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change: Pros: 1. It uses only half the space of the current one. 2. Some operations are much faster, like addition, subtraction, comparison, and scaling by an integer. Cons: 1. Constructing a probability using an arbitrary numerator and denominator needs additional calculations. 2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 2/3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio. One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability takes more space, which may be a concern. Differential revision: http://reviews.llvm.org/D12603 llvm-svn: 248633	2015-09-25 23:09:59 +00:00
Matthias Braun	a3b701f828	SelectionDAGDumper: Print simple operands inline. Print simple operands inline instead of their pointer/value number. Simple operands are SDNodes without predecessors like Constant(FP), Register, UNDEF. This unifies the behaviour with dumpr() which was already doing this. Previously: t0: ch = EntryToken t1: i64 = Register %vreg0 t2: i64,ch = CopyFromReg t0, t1 t3: i64 = Constant<1> t4: i64 = add t2, t3 t5: i64 = Constant<2> t6: i64 = add t2, t5 t10: i64 = undef t11: i8,ch = load t0, t2, t10<LD1[%tmp81]> t12: i8,ch = load t0, t4, t10<LD1[%tmp10]> t13: i8,ch = load t0, t6, t10<LD1[%tmp12]> Now: t0: ch = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0 t4: i64 = add t2, Constant:i64<1> t6: i64 = add t2, Constant:i64<2> t11: i8,ch = load<LD1[%tmp81]> t0, t2, undef:i64 t12: i8,ch = load<LD1[%tmp10]> t0, t4, undef:i64 t13: i8,ch = load<LD1[%tmp12]> t0, t6, undef:i64 Differential Revision: http://reviews.llvm.org/D12567 llvm-svn: 248628	2015-09-25 22:27:02 +00:00
Sanjay Patel	bbbf9a1a34	merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711) This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ). The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned accesses up in performSTORECombine() because they are slow. This patch attempts to fix the problem in AArch64's allowsMisalignedMemoryAccesses() while preserving existing (perhaps questionable) lowering behavior. The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned stores. Differential Revision: http://reviews.llvm.org/D12635 llvm-svn: 248622	2015-09-25 21:49:48 +00:00
Matthias Braun	e86bbd8979	PrologueEpilogInserter: Fix missing live-ins when savepoint equals restorepoint The algorithm would not modify the live-in list of blocks below the save block point which is correct unless it happens to be a restore point at the same time. Also fixes the benign issue of live-in registers being added twice in some cases. The testcase is based on a test submitted by Kit Barton. Differential Revision: http://reviews.llvm.org/D13176 llvm-svn: 248620	2015-09-25 21:41:40 +00:00
Tom Stellard	e135ffd554	AMDGPU/SI: Use .hsatext section instead of .text for HSA Reviewers: arsenm, grosbach, rafael Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12424 llvm-svn: 248619	2015-09-25 21:41:28 +00:00
Sanjoy Das	4a39b97671	Revert two SCEV changes that caused test failures in clang. r248606: "[SCEV] Exploit A < B => (A+K) < (B+K) when possible" r248608: "[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts." llvm-svn: 248614	2015-09-25 21:16:50 +00:00
Matt Arsenault	10aa807856	PeepholeOptimizer: Remove redundant copies If a virtual register is copied and another copy was already seen, replace with the previous copy. This only handles the simplest cases for now. This pattern shows up from various operand restrictions AMDGPU has which require inserting copies depending on the register class of the operands. llvm-svn: 248611	2015-09-25 20:22:12 +00:00
Sanjoy Das	d706fa8a0c	[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts. Summary: If the trip count of a specific backedge is `N`, then we know that backedge is effectively guarded by the condition `{0,+,1} u< N`. This change teaches SCEV to use this condition to prove things in `isLoopBackedgeGuardedByCond`. Depends on D12948 Depends on D12949 Reviewers: atrick, reames, majnemer, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12950 llvm-svn: 248608	2015-09-25 19:59:57 +00:00
Sanjoy Das	fdec9deb13	[SCEV] Exploit A < B => (A+K) < (B+K) when possible Summary: This change teaches SCEV's `isImpliedCond` two new identities: A u< B u< -C => (A + C) u< (B + C) A s< B s< INT_MIN - C => (A + C) s< (B + C) While these are useful on their own, they're really intended to support D12950. Reviewers: atrick, reames, majnemer, nlewycky, hfinkel Subscribers: aadg, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D12948 llvm-svn: 248606	2015-09-25 19:59:49 +00:00
Matt Arsenault	28bd7d4afe	AMDGPU: Add some more tests for literal operands llvm-svn: 248600	2015-09-25 18:21:47 +00:00
Chad Rosier	1bbd7fb38e	[AArch64] Add support for generating pre- and post-index load/store pairs. llvm-svn: 248593	2015-09-25 17:48:17 +00:00
Matt Arsenault	4bf43d4e68	AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG This fixes a select error when the i64 source was also bitcasted to v2i32 in the original source. Instead of awkwardly trying to select the modified source value and the store, replace before isel begins. Uses a worklist to avoid possible problems from mutating the DAG, although it seems to work OK without it. llvm-svn: 248589	2015-09-25 17:27:08 +00:00
Matt Arsenault	5f70436c49	AMDGPU: Improve accuracy of instruction rates for VOPC These were all using the default 32-bit VALU write class, but the i64/f64 compares are half rate. I'm not sure this is really correct, because they are still using the write to VALU write class, even though they really write to the SALU. llvm-svn: 248582	2015-09-25 16:58:25 +00:00
James Molloy	eb46641c28	[GlobalsAA] Teach GlobalsAA about nocapture Arguments to function calls marked "nocapture" can be marked as non-escaping. However, nocapture is defined in terms of the lifetime of the callee, and if the callee can directly or indirectly recurse to the caller, the semantics of nocapture are invalid. Therefore, we eagerly discover which SCC each function belongs to, and later can check if callee and caller of a callsite belong to the same SCC, in which case there could be recursion. This means that we can't be so optimistic in getModRefInfo(ImmutableCallsite) - previously we assumed all call arguments never aliased with an escaping global. Now we need to check, because a global could now be passed as an argument but still not escape. This also solves a related conformance problem: MemCpyOptimizer can turn non-escaping stores of globals into calls to intrinsics like llvm.memcpy/llvm.memset. This confuses GlobalsAA, which knows the global can't escape and so returns NoModRef when queried, when obviously a memcpy/memset call does indeed reference and modify its arguments. This fixes PR24800, PR24801, and PR24802. llvm-svn: 248576	2015-09-25 15:39:29 +00:00
Saleem Abdulrasool	fe83b50289	ARM: address WoA division limitation We now emit the compiler generated divide by zero check that was needed for the MSVC routines. We construct a pseudo-instruction for the DBZ check as the operation requires splitting up the BB. For the 64-bit operations, we need to custom expand the node as we need to insert the DBZ check and then emit the libcall to the appropriate name. Because this is target specific, it seemed better to reproduce the expansion operation from the target-agnostic type legalization rather than sink this there to avoid the duplication. The division library calls now match MSVC semantically. llvm-svn: 248561	2015-09-25 05:15:46 +00:00
Sanjoy Das	b513a9fa4f	[Bitcode][Asm] Teach LLVM to read and write operand bundles. Summary: This also adds the first set of tests for operand bundles. The optimizer has not been audited to ensure that it does the right thing with operand bundles. Depends on D12456. Reviewers: reames, chandlerc, majnemer, dexonsmith, kmod, JosephTremoulet, rnk, bogner Subscribers: maksfb, llvm-commits Differential Revision: http://reviews.llvm.org/D12457 llvm-svn: 248551	2015-09-24 23:34:52 +00:00
Ed Maste	f021808d60	Restore test coverage for other than ELFOSABI_NONE Add a FreeBSD test to restore testing of ELF OSABI other than ELFOSABI_NONE after r248534. Differential Revision: http://reviews.llvm.org/D13146 llvm-svn: 248550	2015-09-24 23:01:16 +00:00
Simon Pilgrim	68d0050c6a	[X86][SSE2] Fix zero/any extension shuffles that don't start from the first element Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H llvm-svn: 248536	2015-09-24 21:02:17 +00:00
Rafael Espindola	4405d5d889	Use ELFOSABI_NONE instead of ELFOSABI_LINUX. There doesn't seem to be a difference and ELFOSABI_NONE seems to be far more common: * Linux doesn't care when loading and puts ELFOSABI_NONE on core dumps. * Gold and bfd ld produce files with ELFOSABI_NONE. * Gold and bfd ld seem to ignore EI_OSABI other than for freebsd. * Gas puts ELFOSABI_NONE in most .o files. llvm-svn: 248534	2015-09-24 20:57:24 +00:00
Matt Arsenault	e66621b306	AMDGPU: Add s_dcache_* instructions llvm-svn: 248533	2015-09-24 19:52:27 +00:00
Matt Arsenault	d6adfb401c	AMDGPU: Add cache invalidation instructions. These are necessary for implementing mem_fence for OpenCL 2.0. The VI assembler tests are disabled since it seems to be using the wrong encoding or opcode. llvm-svn: 248532	2015-09-24 19:52:21 +00:00
Matt Arsenault	c116767fec	AMDGPU: Run mubuf assembler test for CI llvm-svn: 248531	2015-09-24 19:52:15 +00:00
Adrian Prantl	f3e634b8fb	dsymutil: Fix the condition to distinguish module imports from definitions. llvm-svn: 248512	2015-09-24 16:10:14 +00:00
James Molloy	b6be1ebb7d	[ValueTracking] Teach isKnownNonZero a new trick If the shifter operand is a constant, and all of the bits shifted out are known to be zero, then if X is known non-zero at least one non-zero bit must remain. llvm-svn: 248508	2015-09-24 16:06:32 +00:00
Mohammad Shahid	d0203cbf1c	Regression Test: Deletes redundant/invalid test. Removes absdiff_expand.ll regression test file which is invalid. Differential Revision: http://reviews.llvm.org/D11678 llvm-svn: 248493	2015-09-24 14:37:25 +00:00
Mohammad Shahid	13f1dfdf2e	Codegen: Fix llvm.absdiff semantic. Fixes the overflow case of the llvm.absdiff intrinsic; also updates the tests and LangRef.rst accordingly. Differential Revision: http://reviews.llvm.org/D11678 llvm-svn: 248483	2015-09-24 10:35:03 +00:00
Charlie Turner	2720593ab4	[InstCombine] Recognize another bswap idiom. Summary: The byte-swap recognizer can now notice that this ``` uint32_t bswap(uint32_t x) { x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16; x = (x & 0x00FF00FF) << 8 | (x & 0xFF00FF00) >> 8; return x; } ``` is a bswap. Fixes PR23863. Reviewers: nlewycky, hfinkel, hans, jmolloy, rengolin Subscribers: majnemer, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D12637 llvm-svn: 248482	2015-09-24 10:24:58 +00:00
Matt Arsenault	68d938649e	Introduce target hook for optimizing register copies Allow a target to do something other than search for copies that will avoid cross register bank copies. Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract. I'm not entirely satisfied with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets. I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making the default implementation to check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses. The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation, this just relaxes them to not check for specific registers. llvm-svn: 248478	2015-09-24 08:36:14 +00:00
Matt Arsenault	cab64f1c75	AMDGPU: Fix printing trailing whitespace for mubuf atomics llvm-svn: 248472	2015-09-24 07:51:17 +00:00
Matt Arsenault	c721df0478	Use new TokenFactor chain when merging stores If the stores are storing values from loads which partially alias the stores, we could end up placing the merged loads and stores on the same chain which has the potential to break. Each store may have a different chain dependency on only some of the original loads. Create a new TokenFactor to capture all of the required dependencies of the stores rather than assuming all stores can use the same chain. The testcase is a situation where this happens, although it does not have an observable change from this. The DAG nodes just happened to not be reordered before despite this missing chain dependency. This is based on an off-list report for an out of tree target which regressed due to r246307 and I haven't managed to find a case where the nodes do end up reordered with an in tree target. llvm-svn: 248468	2015-09-24 07:22:38 +00:00
Matt Arsenault	c8e2ce4046	AMDGPU: Reduce number of copies emitted Instead of always inserting a copy in case the super register is itself a subregister, only extract to the super reg class if this is actually the case. This shouldn't really change codegen, but makes looking at the output of SIFixSGPRCopies easier to read. llvm-svn: 248467	2015-09-24 07:16:37 +00:00
Evgeniy Stepanov	8685daf23e	[safestack] Fix compiler crash in the presence of stack restores. A use can be emitted before def in a function with stack restore points but no static allocas. llvm-svn: 248455	2015-09-24 01:23:51 +00:00
Adrian Prantl	3236c9ce4a	Add REQUIRES: default_triple to these testcases. llvm-svn: 248452	2015-09-24 00:35:14 +00:00
Wei Mi	3cc9204a52	Put profile variables of COMDAT functions to its own COMDAT group. In -fprofile-instr-generate compilation, to remove the redundant profile variables for the COMDAT functions, these variables are placed in the same COMDAT group as its associated function. This way when the COMDAT function is not picked by the linker, those profile variables will also not be output in the final binary. This may cause a warning when mix-linking objects built with and without -fprofile-instr-generate. This patch puts the profile variables for COMDAT functions to its own COMDAT group to avoid the problem. Patch by xur. Differential Revision: http://reviews.llvm.org/D12248 llvm-svn: 248440	2015-09-23 22:40:45 +00:00
Sanjay Patel	13e8bbc237	set div/rem default values to 'expensive' in TargetTransformInfo's cost model ...because that's what the cost model was intended to do. As discussed in D12882, this fix has a temporary unintended consequence for SimplifyCFG: it causes us to not speculate an fdiv. However, two wrongs make PR24818 right, and two wrongs make PR24343 act right even though it's really still wrong. I intend to correct SimplifyCFG and add to CodeGenPrepare to account for this cost model change and preserve the righteousness for the bug report cases. https://llvm.org/bugs/show_bug.cgi?id=24818 https://llvm.org/bugs/show_bug.cgi?id=24343 Differential Revision: http://reviews.llvm.org/D12882 llvm-svn: 248439	2015-09-23 22:28:18 +00:00
Tim Northover	beb5bccf88	ARM: fix folding stack adjustment (again again again...) This time, the issue is that we weren't accounting for the possibility that aligned DPRs could have been stored after the final "push" in a prologue. When that happened we effectively moved a "sub sp, #N" from below the aligned stores to above them, and everything went to pot. To make it worse, I'd actually committed something testing that we produced wrong code, so the test update is tiny. llvm-svn: 248437	2015-09-23 22:21:09 +00:00
Adrian Prantl	ea8a724474	dsymutil: Don't prune forward declarations inside a module definition. llvm-svn: 248428	2015-09-23 20:44:37 +00:00
Adrian Prantl	209c424d1e	Fix this dsymutil testcase by not passing in a path to the modulemap file, so the lookup works as expected after prepending the oso-prepend-path. This manifested only on Windows, because "/" is not a relative path there. llvm-svn: 248423	2015-09-23 19:53:10 +00:00
Philip Reames	d63df5107e	Remove handling of AddrSpaceCast in stripAndAccumulateInBoundsConstantOffsets Patch by: simoncook Unlike BitCasts, AddrSpaceCasts do not always produce an output the same size as its input, which was previously assumed. This fixes cases where two address spaces do not have the same size pointer, as an assertion failure would occur when trying to prove dereferenceability. LoopUnswitch is used in the particular test, but LICM also exhibits the same problem. Differential Revision: http://reviews.llvm.org/D13008 llvm-svn: 248422	2015-09-23 19:48:43 +00:00
Lawrence Hu	cac0b89289	Swap loop invariant GEP with loop variant GEP to allow more LICM. This patch changes the order of GEPs generated by Splitting GEPs pass, specially when one of the GEPs has constant and the base is loop invariant, then we will generate the GEP with constant first when beneficial, to expose more cases for LICM. If Splitting GEPs originally generated the following: do.body.i: %idxprom.i = sext i32 %shr.i to i64 %2 = bitcast %typeD* %s to i8* %3 = shl i64 %idxprom.i, 2 %uglygep = getelementptr i8, i8* %2, i64 %3 %uglygep7 = getelementptr i8, i8* %uglygep, i64 1032 ... Now it generates: do.body.i: %idxprom.i = sext i32 %shr.i to i64 %2 = bitcast %typeD* %s to i8* %3 = shl i64 %idxprom.i, 2 %uglygep = getelementptr i8, i8* %2, i64 1032 %uglygep7 = getelementptr i8, i8* %uglygep, i64 %3 ... For no-loop cases, the original way of generating GEPs seems to expose more CSE cases, so we don't change the logic for no-loop cases, and only limit our change to the specific case we are interested in. llvm-svn: 248420	2015-09-23 19:25:30 +00:00
Akira Hatanaka	f6afd11538	[InstCombine] Preserve metadata when merging loads that are phi arguments. Make sure InstCombiner::FoldPHIArgLoadIntoPHI doesn't drop the following metadata: MD_tbaa MD_alias_scope MD_noalias MD_invariant_load MD_nonnull MD_range rdar://problem/17617709 Differential Revision: http://reviews.llvm.org/D12710 llvm-svn: 248419	2015-09-23 18:40:57 +00:00
Sanjay Patel	1a6534661b	[x86] replace integer 'xor' ops with packed SSE FP 'xor' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx xorl %eax, %ecx movd %ecx, %xmm0 into this: xorps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 This is an extension of: http://reviews.llvm.org/rL248395 llvm-svn: 248415	2015-09-23 18:33:42 +00:00
Sanjay Patel	aba37553c4	[x86] replace integer 'or' ops with packed SSE FP 'or' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx orl %eax, %ecx movd %ecx, %xmm0 into this: orps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 This is an extension of: http://reviews.llvm.org/rL248395 llvm-svn: 248409	2015-09-23 18:19:07 +00:00
Adrian Prantl	4c36e2f47e	Fix the order of operations. llvm-svn: 248406	2015-09-23 18:09:01 +00:00
Evgeniy Stepanov	a2002b08f7	Android support for SafeStack. Add two new ways of accessing the unsafe stack pointer: * At a fixed offset from the thread TLS base. This is very similar to StackProtector cookies, but we plan to extend it to other backends (ARM in particular) soon. Bionic-side implementation here: https://android-review.googlesource.com/170988. * Via a function call, as a fallback for platforms that provide neither a fixed TLS slot, nor a reasonable TLS implementation (i.e. not emutls). This is a re-commit of a change in r248357 that was reverted in r248358. llvm-svn: 248405	2015-09-23 18:07:56 +00:00
Adrian Prantl	c040893085	Temporarily make testcase more verbose to debug a msvc buildbot failure. llvm-svn: 248403	2015-09-23 17:59:45 +00:00
Chen Li	5cd6deeae3	[Bug 24848] Use range metadata to constant fold comparisons with constant values Summary: This is the first part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848. When range metadata is provided, it should be used to constant fold comparisons with constant values. Reviewers: sanjoy, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12988 llvm-svn: 248402	2015-09-23 17:58:44 +00:00
Adrian Prantl	a112ef9e2d	dsymutil: Resolve forward decls for types defined in clang modules. This patch extends llvm-dsymutil's ODR type uniquing machinery to also resolve forward decls for types defined in clang modules. http://reviews.llvm.org/D13038 llvm-svn: 248398	2015-09-23 17:35:52 +00:00
Adrian Prantl	209370260d	dsymutil: print a warning when there is a module hash mismatch. This also updates the module binaries in the test directory because their module hash mismatched. llvm-svn: 248396	2015-09-23 17:11:10 +00:00
Sanjay Patel	df2495f331	[x86] replace integer 'and' ops with packed SSE FP 'and' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx andl %eax, %ecx movd %ecx, %xmm0 into this: andps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 Differential Revision: http://reviews.llvm.org/D13065 llvm-svn: 248395	2015-09-23 17:00:06 +00:00
Vedant Kumar	ff08e926ba	[Inline] Use AssumptionCache from the right Function This changes the behavior of AddAlignmentAssumptions to match its comment. I.e, prove the asserted alignment in the context of the caller, not the callee. Thanks to Mehdi Amini for seeing the issue here! Also to Artur Pilipenko who also saw a fix for the issue. rdar://22521387 Differential Revision: http://reviews.llvm.org/D12997 llvm-svn: 248390	2015-09-23 15:49:08 +00:00
David Majnemer	fa36bde2f6	[DeadArgElim] Split the invoke successor edge Invoking a function which returns an aggregate can sometimes be transformed to return a scalar value. However, this means that we need to create an insertvalue instruction(s) to recreate the correct aggregate type. We achieved this by inserting an insertvalue instruction at the invoke's normal successor. However, this is not feasible if the normal successor uses the invoke's return value inside a PHI node. Instead, split the edge between the invoke and the unwind successor and create the insertvalue instruction in the new basic block. The new basic block's successor will be the old invoke successor which leaves us with IR which is well behaved. This fixes PR24906. llvm-svn: 248387	2015-09-23 15:41:09 +00:00
Igor Laevsky	029bd93c5d	[DeadStoreElimination] Remove dead zero store to calloc initialized memory This change allows dead store elimination to remove zero and null stores into memory freshly allocated with calloc-like function. Differential Revision: http://reviews.llvm.org/D13021 llvm-svn: 248374	2015-09-23 11:38:44 +00:00
Simon Pilgrim	9cb018b6b6	[X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR This patch removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead. LLVM counterpart to D12835 Differential Revision: http://reviews.llvm.org/D13002 llvm-svn: 248368	2015-09-23 08:48:33 +00:00
Evgeniy Stepanov	8d0e3011d8	Revert "Android support for SafeStack." test/Transforms/SafeStack/abi.ll breaks when target is not supported; needs refactoring. llvm-svn: 248358	2015-09-23 01:23:22 +00:00
Evgeniy Stepanov	ce2e16f00c	Android support for SafeStack. Add two new ways of accessing the unsafe stack pointer: * At a fixed offset from the thread TLS base. This is very similar to StackProtector cookies, but we plan to extend it to other backends (ARM in particular) soon. Bionic-side implementation here: https://android-review.googlesource.com/170988. * Via a function call, as a fallback for platforms that provide neither a fixed TLS slot, nor a reasonable TLS implementation (i.e. not emutls). llvm-svn: 248357	2015-09-23 01:03:51 +00:00
Cong Hou	b54a72ef78	Add a test case for the fix of profile update issue when lowering switch statement. llvm-svn: 248356	2015-09-23 00:34:56 +00:00
Adrian Prantl	77fefeba37	Debug Info: Emit the dwo_name only in skeleton CUs, not in DWOs. llvm-svn: 248340	2015-09-22 23:21:00 +00:00
Matthias Braun	73e4221e6c	LiveIntervalAnalysis: Avoid multiple connected liveness components We may have subregister defs which are unused but not discovered and cleaned up prior to liveness analysis. This creates multiple connected components in the resulting live range which are forbidden in the MachineVerifier because they would unnecessarily constrain the register allocator. Rewrite those dead definitions to define a newly created virtual register. Differential Revision: http://reviews.llvm.org/D13035 llvm-svn: 248335	2015-09-22 22:37:44 +00:00
Michael Zolotukhin	deade19630	[Unroll] Do not crash trying to propagate a value to vector load. llvm-svn: 248333	2015-09-22 22:27:12 +00:00
Adrian Prantl	e5162dba49	dsymutil: Follow references to clang modules and recursively clone the debug info. This does not yet resolve external type references. llvm-svn: 248331	2015-09-22 22:20:50 +00:00
Michael Zolotukhin	8bb31dd08a	[Unroll] Follow-up for r247769: fix a bug in UnrolledInstAnalyzer::visitLoad. Apart from checking that GlobalVariable is a constant, we should check that it's not a weak constant, in which case we can't propagate its value. llvm-svn: 248327	2015-09-22 21:41:29 +00:00
Davide Italiano	77011ba16a	Remove macho-dump. Its functionality is now covered by llvm-readobj. Approved by: Rafael Espindola, Eric Christopher, Jim Grosbach, Alex Rosenberg llvm-svn: 248302	2015-09-22 17:46:10 +00:00
Ahmed Bougacha	81616a72ea	[ARM] Emit clrex in the expanded cmpxchg fail block. ARM counterpart to r248291: In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it. Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations". Differential Revision: http://reviews.llvm.org/D13033 llvm-svn: 248294	2015-09-22 17:22:58 +00:00
Ahmed Bougacha	07a844d758	[AArch64] Emit clrex in the expanded cmpxchg fail block. In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it. Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations". Differential Revision: http://reviews.llvm.org/D13033 llvm-svn: 248291	2015-09-22 17:21:44 +00:00
Stephen Canon	8216d88511	Don't raise inexact when lowering ceil, floor, round, trunc. The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions did raise inexact, and the llvm lowerings followed that convention. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior. This also lets us fold some logic into TD patterns, which is nice. Differential Revision: http://reviews.llvm.org/D12969 llvm-svn: 248266	2015-09-22 11:43:17 +00:00
Daniel Sanders	f173dda0e2	[mips][ias] Implement .cpreturn directive. Summary: Based on a patch by David Chisnall. I've modified the original patch as follows: * Moved the expansion to the TargetStreamers so that the directive isn't expanded when emitting assembly. * Fixed an operand order bug. * Changed the move instructions from DADDu to OR to match recent changes to GAS. Reviewers: vkalintiris Subscribers: llvm-commits, emaste, seanbruno, theraven Differential Revision: http://reviews.llvm.org/D13017 llvm-svn: 248258	2015-09-22 10:50:09 +00:00
Simon Pilgrim	1cad0cd3ce	[X86][SSE] Match zero/any extension shuffles that don't start from the first element This patch generalizes the lowering of shuffles as zero extensions to allow extensions that don't start from the first element. It now recognises extensions starting anywhere in the lower 128-bits or at the start of any higher 128-bit lane. The motivation was to reduce the number of high cost pshufb calls, but it also improves the SSE2 case as well. Differential Revision: http://reviews.llvm.org/D12561 llvm-svn: 248250	2015-09-22 08:16:08 +00:00
Philip Reames	5f99423de9	[LICM] Hoist calls to readonly argmemonly functions even with stores in the loop We know that an argmemonly function can only access memory pointed to by its pointer arguments. Rather than needing to consider all possible stores as aliasing (as we do for a readonly function), we can only consider the aliasing of the pointer arguments. Note that this change only addresses hoisting. I'm thinking about how to address speculation safety as well, but that will be a different change. FYI, argmemonly disallows accessing memory through non-pointer typed arguments. Differential Revision: http://reviews.llvm.org/D12771 llvm-svn: 248220	2015-09-21 22:27:59 +00:00
Philip Reames	963febd4f8	Fix for pr24866 Turns out that not every basic block is guaranteed to have a node within the DominatorTree. This is really hard to trigger, but the test case from the PR managed to do so. There's active discussion continuing about what documentation and/or invariants need to be cleaned up. llvm-svn: 248216	2015-09-21 22:04:10 +00:00
Simon Pilgrim	4003ed2da3	[DAGCombiner] Improve FMA support for interpolation patterns This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents. This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t)))) Differential Revision: http://reviews.llvm.org/D13003 llvm-svn: 248210	2015-09-21 20:32:48 +00:00
Jeroen Ketema	41681a5329	[ARM] Do not scale vext with a factor The vext pseudo-instruction takes the number of elements that need to be extracted, not the number of bytes. Hence, use the number of elements directly instead of scaling them with a factor. Reviewers: Silviu Baranga, James Molloy (not reflected in the differential revision) Differential Revision: http://reviews.llvm.org/D12974 llvm-svn: 248208	2015-09-21 20:28:04 +00:00
James Molloy	50a4c27f97	[LoopUtils,LV] Propagate fast-math flags on generated FCmp instructions We're currently losing any fast-math flags when synthesizing fcmps for min/max reductions. In LV, make sure we copy over the scalar inst's flags. In LoopUtils, we know we only ever match patterns with hasUnsafeAlgebra, so apply that to any synthesized ops. llvm-svn: 248201	2015-09-21 19:41:19 +00:00
Rafael Espindola	8055ed0c12	Avoid SEGFAULT if a requested symbol section is absent. Patch by Igor Kudrin! llvm-svn: 248194	2015-09-21 19:17:18 +00:00
Ulrich Weigand	126caeb043	[SystemZ] Fix expansion of ISD::FPOW and ISD::FSINCOS The ISD::FPOW and ISD::FSINCOS opcodes default to Legal, but there is no legal instruction for those on SystemZ. This could cause LLVM internal errors. Fixed by setting the operation action to Expand for those opcodes. Also added test cases for all other LLVM IR intrinsics that should generate a library call. (Those already work correctly since the default operation action is fine.) llvm-svn: 248180	2015-09-21 17:35:45 +00:00
Matt Arsenault	b774834429	DAGCombiner: Replace store of FP constant after attempting store merges If storing multiple FP constants, some subset of the stores would be replaced with integers due to visit order, so MergeConsecutiveStores would only partially merge these. llvm-svn: 248169	2015-09-21 15:59:46 +00:00
Asaf Badouh	eaf2da14bf	[X86][AVX512] add masked version for RSQRT14 & RCP14 Scalar FP Differential Revision: http://reviews.llvm.org/D12524 llvm-svn: 248147	2015-09-21 10:23:53 +00:00
Daniel Sanders	5d7962880d	[mips] Allow constant expressions in second argument of .cpsetup. Summary: Also tightened up the test and made a trivial fix to prevent double-newline after emitting .cpsetup directives. Reviewers: vkalintiris Subscribers: seanbruno, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D12956 llvm-svn: 248143	2015-09-21 09:26:55 +00:00
Sanjay Patel	bab5d6c636	add test file ahead of any functional changes for PR22428 llvm-svn: 248123	2015-09-20 15:58:00 +00:00
Simon Pilgrim	c6a553241c	[X86][SSE] Intrinsics builtins test refresh. NFCI llvm-svn: 248122	2015-09-20 15:41:35 +00:00
Igor Breger	b7e1f9d680	AVX512: Implemented encoding and intrinsics for vcmpss/sd. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12593 llvm-svn: 248121	2015-09-20 15:15:10 +00:00
Asaf Badouh	2744d21fb8	[X86][AVX512] extend support in Scalar conversion add scalar FP to Int conversion with truncation intrinsics add scalar conversion FP32 from/to FP64 intrinsics add rounding mode and SAE mode encoding for these intrinsics Differential Revision: http://reviews.llvm.org/D12665 llvm-svn: 248117	2015-09-20 14:31:19 +00:00
Igor Breger	4c4cd789c9	AVX512: vsqrtss/sd encoding and intrinsics implementation. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12102 llvm-svn: 248116	2015-09-20 09:13:41 +00:00
Asaf Badouh	572bbceecc	[X86][AVX512DQ] Add fpclass instruction Differential Revision: http://reviews.llvm.org/D12931 llvm-svn: 248115	2015-09-20 08:46:07 +00:00
Michael Kuperstein	58e86bc893	[X86] Fix sitofp and uitofp instruction matching failures with long double and avx512 The operation action for i32 and i64 cannot be set to legal, as long double needs custom lowering. Patch by: mitch.l.bodart@intel.com Differential Revision: http://reviews.llvm.org/D12372 llvm-svn: 248114	2015-09-20 08:12:17 +00:00
Igor Breger	1d55f20bee	AVX512: Implemented intrinsics for vshuff32x4, vshuff64x2, vshufi64x2, vshufi32x4 Added tests for intrinsics. Differential Revision: http://reviews.llvm.org/D12525 llvm-svn: 248113	2015-09-20 07:18:53 +00:00
Igor Breger	0ede3cbb5c	AVX512: Implement instructions encoding, lowering and intrinsics vinserti64x4, vinserti64x2, vinserti32x8, vinserti32x4, vinsertf64x4, vinsertf64x2, vinsertf32x8, vinsertf32x4 Added tests for encoding, lowering and intrinsics. Differential Revision: http://reviews.llvm.org/D11893 llvm-svn: 248111	2015-09-20 06:52:42 +00:00
Sanjoy Das	428db150d1	[IndVars] Fix a bug in r248045. Because -indvars widens induction variables through arithmetic, `NeverNegative` cannot be a property of the `WidenIV` (a `WidenIV` manages information for all transitive uses of an IV being widened, including uses of `-1 * IV`). Instead it must live on `NarrowIVDefUse` which manages information for a specific def-use edge in the transitive use list of an induction variable. This change also adds a test case that demonstrates the problem with r248045. llvm-svn: 248107	2015-09-20 01:52:18 +00:00
Davide Italiano	e210ee56f2	Fixup r248096, commit the correct test. llvm-svn: 248097	2015-09-19 20:52:47 +00:00
Davide Italiano	a539f63ae1	[obj2yaml] Fix "time of check to time of use" bug. Add a test. llvm-svn: 248096	2015-09-19 20:49:34 +00:00
Simon Pilgrim	27f81776ad	[X86][AVX2] Use general sext IR for vpmovsx stack folding tests llvm-svn: 248093	2015-09-19 17:04:18 +00:00
Simon Pilgrim	d0448ee59f	[X86][SSE] Vectorize CTTZ + CTTZ_ZERO_UNDEF Now that we have fast vector CTPOP implementations we can use this to speed up vector CTTZ using the pattern (cttz(x) = ctpop((x & -x) - 1)) Additionally, for AVX512CD that provides lzcnt instructions we can use the pattern (cttz_undef(x) = (width - 1) - ctlz(x & -x)) Differential Revision: http://reviews.llvm.org/D12663 llvm-svn: 248091	2015-09-19 13:22:57 +00:00
NAKAMURA Takumi	5881d349f9	[CMake] Update LLVM_TEST_DEPENDS not to use macho-dump. It has been unused since r247235. llvm-svn: 248088	2015-09-19 07:19:30 +00:00
David Majnemer	47ce0b81b0	[InstCombine] FoldICmpCstShrCst failed for ashr when comparing against -1 (icmp eq (ashr C1, %V) -1) may have multiple answers if C1 is not a power of two and has the sign bit set. This fixes PR24873. llvm-svn: 248074	2015-09-19 00:48:31 +00:00
Matt Arsenault	cc5d106263	AMDGPU: Add failing testcase for live interval construction llvm-svn: 248067	2015-09-19 00:03:56 +00:00
Sanjoy Das	f69d0e3384	[IndVars] Widen more comparisons for non-negative induction vars Summary: If an induction variable is provably non-negative, its sign extension is equal to its zero extension. This means narrow uses like icmp slt iNarrow %indvar, %rhs can be widened into icmp slt iWide zext(%indvar), sext(%rhs) Reviewers: atrick, mcrosier, hfinkel Subscribers: hfinkel, reames, llvm-commits Differential Revision: http://reviews.llvm.org/D12745 llvm-svn: 248045	2015-09-18 21:21:02 +00:00
Cong Hou	d40105d321	Update edge weights properly when merging blocks in if-conversion. In if-conversion, there is a utility function MergeBlocks() that is used to merge blocks. However, when new edges are built in this function the edge weight is either not provided or not updated properly, leading to a modified CFG with incorrect edge weights. This patch corrects this issue. Differential Revision: http://reviews.llvm.org/D12513 llvm-svn: 248030	2015-09-18 20:22:41 +00:00
Eric Christopher	a835956bda	Limit the range of processors supported by ARM fast isel to v6 or later as that's all that is tested right now. Fixes PR24858. llvm-svn: 248027	2015-09-18 20:08:18 +00:00
Cong Hou	f9f9ffb98b	Scaling up values in ARMBaseInstrInfo::isProfitableToIfCvt() before they are scaled by a probability to avoid precision issue. In ARMBaseInstrInfo::isProfitableToIfCvt(), there is a simple cost model in which the number of cycles is scaled by a probability to estimate the cost. However, when the number of cycles is small (which is usually the case), there is a precision issue after the computation. To avoid this issue, this patch scales those cycles by 1024 (chosen to make the multiplication a little faster) before they are scaled by the probability. Other variables are also scaled up for the final comparison. Differential Revision: http://reviews.llvm.org/D12742 llvm-svn: 248018	2015-09-18 18:19:40 +00:00
Matthias Braun	f89b7c7188	SelectionDAGDumper: Hide [ID=X], [ORD=X] and source locations by default. You can show them with the new -dag-dump-verbose switch. Differential Revision: http://reviews.llvm.org/D12566 llvm-svn: 248011	2015-09-18 17:57:28 +00:00
Matthias Braun	0b7d6c14c9	SelectionDAG: Introduce PersistentID to SDNode for assert builds. This gives us more human readable numbers to identify nodes in debug dumps. Before: 0x7fcbd9700160: ch = EntryToken 0x7fcbd985c7c8: i64 = Register %RAX ... 0x7fcbd9700160: <multiple use> 0x7fcbd985c578: i64,ch = MOV64rm 0x7fcbd985c6a0, 0x7fcbd985cc68, 0x7fcbd985c200, 0x7fcbd985cd90, 0x7fcbd985ceb8, 0x7fcbd9700160<Mem:LD8[@foo]> [ORD=2] 0x7fcbd985c8f0: ch,glue = CopyToReg 0x7fcbd9700160, 0x7fcbd985c7c8, 0x7fcbd985c578 [ORD=3] 0x7fcbd985c7c8: <multiple use> 0x7fcbd985c8f0: <multiple use> 0x7fcbd985c8f0: <multiple use> 0x7fcbd985ca18: ch = RETQ 0x7fcbd985c7c8, 0x7fcbd985c8f0, 0x7fcbd985c8f0:1 [ORD=3] Now: t0: ch = EntryToken t5: i64 = Register %RAX ... t0: <multiple use> t3: i64,ch = MOV64rm t10, t12, t11, t13, t14, t0<Mem:LD8[@foo]> [ORD=2] t6: ch,glue = CopyToReg t0, t5, t3 [ORD=3] t5: <multiple use> t6: <multiple use> t6: <multiple use> t7: ch = RETQ t5, t6, t6:1 [ORD=3] Differential Revision: http://reviews.llvm.org/D12564 llvm-svn: 248010	2015-09-18 17:41:00 +00:00
Geoff Berry	43ec15e57e	[AArch64] Improved bitfield instruction selection. Summary: For bitfield insert OR matching, check both operands for larger pattern first before checking for smaller pattern. Add pattern for unsigned bitfield insert-in-zero done with SHL+AND. Resolves PR21631. Reviewers: jmolloy, t.p.northover Subscribers: aemerson, rengolin, llvm-commits, mcrosier Differential Revision: http://reviews.llvm.org/D12908 llvm-svn: 248006	2015-09-18 17:11:53 +00:00
Daniel Sanders	df19a5e605	[mips][microMIPS] Fix an invalid read for lwm32 and reserved reglist values. Summary: Some values of 'reglist' are reserved and cause the disassembler to read past the end of the Regs array. Treat lwm32's containing reserved values as invalid instructions. Reviewers: zoran.jovanovic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12959 llvm-svn: 247990	2015-09-18 14:20:54 +00:00
Igor Laevsky	0fa4819dd8	[LazyValueInfo] Report nonnull range for nonnull pointers Currently LazyValueInfo will report only alloca's as having nonnull range. For loads with !nonnull metadata it will bail out with no additional information. Same is true for calls returning nonnull pointers. This change extends LazyValueInfo to handle additional nonnull instructions. Differential Revision: http://reviews.llvm.org/D12932 llvm-svn: 247985	2015-09-18 13:01:48 +00:00
Artur Pilipenko	84bc62f7a3	Support align attribute for return values Reviewed By: reames Differential Revision: http://reviews.llvm.org/D12844 llvm-svn: 247984	2015-09-18 12:33:31 +00:00
Quentin Colombet	b4c6886215	[ShrinkWrap] Refactor the handling of infinite loop in the analysis. - Strengthen the logic to be sure we hoist the restore point out of the current loop. (This fixes a bug with infinite loop, added as part of the patch.) - Walk over the exit blocks of the current loop to converge to the desired restore point in one iteration of the update loop. llvm-svn: 247958	2015-09-17 23:21:34 +00:00
Davide Italiano	096cda11fc	[llvm-readobj] Fix another "time of check to time of use bug". It seems there's more copy-paste between tools than needed. llvm-svn: 247954	2015-09-17 22:29:58 +00:00
David Majnemer	163b7f121c	[WinEH] Fix tests broken by funclet-layout llvm-svn: 247944	2015-09-17 21:11:12 +00:00
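Note on d0448ee59f: the message gives the identity cttz(x) = ctpop((x & -x) - 1). The scalar C sketch below only illustrates why that identity holds; it is not the vector lowering from the patch. The function names and the 32-bit width are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count: number of set bits in x. */
static unsigned popcount32(uint32_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1u; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* cttz(x) = ctpop((x & -x) - 1).
 * x & -x isolates the lowest set bit; subtracting 1 turns it into a mask
 * of all the bits below it, and counting that mask gives the number of
 * trailing zeros. For x == 0 the mask is all ones, so the result is 32. */
static unsigned cttz_via_ctpop(uint32_t x) {
    uint32_t lowest = x & (uint32_t)(0u - x);
    return popcount32(lowest - 1u);
}

/* Reference implementation: count trailing zeros by shifting. */
static unsigned cttz_reference(uint32_t x) {
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Check the identity on every single-bit value, with and without
 * an extra high bit set, and on zero. */
static void check_cttz_identity(void) {
    for (unsigned i = 0; i < 32; i++) {
        uint32_t x = (uint32_t)1 << i;
        assert(cttz_via_ctpop(x) == i);
        assert(cttz_via_ctpop(x | 0x80000000u) == i);
    }
    assert(cttz_via_ctpop(0u) == cttz_reference(0u));
}
```

Calling check_cttz_identity() from a test harness exercises the identity; the zero case corresponds to plain CTTZ, where the result for zero is defined as the bit width.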

32350 Commits
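
Note on f9f9ffb98b: the message describes scaling cycle counts by 1024 before multiplying by a probability, so that small counts do not lose precision. The C sketch below illustrates that effect with integer arithmetic. The prob_t type and the function names are hypothetical and are not taken from ARMBaseInstrInfo; only the factor 1024 comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical probability type: a fraction num/den with num <= den. */
typedef struct {
    uint32_t num;
    uint32_t den;
} prob_t;

/* Scale factor named in the commit message. */
enum { SCALE_FACTOR = 1024 };

/* Multiply an integer value by a probability, truncating toward zero. */
static uint32_t scale_by_prob(uint32_t value, prob_t p) {
    assert(p.den != 0 && p.num <= p.den);
    return (uint32_t)(((uint64_t)value * p.num) / p.den);
}

/* Cost without pre-scaling: small cycle counts collapse to 0. */
static uint32_t weighted_cost_unscaled(uint32_t cycles, prob_t p) {
    return scale_by_prob(cycles, p);
}

/* Cost with pre-scaling: the result is in units of 1/1024 cycle, so
 * anything it is compared against must be scaled by the same factor. */
static uint32_t weighted_cost_scaled(uint32_t cycles, prob_t p) {
    return scale_by_prob(cycles * SCALE_FACTOR, p);
}
```

With a probability of 1/4, costs of 3 and 2 cycles both truncate to 0 in the unscaled version and cannot be told apart; the scaled version yields 768 and 512, which preserves their order.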