llvm-project

Commit Graph

Author	SHA1	Message	Date
Ulrich Weigand	cf1670a095	[SystemZ] Add assembly instructions for obtaining clock values as well as CPU features Provide assembler support for STCK, STCKF, STCKE, and STFLE. Author: joncmu Differential Revision: http://reviews.llvm.org/D13299 llvm-svn: 249015	2015-10-01 14:43:48 +00:00
Zoran Jovanovic	2960f3a346	[mips][microMIPS] Implement CACHEE, WRPGPR and WSBH instructions Differential Revision: http://reviews.llvm.org/D10337 llvm-svn: 249004	2015-10-01 12:49:27 +00:00
Scott Douglass	290183d734	[ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizer Differential Revision: http://reviews.llvm.org/D13240 llvm-svn: 249002	2015-10-01 11:56:19 +00:00
Jingyue Wu	df1a1b113b	[NaryReassociate] SeenExprs records WeakVH Summary: The instructions SeenExprs records may be deleted during rewriting. FindClosestMatchingDominator should ignore these deleted instructions. Fixes PR24301. Reviewers: grosser Subscribers: grosser, llvm-commits Differential Revision: http://reviews.llvm.org/D13315 llvm-svn: 248983	2015-10-01 03:51:44 +00:00
Dehao Chen	7c41dd6498	Update sample profile propagation algorithm. http://reviews.llvm.org/D13218 llvm-svn: 248968	2015-10-01 00:26:56 +00:00
Ahmed Bougacha	23a0d1a1d6	[X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math. The custom code produces incorrect results if later reassociated. Since r221657, on x86, vNi32 uitofp is lowered using an optimized sequence: movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...] pand %xmm0, %xmm1 por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...] psrld $16, %xmm0 por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...] addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...] addps %xmm1, %xmm0 Since r240361, the machine combiner opportunistically reassociates 2-instruction sequences (with -ffast-math). In the new code sequence, the ADDPS' are eligible. In isolation, for simple examples (without reassociable users), this makes no performance difference (the goal being to enable reassociation of longer chains). In the trivial example (just one uitofp), the reassociation doesn't happen, because (I think) it would require the emission of a separate movaps for a constantpool load (instead of folding it into addps). However, when we have multiple uitofp sequences, and the constantpool loads are CSE'd earlier, the machine combiner can do the reassociation. When the ADDPS' are reassociated, the resulting sequence isn't correct anymore, as we'd be adding large (239) constants with comparatively smaller values (~223). Given that two of the three inputs are powers of 2 larger than 216, and that ulp(239) == 2(39-24) == 215, the reassociated chain will produce 0 for any input in [0, 214[. In my testing, it also produces wrong results for 99.5% of [0, 232[. Avoid this by disabling the new lowering when -ffast-math. It does mean that we'll get slower code than without it, but at least we won't get egregiously incorrect code. One might argue that, considering -ffast-math is all but meaningless, uitofp producing wrong results isn't a compiler bug. But it really is. Fixes PR24512. ...though this is really more of a workaround. Ideally, we'd have some sort of Machine FMF, but that's a problem that's not worth tackling until we do more with machine IR. llvm-svn: 248965	2015-10-01 00:11:07 +00:00
Reid Kleckner	6dec87a8a0	[WinEH] Emit int3 after noreturn calls on Win64 The Win64 unwinder disassembles forwards from each PC to try to determine if this PC is in an epilogue. If so, it skips calling the EH personality function for that frame. Typically, this means you cannot catch an exception in the same frame that you threw it, because 'throw' calls a noreturn runtime function. Previously we avoided this problem with the TrapUnreachable TargetOption, but that's a much bigger hammer than we need. All we need is a 1 byte non-epilogue instruction right after the call. Instead, what we got was an unconditional branch to a shared block containing the ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which added TrapUnreachable, and replaces it with something better. The new code pattern matches for invoke/call followed by unreachable and inserts an int3 into the DAG. To be 100% watertight, we would need to insert SEH_Epilogue instructions into all basic blocks ending in a call with no terminators or successors, but in practice this is unlikely to come up. llvm-svn: 248959	2015-09-30 23:09:23 +00:00
Sanjay Patel	a114a10bbe	[x86] enable machine combiner reassociations for 256-bit vector logical integer insts llvm-svn: 248955	2015-09-30 22:25:55 +00:00
Chad Rosier	4c5a4646bf	[AArch64] Remove an unnecessary run line and other cleanup. NFC. Unscaled load/store combining has been enabled since the initial ARM64 port. No need for a redundance run. Also, add CHECK-LABEL directives. llvm-svn: 248945	2015-09-30 21:10:02 +00:00
Michael Zolotukhin	fc783e91e0	[SLP] Don't vectorize loads of non-packed types (like i1, i2). Summary: Given an array of i2 elements, 4 consecutive scalar loads will be lowered to i8-sized loads and thus will access 4 consecutive bytes in memory. If we vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in memory. Hence, we should prohibit vectorization in such cases. PS: Initial patch was proposed by Arnold. Reviewers: aschwaighofer, nadav, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13277 llvm-svn: 248943	2015-09-30 21:05:43 +00:00
Evgeniy Stepanov	422a61306e	Move dw_op_minus test to DebugInfo/X86. The test requires X86 target support, and checks the actual debug info contents, including register numbers which would be different on other platforms. llvm-svn: 248938	2015-09-30 20:23:24 +00:00
Evgeniy Stepanov	f608111d1b	Fix debug info with SafeStack. llvm-svn: 248933	2015-09-30 19:55:43 +00:00
Chad Rosier	11c825f7db	[AArch64] Remove an unnecessary restriction on pre-index instructions. Previously, the index was constrained to the size of the memory operation for no apparent reason. This change removes that constraint so that we can form pre-index instructions with any valid offset. llvm-svn: 248931	2015-09-30 19:44:40 +00:00
Hal Finkel	4c45775880	[PowerPC] Disable shrink wrapping Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for now until the problem can be fixed. llvm-svn: 248924	2015-09-30 17:29:03 +00:00
Erik Eckstein	91c49810f2	SLPVectorizer: add a test to check if the minimum region size works. This is an addition to rL248917. llvm-svn: 248923	2015-09-30 17:28:19 +00:00
Artyom Skrobov	72ca6b8f3f	[ARM] Support for ARMv6-Z / ARMv6-ZK missing As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121 TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK. In particular, there were no tests for TrustZone being supported in these architectures. The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes his test case which hadn't really been testing what it was claiming to test. Differential Revision: http://reviews.llvm.org/D13236 llvm-svn: 248921	2015-09-30 17:25:52 +00:00
Erik Eckstein	848c1aa452	SLPVectorizer: limit the scheduling region size per basic block. Usually large blocks are not a problem. But if a large block (> 10k instructions) contains many (potential) chains of vector instructions, and those chains are spread over a wide range of instructions, then scheduling becomes a compile time problem. This change introduces a limit for the accumulate scheduling region size of a block. For real-world functions this limit will never be exceeded (it's about 10x larger than the maximum value seen in the test-suite and external test suite). llvm-svn: 248917	2015-09-30 17:00:44 +00:00
Andrea Di Biagio	0594e2a1e9	[InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant. This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a builtin shuffle if the mask is constant. Converting byte shuffle intrinsic calls to builtin shuffles can help finding more opportunities for combining shuffles later on in selection dag. We may end up with byte shuffles with constant masks as the result of inlining. Differential Revision: http://reviews.llvm.org/D13252 llvm-svn: 248913	2015-09-30 16:44:39 +00:00
Jeroen Ketema	ab99b59e8c	[ARM][NEON] Use address space in vld([1234]\|[234]lane) and vst([1234]\|[234]lane) instructions This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234], vst[234]lane ARM neon intrinsics and associates an address space with the pointer that these intrinsics take. This changes, e.g., <2 x i32> @llvm.arm.neon.vld1.v2i32(i8, i32) to <2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8, i32) This change ensures that address spaces are fully taken into account in the ARM target during lowering of interleaved loads and stores. Differential Revision: http://reviews.llvm.org/D12985 llvm-svn: 248887	2015-09-30 10:56:37 +00:00
Simon Pilgrim	3d11c994f7	[X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes. Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases. Differential Revision: http://reviews.llvm.org/D8690 llvm-svn: 248878	2015-09-30 08:17:50 +00:00
Dehao Chen	aae9e1f2bd	Add unittest for new samle profile format. http://reviews.llvm.org/D13145 llvm-svn: 248870	2015-09-30 01:05:37 +00:00
Dehao Chen	6722688eaa	http://reviews.llvm.org/D13145 Support hierarachical sample profile format. llvm-svn: 248865	2015-09-30 00:42:46 +00:00
Evgeniy Stepanov	d3f544f271	[safestack] Fix a stupid mix-up in the direct-tls code path. llvm-svn: 248863	2015-09-30 00:01:47 +00:00
Reid Kleckner	a13dfd539b	[WinEH] Setup RBP correctly in Win64 funclet prologues Previously local variable captures just didn't work in 64-bit. Now we can access local variables more or less correctly. llvm-svn: 248857	2015-09-29 23:32:01 +00:00
David Majnemer	91b0ab9172	[WinEH] Ensure that funclets obey the x64 ABI The x64 ABI requires that epilogues do not contain code other than stack adjustments and some limited control flow. However, we'd insert code to initialize the return address after stack adjustments. Instead, insert EAX/RAX with the current value before we create the stack adjustments in the epilogue. llvm-svn: 248839	2015-09-29 22:33:36 +00:00
Maksim Panchenko	cce239c45d	HHVM calling conventions. HHVM calling convention, hhvmcc, is used by HHVM JIT for functions in translated cache. We currently support LLVM back end to generate code for X86-64 and may support other architectures in the future. In HHVM calling convention any GP register could be used to pass and return values, with the exception of R12 which is reserved for thread-local area and is callee-saved. Other than R12, we always pass RBX and RBP as args, which are our virtual machine's stack pointer and frame pointer respectively. When we enter translation cache via hhvmcc function, we expect the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed to standard ABI alignment. This affects stack object alignment and stack adjustments for function calls. One extra calling convention, hhvm_ccc, is used to call C++ helpers from HHVM's translation cache. It is almost identical to standard C calling convention with an exception of first argument which is passed in RBP (before we use RDI, RSI, etc.) Differential Revision: http://reviews.llvm.org/D12681 llvm-svn: 248832	2015-09-29 22:09:16 +00:00
Chad Rosier	1769d8505f	Fix test from r248825. llvm-svn: 248827	2015-09-29 20:50:15 +00:00
Chad Rosier	4315012769	[AArch64] Add support for pre- and post-index LDPSWs. llvm-svn: 248825	2015-09-29 20:39:55 +00:00
David Majnemer	a80c151286	[WinEH] Teach AsmPrinter about funclets Summary: Funclets have been turned into functions by the time they hit the object file. Make sure that they have decent names for the symbol table and CFI directives explaining how to reason about their prologues. Differential Revision: http://reviews.llvm.org/D13261 llvm-svn: 248824	2015-09-29 20:12:33 +00:00
Zachary Turner	4dddcc64d3	[llvm-pdbdump] Add include-only filters. PDB files have a lot of noise in them, with hundreds (or thousands) of symbols from system libraries and compiler generated types. If you're only looking for a specific type, this can be problematic. This CL allows you to display only types, variables, or compilands matching a particular pattern. These filters can even be combined with exclude filters. Include-only filters are given priority, so that first the set of items to display is limited only to those that match the include filters, and then the set of exclude filters is applied to those. If there are no include filters specified, then it means "display everything". llvm-svn: 248822	2015-09-29 19:49:06 +00:00
Chad Rosier	dabe2534ed	[AArch64] Add integer pre- and post-index halfword/byte loads and stores. llvm-svn: 248817	2015-09-29 18:26:15 +00:00
Dehao Chen	028e122ca9	Revert r248810 which breaks tests. llvm-svn: 248814	2015-09-29 18:18:49 +00:00
Dehao Chen	410a25aa7a	http://reviews.llvm.org/D13231 Change lookup functions to const functions. llvm-svn: 248810	2015-09-29 17:59:58 +00:00
James Molloy	897048bee3	[ValueTracking] Teach isKnownNonZero about monotonically increasing PHIs If a PHI starts at a non-negative constant, monotonically increases (only adds of a constant are supported at the moment) and that add does not wrap, then the PHI is known never to be zero. llvm-svn: 248796	2015-09-29 14:08:45 +00:00
Jeroen Ketema	740f9d79ca	Arguments spilled on the stack before a function call may have alignment requirements, for example in the case of vectors. These requirements are exploited by the code generator by using move instructions that have similar alignment requirements, e.g., movaps on x86. Although the code generator properly aligns the arguments with respect to the displacement of the stack pointer it computes, the displacement itself may cause misalignment. For example if we have %3 = load <16 x float>, <16 x float>* %1, align 64 call void @bar(<16 x float> %3, i32 0) the x86 back-end emits: movaps 32(%ecx), %xmm2 movaps (%ecx), %xmm0 movaps 16(%ecx), %xmm1 movaps 48(%ecx), %xmm3 subl $20, %esp <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards movaps %xmm3, (%esp) <-- movaps requires 16-byte alignment, while %esp is not aligned as such. movl $0, 16(%esp) calll __bar To solve this, we need to make sure that the computed value with which the stack pointer is changed is a multiple af the maximal alignment seen during its computation. With this change we get proper alignment: subl $32, %esp movaps %xmm3, (%esp) Differential Revision: http://reviews.llvm.org/D12337 llvm-svn: 248786	2015-09-29 10:12:57 +00:00
Simon Pilgrim	43f5e0848e	[InstCombine] Improve Vector Demanded Bits Through Bitcasts Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements. This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector). This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts. Differential Revision: http://reviews.llvm.org/D12935 llvm-svn: 248784	2015-09-29 08:19:11 +00:00
Dan Gohman	868e1c08d9	[WebAssembly] Rename test files to match platform naming conventions. llvm-svn: 248783	2015-09-29 08:13:58 +00:00
Chen Li	9f27fc0599	[LoopUnswitch] Add block frequency analysis to recognize hot/cold regions Summary: This patch adds block frequency analysis to LoopUnswitch pass to recognize hot/cold regions. For cold regions the pass only performs trivial unswitches since they do not increase code size, and for hot regions everything works as before. This helps to minimize code growth in cold regions and be more aggressive in hot regions. Currently the default cold regions are blocks with frequencies below 20% of function entry frequency, and it can be adjusted via -loop-unswitch-cold-block-frequency flag. The entire feature is controlled via -loop-unswitch-with-block-frequency flag and it is off by default. Reviewers: broune, silvas, dnovillo, reames Subscribers: davidxl, llvm-commits Differential Revision: http://reviews.llvm.org/D11605 llvm-svn: 248777	2015-09-29 05:03:32 +00:00
Evgeniy Stepanov	d8b86f7cdc	Move dbg.declare intrinsics when merging and replacing allocas. Place new and update dbg.declare calls immediately after the corresponding alloca. Current code in replaceDbgDeclareForAlloca puts the new dbg.declare at the end of the basic block. LLVM codegen has problems emitting debug info in a situation when dbg.declare appears after all uses of the variable. This usually kinda works for inlining and ASan (two users of this function) but not for SafeStack (see the pending change in http://reviews.llvm.org/D13178). llvm-svn: 248769	2015-09-29 00:30:19 +00:00
Reid Kleckner	c71d6275ca	[WinEH] Fix ip2state table emission with funclets Previously we were hijacking the old LandingPadInfo data structures to communicate our state numbers. Now we don't need that anymore. llvm-svn: 248763	2015-09-28 23:56:30 +00:00
Sanjoy Das	4f1c45952c	[SCEV] Don't crash on pointer comparisons `ScalarEvolution::isImpliedCondOperandsViaNoOverflow` tries to cast the operand type of the comparison it is given to an `IntegerType`. This is incorrect because it could actually be simplifying a comparison between two pointers. Switch it to using `getTypeSizeInBits` instead, which does the right thing for both pointers and integers. Fixed PR24956. llvm-svn: 248743	2015-09-28 21:14:32 +00:00
Matt Arsenault	73aa8f687a	AMDGPU: Fix splitting x16 SMRD loads When used recursively, this would set the kill flag on the intermediate step from first splitting x16 to x8. llvm-svn: 248741	2015-09-28 20:54:52 +00:00
Matt Arsenault	e5d042cd56	AMDGPU: Fix moving SMRD loads with literal offsets on CI llvm-svn: 248740	2015-09-28 20:54:46 +00:00
Matt Arsenault	b378f075a2	AMDGPU: Add testcases Make sure we are testing moving users of the moved and split SMRD loads. llvm-svn: 248738	2015-09-28 20:54:38 +00:00
Matt Arsenault	f3c91f573f	AMDGPU: Cleanup test Run instnamer on it, and rename check prefix. This is in preparation for adding new testcases to cover bugs on other subtargets. llvm-svn: 248737	2015-09-28 20:54:32 +00:00
Sean Silva	ace7818ce6	[GlobalOpt] Sort members of llvm.used deterministically Patch by Jake VanAdrighem! Summary: Fix the way we sort the llvm.used and llvm.compiler.used members. This bug seems to have been introduced in rL183756 through a set of improper casts to GlobalValue*. In subsequent patches this problem was missed and transformed into a getName call on a ConstantExpr. Reviewers: silvas Subscribers: silvas, llvm-commits Differential Revision: http://reviews.llvm.org/D12851 llvm-svn: 248728	2015-09-28 19:02:11 +00:00
Artur Pilipenko	b4d009042b	Introduce !align metadata for load instruction Reviewed By: hfinkel Differential Revision: http://reviews.llvm.org/D12853 llvm-svn: 248721	2015-09-28 17:41:08 +00:00
Philip Reames	13f023c09d	[InstSimplify] Fold simple known implications to true This was split off of http://reviews.llvm.org/D13040 to make it easier to test the correctness of the implication logic. For the moment, this only handles a single easy case which shows up when eliminating and combining range checks. In the (near) future, I plan to extend this for other cases which show up in range checks, but I wanted to make those changes incrementally once the framework was in place. At the moment, the implication logic will be used by three places. One in InstSimplify (this review) and two in SimplifyCFG (http://reviews.llvm.org/D13040 & http://reviews.llvm.org/D13070). Can anyone think of other locations this style of reasoning would make sense? Differential Revision: http://reviews.llvm.org/D13074 llvm-svn: 248719	2015-09-28 17:14:24 +00:00
Weiming Zhao	310770a90f	[LoopReroll] Ignore debug intrinsics Originally, debug intrinsics and annotation intrinsics may prevent the loop to be rerolled, now they are ignored. Differential Revision: http://reviews.llvm.org/D13150 llvm-svn: 248718	2015-09-28 17:03:23 +00:00
Dan Gohman	05a17aa82a	[WebAssembly] Support for direct call and call_indirect. llvm-svn: 248716	2015-09-28 16:22:39 +00:00
Zoran Jovanovic	cdb64566cc	[mips] Handling of immediates bigger than 16 bits Differential Revision: http://reviews.llvm.org/D10539 llvm-svn: 248706	2015-09-28 11:11:34 +00:00
Hal Finkel	bd582581b8	[DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking When AA is being used, non-aliasing stores are canonicalized to use the same chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of this by looking only as users of a store's chain operand. However, user iteration is not result-number specific, we need to check that the use is as a chain operand, and not via some other operand. It is certainly possible to have another potentially-aliasing store, which shares the first's base pointer, and uses the first's chain's node via some other operand. Failure to catch this situation caused, at least in the included test case, an assert later because the relative sequence-number ordering caused later replacement to create a cycle in the DAG. llvm-svn: 248698	2015-09-28 08:02:14 +00:00
Sanjoy Das	f1090b6061	[SCEV] identical instructions don't compute equal values Before this change `HasSameValue` would return true for distinct `alloca` instructions if they happened to be allocating the same type (`alloca` instructions are not specified as reading memory). This change adds an explicit whitelist of instruction types for which "identical" instructions compute the same value. Fixes PR24952. llvm-svn: 248690	2015-09-27 21:09:48 +00:00
Sanjay Patel	9533407566	[InstCombine] fold zexts and constants into a phi (PR24766) This is one step towards solving PR24766: https://llvm.org/bugs/show_bug.cgi?id=24766 We were not producing the same IR for these two C functions because the store to the temp bool causes extra zexts: #include <stdbool.h> bool switchy(char x1, char x2, char condition) { bool conditionMet = false; switch (condition) { case 0: conditionMet = (x1 == x2); break; case 1: conditionMet = (x1 <= x2); break; } return conditionMet; } bool switchy2(char x1, char x2, char condition) { switch (condition) { case 0: return (x1 == x2); case 1: return (x1 <= x2); } return false; } As noted in the code comments, this test case manages to avoid the more general existing phi optimizations where there are only 2 phi inputs or where there are no constant phi args mixed in with the casts ops. It seems like a corner case, but if we don't catch it, then I don't think we can get SimplifyCFG to further optimize towards the canonical form for this function shown in the bug report. Differential Revision: http://reviews.llvm.org/D12866 llvm-svn: 248689	2015-09-27 20:34:31 +00:00
Joseph Tremoulet	09af67aba5	[EH] Create removeUnwindEdge utility Summary: Factor the code that rewrites invokes to calls and rewrites WinEH terminators to their "unwind to caller" equivalents into a helper in Utils/Local, and use it in the three places I'm aware of that need to do this. Reviewers: andrew.w.kaylor, majnemer, rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13152 llvm-svn: 248677	2015-09-27 01:47:46 +00:00
Simon Pilgrim	91717ee233	[InstCombine] Removed unnecessary meta attributes. llvm-svn: 248672	2015-09-26 17:49:04 +00:00
Chen Li	7452d95656	[Bug 24848] Use range metadata to constant fold comparisons between two values Summary: This is the second part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848. If both operands of a comparison have range metadata, they should be used to constant fold the comparison. Reviewers: sanjoy, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13177 llvm-svn: 248650	2015-09-26 03:26:47 +00:00
Matt Arsenault	86095b8dec	AMDGPU: Fix sched model for VOP2b instructions Trying to use the version with the explicit output operand would complain because of the missing WriteSALU. I'm not sure why it doesn't complain about this with the implicit VCC def. llvm-svn: 248646	2015-09-26 02:25:45 +00:00
Dan Gohman	d0bf981296	[WebAssembly] Rename several functions and types according to the new spec. llvm-svn: 248644	2015-09-26 01:09:44 +00:00
Ahmed Bougacha	e81610fabb	[ARM] Don't generate clrex for pre-v7 targets. Since r248294, we emit clrex, but it doesn't exist on v6. llvm-svn: 248640	2015-09-26 00:14:02 +00:00
Sanjoy Das	b174f9a316	[SCEV] Reapply 'Teach isLoopBackedgeGuardedByCond to exploit trip counts' Summary: If the trip count of a specific backedge is `N`, then we know that backedge is effectively guarded by the condition `{0,+,1} u< N`. This change teaches SCEV to use this condition to prove things in `isLoopBackedgeGuardedByCond`. Depends on D12948 Depends on D12949 The original checkin, r248608 had to be backed out due to an issue with a ObjCXX unit test. That issue is now fixed, so re-landing. Reviewers: atrick, reames, majnemer, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12950 llvm-svn: 248638	2015-09-25 23:53:50 +00:00
Sanjoy Das	96709c4854	[SCEV] Reapply 'Exploit A < B => (A+K) < (B+K) when possible' Summary: This change teaches SCEV's `isImpliedCond` two new identities: A u< B u< -C => (A + C) u< (B + C) A s< B s< INT_MIN - C => (A + C) s< (B + C) While these are useful on their own, they're really intended to support D12950. The original checkin, r248606 had to be backed out due to an issue with a ObjCXX unit test. That issue is now fixed, so re-landing. Reviewers: atrick, reames, majnemer, nlewycky, hfinkel Subscribers: aadg, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D12948 llvm-svn: 248637	2015-09-25 23:53:45 +00:00
Sanjay Patel	e1b09caaaf	[InstCombine] match De Morgan's Law hidden by zext ops (PR22723) This is a fix for PR22723: https://llvm.org/bugs/show_bug.cgi?id=22723 My first attempt at this was to change what I thought was the root problem: xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32 ...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop! My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would mean potentially returning a different size for the match than what was input. I think this would require all users of m_Not to check the size of the returned match, so I abandoned that idea. I settled on just fixing the exact case presented in the PR. This patch does allow the 2 functions in PR22723 to compile identically (x86): bool test(bool x, bool y) { return !x \| !y; } bool test(bool x, bool y) { return !x \|\| !y; } ... andb %sil, %dil xorb $1, %dil movb %dil, %al retq Differential Revision: http://reviews.llvm.org/D12705 llvm-svn: 248634	2015-09-25 23:21:38 +00:00
Cong Hou	15ea016346	Use fixed-point representation for BranchProbability. BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change: Pros: 1. It uses only a half space of the current one. 2. Some operations are much faster like plus, subtraction, comparison, and scaling by an integer. Cons: 1. Constructing a probability using arbitrary numerator and denominator needs additional calculations. 2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 1 / 3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio. One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability has more space which may be a concern. Differential revision: http://reviews.llvm.org/D12603 llvm-svn: 248633	2015-09-25 23:09:59 +00:00
Matthias Braun	a3b701f828	SelectionDAGDumper: Print simple operands inline. Print simple operands inline instead of their pointer/value number. Simple operands are SDNodes without predecessors like Constant(FP), Register, UNDEF. This unifies the behaviour with dumpr() which was already doing this. Previously: t0: ch = EntryToken t1: i64 = Register %vreg0 t2: i64,ch = CopyFromReg t0, t1 t3: i64 = Constant<1> t4: i64 = add t2, t3 t5: i64 = Constant<2> t6: i64 = add t2, t5 t10: i64 = undef t11: i8,ch = load t0, t2, t10<LD1[%tmp81]> t12: i8,ch = load t0, t4, t10<LD1[%tmp10]> t13: i8,ch = load t0, t6, t10<LD1[%tmp12]> Now: t0: ch = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0 t4: i64 = add t2, Constant:i64<1> t6: i64 = add t2, Constant:i64<2> t11: i8,ch = load<LD1[%tmp81]> t0, t2, undef:i64 t12: i8,ch = load<LD1[%tmp10]> t0, t4, undef:i64 t13: i8,ch = load<LD1[%tmp12]> t0, t6, undef:i64 Differential Revision: http://reviews.llvm.org/D12567 llvm-svn: 248628	2015-09-25 22:27:02 +00:00
Sanjay Patel	bbbf9a1a34	merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711) This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ). The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned accesses up in performSTORECombine() because they are slow. This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving existing (perhaps questionable) lowering behavior. The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned stores. Differential Revision: http://reviews.llvm.org/D12635 llvm-svn: 248622	2015-09-25 21:49:48 +00:00
Matthias Braun	e86bbd8979	PrologueEpilogInserter: Fix missing live-ins when savepoint equals restorepoint The algorithm would not modify the live-in list of blocks below the save block point which is correct unless it happens to be a restore point at the same time. Also fixes the benign issue of live-in registers being added twice in some cases. The testcase is based on a test submitted by Kit Barton. Differential Revision: http://reviews.llvm.org/D13176 llvm-svn: 248620	2015-09-25 21:41:40 +00:00
Tom Stellard	e135ffd554	AMDGPU/SI: Use .hsatext section instead of .text for HSA Reviewers: arsenm, grosbach, rafael Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12424 llvm-svn: 248619	2015-09-25 21:41:28 +00:00
Sanjoy Das	4a39b97671	Revert two SCEV changes that caused test failures in clang. r248606: "[SCEV] Exploit A < B => (A+K) < (B+K) when possible" r248608: "[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts." llvm-svn: 248614	2015-09-25 21:16:50 +00:00
Matt Arsenault	10aa807856	PeepholeOptimizer: Remove redundant copies If a virtual register is copied and another copy was already seen, replace with the previous copy. This only handles the simplest cases for now. This pattern shows up from various operand restrictions AMDGPU has which require inserting copies depending on the register class of the operands. llvm-svn: 248611	2015-09-25 20:22:12 +00:00
Sanjoy Das	d706fa8a0c	[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts. Summary: If the trip count of a specific backedge is `N`, then we know that backedge is effectively guarded by the condition `{0,+,1} u< N`. This change teaches SCEV to use this condition to prove things in `isLoopBackedgeGuardedByCond`. Depends on D12948 Depends on D12949 Reviewers: atrick, reames, majnemer, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12950 llvm-svn: 248608	2015-09-25 19:59:57 +00:00
Sanjoy Das	fdec9deb13	[SCEV] Exploit A < B => (A+K) < (B+K) when possible Summary: This change teaches SCEV's `isImpliedCond` two new identities: A u< B u< -C => (A + C) u< (B + C) A s< B s< INT_MIN - C => (A + C) s< (B + C) While these are useful on their own, they're really intended to support D12950. Reviewers: atrick, reames, majnemer, nlewycky, hfinkel Subscribers: aadg, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D12948 llvm-svn: 248606	2015-09-25 19:59:49 +00:00
Matt Arsenault	28bd7d4afe	AMDGPU: Add some more tests for literal operands llvm-svn: 248600	2015-09-25 18:21:47 +00:00
Chad Rosier	1bbd7fb38e	[AArch64] Add support for generating pre- and post-index load/store pairs. llvm-svn: 248593	2015-09-25 17:48:17 +00:00
Matt Arsenault	4bf43d4e68	AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG This fixes a select error when the i64 source was also bitcasted to v2i32 in the original source. Instead of awkwardly trying to select the modified source value and the store, replace before isel begins. Uses a worklist to avoid possible problems from mutating the DAG, although it seems to work OK without it. llvm-svn: 248589	2015-09-25 17:27:08 +00:00
Matt Arsenault	5f70436c49	AMDGPU: Improve accuracy of instruction rates for VOPC These were all using the default 32-bit VALU write class, but the i64/f64 compares are half rate. I'm not sure this is really correct, because they are still using the write to VALU write class, even though they really write to the SALU. llvm-svn: 248582	2015-09-25 16:58:25 +00:00
James Molloy	eb46641c28	[GlobalsAA] Teach GlobalsAA about nocapture Arguments to function calls marked "nocapture" can be marked as non-escaping. However, nocapture is defined in terms of the lifetime of the callee, and if the callee can directly or indirectly recurse to the caller, the semantics of nocapture are invalid. Therefore, we eagerly discover which SCC each function belongs to, and later can check if callee and caller of a callsite belong to the same SCC, in which case there could be recursion. This means that we can't be so optimistic in getModRefInfo(ImmutableCallsite) - previously we assumed all call arguments never aliased with an escaping global. Now we need to check, because a global could now be passed as an argument but still not escape. This also solves a related conformance problem: MemCpyOptimizer can turn non-escaping stores of globals into calls to intrinsics like llvm.memcpy/llvm/memset. This confuses GlobalsAA, which knows the global can't escape and so returns NoModRef when queried, when obviously a memcpy/memset call does indeed reference and modify its arguments. This fixes PR24800, PR24801, and PR24802. llvm-svn: 248576	2015-09-25 15:39:29 +00:00
Saleem Abdulrasool	fe83b50289	ARM: address WoA division limitation We now emit the compiler generated divide by zero check that was needed for the MSVC routines. We construct a psuedo-instruction for the DBZ check as the operation requires splitting up the BB. For the 64-bit operations, we need to custom expand the node as we need to insert the DBZ check and then emit the libcall to the appropriate name. Because this is target specific, it seemed better to reproduce the expansion operation from the target-agnostic type legalization rather than sink this there to avoid the duplication. The division library calls now match MSVC semantically. llvm-svn: 248561	2015-09-25 05:15:46 +00:00
Sanjoy Das	b513a9fa4f	[Bitcode][Asm] Teach LLVM to read and write operand bundles. Summary: This also adds the first set of tests for operand bundles. The optimizer has not been audited to ensure that it does the right thing with operand bundles. Depends on D12456. Reviewers: reames, chandlerc, majnemer, dexonsmith, kmod, JosephTremoulet, rnk, bogner Subscribers: maksfb, llvm-commits Differential Revision: http://reviews.llvm.org/D12457 llvm-svn: 248551	2015-09-24 23:34:52 +00:00
Ed Maste	f021808d60	Restore test coverage for other than ELFOSABI_NONE Add a FreeBSD test to restore testing of ELF OSABI other than ELFOSABI_NONE after r248534. Differential Revision: http://reviews.llvm.org/D13146 llvm-svn: 248550	2015-09-24 23:01:16 +00:00
Simon Pilgrim	68d0050c6a	[X86][SSE2] Fix zero/any extension shuffles that don't start from the first element Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H llvm-svn: 248536	2015-09-24 21:02:17 +00:00
Rafael Espindola	4405d5d889	Use ELFOSABI_NONE instead of ELFOSABI_LINUX. The doesn't seem to be a difference and ELFOSABI_NONE seems to be far more common: * Linux doesn't care when loading and puts ELFOSABI_NONE on core dumps. * Gold and bfd ld produce files with ELFOSABI_NONE. * Gold and bfd ld seems to ignore EI_OSABI other than for freebsd. * Gas puts ELFOSABI_NONE in most .o files. llvm-svn: 248534	2015-09-24 20:57:24 +00:00
Matt Arsenault	e66621b306	AMDGPU: Add s_dcache_* instructions llvm-svn: 248533	2015-09-24 19:52:27 +00:00
Matt Arsenault	d6adfb401c	AMDGPU: Add cache invalidation instructions. These are necessary for implementing mem_fence for OpenCL 2.0. The VI assembler tests are disabled since it seems to be using the wrong encoding or opcode. llvm-svn: 248532	2015-09-24 19:52:21 +00:00
Matt Arsenault	c116767fec	AMDGPU: Run mubuf assembler test for CI llvm-svn: 248531	2015-09-24 19:52:15 +00:00
Adrian Prantl	f3e634b8fb	dsymutil: Fix the condition to distinguish module imports form definitions. llvm-svn: 248512	2015-09-24 16:10:14 +00:00
James Molloy	b6be1ebb7d	[ValueTracking] Teach isKnownNonZero a new trick If the shifter operand is a constant, and all of the bits shifted out are known to be zero, then if X is known non-zero at least one non-zero bit must remain. llvm-svn: 248508	2015-09-24 16:06:32 +00:00
Mohammad Shahid	d0203cbf1c	Regression Test: Deletes redundant/invalid test. Removes absdiff_expand.ll regression test file which is invalid. Diffrential Revision: http://reviews.llvm.org/D11678 llvm-svn: 248493	2015-09-24 14:37:25 +00:00
Mohammad Shahid	13f1dfdf2e	Codegen: Fix llvm.absdiff semantic. Fixes the overflow case of llvm.absdiff intrinsic also updats the tests and LangRef.rst accordingly. Differential Revision: http://reviews.llvm.org/D11678 llvm-svn: 248483	2015-09-24 10:35:03 +00:00
Charlie Turner	2720593ab4	[InstCombine] Recognize another bswap idiom. Summary: The byte-swap recognizer can now notice that this ``` uint32_t bswap(uint32_t x) { x = (x & 0x0000FFFF) << 16 \| (x & 0xFFFF0000) >> 16; x = (x & 0x00FF00FF) << 8 \| (x & 0xFF00FF00) >> 8; return x; } ``` is a bswap. Fixes PR23863. Reviewers: nlewycky, hfinkel, hans, jmolloy, rengolin Subscribers: majnemer, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D12637 llvm-svn: 248482	2015-09-24 10:24:58 +00:00
Matt Arsenault	68d938649e	Introduce target hook for optimizing register copies Allow a target to do something other than search for copies that will avoid cross register bank copies. Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract. I'm not entirely satisified with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets. I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making the default implementation to check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses. The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation, this just relaxes them to not check for specific registers. llvm-svn: 248478	2015-09-24 08:36:14 +00:00
Matt Arsenault	cab64f1c75	AMDGPU: Fix printing trailing whitespace for mubuf atomics llvm-svn: 248472	2015-09-24 07:51:17 +00:00
Matt Arsenault	c721df0478	Use new TokenFactor chain when merging stores If the stores are storing values from loads which partially alias the stores, we could end up placing the merged loads and stores on the same chain which has the potential to break. Each store may have a different chain dependency on only some of the original loads. Create a new TokenFactor to capture all of the required dependencies of the stores rather than assuming all stores can use the same chain. The testcase is a situation where this happens, although it does not have an observable change from this. The DAG nodes just happened to not be reordered before despite this missing chain dependency. This is based on an off-list report for an out of tree target which regressed due to r246307 and I haven't managed to find a case where the nodes do end up reordered with an in tree target. llvm-svn: 248468	2015-09-24 07:22:38 +00:00
Matt Arsenault	c8e2ce4046	AMDGPU: Reduce number of copies emitted Instead of always inserting a copy in case the super register is itself a subregister, only extract to the super reg class if this is actually the case. This shouldn't really change codegen, but makes looking at the output of SIFixSGPRCopies easier to read. llvm-svn: 248467	2015-09-24 07:16:37 +00:00
Evgeniy Stepanov	8685daf23e	[safestack] Fix compiler crash in the presence of stack restores. A use can be emitted before def in a function with stack restore points but no static allocas. llvm-svn: 248455	2015-09-24 01:23:51 +00:00
Adrian Prantl	3236c9ce4a	Add REQUIRES: default_triple to these testcases. llvm-svn: 248452	2015-09-24 00:35:14 +00:00
Wei Mi	3cc9204a52	Put profile variables of COMDAT functions to it's own COMDAT group. In -fprofile-instr-generate compilation, to remove the redundant profile variables for the COMDAT functions, these variables are placed in the same COMDAT group as its associated function. This way when the COMDAT function is not picked by the linker, those profile variables will also not be output in the final binary. This may cause warning when mix link objects built w and wo -fprofile-instr-generate. This patch puts the profile variables for COMDAT functions to its own COMDAT group to avoid the problem. Patch by xur. Differential Revision: http://reviews.llvm.org/D12248 llvm-svn: 248440	2015-09-23 22:40:45 +00:00
Sanjay Patel	13e8bbc237	set div/rem default values to 'expensive' in TargetTransformInfo's cost model ...because that's what the cost model was intended to do. As discussed in D12882, this fix has a temporary unintended consequence for SimplifyCFG: it causes us to not speculate an fdiv. However, two wrongs make PR24818 right, and two wrongs make PR24343 act right even though it's really still wrong. I intend to correct SimplifyCFG and add to CodeGenPrepare to account for this cost model change and preserve the righteousness for the bug report cases. https://llvm.org/bugs/show_bug.cgi?id=24818 https://llvm.org/bugs/show_bug.cgi?id=24343 Differential Revision: http://reviews.llvm.org/D12882 llvm-svn: 248439	2015-09-23 22:28:18 +00:00
Tim Northover	beb5bccf88	ARM: fix folding stack adjustment (again again again...) This time, the issue is that we weren't accounting for the possibility that aligned DPRs could have been stored after the final "push" in a prologue. When that happened we effectively moved a "sub sp, #N" from below the aligned stores to above them, and everything went to pot. To make it worse, I'd actually committed something testing that we produced wrong code, so the test update is tiny. llvm-svn: 248437	2015-09-23 22:21:09 +00:00
Adrian Prantl	ea8a724474	dsymutil: Don't prune forward declarations inside a module definition. llvm-svn: 248428	2015-09-23 20:44:37 +00:00
Adrian Prantl	209c424d1e	Fix this dsymutil testcase by not passing in a path to the modulemap file, so the lookup works as expected after prepending the oso-prepend-path. This manifested only on Windows, because "/" is not a relative path there. llvm-svn: 248423	2015-09-23 19:53:10 +00:00
Philip Reames	d63df5107e	Remove handling of AddrSpaceCast in stripAndAccumulateInBoundsConstantOffsets Patch by: simoncook Unlike BitCasts, AddrSpaceCasts do not always produce an output the same size as its input, which was previously assumed. This fixes cases where two address spaces do not have the same size pointer, as an assertion failure would occur when trying to prove deferenceability. LoopUnswitch is used in the particular test, but LICM also exhibits the same problem. Differential Revision: http://reviews.llvm.org/D13008 llvm-svn: 248422	2015-09-23 19:48:43 +00:00
Lawrence Hu	cac0b89289	Swap loop invariant GEP with loop variant GEP to allow more LICM. This patch changes the order of GEPs generated by Splitting GEPs pass, specially when one of the GEPs has constant and the base is loop invariant, then we will generate the GEP with constant first when beneficial, to expose more cases for LICM. If originally Splitting GEP generate the following: do.body.i: %idxprom.i = sext i32 %shr.i to i64 %2 = bitcast %typeD* %s to i8* %3 = shl i64 %idxprom.i, 2 %uglygep = getelementptr i8, i8* %2, i64 %3 %uglygep7 = getelementptr i8, i8* %uglygep, i64 1032 ... Now it genereates: do.body.i: %idxprom.i = sext i32 %shr.i to i64 %2 = bitcast %typeD* %s to i8* %3 = shl i64 %idxprom.i, 2 %uglygep = getelementptr i8, i8* %2, i64 1032 %uglygep7 = getelementptr i8, i8* %uglygep, i64 %3 ... For no-loop cases, the original way of generating GEPs seems to expose more CSE cases, so we don't change the logic for no-loop cases, and only limit our change to the specific case we are interested in. llvm-svn: 248420	2015-09-23 19:25:30 +00:00
Akira Hatanaka	f6afd11538	[InstCombine] Preserve metadata when merging loads that are phi arguments. Make sure InstCombiner::FoldPHIArgLoadIntoPHI doesn't drop the following metadata: MD_tbaa MD_alias_scope MD_noalias MD_invariant_load MD_nonnull MD_range rdar://problem/17617709 Differential Revision: http://reviews.llvm.org/D12710 llvm-svn: 248419	2015-09-23 18:40:57 +00:00
Sanjay Patel	1a6534661b	[x86] replace integer 'xor' ops with packed SSE FP 'xor' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx xorl %eax, %ecx movd %ecx, %xmm0 into this: xorps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 This is an extension of: http://reviews.llvm.org/rL248395 llvm-svn: 248415	2015-09-23 18:33:42 +00:00
Sanjay Patel	aba37553c4	[x86] replace integer 'or' ops with packed SSE FP 'or' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx orl %eax, %ecx movd %ecx, %xmm0 into this: orps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 This is an extension of: http://reviews.llvm.org/rL248395 llvm-svn: 248409	2015-09-23 18:19:07 +00:00
Adrian Prantl	4c36e2f47e	Fix the order of operations. llvm-svn: 248406	2015-09-23 18:09:01 +00:00
Evgeniy Stepanov	a2002b08f7	Android support for SafeStack. Add two new ways of accessing the unsafe stack pointer: * At a fixed offset from the thread TLS base. This is very similar to StackProtector cookies, but we plan to extend it to other backends (ARM in particular) soon. Bionic-side implementation here: https://android-review.googlesource.com/170988. * Via a function call, as a fallback for platforms that provide neither a fixed TLS slot, nor a reasonable TLS implementation (i.e. not emutls). This is a re-commit of a change in r248357 that was reverted in r248358. llvm-svn: 248405	2015-09-23 18:07:56 +00:00
Adrian Prantl	c040893085	Temporarily make testcase more verbose to debug a msvc buildbot failure. llvm-svn: 248403	2015-09-23 17:59:45 +00:00
Chen Li	5cd6deeae3	[Bug 24848] Use range metadata to constant fold comparisons with constant values Summary: This is the first part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848. When range metadata is provided, it should be used to constant fold comparisons with constant values. Reviewers: sanjoy, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12988 llvm-svn: 248402	2015-09-23 17:58:44 +00:00
Adrian Prantl	a112ef9e2d	dsymutil: Resolve forward decls for types defined in clang modules. This patch extends llvm-dsymutil's ODR type uniquing machinery to also resolve forward decls for types defined in clang modules. http://reviews.llvm.org/D13038 llvm-svn: 248398	2015-09-23 17:35:52 +00:00
Adrian Prantl	209370260d	dsymutil: print a warning when there is a module hash mismatch. This also updates the module binaries in the test directory because their module hash mismatched. llvm-svn: 248396	2015-09-23 17:11:10 +00:00
Sanjay Patel	df2495f331	[x86] replace integer 'and' ops with packed SSE FP 'and' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx andl %eax, %ecx movd %ecx, %xmm0 into this: andps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 Differential Revision: http://reviews.llvm.org/D13065 llvm-svn: 248395	2015-09-23 17:00:06 +00:00
Vedant Kumar	ff08e926ba	[Inline] Use AssumptionCache from the right Function This changes the behavior of AddAligntmentAssumptions to match its comment. I.e, prove the asserted alignment in the context of the caller, not the callee. Thanks to Mehdi Amini for seeing the issue here! Also to Artur Pilipenko who also saw a fix for the issue. rdar://22521387 Differential Revision: http://reviews.llvm.org/D12997 llvm-svn: 248390	2015-09-23 15:49:08 +00:00
David Majnemer	fa36bde2f6	[DeadArgElim] Split the invoke successor edge Invoking a function which returns an aggregate can sometimes be transformed to return a scalar value. However, this means that we need to create an insertvalue instruction(s) to recreate the correct aggregate type. We achieved this by inserting an insertvalue instruction at the invoke's normal successor. However, this is not feasible if the normal successor uses the invoke's return value inside a PHI node. Instead, split the edge between the invoke and the unwind successor and create the insertvalue instruction in the new basic block. The new basic block's successor will be the old invoke successor which leaves us with IR which is well behaved. This fixes PR24906. llvm-svn: 248387	2015-09-23 15:41:09 +00:00
Igor Laevsky	029bd93c5d	[DeadStoreElimination] Remove dead zero store to calloc initialized memory This change allows dead store elimination to remove zero and null stores into memory freshly allocated with calloc-like function. Differential Revision: http://reviews.llvm.org/D13021 llvm-svn: 248374	2015-09-23 11:38:44 +00:00
Simon Pilgrim	9cb018b6b6	[X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead. LLVM counterpart to D12835 Differential Revision: http://reviews.llvm.org/D13002 llvm-svn: 248368	2015-09-23 08:48:33 +00:00
Evgeniy Stepanov	8d0e3011d8	Revert "Android support for SafeStack." test/Transforms/SafeStack/abi.ll breaks when target is not supported; needs refactoring. llvm-svn: 248358	2015-09-23 01:23:22 +00:00
Evgeniy Stepanov	ce2e16f00c	Android support for SafeStack. Add two new ways of accessing the unsafe stack pointer: * At a fixed offset from the thread TLS base. This is very similar to StackProtector cookies, but we plan to extend it to other backends (ARM in particular) soon. Bionic-side implementation here: https://android-review.googlesource.com/170988. * Via a function call, as a fallback for platforms that provide neither a fixed TLS slot, nor a reasonable TLS implementation (i.e. not emutls). llvm-svn: 248357	2015-09-23 01:03:51 +00:00
Cong Hou	b54a72ef78	Add a test case for the fix of profile update issue when lowering switch statement. llvm-svn: 248356	2015-09-23 00:34:56 +00:00
Adrian Prantl	77fefeba37	Debug Info: Emit the dwo_name only in skeleton CUs, not in DWOs. llvm-svn: 248340	2015-09-22 23:21:00 +00:00
Matthias Braun	73e4221e6c	LiveIntervalAnalysis: Avoid multiple connected liveness components We may have subregister defs which are unused but not discovered and cleaned up prior to liveness analysis. This creates multiple connected components in the resulting live range which are forbidden in the MachineVerifier because they would unnecesarily constrain the register allocator. Rewrite those dead definitions to define a newly created virtual register. Differential Revision: http://reviews.llvm.org/D13035 llvm-svn: 248335	2015-09-22 22:37:44 +00:00
Michael Zolotukhin	deade19630	[Unroll] Do not crash trying to propagate a value to vector load. llvm-svn: 248333	2015-09-22 22:27:12 +00:00
Adrian Prantl	e5162dba49	dsymutil: Follow references to clang modules and recursively clone the debug info. This does not yet resolve external type references. llvm-svn: 248331	2015-09-22 22:20:50 +00:00
Michael Zolotukhin	8bb31dd08a	[Unroll] Follow-up for r247769: fix a bug in UnrolledInstAnalyzer::visitLoad. Apart from checking that GlobalVariable is a constant, we should check that it's not a weak constant, in which case we can't propagate its value. llvm-svn: 248327	2015-09-22 21:41:29 +00:00
Davide Italiano	77011ba16a	Remove macho-dump. Its functionality is now covered by llvm-readobj. Approved by: Rafael Espindola, Eric Christopher, Jim Grosbach, Alex Rosenberg llvm-svn: 248302	2015-09-22 17:46:10 +00:00
Ahmed Bougacha	81616a72ea	[ARM] Emit clrex in the expanded cmpxchg fail block. ARM counterpart to r248291: In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it. Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations". Differential Revision: http://reviews.llvm.org/D13033 llvm-svn: 248294	2015-09-22 17:22:58 +00:00
Ahmed Bougacha	07a844d758	[AArch64] Emit clrex in the expanded cmpxchg fail block. In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it. Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations". Differential Revision: http://reviews.llvm.org/D13033 llvm-svn: 248291	2015-09-22 17:21:44 +00:00
Stephen Canon	8216d88511	Don't raise inexact when lowering ceil, floor, round, trunc. The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions did raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior. This also lets us fold some logic into TD patterns, which is nice. Differential Revision: http://reviews.llvm.org/D12969 llvm-svn: 248266	2015-09-22 11:43:17 +00:00
Daniel Sanders	f173dda0e2	[mips][ias] Implement .cpreturn directive. Summary: Based on a patch by David Chisnall. I've modified the original patch as follows: * Moved the expansion to the TargetStreamers so that the directive isn't expanded when emitting assembly. * Fixed an operand order bug. * Changed the move instructions from DADDu to OR to match recent changes to GAS. Reviewers: vkalintiris Subscribers: llvm-commits, emaste, seanbruno, theraven Differential Revision: http://reviews.llvm.org/D13017 llvm-svn: 248258	2015-09-22 10:50:09 +00:00
Simon Pilgrim	1cad0cd3ce	[X86][SSE] Match zero/any extension shuffles that don't start from the first element This patch generalizes the lowering of shuffles as zero extensions to allow extensions that don't start from the first element. It now recognises extensions starting anywhere in the lower 128-bits or at the start of any higher 128-bit lane. The motivation was to reduce the number of high cost pshufb calls, but it also improves the SSE2 case as well. Differential Revision: http://reviews.llvm.org/D12561 llvm-svn: 248250	2015-09-22 08:16:08 +00:00
Philip Reames	5f99423de9	[LICM] Hoist calls to readonly argmemonly functions even with stores in the loop We know that an argmemonly function can only access memory pointed to by it's pointer arguments. Rather than needing to consider all possible stores as aliasing (as we do for a readonly function), we can only consider the aliasing of the pointer arguments. Note that this change only addresses hoisting. I'm thinking about how to address speculation safety as well, but that will be a different change. FYI, argmemonly disallows accessing memory through non-pointer typed arguments. Differential Revision: http://reviews.llvm.org/D12771 llvm-svn: 248220	2015-09-21 22:27:59 +00:00
Philip Reames	963febd4f8	Fix for pr24866 Turns out that not every basic block is guaranteed to have a node within the DominatorTree. This is really hard to trigger, but the test case from the PR managed to do so. There's active discussion continuing about what documentation and/or invariants needed cleaned up. llvm-svn: 248216	2015-09-21 22:04:10 +00:00
Simon Pilgrim	4003ed2da3	[DAGCombiner] Improve FMA support for interpolation patterns This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents. This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t)))) Differential Revision: http://reviews.llvm.org/D13003 llvm-svn: 248210	2015-09-21 20:32:48 +00:00
Jeroen Ketema	41681a5329	[ARM] Do not scale vext with a factor The vext pseudo-instruction takes the number of elements that need to be extracted, not the number of bytes. Hence, use the number of elements directly instead of scaling them with a factor. Reviewers: Silviu Baranga, James Molloy (not reflected in the differential revision) Differential Revision: http://reviews.llvm.org/D12974 llvm-svn: 248208	2015-09-21 20:28:04 +00:00
James Molloy	50a4c27f97	[LoopUtils,LV] Propagate fast-math flags on generated FCmp instructions We're currently losing any fast-math flags when synthesizing fcmps for min/max reductions. In LV, make sure we copy over the scalar inst's flags. In LoopUtils, we know we only ever match patterns with hasUnsafeAlgebra, so apply that to any synthesized ops. llvm-svn: 248201	2015-09-21 19:41:19 +00:00
Rafael Espindola	8055ed0c12	Avoid SEGFAULT if a requested symbol section is absent. Patch by Igor Kudrin! llvm-svn: 248194	2015-09-21 19:17:18 +00:00
Ulrich Weigand	126caeb043	[SystemZ] Fix expansion of ISD::FPOW and ISD::FSINCOS The ISD::FPOW and ISD::FSINCOS opcodes default to Legal, but there is no legal instruction for those on SystemZ. This could cause LLVM internal errors. Fixed by setting the operation action to Expand for those opcodes. Also added test cases for all other LLVM IR intrinsics that should generate a library call. (Those already work correctly since the default operation action is fine.) llvm-svn: 248180	2015-09-21 17:35:45 +00:00
Matt Arsenault	b774834429	DAGCombiner: Replace store of FP constant after attemping store merges If storing multiple FP constants, some subset of the stores would be replaced with integers due to visit order, so MergeConsecutiveStores would only partially merge these. llvm-svn: 248169	2015-09-21 15:59:46 +00:00
Asaf Badouh	eaf2da14bf	[X86][AVX512] add masked version for RSQRT14 & RCP14 Scalar FP Differential Revision: http://reviews.llvm.org/D12524 llvm-svn: 248147	2015-09-21 10:23:53 +00:00
Daniel Sanders	5d7962880d	[mips] Allow constant expressions in second argument of .cpsetup. Summary: Also tightened up the test and made a trivial fix to prevent double-newline after emitting .cpsetup directives. Reviewers: vkalintiris Subscribers: seanbruno, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D12956 llvm-svn: 248143	2015-09-21 09:26:55 +00:00
Sanjay Patel	bab5d6c636	add test file ahead of any functional changes for PR22428 llvm-svn: 248123	2015-09-20 15:58:00 +00:00
Simon Pilgrim	c6a553241c	[X86][SSE] Intrinsics builtins test refresh. NFCI llvm-svn: 248122	2015-09-20 15:41:35 +00:00
Igor Breger	b7e1f9d680	AVX512: Implemented encoding and intrinsics for vcmpss/sd. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12593 llvm-svn: 248121	2015-09-20 15:15:10 +00:00
Asaf Badouh	2744d21fb8	[X86][AVX512] extend support in Scalar conversion add scalar FP to Int conversion with truncation intrinsics add scalar conversion FP32 from/to FP64 intrinsics add rounding mode and SAE mode encoding for these intrinsics Differential Revision: http://reviews.llvm.org/D12665 llvm-svn: 248117	2015-09-20 14:31:19 +00:00
Igor Breger	4c4cd789c9	AVX512: vsqrtss/sd encoding and intrinsics implementation. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12102 llvm-svn: 248116	2015-09-20 09:13:41 +00:00
Asaf Badouh	572bbceecc	[X86][AVX512DQ] Add fpclass instruction Differential Revision: http://reviews.llvm.org/D12931 llvm-svn: 248115	2015-09-20 08:46:07 +00:00
Michael Kuperstein	58e86bc893	[X86] Fix sitofp and uitofp instruction matching failures with long double and avx512 The operation action for i32 and i64 cannot be set to legal, as long double needs custom lowering. Patch by: mitch.l.bodart@intel.com Differential Revision: http://reviews.llvm.org/D12372 llvm-svn: 248114	2015-09-20 08:12:17 +00:00
Igor Breger	1d55f20bee	AVX512: Implemented intrinsics for vshuff32x4, vshuff64x2, vshufi64x2, vshufi32x4 Added tests for intrinsics. Differential Revision: http://reviews.llvm.org/D12525 llvm-svn: 248113	2015-09-20 07:18:53 +00:00
Igor Breger	0ede3cbb5c	AVX512: Implement instructions encoding, lowering and intrinsics vinserti64x4, vinserti64x2, vinserti32x8, vinserti32x4, vinsertf64x4, vinsertf64x2, vinsertf32x8, vinsertf32x4 Added tests for encoding, lowering and intrinsics. Differential Revision: http://reviews.llvm.org/D11893 llvm-svn: 248111	2015-09-20 06:52:42 +00:00
Sanjoy Das	428db150d1	[IndVars] Fix a bug in r248045. Because -indvars widens induction variables through arithmetic, `NeverNegative` cannot be a property of the `WidenIV` (a `WidenIV` manages information for all transitive uses of an IV being widened, including uses of `-1 * IV`). Instead it must live on `NarrowIVDefUse` which manages information for a specific def-use edge in the transitive use list of an induction variable. This change also adds a test case that demonstrates the problem with r248045. llvm-svn: 248107	2015-09-20 01:52:18 +00:00
Davide Italiano	e210ee56f2	Fixup r248096, commit the correct test. llvm-svn: 248097	2015-09-19 20:52:47 +00:00
Davide Italiano	a539f63ae1	[obj2yaml] Fix "time of check to time of use" bug. Add a test. llvm-svn: 248096	2015-09-19 20:49:34 +00:00
Simon Pilgrim	27f81776ad	[X86][AVX2] Use general sext IR for vpmovsx stack folding tests llvm-svn: 248093	2015-09-19 17:04:18 +00:00
Simon Pilgrim	d0448ee59f	[X86][SSE] Vectorize CTTZ + CTTZ_ZERO_UNDEF Now that we have fast vector CTPOP implementations we can use this to speed up vector CTTZ using the pattern (cttz(x) = ctpop((x & -x) - 1)) Additionally, for AVX512CD that provides lzcnt instructions we can use the pattern (cttz_undef(x) = (width - 1) - ctlz(x & -x)) Differential Revision: http://reviews.llvm.org/D12663 llvm-svn: 248091	2015-09-19 13:22:57 +00:00
NAKAMURA Takumi	5881d349f9	[CMake] Update LLVM_TEST_DEPENDS not to use macho-dump. It has been unused since r247235. llvm-svn: 248088	2015-09-19 07:19:30 +00:00
David Majnemer	47ce0b81b0	[InstCombine] FoldICmpCstShrCst failed for ashr when comparing against -1 (icmp eq (ashr C1, %V) -1) may have multiple answers if C1 is not a power of two and has the sign bit set. This fixes PR24873. llvm-svn: 248074	2015-09-19 00:48:31 +00:00
Matt Arsenault	cc5d106263	AMDGPU: Add failing testcase for live interval construction llvm-svn: 248067	2015-09-19 00:03:56 +00:00
Sanjoy Das	f69d0e3384	[IndVars] Widen more comparisons for non-negative induction vars Summary: If an induction variable is provably non-negative, its sign extension is equal to its zero extension. This means narrow uses like icmp slt iNarrow %indvar, %rhs can be widened into icmp slt iWide zext(%indvar), sext(%rhs) Reviewers: atrick, mcrosier, hfinkel Subscribers: hfinkel, reames, llvm-commits Differential Revision: http://reviews.llvm.org/D12745 llvm-svn: 248045	2015-09-18 21:21:02 +00:00
Cong Hou	d40105d321	Update edge weights properly when merging blocks in if-conversion. In if-conversion, there is a utility function MergeBlocks() that is used to merge blocks. However, when new edges are built in this function the edge weight is either not provided or not updated properly, leading to a modified CFG with incorrect edge weights. This patch corrects this issue. Differential Revision: http://reviews.llvm.org/D12513 llvm-svn: 248030	2015-09-18 20:22:41 +00:00
Eric Christopher	a835956bda	Limit the range of processors supported by ARM fast isel to v6 or later as that's all that is tested right now. Fixes PR24858. llvm-svn: 248027	2015-09-18 20:08:18 +00:00
Cong Hou	f9f9ffb98b	Scaling up values in ARMBaseInstrInfo::isProfitableToIfCvt() before they are scaled by a probability to avoid precision issue. In ARMBaseInstrInfo::isProfitableToIfCvt(), there is a simple cost model in which the number of cycles is scaled by a probability to estimate the cost. However, when the number of cycles is small (which is usually the case), there is a precision issue after the computation. To avoid this issue, this patch scales those cycles by 1024 (chosen to make the multiplication a litter faster) before they are scaled by the probability. Other variables are also scaled up for the final comparison. Differential Revision: http://reviews.llvm.org/D12742 llvm-svn: 248018	2015-09-18 18:19:40 +00:00
Matthias Braun	f89b7c7188	SelectionDAGDumper: Hide [ID=X], [ORD=X] and source locations by default. You can show them with the new -dag-dump-verbose switch. Differential Revision: http://reviews.llvm.org/D12566 llvm-svn: 248011	2015-09-18 17:57:28 +00:00
Matthias Braun	0b7d6c14c9	SelectionDAG: Introduce PersistentID to SDNode for assert builds. This gives us more human readable numbers to identify nodes in debug dumps. Before: 0x7fcbd9700160: ch = EntryToken 0x7fcbd985c7c8: i64 = Register %RAX ... 0x7fcbd9700160: <multiple use> 0x7fcbd985c578: i64,ch = MOV64rm 0x7fcbd985c6a0, 0x7fcbd985cc68, 0x7fcbd985c200, 0x7fcbd985cd90, 0x7fcbd985ceb8, 0x7fcbd9700160<Mem:LD8[@foo]> [ORD=2] 0x7fcbd985c8f0: ch,glue = CopyToReg 0x7fcbd9700160, 0x7fcbd985c7c8, 0x7fcbd985c578 [ORD=3] 0x7fcbd985c7c8: <multiple use> 0x7fcbd985c8f0: <multiple use> 0x7fcbd985c8f0: <multiple use> 0x7fcbd985ca18: ch = RETQ 0x7fcbd985c7c8, 0x7fcbd985c8f0, 0x7fcbd985c8f0:1 [ORD=3] Now: t0: ch = EntryToken t5: i64 = Register %RAX ... t0: <multiple use> t3: i64,ch = MOV64rm t10, t12, t11, t13, t14, t0<Mem:LD8[@foo]> [ORD=2] t6: ch,glue = CopyToReg t0, t5, t3 [ORD=3] t5: <multiple use> t6: <multiple use> t6: <multiple use> t7: ch = RETQ t5, t6, t6:1 [ORD=3] Differential Revision: http://reviews.llvm.org/D12564 llvm-svn: 248010	2015-09-18 17:41:00 +00:00
Geoff Berry	43ec15e57e	[AArch64] Improved bitfield instruction selection. Summary: For bitfield insert OR matching, check both operands for larger pattern first before checking for smaller pattern. Add pattern for unsigned bitfield insert-in-zero done with SHL+AND. Resolves PR21631. Reviewers: jmolloy, t.p.northover Subscribers: aemerson, rengolin, llvm-commits, mcrosier Differential Revision: http://reviews.llvm.org/D12908 llvm-svn: 248006	2015-09-18 17:11:53 +00:00
Daniel Sanders	df19a5e605	[mips][microMIPS] Fix an invalid read for lwm32 and reserved reglist values. Summary: Some values of 'reglist' are reserved and cause the disassembler to read past the end of the Regs array. Treat lwm32's containing reserved values as invalid instructions. Reviewers: zoran.jovanovic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12959 llvm-svn: 247990	2015-09-18 14:20:54 +00:00
Igor Laevsky	0fa4819dd8	[LazyValueInfo] Report nonnull range for nonnull pointers Currently LazyValueInfo will report only alloca's as having nonnull range. For loads with !nonnull metadata it will bailout with no additional information. Same is true for calls returning nonnull pointers. This change extends LazyValueInfo to handle additional nonnull instructions. Differential Revision: http://reviews.llvm.org/D12932 llvm-svn: 247985	2015-09-18 13:01:48 +00:00
Artur Pilipenko	84bc62f7a3	Support align attribute for return values Reviewed By: reames Differential Revision: http://reviews.llvm.org/D12844 llvm-svn: 247984	2015-09-18 12:33:31 +00:00
Quentin Colombet	b4c6886215	[ShrinkWrap] Refactor the handling of infinite loop in the analysis. - Strenghten the logic to be sure we hoist the restore point out of the current loop. (The fixes a bug with infinite loop, added as part of the patch.) - Walk over the exit blocks of the current loop to conver to the desired restore point in one iteration of the update loop. llvm-svn: 247958	2015-09-17 23:21:34 +00:00
Davide Italiano	096cda11fc	[llvm-readobj] Fix another "time of check to time of use bug". It seems there's more copy-paste between tools than needed. llvm-svn: 247954	2015-09-17 22:29:58 +00:00
David Majnemer	163b7f121c	[WinEH] Fix tests broken by funclet-layout llvm-svn: 247944	2015-09-17 21:11:12 +00:00
Joerg Sonnenberger	1bbfa7f9d7	[SPARC] Add mulscc. llvm-svn: 247940	2015-09-17 20:54:26 +00:00
David Majnemer	978902309a	[WinEH] Add a funclet layout pass Windows EH funclets need to be contiguous. The FuncletLayout pass will ensure that the funclets are together and begin with a funclet entry MBB. Differential Revision: http://reviews.llvm.org/D12943 llvm-svn: 247937	2015-09-17 20:45:18 +00:00
Reid Kleckner	5b8a46e771	[WinEH] Make funclet return instrs pseudo instrs This makes catchret look more like a branch, and less like a weird use of BlockAddress. It also lets us get away from llvm.x86.seh.restoreframe, which relies on the old parentfpoffset label arithmetic. llvm-svn: 247936	2015-09-17 20:43:47 +00:00
Simon Pilgrim	61116ddc7b	[InstCombine] Added vector demanded bits support for SSE4A EXTRQ/INSERTQ instructions The SSE4A instructions EXTRQ/INSERTQ only use the lower 64-bits (or less) for many of their input vector operands and all of them have undefined upper 64-bits results. Differential Revision: http://reviews.llvm.org/D12680 llvm-svn: 247934	2015-09-17 20:32:45 +00:00
Teresa Johnson	ff642b9b84	Restore "Function bitcode index in Value Symbol Table and lazy reading support" This reverts commit r247898 (which reverted r247894). Patch fixed to address two issues exposed by buildbots: - unused variable warning in NDEBUG mode - std::initializer_list lifetime issue causing test failures Original Summary: Support for including the function bitcode indices in the Value Symbol Table. This requires writing the VST after the function blocks, which in turn requires a new VST forward declaration record encoding the offset of the full VST (which is backpatched to contain the offset after the VST is written). This patch also enables the lazy function reader to use the new function indices out of the VST. This support will be used by ThinLTO as well, which will be in a follow on patch. Backwards compatibility with older bitcode files is maintained. A new test is also included. The bitcode format (used for the lazy reader as well as the upcoming ThinLTO patches) came out of discussions with Duncan and others and is described here: https://drive.google.com/file/d/0B036uwnWM6RWdnBLakxmeDdOeXc/view Reviewers: dexonsmith, davidxl, joker.eph Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12536 llvm-svn: 247927	2015-09-17 20:12:00 +00:00
Diego Novillo	d2e2137b4c	Temporarily fix gcov failures in big-endian hosts. This test uses a gcov file generated in a little-endian host. The gcov reader does not allow different endianness, so the test fails on big endian hosts. XFAILing for now. llvm-svn: 247920	2015-09-17 19:05:48 +00:00
Reid Kleckner	b78585b3c2	Fix the test case I just committed llvm-svn: 247905	2015-09-17 17:21:45 +00:00
Reid Kleckner	ed17079b52	[WinEH] Add and use hasEHPadSuccessor instead of getLandingPadSuccessor getLandingPadSuccessor assumes that each invoke can have at most one EH pad successor, but WinEH invokes can have more than one. Two out of three callers of getLandingPadSuccessor don't use the returned landingpad, so we can make them use this simple predicate instead. Eventually we'll have to circle back and fix SplitKit.cpp so that register allocation works. Baby steps. llvm-svn: 247904	2015-09-17 17:19:40 +00:00
Teresa Johnson	2e98d57ad4	Revert "Function bitcode index in Value Symbol Table and lazy reading support" Temporarily revert to fix some buildbot issues. One is a minor issue with a variable unused in NDEBUG mode. More concerning are some test failures on win7 that I need to dig into. This reverts commit 4e66a74543459832cfd571db42b4543580ae1d1d. llvm-svn: 247898	2015-09-17 16:19:10 +00:00
Daniel Sanders	e2982adc0b	[mips] Add assembler support for the .cprestore directive. Summary: This assembler directive is used in O32 PIC to restore the current function's $gp after executing JAL's. The $gp is first stored on the stack at a user-specified offset. It has the following format: ".cprestore 8" (where 8 is the offset). This fixes llvm.org/PR20967. Patch by Toma Tabacu. Reviewers: seanbruno, tomatabacu Subscribers: brooks, seanbruno, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D6267 llvm-svn: 247897	2015-09-17 16:08:39 +00:00
Teresa Johnson	b77b1f8a0c	Function bitcode index in Value Symbol Table and lazy reading support Summary: Support for including the function bitcode indices in the Value Symbol Table. This requires writing the VST after the function blocks, which in turn requires a new VST forward declaration record encoding the offset of the full VST (which is backpatched to contain the offset after the VST is written). This patch also enables the lazy function reader to use the new function indices out of the VST. This support will be used by ThinLTO as well, which will be in a follow on patch. Backwards compatibility with older bitcode files is maintained. A new test is also included. The bitcode format (used for the lazy reader as well as the upcoming ThinLTO patches) came out of discussions with Duncan and others and is described here: https://drive.google.com/file/d/0B036uwnWM6RWdnBLakxmeDdOeXc/view Reviewers: dexonsmith, davidxl, joker.eph Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12536 llvm-svn: 247894	2015-09-17 15:52:30 +00:00
Zoran Jovanovic	7ba636cb4c	[mips][microMIPS] Implement TEQ, TGE, TGEU, TLT, TLTU and TNE instructions Differential Revision: http://reviews.llvm.org/D9658 llvm-svn: 247880	2015-09-17 10:14:09 +00:00
Elena Demikhovsky	702a6adfaa	AVX-512: shufflevector for i1 vectors <2 x i1> .. <64 x i1> AVX-512 does not provide an instruction that shuffles mask register. So I do the following way: mask-2-simd , shuffle simd , simd-2-mask Differential Revision: http://reviews.llvm.org/D12727 llvm-svn: 247876	2015-09-17 06:53:12 +00:00
Diego Novillo	3376a78781	GCC AutoFDO profile reader - Initial support. This adds enough machinery to support reading simple GCC AutoFDO profiles. It now supports reading flat profiles (no function calls). Subsequent patches will add support for: - Inlined calls (in particular, the inline call stack is not traversed to accumulate samples). - Working sets and modules. These are used mostly for GCC's LIPO optimizations, so they're not needed in LLVM atm. I'm not sure that we will ever need them. For now, I've if0'd around the calls. The patch also adds support in GCOV.h for gcov version V704 (generated by GCC's profile conversion tool). llvm-svn: 247874	2015-09-17 00:17:24 +00:00
Reid Kleckner	813f1b65bc	[WinEH] Rip out the landingpad-based C++ EH state numbering code It never really worked, and the new code is working better every day. llvm-svn: 247860	2015-09-16 22:14:46 +00:00
David Majnemer	67bff0d88b	[WinEHPrepare] Turn terminatepad into a cleanuppad + call + cleanupret The MSVC doesn't really support exception specifications so let's just turn these into cleanuppads. Later, we might use terminatepad to more efficiently encode the "noexcept"-ness of a function body. llvm-svn: 247848	2015-09-16 20:42:16 +00:00
Sanjoy Das	e5f4889ba9	[InstCombine] Optimize icmp slt signum(x), 1 --> icmp slt x, 1 Summary: `signum(x)` is sometimes implemented as `(x >> 63) \| (-x >>> 63)` (for an `i64` `x`). This change adds a matcher for that pattern, and an instcombine rule to optimize `signum(x) s< 1`. Later, we can also consider optimizing: icmp slt signum(x), 0 --> icmp slt x, 0 icmp sle signum(x), 1 --> true etc. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12703 llvm-svn: 247846	2015-09-16 20:41:29 +00:00
Reid Kleckner	b005d281c3	[WinEH] Pull Adjectives and CatchObj out of the catchpad arg list Clang now passes the adjectives as an argument to catchpad. Getting the CatchObj working is simply a matter of threading another static alloca through codegen, first as an alloca, then as a frame index, and finally as a frame offset. llvm-svn: 247844	2015-09-16 20:16:27 +00:00
David Majnemer	459a64aed7	[WinEHPrepare] Provide a cloning mode which doesn't demote We are experimenting with a new approach to saving and restoring SSA values used across funclets: let the register allocator do the dirty work for us. However, this means that we need to be able to clone commoned blocks without relying on demotion. llvm-svn: 247835	2015-09-16 18:40:37 +00:00
Teresa Johnson	8c8fe5a015	Disable the second verification run when performing LTO through gold in NDEBUG mode. Follow on patch for r247729 - LTO: Disable extra verify runs in release builds. llvm-svn: 247824	2015-09-16 18:06:45 +00:00
Reid Kleckner	84ebff4a5e	[WinEH] Skip state numbering when no EH pads are present Otherwise we'd try to emit the thunk that passes the LSDA to __CxxFrameHandler3. We don't emit the LSDA if there were no landingpads, so we'd end up with an assembler error when trying to write the COFF object. llvm-svn: 247820	2015-09-16 17:19:44 +00:00
Mehdi Amini	6f6f137e49	Improve "default_triple" specification: make it at the directory level for test/tools/llvm-mc From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 247819	2015-09-16 17:03:12 +00:00
Dan Gohman	950a13cfa3	[WebAssembly] Check in an initial CFG Stackifier pass This pass implements a simple algorithm for conversion from CFG to wasm's structured control flow. It doesn't yet handle multiple-entry loops; that will be added in a future patch. It also adds initial support for switch statements. Differential Revision: http://reviews.llvm.org/D12735 llvm-svn: 247818	2015-09-16 16:51:30 +00:00
Sanjay Patel	a260701bbb	propagate fast-math-flags on DAG nodes After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is one test case in this patch to prove that point. This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF ( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes. This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the current global settings. Differential Revision: http://reviews.llvm.org/D12095 llvm-svn: 247815	2015-09-16 16:31:21 +00:00
Reid Kleckner	85dfb68e50	Add assembler fatal error for undefined assembler labels in COFF writer llvm-svn: 247814	2015-09-16 16:26:29 +00:00
Joerg Sonnenberger	22cd644e1b	[SPARC] Both GNU and Solaris as support eq as condition code for integer ops. llvm-svn: 247804	2015-09-16 14:41:36 +00:00
Joerg Sonnenberger	9763490e4d	[SPARC] Recognize st/stx operations with %fsr argument too. llvm-svn: 247794	2015-09-16 13:30:54 +00:00
Michael Kuperstein	d926465342	[X86] Do not generate 64-bit pops of 32-bit GPRs. When trying emit a stack adjustments using pops, frame lowering selects an arbitrary free GPR. It should always select one from an appropriate class... This fixes PR24649. Patch by: amjad.aboud@intel.com Differential Revision: http://reviews.llvm.org/D12609 llvm-svn: 247785	2015-09-16 11:27:20 +00:00
Zoran Jovanovic	6e6a2c9cd7	[mips][microMIPS] Implement PREFX, LHUE, LBE, LBUE, LHE, LWE, SBE, SHE and SWE instructions Differential Revision: http://reviews.llvm.org/D9189 llvm-svn: 247780	2015-09-16 09:14:35 +00:00
NAKAMURA Takumi	d42d3df56f	Copy back Inputs/gmlt.ll. Also DebugInfo/X86/gmlt.test uses it. llvm-svn: 247777	2015-09-16 06:22:55 +00:00
Mehdi Amini	01fee92d05	Fix test gmlt.test by moving its Inputs where expected. I couldn't see the failure as the test is XFAIL'ed on Darwin. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 247776	2015-09-16 06:04:31 +00:00
Mehdi Amini	d178f4fc89	Make the default triple optional by allowing an empty string When building LLVM as a (potentially dynamic) library that can be linked against by multiple compilers, the default triple is not really meaningful. We allow to explicitely set it to an empty string when configuring LLVM. In this case, said "target independent" tests in the test suite that are using the default triple are disabled by matching the newly available feature "default_triple". Reviewers: probinson, echristo Differential Revision: http://reviews.llvm.org/D12660 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 247775	2015-09-16 05:34:32 +00:00
Mehdi Amini	8e468d6388	[NaryReassociate] Improve test CHECK Add `CHECK` directives for the function calls. Differential Revision: http://reviews.llvm.org/D12885 Patch by: Volkan Keles <vkeles@apple.com> From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 247774	2015-09-16 05:27:46 +00:00
Michael Zolotukhin	fc314be0ec	[Unroll] Fix a bug in UnrolledInstAnalyzer::visitLoad. We only checked that a global is initialized with constants, which is incorrect. We should be checking that GlobalVariable is a constant, not just initialized with it. llvm-svn: 247769	2015-09-16 03:25:09 +00:00
Sanjoy Das	8a5526e8be	[IndVars] Fix PR24783. In `IndVarSimplify::ExpandSCEVIfNeeded`, `SCEVExpander::findExistingExpansion` may return an `llvm::Value` that differs in type from the SCEV it was asked to find an expansion for (but computes the same value). In such cases, we fall back on `expandCodeFor`; and rely on LLVM to CSE the two equivalent expressions (different only by a no-op cast) into a single computation. I tried a few other approaches to fixing PR24783, all of which turned out to be more complex than this current version: 1. Move the `ExpandSCEVIfNeeded` logic into `expandCodeFor`. This got problematic because currently we do not pass in the `Loop *` into `expandCodeFor`. Changing the interface to do this is a more invasive change, and really does not make much semantic sense unless the SCEV being passed in is an add recurrence. There is also the problem of `expandCodeFor` being used in places other than `indvars` -- there may be performance / correctness issues elsewhere if `expandCodeFor` is moved from always generating IR from scratch to cache-like model. 2. Have `findExistingExpansion` only return expression with the correct type. This would make `isHighCostExpansionHelper` and thus `isHighCostExpansion` more conservative than necessary. 3. Insert casts on the value returned by `findExistingExpansion` if needed using `InsertNoopCastOfTo`. This is complicated because `InsertNoopCastOfTo` depends on internal state of its `SCEVExpander` (specifically `Builder.GetInserPoint()`), and this may not be set up when `ExpandSCEVIfNeeded` is called. 4. Manually insert casts on the value returned by `findExistingExpansion` if needed using `InsertNoopCastOfTo` via `CastInst::Create`. This is probably workable, but figuring out the location where the cast instruction needs to be inserted has enough edge cases (arguments, constants, invokes, LCSSA must be preserved) makes me feel what I have right now is simplest solution. llvm-svn: 247749	2015-09-15 23:45:39 +00:00
Davide Italiano	386e2ab158	[llvm-cxxdump] Remove duplicate code check. We already fail with 'No such file or directory' when we try to open the file -- if that doesn't exist. Also add a test to verify this behavior. llvm-svn: 247744	2015-09-15 23:35:32 +00:00
Duncan P. N. Exon Smith	cff5feff6f	Reapply "LTO: Disable extra verify runs in release builds" This reverts commit r247730, effectively reapplying r247729. This time I have an lld commit ready to follow. llvm-svn: 247735	2015-09-15 23:05:59 +00:00
Alexey Samsonov	c1603b6493	[ASan] Don't instrument globals in .preinit_array/.init_array/.fini_array These sections contain pointers to function that should be invoked during startup/shutdown by __libc_csu_init and __libc_csu_fini. Instrumenting these globals will append redzone to them, which will be filled with zeroes. This will cause null pointer dereference at runtime. Merge ASan regression tests for globals that should be ignored by instrumentation pass. llvm-svn: 247734	2015-09-15 23:05:48 +00:00
Duncan P. N. Exon Smith	7de73e56a4	Revert "LTO: Disable extra verify runs in release builds" This temporarily reverts commit r247729, as it caused lld build failures. I'll recommit once I have an lld patch ready-to-go. llvm-svn: 247730	2015-09-15 22:47:38 +00:00
Duncan P. N. Exon Smith	236787838c	LTO: Disable extra verify runs in release builds The verifier currently runs three times in LTO: (1) after parsing, (2) at the beginning of the optimization pipeline, and (3) at the end of it. The first run is important, since we're not sure where the bitcode comes from and it's nice to validate it, but in release builds the extra runs aren't appropriate. This commit: - Allows these runs to be disabled in LTOCodeGenerator. - Adds command-line options to llvm-lto. - Adds command-line options to libLTO.dylib, and disables the verifier by default in release builds (based on NDEBUG). This shaves about 3.5% off the runtime of ld64 when linking verify-uselistorder with -flto -g. rdar://22509081 llvm-svn: 247729	2015-09-15 22:26:11 +00:00
Justin Bogner	01fa3f96b3	test: Add "REQUIRES: native" so this test passes with no default triple configured llvm-svn: 247719	2015-09-15 21:13:33 +00:00
Quentin Colombet	7b13976254	[ShrinkWrapping] Add a test case for r247710. llvm-svn: 247713	2015-09-15 18:51:43 +00:00
Piotr Padlewski	6c15ec49ed	Introducing llvm.invariant.group.barrier intrinsic For more info for what reason it was invented, goto: http://lists.llvm.org/pipermail/cfe-dev/2015-July/044227.html invariant.group.barrier: http://reviews.llvm.org/D12310 docs: http://reviews.llvm.org/D11399 CodeGenPrepare: http://reviews.llvm.org/D12875 llvm-svn: 247711	2015-09-15 18:32:14 +00:00
Arch D. Robison	8ed0854f55	Broaden optimization of fcmp ([us]itofp x, constant) by instcombine. The patch extends the optimization to cases where the constant's magnitude is so small or large that the rounding of the conversion is irrelevant. The "so small" case includes negative zero. Differential review: http://reviews.llvm.org/D11210 llvm-svn: 247708	2015-09-15 17:51:59 +00:00
Igor Laevsky	bdc1eafe20	[CorrelatedValuePropagation] Infer nonnull attributes LazuValueInfo can prove that value is nonnull based on the context information. Make use of this ability to infer nonnull attributes for the call arguments. Differential Revision: http://reviews.llvm.org/D12836 llvm-svn: 247707	2015-09-15 17:51:50 +00:00
Marcello Maggioni	454faa84e2	[NaryReassociate] Add support for Mul instructions This patch extends the current pass by handling Mul instructions as well. Patch by: Volkan Keles (vkeles@apple.com) llvm-svn: 247705	2015-09-15 17:22:52 +00:00
Zoran Jovanovic	dc4b8c2761	[mips][microMIPS] Fix an issue with disassembling lwm32 instruction Fixed microMIPS disassembler crash on test case generated by llvm-mc-fuzzer. Differential Revision: http://reviews.llvm.org/D12881 llvm-svn: 247698	2015-09-15 15:21:27 +00:00
Zoran Jovanovic	8eb8c9861d	[mips] Add support for branch-likely pseudo-instructions Differential Revision: http://reviews.llvm.org/D10537 llvm-svn: 247697	2015-09-15 15:06:26 +00:00
Ulrich Weigand	e861e6442c	[SystemZ] Fix assertion failure in tryBuildVectorShuffle Under certain circumstances, tryBuildVectorShuffle would attempt to create a BUILD_VECTOR node with an invalid combination of types. This happened when one of the components of the original BUILD_VECTOR was itself a TRUNCATE node. That TRUNCATE was stripped off during intermediate processing to simplify code, but when adding the node back to the result vector, we still need it to get the type right. llvm-svn: 247694	2015-09-15 14:27:46 +00:00
Zoran Jovanovic	7beb737b46	[mips][microMIPS] Implement CACHEE and PREFE instructions for microMIPS32r6 Differential Revision: http://reviews.llvm.org/D11632 llvm-svn: 247670	2015-09-15 10:05:10 +00:00
Daniel Sanders	e4e83a7bc1	[mips] Added support for various EVA ASE instructions. Summary: Added support for the following instructions: CACHEE, LBE, LBUE, LHE, LHUE, LWE, LLE, LWLE, LWRE, PREFE, SBE, SHE, SWE, SCE, SWLE, SWRE, TLBINV, TLBINVF This required adding some infrastructure for the EVA ASE. Patch by Scott Egerton. Reviewers: vkalintiris, dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11139 llvm-svn: 247669	2015-09-15 10:02:16 +00:00
Sanjoy Das	f75e15e5ac	[PlaceSafepoints] Make the width of a counted loop settable. Summary: This change lets a `PlaceSafepoints` client change how wide the trip count of a loop has to be for the loop to be considerd "counted", via `CountedLoopTripWidth`. It also removes the boolean `SkipCounted` flag and the `upperTripBound` constant -- we can get the old behavior of `SkipCounted` == `false` by setting `CountedLoopTripWidth` to `13` (2 ^ 13 == 8192). Reviewers: reames Subscribers: llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D12789 llvm-svn: 247656	2015-09-15 01:42:48 +00:00
Dan Gohman	311b488d76	[WebAssembly] Implement int64-to-int32 conversion. llvm-svn: 247649	2015-09-15 00:55:19 +00:00
Adrian Prantl	deef90d7f5	DwarfDebug: Emit dwo_id+dwo_name for DICompileUnits that provide a dwoId. For module debugging clang emits prefabricated skeleton compile units that can be recognized by a nonzero dwoId. llvm-svn: 247626	2015-09-14 22:10:22 +00:00
Chen Li	0d043b52eb	[InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG. It also adds assertions in isKnownNonNull() and isKnownNonNullFromDominatingCondition() to make sure the value checked is pointer type (as defined in LLVM document). These assertions might trip failures in things which are not covered under llvm/test, but fixes should be pretty obvious. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12779 llvm-svn: 247587	2015-09-14 18:10:43 +00:00
Davide Italiano	2c007d050a	[llvm-mc] Better error handling in ENOENT case + test. This is a follow up to r247518. As a general note, I think we could do a much better job testing for error conditions in tools. I already anticipated in a previous mail, but while implementing this I noticed that the code coverage we have for error checking is pretty low. I can arbitrarily remove checks from several tools and the suite still passes. Differential Revision: http://reviews.llvm.org/D12846 llvm-svn: 247582	2015-09-14 17:10:01 +00:00
Jun Bum Lim	34b9bd0435	Improve ISel using across lane min/max reduction In vectorized integer min/max reduction code, the final "reduce" step is sub-optimal. In AArch64, this change wll combine : %svn0 = vector_shuffle %0, undef<2,3,u,u> %smax0 = smax %0, svn0 %svn3 = vector_shuffle %smax0, undef<1,u,u,u> %sc = setcc %smax0, %svn3, gt %n0 = extract_vector_elt %sc, #0 %n1 = extract_vector_elt %smax0, #0 %n2 = extract_vector_elt $smax0, #1 %result = select %n0, %n1, n2 becomes : %1 = smaxv %0 %result = extract_vector_elt %1, 0 This change extends r246790. llvm-svn: 247575	2015-09-14 16:19:52 +00:00
JF Bastien	26aca14b15	[MergeFuncs] Fix bug in merging GetElementPointers GetElementPointers must have the first argument's type compared for structural equivalence. Previously the code erroneously compared the pointer's type, but this code was dead because all pointer types (of the same address space) are the same. The pointee must be compared instead (using the type stored in the GEP, not from the pointer type which will be erased anyway). Author: jrkoenig Reviewers: dschuff, nlewycky, jfb Subscribers: nlewycky, llvm-commits Differential revision: http://reviews.llvm.org/D12820 llvm-svn: 247570	2015-09-14 15:37:48 +00:00
John Brawn	056e67865a	[ARM] Extract shifts out of multiply-by-constant Turning (op x (mul y k)) into (op x (lsl (mul y k>>n) n)) is beneficial when we can do the lsl as a shifted operand and the resulting multiply constant is simpler to generate. Do this by doing the transformation when trying to select a shifted operand, as that ensures that it actually turns out better (the alternative would be to do it in PreprocessISelDAG, but we don't know for sure there if extracting the shift would allow a shifted operand to be used). Differential Revision: http://reviews.llvm.org/D12196 llvm-svn: 247569	2015-09-14 15:19:41 +00:00
NAKAMURA Takumi	c397e7881a	Revert part of r247553, "[CMake] Reformat CLANG_TEST_DEPS." It was accidental commit. llvm-svn: 247555	2015-09-14 12:51:01 +00:00
NAKAMURA Takumi	0f1cbee00e	[CMake] Reformat CLANG_TEST_DEPS. llvm-svn: 247553	2015-09-14 12:41:53 +00:00
Simon Pilgrim	f8f86ab176	[X86][MMX] Added shuffle decodes for MMX/3DNow! shuffles. Added shuffle decodes for MMX PUNPCK + PSHUFW shuffles. Added shuffle decodes for 3DNow! PSWAPD shuffles. llvm-svn: 247526	2015-09-13 11:28:45 +00:00
Elena Demikhovsky	8671fcbbd6	AVX-512: Fixed a bug in OR/XOR operations for 512-bit FP values on KNL. KNL does not have VXORPS, VORPS for 512-bit values. I use integer VPXOR, VPOR that actually do the same. X86ISD::FXOR/FOR are generated as a result of FSUB combining. Differential Revision: http://reviews.llvm.org/D12753 llvm-svn: 247523	2015-09-13 08:15:15 +00:00
Sanjay Patel	8b960d22ad	[x86] enable machine combiner reassociations for 128-bit vector logical integer insts (2nd try) The changes in: test/CodeGen/X86/machine-cp.ll are just due to scheduling differences after some logic instructions were reassociated. llvm-svn: 247516	2015-09-12 19:47:50 +00:00
Simon Pilgrim	42c834bad5	[X86] Added i1 vector sextload tests llvm-svn: 247509	2015-09-12 15:36:41 +00:00
Simon Pilgrim	779bcf3e3d	[X86][FMA] Refreshed fma tests llvm-svn: 247508	2015-09-12 15:33:05 +00:00
Sanjay Patel	99f7370a79	revert r247506; need to verify changes in existing tests llvm-svn: 247507	2015-09-12 15:27:31 +00:00
Sanjay Patel	08755c7dbc	[x86] enable machine combiner reassociations for 128-bit vector logical integer insts llvm-svn: 247506	2015-09-12 14:58:04 +00:00
Simon Pilgrim	20c607b110	[InstCombine] CVTPH2PS Vector Demanded Elements + Constant Folding Improved InstCombine support for CVTPH2PS (F16C half 2 float conversion): <4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16>) - only uses the bottom 4 i16 elements for the conversion. Added constant folding support. Differential Revision: http://reviews.llvm.org/D12731 llvm-svn: 247504	2015-09-12 13:39:53 +00:00
Simon Pilgrim	8ee50e9cff	[X86][SSE] Use general sext IR for (v)pmovsx stack folding tests llvm-svn: 247502	2015-09-12 11:45:24 +00:00
Chandler Carruth	29a18a4663	[PM] Port SROA to the new pass manager. In some ways this is a very boring port to the new pass manager as there are no interesting analyses or dependencies or other oddities. However, this does introduce the first good example of a transformation pass with non-trivial state porting to the new pass manager. I've tried to carve out patterns here to replicate elsewhere, and would appreciate comments on whether folks like these patterns: - A common need in the new pass manager is to effectively lift the pass class and some of its state into a public header file. Prior to this, LLVM used anonymous namespaces to provide "module private" types and utilities, but that doesn't scale to cases where a public header file is needed and the new pass manager will exacerbate that. The pattern I've adopted here is to use the namespace-cased-name of the core pass (what would be a module if we had them) as a module-private namespace. Then utility and other code can be declared and defined in this namespace. At some point in the future, we could even have (conditionally compiled) code that used modules features when available to do the same basic thing. - I've split the actual pass run method in two in order to expose a private method usable by the old pass manager to wrap the new class with a minimum of duplicated code. I actually looked at a bunch of ways to automate or generate these, but they are all quite terrible IMO. The fundamental need is to extract the set of analyses which need to cross this interface boundary, and that will end up being too unpredictable to effectively encapsulate IMO. This is also a relatively small amount of boiler plate that will live a relatively short time, so I'm not too worried about the fact that it is boiler plate. The rest of the patch is totally boring but results in a massive diff (sorry). It just moves code around and removes or adds qualifiers to reflect the new name and nesting structure. Differential Revision: http://reviews.llvm.org/D12773 llvm-svn: 247501	2015-09-12 09:09:14 +00:00
Davide Italiano	63cee81c3c	[MC] Don't crash on division by zero. Differential Revision: http://reviews.llvm.org/D12776 llvm-svn: 247471	2015-09-11 20:47:35 +00:00
Yunzhong Gao	46261a74db	Add a non-exiting diagnostic handler for LTO. This is in order to give LTO clients a chance to do some clean-up before terminating the process. llvm-svn: 247461	2015-09-11 20:01:53 +00:00
Akira Hatanaka	bc497c93f5	Use function attribute "stackrealign" to decide whether stack realignment should be forced. With this commit, we can now force stack realignment when doing LTO and do so on a per-function basis. Also, add a new cl::opt option "stackrealign" to CommandFlags.h which is used to force stack realignment via llc's command line. Out-of-tree projects currently using -force-align-stack to force stack realignment should make changes to attach the attribute to the functions in the IR. Differential Revision: http://reviews.llvm.org/D11814 llvm-svn: 247450	2015-09-11 18:54:38 +00:00
David Majnemer	0e70598a5b	[X86] Make sure startproc/endproc are paired We used different conditions to determine if we should emit startproc vs endproc. Use the same condition to ensure that they will always be paired. This fixes PR24374. llvm-svn: 247435	2015-09-11 17:34:34 +00:00
Reid Kleckner	5dbee7baef	[IR] Print the label operands of a catchpad like an invoke The rest of the EH pads are fine, since they have at most one label and take fewer operands for the personality. Old catchpad vs. new: %5 = catchpad [i8* bitcast (i32 ()* @"\01?filt$0@0@main@@" to i8)] to label %__except.ret.10 unwind label %catchendblock.9 ----- %5 = catchpad [i8 bitcast (i32 ()* @"\01?filt$0@0@main@@" to i8*)] to label %__except.ret.10 unwind label %catchendblock.9 llvm-svn: 247433	2015-09-11 17:27:52 +00:00
Daniel Sanders	715f8f1332	[mips] Add missing disassembler tests for MIPS64-MIPS64R5. llvm-svn: 247422	2015-09-11 16:24:11 +00:00
Daniel Sanders	9676db005e	[mips] Add missing MIPS32 - MIPS32R5 disassembler tests. llvm-svn: 247420	2015-09-11 15:28:19 +00:00
Daniel Sanders	aba4daab10	[mips] Attempt to fix llvm-s390x-linux1 It doesn't seem to like the '\|&' in the test command. llvm-svn: 247418	2015-09-11 14:57:54 +00:00

... 3 4 5 6 7 ...

32371 Commits