llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	284192730a	AMDGPU: Use explicit register size indirect pseudos This stops using an unknown reg class operand. Currently build_vector selection has a broken looking check where it tries to use a VGPR reg class and an SGPR one if it sees an SGPR use. With the source operand has an explicit VGPR class, illegal copies will be inserted that SIFixSGPRCopies will take care of normally later, which will allow removing the weird check of build_vector users. Without this, when removed v_movrels_b32 would still be emitted even though all of the values were only stored in SGPRs. llvm-svn: 249494	2015-10-07 00:42:51 +00:00
Reid Kleckner	72ba70418f	[SEH] Add llvm.eh.exceptioncode intrinsic This will support the Clang __exception_code intrinsic. llvm-svn: 249492	2015-10-07 00:27:33 +00:00
David Majnemer	7735a6d07a	[WinEH] Create a separate MBB for funclet prologues Our current emission strategy is to emit the funclet prologue in the CatchPad's normal destination. This is problematic because intra-funclet control flow to the normal destination is not erroneous and results in us reevaluating the prologue if said control flow is taken. Instead, use the CatchPad's location for the funclet prologue. This correctly models our desire to have unwind edges evaluate the prologue but edges to the normal destination result in typical control flow. Differential Revision: http://reviews.llvm.org/D13424 llvm-svn: 249483	2015-10-06 23:31:59 +00:00
Tom Stellard	0fbf899c0f	AMDGPU/SI: Remove calling convention assertion from LowerFormalArguments() Summary: We currently ignore the calling convention, so there is no real reason to assert on the calling convention of functions. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13367 llvm-svn: 249468	2015-10-06 21:16:34 +00:00
Chad Rosier	cb14dd0265	[ARM] Simplify tests and make checks more rigid. NFC. llvm-svn: 249432	2015-10-06 17:54:12 +00:00
Krzysztof Parzyszek	fb33824efd	[Hexagon] Add an early if-conversion pass llvm-svn: 249423	2015-10-06 15:49:14 +00:00
Daniel Sanders	1b3341724c	[mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend Summary: This fixes 7 tests during fast LLVM test-suite run: * MultiSource/Benchmarks/McCat/18-imp/imp * MultiSource/Applications/oggenc/oggenc * MultiSource/Benchmarks/MallocBench/gs/gs * MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan * MultiSource/Benchmarks/VersaBench/beamformer/beamformer * MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame * MultiSource/Benchmarks/Bullet/bullet Error message was in the form of: fatal error: error in backend: Cannot select: 0x95c3288: f32 = fsqrt 0x95c0190 [ORD=9] [ID=18] 0x95c0190: f32 = fadd 0x95bef30, 0x95c4d00 [ORD=8] [ID=17] 0x95bef30: f32 = fmul 0x95c4988, 0x95c4988 [ORD=5] [ID=16] ... There was problem with selecting sqrt instruction in LLVM backend. To fix the issue changes are made in TableGen definition for sqrt instruction in MipsInstrFPU.td and new test file sqrt.ll is added to LLVM regression tests. Patch by Zlatko Buljan Reviewers: zoran.jovanovic, hvarga, dsanders Subscribers: llvm-commits, petarj Differential Revision: http://reviews.llvm.org/D13235 llvm-svn: 249416	2015-10-06 15:17:25 +00:00
Daniel Sanders	add9057fa7	Revert r249123 - [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend The author was not credited and most of the commit message is missing. Will re-commit with this fixed. llvm-svn: 249415	2015-10-06 15:13:16 +00:00
Craig Topper	2c4068f409	[TwoAddressInstructionPass] When looking for a 3 addr conversion after commuting, make sure regB has been updated to take into account the commute. llvm-svn: 249378	2015-10-06 05:39:59 +00:00
Alexei Starovoitov	4e01a38da0	[bpf] Avoid extra pointer arithmetic for stack access For the program like below struct key_t { int pid; char name[16]; }; extern void test1(char *); int test() { struct key_t key = {}; test1(key.name); return 0; } For key.name, the llc/bpf may generate the below code: R1 = R10 // R10 is the frame pointer R1 += -24 // framepointer adjustment R1 \|= 4 // R1 is then used as the first parameter of test1 OR operation is not recognized by in-kernel verifier. This patch introduces an intermediate FI_ri instruction and generates the following code that can be properly verified: R1 = R10 R1 += -20 Patch by Yonghong Song <yhs@plumgrid.com> llvm-svn: 249371	2015-10-06 04:00:53 +00:00
Craig Topper	79dd1bf094	[X86] Teach constant hoisting that ANDs with 64-bit immediates in the range 0x80000000-0xffffffff can be handled cheaply and don't need to be hoisted. Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister. llvm-svn: 249370	2015-10-06 02:50:24 +00:00
Dan Gohman	e51c058ecc	[WebAssembly] Switch to a more traditional assembly syntax This new syntax is built around putting each instruction on its own line in a "mnemonic op, op, op" like syntax. It also uses conventional data section directives like ".byte" and so on rather than requiring everything to be in hierarchical S-expression format. This is a more natural syntax for a ".s" file format from the perspective of LLVM MC and related tools, while remaining easy to translate into other forms as needed. llvm-svn: 249364	2015-10-06 00:27:55 +00:00
Scott Douglass	953f908173	[ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM. We were previously codegen'ing memcpy as regular load/store operations and hoping that the register allocator would allocate registers in ascending order so that we could apply an LDM/STM combine after register allocation. According to the commit that first introduced this code (r37179), we planned to teach the register allocator to allocate the registers in ascending order. This never got implemented, and up to now we've been stuck with very poor codegen. A much simpler approach for achieving better codegen is to create MEMCPY pseudo instructions, attach scratch virtual registers to them and then, post register allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers. The register allocator will have picked arbitrary registers which we sort when expanding the MEMCPY. This approach also avoids the need to repeatedly calculate offsets which ultimately ought to be eliminated pre-RA in order to decrease register pressure. Fixes PR9199 and PR23768. [This is based on Peter Collingbourne's r238473 which was reverted.] Differential Revision: http://reviews.llvm.org/D13239 Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458 Author: Peter Collingbourne <peter@pcc.me.uk> llvm-svn: 249322	2015-10-05 14:49:54 +00:00
Simon Pilgrim	bb01c6fda2	[X86][SSE4A] Added shuffle decode tests for 'special case' SSE4A EXTRQI/INSERTQI ops. llvm-svn: 249263	2015-10-04 10:12:53 +00:00
Igor Breger	78741a1b1e	AVX512: Implemented encoding and intrinsics for VPERMILPS/PD instructions. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12690 llvm-svn: 249261	2015-10-04 07:20:41 +00:00
David Majnemer	161935520d	[WinEH] Permit branch folding in the face of funclets Track which basic blocks belong to which funclets. Permit branch folding to fire but only if it can prove that doing so will not cause code in one funclet to be reused in another. llvm-svn: 249257	2015-10-04 02:22:52 +00:00
Simon Pilgrim	dde63374c5	[DAGCombiner] Generalize FADD constant combines to work with vectors Updated the FADD combines to work with vectors as well as scalars. Differential Revision: http://reviews.llvm.org/D13416 llvm-svn: 249251	2015-10-03 22:06:06 +00:00
Sanjay Patel	004ea240ad	add test cases that demonstrate bad behavior These are based on PR25016 and likely caused by a bug in MachineCombiner's definition of improvesCriticalPathLen(). llvm-svn: 249249	2015-10-03 20:52:55 +00:00
Simon Pilgrim	93ea954e6d	[X86][SSE] Add FADD combine tests. llvm-svn: 249240	2015-10-03 18:17:43 +00:00
Dan Gohman	dc51b96b7f	[WebAssembly] Implement the remaining conversion operations. This is a temporary assembly syntax that will likely evolve along with broader upcoming syntax changes. llvm-svn: 249225	2015-10-03 02:10:28 +00:00
Dan Gohman	6a050f30de	[WebAssembly] Rename setlocal to set_local to match the spec. llvm-svn: 249218	2015-10-03 00:01:53 +00:00
Dan Gohman	eb440092c9	[WebAssembly] Update this test for the new loop scheme. llvm-svn: 249217	2015-10-02 23:54:03 +00:00
Dan Gohman	e3e4a5ff52	[WebAssembly] Fix CFG stackification of nested loops. llvm-svn: 249187	2015-10-02 21:11:36 +00:00
Dan Gohman	9cc692b06e	[WebAssembly] Support calls marked as "tail", fastcc, and coldcc. llvm-svn: 249184	2015-10-02 20:54:23 +00:00
Richard Trieu	e0129e474d	Call the correct overload. Call the correct overload so a string literal does not get converted to a bool. Also fix the test case to match the names given. llvm-svn: 249183	2015-10-02 20:52:14 +00:00
Dan Gohman	baba8c648b	[WebAssembly] Add a resize_memory intrinsic. llvm-svn: 249178	2015-10-02 20:10:26 +00:00
Dan Gohman	72f1692a2c	[WebAssembly] Add a memory_size intrinsic. llvm-svn: 249171	2015-10-02 19:21:15 +00:00
Tim Northover	956b008db6	ARM: correctly align constant pool value on Thumb1 targets. Since we're using tLDRpci to access it, the constant pool's address must be 0 (mod 4). llvm-svn: 249163	2015-10-02 18:07:13 +00:00
Andrea Di Biagio	77f62652c1	Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types." This patch teaches FastIsel the following two things: 1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types; 2) On AVX, no instructions are needed for bitcasts between 256-bit vector types. Example: %1 = bitcast <4 x i31> %V to <2 x i64> Before (-fast-isel -fast-isel-abort=1): FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64> Now we don't fall back to SelectionDAG and we correctly fold that computation propagating the register associated to %V. Originally reviewed here: http://reviews.llvm.org/D13347 llvm-svn: 249147	2015-10-02 16:08:05 +00:00
Andrea Di Biagio	45874e67a1	Revert: [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types. r249121 caused a Clang test failure (avx2-buitins.c). Revert r249121 while I keep investigating on the reason why that test failed. llvm-svn: 249124	2015-10-02 13:06:19 +00:00
Zoran Jovanovic	9ffdfa5986	[mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend Differential Revision: http://reviews.llvm.org/D13235 llvm-svn: 249123	2015-10-02 13:06:02 +00:00
Andrea Di Biagio	cb33456122	[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types. This patch teaches FastIsel the following two things: 1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types; 2) On AVX, no instructions are needed for bitcasts between 256-bit vector types. Example: %1 = bitcast <4 x i31> %V to <2 x i64> Before (-fast-isel -fast-isel-abort=1): FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64> Now we don't fall back to SelectionDAG and we correctly fold that computation propagating the register associated to %V. Differential Revision: http://reviews.llvm.org/D13347 llvm-svn: 249121	2015-10-02 12:45:37 +00:00
Reid Kleckner	fc64fae6e3	[WinEH] Emit __C_specific_handler tables for the new IR We emit denormalized tables, where every range of invokes in the same state gets a complete list of EH action entries. This is significantly simpler than trying to infer the correct nested scoping structure from the MI. Fortunately, for SEH, the nesting structure is really just a size optimization. With this, some basic __try / __except examples work. llvm-svn: 249078	2015-10-01 21:38:24 +00:00
Tom Stellard	e9f8b24985	AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering pass Summary: Instead of asserting when the kernel metadata is different than we expect, we should just skip lowering that function. This fixes assertion failures with OpenCL argument metadata from older LLVM releases. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13356 llvm-svn: 249073	2015-10-01 21:16:05 +00:00
David Majnemer	4600c06434	[WinEH] Stop BranchFolding from merging across funclets BranchFolding would merge two funclets together, this is not OK. Disable this and strengthen the assertion in FuncletLayout. llvm-svn: 249069	2015-10-01 21:04:13 +00:00
David Majnemer	f828a0ccc7	[WinEH] Make FuncletLayout more robust against catchret Catchret transfers control from a catch funclet to an earlier funclet. However, it is not completely clear which funclet the catchret target is part of. Make this clear by stapling the catchret target's funclet membership onto the CATCHRET SDAG node. llvm-svn: 249052	2015-10-01 18:44:59 +00:00
Jonas Paulsson	12629324a4	[SystemZ] Add some generic (floating point support) load instructions. Add generic instructions for load complement, load negative and load positive for fp32 and fp64, and let isel prefer them. They do not clobber CC, and so give scheduler more freedom. SystemZElimCompare pass will convert them when it can to the CC-setting variants. Regression tests updated to expect the new opcodes in places where the old ones where used. New test case SystemZ/fp-cmp-05.ll checks that SystemZCompareElim.cpp can handle the new opcodes. README.txt updated (bullet removed). Note that fp128 is not yet handled, because it is relatively rare, and is a bit trickier, because of the fact that l.dfr would operate on the sign bit of one of the subregisters of a fp128, but we would not want to copy the other sub-reg in case src and dst regs are not the same. Reviewed by Ulrich Weigand. llvm-svn: 249046	2015-10-01 18:12:28 +00:00
Tom Stellard	e0e582c9aa	AMDGPU: Add MEM_RAT STORE_TYPED. v2: Add test (Matt). Fix capitalization of isEOP (Matt). Move pattern to class parameter (Matt). Make the instruction available to Cayman (Matt). Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED. Patch by: Zoltan Gilian llvm-svn: 249042	2015-10-01 17:51:34 +00:00
NAKAMURA Takumi	1ed20db720	Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64" It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll llvm-svn: 249032	2015-10-01 17:00:56 +00:00
Scott Douglass	290183d734	[ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizer Differential Revision: http://reviews.llvm.org/D13240 llvm-svn: 249002	2015-10-01 11:56:19 +00:00
Ahmed Bougacha	23a0d1a1d6	[X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math. The custom code produces incorrect results if later reassociated. Since r221657, on x86, vNi32 uitofp is lowered using an optimized sequence: movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...] pand %xmm0, %xmm1 por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...] psrld $16, %xmm0 por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...] addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...] addps %xmm1, %xmm0 Since r240361, the machine combiner opportunistically reassociates 2-instruction sequences (with -ffast-math). In the new code sequence, the ADDPS' are eligible. In isolation, for simple examples (without reassociable users), this makes no performance difference (the goal being to enable reassociation of longer chains). In the trivial example (just one uitofp), the reassociation doesn't happen, because (I think) it would require the emission of a separate movaps for a constantpool load (instead of folding it into addps). However, when we have multiple uitofp sequences, and the constantpool loads are CSE'd earlier, the machine combiner can do the reassociation. When the ADDPS' are reassociated, the resulting sequence isn't correct anymore, as we'd be adding large (239) constants with comparatively smaller values (~223). Given that two of the three inputs are powers of 2 larger than 216, and that ulp(239) == 2(39-24) == 215, the reassociated chain will produce 0 for any input in [0, 214[. In my testing, it also produces wrong results for 99.5% of [0, 232[. Avoid this by disabling the new lowering when -ffast-math. It does mean that we'll get slower code than without it, but at least we won't get egregiously incorrect code. One might argue that, considering -ffast-math is all but meaningless, uitofp producing wrong results isn't a compiler bug. But it really is. Fixes PR24512. ...though this is really more of a workaround. Ideally, we'd have some sort of Machine FMF, but that's a problem that's not worth tackling until we do more with machine IR. llvm-svn: 248965	2015-10-01 00:11:07 +00:00
Reid Kleckner	6dec87a8a0	[WinEH] Emit int3 after noreturn calls on Win64 The Win64 unwinder disassembles forwards from each PC to try to determine if this PC is in an epilogue. If so, it skips calling the EH personality function for that frame. Typically, this means you cannot catch an exception in the same frame that you threw it, because 'throw' calls a noreturn runtime function. Previously we avoided this problem with the TrapUnreachable TargetOption, but that's a much bigger hammer than we need. All we need is a 1 byte non-epilogue instruction right after the call. Instead, what we got was an unconditional branch to a shared block containing the ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which added TrapUnreachable, and replaces it with something better. The new code pattern matches for invoke/call followed by unreachable and inserts an int3 into the DAG. To be 100% watertight, we would need to insert SEH_Epilogue instructions into all basic blocks ending in a call with no terminators or successors, but in practice this is unlikely to come up. llvm-svn: 248959	2015-09-30 23:09:23 +00:00
Sanjay Patel	a114a10bbe	[x86] enable machine combiner reassociations for 256-bit vector logical integer insts llvm-svn: 248955	2015-09-30 22:25:55 +00:00
Chad Rosier	4c5a4646bf	[AArch64] Remove an unnecessary run line and other cleanup. NFC. Unscaled load/store combining has been enabled since the initial ARM64 port. No need for a redundance run. Also, add CHECK-LABEL directives. llvm-svn: 248945	2015-09-30 21:10:02 +00:00
Chad Rosier	11c825f7db	[AArch64] Remove an unnecessary restriction on pre-index instructions. Previously, the index was constrained to the size of the memory operation for no apparent reason. This change removes that constraint so that we can form pre-index instructions with any valid offset. llvm-svn: 248931	2015-09-30 19:44:40 +00:00
Hal Finkel	4c45775880	[PowerPC] Disable shrink wrapping Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for now until the problem can be fixed. llvm-svn: 248924	2015-09-30 17:29:03 +00:00
Jeroen Ketema	ab99b59e8c	[ARM][NEON] Use address space in vld([1234]\|[234]lane) and vst([1234]\|[234]lane) instructions This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234], vst[234]lane ARM neon intrinsics and associates an address space with the pointer that these intrinsics take. This changes, e.g., <2 x i32> @llvm.arm.neon.vld1.v2i32(i8, i32) to <2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8, i32) This change ensures that address spaces are fully taken into account in the ARM target during lowering of interleaved loads and stores. Differential Revision: http://reviews.llvm.org/D12985 llvm-svn: 248887	2015-09-30 10:56:37 +00:00
Simon Pilgrim	3d11c994f7	[X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes. Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases. Differential Revision: http://reviews.llvm.org/D8690 llvm-svn: 248878	2015-09-30 08:17:50 +00:00
Evgeniy Stepanov	d3f544f271	[safestack] Fix a stupid mix-up in the direct-tls code path. llvm-svn: 248863	2015-09-30 00:01:47 +00:00
Reid Kleckner	a13dfd539b	[WinEH] Setup RBP correctly in Win64 funclet prologues Previously local variable captures just didn't work in 64-bit. Now we can access local variables more or less correctly. llvm-svn: 248857	2015-09-29 23:32:01 +00:00
David Majnemer	91b0ab9172	[WinEH] Ensure that funclets obey the x64 ABI The x64 ABI requires that epilogues do not contain code other than stack adjustments and some limited control flow. However, we'd insert code to initialize the return address after stack adjustments. Instead, insert EAX/RAX with the current value before we create the stack adjustments in the epilogue. llvm-svn: 248839	2015-09-29 22:33:36 +00:00
Maksim Panchenko	cce239c45d	HHVM calling conventions. HHVM calling convention, hhvmcc, is used by HHVM JIT for functions in translated cache. We currently support LLVM back end to generate code for X86-64 and may support other architectures in the future. In HHVM calling convention any GP register could be used to pass and return values, with the exception of R12 which is reserved for thread-local area and is callee-saved. Other than R12, we always pass RBX and RBP as args, which are our virtual machine's stack pointer and frame pointer respectively. When we enter translation cache via hhvmcc function, we expect the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed to standard ABI alignment. This affects stack object alignment and stack adjustments for function calls. One extra calling convention, hhvm_ccc, is used to call C++ helpers from HHVM's translation cache. It is almost identical to standard C calling convention with an exception of first argument which is passed in RBP (before we use RDI, RSI, etc.) Differential Revision: http://reviews.llvm.org/D12681 llvm-svn: 248832	2015-09-29 22:09:16 +00:00
Chad Rosier	1769d8505f	Fix test from r248825. llvm-svn: 248827	2015-09-29 20:50:15 +00:00
Chad Rosier	4315012769	[AArch64] Add support for pre- and post-index LDPSWs. llvm-svn: 248825	2015-09-29 20:39:55 +00:00
David Majnemer	a80c151286	[WinEH] Teach AsmPrinter about funclets Summary: Funclets have been turned into functions by the time they hit the object file. Make sure that they have decent names for the symbol table and CFI directives explaining how to reason about their prologues. Differential Revision: http://reviews.llvm.org/D13261 llvm-svn: 248824	2015-09-29 20:12:33 +00:00
Chad Rosier	dabe2534ed	[AArch64] Add integer pre- and post-index halfword/byte loads and stores. llvm-svn: 248817	2015-09-29 18:26:15 +00:00
Jeroen Ketema	740f9d79ca	Arguments spilled on the stack before a function call may have alignment requirements, for example in the case of vectors. These requirements are exploited by the code generator by using move instructions that have similar alignment requirements, e.g., movaps on x86. Although the code generator properly aligns the arguments with respect to the displacement of the stack pointer it computes, the displacement itself may cause misalignment. For example if we have %3 = load <16 x float>, <16 x float>* %1, align 64 call void @bar(<16 x float> %3, i32 0) the x86 back-end emits: movaps 32(%ecx), %xmm2 movaps (%ecx), %xmm0 movaps 16(%ecx), %xmm1 movaps 48(%ecx), %xmm3 subl $20, %esp <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards movaps %xmm3, (%esp) <-- movaps requires 16-byte alignment, while %esp is not aligned as such. movl $0, 16(%esp) calll __bar To solve this, we need to make sure that the computed value with which the stack pointer is changed is a multiple af the maximal alignment seen during its computation. With this change we get proper alignment: subl $32, %esp movaps %xmm3, (%esp) Differential Revision: http://reviews.llvm.org/D12337 llvm-svn: 248786	2015-09-29 10:12:57 +00:00
Dan Gohman	868e1c08d9	[WebAssembly] Rename test files to match platform naming conventions. llvm-svn: 248783	2015-09-29 08:13:58 +00:00
Reid Kleckner	c71d6275ca	[WinEH] Fix ip2state table emission with funclets Previously we were hijacking the old LandingPadInfo data structures to communicate our state numbers. Now we don't need that anymore. llvm-svn: 248763	2015-09-28 23:56:30 +00:00
Matt Arsenault	73aa8f687a	AMDGPU: Fix splitting x16 SMRD loads When used recursively, this would set the kill flag on the intermediate step from first splitting x16 to x8. llvm-svn: 248741	2015-09-28 20:54:52 +00:00
Matt Arsenault	e5d042cd56	AMDGPU: Fix moving SMRD loads with literal offsets on CI llvm-svn: 248740	2015-09-28 20:54:46 +00:00
Matt Arsenault	b378f075a2	AMDGPU: Add testcases Make sure we are testing moving users of the moved and split SMRD loads. llvm-svn: 248738	2015-09-28 20:54:38 +00:00
Matt Arsenault	f3c91f573f	AMDGPU: Cleanup test Run instnamer on it, and rename check prefix. This is in preparation for adding new testcases to cover bugs on other subtargets. llvm-svn: 248737	2015-09-28 20:54:32 +00:00
Dan Gohman	05a17aa82a	[WebAssembly] Support for direct call and call_indirect. llvm-svn: 248716	2015-09-28 16:22:39 +00:00
Hal Finkel	bd582581b8	[DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking When AA is being used, non-aliasing stores are canonicalized to use the same chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of this by looking only as users of a store's chain operand. However, user iteration is not result-number specific, we need to check that the use is as a chain operand, and not via some other operand. It is certainly possible to have another potentially-aliasing store, which shares the first's base pointer, and uses the first's chain's node via some other operand. Failure to catch this situation caused, at least in the included test case, an assert later because the relative sequence-number ordering caused later replacement to create a cycle in the DAG. llvm-svn: 248698	2015-09-28 08:02:14 +00:00
Joseph Tremoulet	09af67aba5	[EH] Create removeUnwindEdge utility Summary: Factor the code that rewrites invokes to calls and rewrites WinEH terminators to their "unwind to caller" equivalents into a helper in Utils/Local, and use it in the three places I'm aware of that need to do this. Reviewers: andrew.w.kaylor, majnemer, rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13152 llvm-svn: 248677	2015-09-27 01:47:46 +00:00
Matt Arsenault	86095b8dec	AMDGPU: Fix sched model for VOP2b instructions Trying to use the version with the explicit output operand would complain because of the missing WriteSALU. I'm not sure why it doesn't complain about this with the implicit VCC def. llvm-svn: 248646	2015-09-26 02:25:45 +00:00
Dan Gohman	d0bf981296	[WebAssembly] Rename several functions and types according to the new spec. llvm-svn: 248644	2015-09-26 01:09:44 +00:00
Ahmed Bougacha	e81610fabb	[ARM] Don't generate clrex for pre-v7 targets. Since r248294, we emit clrex, but it doesn't exist on v6. llvm-svn: 248640	2015-09-26 00:14:02 +00:00
Cong Hou	15ea016346	Use fixed-point representation for BranchProbability. BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change: Pros: 1. It uses only a half space of the current one. 2. Some operations are much faster like plus, subtraction, comparison, and scaling by an integer. Cons: 1. Constructing a probability using arbitrary numerator and denominator needs additional calculations. 2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 1 / 3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio. One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability has more space which may be a concern. Differential revision: http://reviews.llvm.org/D12603 llvm-svn: 248633	2015-09-25 23:09:59 +00:00
Matthias Braun	a3b701f828	SelectionDAGDumper: Print simple operands inline. Print simple operands inline instead of their pointer/value number. Simple operands are SDNodes without predecessors like Constant(FP), Register, UNDEF. This unifies the behaviour with dumpr() which was already doing this. Previously: t0: ch = EntryToken t1: i64 = Register %vreg0 t2: i64,ch = CopyFromReg t0, t1 t3: i64 = Constant<1> t4: i64 = add t2, t3 t5: i64 = Constant<2> t6: i64 = add t2, t5 t10: i64 = undef t11: i8,ch = load t0, t2, t10<LD1[%tmp81]> t12: i8,ch = load t0, t4, t10<LD1[%tmp10]> t13: i8,ch = load t0, t6, t10<LD1[%tmp12]> Now: t0: ch = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0 t4: i64 = add t2, Constant:i64<1> t6: i64 = add t2, Constant:i64<2> t11: i8,ch = load<LD1[%tmp81]> t0, t2, undef:i64 t12: i8,ch = load<LD1[%tmp10]> t0, t4, undef:i64 t13: i8,ch = load<LD1[%tmp12]> t0, t6, undef:i64 Differential Revision: http://reviews.llvm.org/D12567 llvm-svn: 248628	2015-09-25 22:27:02 +00:00
Sanjay Patel	bbbf9a1a34	merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711) This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ). The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned accesses up in performSTORECombine() because they are slow. This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving existing (perhaps questionable) lowering behavior. The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned stores. Differential Revision: http://reviews.llvm.org/D12635 llvm-svn: 248622	2015-09-25 21:49:48 +00:00
Matthias Braun	e86bbd8979	PrologueEpilogInserter: Fix missing live-ins when savepoint equals restorepoint The algorithm would not modify the live-in list of blocks below the save block point which is correct unless it happens to be a restore point at the same time. Also fixes the benign issue of live-in registers being added twice in some cases. The testcase is based on a test submitted by Kit Barton. Differential Revision: http://reviews.llvm.org/D13176 llvm-svn: 248620	2015-09-25 21:41:40 +00:00
Tom Stellard	e135ffd554	AMDGPU/SI: Use .hsatext section instead of .text for HSA Reviewers: arsenm, grosbach, rafael Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12424 llvm-svn: 248619	2015-09-25 21:41:28 +00:00
Matt Arsenault	10aa807856	PeepholeOptimizer: Remove redundant copies If a virtual register is copied and another copy was already seen, replace with the previous copy. This only handles the simplest cases for now. This pattern shows up from various operand restrictions AMDGPU has which require inserting copies depending on the register class of the operands. llvm-svn: 248611	2015-09-25 20:22:12 +00:00
Matt Arsenault	28bd7d4afe	AMDGPU: Add some more tests for literal operands llvm-svn: 248600	2015-09-25 18:21:47 +00:00
Chad Rosier	1bbd7fb38e	[AArch64] Add support for generating pre- and post-index load/store pairs. llvm-svn: 248593	2015-09-25 17:48:17 +00:00
Matt Arsenault	4bf43d4e68	AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG This fixes a select error when the i64 source was also bitcasted to v2i32 in the original source. Instead of awkwardly trying to select the modified source value and the store, replace before isel begins. Uses a worklist to avoid possible problems from mutating the DAG, although it seems to work OK without it. llvm-svn: 248589	2015-09-25 17:27:08 +00:00
Matt Arsenault	5f70436c49	AMDGPU: Improve accuracy of instruction rates for VOPC These were all using the default 32-bit VALU write class, but the i64/f64 compares are half rate. I'm not sure this is really correct, because they are still using the write to VALU write class, even though they really write to the SALU. llvm-svn: 248582	2015-09-25 16:58:25 +00:00
Saleem Abdulrasool	fe83b50289	ARM: address WoA division limitation We now emit the compiler generated divide by zero check that was needed for the MSVC routines. We construct a psuedo-instruction for the DBZ check as the operation requires splitting up the BB. For the 64-bit operations, we need to custom expand the node as we need to insert the DBZ check and then emit the libcall to the appropriate name. Because this is target specific, it seemed better to reproduce the expansion operation from the target-agnostic type legalization rather than sink this there to avoid the duplication. The division library calls now match MSVC semantically. llvm-svn: 248561	2015-09-25 05:15:46 +00:00
Simon Pilgrim	68d0050c6a	[X86][SSE2] Fix zero/any extension shuffles that don't start from the first element Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H llvm-svn: 248536	2015-09-24 21:02:17 +00:00
Matt Arsenault	e66621b306	AMDGPU: Add s_dcache_* instructions llvm-svn: 248533	2015-09-24 19:52:27 +00:00
Matt Arsenault	d6adfb401c	AMDGPU: Add cache invalidation instructions. These are necessary for implementing mem_fence for OpenCL 2.0. The VI assembler tests are disabled since it seems to be using the wrong encoding or opcode. llvm-svn: 248532	2015-09-24 19:52:21 +00:00
Mohammad Shahid	d0203cbf1c	Regression Test: Deletes redundant/invalid test. Removes absdiff_expand.ll regression test file which is invalid. Diffrential Revision: http://reviews.llvm.org/D11678 llvm-svn: 248493	2015-09-24 14:37:25 +00:00
Mohammad Shahid	13f1dfdf2e	Codegen: Fix llvm.absdiff semantic. Fixes the overflow case of llvm.absdiff intrinsic also updats the tests and LangRef.rst accordingly. Differential Revision: http://reviews.llvm.org/D11678 llvm-svn: 248483	2015-09-24 10:35:03 +00:00
Matt Arsenault	68d938649e	Introduce target hook for optimizing register copies Allow a target to do something other than search for copies that will avoid cross register bank copies. Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract. I'm not entirely satisified with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets. I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making the default implementation to check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses. The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation, this just relaxes them to not check for specific registers. llvm-svn: 248478	2015-09-24 08:36:14 +00:00
Matt Arsenault	cab64f1c75	AMDGPU: Fix printing trailing whitespace for mubuf atomics llvm-svn: 248472	2015-09-24 07:51:17 +00:00
Matt Arsenault	c721df0478	Use new TokenFactor chain when merging stores If the stores are storing values from loads which partially alias the stores, we could end up placing the merged loads and stores on the same chain which has the potential to break. Each store may have a different chain dependency on only some of the original loads. Create a new TokenFactor to capture all of the required dependencies of the stores rather than assuming all stores can use the same chain. The testcase is a situation where this happens, although it does not have an observable change from this. The DAG nodes just happened to not be reordered before despite this missing chain dependency. This is based on an off-list report for an out of tree target which regressed due to r246307 and I haven't managed to find a case where the nodes do end up reordered with an in tree target. llvm-svn: 248468	2015-09-24 07:22:38 +00:00
Matt Arsenault	c8e2ce4046	AMDGPU: Reduce number of copies emitted Instead of always inserting a copy in case the super register is itself a subregister, only extract to the super reg class if this is actually the case. This shouldn't really change codegen, but makes looking at the output of SIFixSGPRCopies easier to read. llvm-svn: 248467	2015-09-24 07:16:37 +00:00
Tim Northover	beb5bccf88	ARM: fix folding stack adjustment (again again again...) This time, the issue is that we weren't accounting for the possibility that aligned DPRs could have been stored after the final "push" in a prologue. When that happened we effectively moved a "sub sp, #N" from below the aligned stores to above them, and everything went to pot. To make it worse, I'd actually committed something testing that we produced wrong code, so the test update is tiny. llvm-svn: 248437	2015-09-23 22:21:09 +00:00
Lawrence Hu	cac0b89289	Swap loop invariant GEP with loop variant GEP to allow more LICM. This patch changes the order of GEPs generated by Splitting GEPs pass, specially when one of the GEPs has constant and the base is loop invariant, then we will generate the GEP with constant first when beneficial, to expose more cases for LICM. If originally Splitting GEP generate the following: do.body.i: %idxprom.i = sext i32 %shr.i to i64 %2 = bitcast %typeD* %s to i8* %3 = shl i64 %idxprom.i, 2 %uglygep = getelementptr i8, i8* %2, i64 %3 %uglygep7 = getelementptr i8, i8* %uglygep, i64 1032 ... Now it genereates: do.body.i: %idxprom.i = sext i32 %shr.i to i64 %2 = bitcast %typeD* %s to i8* %3 = shl i64 %idxprom.i, 2 %uglygep = getelementptr i8, i8* %2, i64 1032 %uglygep7 = getelementptr i8, i8* %uglygep, i64 %3 ... For no-loop cases, the original way of generating GEPs seems to expose more CSE cases, so we don't change the logic for no-loop cases, and only limit our change to the specific case we are interested in. llvm-svn: 248420	2015-09-23 19:25:30 +00:00
Sanjay Patel	1a6534661b	[x86] replace integer 'xor' ops with packed SSE FP 'xor' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx xorl %eax, %ecx movd %ecx, %xmm0 into this: xorps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 This is an extension of: http://reviews.llvm.org/rL248395 llvm-svn: 248415	2015-09-23 18:33:42 +00:00
Sanjay Patel	aba37553c4	[x86] replace integer 'or' ops with packed SSE FP 'or' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx orl %eax, %ecx movd %ecx, %xmm0 into this: orps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 This is an extension of: http://reviews.llvm.org/rL248395 llvm-svn: 248409	2015-09-23 18:19:07 +00:00
Sanjay Patel	df2495f331	[x86] replace integer 'and' ops with packed SSE FP 'and' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx andl %eax, %ecx movd %ecx, %xmm0 into this: andps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 Differential Revision: http://reviews.llvm.org/D13065 llvm-svn: 248395	2015-09-23 17:00:06 +00:00
Simon Pilgrim	9cb018b6b6	[X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead. LLVM counterpart to D12835 Differential Revision: http://reviews.llvm.org/D13002 llvm-svn: 248368	2015-09-23 08:48:33 +00:00
Cong Hou	b54a72ef78	Add a test case for the fix of profile update issue when lowering switch statement. llvm-svn: 248356	2015-09-23 00:34:56 +00:00
Matthias Braun	73e4221e6c	LiveIntervalAnalysis: Avoid multiple connected liveness components We may have subregister defs which are unused but not discovered and cleaned up prior to liveness analysis. This creates multiple connected components in the resulting live range which are forbidden in the MachineVerifier because they would unnecesarily constrain the register allocator. Rewrite those dead definitions to define a newly created virtual register. Differential Revision: http://reviews.llvm.org/D13035 llvm-svn: 248335	2015-09-22 22:37:44 +00:00
Ahmed Bougacha	81616a72ea	[ARM] Emit clrex in the expanded cmpxchg fail block. ARM counterpart to r248291: In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it. Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations". Differential Revision: http://reviews.llvm.org/D13033 llvm-svn: 248294	2015-09-22 17:22:58 +00:00
Ahmed Bougacha	07a844d758	[AArch64] Emit clrex in the expanded cmpxchg fail block. In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it. Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations". Differential Revision: http://reviews.llvm.org/D13033 llvm-svn: 248291	2015-09-22 17:21:44 +00:00
Stephen Canon	8216d88511	Don't raise inexact when lowering ceil, floor, round, trunc. The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions did raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior. This also lets us fold some logic into TD patterns, which is nice. Differential Revision: http://reviews.llvm.org/D12969 llvm-svn: 248266	2015-09-22 11:43:17 +00:00
Simon Pilgrim	1cad0cd3ce	[X86][SSE] Match zero/any extension shuffles that don't start from the first element This patch generalizes the lowering of shuffles as zero extensions to allow extensions that don't start from the first element. It now recognises extensions starting anywhere in the lower 128-bits or at the start of any higher 128-bit lane. The motivation was to reduce the number of high cost pshufb calls, but it also improves the SSE2 case as well. Differential Revision: http://reviews.llvm.org/D12561 llvm-svn: 248250	2015-09-22 08:16:08 +00:00
Simon Pilgrim	4003ed2da3	[DAGCombiner] Improve FMA support for interpolation patterns This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents. This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t)))) Differential Revision: http://reviews.llvm.org/D13003 llvm-svn: 248210	2015-09-21 20:32:48 +00:00
Jeroen Ketema	41681a5329	[ARM] Do not scale vext with a factor The vext pseudo-instruction takes the number of elements that need to be extracted, not the number of bytes. Hence, use the number of elements directly instead of scaling them with a factor. Reviewers: Silviu Baranga, James Molloy (not reflected in the differential revision) Differential Revision: http://reviews.llvm.org/D12974 llvm-svn: 248208	2015-09-21 20:28:04 +00:00
Ulrich Weigand	126caeb043	[SystemZ] Fix expansion of ISD::FPOW and ISD::FSINCOS The ISD::FPOW and ISD::FSINCOS opcodes default to Legal, but there is no legal instruction for those on SystemZ. This could cause LLVM internal errors. Fixed by setting the operation action to Expand for those opcodes. Also added test cases for all other LLVM IR intrinsics that should generate a library call. (Those already work correctly since the default operation action is fine.) llvm-svn: 248180	2015-09-21 17:35:45 +00:00
Matt Arsenault	b774834429	DAGCombiner: Replace store of FP constant after attemping store merges If storing multiple FP constants, some subset of the stores would be replaced with integers due to visit order, so MergeConsecutiveStores would only partially merge these. llvm-svn: 248169	2015-09-21 15:59:46 +00:00
Sanjay Patel	bab5d6c636	add test file ahead of any functional changes for PR22428 llvm-svn: 248123	2015-09-20 15:58:00 +00:00
Simon Pilgrim	c6a553241c	[X86][SSE] Intrinsics builtins test refresh. NFCI llvm-svn: 248122	2015-09-20 15:41:35 +00:00
Igor Breger	b7e1f9d680	AVX512: Implemented encoding and intrinsics for vcmpss/sd. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12593 llvm-svn: 248121	2015-09-20 15:15:10 +00:00
Asaf Badouh	2744d21fb8	[X86][AVX512] extend support in Scalar conversion add scalar FP to Int conversion with truncation intrinsics add scalar conversion FP32 from/to FP64 intrinsics add rounding mode and SAE mode encoding for these intrinsics Differential Revision: http://reviews.llvm.org/D12665 llvm-svn: 248117	2015-09-20 14:31:19 +00:00
Igor Breger	4c4cd789c9	AVX512: vsqrtss/sd encoding and intrinsics implementation. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12102 llvm-svn: 248116	2015-09-20 09:13:41 +00:00
Asaf Badouh	572bbceecc	[X86][AVX512DQ] Add fpclass instruction Differential Revision: http://reviews.llvm.org/D12931 llvm-svn: 248115	2015-09-20 08:46:07 +00:00
Michael Kuperstein	58e86bc893	[X86] Fix sitofp and uitofp instruction matching failures with long double and avx512 The operation action for i32 and i64 cannot be set to legal, as long double needs custom lowering. Patch by: mitch.l.bodart@intel.com Differential Revision: http://reviews.llvm.org/D12372 llvm-svn: 248114	2015-09-20 08:12:17 +00:00
Igor Breger	1d55f20bee	AVX512: Implemented intrinsics for vshuff32x4, vshuff64x2, vshufi64x2, vshufi32x4 Added tests for intrinsics. Differential Revision: http://reviews.llvm.org/D12525 llvm-svn: 248113	2015-09-20 07:18:53 +00:00
Igor Breger	0ede3cbb5c	AVX512: Implement instructions encoding, lowering and intrinsics vinserti64x4, vinserti64x2, vinserti32x8, vinserti32x4, vinsertf64x4, vinsertf64x2, vinsertf32x8, vinsertf32x4 Added tests for encoding, lowering and intrinsics. Differential Revision: http://reviews.llvm.org/D11893 llvm-svn: 248111	2015-09-20 06:52:42 +00:00
Simon Pilgrim	27f81776ad	[X86][AVX2] Use general sext IR for vpmovsx stack folding tests llvm-svn: 248093	2015-09-19 17:04:18 +00:00
Simon Pilgrim	d0448ee59f	[X86][SSE] Vectorize CTTZ + CTTZ_ZERO_UNDEF Now that we have fast vector CTPOP implementations we can use this to speed up vector CTTZ using the pattern (cttz(x) = ctpop((x & -x) - 1)) Additionally, for AVX512CD that provides lzcnt instructions we can use the pattern (cttz_undef(x) = (width - 1) - ctlz(x & -x)) Differential Revision: http://reviews.llvm.org/D12663 llvm-svn: 248091	2015-09-19 13:22:57 +00:00
Matt Arsenault	cc5d106263	AMDGPU: Add failing testcase for live interval construction llvm-svn: 248067	2015-09-19 00:03:56 +00:00
Cong Hou	d40105d321	Update edge weights properly when merging blocks in if-conversion. In if-conversion, there is a utility function MergeBlocks() that is used to merge blocks. However, when new edges are built in this function the edge weight is either not provided or not updated properly, leading to a modified CFG with incorrect edge weights. This patch corrects this issue. Differential Revision: http://reviews.llvm.org/D12513 llvm-svn: 248030	2015-09-18 20:22:41 +00:00
Eric Christopher	a835956bda	Limit the range of processors supported by ARM fast isel to v6 or later as that's all that is tested right now. Fixes PR24858. llvm-svn: 248027	2015-09-18 20:08:18 +00:00
Cong Hou	f9f9ffb98b	Scaling up values in ARMBaseInstrInfo::isProfitableToIfCvt() before they are scaled by a probability to avoid precision issue. In ARMBaseInstrInfo::isProfitableToIfCvt(), there is a simple cost model in which the number of cycles is scaled by a probability to estimate the cost. However, when the number of cycles is small (which is usually the case), there is a precision issue after the computation. To avoid this issue, this patch scales those cycles by 1024 (chosen to make the multiplication a litter faster) before they are scaled by the probability. Other variables are also scaled up for the final comparison. Differential Revision: http://reviews.llvm.org/D12742 llvm-svn: 248018	2015-09-18 18:19:40 +00:00
Matthias Braun	0b7d6c14c9	SelectionDAG: Introduce PersistentID to SDNode for assert builds. This gives us more human readable numbers to identify nodes in debug dumps. Before: 0x7fcbd9700160: ch = EntryToken 0x7fcbd985c7c8: i64 = Register %RAX ... 0x7fcbd9700160: <multiple use> 0x7fcbd985c578: i64,ch = MOV64rm 0x7fcbd985c6a0, 0x7fcbd985cc68, 0x7fcbd985c200, 0x7fcbd985cd90, 0x7fcbd985ceb8, 0x7fcbd9700160<Mem:LD8[@foo]> [ORD=2] 0x7fcbd985c8f0: ch,glue = CopyToReg 0x7fcbd9700160, 0x7fcbd985c7c8, 0x7fcbd985c578 [ORD=3] 0x7fcbd985c7c8: <multiple use> 0x7fcbd985c8f0: <multiple use> 0x7fcbd985c8f0: <multiple use> 0x7fcbd985ca18: ch = RETQ 0x7fcbd985c7c8, 0x7fcbd985c8f0, 0x7fcbd985c8f0:1 [ORD=3] Now: t0: ch = EntryToken t5: i64 = Register %RAX ... t0: <multiple use> t3: i64,ch = MOV64rm t10, t12, t11, t13, t14, t0<Mem:LD8[@foo]> [ORD=2] t6: ch,glue = CopyToReg t0, t5, t3 [ORD=3] t5: <multiple use> t6: <multiple use> t6: <multiple use> t7: ch = RETQ t5, t6, t6:1 [ORD=3] Differential Revision: http://reviews.llvm.org/D12564 llvm-svn: 248010	2015-09-18 17:41:00 +00:00
Geoff Berry	43ec15e57e	[AArch64] Improved bitfield instruction selection. Summary: For bitfield insert OR matching, check both operands for larger pattern first before checking for smaller pattern. Add pattern for unsigned bitfield insert-in-zero done with SHL+AND. Resolves PR21631. Reviewers: jmolloy, t.p.northover Subscribers: aemerson, rengolin, llvm-commits, mcrosier Differential Revision: http://reviews.llvm.org/D12908 llvm-svn: 248006	2015-09-18 17:11:53 +00:00
Quentin Colombet	b4c6886215	[ShrinkWrap] Refactor the handling of infinite loop in the analysis. - Strenghten the logic to be sure we hoist the restore point out of the current loop. (The fixes a bug with infinite loop, added as part of the patch.) - Walk over the exit blocks of the current loop to conver to the desired restore point in one iteration of the update loop. llvm-svn: 247958	2015-09-17 23:21:34 +00:00
David Majnemer	163b7f121c	[WinEH] Fix tests broken by funclet-layout llvm-svn: 247944	2015-09-17 21:11:12 +00:00
David Majnemer	978902309a	[WinEH] Add a funclet layout pass Windows EH funclets need to be contiguous. The FuncletLayout pass will ensure that the funclets are together and begin with a funclet entry MBB. Differential Revision: http://reviews.llvm.org/D12943 llvm-svn: 247937	2015-09-17 20:45:18 +00:00
Reid Kleckner	5b8a46e771	[WinEH] Make funclet return instrs pseudo instrs This makes catchret look more like a branch, and less like a weird use of BlockAddress. It also lets us get away from llvm.x86.seh.restoreframe, which relies on the old parentfpoffset label arithmetic. llvm-svn: 247936	2015-09-17 20:43:47 +00:00
Reid Kleckner	b78585b3c2	Fix the test case I just committed llvm-svn: 247905	2015-09-17 17:21:45 +00:00
Reid Kleckner	ed17079b52	[WinEH] Add and use hasEHPadSuccessor instead of getLandingPadSuccessor getLandingPadSuccessor assumes that each invoke can have at most one EH pad successor, but WinEH invokes can have more than one. Two out of three callers of getLandingPadSuccessor don't use the returned landingpad, so we can make them use this simple predicate instead. Eventually we'll have to circle back and fix SplitKit.cpp so that register allocation works. Baby steps. llvm-svn: 247904	2015-09-17 17:19:40 +00:00
Elena Demikhovsky	702a6adfaa	AVX-512: shufflevector for i1 vectors <2 x i1> .. <64 x i1> AVX-512 does not provide an instruction that shuffles mask register. So I do the following way: mask-2-simd , shuffle simd , simd-2-mask Differential Revision: http://reviews.llvm.org/D12727 llvm-svn: 247876	2015-09-17 06:53:12 +00:00
Reid Kleckner	813f1b65bc	[WinEH] Rip out the landingpad-based C++ EH state numbering code It never really worked, and the new code is working better every day. llvm-svn: 247860	2015-09-16 22:14:46 +00:00
David Majnemer	67bff0d88b	[WinEHPrepare] Turn terminatepad into a cleanuppad + call + cleanupret The MSVC doesn't really support exception specifications so let's just turn these into cleanuppads. Later, we might use terminatepad to more efficiently encode the "noexcept"-ness of a function body. llvm-svn: 247848	2015-09-16 20:42:16 +00:00
Reid Kleckner	b005d281c3	[WinEH] Pull Adjectives and CatchObj out of the catchpad arg list Clang now passes the adjectives as an argument to catchpad. Getting the CatchObj working is simply a matter of threading another static alloca through codegen, first as an alloca, then as a frame index, and finally as a frame offset. llvm-svn: 247844	2015-09-16 20:16:27 +00:00
David Majnemer	459a64aed7	[WinEHPrepare] Provide a cloning mode which doesn't demote We are experimenting with a new approach to saving and restoring SSA values used across funclets: let the register allocator do the dirty work for us. However, this means that we need to be able to clone commoned blocks without relying on demotion. llvm-svn: 247835	2015-09-16 18:40:37 +00:00
Reid Kleckner	84ebff4a5e	[WinEH] Skip state numbering when no EH pads are present Otherwise we'd try to emit the thunk that passes the LSDA to __CxxFrameHandler3. We don't emit the LSDA if there were no landingpads, so we'd end up with an assembler error when trying to write the COFF object. llvm-svn: 247820	2015-09-16 17:19:44 +00:00
Dan Gohman	950a13cfa3	[WebAssembly] Check in an initial CFG Stackifier pass This pass implements a simple algorithm for conversion from CFG to wasm's structured control flow. It doesn't yet handle multiple-entry loops; that will be added in a future patch. It also adds initial support for switch statements. Differential Revision: http://reviews.llvm.org/D12735 llvm-svn: 247818	2015-09-16 16:51:30 +00:00
Sanjay Patel	a260701bbb	propagate fast-math-flags on DAG nodes After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is one test case in this patch to prove that point. This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF ( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes. This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the current global settings. Differential Revision: http://reviews.llvm.org/D12095 llvm-svn: 247815	2015-09-16 16:31:21 +00:00
Michael Kuperstein	d926465342	[X86] Do not generate 64-bit pops of 32-bit GPRs. When trying emit a stack adjustments using pops, frame lowering selects an arbitrary free GPR. It should always select one from an appropriate class... This fixes PR24649. Patch by: amjad.aboud@intel.com Differential Revision: http://reviews.llvm.org/D12609 llvm-svn: 247785	2015-09-16 11:27:20 +00:00
Mehdi Amini	d178f4fc89	Make the default triple optional by allowing an empty string When building LLVM as a (potentially dynamic) library that can be linked against by multiple compilers, the default triple is not really meaningful. We allow to explicitely set it to an empty string when configuring LLVM. In this case, said "target independent" tests in the test suite that are using the default triple are disabled by matching the newly available feature "default_triple". Reviewers: probinson, echristo Differential Revision: http://reviews.llvm.org/D12660 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 247775	2015-09-16 05:34:32 +00:00
Quentin Colombet	7b13976254	[ShrinkWrapping] Add a test case for r247710. llvm-svn: 247713	2015-09-15 18:51:43 +00:00
Ulrich Weigand	e861e6442c	[SystemZ] Fix assertion failure in tryBuildVectorShuffle Under certain circumstances, tryBuildVectorShuffle would attempt to create a BUILD_VECTOR node with an invalid combination of types. This happened when one of the components of the original BUILD_VECTOR was itself a TRUNCATE node. That TRUNCATE was stripped off during intermediate processing to simplify code, but when adding the node back to the result vector, we still need it to get the type right. llvm-svn: 247694	2015-09-15 14:27:46 +00:00
Dan Gohman	311b488d76	[WebAssembly] Implement int64-to-int32 conversion. llvm-svn: 247649	2015-09-15 00:55:19 +00:00
Jun Bum Lim	34b9bd0435	Improve ISel using across lane min/max reduction In vectorized integer min/max reduction code, the final "reduce" step is sub-optimal. In AArch64, this change wll combine : %svn0 = vector_shuffle %0, undef<2,3,u,u> %smax0 = smax %0, svn0 %svn3 = vector_shuffle %smax0, undef<1,u,u,u> %sc = setcc %smax0, %svn3, gt %n0 = extract_vector_elt %sc, #0 %n1 = extract_vector_elt %smax0, #0 %n2 = extract_vector_elt $smax0, #1 %result = select %n0, %n1, n2 becomes : %1 = smaxv %0 %result = extract_vector_elt %1, 0 This change extends r246790. llvm-svn: 247575	2015-09-14 16:19:52 +00:00
John Brawn	056e67865a	[ARM] Extract shifts out of multiply-by-constant Turning (op x (mul y k)) into (op x (lsl (mul y k>>n) n)) is beneficial when we can do the lsl as a shifted operand and the resulting multiply constant is simpler to generate. Do this by doing the transformation when trying to select a shifted operand, as that ensures that it actually turns out better (the alternative would be to do it in PreprocessISelDAG, but we don't know for sure there if extracting the shift would allow a shifted operand to be used). Differential Revision: http://reviews.llvm.org/D12196 llvm-svn: 247569	2015-09-14 15:19:41 +00:00
Simon Pilgrim	f8f86ab176	[X86][MMX] Added shuffle decodes for MMX/3DNow! shuffles. Added shuffle decodes for MMX PUNPCK + PSHUFW shuffles. Added shuffle decodes for 3DNow! PSWAPD shuffles. llvm-svn: 247526	2015-09-13 11:28:45 +00:00
Elena Demikhovsky	8671fcbbd6	AVX-512: Fixed a bug in OR/XOR operations for 512-bit FP values on KNL. KNL does not have VXORPS, VORPS for 512-bit values. I use integer VPXOR, VPOR that actually do the same. X86ISD::FXOR/FOR are generated as a result of FSUB combining. Differential Revision: http://reviews.llvm.org/D12753 llvm-svn: 247523	2015-09-13 08:15:15 +00:00
Sanjay Patel	8b960d22ad	[x86] enable machine combiner reassociations for 128-bit vector logical integer insts (2nd try) The changes in: test/CodeGen/X86/machine-cp.ll are just due to scheduling differences after some logic instructions were reassociated. llvm-svn: 247516	2015-09-12 19:47:50 +00:00
Simon Pilgrim	42c834bad5	[X86] Added i1 vector sextload tests llvm-svn: 247509	2015-09-12 15:36:41 +00:00
Simon Pilgrim	779bcf3e3d	[X86][FMA] Refreshed fma tests llvm-svn: 247508	2015-09-12 15:33:05 +00:00
Sanjay Patel	99f7370a79	revert r247506; need to verify changes in existing tests llvm-svn: 247507	2015-09-12 15:27:31 +00:00
Sanjay Patel	08755c7dbc	[x86] enable machine combiner reassociations for 128-bit vector logical integer insts llvm-svn: 247506	2015-09-12 14:58:04 +00:00
Simon Pilgrim	8ee50e9cff	[X86][SSE] Use general sext IR for (v)pmovsx stack folding tests llvm-svn: 247502	2015-09-12 11:45:24 +00:00
Akira Hatanaka	bc497c93f5	Use function attribute "stackrealign" to decide whether stack realignment should be forced. With this commit, we can now force stack realignment when doing LTO and do so on a per-function basis. Also, add a new cl::opt option "stackrealign" to CommandFlags.h which is used to force stack realignment via llc's command line. Out-of-tree projects currently using -force-align-stack to force stack realignment should make changes to attach the attribute to the functions in the IR. Differential Revision: http://reviews.llvm.org/D11814 llvm-svn: 247450	2015-09-11 18:54:38 +00:00
David Majnemer	0e70598a5b	[X86] Make sure startproc/endproc are paired We used different conditions to determine if we should emit startproc vs endproc. Use the same condition to ensure that they will always be paired. This fixes PR24374. llvm-svn: 247435	2015-09-11 17:34:34 +00:00
Reid Kleckner	5dbee7baef	[IR] Print the label operands of a catchpad like an invoke The rest of the EH pads are fine, since they have at most one label and take fewer operands for the personality. Old catchpad vs. new: %5 = catchpad [i8* bitcast (i32 ()* @"\01?filt$0@0@main@@" to i8)] to label %__except.ret.10 unwind label %catchendblock.9 ----- %5 = catchpad [i8 bitcast (i32 ()* @"\01?filt$0@0@main@@" to i8*)] to label %__except.ret.10 unwind label %catchendblock.9 llvm-svn: 247433	2015-09-11 17:27:52 +00:00
David Blaikie	2f40830dde	[opaque pointer type] Add textual IR support for explicit type parameter for global aliases update.py: import fileinput import sys import re alias_match_prefix = r"(.(?:=\|:\|^)\s(?:external \|)(?:(?:private\|internal\|linkonce\|linkonce_odr\|weak\|weak_odr\|common\|appending\|extern_weak\|available_externally) )?(?:default \|hidden \|protected )?(?:dllimport \|dllexport )?(?:unnamed_addr \|)(?:thread_local(?:$[a-z]$)? )?alias" plain = re.compile(alias_match_prefix + r" (.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|addrspacecast\|\[\[[a-zA-Z]\|\{\{).$)") cast = re.compile(alias_match_prefix + r") ((?:bitcast\|inttoptr\|addrspacecast)\s$. to (.?)(\| addrspace\(\d+$ )\\)\s(?:;.)?$)") gep = re.compile(alias_match_prefix + r") ((?:getelementptr)\s(?:inbounds)?\s$(?P<type>.), (?P=type)(?:\saddrspace\(\d+$\s)?\* .\)\s(?:;.)?$)") def conv(line): m = re.match(cast, line) if m: return m.group(1) + " " + m.group(3) + ", " + m.group(2) m = re.match(gep, line) if m: return m.group(1) + " " + m.group(3) + ", " + m.group(2) m = re.match(plain, line) if m: return m.group(1) + ", " + m.group(2) + m.group(3) + "" + m.group(4) + "\n" return line for line in sys.stdin: sys.stdout.write(conv(line)) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name .ll \| xargs ./apply.sh llvm-svn: 247378	2015-09-11 03:22:04 +00:00
Reid Kleckner	da6dcc5d92	[WinEH] Push and pop EBP for 32-bit funclets The Win32 EH runtime caller does not preserve EBP, even though it does preserve the CSRs (EBX, ESI, EDI) for us. The result was that each finally funclet call would leave the frame pointer off by 12 bytes. llvm-svn: 247348	2015-09-10 22:00:02 +00:00
James Y Knight	1f3e6af7d0	[SPARC] Switch to the Machine Scheduler. The (mostly-deprecated) SelectionDAG-based ILPListDAGScheduler scheduler was making poor scheduling decisions, causing high register pressure and extraneous register spills. Switching to the newer machine scheduler generates better code -- even without there being a machine model defined for SPARC yet. (Actually committing the test changes too, this time, unlike r247315) llvm-svn: 247343	2015-09-10 21:49:06 +00:00
Reid Kleckner	7bb20bd69e	Fix SEH state numbering algorithm to handle cleanupendpads WinEHPrepare's new coloring algorithm really expects to see cleanupendpads now, so Clang will start emitting them soon. llvm-svn: 247341	2015-09-10 21:46:36 +00:00
David Blaikie	a7970f3cf8	Tidy up some alias syntax to make explicit pointer type migration easier llvm-svn: 247312	2015-09-10 18:03:45 +00:00
Joseph Tremoulet	f3aff31401	[WinEH] Fix single-block cleanup coloring Summary: The coloring code in WinEHPrepare queues cleanuprets' successors with the correct color (the parent one) when it sees their cleanuppad, and so later when iterating successors knows to skip processing cleanuprets since they've already been queued. This latter check was incorrectly under an 'else' condition and so inadvertently was not kicking in for single-block cleanups. This change sinks the check out of the 'else' to fix the bug. Reviewers: majnemer, andrew.w.kaylor, rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12751 llvm-svn: 247299	2015-09-10 16:51:25 +00:00
Alex Lorenz	0153e59935	Fix PR 24724 - The implicit register verifier shouldn't assume certain operand order. The implicit register verifier in the MIR parser should only check if the instruction's default implicit operands are present in the instruction. It should not check the order in which they occur. llvm-svn: 247283	2015-09-10 14:04:34 +00:00
Igor Breger	7f69a99c54	AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11802 llvm-svn: 247276	2015-09-10 12:54:54 +00:00
Silviu Baranga	df9ce8408a	[DAGCombine] Truncate BUILD_VECTOR operators if necessary when constant folding vectors Summary: The BUILD_VECTOR node will truncate its operators to match the type. We need to take this into account when constant folding - we need to perform a truncation before constant folding the elements. This is because the upper bits can change the result, depending on the operation type (for example this is the case for min/max). This change also adds a regression test. Reviewers: jmolloy Subscribers: jmolloy, llvm-commits Differential Revision: http://reviews.llvm.org/D12697 llvm-svn: 247265	2015-09-10 10:34:34 +00:00
James Molloy	8c995a93ce	[ARM] Do not use vtrn for vectorshuffle if the order is reversed The tests in isVTRNMask and isVTRN_v_undef_Mask should also check that the elements of the upper and lower half of the vectorshuffle occur in the correct order when both halves are used. Without this test the code assumes that it is correct to use vector transpose (vtrn) for the masks <1, 1, 0, 0> and <1, 3, 0, 2>, among others, but the transpose actually incorrectly generates shuffles for <0, 0, 1, 1> and <0, 2, 1, 3> in this case. Patch by Jeroen Ketema! llvm-svn: 247254	2015-09-10 08:42:28 +00:00
Kit Barton	d3b904d440	Enable the shrink wrapping optimization for PPC64. The changes in this patch are as follows: 1. Modify the emitPrologue and emitEpilogue methods to work properly when the prologue and epilogue blocks are not the first/last blocks in the function 2. Fix a bug in PPCEarlyReturn optimization caused by an empty entry block in the function 3. Override the runShrinkWrap PredicateFtor (defined in TargetMachine) to check whether shrink wrapping should run: Shrink wrapping will run on PPC64 (Little Endian and Big Endian) unless -enable-shrink-wrap=false is specified on command line A new test case, ppc-shrink-wrapping.ll was created based on the existing shrink wrapping tests for x86, arm, and arm64. Phabricator review: http://reviews.llvm.org/D11817 llvm-svn: 247237	2015-09-10 01:55:44 +00:00
Ahmed Bougacha	05541459fa	[AArch64] Match FI+offset in STNP addressing mode. First, we need to teach isFrameOffsetLegal about STNP. It already knew about the STP/LDP variants, but those were probably never exercised, because it's only the load/store optimizer that generates STP/LDP, and the only user of the method is frame lowering, which runs earlier. The STP/LDP cases were wrong: they didn't take into account the fact that they return two results, not one, so the immediate offset will be the 4th operand, not the 3rd. Follow-up to r247234. llvm-svn: 247236	2015-09-10 01:54:43 +00:00
Ahmed Bougacha	c0ac38d584	[AArch64] Match base+offset in STNP addressing mode. Followup to r247231. llvm-svn: 247234	2015-09-10 01:48:29 +00:00
Ahmed Bougacha	b8886b517d	[AArch64] Support selecting STNP. We could go through the load/store optimizer and match STNP where we would have matched a nontemporal-annotated STP, but that's not reliable enough, as an opportunistic optimization. Insetad, we can guarantee emitting STNP, by matching them at ISel. Since there are no single-input nontemporal stores, we have to resort to some high-bits-extracting trickery to generate an STNP from a plain store. Also, we need to support another, LDP/STP-specific addressing mode, base + signed scaled 7-bit immediate offset. For now, only match the base. Let's make it smart separately. Part of PR24086. llvm-svn: 247231	2015-09-10 01:42:28 +00:00
Reid Kleckner	7878391208	[WinEH] Add codegen support for cleanuppad and cleanupret All of the complexity is in cleanupret, and it mostly follows the same codepaths as catchret, except it doesn't take a return value in RAX. This small example now compiles and executes successfully on win32: extern "C" int printf(const char *, ...) noexcept; struct Dtor { ~Dtor() { printf("~Dtor\n"); } }; void has_cleanup() { Dtor o; throw 42; } int main() { try { has_cleanup(); } catch (int) { printf("caught it\n"); } } Don't try to put the cleanup in the same function as the catch, or Bad Things will happen. llvm-svn: 247219	2015-09-10 00:25:23 +00:00
Reid Kleckner	94b704c469	[SEH] Emit 32-bit SEH tables for the new EH IR The 32-bit tables don't actually contain PC range data, so emitting them is incredibly simple. The 64-bit tables, on the other hand, use the same table for state numbering as well as label ranges. This makes things more difficult, so it will be implemented later. llvm-svn: 247192	2015-09-09 21:10:03 +00:00
Dan Gohman	5e0668426c	[WebAssembly] Update target datalayout strings. llvm-svn: 247187	2015-09-09 20:54:31 +00:00
Renato Golin	db7ea86bf4	Revert "AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding." This reverts commit r247149, as it was breaking numerous buildbots of varied architectures. llvm-svn: 247177	2015-09-09 19:44:40 +00:00
Dan Gohman	f71abef701	[WebAssembly] Implement calls with void return types. llvm-svn: 247158	2015-09-09 16:13:47 +00:00
Tom Stellard	9a197676b1	AMDGPU/SI: Fold operands through REG_SEQUENCE instructions Summary: This helps mostly when we use add instructions for address calculations that contain immediates. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12256 llvm-svn: 247157	2015-09-09 15:43:26 +00:00
Igor Breger	ac29a82921	AVX512: Implemented encoding and intrinsics for vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4 Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11802 llvm-svn: 247149	2015-09-09 14:35:09 +00:00
Alex Lorenz	b9a68dbcae	Fix PR 24633 - Handle undef values when parsing standalone constants. llvm-svn: 247145	2015-09-09 13:44:33 +00:00
Daniel Sanders	2038747fce	Fix vector splitting for extract_vector_elt and vector elements of <8-bits. Summary: One of the vector splitting paths for extract_vector_elt tries to lower: define i1 @via_stack_bug(i8 signext %idx) { %1 = extractelement <2 x i1> <i1 false, i1 true>, i8 %idx ret i1 %1 } to: define i1 @via_stack_bug(i8 signext %idx) { %base = alloca <2 x i1> store <2 x i1> <i1 false, i1 true>, <2 x i1>* %base %2 = getelementptr <2 x i1>, <2 x i1>* %base, i32 %idx %3 = load i1, i1* %2 ret i1 %3 } However, the elements of <2 x i1> are not byte-addressible. The result of this is that the getelementptr expands to '%base + %idx * (1 / 8)' which simplifies to '%base + %idx * 0', and then simply '%base' causing all values of %idx to extract element zero. This commit fixes this by promoting the vector elements of <8-bits to i8 before splitting the vector. This fixes a number of test failures in pocl. Reviewers: pekka.jaaskelainen Subscribers: pekka.jaaskelainen, llvm-commits Differential Revision: http://reviews.llvm.org/D12591 llvm-svn: 247128	2015-09-09 09:53:20 +00:00
Dan Gohman	e590b33bf8	[WebAssembly] Fix lowering of calls with more than one argument. llvm-svn: 247118	2015-09-09 01:52:45 +00:00
Matt Arsenault	acd68b58ae	SelectionDAG: Support Expand of f16 extloads Currently this hits an assert that extload should always be supported, which assumes integer extloads. This moves a hack out of SI's argument lowering and is covered by existing tests. llvm-svn: 247113	2015-09-09 01:12:27 +00:00
Dan Gohman	4f52e00ecb	[WebAssembly] Implement WebAssemblyInstrInfo::copyPhysReg llvm-svn: 247110	2015-09-09 00:52:47 +00:00
Reid Kleckner	51189f0a1d	[WinEH] Avoid creating MBBs for LLVM BBs that cannot contain code Typically these are catchpads, which hold data used to decide whether to catch the exception or continue unwinding. We also shouldn't create MBBs for catchendpads, cleanupendpads, or terminatepads, since no real code can live in them. This fixes a problem where MI passes (like the register allocator) would try to put code into catchpad blocks, which are not executed by the runtime. In the new world, blocks ending in invokes now have many possible successors. llvm-svn: 247102	2015-09-08 23:28:38 +00:00
Reid Kleckner	df1295173f	[WinEH] Emit prologues and epilogues for funclets Summary: 32-bit funclets have short prologues that allocate enough stack for the largest call in the whole function. The runtime saves CSRs for the funclet. It doesn't restore CSRs after we finally transfer control back to the parent funciton via a CATCHRET, but that's a separate issue. 32-bit funclets also have to adjust the incoming EBP value, which is what llvm.x86.seh.recoverframe does in the old model. 64-bit funclets need to spill CSRs as normal. For simplicity, this just spills the same set of CSRs as the parent function, rather than trying to compute different CSR sets for the parent function and each funclet. 64-bit funclets also allocate enough stack space for the largest outgoing call frame, like 32-bit. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12546 llvm-svn: 247092	2015-09-08 22:44:41 +00:00
Eric Christopher	71f6e2f568	Fix the PPC CTR Loop pass to look for calls to the intrinsics that read CTR and count them as reading the CTR. llvm-svn: 247083	2015-09-08 22:14:58 +00:00
Derek Schuff	45c832c5d8	Fix comments and RUN line in x86-64 stdarg test leftover from last commit From http://reviews.llvm.org/D12346 llvm-svn: 247070	2015-09-08 20:58:41 +00:00
Derek Schuff	eef533f422	x32. Fixes a bug in how struct va_list is initialized in x32 Summary: This patch modifies X86TargetLowering::LowerVASTART so that struct va_list is initialized with 32 bit pointers in x32. It also includes tests that call @llvm.va_start() for x32. Patch by João Porto Subscribers: llvm-commits, hjl.tools Differential Revision: http://reviews.llvm.org/D12346 llvm-svn: 247069	2015-09-08 20:51:31 +00:00
Derek Schuff	ee4e947e23	x32. Fixes a bug in i8mem_NOREX declaration. The old implementation assumed LP64 which is broken for x32. Specifically, the MOVE8rm_NOREX and MOVE8mr_NOREX, when selected, would cause a 'Cannot emit physreg copy instruction' error message to be reported. This patch also enable the h-register*ll tests for x32. Differential Revision: http://reviews.llvm.org/D12336 Patch by João Porto llvm-svn: 247058	2015-09-08 19:47:15 +00:00
Matt Arsenault	966a94f861	AMDGPU: Handle sub of constant for DS offset folding sub C, x - > add (sub 0, x), C for DS offsets. This is mostly to fix regressions that show up when SeparateConstOffsetFromGEP is enabled. llvm-svn: 247054	2015-09-08 19:34:22 +00:00
David Blaikie	12dd3c4ebb	Fix CPP Backend for GEP API changes for opaque pointer types Based on a patch by Jerome Witmann. llvm-svn: 247047	2015-09-08 18:42:29 +00:00
JF Bastien	749ed88aa5	WebAssembly: NFC rename shr/sar Renamed from: https://github.com/WebAssembly/design/pull/332 llvm-svn: 247028	2015-09-08 17:21:21 +00:00
Dan Gohman	d4a12d25ff	[WebAssembly] Temporarily disable this test, as it depends on additional patches that aren't yet checked in. llvm-svn: 247011	2015-09-08 13:21:12 +00:00
Igor Breger	a54a1a84dd	AVX512: kunpck encoding implementation Added tests for encoding. Differential Revision: http://reviews.llvm.org/D12061 llvm-svn: 247010	2015-09-08 13:10:00 +00:00
Dan Gohman	25d2a0dda4	[WebAssembly] Enable SSA lowering and other pre-regalloc passes llvm-svn: 247008	2015-09-08 12:39:25 +00:00
Daniel Sanders	808dfb8ba7	[mips] Reserve address spaces 1-255 for software use. Summary: And define them to have noop casts with address spaces 0-255. Reviewers: pekka.jaaskelainen Subscribers: pekka.jaaskelainen, llvm-commits Differential Revision: http://reviews.llvm.org/D12678 llvm-svn: 246990	2015-09-08 09:07:03 +00:00
Elena Demikhovsky	e88038f235	AVX-512: Lowering for 512-bit vector shuffles. Vector types: <8 x 64>, <16 x 32>, <32 x 16> float and integer. Differential Revision: http://reviews.llvm.org/D10683 llvm-svn: 246981	2015-09-08 06:38:21 +00:00
Simon Pilgrim	15fc134865	[X86][MMX] Added missing stack folding tests for MMX/SSSE3 instructions llvm-svn: 246949	2015-09-06 17:50:15 +00:00
Simon Pilgrim	b38c09d7ff	[X86][AVX512] Added 512-bit vector shift tests. Only works for avx512f (dq) targets so far - need to add avx512bw tests once char/short shifts are supported. llvm-svn: 246943	2015-09-06 13:36:32 +00:00
Hal Finkel	10aac5fd0e	[SelectionDAG] Swap commutative binops before constant-based folding In searching for a fix for the underlying code-quality bug highlighted by r246937 (that SDAG simplification can lead to us generating an ISD::OR node with a constant zero LHS), I ran across this: We generically canonicalize commutative binary-operation nodes in SDAG getNode so that, if only one operand is a constant, it will be on the RHS. However, we were doing this only after a bunch of constant-based simplification checks that all assume this canonical form (that any constant will be on the RHS). Moving the operand-swapping canonicalization prior to these checks seems like the right thing to do (and, as it turns out, causes SDAG to completely fold away the computation in test/CodeGen/ARM/2012-11-14-subs_carry.ll, just like InstCombine would do). llvm-svn: 246938	2015-09-06 05:42:13 +00:00
Hal Finkel	ccf9259c00	[PowerPC] Don't commute trivial rlwimi instructions To commute a trivial rlwimi instructions (meaning one with a full mask and zero shift), we'd need to ability to form an all-zero mask (instead of an all-one mask) using rlwimi. We can't represent this, however, and we'll miscompile code if we try. The code quality problem that this highlights (that SDAG simplification can lead to us generating an ISD::OR node with a constant zero LHS) will be fixed as a follow-up. Fixes PR24719. llvm-svn: 246937	2015-09-06 04:17:30 +00:00
Simon Pilgrim	a1ceba8ab4	[X86] Updated vector lzcnt tests. Added missing vec512 tests. llvm-svn: 246927	2015-09-05 11:56:30 +00:00
Simon Pilgrim	5cce4381ab	[X86] Updated vector tzcnt tests. Added vec512 tests. llvm-svn: 246922	2015-09-05 10:19:07 +00:00

... 2 3 4 5 6 ...

14004 Commits