llvm-project

Commit Graph

Author	SHA1	Message	Date
Reid Kleckner	a13dfd539b	[WinEH] Setup RBP correctly in Win64 funclet prologues Previously local variable captures just didn't work in 64-bit. Now we can access local variables more or less correctly. llvm-svn: 248857	2015-09-29 23:32:01 +00:00
David Majnemer	91b0ab9172	[WinEH] Ensure that funclets obey the x64 ABI The x64 ABI requires that epilogues do not contain code other than stack adjustments and some limited control flow. However, we'd insert code to initialize the return address after stack adjustments. Instead, insert EAX/RAX with the current value before we create the stack adjustments in the epilogue. llvm-svn: 248839	2015-09-29 22:33:36 +00:00
Maksim Panchenko	cce239c45d	HHVM calling conventions. HHVM calling convention, hhvmcc, is used by HHVM JIT for functions in translated cache. We currently support LLVM back end to generate code for X86-64 and may support other architectures in the future. In HHVM calling convention any GP register could be used to pass and return values, with the exception of R12 which is reserved for thread-local area and is callee-saved. Other than R12, we always pass RBX and RBP as args, which are our virtual machine's stack pointer and frame pointer respectively. When we enter translation cache via hhvmcc function, we expect the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed to standard ABI alignment. This affects stack object alignment and stack adjustments for function calls. One extra calling convention, hhvm_ccc, is used to call C++ helpers from HHVM's translation cache. It is almost identical to standard C calling convention with an exception of first argument which is passed in RBP (before we use RDI, RSI, etc.) Differential Revision: http://reviews.llvm.org/D12681 llvm-svn: 248832	2015-09-29 22:09:16 +00:00
Chad Rosier	4315012769	[AArch64] Add support for pre- and post-index LDPSWs. llvm-svn: 248825	2015-09-29 20:39:55 +00:00
David Majnemer	a80c151286	[WinEH] Teach AsmPrinter about funclets Summary: Funclets have been turned into functions by the time they hit the object file. Make sure that they have decent names for the symbol table and CFI directives explaining how to reason about their prologues. Differential Revision: http://reviews.llvm.org/D13261 llvm-svn: 248824	2015-09-29 20:12:33 +00:00
Chad Rosier	dabe2534ed	[AArch64] Add integer pre- and post-index halfword/byte loads and stores. llvm-svn: 248817	2015-09-29 18:26:15 +00:00
Nemanja Ivanovic	2c84b29464	Addition of interfaces the BE to conform to Table A-2 of ELF V2 ABI V1.1 This patch corresponds to review: http://reviews.llvm.org/D13191 Back end portion of the fifth round of additions to altivec.h. llvm-svn: 248809	2015-09-29 17:41:53 +00:00
Chad Rosier	32d4d37e61	[AArch64] Scale offsets by the size of the memory operation. NFC. The immediate in the load/store should be scaled by the size of the memory operation, not the size of the register being loaded/stored. This change gets us one step closer to forming LDPSW instructions. This change also enables pre- and post-indexing for halfword and byte loads and stores. llvm-svn: 248804	2015-09-29 16:07:32 +00:00
Chad Rosier	a4d3217e81	[AArch64] Remove some redundant cases. NFC. llvm-svn: 248800	2015-09-29 14:57:10 +00:00
Jeroen Ketema	740f9d79ca	Arguments spilled on the stack before a function call may have alignment requirements, for example in the case of vectors. These requirements are exploited by the code generator by using move instructions that have similar alignment requirements, e.g., movaps on x86. Although the code generator properly aligns the arguments with respect to the displacement of the stack pointer it computes, the displacement itself may cause misalignment. For example if we have %3 = load <16 x float>, <16 x float>* %1, align 64 call void @bar(<16 x float> %3, i32 0) the x86 back-end emits: movaps 32(%ecx), %xmm2 movaps (%ecx), %xmm0 movaps 16(%ecx), %xmm1 movaps 48(%ecx), %xmm3 subl $20, %esp <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards movaps %xmm3, (%esp) <-- movaps requires 16-byte alignment, while %esp is not aligned as such. movl $0, 16(%esp) calll __bar To solve this, we need to make sure that the computed value with which the stack pointer is changed is a multiple af the maximal alignment seen during its computation. With this change we get proper alignment: subl $32, %esp movaps %xmm3, (%esp) Differential Revision: http://reviews.llvm.org/D12337 llvm-svn: 248786	2015-09-29 10:12:57 +00:00
NAKAMURA Takumi	0c12a3949e	[CMake] X86AsmParser: Prune redundant LINK_LIBS. It is described in LLVMBuild.txt. llvm-svn: 248771	2015-09-29 01:25:01 +00:00
Sanjay Patel	3a14f1a338	add a FIXME for a CPU model check that should have an attribute instead llvm-svn: 248746	2015-09-28 22:00:24 +00:00
Matt Arsenault	ba6aae785a	AMDGPU: Factor switch into separate function llvm-svn: 248742	2015-09-28 20:54:57 +00:00
Matt Arsenault	73aa8f687a	AMDGPU: Fix splitting x16 SMRD loads When used recursively, this would set the kill flag on the intermediate step from first splitting x16 to x8. llvm-svn: 248741	2015-09-28 20:54:52 +00:00
Matt Arsenault	e5d042cd56	AMDGPU: Fix moving SMRD loads with literal offsets on CI llvm-svn: 248740	2015-09-28 20:54:46 +00:00
Matt Arsenault	dd49c5fc1b	AMDGPU: Fix splitting SMRD with large offset The splitting of > 4 dword SMRD instructions if using an offset in an SGPR instead of an immediate was not setting the destination register, resulting an an instruction missing an operand which would assert later. Test will be included in a following commit which fixes a related issue. llvm-svn: 248739	2015-09-28 20:54:42 +00:00
Andrew Kaylor	16c4da03d5	Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing. Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com) Differential Revision: http://reviews.llvm.org/D11370 llvm-svn: 248735	2015-09-28 20:33:22 +00:00
Daniel Sanders	7727e1098c	[mips][p5600] Added P5600 processor and initial scheduler. Summary: The P5600 is an out-of-order, superscalar implementation of the MIPS32R5 architecture. The scheduler has a few missing details (see the 'Tricky Instructions' section and some quirks of the P5600 are deliberately omitted due to implementation difficulty and low chance of significant benefit (e.g. the predicate on P5600WriteEitherALU). However, testing on SingleSource is showing significant performance benefits on some apps (seven in the 10-30% range) and only one significant regression (12%) when -pre-RA-sched=linearize is given. Without -pre-RA-sched=linearize the results are more variable. Some do even better (up to 55% improvement) but increased numbers of copies are slowing others down (up to 12%). Overall, the scheduler as it currently stands is a 2.4% win with -pre-RA-sched=linearize and a 2.7% win without -pre-RA-sched=linearize. I'm sure we can improve on this further. For completeness, the FPGA this was tested on shows some failures with and without the P5600 scheduler. These appear to be scheduling related since the two test runs have fairly different sets of failing tests even after accounting for other factors (e.g. spurious connection failures) however it's not P5600 specific since we also get some for the generic scheduler. Reviewers: vkalintiris Subscribers: mpf, llvm-commits, atrick, vkalintiris Differential Revision: http://reviews.llvm.org/D12193 llvm-svn: 248725	2015-09-28 18:24:08 +00:00
Dan Gohman	05a17aa82a	[WebAssembly] Support for direct call and call_indirect. llvm-svn: 248716	2015-09-28 16:22:39 +00:00
Zoran Jovanovic	cdb64566cc	[mips] Handling of immediates bigger than 16 bits Differential Revision: http://reviews.llvm.org/D10539 llvm-svn: 248706	2015-09-28 11:11:34 +00:00
Artyom Skrobov	ad8a0638f7	[ARM] Avoid redundant checks for isThumb1Only() after supportsTailCall() supportsTailCall() has two callers. Both of them double-check isThumb1Only(), and refuse to proceed with tail-calling in that case. Therefore, it makes sense to move this check to ARMSubtarget::initSubtargetFeatures, where SupportsTailCall is initialized; and to eliminate the extra checks at the call sites. Following a review comment, added an "assert(supportsTailCall())" in IsEligibleForTailCall. NFC. llvm-svn: 248703	2015-09-28 09:44:11 +00:00
Craig Topper	862d5d8322	Remove 'const' from some ArrayRefs. ArrayRefs are already immutable. NFC llvm-svn: 248693	2015-09-28 00:15:34 +00:00
Yaron Keren	e5a9dc2f5b	Silence clang warning: variable ‘Status’ set but not used. llvm-svn: 248691	2015-09-27 21:31:33 +00:00
Matt Arsenault	1d36b717a5	AMDGPU: Remove hasPostISelHook from most instructions Since this is only needed for VOP3 and a few other special case instructions, stop setting it on everything. llvm-svn: 248657	2015-09-26 05:06:48 +00:00
Matt Arsenault	f32481372c	AMDGPU: Switch over reg class size instead of checking all super classes This gets isSGPRClass out of my profile of SIFixSGPRCopies. llvm-svn: 248656	2015-09-26 04:59:04 +00:00
Matt Arsenault	6e28010215	AMDGPU: Don't handle invalid reg classes in helper functions No tests hit these and it would be better to have checks like this explicit where they are used. llvm-svn: 248655	2015-09-26 04:53:30 +00:00
Saleem Abdulrasool	9174623b2d	AMDGPU: address -Winconsistent-missing-override Add missing override. NFC. llvm-svn: 248652	2015-09-26 04:34:52 +00:00
Matt Arsenault	8e1ddf84fe	AMDGPU: Set CopyCost of register classes These require multiple mov instructions to copy, but the default value is that 1 instruction is needed. I'm not sure if this actually changes anything. llvm-svn: 248651	2015-09-26 04:09:34 +00:00
Matt Arsenault	e98a074c42	AMDGPU: VOP3b definition cleanups llvm-svn: 248647	2015-09-26 02:25:48 +00:00
Matt Arsenault	86095b8dec	AMDGPU: Fix sched model for VOP2b instructions Trying to use the version with the explicit output operand would complain because of the missing WriteSALU. I'm not sure why it doesn't complain about this with the implicit VCC def. llvm-svn: 248646	2015-09-26 02:25:45 +00:00
Dan Gohman	d0bf981296	[WebAssembly] Rename several functions and types according to the new spec. llvm-svn: 248644	2015-09-26 01:09:44 +00:00
Ahmed Bougacha	e81610fabb	[ARM] Don't generate clrex for pre-v7 targets. Since r248294, we emit clrex, but it doesn't exist on v6. llvm-svn: 248640	2015-09-26 00:14:02 +00:00
Matt Arsenault	e229c0c45e	AMDGPU: Construct new buffer instruction when moving SMRD It's easier to understand creating a full instruction than the current situation where sometimes a new instruction is created and sometimes it is awkwardly mutated in place. llvm-svn: 248627	2015-09-25 22:21:19 +00:00
Sanjay Patel	bbbf9a1a34	merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711) This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ). The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned accesses up in performSTORECombine() because they are slow. This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving existing (perhaps questionable) lowering behavior. The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned stores. Differential Revision: http://reviews.llvm.org/D12635 llvm-svn: 248622	2015-09-25 21:49:48 +00:00
Tom Stellard	e135ffd554	AMDGPU/SI: Use .hsatext section instead of .text for HSA Reviewers: arsenm, grosbach, rafael Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12424 llvm-svn: 248619	2015-09-25 21:41:28 +00:00
Matthias Braun	c2d4befb54	MachineBasicBlock: Factor out common code into isReturnBlock() llvm-svn: 248617	2015-09-25 21:25:19 +00:00
Matt Arsenault	f743b838cb	AMDGPU: Make getNamedOperandIdx declaration readonly This matches how it is defined in the generated implementation. llvm-svn: 248598	2015-09-25 18:09:15 +00:00
Chad Rosier	1bbd7fb38e	[AArch64] Add support for generating pre- and post-index load/store pairs. llvm-svn: 248593	2015-09-25 17:48:17 +00:00
Matt Arsenault	0a10900070	AMDGPU: Disable some passes that are not meaningful Don't run passes related to stack maps, garbage collection, exceptions since these aren't useful for GPUs. There might be a few more to turn off that I'm less sure about (e.g. ShrinkWrapping) or I'm not sure how to disable (SafeStack and StackProtector) llvm-svn: 248591	2015-09-25 17:41:20 +00:00
Matt Arsenault	4bf43d4e68	AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG This fixes a select error when the i64 source was also bitcasted to v2i32 in the original source. Instead of awkwardly trying to select the modified source value and the store, replace before isel begins. Uses a worklist to avoid possible problems from mutating the DAG, although it seems to work OK without it. llvm-svn: 248589	2015-09-25 17:27:08 +00:00
Matt Arsenault	0cb8517dc6	AMDGPU: Fix recomputing dominator tree unnecessarily SIFixSGPRCopies does not modify the CFG, but this was being recomputed before running SIFoldOperands. llvm-svn: 248587	2015-09-25 17:21:28 +00:00
Matt Arsenault	2d6fdb8495	AMDGPU: Re-justify workaround and fix worked around problem When buffer resource descriptors were built, the upper two components of the descriptor were first composed into a 64-bit register because legalizeOperands assumed all operands had the same register class. Fix that problem, but keep the workaround. I'm not sure anything actually is actually emitting such a REG_SEQUENCE now. If multiple resource descriptors are set up with different base pointers, this is copied with a single s_mov_b64. We probably should fix this better by recognizing a pair of s_mov_b32 later, but for now delete the dead code. llvm-svn: 248585	2015-09-25 17:08:42 +00:00
Matt Arsenault	3ad55ec946	AMDGPU: Don't create REG_SEQUENCE with SGPR dest and VGPR sources This avoids needting to re-legalize the new REG_SEQUENCE. llvm-svn: 248584	2015-09-25 17:08:40 +00:00
Matt Arsenault	6525aa3529	AMDGPU: Fix not adding exec to defs of cmpx instruction pseudos This was only set on the final _si/_vi version, but not on the pseudos most of codegen sees. No test since these instructions aren't used yet. llvm-svn: 248583	2015-09-25 16:58:27 +00:00
Matt Arsenault	5f70436c49	AMDGPU: Improve accuracy of instruction rates for VOPC These were all using the default 32-bit VALU write class, but the i64/f64 compares are half rate. I'm not sure this is really correct, because they are still using the write to VALU write class, even though they really write to the SALU. llvm-svn: 248582	2015-09-25 16:58:25 +00:00
Saleem Abdulrasool	8e99f50768	ARM: make -Asserts,-Werror=unused-variable build happy The value was only used in an assertion. Sink the variable usage into the assertion. llvm-svn: 248562	2015-09-25 05:41:02 +00:00
Saleem Abdulrasool	fe83b50289	ARM: address WoA division limitation We now emit the compiler generated divide by zero check that was needed for the MSVC routines. We construct a psuedo-instruction for the DBZ check as the operation requires splitting up the BB. For the 64-bit operations, we need to custom expand the node as we need to insert the DBZ check and then emit the libcall to the appropriate name. Because this is target specific, it seemed better to reproduce the expansion operation from the target-agnostic type legalization rather than sink this there to avoid the duplication. The division library calls now match MSVC semantically. llvm-svn: 248561	2015-09-25 05:15:46 +00:00
Matt Arsenault	8aa9973696	AMDGPU: Remove unused includes llvm-svn: 248553	2015-09-25 00:28:43 +00:00
Chad Rosier	b02f5a5a1f	[AArch64] Improve the readability of the ld/st optimization pass. NFC. In this context, MI is an add/sub instruction not a loads/store. llvm-svn: 248540	2015-09-24 21:27:49 +00:00
Simon Pilgrim	68d0050c6a	[X86][SSE2] Fix zero/any extension shuffles that don't start from the first element Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H llvm-svn: 248536	2015-09-24 21:02:17 +00:00

1 2 3 4 5 ...

34416 Commits