llvm-project

Commit Graph

Author	SHA1	Message	Date
Benjamin Kramer	b7f5fb5751	Legalizer: Add support for splitting insert_subvectors. We handle this by spilling the whole thing to the stack and doing the insertion as a store. PR19492. This happens in real code because the vectorizer creates v2i128 when AVX is enabled. llvm-svn: 211435	2014-06-21 12:56:42 +00:00
Andrea Di Biagio	e5015d8aba	[X86] Add ISel patterns to select SSE3/AVX ADDSUB instructions. This patch adds ISel patterns to select SSE3/AVX ADDSUB instructions from a sequence of "vadd + vsub + blend". Example: /// typedef float float4 __attribute__((ext_vector_type(4))); float4 foo(float4 A, float4 B) { float4 X = A - B; float4 Y = A + B; return (float4){X[0], Y[1], X[2], Y[3]}; } /// Before this patch, (with flag -mcpu=corei7) llc produced the following assembly sequence: movaps %xmm0, %xmm2 addps %xmm1, %xmm2 subps %xmm1, %xmm0 blendps $10, %xmm2, %xmm0 With this patch, we now get a single addsubps %xmm1, %xmm0 llvm-svn: 211427	2014-06-21 01:31:15 +00:00
Reid Kleckner	4a01230db4	Generate native unwind info on Win64 This patch enables LLVM to emit Win64-native unwind info rather than DWARF CFI. It handles all corner cases (I hope), including stack realignment. Because the unwind info is not flexible enough to describe stack frames with a gap of unknown size in the middle, such as the one caused by stack realignment, I modified register spilling code to place all spills into the fixed frame slots, so that they can be accessed relative to the frame pointer. Patch by Vadim Chugunov! Reviewed By: rnk Differential Revision: http://reviews.llvm.org/D4081 llvm-svn: 211399	2014-06-20 20:35:47 +00:00
Tom Stellard	ae4c9e7bc3	R600/SI: Add patterns for ctpop inside a branch llvm-svn: 211378	2014-06-20 17:06:11 +00:00
Tom Stellard	9c603ebca4	R600/SI: Add a pattern for f32 ftrunc llvm-svn: 211377	2014-06-20 17:06:09 +00:00
Tom Stellard	a79e9f0f6d	R600: Expand vector flog2 llvm-svn: 211376	2014-06-20 17:06:07 +00:00
Tom Stellard	5222a88653	R600: Expand vector fexp2 llvm-svn: 211375	2014-06-20 17:06:05 +00:00
Tom Stellard	c9dedb8e29	R600/SI: Add a VALU pattern for i64 xor llvm-svn: 211373	2014-06-20 17:05:57 +00:00
Ulrich Weigand	59c6ab20d6	[PowerPC] Fix small argument stack slot offset for LE When small arguments (structures < 8 bytes or "float") are passed in a stack slot in the ppc64 SVR4 ABI, they must reside in the least significant part of that slot. On BE, this means that an offset needs to be added to the stack address of the parameter, but on LE, the least significant part of the slot has the same address as the slot itself. This changes the PowerPC back-end ABI code to only add the small argument stack slot offset for BE. It also adds test cases to verify the correct behavior on both BE and LE. llvm-svn: 211368	2014-06-20 16:34:05 +00:00
Rafael Espindola	e5bb30d9a7	Move test so that it is skipped if the ARM target is not enabled. llvm-svn: 211366	2014-06-20 15:30:38 +00:00
Oliver Stannard	5dc2934ba2	Emit the ARM build attributes ABI_PCS_wchar_t and ABI_enum_size. Emit the ARM build attributes ABI_PCS_wchar_t and ABI_enum_size based on module flags metadata. llvm-svn: 211349	2014-06-20 10:08:11 +00:00
Zoran Jovanovic	6a29b55a5a	[mips][mips64r6] Added LSA/DLSA instructions Differential Revision: http://reviews.llvm.org/D3897 llvm-svn: 211346	2014-06-20 09:28:09 +00:00
Alp Toker	1d099d9339	Fix typos llvm-svn: 211304	2014-06-19 19:41:26 +00:00
Andrea Di Biagio	54b0949af9	[X86] Teach how to combine horizontal binop even in the presence of undefs. Before this change, the backend was unable to fold a build_vector dag node with UNDEF operands into a single horizontal add/sub. This patch teaches how to combine a build_vector with UNDEF operands into a horizontal add/sub when possible. The algorithm conservatively avoids combining a build_vector with only a single non-UNDEF operand. Added test haddsub-undef.ll to verify that we correctly fold horizontal binop even in the presence of UNDEFs. llvm-svn: 211265	2014-06-19 10:29:41 +00:00
Matt Arsenault	8e34ecb797	R600: Add a few tests I forgot to add. These belong with r210827 llvm-svn: 211253	2014-06-19 04:24:43 +00:00
Matt Arsenault	a0050b0961	R600/SI: Add intrinsics for various math instructions. These will be used for custom lowering and for library implementations of various math functions, so it's useful to expose these as builtins. llvm-svn: 211247	2014-06-19 01:19:19 +00:00
Matt Arsenault	692bd5ec2f	R600: Handle fnearbyint The difference from rint isn't really relevant here, so treat them as equivalent. OpenCL doesn't have nearbyint, so this is sort of pointless other than for completeness. llvm-svn: 211229	2014-06-18 22:03:45 +00:00
Marek Olsak	51b8e7b2e7	R600/SI: add gather4 and getlod intrinsics (v3) This contains all the previous patches + getlod support on top of it. It doesn't use SDNodes anymore, so it's quite small. It also adds v16i8 to SReg_128, which is used for the sampler descriptor. Reviewed-by: Tom Stellard llvm-svn: 211228	2014-06-18 22:00:29 +00:00
Jan Vesely	85f0dbce5c	R600: Expand vector fceil Move fp64 fceil tests to fceil64.ll v2: rebase Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 211194	2014-06-18 17:57:29 +00:00
Ulrich Weigand	ad0cb91ed9	[PowerPC] Simplify and improve loading into TOC register During an indirect function call sequence on the 64-bit SVR4 ABI, generated code must load and then restore the TOC register. This does not use a regular LOAD instruction since the TOC register r2 is marked as reserved. Instead, there are two special instruction patterns: let RST = 2, DS = 2 in def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg), "ld 2, 8($reg)", IIC_LdStLD, [(PPCload_toc i64:$reg)]>, isPPC64; let RST = 2, DS = 10, RA = 1 in def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins), "ld 2, 40(1)", IIC_LdStLD, [(PPCtoc_restore)]>, isPPC64; Note that these not only restrict the destination of the load to r2, but they also restrict the source of the load to particular address combinations. The latter is a problem when we want to support the ELFv2 ABI, since there the TOC save slot is no longer at 40(1). This patch replaces those two instructions with a single instruction pattern that only hard-codes r2 as destination, but supports generic addresses as source. This will allow supporting the ELFv2 ABI, and also helps generate more efficient code for calls to absolute addresses (allowing simplification of the ppc64-calls.ll test case). llvm-svn: 211193	2014-06-18 17:52:49 +00:00
Ulrich Weigand	e581920d12	[PowerPC] Add back test case for absolute calls (removed in r211174) As requested by Hal Finkel, this adds back a test for calls to a known-constant function pointer value, and verifies that the 64-bit SVR4 indirect function call sequence is used. llvm-svn: 211190	2014-06-18 17:28:56 +00:00
Arnold Schwaighofer	fc308f5c9f	Add a triple so that the right syntax is chosen on Mac OS X systems llvm-svn: 211188	2014-06-18 17:20:49 +00:00
Matt Arsenault	43160e7af2	R600/SI: Add intrinsics for brev instructions llvm-svn: 211187	2014-06-18 17:13:57 +00:00
Matt Arsenault	dbc9aae1fb	R600/SI: Prettier operand printing for 64-bit ops. Copy what is done for 32-bit already so the order is about the same. llvm-svn: 211186	2014-06-18 17:13:51 +00:00
Matheus Almeida	784f797d4c	[mips] SYNC $stype instruction was added in Mips32 but SYNC with an implied operand ($stype = 0) is valid since Mips2. llvm-svn: 211185	2014-06-18 17:10:30 +00:00
Matt Arsenault	4601093267	R600: Implement f64 ftrunc, ffloor and fceil. CI has instructions for these, so this fixes them for older hardware. llvm-svn: 211183	2014-06-18 17:05:30 +00:00
Matt Arsenault	e8208ec95b	R600: Custom lower f64 frint for pre-CI llvm-svn: 211182	2014-06-18 17:05:26 +00:00
Adam Nemet	efd0785d82	[X86] AVX512: Add non-temporal stores Note that I followed the AVX2 convention here and didn't add LLVM intrinsics for stores. These can be generated with the nontemporal hint on LLVM IR stores (see new test). The GCC builtins are lowered directly into nontemporal stores. <rdar://problem/17082571> llvm-svn: 211176	2014-06-18 16:51:10 +00:00
Ulrich Weigand	9aa09ef30f	[PowerPC] Do not use BLA with the 64-bit SVR4 ABI The PowerPC back-end uses BLA to implement calls to functions at known-constant addresses, which is apparently used for certain system routines on Darwin. However, with the 64-bit SVR4 ABI, this is actually incorrect. An immediate function pointer value on this platform is not directly usable as a target address for BLA: - in the ELFv1 ABI, the function pointer value refers to the function descriptor, not the code address - in the ELFv2 ABI, the function pointer value refers to the global entry point, but BL(A) would only be correct when calling the local entry point This bug didn't show up since using immediate function pointer values is not usually done in the 64-bit SVR4 ABI in the first place. However, I ran into this issue with a certain use case of LLVM as JIT, where immediate function pointer values were used to implement callbacks from JITted code to helpers in statically compiled code. Fixed by simply not using BLA with the 64-bit SVR4 ABI. llvm-svn: 211174	2014-06-18 16:14:04 +00:00
Cameron McInally	f10a7c963b	Add pattern for unsigned v4i32->v4f64 convert on AVX512. llvm-svn: 211164	2014-06-18 14:04:37 +00:00
Jan Vesely	ecf5133a2b	R600: Implement 64bit SRA v2: Use capitalized variable name Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 211159	2014-06-18 12:27:17 +00:00
Jan Vesely	900ff2e74b	R600: Implement 64bit SRL v2: use C++ style comment Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 211158	2014-06-18 12:27:15 +00:00
Jan Vesely	25f362766e	R600: Implement 64bit SHL v2: Use c++ style comment Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 211157	2014-06-18 12:27:13 +00:00
Tim Northover	d82ed2e581	DAG: move sret demotion into most basic LowerCallTo implementation. It looks like there are two versions of LowerCallTo here: the SelectionDAGBuilder one is designed to operate on LLVM IR, and the TargetLowering one in the case where everything is at DAG level. Previously, only the SelectionDAGBuilder variant could handle demoting an impossible return to sret semantics (before delegating to the TargetLowering version), but this functionality is also useful for certain libcalls (e.g. 128-bit operations on 32-bit x86). So this commit moves the sret handling down a level. rdar://problem/17242889 llvm-svn: 211155	2014-06-18 11:52:44 +00:00
Kevin Qin	f0ec9aff2a	[AArch64] Fix a pattern match failure caused by creating improper CONCAT_VECTOR. ReconstructShuffle() may wrongly create a CONCAT_VECTOR trying to concat 2 of v2i32 into v4i16. This commit is to fix this issue and try to generate UZP1 instead of lots of MOV and INS. Patch is initialized by Kevin Qin, and refactored by Tim Northover. llvm-svn: 211144	2014-06-18 05:54:42 +00:00
Louis Gerbarg	343f5cdfad	Allow X86FastIsel to cope with 64 bit absolute relocations This patch is a follow up to r211040 & r211052. Rather than bailing out of fast isel this patch will generate an alternate instruction (movabsq) instead of the leaq. While this will always have enough room to handle the 64 bit displacement it is generally overkill for internal symbols (most displacements will be within 32 bits) but since we have no way of communicating the code model to the assembler in order to avoid flagging an absolute leal/leaq as illegal when using a symbolic displacement. llvm-svn: 211130	2014-06-17 23:22:41 +00:00
Juergen Ributzka	aa60209311	[FastISel][X86] Optimize predicates and fold CMP instructions. This optimizes predicates for certain compares, such as fcmp oeq %x, %x to fcmp ord %x, %x. The latter one is more efficient to generate. The same optimization is applied to conditional branches. llvm-svn: 211126	2014-06-17 21:55:43 +00:00
Matt Arsenault	295b86e81d	R600/SI: Match cttz_zero_undef llvm-svn: 211116	2014-06-17 17:36:27 +00:00
Matt Arsenault	8579601050	R600/SI: Match ctlz_zero_undef llvm-svn: 211115	2014-06-17 17:36:24 +00:00
Tom Stellard	880a80ad07	R600: Use LDS and vectors for private memory llvm-svn: 211110	2014-06-17 16:53:14 +00:00
Tom Stellard	aad4659470	SelectionDAG: Expand i64 = FP_TO_SINT i32 llvm-svn: 211108	2014-06-17 16:53:07 +00:00
Juergen Ributzka	e35705675f	[FastISel][X86] Fix previous refactoring commit (r211077) Overlooked that fcmp_une uses an "or" instead of an "and" for combining the flags. llvm-svn: 211104	2014-06-17 14:47:45 +00:00
Tim Northover	d5531f72dc	AArch64: estimate inline asm length during branch relaxation To make sure branches are in range, we need to do a better job of estimating the length of an inline assembly block than "it's probably 1 instruction, who'd write asm with more than that?". Fortunately there's already a (highly suspect, see how many ways you can think of to break it!) callback for this purpose, which is used by the other targets. rdar://problem/17277590 llvm-svn: 211095	2014-06-17 11:31:42 +00:00
Juergen Ributzka	2da1bbc113	[FastISel][X86] Refactor the code to get the X86 condition from a helper function. NFC. Make use of helper functions to simplify the branch and compare instruction selection in FastISel. Also add test cases for compare and conditional branch. llvm-svn: 211077	2014-06-16 23:58:24 +00:00
Reed Kotler	9fe3bfd087	Add load/store functionality Summary: This patch allows non conversions like i1=i2; where both are global ints. In addition, arithmetic and other things start to work since fast-isel will use existing patterns for non fast-isel from tablegen files where applicable. In addition i8, i16 will work in this limited context for assignment without the need for sign extension (zero or signed). It does not matter how i8 or i16 are loaded (zero or sign extended) since only the 8 or 16 relevant bits are used and clang will ask for sign extension before using them in arithmetic. This is all made more complete in forthcoming patches. For example: int i, j=1, k=3; void foo() { i = j + k; } Keep in mind that this pass is not enabled right now and is an experimental pass. It can only be enabled with a hidden option to llvm of -mips-fast-isel. Test Plan: Run test-suite, loadstore2.ll and I will run some executable tests. Reviewers: dsanders Subscribers: mcrosier Differential Revision: http://reviews.llvm.org/D3856 llvm-svn: 211061	2014-06-16 22:05:47 +00:00
Bill Schmidt	5d82f09b53	[PPC64] Fix PR19893 - improve code generation for local function addresses Rafael opened http://llvm.org/bugs/show_bug.cgi?id=19893 to track non-optimal code generation for forming a function address that is local to the compile unit. The existing code was treating both local and non-local functions identically. This patch fixes the problem by properly identifying local functions and generating the proper addis/addi code. I also noticed that Rafael's earlier changes to correct the surrounding code in PPCISelLowering.cpp were also needed for fast instruction selection in PPCFastISel.cpp, so this patch fixes that code as well. The existing test/CodeGen/PowerPC/func-addr.ll is modified to test the new code generation. I've added a -O0 run line to test the fast-isel code as well. Tested on powerpc64[le]-unknown-linux-gnu with no regressions. llvm-svn: 211056	2014-06-16 21:36:02 +00:00
Tim Northover	b45c3b74b4	ARM: implement correct atomic operations on v7M ARM v7M has ldrex/strex but not ldrexd/strexd. This means 32-bit operations should work as normal, but 64-bit ones are almost certainly doomed. Patch by Phoebe Buckheister. llvm-svn: 211042	2014-06-16 18:49:36 +00:00
Louis Gerbarg	a5360c4cd8	Fix illegal relocations in X86FastISel On x86_64 the lea instruction can only use a 32 bit immediate value. When the code is compiled statically the RIP register is not used, meaning the immediate is all that can be used for the relocation, which is not sufficient in the case of targets more than +/- 2GB away. This patch bails out of fast isel in those cases and reverts to DAG which does the right thing. Test case included. llvm-svn: 211040	2014-06-16 17:35:40 +00:00
Cameron McInally	0d0489cea6	Hook up vector int_ctlz for AVX512. llvm-svn: 211024	2014-06-16 14:12:28 +00:00
Daniel Sanders	00463119a5	[mips][mips64r6] cl[oz], and dcl[oz] are re-encoded in MIPS32r6/MIPS64r6 Summary: There is no change to the restrictions, just the result register is stored once in the encoding rather than twice. The rt field is zero in MIPS32r6/MIPS64r6. Depends on D4119 Reviewers: zoran.jovanovic, jkolek, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D4120 llvm-svn: 211019	2014-06-16 13:18:59 +00:00

10048 Commits