llvm-project

Commit Graph

Author	SHA1	Message	Date
Duncan P. N. Exon Smith	d77de6495e	X86: Remove implicit ilist iterator conversions, NFC llvm-svn: 250741	2015-10-19 21:48:29 +00:00
Benjamin Kramer	335332329b	Remove CRLF newlines. NFC. llvm-svn: 250698	2015-10-19 13:05:25 +00:00
Elena Demikhovsky	20662e39f1	Removed parameter "Consecutive" from isLegalMaskedLoad() / isLegalMaskedStore(). Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case. Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces. Differential Revision: http://reviews.llvm.org/D13850 llvm-svn: 250686	2015-10-19 07:43:38 +00:00
Asaf Badouh	696e8e0bb7	[X86][AVX512DQ] add scalar fpclass Differential Revision: http://reviews.llvm.org/D13769 llvm-svn: 250650	2015-10-18 11:04:38 +00:00
Igor Breger	cbb9550537	AVX512: Lowering i8/i16 vector CTLZ using the dword LZCNT vector instruction Differential Revision: http://reviews.llvm.org/D13632 llvm-svn: 250649	2015-10-18 09:56:39 +00:00
Simon Pilgrim	86c5e85e84	[X86][XOP] Add VPROT instruction opcodes Added X86ISD opcodes for VPROT vector rotate by variable and by immediate. llvm-svn: 250620	2015-10-17 19:04:24 +00:00
Craig Topper	9ff9bf4959	Replace a custom table sort check with std::is_sorted. Change a function to take ArrayRef instead of pointer and length. NFC llvm-svn: 250615	2015-10-17 16:37:13 +00:00
Simon Pilgrim	a18ae9bd70	[CostModel] Fixed AVX integer shift costs Targets with AVX but without AVX2 were incorrectly reporting costs of 256-bit integer shifts. llvm-svn: 250611	2015-10-17 13:23:38 +00:00
Simon Pilgrim	5b65f28fe7	[X86][FastISel] Teach how to select SSE4A nontemporal stores. Add FastISel support for SSE4A scalar float / double non-temporal stores Follow up to D13698 Differential Revision: http://reviews.llvm.org/D13773 llvm-svn: 250610	2015-10-17 13:04:42 +00:00
Reid Kleckner	28e490342b	[WinEH] Fix stack alignment in funclets and ParentFrameOffset calculation Our previous value of "16 + 8 + MaxCallFrameSize" for ParentFrameOffset is incorrect when CSRs are involved. We were supposed to have a test case to catch this, but it wasn't very rigorous. The main effect here is that calling _CxxThrowException inside a catchpad doesn't immediately crash on MOVAPS when you have an odd number of CSRs. llvm-svn: 250583	2015-10-16 23:43:27 +00:00
Sanjay Patel	bbd524496c	[x86] promote 'add nsw' to a wider type to allow more combines The motivation for this patch starts with PR20134: https://llvm.org/bugs/show_bug.cgi?id=20134 void foo(int *a, int i) { a[i] = a[i+1] + a[i+2]; } It seems better to produce this (14 bytes): movslq %esi, %rsi movl 0x4(%rdi,%rsi,4), %eax addl 0x8(%rdi,%rsi,4), %eax movl %eax, (%rdi,%rsi,4) Rather than this (22 bytes): leal 0x1(%rsi), %eax cltq leal 0x2(%rsi), %ecx movslq %ecx, %rcx movl (%rdi,%rcx,4), %ecx addl (%rdi,%rax,4), %ecx movslq %esi, %rax movl %ecx, (%rdi,%rax,4) The most basic problem (the first test case in the patch combines constants) should also be fixed in InstCombine, but it gets more complicated after that because we need to consider architecture and micro-architecture. For example, AArch64 may not see any benefit from the more general transform because the ISA solves the sexting in hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that: FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm also not sure if FeatureSlowLEA should also mean "slow complex addressing mode". I see no perf differences on test-suite with this change running on AMD Jaguar, but I see small code size improvements when building clang and the LLVM tools with the patched compiler. A more general solution to the sext(add nsw(x, C)) problem that works for multiple targets is available in CodeGenPrepare, but it may take quite a bit more work to get that to fire on all of the test cases that this patch takes care of. Differential Revision: http://reviews.llvm.org/D13757 llvm-svn: 250560	2015-10-16 22:14:12 +00:00
Andrew Kaylor	09b39acc03	Fix assertion failure with fp128 to unsigned i64 conversion Patch by Mitch Bodart Differential Revision: http://reviews.llvm.org/D13780 llvm-svn: 250550	2015-10-16 20:39:20 +00:00
Craig Topper	09b6598572	[X86] Add fxsr feature flag for fxsave/fxrestore instructions. llvm-svn: 250497	2015-10-16 06:03:09 +00:00
Evgeniy Stepanov	9addbc9fc1	Revert "[safestack] Fast access to the unsafe stack pointer on AArch64/Android." Breaks the hexagon buildbot. llvm-svn: 250461	2015-10-15 21:26:49 +00:00
Evgeniy Stepanov	142947e9f0	[safestack] Fast access to the unsafe stack pointer on AArch64/Android. Android libc provides a fixed TLS slot for the unsafe stack pointer, and this change implements direct access to that slot on AArch64 via __builtin_thread_pointer() + offset. This change also moves more code into TargetLowering and its target-specific subclasses to get rid of target-specific codegen in SafeStackPass. This change does not touch the ARM backend because ARM lowers builting_thread_pointer as aeabi_read_tp, which is not available on Android. llvm-svn: 250456	2015-10-15 20:50:16 +00:00
JF Bastien	2cdd5e4710	x86: preserve flags when folding atomic operations D4796 taught LLVM to fold some atomic integer operations into a single instruction. The pattern was unaware that the instructions clobbered flags. I fixed some of this issue in D13680 but had missed INC/DEC. This patch adds the missing EFLAGS definition. llvm-svn: 250438	2015-10-15 18:24:52 +00:00
JF Bastien	5b327712b0	x86 FP atomic codegen: don't drop globals, stack Summary: x86 codegen is clever about generating good code for relaxed floating-point operations, but it was being silly when globals and immediates were involved, forgetting where the global was and loading/storing from/to the wrong place. The same applied to hard-coded address immediates. Don't let it forget about the displacement. This fixes https://llvm.org/bugs/show_bug.cgi?id=25171 A very similar bug when doing floating-points atomics to the stack is also fixed by this patch. This fixes https://llvm.org/bugs/show_bug.cgi?id=25144 Reviewers: pete Subscribers: llvm-commits, majnemer, rsmith Differential Revision: http://reviews.llvm.org/D13749 llvm-svn: 250429	2015-10-15 16:46:29 +00:00
Benjamin Kramer	5dfcda73d5	[X86] Rip out orphaned method declarations and other dead code. NFC. llvm-svn: 250406	2015-10-15 14:09:59 +00:00
Igor Breger	d7bae451de	AVX512: Implemented DAG lowering for shuff62x2/shufi62x2 instructions ( shuffle packed values at 128-bit granularity ) Differential Revision: http://reviews.llvm.org/D13648 llvm-svn: 250400	2015-10-15 13:29:07 +00:00
Igor Breger	b4bb190eed	AVX512: Implemented encoding and intrinsics for vpternlogd/q. Differential Revision: http://reviews.llvm.org/D13768 llvm-svn: 250396	2015-10-15 12:33:24 +00:00
Elena Demikhovsky	ecff21b297	AVX-512: Fixed a bug in shuffle lowering 32-bit mode AVX-512 bit shuffle fails on 32 bit since we create a vector of 64-bit constants. I split 8x64-bit const vector to 16x32 on 32-bit mode. Differential Revision: http://reviews.llvm.org/D13644 llvm-svn: 250390	2015-10-15 11:35:33 +00:00
Craig Topper	fd2cc7cd8a	Add XSAVE/XSAVEOPT to KNL processor. llvm-svn: 250362	2015-10-15 03:56:54 +00:00
Andrea Di Biagio	c47edbef4c	[x86][FastISel] Teach how to select nontemporal stores. This patch teaches x86 fast-isel how to select nontemporal stores. On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords. Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal vector stores. Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti' for doubleword/quadword nontemporal stores. In the case of nontemporal stores of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of movntps/movntpd/movntdq. With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now always get the expected (V)MOVNT instructions. The lack of fast-isel support for nontemporal stores was spotted when analyzing the -O0 codegen for nontemporal stores. Differential Revision: http://reviews.llvm.org/D13698 llvm-svn: 250285	2015-10-14 10:03:13 +00:00
Craig Topper	0ee356951a	[X86] Add XSAVE feature flags to their various processors. llvm-svn: 250268	2015-10-14 05:37:38 +00:00
Sanjay Patel	85030aa1bd	function names should start with a lower case letter; NFC llvm-svn: 250174	2015-10-13 16:23:00 +00:00
Sanjay Patel	b5723d0dbd	don't repeat function/class/variable names in comments; NFC llvm-svn: 250162	2015-10-13 15:12:27 +00:00
Michael Kuperstein	af22dafc8b	Fix line-ending issue. NFC. llvm-svn: 250151	2015-10-13 06:22:30 +00:00
Craig Topper	24b56a62bb	[X86] Mark the AAD and AAM aliases as not valid in 64-bit mode. llvm-svn: 250148	2015-10-13 05:12:07 +00:00
Craig Topper	4f76372afc	[X86] Change all the i8imm operands in XOP instructions to u8imm so the parser will check the size. llvm-svn: 250147	2015-10-13 05:06:25 +00:00
JF Bastien	986ed68eed	x86: preserve flags when folding atomic operations Summary: D4796 taught LLVM to fold some atomic integer operations into a single instruction. The pattern was unaware that the instructions clobbered flags. This patch adds the missing EFLAGS definition. Floating point operations don't set flags, the subsequent fadd optimization is therefore correct. The same applies for surrounding load/store optimizations. Reviewers: rsmith, rtrieu Subscribers: llvm-commits, reames, morisset Differential Revision: http://reviews.llvm.org/D13680 llvm-svn: 250135	2015-10-13 00:28:47 +00:00
Reid Kleckner	4a5f35c0ae	Make Win64 localescape offsets FP relative instead of SP relative We made them SP relative back in March (r233137) because that's the value the runtime passes to EH functions. With the new cleanuppad IR, funclets adjust their frame argument from SP to FP, so our offsets should now be FP-relative. llvm-svn: 250088	2015-10-12 19:43:34 +00:00
Andrea Di Biagio	b0fe4eb199	[x86] Fix wrong lowering of vsetcc nodes (PR25080). Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong assumption that for non-AVX512 targets, the source type and destination type of a type-legalized setcc node were always the same type. This assumption was unfortunately incorrect; the type legalizer is not always able to promote the return type of a setcc to the same type as the first operand of a setcc. In the case of a vsetcc node, the legalizer firstly checks if the first input operand has a legal type. If so, then it promotes the return type of the vsetcc to that same type. Otherwise, the return type is promoted to the 'next legal type', which, for vectors of MVT::i1 is always a 128-bit integer vector type. Example (-mattr=+avx): %0 = trunc <8 x i32> %a to <8 x i23> %1 = icmp eq <8 x i23> %0, zeroinitializer The initial selection dag for the code above is: v8i1 = setcc t5, t7, seteq:ch t5: v8i23 = truncate t2 t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1 t7: v8i32 = build_vector of all zeroes. The type legalizer would firstly check if 't5' has a legal type. If so, then it would reuse that same type to promote the return type of the setcc node. Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to promote the return type of the setcc node. Consequently, the setcc return type is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the following dag node: v8i16 = setcc t32, t25, seteq:ch where t32 and t25 are now values of type v8i32. Before this patch, function LowerVSETCC would have wrongly expanded the setcc to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an instruction. In our case, ISel would have matched a VPCMPEQWrr: t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25 However, t36 and t25 are both VR256, while the result type is instead of class VR128. This inconsistency ended up causing the insertion of COPY instructions like this: %vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3 Which is an invalid full copy (not a sub register copy). Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy instruction" in the attempt to expand the malformed pseudo COPY instructions. This patch fixes the problem adding the missing logic in LowerVSETCC to handle the corner case of a setcc with 128-bit return type and 256-bit operand type. This problem was originally reported by Dimitry as PR25080. It has been latent for a very long time. I have added the minimal reproducible from that bugzilla as test setcc-lowering.ll. Differential Revision: http://reviews.llvm.org/D13660 llvm-svn: 250085	2015-10-12 19:22:30 +00:00
Sanjay Patel	0dc91b3143	combine predicates; NFCI llvm-svn: 250075	2015-10-12 18:15:08 +00:00
Sanjay Patel	b814ef1ad6	fix typos; NFC llvm-svn: 250059	2015-10-12 16:09:59 +00:00
Sanjay Patel	53d1d8b731	fix capitalization; NFC llvm-svn: 250049	2015-10-12 15:24:01 +00:00
Amjad Aboud	1db6d7af46	[X86] Add XSAVE intrinsic family Add intrinsics for the XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64) XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64) XSAVEC instructions (XSAVEC/XSAVEC64) XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64) Differential Revision: http://reviews.llvm.org/D13012 llvm-svn: 250029	2015-10-12 11:47:46 +00:00
Andrea Di Biagio	a0922ed8fe	[x86] PR24562: fix incorrect folding of PSHUFB nodes with a mask where all indices have the most significant bit set. This patch fixes a problem in function 'combineX86ShuffleChain' that causes a chain of shuffles to be wrongly folded away when the combined shuffle mask has only one element. We may end up with a combined shuffle mask of one element as a result of multiple calls to function 'canWidenShuffleElements()'. Function canWidenShuffleElements attempts to simplify a shuffle mask by widening the size of the elements being shuffled. For every pair of shuffle indices, function canWidenShuffleElements checks if indices refer to adjacent elements. If all pairs refer to "adjacent" elements then the shuffle mask is safely widened. As a consequence of widening, we end up with a new shuffle mask which is half the size of the original shuffle mask. The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero indices. Function canWidenShuffleElements would combine each pair of SM_SentinelZero indices into a single SM_SentinelZero index. So, in a logarithmic number of steps (4 in this case), the pshufb mask is simplified to a mask with only one index which is equal to SM_SentinelZero. Before this patch, function combineX86ShuffleChain wrongly assumed that a mask of size one is always equivalent to an identity mask. So, the entire shuffle chain was just folded away as the combined shuffle mask was treated as a no-op mask. With this patch we know check if the only element of a combined shuffle mask is SM_SentinelZero. In case, we propagate a zero vector. Differential Revision: http://reviews.llvm.org/D13364 llvm-svn: 250027	2015-10-12 11:25:41 +00:00
Craig Topper	8d2e6bc25b	[X86] Use u8imm for the immediate type for all shift and rotate instructions. This way the assembler will perform range checking. Believe this matches gas behavior. llvm-svn: 250016	2015-10-12 06:23:10 +00:00
Craig Topper	d6b661dbf0	[X86] Add support to assembler and MCInst lowering to use the other vmovq %xmmX, %xmmX encoding if it would be a shorter VEX encoding. llvm-svn: 250014	2015-10-12 04:57:59 +00:00
Craig Topper	635e05df0a	[X86] Cleanup formatting a bit. NFC llvm-svn: 250013	2015-10-12 04:27:17 +00:00
Craig Topper	5be914eda1	[X86] Change the immediate for IN/OUT instructions to u8imm so the assembly parser will check the size. llvm-svn: 250012	2015-10-12 04:17:55 +00:00
Craig Topper	95fffba227	[X86] Add some instruction aliases to get the assembly parser table to favor arithmetic instructions with 8-bit immediates over the forms that implicitly use the ax/eax/rax. This allows us to remove the explicit code for working around the existing priority llvm-svn: 250011	2015-10-12 03:39:57 +00:00
Craig Topper	fcc34bdee0	[X86] Fix CMP and TEST with al/ax/eax/rax to not mark EFLAGS as a use or al/ax/eax/rax as a def. Probably doesn't have a functional affect since these aren't used in isel. llvm-svn: 249994	2015-10-11 19:54:02 +00:00
Craig Topper	87990ee4ec	[X86] Remove special validation for INT immediate operand from AsmParser. Instead mark its operand type as u8imm which will cause it to fail to match. This is more consistent with other instruction behavior. This also fixes a bug where negative immediates below -128 were not being reported as errors. llvm-svn: 249989	2015-10-11 18:27:24 +00:00
Craig Topper	a71630729d	[X86] Simplify immediate range checking code. llvm-svn: 249979	2015-10-11 16:38:14 +00:00
Simon Pilgrim	52d47e5704	[X86][XOP] Added support for the lowering of 128-bit vector integer comparisons to XOP PCOM/PCOMU instructions. The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646). llvm-svn: 249976	2015-10-11 14:15:17 +00:00
Craig Topper	7143d8001a	Use range-based for loops. NFC. llvm-svn: 249941	2015-10-10 05:25:06 +00:00
Craig Topper	7d5b23101c	Use emplace_back instead of a constructor call and push_back. NFC llvm-svn: 249940	2015-10-10 05:25:02 +00:00
David Majnemer	bfa5b98201	[WinEH] Remove more dead code wineh-parent is dead, so is ValueOrMBB. llvm-svn: 249920	2015-10-10 00:04:29 +00:00
Reid Kleckner	14e773500e	[WinEH] Delete the old landingpad implementation of Windows EH The new implementation works at least as well as the old implementation did. Also delete the associated preparation tests. They don't exercise interesting corner cases of the new implementation. All the codegen tests of the EH tables have already been ported. llvm-svn: 249918	2015-10-09 23:34:53 +00:00

1 2 3 4 5 ...

12227 Commits