llvm-project

Author	SHA1	Message	Date
Evandro Menezes	405c90e6cc	[AArch64] Adjust the scheduling model for Exynos M1. Further refine the model for branches. llvm-svn: 280736	2016-09-06 19:22:29 +00:00
Evandro Menezes	77e6b5d4e0	[AArch64] Adjust the scheduling model for Exynos M1. Further refine the model for stores. llvm-svn: 280735	2016-09-06 19:22:27 +00:00
Evandro Menezes	199cad4f17	[AArch64] Adjust the scheduling model for Exynos M1. Further refine the model for loads. llvm-svn: 280734	2016-09-06 19:22:19 +00:00
Davide Italiano	5715012b9e	[MCTargetDesc] Delete dead code. Found by GCC7 -Wunused-function. Also unbreak newer gcc build with -Werror. llvm-svn: 280726	2016-09-06 18:02:09 +00:00
Krzysztof Parzyszek	7c9b012629	[RDF] Ignore undef use operands llvm-svn: 280717	2016-09-06 17:03:13 +00:00
Chris Dewhurst	92cac9322d	[Sparc][Leon] Corrected supported atomics size for processors supporting Leon CASA instruction back to 32 bits. This was erroneously checked in for 64 bits while trying to find if there was a way to get 64 bit atomicity in Leon processors. There is not and this change should not have been checked in. There is no unit test for this as the existing unit tests test for behaviour to 32 bits, which was the original intention of the code. llvm-svn: 280710	2016-09-06 14:41:09 +00:00
Simon Dardis	b432a3ed7e	[mips] Tighten FastISel restrictions LLVM PR/29052 highlighted that FastISel for MIPS attempted to lower arguments assuming that it was using the paired 32bit registers to perform operations for f64. This mode of operation is not supported for MIPSR6. This patch resolves the reported issue by adding additional checks for unsupported floating point unit configuration. Thanks to mike.k for reporting this issue! Reviewers: seanbruno, vkalintiris Differential Review: https://reviews.llvm.org/D23795 llvm-svn: 280706	2016-09-06 12:36:24 +00:00
Krzysztof Parzyszek	020ec299bf	[PPC] Claim stack frame before storing into it, if no red zone is present Unlike PPC64, PPC32/SVR4 does not have a red zone. In the absence of it there is no guarantee that this part of the stack will not be modified by any interrupt. To avoid this, make sure to claim the stack frame first before storing into it. This fixes https://llvm.org/bugs/show_bug.cgi?id=26519. Differential Revision: https://reviews.llvm.org/D24093 llvm-svn: 280705	2016-09-06 12:30:00 +00:00
Craig Topper	4fa3b50fc3	[AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast. We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type. llvm-svn: 280696	2016-09-06 06:56:59 +00:00
Craig Topper	43fbd840dd	[X86] Remove unused encoding from IntrinsicType enum. llvm-svn: 280694	2016-09-06 05:45:24 +00:00
Craig Topper	a0055d315d	[X86] Fix indentation. NFC llvm-svn: 280693	2016-09-06 05:45:21 +00:00
Saleem Abdulrasool	bfa25bd1ac	ARM: workaround bundled operation predication This is a Windows ARM specific issue. If the code path in the if conversion ends up using a relocation which will form an IMAGE_REL_ARM_MOV32T, we end up with a bundle to ensure that the mov.w/mov.t pair is not split up. This is normally fine, however, if the branch is also predicated, then we end up trying to predicate the bundle. For now, report a bundle as being unpredicatable. Although this is false, this would trigger a failure case previously anyway, so this is no worse. That is, there should not be any code which would previously have been if converted and predicated which would not be now. Under certain circumstances, it may be possible to "predicate the bundle". This would require scanning all bundle instructions, and ensure that the bundle contains only predicatable instructions, and converting the bundle into an IT block sequence. If the bundle is larger than the maximal IT block length (4 instructions), it would require materializing multiple IT blocks from the single bundle. llvm-svn: 280689	2016-09-06 04:00:12 +00:00
Craig Topper	62d0a5e7d3	[AVX-512] Fix v8i64 shift by immediate lowering on 32-bit targets. llvm-svn: 280684	2016-09-06 00:31:10 +00:00
Craig Topper	dfc4fc9f02	[AVX-512] Teach fastisel load/store handling to use EVEX encoded instructions for 128/256-bit vectors and scalar single/double. Still need to fix the register classes to allow the extended range of registers. llvm-svn: 280682	2016-09-05 23:58:40 +00:00
Craig Topper	93f7b5699b	[AVX-512] Integrate mask register copying more completely into X86InstrInfo::copyPhysReg and simplify. No functional change intended. The code is now written in terms of source and dest classes with feature checks inside each type of copy instead of having separate functions for each feature set. llvm-svn: 280673	2016-09-05 20:34:50 +00:00
Benjamin Kramer	ef0a45aaa5	[WebAssembly] Unbreak the build. Not sure why ADL isn't working here. llvm-svn: 280656	2016-09-05 12:06:47 +00:00
Valery Pykhtin	8bc659637c	[AMDGPU] Refactor FLAT TD instructions Differential revision: https://reviews.llvm.org/D24072 llvm-svn: 280655	2016-09-05 11:22:51 +00:00
James Molloy	728cf85950	[Thumb1] Add relocations for fixups fixup_arm_thumb_{br,bcc} These need to be mapped through to R_ARM_THM_JUMP{11,8} respectively. Fixes PR30279. llvm-svn: 280651	2016-09-05 08:29:15 +00:00
Igor Breger	a2f8ca9a34	[AVX512] Fix v8i1/v16i1 zext + bitcast lowering pattern. Explicitly zero upper bits. Differential Revision: http://reviews.llvm.org/D23983 llvm-svn: 280650	2016-09-05 08:26:51 +00:00
Craig Topper	428169a5d6	[X86] Make some static arrays of opcodes const and shrink to uint16_t. NFC llvm-svn: 280649	2016-09-05 07:14:21 +00:00
Craig Topper	d9ca3d97ef	[AVX-512] Simplify X86InstrInfo::copyPhysReg for 128/256-bit vectors with AVX512, but not VLX. We should use the VEX opcodes and trust the register allocator to not use the extended XMM/YMM register space. Previously we were extending to copying the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration as the instructions to spill them aren't available. llvm-svn: 280648	2016-09-05 06:43:06 +00:00
Craig Topper	e3807febd8	[X86] Remove FsVMOVAPSrm/FsVMOVAPDrm/FsMOVAPSrm/FsMOVAPDrm. Due to their placement in the td file they had lower precedence than (V)MOVSS/SD and could almost never be selected. The only way to select them was in AVX512 mode because EVEX VMOVSS/SD was below them and the patterns weren't qualified properly for AVX only. So if you happened to have an aligned FR32/FR64 load in AVX512 you could get a VEX encoded VMOVAPS/VMOVAPD. I tried to search back through history and it seems like these instructions were probably unselectable for at least 5 years, at least to the time the VEX versions were added. But I can't prove they ever were. llvm-svn: 280644	2016-09-05 02:20:49 +00:00
Craig Topper	040b10784e	[AVX-512] Add EVEX encoded scalar FMA intrinsic instructions to isNonFoldablePartialRegisterLoad. llvm-svn: 280636	2016-09-04 19:33:47 +00:00
Craig Topper	4177345d7f	[AVX-512] Remove 128-bit and 256-bit masked floating point add/sub/mul/div intrinsics and upgrade to native IR. llvm-svn: 280633	2016-09-04 18:13:33 +00:00
Hal Finkel	f0bc9db96e	[PowerPC] During branch relaxation, recompute padding offsets before each iteration We used to compute the padding contributions to the block sizes during branch relaxation only at the start of the transformation. As we perform branch relaxation, we change the sizes of the blocks, and so the amount of inter-block padding might change. Accordingly, we need to recompute the (alignment-based) padding in between every iteration on our way toward the fixed point. Unfortunately, I don't have a test case (and none was provided in the bug report), and while this obviously seems needed, algorithmically, I don't have any way of generating a small and/or non-fragile regression test. llvm-svn: 280626	2016-09-04 14:18:29 +00:00
Igor Breger	7e2a0dfa0c	revert r279960. https://llvm.org/bugs/show_bug.cgi?id=30249 llvm-svn: 280625	2016-09-04 14:03:52 +00:00
Simon Pilgrim	122b0de1c1	Strip trailing whitespace llvm-svn: 280623	2016-09-04 13:28:46 +00:00
Hal Finkel	73390c7acd	[PowerPC] Zero-extend constants in FastISel As it turns out, whether we zero-extend or sign-extend i8/i16 constants, which are illegal types promoted to i32 on PowerPC, is a choice constrained by assumptions within the infrastructure. Specifically, the logic in FunctionLoweringInfo::ComputePHILiveOutRegInfo assumes that constant PHI operands will be zero extended, and so, at least when materializing constants that are PHI operands, we must do the same. The rest of our fast-isel implementation does not appear to depend on the fact that we were sign-extending i8/i16 constants, and all other targets also appear to zero-extend small-bitwidth constants in fast-isel; we'll now do the same (we had been doing this only for i1 constants, and sign-extending the others). Fixes PR27721. llvm-svn: 280614	2016-09-04 06:07:19 +00:00
Craig Topper	af0d63d2e7	[AVX-512] Remove masked integer add/sub/mull intrinsics and upgrade to native IR. llvm-svn: 280611	2016-09-04 02:09:53 +00:00
Simon Pilgrim	3606d2346c	Strip trailing whitespace llvm-svn: 280598	2016-09-03 20:36:05 +00:00
Matt Arsenault	ac42ba8633	AMDGPU: Set sizes of spill pseudos llvm-svn: 280595	2016-09-03 17:25:44 +00:00
Matt Arsenault	5ffe3e1d93	AMDGPU: Fix adding duplicate implicit exec uses I'm not sure if this should be considered a bug in copyImplicitOps or not, but implicit operands that are part of the static instruction definition should not be copied. llvm-svn: 280594	2016-09-03 17:25:39 +00:00
Craig Topper	907b580d72	[AVX-512] Add integer ADD/SUB instructions to load folding tables. Add an AVX512 stack folding test. llvm-svn: 280593	2016-09-03 17:20:07 +00:00
Craig Topper	392cd0300d	[AVX-512] Mark EVEX encoded vpcmpeq as commutable just like its AVX and SSE equivalent. llvm-svn: 280592	2016-09-03 16:28:03 +00:00
Nicolai Haehnle	3bba6a8438	AMDGPU: Reduce the duration of whole-quad-mode Summary: This contains two changes that reduce the time spent in WQM, with the intention of reducing bandwidth required by VMEM loads: 1. Sampling instructions by themselves don't need to run in WQM, only their coordinate inputs need it (unless of course there is a dependent sampling instruction). The initial scanInstructions step is modified accordingly. 2. When switching back from WQM to Exact, switch back as soon as possible. This affects the logic in processBlock. This should always be a win or at best neutral. There are also some cleanups (e.g. remove unused ExecExports) and some new debugging output. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D22092 llvm-svn: 280590	2016-09-03 12:26:38 +00:00
Nicolai Haehnle	a246dccc26	AMDGPU: Fix an interaction between WQM and polygon stippling Summary: This fixes a rare bug in polygon stippling with non-monolithic pixel shaders. The underlying problem is as follows: the prolog part contains the polygon stippling sequence, i.e. a kill. The main part then enables WQM based on the _reduced_ exec mask, effectively undoing most of the polygon stippling. Since we cannot know whether polygon stippling will be used, the main part of a non-monolithic shader must always return to exact mode to fix this problem. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23131 llvm-svn: 280589	2016-09-03 12:26:32 +00:00
Matt Arsenault	2510a31677	AMDGPU: Fix spilling of m0 readlane/writelane do not support using m0 as the output/input. Constrain the register class of spill vregs to try to avoid this, but also handle spilling of the physreg when necessary by inserting an additional copy to a normal SGPR. llvm-svn: 280584	2016-09-03 06:57:55 +00:00
Craig Topper	892ce56901	[AVX-512] Add EVEX encoded VPCMPEQ and VPCMPGT to the load folding tables. llvm-svn: 280581	2016-09-03 04:37:50 +00:00
Hal Finkel	522e4d9d66	[PowerPC] Support asm parsing for bc[l][a][+-] mnemonics PowerPC assembly code in the wild, so it seems, has things like this: bc+ 12, 28, .L9 This is a bit odd because the '+' here becomes part of the BO field, and the BO field is otherwise the first operand. Nevertheless, the ISA specification does clearly say that the +- hint syntax applies to all conditional-branch mnemonics (that test either CTR or a condition register, although not the forms which check both), both basic and extended, so this is supposed to be valid. This introduces some asm-parser-only definitions which take only the upper three bits from the specified BO value, and the lower two bits are implied by the +- suffix (via some associated aliases). Fixes PR23646. llvm-svn: 280571	2016-09-03 02:31:44 +00:00
Hal Finkel	28842b96f3	[PowerPC] Add asm parser/disassembler support for hrfid,nap,slbmfev These few book-III instructions are used by the Linux kernel. Partially fixes PR24796. llvm-svn: 280560	2016-09-02 23:42:01 +00:00
Hal Finkel	277736eee6	[PowerPC] Add support for the extended dcbf form and mnemonics dcbf has an optional hint-like field, add support for the extended form and the associated mnemonics (dcbfl and dcbflp). Partially fixes PR24796. llvm-svn: 280559	2016-09-02 23:41:54 +00:00
Ron Lieberman	88159e5549	Make sure to maintain register liveness when generating predicated instructions. Author: Krzysztof Parzyszek <kparzysz@codeaurora.org> Differential Revision: https://reviews.llvm.org/D24209 llvm-svn: 280552	2016-09-02 22:56:24 +00:00
Hal Finkel	7b104d4721	[PowerPC] For larger offsets, when possible, fold offset into addis toc@ha When we have an offset into a global, etc. that is accessed relative to the TOC base pointer, and the offset is larger than the minimum alignment of the global itself and the TOC base pointer (which is 8-byte aligned), we can still fold the @toc@ha into the memory access, but we must update the addis instruction's symbol reference with the offset as the symbol addend. When there is only one use of the addi to be folded and only one use of the addis that would need its symbol's offset adjusted, then we can make the adjustment and fold the @toc@l into the memory access. llvm-svn: 280545	2016-09-02 21:37:07 +00:00
James Y Knight	6ef32bf2af	[Sparc] Mark i128 shift libcalls unavailable in 32-bit mode. Recently, llvm wants to emit calls to these functions, while it didn't seem to be an issue before. Not sure why. Nor do I know why only these three are important to disable, out of all of the i128 libcalls. Nevertheless, many other targets have this snippet of code, so, just copying it to sparc as well, to unbreak things. llvm-svn: 280537	2016-09-02 20:29:11 +00:00
Jan Vesely	ea45746d5a	AMDGPU/R600: EXTRACT_VECT_ELT should only bypass BUILD_VECTOR if the vectors have the same number of elements. Fixes R600 piglit regressions since r280298 Differential Revision: https://reviews.llvm.org/D24174 llvm-svn: 280535	2016-09-02 20:13:19 +00:00
Sjoerd Meijer	6c4140b6c0	Setting fp trapping mode and denormal type: this is an improvement of r280246 and calculates compatibility of function attributes in a better way. Differential Revision: https://reviews.llvm.org/D24070 llvm-svn: 280534	2016-09-02 19:51:34 +00:00
Jan Vesely	00864886f4	AMDGPU/R600: Expand unaligned writes to local and global AS LOCAL and GLOBAL AS only; PRIVATE needs special treatment. Differential Revision: https://reviews.llvm.org/D23971 llvm-svn: 280526	2016-09-02 19:07:06 +00:00
Wei Mi	c54d1298f5	Split the store of a wide value merged from an int-fp pair into multiple stores. For the store of a wide value merged from a pair of values, especially int-fp pair, sometimes it is more efficient to split it into separate narrow stores, which can remove the bitwise instructions or sink them to colder places. Now the feature is only enabled on x86 target, and only store of int-fp pair is split. It is possible that the application scope gets extended with perf evidence support in the future. Differential Revision: https://reviews.llvm.org/D22840 llvm-svn: 280505	2016-09-02 17:17:04 +00:00
Derek Schuff	a66ae923e4	[WebAssembly] Update known test failures Fixed an issue with the experimental C headers llvm-svn: 280498	2016-09-02 16:26:24 +00:00
Craig Topper	e75c49543c	[AVX-512] Remove floating point logical operation intrinsics and replace them with native IR. llvm-svn: 280466	2016-09-02 05:29:17 +00:00
Craig Topper	45d6503089	[AVX-512] Add more patterns for masked and broadcasted logical operations where the select or broadcast has a floating point type. These are needed in order to remove the masked floating point logical operation intrinsics and use native IR. llvm-svn: 280465	2016-09-02 05:29:13 +00:00
Craig Topper	00aecd97bf	[AVX-512] Add execution domain fixing for logical operations with broadcast loads. This builds on the handling of masked ops since we need to keep element size the same. llvm-svn: 280464	2016-09-02 05:29:09 +00:00
Craig Topper	f8ad647b93	[X86] Strengthen some SDNode type constraints. llvm-svn: 280463	2016-09-02 04:25:33 +00:00
Craig Topper	8b9e671e97	[AVX-512] Add NoVLX Predicates to some patterns so they don't rely on pattern ordering to be lower priority than their equivalent VLX pattern. llvm-svn: 280462	2016-09-02 04:25:30 +00:00
Hal Finkel	5ef4b03106	[PowerPC] hasAndNotCompare should return true As Sanjay suggested when he added the hook, PPC should return true from hasAndNotCompare. We have an efficient negated 'and' on PPC (which can feed a compare). Fixes PR27203. llvm-svn: 280457	2016-09-02 02:58:25 +00:00
Hal Finkel	a39fd4bc53	[PowerPC] Add a pattern for a runtime bit check Following a suggestion by Sanjay, we should lower: %shl = shl i32 1, %y %and = and i32 %x, %shl %cmp = icmp eq i32 %and, %shl ret i1 %cmp into: subfic r4, r4, 32 rlwnm r3, r3, r4, 31, 31 Add this pattern and some associated patterns for the 64-bit case and the not-equal case. Fixes PR27356. llvm-svn: 280454	2016-09-02 02:34:44 +00:00
Hal Finkel	b54579fab6	[PowerPC] Don't apply the PPC64 address-formation peephole for offsets greater than 7 When applying our address-formation PPC64 peephole, we are reusing the @ha TOC addis value with the low parts associated with different offsets (i.e. different effective symbol addends). We were assuming this was okay so long as the offsets were less than the alignment of the global variable being accessed. This ignored the fact, however, that the TOC base pointer itself need only be 8-byte aligned. As a result, what we were doing is legal only for offsets less than 8 regardless of the alignment of the object being accessed. Fixes PR28727. llvm-svn: 280441	2016-09-02 00:28:20 +00:00
Hal Finkel	1e8218cc09	[PowerPC] Don't consider fusion in PPC64 address-formation peephole The logic in this function assumes that the P8 supports fusion of addis/addi, but it does not. As a result, there is no advantage to restricting our peephole application, merging addi instructions into dependent memory accesses, even when the addi has multiple users, regardless of whether or not we're optimizing for size. We might need something like this again for the P9; I suspect we'll revisit this code when we work on P9 tuning. llvm-svn: 280440	2016-09-02 00:27:50 +00:00
Michael Kuperstein	5f17d08f49	[SelectionDAG] Generate vector_shuffle nodes for undersized result vector sizes Prior to this, we could generate a vector_shuffle from an IR shuffle when the size of the result was exactly the sum of the sizes of the input vectors. If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts and inserts. Instead, we can form a larger vector_shuffle, and then extract a subvector of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8> and then extract a <12 x i8>. This also includes a target-specific X86 combine that in the presence of AVX2 combines: (vector_shuffle <mask> (concat_vectors t1, undef) (concat_vectors t2, undef)) into: (vector_shuffle <mask> (concat_vectors t1, t2), undef) in cases where this allows us to form VPERMD/VPERMQ. (This is not a separate commit, as that pattern does not appear without the DAGBuilder change.) llvm-svn: 280418	2016-09-01 21:32:09 +00:00
Heejin Ahn	c0f18172f5	[WebAssembly] Add asm.js-style setjmp/longjmp handling for wasm (reland r280302) Summary: This patch adds asm.js-style setjmp/longjmp handling support for WebAssembly. It also uses JavaScript's try and catch mechanism. Reviewers: jpp, dschuff Subscribers: jfb, dschuff Differential Revision: https://reviews.llvm.org/D24121 llvm-svn: 280415	2016-09-01 21:05:15 +00:00
Tim Northover	8d8812c5d7	GlobalISel: add a G_PHI instruction to give phis a type. They're another source of generic vregs, which are going to need a type on the definition when we remove the register width from MachineRegisterInfo. llvm-svn: 280412	2016-09-01 20:45:41 +00:00
Andrey Turetskiy	cde38b6a99	[X86] Loosen memory folding requirements for cvtdq2pd and cvtps2pd instructions. According to the spec, cvtdq2pd and cvtps2pd instructions don't require the memory operand to be aligned to 16 bytes. This patch removes this requirement from the memory folding table. Differential Revision: https://reviews.llvm.org/D23919 llvm-svn: 280402	2016-09-01 18:50:02 +00:00
Yaxun Liu	add05a8d95	AMDGPU: Add runtime metadata for pointee alignment of argument. Add runtime metadata for pointee alignment of pointer type kernel argument. The key is KeyArgPointeeAlign and the value is a 32 bit unsigned integer. Differential Revision: https://reviews.llvm.org/D24145 llvm-svn: 280399	2016-09-01 18:46:49 +00:00
Changpeng Fang	b28fe0307f	AMDGPU/SI: MIMG TD Refactoring. Summary: Created a new td file MIMGInstructions.td which contains all definitions of MIMG related instructions. Reviewed by: kzhuravl, vpykhtin Differential Revision: http://reviews.llvm.org/D24106 llvm-svn: 280385	2016-09-01 17:54:54 +00:00
Simon Dardis	1fa1fb0f8d	[mips] Include missed file from previous commit llvm-svn: 280377	2016-09-01 15:03:13 +00:00
Simon Pilgrim	ce0e9f0b91	[X86][SSE] Dropped (V)CVTPD2PS intrinsic patterns now that it's bound to X86vfpround It now uses X86vfpround patterns directly instead. Followup to D23797 llvm-svn: 280376	2016-09-01 14:59:20 +00:00
Simon Dardis	bd27154757	[mips] interAptiv based generic schedule model This scheduler describes a processor which covers all MIPS ISAs based around the interAptiv and P5600 timings. Reviewers: vkalintiris, dsanders Differential Revision: https://reviews.llvm.org/D23551 llvm-svn: 280374	2016-09-01 14:53:53 +00:00
Krzysztof Parzyszek	07d9f53b51	[Hexagon] Deal with undefs when extending live intervals Reapply r280275, since MSVC accepts r280358. llvm-svn: 280369	2016-09-01 13:59:35 +00:00
Elena Demikhovsky	4d7738dfde	Optimized FMA intrinsic + FNEG, like -(a*b+c) and FNEG + FMA, like a*b-c or (-a)*b+c. The bug description is here: https://llvm.org/bugs/show_bug.cgi?id=28892 Differential revision: https://reviews.llvm.org/D23313 llvm-svn: 280368	2016-09-01 13:58:53 +00:00
Hal Finkel	5081ac27c7	Add ISD::EH_DWARF_CFA, simplify @llvm.eh.dwarf.cfa on Mips, fix on PowerPC LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible __builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is lowered using: ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET) where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86, FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not work for PowerPC. Because of the way that the stack layout works, the canonical frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC (there is a lower save-area offset as well), so it is not just a matter of implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its semantics -- We can do that, since it is currently used only for @llvm.eh.dwarf.cfa lowering, but it is better to directly lower the CFA construct itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips currently does this, but by using a custom lowering for ADD that specifically recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern. This change introduces an ISD::EH_DWARF_CFA node, which by default expands using the existing logic, but can be directly lowered by the target. Mips is updated to use this method (which simplifies its implementation, and I suspect makes it more robust), and updates PowerPC to do the same. Fixes PR26761. Differential Revision: https://reviews.llvm.org/D24038 llvm-svn: 280350	2016-09-01 10:28:47 +00:00
Valery Pykhtin	1b13886b5f	[AMDGPU] Scalar Memory instructions TD refactoring Differential revision: https://reviews.llvm.org/D23996 llvm-svn: 280349	2016-09-01 09:56:47 +00:00
Dean Michael Berris	6d6addbe15	[NFC] Remove unnecessary comment llvm-svn: 280336	2016-09-01 01:58:24 +00:00
Dean Michael Berris	e8ae5baaf7	[XRay] Detect and emit sleds for sibling/tail calls Summary: This change promotes the 'isTailCall(...)' member function to TargetInstrInfo as a query interface for determining on a per-target basis whether a given MachineInstr is a tail call instruction. We build upon this in the XRay instrumentation pass to emit special sleds for tail call optimisations, where we emit the correct kind of sled. The tail call sleds look like a mix between the function entry and function exit sleds. Form-wise, the sled comes before the "jmp" instruction that implements the tail call similar to how we do it for the function entry sled. Functionally, because we know this is a tail call, it behaves much like an exit sled -- i.e. at runtime we may use the exit trampolines instead of a different kind of trampoline. A follow-up change to recognise these sleds will be done in compiler-rt, so that we can start intercepting these initially as exits, but also have the option to have different log entries to more accurately reflect that this is actually a tail call. Reviewers: echristo, rSerge, majnemer Subscribers: mehdi_amini, dberris, llvm-commits Differential Revision: https://reviews.llvm.org/D23986 llvm-svn: 280334	2016-09-01 01:29:13 +00:00
Dean Michael Berris	40e6ba16a1	[XRay][NFC] Promote isTailCall() as virtual in TargetInstrInfo. This change is broken out from D23986, where XRay detects tail call exits. llvm-svn: 280331	2016-09-01 01:03:22 +00:00
Heejin Ahn	10a7086700	Revert "Add asm.js-style setjmp/longjmp handling for wasm" This reverts commit r280302, it broke the integration tests. llvm-svn: 280329	2016-09-01 00:44:37 +00:00
Heejin Ahn	23d57103a4	Add asm.js-style setjmp/longjmp handling for wasm Summary: This patch adds asm.js-style setjmp/longjmp handling support for WebAssembly. It also uses JavaScript's try and catch mechanism. Reviewers: jpp, dschuff Subscribers: jfb, dschuff Differential Revision: https://reviews.llvm.org/D23928 llvm-svn: 280302	2016-08-31 22:40:34 +00:00
Reid Kleckner	109448ee81	Revert "Add an optional parameter with a list of undefs to extendToIndices" This reverts commit r280268, it causes all MSVC 2013 to ICE. This appears to have been fixed in a later MSVC 2013 update, because I cannot reproduce it locally. That said, all upstream LLVM bots are broken right now, so I am reverting. Also reverts dependent change r280275, "[Hexagon] Deal with undefs when extending live intervals". llvm-svn: 280301	2016-08-31 22:36:02 +00:00
Matt Arsenault	b50eb8dc2b	AMDGPU: Fix introducing stack access on unaligned v16i8 llvm-svn: 280298	2016-08-31 21:52:27 +00:00
Matt Arsenault	1d2151781b	AMDGPU: Use copy instead of mov during frame lowering This occurs before RA pseudos are expanded. It's less code to emit the copy. llvm-svn: 280297	2016-08-31 21:52:25 +00:00
Matt Arsenault	57bc4324f8	AMDGPU: Refactor frame lowering This will make future changes easier. llvm-svn: 280296	2016-08-31 21:52:21 +00:00
Tim Northover	11a2354670	GlobalISel: use G_TYPE to annotate physregs with a type. More preparation for dropping source types from MachineInstrs: registers coming out of already-selected code (i.e. non-generic instructions) don't have a type, but that information is needed so we must add it manually. This is done via a new G_TYPE instruction. llvm-svn: 280292	2016-08-31 21:24:02 +00:00
Derek Schuff	1b258d313c	[WebAssembly] Disable folding of GA+reg into load/store constant offsets Summary: If the register has a negative value then unsigned overflow will occur; this case is sometimes even created intentionally by LSR. For now disable GA+reg folding. Fixes PR29127 Differential Revision: https://reviews.llvm.org/D24053 llvm-svn: 280285	2016-08-31 20:27:20 +00:00
Krzysztof Parzyszek	e21a0b3b9f	[Hexagon] Deal with undefs when extending live intervals llvm-svn: 280275	2016-08-31 18:52:09 +00:00
Tom Stellard	ba5730884b	AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is at least 4-byte aligned Summary: This fixes some OpenCV tests that were broken by libclc commit r276443. Reviewers: arsenm, jvesely Subscribers: arsenm, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D24051 llvm-svn: 280274	2016-08-31 18:46:07 +00:00
Simon Pilgrim	6199b4fd49	[X86][SSE] Improve awareness of (v)cvtpd2ps implicit zeroing of upper 64-bits of xmm result Associate x86_sse2_cvtpd2ps with X86ISD::VFPROUND to avoid inserting unnecessary zeroing shuffles. Differential Revision: https://reviews.llvm.org/D23797 llvm-svn: 280249	2016-08-31 15:09:34 +00:00
Sjoerd Meijer	46b5b88387	Clang patch r280064 introduced ways to set the FP exceptions and denormal types. This is the LLVM counterpart and it adds options that map onto FP exceptions and denormal build attributes allowing better fp math library selections. Differential Revision: https://reviews.llvm.org/D24070 llvm-svn: 280246	2016-08-31 14:17:38 +00:00
Diana Picus	760c757633	Use abstraction in AArch64AsmPrinter::lowerSTACKMAP. NFCI Use functionality from StackMapOpers instead of hardcoding an operand access. llvm-svn: 280230	2016-08-31 12:43:49 +00:00
Diana Picus	16c818820b	Typo fixes. NFC llvm-svn: 280229	2016-08-31 12:43:44 +00:00
Nikolay Haustov	eba808957e	AMDGPU/SI: Handle aliases in AMDGPUAlwaysInlinePass Summary: Simply replace usage of aliases to functions with aliasee. This came up when bitcode linking to builtin library and calls to aliases not being resolved. Also made minor improvements to existing test. Reviewers: tstellarAMD, alex-t, vpykhtin Subscribers: arsenm, wdng, rampitec Differential Revision: https://reviews.llvm.org/D24023 llvm-svn: 280221	2016-08-31 11:18:33 +00:00
Simon Pilgrim	7b09af193a	[X86][SSE] Improve awareness of fptrunc implicit zeroing of upper 64-bits of xmm result Add patterns to avoid inserting unnecessary zeroing shuffles when lowering fptrunc to (v)cvtpd2ps Differential Revision: https://reviews.llvm.org/D23797 llvm-svn: 280214	2016-08-31 10:35:13 +00:00
Craig Topper	8f6827c945	[AVX-512] Add patterns to select masked logical operations if the select has a floating point type. This is needed in order to replace the masked floating point logical op intrinsics with native IR. llvm-svn: 280195	2016-08-31 05:37:52 +00:00
Hal Finkel	97a189c716	[PowerPC] Don't spill the frame pointer twice When a function contains something, such as inline asm, which explicitly clobbers the register used as the frame pointer, don't spill it twice. If we need a frame pointer, it will be saved/restored in the prologue/epilogue code. Explicitly spilling it again will reuse the same spill slot used by the prologue/epilogue code, thus clobbering the saved value. The same applies to the base-pointer or PIC-base register. Partially fixes PR26856. Thanks to Ulrich for his analysis and the small inline-asm reproducer. llvm-svn: 280188	2016-08-31 00:52:03 +00:00
Matt Arsenault	a609e2d5ce	AMDGPU: Relax SGPR asm constraint register class s should be SReg_32 to be as general as possible. This can avoid a copy from m0. llvm-svn: 280154	2016-08-30 20:50:08 +00:00
Valery Pykhtin	a34fb49f8f	[AMDGPU] Refactor SOP instructions TD files. Differential revision: https://reviews.llvm.org/D23617 llvm-svn: 280101	2016-08-30 15:20:31 +00:00
NAKAMURA Takumi	9720f57a17	SILoadStoreOptimizer.cpp: Fix a warning in r279991. [-Wunused-variable] llvm-svn: 280075	2016-08-30 11:50:21 +00:00
James Y Knight	d7d9e1069b	Replace incorrect "#ifdef DEBUG" with "#ifndef NDEBUG". The former is simply wrong -- the code will either never be used or will always be used, rather than being dependent upon whether it's built with debug assertions enabled. The macro DEBUG isn't ever set by the llvm build system. But, the macro DEBUG(X) is defined (unconditionally) if you happen to include llvm/Support/Debug.h. The code in Value.h which was erroneously protected by the #ifdef DEBUG didn't even compile -- you can't cast<> from an LLVMOpaqueValue directly. Fortunately, it was never invoked, as Core.cpp included Value.h before Debug.h. The conditionalized code in AArch64CollectLOH.cpp was previously always used, as it includes Debug.h. llvm-svn: 280056	2016-08-30 03:16:16 +00:00
Hal Finkel	18d0e3f44c	[PowerPC] Force entry alignment in .got2 Implement Bill's suggested fix for 32-bit targets for PR22711 (for the alignment of each entry). As pointed out in the bug report, we could just force the section alignment, since we only add pointer-sized things currently, but this fix is somewhat more future-proof. llvm-svn: 280049	2016-08-30 01:43:38 +00:00
Hal Finkel	b074a608ce	[PowerPC] Add support for -mlongcall The "long call" option forces the use of the indirect calling sequence for all calls (even those that don't really need it). GCC provides this option; This is helpful, under certain circumstances, for building very-large binaries, and some other specialized use cases. Fixes PR19098. llvm-svn: 280040	2016-08-30 00:59:23 +00:00
Duncan P. N. Exon Smith	5c001c367f	ADT: Give ilist<T>::reverse_iterator a handle to the current node Reverse iterators to doubly-linked lists can be simpler (and cheaper) than std::reverse_iterator. Make it so. In particular, change ilist<T>::reverse_iterator so that it is never invalidated unless the node it references is deleted. This matches the guarantees of ilist<T>::iterator. (Note: MachineBasicBlock::iterator is not an ilist iterator, but a MachineInstrBundleIterator<MachineInstr>. This commit does not change MachineBasicBlock::reverse_iterator, but it does update MachineBasicBlock::reverse_instr_iterator. See note at end of commit message for details on bundle iterators.) Given the list (with the Sentinel showing twice for simplicity): [Sentinel] <-> A <-> B <-> [Sentinel] the following is now true: 1. begin() represents A. 2. begin() holds the pointer for A. 3. end() represents [Sentinel]. 4. end() holds the pointer for [Sentinel]. 5. rbegin() represents B. 6. rbegin() holds the pointer for B. 7. rend() represents [Sentinel]. 8. rend() holds the pointer for [Sentinel]. The changes are #6 and #8. Here are some properties from the old scheme (which used std::reverse_iterator): - rbegin() held the pointer for [Sentinel] and rend() held the pointer for A; - operator*() cost two dereferences instead of one; - converting from a valid iterator to its valid reverse_iterator involved a confusing increment; and - "RI++->erase()" left RI invalid. The unintuitive replacement was "RI->erase(), RE = end()". With vector-like data structures these properties are hard to avoid (since past-the-beginning is not a valid pointer), and don't impose a real cost (since there's still only one dereference, and all iterators are invalidated on erase). But with lists, this was a poor design. 
Specifically, the following code (which obviously works with normal iterators) now works with ilist::reverse_iterator as well: for (auto RI = L.rbegin(), RE = L.rend(); RI != RE;) fooThatMightRemoveArgFromList(RI++); Converting between iterator and reverse_iterator for the same node uses the getReverse() function. reverse_iterator iterator::getReverse(); iterator reverse_iterator::getReverse(); Why doesn't iterator <=> reverse_iterator conversion use constructors? In order to catch and update old code, reverse_iterator does not even have an explicit conversion from iterator. It wouldn't be safe because there would be no reasonable way to catch all the bugs from the changed semantic (see the changes at call sites that are part of this patch). Old code used this API: std::reverse_iterator::reverse_iterator(iterator); iterator std::reverse_iterator::base(); Here's how to update from old code to new (that incorporates the semantic change), assuming I is an ilist<>::iterator and RI is an ilist<>::reverse_iterator: [Old] ==> [New] reverse_iterator(I) (--I).getReverse() reverse_iterator(I) ++I.getReverse() --reverse_iterator(I) I.getReverse() reverse_iterator(++I) I.getReverse() RI.base() (--RI).getReverse() RI.base() ++RI.getReverse() --RI.base() RI.getReverse() (++RI).base() RI.getReverse() delete &*RI, RE = end() delete &*RI++ RI->erase(), RE = end() RI++->erase() ======================================= Note: bundle iterators are out of scope ======================================= MachineBasicBlock::iterator, also known as MachineInstrBundleIterator<MachineInstr>, is a wrapper to represent MachineInstr bundles. The idea is that each operator++ takes you to the beginning of the next bundle. Implementing a sane reverse iterator for this is harder than ilist. Here are the options: - Use std::reverse_iterator<MBB::i>. Store a handle to the beginning of the next bundle. A call to operator*() runs a loop (usually operator--() will be called 1 time, for unbundled instructions). 
Increment/decrement just works. This is the status quo. - Store a handle to the final node in the bundle. A call to operator*() still runs a loop, but it iterates one time fewer (usually operator--() will be called 0 times, for unbundled instructions). Increment/decrement just works. - Make the ilist_sentinel<MachineInstr> always store that it's the sentinel (instead of just in asserts mode). Then the bundle iterator can sniff the sentinel bit in operator++(). I initially tried implementing the end() option as part of this commit, but updating iterator/reverse_iterator conversion call sites was error-prone. I have a WIP series of patches that implements the final option. llvm-svn: 280032	2016-08-30 00:13:12 +00:00
Jan Vesely	89876673cd	AMDGPU/R600: Cleanup DAGCombine Move SDLoc initialization to a common place. Fall back to the AMDGPU version in one place. Differential Revision: https://reviews.llvm.org/D23900 llvm-svn: 280030	2016-08-29 23:21:46 +00:00
Michael Kuperstein	173b43da35	Fix typo in comment. NFC. llvm-svn: 280025	2016-08-29 22:49:05 +00:00
Hal Finkel	3d70a9dbb7	[PowerPC] Fix i8/i16 atomics for little-Endian targets without partword atomics For little-Endian PowerPC, we generally target only P8 and later by default. However, generic (older) 64-bit configurations are still an option, and in that case, partword atomics are not available (e.g. stbcx.). To lower i8/i16 atomics without true i8/i16 atomic operations, we emulate using i32 atomics in combination with a bunch of shifting and masking, etc. The amount by which to shift in little-Endian mode is different from the amount in big-Endian mode (it is inverted -- meaning we can leave off the xor when computing the amount). Fixes PR22923. llvm-svn: 280022	2016-08-29 22:25:36 +00:00
Jan Vesely	77ed6af416	AMDGPU/R600: Remove MergeVectorStores from legalization This is handled by DAGCombiner in a more generic way Differential Revision: https://reviews.llvm.org/D23970 llvm-svn: 280019	2016-08-29 22:05:06 +00:00
Saleem Abdulrasool	43e5fe3fac	AMDGPU: fix mismatch tags, NFC llvm-svn: 280006	2016-08-29 20:42:07 +00:00
Douglas Katzman	47ca88ace2	[Myriad]: add missing 'mcpu' values Should have been done with r276646. llvm-svn: 279996	2016-08-29 19:42:57 +00:00
Tom Stellard	0d23ebe888	AMDGPU/SI: Implement a custom MachineSchedStrategy Summary: GCNSchedStrategy re-uses most of GenericScheduler; it just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default; to enable it you need to pass -misched=gcn to llc. Shader DB stats: 32464 shaders in 17874 tests Totals: SGPRS: 1542846 -> 1643125 (6.50 %) VGPRS: 1005595 -> 904653 (-10.04 %) Spilled SGPRs: 29929 -> 27745 (-7.30 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 36688188 -> 37034900 (0.95 %) bytes LDS: 1913 -> 1913 (0.00 %) blocks Max Waves: 254101 -> 265125 (4.34 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 1338220 -> 1438499 (7.49 %) VGPRS: 886221 -> 785279 (-11.39 %) Spilled SGPRs: 29869 -> 27685 (-7.31 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 34315716 -> 34662428 (1.01 %) bytes LDS: 1551 -> 1551 (0.00 %) blocks Max Waves: 188127 -> 199151 (5.86 %) Wait states: 0 -> 0 (0.00 %) Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995	2016-08-29 19:42:52 +00:00
Tom Stellard	c2ff0eb697	AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler Summary: The SILoadStoreOptimizer can now look ahead more than one instruction when looking for instructions to merge, which greatly improves the number of loads/stores that we are able to merge. Moving the pass before scheduling avoids increasing register pressure after the scheduler, so that the scheduler's register pressure estimates will be more accurate. It also gives more consistent results, since it is no longer affected by minor scheduling changes. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23814 llvm-svn: 279991	2016-08-29 19:15:22 +00:00
Tim Northover	edb3c8ccb8	GlobalISel: legalize frem to a libcall on AArch64. llvm-svn: 279988	2016-08-29 19:07:16 +00:00
Tim Northover	fe5f89ba14	GlobalISel: rework CallLowering so that it can be used for libcalls too. There should be no functional change here, I'm just making the implementation of "frem" (to libcall) legalization easier for a followup. llvm-svn: 279987	2016-08-29 19:07:08 +00:00
Matt Arsenault	b90fc9b3b4	AMDGPU/R600: Fix fixups used for constant arrays Fixes bug 29289 llvm-svn: 279986	2016-08-29 19:01:48 +00:00
Evandro Menezes	a8a25ca905	[AArch64] Adjust the scheduling model for Exynos M1. Further refine the model for loads. llvm-svn: 279976	2016-08-29 16:04:37 +00:00
Tom Stellard	5d3f71f721	AMDGPU/SI: Improve register allocation hints for sopk instructions Summary: For shrinking SOPK instructions, we were creating a hint to tell the register allocator to use the register allocated for src0 for the dst operand as well. However, this seems to not work sometimes depending on the order virtual registers are assigned physical registers. To fix this, I've added a second allocation hint which does the reverse, asks that the register allocated for dst is used for src0. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23862 llvm-svn: 279968	2016-08-29 13:06:10 +00:00
Haojian Wu	eab33cecf3	Fix -Wunused-but-set-variable warning. Summary: A follow-up fix on r279958. Reviewers: bkramer Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D23989 llvm-svn: 279964	2016-08-29 12:26:33 +00:00
Tom Stellard	662f330852	AMDGPU/SI: Query AA, if available, in areMemAccessesTriviallyDisjoint() Summary: The SILoadStoreOptimizer will need to use AliasAnalysis here in order to move it before scheduling. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23813 llvm-svn: 279963	2016-08-29 12:05:32 +00:00
Igor Breger	1a388871b9	[AVX512] In some cases KORTEST instruction may be used instead of ZEXT + TEST sequence. Differential Revision: http://reviews.llvm.org/D23490 llvm-svn: 279960	2016-08-29 08:52:52 +00:00
Craig Topper	713085e60a	[X86] Don't lower FABS/FNEG masking directly to a ConstantPool load. Just create a ConstantFPSDNode and let that be lowered. This allows broadcast loads to be used when available. llvm-svn: 279958	2016-08-29 04:49:31 +00:00
Craig Topper	f0e822ff31	[AVX-512] Always use v8i64 when converting 512-bit FAND/FOR/FXOR/FANDN to integer operations when DQI isn't supported. This is consistent with the recent changes to promote logical operations to i64 vectors. llvm-svn: 279957	2016-08-29 04:49:27 +00:00
Craig Topper	850feaf3b7	[AVX-512] Add support for selecting 512-bit VPABSB/VPABSW when BWI is available. llvm-svn: 279951	2016-08-28 22:20:51 +00:00
Craig Topper	056c9062f3	[AVX-512] Add patterns for selecting 128/256-bit EVEX VPABS instructions. llvm-svn: 279950	2016-08-28 22:20:48 +00:00
Simon Pilgrim	5369cd9e9c	[X86][AVX512] Only combine EVEX targets shuffles to shuffles of the same number of vector elements Over-eager combining prevents the correct folding of writemasks. At the moment this occurs for ALL EVEX shuffles; in the future we need to check that the user of the root shuffle is a VSELECT that can fold to a writemask. llvm-svn: 279934	2016-08-28 17:27:14 +00:00
Hal Finkel	5728200f33	[PowerPC] Implement lowering for atomicrmw min/max/umin/umax Implement lowering for atomicrmw min/max/umin/umax. Fixes PR28818. llvm-svn: 279933	2016-08-28 16:17:58 +00:00
Craig Topper	abe80cc04d	[AVX-512] Promote AND/OR/XOR to v2i64/v4i64/v8i64 even when we have AVX512F/AVX512VL. Previously we weren't creating masked logical operations if bitcasts appeared between the logic operation and the select. The IR optimizers can move bitcasts across logic operations and create these cases. To minimize the number of cases we need to handle, this change promotes all logic ops to an i64 vector type just like when only SSE or AVX is available. Unfortunately, this also has the consequence of making it difficult to select unmasked VPANDD/VPORD/VPXORD in all the cases it was previously used. This is the cause of most of the test change. This shouldn't result in any functional change though. llvm-svn: 279929	2016-08-28 06:06:28 +00:00
Craig Topper	8877a026e4	[X86] Rename PABSB/D/W instructions to be consistent with SSE/AVX instructions instead of ending 128/256. NFC llvm-svn: 279927	2016-08-28 06:06:21 +00:00
Jan Vesely	38814fa2fd	AMDGPU/R600: Enable Load combine Fix and improve tests Differential Revision: https://reviews.llvm.org/D23899 llvm-svn: 279925	2016-08-27 19:09:43 +00:00
Craig Topper	6943aa306e	[X86] Rename predicate function that detects if requires one of the REX.B, REX.X or REX.R bits. Its old name conflicted with a function in the X86II namespace that doesn't quite do the same thing. NFC llvm-svn: 279924	2016-08-27 17:13:43 +00:00
Craig Topper	45793a1f7a	[X86] Keep looping over operands looking for byte registers even if we already found a register that requires a REX prefix. Otherwise we don't error if a high byte register is used after SPL/BPL/DIL/SIL. llvm-svn: 279923	2016-08-27 17:13:41 +00:00
Craig Topper	6acca80e17	[X86] Include XMM/YMM/ZMM16-23 in X86II::isX86_64ExtendedReg. This feels more consistent with its name and simplifies assembler code. llvm-svn: 279922	2016-08-27 17:13:37 +00:00
Craig Topper	06c60c067f	[X86] Don't allow DR8-DR15 to be assembled in 32-bit mode. Add missing test for CR8-CR15. llvm-svn: 279921	2016-08-27 17:13:34 +00:00
Craig Topper	ed71f04abb	[X86] Remove stale comment about FixupBWInsts pass being off by default. NFC llvm-svn: 279915	2016-08-27 05:26:54 +00:00
Craig Topper	225da2cb84	[AVX-512] Allow EVEX encoding unordered/ordered/equal/notequal VCMPPS/PD/SS/SD to be commuted just like the SSE and AVX counterparts. llvm-svn: 279914	2016-08-27 05:22:15 +00:00
Craig Topper	144fdef66b	[X86] Enable FR32/FR64 cmpeq/cmpne/cmpunord/cmpord to be commuted. llvm-svn: 279913	2016-08-27 05:22:12 +00:00
Craig Topper	4891c724aa	[AVX-512] Add load folding for EVEX vcmpps/pd/ss/sd. llvm-svn: 279912	2016-08-27 05:22:08 +00:00
Matt Arsenault	a15ea4e217	AMDGPU: Mark sched model complete Fixes bug 26800 llvm-svn: 279910	2016-08-27 03:39:27 +00:00
Matt Arsenault	71ed8a67e8	AMDGPU: Remove unneeded implicit exec uses/defs SI_BREAK, SI_IF_BREAK, and SI_ELSE_BREAK do not def exec. SI_IF_BREAK and SI_ELSE_BREAK do not read it either. llvm-svn: 279909	2016-08-27 03:00:51 +00:00
Matt Arsenault	2712d4a3d8	AMDGPU: Select mulhi 24-bit instructions llvm-svn: 279902	2016-08-27 01:32:27 +00:00
Matt Arsenault	22e417956d	AMDGPU: Move cndmask pseudo to be isel pseudo There's only one use of this for the convenience of a pattern. I think v_mov_b64_pseudo should also be moved, but SIFoldOperands does currently make use of it. llvm-svn: 279901	2016-08-27 01:00:37 +00:00
Matt Arsenault	e949744474	AMDGPU: Fix sched type for branches llvm-svn: 279900	2016-08-27 00:51:02 +00:00
Matt Arsenault	f98a596954	AMDGPU: Remove register operand from si_mask_branch It isn't used for anything, and is also misleading since it could be spilled at the end of the block, so it can't be relied on. There ends up being a verifier error about using an undefined register since the spill kills the register. llvm-svn: 279899	2016-08-27 00:42:21 +00:00
Matt Arsenault	00e102baf4	AMDGPU: Improve error reporting for maximum branch distance Unfortunately this seems to only help the assembler diagnostic. llvm-svn: 279895	2016-08-27 00:21:22 +00:00
Quentin Colombet	a94caa5673	[AArch64][CallLowering] Do not assert for not implemented part. When doing the ABI lowering, report a failure to the caller instead of asserting. This gives a chance for the caller to recover. llvm-svn: 279890	2016-08-27 00:18:28 +00:00
Tom Stellard	e175d8aba5	AMDGPU/SI: Canonicalize offset order for merged DS instructions Summary: If the scheduler clusters the loads, then the offsets will be sorted, but it is possible for the scheduler to schedule loads together without explicitly clustering them, which would give us non-sorted offsets. Also, we will want to do this if we move the load/store optimizer before the scheduler. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23776 llvm-svn: 279870	2016-08-26 21:36:47 +00:00
Tom Stellard	4b5cd87ed3	XXX llvm-svn: 279868	2016-08-26 21:16:40 +00:00
Tom Stellard	7c463c9168	AMDGPU/SI: Use a better method for determining the largest pressure sets Summary: There are a few different sgpr pressure sets, but we only care about the one which covers all of the sgprs. We were using hard-coded register pressure set names to determine the reg set id for the biggest sgpr set. However, we were using the wrong name, and this method is pretty fragile, since the reg pressure set names may change. The new method just looks for the pressure set that contains the most reg units and sets that set as our SGPR pressure set. We've also adopted the same technique for determining our VGPR pressure set. Reviewers: arsenm Subscribers: MatzeB, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23687 llvm-svn: 279867	2016-08-26 21:16:37 +00:00
Manman Ren	66b54e9f32	Swift Calling Convention: add support for AArch64. It will just be the same as the regular calling convention. rdar://28029509 llvm-svn: 279853	2016-08-26 19:28:17 +00:00
Tim Northover	85cf564c51	AArch64: avoid assertion on illegal types in performFDivCombine. In the code to detect fixed-point conversions and make use of AArch64's special instructions, we weren't prepared for weird types. The fptosi direction got fixed recently, but not the similar sitofp code. llvm-svn: 279852	2016-08-26 18:52:31 +00:00
Chad Rosier	58f505ba24	[AArch64] Avoid materializing constant values when generating csel instructions. Differential Revision: https://reviews.llvm.org/D23677 llvm-svn: 279849	2016-08-26 18:05:50 +00:00
Reid Kleckner	a5b1eef846	[MC] Move .cv_loc management logic out of MCContext MCContext already has many tasks, and separating CodeView out from it is probably a good idea. The .cv_loc tracking was modelled on the DWARF tracking which lived directly in MCContext. Removes the inclusion of MCCodeView.h from MCContext.h, so now there are only 10 build actions while I hack on CodeView support instead of 265. llvm-svn: 279847	2016-08-26 17:58:37 +00:00
Tim Northover	bc1701c7fb	GlobalISel: mark G_FPEXT legal from float to double. llvm-svn: 279845	2016-08-26 17:46:22 +00:00
Tim Northover	30bd36e3fc	GlobalISel: mark G_FCMP legal on float & double. llvm-svn: 279844	2016-08-26 17:46:19 +00:00
Tim Northover	051b8ad3d9	GlobalISel: simplify G_ICMP legalization regime. It's unclear how the old %res(32) = G_ICMP { s32, s32 } intpred(eq), %0, %1 is actually different from an s1 version %res(1) = G_ICMP { s1, s32 } intpred(eq), %0, %1 so we'll remove it for now. llvm-svn: 279843	2016-08-26 17:46:17 +00:00
Tim Northover	cecee56abb	GlobalISel: legalize sdiv and srem operations. llvm-svn: 279842	2016-08-26 17:46:13 +00:00
Tim Northover	7a753d9bec	GlobalISel: legalize under-width divisions. llvm-svn: 279841	2016-08-26 17:46:06 +00:00
Tim Northover	1d18a99a53	GlobalISel: mark selects legal llvm-svn: 279840	2016-08-26 17:46:03 +00:00
Tim Northover	5d0eaa4e79	GlobalISel: mark float/int conversions legal llvm-svn: 279839	2016-08-26 17:45:58 +00:00
Chad Rosier	39c1dbb845	[AArch64] Avoid materializing constant 1 by using csinc, rather than csel. This is similar to what was done in r261675, but for CSINC rather than CSINV. Differential Revision: https://reviews.llvm.org/D23892 llvm-svn: 279822	2016-08-26 14:01:55 +00:00
Pablo Barrio	b8ec630583	Handle empty functions with debug info in load/store opt pass Summary: In functions that contained debug info but were empty otherwise, the ARM load/store optimizer could abort. This was because function MergeReturnIntoLDM handled the special case where a Machine Basic Block is empty by calling MBB.empty(). However, this returns false in the presence of debug info, although the function should be considered empty in the eyes of the load/store optimizer. This has been fixed by handling the case where searching through the block finds only debug instructions. Reviewers: rengolin, dexonsmith, llvm-commits, jmolloy Subscribers: t.p.northover, aemerson, rengolin, samparker Differential Revision: https://reviews.llvm.org/D23847 llvm-svn: 279820	2016-08-26 13:00:39 +00:00
Simon Pilgrim	091c4c781c	[X86][SSE4A] The EXTRQ/INSERTQ bit extraction/insertion ops should be in the integer domain llvm-svn: 279811	2016-08-26 09:55:41 +00:00
Craig Topper	8f27f51192	[X86][SSE] Add CMPSS/CMPSD intrinsic scalar load folding support. llvm-svn: 279806	2016-08-26 07:08:00 +00:00
Michael Kuperstein	2ee911e985	Revert r274613 because it breaks the test suite with AVX512 This reverts most of r274613 (AKA r274626) and its follow-ups (r276347, r277289), due to miscompiles in the test suite. The FastISel change was left in, because it apparently fixes an unrelated issue. (Recommit of r279782 which was broken due to a bad merge.) This fixes 4 out of the 5 test failures in PR29112. llvm-svn: 279788	2016-08-25 22:48:11 +00:00
Michael Kuperstein	6e271f4ce8	Revert r279782 due to debug buildbot breakage. llvm-svn: 279785	2016-08-25 22:14:45 +00:00
Michael Kuperstein	a6ccc8d365	Revert r274613 because it breaks the test suite with AVX512 This reverts most of r274613 and its follow-ups (r276347, r277289), due to miscompiles in the test suite. The FastISel change was left in, because it apparently fixes an unrelated issue. This fixes 4 out of the 5 test failures in PR29112. llvm-svn: 279782	2016-08-25 21:55:41 +00:00
Tim Northover	3495647d0d	ARM: by default don't set the Thumb bit on MachO relocated values. Its existence is largely historical, apparently we tried to make ARM object files look maybe-almost-possibly runnable by putting our best guess at the actual value into relocated locations. Of course, the real linker then comes along and can completely change things. But it should only be there for word-sized and movw/movt relocations. It can't be encoded in branch relocations, and I've seen it mess up validity calculations twice in the last couple of weeks so the default is clearly problematic. llvm-svn: 279773	2016-08-25 20:41:30 +00:00
Tim Northover	d8a6d7ce91	GlobalISel: mark overflow bit of overflow ops legal. It's expected this will map to NZCV register class and be properly selectable. llvm-svn: 279761	2016-08-25 17:37:41 +00:00
Tim Northover	fe880a8801	GlobalISel: mark simple ops legal even on types < 32-bit. The 32-bit variants of these operations don't depend on the bits not being operated on, so they also naturally model operations narrower than the actual register width. llvm-svn: 279760	2016-08-25 17:37:39 +00:00
Tim Northover	7a1ec0141a	GlobalISel: mark pointer constants as legal on AArch64. llvm-svn: 279759	2016-08-25 17:37:35 +00:00
Tim Northover	438c77ca1a	GlobalISel: perform multi-step legalization llvm-svn: 279758	2016-08-25 17:37:32 +00:00
Tim Northover	2c4a838e24	GlobalISel: mark small extends as legal on AArch64 llvm-svn: 279757	2016-08-25 17:37:25 +00:00
Michael Kuperstein	40887c5566	[X86] 512-bit VPAVG requires AVX512BW Fix VPAVG detection to require AVX512BW, not AVX512F for 512-bit widths, and change associated asserts to assert in the right direction... This fixes PR29111. llvm-svn: 279755	2016-08-25 17:17:46 +00:00
Simon Pilgrim	5aa9c203ac	[X86][SSE] INSERTPS is only combined on v4f32 types. NFCI. llvm-svn: 279751	2016-08-25 17:02:00 +00:00
Ron Lieberman	a3c739b977	[Hexagon] Remove extraneous debug output from HexagonCopyToCombine.cpp BB# ... llvm-svn: 279750	2016-08-25 16:46:09 +00:00
Simon Pilgrim	6fe4a9ed1e	Fix line endings llvm-svn: 279745	2016-08-25 15:45:27 +00:00
Ron Lieberman	c93d123b86	[Hexagon] vector store print tracing. Add vector store print tracing option for hexagon vector instructions. https://reviews.llvm.org/D23870 llvm-svn: 279739	2016-08-25 13:35:48 +00:00
Simon Pilgrim	0ad9f3e93b	[X86][AVX] Provide SubVectorBroadcast fallback if load fold fails (PR29133) Fix for PR29133, matching the approach that was taken for AVX1 scalar broadcasts. llvm-svn: 279735	2016-08-25 12:45:16 +00:00
Craig Topper	5ef7a0f45a	[X86] Simplify getOperandBias a bit. NFC There's no reason for it to return a signed type. Just return the operand bias in each if instead of starting from 0 and adding in the 'if'. llvm-svn: 279720	2016-08-25 04:16:10 +00:00
Craig Topper	969e56a2cc	[X86] Fix indentation per coding standards. NFC llvm-svn: 279719	2016-08-25 04:16:08 +00:00
Matthias Braun	1eb473680a	MachineFunctionProperties/MIRParser: Rename AllVRegsAllocated->NoVRegs, compute it Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of running after register allocation and simply describes that no vregs are used in a machine function. With that we can simply compute the property and do not need to dump/parse it in .mir files. Differential Revision: http://reviews.llvm.org/D23850 llvm-svn: 279698	2016-08-25 01:27:13 +00:00
George Burgess IV	381fc0ee3c	Make some LLVM_CONSTEXPR variables const. NFC. This patch changes LLVM_CONSTEXPR variable declarations to const variable declarations, since LLVM_CONSTEXPR expands to nothing if the current compiler doesn't support constexpr. In all of the changed cases, it looks like the code intended the variable to be const instead of sometimes-constexpr sometimes-not. llvm-svn: 279696	2016-08-25 01:05:08 +00:00
Heejin Ahn	b6cd5121b7	[WebAssembly] Change a comment line Test for commit access. llvm-svn: 279683	2016-08-24 22:53:00 +00:00
Krzysztof Parzyszek	6dff336ad1	[Hexagon] Check for block end when skipping debug instructions llvm-svn: 279681	2016-08-24 22:36:35 +00:00
Krzysztof Parzyszek	951fb36120	[Hexagon] Change insertion of expand-condsets pass to avoid memory leaks llvm-svn: 279678	2016-08-24 22:27:36 +00:00
Tim Northover	9c3633f516	ARM: don't diagnose cbz/cbnz to Thumb functions. A branch-distance to a Thumb function shouldn't be forced to be odd for CBZ/CBNZ instructions because (assuming it's within range), it's going to be a valid, even offset. llvm-svn: 279665	2016-08-24 21:21:29 +00:00
Changpeng Fang	75f0968b39	AMDGCN/SI: Implement readlane/readfirstlane intrinsics Summary: This patch implements readlane/readfirstlane intrinsics. TODO: need to define a new register class to consider the case that the source could be a vector register or M0. Reviewed by: arsenm and tstellarAMD Differential Revision: http://reviews.llvm.org/D22489 llvm-svn: 279660	2016-08-24 20:35:23 +00:00
Rafael Espindola	70c6a3976b	Use isTargetMachO instead of isTargetDarwin. llvm-svn: 279655	2016-08-24 19:02:29 +00:00
Simon Pilgrim	e14653e17d	[X86][SSE] Add MINSD/MAXSD/MINSS/MAXSS intrinsic scalar load folding support These are no different in load behaviour to the existing ADD/SUB/MUL/DIV scalar ops but were missing from isNonFoldablePartialRegisterLoad llvm-svn: 279652	2016-08-24 18:40:53 +00:00
Evandro Menezes	5395187fe5	[AArch64] Adjust the feature set for Exynos M1. Enable zero cycle zeroing. llvm-svn: 279648	2016-08-24 18:17:30 +00:00
Simon Pilgrim	941bd6bbae	[X86][SSE] Add support for combining VZEXT_MOVL target shuffles Includes adding more general support for the pattern: VZEXT_MOVL(VZEXT_LOAD(ptr)) -> VZEXT_LOAD(ptr) This has unearthed a couple of latent poor codegen issues (MINSS/MAXSS scalar load folding and MOVDDUP/BROADCAST load folding patterns), which will be fixed shortly. It's also reduced a couple of tests so that they no longer reach the instruction threshold necessary to be combined to PSHUFB (see PR26183). llvm-svn: 279646	2016-08-24 18:07:53 +00:00
Krzysztof Parzyszek	b5ec48755d	[Hexagon] Enable subregister liveness tracking llvm-svn: 279642	2016-08-24 17:17:39 +00:00
Krzysztof Parzyszek	cbd559f507	[Hexagon] Remove the utilization of IMPLICIT_DEFs from expand-condsets This is no longer necessary, because since r279625 the subregister liveness properly accounts for read-undefs. llvm-svn: 279637	2016-08-24 16:36:37 +00:00
Wei Ding	1041a646a9	AMDGPU : Add V_SAD_U32 instruction pattern. Differential Revision: http://reviews.llvm.org/D23069 llvm-svn: 279629	2016-08-24 14:59:47 +00:00
Krzysztof Parzyszek	a7ed090bba	Create subranges for new intervals resulting from live interval splitting The register allocator can split a live interval of a register into a set of smaller intervals. After the allocation of registers is complete, the rewriter will modify the IR to replace virtual registers with the corresponding physical registers. At this stage, if a register corresponding to a subregister of a virtual register is used, the rewriter will check if that subregister is undefined, and if so, it will add the <undef> flag to the machine operand. The function verifying liveness of the subregister would assume that it is undefined, unless any of the subranges of the live interval proves otherwise. The problem is that the live intervals created during splitting do not have any subranges, even if the original parent interval did. This could result in the <undef> flag placed on a register that is actually defined. Differential Revision: http://reviews.llvm.org/D21189 llvm-svn: 279625	2016-08-24 13:37:55 +00:00
Simon Dardis	f114820912	[mips] Preparatory work for a generic scheduler Extend instruction definitions from nearly all ISAs to include appropriate instruction itineraries. Change MIPS16s gp prologue generation to use real instructions instead of using a pseudo instruction. Reviewers: dsanders, vkalintiris Differential Review: https://reviews.llvm.org/D23548 llvm-svn: 279623	2016-08-24 13:00:47 +00:00
Simon Pilgrim	7a50c8c2ba	[X86][AVX2] Ensure on 32-bit targets that we broadcast f64 types not i64 (PR29101) llvm-svn: 279622	2016-08-24 12:42:31 +00:00
Simon Pilgrim	6392b8d4ce	[X86][SSE] Add support for 32-bit element vectors to X86ISD::VZEXT_LOAD Consecutive load matching (EltsFromConsecutiveLoads) currently uses VZEXT_LOAD (load scalar into lowest element and zero uppers) for vXi64 / vXf64 vectors only. For vXi32 / vXf32 vectors it instead creates a scalar load, SCALAR_TO_VECTOR and finally VZEXT_MOVL (zero upper vector elements), relying on tablegen patterns to match this into an equivalent of VZEXT_LOAD. This patch adds the VZEXT_LOAD patterns for vXi32 / vXf32 vectors directly and updates EltsFromConsecutiveLoads to use this. This has proven necessary to allow us to easily make VZEXT_MOVL a full member of the target shuffle set - without this change the call to combineShuffle (which is the main caller of EltsFromConsecutiveLoads) tended to recursively recreate VZEXT_MOVL nodes...... Differential Revision: https://reviews.llvm.org/D23673 llvm-svn: 279619	2016-08-24 10:46:40 +00:00
Matthias Braun	733fe3676c	CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses Re-apply this patch, hopefully I will get away without any warnings in the constructor now. This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279602	2016-08-24 01:52:46 +00:00
Philip Reames	e83c4b30ca	[stackmaps] More extraction of common code [NFCI] General cleanup before starting to work on the part I want to actually change. llvm-svn: 279586	2016-08-23 23:33:29 +00:00
Richard Smith	8c3fbdc6c4	Revert r279564. It introduces undefined behavior (binding a reference to a dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes -Werror builds (including several buildbots) to fail. llvm-svn: 279580	2016-08-23 22:08:27 +00:00
Matthias Braun	90799ce8b2	MachineFunction: Introduce NoPHIs property I want to compute the SSA property of .mir files automatically in upcoming patches. The problem with this is that some inputs will be reported as static single assignment with some passes claiming not to support SSA form. In reality though those passes do not support PHI instructions => Track the presence of PHI instructions separate from the SSA property. Differential Revision: https://reviews.llvm.org/D22719 llvm-svn: 279573	2016-08-23 21:19:49 +00:00
Tim Northover	6cd4b23a0f	GlobalISel: legalize integer comparisons on AArch64. Next step is doing both legalizations at the same time! Marvel at GlobalISel's cunning. llvm-svn: 279566	2016-08-23 21:01:26 +00:00
Tim Northover	b3a0be4d38	GlobalISel: legalize conditional branches on AArch64. llvm-svn: 279565	2016-08-23 21:01:20 +00:00
Matthias Braun	4c1f1f120c	CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses Re-apply this commit with the deletion of a MachineFunction delegated to a separate pass to avoid use after free when doing this directly in AsmPrinter. This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279564	2016-08-23 20:58:29 +00:00

39335 Commits