llvm-project

Commit Graph

Author	SHA1	Message	Date
Zvi Rackover	76dbf26599	[X86][GlobalISel] Add minimal call lowering support to the IRTranslator Summary: Add basic functionality to support call lowering for X86. Currently only supports functions which return void and take zero arguments. Inspired by commit 286573. Reviewers: ab, qcolombet, t.p.northover Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26593 llvm-svn: 286935	2016-11-15 06:34:33 +00:00
Matt Arsenault	c79dc70d50	AMDGPU: Fix f16 fabs/fneg llvm-svn: 286931	2016-11-15 02:25:28 +00:00
Matt Arsenault	81da114e65	AMDGPU: Set hasExtraSrcRegAllocReq on v_div_scale_* This doesn't solve any problems I know about, but this should have more conservative assumptions about the operands' llvm-svn: 286913	2016-11-15 00:05:42 +00:00
Matt Arsenault	972034bda9	AMDGPU: Fix formatting of 1/2pi immediate llvm-svn: 286912	2016-11-15 00:04:33 +00:00
Evandro Menezes	9fc54826e0	[AArch64] Compute the Newton series for reciprocals natively Implement the Newton series for square root, its reciprocal and reciprocal natively using the specialized instructions in AArch64 to perform each series iteration. Differential revision: https://reviews.llvm.org/D26518 llvm-svn: 286907	2016-11-14 23:29:01 +00:00
Krzysztof Parzyszek	b16a4e5869	[Hexagon] Give a predicate function a more meaningful name Change "orisadd" to "IsOrAdd" to follow the naming conventions, and change "isOrAdd" in the C++ code to "isOrEquivalentToAdd". llvm-svn: 286886	2016-11-14 20:53:09 +00:00
Tim Northover	3d38c38826	ARM: try to fix GCC 4.8 compilation again after r286881. llvm-svn: 286882	2016-11-14 20:31:53 +00:00
Tim Northover	46a6f0fbf0	Recommit: ARM: sort register lists by encoding in push/pop instructions. For example we were producing push {r8, r10, r11, r4, r5, r7, lr} This is misleading (r4, r5 and r7 are actually pushed before the rest), and other components (stack folding recently) often forget to deal with the extra complexity coming from the different order, leading to miscompiles. Finally, we warn about our own code in -no-integrated-as mode without this, which is really not a good idea. Fixed usage of std::sort so that we (hopefully) use instantiations that actually exist in GCC 4.8. llvm-svn: 286881	2016-11-14 20:28:24 +00:00
Geoff Berry	e8de67abad	[AArch64] Change some pointers to references. NFC. Follow-up change to r286875. llvm-svn: 286879	2016-11-14 19:59:11 +00:00
Geoff Berry	526c50588d	[AArch64] Split 0 vector stores into scalar store pairs. Summary: Replace a splat of zeros to a vector store by scalar stores of WZR/XZR. The load store optimizer pass will merge them to store pair stores. This should be better than a movi to create the vector zero followed by a vector store if the zero constant is not re-used, since one instruction and one register live range will be removed. For example, the final generated code should be: stp xzr, xzr, [x0] instead of: movi v0.2d, #0 str q0, [x0] Reviewers: t.p.northover, mcrosier, MatzeB, jmolloy Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D26561 llvm-svn: 286875	2016-11-14 19:39:04 +00:00
Geoff Berry	def4bfa9d9	[AArch64] Factor out transform code from split16BStore. NFC. llvm-svn: 286874	2016-11-14 19:39:00 +00:00
Daniel Sanders	08714cdee4	Revert: r286868 - Test commit llvm-svn: 286869	2016-11-14 19:10:56 +00:00
Daniel Sanders	12432e0b04	Test commit llvm-svn: 286868	2016-11-14 19:09:33 +00:00
Tim Northover	1b66f39cf2	Revert "ARM: sort register lists by encoding in push/pop instructions." This reverts commit 286866. It broke a bot, something to do with exactly which templates std::sort accepts. llvm-svn: 286867	2016-11-14 19:05:28 +00:00
Tim Northover	e908ea844c	ARM: sort register lists by encoding in push/pop instructions. For example we were producing push {r8, r10, r11, r4, r5, r7, lr} This is misleading (r4, r5 and r7 are actually pushed before the rest), and other components (stack folding recently) often forget to deal with the extra complexity coming from the different order, leading to miscompiles. Finally, we warn about our own code in -no-integrated-as mode without this, which is really not a good idea. llvm-svn: 286866	2016-11-14 19:02:17 +00:00
Sean Fertile	a435e07de8	[PPC] Add intrinsic mapping to the xscvhpsp instruction add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to Single-Precision' instruction. Differential review: https://reviews.llvm.org/D26536 llvm-svn: 286862	2016-11-14 18:43:59 +00:00
Changpeng Fang	8236fe103f	AMDGPU/SI: Support data types other than V4f32 in image intrinsics Summary: Extend image intrinsics to support data types of V1F32 and V2F32. TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active, even though such case should be very rare. Reviewers: tstellarAMD Differential Revision: http://reviews.llvm.org/D26472 llvm-svn: 286860	2016-11-14 18:33:18 +00:00
Sumanth Gundapaneni	d428cf8b5f	[Hexagon] Remove unsafe load instructions that affect Stack Slot Coloring The Stack slot coloring pass removes a store that is followed by a load that deal with the same stack slot. The function isLoadFromStackSlot is supposed to consider the loads that have no side-effects. This patch fixed the issue by removing the unsafe loads from this function Eg: %vreg0<def> = L2_loadruh_io <fi#15>, 0 S2_storeri_io <fi#15>, 0, %vreg0 In this case, we load an unsigned extended half word and store this in to the same stack slot. The Stack slot coloring pass considers safe to remove the store. This patch marked all the non-vector byte and half word loads as unsafe. llvm-svn: 286843	2016-11-14 17:11:00 +00:00
Simon Pilgrim	779da8e5ea	[CostModel][X86] Added mul costs for vXi8 vectors More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result llvm-svn: 286838	2016-11-14 15:54:24 +00:00
Simon Pilgrim	27fed8e5d6	[X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason. This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit) llvm-svn: 286832	2016-11-14 14:45:16 +00:00
Sean Fertile	adda5b2d2b	[PPC] add intrinsics for vec extract exp/significand and vec test data class. Differential Revision: https://reviews.llvm.org/D26272 llvm-svn: 286829	2016-11-14 14:42:37 +00:00
Diana Picus	bda7276120	GlobalISel: Fix indentation. NFC llvm-svn: 286808	2016-11-14 10:25:43 +00:00
Craig Topper	8f85ad1755	[AVX-512] Add suffixless aliases for EVEX encoded vcvtsi2ss/vcvtsi2sd/vcvtusi2ss/vcvtusi2sd. This matches the VEX behavior. Fixes another problem from PR28850. llvm-svn: 286790	2016-11-14 02:46:58 +00:00
Craig Topper	b8596e4d1d	[X86] Cleanup 'x' and 'y' mnemonic suffixes for vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions. -Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions. -Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions. -Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax. -Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing. This should fix at least some of PR28850. llvm-svn: 286787	2016-11-14 01:53:29 +00:00
Craig Topper	353e59b6d6	[AVX-512] Remove and autoupgrade masked dword/qword variable shift intrinsics to the new unmasked versions and selects. llvm-svn: 286786	2016-11-14 01:53:22 +00:00
Craig Topper	ba13703bb3	[AVX-512] Fix a disassembler failure for AVX-512 vcmpss/vcmpsd with an immediate larger than 32. Fix the same bug with VLX vcmpps/vcmppd. Fixes PR24941. llvm-svn: 286775	2016-11-13 19:58:18 +00:00
Matt Arsenault	dc45274d54	AMDGPU: Implement SGPR spilling with scalar stores This avoids the nasty problems caused by using memory instructions that read the exec mask while spilling / restoring registers used for control flow masking, but only for VI when these were added. This always uses the scalar stores when enabled currently, but it may be better to still try to spill to a VGPR and use this on the fallback memory path. The cache also needs to be flushed before wave termination if a scalar store is used. llvm-svn: 286766	2016-11-13 18:20:54 +00:00
Igor Breger	e2399f9e0e	revert commit r286761, some builds failed on Win platforms llvm-svn: 286765	2016-11-13 15:48:11 +00:00
Ayman Musa	c09b3769ae	[X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss\|sd} intrinsics. Differential Revision: https://reviews.llvm.org/D26128 llvm-svn: 286761	2016-11-13 14:51:25 +00:00
Ayman Musa	46af8f9c6f	[X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions. Differential Revision: https://reviews.llvm.org/D26022 llvm-svn: 286758	2016-11-13 14:29:32 +00:00
Craig Topper	43e97649a1	[AVX-512] Add unmasked intrinsics for variable shifts of dwords and qwords. These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2. llvm-svn: 286754	2016-11-13 07:26:15 +00:00
Konstantin Zhuravlyov	f86e4b7266	[AMDGPU] Add f16 support (VI+) Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753	2016-11-13 07:01:11 +00:00
Craig Topper	da6a63db1c	[AVX-512] Remove the remaining masked shift by immediate or by single value. Autoupgrade them to recently introduced unmasked versions and a select. After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shuffles consistent between AVX-512 and the legacy intrinsics. llvm-svn: 286725	2016-11-12 18:04:46 +00:00
Craig Topper	9d25c5e2fa	[AVX-512] Add unmasked version of shift by immediate and shift by single element in XMM. Summary: This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls where we already support the sse2 and avx2 intrinsics. We need the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new intrinsics in the frontend. This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang. Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts. Reviewers: RKSimon, zvi, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26333 llvm-svn: 286711	2016-11-12 05:28:24 +00:00
Craig Topper	5cb13062d2	[AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases. Reviewers: delena, RKSimon Subscribers: Farhana, llvm-commits Differential Revision: https://reviews.llvm.org/D26297 llvm-svn: 286709	2016-11-12 05:05:27 +00:00
Tom Stellard	b4c8e8e30b	AMDGPU/SI: Promote i16 = fp_[us]int f32 for VI Summary: This fixes a regression caused by r286464. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26570 llvm-svn: 286687	2016-11-12 00:19:11 +00:00
Tom Stellard	9fdbec870c	AMDGPU/SI: Fix visit order assumption in SIFixSGPRCopies Summary: This pass was assuming that when a PHI instruction defined a register used by another PHI instruction that the defining instruction would be legalized before the using instruction. This assumption was causing the pass to not legalize some PHI nodes within divergent flow-control. This fixes a bug that was uncovered by r285762. Reviewers: nhaehnle, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26303 llvm-svn: 286676	2016-11-11 23:35:42 +00:00
Nemanja Ivanovic	ec4b0c360f	[PowerPC] Add remaining vector permute builtins in altivec.h - LLVM portion This patch corresponds to review: https://reviews.llvm.org/D26480 Adds all the intrinsics used for various permute builtins that will be added to altivec.h. llvm-svn: 286638	2016-11-11 21:42:01 +00:00
Chad Rosier	8ade03463e	[AArch64] Update a FIXME comment to reflect current state. NFC. llvm-svn: 286625	2016-11-11 19:52:45 +00:00
Geoff Berry	25fa4999ff	[AArch64] Fix bugs in isel lowering replaceSplatVectorStore. Summary: Fix off-by-one indexing error in loop checking that inserted value was a splat vector. Add code to check that INSERT_VECTOR_ELT nodes constructing the splat vector have the expected constant index values. Reviewers: t.p.northover, jmolloy, mcrosier Subscribers: aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D26409 llvm-svn: 286616	2016-11-11 19:25:20 +00:00
Chad Rosier	d6e85ce3c3	[AArch64] Remove lots of redundant code. NFC. llvm-svn: 286606	2016-11-11 17:49:34 +00:00
Chad Rosier	31ee813068	[AArch64] Early return and minor renaming/refactoring to ease code review. NFC. llvm-svn: 286601	2016-11-11 17:07:37 +00:00
Nemanja Ivanovic	2efc3cb968	[PowerPC] Add vector conversion builtins to altivec.h - LLVM portion This patch corresponds to review: https://reviews.llvm.org/D26307 Adds all the intrinsics used for various conversion builtins that will be added to altivec.h. These are type conversions between various types of vectors. llvm-svn: 286596	2016-11-11 14:41:19 +00:00
Chad Rosier	10c7aaaee9	[AArch64] Enable merging of adjacent zero stores for all subtargets. This optimization merges adjacent zero stores into a wider store. e.g., strh wzr, [x0] strh wzr, [x0, #2] ; becomes str wzr, [x0] e.g., str wzr, [x0] str wzr, [x0, #4] ; becomes str xzr, [x0] Previously, this was only enabled for Kryo and Cortex-A57. Differential Revision: https://reviews.llvm.org/D26396 llvm-svn: 286592	2016-11-11 14:10:12 +00:00
Sam Kolton	ce0aba74c1	[AMDGPU] TargetStreamer: Fix .note section name llvm-svn: 286591	2016-11-11 13:41:52 +00:00
Ulrich Weigand	a0e7325023	[SystemZ] Support CL(G)T instructions This adds support for the compare logical and trap (memory) instructions that were added as part of the miscellaneous instruction extensions feature with zEC12. llvm-svn: 286587	2016-11-11 12:48:26 +00:00
Ulrich Weigand	92c2c672e5	[SystemZ] Support load-and-zero-rightmost-byte facility This adds support for the LZRF/LZRG/LLZRGF instructions that were added on z13, and uses them for code generation where appropriate. SystemZDAGToDAGISel::tryRISBGZero is updated again to prefer LLZRGF over RISBG where both would be possible. llvm-svn: 286586	2016-11-11 12:46:28 +00:00
Ulrich Weigand	5dc7b67c62	[SystemZ] Use LLGT(R) instructions This adds support for the 31-to-64-bit zero extension instructions LLGT and LLGTR and uses them for code generation where appropriate. Since this operation can also be performed via RISBG, we have to update SystemZDAGToDAGISel::tryRISBGZero so that we prefer LLGT over RISBG in case both are possible. The patch includes some simplification to the tryRISBGZero code; this is not intended to cause any (further) functional change in codegen. llvm-svn: 286585	2016-11-11 12:43:51 +00:00
Diana Picus	22274934f4	[ARM] Add plumbing for GlobalISel Add GlobalISel skeleton, up to the point where we can select a ret void. llvm-svn: 286573	2016-11-11 08:27:37 +00:00
Yaxun Liu	c5bf4b831d	AMDGPU: Attempt to fix build failure on x86-64 selfhost build Remove redundant include file. llvm-svn: 286552	2016-11-11 02:48:50 +00:00
Sean Fertile	e1ca561b0a	Add a blank line for a test commit. llvm-svn: 286550	2016-11-11 02:33:17 +00:00
Stanislav Mekhanoshin	6fc8a1cdaa	Revert "[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies" This reverts commit r286171, it breaks piglit test fs-discard-exit-2 llvm-svn: 286530	2016-11-11 00:22:34 +00:00
Joerg Sonnenberger	618d475c03	Fix requirements. llvm-svn: 286527	2016-11-10 23:53:45 +00:00
Matthias Braun	d67fa9dc6a	Timer: Remove group-less NamedRegionTimer constructor. The NamedRegionTimer initializer without a group name puts the Timer into the "Misc" group and is (nearly) unused. Remove it. The only user of this constructor appears to be the HexagonGenInsert pass, which creates a counter without group to count the complete execution time of that pass, however since every pass gets a counter by the PassManager anyway this should be unnecessary. Also removed the pointless TimerGroup there. Differential Revision: https://reviews.llvm.org/D25582 llvm-svn: 286524	2016-11-10 23:36:44 +00:00
Evandro Menezes	21f9ce1a0d	[DAG Combiner] Fix the native computation of the Newton series for reciprocals The generic infrastructure to compute the Newton series for reciprocal and reciprocal square root was conceived to allow a target to compute the series itself. However, the original code did not properly consider this condition if returned by a target. This patch addresses the issues to allow a target to compute the series on its own. Differential revision: https://reviews.llvm.org/D22975 llvm-svn: 286523	2016-11-10 23:31:06 +00:00
Yaxun Liu	d6fbe65040	AMDGPU: Emit runtime metadata as a note element in .note section Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata. However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html). This patch lets the AMDGPU backend emit runtime metadata as a note element in .note section. Differential Revision: https://reviews.llvm.org/D25781 llvm-svn: 286502	2016-11-10 21:18:49 +00:00
Davide Italiano	a22ddddfea	[Target] Rename X86/ARM Assembly printer to reflect reality. This shows up a lot profiling LTO testcases with -time-passes, so better have a non confusing name. llvm-svn: 286488	2016-11-10 18:39:31 +00:00
Tom Stellard	115a61560e	AMDGPU: Add VI i16 support Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 286464	2016-11-10 16:02:37 +00:00
Oliver Stannard	18ca2adf2d	[ARM] Thumb2 LDR (literal) should accept PC as the destination The version of this instruction with the .w suffix already correctly accepts this, but the alias without the .w did not. Differential Revision: https://reviews.llvm.org/D26499 llvm-svn: 286446	2016-11-10 13:20:41 +00:00
Craig Topper	bd298c37d1	[AVX-512] Allow legacy cvtpd2dq intrinsics to select EVEX encoded instruction when available. llvm-svn: 286435	2016-11-10 07:47:17 +00:00
Craig Topper	e0845d8e8c	[AVX-512][X86] Convert avx_cvtt_ps2dq_256 and sse2_cvttps2dq intrinsics to ISD::FP_TO_SINT in the intrinsics table and delete patterns. While nearby also move CVTDQ2PS patterns into their instructions. This allows these intrinsics to also use EVEX instructions. llvm-svn: 286434	2016-11-10 07:24:52 +00:00
Craig Topper	f37b9b9b5f	[X86] Convert int_x86_avx_cvtt_pd2dq_256 to fp_to_sint using the intrinsics table. Removes extra patterns and allows legacy intrinsic to select EVEX encoded instructions when available. llvm-svn: 286433	2016-11-10 06:45:39 +00:00
Craig Topper	2afed2c790	[X86] Move some custom patterns into the currently empty pattern of their corresponding instructions. NFC llvm-svn: 286432	2016-11-10 06:45:37 +00:00
Craig Topper	1d2e74f030	[X86] Remove some patterns still referencing int_x86_sse2_cvttpd2dq that should have been removed in r286344. NFC llvm-svn: 286431	2016-11-10 06:45:34 +00:00
Peter Collingbourne	32ab3a817d	Re-apply r286384, "X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate.", with a fix for 32-bit x86. Teach X86InstrInfo::analyzeCompare() not to crash on CMP and SUB instructions that take a global address operand. llvm-svn: 286420	2016-11-09 23:53:43 +00:00
Tim Northover	a9105be437	GlobalISel: translate invoke and landingpad instructions Pretty bare-bones support for exception handling (no weird MSVC stuff, no SjLj etc), but it should get things going. llvm-svn: 286407	2016-11-09 22:39:54 +00:00
Peter Collingbourne	a9cadeddd4	Revert r286384, "X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate." Suspected to be the cause of a sanitizer-windows bot failure: Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420 llvm-svn: 286385	2016-11-09 18:17:50 +00:00
Peter Collingbourne	4c15db45e4	X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate. A relocatable immediate is either an immediate operand or an operand that can be relocated by the linker to an immediate, such as a regular symbol in non-PIC code. Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands of type "imm32_su". Remove a number of now-redundant patterns. Differential Revision: https://reviews.llvm.org/D25812 llvm-svn: 286384	2016-11-09 17:51:58 +00:00
Krzysztof Parzyszek	f817efbbb0	[Hexagon] Silence "sometimes uninitialized" warning in HexagonCopyToCombine llvm-svn: 286383	2016-11-09 17:50:46 +00:00
Krzysztof Parzyszek	a540997ce4	[Hexagon] Separate Hexagon subreg indices for different register classes For pairs of 32-bit registers: isub_lo, isub_hi. For pairs of vector registers: vsub_lo, vsub_hi. Add generic subreg indices: ps_sub_lo, ps_sub_hi, and a function HexagonRegisterInfo::getHexagonSubRegIndex(RegClass, GenericSubreg) that returns the appropriate subreg index for RegClass. llvm-svn: 286377	2016-11-09 16:19:08 +00:00
Krzysztof Parzyszek	601d7eb11a	[Hexagon] Eliminate Insert4 pseudo-instruction, use combines instead llvm-svn: 286368	2016-11-09 14:16:29 +00:00
Jonas Paulsson	e127fe7083	[SystemZ] A few fixes in scheduler files. Review: U Weigand llvm-svn: 286362	2016-11-09 12:47:57 +00:00
Jonas Paulsson	28f29487b9	[MachineScheduler] Comments fixing. The name/comment of the third argument to the ScheduleDAGMI constructor is RemoveKillFlags and not IsPostRA. Only the comments are changed. Review: A Trick llvm-svn: 286350	2016-11-09 09:59:27 +00:00
Craig Topper	f334ac19ad	[AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32 This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions. If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result. It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result. Differential Revision: https://reviews.llvm.org/D26331 llvm-svn: 286345	2016-11-09 07:48:51 +00:00
Craig Topper	731bf9c5d6	[X86] Lower AVX512 and SSE intrinsics for CVTTPD2DQ to X86ISD::CVTTPD2DQ. Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves. Reviewers: delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26330 llvm-svn: 286344	2016-11-09 07:31:32 +00:00
Craig Topper	28e3dfc02b	[AVX-512] Use alignedstore256 in patterns that look for stores of the lower 256-bits of a 512-bit vector to use a 256-bit aligned store. Previously we were only checking for 16 byte alignment instead of 32 byte alignment. Fixes PR30947. llvm-svn: 286342	2016-11-09 05:31:57 +00:00
Craig Topper	5c842be9a0	[AVX-512] Make VBMI instruction set enabling imply that the BWI instruction set is also enabled. Summary: This is needed to make the v64i8 and v32i16 types legal for the 512-bit VBMI instructions. Fixes PR30912. Reviewers: delena, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26322 llvm-svn: 286339	2016-11-09 04:50:48 +00:00
Matthias Braun	c53cbbb1d1	AArch64DeadRegisterDefinitionsPass: Fix Changed flag Fix a bug in the calculation of the changed flag introduced in r285488. llvm-svn: 286293	2016-11-08 20:59:03 +00:00
Ulrich Weigand	05effca2d8	[SystemZ] Add missing FP extension instructions This completes assembler / disassembler support for all BFP instructions provided by the floating-point extensions facility. The instructions added here are not currently used for codegen. llvm-svn: 286285	2016-11-08 20:18:41 +00:00
Ulrich Weigand	4006e09d1d	[SystemZ] Add program mask and addressing mode instructions Add several instructions that operate on the program mask or the addressing mode. These are not really needed for code generation under Linux, but are provided for completeness for the assembler/disassembler. llvm-svn: 286284	2016-11-08 20:17:02 +00:00
Ulrich Weigand	fffc7110d6	[SystemZ] Model access registers as LLVM registers Add the 16 access registers as LLVM registers. This allows removing a lot of special cases in the assembler and disassembler where we were handling access registers; this can all just use the generic register code now. Also add a bunch of instructions to operate on access registers, for assembler/disassembler use only. No change in code generation intended. llvm-svn: 286283	2016-11-08 20:15:26 +00:00
Dan Gohman	e81021a5cb	[WebAssembly] Convert stackified IMPLICIT_DEF into constant 0. Since IMPLICIT_DEF instructions are omitted in the output, when the output of an IMPLICIT_DEF instruction is stackified, the resulting register lacks an explicit push, leading to a push/pop mismatch. Fix this by converting such IMPLICIT_DEFs into CONST_I32 0 instructions so that they have explicit pushes. llvm-svn: 286274	2016-11-08 19:40:38 +00:00
Ulrich Weigand	3d07d45089	[SystemZ] Always use semantic instruction classes Define a couple of additional semantic classes and use them throughout the .td files to make them more consistent and more easily readable. No functional change. llvm-svn: 286268	2016-11-08 18:37:48 +00:00
Ulrich Weigand	bfcfa0e207	[SystemZ] Refactor InstRR* instruction format patterns This changes the InstRR (and related) patterns to no longer automatically add an "r" at the end of the mnemonic. This makes the .td files more obviously understandable, and also allows using the patterns for those few instructions that do not follow the *r scheme. Also add some more sub-formats of the RRF format class, to match operand names and sequence from the PoP better. No functional change. llvm-svn: 286267	2016-11-08 18:36:31 +00:00
Ulrich Weigand	37bd451a55	[SystemZ] Rename some Inst* instruction format classes Now that we've added instruction format subclasses like InstRIb, it makes sense to rename the old InstRI to InstRIa. Similar for InstRX, InstRXY, InstRS, InstRSY, and InstSS. No functional change. llvm-svn: 286266	2016-11-08 18:32:50 +00:00
Nirav Dave	e833c6c61a	[MC][AArch64] Cleanup end-of-line parsing in AArch64 AsmParser. Reviewers: t.p.northover, rengolin Subscribers: llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D26309 llvm-svn: 286265	2016-11-08 18:31:04 +00:00
Ulrich Weigand	d2148caffc	[SystemZ] Refactor branch and conditional instruction patterns Rework patterns for branches, call & return instructions, compare-and-branch, compare-and-trap, and conditional move instructions. In particular, simplify creation of patterns for the extended opcodes of instructions that take a CC mask. Also, use semantical instruction classes for all the instructions instead of open-coding them in SystemZInstrInfo.td. Adds a couple of the basic branch instructions (that are unused for codegen) for the assembler/disassembler. llvm-svn: 286263	2016-11-08 18:30:50 +00:00
Tim Northover	5f7dea85c2	GlobalISel: support selecting fpext/fptrunc instructions on AArch64. llvm-svn: 286253	2016-11-08 17:44:07 +00:00
Anton Korobeynikov	243a4700ce	Fix PR27500: on MSP430 the branch destination offset is measured in words, not bytes. Summary: In addition, the branch instructions will have proper BB destinations, not offsets, like before. Reviewers: asl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23718 llvm-svn: 286252	2016-11-08 17:19:59 +00:00
Simon Pilgrim	d02c55204b	[VectorLegalizer] Expansion of CTLZ using CTPOP when possible This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful. Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233	2016-11-08 14:10:28 +00:00
Roger Ferrer Ibanez	80c0f33c29	[AArch64] Fix incorrect CSEL node created Under -enable-unsafe-fp-math, SELECT_CC lowering in AArch64 transforms floating point comparisons of the form "a == 0.0 ? 0.0 : x" to "a == 0.0 ? a : x". But it incorrectly assumes that 'x' and 'a' have the same type which can lead to a wrong CSEL node that crashes later due to nonsensical copies. Differential Revision: https://reviews.llvm.org/D26394 llvm-svn: 286231	2016-11-08 13:34:41 +00:00
Tim Northover	9ac0eba672	GlobalISel: support selecting G_SELECT on AArch64. llvm-svn: 286185	2016-11-08 00:45:29 +00:00
Tim Northover	7d88da6a46	GlobalISel: constrain PHI registers on AArch64. Self-referencing PHI nodes need their destination operands to be constrained because nothing else is likely to do so. For now we just pick a register class naively. Patch mostly by Ahmed again. llvm-svn: 286183	2016-11-08 00:34:06 +00:00
Stanislav Mekhanoshin	92e01ee90b	[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies Codegen prepare sinks comparisons close to a user if we have only one register for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions. Changed BE to report we have many condition registers. That way IR LICM pass would hoist an invariant comparison out of a loop and codegen prepare will not sink it. With that done a condition is calculated in one block and used in another. Current behavior is to store workitem's condition in a VGPR using v_cndmask and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue a forward propagation of a v_cmp 64 bit result to a user is implemented. Additional side effect of this is that we may consume fewer VGPRs at the cost of more SGPRs if multiple conditions need to be held, and that is a clear win in most cases. llvm-svn: 286171	2016-11-07 23:04:50 +00:00
Sanjin Sijaric	6f020d91a1	[AArch64] Transfer memory operands when lowering vector load/store intrinsics Summary: Some vector loads and stores generated from AArch64 intrinsics alias each other unnecessarily, preventing better scheduling. We just need to transfer memory operands during lowering. Reviewers: mcrosier, t.p.northover, jmolloy Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D26313 llvm-svn: 286168	2016-11-07 22:39:02 +00:00
Derek Schuff	0d41b7b3f3	[WebAssembly] Emit a BasePointer when we have overly-aligned stack objects Because we shift the stack pointer by an unknown amount, we need an additional pointer. In the case where we have variable-size objects as well, we can't reuse the frame pointer, thus three pointers. Patch by Jacob Gravelle Differential Revision: https://reviews.llvm.org/D26263 llvm-svn: 286160	2016-11-07 22:00:48 +00:00
Davide Italiano	5df6066ec1	[AArch64] Remove dead store. Found by gcc7. llvm-svn: 286137	2016-11-07 19:11:25 +00:00
Matt Arsenault	f530e8b3f0	AMDGPU: Remove unnecessary and on conditional branch The comment explaining why this was necessary is incorrect in its description of v_cmp's behavior for inactive workitems. llvm-svn: 286134	2016-11-07 19:09:33 +00:00
Matt Arsenault	52f14ec596	AMDGPU: Preserve vcc undef flags when inverting branch If the branch was on a read-undef of vcc, passes that used analyzeBranch to invert the branch condition wouldn't preserve the undef flag resulting in a verifier error. Fixes verifier failures in a future commit. Also fix verifier error when inserting copy for vccz corruption bug. llvm-svn: 286133	2016-11-07 19:09:27 +00:00
Matt Arsenault	2ae5653072	AMDGPU: Try to fix (non-clang?) bot builds llvm-svn: 286120	2016-11-07 16:52:50 +00:00
Matt Arsenault	314cbf7a3b	AMDGPU: Refactor copyPhysReg Separate the subregister splitting logic to re-use later. llvm-svn: 286118	2016-11-07 16:39:22 +00:00
Jonas Paulsson	4f0509fab3	[SystemZ] Correct the SchedModel regarding vector unit / instructions. * Use a generic vector unit to model the issue unit more accurately. * Update some vector instructions that actually use the vector unit for more than one cycle. Review: Ulrich Weigand llvm-svn: 286112	2016-11-07 15:45:06 +00:00
Amara Emerson	614b44bbe9	This patch adds support for 16 bit floating point registers to the inline asm register selection on AArch64. Without this patch, register allocation for the example below fails. define half @test(half %a1, half %a2) #0 { entry: %0 = tail call half asm "sqrshl ${0:h}, ${1:h}, ${2:h}", "=w,w,w" (half %a1, half %a2) #1 ret half %0 } Patch by Florian Hahn. Differential Revision: https://reviews.llvm.org/D25080 llvm-svn: 286111	2016-11-07 15:42:12 +00:00
Chad Rosier	d6daac4746	[AArch64] Removed the narrow load merging code in the ld/st optimizer. This feature has been disabled for some time now, so remove cruft. Differential Revision: https://reviews.llvm.org/D26248 llvm-svn: 286110	2016-11-07 15:27:22 +00:00
Jonas Paulsson	818431a61a	[SystemZ] Fixes in SchedModels for older subtargets. IssueWidth updated to reflect the capacity of the issue unit correctly. Correct number of FX and LS units modelled (2, was 1). Review: Ulrich Weigand llvm-svn: 286109	2016-11-07 14:47:25 +00:00
James Molloy	b03e0879fc	[Thumb1] Move padding earlier when synthesizing TBBs off of the PC When the base register (register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding. If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset. In the meantime, it's correct and only a slight amount worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, as we know the number of instructions in the sequence is always 4. llvm-svn: 286107	2016-11-07 13:38:21 +00:00
Dylan McKay	c988b334b6	[AVR] Enable the ISel, frame analyzer, and alloca passes llvm-svn: 286095	2016-11-07 06:02:55 +00:00
Craig Topper	b110e04851	[AVX-512] Remove masked pmovzx/pmovsx builtins and autoupgrade them to selects and native zext/sext. This mostly reuses earlier autoupgrade support for the sse and avx equivalents. Just needed to add the code to add the select. llvm-svn: 286092	2016-11-07 02:12:57 +00:00
Craig Topper	7e545335d6	[AVX-512] Remove 128/256 masked pshufb intrinsics. Autoupgrade them to legacy intrinsics and a select. llvm-svn: 286089	2016-11-07 00:13:39 +00:00
Krzysztof Parzyszek	39d14f3bc3	Reapply r286080 with a phony change in Hexagon's CMakeLists.txt Cmake has not recognized that Hexagon.td has a new dependency in HexagonPatterns.td. All changes to that file were not visible to the build bots. llvm-svn: 286084	2016-11-06 20:55:57 +00:00
Saleem Abdulrasool	804e12eeb5	ARM: lower fpowi appropriately for Windows ARM This handles the last of the builtin function calls for which we would generate code that differed from Microsoft's ABI. Rather than generating a call to `__pow{d,s}i2` we now promote the parameter to a float or double and invoke `powf` or `pow` instead. Addresses PR30825! llvm-svn: 286082	2016-11-06 19:46:54 +00:00
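The promotion described in r286082 can be pictured at the source level with a small C++ sketch. It is only an illustration of the idea (convert the integer exponent to floating point, then call the ordinary power routine), not the lowering code from the commit; the function names `powiAsPowf` and `powiAsPow` are hypothetical.

```cpp
#include <cmath>

// Hypothetical stand-ins for the promoted calls; not LLVM code.
// Single precision: the integer exponent is promoted to float, and the
// float overload of std::pow plays the role of the `powf` call.
float powiAsPowf(float Base, int Exp) {
  return std::pow(Base, static_cast<float>(Exp));
}

// Double precision: the integer exponent is promoted to double, and the
// double overload of std::pow plays the role of the `pow` call.
double powiAsPow(double Base, int Exp) {
  return std::pow(Base, static_cast<double>(Exp));
}
```

A caller would write `powiAsPowf(x, n)` wherever the integer-exponent builtin was used.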
Krzysztof Parzyszek	f8d38d11b9	Revert r286080: it breaks build bots llvm-svn: 286081	2016-11-06 19:36:09 +00:00
Krzysztof Parzyszek	9e3520c884	[Hexagon] Remove redundant custom selection code The clr/set/toggle-bit instructions (with the bit index given as an immediate operand) had both custom selection code that generated them and selection patterns at the same time. The selection patterns were not used, because the custom selection code was executed first. This patch removes the custom code in favor of the selection patterns. The custom code also handled 64-bit registers with an immediate bit index, and so new patterns were added to implement that. The same was true of the instruction "Rd += asr(Rs, Rt)", except that the custom code did not offer any additional functionality, and was simply removed. llvm-svn: 286080	2016-11-06 19:03:38 +00:00
Krzysztof Parzyszek	c93815ef04	[Hexagon] Round 5 of selection pattern simplifications Remove unnecessary type casts in patterns. llvm-svn: 286079	2016-11-06 18:13:14 +00:00
Krzysztof Parzyszek	f914278f8b	[Hexagon] Round 4 of selection pattern simplifications Give simpler or more meaningful names to pat frags and xforms. llvm-svn: 286078	2016-11-06 18:09:56 +00:00
Krzysztof Parzyszek	846597d081	[Hexagon] Round 3 of selection pattern simplifications Remove unnecessary C++ functions for SDNode transforms. Move more pat frags to files where they are used. llvm-svn: 286077	2016-11-06 18:05:14 +00:00
Krzysztof Parzyszek	84755104b4	[Hexagon] Round 2 of selection pattern simplifications Add pat frags for any-, sign-, and zero-extensions. llvm-svn: 286076	2016-11-06 17:56:48 +00:00
Craig Topper	46de41330c	[AVX-512] Remove intrinsics for 128/256-bit masked variable shift. Instead upgrade them to a select and the older AVX2 intrinsic. llvm-svn: 286073	2016-11-06 16:29:19 +00:00
Craig Topper	af9b3fe752	[AVX-512] Remove intrinsics for 128/256-bit masked shift by immediate. Instead upgrade them to a select and the older SSE/AVX2 intrinsic. llvm-svn: 286072	2016-11-06 16:29:14 +00:00
Craig Topper	c9467ed31e	[AVX-512] Remove intrinsics for 128/256-bit masked shift by single element in xmm. Instead upgrade them to a select and the older SSE/AVX2 intrinsic. llvm-svn: 286070	2016-11-06 16:29:08 +00:00
Simon Pilgrim	b3ad5f7ebf	[X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsElementInsertion. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286067	2016-11-06 14:20:29 +00:00
Craig Topper	5471fc29e4	[AVX-512] Add missing EVEX version of pattern for (v2f64 (extloadv2f32 addr:)) -> VCVTPS2PDZ128rm llvm-svn: 286059	2016-11-06 04:12:52 +00:00
Craig Topper	1162857ec4	[AVX-512] Lower AVX cvtpd2ps intrinsic to ISD::FP_ROUND so it can use EVEX instruction when available. llvm-svn: 286057	2016-11-06 04:12:46 +00:00
Craig Topper	9a4a3af5dd	[AVX-512] Lower SSE/AVX cvtdq2ps intrinsics directly to ISD::SINT_TO_FP so they can use EVEX instructions when available. llvm-svn: 286056	2016-11-06 04:12:42 +00:00
Krzysztof Parzyszek	2839b29f4b	[Hexagon] Relocate pattern-related bits to proper places llvm-svn: 286049	2016-11-05 21:44:50 +00:00
Krzysztof Parzyszek	4b4012a5c9	[Hexagon] Round 1 of selection pattern simplifications Consistently use register class pat frags instead of spelling out the type and class each time. llvm-svn: 286048	2016-11-05 21:02:54 +00:00
Simon Pilgrim	4a9f210412	[X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsBlend. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286045	2016-11-05 18:31:57 +00:00
Simon Pilgrim	725174694a	[X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsZeroOrAnyExtend. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286044	2016-11-05 18:22:13 +00:00
Simon Pilgrim	9f0afc6ae1	[X86][SSE] Reuse zeroable element mask in SSE4A EXTRQ/INSERTQ vector shuffle lowering. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286043	2016-11-05 18:05:13 +00:00
Simon Pilgrim	3cae21960e	[X86][SSE] Reuse zeroable element mask in PSHUFB vector shuffle lowering. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286042	2016-11-05 17:53:27 +00:00
Simon Pilgrim	64a592d0a2	[X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsInsertPS. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286040	2016-11-05 17:27:48 +00:00
Simon Pilgrim	009befbd88	[X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsBitMask. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286039	2016-11-05 17:12:19 +00:00
Simon Pilgrim	1af0fc1103	[X86][SSE] Reuse zeroable element mask instead of regenerating it. NFCI We are repeatedly calling computeZeroableShuffleElements in many shuffle lowering calls for the same shuffle mask/inputs. This is a first step towards reusing the zeroable result, initially just for lowerVectorShuffleAsShift calls. llvm-svn: 286037	2016-11-05 16:40:20 +00:00
Krzysztof Parzyszek	a8d63dc289	[Hexagon] Split all selection patterns into a separate file This is just the basic separation, without any cleanup. Further changes will follow. llvm-svn: 286036	2016-11-05 15:01:38 +00:00
Simon Pilgrim	1b4e1ac966	Strip trailing whitespace. NFCI. llvm-svn: 286034	2016-11-05 14:43:04 +00:00
Krzysztof Parzyszek	b7eb7fc892	[Hexagon] Account for <def,read-undef> when validating moves for predication llvm-svn: 286009	2016-11-04 20:41:03 +00:00
Zvi Rackover	85bc64c734	[X86] Broadcast from memory instructions aren't unfoldable Broadcast from memory instructions should be treated as moves. They can't be unfolded. Fixes pr30693. llvm-svn: 285998	2016-11-04 15:15:19 +00:00
Tom Stellard	2d2d33f1dc	Revert "AMDGPU: Add VI i16 support" This reverts commit r285939 and r285948. These broke some conformance tests. llvm-svn: 285995	2016-11-04 13:06:34 +00:00
Justin Bogner	2c2c6ac7b5	X86: Move a non-null assert to before the pointer is dereferenced llvm-svn: 285975	2016-11-03 23:55:36 +00:00
Chandler Carruth	651f019297	Sink all of the code relying on the MachO MachineModuleInfo to live behind the test that the MachineModuleInfo analysis was actually available and can be used. While the MachO bits may well be reasonable to assume in the darwin assembly printer, the analysis isn't constructively guaranteed anywhere I could find so it seems safest to avoid crashing here. This issue was found with PVS-Studio. Pretty sure the Clang Static Analyzer flags similar issues but we've probably never pointed it at this code effectively. llvm-svn: 285972	2016-11-03 23:33:46 +00:00
Weiming Zhao	962eaaea9c	[Cortex-M0] Atomic lowering Summary: ARMv6m supports fence instructions such as dmb, but not ldrex/strex and similar instructions. So for some atomic loads/stores, LLVM should inline instructions instead of lowering to __sync_ calls. Reviewers: rengolin, efriedma, t.p.northover, jmolloy Subscribers: efriedma, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D26120 llvm-svn: 285969	2016-11-03 21:49:08 +00:00
Tony Jiang	946242b5d2	NFC - Test commit. Delete an empty line at the end of README.txt file. llvm-svn: 285964	2016-11-03 20:32:21 +00:00
Tom Stellard	cc34983181	AMDGPU/SI: Re add VIInstructions.td to unbreak bots This file is unused as of r285939, but we need to keep it around for bots that don't do full rebuilds. We should be able to delete this again in a few days. llvm-svn: 285948	2016-11-03 17:56:46 +00:00
Chandler Carruth	5589aa60c7	Remove a redundant condition found by PVS-Studio. Filed http://llvm.org/PR30897 to teach Clang to warn on this kind of stuff. llvm-svn: 285945	2016-11-03 17:42:02 +00:00
Tom Stellard	2b3379cdff	AMDGPU: Add VI i16 support Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 285939	2016-11-03 17:13:50 +00:00
Chandler Carruth	30e0029904	Delete a dead store found by PVS-Studio. Quite sad we still aren't really using aggressive dead code warnings from Clang that we could potentially use to catch this and so many other things. llvm-svn: 285936	2016-11-03 17:01:38 +00:00
Alexander Timofeev	f867a40bf6	[AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads. This change exploits the fact that LDS reads may be reordered even if they access the same location. Prior to the change, the algorithm stopped as soon as any memory access was encountered between loads that were expected to be merged together. However, a Read-After-Read conflict cannot affect execution correctness. Improves the performance of hcBLAS CGEMM manually loop-unrolled kernels by 44%. Improvement is also expected on any long sequence of reads from LDS. Differential Revision: https://reviews.llvm.org/D25944 llvm-svn: 285919	2016-11-03 14:37:13 +00:00
Zvi Rackover	a455864fdf	Refactor creation of X86ISD::SETCC nodes to a helper function. NFC. llvm-svn: 285917	2016-11-03 14:25:24 +00:00
James Molloy	e7d97368f2	Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently" This reverts commit r285893. It caused (probably) http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/83 . llvm-svn: 285912	2016-11-03 14:08:01 +00:00
James Molloy	b60d8b1987	[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk. For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)). 1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS. 2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS. 3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS). 4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask. 1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win. llvm-svn: 285893	2016-11-03 10:18:20 +00:00
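The contiguity test and four-way case split described in r285893 can be sketched in a few lines of C++. This is only an illustration of the arithmetic, not the instruction-selection code from the commit: the names `MaskKind`, `classifyMask`, and the bit-counting helpers are hypothetical, and the priority given to overlapping cases (for example, a mask of 0x1 both touches the LSB and has popcount 1) is an assumption that simply follows the order of the list in the commit message.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; not taken from the LLVM sources.
enum class MaskKind {
  NotContiguous, // set bits do not form one consecutive run
  TouchesLSB,    // case 1: one LSLS
  TouchesMSB,    // case 2: one LSRS
  SingleBit,     // case 3: one LSLS into the sign bit, test MI/PL
  Interior       // case 4: LSLS followed by LSRS
};

// Portable bit-counting helpers (each returns 32 for clz/ctz of zero).
static unsigned popcount32(uint32_t V) {
  unsigned N = 0;
  for (; V != 0; V &= V - 1)
    ++N;
  return N;
}
static unsigned clz32(uint32_t V) {
  unsigned N = 0;
  for (uint32_t Bit = 0x80000000u; Bit != 0 && (V & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}
static unsigned ctz32(uint32_t V) {
  unsigned N = 0;
  for (uint32_t Bit = 1; Bit != 0 && (V & Bit) == 0; Bit <<= 1)
    ++N;
  return N;
}

MaskKind classifyMask(uint32_t BM) {
  if (BM == 0)
    return MaskKind::NotContiguous;
  unsigned LZ = clz32(BM), TZ = ctz32(BM), PC = popcount32(BM);
  // One consecutive run of set bits: 32 - clz(bm) - ctz(bm) == popcount(bm).
  if (32 - LZ - TZ != PC)
    return MaskKind::NotContiguous;
  if (TZ == 0)
    return MaskKind::TouchesLSB;
  if (LZ == 0)
    return MaskKind::TouchesMSB;
  if (PC == 1)
    return MaskKind::SingleBit;
  return MaskKind::Interior;
}
```

For instance, `classifyMask(0x00000FF0)` falls into the two-shift case, while `classifyMask(0x00000505)` is rejected because its set bits are not consecutive.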
Craig Topper	7b9cc1474f	[AVX-512] Use 'vnot' instead of 'not' in patterns involving vXi1 vectors. This fixes selection of KANDN instructions and allows us to remove an extra set of patterns for KNOT and KXNOR. Reviewers: delena, igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26134 llvm-svn: 285878	2016-11-03 06:04:28 +00:00
Elena Demikhovsky	caaceef4b3	Expandload and Compressstore intrinsics 2 new intrinsics covering AVX-512 compress/expand functionality. This implementation includes syntax, DAG builder, operation lowering and tests. Does not include: handling of illegal data types, codegen prepare pass and the cost model. llvm-svn: 285876	2016-11-03 03:23:55 +00:00
Krzysztof Parzyszek	ead77016d8	[Hexagon] Remove registers coalesced in expand-condsets from live intervals llvm-svn: 285846	2016-11-02 17:59:54 +00:00
Nicolai Haehnle	368972c3b3	AMDGPU: Allow additional implicit operands on MOVRELS instructions Summary: The post-RA scheduler occasionally uses additional implicit operands when the vector implicit operand as a whole is killed, but some subregisters are still live because they are directly referenced later. Unfortunately, this seems incredibly subtle to reproduce. Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test and others. Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25656 llvm-svn: 285835	2016-11-02 17:03:11 +00:00
Malcolm Parsons	06ac79c210	Fix Clang-tidy readability-redundant-string-cstr warnings Reviewers: beanz, lattner, jlebar Subscribers: jholewinski, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D26235 llvm-svn: 285832	2016-11-02 16:43:50 +00:00
Nirav Dave	0a392a8e7f	[ARM][MC] Cleanup ARM Target Assembly Parser Summary: Correctly parse end-of-statement tokens and handle preprocessor end-of-line comments in ARM assembly processor. Reviewers: rnk, majnemer Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D26152 llvm-svn: 285830	2016-11-02 16:22:51 +00:00
Vasileios Kalintiris	e3bb72ea78	[mips] Always run the MipsOptimizePICCall pass. Summary: Remove this pass from addMachineSSAOptimization() and register it unconditionally through addPreRegAlloc(). This pass is required for generating correct PIC calls. Reviewers: sdardis Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26036 llvm-svn: 285814	2016-11-02 15:11:27 +00:00
Joerg Sonnenberger	bef3621ad0	Create the virtual register for the global base in the intersection of GPRC and GPRC_NOR0 (or the 64bit equivalent) and not just the latter. GPRC_NOR0 contains ZERO as alternative meaning of r0 and is therefore not a true subclass of GPRC. llvm-svn: 285813	2016-11-02 15:00:31 +00:00
Aaron Ballman	3ac3a7efff	Removing a switch statement that contains a default label, but no case labels. Silences an MSVC warning; NFC. llvm-svn: 285806	2016-11-02 13:58:57 +00:00
Ulrich Weigand	75d2f1b10d	[SystemZ] Fix compiler warnings introduced by r285574 SystemZAsmParser::parseOperand returns a bool, not an enum. llvm-svn: 285800	2016-11-02 11:32:28 +00:00
Kirill Bobyrev	1f1751182e	[llvm] Fix if-clause -Wmisleading-indentation issue. While bootstrapping Clang with recent `gcc 6.2.0` I found a bug related to misleading indentation. I believe a pair of `{}` was forgotten, especially given the above similar piece of code: ``` if (!RDef || !HII->isPredicable(*RDef)) { Done = coalesceRegisters(RD, RegisterRef(S1)); if (Done) { UpdRegs.insert(RD.Reg); UpdRegs.insert(S1.getReg()); } } ``` Reviewers: kparzysz Differential Revision: https://reviews.llvm.org/D26204 llvm-svn: 285794	2016-11-02 10:00:40 +00:00
Dylan McKay	7549b0a013	[AVR] Add instruction selection lowering code Summary: This adds AVRISelLowering.cpp Reviewers: arsenm, kparzysz Subscribers: llvm-commits, modocache, japaric, wdng, beanz, mgorny Differential Revision: https://reviews.llvm.org/D25034 llvm-svn: 285790	2016-11-02 06:47:40 +00:00
Peter Collingbourne	4e76019e34	Support: Remove MemoryObject and DataStreamer interfaces. These interfaces are no longer used. Differential Revision: https://reviews.llvm.org/D26222 llvm-svn: 285774	2016-11-02 00:08:37 +00:00
Alex Bradbury	6b2cca7f8f	[RISCV] Add bare-bones RISC-V MCTargetDesc This is enough to compile and link but doesn't yet do anything particularly useful. Once an ASM parser and printer are added in the next two patches, the whole thing can be usefully tested. Differential Revision: https://reviews.llvm.org/D23562 llvm-svn: 285770	2016-11-01 23:47:30 +00:00
Alex Bradbury	24d9b13b36	[RISCV 4/10] Add basic RISCV{InstrFormats,InstrInfo,RegisterInfo,}.td For now, only add instruction definitions for basic ALU operations. Our initial target is a working MC layer rather than codegen, so appropriate SelectionDAG patterns will come later. Differential Revision: https://reviews.llvm.org/D23561 llvm-svn: 285769	2016-11-01 23:40:28 +00:00
Matt Arsenault	c507cdb4bc	AMDGPU: Handle CopyToReg in getOperandRegClass llvm-svn: 285768	2016-11-01 23:22:17 +00:00
Matt Arsenault	663ab8c119	AMDGPU: Use brev for materializing SGPR constants This is already done with VGPR immediates and saves 4 bytes. llvm-svn: 285765	2016-11-01 23:14:20 +00:00
Matt Arsenault	3d463193a9	AMDGPU: Default to using scalar mov to materialize immediate This is the conservatively correct way because it's easy to move or replace a scalar immediate. This was incorrect in the case when the register class wasn't known from the static instruction definition, but still needed to be an SGPR. The main example of this is when inlineasm has an SGPR constraint. Also start verifying the register classes of inlineasm operands. llvm-svn: 285762	2016-11-01 22:55:07 +00:00
Matt Arsenault	a6319b82ca	AMDGPU: Stop creating unused virtual registers These are only used in the spill to VMEM path. Move them to the one use. llvm-svn: 285756	2016-11-01 21:58:07 +00:00
Matt Arsenault	2d8c289b4b	AMDGPU: Workaround for instruction size with literals Instructions with a 32-bit base encoding with an optional 32-bit literal encoded after them report their size as 4 for the disassembler. Consider these when computing the MachineInstr size. This fixes problems caused by size estimate consistency in BranchRelaxation. llvm-svn: 285743	2016-11-01 20:42:24 +00:00
Krzysztof Parzyszek	654dc11b79	[Hexagon] Rename operand/predicate names for unshifted integers For example, rename s6Ext to s6_0Ext. The names for shifted integers include the underscore and this will make the naming consistent. It also exposed a few duplicates that were removed. llvm-svn: 285728	2016-11-01 19:02:10 +00:00
Konstantin Zhuravlyov	d971a1123f	[AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32 This will prevent following regression when enabling i16 support (D18049): test/CodeGen/AMDGPU/ctlz.ll test/CodeGen/AMDGPU/ctlz_zero_undef.ll Differential Revision: https://reviews.llvm.org/D25802 llvm-svn: 285716	2016-11-01 17:49:33 +00:00
Alex Bradbury	b2e5472d85	[RISCV] Add stub backend This contains just enough for lib/Target/RISCV to compile. Notably a basic RISCVTargetMachine and RISCVTargetInfo. At this point you can attempt llc -march=riscv32 myinput.ll and will find it fails due to the lack of MCAsmInfo. See http://lists.llvm.org/pipermail/llvm-dev/2016-August/103748.html for further discussion Differential Revision: https://reviews.llvm.org/D23560 llvm-svn: 285712	2016-11-01 17:27:54 +00:00
Tom Stellard	9677b60288	AMDGPU: Fix buildbots broken by r285704 llvm-svn: 285711	2016-11-01 17:20:03 +00:00
Alex Bradbury	58eba09949	[TableGen] Move OperandMatchResultTy enum to MCTargetAsmParser.h As it stands, the OperandMatchResultTy is only included in the generated header if there is custom operand parsing. However, almost all backends make use of MatchOperand_Success and friends from OperandMatchResultTy for e.g. parseRegister. This is a pain when starting an AsmParser for a new backend that doesn't yet have custom operand parsing. Move the enum to MCTargetAsmParser.h. This patch is a prerequisite for D23563 Differential Revision: https://reviews.llvm.org/D23496 llvm-svn: 285705	2016-11-01 16:32:05 +00:00
Tom Stellard	94c21bc088	AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64 I wanted to implement this as a target independent expansion, however when targets say they want to expand FP_TO_FP16 what they actually want is the unsafe math expansion when possible and expansion to a libcall in all other cases. The only way to make this work as a target independent expansion would be to add logic to each target's TargetLowering construction to mark these nodes as Expand when LegalizeDAG can use the unsafe expansion and mark them as LibCall when it cannot. I think this would be possible, but I think it would be too fragile and complex as it would require targets to keep their expansion logic up to date with the code in LegalizeDAG. Reviewers: bogner, ab, t.p.northover, arsenm Subscribers: wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D25999 llvm-svn: 285704	2016-11-01 16:31:48 +00:00
James Molloy	70a3d6df52	[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables [Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment] The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions. It turns out this sequence is so short that it's almost never a loss for performance and is ALWAYS a significant win for code size. TBB example: Before: lsls r0, r0, #2; adr r1, .LJTI0_0; ldr r0, [r0, r1]; mov pc, r0. After: add r0, pc; ldrb r0, [r0, #6]; lsls r0, r0, #1; add pc, r0. => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4. The only case that can increase dynamic instruction count is the TBH case: Before: lsls r0, r4, #2; adr r1, .LJTI0_0; ldr r0, [r0, r1]; mov pc, r0. After: lsls r4, r4, #1; add r4, pc; ldrh r4, [r4, #6]; lsls r4, r4, #1; add pc, r4. => 1 more instruction in prologue. Jump table shrunk by a factor of 2. So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!) llvm-svn: 285690	2016-11-01 13:37:41 +00:00
Valery Pykhtin	8a89d3662a	[AMDGPU] Expand vector mulhu/mulhs Differential revision: https://reviews.llvm.org/D26077 llvm-svn: 285684	2016-11-01 10:26:48 +00:00
Nemanja Ivanovic	e70fa63390	[PowerPC] Implement vector shift builtins - llvm portion This patch corresponds to review https://reviews.llvm.org/D26095. Committing on behalf of Tony Jiang. llvm-svn: 285681	2016-11-01 09:42:32 +00:00
Matt Arsenault	f3dd863031	AMDGPU: Whitespace fixes llvm-svn: 285659	2016-11-01 00:55:14 +00:00
Davide Italiano	51cbe13a3f	[Hexagon] Garbage collect dead code. llvm-svn: 285654	2016-10-31 22:56:56 +00:00
Saleem Abdulrasool	e1aa782bd0	CodeGen: further loosen -O0 CG for WoA division Generate the slowest possible codepath for noopt CodeGen. Even trying to be clever with the negated jump can cause out-of-range jumps. Use a wide branch instead. Although the code is modelled simplistically, the later optimizations would recombine the branching into `cbz` if possible. This re-enables the previous optimization as well as hopefully gives us working code in all cases. Addresses PR30356! llvm-svn: 285649	2016-10-31 22:12:37 +00:00
Justin Lebar	ed1e312f05	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass. Summary: This has been replaced by the NVPTXInferAddressSpaces pass. We've had the new one as the default with the old one accessible via a flag for some months now, and we've had no problems. Reviewers: tra Subscribers: llvm-commits, jholewinski, jingyue, mgorny Differential Revision: https://reviews.llvm.org/D26165 llvm-svn: 285642	2016-10-31 21:51:42 +00:00
Nemanja Ivanovic	60bdfe5a7c	[PPC] add absolute difference altivec instructions and matching intrinsics This patch corresponds to review https://reviews.llvm.org/D26072. Committing on behalf of Sean Fertile. llvm-svn: 285627	2016-10-31 19:47:52 +00:00
Tim Northover	037af52c8b	GlobalISel: allow truncating pointer casts on AArch64. llvm-svn: 285615	2016-10-31 18:31:09 +00:00
Tim Northover	cdf23f1d93	GlobalISel: translate stack protector intrinsics llvm-svn: 285614	2016-10-31 18:30:59 +00:00
Michael Zuckerman	68a5c53616	[x86][inline-asm][AVX512][llvm][PART-2] Introducing "k" and "Yk" constraints for extended inline assembly, enabling use of AVX512 masked vectorized instructions. Commit on behalf of mharoush Extending inline assembly support, compatible with GCC, as follows: "k" constraint hints the compiler to select any of AVX512 k0-k7 registers. "Yk" constraint is a subset of "k" excluding k0, which is not allowed to be used as a mask. Reviewer: 1. rnk Differential Revision: https://reviews.llvm.org/D25062 llvm-svn: 285591	2016-10-31 16:19:58 +00:00
Artem Tamazov	54bfd548aa	[AMDGPU][MC][gfx8] Support 20-bit immediate offset in SMEM instructions. Fixes Bug 30808. Note that passing subtarget information to predicates seems too complicated, so gfx8-specific def smrd_offset_20 introduced. Old gfx6/7-specific def renamed to smrd_offset_8 for clarity. Lit tests updated. Differential Revision: https://reviews.llvm.org/D26085 llvm-svn: 285590	2016-10-31 16:07:39 +00:00
Krzysztof Parzyszek	22586dcb2a	[Hexagon] Don't expand mux instructions with both sources identical llvm-svn: 285588	2016-10-31 15:45:09 +00:00
Ulrich Weigand	2e5e51b3f3	[SystemZ] Rework processor feature definitions and add -mcpu=archX support This patch implements two changes: - Move processor feature definition into a new file SystemZFeatures.td, and provide explicit lists of supported and unsupported features for each level of the z/Architecture. This allows specifying unsupported features in the scheduler definition files for each processor. - Add optional aliases for the -mcpu processor names according to the level of the z/Architecture, for compatibility with other compilers on the platform. The supported aliases are: -mcpu=arch8 equals -mcpu=z10 -mcpu=arch9 equals -mcpu=z196 -mcpu=arch10 equals -mcpu=zEC12 -mcpu=arch11 equals -mcpu=z13 llvm-svn: 285577	2016-10-31 14:33:29 +00:00
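The alias list in r285577 amounts to a small lookup table. A minimal C++ sketch of such a lookup follows; the four alias/processor pairs come from the commit message, but the function name `resolveSystemZCPUAlias` and the pass-through behaviour for unknown names are assumptions made for illustration, not the actual SystemZ implementation.

```cpp
#include <string>

// Hypothetical helper: maps an -mcpu=archX alias to the processor name it
// stands for, and returns any other name unchanged.
std::string resolveSystemZCPUAlias(const std::string &Name) {
  // Alias pairs as listed in the commit message.
  static const struct {
    const char *Alias;
    const char *CPU;
  } Table[] = {
      {"arch8", "z10"},
      {"arch9", "z196"},
      {"arch10", "zEC12"},
      {"arch11", "z13"},
  };
  for (const auto &Entry : Table)
    if (Name == Entry.Alias)
      return Entry.CPU;
  return Name; // Not an alias: pass the name through untouched.
}
```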
Ulrich Weigand	d28be373d4	[SystemZ] Guard LEFR/LFER with FeatureVector The LEFR/LFER pseudos are aliases for vector instructions and should therefore be guarded by FeatureVector. If they aren't, the TableGen scheduler definition checking might complain that there is no data for those pseudos for pre-z13 machines. No functional change intended. llvm-svn: 285576	2016-10-31 14:28:43 +00:00
Ulrich Weigand	d9001301d9	[SystemZ] Correctly diagnose missing features in AsmParser Currently, when using an instruction that is not supported on the currently selected architecture, the LLVM assembler is likely to diagnose an "invalid operand" instead of a "missing feature". This is because many operands require a custom parser in order to be processed correctly, and if an instruction is not available according to the current feature set, the generated parser code will also not detect the associated custom operand parsers. Fixed by temporarily enabling all features while parsing operands. The missing features will then be correctly detected when actually parsing the instruction itself. llvm-svn: 285575	2016-10-31 14:25:05 +00:00
Ulrich Weigand	ec5d779eb8	[SystemZ] Fix encoding of MVCK and .insn ss LLVM currently treats the first operand of MVCK as if it were a regular base+index+displacement address. However, it is in fact a base+displacement combined with a length register field. While the two might look syntactically similar, there are two semantic differences: - %r0 is a valid length register, even though it cannot be used as an index register. - In an expression with just a single register like 0(%rX), the register is treated as base with normal addresses, while it is treated as the length register (with an empty base) for MVCK. Fixed by adding a new operand parser class BDRAddr and reworking the assembler parser to distinguish between address + length register operands and regular addresses. llvm-svn: 285574	2016-10-31 14:21:36 +00:00
Jonas Paulsson	6788ddeac9	[SystemZ] Model 2 VBU units (not 1) in SystemZScheduleZ13.td. NFC. Review: Ulrich Weigand. llvm-svn: 285566	2016-10-31 13:05:48 +00:00
Alexey Bataev	d07c731d86	Improved cost model for FDIV and FSQRT, by Andrew Tischenko There is a bug describing poor cost model for floating point operations: Bug 29083 - [X86][SSE] Improve costs for floating point operations. This patch is the second one in series of patches dealing with cost model. Differential Revision: https://reviews.llvm.org/D25722 llvm-svn: 285564	2016-10-31 12:10:53 +00:00
Craig Topper	d4e580705d	[AVX-512] Add missing patterns for selecting masked vector extracts that started from shuffles. llvm-svn: 285546	2016-10-31 05:55:57 +00:00
Craig Topper	b7781a95fd	[X86] Use intrinsics table for PMADDUBSW and PMADDWD so that we can use the legacy intrinsics to select EVEX encoded instructions when available. This removes a couple tablegen classes that become unused after this change. Another class gained an additional parameter to allow PMADDUBSW to specify a different result type from its input type. llvm-svn: 285515	2016-10-30 06:56:16 +00:00
Craig Topper	bf9e5a16a4	[X86] Don't use loadv2i64 on SSE version of PMULHRSW. Use memopv2i64 instead. This bug was introduced in r285501. llvm-svn: 285510	2016-10-30 00:02:55 +00:00
Craig Topper	defe9ffbb5	[X86] Use intrinsics table for VPMULHRSW intrinsics so that the legacy intrinsics can select EVEX encoded instructions when available. This requires a minor rename of the instructions due to the use of different tablegen classes and how the names are concatenated. llvm-svn: 285501	2016-10-29 18:41:45 +00:00
Elena Demikhovsky	519b4ccd70	Fixed FMA + FNEG combine. Masked form of FMA should be omitted in this optimization. Differential Revision: https://reviews.llvm.org/D25984 llvm-svn: 285492	2016-10-29 08:44:46 +00:00
Matt Arsenault	c88ba36eab	AMDGPU: Use 1/2pi inline imm on VI I'm guessing at how it is supposed to be printed llvm-svn: 285490	2016-10-29 04:05:06 +00:00
Matthias Braun	7d78614ae9	AArch64DeadRegisterDefinitionsPass: Cleanup; NFC - Fix doxygen file comment - reduce indentation in loop - Factor out some common subexpressions - Move independent helper function out of class - Fix Changed flag (this is not strictly NFC but a bugfix, but the flag seems ignored anyway) llvm-svn: 285488	2016-10-29 01:03:41 +00:00
Tom Stellard	6695ba0440	AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions Summary: Flat instructions can return out of order, so we always need to wait for all the outstanding flat operations. Reviewers: tony-tye, arsenm Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D25998 llvm-svn: 285479	2016-10-28 23:53:48 +00:00
Matt Arsenault	4e9c1e3a79	AMDGPU: Fix instruction flags for s_endpgm Set isReturn, remove hasSideEffects. Also remove hasCtrlDep, I'm not really sure what that does. llvm-svn: 285476	2016-10-28 23:00:38 +00:00
Matt Arsenault	7b6475568d	AMDGPU: Add definitions for scalar store instructions Also add glc bit to the scalar loads since they exist on VI and change the caching behavior. This currently has an assembler bug where the glc bit is incorrectly accepted on SI/CI which do not have it. llvm-svn: 285463	2016-10-28 21:55:15 +00:00
Matt Arsenault	4b6a6cc8e9	AMDGPU: Rename glc operand type While trying to add the glc bit to SMEM instructions on VI with the new refactoring I ran into some kind of shadowing problem for the glc operand when using the pseudoinstruction as a multiclass parameter. Everywhere that currently uses it defines the operand to have the same name as its type, i.e. glc:$glc which works. For some reason now it conflicts, and it ends up evaluating to the wrong thing. For the real encoding classes, let Inst{16} = !if(ps.has_glc, glc, ?); was not being evaluated and still visible in the Inst initializer in the expanded td file. In other cases I got a different error about an illegal operand where this was using { 0 } initializer from the bits<1> glc initializer instead of evaluating it as false in the if. For consistency all of the operand types should probably be capitalized to avoid conflicting with the variable names unless somebody has a better idea of how to fix this. llvm-svn: 285462	2016-10-28 21:55:08 +00:00
Justin Lebar	f0a80ba385	[NVPTX] Compute 'rem' using the result of 'div', if possible. Summary: In isel, transform Num % Den into Num - (Num / Den) * Den if the result of Num / Den is already available. Reviewers: tra Subscribers: hfinkel, llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26090 llvm-svn: 285461	2016-10-28 21:44:00 +00:00
Matt Arsenault	4eae301995	AMDGPU: Diagnose using too many SGPRs This is possible when using inline asm. llvm-svn: 285447	2016-10-28 20:31:47 +00:00
Matt Arsenault	08906a3c62	AMDGPU: Fix using incorrect private resource with no allocation It's possible to have a use of the private resource descriptor or scratch wave offset registers even though there are no allocated stack objects. This would result in continuing to use the maximum number of reserved registers. This could go over the number of SGPRs available on VI, or violate the SGPR limit requested by the function attributes. llvm-svn: 285435	2016-10-28 19:43:31 +00:00
Nemanja Ivanovic	e28a0fc72a	Implement vector count leading/trailing bytes with zero lsb and vector parity builtins - llvm portion This patch corresponds to review https://reviews.llvm.org/D26003. Committing on behalf of Zaara Syeda. llvm-svn: 285434	2016-10-28 19:38:24 +00:00
Krzysztof Parzyszek	87a47be039	[Hexagon] Maintain kill flags through splitting in expand-condsets Do not use LiveIntervals to recalculate kills, because that cannot be done accurately without implicit uses on predicated instructions. llvm-svn: 285409	2016-10-28 15:50:22 +00:00
Tom Stellard	aea899e2a0	AMDGPU/SI: Handle hazard with s_rfe_b64 Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25638 llvm-svn: 285368	2016-10-27 23:50:21 +00:00
Tom Stellard	04051b5fad	AMDGPU/SI: Handle hazard with sgpr lane selects for v_{read,write}lane Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25637 llvm-svn: 285367	2016-10-27 23:42:29 +00:00
Tom Stellard	6b9c1be4ea	AMDGPU/SI: Fix unused variable warning on non-debug builds llvm-svn: 285363	2016-10-27 23:28:03 +00:00
Tom Stellard	b133fbb9a4	AMDGPU/SI: Handle hazard with > 8 byte VMEM stores Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25577 llvm-svn: 285359	2016-10-27 23:05:31 +00:00
Tom Stellard	30d30824b4	AMDGPU/SI: Handle s_setreg hazard in GCNHazardRecognizer Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25528 llvm-svn: 285338	2016-10-27 20:39:09 +00:00
Simon Pilgrim	d23219b9ee	[X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targets llvm-svn: 285329	2016-10-27 18:32:06 +00:00
Simon Pilgrim	47c1ff7a43	[X86][AVX512DQ] Move v2i64 and v4i64 MUL lowering to tablegen As suggested by @igorb on D26011 llvm-svn: 285313	2016-10-27 17:07:40 +00:00
Saleem Abdulrasool	075d2e3c59	ARM: ensure that the Windows DBZ check is in range The Windows ARM target expects the compiler to emit a division-by-zero check. The check would use the form of: cmp r?, #0 cbz .Ltrap b .Lbody .Lbody: ... .Ltrap: udf #249 @ __brkdiv0 This works great most of the time. However, if the body of the function is greater than 127 bytes, the branch target limitation of cbz becomes an issue. This occurs in the unoptimized code generation cases sometimes (like in compiler-rt). Since this is a matter of correctness, possibly pay a small penalty instead. We now form this slightly differently: cbnz .Lbody udf #249 @ __brkdiv0 .Lbody: ... The positive case is through the branch instead of being the next instruction. However, because of the basic block layout, the negated branch is going to be a short distance always (2 bytes away, after the inserted __brkdiv0). The new t__brkdiv0 instruction is required to explicitly mark the instruction as a terminator as the generic UDF instruction is not a terminator. Addresses PR30532! llvm-svn: 285312	2016-10-27 16:59:22 +00:00
Vasileios Kalintiris	cfb005a0ee	[mips] Do not allow -opt-bisect-limit to skip the PIC call optimization pass. r282428 added the MipsOptimizePICCall as an opt-in pass that can be skipped when using the -opt-bisect-limit option. However, this pass is needed because it generates code that conforms to the o32 ABI specification by using the $t9 register for PIC calls with JALR instructions. This bug was exposed by the fact that skipFunction() also checks for the "optnone" attribute. This caused functions with that attribute to break the requirements of the o32 ABI. llvm-svn: 285305	2016-10-27 15:50:36 +00:00
Simon Pilgrim	820e1326d7	[X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64 With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq). Updated cost table accordingly. Differential Revision: https://reviews.llvm.org/D26011 llvm-svn: 285304	2016-10-27 15:27:00 +00:00
Krzysztof Parzyszek	046da74699	[Hexagon] Do not expand ISD::SELECT for HVX vectors llvm-svn: 285297	2016-10-27 14:30:16 +00:00
Sam Parker	e7d9505c08	[ARM] Predicate UMAAL selection on hasDSP. UMAAL is a DSP instruction and it is not available on thumbv7m (Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong CHECK prefix in longMAC.ll test. Patch by Vadzim Dambrouski. Differential Revision: https://reviews.llvm.org/D25890 llvm-svn: 285278	2016-10-27 09:47:10 +00:00
Dylan McKay	dd680cc753	[AVR] Generate all of the TableGen files we need This enables generation of all of the TableGen files that are used downstream. llvm-svn: 285274	2016-10-27 08:20:47 +00:00
Nicolai Haehnle	7b0e25b7ad	AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due register dependencies Summary: When finding a match for a merge and collecting the instructions that must be moved, keep in mind that the instruction we merge might actually use one of the defs that are being moved. Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect]. The fact that the ds_read in the test case is not eliminated suggests that there might be another problem related to alias analysis, but that's a separate problem: this pass should still work correctly even when earlier optimization passes missed something or were disabled. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25829 llvm-svn: 285273	2016-10-27 08:15:07 +00:00
Dylan McKay	00009d4824	[AVR] Compile the disassembler This also updates references of 'TheAVRTarget' to the new 'getTheAVRTarget()' method. llvm-svn: 285272	2016-10-27 08:09:15 +00:00
Dylan McKay	ec47065795	[AVR] Add AVRISelDAGToDAG.cpp Summary: This pulls the AVR instruction selector in-tree. Reviewers: arsenm, kparzysz Subscribers: llvm-commits, wdng, beanz, japaric, mgorny Differential Revision: https://reviews.llvm.org/D25278 llvm-svn: 285270	2016-10-27 07:03:47 +00:00
Dylan McKay	6eaa4e4bcc	[AVR] Add the machine code emitter Reviewers: arsenm, kparzysz Subscribers: wdng, beanz, japaric, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D25388 llvm-svn: 285269	2016-10-27 06:56:46 +00:00
Nemanja Ivanovic	32b5fed639	[PowerPC] - No SExt/ZExt needed for count trailing zeros This patch corresponds to review: https://reviews.llvm.org/D25896 It just eliminates the redundant ZExt after a count trailing zeros instruction. llvm-svn: 285267	2016-10-27 05:17:58 +00:00
Evandro Menezes	ca8370396a	[AArch64] Create feature set for Samsung Exynos-M2 Since Exynos-M2 improved the FP square root unit a bit over the one in Exynos-M1, it does not benefit from using the Newton series for such operations. llvm-svn: 285246	2016-10-26 22:06:20 +00:00
Tim Northover	a9cc385664	ARM: don't rely on push/pop reglists being in order when folding SP adjust. It would be a very nice invariant to rely on, but unfortunately it doesn't necessarily hold (and the causes of mis-sorted reglists appear to be quite varied) so to be robust the frame lowering code can't assume that the first register in the list is also the first one that actually gets pushed. Should fix an issue where we were turning something like: push {r8, r4, r7, lr} sub sp, #24 into nonsense like: push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr} llvm-svn: 285232	2016-10-26 20:01:00 +00:00
Nemanja Ivanovic	0f45998bc6	[PowerPC] Implement vec_insert_exp builtins - llvm portion This revision corresponds to review: https://reviews.llvm.org/D25957. Committing on behalf of Zaara Syeda. llvm-svn: 285225	2016-10-26 19:03:40 +00:00
Chad Rosier	0c621fda0d	[AArch64] Avoid materializing constant 1 when generating cneg instructions. Instead of cmp w0, #1 orr w8, wzr, #0x1 cneg w0, w8, ne we now generate cmp w0, #1 csinv w0, w0, wzr, eq PR28965 llvm-svn: 285217	2016-10-26 18:15:32 +00:00
Dan Gohman	68a423bf84	[WebAssembly] Update the README.txt. Update the README.txt with newer information, add a link to the Emscripten page explaining the current easiest way to use the LLVM wasm backend, and mention that other ways of using the LLVM wasm backend are in development. llvm-svn: 285215	2016-10-26 17:44:09 +00:00
Yaxun Liu	94add85adb	AMDGPU: Refactor processor definition to use ISA version features Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend. Refactor processor definition to use ISA version features. Fixed ISA version for stoney. Based on Laurent Morichetti's patch. Differential Revision: https://reviews.llvm.org/D25919 llvm-svn: 285210	2016-10-26 16:37:56 +00:00
Matt Arsenault	39787bdcbb	Reapply "AMDGPU: Don't use offen if it is 0" This reverts r283003 llvm-svn: 285203	2016-10-26 15:08:16 +00:00
Matt Arsenault	1110f14b42	AMDGPU: Fix counting si_mask_branch as 4 bytes llvm-svn: 285202	2016-10-26 14:53:54 +00:00
Tom Stellard	f8e6eaff6e	AMDGPU/SI: Don't emit multi-dword flat memory ops when they might access scratch Summary: A single flat memory operation that might access the scratch buffer can only access MaxPrivateElementSize bytes. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25788 llvm-svn: 285198	2016-10-26 14:38:47 +00:00
Zvi Rackover	aa3402b41e	[X86] AVX512 fallback for floating-point scalar selects Summary: In the case of 'select i1, f32, f32' or 'select i1, f64, f64', prefer lowering to masked-moves over branches. Fixes pr30561 Reviewers: igorb, aymanmus, delena Differential Revision: https://reviews.llvm.org/D25310 llvm-svn: 285196	2016-10-26 14:12:46 +00:00
Craig Topper	812d3d30ae	[AVX-512] Add scalar vfmsub/vfnmsub mask3 intrinsics Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions. Reviewers: igorb, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25933 llvm-svn: 285173	2016-10-26 04:59:58 +00:00
James Y Knight	2e64b8b79e	[Sparc] Don't overlap variable-sized allocas with other stack variables. On SparcV8, it was previously the case that a variable-sized alloca might overlap by 4-bytes the last fixed stack variable, effectively because 92 (the number of bytes reserved for the register spill area) != 96 (the offset added to SP for where to start a DYNAMIC_STACKALLOC). It's not as simple as changing 96 to 92, because variables that should be 8-byte aligned would then be misaligned. For now, simply increase the allocation size by 8 bytes for each dynamic allocation -- wastes space, but at least doesn't overlap. As the large comment says, doing this more efficiently will require larger changes in llvm. Also adds some test cases showing that we continue to not support dynamic stack allocation and over-alignment in the same function. llvm-svn: 285131	2016-10-25 22:13:28 +00:00
Evandro Menezes	7696dc0685	[AArch64] Adjust the cost model for Exynos M1. Modify the maximum jump table size. llvm-svn: 285106	2016-10-25 20:05:42 +00:00
Dan Gohman	f50d964bdb	[WebAssembly] Add immediate fields to call_indirect and memory operators. call_indirect, grow_memory, and current_memory now have immediate operands in the 0xd binary encoding. llvm-svn: 285085	2016-10-25 16:55:52 +00:00
Ulrich Weigand	7bdb485e18	[SystemZ] Do not use LOC(G) for volatile loads It is not safe to use LOAD ON CONDITION to implement access to a memory location marked "volatile", since the architecture leaves it unspecified whether or not an access happens if the condition is false. The current code already appears to care about that: def LOC : CondUnaryRSY<"loc", 0xEBF2, nonvolatile_load, GR32, 4>; Unfortunately, that "nonvolatile_load" operator is simply ignored by the CondUnaryRSY class, and there was no test to catch it. llvm-svn: 285077	2016-10-25 15:39:15 +00:00
Simon Pilgrim	5c3c9707c3	[X86][SSE] Add support for (V)PMOVSX* constant folding We already have (V)PMOVZX* combining support, this is the beginning of handling (V)PMOVSX* similarly - other combines in combineVSZext can be generalized in future patches. This unearthed an interesting bug in that we were generating illegal build vectors on 32-bit targets - it was proving difficult to create a test for it from PMOVZX, but it fired immediately with PMOVSX. I've created a more general form of the existing getConstVector to handle these cases - ideally this should be handled in non-target-specific code but I couldn't find an equivalent. Differential Revision: https://reviews.llvm.org/D25874 llvm-svn: 285072	2016-10-25 14:29:25 +00:00
Benjamin Kramer	7df3043db3	Fix an unused warning in WebAssemblyInstPrinter with NDEBUG. Patch by Sam McCall! Differential Revision: https://reviews.llvm.org/D25934 llvm-svn: 285055	2016-10-25 09:08:50 +00:00
Craig Topper	01e4667e02	[AVX-512] Add support for creating SIGN_EXTEND_VECTOR_INREG and ZERO_EXTEND_VECTOR_INREG for 512-bit vectors to support vpmovzxbq and vpmovsxbq. Summary: The one tricky thing about this is that the sign/zero_extend_inreg uses v64i8 as an input type which isn't legal without BWI support. Though the vpmovsxbq and vpmovzxbq instructions themselves don't require BWI. To support this we need to add custom lowering for ZERO_EXTEND_VECTOR_INREG with v64i8 input. This can mostly reuse the existing sign extend code with a couple checks for sign extend vs zero extend added. Reviewers: delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25594 llvm-svn: 285053	2016-10-25 04:00:29 +00:00
Matthias Braun	c8440dddb2	MachineInstrBundle: Pass iterators to getBundle(Start|End); NFC This is a function to go backwards in a block to find the first instruction in a bundle, so iterator is a more natural choice for parameter/return rather than a reference to a MachineInstruction. llvm-svn: 285051	2016-10-25 02:55:17 +00:00
Dan Gohman	48abaa9c74	[WebAssembly] Reorder load/store operands to match binary encoding. The p2align operand of a load/store is encoded before the offset operand; reorder the MachineInstr operands accordingly. llvm-svn: 285044	2016-10-25 00:17:11 +00:00
Dan Gohman	3acb187d95	[WebAssembly] Implement more WebAssembly binary encoding. This changes locals to be declared by the emitLocal hook in WebAssemblyTargetStreamer, rather than with an instruction. After exploring the infrastructure in LLVM more, this seems to make more sense since declaring locals doesn't use an encoded opcode. This also adds more 0xd opcodes, type encodings, and miscellaneous binary encoding bits. llvm-svn: 285040	2016-10-24 23:27:49 +00:00
Matthias Braun	8b38ffaa98	CodeGen/Passes: Pass MachineFunction as functor arg; NFC Passing a MachineFunction as argument is more natural and avoids an unnecessary round-trip through the logic determining the correct Subtarget because MachineFunction already has a reference anyway. llvm-svn: 285039	2016-10-24 23:23:02 +00:00
Matthias Braun	fc371558a0	Use MachineInstr::mop_iterator instead of MIOperands; NFC (Const)?MIOperands is equivalent to the C++ style MachineInstr::mop_iterator. Use the latter for consistency except for a few callers of MIOperands::analyzePhysReg(). llvm-svn: 285029	2016-10-24 21:36:43 +00:00
Dan Gohman	5d3391f859	[WebAssembly] Fix a broken URL. llvm-svn: 285017	2016-10-24 20:35:17 +00:00
Dan Gohman	4becc58587	[WebAssembly] Define the `end` opcode value. CFGStackify differentiates between END_LOOP and END_BLOCK, but wasm itself doesn't. For now, just use the same opcode for both. llvm-svn: 285016	2016-10-24 20:32:04 +00:00
Dan Gohman	c968297b95	[WebAssembly] Update opcode values according to recent spec changes. This corresponds to the "0xd" opcode renumbering. llvm-svn: 285014	2016-10-24 20:21:49 +00:00
Dan Gohman	4fc4e42dea	[WebAssembly] Add an option to make get_local/set_local explicit. This patch adds a pass, controlled by an option and off by default for now, for making implicit get_local/set_local explicit. This simplifies emitting wasm with MC. Differential Revision: https://reviews.llvm.org/D25836 llvm-svn: 285009	2016-10-24 19:49:43 +00:00
Peter Collingbourne	6733564e5a	Target: Change various section classifiers in TargetLoweringObjectFile to take a GlobalObject. These functions are about classifying a global which will actually be emitted, so it does not make sense for them to take a GlobalValue which may for example be an alias. Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to look through aliases before using TargetLoweringObjectFile interfaces. These are functional changes but all appear to be bug fixes. Differential Revision: https://reviews.llvm.org/D25917 llvm-svn: 285006	2016-10-24 19:23:39 +00:00
Krzysztof Parzyszek	eb6172404d	Revert r284972 and remove other defaulted copy/move constructors/= David Blaikie pointed out that we get them for free without having to write anything. llvm-svn: 284996	2016-10-24 17:40:46 +00:00
Ehsan Amiri	c90b02cf50	[PPC] Generate positive FP zero using xor insn instead of loading from constant area https://reviews.llvm.org/D23614 Currently we load +0.0 from constant area. That can change to be generated using XOR instruction. llvm-svn: 284995	2016-10-24 17:31:09 +00:00
Eli Friedman	b37864b58d	Revert r284580+r284917. ("Synthesize TBB/TBH instructions") The optimization has correctness issues, so reverting for now to fix tests on thumb1 targets. llvm-svn: 284993	2016-10-24 17:20:50 +00:00
Evandro Menezes	eff2bd9d4f	[AArch64] Optionally use the Newton series for reciprocal estimation Add support for estimating the square root or its reciprocal and division or reciprocal using the combiner generic Newton series. Differential revision: https://reviews.llvm.org/D25291 llvm-svn: 284986	2016-10-24 16:14:58 +00:00
Ehsan Amiri	1f31e9157d	[PPC] Better codegen for AND, ANY_EXT, SRL sequence https://reviews.llvm.org/D24924 This improves the code generated for a sequence of AND, ANY_EXT, SRL instructions. This is a targeted fix for this special pattern. The pattern is generated by the target-independent DAG combiner and so a more general fix may not be necessary. If we come across other similar cases, some ideas for handling it are discussed on the code review. llvm-svn: 284983	2016-10-24 15:46:58 +00:00
Nicolai Haehnle	a785209bc2	AMDGPU: Fix Two Address problems with v_movreld Summary: The v_movreld machine instruction is used with three operands that are in a sense tied to each other (the explicit VGPR_32 def and the implicit VGPR_NN def and use). There is no way to express that using the currently available operand bits, and indeed there are cases where the Two Address instructions pass does the wrong thing. This patch introduces a new set of pseudo instructions that are identical in intended semantics as v_movreld, but they only have two tied operands. Having to add a new set of pseudo instructions is admittedly annoying, but it's a fairly straightforward and solid approach. The only alternative I see is to try to teach the Two Address instructions pass about Three Address instructions, and I'm afraid that's trickier and is going to end up more fragile. Note that v_movrels does not suffer from this problem, and so this patch does not touch it. This fixes several GL45-CTS.shaders.indexing.* tests. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25633 llvm-svn: 284980	2016-10-24 14:56:02 +00:00
Pavel Labath	51c454c1a9	Remove unused #includes of TimeValue.h. NFC. llvm-svn: 284975	2016-10-24 14:00:26 +00:00
Joel Jones	504bf334b0	AArch64 ILP32 relocations for assembly and ELF Summary: Add relocations for AArch64 ILP32. Includes: - Addition of definitions for R_AARCH32_* - Definition of new -target-abi: ilp32 - Definition of data layout string - Tests for added relocations. Not comprehensive, but matches existing tests for 64-bit. Renames "CHECK-OBJ" to "CHECK-OBJ-LP64". - Tests for llvm-readobj Reviewers: zatrazz, peter.smith, echristo, t.p.northover Subscribers: aemerson, rengolin, mehdi_amini Differential Revision: https://reviews.llvm.org/D25159 llvm-svn: 284973	2016-10-24 13:37:13 +00:00
Krzysztof Parzyszek	f74683f930	[RDF] Add default move constructors/assignment operators llvm-svn: 284972	2016-10-24 13:15:20 +00:00
Simon Dardis	9c34854833	[mips] synci microMIPS instruction definition. Add synci to the microMIPS instruction definitions, mark the MIPS sync & synci as not being part of microMIPS. This does not cover the sync instruction alias, as that will be handled with a different patch. Add sync to the valid tests for microMIPS. Reviewers: vkalintiris Differential Revision: https://reviews.llvm.org/D25795 llvm-svn: 284962	2016-10-24 10:23:59 +00:00
Craig Topper	8ec5c7326d	[AVX-512] Remove masked pmin/pmax intrinsics and autoupgrade to native IR. Clang patch to replace 512-bit vector and 64-bit element versions with native IR will follow. llvm-svn: 284955	2016-10-24 04:04:16 +00:00
Simon Pilgrim	6ac1e98b09	[X86][SSE] Add SSE41/AVX1 costs for vector shifts. We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results. llvm-svn: 284939	2016-10-23 16:49:04 +00:00
Simon Pilgrim	96ef0c1103	Use APInt::isAllOnesValue instead of popcnt. NFCI. More obvious implementation and faster too. llvm-svn: 284937	2016-10-23 15:09:44 +00:00
Dylan McKay	479a13c0aa	[AVR] Add the machine code disassembler This adds a super basic implementation of a machine code disassembler. It doesn't support any operands with custom encoding. llvm-svn: 284930	2016-10-22 23:57:59 +00:00
Simon Pilgrim	d3829c89bc	[X86][AVX512VL] Added support for combining target 256-bit shuffles to AVX512VL VPERMV3 llvm-svn: 284922	2016-10-22 20:15:39 +00:00
Simon Pilgrim	56c0524f0f	[X86][AVX512] Added support for combining target shuffles to AVX512 VPERMV3 llvm-svn: 284921	2016-10-22 19:53:59 +00:00
James Molloy	2bae8640d7	[ARM] Fix crash in ConstantIslands tPCRelJT may not be the first instruction in a block. Check that instead of dereferencing a broken iterator. llvm-svn: 284917	2016-10-22 09:58:37 +00:00
Craig Topper	b084c90a18	[X86] Add support for printing shuffle comments for VALIGN instructions. llvm-svn: 284915	2016-10-22 06:51:56 +00:00
Craig Topper	7b2b8db438	[X86] Add support for lowering v4i64 and v8i64 shuffles directly to PALIGNR. I think shuffle combine can figure it out later, but we should try to get it right up front. llvm-svn: 284914	2016-10-22 06:51:52 +00:00
Craig Topper	9f374533e3	[X86] Remove unnecessary AVX2 check that was already covered by an assertion earlier in the function. NFC llvm-svn: 284913	2016-10-22 06:51:49 +00:00
Craig Topper	bea5cb5491	[X86] Remove 128-bit lane handling from the main loop of matchVectorShuffleAsByteRotate. Instead check for is128LaneRepeatedSuffleMask before the loop and just loop over the repeated mask. I plan to use the loop to support VALIGND/Q shuffles so this makes it easier to reuse. llvm-svn: 284912	2016-10-22 06:51:44 +00:00
Simon Pilgrim	0d376bcbf0	[X86][SSE] Use getConstVector helper for VPERMV mask generation. NFCI. llvm-svn: 284911	2016-10-22 06:18:36 +00:00
Konstantin Zhuravlyov	fda33eaf0c	[AMDGPU] Perform uchar to float combine for ISD::SINT_TO_FP This will prevent the following regression when enabling i16 support (D18049): test/CodeGen/AMDGPU/cvt_f32_ubyte.ll Differential Revision: https://reviews.llvm.org/D25805 llvm-svn: 284891	2016-10-21 22:10:03 +00:00
Tom Stellard	6c7dd980e4	AMDGPU/SI: Fix crash caused by r284267 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25782 llvm-svn: 284875	2016-10-21 20:25:11 +00:00
Peter Collingbourne	e9bd49824d	X86: Improve BT instruction selection for 64-bit values. If a 64-bit value is tested against a bit which is known to be in the range [0..31) (modulo 64), we can use the 32-bit BT instruction, which has a slightly shorter encoding. Differential Revision: https://reviews.llvm.org/D25862 llvm-svn: 284864	2016-10-21 19:57:55 +00:00
Simon Pilgrim	ab48872313	[X86][AVX512BWVL] Added support for lowering v16i16 shuffles to AVX512BWVL vpermw llvm-svn: 284863	2016-10-21 19:54:38 +00:00
Simon Pilgrim	da814cba0d	[X86][AVX512BWVL] Added support for combining target v16i16 shuffles to AVX512BWVL vpermw llvm-svn: 284860	2016-10-21 19:40:29 +00:00
Simon Pilgrim	0109bf116f	[X86][AVX512] Added support for combining target shuffles to AVX512 vpermpd/vpermq/vpermps/vpermd/vpermw llvm-svn: 284858	2016-10-21 19:18:09 +00:00
Krzysztof Parzyszek	6e7fa99d3a	[RDF] Use RegisterId typedef more consistently, NFC llvm-svn: 284857	2016-10-21 19:12:13 +00:00
Krzysztof Parzyszek	b71085b547	[Hexagon] Handle spills of partially defined double vector registers After register allocation it is possible to have a spill of a register that is only partially defined. That in itself is fine, but creates a problem for double vector registers. Stores of such registers are pseudo instructions that are expanded into pairs of individual vector stores, and in case of a partially defined source, one of the stores may use an entirely undefined register. To avoid this, track the defined parts and only generate actual stores for those. llvm-svn: 284841	2016-10-21 16:38:29 +00:00
Derek Schuff	6f69783f1f	[WebAssembly] Fix for 0xc call_indirect changes Summary: Need to reorder the operands to have the callee as the last argument. Adds a pseudo-instruction, and a pass to lower it into a real call_indirect. This is the first of two options for how to fix the problem. Reviewers: dschuff, sunfish Subscribers: jfb, beanz, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D25708 llvm-svn: 284840	2016-10-21 16:38:07 +00:00
Abderrazek Zaafrani	9daf8110c8	Set the vectorizer MaxInterleaveFactor for Exynos. llvm-svn: 284839	2016-10-21 16:28:27 +00:00
Simon Pilgrim	2d96daa885	[X86] Use DAG::getBuildVector helper wrapper where possible. NFCI. llvm-svn: 284835	2016-10-21 16:07:51 +00:00
Abderrazek Zaafrani	9f382f53d1	Test commit llvm-svn: 284832	2016-10-21 15:24:08 +00:00
Artem Tamazov	751985a757	[AMDGPU][mc] Fix ds_min/max[_rtn]_f32 - extra source operand removed. Fixes Bug 28215. Lit tests updated. Differential Revision: https://reviews.llvm.org/D25837 llvm-svn: 284825	2016-10-21 14:49:22 +00:00
Simon Pilgrim	c98d99a600	[X86][AVX2] Begun generalizing lowering to VPERMD/VPERMPS in preparation for AVX512 support. llvm-svn: 284823	2016-10-21 13:00:47 +00:00
Simon Pilgrim	32b06235da	[X86][AVX512] Add mask/maskz writemask support to subvector broadcast shuffle decode comments llvm-svn: 284821	2016-10-21 12:14:24 +00:00
Bjorn Pettersson	9fcd605d1e	[AArch64] Corrected spill size for DDD register class. NFCI Summary: The spill size was incorrectly set to 196 bits, which isn't a multiple of 8. This problem was detected when experimenting with asserts that the spill size should be a multiple of the byte size. New corrected value for the spill size is set to 192 bits. Note that tablegen (RegisterInfoEmitter) will divide the size set in the RegisterClass definition by 8. So this change should not have any impact on the tablegen output (trunc(192/8) == trunc(196/8) == 24 bytes). Reviewers: t.p.northover Subscribers: llvm-commits, aemerson, rengolin Differential Revision: https://reviews.llvm.org/D25818 llvm-svn: 284814	2016-10-21 09:53:42 +00:00
Michael Kuperstein	b2443ed62b	[X86] Enable interleaved memory access by default This lets the loop vectorizer generate interleaved memory accesses on x86. Differential Revision: https://reviews.llvm.org/D25350 llvm-svn: 284779	2016-10-20 21:04:31 +00:00
Konstantin Zhuravlyov	521e5ef4ce	[AMDGPU] Make note record name a static const member of target streamer Differential Revision: https://reviews.llvm.org/D25746 llvm-svn: 284760	2016-10-20 18:22:36 +00:00
Konstantin Zhuravlyov	08326b6256	[AMDGPU] Emit constant address space data in .rodata section and use relocations instead of fixups (amdhsa only) Differential Revision: https://reviews.llvm.org/D25693 llvm-svn: 284759	2016-10-20 18:12:38 +00:00
Simon Pilgrim	365be4f95c	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 bit integer vectors We weren't checking for uniform const costs before the general cost, resulting in very high estimates. llvm-svn: 284755	2016-10-20 18:00:35 +00:00
Sanjay Patel	0051efcf97	[Target] remove TargetRecip class; 2nd try This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs caused by faulty usage of StringRef. This version also renames a pair of functions: getRecipEstimateDivEnabled() getRecipEstimateSqrtEnabled() as suggested by Eric Christopher. original commit msg: [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284746	2016-10-20 16:55:45 +00:00

40345 Commits