llvm-project

Commit Graph

Author	SHA1	Message	Date
Sean Fertile	e1ca561b0a	Add a blank line for a test commit. llvm-svn: 286550	2016-11-11 02:33:17 +00:00
Stanislav Mekhanoshin	6fc8a1cdaa	Revert "[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies" This reverts commit r286171, it breaks piglit test fs-discard-exit-2 llvm-svn: 286530	2016-11-11 00:22:34 +00:00
Joerg Sonnenberger	618d475c03	Fix requirements. llvm-svn: 286527	2016-11-10 23:53:45 +00:00
Matthias Braun	d67fa9dc6a	Timer: Remove group-less NamedRegionTimer constructor. The NamedRegionTimer initializer without a group name puts the Timer into the "Misc" group and is (nearly) unused. Remove it. The only user of this constructor appears to be the HexagonGenInsert pass, which creates a counter without group to count the complete execution time of that pass, however since every pass gets a counter by the PassManager anyway this should be unnecessary. Also removed the pointless TimerGroup there. Differential Revision: https://reviews.llvm.org/D25582 llvm-svn: 286524	2016-11-10 23:36:44 +00:00
Evandro Menezes	21f9ce1a0d	[DAG Combiner] Fix the native computation of the Newton series for reciprocals The generic infrastructure to compute the Newton series for reciprocal and reciprocal square root was conceived to allow a target to compute the series itself. However, the original code did not properly consider this condition if returned by a target. This patch addresses the issues to allow a target to compute the series on its own. Differential revision: https://reviews.llvm.org/D22975 llvm-svn: 286523	2016-11-10 23:31:06 +00:00
Yaxun Liu	d6fbe65040	AMDGPU: Emit runtime metadata as a note element in .note section Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata. However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html). This patch lets the AMDGPU backend emit runtime metadata as a note element in .note section. Differential Revision: https://reviews.llvm.org/D25781 llvm-svn: 286502	2016-11-10 21:18:49 +00:00
Davide Italiano	a22ddddfea	[Target] Rename X86/ARM Assembly printer to reflect reality. This shows up a lot profiling LTO testcases with -time-passes, so better have a non confusing name. llvm-svn: 286488	2016-11-10 18:39:31 +00:00
Tom Stellard	115a61560e	AMDGPU: Add VI i16 support Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 286464	2016-11-10 16:02:37 +00:00
Oliver Stannard	18ca2adf2d	[ARM] Thumb2 LDR (literal) should accept PC as the destination The version of this instruction with the .w suffix already correctly accepts this, but the alias without the .w did not. Differential Revision: https://reviews.llvm.org/D26499 llvm-svn: 286446	2016-11-10 13:20:41 +00:00
Craig Topper	bd298c37d1	[AVX-512] Allow legacy cvtpd2dq intrinsics to select EVEX encoded instruction when available. llvm-svn: 286435	2016-11-10 07:47:17 +00:00
Craig Topper	e0845d8e8c	[AVX-512][X86] Convert avx_cvtt_ps2dq_256 and sse2_cvttps2dq intrinsics to ISD::FP_TO_SINT in the intrinsics table and delete patterns. While nearby also move CVTDQ2PS patterns into their instructions. This allows these intrinsics to also use EVEX instructions. llvm-svn: 286434	2016-11-10 07:24:52 +00:00
Craig Topper	f37b9b9b5f	[X86] Convert int_x86_avx_cvtt_pd2dq_256 to fp_to_sint using the intrinsics table. Removes extra patterns and allows legacy intrinsic to select EVEX encoded instructions when available. llvm-svn: 286433	2016-11-10 06:45:39 +00:00
Craig Topper	2afed2c790	[X86] Move some custom patterns into the currently empty pattern of their corresponding instructions. NFC llvm-svn: 286432	2016-11-10 06:45:37 +00:00
Craig Topper	1d2e74f030	[X86] Remove some patterns still referencing int_x86_sse2_cvttpd2dq that should have been removed in r286344. NFC llvm-svn: 286431	2016-11-10 06:45:34 +00:00
Peter Collingbourne	32ab3a817d	Re-apply r286384, "X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate.", with a fix for 32-bit x86. Teach X86InstrInfo::analyzeCompare() not to crash on CMP and SUB instructions that take a global address operand. llvm-svn: 286420	2016-11-09 23:53:43 +00:00
Tim Northover	a9105be437	GlobalISel: translate invoke and landingpad instructions Pretty bare-bones support for exception handling (no weird MSVC stuff, no SjLj etc), but it should get things going. llvm-svn: 286407	2016-11-09 22:39:54 +00:00
Peter Collingbourne	a9cadeddd4	Revert r286384, "X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate." Suspected to be the cause of a sanitizer-windows bot failure: Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420 llvm-svn: 286385	2016-11-09 18:17:50 +00:00
Peter Collingbourne	4c15db45e4	X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate. A relocatable immediate is either an immediate operand or an operand that can be relocated by the linker to an immediate, such as a regular symbol in non-PIC code. Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands of type "imm32_su". Remove a number of now-redundant patterns. Differential Revision: https://reviews.llvm.org/D25812 llvm-svn: 286384	2016-11-09 17:51:58 +00:00
Krzysztof Parzyszek	f817efbbb0	[Hexagon] Silence "sometimes uninitialized" warning in HexagonCopyToCombine llvm-svn: 286383	2016-11-09 17:50:46 +00:00
Krzysztof Parzyszek	a540997ce4	[Hexagon] Separate Hexagon subreg indices for different register classes For pairs of 32-bit registers: isub_lo, isub_hi. For pairs of vector registers: vsub_lo, vsub_hi. Add generic subreg indices: ps_sub_lo, ps_sub_hi, and a function HexagonRegisterInfo::getHexagonSubRegIndex(RegClass, GenericSubreg) that returns the appropriate subreg index for RegClass. llvm-svn: 286377	2016-11-09 16:19:08 +00:00
Krzysztof Parzyszek	601d7eb11a	[Hexagon] Eliminate Insert4 pseudo-instruction, use combines instead llvm-svn: 286368	2016-11-09 14:16:29 +00:00
Jonas Paulsson	e127fe7083	[SystemZ] A few fixes in scheduler files. Review: U Weigand llvm-svn: 286362	2016-11-09 12:47:57 +00:00
Jonas Paulsson	28f29487b9	[MachineScheduler] Comments fixing. The name/comment of the third argument to the ScheduleDAGMI constructor is RemoveKillFlags and not IsPostRA. Only the comments are changed. Review: A Trick llvm-svn: 286350	2016-11-09 09:59:27 +00:00
Craig Topper	f334ac19ad	[AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32 This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions. If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result. It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result. Differential Revision: https://reviews.llvm.org/D26331 llvm-svn: 286345	2016-11-09 07:48:51 +00:00
Craig Topper	731bf9c5d6	[X86] Lower AVX512 and SSE intrinsics for CVTTPD2DQ to X86ISD::CVTTPD2DQ. Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves. Reviewers: delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26330 llvm-svn: 286344	2016-11-09 07:31:32 +00:00
Craig Topper	28e3dfc02b	[AVX-512] Use alignedstore256 in patterns that look for stores of the lower 256-bits of a 512-bit vector to use a 256-bit aligned store. Previously we were only checking for 16 byte alignment instead of 32 byte alignment. Fixes PR30947. llvm-svn: 286342	2016-11-09 05:31:57 +00:00
Craig Topper	5c842be9a0	[AVX-512] Make VBMI instruction set enabling imply that the BWI instruction set is also enabled. Summary: This is needed to make the v64i8 and v32i16 types legal for the 512-bit VBMI instructions. Fixes PR30912. Reviewers: delena, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26322 llvm-svn: 286339	2016-11-09 04:50:48 +00:00
Matthias Braun	c53cbbb1d1	AArch64DeadRegisterDefinitionsPass: Fix Changed flag Fix a bug in the calculation of the changed flag introduced in r285488. llvm-svn: 286293	2016-11-08 20:59:03 +00:00
Ulrich Weigand	05effca2d8	[SystemZ] Add missing FP extension instructions This completes assembler / disassembler support for all BFP instructions provided by the floating-point extensions facility. The instructions added here are not currently used for codegen. llvm-svn: 286285	2016-11-08 20:18:41 +00:00
Ulrich Weigand	4006e09d1d	[SystemZ] Add program mask and addressing mode instructions Add several instructions that operate on the program mask or the addressing mode. These are not really needed for code generation under Linux, but are provided for completeness for the assembler/disassembler. llvm-svn: 286284	2016-11-08 20:17:02 +00:00
Ulrich Weigand	fffc7110d6	[SystemZ] Model access registers as LLVM registers Add the 16 access registers as LLVM registers. This allows removing a lot of special cases in the assembler and disassembler where we were handling access registers; this can all just use the generic register code now. Also add a bunch of instructions to operate on access registers, for assembler/disassembler use only. No change in code generation intended. llvm-svn: 286283	2016-11-08 20:15:26 +00:00
Dan Gohman	e81021a5cb	[WebAssembly] Convert stackified IMPLICIT_DEF into constant 0. Since IMPLICIT_DEF instructions are omitted in the output, when the output of an IMPLICIT_DEF instruction is stackified, the resulting register lacks an explicit push, leading to a push/pop mismatch. Fix this by converting such IMPLICIT_DEFs into CONST_I32 0 instructions so that they have explicit pushes. llvm-svn: 286274	2016-11-08 19:40:38 +00:00
Ulrich Weigand	3d07d45089	[SystemZ] Always use semantic instruction classes Define a couple of additional semantic classes and use them throughout the .td files to make them more consistent and more easily readable. No functional change. llvm-svn: 286268	2016-11-08 18:37:48 +00:00
Ulrich Weigand	bfcfa0e207	[SystemZ] Refactor InstRR* instruction format patterns This changes the InstRR (and related) patterns to no longer automatically add an "r" at the end of the mnemonic. This makes the .td files more obviously understandable, and also allows using the patterns for those few instructions that do not follow the *r scheme. Also add some more sub-formats of the RRF format class, to match operand names and sequence from the PoP better. No functional change. llvm-svn: 286267	2016-11-08 18:36:31 +00:00
Ulrich Weigand	37bd451a55	[SystemZ] Rename some Inst* instruction format classes Now that we've added instruction format subclasses like InstRIb, it makes sense to rename the old InstRI to InstRIa. Similar for InstRX, InstRXY, InstRS, InstRSY, and InstSS. No functional change. llvm-svn: 286266	2016-11-08 18:32:50 +00:00
Nirav Dave	e833c6c61a	[MC][AArch64] Cleanup end-of-line parsing in AArch64 AsmParser. Reviewers: t.p.northover, rengolin Subscribers: llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D26309 llvm-svn: 286265	2016-11-08 18:31:04 +00:00
Ulrich Weigand	d2148caffc	[SystemZ] Refactor branch and conditional instruction patterns Rework patterns for branches, call & return instructions, compare-and-branch, compare-and-trap, and conditional move instructions. In particular, simplify creation of patterns for the extended opcodes of instructions that take a CC mask. Also, use semantic instruction classes for all the instructions instead of open-coding them in SystemZInstrInfo.td. Adds a couple of the basic branch instructions (that are unused for codegen) for the assembler/disassembler. llvm-svn: 286263	2016-11-08 18:30:50 +00:00
Tim Northover	5f7dea85c2	GlobalISel: support selecting fpext/fptrunc instructions on AArch64. llvm-svn: 286253	2016-11-08 17:44:07 +00:00
Anton Korobeynikov	243a4700ce	Fix PR27500: on MSP430 the branch destination offset is measured in words, not bytes. Summary: In addition, the branch instructions will have proper BB destinations, not offsets, like before. Reviewers: asl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23718 llvm-svn: 286252	2016-11-08 17:19:59 +00:00
Simon Pilgrim	d02c55204b	[VectorLegalizer] Expansion of CTLZ using CTPOP when possible This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful. Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233	2016-11-08 14:10:28 +00:00
Roger Ferrer Ibanez	80c0f33c29	[AArch64] Fix incorrect CSEL node created Under -enable-unsafe-fp-math, SELECT_CC lowering in AArch64 transforms floating point comparisons of the form "a == 0.0 ? 0.0 : x" to "a == 0.0 ? a : x". But it incorrectly assumes that 'x' and 'a' have the same type which can lead to a wrong CSEL node that crashes later due to nonsensical copies. Differential Revision: https://reviews.llvm.org/D26394 llvm-svn: 286231	2016-11-08 13:34:41 +00:00
Tim Northover	9ac0eba672	GlobalISel: support selecting G_SELECT on AArch64. llvm-svn: 286185	2016-11-08 00:45:29 +00:00
Tim Northover	7d88da6a46	GlobalISel: constrain PHI registers on AArch64. Self-referencing PHI nodes need their destination operands to be constrained because nothing else is likely to do so. For now we just pick a register class naively. Patch mostly by Ahmed again. llvm-svn: 286183	2016-11-08 00:34:06 +00:00
Stanislav Mekhanoshin	92e01ee90b	[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies Codegen prepare sinks comparisons close to a user if we have only one register for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions. Changed the backend to report that we have many condition registers. That way the IR LICM pass hoists an invariant comparison out of a loop and codegen prepare does not sink it. With that done, a condition is calculated in one block and used in another. The current behavior is to store a workitem's condition in a VGPR using v_cndmask and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue, forward propagation of a v_cmp 64-bit result to a user is implemented. An additional side effect is that we may consume fewer VGPRs at the cost of more SGPRs when multiple conditions must be held, which is a clear win in most cases. llvm-svn: 286171	2016-11-07 23:04:50 +00:00
Sanjin Sijaric	6f020d91a1	[AArch64] Transfer memory operands when lowering vector load/store intrinsics Summary: Some vector loads and stores generated from AArch64 intrinsics alias each other unnecessarily, preventing better scheduling. We just need to transfer memory operands during lowering. Reviewers: mcrosier, t.p.northover, jmolloy Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D26313 llvm-svn: 286168	2016-11-07 22:39:02 +00:00
Derek Schuff	0d41b7b3f3	[WebAssembly] Emit a BasePointer when we have overly-aligned stack objects Because we shift the stack pointer by an unknown amount, we need an additional pointer. In the case where we have variable-size objects as well, we can't reuse the frame pointer, thus three pointers. Patch by Jacob Gravelle Differential Revision: https://reviews.llvm.org/D26263 llvm-svn: 286160	2016-11-07 22:00:48 +00:00
Davide Italiano	5df6066ec1	[AArch64] Remove dead store. Found by gcc7. llvm-svn: 286137	2016-11-07 19:11:25 +00:00
Matt Arsenault	f530e8b3f0	AMDGPU: Remove unnecessary and on conditional branch The comment explaining why this was necessary is incorrect in its description of v_cmp's behavior for inactive workitems. llvm-svn: 286134	2016-11-07 19:09:33 +00:00
Matt Arsenault	52f14ec596	AMDGPU: Preserve vcc undef flags when inverting branch If the branch was on a read-undef of vcc, passes that used analyzeBranch to invert the branch condition wouldn't preserve the undef flag resulting in a verifier error. Fixes verifier failures in a future commit. Also fix verifier error when inserting copy for vccz corruption bug. llvm-svn: 286133	2016-11-07 19:09:27 +00:00
Matt Arsenault	2ae5653072	AMDGPU: Try to fix (non-clang?) bot builds llvm-svn: 286120	2016-11-07 16:52:50 +00:00
Matt Arsenault	314cbf7a3b	AMDGPU: Refactor copyPhysReg Separate the subregister splitting logic to re-use later. llvm-svn: 286118	2016-11-07 16:39:22 +00:00
Jonas Paulsson	4f0509fab3	[SystemZ] Correct the SchedModel regarding vector unit / instructions. * Use a generic vector unit to model the issue unit more accurately. * Update some vector instructions that actually use the vector unit for more than one cycle. Review: Ulrich Weigand llvm-svn: 286112	2016-11-07 15:45:06 +00:00
Amara Emerson	614b44bbe9	This patch adds support for 16 bit floating point registers to the inline asm register selection on AArch64. Without this patch, register allocation for the example below fails. define half @test(half %a1, half %a2) #0 { entry: %0 = tail call half asm "sqrshl ${0:h}, ${1:h}, ${2:h}", "=w,w,w" (half %a1, half %a2) #1 ret half %0 } Patch by Florian Hahn. Differential Revision: https://reviews.llvm.org/D25080 llvm-svn: 286111	2016-11-07 15:42:12 +00:00
Chad Rosier	d6daac4746	[AArch64] Removed the narrow load merging code in the ld/st optimizer. This feature has been disabled for some time now, so remove cruft. Differential Revision: https://reviews.llvm.org/D26248 llvm-svn: 286110	2016-11-07 15:27:22 +00:00
Jonas Paulsson	818431a61a	[SystemZ] Fixes in SchedModels for older subtargets. IssueWidth updated to reflect the capacity of the issue unit correctly. Correct number of FX and LS units modelled (2, was 1). Review: Ulrich Weigand llvm-svn: 286109	2016-11-07 14:47:25 +00:00
James Molloy	b03e0879fc	[Thumb1] Move padding earlier when synthesizing TBBs off of the PC When the base register (register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding. If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset. In the meantime, it's correct and only a slight amount worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, as we know the number of instructions in the sequence is always 4. llvm-svn: 286107	2016-11-07 13:38:21 +00:00
Dylan McKay	c988b334b6	[AVR] Enable the ISel, frame analyzer, and alloca passes llvm-svn: 286095	2016-11-07 06:02:55 +00:00
Craig Topper	b110e04851	[AVX-512] Remove masked pmovzx/pmovsx builtins and autoupgrade them to selects and native zext/sext. This mostly reuses earlier autoupgrade support for the sse and avx equivalents. Just needed to add the code to add the select. llvm-svn: 286092	2016-11-07 02:12:57 +00:00
Craig Topper	7e545335d6	[AVX-512] Remove 128/256 masked pshufb intrinsics. Autoupgrade them to legacy intrinsics and a select. llvm-svn: 286089	2016-11-07 00:13:39 +00:00
Krzysztof Parzyszek	39d14f3bc3	Reapply r286080 with a phony change in Hexagon's CMakeLists.txt Cmake has not recognized that Hexagon.td has a new dependency in HexagonPatterns.td. All changes to that file were not visible to the build bots. llvm-svn: 286084	2016-11-06 20:55:57 +00:00
Saleem Abdulrasool	804e12eeb5	ARM: lower fpowi appropriately for Windows ARM This handles the last case of the builtin function calls that we would generate code which differed from Microsoft's ABI. Rather than generating a call to `__pow{d,s}i2` we now promote the parameter to a float or double and invoke `powf` or `pow` instead. Addresses PR30825! llvm-svn: 286082	2016-11-06 19:46:54 +00:00
Krzysztof Parzyszek	f8d38d11b9	Revert r286080: it breaks build bots llvm-svn: 286081	2016-11-06 19:36:09 +00:00
Krzysztof Parzyszek	9e3520c884	[Hexagon] Remove redundant custom selection code The clr/set/toggle-bit instructions (with the bit index given as an immediate operand) had both, custom selection code that generated them, and selection patterns at the same time. The selection patterns were not used, because the custom selection code was executed first. This patch removes the custom code in favor of the selection patterns. The custom code handled 64-bit registers as well with an immediate bit index, and so new patterns were added to implement that. It was also the same case for the instruction "Rd += asr(Rs, Rt)", except that the custom code did not offer any additional functionality, and was simply removed. llvm-svn: 286080	2016-11-06 19:03:38 +00:00
Krzysztof Parzyszek	c93815ef04	[Hexagon] Round 5 of selection pattern simplifications Remove unnecessary type casts in patterns. llvm-svn: 286079	2016-11-06 18:13:14 +00:00
Krzysztof Parzyszek	f914278f8b	[Hexagon] Round 4 of selection pattern simplifications Give simpler or more meaningful names to pat frags and xforms. llvm-svn: 286078	2016-11-06 18:09:56 +00:00
Krzysztof Parzyszek	846597d081	[Hexagon] Round 3 of selection pattern simplifications Remove unnecessary C++ functions for SDNode transforms. Move more pat frags to files where they are used. llvm-svn: 286077	2016-11-06 18:05:14 +00:00
Krzysztof Parzyszek	84755104b4	[Hexagon] Round 2 of selection pattern simplifications Add pat frags for any-, sign-, and zero-extensions. llvm-svn: 286076	2016-11-06 17:56:48 +00:00
Craig Topper	46de41330c	[AVX-512] Remove intrinsics for 128/256-bit masked variable shift. Instead upgrade them to a select and the older AVX2 intrinsic. llvm-svn: 286073	2016-11-06 16:29:19 +00:00
Craig Topper	af9b3fe752	[AVX-512] Remove intrinsics for 128/256-bit masked shift by immediate. Instead upgrade them to a select and the older SSE/AVX2 intrinsic. llvm-svn: 286072	2016-11-06 16:29:14 +00:00
Craig Topper	c9467ed31e	[AVX-512] Remove intrinsics for 128/256-bit masked shift by single element in xmm. Instead upgrade them to a select and the older SSE/AVX2 intrinsic. llvm-svn: 286070	2016-11-06 16:29:08 +00:00
Simon Pilgrim	b3ad5f7ebf	[X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsElementInsertion. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286067	2016-11-06 14:20:29 +00:00
Craig Topper	5471fc29e4	[AVX-512] Add missing EVEX version of pattern for (v2f64 (extloadv2f32 addr:)) -> VCVTPS2PDZ128rm llvm-svn: 286059	2016-11-06 04:12:52 +00:00
Craig Topper	1162857ec4	[AVX-512] Lower AVX cvtpd2ps intrinsic to ISD::FP_ROUND so it can use EVEX instruction when available. llvm-svn: 286057	2016-11-06 04:12:46 +00:00
Craig Topper	9a4a3af5dd	[AVX-512] Lower SSE/AVX cvtdq2ps intrinsics directly to ISD::SINT_TO_FP so they can use EVEX instructions when available. llvm-svn: 286056	2016-11-06 04:12:42 +00:00
Krzysztof Parzyszek	2839b29f4b	[Hexagon] Relocate pattern-related bits to proper places llvm-svn: 286049	2016-11-05 21:44:50 +00:00
Krzysztof Parzyszek	4b4012a5c9	[Hexagon] Round 1 of selection pattern simplifications Consistently use register class pat frags instead of spelling out the type and class each time. llvm-svn: 286048	2016-11-05 21:02:54 +00:00
Simon Pilgrim	4a9f210412	[X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsBlend. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286045	2016-11-05 18:31:57 +00:00
Simon Pilgrim	725174694a	[X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsZeroOrAnyExtend. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286044	2016-11-05 18:22:13 +00:00
Simon Pilgrim	9f0afc6ae1	[X86][SSE] Reuse zeroable element mask in SSE4A EXTRQ/INSERTQ vector shuffle lowering. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286043	2016-11-05 18:05:13 +00:00
Simon Pilgrim	3cae21960e	[X86][SSE] Reuse zeroable element mask in PSHUFB vector shuffle lowering. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286042	2016-11-05 17:53:27 +00:00
Simon Pilgrim	64a592d0a2	[X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsInsertPS. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286040	2016-11-05 17:27:48 +00:00
Simon Pilgrim	009befbd88	[X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsBitMask. NFCI Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available. llvm-svn: 286039	2016-11-05 17:12:19 +00:00
Simon Pilgrim	1af0fc1103	[X86][SSE] Reuse zeroable element mask instead of regenerating it. NFCI We are repeatedly calling computeZeroableShuffleElements in many shuffle lowering calls for the same shuffle mask/inputs. This is a first step towards reusing the zeroable result, initially just for lowerVectorShuffleAsShift calls. llvm-svn: 286037	2016-11-05 16:40:20 +00:00
Krzysztof Parzyszek	a8d63dc289	[Hexagon] Split all selection patterns into a separate file This is just the basic separation, without any cleanup. Further changes will follow. llvm-svn: 286036	2016-11-05 15:01:38 +00:00
Simon Pilgrim	1b4e1ac966	Strip trailing whitespace. NFCI. llvm-svn: 286034	2016-11-05 14:43:04 +00:00
Krzysztof Parzyszek	b7eb7fc892	[Hexagon] Account for <def,read-undef> when validating moves for predication llvm-svn: 286009	2016-11-04 20:41:03 +00:00
Zvi Rackover	85bc64c734	[X86] Broadcast from memory instructions aren't unfoldable Broadcast from memory instructions should be treated as moves. They can't be unfolded. Fixes pr30693. llvm-svn: 285998	2016-11-04 15:15:19 +00:00
Tom Stellard	2d2d33f1dc	Revert "AMDGPU: Add VI i16 support" This reverts commit r285939 and r285948. These broke some conformance tests. llvm-svn: 285995	2016-11-04 13:06:34 +00:00
Justin Bogner	2c2c6ac7b5	X86: Move a non-null assert to before the pointer is dereferenced llvm-svn: 285975	2016-11-03 23:55:36 +00:00
Chandler Carruth	651f019297	Sink all of the code relying on the MachO MachineModuleInfo to live behind the test that the MachineModuleInfo analysis was actually available and can be used. While the MachO bits may well be reasonable to assume in the darwin assembly printer, the analysis isn't constructively guaranteed anywhere I could find so it seems safest to avoid crashing here. This issue was found with PVS-Studio. Pretty sure the Clang Static Analyzer flags similar issues but we've probably never pointed it at this code effectively. llvm-svn: 285972	2016-11-03 23:33:46 +00:00
Weiming Zhao	962eaaea9c	[Cortex-M0] Atomic lowering Summary: ARMv6m supports fence instructions such as dmb, but not ldrex/strex etc. So for some atomic loads/stores, LLVM should inline instructions instead of lowering to __sync_ calls. Reviewers: rengolin, efriedma, t.p.northover, jmolloy Subscribers: efriedma, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D26120 llvm-svn: 285969	2016-11-03 21:49:08 +00:00
Tony Jiang	946242b5d2	NFC - Test commit. Delete an empty line at the end of README.txt file. llvm-svn: 285964	2016-11-03 20:32:21 +00:00
Tom Stellard	cc34983181	AMDGPU/SI: Re add VIInstructions.td to unbreak bots This file is unused as of r285939, but we need to keep it around for bots that don't do full rebuilds. We should be able to delete this again in a few days. llvm-svn: 285948	2016-11-03 17:56:46 +00:00
Chandler Carruth	5589aa60c7	Remove a redundant condition found by PVS-Studio. Filed http://llvm.org/PR30897 to teach Clang to warn on this kind of stuff. llvm-svn: 285945	2016-11-03 17:42:02 +00:00
Tom Stellard	2b3379cdff	AMDGPU: Add VI i16 support Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 285939	2016-11-03 17:13:50 +00:00
Chandler Carruth	30e0029904	Delete a dead store found by PVS-Studio. Quite sad we still aren't really using aggressive dead code warnings from Clang that we could potentially use to catch this and so many other things. llvm-svn: 285936	2016-11-03 17:01:38 +00:00
Alexander Timofeev	f867a40bf6	[AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads. The change exploits the fact that LDS reads may be reordered even if they access the same location. Prior to the change, the algorithm stopped as soon as any memory access was encountered between loads that are expected to be merged together, although a Read-After-Read conflict cannot affect execution correctness. Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%. An improvement is also expected on any massive sequences of reads from LDS. Differential Revision: https://reviews.llvm.org/D25944 llvm-svn: 285919	2016-11-03 14:37:13 +00:00
Zvi Rackover	a455864fdf	Refactor creation of X86ISD::SETCC nodes to a helper function. NFC. llvm-svn: 285917	2016-11-03 14:25:24 +00:00
James Molloy	e7d97368f2	Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently" This reverts commit r285893. It caused (probably) http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/83 . llvm-svn: 285912	2016-11-03 14:08:01 +00:00
James Molloy	b60d8b1987	[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk. For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)). 1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS. 2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS. 3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS). 4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask. 1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win. llvm-svn: 285893	2016-11-03 10:18:20 +00:00
Craig Topper	7b9cc1474f	[AVX-512] Use 'vnot' instead of 'not' in patterns involving vXi1 vectors. This fixes selection of KANDN instructions and allows us to remove an extra set of patterns for KNOT and KXNOR. Reviewers: delena, igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26134 llvm-svn: 285878	2016-11-03 06:04:28 +00:00
Elena Demikhovsky	caaceef4b3	Expandload and Compressstore intrinsics 2 new intrinsics covering AVX-512 compress/expand functionality. This implementation includes syntax, DAG builder, operation lowering and tests. Does not include: handling of illegal data types, codegen prepare pass and the cost model. llvm-svn: 285876	2016-11-03 03:23:55 +00:00
Krzysztof Parzyszek	ead77016d8	[Hexagon] Remove registers coalesced in expand-condsets from live intervals llvm-svn: 285846	2016-11-02 17:59:54 +00:00
Nicolai Haehnle	368972c3b3	AMDGPU: Allow additional implicit operands on MOVRELS instructions Summary: The post-RA scheduler occasionally uses additional implicit operands when the vector implicit operand as a whole is killed, but some subregisters are still live because they are directly referenced later. Unfortunately, this seems incredibly subtle to reproduce. Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test and others. Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25656 llvm-svn: 285835	2016-11-02 17:03:11 +00:00
Malcolm Parsons	06ac79c210	Fix Clang-tidy readability-redundant-string-cstr warnings Reviewers: beanz, lattner, jlebar Subscribers: jholewinski, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D26235 llvm-svn: 285832	2016-11-02 16:43:50 +00:00
Nirav Dave	0a392a8e7f	[ARM][MC] Cleanup ARM Target Assembly Parser Summary: Correctly parse end-of-statement tokens and handle preprocessor end-of-line comments in ARM assembly processor. Reviewers: rnk, majnemer Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D26152 llvm-svn: 285830	2016-11-02 16:22:51 +00:00
Vasileios Kalintiris	e3bb72ea78	[mips] Always run the MipsOptimizePICCall pass. Summary: Remove this pass from addMachineSSAOptimization() and register it unconditionally through addPreRegAlloc(). This pass is required for generating correct PIC calls. Reviewers: sdardis Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26036 llvm-svn: 285814	2016-11-02 15:11:27 +00:00
Joerg Sonnenberger	bef3621ad0	Create the virtual register for the global base in the intersection of GPRC and GPRC_NOR0 (or the 64bit equivalent) and not just the latter. GPRC_NOR0 contains ZERO as alternative meaning of r0 and is therefore not a true subclass of GPRC. llvm-svn: 285813	2016-11-02 15:00:31 +00:00
Aaron Ballman	3ac3a7efff	Removing a switch statement that contains a default label, but no case labels. Silences an MSVC warning; NFC. llvm-svn: 285806	2016-11-02 13:58:57 +00:00
Ulrich Weigand	75d2f1b10d	[SystemZ] Fix compiler warnings introduced by r285574 SystemZAsmParser::parseOperand returns a bool, not an enum. llvm-svn: 285800	2016-11-02 11:32:28 +00:00
Kirill Bobyrev	1f1751182e	[llvm] Fix if-clause -Wmisleading-indentation issue. While bootstrapping Clang with recent `gcc 6.2.0` I found a bug related to misleading indentation. I believe a pair of `{}` was forgotten, especially given the similar piece of code above: ``` if (!RDef || !HII->isPredicable(*RDef)) { Done = coalesceRegisters(RD, RegisterRef(S1)); if (Done) { UpdRegs.insert(RD.Reg); UpdRegs.insert(S1.getReg()); } } ``` Reviewers: kparzysz Differential Revision: https://reviews.llvm.org/D26204 llvm-svn: 285794	2016-11-02 10:00:40 +00:00
Dylan McKay	7549b0a013	[AVR] Add instruction selection lowering code Summary: This adds AVRISelLowering.cpp Reviewers: arsenm, kparzysz Subscribers: llvm-commits, modocache, japaric, wdng, beanz, mgorny Differential Revision: https://reviews.llvm.org/D25034 llvm-svn: 285790	2016-11-02 06:47:40 +00:00
Peter Collingbourne	4e76019e34	Support: Remove MemoryObject and DataStreamer interfaces. These interfaces are no longer used. Differential Revision: https://reviews.llvm.org/D26222 llvm-svn: 285774	2016-11-02 00:08:37 +00:00
Alex Bradbury	6b2cca7f8f	[RISCV] Add bare-bones RISC-V MCTargetDesc This is enough to compile and link but doesn't yet do anything particularly useful. Once an ASM parser and printer are added in the next two patches, the whole thing can be usefully tested. Differential Revision: https://reviews.llvm.org/D23562 llvm-svn: 285770	2016-11-01 23:47:30 +00:00
Alex Bradbury	24d9b13b36	[RISCV 4/10] Add basic RISCV{InstrFormats,InstrInfo,RegisterInfo,}.td For now, only add instruction definitions for basic ALU operations. Our initial target is a working MC layer rather than codegen, so appropriate SelectionDAG patterns will come later. Differential Revision: https://reviews.llvm.org/D23561 llvm-svn: 285769	2016-11-01 23:40:28 +00:00
Matt Arsenault	c507cdb4bc	AMDGPU: Handle CopyToReg in getOperandRegClass llvm-svn: 285768	2016-11-01 23:22:17 +00:00
Matt Arsenault	663ab8c119	AMDGPU: Use brev for materializing SGPR constants This is already done with VGPR immediates and saves 4 bytes. llvm-svn: 285765	2016-11-01 23:14:20 +00:00
Matt Arsenault	3d463193a9	AMDGPU: Default to using scalar mov to materialize immediate This is the conservatively correct way because it's easy to move or replace a scalar immediate. This was incorrect in the case when the register class wasn't known from the static instruction definition, but still needed to be an SGPR. The main example of this is inlineasm has an SGPR constraint. Also start verifying the register classes of inlineasm operands. llvm-svn: 285762	2016-11-01 22:55:07 +00:00
Matt Arsenault	a6319b82ca	AMDGPU: Stop creating unused virtual registers These are only used in the spill to VMEM path. Move them to the one use. llvm-svn: 285756	2016-11-01 21:58:07 +00:00
Matt Arsenault	2d8c289b4b	AMDGPU: Workaround for instruction size with literals Instructions with a 32-bit base encoding with an optional 32-bit literal encoded after them report their size as 4 for the disassembler. Consider these when computing the MachineInstr size. This fixes problems caused by size estimate consistency in BranchRelaxation. llvm-svn: 285743	2016-11-01 20:42:24 +00:00
Krzysztof Parzyszek	654dc11b79	[Hexagon] Rename operand/predicate names for unshifted integers For example, rename s6Ext to s6_0Ext. The names for shifted integers include the underscore and this will make the naming consistent. It also exposed a few duplicates that were removed. llvm-svn: 285728	2016-11-01 19:02:10 +00:00
Konstantin Zhuravlyov	d971a1123f	[AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32 This will prevent the following regressions when enabling i16 support (D18049): test/CodeGen/AMDGPU/ctlz.ll test/CodeGen/AMDGPU/ctlz_zero_undef.ll Differential Revision: https://reviews.llvm.org/D25802 llvm-svn: 285716	2016-11-01 17:49:33 +00:00
Alex Bradbury	b2e5472d85	[RISCV] Add stub backend This contains just enough for lib/Target/RISCV to compile. Notably a basic RISCVTargetMachine and RISCVTargetInfo. At this point you can attempt llc -march=riscv32 myinput.ll and will find it fails due to the lack of MCAsmInfo. See http://lists.llvm.org/pipermail/llvm-dev/2016-August/103748.html for further discussion Differential Revision: https://reviews.llvm.org/D23560 llvm-svn: 285712	2016-11-01 17:27:54 +00:00
Tom Stellard	9677b60288	AMDGPU: Fix buildbots broken by r285704 llvm-svn: 285711	2016-11-01 17:20:03 +00:00
Alex Bradbury	58eba09949	[TableGen] Move OperandMatchResultTy enum to MCTargetAsmParser.h As it stands, the OperandMatchResultTy is only included in the generated header if there is custom operand parsing. However, almost all backends make use of MatchOperand_Success and friends from OperandMatchResultTy for e.g. parseRegister. This is a pain when starting an AsmParser for a new backend that doesn't yet have custom operand parsing. Move the enum to MCTargetAsmParser.h. This patch is a prerequisite for D23563 Differential Revision: https://reviews.llvm.org/D23496 llvm-svn: 285705	2016-11-01 16:32:05 +00:00
Tom Stellard	94c21bc088	AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64 I wanted to implement this as a target-independent expansion; however, when targets say they want to expand FP_TO_FP16, what they actually want is the unsafe math expansion when possible and expansion to a libcall in all other cases. The only way to make this work as a target-independent expansion would be to add logic to each target's TargetLowering construction to mark these nodes as Expand when LegalizeDAG can use the unsafe expansion and mark them as LibCall when it cannot. I think this would be possible, but I think it would be too fragile and complex as it would require targets to keep their expansion logic up to date with the code in LegalizeDAG. Reviewers: bogner, ab, t.p.northover, arsenm Subscribers: wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D25999 llvm-svn: 285704	2016-11-01 16:31:48 +00:00
James Molloy	70a3d6df52	[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables [Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment] The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions. It turns out this sequence is so short that it's almost never a loss for performance and is ALWAYS a significant win for code size. TBB example: Before: lsls r0, r0, #2; adr r1, .LJTI0_0; ldr r0, [r0, r1]; mov pc, r0. After: add r0, pc; ldrb r0, [r0, #6]; lsls r0, r0, #1; add pc, r0. => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4. The only case that can increase dynamic instruction count is the TBH case: Before: lsls r0, r4, #2; adr r1, .LJTI0_0; ldr r0, [r0, r1]; mov pc, r0. After: lsls r4, r4, #1; add r4, pc; ldrh r4, [r4, #6]; lsls r4, r4, #1; add pc, r4. => 1 more instruction in prologue. Jump table shrunk by a factor of 2. So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!) llvm-svn: 285690	2016-11-01 13:37:41 +00:00
Valery Pykhtin	8a89d3662a	[AMDGPU] Expand vector mulhu/mulhs Differential revision: https://reviews.llvm.org/D26077 llvm-svn: 285684	2016-11-01 10:26:48 +00:00
Nemanja Ivanovic	e70fa63390	[PowerPC] Implement vector shift builtins - llvm portion This patch corresponds to review https://reviews.llvm.org/D26095. Committing on behalf of Tony Jiang. llvm-svn: 285681	2016-11-01 09:42:32 +00:00
Matt Arsenault	f3dd863031	AMDGPU: Whitespace fixes llvm-svn: 285659	2016-11-01 00:55:14 +00:00
Davide Italiano	51cbe13a3f	[Hexagon] Garbage collect dead code. llvm-svn: 285654	2016-10-31 22:56:56 +00:00
Saleem Abdulrasool	e1aa782bd0	CodeGen: further loosen -O0 CG for WoA division Generate the slowest possible codepath for noopt CodeGen. Even trying to be clever with the negated jump can cause out-of-range jumps. Use a wide branch instead. Although the code is modelled simplistically, the later optimizations would recombine the branching into `cbz` if possible. This re-enables the previous optimization as well as hopefully gives us working code in all cases. Addresses PR30356! llvm-svn: 285649	2016-10-31 22:12:37 +00:00
Justin Lebar	ed1e312f05	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass. Summary: This has been replaced by the NVPTXInferAddressSpaces pass. We've had the new one as the default with the old one accessible via a flag for some months now, and we've had no problems. Reviewers: tra Subscribers: llvm-commits, jholewinski, jingyue, mgorny Differential Revision: https://reviews.llvm.org/D26165 llvm-svn: 285642	2016-10-31 21:51:42 +00:00
Nemanja Ivanovic	60bdfe5a7c	[PPC] add absolute difference altivec instructions and matching intrinsics This patch corresponds to review https://reviews.llvm.org/D26072. Committing on behalf of Sean Fertile. llvm-svn: 285627	2016-10-31 19:47:52 +00:00
Tim Northover	037af52c8b	GlobalISel: allow truncating pointer casts on AArch64. llvm-svn: 285615	2016-10-31 18:31:09 +00:00
Tim Northover	cdf23f1d93	GlobalISel: translate stack protector intrinsics llvm-svn: 285614	2016-10-31 18:30:59 +00:00
Michael Zuckerman	68a5c53616	[x86][inline-asm][AVX512][llvm][PART-2] Introducing "k" and "Yk" constraints for extended inline assembly, enabling use of AVX512 masked vectorized instructions. Commit on behalf of mharoush Extending inline assembly support, compatible with GCC as following: "k" constraint hints the compiler to select any of AVX512 k0-k7 registers. "Yk" constraint is a subset of "k" excluding k0 which is not allowed to be used as a mask. Reviewer: 1. rnk Differential Revision: https://reviews.llvm.org/D25062 llvm-svn: 285591	2016-10-31 16:19:58 +00:00
Artem Tamazov	54bfd548aa	[AMDGPU][MC][gfx8] Support 20-bit immediate offset in SMEM instructions. Fixes Bug 30808. Note that passing subtarget information to predicates seems too complicated, so gfx8-specific def smrd_offset_20 introduced. Old gfx6/7-specific def renamed to smrd_offset_8 for clarity. Lit tests updated. Differential Revision: https://reviews.llvm.org/D26085 llvm-svn: 285590	2016-10-31 16:07:39 +00:00
Krzysztof Parzyszek	22586dcb2a	[Hexagon] Don't expand mux instructions with both sources identical llvm-svn: 285588	2016-10-31 15:45:09 +00:00
Ulrich Weigand	2e5e51b3f3	[SystemZ] Rework processor feature definitions and add -mcpu=archX support This patch implements two changes: - Move processor feature definition into a new file SystemZFeatures.td, and provide explicit lists of supported and unsupported features for each level of the z/Architecture. This allows specifying unsupported features in the scheduler definition files for each processor. - Add optional aliases for the -mcpu processor names according to the level of the z/Architecture, for compatibility with other compilers on the platform. The supported aliases are: -mcpu=arch8 equals -mcpu=z10 -mcpu=arch9 equals -mcpu=z196 -mcpu=arch10 equals -mcpu=zEC12 -mcpu=arch11 equals -mcpu=z13 llvm-svn: 285577	2016-10-31 14:33:29 +00:00
Ulrich Weigand	d28be373d4	[SystemZ] Guard LEFR/LFER with FeatureVector The LEFR/LFER pseudos are aliases for vector instructions and should therefore be guarded by FeatureVector. If they aren't, the TableGen scheduler definition checking might complain that there is no data for those pseudos for pre-z13 machines. No functional change intended. llvm-svn: 285576	2016-10-31 14:28:43 +00:00
Ulrich Weigand	d9001301d9	[SystemZ] Correctly diagnose missing features in AsmParser Currently, when using an instruction that is not supported on the currently selected architecture, the LLVM assembler is likely to diagnose an "invalid operand" instead of a "missing feature". This is because many operands require a custom parser in order to be processed correctly, and if an instruction is not available according to the current feature set, the generated parser code will also not detect the associated custom operand parsers. Fixed by temporarily enabling all features while parsing operands. The missing features will then be correctly detected when actually parsing the instruction itself. llvm-svn: 285575	2016-10-31 14:25:05 +00:00
Ulrich Weigand	ec5d779eb8	[SystemZ] Fix encoding of MVCK and .insn ss LLVM currently treats the first operand of MVCK as if it were a regular base+index+displacement address. However, it is in fact a base+displacement combined with a length register field. While the two might look syntactically similar, there are two semantic differences: - %r0 is a valid length register, even though it cannot be used as an index register. - In an expression with just a single register like 0(%rX), the register is treated as base with normal addresses, while it is treated as the length register (with an empty base) for MVCK. Fixed by adding a new operand parser class BDRAddr and reworking the assembler parser to distinguish between address + length register operands and regular addresses. llvm-svn: 285574	2016-10-31 14:21:36 +00:00
Jonas Paulsson	6788ddeac9	[SystemZ] Model 2 VBU units (not 1) in SystemZScheduleZ13.td. NFC. Review: Ulrich Weigand. llvm-svn: 285566	2016-10-31 13:05:48 +00:00
Alexey Bataev	d07c731d86	Improved cost model for FDIV and FSQRT, by Andrew Tischenko There is a bug describing poor cost model for floating point operations: Bug 29083 - [X86][SSE] Improve costs for floating point operations. This patch is the second one in series of patches dealing with cost model. Differential Revision: https://reviews.llvm.org/D25722 llvm-svn: 285564	2016-10-31 12:10:53 +00:00
Craig Topper	d4e580705d	[AVX-512] Add missing patterns for selecting masked vector extracts that started from shuffles. llvm-svn: 285546	2016-10-31 05:55:57 +00:00
Craig Topper	b7781a95fd	[X86] Use intrinsics table for PMADDUBSW and PMADDWD so that we can use the legacy intrinsics to select EVEX encoded instructions when available. This removes a couple tablegen classes that become unused after this change. Another class gained an additional parameter to allow PMADDUBSW to specify a different result type from its input type. llvm-svn: 285515	2016-10-30 06:56:16 +00:00
Craig Topper	bf9e5a16a4	[X86] Don't use loadv2i64 on SSE version of PMULHRSW. Use memopv2i64 instead. This bug was introduced in r285501. llvm-svn: 285510	2016-10-30 00:02:55 +00:00
Craig Topper	defe9ffbb5	[X86] Use intrinsics table for VPMULHRSW intrinsics so that the legacy intrinsics can select EVEX encoded instructions when available. This requires a minor rename of the instructions due to the use of different tablegen classes and how the names are concatenated. llvm-svn: 285501	2016-10-29 18:41:45 +00:00
Elena Demikhovsky	519b4ccd70	Fixed FMA + FNEG combine. Masked form of FMA should be omitted in this optimization. Differential Revision: https://reviews.llvm.org/D25984 llvm-svn: 285492	2016-10-29 08:44:46 +00:00
Matt Arsenault	c88ba36eab	AMDGPU: Use 1/2pi inline imm on VI I'm guessing at how it is supposed to be printed llvm-svn: 285490	2016-10-29 04:05:06 +00:00
Matthias Braun	7d78614ae9	AArch64DeadRegisterDefinitionsPass: Cleanup; NFC - Fix doxygen file comment - reduce indentation in loop - Factor out some common subexpressions - Move independent helper function out of class - Fix Changed flag (this is not strictly NFC but a bugfix, but the flag seems ignored anyway) llvm-svn: 285488	2016-10-29 01:03:41 +00:00
Tom Stellard	6695ba0440	AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions Summary: Flat instructions can return out of order, so we always need to wait for all the outstanding flat operations. Reviewers: tony-tye, arsenm Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D25998 llvm-svn: 285479	2016-10-28 23:53:48 +00:00
Matt Arsenault	4e9c1e3a79	AMDGPU: Fix instruction flags for s_endpgm Set isReturn, remove hasSideEffects. Also remove hasCtrlDep, I'm not really sure what that does. llvm-svn: 285476	2016-10-28 23:00:38 +00:00
Matt Arsenault	7b6475568d	AMDGPU: Add definitions for scalar store instructions Also add glc bit to the scalar loads since they exist on VI and change the caching behavior. This currently has an assembler bug where the glc bit is incorrectly accepted on SI/CI which do not have it. llvm-svn: 285463	2016-10-28 21:55:15 +00:00
Matt Arsenault	4b6a6cc8e9	AMDGPU: Rename glc operand type While trying to add the glc bit to SMEM instructions on VI with the new refactoring I ran into some kind of shadowing problem for the glc operand when using the pseudoinstruction as a multiclass parameter. Everywhere that currently uses it defines the operand to have the same name as its type, i.e. glc:$glc which works. For some reason now it conflicts, and it ends up evaluating to the wrong thing. For the real encoding classes, let Inst{16} = !if(ps.has_glc, glc, ?); was not being evaluated and still visible in the Inst initializer in the expanded td file. In other cases I got a different error about an illegal operand where this was using { 0 } initializer from the bits<1> glc initializer instead of evaluating it as false in the if. For consistency all of the operand types should probably be capitalized to avoid conflicting with the variable names unless somebody has a better idea of how to fix this. llvm-svn: 285462	2016-10-28 21:55:08 +00:00
Justin Lebar	f0a80ba385	[NVPTX] Compute 'rem' using the result of 'div', if possible. Summary: In isel, transform Num % Den into Num - (Num / Den) * Den if the result of Num / Den is already available. Reviewers: tra Subscribers: hfinkel, llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26090 llvm-svn: 285461	2016-10-28 21:44:00 +00:00
Matt Arsenault	4eae301995	AMDGPU: Diagnose using too many SGPRs This is possible when using inline asm. llvm-svn: 285447	2016-10-28 20:31:47 +00:00
Matt Arsenault	08906a3c62	AMDGPU: Fix using incorrect private resource with no allocation It's possible to have a use of the private resource descriptor or scratch wave offset registers even though there are no allocated stack objects. This would result in continuing to use the maximum number of reserved registers. This could go over the number of SGPRs available on VI, or violate the SGPR limit requested by the function attributes. llvm-svn: 285435	2016-10-28 19:43:31 +00:00
Nemanja Ivanovic	e28a0fc72a	Implement vector count leading/trailing bytes with zero lsb and vector parity builtins - llvm portion This patch corresponds to review https://reviews.llvm.org/D26003. Committing on behalf of Zaara Syeda. llvm-svn: 285434	2016-10-28 19:38:24 +00:00
Krzysztof Parzyszek	87a47be039	[Hexagon] Maintain kill flags through splitting in expand-condsets Do not use LiveIntervals to recalculate kills, because that cannot be done accurately without implicit uses on predicated instructions. llvm-svn: 285409	2016-10-28 15:50:22 +00:00
Tom Stellard	aea899e2a0	AMDGPU/SI: Handle hazard with s_rfe_b64 Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25638 llvm-svn: 285368	2016-10-27 23:50:21 +00:00
Tom Stellard	04051b5fad	AMDGPU/SI: Handle hazard with sgpr lane selects for v_{read,write}lane Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25637 llvm-svn: 285367	2016-10-27 23:42:29 +00:00
Tom Stellard	6b9c1be4ea	AMDGPU/SI: Fix unused variable warning on non-debug builds llvm-svn: 285363	2016-10-27 23:28:03 +00:00
Tom Stellard	b133fbb9a4	AMDGPU/SI: Handle hazard with > 8 byte VMEM stores Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25577 llvm-svn: 285359	2016-10-27 23:05:31 +00:00
Tom Stellard	30d30824b4	AMDGPU/SI: Handle s_setreg hazard in GCNHazardRecognizer Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25528 llvm-svn: 285338	2016-10-27 20:39:09 +00:00
Simon Pilgrim	d23219b9ee	[X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targets llvm-svn: 285329	2016-10-27 18:32:06 +00:00
Simon Pilgrim	47c1ff7a43	[X86][AVX512DQ] Move v2i64 and v4i64 MUL lowering to tablegen As suggested by @igorb on D26011 llvm-svn: 285313	2016-10-27 17:07:40 +00:00
Saleem Abdulrasool	075d2e3c59	ARM: ensure that the Windows DBZ check is in range The Windows ARM target expects the compiler to emit a division-by-zero check. The check would use the form of: cmp r?, #0 cbz .Ltrap b .Lbody .Lbody: ... .Ltrap: udf #249 @ __brkdiv0 This works great most of the time. However, if the body of the function is greater than 127 bytes, the branch target limitation of cbz becomes an issue. This occurs in the unoptimized code generation cases sometimes (like in compiler-rt). Since this is a matter of correctness, possibly pay a small penalty instead. We now form this slightly differently: cbnz .Lbody udf #249 @ __brkdiv0 .Lbody: ... The positive case is through the branch instead of being the next instruction. However, because of the basic block layout, the negated branch is going to be a short distance always (2 bytes away, after the inserted __brkdiv0). The new t__brkdiv0 instruction is required to explicitly mark the instruction as a terminator as the generic UDF instruction is not a terminator. Addresses PR30532! llvm-svn: 285312	2016-10-27 16:59:22 +00:00
Vasileios Kalintiris	cfb005a0ee	[mips] Do not allow -opt-bisect-limit to skip the PIC call optimization pass. r282428 added the MipsOptimizePICCall as an opt-in pass that can be skipped when using the -opt-bisect-limit option. However, this pass is needed because it generates code that conforms to the o32 ABI specification by using the $t9 register for PIC calls with JALR instructions. This bug was exposed by the fact that skipFunction() also checks for the "optnone" attribute. This caused functions with that attribute to break the requirements of the o32 ABI. llvm-svn: 285305	2016-10-27 15:50:36 +00:00
Simon Pilgrim	820e1326d7	[X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64 With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq). Updated cost table accordingly. Differential Revision: https://reviews.llvm.org/D26011 llvm-svn: 285304	2016-10-27 15:27:00 +00:00
Krzysztof Parzyszek	046da74699	[Hexagon] Do not expand ISD::SELECT for HVX vectors llvm-svn: 285297	2016-10-27 14:30:16 +00:00
Sam Parker	e7d9505c08	[ARM] Predicate UMAAL selection on hasDSP. UMAAL is a DSP instruction and it is not available on thumbv7m (Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong CHECK prefix in longMAC.ll test. Patch by Vadzim Dambrouski. Differential Revision: https://reviews.llvm.org/D25890 llvm-svn: 285278	2016-10-27 09:47:10 +00:00
Dylan McKay	dd680cc753	[AVR] Generate all of the TableGen files we need This enables generation of all of the TableGen files that are used downstream. llvm-svn: 285274	2016-10-27 08:20:47 +00:00
Nicolai Haehnle	7b0e25b7ad	AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due to register dependencies Summary: When finding a match for a merge and collecting the instructions that must be moved, keep in mind that the instruction we merge might actually use one of the defs that are being moved. Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect]. The fact that the ds_read in the test case is not eliminated suggests that there might be another problem related to alias analysis, but that's a separate problem: this pass should still work correctly even when earlier optimization passes missed something or were disabled. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25829 llvm-svn: 285273	2016-10-27 08:15:07 +00:00
Dylan McKay	00009d4824	[AVR] Compile the disassembler This also updates references of 'TheAVRTarget' to the new 'getTheAVRTarget()' method. llvm-svn: 285272	2016-10-27 08:09:15 +00:00
Dylan McKay	ec47065795	[AVR] Add AVRISelDAGToDAG.cpp Summary: This pulls the AVR instruction selector in-tree. Reviewers: arsenm, kparzysz Subscribers: llvm-commits, wdng, beanz, japaric, mgorny Differential Revision: https://reviews.llvm.org/D25278 llvm-svn: 285270	2016-10-27 07:03:47 +00:00
Dylan McKay	6eaa4e4bcc	[AVR] Add the machine code emitter Reviewers: arsenm, kparzysz Subscribers: wdng, beanz, japaric, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D25388 llvm-svn: 285269	2016-10-27 06:56:46 +00:00
Nemanja Ivanovic	32b5fed639	[PowerPC] - No SExt/ZExt needed for count trailing zeros This patch corresponds to review: https://reviews.llvm.org/D25896 It just eliminates the redundant ZExt after a count trailing zeros instruction. llvm-svn: 285267	2016-10-27 05:17:58 +00:00
Evandro Menezes	ca8370396a	[AArch64] Create feature set for Samsung Exynos-M2 Since Exynos-M2 improved the FP square root unit a bit over the one in Exynos-M1, it does not benefit from using the Newton series for such operations. llvm-svn: 285246	2016-10-26 22:06:20 +00:00
Tim Northover	a9cc385664	ARM: don't rely on push/pop reglists being in order when folding SP adjust. It would be a very nice invariant to rely on, but unfortunately it doesn't necessarily hold (and the causes of mis-sorted reglists appear to be quite varied) so to be robust the frame lowering code can't assume that the first register in the list is also the first one that actually gets pushed. Should fix an issue where we were turning something like: push {r8, r4, r7, lr} sub sp, #24 into nonsense like: push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr} llvm-svn: 285232	2016-10-26 20:01:00 +00:00
Nemanja Ivanovic	0f45998bc6	[PowerPC] Implement vec_insert_exp builtins - llvm portion This revision corresponds to review: https://reviews.llvm.org/D25957. Committing on behalf of Zaara Syeda. llvm-svn: 285225	2016-10-26 19:03:40 +00:00
Chad Rosier	0c621fda0d	[AArch64] Avoid materializing constant 1 when generating cneg instructions. Instead of: cmp w0, #1; orr w8, wzr, #0x1; cneg w0, w8, ne we now generate: cmp w0, #1; csinv w0, w0, wzr, eq. PR28965 llvm-svn: 285217	2016-10-26 18:15:32 +00:00
Dan Gohman	68a423bf84	[WebAssembly] Update the README.txt. Update the README.txt with newer information, add a link to the Emscripten page explaining the current easiest way to use the LLVM wasm backend, and mention that other ways of using the LLVM wasm backend are in development. llvm-svn: 285215	2016-10-26 17:44:09 +00:00
Yaxun Liu	94add85adb	AMDGPU: Refactor processor definition to use ISA version features Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend. Refactor processor definition to use ISA version features. Fixed ISA version for stoney. Based on Laurent Morichetti's patch. Differential Revision: https://reviews.llvm.org/D25919 llvm-svn: 285210	2016-10-26 16:37:56 +00:00
Matt Arsenault	39787bdcbb	Reapply "AMDGPU: Don't use offen if it is 0" This reverts r283003 llvm-svn: 285203	2016-10-26 15:08:16 +00:00
Matt Arsenault	1110f14b42	AMDGPU: Fix counting si_mask_branch as 4 bytes llvm-svn: 285202	2016-10-26 14:53:54 +00:00
Tom Stellard	f8e6eaff6e	AMDGPU/SI: Don't emit multi-dword flat memory ops when they might access scratch Summary: A single flat memory operation that might access the scratch buffer can only access MaxPrivateElementSize bytes. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25788 llvm-svn: 285198	2016-10-26 14:38:47 +00:00
Zvi Rackover	aa3402b41e	[X86] AVX512 fallback for floating-point scalar selects Summary: In the case of 'select i1, f32, f32' or 'select i1, f64, f64', prefer lowering to masked-moves over branches. Fixes pr30561 Reviewers: igorb, aymanmus, delena Differential Revision: https://reviews.llvm.org/D25310 llvm-svn: 285196	2016-10-26 14:12:46 +00:00
Craig Topper	812d3d30ae	[AVX-512] Add scalar vfmsub/vfnmsub mask3 intrinsics Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions. Reviewers: igorb, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25933 llvm-svn: 285173	2016-10-26 04:59:58 +00:00
James Y Knight	2e64b8b79e	[Sparc] Don't overlap variable-sized allocas with other stack variables. On SparcV8, it was previously the case that a variable-sized alloca might overlap by 4-bytes the last fixed stack variable, effectively because 92 (the number of bytes reserved for the register spill area) != 96 (the offset added to SP for where to start a DYNAMIC_STACKALLOC). It's not as simple as changing 96 to 92, because variables that should be 8-byte aligned would then be misaligned. For now, simply increase the allocation size by 8 bytes for each dynamic allocation -- wastes space, but at least doesn't overlap. As the large comment says, doing this more efficiently will require larger changes in llvm. Also adds some test cases showing that we continue to not support dynamic stack allocation and over-alignment in the same function. llvm-svn: 285131	2016-10-25 22:13:28 +00:00
Evandro Menezes	7696dc0685	[AArch64] Adjust the cost model for Exynos M1. Modify the maximum jump table size. llvm-svn: 285106	2016-10-25 20:05:42 +00:00
Dan Gohman	f50d964bdb	[WebAssembly] Add immediate fields to call_indirect and memory operators. call_indirect, grow_memory, and current_memory now have immediate operands in the 0xd binary encoding. llvm-svn: 285085	2016-10-25 16:55:52 +00:00
Ulrich Weigand	7bdb485e18	[SystemZ] Do not use LOC(G) for volatile loads It is not safe to use LOAD ON CONDITION to implement access to a memory location marked "volatile", since the architecture leaves it unspecified whether or not an access happens if the condition is false. The current code already appears to care about that: def LOC : CondUnaryRSY<"loc", 0xEBF2, nonvolatile_load, GR32, 4>; Unfortunately, that "nonvolatile_load" operator is simply ignored by the CondUnaryRSY class, and there was no test to catch it. llvm-svn: 285077	2016-10-25 15:39:15 +00:00
Simon Pilgrim	5c3c9707c3	[X86][SSE] Add support for (V)PMOVSX* constant folding We already have (V)PMOVZX* combining support, this is the beginning of handling (V)PMOVSX* similarly - other combines in combineVSZext can be generalized in future patches. This unearthed an interesting bug in that we were generating illegal build vectors on 32-bit targets - it was proving difficult to create a test for it from PMOVZX, but it fired immediately with PMOVSX. I've created a more general form of the existing getConstVector to handle these cases - ideally this should be handled in non-target-specific code but I couldn't find an equivalent. Differential Revision: https://reviews.llvm.org/D25874 llvm-svn: 285072	2016-10-25 14:29:25 +00:00
Benjamin Kramer	7df3043db3	Fix an unused warning in WebAssemblyInstPrinter with NDEBUG. Patch by Sam McCall! Differential Revision: https://reviews.llvm.org/D25934 llvm-svn: 285055	2016-10-25 09:08:50 +00:00
Craig Topper	01e4667e02	[AVX-512] Add support for creating SIGN_EXTEND_VECTOR_INREG and ZERO_EXTEND_VECTOR_INREG for 512-bit vectors to support vpmovzxbq and vpmovsxbq. Summary: The one tricky thing about this is that the sign/zero_extend_inreg uses v64i8 as an input type which isn't legal without BWI support. Though the vpmovsxbq and vpmovzxbq instructions themselves don't require BWI. To support this we need to add custom lowering for ZERO_EXTEND_VECTOR_INREG with v64i8 input. This can mostly reuse the existing sign extend code with a couple checks for sign extend vs zero extend added. Reviewers: delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25594 llvm-svn: 285053	2016-10-25 04:00:29 +00:00
Matthias Braun	c8440dddb2	MachineInstrBundle: Pass iterators to getBundle(Start|End); NFC This is a function to go backwards in a block to find the first instruction in a bundle, so iterator is a more natural choice for parameter/return rather than a reference to a MachineInstruction. llvm-svn: 285051	2016-10-25 02:55:17 +00:00
Dan Gohman	48abaa9c74	[WebAssembly] Reorder load/store operands to match binary encoding. The p2align operand of a load/store is encoded before the offset operand; reorder the MachineInstr operands accordingly. llvm-svn: 285044	2016-10-25 00:17:11 +00:00
Dan Gohman	3acb187d95	[WebAssembly] Implement more WebAssembly binary encoding. This changes locals to be declared by the emitLocal hook in WebAssemblyTargetStreamer, rather than with an instruction. After exploring the infrastructure in LLVM more, this seems to make more sense since declaring locals doesn't use an encoded opcode. This also adds more 0xd opcodes, type encodings, and miscellaneous binary encoding bits. llvm-svn: 285040	2016-10-24 23:27:49 +00:00
Matthias Braun	8b38ffaa98	CodeGen/Passes: Pass MachineFunction as functor arg; NFC Passing a MachineFunction as argument is more natural and avoids an unnecessary round-trip through the logic determining the correct Subtarget because MachineFunction already has a reference anyway. llvm-svn: 285039	2016-10-24 23:23:02 +00:00
Matthias Braun	fc371558a0	Use MachineInstr::mop_iterator instead of MIOperands; NFC (Const)?MIOperands is equivalent to the C++ style MachineInstr::mop_iterator. Use the latter for consistency except for a few callers of MIOperands::analyzePhysReg(). llvm-svn: 285029	2016-10-24 21:36:43 +00:00
Dan Gohman	5d3391f859	[WebAssembly] Fix a broken URL. llvm-svn: 285017	2016-10-24 20:35:17 +00:00
Dan Gohman	4becc58587	[WebAssembly] Define the `end` opcode value. CFGStackify differentiates between END_LOOP and END_BLOCK, but wasm itself doesn't. For now, just use the same opcode for both. llvm-svn: 285016	2016-10-24 20:32:04 +00:00
Dan Gohman	c968297b95	[WebAssembly] Update opcode values according to recent spec changes. This corresponds to the "0xd" opcode renumbering. llvm-svn: 285014	2016-10-24 20:21:49 +00:00
Dan Gohman	4fc4e42dea	[WebAssembly] Add an option to make get_local/set_local explicit. This patch adds a pass, controlled by an option and off by default for now, for making implicit get_local/set_local explicit. This simplifies emitting wasm with MC. Differential Revision: https://reviews.llvm.org/D25836 llvm-svn: 285009	2016-10-24 19:49:43 +00:00
Peter Collingbourne	6733564e5a	Target: Change various section classifiers in TargetLoweringObjectFile to take a GlobalObject. These functions are about classifying a global which will actually be emitted, so it does not make sense for them to take a GlobalValue which may for example be an alias. Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to look through aliases before using TargetLoweringObjectFile interfaces. These are functional changes but all appear to be bug fixes. Differential Revision: https://reviews.llvm.org/D25917 llvm-svn: 285006	2016-10-24 19:23:39 +00:00
Krzysztof Parzyszek	eb6172404d	Revert r284972 and remove other defaulted copy/move constructors/= David Blaikie pointed out that we get them for free without having to write anything. llvm-svn: 284996	2016-10-24 17:40:46 +00:00
Ehsan Amiri	c90b02cf50	[PPC] Generate positive FP zero using xor insn instead of loading from constant area https://reviews.llvm.org/D23614 Currently we load +0.0 from constant area. That can change to be generated using XOR instruction. llvm-svn: 284995	2016-10-24 17:31:09 +00:00
Eli Friedman	b37864b58d	Revert r284580+r284917. ("Synthesize TBB/TBH instructions") The optimization has correctness issues, so reverting for now to fix tests on thumb1 targets. llvm-svn: 284993	2016-10-24 17:20:50 +00:00
Evandro Menezes	eff2bd9d4f	[AArch64] Optionally use the Newton series for reciprocal estimation Add support for estimating the square root or its reciprocal and division or reciprocal using the combiner generic Newton series. Differential revision: https://reviews.llvm.org/D25291 llvm-svn: 284986	2016-10-24 16:14:58 +00:00
Ehsan Amiri	1f31e9157d	[PPC] Better codegen for AND, ANY_EXT, SRL sequence https://reviews.llvm.org/D24924 This improves the code generated for a sequence of AND, ANY_EXT, SRL instructions. This is a targeted fix for this special pattern. The pattern is generated by the target-independent DAG combiner, so a more general fix may not be necessary. If we come across other similar cases, some ideas for handling them are discussed on the code review. llvm-svn: 284983	2016-10-24 15:46:58 +00:00
Nicolai Haehnle	a785209bc2	AMDGPU: Fix Two Address problems with v_movreld Summary: The v_movreld machine instruction is used with three operands that are in a sense tied to each other (the explicit VGPR_32 def and the implicit VGPR_NN def and use). There is no way to express that using the currently available operand bits, and indeed there are cases where the Two Address instructions pass does the wrong thing. This patch introduces a new set of pseudo instructions that are identical in intended semantics to v_movreld, but they only have two tied operands. Having to add a new set of pseudo instructions is admittedly annoying, but it's a fairly straightforward and solid approach. The only alternative I see is to try to teach the Two Address instructions pass about Three Address instructions, and I'm afraid that's trickier and is going to end up more fragile. Note that v_movrels does not suffer from this problem, and so this patch does not touch it. This fixes several GL45-CTS.shaders.indexing.* tests. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25633 llvm-svn: 284980	2016-10-24 14:56:02 +00:00
Pavel Labath	51c454c1a9	Remove unused #includes of TimeValue.h. NFC. llvm-svn: 284975	2016-10-24 14:00:26 +00:00
Joel Jones	504bf334b0	AArch64 ILP32 relocations for assembly and ELF Summary: Add relocations for AArch64 ILP32. Includes: - Addition of definitions for R_AARCH32_* - Definition of new -target-abi: ilp32 - Definition of data layout string - Tests for added relocations. Not comprehensive, but matches existing tests for 64-bit. Renames "CHECK-OBJ" to "CHECK-OBJ-LP64". - Tests for llvm-readobj Reviewers: zatrazz, peter.smith, echristo, t.p.northover Subscribers: aemerson, rengolin, mehdi_amini Differential Revision: https://reviews.llvm.org/D25159 llvm-svn: 284973	2016-10-24 13:37:13 +00:00
Krzysztof Parzyszek	f74683f930	[RDF] Add default move constructors/assignment operators llvm-svn: 284972	2016-10-24 13:15:20 +00:00
Simon Dardis	9c34854833	[mips] synci microMIPS instruction definition. Add synci to the microMIPS instruction definitions, mark the MIPS sync & synci as not being part of microMIPS. This does not cover the sync instruction alias, as that will be handled with a different patch. Add sync to the valid tests for microMIPS. Reviewers: vkalintiris Differential Revision: https://reviews.llvm.org/D25795 llvm-svn: 284962	2016-10-24 10:23:59 +00:00
Craig Topper	8ec5c7326d	[AVX-512] Remove masked pmin/pmax intrinsics and autoupgrade to native IR. Clang patch to replace 512-bit vector and 64-bit element versions with native IR will follow. llvm-svn: 284955	2016-10-24 04:04:16 +00:00
Simon Pilgrim	6ac1e98b09	[X86][SSE] Add SSE41/AVX1 costs for vector shifts. We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results. llvm-svn: 284939	2016-10-23 16:49:04 +00:00
Simon Pilgrim	96ef0c1103	Use APInt::isAllOnesValue instead of popcnt. NFCI. More obvious implementation and faster too. llvm-svn: 284937	2016-10-23 15:09:44 +00:00
Dylan McKay	479a13c0aa	[AVR] Add the machine code disassembler This adds a super basic implementation of a machine code disassembler. It doesn't support any operands with custom encoding. llvm-svn: 284930	2016-10-22 23:57:59 +00:00
Simon Pilgrim	d3829c89bc	[X86][AVX512VL] Added support for combining target 256-bit shuffles to AVX512VL VPERMV3 llvm-svn: 284922	2016-10-22 20:15:39 +00:00
Simon Pilgrim	56c0524f0f	[X86][AVX512] Added support for combining target shuffles to AVX512 VPERMV3 llvm-svn: 284921	2016-10-22 19:53:59 +00:00
James Molloy	2bae8640d7	[ARM] Fix crash in ConstantIslands tPCRelJT may not be the first instruction in a block. Check that instead of dereferencing a broken iterator. llvm-svn: 284917	2016-10-22 09:58:37 +00:00
Craig Topper	b084c90a18	[X86] Add support for printing shuffle comments for VALIGN instructions. llvm-svn: 284915	2016-10-22 06:51:56 +00:00
Craig Topper	7b2b8db438	[X86] Add support for lowering v4i64 and v8i64 shuffles directly to PALIGNR. I think shuffle combine can figure it out later, but we should try to get it right up front. llvm-svn: 284914	2016-10-22 06:51:52 +00:00
Craig Topper	9f374533e3	[X86] Remove unnecessary AVX2 check that was already covered by an assertion earlier in the function. NFC llvm-svn: 284913	2016-10-22 06:51:49 +00:00
Craig Topper	bea5cb5491	[X86] Remove 128-bit lane handling from the main loop of matchVectorShuffleAsByteRotate. Instead check for is128LaneRepeatedSuffleMask before the loop and just loop over the repeated mask. I plan to use the loop to support VALIGND/Q shuffles so this makes it easier to reuse. llvm-svn: 284912	2016-10-22 06:51:44 +00:00
Simon Pilgrim	0d376bcbf0	[X86][SSE] Use getConstVector helper for VPERMV mask generation. NFCI. llvm-svn: 284911	2016-10-22 06:18:36 +00:00
Konstantin Zhuravlyov	fda33eaf0c	[AMDGPU] Perform uchar to float combine for ISD::SINT_TO_FP This will prevent following regression when enabling i16 support (D18049): test/CodeGen/AMDGPU/cvt_f32_ubyte.ll Differential Revision: https://reviews.llvm.org/D25805 llvm-svn: 284891	2016-10-21 22:10:03 +00:00
Tom Stellard	6c7dd980e4	AMDGPU/SI: Fix crash caused by r284267 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25782 llvm-svn: 284875	2016-10-21 20:25:11 +00:00
Peter Collingbourne	e9bd49824d	X86: Improve BT instruction selection for 64-bit values. If a 64-bit value is tested against a bit which is known to be in the range [0..31) (modulo 64), we can use the 32-bit BT instruction, which has a slightly shorter encoding. Differential Revision: https://reviews.llvm.org/D25862 llvm-svn: 284864	2016-10-21 19:57:55 +00:00
Simon Pilgrim	ab48872313	[X86][AVX512BWVL] Added support for lowering v16i16 shuffles to AVX512BWVL vpermw llvm-svn: 284863	2016-10-21 19:54:38 +00:00
Simon Pilgrim	da814cba0d	[X86][AVX512BWVL] Added support for combining target v16i16 shuffles to AVX512BWVL vpermw llvm-svn: 284860	2016-10-21 19:40:29 +00:00
Simon Pilgrim	0109bf116f	[X86][AVX512] Added support for combining target shuffles to AVX512 vpermpd/vpermq/vpermps/vpermd/vpermw llvm-svn: 284858	2016-10-21 19:18:09 +00:00
Krzysztof Parzyszek	6e7fa99d3a	[RDF] Use RegisterId typedef more consistently, NFC llvm-svn: 284857	2016-10-21 19:12:13 +00:00
Krzysztof Parzyszek	b71085b547	[Hexagon] Handle spills of partially defined double vector registers After register allocation it is possible to have a spill of a register that is only partially defined. That in itself is fine, but creates a problem for double vector registers. Stores of such registers are pseudo instructions that are expanded into pairs of individual vector stores, and in case of a partially defined source, one of the stores may use an entirely undefined register. To avoid this, track the defined parts and only generate actual stores for those. llvm-svn: 284841	2016-10-21 16:38:29 +00:00
Derek Schuff	6f69783f1f	[WebAssembly] Fix for 0xc call_indirect changes Summary: Need to reorder the operands to have the callee as the last argument. Adds a pseudo-instruction, and a pass to lower it into a real call_indirect. This is the first of two options for how to fix the problem. Reviewers: dschuff, sunfish Subscribers: jfb, beanz, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D25708 llvm-svn: 284840	2016-10-21 16:38:07 +00:00
Abderrazek Zaafrani	9daf8110c8	Set the vectorizer MaxInterleaveFactor for Exynos. llvm-svn: 284839	2016-10-21 16:28:27 +00:00
Simon Pilgrim	2d96daa885	[X86] Use DAG::getBuildVector helper wrapper where possible. NFCI. llvm-svn: 284835	2016-10-21 16:07:51 +00:00
Abderrazek Zaafrani	9f382f53d1	Test commit llvm-svn: 284832	2016-10-21 15:24:08 +00:00
Artem Tamazov	751985a757	[AMDGPU][mc] Fix ds_min/max[_rtn]_f32 - extra source operand removed. Fixes Bug 28215. Lit tests updated. Differential Revision: https://reviews.llvm.org/D25837 llvm-svn: 284825	2016-10-21 14:49:22 +00:00
Simon Pilgrim	c98d99a600	[X86][AVX2] Begun generalizing lowering to VPERMD/VPERMPS in preparation for AVX512 support. llvm-svn: 284823	2016-10-21 13:00:47 +00:00
Simon Pilgrim	32b06235da	[X86][AVX512] Add mask/maskz writemask support to subvector broadcast shuffle decode comments llvm-svn: 284821	2016-10-21 12:14:24 +00:00
Bjorn Pettersson	9fcd605d1e	[AArch64] Corrected spill size for DDD register class. NFCI Summary: The spill size was incorrectly set to 196 bits, which isn't a multiple of 8. This problem was detected when experimenting with asserts that the spill size should be a multiple of the byte size. New corrected value for the spill size is set to 192 bits. Note that tablegen (RegisterInfoEmitter) will divide the size set in the RegisterClass definition by 8. So this change should not have any impact on the tablegen output (trunc(192/8) == trunc(196/8) == 24 bytes). Reviewers: t.p.northover Subscribers: llvm-commits, aemerson, rengolin Differential Revision: https://reviews.llvm.org/D25818 llvm-svn: 284814	2016-10-21 09:53:42 +00:00
Michael Kuperstein	b2443ed62b	[X86] Enable interleaved memory access by default This lets the loop vectorizer generate interleaved memory accesses on x86. Differential Revision: https://reviews.llvm.org/D25350 llvm-svn: 284779	2016-10-20 21:04:31 +00:00
Konstantin Zhuravlyov	521e5ef4ce	[AMDGPU] Make note record name a static const member of target streamer Differential Revision: https://reviews.llvm.org/D25746 llvm-svn: 284760	2016-10-20 18:22:36 +00:00
Konstantin Zhuravlyov	08326b6256	[AMDGPU] Emit constant address space data in .rodata section and use relocations instead of fixups (amdhsa only) Differential Revision: https://reviews.llvm.org/D25693 llvm-svn: 284759	2016-10-20 18:12:38 +00:00
Simon Pilgrim	365be4f95c	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 bit integer vectors We weren't checking for uniform const costs before the general cost, resulting in very high estimates. llvm-svn: 284755	2016-10-20 18:00:35 +00:00
Sanjay Patel	0051efcf97	[Target] remove TargetRecip class; 2nd try This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs caused by faulty usage of StringRef. This version also renames a pair of functions: getRecipEstimateDivEnabled() getRecipEstimateSqrtEnabled() as suggested by Eric Christopher. original commit msg: [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284746	2016-10-20 16:55:45 +00:00
Simon Pilgrim	025e26dd32	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit integer vectors We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults. We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases. llvm-svn: 284744	2016-10-20 16:39:11 +00:00
Valery Pykhtin	e55fd41f73	[AMDGPU] add fcopysign(f64, f32) pattern Differential revision: https://reviews.llvm.org/D25827 llvm-svn: 284743	2016-10-20 16:17:54 +00:00
Benjamin Kramer	2a8bef8769	Do a sweep over move ctors and remove those that are identical to the default. All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go. No functionality change intended. llvm-svn: 284721	2016-10-20 12:20:28 +00:00
Jonas Paulsson	8010b631d5	[SystemZ] Post-RA scheduler implementation Post-RA sched strategy and scheduling instruction annotations for z196, zEC12 and z13. This scheduler optimizes decoder grouping and balances processor resources (including side steering the FPd unit instructions). The SystemZHazardRecognizer keeps track of the scheduling state, which can be dumped with -debug-only=misched. Reviewers: Ulrich Weigand, Andrew Trick. https://reviews.llvm.org/D17260 llvm-svn: 284704	2016-10-20 08:27:16 +00:00
Peter Collingbourne	c7766778a0	X86: Allow expressions to appear as u8imm operands. llvm-svn: 284688	2016-10-20 01:58:34 +00:00
Peter Collingbourne	de1f039360	X86: Deduplicate some lowering code. NFCI. llvm-svn: 284686	2016-10-20 01:21:26 +00:00
Reid Kleckner	40d7230f2f	Use __func__ directly now that all supported compilers support it Remove the portability macro now that it is unused. llvm-svn: 284681	2016-10-20 00:22:23 +00:00
Wei Ding	3cb2a1e8d1	AMDGPU: Add a function to enable and disable IEEEBit for SC and shader respectively. Differential Revision: http://reviews.llvm.org/D25789 llvm-svn: 284655	2016-10-19 22:34:49 +00:00
Krzysztof Parzyszek	c87155037b	[AMDGPU] Stop using MCRegisterClass::getSize() Differential Review: https://reviews.llvm.org/D24675 llvm-svn: 284619	2016-10-19 17:40:36 +00:00
Krzysztof Parzyszek	7bb63ac029	[RDF] Switch RefMap in liveness calculation to use lane masks This required reengineering some parts of the liveness calculation, including fixing some issues caused by the limitations of the previous approach. The current code is not necessarily the fastest, but it should be functionally correct (at least more so than before). The compile-time performance will be addressed in the future. llvm-svn: 284609	2016-10-19 16:30:56 +00:00
Chris Dewhurst	2c3cdd66d2	[Sparc][LEON] Detects an erratum on UT699 LEON 3 processors involving rounding mode changes and issues an appropriate user error message. Differential Revision: https://reviews.llvm.org/D24665 llvm-svn: 284591	2016-10-19 14:01:06 +00:00
Sjoerd Meijer	2fc4cb6f72	Reapply r284571 (with the new tests fixed). llvm-svn: 284588	2016-10-19 13:43:02 +00:00
Ulrich Weigand	6e31ab388a	[SystemZ] Add missing vector instructions for the assembler Most z13 vector instructions have a base form where the data type of the operation (whether to consider the vector to be 16 bytes, 8 halfwords, 4 words, or 2 doublewords) is encoded into a mask field, and then a set of extended mnemonics where the mask field is not present but the data type is encoded into the mnemonic name. Currently, LLVM only supports the type-specific forms (since those are really the ones needed for code generation), but not the base type-generic forms. To complete the assembler support and make it fully compatible with the GNU assembler, this commit adds assembler aliases for all the base forms of the various vector instructions. It also adds two more alias forms that are documented in the PoP: VFPSO/VFPSODB/WFPSODB -- generic form of VFLCDB etc. VNOT -- special variant of VNO llvm-svn: 284586	2016-10-19 13:03:18 +00:00
Ulrich Weigand	556a90c00c	[SystemZ] Add optional argument to some vector string instructions The vfee[bhf], vfene[bhf], and vistr[bhf] assembler mnemonics are documented in the Principles of Operation to have an optional last operand to encode arbitrary values in a mask field. This commit adds support for those optional operands, and cleans up the patterns to generate vector string instructions a bit. No change to code generation intended. llvm-svn: 284585	2016-10-19 12:57:46 +00:00
James Molloy	fbfd173447	[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions. It turns out this sequence is so short that it's almost never a loss for performance and is ALWAYS a significant win for code size. TBB example: Before: lsls r0, r0, #2; adr r1, .LJTI0_0; ldr r0, [r0, r1]; mov pc, r0. After: add r0, pc; ldrb r0, [r0, #6]; lsls r0, r0, #1; add pc, r0. => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4. The only case that can increase dynamic instruction count is the TBH case: Before: lsls r0, r4, #2; adr r1, .LJTI0_0; ldr r0, [r0, r1]; mov pc, r0. After: lsls r4, r4, #1; add r4, pc; ldrh r4, [r4, #6]; lsls r4, r4, #1; add pc, r4. => 1 more instruction in prologue. Jump table shrunk by a factor of 2. So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!) llvm-svn: 284580	2016-10-19 12:06:49 +00:00
Sjoerd Meijer	3f5111d363	Revert of r284571 because of failing tests. llvm-svn: 284572	2016-10-19 07:45:48 +00:00
Sjoerd Meijer	a318779263	Checking FP function attribute values and adding more build attribute tests. This renames the function for checking FP function attribute values and also adds more build attribute tests (which are in separate files because build attributes are set per file). Differential Revision: https://reviews.llvm.org/D25625 llvm-svn: 284571	2016-10-19 07:25:06 +00:00
Craig Topper	a4dc340cf2	[AVX-512] Teach isel lowering that a subvector broadcast being inserted into both halves of a 512-bit vector can be combined into a larger subvector broadcast. Summary: This allows us to create broadcasts of 128-bit vector loads into 512-bit vectors. New patterns added to support 8-bit and 16-bit vector types and v2f64/v2i64->v8f64/v8i64 without DQI instructions. There are also fallback patterns when the load can't be folded. These patterns are a little complex as we first need to insert the lower 128-bits into the second 128-bits using a zmm subvector insert instruction. We need to use a zmm insert in case VLX isn't available. Then use another zmm sub vector insert to take those 256-bits and insert them into the upper bits. Since we used a zmm insert to create the 256-bits we also need to do an extract_subreg to get just the lower 256-bits to pass to the second insert. The outer insert for the fallback patterns should have its type correct because eventually we should also support masked operations here too. So we need a DQI and a NoDQI version of the v16f32/v16i32 patterns. Reviewers: RKSimon, delena, igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25651 llvm-svn: 284567	2016-10-19 04:44:17 +00:00
Eli Friedman	c0a717ba5b	Improve ARM lowering for "icmp <2 x i64> eq". The custom lowering is pretty straightforward: basically, just AND together the two halves of a <4 x i32> compare. Differential Revision: https://reviews.llvm.org/D25713 llvm-svn: 284536	2016-10-18 21:03:40 +00:00
Evandro Menezes	ce8d60156c	[AArch64] Avoid materializing 0.0 when generating FP SELECT Transform `a == 0.0 ? 0.0 : x` to `a == 0.0 ? a : x` and `a != 0.0 ? x : 0.0` to `a != 0.0 ? x : a` to avoid materializing 0.0 for FCSEL, since it does not have to be materialized beforehand for FCMP, as it has a form that has 0.0 as an implicit operand. Differential Revision: https://reviews.llvm.org/D24808 llvm-svn: 284531	2016-10-18 20:37:35 +00:00
Tim Northover	55782222c0	GlobalISel: select small binary operations on AArch64. AArch64 actually supports many 8-bit operations under the definition used by GlobalISel: the designated information-carrying bits of a GPR32 get the right value if you just use the normal 32-bit instruction. llvm-svn: 284526	2016-10-18 20:03:48 +00:00
Tim Northover	4494d69862	GlobalISel: support floating-point constants on AArch64. Patch from Ahmed Bougacha. llvm-svn: 284523	2016-10-18 19:47:57 +00:00
Krzysztof Parzyszek	5bb417bed2	[Hexagon] Handle block live-ins with lane masks in HexagonBlockRanges llvm-svn: 284522	2016-10-18 19:47:20 +00:00
Benjamin Kramer	4c2582ad78	Reduce global namespace pollution. NFC. llvm-svn: 284521	2016-10-18 19:39:31 +00:00
Sanjay Patel	19601fa587	revert r284495: [Target] remove TargetRecip class There's something wrong with the StringRef usage while parsing the attribute string. llvm-svn: 284513	2016-10-18 18:36:49 +00:00
Sanjay Patel	08fff9ca81	[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284495	2016-10-18 17:05:05 +00:00
Simon Pilgrim	ca3072ac58	[X86][AVX512] Add mask/maskz writemask support to constant pool shuffle decode comments llvm-svn: 284488	2016-10-18 15:45:37 +00:00
Simon Dardis	858915f054	[mips][ias] Handle more complicated expressions for memory operands This patch teaches ias for mips to handle expressions such as (84)+(831)($sp). Such expressions typically occur from the expansion of multiple macro definitions. This partially resolves PR/30383. Thanks to Sean Bruno for reporting the issue! Reviewers: zoran.jovanovic, vkalintiris Differential Revision: https://reviews.llvm.org/D24667 llvm-svn: 284485	2016-10-18 15:17:17 +00:00
Simon Dardis	c4463c942c	[mips] Fix sync instruction definition The 'sync' instruction for MIPS was defined in MIPS-II as taking no operands. MIPS32 extended the definition of 'sync' as taking an optional unsigned 5 bit immediate. This patch corrects the definition of sync so that it is accepted with an operand of 0 or no operand for MIPS-II to MIPS-V, and a 5 bit unsigned immediate for MIPS32 and later revisions. Additionally a clear error is given when the MIPS32 version of sync is used when targeting pre MIPS32. This partially resolves PR/30714. Thanks to Daniel Sanders for reporting this issue! Reviewers: vkalintiris Differential Revision: https://reviews.llvm.org/D25672 llvm-svn: 284483	2016-10-18 14:42:13 +00:00
Simon Dardis	aff4d141b9	[mips] Macro expansion for ld, sd for O32 ld and sd when assembled for the O32 ABI expand to a pair of 32 bit word loads or stores using the specified source or destination register and the next register. This patch does not add support for the cases where the offset is greater than a 16 bit signed immediate as that would lead to a wrong/misleading error message as the assembler would report "instruction requires a CPU feature not currently enabled" for ld & sd for MIPS64 when their offset is not a signed 16 bit number. This fixes PR/29159. Thanks to Sean Bruno for reporting this issue! Reviewers: vkalintiris, seanbruno, zoran.jovanovic Differential Review: https://reviews.llvm.org/D24556 llvm-svn: 284481	2016-10-18 14:28:00 +00:00
Michael Zuckerman	1bee6340ef	[x86][inline-asm][avx512] allow swapping of '{k<num>}' & '{z}' marks Committing on behalf of Coby Tayree: After check-all and LGTM Desc: AVX512 allows dest operand to be followed by an op-mask register specifier ('{k<num>}', which in turn may be followed by a merging/zeroing specifier ('{z}') Currently, the following forms are allowed: {k<num>} {k<num>}{z} This patch allows the following forms: {z}{k<num>} and ignores the next form: {z} Justification would be quite simple - GCC Differential Revision: http://reviews.llvm.org/D25013 llvm-svn: 284479	2016-10-18 13:52:39 +00:00
Vasileios Kalintiris	3955b75ba9	[mips][FastISel] Instantiate the MipsFastISel class only for targets that support FastISel. Summary: Instead of instantiating the MipsFastISel class and checking if the target is supported in the overridden methods, we should perform that check before creating the class. This allows us to enable FastISel only for targets that truly support it, i.e. MIPS32 to MIPS32R5. Reviewers: sdardis Subscribers: ehostunreach, llvm-commits Differential Revision: https://reviews.llvm.org/D24824 llvm-svn: 284475	2016-10-18 13:05:42 +00:00
Javed Absar	e7c338081a	[ARM] Assign cost of scaling for Cortex-R52 This patch assigns cost of the scaling used in addressing for Cortex-R52. On Cortex-R52 a negated register offset takes longer than a non-negated register offset, in a register-offset addressing mode. Differential Revision: http://reviews.llvm.org/D25670 Reviewer: jmolloy llvm-svn: 284460	2016-10-18 09:08:54 +00:00
Simon Pilgrim	4ddc92b6cd	[X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32 As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types. This patch adds support for fptosi to 2i32 from both 2f64 and 2f32. It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch. Differential Revision: https://reviews.llvm.org/D23808 llvm-svn: 284459	2016-10-18 07:42:15 +00:00
Dean Michael Berris	156f6cafc2	[XRay] Support for tail calls for ARM no-Thumb This patch adds simplified support for tail calls on ARM with XRay instrumentation. Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.), the following program #include <cstdio> #include <cassert> #include <xray/xray_interface.h> [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() { std::printf("In fC()\n"); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() { std::printf("In fB()\n"); fC(); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() { std::printf("In fA()\n"); fB(); } // Avoid infinite recursion in case the logging function is instrumented (so calls logging // function again). [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret) { printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret)); } int main(int argc, char* argv[]) { __xray_set_handler(simplyPrint); printf("Patching...\n"); __xray_patch(); fA(); printf("Unpatching...\n"); __xray_unpatch(); fA(); return 0; } gives the following output: Patching... XRay: functionId=3 type=0. In fA() XRay: functionId=3 type=1. XRay: functionId=2 type=0. In fB() XRay: functionId=2 type=1. XRay: functionId=1 type=0. XRay: functionId=1 type=1. In fC() Unpatching... In fA() In fB() In fC() So for function fC() the exit sled seems to be called too early: before printing In fC(). Debugging shows that the above happens because printf from fC is also called as a tail call. So first the exit sled of fC is executed, and only then printf is jumped into. So it seems we can't do anything about this with the current approach (i.e. within the simplification described in https://reviews.llvm.org/D23988 ). Differential Revision: https://reviews.llvm.org/D25030 llvm-svn: 284456	2016-10-18 05:54:15 +00:00
Craig Topper	448358b5f1	[X86] Fix DecodeVPERMVMask to handle cases where the constant pool entry has a different type than the shuffle itself. This is especially important for 32-bit targets with 64-bit shuffle elements. llvm-svn: 284453	2016-10-18 04:48:33 +00:00
Craig Topper	7268bf99ab	[AVX-512] Fix DecodeVPERMV3Mask to handle cases where the constant pool entry has a different type than the shuffle itself. Summary: This is especially important for 32-bit targets with 64-bit shuffle elements.This is similar to how PSHUFB and VPERMIL handle the same problem. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25666 llvm-svn: 284451	2016-10-18 04:00:32 +00:00
Craig Topper	175a415e78	[AVX-512] Add support for decoding shuffle mask from constant pool for masked VPERMILPS/PD. llvm-svn: 284450	2016-10-18 03:36:52 +00:00
Konstantin Zhuravlyov	98a3ac7106	[AMDGPU] Mark .note section SHF_ALLOC so lld creates a segment for it Differential Revision: https://reviews.llvm.org/D25694 llvm-svn: 284435	2016-10-17 22:40:15 +00:00
Tim Northover	020d104496	GlobalISel: support wider range of load/store sizes in AArch64. llvm-svn: 284406	2016-10-17 18:36:53 +00:00
Tom Stellard	083f1626f5	AMDGPU/SI: LowerParameter() should be computing align based on memory type Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25203 llvm-svn: 284398	2016-10-17 16:56:19 +00:00
Tom Stellard	bc6c523cce	AMDGPU/SI: Fix LowerParameter() for i16 arguments Summary: If we are loading an i16 value from a 32-bit memory location, then we need to be able to truncate the loaded value to i16. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25198 llvm-svn: 284397	2016-10-17 16:21:45 +00:00
Craig Topper	1f5178ff9f	[X86] Fix shuffle decoding assertions to print the right number of required operands. Update the checks themselves to be >= to the same number instead of > one less than the required number. llvm-svn: 284365	2016-10-17 06:41:18 +00:00
Craig Topper	5b24cd31f5	[AVX-512] Add shuffle combining support for vpermi2var shuffles derived from existing support for vpermt2var. llvm-svn: 284357	2016-10-17 04:26:47 +00:00
Craig Topper	715ad7fef5	[AVX-512] Add support for turning a 256-bit load that goes to both halves of an insert_subvector into a subvector broadcast. Differential Revision: https://reviews.llvm.org/D25650 llvm-svn: 284353	2016-10-16 23:29:51 +00:00
Craig Topper	aa1370ac57	[AVX-512] Fix the operand order for vpermi2var_qi intrinsics to match the other vpermi2var intrinsics. llvm-svn: 284329	2016-10-16 04:54:35 +00:00
Craig Topper	4729fe8bb6	[AVX-512] Correct execution domain for VPERMT2PS and VPERMI2PS. llvm-svn: 284328	2016-10-16 04:54:31 +00:00
Craig Topper	f18b9201f5	[AVX-512] Move (v4i64 (X86SubVBroadcast (v2i64))) alternate patterns under a HasVLX predicate. Similar for floating point. llvm-svn: 284327	2016-10-16 04:54:26 +00:00
Davide Italiano	e9cdb24f67	[ArmFastISel] Kill dead code. NFCI. llvm-svn: 284320	2016-10-16 01:09:39 +00:00
Konstantin Zhuravlyov	8ea0246e93	[MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG. Differential Revision: https://reviews.llvm.org/D24577 llvm-svn: 284312	2016-10-15 22:01:18 +00:00
Craig Topper	dde865afb5	[AVX-512] Add shuffle comments for vbroadcast instructions. llvm-svn: 284305	2016-10-15 16:26:07 +00:00
Craig Topper	51e052f741	[AVX-512] Rename VPBROADCASTI32X2 and VPBROADCASTF32X2 instruction classes to match the mnemonic which does not include a 'P'. llvm-svn: 284304	2016-10-15 16:26:02 +00:00
Tom Stellard	961811c906	AMDGPU/SI: Handle s_getreg hazard in GCNHazardRecognizer Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25526 llvm-svn: 284298	2016-10-15 00:58:14 +00:00
Tim Northover	69fa84a6e9	GlobalISel: rename legalizer components to match others. The previous names were both misleading (the MachineLegalizer actually contained the info tables) and inconsistent with the selector & translator (in having a "Machine") prefix. This should make everything sensible again. The only functional change is the name of a couple of command-line options. llvm-svn: 284287	2016-10-14 22:18:18 +00:00
Guozhi Wei	0cd65429be	[PPC] Shorter sequence to load 64bit constant with same hi/lo words This is a patch to implement pr30640. When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register. This optimization caused failure of test case bperm.ll because of a suboptimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes a different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have the same cost and there are multiple bit groups that need to be ORed together. Differential Revision: https://reviews.llvm.org/D25521 llvm-svn: 284276	2016-10-14 20:41:50 +00:00
Tom Stellard	09c2bd6bd4	AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations Summary: We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff) Reviewers: arsenm, nhaehnle Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D24672 llvm-svn: 284267	2016-10-14 19:14:29 +00:00
Krzysztof Parzyszek	775a20913d	The real fix for post-r284255 failures llvm-svn: 284264	2016-10-14 19:06:25 +00:00
Krzysztof Parzyszek	a22daa0fa6	Workaround to eliminate check-llvm failures after r284255 llvm-svn: 284262	2016-10-14 18:36:42 +00:00
David L Kreitzer	01a057a0c4	Add a pass to optimize patterns of vectorized interleaved memory accesses for X86. The pass optimizes as a unit the entire wide load + shuffles pattern produced by interleaved vectorization. This initial patch optimizes one pattern (64-bit elements interleaved by a factor of 4). Future patches will generalize to additional patterns. Patch by Farhana Aleen Differential revision: http://reviews.llvm.org/D24681 llvm-svn: 284260	2016-10-14 18:20:41 +00:00
Tom Stellard	64a9d0876c	AMDGPU/SI: Don't allow unaligned scratch access Summary: The hardware doesn't support this. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25523 llvm-svn: 284257	2016-10-14 18:10:39 +00:00
Krzysztof Parzyszek	445bd12621	[RDF] Switch RegisterRef to be a pair (Register, LaneMask) Use PackedRegisterRef to store the register information in the graph nodes. This commit also removes support for virtual registers. It has never been tested or used. It will be possible to add it back if there is a need. llvm-svn: 284255	2016-10-14 17:57:55 +00:00
David L Kreitzer	d5c6755d83	[safestack] Use non-thread-local unsafe stack pointer for Contiki OS Patch by Michael LeMay Differential revision: http://reviews.llvm.org/D19852 llvm-svn: 284254	2016-10-14 17:56:00 +00:00
Eric Christopher	c39f8b0a3a	Revert "In preparation for removing getNameWithPrefix off of TargetMachine," as it's causing sanitizer/memory issues until I can track down this set. This reverts commit r284203 llvm-svn: 284252	2016-10-14 17:28:23 +00:00
Pierre Gousseau	b6d652adb5	[X86] Take advantage of the lzcnt instruction on btver2 architectures when ORing comparisons to zero. This change adds transformations such as: zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0)))) To: srl(or(ctlz(x), ctlz(y)), log2(bitsize(x)) This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput. Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it. For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar. Differential Revision: https://reviews.llvm.org/D23446 llvm-svn: 284248	2016-10-14 16:41:38 +00:00
Nicolai Haehnle	67624af0cc	AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes Summary: This will be used for 64-bit MULHU, which is in turn used for the 64-bit divide-by-constant optimization (see D24822). Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25289 llvm-svn: 284224	2016-10-14 10:30:00 +00:00
Simon Dardis	b3fd189cb5	[mips] Fix aui/daui/dahi/dati for MIPSR6 For compatibility with binutils, define these instructions to take two registers with a 16bit unsigned immediate. Both of the registers have to be same for dahi and dati. Reviewers: dsanders, zoran.jovanovic Differential Review: https://reviews.llvm.org/D21473 llvm-svn: 284218	2016-10-14 09:31:42 +00:00
Nicolai Haehnle	bd15c3267f	AMDGPU: Fix use-after-frees Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25312 llvm-svn: 284215	2016-10-14 09:03:04 +00:00
Michael Zuckerman	174d2e784b	[x86][ms-inline-asm] use of "jmp short" in asm is not supported Committing in the name of Ziv Izhar: After check-all and LGTM. The following patch is for compatibility with Microsoft. Microsoft ignores the keyword "short" when used after a jmp, for example: __asm { jmp short label label: } A test for that patch will be added in another patch, since it's located in clang's codegen tests. Link will be added shortly. link to test: https://reviews.llvm.org/D24958 Differential Revision: https://reviews.llvm.org/D24957 llvm-svn: 284211	2016-10-14 08:09:40 +00:00
Eric Christopher	2bd52b5d91	In preparation for removing getNameWithPrefix off of TargetMachine, sink the current behavior into the callers and sink TargetMachine::getNameWithPrefix into TargetMachine::getSymbol. llvm-svn: 284203	2016-10-14 05:47:41 +00:00
Eric Christopher	445c952bd0	Tidy the calls to getCurrentSection().first -> getCurrentSectionOnly to help readability a bit. llvm-svn: 284202	2016-10-14 05:47:37 +00:00
Konstantin Zhuravlyov	c96b5d7073	[AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variables Differential Revision: https://reviews.llvm.org/D25562 llvm-svn: 284196	2016-10-14 04:37:34 +00:00
Konstantin Zhuravlyov	2a2ac37c2c	[AMDGPU] Add 32-bit lo/hi got and pc relative variant kinds and emit appropriate relocations Differential Revision: https://reviews.llvm.org/D25548 llvm-svn: 284195	2016-10-14 04:21:32 +00:00
Saleem Abdulrasool	7705c4f1be	CodeGen: use MSVC division on windows itanium Windows itanium is identical to MSVC when dealing with everything but C++. Lower the math routines into msvcrt rather than compiler-rt. llvm-svn: 284175	2016-10-13 23:00:11 +00:00
Saleem Abdulrasool	06383dd272	CodeGen: adjust floating point operations in Windows itanium Windows itanium is equivalent to MSVC except in C++ mode. Ensure that we promote the 32-bit floating point operations to their 64-bit equivalents. llvm-svn: 284173	2016-10-13 22:38:15 +00:00
Sriraman Tallam	f29fa586e1	New llc option pie-copy-relocations to optimize access to extern globals. This option indicates copy relocations support is available from the linker when building as PIE and allows accesses to extern globals to avoid the GOT. Differential Revision: https://reviews.llvm.org/D24849 llvm-svn: 284160	2016-10-13 20:54:39 +00:00
Nirav Dave	a81682aad4	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r284151 which appears to be triggering LTO failures on Hexagon llvm-svn: 284157	2016-10-13 20:23:25 +00:00
Nirav Dave	4b36957243	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Retrying after upstream changes. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. 
CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 284151	2016-10-13 19:20:16 +00:00
Quentin Colombet	b3f5a8c644	[AArch64][RegisterBankInfo] Switch to fully static opds mapping for G_BITCAST. NFC. llvm-svn: 284146	2016-10-13 18:46:38 +00:00
Igor Breger	8409c356ad	[X86][AVX512] Fix sext v32i1 -> v32i8 lowering. Fix PR30600. Differential Revision: https://reviews.llvm.org/D25554 llvm-svn: 284134	2016-10-13 17:20:38 +00:00
Reid Kleckner	468e793fea	Fix for PR30687. Avoid dereferencing MBB.end(). We don't need to return a MachineInstr* from these stack probe insertion calls anyway. If we ever need to add it back, we can return an iterator instead. Based on a patch by David Kreitzer This bug is a consequence of r279314 | dexonsmith | 2016-08-19 13:40:12 -0700 (Fri, 19 Aug 2016) | 110 lines We hit the "Assertion `!NodePtr->isKnownSentinel()' failed" assertion, but only when inserting a stack probe call at the end of an MBB, which isn't necessarily a common situation. Differential Revision: https://reviews.llvm.org/D25566 llvm-svn: 284130	2016-10-13 15:48:48 +00:00
Javed Absar	85874a9360	[ARM]: Assign cost of scaling used in addressing mode for ARM cores This patch assigns cost of the scaling used in addressing. On many ARM cores, a negated register offset takes longer than a non-negated register offset, in a register-offset addressing mode. For instance: (1) LDR R0, [R1, R2 LSL #2] (2) LDR R0, [R1, -R2 LSL #2] Above, (1) takes fewer cycles than (2). By assigning appropriate scaling factor cost, we enable LLVM to make the right trade-offs in the optimization and code-selection phase. Differential Revision: http://reviews.llvm.org/D24857 Reviewers: jmolloy, rengolin llvm-svn: 284127	2016-10-13 14:57:43 +00:00
Matt Arsenault	253640e18d	AMDGPU: Assume spilling will occur at -O0 Because everything live is spilled at the end of a block by fast regalloc, assume this will happen and avoid the copies of the resource descriptor. llvm-svn: 284119	2016-10-13 13:10:00 +00:00
Matt Arsenault	dac31db12f	AMDGPU: Fix truncate to bool warnings llvm-svn: 284116	2016-10-13 12:45:16 +00:00
Simon Dardis	515e8699f4	[mips] Add IAS support for dvp, evp These instructions were only defined for microMIPSR6 previously. Add definitions for MIPSR6, correct definitions for microMIPSR6, flag these instructions as having unmodelled side effects (they disable/enable virtual processors) and add missing disassembler tests for microMIPSR6. Reviewers: vkalintiris Differential Review: https://reviews.llvm.org/D24291 llvm-svn: 284115	2016-10-13 12:12:56 +00:00
Oren Ben Simhon	92ccbf20ff	[X86] Basic additions to support RegCall Calling Convention. The Register Calling Convention (RegCall) was introduced by Intel to optimize parameter transfer on function call. This calling convention ensures that as many values as possible are passed or returned in registers. This commit presents the basic additions to LLVM CodeGen in order to support RegCall in X86. Differential Revision: http://reviews.llvm.org/D25022 llvm-svn: 284108	2016-10-13 07:53:43 +00:00
Daniel Jasper	bee9dea306	Silence unused warning in non-assert builds. llvm-svn: 284107	2016-10-13 06:39:44 +00:00
Craig Topper	ff23af4299	[AVX-512] Teach shuffle lowering to recognize 512-bit zero extends. llvm-svn: 284105	2016-10-13 05:29:41 +00:00
Craig Topper	8cb2efa58a	[X86] Simplify the lowering code for extracting and inserting subvectors. We don't need to check if AVX is enabled. It's implied by the operation action being set to Custom. We don't need to check both the input and output type widths. We only need to check the type that's being inserted or extracted. The other type is known to be a legal type and we can assume it's a different width. llvm-svn: 284102	2016-10-13 04:14:47 +00:00
Quentin Colombet	6b87a3109c	[AArch64][RegisterBankInfo] Provide alternative mappings for 64-bit load This allows RegBankSelect in greedy mode to get rid of some of the cross register bank copies when loads are involved in the chain of computation. llvm-svn: 284097	2016-10-13 01:01:23 +00:00
Reid Kleckner	741d8a21d3	Correct PrivateLinkage for COFF - Use storage class C_STAT for 'PrivateLinkage' The storage class for PrivateLinkage should be equal to that of Internal Linkage. - Set 'PrivateGlobalPrefix' from "L" to ".L" for MM_WinCOFF (includes x86_64) MM_WinCOFF has empty GlobalPrefix '\0' so PrivateGlobalPrefix "L" may conflict with a normal symbol name starting with 'L'. Based on a patch by Han Sangjin! Manually updated test cases. llvm-svn: 284096	2016-10-13 00:55:24 +00:00
Quentin Colombet	cd80e97e88	[AArch64][RegisterBankInfo] Provide alternative mappings for G_BITCASTs. Thanks to this patch, RegBankSelect is able to get rid of some register bank copies as demonstrated in the test case. llvm-svn: 284094	2016-10-13 00:34:48 +00:00
Quentin Colombet	45c9c1432f	[AArch64][RegisterBankInfo] Describe cross regbank copies statically. NFC. llvm-svn: 284091	2016-10-13 00:12:06 +00:00
Quentin Colombet	9e64919b7c	[AArch64][RegisterBankInfo] Use static mapping for same bank G_BITCAST. NFC. llvm-svn: 284090	2016-10-13 00:12:04 +00:00
Quentin Colombet	db643d9091	[AArch64][MachineLegalizer] Mark more G_BITCAST as legal. Basically any vector type that fits in a 32-bit register is also valid as far as copies are concerned. llvm-svn: 284089	2016-10-13 00:12:01 +00:00
Quentin Colombet	f760799c40	[AArch64][RegisterBankInfo] Bump the cost of vector loads. This does not change anything yet, because we do not offer any alternative mapping. llvm-svn: 284088	2016-10-13 00:11:59 +00:00
Quentin Colombet	f35a8c5bdc	[AArch64][RegisterBankInfo] Use a proper cost for cross regbank G_BITCASTs. This does not change anything yet, because we do not offer any alternative mapping. llvm-svn: 284087	2016-10-13 00:11:57 +00:00
Quentin Colombet	27b40356f7	[AArch64][RegisterBankInfo] Provide more realistic copy costs. llvm-svn: 284086	2016-10-13 00:11:55 +00:00
Tim Northover	fb8d989818	GlobalISel: support G_TRUNC selection on AArch64. Ahmed's patch again. llvm-svn: 284075	2016-10-12 22:49:15 +00:00
Tim Northover	69271c64d5	GlobalISel: support int <-> float conversions on AArch64. More of Ahmed's work. llvm-svn: 284074	2016-10-12 22:49:11 +00:00
Tim Northover	7dd378dd08	GlobalISel: select G_FCMP instructions on AArch64. Another of Ahmed's patches. llvm-svn: 284073	2016-10-12 22:49:07 +00:00

40345 Commits