llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	9513b3c4c7	[X86][SSE] More thorough testing of all-ones vectors re-materialization llvm-svn: 260889	2016-02-15 13:50:48 +00:00
Simon Pilgrim	02d3b6a82d	[X86][SSE] Regenerated uint2fp special case tests llvm-svn: 260888	2016-02-15 13:41:41 +00:00
Simon Pilgrim	4e4989a64a	[X86][SSE] Regenerated fast isel intrinsics tests llvm-svn: 260885	2016-02-15 12:32:16 +00:00
Igor Breger	4dc7d390db	AVX512: Change store size of kmask. Store size of v8i1, v4i1, v2i1 and i1 are changed to 16 bits. If KMOVB not supported (require AVX512DQ) only KMOVW can be used so store size should be 2 bytes. Differential Revision: http://reviews.llvm.org/D17138 llvm-svn: 260878	2016-02-15 08:25:28 +00:00
Simon Pilgrim	834931554b	[X86][AVX] Fixed copy+paste typo in shuffle test llvm-svn: 260852	2016-02-14 18:11:52 +00:00
Simon Pilgrim	08ba012973	[X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shuffles This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations. On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle. This patch has several benefits: * Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling. * Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure). * Matching the repeating shuffle makes use of a lot of existing shuffle lowering. There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review. Differential Revision: http://reviews.llvm.org/D16537 llvm-svn: 260834	2016-02-13 21:54:04 +00:00
Sanjay Patel	e9bf993cee	[x86-64] allow mfence even with -mno-sse (PR23203) As shown in: https://llvm.org/bugs/show_bug.cgi?id=23203 ...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64, but the instruction def doesn't know that. I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :) Differential Revision: http://reviews.llvm.org/D17219 llvm-svn: 260828	2016-02-13 17:26:29 +00:00
Matt Arsenault	f2ddbf00ed	AMDGPU: Prepare for reducing private element size. Tests for the new scalarize all private access options will be included with a future commit. The only functional change is to make the split/scalarize behavior for private access of > 4 element vectors to be consistent with the flat/global handling. This makes the spilling worse in the two changed tests. llvm-svn: 260804	2016-02-13 04:18:53 +00:00
Tom Stellard	4409051d00	AMDGPU/SI: Add llvm.amdgcn.mov.dpp intrinsic This intrinsic will be used to expose dpp functionality to higher-level languages. It will map to the dpp version of v_mov_b32. llvm-svn: 260792	2016-02-13 02:09:49 +00:00
Matt Arsenault	ce56a0ef54	AMDGPU: Add intrinsics for sin/cos These provide direct access to the hardware instruction without the unit version required like llvm.sin/llvm.cos lowering requires. llvm-svn: 260782	2016-02-13 01:19:56 +00:00
Matt Arsenault	79963e80b8	AMDGPU: Rename intrinsic to better match instruction name Also fixes missing f32 test. llvm-svn: 260780	2016-02-13 01:03:00 +00:00
Pirama Arumuga Nainar	7476bc89e9	Don't combine fp_round (fp_round x) if f80 to f16 is generated Summary: This patch skips DAG combine of fp_round (fp_round x) if it results in an fp_round from f80 to f16. fp_round from f80 to f16 always generates an expensive (and as yet, unimplemented) libcall to __truncxfhf2. This prevents selection of native f16 conversion instructions from f32 or f64. Moreover, the first (value-preserving) fp_round from f80 to either f32 or f64 may become a NOP in platforms like x86. Reviewers: ab Subscribers: srhines, llvm-commits Differential Revision: http://reviews.llvm.org/D17221 llvm-svn: 260769	2016-02-13 00:08:05 +00:00
Tom Stellard	bc4497b13c	AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765	2016-02-12 23:45:29 +00:00
Yunzhong Gao	0de36ec169	Disable the vzeroupper insertion pass on PS4. Differential Revision: http://reviews.llvm.org/D16837 llvm-svn: 260764	2016-02-12 23:37:57 +00:00
Krzysztof Parzyszek	7793ddb043	[Hexagon] Optimize stack slot spills Replace spills to memory with spills to registers, if possible. This applies mostly to predicate registers (both scalar and vector), since they are very limited in number. A spill of a predicate register may happen even if there is a general-purpose register available. In cases like this the stack spill/reload may be eliminated completely. This optimization will consider all stack objects, regardless of where they came from and try to match the live range of the stack slot with a dead range of a register from an appropriate register class. llvm-svn: 260758	2016-02-12 22:53:35 +00:00
Sanjay Patel	1617d5ab15	fix test to use FileCheck llvm-svn: 260751	2016-02-12 22:07:54 +00:00
Dan Gohman	a6771b37f8	[WebAssembly] Fix byval for empty types. llvm-svn: 260740	2016-02-12 21:30:18 +00:00
Dan Gohman	a187ab2aeb	[WebAssembly] Fix insertion of a BLOCK in a loop header that also ends a BLOCK. llvm-svn: 260737	2016-02-12 21:19:25 +00:00
Andrew Kaylor	d1188ddd33	[WinEH] Prevent EH state numbering from skipping nested cleanup pads that never return Differential Revision: http://reviews.llvm.org/D17208 llvm-svn: 260733	2016-02-12 21:10:16 +00:00
Krzysztof Parzyszek	996ad1fa00	[Hexagon] Replace expansion of spill pseudo-instructions in frame lowering Rewrite the code to handle all pseudo-instructions in a single pass. This temporarily reverts spill slot optimization that used general-purpose registers to hold values of spilled predicate registers. llvm-svn: 260696	2016-02-12 18:19:53 +00:00
Tom Stellard	46937ca4e7	[AMDGPU] Assembler: Swap operands of flat_store instructions to match AMD assembler Historically, AMD internal sp3 assembler has flat_store* addr, data format. To match existing code and to enable reuse, change LLVM definitions to match. Also update MC and CodeGen tests. Differential Revision: http://reviews.llvm.org/D16927 Patch by: Nikolay Haustov llvm-svn: 260694	2016-02-12 17:57:54 +00:00
Changpeng Fang	e07f1aa8fa	AMDGPU/SI: Annotate Loops with Constant Condition in SIAnnotateControlFlow pass. Summary: It is possible that the loop condition can be a boolean constant (infinite loop, for example). So we should handle constant condition in annotating a loop. This patch adds this functionality to support annotating constant condition. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D15093 llvm-svn: 260692	2016-02-12 17:11:04 +00:00
Krzysztof Parzyszek	7d5b4db7f9	[Hexagon] Eliminate pseudo instructions for circ/brev loads and stores We can generate the actual instructions from the intrinsics without the need for pseudo-instructions. Also, since the intrinsics have a side-effect in a form of a store, attempt to optimize away loads from the store location. llvm-svn: 260690	2016-02-12 17:01:51 +00:00
Geoff Berry	c25d3bd238	[AArch64] Reduce number of callee-save save/restores. Summary: Before this change, callee-save registers would be rounded up to even pairs of GPRs and FPRs. This change eliminates these extra padding load/stores, though it does keep the stack allocation the same size unless both the GPR and FPR sets have an odd size, in which case one full pair stack slot (16 bytes) is saved. This optimization cannot currently be done for MachO targets since they rely on a fast-path .debug_frame equivalent that can only encode callee-save registers as pairs. Reviewers: t.p.northover, rengolin, mcrosier, jmolloy Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17000 llvm-svn: 260689	2016-02-12 16:31:41 +00:00
Chad Rosier	cd2be7f084	[AArch64] Add support for Qualcomm Kryo CPU. Machine model description by Dave Estes <cestes@codeaurora.org>. llvm-svn: 260686	2016-02-12 15:51:51 +00:00
Jun Bum Lim	397eb7b0b3	[AArch64] Merge two adjacent str WZR into str XZR Summary: This change merges adjacent 32 bit zero stores into a 64 bit zero store. e.g., str wzr, [x0] str wzr, [x0, #4] becomes str xzr, [x0] Therefore, four adjacent 32 bit zero stores will be a single stp. e.g., str wzr, [x0] str wzr, [x0, #4] str wzr, [x0, #8] str wzr, [x0, #12] becomes stp xzr, xzr, [x0] Reviewers: mcrosier, jmolloy, gberry, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D16933 llvm-svn: 260682	2016-02-12 15:25:39 +00:00
Krzysztof Parzyszek	e59964377c	[Hexagon] Specify vector alignment in DataLayout string The DataLayout can calculate alignment of vectors based on the alignment of the element type and the number of elements. In fact, it is the product of these two values. The problem is that for vectors of N x i1, this will return the alignment of N bytes, since the alignment of i1 is 8 bits. The vector types of vNi1 should be aligned to N bits instead. Provide explicit alignment for HVX vectors to avoid such complications. llvm-svn: 260678	2016-02-12 14:47:38 +00:00
Matt Arsenault	296b849163	AMDGPU: Set flat_scratch from flat_scratch_init reg This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing. Also stops always initializing flat_scratch even when unused. In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access. llvm-svn: 260658	2016-02-12 06:31:30 +00:00
Matt Arsenault	24ee0785dd	AMDGPU: Set element_size in private resource descriptor Introduce a subtarget feature for this, and leave the default with the current behavior which assumes up to 16-byte loads/stores can be used. The field also seems to have the ability to be set to 2 bytes, but I'm not sure what that would be used for. llvm-svn: 260651	2016-02-12 02:40:47 +00:00
Nicolai Haehnle	b80a5811ce	AMDGPU: Quick fix for extreme slowness in spill-scavenge-offset.ll test Summary: Also, some cosmetic fixes. Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, llvm-commits Differential Revision: http://reviews.llvm.org/D17161 llvm-svn: 260625	2016-02-12 00:05:34 +00:00
NAKAMURA Takumi	e5fc9f3513	llvm/test/CodeGen/NVPTX/debug-file-loc.ll: Tweak expressions for dos path. llvm-svn: 260623	2016-02-11 23:59:43 +00:00
Tom Stellard	1397d49ef5	AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRs Summary: It's possible to have resource descriptors and samplers stored in VGPRs, either by a VMEM instruction or in the case of samplers, floating-point calculations. When this happens, we need to use v_readfirstlane to copy these values back to sgprs. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17102 llvm-svn: 260599	2016-02-11 21:45:07 +00:00
Sanjay Patel	e5df1dfb14	[SelectionDAG] change getConstant() to use the input SDLoc when building splat vectors The code change is simple enough: instead of attaching an anonymous SDLoc to splatted vector constants, use the scalar constant's existing SDLoc since that is what is passed into getConstant() as a param. But this changes instruction scheduling, so I'll explain why that happens. The motivation for this patch starts near: http://reviews.llvm.org/rL258833 ...x86's getZeroVector() could be similarly cleaned up and I thought it would be 'NFC'. But when I made that change locally, several x86 codegen tests wiggled. It turns out that the lack of SDLoc consistency in getConstant() changes the way ScheduleDAGRRList behaves. This is because the SDLoc contains 'IROrder' and some DAG scheduler algorithms use IROrder for tie-breaking. Differential Revision: http://reviews.llvm.org/D16972 llvm-svn: 260582	2016-02-11 20:21:24 +00:00
Kevin B. Smith	6a83350bee	[X86] New pass to change byte and word instructions to zero-extending versions. Differential Revision: http://reviews.llvm.org/D17032 llvm-svn: 260572	2016-02-11 19:43:04 +00:00
Artem Belevich	a8455f2e2b	[NVPTX] emit .file directives for files referenced by subprograms. .. so .loc directives referring to those files work correctly. Differential Revision: http://reviews.llvm.org/D17086 llvm-svn: 260557	2016-02-11 18:21:47 +00:00
Hans Wennborg	75fab7b0b0	Revert r260507: "[X86] Enable the LEA optimization pass by default." This caused PR26575. llvm-svn: 260538	2016-02-11 16:44:06 +00:00
Chad Rosier	00f9d23f8e	[AArch64] Improve load/store optimizer to handle LDUR + LDR. This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. This is a reapplication of r259812, which had an incorrect assert. The test_stur_str_no_assert() test is a reduced version of the issue hit in the AArch64 self-host. PR24465 llvm-svn: 260523	2016-02-11 14:25:08 +00:00
Andrey Turetskiy	193956e25f	[X86] Enable the LEA optimization pass by default. Differential Revision: http://reviews.llvm.org/D16877 llvm-svn: 260507	2016-02-11 10:51:26 +00:00
Simon Atanasyan	be18620432	[MC][ELF] Handle MIPS specific .sdata and .sbss directives MIPS specific .sdata and .sbss directives create corresponding sections with proper initialized ELF flags including ELF::SHF_MIPS_GPREL. Differential Revision: http://reviews.llvm.org/D17001 llvm-svn: 260498	2016-02-11 06:45:54 +00:00
Matt Arsenault	fcb345f172	AMDGPU: Fix constant bus use check with subregisters If the two operands to an instruction were both subregisters of the same super register, it would incorrectly think this counted as the same constant bus use. This fixes the verifier error in fmin_legacy.ll which was missing -verify-machineinstrs. llvm-svn: 260495	2016-02-11 06:15:39 +00:00
Matt Arsenault	9c47dd583a	AMDGPU: Remove some old intrinsic uses from tests llvm-svn: 260493	2016-02-11 06:02:01 +00:00
Nicolai Haehnle	d791bd07c7	AMDGPU: Release the scavenged offset register during VGPR spill Summary: This fixes a crash where subsequent spills would be unable to scavenge a register. In particular, it fixes a crash in piglit's spec@glsl-1.50@execution@geometry@max-input-components (the test still has a shader that fails to compile because of too many SGPR spills, but at least it doesn't crash any more). This is a candidate for the release branch. Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, arsenm Differential Revision: http://reviews.llvm.org/D16558 llvm-svn: 260427	2016-02-10 20:13:58 +00:00
Derek Schuff	27501e2065	[WebAssembly] Switch varargs calling convention to use a register Instead of passing varargs directly on the user stack, allocate a buffer in the caller's stack frame and pass a pointer to it. This simplifies the C ABI (e.g. non-C callers of C functions do not need to use C's user stack if they have their own mechanism) and allows further optimizations in the future (e.g. fewer functions may need to use the stack). Differential Revision: http://reviews.llvm.org/D17048 llvm-svn: 260421	2016-02-10 19:51:04 +00:00
Andrey Turetskiy	2396c38a8a	[X86] Fix stack alignment for MCU target, by Anton Nadolskiy. This patch fixes stack alignments for MCU (should be aligned to 4 bytes). Differential Revision: http://reviews.llvm.org/D15646 llvm-svn: 260375	2016-02-10 11:57:06 +00:00
Sanjay Patel	c7dde5f502	[x86] convert masked load of exactly one element to scalar load This is the load counterpart to the store optimization that was added in: http://reviews.llvm.org/rL260145 llvm-svn: 260325	2016-02-09 23:44:35 +00:00
Geoff Berry	173b14db7c	[AArch64] AArch64LoadStoreOptimizer: fix bug in pre-inc check iterator Summary: Fix case where a pre-inc/dec load/store would not be formed if the add/sub that forms the inc/dec part of the operation was the first instruction in the block being examined. Reviewers: mcrosier, jmolloy, t.p.northover, junbuml Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D16785 llvm-svn: 260275	2016-02-09 20:47:21 +00:00
Simon Pilgrim	7e671e06a2	[X86][AVX2] Fix SIGN_EXTEND vector handling on AVX2 targets. On AVX2 target we are poorly legalizing SIGN_EXTEND ops for which the input's legalized type doesn't have the same number of elements as the destination, resulting in an ANY_EXTEND followed by a SIGN_EXTEND_INREG. This patch uses the existing SIGN_EXTEND -> SIGN_EXTEND_VECTOR_INREG combine to extend the input to the size of the result and using SIGN_EXTEND_VECTOR_INREG instead. Differential Revision: http://reviews.llvm.org/D16994 llvm-svn: 260210	2016-02-09 08:19:19 +00:00
Simon Pilgrim	a207436b01	[X86][SSE1] Add MOVLHPS/MOVHLPS lowering and memory folding support As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents. This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines. Differential Revision: http://reviews.llvm.org/D16956 llvm-svn: 260168	2016-02-08 23:03:46 +00:00
Andrew Kaylor	1224488e0c	[regalloc][WinEH] Do not mark intervals as not spillable if they contain a regmask Differential Revision: http://reviews.llvm.org/D16831 llvm-svn: 260164	2016-02-08 22:52:51 +00:00
Dan Gohman	06b4958260	[WebAssembly] Update the br_if instructions' operand orders to match the spec. llvm-svn: 260152	2016-02-08 21:50:13 +00:00


14950 Commits