llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	6d4444f931	[InstSimplify] add tests for or-of-icmps; NFC llvm-svn: 288830	2016-12-06 17:49:10 +00:00
Simon Pilgrim	4a2979ce12	[X86][SSE] Add knownbits test demonstrating demandedelts not ignoring undef shuffle elements llvm-svn: 288825	2016-12-06 17:00:47 +00:00
Simon Pilgrim	0caaadfc2d	[X86][SSE] Added vector sext_in_reg combine tests llvm-svn: 288819	2016-12-06 15:57:26 +00:00
Simon Pilgrim	7c7b649639	[X86] Improve UMAX/UMIN knownbits test Test the sequential effect of each op llvm-svn: 288815	2016-12-06 15:17:50 +00:00
Simon Pilgrim	e633741c3a	[SLPVectorizer][X86] Tests to show missed buildvector sitofp/fptosi vectorizations e.g. buildvector(sitofp(i32), sitofp(i32), sitofp(i32), sitofp(i32)) --> sitofp(buildvector(i32, i32, i32, i32)) llvm-svn: 288807	2016-12-06 13:29:55 +00:00
Oliver Stannard	870b5cad45	[ARM] Better error message for invalid flag-preserving Thumb1 insts When we see a non flag-setting instruction for which only the flag-setting version is available in Thumb1, we should give a better error message than "invalid instruction". Differential Revision: https://reviews.llvm.org/D27414 llvm-svn: 288805	2016-12-06 12:59:08 +00:00
Ayman Musa	86c00b799f	[X86][AVX512] Detect repeated constant patterns in BUILD_VECTOR suitable for broadcasting. Check if a build_vector node includes a repeated constant pattern and replace it with a broadcast of that pattern. For example: "build_vector <0, 1, 2, 3, 0, 1, 2, 3>" would be replaced by "broadcast <0, 1, 2, 3>" Differential Revision: https://reviews.llvm.org/D26802 llvm-svn: 288804	2016-12-06 12:24:14 +00:00
Simon Pilgrim	ae63dd10f8	[X86] Add tests to show missed opportunities to calculate knownbits in SMAX/SMIN/UMAX/UMIN llvm-svn: 288801	2016-12-06 12:12:20 +00:00
Nemanja Ivanovic	15748f4921	[PowerPC] Improvements for BUILD_VECTOR Vol. 4 This is the final patch in the series of patches that improves BUILD_VECTOR handling on PowerPC. This adds a few peephole optimizations to remove redundant instructions. It also adds a large test case which encompasses a large set of code patterns that build vectors - this test case was the motivator for this series of patches. Differential Revision: https://reviews.llvm.org/D26066 llvm-svn: 288800	2016-12-06 11:47:14 +00:00
Florian Hahn	7582c669bd	[framelowering] Improve tracking of first CS pop instruction. Summary: This patch makes sure FirstCSPop and MBBI never point to DBG_VALUE instructions, which affected the code generated. Reviewers: mkuper, aprantl, MatzeB Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27343 llvm-svn: 288794	2016-12-06 10:24:55 +00:00
Craig Topper	b34eef7b41	[X86] Remove another weird scalar sqrt/rcp/rsqrt pattern. This pattern turned a vector sqrt/rcp/rsqrt operation of sse_load_f32/f64 into the scalar instruction for the operation and put undef into the upper bits. For correctness, the resulting code should still perform the sqrt/rcp/rsqrt on the upper bits after the load is extended since that's what the operation asked for. Particularly in the case where the upper bits are 0, in that case we need to calculate the sqrt/rcp/rsqrt of the zeroes and keep the result in the upper-bits. This implies we should be using the packed instruction still. The only test case for this pattern is one I just added so there was no coverage of this. llvm-svn: 288784	2016-12-06 08:08:12 +00:00
Craig Topper	26ce4267ef	[X86] Add test case demonstrating a case where a vector sqrt being passed (scalar_to_vector loadf64) uses a scalar sqrt instruction. This occurs due to a pattern that uses sse_load_f32/f64 with vector sqrt/rcp/rsqrt operations and turns them into scalar instructions. Perhaps for the case where the upper bits come from undef this is ok. I believe a (vzmovl load64) would do the same thing but those seem to become vzload instead and selectScalarSSELoad doesn't handle that today. In that case we should be performing the vector operation on the zeros in the upper bits which is not equivalent to using a scalar instruction. I will remove this pattern in a follow up patch. There appears to be no other test content for it. llvm-svn: 288783	2016-12-06 08:08:09 +00:00
Craig Topper	aa2c38378c	[X86] Regenerate a test using update_llc_test_checks.py llvm-svn: 288782	2016-12-06 08:08:07 +00:00
Craig Topper	683470bf1b	[X86] Remove bad pattern that caused 128-bit loads being used by scalar sqrt/rcp/rsqrt intrinsics to select the memory form of the corresponding instruction and violate the semantics of the intrinsic. The intrinsics are supposed to pass the upper bits straight through to their output register. This means we need to make sure we still perform the 128-bit load to get those upper bits to pass to give to the instruction since the memory form of the instruction only reads 32 or 64 bits. llvm-svn: 288781	2016-12-06 08:08:04 +00:00
Craig Topper	125939ff65	[X86] Add test case that shows a scalar sqrtsd intrinsic of a 128-bit vector load using the load form of the sqrtsd instruction which violates the intrinsic semantics. The sqrtsd instruction only loads 64-bits and writes bits 63:0 with the sqrt result. Bits 127:64 are preserved in the destination register. The semantics of the intrinsic indicate bits 127:64 should come from the intrinsic argument which in this case is a 128-bit load. So the generated code should have a 128-bit load and use a register form of sqrtsd. llvm-svn: 288780	2016-12-06 08:08:01 +00:00
Craig Topper	5fc7bc91f9	[X86] Correct pattern for VSQRTSSr_Int, VSQRTSDr_Int, VRCPSSr_Int, and VRSQRTSSr_Int to not have an IMPLICIT_DEF on the first input. The semantics of the intrinsic are clear and not undefined. The intrinsic takes one argument, the lower bits are affected by the operation and the upper bits should be passed through. The instruction itself takes two operands, the high bits of the first operand are passed through and the low bits of the second operand are modified by the operation. To match this to the intrinsic we should pass the single intrinsic input to both operands. I had to remove the stack folding test for these instructions since they depended on the incorrect behavior. The same register is now used for both inputs so the load can't be folded. llvm-svn: 288779	2016-12-06 08:07:58 +00:00
Chris Bieneman	8b058aec1d	[ObjectYAML] First bit of support for encoding DWARF in MachO This patch adds the starting support for encoding data from the MachO __DWARF segment. The first section supported is the __debug_str section because it is the simplest. llvm-svn: 288774	2016-12-06 06:00:49 +00:00
Craig Topper	6413f8a8f2	[X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead Summary: This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently. I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel. I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available. Reviewers: spatel, delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27401 llvm-svn: 288771	2016-12-06 04:58:39 +00:00
Matt Arsenault	ad55ee5869	AMDGPU: Don't require structured CFG The structured CFG is just an aid to inserting exec mask modification instructions, once that is done we don't really need it anymore. We also do not analyze blocks with terminators that modify exec, so this should only be impacting true branches. llvm-svn: 288744	2016-12-06 01:02:51 +00:00
Weiming Zhao	b38cfced8d	Summary: Currently there is no way to disable deprecated warning from asm like this clang -target arm deprecated-asm.s -c deprecated-asm.s:30:9: warning: use of SP or PC in the list is deprecated stmia r4!, {r12-r14} We have to have an option that can disable it. Patched by Yin Ma! Reviewers: joey, echristo, weimingz Subscribers: llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D27219 llvm-svn: 288734	2016-12-05 23:55:13 +00:00
Tim Northover	800638fd67	GlobalISel: avoid looking too closely at PHIs when we bail. The function used to finish off PHIs by adding the relevant basic blocks can fail if we're aborting and still don't actually have the needed MachineBasicBlocks. So avoid trying in that case. llvm-svn: 288727	2016-12-05 23:10:19 +00:00
Tim Northover	b566848d68	GlobalISel: place constants correctly in the entry block. When the entry block was empty after arg lowering, we were always placing constants at the end. This is probably harmless while translating the same block, but horribly wrong once its terminator has been translated. So switch to inserting at the beginning. llvm-svn: 288720	2016-12-05 22:40:13 +00:00
Tim Northover	c0bd197c6b	GlobalISel: handle pointer arguments that get assigned to the stack. llvm-svn: 288717	2016-12-05 22:20:32 +00:00
Tim Northover	cc35f90492	GlobalISel: translate constants larger than 64 bits. llvm-svn: 288713	2016-12-05 21:54:17 +00:00
Tim Northover	9267ac5d47	GlobalISel: make G_CONSTANT take a ConstantInt rather than int64_t. This makes it more similar to the floating-point constant, and also allows for larger constants to be translated later. There's no real functional change in this patch though, just syntax updates. llvm-svn: 288712	2016-12-05 21:47:07 +00:00
Tim Northover	6ad7b9f837	GlobalISel: improve translation fallback for constants. Returning 0 (NoReg) from getOrCreateVReg leads to unexpected situations later in the translation. It's better to return a valid (if undefined) register and let the rest of the instruction carry on as planned. llvm-svn: 288709	2016-12-05 21:40:33 +00:00
Tim Northover	d1fd383b28	GlobalISel: handle 1-element aggregates during ABI lowering. llvm-svn: 288706	2016-12-05 21:25:33 +00:00
Keno Fischer	92f377bd74	[LAA] Prevent invalid IR for loop-invariant bound in loop body Summary: If LAA expands a bound that is loop invariant, but not hoisted out of the loop body, it used to use that value anyway, causing a non-domination error, because the memcheck block is of course not dominated by the scalar loop body. Detect this situation and expand the SCEV expression instead. Fixes PR31251 Reviewers: anemet Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D27397 llvm-svn: 288705	2016-12-05 21:25:03 +00:00
Michael Kuperstein	e3036abcf9	[X86] Fix non-intrinsic roundss/roundsd to not read the destination register This changes the scalar non-intrinsic non-avx roundss/sd instruction definitions not to read their destination register - allowing partial dependency breaking. This fixes PR31143. Differential Revision: https://reviews.llvm.org/D27323 llvm-svn: 288703	2016-12-05 20:57:37 +00:00
Matt Arsenault	bf6bdac1ad	AMDGPU: Assembler support for exp compr is not currently parsed (or printed) correctly, but that should probably be fixed along with intrinsic changes. llvm-svn: 288698	2016-12-05 20:42:41 +00:00
Matt Arsenault	8a63cb9044	AMDGPU: Change how exp is printed This is an improvement over a long list of unreadable numbers. A follow up patch will try to match how sc formats these. llvm-svn: 288697	2016-12-05 20:31:49 +00:00
Matt Arsenault	7bee6ac798	AMDGPU: Refactor exp instructions Structure the definitions a bit more like the other classes. The main change here is to split EXP with the done bit set to a separate opcode, so we can set mayLoad = 1 so that it won't be reordered before the other exp stores, since this has the special constraint that if the done bit is set then this should be the last exp in the shader. Previously all exp instructions were inferred to have unmodeled side effects. llvm-svn: 288695	2016-12-05 20:23:10 +00:00
Adrian Prantl	941fa7588b	[DIExpression] Introduce a dedicated DW_OP_LLVM_fragment operation so we can stop using DW_OP_bit_piece with the wrong semantics. The entire back story can be found here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's offset field to mean the offset into the source variable rather than the offset into the location at the top of the DWARF expression stack. In order to be able to fix this in a subsequent patch, this patch introduces a dedicated DW_OP_LLVM_fragment operation with the semantics that we used to apply to DW_OP_bit_piece, which is what we actually need while inside of LLVM. This patch is complete with a bitcode upgrade for expressions using the old format. It does not yet fix the DWARF backend to use DW_OP_bit_piece correctly. Implementation note: We discussed several options for implementing this, including reserving a dedicated field in DIExpression for the fragment size and offset, but using a custom operator at the end of the expression works just fine and is more efficient because we then only pay for it when we need it. Differential Revision: https://reviews.llvm.org/D27361 rdar://problem/29335809 llvm-svn: 288683	2016-12-05 18:04:47 +00:00
Sanjay Patel	1f158d6955	[TargetLowering] add special-case for demanded bits analysis of 'not' We treat bitwise 'not' as a special operation and try not to reduce its all-ones mask. Presumably, this is because a 'not' may be cheaper than a generic 'xor' or it may get folded into another logic op if the target has those. However, if we can remove a logic instruction by changing the xor's constant mask value, that should always be a win. Note that the IR version of SimplifyDemandedBits() does not treat 'not' as a special-case currently (although that's marked with a FIXME). So if you run this IR through -instcombine, you should get the same end result. I'm hoping to add a different backend transform that will expose this problem though, so I need to solve this first. Differential Revision: https://reviews.llvm.org/D27356 llvm-svn: 288676	2016-12-05 15:58:21 +00:00
Sanjay Patel	f807f6a05f	[x86] fold fand (fxor X, -1) Y --> fandn X, Y I noticed this gap in the scalar FP-logic matching with: D26712 and: rL287171 Differential Revision: https://reviews.llvm.org/D27385 llvm-svn: 288675	2016-12-05 15:45:27 +00:00
Nirav Dave	d6642c1163	[PPC] Slightly Improve Assembly Parsing errors and add EOL comment parsing tests. NFC intended. llvm-svn: 288667	2016-12-05 14:11:03 +00:00
Simon Dardis	8fe36cd77c	[mips][ias] N32/N64 must not sort the relocation table. Doing so changes the evaluation order for relocation composition. Patch By: Daniel Sanders Reviewers: vkalintiris, atanasyan Differential Revision: https://reviews.llvm.org/D26401 llvm-svn: 288666	2016-12-05 12:55:19 +00:00
Simon Pilgrim	b08c98f125	[X86][SSE] Add support for combining target shuffles to UNPCKL/UNPCKH. llvm-svn: 288663	2016-12-05 11:25:13 +00:00
Sam Kolton	83102d99ce	[AMDGPU] Disassembler: fix s_buffer_store_dword instructions Summary: s_buffer_store_dword instructions sdata operand was called sdst in encoding. This caused disassembler to fail. Reviewers: tstellarAMD, vpykhtin, artem.tamazov Subscribers: arsenm, nhaehnle, rampitec Differential Revision: https://reviews.llvm.org/D27100 llvm-svn: 288657	2016-12-05 09:58:51 +00:00
Craig Topper	db8467ae26	[AVX-512] Teach fast isel to handle 512-bit vector bitcasts. llvm-svn: 288641	2016-12-05 05:50:51 +00:00
Craig Topper	7ef6ea324a	[AVX-512] Teach fast isel to use masked compare and movss for handling scalar cmp and select sequence when AVX-512 is enabled. This matches the behavior of normal isel. llvm-svn: 288636	2016-12-05 04:51:31 +00:00
Craig Topper	227d4279a8	[AVX-512] Add avx512f command lines to fast isel SSE select test. Currently the fast isel code emits an avx1 instruction sequence even with avx512. This is different than normal isel. A follow up commit will fix this. llvm-svn: 288635	2016-12-05 04:51:28 +00:00
Simon Pilgrim	6133fc3aa2	[X86][XOP] Add target shuffle tests showing missing UNPCKL combine. llvm-svn: 288628	2016-12-04 22:55:57 +00:00
Simon Pilgrim	38d245197e	[X86][AVX512] Add target shuffle tests showing missing UNPCK combines. llvm-svn: 288627	2016-12-04 22:54:21 +00:00
Dylan McKay	6e8c2b1b65	[AVR] Remove 'XFAIL' from a CodeGen test This seems to be fixed as of r288052. llvm-svn: 288618	2016-12-04 09:50:42 +00:00
Rafael Espindola	7e00c8ca45	Prefix path when displaying thin archives. Patch by Mark Santaniello. llvm-svn: 288615	2016-12-04 06:52:30 +00:00
Matt Arsenault	92fede361f	DAG: Fold out out of bounds insert_vector_elt getNode already prevents formation of out of bounds constant extract_vector_elts. Do the same for insert_vector_elt. llvm-svn: 288603	2016-12-03 23:03:26 +00:00
Craig Topper	9d16bfa0f5	[AVX-512] Add many of the VPERM instructions to the load folding table. Move VPERMPDZri to the correct table. llvm-svn: 288591	2016-12-03 19:37:39 +00:00
Craig Topper	c210827b53	[AVX-512] Add EVEX VPMADDUBSW and VPMADDWD to the load folding tables. llvm-svn: 288587	2016-12-03 17:19:15 +00:00
Sanjay Patel	b7f8cb698c	[InstCombine] change select type to eliminate bitcasts This solves a secondary problem seen in PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137#c6 This is similar to the bitwise logic op fold added with: https://reviews.llvm.org/rL287707 And like that patch, I'm artificially restricting the transform from vector <-> scalar types until we're sure that the backend can handle that. llvm-svn: 288584	2016-12-03 15:25:16 +00:00
Craig Topper	8e7498976a	[X86] Fix VEX encoded VPMADDUBSW to not be marked commutable. This was accidentally broken in r285515 when we started lowering the intrinsic to an ISD node. Should fix PR31241. llvm-svn: 288578	2016-12-03 05:35:44 +00:00
Craig Topper	da73a09fcd	[X86] Add test cases demonstrating where we incorrectly commute VEX VPMADDUBSW due to a bug introduced in r285515. I believe this is the cause of PR31241. llvm-svn: 288577	2016-12-03 05:35:38 +00:00
Haicheng Wu	584042981d	[TTI/CostModel] Correct the way getGEPCost() calls isLegalAddressingMode() Fix a bug when we call isLegalAddressingMode() from getGEPCost(). Differential Revision: https://reviews.llvm.org/D27357 llvm-svn: 288569	2016-12-03 01:57:24 +00:00
Matthias Braun	a39c2ca44e	testcase only works in a debug build llvm-svn: 288567	2016-12-03 01:42:32 +00:00
Matthias Braun	1fbb0f6dd9	AArch64CollectLOH: Rewrite as block-local analysis. Previously this pass was using up to 5% compile time in some cases which is a bit much for what it is doing. The pass featured a full blown data-flow analysis which in the default configuration was restricted to a single block. This rewrites the pass under the assumption that we only ever work on a single block. This is done in a single pass maintaining a state machine per general purpose register to catch LOH patterns. Differential Revision: https://reviews.llvm.org/D27329 llvm-svn: 288561	2016-12-03 00:52:56 +00:00
Guozhi Wei	835de1f3ab	[ppc] Correctly compute the cost of loading 32/64 bit memory into VSR VSX has instructions lxsiwax/lxsdx that can load a 32/64 bit value into a VSX register cheaply. This patch makes that known to the memory cost model, so the vectorization of the test case in pr30990 is beneficial. Differential Revision: https://reviews.llvm.org/D26713 llvm-svn: 288560	2016-12-03 00:41:43 +00:00
Jacques Pienaar	3bec3ef6cd	[lanai] Custom lowering of SHL_PARTS Summary: Implement custom lowering of SHL_PARTS to enable lowering of left shift with larger than 32-bit shifts. Reviewers: eliben, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27232 llvm-svn: 288541	2016-12-02 22:01:28 +00:00
Rong Xu	a5b5745a62	[PGO] Fix PGO use ICE when there are unreachable BBs For -O0 there might be unreachable BBs, which breaks the assumption that all the BBs have an auxiliary data structure. In this patch, we add another interface called findBBInfo() so that a nullptr can be returned for the unreachable BBs (and the callers can ignore those BBs). This fixes the bug reported https://llvm.org/bugs/show_bug.cgi?id=31209 Differential Revision: https://reviews.llvm.org/D27280 llvm-svn: 288528	2016-12-02 19:10:29 +00:00
Ulrich Weigand	612d24badf	[SystemZ] Support remaining atomic instructions Add assembler support for all atomic instructions that weren't already supported. Some of those could be used to implement codegen for 128-bit atomic operations, but this isn't done here yet. llvm-svn: 288526	2016-12-02 18:24:16 +00:00
Ulrich Weigand	1c5a5c42de	[SystemZ] Support floating-point control register instructions Add assembler support for instructions manipulating the FPC. Also add codegen support via the GCC compatibility builtins: __builtin_s390_sfpc __builtin_s390_efpc llvm-svn: 288525	2016-12-02 18:21:53 +00:00
Matt Arsenault	d4da0edd98	AMDGPU: Implement isCheapAddrSpaceCast llvm-svn: 288523	2016-12-02 18:12:53 +00:00
Sanjay Patel	a5dbdf342b	[x86] add common check prefix to reduce duplication; NFC llvm-svn: 288522	2016-12-02 17:58:26 +00:00
Adam Nemet	4c207a6a1f	[LTOs] Allow generation of hotness information The flag is passed by the clang driver. Differential Revision: https://reviews.llvm.org/D27331 llvm-svn: 288519	2016-12-02 17:53:56 +00:00
Adam Nemet	4df50e1fb0	Make LTO opt-remarks tests matching stricter This ensures that we don't generate the hotness attribute by default. llvm-svn: 288518	2016-12-02 17:53:49 +00:00
Sanjay Patel	c731187732	fix check-label llvm-svn: 288517	2016-12-02 17:50:14 +00:00
Sanjay Patel	91d1ed5ee6	[x86] add tests to show missing demanded bits analysis; NFC llvm-svn: 288515	2016-12-02 17:48:48 +00:00
Simon Pilgrim	b2116d9b94	[InstCombine] Add vector urem tests Demonstrate missed opportunity for urem -> and combine for powerof2 or zero non-uniform constant dividers llvm-svn: 288510	2016-12-02 17:16:21 +00:00
Simon Pilgrim	43bc269ffa	[InstCombine] Regenerate vector srem tests llvm-svn: 288509	2016-12-02 17:12:56 +00:00
Renato Golin	5b8e7ecdb3	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements." This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's builtins (twice: once in r288412 and once in r288497). We should investigate this offline. llvm-svn: 288508	2016-12-02 16:56:26 +00:00
Nicolai Haehnle	33ca182c91	[DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default Summary: When X = 0 and Y = inf, the original code produces inf, but the transformed code produces nan. So this transform (and its relatives) should only be used when the no-infs-fp-math flag is explicitly enabled. Also disable the transform using fmad (intermediate rounding) when unsafe-math is not enabled, since it can reduce the precision of the result; consider this example with binary floating point numbers with two bits of mantissa: x = 1.01 y = 111 x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step) x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero) The example relies on rounding towards zero at least in the second step. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578 Reviewers: RKSimon, tstellarAMD, spatel, arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26602 llvm-svn: 288506	2016-12-02 16:06:18 +00:00
Simon Pilgrim	3a19863f1c	[X86][SSE] Renamed shuffle combine test. We're trying to combine to vpunpckhbw not vpunpckhwd llvm-svn: 288501	2016-12-02 14:43:39 +00:00
Simon Pilgrim	cbf5f97018	[X86][SSE] Add support for extracting constant bit data from broadcasted constants llvm-svn: 288499	2016-12-02 13:16:08 +00:00
Alexey Bataev	e8e94a7176	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288497	2016-12-02 12:20:22 +00:00
Simon Pilgrim	c70d3796fb	[SLPVectorizer][X86] Add tests for vectorization of buildvector of scalar fp-ops (PR6246) llvm-svn: 288492	2016-12-02 10:54:46 +00:00
Craig Topper	4961fa9bba	[AVX-512] Add EVEX vpshuflw/vpshufhw/vpshufd instructions to load folding tables. llvm-svn: 288484	2016-12-02 07:57:11 +00:00
Craig Topper	17ddb521ef	[AVX-512] Add EVEX PSHUFB instructions to load folding tables. llvm-svn: 288482	2016-12-02 07:06:30 +00:00
Craig Topper	f7866fad54	[AVX-512] Add masked VINSERTF/VINSERTI instructions to load folding tables. llvm-svn: 288481	2016-12-02 06:24:38 +00:00
Paul Robinson	dad4907bc1	[DWARF] Put linkage-name on abstract origin even when there's a declaration. In r266692, we made it possible to emit linkage names for just inlined functions, putting the attribute on the abstract origin. Make sure we don't think the linkage-name was already emitted on a declaration. Differential Revision: http://reviews.llvm.org/D27320 llvm-svn: 288450	2016-12-02 01:55:17 +00:00
Teresa Johnson	185b4ab6d4	[ThinLTO] Stop importing constant global vars as copies in the backend Summary: We were doing an optimization in the ThinLTO backends of importing constant unnamed_addr globals unconditionally as a local copy (regardless of whether the thin link decided to import them). This should be done in the thin link instead, so that resulting exported references are marked and promoted appropriately, but will need a summary enhancement to mark these variables as constant unnamed_addr. The function import logic during the thin link was trying to handle this proactively, by conservatively marking all values referenced in the initializer lists of exported global variables as also exported. However, this only handled values referenced directly from the initializer list of an exported global variable. If the value is itself a constant unnamed_addr variable, we could end up exporting its references as well. This caused multiple issues. The first is that the transitively exported references weren't promoted. Secondly, some could not be promoted/renamed (e.g. they had a section or other constraint). recursively, instead of just adding the first level of initializer list references to the ExportList directly. Remove this optimization and the associated handling in the function import backend. SPEC measurements indicate we weren't getting much from it in any case. Fixes PR31052. Reviewers: mehdi_amini Subscribers: krasin, llvm-commits Differential Revision: https://reviews.llvm.org/D26880 llvm-svn: 288446	2016-12-02 01:02:30 +00:00
Matt Arsenault	c47701c0e9	AMDGPU: Use wider scalar spills for SGPR spilling Since the spill is for the whole wave, these don't have the swizzling problems that vector stores do and a single 4-byte allocation is enough to spill a 64 element register. This should reduce the number of spill instructions and put all the spills for a register in the same cacheline. This should save allocated private size, but for now it doesn't. The extra slots are allocated for each component, but never used because the frame layout is essentially finalized before frame indices are replaced. For always using the scalar store path, this should probably be moved into processFunctionBeforeFrameFinalized. llvm-svn: 288445	2016-12-02 00:54:45 +00:00
Wolfgang Pieb	42f92a7225	When instructions are hoisted out of loops by MachineLICM, remove their debug loc. This prevents erratic stepping behavior as well as incorrect source attribution for sample profiling. Reviewers: dblakie Subscribers: llvm-commit Differential Revision: https://reviews.llvm.org/D27290 llvm-svn: 288442	2016-12-02 00:37:57 +00:00
Geoff Berry	7ffce7be0c	[AArch64] Fold more spilled/refilled COPYs. Summary: Make AArch64InstrInfo::foldMemoryOperandImpl more general by folding all full COPYs between register classes of the same size that are either spilled or refilled. Reviewers: MatzeB, qcolombet Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D27271 llvm-svn: 288439	2016-12-01 23:43:55 +00:00
Peter Collingbourne	85c2184a8e	llvm-modextract: Call keep() on the output stream before exiting. llvm-svn: 288435	2016-12-01 23:13:11 +00:00
Oleg Ranevskyy	e2ae41519f	[ARM] Fix for 64-bit CAS expansion on ARM32 with -O0 Summary: This patch fixes comparison of 64-bit atomic with its expected value in CMP_SWAP_64 expansion. Currently, the low words are compared with CMP, while the high words are compared with SBC. SBC expects the carry flag to be set if CMP detects a difference. CMP might leave the carry unset for unequal arguments though if the first one is >= than the second. This might cause the comparison logic to detect false equality. Example of the broken C++ code: ``` std::atomic<long long> at(2); long long ll = 1; std::atomic_compare_exchange_strong(&at, &ll, 3); ``` Even though the atomic `at` and the expected value `ll` are not equal and `atomic_compare_exchange_strong` returns `false`, `at` is changed to 3. The patch replaces SBC with CMPEQ. Reviewers: t.p.northover Subscribers: aemerson, rengolin, llvm-commits, asl Differential Revision: https://reviews.llvm.org/D27315 llvm-svn: 288433	2016-12-01 22:58:35 +00:00
Artem Belevich	704395a25a	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements." This reverts r288412 which causes severe compile-time regression. llvm-svn: 288431	2016-12-01 22:52:15 +00:00
Matthias Braun	709a4cc238	RegisterCoalescer: Only coalesce complete reserved registers. The coalescer eliminates copies from reserved registers of the form: %vregX = COPY %rY in the case where %rY is a reserved register. However this turns out to be invalid if only some of the subregisters are reserved (see also https://reviews.llvm.org/D26648). Differential Revision: https://reviews.llvm.org/D26687 llvm-svn: 288428	2016-12-01 22:39:51 +00:00
Chih-Hung Hsieh	76b913c470	[SelectionDAG] getRawSubclassData should not return HasDebugValue. This change fixes a regression in r279537 and makes getRawSubclassData behave like r279536. Without this change, the fp128-g.ll test case will have an infinite loop involving SoftenFloatRes_LOAD. Differential Revision: http://reviews.llvm.org/D26942 llvm-svn: 288420	2016-12-01 21:56:33 +00:00
Tim Northover	5bb87b6769	AArch64: fix 128-bit cmpxchg at -O0 (again, again). This time the issue is fortunately just a simple mistake rather than a horrible design spectre. I thought SUBS/SBCS provided sufficient NZCV flags for comparing two 64-bit values, but they don't. The fix is slightly clunkier in AArch64 because we can't use conditional execution to emit a pair of CMPs. Traditionally an "icmp ne i128" would map to an EOR/EOR/ORR/CBNZ, but that uses more registers so it's easier to go with a CSET/CINC/CBNZ combination. Slightly less efficient, but this is -O0 anyway. Thanks to Anton Korobeynikov for pointing out the issue. llvm-svn: 288418	2016-12-01 21:31:59 +00:00
Philip Reames	89e92d21b4	[PR29121] Don't fold if it would produce atomic vector loads or stores The instcombine code which folds loads and stores into their use types can trip up if the use is a bitcast to a type which we can't directly load or store in the IR. In principle, such types shouldn't exist, but in practice they do today. This is a workaround to avoid a bug while we work towards the long term goal. Differential Revision: https://reviews.llvm.org/D24365 llvm-svn: 288415	2016-12-01 20:17:06 +00:00
Alexey Bataev	2c01af5904	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288412	2016-12-01 20:06:53 +00:00
Vedant Kumar	47de8391c0	[tablegen] Delete duplicates from a vector without skipping elements Tablegen's -gen-instr-info pass has a bug in its emitEnums() routine. The function intends for values in a vector to be deduplicated, but it accidentally skips over elements after performing a deletion. I think there are smarter ways of doing this deduplication, but we can do that in a follow-up commit if there's interest. See the thread: [PATCH] TableGen InstrMapping Bug fix. Patch by Tyler Kenney! llvm-svn: 288408	2016-12-01 19:38:50 +00:00
Kevin Enderby	5997c9480b	Fix a bug with llvm-size and the -m option with multiple files not printing the file names. llvm-svn: 288402	2016-12-01 19:12:55 +00:00
Alexey Bataev	62af7252f1	[SLP] Fixed cost model for horizontal reduction. Currently when cost of scalar operations is evaluated the vector type is used for scalar operations. Patch fixes this issue and fixes evaluation of the vector operations cost. Several test showed that vector cost model is too optimistic. It allowed vectorization of 8 or less add/fadd operations, though scalar code is faster. Actually, only for 16 or more operations vector code provides better performance. Differential Revision: https://reviews.llvm.org/D26277 llvm-svn: 288398	2016-12-01 18:42:42 +00:00
Weiming Zhao	cf26d56390	[AsmParser] Diagnose empty symbol for .set directive Summary: Diagnose empty symbol to avoid hitting assertion in MCContext::getOrCreateSymbol Reviewers: eli.friedman, rengolin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26728 llvm-svn: 288390	2016-12-01 18:00:36 +00:00
Adam Nemet	4ddb8c01b1	[GVN, OptDiag] Print the interesting instructions involved in missed load-elimination [recommitting after the fix in r288307] This includes the intervening store and the load/store that we're trying to forward from in the optimization remark for the missed load elimination. This is hooked up under a new mode in ORE that allows for compile-time budget for a bit more analysis to print more insightful messages. This mode is currently enabled for -fsave-optimization-record (-Rpass is trickier since it is controlled in the front-end). With this we can now print the red remark in http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446 Differential Revision: https://reviews.llvm.org/D26490 llvm-svn: 288381	2016-12-01 17:34:50 +00:00
Adam Nemet	8b5fba8081	[GVN, OptDiag] Include the value that is forwarded in load elimination [recommitting after the fix in r288307] This requires some changes to the opt-diag API. Hal and I have discussed this at the Dev Meeting and came up with a streaming delimiter (setExtraArgs) to solve this. Arguments after this delimiter are only included in the optimization records and not in the remarks printed in the compiler output. (Note, how in the test the content of the YAML file changes but the remarks on the compiler output don't.) This implements the green GVN message with a bug fix at line http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446 The fix is that now we properly include the constant value in the message: "load of type i32 eliminated in favor of 7" Differential Revision: https://reviews.llvm.org/D26489 llvm-svn: 288380	2016-12-01 17:34:44 +00:00
Alexey Bataev	fc617690ab	[SLP] Additional tests with the cost of vector operations. llvm-svn: 288377	2016-12-01 17:26:54 +00:00
Alexey Bataev	e59a8351d0	Revert "[SLP] Additional tests with the cost of vector operations." This reverts commit a61718435fc4118c82f8aa6133fd81f803789c1e. llvm-svn: 288371	2016-12-01 16:45:04 +00:00
Adam Nemet	4d2a6e5998	[GVN] Basic optimization remark support [recommitting after the fix in r288307] Follow-on patches will add more interesting cases. The goal of this patch-set is to get the GVN messages printed in opt-viewer from Dhrystone as was presented in my Dev Meeting talk. This is the optimization view for the function (the last remark in the function has a bug which is fixed in this series): http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L430 Differential Revision: https://reviews.llvm.org/D26488 llvm-svn: 288370	2016-12-01 16:40:32 +00:00
Alexey Bataev	2ff768475d	[SLP] Additional tests with the cost of vector operations. llvm-svn: 288369	2016-12-01 16:11:48 +00:00
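
The "[tablegen] Delete duplicates from a vector without skipping elements" entry above describes a common pitfall: after deleting an element during a forward scan, the next element slides into the current slot and is skipped if the index still advances. The following is a minimal Python sketch of that class of bug and its fix. It is not the actual TableGen code: the function names are hypothetical, and it assumes that duplicates are adjacent, which the commit message does not state.

```python
def dedup_adjacent_buggy(values):
    """Drop adjacent duplicates, but advance the index even after a deletion."""
    out = list(values)
    i = 1
    while i < len(out):
        if out[i] == out[i - 1]:
            del out[i]
        i += 1  # bug: the element that slid into slot i is never checked
    return out


def dedup_adjacent_fixed(values):
    """Drop adjacent duplicates, re-checking the slot after each deletion."""
    out = list(values)
    i = 1
    while i < len(out):
        if out[i] == out[i - 1]:
            del out[i]  # stay at i: a new element now occupies this slot
        else:
            i += 1      # advance only when nothing was removed
    return out


if __name__ == "__main__":
    data = [1, 1, 1, 2]
    print(dedup_adjacent_buggy(data))  # a duplicate survives
    print(dedup_adjacent_fixed(data))  # all adjacent duplicates removed
```

With a run of three equal values, the buggy variant removes one duplicate and steps past the next, while the fixed variant advances only when no deletion happened.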


41223 Commits