llvm-project

Commit Graph

Author	SHA1	Message	Date
Pirama Arumuga Nainar	bc26482717	[ARM] Fix computeKnownBits for ARMISD::CMOV Summary: The true and false operands for the CMOV are operands 0 and 1. ARMISelLowering.cpp::computeKnownBits was looking at operands 1 and 2 instead. This can cause CMOV instructions to be incorrectly folded into BFI if value set by the CMOV is another CMOV, whose known bits are computed incorrectly. This patch fixes the issue and adds a test case. Reviewers: kristof.beyls, jmolloy Subscribers: llvm-commits, aemerson, srhines, rengolin Differential Revision: https://reviews.llvm.org/D31265 llvm-svn: 298624	2017-03-23 16:47:47 +00:00
Adrian McCarthy	997a15c3c3	Re-land: Make NativeExeSymbol a concrete subclass of NativeRawSymbol [PDB] The new test should pass on all platforms now that llvm-pdbdump has the `-color-output` option. This moves exe symbol-specific method implementations out of NativeRawSymbol into a concrete subclass. Also adds implementations for hasCTypes and hasPrivateSymbols and a simple test to ensure the native reader can access the summary information for the executable from the PDB. Original Differential Revision: https://reviews.llvm.org/D31059 llvm-svn: 298623	2017-03-23 16:45:20 +00:00
Matthew Simpson	4e7b71bc86	[LV] Vectorize GEPs This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions. With this patch, we will vectorize all GEPs that haven't already been marked for scalarization. The patch refines collectLoopScalars to more exactly identify the scalar GEPs. The function now more closely resembles collectLoopUniforms. And the patch moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. The vector GEPs needed for gather and scatter operations will have already been generated before vectoring the memory accesses. Differential Revision: https://reviews.llvm.org/D30710 llvm-svn: 298620	2017-03-23 16:29:58 +00:00
Simon Pilgrim	1c048ab6ba	[X86][SSE] Extract elements from narrower shuffle masks. Add support for widening narrow shuffle masks so we can directly extract from the relevant input vector of the shuffle. llvm-svn: 298616	2017-03-23 16:09:34 +00:00
Matthew Simpson	1fb4064531	[LV] Delete unneeded scalar GEP creation code The code for generating scalar base pointers in vectorizeMemoryInstruction is not needed. We currently scalarize all GEPs and maintain the scalarized values in VectorLoopValueMap. The GEP cloning in this unneeded code is the same as that in scalarizeInstruction. The test cases that changed as a result of this patch changed because we were able to reuse the scalarized GEP that we previously generated instead of cloning a new one. Differential Revision: https://reviews.llvm.org/D30587 llvm-svn: 298615	2017-03-23 16:07:21 +00:00
Tim Shen	ce26a45f7c	[PPC] Add generated tests for all atomic operations Summary: Add tests for all atomic operations for powerpc64le, so that all changes can be easily examined. Reviewers: kbarton, hfinkel, echristo Subscribers: mehdi_amini, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D31285 llvm-svn: 298614	2017-03-23 16:02:47 +00:00
Sanjay Patel	8db22f9032	[x86] add memcmp tests, remove run Add tests for vector lengths that could be handled without a libcall. There's an explicit test for 'nobuiltin', so there's not much value in a separate run that checks that same behavior over and over again. llvm-svn: 298611	2017-03-23 15:38:22 +00:00
Igor Breger	a8ba572dcf	[GlobalISel][X86] Support G_STORE/G_LOAD operation Summary: 1. Support pointer type as function argumnet and return value 2. G_STORE/G_LOAD - set legal action for i8/i16/i32/i64/f32/f64/vec128 3. RegisterBank - support typeless operations like G_STORE/G_LOAD, for scalar use GPR bank. 4. Support instruction selection for G_LOAD/G_STORE Reviewers: zvi, rovka, ab, qcolombet Reviewed By: rovka Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank Differential Revision: https://reviews.llvm.org/D30973 llvm-svn: 298609	2017-03-23 15:25:57 +00:00
Nirav Dave	e9ca32ae52	[SDAG] Fix zeroExtend assertion error Move CombineTo preventing deleted node from being returned in visitZERO_EXTEND. Fixes PR32284. Reviewers: RKSimon, bogner Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31254 llvm-svn: 298604	2017-03-23 15:01:50 +00:00
Dehao Chen	53a0c082d2	Do not set branch weight if the branch weight annotation is present. Summary: ThinLTO will annotate the CFG twice. If the branch weight is set by the first annotation, we should not set the branch weight again in the second annotation because the first annotation is more accurate as there is less optimization that could affect debug info accuracy. Reviewers: tejohnson, davidxl Reviewed By: tejohnson Subscribers: mehdi_amini, aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D31228 llvm-svn: 298602	2017-03-23 14:43:10 +00:00
Strahinja Petrovic	cac14b5334	[Mips] Fix for decoding DINS instruction - disassembler This patch fixes decoding of size and position for DINSM and DINSU instructions. Differential Revision: https://reviews.llvm.org/D31072 llvm-svn: 298593	2017-03-23 13:19:04 +00:00
Simon Pilgrim	d341290b5c	[X86][SSE] Add computeNumSignBits test for sitofp of (extended) i64 extracted element llvm-svn: 298592	2017-03-23 13:18:09 +00:00
Michael Zuckerman	85436ece89	[X86][TD][vpmovm2 ] New TD pattern for the vpmovm2 instruction Up until now, vpmovm2 instruction described its destination operand size by the source operand size. This patch adds new pattern for the vpmovm2 instruction. The node describes new expansion of the destination (from {128\|256} to 512). Differential Revision: https://reviews.llvm.org/D30654 llvm-svn: 298586	2017-03-23 09:57:01 +00:00
Artyom Skrobov	92c0653095	Reapply r298417 "[ARM] Recommit the glueless lowering of addc/adde in Thumb1" The UB in t2_so_imm_neg conversion has been addressed under D31242 / r298512 This reverts commit r298482. llvm-svn: 298562	2017-03-22 23:35:51 +00:00
Konstantin Zhuravlyov	4cbb68959b	[AMDGPU] Do not emit isa info as code object metadata - It was decided to expose this information through other means (rocr) Differential Revision: https://reviews.llvm.org/D30970 llvm-svn: 298560	2017-03-22 23:27:09 +00:00
Konstantin Zhuravlyov	a780ffaac2	[AMDGPU] Emit kernel debug properties as code object metadata Differential Revision: https://reviews.llvm.org/D30969 llvm-svn: 298558	2017-03-22 23:10:46 +00:00
Konstantin Zhuravlyov	ca0e7f6472	[AMDGPU] Emit kernel code properties as code object metadata - These are not required for low level runtime Differential Revision: https://reviews.llvm.org/D29949 llvm-svn: 298556	2017-03-22 22:54:39 +00:00
Sanjay Patel	51e077924d	[x86] improve tests, add tests, auto-generate checks; NFC llvm-svn: 298553	2017-03-22 22:39:17 +00:00
Konstantin Zhuravlyov	7498cd61fb	[AMDGPU] Restructure code object metadata creation - Rename runtime metadata -> code object metadata - Make metadata not flow - Switch enums to use ScalarEnumerationTraits - Cleanup and move AMDGPUCodeObjectMetadata.h to AMDGPU/MCTargetDesc - Introduce in-memory representation for attributes - Code object metadata streamer - Create metadata for isa and printf during EmitStartOfAsmFile - Create metadata for kernel during EmitFunctionBodyStart - Finalize and emit metadata to .note during EmitEndOfAsmFile - Other minor improvements/bug fixes Differential Revision: https://reviews.llvm.org/D29948 llvm-svn: 298552	2017-03-22 22:32:22 +00:00
Saleem Abdulrasool	1723b8bbbc	c++filt: support COFF import thunks The synthetic thunk for the import is prefixed with __imp_. Attempt to undecorate the names when they begin with the __imp_ prefix. llvm-svn: 298550	2017-03-22 21:15:19 +00:00
Anna Thomas	e27b39a976	[LVI] Add an LVI printer pass to capture test LVI cache after transformations Summary: Adding a printer pass for printing the LVI cache values after transformations that use LVI. This will help us in identifying cases where LVI invariants are violated, or transforms that leave LVI in an incorrect state. Right now, I have added two test cases to show that the printer pass is working. I will be adding more test cases in a later change, once this change is checked in upstream. Reviewers: reames, dberlin, sanjoy, apilipenko Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30790 llvm-svn: 298542	2017-03-22 19:27:12 +00:00
Luqman Aden	3f807c91dc	Preserve nonnull metadata on Loads through SROA & mem2reg. Summary: https://llvm.org/bugs/show_bug.cgi?id=31142 : SROA was dropping the nonnull metadata on loads from allocas that got optimized out. This patch simply preserves nonnull metadata on loads through SROA and mem2reg. Reviewers: chandlerc, efriedma Reviewed By: efriedma Subscribers: hfinkel, spatel, efriedma, arielb1, davide, llvm-commits Differential Revision: https://reviews.llvm.org/D27114 llvm-svn: 298540	2017-03-22 19:16:39 +00:00
Adrian Prantl	3eb1ec9767	Fix testcase on windows. llvm-svn: 298521	2017-03-22 17:15:03 +00:00
Sanjay Patel	2f602cea41	[InstCombine] canonicalize insertelement of scalar constant ahead of insertelement of variable insertelement (insertelement X, Y, IdxC1), ScalarC, IdxC2 --> insertelement (insertelement X, ScalarC, IdxC2), Y, IdxC1 As noted in the code comment and seen in the test changes, the motivation is that by pulling constant insertion up, we may be able to constant fold some insertelement instructions. Differential Revision: https://reviews.llvm.org/D31196 llvm-svn: 298520	2017-03-22 17:10:44 +00:00
Adrian Prantl	5503077cde	Fix PR32298 by adding an early exit to getFrameIndexExprs(). Also add an assertion for the case that there are multiple FI expressions with a DW_OP_LLVM_fragment; which should violate internal constraints in DbgVariable. llvm-svn: 298518	2017-03-22 16:50:16 +00:00
Artyom Skrobov	50a066b313	[ARM] t2_so_imm_neg had a subtle bug in the conversion, and could trigger UB by negating (int)-2147483648. By pure luck, none of the pre-existing tests triggered this; so I'm adding one. Summary: Thanks to Vitaly Buka for helping catch this. Reviewers: rengolin, jmolloy, efriedma, vitalybuka Subscribers: llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D31242 llvm-svn: 298512	2017-03-22 15:09:30 +00:00
Rafael Espindola	72dc254532	Add default typo to .tbss.* This matches gas behavior and is part of pr31888. llvm-svn: 298508	2017-03-22 14:04:19 +00:00
Rafael Espindola	f4b9da6286	Set the default type for .bss.foo. This matches gas and is part of pr31888. llvm-svn: 298506	2017-03-22 13:57:16 +00:00
Rafael Espindola	ccd9f4f4c7	Produce INIT_ARRAY for sections named .init_array.* These sections are merged together by the linker, so they should have the same time. llvm-svn: 298505	2017-03-22 13:35:41 +00:00
Dmitry Preobrazhensky	895d377dc7	[AMDGPU][MC] Fix for Bug 28204 + LIT tests Fixed v_mad_i64_i32/u64_u32 encoding Reviewers: artem.tamazov Differential Revision: https://reviews.llvm.org/D30828 llvm-svn: 298502	2017-03-22 13:31:01 +00:00
Simon Pilgrim	345481599f	[X86] Add multiply by constant tests (PR28513) As discussed on PR28513, add tests for constant multiplication by constants between 1 to 32 llvm-svn: 298497	2017-03-22 12:03:56 +00:00
Max Kazantsev	c6effaa495	Revert "[ScalarEvolution] Predicate implication from operations" This reverts commit rL298481 Fails clang-with-lto-ubuntu build. llvm-svn: 298489	2017-03-22 07:50:33 +00:00
Craig Topper	ad5c2d04f7	[ValueTracking] Make sure we keep range metadata information when calculating known bits for calls to bitreverse intrinsic. llvm-svn: 298488	2017-03-22 07:22:49 +00:00
Jonas Paulsson	808c89f467	[SystemZ] Don't drop any operands in expandZExtPseudo() Make sure that any operands, e.g. of an implicit def of a super reg is transferred to the new instruction. Review: Ulrich Weigand llvm-svn: 298484	2017-03-22 06:03:32 +00:00
Vitaly Buka	e69c137f90	Revert "[ARM] Recommit the glueless lowering of addc/adde in Thumb1, including the amended (no UB anymore) fix for adding/subtracting -2147483648." Fails check-llvm with ubsan This reverts commit r298417. llvm-svn: 298482	2017-03-22 05:07:44 +00:00
Max Kazantsev	15e76aa0f8	[ScalarEvolution] Predicate implication from operations This patch allows SCEV predicate analysis to prove implication of some expression predicates from context predicates related to arguments of those expressions. It introduces three new rules: For addition: (A >X && B >= 0) \|\| (B >= 0 && A > X) ===> (A + B) > X. For division: (A > X) && (0 < B <= X + 1) ===> (A / B > 0). (A > X) && (-B <= X < 0) ===> (A / B >= 0). Using these rules, SCEV is able to prove facts like "if X > 1 then X / 2 > 0". They can also be combined with the same context, to prove more complex expressions like "if X > 1 then X/2 + 1 > 1". Diffirential Revision: https://reviews.llvm.org/D30887 Reviewed by: sanjoy llvm-svn: 298481	2017-03-22 04:48:46 +00:00
Craig Topper	07f2915ad8	[InstCombine] Teach SimplifyDemandedUseBits to shrink Constants on the left side of subtracts Summary: Subtracts can have constants on the left side, but we don't shrink them based on demanded bits. This patch fixes that to match the right hand side. Reviewers: davide, majnemer, spatel, sanjoy, hfinkel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31119 llvm-svn: 298478	2017-03-22 04:03:53 +00:00
Reid Kleckner	45928018c5	[codeview] Use separate records for LF_SUBSTR_LIST and LF_ARGLIST They are structurally the same, but now we need to distinguish them because one record lives in the IPI stream and the other lives in TPI. llvm-svn: 298474	2017-03-22 01:37:38 +00:00
Aditya Nandakumar	bc389badbc	[GlobalISel]: Create VREGs for ConstantInt args This patch changes the behavior of IRTranslating intrinsics where we now create VREG + G_CONSTANT for ConstantInt values. We already do this for FloatingPoint values. This makes it easier for the backends to select code and it won't have to de-duplicate creation+selection of constants. Reviewed by: ab llvm-svn: 298473	2017-03-22 01:16:39 +00:00
Adrian Prantl	0498baa4be	Don't compose DWARF expressions with multiple subregisters. If a register location can only be described by a complex expression (i.e., multiple subregisters) it doesn't safely compose with another complex expression. For example, it is not possible to apply a DW_OP_deref operation to multiple DW_OP_pieces. llvm-svn: 298472	2017-03-22 01:16:01 +00:00
Ahmed Bougacha	15b3e8a93a	[GlobalISel] Update DBG_VALUEs referencing DCE'd instructions. Quentin points out that r298358 would cause us to emit different code with debug info. That's a big no-no; also erase the instructions that only live thanks to DBG_VALUE users. Adrian explained how this is an existing problem and an OK thing to do: clang has allocas for all variables so shouldn't be affected at -O0, but swift uses a bit of inlineasm to explicitly keep values live for the purpose of debug info quality. I'm not sure there is a better scheme. llvm-svn: 298460	2017-03-21 23:42:54 +00:00
Ahmed Bougacha	e8e1fa3a7c	[GlobalISel] Don't translate br to layout successor. MI can represent fallthrough to layout successor blocks, and our post-isel representation uses that extensively. We might as well use it too, to avoid translating and carrying along unnecessary branches. llvm-svn: 298459	2017-03-21 23:42:50 +00:00
Matt Arsenault	5b20fbb748	AMDGPU: Rename SI_RETURN This is used for a specific type of return to a shader part's epilog code. Rename to try avoiding confusion from a true call's return. llvm-svn: 298452	2017-03-21 22:18:10 +00:00
Matthias Braun	8445cbd1ca	SplitKit: Fix subreg copy related problems Fix two problems related to r298025: - SplitKit would create duplicate VNIs in some cases leading to crashs when hoisting copies. - VirtRegMap could fail expanding copies at the beginning of a basic block. This fixes http://llvm.org/PR32353 llvm-svn: 298448	2017-03-21 21:58:08 +00:00
Matt Arsenault	3dbeefa978	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444	2017-03-21 21:39:51 +00:00
Tim Northover	dd4b9d6d7b	GlobalISel: widen booleans by zero-extending to a byte. A bool is represented by a single byte, which the ARM ABI requires to be either 0 or 1. So we cannot use G_ANYEXT when legalizing the type. llvm-svn: 298439	2017-03-21 21:12:04 +00:00
Sanjay Patel	68c6beabe9	[InstCombine] regenerate checks; NFC llvm-svn: 298432	2017-03-21 20:14:38 +00:00
George Burgess IV	56c7e88c2c	Let llvm.objectsize be conservative with null pointers This adds a parameter to @llvm.objectsize that makes it return conservative values if it's given null. This fixes PR23277. Differential Revision: https://reviews.llvm.org/D28494 llvm-svn: 298430	2017-03-21 20:08:59 +00:00
Artyom Skrobov	40a4f40679	[ARM] Recommit the glueless lowering of addc/adde in Thumb1, including the amended (no UB anymore) fix for adding/subtracting -2147483648. This reverts r298328 "[ARM] Revert r297443 and r297820." and partially reverts r297842 "Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648"" llvm-svn: 298417	2017-03-21 18:39:41 +00:00
Dehao Chen	190f17cae7	Use ProfileSummary:getProfileCount to get ScaledCount for ModuleSummary Summary: ModuleSummary should use the standard interface of ProfileSummary::getProfileCount. Reviewers: eraman, tejohnson Reviewed By: tejohnson Subscribers: tejohnson, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D31154 llvm-svn: 298404	2017-03-21 17:22:35 +00:00
Adrian Prantl	4dc0324ff8	Revert 298388 and 298389 because they broke some AMDGPU tests. llvm-svn: 298401	2017-03-21 17:14:30 +00:00
Krzysztof Parzyszek	d033d1fd82	Recommit r298282 with fixes for memory allocation/deallocation [Hexagon] Recognize polynomial-modulo loop idiom again Regain the ability to recognize loops calculating polynomial modulo operation. This ability has been lost due to some changes in the preceding optimizations. Add code to preprocess the IR to a form that the pattern matching code can recognize. llvm-svn: 298400	2017-03-21 17:09:27 +00:00
Marek Olsak	5c7a61d221	AMDGPU: Buffer descriptor changes for GFX9 Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31158 llvm-svn: 298397	2017-03-21 17:00:39 +00:00
Marek Olsak	e22fdb9cac	AMDGPU: Always use VGPR indexing on GFX9 Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31157 llvm-svn: 298396	2017-03-21 17:00:32 +00:00
Krzysztof Parzyszek	5e7f06f354	[Hexagon] Add -march=hexagon to a testcase llvm-svn: 298395	2017-03-21 16:59:40 +00:00
Adrian Prantl	1db6486fb1	Don't compose DWARF expressions with multiple subregisters. If a register location can only be described by a complex expression (i.e., multiple subregisters) it doesn't safely compose with another complex expression. For example, it is not possible to apply a DW_OP_deref operation to multiple DW_OP_pieces. llvm-svn: 298389	2017-03-21 16:37:39 +00:00
Matt Arsenault	f8fb605a68	AMDGPU: Fix asserting on 0 dmask for image intrinsics Fold these to undef during lowering so users get eliminated. llvm-svn: 298387	2017-03-21 16:32:17 +00:00
Matt Arsenault	964a848514	AMDGPU: Convert image intrinsic uses in tests llvm-svn: 298386	2017-03-21 16:24:12 +00:00
Matt Arsenault	dce313c3cf	DAG: Fold bitcast/extract_vector_elt of undef to undef Fixes not eliminating store when intrinsic is lowered to undef. llvm-svn: 298385	2017-03-21 16:20:16 +00:00
Simon Pilgrim	5e39cbaee5	Fix shufpd test name. llvm-svn: 298381	2017-03-21 15:12:53 +00:00
Sanne Wouda	2409c6403d	[ARM] [Assembler] Support negative immediates for A32, T32 and T16 Summary: To support negative immediates for certain arithmetic instructions, the instruction is converted to the inverse instruction with a negated (or inverted) immediate. For example, "ADD r0, r1, #FFFFFFFF" cannot be encoded as an ADD instruction. However, "SUB r0, r1, #1" is equivalent. These conversions are different from instruction aliases. An alias maps several assembler instructions onto one encoding. A conversion, however, maps an invalid instruction--e.g. with an immediate that cannot be represented in the encoding--to a different (but equivalent) instruction. Several instructions with negative immediates were being converted already, but this was not systematically tested, nor did it cover all instructions. This patch implements all possible substitutions for ARM, Thumb1 and Thumb2 assembler and adds tests. It also adds a feature flag (-mattr=+no-neg-immediates) to turn these substitutions off. This is helpful for users who want their code to assemble to exactly what they wrote. Reviewers: t.p.northover, rovka, samparker, javed.absar, peter.smith, rengolin Reviewed By: javed.absar Subscribers: aadg, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D30571 llvm-svn: 298380	2017-03-21 14:59:17 +00:00
Sanjay Patel	00ece756c3	[InstCombine] auto-generate better checks; NFC llvm-svn: 298377	2017-03-21 14:04:44 +00:00
Sanjay Patel	79379cae15	[x86] use PMOVMSK for vector-sized equality comparisons We could do better by splitting any oversized type into whatever vector size the target supports, but I left that for future work if it ever comes up. The motivating case is memcmp() calls on 16-byte structs, so I think we can wire that up with a TLI hook that feeds into this. Differential Revision: https://reviews.llvm.org/D31156 llvm-svn: 298376	2017-03-21 13:50:33 +00:00
Simon Pilgrim	8bda035121	[X86][AVX] Tests showing missing SHUFPD + ZERO lowering This lowers to SHUFPD if the input is zeroinitializer but not with a demanded elts optimized build vector. llvm-svn: 298370	2017-03-21 13:30:40 +00:00
Valery Pykhtin	fd4c410f4d	[AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler Differential revision: https://reviews.llvm.org/D31046 llvm-svn: 298368	2017-03-21 13:15:46 +00:00
Volkan Keles	044e003203	[GlobalISel] Fix shufflevector tests clang-lld-x86_64-2stage fails because of the order of the instructions. `CHECK-DAG` directives should fix the problem. llvm-svn: 298367	2017-03-21 13:12:59 +00:00
Sam Kolton	f60ad58dad	[ADMGPU] SDWA peephole optimization pass. Summary: First iteration of SDWA peephole. This pass tries to combine several instruction into one SDWA instruction. E.g. it converts: ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1 V_ADD_I32_e32 %vreg2, %vreg0, %vreg3 V_LSHLREV_B32_e32 %vreg4, 16, %vreg2 ''' Into: ''' V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD ''' Pass structure: 1. Iterate over machine instruction in basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match machine instruction into either source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''. 2. Iterate over found SDWA operands and find instruction that could be potentially coverted into SDWA. E.g. for source SDWA operand potential instruction are all instruction in this basic block that uses '''%vreg0''' 3. Iterate over all potential instructions and check if they can be converted into SDWA. 4. Convert instructions to SDWA. This review contains basic implementation of SDWA peephole pass. This pass requires additional testing fot both correctness and performance (no performance testing done). There are several ways this pass can be improved: 1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass. 2. Introduce more SDWA patterns 3. Introduce mnemonics to limit when SDWA patterns should apply Reviewers: vpykhtin, alex-t, arsenm, rampitec Subscribers: wdng, nhaehnle, mgorny Differential Revision: https://reviews.llvm.org/D30038 llvm-svn: 298365	2017-03-21 12:51:34 +00:00
Andrea Di Biagio	7937be7dd3	[DebugInfo][X86] Teach Optimize LEAs pass to handle debug values This patch fixes an issue in the Optimize LEAs pass where redundant LEAs were not removed because they were being used by debug values. The debug values are now ignored when determining whether LEAs are redundant. For now the debug values for the redundant LEAs are marked as undefined, effectively lost. The intention is for a follow up patch which will attempt to preserve the debug values where possible. Patch by Andrew Ng. Differential Revision: https://reviews.llvm.org/D30835 llvm-svn: 298360	2017-03-21 11:36:21 +00:00
Jonas Paulsson	54c7680e1f	[DAGTypeLegalizer] Handle widening truncate to vector of i1. Previously, PromoteIntRes_TRUNCATE() did not handle the case where the operand needs widening, which resulted in llvm_unreachable(). This patch adds the needed handling, along with a test case. Review: Eli Friedman, Simon Pilgrim. https://reviews.llvm.org/D31077 llvm-svn: 298357	2017-03-21 10:24:14 +00:00
David Green	da21170c49	[ConstantFolding] Fix to prevent constant folding having to repeatedly scan operands. NFCI After the loop unroll threshold was increased in r295538, very large constant expressions can be created. This prevents them from having to be recursively scanned, leading to a compile time blow-up. Differential Revision: https://reviews.llvm.org/D30689 llvm-svn: 298356	2017-03-21 10:17:39 +00:00
Volkan Keles	75bdc7690e	[GlobalISel] Translate shufflevector Reviewers: qcolombet, aditya_nandakumar, t.p.northover, javed.absar, ab, dsanders Reviewed By: javed.absar Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30962 llvm-svn: 298347	2017-03-21 08:44:13 +00:00
Jonas Paulsson	bd65421f08	[SystemZ] Don't drop MO flags in foldMemoryOperandImpl() The def operand of the new LG/LD should have the old def operands flags and subreg index. New test: test/CodeGen/SystemZ/fold-memory-op-impl.ll Review: Ulrich Weigand llvm-svn: 298341	2017-03-21 05:49:40 +00:00
Vitaly Buka	c12716e742	Revert "[Hexagon] Recognize polynomial-modulo loop idiom again" Fix memory leaks on check-llvm tests detected by Asan. This reverts commit r298282. llvm-svn: 298329	2017-03-21 00:59:51 +00:00
Eli Friedman	76732acc23	[ARM] Revert r297443 and r297820. The glueless lowering of addc/adde in Thumb1 has known serious miscompiles (see https://reviews.llvm.org/D31081), and r297820 causes an infinite loop for certain constructs. It's not clear when they will be fixed, so let's just take them out of the tree for now. (I resolved a small conflict with r297453.) llvm-svn: 298328	2017-03-21 00:26:39 +00:00
Vadzim Dambrouski	ba789cbd3d	[ARM] Fix PR32130: Handle promotion of zero sized constants. The special case of zero sized values was previously not handled correctly. This patch handles this by not promoting if the size is zero. Patch by Tim Neumann. Differential Revision: https://reviews.llvm.org/D31116 llvm-svn: 298320	2017-03-20 22:59:57 +00:00
Sanjay Patel	f238902f52	[x86] add tests for setcc of i128/i256; NFC llvm-svn: 298317	2017-03-20 22:15:40 +00:00
Matt Arsenault	6b00d40900	InstCombine: Check source value precision when reducing cast intrinsic Missed this check when porting from the libcall version. llvm-svn: 298312	2017-03-20 21:59:24 +00:00
Tim Northover	4340d64f91	GlobalISel: add implicit defs & uses when mutating an instruction. Otherwise a scheduler might do bad things to the code we produce. llvm-svn: 298311	2017-03-20 21:58:23 +00:00
Eli Friedman	b1578d3612	[SCEV] Fix trip multiple calculation If loop bound containing calculations like min(a,b), the Scalar Evolution API getSmallConstantTripMultiple returns 4294967295 "-1" as the trip multiple. The problem is that, SCEV use -1 * umax to represent umin. The multiple constant -1 was returned, and the logic of guarding against huge trip counts was skipped. Because -1 has 32 active bits. The fix attempt to factor more general cases. First try to get the greatest power of two divisor of trip count expression. In case overflow happens, the trip count expression is still divisible by the greatest power of two divisor returned. Returns 1 if not divisible by 2. Patch by Huihui Zhang <huihuiz@codeaurora.org> Differential Revision: https://reviews.llvm.org/D30840 llvm-svn: 298301	2017-03-20 20:25:46 +00:00
David L. Jones	d61548471c	[X86] Clean up test/CodeGen/X86/2006-03-01-InstrSchedBug.ll Summary: - Migrated from grep to FileCheck. - Re-indented, removed boilerplate comments. - Added 'entry' label at beginning of basic block. Patch by Jorge Gorbe! Reviewed By: RKSimon Subscribers: RKSimon, jgorbe, llvm-commits Differential Revision: https://reviews.llvm.org/D30317 llvm-svn: 298298	2017-03-20 20:10:30 +00:00
Nirav Dave	f5f0864ac2	Add test case for merging of chained stores of mismatched type. llvm-svn: 298293	2017-03-20 19:48:22 +00:00
Kevin Enderby	a8d256cb36	Add the rest of the error checking for Mach-O dyld compact bind entry errors and test cases for each of the error checks. To do this more plumbing was needed so that the segment indexes and segment offsets can be checked. Basically what was done was the SegInfo from llvm-objdump’s MachODump.cpp was moved into libObject for Mach-O objects as BindRebaseSegInfo and it is only created when an iterator for bind or rebase entries are created. This commit really only adds the error checking and test cases for the bind table entires and the checking for the lazy bind and weak bind entries are still to be fully done as well as the rebase entires. Though some of the plumbing for those are added with this commit. Those other error checks and test cases will be added in follow on commits. Note, the two llvm_unreachable() calls should now actually be unreachable with the error checks in place and would take a logic bug in the error checking code to be reached if the segment indexes and segment offsets are used from a checked bind entry. Comments have been added to the methods that require the arguments to have been checked prior to calling. llvm-svn: 298292	2017-03-20 19:46:55 +00:00
Evgeniy Stepanov	c440572715	Revert r298158. Revert "[asan] Fix dead stripping of globals on Linux." OOM in gold linker. llvm-svn: 298288	2017-03-20 18:45:34 +00:00
Krzysztof Parzyszek	8490251de3	[Hexagon] Recognize polynomial-modulo loop idiom again Regain the ability to recognize loops calculating polynomial modulo operation. This ability has been lost due to some changes in the preceding optimizations. Add code to preprocess the IR to a form that the pattern matching code can recognize. llvm-svn: 298282	2017-03-20 18:12:58 +00:00
Konstantin Zhuravlyov	2534bc07f4	[AMDGPU] Run always inliner early in opt Differential Revision: https://reviews.llvm.org/D31141 llvm-svn: 298281	2017-03-20 18:06:45 +00:00
Daniel Berlin	fa42a23cfc	Add missing updated test from VN coercion changes. Instructions were renamed. NFC llvm-svn: 298280	2017-03-20 18:04:19 +00:00
Reid Kleckner	8819c73878	[WinEH] Adjust decision to emit SEH moves for leaf functions Move the check for "MF->hasWinCFI()" up into the calculation of the shouldEmitMoves boolean, rather than putting it in the early returning if. This ensures that endFunction doesn't try to emit .seh_* directives for leaf functions. llvm-svn: 298276	2017-03-20 17:45:59 +00:00
Tim Northover	89268b183f	GlobalISel: allow quad-precision values to be dumped. Otherwise the fallback path fails with an assertion on AAPCS AArch64 targets, when "long double" is encountered. llvm-svn: 298273	2017-03-20 16:52:08 +00:00
Peter Collingbourne	25a17ba4c7	Support, LTO: When pruning a directory, ignore files matching a prefix. This is a safeguard against data loss if the user specifies a directory that is not a cache directory. Teach the existing cache pruning clients to create files with appropriate names. Differential Revision: https://reviews.llvm.org/D31109 llvm-svn: 298271	2017-03-20 16:41:57 +00:00
Dehao Chen	e593049fb0	Updates branch_weights annotation for call instructions during inlining. Summary: Inliner should update the branch_weights annotation to scale it to proper value. Reviewers: davidxl, eraman Reviewed By: eraman Subscribers: zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D30767 llvm-svn: 298270	2017-03-20 16:40:44 +00:00
Dmitry Preobrazhensky	1e124e1825	[AMDGPU][MC] Fix for Bugs 28201, 28199, 28170 + LIT tests This fix enables sp3 abs modifier with constants Reviewers: artem.tamazov Differential Revision: https://reviews.llvm.org/D30825 llvm-svn: 298265	2017-03-20 16:33:20 +00:00
Daniel Sanders	b96f40dd03	[tablegen][globalisel] Capture instructions into locals and related infrastructure for multiple instructions matches. Summary: Prepare the way for nested instruction matching support by having actions like CopyRenderer look up operands in the RuleMatcher rather than a specific InstructionMatcher. This allows actions to reference any operand from any matched instruction. It works by checking the 'shape' of the match and capturing each matched instruction to a local variable. If the shape is wrong (not enough operands, leaf nodes where non-leafs are expected, etc.), then the rule exits early without checking the predicates. Once we've captured the instructions, we then test the predicates as before (except using the local variables). If the match is successful, then we render the new instruction as before using the local variables. It's not noticable in this patch but by the time we support multiple instruction matching, this patch will also cause a significant improvement to readability of the emitted code since MRI.getVRegDef(I->getOperand(0).getReg()) will simply be MI1 after emitCxxCaptureStmts(). This isn't quite NFC because I've also fixed a bug that I'm surprised we haven't encountered yet. It now checks there are at least the expected number of operands before accessing them with getOperand(). Depends on D30531 Reviewers: t.p.northover, qcolombet, aditya_nandakumar, ab, rovka Reviewed By: rovka Subscribers: dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D30535 llvm-svn: 298257	2017-03-20 15:20:42 +00:00
Dmitry Preobrazhensky	40af9c35d3	[AMDGPU][MC] Fix for Bugs 28200, 28202 + LIT tests Fixed several related issues with VOP3 fp modifiers. Reviewers: artem.tamazov Differential Revision: https://reviews.llvm.org/D30821 llvm-svn: 298255	2017-03-20 14:50:35 +00:00
Diana Picus	d79253a9f7	[GlobalISel] Use the correct calling conv for calls This commit adds a parameter that lets us pass in the calling convention of the call to CallLowering::lowerCall. This allows us to handle situations where the calling convetion of the callee is different from that of the caller. Differential Revision: https://reviews.llvm.org/D31039 llvm-svn: 298254	2017-03-20 14:40:18 +00:00
Konstantin Zhuravlyov	8a67eb144f	Revert "[AMDGPU] Run always inliner early in opt" This reverts commit r297958, it breaks device-libs build. llvm-svn: 298239	2017-03-20 09:26:08 +00:00
Craig Topper	5992c8d1dc	[AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time instead of isel Summary: Currently we handle these intrinsics at isel with special patterns. But as they just map to normal logic operations, we should just handle them at lowering. This will expose them to DAG combine optimizations. Right now the kor-sequence test generates a bunch of regclass copies between GR16 and VK16 that the peephole optimizer and/or register coallescing are removing to keep everything in the mask domain. By handling the logic op intrinsics earlier, these copies become bitcasts in the DAG and get removed by DAG combine which seems more robust. This should help enable my plan to stop copying between K registers and GR8/GR16. The peephole optimizer can't remove a chain of copies between K and GR32 with insert_subreg/extract_subreg present in the chain so the kor-sequence test break. But this patch should dodge the problem entirely. Reviewers: zvi, delena, RKSimon, igorb Reviewed By: igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31056 llvm-svn: 298228	2017-03-19 17:11:09 +00:00
Craig Topper	ff2283ec0e	[InstCombine] Use update_test_checks.py to regenerate a test. NFC llvm-svn: 298227	2017-03-19 17:04:52 +00:00
Simon Pilgrim	8424df7dea	Fix constant folding of fp2int to large integers We make the assumption in most of our constant folding code that a fp2int will target an integer of 128-bits or less, calling the APFloat::convertToInteger with only uint64_t[2] of raw bits for the result. Fuzz testing (PR24662) showed that we don't handle other cases at all, resulting in stack overflows and all sorts of crashes. This patch uses the APSInt version of APFloat::convertToInteger instead to better handle such cases. Differential Revision: https://reviews.llvm.org/D31074 llvm-svn: 298226	2017-03-19 16:50:25 +00:00
Ahmed Bougacha	931904d777	[GlobalISel] Don't select trivially dead instructions. Folding instructions when selecting can cause them to become dead. Don't select these dead instructions (if they don't have other side effects, and don't define physical registers). Preserve existing tests by adding COPYs. In some tests, the G_CONSTANT vregs never get constrained to a class: the only use of the vreg was folded into another instruction, so the G_CONSTANT, now dead, never gets selected. llvm-svn: 298224	2017-03-19 16:13:00 +00:00
Ahmed Bougacha	48bcd22ce8	[GlobalISel][AArch64] Add DBG_VALUE select test. NFC. llvm-svn: 298223	2017-03-19 16:12:53 +00:00
Ahmed Bougacha	dcd416a4b9	[GlobalISel][AArch64] Split out cast select tests. NFC. And remove some redundant bitcast tests. Also split the test functions themselves: it makes it obvious to see what's tested where and what isn't, it makes the tests much easier to read and manually update, and, most importantly, it makes them almost trivial to update using tooling. Yes, it's obnoxiously verbose, but said tooling helps upgrade to better MIR syntax whenever available. llvm-svn: 298222	2017-03-19 16:12:51 +00:00
Xin Tong	967e313078	Remove unused arguments. NFCI llvm-svn: 298218	2017-03-19 15:31:16 +00:00
Xin Tong	d67fb1b66e	[JumpThreading] Perform phi-translation in SimplifyPartiallyRedundantLoad. Summary: In case we are loading on a phi-load in SimplifyPartiallyRedundantLoad. Try to phi translate it into incoming values in the predecessors before we search for available loads. This needs https://reviews.llvm.org/D30524 Reviewers: davide, sanjoy, efriedma, dberlin, rengolin Reviewed By: dberlin Subscribers: junbuml, llvm-commits Differential Revision: https://reviews.llvm.org/D30543 llvm-svn: 298217	2017-03-19 15:30:53 +00:00
Teresa Johnson	9b4b8c8d7b	Enable stripping of multiple DILocation on !llvm.loop metadata Summary: I found that stripDebugInfo was still leaving significant amounts of debug info due to !llvm.loop that contained DILocation after stripping. The support for stripping debug info on !llvm.loop added in r293377 only removes a single DILocation. Enhance that to remove all DILocation from !llvm.loop. Reviewers: hfinkel, aprantl, dsanders Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31117 llvm-svn: 298213	2017-03-19 13:54:57 +00:00
Oren Ben Simhon	75537b6566	[MIR] Test assumes x64 windows calling convention upon printing/parsing MIR output/input. llvm-svn: 298212	2017-03-19 13:23:20 +00:00
Benjamin Kramer	6520f83ba4	[MIR] Add triple to test that assumes it runs on windows. llvm-svn: 298211	2017-03-19 13:04:35 +00:00
Oren Ben Simhon	9ce0ec5dbc	CalleeSavedRegister was removed from MIR and is recalculated upon MIR parsing. llvm-svn: 298210	2017-03-19 11:18:09 +00:00
Oren Ben Simhon	a96fdbf233	Moving the test to x86 because other architectures do not suport regcall calling convention. llvm-svn: 298209	2017-03-19 08:53:42 +00:00
Oren Ben Simhon	0ef61ec32a	[MIR] Support Customed Register Mask and CSRs The MIR printer dumps a string that describe the register mask of a function. A static predefined list of register masks matches a static list of strings. However when the register mask is not from the static predefined list, there is no descriptor string and the printer fails. This patch adds support to custom register mask printing and dumping. Also the list of callee saved registers (describing the registers that must be preserved for the caller) might be dynamic. As such this data needs to be dumped and parsed back to the Machine Register Info. Differential Revision: https://reviews.llvm.org/D30971 llvm-svn: 298207	2017-03-19 08:14:18 +00:00
Brian Gesiak	1640e68728	[Analysis] bitreverse(undef) returns undef Summary: The reverse of an artbitrary bitpattern is also an arbitrary bitpattern. Reviewers: trentxintong, arsenm, majnemer Reviewed By: majnemer Subscribers: majnemer, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D31118 llvm-svn: 298201	2017-03-19 04:40:42 +00:00
Daniel Berlin	41b39169e2	NewGVN: Fix PHI evaluation bug exposed by new verifier. We were checking whether the incoming block was reachable instead of whether the specific edge was reachable llvm-svn: 298187	2017-03-18 15:41:36 +00:00
Nirav Dave	ac6081cb67	Make library calls sensitive to regparm module flag (Fixes PR3997). Reviewers: mkuper, rnk Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D27050 llvm-svn: 298179	2017-03-18 00:44:07 +00:00
Stanislav Mekhanoshin	8e45acfc38	[AMDGPU] Add address space based alias analysis pass This is direct port of HSAILAliasAnalysis pass, just cleaned for style and renamed. Differential Revision: https://reviews.llvm.org/D31103 llvm-svn: 298172	2017-03-17 23:56:58 +00:00
Sanjay Patel	0429b1a431	[x86] regenerate checks; NFC llvm-svn: 298166	2017-03-17 23:04:18 +00:00
Sanjay Patel	77e6ebe748	[x86] regenerate checks; NFC llvm-svn: 298164	2017-03-17 22:47:21 +00:00
Jessica Paquette	ea8cc09be0	[Outliner] Add outliner for AArch64 This commit adds the necessary target hooks for outlining in AArch64. It also refactors the switch statement used in `getMemOpBaseRegImmOfsWidth` into a more general function, `getMemOpInfo`. This allows the outliner to share that code without copying and pasting it. The AArch64 outliner can be run using -mllvm -enable-machine-outliner, as with the X86-64 outliner. The test for this pass verifies that the outliner does, in fact outline functions, fixes up the stack accesses properly, and can correctly generate a tail call. In the future, this test should be replaced with a MIR test, so that we can properly test immediate offset overflows in fixed-up instructions. llvm-svn: 298162	2017-03-17 22:26:55 +00:00
Evgeniy Stepanov	c5aa6b9411	[asan] Fix dead stripping of globals on Linux. Use a combination of !associated, comdat, @llvm.compiler.used and custom sections to allow dead stripping of globals and their asan metadata. Sometimes. Currently this works on LLD, which supports SHF_LINK_ORDER with sh_link pointing to the associated section. This also works on BFD, which seems to treat comdats as all-or-nothing with respect to linker GC. There is a weird quirk where the "first" global in each link is never GC-ed because of the section symbols. At this moment it does not work on Gold (as in the globals are never stripped). Differential Revision: https://reviews.llvm.org/D30121 llvm-svn: 298158	2017-03-17 22:17:29 +00:00
Evgeniy Stepanov	51c962f72e	Add !associated metadata. This is an ELF-specific thing that adds SHF_LINK_ORDER to the global's section pointing to the metadata argument's section. The effect of that is a reverse dependency between sections for the linker GC. !associated does not change the behavior of global-dce. The global may also need to be added to llvm.compiler.used. Since SHF_LINK_ORDER is per-section, !associated effectively enables fdata-sections for the affected globals, the same as comdats do. Differential Revision: https://reviews.llvm.org/D29104 llvm-svn: 298157	2017-03-17 22:17:24 +00:00
Eli Friedman	46ddab3810	[SelectionDAG] Remove redundant stores more aggressively. Handle TokenFactors more aggressively in SDValue::reachesChainWithoutSideEffects. This isn't really a very effective change anymore because of other changes to chain handling, but it's a cheap check, and the expanded comments are still useful. It might be possible to loosen the hasOneUse() requirement with a deeper analysis, but a naive implementation of that check would be expensive. Differential Revision: https://reviews.llvm.org/D29845 llvm-svn: 298156	2017-03-17 22:15:50 +00:00
Matt Arsenault	e70d5dcf3e	AMDGPU: Fix handling of constant phi input loop conditions If the loop condition was an i1 phi with a constantexpr input, this would add a loop intrinsic fed by a phi dependent on a call to if.break in the same block. Insert the call in the loop header. llvm-svn: 298121	2017-03-17 20:52:21 +00:00
Sanjay Patel	455703a0c6	[x86] clean up setcc with negated operand transform and add missing test; NFCI llvm-svn: 298118	2017-03-17 20:29:40 +00:00
Reid Kleckner	edf1cbb580	[X86] Emit fewer instructions to allocate >16GB stack frames Summary: Use this code pattern when RAX is live, instead of emitting up to 2 billion adjustments: pushq %rax movabsq +-$Offset+-8, %rax addq %rsp, %rax xchg %rax, (%rsp) movq (%rsp), %rsp Try to clean this code up a bit while I'm here. In particular, hoist the logic that handles the entire adjustment with `movabsq $imm, %rax` out of the loop. This negates the offset in the prologue and uses ADD because X86 only has a two operand subtract which always subtracts from the destination register, which can no longer be RSP. Fixes PR31962 Reviewers: majnemer, sdardis Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30052 llvm-svn: 298116	2017-03-17 20:25:49 +00:00
Rong Xu	661ffe104e	[PGO] Add omitted test cases. llvm-svn: 298115	2017-03-17 20:05:13 +00:00
Jun Bum Lim	4230101def	[CodeGenPrep]Restructure promoting Ext to form ExtLoad Summary: Instead of just looking for a load which is mergable with Ext to form ExtLoad, trying to promote Exts as long as the cost is acceptable. This change is not a NFC as it continue promoting Exts even after finding a load during promotions; the change in arm64-codegen-prepare-extload.ll described in 2.b might show the case. This change was motivated from D26524. Based on this change, I will move the transformation performed in aarch64-type-promotion into CGP. Reviewers: jmolloy, qcolombet, mcrosier, javed.absar Reviewed By: qcolombet Subscribers: rengolin, llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D27853 llvm-svn: 298114	2017-03-17 19:05:21 +00:00
Rong Xu	e60343d6b0	[PGO] Value profile for size of memory intrinsic calls This patch annotates the valuesites profile to memory intrinsics. Differential Revision: http://reviews.llvm.org/D31002 llvm-svn: 298110	2017-03-17 18:07:26 +00:00
Vedant Kumar	c2a250c35b	[Bitcode] Add compatibility test for the 4.0 release Fork off compatibility.ll for the 4.0 release. The *.bc file in this commit was produced using a Release build of the release_40 branch. llvm-svn: 298109	2017-03-17 17:53:26 +00:00
Simon Pilgrim	5a68d401c7	[SelectionDAG] Add SelectionDAG.computeKnownBits test support for ISD::ABS llvm-svn: 298108	2017-03-17 17:45:36 +00:00
Sanjay Patel	25bd713d33	[x86] avoid adc/sbb assert when both sides of add are zexted (PR32316) As noted in the comment, we might want to account for this case, but I didn't look at what that would mean for the asm. I'm also not sure why this only reproduces with avx512, but I'm putting a conservative fix in for now to avoid the crash. Also, if both sides of an add are zexted, shouldn't we shrink that add? https://bugs.llvm.org/show_bug.cgi?id=32316 llvm-svn: 298107	2017-03-17 17:27:31 +00:00
Stanislav Mekhanoshin	ee2dd785f6	Only unswitch loops with uniform conditions Loop unswitching can be extremely harmful for a SIMT target. In case if hoisted condition is not uniform a SIMT machine will execute both clones of a loop sequentially. Therefor LoopUnswitch checks if the condition is non-divergent. Since DivergenceAnalysis adds an expensive PostDominatorTree analysis not needed for non-SIMT targets a new option is added to avoid unneded analysis initialization. The method getAnalysisUsage is called when TargetTransformInfo is not yet available and we cannot use it here. For that reason a new field DivergentTarget is added to PassManagerBuilder to control the behavior and set this field from a target. Differential Revision: https://reviews.llvm.org/D30796 llvm-svn: 298104	2017-03-17 17:13:41 +00:00
Simon Pilgrim	d06b025c9c	[X86] Add SelectionDAG.computeKnownBits test showing inability to handle ISD::ABS We have to be careful as abs(INT_MIN) == INT_MIN. llvm-svn: 298103	2017-03-17 16:58:15 +00:00
Chad Rosier	a69dcb6b66	[AArch64] Use alias analysis in the load/store optimization pass. This allows the optimization to rearrange loads and stores more aggressively. Differential Revision: http://reviews.llvm.org/D30903 llvm-svn: 298092	2017-03-17 14:19:55 +00:00
Oliver Stannard	8761e9bb43	[Asm] Don't list '@<type>' in diag when '@' is a comment This fixes https://bugs.llvm.org//show_bug.cgi?id=31280 Differential revision: https://reviews.llvm.org/D31026 llvm-svn: 298067	2017-03-17 11:10:17 +00:00
Andre Vieira	913ffeb5ba	[ARM] Fix triple format in test branch disassemble test Fixing triple format in the tests added for the branch label fix for Thumb Targets. Also recommitting previously approved patch, see https://reviews.llvm.org/D30943. Reviewed by: samparker Differential Revision: https://reviews.llvm.org/D30987 llvm-svn: 298056	2017-03-17 09:37:10 +00:00
Craig Topper	a8d4097445	[AVX-512] Make VEX encoded FMA instructions available when AVX512 is enabled regardless of whether +fma was added on the command line. We weren't able to handle isel of the 128/256-bit FMA instructions when AVX512F was enabled but VLX and FMA weren't. I didn't mask FeatureAVX512 imply FeatureFMA as I wasn't sure I wanted disabling FMA to also disable AVX512. Instead we just can't prevent FMA instructions if AVX512 is enabled. Another option would be to promote 128/256-bit to 512-bit, do the operation and extract it. But that requires a lot of extra isel patterns. Since no CPUs exist that support AVX512, but not FMA just using the VEX instructions seems better. llvm-svn: 298051	2017-03-17 07:37:31 +00:00
Jonas Paulsson	f496bd9a59	[SystemZ] New CodeGen tests for vector compare / select. New SystemZ tests for the improved codegen of vector compare and select, including cases with a logical combination of two compares. Review: Ulrich Weigand. https://reviews.llvm.org/D29489 llvm-svn: 298049	2017-03-17 07:11:46 +00:00
Jonas Paulsson	8a7bd24c82	[SystemZ] Add use of super-reg in splitMove() If one of the subregs of the 128 bit reg is undefined when splitMove() splits a store into two instructions, a use of an undefined physical register results. To remedy this, an implicit use of the super register is added onto both new instructions, along with propagated kill and undef flags. This was discovered with llvm-stress, and that test case is attached as test/CodeGen/SystemZ/splitMove_undefReg_mverifier.ll Thanks to Matthias Braun for helping with a nice explanation. Review: Ulrich Weigand llvm-svn: 298047	2017-03-17 06:47:08 +00:00
Craig Topper	6a1290a0fd	[AVX-512] Give priority to EVEX encoded scalar FMA instructions when we have FMA, AVX512 and no VLX. We were giving priority if VLX was enabled. llvm-svn: 298046	2017-03-17 06:10:37 +00:00
Craig Topper	c1338f21ed	[X86] Use update_llc_test_checks.py to regenerate a test and add command lines to demonstrate that we don't pick EVEX encoded instruction when AVX512 and FMA3 are both enabled. This bug only exists on the scalar llvm.fma instrinsics. Looks like we don't test the llvm.fma intrinsics very thoroughly. In fact I don't see any tests for the vector versions. llvm-svn: 298045	2017-03-17 06:00:01 +00:00
Craig Topper	30c89eeeb6	[X86] Use update_llc_test_checks.py to regenerate a test. llvm-svn: 298044	2017-03-17 05:59:57 +00:00
Sanjoy Das	c4e4dcdf64	[RSForGC] Handle vector GEPs We were not handling getelemenptr instructions of vector type before. Since getelemenptr instructions for vector types follow the same rule as getelementptr instructions for non-vector types, we can just handle them in the same way. llvm-svn: 298028	2017-03-17 00:55:53 +00:00
Zachary Turner	2d9c082033	Revert "Make NativeExeSymbol a concrete subclass of NativeRawSymbol [PDB]" For some reason this is causing ANSI color codes to be printed even when run through FileCheck. llvm-svn: 298026	2017-03-17 00:46:42 +00:00
Matthias Braun	f0b68d3fbc	SplitKit: Correctly implement partial subregister copies - This fixes a bug where subregister incompatible with the vregs register class where used. - Implement the case where multiple copies are necessary to cover a given lanemask. Differential Revision: https://reviews.llvm.org/D30438 llvm-svn: 298025	2017-03-17 00:41:39 +00:00
Eli Friedman	da228fee0c	[ARM] Use alias analysis in ARMPreAllocLoadStoreOpt. This allows the optimization to rearrange loads and stores more aggressively. This doesn't really affect performance, but it helps codesize. Differential Revision: https://reviews.llvm.org/D30839 llvm-svn: 298021	2017-03-17 00:34:26 +00:00
Zachary Turner	2ed2aa75bf	[pdb] Fix an uninitialized read, and add a test for it. This was originally reported in pr32249, uncovered by PTVS-Studio. There was no code coverage for this path because it was difficult to construct odd-case PDB files that were not generated by cl. Now that we can write construct minimal PDB files from YAML, it's easy to construct fragments that generate whatever we want. In this patch I add a test that creates 2 type records. One with a unique name, and one without. I verify that we can go from PDB to Yaml with no errors. In a future patch I'd like to add something like llvm-pdbdump raw -lookup-type that will just dump one record and nothing else, which should make it a bit cleaner to find this kind of thing. llvm-svn: 298017	2017-03-17 00:15:55 +00:00
Adrian McCarthy	21b54cf632	Make NativeExeSymbol a concrete subclass of NativeRawSymbol [PDB] This moves exe symbol-specific method implementations out of NativeRawSymbol into a concrete subclass. Also adds implementations for hasCTypes and hasPrivateSymbols and a simple test to ensure the native reader can access the summary information for the executable from the PDB. Differential Revision: https://reviews.llvm.org/D31059 llvm-svn: 298005	2017-03-16 22:28:39 +00:00
Kyle Butt	d609d6ebf0	CodeGen: BlockPlacement: Adjust test case so it covers rL297925. NFC I had ajusted the test case before when testing a chain of length 2, and then reverted it with rL296845 when I switched to 3 triangles. After running benchmarks and examining generated code at length 2 I forgot to put the test back. llvm-svn: 298000	2017-03-16 21:33:29 +00:00
Rong Xu	60faea19f8	Resubmit r297897: [PGO] Value profile for size of memory intrinsic calls R297897 inadvertently enabled annotation for memop profiling. This new patch fixed it. llvm-svn: 297996	2017-03-16 21:15:48 +00:00
Adrian Prantl	47ea6478ed	Salvage debug info from instructions about to be deleted [Reapplies r297971 and punting on finding a better API for findDbgValues()] This patch improves debug info quality in InstCombine by looking at values that are about to be deleted, checking whether there are any dbg.value instrinsics referring to them, and potentially encoding the semantics of the deleted instruction into the dbg.value's DIExpression. In the example in the testcase (which was extracted from XNU) there is a sequence of %4 = load %struct.entry, %struct.entry* %next2, align 8, !dbg !41 %5 = bitcast %struct.entry* %4 to i8, !dbg !42 %add.ptr4 = getelementptr inbounds i8, i8 %5, i64 -8, !dbg !43 %6 = bitcast i8* %add.ptr4 to %struct.entry, !dbg !44 call void @llvm.dbg.value(metadata %struct.entry %6, i64 0, metadata !20, metadata !21), !dbg 34 When these instructions are eliminated by instcombine one after another, we can still salvage the otherwise dead debug info: - Bitcasts have no effect, so have the dbg.value point to operand(0) - Loads can be expressed via a DW_OP_deref - Constant gep instructions can be replaced by DWARF expression arithmetic The API introduced by this patch is not specific to instcombine and can be useful in other places, too. rdar://problem/30725338 Differential Revision: https://reviews.llvm.org/D30919 llvm-svn: 297994	2017-03-16 21:14:09 +00:00
Michael Kuperstein	2da2bfa088	[LoopUnroll] Don't peel loops where the latch isn't the exiting block Peeling assumed this doesn't happen, but didn't check it. This fixes PR32178. Differential Revision: https://reviews.llvm.org/D30757 llvm-svn: 297993	2017-03-16 21:07:48 +00:00
Michael Zolotukhin	99de88d1f3	[SCEV] Compute affine range in another way to avoid bitwidth extending. Summary: This approach has two major advantages over the existing one: 1. We don't need to extend bitwidth in our computations. Extending bitwidth is a big issue for compile time as we often end up working with APInts wider than 64bit, which is a slow case for APInt. 2. When we zero extend a wrapped range, we lose some information (we replace the range with [0, 1 << src bit width)). Thus, avoiding such extensions better preserves information. Correctness testing: I ran 'ninja check' with assertions that the new implementation of getRangeForAffineAR gives the same results as the old one (this functionality is not present in this patch). There were several failures - I inspected them manually and found out that they all are caused by the fact that we're returning more accurate results now (see bullet (2) above). Without such assertions 'ninja check' works just fine, as well as SPEC2006. Compile time testing: CTMark/Os: - mafft/pairlocalalign -16.98% - tramp3d-v4/tramp3d-v4 -12.72% - lencod/lencod -11.51% - Bullet/bullet -4.36% - ClamAV/clamscan -3.66% - 7zip/7zip-benchmark -3.19% - sqlite3/sqlite3 -2.95% - SPASS/SPASS -2.74% - Average -5.81% Performance testing: The changes are expected to be neutral for runtime performance. Reviewers: sanjoy, atrick, pete Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30477 llvm-svn: 297992	2017-03-16 21:07:38 +00:00

1 2 3 4 5 ...

43857 Commits