llvm-project

Commit Graph

Author	SHA1	Message	Date
Javed Absar	e5ad87e939	[ARM] Enable Cortex-M23 and Cortex-M33 support. Add both cores to the target parser and TableGen. Test that eabi attributes are set correctly for both cores. Additionally, test the absence and presence of MOVT in Cortex-M23 and Cortex-M33, respectively. Committed on behalf of Sanne Wouda. Reviewers : rengolin, olista01. Differential Revision: https://reviews.llvm.org/D29073 llvm-svn: 293761	2017-02-01 11:55:03 +00:00
Florian Hahn	a35b8a4852	[LoopUnroll] Use addClonedBlockToLoopInfo to add loop header to LI (NFC). Summary: I have a similar patch up for review already (D29173). If you prefer I can squash them both together. Also I think there is more potential for code sharing between LoopUnroll.cpp and LoopUnrollRuntime.cpp. Do you think patches for that would be worthwhile? Reviewers: mkuper, mzolotukhin Reviewed By: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29311 llvm-svn: 293758	2017-02-01 10:39:35 +00:00
NAKAMURA Takumi	468487d71a	*MacroFusion.cpp: Suppress warnings to eliminate \param(s). [-Wdocumentation] llvm-svn: 293744	2017-02-01 07:30:46 +00:00
Craig Topper	0bcba19cdf	[X86] For AVX1/AVX2 isel, don't use FP move instructions for 128-bit loads/stores of integer types. For SSE we use fp because of the smaller encoding, but that doesn't apply to AVX. So just do the natural thing so we don't have to explain why we aren't. We can't do this for 256-bit loads/stores since integer loads and stores aren't available in AVX1 so we need fallback patterns since the integer types are legal. This doesn't affect any tests because execution domain fixing freely converts the instructions anyway. Honestly, we could probably rely on it for the SSE size optimization too. llvm-svn: 293743	2017-02-01 07:17:16 +00:00
Evandro Menezes	455382ea22	[AArch64] Add new target feature to fuse literal generation This feature enables the fusion of such operations on Cortex A57, as recommended in its Software Optimisation Guide, sections 4.14 and 4.15. Differential revision: https://reviews.llvm.org/D28698 llvm-svn: 293739	2017-02-01 02:54:42 +00:00
Evandro Menezes	b21fb29c26	[AArch64] Add new subtarget feature to fuse AES crypto operations This feature enables the fusion of such operations on Cortex A57, as recommended in its Software Optimisation Guide, section 4.13, and on Exynos M1. Differential revision: https://reviews.llvm.org/D28491 llvm-svn: 293738	2017-02-01 02:54:39 +00:00
Evandro Menezes	94edf02923	[CodeGen] Move MacroFusion to the target This patch moves the class for scheduling adjacent instructions, MacroFusion, to the target. In AArch64, it also expands the fusion to all instructions pairs in a scheduling block, beyond just among the predecessors of the branch at the end. Differential revision: https://reviews.llvm.org/D28489 llvm-svn: 293737	2017-02-01 02:54:34 +00:00
Sanjoy Das	15e50b510e	[ImplicitNullCheck] NFC isSuitableMemoryOp cleanup Summary: The isSuitableMemoryOp method is responsible for verifying that an instruction is a candidate for use in an implicit null check. Additionally, it checks that the base register has not been redefined earlier. If the base has been redefined, it just returns false and the lookup continues, even though no later instruction can pass this check either. This results in redundant further operations. So when we find that the base register has been redefined, we just stop. Patch by Serguei Katkov! Reviewers: reames, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29119 llvm-svn: 293736	2017-02-01 02:49:25 +00:00
Justin Bogner	41e632bf6b	SanitizerCoverage: Support sanitizer guard section on darwin MachO's sections need a segment as well as a section name, and the section start and end symbols are spelled differently than on ELF. llvm-svn: 293733	2017-02-01 02:38:39 +00:00
Matthias Braun	8d115a384c	MCMacho: Allow __thread_ptr section after dwarf sections Differential Revision: https://reviews.llvm.org/D29315 llvm-svn: 293730	2017-02-01 01:31:36 +00:00
Eugene Zelenko	926883e1c2	[Mips] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 293729	2017-02-01 01:22:51 +00:00
Stanislav Mekhanoshin	70c245e92d	Fix regalloc assignment of overlapping registers SplitEditor::defFromParent() can create a register copy. If the register is a tuple of other registers and not all lanes are used, a copy will be done on the full tuple regardless. Later, the register unit for an unused lane will be considered free, and another overlapping register tuple can be assigned to a different value even though the first register is live at that point. That is because interference only looks at liveness info, while a full register copy clobbers all lanes, even unused ones. This patch fixes the copy to only cover used lanes. Differential Revision: https://reviews.llvm.org/D29105 llvm-svn: 293728	2017-02-01 01:18:36 +00:00
Davide Italiano	7343b9f340	[IPSCCP] Teach how to not propagate return values of naked functions. Differential Revision: https://reviews.llvm.org/D29360 llvm-svn: 293727	2017-02-01 01:01:22 +00:00
Matt Arsenault	da7a656542	AMDGPU: Cleanup fmin/fmax legacy function Use a more specific subtarget check and combine hasOneUse checks llvm-svn: 293726	2017-02-01 00:42:40 +00:00
Matt Arsenault	bdd59e6879	InferAddressSpaces: Handle select This fails to handle some cases where one of the inputs is a constant; to be fixed in a later commit. llvm-svn: 293723	2017-02-01 00:08:53 +00:00
Kostya Serebryany	5c76e3d034	[libFuzzer] increase the default size for shmem llvm-svn: 293722	2017-02-01 00:07:47 +00:00
Dean Michael Berris	0e8ababf7d	[XRay] Define the InstrumentationMap type Summary: This change implements the instrumentation map loading library which can understand both YAML-defined instrumentation maps, and ELF 64-bit object files that have the XRay instrumentation map section. We break it out into a library on its own to allow for other applications to deal with the XRay instrumentation map defined in XRay-instrumented binaries. This type provides both raw access to the logical representation of the instrumentation map entries as well as higher level functions for converting a function ID into a function address. At this point we only support ELF64 binaries and YAML-defined XRay instrumentation maps. Future changes should extend this to support 32-bit ELF binaries, as well as other binary formats (like MachO). As part of this change we also migrate all uses of the extraction logic that used to be defined in tools/llvm-xray/ to use this new type and interface for loading from files. We also remove the flag from the `llvm-xray` tool that required users to specify the type of the instrumentation map file being provided to instead make the library auto-detect the file type. Reviewers: dblaikie Subscribers: mgorny, varno, llvm-commits Differential Revision: https://reviews.llvm.org/D29319 llvm-svn: 293721	2017-02-01 00:05:29 +00:00
Matt Arsenault	864fbacb4a	InferAddressSpaces: Remove dead declaration llvm-svn: 293720	2017-01-31 23:57:20 +00:00
Matt Arsenault	517a290e4f	InferAddressSpaces: Avoid double map lookup llvm-svn: 293719	2017-01-31 23:48:44 +00:00
Matt Arsenault	2a46d81038	InferAddressSpaces: Fix broken casting of constants llvm-svn: 293718	2017-01-31 23:48:40 +00:00
Matt Arsenault	1575cb893c	AMDGPU: Fix warning llvm-svn: 293717	2017-01-31 23:48:37 +00:00
Kyle Butt	b15c06677c	CodeGen: Allow small copyable blocks to "break" the CFG. When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability in the other direction. For small blocks that can be duplicated we now skip that requirement as well, subject to some simple frequency calculations. Differential Revision: https://reviews.llvm.org/D28583 llvm-svn: 293716	2017-01-31 23:48:32 +00:00
Rafael Espindola	a86be22230	Move more code to helper functions. NFC. llvm-svn: 293715	2017-01-31 23:26:32 +00:00
Justin Lebar	06fcea4cd9	[NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x). x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as desired. Verified that the particular nvptx approximate instructions here do in fact return 0 for x = 0. llvm-svn: 293713	2017-01-31 23:08:57 +00:00
Rafael Espindola	d9953d9dd2	Move some code to a helper function. NFC. llvm-svn: 293712	2017-01-31 23:07:08 +00:00
Michael Kuperstein	e18aad39ab	Shut up GCC warning about operator precedence. NFC. Technically, this actually changes the expression and the original assert was "wrong", but since the conjunction is with true, it doesn't matter in this case. llvm-svn: 293709	2017-01-31 22:48:45 +00:00
Daniel Berlin	97718e6081	NewGVN: Dead argument cleanup llvm-svn: 293708	2017-01-31 22:32:03 +00:00
Daniel Berlin	ff12c922fe	NewGVN: Cleanup conditions to match reality llvm-svn: 293707	2017-01-31 22:32:01 +00:00
Daniel Berlin	c22aafe5b3	NewGVN: Add basic support for symbolic comparison evaluation llvm-svn: 293706	2017-01-31 22:31:58 +00:00
Daniel Berlin	808e3ff8a2	NewGVN: Formatting cleanup after lookupOperandLeader change llvm-svn: 293705	2017-01-31 22:31:56 +00:00
Daniel Berlin	203f47bbd8	NewGVN: Remove the two unused arguments from lookupOperandLeader. llvm-svn: 293704	2017-01-31 22:31:53 +00:00
Daniel Berlin	74d300361a	NewGVN: Cleanup header files we are using. llvm-svn: 293703	2017-01-31 22:31:50 +00:00
David Blaikie	0012dd5db1	Add a verbose/human readable mode to llvm-symbolizer to investigate discriminators and other line table/backtrace features Patch by Simon Que! Differential Revision: https://reviews.llvm.org/D29094 llvm-svn: 293697	2017-01-31 22:19:38 +00:00
Davide Italiano	116464a55d	[NewGVN] Preserve TargetLibraryInfo analysis. We can maybe preserve more but this is a first step. Ack'ed by Danny on IRC. llvm-svn: 293694	2017-01-31 21:53:18 +00:00
Davide Italiano	5a473d230d	[Support] Add newline when dumping an APInt. This annoyed me a few times but was lazy so I haven't fixed it until today, when the output of my debugger was too confusing. llvm-svn: 293691	2017-01-31 21:26:18 +00:00
Rafael Espindola	2d55781ae3	Make this file clang-format friendly and clang-format it. llvm-svn: 293689	2017-01-31 21:11:12 +00:00
Taewook Oh	75acec8a14	Do not propagate DebugLoc across basic blocks Summary: DebugLoc shouldn't be propagated across basic blocks to prevent incorrect stepping and imprecise sample profile result. rL288903 addressed the wrong DebugLoc propagation issue by limiting the copy of DebugLoc when GVN removes a fully redundant load that is dominated by some other load. However, DebugLoc is still incorrectly propagated in the following example: ``` 1: extern int g; 2: 3: void foo(int x, int y, int z) { 4: if (x) 5: g = 0; 6: else 7: g = 1; 8: 9: int i = 0; 10: for ( ; i < y ; i++) 11: if (i > z) 12: g++; 13: } ``` Below is LLVM IR representation of the program before GVN: ``` @g = external local_unnamed_addr global i32, align 4 ; Function Attrs: nounwind uwtable define void @foo(i32 %x, i32 %y, i32 %z) local_unnamed_addr #0 !dbg !4 { entry: %not.tobool = icmp eq i32 %x, 0, !dbg !8 %.sink = zext i1 %not.tobool to i32, !dbg !8 store i32 %.sink, i32* @g, align 4, !tbaa !9 %cmp8 = icmp sgt i32 %y, 0, !dbg !13 br i1 %cmp8, label %for.body.preheader, label %for.end, !dbg !17 for.body.preheader: ; preds = %entry br label %for.body, !dbg !19 for.body: ; preds = %for.body.preheader, %for.inc %i.09 = phi i32 [ %inc4, %for.inc ], [ 0, %for.body.preheader ] %cmp1 = icmp sgt i32 %i.09, %z, !dbg !19 br i1 %cmp1, label %if.then2, label %for.inc, !dbg !21 if.then2: ; preds = %for.body %0 = load i32, i32* @g, align 4, !dbg !22, !tbaa !9 %inc = add nsw i32 %0, 1, !dbg !22 store i32 %inc, i32* @g, align 4, !dbg !22, !tbaa !9 br label %for.inc, !dbg !23 for.inc: ; preds = %for.body, %if.then2 %inc4 = add nuw nsw i32 %i.09, 1, !dbg !24 %exitcond = icmp ne i32 %inc4, %y, !dbg !13 br i1 %exitcond, label %for.body, label %for.end.loopexit, !dbg !17 for.end.loopexit: ; preds = %for.inc br label %for.end, !dbg !26 for.end: ; preds = %for.end.loopexit, %entry ret void, !dbg !26 } ``` where ``` !21 = !DILocation(line: 11, column: 9, scope: !15) !22 = !DILocation(line: 12, column: 8, scope: !20) !23 = !DILocation(line: 12, column: 7, scope: !20) !24 = !DILocation(line: 10, column: 20, scope: !25) ``` And below is after GVN: ``` @g = external local_unnamed_addr global i32, align 4 define void @foo(i32 %x, i32 %y, i32 %z) local_unnamed_addr !dbg !4 { entry: %not.tobool = icmp eq i32 %x, 0, !dbg !8 %.sink = zext i1 %not.tobool to i32, !dbg !8 store i32 %.sink, i32* @g, align 4, !tbaa !9 %cmp8 = icmp sgt i32 %y, 0, !dbg !13 br i1 %cmp8, label %for.body.preheader, label %for.end, !dbg !17 for.body.preheader: ; preds = %entry br label %for.body, !dbg !19 for.body: ; preds = %for.inc, %for.body.preheader %0 = phi i32 [ %1, %for.inc ], [ %.sink, %for.body.preheader ], !dbg !21 %i.09 = phi i32 [ %inc4, %for.inc ], [ 0, %for.body.preheader ] %cmp1 = icmp sgt i32 %i.09, %z, !dbg !19 br i1 %cmp1, label %if.then2, label %for.inc, !dbg !22 if.then2: ; preds = %for.body %inc = add nsw i32 %0, 1, !dbg !21 store i32 %inc, i32* @g, align 4, !dbg !21, !tbaa !9 br label %for.inc, !dbg !23 for.inc: ; preds = %if.then2, %for.body %1 = phi i32 [ %inc, %if.then2 ], [ %0, %for.body ] %inc4 = add nuw nsw i32 %i.09, 1, !dbg !24 %exitcond = icmp ne i32 %inc4, %y, !dbg !13 br i1 %exitcond, label %for.body, label %for.end.loopexit, !dbg !17 for.end.loopexit: ; preds = %for.inc br label %for.end, !dbg !26 for.end: ; preds = %for.end.loopexit, %entry ret void, !dbg !26 } ``` As you see, GVN removes the load in if.then2 block and creates a phi instruction in for.body for it. The problem is that the DebugLoc of the removed load instruction is propagated to the newly created phi instruction, which is wrong. rL288903 cannot handle this case because ValuesPerBlock.size() is not 1 in this example when the load is removed. Reviewers: aprantl, andreadb, wolfgangp Reviewed By: andreadb Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D29254 llvm-svn: 293688	2017-01-31 20:57:13 +00:00
Tim Northover	c6bfa481cf	GlobalISel: the translation of an invoke must branch to the good block. Otherwise bad things happen if the basic block order isn't trivial after an invoke. llvm-svn: 293679	2017-01-31 20:12:18 +00:00
Matthias Braun	01fa962226	InterleaveAccessPass: Avoid constructing invalid shuffle masks Fix a bug where we would construct shufflevector instructions addressing invalid elements. Differential Revision: https://reviews.llvm.org/D29313 llvm-svn: 293673	2017-01-31 18:37:53 +00:00
Tim Northover	293f74355b	GlobalISel: merge invoke and call translation paths. Well, sort of. But the lower-level code that invoke used to be using completely botched the handling of varargs functions, which hopefully won't be possible if they're using the same code. llvm-svn: 293670	2017-01-31 18:36:11 +00:00
Peter Collingbourne	d763c4cc85	MC: Introduce the ABS8 symbol modifier. @ABS8 can be applied to symbols which appear as immediate operands to instructions that have a 8-bit immediate form for that operand. It causes the assembler to use the 8-bit form and an 8-bit relocation (e.g. R_386_8 or R_X86_64_8) for the symbol. Differential Revision: https://reviews.llvm.org/D28688 llvm-svn: 293667	2017-01-31 18:28:44 +00:00
Davide Italiano	aec4617dc8	[Instcombine] Combine consecutive identical fences Differential Revision: https://reviews.llvm.org/D29314 llvm-svn: 293661	2017-01-31 18:09:05 +00:00
Arnold Schwaighofer	c368563bd6	Don't combine stores to a swifterror pointer operand to a different type llvm-svn: 293658	2017-01-31 17:53:49 +00:00
Dehao Chen	274df5ea41	Explicitly promote indirect calls before sample profile annotation. Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation. Reviewers: xur, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29040 llvm-svn: 293657	2017-01-31 17:49:37 +00:00
Matt Arsenault	d5d78510c7	AMDGPU: Use source mods with fcanonicalize llvm-svn: 293654	2017-01-31 17:28:40 +00:00
Sanjay Patel	2217f75ad1	fix formatting; NFC llvm-svn: 293652	2017-01-31 17:25:42 +00:00
Nirav Dave	a7c041d147	[X86] Implement -mfentry Summary: Insert calls to __fentry__ at function entry. Reviewers: hfinkel, craig.topper Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D28000 llvm-svn: 293648	2017-01-31 17:00:27 +00:00
David Bozier	60b80d2233	Add support for demangling C++11 thread_local variables. In clang, the grammar for mangling these names is "<special-name> ::= TW <object name>" for wrapper variables or "<special-name> ::= TH <object name>" for initialization variables. Initial change was made in libcxxabi r293638 llvm-svn: 293643	2017-01-31 15:56:36 +00:00
Tom Stellard	124f5cc8c2	AMDGPU/SI: Fix inst-select-load-smrd.mir on some builds Summary: For some reason instructions are being inserted in the wrong order with some builds. I'm not sure why this is happening. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D29325 llvm-svn: 293639	2017-01-31 15:24:11 +00:00
Simon Pilgrim	1b39d5db7b	[X86][SSE] Add support for combining PINSRB into a target shuffle. llvm-svn: 293637	2017-01-31 14:59:44 +00:00
Nicolai Haehnle	8813d5d221	[DAGCombine] require UnsafeFPMath for re-association of addition Summary: The affected transforms all implicitly use associativity of addition, for which we usually require unsafe math to be enabled. The "Aggressive" flag is only meant to convey information about the performance of the fused ops relative to a fmul+fadd sequence. Fixes Bug 31626. Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD Subscribers: jholewinski, nemanjai, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D28675 llvm-svn: 293635	2017-01-31 14:35:37 +00:00
Sam Parker	9bf658d5fe	[ARM] Avoid using ARM instructions in Thumb mode The Requires class overrides the target requirements of an instruction, rather than adding to them, so all ARM instructions need to include the IsARM predicate when they have overwritten requirements. This caused the swp and swpb instructions to be allowed in thumb mode assembly, and the ARM encoding of CDP to be selected in codegen (which is different for conditional instructions). Differential Revision: https://reviews.llvm.org/D29283 llvm-svn: 293634	2017-01-31 14:35:01 +00:00
Benjamin Kramer	94a833962c	[X86] Silence unused variable warning in Release builds. llvm-svn: 293631	2017-01-31 14:13:53 +00:00
Silviu Baranga	c6d21eba0e	[InstCombine] Make sure that LHS and RHS have the same type in transformToIndexedCompare If they don't have the same type, the size of the constant index would need to be adjusted (and this wouldn't be always possible). Alternatively we could try the analysis with the initial RHS value, which would guarantee that the two sides have the same type. However it is unlikely that in practice this would pass our transformation requirements. Fixes PR31808 (https://llvm.org/bugs/show_bug.cgi?id=31808). llvm-svn: 293629	2017-01-31 14:04:15 +00:00
Simon Pilgrim	4eab18f6b8	[X86][SSE] Detect unary PBLEND shuffles. These can appear during shuffle combining. llvm-svn: 293628	2017-01-31 13:58:01 +00:00
Simon Pilgrim	c29eab52e8	[X86][SSE] Add support for combining PINSRW into a target shuffle. Also add the ability to recognise PINSR(Vex, 0, Idx). Target shuffle combines won't replace multiple insertions with a bit mask until a depth of 3 or more, so we avoid codesize bloat. The unnecessary vpblendw in clearupper8xi16a will be fixed in an upcoming patch. llvm-svn: 293627	2017-01-31 13:51:10 +00:00
Nemanja Ivanovic	2f2a6ab991	[PowerPC][Altivec] Add vmr extended mnemonic Just adds the vmr (Vector Move Register) mnemonic for the VOR instruction in the PPC back end. Committing on behalf of brunoalr (Bruno Rosa). Differential Revision: https://reviews.llvm.org/D29133 llvm-svn: 293626	2017-01-31 13:43:11 +00:00
Florian Hahn	5364cf3b56	[LoopUnroll] Use addClonedBlockToLoopInfo to clone the top level loop (NFC) Summary: rL293124 added the necessary infrastructure to properly add the cloned top level loop to LoopInfo, which means we do not have to do it manually in CloneLoopBlocks. @mkuper sorry for not pointing this out during my review of D29156, I just realized that today. Reviewers: mzolotukhin, chandlerc, mkuper Reviewed By: mkuper Subscribers: llvm-commits, mkuper Differential Revision: https://reviews.llvm.org/D29173 llvm-svn: 293615	2017-01-31 11:13:44 +00:00
Simon Dardis	12850eeac5	[mips] Addition of the immediate cases for the instructions [d]div, [d]divu Related to http://reviews.llvm.org/D15772 Depends on http://reviews.llvm.org/D16888 Adds support for immediate operand for [D]DIV[U] instructions. Patch By: Srdjan Obucina Reviewers: zoran.jovanovic, vkalintiris, dsanders, obucina Differential Revision: https://reviews.llvm.org/D16889 llvm-svn: 293614	2017-01-31 10:49:24 +00:00
Craig Topper	2cfa2071bd	[AVX-512] Don't bother looking into the AVX512DQ execution domain fixing tables if AVX512DQ isn't supported since we can't do any conversion anyway. llvm-svn: 293608	2017-01-31 06:49:55 +00:00
Craig Topper	797e32dd98	[X86] Add AVX and SSE2 version of MOVSDmr to execution domain fixing table. AVX-512 already did this for the EVEX version. llvm-svn: 293607	2017-01-31 06:49:53 +00:00
Craig Topper	779e4c5bb4	[AVX-512] Fix copy and paste bug in execution domain fixing tables so that we can convert 256-bit movnt instructions. llvm-svn: 293606	2017-01-31 06:49:50 +00:00
Justin Lebar	1c9692a46f	[NVPTX] Implement NVPTXTargetLowering::getSqrtEstimate. Summary: This lets us lower to sqrt.approx and rsqrt.approx under more circumstances. * Now we emit sqrt.approx and rsqrt.approx for calls to @llvm.sqrt.f32, when fast-math is enabled. Previously, we only would emit it for calls to @llvm.nvvm.sqrt.f. (With this patch we no longer emit sqrt.approx for calls to @llvm.nvvm.sqrt.f; we rely on instcombine to simplify llvm.nvvm.sqrt.f into llvm.sqrt.f32.) * Now we emit the ftz version of rsqrt.approx when ftz is enabled. Previously, we only emitted rsqrt.approx when ftz was disabled. Reviewers: hfinkel Subscribers: llvm-commits, tra, jholewinski Differential Revision: https://reviews.llvm.org/D28508 llvm-svn: 293605	2017-01-31 05:58:22 +00:00
Craig Topper	06e038c6de	[X86] Update the broadcast fallback patterns to use shuffle instructions from the appropriate execution domain. llvm-svn: 293603	2017-01-31 05:18:29 +00:00
Craig Topper	e9e84c8284	[AVX-512] Fix the ExeDomain for VMOVDDUP, VMOVSLDUP, and VMOVSHDUP. llvm-svn: 293601	2017-01-31 05:18:24 +00:00
Matt Arsenault	f84e5d9a27	AMDGPU: Generalize matching of v_med3_f32 I think this is safe as long as no inputs are known to ever be nans. Also add an intrinsic for fmed3 to be able to handle all safe math cases. llvm-svn: 293598	2017-01-31 03:07:46 +00:00
Matt Arsenault	973c4aebad	InferAddressSpaces: Rename constant llvm-svn: 293594	2017-01-31 02:17:41 +00:00
Matt Arsenault	72f259b8eb	InferAddressSpaces: Handle icmp llvm-svn: 293593	2017-01-31 02:17:32 +00:00
Craig Topper	d064cc93b2	[X86] Remove patterns for X86VPermilpi with integer types. I don't think we've formed these since the shuffle lowering rewrite. llvm-svn: 293592	2017-01-31 02:09:53 +00:00
Craig Topper	85935f69fb	[X86] Remove duplicate patterns for X86VPermilpv that already exist in the instructions themselves. llvm-svn: 293591	2017-01-31 02:09:51 +00:00
Craig Topper	ced68315ce	[X86] Remove patterns for selecting PSHUFD with FP types. We don't seem to do this anymore and the AVX case definitely should be using VPERMILPS anyway. llvm-svn: 293590	2017-01-31 02:09:49 +00:00
Craig Topper	b76494e017	[X86] Remove 'else' after 'return'. NFC llvm-svn: 293589	2017-01-31 02:09:46 +00:00
Craig Topper	f9d901f0ea	[X86] Use integer broadcast instructions for integer broadcast patterns. I'm not sure why we were using an FP instruction before and had to have a comment calling attention to it, but not justifying it. llvm-svn: 293588	2017-01-31 02:09:43 +00:00
Matt Arsenault	6d5a8d48fd	InferAddressSpaces: Support memory intrinsics llvm-svn: 293587	2017-01-31 01:56:57 +00:00
Matt Arsenault	6c907a9bb3	InferAddressSpaces: Support atomics llvm-svn: 293584	2017-01-31 01:40:38 +00:00
Matt Arsenault	d89a6e11a7	InferAddressSpaces: Don't replace volatile users llvm-svn: 293582	2017-01-31 01:30:16 +00:00
Matt Arsenault	b6491cc854	AMDGPU: Implement hook for InferAddressSpaces For now just port some of the existing NVPTX tests and from an old HSAIL optimization pass which approximately did the same thing. Don't enable the pass yet until more testing is done. llvm-svn: 293580	2017-01-31 01:20:54 +00:00
Matt Arsenault	850657a439	NVPTX: Move InferAddressSpaces to generic code llvm-svn: 293579	2017-01-31 01:10:58 +00:00
Eugene Zelenko	342257ea92	[ARM] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 293578	2017-01-31 00:56:17 +00:00
Eli Friedman	10d1ff64fe	[SCEV] Simplify/generalize howFarToZero solving. Make SolveLinEquationWithOverflow take the start as a SCEV, so we can solve more cases. With that implemented, get rid of the special case for powers of two. The additional functionality probably isn't particularly useful, but it might help a little for certain cases involving pointer arithmetic. Differential Revision: https://reviews.llvm.org/D28884 llvm-svn: 293576	2017-01-31 00:42:42 +00:00
Keno Fischer	578cf7aae7	[ExecutionDepsFix] Improve clearance calculation for loops Summary: In revision rL278321, ExecutionDepsFix learned how to pick a better register for undef register reads, e.g. for instructions such as `vcvtsi2sdq`. While this revision improved performance on a good number of our benchmarks, it unfortunately also caused significant regressions (up to 3x) on others. This regression turned out to be caused by loops such as: PH -> A -> B (xmm<Undef> -> xmm<Def>) -> C -> D -> EXIT ^ \| +----------------------------------+ In the previous version of the clearance calculation, we would visit the blocks in order, remembering for each whether there were any incoming backedges from blocks that we hadn't processed yet and if so queuing up the block to be re-processed. However, for loop structures such as the above, this is clearly insufficient, since the block B does not have any unknown backedges, so we do not see the false dependency from the previous iteration's Def of xmm registers in B. To fix this, we need to consider all blocks that are part of the loop and reprocess them once the correct clearance values are known. As an optimization, we also want to avoid reprocessing any later blocks that are not part of the loop. In summary, the iteration order is as follows: Before: PH A B C D A' Corrected (Naive): PH A B C D A' B' C' D' Corrected (w/ optimization): PH A B C A' B' C' D To facilitate this optimization we introduce two new counters for each basic block. The first counts how many of its predecessors have completed primary processing. The second counts how many of its predecessors have completed all processing (we will call such a block done). Now, the criteria to reprocess a block is as follows: - All Predecessors have completed primary processing - For x the number of predecessors that have completed primary processing at the time of primary processing of this block, the number of predecessors that are done has reached x. The intuition behind this criterion is as follows: We need to perform primary processing on all predecessors in order to find out any direct defs in those predecessors. When predecessors are done, we also know that we have information about indirect defs (e.g. in block B though that were inherited through B->C->A->B). However, we can't wait for all predecessors to be done, since that would cause cyclic dependencies. However, it is guaranteed that all those predecessors that are prior to us in reverse postorder will be done before us. Since we iterate over the basic blocks in reverse postorder, the number x above is precisely the count of the number of predecessors prior to us in reverse postorder. Reviewers: myatsina Differential Revision: https://reviews.llvm.org/D28759 llvm-svn: 293571	2017-01-30 23:37:03 +00:00
Sanjay Patel	8c5f236197	[InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1 - C2) for vectors with splat constants llvm-svn: 293570	2017-01-30 23:35:52 +00:00
Derek Schuff	6d76b7b455	[WebAssembly] Add wasm support for llvm-readobj Create a WasmDumper subclass of ObjDumper to support Webassembly binary files. Patch by Sam Clegg Differential Revision: https://reviews.llvm.org/D27355 llvm-svn: 293569	2017-01-30 23:30:52 +00:00
Matt Arsenault	9f432ec24c	NVPTX: Trivial cleanups of NVPTXInferAddressSpaces - Move DEBUG_TYPE below includes - Change unknown address space constant to be consistent with other passes - Grammar fixes in debug output llvm-svn: 293567	2017-01-30 23:27:11 +00:00
Eugene Zelenko	dde94e4c4f	[Mips] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 293565	2017-01-30 23:21:32 +00:00
Benjamin Kramer	365c9bd941	[ICP] Fix bool conversion warning and actually write out the reason instead of dropping it. llvm-svn: 293564	2017-01-30 23:11:29 +00:00
Matt Arsenault	42b6478344	NVPTX: Refactor NVPTXInferAddressSpaces to check TTI Add a new TTI hook for getting the generic address space value. llvm-svn: 293563	2017-01-30 23:02:12 +00:00
Sanjay Patel	0c39d56a60	[InstCombine] enable more lshr(shl X, C1), C2 folds for vectors with splat constants llvm-svn: 293562	2017-01-30 23:01:05 +00:00
Simon Pilgrim	3905e03a47	[X86][SSE] Fix unsigned <= 0 warning in assert. NFCI. Thanks to @mkuper llvm-svn: 293561	2017-01-30 22:58:44 +00:00
Simon Pilgrim	a80a47afef	[X86][SSE] Generalize the number of decoded shuffle inputs. NFCI. combineX86ShufflesRecursively can still only handle a maximum of 2 shuffle inputs but everything before it now supports any number of shuffle inputs. This will be necessary for combining OR(SHUFFLE, SHUFFLE) patterns. llvm-svn: 293560	2017-01-30 22:48:49 +00:00
Dehao Chen	6775f5d629	Expose isLegalToPromote as a global helper function so that SamplePGO pass can call it for legality check. Summary: SamplePGO needs to check if it is legal to promote a target before it actually promotes it. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29306 llvm-svn: 293559	2017-01-30 22:46:37 +00:00
Dehao Chen	6217fa44b8	Revert r292979 which causes compile time failure. llvm-svn: 293557	2017-01-30 22:26:05 +00:00
Eli Friedman	2345733246	Fix line endings. llvm-svn: 293554	2017-01-30 22:04:23 +00:00
Tom Stellard	887a2562b7	AMDGPU: Fix release build broken by r293551 llvm-svn: 293553	2017-01-30 22:02:58 +00:00
Tom Stellard	ca16621b2a	Re-commit AMDGPU/GlobalISel: Add support for simple shaders Fix build when global-isel is disabled and fix a warning. Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293551	2017-01-30 21:56:46 +00:00
Tim Northover	2bf8c9d381	GlobalISel: correctly translate invoke when callee is a register. This should fix the GlobalISel verifier. llvm-svn: 293550	2017-01-30 21:45:21 +00:00
Stanislav Mekhanoshin	a3b72798af	[AMDGPU] Internalize non-kernel symbols Since we have no call support and late linking we can produce code only for used symbols. This saves compilation time, size of the final executable, and size of any intermediate dumps. Run Internalize pass early in the opt pipeline followed by global DCE pass. To enable it RT can pass -amdgpu-internalize-symbols option. Differential Revision: https://reviews.llvm.org/D29214 llvm-svn: 293549	2017-01-30 21:05:18 +00:00
Tim Northover	c944970484	GlobalISel: account for differing exception selector sizes. For some reason the exception selector register must be a pointer (that's assumed by SDag); on the other hand, it gets moved into an IR-level type which might be entirely different (i32 on AArch64). IRTranslator needs to be aware of this. llvm-svn: 293546	2017-01-30 20:52:42 +00:00
Tim Northover	c94d70336b	GlobalISel: tidy up def/use test. NFC. llvm-svn: 293545	2017-01-30 20:52:37 +00:00
Matt Arsenault	1f2ca66317	LSR: Don't drop address space when type doesn't match For targets with different addressing modes in each address space, if this is dropped querying isLegalAddressingMode later with this will give a nonsense result, breaking the isLegalUse assertions. This is a candidate for the 4.0 release branch. llvm-svn: 293542	2017-01-30 19:50:17 +00:00
Tim Northover	79f43f195c	GlobalISel: translate memset & memmove. llvm-svn: 293541	2017-01-30 19:33:07 +00:00
Matt Arsenault	af635240d5	AMDGPU: Undo sub x, c -> add x, -c canonicalization This is worse if the original constant is an inline immediate. This should also be done for 64-bit adds, but requires fixing operand folding bugs first. llvm-svn: 293540	2017-01-30 19:30:24 +00:00
Krzysztof Parzyszek	3695d06a10	[RDF] Add support for regmasks llvm-svn: 293538	2017-01-30 19:16:30 +00:00
Tim Northover	480609d0f3	GlobalISel: permit unused vregs without a register-class after ISel. This can happen if earlier combining has removed all uses of some VReg, which is fine and shouldn't flag an error. llvm-svn: 293537	2017-01-30 19:12:50 +00:00
Benjamin Kramer	a9df941403	Fix the GCC build. This is fairly ugly, but apparently GCC still doesn't understand C++11. llvm-svn: 293535	2017-01-30 19:05:09 +00:00
Simon Pilgrim	ffe2535cf6	Use SelectionDAG::getBuildVector helper function where possible. NFCI. llvm-svn: 293532	2017-01-30 18:53:45 +00:00
Benjamin Kramer	a846e0b082	[MC] Remove global constructors from MCSectionMachO.cpp. llvm-svn: 293526	2017-01-30 18:46:26 +00:00
Matt Arsenault	0c3293844b	AMDGPU: Run AMDGPUCodeGenPrepare after inlining With leaf functions, this makes nonsensical decisions based on the uniformity of the arguments. llvm-svn: 293525	2017-01-30 18:40:29 +00:00
Sanjay Patel	373db5ba6c	[InstCombine] enable (X >>?exact C1) << C2 --> X >>?exact (C1-C2) for vectors with splat constants llvm-svn: 293524	2017-01-30 18:40:23 +00:00
Justin Bogner	8f520a73b2	SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we happened to replace a node during UpdateChains, because it would be left in the list we were iterating over. This nulls out the pointer when that happens so that we can avoid the issue. Fixes llvm.org/PR31710 llvm-svn: 293522	2017-01-30 18:29:46 +00:00
Simon Pilgrim	0a5ab5c4db	Use SelectionDAG::getBuildVector/getSplatBuildVector helper functions where possible. NFCI. llvm-svn: 293520	2017-01-30 18:20:42 +00:00
Marcos Pividori	d2406ea900	[libFuzzer] Implement TmpDir() for Windows. Differential Revision: https://reviews.llvm.org/D28977 llvm-svn: 293516	2017-01-30 18:14:53 +00:00
Daniel Berlin	a53a72243a	NewGVN: Instead of changeToUnreachable, insert an instruction SimplifyCFG will turn into unreachable when it runs llvm-svn: 293515	2017-01-30 18:12:56 +00:00
Matt Arsenault	ee3f0acf20	AMDGPU: Make i32 uaddo/usubo legal llvm-svn: 293514	2017-01-30 18:11:38 +00:00
Matt Arsenault	32e6bfa20f	DAG: Fold fneg into compare with constant into the constant fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred) InstCombine already does this. llvm-svn: 293512	2017-01-30 17:57:28 +00:00
Krzysztof Parzyszek	49ffff12e5	[RDF] Extract the physical register information into a separate class llvm-svn: 293510	2017-01-30 17:46:56 +00:00
Tom Stellard	7a19d56f73	Revert "AMDGPU/GlobalISel: Add support for simple shaders" This reverts commit r293503. Revert while I investigate some of the buildbot failures. llvm-svn: 293509	2017-01-30 17:42:41 +00:00
Sanjay Patel	062c14af5c	[InstCombine] use auto with obvious type; NFC llvm-svn: 293508	2017-01-30 17:38:55 +00:00
Sanjay Patel	77732d5033	[InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1-C2) for vectors with splat constants llvm-svn: 293507	2017-01-30 17:19:32 +00:00
David Blaikie	a66696f210	unique_ptrify some containers in GlobalISel::RegisterBankInfo To simplify/clarify memory ownership, make leaks (as one was found/fixed recently) harder to write, etc. (also, while I was there - removed a duplicate lookup in a container) llvm-svn: 293506	2017-01-30 17:13:56 +00:00
Matt Arsenault	41c1499504	AMDGPU: Fix atomic_inc/atomic_dec + ds_swizzle not being divergent llvm-svn: 293504	2017-01-30 17:09:47 +00:00
Tom Stellard	e48f60aec8	AMDGPU/GlobalISel: Add support for simple shaders Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293503	2017-01-30 17:09:15 +00:00
Daniel Berlin	e19f0e01a8	Revert "NewGVN: Make unreachable blocks be marked with unreachable" This reverts commit r293196 Besides making things look nicer, ATM, we'd like to preserve analysis more than we'd like to destroy the CFG. We'll probably revisit in the future llvm-svn: 293501	2017-01-30 17:06:55 +00:00
Simon Pilgrim	098998aef0	[X86][SSE] Add support for combining PINSRW+ASSERTZEXT+PEXTRW patterns with target shuffles llvm-svn: 293500	2017-01-30 16:58:34 +00:00
Matt Arsenault	0c687390fe	DAG: Constant fold fp16_to_fp/fp16_to_fp This fixes emitting conversions of constants on targets without legal f16 that need to use these for legalization. llvm-svn: 293499	2017-01-30 16:57:41 +00:00
Sanjay Patel	8e644c08ee	[InstCombine] fixed to propagate 'exact' on lshr The original shift is bigger, so this may qualify as 'obvious', but here's an attempt at an Alive-based proof: Name: exact Pre: (C1 u< C2) %a = shl i8 %x, C1 %b = lshr exact i8 %a, C2 => %c = lshr exact i8 %x, C2 - C1 %b = and i8 %c, ((1 << width(C1)) - 1) u>> C2 Optimization is correct! llvm-svn: 293498	2017-01-30 16:53:03 +00:00
Benjamin Kramer	585756568c	[Coroutines] Add header guard to header that's missing one. llvm-svn: 293494	2017-01-30 16:32:20 +00:00
Adam Nemet	e7bdf227f6	[Inliner] Fold analysis remarks into missed remarks This significantly reduces the noise level of these messages. llvm-svn: 293492	2017-01-30 16:22:45 +00:00
Krzysztof Parzyszek	b561cf953a	[RDF] Add phis for entry block live-ins (in addition to function live-ins) llvm-svn: 293491	2017-01-30 16:20:30 +00:00
Haicheng Wu	f8dc2d8c8b	[Inliner] Fix a comment to match the code. NFC. TotalAltCost => TotalSecondaryCost Differential Revision: https://reviews.llvm.org/D29231 llvm-svn: 293490	2017-01-30 16:15:14 +00:00
Sanjay Patel	1196d7cd7f	[InstCombine] enable lshr(shl X, C1), C2 folds for vectors with splat constants llvm-svn: 293489	2017-01-30 16:11:40 +00:00
Rafael Espindola	e0eba3c493	Only print architecture dependent flags for that architecture. Different architectures can have different meanings for flags in the SHF_MASKPROC mask, so we should always check which architecture is in use before checking the flag. NFC for now, but will allow fixing the value of an xmos flag. llvm-svn: 293484	2017-01-30 15:38:43 +00:00
Benjamin Kramer	73564981fe	[Hexagon] Make header self-contained. llvm-svn: 293482	2017-01-30 14:55:33 +00:00
Asaf Badouh	e11d2d73bf	[X86][MCU] Minor bug fix for r293469 + test case llvm-svn: 293478	2017-01-30 13:14:37 +00:00
Marek Olsak	e81adb52b1	AMDGPU: Remove a useless VI SMRD pattern Summary: already covered by complex patterns Reviewers: arsenm, nhaehnle, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28995 llvm-svn: 293477	2017-01-30 12:25:14 +00:00
Marek Olsak	8e93529020	AMDGPU: Fix assembler encoding for EXP instructions on VI Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28992 llvm-svn: 293476	2017-01-30 12:25:03 +00:00
Daniel Berlin	9d8a335ce0	Revert "[MemorySSA] Revert r293361 and r293363, as the tests fail under asan." This reverts commit r293471, reapplying r293361 and r293363 with a fix for an out-of-bounds read. llvm-svn: 293474	2017-01-30 11:35:39 +00:00
Sam McCall	b9d6c10c2d	[MemorySSA] Revert r293361 and r293363, as the tests fail under asan. llvm-svn: 293471	2017-01-30 09:19:50 +00:00
Kristof Beyls	65a12c012f	[GlobalISel] Add support for indirectbr Differential Revision: https://reviews.llvm.org/D28079 llvm-svn: 293470	2017-01-30 09:13:18 +00:00
Asaf Badouh	53713df0c2	[X86][MCU] replace select with bit manipulation instead of branches Differential Revision: https://reviews.llvm.org/D28354 llvm-svn: 293469	2017-01-30 08:16:59 +00:00
Craig Topper	f6df4a6978	[AVX-512] Remove duplicate CodeGenOnly patterns for scalar register broadcast. We can use COPY_TO_REGCLASS like AVX does. This causes stack spill slots to be oversized sometimes, but the same should already be happening with AVX. llvm-svn: 293464	2017-01-30 06:59:06 +00:00
Sam McCall	a682dfb3e5	Include LLVMDumpValue in release builds. This part of the C API is still used in language bindings. llvm-svn: 293460	2017-01-30 05:40:52 +00:00
Jonas Paulsson	3f71d6a38e	[LoopVectorize] Improve getVectorCallCost() getScalarizationOverhead() call. By calling getScalarizationOverhead with the CallInst instead of the types of its arguments, we make sure that only unique call arguments are added to the scalarization cost. getScalarizationOverhead() is extended to handle calls by only passing on the actual call arguments (which is not all the operands). This also eliminates a wrapper function with the same name. review: Hal Finkel llvm-svn: 293459	2017-01-30 05:38:05 +00:00
Craig Topper	0265a39472	[AVX-512] Remove KSET0B/KSET1B in favor of the patterns that select KSET0W/KSET1W for v8i1. llvm-svn: 293458	2017-01-30 05:37:47 +00:00
Davide Italiano	6c77de0367	[MemorySSA] Correct an assertion surrounding with parentheses. llvm-svn: 293453	2017-01-30 03:16:43 +00:00
Craig Topper	3b7e823f92	[AVX-512] Don't reuse VSHLI/VSRLI for mask register shifts. VSHLI/VSRLI shift within elements while KSHIFT moves whole elements. llvm-svn: 293448	2017-01-30 00:06:01 +00:00
Chris Ray	30b3fafb94	[X86][Disassembler] Added SALC instruction Reviewers: joe.abbey, craig.topper Reviewed By: craig.topper Subscribers: majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D29201 llvm-svn: 293447	2017-01-29 23:02:47 +00:00
Craig Topper	db919caf1b	[AVX-512] Fix lowering for mask register concatenation with undef in the lower half. Previously this test case fired an assertion in getNode because we tried to create an insert_subvector with both input types the same size and the index pointing to half the vector width. llvm-svn: 293446	2017-01-29 22:53:33 +00:00
Chris Ray	ba3741cb2b	[X86] Fixing flag usage for RCL and RCR Summary: The RCL and RCR instructions use the carry flag. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29237 llvm-svn: 293441	2017-01-29 20:05:30 +00:00
Matthias Braun	a4976c6166	MachineInstr: Remove parameter from dump() The primary use of the dump() functions in LLVM is for use in a debugger. Unfortunately lldb does not seem to handle default arguments so using `p SomeMI.dump()` fails and you have to type the longer `p SomeMI.dump(nullptr)`. Remove the parameter to make the most common use easy. (You can always construct something like `p SomeMI.print(dbgs(),MyTII)` if you need more features). Differential Revision: https://reviews.llvm.org/D29241 llvm-svn: 293440	2017-01-29 18:20:42 +00:00

99251 Commits