llvm-project


Author	SHA1	Message	Date
Sam Parker	312409e464	[ARM] MVE Tail Predication The MVE and LOB extensions of Armv8.1m can be combined to enable 'tail predication', which removes the need for a scalar remainder loop after vectorization. Lane predication is performed implicitly via a system register. The effects of predication are described in Section B5.6.3 of the Armv8.1-m Arch Reference Manual, the key points being:
- For vector operations that perform reduction across the vector and produce a scalar result, whether the value is accumulated or not.
- For non-load instructions, the predicate flags determine if the destination register byte is updated with the new value or if the previous value is preserved.
- For vector store instructions, whether the store occurs or not.
- For vector load instructions, whether the loaded value or zeros are written to that element of the destination register.
This patch implements a pass that takes a hardware loop containing masked vector instructions and converts it into something that resembles an MVE tail predicated loop. Currently, if we had code generation, we'd generate a loop in which the VCTP would generate the predicate and VPST would then set up the value of VPR.P0. The loads and stores would be placed in VPT blocks, so this is not tail predication but normal VPT predication, with the predicate based upon an element-counting induction variable. Further work needs to be done to finally produce a true tail predicated loop. Because only the loads and stores are predicated, at both the LLVM IR and MIR level, we will restrict support to lane-wise operations only (no horizontal reductions). We will perform a final check on MIR during loop finalisation too. Another restriction, specific to MVE, is that all the vector instructions need to operate on the same number of elements. This is because predication is performed at the byte level and this is set on entry to the loop, or by the VCTP instead.
Differential Revision: https://reviews.llvm.org/D65884 llvm-svn: 371179	2019-09-06 08:24:41 +00:00
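The lane semantics listed in this message can be modelled in plain C to show why no scalar remainder loop is needed. This is an illustrative sketch only, assuming 4 x i32 lanes per vector; `vctp32`, `masked_load`, `masked_store` and `vadd_tail_predicated` are hypothetical names, not MVE intrinsics or LLVM APIs. The predicate enables min(remaining, 4) lanes, masked loads write zeros into inactive lanes, and masked stores skip them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* assumption: 4 x i32 lanes in a 128-bit vector */

/* Hypothetical VCTP-like predicate: lanes [0, min(remaining, LANES)) are active. */
static void vctp32(size_t remaining, bool pred[LANES]) {
    for (size_t l = 0; l < LANES; ++l)
        pred[l] = l < remaining;
}

/* Masked load: inactive lanes receive zero and memory is not read for them. */
static void masked_load(const int32_t *src, const bool pred[LANES], int32_t dst[LANES]) {
    for (size_t l = 0; l < LANES; ++l)
        dst[l] = pred[l] ? src[l] : 0;
}

/* Masked store: inactive lanes are not written at all. */
static void masked_store(int32_t *dst, const bool pred[LANES], const int32_t src[LANES]) {
    for (size_t l = 0; l < LANES; ++l)
        if (pred[l])
            dst[l] = src[l];
}

/* c[i] = a[i] + b[i] for i < n, with the tail handled by predication. */
void vadd_tail_predicated(int32_t *c, const int32_t *a, const int32_t *b, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i += LANES) {
        bool pred[LANES];
        int32_t va[LANES], vb[LANES], vc[LANES];
        vctp32(n - i, pred);
        masked_load(a + i, pred, va);
        masked_load(b + i, pred, vb);
        for (size_t l = 0; l < LANES; ++l) /* lane-wise operation only */
            vc[l] = va[l] + vb[l];
        masked_store(c + i, pred, vc);
    }
}
```

With `n == 6`, the second iteration enables only lanes 0 and 1, so `c[6]` and beyond are left untouched. The loop body is purely lane-wise, which mirrors the restriction to non-reducing operations described above.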
Kang Zhang	f879c68755	[CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks Summary: Fix a bug where the jump table was not updated, and recommit. The `block-placement` pass can create patterns involving unconditional branches on which we can do a simple early return. But the `early-ret` pass runs before `block-placement`, and we don't want to run it again. This patch does the simple early return to optimize the blocks at the end of `block-placement`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D63972 llvm-svn: 371177	2019-09-06 08:16:18 +00:00
Mikael Holmen	dee0702b2a	[MIR] Change test case to read from stdin instead of file The ;CHECK: bb ;CHECK-NEXT: %namedVReg1353:_(p0) = COPY $d0 parts of the test case failed when the tests were placed in a directory including "bb" in the path, since the full path of the file is then output in the ; ModuleID = '/repo/bb/ line which the CHECK matched on and then the CHECK-NEXT failed. llvm-svn: 371171	2019-09-06 06:55:54 +00:00
Craig Topper	463c8e5eeb	[X86] Add tests for extending and truncating between v16i8 and v16i64 with min-legal-vector-width=256. It looks like we might be able to do these in fewer steps, but I'm not sure. llvm-svn: 371170	2019-09-06 06:02:17 +00:00
Alex Brachet	dfacf8851e	Fix rL371162 again llvm-svn: 371164	2019-09-06 03:31:42 +00:00
Alex Brachet	27d42af603	Fix failing test from rL371162 llvm-svn: 371163	2019-09-06 02:56:48 +00:00
Alex Brachet	0b69c59656	[yaml2obj] Make e_phoff and e_phentsize 0 if there are no program headers Summary: It says here (http://www.sco.com/developers/gabi/latest/ch4.eheader.html) that if there are no program headers then e_phoff should be 0, but currently it is always set to the offset just after the ELF header. GNU's `readelf` (but not `llvm-readelf`) complains about this: `readelf: Warning: possibly corrupt ELF header - it has a non-zero program header offset, but no program headers`. Reviewers: jhenderson, grimar, MaskRay, rupprecht Reviewed By: jhenderson, grimar, MaskRay Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67054 llvm-svn: 371162	2019-09-06 02:27:55 +00:00
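Combined with placing the section header table after the section contents (D67221, also in this list), the layout rule can be sketched as follows. The struct and function are hypothetical and heavily simplified, not yaml2obj's code; the sketch assumes ELF64 sizes (64-byte ELF header, 56-byte program header entries) and 8-byte alignment for the section header table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the ELF header fields affected by layout. */
struct layout {
    uint64_t e_phoff;
    uint16_t e_phentsize;
    uint64_t e_shoff;
};

static uint64_t align_to(uint64_t v, uint64_t a) { return (v + a - 1) / a * a; }

/* Order: ELF header, program headers (if any), section contents,
   then the section header table at the very end. */
static struct layout plan_layout(uint16_t phnum, uint64_t contents_size) {
    const uint64_t ehdr_size = 64, phdr_size = 56; /* ELF64 sizes */
    struct layout l;
    uint64_t off = ehdr_size;
    if (phnum == 0) {
        l.e_phoff = 0;      /* gABI: zero when there is no program header table */
        l.e_phentsize = 0;  /* also zeroed, as the commit does */
    } else {
        l.e_phoff = off;
        l.e_phentsize = (uint16_t)phdr_size;
        off += (uint64_t)phnum * phdr_size;
    }
    off += contents_size;           /* section contents */
    l.e_shoff = align_to(off, 8);   /* section header table last */
    return l;
}
```

Placing the table last means adding a section only moves `e_shoff`, not the offsets of existing section contents.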
Alina Sbirlea	57fcb1d7fc	Cleanup test. llvm-svn: 371158	2019-09-06 00:58:03 +00:00
Fangrui Song	9d2504b6d8	[llvm-readobj][yaml2obj] Support SHT_LLVM_SYMPART, SHT_LLVM_PART_EHDR and SHT_LLVM_PART_PHDR See http://lists.llvm.org/pipermail/llvm-dev/2019-February/130583.html and D60242 for the lld partition feature. This patch: * Teaches yaml2obj to parse the 3 section types. * Teaches llvm-readobj/llvm-readelf to dump the 3 section types. There is no test for SHT_LLVM_DEPENDENT_LIBRARIES in llvm-readobj. Add it as well. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D67228 llvm-svn: 371157	2019-09-06 00:53:28 +00:00
Matt Arsenault	4d90625271	AMDGPU/GlobalISel: Fix load/store of types in other address spaces There should probably be a size only matcher. llvm-svn: 371155	2019-09-06 00:36:06 +00:00
Matt Arsenault	9ceb6edf11	GlobalISel/TableGen: Fix handling of EXTRACT_SUBREG constraints This was only using the correct register constraints if this was the final result instruction. If the extract was a sub instruction of the result, it would attempt to use GIR_ConstrainSelectedInstOperands on a COPY, which won't work. Move the handling to createAndImportSubInstructionRenderer so it works correctly. I don't fully understand why runOnPattern and createAndImportSubInstructionRenderer both need to handle these special cases, and constrain them with slightly different methods. If I remove the runOnPattern handling, it does break the constraint when the final result instruction is EXTRACT_SUBREG. llvm-svn: 371150	2019-09-06 00:05:58 +00:00
Matt Arsenault	60c8b8bcf2	AMDGPU: Allow getMemOperandWithOffset to analyze stack accesses Report soffset as a base register if the scratch resource can be ignored. llvm-svn: 371149	2019-09-05 23:54:35 +00:00
Matt Arsenault	59ff77ee38	AMDGPU: Fix emitting multiple stack loads for stack passed workitems The same stack location is loaded for each workitem ID, and for each use. Nothing prevents you from creating multiple fixed stack objects with the same offsets, so this was creating a load for each unique frame index, despite them having the same offset. Re-use the same frame index so the loads are CSEable. llvm-svn: 371148	2019-09-05 23:40:14 +00:00
Eli Friedman	9dd453ce8d	[AArch64] Add testcase for codegen for sdiv by 2. llvm-svn: 371147	2019-09-05 23:40:03 +00:00
Matt Arsenault	524a9d5774	InstCombine: Fix crash on icmp of gep with addrspacecasted null llvm-svn: 371146	2019-09-05 23:39:21 +00:00
David Blaikie	707be7ef9c	llvm-reduce: Use %python from lit to get the correct/valid python binary for the reduction script llvm-svn: 371143	2019-09-05 23:33:44 +00:00
Alina Sbirlea	35548e80d6	[AliasSetTracker] Correct AAInfo check. Properly check if NewAAInfo conflicts with AAInfo. When a conflict is found, update the local variable and the alias set to record that a change occurred. Resolves PR42969. llvm-svn: 371139	2019-09-05 23:00:36 +00:00
Vitaly Buka	9020f11377	[SimplifyCFG] Don't SimplifyBranchOnICmpChain with ExtraCase Summary: Here we try to avoid issues with the "explicit branch" created by SimplifyBranchOnICmpChain, which can branch on undef. Msan by design reports branches on uninitialized memory and undefs, so we get a false report here. In general msan does not like it when we convert
```
// If at least one of them is true, MSAN is OK even if the other is undef
if (a || b)
  return;
```
into
```
// If 'a' is undef MSAN will complain even if 'b' is true
if (a)
  return;
if (b)
  return;
```
Example. Before optimization we had something like this:
```
while (true) {
  bool maybe_undef = doStuff();
  char c;
  while (true) {
    c = getChar();
    if (c != 10 && c != 13)
      continue;
    break;
  }
  // we know that c == 10 || c == 13 if we get here,
  // so msan knows that the branch is not affected by maybe_undef
  if (maybe_undef || c == 10 || c == 13)
    continue;
  return;
}
```
SimplifyBranchOnICmpChain will convert that into
```
while (true) {
  bool maybe_undef = doStuff();
  char c;
  while (true) {
    c = getChar();
    if (c != 10 && c != 13)
      continue;
    break;
  }
  // however msan will complain here:
  if (maybe_undef)
    continue;
  // we know that c == 10 || c == 13, so either way we will get continue
  switch (c) {
    case 10: continue;
    case 13: continue;
  }
  return;
}
```
Reviewers: eugenis, efriedma Reviewed By: eugenis, efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67205 llvm-svn: 371138	2019-09-05 22:49:34 +00:00
Puyan Lotfi	dc97ca9f25	[MIR] MIRNamer pass for improving MIR test authoring experience. This patch reuses the MIR vreg renamer from the MIRCanonicalizerPass to clean up the names of vregs in a MIR file for MIR test authors. I found it useful when writing a regression test for a globalisel failure I encountered recently and thought it might be useful for other folks as well. Differential Revision: https://reviews.llvm.org/D67209 llvm-svn: 371121	2019-09-05 20:44:33 +00:00
Jessica Paquette	20e8667098	Recommit "[AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls" Recommit basic sibling call lowering (https://reviews.llvm.org/D67189) The issue was that if you have a return type other than void, call lowering will emit COPYs to get the return value after the call. Disallow sibling calls other than ones that return void for now. Also proactively disable swifterror tail calls for now, since there's a similar issue with COPYs there. Update call-translator-tail-call.ll to include test cases for each of these things. llvm-svn: 371114	2019-09-05 20:18:34 +00:00
Eli Friedman	cae1e47f6e	[IfConversion] Fix diamond conversion with unanalyzable branches. The code was incorrectly counting the number of identical instructions, and therefore tried to predicate an instruction which should not have been predicated. This could have various effects: a compiler crash, an assembler failure, a miscompile, or just generating an extra, unnecessary instruction. Instead of depending on TargetInstrInfo::removeBranch, which only works on analyzable branches, just remove all branch instructions. Fixes https://bugs.llvm.org/show_bug.cgi?id=43121 and https://bugs.llvm.org/show_bug.cgi?id=41121 . Differential Revision: https://reviews.llvm.org/D67203 llvm-svn: 371111	2019-09-05 20:02:38 +00:00
Roman Lebedev	071ce66729	[NFC][InstCombine] Overhaul 'unsigned add overflow' tests, ensure that all 3 patterns have full test coverage llvm-svn: 371108	2019-09-05 19:13:15 +00:00
Craig Topper	0fde412140	[X86] Enable BuildSDIVPow2 for i16. We're able to use a 32-bit ADD and CMOV here and should work well with our other i16->i32 promotion optimizations. llvm-svn: 371107	2019-09-05 18:49:52 +00:00
Craig Topper	b8d6ba3ca2	[X86] Override BuildSDIVPow2 for X86. As noted in PR43197, we can use test+add+cmov+sra to implement signed division by a power of 2. This is based off the similar version in AArch64, but I've adjusted it to use target independent nodes where AArch64 uses target specific CMP and CSEL nodes. I've also blocked INT_MIN as the transform isn't valid for that. I've limited this to i32 and i64 on 64-bit targets for now and only when CMOV is supported. i8 and i16 need further investigation to be sure they get promoted to i32 well. I adjusted a few tests to enable cmov to demonstrate the new codegen. I also changed twoaddr-coalesce-3.ll to 32-bit mode without cmov to avoid perturbing the scenario that is being set up there. Differential Revision: https://reviews.llvm.org/D67087 llvm-svn: 371104	2019-09-05 18:15:07 +00:00
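A C sketch of the test + add + cmov + sra idea for signed division by 2^k (rounding toward zero). It is illustrative only: `sdiv_pow2` is a hypothetical helper, not the DAG lowering, and it assumes `>>` on a negative `int32_t` is an arithmetic shift, which GCC and Clang provide but the C standard leaves implementation-defined. Divisors are limited to 2^1 through 2^30, so INT_MIN is never a divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Signed x / 2^k, rounding toward zero. Negative dividends are biased by
   (2^k - 1) before the arithmetic shift so the result rounds toward zero
   instead of toward negative infinity. */
static int32_t sdiv_pow2(int32_t x, unsigned k) {
    assert(k >= 1 && k <= 30);                /* positive divisor 2^k */
    int32_t bias = (int32_t)((1u << k) - 1u); /* 2^k - 1 */
    /* "test" + "cmov": only a negative x uses the biased value. The add is
       done only when x < 0 so it cannot overflow in C. */
    int32_t adjusted = (x < 0) ? x + bias : x;
    return adjusted >> k;                     /* "sra" */
}
```

In the generated x86 code the add runs unconditionally and CMOV selects between `x` and `x + bias`; the C version adds only when `x < 0` because signed overflow is undefined behaviour in C.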
Roman Lebedev	8360c42e25	[InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned sub overflow' check A follow-up for r329011. This may be changed to produce @llvm.sub.with.overflow in a later patch, but for now just make things more consistent overall. A few observations stem from this:
* There does not seem to be a similar one-instruction fold for uadd-overflow
* I'm not sure we'll want to canonicalize `B u> A` as `usub.with.overflow`, so since the `icmp` here no longer refers to `sub`, reconstructing `usub.with.overflow` will be problematic, and will likely require a standalone pass (similar to DivRemPairs).
https://rise4fun.com/Alive/Zqs
Name: (A - B) u> A --> B u> A
%t0 = sub i8 %A, %B
%r = icmp ugt i8 %t0, %A
=>
%r = icmp ugt i8 %B, %A
Name: (A - B) u<= A --> B u<= A
%t0 = sub i8 %A, %B
%r = icmp ule i8 %t0, %A
=>
%r = icmp ule i8 %B, %A
Name: C u< (C - D) --> C u< D
%t0 = sub i8 %C, %D
%r = icmp ult i8 %C, %t0
=>
%r = icmp ult i8 %C, %D
Name: C u>= (C - D) --> C u>= D
%t0 = sub i8 %C, %D
%r = icmp uge i8 %C, %t0
=>
%r = icmp uge i8 %C, %D
llvm-svn: 371101	2019-09-05 17:41:02 +00:00
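The four folds above can also be checked exhaustively for i8 with a small C program, in the spirit of the Alive proofs. This is a sketch; `usub_folds_hold` is a hypothetical helper that models `sub i8` as wrapping `uint8_t` subtraction and `u>`, `u<=`, `u<`, `u>=` as unsigned comparisons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Checks, for every 8-bit pair (a, b):
   (a - b) u> a  <=>  b u> a      (a - b) u<= a  <=>  b u<= a
   a u< (a - b)  <=>  a u< b      a u>= (a - b)  <=>  a u>= b */
static bool usub_folds_hold(void) {
    for (unsigned a = 0; a < 256; ++a) {
        for (unsigned b = 0; b < 256; ++b) {
            uint8_t diff = (uint8_t)(a - b); /* wrapping i8 sub */
            if ((diff > a) != (b > a)) return false;
            if ((diff <= a) != (b <= a)) return false;
            if ((a < diff) != (a < b)) return false;
            if ((a >= diff) != (a >= b)) return false;
        }
    }
    return true;
}
```

`assert(usub_folds_hold());` passes: the subtraction wraps exactly when `b > a`, and a wrapped result is always larger than `a`.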
Roman Lebedev	ecb7ea1ae7	[InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned add overflow' check A follow-up for r342004. This will be changed to produce @llvm.add.with.overflow in a later patch, but for now just make things more consistent overall. https://rise4fun.com/Alive/qxE
Name: (Op1 + X) u< Op1 --> ~Op1 u< X
%t0 = add i8 %Op1, %X
%r = icmp ult i8 %t0, %Op1
=>
%n = xor i8 %Op1, -1
%r = icmp ult i8 %n, %X
Name: (Op1 + X) u>= Op1 --> ~Op1 u>= X
%t0 = add i8 %Op1, %X
%r = icmp uge i8 %t0, %Op1
=>
%n = xor i8 %Op1, -1
%r = icmp uge i8 %n, %X
;-------------------------------------------------------------------------------
Name: Op0 u> (Op0 + X) --> X u> ~Op0
%t0 = add i8 %Op0, %X
%r = icmp ugt i8 %Op0, %t0
=>
%n = xor i8 %Op0, -1
%r = icmp ugt i8 %X, %n
Name: Op0 u<= (Op0 + X) --> X u<= ~Op0
%t0 = add i8 %Op0, %X
%r = icmp ule i8 %Op0, %t0
=>
%n = xor i8 %Op0, -1
%r = icmp ule i8 %X, %n
llvm-svn: 371100	2019-09-05 17:40:49 +00:00
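The same exhaustive style works for the add folds. For i8, `~Op1` equals `255 - Op1`, so each fold says the add wraps exactly when `X` exceeds `255 - Op1`. `uadd_folds_hold` is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Checks the four folds above for every 8-bit pair (op, x). */
static bool uadd_folds_hold(void) {
    for (unsigned op = 0; op < 256; ++op) {
        for (unsigned x = 0; x < 256; ++x) {
            uint8_t sum = (uint8_t)(op + x); /* wrapping i8 add */
            uint8_t not_op = (uint8_t)~op;   /* xor i8 %op, -1 */
            if ((sum < op) != (not_op < x)) return false;
            if ((sum >= op) != (not_op >= x)) return false;
            if ((op > sum) != (x > not_op)) return false;
            if ((op <= sum) != (x <= not_op)) return false;
        }
    }
    return true;
}
```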
Roman Lebedev	1d9e0dcc9d	[InstCombine][NFC] Tests for 'unsigned sub overflow' check
----------------------------------------
Name: unsigned sub, overflow, v0
%sub = sub i8 %x, %y
%ov = icmp ugt i8 %sub, %x
=>
%agg = usub_overflow i8 %x, %y
%sub = extractvalue {i8, i1} %agg, 0
%ov = extractvalue {i8, i1} %agg, 1
Done: 1
Optimization is correct!
----------------------------------------
Name: unsigned sub, no overflow, v0
%sub = sub i8 %x, %y
%ov = icmp ule i8 %sub, %x
=>
%agg = usub_overflow i8 %x, %y
%sub = extractvalue {i8, i1} %agg, 0
%not.ov = extractvalue {i8, i1} %agg, 1
%ov = xor %not.ov, -1
Done: 1
Optimization is correct!
llvm-svn: 371099	2019-09-05 17:40:37 +00:00
Roman Lebedev	745046c23f	[InstCombine][NFC] Tests for 'unsigned add overflow' check
----------------------------------------
Name: unsigned add, overflow, v0
%add = add i8 %x, %y
%ov = icmp ult i8 %add, %x
=>
%agg = uadd_overflow i8 %x, %y
%add = extractvalue {i8, i1} %agg, 0
%ov = extractvalue {i8, i1} %agg, 1
Done: 1
Optimization is correct!
----------------------------------------
Name: unsigned add, overflow, v1
%add = add i8 %x, %y
%ov = icmp ult i8 %add, %y
=>
%agg = uadd_overflow i8 %x, %y
%add = extractvalue {i8, i1} %agg, 0
%ov = extractvalue {i8, i1} %agg, 1
Done: 1
Optimization is correct!
----------------------------------------
Name: unsigned add, no overflow, v0
%add = add i8 %x, %y
%ov = icmp uge i8 %add, %x
=>
%agg = uadd_overflow i8 %x, %y
%add = extractvalue {i8, i1} %agg, 0
%not.ov = extractvalue {i8, i1} %agg, 1
%ov = xor %not.ov, -1
Done: 1
Optimization is correct!
----------------------------------------
Name: unsigned add, no overflow, v1
%add = add i8 %x, %y
%ov = icmp uge i8 %add, %y
=>
%agg = uadd_overflow i8 %x, %y
%add = extractvalue {i8, i1} %agg, 0
%not.ov = extractvalue {i8, i1} %agg, 1
%ov = xor %not.ov, -1
Done: 1
Optimization is correct!
llvm-svn: 371098	2019-09-05 17:40:28 +00:00
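The open-coded checks in these two test commits report the same condition as the GCC/Clang `__builtin_add_overflow` and `__builtin_sub_overflow` builtins, which roughly correspond to `uadd_overflow` and `usub_overflow` at the C level. The sketch below compares them for every pair of 8-bit values. It relies on those compiler-specific builtins, and `overflow_patterns_match` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* For all 8-bit x, y: the wrapped result and the open-coded overflow checks
   (add u< x, add u< y, sub u> x) agree with the builtins. */
static bool overflow_patterns_match(void) {
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
            uint8_t res;
            bool add_ov = __builtin_add_overflow((uint8_t)x, (uint8_t)y, &res);
            uint8_t add = (uint8_t)(x + y);
            if (res != add) return false;
            if ((add < x) != add_ov) return false; /* "overflow, v0" */
            if ((add < y) != add_ov) return false; /* "overflow, v1" */

            bool sub_ov = __builtin_sub_overflow((uint8_t)x, (uint8_t)y, &res);
            uint8_t sub = (uint8_t)(x - y);
            if (res != sub) return false;
            if ((sub > x) != sub_ov) return false; /* "sub, overflow, v0" */
        }
    }
    return true;
}
```

The "no overflow" variants are the negations of these checks, which is why their target forms end in `xor ..., -1`.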
Sanjay Patel	10412a69f9	[x86] fix horizontal math bug exposed by improved demanded elements analysis (PR43225) https://bugs.llvm.org/show_bug.cgi?id=43225 llvm-svn: 371095	2019-09-05 17:28:17 +00:00
Craig Topper	673da001c5	[X86] Remove unneeded CHECK lines from a test. NFC llvm-svn: 371093	2019-09-05 17:24:25 +00:00
Denis Bakhvalov	58f172f05a	[MergedLoadStoreMotion] Sink stores to BB with more than 2 predecessors If we have:
bb5:
  br i1 %arg3, label %bb6, label %bb7
bb6:
  %tmp = getelementptr inbounds i32, i32* %arg1, i64 2
  store i32 3, i32* %tmp, align 4
  br label %bb9
bb7:
  %tmp8 = getelementptr inbounds i32, i32* %arg1, i64 2
  store i32 3, i32* %tmp8, align 4
  br label %bb9
bb9:  ; preds = %bb4, %bb6, %bb7
  ...
We can't sink stores directly into bb9. This patch creates a new BB that is a successor of %bb6 and %bb7 and sinks the stores into that block. SplitFooterBB is the parameter to the pass that controls that behavior. Change-Id: I7fdf50a772b84633e4b1b860e905bf7e3e29940f Differential: https://reviews.llvm.org/D66234 llvm-svn: 371089	2019-09-05 17:00:32 +00:00
Sanjay Patel	3856512334	[x86] add test for horizontal math bug (PR43225); NFC llvm-svn: 371088	2019-09-05 16:58:18 +00:00
Hiroshi Yamauchi	d842f2eec4	[PGO][CHR] Speed up following long, interlinked use-def chains. Summary: Avoid visiting an instruction more than once by using a map. This is similar to https://reviews.llvm.org/rL361416. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67198 llvm-svn: 371086	2019-09-05 16:56:55 +00:00
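The speed-up comes from not re-walking shared parts of a use-def graph. The following is a minimal C illustration, not CHR's code. It assumes a hypothetical graph in which node i uses nodes i + 1 and i + 2. The naive walk revisits shared nodes exponentially often, while a visited set processes each node once. The commit itself uses a map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N 30 /* hypothetical number of nodes in the interlinked chain */

static unsigned long visits;

/* Naive walk: shared operands are re-walked from every user. */
static void walk_naive(size_t i) {
    ++visits;
    if (i + 1 < N) walk_naive(i + 1);
    if (i + 2 < N) walk_naive(i + 2);
}

/* Memoized walk: a node already seen is skipped. */
static void walk_memo(size_t i, bool seen[N]) {
    if (seen[i]) return;
    seen[i] = true;
    ++visits;
    if (i + 1 < N) walk_memo(i + 1, seen);
    if (i + 2 < N) walk_memo(i + 2, seen);
}

unsigned long count_naive(void) {
    visits = 0;
    walk_naive(0);
    return visits;
}

unsigned long count_memo(void) {
    bool seen[N] = {false};
    visits = 0;
    walk_memo(0, seen);
    return visits;
}
```

`count_memo()` returns 30, one visit per node. `count_naive()` makes over two million calls for the same 30 nodes, and the count grows like the Fibonacci numbers as the chain gets longer.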
Alina Sbirlea	ae900d3882	[MemorySSA] Update MemorySSA when removing debug.value calls. llvm-svn: 371084	2019-09-05 16:25:24 +00:00
Krzysztof Parzyszek	0ce93194fe	[Hexagon] Fix type in HexagonTargetLowering::ReplaceNodeResults llvm-svn: 371083	2019-09-05 16:19:47 +00:00
Simon Pilgrim	29361c704d	[X86][SSE] EltsFromConsecutiveLoads - ignore non-zero offset base loads (PR43227) As discussed on D64551 and PR43227, we don't correctly handle cases where the base load has a non-zero byte offset. Until we can properly handle this, we must bail from EltsFromConsecutiveLoads. llvm-svn: 371078	2019-09-05 15:07:07 +00:00
Fangrui Song	c3bc697974	[yaml2obj] Write the section header table after section contents Linkers (ld.bfd/gold/lld) place the section header table at the very end. This allows tools to strip it, which is optional in executable/shared objects. In addition, if we add or remove a section, the size of the section header table will change. Placing the section header table at the end keeps section offsets unchanged. yaml2obj currently places the section header table immediately after the program headers. Follow what linkers do to make offset updating easier. Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D67221 llvm-svn: 371074	2019-09-05 14:25:57 +00:00
George Rimar	4e14bf71b7	[llvm-readelf] - Allow dumping dynamic symbols when there are no program headers. D62179 introduced a regression. llvm-readelf lost the ability to dump the dynamic symbols when there is a .dynamic section with a DT_SYMTAB, but there are no program headers: https://reviews.llvm.org/D62179#1652778 Below is the program flow before the D62179 change:
1) Find SHT_DYNSYM.
2) Find there is no PT_DYNAMIC => don't try to parse it.
3) Print dynamic symbols using information about them found in step (1).
And after the change it became:
1) Find SHT_DYNSYM.
2) Find there is no PT_DYNAMIC => find SHT_DYNAMIC.
3) Parse the dynamic table, but fail to handle DT_SYMTAB because of the absence of a PT_LOAD. Report the "Virtual address is not in any segment" error.
This patch fixes the issue. To do this, it checks that the value of DT_SYMTAB was mapped to a segment. If not, it ignores it. Differential revision: https://reviews.llvm.org/D67078 llvm-svn: 371071	2019-09-05 14:02:58 +00:00
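The fix comes down to translating DT_SYMTAB's virtual address through the PT_LOAD segments and ignoring it when no segment covers that address. The C sketch below is simplified and makes several assumptions. `struct load_seg`, `vaddr_to_offset` and `dynsym_offset` are hypothetical, not llvm-readobj's API. Falling back to the SHT_DYNSYM section's offset stands in for "ignore DT_SYMTAB and use what step (1) found".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified PT_LOAD program header. */
struct load_seg {
    uint64_t vaddr;  /* p_vaddr */
    uint64_t offset; /* p_offset */
    uint64_t filesz; /* p_filesz */
};

/* Maps a virtual address to a file offset; false if no segment covers it. */
static bool vaddr_to_offset(const struct load_seg *segs, size_t n,
                            uint64_t vaddr, uint64_t *out) {
    for (size_t i = 0; i < n; ++i) {
        if (vaddr >= segs[i].vaddr && vaddr - segs[i].vaddr < segs[i].filesz) {
            *out = segs[i].offset + (vaddr - segs[i].vaddr);
            return true;
        }
    }
    return false;
}

/* Use DT_SYMTAB only if it maps into a segment; otherwise ignore it and
   keep the location taken from the SHT_DYNSYM section header. */
static uint64_t dynsym_offset(const struct load_seg *segs, size_t n,
                              uint64_t dt_symtab, uint64_t shdr_dynsym_off) {
    uint64_t off;
    if (vaddr_to_offset(segs, n, dt_symtab, &off))
        return off;
    return shdr_dynsym_off;
}
```

With no program headers (`n == 0`), the lookup always fails, so the section-derived offset is used instead of reporting an error.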
David Green	83a3341246	[ARM] Fixup the creation of VPT blocks This attempts to just fix the creation of VPT blocks, fixing up the iteration, which instructions are considered in the bundle, and making sure that we do not overrun the end of the block. Differential Revision: https://reviews.llvm.org/D67219 llvm-svn: 371064	2019-09-05 13:37:04 +00:00
Simon Pilgrim	215910eeb2	[X86][SSE] Add (failing) test case for PR43227 llvm-svn: 371061	2019-09-05 12:36:11 +00:00
Petar Avramovic	a4bfc8dfda	[MIPS GlobalISel] Select G_FENCE G_FENCE comes from the fence instruction. For MIPS, fences are generated in AtomicExpandPass, which surrounds atomic instructions with fence instructions when needed. G_FENCE arguments don't have an LLT, so there is no work for the legalizer or regbankselect. Instruction select G_FENCE for MIPS32. Differential Revision: https://reviews.llvm.org/D67181 llvm-svn: 371056	2019-09-05 11:20:32 +00:00
Petar Avramovic	f5c7fe0795	[MIPS GlobalISel] Select llvm.trap intrinsic Select G_INTRINSIC_W_SIDE_EFFECTS for Intrinsic::trap for MIPS32 via legalizeIntrinsic. Differential Revision: https://reviews.llvm.org/D67180 llvm-svn: 371055	2019-09-05 11:16:37 +00:00
Petar Avramovic	d2574d79b6	[MIPS GlobalISel] Lower SRet pointer arguments Instead of returning a structure by value, clang usually adds a pointer to that structure as an argument. Pointers don't require special handling regardless of the SRet flag. Remove the unsuccessful exit from lowerCall for arguments with the SRet flag if they are pointers. Differential Revision: https://reviews.llvm.org/D67179 llvm-svn: 371054	2019-09-05 11:12:01 +00:00
Simon Pilgrim	071287c5a9	Revert rL370996 from llvm/trunk: [AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls This adds support for basic sibling call lowering in AArch64. The intent here is to only handle tail calls which do not change the ABI (hence, sibling calls.) At this point, it is very restricted. It does not handle - Vararg calls. - Calls with outgoing arguments. - Calls whose calling conventions differ from the caller's calling convention. - Tail/sibling calls with BTI enabled. This patch adds - `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent to the same function in AArch64ISelLowering.cpp (albeit with the restrictions above.) - `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in AArch64ISelLowering.cpp. - `getCallOpcode`, which is exactly what it sounds like. Tail/sibling calls are lowered by checking if they pass target-independent tail call positioning checks, and checking if they satisfy `isEligibleForTailCallOptimization`. If they do, then a tail call instruction is emitted instead of a normal call. If we have a sibling call (which is always the case in this patch), then we do not emit any stack adjustment operations. When we go to lower a return, we check if we've already emitted a tail call. If so, then we skip the return lowering. For testing, this patch - Adds call-translator-tail-call.ll to test which tail calls we currently lower, which ones we don't, and which ones we shouldn't. - Updates branch-target-enforcement-indirect-calls.ll to show that we fall back as expected. Differential Revision: https://reviews.llvm.org/D67189 ........ This fails on EXPENSIVE_CHECKS builds due to a -verify-machineinstrs test failure in CodeGen/AArch64/dllimport.ll llvm-svn: 371051	2019-09-05 10:38:39 +00:00
Jonas Paulsson	821858780e	[SystemZ] Recognize INLINEASM_BR in backend Handle the remaining cases also by handling asm goto in SystemZInstrInfo::getBranchInfo(). Review: Ulrich Weigand https://reviews.llvm.org/D67151 llvm-svn: 371048	2019-09-05 10:20:05 +00:00
Guillaume Chatelet	aff45e4b23	[LLVM][Alignment] Make functions using log of alignment explicit Summary: This patch renames functions that take or return alignment as log2; this will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments:
- `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were dealing with log2(align). This patch fixes it and updates the documentation.
- `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments; internally these values are interpreted as log2(align). This patch updates the documentation.
- `MachineFunction` exposes `align-all-functions`, also interpreted as a power of two alignment; internally this value is interpreted as log2(align). This patch updates the documentation.
Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045	2019-09-05 10:00:22 +00:00
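The two representations that the renaming separates can be converted as shown below. This is a sketch with hypothetical helper names (`log2_of_align`, `align_from_log2`), not the `llvm::Align` API. It also shows why mixing the two is dangerous: a flag value of 4 means 16-byte alignment when read as log2, but 4-byte alignment when read as a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

/* Byte alignment (a power of two) -> log2 of that alignment. */
static unsigned log2_of_align(uint64_t align) {
    assert(is_pow2(align));
    unsigned log2 = 0;
    while ((align >> log2) != 1)
        ++log2;
    return log2;
}

/* log2 of an alignment -> byte alignment. */
static uint64_t align_from_log2(unsigned log2) {
    assert(log2 < 64);
    return UINT64_C(1) << log2;
}
```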
George Rimar	e7b4d20998	Recommit r371023 "[lib/ObjectYAML] - Stop calling error(1) when mapping the st_other field of a symbol." Fix: added missing return "return 0;" Original commit message: This eliminates one of the error(1) calls in this lib. It is different from the others because it happens at the field mapping stage and can be easily fixed. Differential revision: https://reviews.llvm.org/D67150 llvm-svn: 371030	2019-09-05 08:52:26 +00:00
George Rimar	7f1f50de41	Revert r371023 "[lib/ObjectYAML] - Stop calling error(1) when mapping the st_other field of a symbol." It broke BBots: http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/36387/steps/build_Lld/logs/stdio http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/17117/steps/test/logs/stdio llvm-svn: 371024	2019-09-05 08:38:29 +00:00
George Rimar	2c9c432256	[lib/ObjectYAML] - Stop calling error(1) when mapping the st_other field of a symbol. This eliminates one of the error(1) calls in this lib. It is different from the others because it happens at the field mapping stage and can be easily fixed. Differential revision: https://reviews.llvm.org/D67150 llvm-svn: 371023	2019-09-05 08:28:43 +00:00
Igor Kudrin	e46639620d	[DWARF] Fix referencing Range List Tables from CUs for DWARF64. As DW_AT_rnglists_base points after the header and headers have different sizes for DWARF32 and DWARF64, we have to use the format of the CU to adjust the offset correctly in order to extract the referenced range list table. The patch also changes the type of RangeSectionBase because in DWARF64 it is 8 bytes long. Differential Revision: https://reviews.llvm.org/D67098 llvm-svn: 371016	2019-09-05 07:02:28 +00:00
Igor Kudrin	991f0fb149	[DWARF] Support DWARF64 in DWARFListTableHeader. This enables 64-bit DWARF support for parsing range and location list tables. Differential Revision: https://reviews.llvm.org/D66643 llvm-svn: 371014	2019-09-05 06:49:05 +00:00
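For the two DWARF64 list-table fixes above, the key quantity is the size of the DWARF v5 list table header, which DW_AT_rnglists_base points past. The helpers below are a C sketch with hypothetical names, not LLVM's DWARF parser. Format detection assumes little-endian input and ignores the reserved unit_length values 0xfffffff0 to 0xfffffffe.

```c
#include <assert.h>
#include <stdint.h>

enum dwarf_format { DWARF32, DWARF64 };

/* DWARF v5 range/location list table header:
   unit_length (4 bytes, or 0xffffffff followed by 8 bytes in DWARF64),
   version (2), address_size (1), segment_selector_size (1),
   offset_entry_count (4). */
static uint64_t list_table_header_size(enum dwarf_format fmt) {
    uint64_t unit_length = (fmt == DWARF64) ? 4 + 8 : 4;
    return unit_length + 2 + 1 + 1 + 4;
}

/* DW_AT_rnglists_base points just past the header, so the table starts
   header-size bytes earlier; the CU's format decides that size. */
static uint64_t rnglists_table_start(uint64_t rnglists_base, enum dwarf_format fmt) {
    uint64_t hdr = list_table_header_size(fmt);
    assert(rnglists_base >= hdr);
    return rnglists_base - hdr;
}

/* The first 4 bytes of unit_length are 0xffffffff for DWARF64. */
static enum dwarf_format detect_format(const uint8_t *p) {
    uint32_t len = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                   (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return len == 0xffffffffu ? DWARF64 : DWARF32;
}
```

The header is 12 bytes for DWARF32 and 20 bytes for DWARF64. Subtracting the DWARF32 size from a DWARF64 CU's base would therefore land 8 bytes past the real start of the table.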
Matt Arsenault	f581d575ce	AMDGPU: Add intrinsics for address space identification The library currently uses ptrtoint and directly checks the queue ptr for this, which counts as a pointer capture. llvm-svn: 371009	2019-09-05 02:20:39 +00:00
Matt Arsenault	69b1a2ae65	AMDGPU/GlobalISel: Restore insert point when getting aperture Avoids SSA violations in a future patch. llvm-svn: 371008	2019-09-05 02:20:32 +00:00
Matt Arsenault	25156ae7ea	AMDGPU/GlobalISel: Fix placeholder value used for addrspacecast llvm-svn: 371007	2019-09-05 02:20:29 +00:00
Matt Arsenault	d51a3746d0	AMDGPU/GlobalISel: Fix assert on load from constant address llvm-svn: 371006	2019-09-05 02:20:25 +00:00
Puyan Lotfi	6d3ea2d9b6	[mir-canon][NFC] Adding -verify-machineinstrs to mir-canon tests. In the review process for some of the refactoring of MIRCanonicalizationPass it was noted that some of the tests didn't have verifier enabled. Enabling here. llvm-svn: 371005	2019-09-05 02:10:41 +00:00
Reid Kleckner	29ccc8523a	Use -mtriple to fix AMDGPU test sensitive to object file format GOTPCREL32 doesn't exist on COFF, so it isn't used when this test runs on Windows. llvm-svn: 371000	2019-09-05 00:34:01 +00:00
Jessica Paquette	b78324fc40	[AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls This adds support for basic sibling call lowering in AArch64. The intent here is to only handle tail calls which do not change the ABI (hence, sibling calls.) At this point, it is very restricted. It does not handle - Vararg calls. - Calls with outgoing arguments. - Calls whose calling conventions differ from the caller's calling convention. - Tail/sibling calls with BTI enabled. This patch adds - `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent to the same function in AArch64ISelLowering.cpp (albeit with the restrictions above.) - `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in AArch64ISelLowering.cpp. - `getCallOpcode`, which is exactly what it sounds like. Tail/sibling calls are lowered by checking if they pass target-independent tail call positioning checks, and checking if they satisfy `isEligibleForTailCallOptimization`. If they do, then a tail call instruction is emitted instead of a normal call. If we have a sibling call (which is always the case in this patch), then we do not emit any stack adjustment operations. When we go to lower a return, we check if we've already emitted a tail call. If so, then we skip the return lowering. For testing, this patch - Adds call-translator-tail-call.ll to test which tail calls we currently lower, which ones we don't, and which ones we shouldn't. - Updates branch-target-enforcement-indirect-calls.ll to show that we fall back as expected. Differential Revision: https://reviews.llvm.org/D67189 llvm-svn: 370996	2019-09-04 22:54:52 +00:00
Matt Arsenault	2df41a8e38	AMDGPU/GlobalISel: Select G_BITREVERSE llvm-svn: 370980	2019-09-04 20:46:31 +00:00
Matt Arsenault	5ff310e298	GlobalISel: Add basic legalization for G_BITREVERSE llvm-svn: 370979	2019-09-04 20:46:15 +00:00
Johannes Doerfert	7ab5253704	[Attributor][Fix] Make sure we do not delete live code Summary: Liveness needs to mark edges, not blocks as dead. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67191 llvm-svn: 370975	2019-09-04 20:34:52 +00:00
Leonard Chan	eca01b031d	[NewPM][Sancov] Make Sancov a Module Pass instead of 2 Passes This patch merges the sancov module and function passes into one module pass. The reason for this is that we ran into an out of memory error when attempting to run asan fuzzer on some protobufs (pc.cc files). I traced the OOM error to the destructor of SanitizerCoverage where we only call appendTo[Compiler]Used which calls appendToUsedList. I'm not sure where precisely in appendToUsedList causes the OOM, but I am able to confirm that it's calling this function repeatedly that causes the OOM. (I hacked sancov a bit such that I can still create and destroy a new sancov on every function run, but only call appendToUsedList after all functions in the module have finished. This passes, but when I make it such that appendToUsedList is called on every sancov destruction, we hit OOM.) I don't think the OOM is from just adding to the SmallSet and SmallVector inside appendToUsedList since in either case for a given module, they'll have the same max size. I suspect that when the existing llvm.compiler.used global is erased, the memory behind it isn't freed. I could be wrong on this though. This patch works around the OOM issue by just calling appendToUsedList at the end of every module run instead of function run. The same amount of constants still get added to llvm.compiler.used, and we make the pass usage and logic simpler by not having any inter-pass dependencies. Differential Revision: https://reviews.llvm.org/D66988 llvm-svn: 370971	2019-09-04 20:30:29 +00:00
Evandro Menezes	bf78e39cbb	[InstCombine] Add more test cases (NFC) Add more test cases simplifying `log()`. llvm-svn: 370966	2019-09-04 20:01:09 +00:00
Alina Sbirlea	6da79ce1fe	[MemorySSA] Re-enable MemorySSA use. Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370957	2019-09-04 19:16:04 +00:00
David Bolvansky	420cbb6190	[InstCombine] sub(xor(x, y), or(x, y)) -> neg(and(x, y)) Summary:
```
Name: sub(xor(x, y), or(x, y)) -> neg(and(x, y))
%or = or i32 %y, %x
%xor = xor i32 %x, %y
%sub = sub i32 %xor, %or
=>
%sub1 = and i32 %x, %y
%sub = sub i32 0, %sub1
Optimization: sub(xor(x, y), or(x, y)) -> neg(and(x, y))
Done: 1
Optimization is correct!
```
https://rise4fun.com/Alive/8OI Reviewers: lebedev.ri Reviewed By: lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67188 llvm-svn: 370945	2019-09-04 18:03:21 +00:00
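The fold rests on the identity `x | y == (x ^ y) + (x & y)`. The xor and the and never share a set bit, so adding them cannot carry. The hypothetical helper below checks the fold exhaustively on i8. Because the identity is bitwise, it holds for any width, but this sketch only tests 8 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* sub(xor(x, y), or(x, y)) == neg(and(x, y)) for all 8-bit x, y. */
static bool sub_xor_or_fold_holds(void) {
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
            uint8_t lhs = (uint8_t)((x ^ y) - (x | y)); /* wrapping sub */
            uint8_t rhs = (uint8_t)(0u - (x & y));      /* sub 0, and */
            if (lhs != rhs) return false;
        }
    }
    return true;
}
```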
David Bolvansky	f6233d90f0	[NFC] Added tests for new fold llvm-svn: 370941	2019-09-04 17:37:06 +00:00
David Bolvansky	2ceb00db76	[NFC] Adjust test filename llvm-svn: 370939	2019-09-04 17:33:53 +00:00
Craig Topper	f0081dac81	[X86] Pre-commit test cases and test run line changes for D67087 llvm-svn: 370937	2019-09-04 17:33:38 +00:00
David Bolvansky	0e07248704	[InstCombine] Fold sub (and A, B) (or A, B) to neg (xor A, B) Summary: ``` Name: sub(and(x, y), or(x, y)) -> neg(xor(x, y)) %or = or i32 %y, %x %and = and i32 %x, %y %sub = sub i32 %and, %or => %sub1 = xor i32 %x, %y %sub = sub i32 0, %sub1 Optimization: sub(and(x, y), or(x, y)) -> neg(xor(x, y)) Done: 1 Optimization is correct! ``` https://rise4fun.com/Alive/VI6 Found by @lebedev.ri, who is also the author of the proof. Reviewers: lebedev.ri, spatel Reviewed By: lebedev.ri Subscribers: llvm-commits, lebedev.ri Tags: #llvm Differential Revision: https://reviews.llvm.org/D67155 llvm-svn: 370934	2019-09-04 17:30:53 +00:00
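A quick randomized check of this fold at the full i32 width follows. It is a sketch that complements the Alive proof and does not replace it, since random testing is not a proof. Values are modeled as unsigned Python integers masked to 32 bits. A few edge values (0, 1, all-ones, and the sign-bit patterns) are always included.

```python
import random

MASK32 = 0xFFFFFFFF

def lhs(x, y):
    # sub i32 (and x, y), (or x, y), with two's-complement wraparound
    return ((x & y) - (x | y)) & MASK32

def rhs(x, y):
    # sub i32 0, (xor x, y)
    return (0 - (x ^ y)) & MASK32

def check(samples=10000, seed=0):
    rng = random.Random(seed)
    edges = [0, 1, MASK32, 0x80000000, 0x7FFFFFFF]
    pairs = [(a, b) for a in edges for b in edges]
    pairs += [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(samples)]
    return all(lhs(x, y) == rhs(x, y) for x, y in pairs)
```

For example, with `x = 12` and `y = 10`, both sides evaluate to `0xFFFFFFFA`, which is -6 in 32-bit two's complement.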
Matt Arsenault	84489b34f6	AMDGPU: Handle frame index expansion with no free SGPRs pre gfx9 Since an add instruction must produce an unused carry out, this requires additional SGPRs. This can be avoided by keeping the entire offset computation in SGPRs. If one SGPR is still available, this only costs one extra mov. If none are available, the entire computation can be done in place and reversed. This does assume the use is a VGPR operand. This was already assumed, and we currently only select frame indexes to VALU instructions. This should probably be fixed at some point to handle more possible MIR. llvm-svn: 370929	2019-09-04 17:12:57 +00:00
Matt Arsenault	70becc20fa	GlobalISel: Add G_BITREVERSE This is the first failing pattern for AMDGPU and is trivial to handle. llvm-svn: 370927	2019-09-04 17:06:53 +00:00
Johannes Doerfert	2f6220633c	[Attributor] Look at internal functions only on-demand Summary: Instead of building attributes for internal functions which we do not update as long as we assume they are dead, we now do not create attributes until we assume the internal function to be live. This reduces the number of required iterations, as well as the number of required updates, in real code. On our tests, the results are mixed. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66914 llvm-svn: 370924	2019-09-04 16:35:20 +00:00
Johannes Doerfert	97fd582b91	[Attributor] Use the white list for attributes consistently Summary: We create attributes on-demand so we need to check the white list on-demand. This also unifies the location at which we create, initialize, and eventually invalidate new abstract attributes. The tests show mixed results, a few more call site attributes are determined which can cause more iterations. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66913 llvm-svn: 370922	2019-09-04 16:26:20 +00:00
Matt Arsenault	d9af712da4	AMDGPU/GlobalISel: Make 16-bit constants legal This is mostly for the benefit of patterns which use 16-bit constants. llvm-svn: 370921	2019-09-04 16:19:45 +00:00
Johannes Doerfert	b0412e437c	[Attributor] Deal more explicitly with non-exact definitions Summary: Before, we tried to rule out non-exact definitions early, but that led to on-demand attributes being created for them anyway. As a consequence, we needed to look at the definition in the initialize of each attribute again. This patch centralizes this lookup and tightens the condition under which we give up on non-exact definitions. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67115 llvm-svn: 370917	2019-09-04 16:16:13 +00:00
Krzysztof Parzyszek	08a09822a5	[Hexagon] Improve generated code for test-if-bit-clear, one more time Adjust isel patterns after recent commit. Fixes https://llvm.org/PR43194. llvm-svn: 370913	2019-09-04 15:22:36 +00:00
Sanjay Patel	4a2cd7be5a	[InstSimplify] guard against unreachable code (PR43218) This would crash: https://bugs.llvm.org/show_bug.cgi?id=43218 llvm-svn: 370911	2019-09-04 15:12:55 +00:00
Alexey Lapshin	cbf1f3b771	[Debuginfo][SROA] Need to handle dbg.value in SROA pass. SROA pass processes debug info incorrectly if applied twice. Specifically, after SROA runs the first time, instcombine converts dbg.declare intrinsics into dbg.value. Inlining creates new opportunities for SROA, so it is called again. This time it does not correctly handle the previously inserted dbg.value intrinsics. Differential Revision: https://reviews.llvm.org/D64595 llvm-svn: 370906	2019-09-04 14:19:49 +00:00
Sanjay Patel	791949afe5	[InstCombine] add tests for insert/extract with identity shuffles; NFC llvm-svn: 370901	2019-09-04 13:38:49 +00:00
David Bolvansky	b9e9478244	[NFC] Added a negative test for new fold llvm-svn: 370890	2019-09-04 12:46:25 +00:00
David Bolvansky	13dadedc29	[NFC] Fixed test llvm-svn: 370888	2019-09-04 12:43:14 +00:00
David Bolvansky	3747c48d64	[NFC] Adjust tests for new fold llvm-svn: 370886	2019-09-04 12:22:28 +00:00
David Bolvansky	163b05b45d	[NFC] Added tests for new fold llvm-svn: 370885	2019-09-04 12:18:53 +00:00
David Bolvansky	358b80b340	[InstCombine] Fold sub (or A, B) (and A, B) to (xor A, B) Summary: ``` Name: sub or and to xor %or = or i32 %y, %x %and = and i32 %x, %y %sub = sub i32 %or, %and => %sub = xor i32 %x, %y Optimization: sub or and to xor Done: 1 Optimization is correct! ``` https://rise4fun.com/Alive/eJu Reviewers: spatel, lebedev.ri Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67153 llvm-svn: 370883	2019-09-04 12:00:33 +00:00
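This fold also admits a short exhaustive check. The sketch below uses 8-bit values. The bits of `x & y` are a subset of the bits of `x | y`, so the subtraction only clears those bits and never borrows. What remains is exactly the bits set in one operand but not both, which is `x ^ y`.

```python
def sub_or_and(x, y, width=8):
    # sub iN (or x, y), (and x, y), with wraparound (which never triggers here)
    mask = (1 << width) - 1
    return ((x | y) - (x & y)) & mask

def check(width=8):
    # Compare against xor x, y for every pair of width-bit values.
    n = 1 << width
    return all(sub_or_and(x, y, width) == (x ^ y)
               for x in range(n) for y in range(n))
```

For instance, `sub_or_and(12, 10)` computes `14 - 8 = 6`, and `12 ^ 10` is also 6.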
Pavel Labath	98634c2e11	Fix address sizes in the dwarfdump-debug-loc-error-cases test The test is building a 64-bit executable, so the addresses should be 64-bit too. The test was still passing even with the smaller address size, but it was hitting the "unexpected end of data" error sooner than it should. llvm-svn: 370882	2019-09-04 11:47:20 +00:00
David Bolvansky	54f3a651f3	[NFC] Added a new test for D67153 llvm-svn: 370881	2019-09-04 11:44:00 +00:00
David Bolvansky	75d734475a	[NFC] Added tests for 'SUB of OR and AND to XOR' fold llvm-svn: 370878	2019-09-04 11:17:08 +00:00
Jeremy Morse	337a7cb55e	[DebugInfo] LiveDebugValues: locations with different exprs should not be merged When comparing variable locations, LiveDebugValues currently considers only the machine location, ignoring any DIExpression applied to it. This is a problem because that DIExpression can do pretty much anything to the machine location, for example dereferencing it. This patch adds DIExpressions to that comparison; now variables based on the same register/memory-location but with different expressions will compare differently, and be dropped if we attempt to merge them between blocks. This reduces variable coverage-range a little, but only because we were producing broken locations. Differential Revision: https://reviews.llvm.org/D66942 llvm-svn: 370877	2019-09-04 11:09:05 +00:00
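The comparison change can be modeled with a small value type. This is a minimal sketch, not the LiveDebugValues implementation. `VarLoc`, its fields, and `merge_at_block_entry` are hypothetical, and `expr` is a plain tuple standing in for a DIExpression. Because equality now covers both the machine location and the expression, two predecessors that use the same register with different expressions no longer agree, so the location is dropped at the merge.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class VarLoc:
    """Toy variable location: a machine location plus an applied expression."""
    machine_loc: str   # e.g. a register name or stack slot (illustrative)
    expr: tuple = ()   # hypothetical stand-in for a DIExpression's operations

def merge_at_block_entry(pred_locs: Sequence[Optional[VarLoc]]) -> Optional[VarLoc]:
    # A location survives only if every predecessor block has the same one.
    # Dataclass equality compares machine_loc AND expr.
    if not pred_locs or any(loc is None for loc in pred_locs):
        return None
    first = pred_locs[0]
    return first if all(loc == first for loc in pred_locs) else None
```

For example, `VarLoc("rax")` and `VarLoc("rax", ("DW_OP_deref",))` describe different values even though both name the same register.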
Pavel Labath	88b4e28a67	DWARF: Fix a regression in location list dumping Summary: While fixing the handling of some error cases, r370363 introduced new problems -- assertion failures due to unchecked errors (my excuse is that a very early version of that patch used Optional<T> instead of Expected). This patch adds proper handling of parsing errors encountered when dumping location lists from inside DWARF DIEs, and adds a bunch of additional tests. I reorder the arguments of the location list dumping functions to make them consistent, and also be able to dump the two kinds of location lists generically. Reviewers: JDevlieghere, dblaikie, probinson Subscribers: aprantl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67102 llvm-svn: 370868	2019-09-04 10:09:12 +00:00
Fangrui Song	441d450115	[yaml2obj] Support PT_GNU_STACK and PT_GNU_RELRO PT_GNU_STACK is used in an llvm-objcopy test. I plan to use PT_GNU_RELRO in a patch to improve nested segment processing in llvm-objcopy (PR42963). Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D67146 llvm-svn: 370857	2019-09-04 09:19:31 +00:00
Sam Parker	fea532230b	[ARM][ParallelDSP] SExt mul for accumulation For any unpaired muls, we accumulate them as an input to the reduction. Check the type of the mul and perform a sext if the existing accumulator input type is not the same. Differential Revision: https://reviews.llvm.org/D66993 llvm-svn: 370851	2019-09-04 08:41:34 +00:00
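The reason for the sext can be shown with plain integers. The sketch assumes an i32 mul being added into an i64 accumulator. `accumulate` is a hypothetical helper that only handles the narrower-mul case. A negative product must be sign-extended to keep its value at the wider type. Zero-extending -1 (`0xFFFFFFFF`) would instead add 4294967295.

```python
def sext(value, from_bits, to_bits):
    """Sign-extend a from_bits-wide two's-complement value to to_bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):  # sign bit set: value is negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)

def accumulate(acc, mul_result, mul_bits, acc_bits):
    # Toy model: widen a narrower mul result before adding it to the
    # accumulator, so negative products keep their value.
    if mul_bits < acc_bits:
        mul_result = sext(mul_result, mul_bits, acc_bits)
    return (acc + mul_result) & ((1 << acc_bits) - 1)
```

With this model, `accumulate(10, 0xFFFFFFFF, 32, 64)` yields 9, because the i32 product -1 is added as -1.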
Taewook Oh	1975e635e6	[IRPrinting] Improve module pass printer to work better with -filter-print-funcs Summary: Previously, the module pass printer printed the banner even when the module didn't include any function provided with the `-filter-print-funcs` option. This introduced a lot of noise, especially with ThinLTO. This diff addresses the issue and makes the banner print only when the module includes functions in the `-filter-print-funcs` list. Reviewers: fedor.sergeev Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66560 llvm-svn: 370849	2019-09-04 08:08:58 +00:00
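The new banner condition can be summarized as a predicate. The function and parameter names are hypothetical. The sketch assumes that an empty filter list means every function is selected, which is intended to match how `-filter-print-funcs` behaves when it is not given.

```python
def should_print_banner(module_functions, filter_funcs):
    """Print the module banner only if some function in it passes the filter."""
    if not filter_funcs:
        # No filter given: everything is printed, as before.
        return True
    return any(name in filter_funcs for name in module_functions)
```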
Jim Lin	b77aa1d248	[RISCV] Enable tail call opt for variadic function Summary: Tail call opt can treat variadic function calls the same as normal function calls Reviewers: mgrang, asb, lenary, lewis-revill Reviewed By: lenary Subscribers: luismarques, pzheng, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66278 llvm-svn: 370835	2019-09-04 02:03:36 +00:00
Puyan Lotfi	954d6d661f	[NFC][llvm-ifs] Adding .ifs files to the test list for llvm-ifs tool. llvm-svn: 370830	2019-09-04 00:07:49 +00:00
Reid Kleckner	3fa07dee94	Revert [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn This reverts r370525 (git commit `0bb1630685`) Also reverts r370543 (git commit `185ddc08ee`) The approach I took only works for functions marked `noreturn`. In general, a call that is not known to be noreturn may be followed by unreachable for other reasons. For example, there could be multiple call sites to a function that throws sometimes, and at some call sites, it is known to always throw, so it is followed by unreachable. We need to insert an `int3` in these cases to pacify the Windows unwinder. I think this probably deserves its own standalone, Win64-only fixup pass that runs after block placement. Implementing that will take some time, so let's revert to TrapUnreachable in the meantime. llvm-svn: 370829	2019-09-03 22:27:27 +00:00
Heejin Ahn	49e7ee4dd5	[WebAssembly] Compare functions by names in Emscripten Sjlj Summary: This removes all string constants for function names and compares functions by name directly when needed. Many of these constants are used only once or twice, so the benefit of defining them separately is not very clear, and this actually fixes a bug. When we already have a `malloc` declaration which is an alias to something else within the module, ``` @malloc = weak hidden alias i8* (i32), i8* (i32)* @dlmalloc ``` (this happens when compiling with emscripten with `-s WASM_OBJECT_FILES=0` because all bc files are merged before being fed into `wasm-ld`, which runs the backend optimizations as LTO) `Module::getFunction("malloc")` in `canLongjmp` returns `nullptr` because `Module::getFunction` dyncasts the pointer to `Function`, but the alias is a `GlobalValue`, not a `Function`. This makes `canLongjmp` return true for `malloc` in this case, and we end up adding a lot of longjmp handling code around malloc. This is not only a code size increase but actually a bug because `malloc` is used in the entry block when preparing for setjmp tables for emscripten sjlj handling, and this makes initial setjmp preparation, which has to happen in the entry block, move to another split block, and this interferes with SSA update later. This also adds two more functions, `getTempRet0` and `setTempRet0`, to the list of not longjmp-able functions. Fixes https://github.com/emscripten-core/emscripten/issues/8935. Reviewers: sbc100 Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish, dexonsmith, dschuff, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67129 llvm-svn: 370828	2019-09-03 22:26:49 +00:00
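The bug can be reproduced in a toy symbol table. This is a sketch with hypothetical classes that loosely mirror `GlobalValue`, `Function` and `GlobalAlias`. The name set is an illustrative subset, not the pass's full list. A type-checked lookup returns `None` for an alias, so a pointer comparison never matches it. Comparing names does match.

```python
class GlobalValue:
    def __init__(self, name):
        self.name = name

class Function(GlobalValue):
    pass

class GlobalAlias(GlobalValue):
    def __init__(self, name, aliasee):
        super().__init__(name)
        self.aliasee = aliasee

class Module:
    def __init__(self, values):
        self.symbols = {v.name: v for v in values}

    def get_function(self, name):
        # Like a dyn_cast: only returns the symbol if it is a Function.
        v = self.symbols.get(name)
        return v if isinstance(v, Function) else None

# Illustrative subset of functions that are treated as unable to longjmp.
NOT_LONGJMPABLE = {"malloc", "free", "getTempRet0", "setTempRet0"}

def can_longjmp_by_pointer(module, callee):
    # Old style: identity comparison against get_function() results.
    return not any(callee is module.get_function(n) for n in NOT_LONGJMPABLE)

def can_longjmp_by_name(callee):
    # New style: compare the callee's name directly.
    return callee.name not in NOT_LONGJMPABLE
```

With `@malloc` modeled as an alias of `@dlmalloc`, the pointer-based check wrongly reports that `malloc` can longjmp. The name-based check does not.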
Vedant Kumar	0fcfe89717	[llvm-profdata] Add mode to recover from profile read failures Add a mode in which profile read errors are not immediately treated as fatal. In this mode, merging makes forward progress and reports failure only if no inputs can be read. Differential Revision: https://reviews.llvm.org/D66985 llvm-svn: 370827	2019-09-03 22:23:16 +00:00
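The recovery mode can be sketched as a merge loop. In the sketch, per-input read errors are recorded rather than raised, and the merge fails only if no input could be read. This is not llvm-profdata's implementation. `merge_profiles`, `read_profile` (a caller-supplied reader assumed to raise `ValueError` on unreadable input) and the dict-of-counters profile shape are all hypothetical.

```python
def merge_profiles(inputs, read_profile, fail_on_error=True):
    """Sum counters across inputs; optionally skip unreadable ones."""
    merged, failures = {}, []
    for path in inputs:
        try:
            profile = read_profile(path)
        except ValueError as err:
            if fail_on_error:
                raise                      # old behaviour: first error is fatal
            failures.append((path, str(err)))  # record it and keep going
            continue
        for name, count in profile.items():
            merged[name] = merged.get(name, 0) + count
    if inputs and len(failures) == len(inputs):
        raise RuntimeError("no input profile could be read")
    return merged, failures
```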
Vedant Kumar	95fb23ab37	[InstrProf] Tighten a check for malformed data records in raw profiles The check needs to validate a counter offset before performing pointer arithmetic with the (potentially corrupt) offset. Found by UBSan's pointer overflow check. rdar://54843625 Differential Revision: https://reviews.llvm.org/D66979 llvm-svn: 370826	2019-09-03 22:23:14 +00:00
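The fix is about ordering: validate the offset first, then use it. The sketch below shows that ordering with hypothetical names and a list in place of raw memory. Python slicing cannot overflow a pointer, so the sketch illustrates only when the check runs, not the undefined behavior UBSan caught in the C++ code.

```python
def counters_for_record(counters, counter_offset, num_counters):
    """Return the counters a data record refers to, validating first."""
    # Validate the (possibly corrupt) offset and length before using them.
    if counter_offset < 0 or num_counters < 0:
        raise ValueError("malformed record: negative offset or count")
    if counter_offset + num_counters > len(counters):
        raise ValueError("malformed record: counters out of range")
    # Only now is the offset used to locate the data.
    return counters[counter_offset:counter_offset + num_counters]
```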
Amara Emerson	2a2c25ba48	[AArch64][GlobalISel] Legalize 128 bit divisions to libcalls. Now that we have the infrastructure to support s128 types as parameters we can expand these to libcalls. Differential Revision: https://reviews.llvm.org/D66185 llvm-svn: 370823	2019-09-03 21:42:32 +00:00
Amara Emerson	fbaf425b79	[GlobalISel][CallLowering] Add support for splitting types according to calling conventions. On AArch64, s128 types have to be split into s64 GPRs when passed as arguments. This change adds the generic support in call lowering for dealing with multiple registers, for incoming and outgoing args. Support for splitting return types is not yet implemented. Differential Revision: https://reviews.llvm.org/D66180 llvm-svn: 370822	2019-09-03 21:42:28 +00:00
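Splitting an s128 value into two s64 parts and reassembling it looks like this at the value level. The sketch assumes the low half comes first, as for a little-endian AArch64 target where a 128-bit integer occupies a pair of consecutive X registers. The helper names are hypothetical and are not the CallLowering API.

```python
MASK64 = (1 << 64) - 1

def split_s128(value):
    """Split a 128-bit value into two 64-bit parts, low part first."""
    value &= (1 << 128) - 1
    return [value & MASK64, (value >> 64) & MASK64]

def merge_s64_parts(parts):
    """Reassemble the value on the receiving side from [low, high]."""
    lo, hi = parts
    return (hi << 64) | lo
```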


64773 Commits