llvm-project

Commit Graph

Author	SHA1	Message	Date
Kit Barton	ff07fc66d9	[LoopFusion] Restrict loop fusion to rotated loops. Summary: This patch restricts loop fusion to only consider rotated loops as valid candidates. This simplifies the analysis and transformation and aligns with other loop optimizations. Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel Reviewed By: Meinersbur Subscribers: ormris, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71025	2019-12-16 15:17:29 -05:00
Craig Topper	02f644c59a	[InstCombine] Teach removeBitcastsFromLoadStoreOnMinMax not to change the size of a store. We can change the type as long as we don't change the size. Fixes PR44306 Differential Revision: https://reviews.llvm.org/D71532	2019-12-16 12:12:54 -08:00
Thomas Lively	3a93756dfb	[WebAssembly] Replace SIMD int min/max builtins with patterns Summary: The instructions were originally implemented via builtins and intrinsics so users would have to explicitly opt-in to using them. This was useful while were validating whether these instructions should have been merged into the spec proposal. Now that they have been, we can use normal codegen patterns, so the intrinsics and builtins are no longer useful. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71500	2019-12-16 11:48:49 -08:00
Jonas Paulsson	49f55dda01	[SystemZ] Improve verification of MachineOperands. Now that the machine verifier will check for cases of register/immediate MachineOperands and their correspondence to the MC instruction descriptor, this patch adds the operand types to the descriptors where they were previously missing. All MCOI::OPERAND_UNKNOWN operand types have been handled to get a known type, except for G_... (global isel) instructions. Review: Ulrich Weigand https://reviews.llvm.org/D71494	2019-12-16 09:51:54 -08:00
Teresa Johnson	878ab6df03	[TLI] Support for per-Function TLI that overrides available libfuncs Summary: Follow-on to D66428 and D71193, to build the TLI per-function so that -fno-builtin* handling can be migrated to use function attributes. See discussion on D61634 for background. This is an enabler for fixing handling of these options for LTO, for example. With D71193, the -fno-builtin* flags are converted to function attributes, so we can now set this information per-function on the TLI. In this patch, the TLI constructor is changed to take a Function, which can be used to override the available builtins. The TLI is augmented with an array that can be used to specify which builtins are not available for the corresponding function. The available function checks are changed to consult this override before checking the underlying module level baseline TLII. New code is added to set this override array based on the attributes. I also removed the code that sets availability in the TLII in clang from the options, which is no longer needed. I removed a per-Triple caching of TLII objects in the analysis object, as it is based on the Module's Triple which is the same for all functions in any case. Is there a case where we would be compiling multiple Modules with different Triples in one compilation? Finally, I have changed the legacy analysis wrapper to create and use the new PM analysis class (TargetLibraryAnalysis) in getTLI. This is consistent with the behavior of getTTI for the legacy TargetTransformInfo analysis. This change means that getTLI now creates a new TLI on each call (although that should be very cheap as we cache the module level TLII, and computing the per-function attribute based availability should also be reasonably efficient). I measured the compile time for a large C++ file with tens of thousands of functions and as expected there was no increase. Reviewers: chandlerc, hfinkel, gchatelet Subscribers: mehdi_amini, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67923	2019-12-16 09:19:30 -08:00
Miloš Stojanović	d7efa6b198	[mips] Add an assert in getTargetStreamer() Check if the TargetStreamer can be accessed. Differential Revision: https://reviews.llvm.org/D71477	2019-12-16 17:04:55 +01:00
Guillaume Chatelet	4658da10e4	Revert "[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove" This reverts commit `181ab91efc`.	2019-12-16 15:19:49 +01:00
David Tellenbach	df0cc105fa	Reland [AArch64][MachineOutliner] Return address signing for outlined functions Summary: Reland after fixing a bug that allowed outlining of SP modifying instructions that invalidated return address signing. During AArch64 frame lowering instructions to enable return address signing are inserted into functions if needed. Functions generated during machine outlining don't run through target frame lowering and hence are missing such instructions. This patch introduces the following changes: 1. If not all functions that potentially participate in function outlining agree on their return address signing scope and their return address signing key, outlining is disabled for these functions. 2. If not all functions that potentially participate in function outlining agree on their support for v8.3A features, outlining is disabled for these functions. 3. If an outlining candidate would outline instructions that modify sp in a way that invalidates return address signing, outlining is disabled for that particular candidate. 4. If all candidate functions agree on the signing scope, signing key and their support for v8.3 features, the outlined function behaves as if it had the same scope and key attributes and as if it would provide the same v8.3A support as the original functions. Reviewers: ostannard, paquette Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70635	2019-12-16 14:40:45 +01:00
Guillaume Chatelet	181ab91efc	[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove Summary: This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients. Functions will be deprecated one by one and as in tree code is cleaned up. This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71473	2019-12-16 13:35:55 +01:00
Andrzej Warzynski	c41d2b5ab2	[AArch64][SVE2] Add intrinsics for binary narrowing operations Summary: The following intrinsics for binary narrowing add and sub operations are added: * @llvm.aarch64.sve.addhnb * @llvm.aarch64.sve.addhnt * @llvm.aarch64.sve.raddhnb * @llvm.aarch64.sve.raddhnt * @llvm.aarch64.sve.subhnb * @llvm.aarch64.sve.subhnt * @llvm.aarch64.sve.rsubhnb * @llvm.aarch64.sve.rsubhnt Reviewers: sdesmalen, rengolin, efriedma Reviewed By: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71424	2019-12-16 12:22:56 +00:00
Kristof Beyls	7f4f07ddf3	[AArch64] Enable emission of stack maps for non-Mach-O binaries on AArch64. The emission of stack maps in AArch64 binaries has been disabled for all binary formats except Mach-O since rL206610, probably mistakenly, as far as I can tell. This patch reverts this to its intended state. Differential Revision: https://reviews.llvm.org/D70069 Patch by Loic Ottet.	2019-12-16 12:02:47 +00:00
Andrzej Warzynski	7e20c3a71d	[Aarch64][SVE] Add intrinsics for scatter stores Summary: This patch adds the following SVE intrinsics for scatter stores: * 64-bit offsets: * @llvm.aarch64.sve.st1.scatter (unscaled) * @llvm.aarch64.sve.st1.scatter.index (scaled) * 32-bit unscaled offsets: * @llvm.aarch64.sve.st1.scatter.uxtw (zero-extended offset) * @llvm.aarch64.sve.st1.scatter.sxtw (sign-extended-offset) * 32-bit scaled offsets: * @llvm.aarch64.sve.st1.scatter.uxtw.index (zero-extended offset) * @llvm.aarch64.sve.st1.scatter.sxtw.index (sign-extended offset) * vector base + immediate: * @llvm.aarch64.sve.st1.scatter.imm Reviewers: rengolin, efriedma, sdesmalen Reviewed By: efriedma, sdesmalen Subscribers: kmclaughlin, eli.friedman, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71074	2019-12-16 11:52:53 +00:00
Jay Foad	f8495017f0	Fix whitespace.	2019-12-16 10:42:34 +00:00
Bjorn Pettersson	e5f07080b8	[BasicBlockUtils] Fix dbg.value elimination problem in MergeBlockIntoPredecessor Summary: In commit `d60f34c20a` (llvm-svn 317128, PR35113) MergeBlockIntoPredecessor was changed into discarding some dbg.value intrinsics referring to PHI values, post-splice due to loop rotation. That elimination of dbg.value intrinsics did not consider which dbg.value to keep depending on the context (e.g. if the variable is changing its value several times inside the basic block). In the past that hasn't been such a big problem since CodeGenPrepare::placeDbgValues has moved the dbg.value to be next to the PHI node anyway. But after commit `00e238896c` CodeGenPrepare isn't doing that any longer, so we need to be more careful when avoiding duplicate dbg.value intrinsics in MergeBlockIntoPredecessor. This patch replaces the code that tried to avoid duplicate dbg.values by using the RemoveRedundantDbgInstrs helper. Reviewers: aprantl, jmorse, vsk Reviewed By: aprantl, vsk Subscribers: jholewinski, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71480	2019-12-16 11:41:21 +01:00
Bjorn Pettersson	1c49553c19	[BasicBlockUtils] Add utility to remove redundant dbg.value instrs Summary: Add a RemoveRedundantDbgInstrs to BasicBlockUtils with the goal to remove redundant dbg intrinsics from a basic block. This can be useful after various transforms, as it might be simpler to do a filtering of dbg intrinsics after the transform than during the transform. One primary use case would be to replace a too aggressive removal done by MergeBlockIntoPredecessor, seen at loop rotate (not done in this patch). The elimination algorithm currently focuses on dbg.value intrinsics and is doing two iterations over the BB. First we iterate backward starting at the last instruction in the BB. Whenever a consecutive sequence of dbg.value instructions are found we keep the last dbg.value for each variable found (variable fragments are identified using the {DILocalVariable, FragmentInfo, inlinedAt} triple as given by the DebugVariable helper class). Next we iterate forward starting at the first instruction in the BB. Whenever we find a dbg.value describing a DebugVariable (identified by {DILocalVariable, inlinedAt}) we save the {DIValue, DIExpression} that describes that variables value. But if the variable already was mapped to the same {DIValue, DIExpression} pair we instead drop the second dbg.value. To ease the process of making lit tests for this utility a new pass is introduced called RedundantDbgInstElimination. It can be executed by opt using -redundant-dbg-inst-elim. Reviewers: aprantl, jmorse, vsk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71478	2019-12-16 11:41:21 +01:00
Jay Foad	4f17b1784e	Fix for AMDGPU MUL_I24 known bits calculation Summary: At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number". In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8. Reviewers: foad, arsenm Reviewed By: arsenm Subscribers: foad, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Patch by Eugene Kuznetsov. Differential Revision: https://reviews.llvm.org/D70367	2019-12-16 10:25:57 +00:00
Valentin Churavy	5c29e8c65f	[CodegenPrepare] Guard against degenerate branches Summary: Guard against a potential crash observed in https://github.com/JuliaLang/julia/issues/32994#issuecomment-524249628 If two branches are collapsed we can encounter a degenerate conditional branch `TBB==FBB`. The subsequent code assumes that they differ, so we exit out early. Reviewers: ributzka, spatel Subscribers: loladiro, dexonsmith, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66657	2019-12-16 04:23:32 -05:00
Sjoerd Meijer	049f9672d8	[ARM] Move MVE opcode helper functions to ARMBaseInstrInfo. NFC. In ARMLowOverheadLoops.cpp, MVETailPredication.cpp, and MVEVPTBlock.cpp we have quite a few helper functions all looking at the opcodes of MVE instructions. This moves all these utility functions to ARMBaseInstrInfo. Diferential Revision: https://reviews.llvm.org/D71426	2019-12-16 09:13:59 +00:00
Lang Hames	67a1b7f053	[Orc][LLJIT] Automatically use JITLink for LLJIT on supported platforms. JITLink (which underlies ObjectLinkingLayer) is a replacement for RuntimeDyld. It supports the native code model, and linker plugins that enable a wider range of features than RuntimeDyld. Currently only enabled for MachO/x86-64 and MachO/arm64. New architectures will be added as JITLink support for them is developed.	2019-12-15 21:57:10 -08:00
Fangrui Song	d25db94fa7	[MC] Delete STT_SECTION special cases from MCSymbolELF::setType and setBinding The special cases added by rL293936 were no longer needed after rL296180 disallowed redefinition of section symbols.	2019-12-15 20:39:25 -08:00
Jim Lin	7e0fd77645	[PowerPC] Fix %llvm.ppc.altivec.vc* lowering Summary: r372285 changed LLVM to use a `TargetConstant` for parameters of intrinsics that are required to be immediates. Since that commit, use of `%llvm.ppc.altivec.vc{fsx,fux,tsxs,tuxs}` intrinsics has not worked, and resulted in a `LLVM ERROR: Cannot select: intrinsic %llvm.ppc.altivec.vc*` error. The intrinsics' TableGen definitions matched on `imm` instead of `timm`. This commit updates those definitions to use `timm`. Fixes: https://llvm.org/PR44239 Reviewers: hfinkel, nemanjai, #powerpc, Jim Reviewed By: Jim Subscribers: qiucf, wuzish, Jim, hiraditya, kbarton, jsji, shchenz, llvm-commits Tags: #llvm Patched by vddvss (Colin Samples). Differential Revision: https://reviews.llvm.org/D71138	2019-12-16 10:21:55 +08:00
Lang Hames	c0143f37da	[ORC] Make ObjectLinkingLayer own its jitlink::MemoryManager. This relieves ObjectLinkingLayer clients of the responsibility of holding the memory manager. This makes it easier to select between RTDyldObjectLinkingLayer (which already owned its memory manager factory) and ObjectLinkingLayer at runtime as clients aren't required to hold a jitlink::MemoryManager field just in case ObjectLinkingLayer is selected.	2019-12-15 17:35:52 -08:00
Fangrui Song	1ea5ce6335	[MC] Assume CommentStream is non-null in MCDisassembler::tryAdding* AArch64/ARM/X86 call the two functions. CommentStream is always initialized.	2019-12-15 16:21:06 -08:00
Fangrui Song	2b0256e49b	[MC] Ignore VK_WEAKREF in MCValue::getAccessVariant MCSymbolRefExpr::getVariantKindForName does not return VK_WEAKREF, so this code path is not exercised. Moreoever, .weakref is probably a feature that nobody uses.	2019-12-15 16:05:46 -08:00
Fangrui Song	fdb408f348	[MC] Delete unused MCAsmInfoELF::UsesNonexecutableStackSection after EM_WEBASSEMBLY was removed in D48744 This removes remnant of D15969 which hasn't been removed by D48744.	2019-12-15 15:43:30 -08:00
Sanjay Patel	6080387f13	[InstSimplify] fold splat of inserted constant to vector constant shuf (inselt ?, C, IndexC), undef, <IndexC, IndexC...> --> <C, C...> This is another missing shuffle fold pattern uncovered by the shuffle correctness fix from D70246. The problem was visible in the post-commit thread example, but we managed to overcome the limitation for that particular case with D71220. This is something like the inverse of the previous fix - there we didn't demand the inserted scalar, and here we are only demanding an inserted scalar. Differential Revision: https://reviews.llvm.org/D71488	2019-12-15 09:32:03 -05:00
Sanjay Patel	2afe864118	[DAG] Add SimplifyDemandedBits support for BSWAP This exposes a shortcoming for AArch64, and that is tracked by PR40881: https://bugs.llvm.org/show_bug.cgi?id=40881 Patch by: @RKSimon (Simon Pilgrim) Differential Revision: https://reviews.llvm.org/D58017	2019-12-15 08:52:34 -05:00
Craig Topper	1dc0c8af5e	[LegalizeTypes] Teach BitcastToInt_ATOMIC_SWAP to only create FP16_TO_FP when called from PromoteFloatResult. There's also a call from SoftenFloatResult that should not be promoted. The change test case would fail with the new RUN line prior to this change.	2019-12-14 15:05:32 -08:00
Craig Topper	95ce8f9498	[LegalizeTypes] In PromoteFloatOp_SETCC, don't both querying for transforming the result type. The result type is already legal, is doesnt' need to be transformed.	2019-12-14 15:05:32 -08:00
Logan Chien	061a94e4e2	Revert "AArch64: Fix frame record chain" Breaks aosp-O3-polly-before-vectorizer-unprofitable with the following error message: void llvm::emitFrameOffset(llvm::MachineBasicBlock &, MachineBasicBlock::iterator, const llvm::DebugLoc &, unsigned int, unsigned int, llvm::StackOffset, const llvm::TargetInstrInfo , MachineInstr::MIFlag, bool, bool, bool ): Assertion `(DestReg != AArch64::SP \|\| Bytes % 16 == 0) && "SP increment/decrement not 16-byte aligned"' failed. This reverts commit `d4e10e6adb`.	2019-12-14 13:58:40 -08:00
Logan Chien	d4e10e6adb	AArch64: Fix frame record chain The commit r369122 may keep LR and FP register (aka. frame record) in the middle of a frame, thus we must add the offsets to ensure the FP register always points to innermost frame record on the stack. According to AAPCS64[1], a conforming code shall construct a linked list of stack frames that can be traversed with frame records. This commit is also essential to frame-pointer-based stack unwinder (e.g. the stack unwinder in linx-perf-tools.) [1] https://github.com/ARM-software/software-standards/blob/master/abi/aapcs64/aapcs64.rst#the-frame-pointer Test: llvm-lit ${LLVM_SRC}/test/CodeGen/AArch64/framelayout-frame-record.ll Test: llvm-lit ${LLVM_SRC}/test/CodeGen/AArch64 Differential Revision: https://reviews.llvm.org/D70800	2019-12-14 10:23:20 -08:00
Puyan Lotfi	816985c120	[NFC][llvm][MIRVRegNamerUtils] Refactoring GetHashableMO into switch-statement. This refactors the if-statements handling the hashing of various MachineOperand types into a switch-statement. The purpose is to cover all the basis for all MachineOperand types while being very deliberate about which MachineOperand types we are not handling and why (better added comments). This patch is a NFC redo of https://reviews.llvm.org/D71396. Much of the changes present in D71396 will come in smaller follow-up patches that will add support for hashing the MachineOperand types that aren't covered piece-meal with tests for each new case.	2019-12-14 02:31:07 -05:00
Johannes Doerfert	139c9ef45a	[Attributor] Annotate call sites of declarations with a callback Even if a declaration is called, if there is a callback we might need the information during CG-SCC traversal (D70767).	2019-12-13 23:51:59 -06:00
Johannes Doerfert	3d347e2835	[Attributor][NFC] Simplify debug printing for abstract attributes This also fixes a type in the debug printing of AANoAlias.	2019-12-13 23:51:59 -06:00
Johannes Doerfert	5d34602da4	[Attributor] Only replace instruction operands This was part of D70767. When we replace the value of (call/invoke) instructions we do not want to disturb the old call graph so we will only replace instruction uses until we get rid of the old PM. Accepted as part of D70767.	2019-12-13 22:16:38 -06:00
Fangrui Song	a0aa58dad5	[AArch64] Save FP for leaf functions when disabling frame pointer elimination The change allows clang -mno-omit-leaf-frame-pointer to disable frame pointer elimination. This behavior matches X86 and Mips, and also GCC AArch64. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D71168	2019-12-13 18:48:58 -08:00
Sean Fertile	93faa237da	[PowerPC] Add Support for indirect calls on AIX. Extends the desciptor-based indirect call support for 32-bit codegen, and enables indirect calls for AIX. In-depth Description: In a function descriptor based ABI, a function pointer points at a descriptor structure as opposed to the function's entry point. The descriptor takes the form of 3 pointers: 1 for the function's entry point, 1 for the TOC anchor of the module containing the function definition, and 1 for the environment pointer: struct FunctionDescriptor { void EntryPoint; void TOCAnchor; void *EnvironmentPointer; }; An indirect call has several steps of loading the the information from the descriptor into the proper registers for setting up the call. Namely it has to: 1) Save the caller's TOC pointer into the TOC save slot in the linkage area, and then load the callee's TOC pointer into the TOC register (GPR 2 on AIX). 2) Load the function descriptor's entry point into the count register. 3) Load the environment pointer into the environment pointer register (GPR 11 on AIX). 4) Perform the call by branching on count register. 5) Restore the caller's TOC pointer after returning from the indirect call. A couple important caveats to the above: - There is no way to directly load a value from memory into the count register. Instead we populate the count register by loading the entry point address into a gpr and then moving the gpr to the count register. - The TOC restore has to come immediately after the branch on count register instruction (i.e., the 1st instruction executed after we return from the call). This is an implementation limitation. We could, in theory, schedule the restore elsewhere as long as no uses of the TOC pointer fall in between the call and the restore; however, to keep it simple, we insert a pseudo instruction that represents both the indirect branch instruction and the load instruction that restores the caller's TOC from the linkage area. As they flow through the compiler as a single pseudo instruction, nothing can be inserted between them and the caller's TOC is then valid at any use. Differtential Revision: https://reviews.llvm.org/D70724	2019-12-13 20:07:00 -05:00
Fangrui Song	40c288b75c	[Mips] Fix gcc -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D71028	2019-12-13 16:41:08 -08:00
Roman Tereshin	8731799fc6	[Legalizer] Making artifact combining order-independent Legalization algorithm is complicated by two facts: 1) While regular instructions should be possible to legalize in an isolated, per-instruction, context-free manner, legalization artifacts can only be eliminated in pairs, which could be deeply, and ultimately arbitrary nested: { [ () ] }, where which paranthesis kind depicts an artifact kind, like extend, unmerge, etc. Such structure can only be fully eliminated by simple local combines if they are attempted in a particular order (inside out), or alternatively by repeated scans each eliminating only one innermost pair, resulting in O(n^2) complexity. 2) Some artifacts might in fact be regular instructions that could (and sometimes should) be legalized by the target-specific rules. Which means failure to eliminate all artifacts on the first iteration is not a failure, they need to be tried as instructions, which may produce more artifacts, including the ones that are in fact regular instructions, resulting in a non-constant number of iterations required to finish the process. I trust the recently introduced termination condition (no new artifacts were created during as-a-regular-instruction-retrial of artifacts not eliminated on the previous iteration) to be efficient in providing termination, but only performing the legalization in full if and only if at each step such chains of artifacts are successfully eliminated in full as well. Which is currently not guaranteed, as the artifact combines are applied only once and in an arbitrary order that has to do with the order of creation or insertion of artifacts into their worklist, which is a no particular order. In this patch I make a small change to the artifact combiner, making it to re-insert into the worklist immediate (modulo a look-through copies) artifact users of each vreg that changes its definition due to an artifact combine. Here the first scan through the artifacts worklist, while not being done in any guaranteed order, only needs to find the innermost pair(s) of artifacts that could be immediately combined out. After that the process follows def-use chains, making them shorter at each step, thus combining everything that can be combined in O(n) time. Reviewers: volkan, aditya_nandakumar, qcolombet, paquette, aemerson, dsanders Reviewed By: aditya_nandakumar, paquette Tags: #llvm Differential Revision: https://reviews.llvm.org/D71448	2019-12-13 15:45:18 -08:00
Roman Tereshin	18bf9670aa	[Legalizer] Refactoring out legalizeMachineFunction and introducing new unittests/CodeGen/GlobalISel/LegalizerTest.cpp relying on it to unit test the entire legalizer algorithm (including the top-level main loop). See also https://reviews.llvm.org/D71448	2019-12-13 15:45:18 -08:00
Roman Tereshin	8207c81597	[Legalizer] More detailed debugging printing in main loop	2019-12-13 15:45:18 -08:00
Alex Richardson	11448eeb72	[NFC] Use SelectionDAG::getMemBasePlusOffset() instead of getNode(ISD::ADD) Summary: To find potential opportunities to use getMemBasePlusOffset() I looked at all ISD::ADD uses found with the regex getNode\(ISD::ADD,.+,.+Ptr in lib/CodeGen/SelectionDAG. If this patch is accepted I will convert the files in the individual backends too. The motivation for this change is our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). We use a separate register type to store pointers (128-bit capabilities, which are effectively unforgeable and monotonic fat pointers). These capabilities permit a reduced set of operations and therefore use a separate ValueType (iFATPTR). to represent pointers implemented as capabilities. Therefore, we need to avoid using ISD::ADD for our patterns that operate on pointers and need to use a function that chooses ISD::ADD or a new ISD::PTRADD opcode depending on the value type. We originally added a new DAG.getPointerAdd() function, but after this patch series we can modify the implementation of getMemBasePlusOffset() instead. Avoiding direct uses of ISD::ADD for pointer types will significantly reduce the amount of assertion/instruction selection failures for us in future upstream merges. Reviewers: spatel Reviewed By: spatel Subscribers: merge_guards_bot, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71207	2019-12-13 21:40:03 +00:00
Alex Richardson	fc83f53a86	[NFC] Implement SelectionDAG::getObjectPtrOffset() using getMemBasePlusOffset() Summary: This change is preparatory work to use this helper functions in more places. In order to make this change, getMemBasePlusOffset() has been extended to also take a SDNodeFlags parameter. The motivation for this change is our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). We use a separate register type to store pointers (128-bit capabilities, which are effectively unforgeable and monotonic fat pointers). These capabilities permit a reduced set of operations and therefore use a separate ValueType (iFATPTR). to represent pointers implemented as capabilities. Therefore, we need to avoid using ISD::ADD for our patterns that operate on pointers and need to use a function that chooses ISD::ADD or a new ISD::PTRADD opcode depending on the value type. We originally added a new DAG.getPointerAdd() function, but after this patch series we can modify the implementation of getMemBasePlusOffset() instead. Avoiding direct uses of ISD::ADD for pointer types will significantly reduce the amount of assertion/instruction selection failures for us in future upstream merges. Reviewers: spatel Reviewed By: spatel Subscribers: merge_guards_bot, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71206	2019-12-13 21:40:03 +00:00
Alex Richardson	ea8888d1af	[NFC] Add a SDValue overload for SelectionDAG::getMemBasePlusOffset() Summary: This change is preparatory work to use this helper functions in more places. Currently the function only allows integer constants offsets, but there are cases where we can use an existing SDValue parameter. The motivation for this change is our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). We use a separate register type to store pointers (128-bit capabilities, which are effectively unforgeable and monotonic fat pointers). These capabilities permit a reduced set of operations and therefore use a separate ValueType (iFATPTR). to represent pointers implemented as capabilities. Therefore, we need to avoid using ISD::ADD for our patterns that operate on pointers and need to use a function that chooses ISD::ADD or a new ISD::PTRADD opcode depending on the value type. We originally added a new DAG.getPointerAdd() function, but after this patch series we can modify the implementation of getMemBasePlusOffset() instead. Avoiding direct uses of ISD::ADD for pointer types will significantly reduce the amount of assertion/instruction selection failures for us in future upstream merges. Reviewers: spatel, craig.topper Reviewed By: spatel, craig.topper Subscribers: craig.topper, merge_guards_bot, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71205	2019-12-13 21:40:03 +00:00
Alex Richardson	d9bb70acd7	[NFC] Change SelectionDAG::getMemBasePlusOffset() to use int64_t Summary: This change is preparatory work to use this helper functions in more places. Currently the function only allows positive offsets, but there are cases where we want to subtract an offset from an existing pointer. The motivation for this change is our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). We use a separate register type to store pointers (128-bit capabilities, which are effectively unforgeable and monotonic fat pointers). These capabilities permit a reduced set of operations and therefore use a separate ValueType (iFATPTR). to represent pointers implemented as capabilities. Therefore, we need to avoid using ISD::ADD for our patterns that operate on pointers and need to use a function that chooses ISD::ADD or a new ISD::PTRADD opcode depending on the value type. We originally added a new DAG.getPointerAdd() function, but after this patch series we can modify the implementation of getMemBasePlusOffset() instead. Avoiding direct uses of ISD::ADD for pointer types will significantly reduce the amount of assertion/instruction selection failures for us in future upstream merges. Reviewers: spatel Reviewed By: spatel Subscribers: merge_guards_bot, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71204	2019-12-13 21:40:03 +00:00
Sam Elliott	a0f43b0043	[RISCV] Move DebugLoc Copy into CompressInstEmitter Summary: This copy ensures that debug location information is kept for compressed instructions. There are places where both compressInstruction and uncompressInstruction are called that were not doing this copy, discarding some debug info. This change merely moves the copy into the generated file, so you cannot forget to copy the location over when compressing or uncompressing. Reviewers: asb, luismarques Reviewed By: luismarques Subscribers: sameer.abuasal, aprantl, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67493	2019-12-13 20:01:04 +00:00
Francesco Petrogalli	19f73f0d1b	Revert "[VectorUtils] Introduce the Vector Function Database (VFDatabase)." This reverts commit `0be81968a2`. The VFDatabase needs some rework to be able to handle vectorization and subsequent scalarization of intrinsics in out-of-tree versions of the compiler. For more details, see the discussion in https://reviews.llvm.org/D67572.	2019-12-13 19:42:04 +00:00
Fangrui Song	193da743db	[profile] Fix a crash when -fprofile-remapping-file= triggers an error Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D71485	2019-12-13 11:38:20 -08:00
Sanjay Patel	2f0c7fd2db	[DAGCombiner] fold shift-trunc-shift to shift-mask-trunc (2nd try) The initial attempt (rG89633320) botched the logic by reversing the source/dest types. Added x86 tests for additional coverage. The vector tests show a potential improvement (fold vector load instead of broadcasting), but that's a known/existing problem. This fold is done in IR by instcombine, and we have a special form of it already here in DAGCombiner, but we want the more general transform too: https://rise4fun.com/Alive/3jZm Name: general Pre: (C1 + zext(C2) < 64) %s = lshr i64 %x, C1 %t = trunc i64 %s to i16 %r = lshr i16 %t, C2 => %s2 = lshr i64 %x, C1 + zext(C2) %a = and i64 %s2, zext((1 << (16 - C2)) - 1) %r = trunc %a to i16 Name: special Pre: C1 == 48 %s = lshr i64 %x, C1 %t = trunc i64 %s to i16 %r = lshr i16 %t, C2 => %s2 = lshr i64 %x, C1 + zext(C2) %r = trunc %s2 to i16 ...because D58017 exposes a regression without this fold.	2019-12-13 14:03:54 -05:00
Hiroshi Yamauchi	ed50e6060b	[PGO][PGSO] Enable size optimizations in code gen / target passes for cold code. Summary: Split off of D67120. Reviewers: davidxl Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71288	2019-12-13 11:01:19 -08:00
Momchil Velikov	8e8e3181aa	[ARM] Fix in ICE when retrieving the number of micro-ops for vlldm/vlstm The big switch in `ARMBaseInstrInfo::getNumMicroOps` is missing cases for `VLLDM` and `VLSTM`, which are currently defined with itineraries having a dynamic count of micro-ops. Assuming an optimistic case in which these instruction do not actually perform loads or stores, and with the idea that Armv8-m cores are supposed to use the new style scheduling models, this patch just sets the itinerary for those two instructions to `NoItinerary`. Differential Revision: https://reviews.llvm.org/D71266	2019-12-13 18:19:40 +00:00
Momchil Velikov	d53e61863d	[AArch64] Emit PAC/BTI .note.gnu.property flags This patch make LLVM emit the processor specific program property types defined in AArch64 ELF spec https://developer.arm.com/docs/ihi0056/f/elf-for-the-arm-64-bit-architecture-aarch64-abi-2019q2-documentation A file containing no functions gets both property flags. Otherwise, a property is set iff all the functions in the file have the corresponding attribute. Patch by Daniel Kiss and Momchil Velikov. Differential Revision: https://reviews.llvm.org/D71019	2019-12-13 17:38:20 +00:00
Fangrui Song	f99eedeb72	[MC][PowerPC] Fix a crash when redefining a symbol after .set Fix PR44284. This is probably not valid assembly but we should not crash. Reviewed By: luporl, #powerpc, steven.zhang Differential Revision: https://reviews.llvm.org/D71443	2019-12-13 09:31:54 -08:00
Fangrui Song	f16377f11c	[ARM][MVE] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D71062	2019-12-13 09:26:26 -08:00
Sam Parker	84593f058b	[ARM][MVE] Make VPT invalid for tail predication We've been marking VPT incompatible instructions as invalid for tail predication too, though this may not strictly be true. VPT are incompatible and, unless its the first predicate def in a loop, they shouldn't be compatible for tail predication either. Differential Revision: https://reviews.llvm.org/D71410	2019-12-13 15:01:08 +00:00
Nicola Zaghen	97572775d2	Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered. This fixes the buildbot failures. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-13 14:30:21 +00:00
Mikhail Maltsev	99581fd4c8	[ARM][MVE] Add vector reduction intrinsics with two vector operands Summary: This patch adds intrinsics for the following MVE instructions: * VABAV * VMLADAV, VMLSDAV * VMLALDAV, VMLSLDAV * VRMLALDAVH, VRMLSLDAVH Each of the above 4 groups has a corresponding new LLVM IR intrinsic, since the instructions cannot be easily represented using general-purpose IR operations. Reviewers: simon_tatham, ostannard, dmgreen, MarkMurrayARM Reviewed By: MarkMurrayARM Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71062	2019-12-13 13:17:29 +00:00
Simon Tatham	25305a9311	[ARM][MVE] Add intrinsics for more immediate shifts. Summary: This fills in the remaining shift operations that take a single vector input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and `vshll[bt]` families. `vshll[bt]` (which shifts each input lane left into a double-width output lane) is the most interesting one. There are separate MC instruction ids for shifting by exactly the input lane width and shifting by less than that, because the instruction encoding is so completely different for the lane-width special case. So I had to write two sets of patterns to match based on the immediate shift count, which involved adding a ComplexPattern matcher to avoid the general-case pattern accidentally matching the special case too. For that family I've made sure to add an llc codegen test for both versions of each instruction. I'm experimenting with a new strategy for parametrising the isel patterns for all these instructions: adding extra fields to the relevant `Instruction` subclass itself, which are ignored by the Tablegen backends that generate the MC data, but can be retrieved from each instance of that instruction subclass when it's passed as a template parameter to the multiclass that generates its isel patterns. A nice effect of that is that I can fill in those informational fields using `let` blocks, rather than having to type them out once per instruction at `defm` time. (As a result, quite a lot of existing instruction `def`s are reindented by this patch, so it's clearer to read with whitespace changes ignored.) Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71458	2019-12-13 13:07:39 +00:00
John Brawn	01ba201abc	[ARM] Add custom strict fp conversion lowering when non-strict is custom We have custom lowering for operations converting to/from floating-point types when we don't have hardware support for those types, and this doesn't interact well with the target-independent legalization of the strict versions of these operations. Fix this by adding similar custom lowering of the strict versions. This fixes the last of the assertion failures in the CodeGen/ARM/fp-intrinsics test, with the remaining failures due to poor instruction selection. Differential Revision: https://reviews.llvm.org/D71127	2019-12-13 13:00:00 +00:00
Tim Renouf	fce1a6f584	Revert "AMDGPU: Try to commute sub of boolean ext" This reverts commit `69fcfb7d35`. As shown in the test I attached to this commit, the change I reverted causes a problem with "zext(cc1) - zext(cc2)". It commuted the operands to the sub and used different logic to select the addc/subc instruction: sub zext (setcc), x => addcarry 0, x, setcc sub sext (setcc), x => subcarry 0, x, setcc ... but that is bogus. I believe it is not possible to fold those commuted patterns into any form of addcarry or subcarry. It may have worked as intended before "AMDGPU: Change boolean content type to 0 or 1" because the setcc was considered to be -1 rather than 1. Differential Revision: https://reviews.llvm.org/D70978 Change-Id: If2139421aa6c935cbd1d925af58fe4a4aa9e8f43	2019-12-13 12:49:06 +00:00
Alex Richardson	be15dfa88f	[NFC] Use EVT instead of bool for getSetCCInverse() Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void )0x12033091e < (void )0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge. Reviewers: spatel, bogner Reviewed By: bogner Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70917	2019-12-13 12:22:03 +00:00
Sjoerd Meijer	e91420e17d	Revert "[ARM][MVE] findVCMPToFoldIntoVPS. NFC." This reverts commit `9468e3334b`. There's a test that doesn't like this change. The RDA analysis gets invalided by changes in the block, which is not taken into account. Revert while I work on a fix for this.	2019-12-13 11:56:44 +00:00
Mark Murray	228c74076d	[ARM][MVE][Intrinsics] Add _x() variants of my _m() intrinsics. Summary: Better use of multiclass is used, and this helped find some existing bugs in the predicated VMULL* intrinsics, which are now fixed. The refactored VMULL[TB]Q_(INT\|POLY)_M() intrinsics were discovered to have an argument ("inactive") with incorrect type, and this required a fix that is included in this whole patch. The argument "inactive" should have been the same width (per vector element) as the return type of the intrinsic, but was not in the case where the return type was double the element width of the input types. To assist in testing the multiclassing , and to thwart further gremlins, the unit tests are improved in scope. The .ll tests are all generated by a small bit of throw-away scripting from the corresponding .c tests, and as such the diffs are large and nasty. Look at the file rather than the diff. Reviewers: dmgreen, miyuki, ostannard, simon_tatham Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71421	2019-12-13 11:51:23 +00:00
Kerry McLaughlin	4194ca8e5a	Recommit "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores" Updated pred_load patterns added to AArch64SVEInstrInfo.td by this patch to use reg + imm non-temporal loads to fix previous test failures. Original commit message: Adds the following intrinsics: - llvm.aarch64.sve.ldnt1 - llvm.aarch64.sve.stnt1 This patch creates masked loads and stores with the MONonTemporal flag set when used with the intrinsics above.	2019-12-13 10:08:20 +00:00
David Stenberg	5c7cc6f83d	[LiveDebugValues] Omit entry values for DBG_VALUEs with pre-existing expressions Summary: This is a quickfix for PR44275. An assertion that checks that the DIExpression is valid failed due to attempting to create an entry value for an indirect parameter. This started appearing after D69028, as the indirect parameter started being represented using an DW_OP_deref, rather than with the DBG_VALUE's second operand, meaning that the isIndirectDebugValue() check in LiveDebugValues did not exclude such parameters. A DIExpression that has an entry value operation can currently not have any other operation, leading to the failed isValid() check. This patch simply makes us stop considering emitting entry values for such parameters. To support such cases I think we at least need to do the following changes: * In DIExpression::isValid(): Remove the limitation that a DW_OP_LLVM_entry_value operation can be the only operation in a DIExpression. * In LiveDebugValues::emitEntryValues(): Create an entry value of size 1, so that it only wraps the register operand, and not the whole pre-existing expression (the DW_OP_deref). * In LiveDebugValues::removeEntryValue(): Check that the new debug value has the same debug expression as the original, rather than checking that the debug expression is empty. * In DwarfExpression::addMachineRegExpression(): Modify the logic so that a DW_OP_reg* expression is emitted for the entry value. That is how GCC emits entry values for indirect parameters. That will currently not happen to due the DW_OP_deref causing the !HasComplexExpression to fail. The LocationKind needs to be changed also, rather than always emitting a DW_OP_stack_value for entry values. There are probably more things I have missed, but that could hopefully be a good starting point for emitting such entry values. Reviewers: djtodoro, aprantl, jmorse, vsk Reviewed By: aprantl, vsk Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D71416	2019-12-13 10:49:46 +01:00
Georgii Rymar	86e652f828	[yaml2obj] - Add a way to override sh_flags section field. Currently we have the `Flags` property that allows to set flags for a section. The problem is that it does not allow us to set an arbitrary value, because of bit fields validation under the hood. An arbitrary values can be used to test specific broken cases. We probably do not want to relax the validation, so this patch adds a `ShSize` property that allows to override the `sh_size`. It is inline with others `Sh*` properties we have already. Differential revision: https://reviews.llvm.org/D71411	2019-12-13 11:54:37 +03:00
Craig Topper	5c80a4f454	[LegalizeTypes] Remove unnecessary if before calling ReplaceValueWith on the chain in SoftenFloatRes_LOAD. I believe this is a leftover from when fp128 was softened to fp128 on X86-64. In that case type legalization must have been able to create a load that was the same as N which would make this replacement fail or assert. Since we no longer do that, this check should be unneeded.	2019-12-13 00:14:41 -08:00
Nikita Popov	21fbd5587c	Reapply [LVI] Normalize pointer behavior This is a rebase of the change over D70376, which fixes an LVI cache invalidation issue that also affected this patch. ----- Related to D69686. As noted there, LVI currently behaves differently for integer and pointer values: For integers, the block value is always valid inside the basic block, while for pointers it is only valid at the end of the basic block. I believe the integer behavior is the correct one, and CVP relies on it via its getConstantRange() uses. The reason for the special pointer behavior is that LVI checks whether a pointer is dereferenced in a given basic block and marks it as non-null in that case. Of course, this information is valid only after the dereferencing instruction, or in conservative approximation, at the end of the block. This patch changes the treatment of dereferencability: Instead of including it inside the block value, we instead treat it as something similar to an assume (it essentially is a non-nullness assume) and incorporate this information in intersectAssumeOrGuardBlockValueConstantRange() if the context instruction is the terminator of the basic block. This happens either when determining an edge-value internally in LVI, or when a terminator was explicitly passed to getValueAt(). The latter case makes this change not fully NFC, because we can now fold terminator icmps based on the dereferencability information in the same block. This is the reason why I changed one JumpThreading test (it would optimize the condition away without the change). Of course, we do not want to recompute dereferencability on each intersectAssume call, so we need a new cache for this. The dereferencability analysis requires walking the entire basic block and computing underlying objects of all memory operands. This was previously done separately for each queried pointer value. In the new implementation (both because this makes the caching simpler, and because it is faster), I instead only walk the full BB once and cache all the dereferenced pointers. So the traversal is now performed only once per BB, instead of once per queried pointer value. I think the overall model now makes more sense than before, and there will be no more pitfalls due to differing integer/pointer behavior. Differential Revision: https://reviews.llvm.org/D69914	2019-12-13 08:59:58 +01:00
Rui Ueyama	69da7e29de	Revert an accidental commit `af5ca40b47`	2019-12-13 15:17:40 +09:00
Rui Ueyama	af5ca40b47	temporary	2019-12-13 14:35:03 +09:00
Nate Voorhies	bc16666de4	[NFC][AArch64] Fix typo. Summary: Coaleascer should be coalescer. Reviewers: qcolombet, Jim Reviewed By: Jim Subscribers: Jim, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70731	2019-12-13 10:25:36 +08:00
Eric Christopher	a8154e5e0c	Temporarily revert "NFC: DebugInfo: Refactor RangeSpanList to be a struct, like DebugLocStream::List" as it was causing bot and build failures. This reverts commit `8e04896288`.	2019-12-12 17:55:41 -08:00
David Blaikie	8e04896288	NFC: DebugInfo: Refactor RangeSpanList to be a struct, like DebugLocStream::List Move these data structures closer together so their emission code can eventually share more of its implementation.	2019-12-12 16:53:59 -08:00
David Blaikie	20e06a28da	NFC: DebugInfo: Refactor debug_loc/loclist emission into a common function (except for v4 loclists, which are sufficiently different to not fit well in this generic implementation) In subsequent patches I intend to refactor the DebugLoc and ranges data structures to be more similar so I can common more of the implementation here.	2019-12-12 16:39:12 -08:00
Evgenii Stepanov	dabd2622a8	hwasan: add tag_offset DWARF attribute to optimized debug info Summary: Support alloca-referencing dbg.value in hwasan instrumentation. Update AsmPrinter to emit DW_AT_LLVM_tag_offset when location is in loclist format. Reviewers: pcc Subscribers: srhines, aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70753	2019-12-12 16:18:54 -08:00
Danilo Carvalho Grael	6bed43f3c4	[AArch64][SVE] Add integer arithmetic with immediate instructions. Summary: Add pattern matching for the following instructions: - add, sub, subr, sqadd, sqsub, uqadd, uqsub This patch required complex patterns to match the immediate with optinal left shift. I re-used the Select function from the other SVE repo to implement the complext pattern. I plan on doing another patch to also match constant vector of the same immediate. Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan Tags: #llvm Differential Revision: https://reviews.llvm.org/D71370	2019-12-12 17:28:46 -05:00
Johannes Doerfert	6abd01e462	[Attributor][FIX] Do treat byval arguments special When we reason about the pointer argument that is byval we actually reason about a local copy of the value passed at the call site. This was not the case before and we wrongly introduced attributes based on the surrounding function. AAMemoryBehaviorArgument, AAMemoryBehaviorCallSiteArgument and AANoCaptureCallSiteArgument are made aware of byval now. The code to skip "subsuming positions" for reasoning follows a common pattern and we should refactor it. A TODO was added. Discovered by @efriedma as part of D69748.	2019-12-12 16:04:21 -06:00
Denis Bakhvalov	7081c92241	[NFC][InstSimplify] Refactoring ThreadCmpOverSelect function Removed code duplication in ThreadCmpOverSelect and broke it into several smaller functions for reusing them. Differential Revision: https://reviews.llvm.org/D71158	2019-12-12 22:45:58 +01:00
Sanjay Patel	9432937190	Revert "[DAGCombiner] fold shift-trunc-shift to shift-mask-trunc" This reverts commit `8963332c33`. There was a logic bug typo in this code, but it wasn't visible in the asm for the tests.	2019-12-12 16:24:40 -05:00
Sanjay Patel	8963332c33	[DAGCombiner] fold shift-trunc-shift to shift-mask-trunc This fold is done in IR by instcombine, and we have a special form of it already here in DAGCombiner, but we want the more general transform too: https://rise4fun.com/Alive/3jZm Name: general Pre: (C1 + zext(C2) < 64) %s = lshr i64 %x, C1 %t = trunc i64 %s to i16 %r = lshr i16 %t, C2 => %s2 = lshr i64 %x, C1 + zext(C2) %a = and i64 %s2, zext((1 << (16 - C2)) - 1) %r = trunc %a to i16 Name: special Pre: C1 == 48 %s = lshr i64 %x, C1 %t = trunc i64 %s to i16 %r = lshr i16 %t, C2 => %s2 = lshr i64 %x, C1 + zext(C2) %r = trunc %s2 to i16 ...because D58017 exposes a regression without this fold.	2019-12-12 15:44:13 -05:00
Teresa Johnson	c8e0bb3b2c	[LTO] Support for embedding bitcode section during LTO Summary: This adds support for embedding bitcode in a binary during LTO. The libLTO gains supports the `-lto-embed-bitcode` flag. The option allows users of the LTO library to embed a bitcode section. For example, LLD can pass the option via `ld.lld -mllvm=-lto-embed-bitcode`. This feature allows doing something comparable to `clang -c -fembed-bitcode`, but on the (LTO) linker level. Having bitcode alongside native code has many use-cases. To give an example, the MacOS linker can create a `-bitcode_bundle` section containing bitcode. Also, having this feature built into LLVM is an alternative to 3rd party tools such as [[ https://github.com/travitch/whole-program-llvm \| wllvm ]] or [[ https://github.com/SRI-CSL/gllvm \| gllvm ]]. As with these tools, this feature simplifies creating "whole-program" llvm bitcode files, but in contrast to wllvm/gllvm it does not rely on a specific llvm frontend/driver. Patch by Josef Eisl <josef.eisl@oracle.com> Reviewers: #llvm, #clang, rsmith, pcc, alexshap, tejohnson Reviewed By: tejohnson Subscribers: tejohnson, mehdi_amini, inglorion, hiraditya, aheejin, steven_wu, dexonsmith, dang, cfe-commits, llvm-commits, #llvm, #clang Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68213	2019-12-12 12:34:19 -08:00
Kit Barton	61368c8e98	Rename LoopInfo::isRotated() to LoopInfo::isRotatedForm(). This patch renames the LoopInfo::isRotated() method to LoopInfo::isRotatedForm() to make it clear that the method checks whether the loop is in rotated form, not whether the loop has been rotated by the LoopRotation pass.	2019-12-12 14:22:36 -05:00
Jonas Paulsson	61f5ba5c32	[SystemZ] Implement the packed stack layout Any llvm function with the "packed-stack" attribute will be compiled to use the packed stack layout which reuses unused parts of the incoming register save area. This is needed for building the Linux kernel. Review: Ulrich Weigand https://reviews.llvm.org/D70821	2019-12-12 10:26:03 -08:00
Sanjay Patel	b39009bf1d	[DAGCombiner] improve readability This is not quite NFC because I changed the SDLoc to use the more standard 'N' (the starting node for the fold). This transform is a special-case of a more general fold that we do in IR, but it seems like the general fold is needed here too to avoid a potential regression seen in D58017. https://rise4fun.com/Alive/3jZm	2019-12-12 13:16:50 -05:00
Florian Hahn	bd12a322d7	[BasicAA] Use GEP as context for computeKnownBits in aliasGEP. In order to use assumptions, computeKnownBits needs a context instruction. We can use the GEP, if it is an instruction. We already pass the assumption cache, but it cannot be used without a context instruction. Reviewers: anemet, asbirlea, hfinkel, spatel Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D71264	2019-12-12 17:18:04 +00:00
Michael Liao	11b2b2f4b1	[amdgpu] Fix `-Wenum-compare` warning. NFC.	2019-12-12 11:44:16 -05:00
Florian Hahn	526244b187	[Matrix] Add first set of matrix intrinsics and initial lowering pass. This is the first patch adding an initial set of matrix intrinsics and a corresponding lowering pass. This has been discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html The first patch introduces four new intrinsics (transpose, multiply, columnwise load and store) and a LowerMatrixIntrinsics pass, that lowers those intrinsics to vector operations. Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix embedded in a <16 x float> vector) and the intrinsics take the dimension information as parameters. Those parameters need to be ConstantInt. For the memory layout, we initially assume column-major, but in the RFC we also described how to extend the intrinsics to support row-major as well. For the initial lowering, we split the input of the intrinsics into a set of column vectors, transform those column vectors and concatenate the result columns to a flat result vector. This allows us to lower the intrinsics without any shape propagation, as mentioned in the RFC. In follow-up patches, we plan to submit the following improvements: * Shape propagation to eliminate the embedding/splitting for each intrinsic. * Fused & tiled lowering of multiply and other operations. * Optimization remarks highlighting matrix expressions and costs. * Generate loops for operations on large matrixes. * More general block processing for operation on large vectors, exploiting shape information. We would like to add dedicated transpose, columnwise load and store intrinsics, even though they are not strictly necessary. For example, we could instead emit a large shufflevector instruction instead of the transpose. But we expect that to (1) become unwieldy for larger matrixes (even for 16x16 matrixes, the resulting shufflevector masks would be huge), (2) risk instcombine making small changes, causing us to fail to detect the transpose, preventing better lowerings For the load/store, we are additionally planning on exploiting the intrinsics for better alias analysis. Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70456	2019-12-12 15:42:18 +00:00
Sjoerd Meijer	9468e3334b	[ARM][MVE] findVCMPToFoldIntoVPS. NFC. This adds ReachingDefAnalysis (RDA) to the VPTBlock pass, so that we can reimplement findVCMPToFoldIntoVPS with just a few calls to RDA. Differential Revision: https://reviews.llvm.org/D71330	2019-12-12 15:41:20 +00:00
Guillaume Chatelet	dbc5acf8ce	[Alignment][NFC] Adding Align compatible methods to IntrinsicInst/IRBuilder Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71420	2019-12-12 16:22:15 +01:00
Tom Stellard	bf13a71095	AMDGPU/SILoadStoreOptimizer: Simplify function Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71044	2019-12-12 06:53:03 -08:00
Sam Parker	1274ac3dc2	[ARM][MVE] Sink vector shift operand Recommit `e0b966643f`. sub instructions were being generated for the negated value, and for some reason they were the register only ones. I think the problem was because I was grabbing the 'zero' from vmovimm, which is a target constant. Now I'm just generating a new Constant zero and so rsb instructions are now generated. Original commit message: The shift amount operand can be provided in a general purpose register so sink it. Flip the vdup and negate so the existing patterns can be used for matching. Differential Revision: https://reviews.llvm.org/D70841	2019-12-12 14:34:00 +00:00
James Henderson	84a9756a72	[llvm-dwarfdump] Add blank line after printing line table This helps delineate it in the output from later tables or other output. Reviewed by: JDevlieghere Differential Revision: https://reviews.llvm.org/D71344	2019-12-12 14:06:10 +00:00
Hideto Ueno	4ecf25545c	[Attributor][NFC] Fix comments and unnecessary comma	2019-12-12 13:42:40 +00:00
Hideto Ueno	827bade262	[Attributor] [NFC] Use `checkForAllUses` helpr in `AAHeapToStackImpl::updateImpl` Summary: Remove `Worklist` iteration and make use `checkForAllUses`. There is no test chage. Reviewers: sstefan1, jdoerfert Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71352	2019-12-12 13:27:53 +00:00
Hideto Ueno	63599bd072	[Attributor][NFC] Refactoring `AANoFreeArgument::updateImpl` Summary: Refactoring `AANoFreeArgument::updateImpl`. There is no test change. Reviewers: sstefan1, jdoerfert Reviewed By: sstefan1 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71349	2019-12-12 13:27:53 +00:00
stozer	e39e2b4a79	[DebugInfo] Prevent invalid fragments at ISel from dropping debug info During SelectionDAG, if a value which is associated with a DBG_VALUE needs to be split across multiple registers, the DBG_VALUE will be split into a set of fragment expressions to recreate the original value. If one or more of these fragments cannot be created, they would previously be silently dropped, causing the old debug value to live past its expiry date. This patch fixes this issue by keeping invalid fragments while setting their value as Undef. Differential revision: https://reviews.llvm.org/D70248	2019-12-12 12:28:39 +00:00
Russell Gallop	f70f180148	[Support] Try to fix bot failure after `8ddcd1dc26` http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/41755	2019-12-12 12:20:11 +00:00
Russell Gallop	8ddcd1dc26	[Support] Extend TimeProfiler to support multiple threads This makes TimeTraceProfilerInstance thread local. Added timeTraceProfilerFinishThread() which moves the thread local instance to a global vector of instances. timeTraceProfilerWrite() then writes recorded data from all instances. Threads are identified based on their thread ids. Totals are reported with artificial thread ids higher than the real ones. Replaced raw pointer for TimeTraceProfilerInstance with unique_ptr. Differential Revision: https://reviews.llvm.org/D71059	2019-12-12 12:01:44 +00:00
Mirko Brkusanin	d7357c52a4	[Mips] Add support for min/max/umin/umax atomics In order to properly implement these atomic we need one register more than other binary atomics. It is used for storing result from comparing values in addition to the one that is used for actual result of operation. https://reviews.llvm.org/D71028	2019-12-12 11:32:37 +01:00
Nicola Zaghen	f798eb21ec	Temporarily Revert "[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same." This reverts commit `5f6208778f`. This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll const: Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types don't agree!"' failed.	2019-12-12 10:29:54 +00:00
Nicola Zaghen	5f6208778f	[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-12 10:07:01 +00:00
Cullen Rhodes	bbd16b6876	[AArch64][SVE] Remove nxv1f32 and nxv1f64 as legal types Summary: Also cleans up ZPR register class definition. Reviewers: sdesmalen, cameron.mcinally, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71351	2019-12-12 09:49:22 +00:00
Puyan Lotfi	756db63af9	[NFC][llvm][MIRVRegNamerUtils] Moving methods around. Making some private. Making all externally unused methods private in MIRVRegNamerUtils.h. Moving or deleting a couple other methods around.	2019-12-12 03:32:53 -05:00
Alexey Lapshin	71aaebc824	[DWARF5][DWARFVerifier] Check that Skeleton compilation unit does not have children. That patch adds checking into DWARFVerifier that the Skeleton compilation unit does not have children. Differential Revision: https://reviews.llvm.org/D71244	2019-12-12 10:59:10 +03:00
Sam Parker	f8ff3bf55b	Revert "[ARM][MVE] Sink vector shift operand" This reverts commit `e0b966643f`. Instruction selection is failing with expensive checks.	2019-12-12 07:52:57 +00:00
Sam Parker	e0b966643f	[ARM][MVE] Sink vector shift operand The shift amount operand can be provided in a general purpose register so sink it. Flip the vdup and negate so the existing patterns can be used for matching. Differential Revision: https://reviews.llvm.org/D70841	2019-12-12 07:35:21 +00:00
Wenlei He	d275a06487	[AutoFDO] Statistic for context sensitive profile guided inlining Summary: AutoFDO compilation has two places that do inlining - the sample profile loader that does inlining with context sensitive profile, and the regular inliner as CGSCC pass. Ideally we want most inlining to come from sample profile loader as that is driven by context sensitive profile and also retains context sensitivity after inlining. However the reality is most of the inlining actually happens during regular inliner. To track the number of inline instances from sample profile loader and help move more inlining to sample profile loader, I'm adding statistics and optimization remarks for sample profile loader's inlining. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70584	2019-12-11 21:37:21 -08:00
Puyan Lotfi	f5b7a46837	[llvm][MIRVRegNamerUtils] Adding hashing on memoperands. No more hash collisions for memoperands. Now the MIRCanonicalization pass shouldn't hit hash collisions when dealing with nearly identical memory accessing instructions when their memoperands are in fact different. Differential Revision: https://reviews.llvm.org/D71328	2019-12-11 22:11:49 -05:00
Cameron McInally	7aa5c16088	[AArch64][SVE] Add patterns for scalable vselect This patch matches scalable vector selects to predicated move instructions. Differential Revision: https://reviews.llvm.org/D71298	2019-12-11 20:15:44 -06:00
Reid Kleckner	5d986953c8	[IR] Split out target specific intrinsic enums into separate headers This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320	2019-12-11 18:02:14 -08:00
Reid Kleckner	85ba5f637a	Rename TTI::getIntImmCost for instructions and intrinsics Soon Intrinsic::ID will be a plain integer, so this overload will not be possible. Rename both overloads to ensure that downstream targets observe this as a build failure instead of a runtime failure. Split off from D71320 Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D71381	2019-12-11 18:00:20 -08:00
Vedant Kumar	56232f950d	Revert "[DWARF] Allow cross-CU references of subprogram definitions" This reverts commit `30038da15b`. It causes the stage2 thinLTO bot to fail with: Assertion failed: (CU.getDIE(CalleeSP) && "Expected declaration subprogram DIE for callee") rdar://57840415	2019-12-11 15:55:48 -08:00
Sanjay Patel	cdf5cfea8e	Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()" This reverts commit `d1f0bdf2d2`. The patch can cause infinite loops in DAGCombiner.	2019-12-11 16:56:58 -05:00
Craig Topper	4b452952fe	[LegalizeTypes] In SoftenFloatRes_FP_EXTEND, move the check for input already being promoted above the check for fp16 converting to something other than fp32. The fp16 to larger than fp32 inserts an extend that need to re-legalized if fp16 is promoted. But if we check for fp16 promotion first, then we can avoid emiting the fp_extend all together.	2019-12-11 12:48:08 -08:00
Johannes Doerfert	d23c61490c	[OpenMP] Introduce the OpenMP-IR-Builder This is the initial patch for the OpenMP-IR-Builder, as discussed on the mailing list ([1] and later) and at the US Dev Meeting'19. The design is similar to D61953 but: - in a non-WIP status, with proper documentation and working. - using a OpenMPKinds.def file to manage lists of directives, runtime functions, types, ..., similar to the current Clang implementation. - restricted to handle only (simple) barriers, to implement most `#pragma omp barrier` directives and most implicit barriers. - properly hooked into Clang to be used if possible (D69922). - compatible with the remaining code generation. Parts have been extracted into D69853. The plan is to have multiple people working on moving logic from Clang here once the initial scaffolding (=this patch) landed. [1] http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim Subscribers: mgorny, hiraditya, bollu, guansong, jfb, cfe-commits, llvm-commits, penzn, ppenzin Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69785	2019-12-11 14:38:49 -06:00
Sam Clegg	881d877846	[WebAssembly] Add new `export_name` clang attribute for controlling wasm export names This is equivalent to the existing `import_name` and `import_module` attributes which control the import names in the final wasm binary produced by lld. This maps the existing This attribute currently requires a string rather than using the symbol name for a couple of reasons: 1. Avoid confusion with static and dynamic linking which is based on symbol name. Exporting a function from a wasm module using this directive is orthogonal to both static and dynamic linking. 2. Avoids name mangling. Differential Revision: https://reviews.llvm.org/D70520	2019-12-11 11:54:57 -08:00
Nikita Popov	8db5143b1a	[InstCombine] Optimize overflow check base on uadd.with.overflow result Fix for https://bugs.llvm.org/show_bug.cgi?id=40846. This adds a combine for cases where a (a + b) < a style overflow check is performed, but with a + b being the result of uadd.with.overflow, so the overflow result is also already available and we can just use it. Subsequently GVN/CSE will deduplicate the extracts. We can run into this situation if you have both a uadd.with.overflow and a manual add + overflow check in the same function (on the same operands), in which case GVN will rewrite the add to the with.overflow result and leave you with this pattern. The implementation is a bit ugly because I'm handling the various canonicalization edge cases. This does not yet handle the negated version of this pattern. Differential Revision: https://reviews.llvm.org/D58644	2019-12-11 20:52:04 +01:00
Danila Kutenin	19e83a9b4c	[ValueTracking] Pointer is known nonnull after load/store If the pointer was loaded/stored before the null check, the check is redundant and can be removed. For now the optimizers do not remove the nullptr check, see https://gcc.godbolt.org/z/H2r5GG. The patch allows to use more nonnull constraints. Also, it found one more optimization in some PowerPC test. This is my first llvm review, I am free to any comments. Differential Revision: https://reviews.llvm.org/D71177	2019-12-11 20:32:29 +01:00
Nikita Popov	b361d3bbcd	[MergeFuncs] Remove incorrect attribute copying Fix for https://bugs.llvm.org/show_bug.cgi?id=44236. This code was originally introduced in rG36512330041201e10f5429361bbd79b1afac1ea1. However, the attribute copying was done in the wrong place (in general call replacement, not thunk generation) and a proper fix was implemented in D12581. Previously this code was just unnecessary but harmless (because FunctionComparator ensured that the attributes of the two functions are exactly the same), but since byval was changed to accept a type this copying is actively wrong and may result in malformed IR. Differential Revision: https://reviews.llvm.org/D71173	2019-12-11 20:09:54 +01:00
Fangrui Song	25e21a09b3	Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D65958 and D70450	2019-12-11 11:04:03 -08:00
Andrzej Warzynski	a75463c471	Add intrinsics for unary narrowing operations Summary: The following intrinsics for unary narrowing operations are added: * @llvm.aarch64.sve.sqxtnb * @llvm.aarch64.sve.uqxtnb * @llvm.aarch64.sve.sqxtunb * @llvm.aarch64.sve.sqxtnt * @llvm.aarch64.sve.uqxtnt * @llvm.aarch64.sve.sqxtunt Reviewers: sdesmalen, rengolin, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71270	2019-12-11 18:55:51 +00:00
Florian Hahn	2675a3c880	[AArch64] Be more careful to skip debug operands in LdSt Optimizier. This fixes crashes with $noreg operands.	2019-12-11 18:47:45 +00:00
Sanjay Patel	d1f0bdf2d2	[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression() This is an alternate fix for the bug discussed in D70595. This also includes minimal tests for other in-tree targets to show the problem more generally. We check the number of uses as a predicate for whether some value is free to negate, but that use count can change as we rewrite the expression in getNegatedExpression(). So something that was marked free to negate during the cost evaluation phase becomes not free to negate during the rewrite phase (or the inverse - something that was not free becomes free). This can lead to a crash/assert because we expect that everything in an expression that is negatible to be handled in the corresponding code within getNegatedExpression(). This patch skips the use check during the rewrite phase. So we determine that some expression isNegatibleForFree (identically to without this patch), but during the rewrite, don't rely on use counts to decide how to create the optimal expression. Differential Revision: https://reviews.llvm.org/D70975	2019-12-11 13:30:39 -05:00
Bardia Mahjour	916d37a2bc	[DA] Improve dump to show source and sink of the dependence Summary: The current da printer shows the dependence without indicating which instructions are being considered as the src vs dst. It also silently ignores call instructions, despite the fact that they create confused dependence edges to other memory instructions. This patch addresses these two issues plus a couple of minor non-functional improvements. Authored By: bmahjour Reviewer: dmgreen, fhahn, philip.pfaffe, chandlerc Reviewed By: dmgreen, fhahn Tags: #llvm Differential Revision: https://reviews.llvm.org/D71088	2019-12-11 11:48:16 -05:00
Florian Hahn	4fe92abceb	[AArch64] Skip debug ops with regsOverlap in AArch64 LD/ST opt. This fixes a crash when debug instructions are in between 2 stores.	2019-12-11 16:26:31 +00:00
Craig Topper	3adc819b7a	[X86] Erase dead LEA instruction after converting it to MOV in FixupLEAPass::processInstrForSlow3OpLEA.	2019-12-11 07:51:23 -08:00
Ulrich Weigand	ac473394ff	[SystemZ] Fix 128-bit strict FMA expansion pre-z14 Before z14, we did not have any FMA instruction for 128-bit floating-point, so the @llvm.fma.f128 intrinsic needs to be expanded to a libcall on those platforms. This worked correctly for regular FMA, but was implemented incorrectly for the strict version. This was not noticed because we did not have test coverage for this case. This patch fixes that incorrect expansion and adds the missing test cases.	2019-12-11 16:32:08 +01:00
Kit Barton	942c9946cc	[Loop] Add isRotated method to Loop class. Summary: This patch adds a method to determine if a loop is in rotated form (the latch is an exiting block). It also modifies the getLoopGuardBranch method to use this new method. This method can also be used in Loopfusion. Once this patch lands I will make the corresponding changes there. Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel Reviewed By: Meinersbur Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65958	2019-12-11 09:43:10 -05:00
Russell Gallop	df494f7512	[Support] Add TimeTraceScope constructor without detail arg This simplifies code where no extra details are required Also don't write out detail when it is empty. Differential Revision: https://reviews.llvm.org/D71347	2019-12-11 14:32:21 +00:00
Matt Arsenault	49d731b5e0	Verifier: Check frame-pointer attribute values There are a few places that check specific string attributes have particular values, and assert if they are something else. The verifier should catch these kinds of cases.	2019-12-11 19:53:49 +05:30
Kerry McLaughlin	c0a3ab3655	Revert "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores" This reverts commit `3f5bf35f86` as it was causing build failures in llvm-clang-x86_64-expensive-checks: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392 http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045	2019-12-11 13:58:39 +00:00
Florian Hahn	17554b8961	[AArch64] Teach Load/Store optimizier to rename store operands for pairing. In some cases, we can rename a store operand, in order to enable pairing of stores. For store pairs, that cannot be merged because the first tored register is defined in between the second store, we try to find suitable rename register. First, we check if we can rename the given register: 1. The first store register must be killed at the store, which means we do not have to rename instructions after the first store. 2. We scan backwards from the first store, to find the definition of the stored register and check all uses in between are renamable. Along they way, we collect the minimal register classes of the uses for overlapping (sub/super)registers. Second, we try to find an available register from the minimal physical register class of the original register. A suitable register must not be 1. defined before FirstMI 2. between the previous definition of the register to rename 3. a callee saved register. We use KILL flags to clear defined registers while scanning from the beginning to the end of the block. This triggers quite often, here are the top changes for MultiSource, SPEC2000, SPEC2006 compiled with -O3 for iOS: Metric: aarch64-ldst-opt.NumPairCreated Program base patch diff test-suite...nch/fourinarow/fourinarow.test 2.00 39.00 1850.0% test-suite...s/ASC_Sequoia/IRSmk/IRSmk.test 46.00 80.00 73.9% test-suite...chmarks/Olden/power/power.test 70.00 96.00 37.1% test-suite...cations/hexxagon/hexxagon.test 29.00 39.00 34.5% test-suite...nchmarks/McCat/05-eks/eks.test 100.00 132.00 32.0% test-suite.../Trimaran/enc-rc4/enc-rc4.test 46.00 59.00 28.3% test-suite...T2006/473.astar/473.astar.test 160.00 200.00 25.0% test-suite.../Trimaran/enc-md5/enc-md5.test 8.00 10.00 25.0% test-suite...telecomm-gsm/telecomm-gsm.test 113.00 139.00 23.0% test-suite...ediabench/gsm/toast/toast.test 113.00 139.00 23.0% test-suite...Source/Benchmarks/sim/sim.test 91.00 111.00 22.0% test-suite...C/CFP2000/179.art/179.art.test 41.00 49.00 19.5% test-suite...peg2/mpeg2dec/mpeg2decode.test 245.00 279.00 13.9% test-suite...marks/Olden/health/health.test 16.00 18.00 12.5% test-suite...ks/Prolangs-C/cdecl/cdecl.test 90.00 101.00 12.2% test-suite...fice-ispell/office-ispell.test 91.00 100.00 9.9% test-suite...oxyApps-C/miniGMG/miniGMG.test 430.00 465.00 8.1% test-suite...lowfish/security-blowfish.test 39.00 42.00 7.7% test-suite.../Applications/spiff/spiff.test 42.00 45.00 7.1% test-suite...arks/mafft/pairlocalalign.test 2473.00 2646.00 7.0% test-suite.../VersaBench/ecbdes/ecbdes.test 29.00 31.00 6.9% test-suite...nch/beamformer/beamformer.test 220.00 235.00 6.8% test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2252.00 6.7% test-suite...ve-susan/automotive-susan.test 109.00 116.00 6.4% test-suite...s-C/unix-smail/unix-smail.test 65.00 69.00 6.2% test-suite...CI_Purple/SMG2000/smg2000.test 1194.00 1265.00 5.9% test-suite.../Benchmarks/nbench/nbench.test 472.00 500.00 5.9% test-suite...oxyApps-C/miniAMR/miniAMR.test 248.00 262.00 5.6% test-suite...quoia/CrystalMk/CrystalMk.test 18.00 19.00 5.6% test-suite...rks/tramp3d-v4/tramp3d-v4.test 7331.00 7710.00 5.2% test-suite.../Benchmarks/Bullet/bullet.test 5651.00 5938.00 5.1% test-suite...ternal/HMMER/hmmcalibrate.test 750.00 788.00 5.1% test-suite...T2006/456.hmmer/456.hmmer.test 764.00 802.00 5.0% test-suite...ications/JM/ldecod/ldecod.test 1028.00 1079.00 5.0% test-suite...CFP2006/444.namd/444.namd.test 1368.00 1434.00 4.8% test-suite...marks/7zip/7zip-benchmark.test 4471.00 4685.00 4.8% test-suite...6/464.h264ref/464.h264ref.test 3122.00 3271.00 4.8% test-suite...pplications/oggenc/oggenc.test 1497.00 1565.00 4.5% test-suite...T2000/300.twolf/300.twolf.test 742.00 774.00 4.3% test-suite.../Prolangs-C/loader/loader.test 24.00 25.00 4.2% test-suite...0.perlbench/400.perlbench.test 1983.00 2058.00 3.8% test-suite...ications/JM/lencod/lencod.test 4612.00 4785.00 3.8% test-suite...yApps-C++/PENNANT/PENNANT.test 995.00 1032.00 3.7% test-suite...arks/VersaBench/dbms/dbms.test 54.00 56.00 3.7% Reviewers: efriedma, thegameg, samparker, dmgreen, paquette, evandro Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D70450	2019-12-11 13:50:11 +00:00
Guillaume Chatelet	0a0d54b357	[Alignment][NFC] Introduce Align in IRBuilder Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71343	2019-12-11 14:41:23 +01:00
James Henderson	2f8155023a	[DebugInfo] Fix printing of DW_LNS_set_isa The Isa register is a uint8_t, but at least on Windows this is internally an unsigned char, which meant that prior to this patch it got formatted as an ASCII character, rather than a decimal number. This patch fixes this by casting it to a uint64_t before printing. I did it this way instead of using a uint8_t formatter because a) it is simpler, and b) it allows us to change the internal type of Isa in the future without this code breaking. I also took the opportunity to test the printing of the other standard opcodes. Reviewed by: probinson Differential Revision: https://reviews.llvm.org/D71274	2019-12-11 13:38:41 +00:00
Guillaume Chatelet	3491109587	Rollback assumeAligned in MemorySanitizer Summary: Rollback of parts of D71213. After digging more into the code I think we should leave 0 when creating the instructions (CreateMemcpy, CreateMaskedStore, CreateMaskedLoad). It's probably fine for MemorySanitizer because Alignement is resolved but I'm having a hard time convincing myself it has no impact at all (although tests are passing). Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71332	2019-12-11 14:25:21 +01:00
Andrzej Warzynski	65651f197a	[AArch64][SVE] Add DAG combine rules for gather loads and sext/zext Summary: These changes allow us to support sign-extending gather loads with the exisiting intrinsics (i.e. @llvm.aarch64.sve.ld1.gather.*). Reviewers: sdesmalen, huntergr, kmclaughlin, efriedma, rengolin, rovka, dancgr, mgudim Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential revision: https://reviews.llvm.org/D70812	2019-12-11 12:56:18 +00:00
Oliver Stannard	6ae3d310bd	Revert "Reland [AArch64][MachineOutliner] Return address signing for outlined functions" This reverts commit `cec2d5c174`. Reverting because this is still creating outlined functions with return address signing instructions with mismatches SP values. For example: int *volatile v; void foo(int x) { int a[x]; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; } void bar(int x) { int a[x]; v = 0; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; v = &a[0]; } This generates these two outlined functions, both of which modify SP between the paciasp and retaa instructions: $ clang --target=aarch64-arm-none-eabi -march=armv8.3-a -c test2.c -o - -S -Oz -mbranch-protection=pac-ret+leaf ... OUTLINED_FUNCTION_0: // @OUTLINED_FUNCTION_0 .cfi_sections .debug_frame .cfi_startproc // %bb.0: paciasp .cfi_negate_ra_state mov w8, w0 lsl x8, x8, #2 add x8, x8, #15 // =15 mov x9, sp and x8, x8, #0x7fffffff0 sub x8, x9, x8 mov x29, sp mov sp, x8 adrp x9, v retaa ... OUTLINED_FUNCTION_1: // @OUTLINED_FUNCTION_1 .cfi_startproc // %bb.0: paciasp .cfi_negate_ra_state str x8, [x9, :lo12:v] str x8, [x9, :lo12:v] str x8, [x9, :lo12:v] str x8, [x9, :lo12:v] str x8, [x9, :lo12:v] mov sp, x29 retaa	2019-12-11 12:06:20 +00:00
Simon Tatham	1fed9a0c0c	[TableGen] Add bang-operators !getop and !setop. Summary: These allow you to get and set the operator of a dag node, without affecting its list of arguments. `!getop` is slightly fiddly because in many contexts you need its return value to have a static type more specific than 'any record'. It works to say `!cast<BaseClass>(!getop(...))`, but it's cumbersome, so I made `!getop` take an optional type suffix itself, so that can be written as the shorter `!getop<BaseClass>(...)`. Reviewers: hfinkel, nhaehnle Reviewed By: nhaehnle Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71191	2019-12-11 12:05:22 +00:00
Kerry McLaughlin	3f5bf35f86	[AArch64][SVE] Implement intrinsics for non-temporal loads & stores Summary: Adds the following intrinsics: - llvm.aarch64.sve.ldnt1 - llvm.aarch64.sve.stnt1 This patch creates masked loads and stores with the MONonTemporal flag set when used with the intrinsics above. Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71000	2019-12-11 11:13:51 +00:00
Sjoerd Meijer	d97cf1f889	[ARM][LowOverheadLoops] Remove dead loop update instructions. After creating a low-overhead loop, the loop update instruction was still lingering around hurting performance. This removes dead loop update instructions, which in our case are mostly SUBS instructions. To support this, some helper functions were added to MachineLoopUtils and ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses before a particular loop instruction, respectively. This is a first version that removes a SUBS instruction when there are no other uses inside and outside the loop block, but there are some more interesting cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which shows that there is room for improvement. For example, we can't handle this case yet: .. dlstp.32 lr, r2 .LBB0_1: mov r3, r2 subs r2, #4 vldrh.u32 q2, [r1], #8 vmov q1, q0 vmla.u32 q0, q2, r0 letp lr, .LBB0_1 @ %bb.2: vctp.32 r3 .. which is a lot more tricky because r2 is not only used by the subs, but also by the mov to r3, which is used outside the low-overhead loop by the vctp instruction, and that requires a bit of a different approach, and I will follow up on this. Differential Revision: https://reviews.llvm.org/D71007	2019-12-11 10:20:19 +00:00
Simon Tatham	bd0f271c9e	[ARM][MVE] Add intrinsics for immediate shifts. (reland) This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which shift every lane of a vector left or right by a compile-time immediate. They mostly work by expanding to the IR `shl`, `lshr` and `ashr` operations, with their second operand being a vector splat of the immediate. There's a fiddly special case, though. ACLE specifies that the immediate in `vshrq_n` can take values up to //and including// the bit size of the vector lane. But LLVM IR thinks that shifting right by the full size of the lane is UB, and feels free to replace the `lshr` with an `undef` half way through the optimization pipeline. Hence, to keep this legal in source code, I have to detect it at codegen time. Logical (unsigned) right shifts by the element size are handled by simply emitting the zero vector; arithmetic ones are converted into a shift of one bit less, which will always give the same output. In order to do that check, I also had to enhance the tablegen MveEmitter so that it can cope with converting a builtin function's operand into a bare integer to pass to a code-generating subfunction. Previously the only bare integers it knew how to handle were flags generated from within `arm_mve.td`. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: dmgreen, MarkMurrayARM Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71065	2019-12-11 10:10:09 +00:00
Sam Parker	ee7579409b	[ARM][TypePromotion] Enable by default Enable the TypePromotion pass my default (again). This patch was originally committed in `393dacacf7`. This patch was reverted in `a38396939c`. Differential Revision: https://reviews.llvm.org/D70998	2019-12-11 10:00:16 +00:00
QingShan Zhang	eba7cbd3d0	[NFC][PowerPC] Remove the dead conditions in the if(cond)	2019-12-11 09:57:06 +00:00
shkzhang	1408e7e175	[PowerPC] [CodeGen] Use MachineBranchProbabilityInfo in EarlyIfPredicator to avoid the potential bug Summary: In the function `EarlyIfPredicator::shouldConvertIf()`, we call `TII->isProfitableToIfCvt()` with `BranchProbability::getUnknown()`, it may cause the potential assertion error for those hook which use `BranchProbability` in `isProfitableToIfCvt()`, for example `SystemZ`. `SystemZ` use `Probability < BranchProbability(1, 8))` in the function `SystemZInstrInfo::isProfitableToIfCvt()`, if we call this function with `BranchProbability::getUnknown()`, it will cause assertion error. This patch is to fix the potential bug. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D71273	2019-12-11 04:46:00 -05:00
Florian Hahn	11f311875f	[LiveRegUnits] Add phys_regs_and_masks iterator range (NFC). This iterator range just includes physical registers and register masks, which are interesting when dealing with register liveness. Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D70562	2019-12-11 09:34:42 +00:00
Guillaume Chatelet	8a7c52bc22	[Alignment][NFC] Introduce Align in SROA Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71277	2019-12-11 09:34:38 +01:00
QingShan Zhang	f99297176c	[PowerPC] Exploitate the Vector Integer Average Instructions PowerPC has instruction to do the semantics of this piece of code: vector int foo(vector int m, vector int n) { return (m + n + 1) >> 1; } This patch is adding the match rule to select it. Differential Revision: https://reviews.llvm.org/D71002	2019-12-11 07:25:57 +00:00
Craig Topper	d4345636e6	[LegalizeTypes] Remove manual worklist management from SoftenFloatRes_FP_EXTEND. I think this is no longer needed. The system should take care of legalizing any new nodes that are added. I think this might have been needed prior to r371709 or r307053.	2019-12-10 22:33:31 -08:00
Nico Weber	caa4120906	Revert "[DebugInfo] Refactored macro related generation, added a test case for macinfo.dwo emission." This reverts commit `307f60a1a3`. DebugInfo/X86/debug-macinfo-split-dwarf.ll fails on Windows: Command Output (stdout): -- $ ":" "RUN: at line 1" $ "c:\src\llvm-project\out\gn\bin\llc.exe" "-mtriple=x86_64-pc-windows-gnu" "-O0" "-split-dwarf-file=foo.dwo" "-filetype=obj" Assertion failed: Section && "Cannot switch to a null section!", file ../../llvm/lib/MC/MCStreamer.cpp, line 1103 Stack dump: 0. Program arguments: c:\src\llvm-project\out\gn\bin\llc.exe -mtriple=x86_64-pc-windows-gnu -O0 -split-dwarf-file=foo.dwo -filetype=obj	2019-12-10 21:32:30 -05:00
Craig Topper	935d41e4bd	[X86] Split v64i1 arguments into 2 v32i1s that will be promoted to v32i8 under min-legal-vector-width=256 This is an improvement to `88dacbd436`	2019-12-10 17:29:02 -08:00
Puyan Lotfi	f364686f34	[llvm][MIRVRegNamerUtil] Adding hashing against MachineInstr flags. Now, flags will result in differing hashes for a given MI. In effect, if you have two instructions with everything identical except for their flags then you should get two different hashes and fewer collisions. Differential Revision: https://reviews.llvm.org/D70479	2019-12-10 20:16:14 -05:00
Wang, Pengfei	21bc8631fe	[FPEnv][X86] Constrained FCmp intrinsics enabling on X86 Summary: This is a follow up of D69281, it enables the X86 backend support for the FP comparision. Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3 Tags: #llvm Differential Revision: https://reviews.llvm.org/D70582	2019-12-11 08:23:09 +08:00
Vlad Tsyrklevich	636c93ed11	Revert "Reapply: [DebugInfo] Recover debug intrinsics when killing duplicated/empty..." This reverts commit `f2ba93971c`, it was causing build timeouts on sanitizer-x86_64-linux-autoconf such as http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/44917	2019-12-10 16:03:17 -08:00
Craig Topper	88dacbd436	[X86] Go back to considering v64i1 as a legal type under min-legal-vector-width=256. Scalarize v64i1 arguments and shuffles under min-legal-vector-width=256. This reverts `3e1aee2ba7` in favor of a different approach. Scalarizing isn't great codegen, but making the type illegal was interfering with k constraint in inline assembly.	2019-12-10 15:07:55 -08:00
Sanjay Patel	16e9315685	[IR] allow undefined elements when checking for splat constants This mimics the related call in SDAG. The caller is responsible for ensuring that undef values are propagated safely.	2019-12-10 17:16:59 -05:00
David Blaikie	4ffd3f44e3	DebugInfo: Clarify some more reasons v4 loc.dwo can't share much implementation with loclists.dwo	2019-12-10 14:11:03 -08:00
Vedant Kumar	30038da15b	[DWARF] Allow cross-CU references of subprogram definitions This allows a call site tag in CU A to reference a callee DIE in CU B without resorting to creating an incomplete duplicate DIE for the callee inside of CU A. We already allow cross-CU references of subprogram declarations, so it doesn't seem like definitions ought to be special. This improves entry value evaluation and tail call frame synthesis in the LTO setting. During LTO, it's common for cross-module inlining to produce a call in some CU A where the callee resides in a different CU, and there is no declaration subprogram for the callee anywhere. In this case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram in order to fill in the call site tag. That empty 'definition' defeats entry value evaluation etc., because the debugger can't figure out what it means. As a follow-up, maybe we could add a DWARF verifier check that a DW_TAG_subprogram at least has a DW_AT_name attribute. rdar://46577651 Differential Revision: https://reviews.llvm.org/D70350	2019-12-10 14:00:57 -08:00
Sourabh Singh Tomar	307f60a1a3	[DebugInfo] Refactored macro related generation, added a test case for macinfo.dwo emission. Reviewers: dblaikie, aprantl, jini.susan.george Tags: #debug-info #llvm Differential Revision: https://reviews.llvm.org/D71008	2019-12-11 02:19:27 +05:30
Sourabh Singh Tomar	fb4d8fe1a8	Recommit "[DWARF5] Start emitting DW_AT_dwo_name when -gdwarf-5 is specified." Reviewers: dblaikie, aprantl, probinson Tags: #debug-info #llvm Differential Revision: https://reviews.llvm.org/D71185	2019-12-11 01:24:50 +05:30
Sourabh Singh Tomar	d82b6ba21b	Revert "[DWARF5] Start emitting DW_AT_dwo_name when -gdwarf-5 is specified." This reverts commit `6ef01588f4`. Missing Differetial revision.	2019-12-11 01:20:40 +05:30
Sourabh Singh Tomar	6ef01588f4	[DWARF5] Start emitting DW_AT_dwo_name when -gdwarf-5 is specified.	2019-12-11 01:18:02 +05:30
Yonghong Song	7d0e8930ed	[BPF] put not-section-attribute externs into BTF ".extern" data section Currently for extern variables with section attribute, those BTF_KIND_VARs will not be placed in any DataSec. This is inconvenient as any other generated BTF_KIND_VAR belongs to one DataSec. This patch put these extern variables into ".extern" section so bpf loader can have a consistent processing mechanism for all data sections and variables.	2019-12-10 11:45:17 -08:00
Hans Wennborg	49da20ddb4	Revert `30e8f80fd5` "[DebugInfo] Don't create multiple DBG_VALUEs when sinking" This caused non-determinism in the compiler, see command on the Phabricator code review. > This patch addresses a performance problem reported in PR43855, and > present in the reapplication in in 001574938e5. It turns out that > MachineSink will (often) move instructions to the first block that > post-dominates the current block, and then try to sink further. This > means if we have a lot of conditionals, we can needlessly create large > numbers of DBG_VALUEs, one in each block the sunk instruction passes > through. > > To fix this, rather than immediately sinking DBG_VALUEs, record them in > a pass structure. When sinking is complete and instructions won't be > sunk any further, new DBG_VALUEs are added, avoiding lots of > intermediate DBG_VALUE $noregs being created. > > Differential revision: https://reviews.llvm.org/D70676	2019-12-10 19:20:11 +01:00
Simon Cook	a6e50e40e6	[RISCV] Improve assembler missing feature warnings This adds support for printing improved missing feature error messages from the assembler, which now indicates which feature caused the parse to fail. Differential Revision: https://reviews.llvm.org/D69899	2019-12-10 16:44:48 +00:00
Francesco Petrogalli	0be81968a2	[VectorUtils] Introduce the Vector Function Database (VFDatabase). This patch introduced the VFDatabase, the framework proposed in http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html. [] In this patch the VFDatabase is used to bridge the TargetLibraryInfo (TLI) calls that were previously used to query for the availability of vector counterparts of scalar functions. The VFISAKind field `ISA` of VFShape have been moved into into VFInfo, under the assumption that different vector ISAs may provide the same vector signature. At the moment, the vectorizer accepts any of the available ISAs as long as the signature provided by the VFDatabase matches the one expected in the vectorization process. For example, when targeting AVX or AVX2, which both have 256-bit registers, the IR signature of the two vector functions associated to the two ISAs is the same. The `getVectorizedFunction` method at the moment returns the first available match. We will need to add more heuristics to the search system to decide which of the available version (TLI, AVX, AVX2, ...) the system should prefer, when multiple versions with the same VFShape are present. Some of the code in this patch is based on the work done by Sumedh Arani in https://reviews.llvm.org/D66025. [] Notice that in the proposal the VFDatabase was called SVFS. The name VFDatabase is more in line with LLVM recommendations for naming classes and variables. Differential Revision: https://reviews.llvm.org/D67572	2019-12-10 16:36:44 +00:00
Mikhail Maltsev	e6d3261c67	[ARM][MVE] Refactor complex vector intrinsics [NFCI] Summary: This patch refactors instruction selection of the complex vector addition, multiplication and multiply-add intrinsics, so that it is now based on TableGen patterns rather than C++ code. It also changes the first parameter (halving vs non-halving) of the arm_mve_vcaddq IR intrinsic to match the corresponding instruction encoding, hence it requires some changes in the tests. The patch addresses David's comment in https://reviews.llvm.org/D71190 Reviewers: dmgreen, ostannard, simon_tatham, MarkMurrayARM Reviewed By: dmgreen Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71245	2019-12-10 16:21:52 +00:00
diggerlin	98f5f022f0	[BUG-FIX][XCOFF] fixed a bug of XCOFFObjectFile.cpp when there is padding at the last csect of a sections SUMMARY: Fixed a bug of XCOFFObjectFile.cpp when there is padding at the last csect of a sections. when there is a tail padding of a section, but the value of CurrentAddressLocation do not be increased by the padding size. it will hit assert assert(CurrentAddressLocation == Section->Address && "We should have no padding between sections."); Reviewers: daltenty,hubert.reinterpretcast, Differential Revision: https://reviews.llvm.org/D70859	2019-12-10 11:14:49 -05:00
Yonghong Song	d77ae1552f	[DebugInfo] Support to emit debugInfo for extern variables Extern variable usage in BPF is different from traditional pure user space application. Recent discussion in linux bpf mailing list has two use cases where debug info types are required to use extern variables: - extern types are required to have a suitable interface in libbpf (bpf loader) to provide kernel config parameters to bpf programs. https://lore.kernel.org/bpf/CAEf4BzYCNo5GeVGMhp3fhysQ=_axAf=23PtwaZs-yAyafmXC9g@mail.gmail.com/T/#t - extern types are required so kernel bpf verifier can verify program which uses external functions more precisely. This will make later link with actual external function no need to reverify. https://lore.kernel.org/bpf/87eez4odqp.fsf@toke.dk/T/#m8d5c3e87ffe7f2764e02d722cb0d8cbc136880ed This patch added clang support to emit debuginfo for extern variables with a TargetInfo hook to enable it. The debuginfo for the extern variable is emitted only if that extern variable is referenced in the current compilation unit. Currently, only BPF target enables to generate debug info for extern variables. The emission of such debuginfo is disabled for C++ at this moment since BPF only supports a subset of C language. Emission with C++ can be enabled later if an appropriate use case is identified. -fstandalone-debug permits us to see more debuginfo with the cost of bloated binary size. This patch did not add emission of extern variable debug info with -fstandalone-debug. This can be re-evaluated if there is a real need. Differential Revision: https://reviews.llvm.org/D70696	2019-12-10 08:09:51 -08:00
Sanjay Patel	396d18aeb6	[InstCombine] replace shuffle's insertelement operand if inserted scalar is not demanded This pattern is noted as a regression from: D70246 ...where we removed an over-aggressive shuffle simplification. SimplifyDemandedVectorElts fails to catch this case when the insert has multiple uses, so I'm proposing to pattern match the minimal sequence directly. This fold does not conflict with any of our current shuffle undef/poison semantics. Differential Revision: https://reviews.llvm.org/D71220	2019-12-10 10:10:05 -05:00
Guillaume Chatelet	1b2842bf90	[Alignment][NFC] CreateMemSet use MaybeAlign Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71213	2019-12-10 15:17:44 +01:00
stozer	f2ba93971c	Reapply: [DebugInfo] Recover debug intrinsics when killing duplicated/empty... basic blocks Originally applied in `72ce759928`. Fixed a build failure caused by incorrect use of cast instead of dyn_cast. This reverts commit `8b0780f795`.	2019-12-10 13:33:32 +00:00
Sam Parker	933de40729	[TypePromotion] Query target register width TargetLoweringInfo may report that an integer should be promoted, but it maybe provide a size that isn't natively supported by the target register file... So check this before trying to perform a promotion. This is to fix some chromium issues: https://bugs.chromium.org/p/chromium/issues/detail?id=1031978 https://bugs.chromium.org/p/chromium/issues/detail?id=1031979 Differential Revision: https://reviews.llvm.org/D71200	2019-12-10 13:23:00 +00:00
Kiran Chandramohan	965ed1e974	[AArch64] Fix issues with large arrays on stack Summary: This patch fixes a few issues when large arrays are allocated on the stack. Currently, clang has inconsistent behaviour, for debug builds there is an assertion failure when the array size on stack is around 2GB but there is no assertion when the stack is around 8GB. For release builds there is no assertion, the compilation succeeds but generates incorrect code. The incorrect code generated is due to using int/unsigned int instead of their 64-bit counterparts. This patch, 1) Removes the assertion in frame legality check. 2) Converts int/unsigned int in some places to the 64-bit variants. This helps in generating correct code and removes the inconsistent behaviour. 3) Adds a test which runs without optimisations. Reviewers: sdesmalen, efriedma, fhahn, aemerson Reviewed By: efriedma Subscribers: eli.friedman, fpetrogalli, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70496	2019-12-10 11:44:41 +00:00
Simon Tatham	0e894edee1	[TableGen] Permit dag operators to be unset. This is not a new semantic feature. The syntax `(? 1, 2, 3)` was disallowed by the parser in a dag //expression//, but there were already ways to sneak a `?` into the operator field of a dag //value//, e.g. by initializing it from a class template parameter which is then set to `?` by the instantiating `def`. This patch makes `?` in the operator slot syntactically legal, so it's now easy to construct dags with an unset operator. Also, the semantics of `!con` are relaxed so that it will allow a combination of set and unset operator fields in the dag nodes it's concatenating, with the restriction that all the operators that are //not// unset still have to agree with each other. Reviewers: hfinkel, nhaehnle Reviewed By: hfinkel, nhaehnle Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71195	2019-12-10 11:09:40 +00:00
Cullen Rhodes	1b9a608c84	[AArch64][SVE] Add wide compare immediate patterns Summary: Recognize wide compares where the wide operand is a splat of a scalar value in the appropriate range and convert to the immediate variant of the instruction. Patch by Graham Hunter Reviewers: sdesmalen, efriedma, dancgr, rovka, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71009	2019-12-10 10:41:22 +00:00
Mikael Holmen	4763267eee	[LegalizeTypes] Bugfixes for big-endian targets when handling BITCASTs Summary: This fixes PR44135. The special case when we promote a bitcast from a vector to an int needs special handling when we are on a big-endian target. Prior to this fix, for the added vec_to_int we see the following in the SelectionDAG printouts Type-legalized selection DAG: %bb.1 'foo:bb.1' SelectionDAG has 9 nodes: t0: ch = EntryToken t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0 t17: v4i32 = bitcast t2 t23: i32 = extract_vector_elt t17, Constant:i32<3> t8: ch,glue = CopyToReg t0, Register:i32 $r0, t23 t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1 and I think here the extract_vector_elt is wrong and extracts the value from the wrong index. The program program should return the 32 bits made up of the elements at index 4 and 5 in the vec6 array, but with t23: i32 = extract_vector_elt t17, Constant:i32<3> as far as I can tell, we will extract values that originally didn't even exist in the vec6 vectore. If we would instead extract the element at index 2 we would get the wanted values. With this fix we insert a right shift after the bitcast in DAGTypeLegalizer::PromoteIntRes_BITCAST which then gives us Type-legalized selection DAG: %bb.1 'vec_to_int:bb.1' SelectionDAG has 9 nodes: t0: ch = EntryToken t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0 t23: v4i32 = bitcast t2 t27: i32 = extract_vector_elt t23, Constant:i32<2> t8: ch,glue = CopyToReg t0, Register:i32 $r0, t27 t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1 So now we get t27: i32 = extract_vector_elt t23, Constant:i32<2> which is what we want. Similarly, the new int_to_vec testcase exposes a bug where we cast the other direction. Then we instead need to add a left shift before the bitcast on big-endian targets for the bits in the input integer to end up at the exptected place in the vector. Reviewers: bogner, spatel, craig.topper, t.p.northover, dmgreen, efriedma, SjoerdMeijer, samparker Reviewed By: efriedma Subscribers: eli.friedman, bjope, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70942	2019-12-10 11:22:35 +01:00
Johannes Doerfert	eb3e81f43f	[OpenMP][NFCI] Introduce llvm/IR/OpenMPConstants.h Summary: The new OpenMPConstants.h is a location for all OpenMP related constants (and helpers) to live. This patch moves the directives there (the enum OpenMPDirectiveKind) and rewires Clang to use the new location. Initially part of D69785. Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim Subscribers: jholewinski, ppenzin, penzn, llvm-commits, cfe-commits, jfb, guansong, bollu, hiraditya, mgorny Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69853	2019-12-10 00:10:09 -06:00
Yonghong Song	4448125007	[BPF] Support to emit debugInfo for extern variables extern variable usage in BPF is different from traditional pure user space application. Recent discussion in linux bpf mailing list has two use cases where debug info types are required to use extern variables: - extern types are required to have a suitable interface in libbpf (bpf loader) to provide kernel config parameters to bpf programs. https://lore.kernel.org/bpf/CAEf4BzYCNo5GeVGMhp3fhysQ=_axAf=23PtwaZs-yAyafmXC9g@mail.gmail.com/T/#t - extern types are required so kernel bpf verifier can verify program which uses external functions more precisely. This will make later link with actual external function no need to reverify. https://lore.kernel.org/bpf/87eez4odqp.fsf@toke.dk/T/#m8d5c3e87ffe7f2764e02d722cb0d8cbc136880ed This patch added bpf support to consume such info into BTF, which can then be used by bpf loader. Function processFuncPrototypes() only adds extern function definitions into BTF. The functions with actual definition have been added to BTF in some other places. Differential Revision: https://reviews.llvm.org/D70697	2019-12-09 21:53:29 -08:00
Puyan Lotfi	479e3b85e2	[NFCi][llvm][MIRVRegNamerUtils] Making some code cleanup and stylistic changes. Making some changes to MIRVRegNamerUtils.cpp to use some more modern c++ features as well as some changes to generally make the code more concise and more understandable. I make this an NFCi because in one case I drop the whole "if (!MO->isDef()) MO->setIsKill(false);" thing that was added in the original implementation, generally because I don't think this is really semantically sound. I also changed up the implementation of VRegRenamer::createVirtualRegisterWithLowerName somewhat because I am now lower-casing the name unconditionally because I confirmed that that was in fact aditya_nandakumar@apple.com's intent. In all other cases, behavior should not be changed. Differential Revision: https://reviews.llvm.org/D71182	2019-12-09 23:35:27 -05:00
Fangrui Song	9574757dba	[MC] Delete MCCodePadder D34393 added MCCodePadder as an infrastructure for padding code with NOP instructions. It lacked tests and was not being worked on since then. Intel has now worked on an assembler patch to mitigate performance loss after applying microcode update for the Jump Conditional Code Erratum. https://www.intel.com/content/www/us/en/support/articles/000055650/processors.html This new patch shares similarity with MCCodePadder, but has a concrete use case in mind and is being actively developed. The infrastructure it introduces can potentially be used for general performance improvement via alignment. Delete the unused MCCodePadder so that people can develop the new feature from a clean state. Reviewed By: jyknight, skan Differential Revision: https://reviews.llvm.org/D71106	2019-12-09 19:21:31 -08:00
QingShan Zhang	05b0c76aa7	[NFC][MacroFusion] Adding the assertion if someone want to fuse more than 2 instructions As discussed in https://reviews.llvm.org/D69998, we miss to create some dependency edges if chained more than 2 instructions. Adding an assertion here if someone want to chain more than 2 instructions. Differential Revision: https://reviews.llvm.org/D71180	2019-12-10 03:10:21 +00:00
Huihui Zhang	6507e13589	[NFC] Add { } to silence compiler warning [-Wmissing-braces]. ../llvm/lib/Target/PowerPC/PPCISelLowering.cpp:5371:37: warning: suggest braces around initialization of subobject [-Wmissing-braces] std::array<EVT, 2> ReturnTypes = {MVT::Other, MVT::Glue}; ^~~~~~~~~~~~~~~~~~~~~ { }	2019-12-09 17:19:34 -08:00
Liu, Chen3	bbf7860b93	add support for strict operation fpextend/fpround/fsqrt on X86 backend Differential Revision: https://reviews.llvm.org/D71184	2019-12-10 09:04:28 +08:00
Eric Christopher	9c6b7f68b8	Revert "[ARM][MVE] Add intrinsics for immediate shifts." and two follow-on commits: one warning fix and one functionality. As it's breaking at least the lto bot: http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15132/steps/test-stage1-compiler/logs/stdio This reverts commits: `8d70f3c933` `ff4dceef92` `d97b3e3e65`	2019-12-09 16:47:38 -08:00
Eli Friedman	7c69a03c56	[ConstantFold][SVE] Fix constant folding for shufflevector. Don't try to fold away shuffles which can't be folded. Fix creation of shufflevector constant expressions. Differential Revision: https://reviews.llvm.org/D71147	2019-12-09 15:31:50 -08:00
Eli Friedman	f1ddef34f1	[AArch64][SVE] Implement SPLAT_VECTOR for i1 vectors. The generated sequence with whilelo is unintuitive, but it's the best I could come up with given the limited number of SVE instructions that interact with scalar registers. The other sequence I was considering was something like dup+cmpne, but an extra scalar instruction seems better than an extra vector instruction. Differential Revision: https://reviews.llvm.org/D71160	2019-12-09 15:09:33 -08:00
Jinsong Ji	a0b025b8e7	[PowerPC] [NFC] Cleanup xxpermdi peephole optimization Summary: Following on from rG884351547da2, this patch cleans up the logic for `xxpermdi` peephole optimizations by converting two layers of nested `if`s to early breaks and simplifying the logic. Reviewers: hfinkel, nemanjai, jsji, lkail, #powerpc, steven.zhang Reviewed By: #powerpc, steven.zhang Subscribers: wuzish, steven.zhang, hiraditya, kbarton, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71170 Patch by vddvss (Colin Samples).	2019-12-09 21:41:26 +00:00
Johannes Doerfert	a7d992c0f2	[ValueTracking] Allow context-sensitive nullness check for non-pointers Summary: Same as D60846 and D69571 but with a fix for the problem encountered after them. Both times it was a missing context adjustment in the handling of PHI nodes. The reproducers created from the bugs that caused the old commits to be reverted are included. Reviewers: nikic, nlopes, mkazantsev, spatel, dlrobertson, uabelho, hakzsam, hans Subscribers: hiraditya, bollu, asbirlea, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71181	2019-12-09 15:15:52 -06:00
Hiroshi Yamauchi	d9ae493937	[PGO][PGSO] Instrument the code gen / target passes. Summary: Split off of D67120. Add the profile guided size optimization instrumentation / queries in the code gen or target passes. This doesn't enable the size optimizations in those passes yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass queries). A second try after reverted D71072. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71149	2019-12-09 12:42:59 -08:00
Jinsong Ji	3d41a58eac	[PowerPC][NFC] Rename ANDI(S)o8 to ANDI(S)8o Summary: This is found during https://reviews.llvm.org/D70758 All the other record forms are having suffix o at the end. ANDIo8 and ANDISo8 are the only two that put o before 8. This patch rename them to be consistent with others. Reviewers: #powerpc, hfinkel, nemanjai, lei, steven.zhang, echristo, jhibbits, joerg Reviewed By: jhibbits Subscribers: wuzish, hiraditya, kbarton, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70928	2019-12-09 19:21:34 +00:00
Mark Murray	fc3417cb5a	[ARM][MVE][Intrinsics] Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics. Summary: Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics and unit tests. Reviewers: simon_tatham, ostannard, dmgreen, miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71198	2019-12-09 17:41:47 +00:00
Mark Murray	2eb61fa5d6	[ARM][MVE][Intrinsics] Add VMULL[BT]Q_(INT\|POLY) intrinsics. Summary: Add VMULL[BT]Q_(INT\|POLY) intrinsics and unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71066	2019-12-09 17:41:47 +00:00
Sean Fertile	c78726fae0	[PowerPC] Refactor FinishCall. [NFC] Refactor FinishCall to be more easily understandable as a precursor to implementing indirect calls for AIX. The refactor tries to group similar code together at the cost of some code duplication. The high level overview of the refactor: - Adds a number of helper functions for things like: * Determining if a call is indirect. * What the Opcode for a call is. * Transforming the callee for a direct function call. * Extracting the Chain operand from a CallSeqStart node. * Building the operands of the call. - Adds helpers for building the indirect call DAG nodes (excluding the call instruction itself which is created in `FinishCall`). - Removes PrepareCall, which has been subsumed by the helpers. - Rename 'InFlag' to 'Glue'. - FinishCall has been refactored to: 1) Set TOC pointer usage on the DAG for the TOC based subtargets. 2) Calculate if a call is indirect. 3) Determine the Opcode to use for the call instruction. 4) Transform the Callee for direct calls, or build the DAG nodes for indirect calls. 5) Buildup the call operands. 6) Emit the call instruction. 7) If needed, emit the callSeqEnd Node and finish lowering by calling `LowerCallResult` Differential Revision: https://reviews.llvm.org/D70126	2019-12-09 12:40:15 -05:00
Simon Tatham	8d70f3c933	[ARM] Fix NEON failure introduced by D71065. I rewrote the isel tablegen for MVE immediate shifts, and accidentally removed the `let Predicates=[HasMVEInt]` that was wrapping the old version, which seems to have allowed those rules to cause trouble on non-MVE targets. That's what I get for only re-running the MVE tests.	2019-12-09 16:56:00 +00:00
Simon Tatham	d97b3e3e65	[ARM][MVE] Add intrinsics for immediate shifts. Summary: This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which shift every lane of a vector left or right by a compile-time immediate. They mostly work by expanding to the IR `shl`, `lshr` and `ashr` operations, with their second operand being a vector splat of the immediate. There's a fiddly special case, though. ACLE specifies that the immediate in `vshrq_n` can take values up to //and including// the bit size of the vector lane. But LLVM IR thinks that shifting right by the full size of the lane is UB, and feels free to replace the `lshr` with an `undef` half way through the optimization pipeline. Hence, to keep this legal in source code, I have to detect it at codegen time. Logical (unsigned) right shifts by the element size are handled by simply emitting the zero vector; arithmetic ones are converted into a shift of one bit less, which will always give the same output. In order to do that check, I also had to enhance the tablegen MveEmitter so that it can cope with converting a builtin function's operand into a bare integer to pass to a code-generating subfunction. Previously the only bare integers it knew how to handle were flags generated from within `arm_mve.td`. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71065	2019-12-09 15:44:09 +00:00
Thomas Raoux	caabb713ea	[ModuloSchedule] Fix data types in ModuloScheduleExpander::isLoopCarried The cycle values in modulo scheduling results can be negative. The result of ModuloSchedule::getCycle() must be received as an int type. Patch by Masaki Arai! Differential Revision: https://reviews.llvm.org/D71122	2019-12-09 07:37:00 -08:00
Sam Elliott	c20930a724	[RISCV] Machine Operand Flag Serialization Summary: These hooks ensure that the RISC-V backend can serialize and parse MIR correctly. Reviewers: jrtc27, luismarques Reviewed By: luismarques Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70666	2019-12-09 13:18:32 +00:00
Djordje Todorovic	9b9e995819	[DebugInfo][EarlyCSE] Use the salvageDebugInfoOrMarkUndef(); NFC Use the newest API. Differential Revision: https://reviews.llvm.org/D71061	2019-12-09 13:57:35 +01:00
Jeremy Morse	00e238896c	[DebugInfo] Nerf placeDbgValues, with prejudice CodeGenPrepare::placeDebugValues moves variable location intrinsics to be immediately after the Value they refer to. This makes tracking of locations very easy; but it changes the order in which assignments appear to the debugger, from the source programs order to the order in which the optimised program computes values. This then leads to PR43986 and PR38754, where variable locations that were in a conditional block are made unconditional, which is highly misleading. This patch adjusts placeDbgValues to only re-order variable location intrinsics if they use a Value before it is defined, significantly reducing the damage that it does. This is still not 100% safe, but the rest of CodeGenPrepare needs polishing to correctly update debug info when optimisations are performed to fully fix this. This will probably break downstream debuginfo tests -- if the instruction-stream position of variable location changes isn't the focus of the test, an easy fix should be to manually apply placeDbgValues' behaviour to the failing tests, moving dbg.value intrinsics next to SSA variable definitions thus: %foo = inst1 %bar = ... %baz = ... void call @llvm.dbg.value(metadata i32 %foo, ... to %foo = inst1 void call @llvm.dbg.value(metadata i32 %foo, ... %bar = ... %baz = ... This should return your test to exercising whatever it was testing before. Differential Revision: https://reviews.llvm.org/D58453	2019-12-09 12:52:10 +00:00
Mikhail Maltsev	0d1490bf6a	[ARM][MVE] Add complex vector intrinsics Summary: This patch adds intrinsics for the following MVE instructions: * VCADD, VHCADD * VCMUL * VCMLA Each of the above 3 groups has a corresponding new LLVM IR intrinsic. Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen Reviewed By: MarkMurrayARM Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71190	2019-12-09 12:05:59 +00:00
David Green	b1aba0378e	[ARM] Enable MVE masked loads and stores With the extra optimisations we have done, these should now be fine to enable by default. Which is what this patch does. Differential Revision: https://reviews.llvm.org/D70968	2019-12-09 11:37:34 +00:00
David Green	be7a107070	[ARM] Teach the Arm cost model that a Shift can be folded into other instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instruction that the shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966	2019-12-09 10:24:33 +00:00
David Green	f008b5b8ce	[ARM] Additional tests and minor formatting. NFC This adds some extra cost model tests for shifts, and does some minor adjustments to some Neon code to make it clear as to what it applies to. Both NFC.	2019-12-09 10:24:33 +00:00
David Stenberg	6965f835b4	[DebugInfo] Make describeLoadedValue() reg aware Summary: Currently the describeLoadedValue() hook is assumed to describe the value of the instruction's first explicit define. The hook will not be called for instructions with more than one explicit define. This commit adds a register parameter to the describeLoadedValue() hook, and invokes the hook for all registers in the worklist. This will allow us to for example describe instructions which produce more than two parameters' values; e.g. Hexagon's various combine instructions. This also fixes situations in our downstream target where we may pass smaller parameters in the high part of a register. If such a parameter's value is produced by a larger copy instruction, we can't describe the call site value using the super-register, and we instead need to know which sub-register that should be used. This also allows us to handle cases like this: $ebx = [...] $rdi = MOVSX64rr32 $ebx $esi = MOV32rr $edi CALL64pcrel32 @call The hook will first be invoked for the MOV32rr instruction, which will say that @call's second parameter (passed in $esi) is described by $edi. As $edi is not preserved it will be added to the worklist. When we get to the MOVSX64rr32 instruction, we need to describe two values; the sign-extended value of $ebx -> $rdi for the first parameter, and $ebx -> $edi for the second parameter, which is now possible. This commit modifies the dbgcall-site-lea-interpretation.mir test case. In the test case, the values of some 32-bit parameters were produced with LEA64r. Perhaps we can in general cases handle such by emitting expressions that AND out the lower 32-bits, but I have not been able to land in a case where a LEA64r is used for a 32-bit parameter instead of LEA64_32 from C code. I have not found a case where it would be useful to describe parameters using implicit defines, so in this patch the hook is still only invoked for explicit defines of forwarding registers. Reviewers: djtodoro, NikolaPrica, aprantl, vsk Reviewed By: djtodoro, vsk Subscribers: ormris, hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D70431	2019-12-09 10:47:49 +01:00
David Stenberg	f3696533f2	Revert "[DebugInfo] Make describeLoadedValue() reg aware" This reverts commit `3cd93a4efc`. I'll recommit with a well-formatted arcanist commit message.	2019-12-09 10:45:13 +01:00
David Stenberg	3cd93a4efc	[DebugInfo] Make describeLoadedValue() reg aware Currently the describeLoadedValue() hook is assumed to describe the value of the instruction's first explicit define. The hook will not be called for instructions with more than one explicit define. This commit adds a register parameter to the describeLoadedValue() hook, and invokes the hook for all registers in the worklist. This will allow us to for example describe instructions which produce more than two parameters' values; e.g. Hexagon's various combine instructions. This also fixes a case in our downstream target where we may pass smaller parameters in the high part of a register. If such a parameter's value is produced by a larger copy instruction, we can't describe the call site value using the super-register, and we instead need to know which sub-register that should be used. This also allows us to handle cases like this: $ebx = [...] $rdi = MOVSX64rr32 $ebx $esi = MOV32rr $edi CALL64pcrel32 @call The hook will first be invoked for the MOV32rr instruction, which will say that @call's second parameter (passed in $esi) is described by $edi. As $edi is not preserved it will be added to the worklist. When we get to the MOVSX64rr32 instruction, we need to describe two values; the sign-extended value of $ebx -> $rdi for the first parameter, and $ebx -> $edi for the second parameter, which is now possible. This commit modifies the dbgcall-site-lea-interpretation.mir test case. In the test case, the values of some 32-bit parameters were produced with LEA64r. Perhaps we can in general cases handle such by emitting expressions that AND out the lower 32-bits, but I have not been able to land in a case where a LEA64r is used for a 32-bit parameter instead of LEA64_32 from C code. I have not found a case where it would be useful to describe parameters using implicit defines, so in this patch the hook is still only invoked for explicit defines of forwarding registers.	2019-12-09 10:44:17 +01:00
Hans Wennborg	a38396939c	Revert `393dacacf7` "[ARM] Enable TypePromotion by default" This caused "Too many bits for uint64_t" asserts when building Chromium. See https://crbug.com/1031978#c2 for a reproducer. I'll follow up on the llvm-commits thread with a creduced version. > ARMCodeGenPrepare has already been generalized and renamed to > TypePromotion. We've had it enabled and tested downstream for a > while, so enable it by default. > > Differential Revision: https://reviews.llvm.org/D70998	2019-12-09 09:39:31 +01:00
rollrat	9fdb7ac503	[NFC][LivePhysRegs] Fix incorrect comment Reviewers: #llvm, tellenbach Reviewed By: tellenbach Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71051 Patch by: rollrat <rollrat.cse@gmail.com>	2019-12-08 21:07:28 +01:00
Sanjay Patel	1c4dd3ae2f	[InstSimplify] fold copysign with negated operand, part 2 This is another transform suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153 Unlike rG12f39e0fede9, it doesn't look like the backend matches this variant.	2019-12-08 10:16:29 -05:00
Sanjay Patel	12f39e0fed	[InstSimplify] fold copysign with negated operand This is another transform suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153 The backend for some targets already manages to get this if it converts copysign to bitwise logic.	2019-12-08 10:08:02 -05:00
David Green	792fab343b	[ARM] Attempt to use whole register vmovs for MVE shuffles. MVE doesn't have the range of shuffle instructions available in Neon. We also cannot use the trick of cutting a difficult vector shuffle in half to simplify things. Instead we need to be more careful about how we lower shuffles. This patch adds an extra combine that attempts to find "whole lane" vmovs when lowering shuffles of smaller types. This helps us make some shuffles a lot simpler, generating single lane movs for the parts that can make use of it, falling back to the original shuffle for the rest. Differential Revision: https://reviews.llvm.org/D69509	2019-12-08 10:53:54 +00:00
David Green	3a6eb5f160	[ARM] Disable VLD4 under MVE Alas, using half the available vector registers in a single instruction is just too much for the register allocator to handle. The mve-vldst4.ll test here fails when these instructions are enabled at present. This patch disables the generation of VLD4 and VST4 by adding a mve-max-interleave-factor option, which we currently default to 2. Differential Revision: https://reviews.llvm.org/D71109	2019-12-08 10:37:29 +00:00
Florian Hahn	c491949694	[LV] Pick correct BB as insert point when fixing PHI for FORs. Currently we fail to pick the right insertion point when PreviousLastPart of a first-order-recurrence is a PHI node not in the LoopVectorBody. This can happen when PreviousLastPart is produce in a predicated block. In that case, we should pick the insertion point in the BB the PHI is in. Fixes PR44020. Reviewers: hsaito, fhahn, Ayal, dorit Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D71071	2019-12-07 19:32:00 +00:00
Ulrich Weigand	a6fcdb211d	[SystemZ] Fix build bot failures My patch `9db13b5a7d` seems to have caused some build bots to fail due to warnings that appear only when using -Wcovered-switch-default. This patch is an attempt to fix this by trying to avoid both the warning "default label in switch which covers all enumeration values" for the inner switch statements and at the same time the warning "this statement may fall through" for the outer switch statement in getVectorComparison (SystemZISelLowering.cpp).	2019-12-07 19:37:16 +01:00
Florian Hahn	c25de56905	[SimplifyCFG] Account for N being null. Fixes a crash, e.g. http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15119/	2019-12-07 17:23:42 +00:00
Yonghong Song	5ea611daf9	[BPF] Support weak global variables for BTF Generate types for global variables with "weak" attribute. Keep allocation scope the same for both weak and non-weak globals as ELF symbol table can determine whether a global symbol is weak or not. Differential Revision: https://reviews.llvm.org/D71162	2019-12-07 08:58:19 -08:00
Rodrigo Caetano Rocha	d714aa0dfd	[SimplifyCFG] Handle AssumptionCache being null. AssumptionCache can be null in SimplifyCFGOptions. However, FoldCondBranchOnPHI() was not properly handling that when passing a null AssumptionCache to simplifyCFG. Patch by Rodrigo Caetano Rocha <rcor.cs@gmail.com> Reviewers: fhahn, lebedev.ri, spatel Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D69963	2019-12-07 16:54:49 +00:00
Ulrich Weigand	9db13b5a7d	[FPEnv] Constrained FCmp intrinsics This adds support for constrained floating-point comparison intrinsics. Specifically, we add: declare <ty2> @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>, metadata <condition code>, metadata <exception behavior>) declare <ty2> @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>, metadata <condition code>, metadata <exception behavior>) The first variant implements an IEEE "quiet" comparison (i.e. we only get an invalid FP exception if either argument is a SNaN), while the second variant implements an IEEE "signaling" comparison (i.e. we get an invalid FP exception if either argument is any NaN). The condition code is implemented as a metadata string. The same set of predicates as for the fcmp instruction is supported (except for the "true" and "false" predicates). These new intrinsics are mapped by SelectionDAG codegen onto two new ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again representing quiet vs. signaling comparison operations. Otherwise those nodes look like SETCC nodes, with an additional chain argument and result as usual for strict FP nodes. The patch includes support for the common legalization operations for those nodes. The patch also includes full SystemZ back-end support for the new ISD nodes, mapping them to all available SystemZ instruction to fully implement strict semantics (scalar and vector). Differential Revision: https://reviews.llvm.org/D69281	2019-12-07 11:28:39 +01:00
Florian Hahn	e60b36cf92	[VPlan] Rename VPlanHCFGTransforms to VPlanTransforms (NFC). The file is intended to gather various VPlan transformations, not only CFG related transforms. Actually, the only transformation there is not CFG related. Reviewers: Ayal, gilr, hsaito, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D70732	2019-12-07 08:56:35 +00:00
Kai Luo	884351547d	[PowerPC] Fix MI peephole optimization for splats Summary: This patch fixes an issue where the PPC MI peephole optimization pass incorrectly remove a vector swap. Specifically, the pass can combine a splat/swap to a splat/copy. It uses `TargetRegisterInfo::lookThruCopyLike` to determine that the operands to the splat are the same. However, the current logic only compares the operands based on register numbers. In the case where the splat operands are ultimately feed from the same physical register, the pass can incorrectly remove a swap if the feed register for one of the operands has been clobbered. This patch adds a check to ensure that the registers feeding are both virtual registers or the operands to the splat or swap are both the same register. Here is an example in pseudo-MIR of what happens in the test cased added in this patch: Before PPC MI peephole optimization: ``` %arg = XVADDDP %0, %1 $f1 = COPY %arg.sub_64 call double rint(double) %res.first = COPY $f1 %vec.res.first = SUBREG_TO_REG 1, %res.first, %subreg.sub_64 %arg.swapped = XXPERMDI %arg, %arg, 2 $f1 = COPY %arg.swapped.sub_64 call double rint(double) %res.second = COPY $f1 %vec.res.second = SUBREG_TO_REG 1, %res.second, %subreg.sub_64 %vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0 %vec.res = XXPERMDI %vec.res.splat, %vec.res.splat, 2 ; %vec.res == [ %vec.res.second[0], %vec.res.first[0] ] ``` After optimization: ``` ; ... %vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0 ; lookThruCopyLike(%vec.res.first) == lookThruCopyLike(%vec.res.second) == $f1 ; so the pass replaces the swap with a copy: %vec.res = COPY %vec.res.splat ; %vec.res == [ %vec.res.first[0], %vec.res.second[0] ] ``` As best as I can tell, this has occurred since r288152, which added support for lowering certain vector operations to direct moves in the form of a splat. Committed for vddvss (Colin Samples). Thanks Colin for the patch! Differential Revision: https://reviews.llvm.org/D69497	2019-12-07 14:51:20 +08:00
Amara Emerson	7ac9662401	[AArch64][GlobalISel] Add missing default statement to a switch in the selector.	2019-12-06 17:43:27 -08:00
Sterling Augustine	aa3c877fb5	Move variable only used in an assert into the assert itself. This prevents unused variable warnings from breaking the build.	2019-12-06 17:09:19 -08:00
Amara Emerson	c77b441140	[AArch64][GlobalISel] Add support for selection of vector G_SHL with immediates. Only implemented for the type combinations already supported for G_SHL. Differential Revision: https://reviews.llvm.org/D71153	2019-12-06 16:24:57 -08:00
Sam Clegg	b4f4e370b5	[WebAssebmly][MC] Support .import_name/.import_field asm directives Convert the MC test to use asm rather than bitcode. This is a precursor to https://reviews.llvm.org/D70520. Differential Revision: https://reviews.llvm.org/D70877	2019-12-06 15:09:56 -08:00
Reid Kleckner	1d9291cc78	[MC] Rewrite tablegen for printInstrAlias to comiple faster, NFC Before this change, the *InstPrinter.cpp files of each target where some of the slowest objects to compile in all of LLVM. See this snippet produced by ClangBuildAnalyzer: https://reviews.llvm.org/P8171$96 Search for "InstPrinter", and see that it shows up in a few places. Tablegen was emitting a large switch containing a sequence of operand checks, each of which created many conditions and many BBs. Register allocation and jump threading both did not scale well with such a large repetitive sequence of basic blocks. So, this change essentially turns those control flow structures into data. The previous structure looked like: switch (Opc) { case TGT::ADD: // check alias 1 if (MI->getOperandCount() == N && // check num opnds MI->getOperand(0).isReg() && // check opnd 0 ... MI->getOperand(1).isImm() && // check opnd 1 AsmString = "foo"; break; } // check alias 2 if (...) ... return false; The new structure looks like: OpToPatterns: Sorted table of opcodes mapping to pattern indices. \-> Patterns: List of patterns. Previous table points to subrange of patterns to match. \-> Conds: The if conditions above encoded as a kind and 32-bit value. See MCInstPrinter.cpp for the details of how the new data structures are interpreted. Here are some before and after metrics. Time to compile AArch64InstPrinter.cpp: 0m29.062s vs. 0m2.203s size of the obj: 3.9M vs. 676K size of clang.exe: 97M vs. 96M I have not benchmarked disassembly performance, but typically disassemblers are bottlenecked on IO and string processing, not alias matching, so I'm not sure it's interesting enough to be worth doing. Reviewers: RKSimon, andreadb, xbolva00, craig.topper Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D70650	2019-12-06 15:00:18 -08:00
Amara Emerson	84fdd9d7a5	[X86] Fix prolog/epilog mismatch for stack protectors on win32-macho. The xor'ing behaviour is only used for msvc/crt environments, when we're targeting macho the guard load code doesn't know about the xor in the epilog. Disable xor'ing when targeting win32-macho to be consistent. Differential Revision: https://reviews.llvm.org/D71095	2019-12-06 14:44:56 -08:00
Craig Topper	28b573d249	[TargetLowering] Fix another potential FPE in expandFP_TO_UINT D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence: // Sel = Src < 0x8000000000000000 // Val = select Sel, Src, Src - 0x8000000000000000 // Ofs = select Sel, 0, 0x8000000000000000 // Result = fp_to_sint(Val) ^ Ofs The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.) Instead, I'd suggest to use the following sequence: // Sel = Src < 0x8000000000000000 // FltOfs = select Sel, 0, 0x8000000000000000 // IntOfs = select Sel, 0, 0x8000000000000000 // Result = fp_to_sint(Val - FltOfs) ^ IntOfs In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway). In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit. There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.) Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper Differential Revision: https://reviews.llvm.org/D67105	2019-12-06 14:11:04 -08:00
Teresa Johnson	c8e36862f5	[WPD] Remove unused parameter (NFC) Remove unused parameter.	2019-12-06 13:14:21 -08:00
Reid Kleckner	c089f02898	[X86] Don't setup and teardown memory for a musttail call Summary: musttail calls should not require allocating extra stack for arguments. Updates to arguments passed in memory should happen in place before the epilogue. This bug was mostly a missed optimization, unless inalloca was used and store to push conversion fired. If a reserved call frame was used for an inalloca musttail call, the call setup and teardown instructions would be deleted, and SP adjustments would be inserted in the prologue and epilogue. You can see these are removed from several test cases in this change. In the case where the stack frame was not reserved, i.e. call frame optimization fires and turns argument stores into pushes, then the imbalanced call frame setup instructions created for inalloca calls become a problem. They remain in the instruction stream, resulting in a call setup that allocates zero bytes (expected for inalloca), and a call teardown that deallocates the inalloca pack. This deallocation was unbalanced, leading to subsequent crashes. Reviewers: hans Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71097	2019-12-06 12:58:54 -08:00
Hiroshi Yamauchi	2eb30fafa5	Revert "[PGO][PGSO] Instrument the code gen / target passes." This reverts commit `9a0b5e1407`. This seems to break buildbots.	2019-12-06 12:17:32 -08:00
Wenlei He	7b61ae68ec	[AutoFDO] Inline replay for cold/small callees from sample profile loader Summary: Sample profile loader of AutoFDO tries to replay previous inlining using context sensitive profile. The replay only repeats inlining if the call site block is hot. As a result it punts inlining of small functions, some of which can be beneficial for size, and will still be inlined by CSGCC inliner later. The oscillation between sample profile loader's inlining and regular CGSSC inlining cause unnecessary loss of context-sensitive profile. It doesn't have much impact for inline decision itself, but it negatively affects post-inline profile quality as CGSCC inliner have to scale counts which is not as accurate as the original context sensitive profile, and bad post-inline profile can misguide code layout. This change added regular Inline Cost calculation for sample profile loader, so we can inline small functions upfront under switch -sample-profile-inline-size. In addition -sample-profile-cold-inline-threshold is added so we can tune the separate size threshold - currently the default is chosen to be the same as regular inliner's cold call-site threshold. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70750	2019-12-06 11:44:45 -08:00
Reid Kleckner	7f63db197e	Avoid naming variable after type to fix GCC 5.3 build GCC says: .../llvm/lib/DebugInfo/GSYM/FunctionInfo.cpp:195:12: error: ‘InfoType’ is not a class, namespace, or enumeration case InfoType::EndOfList: ^ Presumably, GCC thinks InfoType is a variable here. Work around it by using the name IT as is done above.	2019-12-06 11:25:28 -08:00
Sanjay Patel	43e2a901e1	Revert "[InstCombine] reduce code duplication; NFC" This reverts commit `db57396584`. At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.	2019-12-06 14:24:14 -05:00
Sanjay Patel	b6d6f5470f	Revert "[InstCombine] improve readability; NFC" This reverts commit `7250ef3613`. At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.	2019-12-06 14:20:44 -05:00
Sanjay Patel	142a75a9b1	Revert "[InstCombine] reduce indentation; NFC" This reverts commit `8bf8ef7116`. At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.	2019-12-06 14:19:02 -05:00
Alina Sbirlea	c7faa68142	Revert "ARM-Darwin: keep the frame register reserved even if not updated." This reverts commit `a7d90af1be`. This revision came back as the root-cause for crashes in internal ARM-IOS apps. Reproducer in https://bugs.llvm.org/show_bug.cgi?id=44231.	2019-12-06 10:59:26 -08:00
Sanjay Patel	7ff0fcb53f	[x86] add cost model special-case for insert/extract from element 0 This is a follow-up to D70607 where we made any extract element on SLM more costly than default. But that is pessimistic for extract from element 0 because that corresponds to x86 movd/movq instructions. These generally have >1 cycle latency, but they are probably implemented as single uop instructions. Note that no vectorization tests are affected by this change. Also, no targets besides SLM are affected because those are falling through to the default cost of 1 anyway. But this will become visible/important if we add more specializations via cost tables. Differential Revision: https://reviews.llvm.org/D71023	2019-12-06 13:50:25 -05:00
Hiroshi Yamauchi	9a0b5e1407	[PGO][PGSO] Instrument the code gen / target passes. Summary: Split off of D67120. Add the profile guided size optimization instrumentation / queries in the code gen or target passes. This doesn't enable the size optimizations in those passes yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass queries). Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71072	2019-12-06 10:43:39 -08:00
Sanjay Patel	8bf8ef7116	[InstCombine] reduce indentation; NFC	2019-12-06 13:26:45 -05:00
Sanjay Patel	7250ef3613	[InstCombine] improve readability; NFC CreateIntCast returns the input if its type matches, so need to duplicate that check.	2019-12-06 13:26:45 -05:00
Sanjay Patel	db57396584	[InstCombine] reduce code duplication; NFC	2019-12-06 13:26:45 -05:00
Sanjay Patel	6bb62a9d97	[InstCombine] improve readability; NFC	2019-12-06 13:26:44 -05:00
Guozhi Wei	72942459d0	[MBP] Avoid tail duplication if it can't bring benefit Current tail duplication integrated in bb layout is designed to increase the fallthrough from a BB's predecessor to its successor, but we have observed cases that duplication doesn't increase fallthrough, or it brings too much size overhead. To overcome these two issues in function canTailDuplicateUnplacedPreds I add two checks: make sure there is at least one duplication in current work set. the number of duplication should not exceed the number of successors. The modification in hasBetterLayoutPredecessor fixes a bug that potential predecessor must be at the bottom of a chain. Differential Revision: https://reviews.llvm.org/D64376	2019-12-06 09:53:53 -08:00
diggerlin	50d72fa146	[NFC][AIX][XCOFF] if the size of Csect is zero, the Csect do not need write any data into sections SUMMARY: if the size of Csect is zero, the Csect do not need write any data into sections for example, the TOC Csect has zero size, it do not need invoke a Asm.writeSectionData(W.OS, Csect.MCCsect, Layout); Reviewers: daltenty Subscribers: rupprecht, seiyai,hiraditya Differential Revision: https://reviews.llvm.org/D71120	2019-12-06 12:41:38 -05:00
diggerlin	c04b63eccd	[NFC][AIX][XCOFF] fixed compile warning on the strncpy. SUMMARY: There is warning when compile the file XCOFFObjectWriter.cpp /srv/llvm-buildbot-srcatch/llvm-build-dir/openmp-gcc-x86_64-linux-debian/llvm.src/llvm/lib/MC/XCOFFObjectWriter.cpp:414:17: warning: 'char* strncpy(char, const char, size_t)' specified bound 8 equals destination size [-Wstringop-truncation] The patch fixed the warning. Reviewer: daltenty Differential Revision: https://reviews.llvm.org/D71119	2019-12-06 12:22:28 -05:00
John Brawn	984f1bb3e7	[LegalizeTypes] Add missing case for STRICT_FP_ROUND softening This fixes a test failure in test/CodeGen/ARM/fp-intrinsics.ll.	2019-12-06 15:54:27 +00:00
Simon Tatham	3fab4276cb	[ARM][MVE] Fix copy-paste error in VQSHL instruction ids. Summary: The immediate forms of the MVE VQSHL instruction have MC names like `MVE_VSLIimms8` and `MVE_VSLIimmu32`. Those names are confusing, because VSLI is a completely different shift instruction with no semantic relation to VQSHL. But it just happens to be defined immediately before VQSHL in `ARMInstrMVE.td`, so this looks like a copy-paste error. Renamed the ids to match the instruction name. Reviewers: ostannard, dmgreen, MarkMurrayARM, miyuki Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71114	2019-12-06 15:23:23 +00:00
Cullen Rhodes	2c63e8e36d	[AArch64] Fix a bug with jump table generation Summary: When trying to calculate the offsets for the jump table entries we fail to take into account the block alignment, which could be greater than 4 bytes. This led to cases where the jump table offset was too big to fit in a byte. Reviewers: t.p.northover, sdesmalen, ostannard Reviewed By: ostannard Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits Committed on behalf of David Sherwood (david-arm) Tags: #llvm Differential Revision: https://reviews.llvm.org/D70533	2019-12-06 14:31:53 +00:00
Gil Rapaport	39ccc099c9	[LV] Record GEP widening decisions in recipe (NFCI) InnerLoopVectorizer's code called during VPlan execution still relies on original IR's def-use relations to decide which vector code to generate, limiting VPlan transformations ability to modify def-use relations and still have ILV generate the vector code. This commit moves GEP operand queries controlling how GEPs are widened to a dedicated recipe and extracts GEP widening code to its own ILV method taking those recorded decisions as arguments. This reduces ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. Differential revision: https://reviews.llvm.org/D69067	2019-12-06 13:41:19 +02:00
Cullen Rhodes	b31a531f9b	[AArch64][SVE2] Implement while comparison intrinsics Summary: Adds the following intrinsics: * whilege, whilegt, whilehi, whilehs Reviewers: sdesmalen, rovka, dancgr, efriedma, rengolin, huntergr Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70909	2019-12-06 11:29:34 +00:00

... 3 4 5 6 7 ...

129445 Commits