llvm-project

Commit Graph

Author	SHA1	Message	Date
Nico Weber	cd470717d1	Revert "[DA][TTI][AMDGPU] Add option to select GPUDA with TTI" This reverts commit `a90a6502ab`. Broke tests on Windows: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/13808	2020-01-22 12:56:19 -05:00
Florian Hahn	300997c41a	[AArch64] Don't rename registers with pseudo defs in Ld/St opt. If the root def of for renaming is a noop-pseudo instruction like kill, we would end up without a correct def for the renamed register, causing miscompiles. This patch conservatively bails out on any pseudo instruction. This fixes https://bugs.chromium.org/p/chromium/issues/detail?id=1037912#c70	2020-01-22 09:26:25 -08:00
Matt Arsenault	1192d7b254	AMDGPU/GlobalISel: Handle 16-bank LDS llvm.amdgcn.interp.p1.f16 The pattern is also mishandled by the generated matcher, so workaround this as in the DAG path. The existing DAG tests aren't particularly targeted to just this one intrinsic. These also end up differing in scheduling from SGPR->VGPR operand constraint copies.	2020-01-22 12:10:59 -05:00
David Tenty	45a4aaea7f	[NFC][XCOFF] Refactor Csect creation into TargetLoweringObjectFile Summary: We create a number of standard types of control sections in multiple places for things like the function descriptors, external references and the TOC anchor among others, so it is possible for their properties to be defined inconsistently in different places. This refactor moves their creation and properties into functions in the TargetLoweringObjectFile class hierarchy, where functions for retrieving various special types of sections typically seem to reside. Note: There is one case in PPCISelLowering which is specific to function entry points which we don't address since we don't have access to the TLOF there. Reviewers: DiggerLin, jasonliu, hubert.reinterpretcast Reviewed By: jasonliu, hubert.reinterpretcast Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72347	2020-01-22 12:09:11 -05:00
Stanislav Mekhanoshin	2d0fcf786c	Precommit NFC part of DAGCombiner change. NFC. This is NFC part of DAGCombiner::visitEXTRACT_SUBVECTOR() change in the D73132.	2020-01-22 09:01:22 -08:00
Matt Arsenault	c05f23e409	AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp This is deprecated, but easy to support.	2020-01-22 11:43:53 -05:00
Matt Arsenault	dd09ec1208	AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp8	2020-01-22 11:43:40 -05:00
Hiroshi Yamauchi	ddbc728828	[PGO][PGSO] Update BFI in CodeGenPrepare::optimizeSelectInst. Summary: Without the BFI update, some hot blocks are incorrectly treated as cold code. This fixes a FDO perf regression in the TSVC benchmark from D71288. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73146	2020-01-22 08:36:54 -08:00
Matt Arsenault	0bf434ccd5	AMDGPU: Fix element size assertion The GlobalISel usage called this with bits, but the DAG usage was incorrectly using bytes.	2020-01-22 11:18:45 -05:00
Matt Arsenault	bb562d1af0	AMDGPU/GlobalISel: Keep G_BITCAST out of waterfall loop The waterfall utility function blindly inserts a phi for every def in the loop. We don't need this one to be preserved for every iteration. Saves an extra phi and copy inside the loop body.	2020-01-22 11:16:19 -05:00
Zakk Chen	0cb274de39	[RISCV] Support ABI checking with per function target-features 1. if users don't specific -mattr, the default target-feature come from IR attribute. 2. fixed bug and re-land this patch Reviewers: lenary, asb Reviewed By: lenary Tags: #llvm Differential Revision: https://reviews.llvm.org/D70837	2020-01-22 08:12:28 -08:00
Simon Pilgrim	a14aa7dabd	[X86][SSE] combineExtractWithShuffle - extract(bictcast(scalar_to_vector(x))) --> x Removes some unnecessary gpr<-->fpu traffic	2020-01-22 16:11:08 +00:00
Matt Arsenault	52ec7379ad	AMDGPU/GlobalISel: Fold add of constant into G_INSERT_VECTOR_ELT Move the subregister base like in the extract case.	2020-01-22 11:09:15 -05:00
Matt Arsenault	d1dbb5e471	AMDGPU/GlobalISel: Select G_INSERT_VECTOR_ELT	2020-01-22 11:00:49 -05:00
Matt Arsenault	3524d4412c	AMDGPU/GlobalISel: Fix RegBankSelect for G_INSERT_VECTOR_ELT The result and source vector are going to be tied, so these need to be the same bank. The inserted value also needs to be broken down based on the result bank, not the inserted value itself.	2020-01-22 10:57:50 -05:00
Matt Arsenault	e3d352c541	AMDGPU/GlobalISel: Fold constant offset vector extract indexes Handle dynamic vector extracts that use an index that's an add of a constant offset into moving the base subregister of the indexing operation. Force the add into the loop in regbankselect, which will be recognized when selected.	2020-01-22 10:50:59 -05:00
Kazushi (Jam) Marukawa	83b67526d5	[VE] select and selectcc patterns Summary: select and selectcc isel patterns and tests for i32/i64 and fp32/fp64. Includes optimized selectcc patterns for fmin/fmax/maxs/mins. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73195	2020-01-22 16:30:38 +01:00
Matt Arsenault	e93e1b621c	AMDGPU: Fix typo	2020-01-22 10:17:46 -05:00
Matt Arsenault	2fe500ab5b	AMDGPU: Look through casted selects to constant fold bin ops The promotion of the uniform select to i32 interfered with this fold.	2020-01-22 10:16:39 -05:00
Matt Arsenault	bcd91778fe	AMDGPU: Do binop of select of constant fold in AMDGPUCodeGenPrepare DAGCombiner does this, but divisions expanded here miss this optimization. Since `67aa18f165`, divisions have been expanded here and missed out on this optimization. Avoids test regressions in a future patch.	2020-01-22 10:16:39 -05:00
Matt Arsenault	a174f0da62	AMDGPU/GlobalISel: Add pre-legalize combiner pass Just copy the AArch64 pass as-is for now, except for removing the memcpy handling.	2020-01-22 10:16:39 -05:00
Sanjay Patel	0ade2abdb0	[InstCombine] fneg(X + C) --> -C - X This is 1 of the potential folds uncovered by extending D72521. We don't seem to do this in the backend either (unless I'm not seeing some target-specific transform). icc and gcc (appears to be target-specific) do this transform. Differential Revision: https://reviews.llvm.org/D73057	2020-01-22 09:48:43 -05:00
Kazushi (Jam) Marukawa	dc69265eea	[VE] setcc isel patterns Summary: SETCC isel patterns and tests for i32/64 and fp32/64 comparison Reviewers: arsenm, rengolin, craig.topper, k-ishizaka Reviewed By: arsenm Subscribers: merge_guards_bot, wdng, hiraditya, llvm-commits Tags: #ve, #llvm Differential Revision: https://reviews.llvm.org/D73171	2020-01-22 15:45:57 +01:00
David Green	e9c198278e	[ARM] Basic gather scatter cost model This is a very basic MVE gather/scatter cost model, based roughly on the code that we will currently produce. It does not handle truncating scatters or extending gathers correctly yet, as it is difficult to tell that they are going to be correctly extended/truncated from the limited information in the cost function. This can be improved as we extend support for these in the future. Based on code originally written by David Sherwood. Differential Revision: https://reviews.llvm.org/D73021	2020-01-22 14:41:38 +00:00
Sander de Smalen	4cf16efe49	[AArch64][SVE] Add patterns for unpredicated load/store to frame-indices. This patch also fixes up a number of cases in DAGCombine and SelectionDAGBuilder where the size of a scalable vector is used in a fixed-width context (thus triggering an assertion failure). Reviewers: efriedma, c-rhodes, rovka, cameron.mcinally Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D71215	2020-01-22 14:32:27 +00:00
Jay Foad	e0f0d0e55c	[MachineScheduler] Allow clustering mem ops with complex addresses The generic BaseMemOpClusterMutation calls into TargetInstrInfo to analyze the address of each load/store instruction, and again to decide whether two instructions should be clustered. Previously this had to represent each address as a single base operand plus a constant byte offset. This patch extends it to support any number of base operands. The old target hook getMemOperandWithOffset is now a convenience function for callers that are only prepared to handle a single base operand. It calls the new more general target hook getMemOperandsWithOffset. The only requirements for the base operands returned by getMemOperandsWithOffset are: - they can be sorted by MemOpInfo::Compare, such that clusterable ops get sorted next to each other, and - shouldClusterMemOps knows what they mean. One simple follow-on is to enable clustering of AMDGPU FLAT instructions with both vaddr and saddr (base register + offset register). I've left a FIXME in the code for this case. Differential Revision: https://reviews.llvm.org/D71655	2020-01-22 14:28:24 +00:00
Matt Arsenault	70096ca111	AMDGPU/GlobalISel: Fix RegbankSelect for llvm.amdgcn.fmul.legacy	2020-01-22 09:26:17 -05:00
Matt Arsenault	a722cbf77c	AMDGPU/GlobalISel: Handle atomic_inc/atomic_dec The intermediate instruction drops the extra volatile argument. We are missing an atomic ordering on these.	2020-01-22 09:26:17 -05:00
Matt Arsenault	9c928649a0	AMDGPU: Fix interaction of tfe and d16 This using the wrong result register, and dropping the result entirely for v2f16. This would fail to select on the scalar case. I believe it was also mishandling packed/unpacked subtargets.	2020-01-22 09:26:17 -05:00
Matt Arsenault	b94d3b9b77	AMDGPU/GlobalISel: RegBankSelect interp intrinsics Note this assumes the future use of immediates for immarg, not the current G_CONSTANT which will be emitted.	2020-01-22 09:01:34 -05:00
Matt Arsenault	64e9528201	AMDGPU: Fix missing immarg on llvm.amdgcn.interp.mov The first operand maps to an immediate field, so this should be immarg.	2020-01-22 09:01:34 -05:00
Simon Pilgrim	80656fd7ae	[SelectionDAG] getShiftAmountConstant - assert the type is an integer.	2020-01-22 13:52:44 +00:00
Simon Pilgrim	c784e5451b	Use SelectionDAG::getShiftAmountConstant(). NFCI.	2020-01-22 13:52:43 +00:00
Simon Pilgrim	963f268186	[X86][SSE] combineExtractWithShuffle - pull out repeated extract index code. NFCI.	2020-01-22 12:08:58 +00:00
Kerry McLaughlin	cdcc4f2a44	[AArch64][SVE] Add intrinsic for non-faulting loads Summary: This patch adds the llvm.aarch64.sve.ldnf1 intrinsic, plus DAG combine rules for non-faulting loads and sign/zero extends Reviewers: sdesmalen, efriedma, andwar, dancgr, mgudim, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71698	2020-01-22 11:15:20 +00:00
Sander de Smalen	67d4c9924c	Add support for (expressing) vscale. In LLVM IR, vscale can be represented with an intrinsic. For some targets, this is equivalent to the constexpr: getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1 This can be used to propagate the value in CodeGenPrepare. In ISel we add a node that can be legalized to one or more instructions to materialize the runtime vector length. This patch also adds SVE CodeGen support for VSCALE, which maps this node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD] instructions (scaled multiples of 2, 4, or 8 bytes, respectively). Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner Reviewed by: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D68203	2020-01-22 10:09:27 +00:00
Guillaume Chatelet	0957233320	[Alignment][NFC] Use Align with CreateMaskedStore Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73106	2020-01-22 11:04:39 +01:00
Sam Parker	c04b9ba595	[ARM][MVE] Clear MaskedInsts vector In MVETailPredication, clear the vector before running on a new loop. Differential Revision: https://reviews.llvm.org/D73048	2020-01-22 04:27:36 -05:00
Evgeny Leviant	3593b5b3e9	[llvm-as] Fix assembling of index with multiple summaries sharing single GUID Differential revision: https://reviews.llvm.org/D73091	2020-01-22 01:09:13 -08:00
Kazushi (Jam) Marukawa	3a906a9f4e	[VE] i<N> and fp32/64 arguments, return values and constants Summary: Support for i<N> and fp32/64 arguments (in register), return values and constants along with tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73092	2020-01-22 09:17:44 +01:00
Amara Emerson	2e25d75aaa	[AArch64][GlobalISel] Fix llvm.returnaddress(0) selection when LR is clobbered. The code was originally ported from SelectionDAG, which does CSE behind the scenes automatically. When copying the return address from LR live into the function, we need to make sure to use the single copy on function entry. Any later copy from LR could be using clobbered junk. Implement this by caching the copy in the per-MF state in the selector. Should hopefully fix the AArch64 sanitiser buildbot failure.	2020-01-21 22:53:32 -08:00
Daniil Suchkov	7bdc83f340	[LICM] Don't cache AliasSetTrackers when run under legacy PM Summary: This is the first step towards complete removal of AST caching from LICM. Attempts to keep LICM's AST cache up to date across passes can lead to miscompiles like this one: https://bugs.llvm.org/show_bug.cgi?id=44320. LICM has already switched to using MemorySSA to do sinking and hoisting and only builds an AliasSetTracker on demand for the promoteToScalars step, without caching it from one LICM instance to the next. Given this, we don't have compile-time reasons to keep AST caching any more. The only scenario where the caching would be used currently is when using the LegacyPassManager and setting -enable-mssa-loop-dependency=false. This switch should help us to surface any possible issues that may arise along this way, also it turns subsequent removal of AST caching into NFC. Reviewers: asbirlea, fhahn, efriedma, reames Reviewed By: asbirlea Subscribers: hiraditya, george.burgess.iv, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73081	2020-01-22 13:16:45 +07:00
Austin Kerbow	a90a6502ab	[DA][TTI][AMDGPU] Add option to select GPUDA with TTI Summary: Enable the new diveregence analysis by default for AMDGPU. Reviewers: rampitec, nhaehnle, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73049	2020-01-21 21:13:20 -08:00
Carl Ritson	6b4b3e2856	[AMDGPU] SIRemoveShortExecBranches should not remove branches exiting loops Summary: Check that a s_cbranch_execz is not a loop exit before removing it. As the pass is generating infinite loops. Reviewers: cdevadas, arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits, dstuttard, foad Tags: #llvm Differential Revision: https://reviews.llvm.org/D72997	2020-01-22 13:18:40 +09:00
cdevadas	e53a9d96e6	Resubmit: [AMDGPU] Invert the handling of skip insertion. The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092	2020-01-22 13:18:32 +09:00
Lang Hames	e0a6093a74	[ORC] Fix a missing move in `ce2207abaf`. This should fix the build failure at http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/32524 and others.	2020-01-21 20:14:50 -08:00
Lang Hames	ce2207abaf	[ORC] Add support for emulated TLS to ORCv2. This commit adds a ManglingOptions struct to IRMaterializationUnit, and replaces IRCompileLayer::CompileFunction with a new IRCompileLayer::IRCompiler class. The ManglingOptions struct defines the emulated-TLS state (via a bool member, EmulatedTLS, which is true if emulated-TLS is enabled and false otherwise). The IRCompileLayer::IRCompiler class wraps an IRCompiler (the same way that the CompileFunction typedef used to), but adds a method to return the IRCompileLayer::ManglingOptions that the compiler will use. These changes allow us to correctly determine the symbols that will be produced when a thread local global variable defined at the IR level is compiled with or without emulated TLS. This is required for ORCv2, where MaterializationUnits must declare their interface up-front. Most ORCv2 clients should not require any changes. Clients writing custom IR compilers will need to wrap their compiler in an IRCompileLayer::IRCompiler, rather than an IRCompileLayer::CompileFunction, however this should be a straightforward change (see modifications to CompileUtils.* in this patch for an example).	2020-01-21 19:55:33 -08:00
Amara Emerson	67a8775322	[AArch64] Don't generate gpr CSEL instructions in early-ifcvt if regclasses aren't compatible. In GlobalISel we may in some unfortunate circumstances generate PHIs with operands that are on separate banks. If-conversion doesn't currently check for that case and ends up generating a CSEL on AArch64 with incorrect register operands. Differential Revision: https://reviews.llvm.org/D72961	2020-01-21 16:51:31 -08:00
Andrei Elovikov	e1d6d36852	[SLP] Don't allow Div/Rem as alternate opcodes Summary: We don't have control/verify what will be the RHS of the division, so it might happen to be zero, causing UB. Reviewers: Vasilis, RKSimon, ABataev Reviewed By: ABataev Subscribers: vporpo, ABataev, hiraditya, llvm-commits, vdmitrie Tags: #llvm Differential Revision: https://reviews.llvm.org/D72740	2020-01-21 15:21:17 -08:00
Florian Hahn	535ed62c5f	[AArch64] Add custom store lowering for 256 bit non-temporal stores. Currently we fail to lower non-termporal stores for 256+ bit vectors to STNPQ, because type legalization will split them up to 128 bit stores and because there are no single non-temporal stores, creating STPNQ in the Load/Store optimizer would be quite tricky. This patch adds custom lowering for 256 bit non-temporal vector stores to improve the generated code. Reviewers: dmgreen, samparker, t.p.northover, ab Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D72919	2020-01-21 14:53:40 -08:00
Matt Arsenault	e47965bf64	AMDGPU/GlobalISel: Merge trivial legalize rules Also move constant-like rules together	2020-01-21 17:37:19 -05:00
Roman Lebedev	a6492e2271	[IR] Value::getPointerAlignment(): handle pointer constants Summary: New `@test13` in `Attributor/align.ll` is the main motivation - `null` pointer really does not limit our alignment knowledge, in fact it is fully aligned since it has no bits set. Here we don't special-case `null` pointer because it is somewhat controversial to add one more place where we enforce that `null` pointer is zero, but instead we do the more general thing of trying to perform constant-fold of pointer constant to an integer, and perform alignment inferrment on that. Reviewers: jdoerfert, gchatelet, courbet, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, arphaman, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73131	2020-01-22 01:32:46 +03:00
Florian Hahn	f42994f228	[Matrix] Hide and describe matrix-propagate-shape option.	2020-01-21 14:28:47 -08:00
Matt Arsenault	9a5a6e9465	AMDGPU/GlobalISel: Merge G_PTR_ADD/G_PTR_MASK rules	2020-01-21 16:57:01 -05:00
Matt Arsenault	fd109308a7	AMDGPU/GlobalISel: Legalize G_PTR_ADD for arbitrary pointers Pointers of unrecognized address spaces shoudl be treated as global-like pointers. Even if loads and stores of them aren't handled, dumb operations that just operate on the bits should work.	2020-01-21 16:35:36 -05:00
Quentin Colombet	ff1f3cc1a1	[GISelKnownBits] Make the max depth a parameter of the analysis Allow users of that analysis to define the cut off depth of the analysis instead of hardcoding 6. NFC as the default parameter is 6.	2020-01-21 11:35:31 -08:00
Thomas Lively	28857d14a8	[WebAssembly] Split and recombine multivalue calls for ISel Summary: Multivalue calls both take and return an arbitrary number of arguments, but ISel only supports one or the other in a single instruction. To get around this, calls are modeled as two pseudo instructions during ISel. These pseudo instructions, CALL_PARAMS and CALL_RESULTS, are recombined into a single CALL MachineInstr in a custom emit hook. RegStackification and the MC layer will additionally need to be made aware of multivalue calls before the tests will produce correct output. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71496	2020-01-21 11:31:33 -08:00
Thomas Lively	3ef169e586	[WebAssembly][InstrEmitter] Foundation for multivalue call lowering Summary: WebAssembly is unique among upstream targets in that it does not at any point use physical registers to store values. Instead, it uses virtual registers to model positions in its value stack. This means that some target-independent lowering activities that would use physical registers need to use virtual registers instead for WebAssembly and similar downstream targets. This CL generalizes the existing `usesPhysRegsForPEI` lowering hook to `usesPhysRegsForValues` in preparation for using it in more places. One such place is in InstrEmitter for instructions that have variadic defs. On register machines, it only makes sense for these defs to be physical registers, but for WebAssembly they must be virtual registers like any other values. This CL changes InstrEmitter to check the new target lowering hook to determine whether variadic defs should be physical or virtual registers. These changes are necessary to support a generalized CALL instruction for WebAssembly that is capable of returning an arbitrary number of arguments. Fully implementing that instruction will require additional changes that are described in comments here but left for a follow up commit. Reviewers: aheejin, dschuff, qcolombet Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71484	2020-01-21 11:13:46 -08:00
Ehud Katz	0b336b6048	[APFloat] Add support for operations on Signaling NaN Fix PR30781 Differential Revision: https://reviews.llvm.org/D69774	2020-01-21 21:02:00 +02:00
Ehud Katz	68122b5826	[APFloat] Extend conversion from special strings Add support for converting Signaling NaN, and a NaN Payload from string. The NaNs (the string "nan" or "NaN") may be prefixed with 's' or 'S' for defining a Signaling NaN. A payload for a NaN can be specified as a suffix. It may be a octal/decimal/hexadecimal number in parentheses or without. Differential Revision: https://reviews.llvm.org/D69773	2020-01-21 20:22:27 +02:00
Fangrui Song	8e1f0974c2	[PowerPC] Delete PPCSubtarget::isDarwin and isDarwinABI http://lists.llvm.org/pipermail/llvm-dev/2018-August/125614.html developers have agreed to remove Darwin support from POWER backends. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D72067	2020-01-21 09:54:44 -08:00
Jonas Devlieghere	72b8bad150	[lldb/Hexagon] Include <mutex> Fixes compiler error on macOS: error: no type named 'mutex' in namespace 'std'.	2020-01-21 09:51:30 -08:00
Fangrui Song	7a8b0b1595	[StackColoring] Remap PseudoSourceValue frame indices via MachineFunction::getPSVManager() Reviewed By: dantrushin Differential Revision: https://reviews.llvm.org/D73063	2020-01-21 09:46:27 -08:00
Krzysztof Parzyszek	305bf5b21d	[Hexagon] Add support for Hexagon v67t microarchitecture (tiny core)	2020-01-21 11:35:10 -06:00
Krzysztof Parzyszek	020041d99b	Update spelling of {analyze,insert,remove}Branch in strings and comments These names have been changed from CamelCase to camelCase, but there were many places (comments mostly) that still used the old names. This change is NFC.	2020-01-21 10:15:38 -06:00
Zakk Chen	1256d68093	[RISCV] Check the target-abi module flag matches the option Reviewers: lenary, asb Reviewed By: lenary Tags: #llvm Differential Revision: https://reviews.llvm.org/D72768	2020-01-21 07:32:12 -08:00
Simon Pilgrim	f04284cf1d	[TargetLowering] SimplifyDemandedBits ISD::SRA multi-use handling Call SimplifyMultipleUseDemandedBits to peek through extended source args with multiple uses	2020-01-21 15:12:07 +00:00
Benjamin Kramer	81f385b0c6	Make dropTriviallyDeadConstantArrays not quadratic Only look at the operands of dead constant arrays instead of all constant arrays again.	2020-01-21 16:06:46 +01:00
Jinsong Ji	d7032bc3c0	[PowerPC][NFC] Reclaim TSFlags bit 6 We removed UseVSXReg flag in https://reviews.llvm.org/D58685 But we did not reclain the bit 6 it was assigned, this will become confusing and a hole later.. We should reclaim it as early as possible before new bits. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D72649	2020-01-21 15:04:05 +00:00
Simon Pilgrim	47f99d2ca8	[SelectionDAG] GetDemandedBits - remove ANY_EXTEND handling Rely on SimplifyMultipleUseDemandedBits fallback instead.	2020-01-21 14:39:00 +00:00
Simon Pilgrim	b065902ed4	[X86] combineBT - use SimplifyDemandedBits instead of GetDemandedBits Another step towards removing SelectionDAG::GetDemandedBits entirely	2020-01-21 14:24:46 +00:00
Simon Pilgrim	651fa669a2	[TargetLowering] SimplifyDemandedBits ANY_EXTEND/ANY_EXTEND_VECTOR_INREG multi-use handling Call SimplifyMultipleUseDemandedBits to peek through extended source args with multiple uses	2020-01-21 14:07:19 +00:00
Guillaume Chatelet	139771f8b0	[Alignment][NFC] Use Align with CreateElementUnorderedAtomicMemMove Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73050	2020-01-21 14:16:50 +01:00
Guillaume Chatelet	bc8a1ab26f	[Alignment][NFC] Use Align with CreateMaskedLoad Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73087	2020-01-21 14:13:22 +01:00
Simon Pilgrim	5f5f478564	[DAG] Fold extract_vector_elt (scalar_to_vector), K to undef (K != 0) This was unconditionally folding this to the source operand, even if the access was out of bounds. Use undef instead of the extract is not the first element. This helps with some cases where 3-vectors are legalized and avoids processing the 4th component. Original Patch by: arsenm (Matt Arsenault) Differential Revision: https://reviews.llvm.org/D51589	2020-01-21 10:58:30 +00:00
Simon Pilgrim	8d2e6bdbe1	[TargetLowering] SimplifyDemandedBits - Pull out InDemandedMask variable to ISD::SHL. NFCI. Matches ISD::SRA + ISD::SRL variants.	2020-01-21 10:40:18 +00:00
Anna Welker	ff9877ce34	[ARM][MVE] Enable masked scatter Extends the gather/scatter pass in MVEGatherScatterLowering.cpp to enable the transformation of masked scatters into calls to MVE's masked scatter intrinsic. Differential Revision: https://reviews.llvm.org/D72856	2020-01-21 09:46:26 +00:00
Nicolai Hähnle	a80291ce10	Revert "[AMDGPU] Invert the handling of skip insertion." This reverts commit `0dc6c249bf`. The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa.	2020-01-21 09:17:25 +01:00
Fangrui Song	02c1321139	[MC] Improve a report_fatal_error	2020-01-20 23:13:18 -08:00
Fangrui Song	5721483b64	[AMDGPU] Fix -Wunused-variable after `e5823bf806`	2020-01-20 22:41:13 -08:00
Matt Arsenault	c72aa27f91	AMDDGPU/GlobalISel: Fix RegBankSelect for llvm.amdgcn.ps.live	2020-01-20 23:21:53 -05:00
Matt Arsenault	e5823bf806	AMDGPU: Don't create weird sized integers There's no reason to introduce a new, unnaturally sized value here. This has a chance to produce worse code with legalization. Avoids regression in a future patch.	2020-01-20 20:02:54 -05:00
Fangrui Song	d232c21566	[AsmPrinter] Don't emit __patchable_function_entries entry if "patchable-function-entry"="0" Add improve tests	2020-01-20 16:13:48 -08:00
Matt Arsenault	9b13b4a0e3	AMDGPU: Prepare to use scalar register indexing Define pseudos mirroring the the VGPR indexing ones, and adjust the operands in the s_movrel* instructions to avoid the result def.	2020-01-20 17:19:16 -05:00
Matt Arsenault	8615eeb455	AMDGPU: Partially merge indirect register write handling `a785209bc2` switched to using a pseudos instead of manually tying operands on the regular instruction. The VGPR indexing mode path should have the same problems that change attempted to avoid, so these should use the same strategy. Use a single pseudo for the VGPR indexing mode and movreld paths, and expand it based on the subtarget later. These have essentially the same constraints, reading the index from m0. Switch from using an offset to the subregister index directly, instead of computing an offset and re-adding it back. Also add missing pseudos for existing register class sizes.	2020-01-20 17:19:16 -05:00
Krzysztof Parzyszek	c12a5917d2	[Hexagon] Add support for Hexagon/HVX v67 ISA	2020-01-20 16:16:49 -06:00
Mircea Trofin	2e42cc7a50	[NFC] small rename of private member in InlineCost.cpp Summary: Follow-up from https://reviews.llvm.org/D71733. Also moved an initialization to the base class, where it belonged in the first place. Reviewers: eraman, davidxl Reviewed By: davidxl Subscribers: hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72949	2020-01-20 13:03:15 -08:00
Matt Arsenault	f6418d72f5	AMDGPU/GlobalISel: Add documentation for RegisterBankInfo Document some high level strategies that should be used for register bank selection. The constant bus restriction section hasn't actually been implemented yet.	2020-01-20 15:41:25 -05:00
Simon Pilgrim	9c06c10fba	[SelectionDAG] GetDemandedBits - fallback to SimplifyMultipleUseDemandedBits by default. First step towards removing SelectionDAG::GetDemandedBits entirely since it so similar to SimplifyMultipleUseDemandedBits anyhow.	2020-01-20 16:51:52 +00:00
Sid Manning	7fee4fed4c	Add support for Linux/Musl ABI Differential revision: https://reviews.llvm.org/D72701 The patch adds a new option ABI for Hexagon. It primary deals with the way variable arguments are passed and is use in the Hexagon Linux Musl environment. If a callee function has a variable argument list, it must perform the following operations to set up its function prologue: 1. Determine the number of registers which could have been used for passing unnamed arguments. This can be calculated by counting the number of registers used for passing named arguments. For example, if the callee function is as follows: int foo(int a, ...){ ... } ... then register R0 is used to access the argument ' a '. The registers available for passing unnamed arguments are R1, R2, R3, R4, and R5. 2. Determine the number and size of the named arguments on the stack. 3. If the callee has named arguments on the stack, it should copy all of these arguments to a location below the current position on the stack, and the difference should be the size of the register-saved area plus padding (if any is necessary). The register-saved area constitutes all the registers that could have been used to pass unnamed arguments. If the number of registers forming the register-saved area is odd, it requires 4 bytes of padding; if the number is even, no padding is required. This is done to ensure an 8-byte alignment on the stack. For example, if the callee is as follows: int foo(int a, ...){ ... } ... then the named arguments should be copied to the following location: current_position - 5 (for R1-R5) * 4 (bytes) - 4 (bytes of padding) If the callee is as follows: int foo(int a, int b, ...){ ... } ... then the named arguments should be copied to the following location: current_position - 4 (for R2-R5) * 4 (bytes) - 0 (bytes of padding) 4. After any named arguments have been copied, copy all the registers that could have been used to pass unnamed arguments on the stack. If the number of registers is odd, leave 4 bytes of padding and then start copying them on the stack; if the number is even, no padding is required. This constitutes the register-saved area. If padding is required, ensure that the start location of padding is 8-byte aligned. If no padding is required, ensure that the start location of the on-stack copy of the first register which might have a variable argument is 8-byte aligned. 5. Decrement the stack pointer by the size of register saved area plus the padding. For example, if the callee is as follows: int foo(int a, ...){ ... } ; ... then the decrement value should be the following: 5 (for R1-R5) * 4 (bytes) + 4 (bytes of padding) = 24 bytes The decrement should be performed before the allocframe instruction. Increment the stack-pointer back by the same amount before returning from the function.	2020-01-20 09:59:56 -06:00
Sanjay Patel	7bee94410c	[InstCombine] form copysign from select of FP constants (PR44153) This should be the last step needed to solve the problem in the description of PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153 If we're casting an FP value to int, testing its signbit, and then choosing between a value and its negated value, that's a complicated way of saying "copysign": (bitcast X) < 0 ? -TC : TC --> copysign(TC, X) Differential Revision: https://reviews.llvm.org/D72643	2020-01-20 10:51:14 -05:00
Guillaume Chatelet	46b9563cf6	[Alignment][NFC] Use Align with CreateElementUnorderedAtomicMemCpy Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, nicolasvasilache Subscribers: hiraditya, jfb, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73041	2020-01-20 15:39:45 +01:00
Mark Murray	b10a0eb04a	[ARM][MVE][Intrinsics] Take abs() of VMINNMAQ, VMAXNMAQ intrinsics' first arguments. Summary: Fix VMINNMAQ, VMAXNMAQ intrinsics; BOTH arguments have the absolute values taken. Reviewers: dmgreen, simon_tatham Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72830	2020-01-20 14:33:26 +00:00
Sanjay Patel	da9c93f330	[InstSimplify] fold select of vector constants that include undef elements As mentioned in D72643, we'd like to be able to assert that any select of equivalent constants has been removed before we're deep into InstCombine. But there's a loophole in that assertion for vectors with undef elements that don't match exactly. This patch should close that gap. If we have undefs, we can't safely propagate those unless both constants elements for that lane are undef. Differential Revision: https://reviews.llvm.org/D72958	2020-01-20 08:48:32 -05:00
dfukalov	de34b54edc	[SCEV] Swap guards estimation sequence. NFC Summary: Loop unroll spends a lot of time in SCEVs processing in case when a function contains hundreds of simple 'for' loops with a quite complex arrays indexes like for (int i = 0; i < 8; ++i) { for (int j = 0; j < 32; ++j) { C[j8+i] = B[j32+i+128] + A[i64+128]; } } for (int i = 0; i < 8; ++i) { for (int j = 0; j < 8; ++j) { for (int k = 0; k < 32; ++k) { D[k64+i8+j] = D[k64+i8+j] + E[i+16] C[k*8+j+256]; } } } The patch improves loop unroll speed since isLoopBackedgeGuardedByCond takes much less time than isLoopEntryGuardedByCond in the edge case. Reviewers: skatkov, sanjoy, mkazantsev Reviewed By: sanjoy Subscribers: fhahn, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72929	2020-01-20 16:41:16 +03:00
Simon Tatham	f3e73e88fd	[ARM,MVE] Fix confusing MC names for MVE VMINA/VMAXA insns. Summary: A recent commit accidentally defined names like `MVE_VMAXAs8` as instances of the multiclass `MVE_VMINA`, and vice versa. This has no effect on the test suite, because nothing directly refers to those instruction names (the isel patterns are generated in Tablegen using `!cast<Instruction>(NAME)` inside a lower-level multiclass). But it means that `llvm-mc -show-inst` was listing VMAXA as VMINA, and it would also affect any further draft code gen patches that use those instruction ids. Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73034	2020-01-20 13:25:52 +00:00
Andrzej Warzynski	7e717b3990	[AArch64][SVE] Extend int_aarch64_sve_ld1_gather_imm The ACLE distinguishes between the following addressing modes for gather loads: * "scalar base, vector offset", and * "vector base, scalar offset". For the "vector base, scalar offset" case, the `int_aarch64_sve_ld1_gather_imm` intrinsic was added in `79f2422d`. Currently, that intrinsic assumes that the scalar offset is passed as an immediate. As a result, it does not cater for cases where scalar offset is stored in a register. In this patch `int_aarch64_sve_ld1_gather_imm` is extended so that all cases are covered: * `int_aarch64_sve_ld1_gather_imm` is renamed as `int_aarch64_sve_ld1_gather_scalar_offset` * new DAG combine rules are added for GLD1_IMM for scenarios where the offset is a non-immediate scalar or an out-of-range immediate * sve-intrinsics-gather-loads-vector-base.ll is renamed as sve-intrinsics-gather-loads-vector-base-imm-offset.ll * sve-intrinsics-gather-loads-vector-base-scalar-offset.ll is added to test file for non-immediate offsets Similar changes are made for scatter store intrinsics. Reviewed By: sdesmalen, efriedma Differential Revision: https://reviews.llvm.org/D71773	2020-01-20 12:19:18 +00:00
Evgeniy Brevnov	af7e158872	[LV] Vectorizer should adjust trip count in profile information Summary: Vectorized loop processes VFxUF number of elements in one iteration thus total number of iterations decreases proportionally. In addition epilog loop may not have more than VFxUF - 1 iterations. This patch updates profile information accordingly. Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov Reviewed By: Ayal, DaniilSuchkov Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67905	2020-01-20 18:36:28 +07:00
Simon Pilgrim	eaa4548459	[X86][SSE] Add PACKSS SimplifyMultipleUseDemandedBits 'sign bit' handling. Attempt to use SimplifyMultipleUseDemandedBits to simplify PACKSS if we're only after the sign bit.	2020-01-20 10:48:54 +00:00
Sjoerd Meijer	8cba99e2aa	[ARM][MVE] Tail-Predication: rematerialise iteration count in exit blocks This patch uses helper function rewriteLoopExitValues that is refactored in D72602 to rematerialise the iteration count in exit blocks, so that we can clean-up loop update expressions inside the hardware-loops later in ARMLowOverheadLoops, which is necessary to get actual performance gains for tail-predicated loops. Differential Revision: https://reviews.llvm.org/D72714	2020-01-20 10:26:36 +00:00

1 2 3 4 5 ...

130312 Commits