llvm-project

Commit Graph

Author	SHA1	Message	Date
Petar Avramovic	2b66d32f3f	[MIPS GlobalISel] Select count leading zeros llvm.ctlz.<type> intrinsic has additional i1 argument is_zero_undef, it tells whether zero as the first argument produces a defined result. MIPS clz instruction returns 32 for zero input. G_CTLZ is generated from llvm.ctlz.<type> (<type> <src>, i1 false) intrinsics, clang generates these intrinsics from __builtin_clz and __builtin_clzll. G_CTLZ_ZERO_UNDEF can also be generated from llvm.ctlz with true as second argument. It is also traditionally part of and many algorithms that are now predicated on avoiding zero-value inputs. Add narrow scalar for G_CTLZ (algorithm uses G_CTLZ_ZERO_UNDEF). Lower G_CTLZ_ZERO_UNDEF and select G_CTLZ for MIPS32. Differential Revision: https://reviews.llvm.org/D73214	2020-01-27 09:43:38 +01:00
Roman Lebedev	76fcf900d5	[X86][BdVer2] Polish LEA instruction scheduling info Based on exhaustive llvm-exegesis measurements. There may still be some imperfections for LEA16r/LEA32r. Much like was observed in D68646, i'm also measuring some outliers with some specific registers.	2020-01-26 22:17:27 +03:00
Roman Lebedev	31019dfdf5	[NFC][MCA] Re-autogenerate all check lines in all X86 MCA tests Some whitespace issues have crept in, and some znver2 check lines were missing..	2020-01-26 22:17:26 +03:00
Simon Pilgrim	f99ef5455a	[InstCombine] Add extra shift(c1,add(c2,y)) tests for PR15141	2020-01-26 19:04:12 +00:00
Simon Pilgrim	fa19d67a2a	[X86][AVX] Extend combineCommutableSHUFP to handle v8f32 and v16f32 commutable shufps patterns	2020-01-26 19:04:12 +00:00
Guillaume Chatelet	cc034a5883	[IR] masked gather/scatter alignment should be set Summary: masked_load and masked_store instructions require the alignment to be specified and a power of two. It seems to me that this requirement applies to masked_gather and masked_scatter as well. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73179	2020-01-26 18:51:36 +01:00
Simon Pilgrim	377e86d12e	[X86][AVX] Add tests showing combineCommutableSHUFP failure to handle v8f32 and v16f32 commutable shufps patterns	2020-01-26 14:36:24 +00:00
Simon Pilgrim	1a81b296cd	[X86][SSE] combineCommutableSHUFP - permilps(shufps(load(),x)) --> permilps(shufps(x,load())) Pull out combineTargetShuffle code added in rG3fd5d1c6e7db into a helper function and extend it to handle shufps(shufps(load(),x),y) and shufps(y,shufps(load(),x)) cases as well.	2020-01-26 14:36:23 +00:00
Simon Pilgrim	3daa71ee00	[SelectionDAG] ComputeNumSignBits - add DemandedElts support for MIN/MAX ops	2020-01-25 20:21:14 +00:00
Simon Pilgrim	481b79668c	[X86] Add tests showing ComputeNumSignBits's failure to use DemandedElts for MIN/MAX opcodes	2020-01-25 19:28:57 +00:00
Simon Pilgrim	3f8916b2e8	[SelectionDAG] ComputeNumSignBits - add support for rotate non-uniform vector amounts	2020-01-25 19:15:05 +00:00
Simon Pilgrim	e3c26a9d1b	[SelectionDAG] ComputeNumSignBits - add support for rotate uniform vector amounts	2020-01-25 18:55:47 +00:00
Simon Pilgrim	435a60a5af	[X86] Add tests showing ComputeNumSignBits's failure to see through rotate vector amounts	2020-01-25 18:24:51 +00:00
Simon Pilgrim	c8de7c8f50	[TargetLowering] SimplifyDemandedBits - Remove ashr if all our demandedbits already match the sign bit Differential Revision: https://reviews.llvm.org/D73412	2020-01-25 17:36:46 +00:00
Matt Arsenault	fe9765762c	AMDGPU: Generate test checks	2020-01-24 23:25:57 -05:00
Tom Stellard	86c944d790	AMDGPU/SILoadStoreOptimizer: Improve merging of out of order offsets Summary: This improves merging of sequences like: store a, ptr + 4 store b, ptr + 8 store c, ptr + 12 store d, ptr + 16 store e, ptr + 20 store f, ptr Prior to this patch the basic block was scanned in order to find instructions to merge and the above sequence would be transformed to: store4 <a, b, c, d>, ptr + 4 store e, ptr + 20 store r, ptr With this change, we now sort all the candidate merge instructions by their offset, so instructions are visited in offset order rather than in the order they appear in the basic block. We now transform this sequnce into: store4 <f, a, b, c>, ptr store2 <d, e>, ptr + 16 Another benefit of this change is that since we have sorted the mergeable lists by offset, we can easily check if an instruction is mergeable by checking the offset of the instruction that becomes before or after it in the sorted list. Once we determine an instruction is not mergeable we can remove it from the list and avoid having to do the more expensive mergeablilty checks. Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Reviewed By: arsenm, nhaehnle Subscribers: kerbowa, merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65966	2020-01-24 19:45:56 -08:00
Vedant Kumar	802bec8961	Revert "Reland: [DWARF] Allow cross-CU references of subprogram definitions" ... as well as: Revert "[DWARF] Defer creating declaration DIEs until we prepare call site info" This reverts commit `fa4701e197`. This reverts commit `79daafc903`. There have been reports of this assert getting hit: CalleeDIE && "Could not find DIE for call site entry origin	2020-01-24 18:07:54 -08:00
Evgenii Stepanov	1df8549b26	[msan] Instrument x86.pclmulqdq* intrinsics. Summary: These instructions ignore parts of the input vectors which makes the default MSan handling too strict and causes false positive reports. Reviewers: vitalybuka, RKSimon, thakis Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73374	2020-01-24 14:31:06 -08:00
Heejin Ahn	65eb11306e	[WebAssembly] Update bleeding-edge CPU features Summary: This adds bulk memory and tail call to "bleeding-edge" CPU, since their implementation in LLVM/clang seems mostly complete. Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D73322	2020-01-24 14:27:35 -08:00
Heejin Ahn	764f4089e8	[WebAssembly] Add reference types target feature Summary: This adds the reference types target feature. This does not enable any more functionality in LLVM/clang for now, but this is necessary to embed the info in the target features section, which is used by Binaryen and Emscripten. It turned out that after D69832 `-fwasm-exceptions` crashed because we didn't have the reference types target feature. Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73320	2020-01-24 14:26:27 -08:00
Craig Topper	d3bf06bc81	[DAGCombiner] Add combine for (not (strict_fsetcc)) to create a strict_fsetcc with the opposite condition. Unlike the existing code that I modified here, I only handle the case where the strict_fsetcc has a single use. Not sure exactly how to handle multiples uses. Testing this on X86 is hard because we already have a other combines that get rid of lowered version of the integer setcc that this xor will eventually become. So this combine really just saves a bunch of extra nodes being created. Not sure about other targets. Differential Revision: https://reviews.llvm.org/D71816	2020-01-24 14:15:36 -08:00
Craig Topper	b1f3a0f972	Revert `a107f86` "[GlobalsAA] Add back a check to intrinsic_addresstaken.ll to see if the AVX and AVX512 bots still fail for it." It still fails some buildbots which is what I was trying to test.	2020-01-24 13:15:23 -08:00
Matt Arsenault	3b93945587	AMDGPU/GlobalISel: Select wqm, softwqm and wwm intrinsics	2020-01-24 13:06:44 -08:00
Matt Arsenault	87c46a3129	AMDGPU: Don't error on ds.ordered intrinsic in function These should be assumed to be called from a compute context. Also don't use a 2 entry switch over constants.	2020-01-24 13:06:44 -08:00
Matt Arsenault	9c346464c1	TableGen/GlobalISel: Handle non-leaf EXTRACT_SUBREG This previously only handled EXTRACT_SUBREGs from leafs, such as operands directly in the original output. Handle extracting from a result instruction.	2020-01-24 12:15:10 -08:00
Matt Arsenault	4fdae24733	AMDGPU/GlobalISel: Add selection tests for G_ATOMICRMW_ADD	2020-01-24 12:15:09 -08:00
Craig Topper	a107f86417	[GlobalsAA] Add back a check to intrinsic_addresstaken.ll to see if the AVX and AVX512 bots still fail for it. These bots failed for this several months ago and as a result, this check was removed. If they still fail I'm going to try to see if I can figure out why.	2020-01-24 11:54:23 -08:00
Stanislav Mekhanoshin	555d8f4ef5	[AMDGPU] Bundle loads before post-RA scheduler We are relying on atrificial DAG edges inserted by the MemOpClusterMutation to keep loads and stores together in the post-RA scheduler. This does not work all the time since it allows to schedule a completely independent instruction in the middle of the cluster. Removed the DAG mutation and added pass to bundle already clustered instructions. These bundles are unpacked before the memory legalizer because it does not work with bundles but also because it allows to insert waitcounts in the middle of a store cluster. Removing artificial edges also allows a more relaxed scheduling. Differential Revision: https://reviews.llvm.org/D72737	2020-01-24 11:33:38 -08:00
Andy Kaylor	b35b7da460	[PGO] Attach appropriate funclet operand bundles to value profiling instrumentation calls Patch by Chris Chrulski When generating value profiling instrumentation, ensure the call gets the correct funclet token, otherwise WinEHPrepare will turn the call (and all subsequent instructions) into unreachable. Differential Revision: https://reviews.llvm.org/D73221	2020-01-24 11:20:53 -08:00
Stanislav Mekhanoshin	44b865fa7f	[AMDGPU] Allow narrowing muti-dword loads Currently BE allows only a little load narrowing because of the fear it will produce sub-dword ext loads. However, we can always allow narrowing if we are shrinking one multi-dword load to another multi-dword load. In particular we were unable to reduce s_load_dwordx8 into s_load_dwordx4 if identity shuffle was used to extract low 4 dwords. Differential Revision: https://reviews.llvm.org/D73133	2020-01-24 11:03:41 -08:00
Stanislav Mekhanoshin	7a94d4f4ee	Allow combining of extract_subvector to extract element Differential Revision: https://reviews.llvm.org/D73132	2020-01-24 10:50:26 -08:00
Austin Kerbow	c226646337	Resubmit: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI Summary: Enable the new diveregence analysis by default for AMDGPU. Resubmit with test updates since GPUDA was causing failures on Windows. Reviewers: rampitec, nhaehnle, arsenm, thakis Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73315	2020-01-24 10:39:40 -08:00
Austin Kerbow	37aa16ebb7	[DA] Don't propagate from unreachable blocks Summary: Fixes crash that could occur when a divergent terminator has an unreachable parent. Reviewers: rampitec, nhaehnle, arsenm Subscribers: jvesely, wdng, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73323	2020-01-24 10:28:11 -08:00
Andy Kaylor	a33accde95	[PGO] Early detection regarding whether pgo counter promotion is possible Patch by Chris Chrulski This fixes a problem with the current behavior when assertions are enabled. A loop that exits to a catchswitch instruction is skipped for the counter promotion, however this check was being done after the PGOCounterPromoter tried to collect an insertion point for the exit block. A call to getFirstInsertionPt() on a block that begins with a catchswitch instruction triggers an assertion. This change performs a check whether the counter promotion is possible prior to collecting the ExitBlocks and InsertPts. Differential Revision: https://reviews.llvm.org/D73222	2020-01-24 09:55:41 -08:00
Fangrui Song	50a3ff30e1	[PatchableFunction] Allow empty entry MachineBasicBlock Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D73301	2020-01-24 09:42:48 -08:00
David Green	b535aa405a	[ARM] Use reduction intrinsics for larger than legal reductions The codegen for splitting a llvm.vector.reduction intrinsic into parts will be better than the codegen for the generic reductions. This will only directly effect when vectorization factors are specified by the user. Also added tests to make sure the codegen for larger reductions is OK. Differential Revision: https://reviews.llvm.org/D72257	2020-01-24 17:07:24 +00:00
Fangrui Song	f1dab29908	[ELF][PowerPC] Support R_PPC_COPY and R_PPC64_COPY Reviewed By: Bdragon28, jhenderson, grimar, sfertile Differential Revision: https://reviews.llvm.org/D73255	2020-01-24 09:06:20 -08:00
Kazushi (Jam) Marukawa	0fca35c652	[VE] global variable isel patterns Summary: Asm expr fixups, isel patterns and tests for global variables addresses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73355	2020-01-24 17:35:14 +01:00
Tom Weaver	f5147765ba	[DebugInfo][LiveDebugValues] Teach Live Debug Values About Meta Instructions Previously LiveDebugValues pass would consider meta instructions that 'fiddle' with liveness of registers as register definitions when transfering register defs. This would mean that, for example, a KILL instruction would cause LiveDebugValues to terminate the range of an earlier DBG_VALUE instruction resulting in the none propogation of said DBG_VALUE instructions into later blocks. This patch adds the check and a helpful comment, fixes a test that previously tested for the broken behaviour by coincidence and adds a test specifically for this. reviewers: vsk, dstenb, djtodoro Differential Revision: https://reviews.llvm.org/D73210	2020-01-24 16:29:05 +00:00
Simon Pilgrim	3fd5d1c6e7	[X86][SSE] combineTargetShuffle - permilps(shufps(load(),x)) --> permilps(shufps(x,load())) Moves lowerShuffleWithSHUFPS commutation code from rG30fcd29fe479 to catch cases during combine	2020-01-24 15:23:20 +00:00
Sergey Dmitriev	f69eba0772	[llvm-objcopy][COFF] Add support for --set-section-flags Reviewers: jhenderson, MaskRay, alexshap, rupprecht, mstorsjo Reviewed By: jhenderson Subscribers: abrachet, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73107	2020-01-24 07:12:55 -08:00
Kazushi (Jam) Marukawa	08ebd8c79e	[VE] aligned load/store isel patterns Summary: Aligned load/store isel patterns and tests for i1/i8/16/32/64 (including extension and truncation) and fp32/64. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73276	2020-01-24 15:16:54 +01:00
Thomas Preud'homme	8e96697c7d	FileCheck [9/12]: Add support for matching formats Summary: This patch is part of a patch series to add support for FileCheck numeric expressions. This specific patch adds support for selecting a matching format to match a numeric value against (ie. decimal, hex lower case letters or hex upper case letters). This commit allows to select what format a numeric value should be matched against. The following formats are supported: decimal value, lower case hex value and upper case hex value. Matching formats impact both the format of numeric value to be matched as well as the format of accepted numbers in a definition with empty numeric expression constraint. Default for absence of format is decimal value unless the numeric expression constraint is non null and use a variable in which case the format is the one used to define that variable. Conclict of format in case of several variable being used is diagnosed and forces the user to select a matching format explicitely. This commit also enables immediates in numeric expressions to be in any radix known to StringRef's GetAsInteger method, except for legacy numeric expressions (ie [[@LINE+<offset>]] which only support decimal immediates. Copyright: - Linaro (changes up to diff 183612 of revision D55940) - GraphCore (changes in later versions of revision D55940 and in new revision created off D55940) Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson Reviewed By: jhenderson, arichardson Subscribers: daltenty, MaskRay, hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, kristina, hfinkel, rogfer01, JonChesterfield Tags: #llvm Differential Revision: https://reviews.llvm.org/D60389	2020-01-24 14:15:28 +00:00
Victor Huang	5cee34013c	[PowerPC][Future] Add prefixed instruction paddi to future CPU Future CPU will include support for prefixed instructions. These prefixed instructions are formed by a 4 byte prefix immediately followed by a 4 byte instruction effectively making an 8 byte instruction. The new instruction paddi is a prefixed form of addi. This patch adds paddi and all of the support required for that instruction. The majority of the patch deals with supporting the new prefixed instructions. The addition of paddi is mainly to allow for testing. Differential Revision: https://reviews.llvm.org/D72569	2020-01-24 07:27:25 -06:00
Simon Pilgrim	5e62e162cd	[X86][SSE] Add another shufps+shufps test for fold through commutation	2020-01-24 12:04:11 +00:00
Simon Pilgrim	30fcd29fe4	[X86][SSE] lowerShuffleWithSHUFPS - commute '2V1+2V2 elements' mask if it allows a loaded fold As mentioned on D73023.	2020-01-24 12:04:10 +00:00
Georgii Rymar	1af6209d64	[llvm-readelf] - Improve dumping of objects without a section header string table. We have a test/Object/no-section-header-string-table.test which checks what happens when an object does not have a section header string table. It does not check the full output though. Currently our output is different from GNU readelf, because the latter prints "<no-strings>" instead of a section name, while we print nothing. This patch fixes this, adds a proper test case and removes the one from test/Object, as it is not a right folder for llvm-readelf tests. Differential revision: https://reviews.llvm.org/D73193	2020-01-24 14:30:03 +03:00
Simon Pilgrim	e37cdbeeab	[X86][SSE] Add shufps+shufps test for fold through commutation As mentioned on D73023, lowerShuffleWithSHUFPS should be able to commute the shufps inputs to fold the second arg as it will then permute the shufps result anyway.	2020-01-24 11:16:44 +00:00
Sam Parker	0ae13766ff	[NFC][ARM] Add test	2020-01-24 11:00:18 +00:00
Kerry McLaughlin	4c4861b577	[AArch64][SVE] Add intrinsics for FFR manipulation Summary: Implements the following intrinsics: - llvm.aarch64.sve.setffr - llvm.aarch64.sve.rdffr - llvm.aarch64.sve.rdffr.z - llvm.aarch64.sve.wrffr Reviewers: sdesmalen, efriedma, dancgr, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73097	2020-01-24 10:58:12 +00:00

1 2 3 4 5 ...

68144 Commits