llvm-project

Commit Graph

Author	SHA1	Message	Date
Kai Luo	fec7da8285	[PowerPC][Peephole] Check if `extsw`'s second operand is a virtual register Summary: When combining `extsw` and `sldi` in `PPCMIPeephole`, we have to check if `extsw`'s second operand is a virtual register, otherwise we might get miscompile. Differential Revision: https://reviews.llvm.org/D65315 llvm-svn: 367645	2019-08-02 03:14:17 +00:00
Sanjay Patel	8560ea5534	[AArch64][x86] adjust tests with shift-add-shift; NFC Prevent folding away the math completely. llvm-svn: 367612	2019-08-01 21:08:08 +00:00
Sanjay Patel	cb3140b7bf	[AArch64][x86] add tests for shift-add-shift; NFC (PR42644) llvm-svn: 367607	2019-08-01 20:32:27 +00:00
Matt Arsenault	d9d30a408e	GlobalISel: Lower scalarizing unmerge of a vector to shifts AMDGPU sometimes has legal s16 and <2 x s16> operations, but all registers are really 32-bit. An unmerge destination really should be widened to a 32-bit register. If widening a scalarizing vector with a target size that matches the vector size, bitcast to integer and extract the relevant bits with shifts. I'm not sure if this is the right place for this. This could arguably be part of widenScalar for the result. I also have a growing feeling that we're missing a bitcast legalize action. llvm-svn: 367604	2019-08-01 19:10:05 +00:00
Craig Topper	a9ed5436bd	[X86] In decomposeMulByConstant, legalize the VT before querying whether the multiply is legal If a type is larger than a legal type and needs to be split, we would previously allow the multiply to be decomposed even if the split multiply is legal. Since the shift + add/sub code would also need to be split, it's not any better to decompose it. This patch figures out what type the mul will eventually be legalized to and then uses that type for the query. I tried just returning false for illegal types and letting them get handled after type legalization, but then we can't recognize an i64 constant splat on 32-bit targets since it will be destroyed by type legalization. We could special case vectors of i64 to avoid that... Differential Revision: https://reviews.llvm.org/D65533 llvm-svn: 367601	2019-08-01 18:49:07 +00:00
Craig Topper	005cc42316	[X86] Add some test cases for 512-bit truncate to 128-bits with min-legal-vector-width=0 and prefer-vector-width=256. We currently split the 512 type, truncate each half to 128 bits, concatenate them, and then truncate again. Probably better to truncate each half to 64-bits and then concat the results using vpunpcklqdq. llvm-svn: 367600	2019-08-01 18:48:57 +00:00
Matt Arsenault	bb582ebdba	AMDGPU: Remove v0 workaround for DS_GWS_* instructions Any register should work for the src field since r366067, since the used value is not pulled from the expected encoding field. llvm-svn: 367598	2019-08-01 18:41:32 +00:00
Matt Arsenault	5faa533e47	GlobalISel: Fix widenScalar for G_MERGE_VALUES to pointer AMDGPU testcase isn't broken now, but will be in a future patch without this. llvm-svn: 367591	2019-08-01 18:13:16 +00:00
Wouter van Oortmerssen	87af0b1911	[WebAssembly] Assembler/InstPrinter: support call_indirect type index. A TYPE_INDEX operand (as used by call_indirect) used to be represented by the InstPrinter as a symbol (e.g. .Ltype_index0@TYPE_INDEX) which was a bit of a mismatch with the WasmObjectWriter which expects an unnamed symbol, to receive the signature from and then turn into a reloc. There was really no good way to round-trip this information. An earlier version of this patch tried to attach the signature information using a .functype, but that ran into trouble when the symbol was re-emitted without a name. Removing the name was a giant hack also. The current version changes the assembly syntax to have an inline signature spec for TYPEINDEX operands that is always unnamed, which is much more elegant both in syntax and in implementation (as now the assembler is able to follow the same path as the regular backend) Reviewers: sbc100, dschuff, aheejin, jgravelle-google, sunfish, tlively Subscribers: arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64758 llvm-svn: 367590	2019-08-01 18:08:26 +00:00
Simon Pilgrim	1d183b407a	[TargetLowering] SimplifyMultipleUseDemandedBits - Add ISD::INSERT_VECTOR_ELT handling Allow us to peek through vector insertions to avoid dependencies on entire insertion chains. llvm-svn: 367588	2019-08-01 17:46:44 +00:00
Simon Pilgrim	63d4114f72	[X86][SSE] Add PEXTR(PINSR(v, s, c), c) -> s combine. We should probably extend this to cover bitcasts as well to help other cases in promote-vec3.ll. llvm-svn: 367582	2019-08-01 16:38:39 +00:00
Simon Pilgrim	33f5f863b5	[X86][SSE] SimplifyMultipleUseDemandedBits - Add PEXTR/PINSR B+W handling This adds SimplifyMultipleUseDemandedBitsForTargetNode X86 support and uses it to allow us to peek through vector insertions to avoid dependencies on entire insertion chains. llvm-svn: 367570	2019-08-01 14:46:03 +00:00
Simon Pilgrim	f99f9881e3	[X86] EltsFromConsecutiveLoads - don't attempt to merge volatile loads (PR42846) llvm-svn: 367556	2019-08-01 13:13:18 +00:00
David Green	1343814fb4	[ARM] Fix for MVE VREV64 The VREV64 instruction is apparently unpredictable if Qd == Qm, due to the cross-beat nature of the instruction. This adds an earlyclobber to Qd, which seems to be the same way we deal with this on other instructions like the write-back on loads and stores. Differential Revision: https://reviews.llvm.org/D65502 llvm-svn: 367544	2019-08-01 11:22:03 +00:00
Simon Pilgrim	7d766c393e	[ARM] Regenerate BSWAP16 tests llvm-svn: 367543	2019-08-01 11:12:10 +00:00
Sander de Smalen	7ebccfefb8	[AArch64] Do not allocate unnecessary emergency slot. Fix an issue where the compiler still allocates an emergency spill slot even though it already decided to spill an extra callee-save register to use as a scratch register. Reviewers: gberry, thegameg, mstorsjo, t.p.northover Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D65504 llvm-svn: 367540	2019-08-01 10:53:45 +00:00
Petar Avramovic	8a40cedfe6	[MIPS GlobalISel] Fold load/store + G_GEP + G_CONSTANT Fold load/store + G_GEP + G_CONSTANT when immediate in G_CONSTANT fits into 16 bit signed integer. Differential Revision: https://reviews.llvm.org/D65507 llvm-svn: 367535	2019-08-01 09:40:13 +00:00
David Zarzycki	4f1d893f9e	[Testing] Fix tests that break with read-only checkouts Found with `mount --bind -o ro ...` on Linux. llvm-svn: 367519	2019-08-01 06:41:40 +00:00
Zi Xuan Wu	66c320908b	recommit:[PowerPC] Eliminate loads/swap feeding swap/store for vector type by using big-endian load/store In PowerPC, there is an instruction to load a vector in big endian element order when it's on a little endian target. So we can combine vector load + reverse into big endian load to eliminate the swap instruction. Also combine vector reverse + store into big endian store. Differential Revision: https://reviews.llvm.org/D65063 llvm-svn: 367516	2019-08-01 05:26:02 +00:00
Fangrui Song	67a8d6c795	AMDGPU/GlobalISel: fix inst-select-load-local.mir in -DLLVM_ENABLE_ASSERTIONS=off builds after r367498 llvm-svn: 367514	2019-08-01 04:03:06 +00:00
Matt Arsenault	9952f46407	AMDGPU/GlobalISel: Fix flat load/store of pointer types llvm-svn: 367513	2019-08-01 03:57:42 +00:00
Matt Arsenault	57495268ac	AMDGPU/GlobalISel: Remove manual store select code This regresses the weird types that are newly treated as legal load types, but fixes incorrectly using flat instructions on SI. llvm-svn: 367512	2019-08-01 03:52:40 +00:00
Matt Arsenault	ae87b9f2c2	AMDGPU/GlobalISel: Select local atomic cmpxchg llvm-svn: 367511	2019-08-01 03:41:41 +00:00
Matt Arsenault	26cb53b260	AMDGPU/GlobalISel: Handle G_ATOMICRMW_FADD llvm-svn: 367509	2019-08-01 03:33:15 +00:00
Matt Arsenault	da5b9bfa95	AMDGPU/GlobalISel: Allow selection of DS atomicrmw llvm-svn: 367507	2019-08-01 03:29:01 +00:00
Matt Arsenault	3baf4d3418	AMDGPU/GlobalISel: Select simple local stores llvm-svn: 367504	2019-08-01 03:09:15 +00:00
Matt Arsenault	7bedceb5b2	GlobalISel: moreElementsVector for G_LOAD/G_STORE AMDGPU change and test is a placeholder until a future patch with complete handling. llvm-svn: 367503	2019-08-01 01:44:22 +00:00
Peter Collingbourne	fbc563e2cb	Create unique, but identically-named ELF sections for explicitly-sectioned functions and globals when using -function-sections and -data-sections. This allows functions and globals to be reordered later in the linking phase (using the -symbol-ordering-file) even though reordering will be limited to the scope of the explicit section. Patch by Rahman Lavaee! Differential Revision: https://reviews.llvm.org/D65478 llvm-svn: 367501	2019-08-01 01:38:53 +00:00
Matt Arsenault	d48324ff6f	Reapply "AMDGPU: Split block for si_end_cf" This reverts commit r359363, reapplying r357634 llvm-svn: 367500	2019-08-01 01:25:27 +00:00
Matt Arsenault	3594011de0	AMDGPU/GlobalISel: Select local loads llvm-svn: 367498	2019-08-01 00:53:38 +00:00
Amy Huang	153f20057c	Revert "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG" and partial fix. Causes windows buildbot errors. This reverts commit 6e65c34523963094acd0d6c94a5f5c64b32fe6aa and `53da7ca943`. llvm-svn: 367496	2019-07-31 23:59:31 +00:00
Eli Friedman	89b80f1239	[ARM] Lower "(x<<c) > 0x80000000U" to "lsls" on Thumb1. This is extremely specific, but saves three instructions when it's legal. I don't think the code can be usefully generalized. Differential Revision: https://reviews.llvm.org/D65351 llvm-svn: 367492	2019-07-31 23:19:21 +00:00
Eli Friedman	2f45ec1c39	[ARM] Transform compare of masked value to shift on Thumb1. Thumb1 has very limited immediate modes, so turning an "and" into a shift can save multiple instructions. It's possible to simplify the generated code for test2 and test3 in cmp-and-fold.ll a little more, but I'll implement that as a followup. Differential Revision: https://reviews.llvm.org/D65175 llvm-svn: 367491	2019-07-31 23:17:34 +00:00
Craig Topper	b70026c43c	[ScalarizeMaskedMemIntrin] Bitcast the mask to the scalar domain and use scalar bit tests for the branches. X86 at least is able to use movmsk or kmov to move the mask to the scalar domain. Then we can just use test instructions to test individual bits. This is more efficient than extracting each mask element individually. I special cased v1i1 to use the previous behavior. This avoids poor type legalization of bitcast of v1i1 to i1. I've skipped expandload/compressstore as I think we need to handle constant masks for those better first. Many tests end up with duplicate test instructions due to tail duplication in the branch folding pass. But the same thing happens when constructing similar code in C. So it's not unique to the scalarization. Not sure if this lowering code will also be good for other targets, but we're only testing X86 today. Differential Revision: https://reviews.llvm.org/D65319 llvm-svn: 367489	2019-07-31 22:58:15 +00:00
Craig Topper	b51dc64063	[X86] Add DAG combine to fold any_extend_vector_inreg+truncstore to an extractelement+store We have custom code that ignores the normal promoting type legalization on less than 128-bit vector types like v4i8 to emit pavgb, paddusb, psubusb since we don't have the equivalent instruction on a larger element type like v4i32. If this operation appears before a store, we can be left with an any_extend_vector_inreg followed by a truncstore after type legalization. When truncstore isn't legal, this will normally be decomposed into shuffles and a non-truncating store. This will then combine away the any_extend_vector_inreg and shuffle leaving just the store. On avx512, truncstore is legal so we don't decompose it and we had no combines to fix it. This patch adds a new DAG combine to detect this case and emit either an extract_store for 64-bit stores or an extractelement+store for 32 and 16 bit stores. This makes the avx512 codegen match the avx2 codegen for these situations. I'm restricting to only when -x86-experimental-vector-widening-legalization is false. When we're widening we're not likely to create this any_extend_inreg+truncstore combination. This means we should be able to remove this code when we flip the default. I would like to flip the default soon, but I need to investigate some performance regressions it's causing in our branch that I wasn't seeing on trunk. Differential Revision: https://reviews.llvm.org/D65538 llvm-svn: 367488	2019-07-31 22:43:08 +00:00
Michael Berg	005d705d43	Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize NoSignedZerosFPMath options control Summary: Honoring no signed zeroes is also available as a user control through clang separately regardless of fastmath or UnsafeFPMath context, DAG guards should reflect this context. Reviewers: spatel, arsenm, hfinkel, wristow, craig.topper Reviewed By: spatel Subscribers: rampitec, foad, nhaehnle, wuzish, nemanjai, jvesely, wdng, javed.absar, MaskRay, jsji Differential Revision: https://reviews.llvm.org/D65170 llvm-svn: 367486	2019-07-31 21:57:28 +00:00
Amy Huang	27a73dd02c	Fix to r367374 "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG" after windows buildbot failure. Added a check that the MachineInstr exists and is a call before trying to add symbols around it. llvm-svn: 367483	2019-07-31 21:03:38 +00:00
Peter Collingbourne	09f39967a2	AArch64: Add a tagged-globals backend feature. This feature instructs the backend to allow locally defined global variable addresses to contain a pointer tag in bits 56-63 that will be ignored by the hardware (i.e. TBI), but may be used by an instrumentation pass such as HWASAN. It works by adding a MOVK instruction to the regular ADRP/ADD sequence that sets bits 48-63 to the corresponding bits of the global, with the linker bounds check disabled on the ADRP instruction to prevent the tag from causing a link failure. This implementation of the feature omits the MOVK when loading from or storing to a global, which is sufficient for TBI. If the same approach is extended to MTE, assuming that 0 is not configured as a catch-all tag, we will most likely also need the MOVK in this case in order to avoid a tag mismatch. Differential Revision: https://reviews.llvm.org/D65364 llvm-svn: 367475	2019-07-31 20:14:19 +00:00
Craig Topper	d502f25373	[X86] Add test cases to show premature decomposition of vector multiplies into shift+add/sub for types that aren't legal and need to be split. NFC llvm-svn: 367466	2019-07-31 19:05:11 +00:00
Craig Topper	e3f0e67f2e	[X86] Add AVX512DQ command lines to vector-mul.ll to show that we use vpmullq instead of shift+add/sub for some cases. NFC llvm-svn: 367465	2019-07-31 19:05:03 +00:00
Simon Pilgrim	c4fa139a5c	[X86][SSE] Add test cases for PR42825 llvm-svn: 367435	2019-07-31 14:29:44 +00:00
Simon Pilgrim	24ad2b5e7d	[X86][AVX] Ensure chained subvector insertions are the same size (PR42833) Before combining insert_subvector(insert_subvector(vec, sub0, c0), sub1, c1) patterns, ensure that the subvectors are all the same type. On AVX512 targets especially we might have a mixture of 128/256 subvector insertions. llvm-svn: 367429	2019-07-31 12:55:39 +00:00
Momchil Velikov	a36d31478c	[AArch64] Add support for Transactional Memory Extension (TME) Re-commit r366322 after some fixes TME is a future architecture technology, documented in https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools https://developer.arm.com/docs/ddi0601/a More about the future architectures: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and TCANCEL and the target feature/arch extension "tme". It also implements TME builtin functions, defined in ACLE Q2 2019 (https://developer.arm.com/docs/101028/latest) Differential Revision: https://reviews.llvm.org/D64416 Patch by Javed Absar and Momchil Velikov llvm-svn: 367428	2019-07-31 12:52:17 +00:00
Simon Pilgrim	7cf5ef08b8	[X86] Regenerate lrshrink test checks to make D65354 diff easier llvm-svn: 367426	2019-07-31 12:30:24 +00:00
Simon Pilgrim	54a68f7c73	[X86] Regenerate callee-saved test checks to make D65354 diff easier llvm-svn: 367425	2019-07-31 12:29:07 +00:00
Simon Pilgrim	83d8d62399	[X86] Regenerate alias-static-alloca test checks to make D65354 diff easier I've manually added the stack offsets back as these are worth keeping - we really need a way for update_llc_test_checks.py not to mask out useful address math llvm-svn: 367424	2019-07-31 12:27:47 +00:00
Simon Pilgrim	f69cbb43ec	[X86] Regenerate vp2intersect tests Enable nounwind to remove unnecessary stack manipulation code llvm-svn: 367421	2019-07-31 12:17:10 +00:00
Simon Pilgrim	24e4e8087f	[X86][AVX] Add reduced test case for PR42833 llvm-svn: 367412	2019-07-31 11:35:01 +00:00
Oliver Cruickshank	09a1b8172b	[ARM] Generate MVE VFMAs llvm-svn: 367408	2019-07-31 10:44:11 +00:00
Sam Elliott	9e6b2e1605	[RISCV] Support 'f' Inline Assembly Constraint Summary: This adds the 'f' inline assembly constraint, as supported by GCC. An 'f'-constrained operand is passed in a floating point register. Exactly which kind of floating-point register (32-bit or 64-bit) is decided based on the operand type and the available standard extensions (-f and -d, respectively). This patch adds support in both the clang frontend, and LLVM itself. Reviewers: asb, lewis-revill Reviewed By: asb Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65500 llvm-svn: 367403	2019-07-31 09:45:55 +00:00
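
Several of the X86 commits above (a9ed5436bd, d502f25373, e3f0e67f2e) concern decomposing a multiply by a constant into shift + add/sub. As a rough scalar illustration of that idea only, the sketch below recognizes constants of the form 2^k + 1 or 2^k - 1 and rewrites the multiply accordingly. The names `ShiftAddSub`, `decomposeMul`, and `applyDecomposition` are hypothetical and are not LLVM APIs; the sketch does not model vector types or the type-legalization query those commits discuss.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical description of a decomposed multiply; not an LLVM type.
struct ShiftAddSub {
    unsigned shift;  // amount to shift x left by
    bool subtract;   // true: (x << shift) - x, false: (x << shift) + x
};

// Returns a decomposition when c == 2^k + 1 or c == 2^k - 1 with c > 2,
// otherwise std::nullopt (the multiply is left alone).
std::optional<ShiftAddSub> decomposeMul(uint64_t c) {
    auto isPow2 = [](uint64_t v) { return v != 0 && (v & (v - 1)) == 0; };
    auto floorLog2 = [](uint64_t v) {
        unsigned n = 0;
        while (v >>= 1) ++n;
        return n;
    };
    if (c <= 2) return std::nullopt;
    if (isPow2(c - 1)) return ShiftAddSub{floorLog2(c - 1), false};
    // c + 1 wraps to 0 for UINT64_MAX, which isPow2 rejects.
    if (isPow2(c + 1)) return ShiftAddSub{floorLog2(c + 1), true};
    return std::nullopt;
}

// Evaluates x * c using the shift + add/sub form (wrapping arithmetic).
uint64_t applyDecomposition(uint64_t x, ShiftAddSub d) {
    uint64_t shifted = x << d.shift;
    return d.subtract ? shifted - x : shifted + x;
}
```

For example, `decomposeMul(9)` describes `(x << 3) + x` and `decomposeMul(7)` describes `(x << 3) - x`, while a constant such as 6 is not decomposed by this sketch.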

30013 Commits