llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	0ac4f6b627	[ARM] MVE vector reduce MLA tests. NFC.	2020-02-17 11:54:04 +00:00
Kerry McLaughlin	633db60f3e	[AArch64][SVE] Add SVE index intrinsic Summary: Implements the @llvm.aarch64.sve.index intrinsic, which takes a scalar base and step value. This patch also adds the printSImm function to AArch64InstPrinter to ensure that immediates of type i8 & i16 are printed correctly. Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin Reviewed By: cameron.mcinally Subscribers: tatyana-krasnukha, tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74550	2020-02-17 10:30:11 +00:00
Sjoerd Meijer	a02056c960	[X86] New test to check rev16 patterns, prep step for D74032. NFC.	2020-02-17 09:13:21 +00:00
Kang Zhang	f4e920720d	[NFC][PowerPC] Update the test case scalar-equal.ll Modify the command option to add --enable-no-nans-fp-math	2020-02-17 08:34:56 +00:00
QingShan Zhang	113df90388	[PowerPC] Add the missing InstrAliasing for 64-bit rotate instructions We have the InstAlias rules for 32-bit rotate but missing the 64-bit one. Rotate left immediate rotlwi ra,rs,n rlwinm ra,rs,n,0,31 Rotate left rotlw ra,rs,rb rlwnm ra,rs,rb,0,31 Differential Revision: https://reviews.llvm.org/D72676	2020-02-17 05:42:49 +00:00
Kang Zhang	1ae05a3c66	[NFC][PowerPC] Add a new test case scalar-equal.ll	2020-02-17 05:27:36 +00:00
Craig Topper	dd0b18e1ec	[X86] Disable load folding for X86ISD::ADD with 128 as an immediate. It can be turned into a sub with -128 instead as long as the carry flag isn't used.	2020-02-16 20:52:51 -08:00
Matt Arsenault	295bbea3ed	AMDGPU/GlobalISel: Fix non-power-of-2 G_SITOFP/G_UITOFP This wouldn't work for s33-s63 sources.	2020-02-16 22:48:57 -05:00
Matt Arsenault	24c156194b	AMDGPU/GlobalISel: Add some missing tests for non-power-of-2 cases	2020-02-16 22:48:42 -05:00
Zheng Chen	04377a81ae	[Powerpc] Set instruction count as the first priority of LSR. On PowerPC, set instruction count as the first priority of LSR by default. Add an option ppc-lsr-no-insns-cost to return to the default LSR cost model. Reviewed By: steven.zhang, jsji Differential Revision: https://reviews.llvm.org/D72683	2020-02-16 21:04:55 -05:00
Simon Pilgrim	b85df2e185	[X86] combineX86ShuffleChain - add support for combining 512-bit shuffles to PALIGNR	2020-02-16 16:13:26 +00:00
Simon Pilgrim	c9c1c2b335	[X86] combineX86ShuffleChain - add support for combining 512-bit shuffles to bit shifts	2020-02-16 16:13:25 +00:00
Sanjay Patel	e48b536be6	[x86] form broadcast of scalar memop even with >1 use The unseen logic diff occurs because MayFoldLoad() is defined like this: static bool MayFoldLoad(SDValue Op) { return Op.hasOneUse() && ISD::isNormalLoad(Op.getNode()); } The test diffs here all seem ok to me on screen/paper, but it's hard to know if that will lead to universally better perf for all targets. For example, if a target implements broadcast from mem as multiple uops, we would have to weigh the potential reduction of instructions and register pressure vs. possible increase in number of uops. I don't know if we can make a truly informed decision on this at compile-time. The motivating case that I'm looking at in PR42024: https://bugs.llvm.org/show_bug.cgi?id=42024 ...resembles the diff in extract-concat.ll, but we're not going to change the larger example there without at least 1 other fix. Differential Revision: https://reviews.llvm.org/D74088	2020-02-16 10:32:56 -05:00
Simon Pilgrim	5d22b6a87f	[X86] Add test cases showing failure to simplify target shuffles to bit shifts	2020-02-15 23:34:31 +00:00
Simon Pilgrim	c1186d50f9	[X86][AVX512] Split AVX512F and AVX512BW shuffle combining tests Split off shuffle combine tests that use AVX512F intrinsics, so we can test it with/without AVX512BW support.	2020-02-15 22:48:52 +00:00
Fangrui Song	46788a21f9	[X86][AsmPrinter] PrintSymbolOperand: prefer to lower ELF MO_GlobalAddress to .Lfoo$local	2020-02-15 13:45:29 -08:00
Simon Pilgrim	34a054ce71	[X86] combineX86ShuffleChain - add support for combining to X86ISD::ROTLI Refactors matchShuffleAsBitRotate to allow use by both lowerShuffleAsBitRotate and matchUnaryPermuteShuffle.	2020-02-15 20:04:54 +00:00
Simon Pilgrim	4abbaceea0	[X86] Add test showing failure to combine shuffle to bit rotation	2020-02-15 19:23:00 +00:00
Craig Topper	3f7649799b	[X86] Move combineIncDecVector logic from Select to PreprocessISelDAG. This allows it to work properly with masked inc/dec for avx512. Those would have a vselect as the root node so didn't get a chance to call combineIncDecVector. This also simplifies the logic because we don't have to manage the topological ordering.	2020-02-15 09:59:12 -08:00
David Green	da147ef0a5	[AArch64] Fixup kill flags on BSL generation This hopefully fixes up the expensive checks bot.	2020-02-15 11:44:23 +00:00
Fangrui Song	6b14814e10	[AsmPrinter] Omit unique ID for .stack_sizes Follow-up for D74006.	2020-02-14 21:25:06 -08:00
Fangrui Song	895cad1a13	[AsmPrinter][XRay] Omit unique ID for xray_instr_map and xray_fn_idx Follow-up for D74006.	2020-02-14 21:10:46 -08:00
Diogo Sampaio	8bc790f9e6	[AArch64][FPenv] Update chain of int to fp conversion Summary: When using strict fp, it is required to update the chain when performing integer type promotion of an operand to an integer to floating point conversion. Reviewers: craig.topper, john.brawn Reviewed By: craig.topper Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74597	2020-02-15 05:07:34 +00:00
Fangrui Song	f554e27224	[AsmPrinter] Omit unique ID for __patchable_function_entries sections Follow-up for D74006. When the integrated assembler is used, we use SHF_LINK_ORDER. The linked-to symbol is part of ELFSectionKey, thus we can omit the unique ID.	2020-02-14 20:54:54 -08:00
Fangrui Song	0fbe221543	[MC][ELF] Make linked-to symbol name part of ELFSectionKey https://bugs.llvm.org/show_bug.cgi?id=44775 This rule has been implemented by GNU as https://sourceware.org/ml/binutils/2020-02/msg00028.html (binutils >= 2.35) It allows us to simplify ``` .section .foo,"o",foo,unique,0 .section .foo,"o",bar,unique,1 # different section ``` to ``` .section .foo,"o",foo .section .foo,"o",bar # different section ``` We consider the two `.foo` different even if the linked-to symbols foo and bar are defined in the same section. This is a deliberate choice so that we don't need to know the section where foo and bar are defined beforehand. Differential Revision: https://reviews.llvm.org/D74006	2020-02-14 20:03:04 -08:00
Matt Arsenault	8d8d46b57a	AMDGPU/GlobalISel: Fix missing impdef of scc on boolean bit ops	2020-02-14 22:35:30 -05:00
Shiva Chen	1cae2f9d19	[RISCV] Correct the CallPreservedMask for the function call in an interrupt handler CallPreservedMask is used to describe the register liveness after a function call. The function call in an interrupt handler should use the same CallPreservedMask as normal functions. So that only callee save registers can live through the function call.	2020-02-15 09:14:04 +08:00
Matt Arsenault	65dbdc329f	AMDGPU: Don't preserve analyses with div64 IR expansion The dominator tree needs to be updated, but that isn't handled now.	2020-02-14 20:06:02 -05:00
Matt Arsenault	dc3e499dd4	AMDGPU/GlobalISel: Fix G_EXTRACT of 96-bit results This would assert on an unhandled size in getRegSplitParts.	2020-02-14 15:57:40 -08:00
Matt Arsenault	630b47e518	AMDGPU: Use generated checks for memcpy expansion	2020-02-14 15:57:40 -08:00
Matt Arsenault	60fea2713d	AMDGPU/GlobalISel: Improve 16-bit bswap Match the new DAG behavior and use v_perm_b32 when available. Also does better on SI/CI by expanding 16-bit swaps. Also fix non-power-of-2 cases.	2020-02-14 15:57:39 -08:00
Matt Arsenault	9ec668606b	AMDGPU: Add option to disable CGP division expansion The division expansions in AMDGPUCodeGenPrepare can't be relied on for correctness, since they punt to later optimization and possibly legalization in some cases. We still need a way to be able to write tests for the legalizer versions of the expansion. This is mostly for GlobalISel, since the optimizations it expects aren't implemented. The interaction with the flag to expand 64-bit division in the IR is pretty confusing, but these flags have different purposes.	2020-02-14 11:37:07 -08:00
Sanjay Patel	63ed0eceaf	[x86] remove stray test assertions; NFC I updated the prefix and forgot to manually remove the old names as part of rG6071fc57a45.f	2020-02-14 14:28:50 -05:00
Sanjay Patel	6071fc57a4	[x86] regenerate complete test checks for sqrt{est}; NFC The existing checks were trying to test both CPU-specific codegen and generic codegen with explicit attributes for the various sqrt estimate possibilities, but that was hard to decipher and update (D69989). Instead generate the complete results for various CPUs, and that makes it clear which models have slow/fast sqrt attributes along with all of the other potential diffs (FMA, AVX2, scheduling). Also, explicitly add the function attributes corresponding to whether DAZ/FTZ denorm settings are expected.	2020-02-14 14:21:28 -05:00
Matt Arsenault	34d9a16e54	AMDGPU: Add option to expand 64-bit integer division in IR I didn't realize we were already expanding 24/32-bit division here. Use the available IntegerDivision utilities. This uses loops, so produces significantly smaller code than the inline DAG expansion. This now requires width reductions of 64-bit divisions before introducing the expanded loops. This helps work around missing legalization in GlobalISel for division, which are the only remaining core instructions that didn't work at all. I think this is plausibly a better implementation than exists in the DAG, although turning it on by default misses out on the constant value optimizations and also needs benchmarking.	2020-02-14 11:16:08 -08:00
Craig Topper	391cc4dd41	[X86] Use ZERO_EXTEND instead of SIGN_EXTEND in the fast isel handling of convert_from_fp16.	2020-02-14 10:57:12 -08:00
Craig Topper	fc0c72b2df	[X86] Add AVX512 support to the fast isel code for Intrinsic::convert_from_fp16/convert_to_fp16.	2020-02-14 10:57:11 -08:00
Matt Arsenault	bfbfa18591	GlobalISel: Lower s64->s16 G_FPTRUNC This is more or less directly ported from the AMDGPU custom lowering for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES instead of creating shift/trunc to extract the two halves, and zexting an inverted compare instead of select_cc). This also does not include the fast math expansion in the DAG which converts to f32 and then to f16. I think that belongs in a pre-legalize combine instead.	2020-02-14 10:46:58 -08:00
Volkan Keles	187686a22f	[GlobalISel] LegalizationArtifactCombiner: Fix a bug in tryCombineMerges Like COPY instructions explained in D70616, we don't check the constraints when combining G_UNMERGE_VALUES. Use the same logic used in D70616 to check if registers can be replaced, or a COPY instruction needs to be built. https://reviews.llvm.org/D70564	2020-02-14 10:45:58 -08:00
Brian Cain	bf3b86bc2f	[Hexagon] v67+ HVX register pairs should support either direction Assembler now permits pairs like 'v0:1', which are encoded differently from the odd-first pairs like 'v1:0'. The compiler will require more work to leverage these new register pairs.	2020-02-14 12:43:43 -06:00
Matt Arsenault	8c2c0b3637	AMDGPU: Improve i16/v2i16 bswap	2020-02-14 09:53:22 -08:00
Matt Arsenault	e0fd2d6d62	AMDGPU: Add baseline tests for 16-bit bswap	2020-02-14 09:34:13 -08:00
Matt Arsenault	a257bde420	AMDGPU/GlobalISel: Handle G_BSWAP	2020-02-14 09:09:44 -08:00
Pavel Iliin	b6a9fe2099	[AArch64] Add BIT/BIF support. This patch added generation of SIMD bitwise insert BIT/BIF instructions. In the absence of GCC-like functionality for optimal constraints satisfaction during register allocation the bitwise insert and select patterns are matched by pseudo bitwise select BSP instruction with not tied def. It is expanded later after register allocation with def tied to BSL/BIT/BIF depending on operands registers. This allows to get rid of redundant moves. Reviewers: t.p.northover, samparker, dmgreen Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D74147	2020-02-14 14:19:39 +00:00
Simon Pilgrim	2492075add	[X86][SSE] lowerShuffleAsBitRotate - lower vXi8 shuffles to ROTL on pre-SSSE3 targets Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option. REAPPLIED: Original commit rG11c16e71598d was reverted at rGde1d90299b16 as it wasn't accounting for later lowering. This version emits ROTLI or the OR(VSHLI/VSRLI) directly to avoid the issue.	2020-02-14 11:55:18 +00:00
Kazushi (Jam) Marukawa	60431bd728	[VE] Support for PIC (global data and calls) Summary: Support for PIC with tests for global variables and function calls. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D74536	2020-02-14 09:50:02 +01:00
Liu, Chen3	ec89335c47	[X86] Fix the bug that _mm_mask_cvtsepi64_epi32 generates a result without zeroing the upper 64 bits. Differential Revision : https://reviews.llvm.org/D74552	2020-02-14 09:26:06 +08:00
Thomas Lively	918e90559b	[WebAssembly] Make stack pointer args inhibit tail calls Summary: Also make return calls terminator instructions so epilogues are inserted before them rather than after them. Together, these changes make WebAssembly's tail call optimization more stack-safe. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73943	2020-02-13 16:43:53 -08:00
Pavel Iliin	b23ec43973	[AArch64][NFC] Update test checks. This NFC commit updates several llc tests checks by automatically generated ones.	2020-02-14 00:13:15 +00:00
Craig Topper	c2e8a421ac	[X86] Don't widen 128/256-bit strict compares with vXi1 result to 512-bits on KNL. If we widen the compare we might trigger a spurious exception from the garbage data. We have two choices here. Explicitly force the upper bits to zero. Or use a legacy VEX vcmpps/pd instruction and convert the XMM/YMM result to mask register. I've chosen to go with the second option. I'm not sure which is really best. In some cases we could get rid of the zeroing since the producing instruction probably already zeroed it. But we lose the ability to fold a load. So which is best is dependent on surrounding code. Differential Revision: https://reviews.llvm.org/D74522	2020-02-13 13:26:40 -08:00


32711 Commits