llvm-project

Commit Graph

Author	SHA1	Message	Date
Coby Tayree	d8b17bedfa	[x86][icelake]GFNI galois field arithmetic (GF(2^8)) insns: gf2p8affineinvqb gf2p8affineqb gf2p8mulb Differential Revision: https://reviews.llvm.org/D40373 llvm-svn: 318993	2017-11-26 09:36:41 +00:00
Craig Topper	e485631cd1	[X86] Add separate intrinsics for scalar FMA4 instructions. Summary: These instructions zero the non-scalar part of the lower 128-bits which makes them different than the FMA3 instructions which pass through the non-scalar part of the lower 128-bits. I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header like we do for AVX512. I think there are still some missed negate folding opportunities with the FMA4 instructions in light of this behavior difference that I hadn't noticed before. I've split the tests so that we can use different intrinsics for scalar testing between the two. I just copied the tests split the RUN lines and changed out the scalar intrinsics. fma4-fneg-combine.ll is a new test to make sure we negate the fma4 intrinsics correctly though there are a couple TODOs in it. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39851 llvm-svn: 318984	2017-11-25 18:32:43 +00:00
Craig Topper	ea37e201ec	[X86] Don't report gather is legal on Skylake CPUs when AVX2/AVX512 is disabled. Allow gather on SKX/CNL/ICL when AVX512 is disabled by using AVX2 instructions. Summary: This adds a new fast gather feature bit to cover all CPUs that support fast gather that we can use independent of whether the AVX512 feature is enabled. I'm only using this new bit to qualify AVX2 codegen. AVX512 is still implicitly assuming fast gather to keep tests working and to match the scatter behavior. Test command lines have been added for these two cases. Reviewers: magabari, delena, RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40282 llvm-svn: 318983	2017-11-25 18:09:37 +00:00
Andrew V. Tischenko	198720d38e	Add BTVER2 sched support for SHLD/SHRD. Differential Revision: https://reviews.llvm.org/D40124 llvm-svn: 318977	2017-11-25 10:46:53 +00:00
Craig Topper	c1b3269171	[X86] Support folding to andnps with SSE1 only. With SSE1 only, we emit FAND and FXOR nodes for v4f32. llvm-svn: 318968	2017-11-25 07:20:22 +00:00
Craig Topper	5b85df8605	[X86] Add some early DAG combines to turn v4i32 AND/OR/XOR into FAND/FOR/FXOR when only SSE1 is available. v4i32 isn't a legal type with sse1 only and would end up getting scalarized otherwise. This isn't completely ideal as it doesn't handle cases like v8i32 that would get split to v4i32. But it at least helps with code written using the clang intrinsic header. llvm-svn: 318967	2017-11-25 07:20:21 +00:00
Craig Topper	13ed01e635	[X86] Prevent using X * rsqrt(X) to approximate sqrt when only sse1 is enabled. This optimization can occur after type legalization and emit a vselect with v4i32 type. But that type is not legal with sse1. This ultimately gets scalarized by the second type legalization that runs after vector op legalization, but that's really intended to handle the scalar types that might be introduced by legalizing vector ops. For now just stop this from happening by disabling the optimization with sse1. llvm-svn: 318965	2017-11-24 19:57:48 +00:00
Simon Dardis	230f453574	[CodeGenPrepare] Check that erased sunken address are not reused CodeGenPrepare sinks address computations from one basic block to another and attempts to reuse address computations that have already been sunk. If the same address computation appears twice with the first instance as an operand of a load whose result is an operand to a simplifiable select, CodeGenPrepare simplifies the select and recursively erases the now dead instructions. CodeGenPrepare then attempts to use the erased address computation for the second load. Fix this by erasing the cached address value if it has zero uses before looking for the address value in the sunken address map. This partially resolves PR35209. Thanks to Alexander Richardson for reporting the issue! This fixed version relands r318032 which was reverted in r318049 due to sanitizer buildbot failures. Reviewers: john.brawn Differential Revision: https://reviews.llvm.org/D39841 llvm-svn: 318956	2017-11-24 16:45:28 +00:00
Dmitry Preobrazhensky	0e8924a5c7	[AMDGPU][MC][GFX9] Added v_interp_p2_f16 and v_interp_p2_legacy_f16 See bug 33629: https://bugs.llvm.org//show_bug.cgi?id=33629 Reviewers: artem.tamazov, SamWot, arsenm Differential Revision: https://reviews.llvm.org/D39488 llvm-svn: 318955	2017-11-24 15:37:14 +00:00
Dylan McKay	d3972a8f11	[AVR] Use the short form of 'clr <reg>' r318895 made it so that the simpler instruction aliases are printed rather than their expanded form. llvm-svn: 318954	2017-11-24 15:36:43 +00:00
John Brawn	70cdb5b391	[CGP] Make optimizeMemoryInst able to combine more kinds of ExtAddrMode fields This patch extends the recent work in optimizeMemoryInst to make it able to combine more ExtAddrMode fields than just the BaseReg. This fixes some benchmark regressions introduced by r309397, where GVN PRE is hoisting a getelementptr such that it can no longer be combined into the addressing mode of the load or store that uses it. Differential Revision: https://reviews.llvm.org/D38133 llvm-svn: 318949	2017-11-24 14:10:45 +00:00
Aleksandar Beserminji	590f0793e8	[mips] Set microMIPS ASE flag This patch fixes an issue where microMIPS ASE flag is not set when a function has micromips attribute or when .set micromips directive is used. Differential Revision: https://reviews.llvm.org/D40316 llvm-svn: 318948	2017-11-24 14:00:47 +00:00
Dmitry Preobrazhensky	dd2f1c993e	[AMDGPU][MC][GFX9] Added support of 'inst_offset' modifier for compatibility with SP3 See bug 35329: https://bugs.llvm.org//show_bug.cgi?id=35329 Reviewers: arsenm, vpykhtin, artem.tamazov Differential Revision: https://reviews.llvm.org/D40350 llvm-svn: 318947	2017-11-24 13:22:38 +00:00
Craig Topper	40a1edc307	[X86] Don't invert NewCC variable while processing the jcc/setcc/cmovcc instructions in optimizeCompareInstr. The NewCC variable is calculated outside of the loop that processes jcc/setcc/cmovcc instructions. If we invert it during the loop it can cause an incorrect value to be used by a later iteration. Instead only read it during the loop and use a new variable to store the possibly inverted value. Fixes PR35399. llvm-svn: 318934	2017-11-23 19:25:45 +00:00
Craig Topper	f31b0b850b	[X86] Teach isel that X86ISD::CMPM_RND zeros the upper bits of the mask register. llvm-svn: 318933	2017-11-23 18:41:21 +00:00
Simon Pilgrim	90accbc5d9	[X86][SSE] Use (V)PHMINPOSUW for vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions (PR32841) (V)PHMINPOSUW determines the UMIN element in an v8i16 input, with suitable bit flipping it can also be used for SMAX/SMIN/UMAX cases as well. This patch matches vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions and reduces the input down to a v8i16 vector before calling (V)PHMINPOSUW. A later patch will use this for v16i8 reductions as well (PR32841). Differential Revision: https://reviews.llvm.org/D39729 llvm-svn: 318917	2017-11-23 13:50:27 +00:00
Diana Picus	c01f7f131b	[ARM GlobalISel] Support G_FDIV for s32 and s64 TableGen already generates code for selecting a G_FDIV, so we only need to add a test. For the legalizer and reg bank select, we do the same thing as for the other floating point binary operations: either mark as legal if we have a FP unit or lower to a libcall, and map to the floating point registers. llvm-svn: 318915	2017-11-23 13:26:07 +00:00
Diana Picus	9faa09b21e	[ARM GlobalISel] Support G_FMUL for s32 and s64 TableGen already generates code for selecting a G_FMUL, so we only need to add a test for that part. For the legalizer and reg bank select, we do the same thing as the other floating point binary operators: either mark as legal if we have a FP unit or lower to a libcall, and map to the floating point registers. llvm-svn: 318910	2017-11-23 12:44:20 +00:00
Simon Dardis	eb5bfd9889	[mips] Use the delay slot filler to convert branches for microMIPSR6. The MIPS delay slot filler converts delay slot branches into compact forms for the MIPS ISAs which support them. For branches that compare (in)equality with zero, it converts them into branches with implicit zero register operands. These branches have a slightly greater range than normal two register operands branches. Changing the branches at this point in the pipeline offers the long branch pass the ability to make better judgements if a long branch sequence is required. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D40314 llvm-svn: 318908	2017-11-23 12:38:04 +00:00
Coby Tayree	e8bdd383e9	[x86][icelake]BITALG 2/3 vpshufbitqmb encoding 3/3 vpshufbitqmb intrinsics Differential Revision: https://reviews.llvm.org/D40222 llvm-svn: 318904	2017-11-23 11:15:50 +00:00
Alexander Potapenko	391804f54b	[MSan] Move the access address check before the shadow access for that address MSan used to insert the shadow check of the store pointer operand _after_ the shadow of the value operand has been written. This happens to work in the userspace, as the whole shadow range is always mapped. However in the kernel the shadow page may not exist, so the bug may cause a crash. This patch moves the address check in front of the shadow access. llvm-svn: 318901	2017-11-23 08:34:32 +00:00
Craig Topper	3fba1bfb77	[X86] Regenerate the vector-popcnt and vector-tzcnt tests to get BITALG CHECK lines on all functions not just the vXi16/vXi8. llvm-svn: 318885	2017-11-22 23:35:12 +00:00
Fedor Sergeev	61975b49fe	IR printing improvement for loop passes Summary: Loop-pass printing is somewhat deficient since it does not provide the context around the loop (e.g. preheader). This context information becomes pretty essential when analyzing transformations that move stuff out of the loop. Extending printLoop to cover preheader and exit blocks (if any). Reviewers: sanjoy, silvas, weimingz Reviewed By: sanjoy Subscribers: apilipenko, skatkov, llvm-commits Differential Revision: https://reviews.llvm.org/D40246 llvm-svn: 318878	2017-11-22 20:59:53 +00:00
Krzysztof Parzyszek	942fa1631f	[Hexagon] Implement buildVector32 and buildVector64 as utility functions Change LowerBUILD_VECTOR to use those functions. This commit will temporarily affect constant vector generation (it will generate constant-extended values instead of non-extended combines), but the code for the general case should be better. The constant selection part will be fixed later. llvm-svn: 318877	2017-11-22 20:56:23 +00:00
Krzysztof Parzyszek	b9f33b32ee	[Hexagon] Add patterns to select A2_combine_ll and its variants llvm-svn: 318876	2017-11-22 20:55:41 +00:00
Krzysztof Parzyszek	6acecc96ac	[Hexagon] Remove trailing spaces, NFC llvm-svn: 318875	2017-11-22 20:43:00 +00:00
Craig Topper	726968d6a2	[X86] Support v32i16/v64i8 CTLZ using lookup table. Had to tweak the setcc's used by the code to use a vXi1 result type with a sign extend back to vector size. llvm-svn: 318871	2017-11-22 20:05:57 +00:00
Paul Robinson	6ca1dd6fa3	[DwarfDump] -debug-line=offset applies to .dwo too. llvm-svn: 318856	2017-11-22 18:23:55 +00:00
Yaxun Liu	c596226604	[AMDGPU] Fix SITargetLowering::LowerCall for pointer info of byval argument SITargetLowering::LowerCall uses dummy pointer info for byval argument, which causes flat load instead of buffer load. This patch fixes that. Differential Revision: https://reviews.llvm.org/D40040 llvm-svn: 318844	2017-11-22 16:13:35 +00:00
Paul Robinson	511b54cadc	[DebugInfo] Dump a .debug_line section, including line-number program, without any compile units. Differential Revision: https://reviews.llvm.org/D40114 llvm-svn: 318842	2017-11-22 15:48:30 +00:00
Dmitry Preobrazhensky	c492500e7e	[AMDGPU][mc][tests] Updated generated lit tests for GFX8/9 Summary: Added tests to better cover features introduced by commit rL318675. See http://llvm.org/viewvc/llvm-project?view=revision&revision=318675 llvm-svn: 318841	2017-11-22 15:47:27 +00:00
Paul Robinson	63811a472e	[DWARFv5] Support DW_FORM_strp in the .debug_line.dwo header. As a side effect, the .debug_line section will be dumped in physical order, rather than in the order that compile units refer to their associated portions of the .debug_line section. These are probably always the same order anyway, and no tests noticed the difference. Differential Revision: https://reviews.llvm.org/D39854 llvm-svn: 318839	2017-11-22 15:33:17 +00:00
Paul Robinson	e0833349b6	[DWARF] Fix handling of extended line-number opcodes Differential Revision: https://reviews.llvm.org/D40200 llvm-svn: 318838	2017-11-22 15:14:49 +00:00
Nicolai Haehnle	dd059c161d	AMDGPU: Consider memory dependencies with moved instructions in SILoadStoreOptimizer Summary: This bug seems to have gone unnoticed because critical cases with LDS instructions are eliminated by the peephole optimizer. However, equivalent situations arise with buffer loads and stores as well, so this fixes regressions since r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4"). Fixes at least: KHR-GL45.shader_storage_buffer_object.basic-operations-case1-cs KHR-GL45.cull_distance.functional piglit tes-input-gl_ClipDistance.shader_test ... and probably more Change-Id: I0e371536288eb8e6afeaa241a185266fd45d129d Reviewers: arsenm, mareko, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D40303 llvm-svn: 318829	2017-11-22 12:25:21 +00:00
Jonas Paulsson	181e260e32	[DAGCombiner] Bugfix in isAlias(). Since i1 is a legal type, this: NumBytes = Op1->getMemoryVT().getSizeInBits() >> 3; is wrong and should be instead NumBytes = Op0->getMemoryVT().getStoreSize(); There seems to be more places where this should be fixed outside DAGCombiner. Review: Hal Finkel https://bugs.llvm.org/show_bug.cgi?id=35366 llvm-svn: 318824	2017-11-22 08:58:30 +00:00
Max Kazantsev	23044fa639	[SCEV] Strengthen variance condition in calculateLoopDisposition Given loops `L1` and `L2` with AddRecs `AR1` and `AR2` varying in them respectively. When identifying loop disposition of `AR2` w.r.t. `L1`, we only say that it is varying if `L1` contains `L2`. But there is also a possible situation where `L1` and `L2` are consecutive sibling loops within the parent loop. In this case, `AR2` is also varying w.r.t. `L1`, but we don't correctly identify it. It can lead, for example, to attempt of incorrect folding. Consider: AR1 = {a,+,b}<L1> AR2 = {c,+,d}<L2> EXAR2 = sext(AR1) MUL = mul AR1, EXAR2 If we incorrectly assume that `EXAR2` is invariant w.r.t. `L1`, we can end up trying to construct something like: `{a * {c,+,d}<L2>,+,b * {c,+,d}<L2>}<L1>`, which is incorrect because `AR2` is not available on entrance of `L1`. Both situations "`L1` contains `L2`" and "`L1` precedes sibling loop `L2`" can be handled with one check: "header of `L1` dominates header of `L2`". This patch replaces the old insufficient check with this one. Differential Revision: https://reviews.llvm.org/D39453 llvm-svn: 318819	2017-11-22 06:21:39 +00:00
Davide Italiano	b480b5c2ee	[SCCP] Pick the right lattice value for constants. After the dataflow algorithm proves that an argument is constant, it replaces its value with the integer constant and drops the lattice value associated to the DEF. e.g. in the example we have @f() that's called twice: call @f(undef, ...) call @f(2, ...) `undef` MEET 2 = 2 so we replace the argument and all its uses with the constant 2. Shortly after, tryToReplaceWithConstantRange() tries to get the lattice value for the argument we just replaced, causing an assertion. This function is a little peculiar as it runs when we're doing replacement and not as part of the solver but still queries the solver. The fix is that of checking whether we replaced the value already and get a temporary lattice value for the constant. Thanks to Zhendong Su for the report! Fixes PR35357. llvm-svn: 318817	2017-11-22 03:04:55 +00:00
Peter Collingbourne	6c48462276	Object: Improve COFF irsymtab comdat representation. Change the representation of COFF comdats so that a COFF linker is able to accurately resolve comdats between IR and native object files. Specifically, apply name mangling to comdat names consistently with native object files, and do not export comdats with an internal leader because they do not affect symbol resolution. Differential Revision: https://reviews.llvm.org/D40278 llvm-svn: 318805	2017-11-21 22:06:20 +00:00
Krzysztof Parzyszek	fc0a1812f5	[Hexagon] Make sure that RDF does not remove EH_LABELs Since EH_LABELs (and other labels) no longer have "side-effects", they should be checked for separately. llvm-svn: 318801	2017-11-21 21:05:51 +00:00
Craig Topper	ba150ef60a	[X86] Allow vpclmulqdq instructions to be commuted during isel to allow load folding. The commuting patterns for the AVX version actually still had priority over the new patterns. llvm-svn: 318800	2017-11-21 21:05:21 +00:00
Nirav Dave	61ffc9c0eb	Avoid unnecessary opsize byte in segment move to memory Segment moves to memory are always 16-bit. Remove invalid 32 and 64 bit variants. Recommitting with missing clang inline assembly test change. Fixes PR34478. Reviewers: rnk, craig.topper Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D39847 llvm-svn: 318797	2017-11-21 19:28:13 +00:00
Chad Rosier	fe97d73674	[AArch64] Mark mrs of TPIDR_EL0 (thread pointer) as having side effects. This partially reverts r298851. The underlying issue is that we don't currently model the dependency between mrs (read system register) and msr (write system register) instructions. Something like the below should never be reordered: msr TPIDR_EL0, x0 ;; set thread pointer mrs x8, TPIDR_EL0 ;; read thread pointer but was being reordered after r298851. The functional part of the patch that wasn't reverted needed to remain in place in order to not break r299462. PR35317 llvm-svn: 318788	2017-11-21 18:08:34 +00:00
Hans Wennborg	d97c0f7855	Rename test/Transforms/CountingFunctionInserter -> EntryExitInstrumenter The pass was renamed in r318195. llvm-svn: 318784	2017-11-21 17:22:19 +00:00
Hans Wennborg	37cbf28e79	EntryExitInstrumenter: support __cyg_profile_func_enter_bare It works just like __cyg_profile_func_enter but takes no arguments. llvm-svn: 318783	2017-11-21 17:22:19 +00:00
Oliver Stannard	9cb89f6611	[ARM] Remove pre-UAL FLDM/FSTM aliases These are pre-UAL syntax, and we don't support any other pre-UAL instructions, with the exception of FLDMX/FSTMX, which don't have a UAL equivalent. Therefore there's no reason to keep them or their AsmParser hacks around. With the AsmParser hacks removed, the FLDMX and FSTMX instructions get the same operand diagnostics as the UAL instructions. Differential revision: https://reviews.llvm.org/D39196 llvm-svn: 318777	2017-11-21 16:20:25 +00:00
Oliver Stannard	1e6d4b9e62	[ARM] Don't omit non-default predication code This was causing the (invalid) predicated versions of the NEON VRINTX and VRINTZ instructions to be accepted, with the condition code being ignored. Also, there is no NEON VRINTR instruction, so that part of the check was not necessary. Differential revision: https://reviews.llvm.org/D39193 llvm-svn: 318771	2017-11-21 15:34:15 +00:00
Oliver Stannard	1e73e95f3c	[Asm] Improve "too few operands" errors - We can still emit this error if the actual instruction has two or more operands missing compared to the expected one. - We should only emit this error once per instruction. Differential revision: https://reviews.llvm.org/D36746 llvm-svn: 318770	2017-11-21 15:16:50 +00:00
Sander de Smalen	4acd57eb51	Revert r318759 due to make check-all failure on Windows llvm-svn: 318768	2017-11-21 15:07:43 +00:00
Oliver Stannard	d6ca9879ba	[ARM] Add diagnostics for SPR/DPR lists Differential revision: https://reviews.llvm.org/D39195 llvm-svn: 318766	2017-11-21 15:06:01 +00:00
Alexey Bataev	a054ea9848	[InstCombine] Test for PR35354: unable to vectorize loop with std::max on floats, NFC. llvm-svn: 318764	2017-11-21 14:49:13 +00:00


49024 Commits