llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	905e79c4dc	[X86][SSE] Add test cases vector for integer multiplies Mainly inspired by PR34474 / D37896 llvm-svn: 313353	2017-09-15 11:17:42 +00:00
Sjoerd Meijer	0c5ba21cbf	[AArch64] allow v8f16 types when FullFP16 is supported This adds support for allowing v8f16 vector types, thus avoiding conversions from/to single precision for these types. This is a follow up patch of commits r311154 and r312104, which added support for scalars and v4f16 types, respectively. Differential Revision: https://reviews.llvm.org/D37802 llvm-svn: 313351	2017-09-15 09:24:48 +00:00
Jatin Bhateja	908c8b37c2	[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs. Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will now look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet Reviewed By: lsaba Subscribers: spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313343	2017-09-15 05:29:51 +00:00
Matt Arsenault	c317287fde	AMDGPU: Fix violating constant bus restriction You can't use madmk/madmk if it already uses an SGPR input. llvm-svn: 313298	2017-09-14 20:54:29 +00:00
Matt Arsenault	37ab4cf8b8	AMDGPU: Fix assert on alloca of array of struct llvm-svn: 313282	2017-09-14 18:02:29 +00:00
Matt Arsenault	defe371771	AMDGPU: Stop modifying SP in call sequences Because the stack growth direction and addressing is done in the same direction, modifying SP at the beginning of the call sequence was incorrect. If we had a stack passed argument, we would end up skipping that number of bytes before pushing arguments, leaving unused/inconsistent space. The callee creates fixed stack objects in its frame, so the space necessary for these is already logically allocated in the callee, so we just let the callee increment SP if it really requires it. llvm-svn: 313279	2017-09-14 17:37:40 +00:00
Matt Arsenault	6efd082c01	AMDGPU: Make frame register caller preserved Using SplitCSR for the frame register was very broken. Often the copies in the prolog and epilog were optimized out, in addition to them being inserted after the true prolog where the FP was clobbered. I have a hacky solution which works that continues to use split CSR, but for now this is simpler and will get to working programs. llvm-svn: 313274	2017-09-14 17:14:57 +00:00
Krzysztof Parzyszek	6ca02b25a7	[IfConversion] More simple, correct dead/kill liveness handling Patch by Jesper Antonsson. Differential Revision: https://reviews.llvm.org/D37611 llvm-svn: 313268	2017-09-14 15:53:11 +00:00
Chad Rosier	4d6d74e236	Add newline to end of test file. NFC. llvm-svn: 313263	2017-09-14 14:48:59 +00:00
Simon Pilgrim	0b220c7524	[X86] Regenerate test. NFCI. llvm-svn: 313259	2017-09-14 13:00:27 +00:00
Simon Pilgrim	47d8f62472	Regenerate test (broadcast comment). NFCI. llvm-svn: 313258	2017-09-14 12:41:19 +00:00
Ayman Musa	ab68449c53	[X86] When applying the shuffle-to-zero-extend transformation on floating point, bitcast to integer first. Fix issue described in PR34577. Differential Revision: https://reviews.llvm.org/D37803 llvm-svn: 313256	2017-09-14 12:06:38 +00:00
Simon Dardis	28365b33ad	[mips] Pick the right variant of DINS upfront and enable target instruction verification This patch complements D16810 "[mips] Make isel select the correct DEXT variant up front.". Now ISel picks the right variant of DINS, so now there is no need to replace DINS with the appropriate variant during MipsMCCodeEmitter::encodeInstruction(). This patch also enables target specific instruction verification for ins, dins, dinsm, dinsu, ext, dext, dextm, dextu. These instructions have constraints that are checked when generating MipsISD::Ins and MipsISD::Ext nodes, but these constraints are not checked during instruction selection. Adding machine verification should catch outstanding cases. Finally, correct a bug that instruction verification uncovered, where the position operand of a DINSU generated during lowering was being silently and accidentally corrected to the correct value. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D34809 llvm-svn: 313254	2017-09-14 10:58:00 +00:00
Simon Pilgrim	8bd2d8780a	[DAGCombine] (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2) We already have a combine for this pattern when the input to shl is add, so we just need to enable the transformation when the input is or. Original patch by @tstellar Differential Revision: https://reviews.llvm.org/D19325 llvm-svn: 313251	2017-09-14 10:38:30 +00:00
Simon Pilgrim	337b2d007a	Fix line endings. NFCI. llvm-svn: 313247	2017-09-14 10:30:54 +00:00
Simon Pilgrim	11e2969a35	Fix line endings. NFCI. llvm-svn: 313246	2017-09-14 10:30:22 +00:00
Dean Michael Berris	01fd7c8bd4	[XRay][CodeGen] Use the current function symbol as the associated symbol for the instrumentation map Summary: XRay had been assuming that the previous section is the "text" section of the function when lowering the instrumentation map. Unfortunately this is not a safe assumption, because we may be coming from lowering debug type information for the function being lowered. This fixes an issue with combining -gsplit-dwarf, -generate-type-units, -debug-compile and -fxray-instrument for sole member functions. When the split dwarf section is stripped, we're left with references from the xray_instr_map to the debug section. The change now uses the function's symbol instead of the previous section's start symbol. We found the bug while attempting to strip the split debug sections off an XRay-instrumented object file, which had a peculiar edge-case for single-function classes where the single function is being lowered. Because XRay had associated the instrumentation map for a function to the debug types section instead of the function's section, the objcopy call will fail due to the misplaced reference from the xray_instr_map section. Reviewers: pcc, dblaikie, echristo Subscribers: llvm-commits, aprantl Differential Revision: https://reviews.llvm.org/D37791 llvm-svn: 313233	2017-09-14 07:08:23 +00:00
NAKAMURA Takumi	38fac5905e	Move llvm/test/CodeGen/X86/clear-liverange-spillreg.mir to SystemZ. It was in wrong place. llvm-svn: 313218	2017-09-14 00:03:23 +00:00
Matt Arsenault	ecb43ef1bc	AMDGPU: Don't spill SP reg like a normal CSR llvm-svn: 313217	2017-09-13 23:47:01 +00:00
Hans Wennborg	06e2a384c2	Revert r312719 "[MachineCombiner] Update instruction depths incrementally for large BBs." This caused PR34596. > [MachineCombiner] Update instruction depths incrementally for large BBs. > > Summary: > For large basic blocks with lots of combinable instructions, the > MachineTraceMetrics computations in MachineCombiner can dominate the compile > time, as computing the trace information is quadratic in the number of > instructions in a BB and its relevant successors/predecessors. > > In most cases, knowing the instruction depth should be enough to make > combination decisions. As we already iterate over all instructions in a basic > block, the instruction depth can be computed incrementally. This reduces the > cost of machine-combine drastically in cases where lots of instructions > are combined. The major drawback is that AFAIK, computing the critical path > length cannot be done incrementally. Therefore we only compute > instruction depths incrementally, for basic blocks with more > instructions than inc_threshold. The -machine-combiner-inc-threshold > option can be used to set the threshold and allows for easier > experimenting and checking if using incremental updates for all basic > blocks has any impact on the performance. > > Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn > > Reviewed By: fhahn > > Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits > > Differential Revision: https://reviews.llvm.org/D36619 llvm-svn: 313213	2017-09-13 23:23:09 +00:00
Stanislav Mekhanoshin	7fe9a5d9b4	Allow target to decide when to cluster loads/stores in misched MachineScheduler when clustering loads or stores checks if base pointers point to the same memory. This check is done through comparison of base registers of two memory instructions. This works fine when instructions have separate offset operand. If they require a full calculated pointer such instructions can never be clustered according to such logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208	2017-09-13 22:20:47 +00:00
Wei Mi	a2a135a01c	Add a comment for the test. NFC. llvm-svn: 313199	2017-09-13 21:47:13 +00:00
Wei Mi	c0d066468e	[RegAlloc] Keep a copy of live interval for the spilled vregs in HoistSpillHelper. This is to fix PR34502. After rL311401, the live range of spilled vreg will be cleared. HoistSpill need to use the live range of the original vreg before splitting to know the moving range of the spills. The patch saves a copy of live interval for the spilled vreg inside of HoistSpillHelper. Differential Revision: https://reviews.llvm.org/D37578 llvm-svn: 313197	2017-09-13 21:41:30 +00:00
Gadi Haber	35f4d7ca46	[X86][Skylake] Replacing -mcpu=skx by -mattr in a codegen test. NFC. NFC. Replacing -mcpu=skx by -mattr in the run command of the codegen test: avx512-gather-scatter-intrin.ll. Reviewers: delena Revision: https://reviews.llvm.org/D37799 llvm-svn: 313144	2017-09-13 12:39:18 +00:00
Simon Pilgrim	f613a45bf3	[X86][FMA4] Test FMA4 commutation with repeated ops as well as FMA3 llvm-svn: 313143	2017-09-13 11:21:38 +00:00
Simon Pilgrim	322fc53725	[X86][FMA] Added *213 fma instructions to scheduling tests Annoyingly the 132/231 variants are pretty tricky to create when you need to due to weak FMA commutation patterns. llvm-svn: 313142	2017-09-13 11:12:56 +00:00
Gadi Haber	a753080d1e	[X86][Skylake][KNL] Updating code gen regression test to use the KNL and SKYLAKE prefixes. NFC. NFC. Updating the code gen regression test bmi2-schedule.ll to use the KNL and SKYLAKE prefixes for the run commands that use the knl and Skylake mcpu options. The fix is in preparation for a large patch of adding all SKL scheduling information. Reviewers: delena, zvi, RKSimon Revision: https://reviews.llvm.org/D37796 llvm-svn: 313138	2017-09-13 09:28:25 +00:00
Gadi Haber	04de4ce9e2	[X86][Skylake][KNL] Updating code gen regression test to use the KNL and SKYLAKE prefixes. NFC. NFC. Updating the code gen regression test bmi2-schedule.ll to use the KNL and SKYLAKE prefixes for the run commands that use the knl and Skylake mcpu options. The fix is in preparation for a large patch of adding all SKL scheduling information. Reviewers: delena, zvi Revision: https://reviews.llvm.org/D37796 llvm-svn: 313137	2017-09-13 09:28:18 +00:00
Gadi Haber	fb47ab7cdd	NFC. Updating codegen test bmi2-schedule.ll to use the SKYLAKE and KNL prefix as preparation for an upcoming patch to add all SKL scheduling information. llvm-svn: 313136	2017-09-13 09:27:39 +00:00
Igor Breger	5c721199dd	[GlobalISel][X86] support G_FPEXT operation. Summary: Support G_FPEXT operation. Selection done via TableGen'erated code. Reviewers: zvi, guyblank, aymanmus, m_zuckerman Reviewed By: zvi Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34816 llvm-svn: 313135	2017-09-13 09:05:23 +00:00
Uriel Korach	5d5da5f531	[X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm) This patch, together with a matching clang patch (https://reviews.llvm.org/D37694), implements the lowering of X86 ABS intrinsics to IR. differential revision: https://reviews.llvm.org/D37693. llvm-svn: 313134	2017-09-13 09:02:36 +00:00
Uriel Korach	53872a2d89	[X86] Add explicit mc-encoding checks to X86/viabs.ll. NFC. Add explicit mc-encoding checks showing that the AVX512VL ABS intrinsics are actually mapped to EVEX encoding. This is a pre-commit for a soon to come patch which will lower x86 target specific ABS intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37688 llvm-svn: 313131	2017-09-13 08:33:55 +00:00
Craig Topper	2b6bfda561	[X86] Make sure we emit a SUBREG_TO_REG after the MOV32ri when creating a BEXTR64rr instruction from a shift/and pair. Fixes PR34589. llvm-svn: 313126	2017-09-13 07:53:21 +00:00
Elena Demikhovsky	6cab129464	[X86 CodeGen] Optimization of ZeroExtendLoad for v2i8 vector Load with zero-extend and sign-extend from v2i8 to v2i32 is "Legal" since SSE4.1 and may be performed using PMOVZXBD , PMOVSXBD instructions. llvm-svn: 313121	2017-09-13 06:40:26 +00:00
Derek Schuff	a519fe5a37	[WebAssembly] Add sign extend instructions from atomics proposal Select them from ISD::SIGN_EXTEND_INREG Differential Revision: https://reviews.llvm.org/D37603 remove spurious change llvm-svn: 313101	2017-09-13 00:29:06 +00:00
Sanjay Patel	659279450e	[x86] eliminate unnecessary vector compare for AVX masked store The masked store instruction only cares about the sign-bit of each mask element, so the compare s<0 isn't needed. As noted in PR11210: https://bugs.llvm.org/show_bug.cgi?id=11210 ...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR. (Although more testing will be needed to confirm that.) I filed a bug to track improvements for AVX512: https://bugs.llvm.org/show_bug.cgi?id=34584 Differential Revision: https://reviews.llvm.org/D37446 llvm-svn: 313089	2017-09-12 23:24:05 +00:00
Ahmed Bougacha	106dd035a8	[AArch64][GlobalISel] Select all fpexts. Tablegen already can select these: mark them as legal, remove the c++ code, and add tests for all types. llvm-svn: 313074	2017-09-12 21:04:11 +00:00
Ahmed Bougacha	a7aa2a9fb1	[AArch64][GlobalISel] Select all fptruncs. We already support these in tablegen, but we're matching the wrong operator (libm ftrunc). Fix that. While there, drop the c++ code, support COPYs of FPR16, and add tests for the other types. llvm-svn: 313073	2017-09-12 21:04:10 +00:00
Lei Huang	34e6621724	Update branch coalescing to be a PowerPC specific pass Implementing this pass as a PowerPC specific pass. Branch coalescing utilizes the analyzeBranch method which currently does not include any implicit operands. This is not an issue on PPC but must be handled on other targets. Pass is currently off by default. Enabled via -enable-ppc-branch-coalesce. Differential Revision: https://reviews.llvm.org/D32776 llvm-svn: 313061	2017-09-12 18:39:11 +00:00
Craig Topper	958106d0f1	[X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so it's as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp. This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we don't recognize yet. We already had this problem for several other TBM patterns so I think this is fine and we can address all of them together. I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before we got to isel. I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch. Differential Revision: https://reviews.llvm.org/D37592 llvm-svn: 313054	2017-09-12 17:40:25 +00:00
Elena Demikhovsky	18ff5c1374	Added "zext" from v2i8 to v2i32. In the next patch I'll optimize the sequence. llvm-svn: 313052	2017-09-12 17:27:53 +00:00
Hans Wennborg	8c1eb106bd	Revert r313009 "[ARM] Use ADDCARRY / SUBCARRY" This was causing PR34045 to fire again. > This is a preparatory step for D34515 and also is being recommitted as its > first version caused PR34045. > > This change: > - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 > - lowering is done by first converting the boolean value into the carry flag > using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value > using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two > operations does the actual addition. > - for subtraction, given that ISD::SUBCARRY second result is actually a > borrow, we need to invert the value of the second operand and result before > and after using ARMISD::SUBE. We need to invert the carry result of > ARMISD::SUBE to preserve the semantics. > - given that the generic combiner may lower ISD::ADDCARRY and > ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering > as well otherwise i64 operations now would require branches. This implies > updating the corresponding test for unsigned. > - add new combiner to remove the redundant conversions from/to carry flags > to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C > - fixes PR34045 > > Differential Revision: https://reviews.llvm.org/D35192 Also revert follow-up r313010: > [ARM] Fix typo when creating ISD::SUB nodes > > In D35192, I accidentally introduced a typo when creating ISD::SUB nodes, > giving them two values instead of one. > > This fails when the merge_values combiner finds one of these nodes. > > This change fixes PR34564. > > Differential Revision: https://reviews.llvm.org/D37690 llvm-svn: 313044	2017-09-12 16:24:17 +00:00
Simon Pilgrim	76418aae74	[X86][AVX2] Add gather/movntdqa/pmaskmov/pmovmskb/pslldq/psrldq instructions to scheduling tests llvm-svn: 313039	2017-09-12 15:52:01 +00:00
Simon Pilgrim	0af5a772e0	[X86][AVX2] Add further instructions to scheduling tests llvm-svn: 313032	2017-09-12 15:01:20 +00:00
Simon Pilgrim	d2d2b37cc9	[X86][AVX2] Add integer broadcast scheduling tests llvm-svn: 313026	2017-09-12 12:59:20 +00:00
Jonas Paulsson	fc4f323ac1	[SystemZ] Add the CoveredBySubRegs bit to GPR64, GPR128 and FPR128 registers. This bit is needed in order for the CalleeSavedRegs list to automatically include the super registers if all of their subregs are present. Thanks to Wei Mi for initially indicating this deficiency in the SystemZ backend. Review: Ulrich Weigand. https://bugs.llvm.org/show_bug.cgi?id=34550 llvm-svn: 313023	2017-09-12 12:11:29 +00:00
Simon Pilgrim	5a931c641e	[X86][AVX2] Add additional fp-broadcast/subvector/shuffle scheduling tests llvm-svn: 313022	2017-09-12 11:17:01 +00:00
Simon Pilgrim	ef9a9d709a	[X86][AVX] Add vperm2f128 scheduling test llvm-svn: 313021	2017-09-12 11:10:59 +00:00
Simon Pilgrim	f336d9ce3c	[X86][AVX2] Remove old (unused) intrinsic declarations llvm-svn: 313020	2017-09-12 11:09:30 +00:00
Yael Tsafrir	47668b5e03	[X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR Differential Revision: https://reviews.llvm.org/D37560 llvm-svn: 313013	2017-09-12 07:50:35 +00:00
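
The [DAGCombine] entry above (8bd2d8780a) rewrites (shl (or x, c1), c2) as (or (shl x, c2), c1 << c2). This works because a left shift distributes over bitwise OR. The following is a minimal Python sketch of that arithmetic only, not LLVM code: the function names and the fixed 32-bit width are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: model a 32-bit integer with wraparound


def shl_of_or(x, c1, c2):
    """Original form: (shl (or x, c1), c2)."""
    return ((x | c1) << c2) & MASK32


def or_of_shl(x, c1, c2):
    """Rewritten form: (or (shl x, c2), c1 << c2)."""
    shifted_x = (x << c2) & MASK32
    shifted_c1 = (c1 << c2) & MASK32  # constant, so it can be folded ahead of time
    return shifted_x | shifted_c1


def forms_agree(samples, shifts=range(32)):
    """Check that both forms give the same result for every sample and shift."""
    return all(
        shl_of_or(x, c1, c2) == or_of_shl(x, c1, c2)
        for x, c1 in samples
        for c2 in shifts
    )
```

In the rewritten form the shifted constant no longer depends on x, which is presumably what makes the pattern useful to later combines; the sketch only checks that the two forms compute the same value.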

21507 Commits