llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	ed99c332c3	[X86][X87] Renamed CHECK prefix, it's not actually broken anymore, just scheduled differently llvm-svn: 321423	2017-12-24 10:25:01 +00:00
Simon Pilgrim	fcbf977e04	[X86][X87] Add another test case mentioned on PR34080 Did my best to reduce this, but the X87 scheduling bug is hard to hit at the best of times... llvm-svn: 321422	2017-12-24 10:22:55 +00:00
Craig Topper	2d1d9a11c1	[X86] Fix (v2f64 (s/uint_to_fp (v2i1))) to avoid scalarization without AVX512DQ. Previously we extended v2i1 to v2f64 and then tried to use cvtuqq2pd/cvtqq2pd, but that only works with avx512dq. So we ended up scalarizing it. Now we widen to v4i1 first and extend to v4i32. llvm-svn: 321420	2017-12-24 06:51:36 +00:00
Craig Topper	f65a6e4ed8	[DAGCombiners] Don't turn ANDs to shuffles with zero so early. Give some other combines a chance to run. This moves the combine for turning ANDs into shuffle with zero out of SimplifyVBinOps and places it only in visitAND below the reassociate handling. This fixes the specific case I noticed where we failed to combine two ands with constants. llvm-svn: 321417	2017-12-24 02:05:18 +00:00
Craig Topper	62fd123731	[X86] Teach WidenMaskArithmetic to handle any constant buildvector on the RHS not just all zeros/ones. llvm-svn: 321415	2017-12-24 01:03:31 +00:00
Craig Topper	1f2f265fc1	[SelectionDAG] Teach SelectionDAG::getNode to constant fold zext/aext/sext of constant build vectors. llvm-svn: 321414	2017-12-23 20:21:29 +00:00
Craig Topper	06dad14797	[X86] Remove type restrictions from WidenMaskArithmetic. This can help AVX-512 code where mask types are legal allowing us to remove extends and truncates to/from mask types. llvm-svn: 321408	2017-12-23 18:53:05 +00:00
Craig Topper	b2368fbdf4	[SelectionDAG] Reverse the order of operands in the ISD::ADD created by TargetLowering::getVectorElementPointer so that the FrameIndex is on the left. This seems to improve X86's ability to match this into an address computation. Otherwise the other operand gets assigned to the base register and the stack pointer + frame index ends up in the index register. But index registers can't encode ESP/RSP so we end up having to move it into another register to meet the constraint. I could try to improve the address matcher in X86, but swapping the producer seemed easier. Several other places already have the operands in this order so this is at least consistent. llvm-svn: 321370	2017-12-22 17:18:13 +00:00
Craig Topper	576335f998	[X86] When lowering insert_vector_elt/extract_vector_elt of vXi1 with a non-constant index just use either a 128-bit type or the vXi8 type with the correct number of elements. Despite what the comment said there isn't better codegen for 512-bit vectors. The 128/256/512 bit implementation just stores to memory and loads an element. There's no advantage to doing that with a larger size. In fact in many cases it causes a stack realignment and generates worse code. llvm-svn: 321369	2017-12-22 17:18:11 +00:00
Simon Atanasyan	5cd90ccbe3	[mips] Add test case to check that calls to mcount follow long calls / short calls options. NFC llvm-svn: 321357	2017-12-22 13:45:46 +00:00
Diana Picus	28a6d0e639	[ARM GlobalISel] Support G_INTTOPTR and G_PTRTOINT for s32 Mark conversions between pointers and 32-bit scalars as legal, map them to the GPR and select to a simple COPY. llvm-svn: 321356	2017-12-22 13:05:51 +00:00
Diana Picus	68773859c8	[ARM GlobalISel] Support pointer constants Pointer constants are pretty rare, since we usually represent them as integer constants and then cast to pointer. One notable exception is the null pointer constant, which is represented directly as a G_CONSTANT 0 with pointer type. Mark it as legal and make sure it is selected like any other integer constant. llvm-svn: 321354	2017-12-22 11:09:18 +00:00
Sam Parker	cf426fccd4	[DAGCombine] Revert r321259 Improve ReduceLoadWidth for SRL Patch is causing an issue on the PPC64 BE sanitizer. llvm-svn: 321349	2017-12-22 08:36:25 +00:00
Craig Topper	e268598dd3	[X86] Add prefetchwt1 instruction and overhaul priorities and isel enabling for prefetch instructions. Previously prefetch was only considered legal if sse was enabled, but it should be supported with 3dnow as well. The prfchw flag now implies that at least some form of prefetch without the write hint is available, either the sse or 3dnow version. This is true even if 3dnow and sse are explicitly disabled. Similarly, the prefetchwt1 feature implies availability of prefetchw and the prefetcht0/1/2/nta instructions. This way we can support _MM_HINT_ET0 using prefetchw and _MM_HINT_ET1 with prefetchwt1. And it's assumed that if we have levels for the write hint we would have levels for the non-write hint, which is why we enable the sse prefetch instructions. I believe this behavior is consistent with gcc. I've updated the prefetch.ll to test all of these combinations. llvm-svn: 321335	2017-12-22 02:30:30 +00:00
Craig Topper	9befe89367	[X86] Use SIGN_EXTEND to implement ANY_EXTEND from vXi1. llvm-svn: 321334	2017-12-22 02:30:26 +00:00
Craig Topper	8772228963	[X86] Use SIGN_EXTEND rather than ZERO_EXTEND for lowering extract_vector_elt from vXi1 with a non-const index. We have a better range of instructions we can use if we can fill with the i1 value rather than zeroing. llvm-svn: 321315	2017-12-21 22:08:23 +00:00
Craig Topper	742ac98d01	[X86] When lowering truncates to vXi1, don't sign extend i16/i8 types to 512-bit if we have VLX. This should only affect what we do for v8i16. Previously we went to v8i64, but if we have VLX we only need v8i32. This prevents an unnecessary zmm usage. llvm-svn: 321303	2017-12-21 20:45:13 +00:00
Craig Topper	410a289b79	[X86] Promote v8i1 shuffles to v8i32 instead of v8i64 if we have VLX. We should have equally good shuffle options for v8i32 with VLX. This was spotted during my attempts to remove 512-bit vectors from SKX. We still use 512-bits for v16i1, v32i1, and v64i1. I'm less sure we can handle those well with narrower vectors. i32 and i64 element sizes get the best shuffle support. llvm-svn: 321291	2017-12-21 18:44:06 +00:00
Simon Pilgrim	4de5bb093c	[X86][SSE] Split large PAVGB/PAVGW vectors to legal widths Patch to allow detectAVGPattern to handle vectors larger than the legal size (128 SSE2, 256 AVX2, 512 AVX512BW), splitting the vectors accordingly. Differential Revision: https://reviews.llvm.org/D41440 llvm-svn: 321288	2017-12-21 18:12:31 +00:00
Simon Pilgrim	6b915d3353	[DAGCombiner] Generalize (or (and X, c1), c2) -> (and (or X, c2), c1|c2) combine to work on non-splat vectors The knownbits_mask_or_shuffle_uitofp change is interesting - shuffle combines manage to kick in, removing the AND constant mask load. For targets with fast-variable-shuffle this should reduce further to VPOR+VPSHUFB+VCVTDQ2PS. llvm-svn: 321279	2017-12-21 16:34:46 +00:00
Simon Pilgrim	4707709d1b	[X86] Add (or (and X, c1), c2) -> (and (or X, c2), c1|c2) non-splat vector test llvm-svn: 321278	2017-12-21 16:08:41 +00:00
Tony Jiang	eba757e45c	[PowerPC] Fix parest build failure in SPEC2017. The build failure was caused by an assertion in pre-legalization DAGCombine: Combining: t6: ppcf128 = uint_to_fp t5 ... into: t20: f32 = PPCISD::FCFIDUS t19 which is clearly wrong since ppcf128 is definitely a different type from f32 and we cannot change the node value type during DAGCombine. The fix is to not handle ppc_fp128 or i1 conversions in PPCTargetLowering::combineFPToIntToFP and to leave it to downstream code to legalize them and expand them to small legal types. Differential Revision: https://reviews.llvm.org/D41411 llvm-svn: 321276	2017-12-21 15:42:50 +00:00
Simon Pilgrim	4dd03ed7e3	[DAGCombiner] Generalize (and (or x, C), D) -> D iff (C & D) == D combine to work on non-splat vectors llvm-svn: 321275	2017-12-21 15:17:29 +00:00
Simon Pilgrim	636510a702	[X86] Add (and (or x, C), D) -> D iff (C & D) == D non-splat vector test llvm-svn: 321268	2017-12-21 14:33:40 +00:00
Simon Pilgrim	bed1aceadb	[X86] Add v48i8 AVG test case, based on discussion on D41440 llvm-svn: 321261	2017-12-21 13:18:19 +00:00
Sam Parker	59efb8cb5b	[DAGCombine] Improve ReduceLoadWidth for SRL If the SRL node is only used by an AND, we may be able to set the ExtVT to the width of the mask, making the AND redundant. To support this, another check has been added in isLegalNarrowLoad which queries whether the load is valid. Differential Revision: https://reviews.llvm.org/D41350 llvm-svn: 321259	2017-12-21 12:55:04 +00:00
Craig Topper	72c22f4366	[X86] Use PSHUFB for v32i16 shuffles before falling back to VPERMW/VPERMI2W. PSHUFB has the ability to implicitly 0 elements which VPERMI2W can't do. So give a chance to use it first. llvm-svn: 321251	2017-12-21 08:22:51 +00:00
Craig Topper	38af615b4c	[X86] Use VPERMI2B for v16i8 shuffles if we have VBMI+VLX and would have otherwise used two PSHUFBs ORed together. llvm-svn: 321249	2017-12-21 07:31:30 +00:00
Craig Topper	03b2bc4838	[X86] Use VPERMB/VPERMI2B for v32i8 shuffle lowering if VBMI and VLX are supported. llvm-svn: 321248	2017-12-21 05:58:31 +00:00
Craig Topper	5e4c7bc963	[X86] Add avx512vbmi command lines to vector-shuffle-256-v32.ll llvm-svn: 321247	2017-12-21 03:58:31 +00:00
Michael Zolotukhin	ad371e0caa	[SimplifyCFG] Avoid quadratic on a predecessors number behavior in instruction sinking. If a block has N predecessors, then the current algorithm will try to sink common code to this block N times (whenever we visit a predecessor). Every attempt to sink the common code includes going through all predecessors, so the complexity of the algorithm becomes O(N^2). With this patch we try to sink common code only when we visit the block itself. With this, the complexity goes down to O(N). As a side effect, the moment the code is sunk is slightly different than before (the order of simplifications has been changed), that's why I had to adjust two tests (note that neither of the tests is supposed to test SimplifyCFG): * test/CodeGen/AArch64/arm64-jumptable.ll - changes in this test mimic the changes that previous implementation of SimplifyCFG would do. * test/CodeGen/ARM/avoid-cpsr-rmw.ll - in this test I disabled common code sinking by a command line flag. llvm-svn: 321236	2017-12-21 01:22:13 +00:00
Joel Galenson	6f4e827e4c	[ARM] Optimize {s,u}{add,sub}.with.overflow. The AArch64 backend contains code to optimize {s,u}{add,sub}.with.overflow during SelectionDAG. This commit ports that code to the ARM backend. Differential revision: https://reviews.llvm.org/D35635 llvm-svn: 321224	2017-12-20 22:25:39 +00:00
Krzysztof Parzyszek	e4ce92cabf	[Hexagon] Allow construction of HVX vector predicates Handle BUILD_VECTOR of boolean values. llvm-svn: 321220	2017-12-20 20:49:43 +00:00
Yonghong Song	25bf825961	bpf: add support for objdump -print-imm-hex Add support for 'objdump -print-imm-hex' for imm64, operand imm and branch target. If user programs encode immediate values as hex numbers, such an option will make it easy to correlate asm insns with source code. This option also makes it easy to correlate imm values with insn encoding. There is one changed behavior in this patch. In old way, we print the 64bit imm as u64: O << (uint64_t)Op.getImm(); and the new way is: O << formatImm(Op.getImm()); The formatImm is defined in llvm/MC/MCInstPrinter.h as format_object<int64_t> formatImm(int64_t Value) So the new way to print 64bit imm is i64 type. If a 64bit value has the highest bit set, the old way will print the value as a positive value and the new way will print as a negative value. The new way is consistent with x86_64. For the code (see the test program): ... if (a == 0xABCDABCDabcdabcdULL) ... x86_64 objdump, with and without -print-imm-hex, looks like: 48 b8 cd ab cd ab cd ab cd ab movabsq $-6067004223159161907, %rax 48 b8 cd ab cd ab cd ab cd ab movabsq $-0x5432543254325433, %rax Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 321215	2017-12-20 19:39:58 +00:00
Craig Topper	07820f2fe4	[X86] Remove zext from vXi32 to vXi64 on indices of gather/scatter instructions if we can prove the pre-extended value is positive. Gather/scatter can implicitly sign extend from i32->i64 on indices. So if we know the sign bit of the input to a zext is 0 we can use the implicit extension. llvm-svn: 321209	2017-12-20 19:25:33 +00:00
Warren Ristow	c07d49585f	Improve the test for r320216. NFC. Patch by Matthew Voss! llvm-svn: 321207	2017-12-20 19:11:31 +00:00
Craig Topper	bc92e00f2e	[X86] Implement the fusing of MUL+SUBADD to FMSUBADD This patch turns shuffles of fadd/fsub with fmul into fmsubadd. Patch by Dmitry Venikov Differential Revision: https://reviews.llvm.org/D40335 llvm-svn: 321200	2017-12-20 18:05:15 +00:00
Krzysztof Parzyszek	8f6b0c850a	[Hexagon] Adjust the value type for BCvt in LowerFormalArguments llvm-svn: 321177	2017-12-20 14:44:05 +00:00
Simon Pilgrim	a50eec0293	[X86][AVX2] Split more shuffle tests into 'slow' and 'fast' variable shuffles llvm-svn: 321171	2017-12-20 13:12:34 +00:00
Diana Picus	75ce852abe	[ARM GlobalISel] Fix assertion in RegBankSelect We get an assertion in RegBankSelect for code along the lines of my_32_bit_int = my_64_bit_int, which tends to translate into a 64-bit load, followed by a G_TRUNC, followed by a 32-bit store. This appears in a couple of places in the test-suite. At the moment, the legalizer doesn't distinguish between integer and floating point scalars, so a 64-bit load will be marked as legal for targets with VFP, and so will the rest of the sequence, leading to a slightly bizarre G_TRUNC reaching RegBankSelect. Since the current support for 64-bit integers is rather immature, this patch works around the issue by explicitly handling this case in RegBankSelect and InstructionSelect. In the future, we may want to revisit this decision and make sure 64-bit integer loads are narrowed before reaching RegBankSelect. llvm-svn: 321165	2017-12-20 11:27:10 +00:00
Florian Hahn	c3aa6d83fd	[ARM] Lower unsigned saturation to USAT Summary: Implement lowering of unsigned saturation on an interval [0, k] where k + 1 is a power of two using the USAT instruction, in a similar way to how [~k, k] is lowered using SSAT on ARM models that support it. Patch by Marten Svanfeldt Reviewers: t.p.northover, pbarrio, eastig, SjoerdMeijer, javed.absar, fhahn Reviewed By: fhahn Subscribers: fhahn, aemerson, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D41348 llvm-svn: 321164	2017-12-20 11:13:57 +00:00
Tim Northover	6db5d027c6	AArch64: fix one more place movi.2d could be created. Somehow got missed out of r320965. llvm-svn: 321162	2017-12-20 10:45:39 +00:00
Craig Topper	abed821c36	[X86] Optimize sign extends on index operand to gather/scatter to not sign extend past i32. The gather instruction will implicitly sign extend to the pointer width, we don't need to further extend it. This can prevent unnecessary splitting in some cases. There's still an issue that lowering on non-VLX can introduce another sign extend that doesn't get combined with shifts from a lowered sign_extend_inreg. llvm-svn: 321152	2017-12-20 07:36:59 +00:00
Martin Storsjo	2778fd0b59	[AArch64] Implement stack probing for windows Differential Revision: https://reviews.llvm.org/D41131 llvm-svn: 321150	2017-12-20 06:51:45 +00:00
Hiroshi Inoue	11e571e0c6	[PowerPC] fix a bug in redundant compare elimination This patch fixes a bug in the redundant compare elimination reported in https://reviews.llvm.org/rL320786 and re-enables the optimization. The redundant compare elimination assumes that we can replace signed comparison with unsigned comparison for the equality check. But due to the difference in the sign extension behavior we cannot change the opcode if the comparison is against an immediate and the most significant bit of the immediate is one. Differential Revision: https://reviews.llvm.org/D41385 llvm-svn: 321147	2017-12-20 05:18:19 +00:00
Craig Topper	b1ae03fd61	[X86] Improve coverage of fma negations. llvm-svn: 321137	2017-12-20 01:26:36 +00:00
Craig Topper	171fb15786	[X86] Fix probable typo in fma fneg test. llvm-svn: 321136	2017-12-20 01:26:35 +00:00
Francis Visoiu Mistrih	3b265c8fcf	[CodeGen] Move printing MO_FPImmediate operands to MachineOperand::print Work towards the unification of MIR and debug output by refactoring the interfaces. llvm-svn: 321110	2017-12-19 21:47:00 +00:00
Mark Searles	e4f067ebe2	[AMDGPU] Turn off MergeConsecutiveStores() before Instruction Selection for AMDGPU. Commit dbbb6c5fc3642987430866dffdf710df4f616ac7 turned on MergeConsecutiveStores() before Instruction Selection for all targets. Enough AMDGPU compiles go into an infinite loop ( MergeConsecutiveStores() merges two stores; LegalizeStoreOps() un-merges; MergeConsecutiveStores() re-merges, etc. ) to warrant turning it off until the issues can be addressed. Differential Revision: https://reviews.llvm.org/D41377 llvm-svn: 321100	2017-12-19 19:26:23 +00:00
Simon Pilgrim	7cabb4c384	[X86] Regenerate popcnt tests llvm-svn: 321093	2017-12-19 18:05:13 +00:00
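
The two DAGCombiner generalizations listed above (llvm-svn 321279 and 321275) rest on per-lane bitwise identities, so they hold when every vector lane carries its own constant. The following is a minimal Python sketch that checks both identities by brute force and applies the first one to a non-splat example. It is not LLVM code: the function names, the 4-bit exhaustive range, and the sample lane values are assumptions chosen for illustration.

```python
from itertools import product


def identities_hold(bits=4):
    """Exhaustively check both identities for all values of the given width.

    Bitwise operations act on each bit independently, so a small width is
    enough to exercise every combination of bit patterns.
    """
    n = 1 << bits
    for x, c, d in product(range(n), repeat=3):
        # (or (and X, c1), c2) == (and (or X, c2), c1|c2)
        if ((x & c) | d) != ((x | d) & (c | d)):
            return False
        # (and (or x, C), D) == D whenever (C & D) == D
        if (c & d) == d and ((x | c) & d) != d:
            return False
    return True


def or_of_and(xs, c1s, c2s):
    """Original form, applied lane by lane with per-lane constants."""
    return [(x & c1) | c2 for x, c1, c2 in zip(xs, c1s, c2s)]


def and_of_or(xs, c1s, c2s):
    """Rewritten form, applied lane by lane with per-lane constants."""
    return [(x | c2) & (c1 | c2) for x, c1, c2 in zip(xs, c1s, c2s)]


# Hypothetical non-splat example: each lane has a different constant.
lanes = [0x12, 0xFF, 0x80, 0x07]
c1 = [0x0F, 0xF0, 0xAA, 0x55]
c2 = [0x01, 0x02, 0x04, 0x08]
same = or_of_and(lanes, c1, c2) == and_of_or(lanes, c1, c2)
```

Because the check is per lane, nothing in it depends on the constants being equal across lanes, which is the property the non-splat generalization relies on.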


22758 Commits