llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	9e030c9e00	[X86] Improve combineCastedMaskArithmetic to fold (bitcast (vXi1 (and/or/xor X, C)))->(vXi1 (and/or/xor (bitcast X), (bitcast C)) where C is a constant build_vector. Most vxi1 constant build vectors have to be implemented in the scalar domain anyway so we'll probably end up with a cast there later. But by then its too late to do the combine to get rid of it. llvm-svn: 324662	2018-02-08 22:26:39 +00:00
Craig Topper	1b5b4ccb77	[X86] Add DAG combine to constant fold a bitcast of a vXi1 constant build_vector into a scalar integer. llvm-svn: 324661	2018-02-08 22:26:36 +00:00
Craig Topper	8d0c8c9be1	[X86] Support folding in a k-register OR when creating KORTEST from scalar compare of a bitcast from vXi1. This should allow us to remove the kortest intrinsic from IR and use compare+bitcast+or in IR instead. llvm-svn: 324580	2018-02-08 08:29:43 +00:00
Craig Topper	93505707b6	[X86] Allow KORTEST instruction to be used for testing if a mask is all ones The KTEST instruction sets the C flag if the result of anding both operands together is all 1s. We can use this to lower (icmp eq/ne (bitcast (vXi1 X), -1) Differential Revision: https://reviews.llvm.org/D42772 llvm-svn: 324577	2018-02-08 07:54:16 +00:00
Craig Topper	94235556aa	[X86] Modify a few tests to not use icmps that are provably false. These used things like unsigned less than zero, which is always false because there is no unsigned number less than zero. I plan to teach DAG combine to optimize these so need to stop using them. llvm-svn: 324315	2018-02-06 06:44:05 +00:00
Craig Topper	9c6c7c5e9b	[X86] Relax restrictions on what setcc condition codes can be folded with a sext when AVX512 is enabled. We now allow all signed comparisons and not equal. The complement that needs to be added for this is no worse than the extend. And the vector output forms of pcmpeq/pcmpgt have better latency than the k-register version on SKX. llvm-svn: 324294	2018-02-05 23:57:01 +00:00
Craig Topper	9a06f24704	[X86] Artificially lower the complexity of the scalar ANDN patterns so that AND with immediate will match first. This allows the immediate to folded into the and instead of being forced to move into a register. This can sometimes result in shorter encodings since the and can sign extend an immediate. This also allows us to match an and to a movzx after a not. This can cause an extra move if the input to the separate NOT has an additional user which requires a copy before the NOT. llvm-svn: 324260	2018-02-05 18:31:04 +00:00
Craig Topper	8d511a65af	[X86] Add DAG combine to turn (bitcast (and/or/xor (bitcast X), Y)) -> (and/or/xor X, (bitcast Y)) when casting between GPRs and mask operations. This reduces the number of transitions between k-registers and GPRs, reducing the number of instructions. There's still some room for improvement to remove more transitions, but this is a good start. llvm-svn: 324184	2018-02-04 01:43:48 +00:00
Puyan Lotfi	43e94b15ea	Followup on Proposal to move MIR physical register namespace to '$' sigil. Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical register with named vregs. llvm-svn: 323922	2018-01-31 22:04:26 +00:00
Craig Topper	247016a735	[X86] Use vptestm/vptestnm for comparisons with zero to avoid creating a zero vector. We can use the same input for both operands to get a free compare with zero. We already use this trick in a couple places where we explicitly create PTESTM with the same input twice. This generalizes it. I'm hoping to remove the ISD opcodes and move this to isel patterns like we do for scalar cmp/test. llvm-svn: 323605	2018-01-27 20:19:09 +00:00
Craig Topper	513d3fa674	[X86] Remove X86ISD::PCMPGTM/PCMPEQM and instead just use X86ISD::PCMPM and pattern match the immediate value during isel. Legalization is still biased to turn LT compares in to GT by swapping operands to avoid needing extra isel patterns to commute. I'm hoping to remove TESTM/TESTNM next and this should simplify that by making EQ/NE more similar. llvm-svn: 323604	2018-01-27 20:19:02 +00:00
Craig Topper	2c570eaa00	[TargetLowering] Teach TargetLowering::SimplifySetCC to simplify setcc of vXi1 vectors into logic ops. This transform was already being done for setcc of scalar i1. This extends it to vectors. llvm-svn: 323585	2018-01-27 09:10:58 +00:00
Craig Topper	c58c2b5c9b	[X86] Rewrite vXi1 element insertion by using a vXi1 scalar_to_vector and inserting into a vXi1 vector. The existing code was already doing something very similar to subvector insertion so this allows us to remove the nearly duplicate code. This patch is a little larger than it should be due to differences between the DQI handling between the two today. llvm-svn: 323212	2018-01-23 15:56:36 +00:00
Craig Topper	f090e8a89a	[X86] Replace CVT2MASK ISD opcode with PCMPGTM compared to zero. CVT2MASK is just checking the sign bit which can be represented with a comparison with zero. llvm-svn: 321985	2018-01-08 06:53:54 +00:00
Craig Topper	9b800c692e	[X86] Use SSE_INTMUL_ITINS_P for the AVX-512 MUL instructions to match their SSE/AVX counterparts. llvm-svn: 321451	2017-12-26 05:43:04 +00:00
Craig Topper	2d1d9a11c1	[X86] Fix (v2f64 (s/uint_to_fp (v2i1))) to avoid scalarization without AVX512DQ. Previously we extended v2i1 to v2f64 and then tried to use cvtuqq2pd/cvtqq2pd, but that only works with avx512dq. So we ended up scalarizing it. Now we widen to v4i1 first and extend to v4i32. llvm-svn: 321420	2017-12-24 06:51:36 +00:00
Craig Topper	06dad14797	[X86] Remove type restrictions from WidenMaskArithmetic. This can help AVX-512 code where mask types are legal allowing us to remove extends and truncates to/from mask types. llvm-svn: 321408	2017-12-23 18:53:05 +00:00
Andrew V. Tischenko	22f0742dda	Fix for bug PR35549 - Repeated schedule comments. Differential Revision: https://reviews.llvm.org/D40960 llvm-svn: 320837	2017-12-15 18:13:05 +00:00
Craig Topper	600f1ba333	[X86] Don't zero the upper bits of the k-register before extracting a single bit from a vXi1. This doesn't match the semantics of the extract_vector_elt operation. Nothing downstream knows the bits were zeroed so they still get masked or sign extended after the extrat anyway. llvm-svn: 320723	2017-12-14 18:35:25 +00:00
Craig Topper	a0be5a06c1	[X86] Rename some instructions that start with Int_ to have the _Int at the end. This matches AVX512 version and is more consistent overall. And improves our scheduler models. In some cases this adds _Int to instructions that didn't have any Int_ before. It's a side effect of the adjustments made to some of the multiclasses. llvm-svn: 320325	2017-12-10 19:47:56 +00:00
Craig Topper	1de942b2d1	[X86] Rename some instructions from 'rb' to 'rrb' to make 'b' a proper suffix. Fix the scheduling information for some of them. Some of the scheduling information was only present for the 'rb' version' and not the 'rr' version. Now we match 'rr(b?)' llvm-svn: 320320	2017-12-10 17:42:44 +00:00
Craig Topper	c7445f2cdc	[X86] Add VCVTQQ2PS to the skylake server scheduler models. llvm-svn: 320319	2017-12-10 17:42:43 +00:00
Simon Pilgrim	4ff43d8120	Regenerate some AVX2+ scheduling tests that got missed llvm-svn: 320307	2017-12-10 13:41:29 +00:00
Craig Topper	391c6f9507	[X86] Fix bad regular expressions in the scheduler models. Question marks should be outside of multicharacter parenthesized expressions If the question mark is inside the parentheses it only applies to the single character proceeding it. I had to make a few additional cleanups to fix some duplicate warnings that were exposed by fixing this. llvm-svn: 320279	2017-12-10 01:24:08 +00:00
Craig Topper	323ba39f10	[X86] Handle alls version of vXi1 insert_vector_elt with a constant index without falling back to shuffles. We previously only supported inserting to the LSB or MSB where it was easy to zero to perform an OR to insert. This change effectively extracts the old value and the new value, xors them together and then xors that single bit with the correct location in the original vector. This will cancel out the old value in the first xor leaving the new value in the position. The way I've implemented this uses 3 shifts and two xors and uses an additional register. We can avoid the additional register at the cost of another shift. llvm-svn: 320120	2017-12-08 00:16:09 +00:00
Francis Visoiu Mistrih	a8a83d150f	[CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register. Work towards the unification of MIR and debug output by refactoring the interfaces. For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print. Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr). https://reviews.llvm.org/D40836 * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/<def>//g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<def[ ],[ ]dead>/dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ],[ ]dead>/implicit-def dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name "*.s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g' llvm-svn: 320022	2017-12-07 10:40:31 +00:00
Simon Pilgrim	9afbe77a91	[X86][AVX512] Tag mask reg op instruction scheduler classes llvm-svn: 319945	2017-12-06 19:36:00 +00:00
Simon Pilgrim	df05251921	[X86][AVX512] Tag aligned/unaligned move instruction scheduler classes llvm-svn: 319913	2017-12-06 17:59:26 +00:00
Simon Pilgrim	aa902be158	[X86][AVX512] Tag BROADCAST instruction scheduler classes llvm-svn: 319900	2017-12-06 15:48:40 +00:00
Simon Pilgrim	a9282309e5	[X86][AVX512] Regenerate vpmovm2/vpmov2m avx512 schedule tests llvm-svn: 319895	2017-12-06 14:07:38 +00:00
Simon Pilgrim	d495301414	[X86][AVX512] Tag BLENDM instruction scheduler classes llvm-svn: 319833	2017-12-05 21:05:25 +00:00
Simon Pilgrim	833c260a4b	[X86][AVX512] Tag VPTRUNC/VPMOVSX/VPMOVZX instruction scheduler classes llvm-svn: 319815	2017-12-05 19:21:28 +00:00
Simon Pilgrim	71660c61e6	[X86][AVX512] Add missing scalar CMPSS/CMPSD logic scheduler classes llvm-svn: 319770	2017-12-05 14:34:42 +00:00
Simon Pilgrim	b9b46394e3	[X86][AVX512] Cleanup bit logic scheduler classes llvm-svn: 319767	2017-12-05 14:04:23 +00:00
Simon Pilgrim	fd3a2632e5	[X86][AVX512] Tag scalar CVT and CMP instruction scheduler classes llvm-svn: 319765	2017-12-05 13:49:44 +00:00
Simon Pilgrim	aa91155960	[X86][AVX512] Tag VPCMP/VPCMPU instruction scheduler classes Move hardcoded itinerary out to the instruction declarations. Not sure that IIC_SSE_ALU_F32P is the best schedule for integer comparisons, but I'm not going to change it right now. llvm-svn: 319760	2017-12-05 12:14:36 +00:00
Simon Pilgrim	a2b5862641	[X86][AVX512] Cleanup VPCMP scheduler classes Move hardcoded itinerary out to the instruction declarations. Not sure that IIC_SSE_ALU_F32P is the best schedule for integer comparisons, but I'm not going to change it right now. llvm-svn: 319758	2017-12-05 12:02:22 +00:00
Francis Visoiu Mistrih	25528d6de7	[CodeGen] Unify MBB reference format in both MIR and debug output As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber/" << printMBBReference(\1)/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber/" << printMBBReference(\1)/g' * find . $ -name ".txt" -o -name ".s" -o -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665	2017-12-04 17:18:51 +00:00
Simon Pilgrim	465a88bb92	[X86][AVX512] Tag packed F2I/I2F/F2F conversion instructions scheduler class llvm-svn: 319636	2017-12-03 21:16:12 +00:00
Simon Pilgrim	9291053463	[X86][AVX512] Regenerate schedule tests. llvm-svn: 319635	2017-12-03 21:07:36 +00:00
Simon Pilgrim	2dc4ff1cde	[X86][AVX512] Tag vshift/vpermv/pshufd/pshufb instructions scheduler classes llvm-svn: 319540	2017-12-01 13:25:54 +00:00
Simon Pilgrim	bb791b3dbd	[X86][AVX512] Tag fcmp/ptest/ternlog instructions scheduler classes llvm-svn: 319433	2017-11-30 13:18:06 +00:00
Simon Pilgrim	1c7556fb29	[X86][AVX512] Regenerate avx512 schedule tests llvm-svn: 319432	2017-11-30 13:09:21 +00:00
Simon Pilgrim	36be852cee	[X86][AVX512] Tag 3OP (shuffles, double-shifts and GFNI) instructions scheduler classes llvm-svn: 319337	2017-11-29 18:52:20 +00:00
Craig Topper	88ffb5d4d5	[X86] Mark ISD::FP_TO_UINT v16i8/v16i16 as Promote under AVX512 instead of legal. Fix infinite loop in op legalization when promotion requires 2 steps. Previously we had an isel pattern to add the truncate. Instead use Promote to add the truncate to the DAG before isel. The Promote legalization code had to be updated to prevent an infinite loop if promotion took multiple steps because it wasn't remembering the previously tried value. llvm-svn: 319259	2017-11-28 23:56:02 +00:00
Craig Topper	3f749c2d4b	[X86] Regenerate avx512-schedule test. For some reason some sqrt instructions were missing the scheduling comments. llvm-svn: 319258	2017-11-28 23:55:59 +00:00
Francis Visoiu Mistrih	9d7bb0cb40	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187	2017-11-28 17:15:09 +00:00
Reid Kleckner	7adb2fdbba	Revert "Correct dwarf unwind information in function epilogue for X86" This reverts r317579, originally committed as r317100. There is a design issue with marking CFI instructions duplicatable. Not all targets support the CFIInstrInserter pass, and targets like Darwin can't cope with duplicated prologue setup CFI instructions. The compact unwind info emission fails. When the following code is compiled for arm64 on Mac at -O3, the CFI instructions end up getting tail duplicated, which causes compact unwind info emission to fail: int a, c, d, e, f, g, h, i, j, k, l, m; void n(int o, int b) { if (g) f = 0; for (; f < o; f++) { m = a; if (l > j k > i) j = i = k = d; h = b[c] - e; } } We get assembly that looks like this: ; BB#1: ; %if.then Lloh3: adrp x9, _f@GOTPAGE Lloh4: ldr x9, [x9, _f@GOTPAGEOFF] mov w8, wzr Lloh5: str wzr, [x9] stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill .cfi_def_cfa_offset 16 .cfi_offset w19, -8 .cfi_offset w20, -16 cmp w8, w0 b.lt LBB0_3 b LBB0_7 LBB0_2: ; %entry.if.end_crit_edge Lloh6: adrp x8, _f@GOTPAGE Lloh7: ldr x8, [x8, _f@GOTPAGEOFF] Lloh8: ldr w8, [x8] stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill .cfi_def_cfa_offset 16 .cfi_offset w19, -8 .cfi_offset w20, -16 cmp w8, w0 b.ge LBB0_7 LBB0_3: ; %for.body.lr.ph Note the multiple .cfi_def* directives. Compact unwind info emission can't handle that. llvm-svn: 317726	2017-11-08 21:31:14 +00:00
Petar Jovanovic	e2a585dddc	Reland "Correct dwarf unwind information in function epilogue for X86" Reland r317100 with minor fix regarding ComputeCommonTailLength function in BranchFolding.cpp. Skipping top CFI instructions block needs to executed on several more return points in ComputeCommonTailLength(). Original r317100 message: "Correct dwarf unwind information in function epilogue for X86" This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts. The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific. The second part is platform independent and ensures that: - CFI instructions do not affect code generation - Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary. Changed CFI instructions so that they: - are duplicable - are not counted as instructions when tail duplicating or tail merging - can be compared as equal Added CFIInstrInserter pass: - analyzes each basic block to determine cfa offset and register valid at its entry and exit - verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors - inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue. Patch by Violeta Vukobrat. llvm-svn: 317579	2017-11-07 14:40:27 +00:00
Petar Jovanovic	bb5c84fb57	Revert "Correct dwarf unwind information in function epilogue for X86" This reverts r317100 as it introduced sanitizer-x86_64-linux-autoconf buildbot failure (build #15606). llvm-svn: 317136	2017-11-01 23:05:52 +00:00

1 2

62 Commits