llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	1ee19ae126	[X86] Add new patterns for masked scalar load/store to match clang's codegen from r331958. Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets. So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened. We may be able to drop some of the old patterns, but I leave that for a future patch. llvm-svn: 332049	2018-05-10 21:49:16 +00:00
Gabor Buella	a3b581906f	[X86] Initialize HasPTWRITE member of X86Subtarget This was missing from r331961. Caught by sanitizer bots. llvm-svn: 332024	2018-05-10 19:15:10 +00:00
Simon Pilgrim	a3686c9a28	[X86] Convert/Merge more instregex patterns to reduce InstrRW compile time. Use instrs lists or merge multiple instregex patterns. llvm-svn: 332022	2018-05-10 19:08:06 +00:00
Simon Pilgrim	b7f274ef16	[X86][Znver1] Remove unnecessary SchedWritePMULLD InstRW overrides. llvm-svn: 332006	2018-05-10 17:42:26 +00:00
Simon Pilgrim	37fbb7f173	[X86][SNB] Fix typo in PEXTRDmr instregex, was missing VPEXTRDmr. llvm-svn: 332002	2018-05-10 17:30:49 +00:00
Simon Pilgrim	38ac0e9c6b	[X86] Split WriteVecALU/WriteVecLogic/WriteShuffle/WriteVarShuffle/WritePSADBW/WritePHAdd scheduler classes Split off XMM classes from the default (MMX) classes. llvm-svn: 331999	2018-05-10 17:06:09 +00:00
Sanjay Patel	b4e7893ba8	[x86] fix fmaxnum/fminnum with nnan With nnan, there's no need for the masked merge / blend sequence (that probably costs much more than the min/max instruction). Somewhere between clang 5.0 and 6.0, we started producing these intrinsics for fmax()/fmin() in C source instead of libcalls or fcmp/select. The backend wasn't prepared for that, so we regressed perf in those cases. Note: it's possible that other targets have similar problems as seen here. Noticed while investigating PR37403 and related bugs: https://bugs.llvm.org/show_bug.cgi?id=37403 The IR FMF propagation cases still don't work. There's a proposal that might fix those cases in D46563. llvm-svn: 331992	2018-05-10 15:40:49 +00:00
Gabor Buella	a832b22bae	[X86] ptwrite intrinsic Reviewers: craig.topper, RKSimon Reviewed By: craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D46539 llvm-svn: 331961	2018-05-10 07:26:05 +00:00
Simon Pilgrim	ca7981ac98	[X86] Fix Broadwell's Shuffle256 schedule classes load latency values. Allows us to remove some unnecessary InstRW overrides. llvm-svn: 331913	2018-05-09 19:27:48 +00:00
Simon Pilgrim	d5d4cdb49d	[X86] Merge instregex patterns to reduce InstrRW compile time. llvm-svn: 331911	2018-05-09 19:04:15 +00:00
Simon Pilgrim	ab34aa8294	[X86] Cleanup WriteFStore/WriteVecStore schedules MOVNTPD/MOVNTPS should be WriteFStore Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default, its costs gets increased but is still nowhere near the real cost of that nasty instruction.... llvm-svn: 331864	2018-05-09 11:01:16 +00:00
Craig Topper	b9a473d186	[X86] Combine (vXi1 (bitcast (-1)))) and (vXi1 (bitcast (0))) to all ones or all zeros vXi1 vector. llvm-svn: 331847	2018-05-09 06:07:20 +00:00
Shiva Chen	801bf7ebbe	[DebugInfo] Examine all uses of isDebugValue() for debug instructions. Because we create a new kind of debug instruction, DBG_LABEL, we need to check all passes which use isDebugValue() to check MachineInstr is debug instruction or not. When expelling debug instructions, we should expel both DBG_VALUE and DBG_LABEL. So, I create a new function, isDebugInstr(), in MachineInstr to check whether the MachineInstr is debug instruction or not. This patch has no new test case. I have run regression test and there is no difference in regression test. Differential Revision: https://reviews.llvm.org/D45342 Patch by Hsiangkai Wang. llvm-svn: 331844	2018-05-09 02:42:00 +00:00
Jessica Paquette	ec37c640dd	Revert "[X86][CET] Shadow stack fix for setjmp/longjmp" This reverts commit 30962eca38ef02666ebcdded72a94f2cd0292d68. This commit has been causing test asan failures on a build bot. http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/45108/ Original commit: https://reviews.llvm.org/D46181 llvm-svn: 331813	2018-05-08 22:00:57 +00:00
Simon Pilgrim	f5f28aa714	[X86] Tag PCONFIG instruction with WriteSystem scheduler class llvm-svn: 331773	2018-05-08 15:55:14 +00:00
Simon Pilgrim	2864b46469	[X86] Split off WriteIMul64 from WriteIMul schedule class (PR36931) This fixes a couple of BtVer2 missing instructions that weren't been handled in the override. NOTE: There are still a lot of overrides that still need cleaning up! llvm-svn: 331770	2018-05-08 14:55:16 +00:00
Simon Pilgrim	2580554333	[X86] Split WriteIDiv into div/idiv 8/16/32/64 implementations (PR36930) I've created the necessary classes but there are still a lot of overrides that need cleaning up. NOTE: The Znver1 model was missing some div/idiv variants in the instregex patterns and wasn't setting the resource cycles at all in the overrides. llvm-svn: 331767	2018-05-08 13:51:45 +00:00
Simon Pilgrim	b0a3be04ec	[X86] Add vector masked load/store scheduler classes (PR32857) Split off from existing vector load/store classes to remove InstRW overrides. llvm-svn: 331760	2018-05-08 12:17:55 +00:00
Simon Pilgrim	210286ed8f	[X86] Add SchedWriteFTest/SchedWriteVecTest TEST scheduler classes Split off from SchedWriteVecLogic to remove InstRW overrides. llvm-svn: 331757	2018-05-08 10:28:03 +00:00
Jeremy Morse	4f799c027e	[X86] Mark all byval parameters as aliased This is a fix for PR30290: by marking all byval stack slots as being aliased, the instruction scheduler is more conservative about rescheduling memory accesses to such stack slots as an LLVM Value* might alias it. This fixes errors such as in the patched test case, where reads and writes to a data structure are illegally mixed. This could be fixed better in the future with better analysis for the instruction scheduler to know what Values alias what stack slots. Differential Revision: https://reviews.llvm.org/D45022 llvm-svn: 331749	2018-05-08 09:18:01 +00:00
Alexander Ivchenko	c47f799289	[X86][CET] Shadow stack fix for setjmp/longjmp This patch adds a shadow stack fix when compiling setjmp/longjmp with the shadow stack enabled. This allows setjmp/longjmp to work correctly with CET. Patch by mike.dvoretsky Differential Revision: https://reviews.llvm.org/D46181 llvm-svn: 331748	2018-05-08 09:04:07 +00:00
Gabor Buella	4a02bf945e	[x86] Introduce the enclv instruction Summary: and use the -msgx flag as a requirement for the SGX instructions. Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D46436 llvm-svn: 331742	2018-05-08 07:11:05 +00:00
Gabor Buella	2b5e96004b	[x86] Introduce the pconfig instruction Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D46430 llvm-svn: 331739	2018-05-08 06:47:36 +00:00
Roman Lebedev	cc42d08b1d	[DagCombiner] Not all 'andn''s work with immediates. Summary: Split off from D46031. In masked merge case, this degrades IPC by decreasing instruction count. {F6108777} The next patch should be able to recover and improve this. This also affects the transform @spatel have added in D27489 / rL289738, and the test coverage for X86 was missing. But after i have added it, and looked at the changes in MCA, i'm somewhat confused. {F6093591} {F6093592} {F6093593} I'd say this regression is an improvement, since `IPC` increased in that case? Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: andreadb, llvm-commits, spatel Differential Revision: https://reviews.llvm.org/D46493 llvm-svn: 331684	2018-05-07 21:52:11 +00:00
Simon Pilgrim	1233e1234a	[X86] Split WriteFAdd/WriteFCmp/WriteFMul schedule classes Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions. Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour. llvm-svn: 331672	2018-05-07 20:52:53 +00:00
Simon Pilgrim	e480ed0b9f	[X86][AVX2] Tag VPMOVSX/VPMOVZX ymm instructions as WriteShuffle256 These are more like cross-lane shuffles than regular shuffles - we already do this for AVX512 equivalents. Differential Revision: https://reviews.llvm.org/D46229 llvm-svn: 331659	2018-05-07 18:25:19 +00:00
Simon Pilgrim	763bf12085	[X86][Znver1] Remove WriteFMul/WriteFRcp InstRW overrides/aliases. Fixes x87 schedules to more closely match Agner - AMD doesn't tend to "special case" x87 instructions as much as Intel. llvm-svn: 331645	2018-05-07 16:34:26 +00:00
Simon Pilgrim	ac5d0a31ef	[X86] Split WriteFDiv schedule classes to support single/double scalar, XMM and YMM/ZMM instructions. This removes all InstrRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values. llvm-svn: 331643	2018-05-07 16:15:46 +00:00
Simon Pilgrim	f3ae50fca2	[X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classes WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions. WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions. This removes all InstrRW overrides for these instructions. NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner. NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80. llvm-svn: 331629	2018-05-07 11:50:44 +00:00
Craig Topper	c882014f43	[X86] Fix copy/paste mistake in comment. NFC llvm-svn: 331611	2018-05-07 00:47:02 +00:00
Craig Topper	cb2abc7977	[X86] Enable reciprocal estimates for v16f32 vectors by using VRCP14PS/VRSQRT14PS Summary: The legacy VRCPPS/VRSQRTPS instructions aren't available in 512-bit versions. The new increased precision versions are. So we can use those to implement v16f32 reciprocal estimates. For KNL CPUs we can probably use VRCP28PS/VRSQRT28PS and avoid the NR step altogether, but I leave that for a future patch. Reviewers: spatel Reviewed By: spatel Subscribers: RKSimon, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D46498 llvm-svn: 331606	2018-05-06 17:48:21 +00:00
Simon Pilgrim	0e51a125ea	[X86] Add WriteEMMS scheduler class Filled in the missing values from Btver2 SoG or Agner llvm-svn: 331546	2018-05-04 18:16:13 +00:00
Simon Pilgrim	d7ffbc5c7e	[X86] Finish splitting WriteVecShift and WriteVecIMul to remove InstRW overrides. llvm-svn: 331543	2018-05-04 17:47:46 +00:00
Clement Courbet	b18c34bc29	[llvm-exegesis] Fix pfm counter names for BDW. Summary: They are not consistent with other microarchitectures. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D46434 llvm-svn: 331532	2018-05-04 15:26:12 +00:00
Simon Pilgrim	67cc246dca	[X86] Cleanup SchedWriteFMA classes and use X86SchedWriteWidths directly. Rename scalar and XMM versions, this is to match/simplify an upcoming change to split MUL/DIV/SQRT scalar/xmm/ymm/zmm classes. llvm-svn: 331531	2018-05-04 15:20:18 +00:00
Simon Pilgrim	bf4c8c0ff2	[X86] Add WriteVecMOVMSKY scheduler class llvm-svn: 331525	2018-05-04 14:54:33 +00:00
Simon Pilgrim	be51b20127	[X86] Add SchedWriteFRnd fp rounding scheduler classes Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions. Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA. llvm-svn: 331515	2018-05-04 12:59:24 +00:00
Simon Pilgrim	542b20d656	[X86] Add WriteDPPD/WriteDPPS dot product scheduler classes llvm-svn: 331489	2018-05-03 22:31:19 +00:00
Simon Pilgrim	0aed731516	[X86][Znver1] Use SchedAlias to tag microcoded scheduler classes Avoids extra entries in the class tables. Found a typo that missed the MMX_PHSUBSW instruction. llvm-svn: 331488	2018-05-03 22:12:23 +00:00
Simon Pilgrim	0720c8d90e	[X86][AVX512] VPLZCNT instructions match SchedWriteVecIMul scheduling class not SchedWriteVecALU. llvm-svn: 331473	2018-05-03 18:22:49 +00:00
Simon Pilgrim	f2d2cedab4	[X86] Split WriteVecShift/WriteVarVecShift into MMX, XMM and YMM/ZMM scheduler classes This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness. llvm-svn: 331472	2018-05-03 17:56:43 +00:00
Simon Pilgrim	f7dd6069a5	[X86] Split WriteVecALU/WritePHAdd into XMM and YMM/ZMM scheduler classes llvm-svn: 331453	2018-05-03 13:27:10 +00:00
Simon Pilgrim	39196a1dd3	[X86][AVX512] VPAVG instructions should be tagged as SchedWriteVecALU llvm-svn: 331446	2018-05-03 10:53:17 +00:00
Simon Pilgrim	93c878c76b	[X86] Split WriteVecIMul/WriteVecPMULLD/WriteMPSAD/WritePSADBW into XMM and YMM/ZMM scheduler classes Also retagged VDBPSADBW instructions as SchedWritePSADBW instead of SchedWriteVecIMul which matches the behaviour on SkylakeServer (the only thing that supports it...) llvm-svn: 331445	2018-05-03 10:31:20 +00:00
Simon Pilgrim	342ac8cd7e	[X86] Update MMX instructions to be tagged with X86SchedWriteWidths types llvm-svn: 331443	2018-05-03 09:11:32 +00:00
Clement Courbet	6794660828	[TableGen][NFC] Make ResourceCycles definitions more explicit. https://reviews.llvm.org/D46356 llvm-svn: 331439	2018-05-03 06:08:47 +00:00
Simon Pilgrim	350c22c587	[X86][SNB] Fix scheduling of MMX integer multiply instructions. The entries were being bound to the wrong class. llvm-svn: 331388	2018-05-02 19:26:14 +00:00
Simon Pilgrim	6732f6ea51	[X86] Split WriteShuffle/WriteVarShuffle + WriteBlend/WriteVarBlend into XMM and YMM/ZMM scheduler classes llvm-svn: 331386	2018-05-02 18:48:23 +00:00
Simon Pilgrim	819f218f07	[X86] Cleanup WriteFShuffle/WriteFVarShuffle (+256 variants) scheduler classes with more common default values llvm-svn: 331380	2018-05-02 17:58:50 +00:00
Simon Pilgrim	a3a4df3708	[X86] Convert most remaining XOP uses of X86SchedWritePair scheduler classes to X86SchedWriteWidths. llvm-svn: 331369	2018-05-02 16:25:41 +00:00
Simon Pilgrim	a53d330890	Fix line-endings. NFCI. llvm-svn: 331367	2018-05-02 16:16:24 +00:00
Clement Courbet	d2ff5fb536	Re-land rL331357 "[X86] Fix scheduling info for VMPSADBWYrmi." Without the rebase mess. https://reviews.llvm.org/D46356 llvm-svn: 331362	2018-05-02 14:35:48 +00:00
Simon Pilgrim	86d9f23ded	[X86] Cleanup WriteFMul scheduler classes with more common default values Intel models were targeting x87 instead of packed sse. llvm-svn: 331360	2018-05-02 14:25:32 +00:00
Clement Courbet	0f1da8f365	Revert rL331355 "[X86] Fix scheduling info for VMPSADBWYrmi." It contains unrelated changes. llvm-svn: 331357	2018-05-02 13:54:38 +00:00
Clement Courbet	a1a3095d88	[X86] Fix scheduling info for (V?)SQRTPDm on silvermont. https://reviews.llvm.org/D46356 llvm-svn: 331356	2018-05-02 13:46:14 +00:00
Clement Courbet	eeb2123a83	[X86] Fix scheduling info for VMPSADBWYrmi. https://reviews.llvm.org/D46356 llvm-svn: 331355	2018-05-02 13:40:48 +00:00
Simon Pilgrim	a1f1a3bf94	[X86] Convert most remaining AVX512 uses of X86SchedWritePair scheduler classes to X86SchedWriteWidths. We've dealt with the majority already. llvm-svn: 331353	2018-05-02 13:32:56 +00:00
Simon Pilgrim	e8671ef434	[X86] Convert most remaining uses of X86SchedWritePair scheduler classes to X86SchedWriteWidths. We've dealt with the majority already. llvm-svn: 331347	2018-05-02 12:27:54 +00:00
Simon Pilgrim	e93fd5f1e4	[X86] Cleanup WriteFAdd/WriteFCmp scheduler classes with more common default values Intel models were targeting x87 instead of packed sse. Also fixes XOP's VFRCZ to use WriteFAdd/WriteFAddY. llvm-svn: 331340	2018-05-02 09:18:49 +00:00
Jessica Paquette	7853666438	Create a MachineBasicBlock for created IR-level BasicBlock While running the lit tests for the most recent version of D45916 (https://reviews.llvm.org/D45916), I found that a couple tests for this pass suddenly started segfaulting. Since the outliner wasn't actually doing anything to the code in either of these tests I got curious. I found that the pass doesn’t completely create the machine-level constructs necessary to actually add a MachineFunction and MachineBasicBlock to the module. This patch adds in those missing bits. After this, adding the outliner before this pass won’t cause it to segfault. You can recreate this behaviour by adding the MachineOutliner directly before the pass and having it return false immediately. https://reviews.llvm.org/D46330 llvm-svn: 331307	2018-05-01 20:49:42 +00:00
Simon Pilgrim	21caf0124f	[X86] Split WriteFMul/WriteFDiv into XMM and YMM/ZMM scheduler classes llvm-svn: 331293	2018-05-01 18:22:53 +00:00
Simon Pilgrim	c708868cb1	[X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt into XMM and YMM/ZMM scheduler classes llvm-svn: 331290	2018-05-01 18:06:07 +00:00
Simon Pilgrim	c546f9424f	[X86] Split WriteFCmp into XMM and YMM/ZMM scheduler classes Removes more WriteFCmp InstRW overrides llvm-svn: 331283	2018-05-01 16:50:16 +00:00
Simon Pilgrim	5269167f5b	[X86] Split WriteFAdd into XMM and YMM/ZMM scheduler classes Removes more WriteFAdd InstRW overrides llvm-svn: 331276	2018-05-01 16:13:42 +00:00
Simon Pilgrim	1b7a80d80a	[X86] Convert all uses of WriteFAdd to X86SchedWriteWidths. In preparation of splitting WriteFAdd by vector width. llvm-svn: 331273	2018-05-01 15:57:17 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Simon Pilgrim	dd8eae128b	[X86] Split WriteFShuffle into XMM and YMM/ZMM scheduler classes Removes more WriteFShuffle InstRW overrides llvm-svn: 331264	2018-05-01 14:25:01 +00:00
Simon Pilgrim	f6b81dae9e	[X86] Convert all uses of WriteFShuffle to X86SchedWriteWidths. In preparation of splitting WriteFShuffle by vector width. llvm-svn: 331262	2018-05-01 14:14:42 +00:00
Simon Pilgrim	57f2b185ac	[X86] Split WriteVecLogic into XMM and YMM/ZMM scheduler classes This removes all the WriteVecLogic InstRW overrides. llvm-svn: 331258	2018-05-01 12:39:17 +00:00
Simon Pilgrim	6f710a6440	[X86] Convert all uses of WriteFLogic/WriteVecLogic to X86SchedWriteWidths. In preparation of splitting WriteVecLogic by vector width. llvm-svn: 331256	2018-05-01 12:15:29 +00:00
Simon Pilgrim	fc0c26f1a6	[X86] Tag PSLLDQ/PSRLDQ as WriteShuffle scheduler classes instead of shifts. Although they are encoded similar to bit shifts, the byte shifts behave like shuffles from a scheduling point of view. llvm-svn: 331253	2018-05-01 11:05:42 +00:00
Andrea Di Biagio	d4c58400c5	[X86] Correct spill slot size. This patch fixes a bug introduced by revision 330778 (originally reviewed at: https://reviews.llvm.org/D44782), where function isFrameLoadOpcode returned the wrong number of bytes read for opcodes VMOVSSrm and VMOVSDrm. This corrects that mistake, and extends the regression test to catch cases where the dead stores should be removed. Patch by Jeremy Morse. Differential Revision: https://reviews.llvm.org/D46256 llvm-svn: 331252	2018-05-01 10:29:38 +00:00
Gabor Buella	c8ded04e85	[X86] movdiri and movdir64b instructions Reviewers: spatel, craig.topper, RKSimon Reviewed By: craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D45983 llvm-svn: 331248	2018-05-01 10:01:16 +00:00
Craig Topper	33dc01d105	[X86] Remove 'opaque ptr' from the intel syntax parser and printer. Previously for instructions like fxsave we would print "opaque ptr" as part of the memory operand. Now we print nothing. We also no longer accept "opaque ptr" in the parser. We still accept any size to be specified for these instructions, but we may want to consider only parsing when no explicit size is specified. This what gas does. llvm-svn: 331243	2018-05-01 04:42:00 +00:00
Simon Pilgrim	3c35408e48	[X86] Introduce X86SchedWriteWidths schedule wrapper for different vector widths. We need to split most of the scheduler classes by vector width to remove more of the InstRW overrides, this patch should make this easier/tidier by allowing us to pass the X86SchedWriteWidths wrapper to multi-width multiclasses and then split as required. I've included fields for Scl (scalar float/double), MMX (MMX integer), XMM, YMM and ZMM widths. These fields mostly share the same classes but it should give us the flexibility that we may need in the future. This patch has replaced a set of example SSE/AVX512 instruction cases but isn't exhaustive as it gets very noisy before we really need the functionality. Differential Revision: https://reviews.llvm.org/D46266 llvm-svn: 331208	2018-04-30 18:18:38 +00:00
Simon Pilgrim	c80a5aa09a	[X86][Atom] Remove unnecessary x87 load/move instrw overrides. llvm-svn: 331198	2018-04-30 16:51:13 +00:00
Simon Pilgrim	0a732da261	[X86] Drop unnecessary VPORrm InstrRW override in SkylakeServer. llvm-svn: 331188	2018-04-30 15:18:33 +00:00
Simon Pilgrim	b521ec7525	[X86] Fix SkylakeServer typo in WritePSADBW class - it only uses 1 resource. llvm-svn: 331187	2018-04-30 15:17:16 +00:00
Nico Weber	432a38838d	IWYU for llvm-config.h in llvm, additions. See r331124 for how I made a list of files missing the include. I then ran this Python script: for f in open('filelist.txt'): f = f.strip() fl = open(f).readlines() found = False for i in xrange(len(fl)): p = '#include "llvm/' if not fl[i].startswith(p): continue if fl[i][len(p):] > 'Config': fl.insert(i, '#include "llvm/Config/llvm-config.h"\n') found = True break if not found: print 'not found', f else: open(f, 'w').write(''.join(fl)) and then looked through everything with `svn diff \| diffstat -l \| xargs -n 1000 gvim -p` and tried to fix include ordering and whatnot. No intended behavior change. llvm-svn: 331184	2018-04-30 14:59:11 +00:00
Simon Pilgrim	318f816f49	[X86] Fix typo in skylake-avx512 model for PMAXSD/PMINSD instructions The PMAXSD/PMINSD instregexs had been written as PMAX(C?)SD - looks like this was a search+replace error when matching float MAXSD/MINSD commutative instructions. llvm-svn: 331167	2018-04-30 10:46:35 +00:00
Craig Topper	5a4c88005c	[X86] Add a Requires<[In64BitMode]> to FARJMP64 Otherwise we can try to assemble it in 32-bit mode and throw an assert in the encoder. llvm-svn: 331161	2018-04-30 06:21:24 +00:00
Craig Topper	d94002c414	[X86] Hide another instruction from the assembly matcher table to avoid a duplicate entry. NFC llvm-svn: 331160	2018-04-30 06:21:23 +00:00
Craig Topper	cda7690976	[X86] Remove some InstAliases aren't needed because a MnemonicAlias makes them unreachable. llvm-svn: 331159	2018-04-30 06:21:22 +00:00
Craig Topper	429ae3d775	[X86] Remove some instructions from the Intel assembly matcher table as there are equivalent mode aware InstAliases that conflict. The instructions have predicates of Not64BitMode, but there are identical strings in InstAliases that have Mode32Bit and Mode16Bit. But the ordering is uncontrolled and the less specific Not64BitMode was ordered first. This patch hides the Not64BitMode from the table so there is no conflict anymore. llvm-svn: 331158	2018-04-30 06:21:21 +00:00
Craig Topper	ebc7de07ee	[X86] Use a MnemonicAlias instead of an InstAlias. llvm-svn: 331157	2018-04-30 06:21:19 +00:00
Craig Topper	64e7a16fe4	[X86] Remove support for accepting 'fnstsw %eax' and 'fnstsw %al'. I assume this was done because gas accepted it at one point, but current versions of gas don't. llvm-svn: 331154	2018-04-30 01:53:12 +00:00
Craig Topper	b2bf3da152	[X86] Mark some more InstAliases as 'att' syntax only. These aliases are used to default the memory forms of call and jmp to the size of the operating mode. This doesn't work for Intel syntax. We have a different hack in the AsmParser code itself to force a size on unsized memory operands. llvm-svn: 331153	2018-04-30 01:53:10 +00:00
Craig Topper	5731558979	[X86] Make 64-bit sysret/sysexit not ambiguous in Intel assembly syntax. This also makes it default to the 32-bit non REX.W version in 64-bit mode. This seems to be more consistent with gas. llvm-svn: 331149	2018-04-29 22:55:54 +00:00
Simon Pilgrim	288c73e7be	[X86] Remove unnecessary BT InstRW overrides. llvm-svn: 331147	2018-04-29 18:18:51 +00:00
Simon Pilgrim	d5ada498db	[X86] Merge more instregex single matches to reduce InstrRW compile time. llvm-svn: 331143	2018-04-29 15:33:15 +00:00
Simon Pilgrim	684a719270	[X86] Remove unnecessary add/adc+sub/sbb InstRW overrides. llvm-svn: 331142	2018-04-29 14:16:17 +00:00
Craig Topper	18c4c8efaf	[X86] Add suffixes to the LGDT/LIDT/SGDT/SIDT mnemonics in Intel syntax. Add aliases based on 16/32-bit mode to choose the default. This allows the instruction selection to follow mode in Intel syntax. And allows a suffix to be used to change size. This matches gas behavior from what I could tell. llvm-svn: 331138	2018-04-29 06:24:09 +00:00
Craig Topper	ebd3e4a69c	[X86] Remove SLDT64m instruction. It doesn't really exist. The instruction always writes 16-bits of memory. Putting a REX.w on it won't change anything. While I was touching the encoding tests to remove it, I added some other missing register form test cases. llvm-svn: 331135	2018-04-29 04:50:53 +00:00
Craig Topper	8e3dbfd725	[X86] Remove unnecessary InstAliases. NFCI These used to disambiguate MOV16ms/MOV16sm from other size instructions that no longer exist. llvm-svn: 331134	2018-04-29 04:06:02 +00:00
Craig Topper	a3f52aaa19	[X86] Use getX86SubSuperRegister in addGR32orGR64Operands in the AsmParser instead of duplicating its functionality. NFC llvm-svn: 331128	2018-04-29 00:53:10 +00:00
Craig Topper	06624e1a93	[X86] Restrict many of the InstAliases to either to only att or intel syntax. NFCI Many of these aliases exist to give one syntax or the other a slightly different mnemonic and the other variant gets a duplicate of its normal mnemonic This patch restricts a lot of these to only one variant so we don't get the duplication. This removes a lot of duplicate entries from the matcher table. It also reduces the number of warnings printed when you enable the ambiguous match warning in tablegen. llvm-svn: 331117	2018-04-28 18:46:11 +00:00
Simon Pilgrim	badf63e95c	[X86] Remove unnecessary rotate-carry folded InstRW overrides. Merge some remaining instregex entries. llvm-svn: 331116	2018-04-28 18:45:16 +00:00
Simon Pilgrim	39d7720354	[X86] Remove unnecessary shift/rotate folded InstRW overrides. llvm-svn: 331110	2018-04-28 15:32:19 +00:00
Simon Pilgrim	9f561dd54a	[X86][SSE] Stop hard coding some instruction scheduler classes. Make these arguments to the multiclass to allow easier specialization. llvm-svn: 331107	2018-04-28 14:08:51 +00:00
Simon Pilgrim	c2fa056a2d	[X86][HW] Cleanup Haswell model. NFCI. Moved LAHF/SAHF to instrs instead of instregex. Removed some unnecessary instregex entries. llvm-svn: 331106	2018-04-28 14:06:28 +00:00
Craig Topper	9e11f96df6	[X86] Remove mayLoad flag from BNDMK/BNDCL/BNDCN/BNDCU. The instruction documentation specifically says that these instruction don't access memory. llvm-svn: 331105	2018-04-28 06:58:27 +00:00
Craig Topper	6d6b2b9503	[X86] Change memory operand of BNDMK/BNDCL/BNDCU/BNDCN/BNDST to anymem. These instruction don't use their memory operands as normal memory operands. They're just used as addresses. They don't have a size because they aren't directly representing a load or store. llvm-svn: 331104	2018-04-28 06:58:26 +00:00
Craig Topper	ef3866a859	[X86] Remove REX.W from 64-bit mode BND instructions. As far as I can tell from the docs, the instructions are automatically 64-bit in 64-bit mode. We don't need REX.W. llvm-svn: 331102	2018-04-28 06:02:40 +00:00
Craig Topper	8a6532ae84	[X86] Rename BNDMOV instructions and hide redundant instruction encoding from the assembler. Favor the 0x1a encoding for register/register move to match gas. The instructions used RM and MR in their name along with rr/rm/mr at the end. To make more consistent with other instructions remove the RM/MR and use rr/rm/mr/rr_REV. Hide the _REV encoding from the assembler but leave it for the disassembler. llvm-svn: 331101	2018-04-28 06:02:39 +00:00
Craig Topper	d656410293	[X86] Make the STTNI flag intrinsics use the flags from pcmpestrm/pcmpistrm if the mask instrinsics are also used in the same basic block. Summary: Previously the flag intrinsics always used the index instructions even if a mask instruction also exists. To fix fix this I've created a single ISD node type that returns index, mask, and flags. The SelectionDAG CSE process will merge all flavors of intrinsics with the same inputs to a s ingle node. Then during isel we just have to look at which results are used to know what instruction to generate. If both mask and index are used we'll need to emit two instructions. But for all other cases we can emit a single instruction. Since I had to do manual isel anyway, I've removed the pseudo instructions and custom inserter code that was working around tablegen limitations with multiple implicit defs. I've also renamed the recently added sse42.ll test case to sttni.ll since it focuses on that subset of the sse4.2 instructions. Reviewers: chandlerc, RKSimon, spatel Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46202 llvm-svn: 331091	2018-04-27 22:15:33 +00:00
Simon Pilgrim	8ee7d01dcf	[X86] Merge some x87 instruction instregex single matches. NFCI. llvm-svn: 331084	2018-04-27 21:14:19 +00:00
Simon Pilgrim	8a937e00d8	[X86] Split WriteFBlend/WriteFVarBlend/WriteFVarShuffle into XMM and YMM/ZMM scheduler classes This removes all the WriteFBlend/WriteFVarBlend InstRW overrides - some WriteFVarShuffle remain to be fixed. llvm-svn: 331065	2018-04-27 18:19:48 +00:00
Simon Pilgrim	c3c767bf50	[X86] Split WriteFHadd into XMM and YMM/ZMM scheduler classes This removes all the HADD/HSUB PS/PD InstRW overrides. llvm-svn: 331054	2018-04-27 16:11:57 +00:00
Simon Pilgrim	b2aa89c909	[X86][AVX] Split WriteFLogic into XMM and YMM/ZMM scheduler classes This removes all the AND/ANDN/OR/XOR PS/PD InstRW overrides. llvm-svn: 331051	2018-04-27 15:50:33 +00:00
Simon Pilgrim	aef5ca7299	[X86] Replace some system instruction instregex single matches with instrs entry. NFCI. llvm-svn: 331034	2018-04-27 13:32:42 +00:00
Chandler Carruth	16429acacb	[x86] Revert r330322 (& r330323): Lowering x86 adds/addus/subs/subus intrinsics The LLVM commit introduces a crash in LLVM's instruction selection. I filed http://llvm.org/PR37260 with the test case. llvm-svn: 330997	2018-04-26 21:46:01 +00:00
Lama Saba	a331f91853	[X86] Fix Update Kill Register in Avoid SFB Pass - Bug 37153 Differential Revision: https://reviews.llvm.org/D45823 Change-Id: Icf6f34f6babc3cb2ff5292fde003472473037a71 llvm-svn: 330939	2018-04-26 13:16:11 +00:00
Craig Topper	bc26f3b61b	[X86] Print 'tbyte ptr' instead of 'xword ptr' for f80mem in Intel syntax. This matches objdump. llvm-svn: 330922	2018-04-26 05:07:40 +00:00
Craig Topper	b0227189fd	[X86] Remove alignment restriction on loading folding of pcmp[ei]str* during isel too. This is a follow up to the changes in r330896 which enabled folding after isel during peephole and register allocation. llvm-svn: 330897	2018-04-26 03:53:39 +00:00
Chandler Carruth	eb631ef51e	[x86] Allow folding unaligned memory operands into pcmp[ei]str* instructions. These have special permission according to the x86 manual to read unaligned memory, and this folding is done by ICC and GCC as well. This corrects one of the issues identified in PR37246. llvm-svn: 330896	2018-04-26 03:17:25 +00:00
Simon Pilgrim	2faf606fb6	[CostModel][X86] Remove hard coded SDIV/UDIV vector costs Algorithmically compute the 'x20' SDIV/UDIV vector costs - this is necessary for PR36550 when DIV costs will be driven from the scheduler models. llvm-svn: 330870	2018-04-25 20:59:16 +00:00
Craig Topper	300e20d61c	[X86] Form MUL_IMM for multiplies with 3/5/9 to encourage LEA formation over load folding. Previously we only formed MUL_IMM when we split a constant. This blocked load folding on those cases. We should also form MUL_IMM for 3/5/9 to favor LEA over load folding. Differential Revision: https://reviews.llvm.org/D46040 llvm-svn: 330850	2018-04-25 17:35:03 +00:00
Simon Pilgrim	58e03a09db	[CostModel][X86] Recursive call for cost of imul for packed v16i16 constant shift left. Don't just assume cost = 1. llvm-svn: 330834	2018-04-25 15:22:03 +00:00
Simon Pilgrim	dbd1ae7ddd	[X86] Split WriteFMA into XMM, Scalar and YMM/ZMM scheduler classes This removes all the FMA InstRW overrides. If we ever get PR36924, then we can remove many of these declarations from models. llvm-svn: 330820	2018-04-25 13:07:58 +00:00
Simon Pilgrim	6a82e96ed9	[X86][SKX] Setup WriteFAdd and remove unnecessary InstRW scheduler overrides. llvm-svn: 330813	2018-04-25 10:51:19 +00:00
Simon Pilgrim	98e21c5ade	[X86][SNB] Remove unnecessary WriteFBlendLd InstRW scheduler overrides. llvm-svn: 330812	2018-04-25 10:50:39 +00:00
Warren Ristow	b960d2cb40	[X86] Account for partial stack slot spills (PR30821) Previously, _any_ store or load instruction was considered to be operating on a spill if it had a frameindex as an operand, and thus was fair game for optimisations such as "StackSlotColoring". This usually works, except on architectures where spills can be partially restored, for example on X86 where a spilt vector can have a single component loaded (zeroing the rest of the target register). This can be mis-interpreted and the zero extension unsoundly eliminated, see pr30821. To avoid this, this commit optionally provides the caller to isLoadFromStackSlot and isStoreToStackSlot with the number of bytes spilt/loaded by the given instruction. Optimisations can then determine that a full spill followed by a partial load (or vice versa), for example, cannot necessarily be commuted. Patch by Jeremy Morse! Differential Revision: https://reviews.llvm.org/D44782 llvm-svn: 330778	2018-04-24 22:01:50 +00:00
Simon Pilgrim	c4d25a2922	[X86][SKX] Setup WriteFMul and remove unnecessary InstRW scheduler overrides. llvm-svn: 330760	2018-04-24 19:22:01 +00:00
Simon Pilgrim	27bc83e228	[X86] Split off PHMINPOSUW to their own schedule class This also fixes Jaguar's schedule which was treating it as the WriteVecIMul default. llvm-svn: 330756	2018-04-24 18:49:25 +00:00
Simon Pilgrim	81cb67ad82	[XOP] v4i32 IFMA 'VPMACS' instructions should use the WritePMULLD schedule class llvm-svn: 330751	2018-04-24 18:13:57 +00:00
Simon Pilgrim	cf0199a289	[AVX512] VPERMQ/VPERMPD/VPERMIL single op shuffles are not variable shuffles These variants all take an immediate shuffle mask value and should be scheduled as such. llvm-svn: 330747	2018-04-24 17:59:54 +00:00
Simon Pilgrim	f0945aa0e0	[X86][F16C] Add WriteCvtF2FSt scheduling class Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887) llvm-svn: 330737	2018-04-24 16:43:07 +00:00
Simon Pilgrim	828ef9e013	[X86][BtVer2] Fix VCVTPS2PHmr/VCVTPS2PHYmr latencies These are stores, not loads, so don't need to account for load latency. llvm-svn: 330735	2018-04-24 16:26:51 +00:00
Simon Pilgrim	16299273d0	[X86] Remove unnecessary FMA reg-mem InstRW scheduler overrides. llvm-svn: 330720	2018-04-24 14:47:11 +00:00
Simon Pilgrim	f7d2a93d5f	[X86] Add vector element insertion/extraction scheduler classes Split off pinsr/pextr and extractps instructions. (Mostly) fixes PR36887. Note: It might be worth adding a WriteFInsertLd class as well in the future. Differential Revision: https://reviews.llvm.org/D45929 llvm-svn: 330714	2018-04-24 13:21:41 +00:00
Alexander Ivchenko	5717fbaf4c	[X86] Replace action Promote with Expand for operation ISD::SINT_TO_FP Summary: If attribute "use-soft-float"="true" is set then X86ISelLowering.cpp sets 'Promote' action for ISD::SINT_TO_FP operation on type i32. But 'Promote' action is not proper in this case since lib function __floatsidf is available for casting from signed int to float type. Thus Expand action is more suitable here. The Expand action should be set for ISD::UINT_TO_FP for soft float as well. If function attribute "use-soft-float"="true" is set then infinite looping can happen in DAG combining, function visitSINT_TO_FP() replaces SINT_TO_FP node with UINT_TO_FP node and function combineUIntToFP() replace vice versa in cycle. The fix prevents it. Patch by vrybalov Differential Revision: https://reviews.llvm.org/D45572 llvm-svn: 330711	2018-04-24 12:57:51 +00:00
Petar Jovanovic	e2bfcd6394	Correct dwarf unwind information in function epilogue This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts. The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific. The second part is platform independent and ensures that: * CFI instructions do not affect code generation (they are not counted as instructions when tail duplicating or tail merging) * Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary. Added CFIInstrInserter pass: * analyzes each basic block to determine cfa offset and register are valid at its entry and exit * verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors * inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue. Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D42848 llvm-svn: 330706	2018-04-24 10:32:08 +00:00
Craig Topper	19b85103a3	[X86] Add a BSWAP16 instruction using the 32-bit encoding plus a 0x66 prefix. This encoding is recognized by the CPU, but the behavior is undefined. This makes the disassembler handle it correctly so we don't print bswapl with a 16-bit register. llvm-svn: 330682	2018-04-24 04:28:02 +00:00
Simon Pilgrim	e5e4bf02d6	[X86] Remove unnecessary vector memory folded InstRW overrides. We have test coverage for these with resources-sse/avx llvm-svn: 330662	2018-04-23 22:45:04 +00:00
Simon Pilgrim	eb6090941c	[X86] Remove unnecessary BMI2 InstRW overrides. We have test coverage for these with resources-bmi2.s llvm-svn: 330659	2018-04-23 22:19:55 +00:00
Simon Pilgrim	ed09ebb48d	[X86] Remove unnecessary WriteLEA InstRW overrides. llvm-svn: 330648	2018-04-23 21:04:23 +00:00
Gabor Buella	1a2ce572bf	[X86] Revert r330638 - accidental commit llvm-svn: 330640	2018-04-23 20:05:51 +00:00
Gabor Buella	213a7cda1f	[X86] movdiri and movdir64b instructions Reviewers: craig.topper llvm-svn: 330638	2018-04-23 20:00:59 +00:00
Simon Pilgrim	8cd01aaa0f	[X86] Replace x87 instregex with instrs if they only match one instruction llvm-svn: 330611	2018-04-23 16:10:50 +00:00
Simon Pilgrim	455d0b2cfe	[X86] Remove instregex matching from CLAC/STAC. Note - noticed this as the STAC case as it was unintentionally matching against STACK pseudo instructions. llvm-svn: 330588	2018-04-23 13:24:17 +00:00
Simon Pilgrim	0a334a8668	[X86] Remove unnecessary MMX reg-mem InstRW scheduler overrides. llvm-svn: 330581	2018-04-23 11:57:15 +00:00
Craig Topper	3f1d538165	[X86] Add VEX_WIG to VEX encoded version of VCMPPSY/VCMPPDY. llvm-svn: 330563	2018-04-23 04:50:01 +00:00
Simon Pilgrim	326594bc92	[X86][Znver1] Remove unnecessary BMI1 ANDN InstRW overrides. llvm-svn: 330558	2018-04-22 21:37:08 +00:00
Simon Pilgrim	06e16541ba	[X86] Remove unnecessary WriteFBlend/WriteBlend InstRW overrides. Fixed a lot of the default classes which were being completely overridden. llvm-svn: 330554	2018-04-22 18:35:53 +00:00
Simon Pilgrim	091680b6e7	[X86] Remove unnecessary WriteFMul/WriteFRcp/WriteFRsqrt InstRW overrides. llvm-svn: 330553	2018-04-22 18:09:50 +00:00
Simon Pilgrim	b362d02229	[X86] Remove unnecessary CVT instrw overrides. llvm-svn: 330552	2018-04-22 17:54:58 +00:00
Simon Pilgrim	c7f9b183c2	[X86][SkylakeServer] Remove unnecessary PMULLD instrw overrides. llvm-svn: 330549	2018-04-22 16:51:12 +00:00
Simon Pilgrim	3e8640a93a	[X86][Atom] Remove unnecessary scalar/vector load/move instrw overrides. llvm-svn: 330548	2018-04-22 16:49:35 +00:00
Simon Pilgrim	ef8d3ae4b5	[X86] Fix (completely overridden) WriteFHAdd/WritePHAdd classes to allow us to remove unnecessary instrw overrides. llvm-svn: 330546	2018-04-22 15:25:59 +00:00
Simon Pilgrim	2fd8269c6f	[X86][MMX][SSE] Tag missed PHADD/PHSUB instructions with WritePHAdd llvm-svn: 330545	2018-04-22 15:02:23 +00:00
Simon Pilgrim	96855ec39e	[X86] Remove unnecessary WriteFVarBlend/WriteVarBlend InstRW overrides. This also fixes some of the ReadAfterLd issues due to InstRW. llvm-svn: 330544	2018-04-22 14:43:12 +00:00
Simon Pilgrim	a41ae2f005	[X86] Fix WriteMPSAD/WritePSADBW values to allow us to remove unnecessary instrw overrides. llvm-svn: 330542	2018-04-22 10:39:16 +00:00
Simon Pilgrim	523fd335b1	[X86][SandyBridge] Remove unnecessary WritePOPCNTLd overrides by fixing load latency. llvm-svn: 330541	2018-04-22 10:03:52 +00:00
Craig Topper	e958c7270e	[X86] Change TB to PS on LFENCE instruction. This matches the other FENCE instructions. llvm-svn: 330533	2018-04-22 03:15:02 +00:00
Craig Topper	2a28336f34	[X86] Remove OpSizeIgnore, it's not implemented any differently than OpSizeFixed. llvm-svn: 330532	2018-04-22 01:24:58 +00:00
Craig Topper	e33ed7d667	[X86] Remove DATA32_PREFIX. Hack the printing for DATA16_PREFIX to print 'data32' in 16-bit mode. Hack the asm parser to convert 'data32' to 'data16' in 16-bit mode. Improve the error messages to match GNU assembler. This also allows us to remove the hack from the disassembler table building. llvm-svn: 330531	2018-04-22 00:52:02 +00:00
Simon Pilgrim	37334ea67a	[X86] Strip unnecessary prefetch + vector move/load instrw overrides from scheduler models. llvm-svn: 330527	2018-04-21 21:59:36 +00:00
Simon Pilgrim	920802cc50	[X86] Strip unnecessary WriteCvtF2I instrw overrides from scheduler models. llvm-svn: 330525	2018-04-21 21:16:44 +00:00
Simon Pilgrim	825ead950e	[X86] Strip unnecessary broadcast/shuffle256 instrw overrides from scheduler models. llvm-svn: 330523	2018-04-21 20:45:12 +00:00
Simon Pilgrim	58ddaeabe2	[X86][AVX] VPERM2F128/VINSERTF128 should be a shuffle256 schedule like VPERM2I128/VINSERTI128 llvm-svn: 330522	2018-04-21 20:04:24 +00:00
Simon Pilgrim	74ccc6a303	[X86] Strip unnecessary vector integer math, shift-imm, extend, shuffle, pack/unpack instruction instrw overrides from scheduler models. llvm-svn: 330521	2018-04-21 19:11:55 +00:00
Craig Topper	fe59bea07b	[X86] Add DAG combine to turn (trunc (srl (mul ext, ext), 16) into PMULHW/PMULHUW. Ultimately I want to use this to remove the intrinsics for these instructions. llvm-svn: 330520	2018-04-21 18:39:21 +00:00
Craig Topper	05242bf691	[X86] Add SchedWrites for LDMXCSR/STMXCSR. llvm-svn: 330517	2018-04-21 18:07:36 +00:00
Simon Pilgrim	44278f6598	[X86][Haswell] Strip unnecessary WriteFAdd/WriteFHAdd instruction instrw overrides. llvm-svn: 330514	2018-04-21 16:20:28 +00:00
Simon Pilgrim	a80df0999f	[X86][Broadwell] Remove unnecessary VORPD/VORPS instrw override - missed in D45629 llvm-svn: 330513	2018-04-21 16:17:47 +00:00
Simon Pilgrim	93b102cd45	[X86] Strip unnecessary WriteFRcp/WriteFRsqrt instruction instrw overrides from scheduler models. The required the default skylake schedules to be updated - these were being completely overriden by the InstRW and the existing values not used at all. llvm-svn: 330510	2018-04-21 15:16:59 +00:00
Simon Pilgrim	2193524fb4	[X86] Strip unnecessary WriteFShuffle instruction instrw overrides from scheduler models. llvm-svn: 330508	2018-04-21 14:56:56 +00:00
Simon Pilgrim	f7f84a0ca3	[X86][SandyBridge] Strip unnecessary MOVQ/CVT instruction instrw overrides. llvm-svn: 330505	2018-04-21 14:03:40 +00:00
Simon Pilgrim	02fc375a22	[X86] Strip unnecessary MMX instruction instrw overrides from scheduler models. llvm-svn: 330503	2018-04-21 12:15:42 +00:00
Simon Pilgrim	c0f654f18e	[X86] Strip unnecessary x87 instruction instrw overrides from scheduler models. llvm-svn: 330501	2018-04-21 11:25:02 +00:00
Simon Pilgrim	d14d2e7b18	[X86] Add WriteFSign/WriteFLogic scheduler classes Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes. This unearthed a couple of things that are also handled in this patch: (1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic (2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015. (3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops. Differential Revision: https://reviews.llvm.org/D45629 llvm-svn: 330480	2018-04-20 21:16:05 +00:00
Craig Topper	173d59b62e	[X86][SandyBridge] Remove duplciate InstRWs from Sandy Brige scheduler model. llvm-svn: 330465	2018-04-20 18:55:40 +00:00
Gabor Buella	31fa8025ba	[X86] WaitPKG instructions Three new instructions: umonitor - Sets up a linear address range to be monitored by hardware and activates the monitor. The address range should be a writeback memory caching type. umwait - A hint that allows the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. tpause - Directs the processor to enter an implementation-dependent optimized state until the TSC reaches the value in EDX:EAX. Also modifying the description of the mfence instruction, as the rep prefix (0xF3) was allowed before, which would conflict with umonitor during disassembly. Before: $ echo 0xf3,0x0f,0xae,0xf0 \| llvm-mc -disassemble .text mfence After: $ echo 0xf3,0x0f,0xae,0xf0 \| llvm-mc -disassemble .text umonitor %rax Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45253 llvm-svn: 330462	2018-04-20 18:42:47 +00:00
Simon Pilgrim	df8fa6d734	[X86][BtVer2] Cleanup some old FIXMEs from the model. NFCI. llvm-svn: 330428	2018-04-20 13:12:04 +00:00
Simon Pilgrim	2f522ef13d	[X86] Tag CLDEMOTE instruction with WriteLoad scheduling class Same as other cacheline instructions llvm-svn: 330424	2018-04-20 12:54:53 +00:00
Craig Topper	bc895a3afc	[X86] Enable popcnt false dependency breaking on Silvermont and Goldmont. Silvermont and Goldmont have the same issue on popcnt as Sandy Bridge, Haswell, Broadwell, and Skylake. Believe it is fixed in Goldmont Plus. llvm-svn: 330358	2018-04-19 19:25:24 +00:00
Simon Pilgrim	4ba057dbd1	[X86][SLM] Fix typo using SandyBridge resources. Luckily this was on instructions not supported on Silvermont.... llvm-svn: 330351	2018-04-19 18:01:52 +00:00
Craig Topper	b5f2659130	[X86] Correct the scheduling data for register forms of XCHG and XADD on Intel CPUs. The XCHG16rr/XCHG32rr/XCHG64rr instructions should be 3 uops just like XCHG8rr. I believe they're just implemented as 3 move uops with a temporary register. XADD is probably 2 moves and an add also using a temporary register. Change the latency for both from 2 cycles to 3 cycles. Only 2 of the uops are serialized in their execution, the move into the temporary and the move out of the temporary. The move from one GPR to the other should be able to go in parallel with this if there are ALU resources available. llvm-svn: 330349	2018-04-19 18:00:17 +00:00
Simon Pilgrim	5e492d29a3	[X86] Merge some MMX instregex There's a lot more but I'd prefer focussing on removing unnecessary InstRWs first. llvm-svn: 330347	2018-04-19 17:32:10 +00:00
Simon Pilgrim	f21ace6cdd	[X86][BtVer2] Remove SSE4A EXTRQ/EXTRQI InstRW overrides. These are already handled identically by WriteALU. llvm-svn: 330332	2018-04-19 14:38:36 +00:00
Alexander Ivchenko	e8fed1546e	Lowering x86 adds/addus/subs/subus intrinsics (llvm part) This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations. The patch also includes folding of previously missing saturation patterns so that IR emits the same machine instructions as the intrinsics. Patch by tkrupa Differential Revision: https://reviews.llvm.org/D44785 llvm-svn: 330322	2018-04-19 12:13:30 +00:00
Simon Pilgrim	3c06617f0e	[X86][FMA] Remove FMA reg-reg InstRW scheduler overrides. These are all already handled identically by WriteFMA. llvm-svn: 330319	2018-04-19 11:37:26 +00:00
Simon Pilgrim	33dede9075	[X86][BtVer2] Remove 128-bit F16C InstRW overrides. These are already handled identically by WriteCvtF2F. llvm-svn: 330318	2018-04-19 11:16:33 +00:00
Craig Topper	f846e2d1b1	[X86] Scrub scheduling information for MUL/IMUL on Intel CPUs. This removes a bunch of unnecessary InstRW overrides. It also cleans up the missing information from the Sandy Bridge model. Other fixes to other models. llvm-svn: 330308	2018-04-19 05:34:05 +00:00
Bob Haarman	cb80a3fce0	Fix data race in X86FloatingPoint.cpp ASSERT_SORTED Summary: ASSERT_SORTED checks if a table is sorted, and uses a boolean to prevent the check from being run again if it was earlier determined that the table is in fact sorted. Unsynchronized reads and writes of that boolean triggered ThreadSanitizer's data race detection. This change rewrites the code to use std::atomic<bool> instead. Fixes PR36922. Reviewers: rnk Reviewed By: rnk Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D45742 llvm-svn: 330301	2018-04-18 23:04:09 +00:00
Craig Topper	ebf52e80c1	[X86] Correct the Defs, Uses, hasSideEffects, mayLoad, mayStore for XCHG and XADD instructions. I don't think we emit any of these from codegen except for using XCHG16ar as 2 byte NOP. llvm-svn: 330298	2018-04-18 22:07:53 +00:00
Craig Topper	04244cbf45	[X86] Fix the Uses/Defs,mayLoad,mayStore,hasSideEffects flags for the CMPXCHG instructions. The compiler only emits the locked version of these which use different instruction definitions. The versions fixed here are only used by the assembler/disassembler. llvm-svn: 330287	2018-04-18 20:15:00 +00:00
Chandler Carruth	ccd3ecb95a	[x86] Switch EFLAGS copy lowering to use reg-reg form of testing for a zero register. Previously I tried this and saw LLVM unable to transform this to fold with memory operands such as spill slot rematerialization. However, it clearly works as shown in this patch. We turn these into `cmpb $0, <mem>` when useful for folding a memory operand without issue. This form has no disadvantage compared to `testb $-1, <mem>`. So overall, this is likely no worse and may be slightly smaller in some cases due to the `testb %reg, %reg` form. Differential Revision: https://reviews.llvm.org/D45475 llvm-svn: 330269	2018-04-18 15:52:50 +00:00
Chandler Carruth	1f87618f8f	[x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite uses across basic blocks in the limited cases where it is very straight forward to do so. This will also be useful for other places where we do some limited EFLAGS propagation across CFG edges and need to handle copy rewrites afterward. I think this is rapidly approaching the maximum we can and should be doing here. Everything else begins to require either heroic analysis to prove how to do PHI insertion manually, or somehow managing arbitrary PHI-ing of EFLAGS with general PHI insertion. Neither of these seem at all promising so if those cases come up, we'll almost certainly need to rewrite the parts of LLVM that produce those patterns. We do now require dominator trees in order to reliably diagnose patterns that would require PHI nodes. This is a bit unfortunate but it seems better than the completely mysterious crash we would get otherwise. Differential Revision: https://reviews.llvm.org/D45673 llvm-svn: 330264	2018-04-18 15:13:16 +00:00
Craig Topper	dfccafe18a	[X86][Broadwell] Remove some unnecessary InstRW overrides and add some FIXMEs. llvm-svn: 330241	2018-04-18 06:41:25 +00:00
Craig Topper	513e11bb70	[X86] Give CMOV 2 cycle latency on SLM. llvm-svn: 330239	2018-04-18 06:04:30 +00:00
Craig Topper	8704612481	[X86] Don't crash on bad operand modifiers in inline assembly Summary: Previously if a modifer was placed on a non-GPR register class we would hit an assert or crash. Reviewers: echristo Reviewed By: echristo Subscribers: eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D45751 llvm-svn: 330238	2018-04-18 05:15:24 +00:00
Keith Wyss	3d86823f3d	[XRay] Typed event logging intrinsic Summary: Add an LLVM intrinsic for type discriminated event logging with XRay. Similar to the existing intrinsic for custom events, but also accepts a type tag argument to allow plugins to be aware of different types and semantically interpret logged events they know about without choking on those they don't. Relies on a symbol defined in compiler-rt patch D43668. I may wait to submit before I can see demo everything working together including a still to come clang patch. Reviewers: dberris, pelikan, eizan, rSerge, timshen Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45633 llvm-svn: 330219	2018-04-17 21:30:29 +00:00
Craig Topper	e56a2fc5e7	[X86] Add separate scheduling class for PSADBW instruction. llvm-svn: 330204	2018-04-17 19:35:19 +00:00
Craig Topper	655e1db722	[X86] Remove unnecessary InstRW overrides. Add somes FIXMEs/TODOs. llvm-svn: 330203	2018-04-17 19:35:14 +00:00
Simon Pilgrim	86e3c26924	[X86] Add FP comparison scheduler classes Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd Differential Revision: https://reviews.llvm.org/D45656 llvm-svn: 330179	2018-04-17 07:22:44 +00:00
Gabor Buella	8f1646b579	[X86] Introduce archs: goldmont-plus & tremont Using Goldmont's cost tables for these two upcoming atom archs. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45612 llvm-svn: 330109	2018-04-16 07:47:35 +00:00
Craig Topper	53f9558903	[X86] Use uint32_t instead of unsigned in GetLo32XForm for readability. NFC GetLo8XForm right next to it uses uint8_t so uint32_t is consistent. llvm-svn: 330104	2018-04-15 19:11:24 +00:00
Simon Pilgrim	b8adf558f8	[X86][MMX] Set PAVG/PHADD/PMIN/PMAX/PSIGN instructions to use same scheduler classes as SSE/AVX llvm-svn: 330085	2018-04-14 13:06:38 +00:00
Hiroshi Inoue	ae17900997	[NFC] fix trivial typos in document and comments "not not" -> "not" etc llvm-svn: 330083	2018-04-14 08:59:00 +00:00
Craig Topper	95f421cfbf	[X86] Add the bizarro movsww and movzww mnemonics for the disassembler. The destination size of the movzx/movsx instruction is controlled by the normal operand size mechanisms. Only the input type is fixed. This means that a 0x66 prefix on the encoding for zext/sext 16->32 should really produce a 16->16 instruction. Functionally this is equivalent to a GR16->GR16 move since bits 16 and above will be preserved. So nothing is actually extended. llvm-svn: 330078	2018-04-13 23:57:54 +00:00
Tim Northover	271d3d2771	MachO: trap unreachable instructions Debugability is more important than saving 4 bytes to let us to fall through to nonense. llvm-svn: 330073	2018-04-13 22:25:20 +00:00
Simon Pilgrim	0e74e50401	[X86] Remove remaining itinerary support from instructions and target (PR37093) llvm-svn: 330035	2018-04-13 15:37:56 +00:00
Simon Pilgrim	a3a9d81231	[X86] Generalize X86FixupLEAs to work with TargetSchedModel Similar to rL329834, don't rely on itinerary scheduler model to determine latencies for LEA thresholds, use the generic TargetSchedModel::computeInstrLatency call. llvm-svn: 330030	2018-04-13 15:09:39 +00:00
Simon Pilgrim	01637c473f	Remove comment reference to itineraries. NFCI. llvm-svn: 330025	2018-04-13 14:42:48 +00:00
Simon Pilgrim	fe3d59e98b	[X86][AVX512] UNPCKL/H PS and PD should be scheduled with WriteFShuffle not WriteFAdd llvm-svn: 330023	2018-04-13 14:41:05 +00:00
Simon Pilgrim	21e89795cc	[X86] Remove remaining OpndItins/SizeItins from all instruction defs (PR37093) llvm-svn: 330022	2018-04-13 14:36:59 +00:00
Simon Pilgrim	e0c7868ded	Remove comment references to itineraries. NFCI. llvm-svn: 330021	2018-04-13 14:31:57 +00:00
Simon Pilgrim	963bf4de2b	Remove out of data comment. NFCI. llvm-svn: 330019	2018-04-13 14:24:06 +00:00
Simon Pilgrim	ae0c2711b6	[X86] Remove OpndItins/SizeItins from all sse instruction defs (PR37093) llvm-svn: 330013	2018-04-13 12:50:31 +00:00
Hiroshi Inoue	372ffa15cb	[NFC] fix trivial typos in comments "the the" -> "the", "we we" -> "we", etc llvm-svn: 330006	2018-04-13 11:37:06 +00:00
Gabor Buella	604be4424b	[X86] Introduce cldemote instruction Hint to hardware to move the cache line containing the address to a more distant level of the cache without writing back to memory. Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45256 llvm-svn: 329992	2018-04-13 07:35:08 +00:00
Craig Topper	254ed028a4	[X86] Remove the pmuldq/pmuldq intrinsics and replace with native IR. This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics. We lost some InstCombine SimplifyDemandedBit optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well. llvm-svn: 329990	2018-04-13 06:07:18 +00:00
Simon Pilgrim	1f070c334c	[X86] Remove unused MoveLoadStoreItins/ShiftOpndItins schedule class wrappers. Was being used to move around empty/unused itineraries... llvm-svn: 329970	2018-04-12 22:57:34 +00:00
Simon Pilgrim	6551d405dc	[X86] Remove x86 InstrItinClass entries (PR37093) This removes the last of the x86 schedule itineraries, I'm intending to cleanup the remaining uses of NoItinerary/OpndItins/etc. before resolving PR37093. llvm-svn: 329967	2018-04-12 22:44:47 +00:00
Simon Pilgrim	0e45634f4e	[X86] Remove InstrItinClass entries from all x86 instruction defs (PR37093) llvm-svn: 329953	2018-04-12 20:47:34 +00:00
Simon Pilgrim	e9376b9fdc	[X86] Remove InstrItinClass entries from SSE/AVX instructions defs (PR37093) llvm-svn: 329945	2018-04-12 19:59:35 +00:00
Simon Pilgrim	577ae24feb	[X86] Remove explicit SSE/AVX schedule itineraries from defs (PR37093) llvm-svn: 329940	2018-04-12 19:25:07 +00:00
Simon Pilgrim	35935c0632	[X86] Remove remaining gpr schedule itineraries (PR37093) llvm-svn: 329938	2018-04-12 18:46:15 +00:00
Gabor Buella	297c138798	[X86] Introduce LLVM wbinvd intrinsic A previously missing intrinsic for an old instruction. Reviewers: craig.topper, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45312 llvm-svn: 329936	2018-04-12 18:38:18 +00:00
Simon Pilgrim	dec781c141	[X86] Remove gpr shift/extension schedule itineraries (PR37093) llvm-svn: 329933	2018-04-12 18:25:38 +00:00
Simon Pilgrim	8904a86f65	[X86] Remove AES/CLMUL/CRC32/LDDQU/MOVNT/POPCNT/SHA schedule itineraries (PR37093) llvm-svn: 329912	2018-04-12 14:31:42 +00:00
Simon Pilgrim	294556d40e	[X86] Remove remaining system/special schedule itineraries (PR37093) llvm-svn: 329906	2018-04-12 12:43:49 +00:00
Simon Pilgrim	0cd0fbd8c5	[X86] Remove system/control schedule itineraries (PR37093) llvm-svn: 329903	2018-04-12 12:09:24 +00:00
Simon Pilgrim	69e0e8e3d4	[X86] Remove CMOV/SETCC schedule itineraries (PR37093) llvm-svn: 329898	2018-04-12 11:01:40 +00:00
Simon Pilgrim	10e3bdaaa8	[X86] Remove MMX/3DNow schedule itineraries (PR37093) llvm-svn: 329896	2018-04-12 10:49:57 +00:00
Simon Pilgrim	32d368147f	[X86] Remove X87 schedule itineraries (PR37093) First of a number of commits to remove x86 schedule itineraries entirely - approved off-line with @craig.topper llvm-svn: 329893	2018-04-12 10:27:37 +00:00
Simon Pilgrim	7b88d09e75	[X86] Remove unused itinerary argument from FMA3/FMA4/XOP instructions. NFCI. llvm-svn: 329862	2018-04-11 23:24:38 +00:00
Gabor Buella	2ef36f3571	[X86] Describe wbnoinvd instruction Similar to the wbinvd instruction, except this one does not invalidate caches. Ring 0 only. The encoding matches a wbinvd instruction with an F3 prefix. Reviewers: craig.topper, zvi, ashlykov Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D43816 llvm-svn: 329847	2018-04-11 20:01:57 +00:00
Simon Pilgrim	8fc2b49620	[X86][Atom] Convert Atom scheduler model to SchedRW (PR32431) Atom is the only x86 target that still uses schedule itineraries, if we can remove this then we can begin the work on removing x86 itineraries. I've also found that it will help with PR36550. I've focussed on matching the existing model as closely as possible (relying on the schedule tests), PR36895 indicated a lot of these were incorrect but we can just as easily fix these after this patch as before. Hopefully we can get llvm-exegesis to help here, There are a few instructions that rely on itinerary scheduling (mainly push/pop/return) of multiple resource stages, but I don't think any of these are show stoppers. There are also a few codegen changes that seem related to the post-ra scheduler acting a little differently, I haven't tracked these down but they don't seem critical. NOTE: I don't have access to any Atom hardware, so this hasn't been tested in the wild. Differential Revision: https://reviews.llvm.org/D45486 llvm-svn: 329837	2018-04-11 18:23:01 +00:00
Simon Pilgrim	7f321d8c24	[X86] Generalize X86PadShortFunction to work with TargetSchedModel Pre-commit for D45486, don't rely on itinerary scheduler model to determine latencies for padding, use the generic TargetSchedModel::computeInstrLatency call. Also, replace hard coded (atom specific) 2*uop creation per padding cycle with a version based on the scheduler model's issue width. Differential Revision: https://reviews.llvm.org/D45486 llvm-svn: 329834	2018-04-11 18:05:17 +00:00
Simon Pilgrim	89c8a10f7c	[X86] Add variable shuffle schedule classes Split variable index shuffles from immediate index shuffles WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.) WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.) WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.) WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.) Differential Revision: https://reviews.llvm.org/D45404 llvm-svn: 329806	2018-04-11 13:49:19 +00:00
Craig Topper	9507fa358c	[X86] Remove 128/256-bit masked pmaddubsw and pmaddwd intrinsics. Replace 512-bit masked intrinsic with unmasked intrinsic and a select. The 128/256-bit versions were no longer used by clang. It uses the legacy SSE/AVX2 version and a select. The 512-bit was changed to the same for consistency. llvm-svn: 329774	2018-04-11 04:55:04 +00:00
Craig Topper	ee2c1dea4d	[X86] In X86FlagsCopyLowering, when rewriting a memory setcc we need to emit an explicit MOV8mr instruction. Previously the code only knew how to handle setcc to a register. This should fix a crash in the chromium build. llvm-svn: 329771	2018-04-11 01:09:10 +00:00
Sriraman Tallam	d693093a65	GOTPCREL references must always use RIP. With -fno-plt, global value references can use GOTPCREL and RIP must be used. Differential Revision: https://reviews.llvm.org/D45460 llvm-svn: 329765	2018-04-10 22:50:05 +00:00
Gabor Buella	213edc4a15	[X86] Split up -march=icelake to -client & -server Reviewers: craig.topper, zvi, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45055 llvm-svn: 329742	2018-04-10 18:59:13 +00:00
Craig Topper	442428540a	[X86] Change the name string for the newly add DF flag register to 'dirflag' to match the clobber name supported by clang for MS inline assembly. This should fix the failure found by Chromium reported here https://bugs.chromium.org/p/chromium/issues/detail?id=831158 The test case will be added in clang. llvm-svn: 329734	2018-04-10 18:21:04 +00:00
Simon Pilgrim	95f941117c	Fix whitespace indentation. NFCI. llvm-svn: 329704	2018-04-10 14:21:33 +00:00
Gabor Buella	3eab22d896	[X86] Disable SGX for Skylake Server Reviewers: craig.topper, zvi, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45057 llvm-svn: 329700	2018-04-10 13:58:57 +00:00
Andrea Di Biagio	486358c153	[X86][Broadwell] HWPort5 should not be added to BroadwellModelProcResources. The BroadwellModelProcResources had an entry for HWPort5, which is a Haswell resource, and not a Broadwell processor resource. That entry was added to the Broadwell model because variable blends were consuming it. This was clearly a typo (the resource name should have been BWPort5), which unfortunately was never caught before. It was not reported as an error because HWPort5 is a resource defined by the Haswell model. It has been found when testing some code with llvm-mca: the list of resources in the resource pressure view was odd. This patch fixes the issue; now variable blend instructions consume 2 cycles on BWPort5 instead of HWPort5. This is enough to get rid of the extra (spurious) entry in the BroadWellModelProcResources table. llvm-svn: 329686	2018-04-10 10:49:41 +00:00
Clement Courbet	b449379eae	[MC][TableGen] Add optional libpfm counter names for ProcResUnits. Summary: Subtargets can define the libpfm counter names that can be used to measure cycles and uops issued on ProcResUnits. This allows making llvm-exegesis available on more targets. Fixes PR36984. Reviewers: gchatelet, RKSimon, andreadb, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45360 llvm-svn: 329675	2018-04-10 08:16:37 +00:00
Chandler Carruth	0ca3bd0729	[x86] Model the direction flag (DF) separately from the rest of EFLAGS. This cleans up a number of operations that only claimed te use EFLAGS due to using DF. But no instructions which we think of us setting EFLAGS actually modify DF (other than things like popf) and so this needlessly creates uses of EFLAGS that aren't really there. In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD, and the whole-flags writes (WRFLAGS and POPF) need to model this. I've also somewhat cleaned up some of the flag management instruction definitions to be in the correct .td file. Adding this extra register also uncovered a failure to use the correct datatype to hold X86 registers, and I've corrected that as necessary here. Differential Revision: https://reviews.llvm.org/D45154 llvm-svn: 329673	2018-04-10 06:40:51 +00:00
Craig Topper	7e42af87a6	[X86] Prevent folding loads with 64-bit ANDs with immediates that fit in 32-bits. Prefer to use the 32-bit AND with immediate instead. Primarily I'm doing this to ensure that immediates created by shrinkAndImmediate will always get absorbed into the AND. But I do believe this would be a reduction in the number of uops that need to execute. Ideally we should shrink the 'and' and the 'load' during DAG combine to re-enable the fold. Fixes PR37063. llvm-svn: 329667	2018-04-10 03:44:15 +00:00
Chandler Carruth	19618fc639	[x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues. The key idea is to lower COPY nodes populating EFLAGS by scanning the uses of EFLAGS and introducing dedicated code to preserve the necessary state in a GPR. In the vast majority of cases, these uses are cmovCC and jCC instructions. For such cases, we can very easily save and restore the necessary information by simply inserting a setCC into a GPR where the original flags are live, and then testing that GPR directly to feed the cmov or conditional branch. However, things are a bit more tricky if arithmetic is using the flags. This patch handles the vast majority of cases that seem to come up in practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of partially preserved EFLAGS as LLVM doesn't currently model that at all. There are a large number of operations that techinaclly observe EFLAGS currently but shouldn't in this case -- they typically are using DF. Currently, they will not be handled by this approach. However, I have never seen this issue come up in practice. It is already pretty rare to have these patterns come up in practical code with LLVM. I had to resort to writing MIR tests to cover most of the logic in this pass already. I suspect even with its current amount of coverage of arithmetic users of EFLAGS it will be a significant improvement over the current use of pushf/popf. It will also produce substantially faster code in most of the common patterns. This patch also removes all of the old lowering for EFLAGS copies, and the hack that forced us to use a frame pointer when EFLAGS copies were found anywhere in a function so that the dynamic stack adjustment wasn't a problem. None of this is needed as we now lower all of these copies directly in MI and without require stack adjustments. Lots of thanks to Reid who came up with several aspects of this approach, and Craig who helped me work out a couple of things tripping me up while working on this. Differential Revision: https://reviews.llvm.org/D45146 llvm-svn: 329657	2018-04-10 01:41:17 +00:00
Vlad Tsyrklevich	0cdc6ec535	ShadowCallStack/x86_64: Ignore pseudo-machine instructions llvm-svn: 329656	2018-04-10 01:31:01 +00:00
Craig Topper	47b2f9d836	[X86] Don't use Lower512IntUnary to split bitcasts with v32i16/v64i8 types on targets without AVX512BW. LowerIntUnary as its name says has an assert for integer types. But for the bitcast case one side might be an FP type. Rather than making sure the function really works for fp types and renaming it. Just do really basic splitting directly. The LowerIntUnary has the advantage that it can peek through BUILD_VECTOR because every other call is during Lowering. But these calls are during legalization and will be followed by a DAG combine round. Revert some change to LowerVectorIntUnary that were originally made just to make these two calls work even in pure integer cases. This was found purely by compiling the avx512f-builtins.c test from clang so I've copied over the offending function from that. llvm-svn: 329616	2018-04-09 20:37:14 +00:00
Craig Topper	0c2a12cb3e	[X86] Revert the SLM part of r328914. While it appears to be correct information based on Intel's optimization manual and Agner's data, it causes perf regressions on a couple of the benchmarks in our internal list. llvm-svn: 329593	2018-04-09 17:07:40 +00:00
Simon Pilgrim	e5ed5e2cba	[X86][MMX] Fix missing itinerary for PALIGNR llvm-svn: 329568	2018-04-09 13:52:33 +00:00
Simon Pilgrim	140fee078f	[X86][MMX] Fix missing itinerary for MOVQ2DQ instruction format llvm-svn: 329567	2018-04-09 13:42:14 +00:00
Simon Pilgrim	abf3611332	[X86][MMX] Fix missing itinerary for CVTPI2PS llvm-svn: 329565	2018-04-09 13:27:47 +00:00
Simon Pilgrim	0047efdd1e	[X86][MMX] Fix flipped reg/mem typo in MMX_MISC_FUNC_ITINS The RR/RM itineraries were the wrong way around llvm-svn: 329561	2018-04-09 13:02:07 +00:00
Simon Pilgrim	6131286553	[X86][SSE] Fix f32 mul/div itinerary groups typo The RM folded itineraries were incorrectly using the f64 version. llvm-svn: 329556	2018-04-09 10:45:53 +00:00
Sanjay Patel	0d7df36c66	[TargetSchedule] shrink interface for init(); NFCI The TargetSchedModel is always initialized using the TargetSubtargetInfo's MCSchedModel and TargetInstrInfo, so we don't need to extract those and pass 3 parameters to init(). Differential Revision: https://reviews.llvm.org/D44789 llvm-svn: 329540	2018-04-08 19:56:04 +00:00
Craig Topper	b7baa358f6	[X86] Add SchedWrites for CMOV and SETCC. Use them to remove InstRWs. Summary: Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops. This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs. The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc. Reviewers: RKSimon, andreadb, GGanesh Reviewed By: RKSimon Subscribers: GGanesh, llvm-commits Differential Revision: https://reviews.llvm.org/D45380 llvm-svn: 329539	2018-04-08 17:53:18 +00:00
Craig Topper	c362f42b6a	[X86][Znver1] Remove InstRWs for BLENDVPS/PD Summary: This removes the InstRWs for BLENDVPS/PD in favor of WriteFVarBlend. The latency listed was 3 cycles but WriteFVarBlend is defined as 1 cycle latency. The 1 cycle latency matches Agner Fog's data. The patterns were missing the VEX forms which is why there are no test changes. We don't test "-mcpu=znver1 -mattr=-avx" Reviewers: RKSimon, GGanesh Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44841 llvm-svn: 329538	2018-04-08 17:53:15 +00:00
Mandeep Singh Grang	68a151a13c	[X86] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused due to undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace all std::sort to llvm::sort. Refer the comments section in D44363 for a list of all the required patches. Reviewers: chandlerc, craig.topper, RKSimon Reviewed By: chandlerc, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44874 llvm-svn: 329534	2018-04-08 16:42:52 +00:00
Simon Pilgrim	86588fc809	[X86][Btver2] Add vector extract costs llvm-svn: 329524	2018-04-08 11:26:26 +00:00
Craig Topper	ef37aebc96	[X86] Combine vXi64 multiplies to MULDQ/MULUDQ during DAG combine instead of lowering. Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget llvm-svn: 329510	2018-04-07 19:09:52 +00:00
Simon Pilgrim	80ce1dde44	[CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets llvm-svn: 329498	2018-04-07 13:24:33 +00:00
Craig Topper	c50570fb4f	[X686] Add appropriate ReadAfterLd for the register input to memory forms of ADC/SBB. llvm-svn: 329424	2018-04-06 17:12:18 +00:00
Craig Topper	b9d298ecf2	[X86] Remove InstRWs for basic arithmetic instructions from Sandy Bridge scheduler model. We can get this right through WriteALU and friends now. llvm-svn: 329417	2018-04-06 16:29:31 +00:00
Craig Topper	f0d042619b	[X86] Attempt to model basic arithmetic instructions in the Haswell/Broadwell/Skylake scheduler models without InstRWs Summary: This patch removes InstRW overrides for basic arithmetic/logic instructions. To do this I've added the store address port to RMW. And used a WriteSequence to make the latency additive. It does not cover ADC/SBB because they have different latency. Apparently we were inconsistent about whether the store has latency or not thus the test changes. I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5. Reviewers: RKSimon, andreadb Reviewed By: andreadb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45351 llvm-svn: 329416	2018-04-06 16:16:48 +00:00
Craig Topper	f131b60049	[X86] Add an extra store address cycle to WriteRMW in the Sandy Bridge/Broadwell/Haswell/Skylake scheduler model. Even those the address was calculated for the load, its calculated again for the store. llvm-svn: 329415	2018-04-06 16:16:46 +00:00
Craig Topper	22d25a08ae	[X86] Merge itineraries for CLC, CMC, and STC. These are very simple flag setting instructions that appear to only be a single uop. They're unlikely to need this separation. llvm-svn: 329414	2018-04-06 16:16:43 +00:00
Simon Pilgrim	09eeb3a8b9	[X86][SandyBridge] Add (V)DPPS memory fold latencies Noticed this during D44654 llvm-svn: 329389	2018-04-06 11:25:21 +00:00
Simon Pilgrim	8a83f16ccd	[X86][SandyBridge] SBWriteResPair +5cy Memory Folds As mentioned on D44647, this patch increases the default memory latency to +5cy , which more closely matches what most custom cases are doing for reg-mem instructions. I've bumped LoadLatency, ReadAfterLd and WriteLoad values to 5cy to be consistent. As Sandy Bridge is currently our default generic model, this affects a lot of scheduling tests... Differential Revision: https://reviews.llvm.org/D44654 llvm-svn: 329388	2018-04-06 11:00:51 +00:00
Simon Pilgrim	fd1f4fe54e	[X86][SkylakeServer] Merge 2 InstRW entries to the same sched group. NFCI. llvm-svn: 329386	2018-04-06 10:16:36 +00:00
Craig Topper	fbe3132f67	[X86] Separate CDQ and CDQE in the scheduler model. According to Agner's data, CDQE is closer to CWDE. llvm-svn: 329354	2018-04-05 21:56:19 +00:00
Craig Topper	4cc3827791	[X86] Add MOVZPQILo2PQIrr to the Sandy Bridge scheduler model llvm-svn: 329351	2018-04-05 21:40:32 +00:00
Craig Topper	3b0b96c591	[X86] Add LEAVE instruction to the scheduler models using the same data as LEAVE64. Make LEAVE/LEAVE64 more correct on Sandy Bridge. This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64. The Sandy Bridge version was missing a load port use. llvm-svn: 329347	2018-04-05 21:16:26 +00:00
Craig Topper	c6bb36a3d0	[X86] Remove some InstRWs for plain store instructions on Sandy Bridge. We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion. llvm-svn: 329339	2018-04-05 20:04:06 +00:00
Craig Topper	9eec2025c5	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329330	2018-04-05 18:38:45 +00:00
Craig Topper	665f74414d	[X86] Disassembler support for having an ADSIZE prefix affect instructions with 0xf2 and 0xf3 prefixes. Needed to support umonitor from D45253. llvm-svn: 329327	2018-04-05 18:20:14 +00:00
Craig Topper	6ecdb03f16	[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256. llvm-svn: 329310	2018-04-05 16:32:48 +00:00
Andrea Di Biagio	c74ad502ce	[MC][Tablegen] Allow models to describe the retire control unit for llvm-mca. This patch adds the ability to describe properties of the hardware retire control unit. Tablegen class RetireControlUnit has been added for this purpose (see TargetSchedule.td). A RetireControlUnit specifies the size of the reorder buffer, as well as the maximum number of opcodes that can be retired every cycle. A zero (or negative) value for the reorder buffer size means: "the size is unknown". If the size is unknown, then llvm-mca defaults it to the value of field SchedMachineModel::MicroOpBufferSize. A zero or negative number of opcodes retired per cycle means: "there is no restriction on the number of instructions that can be retired every cycle". Models can optionally specify an instance of RetireControlUnit. There can only be up-to one RetireControlUnit definition per scheduling model. Information related to the RCU (RetireControlUnit) is stored in (two new fields of) MCExtraProcessorInfo. llvm-mca loads that information when it initializes the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp). This patch fixes PR36661. Differential Revision: https://reviews.llvm.org/D45259 llvm-svn: 329304	2018-04-05 15:41:41 +00:00
Craig Topper	15303dda0d	[X86] Revert r329251-329254 It's failing on the bots and I'm not sure why. This reverts: [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. [X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256. [X86] Remove some InstRWs for plain store instructions on Sandy Bridge. [X86] Auto-generate complete checks. NFC llvm-svn: 329256	2018-04-05 05:19:36 +00:00
Craig Topper	25c7110a37	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329254	2018-04-05 04:42:03 +00:00
Craig Topper	4b1fdd4921	[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256. llvm-svn: 329253	2018-04-05 04:42:02 +00:00
Craig Topper	5c36557426	[X86] Auto-generate complete checks. NFC llvm-svn: 329251	2018-04-05 04:41:59 +00:00
Jessica Paquette	bccd18b816	[MachineOutliner] Add `useMachineOutliner` target hook The MachineOutliner has a bunch of target hooks that will call llvm_unreachable if the target doesn't implement them. Therefore, if you enable the outliner on such a target, it'll just crash. It'd be much better if it'd just not run the outliner at all in this case. This commit adds a hook to TargetInstrInfo that returns false by default. Targets that implement the hook make it return true. The outliner checks the return value of this hook to decide whether or not to continue. llvm-svn: 329220	2018-04-04 19:13:31 +00:00
Craig Topper	498875fab0	[X86] Separate BSWAP32r and BSWAP64r scheduling data in SandyBridge/Haswell/Broadwell/Skylake scheduler models. The BSWAP64r version is 2 uops and BSWAP32r is only 1 uop. The regular expressions also looked for a non-existant BSWAP16r. llvm-svn: 329211	2018-04-04 17:54:19 +00:00
Nico Weber	1cbd096914	Sort targetgen calls in lib/Target/*/CMakeLists. Makes it easier to see mistakes such as the one fixed in r329178 and makes the different target CMakeLists more consistent. Also remove some stale-looking comments from the Nios2 target cmakefile. No intended behavior change. llvm-svn: 329181	2018-04-04 12:37:44 +00:00
Benjamin Kramer	1fc0da4849	Make helpers static. NFC. llvm-svn: 329170	2018-04-04 11:45:11 +00:00
Craig Topper	a30db995b3	[X86] Use the same predicate for the load for PMOVSXBQ and PMOVZXBQ. These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both. llvm-svn: 329154	2018-04-04 07:00:24 +00:00
Craig Topper	a3cac956fc	[X86] Use loadi16/loadi32 predicates in multiply patterns llvm-svn: 329153	2018-04-04 07:00:19 +00:00
Craig Topper	88e38e3e3e	[X86] Remove more dead code left over from the handling of i8/i16 UMUL_LOHI/SMUL_LOHI that is no longer needed. NFC llvm-svn: 329152	2018-04-04 07:00:16 +00:00
Craig Topper	afa22edcf0	[X86] Remove dead code for handling i8/i16 UMUL_LOHI/SMUL_LOHI from X86ISelDAGToDAG.cpp. NFC These are promoted to i16/i32 multiplies by a DAG combine. llvm-svn: 329147	2018-04-04 04:38:55 +00:00
Craig Topper	3064c15dc3	[X86] Remove some code that was only needed when i1 was a legal type. NFC llvm-svn: 329146	2018-04-04 04:38:54 +00:00
Vlad Tsyrklevich	b324733169	Fix bad #include path in r329139 llvm-svn: 329140	2018-04-04 01:34:42 +00:00
Vlad Tsyrklevich	e3446017ed	Add the ShadowCallStack pass Summary: The ShadowCallStack pass instruments functions marked with the shadowcallstack attribute. The instrumented prolog saves the return address to [gs:offset] where offset is stored and updated in [gs:0]. The instrumented epilog loads/updates the return address from [gs:0] and checks that it matches the return address on the stack before returning. Reviewers: pcc, vitalybuka Reviewed By: pcc Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D44802 llvm-svn: 329139	2018-04-04 01:21:16 +00:00
Jessica Paquette	5fa2a63785	[MachineOutliner] Test for X86FI->getUsesRedZone() as well as Attribute::NoRedZone This commit is similar to r329120, but uses the existing getUsesRedZone() function in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a function truly uses a redzone instead of just the noredzone attribute on a function. Thus, after this commit, it's possible to outline from x86 without using -mno-red-zone and still get outlining results. This also adds a new test for the new redzone behaviour. llvm-svn: 329134	2018-04-03 23:32:41 +00:00
Andrea Di Biagio	9da4d6db33	[MC][Tablegen] Allow the definition of processor register files in the scheduling model for llvm-mca This patch allows the description of register files in processor scheduling models. This addresses PR36662. A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td. Targets can optionally describe register files for their processors using that class. In particular, class RegisterFile allows to specify: - The total number of physical registers. - Which target registers are accessible through the register file. - The cost of allocating a register at register renaming stage. Example (from this patch - see file X86/X86ScheduleBtVer2.td) def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]> Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar (btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM register definitions only cost 1 physical register. The syntax allows to specify an empty set of register classes. An empty set of register classes means: this register file models all the registers specified by the Target. For each register class, users can specify an optional register cost. By default, register costs default to 1. A value of 0 for the number of physical registers means: "this register file has an unbounded number of physical registers". This patch is structured in two parts. * Part 1 - MC/Tablegen * A first part adds the tablegen definition of RegisterFile, and teaches the SubtargetEmitter how to emit information related to register files. Information about register files is accessible through an instance of MCExtraProcessorInfo. The idea behind this design is to logically partition the processor description which is only used by external tools (like llvm-mca) from the processor information used by the llvm machine schedulers. I think that this design would make easier for targets to get rid of the extra processor information if they don't want it. * Part 2 - llvm-mca related * The second part of this patch is related to changes to llvm-mca. The main differences are: 1) class RegisterFile now needs to take into account the "cost of a register" when allocating physical registers at register renaming stage. 2) Point 1. triggered a minor refactoring which lef to the removal of the "maximum 32 register files" restriction. 3) The BackendStatistics view has been updated so that we can print out extra details related to each register file implemented by the processor. The effect of point 3. is also visible in tests register-files-[1..5].s. Differential Revision: https://reviews.llvm.org/D44980 llvm-svn: 329067	2018-04-03 13:36:24 +00:00
Craig Topper	9b6a65b9ef	[X86] Reduce number of OpPrefix bits in TSFlags to 2. NFCI TSFlag doesn't need to disambiguate NoPrfx from PS. So shift the encodings so PS is NoPrfx\|0x4. llvm-svn: 329049	2018-04-03 06:37:04 +00:00
Lama Saba	927468309f	[X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346 If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory. A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles. This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store. The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies. breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM. Differential revision: https://reviews.llvm.org/D41330 Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9 llvm-svn: 328973	2018-04-02 13:48:28 +00:00
Craig Topper	96729cd64b	[X86][Silvermont] Use correct latency and throughput information for divide and square root in the scheduler model. Data taken from Table 16-17 in the Intel Optimization Manual. llvm-svn: 328962	2018-04-02 06:34:16 +00:00
Craig Topper	6a814904da	[X86][SkylakeServer] Correct throughput for 512-bit sqrt and divide. Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/ llvm-svn: 328961	2018-04-02 05:54:34 +00:00
Craig Topper	8104f266a4	[X86] Correct the throughput for divide instructions in Sandy Bridge/Haswell/Broadwell/Skylake scheduler models. Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those. llvm-svn: 328960	2018-04-02 05:33:28 +00:00
Craig Topper	dc74094398	[X86] Fix the SchedRW for AVX512 shift instructions. It was being inadvertently defaulted to an FADD scheduler class. llvm-svn: 328959	2018-04-02 03:15:02 +00:00
Craig Topper	5fb1dc2d22	[X86] Give the AVX512 VEXTRACT instructions the same SchedRWs as the SSE/AVX versions. llvm-svn: 328958	2018-04-02 02:44:55 +00:00
Craig Topper	caec723a1a	[X86] Add an itinerary to BTR64rr. llvm-svn: 328956	2018-04-02 01:12:34 +00:00
Craig Topper	02daec00a2	[X86] Make sure all the classes declare in the Haswell scheduler model are prefixed with HW. The tablegen files all share a namespace so we shouldn't use a generic names in a specific scheduler model. llvm-svn: 328955	2018-04-02 01:12:32 +00:00
Craig Topper	c90d906b16	[X86] Give VINSERTPS the same intinerary as INSERTPS. llvm-svn: 328954	2018-04-02 00:48:11 +00:00
Craig Topper	dc4a6d1ef6	[X86] Cleanup ADCX/ADOX instruction definitions. Give them both the same itineraries. Add hasSideEffects = 0 to ADOX since they don't have patterns. Rename source operands to $src1 and $src2 instead of $src0 and $src. Add ReadAfterLd to the memory form SchedRW. llvm-svn: 328952	2018-04-01 23:58:50 +00:00
Craig Topper	9f834810ea	[X86] Give ADC8/16/32/64mi the same scheduling information as ADC8/16/32/64mr and SBB8/16/32/64mi. It doesn't make a lot of sense that it would be different. llvm-svn: 328946	2018-04-01 21:54:24 +00:00
Chandler Carruth	4244625c51	[x86] Correct the operand structure of the ADOX instruction. This also moves to define it in the same way as ADCX which seems to use constraints a bit better. This is pulled out of the review for reducing the use of popf for restoring EFLAGS, but is independent. There are still more problems with our definitions for these instructions that Craig is going to look at but this is at least less broken and he can start from this to improve them more fully. Thanks to Craig for the review here. llvm-svn: 328945	2018-04-01 21:53:18 +00:00
Chandler Carruth	06b343c6ed	[x86] Expose more of the condition conversion routines in the public API for X86's instruction information. I've now got a second patch under review that needs these same APIs. This bit is nicely orthogonal and obvious, so landing it. NFC. llvm-svn: 328944	2018-04-01 21:47:55 +00:00
Craig Topper	9b8cd5fe55	[X86] Don't check for folding into a store when deciding if we can promote an i16 mul. There's no RMW mul operation. llvm-svn: 328931	2018-04-01 06:29:32 +00:00
Craig Topper	db6caabccc	[X86] Check if the load and store are to the same pointer before preventing i16 RMW shifts and subtracts from being promoted. llvm-svn: 328930	2018-04-01 06:29:28 +00:00
Craig Topper	ae2de57db0	[X86] Allow i16 subtracts to be promoted if the load is on the LHS and its not being stored. llvm-svn: 328928	2018-04-01 06:29:25 +00:00
Craig Topper	9bc0d881a3	[X86] Remove unneeded temporary variable. NFC This Promote flag was alwasys set to true except in the default case. But in the default case we don't need to set PVT and can just return false. llvm-svn: 328926	2018-04-01 06:29:21 +00:00
Simon Pilgrim	3b8ad346f9	[X86][Btver2] Add MMX_PSHUFB to the JWritePSHUFB InstRW entries llvm-svn: 328918	2018-03-31 09:15:54 +00:00
Simon Pilgrim	8c8ebd7945	Fix trailing whitespace. NFCI. llvm-svn: 328917	2018-03-31 09:14:14 +00:00
Craig Topper	13a0f83a05	[X86] Add SchedRW for PMULLD Summary: It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput. This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet. I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs. Reviewers: RKSimon, GGanesh, courbet Reviewed By: RKSimon Subscribers: gchatelet, gbedwell, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D44972 llvm-svn: 328914	2018-03-31 04:54:32 +00:00
Fangrui Song	956ee79795	Fix a bunch of typoes. NFC llvm-svn: 328907	2018-03-30 22:22:31 +00:00
Andrea Di Biagio	dc97172b2f	[X86][BtVer2] Fixed the number of micro opcodes for AVX vector converts and VSQRT instructions. There were still a few AVX instructions with an incorrect number of opcodes. These should be fixed now. llvm-svn: 328892	2018-03-30 18:53:47 +00:00
Andrea Di Biagio	3eaa26bb64	[X86][BtVer2] Fix the number of uOps for horizontal operations. llvm-svn: 328886	2018-03-30 18:15:30 +00:00
Andrea Di Biagio	073a9d74ca	[X86][BtVer2] Add missing ReadAfterLd to RM variants of AVX horizontal adds and most vector logic instructions. Fixed a few InstRW that forgot to specify a ReadAfterLd for the register input operand. llvm-svn: 328867	2018-03-30 14:48:08 +00:00
Craig Topper	ee3c19fd7f	[X86] Add ReadAfterLds to some 3 src instructions Sometimes the operand comes after the memory operand so we need 5 ReadDefaults first. I suspect we also need to do something for the mask operand for masked avx512 instructions? I'm not sure if the mask should be ReadAfterLd or not since it can mask faults. If it shouldn't be ReadAfterLd then we're probably wrong for zero masking instructions already. Differential Revision: https://reviews.llvm.org/D44726 llvm-svn: 328834	2018-03-29 22:03:05 +00:00
Craig Topper	3f2dbec652	[X86] Remove ReadAfterLd from BMI and TBM instructions that don't have a register operand in their memory form The memory form of these instructions only read an input from memory. They don't have any register operands. Differential Revision: https://reviews.llvm.org/D44836 llvm-svn: 328828	2018-03-29 21:03:53 +00:00
Craig Topper	89310f56c8	[X86] Correct the placement of ReadAfterLd in BEXTR and BZHI. Add dedicated SchedRW for BEXTR/BZHI. These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd Differential Revision: https://reviews.llvm.org/D44838 llvm-svn: 328823	2018-03-29 20:41:39 +00:00
Craig Topper	2fa1436206	[IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806	2018-03-29 17:21:10 +00:00
Simon Pilgrim	71c5f3fffd	[X86][SSE] Don't bother re-adding combined target shuffles to the work list We are re-adding all the bitcasts, constant masks and target shuffles to the work list for no apparent gain. Found while investigating adding SimplifyDemandedVectorElts to target shuffles. Differential Revision: https://reviews.llvm.org/D44942 llvm-svn: 328771	2018-03-29 11:18:41 +00:00
Craig Topper	a21758fa2c	[X86] Don't pass getRegisterName from the InstPrinters into EmitAnyX86InstComments. Just always use the function from the ATTPrinter. NFC The IntelPrinter and the ATTPrinter produce the same strings for the same input. We already use the ATTPrinter explicitly in several other places. llvm-svn: 328762	2018-03-29 04:14:04 +00:00
Craig Topper	7456af88f4	[X86] Rename RIi64_NOREX tblgen class to just Ii64. Make RIi64 inherit from it. NFC This feels more consistent with the other classes. We don't need to say _NOREX if we didn't start it with an R in the first place. llvm-svn: 328757	2018-03-29 03:14:57 +00:00
Craig Topper	7441ffff84	[X86] Cleanup inheritance of the X86InstrFormats.td classes. NFC EVEX shouldn't inherit from VEX and EVEX_4V shouldn't inherit from VEX_4V. llvm-svn: 328756	2018-03-29 03:14:56 +00:00
Craig Topper	aac23d7881	[X86][SkylakeServer] Remove checks for 'k', 'z', '_Int' and 'b' from scheduler regexs. Most of these were optional matches at the end of the strings, but since the strings themselves are prefix matches by default you don't need to check for something optional at the end. I've left the 'b' on memory instructions where it means 'broadcast' because I'm not sure those really have the same load latency and we may need to split them explicitly in the future. llvm-svn: 328730	2018-03-28 20:40:24 +00:00
Simon Pilgrim	b1bc6cd96b	[X86][Btver2] Moved JWriteFCmp/JWriteFCmpY classes next to each other. NFCI Renamed JWriteFPAY22 to JWriteFCmpY - we've tended to avoid latency based names llvm-svn: 328701	2018-03-28 13:53:21 +00:00
Andrea Di Biagio	5076b98fb9	[X86][BtVer2] Fix the number of micro opcodes for AES[ENC\|DEC] and other YMM instructions. Similar to r328694. The number of micro opcodes should be 2 for those instructions. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328698	2018-03-28 12:12:04 +00:00
Andrea Di Biagio	010924e35c	[X86][BtVer2] Fix the number of micro opcodes for a bunch of YMM instructions. The Jaguar backend natively supports 128-bit data types. Operations on YMM registers are split into two COPs (complex operations). Each COP consumes a slot in the dispatch group, and in the reorder buffer. The scheduling model for Jaguar should mark those instructions as `let NumMicroOps = 2`. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328694	2018-03-28 10:49:33 +00:00
Simon Pilgrim	a2f26788a3	[X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer. Differential Revision: https://reviews.llvm.org/D44924 llvm-svn: 328664	2018-03-27 20:38:54 +00:00
Simon Pilgrim	5f7ab4fedf	[X86][Btver2] Add MMX_PMOVMSKBrr to MOVMSK scheduler class llvm-svn: 328620	2018-03-27 12:26:12 +00:00
Simon Pilgrim	28e7bcbba6	[X86] Add WriteCRC32 scheduler class Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis. Differential Revision: https://reviews.llvm.org/D44647 llvm-svn: 328582	2018-03-26 21:06:14 +00:00
Simon Pilgrim	fcf49df21c	[X86][Btver2] Add (U)COMISD/(U)COMISD scheduler costs Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) llvm-svn: 328573	2018-03-26 19:01:06 +00:00
Reid Kleckner	41fb2dba9c	[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32 Summary: Re-lands r328386 and r328443, reverting r328482. Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small parameters in i8 and i16 do not end up in the SysV register parameters (EDI, ESI, etc). I added tests for how we receive small parameters, since that is the important part. It's always safe to store more bytes than will be read, but the assumptions you make when loading them are what really matter. I also tested this by self-hosting clang and it passed tests on win64. Reviewers: mstorsjo, hans Subscribers: hiraditya, mstorsjo, llvm-commits Differential Revision: https://reviews.llvm.org/D44900 llvm-svn: 328570	2018-03-26 18:49:48 +00:00
Simon Pilgrim	f33d905293	[X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes (PR36881) Give the bit count instructions their own scheduler classes instead of forcing them into existing classes. These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar). Differential Revision: https://reviews.llvm.org/D44879 llvm-svn: 328566	2018-03-26 18:19:28 +00:00
Simon Pilgrim	86ea53123d	[X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR..... llvm-svn: 328551	2018-03-26 17:02:02 +00:00
Simon Pilgrim	8815105cd5	[X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs llvm-svn: 328541	2018-03-26 16:24:13 +00:00
Simon Pilgrim	aa40148cae	[X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of JALU0 for GPR PRF write) llvm-svn: 328536	2018-03-26 16:10:08 +00:00
Simon Pilgrim	0b73b29388	[X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) This also adds missing vcvttss2si tests llvm-svn: 328505	2018-03-26 15:30:47 +00:00
Simon Pilgrim	3aa9344605	[X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costs These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults. llvm-svn: 328501	2018-03-26 14:44:24 +00:00
Simon Pilgrim	67df1cf597	[X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps llvm-svn: 328497	2018-03-26 14:03:40 +00:00
Simon Pilgrim	caa203aed5	[X86][Btver2] Double the AGU and schedule pipe resources for YMM Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model. llvm-svn: 328491	2018-03-26 13:15:20 +00:00
Hans Wennborg	311b63f13b	Revert r328386 "[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32" This broke Chromium (see crbug.com/825748). It looks like mstorsjo's follow-up patch at D44876 fixes this, but let's revert back to green for now until that's ready to land. (Also reverts r328443.) > Both GCC and MSVC only look at the low byte of a boolean when it is > passed. llvm-svn: 328482	2018-03-26 10:07:51 +00:00
Craig Topper	6f28d3c954	[X86] Fix the SchedRW for intrinsic register form of SQRT/RCP/RSQRT. llvm-svn: 328474	2018-03-26 05:05:12 +00:00
Craig Topper	cdfcf8ecda	[X86] Merge the SSE and AVX versions of fp divs and sqrts in the SandyBridge/Haswell/Broadwell/Skylake scheduler models. I've used Agner's data as best I could to get the values to converge on. llvm-svn: 328473	2018-03-26 05:05:10 +00:00
Craig Topper	fbf2d850e3	[X86] Add itinerary to intrinsic version of sqrtss, rcpss, and rsqrtss instructions. llvm-svn: 328472	2018-03-26 04:20:36 +00:00
Craig Topper	c049cb7823	[X86] Correct the itineraries for the dot production instructions. llvm-svn: 328471	2018-03-26 02:17:15 +00:00
Craig Topper	4367874bc5	[X86] Use the same itinerary for VCVTDQ2PD as the SSE version so that the generated scheduler classes will merge. llvm-svn: 328470	2018-03-26 02:17:14 +00:00
Craig Topper	659f85af14	[X86] Swap the itineraries on the memory and register forms of CVTDQ2PD. They were backwards. llvm-svn: 328469	2018-03-26 02:17:13 +00:00
Craig Topper	4bf23eddaf	[X86] Give VMOVSX/ZX the same itinerary as the SSE version so they'll reuse the same generated scheduler class. llvm-svn: 328468	2018-03-26 02:17:12 +00:00

... 5 6 7 8 9 ...

17374 Commits