llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikolai Bozhenov	f679530ba1	[X86] Heuristic to selectively build Newton-Raphson SQRT estimation On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation. The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized. Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR. Differential Revision: https://reviews.llvm.org/D21379 llvm-svn: 277725	2016-08-04 12:47:28 +00:00
Simon Pilgrim	381a0ade5a	[X86] Add 'FeatureSlowSHLD' to cpu 'bdver4' As with all AMD CPUs, excavator has poor SHLD/SHRD performance. Also added bdver3 to the test as it was missing. llvm-svn: 276569	2016-07-24 16:00:53 +00:00
Ashutosh Nema	348af9cc6b	Add new flag and intrinsic support for MWAITX and MONITORX instructions Summary: MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT pair while adding a timer function, such that another termination of the MWAITX instruction occurs when the timer expires. The presence of the MONITORX and MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29. The MONITORX and MWAITX instructions are intercepted by the same bits that intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be monitored. MWAITX instruction causes the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. Opcode of MONITORX instruction is "0F 01 FA". Opcode of MWAITX instruction is "0F 01 FB". This opcode information is used in adding tests for the disassembler. These instructions are enabled for AMD's bdver4 architecture. Patch by Ganesh Gopalasubramanian! Reviewers: echristo, craig.topper, RKSimon Subscribers: RKSimon, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D19795 llvm-svn: 269911	2016-05-18 11:59:12 +00:00
Mitch Bodart	e60465ddf7	[X86] Enable the post-RA-scheduler for clang's default 32-bit cpu. For compilations with no explicit cpu specified, this exhibits nice gains on Silvermont, with neutral performance on big cores. Differential Revision: http://reviews.llvm.org/D19138 llvm-svn: 267809	2016-04-27 22:52:35 +00:00
Andrey Turetskiy	958eb46443	[X86] Introduce Lakemont CPU. Add a new Intel MCU CPU Lakemont, which doesn't support X87. Differential Revision: http://reviews.llvm.org/D18650 llvm-svn: 265128	2016-04-01 10:16:15 +00:00
Andrey Turetskiy	6a3d561ea0	[X86] Introduction of FeatureX87. Add FeatureX87 in X86 backend to be able to define CPUs which don't have x87. Differential Revision: http://reviews.llvm.org/D13979 llvm-svn: 264148	2016-03-23 11:13:54 +00:00
Simon Pilgrim	aa99331bad	[X86] AMD Bobcat CPU (btver1) doesn't support XSAVE btver1 is a SSSE3/SSE4a only CPU - it doesn't have AVX and doesn't support XSAVE. Differential Revision: http://reviews.llvm.org/D17683 llvm-svn: 262782	2016-03-05 22:00:50 +00:00
Sanjoy Das	aa63dc0e9a	Fix LLVM's handling and detection of skylake and cannonlake CPUs Summary: - Rename `"skylake"` == SkylakeServerProc to `"skylake-avx512"` - Change `"skylake"` to denote SkylakeClientProc - Fix the detection of cpu family 6 and model 94 to be SkylakeClientProc instead of SkylakeServerProc - Remove the `"cnl"` for CannonLake Reviewers: craig.topper, delena Subscribers: zansari, echristo, qcolombet, RKSimon, spatel, DavidKreitzer, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17090 llvm-svn: 261482	2016-02-21 17:12:03 +00:00
Craig Topper	f730a6bedc	Remove Proc feature flags for X86 processors that are used to inherit features from one processor to another. This exposed extra features to the -mattr command line that we shouldn't. Replace with just inherited listconcats. llvm-svn: 260832	2016-02-13 21:35:37 +00:00
Yunzhong Gao	0de36ec169	Disable the vzeroupper insertion pass on PS4. Differential Revision: http://reviews.llvm.org/D16837 llvm-svn: 260764	2016-02-12 23:37:57 +00:00
Craig Topper	3bb3f73be3	[X86] Change FeatureIFMA string to 'avx512ifma'. Matches gcc and fixes PR26461. llvm-svn: 260069	2016-02-08 01:23:15 +00:00
Elena Demikhovsky	29cde35b43	Added Skylake client to X86 targets and features Changes in X86.td: I set features of Intel processors in incremental form: IVB = SNB + X HSW = IVB + X .. I added Skylake client processor and defined its features FeatureADX was missing on KNL Added some new features to appropriate processors SMAP, IFMA, PREFETCHWT1, VMFUNC and others Differential Revision: http://reviews.llvm.org/D16357 llvm-svn: 258659	2016-01-24 10:41:28 +00:00
Elena Demikhovsky	9242ea87d6	Added Cannonlake processor to X86 Target Differential Revision: http://reviews.llvm.org/D16289 llvm-svn: 258046	2016-01-18 13:00:31 +00:00
Michael Zuckerman	97b6a6923e	[AVX512] adding AVXVBMI feature flag The feature flag is for VPERMB,VPERMI2B,VPERMT2B and VPMULTISHIFTQB instructions. More about the instruction can be found in: https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf Differential Revision: http://reviews.llvm.org/D16190 llvm-svn: 258012	2016-01-17 13:42:12 +00:00
Craig Topper	3294966ed7	[X86] Remove declaration of ATTAsmParser. It's equivalent to the DefaultAsmParser. NFC llvm-svn: 256541	2015-12-29 07:03:27 +00:00
Asaf Badouh	5acf66ff97	[x86] adding PKU feature flag the feature flag is essential for RDPKRU and WRPKRU instruction more about the instruction can be found in the SDM rev 56, vol 2 from http://www.intel.com/sdm Differential Revision: http://reviews.llvm.org/D15491 llvm-svn: 255644	2015-12-15 13:35:29 +00:00
Hans Wennborg	fbf2822e6d	Add FeatureLAHFSAHF to amdfam10 as well. llvm-svn: 254801	2015-12-04 23:32:19 +00:00
Hans Wennborg	5000ce8a63	X86: Don't emit SAHF/LAHF for 64-bit targets unless explicitly supported These instructions are not supported by all CPUs in 64-bit mode. Emitting them causes Chromium to crash on start-up for users with such chips. (GCC puts these instructions behind -msahf on 64-bit for the same reason.) This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering from before r244503 when the instructions are not available. Differential Revision: http://reviews.llvm.org/D15240 llvm-svn: 254793	2015-12-04 23:00:33 +00:00
Eric Christopher	57a6e1321f	Add MMX to the 3dnow enum and propagate changes around. This makes it somewhat more consistent with how the feature is used. llvm-svn: 253122	2015-11-14 03:04:00 +00:00
Craig Topper	09b6598572	[X86] Add fxsr feature flag for fxsave/fxrestore instructions. llvm-svn: 250497	2015-10-16 06:03:09 +00:00
Craig Topper	fd2cc7cd8a	Add XSAVE/XSAVEOPT to KNL processor. llvm-svn: 250362	2015-10-15 03:56:54 +00:00
Craig Topper	0ee356951a	[X86] Add XSAVE feature flags to their various processors. llvm-svn: 250268	2015-10-14 05:37:38 +00:00
Sanjay Patel	53d1d8b731	fix capitalization; NFC llvm-svn: 250049	2015-10-12 15:24:01 +00:00
Amjad Aboud	1db6d7af46	[X86] Add XSAVE intrinsic family Add intrinsics for the XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64) XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64) XSAVEC instructions (XSAVEC/XSAVEC64) XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64) Differential Revision: http://reviews.llvm.org/D13012 llvm-svn: 250029	2015-10-12 11:47:46 +00:00
Eric Christopher	11e5983658	Move the MMX subtarget feature out of the SSE set of features and into its own variable. This is needed so that we can explicitly turn off MMX without turning off SSE and also so that we can diagnose feature set incompatibilities that involve MMX without SSE. Rationale: // sse3 __m128d test_mm_addsub_pd(__m128d A, __m128d B) { return _mm_addsub_pd(A, B); } // mmx void shift(__m64 a, __m64 b, int c) { _mm_slli_pi16(a, c); _mm_slli_pi32(a, c); _mm_slli_si64(a, c); _mm_srli_pi16(a, c); _mm_srli_pi32(a, c); _mm_srli_si64(a, c); _mm_srai_pi16(a, c); _mm_srai_pi32(a, c); } clang -msse3 -mno-mmx file.c -c For this code we should be able to explicitly turn off MMX without affecting the compilation of the SSE3 function and then diagnose and error on compiling the MMX function. This matches the existing gcc behavior and follows the spirit of the SSE/MMX separation in llvm where we can (and do) turn off MMX code generation except in the presence of intrinsics. Updated a couple of tests, but primarily tested with a couple of tests for turning on only mmx and only sse. This is paired with a patch to clang to take advantage of this behavior. llvm-svn: 249731	2015-10-08 20:10:06 +00:00
Sanjay Patel	30145677a8	rename "slow-unaligned-mem-under-32" to "slow-unaligned-mem-16" (NFCI) This is a follow-on suggested by: http://reviews.llvm.org/D12154 ( http://reviews.llvm.org/rL245729 ) http://reviews.llvm.org/D10662 ( http://reviews.llvm.org/rL245075 ) This makes the attribute name match most of the existing lowering logic and regression test expectations. But the current use of this attribute is inconsistent; see the FIXME comment for "allowsMisalignedMemoryAccesses()". That change will result in functional changes and should be coming soon. llvm-svn: 246585	2015-09-01 20:51:51 +00:00
Sanjay Patel	dddad10241	remove 'FeatureSlowUAMem' from AMD CPUs based on 10H micro-arch or later See discussion in D12154 ( http://reviews.llvm.org/D12154 ), AMD Software Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data. llvm-svn: 245733	2015-08-21 20:39:17 +00:00
Sanjay Patel	9e916dc48d	[x86] invert logic for attribute 'FeatureFastUAMem' This is a 'no functional change intended' patch. It removes one FIXME, but adds several more. Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if any sized misaligned memory access under 32-bytes is 'fast'. From the added FIXME comments, however, you can see that we're not consistent about this. Changing the name of the attribute makes it clearer to see the logic holes. Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute to new chips; fast unaligned accesses have been standard for several generations of CPUs now. Differential Revision: http://reviews.llvm.org/D12154 llvm-svn: 245729	2015-08-21 20:17:26 +00:00
Craig Topper	cb1f601a7b	[X86] Add ADX and RDSEED to Skylake processor. llvm-svn: 244396	2015-08-08 07:31:15 +00:00
Craig Topper	01dd4ea334	Add SlowBTMem to Sandy Bridge and newer Intel CPUs. Reading through Agner Fog's table suggests there have been no improvements to these processors relative to Westmere for bit test instructions. llvm-svn: 244395	2015-08-08 07:20:04 +00:00
Sean Silva	e1c6b549ef	Avoid using uncommon acronym "MSROM". llvm-svn: 243256	2015-07-27 00:46:59 +00:00
Michael Kuperstein	454d145395	[X86] Allow load folding into PUSH instructions Adds pushes to the folding tables. This also required a fix to the TD definition, since the memory forms of the push instructions did not have the right mayLoad/mayStore flags. Differential Revision: http://reviews.llvm.org/D11340 llvm-svn: 243010	2015-07-23 12:23:45 +00:00
Sanjay Patel	667a7e2a0f	make reciprocal estimate code generation more flexible by adding command-line options (3rd try) The first try (r238051) to land this was reverted due to ExecutionEngine build failure; that was hopefully addressed by r238788. The second try (r238842) to land this was reverted due to BUILD_SHARED_LIBS failure; that was hopefully addressed by r238953. This patch adds a TargetRecip class for processing many recip codegen possibilities. The class is intended to handle both command-line options to llc as well as options passed in from a front-end such as clang with the -mrecip option. The x86 backend is updated to use the new functionality. Only -mcpu=btver2 with -ffast-math should see a functional change from this patch. All other x86 CPUs continue to not use reciprocal estimates by default with -ffast-math. Differential Revision: http://reviews.llvm.org/D8982 llvm-svn: 239001	2015-06-04 01:32:35 +00:00
Elena Demikhovsky	f7e641cc2d	X86: Added MPX feature and bound registers. Intel® Memory Protection Extensions (Intel® MPX) is a new feature in Skylake. It is a part of KNL and SKX sets. It is also a part of Skylake client. I added definition of %bnd0 - %bnd3 registers, each register is a pair of 64-bit integers. llvm-svn: 238916	2015-06-03 10:30:57 +00:00
Rafael Espindola	cf8beece97	Revert "make reciprocal estimate code generation more flexible by adding command-line options (2nd try)" This reverts commit r238842. It broke -DBUILD_SHARED_LIBS=ON build. llvm-svn: 238900	2015-06-03 05:32:44 +00:00
Sanjay Patel	6f031d848e	make reciprocal estimate code generation more flexible by adding command-line options (2nd try) The first try (r238051) to land this was reverted due to bot failures that were hopefully addressed by r238788. This patch adds a TargetRecip class for processing many recip codegen possibilities. The class is intended to handle both command-line options to llc as well as options passed in from a front-end such as clang with the -mrecip option. The x86 backend is updated to use the new functionality. Only -mcpu=btver2 with -ffast-math should see a functional change from this patch. All other x86 CPUs continue to not use reciprocal estimates by default with -ffast-math. Differential Revision: http://reviews.llvm.org/D8982 llvm-svn: 238842	2015-06-02 15:28:15 +00:00
Rafael Espindola	445712264d	Revert "make reciprocal estimate code generation more flexible by adding command-line options" This reverts commit r238051. It broke some bots: http://lab.llvm.org:8011/builders/llvm-ppc64-linux1/builds/18190 llvm-svn: 238075	2015-05-23 00:22:44 +00:00
Sanjay Patel	ba2ba80302	make reciprocal estimate code generation more flexible by adding command-line options This patch adds a class for processing many recip codegen possibilities. The TargetRecip class is intended to handle both command-line options to llc as well as options passed in from a front-end such as clang with the -mrecip option. The x86 backend is updated to use the new functionality. Only -mcpu=btver2 with -ffast-math should see a functional change from this patch. All other CPUs continue to not use reciprocal estimates by default with -ffast-math. Differential Revision: http://reviews.llvm.org/D8982 llvm-svn: 238051	2015-05-22 21:10:06 +00:00
Eric Christopher	824f42f209	Migrate existing backends that care about software floating point to use the information in the module rather than TargetOptions. We've had and clang has used the use-soft-float attribute for some time now so have the backends set a subtarget feature based on a particular function now that subtargets are created based on functions and function attributes. For the one middle end soft float check go ahead and create an overloadable TargetLowering::useSoftFloat function that just checks the TargetSubtargetInfo in all cases. Also remove the command line option that hard codes whether or not soft-float is set by using the attribute for all of the target specific test cases - for the generic just go ahead and add the attribute in the one case that showed up. llvm-svn: 237079	2015-05-12 01:26:05 +00:00
Craig Topper	3611d9bc01	[X86] Remove FeatureAES for 'corei7' CPU. 'corei7' should match 'nehalem' which doesn't have AES. Having AES and not PCLMUL makes 'corei7' halfway between Nehalem and Westmere. llvm-svn: 233517	2015-03-30 06:31:11 +00:00
Craig Topper	a898c2d737	[X86] Remove two feature flags that covered sets of instructions that have no patterns or intrinsics. Since we don't check feature flags in the assembler parser for any instruction sets, these flags don't provide any value. This frees up 2 of the fully utilized feature flags. llvm-svn: 228282	2015-02-05 08:51:02 +00:00
Sanjay Patel	ffd039bde1	Fix program crashes due to alignment exceptions generated for SSE memop instructions (PR22371). r224330 introduced a bug by misinterpreting the "FeatureVectorUAMem" bit. The commit log says that change did not affect anything, but that's not correct. That change allowed SSE instructions to have unaligned mem operands folded into math ops, and that's not allowed in the default specification for any SSE variant. The bug is exposed when compiling for an AVX-capable CPU that had this feature flag but without enabling AVX codegen. Another mistake in r224330 was not adding the feature flag to all AVX CPUs; the AMD chips were excluded. This is part of the fix for PR22371 ( http://llvm.org/bugs/show_bug.cgi?id=22371 ). This feature bit is SSE-specific, so I've renamed it to "FeatureSSEUnalignedMem". Changed the existing test case for the feature bit to reflect the new name and renamed the test file itself to better reflect the feature. Added runs to fold-vex.ll to check for the failing codegen. Note that the feature bit is not set by default on any CPU because it may require a configuration register setting to enable the enhanced unaligned behavior. llvm-svn: 227983	2015-02-03 17:13:04 +00:00
Elena Demikhovsky	a79fc16bb0	X86: Added FeatureVectorUAMem for all AVX architectures. According to AVX specification: "Most arithmetic and data processing instructions encoded using the VEX prefix and performing memory accesses have more flexible memory alignment requirements than instructions that are encoded without the VEX prefix. Specifically, With the exception of explicitly aligned 16 or 32 byte SIMD load/store instructions, most VEX-encoded, arithmetic and data processing instructions operate in a flexible environment regarding memory address alignment, i.e. VEX-encoded instruction with 32-byte or 16-byte load semantics will support unaligned load operation by default. Memory arguments for most instructions with VEX prefix operate normally without causing #GP(0) on any byte-granularity alignment (unlike Legacy SSE instructions)." The same for AVX-512. This change does not affect anything right now, because only the "memop pattern fragment" depends on FeatureVectorUAMem and it is not used in AVX patterns. All AVX patterns are based on the "unaligned load" anyway. llvm-svn: 224330	2014-12-16 09:10:08 +00:00
Chandler Carruth	f57ac3bd22	[x86] Fix the test to actually test things for the CPU names, add the missing barcelona CPU which that test uncovered, and remove the 32-bit x86 CPUs which I really wasn't prepared to audit and test thoroughly. If anyone wants to clean up the 32-bit only x86 CPUs, go for it. Also, if anyone else wants to try to de-duplicate the AMD CPUs, that'd be cool, but from the looks of it wouldn't save as much as it did for the Intel CPUs. llvm-svn: 223774	2014-12-09 14:25:55 +00:00
Chandler Carruth	af892403c2	[x86] Bring some sanity to the x86 CPU processor definitions. Notably, this adds simple micro-architecture names for the Intel CPU variants, and defines the old 'core'-based names as aliases. GCC has started to simplify their documented interface to use these names as well, so it seems like we can start to converge on a consistent pattern. I'd appreciate Intel double checking the entries that aren't yet documented widely, especially Atom (Bonnell and Silvermont), Knights Landing, and Skylake. But this change shouldn't break any existing users. Also, ran clang-format to re-format this code and it actually worked (modulo a tiny bug) so hopefully we can start to stop thinking about formatting this stuff. llvm-svn: 223769	2014-12-09 10:58:36 +00:00
Michael Liao	5bf9578ce4	[X86] Clean up whitespace as well as minor coding style llvm-svn: 223339	2014-12-04 05:20:33 +00:00
Sanjay Patel	e57f3c0a42	Enable FeatureFastUAMem for btver2 Allow unaligned 16-byte memop codegen for btver2. No functional changes for any other subtargets. Replace the existing supposed small memcpy test with an actual test of a small memcpy. The previous test wasn't using FileCheck either. This patch should allow us to close PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ). Differential Revision: http://reviews.llvm.org/D6360 llvm-svn: 222925	2014-11-28 18:40:18 +00:00
Sanjay Patel	501890e909	Add a feature flag for slow 32-byte unaligned memory accesses [x86]. This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen for Sandy Bridge and Ivy Bridge. There is no functionality change intended for those chips. Previously, the absence of AVX2 was being used as a proxy to detect this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2 that do not have the 32-byte unaligned access slowdown. Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ). Differential Revision: http://reviews.llvm.org/D6355 llvm-svn: 222544	2014-11-21 17:40:04 +00:00
Alexey Volkov	fd1731d876	[X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers Differential Revision: http://reviews.llvm.org/D5938 llvm-svn: 222521	2014-11-21 11:19:34 +00:00
Alexey Volkov	7de210bd52	[X86] Use ADD/SUB instead of INC/DEC for Haswell and Broadwell CPUs Differential Revision: http://reviews.llvm.org/D5934 llvm-svn: 222141	2014-11-17 16:17:51 +00:00


223 Commits