llvm-project

Commit Graph

Author	SHA1	Message	Date
Rafael Espindola	445712264d	Revert "make reciprocal estimate code generation more flexible by adding command-line options" This reverts commit r238051. It broke some bots: http://lab.llvm.org:8011/builders/llvm-ppc64-linux1/builds/18190 llvm-svn: 238075	2015-05-23 00:22:44 +00:00
Sanjay Patel	ba2ba80302	make reciprocal estimate code generation more flexible by adding command-line options This patch adds a class for processing many recip codegen possibilities. The TargetRecip class is intended to handle both command-line options to llc as well as options passed in from a front-end such as clang with the -mrecip option. The x86 backend is updated to use the new functionality. Only -mcpu=btver2 with -ffast-math should see a functional change from this patch. All other CPUs continue to not use reciprocal estimates by default with -ffast-math. Differential Revision: http://reviews.llvm.org/D8982 llvm-svn: 238051	2015-05-22 21:10:06 +00:00
Eric Christopher	824f42f209	Migrate existing backends that care about software floating point to use the information in the module rather than TargetOptions. We've had and clang has used the use-soft-float attribute for some time now so have the backends set a subtarget feature based on a particular function now that subtargets are created based on functions and function attributes. For the one middle end soft float check go ahead and create an overloadable TargetLowering::useSoftFloat function that just checks the TargetSubtargetInfo in all cases. Also remove the command line option that hard codes whether or not soft-float is set by using the attribute for all of the target specific test cases - for the generic just go ahead and add the attribute in the one case that showed up. llvm-svn: 237079	2015-05-12 01:26:05 +00:00
Craig Topper	3611d9bc01	[X86] Remove FeatureAES for 'corei7' CPU. 'corei7' should match 'nehalem' which doesn't have AES. Having AES and not PCLMUL makes 'corei7' halfway between Nehalem and Westmere. llvm-svn: 233517	2015-03-30 06:31:11 +00:00
Craig Topper	a898c2d737	[X86] Remove two feature flags that covered sets of instructions that have no patterns or intrinsics. Since we don't check feature flags in the assembler parser for any instruction sets, these flags don't provide any value. This frees up 2 of the fully utilized feature flags. llvm-svn: 228282	2015-02-05 08:51:02 +00:00
Sanjay Patel	ffd039bde1	Fix program crashes due to alignment exceptions generated for SSE memop instructions (PR22371). r224330 introduced a bug by misinterpreting the "FeatureVectorUAMem" bit. The commit log says that change did not affect anything, but that's not correct. That change allowed SSE instructions to have unaligned mem operands folded into math ops, and that's not allowed in the default specification for any SSE variant. The bug is exposed when compiling for an AVX-capable CPU that had this feature flag but without enabling AVX codegen. Another mistake in r224330 was not adding the feature flag to all AVX CPUs; the AMD chips were excluded. This is part of the fix for PR22371 ( http://llvm.org/bugs/show_bug.cgi?id=22371 ). This feature bit is SSE-specific, so I've renamed it to "FeatureSSEUnalignedMem". Changed the existing test case for the feature bit to reflect the new name and renamed the test file itself to better reflect the feature. Added runs to fold-vex.ll to check for the failing codegen. Note that the feature bit is not set by default on any CPU because it may require a configuration register setting to enable the enhanced unaligned behavior. llvm-svn: 227983	2015-02-03 17:13:04 +00:00
Elena Demikhovsky	a79fc16bb0	X86: Added FeatureVectorUAMem for all AVX architectures. According to AVX specification: "Most arithmetic and data processing instructions encoded using the VEX prefix and performing memory accesses have more flexible memory alignment requirements than instructions that are encoded without the VEX prefix. Specifically, With the exception of explicitly aligned 16 or 32 byte SIMD load/store instructions, most VEX-encoded, arithmetic and data processing instructions operate in a flexible environment regarding memory address alignment, i.e. VEX-encoded instruction with 32-byte or 16-byte load semantics will support unaligned load operation by default. Memory arguments for most instructions with VEX prefix operate normally without causing #GP(0) on any byte-granularity alignment (unlike Legacy SSE instructions)." The same for AVX-512. This change does not affect anything right now, because only the "memop pattern fragment" depends on FeatureVectorUAMem and it is not used in AVX patterns. All AVX patterns are based on the "unaligned load" anyway. llvm-svn: 224330	2014-12-16 09:10:08 +00:00
Chandler Carruth	f57ac3bd22	[x86] Fix the test to actually test things for the CPU names, add the missing barcelona CPU which that test uncovered, and remove the 32-bit x86 CPUs which I really wasn't prepared to audit and test thoroughly. If anyone wants to clean up the 32-bit only x86 CPUs, go for it. Also, if anyone else wants to try to de-duplicate the AMD CPUs, that'd be cool, but from the looks of it wouldn't save as much as it did for the Intel CPUs. llvm-svn: 223774	2014-12-09 14:25:55 +00:00
Chandler Carruth	af892403c2	[x86] Bring some sanity to the x86 CPU processor definitions. Notably, this adds simple micro-architecture names for the Intel CPU variants, and defines the old 'core'-based names as aliases. GCC has started to simplify their documented interface to use these names as well, so it seems like we can start to converge on a consistent pattern. I'd appreciate Intel double checking the entries that aren't yet documented widely, especially Atom (Bonnell and Silvermont), Knights Landing, and Skylake. But this change shouldn't break any existing users. Also, ran clang-format to re-format this code and it actually worked (modulo a tiny bug) so hopefully we can start to stop thinking about formatting this stuff. llvm-svn: 223769	2014-12-09 10:58:36 +00:00
Michael Liao	5bf9578ce4	[X86] Clean up whitespace as well as minor coding style llvm-svn: 223339	2014-12-04 05:20:33 +00:00
Sanjay Patel	e57f3c0a42	Enable FeatureFastUAMem for btver2 Allow unaligned 16-byte memop codegen for btver2. No functional changes for any other subtargets. Replace the existing supposed small memcpy test with an actual test of a small memcpy. The previous test wasn't using FileCheck either. This patch should allow us to close PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ). Differential Revision: http://reviews.llvm.org/D6360 llvm-svn: 222925	2014-11-28 18:40:18 +00:00
Sanjay Patel	501890e909	Add a feature flag for slow 32-byte unaligned memory accesses [x86]. This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen for Sandy Bridge and Ivy Bridge. There is no functionality change intended for those chips. Previously, the absence of AVX2 was being used as a proxy to detect this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2 that do not have the 32-byte unaligned access slowdown. Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ). Differential Revision: http://reviews.llvm.org/D6355 llvm-svn: 222544	2014-11-21 17:40:04 +00:00
Alexey Volkov	fd1731d876	[X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers Differential Revision: http://reviews.llvm.org/D5938 llvm-svn: 222521	2014-11-21 11:19:34 +00:00
Alexey Volkov	7de210bd52	[X86] Use ADD/SUB instead of INC/DEC for Haswell and Broadwell CPUs Differential Revision: http://reviews.llvm.org/D5934 llvm-svn: 222141	2014-11-17 16:17:51 +00:00
Sanjay Patel	e2e589288f	Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385). This is a first step for generating SSE rcp instructions for reciprocal calcs when fast-math allows it. This is very similar to the rsqrt optimization enabled in D5658 ( http://reviews.llvm.org/rL220570 ). For now, be conservative and only enable this for AMD btver2 where performance improves significantly both in terms of latency and throughput. We may never enable this codegen for Intel Core* chips because the divider circuits are just too fast. On SandyBridge, divss can be as fast as 10 cycles versus the 21 cycle critical path for the rcp + mul + sub + mul + add estimate. Follow-on patches may allow configuration of the number of Newton-Raphson refinement steps, add AVX512 support, and enable the optimization for more chips. More background here: http://llvm.org/bugs/show_bug.cgi?id=21385 Differential Revision: http://reviews.llvm.org/D6175 llvm-svn: 221706	2014-11-11 20:51:00 +00:00
Andrea Di Biagio	f5b34e535d	[X86] Add 'FeatureSlowSHLD' to cpu 'bdver3'. Also explicit set FeatureAVX and FeatureSSE4A for all the bdver* cpus. This patch adds 'FeatureSlowSHLD' to 'bdver3'. According to the official AMD optimization guide for amdfam15: "Using alternative code in place of SHLD achieves lower overall latency and requires fewer execution resources. The 32-bit and 64-bit forms of ADD, ADC, SHR, and LEA (except 16-bit form) are DirectPath instructions, while SHLD is a VectorPath instruction." This patch also explicitly sets feature AVX and SSE4A for all the bdver* cpus. This part of the patch is a non-functional change and it is mainly done for clarity reasons (Both XOP and FMA4 already imply AVX and SSE4A). llvm-svn: 221296	2014-11-04 21:18:09 +00:00
Sanjay Patel	957efc23bb	Use rsqrt (X86) to speed up reciprocal square root calcs This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2 where performance improves significantly - for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float). This patch adds a two constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate().. See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900 Differential Revision: http://reviews.llvm.org/D5658 llvm-svn: 220570	2014-10-24 17:02:16 +00:00
Sanjay Patel	1191adf4df	Add a scheduling model for AMD 16H Jaguar (btver2). This is a first pass at a scheduling model for Jaguar. It's structured largely on the existing SandyBridge and SLM sched models. Using this model, in addition to turning on the PostRA scheduler, results in some perf wins on internal and 3rd party benchmarks. There's not much difference in LLVM's test-suite benchmarking subset of tests. Differential Revision: http://reviews.llvm.org/D5229 llvm-svn: 217457	2014-09-09 20:07:07 +00:00
Robert Khasanov	98441b6e7f	[x86] Enable Broadwell target. Added FeatureSMAP. Broadwell ISA includes Haswell ISA + ADX + RDSEED + SMAP llvm-svn: 216161	2014-08-21 09:16:12 +00:00
Kevin Enderby	0d928a142b	Add support for the X86 secure guard extensions instructions in assembler (SGX). This allows assembling the two new instructions, encls and enclu for the SKX processor model. Note the diffs are a bigger than what might think, but to fit the new MRM_CF and MRM_D7 in things in the right places things had to be renumbered and shuffled down causing a bit more diffs. rdar://16228228 llvm-svn: 214460	2014-07-31 23:57:38 +00:00
Robert Khasanov	bfa0131365	[SKX] Enabling SKX target and AVX512BW, AVX512DQ, AVX512VL features. Enabling HasAVX512{DQ,BW,VL} predicates. Adding VK2, VK4, VK32, VK64 masked register classes. Adding new types (v64i8, v32i16) to VR512. Extending calling conventions for new types (v64i8, v32i16) Patch by Zinovy Nis <zinovy.y.nis@intel.com> Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 213545	2014-07-21 14:54:21 +00:00
Elena Demikhovsky	678bd5ba4a	AVX-512: dec/inc instructions are slow on KNL After Alexey Volkov, I'm adding the same property for KNL, that prefers ADD/SUB instead of INC/DEC. Added a test. llvm-svn: 212178	2014-07-02 14:11:05 +00:00
Alexey Volkov	5260dba323	[X86] Use ADD/SUB instead of INC/DEC for Silvermont According to Intel Software Optimization Manual on Silvermont INC or DEC instructions require an additional uop to merge the flags. As a result, a branch instruction depending on an INC or a DEC instruction incurs a 1 cycle penalty. Differential Revision: http://reviews.llvm.org/D3990 llvm-svn: 210466	2014-06-09 11:40:41 +00:00
Alexey Volkov	6226de6721	[X86] Tune LEA usage for Silvermont According to Intel Software Optimization Manual on Silvermont in some cases LEA is better to be replaced with ADD instructions: "The rule of thumb for ADDs and LEAs is that it is justified to use LEA with a valid index and/or displacement for non-destructive destination purposes (especially useful for stack offset cases), or to use a SCALE. Otherwise, ADD(s) are preferable." Differential Revision: http://reviews.llvm.org/D3826 llvm-svn: 209198	2014-05-20 08:55:50 +00:00
Chandler Carruth	32908d7a35	[x86] Make the 'x86-64' cpu, what I see as and many use as the generic default architecture for reasonable modern x86 processors, actually be modern. This processor model should essentially be "tuned" for modern x86 chips as much as possible without undue penalties on any specific architecture. Previously we weren't even using the nice scheduling models. There are a few other tweaks needed here, but this change at least I have benchmarked across a decent swatch of chips (intel's clovertown, westmere, and sandybridge; amd's istanbul) and seen no significant regressions. If anyone has suggested ways to test this, just let me know. Somewhat alarmingly, no existing tests failed. llvm-svn: 208230	2014-05-07 17:37:03 +00:00
Benjamin Kramer	6004573ecf	Add a description for AMD's bdver4 (aka Excavator). This is just bdver3 + AVX2 + BMI2. llvm-svn: 207847	2014-05-02 15:47:07 +00:00
Alexey Volkov	1051f04a8d	Enable FeatureFastUAMem for Silvermont processor Differential Revision: http://llvm-reviews.chandlerc.com/D2982 llvm-svn: 203218	2014-03-07 09:03:49 +00:00
Alexey Volkov	bb2f047346	Test commit Removed whitespace llvm-svn: 203216	2014-03-07 08:28:44 +00:00
Craig Topper	3c80d62a6c	[x86] Add basic support for .code16 This is not really expected to work right yet. Mostly because we will still emit the OpSize (0x66) prefix in all the wrong places, along with a number of other corner cases. Those will all be fixed in the subsequent commits. Patch from David Woodhouse. llvm-svn: 198584	2014-01-06 04:55:54 +00:00
Rafael Espindola	50712a456d	Change the default of AsmWriterClassName and isMCAsmWriter. llvm-svn: 196065	2013-12-02 04:55:42 +00:00
Ekaterina Romanova	d5fa55470c	SHLD/SHRD are VectorPath (microcode) instructions known to have poor latency on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or and lea which are directPath instructions. These alternative instructions not only have a lower latency but they also increase the decode bandwidth by allowing simultaneous decoding of a third directPath instruction. AMD's processors family K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD's processors, I disabled folding (or (x << c) \| (y >> (64 - c))) when we are not optimizing for size. It might be beneficial to disable this folding for some of the Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel. llvm-svn: 195383	2013-11-21 23:21:26 +00:00
Benjamin Kramer	d114def3d6	X86: Add a description for AMD bdver3 aka Steamroller. This is just bdver2 + FSGSBase. llvm-svn: 193984	2013-11-04 10:29:20 +00:00
Yunzhong Gao	dfc277f350	Enabling 3DNow! prefetch instruction for a few AMD processors: bobcat, jaguar, bulldozer and piledriver. Support for the instruction itself seems to have already been added in r178040. Differential Revision: http://llvm-reviews.chandlerc.com/D1933 llvm-svn: 192828	2013-10-16 19:04:11 +00:00
Nick Lewycky	3be42b8f06	Rename this feature to "cx16" to match gcc's flag name. Apparently these strings are directly tied to the flag names in clang with no remapping in between? llvm-svn: 192044	2013-10-05 20:11:44 +00:00
Yunzhong Gao	dd36e9387b	Adding a feature flag to the llvm backend for x86 TBM instruction set. Adding TBM feature to bdver2 processor; piledriver supports this instruction set according to the following document: http://developer.amd.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf Phabricator code review is located here: http://llvm-reviews.chandlerc.com/D1692 llvm-svn: 191324	2013-09-24 18:21:52 +00:00
Preston Gurd	ba6f9d1b7d	Remove unused code, which had been commented out. llvm-svn: 190869	2013-09-17 16:53:36 +00:00
Craig Topper	a6d204ec68	Make F16C feature flag imply AVX rather than just checking both at the patterns. llvm-svn: 190775	2013-09-16 04:29:58 +00:00
Preston Gurd	3fe264d625	Adds support for Atom Silvermont (SLM) - -march=slm Implements Instruction scheduler latencies for Silvermont, using latencies from the Intel Silvermont Optimization Guide. Auto detects SLM. Turns on post RA scheduler when generating code for SLM. llvm-svn: 190717	2013-09-13 19:23:28 +00:00
Ben Langmuir	1650175de6	Partial support for Intel SHA Extensions (sha1rnds4) Add basic assembly/disassembly support for the first Intel SHA instruction 'sha1rnds4'. Also includes feature flag, and test cases. Support for the remaining instructions will follow in a separate patch. llvm-svn: 190611	2013-09-12 15:51:31 +00:00
Benjamin Kramer	8f429384b5	X86: Add a description of the Intel Atom Silvermont CPU. Currently this is just the atom model with SSE4.2 enabled. llvm-svn: 189669	2013-08-30 14:05:32 +00:00
Rafael Espindola	94a2c5642d	Rename features to match what gcc and clang use. There is no advantage in being different and using the same names simplifies clang a bit. llvm-svn: 189141	2013-08-23 20:21:34 +00:00
Craig Topper	5c94bb8551	Rename mattr names for AVX-512 to from avx-512 -> avx512f, avx-512-pfi -> av512pf, avx-512-cdi -> avx512cd, avx-512-eri->avx512er. This matches better with official docs and what gcc patches appearto be using. I didn't touch the has* functions or the feature flag names to avoid change the td and lowering file while commits are still happening. llvm-svn: 188859	2013-08-21 03:57:57 +00:00
Elena Demikhovsky	003e7d73b9	Added encoding prefixes for KNL instructions (EVEX). Added 512-bit operands printing. Added instruction formats for KNL instructions. llvm-svn: 187324	2013-07-28 08:28:38 +00:00
Elena Demikhovsky	8cfb43f73b	I'm starting to commit KNL backend. I'll push patches one-by-one. This patch includes support for the extended register set XMM16-31, YMM16-31, ZMM0-31. The full ISA you can see here: http://software.intel.com/en-us/intel-isa-extensions llvm-svn: 187030	2013-07-24 11:02:47 +00:00
Benjamin Kramer	b44c4275d5	X86: Add target description for btver2; make autodetection logic aware of AVX. llvm-svn: 181005	2013-05-03 10:20:08 +00:00
Preston Gurd	8b7ab4ba2b	This patch adds the X86FixupLEAs pass, which will reduce instruction latency for certain models of the Intel Atom family, by converting instructions into their equivalent LEA instructions, when it is both useful and possible to do so. llvm-svn: 180573	2013-04-25 20:29:37 +00:00
Chad Rosier	9f7a221fdc	[asm parser] Add support for predicating MnemonicAlias based on the assembler variant/dialect. Addresses a FIXME in the emitMnemonicAliases function. Use and test case to come shortly. rdar://13688439 and part of PR13340. llvm-svn: 179804	2013-04-18 22:35:36 +00:00
Michael Liao	a486a11dcf	Add support of RDSEED defined in AVX2 extension llvm-svn: 178314	2013-03-28 23:41:26 +00:00
Nadav Rotem	e7b6a8aa8c	Add the Haswell machine model. llvm-svn: 178301	2013-03-28 22:34:46 +00:00
Preston Gurd	663e6f9558	For the current Atom processor, the fastest way to handle a call indirect through a memory address is to load the memory address into a register and then call indirect through the register. This patch implements this improvement by modifying SelectionDAG to force a function address which is a memory reference to be loaded into a virtual register. Patch by Sriram Murali. llvm-svn: 178171	2013-03-27 19:14:02 +00:00

1 2 3 4

187 Commits