llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	a8d4097445	[AVX-512] Make VEX encoded FMA instructions available when AVX512 is enabled regardless of whether +fma was added on the command line. We weren't able to handle isel of the 128/256-bit FMA instructions when AVX512F was enabled but VLX and FMA weren't. I didn't make FeatureAVX512 imply FeatureFMA as I wasn't sure I wanted disabling FMA to also disable AVX512. Instead we just can't prevent FMA instructions if AVX512 is enabled. Another option would be to promote 128/256-bit to 512-bit, do the operation and extract it. But that requires a lot of extra isel patterns. Since no CPUs exist that support AVX512 but not FMA, just using the VEX instructions seems better. llvm-svn: 298051	2017-03-17 07:37:31 +00:00
Amjad Aboud	4f97751798	[X86] Generate VZEROUPPER for Skylake-avx512. VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859	2017-03-03 09:03:24 +00:00
Petr Hosek	a7d5916308	[Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack The Fuchsia ABI defines slots from the thread pointer where the stack-guard value for stack-protector, and the unsafe stack pointer for safe-stack, are stored. This parallels the Android ABI support. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D30237 llvm-svn: 296081	2017-02-24 03:10:10 +00:00
Craig Topper	d88389aa7e	[X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs Summary: Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency. This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do. Reviewers: zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30181 llvm-svn: 295697	2017-02-21 06:39:13 +00:00
Craig Topper	3cac763532	[X86] Remove the HLE feature flag. We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line. llvm-svn: 294562	2017-02-09 06:51:02 +00:00
Craig Topper	86576bd921	[X86] Remove INVPCID and SMAP feature flags. They aren't currently used by any instructions and not tested. If we implement intrinsics for their instructions in the future, the feature flags can be added back with proper testing. llvm-svn: 294561	2017-02-09 06:50:59 +00:00
Craig Topper	50f3d1452c	[X86] Clzero intrinsic and its addition under znver1 This patch does the following. 1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero 2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1) 3. Adds the clzero feature under znver1 architecture. 4. The custom inserter is added in Lowering. 5. A testcase is added to check the intrinsic. 6. The clzero instruction is added to assembler test. Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me. Differential revision: https://reviews.llvm.org/D29385 llvm-svn: 294558	2017-02-09 04:27:34 +00:00
Craig Topper	3fd463a15a	[X86] Add test for clflushopt intrinsic and only enable it to be selected if the feature flag is set. llvm-svn: 294407	2017-02-08 05:45:46 +00:00
Craig Topper	6c05192018	[X86] Remove the VMFUNC feature flag. It was only partially implemented and we have no support for codegening vmfunc instructions today. If that support ever gets added, the full feature flag support should come along with it. llvm-svn: 294406	2017-02-08 05:45:42 +00:00
Craig Topper	e0ac7f3beb	[X86] Remove PCOMMIT instruction support since Intel has deprecated this instruction with no plans to release products with it. Intel's documentation for the deprecation https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction llvm-svn: 294405	2017-02-08 05:45:39 +00:00
Eugene Zelenko	fbd13c5c12	[X86] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 293949	2017-02-02 22:55:55 +00:00
Nikolai Bozhenov	6bdf92cec7	[X86] Tune bypassing of slow division for Intel CPUs 64-bit integer division in Intel CPUs is extremely slow, much slower than 32-bit division. On the other hand, 8-bit and 16-bit divisions aren't any faster. The only important exception is Atom where DIV8 is fastest. Because of that, the patch 1) Enables bypassing of 64-bit division for Atom, Silvermont and all big cores. 2) Modifies 64-bit bypassing to use 32-bit division instead of 16-bit one. This doesn't make the shorter division slower but increases chances of taking it. Moreover, it's much more likely to prove at compile-time that a value fits 32 bits and doesn't require a run-time check (e.g. zext i32 to i64). Differential Revision: https://reviews.llvm.org/D28196 llvm-svn: 291800	2017-01-12 19:34:15 +00:00
Zvi Rackover	8bc7e4da51	[X86] Prefer reduced width multiplication over pmulld on Silvermont Summary: Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld. On Silvermont [source: Optimization Reference Manual]: PMULLD has a throughput of 1/11 [instruction/cycles]. PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles]. Fixes pr31202. Analysis of this issue was done by Fahana Aleen. Reviewers: wmi, delena, mkuper Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D27203 llvm-svn: 288844	2016-12-06 19:35:20 +00:00
Paul Robinson	78a695321e	[PS4] Tighten up a triple check. llvm-svn: 288286	2016-11-30 23:14:27 +00:00
Zvi Rackover	76dbf26599	[X86][GlobalISel] Add minimal call lowering support to the IRTranslator Summary: Add basic functionality to support call lowering for X86. Currently only supports functions which return void and take zero arguments. Inspired by commit 286573. Reviewers: ab, qcolombet, t.p.northover Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26593 llvm-svn: 286935	2016-11-15 06:34:33 +00:00
Pierre Gousseau	b6d652adb5	[X86] Take advantage of the lzcnt instruction on btver2 architectures when ORing comparisons to zero. This change adds transformations such as: zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0)))) To: srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))) This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput. Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it. For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar. Differential Revision: https://reviews.llvm.org/D23446 llvm-svn: 284248	2016-10-14 16:41:38 +00:00
Dean Michael Berris	4640154446	[XRay] ARM 32-bit no-Thumb support in LLVM This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter. This is one of 3 commits to different repositories of XRay ARM port. The other 2 are: https://reviews.llvm.org/D23932 (Clang test) https://reviews.llvm.org/D23933 (compiler-rt) Differential Revision: https://reviews.llvm.org/D23931 llvm-svn: 281878	2016-09-19 00:54:35 +00:00
Renato Golin	049f387112	Revert "[XRay] ARM 32-bit no-Thumb support in LLVM" And associated commits, as they broke the Thumb bots. This reverts commit r280935. This reverts commit r280891. This reverts commit r280888. llvm-svn: 280967	2016-09-08 17:10:39 +00:00
Dean Michael Berris	17d94e279e	[XRay] ARM 32-bit no-Thumb support in LLVM This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter. This is one of 3 commits to different repositories of XRay ARM port. The other 2 are: 1. https://reviews.llvm.org/D23932 (Clang test) 2. https://reviews.llvm.org/D23933 (compiler-rt) Differential Revision: https://reviews.llvm.org/D23931 llvm-svn: 280888	2016-09-08 00:19:04 +00:00
Nikolai Bozhenov	f679530ba1	[X86] Heuristic to selectively build Newton-Raphson SQRT estimation On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation. The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized. Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR. Differential Revision: https://reviews.llvm.org/D21379 llvm-svn: 277725	2016-08-04 12:47:28 +00:00
Rafael Espindola	f9e348bd59	Convert a few more comparisons to isPositionIndependent(). NFC. llvm-svn: 273945	2016-06-27 21:33:08 +00:00
Rafael Espindola	0d34826218	Simplify PICStyles. The main difference is that StubDynamicNoPIC is gone. The dynamic-no-pic mode as the name implies is simply not pic. It is just conservative about what it assumes to be dso local. llvm-svn: 273222	2016-06-20 23:41:56 +00:00
Rafael Espindola	94eb31a7a9	Delete dead code. NFC. llvm-svn: 273206	2016-06-20 22:08:35 +00:00
Davide Italiano	ef5d8bead1	[X86Subtarget] Use isPositionIndependent(). NFC. Differential Revision: http://reviews.llvm.org/D21480 llvm-svn: 273071	2016-06-18 00:03:20 +00:00
David Majnemer	ca29023b02	[X86] Reduce memory allocations in X86TargetMachine::getSubtargetImpl We performed a number of memory allocations each time getTTI was called, remove them by using SmallString. No functionality change intended. llvm-svn: 270246	2016-05-20 18:16:06 +00:00
Rafael Espindola	c7e9813228	Refactor X86 symbol access classification. This refactors the logic in X86 to avoid code duplication. It also splits it in two steps: it first decides if a symbol is local to the DSO and then uses that information to decide how to access it. The first part is implemented by shouldAssumeDSOLocal. It is not in any way specific to X86. In a followup patch I intend to move it to somewhere common and reuse it in other backends. llvm-svn: 270209	2016-05-20 12:20:10 +00:00
Rafael Espindola	ab03eb007c	Record a TargetMachine instead of a Reloc::Model. Addresses r270095's code review. llvm-svn: 270147	2016-05-19 22:07:57 +00:00
Rafael Espindola	46107b9e62	Remember the relocation model. NFC. This avoids passing a TargetMachine in a few places. llvm-svn: 270095	2016-05-19 18:49:29 +00:00
Rafael Espindola	cb2d266360	Style fixes. NFC. llvm-svn: 270093	2016-05-19 18:34:20 +00:00
Ashutosh Nema	348af9cc6b	Add new flag and intrinsic support for MWAITX and MONITORX instructions Summary: MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT pair while adding a timer function, such that another termination of the MWAITX instruction occurs when the timer expires. The presence of the MONITORX and MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29. The MONITORX and MWAITX instructions are intercepted by the same bits that intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be monitored. MWAITX instruction causes the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. Opcode of MONITORX instruction is "0F 01 FA". Opcode of MWAITX instruction is "0F 01 FB". This opcode information is used in adding tests for the disassembler. These instructions are enabled for AMD's bdver4 architecture. Patch by Ganesh Gopalasubramanian! Reviewers: echristo, craig.topper, RKSimon Subscribers: RKSimon, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D19795 llvm-svn: 269911	2016-05-18 11:59:12 +00:00
Marcin Koscielnicki	0275fac2c9	[X86] Extend some Linux special cases to cover kFreeBSD. Both Linux and kFreeBSD use glibc, so follow similar code paths. Add isTargetGlibc to check for this, and use it instead of isTargetLinux in a few places. Fixes PR22248 for kFreeBSD. Differential Revision: http://reviews.llvm.org/D19104 llvm-svn: 268624	2016-05-05 11:35:51 +00:00
Sriraman Tallam	7da9b445ea	Differential Revision: http://reviews.llvm.org/D19733 llvm-svn: 268106	2016-04-29 21:19:16 +00:00
Sriraman Tallam	3cb773431d	Differential Revision: http://reviews.llvm.org/D19040 llvm-svn: 267229	2016-04-22 21:41:58 +00:00
Asaf Badouh	89406d1815	[X86] enable PIE for functions Call locally defined function directly for PIE/fPIE Differential Revision: http://reviews.llvm.org/D19226 llvm-svn: 266863	2016-04-20 08:32:57 +00:00
Andrey Turetskiy	6a3d561ea0	[X86] Introduction of FeatureX87. Add FeatureX87 in X86 backend to be able to define CPUs which don't have x87. Differential Revision: http://reviews.llvm.org/D13979 llvm-svn: 264148	2016-03-23 11:13:54 +00:00
Craig Topper	f730a6bedc	Remove Proc feature flags for X86 processors that are used to inherit features from one processor to another. This exposed extra features to the -mattr command line that we shouldn't. Replace with just inherited listconcats. llvm-svn: 260832	2016-02-13 21:35:37 +00:00
Sanjay Patel	e9bf993cee	[x86-64] allow mfence even with -mno-sse (PR23203) As shown in: https://llvm.org/bugs/show_bug.cgi?id=23203 ...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64, but the instruction def doesn't know that. I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :) Differential Revision: http://reviews.llvm.org/D17219 llvm-svn: 260828	2016-02-13 17:26:29 +00:00
Yunzhong Gao	0de36ec169	Disable the vzeroupper insertion pass on PS4. Differential Revision: http://reviews.llvm.org/D16837 llvm-svn: 260764	2016-02-12 23:37:57 +00:00
Elena Demikhovsky	29cde35b43	Added Skylake client to X86 targets and features Changes in X86.td: I set features of Intel processors in incremental form: IVB = SNB + X HSW = IVB + X .. I added Skylake client processor and defined its features FeatureADX was missing on KNL Added some new features to appropriate processors SMAP, IFMA, PREFETCHWT1, VMFUNC and others Differential Revision: http://reviews.llvm.org/D16357 llvm-svn: 258659	2016-01-24 10:41:28 +00:00
Michael Zuckerman	97b6a6923e	[AVX512] adding AVXVBMI feature flag The feature flag is for VPERMB,VPERMI2B,VPERMT2B and VPMULTISHIFTQB instructions. More about the instructions can be found in: https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf Differential Revision: http://reviews.llvm.org/D16190 llvm-svn: 258012	2016-01-17 13:42:12 +00:00
Asaf Badouh	5acf66ff97	[x86] adding PKU feature flag The feature flag is essential for the RDPKRU and WRPKRU instructions. More about the instructions can be found in the SDM rev 56, vol 2 from http://www.intel.com/sdm Differential Revision: http://reviews.llvm.org/D15491 llvm-svn: 255644	2015-12-15 13:35:29 +00:00
Hans Wennborg	5000ce8a63	X86: Don't emit SAHF/LAHF for 64-bit targets unless explicitly supported These instructions are not supported by all CPUs in 64-bit mode. Emitting them causes Chromium to crash on start-up for users with such chips. (GCC puts these instructions behind -msahf on 64-bit for the same reason.) This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering from before r244503 when the instructions are not available. Differential Revision: http://reviews.llvm.org/D15240 llvm-svn: 254793	2015-12-04 23:00:33 +00:00
Sanjay Patel	60216f6943	[x86] add a convenience method to check for FMA capability; NFCI llvm-svn: 254425	2015-12-01 17:27:55 +00:00
Simon Pilgrim	db26b3ddfa	[X86][FMA4] Prefer FMA4 to FMA We currently output FMA instructions on targets which support both FMA4 + FMA (i.e. later Bulldozer CPUs bdver2/bdver3/bdver4). This patch flips this so FMA4 is preferred; this is for several reasons: 1 - FMA4 is non-destructive, reducing the need for mov instructions. 2 - It's more straightforward to commute and fold inputs (although the recent work on FMA has reduced this difference). 3 - All supported targets have FMA4 performance equal to or better than FMA - Piledriver (bdver2) in particular has half the throughput when executing FMA instructions. It looks like no future AMD processor lines will support FMA4 after the Bulldozer series so we're not causing problems for later CPUs. Differential Revision: http://reviews.llvm.org/D14997 llvm-svn: 254339	2015-11-30 22:22:06 +00:00
Eric Christopher	57a6e1321f	Add MMX to the 3dnow enum and propagate changes around. This makes it somewhat more consistent with how the feature is used. llvm-svn: 253122	2015-11-14 03:04:00 +00:00
Michael Kuperstein	e1194bdb4f	[X86] Make elfiamcu an OS, not an environment. GNU tools require elfiamcu to take up the entire OS field, so, e.g. i?86-*-linux-elfiamcu is not considered a legal triple. Make us compatible. Differential Revision: http://reviews.llvm.org/D14081 llvm-svn: 251390	2015-10-27 07:23:59 +00:00
Michael Kuperstein	fe897623f3	[X86] Add support for elfiamcu triple This adds support for the i?86-*-elfiamcu triple, which indicates the IAMCU psABI is used. Differential Revision: http://reviews.llvm.org/D13977 llvm-svn: 251222	2015-10-25 08:07:37 +00:00
Craig Topper	09b6598572	[X86] Add fxsr feature flag for fxsave/fxrestore instructions. llvm-svn: 250497	2015-10-16 06:03:09 +00:00
Amjad Aboud	1db6d7af46	[X86] Add XSAVE intrinsic family Add intrinsics for the XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64) XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64) XSAVEC instructions (XSAVEC/XSAVEC64) XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64) Differential Revision: http://reviews.llvm.org/D13012 llvm-svn: 250029	2015-10-12 11:47:46 +00:00
Evgeniy Stepanov	5fe279e727	Add Triple::isAndroid(). This is a simple refactoring that replaces Triple.getEnvironment() checks for Android with Triple.isAndroid(). llvm-svn: 249750	2015-10-08 21:21:24 +00:00
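
The lzcnt rewrite described in commit b6d652adb5 above can be checked with a small model. This is only a sketch of the arithmetic identity, not the LLVM implementation: the helper names (`ctlz`, `either_is_zero_setcc`, `either_is_zero_lzcnt`) and the 32-bit width are assumptions made for illustration. It relies on LZCNT-style semantics, where the count equals the bit width exactly when the input is zero, so bit log2(width) of the count acts as an "is zero" flag.

```python
BITS = 32  # assumed operand width for this illustration


def ctlz(x: int, bits: int = BITS) -> int:
    """Count leading zeros of an unsigned value; returns `bits` when x == 0."""
    assert 0 <= x < (1 << bits)
    return bits - x.bit_length()


def either_is_zero_setcc(x: int, y: int) -> int:
    """Reference form: zext(or(setcc(eq, x, 0), setcc(eq, y, 0)))."""
    return int(x == 0) | int(y == 0)


def either_is_zero_lzcnt(x: int, y: int, bits: int = BITS) -> int:
    """Rewritten form: srl(or(ctlz(x), ctlz(y)), log2(bits))."""
    shift = bits.bit_length() - 1  # log2(bits) for power-of-two widths
    return (ctlz(x, bits) | ctlz(y, bits)) >> shift


if __name__ == "__main__":
    for x, y in [(0, 0), (0, 5), (7, 0), (1, 1), (0xFFFFFFFF, 1)]:
        print(x, y, either_is_zero_setcc(x, y), either_is_zero_lzcnt(x, y))
```

For nonzero inputs the count is at most 31, so the OR of two such counts never sets bit 5 and the shift yields 0; if either input is zero, its count of 32 sets bit 5 and the shift yields 1.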

312 Commits
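
The slow-division bypass described in commit 6bdf92cec7 can be illustrated with a small model. This is a hedged sketch of the idea only, not the code LLVM emits: `udiv64_bypass` and the returned path label are hypothetical names introduced here. The run-time check ORs the two operands and tests whether any of the upper 32 bits are set; if none are, the quotient can be computed with the cheaper 32-bit divide.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def udiv64_bypass(a: int, b: int):
    """Unsigned 64-bit divide that takes a 32-bit path when both operands fit.

    Returns (path, quotient), where path is "narrow" or "wide". The label
    exists only so the chosen path is visible in this illustration.
    Division by zero raises ZeroDivisionError, as with any Python divide.
    """
    assert 0 <= a <= MASK64 and 0 <= b <= MASK64
    if ((a | b) >> 32) == 0:
        # Both operands fit in 32 bits: model the 32-bit divide.
        return "narrow", (a & MASK32) // (b & MASK32)
    # At least one operand needs more than 32 bits: full-width divide.
    return "wide", a // b


if __name__ == "__main__":
    print(udiv64_bypass(100, 7))
    print(udiv64_bypass(1 << 40, 3))
```

As the commit message notes, the check can often be removed at compile time, for example when an operand is known to be a zero-extended 32-bit value.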