llvm-project

Commit Graph

Author	SHA1	Message	Date
Preston Gurd	a01daace88	Pad Short Functions for Intel Atom The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. This patch has been updated to address Nadav's review comments - Optimize only at >= O1 and don't do optimization if -Os is set - Stores MachineBasicBlock* instead of BBNum - Uses DenseMap instead of std::map - Fixes placement of braces Patch by Andy Zhang. llvm-svn: 171879	2013-01-08 18:27:24 +00:00
Nadav Rotem	478b6a47ec	Revert revision 171524. Original message: URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev Log: The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. Patch by Andy Zhang. llvm-svn: 171603	2013-01-05 05:42:48 +00:00
Preston Gurd	e36b685a94	The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. Patch by Andy Zhang. llvm-svn: 171524	2013-01-04 20:54:54 +00:00
Chandler Carruth	7a28f95419	Make '-mtune=x86_64' assume fast unaligned memory accesses. Not all chips targeted by x86_64 have this feature, but a dramatically increasing number do. Specifying a chip-specific tuning parameter will continue to turn the feature on or off as appropriate for that particular chip, but the generic flag should try to achieve the best performance on the most widely available hardware. Today, the number of chips with fast UA access dwarfs those without in the x86-64 space. Note that this also brings LLVM's code generation for this '-march' flag more in line with that of modern GCCs. Reviewed by Dan Gohman. llvm-svn: 170269	2012-12-15 09:01:13 +00:00
Chandler Carruth	867c7bff9a	Revert "Make '-mtune=x86_64' assume fast unaligned memory accesses." Accidental commit... git svn betrayed me. Sorry for the noise. llvm-svn: 169741	2012-12-10 18:23:52 +00:00
Chandler Carruth	7eaa45c738	Make '-mtune=x86_64' assume fast unaligned memory accesses. Summary: Not all chips targeted by x86_64 have this feature, but a dramatically increasing number do. Specifying a chip-specific tuning parameter will continue to turn the feature on or off as appropriate for that particular chip, but the generic flag should try to achieve the best performance on the most widely available hardware. Today, the number of chips with fast UA access dwarfs those without in the x86-64 space. Note that this also brings LLVM's code generation for this '-march' flag more in line with that of modern GCCs. CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D195 llvm-svn: 169740	2012-12-10 18:22:42 +00:00
Chandler Carruth	0f58558101	Address a FIXME and update the fast unaligned memory feature for newer Intel chips. The model number rules were determined by inspecting Intel's documentation for their newer chip model numbers. My understanding is that all of the newer Intel chips have fast unaligned memory access, but if anyone is concerned about a particular chip, just shout. No tests updated; it's not clear we have dedicated tests for the chips' various features, but if anyone would like tests (or can point me at some existing ones), I'm happy to oblige. llvm-svn: 169730	2012-12-10 09:18:44 +00:00
Michael Liao	73cffddb95	Add support of RTM from TSX extension - Add RTM code generation support through 3 X86 intrinsics: xbegin()/xend() to start/end a transaction region, and xabort() to abort a transaction region llvm-svn: 167573	2012-11-08 07:28:54 +00:00
Michael Liao	c6696b04db	Atom has SIMD instruction set extension up to SSSE3 llvm-svn: 166665	2012-10-25 07:06:48 +00:00
Craig Topper	303878108e	Fix 80-column violation llvm-svn: 165089	2012-10-03 03:56:12 +00:00
Roman Divacky	fd69009419	Add support for AMD Geode. llvm-svn: 163710	2012-09-12 14:36:02 +00:00
Preston Gurd	cdf540d5d6	Generic Bypass Slow Div - CodeGenPrepare pass for identifying div/rem ops - Backend specifies the type mapping using addBypassSlowDivType - Enabled only for Intel Atom with O2 32-bit -> 8-bit - Replace IDIV with instructions which test its value and use DIVB if the value is positive and less than 256. - In the case when the quotient and remainder of a divide are used a DIV and a REM instruction will be present in the IR. In the non-Atom case they are both lowered to IDIVs and CSE removes the redundant IDIV instruction, using the quotient and remainder from the first IDIV. However, due to this optimization CSE is not able to eliminate redundant IDIV instructions because they are located in different basic blocks. This is overcome by calculating both the quotient (DIV) and remainder (REM) in each basic block that is inserted by the optimization and reusing the result values when a subsequent DIV or REM instruction uses the same operands. - Test cases check for the presence of the optimization when calculating either the quotient, remainder, or both. Patch by Tyler Nowicki! llvm-svn: 163150	2012-09-04 18:22:17 +00:00
Anitha Boyapati	af3e98347f	Patch to enable FMA on bdver2 target. Make XOP feature enable FMA4 as well. llvm-svn: 162012	2012-08-16 04:04:02 +00:00
Anitha Boyapati	426feb61b9	(no commit message) llvm-svn: 162010	2012-08-16 03:50:04 +00:00
Andrew Trick	87255e340e	I'm introducing a new machine model to simultaneously allow simple subtarget CPU descriptions and support new features of MachineScheduler. MachineModel has three categories of data: 1) Basic properties for coarse grained instruction cost model. 2) Scheduler Read/Write resources for simple per-opcode and operand cost model (TBD). 3) Instruction itineraries for detailed per-cycle reservation tables. These will all live side-by-side. Any subtarget can use any combination of them. Instruction itineraries will not change in the near term. In the long run, I expect them to only be relevant for in-order VLIW machines that have complex constraints and require a precise scheduling/bundling model. Once itineraries are only actively used by VLIW-ish targets, they could be replaced by something more appropriate for those targets. This tablegen backend rewrite sets things up for introducing MachineModel type #2: per opcode/operand cost model. llvm-svn: 159891	2012-07-07 04:00:00 +00:00
Craig Topper	79dbb0c6e4	Rename FMA3 feature flag to just FMA to match gcc so it can be added to clang. llvm-svn: 157903	2012-06-03 18:58:46 +00:00
Benjamin Kramer	a0396e4583	X86: Rename the CLMUL target feature to PCLMUL. It was renamed in gcc/gas a while ago and causes all kinds of confusion because it was named differently in llvm and clang. llvm-svn: 157745	2012-05-31 14:34:17 +00:00
Craig Topper	bae0e9ea1d	Make XOP and FMA4 require SSE4A to match GCC behavior. Use this to simplify Bulldozer feature list. llvm-svn: 155897	2012-05-01 06:54:48 +00:00
Craig Topper	43518cc55f	Make XOP imply AVX as it's needed to legalize the register types. llvm-svn: 155891	2012-05-01 05:41:41 +00:00
Craig Topper	29dd148a71	Make CLMUL and AES imply SSE2 since it's needed to legalize the type. llvm-svn: 155888	2012-05-01 05:28:32 +00:00
Craig Topper	0eacda5f69	Enable AVX and FMA4 for AMD Bulldozer processors. llvm-svn: 155885	2012-05-01 05:18:13 +00:00
Craig Topper	08ccfbe57b	Enable detection of AVX and AVX2 support through CPUID. Add AVX/AVX2 to corei7-avx, core-avx-i, and core-avx2 cpu names. llvm-svn: 155618	2012-04-26 06:40:15 +00:00
Jia Liu	b22310fda6	Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, MSP430, PPC, PTX, Sparc, X86, XCore. llvm-svn: 150878	2012-02-18 12:03:15 +00:00
Evan Cheng	1b81fddd65	Use LEA to adjust stack ptr for Atom. Patch by Andy Zhang. llvm-svn: 150008	2012-02-07 22:50:41 +00:00
Andrew Trick	8523b16ff5	Instruction scheduling itinerary for Intel Atom. Adds an instruction itinerary to all x86 instructions, giving each a default latency of 1, using the InstrItinClass IIC_DEFAULT. Sets specific latencies for Atom for the instructions in files X86InstrCMovSetCC.td, X86InstrArithmetic.td, X86InstrControl.td, and X86InstrShiftRotate.td. The Atom latencies for the remainder of the x86 instructions will be set in subsequent patches. Adds a test to verify that the scheduler is working. Also changes the scheduling preference to "Hybrid" for i386 Atom, while leaving x86_64 as ILP. Patch by Preston Gurd! llvm-svn: 149558	2012-02-01 23:20:51 +00:00
Devang Patel	4a6e778aae	Rename X86ATTAsmParser -> X86AsmParser We are using one parser to parse att as well as intel style syntax. llvm-svn: 148032	2012-01-12 18:03:40 +00:00
Devang Patel	67bf992a8f	Add definition for intel asm variant. Right now, this just adds additional entries in match table. The parser does not use them yet. llvm-svn: 147859	2012-01-10 17:51:54 +00:00
Benjamin Kramer	077ae1d760	Add definitions for AMD's bobcat (aka btver1) llvm-svn: 147846	2012-01-10 11:50:02 +00:00
Devang Patel	85d684a4d9	Split AsmParser into two components - AsmParser and AsmParserVariant AsmParser holds info specific to target parser. AsmParserVariant holds info specific to asm variants supported by the target. llvm-svn: 147787	2012-01-09 19:13:28 +00:00
Craig Topper	f287a4509e	Remove AVX hack in X86Subtarget. AVX/AVX2 are now treated as an SSE level. Predicate functions have been altered to maintain previous names and behavior. llvm-svn: 147770	2012-01-09 09:02:13 +00:00
Craig Topper	a5d1fc2cc7	Make FMA4 imply AVX so that YMM registers would be available. Necessitates removing from Bulldozer CPU types since it would enable AVX code generation implicitly. Also make SSE4A imply SSE3. Without some level of SSE implied, XMM registers wouldn't be legal. llvm-svn: 147369	2011-12-30 07:16:00 +00:00
Craig Topper	e1bd05128e	Make FMA3 imply AVX needs to be enabled. Particularly because 256-bit types aren't valid unless AVX is enabled. llvm-svn: 147349	2011-12-29 19:46:19 +00:00
Craig Topper	a060afb5ba	Add FeaturePOPCNT to all CPU types that lost it when it was removed from SSE42/SSE4A in r147339. llvm-svn: 147347	2011-12-29 18:47:31 +00:00
Craig Topper	7bd3305f3e	Make SSE42 and SSE4A not imply POPCNT. POPCNT should be able to be disabled on its own without disabling SSE4.2 or SSE4A. llvm-svn: 147339	2011-12-29 15:51:45 +00:00
Jan Sjödin	1280eb1d06	Add XOP feature flag. llvm-svn: 145682	2011-12-02 15:14:37 +00:00
Benjamin Kramer	5feb3dab79	X86: Turns out bulldozer also supports sse42 and lzcnt. While at it remove the barcelona/istanbul/shanghai subtargets, they're unsupported by GCC and look pretty broken. llvm-svn: 145494	2011-11-30 15:48:16 +00:00
Benjamin Kramer	981f32327d	X86: Add subtargets for AMD's bulldozer. llvm-svn: 145493	2011-11-30 15:27:46 +00:00
Craig Topper	228d9131aa	Add intrinsics and feature flag for read/write FS/GS base instructions. Also add AVX2 feature flag. llvm-svn: 143319	2011-10-30 19:57:21 +00:00
David Meyer	49045ddb4c	Remove NaClMode llvm-svn: 142338	2011-10-18 05:29:23 +00:00
Craig Topper	aea148c366	Add X86 BZHI instruction as well as BMI2 feature detection. llvm-svn: 142122	2011-10-16 07:55:05 +00:00
Craig Topper	3657fe4b17	Add X86 TZCNT instruction and patterns to select it. Also added core-avx2 processor which is gcc's name for Haswell. llvm-svn: 141939	2011-10-14 03:21:46 +00:00
Bill Wendling	063f55ffdd	Revert r141854 because it was causing failures: http://lab.llvm.org:8011/builders/llvm-x86_64-linux/builds/101 --- Reverse-merging r141854 into '.': U test/MC/Disassembler/X86/x86-32.txt U test/MC/Disassembler/X86/simple-tests.txt D test/CodeGen/X86/bmi.ll U lib/Target/X86/X86InstrInfo.td U lib/Target/X86/X86ISelLowering.cpp U lib/Target/X86/X86.td U lib/Target/X86/X86Subtarget.h llvm-svn: 141857	2011-10-13 07:48:07 +00:00
Craig Topper	8cc9388073	Add X86 TZCNT instruction and patterns to select it. Also added core-avx2 processor which is gcc's name for Haswell. llvm-svn: 141854	2011-10-13 07:09:14 +00:00
Craig Topper	271064e873	Add X86 LZCNT instruction. Including instruction selection support. llvm-svn: 141651	2011-10-11 06:44:02 +00:00
Benjamin Kramer	874c519337	X86: Add a subtarget definition for core-avx-i, which is GCC's name for ivy bridge. llvm-svn: 141571	2011-10-10 19:35:07 +00:00
Benjamin Kramer	42c0330a79	X86: Add patterns for the movbe instruction (mov + bswap, only available on atom) llvm-svn: 141563	2011-10-10 18:34:56 +00:00
Craig Topper	fe9179fa4f	Add Ivy Bridge 16-bit floating point conversion instructions for the X86 disassembler. llvm-svn: 141505	2011-10-09 07:31:39 +00:00
Craig Topper	786bdb9e14	Add support for MOVBE and RDRAND instructions for the assembler and disassembler. Includes feature flag checking, but no intrinsic support. Fixes PR10832, PR11026 and PR11027. llvm-svn: 141007	2011-10-03 17:28:23 +00:00
Nick Lewycky	73df7e3830	Add a new MC bit for NaCl (Native Client) mode. NaCl requires that certain instructions are more aligned than the CPU requires, and adds some additional directives, to follow in future patches. Patch by David Meyer! llvm-svn: 139125	2011-09-05 21:51:43 +00:00
Eli Friedman	5e5704277f	Add support for generating CMPXCHG16B on x86-64 for the cmpxchg IR instruction. llvm-svn: 138660	2011-08-26 21:21:21 +00:00

133 Commits