Commit Graph

205 Commits

Author SHA1 Message Date
Craig Topper 2d9361e325 [C++11] Add 'override' keyword to virtual methods that override their base class.
llvm-svn: 203378
2014-03-09 07:44:38 +00:00
Craig Topper 73156025e0 Switch all uses of LLVM_OVERRIDE to just use 'override' directly.
llvm-svn: 202621
2014-03-02 09:09:27 +00:00
David Woodhouse 1c3996abc7 [x86] Kill gratuitous X86_{32,64}TargetMachine subclasses, use X86TargetMachine
llvm-svn: 198720
2014-01-08 00:08:50 +00:00
Craig Topper 3c80d62a6c [x86] Add basic support for .code16
This is not really expected to work right yet. Mostly because we will
still emit the OpSize (0x66) prefix in all the wrong places, along with
a number of other corner cases. Those will all be fixed in the subsequent
commits.

Patch from David Woodhouse.

llvm-svn: 198584
2014-01-06 04:55:54 +00:00
Rafael Espindola ddb913cc8f Synchronize the NaCl DataLayout strings with the ones in clang.
Patch by Derek Schuff.

llvm-svn: 197640
2013-12-19 00:44:37 +00:00
Tim Northover 9653eb5759 Make Triple's isOSBinFormatXXX functions partition triple-space.
Most users would be surprised if "isCOFF" and "isMachO" were simultaneously
true, unless they'd put the compiler in a box with a gun attached to a photon
detector.

This makes sure precisely one of the three formats is true for any triple and
simplifies some target logic based on that.

llvm-svn: 196934
2013-12-10 16:57:43 +00:00
Ekaterina Romanova d5fa55470c SHLD/SHRD are VectorPath (microcode) instructions known to have poor latency on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or and lea which are directPath instructions. These alternative instructions not only have a lower latency but they also increase the decode bandwidth by allowing simultaneous decoding of a third directPath instruction.
AMD's processors family K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD's processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size.

It might be beneficial to disable this folding for some of the Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel.

llvm-svn: 195383
2013-11-21 23:21:26 +00:00
Yaron Keren 79bb266346 (this is a corrected patch)
Calling _chkstk is required on ELF as well as COFF on Windows. Without 
_chkstk, functions requiring large stack crash in initialization code.

Previous code tested for COFF format but not Mach-O and this patch modifies 
the code to test for Windows OS (both Windows target and MingW target) 
but not Mach-O object format: Looks like macho environment was used to 
build some EFI code.
 
Credits to Andrew MacPherson.

llvm-svn: 193289
2013-10-23 23:37:01 +00:00
Rafael Espindola bca3ab0905 Revert "Calling _chkstk is required on ELF as well as COFF on Windows. Without _chkstk functions requiring large stack crash in initialization code. Previous code tested for COFF format but not Mach-O and this patch modifies the code to test for Windows."
This reverts commit r193263.

It is causing CodeGen/X86/mingw-alloca.ll to fail.

llvm-svn: 193275
2013-10-23 21:45:09 +00:00
Yaron Keren 03ac82edf5 Calling _chkstk is required on ELF as well as COFF on Windows.
Without _chkstk functions requiring large stack crash in 
initialization code. Previous code tested for COFF format but 
not Mach-O and this patch modifies the code to test for Windows.

Credits to Andrew MacPherson.

llvm-svn: 193263
2013-10-23 19:40:07 +00:00
Andrew Trick e97d8d6dde Enable MI Sched for x86.
This changes the SelectionDAG scheduling preference to source
order. Soon, the SelectionDAG scheduler can be bypassed saving
a nice chunk of compile time.

Performance differences that result from this change are often a
consequence of register coalescing. The register coalescer is far from
perfect. Bugs can be filed for deficiencies.

On x86 SandyBridge/Haswell, the source order schedule is often
preserved, particularly for small blocks.

Register pressure is generally improved over the SD scheduler's ILP
mode. However, we are still able to handle large blocks that require
latency hiding, unlike the SD scheduler's BURR mode. MI scheduler also
attempts to discover the critical path in single-block loops and
adjust heuristics accordingly.

The MI scheduler relies on the new machine model. This is currently
unimplemented for AVX, so we may not be generating the best code yet.

Unit tests are updated so they don't depend on SD scheduling heuristics.

llvm-svn: 192750
2013-10-15 23:33:07 +00:00
Yunzhong Gao dd36e9387b Adding a feature flag to the llvm backend for x86 TBM instruction set.
Adding TBM feature to bdver2 processor; piledriver supports this instruction set
according to the following document:
http://developer.amd.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf

Phabricator code review is located here: http://llvm-reviews.chandlerc.com/D1692

llvm-svn: 191324
2013-09-24 18:21:52 +00:00
Preston Gurd 3fe264d625 Adds support for Atom Silvermont (SLM) - -march=slm
Implements Instruction scheduler latencies for Silvermont,
using latencies from the Intel Silvermont Optimization Guide.

Auto detects SLM.

Turns on post RA scheduler when generating code for SLM.

llvm-svn: 190717
2013-09-13 19:23:28 +00:00
Ben Langmuir 1650175de6 Partial support for Intel SHA Extensions (sha1rnds4)
Add basic assembly/disassembly support for the first Intel SHA
instruction 'sha1rnds4'. Also includes feature flag, and test cases.

Support for the remaining instructions will follow in a separate patch.

llvm-svn: 190611
2013-09-12 15:51:31 +00:00
Cameron Esfahani 943908b78d Clean up some usage of Triple. The base class has methods for determining if the target is iOS and Linux.
llvm-svn: 189604
2013-08-29 20:23:14 +00:00
NAKAMURA Takumi 9ea7c6d463 X86Subtarget.h: Recognize x86_64-cygwin.
In the LLVM side, x86_64-cygwin is almost as same as x86_64-mingw32.

llvm-svn: 189436
2013-08-28 03:04:02 +00:00
Craig Topper 5c94bb8551 Rename mattr names for AVX-512 to from avx-512 -> avx512f, avx-512-pfi -> av512pf, avx-512-cdi -> avx512cd, avx-512-eri->avx512er. This matches better with official docs and what gcc patches appearto be using. I didn't touch the has* functions or the feature flag names to avoid change the td and lowering file while commits are still happening.
llvm-svn: 188859
2013-08-21 03:57:57 +00:00
Elena Demikhovsky 8cfb43f73b I'm starting to commit KNL backend. I'll push patches one-by-one. This patch includes support for the extended register set XMM16-31, YMM16-31, ZMM0-31.
The full ISA you can see here: http://software.intel.com/en-us/intel-isa-extensions

llvm-svn: 187030
2013-07-24 11:02:47 +00:00
Charles Davis e8f297ca94 Target/X86: Add explicit Win64 and System V/x86-64 calling conventions.
Summary:
This patch adds explicit calling convention types for the Win64 and
System V/x86-64 ABIs. This allows code to override the default, and use
the Win64 convention on a target that wants to use SysV (and
vice-versa). This is needed to implement the `ms_abi` and `sysv_abi` GNU
attributes.

Reviewers:

CC:

llvm-svn: 186144
2013-07-12 06:02:35 +00:00
Andrew Trick 121124acf8 Revert "Temporarily enable MI-Sched on X86."
This reverts commit 98a9b72e8c56dc13a2617de84503a3d78352789c.

llvm-svn: 184823
2013-06-25 02:48:58 +00:00
Andrew Trick 5a1e0af838 Temporarily enable MI-Sched on X86.
Sorry for the unit test churn. I'll try to make the change permanently
next time.

llvm-svn: 184705
2013-06-24 09:13:20 +00:00
Preston Gurd 8b7ab4ba2b This patch adds the X86FixupLEAs pass, which will reduce instruction
latency for certain models of the Intel Atom family, by converting
instructions into their equivalent LEA instructions, when it is both
useful and possible to do so.

llvm-svn: 180573
2013-04-25 20:29:37 +00:00
Michael Liao a486a11dcf Add support of RDSEED defined in AVX2 extension
llvm-svn: 178314
2013-03-28 23:41:26 +00:00
Preston Gurd 663e6f9558 For the current Atom processor, the fastest way to handle a call
indirect through a memory address is to load the memory address into
a register and then call indirect through the register.

This patch implements this improvement by modifying SelectionDAG to
force a function address which is a memory reference to be loaded
into a virtual register.

Patch by Sriram Murali.

llvm-svn: 178171
2013-03-27 19:14:02 +00:00
Michael Liao e344ec919f Add HLE target feature
llvm-svn: 178082
2013-03-26 22:46:02 +00:00
Michael Liao 5173ee03af Add PREFETCHW codegen support
- Add 'PRFCHW' feature defined in AVX2 ISA extension

llvm-svn: 178040
2013-03-26 17:47:11 +00:00
Bill Wendling 61375d8953 Reinitialize the ivars in the subtarget so that they can be reset with the new features.
llvm-svn: 175336
2013-02-16 01:36:26 +00:00
Bill Wendling e9434778f7 Temporary revert of 175320.
llvm-svn: 175322
2013-02-15 23:22:32 +00:00
Bill Wendling a060d0efd8 Reinitialize the ivars in the subtarget.
When we're recalculating the feature set of the subtarget, we need to have the
ivars in their initial state.

llvm-svn: 175320
2013-02-15 23:18:01 +00:00
Bill Wendling aef9c37c65 Use the 'target-features' and 'target-cpu' attributes to reset the subtarget features.
If two functions require different features (e.g., `-mno-sse' vs. `-msse') then
we want to honor that, especially during LTO. We can do that by resetting the
subtarget's features depending upon the 'target-feature' attribute.

llvm-svn: 175314
2013-02-15 22:31:27 +00:00
Kay Tiong Khoo f809c6491d added basic support for Intel ADX instructions
-feature flag, instructions definitions, test cases

llvm-svn: 175196
2013-02-14 19:08:21 +00:00
Evan Cheng 0e88c7d897 Teach SDISel to combine fsin / fcos into a fsincos node if the following
conditions are met:
1. They share the same operand and are in the same BB.
2. Both outputs are used.
3. The target has a native instruction that maps to ISD::FSINCOS node or
   the target provides a sincos library call.

Implemented the generic optimization in sdisel and enabled it for
Mac OSX. Also added an additional optimization for x86_64 Mac OSX by
using an alternative entry point __sincos_stret which returns the two
results in xmm0 / xmm1.

rdar://13087969
PR13204

llvm-svn: 173755
2013-01-29 02:32:37 +00:00
Eli Bendersky 597fc1233a In this patch, we teach X86_64TargetMachine that it has a ILP32
(defined by the x32 ABI) mode, in which case its pointers are 32-bits
in size. This knowledge is also added to X86RegisterInfo that now
returns the appropriate registers in getPointerRegClass.

There are many outcomes to this change. In order to keep the patches
separate and manageable, we start by focusing on some simple testable
cases. The patch adds a test with passing a pointer to a function -
focusing on the difference between the two data models for x86-64.
Another test is added for handling of 'sret' arguments (and
functionality is added in X86ISelLowering to make it work).

A note on naming: the "x32 ABI" document refers to the AMD64
architecture (in LLVM it's distinguished by being is64Bits() in the
x86 subtarget) with two variations: the LP64 (default) data model, and
the ILP32 data model. This patch adds predicates to the subtarget
which are consistent with this naming scheme.

llvm-svn: 173503
2013-01-25 22:07:43 +00:00
Preston Gurd a01daace88 Pad Short Functions for Intel Atom
The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.

When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.

It includes tests.

This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces

Patch by Andy Zhang.

llvm-svn: 171879
2013-01-08 18:27:24 +00:00
Nadav Rotem 478b6a47ec Revert revision 171524. Original message:
URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev
Log:
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171603
2013-01-05 05:42:48 +00:00
Preston Gurd e36b685a94 The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171524
2013-01-04 20:54:54 +00:00
Chandler Carruth 9fb823bbd4 Move all of the header files which are involved in modelling the LLVM IR
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.

There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.

The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.

I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).

I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.

llvm-svn: 171366
2013-01-02 11:36:10 +00:00
Eli Bendersky abe546368b Make NaCl naming consistent. The triple OSType is called NaCl and is represented
textually as NativeClient. Also added a link to the native client project for
readers unfamiliar with it.

A Clang patch will follow shortly.

llvm-svn: 169291
2012-12-04 18:37:26 +00:00
Chandler Carruth 802d755533 Sort includes for all of the .h files under the 'lib' tree. These were
missed in the first pass because the script didn't yet handle include
guards.

Note that the script is now able to handle all of these headers without
manual edits. =]

llvm-svn: 169224
2012-12-04 07:12:27 +00:00
Elena Demikhovsky eace43bff7 I changed hasAVX() to hasFp256() and hasAVX2() to hasInt256() in X86IselLowering.cpp.
The logic was not changed, only names.

llvm-svn: 168875
2012-11-29 12:44:59 +00:00
Michael Liao 73cffddb95 Add support of RTM from TSX extension
- Add RTM code generation support throught 3 X86 intrinsics:
  xbegin()/xend() to start/end a transaction region, and xabort() to abort a
  tranaction region

llvm-svn: 167573
2012-11-08 07:28:54 +00:00
Andrew Trick 07dced627e misched: remove the unused getSpecialAddressLatency hook.
llvm-svn: 165418
2012-10-08 18:54:00 +00:00
Andrew Kaylor feb805fcf2 Support for generating ELF objects on Windows.
This adds 'elf' as a recognized target triple environment value and overrides the default generated object format on Windows platforms if that value is present.  This patch also enables MCJIT tests on Windows using the new environment value.

llvm-svn: 165030
2012-10-02 18:38:34 +00:00
Craig Topper 0a928fa32e Remove hasNoAVX method. Can just invert hasAVX instead.
llvm-svn: 164664
2012-09-26 06:29:37 +00:00
Preston Gurd cdf540d5d6 Generic Bypass Slow Div
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presents of the optimization when calculating
either the quotient, remainder,  or both.

Patch by Tyler Nowicki!

llvm-svn: 163150
2012-09-04 18:22:17 +00:00
Michael Liao bbd10792c2 Introduce 'UseSSEx' to force SSE legacy encoding
- Add 'UseSSEx' to force SSE legacy insn not being selected when AVX is
  enabled.

  As the penalty of inter-mixing SSE and AVX instructions, we need
  prevent SSE legacy insn from being generated except explicitly
  specified through some intrinsics. For patterns supported by both
  SSE and AVX, so far, we force AVX insn will be tried first relying on
  AddedComplexity or position in td file. It's error-prone and
  introduces bugs accidentally.

  'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited
  by AVX, we need this predicate to force VEX encoding or SSE legacy
  encoding only.

  For insns not inherited by AVX, we still use the previous predicates,
  i.e. 'HasSSEx'. So far, these insns fall into the following
  categories:
  * SSE insns with MMX operands
  * SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH,
    CRC, and etc.)
  * SSE4A insns.
  * MMX insns.
  * x87 insns added by SSE.

2 test cases are modified:

 - test/CodeGen/X86/fast-isel-x86-64.ll
   AVX code generation is different from SSE one. 'vcvtsi2sdq' cannot be
   selected by fast-isel due to complicated pattern and fast-isel
   fallback to materialize it from constant pool.

 - test/CodeGen/X86/widen_load-1.ll
   AVX code generation is different from SSE one after fixing SSE/AVX
   inter-mixing. Exec-domain fixing prefers 'vmovapd' instead of
   'vmovaps'.

llvm-svn: 162919
2012-08-30 16:54:46 +00:00
Craig Topper 663d160adb Custom lower FMA intrinsics to target specific nodes and remove the patterns.
llvm-svn: 162534
2012-08-24 04:03:22 +00:00
Craig Topper 4a4634d6de Favor FMA3 over FMA4 if both are enabled.
llvm-svn: 162454
2012-08-23 18:14:30 +00:00
Chad Rosier 24c19d20c0 Whitespace.
llvm-svn: 161122
2012-08-01 18:39:17 +00:00
Craig Topper 79dbb0c6e4 Rename FMA3 feature flag to just FMA to match gcc so it can be added to clang.
llvm-svn: 157903
2012-06-03 18:58:46 +00:00