Commit Graph

273 Commits

Author SHA1 Message Date
Eli Bendersky 597fc1233a In this patch, we teach X86_64TargetMachine that it has a ILP32
(defined by the x32 ABI) mode, in which case its pointers are 32-bits
in size. This knowledge is also added to X86RegisterInfo that now
returns the appropriate registers in getPointerRegClass.

There are many outcomes to this change. In order to keep the patches
separate and manageable, we start by focusing on some simple testable
cases. The patch adds a test with passing a pointer to a function -
focusing on the difference between the two data models for x86-64.
Another test is added for handling of 'sret' arguments (and
functionality is added in X86ISelLowering to make it work).

A note on naming: the "x32 ABI" document refers to the AMD64
architecture (in LLVM it's distinguished by being is64Bits() in the
x86 subtarget) with two variations: the LP64 (default) data model, and
the ILP32 data model. This patch adds predicates to the subtarget
which are consistent with this naming scheme.

llvm-svn: 173503
2013-01-25 22:07:43 +00:00
Preston Gurd a01daace88 Pad Short Functions for Intel Atom
The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.

When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.

It includes tests.

This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces

Patch by Andy Zhang.

llvm-svn: 171879
2013-01-08 18:27:24 +00:00
Nadav Rotem 478b6a47ec Revert revision 171524. Original message:
URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev
Log:
The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171603
2013-01-05 05:42:48 +00:00
Preston Gurd e36b685a94 The current Intel Atom microarchitecture has a feature whereby when a function
returns early then it is slightly faster to execute a sequence of NOP
instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction
until the return address is ready.

When compiling for X86 Atom only, this patch will run a pass, called
"X86PadShortFunction" which will add NOP instructions where less than four
cycles elapse between function entry and return.

It includes tests.

Patch by Andy Zhang.

llvm-svn: 171524
2013-01-04 20:54:54 +00:00
Chandler Carruth 9fb823bbd4 Move all of the header files which are involved in modelling the LLVM IR
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.

There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.

The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.

I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).

I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.

llvm-svn: 171366
2013-01-02 11:36:10 +00:00
Eli Bendersky abe546368b Make NaCl naming consistent. The triple OSType is called NaCl and is represented
textually as NativeClient. Also added a link to the native client project for
readers unfamiliar with it.

A Clang patch will follow shortly.

llvm-svn: 169291
2012-12-04 18:37:26 +00:00
Chandler Carruth 802d755533 Sort includes for all of the .h files under the 'lib' tree. These were
missed in the first pass because the script didn't yet handle include
guards.

Note that the script is now able to handle all of these headers without
manual edits. =]

llvm-svn: 169224
2012-12-04 07:12:27 +00:00
Elena Demikhovsky eace43bff7 I changed hasAVX() to hasFp256() and hasAVX2() to hasInt256() in X86IselLowering.cpp.
The logic was not changed, only names.

llvm-svn: 168875
2012-11-29 12:44:59 +00:00
Michael Liao 73cffddb95 Add support of RTM from TSX extension
- Add RTM code generation support throught 3 X86 intrinsics:
  xbegin()/xend() to start/end a transaction region, and xabort() to abort a
  tranaction region

llvm-svn: 167573
2012-11-08 07:28:54 +00:00
Andrew Trick 07dced627e misched: remove the unused getSpecialAddressLatency hook.
llvm-svn: 165418
2012-10-08 18:54:00 +00:00
Andrew Kaylor feb805fcf2 Support for generating ELF objects on Windows.
This adds 'elf' as a recognized target triple environment value and overrides the default generated object format on Windows platforms if that value is present.  This patch also enables MCJIT tests on Windows using the new environment value.

llvm-svn: 165030
2012-10-02 18:38:34 +00:00
Craig Topper 0a928fa32e Remove hasNoAVX method. Can just invert hasAVX instead.
llvm-svn: 164664
2012-09-26 06:29:37 +00:00
Preston Gurd cdf540d5d6 Generic Bypass Slow Div
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presents of the optimization when calculating
either the quotient, remainder,  or both.

Patch by Tyler Nowicki!

llvm-svn: 163150
2012-09-04 18:22:17 +00:00
Michael Liao bbd10792c2 Introduce 'UseSSEx' to force SSE legacy encoding
- Add 'UseSSEx' to force SSE legacy insn not being selected when AVX is
  enabled.

  As the penalty of inter-mixing SSE and AVX instructions, we need
  prevent SSE legacy insn from being generated except explicitly
  specified through some intrinsics. For patterns supported by both
  SSE and AVX, so far, we force AVX insn will be tried first relying on
  AddedComplexity or position in td file. It's error-prone and
  introduces bugs accidentally.

  'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited
  by AVX, we need this predicate to force VEX encoding or SSE legacy
  encoding only.

  For insns not inherited by AVX, we still use the previous predicates,
  i.e. 'HasSSEx'. So far, these insns fall into the following
  categories:
  * SSE insns with MMX operands
  * SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH,
    CRC, and etc.)
  * SSE4A insns.
  * MMX insns.
  * x87 insns added by SSE.

2 test cases are modified:

 - test/CodeGen/X86/fast-isel-x86-64.ll
   AVX code generation is different from SSE one. 'vcvtsi2sdq' cannot be
   selected by fast-isel due to complicated pattern and fast-isel
   fallback to materialize it from constant pool.

 - test/CodeGen/X86/widen_load-1.ll
   AVX code generation is different from SSE one after fixing SSE/AVX
   inter-mixing. Exec-domain fixing prefers 'vmovapd' instead of
   'vmovaps'.

llvm-svn: 162919
2012-08-30 16:54:46 +00:00
Craig Topper 663d160adb Custom lower FMA intrinsics to target specific nodes and remove the patterns.
llvm-svn: 162534
2012-08-24 04:03:22 +00:00
Craig Topper 4a4634d6de Favor FMA3 over FMA4 if both are enabled.
llvm-svn: 162454
2012-08-23 18:14:30 +00:00
Chad Rosier 24c19d20c0 Whitespace.
llvm-svn: 161122
2012-08-01 18:39:17 +00:00
Craig Topper 79dbb0c6e4 Rename FMA3 feature flag to just FMA to match gcc so it can be added to clang.
llvm-svn: 157903
2012-06-03 18:58:46 +00:00
Benjamin Kramer a0396e4583 X86: Rename the CLMUL target feature to PCLMUL.
It was renamed in gcc/gas a while ago and causes all kinds of
confusion because it was named differently in llvm and clang.

llvm-svn: 157745
2012-05-31 14:34:17 +00:00
Preston Gurd 9a0914753a This patch fixes a problem which arose when using the Post-RA scheduler
on X86 Atom. Some of our tests failed because the tail merging part of
the BranchFolding pass was creating new basic blocks which did not
contain live-in information. When the anti-dependency code in the Post-RA
scheduler ran, it would sometimes rename the register containing
the function return value because the fact that the return value was
live-in to the subsequent block had been lost. To fix this, it is necessary
to run the RegisterScavenging code in the BranchFolding pass.

This patch makes sure that the register scavenging code is invoked
in the X86 subtarget only when post-RA scheduling is being done.
Post RA scheduling in the X86 subtarget is only done for Atom.

This patch adds a new function to the TargetRegisterClass to control
whether or not live-ins should be preserved during branch folding.
This is necessary in order for the anti-dependency optimizations done
during the PostRASchedulerList pass to work properly when doing
Post-RA scheduling for the X86 in general and for the Intel Atom in particular.

The patch adds and invokes the new function trackLivenessAfterRegAlloc()
instead of using the existing requiresRegisterScavenging().
It changes BranchFolding.cpp to call trackLivenessAfterRegAlloc() instead of
requiresRegisterScavenging(). It changes the all the targets that
implemented requiresRegisterScavenging() to also implement
trackLivenessAfterRegAlloc().  

It adds an assertion in the Post RA scheduler to make sure that post RA
liveness information is available when it is needed.

It changes the X86 break-anti-dependencies test to use –mcpu=atom, in order
to avoid running into the added assertion.

Finally, this patch restores the use of anti-dependency checking
(which was turned off temporarily for the 3.1 release) for
Intel Atom in the Post RA scheduler.

Patch by Andy Zhang!

Thanks to Jakob and Anton for their reviews.

llvm-svn: 155395
2012-04-23 21:39:35 +00:00
Craig Topper b25fda95f6 Reorder includes in Target backends to following coding standards. Remove some superfluous forward declarations.
llvm-svn: 152997
2012-03-17 18:46:09 +00:00
Jia Liu e1d619691b some comment fix for X86 and ARM
llvm-svn: 150902
2012-02-19 02:03:36 +00:00
Jia Liu b22310fda6 Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, MSP430, PPC, PTX, Sparc, X86, XCore.
llvm-svn: 150878
2012-02-18 12:03:15 +00:00
Evan Cheng 1b81fddd65 Use LEA to adjust stack ptr for Atom. Patch by Andy Zhang.
llvm-svn: 150008
2012-02-07 22:50:41 +00:00
Chandler Carruth ebd90c58e6 Begin fleshing out more convenience predicates in llvm::Triple and
convert at least one client over to use them. Subsequent patches both to
LLVM and Clang will try to convert more people over to a common set of
predicates.

This round of predicates is focused on OS-categorization predicates.

llvm-svn: 149815
2012-02-05 08:26:40 +00:00
Andrew Trick 8523b16ff5 Instruction scheduling itinerary for Intel Atom.
Adds an instruction itinerary to all x86 instructions, giving each a default latency of 1, using the InstrItinClass IIC_DEFAULT.

Sets specific latencies for Atom for the instructions in files X86InstrCMovSetCC.td, X86InstrArithmetic.td, X86InstrControl.td, and X86InstrShiftRotate.td. The Atom latencies for the remainder of the x86 instructions will be set in subsequent patches.

Adds a test to verify that the scheduler is working.

Also changes the scheduling preference to "Hybrid" for i386 Atom, while leaving x86_64 as ILP.

Patch by Preston Gurd!

llvm-svn: 149558
2012-02-01 23:20:51 +00:00
Craig Topper b0c0f72ae6 Remove hasXMM/hasXMMInt functions. Move callers to hasSSE1/hasSSE2. This is the final piece to remove the AVX hack that disabled SSE.
llvm-svn: 147843
2012-01-10 06:54:16 +00:00
Craig Topper d97bbd7b60 Remove hasSSE*orAVX functions and change all callers to use just hasSSE*. AVX is now an SSE level and no longer disables SSE checks.
llvm-svn: 147842
2012-01-10 06:37:29 +00:00
Craig Topper eb8f9e9e5b Instruction selection priority fixes to remove the XMM/XMMInt/orAVX predicates. Another commit will remove orAVX functions from X86SubTarget.
llvm-svn: 147841
2012-01-10 06:30:56 +00:00
Craig Topper f287a4509e Remove AVX hack in X86Subtarget. AVX/AVX2 are now treated as an SSE level. Predicate functions have been altered to maintain previous names and behavior.
llvm-svn: 147770
2012-01-09 09:02:13 +00:00
Evan Cheng 557cda7f1d Remove hasSSE1orAVX(). It's the same as hasXMM().
llvm-svn: 146246
2011-12-09 06:32:46 +00:00
Evan Cheng 4d1a2d449f Many of the SSE patterns should not be selected when AVX is available. This led to the following code in X86Subtarget.cpp
if (HasAVX)
    X86SSELevel = NoMMXSSE;

This is so patterns that are predicated on hasSSE3, etc. would not be selected when avx is available. Instead, the AVX variant is selected.
However, this breaks instructions which do not have AVX variants.

The right way to fix this is for the SSE but not-AVX patterns to predicate on something like hasSSE3() && !hasAVX().
Then we can take out the hack in X86Subtarget.cpp. Patterns which do not have AVX variants do not need to change.

However, we need to audit all the patterns before we make the change. This patch is workaround that fixes one specific case,
the prefetch instructions. rdar://10538297

llvm-svn: 146163
2011-12-08 19:00:42 +00:00
Jan Sjödin 1280eb1d06 Add XOP feature flag.
llvm-svn: 145682
2011-12-02 15:14:37 +00:00
Craig Topper f563977795 Add methods for querying minimum SSE version along with AVX. Simplifies all the places that had to check a version of SSE and AVX.
llvm-svn: 145053
2011-11-22 00:44:41 +00:00
Craig Topper 228d9131aa Add intrinsics and feature flag for read/write FS/GS base instructions. Also add AVX2 feature flag.
llvm-svn: 143319
2011-10-30 19:57:21 +00:00
David Meyer 49045ddb4c Remove NaClMode
llvm-svn: 142338
2011-10-18 05:29:23 +00:00
Craig Topper aea148c366 Add X86 BZHI instruction as well as BMI2 feature detection.
llvm-svn: 142122
2011-10-16 07:55:05 +00:00
Craig Topper 3657fe4b17 Add X86 TZCNT instruction and patterns to select it. Also added core-avx2 processor which is gcc's name for Haswell.
llvm-svn: 141939
2011-10-14 03:21:46 +00:00
Bill Wendling 063f55ffdd Revert r141854 because it was causing failures:
http://lab.llvm.org:8011/builders/llvm-x86_64-linux/builds/101

--- Reverse-merging r141854 into '.':
U    test/MC/Disassembler/X86/x86-32.txt
U    test/MC/Disassembler/X86/simple-tests.txt
D    test/CodeGen/X86/bmi.ll
U    lib/Target/X86/X86InstrInfo.td
U    lib/Target/X86/X86ISelLowering.cpp
U    lib/Target/X86/X86.td
U    lib/Target/X86/X86Subtarget.h

llvm-svn: 141857
2011-10-13 07:48:07 +00:00
Craig Topper 8cc9388073 Add X86 TZCNT instruction and patterns to select it. Also added core-avx2 processor which is gcc's name for Haswell.
llvm-svn: 141854
2011-10-13 07:09:14 +00:00
Craig Topper 271064e873 Add X86 LZCNT instruction. Including instruction selection support.
llvm-svn: 141651
2011-10-11 06:44:02 +00:00
Craig Topper fe9179fa4f Add Ivy Bridge 16-bit floating point conversion instructions for the X86 disassembler.
llvm-svn: 141505
2011-10-09 07:31:39 +00:00
Craig Topper 786bdb9e14 Add support for MOVBE and RDRAND instructions for the assembler and disassembler. Includes feature flag checking, but no instrinsic support. Fixes PR10832, PR11026 and PR11027.
llvm-svn: 141007
2011-10-03 17:28:23 +00:00
Nick Lewycky 73df7e3830 Add a new MC bit for NaCl (Native Client) mode. NaCl requires that certain
instructions are more aligned than the CPU requires, and adds some additional
directives, to follow in future patches. Patch by David Meyer!

llvm-svn: 139125
2011-09-05 21:51:43 +00:00
Eli Friedman 5e5704277f Add support for generating CMPXCHG16B on x86-64 for the cmpxchg IR instruction.
llvm-svn: 138660
2011-08-26 21:21:21 +00:00
NAKAMURA Takumi b66d255595 X86Subtarget.h: Assume "x86_64-cygwin", though it has not been released yet, to appease test/CodeGen/X86 on cygwin.
llvm-svn: 135564
2011-07-20 04:02:20 +00:00
Evan Cheng 60fc0fca5c Restore old behavior. Always auto-detect features unless cpu or features are specified.
llvm-svn: 134757
2011-07-08 22:30:25 +00:00
Evan Cheng 13bcc6c1c7 Add Mode64Bit feature and sink it down to MC layer.
llvm-svn: 134641
2011-07-07 21:06:52 +00:00
Evan Cheng 1a72add615 Compute feature bits at time of MCSubtargetInfo initialization.
llvm-svn: 134606
2011-07-07 07:07:08 +00:00
Evan Cheng c9c090d7a5 Rename XXXGenSubtarget.inc to XXXGenSubtargetInfo.inc for consistency.
llvm-svn: 134281
2011-07-01 22:36:09 +00:00
Evan Cheng 0d639a28aa Rename TargetSubtarget to TargetSubtargetInfo for consistency.
llvm-svn: 134259
2011-07-01 21:01:15 +00:00
Evan Cheng 54b68e3432 - Added MCSubtargetInfo to capture subtarget features and scheduling
itineraries.
- Refactor TargetSubtarget to be based on MCSubtargetInfo.
- Change tablegen generated subtarget info to initialize MCSubtargetInfo
  and hide more details from targets.

llvm-svn: 134257
2011-07-01 20:45:01 +00:00
Evan Cheng fe6e405e8c Fix the ridiculous SubtargetFeatures API where it implicitly expects CPU name to
be the first encoded as the first feature. It then uses the CPU name to look up
features / scheduling itineray even though clients know full well the CPU name
being used to query these properties.

The fix is to just have the clients explictly pass the CPU name!

llvm-svn: 134127
2011-06-30 01:53:36 +00:00
Evan Cheng 3a0c5e52ff Remove TargetOptions.h dependency from X86Subtarget.
llvm-svn: 133726
2011-06-23 17:54:54 +00:00
Daniel Dunbar 2b9b0e3748 ADT/Triple: Move a variety of clients to using isOSDarwin() and isOSWindows()
predicates.

llvm-svn: 129816
2011-04-19 21:14:45 +00:00
Daniel Dunbar 100455a3c8 Target/X86: Eliminate uses of getDarwinVers().
llvm-svn: 129813
2011-04-19 21:04:12 +00:00
Daniel Dunbar 44b530369d Target/X86: Add getTargetTriple() accessor.
llvm-svn: 129812
2011-04-19 21:01:47 +00:00
Roman Divacky e8a93fe8f0 Stack alignment is 16 bytes on FreeBSD/i386 too.
llvm-svn: 126226
2011-02-22 17:30:05 +00:00
Duncan Sands bda7175a43 The stack should be 16 byte aligned on 32 bit solaris. Patch by Yuri.
llvm-svn: 126130
2011-02-21 17:37:17 +00:00
NAKAMURA Takumi 4c14a5cc2c Triple::MinGW64 is deprecated and removed. We can use Triple::MinGW32 generally.
No one uses *-mingw64. mingw-w64 is represented as {i686|x86_64}-w64-mingw32. In llvm side, i686 and x64 can be treated as similar way.

llvm-svn: 125747
2011-02-17 12:24:17 +00:00
NAKAMURA Takumi 0544fe7287 Fix whitespace.
llvm-svn: 125746
2011-02-17 12:23:50 +00:00
Evan Cheng d22a4a1fd6 Patches to build EFI with Clang/LLVM. By Carl Norum.
llvm-svn: 124639
2011-02-01 01:14:13 +00:00
Nate Begeman 8b08f5232b Formalize the notion that AVX and SSE are non-overlapping extensions from the compiler's point of view. Per email discussion, we either want to always use VEX-prefixed instructions or never use them, and are taking "HasAVX" to mean "Always use VEX". Passing -mattr=-avx,+sse42 should serve to restore legacy SSE support when desirable.
llvm-svn: 121439
2010-12-10 00:26:57 +00:00
Benjamin Kramer 2f489236ab Add patterns for the x86 popcnt instruction.
- Also adds a new POPCNT subtarget feature that is currently enabled if the target
  supports SSE4.2 (nehalem) or SSE4A (barcelona).

llvm-svn: 120917
2010-12-04 20:32:23 +00:00
Rafael Espindola 66e08d43d2 Jim Asked us to move DataLayout on ARM back to the most specialized classes. Do
so and also change X86 for consistency.

Investigating if this can be improved a bit.

llvm-svn: 115469
2010-10-03 18:59:45 +00:00
NAKAMURA Takumi ea639aa11f X86Subtarget.h: Fix Cygwin's TD.
llvm-svn: 114297
2010-09-18 19:50:42 +00:00
Anton Korobeynikov a5a645559c Properly emit __chkstk call instead of __alloca on non-mingw windows targets.
Patch by Cameron Esfahani!

llvm-svn: 112902
2010-09-02 23:03:46 +00:00
Bruno Cardoso Lopes 09dc24beac Add x86 CLMUL (Carry-less multiplication) cpu feature
llvm-svn: 109206
2010-07-23 01:17:51 +00:00
Eric Christopher d429846eca Have the X86 backend use Triple instead of a string and some enums.
llvm-svn: 107625
2010-07-05 19:26:33 +00:00
Dan Gohman dc53f1cb5c FastISel doesn't yet handle callee-pop functions.
To support this, move IsCalleePop from X86ISelLowering to X86Subtarget.

llvm-svn: 104866
2010-05-27 18:43:40 +00:00
Evan Cheng 050df1b8de Enable i16 to i32 promotion by default.
llvm-svn: 102493
2010-04-28 08:30:49 +00:00
Evan Cheng 9c8cd8c061 isel (i32 anyext i16) as insert_subreg when 16-bit ops are being promoted.
llvm-svn: 101979
2010-04-21 01:47:12 +00:00
Eric Christopher 2ef63183a5 Separate out the AES-NI instructions from the SSE4.2 instructions. Add
a new subtarget option for AES and check for the support.  Add "westmere"
line of processors and add AES-NI support to the core i7.

Add a couple of TODOs for information I couldn't verify.

llvm-svn: 100231
2010-04-02 21:54:27 +00:00
Evan Cheng 738b0f9ec7 Nehalem unaligned memory access is fast.
llvm-svn: 100089
2010-04-01 05:58:17 +00:00
Evan Cheng bf724b9ee0 Turning off post-ra scheduling for x86. It isn't a consistent win.
llvm-svn: 98810
2010-03-18 06:55:42 +00:00
Chris Lattner a30d4ce194 add support for pentium class CPUs which do not have cmov,
PR4841.  Patch by Craig Smith!

llvm-svn: 98496
2010-03-14 18:31:44 +00:00
Mikhail Glushenkov abd56bde0e 80-col violations/trailing whitespace.
llvm-svn: 97427
2010-02-28 22:54:30 +00:00
Anton Korobeynikov c3c357006e Setup correct data layout to match gcc's expectations on mingw32.
llvm-svn: 95981
2010-02-12 15:28:56 +00:00
Duncan Sands 0067d6bbbe Fix typo.
llvm-svn: 93235
2010-01-12 08:30:46 +00:00
Duncan Sands fd75e12954 Tweak commit 91745, which changed target data for both Mingw and Cygwin,
to not touch Cygwin: the change caused llvm-gcc build failures due to
long double getting the wrong size.  Patch by Aaron Gray.

llvm-svn: 93234
2010-01-12 08:21:07 +00:00
David Greene 206351a1ff Implement a feature (-vector-unaligned-mem) to allow targets to
ignore alignment requirements for SIMD memory operands.  This
is useful on architectures like the AMD 10h that do not trap on
unaligned references if a status bit is twiddled at startup time.

llvm-svn: 93151
2010-01-11 16:29:42 +00:00
Evan Cheng 71d7eaa87e Remove target attribute break-sse-dep. Instead, do not fold load into sse partial update instructions unless optimizing for size.
llvm-svn: 91910
2009-12-22 17:47:23 +00:00
Anton Korobeynikov 148d87b0b0 Bump alignment requirements for windows targets to achieve compartibility with vcpp.
Based on patch by Michael Beck!

llvm-svn: 91745
2009-12-19 02:04:23 +00:00
Evan Cheng 4cf30b72bf On recent Intel u-arch's, folding loads into some unary SSE instructions can
be non-optimal. To be precise, we should avoid folding loads if the instructions
only update part of the destination register, and the non-updated part is not
needed. e.g. cvtss2sd, sqrtss. Unfolding the load from these instructions breaks
the partial register dependency and it can improve performance. e.g.

movss (%rdi), %xmm0
cvtss2sd %xmm0, %xmm0

instead of
cvtss2sd (%rdi), %xmm0

An alternative method to break dependency is to clear the register first. e.g.
xorps %xmm0, %xmm0
cvtss2sd (%rdi), %xmm0

llvm-svn: 91672
2009-12-18 07:40:29 +00:00
Dan Gohman 7a6611793f Target-independent support for TargetFlags on BlockAddress operands,
and support for blockaddresses in x86-32 PIC mode.

llvm-svn: 89506
2009-11-20 23:18:13 +00:00
David Goodwin b9fe5d5d02 Allow target to specify regclass for which antideps will only be broken along the critical path.
llvm-svn: 88682
2009-11-13 19:52:48 +00:00
David Goodwin 0d412c2528 Fixed to address code review. No functional changes.
llvm-svn: 86634
2009-11-10 00:48:55 +00:00
David Goodwin cf89db135e Allow targets to specify register classes whose member registers should not be renamed to break anti-dependencies.
llvm-svn: 86628
2009-11-10 00:15:47 +00:00
Chris Lattner 8714348afd indicate what the native integer types for the target are.
Please verify.

llvm-svn: 86397
2009-11-07 19:07:32 +00:00
Evan Cheng 8b86efefec X86 needs critical path anti-dependency breaking.
llvm-svn: 84931
2009-10-23 05:57:35 +00:00
David Goodwin 02ad4cb32e Allow the target to select the level of anti-dependence breaking that should be performed by the post-RA scheduler. The default is none.
llvm-svn: 84911
2009-10-22 23:19:17 +00:00
Evan Cheng c436631a9c Turn on post-alloc scheduling for x86.
llvm-svn: 84431
2009-10-18 19:57:27 +00:00
Evan Cheng 936d87b39d Oops. I forgot to change the tests first. Disable post-alloc scheduling.
llvm-svn: 84425
2009-10-18 18:31:31 +00:00
Evan Cheng 0e9d9ca855 -Revert parts of 84326 and 84411. Distinquishing between fixed and non-fixed
stack slots and giving them different PseudoSourceValue's did not fix the
problem of post-alloc scheduling miscompiling llvm itself.
- Apply Dan's conservative workaround by assuming any non fixed stack slots can
alias other memory locations. This means a load from spill slot #1 cannot 
move above a store of spill slot #2. 
- Enable post-alloc scheduling for x86 at optimization leverl Default and above.

llvm-svn: 84424
2009-10-18 18:16:27 +00:00
Evan Cheng 007ceb4603 Change createPostRAScheduler so it can be turned off at llc -O1.
llvm-svn: 84273
2009-10-16 21:06:15 +00:00
Evan Cheng e4a2117161 Remove X86Subtarget::IsLinux. It's no longer being used.
llvm-svn: 84200
2009-10-15 20:23:21 +00:00
Chris Lattner 46dcaadb4a rearrange X86ATTAsmPrinter::doFinalization, making a scan of
the global variable list only happen for COFF targets.

llvm-svn: 82010
2009-09-16 05:20:33 +00:00
Daniel Dunbar c3a0aba120 Make these functions static and local.
llvm-svn: 80892
2009-09-03 05:47:34 +00:00
Evan Cheng 47455a79ae X86JITInfo::getLazyResolverFunction() should not read cpu id to determine whether sse is available. Just use consult subtarget.
No functionality changes.

llvm-svn: 80880
2009-09-03 04:37:05 +00:00
Chris Lattner cc8c581a5b Add support for modeling whether or not the processor has support for
conditional moves as a subtarget feature.  This is the easy part of 
PR4841.

llvm-svn: 80763
2009-09-02 05:53:04 +00:00
Chris Lattner 1e9097e36a change the -x86-asm-syntax=intel/att flag to be in X86TAI
instead of X86 Subtarget.  This elimianates dependencies on
X86Subtarget from X86TAI.

llvm-svn: 78746
2009-08-11 23:01:09 +00:00
Daniel Dunbar 31b44e8f6c Normalize Subtarget constructors to take a target triple string instead of
Module*.

Also, dropped uses of TargetMachine where unnecessary. The only target which
still takes a TargetMachine& is Mips, I would appreciate it if someone would
normalize this to match other targets.

llvm-svn: 77918
2009-08-02 22:11:08 +00:00
Chris Lattner 21c2940553 remove the now-dead TM argument to these methods.
llvm-svn: 75276
2009-07-10 21:00:45 +00:00
Chris Lattner ba4d73310a make PIC vs DynamicNoPIC be explicit in PICStyles.
llvm-svn: 75275
2009-07-10 20:58:47 +00:00
Chris Lattner e2f524f176 add a couple of predicates to test for "stub style pic in PIC mode" and "stub style pic in dynamic-no-pic" mode.
llvm-svn: 75273
2009-07-10 20:47:30 +00:00
Chris Lattner 20073edf67 simplify fast isel by using ClassifyGlobalReference. This
elimiantes the last use of GVRequiresExtraLoad, so delete it.

llvm-svn: 75244
2009-07-10 07:48:51 +00:00
Chris Lattner 93f0f79178 eliminate GVRequiresRegister, replacing it with predicates we
need for other purposes.

llvm-svn: 75243
2009-07-10 07:38:24 +00:00
Chris Lattner dc842c06c2 move some classification logic around. Now GVRequiresExtraLoad
is just a trivial wrapper around "ClassifyGlobalReference", which
stole a ton of logic from LowerGlobalAddress.

llvm-svn: 75237
2009-07-10 07:20:05 +00:00
Chris Lattner b9af63a4d2 GVRequiresExtraLoad is now never used for calls, simplify it based on this.
llvm-svn: 75232
2009-07-10 05:52:02 +00:00
Chris Lattner ace6ec26d9 actually, just eliminate PCRelGVRequiresExtraLoad. It makes the code
more complex and slow than just directly testing what we care about.

llvm-svn: 75231
2009-07-10 05:48:03 +00:00
Chris Lattner 7277a807f0 There is only one case where GVRequiresExtraLoad returns true for calls:
split its handling out to PCRelGVRequiresExtraLoad, and simplify code
based on this.

llvm-svn: 75230
2009-07-10 05:45:15 +00:00
Chris Lattner 1cc7ae7c3b the "isDirectCall" operand of GVRequiresRegister is always false, eliminate it.
llvm-svn: 75229
2009-07-10 05:37:11 +00:00
Chris Lattner 1c5bf9d26d When in -static mode, force the PIC style to none. Doing this requires fixing
code which conflated RIPRel PIC with x86-64.  Fix these to just check for X86-64
directly.

llvm-svn: 75092
2009-07-09 03:15:51 +00:00
David Greene a4b8998fbb Fix a subtarget feature bug.
llvm-svn: 74428
2009-06-29 16:51:01 +00:00
David Greene 8f6f72cc99 Add feature flags for AVX and FMA and fix some SSE4A feature flag
initialization problems.

llvm-svn: 74350
2009-06-26 22:46:54 +00:00
Chris Lattner cfad3f3807 cosmetic changes.
llvm-svn: 73836
2009-06-21 01:27:55 +00:00
Stefanus Du Toit 96180b5387 Update CPU capabilities for AMD machines
- added processors k8-sse3, opteron-sse3, athlon64-sse3, amdfam10, and
barcelona with appropriate sse3/4a levels
- added FeatureSSE4A for amdfam10 processors
in X86Subtarget:
- added hasSSE4A
- updated AutoDetectSubtargetFeatures to detect SSE4A
- updated GetCurrentX86CPU to detect family 15 with sse3 as k8-sse3 and
family 10h as amdfam10

New processor names match those used by gcc.

Patch by Paul Redmond!

llvm-svn: 72434
2009-05-26 21:04:35 +00:00
Anton Korobeynikov 08bf4c0f5a Propagate CPU string out of SubtargetFeatures
llvm-svn: 72335
2009-05-23 19:50:50 +00:00
Evan Cheng 960983371c Try again. Allow call to immediate address for ELF or when in static relocation mode.
llvm-svn: 72160
2009-05-20 04:53:57 +00:00
Dan Gohman 906152a20f Tidy up #includes, deleting a bunch of unnecessary #includes.
llvm-svn: 61715
2009-01-05 17:59:02 +00:00
Evan Cheng 4c91aa3418 Do not isel load folding bt instructions for pentium m, core, core2, and AMD processors. These are significantly slower than a load followed by a bt of a register.
llvm-svn: 61557
2009-01-02 05:35:45 +00:00
Dan Gohman b9a012156b Add initial support for back-scheduling address computations,
especially in the case of addresses computed from loop induction
variables.

llvm-svn: 61075
2008-12-16 03:35:01 +00:00
Dale Johannesen b49d7cf19e Forgot a file.
llvm-svn: 60609
2008-12-05 21:55:35 +00:00
Duncan Sands 595a4423dc Fix build with gcc-4.4: it doesn't like PICStyle
being both a namespace and a variable name.

llvm-svn: 60208
2008-11-28 09:29:37 +00:00
Bill Wendling 1782584f56 Just don't transform this memset into "bzero" if no-builtin is specified.
llvm-svn: 56888
2008-09-30 22:05:33 +00:00
Bill Wendling bd09262e97 Add the new `-no-builtin' flag. This flag is meant to mimic the GCC
`-fno-builtin' flag. Currently, it's used to replace "memset" with "_bzero"
instead of "__bzero" on Darwin10+. This arguably violates the meaning of this
flag, but is currently sufficient. The meaning of this flag should become more
specific over time.

llvm-svn: 56885
2008-09-30 21:22:07 +00:00
Dan Gohman 6fd71c6512 Use a dedicated IsLinux flag instead of an ELFLinux TargetType.
llvm-svn: 50649
2008-05-05 16:11:31 +00:00
Dan Gohman bcde172222 Add AsmPrinter support for emitting a directive to declare that
the code being generated does not require an executable stack.

Also, add target-specific code to make use of this on Linux
on x86. 

llvm-svn: 50634
2008-05-05 00:28:39 +00:00
Evan Cheng 6c66bd368e Re-enable SSE4.
llvm-svn: 49158
2008-04-03 08:53:29 +00:00
Evan Cheng 3063c5546e Temporarily disabling SSE4 until we fix the encoding issues.
llvm-svn: 49129
2008-04-03 04:49:54 +00:00
Dan Gohman 980d7200c1 Speculatively micro-optimize memory-zeroing calls on Darwin 10.
llvm-svn: 49048
2008-04-01 20:38:36 +00:00
Anton Korobeynikov 7f125b2ba5 Add convenient helper for win64 check. Simplify things slightly.
llvm-svn: 48691
2008-03-22 20:57:27 +00:00
Evan Cheng 352acec37e Update comment.
llvm-svn: 47002
2008-02-12 07:59:55 +00:00
Evan Cheng a20a773654 Fix a x86-64 codegen deficiency. Allow gv + offset when using rip addressing mode.
Before:
_main:
        subq    $8, %rsp
        leaq    _X(%rip), %rax
        movsd   8(%rax), %xmm1
        movss   _X(%rip), %xmm0
        call    _t
        xorl    %ecx, %ecx
        movl    %ecx, %eax
        addq    $8, %rsp
        ret
Now:
_main:
        subq    $8, %rsp
        movsd   _X+8(%rip), %xmm1
        movss   _X(%rip), %xmm0
        call    _t
        xorl    %ecx, %ecx
        movl    %ecx, %eax
        addq    $8, %rsp
        ret

Notice there is another idiotic codegen issue that needs to be fixed asap:
xorl    %ecx, %ecx
movl    %ecx, %eax

llvm-svn: 46850
2008-02-07 08:53:49 +00:00
Nate Begeman e14fdfaecd SSE 4.1 Intrinsics and detection
llvm-svn: 46681
2008-02-03 07:18:54 +00:00
Chris Lattner cce79c67ca darwin9 and above support aligned common symbols.
llvm-svn: 45494
2008-01-02 19:44:55 +00:00
Chris Lattner f3ebc3f3d2 Remove attribution from file headers, per discussion on llvmdev.
llvm-svn: 45418
2007-12-29 20:36:04 +00:00
Rafael Espindola 063f177300 Make ARM an X86 memcpy expansion more similar to each other.
Now both subtarget define getMaxInlineSizeThreshold and the expansion uses it.

This should not change generated code.

llvm-svn: 43552
2007-10-31 11:52:06 +00:00
Rafael Espindola 1de0c86717 Add support for having different alignment for objects on call frames.
The x86-64 ABI states that objects passed on the stack have
8 byte alignment. Implement that.

llvm-svn: 41768
2007-09-07 14:52:14 +00:00
Evan Cheng 623dd88775 Mac OS X X86-64 ABI is same as the standard.
llvm-svn: 41700
2007-09-04 16:44:41 +00:00
Rafael Espindola bb8a5cff67 Align i64 and f64 at 8 byte on x86-64.
This is mandated table 3.1 at
http://www.x86-64.org/documentation/abi.pdf

llvm-svn: 41642
2007-08-31 12:23:58 +00:00
Dale Johannesen a010822b45 Replace 4-line function with 10-line version per review comment.
llvm-svn: 40881
2007-08-06 22:10:35 +00:00
Dale Johannesen d1822ea7d1 Move lengthy conditional down 1 level per review comment.
llvm-svn: 40878
2007-08-06 21:48:35 +00:00
Evan Cheng 763cdfd371 Mac OS X X86-64 low 4G address not available.
llvm-svn: 40701
2007-08-01 23:45:51 +00:00
Bill Wendling f099841573 Add support for our first SSSE3 instruction "pmulhrsw".
llvm-svn: 35869
2007-04-10 22:10:25 +00:00
Chris Lattner 6cc58a0dc5 document some subtlety
llvm-svn: 33257
2007-01-16 17:51:40 +00:00
Bill Wendling c7b2ab9bdf Instead of yet another enum indicating the "assembly language flavor",
just use the one that's in the subtarget.

llvm-svn: 33255
2007-01-16 09:29:17 +00:00
Anton Korobeynikov a0554d90e8 * PIC codegen for X86/Linux has been implemented
* PIC-aware internal structures in X86 Codegen have been refactored
* Visibility (default/weak) has been added
* Docs fixes (external weak linkage, visibility, formatting)

llvm-svn: 33136
2007-01-12 19:20:47 +00:00
Anton Korobeynikov 4efbbc963f Really big cleanup.
- New target type "mingw" was introduced
- Same things for both mingw & cygwin are marked as "cygming" (as in
gcc)
- .lcomm is supported here, so allow LLVM to use it
- Correctly use underscored versions of setjmp & _longjmp for both mingw
& cygwin

llvm-svn: 32833
2007-01-03 11:43:14 +00:00
Anton Korobeynikov 430e68a1b9 Refactored JIT codegen for mingw32. Now we're using standart relocation
type for distinguish JIT & non-JIT instead of "dirty" hacks :)

llvm-svn: 32745
2006-12-22 22:29:05 +00:00