Commit Graph

341 Commits

Author SHA1 Message Date
Craig Topper 09b6598572 [X86] Add fxsr feature flag for fxsave/fxrestore instructions.
llvm-svn: 250497
2015-10-16 06:03:09 +00:00
Amjad Aboud 1db6d7af46 [X86] Add XSAVE intrinsic family
Add intrinsics for the
  XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64)
  XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64)
  XSAVEC instructions (XSAVEC/XSAVEC64)
  XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64)

Differential Revision: http://reviews.llvm.org/D13012

llvm-svn: 250029
2015-10-12 11:47:46 +00:00
Eric Christopher 11e5983658 Move the MMX subtarget feature out of the SSE set of features and into
its own variable.

This is needed so that we can explicitly turn off MMX without turning
off SSE and also so that we can diagnose feature set incompatibilities
that involve MMX without SSE.

Rationale:

// sse3
__m128d test_mm_addsub_pd(__m128d A, __m128d B) {
  return _mm_addsub_pd(A, B);
}

// mmx
void shift(__m64 a, __m64 b, int c) {
  _mm_slli_pi16(a, c);
  _mm_slli_pi32(a, c);
  _mm_slli_si64(a, c);
  _mm_srli_pi16(a, c);
  _mm_srli_pi32(a, c);
  _mm_srli_si64(a, c);
  _mm_srai_pi16(a, c);
  _mm_srai_pi32(a, c);
}

clang -msse3 -mno-mmx file.c -c

For this code we should be able to explicitly turn off MMX
without affecting the compilation of the SSE3 function and then
diagnose and error on compiling the MMX function.

This matches the existing gcc behavior and follows the spirit of
the SSE/MMX separation in llvm where we can (and do) turn off
MMX code generation except in the presence of intrinsics.

Updated a couple of tests, but primarily tested with a couple of tests
for turning on only mmx and only sse.

This is paired with a patch to clang to take advantage of this behavior.

llvm-svn: 249731
2015-10-08 20:10:06 +00:00
Daniel Sanders 50f17235dd Revert r247692: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Eric has replied and has demanded the patch be reverted.

llvm-svn: 247702
2015-09-15 16:17:27 +00:00
Daniel Sanders 153010c52d Re-commit r247683: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change. Thanks go to Pavel Labath for fixing LLDB for me.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247692
2015-09-15 14:08:28 +00:00
Daniel Sanders c40de48041 Revert r247684 - Replace Triple with a new TargetTuple ...
LLDB needs to be updated in the same commit.

llvm-svn: 247686
2015-09-15 13:46:21 +00:00
Daniel Sanders 18d4b0dab7 Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247683
2015-09-15 13:17:40 +00:00
Sanjay Patel 30145677a8 rename "slow-unaligned-mem-under-32" to slow-unaligned-mem-16" (NFCI)
This is a follow-on suggested by:
http://reviews.llvm.org/D12154 ( http://reviews.llvm.org/rL245729 )
http://reviews.llvm.org/D10662 ( http://reviews.llvm.org/rL245075 )

This makes the attribute name match most of the existing lowering logic
and regression test expectations.

But the current use of this attribute is inconsistent; see the FIXME
comment for "allowsMisalignedMemoryAccesses()". That change will
result in functional changes and should be coming soon.

llvm-svn: 246585
2015-09-01 20:51:51 +00:00
Sanjay Patel deb8f826a5 make fast unaligned memory accesses implicit with SSE4.2 or SSE4a
This is a follow-on from the discussion in http://reviews.llvm.org/D12154.

This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has
generally fast unaligned memory ops.

A motivating use case for this change is a clang invocation that doesn't explicitly set
the CPU, but does target a feature that we know only exists on a CPU that supports fast
unaligned memops. For example:
$ clang -O1 foo.c -mavx

This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449

Before this patch, we used different store types depending on whether the example can be
lowered as a memset or not.

Differential Revision: http://reviews.llvm.org/D12288

llvm-svn: 245950
2015-08-25 16:29:21 +00:00
Sanjay Patel 9e916dc48d [x86] invert logic for attribute 'FeatureFastUAMem'
This is a 'no functional change intended' patch. It removes one FIXME, but adds several more.

Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if any
sized misaligned memory access under 32-bytes is 'fast'. From the added FIXME comments, however,
you can see that we're not consistent about this. Changing the name of the attribute makes it
clearer to see the logic holes.

Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute
to new chips; fast unaligned accesses have been standard for several generations of CPUs now.

Differential Revision: http://reviews.llvm.org/D12154

llvm-svn: 245729
2015-08-21 20:17:26 +00:00
Sanjay Patel 2e75341b7f don't repeaat function names in comments; NFC
llvm-svn: 245058
2015-08-14 15:11:42 +00:00
Duncan P. N. Exon Smith bb57d73805 MC: Remove MCSubtargetInfo::InitCPUSched()
Remove all calls to `MCSubtargetInfo::InitCPUSched()` and merge its body
into the only relevant caller, `MCSubtargetInfo::InitMCProcessorInfo()`.
We were only calling the former after explicitly calling the latter with
the same CPU; it's confusing to have both methods exposed.

Besides a minor (surely unmeasurable) speedup in ARM and X86 from
avoiding running the logic twice, no functionality change.

llvm-svn: 241956
2015-07-10 22:33:01 +00:00
Mehdi Amini 157e5a6d10 Remove getDataLayout() from TargetSelectionDAGInfo (had no users)
Summary:
Remove empty subclass in the process.

This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren, ted

Differential Revision: http://reviews.llvm.org/D11045

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241780
2015-07-09 02:10:08 +00:00
Peter Collingbourne 6a9d1774d0 IR: Do not consider available_externally linkage to be linker-weak.
From the linker's perspective, an available_externally global is equivalent
to an external declaration (per isDeclarationForLinker()), so it is incorrect
to consider it to be a weak definition.

Also clean up some logic in the dead argument elimination pass and clarify
its comments to better explain how its behavior depends on linkage,
introduce GlobalValue::isStrongDefinitionForLinker() and start using
it throughout the optimizers and backend.

Differential Revision: http://reviews.llvm.org/D10941

llvm-svn: 241413
2015-07-05 20:52:35 +00:00
Reid Kleckner f9977bfb23 Re-land "[X86] Cache variables that only depend on the subtarget"
Re-instates r239949 without accidentally flipping the sense of UseLEA.

llvm-svn: 239950
2015-06-17 21:50:02 +00:00
Reid Kleckner 09543c2998 Revert "[X86] Cache variables that only depend on the subtarget"
This reverts commit r239948, tests seem to be failing.

llvm-svn: 239949
2015-06-17 21:35:02 +00:00
Reid Kleckner 05b39483c1 [X86] Cache variables that only depend on the subtarget
There is a one-to-one relationship between X86Subtarget and
X86FrameLowering, but every frame lowering method would previously pull
the subtarget off the MachineFunction and query some subtarget
properties.

Over time, these locals began to grow in complexity and it became
important to keep their names and meaning in sync across all of the
frame lowering methods, leading to duplication. We can eliminate that
duplication by computing them once in the constructor.

llvm-svn: 239948
2015-06-17 21:31:17 +00:00
Daniel Sanders a73f1fdb19 Replace string GNU Triples with llvm::Triple in MCSubtargetInfo and create*MCSubtargetInfo(). NFC.
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.

Reviewers: rafael

Reviewed By: rafael

Subscribers: rafael, ted, jfb, llvm-commits, rengolin, jholewinski

Differential Revision: http://reviews.llvm.org/D10311

llvm-svn: 239467
2015-06-10 12:11:26 +00:00
Sanjay Patel 667a7e2a0f make reciprocal estimate code generation more flexible by adding command-line options (3rd try)
The first try (r238051) to land this was reverted due to ExecutionEngine build failure;
that was hopefully addressed by r238788.

The second try (r238842) to land this was reverted due to BUILD_SHARED_LIBS failure;
that was hopefully addressed by r238953.

This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.

The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.

Differential Revision: http://reviews.llvm.org/D8982

llvm-svn: 239001
2015-06-04 01:32:35 +00:00
Elena Demikhovsky f7e641cc2d X86: Added MPX feature and bound registers.
Intel® Memory Protection Extensions (Intel® MPX) is a new feature in Skylake.
It is a part of KNL and SKX sets. It is also a part of Skylake client.

I added definition of %bnd0 - %bnd3 registers, each register is a pair of 64-bit integers.

llvm-svn: 238916
2015-06-03 10:30:57 +00:00
Rafael Espindola cf8beece97 Revert "make reciprocal estimate code generation more flexible by adding command-line options (2nd try)"
This reverts commit r238842.

It broke -DBUILD_SHARED_LIBS=ON build.

llvm-svn: 238900
2015-06-03 05:32:44 +00:00
Sanjay Patel 6f031d848e make reciprocal estimate code generation more flexible by adding command-line options (2nd try)
The first try (r238051) to land this was reverted due to bot failures
that were hopefully addressed by r238788.

This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.

The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.

Differential Revision: http://reviews.llvm.org/D8982

llvm-svn: 238842
2015-06-02 15:28:15 +00:00
Rafael Espindola 445712264d Revert "make reciprocal estimate code generation more flexible by adding command-line options"
This reverts commit r238051.

It broke some bots:

http://lab.llvm.org:8011/builders/llvm-ppc64-linux1/builds/18190

llvm-svn: 238075
2015-05-23 00:22:44 +00:00
Sanjay Patel ba2ba80302 make reciprocal estimate code generation more flexible by adding command-line options
This patch adds a class for processing many recip codegen possibilities.
The TargetRecip class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.

The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other CPUs continue to *not* use reciprocal estimates by default with -ffast-math.

Differential Revision: http://reviews.llvm.org/D8982

llvm-svn: 238051
2015-05-22 21:10:06 +00:00
Eric Christopher 824f42f209 Migrate existing backends that care about software floating point
to use the information in the module rather than TargetOptions.

We've had and clang has used the use-soft-float attribute for some
time now so have the backends set a subtarget feature based on
a particular function now that subtargets are created based on
functions and function attributes.

For the one middle end soft float check go ahead and create
an overloadable TargetLowering::useSoftFloat function that
just checks the TargetSubtargetInfo in all cases.

Also remove the command line option that hard codes whether or
not soft-float is set by using the attribute for all of the
target specific test cases - for the generic just go ahead and
add the attribute in the one case that showed up.

llvm-svn: 237079
2015-05-12 01:26:05 +00:00
Craig Topper a898c2d737 [X86] Remove two feature flags that covered sets of instructions that have no patterns or intrinsics. Since we don't check feature flags in the assembler parser for any instruction sets, these flags don't provide any value. This frees up 2 of the fully utilized feature flags.
llvm-svn: 228282
2015-02-05 08:51:02 +00:00
Sanjay Patel ffd039bde1 Fix program crashes due to alignment exceptions generated for SSE memop instructions (PR22371).
r224330 introduced a bug by misinterpreting the "FeatureVectorUAMem" bit.
The commit log says that change did not affect anything, but that's not correct.
That change allowed SSE instructions to have unaligned mem operands folded into
math ops, and that's not allowed in the default specification for any SSE variant. 

The bug is exposed when compiling for an AVX-capable CPU that had this feature
flag but without enabling AVX codegen. Another mistake in r224330 was not adding
the feature flag to all AVX CPUs; the AMD chips were excluded.

This is part of the fix for PR22371 ( http://llvm.org/bugs/show_bug.cgi?id=22371 ).

This feature bit is SSE-specific, so I've renamed it to "FeatureSSEUnalignedMem".
Changed the existing test case for the feature bit to reflect the new name and
renamed the test file itself to better reflect the feature.
Added runs to fold-vex.ll to check for the failing codegen.

Note that the feature bit is not set by default on any CPU because it may require a
configuration register setting to enable the enhanced unaligned behavior.

llvm-svn: 227983
2015-02-03 17:13:04 +00:00
Eric Christopher 05b819718c Reuse a bunch of cached subtargets and remove getSubtarget calls
without a Function argument.

llvm-svn: 227814
2015-02-02 17:38:43 +00:00
Eric Christopher 8b7706517c Move DataLayout back to the TargetMachine from TargetSubtargetInfo
derived classes.

Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be layed out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.

*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.

llvm-svn: 227113
2015-01-26 19:03:15 +00:00
Sanjay Patel 501890e909 Add a feature flag for slow 32-byte unaligned memory accesses [x86].
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for 
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.

Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6355

llvm-svn: 222544
2014-11-21 17:40:04 +00:00
Alexey Volkov fd1731d876 [X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers
Differential Revision: http://reviews.llvm.org/D5938

llvm-svn: 222521
2014-11-21 11:19:34 +00:00
Sanjay Patel 50fc6ff5e3 Initialize new subtarget feature variable for generating reciprocal estimate instructions.
This was missed in r221706.

llvm-svn: 221731
2014-11-11 23:13:15 +00:00
Rafael Espindola 246c4fb5d9 Remove redundant calls to isMaterializable.
This removes calls to isMaterializable in the following cases:

* It was redundant with a call to isDeclaration now that isDeclaration returns
  the correct answer for materializable functions.
* It was followed by a call to Materialize. Just call Materialize and check EC.

llvm-svn: 221050
2014-11-01 16:46:18 +00:00
Sanjay Patel 957efc23bb Use rsqrt (X86) to speed up reciprocal square root calcs
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.

For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).

This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..

See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900

Differential Revision: http://reviews.llvm.org/D5658

llvm-svn: 220570
2014-10-24 17:02:16 +00:00
Eric Christopher 12f4a78581 constify TargetMachine parameter for X86TargetLowering.
llvm-svn: 218804
2014-10-01 20:38:22 +00:00
Eric Christopher b68e25330b Remove resetSubtargetFeatures as it is unused.
llvm-svn: 217071
2014-09-03 20:36:31 +00:00
Eric Christopher 79cc1e3ae7 Reinstate "Nuke the old JIT."
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.

This reinstates commits r215111, 215115, 215116, 215117, 215136.

llvm-svn: 216982
2014-09-02 22:28:02 +00:00
Robert Khasanov 98441b6e7f [x86] Enable Broadwell target.
Added FeatureSMAP.

Broadwell ISA includes Haswell ISA + ADX + RDSEED + SMAP

llvm-svn: 216161
2014-08-21 09:16:12 +00:00
Duncan P. N. Exon Smith e85df16564 Fix whitespace error from r215279, NFC
llvm-svn: 215667
2014-08-14 17:18:26 +00:00
Eric Christopher e950b6776b Initialize X86 DataLayout based on the Triple only.
llvm-svn: 215279
2014-08-09 04:38:53 +00:00
Eric Christopher 4629ed75e4 Move some X86 subtarget configuration onto the subtarget that's being
created.

llvm-svn: 215271
2014-08-09 01:07:25 +00:00
Eric Christopher b9fd9ed37e Temporarily Revert "Nuke the old JIT." as it's not quite ready to
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.

Approved by Jim Grosbach, Lang Hames, Rafael Espindola.

This reverts commits r215111, 215115, 215116, 215117, 215136.

llvm-svn: 215154
2014-08-07 22:02:54 +00:00
Rafael Espindola f8b27c41e8 Nuke the old JIT.
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.

Thanks to Lang Hames for making MCJIT a good replacement!

llvm-svn: 215111
2014-08-07 14:21:18 +00:00
Kevin Enderby 0d928a142b Add support for the X86 secure guard extensions instructions in assembler (SGX).
This allows assembling the two new instructions, encls and enclu for the
SKX processor model.

Note the diffs are a bigger than what might think, but to fit the new
MRM_CF and MRM_D7 in things in the right places things had to be
renumbered and shuffled down causing a bit more diffs.

rdar://16228228

llvm-svn: 214460
2014-07-31 23:57:38 +00:00
Robert Khasanov bfa0131365 [SKX] Enabling SKX target and AVX512BW, AVX512DQ, AVX512VL features.
Enabling HasAVX512{DQ,BW,VL} predicates.
Adding VK2, VK4, VK32, VK64 masked register classes.
Adding new types (v64i8, v32i16) to VR512.
Extending calling conventions for new types (v64i8, v32i16)

Patch by Zinovy Nis <zinovy.y.nis@intel.com>
Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

llvm-svn: 213545
2014-07-21 14:54:21 +00:00
Sanjay Patel a2f658d69d Move Post RA Scheduling flag bit into SchedMachineModel
Refactoring; no functional changes intended

    Removed PostRAScheduler bits from subtargets (X86, ARM).
    Added PostRAScheduler bit to MCSchedModel class.
    This bit is set by a CPU's scheduling model (if it exists).
    Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
    Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
    Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
    Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
    Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86: 
       a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget. 
       b. MIPS overrides the CPU's postRA settings by enabling postRA for everything. 
       c. PPC overrides the CPU's postRA settings by enabling postRA for everything. 
       d. X86 is the only target that actually has postRA specified via sched model info.

Differential Revision: http://reviews.llvm.org/D4217

llvm-svn: 213101
2014-07-15 22:39:58 +00:00
Eric Christopher 1a2120312b Move to a private function to initialize the subtarget dependencies
so that we can use initializer lists for the X86Subtarget.

llvm-svn: 210614
2014-06-11 00:25:19 +00:00
Eric Christopher cd996edec5 Use unique_ptr for X86Subtarget pointer members.
llvm-svn: 210606
2014-06-10 23:26:47 +00:00
Eric Christopher 6c786a1dd1 Remove the use of TargetMachine from X86InstrInfo.
llvm-svn: 210596
2014-06-10 22:34:31 +00:00
Eric Christopher 0fb16ab204 Delete X86JITInfo in the subtarget destructor.
llvm-svn: 210516
2014-06-10 08:03:42 +00:00