We were handling the non-hidden case in lib/Target/TargetMachine.cpp,
but the hidden case was handled in architecture dependent code and
only X86_64 and AArch64 were covered.
While it is true that some code sequences in some ABIs might be able
to produce the correct value at runtime, that doesn't seem to be the
common case.
I left the AArch64 code in place since it also forces a got access for
non-pic code. It is not clear if that is needed, but it is probably
better to change that in another commit.
llvm-svn: 316799
Turns out we have no patterns on the instructions that were using this feature flag for other reasons. These instructions are slow on all modern CPUs so it seems unlikely that we will spend any effort supporting these instructions going forward. So we might as well just kill of the feature flag and just fix up the comments.
llvm-svn: 315862
Adding x86 Processor families to initialize several uArch properties (based on the family)
This patch shows how gather cost can be initialized based on the proc. family
Differential Revision: https://reviews.llvm.org/D35348
llvm-svn: 313132
Summary:
Currently we determine if macro fusion is supported based on the AVX flag as a proxy for the processor being Sandy Bridge".
This is really strange as now AMD supports AVX. It also means if user explicitly disables AVX we disable macro fusion.
This patch adds an explicit macro fusion feature. I've also enabled for the generic 64-bit CPU (which doesn't have AVX)
This is probably another candidate for being in the MI layer, but for now I at least wanted to correct the overloading of the AVX feature.
Reviewers: spatel, chandlerc, RKSimon, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37280
llvm-svn: 312097
Summary: Knights Landing, because it is Atom derived, has slow two memory operand instructions. Mark the Knights Landing CPU model accordingly.
Patch by David Zarzycki.
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37224
llvm-svn: 311979
This reverts commit r310425, thus reapplying r310335 with a fix for link
issue of the AArch64 unittests on Linux bots when BUILD_SHARED_LIBS is ON.
Original commit message:
[GlobalISel] Remove the GISelAccessor API.
Its sole purpose was to avoid spreading around ifdefs related to
building global-isel. Since r309990, GlobalISel is not optional anymore,
thus, we can get rid of this mechanism all together.
NFC.
----
The fix for the link issue consists in adding the GlobalISel library in
the list of dependencies for the AArch64 unittests. This dependency
comes from the use of AArch64Subtarget that needs to know how
to destruct the GISel related APIs when being detroyed.
Thanks to Bill Seurer and Ahmed Bougacha for helping me reproducing and
understand the problem.
llvm-svn: 310969
When the access to a weak symbol is not a call, the access has to be
able to produce the value 0 at runtime.
We were sometimes producing code sequences where that was not possible
if the code was leaded more than 4g away from 0.
llvm-svn: 310756
This reverts commit r310115.
It causes a linker failure for the one of the unittests of AArch64 on one
of the linux bot:
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/3429
: && /home/fedora/gcc/install/gcc-7.1.0/bin/g++ -fPIC
-fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W
-Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wno-missing-field-initializers -pedantic -Wno-long-long
-Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment
-ffunction-sections -fdata-sections -O2
-L/home/fedora/gcc/install/gcc-7.1.0/lib64 -Wl,-allow-shlib-undefined
-Wl,-O3 -Wl,--gc-sections
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o -o
unittests/Target/AArch64/AArch64Tests
lib/libLLVMAArch64CodeGen.so.6.0.0svn lib/libLLVMAArch64Desc.so.6.0.0svn
lib/libLLVMAArch64Info.so.6.0.0svn lib/libLLVMCodeGen.so.6.0.0svn
lib/libLLVMCore.so.6.0.0svn lib/libLLVMMC.so.6.0.0svn
lib/libLLVMMIRParser.so.6.0.0svn lib/libLLVMSelectionDAG.so.6.0.0svn
lib/libLLVMTarget.so.6.0.0svn lib/libLLVMSupport.so.6.0.0svn -lpthread
lib/libgtest_main.so.6.0.0svn lib/libgtest.so.6.0.0svn -lpthread
-Wl,-rpath,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib
&& :
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x0):
undefined reference to `vtable for llvm::LegalizerInfo'
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x8):
undefined reference to `vtable for llvm::RegisterBankInfo'
The particularity of this bot is that it is built with
BUILD_SHARED_LIBS=ON
However, I was not able to reproduce the problem so far.
Reverting to unblock the bot.
llvm-svn: 310425
Summary:
Direct calls to dllimport functions are very common Windows. We should
add them to the -O0 fast path.
Reviewers: rafael
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D36197
llvm-svn: 310152
Its sole purpose was to avoid spreading around ifdefs related to
building global-isel. Since r309990, GlobalISel is not optional anymore,
thus, we can get rid of this mechanism all together.
NFC.
llvm-svn: 310115
With this change, the GlobalISel library gets always built. In
particular, this is not possible to opt GlobalISel out of the build
using the LLVM_BUILD_GLOBAL_ISEL variable any more.
llvm-svn: 309990
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq).
Differential Revision: https://reviews.llvm.org/D33169
llvm-svn: 303858
Summary:
This causes them to be re-computed more often than necessary but resolves
objections that were raised post-commit on r301750.
Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls
Reviewed By: qcolombet
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D32861
llvm-svn: 303418
According to Intel's Optimization Reference Manual for SNB+:
" For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must
dispatch via port 1:
- LEA that has all three source operands: base, index, and offset
- LEA that uses base and index registers where the base is EBP, RBP,or R13
- LEA that uses RIP relative addressing mode
- LEA that uses 16-bit addressing mode "
This patch currently handles the first 2 cases only.
Differential Revision: https://reviews.llvm.org/D32277
llvm-svn: 303333
According to psABI, PLT stub clobbers XMM8-XMM15.
In Regcall calling convention those registers are used for passing parameters.
Thus we need to prevent lazy binding in Regcall.
Differential Revision: https://reviews.llvm.org/D32430
llvm-svn: 302124
This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).
Reapplied - this time without changing line endings of existing files.
Differential Revision: https://reviews.llvm.org/D32769
llvm-svn: 302041
This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).
Differential Revision: https://reviews.llvm.org/D32769
llvm-svn: 302028
Summary:
Predicate<> now has a field to indicate how often it must be recomputed.
Currently, there are two frequencies, per-module (RecomputePerFunction==0)
and per-function (RecomputePerFunction==1). Per-function predicates are
currently recomputed more frequently than necessary since the only predicate
in this category is cheap to test. Per-module predicates are now computed in
getSubtargetImpl() while per-function predicates are computed in selectImpl().
Tablegen now manages the PredicateBitset internally. It should only be
necessary to add the required includes.
Also fixed a problem revealed by the test case where
constrainSelectedInstRegOperands() would attempt to tie operands that
BuildMI had already tied.
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D32491
llvm-svn: 301750
when the subtarget has fast strings.
This has two advantages:
- Speed is improved. For example, on Haswell thoughput improvements increase
linearly with size from 256 to 512 bytes, after which they plateau:
(e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes).
- Code is much smaller (no need to handle boundaries).
llvm-svn: 300957
VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.
Differential Revision: https://reviews.llvm.org/D29874
llvm-svn: 296859
Summary:
Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency.
This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do.
Reviewers: zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30181
llvm-svn: 295697
We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line.
llvm-svn: 294562
This patch does the following.
1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero
2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1)
3. Adds the clzero feature under znver1 architecture.
4. The custom inserter is added in Lowering.
5. A testcase is added to check the intrinsic.
6. The clzero instruction is added to assembler test.
Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me.
Differential revision: https://reviews.llvm.org/D29385
llvm-svn: 294558
Summary:
Attaching !absolute_symbol to a global variable does two things:
1) Marks it as an absolute symbol reference.
2) Specifies the value range of that symbol's address.
Teach the X86 backend to allow absolute symbols to appear in place of
immediates by extending the relocImm and mov64imm32 matchers. Start using
relocImm in more places where it is legal.
As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html
Differential Revision: https://reviews.llvm.org/D25878
llvm-svn: 289087
Summary:
Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld.
On Silvermont [source: Optimization Reference Manual]:
PMULLD has a throughput of 1/11 [instruction/cycles].
PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles].
Fixes pr31202.
Analysis of this issue was done by Fahana Aleen.
Reviewers: wmi, delena, mkuper
Subscribers: RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D27203
llvm-svn: 288844
Summary:
Add basic functionality to support call lowering for X86.
Currently only supports functions which return void and take zero arguments.
Inspired by commit 286573.
Reviewers: ab, qcolombet, t.p.northover
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26593
llvm-svn: 286935
This change adds transformations such as:
zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
To:
srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.
Differential Revision: https://reviews.llvm.org/D23446
llvm-svn: 284248
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.
The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.
Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.
Differential Revision: https://reviews.llvm.org/D21379
llvm-svn: 277725
The main difference is that StubDynamicNoPIC is gone. The
dynamic-no-pic mode as the name implies is simply not pic. It is just
conservative about what it assumes to be dso local.
llvm-svn: 273222
We performed a number of memory allocations each time getTTI was called,
remove them by using SmallString.
No functionality change intended.
llvm-svn: 270246
This refactors the logic in X86 to avoid code duplication. It also
splits it in two steps: it first decides if a symbol is local to the DSO
and then uses that information to decide how to access it.
The first part is implemented by shouldAssumeDSOLocal. It is not in any
way specific to X86. In a followup patch I intend to move it to
somewhere common and reused it in other backends.
llvm-svn: 270209
Summary:
MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT
pair while adding a timer function, such that another termination of the MWAITX
instruction occurs when the timer expires. The presence of the MONITORX and
MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29.
The MONITORX and MWAITX instructions are intercepted by the same bits that
intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be
monitored. MWAITX instruction causes the processor to stop instruction execution
and enter an implementation-dependent optimized state until occurrence of a
class of events.
Opcode of MONITORX instruction is "0F 01 FA". Opcode of MWAITX instruction is
"0F 01 FB". These opcode information is used in adding tests for the
disassembler.
These instructions are enabled for AMD's bdver4 architecture.
Patch by Ganesh Gopalasubramanian!
Reviewers: echristo, craig.topper, RKSimon
Subscribers: RKSimon, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19795
llvm-svn: 269911
Since r207518 they are printed exactly like non-hidden stubs on x86 and
since r207517 on ARM.
This means we can use a single set for all stubs in those platforms.
llvm-svn: 269776
Both Linux and kFreeBSD use glibc, so follow similiar code paths.
Add isTargetGlibc to check for this, and use it instead of isTargetLinux
in a few places.
Fixes PR22248 for kFreeBSD.
Differential Revision: http://reviews.llvm.org/D19104
llvm-svn: 268624
Changes in X86.td:
I set features of Intel processors in incremental form: IVB = SNB + X HSW = IVB + X ..
I added Skylake client processor and defined it's features
FeatureADX was missing on KNL
Added some new features to appropriate processors SMAP, IFMA, PREFETCHWT1, VMFUNC and others
Differential Revision: http://reviews.llvm.org/D16357
llvm-svn: 258659
The feature flag is for VPERMB,VPERMI2B,VPERMT2B and VPMULTISHIFTQB instructions.
More about the instruction can be found in:
hattps://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf
Differential Revision: http://reviews.llvm.org/D16190
llvm-svn: 258012
the feature flag is essential for RDPKRU and WRPKRU instruction
more about the instruction can be found in the SDM rev 56, vol 2 from http://www.intel.com/sdm
Differential Revision: http://reviews.llvm.org/D15491
llvm-svn: 255644
These instructions are not supported by all CPUs in 64-bit mode. Emitting them
causes Chromium to crash on start-up for users with such chips.
(GCC puts these instructions behind -msahf on 64-bit for the same reason.)
This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets
and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering
from before r244503 when the instructions are not available.
Differential Revision: http://reviews.llvm.org/D15240
llvm-svn: 254793
its own variable.
This is needed so that we can explicitly turn off MMX without turning
off SSE and also so that we can diagnose feature set incompatibilities
that involve MMX without SSE.
Rationale:
// sse3
__m128d test_mm_addsub_pd(__m128d A, __m128d B) {
return _mm_addsub_pd(A, B);
}
// mmx
void shift(__m64 a, __m64 b, int c) {
_mm_slli_pi16(a, c);
_mm_slli_pi32(a, c);
_mm_slli_si64(a, c);
_mm_srli_pi16(a, c);
_mm_srli_pi32(a, c);
_mm_srli_si64(a, c);
_mm_srai_pi16(a, c);
_mm_srai_pi32(a, c);
}
clang -msse3 -mno-mmx file.c -c
For this code we should be able to explicitly turn off MMX
without affecting the compilation of the SSE3 function and then
diagnose and error on compiling the MMX function.
This matches the existing gcc behavior and follows the spirit of
the SSE/MMX separation in llvm where we can (and do) turn off
MMX code generation except in the presence of intrinsics.
Updated a couple of tests, but primarily tested with a couple of tests
for turning on only mmx and only sse.
This is paired with a patch to clang to take advantage of this behavior.
llvm-svn: 249731
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).
For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.
This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.
This commit also contains a trivial patch to clang to account for the C++ API
change. Thanks go to Pavel Labath for fixing LLDB for me.
Reviewers: rengolin
Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10969
llvm-svn: 247692
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).
For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.
This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.
This commit also contains a trivial patch to clang to account for the C++ API
change.
Reviewers: rengolin
Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10969
llvm-svn: 247683
This is a follow-on from the discussion in http://reviews.llvm.org/D12154.
This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has
generally fast unaligned memory ops.
A motivating use case for this change is a clang invocation that doesn't explicitly set
the CPU, but does target a feature that we know only exists on a CPU that supports fast
unaligned memops. For example:
$ clang -O1 foo.c -mavx
This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449
Before this patch, we used different store types depending on whether the example can be
lowered as a memset or not.
Differential Revision: http://reviews.llvm.org/D12288
llvm-svn: 245950
This is a 'no functional change intended' patch. It removes one FIXME, but adds several more.
Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if any
sized misaligned memory access under 32-bytes is 'fast'. From the added FIXME comments, however,
you can see that we're not consistent about this. Changing the name of the attribute makes it
clearer to see the logic holes.
Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute
to new chips; fast unaligned accesses have been standard for several generations of CPUs now.
Differential Revision: http://reviews.llvm.org/D12154
llvm-svn: 245729
Remove all calls to `MCSubtargetInfo::InitCPUSched()` and merge its body
into the only relevant caller, `MCSubtargetInfo::InitMCProcessorInfo()`.
We were only calling the former after explicitly calling the latter with
the same CPU; it's confusing to have both methods exposed.
Besides a minor (surely unmeasurable) speedup in ARM and X86 from
avoiding running the logic twice, no functionality change.
llvm-svn: 241956
Summary:
Remove empty subclass in the process.
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren, ted
Differential Revision: http://reviews.llvm.org/D11045
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241780
From the linker's perspective, an available_externally global is equivalent
to an external declaration (per isDeclarationForLinker()), so it is incorrect
to consider it to be a weak definition.
Also clean up some logic in the dead argument elimination pass and clarify
its comments to better explain how its behavior depends on linkage,
introduce GlobalValue::isStrongDefinitionForLinker() and start using
it throughout the optimizers and backend.
Differential Revision: http://reviews.llvm.org/D10941
llvm-svn: 241413
There is a one-to-one relationship between X86Subtarget and
X86FrameLowering, but every frame lowering method would previously pull
the subtarget off the MachineFunction and query some subtarget
properties.
Over time, these locals began to grow in complexity and it became
important to keep their names and meaning in sync across all of the
frame lowering methods, leading to duplication. We can eliminate that
duplication by computing them once in the constructor.
llvm-svn: 239948
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rafael
Reviewed By: rafael
Subscribers: rafael, ted, jfb, llvm-commits, rengolin, jholewinski
Differential Revision: http://reviews.llvm.org/D10311
llvm-svn: 239467
The first try (r238051) to land this was reverted due to ExecutionEngine build failure;
that was hopefully addressed by r238788.
The second try (r238842) to land this was reverted due to BUILD_SHARED_LIBS failure;
that was hopefully addressed by r238953.
This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
llvm-svn: 239001
Intel® Memory Protection Extensions (Intel® MPX) is a new feature in Skylake.
It is a part of KNL and SKX sets. It is also a part of Skylake client.
I added definition of %bnd0 - %bnd3 registers, each register is a pair of 64-bit integers.
llvm-svn: 238916
The first try (r238051) to land this was reverted due to bot failures
that were hopefully addressed by r238788.
This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
llvm-svn: 238842
This patch adds a class for processing many recip codegen possibilities.
The TargetRecip class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
llvm-svn: 238051
to use the information in the module rather than TargetOptions.
We've had and clang has used the use-soft-float attribute for some
time now so have the backends set a subtarget feature based on
a particular function now that subtargets are created based on
functions and function attributes.
For the one middle end soft float check go ahead and create
an overloadable TargetLowering::useSoftFloat function that
just checks the TargetSubtargetInfo in all cases.
Also remove the command line option that hard codes whether or
not soft-float is set by using the attribute for all of the
target specific test cases - for the generic just go ahead and
add the attribute in the one case that showed up.
llvm-svn: 237079
r224330 introduced a bug by misinterpreting the "FeatureVectorUAMem" bit.
The commit log says that change did not affect anything, but that's not correct.
That change allowed SSE instructions to have unaligned mem operands folded into
math ops, and that's not allowed in the default specification for any SSE variant.
The bug is exposed when compiling for an AVX-capable CPU that had this feature
flag but without enabling AVX codegen. Another mistake in r224330 was not adding
the feature flag to all AVX CPUs; the AMD chips were excluded.
This is part of the fix for PR22371 ( http://llvm.org/bugs/show_bug.cgi?id=22371 ).
This feature bit is SSE-specific, so I've renamed it to "FeatureSSEUnalignedMem".
Changed the existing test case for the feature bit to reflect the new name and
renamed the test file itself to better reflect the feature.
Added runs to fold-vex.ll to check for the failing codegen.
Note that the feature bit is not set by default on any CPU because it may require a
configuration register setting to enable the enhanced unaligned behavior.
llvm-svn: 227983
derived classes.
Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be layed out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.
*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.
llvm-svn: 227113
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.
Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).
Differential Revision: http://reviews.llvm.org/D6355
llvm-svn: 222544
This removes calls to isMaterializable in the following cases:
* It was redundant with a call to isDeclaration now that isDeclaration returns
the correct answer for materializable functions.
* It was followed by a call to Materialize. Just call Materialize and check EC.
llvm-svn: 221050
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.
For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).
This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..
See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900
Differential Revision: http://reviews.llvm.org/D5658
llvm-svn: 220570