I disabled FMA3 autodetection, since the result may differ from expected for some benchmarks.
I added tests for GodeGen and intrinsics.
I did not change llvm.fma.f32/64 - it may be done later.
llvm-svn: 157737
on X86 Atom. Some of our tests failed because the tail merging part of
the BranchFolding pass was creating new basic blocks which did not
contain live-in information. When the anti-dependency code in the Post-RA
scheduler ran, it would sometimes rename the register containing
the function return value because the fact that the return value was
live-in to the subsequent block had been lost. To fix this, it is necessary
to run the RegisterScavenging code in the BranchFolding pass.
This patch makes sure that the register scavenging code is invoked
in the X86 subtarget only when post-RA scheduling is being done.
Post RA scheduling in the X86 subtarget is only done for Atom.
This patch adds a new function to the TargetRegisterClass to control
whether or not live-ins should be preserved during branch folding.
This is necessary in order for the anti-dependency optimizations done
during the PostRASchedulerList pass to work properly when doing
Post-RA scheduling for the X86 in general and for the Intel Atom in particular.
The patch adds and invokes the new function trackLivenessAfterRegAlloc()
instead of using the existing requiresRegisterScavenging().
It changes BranchFolding.cpp to call trackLivenessAfterRegAlloc() instead of
requiresRegisterScavenging(). It changes the all the targets that
implemented requiresRegisterScavenging() to also implement
trackLivenessAfterRegAlloc().
It adds an assertion in the Post RA scheduler to make sure that post RA
liveness information is available when it is needed.
It changes the X86 break-anti-dependencies test to use –mcpu=atom, in order
to avoid running into the added assertion.
Finally, this patch restores the use of anti-dependency checking
(which was turned off temporarily for the 3.1 release) for
Intel Atom in the Post RA scheduler.
Patch by Andy Zhang!
Thanks to Jakob and Anton for their reviews.
llvm-svn: 155395
Adds an instruction itinerary to all x86 instructions, giving each a default latency of 1, using the InstrItinClass IIC_DEFAULT.
Sets specific latencies for Atom for the instructions in files X86InstrCMovSetCC.td, X86InstrArithmetic.td, X86InstrControl.td, and X86InstrShiftRotate.td. The Atom latencies for the remainder of the x86 instructions will be set in subsequent patches.
Adds a test to verify that the scheduler is working.
Also changes the scheduling preference to "Hybrid" for i386 Atom, while leaving x86_64 as ILP.
Patch by Preston Gurd!
llvm-svn: 149558
change, now you need a TargetOptions object to create a TargetMachine. Clang
patch to follow.
One small functionality change in PTX. PTX had commented out the machine
verifier parts in their copy of printAndVerify. That now calls the version in
LLVMTargetMachine. Users of PTX who need verification disabled should rely on
not passing the command-line flag to enable it.
llvm-svn: 145714
instructions are more aligned than the CPU requires, and adds some additional
directives, to follow in future patches. Patch by David Meyer!
llvm-svn: 139125
and MCSubtargetInfo.
- Added methods to update subtarget features (used when targets automatically
detect subtarget features or switch modes).
- Teach X86Subtarget to update MCSubtargetInfo features bits since the
MCSubtargetInfo layer can be shared with other modules.
- These fixes .code 16 / .code 32 support since mode switch is updated in
MCSubtargetInfo so MC code emitter can do the right thing.
llvm-svn: 134884
- Each target asm parser now creates its own MCSubtatgetInfo (if needed).
- Changed AssemblerPredicate to take subtarget features which tablegen uses
to generate asm matcher subtarget feature queries. e.g.
"ModeThumb,FeatureThumb2" is translated to
"(Bits & ModeThumb) != 0 && (Bits & FeatureThumb2) != 0".
llvm-svn: 134678
itineraries.
- Refactor TargetSubtarget to be based on MCSubtargetInfo.
- Change tablegen generated subtarget info to initialize MCSubtargetInfo
and hide more details from targets.
llvm-svn: 134257
be the first encoded as the first feature. It then uses the CPU name to look up
features / scheduling itineray even though clients know full well the CPU name
being used to query these properties.
The fix is to just have the clients explictly pass the CPU name!
llvm-svn: 134127
symbols as declarations in the X86 backend. This would manifest
on darwin x86-32 as errors like this with -fvisibility=hidden:
symbol '__ZNSbIcED1Ev' can not be undefined in a subtraction expression
This fixes PR7353.
llvm-svn: 105954
a new subtarget option for AES and check for the support. Add "westmere"
line of processors and add AES-NI support to the core i7.
Add a couple of TODOs for information I couldn't verify.
llvm-svn: 100231
Modules and ModuleProviders. Because the "ModuleProvider" simply materializes
GlobalValues now, and doesn't provide modules, it's renamed to
"GVMaterializer". Code that used to need a ModuleProvider to materialize
Functions can now materialize the Functions directly. Functions no longer use a
magic linkage to record that they're materializable; they simply ask the
GVMaterializer.
Because the C ABI must never change, we can't remove LLVMModuleProviderRef or
the functions that refer to it. Instead, because Module now exposes the same
functionality ModuleProvider used to, we store a Module* in any
LLVMModuleProviderRef and translate in the wrapper methods. The bindings to
other languages still use the ModuleProvider concept. It would probably be
worth some time to update them to follow the C++ more closely, but I don't
intend to do it.
Fixes http://llvm.org/PR5737 and http://llvm.org/PR5735.
llvm-svn: 94686
ignore alignment requirements for SIMD memory operands. This
is useful on architectures like the AMD 10h that do not trap on
unaligned references if a status bit is twiddled at startup time.
llvm-svn: 93151
be non-optimal. To be precise, we should avoid folding loads if the instructions
only update part of the destination register, and the non-updated part is not
needed. e.g. cvtss2sd, sqrtss. Unfolding the load from these instructions breaks
the partial register dependency and it can improve performance. e.g.
movss (%rdi), %xmm0
cvtss2sd %xmm0, %xmm0
instead of
cvtss2sd (%rdi), %xmm0
An alternative method to break dependency is to clear the register first. e.g.
xorps %xmm0, %xmm0
cvtss2sd (%rdi), %xmm0
llvm-svn: 91672
- This is an initial step towards -march=native support in Clang, and towards
eliminating host dependencies in the targets. See PR5389.
- Patch by Roman Divacky!
llvm-svn: 88768
Module*.
Also, dropped uses of TargetMachine where unnecessary. The only target which
still takes a TargetMachine& is Mips, I would appreciate it if someone would
normalize this to match other targets.
llvm-svn: 77918
- added processors k8-sse3, opteron-sse3, athlon64-sse3, amdfam10, and
barcelona with appropriate sse3/4a levels
- added FeatureSSE4A for amdfam10 processors
in X86Subtarget:
- added hasSSE4A
- updated AutoDetectSubtargetFeatures to detect SSE4A
- updated GetCurrentX86CPU to detect family 15 with sse3 as k8-sse3 and
family 10h as amdfam10
New processor names match those used by gcc.
Patch by Paul Redmond!
llvm-svn: 72434
and extern_weak_odr. These are the same as the non-odr versions,
except that they indicate that the global will only be overridden
by an *equivalent* global. In C, a function with weak linkage can
be overridden by a function which behaves completely differently.
This means that IP passes have to skip weak functions, since any
deductions made from the function definition might be wrong, since
the definition could be replaced by something completely different
at link time. This is not allowed in C++, thanks to the ODR
(One-Definition-Rule): if a function is replaced by another at
link-time, then the new function must be the same as the original
function. If a language knows that a function or other global can
only be overridden by an equivalent global, it can give it the
weak_odr linkage type, and the optimizers will understand that it
is alright to make deductions based on the function body. The
code generators on the other hand map weak and weak_odr linkage
to the same thing.
llvm-svn: 66339
is given, override the subtarget settings and enable 64-bit support.
This restores the earlier behavior, and fixes regressions on
Non-64-bit-capable x86-32 hosts.
This isn't necessarily the best approach, but the most obvious
alternative is to require -mcpu=x86-64 or -mattr=+64bit to be used
with -march=x86-64 when the host doesn't have 64-bit support. This
makes things little more consistent, but it's less convenient, and
it has the practical drawback of requiring lots of test changes, so
I opted for the above approach for now.
llvm-svn: 63642
SSE2, however it's possible to disable SSE2, and the subtarget support
code thinks that if 64-bit implies SSE2 and SSE2 is disabled then
64-bit should also be disabled. Instead, just mark all the 64-bit
subtargets as explicitly supporting SSE2.
Also, move the code that makes -march=x86-64 enable 64-bit support by
default to only apply when there is no explicit subtarget. If you
need to specify a subtarget and you want 64-bit code, you'll need to
select a subtarget that supports 64-bit code.
llvm-svn: 63575
loops when they can be subsumed into addressing modes.
Change X86 addressing mode check to realize that
some PIC references need an extra register.
(I believe this is correct for Linux, if not, I'm sure
someone will tell me.)
llvm-svn: 60608
instead of requiring all "short description" strings to begin with
two spaces. This makes these strings less mysterious, and it fixes
some cases where short description strings mistakenly did not
begin with two spaces.
llvm-svn: 57521
`-fno-builtin' flag. Currently, it's used to replace "memset" with "_bzero"
instead of "__bzero" on Darwin10+. This arguably violates the meaning of this
flag, but is currently sufficient. The meaning of this flag should become more
specific over time.
llvm-svn: 56885
are represented as "weak", but there are subtle differences
in some cases on Darwin, so we need both. The intent
is that "common" will behave identically to "weak" unless
somebody changes their target to do something else.
No functional change as yet.
llvm-svn: 51118
replaced it with a FIXME should have determined what did work. Then he would have
realized that the code was in fact correct, and would have avoided breaking it.
llvm-svn: 36173
1. New parameter attribute called 'inreg'. It has meaning "place this
parameter in registers, if possible". This is some generalization of
gcc's regparm(n) attribute. It's currently used only in X86-32 backend.
2. Completely rewritten CC handling/lowering code inside X86 backend.
Merged stdcall + c CCs and fastcall + fast CC.
3. Dropped CSRET CC. We cannot add struct return variant for each
target-specific CC (e.g. stdcall + csretcc and so on).
4. Instead of CSRET CC introduced 'sret' parameter attribute. Setting in
on first attribute has meaning 'This is hidden pointer to structure
return. Handle it gently'.
5. Fixed small bug in llvm-extract + add new feature to
FunctionExtraction pass, which relinks all internal-linkaged callees
from deleted function to external linkage. This will allow further
linking everything together.
NOTEs: 1. Documentation will be updated soon.
2. llvm-upgrade should be improved to translate csret => sret.
Before this, there will be some unexpected test fails.
llvm-svn: 33597
non-statics.
* Introduce new option to output zero-initialized data to .bss section.
This can reduce size of binaries. Enable it by default for ELF &
Cygwin/Mingw targets. Probably, Darwin should be also added.
llvm-svn: 33299
* PIC-aware internal structures in X86 Codegen have been refactored
* Visibility (default/weak) has been added
* Docs fixes (external weak linkage, visibility, formatting)
llvm-svn: 33136
- New target type "mingw" was introduced
- Same things for both mingw & cygwin are marked as "cygming" (as in
gcc)
- .lcomm is supported here, so allow LLVM to use it
- Correctly use underscored versions of setjmp & _longjmp for both mingw
& cygwin
llvm-svn: 32833
Turns them into calls to memset / memcpy if 1) buffer(s) are not DWORD aligned,
2) size is not known to be greater or equal to some minimum value (currently 128).
llvm-svn: 26224