When unsafe FP math operations are enabled, we can use the fre[s] and
frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together
with some Newton iteration, in order to quickly generate floating-point
division and sqrt results. All of these instructions are separately optional,
and so each has its own feature flag (except for the Altivec instructions,
which are covered under the existing Altivec flag). Doing this is not only
faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these
computations to be pipelined with other computations in order to hide their
overall latency.
I've also added a couple of missing fnmsub patterns which turned out to be
missing (but are necessary for good code generation of the Newton iterations).
Altivec needs a similar fix, but that will probably be more complicated because
fneg is expanded for Altivec's v4f32.
llvm-svn: 178617
The P7 and A2 have additional floating-point conversion instructions which
allow a direct two-instruction sequence (plus load/store) to convert from all
combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores,
only some combinations were directly available).
llvm-svn: 178480
This instruction is available on modern PPC64 CPUs, and is now used
to improve the SINT_TO_FP lowering (by eliminating the need for the
separate sign extension instruction and decreasing the amount of
needed stack space).
llvm-svn: 178446
These instructions are available on the P5x (and later) and on the A2. They
implement the standard floating-point rounding operations (floor, trunc, etc.).
One caveat: frin (round to nearest) does not implement "ties to even", and so
is only enabled in fast-math mode.
llvm-svn: 178337
These are 64-bit load/store with byte-swap, and available on the P7 and the A2.
Like the similar instructions for 16- and 32-bit words, these are matched in the
target DAG-combine phase against load/store-bswap pairs.
llvm-svn: 178276
PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and
tell TTI about it so that popcount-loop recognition will know about it.
llvm-svn: 178233
This is the first commit of a large series which will add support for the
QPX vector instruction set to the PowerPC backend. This instruction set is
used on the IBM Blue Gene/Q supercomputers.
llvm-svn: 173973
missed in the first pass because the script didn't yet handle include
guards.
Note that the script is now able to handle all of these headers without
manual edits. =]
llvm-svn: 169224
ELF subtarget.
The existing logic is used as a fallback to avoid any changes to the Darwin
ABI. PPC64 ELF now has two possible data layout strings: one for FreeBSD,
which requires 8-byte alignment, and a default string that requires
16-byte alignment.
I've added a test for PPC64 Linux to verify the 16-byte alignment. If
somebody wants to add a separate test for FreeBSD, that would be great.
Note that there is a companion patch to update the alignment information
in Clang, which I am committing now as well.
llvm-svn: 166928
The PPC target feature gpul (IsGigaProcessor) was only used for one thing:
To enable the generation of the MFOCRF instruction. Furthermore, this
instruction is available on other PPC cores outside of the G5 line. This
feature now corresponds to the HasMFOCRF flag.
No functionality change.
llvm-svn: 158323
This adds a full itinerary for IBM's PPC64 A2 embedded core. These
cores form the basis for the CPUs in the new IBM BG/Q supercomputer.
llvm-svn: 153842
itineraries.
- Refactor TargetSubtarget to be based on MCSubtargetInfo.
- Change tablegen generated subtarget info to initialize MCSubtargetInfo
and hide more details from targets.
llvm-svn: 134257
be the first encoded as the first feature. It then uses the CPU name to look up
features / scheduling itineray even though clients know full well the CPU name
being used to query these properties.
The fix is to just have the clients explictly pass the CPU name!
llvm-svn: 134127
See PR5201. There is no way to know if direct calls will be within the allowed
range for BL. Hence emit all calls as indirect when in JIT mode.
Without this long-running applications will fail to JIT on PowerPC with a
relocation failure.
llvm-svn: 110246
The Link Register is volatile when using the 32-bit SVR4 ABI.
Make it possible to use the 64-bit SVR4 ABI.
Add non-volatile registers for the 64-bit SVR4 ABI.
Make sure r2 is a reserved register when using the 64-bit SVR4 ABI.
Update PPCFrameInfo for the 64-bit SVR4 ABI.
Add FIXME for 64-bit Darwin PPC.
Insert NOP instruction after direct function calls.
Emit official procedure descriptors.
Create TOC entries for GlobalAddress references.
Spill 64-bit non-volatile registers to the correct slots.
Only custom lower VAARG when using the 32-bit SVR4 ABI.
Use simple VASTART lowering for the 64-bit SVR4 ABI.
llvm-svn: 79091
Module*.
Also, dropped uses of TargetMachine where unnecessary. The only target which
still takes a TargetMachine& is Mips, I would appreciate it if someone would
normalize this to match other targets.
llvm-svn: 77918
Make CalculateParameterAndLinkageAreaSize() Darwin-specific.
Remove SVR4 specific code from LowerCALL_Darwin() and LowerFORMAL_ARGUMENTS_Darwin().
Rename MachoABI to DarwinABI for consistency.
Rename ELF ABI to SVR4 ABI for consistency.
Factor out common call return lowering between the Darwin and SVR4 ABI.
Factor out common call lowering between the Darwin and SVR4 ABI.
llvm-svn: 74766
The EH_frame and .eh symbols are now private, except for darwin9 and earlier.
The patch also fixes the definition of PrivateGlobalPrefix on pcc linux.
llvm-svn: 61242
it follows the order of the enum, not alphabetical.
The motivation is to make -mattr=+ssse3,+sse41
select SSE41 as it ought to. Added "ignored"
enum values of 0 to PPC and SPU to avoid compiler
warnings.
llvm-svn: 47143