This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.
llvm-svn: 158204
Dynamic linking on PPC64 has had problems since we had to move the top-down
hazard-detection logic post-ra. For dynamic linking to work there needs to be
a nop placed after every call. It turns out that it is really hard to guarantee
that nothing will be placed in between the call (bl) and the nop during post-ra
scheduling. Previous attempts at fixing this by placing logic inside the
hazard detector only partially worked.
This is now fixed in a different way: call+nop codegen-only instructions. As far
as CodeGen is concerned the pair is now a single instruction and cannot be split.
This solution works much better than previous attempts.
The scoreboard hazard detector is also renamed to be more generic, there is currently
no cpu-specific logic in it.
llvm-svn: 153816
into the immediate field. This allows us to encode stuff like this:
lbz r3, lo16(__ZL4init)(r4) ; globalopt.cpp:5
; encoding: [0x88,0x64,A,A]
; fixup A - offset: 0, value: lo16(__ZL4init), kind: fixup_ppc_lo16
stw r3, lo16(__ZL1s)(r5) ; globalopt.cpp:6
; encoding: [0x90,0x65,A,A]
; fixup A - offset: 0, value: lo16(__ZL1s), kind: fixup_ppc_lo16
With this, we should have a completely function MCCodeEmitter for PPC, wewt.
llvm-svn: 119134
modes. For example, we now get:
ld r3, lo16(_G)(r3) ; encoding: [0xe8,0x63,A,0bAAAAAA00]
; fixup A - offset: 0, value: lo16(_G), kind: fixup_ppc_lo14
llvm-svn: 119133
When a target instruction wants to set target-specific flags, it should simply
set bits in the TSFlags bit vector defined in the Instruction TableGen class.
This works well because TableGen resolves member references late:
class I : Instruction {
AddrMode AM = AddrModeNone;
let TSFlags{3-0} = AM.Value;
}
let AM = AddrMode4 in
def ADD : I;
TSFlags gets the expected bits from AddrMode4 in this example.
llvm-svn: 100384
InOperandList. This gives one piece of important information: # of results
produced by an instruction.
An example of the change:
def ADD32rr : I<0x01, MRMDestReg, (ops GR32:$dst, GR32:$src1, GR32:$src2),
"add{l} {$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;
=>
def ADD32rr : I<0x01, MRMDestReg, (outs GR32:$dst), (ins GR32:$src1, GR32:$src2),
"add{l} {$src2, $dst|$dst, $src2}",
[(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;
llvm-svn: 40033
The algorithm it used before wasn't 100% correct, we now use an iterative
expansion model. This fixes assembler errors when compiling 403.gcc with
tail merging enabled.
Change the way the branch selector works overall: Now, the isel generates
PPC::BCC instructions (as it used to) directly, and these BCC instructions
are emitted to the output or jitted directly if branches don't need
expansion. Only if branches need expansion are instructions rewritten
and created. This should make branch select faster, and eliminates the
Bxx instructions from the .td file.
llvm-svn: 31837
Tell the codegen emitter that specific operands are not to be encoded, fixing
JIT regressions w.r.t. pre-inc loads and stores (e.g. lwzu, which we generate
even when general preinc loads are not enabled).
llvm-svn: 31770
1. Use flags on the instructions in the .td file to indicate the PPC970 unit
type instead of a table in the .cpp file. Much cleaner.
2. Change the hazard recognizer to build d-groups according to the actual
algorithm used, not my flawed understanding of it.
3. Model "must be in the first slot" and "must be the only instr in a group"
accurately.
llvm-svn: 26719
Registers. Apologies to Jim if the scheduling info so far isn't accurate.
There's a few more things like VRsave support that need to be finished up
in my local tree before I can commit code that Does The Right Thing for
turning 4 x float into the various altivec packed float instructions.
llvm-svn: 24489