function can support dynamic stack realignment. That's a much easier question
to answer at instruction selection stage than whether the function actually
will have dynamic alignment prologue. This allows the removal of the
stack alignment heuristic pass, and improves code quality for cases where
the heuristic would result in dynamic alignment code being generated when
it was not strictly necessary.
llvm-svn: 93885
doing global variable classification anymore) and hookized, sink almost
all target targets global variable emission code into AsmPrinter and out
of each target.
Some notes:
1. PIC16 does completely custom and crazy stuff, so it is not changed.
2. XCore has some custom handling for extra directives. I'll look at it next.
3. This switches linux/ppc to use .globl instead of .global. If .globl is
actually wrong, let me know and I'll fix it.
4. This makes linux/ppc get a lot of random cases right which were obviously
wrong before, it is probably now a bit healthier.
5. Blackfin will probably start getting .comm and other things that it didn't
before. If this is undesirable, it should explicitly opt out of these
things by clearing the relevant fields of MCAsmInfo.
This leads to a nice diffstat:
14 files changed, 127 insertions(+), 830 deletions(-)
llvm-svn: 93858
I'm not sure that this is correct, but it causes no test failures,
and just emitting a .comm without protecting its linkage somehow
is surely not right.
llvm-svn: 93854
This makes a similar code dead in all the other targets, I'll clean it up
in a bit.
This also moves handling of lcomm up before acquisition of a section,
since lcomm never needs a section.
llvm-svn: 93851
GCC would put weak zero initialized mutable data in the .bss section,
we would put it into a crasy '.gnu.linkonce.b.test,"aw",@nobits'
section. Fixing this will allow simplifications next up.
llvm-svn: 93844
simplify and commonize some of the asmprinter logic for globals.
This also avoids printing the MCSection for .zerofill, which broke
the llvm-gcc build.
llvm-svn: 93843
1. TargetLoweringObjectFileMachO should decide if something
goes in zerofill instead of having every target do it.
2. TargetLoweringObjectFileMachO should assign said symbols to
the right MCSection, the asmprinters should just emit to the
right section.
3. Since all zerofill stuff goes through mcstreamer anymore,
MAI can have a bool "haszerofill" instead of having the textual
directive to emit.
llvm-svn: 93838
comments (fast isel, X86). This doesn't seem
to break any functionality, but will introduce
cases where -g affects the generated code. I'll
be fixing that.
llvm-svn: 93811
idea, but unfortunately necessary.
- Default to using 4-bytes for the LSDA pointer encoding to agree with the
encoded value in the CIE.
llvm-svn: 93753
The CIE says that the LSDA point in the FDE section is an "sdata4". That's fine,
but we need it to actually be 4-bytes in the FDE for some platforms. Allow
individual platforms to decide for themselves.
llvm-svn: 93616
Note that the code wasn't calling DecorateCygMingName
when emitting the ".ascii -export" stuff at the end of
file for DLLExported functions. I don't know if it should
or not, but I'm preserving behavior.
llvm-svn: 93603
target-dependent memory address representation in it.
Restore X86 printing of DEBUG_VALUE; lowering is
done in X86RegisterInfo using the normal algorithm.
llvm-svn: 93565
FrameIndexes should be lowered, but the same way as
everything else (target dependent) rather than in a
special hacked way. The lowering needs to be done
for eventual purposes of Dwarf generation.
llvm-svn: 93530
the new ParseInstruction method just parses and returns a list of
target operands. A new MatchInstruction interface is used to
turn the operand list into an MCInst.
This requires new/deleting all the operands, but it also gives
targets the ability to use polymorphic operands if they want to.
llvm-svn: 93469
instead of returning it in an std::string. Based on this change:
1. Change TargetLoweringObjectFileCOFF::getCOFFSection to take a StringRef
2. Change a bunch of targets to call makeNameProper with a smallstring,
making several of them *much* more efficient.
3. Rewrite Mangler::makeNameProper to not build names and then prepend
prefixes, not use temporary std::strings, and to avoid other crimes.
llvm-svn: 93298
For now, this pass is fairly conservative. It only perform the replacement when both the pre- and post- extension values are used in the block. It will miss cases where the post-extension values are live, but not used.
llvm-svn: 93278
instruction is copy like where the source and destination registers can
overlap. This is to be used by the coalescable to coalesce the source and
destination registers of instructions like X86::MOVSX64rr32. Apparently
some crazy people believe the coalescer is too simple.
llvm-svn: 93210
- getToken is modeled after StringRef::split but it can split on multiple
separator chars and skips leading seperators.
- SplitString is a StringRef::split variant for more than 2 elements with the
same behaviour as getToken.
llvm-svn: 93161
has an immediate with at least 32 bits of leading zeros, to avoid needing to
materialize that immediate in a register first.
FileCheckize, tidy, and extend a testcase to cover this case.
This fixes rdar://7527390.
llvm-svn: 93160
new AsmPrinter. This is perhaps less elegant than describing them
in terms of MOV32r0 and subreg operations, but it allows the
current register to rematerialize them.
llvm-svn: 93158
ignore alignment requirements for SIMD memory operands. This
is useful on architectures like the AMD 10h that do not trap on
unaligned references if a status bit is twiddled at startup time.
llvm-svn: 93151
R11, and then asserting that the target was in R9. Since R9 isn't reserved for
the target anymore, and is used as an argument, this patch changes the
assertion.
llvm-svn: 93065
putting relocations into the constant pool - this isn't needed
for correctness and in the rare occasion it happens would pull
us out of fast isel for the block.
If fast-isel application startup time ever becomes an issue we
can add better support for these addresses instead of bailing.
llvm-svn: 92995
1. CMPXCHG8B and CMPXCHG16B did not specify implicit physical register defs and uses.
2. LCMPXCHG8B is loading 64 bit memory, not 32 bit.
llvm-svn: 92985
(OP (trunc x), (trunc y)) -> (trunc (OP x, y))
Unfortunately this simple change causes dag combine to infinite looping. The problem is the shrink demanded ops optimization tend to canonicalize expressions in the opposite manner. That is badness. This patch disable those optimizations in dag combine but instead it is done as a late pass in sdisel.
This also exposes some deficiencies in dag combine and x86 setcc / brcond lowering. Teach them to look pass ISD::TRUNCATE in various places.
llvm-svn: 92849
clear what information these functions are actually using.
This is also a micro-optimization, as passing a SDNode * around is
simpler than passing a { SDNode *, int } by value or reference.
llvm-svn: 92564
(or (x << c) | (y >> (64 - c))) ==> (shld64 x, y, c)
The isel patterns may not catch all the cases if general dag combine has reduced width of source operands.
llvm-svn: 92513
instead use the appropriate subreggy thing. This generates identical
code on some large apps (thanks to Evan's cross class coalescing
stuff he did back in july). This means that MOV16r0 can go away
completely in the future soon.
llvm-svn: 91972
return partial registers. This affected the back-end lowering code some.
Also patch up some places I missed before in the "get" functions.
llvm-svn: 91880
by allowing backends to override routines that will default
the JIT and Static code generation to an appropriate code model
for the architecture.
Should fix PR 5773.
llvm-svn: 91824
incarnations), integrated into the MC framework.
The disassembler is table-driven, using a custom TableGen backend to
generate hierarchical tables optimized for fast decode. The disassembler
consumes MemoryObjects and produces arrays of MCInsts, adhering to the
abstract base class MCDisassembler (llvm/MC/MCDisassembler.h).
The disassembler is documented in detail in
- lib/Target/X86/Disassembler/X86Disassembler.cpp (disassembler runtime)
- utils/TableGen/DisassemblerEmitter.cpp (table emitter)
You can test the disassembler by running llvm-mc -disassemble for i386
or x86_64 targets. Please let me know if you encounter any problems
with it.
llvm-svn: 91749
be non-optimal. To be precise, we should avoid folding loads if the instructions
only update part of the destination register, and the non-updated part is not
needed. e.g. cvtss2sd, sqrtss. Unfolding the load from these instructions breaks
the partial register dependency and it can improve performance. e.g.
movss (%rdi), %xmm0
cvtss2sd %xmm0, %xmm0
instead of
cvtss2sd (%rdi), %xmm0
An alternative method to break dependency is to clear the register first. e.g.
xorps %xmm0, %xmm0
cvtss2sd (%rdi), %xmm0
llvm-svn: 91672
incrementing the simple value type of the 16-bit type, which would give the
wrong type if an intemediate MVT (such as i24) were introduced.
llvm-svn: 91602
remove start/finishGVStub and the BufferState helper class from the
MachineCodeEmitter interface. It has the side-effect of not setting the
indirect global writable and then executable on ARM, but that shouldn't be
necessary.
llvm-svn: 91464
for all the processors where I have tried it, and even when it might not help
performance, the cost is quite low. The opportunities for duplicating
indirect branches are limited by other factors so code size does not change
much due to tail duplicating indirect branches aggressively.
llvm-svn: 90144
it is definitely profitable to tail duplicate indirect branches for x86.
This is likely to be true to various degrees for all modern x86 processors.
llvm-svn: 89865
way for each TargetJITInfo subclass to allocate its own stubs. This
means stubs aren't as exactly-sized anymore, but it lets us get rid of
TargetJITInfo::emitFunctionStubAtAddr(), which lets ARM and PPC
support the eager JIT, fixing http://llvm.org/PR4816.
* Rename the JITEmitter's stub creation functions to describe the kind
of stub they create. So far, all of them create lazy-compilation
stubs, but they sometimes get used when far-call stubs are needed.
Fixing http://llvm.org/PR5201 will involve fixing this.
llvm-svn: 89715
Note that "hasDotLocAndDotFile"-style debug info was already broken;
people wanting this functionality should implement it in the
AsmPrinter/DwarfWriter code.
llvm-svn: 89711
The large code model is documented at
http://www.x86-64.org/documentation/abi.pdf and says that calls should
assume their target doesn't live within the 32-bit pc-relative offset
that fits in the call instruction.
To do this, we turn off the global-address->target-global-address
conversion in X86TargetLowering::LowerCall(). The first attempt at
this broke the lazy JIT because it can separate the movabs(imm->reg)
from the actual call instruction. The lazy JIT receives the address of
the movabs as a relocation and needs to record the return address from
the call; and then when that call happens, it needs to patch the
movabs with the newly-compiled target. We could thread the call
instruction into the relocation and record the movabs<->call mapping
explicitly, but that seems to require at least as much new
complication in the code generator as this change.
To fix this, we make lazy functions _always_ go through a call
stub. You'd think we'd only have to force lazy calls through a stub on
difficult platforms, but that turns out to break indirect calls
through a function pointer. The right fix for that is to distinguish
between calls and address-of operations on uncompiled functions, but
that's complex enough to leave for someone else to do.
Another attempt at this defined a new CALL64i pseudo-instruction,
which expanded to a 2-instruction sequence in the assembly output and
was special-cased in the X86CodeEmitter's emitInstruction()
function. That broke indirect calls in the same way as above.
This patch also removes a hack forcing Darwin to the small code model.
Without far-call-stubs, the small code model requires things of the
JITMemoryManager that the DefaultJITMemoryManager can't provide.
Thanks to echristo for lots of testing!
llvm-svn: 88984
- This is an initial step towards -march=native support in Clang, and towards
eliminating host dependencies in the targets. See PR5389.
- Patch by Roman Divacky!
llvm-svn: 88768
Provide special isLoadFromStackSlotPostFE and isStoreToStackSlotPostFE
interfaces to explicitly request checking for post-frame ptr elimination
operands. This uses a heuristic so it isn't reliable for correctness.
llvm-svn: 87047
machine instruction loads or stores from/to a stack slot. Unlike
isLoadFromStackSlot and isStoreFromStackSlot, the instruction may be
something other than a pure load/store (e.g. it may be an arithmetic
operation with a memory operand). This helps AsmPrinter determine when
to print a spill/reload comment.
This is only a hint since we may not be able to figure this out in all
cases. As such, it should not be relied upon for correctness.
Implement for X86. Return false by default for other architectures.
llvm-svn: 87026
slots. The AsmPrinter will use this information to determine whether to
print a spill/reload comment.
Remove default argument values. It's too easy to pass a wrong argument
value when multiple arguments have default values. Make everything
explicit to trap bugs early.
Update all targets to adhere to the new interfaces..
llvm-svn: 87022
- Force NDEBUG on in any Release build. This drops the compile time to ~100s
from ~600s, in Release mode.
- This may just be a temporary workaround, I don't know the true nature of the
gcc-4.2 compile time performance problem.
llvm-svn: 86695
This patch forbids implicit conversion of DenseMap::const_iterator to
DenseMap::iterator which was possible because DenseMapIterator inherited
(publicly) from DenseMapConstIterator. Conversion the other way around is now
allowed as one may expect.
The template DenseMapConstIterator is removed and the template parameter
IsConst which specifies whether the iterator is constant is added to
DenseMapIterator.
Actually IsConst parameter is not necessary since the constness can be
determined from KeyT but this is not relevant to the fix and can be addressed
later.
Patch by Victor Zverovich!
llvm-svn: 86636
1. rename the movhp patfrag to movlhps, since thats what it actually matches
2. eliminate the bogus movhps load and store patterns, they were incorrect. The load transforms are already handled (correctly) by shufps/unpack.
3. revert a recent test change to its correct form.
llvm-svn: 86415
MachineRelocations, "stub" always refers to a far-call stub or a
load-a-faraway-global stub, so this patch adds "Far" to the term. (Other stubs
are used for lazy compilation and dlsym address replacement.) The variable was
also inconsistent between the positive and negative sense, and the positive
sense ("NeedStub") was more demanding than is accurate (since a nearby-enough
function can be called directly even if the platform often requires a stub).
Since the negative sense causes double-negatives, I switched to
"MayNeedFarStub" globally.
llvm-svn: 86363
The KILL pseudo-instruction may survive to the asm printer pass, just like the IMPLICIT_DEF. Print the KILL as a comment instead of just leaving a blank line in the output.
With -asm-verbose=0, a blank line is printed, like IMPLICIT?DEF.
llvm-svn: 86041
the testcase into:
_test1: ## @test1
## BB#0: ## %entry
leaq L_test1_bb6(%rip), %rax
jmpq *%rax
L_test1_bb: ## Address Taken
LBB1_1: ## %bb
movb $1, %al
ret
L_test1_bb6: ## Address Taken
LBB1_2: ## %bb6
movb $2, %al
ret
Note, it is very very strange that BlockAddressSDNode doesn't carry
around TargetFlags. Dan, please fix this.
llvm-svn: 85703
unfolding loads for hoisting. getOpcodeAfterMemoryUnfold returns the
opcode of the original operation without the load, not the load
itself, MachineLICM needs to know the operand index in order to get
the correct register class. Extend getOpcodeAfterMemoryUnfold to
return this information.
llvm-svn: 85622
bunch of associated comments, because it doesn't have anything to do
with DAGs or scheduling. This is another step in decoupling MachineInstr
emitting from scheduling.
llvm-svn: 85517
encounters an OEQ or UNE comparison, and update its callers to check
for this return status and recover. This fixes a problem resulting from
the LowerOperation hooks being called from LegalizeVectorOps, because
LegalizeVectorOps only lowers vectors, so OEQ and UNE comparisons may
still be at large. This fixes PR5092.
llvm-svn: 84640
All of these "subreg32" modifier instructions are handled
explicitly by the MCInst lowering phase. If they got to
the asmprinter, they would explode. They should eventually
be replace with correct use of subregs.
llvm-svn: 84526
LLC was scheduling compares before the adds causing wrong branches to be taken
in programs, resulting in misoptimized code wherever atomic adds where used.
llvm-svn: 84485
stack slots and giving them different PseudoSourceValue's did not fix the
problem of post-alloc scheduling miscompiling llvm itself.
- Apply Dan's conservative workaround by assuming any non fixed stack slots can
alias other memory locations. This means a load from spill slot #1 cannot
move above a store of spill slot #2.
- Enable post-alloc scheduling for x86 at optimization leverl Default and above.
llvm-svn: 84424
1. Emit external function type information for all COFF targets since it's
a feature of object format
2. Emit linker directives only for cygming (since this is ld-specific stuff)
llvm-svn: 84214
(for uses marked kill and defs marked dead) a few instructions in
addition to forwards. Also, increase the maximum number of instructions
to scan, as it appears to help in a fair number of cases.
llvm-svn: 84061
it to hold the address of an sret return value, for x86-64 ABI purposes.
Also, fix the test that was originally intended to test this to actually
test it, using FileCheck.
llvm-svn: 83853