Commit Graph

409 Commits

Author SHA1 Message Date
Chris Lattner 6a5e706e3c on darwin empty functions need to codegen into something of non-zero length,
otherwise labels get incorrectly merged.  We handled this by emitting a 
".byte 0", but this isn't correct on thumb/arm targets where the text segment
needs to be a multiple of 2/4 bytes.  Handle this by emitting a noop.  This
is more gross than it should be because arm/ppc are not fully mc'ized yet.

This fixes rdar://7908505

llvm-svn: 102400
2010-04-26 23:37:21 +00:00
Evan Cheng 1ff9d1b63e Remove a redundant comment.
llvm-svn: 102326
2010-04-26 08:16:57 +00:00
Evan Cheng ed69b382ea - Move TargetLowering::EmitTargetCodeForFrameDebugValue to TargetInstrInfo and rename it to emitFrameIndexDebugValue.
- Teach spiller to modify DBG_VALUE instructions to reference spill slots.

llvm-svn: 102323
2010-04-26 07:38:55 +00:00
Dan Gohman bcaf681cde Add const qualifiers to CodeGen's use of LLVM IR constructs.
llvm-svn: 101334
2010-04-15 01:51:59 +00:00
Evan Cheng 4ca4bc6f95 Re-apply 101075 and fix it properly. Just reuse the debug info of the branch instruction being optimized. There is no need to --I which can deref off start of the BB.
llvm-svn: 101162
2010-04-13 18:50:27 +00:00
Eric Christopher d67f66dc0c Temporarily revert r101075, it's causing invalid iterator assertions
in a nightly tester.

llvm-svn: 101158
2010-04-13 18:37:58 +00:00
Bill Wendling b02bbe416f Micro-optimization:
If we have this situation:

    jCC  L1
    jmp  L2
L1:
  ...
L2:
  ...

We can get a small performance boost by emitting this instead:

    jnCC L2
L1:
  ...
L2:
  ...

This testcase shows an example of this:

float func(float x, float y) {
    double product = (double)x * y;
    if (product == 0.0)
        return product;
    return product - 1.0;
}

llvm-svn: 101075
2010-04-12 22:19:57 +00:00
Chris Lattner 2104b8d36e rename llvm::llvm_report_error -> llvm::report_fatal_error
llvm-svn: 100709
2010-04-07 22:58:41 +00:00
Dale Johannesen 60b289709e Educate GetInstrSizeInBytes implementations that
DBG_VALUE does not generate code.

llvm-svn: 100681
2010-04-07 19:51:44 +00:00
Jakob Stoklund Olesen 1a9b3f3484 Properly enable load clustering.
Operand 2 on a load instruction does not have to be a RegisterSDNode for this to
work.

llvm-svn: 100497
2010-04-05 23:48:02 +00:00
Chris Lattner 6f306d7d30 use DebugLoc default ctor instead of DebugLoc::getUnknownLoc()
llvm-svn: 100214
2010-04-02 20:16:16 +00:00
Dale Johannesen 4244d12769 Teach AnalyzeBranch, RemoveBranch and the branch
folder to be tolerant of debug info following the
branch(es) at the end of a block.

llvm-svn: 100168
2010-04-02 01:38:09 +00:00
Jakob Stoklund Olesen 9986ba954c Replace V_SET0 with variants for each SSE execution domain.
llvm-svn: 99975
2010-03-31 00:40:13 +00:00
Jakob Stoklund Olesen dbff4e8103 Renumber SSE execution domains for better code size.
SSEDomainFix will collapse to the domain with the lower number when it has a
choice. The SSEPackedSingle domain often has smaller instructions, so prefer
that.

llvm-svn: 99952
2010-03-30 22:46:53 +00:00
Eric Christopher 6ad8167714 Remove the pmulld intrinsic and autoupdate it as a vector multiply.
Rewrite the pmulld patterns, and make sure that they fold in loads of
arguments into the instruction.

llvm-svn: 99910
2010-03-30 18:49:01 +00:00
Jakob Stoklund Olesen b551aa4da5 Basic implementation of SSEDomainFix pass.
Cross-block inference is primitive and wrong, but the pass is working otherwise.

llvm-svn: 99848
2010-03-29 23:24:21 +00:00
Jakob Stoklund Olesen 49e121d5e4 Add a late SSEDomainFix pass that twiddles SSE instructions to avoid domain crossings.
On Nehalem and newer CPUs there is a 2 cycle latency penalty on using a register
in a different domain than where it was defined. Some instructions have
equvivalents for different domains, like por/orps/orpd.

The SSEDomainFix pass tries to minimize the number of domain crossings by
changing between equvivalent opcodes where possible.

This is a work in progress, in particular the pass doesn't do anything yet. SSE
instructions are tagged with their execution domain in TableGen using the last
two bits of TSFlags. Note that not all instructions are tagged correctly. Life
just isn't that simple.

The SSE execution domain issue is very similar to the ARM NEON/VFP pipeline
issue handled by NEONMoveFixPass. This pass may become target independent to
handle both.

llvm-svn: 99524
2010-03-25 17:25:00 +00:00
Jakob Stoklund Olesen a86ccbfe88 Revert "Add a late SSEDomainFix pass that twiddles SSE instructions to avoid domain crossings."
This reverts commit 99345. It was breaking buildbots.

llvm-svn: 99352
2010-03-23 23:48:51 +00:00
Jakob Stoklund Olesen 31da45b7af Add a late SSEDomainFix pass that twiddles SSE instructions to avoid domain crossings.
This is work in progress. So far, SSE execution domain tables are added to
X86InstrInfo, and a skeleton pass is enabled with -sse-domain-fix.

llvm-svn: 99345
2010-03-23 23:14:44 +00:00
Evan Cheng b6dee6e015 Teach isSafeToClobberEFLAGS to ignore dbg_value's. We need a MachineBasicBlock::iterator that does this automatically?
llvm-svn: 99320
2010-03-23 20:35:45 +00:00
Evan Cheng d703df67ce Do not force indirect tailcall through fixed registers: eax, r11. Add support to allow loads to be folded to tail call instructions.
llvm-svn: 98465
2010-03-14 03:48:46 +00:00
Dan Gohman 772952f46e Don't try to fold V_SET0 and V_SETALLONES to loads in medium and
large code models.

llvm-svn: 98042
2010-03-09 03:01:40 +00:00
Bill Wendling 543ce1f64a Revert r97766. It's deleting a tag.
llvm-svn: 97768
2010-03-05 00:33:59 +00:00
Bill Wendling 6517f88f25 Micro-optimization:
This code:

float floatingPointComparison(float x, float y) {
    double product = (double)x * y;
    if (product == 0.0)
        return product;
    return product - 1.0;
}

produces this:

_floatingPointComparison:
0000000000000000        cvtss2sd        %xmm1,%xmm1
0000000000000004        cvtss2sd        %xmm0,%xmm0
0000000000000008        mulsd           %xmm1,%xmm0
000000000000000c        pxor            %xmm1,%xmm1
0000000000000010        ucomisd         %xmm1,%xmm0
0000000000000014        jne             0x00000004
0000000000000016        jp              0x00000002
0000000000000018        jmp             0x00000008
000000000000001a        addsd           0x00000006(%rip),%xmm0
0000000000000022        cvtsd2ss        %xmm0,%xmm0
0000000000000026        ret

The "jne/jp/jmp" sequence can be reduced to this instead:

_floatingPointComparison:
0000000000000000        cvtss2sd        %xmm1,%xmm1
0000000000000004        cvtss2sd        %xmm0,%xmm0
0000000000000008        mulsd           %xmm1,%xmm0
000000000000000c        pxor            %xmm1,%xmm1
0000000000000010        ucomisd         %xmm1,%xmm0
0000000000000014        jp              0x00000002
0000000000000016        je              0x00000008
0000000000000018        addsd           0x00000006(%rip),%xmm0
0000000000000020        cvtsd2ss        %xmm0,%xmm0
0000000000000024        ret

for a savings of 2 bytes.

This xform can happen when we recognize that jne and jp jump to the same "true"
MBB, the unconditional jump would jump to the "false" MBB, and the "true" branch
is the fall-through MBB.

llvm-svn: 97766
2010-03-05 00:24:26 +00:00
Dan Gohman bdd6405f29 Implement XMM subregs.
Extracting the low element of a vector is now done with EXTRACT_SUBREG,
and the zero-extension performed by load movss is now modeled with
SUBREG_TO_REG, and so on.

Register-to-register movss and movsd are no longer considered copies;
they are two-address instructions which insert a scalar into a vector.

llvm-svn: 97354
2010-02-28 00:17:42 +00:00
Dan Gohman 952f6f98bb movl is a cheaper way to materialize 0 without clobbering EFLAGS than movabsq.
llvm-svn: 97227
2010-02-26 16:49:27 +00:00
Dan Gohman c1a545c307 Fix a typo in a comment.
llvm-svn: 96778
2010-02-22 04:09:26 +00:00
Chris Lattner f7477e599f add a bunch of mod/rm encoding types for fixed mod/rm bytes.
This will work better for the disassembler for modeling things
like lfence/monitor/vmcall etc.

llvm-svn: 95960
2010-02-12 02:06:33 +00:00
Chris Lattner 2b0a7a2592 refactor the conditional jump instructions in the .td file to
use a multipattern that generates both the 1-byte and 4-byte 
versions from the same defm

llvm-svn: 95901
2010-02-11 19:25:55 +00:00
Chris Lattner b06015aa69 move target-independent opcodes out of TargetInstrInfo
into TargetOpcodes.h.  #include the new TargetOpcodes.h
into MachineInstr.  Add new inline accessors (like isPHI())
to MachineInstr, and start using them throughout the 
codebase.

llvm-svn: 95687
2010-02-09 19:54:29 +00:00
Chris Lattner 58827ff98e port X86InstrInfo::determineREX over to the new encoder.
llvm-svn: 95440
2010-02-05 22:10:22 +00:00
Chris Lattner 503243559a move functions for decoding X86II values into the X86II namespace.
llvm-svn: 95410
2010-02-05 19:24:13 +00:00
Chris Lattner b8d375fd21 change getSizeOfImm and getBaseOpcodeFor to just take
TSFlags directly instead of a TargetInstrDesc.

llvm-svn: 95405
2010-02-05 19:16:26 +00:00
Dale Johannesen e5a4134d11 use findDebugLoc in more places.
llvm-svn: 94477
2010-01-26 00:03:12 +00:00
Evan Cheng 16cf934381 Be more conservative with clustering f32 / f64 loads.
llvm-svn: 94254
2010-01-22 23:49:11 +00:00
Evan Cheng 4f026f3750 Add two target hooks to determine whether two loads are near and should be scheduled together.
llvm-svn: 94147
2010-01-22 03:34:51 +00:00
Evan Cheng 5d30f7c91c Fix a minor issue in x86 load / store folding table. movups does an unaligned load so it doesn't require 16-byte alignment.
llvm-svn: 94058
2010-01-21 00:55:14 +00:00
Dale Johannesen c5db599813 make findDebugLoc a class method
llvm-svn: 94032
2010-01-20 21:36:02 +00:00
Dale Johannesen 91970b4ea2 Move findDebugLoc somewhere more central. Fix
more cases where debug declarations affect
debug line info.

llvm-svn: 93953
2010-01-20 00:19:24 +00:00
Jim Grosbach 04770f2aa1 For aligned load/store instructions, it's only required to know whether a
function can support dynamic stack realignment. That's a much easier question
to answer at instruction selection stage than whether the function actually
will have dynamic alignment prologue. This allows the removal of the
stack alignment heuristic pass, and improves code quality for cases where
the heuristic would result in dynamic alignment code being generated when
it was not strictly necessary.

llvm-svn: 93885
2010-01-19 18:31:11 +00:00
Evan Cheng ceb5a4e8f6 For now, avoid issuing extract_subreg to reuse lower 8-bit, it's not safe in 32-bit.
llvm-svn: 93307
2010-01-13 08:01:32 +00:00
Evan Cheng 30bebff456 Add a quick pass to optimize sign / zero extension instructions. For targets where the pre-extension values are available in the subreg of the result of the extension, replace the uses of the pre-extension value with the result + extract_subreg.
For now, this pass is fairly conservative. It only perform the replacement when both the pre- and post- extension values are used in the block. It will miss cases where the post-extension values are live, but not used.

llvm-svn: 93278
2010-01-13 00:30:23 +00:00
Dan Gohman c119580307 Reapply the MOV64r0 patch, with a fix: MOV64r0 clobbers EFLAGS.
llvm-svn: 93229
2010-01-12 04:42:54 +00:00
Evan Cheng 4216615f99 Add TargetInstrInfo::isCoalescableInstr. It returns true if the specified
instruction is copy like where the source and destination registers can
overlap. This is to be used by the coalescable to coalesce the source and
destination registers of instructions like X86::MOVSX64rr32. Apparently
some crazy people believe the coalescer is too simple.

llvm-svn: 93210
2010-01-12 00:09:37 +00:00
Evan Cheng 7bdf339602 Revert 93158. It's breaking quite a few x86_64 tests.
llvm-svn: 93185
2010-01-11 21:13:41 +00:00
Dan Gohman 3a55686345 Re-instate MOV64r0 and MOV16r0, with adjustments to work with the
new AsmPrinter. This is perhaps less elegant than describing them
in terms of MOV32r0 and subreg operations, but it allows the
current register to rematerialize them.

llvm-svn: 93158
2010-01-11 17:37:57 +00:00
David Greene d589dafba6 Change errs() to dbgs().
llvm-svn: 92653
2010-01-05 01:29:29 +00:00
Bill Wendling 3179a89067 Remove dead variable.
llvm-svn: 92184
2009-12-28 01:36:02 +00:00
Chris Lattner 518b037620 completely eliminate the MOV16r0 'instruction'. The only
interesting part of this is the divrem changes, which are
already tested by CodeGen/X86/divrem.ll.

llvm-svn: 91975
2009-12-23 01:45:04 +00:00
Evan Cheng 71d7eaa87e Remove target attribute break-sse-dep. Instead, do not fold load into sse partial update instructions unless optimizing for size.
llvm-svn: 91910
2009-12-22 17:47:23 +00:00