This allows formation of OpcodeSwitch for top level patterns, in
particular on X86. This saves about 1K of data space in the x86
table and makes the dispatch much more efficient.
llvm-svn: 97440
ComplexPattern at the root be generated multiple times, once
for each opcode they are part of. This encourages factoring
because the opcode checks get treated just like everything
else in the matcher.
llvm-svn: 97439
to a scope where every child starts with a CheckOpcode, but
executes more efficiently. Enhance DAGISelMatcherOpt to
form it.
This also fixes a bug in CheckOpcode: apparently the SDNodeInfo
objects are not pointer comparable, we have to compare the
enum name.
llvm-svn: 97438
so that we get grouping at the top level.
Add an optimization to reorder type check & record nodes
after opcode checks. We prefer to expose tree shape
matching which improves grouping and will enhance the next
optimization.
llvm-svn: 97432
dispatcher method. This eliminates the dependence of the new isel's
generated code on the old isel's predicates, however some random
hand written isel code still uses them.
llvm-svn: 97431
specifies whether there is an output flag or not. Use this
instead of redundantly encoding the chain/flag results in the
output vtlist.
llvm-svn: 97419
even some the old isel didn't. There are several parts of
this that make me feel dirty, but it's no worse than the
old isel. I'll clean up the parts I can do without ripping
out the old one next.
llvm-svn: 97415
It gets its own implementation totally divorced from the (presumably
performance-sensitive) routines which parse into a uint64_t.
Add APInt::operator|=(uint64_t), which is situationally much better than
using a full APInt.
llvm-svn: 97381
payloads. APFloat's internal folding routines always make QNaNs now,
instead of sometimes making QNaNs and sometimes SNaNs depending on the
type.
llvm-svn: 97364
node is always guaranteed to have a particular type
instead of hacking in ISD::STORE explicitly. This allows
us to use implied types for a broad range of nodes, even
target specific ones.
llvm-svn: 97355
Extracting the low element of a vector is now done with EXTRACT_SUBREG,
and the zero-extension performed by load movss is now modeled with
SUBREG_TO_REG, and so on.
Register-to-register movss and movsd are no longer considered copies;
they are two-address instructions which insert a scalar into a vector.
llvm-svn: 97354
defs or uses. The regular def and use checking below covers them, and
can be more precise. It's safe to hoist an instruction with a dead
implicit def if the register isn't live into the loop header.
llvm-svn: 97352
but codegen'd differently. This really wanted to use some
sort of subreg to get the low 4 bytes of the G8RC register
or something. However, it's invalid and nothing is testing
it, so I'm just zapping the bogosity.
llvm-svn: 97345
with getType() == MVT::i32 etc. Teach it that two different
integer constants are contradictory. This cuts 1K off the X86
table, down to 98k
llvm-svn: 97314
predicates. For example if we have:
Scope:
CheckType i32
ABC
CheckType f32
DEF
CheckType i32
GHI
Then we know that we can transform this into:
Scope:
CheckType i32
Scope
ABC
GHI
CheckType f32
DEF
This reorders the check for the 'GHI' predicate above
the check for the 'DEF' predidate. However it is safe to do this
in this situation because we know that a node cannot have both an
i32 and f32 type.
We're now doing more factoring that the old isel did.
llvm-svn: 97312
as deeply into the pattern as we can get away with. In pratice, this
means "all the way to to the emitter code, but not across
ComplexPatterns". This substantially increases the amount of factoring
we get.
llvm-svn: 97305
confusing the old MAT variable with the new GlobalType one. This caused
us to promote the @disp global pointer into:
@disp.body = internal global double*** undef
instead of:
@disp.body = internal global [3 x double**] undef
llvm-svn: 97285
for alignment into the LSDA. If the TType base offset is emitted, then put the
padding there. Otherwise, put it in the call site table length. There will be no
conflict between the two sites when placing the padding in one place.
llvm-svn: 97277
o Parallel addition and subtraction, signed/unsigned
o Miscellaneous operations: QADD, QDADD, QSUB, QDSUB
o Unsigned sum of absolute differences [and accumulate]: USAD8, USADA8
o Signed/Unsigned saturate: SSAT, SSAT16, USAT, USAT16
o Signed multiply accumulate long (halfwords): SMLAL<x><y>
o Signed multiply accumulate/subtract [long] (dual): SMLAD[x], SMLALD[X], SMLSD[X], SMLSLD[X]
o Signed dual multiply add/subtract [long]: SMUAD[X], SMUSD[X]
llvm-svn: 97276
This is possible because F8RC is a subclass of F4RC. We keep FMRSD around so
fextend has a pattern.
Also allow folding of memory operands on FMRSD.
llvm-svn: 97275
longer than 80 columns. This replaces the heavy-handed "textwidth"
mechanism, and makes the trailing-whitespace highlighting lazy so
that it isn't constantly jumping on the user during typing.
llvm-svn: 97267
The PowerPC floating point registers can represent both f32 and f64 via the
two register classes F4RC and F8RC. F8RC is considered a subclass of F4RC to
allow cross-class coalescing. This coalescing only affects whether registers
are spilled as f32 or f64.
Spill slots must be accessed with load/store instructions corresponding to the
class of the spilled register. PPCInstrInfo::foldMemoryOperandImpl was looking
at the instruction opcode which is wrong.
X86 has similar floating point register classes, but doesn't try to fold
memory operands, so there is no problem there.
llvm-svn: 97262
gross little neighbor merging implementation. This one has
the benefit of not violating the ordering of patterns, so it
generates code that passes tests again.
llvm-svn: 97218
current design. This generates a matcher that successfully
runs, but it turns out that the factoring we're doing violates
the ordering of patterns, so we end up matching (e.g.) movups
where we want movaps. This won't due, but I'll address this in
a follow on patch. It's nice to not be on by default yet! :)
llvm-svn: 97215
object construction. There is no provision to change them when the
code for a function generated.
So we have to change these names while printing assembly.
llvm-svn: 97213
and restore the entire matcher stack by value. This is because children
we're testing could do moveparent or other things besides just
scribbling on additions to the stack.
llvm-svn: 97212
the alignment requirement, if it no longer makes the TType base offset overflow
into extra bytes, then we need to pad to those bytes ourselves.
llvm-svn: 97196
will eliminate the need for padding in the "Call site table length". E.g., if
we have this:
GCC_except_table1:
Lexception1:
.byte 0xff ## @LPStart Encoding = omit
.byte 0x9b ## @TType Encoding = indirect pcrel sdata4
.byte 0x7f ## @TType base offset
.byte 0x03 ## Call site Encoding = udata4
.byte 0x89 ## Call site table length
with padding of 1. We want to emit the padding like this:
GCC_except_table1:
Lexception1:
.byte 0xff ## @LPStart Encoding = omit
.byte 0x9b ## @TType Encoding = indirect pcrel sdata4
.byte 0xff ## @TType base offset
.space 1,0 ## Padding
.byte 0x03 ## Call site Encoding = udata4
.byte 0x89 ## Call site table length
and not with padding on the "Call site table length" entry.
llvm-svn: 97183
instead of to have a chained series of scope nodes. This makes
the generated table smaller, improves the efficiency of the
interpreter, and make the factoring optimization much more
reasonable to implement.
llvm-svn: 97160
section with TextAlignFillValue and calls EmitCodeAlignment() instead of
calling EmitValueToAlignment(). This allows x86 assembly code to be aligned
with optimal nops.
llvm-svn: 97158
terms of store and load, which means bitcasting between scalar
integer and vector has endian-specific results, which undermines
this whole approach.
llvm-svn: 97137
keep track of instructions that return void) per-function. This fixes PR5278.
This breaks backwards compatibility with the metadata format. That's okay
because we haven't released the metadata bitcode yet.
llvm-svn: 97132
splitting all the patterns under scope nodes into equality sets
based on their first node. The second step is to rewrite the
graph info a form that exposes the sharing. Before I do this,
I want to redesign the Scope node.
llvm-svn: 97130
which branch on undef to branch on a boolean constant for the edge
exiting the loop. This helps ScalarEvolution compute trip counts for
loops.
Teach ScalarEvolution to recognize single-value PHIs, when safe, and
ForgetSymbolicName to forget such single-value PHI nodes as apprpriate
in ForgetSymbolicName.
llvm-svn: 97126
format is not parsable, even if the module is legal. To get parsable output,
dump the module instead of the function or smaller, since metadata kind are
attached to the module (not the context).
llvm-svn: 97124
(511*16) bytes register displacement (D-form).
NOTE: This is a potential headache, given the SPU's local core limitations,
allowing the software developer to commit stack overrun suicide unknowingly.
Also, large SPU stack frames will cause code size explosion. But, one presumes
that the software developer knows what they're doing...
Contributed by Kalle.Raiskila@nokia.com, edited slightly before commit.
llvm-svn: 97091
GCC_except_table label but before the Lexception, which the FDE references.
This causes problems as the FDE does not point to the start of an LSDA chunk.
Use an unnormalized uleb128 for the call-site table length that includes the
padding.
llvm-svn: 97078
- Function uses all scratch registers AND
- Function does not use any callee saved registers AND
- Stack size is too big to address with immediate offsets.
In this case a register must be scavenged to calculate the address of a stack
object, and the scavenger needs a spare register or emergency spill slot.
llvm-svn: 97071
greater-than-or-equal SELECT_CCs to NEON vmin/vmax instructions. This is
only allowed when UnsafeFPMath is set or when at least one of the operands
is known to be nonzero.
llvm-svn: 97065
the number of value bits, not the number of bits of allocation for in-memory
storage.
Make getTypeStoreSize and getTypeAllocSize work consistently for arrays and
vectors.
Fix several places in CodeGen which compute offsets into in-memory vectors
to use TargetData information.
This fixes PR1784.
llvm-svn: 97064
Adding the function "lookupGCCName" to the MBlazeIntrinsicInfo
class to support the Clang MicroBlaze target.
Additionally, minor fixes which remove some unused PIC code
(PIC is not supported yet in the MicroBlaze backend) and
removed some unused variables.
llvm-svn: 97054
movechild/record -> recordchild/movechild and
movechild/moveparent -> noop xforms. This slightly shrinks the tables
(x86 to 117454) and enables adding future improvements.
llvm-svn: 97051
the old one around for comparative purposes: have the
ENABLE_NEW_ISEL #define (which is not enabled on mainline) stop
emitting the old isel at all, yay for build time win.
llvm-svn: 97033
the new isel: fold movechild+record+moveparent into a
single recordchild N node. This shrinks the X86 table
from 125443 to 117502 bytes.
llvm-svn: 97031
necessary to swap the operands to handle NaN and negative zero properly.
Also, reintroduce logic for checking for NaN conditions when forming
SSE min and max instructions, fixed to take into consideration NaNs and
negative zeros. This allows forming min and max instructions in more
cases.
llvm-svn: 97025
to adding them in a determinstic order (bottom up from
the root) based on the structure of the graph itself.
This updates tests for some random changes, interesting
bits: CodeGen/Blackfin/promote-logic.ll no longer crashes.
I have no idea why, but that's good right?
CodeGen/X86/2009-07-16-LoadFoldingBug.ll also fails, but
now compiles to have one fewer constant pool entry, making
the expected load that was being folded disappear. Since it
is an unreduced mass of gnast, I just removed it.
This fixes PR6370
llvm-svn: 97023
internal nodes with flag results. Record these with a new
OPC_MarkFlagResults opcode and use this to update the interior
nodes' flag results properly. This fixes CodeGen/X86/i256-add.ll
with the new isel.
llvm-svn: 97021
memory from three or four registers and VST2 (multiple two-element structures)
which stores to memory from two double-spaced registers.
A8.6.391 & A8.6.393
llvm-svn: 97018
argument is non-null, pass it along to PHITranslateSubExpr so that it can
prefer using existing values that dominate the PredBB, instead of just
blindly picking the first equivalent value that it finds on a uselist.
Also when the DominatorTree is specified, have PHITranslateValue filter
out any result that does not dominate the PredBB. This is basically just
refactoring the check that used to be in GetAvailablePHITranslatedSubExpr
and also in GVN.
Despite my initial expectations, this change does not affect the results
of GVN for any testcases that I could find, but it should help compile time.
Before this change, if PHITranslateSubExpr picked a value that does not
dominate, PHITranslateWithInsertion would then insert a new value, which GVN
would later determine to be redundant and would replace. By picking a good
value to begin with, we save GVN the extra work of inserting and then
replacing a new value.
llvm-svn: 97010
Previously, LiveIntervalAnalysis would infer phi joins by looking for multiply
defined registers. That doesn't work if the phi join is implicitly defined in
all but one of the predecessors.
llvm-svn: 96994
With the compiler changed to use EmitCodeAlignment() it does change the
functionality. But X86 assembly code assembled with llvm-mc does not change
its output. For that we will eventually change the assembler frontend to
detect a '.align x, 0x90' when used in a section that 'hasInstructions' and use
EmitCodeAlignment, but will wait until we have better target hooks.
llvm-svn: 96988
three or four registers and VLD2 (multiple two-element structures) which loads
memory into two double-spaced registers.
A8.6.307 & A8.6.310
llvm-svn: 96980
no id's would cause early exit allowing IsLegalToFold to return true
instead of false, producing a cyclic dag.
This was striking the new isel because it isn't using SelectNodeTo yet,
which theoretically is just an optimization.
llvm-svn: 96972
The MicroBlaze is a highly configurable 32-bit soft-microprocessor for
use on Xilinx FPGAs. For more information see:
http://www.xilinx.com/tools/microblaze.htmhttp://en.wikipedia.org/wiki/MicroBlaze
The current LLVM MicroBlaze backend generates assembly which can be
compiled using the an appropriate binutils assembler.
llvm-svn: 96969
to be aligned with optimal nops. This patch does not change any functionality
and when the compiler is changed to use EmitCodeAlignment() it should also not
change the resulting output. Once the compiler change is made and everything
looks good the next patch with the table of optimal X86 nops will be added to
WriteNopData() changing the output. There are many FIXMEs in this patch which
will be removed when we have better target hooks (coming soon I hear).
llvm-svn: 96963
getelementptr. Despite only doing so in the case where x is a known
array object and c can be converted to an index within range, this
could still be invalid if c is actually the address of an object
allocated outside of LLVM. Also, SCEVExpander, the original motivation
for this code, has since been improved to avoid inttoptr+ptroint in
more cases.
llvm-svn: 96950
x86 and x86_64 on UNIX systems. Only OS X 10.6.2 (x86_64) and 32bit CentOS 5.2
with gcc 4.1.2 were tested. ARM UNIX build triggered failure motivating this
modification, as it seems that the ARM ABI does not support _Unwind_GetIP(...),
_Unwind_SetGR(...), and _Unwind_SetIP(...). From doing a quick browse of:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0038a/IHI0038A_ehabi.pdf,
it seems as if all other exception related apis are supported. Looks like
the port can be done to ARM. Thanks to Xerxes Rånby <xerxes@zafena.se> for
pointing out this error.
llvm-svn: 96949
Comes in two parts:
1. Use --with-clang=path/to/clang/compiler to select an installed clang, or
--with-built-clang to have the makefiles use the clang which will be built
as the LLVM capable compiler. If neither is given, --with-built-clang will
be used if the Clang sources are checked out into the standard location
(tools/clang).
2. Use --with-llvmcc={llvm-gcc,clang,none} to specify which LLVM capable
compiler to use. If not given, then llvm-gcc will be used if available,
otherwise Clang.
Makefile support still to come.
Eric, Doug, Chris, seem reasonable?
llvm-svn: 96934