vector shuffle mask. Forced the mask to be built using i32. Note: this will
be irrelevant once vector_shuffle no longer takes a build vector for the
shuffle mask.
llvm-svn: 67076
- Fix fabs, fneg for f32 and f64.
- Use BuildVectorSDNode.isConstantSplat, now that the functionality exists
- Continue to improve i64 constant lowering. Lower certain special constants
to the constant pool when they correspond to SPU's shufb instruction's
special mask values. This avoids the overhead of performing a shuffle on a
zero-filled vector just to get the special constant when the memory load
suffices.
llvm-svn: 67067
a single character requires only one branch to follow slow path.
- Never use a buffer when writing on an unbuffered stream.
- Move default buffer size to header.
llvm-svn: 67066
write as arguments.
- Add raw_ostream::GetNumBytesInBuffer.
- Privatize buffer pointers.
- Get rid of slow and unnecessary code for writing out large strings.
llvm-svn: 67060
- Flush a known non-empty buffers; enforces the interface to
flush_impl and kills off HandleFlush (which I saw no reason to be
an inline method, Chris?).
- Clarify invariant that flush_impl is only called with OutBufCur >
OutBufStart.
- This also cleary collects all places where we have to deal with the
buffer possibly not existing.
- A few more comments and fixing the unbuffered behavior remain in
this commit sequence.
llvm-svn: 67057
U test/CodeGen/X86/2009-03-13-PHIElimBug.ll
D test/CodeGen/X86/2009-03-16-PHIElimInLPad.ll
U lib/CodeGen/PHIElimination.cpp
r67049 was causing this failure:
Running /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/CodeGen/X86/dg.exp ...
FAIL: /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/CodeGen/X86/2009-03-13-PHIElimBug.ll for PR3784
Failed with exit(1) at line 1
while running: llvm-as < /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/CodeGen/X86/2009-03-13-PHIElimBug.ll | llc -march=x86 | /usr/bin/grep -A 2 {call f} | /usr/bin/grep movl
child process exited abnormally
llvm-svn: 67051
how invokes are set up. The fix could be disturbed by
register copies coming after the EH_LABEL, and also didn't
behave quite right when it was the invoke result that
was used in a phi node. Also (see new testcase) fix
another phi elimination bug while there: register copies
in the landing pad need to come after the EH_LABEL, because
that's where execution branches to when unwinding. If they
come before the EH_LABEL then they will never be executed...
Also tweak the original testcase so it doesn't use a no-longer
existing counter.
The accumulated phi elimination changes fix two of seven Ada
testsuite failures that turned up after landing pad critical
edge splitting was turned off. So there's probably more to come.
llvm-svn: 67049
operand is a signed 32-bit immediate. Unlike with the 8-bit
signed immediate case, it isn't actually smaller to fold a
32-bit signed immediate instead of a load. In fact, it's
larger in the case of 32-bit unsigned immediates, because
they can be materialized with movl instead of movq.
llvm-svn: 67001
if FPConstant is legal because if the FPConstant doesn't need to be stored
in a constant pool, the transformation is unlikely to be profitable.
llvm-svn: 66994
ptrtoint and inttoptr in X86FastISel. These casts aren't always
handled in the generic FastISel code because X86 sometimes needs
custom code to do truncation and zero-extension.
llvm-svn: 66988
component's warnings to process for '-gen-clang-diags-defs'.
Also, when the component is specified, generate a '#if' prologue at the top of
the generated .def file (to match the current files).
llvm-svn: 66975
large for the testsuite) took over six minutes to compile on my Mac.
The patched LLVM-GCC compiles that testcase in three seconds (GCC
takes less than one second). This hash function is more complex
(about 35 instructions on x86) than what Chris wanted, but I expect it
will be well-behaved with arbitrary inputs.
Thank you to everyone who responded to my previous request for advice.
llvm-svn: 66962
- Required some extra makefile tweaks to introduce a new flag var
which only goes to compile/link tools but not the relink step,
otherwise we get a copy of libgcov in the relinked .o files.
- No configure magic for this.
llvm-svn: 66945
by inserting explicit zero extensions where necessary. Included
is a testcase where SelectionDAG produces a virtual register
holding an i1 value which FastISel previously mistakenly assumed
to be zero-extended.
llvm-svn: 66941
changes.
For InvokeInst now all arguments begin at op_begin().
The Callee, Cont and Fail are now faster to get by
access relative to op_end().
This patch introduces some temporary uglyness in CallSite.
Next I'll bring CallInst up to a similar scheme and then
the uglyness will magically vanish.
This patch also exposes all the reliance of the libraries
on InvokeInst's operand ordering. I am thinking of taking
care of that too.
llvm-svn: 66920
errors when thrown. This gets us nice errors like this from tblgen:
CMOVL32rr: (set GR32:i32:$dst, (X86cmov GR32:$src1, GR32:$src2))
/Users/sabre/llvm/Debug/bin/tblgen: error:
Included from X86.td:116:
Parsing X86InstrInfo.td:922: In CMOVL32rr: X86cmov node requires exactly 4 operands!
def CMOVL32rr : I<0x4C, MRMSrcReg, // if <s, GR32 = GR32
^
instead of just:
CMOVL32rr: (set GR32:i32:$dst, (X86cmov GR32:$src1, GR32:$src2))
/Users/sabre/llvm/Debug/bin/tblgen: In CMOVL32rr: X86cmov node requires exactly 4 operands!
This is all I plan to do with this, but it should be easy enough to improve if anyone
cares (e.g. keeping more loc info in "dag" expr records in tblgen.
llvm-svn: 66898
1. ConstantPoolSDNode alignment field is log2 value of the alignment requirement. This is not consistent with other SDNode variants.
2. MachineConstantPool alignment field is also a log2 value.
3. However, some places are creating ConstantPoolSDNode with alignment value rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values.
4. Constant pool entry offsets are computed when they are created. However, asm printer group them by sections. That means the offsets are no longer valid. However, asm printer uses them to determine size of padding between entries.
5. Asm printer uses expensive data structure multimap to track constant pool entries by sections.
6. Asm printer iterate over SmallPtrSet when it's emitting constant pool entries. This is non-deterministic.
Solutions:
1. ConstantPoolSDNode alignment field is changed to keep non-log2 value.
2. MachineConstantPool alignment field is also changed to keep non-log2 value.
3. Functions that create ConstantPool nodes are passing in non-log2 alignments.
4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in asm printer and JIT.
5. Asm printer uses cheaper data structure to group constant pool entries.
6. Asm printer compute entry offsets after grouping is done.
7. Change JIT code to compute entry offsets on the fly.
llvm-svn: 66875
for i32/i64 expressions (we could also do i16 on cpus where
i16 lea is fast, but I didn't add this). On the example, we now
generate:
_test:
movl 4(%esp), %eax
cmpl $42, (%eax)
setl %al
movzbl %al, %eax
leal 4(%eax,%eax,8), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
cmpl $41, (%eax)
movl $4, %ecx
movl $13, %eax
cmovg %ecx, %eax
ret
llvm-svn: 66869
operands can't both be fully folded at the same time. For example,
in the included testcase, a global variable is being added with
an add of two values. The global variable wants RIP-relative
addressing, so it can't share the address with another base
register, but it's still possible to fold the initial add.
llvm-svn: 66865
right; did the wrong thing when there are exactly 11
non-debug instructions, followed by debug info.
Remove a FIXME since it's apparently been fixed along the way.
llvm-svn: 66840
in the Ada testcase. Reverting this only covers up
the real problem, which is a nasty conceptual difficulty
in the phi elimination pass: when eliminating phi nodes
in landing pads, the register copies need to come before
the invoke, not at the end of the basic block which is
too late... See PR3784.
llvm-svn: 66826
refers to the "prefix" directory, i.e., one level above "bin". LLVMGCCPATH
is used as the directory containing the llvm-gcc executable, so add a "/bin"
suffix to get from LLVMGCCDIR to LLVMGCCPATH.
llvm-svn: 66823
access each with a fixed negative index from op_end().
This has two important implications:
- getUser() will work faster, because there are less iterations
for the waymarking algorithm to perform. This is important
when running various analyses that want to determine callers
of basic blocks.
- getSuccessor() now runs faster, because the indirection via OperandList
is not necessary: Uses corresponding to the successors are at fixed
offset to "this".
The price we pay is the slightly more complicated logic in the operator
User::delete, as it has to pick up the information whether it has to free
the memory of an original unconditional BranchInst or a BranchInst that
was originally conditional, but has been shortened to unconditional.
I was not able to come up with a nicer solution to this problem. (And
rest assured, I tried *a lot*).
Similar reorderings will follow for InvokeInst and CallInst. After that
some optimizations to pred_iterator and CallSite will fall out naturally.
llvm-svn: 66815
related transformations out of target-specific dag combine into the
ARM backend. These were added by Evan in r37685 with no testcases
and only seems to help ARM (e.g. test/CodeGen/ARM/select_xform.ll).
Add some simple X86-specific (for now) DAG combines that turn things
like cond ? 8 : 0 -> (zext(cond) << 3). This happens frequently
with the recently added cp constant select optimization, but is a
very general xform. For example, we now compile the second example
in const-select.ll to:
_test:
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
seta %al
movzbl %al, %eax
movl 4(%esp), %ecx
movsbl (%ecx,%eax,4), %eax
ret
instead of:
_test:
movl 4(%esp), %eax
leal 4(%eax), %ecx
movsd LCPI2_0, %xmm0
ucomisd 8(%esp), %xmm0
cmovbe %eax, %ecx
movsbl (%ecx), %eax
ret
This passes multisource and dejagnu.
llvm-svn: 66779
from a switch table. Multiple table entries that
branch to the same place were being sorted by the
pointer value of the ConstantInt*; changed to sort
by the actual value of the ConstantInt.
llvm-svn: 66749
allocations. Apparently the assumption is there is an
instruction (terminator?) following the allocation so I
am allowing the same assumption.
llvm-svn: 66716
alignment of the generated constant pool entry to the
desired alignment of a type. If we don't do this, we end up
trying to do movsd from 4-byte alignment memory. This fixes
450.soplex and 456.hmmer.
llvm-svn: 66641
1. Use the same value# to represent unknown values being merged into sub-registers.
2. When coalescer commute an instruction and the destination is a physical register, update its sub-registers by merging in the extended ranges.
llvm-svn: 66610
the untimed version of getOrCreateSourceID. getOrCreateSourceID calls
GetOrCreateSourceID, of course.
- Move some methods into the "private" section. Constify at least one method.
- General clean-ups.
llvm-svn: 66582
scheduled in multiple regions, liveness data used by the
anti-dependence breaker is carried from one region to the next, however
the information reflects the state of the instructions before scheduling.
After scheduling, there may be new live range overlaps. Handle this by
pessimizing the liveness data carried between regions to the point where
it will be conservatively correct now matter how the earlier region is
scheduled. This fixes a miscompilation in 176.gcc with the post-RA
scheduler enabled.
llvm-svn: 66558
to obtain debug info about them.
Introduce helpers to access debug info for global variables. Also introduce a
helper that works for both local and global variables.
llvm-svn: 66541
allocating memory in the JIT. This is insanely inefficient, but
hey, most people implement their own memory managers anyway.
Patch by Eric Yew!
llvm-svn: 66472
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmgcc42.roots/llvmgcc42~obj/src/gcc/libgcc2.c: In function '__muldi3':
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmgcc42.roots/llvmgcc42~obj/src/gcc/libgcc2.c:567: internal compiler error: Bus error
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmgcc42.roots/llvmgcc42~obj/src/gcc/libgcc2.c: In function '__lshrdi3':
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmgcc42.roots/llvmgcc42~obj/src/gcc/libgcc2.c:421: internal compiler error: Bus error
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
make[5]: *** [libgcc/./_lshrdi3.o] Error 1
make[5]: *** Waiting for unfinished jobs....
make[5]: *** [libgcc/./_muldi3.o] Error 1
make[5]: *** [libgcc/./_negdi2.o] Error 1
--- Reverse-merging (from foreign repository) r66415 into '.':
U include/llvm/BasicBlock.h
U include/llvm/ADT/ilist_node.h
U include/llvm/CodeGen/SelectionDAG.h
U include/llvm/CodeGen/MachineFunction.h
U include/llvm/CodeGen/MachineBasicBlock.h
U include/llvm/Function.h
llvm-svn: 66426
from 66280. I was unable to verify this with gcc-3.4.6, but with gcc-3.3 it
avoids the "base class with only non-default constructor in class without
a constructor" warning. Apparently that warning was promoted to an error
in gcc-3.4.
llvm-svn: 66424
whether a global is dead or not. This should fix PR3749 - linker adds
spurious use to appending globals. I can't reasonably add a testcase
for this, because the bc writer/reader strip dead constant users.
llvm-svn: 66404
validate an invariant so that the asmparser rejects a bad construct
instead of the verifier. Before:
llvm-as: assembly parsed, but does not verify as correct!
Invalid struct return type!
i64 (%struct.Type*, %struct.Type*)* @foo
after:
llvm-as: t.ll:5:8: functions with 'sret' argument must return void
define i64 @foo(%struct.Type* noalias nocapture sret %agg.result, %struct.Type* nocapture byval %t) nounwind {
^
Second, check that void is only used where allowed (in function return types) not in
arbitrary places, fixing PR3747 - Crash in llvm-as with void field in struct. We
now reject that example with:
$ llvm-as t.ll
llvm-as: t.ll:1:12: struct element can not have void type
%x = type {void}
^
llvm-svn: 66394
by checking that the top-level type of a gep is sized. This
causes us to reject the example with:
llvm-as: t2.ll:2:16: invalid getelementptr indices
getelementptr i32()* null, i32 1
^
llvm-svn: 66393
and extern_weak_odr. These are the same as the non-odr versions,
except that they indicate that the global will only be overridden
by an *equivalent* global. In C, a function with weak linkage can
be overridden by a function which behaves completely differently.
This means that IP passes have to skip weak functions, since any
deductions made from the function definition might be wrong, since
the definition could be replaced by something completely different
at link time. This is not allowed in C++, thanks to the ODR
(One-Definition-Rule): if a function is replaced by another at
link-time, then the new function must be the same as the original
function. If a language knows that a function or other global can
only be overridden by an equivalent global, it can give it the
weak_odr linkage type, and the optimizers will understand that it
is alright to make deductions based on the function body. The
code generators on the other hand map weak and weak_odr linkage
to the same thing.
llvm-svn: 66339
signal handlers to prevent reentrance on unrelated things (a sigabort
where the handle bus errors) also, clear the signal mask so that the
signal doesn't infinitely reissue. This fixes rdar://6654827 -
Crash causes clang to loop
llvm-svn: 66330
1. When the JIT is asked to remove a function, updating it's
mapping to 0, we invalidate any function stubs used only
by that function. Now, also invalidate the JIT's mapping
from the GV the stub pointed to, to the address of the GV.
2. When dlsym stubs for cross-process JIT are enabled, do not
abort just because a named function cannot be found in the
JIT's process.
3. Fix various assumptions about when it is ok to use the lazy
resolver when non-lazy JITing is enabled.
llvm-svn: 66324
the same say the "test" instruction does in overflow cases,
so eliminating the test is only safe when those bits aren't
needed, as is the case for COND_E and COND_NE, or if it
can be proven that no overflow will occur. For now, just
restrict the optimization to COND_E and COND_NE and don't
do any overflow analysis.
llvm-svn: 66318