This change adds an new option --arm-enable-ehabi-descriptors that
enables emitting unwinding descriptors. This provides a mode with a
working backtrace() without the (currently broken) exception support.
llvm-svn: 148800
violation -- MC cannot depend on CodeGen.
Specifically, the MCTargetDesc component of each target is actually
a subcomponent of the MC library. As such, it cannot depend on the
target-independent code generator, because MC itself cannot depend on
the target-independent code generator. This change moved a flag from the
ARM MCTargetDesc file ARMMCAsmInfo.cpp to the CodeGen layer in
ARMException.cpp, leaving behind an 'extern' to refer back to it. That
layering order isn't viable givin the constraints outlined above.
Commandline flags are designed to be static specifically to avoid these
types of bugs.
Fixing this is likely going to require some non-trivial refactoring.
llvm-svn: 148759
This change adds an new value to the --arm-enable-ehabi option that
disables emitting unwinding descriptors. This mode gives a working
backtrace() without the (currently broken) exception support.
llvm-svn: 148686
returns false in the event the computation feeding into the pointer is
unreachable, which maybe ought to be true -- but this is at least consistent
with undef->isDereferenceablePointer().) Fixes PR11825!
llvm-svn: 148671
We have patterns for vector sext and zext operations but were missing
anyext. Without those patterns, codegen will fail when the selection DAG
has any_extend nodes.
llvm-svn: 148568
For bit patterns that aren't representable using the 8-bit floating point
representation for vmov.f32, but are representable via vmov.i32, treat
the .f32 syntax as an alias. Most importantly, this covers the case
'vmov.f32 Vd, #0.0'.
rdar://10616677
llvm-svn: 148556
can't handle. Also don't produce non-zero results for things which won't be
transformed by SROA at all just because we saw the loads/stores before we saw
the use of the address.
llvm-svn: 148536
LSR has gradually been improved to more aggressively reuse existing code, particularly existing phi cycles. This exposed problems with the SCEVExpander's sloppy treatment of its insertion point. I applied some rigor to the insertion point problem that will hopefully avoid an endless bug cycle in this area. Changes:
- Always used properlyDominates to check safe code hoisting.
- The insertion point provided to SCEV is now considered a lower bound. This is usually a block terminator or the use itself. Under no cirumstance may SCEVExpander insert below this point.
- LSR is reponsible for finding a "canonical" insertion point across expansion of different expressions.
- Robust logic to determine whether IV increments are in "expanded" form and/or can be safely hoisted above some insertion point.
Fixes PR11783: SCEVExpander assert.
llvm-svn: 148535
When the non-local symbol in the expression is in the same fragment
as the second symbol, the assembler can still evaluate the expression
without needing a relocation.
For example, on ARM:
_foo:
ldr lr, (_foo - 4)
rdar://10348687
llvm-svn: 148341
implementation.
Currently lit still executes ExecutionEngine tests with JIT (not MCJIT) by
default. MCJIT tests can be executed manually by calling llvm-lit with
--param jit_impl=mcjit
llvm-svn: 148299
In CanXFormVExtractWithShuffleIntoLoad we assumed that EXTRACT_VECTOR_ELT can be later handled by the DAGCombiner.
However, in some cases on AVX, the EXTRACT_VECTOR_ELT is legalized to EXTRACT_SUBVECTOR + EXTRACT_VECTOR_ELT, which
currently is not handled by the DAGCombiner. In this patch I added a check that we only extract from the XMM part.
llvm-svn: 148298
It's becoming clear that LoopSimplify needs to unconditionally create loop preheaders. But that is a bigger fix. For now, continuing to hack LSR.
Fixes rdar://10701050 "Cannot split an edge from an IndirectBrInst" assert.
llvm-svn: 148288
We know that the blend instructions only use the MSB, so if the mask is
sign-extended then we can convert it into a SHL instruction. This is a
common pattern because the type-legalizer sign-extends the i1 type which
is used by the LLVM-IR for the condition.
Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL.
llvm-svn: 148225
live across BBs before register allocation. This miscompiled 197.parser
when a cmp + b are optimized to a cbnz instruction even though the CPSR def
is live-in a successor.
cbnz r6, LBB89_12
...
LBB89_12:
ble LBB89_1
The fix consists of two parts. 1) Teach LiveVariables that some unallocatable
registers might be liveouts so don't mark their last use as kill if they are.
2) ARM constantpool island pass shouldn't form cbz / cbnz if the conditional
branch does not kill CPSR.
rdar://10676853
llvm-svn: 148168
overly conservative. It was concerned about cases where it would prohibit
folding simple [r, c] addressing modes. e.g.
ldr r0, [r2]
ldr r1, [r2, #4]
=>
ldr r0, [r2], #4
ldr r1, [r2]
Change the logic to look for such cases which allows it to form indexed memory
ops more aggressively.
rdar://10674430
llvm-svn: 148086
the optimizer doesn't eliminate objc_retainBlock calls which are needed
for their side effect of copying blocks onto the heap.
This implements rdar://10361249.
llvm-svn: 148076
lc: X86ISelLowering.cpp:6480: llvm::SDValue llvm::X86TargetLowering::LowerVECTOR_SHUFFLE(llvm::SDValue, llvm::SelectionDAG&) const: Assertion `V1.getOpcode() != ISD::UNDEF&& "Op 1 of shuffle should not be undef"' failed.
Added a test.
llvm-svn: 148044
When we load the v12i32 type, the GenWidenVectorLoads method generates two loads: v8i32 and v4i32
and attempts to use CONCAT_VECTORS to join them. In this fix I concat undef values to widen
the smaller value. The test "widen_load-2.ll" also exposes this bug on AVX.
llvm-svn: 147964
hoped this would revive one of the llvm-gcc selfhost build bots, but it
didn't so it doesn't appear that my transform is the culprit.
If anyone else is seeing failures, please let me know!
llvm-svn: 147957
directives was in the wrong place and getting triggered incorectly with a
cpp .file directive. This change fixes that and adds a test case.
llvm-svn: 147951
strange build bot failures that look like a miscompile into an infloop.
I'll investigate this tomorrow, but I'd both like to know whether my
patch is the culprit, and get the bots back to green.
llvm-svn: 147945
detect a pattern which can be implemented with a small 'shl' embedded in
the addressing mode scale. This happens in real code as follows:
unsigned x = my_accelerator_table[input >> 11];
Here we have some lookup table that we look into using the high bits of
'input'. Each entity in the table is 4-bytes, which means this
implicitly gets turned into (once lowered out of a GEP):
*(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));
The shift right followed by a shift left is canonicalized to a smaller
shift right and masking off the low bits. That hides the shift right
which x86 has an addressing mode designed to support. We now detect
masks of this form, and produce the longer shift right followed by the
proper addressing mode. In addition to saving a (rather large)
instruction, this also reduces stalls in Intel chips on benchmarks I've
measured.
In order for all of this to work, one part of the DAG needs to be
canonicalized *still further* than it currently is. This involves
removing pointless 'trunc' nodes between a zextload and a zext. Without
that, we end up generating spurious masks and hiding the pattern.
llvm-svn: 147936
1. Size heuristics changed. Now we calculate number of unswitching
branches only once per loop.
2. Some checks was moved from UnswitchIfProfitable to
processCurrentLoop, since it is not changed during processCurrentLoop
iteration. It allows decide to skip some loops at an early stage.
Extended statistics:
- Added total number of instructions analyzed.
llvm-svn: 147935
Allow LDRD to be formed from pairs with different LDR encodings. This was the original intention of the pass. Somewhere along the way, the LDR opcodes were refined which broke the optimization. We really don't care what the original opcodes are as long as they both map to the same LDRD and the immediate still fits.
Fixes rdar://10435045 ARMLoadStoreOptimization cannot handle mixed LDRi8/LDRi12
llvm-svn: 147922
Consider this code:
int h() {
int x;
try {
x = f();
g();
} catch (...) {
return x+1;
}
return x;
}
The variable x is undefined on the first edge to the landing pad, but it
has the f() return value on the second edge to the landing pad.
SplitAnalysis::getLastSplitPoint() would assume that the return value
from f() was live into the landing pad when f() throws, which is of
course impossible.
Detect these cases, and treat them as if the landing pad wasn't there.
This allows spill code to be inserted after the function call to f().
<rdar://problem/10664933>
llvm-svn: 147912
with other symbols.
An object in the __cfstring section is suppoed to be filled with CFString
objects, which have a pointer to ___CFConstantStringClassReference followed by a
pointer to a __cstring. If we allow the object in the __cstring section to be
merged with another global, then it could end up in any section. Because the
linker is going to remove these symbols in the final executable, we shouldn't
bother to merge them.
<rdar://problem/10564621>
llvm-svn: 147899
This function runs after all constant islands have been placed, and may
shrink some instructions to their 2-byte forms. This can actually cause
some constant pool entries to move out of range because of growing
alignment padding.
Treat instructions that may be shrunk the same as inline asm - they
erode the known alignment bits.
Also reinstate an old assertion in verify(). It is correct now that
basic block offsets include alignments.
Add a single large test case that will hopefully exercise many parts of
the constant island pass.
<rdar://problem/10670199>
llvm-svn: 147885
assembly source when it generates the TAG_subprogram dwarf debug info for
the labels that have nothing between them as in this bit of assembly source:
% cat ZeroLength.s
_func1:
_func2:
nop
One solution would be to not emit the subsequent labels with the same address
and use the next label with a different address or the end of the section for
the AT_high_pc value of the TAG_subprogram.
Turns out in llvm-mc it is not possible in all cases to determine of two
symbols have the same value at the point we put out the TAG_subprogram dwarf
debug info.
So we will have llvm-mc instead of putting out TAG_subprogram's put out
DW_TAG_label's. And the DW_TAG_label does not have a AT_high_pc value which
avoids the problem.
This commit is only the functional change to make the diffs clear as to what is
really being changed. The next commit will be to clean up the names of such
things like MCGenDwarfSubprogramEntry to something like MCGenDwarfLabelEntry.
rdar://10666925
llvm-svn: 147860
define physical registers. It's currently very restrictive, only catching
cases where the CE is in an immediate (and only) predecessor. But it catches
a surprising large number of cases.
rdar://10660865
llvm-svn: 147827
These heuristics are sufficient for enabling IV chains by
default. Performance analysis has been done for i386, x86_64, and
thumbv7. The optimization is rarely important, but can significantly
speed up certain cases by eliminating spill code within the
loop. Unrolled loops are prime candidates for IV chains. In many
cases, the final code could still be improved with more target
specific optimization following LSR. The goal of this feature is for
LSR to make the best choice of induction variables.
Instruction selection may not completely take advantage of this
feature yet. As a result, there could be cases of slight code size
increase.
Code size can be worse on x86 because it doesn't support postincrement
addressing. In fact, when chains are formed, you may see redundant
address plus stride addition in the addressing mode. GenerateIVChains
tries to compensate for the common cases.
On ARM, code size increase can be mitigated by using postincrement
addressing, but downstream codegen currently misses some opportunities.
llvm-svn: 147826
After collecting chains, check if any should be materialized. If so,
hide the chained IV users from the LSR solver. LSR will only solve for
the head of the chain. GenerateIVChains will then materialize the
chained IV users by computing the IV relative to its previous value in
the chain.
In theory, chained IV users could be exposed to LSR's solver. This
would be considerably complicated to implement and I'm not aware of a
case where we need it. In practice it's more important to
intelligently prune the search space of nontrivial loops before
running the solver, otherwise the solver is often forced to prune the
most optimal solutions. Hiding the chained users does this well, so
that LSR is more likely to find the best IV for the chain as a whole.
llvm-svn: 147801
LoopSimplify may not run on some outer loops, e.g. because of indirect
branches. SCEVExpander simply cannot handle outer loops with no preheaders.
Fixes rdar://10655343 SCEVExpander segfault.
llvm-svn: 147718
file error checking. Use that to error on an unfinished cfi_startproc.
The error is not nice, but is already better than a segmentation fault.
llvm-svn: 147717
opportunities that only present themselves after late optimizations
such as tail duplication .e.g.
## BB#1:
movl %eax, %ecx
movl %ecx, %eax
ret
The register allocator also leaves some of them around (due to false
dep between copies from phi-elimination, etc.)
This required some changes in codegen passes. Post-ra scheduler and the
pseudo-instruction expansion passes have been moved after branch folding
and tail merging. They were before branch folding before because it did
not always update block livein's. That's fixed now. The pass change makes
independently since we want to properly schedule instructions after
branch folding / tail duplication.
rdar://10428165
rdar://10640363
llvm-svn: 147716
This eliminates a lot of constant pool entries for -O0 builds of code
with many global variable accesses.
This speeds up -O0 codegen of consumer-typeset by 2x because the
constant island pass no longer has to look at thousands of constant pool
entries.
<rdar://problem/10629774>
llvm-svn: 147712
present in the bottom of the CFG triangle, as the transformation isn't
ever valuable if the branch can't be eliminated.
Also, unify some heuristics between SimplifyCFG's multiple
if-converters, for consistency.
This fixes rdar://10627242.
llvm-svn: 147630
code can incorrectly move the load across a store. This never
happens in practice today, but only because the current
heuristics accidentally preclude it.
llvm-svn: 147623
a combined-away node and the result of the combine isn't substantially
smaller than the input, it's just canonicalized. This is the first part
of a significant (7%) performance gain for Snappy's hot decompression
loop.
llvm-svn: 147604
Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX)
Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov
llvm-svn: 147601
Now that canRealignStack() understands frozen reserved registers, it is
safe to use it for aligned spill instructions.
It will only return true if the registers reserved at the beginning of
register allocation allow for dynamic stack realignment.
<rdar://problem/10625436>
llvm-svn: 147579
(x > y) ? x : y
=>
(x >= y) ? x : y
So for something like
(x - y) > 0 : (x - y) ? 0
It will be
(x - y) >= 0 : (x - y) ? 0
This makes is possible to test sign-bit and eliminate a comparison against
zero. e.g.
subl %esi, %edi
testl %edi, %edi
movl $0, %eax
cmovgl %edi, %eax
=>
xorl %eax, %eax
subl %esi, $edi
cmovsl %eax, %edi
rdar://10633221
llvm-svn: 147512
This patch caused a miscompilation of oggenc because a frame pointer was
suddenly needed halfway through register allocation.
<rdar://problem/10625436>
llvm-svn: 147487
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.
The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.
I added a special test that checks vector shuffle on win32.
llvm-svn: 147445
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.
The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.
I added a special test that checks vector shuffle on win32.
llvm-svn: 147399
1. The ST*UX instructions that store and update the stack pointer did not set define/kill on R1. This became a problem when I activated post-RA scheduling (and had incorrectly adjusted the Frames-large test).
2. eliminateFrameIndex did not kill its scavenged temporary register, and this could cause the scavenger to exhaust all available registers (and its emergency spill slot) when there were a lot of CR values to spill. The 2010-02-12-saveCR test has been adjusted to check for this.
llvm-svn: 147359
captured. This allows the tracker to look at the specific use, which may be
especially interesting for function calls.
Use this to fix 'nocapture' deduction in FunctionAttrs. The existing one does
not iterate until a fixpoint and does not guarantee that it produces the same
result regardless of iteration order. The new implementation builds up a graph
of how arguments are passed from function to function, and uses a bottom-up walk
on the argument-SCCs to assign nocapture. This gets us nocapture more often, and
does so rather efficiently and independent of iteration order.
llvm-svn: 147327
This has the obvious advantage of being commutable and is always a win on x86 because
const - x wastes a register there. On less weird architectures this may lead to
a regression because other arithmetic doesn't fuse with it anymore. I'll address that
problem in a followup.
llvm-svn: 147254
LZCNT instructions are available. Force promotion to i32 to get
a smaller encoding since the fix-ups necessary are just as complex for
either promoted type
We can't do standard promotion for CTLZ when lowering through BSR
because it results in poor code surrounding the 'xor' at the end of this
instruction. Essentially, if we promote the entire CTLZ node to i32, we
end up doing the xor on a 32-bit CTLZ implementation, and then
subtracting appropriately to get back to an i8 value. Instead, our
custom logic just uses the knowledge of the incoming size to compute
a perfect xor. I'd love to know of a way to fix this, but so far I'm
drawing a blank. I suspect the legalizer could be more clever and/or it
could collude with the DAG combiner, but how... ;]
llvm-svn: 147251
my C-brain happy. Remove the unnecessary bits of pedantic IR fluff like
nounwind. Remove stray uses comments. Name things semantically rather
than tN so that adding a new test in the middle doesn't cause pain, and
so that new tests can be grouped semantically.
This exposes how little systematic testing is going on here. I noticed
this by finding several bugs via inspection and wondering why this test
wasn't catching any of them. =[
llvm-svn: 147248
'bsf' instructions here.
This one is actually debatable to my eyes. It's not clear that any chip
implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless
EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding.
Still, this restores the old behavior with 'tzcnt' enabled for now.
llvm-svn: 147246
X86ISelLowering C++ code. Because this is lowered via an xor wrapped
around a bsr, we want the dagcombine which runs after isel lowering to
have a chance to clean things up. In particular, it is very common to
see code which looks like:
(sizeof(x)*8 - 1) ^ __builtin_clz(x)
Which is trying to compute the most significant bit of 'x'. That's
actually the value computed directly by the 'bsr' instruction, but if we
match it too late, we'll get completely redundant xor instructions.
The more naive code for the above (subtracting rather than using an xor)
still isn't handled correctly due to the dagcombine getting confused.
Also, while here fix an issue spotted by inspection: we should have been
expanding the zero-undef variants to the normal variants when there is
an 'lzcnt' instruction. Do so, and test for this. We don't want to
generate unnecessary 'bsr' instructions.
These two changes fix some regressions in encoding and decoding
benchmarks. However, there is still a *lot* to be improve on in this
type of code.
llvm-svn: 147244
ARM targets with NEON units have access to aligned vector loads and
stores that are potentially faster than unaligned operations.
Add support for spilling the callee-saved NEON registers to an aligned
stack area using 16-byte aligned NEON loads and store.
This feature is off by default, controlled by an -align-neon-spills
command line option.
llvm-svn: 147211
time regressions. In general, it is beneficial to compile-time.
Original commit message:
Fix for bug #11429: Wrong behaviour for switches. Small improvement for code
size heuristics.
llvm-svn: 147175
probability wouldn't be considered "hot" in some weird loop structures
or other compounding probability patterns. This makes it much harder to
confuse, but isn't really a principled fix. I'd actually like it if we
could model a zero probability, as it would make this much easier to
reason about. Suggestions for how to do this better are welcome.
llvm-svn: 147142
performance regressions (both execution-time and compile-time) on our
nightly testers.
Original commit message:
Fix for bug #11429: Wrong behaviour for switches. Small improvement for code
size heuristics.
llvm-svn: 147131
Use the spill slot alignment as well as the local variable alignment to
determine when the stack needs to be realigned. This works now that the
ARM target can always realign the stack by using a base pointer.
Still respect the ARMBaseRegisterInfo::canRealignStack() function
vetoing a realigned stack. Don't use aligned spill code in that case.
llvm-svn: 146997
use the zero-undefined variants of CTTZ and CTLZ. These are just simple
patterns for now, there is more to be done to make real world code using
these constructs be optimized and codegen'ed properly on X86.
The existing tests are spiffed up to check that we no longer generate
unnecessary cmov instructions, and that we generate the very important
'xor' to transform bsr which counts the index of the most significant
one bit to the number of leading (most significant) zero bits. Also they
now check that when the variant with defined zero result is used, the
cmov is still produced.
llvm-svn: 146974
We used to rely on the *eh_sjlj_setjmp instructions to mark that a function
with setjmp/longjmp exception handling clobbers all the registers. But with
the recent reorganization of ARM EH, those eh_sjlj_setjmp instructions are
expanded away earlier, before PEI can see them to determine what registers to
save and restore. Mark the dispatchsetup instruction in the same way, since
that instruction cannot be expanded early. This also more accurately reflects
when the registers are clobbered.
llvm-svn: 146949
"mov r1, r2, lsl #0" should assemble as "mov r1, r2" even though it's
not strictly legal UAL syntax. It's a common extension and the friendly
thing to do.
rdar://10604663
llvm-svn: 146937
merging types by name when we can. We still don't guarantee type name linkage
but we do it when obviously the right thing to do. This makes LTO type names
easier to read, for example.
llvm-svn: 146932
This change reduces the number of instructions generated.
For example,
(load (add (sub $n0, $n1), (MipsLo got(s))))
results in the following sequence of instructions:
1. sub $n2, $n0, $n1
2. lw got(s)($n2)
Previously, three instructions were needed.
1. sub $n2, $n0, $n1
2. addiu $n3, $n2, got(s)
3. lw 0($n3)
llvm-svn: 146888
internal nightly testers. Original commit message:
By popular demand, link up types by name if they are isomorphic and one is an
autorenamed version of the other. This makes the IR easier to read, because
we don't end up with random renamed versions of the types after LTO'ing a large
app.
llvm-svn: 146838
autorenamed version of the other. This makes the IR easier to read, because
we don't end up with random renamed versions of the types after LTO'ing a large app.
llvm-svn: 146728
into Analysis as a standalone function, since there's no need for
it to be in VMCore. Also, update it to use isKnownNonZero and
other goodies available in Analysis, making it more precise,
enabling more aggressive optimization.
llvm-svn: 146610
On ARM, peephole optimization for ABS creates a trivial cfg triangle which tempts machine sink to sink instructions in code which is really straight line code. Sometimes this sinking may alter register allocator input such that use and def of a reg is divided by a branch in between, which may result in extra spills. Now mahine sink avoids sinking if final sink destination is post dominator.
Radar 10266272.
llvm-svn: 146604
Work in progress. Parsing for non-writeback, single spaced register lists
works now. The rest have the representations better factored, but still
need more to be able to parse properly.
llvm-svn: 146579
When 'cmp rn #imm' doesn't match due to the immediate not being representable,
but 'cmn rn, #-imm' does match, use the latter in place of the former, as
it's equivalent.
rdar://10552389
llvm-svn: 146567
to finalize MI bundles (i.e. add BUNDLE instruction and computing register def
and use lists of the BUNDLE instruction) and a pass to unpack bundles.
- Teach more of MachineBasic and MachineInstr methods to be bundle aware.
- Switch Thumb2 IT block to MI bundles and delete the hazard recognizer hack to
prevent IT blocks from being broken apart.
llvm-svn: 146542
test cases where there were a lot of relocations applied relative to a large
rodata section. Gas would create a symbol for each of these whereas we would
be relative to the beginning of the rodata section. This change mimics what
gas does.
Patch by Jack Carter.
llvm-svn: 146468
These modifiers simply select either the low or high D subregister of a Neon
Q register. I've also removed the unimplemented 'p' modifier, which turns out
to be a bit different than the comment here suggests and as far as I can tell
was only intended for internal use in Apple's version of gcc.
llvm-svn: 146417
I followed three heuristics for deciding whether to set 'true' or
'false':
- Everything target independent got 'true' as that is the expected
common output of the GCC builtins.
- If the target arch only has one way of implementing this operation,
set the flag in the way that exercises the most of codegen. For most
architectures this is also the likely path from a GCC builtin, with
'true' being set. It will (eventually) require lowering away that
difference, and then lowering to the architecture's operation.
- Otherwise, set the flag differently dependending on which target
operation should be tested.
Let me know if anyone has any issue with this pattern or would like
specific tests of another form. This should allow the x86 codegen to
just iteratively improve as I teach the backend how to differentiate
between the two forms, and everything else should remain exactly the
same.
llvm-svn: 146370
intrinsic syntax.
Now that this is explicitly covered, I plan to upgrade the existing test
suite to use an explicit immediate. Note that I plan to specify 'true'
in most places rather than the auto-upgraded value as that is the far
more common value to end up here as that is the value coming from GCC's
builtins. The only place I'm likely to put a 'false' in is when testing
x86 which actually has different instructions for the two variants.
llvm-svn: 146369
does. The _GLOBAL_OFFSET_TABLE_ is still magical in that we get a R_386_GOTPC,
but it doesn't change the immediate in the same way as when the expression
has no right hand side symbol.
llvm-svn: 146311
When the immediate operand of an AND or BIC instruction isn't representable
in the immediate field of the instruction, but the bitwise negation of the
immediate is, assemble the instruction as the inverse operation instead
with the inverted immediate as the operand.
rdar://10550057
llvm-svn: 146283
Refactor the instructions into fixed writeback and register-stride
writeback variants to simplify the offset operand (no more optional
register operand using reg0). This is a simpler representation and allows
the assembly parser to more easily handle these instructions.
Add tests for the instruction variants now supported.
llvm-svn: 146278
generates the dwarf Compile Unit DIE and a dwarf subprogram DIE for each
non-temporary label.
The next part will be to get the clang driver to enable this when assembling
a .s file. rdar://9275556
llvm-svn: 146262
Patch by Brendon Cahoon!
This extends the existing LoopUnroll and LoopUnrollPass. Brendon
measured no regressions in the llvm test suite with -unroll-runtime
enabled. This implementation works by using the existing loop
unrolling code to unroll the loop by a power-of-two (default 8). It
generates an if-then-else sequence of code prior to the loop to
execute the extra iterations before entering the unrolled loop.
llvm-svn: 146245
We must not issue a bitcast operation for integer-promotion of vector types, because the
location of the values in the vector may be different.
llvm-svn: 146150