simplifying them and exposing more information to tblgen. It would be nice
if other target authors adopted this as well, particularly arm since it has fastisel.
llvm-svn: 129676
kind of predicate: one that is specific to imm nodes. The predicate function
specified here just checks an int64_t directly instead of messing around with
SDNode's. The virtue of this is that it means that fastisel and other things
can reason about these predicates.
llvm-svn: 129675
structure and fix some fixmes. We now have a TreePredicateFn class
that handles all of the decoding of these things. This is an internal
cleanup that has no impact on the code generated by tblgen.
llvm-svn: 129670
2. implement rdar://9289501 - fast isel should fold trivial multiplies to shifts
3. teach tblgen to handle shift immediates that are different sizes than the
shifted operands, eliminating some code from the X86 fast isel backend.
4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function
instead of FastEmit_ri to simplify code.
llvm-svn: 129666
when we have a global variable base an an index. Instead, just give up on
folding the global variable.
Before we'd geenrate:
_test: ## @test
## BB#0:
movq _rtx_length@GOTPCREL(%rip), %rax
leaq (%rax), %rax
addq %rdi, %rax
movzbl (%rax), %eax
ret
now we generate:
_test: ## @test
## BB#0:
movq _rtx_length@GOTPCREL(%rip), %rax
movzbl (%rax,%rdi), %eax
ret
The difference is even more significant when there is a scale
involved.
This fixes rdar://9289558 - total fail with addr mode formation at -O0/x86-64
llvm-svn: 129664
less trivial things) into a dummy lea. Before we generated:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
leaq (%rax), %rax
ret
now we produce:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
ret
This is part of rdar://9289558
llvm-svn: 129662
The basic issue here is that bottom-up isel is matching the branch
and compare, and was failing to fold the load into the branch/compare
combo. Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:
cmpb $0, 52(%rax)
je LBB4_2
instead of:
movb 52(%rax), %cl
cmpb $0, %cl
je LBB4_2
This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and
generating less code.
This was one of the biggest classes of missing load folding. Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.
llvm-svn: 129656
Break the arc-profile code out to a function like the notes emission code is,
and reorder the functions in the file.
The only functionality change is that we no longer modify the Module when the
Module has no debug info to use.
llvm-svn: 129631
The transferValues() function can now handle both singly and multiply defined
values, as long as the resulting live range is known. Only rematerialized values
have their live range recomputed by extendRange().
The updateSSA() function can now insert PHI values in bulk across multiple
values in multiple target registers in one pass. The list of blocks received
from transferValues() is in layout order which seems to work well for the
iterative algorithm. Blocks from extendRange() are still in reverse BFS order,
but this function is used so rarely now that it doesn't matter.
llvm-svn: 129580
Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.
llvm-svn: 129571
This is done by pushing physical register definitions close to their
use, which happens to handle flag definitions if they're not glued to
the branch. This seems to be generally a good thing though, so I
didn't need to add a target hook yet.
The primary motivation is to generate code closer to what people
expect and rule out missed opportunity from enabling macro-op
fusion. As a side benefit, we get several 2-5% gains on x86
benchmarks. There is one regression:
SingleSource/Benchmarks/Shootout/lists slows down be -10%. But this is
an independent scheduler bug that will be tracked separately.
See rdar://problem/9283108.
Incidentally, pre-RA scheduling is only half the solution. Fixing the
later passes is tracked by:
<rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump
Fixes:
<rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion
llvm-svn: 129508
I've used it a few times to reduce unit tests and gotten one request for information on it. It's not easy to use correctly because bugpoint doesn't tell you when you're doing it wrong.
llvm-svn: 129507
instruction around, reducing work.
Greatly simplify handling of debug instructions. There is no need to
build up a vector of them and then move them into the one predecessor
if we're processing a block. Instead just rescan the block and *copy*
them into the pred. If a block gets merged into multiple preds, this
will retain more debug info.
llvm-svn: 129502
the same allocation size but different primitive sizes(e.g., <3xi32> and
<4xi32>). When ScalarRepl promotes them, it can't use a bit cast but
should use a shuffle vector instead.
llvm-svn: 129472
ignored. There was a test to catch this, but it was just blindly updated in
a large change. This fixes another part of <rdar://problem/9275290>.
llvm-svn: 129466
will allow multiple context with different loop unroll parameters to run. This is a minor change and no effect
on existing application.
llvm-svn: 129449
component names such as "engine" do not expand to "jit" and hence to
the native target libraries for external users.
Thanks to arrowdodger for reporting and diagnosing the problem.
llvm-svn: 129444
the max itself, so it is not easy to write a test case for this, but I added a
test case that would fail if the code in AsmPrinter were removed.
llvm-svn: 129432
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handle node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.
Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the ndoe latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
llvm-svn: 129421
The ARMARM specifies these instructions as unpredictable when storing the
writeback register. This shouldn't affect code generation much since storing a
pointer to itself is quite rare.
llvm-svn: 129409
Now that we have a first-class way to represent unaligned loads, the unaligned
load intrinsics are superfluous.
First part of <rdar://problem/8460511>.
llvm-svn: 129401
Add handling for tracking the relocations on symbols and resolving them.
Keep track of the relocations even after they are resolved so that if
the RuntimeDyld client moves the object, it can update the address and any
relocations to that object will be updated.
For our trival object file load/run test harness (llvm-rtdyld), this enables
relocations between functions located in the same object module. It should
be trivially extendable to load multiple objects with mutual references.
As a simple example, the following now works (running on x86_64 Darwin 10.6):
$ cat t.c
int bar() {
return 65;
}
int main() {
return bar();
}
$ clang t.c -fno-asynchronous-unwind-tables -o t.o -c
$ otool -vt t.o
t.o:
(__TEXT,__text) section
_bar:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp,%rbp
0000000000000004 movl $0x00000041,%eax
0000000000000009 popq %rbp
000000000000000a ret
000000000000000b nopl 0x00(%rax,%rax)
_main:
0000000000000010 pushq %rbp
0000000000000011 movq %rsp,%rbp
0000000000000014 subq $0x10,%rsp
0000000000000018 movl $0x00000000,0xfc(%rbp)
000000000000001f callq 0x00000024
0000000000000024 addq $0x10,%rsp
0000000000000028 popq %rbp
0000000000000029 ret
$ llvm-rtdyld t.o -debug-only=dyld ; echo $?
Function sym: '_bar' @ 0
Function sym: '_main' @ 16
Extracting function: _bar from [0, 15]
allocated to 0x100153000
Extracting function: _main from [16, 41]
allocated to 0x100154000
Relocation at '_main' + 16 from '_bar(Word1: 0x2d000000)
Resolving relocation at '_main' + 16 (0x100154010) from '_bar (0x100153000)(pcrel, type: 2, Size: 4).
loaded '_main' at: 0x100154000
65
$
llvm-svn: 129388
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make these heuristic adjustments to node latency work,
I also needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.
llvm-svn: 129383
This merges the behavior of splitSingleBlocks into splitAroundRegion, so the
RS_Region and RS_Block register stages can be coalesced. That means the leftover
intervals after region splitting go directly to spilling instead of a second
pass of per-block splitting.
llvm-svn: 129379
Use debug info in the IR to find the directory/file:line:col. Each time that location changes, bump a counter.
Unlike the existing profiling system, we don't try to look at argv[], and thusly don't require main() to be present in the IR. This matches GCC's technique where you specify the profiling flag when producing each .o file.
The runtime library is minimal, currently just calling printf at program shutdown time. The API is designed to make it possible to emit GCOV data later on.
llvm-svn: 129340
reassociation opportunities are exposed. This fixes a bug where
the nested reassociation expects to be the IR to be consistent,
but it isn't, because the outer reassociation has disconnected
some of the operands. rdar://9167457
llvm-svn: 129324
.long 80+08
go ahead and assume that if we've got an Error token that we handled it
already. Otherwise if it's a token we can't handle then go ahead and
return the default error.
llvm-svn: 129322
If enabled, this will attempt to use the CC_LOG_DIAGNOSTICS feature I dropped
into Clang to print a log of all the diagnostics generated during an individual
build (from the top-level). Not sure if this will actually be useful, but for
now it is handy for testing the option.
llvm-svn: 129312
mean that it has to be ConstantArray of ConstantStruct. We might have
ConstantAggregateZero, at either level, so don't crash on that.
Also, semi-deprecate the sentinal value. The linker isn't aware of sentinals so
we end up with the two lists appended, each with their "sentinals" on them.
Different parts of LLVM treated sentinals differently, so make them all just
ignore the single entry and continue on with the rest of the list.
llvm-svn: 129307
the 'unwind' instruction. However, later on that instruction was converted into
a jump to the basic block it was located in, causing an infinite loop when we
get there.
It turns out, we get there if the _Unwind_Resume_or_Rethrow call returns (which
it's not supposed to do). It returns if it cannot find a place to unwind
to. Thus we would get what appears to be a "hang" when in reality it's just that
the EH couldn't be propagated further along.
Instead of infinitely looping (or calling `unwind', which none of our back-ends
support (it's lowered into nothing...)), call the @llvm.trap() intrinsic
instead. This may not conform to specific rules of a particular language, but
it's rather better than infinitely looping.
<rdar://problem/9175843&9233582>
llvm-svn: 129302
disassembler API. Hooked this up to the ARM target so such tools as Darwin's
otool(1) can now print things like branch targets for example this:
blx _puts
instead of this:
blx #-36
And even print the expression encoded in the Mach-O relocation entried for
things like this:
movt r0, :upper16:((_foo-_bar)+1234)
llvm-svn: 129284
Both coalescing and register allocation already check aliases for interference,
so these extra segments are only slowing us down.
This speeds up both linear scan and the greedy register allocator.
llvm-svn: 129283
--- Reverse-merging r129235 into '.':
D test/Feature/bb_attrs.ll
U include/llvm/BasicBlock.h
U include/llvm/Bitcode/LLVMBitCodes.h
U lib/VMCore/AsmWriter.cpp
U lib/VMCore/BasicBlock.cpp
U lib/AsmParser/LLParser.cpp
U lib/AsmParser/LLLexer.cpp
U lib/AsmParser/LLToken.h
U lib/Bitcode/Reader/BitcodeReader.cpp
U lib/Bitcode/Writer/BitcodeWriter.cpp
llvm-svn: 129259
* Add a "landing pad" attribute to the BasicBlock.
* Modify the bitcode reader and writer to handle said attribute.
Later: The verifier will ensure that the landing pad attribute is used in the
appropriate manner. I.e., not applied to the entry block, and applied only to
basic blocks that are branched to via a `dispatch' instruction.
(This is a work-in-progress.)
llvm-svn: 129235
is substantially different than a(b|c)d. Form the latter regex instead.
This found a few problems in the testsuite, which serves as its test.
llvm-svn: 129196
It is common for large live ranges to have few basic blocks with register uses
and many live-through blocks without any uses. This approach grows the Hopfield
network incrementally around the use blocks, completely avoiding checking
interference for some through blocks.
llvm-svn: 129188
error stream, in cases where the AsmParser is
being invoked by EDDisassembler. Before, they
were being sent to errs() because no error handler
was installed in the SourceMgr.
llvm-svn: 129177
If lower bound is more then upper bound then consider it is an unbounded array.
An array is unbounded if non-zero lower bound is same as upper bound.
If lower bound and upper bound are zero than array has one element.
llvm-svn: 129156