A non-identity copy cannot be coalesced when the phi join destination register
is live at the copy site.
Also verify that the PHI join source register is used only in the PHI join;
otherwise the coalescing is invalid.
llvm-svn: 86322
Here is the original commit message:
This commit updates malloc optimizations to operate on malloc calls that have constant int size arguments.
Update CreateMalloc so that its callers specify the size to allocate:
MallocInst-autoupgrade callers use non-TargetData-computed allocation sizes.
Optimization callers use TargetData to compute the allocation size.
Now that malloc calls can have constant sizes, update isArrayMallocHelper() to use TargetData to determine the size of the malloced type and the size of malloced arrays.
Extend getMallocType() to support malloc calls that have non-bitcast uses.
Update OptimizeGlobalAddressOfMalloc() to optimize malloc calls that have non-bitcast uses. The bitcast use of a malloc call has to be treated specially here because the uses of the bitcast need to be replaced and the bitcast needs to be erased (just like the malloc call) for OptimizeGlobalAddressOfMalloc() to work correctly.
Update PerformHeapAllocSRoA() to optimize malloc calls that have non-bitcast uses. The bitcast use of the malloc is not handled specially here because ReplaceUsesOfMallocWithGlobal replaces through the bitcast use.
Update OptimizeOnceStoredGlobal() to not care about the malloc calls' bitcast use.
Update all globalopt malloc tests to not rely on autoupgraded-MallocInsts, but instead use explicit malloc calls with correct allocation sizes.
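For instance, an updated test would use an explicit call of roughly this shape
(a hypothetical sketch, not copied from the patch; 100 x i32 gives a 400-byte
allocation under common TargetData):

declare i8* @malloc(i64)

define i32* @test() {
  %m = call i8* @malloc(i64 400)          ; explicit, correctly sized allocation
  %arr = bitcast i8* %m to i32*
  ret i32* %arr
}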
llvm-svn: 86311
This assert was very conservative to begin with (the error condition is well
covered by tests elsewhere in the code) so we won't miss much by removing it.
llvm-svn: 86088
MallocInst-autoupgrade callers use non-TargetData-computed allocation sizes.
Optimization callers use TargetData to compute the allocation size.
Now that malloc calls can have constant sizes, update isArrayMallocHelper() to use TargetData to determine the size of the malloced type and the size of malloced arrays.
Extend getMallocType() to support malloc calls that have non-bitcast uses.
Update OptimizeGlobalAddressOfMalloc() to optimize malloc calls that have non-bitcast uses. The bitcast use of a malloc call has to be treated specially here because the uses of the bitcast need to be replaced and the bitcast needs to be erased (just like the malloc call) for OptimizeGlobalAddressOfMalloc() to work correctly.
Update PerformHeapAllocSRoA() to optimize malloc calls that have non-bitcast uses. The bitcast use of the malloc is not handled specially here because ReplaceUsesOfMallocWithGlobal replaces through the bitcast use.
Update OptimizeOnceStoredGlobal() to not care about the malloc calls' bitcast use.
Update all globalopt malloc tests to not rely on autoupgraded-MallocInsts, but instead use explicit malloc calls with correct allocation sizes.
llvm-svn: 86077
The KILL pseudo-instruction may survive to the asm printer pass, just like the IMPLICIT_DEF. Print the KILL as a comment instead of just leaving a blank line in the output.
With -asm-verbose=0, a blank line is printed, like IMPLICIT_DEF.
llvm-svn: 86041
This introduces a new pass, SlotIndexes, which is responsible for numbering
instructions for register allocation (and other clients). SlotIndexes numbering
is designed to match the existing scheme, so this patch should not cause any
changes in the generated code.
For consistency, and to avoid naming confusion, LiveIndex has been renamed
SlotIndex.
The processImplicitDefs method of the LiveIntervals analysis has been moved
into its own pass so that it can be run prior to SlotIndexes. This was
necessary to match the existing numbering scheme.
llvm-svn: 85979
This makes both logical sense (see below) and increases the
number of functions marked readnone/readonly by about 1-2%
in practice. The number of functions marked nocapture goes
up by about 5-10%. The reason it makes sense is shown by
the following example: if you run -functionattrs -inline on
it, then no attributes are assigned. But if you instead run
-inline -functionattrs then @f is marked readnone because the
simplifications produced by the inliner eliminate the store.
@x = external global i32
define void @w(i1 %b) {
br i1 %b, label %write, label %return
write:
store i32 1, i32* @x
br label %return
return:
ret void
}
define void @f() {
call void @w(i1 0)
ret void
}
llvm-svn: 85893
1. we'd run simplifycfg at the very start, even though
the per-function passes have already cleaned this up.
2. In the main per-function pipeline that is interlaced with inlining
etc, we would do instcombine, jump threading, simplifycfg *before*
doing SROA. SROA is much more likely to expose opportunities for
these passes than they are for SROA, so move SROA up earlier.
Also add some comments.
llvm-svn: 85742
ipconstprop and doesn't take much time. Just run it in its place.
This adds a testcase for it, which I plan to expand to cover other
"integration" cases, where we expect the optimizer to be able to
eliminate various things. Due to phase order issues we've regressed
in a number of areas and integration tests are the only way I see to
prevent this.
llvm-svn: 85729
block with a blockaddress still referring to it', replace the invalid
blockaddress with a new blockaddress(@func, null) instead of a
inttoptr(1).
This changes the bitcode encoding format, and still needs codegen
support (this should produce a non-zero value, referring to the entry
block of the function would also be quite reasonable).
llvm-svn: 85678
MergeBlockIntoPredecessor. This makes SimplifyCFG slightly more aggressive,
and makes it unnecessary for LoopUnroll to have its own copy of this code.
llvm-svn: 85667
unfolding loads for hoisting. getOpcodeAfterMemoryUnfold returns the
opcode of the original operation without the load, not the load
itself. MachineLICM needs to know the operand index in order to get
the correct register class. Extend getOpcodeAfterMemoryUnfold to
return this information.
llvm-svn: 85622
bunch of associated comments, because it doesn't have anything to do
with DAGs or scheduling. This is another step in decoupling MachineInstr
emitting from scheduling.
llvm-svn: 85517
ArraySize * ElementSize
ElementSize * ArraySize
ArraySize << log2(ElementSize)
ElementSize << log2(ArraySize)
Refactor isArrayMallocHelper and delete isSafeToGetMallocArraySize, so that there is only one copy of the malloc-array-determining logic.
Update users of getMallocArraySize() to not bother calling isArrayMalloc() as well.
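As an illustration, the first and third forms might look like this in IR
(a hypothetical sketch; %n is the array size, and i32 gives ElementSize = 4
under common TargetData):

declare i8* @malloc(i64)

define i8* @mul_form(i64 %n) {
  %size = mul i64 %n, 4                   ; ArraySize * ElementSize
  %m = call i8* @malloc(i64 %size)
  ret i8* %m
}

define i8* @shl_form(i64 %n) {
  %size = shl i64 %n, 2                   ; ArraySize << log2(ElementSize)
  %m = call i8* @malloc(i64 %size)
  ret i8* %m
}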
llvm-svn: 85421
Checks on Demand algorithm which looks at arbitrary branches instead of loop
iterations. This is GSoC work by Andre Tavares with only editorial changes
applied!
llvm-svn: 85382
In the new world order, BlockAddress can have a BasicBlock operand.
This doesn't perturb much, because if you have a ConstantExpr (or
anything more specific than Constant) we still know the operand has
to be a Constant.
llvm-svn: 85375
use it to control tail merging when there is a tradeoff between performance
and code size. When there is only 1 instruction in the common tail, we have
been merging. That can be good for code size but is a definite loss for
performance. Now we will avoid tail merging in that case when the
optimization level is "Aggressive", i.e., "-O3". Radar 7338114.
Since the IfConversion pass invokes BranchFolding, it too needs to know
the optimization level. Note that I removed the RegisterPass instantiation
for IfConversion because it required a default constructor. If someone
wants to keep that for some reason, we can add a default constructor with
a hard-wired optimization level.
llvm-svn: 85346
http://llvm.org/PR5184, and beef up the comments to describe what both options
do and the risks of lazy compilation in the presence of threads.
llvm-svn: 85295
being destroyed. This allows users to run global optimizations like globaldce
even after some functions have been jitted.
This patch also removes the Function* parameter to
JITEventListener::NotifyFreeingMachineCode() since it can cause that to be
called when the Function is partially destroyed. This change will be even more
helpful later when I think we'll want to allow machine code to actually outlive
its Function.
llvm-svn: 85182
GEPs (more than one non-zero index) into simple GEPs (at most one
non-zero index). In some simple experiments using this it's not
uncommon to see 3% overall code size wins, because it exposes
redundancies that can be eliminated. However, it's tricky to use
because instcombine aggressively undoes the work that this pass does.
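A hypothetical before/after sketch of the split (the commented-out GEP is the
complex form; the names are illustrative, not from the pass):

define i32* @example([10 x [20 x i32]]* %a) {
  ; complex GEP, two non-zero indices:
  ;   %p = getelementptr [10 x [20 x i32]]* %a, i64 0, i64 1, i64 2
  ; split into simple GEPs, at most one non-zero index each:
  %q = getelementptr [10 x [20 x i32]]* %a, i64 0, i64 1
  %p = getelementptr [20 x i32]* %q, i64 0, i64 2
  ret i32* %p
}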
llvm-svn: 85144
bootstrapping. It's not safe to leave identity subreg_to_reg and insert_subreg
around.
- Relax register scavenging to allow use of partially "not-live" registers. It's
common for targets to operate on registers where the top bits are undef. e.g.
s0 =
d0 = insert_subreg d0<undef>, s0, 1
...
= d0
When the insert_subreg is eliminated by the coalescer, the scavenger used to
complain. The previous fix was to keep the insert_subreg around. But that's
brittle and it's overly conservative when we want to use the scavenger to
allocate registers. It's actually legal and desirable for other instructions
to use the "undef" part of d0. e.g.
s0 =
d0 = insert_subreg d0<undef>, s0, 1
...
s1 =
= s1
= d0
We probably need to add a "partial-undef" marker on the machine operand so the
machine verifier would not complain.
llvm-svn: 85091
used elsewhere - an exit block is a block outside the loop branched to
from within the loop. An exiting block is a block inside the loop that
branches out.
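A small hypothetical sketch makes the distinction concrete: below, %header is
an exiting block (inside the loop, branches out) and %exit is an exit block
(outside the loop, branched to from inside):

define void @count(i64 %n) {
entry:
  br label %header
header:
  %i = phi i64 [ 0, %entry ], [ %i.next, %latch ]
  %done = icmp eq i64 %i, %n
  br i1 %done, label %exit, label %latch
latch:
  %i.next = add i64 %i, 1
  br label %header
exit:
  ret void
}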
llvm-svn: 85019
Update all analysis passes and transforms to treat free calls just like FreeInst.
Remove RaiseAllocations and all its tests since FreeInst no longer needs to be raised.
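For example, a free call has this shape (a minimal hypothetical sketch); the
old FreeInst form was the dedicated instruction 'free i8* %p':

declare void @free(i8*)

define void @release(i8* %p) {
  call void @free(i8* %p)
  ret void
}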
llvm-svn: 84987
compiled.
When functions are compiled, they accumulate references in the JITResolver's
stub maps. This patch removes those references when the functions are
destroyed. It's illegal to destroy a Function when any thread may still try to
call its machine code.
This patch also updates r83987 to use ValueMap instead of explicit CallbackVHs
and fixes a couple "do stuff inside assert()" bugs from r84522.
llvm-svn: 84975
even when keys get RAUWed and deleted during its lifetime. By default the keys
act like WeakVHs, but users can pass a third template parameter to configure
how updates work and whether to do anything beyond updating the map on each
action.
It's also possible to automatically acquire a lock around ValueMap updates
triggered by RAUWs and deletes, to support the ExecutionEngine.
llvm-svn: 84890
Analysis/ConstantFolding.cpp. This doesn't change the behavior of
instcombine but makes other clients of ConstantFoldInstruction
able to handle loads. This was partially extracted from Eli's patch
in PR3152.
llvm-svn: 84836
- i < getNumElements() instead of getNumElements() > i
- Make setParent() private
- Fix use of resizeOperands
- Reset HasMetadata bit after removing all metadata attached to an instruction
- Efficient use of iterators
llvm-svn: 84765
JITEmitter.
I'm gradually making Functions auto-remove themselves from the JIT when they're
destroyed. In this case, the Function needs to be removed from the JITEmitter,
but the map recording which Functions need to be removed lived behind the
JITMemoryManager interface, which made things difficult.
This patch replaces the deallocateMemForFunction(Function*) method with a pair
of methods deallocateFunctionBody(void *) and deallocateExceptionTable(void *)
corresponding to the two startFoo/endFoo pairs.
llvm-svn: 84651
appropriate restore location for the spill as well as perform the actual
save and restore.
The Thumb1 target uses this to make sure R12 is not clobbered while a spilled
scavenger register is live there.
llvm-svn: 84554
The JITResolver maps Functions to their canonical stubs and all callsites for
lazily-compiled functions to their target Functions. To make Function
destruction work, I'm going to need to remove all callsites on destruction, so
this patch also adds the reverse mapping for that.
There was an incorrect assumption in here that the only stub for a function
would be the one caused by needing to lazily compile it, but x86-64 far calls
and dlsym stubs could also cause such stubs. I didn't look for a test case
that the assumption broke.
This also adds DenseMapInfo<AssertingVH> so I can use DenseMaps instead of
std::maps.
llvm-svn: 84522
stack slots and giving them different PseudoSourceValues did not fix the
problem of post-alloc scheduling miscompiling llvm itself.
- Apply Dan's conservative workaround by assuming any non-fixed stack slots can
alias other memory locations. This means a load from spill slot #1 cannot
move above a store of spill slot #2.
- Enable post-alloc scheduling for x86 at optimization level Default and above.
llvm-svn: 84424
Update testcases that rely on malloc insts being present.
Also prematurely remove MallocInst handling from IndMemRemoval and RaiseAllocations to help pass tests in this incremental step.
llvm-svn: 84292
identifying the malloc as a non-array malloc. This broke GlobalOpt's optimization of stores of mallocs
to global variables.
The fix is to classify mallocs into 3 categories:
1. non-array mallocs
2. array mallocs whose array size can be determined
3. mallocs that cannot be determined to be of type 1 or 2 and cannot be optimized
getMallocArraySize() returns NULL for category 3, and all users of this function must avoid their
malloc optimization if this function returns NULL.
Eventually, size-argument patterns that are currently not recognized will be supported in
isArrayMalloc() and getMallocArraySize(), extending malloc optimizations to those cases.
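Hypothetical IR examples of the three categories (assuming i32 with
ElementSize = 4; the names are illustrative):

declare i8* @malloc(i64)

define void @examples(i64 %n, i64 %x) {
  %m1 = call i8* @malloc(i64 4)           ; category 1: non-array malloc (one i32)
  %bytes = mul i64 %n, 4
  %m2 = call i8* @malloc(i64 %bytes)      ; category 2: array malloc, array size %n
  %odd = add i64 %x, 13
  %m3 = call i8* @malloc(i64 %odd)        ; category 3: size pattern not recognized
  ret void
}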
llvm-svn: 84199
so get rid of eh.selector.i64 and rename eh.selector.i32 to eh.selector.
Likewise for eh.typeid.for. This aligns us with gcc, which always uses a
32 bit value for the selector on all platforms. My understanding is that
the register allocator used to assert if the selector intrinsic size didn't
match the pointer size, and this was the reason for introducing the two
variants. However my testing shows that this is no longer the case (I
fixed some bugs in selector lowering yesterday, and some more today in the
fastisel path; these might have caused the original problems).
llvm-svn: 84106
truncating an SDValue (depending on whether the target
type is bigger or smaller than the value's type); or zero
extending or truncating it. Use it in a few places (this
seems to be a popular operation, but I only modified cases
of it in SelectionDAGBuild). In particular, the eh_selector
lowering was doing this wrong due to a repeated rather than
inverted test, fixed with this change.
llvm-svn: 84027
bootstrap of FSF-style PPC, so there is some
reason to believe the original bug (which was
never analyzed) has been fixed, probably by
82266.
llvm-svn: 83871
is trivially rematerializable and integrate it into
TargetInstrInfo::isTriviallyReMaterializable. This way, all places that
need to know whether an instruction is rematerializable will get the
same answer.
This enables the useful parts of the aggressive-remat option by
default -- using AliasAnalysis to determine whether a memory location
is invariant, and removes the questionable parts -- rematting operations
with virtual register inputs that may not be live everywhere.
llvm-svn: 83687
While recording the beginning of a function, use the scope info from the first location entry instead of relying on just the location entry itself.
llvm-svn: 83684
mappings, which could cause errors and assert-failures. This patch fixes that,
adds a test, and refactors the global-mapping-removal code into a single place.
llvm-svn: 83678