When the pipeliner is renaming phi values, it may need to iterate through
the phi operands to check for other phis. However, the pipeliner should
stop once it reaches a phi that is outside the pipelined loop.
Also, when the generateExistingPhis code is unable to reuse an existing
phi, the default code that computes the PhiOp2 is only to be used when
the pipeliner is generating the kernel. Otherwise, the phi may be a value
computed earlier in the same epilog.
Patch by Brendon Cahoon.
llvm-svn: 290355
test/CodeGen/MIR should contain tests that intent to test the MIR
printing or parsing. Tests that test something else should be in
test/CodeGen/TargetName even when they are written in .mir.
As a rule of thumb, only tests using "llc -run-pass none" should be in
test/CodeGen/MIR.
llvm-svn: 289254
The Stack slot coloring pass removes a store that is followed by a load
that deal with the same stack slot. The function isLoadFromStackSlot
is supposed to consider the loads that have no side-effects. This
patch fixed the issue by removing the unsafe loads from this function
Eg:
%vreg0<def> = L2_loadruh_io <fi#15>, 0
S2_storeri_io <fi#15>, 0, %vreg0
In this case, we load an unsigned extended half word and store this in to
the same stack slot. The Stack slot coloring pass considers safe to remove
the store. This patch marked all the non-vector byte and half word loads as
unsafe.
llvm-svn: 286843
For pairs of 32-bit registers: isub_lo, isub_hi.
For pairs of vector registers: vsub_lo, vsub_hi.
Add generic subreg indices: ps_sub_lo, ps_sub_hi, and a function
HexagonRegisterInfo::getHexagonSubRegIndex(RegClass, GenericSubreg)
that returns the appropriate subreg index for RegClass.
llvm-svn: 286377
When LivePhysRegs adds live-in registers, it recognizes ~0 as a special
lane mask indicating the entire register. If the lane mask is not ~0,
it will only add the subregisters that overlap the specified lane mask.
The problem is that if a live-in register does not have subregisters,
and the lane mask is not ~0, it will not be added to the live set.
(The given lane mask may simply be the lane mask of its register class.)
If a register does not have subregisters, add it to the live set if
the lane mask is non-zero.
Differential Revision: https://reviews.llvm.org/D26094
llvm-svn: 285440
Do not use LiveIntervals to recalculate kills, because that cannot be
done accurately without implicit uses on predicated instructions.
llvm-svn: 285409
After register allocation it is possible to have a spill of a register
that is only partially defined. That in itself it fine, but creates a
problem for double vector registers. Stores of such registers are pseudo
instructions that are expanded into pairs of individual vector stores,
and in case of a partially defined source, one of the stores may use
an entirely undefined register. To avoid this, track the defined parts
and only generate actual stores for those.
llvm-svn: 284841
This required reengineering of some of the part of liveness calculation,
including fixing some issues caused by the limitations of the previous
approach. The current code is not necessarily the fastest, but it should
be functionally correct (at least more so than before). The compile-time
performance will be addressed in the future.
llvm-svn: 284609
Branch folder removes implicit defs if they are the only non-branching
instructions in a block, and the branches do not use the defined registers.
The problem is that in some cases these implicit defs are required for
the liveness information to be correct.
Differential Revision: https://reviews.llvm.org/D25478
llvm-svn: 284036
This avoids llc using the hosts OS/vendor as defaults and triggering
unwanted behaviour in the tests. This should deal with the buildbot
breakages on windows after r283140.
llvm-svn: 283149
Each shadow only represents data flow that is restricted to its reaching
def. Propagating more than that could lead to spurious register liveness,
resulting in extra (incorrectly) block live-ins.
llvm-svn: 283143
Normally, if conversion would add implicit uses for redefined registers,
e.g. R0<def> = add_if ..., R0<imp-use>. However, if only subregisters of
R0 are known to be live but not R0 itself, such implicit uses will not be
added, causing prior definitions of such subregisters and R0 itself to
become dead.
llvm-svn: 282626
descriptions now tag add instructions, and the Hexagon backend is using this to
identify loop induction statements.
Patch by Sam Parker and Sjoerd Meijer.
Differential Revision: https://reviews.llvm.org/D23601
llvm-svn: 281304
Shadow uses need to be analyzed together, since each individual shadow
will only have a partial reaching def. All shadows together may cover
a given register ref, while each individual shadow may not.
llvm-svn: 280855
This reverts commit r280268, it causes all MSVC 2013 to ICE. This
appears to have been fixed in a later MSVC 2013 update, because I cannot
reproduce it locally. That said, all upstream LLVM bots are broken right
now, so I am reverting.
Also reverts dependent change r280275, "[Hexagon] Deal with undefs when
extending live intervals".
llvm-svn: 280301
This bug shows up with diamonds that share unpredicable, unanalyzable branches.
There's an included test case from Hexagon. What was happening was that we were
attempting to predicate the branch instruction despite the fact that it was
checked to be the same. Now for unanalyzable branches we skip over the branch
instructions when predicating the block.
Differential Revision: https://reviews.llvm.org/D23939
llvm-svn: 279985
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register and simply describes that no vregs are used in
a machine function. With that we can simply compute the property and do
not need to dump/parse it in .mir files.
Differential Revision: http://reviews.llvm.org/D23850
llvm-svn: 279698
The register allocator can split a live interval of a register into a set
of smaller intervals. After the allocation of registers is complete, the
rewriter will modify the IR to replace virtual registers with the corres-
ponding physical registers. At this stage, if a register corresponding
to a subregister of a virtual register is used, the rewriter will check
if that subregister is undefined, and if so, it will add the <undef> flag
to the machine operand. The function verifying liveness of the subregis-
ter would assume that it is undefined, unless any of the subranges of the
live interval proves otherwise.
The problem is that the live intervals created during splitting do not
have any subranges, even if the original parent interval did. This could
result in the <undef> flag placed on a register that is actually defined.
Differential Revision: http://reviews.llvm.org/D21189
llvm-svn: 279625
Improved handling of fma, floating point min/max, additional load/store
instructions for floating point types.
Patch by Jyotsna Verma.
llvm-svn: 279239
The pipeliner was generating an invalid Phi name for an operand
in the epilog block, which caused an assert in the live variable
analysis pass. The fix is to the code that generates new Phis
in the epilog block. In this case, there is an existing Phi that
needs to be reused rather than creating a new Phi instruction.
Differential Revision: https://reviews.llvm.org/D23513
llvm-svn: 278805
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!
Motivating testcase:
void f(float *a, float *b, float *c, int n) {
while (n-- > 0)
*c++ = *a++ + *b++;
}
It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.
llvm-svn: 278658
LowerTargetConstantPool is not properly setting the TargetFlag to indicate
desired relocation. Coding error, the offset parameter was omitted, so the
TargetFlag was used as the offset, and the TargetFlag defaulted to zero.
This only affects -fpic compilation, and only those items created in a
Constant Pool, for example a vector of constants. Halide ran into this issue.
llvm-svn: 278614
From the point of view of register assignment, byval parameters are
ignored: a byval parameter is not going to be assigned to a register,
and it will not affect the assignments of subsequent parameters.
When matching registers with parameters in the bit tracker, make sure
to skip byval parameters before advancing the registers.
llvm-svn: 278375
This change makes it possible for tail-duplication and tail-merging to
be disjoint. By being less aggressive when merging during layout, there are no
overlapping cases between tail-duplication and tail-merging, provided the
thresholds are disjoint.
There is a remaining TODO to benchmark the succ_size() test for non-layout tail
merging.
llvm-svn: 278265
Floating point instructions use general purpose registers, so the few
instructions that can put floating point immediates into registers are,
in fact, integer instruction. Use them explicitly instead of having
pseudo-instructions specifically for dealing with floating point values.
Simplify the constant loading instructions (from sdata) to have only two:
one for 32-bit values and one for 64-bit values: CONST32 and CONST64.
llvm-svn: 278244
When the same base address is used to load two different data types, LSR
would assume a memory type of "void". This type is not sized and has no
alignment information. Checking for it causes a crash.
llvm-svn: 277601
Identify patterns where the address is aligned to an 8-byte boundary,
but both the base address and the constant offset are both proper
multiples of 4. In such cases, extract Base+4 into a separate instruc-
tion, and use S2_storerd_io, instead of using S4_storerd_rr.
llvm-svn: 277497
Scavenging slots were only reserved when pseudo-instruction expansion in
frame lowering created new virtual registers. It is possible to still
need a scavenging slot even if no virtual registers were created, in cases
where the stack is large enough to overflow instruction offsets.
llvm-svn: 277355
The DAG combiner will try to merge consecutive stores into a bigger
store, unless the resulting store is not fast. Misaligned vector stores
are allowed on Hexagon, but are not fast. Add a testcase to make sure
this type of merging does not occur.
Patch by Pranav Bhandarkar.
llvm-svn: 277182
The DAG combiner tries to merge stores to adjacent vector wide memory
locations by creating stores which are integral multiples of the vector
width. Discourage this by informing it that this is slow. This should
not affect legalization passes, because all of them ignore the "Fast"
argument.
Patch by Pranav Bhandarkar.
llvm-svn: 277178
Software pipelining is an optimization for improving ILP by
overlapping loop iterations. Swing Modulo Scheduling (SMS) is
an implementation of software pipelining that attempts to
reduce register pressure and generate efficient pipelines with
a low compile-time cost.
This implementaion of SMS is a target-independent back-end pass.
When enabled, the pass should run just prior to the register
allocation pass, while the machine IR is in SSA form. If the pass
is successful, then the original loop is replaced by the optimized
loop. The optimized loop contains one or more prolog blocks, the
pipelined kernel, and one or more epilog blocks.
This pass is enabled for Hexagon only. To enable for other targets,
a couple of target specific hooks must be implemented, and the
pass needs to be called from the target's TargetMachine
implementation.
Differential Review: http://reviews.llvm.org/D16829
llvm-svn: 277169
If the mask of a vector shuffle has alternating odd or even numbers
starting with 1 or 0 respectively up to the largest possible index
for the given type in the given HVX mode (single of double) we can
generate vpacko or vpacke instruction respectively.
E.g.
%42 = shufflevector <32 x i16> %37, <32 x i16> %41,
<32 x i32> <i32 1, i32 3, ..., i32 63>
is %42.h = vpacko(%41.w, %37.w)
Patch by Pranav Bhandarkar.
llvm-svn: 277168
Rebalances address calculation trees and applies Hexagon-specific
optimizations to the trees to improve instruction selection.
Patch by Tobias Edler von Koch.
llvm-svn: 277151
Normally, CFI instructions should be inserted after allocframe, but
if allocframe is in the same packet with a call, the CFI instructions
should be inserted before that packet.
llvm-svn: 277020
Before adding a new preheader block, check if there is a candidate block
where the loop setup could be placed speculatively. This will be off by
default.
llvm-svn: 276919
Consider this case:
vreg1 = A2_zxth vreg0 (1)
...
vreg2 = A2_zxth vreg1 (2)
Redundant instruction elimination could delete the instruction (1)
because the user (2) only cares about the low 16 bits. Then it could
delete (2) because the input is already zero-extended. The problem
is that the properties allowing each individual instruction to be
deleted depend on the existence of the other instruction, so either
one can be deleted, but not both.
The existing check for this situation in RIE was insufficient. The
fix is to update all dependent cells when an instruction is removed
(replaced via COPY) in RIE.
llvm-svn: 276792
Schedule a load and its use in the same packet in MISched. Previously,
isResourceAvailable was returning false for dependences in the same
packet, which prevented MISched from packetizing a load and its use in
the same packet for v60.
Patch by Ikhlas Ajbar.
llvm-svn: 275804
The machine scheduler needs to account for available resources
more accurately in order to avoid scheduling an instruction that
forces a new packet to be created.
This occurs in two ways: First, an instruction without an available
resource may have a large priority due to other metrics and be
scheduled when there are other instructions with available resources.
Second, an instruction with a non-zero latency may become available
prematurely. In both these cases, we attempt change the priority
in order to allow a better instruction to be scheduled.
Patch by Brendon Cahoon.
llvm-svn: 275793
An instruction may have multiple predecessors that are candidates
for using .cur. However, only one of them can use .cur in the
packet. When this case occurs, we need to make sure that only
one of the dependences gets a 0 latency value.
Patch by Brendon Cahoon.
llvm-svn: 275790
- Treat bitwise OR with a frame index as an ADD wherever possible, fold it
into addressing mode.
- Extend patterns for memops to allow memops with frame indexes as address
operands.
llvm-svn: 275569
On Hexagon is it legal to packetize the instructions setting up call
arguments with the call instruction itself. This was already done,
except for tail calls. Make sure tail calls are handled as well.
llvm-svn: 275458
Transform: (store ch addr (add x (add (shl y c) e)))
to: (store ch addr (add x (shl (add y d) c))),
where e = (shl d c) for some integer d.
The purpose of this is to enable generation of loads/stores with
shifted addressing mode, i.e. mem(x+y<<#c). For that, the shift
value c must be 0, 1 or 2.
llvm-svn: 273466
The aggressive anti-dependency breaker can rename the restored callee-
saved registers. To prevent this, mark these registers are live on all
paths to the return/tail-call instructions, and add implicit use operands
for them to these instructions.
llvm-svn: 270898
When looking for an available spill slot, the register scavenger would stop
after finding the first one with no register assigned to it. That slot may
have size and alignment that do not meet the requirements of the register
that is to be spilled. Instead, find an available slot that is the closest
in size and alignment to one that is needed to spill a register from RC.
Differential Revision: http://reviews.llvm.org/D20295
llvm-svn: 269969
Recent changes to the instruction selection code exposed a problem where
a dead node was not removed on time. This node had both input and output
chains, which lead to an apparent cycle.
llvm-svn: 269458
When generating .cfi_offset instructions, make sure that the offset is
calculated with respect to the register used to define the CFA (which is
currently always FP+8).
llvm-svn: 269191
An example from Hexagon where things went wrong:
%R0<def> = L2_loadrigp <ga:@fp04> ; load function address
J2_callr %R0<kill>, ..., %R0<imp-def> ; call *R0, return value in R0
ScheduleDAGInstrs::buildSchedGraph would visit all instructions going
backwards, and in each instruction it would visit all operands in their
order on the operand list. In the case of this call, it visited the use
of R0 first, then removed it from the set Uses after it visited the def.
This caused the DAG to be missing the data dependence edge on R0 between
the load and the call.
Differential Revision: http://reviews.llvm.org/D20102
llvm-svn: 269076
ScheduleDAGMI::initQueues changes the RegionBegin to the first non-debug
instruction. Since it does not track register pressure, it does not affect
any RP trackers. ScheduleDAGMILive inherits initQueues from ScheduleDAGMI,
and it does reset the TopTPTracker in its schedule method. Any derived,
target-specific scheduler will need to do it as well, but the TopRPTracker
is only exposed as a "const" object to derived classes. Without the ability
to modify the tracker directly, this leaves a derived scheduler with a
potential of having the TopRPTracker out-of-sync with the CurrentTop.
The symptom of the problem:
void llvm::ScheduleDAGMILive::scheduleMI(llvm::SUnit *, bool):
Assertion `TopRPTracker.getPos() == CurrentTop && "out of sync"' failed.
Differential Revision: http://reviews.llvm.org/D19438
llvm-svn: 267918
When a block is tail-duplicated, the PHI nodes from that block are
replaced with appropriate COPY instructions. When those PHI nodes
contained use operands with subregisters, the subregisters were
dropped from the COPY instructions, resulting in incorrect code.
Keep track of the subregister information and use this information
when remapping instructions from the duplicated block.
Differential Revision: http://reviews.llvm.org/D19337
llvm-svn: 267583
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.
Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.
Motivation
----------
Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.
We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.
Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.
http://reviews.llvm.org/D19034
<rdar://problem/25256815>
llvm-svn: 266446
In PIC mode, the registers R14, R15 and R28 are reserved for use by
the PLT handling code. This causes all functions to clobber these
registers. While this is not new for regular function calls, it does
also apply to save/restore functions, which do not follow the standard
ABI conventions with respect to the volatile/non-volatile registers.
Patch by Jyotsna Verma.
llvm-svn: 264324
Replace spills to memory with spills to registers, if possible. This
applies mostly to predicate registers (both scalar and vector), since
they are very limited in number. A spill of a predicate register may
happen even if there is a general-purpose register available. In cases
like this the stack spill/reload may be eliminated completely.
This optimization will consider all stack objects, regardless of where
they came from and try to match the live range of the stack slot with
a dead range of a register from an appropriate register class.
llvm-svn: 260758
Rewrite the code to handle all pseudo-instructions in a single pass.
This temporarily reverts spill slot optimization that used general-
purpose registers to hold values of spilled predicate registers.
llvm-svn: 260696
We can generate the actual instructions from the intrinsics without the
need for pseudo-instructions. Also, since the intrinsics have a side-
effect in a form of a store, attempt to optimize away loads from the
store location.
llvm-svn: 260690
The DataLayout can calculate alignment of vectors based on the alignment
of the element type and the number of elements. In fact, it is the product
of these two values. The problem is that for vectors of N x i1, this will
return the alignment of N bytes, since the alignment of i1 is 8 bits. The
vector types of vNi1 should be aligned to N bits instead. Provide explicit
alignment for HVX vectors to avoid such complications.
llvm-svn: 260678