SUnit represent a MachineInstr in post-regalloc scheduling but SDNode
in pre-regalloc scheduling. when pass -enable-hexagon-sdnode-sched to
Hexagon backend with -O1 and above, this may cause an assertion failed.
Fixes PR45194.
Differential Revision: https://reviews.llvm.org/D76134
While restoring latency, check if any of the registers of
source instruction is a subregister of the successor instructions
apart from being same register.
This allows targets to know exactly which operands are contributing to
the dependency, which is required for targets with per-operand
scheduling models.
Differential Revision: https://reviews.llvm.org/D77135
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
Differential revision: https://reviews.llvm.org/D72701
The patch adds a new option ABI for Hexagon. It primary deals with
the way variable arguments are passed and is use in the Hexagon Linux Musl
environment.
If a callee function has a variable argument list, it must perform the
following operations to set up its function prologue:
1. Determine the number of registers which could have been used for passing
unnamed arguments. This can be calculated by counting the number of
registers used for passing named arguments. For example, if the callee
function is as follows:
int foo(int a, ...){ ... }
... then register R0 is used to access the argument ' a '. The registers
available for passing unnamed arguments are R1, R2, R3, R4, and R5.
2. Determine the number and size of the named arguments on the stack.
3. If the callee has named arguments on the stack, it should copy all of these
arguments to a location below the current position on the stack, and the
difference should be the size of the register-saved area plus padding
(if any is necessary).
The register-saved area constitutes all the registers that could have
been used to pass unnamed arguments. If the number of registers forming
the register-saved area is odd, it requires 4 bytes of padding; if the
number is even, no padding is required. This is done to ensure an 8-byte
alignment on the stack. For example, if the callee is as follows:
int foo(int a, ...){ ... }
... then the named arguments should be copied to the following location:
current_position - 5 (for R1-R5) * 4 (bytes) - 4 (bytes of padding)
If the callee is as follows:
int foo(int a, int b, ...){ ... }
... then the named arguments should be copied to the following location:
current_position - 4 (for R2-R5) * 4 (bytes) - 0 (bytes of padding)
4. After any named arguments have been copied, copy all the registers that
could have been used to pass unnamed arguments on the stack. If the number
of registers is odd, leave 4 bytes of padding and then start copying them
on the stack; if the number is even, no padding is required. This
constitutes the register-saved area. If padding is required, ensure
that the start location of padding is 8-byte aligned. If no padding is
required, ensure that the start location of the on-stack copy of the
first register which might have a variable argument is 8-byte aligned.
5. Decrement the stack pointer by the size of register saved area plus the
padding. For example, if the callee is as follows:
int foo(int a, ...){ ... } ;
... then the decrement value should be the following:
5 (for R1-R5) * 4 (bytes) + 4 (bytes of padding) = 24 bytes
The decrement should be performed before the allocframe instruction.
Increment the stack-pointer back by the same amount before returning
from the function.
This requires std::intializer_list to be a literal type, which it is
starting with C++14. The downside is that std::bitset is still not
constexpr-friendly so this change contains a re-implementation of most
of it.
Shrinks clang by ~60k.
llvm-svn: 369847
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Currently, instructions doing memory accesses through a base operand that is
not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`.
This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.
The goal of this patch is to refactor all this to return a base
operand instead of a base register.
Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.
Differential Revision: https://reviews.llvm.org/D54846
llvm-svn: 347746
Add the generic processor for Hexagon so that it can be used
with 3rd party programs that create a back-end with the
"generic" CPU. This patch also enables the JIT for Hexagon.
Differential Revision: https://reviews.llvm.org/D48571
llvm-svn: 335641
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
The patch contains severals changes needed to pipeline an example
that was transformed so that a Phi with a subreg is converted to
copies.
The pipeliner wasn't working for a couple of reasons.
- The RecMII was 3 instead of 2 due to the extra copies.
- Copy instructions contained a latency of 1.
- The node order algorithm was not choosing the best "bottom"
node, which caused an instruction to be scheduled that had a
predecessor and successor already scheduled.
- Updated the Hexagon Machine Scheduler to check if the node is
latency bound when adding the cost for a 0-latency dependence.
The RecMII was 3 because the computation looks at the number of
nodes in the recurrence. The extra copy is an extra node but
it shouldn't increase the latency. The new RecMII computation
looks at the latency of the instructions in the recurrence. We
changed the latency of the dependence of a copy to 0. The latency
computation for the copy also checks the use of the copy (similar
to a reg_sequence).
The node order algorithm was not choosing the last instruction
in the recurrence for a bottom up traversal. This was when the
last instruction is a copy. A check was added when choosing the
instruction to check for NodeNum if the maxASAP is the same. This
means that the scheduler will not end up with another node in
the recurrence that has both a predecessor and successor already
scheduled.
The cost computation in Hexagon Machine Scheduler adds cost when
an instruction can be packetized with a zero-latency instruction.
We should only do this if the schedule is latency bound.
Patch by Brendon Cahoon.
llvm-svn: 328542
Add barrier edges to check for any physical register. The previous code
worked for the function return registers: r0/d0, v0/w0.
Patch by Brendon Cahoon.
llvm-svn: 328120
As part of the unification of the debug format and the MIR format, avoid
printing "vreg" for virtual registers (which is one of the current MIR
possibilities).
Basically:
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g"
* grep -nr '%vreg' . and fix if needed
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g"
* grep -nr 'vreg[0-9]\+' . and fix if needed
Differential Revision: https://reviews.llvm.org/D40420
llvm-svn: 319427
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.
* Only debug printing is affected. It now follows MIR.
Differential Revision: https://reviews.llvm.org/D40417
llvm-svn: 319187
This patch lets the llvm tools handle the new HVX target features that
are added by frontend (clang). The target-features are of the form
"hvx-length64b" for 64 Byte HVX mode, "hvx-length128b" for 128 Byte mode HVX.
"hvx-double" is an alias to "hvx-length128b" and is soon will be deprecated.
The hvx version target feature is upgated form "+hvx" to "+hvxv{version_number}.
Eg: "+hvxv62"
For the correct HVX code generation, the user must use the following
target features.
For 64B mode: "+hvxv62" "+hvx-length64b"
For 128B mode: "+hvxv62" "+hvx-length128b"
Clang picks a default length if none is specified. If for some reason,
no hvx-length is specified to llvm, the compilation will bail out.
There is a corresponding clang patch.
Differential Revision: https://reviews.llvm.org/D38851
llvm-svn: 316101
This removes the duplicate HVX instruction set for the 128-byte mode.
Single instruction set now works for both modes (64- and 128-byte).
llvm-svn: 313362
An instruction may have multiple predecessors that are candidates
for using .cur. However, only one of them can use .cur in the
packet. When this case occurs, we need to make sure that only
one of the dependences gets a 0 latency value.
Patch by Brendon Cahoon.
llvm-svn: 275790
The Hexagon schedulers need to handle instructions with a latency
of 0 or 2 more accurately. The problem, in v60, is that a dependence
between two instructions with a 2 cycle latency can use a .cur version
of the source to achieve a 0 cycle latency when the use is in the
same packet. Any othe use, must be at least 2 packets later, or a
stall occurs. In other words, the compiler does not want to schedule
the dependent instructions 1 cycle later.
To achieve this, the latency adjustment code allows only a single
dependence to have a zero latency. All other instructions have the
other value, which is typically 2 cycles. We use a heuristic to
determine which instruction gets the 0 latency.
The Hexagon machine scheduler was also changed to increase the cost
associated with 0 latency dependences than can be scheduled in the
same packet.
Patch by Brendon Cahoon.
llvm-svn: 275625