We always (and only) check the NLP flag after calling
classifyGlobalReference to see whether the global is accessed
indirectly.
Refactor the code to use isGVIndirectSym instead.
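A minimal sketch of the refactor (flag and helper names as referenced
above; the call-site shape is assumed):

// Before:
//   if (Subtarget.classifyGlobalReference(GV) & PPCII::MO_NLP_FLAG) ...
// After:
//   if (Subtarget.isGVIndirectSym(GV)) ...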
llvm-svn: 372417
Summary:
Since the SPE4RC register class contains an identical set of registers
and an identical spill size to the GPRC class, it's slightly confusing
the tablegen emitter. It's preventing the GPRC_and_GPRC_NOR0 synthesized
register class from inheriting VTs and AltOrders from GPRC or GPRC_NOR0.
This is because SPE4RC is found first in the super register class list
when inheriting these properties, and it doesn't set the VTs or
AltOrders the same way as GPRC or GPRC_NOR0.
This patch replaces all uses of SPE4RC with GPRC and allows GPRC and
GPRC_NOR0 to contain f32.
The test changes here are because the AltOrders are now being inherited
by GPRC_NOR0.
Found while trying to determine if getCommonSubClass needs to take
a VT argument. It was originally added to support fp128 on x86-64;
I've changed some things about that, so it might not be needed
anymore. But a PowerPC test crashed without it, and I think that's
due to this subclass issue.
Reviewers: jhibbits, nemanjai, kbarton, hfinkel
Subscribers: wuzish, nemanjai, mehdi_amini, hiraditya, kbarton, MaskRay, dexonsmith, jsji, shchenz, steven.zhang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67513
llvm-svn: 371779
A lot of places in the code combine checks for both ABI (SVR4/Darwin/AIX) and
addressing mode (64-bit vs 32-bit). In an attempt to make some of the code more
readable, I've added a couple of functions that check for the ELF ABI and
64-bit/32-bit code at once. As we add more AIX support, I intend to add similar
functions for the AIX ABI.
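A minimal sketch of the kind of combined predicate this adds (the method
names are assumptions based on the description):

// Combined ELF-ABI-plus-word-size checks, as PPCSubtarget-style members.
bool is64BitELFABI() const { return isSVR4ABI() && isPPC64(); }
bool is32BitELFABI() const { return isSVR4ABI() && !isPPC64(); }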
Differential Revision: https://reviews.llvm.org/D65814
llvm-svn: 369658
Summary:
This clang-tidy check looks for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register, and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
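For example (getReg() does return llvm::Register; the surrounding code is
illustrative):

// Before: the Register return value implicitly converts to unsigned.
//   unsigned Reg = MI.getOperand(0).getReg();
// After: keep the stronger type.
Register Reg = MI.getOperand(0).getReg();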
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
Summary:
Since we are planning to add ADDIStocHA for 32-bit in a later patch, we decided
to rename the 64-bit one first, following the naming convention of a trailing 8
on the opcode.
Patch by: Xiangling_L
Differential Revision: https://reviews.llvm.org/D64814
llvm-svn: 366731
Summary:
(1) Function descriptor on AIX
On AIX, a called routine may have 2 distinct symbols associated with it:
* A function descriptor (Name)
* A function entry point (.Name)
The descriptor structure on AIX is the same as those in the ELF V1 ABI:
* The address of the entry point of the function.
* The TOC base address for the function.
* The environment pointer.
The descriptor symbol uses the same name as the source level function in C.
The function entry point is analogous to the symbol we would generate for a
function in a non-descriptor-based ABI, except that it is renamed by
prepending a ".".
Which symbol gets referenced depends on the context:
* Taking the address of the function references the descriptor symbol.
* Calling the function references the entry point symbol.
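Laid out as a struct, the descriptor is three pointers (field names are
mine; the layout follows the ELF V1 ABI description above):

struct FunctionDescriptor {
  void *EntryPoint;  // address of the entry point of the function (".Name")
  void *TOCBase;     // TOC base address for the function
  void *Environment; // environment pointer
};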
(2) Regarding the implementation on AIX: for a direct function call target, we
create a proper MCSymbol SDNode (e.g. ".foo") while constructing the SDAG to
replace the original TargetGlobalAddress SDNode. Then down the path, we can
take advantage of this MCSymbol.
Patch by: Xiangling_L
Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara
Differential Revision: https://reviews.llvm.org/D62532
llvm-svn: 362735
Summary:
Fast selection of LLVM fptosi/fptoui and fptrunc instructions does not handle
VSX instruction support well.
We should use the VSX float-to-integer convert instructions instead of the
non-VSX ones when the operand register class is VSSRC or VSFRC, because i32
and i64 are mapped to VSSRC and VSFRC respectively when the VSX feature is
enabled.
For the float trunc instruction, we do similar work to the float-to-integer
case to try to use a VSX instruction.
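A sketch of the resulting opcode choice for the f64-to-i64 case (the
mnemonics are real PPC opcodes; the exact condition is an assumption):

// Prefer the VSX convert when the input lives in a VSX register class.
unsigned Opc = IsVSFRC ? PPC::XSCVDPSXDS : PPC::FCTIDZ;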
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D58430
llvm-svn: 354762
Factor the register copy code into a common function, as follows.
unsigned copyRegToRegClass(const TargetRegisterClass *ToRC, unsigned SrcReg, unsigned Flag = 0, unsigned SubReg = 0);
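A hypothetical call site, copying SrcReg into the F8RC class (the register
class is real; the context is illustrative):

unsigned TmpReg = copyRegToRegClass(&PPC::F8RCRegClass, SrcReg);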
Differential Revision: https://reviews.llvm.org/D57368
llvm-svn: 352596
Move the CC analysis implementation to its own .cpp file instead of
duplicating it and artificially using functions in PPCISelLowering.cpp
and PPCFastISel.cpp. Follow-up to the same change done for X86, ARM, and
AArch64.
llvm-svn: 352444
Fast selection of LLVM icmp and fcmp instructions does not handle VSX
instruction support well.
We should use the VSX float comparison instructions instead of the non-VSX ones
when the operand register class is VSSRC or VSFRC, because i32 and i64 are
mapped to VSSRC and VSFRC respectively when the VSX feature is enabled.
If the target does not have a corresponding VSX comparison instruction for some
type, just copy the VSX-related register to the common float register class and
use the non-VSX comparison instruction.
Differential Revision: https://reviews.llvm.org/D57078
llvm-svn: 352174
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Bad machine code: Illegal virtual register for instruction
function: TestULE
basic block: %bb.0 entry (0x1000a39b158)
instruction: %2:crrc = FCMPUD %1:vsfrc, %3:f8rc
operand 1: %1:vsfrc
Fix an assert about a missing match between the fcmp instruction and the
register class.
We should use the VSX-related compare instruction xvcmpudp instead of fcmpu
when VSX is enabled.
Also add the -verify-machineinstrs option to the related test cases to enable
the verifier pass.
Differential Revision: https://reviews.llvm.org/D55686
llvm-svn: 350685
We keep a few iterators into the basic block we're selecting while
performing FastISel. Usually this is fine, but occasionally code wants
to remove already-emitted instructions. When this happens we have to be
careful to update those iterators so they're not pointing at dangling
memory.
llvm-svn: 349365
This patch should not introduce any behavior changes. It consists
mostly of one of two changes:
1. Replacing fall through comments with the LLVM_FALLTHROUGH macro
2. Inserting 'break' before falling through into a case block consisting
of only 'break'.
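A sketch of change (1) (LLVM_FALLTHROUGH is the real macro; the opcodes and
helpers here are placeholders):

switch (Opc) {
case PLACEHOLDER_OP_A:
  doExtraWork();
  LLVM_FALLTHROUGH; // explicit annotation that clang recognizes
case PLACEHOLDER_OP_B:
  doCommonWork();
  break;
}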
We were already using this warning with GCC, but its warning behaves
slightly differently. In this patch, the following differences are
relevant:
1. GCC recognizes comments that say "fall through" as annotations, clang
doesn't
2. GCC doesn't warn on "case N: foo(); default: break;", clang does
3. GCC doesn't warn when the case contains a switch, but falls through
the outer case.
I will enable the warning separately in a follow-up patch so that it can
be cleanly reverted if necessary.
Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu
Differential Revision: https://reviews.llvm.org/D53950
llvm-svn: 345882
* Delete a no-longer-used override, and mark the other
getRegisterTypeForCallingConv() as override.
* SPE only supports i32, not i64, as the internal type, so simply remove
the type check; DestReg and Opc are then provably always set.
GCC 6.4 did not warn about either of the above.
llvm-svn: 337350
Summary:
The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1,
e500v2, and several e200 cores. This adds support targeting the e500v2,
as this is more common than the e500v1, and is in SoCs still on the
market.
This patch is very intrusive because the SPE is binary incompatible with
the traditional FPU. After discussing with others, the cleanest
solution was to make both SPE and FPU features on top of a base PowerPC
subset, so all FPU instructions are now wrapped with HasFPU predicates.
Supported by this are:
* Code generation following the SPE ABI at the LLVM IR level (calling
conventions)
* Single- and Double-precision math at the level supported by the APU.
Still to do:
* Vector operations
* SPE intrinsics
As this changes the callee-saved register list order, one test that checks
the precise generated code was updated to account for the new register
order.
Reviewed by: nemanjai
Differential Revision: https://reviews.llvm.org/D44830
llvm-svn: 337347
candidates with coldcc attribute.
This recommits r322721, which was reverted due to sanitizer memory-leak
buildbot failures.
Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non-cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 323778
candidates with coldcc attribute.
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non-cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 322721
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.
* Only debug printing is affected. It now follows MIR.
Differential Revision: https://reviews.llvm.org/D40417
llvm-svn: 319187
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
IMHO it is an antipattern to have an enum value that is Default.
At any given piece of code it is not clear if we have to handle
Default or if it has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping, and it is nice
to make sure it is always done.
This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.
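A sketch of the resulting pattern (the helper shape and the chosen default
are assumptions; Optional is llvm::Optional):

// The target maps an unspecified code model to a concrete one.
static CodeModel::Model
getEffectiveCodeModel(Optional<CodeModel::Model> CM) {
  return CM ? *CM : CodeModel::Small; // default chosen by the target
}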
llvm-svn: 309911
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
aren't healthy.
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
This patch is the first in a series of patches to provide code gen for
doing compares in GPRs when the compare result is required in a GPR.
It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64
extensions. This first patch handles equality comparison on i32 operands with
the result sign or zero extended.
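As a worked sketch in C++, this first patch targets patterns like:

// icmp eq on i32 operands with the i1 result zero- or sign-extended to i32.
int isEqual(int a, int b) { return a == b; }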
Differential Revision: https://reviews.llvm.org/D31847
llvm-svn: 302810
Using arguments with attribute inalloca creates problems for verification
of the machine representation. This attribute instructs the backend that the
argument is prepared on the stack prior to the CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.html for details). The frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However, CALLSEQ_END still keeps the total frame size, as the caller
can be responsible for cleanup of the entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame sizes, and the difference is treated by
MachineVerifier as a stack error. Currently there is no way to distinguish
this case from actual errors.
This patch adds an additional argument to CALLSEQ_START and its
target-specific counterparts to keep the size of the stack that is set up
prior to the call frame sequence. This argument allows MachineVerifier to
calculate the actual frame size associated with the frame setup instruction
and correctly process the case of inalloca arguments.
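A sketch of emitting the frame setup pseudo with the new second immediate
(BuildMI/addImm are the real MachineInstr APIs; the opcode and byte counts
are placeholders):

BuildMI(MBB, MI, DL, TII->get(FrameSetupOpcode))
    .addImm(NumBytes)        // frame set up inside CALLSEQ_START..CALLSEQ_END
    .addImm(NumBytesBefore); // stack prepared before the sequence (inalloca)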
The changes made by the patch are:
- Frame setup instructions get a second mandatory argument. This
affects all targets that use frame pseudo instructions, and touched many
files although the changes are uniform.
- Access to frame properties is implemented via accessor methods
rather than calls to getOperand(N).getImm(). For X86 and ARM such a
replacement was made previously.
- Changes that reflect the appearance of the additional argument of the
frame setup instruction. These involve proper instruction initialization
and methods that access instruction arguments.
- MachineVerifier retrieves the frame size using a method that reports the
sum of the frame parts initialized inside the frame instruction pair and
outside it.
The patch implements the approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests that failed with the machine verifier enabled, listed
in PR27481.
Differential Revision: https://reviews.llvm.org/D32394
llvm-svn: 302527
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.
Differential Revision: https://reviews.llvm.org/D26594
llvm-svn: 288458
As it turns out, whether we zero-extend or sign-extend i8/i16 constants, which
are illegal types promoted to i32 on PowerPC, is a choice constrained by
assumptions within the infrastructure. Specifically, the logic in
FunctionLoweringInfo::ComputePHILiveOutRegInfo assumes that constant PHI
operands will be zero extended, and so, at least when materializing constants
that are PHI operands, we must do the same.
The rest of our fast-isel implementation does not appear to depend on the fact
that we were sign-extending i8/i16 constants, and all other targets also appear
to zero-extend small-bitwidth constants in fast-isel; we'll now do the same (we
had been doing this only for i1 constants, and sign-extending the others).
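As a worked example for an i16 constant (not from the patch itself):

int16_t C = -1;
uint32_t ZExt = (uint16_t)C; // 0x0000FFFF: what ComputePHILiveOutRegInfo assumes
uint32_t SExt = (int32_t)C;  // 0xFFFFFFFF: what we previously materialized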
Fixes PR27721.
llvm-svn: 280614
There were two locations where fast-isel would generate a LFD instruction
with a target register class VSFRC instead of F8RC when VSX was enabled.
This can cause invalid registers to be used in certain cases, like:
lfd 36, ...
instead of using a VSX load instruction. The wrong register number gets
silently truncated, causing invalid code to be generated.
The first place is PPCFastISel::PPCEmitLoad, which had multiple problems:
1.) The IsVSSRC and IsVSFRC flags are not initialized correctly, since they
are computed from resultReg, which is still zero at this point in many cases.
Fixed by changing the helper routines to operate on a register class instead
of a register and passing in UseRC.
2.) Even with this fixed, Is64VSXLoad is still wrong due to a typo:
bool Is32VSXLoad = IsVSSRC && Opc == PPC::LFS;
bool Is64VSXLoad = IsVSSRC && Opc == PPC::LFD;
The second line needs to use IsVSFRC (as PPCEmitStore does).
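That is, the corrected pair reads:

bool Is32VSXLoad = IsVSSRC && Opc == PPC::LFS;
bool Is64VSXLoad = IsVSFRC && Opc == PPC::LFD;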
3.) Once both the above are fixed, we're now generating a VSX instruction --
but an incorrect one, since generation of an indexed instruction with null
index is wrong. Fixed by copying the code handling the same issue in
PPCEmitStore.
The second place is PPCFastISel::PPCMaterializeFP, where we would emit an
LFD to load a constant from the literal pool, and use the wrong result
register class. Fixed by hardcoding a F8RC class even on systems
supporting VSX.
Fixes: https://llvm.org/bugs/show_bug.cgi?id=28630
Differential Revision: https://reviews.llvm.org/D22632
llvm-svn: 277823
This patch fixes register alignment for the long double type in
soft-float mode. Before this patch the alignment was 8; this patch
changes it to 4.
Differential Revision: http://reviews.llvm.org/D18034
llvm-svn: 268909
This is the same change on PPC64 as r255821 on AArch64. I have even borrowed
his commit message.
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.
We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.
Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by the register allocator, which will try to spill and
reload them in the initialization block.
We add CSRsViaCopy; it will be explicitly handled during lowering.
1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
supports it for the given machine function and the function has only return
exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
virtual registers at beginning of the entry block and copies from virtual
registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.
Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng
http://reviews.llvm.org/D17533
llvm-svn: 265781
PPCSimplifyAddress contains this code:
IntegerType *OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(*Context)
: Type::getInt64Ty(*Context));
to determine the type to be used for an index register, if one needs
to be created. However, the "VT" here is the type of the data being
loaded or stored, *not* the type of an address. This means that if
a data element of type i32 is accessed using an index that does not
fit into 32 bits, a wrong address is computed here.
Note that PPCFastISel is only ever used on 64-bit currently, so the type
of an address is actually *always* MVT::i64. Other parts of the code,
even in this same PPCSimplifyAddress routine, already rely on that fact.
Thus, this patch changes the code to simply unconditionally use
Type::getInt64Ty(*Context) as OffsetTy.
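That is, the code now reads simply:

IntegerType *OffsetTy = Type::getInt64Ty(*Context);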
llvm-svn: 265023
The fast isel pass currently emits a COPY_TO_REGCLASS node to convert
from a F4RC to a F8RC register class during conversion of a
floating-point number to integer. There is actually no support in the
common code instruction printers to emit COPY_TO_REGCLASS nodes, so the
PowerPC back-end has special code there to simply ignore
COPY_TO_REGCLASS.
This is correct *if and only if* the source and destination registers of
COPY_TO_REGCLASS are the same (except for the different register class).
But nothing guarantees this to be the case, and if the register
allocator does end up allocating source and destination to different
registers after all, the back-end simply generates incorrect code. I've
included a test case that shows such incorrect code generation.
However, it seems that COPY_TO_REGCLASS is actually not intended to be
used at the MI layer at all. It is used during SelectionDAG, but always
lowered to a plain COPY before emitting MI. Other back-ends' fast isel
passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong
for the PowerPC back-end to emit it here.
This patch changes the PowerPC back-end to directly emit COPY instead of
COPY_TO_REGCLASS and removes the special handling in the instruction
printers.
Differential Revision: http://reviews.llvm.org/D18605
llvm-svn: 265020
For fcmp, the major concern about the following 6 cases is the NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by the fcmpu
instruction. However, the bc instruction only inspects one of the first 3
bits, so when un is set, bc may jump to an undesired place.
More specifically, if we expect an unordered comparison and un is set, we
expect to always go to the true branch; in such a case UEQ, UGT and ULT still
give false, which is undesired; but UNE, UGE and ULE happen to give true,
since they are tested by inspecting !eq, !lt, !gt, respectively.
Similarly, for an ordered comparison, when un is set we always expect the
result to be false. In that case OGT, OLT and OEQ are fine, since they
actually test GT, LT and EQ respectively, which are false. But OGE, OLE
and ONE are tested through !lt, !gt and !eq, and those come out true.
llvm-svn: 263753
Corresponds to Phabricator review:
http://reviews.llvm.org/D16592
This fix includes both an update to how we handle the "generic" CPU on LE
systems and Anton's fix for the Fast Isel issue.
llvm-svn: 262233
Using the load immediate only when the immediate (whether signed or unsigned)
can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and
0 to 65535 for unsigned. This patch also ensures that we sign-extend under the
right conditions.
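A sketch of the resulting guard (isInt/isUInt are real LLVM helpers from
MathExtras.h; the IsSigned flag is an assumption):

// Signed immediates must fit in [-32768, 32767]; unsigned in [0, 65535].
bool UseLI = IsSigned ? isInt<16>(Imm) : isUInt<16>(Imm);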
llvm-svn: 259840
check that the sign-extended constant fits into 16 bits if we want a
zero-extended value; otherwise go ahead and put it together piecemeal.
Fixes PR26356.
llvm-svn: 259177
incorrect, as the chosen representative of the weak symbol may not live
with the code in question. Always indirect the access through the TOC
instead.
Patch by Kyle Butt!
llvm-svn: 253708