This patch is a follow up for D61391 to add lround/llround
support for float16.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62861
llvm-svn: 362698
If the given SCEVExpr has no (un)signed flags attached to it, transfer
these to the resulting instruction or use them to find an existing
instruction.
Differential Revision: https://reviews.llvm.org/D61934
llvm-svn: 362687
We already get support for G_ZEXTLOAD to s32 from the importer, but it can't
deal with the SUBREG_TO_REG in the pattern. Tweaking the existing manual
selection code for G_LOAD to handle an additional SUBREG_TO_REG when dealing
with G_ZEXTLOAD isn't much work.
Also add tests to check the imported pattern selections to s32 work.
llvm-svn: 362681
This is intended to enable the use of an immediate blend or
more optimal instruction. But if the passthru is zero we don't
need any additional instructions.
llvm-svn: 362675
When looking through copies, make sure to not try to find the vreg def of a physreg.
Normally getVRegDef will return nullptr in this case, but if there happens to be
multiple defs then it will assert.
This fixes PR42129.
llvm-svn: 362666
This forced the caller to be aware of this, which is an ugly ABI
feature.
Partially reverts r295877. The original reasons for doing this are
mostly fixed. Alloca is now in a non-0 address space, so it should be
OK to have 0 as a valid pointer. Since we treat the absolute address
as the pointer value, this part only really needed to apply to
kernels.
Since r357093, we avoid the need to increment/decrement the offset
register in more cases, and since r354816 the scavenger can fail
without spilling, so it's less critical that we try to avoid an offset
that fits in the MUBUF offset.
Restrict to callable functions for now to split this into 2 steps to
limit thte number of test updates and in case anything breaks.
llvm-svn: 362665
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).
This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.
To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:
- A new MCID flag "mayRaiseFPException" that the target should set on any
instruction that possibly can raise FP exception according to the
architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
instruction resulting from expansion of any constrained FP intrinsic.
Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).
Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.
The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.
This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.
Reviewed By: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D55506
llvm-svn: 362663
Since the beginning, the offset of a frame index has been consistently
interpreted backwards. It was treating it as an offset from the
scratch wave offset register as a frame register. The correct
interpretation is the offset from the SP on entry to the function,
before the prolog. Frame index elimination then should select either
SP or another register as an FP.
Treat the scratch wave offset on kernel entry as the pre-incremented
SP. Rely more heavily on the standard hasFP and frame pointer
elimination logic, and clean up the private reservation code. This
saves a copy in most callee functions.
The kernel prolog emission code is still kind of a mess relying on
checking the uses of physical registers, which I would prefer to
eliminate.
Currently selection directly emits MUBUF instructions, which require
using a reference to some register. Use the register chosen for SP,
and then ignore this later. This should probably be cleaned up to use
pseudos that don't refer to any specific base register until frame
index elimination.
Add a workaround for shaders using large numbers of SGPRs. I'm not
sure these cases were ever working correctly, since as far as I can
tell the logic for figuring out which SGPR is the scratch wave offset
doesn't match up with the shader input initialization in the shader
programming guide.
llvm-svn: 362661
Summary:
This change only unifies the API previous API pair accepting
CallInst and InvokeInst, thus making it easier to refactor
inliner pass ode to CallBase. The implementation of the unified
API still relies on the CallSite implementation.
Reviewers: eraman, chandlerc, jdoerfert
Reviewed By: jdoerfert
Subscribers: jdoerfert, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62283
llvm-svn: 362656
The AllConstant check needs to be moved out of the if/else if chain to
avoid a test regression. The "there is no SimplifyZExt" comment
puzzles me, since there is SimplifyCastInst. Additionally, the
Simplify* calls seem to not see the operand as constant, so this needs
to be tried if the simplify failed.
llvm-svn: 362653
One of the sources controls the pass through value for the upper bits
of the result so we can't really commute it.
In practice this problem isn't a functional issue because we would
only try to commute this instruction in order to fold a load. But
we can't do embedded rounding and fold a load at the same time. So
the load fold would never succeed so I don't think we would ever
commute or at least keep the version after commuting.
llvm-svn: 362647
When the byval attribute has a type, it must match the pointee type of
any parameter; but InstCombine was not updating the attribute when
folding casts of various kinds away.
llvm-svn: 362643
Most parts of LLVM don't care whether the byval type is derived from an
explicit Attribute or from the parameter's pointee type, so it makes
sense for the main access function to just return the right value.
The very few users who do care (only BitcodeReader so far) can find out
how it's specified by accessing the Attribute directly.
llvm-svn: 362642
The current PIC support currently only works with Emscripten, so
disable it for other targets.
This is the PIC portion of https://reviews.llvm.org/D62542.
Reviewed By: dschuff, sbc100
llvm-svn: 362638
As far as I know these should be freely reassociatable just like
the floating point MAXC/MINC instructions.
The *reduce* test changes are largely regressions and caused by
the "generic" CPU we default to not having a scheduler model.
The machine-combiner-int-vec.ll test shows the positive benefits
of this change.
Differential Revision: https://reviews.llvm.org/D62787
llvm-svn: 362629
As suggested in D62498 - collectConcatOps() matches both
concat_vectors and insert_subvector patterns, and we see
more test improvements by using the more general match.
llvm-svn: 362620
This patch fixes a regression caused by the operand reordering refactoring patch https://reviews.llvm.org/D59973 .
The fix changes the strategy to Splat instead of Opcode, if broadcast opportunities are found.
Please see the lit test for some examples.
Committed on behalf of @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D62427
llvm-svn: 362613
Instead of passing around fast-math-flags as a parameter, we can set those
using an IRBuilder guard object. This is no-functional-change-intended.
The motivation is to eventually fix the vectorizers to use and set the
correct fast-math-flags for reductions. Examples of that not behaving as
expected are:
https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast')
https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0)
D61802 (should be able to reduce with IR-level FMF)
Differential Revision: https://reviews.llvm.org/D62272
llvm-svn: 362612
This is the LLVM part of this change, the Clang part contains the full
description in its commit message.
Differential Revision: https://reviews.llvm.org/D60697
llvm-svn: 362600
We already handle the case where we combine shuffle(extractsubvector(x),extractsubvector(x)), this relaxes the requirement to permit different sources as long as they have the same value type.
This causes a couple of cases where the VPERMV3 binary shuffles occur at a wider width than before, which I intend to improve in future commits - but as only the subvector's mask indices are defined, these will broadcast so we don't see any increase in constant size.
llvm-svn: 362599
@jdoerfert Looks like these are placeholders for incoming abstract attributes patches so I've just commented the code out, even though this is usually frowned upon.
llvm-svn: 362592
This reverts commit 5b32f60ec3.
The fix is in commit 4f9e68148b.
This patch fixes the CorrelatedValuePropagation pass to keep
prof branch_weights metadata of SwitchInst consistent.
It makes use of SwitchInstProfUpdateWrapper.
New tests are added.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D62126
llvm-svn: 362583
NOTE: Note that no attributes are derived yet. This patch will not go in
alone but only with others that derive attributes. The framework is
split for review purposes.
This commit introduces the Attributor pass infrastructure and fixpoint
iteration framework. Further patches will introduce abstract attributes
into this framework.
In a nutshell, the Attributor will update instances of abstract
arguments until a fixpoint, or a "timeout", is reached. Communication
between the Attributor and the abstract attributes that are derived is
restricted to the AbstractState and AbstractAttribute interfaces.
Please see the file comment in Attributor.h for detailed information
including design decisions and typical use case. Also consider the class
documentation for Attributor, AbstractState, and AbstractAttribute.
Reviewers: chandlerc, homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes, nicholas, reames
Subscribers: mehdi_amini, mgorny, hiraditya, bollu, steven_wu, dexonsmith, dang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59918
llvm-svn: 362578
Generally speaking, we lower to an optimal rotate sequence for nodes visible in
the SDAG. However, there are instances where the two rotates are not visible at
ISEL time - most notably those in a very common sequence when lowering switch
statements to jump tables.
A common situation is a switch on a 32-bit integer. This value has to have the
upper 32 bits cleared and because jump table offsets are word offsets, the value
needs to be shifted left by 2 bits. We currently emit the clear and the left
shift as two separate instructions, but this is not needed as we can lower it to
a single RLDIC.
This patch just cleans that up.
Differential revision: https://reviews.llvm.org/D60402
llvm-svn: 362576
In glibc, DT_PPC_GOT indicates that PowerPC32 Secure PLT ABI is used.
I plan to use it in D62464.
DT_PPC_OPT currently indicates if a TLSDESC inspired TLS optimization is
enabled.
Reviewed By: grimar, jhenderson, rupprecht
Differential Revision: https://reviews.llvm.org/D62851
llvm-svn: 362569
-Use early returns to reduce indentation
-Replace multipe ifs with a switch.
-Replace an assert with an llvm_unreachable default in the switch.
-Check that the FP type we're going to use for the
X86ISD::FAND/FOR/FXOR is legal rather than checking that the
integer type matches the width of a legal scalar fp type. This all
runs after legalization so it shouldn't really matter, but making
sure we're using a valid type in the X86ISD node is really
whats important.
llvm-svn: 362565
Although we had the support in the prelegalizer combiner to generate the
G_SEXTLOAD or G_ZEXTLOAD ops, the legalizer definitions for arm64 had them as
lowering back to separate ops.
llvm-svn: 362553
Summary:
Adjusts the index and adds a bitcast around the vector operand of
EXTRACT_VECTOR_ELT so that its lane type matches the source type of
its parent sext_inreg. Without this bitcast the ISel patterns do not
match and ISel fails.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62646
llvm-svn: 362547
Summary:
An argument that is return by a function but bit-casted before can still
be annotated as "returned". Make sure we do not crash for this case.
Reviewers: sunfish, stephenwlin, niravd, arsenm
Subscribers: wdng, hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59917
llvm-svn: 362546
This patch allows current users of Value::stripPointerCasts() to force
the result of the function to have the same representation as the value
it was called on. This is useful in various cases, e.g., (non-)null
checks.
In this patch only a single call site was adjusted to fix an existing
misuse that would cause nonnull where they may be wrong. Uses in
attribute deduction and other areas, e.g., D60047, are to be expected.
For a discussion on this topic, please see [0].
[0] http://lists.llvm.org/pipermail/llvm-dev/2018-December/128423.html
Reviewers: hfinkel, arsenm, reames
Subscribers: wdng, hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61607
llvm-svn: 362545
- The loop in demangleFunctionParameterList() only exits
on Error, @, and Z. All 3 cases were handled, so the
rest of the function is DEMANGLE_UNREACHABLE.
- The loop in demangleTemplateParameterList() always returns
on Error, so there's no need to check for that in the loop
header and after the loop.
- Add test cases for invalid function parameter manglings.
- Add a (redundant) test case for a simple template parameter
list mangling.
- Add a test case pointing out that varargs functions aren't
demangled correctly.
llvm-svn: 362540
Summary:
Following the cleanup in D48202, method foldBlockIntoPredecessor has the
same behavior. Replace its uses with MergeBlockIntoPredecessor.
Remove foldBlockIntoPredecessor.
Reviewers: chandlerc, dmgreen
Subscribers: jlebar, javed.absar, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62751
llvm-svn: 362538
We already need to have patterns for X86ISD::RNDSCALE to support software intrinsics. But we currently have 5 sets of patterns for the 5 rounding operations. For of these 6 patterns we have to support 3 vectors widths, 2 element sizes, sse/vex/evex encodings, load folding, and broadcast load folding. This results in a fair amount of bytes in the isel table.
This patch adds code to PreProcessIselDAG to morph the fceil/ffloor/ftrunc/fnearbyint/frint to X86ISD::RNDSCALE. This way we can remove everything, but the intrinsic pattern while still allowing the operations to be considered Legal for DAGCombine and Legalization. This shrinks the DAGISel by somewhere between 9K and 10K.
There is one complication to this, the STRICT versions of these nodes are currently mutated to their none strict equivalents at isel time when the node is visited. This won't be true in the future since that loses the chain ordering information. For now I've also added support for the non-STRICT nodes to Select so we can change the STRICT versions there after they've been mutated to their non-STRICT versions. We'll probably need a STRICT version of RNDSCALE or something to handle this in the future. Which will take us back to needing 2 sets of patterns for strict and non-strict, but that's still better than the 11 or 12 sets of patterns we'd need.
We can probably do something similar for scalar, but I haven't looked at it yet.
Differential Revision: https://reviews.llvm.org/D62757
llvm-svn: 362535
This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove where beneficial.
Fixes PR42118
Differential Revision: https://reviews.llvm.org/D62828
llvm-svn: 362533
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3
But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.
We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is a reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.
Differential Revision: https://reviews.llvm.org/D62498
llvm-svn: 362524
Arm Architecture v8.5a introduces Branch Target Identification (BTI). When
enabled all indirect branches must target a bti instruction of the
appropriate form. As PLT sequences may sometimes be the target of an
indirect branch and PLT[0] always is, a static linker may need to generate
PLT sequences that contain "bti c" as the first instruction. In effect:
bti c
adrp x16, page offset to .got.plt
...
Instead of:
adrp x16, page offset to .got.plt
...
At present the PLT decoding assumes the adrp will always be the first
instruction. This patch adds support for a single "bti c" to prefix it. A
test binary has been uploaded with such a PLT sequence. A forthcoming LLD
patch will make heavy use of the PLT decoding code.
Differential Revision: https://reviews.llvm.org/D62598
llvm-svn: 362523
- For error returns in demangleSpecialTableNode(),
demangleLocalStaticGuard(), RTTITypeDescriptor,
demangleRttiBaseClassDescriptorNode(), demangleUnsigned(),
demangleUntypedVariable() (via RttiBaseClassArray)
- For ?_A and ?_P which are handled at early levels of the
demangler but are not implemented in a later stage; this
is now more obvious
- Replace a "default:" with an explicit list of cases, to
get -Wswitch check we list all cases
llvm-svn: 362520
The underlying ConstantRange functionality has been added in D60952,
D61207 and D61238, this just exposes it for LVI.
I'm switching the code from using a whitelist to a blacklist, as
we're down to one unsupported operation here (xor) and writing it
this way seems more obvious :)
Differential Revision: https://reviews.llvm.org/D62822
llvm-svn: 362519
- Add test coverage around invalid anon namespaces and
for error paths in demanglePrimitiveType() and in
demangleFullyQualifiedTypeName()
- Use DEMANGLE_UNREACHABLE in two more unreachable places
llvm-svn: 362514
This is to address some of the problems in existing P9 resource modeling,
especially about the dispatching rules.
Instead of using a hypothetical DISPATCHER , we try to use the number of
actual dispatch slots, and define SchedWriteRes to model dispatch rules,
then update instruction classes according to dispatch rules.
All the dispatch rules and instruction classes update are made according
to POWER9 User Manual.
Differential Revision: https://reviews.llvm.org/D61873
llvm-svn: 362509
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.
Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.
llvm-svn: 362507
- Replace `Error = true` in a few branches that are truly unreachable
with DEMANGLE_UNREACHABLE
- Remove early return early in startsWithLocalScopePattern() because
it's redundant with the next two early returns
- Remove unreachable `case '0'` (it's handled in the branch below)
- Remove an unused bool return
- Add test coverage for several early error returns, mostly in
array type parsing
llvm-svn: 362506
As discussed on D62777 - we should be able to use this in more SSE41+ cases as well but that requires us to separate it from the OR(AND(),ANDN()) matcher.
llvm-svn: 362504
This matches APInt's versions of these functions, and there is no need for these to be size_t.
(as well as __builtin_clzll())
Differential Revision: https://reviews.llvm.org/D60823
llvm-svn: 362503
ELF for the 64-bit Arm Architecture defines two processor-specific dynamic
tags:
DT_AARCH64_BTI_PLT 0x70000001, d_val
DT_AARCH64_PAC_PLT 0x70000003, d_val
These presence of these tags indicate that PLT sequences have been
protected using Branch Target Identification and Pointer Authentication
respectively. The presence of both indicates that the PLT sequences have
been protected with both Branch Target Identification and Pointer
Authentication.
This patch adds the tags and tests for llvm-readobj and yaml2obj.
As some of the processor specific dynamic tags overlap, this patch splits
them up, keeping their original default value if they were not previously
mentioned explicitly in a switch case.
Differential Revision: https://reviews.llvm.org/D62596
llvm-svn: 362493
Summary:
This *might* be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet.
As far as i can tell, there are no regressions here (ignoring x86-32),
all changes are either good or neutral.
This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`)
`@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/vMd3
Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma
Reviewed By: RKSimon
Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62774
llvm-svn: 362488
As I mentioned on D61887 we don't get many hits on ComputeNumSignBits as we did on computeKnownBits.
The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select.
It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup.
Differential Revision: https://reviews.llvm.org/D62777
llvm-svn: 362486
Includes a fix for an introduced build failure due to a post c++11 use of std::mismatch.
This fixes some thin archive relative path issues, paths are shortened where possible and paths are output correctly when using the display table command.
Differential Revision: https://reviews.llvm.org/D59491
llvm-svn: 362484
This change adds two FP16 extraction and two insertion patterns
(one per possible vector length).
Extractions are handled by copying a Q/D register into one of VFP2
class registers, where single FP32 sub-registers can be accessed. Then
the extraction of even lanes are simple sub-register extractions
(because we don't care about the top parts of registers for FP16
operations). Odd lanes need an additional VMOVX instruction.
Unfortunately, insertions cannot be handled in the same way, because:
* There is no instruction to insert FP16 into an even lane (VINS only
works with odd lanes)
* The patterns for odd lanes will have a form of a DAG (not a tree),
and will not be implementable in pure tablegen
Because of this insertions are handled in the same way as 16-bit
integer insertions (with conversions between FP registers and GPRs
using VMOVHR instructions).
Without these patterns the ARM backend would sometimes fail during
instruction selection.
This patch also adds patterns which combine:
* an FP16 element extraction and a store into a single VST1
instruction
* an FP16 load and insertion into a single VLD1 instruction
Differential Revision: https://reviews.llvm.org/D62651
llvm-svn: 362482
While prof branch_weights inconsistencies are being fixed patch
by patch (pass by pass) we need SwitchInstProfUpdateWrapper to
be safe with respect to inconsistent metadata that can come from
passes that have not been fixed yet. See the bug found by @nikic
in https://reviews.llvm.org/D62126.
This patch introduces one more state (called Invalid) to the
wrapper class that allows users to work with the underlying
SwitchInst ignoring the prof metadata changes.
Created a unit test for the SwitchInstProfUpdateWrapper class.
Reviewers: davidx, nikic, eraman, reames, chandlerc
Reviewed By: davidx
Differential Revision: https://reviews.llvm.org/D62656
llvm-svn: 362473
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D61843
llvm-svn: 362472
The family of 32-bit Thumb instruction encodings that include t2ORR,
t2AND and t2EOR are all listed in the ArmARM as having (0) in bit 15.
The Tablegen descriptions of those instructions listed them as ?. This
change tightens that up by making them into 0 + Unpredictable.
In the specific case of t2ORR, we tighten it up still further by
making the zero bit mandatory. This change comes from Arm v8.1-M, in
which encodings with that bit equal to 1 will now be used for
different instructions.
Reviewers: dmgreen, samparker, SjoerdMeijer, efriedma
Reviewed By: dmgreen, efriedma
Subscribers: efriedma, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60705
llvm-svn: 362470
This is something of a workaround, and the state of stack realignment
controls is kind of a mess. Ideally, we would be able to specify the
stack is infinitely aligned on entry to a kernel.
TargetFrameLowering provides multiple controls which apply at
different points. The StackRealignable field is used during
SelectionDAG, and for some reason distinct from this
hook. StackAlignment is a single field not dependent on the
function. It would probably be better to make that dependent on the
calling convention, and the maximum value for kernels.
Currently this doesn't really change anything, since the frame
lowering mostly does its own thing. This helps avoid regressions in a
future change which will rely more heavily on hasFP.
llvm-svn: 362447
Instead of emitting all of the test stuff for a compare when it's only used by
a select, instead, just emit the compare + select. The select will use the
value of NZCV correctly, so we don't need to emit all of the test instructions
etc.
For now, only support fp selects which use G_FCMP. Also only support condition
codes which will only require one select to represent.
Also add a test.
Differential Revision: https://reviews.llvm.org/D62695
llvm-svn: 362446
This looks like an oversight as all the other binary operators are present.
Accidentally noticed while auditing places that need FNeg handling.
No test because as noted in the review it would be contrived and amount to "don't crash"
Differential Revision: https://reviews.llvm.org/D62790
llvm-svn: 362441
r362199 fixed it for zero masking, but not zero masking. The load
folding in the peephole pass hid the bug. This patch turns off
the peephole pass on the relevant test to ensure coverage.
llvm-svn: 362440
Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG.
Reviewers: qcolombet, spatel
Reviewed By: qcolombet
Subscribers: nemanjai, jsji
Differential Revision: https://reviews.llvm.org/D62552
llvm-svn: 362439
We currently miss the opportunities for optmizing comparisons in the peephole
optimizer if the input is the result of a COPY since we look for record-form
versions of the producing instruction.
This patch simply lets the optimization peek through copies.
Differential revision: https://reviews.llvm.org/D59633
llvm-svn: 362438
For some reason multiple places need to do this, and the variant the
loop unroller and inliner use was not handling it.
Also, introduce a new wrapper to be slightly more precise, since on
AMDGPU some addrspacecasts are free, but not no-ops.
llvm-svn: 362436
The implementation is conceptually simple: We separate the LHS and
RHS into positive and negative components and then also compute the
positive and negative components of the result, taking into account
that e.g. only pos/pos and neg/neg will give a positive result.
However, there's one significant complication: SignedMin / -1 is UB
for sdiv, and we can't just ignore it, because the APInt result of
SignedMin would break the sign segregation. Instead we drop SignedMin
or -1 from the corresponding ranges, taking into account some edge
cases with wrapped ranges.
Because of the sign segregation, the implementation ends up being
nearly fully precise even for wrapped ranges (the remaining
imprecision is due to ranges that are both signed and unsigned
wrapping and are divided by a trivial divisor like 1). This means
that the testing cannot just check the signed envelope as we
usually do. Instead we collect all possible results in a bitvector
and construct a better sign wrapped range (than the full envelope).
Differential Revision: https://reviews.llvm.org/D61238
llvm-svn: 362430
This patch fixes a problem that occurs in LowerSwitch when a switch statement has a PHI node as its condition, and the PHI node only has two incoming blocks, and one of those incoming blocks is through an unreachable default in the switch statement. When this condition occurs, LowerSwitch holds a pointer to the condition value, but removes the switch block as a predecessor of the PHI block, causing the PHI node to be replaced. LowerSwitch then tries to use its stale pointer to the original condition value, causing a crash.
Differential Revision: https://reviews.llvm.org/D62560
llvm-svn: 362427
This reverts commit r362407. It broke compilation of
llvm/lib/Object/ArchiveWriter.cpp:
error: type 'llvm::sys::path::const_iterator' does not provide a call
operator
llvm-svn: 362413
The big endian PPC buildbots are all failing now due to calls to cache
invalidation in unit tests on data that has only the PROT_EXEC flag set.
This has been an issue all along on FreeBSD but it can affect Linux machines
depending on configuration.
This patch mitigates the issue the same way it is mitigated on FreeBSD.
Since this is needed to bring the buildbots back to green, I plan to commit this
and allow for post-commit review, but I thought I would also post it here for
ease of access/readability.
Differential revision: https://reviews.llvm.org/D62741
llvm-svn: 362412
This fixes some thin archive relative path issues, paths are shortened where possible and paths are output correctly when using the display table command.
Differential Revision: https://reviews.llvm.org/D59491
llvm-svn: 362407
We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction
Differential Revision: https://reviews.llvm.org/D62807
llvm-svn: 362397
LanaiMCCodeEmitter.cpp was not using any APIs from Lanai.h, and was only
including it for transitive dependencies. Doing so is problematic from
include-what-you-use perspective, but it is also a layering issue (it
creates a dependency cycle between the primary Lanai target library and
the MCTargetDesc library).
llvm-svn: 362394
HexagonInstPrinter.cpp was not using any APIs from HexagonAsmPrinter.h.
Doing so is problematic from include-what-you-use perspective, but it is
also a layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362389
HexagonMCInstrInfo.cpp was not using any APIs from Hexagon.h. Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362387
HexagonMCCodeEmitter.cpp was not using any APIs from Hexagon.h. Doing
so is problematic from include-what-you-use perspective, but it is also
a layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362386
HexagonMCCompound.cpp was not using any APIs from Hexagon.h. Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362385
HexagonShuffler.cpp was not using any APIs from Hexagon.h, and was only
including it for transitive dependencies. Doing so is problematic from
include-what-you-use perspective, but it is also a layering issue (it
creates a dependency cycle between the primary Hexagon target library
and the MCTargetDesc library).
llvm-svn: 362384
HexagonMCChecker.cpp was not using any APIs from Hexagon.h. Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362383
HexagonMCTargetDesc.cpp was not using any APIs from Hexagon.h. Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362382
HexagonMCShuffler.cpp was not using any APIs from Hexagon.h. Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362381
The recent change D60691 introduced a bug in clang when handling
option combinations such as `-mcpu=cortex-m4 -mfpu=none`. Those
options together should select Cortex-M4 but disable all use of
hardware FP, but in fact, now hardware FP instructions can still be
generated in that mode.
The reason is because the handling of FPUVersion::NONE disables all
the same feature names it used to, of which the base one is `vfp2`.
But now there are further features below that, like `vfp2d16fp` and
(following D60694) `fpregs`, which also need to be turned off to
disable hardware FP completely.
Added a tiny test which double-checks that compiling a simple FP
function doesn't access the FP registers.
Reviewers: SjoerdMeijer, dmgreen
Reviewed By: dmgreen
Subscribers: lebedev.ri, javed.absar, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D62729
llvm-svn: 362380
HexagonELFObjectWriter.cpp was not using any APIs from Hexagon.h, and
was only including it for transitive dependencies. Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362376
When LiveDebugValues deduces new variable's location from spill, restore or
register copy instruction it should close old variable's location. Otherwise
we can have multiple block output locations for same variable. That could lead
to inserting two DBG_VALUEs for same variable to the beginning of the successor
block which results to ignoring of first DBG_VALUE.
Reviewers: aprantl, jmorse, wolfgangp, dstenb
Reviewed By: aprantl
Subscribers: probinson, asowda, ivanbaev, petarj, djtodoro
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D62196
llvm-svn: 362373
HexagonAsmBackend.cpp was not using any APIs from Hexagon.h. Doing so
is problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the MCTargetDesc library).
llvm-svn: 362372
HexagonAsmParser.cpp was not using any APIs from Hexagon.h. Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
Hexagon target library and the AsmParser library).
llvm-svn: 362370
HexagonShuffler.h was not using any APIs from Hexagon.h, and was only
including it for transitive dependencies. Doing so is problematic from
include-what-you-use perspective, but it is also a layering issue (it
creates a dependency cycle between the primary Hexagon target library
and the MCTargetDesc library).
llvm-svn: 362369
BPFMCTargetDesc.cpp was not using any APIs from BPF.h. Doing so is
problematic from include-what-you-use perspective, but it is also a
layering issue (it creates a dependency cycle between the primary
BPF target library and the MCTargetDesc library).
llvm-svn: 362368
Summary:
- pr42062
When compiling for MinSize,
ARMTargetLowering::LowerCall decides to indirect
multiple calls to a same function. However,
it disconsiders the limitation that thumb1
indirect calls require the callee to be in a
register from r0 to r3 (llvm limiation).
If all those registers are used by arguments, the
compiler dies with "error: run out of registers
during register allocation".
This patch tells the function
IsEligibleForTailCallOptimization if we intend to
perform indirect calls, as to avoid tail call
optimization.
Reviewers: dmgreen, efriedma
Reviewed By: efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62683
llvm-svn: 362366
Summary:
LDWRdPtr would be expanded to ld+ldd. ldd only accepts the pointer register is Y or Z.
So the register class of pointer of LDWRdPtr should be PTRDISPREGS instead of PTRREGS.
Reviewers: dylanmckay
Reviewed By: dylanmckay
Subscribers: dylanmckay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62300
llvm-svn: 362351
If we hit the limit, we do expand the outstanding tokenfactors.
Otherwise, we might drop nodes with users in the unexpanded
tokenfactors. This fixes the crashes reported by Jordan Rupprecht.
Reviewers: niravd, spatel, craig.topper, rupprecht
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D62633
llvm-svn: 362350
Also add two FC_Far that seem to be missing, by symmetry from
the public and protected cases. (But FC_Far isn't really a thing
anymore, so this doesn't really have an observable effect.)
llvm-svn: 362344
Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize.
Differential Revision: https://reviews.llvm.org/D59188
llvm-svn: 362324
Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot.
PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it on individual cases where its safe.
Differential Revision: https://reviews.llvm.org/D62783
llvm-svn: 362323
The results of the dyn_casts were immediately dereferenced on the next line
so they had better not be null.
I don't think there's any way for these dyn_casts to fail, so use a cast
of adding null check.
llvm-svn: 362315
Over a year ago, MachineInstr gained a fourth boolean parameter that occurs
before the TII pointer. When this happened, several places started accidentally
passing TII into this boolean parameter instead of the TII parameter.
llvm-svn: 362312
Extract a willNotOverflow() helper function that is shared between
eliminateOverflowIntrinsic() and strengthenOverflowingOperation().
Use WithOverflowInst for the former.
We'll be able to reuse the same code for saturating intrinsics as
well.
llvm-svn: 362305
Summary: Fneg can be implemented with an xor rather than a function call so we don't need to add the function call overhead. This was pointed out in D62699
Reviewers: efriedma, cameron.mcinally
Reviewed By: efriedma
Subscribers: javed.absar, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62747
llvm-svn: 362304
The `cfcmsa` and `ctcmsa` instructions accept index of MSA control
register. The MIPS64 SIMD Architecture define eight MSA control
registers. But register index for `cfcmsa` and `ctcmsa` instructions
might be any number in 0..31 range. If the index is greater then 7,
`cfcmsa` writes zero to the destination registers and `ctcmsa` does
nothing [1].
[1] MIPS Architecture for Programmers Volume IV-j:
The MIPS64 SIMD Architecture Module
https://www.mips.com/?do-download=the-mips64-simd-architecture-module
Differential Revision: https://reviews.llvm.org/D62597
llvm-svn: 362299
If we would allow register coalescing on PTRDISPREGS class then register
allocator can lock Z register to some virtual register. Larger instructions
requiring a memory acces then fail during the register allocation phase since
there is no available register to hold a pointer if Y register was already
taken for a stack frame. This patch prevents it by keeping Z register
spillable. It does it by not allowing coalescer to lock it.
Original discussion on https://github.com/avr-rust/rust/issues/128.
llvm-svn: 362298
Fix for https://bugs.llvm.org/show_bug.cgi?id=31181 and partial fix
for LFTR poison handling issues in general.
When LFTR moves a condition from pre-inc to post-inc, it may now
depend on value that is poison due to nowrap flags. To avoid this,
we clear any nowrap flag that SCEV cannot prove for the post-inc
addrec.
Additionally, LFTR may switch to a different IV that is dynamically
dead and as such may be arbitrarily poison. This patch will correct
nowrap flags in some but not all cases where this happens. This is
related to the adoption of IR nowrap flags for the pre-inc addrec.
(See some of the switch_to_different_iv tests, where flags are not
dropped or insufficiently dropped.)
Finally, there are likely similar issues with the handling of GEP
inbounds, but we don't have a test case for this yet.
Differential Revision: https://reviews.llvm.org/D60935
llvm-svn: 362292
This allows the DWARFExpression class to handle addresses without
crashing on targets with 16-bit pointers like AVR.
This is required in order to generate assembly from clang via the '-S'
flag.
This fixes an error with the following message:
clang: llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h:132: llvm::DWARFExpression::DWARFExpression(llvm::DataExtractor, uint16_t, uint8_t):
Assertion `AddressSize == 8 || AddressSize == 4' failed.
llvm-svn: 362290
Summary:
This was flagged in https://www.viva64.com/en/b/0629/ under "Snippet No.
33".
It seems that this statement is doing the standard bitwise trick for
adjusting a value to have a specific alignment.
The issue is that getStubAlignment() returns an unsigned, while DataSize
is declared a uint64_t. The right hand side of the expression is not
extended to 64b before bitwise negation, resulting in the top half of
the mask being 0s, which is not correct for realignment.
Reviewers: lhames, MaskRay
Reviewed By: MaskRay
Subscribers: RKSimon, MaskRay, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62227
llvm-svn: 362286
At the moment, LoopPredication completely bails out if it sees a latch of the form:
%cmp = icmp ne %iv, %N
br i1 %cmp, label %loop, label %exit
OR
%cmp = icmp ne %iv.next, %NPlus1
br i1 %cmp, label %loop, label %exit
This is unfortunate since this is exactly the form that LFTR likes to produce. So, go ahead and recognize simple cases where we can.
For pre-increment loops, we leverage the fact that LFTR likes canonical counters (i.e. those starting at zero) and a (presumed) range fact on RHS to discharge the check trivially.
For post-increment forms, the key insight is in remembering that LFTR had to insert a (N+1) for the RHS. CVP can hopefully prove that add nsw/nuw (if there's appropriate range on N to start with). This leaves us both with the post-inc IV and the RHS involving an nsw/nuw add, and SCEV can discharge that with no problem.
This does still need to be extended to handle non-one steps, or other harder patterns of variable (but range restricted) starting values. That'll come later.
Differential Revision: https://reviews.llvm.org/D62748
llvm-svn: 362282
We were hashing the string pointer, not the string, so two instructions
could be identical (isIdenticalTo), but have different hash codes.
This showed up as a very rare, non-deterministic assertion failure
rehashing a DenseMap constructed by MachineOutliner. So there's no
"real" testcase, just a unittest which checks that the hash function
behaves correctly.
I'm a little scared fixing this is going to cause a regression in
outlining or MachineCSE, but hopefully we won't run into any issues.
Differential Revision: https://reviews.llvm.org/D61975
llvm-svn: 362281
CodeView has its own register map which is defined in cvconst.h. Missing this
mapping before saving register to CodeView causes debugger to show incorrect
value for all register based variables, like variables in register and local
variables addressed by register (stack pointer + offset).
This change added mapping between LLVM register and CodeView register so the
correct register number will be stored to CodeView/PDB, it aso fixed the
mapping from CodeView register number to register name based on current
CPUType but print PDB to yaml still assumes X86 CPU and needs to be fixed.
Differential Revision: https://reviews.llvm.org/D62608
llvm-svn: 362280
Summary:
It looks like since INLINEASM_BR was created off of INLINEASM (r353563),
a few checks for INLINEASM needed to be updated to check for either
case.
pr/41999
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: nemanjai, hiraditya, kbarton, jsji, llvm-commits, craig.topper, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62403
llvm-svn: 362278
Testing with debuggers shows that our previous behavior was correct.
The reason I thought MSVC did things differently is that MSVC prefers to
use the 0xB combined code offset and code length update opcode when
inline sites are discontiguous.
Keep the test changes, and update the llvm-pdbutil inline line table
dumper to account for this new interpretation of the opcodes.
llvm-svn: 362277
When the object size argument is -1, no checking can be done, so calling the
_chk variant is unnecessary. We already did this for a bunch of these
functions.
rdar://50797197
Differential revision: https://reviews.llvm.org/D62358
llvm-svn: 362272
Just copy all of the operands except the chain and call MorphNode on that.
This removes the IsUnary and IsTernary flags.
Also always get the result type from the result type of the original
nodes. Previously we got it from the operand except for two nodes
where that didn't work.
llvm-svn: 362269
Summary:
Fixes a warning produced from scan-build (llvm.org/reports/scan-build/),
further warnings found by annotation isMoveInstr [[nodiscard]].
isMoveInstr potentially does not assign to its parameters, so if they
were uninitialized, they will potentially stay uninitialized. It seems
most call sites pass references to uninitialized values, then use them
without checking the return value.
Reviewers: wmi
Reviewed By: wmi
Subscribers: MatzeB, qcolombet, hiraditya, tpr, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62109
llvm-svn: 362265
After improving the inline line table dumper in llvm-pdbutil and looking
at MSVC's inline line tables, it is clear that setting the length of the
inlined code region does not update the code offset. This means that the
delta to the beginning of a new discontiguous inlined code region should
be calculated relative to the last code offset, excluding the length.
Implementing this is a one line fix for MC: simply don't update
LastLabel.
While I'm updating these test cases, switch them to use llvm-objdump -d
and llvm-pdbutil. This allows us to show offsets of each instruction and
correlate the line table offsets to the actual code.
llvm-svn: 362264
If we can determine that a saturating add/sub will not overflow based
on range analysis, convert it into a simple binary operation. This is
a sibling transform to the existing with.overflow handling.
Reapplying this with an additional check that the saturating intrinsic
has integer type, as LVI currently does not support vector types.
Differential Revision: https://reviews.llvm.org/D62703
llvm-svn: 362263
Noticed on D62703. LVI only handles plain integers, not vectors of
integers. This was previously not an issue, because vector support
for with.overflow is only a relatively recent addition.
llvm-svn: 362261
We don't want to create vregs if there is nothing to use them for. That causes
verifier errors.
Differential Revision: https://reviews.llvm.org/D62740
llvm-svn: 362247
If we can determine that a saturating add/sub will not overflow
based on range analysis, convert it into a simple binary operation.
This is a sibling transform to the existing with.overflow handling.
Differential Revision: https://reviews.llvm.org/D62703
llvm-svn: 362242
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Cameron McInally, Kevin P. Neal
Approved by: Cameron McInally
Differential Revision: https://reviews.llvm.org/D62546
llvm-svn: 362241
AMDGPU uses multiplier 9 for the inline cost. It is taken into account
everywhere except for inline hint threshold. As a result we are penalizing
functions with the inline hint making them less probable to be inlined
than those without the hint. Defaults are 225 for a normal function and
325 for a function with an inline hint. Currently we have effective
threshold 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. That is fixed by this patch.
Differential Revision: https://reviews.llvm.org/D62707
llvm-svn: 362239
In PPCReduceCRLogicals after splitting the original MBB into 2, the 2 impacted branches still use original branch probability. This is unreasonable. Suppose we have following code, and the probability of each successor is 50%.
condc = conda || condb
br condc, label %target, label %fallthrough
It can be transformed to following,
br conda, label %target, label %newbb
newbb:
br condb, label %target, label %fallthrough
Since each branch has a probability of 50% to each successor, the total probability to %fallthrough is 25% now, and the total probability to %target is 75%. This actually changed the original profiling data. A more reasonable probability can be set to 70% to the false side for each branch instruction, so the total probability to %fallthrough is close to 50%.
This patch assumes the branch target with two incoming edges have same edge frequency and computes new probability fore each target, and keep the total probability to original targets unchanged.
Differential Revision: https://reviews.llvm.org/D62430
llvm-svn: 362237
These can take a significant amount of time in some builds.
Suggested by Andrea Di Biagio.
Differential Revision: https://reviews.llvm.org/D62666
llvm-svn: 362219
It looks this fold was already partially happening, indirectly
via some other folds, but with one-use limitation.
No other fold here has that restriction.
https://rise4fun.com/Alive/ftR
llvm-svn: 362217
Summary:
A three sources variant of the TBL instruction is added to the existing
SVE instruction in SVE2. This is implemented with minor changes to the
existing TableGen class. TBX is a new instruction with its own
definition.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62600
llvm-svn: 362214
Handle position independent code for MIPS32.
When callee is global address, lower call will emit callee
as G_GLOBAL_VALUE and add target flag if needed.
Support $gp in getRegBankFromRegClass().
Select G_GLOBAL_VALUE, specially handle case when
there are target flags attached by lowerCall.
Differential Revision: https://reviews.llvm.org/D62589
llvm-svn: 362210
Move initGlobalBaseReg from MipsSEDAGToDAGISel to MipsFunctionInfo.
This way functions used for handling position independent code during
instruction selection, getGlobalBaseReg and initGlobalBaseReg,
end up in same class.
Differential Revision: https://reviews.llvm.org/D62586
llvm-svn: 362206
Lower call for callee that is register for MIPS32.
Register should contain callee function address.
Differential Revision: https://reviews.llvm.org/D62585
llvm-svn: 362204