Commit Graph

27928 Commits

Author SHA1 Message Date
Matt Arsenault 7464e8d6ad GlobalISel: Remove check for illegal MIR
The verifier will catch this.
2020-02-05 18:37:17 -05:00
Jonas Paulsson 96ea377ea4 [PHIElimination] Compile time optimization for huge functions.
This is a compile-time optimization for PHIElimination (splitting of critical
edges), which was reported at https://bugs.llvm.org/show_bug.cgi?id=44249. As
discussed there, the way to remedy the slowdowns with huge functions is to
pre-compute the live-in registers for each MBB in an efficient way in
PHIElimination.cpp and then pass that information along to
LiveVariabless::addNewBlock().

In all the huge test programs where this slowdown has been noticable, it has
dissapeared entirely with this patch.

Review: Björn Pettersson, Quentin Colombet.

Differential Revision: https://reviews.llvm.org/D73152
2020-02-05 18:10:03 -05:00
Matt Arsenault 9087ef0765 GlobalISel: Allow CSE of G_IMPLICIT_DEF
The legalizer produces a lot of these, and they make reading legalized
MIR annoying. For some reason, this does seem to sometimes introduce
copies of implicit def, which is dumb.
2020-02-05 17:47:21 -05:00
Shu-Chun Weng ce9633633c [GlobalISel][AArch64] Fix contract cross-bank copies with SIMD instructions
contractCrossBankCopyIntoStore() finds the instruction defines the
source register and uses its output to replace the register. There are,
however, instructions that have multiple outputs, e.g. G_UNMERGE_VALUES.
Current implementation hardcodes to operand 0 and has no way of knowing
which output should be used.

This change adds another function to directly return the register that
is the source of the register and use that for folding.

This fixes https://bugs.llvm.org/show_bug.cgi?id=44783

Differential Revision: https://reviews.llvm.org/D74005
2020-02-05 10:38:35 -08:00
Simon Pilgrim 4592bb7195 visitINSERT_VECTOR_ELT - pull out repeated dyn_cast. NFCI.
This always gets called at least once.
2020-02-05 13:30:54 +00:00
Djordje Todorovic de90d73e03 [DebugInfo] Avoid the call site param for mem instrs with multiple defs
We currently only handle mem instructions with a single define.
Avoid the call site parameter debug info when we find the case with
multiple defs, rather than throwing an assert.

Differential Revision: https://reviews.llvm.org/D73954
2020-02-05 10:03:14 +01:00
Thomas Lively 649aba93a2 Revert "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering"
Summary:
This reverts commit 3ef169e586. The
purpose of this commit was to allow stack machines to perform
instruction selection for instructions with variadic defs. However,
MachineInstrs fundamentally cannot support variadic defs right now, so
this change does not turn out to be useful.

Depends on D73927.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73928
2020-02-04 20:04:59 -08:00
David Blaikie ec50e10db4 DebugInfo: Hash DW_OP_convert in loclists when using Split DWARF
Originally committed in: 1ced28cbe7
            Reverted in: f75301d16d

(reverted due to tests failing on non-linux/x86 targets, tests have since been
generalized and specialized... since Split DWARF isn't supported on non-elf
targets anyway and we have no way to run on "whatever elf target is available"
so they fail on MacOS without an explicit target triple)

This code was incorrectly emitting extra bytes into arbitrary parts of
the object file when it was meant to be hashing them to compute the DWO
ID.

Follow-up patch(es) will refactor this API somewhat to make such bugs
harder to introduce, hopefully.
2020-02-04 19:25:47 -08:00
Francis Visoiu Mistrih 7531a5039f [Remarks] Extend the RemarkStreamer to support other emitters
This extends the RemarkStreamer to allow for other emitters (e.g.
frontends, SIL, etc.) to emit remarks through a common interface.

See changes in llvm/docs/Remarks.rst for motivation and design choices.

Differential Revision: https://reviews.llvm.org/D73676
2020-02-04 17:16:02 -08:00
Reid Kleckner 2d89e0a098 [SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.

Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.

Fixes PR44697

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D73752
2020-02-04 15:13:12 -08:00
Matt Arsenault 23b76096b7 CodeGenPrepare: Reorder check for cold and shouldOptimizeForSize
shouldOptimizeForSize is showing up in a profile, spending around 10%
of the pass time in one function. This should probably not be so slow,
but the much cheaper attribute check should be done first anyway.
2020-02-04 11:23:13 -08:00
Matt Arsenault de8451fe4d GlobalISel: Fold SmallVector resizes into constructors 2020-02-04 10:28:08 -08:00
Matt Arsenault a3c814d234 Separately track input and output denormal mode
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
2020-02-04 12:59:21 -05:00
Nico Weber f75301d16d Revert "DebugInfo: Check DW_OP_convert in loclists with Split DWARF"
and follow-ups.

This reverts commit 1ced28cbe7.
This reverts commit 4f281f0474.
This reverts commit 552a8fe12b.

The test fails on non-Linux.
2020-02-04 10:06:46 -05:00
Jeremy Morse 41206b61e3 [DebugInfo] Re-instate LiveDebugVariables scope trimming
This patch reverts part of r362750 / D62650, which stopped
LiveDebugVariables from trimming leading variable location ranges down
to only covering those instructions that are in scope. I've observed some
circumstances where the number of DBG_VALUEs in a function can be
amplified in an un-necessary way, to cover more instructions that are
out of scope, leading to very slow compile times. Trimming the range
of instructions that the variables cover solves the slow compile times.

The specific problem that r362750 tries to fix is addressed by the
assignment to RStart that I've added. Any variable location that begins
at the first instruction of a block will now be considered to begin at the
start of the block. While these sound the same, the have different
SlotIndexes, and the register allocator may shoehorn additional
instructions in between the two. The test added in the past
(wrong_debug_loc_after_regalloc.ll) still works with this modification.

live-debug-variables.ll has a range trimmed to not cover the prologue of
the function, while dbg-addr-dse.ll has a DBG_VALUE sink past one
instruction with no DebugLoc, which is expected behaviour.

Differential Revision: https://reviews.llvm.org/D73691
2020-02-04 14:51:06 +00:00
Filipe Cabecinhas abada5036e [NFC] Fix some spelling mistakes to test pushing to GH. 2020-02-04 11:07:31 +00:00
Simon Pilgrim 3dd688a9ee [DAG] OptLevelChanger - fix uninitialized variable analyzer warning (PR44471)
Ensure that OptLevelChanger::SavedFastISel is initialized in the constructor.

This should be NFC - as the equivalent 'same opt level' early-out is used in the destructor as well, so SavedFastISel is only actually referenced in the general case.

Differential Revision: https://reviews.llvm.org/D73875
2020-02-04 10:54:33 +00:00
David Green 362d00e051 [ARM][VecReduce] Force expand vector_reduce_fmin
Under MVE, we do not have any lowering for fminimum, which a
vector_reduce_fmin without NoNan will be expanded into. As with the
other recent patches, force this to expand in the pre-isel pass. Note
that Neon lowering would be OK because the scalar fminimum uses the
vector VMIN instruction, but is probably better to just rely on the
scalar operations, which is what is done here.

Also fixes what appears to be the reversal of INF vs -INF in the
vector_reduce_fmin widening code.
2020-02-04 09:36:59 +00:00
Guillaume Chatelet b8144c0536 [NFC] Encapsulate MemOp logic
Summary:
This patch simply introduces functions instead of directly accessing the fields.
This helps introducing additional check logic. A second patch will add simplifying functions.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73945
2020-02-04 10:36:26 +01:00
David Blaikie 1ced28cbe7 DebugInfo: Hash DW_OP_convert in loclists when using Split DWARF
This code was incorrectly emitting extra bytes into arbitrary parts of
the object file when it was meant to be hashing them to compute the DWO
ID.

Follow-up patch(es) will refactor this API somewhat to make such bugs
harder to introduce, hopefully.
2020-02-03 19:16:42 -08:00
David Blaikie 031f83fb82 DebugInfo: Simplify emitDebugLocEntry by never passing a null CU 2020-02-03 18:47:14 -08:00
Matt Arsenault cd7650c186 GlobalISel: Implement fewerElementsVector for G_SEXT_INREG
Start using a new strategy with a combination of merge and unmerges.

This allows scalarizing before lowering, which in cases like
<2 x s128> avoids producing giant illegal shifts.
2020-02-03 11:47:33 -08:00
Quentin Colombet f26ff8c9df [TargetRegisterInfo] Make the heuristic to skip region split overridable by the target
RegAllocGreedy uses a fairly compile time intensive splitting heuristic
called region splitting. This heuristic was disabled via another heuristic
when it is likely that it won't be worth the compile time. The only way
to control this other heuristic was via a command line option (huge-size-for-split).

This commit gives more control on this heuristic by making it overridable
by the target using a target hook in TargetRegisterInfo called
shouldRegionSplitForVirtReg.

The default implementation of this hook keeps the heuristic as it was
before this patch.
2020-02-03 11:30:35 -08:00
Simon Pilgrim 61621f826a [TargetLowering] SimplifyDemandedBits - add basic KnownBits ZEXTLoad handling
We have to be careful in SimplifyDemandedBits with loads in case we attempt to combine back to a constant (which then gets turned into a constant pool load again), but we can at least set the upper KnownBits for a ZEXTLoad to zero.
2020-02-03 16:50:04 +00:00
Guillaume Chatelet 333f2ad8b8 [Alignment][NFC] Use Align for getMemcpy/Memmove/Memset
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73885
2020-02-03 17:13:19 +01:00
Guillaume Chatelet fc19465965 [Alignment][NFC] Use Align for code creating MemOp
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73874
2020-02-03 14:10:30 +01:00
Guillaume Chatelet 75d9994a51 Fix broken invariant
Summary:
A Copy with a source that is zeros is the same as a Set of zeros.
This fixes the invariant that SrcAlign should always be non-null.

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73791
2020-02-03 11:01:05 +01:00
Fangrui Song 44cdae68c3 [CodeGenPrepare] Delete dead !DL check
Follow-up for D73754

DL is assigned in CodeGenPrepare::runOnFunction and is guaranteed to be
non-null.
2020-02-02 09:49:06 -08:00
Fangrui Song 5a56a25b0b [CodeGenPrepare] Make TargetPassConfig required
The code paths in the absence of TargetMachine, TargetLowering or
TargetRegisterInfo are poorly tested. As rL285987 said, requiring
TargetPassConfig allows us to delete many (untested) checks littered
everywhere.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D73754
2020-02-02 09:28:45 -08:00
Fangrui Song 5932f7b8f2 [PatchableFunction] Use an empty DebugLoc
The current FirstMI.getDebugLoc() is actually null in almost all cases.
If it isn't, the generated .loc will be considered initial. The .loc
will have the prologue_end flag and terminate the prologue prematurely.

Also use an overload of BuildMI that will not prepend
PATCHABLE_FUNCTION_ENTRY to a MachineInstr bundle.
2020-02-01 14:12:06 -08:00
Craig Topper 943b5561d6 [LegalizeTypes][X86] Add a new strategy for type legalizing f16 type that softens it to i16, but promotes to f32 around arithmetic ops.
This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html

The current strategy for f16 is to promote type to float every except where the specific width is required like loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang which is a storage only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG was able to put those fpext/fpround back in when it promotes.

It is also not obvious how to handle to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from.

This patch implements a different strategy where conversions are emitted directly around arithmetic operations. And otherwise its passed around as an i16 including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle.

Differential Revision: https://reviews.llvm.org/D73749
2020-02-01 11:21:04 -08:00
Matt Arsenault bc101ffd77 GlobalISel: Support widening unmerge results with pointer source 2020-02-01 10:47:03 -05:00
David Blaikie 338beff4dc DwarfDebug.cpp: Fix some indentation 2020-01-31 16:01:57 -08:00
David Blaikie b33e5f3c3e DebugInfo: Split DWARF: Hash non-member function child DIEs
Significant missing hashing - as per the comment this was only meant to
skip member functions (unspecified, but I think it's legible as member
function declarations, not definitions) but was skipping all named
subprograms (so only hashed child DIEs for member function definitions -
because they didn't have a direct name, but only a name given indirectly
in the DW_AT_specification-referenced DIE)
2020-01-31 15:32:03 -08:00
Matt Arsenault 792d9b5719 DAG: Check if a value is divergent before requiresUniformRegister
This avoids a potentially expensive scan if we already know it doesn't
matter.
2020-01-31 15:27:18 -08:00
Jay Foad f465b1aff4 [GlobalISel] Tweak lowering of G_SMULO/G_UMULO
Summary:
Applying this cleanup:

    -      MIRBuilder.buildInstr(TargetOpcode::G_ASHR)
    -        .addDef(Shifted)
    -        .addUse(Res)
    -        .addUse(ShiftAmt);
    +      MIRBuilder.buildAShr(Shifted, Res, ShiftAmt);

caused an assertion failure here:

    llc: /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:404: llvm::MachineInstr *llvm::MachineRegisterInfo::getVRegDef(unsigned int) const: Assertion `(I.atEnd() || std::next(I) == def_instr_end()) && "getVRegDef assumes a single definition or no definition"' failed.

    #4  0x00000000050a6d96 in llvm::MachineRegisterInfo::getVRegDef (this=0x74606a0, Reg=2147483650) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:403
    #5  0x00000000066148f6 in llvm::getConstantVRegValWithLookThrough (VReg=2147483650, MRI=..., LookThroughInstrs=false, HandleFConstant=true) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:244
    #6  0x00000000066147da in llvm::getConstantVRegVal (VReg=2147483650, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:210
    #7  0x0000000006615367 in llvm::ConstantFoldBinOp (Opcode=101, Op1=2147483650, Op2=2147483656, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:341
    #8  0x000000000657eee0 in llvm::CSEMIRBuilder::buildInstr (this=0x7465010, Opc=101, DstOps=..., SrcOps=..., Flag=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp:160
    #9  0x0000000003645958 in llvm::MachineIRBuilder::buildAShr (this=0x7465010, Dst=..., Src0=..., Src1=..., Flags=...) at /home/jayfoad2/git/llvm-project/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h:1298
    #10 0x00000000065c35b1 in llvm::LegalizerHelper::lower (this=0x7fffffffb5f8, MI=..., TypeIdx=0, Ty=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2020

because at this point there are two instructions defining Res: the
original G_SMULO/G_UMULO and the new G_MUL that we built. The fix is
to modify the original mul in place, so that there is only ever one
definition of Res.

Reviewers: arsenm, aditya_nandakumar

Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72842
2020-01-31 19:21:01 +00:00
Simon Pilgrim 8fbc7fd567 [DAG] SimplifyMultipleUseDemandedBits - peek through unused ISD::INSERT_SUBVECTOR subvectors
If we don't demand any elements of the inserted subvector then just skip it.
2020-01-31 18:57:22 +00:00
Simon Pilgrim 5702dadf6f [DAG] Enable ISD::INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling
This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::INSERT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
2020-01-31 18:02:34 +00:00
Hiroshi Yamauchi ac8da31a0f [PGO][PGSO] Handle MBFIWrapper
Some code gen passes use MBFIWrapper to keep track of the frequency of new
blocks. This was not taken into account and could lead to incorrect frequencies
as MBFI silently returns zero frequency for unknown/new blocks.

Add a variant for MBFIWrapper in the PGSO query interface.

Depends on D73494.
2020-01-31 09:36:55 -08:00
Jay Foad 2a1b5af299 [GlobalISel] Tidy up unnecessary calls to createGenericVirtualRegister
Summary:
As a side effect some redundant copies of constant values are removed by
CSEMIRBuilder.

Reviewers: aemerson, arsenm, dsanders, aditya_nandakumar

Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, hiraditya, jrtc27, atanasyan, volkan, Petar.Avramovic, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73789
2020-01-31 17:07:16 +00:00
Guillaume Chatelet 3c89b75f23 [NFC] Introduce a type to model memory operation
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73785
2020-01-31 17:29:01 +01:00
Quentin Colombet cfebd77742 [GISel][KnownBits] Fix a bug where we could run out of stack space
One of the exit criteria of computeKnownBits is whether we reach the max
recursive call depth. Before this patch we would check that the
depth is exactly equal to max depth to exit.

Depth may get bigger than max depth if it gets passed to a different
GISelKnownBits object.
This may happen when say a generic part uses a GISelKnownBits object
with some max depth, but then we hit TL.computeKnownBitsForTargetInstr
which creates a new GISelKnownBits object with a different and smaller
depth. In that situation, when we hit the max depth check for the first
time in the target specific GISelKnownBits object, depth may already
be bigger than the current max depth. Hence we would continue to compute
the known bits, until we ran through the full depth of the chain of
computation or ran out of stack space.

For instance, let say we have
GISelKnownBits Info(/*MaxDepth*/ = 10);
Info.getKnownBits(Foo)
// 9 recursive calls to computeKnownBitsImpl.
// Then we hit a target specific instruction.
// The target specific GISelKnownBits does this:
  GISelKnownBits TargetSpecificInfo(/*MaxDepth*/ = 6)
  TargetSpecificInfo.computeKnownBitsImpl() // <-- next max depth checks would
                                            // always return false.

This commit does not have any test case, none of the in-tree targets
use computeKnownBitsForTargetInstr.
2020-01-30 19:30:39 -08:00
Leonard Chan 2d3174c4df [SafeStack][DebugInfo] Insert DW_OP_deref in correct location
This patch addresses the issue found in https://bugs.llvm.org/show_bug.cgi?id=44585
where a DW_OP_deref was placed at the end of a dwarf expression, resulting in corrupt
symbols when debugging.

This is an attempt to reland with a few fixes for buildbot since I
haven't merged from master in a bit.

Differential Revision: https://reviews.llvm.org/D73526
2020-01-30 17:09:42 -08:00
Amara Emerson 84bd851108 [GlobalISel][IRTranslator] When translating vector geps, splat the base pointer if required.
We can have geps that have a scalar base pointer, and a vector index value, which
means that the base pointer must be splatted into a vector of pointers.

This fixes crashes on arm64 GlobalISel with optimizations enabled.
2020-01-30 16:27:27 -08:00
Leonard Chan 3b23453b6c Revert "[SafeStack][DebugInfo] Insert DW_OP_deref in correct location"
This reverts commit fff6a1b0f1.

This was breaking a bunch of buildbots.
2020-01-30 16:18:41 -08:00
Leonard Chan fff6a1b0f1 [SafeStack][DebugInfo] Insert DW_OP_deref in correct location
This patch addresses the issue found in https://bugs.llvm.org/show_bug.cgi?id=44585
where a DW_OP_deref was placed at the end of a dwarf expression, resulting in
corrupt symbols when debugging.

Differential Revision: https://reviews.llvm.org/D73526
2020-01-30 15:58:37 -08:00
Matt Arsenault eb7f74e300 CodeGen: Use Register 2020-01-30 15:01:56 -08:00
Sean Fertile 8b737688c2 [AIX] Minor cleanup in AsmPrinter. [NFC]
- Extends the comments related to function descriptors, noting how they
are only used on AIX.

- Changes the condition used to gate the creation of the current function
symbol in AsmPrinter::SetupMachineFunction to reflect being AIX
specific. The creation of the symbol is different because of AIXs
linkage conventions, not because AIX uses function descriptors.

Differential Revision: https://reviews.llvm.org/D73115
2020-01-30 14:15:02 -05:00
Fangrui Song 06b8e32d4f [AArch64] -fpatchable-function-entry=N,0: place patch label after BTI
Summary:
For -fpatchable-function-entry=N,0 -mbranch-protection=bti, after
9a24488cb6, we place the NOP sled after
the initial BTI.

```
.Lfunc_begin0:
bti c
nop
nop

.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lfunc_begin0
```

This patch adds a label after the initial BTI and changes the __patchable_function_entries entry to reference the label:

```
.Lfunc_begin0:
bti c
.Lpatch0:
nop
nop

.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lpatch0
```

This placement is compatible with the resolution in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 .

A local linkage function whose address is not taken does not need a BTI.
Placing the patch label after BTI has the advantage that code does not
need to differentiate whether the function has an initial BTI.

Reviewers: mrutland, nickdesaulniers, nsz, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73680
2020-01-30 11:11:52 -08:00
Matt Arsenault ea956685a1 GlobalISel: Implement s32->s64 G_FPTOSI lowering
Port directly from DAG version.

The lowering for G_FPTOUI used to fail on AMDGPU because it uses
G_FPTOSI.
2020-01-30 08:47:07 -05:00