Commit Graph

27514 Commits

Author SHA1 Message Date
Kerry McLaughlin c0a3ab3655 Revert "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
This reverts commit 3f5bf35f86 as it was
causing build failures in llvm-clang-x86_64-expensive-checks:

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045
2019-12-11 13:58:39 +00:00
Kerry McLaughlin 3f5bf35f86 [AArch64][SVE] Implement intrinsics for non-temporal loads & stores
Summary:
Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.

Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71000
2019-12-11 11:13:51 +00:00
Sjoerd Meijer d97cf1f889 [ARM][LowOverheadLoops] Remove dead loop update instructions.
After creating a low-overhead loop, the loop update instruction was still
lingering around hurting performance. This removes dead loop update
instructions, which in our case are mostly SUBS instructions.

To support this, some helper functions were added to MachineLoopUtils and
ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses
before a particular loop instruction, respectively.

This is a first version that removes a SUBS instruction when there are no other
uses inside and outside the loop block, but there are some more interesting
cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which
shows that there is room for improvement. For example, we can't handle this
case yet:

    ..
    dlstp.32  lr, r2
  .LBB0_1:
    mov r3, r2
    subs  r2, #4
    vldrh.u32 q2, [r1], #8
    vmov  q1, q0
    vmla.u32  q0, q2, r0
    letp  lr, .LBB0_1
  @ %bb.2:
    vctp.32 r3
    ..

which is a lot more tricky because r2 is not only used by the subs, but also by
the mov to r3, which is used outside the low-overhead loop by the vctp
instruction, and that requires a bit of a different approach, and I will follow
up on this.

Differential Revision: https://reviews.llvm.org/D71007
2019-12-11 10:20:19 +00:00
Sam Parker ee7579409b [ARM][TypePromotion] Enable by default
Enable the TypePromotion pass my default (again).

This patch was originally committed in 393dacacf7.
This patch was reverted in a38396939c.

Differential Revision: https://reviews.llvm.org/D70998
2019-12-11 10:00:16 +00:00
shkzhang 1408e7e175 [PowerPC] [CodeGen] Use MachineBranchProbabilityInfo in EarlyIfPredicator to avoid the potential bug
Summary:
In the function `EarlyIfPredicator::shouldConvertIf()`, we call
`TII->isProfitableToIfCvt()` with `BranchProbability::getUnknown()`, it may
cause the potential assertion error for those hook which use `BranchProbability`
in `isProfitableToIfCvt()`, for example `SystemZ`.
`SystemZ` use `Probability < BranchProbability(1, 8))` in the function
`SystemZInstrInfo::isProfitableToIfCvt()`, if we call this function with
`BranchProbability::getUnknown()`, it will cause assertion error.

This patch is to fix the potential bug.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D71273
2019-12-11 04:46:00 -05:00
Florian Hahn 11f311875f [LiveRegUnits] Add phys_regs_and_masks iterator range (NFC).
This iterator range just includes physical registers and register masks,
which are interesting when dealing with register liveness.

Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D70562
2019-12-11 09:34:42 +00:00
Craig Topper d4345636e6 [LegalizeTypes] Remove manual worklist management from SoftenFloatRes_FP_EXTEND.
I think this is no longer needed. The system should take care
of legalizing any new nodes that are added. I think this might
have been needed prior to r371709 or r307053.
2019-12-10 22:33:31 -08:00
Nico Weber caa4120906 Revert "[DebugInfo] Refactored macro related generation, added a test case for macinfo.dwo emission."
This reverts commit 307f60a1a3.

DebugInfo/X86/debug-macinfo-split-dwarf.ll fails on Windows:

Command Output (stdout):
--
$ ":" "RUN: at line 1"
$ "c:\src\llvm-project\out\gn\bin\llc.exe" "-mtriple=x86_64-pc-windows-gnu" "-O0" "-split-dwarf-file=foo.dwo" "-filetype=obj"
Assertion failed: Section && "Cannot switch to a null section!", file ../../llvm/lib/MC/MCStreamer.cpp, line 1103
Stack dump:
0.	Program arguments: c:\src\llvm-project\out\gn\bin\llc.exe -mtriple=x86_64-pc-windows-gnu -O0 -split-dwarf-file=foo.dwo -filetype=obj
2019-12-10 21:32:30 -05:00
Puyan Lotfi f364686f34 [llvm][MIRVRegNamerUtil] Adding hashing against MachineInstr flags.
Now, flags will result in differing hashes for a given MI. In effect, if
you have two instructions with everything identical except for their
flags then you should get two different hashes and fewer collisions.

Differential Revision: https://reviews.llvm.org/D70479
2019-12-10 20:16:14 -05:00
Wang, Pengfei 21bc8631fe [FPEnv][X86] Constrained FCmp intrinsics enabling on X86
Summary: This is a follow up of D69281, it enables the X86 backend support for the FP comparision.

Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70582
2019-12-11 08:23:09 +08:00
David Blaikie 4ffd3f44e3 DebugInfo: Clarify some more reasons v4 loc.dwo can't share much implementation with loclists.dwo 2019-12-10 14:11:03 -08:00
Vedant Kumar 30038da15b [DWARF] Allow cross-CU references of subprogram definitions
This allows a call site tag in CU A to reference a callee DIE in CU B
without resorting to creating an incomplete duplicate DIE for the callee
inside of CU A.

We already allow cross-CU references of subprogram declarations, so it
doesn't seem like definitions ought to be special.

This improves entry value evaluation and tail call frame synthesis in
the LTO setting. During LTO, it's common for cross-module inlining to
produce a call in some CU A where the callee resides in a different CU,
and there is no declaration subprogram for the callee anywhere. In this
case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram
in order to fill in the call site tag. That empty 'definition' defeats
entry value evaluation etc., because the debugger can't figure out what
it means.

As a follow-up, maybe we could add a DWARF verifier check that a
DW_TAG_subprogram at least has a DW_AT_name attribute.

rdar://46577651

Differential Revision: https://reviews.llvm.org/D70350
2019-12-10 14:00:57 -08:00
Sourabh Singh Tomar 307f60a1a3 [DebugInfo] Refactored macro related generation, added a test case for macinfo.dwo emission.
Reviewers: dblaikie, aprantl, jini.susan.george

Tags: #debug-info #llvm

Differential Revision: https://reviews.llvm.org/D71008
2019-12-11 02:19:27 +05:30
Sourabh Singh Tomar fb4d8fe1a8 Recommit "[DWARF5] Start emitting DW_AT_dwo_name when -gdwarf-5 is specified."
Reviewers: dblaikie, aprantl, probinson

Tags: #debug-info #llvm

Differential Revision: https://reviews.llvm.org/D71185
2019-12-11 01:24:50 +05:30
Sourabh Singh Tomar d82b6ba21b Revert "[DWARF5] Start emitting DW_AT_dwo_name when -gdwarf-5 is specified."
This reverts commit 6ef01588f4.
Missing Differetial revision.
2019-12-11 01:20:40 +05:30
Sourabh Singh Tomar 6ef01588f4 [DWARF5] Start emitting DW_AT_dwo_name when -gdwarf-5 is specified. 2019-12-11 01:18:02 +05:30
Hans Wennborg 49da20ddb4 Revert 30e8f80fd5 "[DebugInfo] Don't create multiple DBG_VALUEs when sinking"
This caused non-determinism in the compiler, see command on the Phabricator
code review.

> This patch addresses a performance problem reported in PR43855, and
> present in the reapplication in in 001574938e5. It turns out that
> MachineSink will (often) move instructions to the first block that
> post-dominates the current block, and then try to sink further. This
> means if we have a lot of conditionals, we can needlessly create large
> numbers of DBG_VALUEs, one in each block the sunk instruction passes
> through.
>
> To fix this, rather than immediately sinking DBG_VALUEs, record them in
> a pass structure. When sinking is complete and instructions won't be
> sunk any further, new DBG_VALUEs are added, avoiding lots of
> intermediate DBG_VALUE $noregs being created.
>
> Differential revision: https://reviews.llvm.org/D70676
2019-12-10 19:20:11 +01:00
Sam Parker 933de40729 [TypePromotion] Query target register width
TargetLoweringInfo may report that an integer should be promoted, but
it maybe provide a size that isn't natively supported by the target
register file... So check this before trying to perform a promotion.

This is to fix some chromium issues:
https://bugs.chromium.org/p/chromium/issues/detail?id=1031978
https://bugs.chromium.org/p/chromium/issues/detail?id=1031979

Differential Revision: https://reviews.llvm.org/D71200
2019-12-10 13:23:00 +00:00
Kiran Chandramohan 965ed1e974 [AArch64] Fix issues with large arrays on stack
Summary:
This patch fixes a few issues when large arrays are allocated on the
stack. Currently, clang has inconsistent behaviour, for debug builds
there is an assertion failure when the array size on stack is around 2GB
but there is no assertion when the stack is around 8GB. For release
builds there is no assertion, the compilation succeeds but generates
incorrect code. The incorrect code generated is due to using
int/unsigned int instead of their 64-bit counterparts. This patch,
1) Removes the assertion in frame legality check.
2) Converts int/unsigned int in some places to the 64-bit variants. This
helps in generating correct code and removes the inconsistent behaviour.
3) Adds a test which runs without optimisations.

Reviewers: sdesmalen, efriedma, fhahn, aemerson

Reviewed By: efriedma

Subscribers: eli.friedman, fpetrogalli, kristof.beyls, hiraditya,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70496
2019-12-10 11:44:41 +00:00
Mikael Holmen 4763267eee [LegalizeTypes] Bugfixes for big-endian targets when handling BITCASTs
Summary:
This fixes PR44135.

The special case when we promote a bitcast from a vector to an int
needs special handling when we are on a big-endian target.

Prior to this fix, for the added vec_to_int we see the following in the
SelectionDAG printouts

Type-legalized selection DAG: %bb.1 'foo:bb.1'
SelectionDAG has 9 nodes:
  t0: ch = EntryToken
        t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
      t17: v4i32 = bitcast t2
    t23: i32 = extract_vector_elt t17, Constant:i32<3>
  t8: ch,glue = CopyToReg t0, Register:i32 $r0, t23
  t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1

and I think here the extract_vector_elt is wrong and extracts the value
from the wrong index.

The program program should return the 32 bits made up of the elements at
index 4 and 5 in the vec6 array, but with

    t23: i32 = extract_vector_elt t17, Constant:i32<3>

as far as I can tell, we will extract values that originally didn't even
exist in the vec6 vectore.

If we would instead extract the element at index 2 we would get the wanted
values.

With this fix we insert a right shift after the bitcast in
DAGTypeLegalizer::PromoteIntRes_BITCAST which then gives us

Type-legalized selection DAG: %bb.1 'vec_to_int:bb.1'
SelectionDAG has 9 nodes:
  t0: ch = EntryToken
        t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
      t23: v4i32 = bitcast t2
    t27: i32 = extract_vector_elt t23, Constant:i32<2>
  t8: ch,glue = CopyToReg t0, Register:i32 $r0, t27
  t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1

So now we get

    t27: i32 = extract_vector_elt t23, Constant:i32<2>

which is what we want.

Similarly, the new int_to_vec testcase exposes a bug where we cast the other
direction. Then we instead need to add a left shift before the bitcast on
big-endian targets for the bits in the input integer to end up at the exptected
place in the vector.

Reviewers: bogner, spatel, craig.topper, t.p.northover, dmgreen, efriedma, SjoerdMeijer, samparker

Reviewed By: efriedma

Subscribers: eli.friedman, bjope, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70942
2019-12-10 11:22:35 +01:00
Puyan Lotfi 479e3b85e2 [NFCi][llvm][MIRVRegNamerUtils] Making some code cleanup and stylistic changes.
Making some changes to MIRVRegNamerUtils.cpp to use some more modern c++
features as well as some changes to generally make the code more concise
and more understandable.

I make this an NFCi because in one case I drop the whole
"if (!MO->isDef()) MO->setIsKill(false);" thing that was added in the
original implementation, generally because I don't think this is really
semantically sound. I also changed up the implementation of
VRegRenamer::createVirtualRegisterWithLowerName somewhat because I am
now lower-casing the name unconditionally because I confirmed that that
was in fact aditya_nandakumar@apple.com's intent.

In all other cases, behavior should not be changed.

Differential Revision: https://reviews.llvm.org/D71182
2019-12-09 23:35:27 -05:00
Fangrui Song 9574757dba [MC] Delete MCCodePadder
D34393 added MCCodePadder as an infrastructure for padding code with
NOP instructions. It lacked tests and was not being worked on since
then.

Intel has now worked on an assembler patch to mitigate performance loss
after applying microcode update for the Jump Conditional Code Erratum.

https://www.intel.com/content/www/us/en/support/articles/000055650/processors.html

This new patch shares similarity with MCCodePadder, but has a concrete
use case in mind and is being actively developed. The infrastructure it
introduces can potentially be used for general performance improvement
via alignment. Delete the unused MCCodePadder so that people can develop
the new feature from a clean state.

Reviewed By: jyknight, skan

Differential Revision: https://reviews.llvm.org/D71106
2019-12-09 19:21:31 -08:00
QingShan Zhang 05b0c76aa7 [NFC][MacroFusion] Adding the assertion if someone want to fuse more than 2 instructions
As discussed in https://reviews.llvm.org/D69998, we miss to create some dependency edges
if chained more than 2 instructions. Adding an assertion here if someone want to chain
more than 2 instructions.

Differential Revision: https://reviews.llvm.org/D71180
2019-12-10 03:10:21 +00:00
Hiroshi Yamauchi d9ae493937 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

A second try after reverted D71072.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71149
2019-12-09 12:42:59 -08:00
Thomas Raoux caabb713ea [ModuloSchedule] Fix data types in ModuloScheduleExpander::isLoopCarried
The cycle values in modulo scheduling results can be negative.
The result of ModuloSchedule::getCycle() must be received as an int type.

Patch by Masaki Arai!

Differential Revision: https://reviews.llvm.org/D71122
2019-12-09 07:37:00 -08:00
Jeremy Morse 00e238896c [DebugInfo] Nerf placeDbgValues, with prejudice
CodeGenPrepare::placeDebugValues moves variable location intrinsics to be
immediately after the Value they refer to. This makes tracking of locations
very easy; but it changes the order in which assignments appear to the
debugger, from the source programs order to the order in which the
optimised program computes values. This then leads to PR43986 and PR38754,
where variable locations that were in a conditional block are made
unconditional, which is highly misleading.

This patch adjusts placeDbgValues to only re-order variable location
intrinsics if they use a Value before it is defined, significantly reducing
the damage that it does. This is still not 100% safe, but the rest of
CodeGenPrepare needs polishing to correctly update debug info when
optimisations are performed to fully fix this.

This will probably break downstream debuginfo tests -- if the
instruction-stream position of variable location changes isn't the focus of
the test, an easy fix should be to manually apply placeDbgValues' behaviour
to the failing tests, moving dbg.value intrinsics next to SSA variable
definitions thus:

  %foo = inst1
  %bar = ...
  %baz = ...
  void call @llvm.dbg.value(metadata i32 %foo, ...

to

  %foo = inst1
  void call @llvm.dbg.value(metadata i32 %foo, ...
  %bar = ...
  %baz = ...

This should return your test to exercising whatever it was testing before.

Differential Revision: https://reviews.llvm.org/D58453
2019-12-09 12:52:10 +00:00
David Stenberg 6965f835b4 [DebugInfo] Make describeLoadedValue() reg aware
Summary:
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.

This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.

This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.

This also fixes situations in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.

This also allows us to handle cases like this:

  $ebx = [...]
  $rdi = MOVSX64rr32 $ebx
  $esi = MOV32rr $edi
  CALL64pcrel32 @call

The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.

This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.

I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: djtodoro, vsk

Subscribers: ormris, hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D70431
2019-12-09 10:47:49 +01:00
David Stenberg f3696533f2 Revert "[DebugInfo] Make describeLoadedValue() reg aware"
This reverts commit 3cd93a4efc.
I'll recommit with a well-formatted arcanist commit message.
2019-12-09 10:45:13 +01:00
David Stenberg 3cd93a4efc [DebugInfo] Make describeLoadedValue() reg aware
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.

This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.

This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.

This also fixes a case in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.

This also allows us to handle cases like this:

  $ebx = [...]
  $rdi = MOVSX64rr32 $ebx
  $esi = MOV32rr $edi
  CALL64pcrel32 @call

The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.

This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.

I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.
2019-12-09 10:44:17 +01:00
Hans Wennborg a38396939c Revert 393dacacf7 "[ARM] Enable TypePromotion by default"
This caused "Too many bits for uint64_t" asserts when building Chromium. See
https://crbug.com/1031978#c2 for a reproducer. I'll follow up on the
llvm-commits thread with a creduced version.

> ARMCodeGenPrepare has already been generalized and renamed to
> TypePromotion. We've had it enabled and tested downstream for a
> while, so enable it by default.
>
> Differential Revision: https://reviews.llvm.org/D70998
2019-12-09 09:39:31 +01:00
rollrat 9fdb7ac503 [NFC][LivePhysRegs] Fix incorrect comment
Reviewers: #llvm, tellenbach

Reviewed By: tellenbach

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71051

Patch by: rollrat <rollrat.cse@gmail.com>
2019-12-08 21:07:28 +01:00
Ulrich Weigand 9db13b5a7d [FPEnv] Constrained FCmp intrinsics
This adds support for constrained floating-point comparison intrinsics.

Specifically, we add:

      declare <ty2>
      @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
                                          metadata <condition code>,
                                          metadata <exception behavior>)
      declare <ty2>
      @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
                                           metadata <condition code>,
                                           metadata <exception behavior>)

The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).

The condition code is implemented as a metadata string.  The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).

These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations.  Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes.  The patch includes support
for the common legalization operations for those nodes.

The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instruction to
fully implement strict semantics (scalar and vector).

Differential Revision: https://reviews.llvm.org/D69281
2019-12-07 11:28:39 +01:00
Craig Topper 28b573d249 [TargetLowering] Fix another potential FPE in expandFP_TO_UINT
D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:

// Sel = Src < 0x8000000000000000
// Val = select Sel, Src, Src - 0x8000000000000000
// Ofs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val) ^ Ofs
The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)

Instead, I'd suggest to use the following sequence:

// Sel = Src < 0x8000000000000000
// FltOfs = select Sel, 0, 0x8000000000000000
// IntOfs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val - FltOfs) ^ IntOfs
In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway).

In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.

There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)

Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper

Differential Revision: https://reviews.llvm.org/D67105
2019-12-06 14:11:04 -08:00
Hiroshi Yamauchi 2eb30fafa5 Revert "[PGO][PGSO] Instrument the code gen / target passes."
This reverts commit 9a0b5e1407.

This seems to break buildbots.
2019-12-06 12:17:32 -08:00
Hiroshi Yamauchi 9a0b5e1407 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71072
2019-12-06 10:43:39 -08:00
Guozhi Wei 72942459d0 [MBP] Avoid tail duplication if it can't bring benefit
Current tail duplication integrated in bb layout is designed to increase the fallthrough from a BB's predecessor to its successor, but we have observed cases that duplication doesn't increase fallthrough, or it brings too much size overhead.

To overcome these two issues in function canTailDuplicateUnplacedPreds I add two checks:

  make sure there is at least one duplication in current work set.
  the number of duplication should not exceed the number of successors.

The modification in hasBetterLayoutPredecessor fixes a bug that potential predecessor must be at the bottom of a chain.

Differential Revision: https://reviews.llvm.org/D64376
2019-12-06 09:53:53 -08:00
John Brawn 984f1bb3e7 [LegalizeTypes] Add missing case for STRICT_FP_ROUND softening
This fixes a test failure in test/CodeGen/ARM/fp-intrinsics.ll.
2019-12-06 15:54:27 +00:00
Jeremy Morse c93a9b15ce [DebugInfo][CGP] Update dbg.values when sinking address computations
One of CodeGenPrepare's optimizations is to duplicate address calculations
into basic blocks, so that as much information as possible can be folded
into memory addressing operands. This is great -- but the dbg.value
variable location intrinsics are not updated in the same way. This can lead
to dbg.values referring to address computations in other blocks that will
never be encoded into the DAG, while duplicate address computations are
performed locally that could be used by the dbg.value. Some of these (such
as non-constant-offset GEPs) can't be salvaged past.

Fix this by, whenever we duplicate an address computation into a block,
looking for dbg.value users of the original memory address in the same
block, and redirecting those to the local computation.

Differential Revision: https://reviews.llvm.org/D58403
2019-12-06 11:27:19 +00:00
Ulrich Weigand daee549b17 [FPEnv][SelectionDAG] Relax chain requirements
This patch implements the following changes:

1) SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats
each constrained intrinsic like a global barrier (e.g. a function call)
and fully serializes all pending chains. This is actually not required;
it is allowed for constrained intrinsics to be reordered w.r.t one
another or (nonvolatile) memory accesses. The MI-level scheduler already
allows for that flexibility, so it makes sense to allow it at the DAG
level as well.

This patch therefore changes the way chains for constrained intrisincs
are created, and handles them basically like load operations are handled.
This has the effect that constrained intrinsics are no longer serialized
against one another or (nonvolatile) loads. They are still serialized
against stores, but that seems hard to change with the current DAG chain
setup, and it also doesn't seem to be a big problem preventing DAG

2) The OPC_CheckFoldableChainNode check requires that each of the
intermediate nodes in a multi-node pattern match only has a single use.
This check tends to fail if those intermediate nodes are strict operations
as those have a chain output that typically indeed has another use.
However, we don't really need to consider chains here at all, since they
will all be rewritten anyway by UpdateChains later. Other parts of the
matcher therefore already ignore chains, but this hasOneUse check doesn't.

This patch replaces hasOneUse by a custom test that verifies there is no
more than one use of any non-chain output value.

In theory, this change could affect code unrelated to strict FP nodes,
but at least on SystemZ I could not find any single instance of that
happening

3) The SystemZ back-end currently does not allow matching multiply-and-
extend operations (32x32 -> 64bit or 64x64 -> 128bit FP multiply) for
strict FP operations.  This was not possible in the past due to the
problems described under 1) and 2) above.

With those issues fixed, it is now possible to fully support those
instructions in strict mode as well, and this patch does so.

Differential Revision: https://reviews.llvm.org/D70913
2019-12-06 11:02:11 +01:00
Alexey Lapshin 9e8c799e2b [Dsymutil][NFC] Move NonRelocatableStringpool into common CodeGen folder.
That refactoring moves NonRelocatableStringpool into common CodeGen folder.
So that NonRelocatableStringpool could be used not only inside dsymutil.

Differential Revision: https://reviews.llvm.org/D71068
2019-12-06 10:02:27 +03:00
David Blaikie 560ab1f8d3 DebugInfo: Pull out a common expression.
This is for the case where -gmlt -gsplit-dwarf -fsplit-dwarf-inlining
are used together in some but not all units during LTO (or, in the
reduced case, even without LTO) - ensuring that no split dwarf is used
(because split-dwarf-inlining puts the same data in the .o file, so
there's no need to duplicate it into the .dwo file)
2019-12-05 19:51:30 -08:00
Quentin Colombet 2ec71ea7c7 [RegisterCoalescer] Fix the creation of subranges when rematerialization is used
* Context *

During register coalescing, we use rematerialization when coalescing is not
possible. That means we may rematerialize a super register when only a smaller
register is actually used.
E.g.,
0B v1 = ldimm 0xFF
1B v2 = COPY v1.low8bits
2B   = v2
=>
0B v1 = ldimm 0xFF
1B v2 = ldimm 0xFF
2B   = v2.low8bits

Where xB are the slot indexes.
Here v2 grew from a 8-bit register to a 16-bit register.

When that happens and subregister liveness is enabled, we create subranges for
the newly created value.
E.g., before remat, the live range of v2 looked like:
main range: [1r, 2r)
(Reads v2 is defined at index 1 slot register and used before the slot register
of index 2)

After remat, it should look like:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 1d) <-- dead def

I.e., the unsused lanes of v2 should be marked as dead definition.

* The Problem *

Prior to this patch, the live-ranges from the previous exampel, would have the
full live-range for all subranges:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 2r) <-- too long

* The Fix *

Technically, the code that this patch changes is not wrong:
When we create the subranges for the newly rematerialized value, we create only
one subrange for the whole bit mask.
In other words, at this point v2 live-range looks like this:
main range: [1r, 2r)
low & high: [1r, 2r)

Then, it gets wrong when we call LiveInterval::refineSubRanges on low 8 bits:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 2r) <-- too long

Ideally, we would like LiveInterval::refineSubRanges to be able to do the right
thing and mark the dead lanes as such. However, this is not possible, because by
the time we update / refine the live ranges, the IR hasn't been updated yet,
therefore we actually don't have enough information to do the right thing.

Another option to fix the problem would have been to call
LiveIntervals::shrinkToUses after the IR is updated. This is not desirable as
this may have a noticeable impact on compile time.

Instead, what this patch does is when we create the subranges for the
rematerialized value, we explicitly create one subrange for the lanes that were
used before rematerialization and one for the lanes that were not used. The used
one inherits the live range of the main range and the unused one is just created
empty. The existing rematerialization code then detects that the unused one are
not live and it correctly sets dead def intervals for them.

https://llvm.org/PR41372
2019-12-05 16:32:30 -08:00
David Blaikie decee04e63 DebugInfo: Fix LTO+DWARFv5 loclists
The loclists_table_base was being overwritten for each CU even though
only one loclists contribution is made so everything but the last CU
would have a label that was never defined and fail to assemble.
2019-12-05 12:47:54 -08:00
Volkan Keles bfa3d260b8 [GlobalISel] Localizer: Allow targets not to run the pass conditionally
Summary:
Previously, it was not possible to skip running the localizer pass
conditionally. This patch adds an input function to the pass which
decides if the pass should run on the given MachineFunction or not.

No test case as there is no upstream target needs this functionality.

Reviewers: qcolombet

Reviewed By: qcolombet

Subscribers: rovka, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71038
2019-12-05 11:09:50 -08:00
Jeremy Morse 30e8f80fd5 [DebugInfo] Don't create multiple DBG_VALUEs when sinking
This patch addresses a performance problem reported in PR43855, and
present in the reapplication in in 001574938e5. It turns out that
MachineSink will (often) move instructions to the first block that
post-dominates the current block, and then try to sink further. This
means if we have a lot of conditionals, we can needlessly create large
numbers of DBG_VALUEs, one in each block the sunk instruction passes
through.

To fix this, rather than immediately sinking DBG_VALUEs, record them in
a pass structure. When sinking is complete and instructions won't be
sunk any further, new DBG_VALUEs are added, avoiding lots of
intermediate DBG_VALUE $noregs being created.

Differential revision: https://reviews.llvm.org/D70676
2019-12-05 15:52:20 +00:00
Jeremy Morse e4cdd62631 [DebugInfo] Don't reorder DBG_VALUEs when sunk
Fix part of PR43855, resolving a problem that comes from the reapplication
in 001574938e5. If we have two DBG_VALUE insts in a block that specify
the location of the same variable, for example:

   %0 = someinst
   DBG_VALUE %0, !123, !DIExpression()
   %1 = anotherinst
   DBG_VALUE %1, !123, !DIExpression()

if %0 were to sink, the corresponding DBG_VALUE would sink too, past the
next DBG_VALUE, effectively re-ordering assignments. To fix this, I've
added a SeenDbgVars set recording what variable locations have been seen in
a block already (working bottom up), and now flag DBG_VALUEs that would
pass a later DBG_VALUE for the same variable.

NB, this only works for repeated DBG_VALUEs in the same basic block, the
general case involving control flow is much harder, which I've written
up in PR44117.

Differential revision: https://reviews.llvm.org/D70672
2019-12-05 15:52:20 +00:00
Jeremy Morse fca4100196 [DebugInfo] Re-apply two patches to MachineSink
These were:
 * D58386 / f5e1b718a6 / reverted in d382a8a768
 * D58238 / ee50590e16 / reverted in a8db456b53

Of which the latter has a performance regression tracked in PR43855,
fixed by D70672 / D70676, which will be committed atomically with this
reapplication.

Contains a minor difference to account for a change in the IsCopyInstr
signature.
2019-12-05 15:52:20 +00:00
Sam Parker 393dacacf7 [ARM] Enable TypePromotion by default
ARMCodeGenPrepare has already been generalized and renamed to
TypePromotion. We've had it enabled and tested downstream for a
while, so enable it by default.

Differential Revision: https://reviews.llvm.org/D70998
2019-12-05 14:21:11 +00:00
Djordje Todorovic 52b231ee84 [LiveDebugValues] Silence the unused var warning; NFC 2019-12-05 12:32:14 +01:00
David Stenberg 54682d871d [DebugInfo] Handle call site values for instructions before call bundle
Summary:
If a call is bundled then the code that looks for instructions that
produce parameter values would break when reaching the call's bundle
header, due to the `ifCall(/*AnyInBundle*/)` invocation returning true.

It is not enough to simply ignore bundle headers in the `isCall()`
invocation, as the bundle header may have defines of parameter registers
due to the call, meaning that such registers would incorrectly be
removed from the worklist. Therefore, do not look at bundle headers at
all.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: aprantl, vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D71024
2019-12-05 11:50:41 +01:00