Commit Graph

160471 Commits

Author SHA1 Message Date
Mirko Brkusanin 6a1aa627fa [AMDGPU] Enable image_gather4h instruction for gfx10 and gfx11
Differential Revision: https://reviews.llvm.org/D130764
2022-07-29 15:42:06 +02:00
Alexey Lapshin ece341f598 [Debuginfo][DWARF][NFC] Add paired methods working with DWARFDebugInfoEntry.
This review is extracted from D96035.

DWARF Debuginfo classes have two representations for DIEs: DWARFDebugInfoEntry
(short) and DWARFDie(extended). Depending on the task, it might be more convenient
to use DWARFDebugInfoEntry or/and DWARFDie. DWARFUnit class already has methods
working with DWARFDie and DWARFDebugInfoEntry. This patch adds more
methods working with DWARFDebugInfoEntry to have paired functionality.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D126059
2022-07-29 16:40:17 +03:00
Jay Foad 3cfa9b1431 [AMDGPU] user-sgpr-init16-bug does not apply to gfx1103
Differential Revision: https://reviews.llvm.org/D130347
2022-07-29 14:21:13 +01:00
Simon Pilgrim af1b7ebcdf [TargetLowering] Move a few hasOneUse() tests later to reduce unnecessary computations. NFC.
Many of these cases, an early-out on the much cheaper getOpcode() check will avoid us needing to call hasOneUse() entirely.
2022-07-29 14:20:35 +01:00
Matt Arsenault ef906f287e AMDGPU: Fix assertion when printing unreachable functions
Since 814a0abcce, this would break if we
had a function in the module that becomes dead in any codegen IR
pass. The function wasn't deleted since it was initially used in dead
code, but is detached from the call graph and doesn't appear in the PO
traversal. Do a second walk over the module to populate the resources
of any functions which weren't already processed.
2022-07-29 08:57:43 -04:00
Matt Arsenault a4834ad068 RegisterCoalescer: Shrink main range after shrinking subranges
If the subregister uses were dead, this would leave the main range
segment pointing to a deleted instruction.

Not sure if this should try to avoid shrinking if we know we don't
have dead components.
2022-07-29 08:57:28 -04:00
Alexander Timofeev d7ae1a9097 Revert "[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs"
This reverts commit 76d9ae924c.
because it causes several VK CTS tests to fail
2022-07-29 14:19:07 +02:00
Simon Pilgrim 641dba9e28 [DAG] Move a few hasOneUse() tests later to reduce unnecessary computations. NFC.
Many of these cases, an early-out on the much cheaper getOpcode() check will avoid us needing to call hasOneUse() entirely.
2022-07-29 11:34:39 +01:00
Simon Pilgrim 9082c13106 [Support] Add KnownBits::concat method
Add a method for the various cases where we need to concatenate 2 KnownBits together (BUILD_PAIR and SHIFT_PARTS in particular) - uses the existing APInt::concat 'HiBits.concat(LoBits)' convention

Differential Revision: https://reviews.llvm.org/D130557
2022-07-29 11:06:39 +01:00
Jay Foad d03110155b [IR] Simplify Intrinsic::getDeclaration. NFC. 2022-07-29 10:45:22 +01:00
wanglei 56ab2f4ccd [LoongArch] Offset folding for frameindex
This patch is for frameindex calculations.

Differential Revision: https://reviews.llvm.org/D130248
2022-07-29 17:27:34 +08:00
wanglei fd6545322c [LoongArch] Refactor insertDivByZeroTrap
Ensure non-terminators don't follow terminators.
This patch fixes the `sdiv-udiv-srem-urem.ll` test failure with
expensive check.

Differential Revision: https://reviews.llvm.org/D130247
2022-07-29 17:06:49 +08:00
David Sherwood 487fa6f8c3 [AArch64][DAGCombine] Add performBuildVectorCombine 'extract_elt ~> anyext'
A build vector of two extracted elements is equivalent to an extract
subvector where the inner vector is any-extended to the
extract_vector_elt VT, because extract_vector_elt has the effect of an
any-extend.

  (build_vector (extract_elt_i16_to_i32 vec Idx+0) (extract_elt_i16_to_i32 vec Idx+1))
  => (extract_subvector (anyext_i16_to_i32 vec) Idx)

Depends on D130697

Differential Revision: https://reviews.llvm.org/D130698
2022-07-29 09:51:09 +01:00
Florian Hahn 214e2d8fe5
[SCEV] Avoid repeated proveNoSignedWrapViaInduction calls.
At the moment, proveNoSignedWrapViaInduction may be called for the
same AddRec a large number of times via getSignExtendExpr. This can have
a severe compile-time impact for very loop-heavy code.

If proveNoSignedWrapViaInduction failed to prove NSW the first time,
it is unlikely to succeed on subsequent tries and the cost doesn't seem
to be justified.

This is the signed version of 8daa338297 / D130648.

This can drastically improve compile-time in some excessive cases and
also has a slightly positive compile-time impact on CTMark:

NewPM-O3: -0.06%
NewPM-ReleaseThinLTO: -0.04%
NewPM-ReleaseLTO-g: -0.04%

https://llvm-compile-time-tracker.com/compare.php?from=8daa338297d533db4d1ae8d3770613eb25c29688&to=aed126a196e7a5a9803543d9b4d6bdb233d0009c&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D130694
2022-07-29 09:15:03 +01:00
Sunho Kim e590f945c6 Revert "[JITLink][COFF] Implement include/alternatename linker directive."
This reverts commit f1fcd06a2a.

Faliures reported in
https://lab.llvm.org/buildbot/#/builders/193/builds/16143 and http://lab.llvm.org/buildbot/#/builders/91/builds/13010
2022-07-29 17:03:19 +09:00
Sunho Kim f1fcd06a2a [JITLink][COFF] Implement include/alternatename linker directive.
Implements include/alternatename linker directive. Alternatename is used by static msvc runtime library. Alias symbol is technically incorrect (we have to search for external definition) but we don't have a way to represent this in jitlink/orc yet, this is solved in the following up patch.

Inlcude linker directive is used in ucrt to forcelly lookup the static initializer symbols so that they will be emitted. It's implemented as extenral symbols with live flag on that cause the lookup of these symbols.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130276
2022-07-29 16:48:29 +09:00
Sunho Kim 049fd21b42 [JITLink][COFF][x86_64] Implement ADDR64 relocation.
Implements ADDR64 relocation using x86_64 edge kind.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130178
2022-07-29 16:32:07 +09:00
Sunho Kim 410e0aa759 [JITLink][COFF] Implement dllimport stubs.
Implements dllimport stubs using GOT table manager. Benefit of using GOT table manager is that we can just reuse jitlink-check architecture.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D130175
2022-07-29 16:29:53 +09:00
Sunho Kim 6b27890b2c [ORC][COFF] Handle COFF import files of static archive.
Handles COFF import files of static archive. Changes static library genrator to build up object file map keyed by symbol name that excludes the symbols from dllimported symbols so that static generator will not be responsible for them. It exposes the list of dynamic libraries that need to be imported. Client should properly load the libraries in this list beforehand. Object file map is also an improvment from the past in terms of performance. Archive.findSym does a slow O(n) linear serach of symbol list to find the symbol. (we call findSym O(n) times, thus full time complexity is O(n^2); we were the only user of findSym function in fact)

There is a room for improvements in how to load the libraries in the list. We currently just hand the responsibility over to the clinet. A better way would be let ORC read this list and hand them over to JITLink side that would also help validation (e.g. not trying to generate stub for non dllimported targets) Nevertheless, we will have to exclude the symbols from COFF import object file list and need a way to access this list, which this patch offers.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D129952
2022-07-29 16:25:08 +09:00
Changpeng Fang 2b731b30a7 AMDGPU: Take care of "tied" operand when removeOperand
Summary:
  Flat scratch load of D16 type by default has tied vdst_in operand (with vdst). This should be taken
care of at the time of "removeOperand" in eliminateFrameIndex. Otherwise we will hit an assert saying
"Cannot move tied operands". This patch unties vdst_in before the move, and retie it with vdst afterwards.

Reviewers:
  arsenm, foad

Differential Revision: https://reviews.llvm.org/D130537
2022-07-28 17:30:49 -07:00
Francis Visoiu Mistrih bfd3883e83 [Matrix] Refactor transpose distribution. NFC
Use a function to distribute transposes. Preparation for future patches.
2022-07-28 17:30:00 -07:00
Felipe de Azevedo Piovezan 58526b2d2b [GlobalISel] Handle nullptr constants in dbg.value
Currently, the LLVM IR -> MIR translator fails to translate dbg.values
whose first argument is a null pointer. However, in other portions of
the code, such pointers are always lowered to the constant zero, for
example see IRTranslator::Translate(Constant, Register).

This patch addresses the limitation by following the same approach of
lowering null pointers to zero.

A prior test was checking that null pointers were always lowered to
$noreg; this test is changed to check for zero, and the previous
behavior is now checked by introducing a dbg.value whose first argument
is the address of a global variable.

Differential Revision: https://reviews.llvm.org/D130721
2022-07-28 14:58:14 -07:00
Felipe de Azevedo Piovezan 0ef6809c48 [GlobalISel][nfc] Remove unnecessary cast
The getOperand method already returns a Constant when it is called on
a ConstantExpression, as such the cast is not needed. To prevent a type
mismatch between the different return statements of the lambda, the
lambda return type is explicitly provided.

Differential Revision: https://reviews.llvm.org/D130719
2022-07-28 14:55:07 -07:00
Anshil Gandhi 5c38056431 [AMDGPU][Scheduler] Avoid initializing Register pressure tracker when tracking is disabled
When register pressure tracking is disabled, the scheduler attempts to load
pressures at SReg_32 and VGPR_32. This causes an index out of bounds error.
This patch fixes this issue by disabling the initialization of RPTracker
when not needed. NFC

Reviewed By: rampitec, kerbowa, arsenm

Differential Revision: https://reviews.llvm.org/D129322
2022-07-28 15:39:28 -06:00
David Blaikie 6139626d73 llvm-dwp: Include dwo name even when the input is a dwo
This still only includes the dwo name if it's in the DW_AT_dwo_name
attribute in the split unit - though it could be improved/modified to
use the dwo name from the command line (if linking raw dwo files) or
retrieved from the DW_AT_dwo_name in the executable (when using -e).

It's useful in any case because you might have a large command line with
many files and knowing exactly which dwo files are relevant will
simplify debugging, but especially with '-e' when you didn't pass the
dwo files explicitly in nthe first place it would be quite non-obvious
where the duplicate units are coming from.
2022-07-28 20:24:05 +00:00
Alexey Lapshin e74197bc12 [Reland][Debuginfo][llvm-dwarfutil] Add check for unsupported debug sections.
Current DWARFLinker implementation does not support some debug sections
(mainly DWARF v5 sections). This patch adds diagnostic for such sections.
The warning would be displayed for critical(such that could not be removed)
sections and the source file would be skipped. Other unsupported sections
would be removed and warning message should be displayed. The zero exit
status would be returned for both cases.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D123623
2022-07-28 21:29:16 +03:00
Austin Kerbow 0f93a45b11 [AMDGPU] Add isMeta flag to SCHED_GROUP_BARRIER 2022-07-28 11:04:33 -07:00
Fangrui Song c26dc2904b [llvm-objcopy] Support --{,de}compress-debug-sections for zstd
Also, add ELFCOMPRESS_ZSTD (2) from the approved generic-abi proposal:
https://groups.google.com/g/generic-abi/c/satyPkuMisk
("Add new ch_type value: ELFCOMPRESS_ZSTD")

Link: https://discourse.llvm.org/t/rfc-zstandard-as-a-second-compression-method-to-llvm/63399
("[RFC] Zstandard as a second compression method to LLVM")

Differential Revision: https://reviews.llvm.org/D130458
2022-07-28 10:45:53 -07:00
Austin Kerbow f5b21680d1 [AMDGPU] Add amdgcn_sched_group_barrier builtin
This builtin allows the creation of custom scheduling pipelines on a per-region
basis. Like the sched_barrier builtin this is intended to be used either for
testing, in situations where the default scheduler heuristics cannot be
improved, or in critical kernels where users are trying to get performance that
is close to handwritten assembly. Obviously using these builtins will require
extra work from the kernel writer to maintain the desired behavior.

The builtin can be used to create groups of instructions called "scheduling
groups" where ordering between the groups is enforced by the scheduler.
__builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter
is a mask that determines the types of instructions that you would like to
synchronize around and add to a scheduling group. These instructions will be
selected from the bottom up starting from the sched_group_barrier's location
during instruction scheduling. The second parameter is the number of matching
instructions that will be associated with this sched_group_barrier. The third
parameter is an identifier which is used to describe what other
sched_group_barriers should be synchronized with. Note that multiple
sched_group_barriers must be added in order for them to be useful since they
only synchronize with other sched_group_barriers. Only "scheduling groups" with
a matching third parameter will have any enforced ordering between them.

As an example, the code below tries to create a pipeline of 1 VMEM_READ
instruction followed by 1 VALU instruction followed by 5 MFMA instructions...
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 1 VALU
__builtin_amdgcn_sched_group_barrier(2, 1, 0)
// 5 MFMA
__builtin_amdgcn_sched_group_barrier(8, 5, 0)
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 3 VALU
__builtin_amdgcn_sched_group_barrier(2, 3, 0)
// 2 VMEM_WRITE
__builtin_amdgcn_sched_group_barrier(64, 2, 0)

Reviewed By: jrbyrnes

Differential Revision: https://reviews.llvm.org/D128158
2022-07-28 10:43:14 -07:00
Craig Topper 2750873dfe [RISCV] Update lowerFROUND to use masked instructions.
This avoids a vmerge at the end and avoids spurious fflags updates.
This isn't used for constrained intrinsic so we technically don't have
to worry about fflags, but it doesn't cost much to support it.

To support I've extend our FCOPYSIGN_VL node to support a passthru
operand. Similar to what was done for VRGATHER*_VL nodes.

I plan to do a similar update for trunc, floor, and ceil.

Reviewed By: reames, frasercrmck

Differential Revision: https://reviews.llvm.org/D130659
2022-07-28 10:05:19 -07:00
Craig Topper 89173dee71 [RISCV] Remove duplicate code. NFC
The same operations are part of `FloatingPointVecReduceOps` a little
bit earlier.
2022-07-28 10:05:19 -07:00
Simon Pilgrim 8c99cef1e7 [DAG] Remove SelectionDAG::GetDemandedBits and use SimplifyMultipleUseDemandedBits directly.
GetDemandedBits is mainly a wrapper around SimplifyMultipleUseDemandedBits now, and is only used by DAGCombiner::visitSTORE so I've moved all remaining functionality there.

visitSTORE was making use of this to 'simplify' constants for a trunc-store. Just removing this code left to a mixture of regressions and gains - it came down to whether a target preferred a sign or zero extended constant for materialization/truncation. I've just moved the code over for now, but a next step would be to move this to targetShrinkDemandedConstant, but some targets that override the method expect a basic binop, and might react badly to a store node.....
2022-07-28 17:03:44 +01:00
Philip Reames 82c1b136db [LV] Don't predicate uniform mem op stores unneccessarily
We already had the reasoning about uniform mem op loads; if the address is accessed at least once, we know the instruction doesn't need predicated to ensure fault safety. For stores, we do need to ensure that the values visible in memory are the same with and without predication. The easiest sub-case to check for is that all the values being stored are the same. Since we know that at least one lane is active, this tells us that the value must be visible.

Warning on confusing terminology: "uniform" vs "uniform mem op" mean two different things here, and this patch is specific to the later. It would *not* be legal to make this same change for merely "uniform" operations.

Differential Revision: https://reviews.llvm.org/D130637
2022-07-28 08:55:52 -07:00
Liqiang Tao d52e775b05 [llvm][ModuleInliner] Add inline cost priority for module inliner
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D130012
2022-07-28 22:44:03 +08:00
Liqiang Tao c113594378 Revert "[llvm][ModuleInliner] Add inline cost priority for module inliner"
This reverts commit bb7f62bbbd.
2022-07-28 22:36:28 +08:00
Florian Hahn f912bab111
Revert "[X86][DAGISel] Don't widen shuffle element with AVX512"
This reverts commit 5fb4134210.

This patch is causing crashes when building llvm-test-suite when
optimizing for CPUs with AVX512.

Reproducer crashing with llc:

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-apple-macosx"

    define i32 @test(<32 x i32> %0) #0 {
    entry:
      %1 = mul <32 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
      %2 = tail call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %1)
      ret i32 %2
    }

    ; Function Attrs: nocallback nofree nosync nounwind readnone willreturn
    declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1

    attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" }
    attributes #1 = { nocallback nofree nosync nounwind readnone willreturn }
2022-07-28 15:26:42 +01:00
Simon Pilgrim be488ba7de [DAG] DAGCombiner::visitTRUNCATE - remove GetDemandedBits call
This should now all be handled by SimplifyDemandedBits.
2022-07-28 15:23:04 +01:00
Simon Pilgrim ea7f14dad0 [DAG] SelectionDAG::GetDemandedBits - don't simplify opaque constants
I'm actually trying to get rid of GetDemandedBits - but while dismantling it I noticed that we were altering opaque constants. Fixing that causes a FP_TO_INT_SAT regression that should be addressed separately - I'll raise a bug.
2022-07-28 14:46:59 +01:00
Liqiang Tao bb7f62bbbd [llvm][ModuleInliner] Add inline cost priority for module inliner
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D130012
2022-07-28 21:28:07 +08:00
Simon Pilgrim 69d5a038b9 [DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.

This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits.

There a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP.

Differential Revision: https://reviews.llvm.org/D77804
2022-07-28 14:10:44 +01:00
Amaury Séchet 474a8ee03d [DAG] Use recursivelyDeleteUnusedNodes in PromoteLoad
It simplifies the code overall and removes the need for manual bookkeeping.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130447
2022-07-28 12:54:52 +00:00
Amaury Séchet 7920805b27 [DAG] Use recursivelyDeleteUnusedNodes in ReplaceLoadWithPromotedLoad
It simplifies the code overall and removes the need for manual bookkeeping.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130444
2022-07-28 12:32:37 +00:00
Alexander Timofeev 76d9ae924c [AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs
In the 2e29b0138c we introduce a specific solving algorithm
that analyzes the VGPR to SGPR copies use chains and either lowers
the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms.
Same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs
in case they produce SGPR but have VGPRs input operands. In case the REG_SEQUENCE and PHIs
are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert
copy to v_readfistlane_b32, further lowering them to VALU leads to several kinds of issues.
At first, we have v_readfistlane_b32 which is completely useless because most parts of its use chain
were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF
because of the weird mixing of SALU and VALU instructions.
This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact
that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs,
we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common
VGPR to SGPR lowering algorithm.
This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130367
2022-07-28 14:30:29 +02:00
Sunho Kim 72ea1a721e [ORC] Fix weak hidden symbols failure on PPC with runtimedyld
Fix "JIT session error: Symbols not found: [ DW.ref.__gxx_personality_v0 ] error" which happens when trying to use exceptions on ppc linux. To do this, it expands AutoClaimSymbols option in RTDyldObjectLinkingLayer to also claim weak symbols before they are tried to be resovled. In ppc linux, DW.ref symbols is emitted as weak hidden symbols in the later stage of MC pipeline. This means when using IRLayer (i.e. LLJIT), IRLayer will not claim responsibility for such symbols and RuntimeDyld will skip defining this symbol even though it couldn't resolve corresponding external symbol.

Reviewed By: sgraenitz

Differential Revision: https://reviews.llvm.org/D129175
2022-07-28 21:12:25 +09:00
Dmitry Preobrazhensky 2b230d69ad [AMDGPU][MC][GFX90A] Correct MIMG dst size validation
Correct validator to enable MIMG dst size checks.

Differential Revision: https://reviews.llvm.org/D130512
2022-07-28 14:30:08 +03:00
Sanjay Patel 28ad5dc3f7 [InstCombine] try harder to narrow bitwise logic with cast operands
This works with any logic + extend:
https://alive2.llvm.org/ce/z/vzsqQD

The motivating case is from issue #56294, but that's still not optimal
(it should simplify completely).
2022-07-28 07:23:22 -04:00
Dmitry Preobrazhensky fa7fd8ec31 [AMDGPU][MC][GFX11] Disable SGPRs for src1 of v_fma_mix*_dpp opcodes
Differential Revision: https://reviews.llvm.org/D130634
2022-07-28 14:20:05 +03:00
chendewen 7eeb468ae5 [Aarch64] Add cost for missing extensions.
This patch adds a cost estimate for some missing sign extensions.
ref: https://reviews.llvm.org/D14730

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D130565
2022-07-28 17:34:00 +08:00
Florian Hahn 8daa338297
[SCEV] Avoid repeated proveNoUnsignedWrapViaInduction calls.
At the moment, proveNoUnsignedWrapViaInduction may be called for the
same AddRec a large number of times via getZeroExtendExpr. This can have
a severe compile-time impact for very loop-heavy code. One one
particular workload, LSR takes ~51s without this patch, almost
exlusively in proveNoUnsignedWrapViaInduction. With this patch, the time
in LSR drops to ~0.4s.

If proveNoUnsignedWrapViaInduction failed to prove NUW the first time,
it is unlikely to succeed on subsequent tries and the cost doesn't seem
to be justified.

Besides drastically improving compile-time in some excessive cases, this
also has a slightly positive compile-time impact on CTMark:

NewPM-O3: -0.07%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.06

https://llvm-compile-time-tracker.com/compare.php?from=b435da027d7774c24cdb8c88d09f6b771e07fb14&to=f2729e33e8284b502f6c35a43345272252f35d12&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D130648
2022-07-28 10:02:19 +01:00
David Spickett a0ccba5e19 [llvm] Fix some test failures with EXPENSIVE_CHECKS and libstdc++
DebugLocEntry assumes that it either contains 1 item that has no fragment
or many items that all have fragments (see the assert in addValues).

When EXPENSIVE_CHECKS is enabled, _GLIBCXX_DEBUG is defined. On a few machines
I've checked, this causes std::sort to call the comparator even
if there is only 1 item to sort. Perhaps to check that it is implemented
properly ordering wise, I didn't find out exactly why.

operator< for a DbgValueLoc will crash if this happens because the
optional Fragment is empty.

Compiler/linker/optimisation level seems to make this happen
or not. So I've seen this happen on x86 Ubuntu but the buildbot
for release EXPENSIVE_CHECKS did not have this issue.

Add an explicit check whether we have 1 item.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D130156
2022-07-28 08:53:38 +00:00