Commit Graph

146728 Commits

Author SHA1 Message Date
Roman Lebedev b8701dc174
[X86] AMD Zen 3: mark XMM/YMM (but not MMX!) reg moves as eliminatible in RegisterFile 2021-05-07 20:11:21 +03:00
Roman Lebedev 5b1610a250
[X86] AMD Zen 3: MOVSX32rr32 is a zero-cycle move
It measures as such, and the reference docs agree.

I can't easily add a MCA test, because there's no mnemonic for it,
it can only be disassembled or created as a MCInst.
2021-05-07 20:11:20 +03:00
Fangrui Song 6a2850f3fc [AArch64][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local
Similar to X86 D73230 & 46788a21f9

With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode,
for default visibility external linkage non-ifunc-non-COMDAT definitions.

For such dso_local definitions, variable access/taking the address of a
function/calling a function will go through a local alias to avoid GOT/PLT.

Note: the 'S' inline assembly constraint refers to an absolute symbolic address
or a label reference (D46745).

Differential Revision: https://reviews.llvm.org/D101872
2021-05-07 09:44:26 -07:00
Whitney Tsang 1006ac3963 [LoopNest] Consider loop nest with inner loop guard using outer loop
induction variable to be perfect

This patch allow more conditional branches to be considered as loop
guard, and so more loop nests can be considered perfect.

Reviewed By: bmahjour, sidbav

Differential Revision: https://reviews.llvm.org/D94717
2021-05-07 16:04:18 +00:00
Simon Pilgrim f744723f75 [X86] combineXor - limit fold to non-opaque constants (PR50254)
Ensure we don't try to fold when one might be an opaque constant - the constant fold will fail and then the reverse fold will happen in DAGCombine.....
2021-05-07 16:39:24 +01:00
Roman Lebedev 2819009b5a
[X86] AMD Zen 3: _REV variants of zero-cycles moves are also zero-cycles (PR50261)
Sometimes disassembler picks _REV variants of instructions
over the plain ones, which in this case exposed an issue
that the _REV variants aren't being modelled as optimizable moves.
2021-05-07 18:27:40 +03:00
Joseph Tremoulet bc302bfbef BasicAA: Recognize inttoptr as isEscapeSource
Pointers escape when converted to integers, so a pointer produced by
converting an integer to a pointer must not be a local non-escaping
object.

Reviewed By: nikic, nlopes, aqjune

Differential Revision: https://reviews.llvm.org/D101541
2021-05-07 07:48:50 -07:00
Roman Lebedev 758c173309
[X86] AMD Zen 3: throughput for renameable XMM/YMM moves is 6
They are resolved at the register rename stage without
using any execution units.
2021-05-07 17:06:45 +03:00
Roman Lebedev 715c0d0bd4
[X86] AMD Zen 3: AVX YMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.
2021-05-07 17:06:45 +03:00
Roman Lebedev ee020b930d
[X86] AMD Zen 3: AVX XMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.
2021-05-07 17:06:44 +03:00
Roman Lebedev 9db4203883
[X86] AMD Zen 3: SSE XMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.

Refs:
AMD SOG 19h, 2.9.4 Zero Cycle Move
The processor is able to execute certain register to register
mov operations with zero cycle delay.

Agner,
22.13 Instructions with no latency
Register-to-register move instructions are resolved at
the register rename stage without using any execution units.
These instructions have zero latency. It is possible to do six such
register renamings per clock cycle, and it is even possible to
rename the same register multiple times in one clock cycle.
2021-05-07 17:06:44 +03:00
Roman Lebedev d8c6202576
[X86] AMD Zen 3: throughput for renameable GPR moves is 6
They are resolved at the register rename stage without
using any execution units.
2021-05-07 17:06:43 +03:00
Roman Lebedev c3cd8ed009
[NFC][X86] AMD Zen 3: move sched classes for renameables moves togeter 2021-05-07 17:06:43 +03:00
Stephen Tozer 7bc1dd1191 Reapply "[DebugInfo] Drop DBG_VALUE_LISTs with an excessive number of debug operands"
Reapply b623df3c, which was reverted while reverting a different patch
with a breaking change. There are no underlying issues with this patch,
so no changes have been made to the original patch.

This reverts commit b11e4c9907.
2021-05-07 14:55:02 +01:00
Simon Pilgrim c9d4b4173b [CodeGen] Ensure UserValue::getDebugLoc() and UserLabel::getDebugLoc() consistently return a const reference NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.
2021-05-07 14:48:23 +01:00
Simon Pilgrim dd21c6b843 [DAG] Ensure all SD classes consistently return a const reference with getDebugLoc(). NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.
2021-05-07 14:48:23 +01:00
Benjamin Kramer 6248d11190 Retire TargetRegisterInfo::getSpillAlignment
getSpillAlign does the same thing.
2021-05-07 15:16:22 +02:00
Sebastian Neubauer 13c0316239 [AMDGPU] Restrict immediate scratch offsets
gfx9 does not work with negative offsets, gfx10 works only with
aligned negative offsets, but not with unaligned negative offsets.

This is slightly more conservative than needed, gfx9 does support
negative offsets when a VGPR address is used and gfx10 supports
negative, unaligned offsets when an SGPR address is used, but we
do not make use of that with this patch.

Differential Revision: https://reviews.llvm.org/D101292
2021-05-07 14:51:32 +02:00
David Stuttard 606d4e8061 AMDGPU: Correct const_index_stride for wave 32 for PAL ABI
Retrying after revert and fix (removed implicit def flag from operand). Now
passes with expensive_checks enabled.

Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.

Differential Revision: https://reviews.llvm.org/D101830

Change-Id: Ie3b8b2921237968caca91527dd0c97b1b0cc0360
2021-05-07 13:42:57 +01:00
Malhar Jajoo dfe3ffaa4a [ARM] Transforming memset to Tail predicated Loop
This patch converts llvm.memset intrinsic into Tail Predicated
Hardware loops for a target that supports the Arm M-profile
Vector Extension (MVE).

The llvm.memset is converted to a TP loop for both
constant and non-constant input sizes (of llvm.memset).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100435
2021-05-07 13:35:53 +01:00
Joachim Meyer d9f2960c93 [NFC] Correctly assert the indents for printEnumValHelpStr.
Only verify that there's no negative indent.
Noted by @chapuni in https://reviews.llvm.org/D93494.

Reviewed By: chapuni

Differential Revision: https://reviews.llvm.org/D102021
2021-05-07 14:30:43 +02:00
Stephen Tozer ce0c1f3ced [DebugInfo] Fix crash when emitting an invalidated SDDbgValue
This patch fixes a crash in the compiler that occurs when certain
invalidated SDDbgValues are emitted. The cause of this was that we would
attempt to check the liveness of the debug value's operands, which
triggers an assert if any of those operands are invalid. This patch
changes this check such that it only occurs if the SDDbgValue is valid;
if not, the check is irrelevant anyway, so can be safely ignored.

Differential Revision: https://reviews.llvm.org/D101540
2021-05-07 13:13:56 +01:00
Simon Pilgrim 280aa3415e [DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts
Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts.

Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication.

I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to).

NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly.

Differential Revision: https://reviews.llvm.org/D101987
2021-05-07 13:12:30 +01:00
David Stuttard 793b4b2603 Revert "AMDGPU: Correct const_index_stride for wave 32 for PAL ABI"
This reverts commit 442de0c1ad.
2021-05-07 12:49:17 +01:00
Simon Pilgrim 8e42024f79 [X86] Ensure we pass DebugLoc by const reference where possible. NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef
2021-05-07 12:32:59 +01:00
David Stuttard 442de0c1ad AMDGPU: Correct const_index_stride for wave 32 for PAL ABI
Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.

Differential Revision: https://reviews.llvm.org/D101830

Change-Id: Id8de5566b0d1a07a814e2e7db016df9d20bf6d2c
2021-05-07 12:19:49 +01:00
Roman Lebedev 7059b28d5d
[X86] AMD Zen 3: 32/64 -bit GPR register moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.

Refs:
AMD SOG 19h, 2.9.4 Zero Cycle Move
The processor is able to execute certain register to register
mov operations with zero cycle delay.

Agner,
22.13 Instructions with no latency
Register-to-register move instructions are resolved at
the register rename stage without using any execution units.
These instructions have zero latency. It is possible to do six such
register renamings per clock cycle, and it is even possible to
rename the same register multiple times in one clock cycle.
2021-05-07 13:56:07 +03:00
Stephen Tozer 0791f968fe [DebugInfo] Fix updateDbgUsersToReg to support DBG_VALUE_LIST
This patch modifies updateDbgUsersToReg to properly handle
DBG_VALUE_LIST instructions, by replacing the hard-coded operand indices
(i.e. getOperand(0)) with the more general getDebugOperandsForReg(), and
updating the register for all matching operands.

Differential Revision: https://reviews.llvm.org/D101523
2021-05-07 11:47:50 +01:00
Guillaume Chatelet e805b7c2d6 [llvm][NFC] Remove remaining deprecated alignment functions from CodeGen
Differential Revision: https://reviews.llvm.org/D102058
2021-05-07 10:22:41 +00:00
LemonBoy f876383384 [AsmParser][ARM] Make .thumb_func imply .thumb
GNU as documentation states that a `.thumb_func` directive implies `.thumb`, teach the asm parser to switch mode whenever it's encountered. On the other hand the labeled form, exclusive to Apple's toolchain, doesn't switch mode at all.

Reviewed By: nickdesaulniers, peter.smith

Differential Revision: https://reviews.llvm.org/D101975
2021-05-07 12:13:36 +02:00
Sebastian Neubauer 98e5ede604 [AMDGPU] Serialize MFInfo::ScavengeFI
Serialize ScavengeFI from SIMachineFunctionInfo into yaml.

ScavengeFI is not used outside of the PrologEpilogInserter,
so this shouldn't change anything.

Differential Revision: https://reviews.llvm.org/D101367
2021-05-07 11:15:25 +02:00
Caroline Concatto cf06c8eee3 [LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer::fixReduction
The function fixReduction used to assert/crash for scalable vector when
a vector reduce could be done with a smaller vector.
This patch removes this assertion as it is safe to use scalable vector for
vector reduce and truncate.

Differential Revision: https://reviews.llvm.org/D101260
2021-05-07 09:37:37 +01:00
Peilin Guo 911a541620 [LazyValueInfo] Insert an Overdefined placeholder to prevent infinite recursion
getValueFromCondition() uses a Visited set to record the intermediate value.
However, it uses a postorder way to compute the value first and update the
Visited set later. Thus it will be trapped into an infinite recursion if there
exists IRs that use no dominated by its def as in this example:

  %tmp3 = or i1 undef, %tmp4
  %tmp4 = or i1 undef, %tmp3

To prevent this, we can insert an Overdefined placeholder into the set
before computing the actual value.

Reviewed by: nikic

Differential Revision: https://reviews.llvm.org/D101273
2021-05-07 16:05:50 +08:00
Chen Zheng 9deb7eeaf7 [Debug-Info][NFC] add a wrapper for Die.addValue
Add a new wrapper function addAttribute() for Die.addValue() function,
so we can do some attributes control in one single interface.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D101125
2021-05-07 07:24:09 +00:00
Amara Emerson 1ccebb18ef [GlobalISel] Micro-optimize the conditional branch optimization.
Convert a check into an assert and pass an MI instead of recomputing in the
apply function.
2021-05-07 00:03:09 -07:00
Chen Zheng a95473c563 [XCOFF] handle string constants generation for AIX
This follows https://www.ibm.com/docs/en/aix/7.2?topic=constants-string

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D101280
2021-05-07 06:43:36 +00:00
Qiu Chaofan f7294ac809 [PowerPC] Remove extra swap for extract+vperm on LE
This is a simple fix on LE. On BE, vector shuffles are categorized into
different ops. We may need more work to eliminate these in
tablegen/pre-isel.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D101605
2021-05-07 13:48:08 +08:00
Yonghong Song 605c811d2b BPF: fix FIELD_EXISTS relocation with array subscripts
Lorenz Bauer reported an issue in bpf mailing list ([1]) where
for FIELD_EXISTS relocation, if the object is an array subscript,
the patched immediate is the object offset from the base address,
instead of 1.

Currently in BPF AbstractMemberAccess pass, the final offset
from the base address is the patched offset except FIELD_EXISTS
which is 1 unconditionally. In this particular case, the last
data structure access is not a field (struct/union offset)
so it didn't hit the place to set patched immediate to be 1.

This patch fixed the issue by checking the relocation type.
If the type is FIELD_EXISTS, just set to 1.
Tested by modifying some bpf selftests, libbpf is okay with
such types with FIELD_EXISTS relocation.

 [1] https://lore.kernel.org/bpf/CACAyw99n-cMEtVst7aK-3BfHb99GMEChmRLCvhrjsRpHhPrtvA@mail.gmail.com/

Differential Revision: https://reviews.llvm.org/D102036
2021-05-06 22:37:02 -07:00
Bruno Cardoso Lopes 819e0d105e [CGAtomic] Lift strong requirement for remaining compare_exchange combinations
Follow up on 431e3138a and complete the other possible combinations.

Besides enforcing the new behavior, it also mitigates TSAN false positives when
combining orders that used to be stronger.
2021-05-06 21:05:20 -07:00
Cyndy Ishida c4ed142e69 [llvm][TextAPI] add mapping from OS string to Platform
* add utility for matching target triple OS value strings  to PlatformKind

This was reviewed offline by ributzka, steven_wu
2021-05-06 16:25:56 -07:00
Stanislav Mekhanoshin c714d03785 [AMDGPU] Expose __builtin_amdgcn_perm for v_perm_b32
Differential Revision: https://reviews.llvm.org/D102022
2021-05-06 16:17:33 -07:00
Malhar Jajoo 9ff38e2d9d [ARM] Transforming memcpy to Tail predicated Loop
This patch converts llvm.memcpy intrinsic into Tail Predicated
Hardware loops for a target that supports the Arm M-profile
Vector Extension (MVE).

From an implementation point of view, the patch

- adds an ARM specific SDAG Node (to which the llvm.memcpy intrinsic is lowered to, during first phase of ISel)
- adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter,
  on matching the above node.
- Adds a custom inserter function that expands the pseudo instruction into MIR suitable
   to be (by later passes) into a WLSTP loop.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99723
2021-05-06 23:21:28 +01:00
Sanjay Patel fefcb1f878 [PassManager] add helper function to hold set of vector passes
This is no-functional-change-intended (NFC) and split off from
D102002 (which proposes to eliminate the LTO-based differences).
2021-05-06 15:36:15 -04:00
Mircea Trofin 97ab068034 [NPM] Do not run function simplification pipeline unnecessarily
The CGSCC pass manager interplay with the FunctionAnalysisManagerCGSCCProxy is 'special' in the sense that the former will rerun the latter if there are changes to a SCC structure; that being said, some of the functions in the SCC may be unchanged. In that case, the function simplification pipeline will be re-run, which impacts compile time[1].

This patch allows the function simplification pipeline be skipped if it was already run and the function was not modified since.

The behavior is currently disabled by default. This is because, currently, the rerunning of the function simplification pipeline on an unchanged function may still result in changes. The patch simplifies investigating and fixing those cases where repeated function pass runs do actually positively impact code quality, while offering an easy workaround for those impacted negatively by compile time regressions, and not impacting mainline scenarios.

[1] A [[ http://llvm-compile-time-tracker.com/compare.php?from=eb37d3546cd0c6e67798496634c45e501f7806f1&to=ac722d1190dc7bbdd17e977ef7ec95e69eefc91e&stat=instructions | compile time tracker ]] run with the option enabled.

Differential Revision: https://reviews.llvm.org/D98103
2021-05-06 12:24:33 -07:00
Craig Topper 191ffda3f7 [RISCV] Remove unused ComplexPatterns. NFC 2021-05-06 12:17:41 -07:00
Craig Topper a577d59db2 [RISCV] Minor vector instruction tablegen cleanup. NFC
Use result_type for the IMPLICIT_DEF in masked vector patterns.
This doesn't matter today because result_type and op_type are
always the same.

Use multiclass inheritance to reduce repeated code.
2021-05-06 11:23:59 -07:00
Fangrui Song 306370be0b [AArch64] Fix namespace issue. NFC 2021-05-06 11:16:07 -07:00
Craig Topper 6660319cef [RISCV] Remove unused RISCV::VLEFF and VLEFF_MASK. NFC
Looks like these got left behind when vleff isel was moved to
X86ISelDAGToDAG.cpp
2021-05-06 09:41:29 -07:00
Jonas Paulsson 1c4cb510b4 [SystemZ] Don't use libcall for 128 bit shifts.
Expand 128 bit shifts instead of using a libcall.

This patch removes the 128 bit shift libcalls and thereby causes
ExpandShiftWithUnknownAmountBit() to be called.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D101993
2021-05-06 18:14:41 +02:00
Craig Topper 58323be415 [RISCV] Cleanup instruction formats used for B extension ternary operations.
Rename RVInstR4 as used by F/D/Zfh extensions to RVInstR4Frm.
Introduce new RVInstR4 that takes funct3 as a parameter.

Add new format classes for FSRI and FSRIW instead of trying to
bend RVInstR4 to use a shamt overlayed on rs2 and funct2.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100427
2021-05-06 08:59:05 -07:00