Commit Graph

154397 Commits

Fraser Cormack 4d268dc94a [RISCV] Enable CGP to sink splat operands of VP intrinsics
This patch brings better splat-matching to our VP support, by sinking
splat operands of VP intrinsics back into the same block as the VP
operation. The list of VP intrinsics we are interested in matches that
of the regular instructions.

Some optimization is still lacking. For instance, our VL nodes aren't
recognized as commutative, so splats must be on the RHS. Because of
this, we limit our sinking of splats to just the RHS operand for now.
Improvement in this regard can come in another patch.
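
As a hedged sketch (function and value names are mine, not from the patch), the kind of IR this affects; before CGP the splat lives in the entry block, and after sinking it is rematerialized next to its VP user so isel can match a .vx form:

  define <vscale x 2 x i32> @example(<vscale x 2 x i32> %v, i32 %x, <vscale x 2 x i1> %m, i32 %evl) {
  entry:
    %head = insertelement <vscale x 2 x i32> poison, i32 %x, i32 0
    %splat = shufflevector <vscale x 2 x i32> %head, <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer
    br label %body

  body:
    ; after sinking, the insertelement/shufflevector pair moves here, next
    ; to the vp.add; note the splat sits on the RHS, as described above
    %r = call <vscale x 2 x i32> @llvm.vp.add.nxv2i32(<vscale x 2 x i32> %v, <vscale x 2 x i32> %splat, <vscale x 2 x i1> %m, i32 %evl)
    ret <vscale x 2 x i32> %r
  }

  declare <vscale x 2 x i32> @llvm.vp.add.nxv2i32(<vscale x 2 x i32>, <vscale x 2 x i32>, <vscale x 2 x i1>, i32)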

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117703
2022-01-21 11:30:37 +00:00
OCHyams b6a41fddcf [DWARF][DebugInfo] Fix off-by-one error in size of DW_TAG_base_type types
Fix PR53163 by rounding the byte size of DW_TAG_base_type types up. Without
this fix we risk emitting types with a truncated size (including rounding
less-than-byte-sized types' sizes down to zero).

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D117124
2022-01-21 11:37:49 +00:00
Nikita Popov bfbdb5e43e [Coroutines] Avoid some pointer element type accesses
These are just verifying that pointer types are correct, which is
no longer relevant under opaque pointers.
2022-01-21 12:36:19 +01:00
Nikita Popov 9c5b856dac [CoroSplit] Avoid pointer element type accesses
Use isOpaqueOrPointeeTypeMatches() for the assertions instead.
2022-01-21 12:22:09 +01:00
Sebastian Neubauer ae2f9c8be8 [AMDGPU] Remove lz and nomip combine from codegen
These combines have been moved into the IR combiner in D116042.

Differential Revision: https://reviews.llvm.org/D116116
2022-01-21 12:09:08 +01:00
Sebastian Neubauer 603d18033c [AMDGPU][InstCombine] Remove zero LOD bias
If the bias is zero, we can remove it from the image instruction.
Also copy other image optimizations (l->lz, mip->nomip) to IR combines.

Differential Revision: https://reviews.llvm.org/D116042
2022-01-21 12:09:07 +01:00
Sebastian Neubauer 0530fdbbbb [AMDGPU] Fix LOD bias in A16 combine
As with the codegen fix in D111754, the LOD bias needs to be converted to 16
bits. Fix this in the combine.

Differential Revision: https://reviews.llvm.org/D116038
2022-01-21 12:09:06 +01:00
Nikita Popov e7762653d3 [Attributor] Avoid some pointer element type accesses 2022-01-21 11:20:10 +01:00
Florian Hahn 55689904d2
[VPlan] Move ::isCanonical outside ifdef.
This fixes a build failure with assertions disabled.
2022-01-21 09:44:31 +00:00
Florian Hahn c0cf209076
[VPlan] Add VPWidenIntOrFpInductionRecipe::isCanonical, use it (NFCI).
This patch adds VPWidenIntOrFpInductionRecipe::isCanonical to check if
an induction recipe is canonical. The code is also updated to use it
instead of isCanonicalID.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D117551
2022-01-21 09:35:06 +00:00
serge-sans-paille a2f6921ef2 [llvm] Remove unused headers in LLVMDemangle
As a hint at the impact of the cleanup, running

clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/Demangle/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l

before: 208053 lines
after:  203965 lines
2022-01-21 10:18:32 +01:00
Nikita Popov b4900296e4 [ConstantFold] Allow all float types in reinterpret load folding
Rather than hardcoding just half, float and double, allow all
floating point types.
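
A minimal illustration (global and function names are hypothetical, written with opaque pointers): the i64 constant below is the IEEE-754 bit pattern of double 1.0, so the reinterpreting load folds to a floating-point constant:

  @g = constant i64 4607182418800017408   ; 0x3FF0000000000000

  define double @f() {
    %v = load double, ptr @g   ; constant-folds to double 1.0
    ret double %v
  }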
2022-01-21 09:26:51 +01:00
Simon Moll 7950010e49 [VE][NFC] Factor out helper functions
Factor out some helper functions to cleanup VEISelLowering.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D117683
2022-01-21 09:15:59 +01:00
Nikita Popov 6a19cb837c [ConstantFold] Support pointers in reinterpret load folding
Peculiarly, the necessary code to handle pointers (including the
check for non-integral address spaces) is already in place,
because we were already allowing vectors of pointers here, just
not plain pointers.
2022-01-21 09:13:37 +01:00
Nikita Popov 05cd9a0596 [ConstantFold] Simplify type check in reinterpret load folding (NFC)
Keep a list of allowed types, but then always construct the map
type the same way. We need an integer with the same width as the
original type.
2022-01-21 09:06:35 +01:00
eopXD e6de53b4de [RISCV] Bump rvv-related extensions from 0.10 to 1.0
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D112987
2022-01-20 23:22:20 -08:00
Igor Kudrin 86b08ed6bb [DebugInfo][NFC] Do not call 'isRootFile' for DWARF Version < 5
A quicker comparison should be done first.

Differential Revision: https://reviews.llvm.org/D117786
2022-01-21 13:52:10 +07:00
Igor Kudrin 75184f14ae [DebugInfo] Fix handling '# line "file"' for DWARFv5
`CppHashInfo.Filename` is a `StringRef` that references a part of the
source file and it is not null-terminated at the end of the file name.
`AsmParser::parseAndMatchAndEmitTargetInstruction()` passes it to
`getStreamer().emitDwarfFileDirective()`, and it eventually comes to
`isRootFile()`. The comparison fails because `FileName.data()` is not
properly terminated.

In addition, the old code might cause a significant speed degradation
for long source files. The `operator!=()` for `std::string` and
`const char *` can be implemented in a way that finds the length of
the second argument first, which slows the comparison for long data.
`parseAndMatchAndEmitTargetInstruction()` calls
`emitDwarfFileDirective()` every time `CppHashInfo.Filename` is not
empty. As a result, the longer the source file, the slower the
compilation; for a very long file, it might take hours instead of the
normal couple of seconds.

Differential Revision: https://reviews.llvm.org/D117785
2022-01-21 13:52:10 +07:00
wangpc 8def89b5dc [RISCV] Set CostPerUse to 1 iff RVC is enabled
After D86836, we can define multiple cost values for
different cost models. So here we set CostPerUse to
1 iff RVC is enabled to avoid potential impact on RA.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117741
2022-01-21 14:44:26 +08:00
Zi Xuan Wu 82bb8a588d [CSKY] Add codegen support of GlobalTLSAddress lowering
Static and dynamic TLS address lowering is performed at the DAG stage according to the TLS model.
It needs the PseudoTLSLA32 pseudo to get the address of the TLS-related entry, which resides in the constant pool. See the sketch below.
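
A minimal sketch (hypothetical names) of the kind of access this lowers; taking the address of a thread_local global produces the GlobalTLSAddress node handled here:

  @tls_x = thread_local global i32 0

  define i32 @read_tls() {
    %v = load i32, ptr @tls_x   ; address computed via GlobalTLSAddress lowering
    ret i32 %v
  }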
2022-01-21 14:39:55 +08:00
Craig Topper 7b3d307288 [RISCV] Add isel patterns for grevi, shfli, and unshfli to brev8/zip/unzip instructions.
Zbkb makes some encodings of the general grevi, shfli, and
unshfli instructions legal, so we added separate instructions for
those encodings to improve the assembler and disassembler
diagnostics. To be consistent, we should always use these separate
instructions whenever those specific encodings of grevi/shfli/unshfli
occur. So this patch adds specific isel patterns to override the generic
isel patterns for these cases. Similar was done previously for rev8 and
zext.h for Zbb.
2022-01-20 20:43:52 -08:00
Wu Xinlong 7ee1c162cc [RISCV][RFC] add inst support of zbkb
This commit adds instruction support for `zbkb`, which is defined in the scalar cryptography extension version v1.0.0 (already ratified).

Most of the zbkb instruction definitions reuse parts of the zbp and zbb definitions, so this patch just modifies some of the instruction aliases and predicates.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117640
2022-01-21 11:49:36 +08:00
Joao Moreira 82af95029e [X86] Enable ibt-seal optimization when LTO is used in Kernel
Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instructions in function prologues. Because this is a security feature, it is desirable that only actually indirect-branch-targeted functions are emitted with ENDBRs. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries because exported functions may end up being reachable through PLT entries, which use an indirect branch. Because this cannot be determined at compile time, the compiler currently emits ENDBRs in every non-local-linkage function.

Despite the challenge presented for user-space, the kernel landscape is different as no PLTs are used. With the intent of providing the most fit ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal" which replaces the ENDBRs for NOPs directly in the binary. The discussion of this feature can be seen in [1].

This diff enables the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement when the code model is set to "kernel". In this scenario, the compiler will only emit ENDBRs for address-taken functions, ignoring non-address-taken functions even when they don't have local linkage.

A comparison between an LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and that the number of superfluous ENDBR instructions nopped-out decreased from 11730 to 540.

The 540 missed superfluous ENDBRs need to be investigated further, but the hypotheses are: assembly code not handled by the compiler, kernel exported-symbol mechanisms creating bogus address-taken situations, or ENDBRs being removed by other binary optimizations such as the kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies the feature being merged.

[1] - https://lkml.org/lkml/2021/11/22/591

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D116070
2022-01-21 10:55:34 +08:00
Hsiangkai Wang ad06e65dc4 [RISCV] Fix the bug in the register allocator caused by reserved BP.
Originally, hasRVVFrameObject() scanned all the stack objects to check
whether there was any scalable vector object on the stack. However,
this causes errors in the register allocator. In issue 53016, it
returns false before RA because there are no RVV stack objects, but it
returns true after RA because spill slots for RVV values are created
during RA. Due to this inconsistent behavior, the compiler does not
reserve BP during register allocation but then generates BP accesses
in the PEI pass.

The function is changed to use hasStdExtV() as the return value. It is
not precise, but it makes the register allocation correct.

Refer to https://github.com/llvm/llvm-project/issues/53016.

Differential Revision: https://reviews.llvm.org/D117663
2022-01-21 01:23:01 +00:00
Craig Topper cfae2c65db [RISCV] Factor Zve32 support into RISCVSubtarget::getMaxELENForFixedLengthVectors.
This is needed to properly limit fractional LMULs for Zve32.

Add new Zve32 RUN lines to the existing tests for the
-riscv-v-fixed-length-vector-elen-max command line option.
2022-01-20 16:31:12 -08:00
Paweł Bylica 1d7604fdce
[InstCombine] Simplify bswap -> shift
Simplify bswap(x) to shl(x) or lshr(x) if x has exactly one
"active byte", i.e. all active bits are contained within the
boundaries of a single byte of x.

https://alive2.llvm.org/ce/z/nvbbU5
https://alive2.llvm.org/ce/z/KiiL3J
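
A sketch of one direction of the fold (names mine; the mask establishes the single active byte):

  define i32 @src(i32 %v) {
    %x = and i32 %v, 255                       ; only the low byte can be non-zero
    %b = call i32 @llvm.bswap.i32(i32 %x)
    ret i32 %b
  }

  ; simplifies to:
  define i32 @tgt(i32 %v) {
    %x = and i32 %v, 255
    %b = shl i32 %x, 24                        ; moves the active byte to the top
    ret i32 %b
  }

  declare i32 @llvm.bswap.i32(i32)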

Reviewed By: spatel, craig.topper, lebedev.ri

Differential Revision: https://reviews.llvm.org/D117680
2022-01-21 01:25:30 +01:00
Johannes Doerfert 37e0c58559 [Attributor][FIX] AAValueConstantRange should not loop unconstrained
The old method to avoid unconstrained expansion of the constant range in
a loop did not work as soon as there were multiple instructions in
between the phi and its input. We now take a generic approach and limit
the number of updates as a fallback. The old method is kept as it
catches "the common case" early.
2022-01-20 18:07:04 -06:00
Johannes Doerfert 7bf9065ad7 [Attributor][NFC] Clang format 2022-01-20 18:06:53 -06:00
Craig Topper 5e88f527da [RISCV] Remove RISCVSubtarget::hasStdExtV() and hasStdExtZve*(). NFC
All code should use one of the cleaner named hasVInstructions*
functions. Fix the two uses that weren't and delete the methods
so no new uses can be created.
2022-01-20 15:05:09 -08:00
Craig Topper fa8bb22466 [RISCV] Optimize vector_shuffles that are interleaving the lowest elements of two vectors.
RISCV only has a unary shuffle that requires placing the indices in a
register. For interleaving two vectors, this means we need at least
two vrgathers and a vmerge.

This patch teaches shuffle lowering to use a widening addu followed
by a widening vmaccu to implement the interleave. First we extract
the low half of both V1 and V2. Then we implement
(zext(V1) + zext(V2)) + (zext(V2) * zext(2^eltbits - 1)) which
simplifies to (zext(V1) + zext(V2) * 2^eltbits). This further
simplifies to (zext(V1) + zext(V2) << eltbits). Then we bitcast the
result back to the original type splitting the wide elements in half.
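
A hedged fixed-vector instance of this arithmetic (function names mine), with i16 elements so eltbits = 16; on a little-endian target both functions compute the same interleave:

  define <8 x i16> @interleave_shuffle(<4 x i16> %a, <4 x i16> %b) {
    %r = shufflevector <4 x i16> %a, <4 x i16> %b, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
    ret <8 x i16> %r
  }

  define <8 x i16> @interleave_arith(<4 x i16> %a, <4 x i16> %b) {
    %za = zext <4 x i16> %a to <4 x i32>
    %zb = zext <4 x i16> %b to <4 x i32>
    %sh = shl <4 x i32> %zb, <i32 16, i32 16, i32 16, i32 16>   ; zext(V2) << eltbits
    %w = add <4 x i32> %za, %sh                                 ; zext(V1) + (zext(V2) << eltbits)
    %r = bitcast <4 x i32> %w to <8 x i16>                      ; split wide elements back in half
    ret <8 x i16> %r
  }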

We can only do this if we have a type with wider elements available.
Because we're using extends we also have to be careful with fractional
lmuls. Floating point types are supported by bitcasting to/from integer.

The tests cover a varied combination of LMULs split across VLEN>=128 and
VLEN>=512 runs. There are a few tests with commuted shuffle indices as well
as tests for undef indices. There's one test for a vXi64/vXf64 vector which
we can't optimize, but which verifies we don't crash.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D117743
2022-01-20 14:44:47 -08:00
Michael Kruse 1d4ca42b43 [OpenMPIRBuilder] Detect ambiguous InsertPoints for apply*WorkshareLoop. NFC.
Follow-up on D117226: applyStaticWorkshareLoop and
applyDynamicWorkshareLoop now check for conflicting InsertPoints via an
assert. There is no in-tree code that violates this assertion, hence
nothing changes.
2022-01-20 16:10:17 -06:00
Philip Reames c0906f6b21 [SLP] Remove stray semicolon to make bots happy
Certain bots (e.g. sanitizer-x86_64-linux-android) appear to be running with strict C++98 flags which disallow a stray ; at global scope.
2022-01-20 14:09:28 -08:00
Stanislav Mekhanoshin 41ebd19681 [AMDGPU] Do not ignore exec use where exec is read as data
Compares, v_cndmask_b32, and v_readfirstlane_b32 use EXEC
in a way which modifies the result. This implicit EXEC use
shall not be ignored for the purposes of instruction moves.

Differential Revision: https://reviews.llvm.org/D117814
2022-01-20 14:05:22 -08:00
Philip Reames 5a670f1378 [SLP] Kill an unused param and use a for-loop in calculateDependencies [NFC] 2022-01-20 13:58:20 -08:00
Philip Reames 60f6191879 [SLP] Extract formBundle helper for readability [NFC] 2022-01-20 13:08:37 -08:00
Sanjay Patel a7a2860d0e [InstCombine] convert mul with sexted bool and constant to select
We already have the related folds for zext-of-bool, so it
should make things more consistent to have this transform
to select for sext-of-bool too:
https://alive2.llvm.org/ce/z/YikdfA
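
A sketch of the new fold with an arbitrary constant (names mine); sext of a bool is 0 or -1, so the multiply selects between 0 and the negated constant:

  define i32 @src(i1 %b) {
    %s = sext i1 %b to i32      ; 0 or -1
    %m = mul i32 %s, 42
    ret i32 %m
  }

  ; becomes:
  define i32 @tgt(i1 %b) {
    %m = select i1 %b, i32 -42, i32 0
    ret i32 %m
  }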

Fixes #53319
2022-01-20 15:57:01 -05:00
Craig Topper dd7b69a61f [RISCV] Remove HadStdExtV and HasStdZve* Predicates from tablegen.
No instructions should be using these. Everything should use
HasVInstructions* Predicates. Remove them so that they can't be
used by accident.
2022-01-20 12:54:20 -08:00
Philip Reames 118babe67a [SLP] Use for loops for walking bundle elements 2022-01-20 12:44:33 -08:00
Craig Topper 7a275dc354 [RISCV] Remove Zvlsseg extension.
This string no longer appears in the Vector Extension specification.
The segment load/store instructions are just part of the vector
instruction set.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D117724
2022-01-20 12:40:07 -08:00
Philip Reames 860038e0d7 [SLP] Rename a couple lambdas to be more clearly separate from method names 2022-01-20 12:13:30 -08:00
Roman Lebedev ba8eb31bd9
[InstCombine] Instruction sinking: fix check for function terminating block
Checking for specific function terminating opcodes
means we don't handle other non-hardcoded ones :)

This should probably be generalized to something
similar to the `IsBlockFollowedByDeoptOrUnreachable()`.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D117810
2022-01-20 22:41:31 +03:00
Craig Topper 94e69fbb4f [RISCV] Add DAG combine to fold (fp_to_int_sat (ffloor X)) -> (select X == nan, 0, (fcvt X, rdn))
Similar for ceil, trunc, round, and roundeven. This allows us to use
static rounding modes to avoid a libcall.

This is similar to D116771, but for the saturating conversions.

This optimization is done for AArch64 as isel patterns.
RISCV doesn't have instructions for ceil/floor/trunc/round/roundeven
so the operations don't stick around until isel to enable a pattern
match. Thus I've implemented a DAG combine.

I'm only handling saturating to i64 or i32. This could be extended
to other sizes in the future.
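
A sketch of the pattern the combine matches (function name mine); the floor plus saturating conversion can become a single fcvt using the static rdn rounding mode, plus the NaN-to-zero select:

  define i64 @floor_sat(double %x) {
    %f = call double @llvm.floor.f64(double %x)
    %r = call i64 @llvm.fptosi.sat.i64.f64(double %f)
    ret i64 %r
  }

  declare double @llvm.floor.f64(double)
  declare i64 @llvm.fptosi.sat.i64.f64(double)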

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D116864
2022-01-20 11:35:37 -08:00
Daniel Thornburgh 6b92bb4790 [Support] [DebugInfo] Lazily create cache dir.
This change defers creating Support/Caching.cpp's cache directory until
it actually writes to the cache.

This allows using Caching library in a read-only fashion. If read-only,
the cache is guaranteed not to write to disk. This keeps tools using
DebugInfod (currently llvm-symbolizer) hermetic when not configured to
perform remote lookups.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D117589
2022-01-20 19:27:15 +00:00
Jonas Paulsson 792853cb78 [SystemZ] Remove the ManipulatesSP flag from backend (NFC).
This flag was set in the presence of stacksave/stackrestore in order to force
a frame pointer.

This should however not be needed per the comment in MachineFrameInfo.h
stating that a variable sized object "...is the sole condition which
prevents frame pointer elimination", and experiments have also shown that
ManipulatesSP has no effect whatsoever on code generation.

Review: Ulrich Weigand
2022-01-20 13:00:51 -06:00
Sanjay Patel 2d031ec5e5 [InstCombine] add one-use check to opposite shift folds
Test comments say this might be intentional, but I don't
see any hard evidence to support it. The extra instruction
shows up as a potential regression in D117680.

One test does show a missed fold that might be recovered
with better demanded bits analysis.
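
For reference, a sketch of one such opposite-shift pair (names mine); with this patch the fold to a mask only fires when the inner shift has a single use:

  define i32 @f(i32 %x) {
    %l = lshr i32 %x, 8
    %s = shl i32 %l, 8          ; folds to: and i32 %x, -256 (when %l has one use)
    ret i32 %s
  }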
2022-01-20 13:49:23 -05:00
Craig Topper 9abc593e98 [TargetLowering][InstCombine] Simplify BSwap demanded bits code a little. NFC
Use alignDown instead of &= ~7.

Replace ResultBit with NLZ. (BitWidth - NLZ - NTZ == 8) so
(BitWidth - NTZ - 8 == NLZ).

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D117804
2022-01-20 10:45:17 -08:00
Alexandre Ganea 5af2433e17 [clang-cl] Support the /HOTPATCH flag
This patch adds support for the MSVC /HOTPATCH flag: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-170&viewFallbackFrom=vs-2019

The flag is translated to a new -fms-hotpatch flag, which in turn adds a 'patchable-function' attribute for each function in the TU. This is then picked up by the PatchableFunction pass which generates a TargetOpcode::PATCHABLE_OP of minsize = 2 (which means the target instruction must resolve to at least two bytes). TargetOpcode::PATCHABLE_OP is only implemented for x86/x64. When targeting ARM/ARM64, /HOTPATCH isn't required (instructions are always 2/4 bytes and suitable for hotpatching).

Additionally, when using /Z7, we generate a 'hot patchable' flag in the CodeView debug stream, in the S_COMPILE3 record. This flag is then picked up by LLD (or link.exe) and is used in conjunction with the linker /FUNCTIONPADMIN flag to generate extra space before each function, to accommodate the long jumps used by live patching. Please see: d703b92296/lld/COFF/Writer.cpp (L1298)

The outcome is that we can finally use Live++ or Recode along with clang-cl.

NOTE: It seems that MSVC cl.exe always enables /HOTPATCH on x64 by default, although if we did the same we might generate sub-optimal code. Additionally, MSVC always generates a .debug$S section and an S_COMPILE3 record, which Clang doesn't do without /Z7. Therefore, the MSVC command line "cl /c file.cpp" would have to be written for Clang as "clang-cl /c file.cpp /HOTPATCH /Z7" in order to obtain the same result.

Depends on D43002, D80833 and D81301 for the full feature.

Differential Revision: https://reviews.llvm.org/D116511
2022-01-20 12:57:19 -05:00
Matt Arsenault 064cea9c9a AMDGPU/GlobalISel: Try to use s_and_b64 in ptrmask selection
Avoids a test diff with SDAG.
2022-01-20 12:56:53 -05:00
Matt Arsenault 2e49e0cfde AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics
Emit an error if the return value is used on subtargets that do not
support them. Previously we were falling back to the DAG on selection
failure, where it would emit this error and then fail again.
2022-01-20 12:46:45 -05:00
Nadav Rotem 191a6e9dfa optimize icmp-ugt-ashr
This diff optimizes the sequence icmp ugt (ashr X, C_1), C_2. InstCombine
already implements this optimization for sgt; this patch adds the
corresponding support for ugt.

@craig.topper came up with the idea and proof:

  define i1 @src(i8 %x, i8 %y, i8 %c) {
    %cp1 = add i8 %c, 1
    %i = shl i8 %cp1, %y
    %i.2 = ashr i8 %i, %y
    %cmp = icmp eq i8 %cp1, %i.2
    ;Assume: C + 1 == (((C + 1) << y) >> y)
    call void @llvm.assume(i1 %cmp)

    ; uncomment for the sgt case
    %j = shl i8 %cp1, %y
    %j.2 = sub i8 %j, 1
    %cmp2 = icmp ne i8 %j.2, 127
    ;Assume (((c + 1 ) << y) - 1) != 127
    call void @llvm.assume(i1 %cmp2)

    %s = ashr i8 %x, %y
    %r = icmp sgt i8 %s, %c
    ret i1 %r
  }

  define i1 @tgt(i8 %x, i8 %y, i8 %c) {
    %cp1 = add i8 %c, 1
    %j = shl i8 %cp1, %y
    %j.2 = sub i8 %j, 1

    %r = icmp sgt i8 %x, %j.2
    ret i1 %r
  }

  declare void @llvm.assume(i1)

This change is related to the optimizations in D117252.

Differential Revision: https://reviews.llvm.org/D117365
2022-01-20 09:31:46 -08:00