Commit Graph

407321 Commits

Author SHA1 Message Date
Yaxun (Sam) Liu 78b0f3701d [HIPSPV][1/4] Refactor HIP tool chain
This patch refactors the HIP tool chain in preparation for a new HIP tool
chain, the HIPSPV tool chain, which is added in the follow-up patch (part 2).

Rename HIPToolChain to HIPAMDToolChain and rename HIP.* files to HIPAMD.*.
Introduce HIPUtility.* files, which hold common HIP utilities shared among
HIP tool chain implementations.
Move constructHIPFatbinCommand() and
constructGenerateObjFileFromHIPFatBinary() to HIPUtility. HIPSPV tool
chain is going to use them.
Tweak bundle target ID in constructHIPFatbinCommand(): extra dashes are
dropped if the Target ID is empty and 'hip' offload kind is made default
for non-AMD targets.

Patch by: Henry Linjamäki

Reviewed by: Yaxun Liu, Artem Belevich, Eric Christopher

Differential Revision: https://reviews.llvm.org/D110549
2021-12-13 10:50:25 -05:00
Nikita Popov 220815a91a [AMDGPUPerfHintAnalysis] Avoid getPointerElementType()
Extract the load/store type from the instruction rather than
fetching it from the pointer element type.
2021-12-13 16:48:21 +01:00
Neubauer, Sebastian 26924b57e8 [AMDGPU] Ignore special ABI registers for graphics
Fixed ABI arguments are compute specific and should not be added to
graphics shaders or functions, so do not try to add them.

Differential Revision: https://reviews.llvm.org/D115344
2021-12-13 16:44:37 +01:00
Lei Zhang 5e55a20119 [mlir][spirv] Serialize selection with separate header block
The previous "optimization" that tries to reuse existing block for
selection header block can be problematic for deserialization
because it effectively pulls in previous ops in the selection op's
enclosing block into the selection op's header. When deserializing,
those ops will be placed in the selection op's region. If any of
the previous ops has a use after the selection op, it will break. That
is, the following IR cannot round trip:

```mlir
^bb:
  %def = ...
  spv.mlir.selection { ... }
  %use = spv.SomeOp %def
```

This commit removes the "optimization" to always create new blocks
for the selection header.

Along the way, also improved error reporting in deserialization
by turning asserts into proper errors and adding a check for uses outside
of sunk structured control flow region blocks.

Reviewed By: Hardcode84

Differential Revision: https://reviews.llvm.org/D115582
2021-12-13 10:42:26 -05:00
Chuanqi Xu 9db8162820 [NFC] Format .cppm files in tests 2021-12-13 23:32:25 +08:00
Louis Dionne 7c1d4c2e77 [libc++abi][NFC] Fix comment 2021-12-13 10:29:29 -05:00
Sanjay Patel f46a9c8edd [InstCombine] don't automatically drop poison-generating flags in SimplifyVectorDemandedElts
I noticed this while reviewing the test diffs in D115460
(and so the diffs in that patch will be reduced if this one is applied first).

This is effectively a revert of 3436dc2923 ( https://reviews.llvm.org/rG3436dc29239d ) -
since that commit, we've made several enhancements, so the reasoning there is no longer
valid. Specifically, we added a poison value to IR, and we clarified the behavior of
undef/poison elements in a shuffle mask:
https://llvm.org/docs/LangRef.html#shufflevector-instruction

Alive2 seems to agree that the propagation of flags in the test diffs shown here is valid:
https://alive2.llvm.org/ce/z/UuY-jr
https://alive2.llvm.org/ce/z/GXoMD9
https://alive2.llvm.org/ce/z/nVCyVH

Differential Revision: https://reviews.llvm.org/D115526
2021-12-13 10:12:19 -05:00
Mogball 843534db3c [mlir][ods] Fix OpDefinitionsGen infer return types builder with regions
Despite handling regions and inferred return types, the builder was never generated for ops with both InferReturnTypeOpInterface and regions.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D115525
2021-12-13 15:11:35 +00:00
Kadir Cetinkaya a47af1ac34 [clangd][Dex] Fix crashes when building trigrams for empty identifier 2021-12-13 15:58:33 +01:00
gysit 6c85a49e22 [mlir][memref] Use current source type in getCanonicalSubViewResultType.
Use the current instead of the new source type to compute the rank-reduction map in getCanonicalSubViewResultType. Otherwise, the computation of the rank-reduction map fails when folding a cast into a subview since the strides of the new source type cannot be related to the strides of the current result type.

Depends On D115428

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115446
2021-12-13 14:50:41 +00:00
Jay Foad 16de2c09dd [AMDGPU] SIShrinkInstructions: sink code to where it's used. NFC. 2021-12-13 14:46:40 +00:00
Jay Foad 63681527ee [AMDGPU] SIShrinkInstructions: remove redundant check
canShrink already calls hasVALU32BitEncoding, so there is no need
to call it again here.
2021-12-13 14:46:40 +00:00
Jay Foad 61f8af2657 [AMDGPU] Remove a FIXME implemented in D11061 2021-12-13 14:46:40 +00:00
Nikita Popov 432c41ebe9 [SLP] Avoid getPointerElementType() call
Use the load result type instead of the element type of the load
pointer operand.
2021-12-13 15:46:13 +01:00
Pavel Labath 529e03ea65 [lldb] Remove named function arguments from TestQemuLaunch
This is a swig-4 feature.
2021-12-13 15:30:26 +01:00
Nikita Popov 9cbab13282 [ConstantsTest] Avoid crash with opaque pointers
With opaque pointers there will be no bitcast, so don't assume
that.
2021-12-13 15:23:12 +01:00
Daniil Fukalov e5c64b45be [CostModel][AMDGPU] Fix intrinsics costs estimations.
1. Fixed cost inconsistency for llvm.fma.vXf16 intrinsics.
2. Added tests for llvm.sadd.sat, llvm.ssub.sat, llvm.uadd.sat, llvm.usub.sat
   intrinsics since they have special processing in the cost model.
3. Minor updates and refinements to the intrinsics' cost tests.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D115385
2021-12-13 17:17:34 +03:00
Markus Böck 664cc9312c [mlir] Implement `DataLayoutTypeInterface` for `LLVMStructType`
Using this implementation of the interface, it is possible to query the size, ABI alignment, and preferred alignment of a struct. It should yield the same results as LLVM's `llvm::DataLayout` on an equivalent `llvm::StructType`, including for packed structs.

Additionally, it is possible to increase the ABI and preferred alignment using a data layout entry with the type `llvm.struct<()>`, which serves the same purpose as the `a:` component in LLVM's data layout string.

Differential Revision: https://reviews.llvm.org/D115600
2021-12-13 15:09:16 +01:00
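The size and ABI alignment queries described in the `LLVMStructType` commit above follow the usual struct layout rule. The snippet below is a minimal sketch of that rule, not the MLIR implementation: the `(size, abi_alignment)` input format and the function names are hypothetical, and the alignment override through an `llvm.struct<()>` data layout entry is not modeled.

```python
def align_to(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def struct_layout(fields, packed=False):
    """Return (size, abi_alignment) in bytes for a struct.

    fields: list of (size, abi_alignment) pairs, one per member.
            This input format is an assumption made for illustration.
    """
    offset = 0
    struct_align = 1
    for size, align in fields:
        # Packed structs place every member at byte alignment.
        field_align = 1 if packed else align
        offset = align_to(offset, field_align) + size
        struct_align = max(struct_align, field_align)
    # Tail padding keeps every element of an array of structs aligned.
    return align_to(offset, struct_align), struct_align


# An i8 followed by an i32: three padding bytes are inserted unless packed.
print(struct_layout([(1, 1), (4, 4)]))               # (8, 4)
print(struct_layout([(1, 1), (4, 4)], packed=True))  # (5, 1)
```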
Jon Chesterfield 28345d7f6f [amdgpu] Add regression test for LDS in metadata 2021-12-13 13:35:38 +00:00
Florian Hahn e2885c7c9b [VPlan] Add printing test with VPInstruction with debug locs.
Test case for D113223.
2021-12-13 13:08:41 +00:00
gysit db7a2e9176 [mlir][linalg] Only compose PadTensorOps if no ExtractSliceOp is rank-reducing.
Do not compose pad tensor operations if the extract slice of the outer pad tensor operation is rank-reducing. The inner extract slice op cannot be rank-reducing since its source type must match the desired type of the padding.

Depends On D115359

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115428
2021-12-13 13:01:30 +00:00
gysit 6859f8ed1e [mlir][linalg] Adapt the PadTensorOpVectorizationWithInsertSlicePattern matching.
Tighten the matcher of the PadTensorOpVectorizationWithInsertSlicePattern pattern. Only match if the PadOp result is used by the InsertSliceOp source. Fail if the result is used by the InsertSliceOp dest.

Depends On D115336

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115359
2021-12-13 12:55:07 +00:00
gysit f895e95138 [mlir][linalg] Make padding work for rank-reducing slice ops.
Adapt the computation of a static bounding box to take rank-reducing slice operations into account by filtering out the reduced size-one dimensions. The revision is needed to make padding work for decomposed convolution operations. The decomposition introduces rank-reducing extract slice operations that previously caused padding to fail.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D115336
2021-12-13 12:34:20 +00:00
Nico Weber 45158b1804 Revert "[NFC] format .cppm files in test"
This reverts commit 7c51a12833.
Breaks SemaCXX/modules-ts.cppm in check-clang.
2021-12-13 07:13:17 -05:00
Florian Hahn 42263e7d26 [LV] Add test with debug locations on branches that get scalarized. 2021-12-13 12:06:35 +00:00
Nico Weber b6f317d94d [gn build] Make arm_neon_sve_bridge.h header auto-syncable 2021-12-13 07:04:45 -05:00
Evgeniy Brevnov 7002125cff [LV][NFC] Fix debug message to print out resulting clamped VF 2021-12-13 18:54:05 +07:00
Chuanqi Xu 7c51a12833 [NFC] format .cppm files in test 2021-12-13 19:52:31 +08:00
Dmitry Vyukov 9fb8058a80 tsan: enable the new runtime
This enables the new runtime (D112603) by default.

Depends on D112603.

Differential Revision: https://reviews.llvm.org/D115624
2021-12-13 12:50:13 +01:00
Dmitry Vyukov b332134921 tsan: new runtime (v3)
This change switches tsan to the new runtime which features:
 - 2x smaller shadow memory (2x of app memory)
 - faster fully vectorized race detection
 - small fixed-size vector clocks (512b)
 - fast vectorized vector clock operations
 - unlimited number of alive threads/goroutines

Depends on D112602.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D112603
2021-12-13 12:48:34 +01:00
Peter Waller 921e89c59a [SVE] Only combine (fneg (fma)) => FNMLA with nsz
-(Za + Zm * Zn) != (-Za + Zm * (-Zn))
when the FMA produces a zero output (e.g. all zero inputs can produce -0
output)

Add a PatFrag to check for the presence of nsz on the fneg, and add tests which
ensure the combine does not fire in the absence of nsz.

See https://reviews.llvm.org/D90901 for a similar discussion on X86.

Differential Revision: https://reviews.llvm.org/D109525
2021-12-13 11:33:07 +00:00
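The signed-zero mismatch behind the nsz requirement in the [SVE] commit above can be reproduced with ordinary IEEE 754 doubles. This is a small sketch under stated assumptions: it uses a separate multiply and add instead of a fused FMA (the product is exact for these inputs, so the result is the same), and the function names are made up for illustration.

```python
import math


def neg_of_fma(za, zm, zn):
    # -(Za + Zm * Zn): negate the multiply-add result.
    return -(za + zm * zn)


def negated_operands(za, zm, zn):
    # (-Za + Zm * (-Zn)): negate the operands instead of the result.
    return -za + zm * (-zn)


za, zm, zn = 1.0, 1.0, -1.0
lhs = neg_of_fma(za, zm, zn)        # -(1.0 + -1.0) = -(+0.0) = -0.0
rhs = negated_operands(za, zm, zn)  # -1.0 + 1.0    = +0.0

print(lhs == rhs)                                         # True: zeros compare equal
print(math.copysign(1.0, lhs), math.copysign(1.0, rhs))   # -1.0 1.0
```

The two results compare equal but carry different sign bits, which is observable through operations such as division or `copysign`. That is why the rewrite is only safe when signed zeros may be ignored.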
Matt Devereau 41def32040 [AArch64][SVE][NEON] Add NEON-SVE-Bridge intrinsics
Adds svset_neonq, svget_neonq, svdup_neonq AArch64 intrinsics.

These are described in the ACLE specification:
https://github.com/ARM-software/acle/pull/72

https://reviews.llvm.org/D114713
2021-12-13 11:31:57 +00:00
Kazushi (Jam) Marukawa cffce86a1c [VE] Support srel32 in symbol reference
Support R_VE_SREL32 in symbol references in MC layer.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D115591
2021-12-13 20:29:17 +09:00
Kazushi (Jam) Marukawa d1057f9604 [VE] Support R_VE_RELATIVE
Change getELFRelativeRelocationType() to return R_VE_RELATIVE
in preparation for lld support for VE.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D115592
2021-12-13 20:28:35 +09:00
Matt Devereau 2e585dd91a [AArch64][SVE] Lower vector.insert to predicated merged MOV
Use predicated SEL for vector.insert instead of going through memory

Differential Revision: https://reviews.llvm.org/D115259
2021-12-13 11:17:55 +00:00
Florian Hahn e90630e5a5 [VPlan] Remove unused createNaryOp (NFC). 2021-12-13 11:11:00 +00:00
Dmitry Vyukov b088833375 tsan: deflake dlopen_static_tls.cpp
Currently the test calls dlclose in the thread
concurrently with the main thread calling a function
from the dynamic library. This is not good.
Wait for the main thread to call the function
before calling dlclose.

Depends on D115612.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D115613
2021-12-13 12:01:40 +01:00
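The ordering fix described in the dlopen_static_tls.cpp commit above (close the library only after the main thread has made its call) is a standard wait-for-signal pattern. The following is a rough sketch of that pattern only: `FakeLibrary` is a hypothetical stand-in for a dynamically loaded library, and none of this is taken from the actual test.

```python
import threading


class FakeLibrary:
    """Hypothetical stand-in for a dlopen'ed library."""

    def __init__(self):
        self.closed = False

    def call(self):
        assert not self.closed, "function called after close"
        return "ok"

    def close(self):
        self.closed = True


def run():
    lib = FakeLibrary()
    called = threading.Event()

    def closer():
        called.wait()   # block until the main thread has made its call
        lib.close()

    worker = threading.Thread(target=closer)
    worker.start()
    result = lib.call()  # main thread uses the library first
    called.set()         # only now may the other thread close it
    worker.join()
    return result, lib.closed


print(run())  # ('ok', True)
```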
Dmitry Vyukov 7de546e9e8 tsan: deflake flush_memory.cpp
The test contains a race and checks that it's detected.
But the race may not be detected since we are doing aggressive flushes
(if the state flush happens between racing accesses, tsan won't
detect the race). So return 1 to make the test deterministic
regardless of the race.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D115612
2021-12-13 12:01:30 +01:00
Fraser Cormack b0319ab79b [PR52475] Ensure a correct chain in copies to/from hidden sret parameter
This patch fixes an issue during SelectionDAG construction. When the
target is unable to lower the function's return value, a hidden sret
parameter is created. It is initialized and copied to a stored variable
(DemoteRegister) with CopyToReg and is later fetched with
CopyFromReg. The bug is that the chains used for each copy are
inconsistent, and thus in rare cases the scheduler may issue them out of
order.

The fix is to ensure that the CopyFromReg uses the DAG root which is set
as the chain corresponding to the initial CopyToReg.

Fixes https://llvm.org/PR52475

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D114795
2021-12-13 10:46:32 +00:00
Nikita Popov 396370e889 [MemCpyOpt] Add additional call slot capture tests (NFC)
One test shows a miscompile when bitcasts are involved; the others
show cases where we can perform the optimization despite a capture.
2021-12-13 10:57:06 +01:00
Simon Moll 9feeb2fb61 [VE][NFC] Cleanup vector patterns
Cleanup VE vector isel patterns and follow the downstream LLVM-VE
pattern naming convention.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D115516
2021-12-13 10:12:27 +01:00
Siva Chandra Reddy d37d0aadbf [libc][NFC] Add back NOLINT annotations to PolyEval.
They were accidentally removed in a previous change.
2021-12-13 07:08:08 +00:00
Evgeniy Brevnov 2025e0985c [LV] Make sure VF doesn't exceed compile time known TC
For the simple copy loop (see test case), the vectorizer selects a VF of 32 while the loop is known to have only 17 iterations. Such behavior makes no sense to me since such a vector loop will never be executed. The only case where we may want to select a VF larger than TC is masked vectorization, so I haven't touched that case.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D114528
2021-12-13 13:48:46 +07:00
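The clamping idea in the [LV] commit above can be sketched as follows. The commit message only says that the VF should not exceed a known trip count. Rounding down to a power of two is an assumption made here, the function name is hypothetical, and the masked vectorization case that the commit leaves untouched is not modeled.

```python
def clamp_vf_to_trip_count(vf: int, trip_count: int) -> int:
    """Clamp a power-of-two VF to the largest power of two <= trip_count.

    A trip_count of 0 stands for "not known at compile time" in this sketch.
    """
    if trip_count <= 0 or vf <= trip_count:
        return vf  # unknown trip count, or the VF already fits
    return 1 << (trip_count.bit_length() - 1)


# The example from the commit message: VF 32 on a loop with 17 iterations.
print(clamp_vf_to_trip_count(32, 17))  # 16
print(clamp_vf_to_trip_count(8, 17))   # 8
```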
Fangrui Song 9115d75117 [ELF] Use parallelSort for .rela.dyn
An unstable sort suffices. In a large link (11.06s), this decreases .rela.dyn
writeTo time from 1.52s to 0.81s, resulting in a 6% total time speedup (the
benefit will be greatly diluted if --pack-dyn-relocs=relr becomes prevalent).

Encoding the dynamic relocations and then sorting raw Elf_Rel/Elf_Rela doesn't seem
to improve much (doing that would require code duplication because of
Elf_Rel/Elf_Rela plus the unfortunate mips64le), so don't do that.
2021-12-12 20:53:06 -08:00
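The 6% figure quoted in the parallelSort commit above can be checked from the three timings it gives. This sketch assumes that 11.06s is the total link time before the change, which the message does not state explicitly. If it is the time after the change, the share is still about 6%.

```python
total_link_time = 11.06  # seconds, assumed to be the baseline total
write_before = 1.52      # .rela.dyn writeTo time before the change
write_after = 0.81       # .rela.dyn writeTo time after the change

saved = write_before - write_after
share = saved / total_link_time

print(f"saved: {saved:.2f}s")                               # saved: 0.71s
print(f"share of total link time: {share:.1%}")             # share of total link time: 6.4%
print(f"writeTo speedup: {write_before / write_after:.2f}x")  # writeTo speedup: 1.88x
```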
Fangrui Song 1eaa9b4374 [ELF] initializeSections: move SHT_LLVM_CALL_GRAPH_PROFILE check into SHF_EXCLUDE && !relocatable. NFC
Avoid a comparison in the majority of cases.
2021-12-12 20:05:21 -08:00
Fangrui Song d29766bb48 [ELF] relocateAlloc: remove variables type and expr. NFC 2021-12-12 19:31:30 -08:00
Fangrui Song 4cfff19b88 [ELF] Move adjustSplitStackFunctionPrologues's splitStack check to the caller. NFC
Avoid a function call in the majority of cases and make the output smaller.
2021-12-12 19:26:03 -08:00
Fangrui Song a8024dfc06 [ELF] Avoid mutable addend parameter. NFC 2021-12-12 19:12:01 -08:00
Fangrui Song 5fadb39e9b [Driver][test] Make some tests work with CLANG_DEFAULT_PIE_ON_LINUX=on
Also delete some cross-linux.c tests which are covered by linux-cross.cpp
2021-12-12 16:28:33 -08:00
Kazu Hirata bb6447a78c [llvm] Use llvm::reverse (NFC) 2021-12-12 16:13:49 -08:00