Commit Graph

400080 Commits

Author SHA1 Message Date
Roman Lebedev 2a7a768dad
[X86][Costmodel] Load/store i16 Stride=4 VF=32 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For this tuple, measuring becomes problematic since there's a lot of spilling going on,
but apparently all these memory ops do not affect worst-case estimate at all here.

For load we have:
https://godbolt.org/z/zP4hd8MT6 - for intels `Block RThroughput: =150.0`; for ryzens, `Block RThroughput: <=59`
So pick cost of `150`.

For store we have:
https://godbolt.org/z/vKb8zTK8E - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=24.0`
So pick cost of `64`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110548
2021-09-27 22:20:01 +03:00
Roman Lebedev ee5a050e2e
[X86][Costmodel] Load/store i16 Stride=4 VF=16 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =75.0`; for ryzens, `Block RThroughput: <=29.5`
So pick cost of `75`. (note that `# 32-byte Reload` does not affect throughput there.)

For store we have:
https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `32`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110543
2021-09-27 22:20:01 +03:00
Roman Lebedev 5615d6a6dd
[X86][Costmodel] Load/store i16 Stride=4 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/dd8T5P471 - for intels `Block RThroughput: =33.0`; for ryzens, `Block RThroughput: <=14.5`
So pick cost of `33`.

For store we have:
https://godbolt.org/z/zPxcKWhn4 - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `10`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110541
2021-09-27 22:20:01 +03:00
Roman Lebedev df2b42d12e
[X86][Costmodel] Load/store i16 Stride=4 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/rnsf639Wh - for intels `Block RThroughput: =17.0`; for ryzens, `Block RThroughput: <=7.5`
So pick cost of `17`.

For store we have:
https://godbolt.org/z/565KKrcY6 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110537
2021-09-27 22:20:01 +03:00
Roman Lebedev 45caac91c4
[X86][Costmodel] Load/store i16 Stride=4 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/5EYc6r9nh - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/z61e5d6GE - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110536
2021-09-27 22:20:01 +03:00
Chris Bieneman 18cf5b220d Fixing docs build
I always forget that new line...
2021-09-27 14:16:28 -05:00
Chris Bieneman 1e48ef2035 Implement #pragma clang final extension
This patch adds a new preprocessor extension ``#pragma clang final``
which enables warning on undefinition and re-definition of macros.

The intent of this warning is to extend beyond ``-Wmacro-redefined`` to
warn against any and all alterations to macros that are marked `final`.

This warning is part of the ``-Wpedantic-macros`` diagnostics group.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D108567
2021-09-27 14:11:16 -05:00
Sanjay Patel fdba1dccbe [InstCombine] reduce code for shl-of-sub transform; NFC 2021-09-27 14:56:01 -04:00
Sanjay Patel b75ed244af [InstCombine] add tests for shl-of-sub; NFC 2021-09-27 14:56:01 -04:00
Aart Bik 06e2a0684e [mlir][sparse] sampled matrix multiplication fusion test
This integration tests runs a fused and non-fused version of
sampled matrix multiplication. Both should eventually have the
same performance!

NOTE: relies on pending tensor.init fix!

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D110444
2021-09-27 11:50:49 -07:00
Nico Weber 36dc5c048a Revert "[clangd] Refactor IncludeStructure: use File (unsigned) for most computations"
This reverts commit 0b1eff1bc5.
Breaks check-clangd on Windows, see comments on
https://reviews.llvm.org/D110386
2021-09-27 14:38:18 -04:00
Jon Chesterfield 80fa43fe9a Revert "[openmp] Add addrspacecast to getOrCreateIdent"
This reverts commit 1a761e5b7b.
Failed CI, albeit with a different failure mode to BZ51982
2021-09-27 19:27:35 +01:00
Jon Chesterfield 1a761e5b7b [openmp] Add addrspacecast to getOrCreateIdent
Fixes 51982. Minor refactor to remove `return x = y` construct.

Test case derived from https://github.com/ROCm-Developer-Tools/aomp/\
blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting
parts while checking the assertion failure still occurred.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110556
2021-09-27 19:23:12 +01:00
Aart Bik ec97a205c3 [mlir][sparse] preserve zero-initialization for materializing buffers
This revision makes sure that when the output buffer materializes locally
(in contrast with the passing in of output tensors either in-place or not
in-place), the zero initialization assumption is preserved. This also adds
a bit more documentation on our sparse kernel assumption (viz. TACO
assumptions).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D110442
2021-09-27 11:22:05 -07:00
Aaron Ballman ef0f728abe Add a missing include to appease the build bots 2021-09-27 14:19:39 -04:00
Sanjay Patel 623f93ed1c [InstCombine] add use check to shl transform
This bug was introduced with the refactoring in:
9075edc89b
...but there were no tests to detect it.
2021-09-27 14:10:26 -04:00
Sanjay Patel d992950078 [InstCombine] add tests for opposing shifts separated by trunc; NFC 2021-09-27 14:10:26 -04:00
Jameson Nash e27a6db529 Bad SLPVectorization shufflevector replacement, resulting in write to wrong memory location
We see that it might otherwise do:

  %10 = getelementptr {}**, <2 x {}***> %9, <2 x i32> <i32 10, i32 4>
  %11 = bitcast <2 x {}***> %10 to <2 x i64*>
...
  %27 = extractelement <2 x i64*> %11, i32 0
  %28 = bitcast i64* %27 to <2 x i64>*
  store <2 x i64> %22, <2 x i64>* %28, align 4, !tbaa !2

Which is an out-of-bounds store (the extractelement got offset 10
instead of offset 4 as intended). With the fix, we correctly generate
extractelement for i32 1 and generate correct code.

Differential Revision: https://reviews.llvm.org/D106613
2021-09-27 14:06:13 -04:00
Carlos Galvez b2a2c38349 Fix bug in readability-uppercase-literal-suffix
Fixes https://bugs.llvm.org/show_bug.cgi?id=51790. The check triggers
incorrectly with non-type template parameters.

A bisect determined that the bug was introduced here:
ea2225a10b

Unfortunately that patch can no longer be reverted on top of the main
branch, so add a fix instead. Add a unit test to avoid regression in
the future.
2021-09-27 14:03:53 -04:00
peter klausler 9eab0da183 [flang] Catch branching into FORALL/WHERE constructs
Enforce constraints C1034 & C1038, which disallow the use
of otherwise valid statements as branch targets when they
appear in FORALL &/or WHERE constructs.  (And make the
diagnostic message somewhat more user-friendly.)

Differential Revision: https://reviews.llvm.org/D109936
2021-09-27 10:51:44 -07:00
Praveen Velliengiri e90b512c4d [AMDGPU] Change ASAN init/fini kernels linkage to external.
HSA runtime fails to find the symbols for Init and Fini kernels as
they mark with internal linkage, changing the linkage to external
to fix those errors.

Differential Revision: https://reviews.llvm.org/D110054
2021-09-27 11:50:37 -06:00
Sumesh Udayakumaran b2af2aeea6 [mlir] Mode for explicitly controlling the fusion kind
New mode option that allows for either running the default fusion kind that happens today or doing either of producer-consumer or sibling fusion. This will also be helpful to minimize the compile-time of the fusion tests.

Reviewed By: bondhugula, dcaballe

Differential Revision: https://reviews.llvm.org/D110102
2021-09-27 20:37:42 +03:00
Quinn Pham 682e15f371 [PowerPC] Fix td pattern for P10 VSLDBI and VSRDBI
This patch fixes the pattern for the P10 instructions Vector Shift Left
Double by Bit Immediate VN-form and Vector Shift Right Double by Bit
Immediate VN-form. The third argument should be a target constant (`timm`)
instead of an `i32` because an immediate is expected.

Reviewed By: lei

Differential Revision: https://reviews.llvm.org/D109920
2021-09-27 12:36:18 -05:00
Yaxun (Sam) Liu c4afb5f81b [HIP] Fix linking of asanrt.bc
HIP currently uses -mlink-builtin-bitcode to link all bitcode libraries, which
changes the linkage of functions to be internal once they are linked in. This
works for common bitcode libraries since these functions are not intended
to be exposed for external callers.

However, the functions in the sanitizer bitcode library is intended to be
called by instructions generated by the sanitizer pass. If their linkage is
changed to internal, their parameters may be altered by optimizations before
the sanitizer pass, which renders them unusable by the sanitizer pass.

To fix this issue, HIP toolchain links the sanitizer bitcode library with
-mlink-bitcode-file, which does not change the linkage.

A struct BitCodeLibraryInfo is introduced in ToolChain as a generic
approach to pass the bitcode library information between ToolChain and Tool.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D110304
2021-09-27 13:25:46 -04:00
William S. Moses 6dd5b1e33e [MLIR][LLVM] Add error if using incorrect attribute type for specifying LLVM linkage
Address post-commit review in https://reviews.llvm.org/D108524 to add appropriate diagnostics.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D110566
2021-09-27 13:24:05 -04:00
peter klausler 1c2e5fd66e [flang] Enforce constraint: defined ass't in WHERE must be elemental
A defined assignment subroutine invoked in the context of a WHERE
statement or construct must necessarily be elemental (C1032).

Differential Revision: https://reviews.llvm.org/D109932
2021-09-27 10:12:53 -07:00
Craig Topper a2a07e8db3 [RISCV] Fold store of vmv.x.s to a vse with VL=1.
This can avoid a loss of decoupling with the scalar unit on cores
with decoupled scalar and vector units.

We should support FP too, but those use extract_element and not a
custom ISD node so it is a little different. I also left a FIXME
in the test for i64 extract and store on RV32.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109482
2021-09-27 09:54:46 -07:00
Fangrui Song 2bf06d9345 [ELF] Support symbol names with space in linker script expressions
Fix PR51961

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D110490
2021-09-27 09:50:42 -07:00
Kazu Hirata 59540b29f8 [InstCombine] Fix an "unused variable" warning 2021-09-27 09:49:32 -07:00
Bixia Zheng fbd5821c6f Implement the conversion from sparse constant to sparse tensors.
The sparse constant provides a constant tensor in coordinate format. We first split the sparse constant into a constant tensor for indices and a constant tensor for values. We then generate a loop to fill a sparse tensor in coordinate format using the tensors for the indices and the values. Finally, we convert the sparse tensor in coordinate format to the destination sparse tensor format.

Add tests.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D110373
2021-09-27 09:47:29 -07:00
@vladaindjic 5357a98c82 [OpenMP] libomp: Usage of TASK_TIED constant inside kmp_gsupport.cpp
The minor code refactorization introduces the TASK_TIED constant inside
kmp_gsupprot.cpp as a replacement for the literal value 1.
The mentioned constant is now used in both kmp_tasking.cpp and
kmp_gsupport.cpp files.

Differential Revision: https://reviews.llvm.org/D110441
2021-09-27 19:45:56 +03:00
Craig Topper 933182e948 [RISCV] Improve support for forming widening multiplies when one input is a scalar splat.
If one input of a fixed vector multiply is a sign/zero extend and
the other operand is a splat of a scalar, we can use a widening
multiply if the scalar value has sufficient sign/zero bits.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110028
2021-09-27 09:37:07 -07:00
Daniil Fukalov 1f73f0c19d [NFC][AMDGPU] Update cost model tests:
1. Convert to generated tests.
2. Added code-size case in few places.
2021-09-27 19:26:02 +03:00
Sanjay Patel 9075edc89b [InstCombine] move shl-only folds out from under commonShiftTransforms(); NFCI
This is no-functional-change-intended, but it hopefully makes things
slightly clearer and more efficient to have transforms that require
'shl' be called only from visitShl(). Further cleanup is possible.
2021-09-27 12:09:47 -04:00
Pavel Labath 3dbf27e762 [lldb] A different fix for Domain Socket tests
we need to drop nuls from the end of the string.
2021-09-27 18:00:27 +02:00
Kazu Hirata b68a62b3a9 [Lanai] Remove redundant declaration getTheLanaiTarget (NFC)
Note that getTheLanaiTarget is declared in
TargetInfo/LanaiTargetInfo.h, which LanaiDisassembler.cpp includes.

Identified with readability-redundant-declaration.
2021-09-27 08:58:27 -07:00
Kirill Bobyrev 0b1eff1bc5
[clangd] Refactor IncludeStructure: use File (unsigned) for most computations
Preparation for D108194.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D110386
2021-09-27 17:50:53 +02:00
Joseph Huber 74d622dea4 [OpenMP] Add new worksharing definitions into device RTL
This path defines the newly added `__kmpc_disitrute_static_init`
functions in the device runtime library. These functions are currently
exact copies of the current worksharing method but can be tuned later.

Depends on D110429

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110430
2021-09-27 11:36:41 -04:00
Joseph Huber b4a5543624 [OpenMP] Introduce a new worksharing RTL function for distribute
This patch adds a new RTL function for worksharing. Currently we use
`__kmpc_for_static_init` for both the `distribute` and `parallel`
portion of the loop clause. This patch replaces the `distribute` portion
with a new runtime call `__kmpc_distribute_static_init`. Currently this
will be used exactly the same way, but will make it easier in the future
to fine-tune the distribute and parallel portion of the loop.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110429
2021-09-27 11:36:37 -04:00
Raphael Isemann be2a4216fc [lldb] Fix SocketTest.DomainGetConnectURI on macOS by stripping more zeroes from getpeername result
Apparently macOS is padding the name result with several padding zeroes at
the end. Just strip them all to pretend it's a C-string.

Thanks to Pavel for suggesting this fix.
2021-09-27 17:34:45 +02:00
Nico Weber 7664508910 [llvm/OptTable] Add named param comment for GroupedShortOption 2021-09-27 11:33:29 -04:00
Jake Egan 56049b7129 Fix tests defaulting to incorrect triples on AIX
The tests only specify -march, so when the tests are run on AIX the target OS defaults to AIX, which causes the tests to misbehave.

This patch constrains the tests by specifying -mtriple instead of -march.

Reviewed By: daltenty, jsji, MaskRay

Differential Revision: https://reviews.llvm.org/D110186
2021-09-27 11:30:45 -04:00
Nico Weber 730bbc6f72 [llvm/OptTable] Drop "The" prefix on fields 2021-09-27 11:24:51 -04:00
Nico Weber 6ffd8e3902 [llvm] Convert OptTable::ParseOneArg() to std::unique_ptr<> 2021-09-27 11:19:21 -04:00
Nico Weber 7789a68e5a [llvm] Convert OptTable::parseOneArgGrouped() to std::unique_ptr<> 2021-09-27 11:19:15 -04:00
Nico Weber 2f955424c4 [llvm] ConvertOption::accept(), acceptInternal() to std::unique_ptr<>
These functions transfer ownership to the caller. Make this clear in the
type system.

No behavior change.
2021-09-27 11:05:02 -04:00
Sanjay Patel 21429cf43a [InstCombine] generalize fold for (trunc (X u>> C1)) u>> C
This is another step towards trying to re-apply D110170
by eliminating conflicting transforms that cause infinite loops.
a47c8e40c7 was a previous patch in this direction.

The diffs here are mostly cosmetic, but intentional:
1. The existing code that would handle this pattern in FoldShiftByConstant()
   is limited to 'shl' only now. The formatting change to IsLeftShift shows
   that we could move several transforms into visitShl() directly for
   efficiency because they are not common shift transforms.

2. The tests are regenerated to show new instruction names to prove that
   we are getting (almost) identical logic results.

3. The one case where we differ ("trunc_sandwich_small_shift1") shows that
   we now use a narrow 'and' instruction. Previously, we relied on another
   transform to do that, but it is limited to legal types. That seems to
   be a legacy constraint from when IR analysis and codegen were less robust.

https://alive2.llvm.org/ce/z/JxyGA4

  declare void @llvm.assume(i1)

  define i8 @src(i32 %x, i32 %c0, i8 %c1) {
    ; The sum of the shifts must not overflow the source width.
    %z1 = zext i8 %c1 to i32
    %sum = add i32 %c0, %z1
    %ov = icmp ult i32 %sum, 32
    call void @llvm.assume(i1 %ov)

    %sh1 = lshr i32 %x, %c0
    %tr = trunc i32 %sh1 to i8
    %sh2 = lshr i8 %tr, %c1
    ret i8 %sh2
  }

  define i8 @tgt(i32 %x, i32 %c0, i8 %c1) {
    %z1 = zext i8 %c1 to i32
    %sum = add i32 %c0, %z1
    %maskc = lshr i8 -1, %c1

    %s = lshr i32 %x, %sum
    %t = trunc i32 %s to i8
    %a = and i8 %t, %maskc
    ret i8 %a
  }
2021-09-27 10:57:31 -04:00
Sanjay Patel 025a805d7c [InstCombine] match variable names and code comments; NFC
Similar to:
29c09c7

Planned follow-up is to add a transform here to allow removing
a common shift fold that is conflicting with D110170.
2021-09-27 10:57:31 -04:00
Amy Kwan 1f5b60ad47 Explicitly specify -fintegrated-as to clang/test/Driver/compilation_database.c test case.
It appears that this test assumes that the toolchain utilizes the integrated
assembler by default, since the expected output in the CHECKs are
compilation_database.o.

However, this test fails on AIX as AIX does not utilize the integrated assembler.
On AIX, the output instead is of the form /tmp/compilation_database-*.s.
Thus, this patch explicitly adds the -fintegrated-as option to match the
assumption that the integrated assembler is used by default.

Differential Revision: https://reviews.llvm.org/D110431
2021-09-27 09:56:18 -05:00
Eugene Zhulenev 92db09cde0 [mlir] AsyncRuntime: use int64_t for ref counting operations
Workaround for SystemZ ABI problem: https://bugs.llvm.org/show_bug.cgi?id=51898

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D110550
2021-09-27 07:55:01 -07:00