Commit Graph

401151 Commits

Author SHA1 Message Date
Roman Lebedev d51532d8aa
[X86][Costmodel] Load/store i32/f32 Stride=6 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/szEj1ceee - for intels `Block RThroughput: =15.0`; for ryzens, `Block RThroughput: <=8.8`
So could pick cost of `15`.

For store we have:
https://godbolt.org/z/81bq4fTo1 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=10.0`
So we could pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111087
2021-10-05 16:58:57 +03:00
Roman Lebedev 764fd5f463
[X86][Costmodel] Load/store i32/f32 Stride=6 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/aec96Thee - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.3`
So could pick cost of `6`.

For store we have:
https://godbolt.org/z/aec96Thee - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0`
So we could pick cost of `9`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111083
2021-10-05 16:58:57 +03:00
Roman Lebedev c800119c46
[X86][Costmodel] Load/store i64/f64 Stride=4 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/3M3hbq7n8 - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: =8.0`
So could pick cost of `20`.

For store we have:
https://godbolt.org/z/zvnPYWTx7 - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: =8.0`
So we could pick cost of `20`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111076
2021-10-05 16:58:57 +03:00
Roman Lebedev 000ce0bfd5
[X86][Costmodel] Load/store i64/f64 Stride=4 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/MTKdzjvnr - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So could pick cost of `8`.

For store we have:
https://godbolt.org/z/cMYEvqoah - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So we could pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111075
2021-10-05 16:58:57 +03:00
Roman Lebedev dcc2b0d933
[X86][Costmodel] Load/store i64/f64 Stride=4 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z197317d1 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0`
So could pick cost of `6`.

For store we have:
https://godbolt.org/z/8dzszjf9q - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=4.0`
So we could pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111073
2021-10-05 16:58:57 +03:00
Roman Lebedev 7d91037fd2
[X86][Costmodel] Load/store i32/f32 Stride=4 VF=16 interleaving costs
This one required quite a bit of assembly surgery, but the trend continues, so i think this is right.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/EKWdj8cKT - for intels `Block RThroughput: <=32.0`; for ryzens, `Block RThroughput: <=24.0`
So could pick cost of `32`.

For store we have:
https://godbolt.org/z/zj4bb9P75 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=16.0`
So we could pick cost of `32`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111064
2021-10-05 16:58:57 +03:00
Roman Lebedev 4aee1e5b93
[X86][Costmodel] Load/store i32/f32 Stride=4 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/a6rxMG6ec - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=12.0`
So could pick cost of `16`.

For store we have:
https://godbolt.org/z/ced1bdqc9 - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0`
So we could pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111063
2021-10-05 16:58:57 +03:00
Roman Lebedev 3c2e22b795
[X86][Costmodel] Load/store i32/f32 Stride=4 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/avq1oz98W - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: =4.0`
So could pick cost of `8`.

For store we have:
https://godbolt.org/z/89PGMc1qs - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=6.0`
So we could pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111061
2021-10-05 16:58:57 +03:00
Roman Lebedev b6234c1edf
[X86][Costmodel] Load/store i32/f32 Stride=4 VF=2 interleaving costs
Finally, we are getting to the heavy-hitter stuff!

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/7crGWoar6 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So could pick cost of `4`.

For store we have:
https://godbolt.org/z/T8aq3MszM - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.0`
So we could pick cost of `5`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111060
2021-10-05 16:58:56 +03:00
kpyzhov 095c48fdf3 [AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration.
The current way to detect hostcalls by looking for "ockl_hostcall_internal()" function in the module seems to be not reliable enough. The LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", and MetadataStreamer pass to fail to detect hostcalls, therefore it does not set the "hidden_hostcall_buffer" kernel argument.
This change adds a new module flag: hostcall that can be used to detect whether GPU functions use host calls for printf.

Differential revision: https://reviews.llvm.org/D110337
2021-10-05 09:56:04 -04:00
Hsiangkai Wang 80a6456306 [RISCV] Update to vlm.v and vsm.v according to v1.0-rc1.
vle1.v  -> vlm.v
vse1.v  -> vsm.v

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106044
2021-10-05 21:49:54 +08:00
Lei Zhang 83e074a0c6 [mlir] Add an 'cppNamespace' field to availability
This allows us to generate interfaces in a namespace,
following other TableGen'erated code.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D108311
2021-10-05 09:38:09 -04:00
Dmitry Vyukov c483140f3c tsan: improve detection of stack/tls races
Print meaningful stack frames for stack/tls races
(instead of PC 1/2 that don't symbolize).

Imitate stack/tls writes after we create and initialize
the new thread, otherwise the races are not detected.

This is re-submit of the following reverted commits,
but without tests as they failed on a number of OSes/arches:
"tsan: fix and test detection of TLS races"
"tsan: fix tls_race3 test on darwin"
"tsan: print a meaningful frame for stack races"

Differential Revision: https://reviews.llvm.org/D111147
2021-10-05 15:32:39 +02:00
Lei Zhang 070b0af9b8 [mlir][spirv] Fix path in define_enum.sh script
Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D108310
2021-10-05 09:32:01 -04:00
Jay Foad f65458df32 [PHIElimination] Update LiveVariables after handling an unspillable terminator
Update the LiveVariables analysis after the special handling for
unspillable terminators which was added in D91358. This is just enough
to fix some "Block should not be in AliveBlocks" / "Block missing from
AliveBlocks" errors in the codegen test suite when machine verification
is forced to run after PHIElimination (currently it is disabled).

Differential Revision: https://reviews.llvm.org/D110939
2021-10-05 14:25:53 +01:00
Dmitry Vyukov a0ed71ff29 tsan: make cur_thread_init return cur_thread
Whenever we call cur_thread_init, we call cur_thread on the next line.
So make cur_thread_init return the current thread directly.
Makes code a bit shorter, does not affect codegen.

Reviewed By: vitalybuka, melver

Differential Revision: https://reviews.llvm.org/D110384
2021-10-05 15:24:52 +02:00
Vassil Vassilev f4f9ad0f5d Reland "[clang-repl] Allow loading of plugins in clang-repl."
Differential revision: https://reviews.llvm.org/D110484
2021-10-05 13:04:01 +00:00
Jeremy Morse e265644b32 [DebugInfo][InstrRef] Track all of DBG_PHIs operands
An important part of the instruction referencing solution is that we
identify all the registers that values move between before we then compute
an SSA-like function from the machine code, and from the variable
intrinsics. DBG_PHIs weren't causing all the subregisters of their operands
to be tracked; this patch forces that to happen.

The practical implications were that not enough space is allocated for
storing values when analysing the function -- asan will crash on the
attached test case with an unpatched compiler. Non-asan llc's will produce
a DBG_VALUE $noreg, where it should be $dil.

Differential Revision: https://reviews.llvm.org/D109064
2021-10-05 14:01:26 +01:00
Kamau Bridgeman 8737c74fab [PowerPC][MMA] Allow MMA builtin types in pre-P10 compilation units
This patch allows the use of __vector_quad and __vector_pair, PPC MMA builtin
types, on all PowerPC 64-bit compilation units. When these types are
made available the builtins that use them automatically become available
so semantic checking for mma and pair vector memop __builtins is also
expanded to ensure these builtin function call are only allowed on
Power10 and new architectures. All related test cases are updated to
ensure test coverage.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D109599
2021-10-05 07:59:32 -05:00
Tobias Gysi e826db6240 [mlir][linalg] Move generalization pattern to Transforms (NFC).
Move the generalization pattern to the other Linalg transforms to make it available to the codegen strategy.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D110728
2021-10-05 12:49:42 +00:00
Raphael Isemann cf818b55e7 [lldb][NFC] Remove unnecessary include in cpp/const_this test 2021-10-05 14:39:10 +02:00
Aaron Ballman aa4f4d18e8 consteval if is now fully supported
This amends 424733c12a which accidentally
dropped the change to the status page.
2021-10-05 08:21:29 -04:00
Corentin Jabot 424733c12a Implement if consteval (P1938)
Modify the IfStmt node to suppoort constant evaluated expressions.

Add a new ExpressionEvaluationContext::ImmediateFunctionContext to
keep track of immediate function contexts.

This proved easier/better/probably more efficient than walking the AST
backward as it allows diagnosing nested if consteval statements.
2021-10-05 08:04:14 -04:00
Valentin Clement b5a11a991e
[fir] Split FIROptimizer lib into several smaller libraries
Partition libFIROptimizer into smaller libraries that reflect the
structure. Adapt potential problems.

This patch is part of the upstreaming effort from fir-dev branch. It's a
building stone to upstreaming transformations.

Reviewed By: schweitz

Differential Revision: https://reviews.llvm.org/D111055

Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
2021-10-05 14:02:32 +02:00
Sjoerd Meijer cdfc678572 [SCCPSolver] Fix use-after-free in markArgInFuncSpecialization
In SCCPSolver::markArgInFuncSpecialization, the ValueState map may be
reallocated *after* the initial ValueLatticeElement reference is grabbed, but
*before* its use in copy initialization. This causes a use-after-free.  To fix
this, this commit changes the behavior to create the new ValueLatticeElement
before assigning the old one to it.

Patch by: https://github.com/duck-37/

Differential Revision: https://reviews.llvm.org/D111112
2021-10-05 12:56:32 +01:00
Mirko Brkusanin 40e00063bc [GlobalISel] Combine fabs(fneg(x)) to fabs(x)
Differential Revision: https://reviews.llvm.org/D110943
2021-10-05 13:43:39 +02:00
Nicolas Vasilache af9dce18bf [mlir][Linalg] Allow operand-less scf::ExecuteRegionOp to encapsulate scf::YieldOp
These are considered noops.
Buferization will still fail on scf.execute_region which yield values.
This is used to make comprehensive bufferization interoperate better with external clients.

Differential Revision: https://reviews.llvm.org/D111130
2021-10-05 11:34:53 +00:00
Aaron Ballman 1549be3e82 Silence an implicit conversion warning on the bit shift result in MSVC; NFC 2021-10-05 07:13:47 -04:00
gbhyamso 02895eede1 [llvm-cxxfilt][NFC] Fix test for running in Windows cmd
The test llvm\test\tools\llvm-cxxfilt\delimiters.test started failling when run
from cmd.exe on Windows after D110986 which added a unicode character (⦙) to it.
Piping the unicode character in cmd.exe causes it to be converted to a '?'.
That causes the test to fail because the llvm-cxxfilt output becomes Foo?Bar
rather than the expected Foo⦙Bar.

Redirect the echo output to and from a temporary file to get around this
problem.

It's not entirely clear what the root cause is, but two separate downstream
builders are tripping up on this, so we are landing the work around for the
time being.

Differential Revision: https://reviews.llvm.org/D111072
2021-10-05 12:10:06 +01:00
Balázs Kéri bcefea80a4 [clang][ASTImporter] Add import of thread safety attributes.
Attributes of "C/C++ Thread safety attributes" section in Attr.td
are added to ASTImporter. The not added attributes from this section
do not need special import handling.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D110528
2021-10-05 13:08:31 +02:00
Max Kazantsev 471b25e217 [Test] Add test showing profitable peeling opportunity
Patch by Dmitry Makogon!
2021-10-05 17:51:45 +07:00
LLVM GN Syncbot 8b2d6fd6cb [gn build] Port 214054f78a 2021-10-05 10:41:33 +00:00
Michał Górny 214054f78a [lldb] Move DynamicRegisterInfo to public Target library
Move DynamicRegisterInfo from the internal lldbPluginProcessUtility
library to the public lldbTarget library.  This is a prerequisite
towards ABI plugin changes that are going to pass DynamicRegisterInfo
parameters.

Differential Revision: https://reviews.llvm.org/D110942
2021-10-05 12:40:55 +02:00
Andrew Ng 3334b9d70b [ELF][test] Enhance relative dynamic relocation tests
Add checking of the value of the relocation with an addend. Also check
all relocation offsets.

Differential Revision: https://reviews.llvm.org/D111071
2021-10-05 11:32:22 +01:00
Bjorn Pettersson 8ed0e6b2cf [SelectionDAG] Replace error prone index check in BaseIndexOffset::computeAliasing
Deriving NoAlias based on having the same index in two BaseIndexOffset
expressions seemed weird (and as shown in the added unittest the
correctness of doing so depended on undocumented pre-conditions that
the user of BaseIndexOffset::computeAliasing would need to take care
of.

This patch removes the code that dereived NoAlias based on indices
being the same. As a compensation, to avoid regressions/diffs in
various lit test, we also add a new check. The new check derives
NoAlias in case the two base pointers are based on two different
GlobalValue:s (neither of them being a GlobalAlias).

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D110256
2021-10-05 12:15:55 +02:00
Bjorn Pettersson 1896fb2cff [SelectionDAG] Assume that a GlobalAlias may alias other global values
This fixes a bug detected in DAGCombiner when using global alias
variables. Here is an example:
  @foo = global i16 0, align 1
  @aliasFoo = alias i16, i16 * @foo
  define i16 @bar() {
    ...
    store i16 7, i16 * @foo, align 1
    store i16 8, i16 * @aliasFoo, align 1
    ...
  }

BaseIndexOffset::computeAliasing would incorrectly derive NoAlias
for the two accesses in the example above, resulting in DAGCombiner
miscompiles.

This patch fixes the problem by a defensive approach letting
BaseIndexOffset::computeAliasing return false, i.e. that the aliasing
couldn't be determined, when comparing two global values and at least
one is a GlobalAlias. In the future we might improve this with a
deeper analysis to look at the aliasee for the GlobalAlias etc. But
that is a bit more complicated considering that we could have
'local_unnamed_addr' and situations with several 'alias' variables.

Fixes PR51878.

Differential Revision: https://reviews.llvm.org/D110064
2021-10-05 12:15:55 +02:00
Adrian Kuegel d009f6e51c [mlir] Convert ConstShapeOp to a static tensor type.
ConstShapeOp knows its shape, so it should also have a static tensor type.

Differential Revision: https://reviews.llvm.org/D111127
2021-10-05 12:14:43 +02:00
Jay Foad 9ce4f37206 [AMDGPU][GlobalISel] Fix legalization of G_UMULH
Scalarize before narrowing because the narrowing implementation does not
work on vectors. This matches what we do for regular G_MUL.

Differential Revision: https://reviews.llvm.org/D111129
2021-10-05 10:56:02 +01:00
Simon Pilgrim e463b69736 [Support] Change fatal_error_handler_t to take a const char* instead of std::string
https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html

Excessive use of the <string> header has a massive impact on compile time; its most commonly included via the ErrorHandling.h header, which has to be included in many key headers, impacting many source files that have no need for std::string.

As an initial step toward removing the <string> include from ErrorHandling.h, this patch proposes to update the fatal_error_handler_t handler to just take a raw const char* instead.

The next step will be to remove the report_fatal_error std::string variant, which will involve a lot of cleanup and better use of Twine/StringRef.

Differential Revision: https://reviews.llvm.org/D111049
2021-10-05 10:55:40 +01:00
Jay Foad 0a031f5c88 [GlobalISel] Simplify narrowScalarMul. NFC.
Remove some redundancy because the source and result types of any
multiply are always the same.
2021-10-05 10:53:12 +01:00
David Green ffaaa9b05c [ARM] Reset speculation-hardening-sls.ll test checks.
The commit e497b12a69 went and regenerated
all the checks lines in the Arm speculation-hardening-sls.ll test in a
way that removed most of the important checks. This just resets them
back to how they were before, with the single character fix to change:
; NOHARDENARM:     {{bxge lr$}}
to
; NOHARDENARM:     {{bxgt lr$}}

Differential Revision: https://reviews.llvm.org/D111074
2021-10-05 10:51:18 +01:00
Frederik Gossen 519663beba [MLIR] Add an option to disable `maxIterations` in greedy pattern rewrites
This option is needed for passes that are known to reach a fix point, but may
need many iterations depending on the size of the input IR.

Differential Revision: https://reviews.llvm.org/D111058
2021-10-05 11:49:01 +02:00
David Green 10b93a5dec [AArch64] Make speculation-hardening-sls.ll x16 test more robust
As suggested in D110830, this copies the Arm backend method of testing
function calls through specific registers, using inline assembly to
force the variable into x16 to check that the __llvm_slsblr_thunk calls
do not use a register that may be clobbered by the linker.

Differential Revision: https://reviews.llvm.org/D111056
2021-10-05 10:32:30 +01:00
Tim Northover 5f65ee260d AArch64+GISel: legalize vector remainder operations. 2021-10-05 10:20:10 +01:00
Valentin Clement 4755fb2e18
Revert "[fir] Split FIROptimizer lib into several smaller libraries"
This reverts commit c02a8cdda8.
2021-10-05 11:19:53 +02:00
Carl Ritson e86d45ec00 [AMDGPU] Pre-commit test for D111126 (NFC) 2021-10-05 18:13:54 +09:00
Valentin Clement c02a8cdda8
[fir] Split FIROptimizer lib into several smaller libraries
Partition libFIROptimizer into smaller libraries that reflect the
structure. Adapt potential problems.

This patch is part of the upstreaming effort from fir-dev branch. It's a
building stone to upstreaming transformations.

Reviewed By: schweitz

Differential Revision: https://reviews.llvm.org/D111055

Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
2021-10-05 11:08:51 +02:00
Pavel Labath ca5be065c4 Revert "[lldb] Refactor variable parsing"
This commit has introduced test failures in internal google tests.
Working theory is they are caused by a genuine problem in the patch
which gets tripped by some debug info from system libraries.

Reverting while we try to reproduce the problem in a self-contained
fashion.

This reverts commit 601168e420.
2021-10-05 10:46:30 +02:00
Nicolas Vasilache 8096759519 [mlir][Linalg] NFC - Add support to specify that a tensor value is known to bufferize to writeable memory
This change allows better interop with external clients of comprehensive bufferization functions
but is otherwise NFC for the MLIR pass itself.

Differential Revision: https://reviews.llvm.org/D111121
2021-10-05 08:37:34 +00:00
Valentin Clement bc02a3d428
Revert "[fir] Split FIROptimizer lib into several smaller libraries"
This reverts commit c2eff3d5b9.
2021-10-05 10:16:19 +02:00