Commit Graph

436174 Commits

Author SHA1 Message Date
Stanley Winata 026fac2a14 [mlir][linalg] Vectorization for conv_1d_ncw_fcw
Most computer vision torch models use nchw/ncw convolutions. A previous patch added a decomposition of conv2dNchw to conv1dNcw. To improve performance on torch models, we add this vectorization pattern for conv1dNcw, which consequently also improves the performance of conv2dNchw.

On IREE + Intel Xeon 8360 + Resnet50, we got a ~7x speedup, from ~880ms to 126ms.

Reviewed By: nicolasvasilache, hanchung

Differential Revision: https://reviews.llvm.org/D133675
2022-09-14 11:07:53 -07:00
Piotr Sobczak abd927e5a8 [AMDGPU] Check for num elts in SelectVOP3PMods
The rest of the code section assumes there are exactly two elements
in the vector (Lo, Hi), so add the check before entering the section.

Differential Revision: https://reviews.llvm.org/D133852
2022-09-14 20:00:19 +02:00
Julius 49e7ef2c09 [Clang]: Diagnose deprecated copy operations also in MSVC compatibility mode
When running in MSVC compatibility mode, previously no deprecated copy
operation warnings (enabled by -Wdeprecated-copy) were raised. This
restriction was already in place when the deprecated copy warning was
first introduced.

This patch removes said restriction so that deprecated copy warnings, if
enabled, are also raised in MSVC compatibility mode. The reasoning is
that these warnings are still useful when running in MSVC compatibility
mode and have to be semi-explicitly enabled in the first place (using
-Wdeprecated-copy, -Wdeprecated or -Wextra).

Differential Revision: https://reviews.llvm.org/D133354
2022-09-14 19:48:08 +02:00
Florian Hahn 7f3ff9d3c0 [ConstraintElimination] Track if variables are positive in constraint.
Keep track of whether variables are known positive during constraint
decomposition, aggregate the information when building the constraint
object and encode the extra information as constraints to be used during
reasoning.
2022-09-14 18:43:54 +01:00
Thomas Raoux 4abb9e5d20 [mlir][vector] Clean up and generalize lowering of warp_execute to scf
Simplify the lowering of warp_execute_on_lane0 to scf.if by making the
logic more generic. Also remove the assumption that the innermost
dimension is the distributed dimension.

Differential Revision: https://reviews.llvm.org/D133826
2022-09-14 17:36:16 +00:00
Matt Arsenault c9ef7d49ab llvm-reduce: Do not insert replacement IMPLICIT_DEFs for dead defs
Also skip dead defs when looking for a previous vreg with the same
class. This helps avoid some mid-reduction verifier errors when
LiveIntervals computation starts introducing dead flags everywhere.
2022-09-14 13:21:14 -04:00
Matt Arsenault 0e1ee738f1 llvm-reduce: Restrict test to only test relevant reductions
Avoids breaking this test in a future change.
2022-09-14 13:21:01 -04:00
Joseph Huber 23bc343855 [Libomptarget] Change device free routines to accept the allocation kind
Previous support for device memory allocators used a single free
routine and did not provide the original kind of the allocation. This is
problematic as some of these memory types required different handling.
Previously this was worked around using a map in the runtime to record the
original kind of each pointer. Instead, this patch introduces new free
routines similar to the existing allocation routines. This allows us to
avoid a map traversal every time we free a device pointer.

The only interfaces defined by the standard are `omp_target_alloc` and
`omp_target_free`; these do not take a kind as `omp_alloc` does. The
standard dictates the following:

"The omp_target_alloc routine returns a device pointer that references
the device address of a storage location of size bytes. The storage
location is dynamically allocated in the device data environment of the
device specified by device_num."

This suggests that these routines only allocate the default device
memory for the kind, so the implementation has been changed to reflect
this. The change is somewhat breaking if users were using
`omp_target_free` as previously shown in the tests.

Reviewed By: JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D133053
2022-09-14 12:14:07 -05:00
Nico Weber 5631d20bfc Revert "[clang] fix generation of .debug_aranges with LTO"
This reverts commit 6bf6730ac5.
Breaks tests if LLD isn't being built, see comments on
https://reviews.llvm.org/D133092
2022-09-14 12:43:24 -04:00
revunov.denis@huawei.com 553c238952 [BOLT] Preserve original LSDA type encoding
In non-PIE binaries BOLT unconditionally converted type encoding
from indirect to absptr, which broke std exceptions since pointers
to their typeinfo were only assigned at runtime in the .data section.
In this patch we preserve the original encoding so that indirect
remains indirect and can be resolved at runtime, and absolute remains absolute.

Reviewed By: rafauler, maksfb

Differential Revision: https://reviews.llvm.org/D132484
2022-09-14 16:33:47 +00:00
Ashay Rane f1848b0a0e [clang] fix linker executable path in test
A previous patch (https://reviews.llvm.org/D132810) introduced a test
that fails on systems where the linker executable (`ld`) has a `.exe`
extension. This patch updates the regex in the test so that lit can
match both `ld` and `ld.exe`.
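
As an illustration only, matching a linker path with an optional `.exe` suffix could look like the Python sketch below. The pattern and the helper name `is_linker_path` are hypothetical; this is not the regex used in the actual test.

```python
import re

# Hypothetical pattern (not taken from the patch): a path whose final
# component is exactly "ld", optionally followed by ".exe".
LINKER_PATTERN = re.compile(r"(?:^|[/\\])ld(?:\.exe)?$")


def is_linker_path(path: str) -> bool:
    """Return True if the path names `ld` or `ld.exe`."""
    return LINKER_PATTERN.search(path) is not None
```

Anchoring on a path separator keeps names such as `lld` or `ld.gold` from matching by accident.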

Reviewed By: stella.stamenova

Differential Revision: https://reviews.llvm.org/D133773
2022-09-14 11:35:37 -05:00
Stella Stamenova da459043f8 Revert "[lldb][DWARF5] Enable macro evaluation"
This reverts commit a0fb69d17b.

This broke the windows lldb bot: https://lab.llvm.org/buildbot/#/builders/83/builds/23666
2022-09-14 09:30:49 -07:00
Nico Weber db6a53450f Revert "[test][clang] run test for lld emitting dwarf-aranages only if lld is presented"
This reverts commit 44075cc34a.
Broke check-clang, see comments on https://reviews.llvm.org/D133841
2022-09-14 12:17:41 -04:00
Eman Copty 54bd8bb452 [mlir] Add accessor methods for I[2|4|16] types to Builder.
Adds the accessor methods for I[2|4|16] types to the Builder.

Differential Revision: https://reviews.llvm.org/D133793
2022-09-14 09:06:00 -07:00
Peiming Liu 55a1d50fb9 [mlir][sparse] Make sparse compiler more admissible.
Previously, the iteration graph was computed without priority. This patch adds a heuristic when computing the iteration graph: the topo sort starts with Reduction iterators, which makes Reduction iterators (likely) appear as late in the sorted array as possible.

The sparse compiler previously also failed to compile the newly added case.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D133738
2022-09-14 15:59:47 +00:00
Nicolas Vasilache e479aecd56 Revert "[mlir][scf][Transform] Refactor transform.fuse_into_containing_op so it is iterative and supports output fusion."
This reverts commit 54a5f60628 which is a WIP that was pushed by mistake.
2022-09-14 08:51:30 -07:00
Nicolas Vasilache 54a5f60628 [mlir][scf][Transform] Refactor transform.fuse_into_containing_op so it is iterative and supports output fusion.
This revision revisits the implementation of `transform.fuse_into_containing_op` so that it iterates on
producers one use at a time.

Support is added to fuse a producer through a foreach_thread shared tensor argument, in which case we
tile and fuse the op inside the containing op and update the shared tensor argument to the unique destination operand.
If one cannot find such a unique destination operand the transform fails.
2022-09-14 08:50:32 -07:00
Nicolas Vasilache 593c14d422 [mlir][Linalg] Add return type filter to the transform dialect
This allows matching ops by additionally providing an idiomatic spec for a unique return type.

Differential Revision: https://reviews.llvm.org/D133862
2022-09-14 08:50:31 -07:00
Alexey Bataev d647312e3f [SLP][NFC]Extract getLastInstructionInBundle function for better
dependence checking, NFC.

Part of D110978
2022-09-14 08:43:15 -07:00
Jeff Niu 3108249dea [MLIR][math] Use approximate matches for folded ops
LibM implementations differ, so the folders can have different results
on different platforms. For instance, the `cos` folder was failing on an
M1 Mac. I chose to match the constant floats to 2(.5) significant digits.
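
The idea of an approximate match can be sketched in Python as follows. This is only an analogy: the actual tests rely on their own matching mechanism, and the tolerance value and the name `folds_match` are assumptions chosen to approximate "2(.5) significant digits".

```python
import math

# Assumed tolerance: roughly 2.5 significant digits of agreement.
RELATIVE_TOLERANCE = 5e-3


def folds_match(actual: float, expected: float) -> bool:
    """Return True if two folded constants agree up to the tolerance."""
    return math.isclose(actual, expected, rel_tol=RELATIVE_TOLERANCE)


# Two libm implementations may disagree in the last bits of cos(1.0);
# the small offset below stands in for such a platform difference.
reference = math.cos(1.0)
other_platform = reference + 1e-15
```

With this helper, `folds_match(reference, other_platform)` holds even though the two values are not bit-identical, while clearly different constants are still rejected.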

Reviewed By: jacquesguan

Differential Revision: https://reviews.llvm.org/D133797
2022-09-14 08:39:41 -07:00
Groverkss b696e25a7a [MLIR][Presburger] Add hermite normal form computation to Matrix
This patch adds Hermite normal form computation to Matrix. Part of this algorithm
lived in LinearTransform, being used for computing column echelon form. This
patch moves the implementation to Matrix::hermiteNormalForm and generalises it
to compute the Hermite normal form.

Reviewed By: arjunp

Differential Revision: https://reviews.llvm.org/D133510
2022-09-14 16:39:05 +01:00
Mats Petersson 9067de2a43 [flang][driver]Fix broken PowerPC tests
Tests don't work on PPC since the `return` instruction isn't called `ret` (apparently)

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D133859
2022-09-14 16:33:38 +01:00
Zain Jaffal 8253f7e286 [InstCombine] Optimize multiplication where both operands are negated
Handle the case where both operands are negated in matrix multiplication
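
The algebraic identity behind this fold, (-A) * (-B) = A * B, can be checked with a small Python sketch. The helpers `matmul` and `negate` are illustrative names only and are unrelated to the LLVM implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def negate(m):
    """Negate every element of a matrix."""
    return [[-x for x in row] for row in m]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# Negating both operands leaves the product unchanged, so both
# negations can be dropped.
same_product = matmul(negate(A), negate(B)) == matmul(A, B)
```

Because the two sign flips cancel in every term of every dot product, the negations are redundant and can be removed.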

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133695
2022-09-14 16:29:39 +01:00
Haojian Wu f6e759bd26 Remove some unused static functions in CGOpenMPRuntimeGPU.cpp, NFC
2022-09-14 17:20:02 +02:00
David Spickett 3acaf04033 [LLVM][AArch64] Don't warn about clobbering X16 when Speculative Load Hardening is used
SLH will fall back to a different technique if X16 is being used,
so there is no need to warn for inline asm use. Only prevent other codegen
from using it.

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D133766
2022-09-14 15:19:53 +00:00
Joseph Huber bae1a2cf3c [OpenMP] Remove unused function after removing simplified interface
Summary:
A previous patch removed the user of this function but did not remove
the function itself, causing unused-function warnings. Remove it.
2022-09-14 10:14:43 -05:00
John Ericson 154db06ce0 [CMake] Avoid `LLVM_BINARY_DIR` when other more specific variable are better-suited, part 1
A simple sed doing these substitutions:

- `${LLVM_BINARY_DIR}/\$\{CMAKE_CFG_INTDIR}/lib(${LLVM_LIBDIR_SUFFIX})?\>` -> `${LLVM_LIBRARY_DIR}`
- `${LLVM_BINARY_DIR}/\$\{CMAKE_CFG_INTDIR}/bin\>` -> `${LLVM_TOOLS_BINARY_DIR}`

where `\>` means "word boundary".
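
A rough Python equivalent of the two substitutions is sketched below. It is not the sed command that was actually run: the function name `rewrite` is hypothetical, and the sed `\>` is replaced by a negative lookahead (an assumption made here so that the match can also end right after the closing brace of `${LLVM_LIBDIR_SUFFIX}`).

```python
import re

# Common prefix of both patterns, escaped so "$", "{" and "}" are literal.
PREFIX = re.escape("${LLVM_BINARY_DIR}/${CMAKE_CFG_INTDIR}/")

# Stand-in for sed's "\>": the match must not be followed by a word character.
NOT_WORD_CHAR = r"(?![A-Za-z0-9_])"

LIB_RE = re.compile(
    PREFIX + "lib(?:" + re.escape("${LLVM_LIBDIR_SUFFIX}") + ")?" + NOT_WORD_CHAR
)
BIN_RE = re.compile(PREFIX + "bin" + NOT_WORD_CHAR)


def rewrite(line: str) -> str:
    """Apply both substitutions to one line of CMake code."""
    line = LIB_RE.sub("${LLVM_LIBRARY_DIR}", line)
    return BIN_RE.sub("${LLVM_TOOLS_BINARY_DIR}", line)
```

For example, `rewrite("${LLVM_BINARY_DIR}/${CMAKE_CFG_INTDIR}/bin/clang")` yields `${LLVM_TOOLS_BINARY_DIR}/clang`, while a path such as `.../libexec` is left untouched because `lib` is followed by a word character.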

The only manual modifications were reverting changes in

- `compiler-rt/cmake/Modules/CompilerRTUtils.cmake`

because these were "entry points" where we wanted to tread carefully to not introduce a "loop" which would end with an undefined variable being expanded to nothing.

There are many more occurrences without `CMAKE_CFG_INTDIR`, but those are left for D132316 as they have proved somewhat tricky to fix.

This hopefully increases readability overall, and also decreases the usages of `LLVM_LIBDIR_SUFFIX`, preparing us for D130586.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D133828
2022-09-14 10:58:47 -04:00
Arjun P 6d6f6c4d3f [MLIR][Presburger] use arbitrary-precision arithmetic with MPInt instead of int64_t
Only the main Presburger library under the Presburger directory has been switched to use arbitrary precision. Users have been changed to just cast returned values back to int64_t or to use newly added convenience functions that perform the same cast internally.

The performance impact of this has been tested by checking test runtimes after copy-pasting 100 copies of each function. Affine/simplify-structures.mlir goes from 0.76s to 0.80s after this patch. Its performance sees no regression compared to its original performance at commit 18a06d4f3a before a series of patches that I landed to offset the performance overhead of switching to arbitrary precision.

Affine/canonicalize.mlir and SCF/canonicalize.mlir show no noticeable difference, staying at 2.02s and about 2.35s respectively.

Also, for Affine and SCF tests as a whole (no copy-pasting), the runtime remains about 0.09s on average before and after.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D129510
2022-09-14 15:47:41 +01:00
Balazs Benics b8e1da0506 [analyzer] Initialize ShouldEmitErrorsOnInvalidConfigValue analyzer option
Downstream users who don't make use of the clang cc1 frontend for
command-line argument parsing won't benefit from the Marshalling-provided
default initialization of the AnalyzerOptions entries. More
about this later.
Those analyzer option fields, as they are bitfields, cannot be default
initialized at the declaration (prior to C++20), hence they are initialized
in the constructor.
The only problem is that `ShouldEmitErrorsOnInvalidConfigValue` was
forgotten.

In this patch I'm proposing to initialize that field with the rest.

Note that this value is read by
`CheckerRegistry.cpp:insertAndValidate()`.
The analyzer options are initialized by the marshalling at
`CompilerInvocation.cpp:GenerateAnalyzerArgs()` by the expansion of the
`ANALYZER_OPTION_WITH_MARSHALLING` xmacro to the appropriate default
value regardless of the constructor initializer list which I'm touching.
Because this only affects users using CSA as a library, I believe we
cannot test this without serious effort.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D133851
2022-09-14 16:45:44 +02:00
Joseph Huber 194ec844f5 [OpenMP][AMDGPU] Link bitcode ROCm device libraries per-TU
Previously, we linked in the ROCm device libraries which provide math
and other utility functions late. This is not strictly correct as this
library contains several flags that are only set per-TU, such as fast
math or denormalization. This patch changes this to pass the bitcode
libraries per-TU using the same method we use for the CUDA libraries.
This has the advantage that we correctly propagate attributes, making
this implementation more correct. Additionally, many annoying unused
functions were not being fully removed during LTO. This led to
erroneous warning messages and remarks on unused functions.

I am not sure if not finding these libraries should be a hard error. Let
me know if it should be demoted to a warning saying that some device
utilities will not work without them.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D133726
2022-09-14 09:42:06 -05:00
Joseph Huber 2d26ecb1fb [OpenMP] Remove simplified device runtime handling
The old device runtime had a "simplified" version that prevented many of
the runtime features from being initialized. The old device runtime was
deleted in LLVM 14 and is no longer in use. Selectively deactivating
features is now done using specific flags rather than the old technique.
This patch simply removes the extra logic required for handling the old
simple runtime scheme.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D133802
2022-09-14 09:41:50 -05:00
Nikita Popov b1cd393f9e [AA] Tracking per-location ModRef info in FunctionModRefBehavior (NFCI)
Currently, FunctionModRefBehavior tracks whether the function reads
or writes memory (ModRefInfo) and which locations it can access
(argmem, inaccessiblemem and other). This patch changes it to track
ModRef information per-location instead.

To give two examples of why this is useful:

* D117095 highlights a weakness of ModRef modelling in the presence
  of operand bundles. For a memcpy call with deopt operand bundle,
  we want to say that it can read any memory, but only write argument
  memory. This would allow them to be treated like any other calls.
  However, we currently can't express this and have to say that it
  can read or write any memory.
* D127383 would ideally be modelled as a separate threadid location,
  where threadid Refs outside pre-split coroutines can be ignored
  (like other accesses to constant memory). The current representation
  does not allow modelling this precisely.
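
The difference between the two representations can be sketched in Python. This is a simplified model, not LLVM's implementation: the helper names `from_single`, `per_location`, `may_read` and `may_write` are hypothetical, and only the three locations named above are modelled.

```python
from enum import IntFlag


class ModRefInfo(IntFlag):
    """Whether memory may be read (Ref), written (Mod), both or neither."""
    NoModRef = 0
    Ref = 1
    Mod = 2
    ModRef = 3


LOCATIONS = ("argmem", "inaccessiblemem", "other")


def from_single(info, locations):
    """Old style: one ModRefInfo shared by every accessible location."""
    return {loc: (info if loc in locations else ModRefInfo.NoModRef)
            for loc in LOCATIONS}


def per_location(**infos):
    """New style: an independent ModRefInfo for each location."""
    return {loc: infos.get(loc, ModRefInfo.NoModRef) for loc in LOCATIONS}


def may_read(behavior, location):
    return bool(behavior[location] & ModRefInfo.Ref)


def may_write(behavior, location):
    return bool(behavior[location] & ModRefInfo.Mod)


# memcpy with a deopt bundle: the old form must claim ModRef everywhere,
# the new form can say "reads anything, writes only argument memory".
memcpy_old = from_single(ModRefInfo.ModRef, LOCATIONS)
memcpy_new = per_location(argmem=ModRefInfo.ModRef,
                          inaccessiblemem=ModRefInfo.Ref,
                          other=ModRefInfo.Ref)
```

In this model `may_write(memcpy_old, "other")` is true, whereas `may_write(memcpy_new, "other")` is false, which is the extra precision the first example asks for.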

The patch as implemented is intended to be NFC, but there are some
obvious opportunities for improvements and simplification. To fully
capitalize on this we would also want to change the way we represent
memory attributes on functions, but that's a larger change, and I
think it makes sense to separate out the FunctionModRefBehavior
refactoring.

Differential Revision: https://reviews.llvm.org/D130896
2022-09-14 16:34:41 +02:00
Florian Hahn efd3ec47d9 [ConstraintElimination] Clear new indices directly in getConstraint(NFC)
Instead of checking if any of the new indices has a non-zero coefficient
before using the constraint, do this directly when constructing the
constraint.
2022-09-14 15:31:25 +01:00
Christian Sigg 5cff32b9f0 [MLIR] Fix toy lit substitutions
The tools are called e.g. `toyc-ch1`, not `toy-ch1`.

Add missing toyc-ch6/7.

It turns out that the other substitutions are unnecessary only because of specific circumstances rather than by design:
The lit test exec root is set to build/mlir/test, which is where all the test tools are placed by CMake, so we wouldn't need to substitute them at all.
We shouldn't rely on this assumption though, because it will make things harder for standalone tests and other build systems.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D133842
2022-09-14 15:59:24 +02:00
Jordan Rupprecht 1f3def30ca Fix heap-use-after-free when clearing DIEs in fission compile units.
D131437 caused heap-use-after-free failures when testing TestCreateAfterAttach.py in asan mode, and "regular" crashes outside of asan.

This appears to be due to a mismatch in a couple places where we choose to clear the DIEs. When we clear the DIE of a skeleton unit, we unconditionally clear the DIE of the DWO unit if it exists. However, `~ScopedExtractDIEs()` only looks at the skeleton unit when deciding to clear. If we decide to clear the skeleton unit because it is now unused, we end up clearing the DWO unit that _is_ used. This change adds a guard by checking `m_cancel_scopes` to prevent clearing the DWO unit.

This is 100% reproducible by running TestCreateAfterAttach.py in asan mode, although it only seems to reproduce in our internal build, so no test case is added here. If someone has suggestions on how to write one, I can add it.

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D133790
2022-09-14 06:52:47 -07:00
Zain Jaffal d1dec04d76 [AArch64] Disable nontemproal load for Big Endian
The current code for generating nontemporal loads outputs the wrong assembly on big-endian architectures.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133789
2022-09-14 14:49:55 +01:00
Sanjay Patel 73919a87e9 [InstCombine] try multi-use demanded bits folds for 'add'
This patch enables a multi-use demanded bits fold (motivated by issue #57576):
https://alive2.llvm.org/ce/z/DsZakh

This mimics transforms that we already do on the single-use path.

Originally, this patch did not include the last part to form a constant, but
that can be removed independently to reduce risk. It's not clear what the
effect of either change will be when viewed end-to-end.

This is expected to be neutral or a slight win for compile-time.
See the "add-demand2" series for experimental timing results:
https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions&remote=rotateright

Differential Revision: https://reviews.llvm.org/D133788
2022-09-14 09:30:59 -04:00
Alexey Bataev 796af0c027 [SLP] Move getInsertIndex function, NFC.
Part of D110978.
2022-09-14 06:22:52 -07:00
Mats Petersson b36b27b3fc [flang][driver]Fix broken flang-new mlir test
The test was added as a .mlir file, and this extension is not
in the lit.cfg.py, so it was never run. When running it, the
file would produce an error, as a semicolon does not start an MLIR comment.

This adds the extension and fixes the comment start by using C++
style comments.

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D133792
2022-09-14 14:16:31 +01:00
Zain Jaffal 244a6a76d9 [AArch64] Add nontemporal load tests for big endian.
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133765
2022-09-14 13:51:58 +01:00
Nikita Popov 1cfbbba15b [AA] Remove unnecessary intersections from getModRefBehavior() (NFC)
Intersection with other providers is performed by AAResults. Doing
this here is both pointless and confusing.
2022-09-14 14:26:39 +02:00
Florian Hahn f213128b29 [ConstraintElimination] Further de-compose operands of add operations.
This simply extends the existing logic to look through adds and combine
the components as done in other places already.
2022-09-14 12:00:32 +01:00
Simon Pilgrim 854a4595b6 [CostModel][X86] getArithmeticInstrCost - move GLM/SLM custom costs AFTER constant shift -> multiply canonicalization
Corrects the shift-by-constant costs to better account for them being converted to multiplies for lowering, which demonstrates that we should probably be trying harder NOT to convert these to multiplies for some CPUs (v4i32 in particular).
2022-09-14 11:46:26 +01:00
Simon Pilgrim 40ab7875f8 [CostModel][X86] Fix throughput costs for AVX512BW v32i16 shifts
Fixes regression from a931dbfbd3
2022-09-14 11:18:23 +01:00
Pavel Labath d079bf33de [lldb] Enable (un-xfail) some dwarf tests for arm
These are passing now that the relocation assertion has been removed in
D132954.

Relocations still remain unimplemented though, so it's possible this may
start to fail due to unrelated changes. If that happens very often, we
may just need to disable (skip) the test instead.
2022-09-14 11:35:16 +02:00
Florian Hahn aba2085e52 [ConstraintElimination] Add tests where info from zext can be used.
2022-09-14 10:04:07 +01:00
Pavel Kosov a0fb69d17b [lldb][DWARF5] Enable macro evaluation
Patch enables handling of DWARFv5 DW_MACRO_define_strx and DW_MACRO_undef_strx

~~~

OS Laboratory. Huawei RRI. Saint-Petersburg

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D130062
2022-09-14 11:32:07 +03:00
Marco Elver 4627a30acf [MIR] Support printing and parsing pcsections
Adds support for printing and parsing PC sections metadata in MIR.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D133785
2022-09-14 10:30:25 +02:00
Florian Hahn 62c1928437 [ConstraintElimination] Add tests for chained adds.
Add test coverage for reasoning about chains of adds.
2022-09-14 09:27:18 +01:00
Azat Khuzhin 44075cc34a [test][clang] run test for lld emitting dwarf-aranages only if lld is presented
Fixes: https://reviews.llvm.org/D133092
CI: https://lab.llvm.org/buildbot/#/builders/109/builds/46592

Reviewed By: hokein

Differential Revision: https://reviews.llvm.org/D133841
2022-09-14 10:17:03 +02:00