Commit Graph

397160 Commits

Author SHA1 Message Date
Philip Reames aec08e8600 Special case common branch patterns in breakLoopBackedge
This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability.  I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.
2021-08-22 10:42:23 -07:00
Simon Pilgrim 805fb1f6c1 [X86] combineMul - move MUL_IMM comment inside function. NFC.
combineMul is now used for other things as well as the mul-with-constant expansion - move the comment to where its actually relevant.
2021-08-22 18:27:03 +01:00
Alexey Lapshin 07d44cc0b1 [DWARF][Verifier] Do not add child DieRangeInfo with empty address range to the parent.
verifyDieRanges function checks for the intersected address ranges.
It adds child DieRangeInfo into parent DieRangeInfo to check
whether children have overlapping address ranges. It is safe to not add
DieRangeInfo with empty address range into parent's children list.
This decreases the number of children which should be navigated and as a result
decreases execution time(parents having a lot of children with empty ranges
spend much time navigating them). For this command: "llvm-dwarfdump --verify clang-repl"
execution time decreased from 220 sec till 75 sec.

Differential Revision: https://reviews.llvm.org/D107554
2021-08-22 19:39:21 +03:00
Kazu Hirata 40fd2d93c0 [Transforms] Remove unused declaration emitStrNLen (NFC)
The corresponding definition has been missing for at least 5 years.
2021-08-22 09:08:22 -07:00
Arthur O'Dwyer ca7926bd79 [libc++] Eliminate needless `add_lvalue_reference` from <algorithm> helpers. NFCI.
When `_Compare` is a function parameter already (so it's not `void`
and it's not an abominable function type), `add_lvalue_reference_t<_Compare>`
is simply a synonym for `_Compare&`. We don't need to pull in `<type_traits>`
and instantiate a template trait to figure that out.

Differential Revision: https://reviews.llvm.org/D108400
2021-08-22 11:43:12 -04:00
Nikita Popov fafe5a6f44 [InstCombine] Perform "eq of parts" fold with logical ops
The pattern matched here is too complex for the general logical
and/or to bitwise and/or conversion to trigger. However, the
fold is poison-safe, so match it with a select root as well:

https://alive2.llvm.org/ce/z/vNzzSg
https://alive2.llvm.org/ce/z/Beyumt
2021-08-22 16:55:53 +02:00
Nikita Popov be4b8366fb [InstCombine] Add tests for "eq of parts" with logical op (NFC)
We currently only handle this with a bitwise and/or instruction,
but not a logical.
2021-08-22 16:52:44 +02:00
Simon Pilgrim 352df10a23 [X86][AVX] matchShuffleAsBlend - use isElementEquivalent to help match broadcast/repeated elements
Extend matchShuffleAsBlend to not only match against known in-place elements for BLEND shuffles, but use isElementEquivalent to determine if the shuffle mask's referenced element is the same as the in-place element.

This allows us to replace a number of insertps instructions with more general blendps instructions (better opportunities for commutation, concatenation etc.).
2021-08-22 15:26:17 +01:00
Simon Pilgrim 96fb3eef66 Fix signed/unsigned comparison warning. NFCI. 2021-08-22 15:02:19 +01:00
Simon Pilgrim 7b7ac4b16a [X86] Expose memory codegen in element insert load tests to improve accuracy of checks
Also replace X32 with X86 check prefixes for i686 tests (we tend to try to use X32 for gnux32 targets)
2021-08-22 14:54:48 +01:00
Simon Pilgrim a1c892b439 [X86][SSE] lowerVECTOR_SHUFFLE - canonicalize with horizontal ops.
Before lowering shuffles, see if we can merge horizontal ops or canonicalize the shuffle mask to point to the same LHS/RHS of the HOps when an HOp's args are repeated.
2021-08-22 14:54:48 +01:00
Sanjay Patel dcf659e821 [InstSimplify] fold rotate of -1 to -1
This is part of solving more general rotate patterns seen in
bugs related to:
https://llvm.org/PR51575

https://alive2.llvm.org/ce/z/GpkFCt
2021-08-22 09:15:48 -04:00
Sanjay Patel d41e308f10 [InstSimplify] fold rotate of zero to zero
This is part of solving more general rotate patterns seen in
bugs related to:
https://llvm.org/PR51575

https://alive2.llvm.org/ce/z/fjKwqv
2021-08-22 09:15:48 -04:00
Sanjay Patel a0ebac4466 [InstSimplify] add tests for rotates of 0/-1; NFC 2021-08-22 09:15:48 -04:00
Simon Pilgrim 8533e782ef [X86] Try to sync HSW + BDW model class defs to simplify comparisons. NFC.
Broadwell is mainly a die shrink of Haswell, but the model had many of the scheduling classes in different orders, making side-by-side comparisons very difficult.

The InstRW overrides are still quite different, but at least that part of the side-by-side diff is now in the same position.

This was noticed while I was trying to investigate diffs between llvm-mca and other perf analyzers in https://uica.uops.info/ - we used to be able to do diffs between most of the models very easily, but we seem to have lost that simplicity as classes have been altered, models have been refined and other models have rotted.
2021-08-22 13:02:51 +01:00
Sanjay Patel 3aa009cc87 [InstCombine] generalize subtract with 'not' operands
The motivation was to get min/max intrinsics to parity
with cmp+select idioms, but this unlocks a few more
folds because isFreeToInvert recognizes add/sub with
constants too.

In the min/max example, we have too many extra uses
for smaller folds to improve things, but this fold
is able to eliminate uses even though we can't reduce
the number of instructions.
2021-08-22 07:18:31 -04:00
Simon Pilgrim 7f48bd3bed CGBuiltin.cpp - pass SVETypeFlags by const reference. NFC.
Don't pass the struct by value.
2021-08-22 12:13:17 +01:00
Florian Hahn 9baed023b4
[LV] Adjust reduction recipes before recurrence handling.
Adjusting the reduction recipes still relies on references to the
original IR, which can become outdated by the first-order recurrence
handling. Until reduction recipe construction does not require IR
references, move it before first-order recurrence handling, to prevent a
crash as exposed by D106653.
2021-08-22 11:02:33 +01:00
Ben Shi f69fb7ac72 [DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)
Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27

Differential Revision: https://reviews.llvm.org/D107711
2021-08-22 16:53:32 +08:00
luxufan dda116bc3d [JITLink] Add support of R_X86_64_32S relocation
This patch supported the R_X86_64_32S relocation and add the Pointer32Signed generic edge kind.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D108446
2021-08-22 16:45:25 +08:00
Lang Hames 1e5e1bee49 [ORC] Add std::tuple support to SimplePackedSerialization. 2021-08-22 11:17:04 +10:00
Lang Hames 76d6a8df20 [ORC] Rename blobSerializationRoundTrip, drop explicit arg types on calls.
Renames the blobSerializationRoundTrip test helper function to
spsSerializationRoundTrip ('blob' was the placeholder name for the serialization
scheme during prototyping, this function was missed when renaming everything
for the mainline). Also drops explicit template arguments at call sites where
they can be inferred (and are obvious) from the call argument type.
2021-08-22 11:17:04 +10:00
Wang, Pengfei b088536ce9 [X86] AVX512FP16 instructions enabling 4/6
Enable FP16 unary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105267
2021-08-22 08:59:35 +08:00
Lang Hames 75e5f35aea [ORC] Add missing header.
Should fix bot failure at
https://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/4367
2021-08-22 10:36:52 +10:00
Fangrui Song 1dfb30e54c [TargetCallingConv] Change OutputArg ctor to match its members
This avoids unneeded MVT->EVT conversion.
2021-08-21 16:41:48 -07:00
Fangrui Song 0473e9f41a [AArch64] Replace unneeded CCAssignToRegWithShadow with CCAssignToReg
CCState::AllocateReg handles aliased registers.
2021-08-21 16:33:29 -07:00
Fangrui Song a83d99c55e [TargetMachine] Drop special case for *-win32-macho
clang CodeGenModule shouldAssumeDSOLocal has set dso_local.
2021-08-21 13:59:17 -07:00
Fangrui Song c5ee312368 [TargetMachine] Simplify shouldAssumeDSOLocal. NFC 2021-08-21 12:37:29 -07:00
Kazu Hirata 612048aec1 [clang] Fix typos in documentation (NFC) 2021-08-21 12:17:58 -07:00
Sanjay Patel 41af8f0ad5 [InstCombine] combine constants by reassociating add/sub/add
This may overlap partially with the reassociate pass,
but it seems simple enough that we should try it here
in InstCombine to enable other folds.

This shows up as an opportunity and potential regression
if we improve a subtract fold with 'not' ops to be more
general.
2021-08-21 11:45:43 -04:00
Sanjay Patel c0844de7a2 [InstCombine] add tests for add/sub/add combines; NFC 2021-08-21 11:45:43 -04:00
Sanjay Patel 0751347bc3 [InstCombine] add tests for min/max with nots and sub; NFC 2021-08-21 11:45:43 -04:00
David Green 605489d593 [ARM] Fix VQDMULH fold for scalar smin
Add a variant of mve-vqdmulh tests that uses min/max intrinsics
directly, including a scalar test that shows it misbehaving for min
intrinsics and a fix for the combine to prevent it from misbehaving.
2021-08-21 16:33:18 +01:00
Andrzej Warzynski 787c443a8d [flang] Refine output file generation
This patch cleans-up the file generation code in Flang's frontend
driver. It improves the layering between
`CompilerInstance::CreateDefaultOutputFile`,
`CompilerInstance::CreateOutputFile` and their various clients.

* Rename `CreateOutputFile` as `CreateOutputFileImpl` and make it
  private. This method is an implementation detail.
* Instead of passing an `std::error_code` out parameter into
  `CreateOutputFileImpl`, have it return Expected<>. This is a bit shorter
  and idiomatic LLVM.
* Make `CreateDefaultOutputFile` (which calls `CreateOutputFileImpl`)
  issue an error when file creation fails. The error code from
  `CreateOutputFileImpl` is used to generate a meaningful diagnostic
  message.
* Remove error reporting from `PrintPreprocessedAction::ExecuteAction`.
  This is only for cases when output file generation fails. This is
  handled in `CreateDefaultOutputFile` instead (see the previous point).
* Inline `AddOutputFile` into its only caller,
  `CreateDefaultOutputFile`.
* Switch from `lvm::buffer_ostream` to `llvm::buffer_unique_ostream>`
  for non-seekable output streams. This simplifies the logic in the driver
  and was introduced for this very reason in [1]
* Moke sure that the diagnostics from the prescanner when running `-E`
  (`PrintPreprocessedAction::ExecuteAction`) are printed before the actual
  output is generated.
* Update comments, add test.

NOTE: This patch relands [2]. As suggested by Michael Kruse in the
post-commit/post-revert review, I've added the following:
```
config.errc_messages = "@LLVM_LIT_ERRC_MESSAGES@"
```
in Flang's `lit.site.cfg.py.in`. This way, `%errc_ENOENT` in
output-paths.f90 gets the correct value on Windows as well as on Linux.

[1] https://reviews.llvm.org/D93260
[2] fd21d1e198

Reviewed By: ashermancinelli

Differential Revision: https://reviews.llvm.org/D108390
2021-08-21 15:18:48 +00:00
Kirill Shmakov 2cc1198e36 [lldb] Fix typo in the description of breakpoint options 2021-08-21 12:24:29 +02:00
LLVM GN Syncbot 93de779d63 [gn build] Port 7f99337f9b 2021-08-21 09:44:22 +00:00
Lang Hames 7f99337f9b [ORC] Add EPCGenericMemoryAccess: generic executor memory access via EPC calls.
All ExecutorProcessControl subclasses must provide an
ExecutorProcessControl::MemoryAccess object that can be used to access executor
memory from the JIT process. The EPCGenericMemoryAccess class provides an
off-the-shelf MemoryAccess implementation for JITs that do not need (or cannot
provide) a specialized MemoryAccess implementation. This simplifies the process
of creating new ExecutorProcessControl implementations.
2021-08-21 19:33:39 +10:00
eopXD 4fc98ca617 [NFC][LoopIdiom] Let processLoopStoreOfLoopLoad take StoreSize as SCEV instead of unsigned
Letting it take SCEV allows further modification on the function to optimize
if the StoreSize / Stride is runtime determined.

The plan is to let memcpy / memmove deal with runtime-determined sizes, just
like what D107353 did to memset.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D108289
2021-08-21 00:03:28 -07:00
Siva Chandra Reddy 5e147d3058 [libc] Add a new suite called "libc-long-running-tests".
This suite is helpful is adding long running tests which take a long
time to finish that they can be run on the public builders. They
will probably be run on special builders in future.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D104816
2021-08-21 05:01:28 +00:00
Kazu Hirata 24d4cbeca3 [CodeGen] Remove unused declaration setLiveInsUsed (NFC)
The corresponding definition was removed on Jan 20, 2017 in commit
710a4c1f3d.
2021-08-20 19:19:54 -07:00
Joseph Huber ec66ed79f4 [OpenMP] Correctly add member expressions to OpenMP info
Mapping expressions that have `this` as their base expression aren't
considered a valid base variable and the rest of the runtime expects
this. However, if we have an expression with no value declaration we can
try to extract it manually to provide more helpful debuggin information.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108483
2021-08-20 20:45:14 -04:00
Amara Emerson 3187a4f3f1 [AArch64][GlobalISel] Add legalizer support for the @llvm.get.dynamic.area.offset intrinsic.
This is just 0 on AArch64.
2021-08-20 17:13:34 -07:00
Geoffrey Martin-Noble 52acc0547d [Bazel] Fix version defines
Some of these were the wrong version and some of them were the wrong
format. Did some hunting around to figure out what exactly they're
supposed to be. Since basically everything is derived from the LLVM
version we should probably make this a bit less hardcoded, but just
fixing the values for now.

Sources:
https://github.com/llvm/llvm-project/blob/b686fc7a1bea/clang/include/clang/Basic/Version.inc.in
https://github.com/llvm/llvm-project/blob/b686fc7a1bea/clang/CMakeLists.txt#L353-L363
https://github.com/llvm/llvm-project/blob/b686fc7a1bea/llvm/CMakeLists.txt#L13-L29
https://github.com/llvm/llvm-project/blob/b686fc7a1bea/lld/CMakeLists.txt#L131-L138

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108500
2021-08-20 17:01:01 -07:00
Fangrui Song b686fc7a1b [Driver] Remove discouraged -gcc-toolchain
Space separated driver options are uncommon but Clang traditionally
did not do a good job. --gcc-toolchain= is the preferred form.

This discourage form appears to be rare, so we can just drop it.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D108494
2021-08-20 16:36:42 -07:00
Amara Emerson 67bf3ac744 [AArch64][GlobalISel] Don't contract cross-bank copies into truncating stores.
Truncating stores with GPR bank sources shouldn't be mutated into using FPR bank
sources, since those aren't supported.

Ideally this should be a selection failure in the tablegen patterns, but for now
avoid generating them.
2021-08-20 16:36:23 -07:00
Geoffrey Martin-Noble 2bd7c30e5a [Bazel] Reduce quote escaping
There's a lot of unnecessary backslashes here that we can avoid to
reduce confusion.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108495
2021-08-20 16:33:10 -07:00
William S. Moses 973cb2c326 [MLIR][OMP] Ensure nested scf.parallel execute all iterations
Presently, the lowering of nested scf.parallel loops to OpenMP creates one omp.parallel region, with two (nested) OpenMP worksharing loops on the inside. When lowered to LLVM and executed, this results in incorrect results. The reason for this is as follows:

An OpenMP parallel region results in the code being run with whatever number of threads available to OpenMP. Within a parallel region a worksharing loop divides up the total number of requested iterations by the available number of threads, and distributes accordingly. For a single ws loop in a parallel region, this works as intended.

Now consider nested ws loops as follows:

omp.parallel {
   A: omp.ws %i = 0...10 {
      B: omp.ws %j = 0...10 {
          code(%i, %j)
      }
   }
}

Suppose we ran this on two threads. The first workshare loop would decide to execute iterations 0, 1, 2, 3, 4 on thread 0, and iterations 5, 6, 7, 8, 9 on thread 1. The second workshare loop would decide the same for its iteration. This means thread 0 would execute i \in [0, 5) and j \in [0, 5). Thread 1 would execute i \in [5, 10) and j \in [5, 10). This means that iterations i in [5, 10), j in [0, 5) and i in [0, 5), j in [5, 10) never get executed, which is clearly wrong.

This permits two options for a remedy:
1) Change the semantics of the omp.wsloop to be distinct from that of the OpenMP runtime call or equivalently #pragma omp for. This could then allow some lowering transformation to remedy the aforementioned issue. I don't think this is desirable for an abstraction standpoint.
2) When lowering an scf.parallel always surround the wsloop with a new parallel region (thereby causing the innermost wsloop to use the number of threads available only to it).

This PR implements the latter change.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108426
2021-08-20 19:06:28 -04:00
Fangrui Song 40aab0412f [test] Migrate -gcc-toolchain with space separator to --gcc-toolchain=
Space separated driver options are uncommon but Clang traditionally
did not do a good job. --gcc-toolchain= is the preferred form.
2021-08-20 15:24:58 -07:00
Jessica Paquette 9e9d70591e [AArch64][GlobalISel] Legalize non-register-sized scalar G_BITREVERSE
Clamp types to [s32, s64] and make them a power of 2.

This matches SDAG's behaviour.

https://godbolt.org/z/vTeGqf4vT

Differential Revision: https://reviews.llvm.org/D108344
2021-08-20 14:44:03 -07:00
Jessica Paquette 7e91c59844 [AArch64][GlobalISel] Legalize 32-bit + narrow G_SMULO + G_UMULO
SDAG lowers 32-bit and 64-bit G_SMULO + G_UMULO. We were missing the 32-bit
case.

For other sizes, make the 0th type a power of 2 and clamp it to either 32 bits
or 64 bits.

Right now, this will allow us to handle narrow types (e.g. s4, s24, etc.). The
LegalizerHelper doesn't support narrowing G_SMULO or G_UMULO right now. I think
we want clamping behaviour either way, so we might as well include it now to
be explicit.

Differential Revision: https://reviews.llvm.org/D108240
2021-08-20 14:37:46 -07:00