Commit Graph

426364 Commits

Author SHA1 Message Date
Johannes Doerfert 94841c713f [Attributor] Try to delete stores and simplify stored values
By default we should try to eliminate unused stores and simplify values
stored while we are at it.
2022-06-09 16:48:53 +02:00
Johannes Doerfert a3273c0c06 [Attributor] Ensure to use the proper liveness AA
When determining liveness via Attributor::isAssumedDead(...) we might
end up without a liveness AA or with one pointing into another function.
Neither is helpful and we will avoid both from now on.
2022-06-09 16:48:53 +02:00
Jay Foad a3fc8adb7e [AMDGPU] Add GFX11 test coverage for the memory legalizer 2022-06-09 15:35:56 +01:00
Levon a33983729d Pass plugin_name in SBProcess::SaveCore
This CL allows to use minidump save-core functionality (https://reviews.llvm.org/D108233) via SBProcess interface.
After adding a support from gdb-remote client (https://reviews.llvm.org/D101329) if the plugin name is empty the plugin manager will try to save the core directly from the process plugin.
See https://github.com/llvm/llvm-project/blob/main/lldb/source/Core/PluginManager.cpp#L696

To have an ability to save the core with minidump plugin I added plugin name as a parameter in SBProcess::SaveCore.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D125325
2022-06-09 16:29:50 +02:00
Philip Reames 0e29a80fdc [RISCV] Add cost model for reverse shuffle
The majority of the cost appears to be forming the indices vector.

Differential Revision: https://reviews.llvm.org/D127141
2022-06-09 07:21:40 -07:00
Florian Hahn 20d798bd47
Recommit "[SCEV] Look through single value PHIs." (take 3)
This reverts commit 1fbdbb5595.

All known issues surfaced by this patch should have been fixed now.

The fixes included fixing issues with SCEV expansion in LV and DA's
reliance on LCSSA phis.
2022-06-09 15:20:10 +01:00
Yitzhak Mandelbaum dd38caf3b5 [clang][dataflow] Track `optional` contents in `optional` model.
This patch adds partial support for tracking (i.e. modeling) the contents of an
optional value. Specifically, it supports tracking the value after it is
accessed. We leave tracking constructed/assigned contents to a future patch.

Differential Revision: https://reviews.llvm.org/D124932
2022-06-09 14:17:39 +00:00
Gabor Marton bc2c759aee [analyzer] Fix assertion failure after getKnownValue call
Depends on D126560. `getKnownValue` has been changed by the parent patch
in a way that simplification was removed. This is not correct when the
function is called by the Checkers. Thus, a new internal function is
introduced, `getConstValue`, which simply queries the constraint manager.
This `getConstValue` is used internally in the `SimpleSValBuilder` when a
binop is evaluated, this way we avoid the recursion into the `Simplifier`.

Differential Revision: https://reviews.llvm.org/D127285
2022-06-09 16:13:57 +02:00
Simon Moll b8c2781ff6 [NFC] format InstructionSimplify & lowerCaseFunctionNames
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName".  This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.

This is the alternative to the less invasive clang-format only patch: D126783

Reviewed By: spatel, rengolin

Differential Revision: https://reviews.llvm.org/D126889
2022-06-09 16:10:08 +02:00
Simon Pilgrim 7dbfcfa735 [DAG] combineInsertEltToShuffle - if EXTRACT_VECTOR_ELT fails to match an existing shuffle op, try to replace an undef op if there is one.
This should fix a number of shuffle regressions in D127115 where the re-ordered combines mean we fail to fold a EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT sequence into a BUILD_VECTOR if we extract from more than one vector source.
2022-06-09 14:56:14 +01:00
Johannes Doerfert ae10b8a582 [Attributor][FIX] Give registered simplification callbacks precedence
We accidentally checked for constants before we looked for registered
simplification callbacks. The latter needs to take precedence though.
2022-06-09 15:31:53 +02:00
Andrew Turner 95141aa9cb Fix TableLookupTest on FreeBSD
As with Linux placce the Counters array in the __libfuzzer_extra_counters
section. This fixes the test on FreeBSD.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D125902
2022-06-09 09:24:09 -04:00
Matthias Springer 461dafd2a3 [mlir][bufferization] Add OneShotBufferize transform op
This commit allows for One-Shot Bufferize to be used through the transform dialect. No op handle is currently returned for the bufferized IR.

Differential Revision: https://reviews.llvm.org/D125098
2022-06-09 15:15:09 +02:00
Yuki Okushi cedfb5462c
[docs] Update supported language standards list for C++
Differential Revision: https://reviews.llvm.org/D127065
2022-06-09 22:14:08 +09:00
Yuki Okushi 074f12e467
[OpenMP] Fix the build on Windows
The code expanded from kmp_barrier.h uses some `KMP_INTERNAL_*`s,
so the definitions have to be placed before it.

Fixes #55815

Differential Revision: https://reviews.llvm.org/D126873
2022-06-09 22:12:42 +09:00
Timm Bäder 8feb92add8 [clang][tests] Add missing compiler name
The driver stripts the first argument. Without the compiler name, the
test depends on whether GCC_INSTALL_PREFIX is set or not.

See https://reviews.llvm.org/D125862
2022-06-09 15:11:40 +02:00
Haojian Wu c70aeaad2b [pseudo] Move grammar-related headers to a separate dir, NFC.
We did that for .cpp, but forgot the headers.

Differential Revision: https://reviews.llvm.org/D127388
2022-06-09 14:58:05 +02:00
Mark de Wever a7bd1ab776 [libc++][CI] Updates Docker image.
- Updates the image to use Ubuntu Jammy.
- Installs GCC-12 as preparation to migrate to that GCC version.

NOTE: This is a re-application of f2f0dba818, which was reverted
in 2b5e3ef83c due to an issue with the CI nodes. The CI nodes have
since then been updated and this appears to be fine.

Differential Revision: https://reviews.llvm.org/D126666
2022-06-09 08:53:20 -04:00
Aaron Ballman 8e1a29ecab Use HTTPS links instead of HTTP ones in the C DR status page 2022-06-09 08:52:30 -04:00
Christian Kandeler bf830623b0 [pseudo] Fix unit test build
Analogous to 632545e8ce.

Reviewed By: hokein

Differential Revision: https://reviews.llvm.org/D127397
2022-06-09 14:42:47 +02:00
Alex Zinenko b6c58ec486 [mlir] add producer fusion to structured transform ops
This relies on the existing TileAndFuse pattern for tensor-based structured
ops. It complements pure tiling, from which some utilities are generalized.

Depends On D127300

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D127319
2022-06-09 14:30:45 +02:00
Douglas Yung 942f4e3a7c Revert "[lld-macho] Initial support for EH Frames"
This reverts commit 826be330af.

This was causing a test failure on build bots:
  - https://lab.llvm.org/buildbot/#/builders/36/builds/21770
  - https://lab.llvm.org/buildbot/#/builders/58/builds/23913
2022-06-09 05:25:43 -07:00
Douglas Yung 10641a42e2 Revert "[lld-macho] Support EH frames under arm64"
This reverts commit 977d62c33e.

This change was causing crashes in 2 tests on the buildbots:
  - https://lab.llvm.org/buildbot/#/builders/58/builds/23914
  - https://lab.llvm.org/buildbot/#/builders/36/builds/21771
2022-06-09 05:24:28 -07:00
Sam McCall 18f0b7092d [pseudo] Don't clang-format test inputs. NFC 2022-06-09 14:18:30 +02:00
Haojian Wu 9ce232fba9 [pseudo] Fix the missing-field-initializers warning from f1ac00c9b0, NFC 2022-06-09 14:10:36 +02:00
Jose Manuel Monsalve Diaz 15ed5c0a07 [LIBOMPTARGET] Adding AMD to llvm-omp-device-info
Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line tool.

This commit adds missing HSA functions, enums and structs
needed to query additional information from the HSA agents.
A generic message for the `generic-elf-64bit` plugin is also added

Example of an output:
```
llvm-omp-device-info
Device (0):
    This is a generic-elf-64bit device

Device (1):
    This is a generic-elf-64bit device

Device (2):
    This is a generic-elf-64bit device

Device (3):
    This is a generic-elf-64bit device

Device (4):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           0
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (5):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           1
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (6):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           2
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (7):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           3
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE
```

Differential Revision: https://reviews.llvm.org/D126836
2022-06-09 11:58:39 +00:00
Benjamin Kramer 0abb472fff AMDGPU/GISel: Remove unused variable. NFC. 2022-06-09 13:43:47 +02:00
Nathan Sidwell 65b34b78f8 [clang][pr55896]:co_yield/co_await thread-safety
co_await and co_yield are represented by (classes derived from)
CoroutineSuspendExpr.  That has a number of child nodes, not all of
which are used for code-generation.  In particular the operand is
represented multiple times, and, like the problem with co_return
(55406) it must only be emitted in the CFG exactly once.  The operand
also appears inside OpaqueValueExprs, but that's ok.

This adds a visitor for SuspendExprs to emit the required children in
the correct order.  Note that this CFG is pre-coro xform.  We don't
have initial or final suspend points.

Reviewed By: bruno

Differential Revision: https://reviews.llvm.org/D127236
2022-06-09 04:42:10 -07:00
Johannes Doerfert 982053e85e [Attributor][NFC] Improve debug code and comments 2022-06-09 13:41:23 +02:00
Johannes Doerfert 0ece283f03 [Attributor] Add checks needed as we strengthen value simplify 2022-06-09 13:41:23 +02:00
Johannes Doerfert 393be12b74 [Attributor] Look at base values for align, nonnull, and deref
Stripping bitcasts and 0-geps helps normalization and minimizes the
impact of a follow up change.
2022-06-09 13:41:23 +02:00
Johannes Doerfert cb8adf76f7 [Attributor] Simplify loads from constant globals
If a global is constant and the initializer is known we can simplify
loads from it as the value has to be the initializer.
2022-06-09 13:41:23 +02:00
Martin Storsjö 39c4ac140d [lldb] Silence a GCC warning about missing returns after a fully covered switch. NFC. 2022-06-09 14:39:33 +03:00
Alvin Wong c8daf4a707 [lldb] Add gnu-debuglink support for Windows PE/COFF
The specification of gnu-debuglink can be found at:
https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html

The file CRC or the CRC value from the .gnu_debuglink section is now
used to calculate the module UUID as a fallback, to allow verifying that
the debug object does match the executable. Note that if a CodeView
build id exists, it still takes precedence. This works even for MinGW
builds because LLD writes a synthetic CodeView build id which does not
get stripped from the debug object.

The `Minidump/Windows/find-module` test also needs a fix by adding a
CodeView record to the exe to match the one in the minidump, otherwise
it fails due to the new UUID calculated from the file CRC.

Fixes https://github.com/llvm/llvm-project/issues/54344

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D126367
2022-06-09 14:39:33 +03:00
Nicolai Hähnle 264d1136f9 AMDGPU/GISel: Introduce custom legalization of G_MUL
The generic legalizer framework is still used to reduce the problem
to scalar multiplication with the bit size a multiple of 32.

Generating optimal code sequences for big integer multiplication is
somewhat tricky and has a number of target-specific intricacies:

- The target has V_MAD_U64_U32 instructions that multiply two 32-bit
  factors and add a 64-bit accumulator. Most partial products should
  use this instruction.
- The accumulator is mapped to consecutive 32-bit GPRs, and partial-
  product multiply-adds can feed the accumulator into each other
  directly. (The register allocator's support for that is somewhat
  limited, but that only matters for 128-bit integers and larger.)
- OTOH, on some hardware, V_MAD_U64_U32 requires the accumulator
  to be stored in an even-aligned pair of GPRs. To avoid excessive
  register copies, it makes sense to compute odd partial products
  separately from even partial products (where a partial product
  src0[j0] * src1[j1] is "odd" if j0 + j1 is odd) and add both
  halves together as a final step.
- We can combine G_MUL+G_ADD into a single cascade of multiply-adds.
- The target can keep many carry-bits in flight simultaneously, so
  combining carries using G_UADDE is preferable over G_ZEXT + G_ADD.
- Not addressed by this patch: When the factors are sign-extended,
  the V_MAD_I64_I32 instruction (signed version!) can be used.

It is difficult to address these points generically:

1) Finding matching pairs of G_MUL and G_UMULH to find a wide
multiply is expensive. We could add a G_UMUL_LOHI generic instruction
and conditionally use that in the generic legalizer, but by itself
this wouldn't allow us to use the accumulation capability of
V_MAD_U64_U32. One could attempt to find matching G_ADD + G_UADDE
post-legalization, but this is also expensive.

2) Similarly, making sense of the legalization outcome of a wide
pre-legalization G_MUL+G_ADD pair is extremely expensive.

3) How could the generic legalizer possibly deal with the
particular idiosyncracy of "odd" vs. "even" partial products.

All this points in the direction of directly emitting an ideal code
sequence during legalization, but the generic legalizer should not
be burdened with such overly target-specific concerns. Hence, a
custom legalization.

Note that the implemented approach is different from that used by
SelectionDAG because narrowing of scalars works differently in
general. SelectionDAG iteratively cuts wide scalars into low and
high halves until a legal size is reached. By contrast, GlobalISel
does the narrowing in a single shot, which should be better for
compile-time and for the quality of the generated code.

This patch leaves three gaps open:

1. When the factors are uniform, we should execute the multiplication on
   the SALU. Register bank mapping already ensures this.

   However, the resulting code sequence is not optimal because it doesn't
   fully use the carry-in capabilities of S_ADDC_U32. (V_MAD_U64_U32
   doesn't have a carry-in.) It is very difficult to fix this after the
   fact, so we should really use a different legalization sequence in
   this case. Unfortunately, we don't have a divergence analysis and so
   cannot make that choice.

   (This only matters for 128-bit integers and larger.)

2. Avoid unnecessary multiplies when sources are known to be zero- or
   sign-extended. The challenge is that the legalizer does not currently
   have access to GISelKnownBits.

3. When the G_MUL is followed by a G_ADD, we should consider combining
   the two instructions into a single multiply-add sequence, to utilize
   the accumulator of V_MAD_U64_U32 fully. (Unless the multiply has
   multiple uses and the implied duplication of the multiply is an
   overall negative). However, this is also not true when the factors
   are uniform: in that case, it is generally better to *not* combine
   the two operations, so that the multiply can be done on the SALU.

   Again, we don't have a divergence analysis available and so cannot
   make an informed choice.

Differential Revision: https://reviews.llvm.org/D124844
2022-06-09 13:38:56 +02:00
Simon Pilgrim 1a02db9882 [X86] canonicalizeShuffleWithBinOps - add TODO for X86ISD::ANDNP bitwise handling
Its just as safe to move shuffles across X86ISD::ANDNP as any other logical bitop, they just tend to appear too late to matter.

Noticed while triaging D127115 regressions.
2022-06-09 12:18:26 +01:00
Benjamin Kramer abcf1496ad Fix complex.conj integration test
- It doesn't actually print the fractional part if the result is a whole number
- One of the expectations was just wrong
2022-06-09 13:11:10 +02:00
Florian Hahn 85983ca42e
[VPlan] Replace remaining use of needsScalarIV.
All information is already available in VPlan. Note that there are some
test changes, because we now can correctly look through instructions
like truncates to analyze the actual users.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123541
2022-06-09 12:05:37 +01:00
Jose Manuel Monsalve Diaz 84e020a061 Revert "[LIBOMPTARGET] Adding AMD to llvm-omp-device-info"
This reverts commit d16a0877d8.
2022-06-09 10:46:03 +00:00
Matheus Izvekov 51608515fa
cmake: use llvm dir variables for clang/utils/hmaptool
Copy hmaptool using the paths for CURRENT_TOOLS_DIR, so
everything goes in the right place in case llvm is included
from a top level CMakeLists.txt.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: stephenneuendorffer

Differential Revision: https://reviews.llvm.org/D126308
2022-06-09 12:25:38 +02:00
Paul Pluzhnikov afbe3aed49 [clangd] Minor refactor of CanonicalIncludes::addSystemHeadersMapping.
Before commit b3a991df3c SystemHeaderMap used to be a vector.

Commit b3a991df3c changed it into a map, but neglected to remove
duplicate keys (e.g. "bits/typesizes.h", "include/stdint.h", etc.).

To prevent confusion, remove all duplicates, build HeaderMapping
one pair at a time and assert() that no duplicates are found.

Change by Paul Pluzhnikov (ppluzhnikov)!

Reviewed By: ilya-biryukov

Differential Revision: https://reviews.llvm.org/D125742
2022-06-09 12:18:39 +02:00
Kiran Chandramohan 08407255b2 [Flang] Temporary fix for conversion materialization
Simply add a source and target materialization handler that do nothing
and that override the default handlers that would add illegal
LLVM::DialectCastOp otherwise.

This is the simplest workaround, but not an actual fix, something may be
inconsistent after D82831 (most likely fir lowering to llvm happens in a
way that mlir infrastructure is not expecting in D82831).

Here is a minimal reproducer of what the issue was:
```
func @foop(%a : !fir.real<4>) -> ()
func @bar(%a : !fir.real<2>) {
  %1 = fir.convert %a : (!fir.real<2>) -> !fir.real<4>
  call @foop(%1) : (!fir.real<4>) -> ()
  return
}
```
tco -o - output was:
```
error: 'llvm.mlir.cast' op type must be non-index integer types, float types, or vector of mentioned types.
llvm.func @foop(!llvm.float)
llvm.func @bar(%arg0: !llvm.half) {
  %0 = llvm.fpext %arg0 : !llvm.half to !llvm.float
  %1 = llvm.mlir.cast %0 : !llvm.float to !fir.real<4>
  llvm.call @foop(%1) : (!fir.real<4>) -> ()
  llvm.return
}
```
This patch disable the introduction of the llvm.mlir.cast and preserve the previous behavior.

Also fixes https://github.com/llvm/llvm-project/issues/55210.

Note: This is part of upstreaming from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project.

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D127212

Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
2022-06-09 10:10:56 +00:00
Haojian Wu f1ac00c9b0 [pseudo] Add grammar annotations support.
Add annotation handling ([key=value]) in the BNF grammar parser, which
will be used in the conditional reduction, and error recovery.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D126536
2022-06-09 12:06:22 +02:00
Denis Antrushin e99e821ce8 [FixupStatepoints] Precommit test for D127308. NFC 2022-06-09 17:04:48 +07:00
Johannes Doerfert 14899bc43d [Attributor] Generalize interface from ConstantInt to Constant
We can use constant to allow undef and there is no need to force
integers in the API anyway. The user can decide if a non integer
constant is fine or not.
2022-06-09 12:00:26 +02:00
Johannes Doerfert 7a07b88f37 [Attributor][FIX] Replace call site argument uses, not values
We need to be careful replacing values as call site arguments
(IRPosition::IRP_CALL_SITE_ARGUMENT) is representing a use and not a
value. This patch replaces the interface to take a IR position instead
making it harder to misuse accidentally. It does not change our tests
right now but a follow up exposed the potential footgun.
2022-06-09 12:00:26 +02:00
Johannes Doerfert 1df6e171c3 [Attributor] Simplify (integer range) state handling
We used to be very conservative when integer states were merged.
Instead of adding the known range (which is large due to uncertainty)
into the assumed range (which is hopefully small), we can also only
allow to merge in both at the same time into their respective
counterpart. This will ensure we keep the invariant that assumed is part
of known.
2022-06-09 12:00:26 +02:00
Johannes Doerfert 481b8f31df [Attributor][NFC] Introduce helper struct
We often use a context associated with a value. For now only one use
case has been changed.
2022-06-09 12:00:26 +02:00
Johannes Doerfert 4277c1be88 [Attributor][FIX] Avoid metadata and duplicate replication assertion
When we recreate instructions as part of simplification we need to take
care of debug metadata and replacing the value multiple times. For now,
we handle both conservatively.
2022-06-09 12:00:26 +02:00
Biplob Mishra d87bfa9ad0 [InstCombine] Combine instructions of type or/and where AND masks can be combined.
The patch simplifies some of the patterns as below

(A | (B & C0)) | (B & C1) -> A | (B & C0|C1)
((B & C0) | A) | (B & C1) -> (B & C0|C1) | A

In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns.

Additionally this commit fixes the issue reported with the test case.
int f(int a, int b) {
  int c = ((unsigned char)(a >> 23) & 925);
  if (a)
    c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
  return c;
}

The previous revision/commit did not check one-use of an intermediate value that this transform re-uses.
When that value has another use, an existing transform will try to invert the transform here.
By adding one-use checks, we avoid the infinite loops seen with the earlier commit.

Differential Revision: https://reviews.llvm.org/D124119
2022-06-09 10:58:30 +01:00