Commit Graph

426336 Commits

Author SHA1 Message Date
Johannes Doerfert 982053e85e [Attributor][NFC] Improve debug code and comments 2022-06-09 13:41:23 +02:00
Johannes Doerfert 0ece283f03 [Attributor] Add checks needed as we strengthen value simplify 2022-06-09 13:41:23 +02:00
Johannes Doerfert 393be12b74 [Attributor] Look at base values for align, nonnull, and deref
Stripping bitcasts and 0-geps helps normalization and minimizes the
impact of a follow up change.
2022-06-09 13:41:23 +02:00
Johannes Doerfert cb8adf76f7 [Attributor] Simplify loads from constant globals
If a global is constant and the initializer is known we can simplify
loads from it as the value has to be the initializer.
2022-06-09 13:41:23 +02:00
Martin Storsjö 39c4ac140d [lldb] Silence a GCC warning about missing returns after a fully covered switch. NFC. 2022-06-09 14:39:33 +03:00
Alvin Wong c8daf4a707 [lldb] Add gnu-debuglink support for Windows PE/COFF
The specification of gnu-debuglink can be found at:
https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html

The file CRC or the CRC value from the .gnu_debuglink section is now
used to calculate the module UUID as a fallback, to allow verifying that
the debug object does match the executable. Note that if a CodeView
build id exists, it still takes precedence. This works even for MinGW
builds because LLD writes a synthetic CodeView build id which does not
get stripped from the debug object.

The `Minidump/Windows/find-module` test also needs a fix by adding a
CodeView record to the exe to match the one in the minidump, otherwise
it fails due to the new UUID calculated from the file CRC.

Fixes https://github.com/llvm/llvm-project/issues/54344

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D126367
2022-06-09 14:39:33 +03:00
Nicolai Hähnle 264d1136f9 AMDGPU/GISel: Introduce custom legalization of G_MUL
The generic legalizer framework is still used to reduce the problem
to scalar multiplication with the bit size a multiple of 32.

Generating optimal code sequences for big integer multiplication is
somewhat tricky and has a number of target-specific intricacies:

- The target has V_MAD_U64_U32 instructions that multiply two 32-bit
  factors and add a 64-bit accumulator. Most partial products should
  use this instruction.
- The accumulator is mapped to consecutive 32-bit GPRs, and partial-
  product multiply-adds can feed the accumulator into each other
  directly. (The register allocator's support for that is somewhat
  limited, but that only matters for 128-bit integers and larger.)
- OTOH, on some hardware, V_MAD_U64_U32 requires the accumulator
  to be stored in an even-aligned pair of GPRs. To avoid excessive
  register copies, it makes sense to compute odd partial products
  separately from even partial products (where a partial product
  src0[j0] * src1[j1] is "odd" if j0 + j1 is odd) and add both
  halves together as a final step.
- We can combine G_MUL+G_ADD into a single cascade of multiply-adds.
- The target can keep many carry-bits in flight simultaneously, so
  combining carries using G_UADDE is preferable over G_ZEXT + G_ADD.
- Not addressed by this patch: When the factors are sign-extended,
  the V_MAD_I64_I32 instruction (signed version!) can be used.

It is difficult to address these points generically:

1) Finding matching pairs of G_MUL and G_UMULH to find a wide
multiply is expensive. We could add a G_UMUL_LOHI generic instruction
and conditionally use that in the generic legalizer, but by itself
this wouldn't allow us to use the accumulation capability of
V_MAD_U64_U32. One could attempt to find matching G_ADD + G_UADDE
post-legalization, but this is also expensive.

2) Similarly, making sense of the legalization outcome of a wide
pre-legalization G_MUL+G_ADD pair is extremely expensive.

3) How could the generic legalizer possibly deal with the
particular idiosyncracy of "odd" vs. "even" partial products.

All this points in the direction of directly emitting an ideal code
sequence during legalization, but the generic legalizer should not
be burdened with such overly target-specific concerns. Hence, a
custom legalization.

Note that the implemented approach is different from that used by
SelectionDAG because narrowing of scalars works differently in
general. SelectionDAG iteratively cuts wide scalars into low and
high halves until a legal size is reached. By contrast, GlobalISel
does the narrowing in a single shot, which should be better for
compile-time and for the quality of the generated code.

This patch leaves three gaps open:

1. When the factors are uniform, we should execute the multiplication on
   the SALU. Register bank mapping already ensures this.

   However, the resulting code sequence is not optimal because it doesn't
   fully use the carry-in capabilities of S_ADDC_U32. (V_MAD_U64_U32
   doesn't have a carry-in.) It is very difficult to fix this after the
   fact, so we should really use a different legalization sequence in
   this case. Unfortunately, we don't have a divergence analysis and so
   cannot make that choice.

   (This only matters for 128-bit integers and larger.)

2. Avoid unnecessary multiplies when sources are known to be zero- or
   sign-extended. The challenge is that the legalizer does not currently
   have access to GISelKnownBits.

3. When the G_MUL is followed by a G_ADD, we should consider combining
   the two instructions into a single multiply-add sequence, to utilize
   the accumulator of V_MAD_U64_U32 fully. (Unless the multiply has
   multiple uses and the implied duplication of the multiply is an
   overall negative). However, this is also not true when the factors
   are uniform: in that case, it is generally better to *not* combine
   the two operations, so that the multiply can be done on the SALU.

   Again, we don't have a divergence analysis available and so cannot
   make an informed choice.

Differential Revision: https://reviews.llvm.org/D124844
2022-06-09 13:38:56 +02:00
Simon Pilgrim 1a02db9882 [X86] canonicalizeShuffleWithBinOps - add TODO for X86ISD::ANDNP bitwise handling
Its just as safe to move shuffles across X86ISD::ANDNP as any other logical bitop, they just tend to appear too late to matter.

Noticed while triaging D127115 regressions.
2022-06-09 12:18:26 +01:00
Benjamin Kramer abcf1496ad Fix complex.conj integration test
- It doesn't actually print the fractional part if the result is a whole number
- One of the expectations was just wrong
2022-06-09 13:11:10 +02:00
Florian Hahn 85983ca42e
[VPlan] Replace remaining use of needsScalarIV.
All information is already available in VPlan. Note that there are some
test changes, because we now can correctly look through instructions
like truncates to analyze the actual users.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123541
2022-06-09 12:05:37 +01:00
Jose Manuel Monsalve Diaz 84e020a061 Revert "[LIBOMPTARGET] Adding AMD to llvm-omp-device-info"
This reverts commit d16a0877d8.
2022-06-09 10:46:03 +00:00
Matheus Izvekov 51608515fa
cmake: use llvm dir variables for clang/utils/hmaptool
Copy hmaptool using the paths for CURRENT_TOOLS_DIR, so
everything goes in the right place in case llvm is included
from a top level CMakeLists.txt.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: stephenneuendorffer

Differential Revision: https://reviews.llvm.org/D126308
2022-06-09 12:25:38 +02:00
Paul Pluzhnikov afbe3aed49 [clangd] Minor refactor of CanonicalIncludes::addSystemHeadersMapping.
Before commit b3a991df3c SystemHeaderMap used to be a vector.

Commit b3a991df3c changed it into a map, but neglected to remove
duplicate keys (e.g. "bits/typesizes.h", "include/stdint.h", etc.).

To prevent confusion, remove all duplicates, build HeaderMapping
one pair at a time and assert() that no duplicates are found.

Change by Paul Pluzhnikov (ppluzhnikov)!

Reviewed By: ilya-biryukov

Differential Revision: https://reviews.llvm.org/D125742
2022-06-09 12:18:39 +02:00
Kiran Chandramohan 08407255b2 [Flang] Temporary fix for conversion materialization
Simply add a source and target materialization handler that do nothing
and that override the default handlers that would add illegal
LLVM::DialectCastOp otherwise.

This is the simplest workaround, but not an actual fix, something may be
inconsistent after D82831 (most likely fir lowering to llvm happens in a
way that mlir infrastructure is not expecting in D82831).

Here is a minimal reproducer of what the issue was:
```
func @foop(%a : !fir.real<4>) -> ()
func @bar(%a : !fir.real<2>) {
  %1 = fir.convert %a : (!fir.real<2>) -> !fir.real<4>
  call @foop(%1) : (!fir.real<4>) -> ()
  return
}
```
tco -o - output was:
```
error: 'llvm.mlir.cast' op type must be non-index integer types, float types, or vector of mentioned types.
llvm.func @foop(!llvm.float)
llvm.func @bar(%arg0: !llvm.half) {
  %0 = llvm.fpext %arg0 : !llvm.half to !llvm.float
  %1 = llvm.mlir.cast %0 : !llvm.float to !fir.real<4>
  llvm.call @foop(%1) : (!fir.real<4>) -> ()
  llvm.return
}
```
This patch disable the introduction of the llvm.mlir.cast and preserve the previous behavior.

Also fixes https://github.com/llvm/llvm-project/issues/55210.

Note: This is part of upstreaming from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project.

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D127212

Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
2022-06-09 10:10:56 +00:00
Haojian Wu f1ac00c9b0 [pseudo] Add grammar annotations support.
Add annotation handling ([key=value]) in the BNF grammar parser, which
will be used in the conditional reduction, and error recovery.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D126536
2022-06-09 12:06:22 +02:00
Denis Antrushin e99e821ce8 [FixupStatepoints] Precommit test for D127308. NFC 2022-06-09 17:04:48 +07:00
Johannes Doerfert 14899bc43d [Attributor] Generalize interface from ConstantInt to Constant
We can use constant to allow undef and there is no need to force
integers in the API anyway. The user can decide if a non integer
constant is fine or not.
2022-06-09 12:00:26 +02:00
Johannes Doerfert 7a07b88f37 [Attributor][FIX] Replace call site argument uses, not values
We need to be careful replacing values as call site arguments
(IRPosition::IRP_CALL_SITE_ARGUMENT) is representing a use and not a
value. This patch replaces the interface to take a IR position instead
making it harder to misuse accidentally. It does not change our tests
right now but a follow up exposed the potential footgun.
2022-06-09 12:00:26 +02:00
Johannes Doerfert 1df6e171c3 [Attributor] Simplify (integer range) state handling
We used to be very conservative when integer states were merged.
Instead of adding the known range (which is large due to uncertainty)
into the assumed range (which is hopefully small), we can also only
allow to merge in both at the same time into their respective
counterpart. This will ensure we keep the invariant that assumed is part
of known.
2022-06-09 12:00:26 +02:00
Johannes Doerfert 481b8f31df [Attributor][NFC] Introduce helper struct
We often use a context associated with a value. For now only one use
case has been changed.
2022-06-09 12:00:26 +02:00
Johannes Doerfert 4277c1be88 [Attributor][FIX] Avoid metadata and duplicate replication assertion
When we recreate instructions as part of simplification we need to take
care of debug metadata and replacing the value multiple times. For now,
we handle both conservatively.
2022-06-09 12:00:26 +02:00
Biplob Mishra d87bfa9ad0 [InstCombine] Combine instructions of type or/and where AND masks can be combined.
The patch simplifies some of the patterns as below

(A | (B & C0)) | (B & C1) -> A | (B & C0|C1)
((B & C0) | A) | (B & C1) -> (B & C0|C1) | A

In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns.

Additionally this commit fixes the issue reported with the test case.
int f(int a, int b) {
  int c = ((unsigned char)(a >> 23) & 925);
  if (a)
    c = (a >> 23 & b) | ((unsigned char)(a >> 23) & 925) | (b >> 23 & 157);
  return c;
}

The previous revision/commit did not check one-use of an intermediate value that this transform re-uses.
When that value has another use, an existing transform will try to invert the transform here.
By adding one-use checks, we avoid the infinite loops seen with the earlier commit.

Differential Revision: https://reviews.llvm.org/D124119
2022-06-09 10:58:30 +01:00
Andrzej Warzynski 1953bcdaac [flang] Add RUN lines using `fir-opt`
In tests that define a pass pipeline to use, add a RUN line using fir-opt.

Differential Revision: https://reviews.llvm.org/D126955
2022-06-09 09:48:54 +00:00
Kiran Chandramohan 8b951e64d9 [Flang][OpenMP] Lower schedule modifiers for worksharing loop
Add support for lowering the schedule modifiers (simd, monotonic,
non-monotonic) in worksharing loops.

Note: This is part of upstreaming from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project.

Reviewed By: peixin

Differential Revision: https://reviews.llvm.org/D127311

Co-authored-by: Mats Petersson <mats.petersson@arm.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
2022-06-09 09:45:14 +00:00
Alex Zinenko 5f0d4f208e [mlir] Introduce Transform ops for loops
Introduce transform ops for "for" loops, in particular for peeling, software
pipelining and unrolling, along with a couple of "IR navigation" ops. These ops
are intended to be generalized to different kinds of loops when possible and
therefore use the "loop" prefix. They currently live in the SCF dialect as
there is no clear place to put transform ops that may span across several
dialects, this decision is postponed until the ops actually need to handle
non-SCF loops.

Additionally refactor some common utilities for transform ops into trait or
interface methods, and change the loop pipelining to be a returning pattern.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D127300
2022-06-09 11:41:55 +02:00
Andrzej Warzynski ef4318e1a4 [flang][driver] Generate run-time type info
This is a small follow-up for https://reviews.llvm.org/D120051. It makes
sure that tables with "run-time type information for derived types" are
generated for code-gen actions. Originally, only non-code-gen actions
were updated (i.e. actions that were fully supported at that time).

Differential Revision: https://reviews.llvm.org/D127307
2022-06-09 09:38:12 +00:00
Haojian Wu 74e4d5f256 [pseudo] Simplify the glrReduce implementation.
glrReduce maintains two priority queues (one for bases, and the other
for Sequence), these queues are in parallel with each other, corresponding to a
single family. They can be folded into one.

This patch removes the bases priority queue, which improves the glrParse by
10%.

ASTReader.cpp: 2.03M/s (before) vs 2.26M/s (after)

Differential Revision: https://reviews.llvm.org/D127283
2022-06-09 11:28:52 +02:00
owenca 40a5d79a5c [clang-format][NFC] Format lib/Format and unittests/Format in clang
Reformat these directories with InsertBraces and RemoveBracesLLVM.

Differential Revision: https://reviews.llvm.org/D127366
2022-06-09 02:25:06 -07:00
Haojian Wu 7a05942dd0 [pseudo] Remove the explicit Accept actions.
As pointed out in the previous review section, having a dedicated accept
action doesn't seem to be necessary. This patch implements the the same behavior
without accept acction, which will save some code complexity.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D125677
2022-06-09 11:19:07 +02:00
Haojian Wu 075449da80 [pseudo] Fix a sign-compare warning in debug build, NFC. 2022-06-09 11:18:03 +02:00
lewuathe fff27d181c [mlir][complex] Correctness check for complex.conj
Add correctness check for complex.conj operation

Reviewed By: pifon2a

Differential Revision: https://reviews.llvm.org/D127377
2022-06-09 11:11:56 +02:00
Sam Parker 447c411fef [ARM][ParallelDSP] Fix self reference bug
Ensure we don't generate a smlad intrinsic that takes itself as an
argument.

Differential Revision: https://reviews.llvm.org/D127213
2022-06-09 09:10:57 +00:00
Guillaume Chatelet dc3367970e [SelectionDAG] Handle bzero/memset libcalls globally instead of per target
Differential Revision: https://reviews.llvm.org/D127279
2022-06-09 08:34:55 +00:00
Florian Hahn 553d5161ee
[cmake] Add missing dependencies to objlib in add_llvm_executable.
After f06abbb393 I have been seeing build
failures due to the obj.clang target missing a dependency on
tools/clang/clang-tablegen-targets.

This appears to be due to the fact that LLVM_COMMON_DEPENDS are not added
as dependencies to the object library.

This patch uses the same logic as llvm_add_library to register
dependencies for object libraries.

Reviewed By: beanz, abrachet, steven_wu

Differential Revision: https://reviews.llvm.org/D127318
2022-06-09 09:24:13 +01:00
Chenbing Zheng 38992d2c5e [InstCombine] improve fold for icmp-ugt-ashr
Existing condition for
fold icmp ugt (ashr X, ShAmtC), C --> icmp ugt X, ((C + 1) << ShAmtC) - 1
missed some boundary. It cause this fold don't work for some cases, and the
reason is due to signed number overflow.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D127188
2022-06-09 16:22:12 +08:00
Lian Wang 91e31fd205 [RISCV][VP] Add fp test of widen and split for vp.setcc
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D127079
2022-06-09 08:14:12 +00:00
Nikita Popov 56c9976d46 [IndVarSimplify] Don't assert that terminator is not SCEVable (PR55925)
The IV widening code currently asserts that terminators aren't SCEVable
-- however, this is not the case for invokes with a returned attribute.

As far as I can tell, this assertions is not necessary -- even if we
have a critical edge (the second test case), the trunc gets inserted
in a legal position.

Fixes https://github.com/llvm/llvm-project/issues/55925.

Differential Revision: https://reviews.llvm.org/D127288
2022-06-09 10:12:13 +02:00
Nicolai Hähnle f971e77fb4 ADT/ArrayRef: Add makeMutableArrayRef overloads
Equivalent overloads already exist for makeArrayRef.

Differential Revision: https://reviews.llvm.org/D126421
2022-06-09 09:59:50 +02:00
Lian Wang 362a02dabe [RISCV][test] Add widen STEP_VECTOR tests.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127371
2022-06-09 07:47:04 +00:00
Christian Kandeler d352017184 [include-cleaner] Fix build error in unit test
Reviewed By: nridge

Differential Revision: https://reviews.llvm.org/D127217
2022-06-09 03:38:13 -04:00
lorenzo chelini 2a3c07f897 [MLIR][Math] Re-order conversions alphabetically (NFC)
Minor follow-up after: D127286 (https://reviews.llvm.org/D127286/new/)

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D127382
2022-06-09 09:12:13 +02:00
Matthias Gehre 7e17e15c9f clang: Introduce -fexperimental-max-bitint-width
This splits of the introduction of -fexperimental-max-bitint-width
from https://reviews.llvm.org/D122234
because that PR is still blocked on discussions on the backend side.

I was asked [0] to upstream at least the flag.

[0] 09854f2af3 (commitcomment-75116619)

Differential Revision: https://reviews.llvm.org/D127287
2022-06-09 07:15:03 +01:00
Bruno Cardoso Lopes e6a76a4935 [Clang][CoverageMapping] Fix compile time explosions by adjusting only appropriated skipped ranges
D83592 added comments to be part of skipped regions, and as part of that, it
also shrinks a skipped range if it spans a line that contains a non-comment
token. This is done by `adjustSkippedRange`.

The `adjustSkippedRange` currently runs on skipped regions that are not
comments, causing a 5min regression while building a big C++ files without any
comments.

Fix the compile time introduced in D83592 by tagging SkippedRange with kind
information and use that to decide what needs additional processing.

Differential Revision: https://reviews.llvm.org/D127338
2022-06-08 23:13:39 -07:00
David Carlier a4c97e1937 [Sanitizers] prctl interception update for the PR_SET_VMA option case.
Supports on Android but also from Linux 5.17

Reviewers: vitalybuka, eugenis

Reviewed-By: vitalybuka

Differential Revision: https://reviews.llvm.org/D127326
2022-06-09 06:07:26 +01:00
Fangrui Song 11136a6032 [DeadArgElim] Remove dead code after r128810 2022-06-08 21:11:54 -07:00
Jez Ng 977d62c33e [lld-macho] Support EH frames under arm64
For arm64, llvm-mc emits relocations for the target function
address like so:

  ltmp:
    <CIE start>
    ...
    <CIE end>
    ... multiple FDEs ...
    <FDE start>
    <target function address - (ltmp + pcrel offset)>
    ...

If any of the FDEs in `multiple FDEs` get dead-stripped, then `FDE start`
will move to an earlier address, and `ltmp + pcrel offset` will no longer
reflect an accurate pcrel value. To avoid this problem, we "canonicalize"
our relocation by adding an `EH_Frame` symbol at `FDE start`, and updating
the reloc to be `target function address - (EH_Frame + new pcrel offset)`.

Reviewed By: #lld-macho, Roger

Differential Revision: https://reviews.llvm.org/D124561
2022-06-08 23:41:29 -04:00
Jez Ng 826be330af [lld-macho] Initial support for EH Frames
== Background ==

`llvm-mc` generates unwind info in both compact unwind and DWARF
formats. LLD already handles the compact unwind format; this diff gets
us close to handling the DWARF format properly.

== Caveats ==

It's not quite done yet, but I figure it's worth getting this reviewed
and landed first as it's shaping up to be a fairly large code change.

**Known limitations of the current code:**

* Only works for x86_64, for which `llvm-mc` emits "abs-ified"
  relocations as described in 618def651b.
  `llvm-mc` emits regular relocations for ARM EH frames, which we do not
  yet handle correctly.

Since the feature is not ready for real use yet, I've gated it behind a
flag that only gets toggled on during test suite runs. With most of the
new code disabled, we see just a hint of perf regression, so I don't
think it'd be remiss to land this as-is:

             base           diff           difference (95% CI)
  sys_time   1.926 ± 0.168  1.979 ± 0.117  [  -1.2% ..   +6.6%]
  user_time  3.590 ± 0.033  3.606 ± 0.028  [  +0.0% ..   +0.9%]
  wall_time  7.104 ± 0.184  7.179 ± 0.151  [  -0.2% ..   +2.3%]
  samples    30             31

== Design ==

Like compact unwind entries, EH frames are also represented as regular
ConcatInputSections that get pointed to via `Defined::unwindEntry`. This
allows them to be handled generically by e.g. the MarkLive and ICF
code. (But note that unlike compact unwind subsections, EH frame
subsections do end up in the final binary.)

In order to make EH frames "look like" a regular ConcatInputSection,
some processing is required. First, we need to split the `__eh_frame`
section along EH frame boundaries rather than along symbol boundaries.
We do this by decoding the length field of each EH frame. Second, the
abs-ified relocations need to be turned into regular Relocs.

== Next Steps ==

In order to support EH frames on ARM targets, we will either have to
teach LLD how to handle EH frames with explicit relocs, or we can try to
make `llvm-mc` emit abs-ified relocs for ARM as well. I'm hoping to do
the latter as I think it will make the LLD implementation both simpler
and faster to execute.

== Misc ==

The `obj-file-with-stabs.s` test had to be updated as the previous
version would trip assertion errors in the code. It appears that in our
attempt to produce a minimal YAML test input, we created a file with
invalid EH frame data. I've fixed this by re-generating the YAML and not
doing any hand-pruning of it.

Reviewed By: #lld-macho, Roger

Differential Revision: https://reviews.llvm.org/D123435
2022-06-08 23:40:52 -04:00
Mogball 971e13d69e [mlir][ods] Mark StructAttr as deprecated 2022-06-09 03:23:31 +00:00
chenglin.bi 226c564329 [InstCombine] Add vector tests for shl+lshr+and transforms; NFC
D126617
2022-06-09 11:14:26 +08:00
Fangrui Song 62309ed955 [msan][test] Fix cpusetsize for another pthread_getaffinity_np.cpp test
Similar to D127368
2022-06-08 20:08:05 -07:00