Commit Graph

436299 Commits

Author SHA1 Message Date
Felipe de Azevedo Piovezan 9749587498 [lldb] Reset breakpoint hit count before new runs
A common debugging pattern is to set a breakpoint that only stops after
a number of hits is recorded. The current implementation never resets
the hit count of breakpoints; as such, if a user re-`run`s their
program, the debugger will never stop on such a breakpoint again.

This behavior is arguably undesirable, as it renders such breakpoints
ineffective on all but the first run. This commit changes the
implementation of the `Will{Launch, Attach}` methods so that they reset
the _target's_ breakpoint hit counts.
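
The effect of the reset can be illustrated with a small model. This is only a sketch in Python, not lldb code: the `Breakpoint` and `Target` classes, the `stop_at_hit` field, and the `will_launch` method are hypothetical stand-ins for the target's breakpoint list and the `Will{Launch, Attach}` hooks, and the "stop on exactly the N-th hit" rule is an assumed simplification of a hit-count-based stop.

```python
class Breakpoint:
    """Toy breakpoint that stops only when its hit count reaches stop_at_hit."""

    def __init__(self, stop_at_hit):
        self.stop_at_hit = stop_at_hit
        self.hit_count = 0

    def hit(self):
        """Record one hit and report whether the debugger should stop."""
        self.hit_count += 1
        return self.hit_count == self.stop_at_hit

    def reset_hit_count(self):
        self.hit_count = 0


class Target:
    """Toy target; reset_on_launch selects the old or the new behavior."""

    def __init__(self, breakpoints, reset_on_launch):
        self.breakpoints = breakpoints
        self.reset_on_launch = reset_on_launch

    def will_launch(self):
        # Hypothetical analogue of the Will{Launch, Attach} hooks.
        if self.reset_on_launch:
            for bp in self.breakpoints:
                bp.reset_hit_count()

    def run(self, hits_per_run):
        """Simulate one run; return the per-run hit numbers at which we stopped."""
        self.will_launch()
        stops = []
        for i in range(1, hits_per_run + 1):
            for bp in self.breakpoints:
                if bp.hit():
                    stops.append(i)
        return stops
```

With `reset_on_launch=False`, a second call to `run` never stops because the count keeps growing past `stop_at_hit`; with `reset_on_launch=True`, every run stops at the same hit.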

Differential Revision: https://reviews.llvm.org/D133858
2022-09-19 12:56:12 -04:00
mydeveloperday 95b3947111 [clang-format] JSON formatting: add new option for controlling newlines in JSON arrays
When working in a mixed environment of both vscode and vim with a team-configured prettier configuration, clang-format and prettier can end up fighting each other over the formatting of arrays, including simple arrays of elements.

This review aims to add some "control knobs" to the JSON formatting in clang-format to help align the two tools so they can be used interchangeably.

This allows simple arrays such as `[1, 2, 3]` to remain on a single line, while still breaking arrays based on the context within the array.

Happy to change the name of the option (this is the third name I tried)

Reviewed By: HazardyKnusperkeks, owenpan

Differential Revision: https://reviews.llvm.org/D133589
2022-09-19 17:54:39 +01:00
Keith Smiley f331ccca26 [ORC] Fix macho section name typo
I don't think __obj_selrefs is a thing, but __objc_selrefs definitely
is.

Differential Revision: https://reviews.llvm.org/D130221
2022-09-19 09:49:46 -07:00
Rahul Joshi e4c395018e BEGIN_PUBLIC
Use isa<> instead of dyn_cast
END_PUBLIC

Differential Revision: https://reviews.llvm.org/D134092
2022-09-19 09:48:39 -07:00
Krzysztof Parzyszek 3eee45cdc8 [Hexagon] Rework SplitHvxPairOp to be a general vector splitting utility
Enable creating an idiom: V -> opJoin(SplitVectorOp(V))
2022-09-19 09:42:13 -07:00
Simon Pilgrim 6b4d409f69 [CostModel][X86] Add CostKinds handling for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF instructions
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-19 17:37:58 +01:00
Xiang Li 649a59712f [clang] Allow vector of BitInt
Remove the check that disables BitInt as an element type for ext_vector.

This enables HLSL to use _BitInt(16) as a 16-bit int; see https://reviews.llvm.org/D133668

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D133634
2022-09-19 09:26:56 -07:00
zhijian fdffdf39fc Fixed a compiler error as described in
https://lab.llvm.org/buildbot/#/builders/174/builds/13432

XCOFFObjectFile.cpp:805:12: error: reinterpret_cast from 'unsigned long' to 'uintptr_t' (aka 'unsigned int') is not allowed
    return reinterpret_cast<uintptr_t>(0ul);
2022-09-19 12:14:02 -04:00
Florian Hahn f02ff5348f [LV] Move new epilog-vectorization-widen-inductions.ll to AArch64 dir.
The test requires the AArch64 backend, so move it to the right subdir.
2022-09-19 17:13:06 +01:00
Florian Hahn 6087b6386e [LV] Add tests for epilogue vectorization with widened inductions.
Includes a test for the miscompile in #57712.
2022-09-19 17:10:41 +01:00
Krzysztof Parzyszek e5844462f6 [Hexagon] Use proper output chain when widening HVX loads 2022-09-19 09:04:13 -07:00
Mingming Liu 34db7c64df [NFC] Use opaqueptr in llvm/test/Transforms/SimplifyCFG/preserve-llvm-loop-metadata.ll
Use opaqueptr for the test case
llvm/test/Transforms/SimplifyCFG/preserve-llvm-loop-metadata.ll.

- Adjust variable numbers accordingly, since bitcasts between different pointer
  types are no longer necessary.

Differential Revision: https://reviews.llvm.org/D134159
2022-09-19 09:01:11 -07:00
zhijian a3aab98ef4 Fixed a compiler error as described in
https://lab.llvm.org/buildbot/#/builders/216/builds/9977

XCOFFObjectFile.cpp: error C3487: 'unsigned long': all return expressions must deduce to the same type: previously it was 'uintptr_t'
2022-09-19 11:57:45 -04:00
Katherine Rasmussen e13b273d63 [flang] Write semantics test for atomic_and
Write a semantics test for the atomic intrinsic subroutine,
atomic_and.

Reviewed By: rouson

Differential Revision: https://reviews.llvm.org/D133727
2022-09-19 08:47:06 -07:00
Simon Pilgrim 135c9b2c4b [CostModel][X86] Add CostKinds handling for vector ctlz instructions
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-19 16:44:09 +01:00
spupyrev 539b6c68cb [BOLT] Unifying implementations of ext-tsp
After BOLT's merge to LLVM, there are two (almost identical) versions of the
code layout algorithm. The diff unifies the implementations by keeping the one
in LLVM.

There are mild changes in the resulting block orders. I tested the changes
extensively both on the clang binary and on prod services, and did not see
statistically significant differences on average.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D129895
2022-09-19 08:29:08 -07:00
zhijian b135358877 [AIX] llvm-nm: support the "OBJECT_MODE" environment variable for option -X on AIX
Summary:

According to the nm documentation for AIX, https://www.ibm.com/docs/en/aix/7.2?topic=n-nm-command:

On AIX, the default is to process 32-bit object files (ignore 64-bit objects). The mode can also be set with the OBJECT_MODE environment variable. For example, OBJECT_MODE=64 causes nm to process any 64-bit objects and ignore 32-bit objects. The -X flag overrides the OBJECT_MODE variable.

On non-AIX operating systems, the default is to process all supported object files, and the OBJECT_MODE environment variable is not supported.
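
The precedence rules above can be sketched as a small Python function. This is an illustrative model only, not the llvm-nm implementation: the function name `select_object_mode`, its parameters, and the restriction to the values "32", "64" and "any" are assumptions, and it also assumes that an explicit -X value is honored on every platform.

```python
def select_object_mode(is_aix, x_flag=None, env=None):
    """Return which object bitness to process: "32", "64" or "any".

    is_aix  -- whether the tool runs on AIX (hypothetical parameter)
    x_flag  -- value passed to -X, or None if the flag was not given
    env     -- mapping of environment variables, or None for an empty one
    """
    env = {} if env is None else env
    if x_flag is not None:
        # The -X flag overrides the OBJECT_MODE variable.
        return x_flag
    if not is_aix:
        # Outside AIX, OBJECT_MODE is ignored and all objects are processed.
        return "any"
    # On AIX, fall back to OBJECT_MODE, then to the 32-bit default.
    return env.get("OBJECT_MODE", "32")
```

For example, `select_object_mode(True, env={"OBJECT_MODE": "64"})` selects 64-bit objects, while adding `x_flag="32"` switches back to 32-bit objects regardless of the environment.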

Reviewers:  James Henderson

Differential Revision: https://reviews.llvm.org/D132494
2022-09-19 11:27:19 -04:00
Tue Ly 354ee3814c [libc][Obvious] Fix exp10f spec. 2022-09-19 11:21:01 -04:00
Louis Dionne f1a601fe88 [libc++] Always query the compiler to find whether a type is always lockfree
In https://llvm.org/D56913, we added an emulation for the __atomic_always_lock_free
compiler builtin when compiling in Freestanding mode. However, the emulation
did not (and could not) give exactly the same answer as the compiler builtin,
which led to a potential ABI break for e.g. enum classes.

After speaking to the original author of D56913, we agree that the correct
behavior is to instead always use the compiler builtin, since that provides
a more accurate answer, and __atomic_always_lock_free is a purely front-end
builtin which doesn't require any runtime support. Furthermore, it is
available regardless of the Standard mode (see https://godbolt.org/z/cazf3ssYY).

However, this patch does constitute an ABI break. As shown by https://godbolt.org/z/1eoex6zdK:
- In LLVM <= 11.0.1, an atomic<enum class with 1 byte> would not contain a lock byte.
- In LLVM >= 12.0.0, an atomic<enum class with 1 byte> would contain a lock byte.

This patch breaks the ABI again to bring it back to 1 byte, which seems
like the correct thing to do.

Fixes #57440

Differential Revision: https://reviews.llvm.org/D133377
2022-09-19 11:10:02 -04:00
Simon Pilgrim 0a4c946abc Fix MSVC warning "all return expressions must deduce to the same type" 2022-09-19 16:00:48 +01:00
Sam McCall e424418358 [clangd] Allow programmatically disabling rename of virtual method hierarchies.
This feature relies on Relations in the index being complete.
An out-of-tree index implementation is missing some override relations, so
such renames end up breaking the code.
We plan to fix it, but this flag is a cheap band-aid for now.

Differential Revision: https://reviews.llvm.org/D133440
2022-09-19 16:59:28 +02:00
Simon Pilgrim 2538adde5c [CostModel][X86] Add CostKinds handling for cttz
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-19 15:57:03 +01:00
zhijian dcd5abd4c4 [AIX] llvm-readobj: support a new option --exception-section for XCOFF object files.
Summary:

llvm-readobj supports a new option --exception-section for XCOFF object files.

https://www.ibm.com/docs/en/aix/7.2?topic=formats-xcoff-object-file-format#XCOFF__iua3i23ajbau

Reviewers:  James Henderson,Paul Scoropan

Differential Revision: https://reviews.llvm.org/D133030
2022-09-19 10:55:48 -04:00
Sam McCall 924974a3a1 [clangd] Improve inlay hints of things expanded from macros
When we aim a hint at some expanded tokens, we're only willing to attach it
to spelled tokens that exactly correspond.

e.g.
int zoom(int x, int y, int z);
int dummy = zoom(NUMBERS);

Here we want to place a hint "x:" on the expanded "1", but we shouldn't
be willing to place it on NUMBERS, because it doesn't *exactly*
correspond (it has more tokens).

Fortunately we don't even have to implement this algorithm from scratch,
TokenBuffer has it.

Fixes https://github.com/clangd/clangd/issues/1289
Fixes https://github.com/clangd/clangd/issues/1118
Fixes https://github.com/clangd/clangd/issues/1018

Differential Revision: https://reviews.llvm.org/D133982
2022-09-19 16:44:21 +02:00
Benjamin Kramer 9b2a3d20f2 [bazel] Port 233de4e808 2022-09-19 16:38:20 +02:00
Guray Ozen 233de4e808 [mlir] Add map_nested_foreach_thread_to_gpu_threads op to transform dialect
This revision adds a new op `map_nested_foreach_thread_to_gpu_threads` to transform dialect. The op searches `scf.foreach_threads` inside the `gpu_launch` and distributes them with `gpu.thread_id` attribute.

Loop mapping is explicit and given by the `map_nested_foreach_thread_to_gpu_threads` op. Mapping is done one-to-one, therefore the loops disappear.

For the time being, trip counts that are dynamic or larger than the thread sizes are not supported. However, in the future the compiler can generate a loop with static cyclic scheduling to support these cases.

Current mechanism allows `scf.foreach_threads` to be siblings or nested. There cannot be interleaving code between the loops when they are nested.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D133950
2022-09-19 16:27:30 +02:00
Tue Ly 47c4a87641 [libc][Obvious] Remove constexpr qualifier from Exp10Base::powb_lo. 2022-09-19 10:13:29 -04:00
Tue Ly a752460d73 [libc][math] Implement exp10f function correctly rounded to all rounding modes.
Implement exp10f function correctly rounded to all rounding modes.

Algorithm: perform range reduction to reduce
```
  10^x = 2^(hi + mid) * 10^lo
```
where:
```
  hi is an integer,
  0 <= mid * 2^5 < 2^5
  -log10(2) / 2^6 <= lo <= log10(2) / 2^6
```
Then `2^mid` is stored in a table of 32 entries and the product `2^hi * 2^mid` is
performed by adding `hi` into the exponent field of `2^mid`.
`10^lo` is then approximated by a degree-5 minimax polynomial generated by Sollya with:
```
  > P = fpminimax((10^x - 1)/x, 4, [|D...|], [-log10(2)/64, log10(2)/64]);
```
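The range reduction above can be sketched in Python with double-precision arithmetic. This is an illustration of the decomposition only, not the libc code: the function name `exp10_sketch` and the table name `EXP2_MID` are hypothetical, a degree-5 Taylor polynomial stands in for the Sollya-generated minimax polynomial (whose coefficients are not given here), and `math.ldexp` stands in for adding `hi` to the exponent field.

```python
import math

LOG2_10 = math.log2(10.0)
LOG10_2 = math.log10(2.0)
LN_10 = math.log(10.0)

# Table of 2^mid for mid = i / 32, i = 0..31 (32 entries).
EXP2_MID = [2.0 ** (i / 32.0) for i in range(32)]


def exp10_sketch(x):
    """Approximate 10^x as 2^(hi + mid) * 10^lo."""
    # k / 32 = hi + mid is x * log2(10) rounded to a multiple of 1/32.
    k = round(x * LOG2_10 * 32.0)
    hi = k >> 5          # floor(k / 32), also for negative k
    mid_index = k & 31   # mid = mid_index / 32, so 0 <= mid * 2^5 < 2^5
    # Remainder in base 10: |lo| <= log10(2) / 2^6, up to rounding error.
    lo = x - k * (LOG10_2 / 32.0)

    # Stand-in for the minimax polynomial: degree-5 Taylor expansion of
    # 10^lo = e^(lo * ln 10), evaluated in Horner form.
    t = lo * LN_10
    p = 1.0 + t * (1.0 + t * (0.5 + t * (1.0 / 6.0 + t * (1.0 / 24.0 + t / 120.0))))

    # 2^hi * 2^mid * 10^lo; ldexp scales by 2^hi.
    return math.ldexp(EXP2_MID[mid_index] * p, hi)
```

Because `|lo|` is tiny, even the Taylor stand-in is far more accurate than single precision requires; the correct rounding in all rounding modes claimed by the patch depends on details that this sketch does not attempt to reproduce.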
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp10f
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 10.215
System LIBC reciprocal throughput : 7.944

LIBC reciprocal throughput        : 38.538
LIBC reciprocal throughput        : 12.175   (with `-msse4.2` flag)
LIBC reciprocal throughput        : 9.862    (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp10f --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 40.744
System LIBC latency : 37.546

BEFORE
LIBC latency        : 48.989
LIBC latency        : 44.486   (with `-msse4.2` flag)
LIBC latency        : 40.221   (with `-mfma` flag)
```
This patch relies on https://reviews.llvm.org/D134002

Reviewed By: orex, zimmermann6

Differential Revision: https://reviews.llvm.org/D134104
2022-09-19 10:01:40 -04:00
Tue Ly cd1d71c5f1 [libc][Obvious] Remove constexpr qualifier from ExpBase::powb_lo. 2022-09-19 09:29:37 -04:00
Nikita Popov 36f325413e [SCEV] Don't verify dispositions of invalid loops
This should fix the expensive checks build. Ideally we would not
have invalid loops in LoopDispositions.
2022-09-19 15:07:44 +02:00
Simon Pilgrim d90a42d64c [CostModel][X86] Add CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF cost handling
Without LZCNT/BMI, the *_ZERO_UNDEF costs are cheaper as they can avoid the zero handling.
2022-09-19 14:06:33 +01:00
Nikita Popov dd61726d5b Revert "[SimplifyCFG] accumulate bonus insts cost"
This reverts commit e5581df60a.

This causes major compile-time regressions, about 2-3% end-to-end
on CTMark.
2022-09-19 14:46:43 +02:00
Tue Ly 4973eee122 [libc][math] Improve tanhf performance.
Optimize the core part of the `tanhf` implementation, the computation of `e^x`,
in the same way as https://reviews.llvm.org/D133870. Factor out the constants and
polynomial approximation so that they can be reused for `exp10f`.

Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanhf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 13.377
System LIBC reciprocal throughput : 55.046

BEFORE:
LIBC reciprocal throughput        : 75.674
LIBC reciprocal throughput        : 33.242    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 25.927    (with `-mfma` flag)

AFTER:
LIBC reciprocal throughput        : 26.359
LIBC reciprocal throughput        : 18.888    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 14.243    (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanhf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 43.365
System LIBC latency : 123.499

BEFORE
LIBC latency        : 112.968
LIBC latency        : 104.908   (with `-msse4.2` flag)
LIBC latency        : 92.310    (with `-mfma` flag)

AFTER
LIBC latency        : 69.828
LIBC latency        : 63.874    (with `-msse4.2` flag)
LIBC latency        : 57.427    (with `-mfma` flag)
```

Reviewed By: orex, zimmermann6

Differential Revision: https://reviews.llvm.org/D134002
2022-09-19 08:43:03 -04:00
Simon Pilgrim 5665d0941a [SLP][X86] Add AVX512 test coverage to CTLZ/CTTZ tests
Only AVX512 has decent CTTZ/CTLZ vector ops, so add tests to ensure we definitely vectorize these
2022-09-19 13:07:55 +01:00
Aaron Ballman a244194f73 Add additional test coverage for C2x N2508
This spotted a mistake with the original patch, so it puts the status
back to "partial" in the C status tracking page.

This amends 510383626f.
2022-09-19 07:52:07 -04:00
Simon Pilgrim 8206044183 [DAG] SimplifyDemandedVectorElts - add MULHS/MULHU handling to existing MUL/AND handling
Allows us to determine known-zero elements, which particularly helps simplification of DIV/REM by constant patterns
2022-09-19 12:44:43 +01:00
Aaron Ballman f51789ce5e Fix a typo in the release notes; NFC 2022-09-19 07:37:41 -04:00
Nicolas Vasilache ecd9dc0499 [mlir][Transform] Add a new navigation op to retrieve the producer of an operand
Given an opOperand uniquely determined by the operation `%op` and the operand number `num`,
the `transform.get_producer_of_operand %op[num]` returns the handle to the unique operation
that produced the SSA value used as opOperand.

The transform fails if the operand is a block argument.

Differential Revision: https://reviews.llvm.org/D134171
2022-09-19 04:16:15 -07:00
Nicolas Vasilache 12831be96c [mlir][Linalg] NFC - Cleanup internal transform APIs and produce better messages on failure to apply. 2022-09-19 04:16:15 -07:00
Max Kazantsev 92e9bddc49 [LoopRotate] Drop loop dispositions when rotating loops. PR56260
This is required because if there is a pure loop-invariant instruction, Loop Rotation
may decide to not clone it and just hoist it instead. If SCEV has previously cached
that it was loop-variant (not being smart enough to prove invariance), we may end
up with inconsistent cache state (which may later trigger false-negative assertion
failures checking that something was invariant).

This is a conservative fix that unconditionally drops the dispositions. We could
drop them only if the hoisting has actually happened, but it would take some time
to understand whether that is safe given everything else this function does.

Differential Revision: https://reviews.llvm.org/D134167
Reviewed By: fhahn
2022-09-19 18:01:02 +07:00
Nuno Lopes d953d01737 Introduce -enable-global-analyses to allow users to disable inter-procedural analyses
Alive2 doesn't support verification of optimizations that use inter-procedural analyses.
Right now, clang uses GlobalsAA by default and there's no way to disable it.
This leads to Alive2 producing false positives.
The added flag allows us to skip global analyses altogether.

Differential Revision: https://reviews.llvm.org/D134139
2022-09-19 11:59:35 +01:00
LLVM GN Syncbot bdb9ca4830 [gn build] Port 1146d40d9a 2022-09-19 10:55:29 +00:00
Simon Pilgrim 1146d40d9a [UnitTests] Add ShuffleVectorInst unit test coverage for shuffle mask kind matchers
Add tests for the core static shuffle pattern match helpers
2022-09-19 11:53:30 +01:00
Lorenzo Chelini 3718082e2b [MLIR][Linalg] introduce batch-reduce GEMM
The batch-reduce GEMM kernel essentially multiplies a sequence of input tensor
blocks (which form a batch) and the partial multiplication results are reduced
into a single output tensor block.
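
As a reference for the semantics described above, here is a minimal pure-Python sketch. It is not the Linalg op definition: the function name `batch_reduce_gemm` and the nested-list representation of tensor blocks are assumptions made for illustration, and it assumes the usual convention of accumulating into the existing output block.

```python
def batch_reduce_gemm(a_blocks, b_blocks, c):
    """Accumulate sum_i a_blocks[i] @ b_blocks[i] into c, in place.

    a_blocks -- list of M x K blocks (lists of rows)
    b_blocks -- list of K x N blocks (lists of rows)
    c        -- M x N output block, updated and returned
    """
    for a, b in zip(a_blocks, b_blocks):
        for m in range(len(a)):
            for n in range(len(b[0])):
                acc = 0
                for k in range(len(b)):
                    acc += a[m][k] * b[k][n]
                # Reduction over the batch: every pair adds into the same block.
                c[m][n] += acc
    return c
```

Unlike a batched matmul, which yields one output block per batch entry, the whole batch is reduced into the single block `c`.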

See: https://ieeexplore.ieee.org/document/9139809 for more details.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D134163
2022-09-19 12:50:27 +02:00
Max Kazantsev 21a9abc1ce [LoopFuse] Drop loop dispositions before reassigning blocks to other loop
This bug was found by recent improvement in SCEV verifier. The code in LoopFuse
directly reassigns blocks to be a part of a different loop, which should automatically
invalidate all related cached loop dispositions.

Differential Revision: https://reviews.llvm.org/D134173
Reviewed By: nikic
2022-09-19 17:43:06 +07:00
Max Kazantsev bb68b2402d [SCEV] Verify contents of loop disposition cache
It seems that it is sometimes broken. Initial motivation for this was
investigation of https://github.com/llvm/llvm-project/issues/56260, but
it also seems that we have found an unrelated bug in LoopFusion that leaves
broken caches.

Differential Revision: https://reviews.llvm.org/D134158
Reviewed By: nikic
2022-09-19 17:43:00 +07:00
David Green 908b3b6ccb [AArch64] Use fast-math-flags in isAssociativeAndCommutative
Previously this only used the UnsafeFPMath option; it now looks for the
fast-math flags on the instructions, using the same flags as other
backends.
2022-09-19 11:34:00 +01:00
Lorenzo Chelini e9dd2b2d4b Revert "[MLIR][Linalg] introduce batch-reduce GEMM"
This reverts commit f381768a8d.
2022-09-19 12:17:30 +02:00
lorenzo chelini f381768a8d [MLIR][Linalg] introduce batch-reduce GEMM
The batch-reduce GEMM kernel essentially multiplies a sequence of input tensor
blocks (which form a batch) and the partial multiplication results are reduced
into a single output tensor block.

See: https://ieeexplore.ieee.org/document/9139809 for more details.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D134163
2022-09-19 12:11:54 +02:00
Simon Pilgrim 393cc6a354 [LoopVectorize] Regenerate runtime-check.ll 2022-09-19 10:25:48 +01:00