400160 Commits

Author SHA1 Message Date
Lang Hames aa061ddde7 [ORC] Fix the LLJITWithRemoteDebugging example.
This was broken by the switch from JITTargetAddress to ExecutorAddr in
21a06254a3.
2021-09-27 20:06:00 -07:00
Xiang1 Zhang ebe9944a34 [ISel] Legalized arithmetic.fence.f128 for 32-bits target
Reviewed By: Craig Topper, Wang Pengfei

Differential Revision: https://reviews.llvm.org/D110467
2021-09-28 10:27:25 +08:00
Anna Thomas 90fb73aa73 [LoopPred Test] Fix lld-x86_64-win BB failure
Need a more general CHECK line for the testcase in 5df9112 to correctly
handle the lld-x86_64-win buildbot.
2021-09-27 21:28:46 -04:00
Ahsan Saghir 4f6a6ba126 Revert "tsan: fix trace tests on darwin"
This reverts commit 94ea36649e.

Reverting due to errors on buildbots.
2021-09-27 20:17:17 -05:00
Anna Thomas 5df9112ce3 Reland "[LoopPredication] Add testcase showing BPI computation. NFC"
This relands commit 16a62d4f.
Relanded after fixing CHECK-LINES for opt pipeline output to be more
general (based on failures seen in buildbot).
2021-09-27 21:15:46 -04:00
Lang Hames 61e25d2550 clang-format 2021-09-27 18:02:06 -07:00
Lang Hames 22f8276fe4 [llvm-jitlink] Add more information about allocation failures.
Slab allocator failures will now report requested size and remaining capacity.
2021-09-27 18:02:06 -07:00
Ahsan Saghir 593b074a09 [PowerPC] MMA - Add __builtin_vsx_build_pair and __builtin_mma_build_acc builtins
This patch adds the following built-ins:

__builtin_vsx_build_pair
__builtin_mma_build_acc

Reviewed By: #powerpc, nemanjai, lei

Differential Revision: https://reviews.llvm.org/D107647
2021-09-27 19:51:28 -05:00
Lang Hames 21a06254a3 [ORC] Switch from JITTargetAddress to ExecutorAddr for EPC-call APIs.
Part of the ongoing move to ExecutorAddr.
2021-09-27 16:53:09 -07:00
Michael Kruse 027c036663 [Polly] Reject regions entered by an indirectbr/callbr.
SplitBlockPredecessors is unable to insert an additional BasicBlock
between an indirectbr/callbr terminator and the successor blocks.
This is needed by Polly to normalize the control flow before emitting
its optimized code.

This patch rejects regions entered by an indirectbr/callbr to not fail
later at code generation.

This fixes llvm.org/PR51964

Recommit with "REQUIRES: asserts" in test that uses statistics.
2021-09-27 18:49:11 -05:00
Joe Loser 9451d9da95 [libc++][NFC] s/enable_if<...>::type/enable_if_t<...> in span
There is some use of `enable_if<...>::type` when the rest of the file
uses `enable_if_t`. So, use `enable_if_t` consistently throughout.
2021-09-27 19:21:07 -04:00
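As a small illustration of the substitution this commit describes, the sketch below constrains the same function template with both spellings. The function names are hypothetical and are not taken from libc++'s `span`; the point is only that `enable_if_t<...>` is an alias for `typename enable_if<...>::type`.

```cpp
#include <type_traits>

// Older spelling: the nested ::type must be named with `typename`.
template <class T,
          typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
constexpr int size_in_bytes_old(T) { return static_cast<int>(sizeof(T)); }

// C++14 alias template: the same constraint with less noise.
template <class T, std::enable_if_t<std::is_integral<T>::value, int> = 0>
constexpr int size_in_bytes_new(T) { return static_cast<int>(sizeof(T)); }

// Both spellings name exactly the same type.
static_assert(std::is_same<std::enable_if<true, int>::type,
                           std::enable_if_t<true, int>>::value,
              "enable_if_t is an alias for enable_if<...>::type");
```

Because the two forms are interchangeable, switching between them changes no behaviour, which is consistent with the NFC tag on the commit.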
Haowei Wu 283ed7de32 Revert "[Polly] Reject reject regions entered by an indirectbr/callbr."
This reverts commit 91f46bb77e which
causes test failures when assertions are off.
2021-09-27 16:05:33 -07:00
Lang Hames 6fe2e9a9cc [ORC] Hold shared_ptr<SymbolStringPool> in errors containing SymbolStringPtrs.
This allows these error values to remain valid, even if they tear down the JIT
itself.
2021-09-27 15:46:56 -07:00
Congzhe Cao c42772752a [CodeMoverUtils] Enhance isSafeToMoveBefore() when control flow equivalence is satisfied
With improved analysis in determining CFG equivalence that does
not require strict dominance and post-dominance conditions, we
now relax isSafeToMoveBefore() such that an instruction I can
be moved before InsertPoint even if they do not strictly dominate
each other, as long as they follow the same control flow path.

For example, we can move Instruction 0 before Instruction 1,
and vice versa.

```
if (cond1)
   // Instruction 0: %add = add i32 1, 2
if (cond1)
   // Instruction 1: %add2 = add i32 2, 1
```

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D110456
2021-09-27 18:37:36 -04:00
Kevin Athey b345952ad4 Revert "tsan: add a test for stack init race"
This reverts commit b72176b9bc.

Broke bot: https://lab.llvm.org/buildbot/#/builders/70/builds/12193
2021-09-27 15:31:23 -07:00
LLVM GN Syncbot 57cd7b018c [gn build] Port 6cfb4d46ba 2021-09-27 21:56:39 +00:00
Jozef Lawrynowicz 6cfb4d46ba [llvm-readobj] Support dumping of MSP430 ELF attributes
The MSP430 ABI supports build attributes for specifying
the ISA, code model, data model and enum size in ELF object files.

Differential Revision: https://reviews.llvm.org/D107969
2021-09-28 00:56:11 +03:00
Jon Chesterfield 2bc4d48a78 [libomptarget][amdgpu] Follow on to D110513, empty kernarg pools are not fatal 2021-09-27 22:44:35 +01:00
Jon Chesterfield 738734f655 [libomptarget][amdgpu] Report zero devices if plugin construction fails, instead of segv 2021-09-27 22:13:12 +01:00
Anna Thomas a0a9e3e05f Revert "[LoopPredication] Add testcase showing BPI computation. NFC"
This reverts commit 16a62d4f3d.

Needs some update to check lines to fix bb failure.
2021-09-27 17:08:57 -04:00
Louis Dionne 1e628d0c14 [libc++] Do not enable P1951 before C++23, since it's a breaking change
In reaction to the issues raised by Richard in https://llvm.org/D109066,
this commit does not apply P1951 as a DR in previous standard modes,
since it breaks valid code.

I do believe it should be applied as a DR, however ideally we'd get some
sort of statement from the Committee to this effect (and all implementations
would behave consistently). In the meantime, only implement P1951 starting
with C++23 -- we can always come back and apply it as a DR if that's what
the Committee says.

Differential Revision: https://reviews.llvm.org/D110347
2021-09-27 17:06:44 -04:00
Anna Thomas 16a62d4f3d [LoopPredication] Add testcase showing BPI computation. NFC
Precommit testcase for D110438. Since we do not preserve BPI in the loop
pass manager, we are forced to compute BPI every time loop predication is
invoked.
The referenced patch changes that behaviour by preserving lossy BPI for
loop passes.
2021-09-27 16:54:22 -04:00
Simon Pilgrim 540ed354d3 [X86] Add slow/fast pmulld test coverage to vector-mul.ll 2021-09-27 21:53:56 +01:00
Kostya Kortchinsky 04f5913395 [gwp-asan] Initialize AllocatorVersionMagic at runtime
GWP-ASan's `AllocatorState` was recently extended with a
`AllocatorVersionMagic` structure required so that GWP-ASan bug reports
can be understood by tools at different versions.

On Fuchsia, this is included in the `scudo::Allocator` structure, and
by having non-zero initializers, this effectively moved the static
allocator structure from the `.bss` segment to the `.data` segment, thus
increasing (significantly) the size of the libc.

This CL proposes to initialize the structure with its magic numbers at
runtime, allowing for the allocator to go back into the `.bss` segment.

I will work on adding a test on the Scudo side to ensure that this type
of change gets detected early on. Additional work is also needed to
reduce the footprint of the (large) memory-tagging related structures
that are currently part of the allocator.

Differential Revision: https://reviews.llvm.org/D110575
2021-09-27 13:49:55 -07:00
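The `.bss` versus `.data` effect described in this commit can be sketched as follows. The struct names, field layout, and magic values here are hypothetical and are not GWP-ASan's actual definitions. On typical ELF toolchains, a static object whose members are all zero-initialized is usually placed in `.bss`, whereas a non-zero static initializer usually forces the whole object into `.data`.

```cpp
#include <cstdint>

// Hypothetical stand-in for a version-magic header (not the real layout).
struct VersionMagicSketch {
  std::uint8_t Magic[4];
  std::uint16_t Version;
};

// Hypothetical allocator state with one large member.
struct AllocatorStateSketch {
  VersionMagicSketch VM;  // stays all-zero until initAtRuntime() runs
  char Pool[1 << 16];     // dominates the object's size

  // Writing the magic at runtime keeps the static initializer all-zero.
  void initAtRuntime() {
    VM.Magic[0] = 'A';
    VM.Magic[1] = 'S';
    VM.Magic[2] = 'A';
    VM.Magic[3] = 'N';
    VM.Version = 1;
  }
};

// No non-zero initializer here, so the object is zero-initialized.
static AllocatorStateSketch State;

AllocatorStateSketch &getState() { return State; }
```

A caller would invoke `getState().initAtRuntime()` once during startup. Whether the object actually lands in `.bss` depends on the toolchain and can be checked by inspecting section sizes of the resulting binary.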
Roman Lebedev ee6228ff8c [NFC][X86] Add 'gather' optsize/minsize test coverage 2021-09-27 23:49:10 +03:00
Florian Mayer 4f352d444e [NFC] [PSI] explain encoding of PercentileCutoff.
Reviewed By: mtrofin, davidxl

Differential Revision: https://reviews.llvm.org/D109764
2021-09-27 21:41:33 +01:00
Fangrui Song 75f0194d3d [Driver] Remove confusing *-linux-android detection with non-android --target=
These values allow, for example, `--target=aarch64` and
`--target=aarch64-linux-gnu` to detect `aarch64-linux-android`. This is
confusing. Users should specify `--target=aarch64-linux-android` to get Android GCC
installation.

Reverts D53463.

Reviewed By: nickdesaulniers, danalbert

Differential Revision: https://reviews.llvm.org/D110379
2021-09-27 13:28:40 -07:00
Roman Lebedev f7e82e4fa8 [NFC][X86] Add test showing that legal `GATHER`'s are expanded on Znver3 2021-09-27 22:41:09 +03:00
modimo 20faf78919 [ThinLTO] Add noRecurse and noUnwind thinlink function attribute propagation
Thinlink provides an opportunity to propagate function attributes across modules, enabling additional propagation opportunities.

This change propagates (currently default off, turn on with `disable-thinlto-funcattrs=1`) noRecurse and noUnwind based off of function summaries of the prevailing functions in bottom-up call-graph order. Testing on clang self-build:
1. There's a 35-40% increase in noUnwind functions due to the additional propagation opportunities.
2. Throughput is measured at 10-15% increase in thinlink time which itself is 1.5% of E2E link time.

Implementation-wise this adds the following summary function attributes:
1. noUnwind: function is noUnwind
2. mayThrow: function contains a non-call instruction that `Instruction::mayThrow` returns true on (e.g. windows SEH instructions)
3. hasUnknownCall: function contains calls that don't make it into the summary call-graph thus should not be propagated from (e.g. indirect for now, could add no-opt functions as well)

Testing:
Clang self-build passes and 2nd stage build passes check-all
ninja check-all with newly added tests passing

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D36850
2021-09-27 12:28:07 -07:00
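The bottom-up propagation idea in this commit can be illustrated with a toy model. This is a simplified sketch, not LLVM's implementation: `FnSummary` and `propagateNoUnwind` are hypothetical names, the call graph is assumed to be acyclic, and callee indices are assumed to be valid. A function is treated as noUnwind only if it has no throwing instruction, no unknown call, and every callee has already been proven noUnwind.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-function summary; the field names echo the commit
// message, but this is not LLVM's FunctionSummary.
struct FnSummary {
  bool MayThrow = false;             // contains a possibly throwing instruction
  bool HasUnknownCall = false;       // e.g. an indirect call
  std::vector<std::size_t> Callees;  // indices of directly called functions
  bool NoUnwind = false;             // result of the propagation
};

// Post-order walk: callees are resolved before their callers.
static void visit(std::vector<FnSummary> &Fns, std::vector<bool> &Done,
                  std::size_t I) {
  if (Done[I])
    return;
  Done[I] = true;
  bool NoUnwind = !Fns[I].MayThrow && !Fns[I].HasUnknownCall;
  for (std::size_t C : Fns[I].Callees) {
    visit(Fns, Done, C);
    NoUnwind = NoUnwind && Fns[C].NoUnwind;
  }
  Fns[I].NoUnwind = NoUnwind;
}

void propagateNoUnwind(std::vector<FnSummary> &Fns) {
  std::vector<bool> Done(Fns.size(), false);
  for (std::size_t I = 0; I < Fns.size(); ++I)
    visit(Fns, Done, I);
}
```

If a cycle is present anyway, a function that is revisited while still being processed contributes its default `NoUnwind = false`, so this sketch errs on the conservative side rather than handling strongly connected components properly.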
Tobias Gysi d20d0e145d [mlir][linalg] Finer-grained padding control.
Adapt the signature of the PaddingValueComputationFunction callback to either return the padding value or failure to signal padding is not desired.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D110572
2021-09-27 19:21:37 +00:00
Roman Lebedev 2a7a768dad [X86][Costmodel] Load/store i16 Stride=4 VF=32 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX-512 are: haswell, broadwell, skylake, zen1-3

For this tuple, measuring becomes problematic since there's a lot of spilling going on,
but apparently all these memory ops do not affect worst-case estimate at all here.

For load we have:
https://godbolt.org/z/zP4hd8MT6 - for intels `Block RThroughput: =150.0`; for ryzens, `Block RThroughput: <=59`
So pick cost of `150`.

For store we have:
https://godbolt.org/z/vKb8zTK8E - for intels `Block RThroughput: =64.0`; for ryzens, `Block RThroughput: <=24.0`
So pick cost of `64`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110548
2021-09-27 22:20:01 +03:00
Roman Lebedev ee5a050e2e [X86][Costmodel] Load/store i16 Stride=4 VF=16 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX-512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =75.0`; for ryzens, `Block RThroughput: <=29.5`
So pick cost of `75`. (note that `# 32-byte Reload` does not affect throughput there.)

For store we have:
https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `32`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110543
2021-09-27 22:20:01 +03:00
Roman Lebedev 5615d6a6dd [X86][Costmodel] Load/store i16 Stride=4 VF=8 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX-512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/dd8T5P471 - for intels `Block RThroughput: =33.0`; for ryzens, `Block RThroughput: <=14.5`
So pick cost of `33`.

For store we have:
https://godbolt.org/z/zPxcKWhn4 - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `10`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110541
2021-09-27 22:20:01 +03:00
Roman Lebedev df2b42d12e [X86][Costmodel] Load/store i16 Stride=4 VF=4 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX-512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/rnsf639Wh - for intels `Block RThroughput: =17.0`; for ryzens, `Block RThroughput: <=7.5`
So pick cost of `17`.

For store we have:
https://godbolt.org/z/565KKrcY6 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110537
2021-09-27 22:20:01 +03:00
Roman Lebedev 45caac91c4 [X86][Costmodel] Load/store i16 Stride=4 VF=2 interleaving costs
The only sched models for CPUs that support AVX2
but not AVX-512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/5EYc6r9nh - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/z61e5d6GE - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110536
2021-09-27 22:20:01 +03:00
Chris Bieneman 18cf5b220d Fixing docs build
I always forget that new line...
2021-09-27 14:16:28 -05:00
Chris Bieneman 1e48ef2035 Implement #pragma clang final extension
This patch adds a new preprocessor extension ``#pragma clang final``
which enables warning on undefinition and re-definition of macros.

The intent of this warning is to extend beyond ``-Wmacro-redefined`` to
warn against any and all alterations to macros that are marked `final`.

This warning is part of the ``-Wpedantic-macros`` diagnostics group.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D108567
2021-09-27 14:11:16 -05:00
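A minimal sketch of how the extension described above might be used follows. The macro name `BUFFER_SIZE` is hypothetical, and the version guard is an assumption (it presumes the pragma is available from Clang 14 onward); other compilers simply skip the pragma.

```cpp
// Hypothetical macro that should never be altered after this point.
#define BUFFER_SIZE 64

#if defined(__clang__) && __clang_major__ >= 14
// Ask Clang to diagnose any later #undef or re-#define of BUFFER_SIZE.
#pragma clang final(BUFFER_SIZE)
#endif

// With the pragma in effect, either of these lines would be diagnosed:
//   #undef BUFFER_SIZE
//   #define BUFFER_SIZE 128

constexpr int bufferSize() { return BUFFER_SIZE; }
```

The pragma does not change the macro's expansion; it only affects diagnostics, so `bufferSize()` returns the same value with or without it.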
Sanjay Patel fdba1dccbe [InstCombine] reduce code for shl-of-sub transform; NFC 2021-09-27 14:56:01 -04:00
Sanjay Patel b75ed244af [InstCombine] add tests for shl-of-sub; NFC 2021-09-27 14:56:01 -04:00
Aart Bik 06e2a0684e [mlir][sparse] sampled matrix multiplication fusion test
This integration test runs a fused and non-fused version of
sampled matrix multiplication. Both should eventually have the
same performance!

NOTE: relies on pending tensor.init fix!

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D110444
2021-09-27 11:50:49 -07:00
Nico Weber 36dc5c048a Revert "[clangd] Refactor IncludeStructure: use File (unsigned) for most computations"
This reverts commit 0b1eff1bc5.
Breaks check-clangd on Windows, see comments on
https://reviews.llvm.org/D110386
2021-09-27 14:38:18 -04:00
Jon Chesterfield 80fa43fe9a Revert "[openmp] Add addrspacecast to getOrCreateIdent"
This reverts commit 1a761e5b7b.
Failed CI, albeit with a different failure mode to BZ51982
2021-09-27 19:27:35 +01:00
Jon Chesterfield 1a761e5b7b [openmp] Add addrspacecast to getOrCreateIdent
Fixes 51982. Minor refactor to remove `return x = y` construct.

Test case derived from https://github.com/ROCm-Developer-Tools/aomp/\
blob/aomp-dev/test/smoke/nest_call_par2/nest_call_par2.c by deleting
parts while checking the assertion failure still occurred.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110556
2021-09-27 19:23:12 +01:00
Aart Bik ec97a205c3 [mlir][sparse] preserve zero-initialization for materializing buffers
This revision makes sure that when the output buffer materializes locally
(in contrast with the passing in of output tensors either in-place or not
in-place), the zero initialization assumption is preserved. This also adds
a bit more documentation on our sparse kernel assumption (viz. TACO
assumptions).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D110442
2021-09-27 11:22:05 -07:00
Aaron Ballman ef0f728abe Add a missing include to appease the build bots 2021-09-27 14:19:39 -04:00
Sanjay Patel 623f93ed1c [InstCombine] add use check to shl transform
This bug was introduced with the refactoring in:
9075edc89b
...but there were no tests to detect it.
2021-09-27 14:10:26 -04:00
Sanjay Patel d992950078 [InstCombine] add tests for opposing shifts separated by trunc; NFC 2021-09-27 14:10:26 -04:00
Jameson Nash e27a6db529 Bad SLPVectorization shufflevector replacement, resulting in write to wrong memory location
We see that it might otherwise do:

  %10 = getelementptr {}**, <2 x {}***> %9, <2 x i32> <i32 10, i32 4>
  %11 = bitcast <2 x {}***> %10 to <2 x i64*>
...
  %27 = extractelement <2 x i64*> %11, i32 0
  %28 = bitcast i64* %27 to <2 x i64>*
  store <2 x i64> %22, <2 x i64>* %28, align 4, !tbaa !2

Which is an out-of-bounds store (the extractelement got offset 10
instead of offset 4 as intended). With the fix, we correctly generate
extractelement for i32 1 and generate correct code.

Differential Revision: https://reviews.llvm.org/D106613
2021-09-27 14:06:13 -04:00
Carlos Galvez b2a2c38349 Fix bug in readability-uppercase-literal-suffix
Fixes https://bugs.llvm.org/show_bug.cgi?id=51790. The check triggers
incorrectly with non-type template parameters.

A bisect determined that the bug was introduced here:
ea2225a10b

Unfortunately that patch can no longer be reverted on top of the main
branch, so add a fix instead. Add a unit test to avoid regression in
the future.
2021-09-27 14:03:53 -04:00
peter klausler 9eab0da183 [flang] Catch branching into FORALL/WHERE constructs
Enforce constraints C1034 & C1038, which disallow the use
of otherwise valid statements as branch targets when they
appear in FORALL &/or WHERE constructs.  (And make the
diagnostic message somewhat more user-friendly.)

Differential Revision: https://reviews.llvm.org/D109936
2021-09-27 10:51:44 -07:00