Add the remaining math and common builtin functions from the OpenCL C
specification.
Patch by Pierre Gondois and Sven van Haastregt.
Differential Revision: https://reviews.llvm.org/D69883
Patch allows importing declarations of functions and variables, referenced
by the initializer of some other readonly variable.
Differential revision: https://reviews.llvm.org/D69561
We have a vector compare reduction problem seen in PR39665 comment 2:
https://bugs.llvm.org/show_bug.cgi?id=39665#c2
Or slightly reduced here:
define i1 @cmp2(<2 x double> %a0) {
%a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0>
%b = extractelement <2 x i1> %a, i32 0
%c = extractelement <2 x i1> %a, i32 1
%d = and i1 %b, %c
ret i1 %d
}
SLP would not attempt to turn this into a vector reduction because there is an
artificial lower limit on that transform. We can not completely remove that limit
without inducing regressions though, so this patch just hacks an extra attempt at
creating a 2-way reduction to the end of the analysis.
As shown in the test file, we are still not getting some of the motivating cases,
so follow-on patches will be needed to solve those cases.
Differential Revision: https://reviews.llvm.org/D59710
`saa` and `saad` are 32-bit and 64-bit store atomic add instructions.
memory[base] = memory[base] + rt
These instructions are available for "Octeon+" CPU. The patch adds support
for both instructions to MIPS assembler and diassembler and introduces new
CPU type - "octeon+".
Next patches will implement `.set arch=octeon+` directive and `AFL_EXT_OCTEONP`
ISA extension flag support.
Differential Revision: https://reviews.llvm.org/D69849
It broke Chromium, causing "Instruction does not dominate all uses!" errors.
See https://bugs.chromium.org/p/chromium/issues/detail?id=1022297#c1 for a
reproducer.
> If the recurrence PHI node has a single user, we can sink any
> instruction without side effects, given that all users are dominated by
> the instruction computing the incoming value of the next iteration
> ('Previous'). We can sink instructions that may cause traps, because
> that only causes the trap to occur later, but not on any new paths.
>
> With the relaxed check, we also have to make sure that we do not have a
> direct cycle (meaning PHI user == 'Previous), which indicates a
> reduction relation, which potentially gets missed by
> ReductionDescriptor.
>
> As follow-ups, we can also sink stores, iff they do not alias with
> other instructions we move them across and we could also support sinking
> chains of instructions and multiple users of the PHI.
>
> Fixes PR43398.
>
> Reviewers: hsaito, dcaballe, Ayal, rengolin
>
> Reviewed By: Ayal
>
> Differential Revision: https://reviews.llvm.org/D69228
Following up on https://reviews.llvm.org/D62221, this change introduces
the settings plugin.process.gdb-remote.use-g-packet-for-reading. When
they are on, 'g' packets are used for reading registers.
Using 'g' packets can improve performance by reducing the number of
packets exchanged between client and server when a large number of
registers needs to be fetched.
Differential revision: https://reviews.llvm.org/D62931
Summary:
This should be NFC to clang-rename, by default the traversal scope is
TUDecl. Traversing the TUDecl in clangd is a performance cliff, we should
avoid it.
Reviewers: ilya-biryukov
Subscribers: kadircet, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D69892
This caused Chromium builds to fail with "inlinable function call in a function
with debug info must have a !dbg location" errors. See
https://bugs.chromium.org/p/chromium/issues/detail?id=1022296#c1 for a
reproducer.
> Currently, clang emits subprograms for declared functions when the
> target debugger or DWARF standard is known to support entry values
> (DW_OP_entry_value & the GNU equivalent).
>
> Treat DW_AT_tail_call the same way to allow debuggers to follow cross-TU
> tail calls.
>
> Pre-patch debug session with a cross-TU tail call:
>
> ```
> * frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
> frame #1: 0x0000000100000f99 main`main at a.c:8:10 [opt]
> ```
>
> Post-patch (note that the tail-calling frame, "helper", is visible):
>
> ```
> * frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
> frame #1: 0x0000000100000f80 main`helper [opt] [artificial]
> frame #2: 0x0000000100000f99 main`main at a.c:8:10 [opt]
> ```
>
> rdar://46577651
>
> Differential Revision: https://reviews.llvm.org/D69743
It converts binary contents of .hash and .gnu.hash that were generated by a linker
to YAML descriptions.
I've also dropped Shift2 and BloomFilter values because they are not needed here.
Differential revision: https://reviews.llvm.org/D69881
This was added for Linux toolchains in rC271692, this
patch extends this to the Hurd toolchain.
Patch by sthibaul (Samuel Thibault)
Differential Revision: https://reviews.llvm.org/D69754
Summary:
When adjusting function entry counts after inlining, Funciton::setEntryCount is called without providing an import function list. The side effect of that is the previously set import function list will be dropped. The import function list is used by ThinLTO to help import hot cross module callee for LTO inlining, so dropping that during ThinLTO pre-link may adversely affect LTO inlining. The fix is to keep the list while updating entry counts for inlining.
Reviewers: wmi, davidxl, tejohnson
Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69736
AMDGPU has some atomic instructions that do not return the previous
result, and can only be selected if there are no uses. The source
pattern will only match if the use is empty, so it should be safe to
discard the result.
"[SLP] Generalization of stores vectorization."
"[SLP] Fix -Wunused-variable. NFC"
"[SLP] Vectorize jumbled stores."
As they're causing significant (10-30x) compile time regressions on
vectorizable code.
The primary cause of the compile-time regression is f228b53716.
This reverts commits:
f228b537165503455ccb21d498c9c0
Performance issues lead to the libc++ std::function formatter to be disabled.
This change is the first of two changes that should address the performance issues and allow us to enable the formatter again.
In some cases we end up scanning the symbol table for the callable wrapped by std::function for those cases we will now cache the results and used the cache in subsequent look-ups. This still leaves a large cost for the initial lookup which will be addressed in the next change.
Differential Revision: https://reviews.llvm.org/D67111
Have CMake treat the unwind libraries as C libraries rather than C++.
There is no C++ runtime dependency at runtime. This ensures that we do
not accidentally end up with a link against the C++ runtime.
We need to explicitly reset the implicitly linked libraries for C++ to
ensure that we do not have CMake force the link against the C++ runtime.
This adjustment should enable the NetBSD bots to be happy with this
change.
The basic idea of the transform is to convert variant loop exit conditions into invariant exit conditions by changing the iteration on which the exit is taken when we know that the trip count is unobservable. See the original patch which introduced the code for a more complete explanation.
The individual parts of this have been reviewed, the result has been fuzzed, and then further analyzed by hand, but despite all of that, I will not be suprised to see breakage here. If you see problems, please don't hesitate to revert - though please do provide a test case. The most likely class of issues are latent SCEV bugs and without a reduced test case, I'll be essentially stuck on reducing them.
(Note: A bunch of tests were opted out of the new transform to preserve coverage. That landed in a previous commit to simplify revert cycles if they turn out to be needed.)
I'm about to enable the new loop predication transform by default. It has the effect of completely destroying many read only loops - which happen to be a super common idiom in our test cases. So as to preserve test coverage of other transforms, disable the new transform where it would cause sharp test coverage regressions.
(This is semantically part of the enabling commit. It's committed separate to ease revert if the actual flag flip gets reverted.)
return value location depends on the calling convention of the callee.
`F.getCallingConv()`, however, is the caller CC. Correct it to the
callee CC from `CallLoweringInfo`.
Fixes PR43449
Patch by Shu-Chun Weng!
Summary:
Much like D67339, adds ConstantRange handling for
when we know no-wrap behavior of the `sub`.
Unlike addWithNoWrap(), we only get lucky re returning empty set
for signed wrap. For unsigned, we must perform overflow check manually.
A patch that makes use of this in LVI (CVP) to be posted later.
Reviewers: nikic, shchenz, efriedma
Reviewed By: nikic
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69918
As discussed in https://reviews.llvm.org/D69918
that happens to work as intended, and returns empty set if
there is always an overflow because we get lucky with intersection.
Since there's now an explicit test for that, let's prefer cleaner code.