Summary:
This commit allows specifying LIBCXX_ENABLE_PARALLEL_ALGORITHMS when
configuring libc++ in CMake. When that option is enabled, libc++ will
assume that the PSTL can be found somewhere on the CMake module path,
and it will provide the C++17 parallel algorithms based on the PSTL
(that is assumed to be available).
The commit also adds support for running the PSTL tests as part of
the libc++ test suite.
The first attempt to commit this failed because it exposed a bug in the
tests for modules. Now that this has been fixed, it should be safe to
commit this.
Reviewers: EricWF
Subscribers: mgorny, christof, jkorous, dexonsmith, libcxx-commits, mclow.lists, EricWF
Tags: #libc
Differential Revision: https://reviews.llvm.org/D60480
llvm-svn: 367903
This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the elements widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.
This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.
Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.
llvm-svn: 367901
Intended use case is:
./utils/update_test_checks.py test/Transform/PassDir/* --update-only
(i.e. rapidly be able to see changes in autogened filed, before handing non-autogened tests individually)
Differential Revision: https://reviews.llvm.org/D65610
llvm-svn: 367900
Patch D56593 by @courbet results in calls to `bcmp()` in some cases, should
the target support the it. Unless `TTI::MemCmpExpansionOptions()`
is overridden by the target.
In a proprietary benchmark we see a performance drop of about 12% on PNG
compression before this patch, though it passes all tests.
This patch mirrors X86 for AArch64 and initializes
`TTI::MemCmpExpansionOptions()` to then expand calls to `bcmp()` when
appropriate. No tuning of the parameters was performed, but, at this point,
it's enough to recover the performance drop above.
This problem also exists on ARM. Once a consensus is reached for AArch64, we
can work to fix ARM as well.
Authors:
- Evandro Menezes (@evandro) <e.menezes@samsung.com>
- Brian Rzycki (@brzycki) <b.rzycki@samsung.com>
Differential revision: https://reviews.llvm.org/D64805
llvm-svn: 367898
This dependency was removed in r357486, which has lead to a stream of difficult to diagnose bugs.
Without this dependency, when building with `LLVM_OPTIMIZED_TABLEGEN=On` the native tablegen executible may not be rebuilt at all, and often won't get rebuilt before targets that use the tablegen headers. In the best case this results in a build-time failure, in the worst case it results in runtime failures.
llvm-svn: 367895
Summary:
The Arm Neoverse N1 Software Optimization Guide [1], Section "4.8 Branch
instruction alignment" states:
"Consider aligning subroutine entry points and branch targets to 32B
boundaries, within the bounds of the code-density requirements of the
program."
This patch sets the preferred function alignment on Neoverse N1 to 2^4=16B.
This was already the case in some of the latest Cortex-A CPUs. Benchmarking
in previous Cortex-A CPUs suggested that 16B alignment is already better
than the default. See commit d04ee305.
The reason we don't set it to 32B right now (as the optimisation guide
suggests) is that this will impact code size and perhaps the instruction
cache performance. Therefore we need benchmark numbers first.
I have also added testing for A75 and A76 that we were missing.
[1] https://developer.arm.com/docs/swog309707/latest
Reviewers: fhahn, greened, samparker, dmgreen
Reviewed By: dmgreen
Subscribers: dmgreen, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65654
llvm-svn: 367894
This appears to slightly help patterns similar to what's
shown in PR42874:
https://bugs.llvm.org/show_bug.cgi?id=42874
...but not in the way requested.
That fix will require some later IR and/or backend pass to
decompose multiply/shifts into something more optimal per
target. Those transforms already exist in some basic forms,
but probably need enhancing to catch more cases.
https://rise4fun.com/Alive/Qzv2
llvm-svn: 367891
Some compiler, notably older gccs (< 8) can have trouble with multiline raw
string literals inside macros. This just moves the code outsize the macro, to
attempt to appease the bots.
llvm-svn: 367885
Summary:
This is a follow up to r367237. MachineFunction::getMachineMemOperand()
adds the offset parameter to the existing offset instead of resetting it.
So we need to reset the offset to the correct value after calling this
function.
Reviewers: arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65557
llvm-svn: 367881
Some tls-*.s tests do not test generic TLS behavior but rather are x86 specific.
Rename them to i386-*.s or x86-64-*.s
Delete tls-static.s: covered by tls-opt.s
Delete tls-opt-no-plt.s: add --implicit-check-not=.plt to x86-64-tls-gdie.s to cover it
Rename tls-dynamic-i686.s to i386-tls-dynamic.s
Rename tls-i686.s to i386-tls-le.s
Rename tls-opt-i686.s to i386-tls-opt.s
Rename tls-opt-iele-i686-nopic.s to i386-tls-opt-iele-nopic.s
Rename tls-dynamic.s to x86-64-tls-dynamic.s . IE should be split off in the future.
Rename tls-error.s to x86-64-reloc-tpoff32-error.s
Rename tls-opt-gdie.s to x86-64-tls-gdie.s
Rename tls-opt-x86_64-noplt.s to x86-64-tls-opt-noplt.s
Rename tls-opt-local.s => x86-64-tls-ie-opt-local.s . It can be merged with x86-64-tls-ie-local.s in the future.
llvm-svn: 367877
The MSan bot was (rightfully) complaining that NumASTLoaded was
unitialized, so put the initialization removed in r367829 back in.
While here, remove two needless semicolons added in that change.
llvm-svn: 367875
This was switching to use a format store for a non-format store for
f16 types. Also fixes i16/f16 stores on targets without legal f16.
The corresponding loads also need to be fixed.
llvm-svn: 367872
Without context we assume SGPR. Allowing VGPR constants theoretically
helps avoid a copy. This seems to not actually work now, and the
choice isn't based on the use bank.
llvm-svn: 367871
I'm not sure what complications these present, but the current
argument lowering is pretty much directly copied from the DAG
lowering, so I assume these work as they should.
No tests because I'm lazy and things are getting pretty close to the
point where the existing calling-conventions.ll can be shared with
SelectionDAG.
llvm-svn: 367870
We prioritize non-* wildcards overs VER_NDX_LOCAL/VER_NDX_GLOBAL "*".
This patch generalizes the rule to "*" of other versions and thus fixes PR40176.
I don't feel strongly about this GNU linkers' behavior but the
generalization simplifies code.
Delete `config->defaultSymbolVersion` which was used to special case
VER_NDX_LOCAL/VER_NDX_GLOBAL "*".
In `SymbolTable::scanVersionScript`, custom versions are handled the same
way as VER_NDX_LOCAL/VER_NDX_GLOBAL. So merge
`config->versionScript{Locals,Globals}` into `config->versionDefinitions`.
Overall this seems to simplify the code.
In `SymbolTable::assign{Exact,Wildcard}Versions`,
`sym->verdefIndex == config->defaultSymbolVersion` is changed to
`verdefIndex == UINT32_C(-1)`.
This allows us to give duplicate assignment diagnostics for
`{ global: foo; };` `V1 { global: foo; };`
In test/linkerscript/version-script.s:
vs_index of an undefined symbol changes from 0 to 1. This doesn't matter (arguably 1 is better because the binding is STB_GLOBAL) because vs_index of an undefined symbol is ignored.
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D65716
llvm-svn: 367869
Builtins-*-sunos :: compiler_rt_logbf_test.c currently FAILs on Solaris, both SPARC and
x86, 32 and 64-bit.
It turned out that this is due to different behaviour of logb depending on the C
standard compiled for, as documented on logb(3M):
RETURN VALUES
Upon successful completion, these functions return the exponent of x.
If x is subnormal:
o For SUSv3-conforming applications compiled with the c99 com-
piler driver (see standards(7)), the exponent of x as if x
were normalized is returned.
o Otherwise, if compiled with the cc compiler driver, -1022,
-126, and -16382 are returned for logb(), logbf(), and
logbl(), respectively.
Studio c99 and gcc control this by linking with the appropriate version of values-xpg[46].o, but clang uses neither of those.
The following patch fixes this by following what gcc does, as corrected some time ago in
Fix use of Solaris values-Xc.o (PR target/40411)
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02350.html and
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02384.html.
Tested on x86_64-pc-solaris2.11, sparcv9-sun-solaris2.11, and x86_64-pc-linux-gnu.
Differential Revision: https://reviews.llvm.org/D64793
llvm-svn: 367866
D65562 <https://reviews.llvm.org/D65562> moves LangStandard.h from clang/Frontend to clang/Basic. This patch
adjusts the single file in lldb that uses it to match.
Tested on x86_64-pc-linux-gnu.
Differential Revision: https://reviews.llvm.org/D65717
llvm-svn: 367865
This patch is a prerequisite for using LangStandard from Driver in
https://reviews.llvm.org/D64793.
It moves LangStandard* and InputKind::Language to Basic. It is mostly
mechanical, with only a few changes of note:
- enum Language has been changed into enum class Language : uint8_t to
avoid a clash between OpenCL in enum Language and OpenCL in enum
LangFeatures and not to increase the size of class InputKind.
- Now that getLangStandardForName, which is currently unused, also checks
both canonical and alias names, I've introduced a helper getLangKind
which factors out a code pattern already used 3 times.
The patch has been tested on x86_64-pc-solaris2.11, sparcv9-sun-solaris2.11,
and x86_64-pc-linux-gnu.
There's a companion patch for lldb which uses LangStandard.h
(https://reviews.llvm.org/D65717).
While polly includes isl which in turn uses InputKind::C, that part of the
code isn't even built inside the llvm tree. I've posted a patch to allow
for both InputKind::C and Language::C upstream
(https://groups.google.com/forum/#!topic/isl-development/6oEvNWOSQFE).
Differential Revision: https://reviews.llvm.org/D65562
llvm-svn: 367864
Summary:
rL364517 introduced further instances of `od` output checking of the
kind previously corrected by rL363829. This patch corrects the issue by
suppressing output of the input offset. The check remains sufficiently
sensitive to test for the intended value of the specific byte since the
relevant byte value is the only output we are expecting from `od`.
Reviewers: grimar, xingxue, daltenty, jasonliu, jhenderson, MaskRay
Reviewed By: grimar, MaskRay
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65680
llvm-svn: 367862
This allows to write a test case for one of untested errors
in llvm/Object/ELF.h.
I did it in this patch to demonstrate.
Differential revision: https://reviews.llvm.org/D65394
llvm-svn: 367860
Summary:
This patch adds initial support for the SVE calling convention such that
SVE types can be passed as arguments and return values to/from a
subroutine.
The SVE AAPCS states [1]:
z0-z7 are used to pass scalable vector arguments to a subroutine,
and to return scalable vector results from a function. If a
subroutine takes arguments in scalable vector or predicate
registers, or if it is a function that returns results in such
registers, it must ensure that the entire contents of z8-z23 are
preserved across the call. In other cases it need only preserve the
low 64 bits of z8-z15, as described in §5.1.2.
p0-p3 are used to pass scalable predicate arguments to a subroutine
and to return scalable predicate results from a function. If a
subroutine takes arguments in scalable vector or predicate
registers, or if it is a function that returns results in these
registers, it must ensure that p4-p15 are preserved across the call.
In other cases it need not preserve any scalable predicate register
contents.
SVE predicate and data registers are passed indirectly (i.e. spilled to the
stack and pass the address) if they exceed the registers used for argument
passing defined by the PCS referenced above. Until SVE stack support is merged
we can't spill SVE registers to the stack, so currently an llvm_unreachable is
used where we will eventually handle this.
[1] https://static.docs.arm.com/100986/0000/100986_0000.pdf
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D65448
llvm-svn: 367859
Recently an advanced support of SHT_NULL sections
was implemented in yaml2obj.
This patch adds a corresponding support to obj2yaml.
Differential revision: https://reviews.llvm.org/D65215
llvm-svn: 367852