This fixes the UBSAN bots after r302935. Storing non-defined values in an enum
is undefined behavior.
Other places that use "if (ScalarCast != CK_Invalid)" never reach the "if"
with CK_Invalid. tryGCCVectorConvertAndSplat can reach the "if" with
CK_Invalid, and that looks like an expected case, so we have to use something
other than CK_Invalid, e.g. CK_NoOp.
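A minimal sketch of the idea, assuming a hypothetical stand-in for clang's cast-kind enum (only the names CK_NoOp and CK_Invalid come from this message; the helper functions are illustrative). The "no cast needed" result is a declared enumerator, so the variable never holds a value outside the enum:

```cpp
#include <cassert>

// Hypothetical stand-in for clang's CastKind; not the real definition.
enum CastKind { CK_NoOp, CK_IntegralCast, CK_FloatingCast };

// Pick a scalar cast for a conversion. When the types already match there is
// nothing to do, and we report that with CK_NoOp, a declared enumerator,
// instead of an out-of-range sentinel such as static_cast<CastKind>(-1).
CastKind pickScalarCast(bool typesDiffer, bool isInteger) {
  if (!typesDiffer)
    return CK_NoOp;
  return isInteger ? CK_IntegralCast : CK_FloatingCast;
}

// Returns true when a cast would actually have to be emitted.
bool needsCast(CastKind K) { return K != CK_NoOp; }
```

Callers then test against CK_NoOp where they previously tested against CK_Invalid.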
llvm-svn: 303121
We don't use section-relative relocations on AArch64, so all symbols must be at
least visible to the linker (i.e. properly global or l_whatever, but not
L_whatever).
llvm-svn: 303118
There's no need to mask off the high bits of the register reference when
describing a simple bool value, and doing so is a bit incorrect.
Reviewers: aprantl
Differential Revision: https://reviews.llvm.org/D31062
llvm-svn: 303117
ARM Neon has native support for half-sized vector registers (64 bits). This
is beneficial, for example, for 2D and 3D graphics. This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.
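As an illustration only (the function below is hypothetical, not taken from the patch or its tests), this is the kind of source that fits a 64-bit register: two adjacent float operations occupy 2 x 32 = 64 bits, which a 128-bit minimum would likely leave scalar.

```cpp
// Two independent, adjacent float additions. Together the operands of each
// side span 64 bits, i.e. one half-sized vector register on ARM Neon.
void addVec2(float *dst, const float *a, const float *b) {
  dst[0] = a[0] + b[0];
  dst[1] = a[1] + b[1];
}
```

Whether the vectorizer actually transforms such code depends on the target's cost model.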
*** Performance Analysis
This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.
The results are with -O3 and PGO. A negative percentage is an improvement.
The testsuite was run with a sample size of 4.
** SPEC
* CFP2006/482.sphinx3 -3.34%
A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.
* CFP2000/177.mesa -3.34%
* CINT2000/256.bzip2 +6.97%
My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%. There are also other problems with the codegen in
this loop so there is further room for improvement.
** LLVM testsuite
* SingleSource/Benchmarks/Misc/ReedSolomon -10.75%
There are multiple small SLP vectorizations outside the hot code. It's a bit
surprising that it adds up to 10%. Some of this may be code-layout noise.
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%
The opt-viewer screenshot can be seen at F3218284. We start at a colder store
but the tree leads us into the hottest loop.
* MultiSource/Applications/lambda-0.1.3/lambda -2.68%
* MultiSource/Benchmarks/Bullet/bullet -2.18%
This is using 3D vectors.
* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%
Noise, binary is unchanged.
* MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%
There is an additional SLP in the cold code. The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.
* MultiSource/Applications/aha/aha +1.63%
* MultiSource/Applications/JM/lencod/lencod +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark +1.15%
Differential Revision: https://reviews.llvm.org/D31965
llvm-svn: 303116
This caused PR33053.
Original commit message:
> The new experimental reduction intrinsics can now be used, so I'm enabling this
> for AArch64. We will need this for SVE anyway, so it makes sense to do this for
> NEON reductions as well.
>
> The existing code to match shufflevector patterns are replaced with a direct
> lowering of the reductions to AArch64-specific nodes. Tests updated with the
> new, simpler, representation.
>
> Differential Revision: https://reviews.llvm.org/D32247
llvm-svn: 303115
ptr_refs exposed a problem in ClangASTContext's implementation: it
uses an accessor to downcast a QualType to an
ObjCObjectPointerType, but the accessor is not fully general.
getAs() is the safer way to go.
I've added a test case that uses ptr_refs in a way that would
crash before the fix.
<rdar://problem/31363513>
llvm-svn: 303110
We were silently ignoring any features we couldn't match up, which led to
errors in an inline asm block missing the conventional "\n\t".
llvm-svn: 303108
Some build targets (e.g. i686) have aliased names (e.g. i386). We would
get multiple definitions previously and have the linker arbitrarily
select a definition on those aliased targets. Make this more
deterministic by checking those aliases.
llvm-svn: 303103
Summary:
The following loops should be recognized:
  i = 0;
  while (n) {
    n = n >> 1;
    i++;
    body();
  }
  use(i);
and replaced with builtin_ctlz(n) if body() is empty, or, for CPUs that have
a CTLZ instruction, converted to a countable loop:
  for (j = 0; j < builtin_ctlz(n); j++) {
    n = n >> 1;
    i++;
    body();
  }
  use(builtin_ctlz(n));
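A small sketch of the equivalence the transformation relies on, assuming an unsigned n and an empty body(). The function names are hypothetical, and __builtin_clz is the GCC/Clang builtin, not what the pass emits. Note that the value the loop computes is the bit width minus the number of leading zeros of n:

```cpp
#include <climits>

// The loop from the summary with an empty body(): counts how many right
// shifts it takes for n to become zero.
unsigned countShiftsLoop(unsigned n) {
  unsigned i = 0;
  while (n) {
    n = n >> 1;
    i++;
  }
  return i;
}

// Closed form based on count-leading-zeros. __builtin_clz(0) is undefined,
// so zero is handled separately.
unsigned countShiftsClz(unsigned n) {
  if (n == 0)
    return 0;
  const unsigned bits = sizeof(unsigned) * CHAR_BIT;
  return bits - static_cast<unsigned>(__builtin_clz(n));
}
```

Both functions return the same value for every unsigned input, which is what lets the loop be replaced or turned into a countable one.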
Reviewers: rengolin, joerg
Differential Revision: http://reviews.llvm.org/D32605
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 303102
An assert() was being tripped when KMP_AFFINITY=respect was used with multiple
processor groups. Let __kmp_affinity_create_proc_group_map() create an
address2os object that contains a single group by removing the restriction
that the process affinity mask must span multiple groups.
llvm-svn: 303101
verifyMemoryCongruency() filters out trivially dead MemoryDefs,
as we find them immediately dead, before moving from TOP to a new
congruence class.
This fixes the same problem for PHIs by skipping MemoryPhis if all
the operands are dead.
Differential Revision: https://reviews.llvm.org/D33044
llvm-svn: 303100
Summary:
All GlobalIndirectSymbol types (not just GlobalAlias) should return
their base object.
Without this patch LTO would warn "Unable to determine comdat of
alias!" for an ifunc.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D33202
llvm-svn: 303096
We should only ever expect this function to return a regular
InputSection; I would not expect a function definition to be in a
MergeInputSection or EhInputSection. We were previously crashing
in writeTo if this function returned a section that was not an
InputSection because we do not set OutSec for such sections.
This can happen in practice if a function is defined in an empty
section which shares its offset-in-file with a MergeInputSection,
as in the provided test case.
A better fix for this bug would be to fix the
DWARFUnit::collectAddressRanges() interface to provide section
information (see D33183), but this at least fixes the crash.
Differential Revision: https://reviews.llvm.org/D33176
llvm-svn: 303089
Summary:
C++14 added a couple of user-defined literals to the standard library, e.g.
in std::chrono_literals and std::literals::chrono_literals. Using them
requires a using directive, so do not warn in google-build-using-namespace
if the namespace name starts with "std::" and ends with "literals".
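A rough sketch of the rule described above, plus the kind of code it is meant to allow. The helper isStdLiteralsNamespace is hypothetical and is not the check's actual implementation:

```cpp
#include <chrono>
#include <string>

// Hypothetical helper mirroring the rule in the summary: accept a qualified
// namespace name that starts with "std::" and ends with "literals".
bool isStdLiteralsNamespace(const std::string &name) {
  const std::string prefix = "std::";
  const std::string suffix = "literals";
  if (name.size() < prefix.size() + suffix.size())
    return false;
  return name.compare(0, prefix.size(), prefix) == 0 &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// C++14 code that needs a using directive to reach the "ms" literal.
std::chrono::milliseconds halfSecond() {
  using namespace std::chrono_literals;
  return 500ms;
}
```

Keeping the directive inside the function body limits its scope to the place where the literal is used.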
Reviewers: alexfh
Reviewed By: alexfh
Subscribers: cfe-commits
Patch by Martin Ejdestig!
Differential Revision: https://reviews.llvm.org/D33010
llvm-svn: 303085
At O3 we are more willing to increase size if we believe it will improve
performance. The current threshold for tail-duplication of 2 instructions is
conservative, and can be relaxed at O3.
Benchmark results:
llvm test-suite:
6% improvement in aha, due to duplication of loop latch
3% improvement in hexxagon
2% slowdown in lpbench. Seems related, but couldn't completely diagnose.
Internal Google benchmark:
Produces a 4% improvement on internal Google protocol buffer serialization
benchmarks.
Differential-Revision: https://reviews.llvm.org/D32324
llvm-svn: 303084
Add a lit substitution (I chose %gmlt) so that only stack trace tests
get debug info.
We need a lit substitution so that this expands to -gline-tables-only
-gcodeview on Windows. I think in the future we should reconsider the
need for -gcodeview from the GCC driver, but for now, this is necessary.
llvm-svn: 303083
Follow up to D33147
NVPTXTargetLowering::LowerCall was trusting the default argument values.
Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.
Differential Revision: https://reviews.llvm.org/D33189
llvm-svn: 303082
The check was using AST matchers in a very inefficient manner. By rewriting the
BinaryOperator-related parts using RAV, the check was sped up by a factor of
up to 10000 on some files (mostly, generated code using binary operators in
tables), but also significantly sped up for regular large files.
As a side effect, the code became clearer and more readable.
llvm-svn: 303081
Currently clang checks for default data sharing attributes only for
variables captured in OpenMP regions by reference. This patch adds checks for
variables captured by value.
llvm-svn: 303077
As described in PR33042, we cannot reliably retrieve the return value on
arm64 in cases where it is returned via the x8 pointer. I tried to do this as
surgically as possible and disabled it only on targets I know to be
affected, as the code is still useful, even though it can only work on
a best-effort basis.
llvm-svn: 303076
This patch enables fusing dependent AESE/AESMC and AESD/AESIMC
instruction pairs on Cortex-A72, as recommended in the Software
Optimization Guide, section 4.10.
llvm-svn: 303073