The pmaddwd inserts a truncate, if that truncate would end up
creating additional instructions instead of making a zext
narrower, then we shouldn't do it.
I've restricted this to only sse4.1 targets since on prior
targets the zext will be done in stages. So the truncate will
probably not create additional instructions. Might need some
more investigation of mul shrinking and the other pmaddwd
transform to be sure this is the right decision.
There might be a slight regression on AVX1 targets due to add
splitting. Hard to say for sure. Maybe we need to look into
using the vector reduction flag to use 2 narrow loads and a
blend instead of extracting and inserting.
llvm-svn: 367198
This reverts r340478 and r340631 and replaces them with a simpler
method of just letting DAG combine revisit the nodes to handle
the other operand.
llvm-svn: 367195
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.
llvm-svn: 367194
This adds the patterns required to transform xor P0, -1 to a VPNOT. The
instruction operands have to change a little for this, adding an in and an out
VCCR reg and using a custom DecodeMVEVPNOT for the decode.
Differential Revision: https://reviews.llvm.org/D65133
llvm-svn: 367192
These are some better patterns for converting between predicates and floating
points. Much like the extends, we select "1"/"-1" or "0" depending on the
predicate value. Or we perform a compare against 0 to convert to a predicate.
Differential Revision: https://reviews.llvm.org/D65103
llvm-svn: 367191
While we implemented taint propagation rules for several
builtin/standard functions, there's a natural desire for users to add
such rules to custom functions.
A series of patches will implement an option that allows users to
annotate their functions with taint propagation rules through a YAML
file. This one adds parsing of the configuration file, which may be
specified in the commands line with the analyzer config:
alpha.security.taint.TaintPropagation:Config. The configuration may
contain propagation rules, filter functions (remove taint) and sink
functions (give a warning if it gets a tainted value).
I also added a new header for future checkers to conveniently read YAML
files as checker options.
Differential Revision: https://reviews.llvm.org/D59555
llvm-svn: 367190
Summary:
Right now our CommandOptions.inc only generates the initializer for the options list but
not the array declaration boilerplate around it. As the array definition is identical for all arrays,
we might as well also let the CommandOptions.inc generate it alongside the initializers.
This patch will also allow us to generate additional declarations related to that option list in
the future (e.g. a enum class representing the specific options which would make our
handling code less prone).
This patch also fixes a few option tables that didn't follow our naming style.
Reviewers: JDevlieghere
Reviewed By: JDevlieghere
Subscribers: abidh, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D65331
llvm-svn: 367186
Summary:
In D62801, new function attribute `willreturn` was introduced. In short, a function with `willreturn` is guaranteed to come back to the call site(more precise definition is in LangRef).
In this patch, willreturn is annotated for LLVM intrinsics.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: jvesely, nhaehnle, sstefan1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64904
llvm-svn: 367184
The optimizer is petulant and temperamental. In this case LLVM failed to lower
the the "insert at end" loop used by`vector<unsigned char>` to a `memset` despite
`memset` being substantially faster over a range of bytes.
LLVM has the ability to lower loops to `memset` whet appropriate, but the
odd nature of libc++'s loops prevented the optimization from taking places.
This patch addresses the issue by rewriting the loops from the form
`do [ ... --__n; } while (__n > 0);` to instead use a for loop over a pointer
range (For example: `for (auto *__i = ...; __i < __e; ++__i)`).
This patch also rewrites the asan annotations to unposion all additional memory
at the start of the loop instead of once per iterations. This could potentially
permit false negatives where the constructor of element N attempts to access
element N + 1 during its construction.
The before and after results for the `BM_ConstructSize/vector_byte/5140480_mean`
benchmark (run 5 times) are:
--------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------------------
Before
------
BM_ConstructSize/vector_byte/5140480_mean 12530140 ns 12469693 ns N/A
BM_ConstructSize/vector_byte/5140480_median 12512818 ns 12445571 ns N/A
BM_ConstructSize/vector_byte/5140480_stddev 106224 ns 107907 ns 5
-----
After
-----
BM_ConstructSize/vector_byte/5140480_mean 167285 ns 166500 ns N/A
BM_ConstructSize/vector_byte/5140480_median 166749 ns 166069 ns N/A
BM_ConstructSize/vector_byte/5140480_stddev 3242 ns 3184 ns 5
llvm-svn: 367183
Same kind of fix as in r367176, but for "RUN on line 76"
this time.
I'll ask for a post-commit review, to ensure this
matches the intention with the test added in r367165.
But I think this at least will make the buildbots a
little bit happier.
llvm-svn: 367182
The current pattern would trigger for scheduling changes of the
post-load computation, since those are commutable with the inline asm.
Avoid this by explicitly check the order of load vs asm block.
llvm-svn: 367180
Sprinkled some #ifndef NDEBUG in Selection.cpp to make
it possible to build with NDEBUG defined.
The problem was introduced in rL366698 when using dlog
for some debug printouts. The dlog macro expands to
DEBUG_WITH_TYPE, which isn't using it's arguments in
optimized builds (when NDEBUG is defined).
llvm-svn: 367178
This allows us to peek through BITCASTs, attempt to simplify the source operand, and then bitcast back.
This reapplies rL367091 which was reverted at rL367118 - we were inconsistently peeking through the bitcasts to the source value.
Fixes PR42777
llvm-svn: 367174
The test case from:
https://bugs.llvm.org/show_bug.cgi?id=42771
...shows a ~30x slowdown caused by the awkward loop iteration (rL207302) that is
seemingly done just to avoid invalidating the instruction iterator. We can instead
delay instruction deletion until we reach the end of the block (or we could delay
until we reach the end of all blocks).
There's a test diff here for a degenerate case with llvm.assume that is not
meaningful in itself, but serves to verify this change in logic.
This change probably doesn't result in much overall compile-time improvement
because we call '-instsimplify' as a standalone pass only once in the standard
-O2 opt pipeline currently.
Differential Revision: https://reviews.llvm.org/D65336
llvm-svn: 367173
Recommit rL367100 which was reverted at rL367141. Until PR42777 is fixed, we no longer get the benefits of peeking through bitcasts but it does still remove a GetDemandedBits user and gives us the equivalent combines.
llvm-svn: 367172
If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit.
This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it.
This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches.
llvm-svn: 367171
Summary:
If XCode is not installed, `xcodebuild -version -sdk macosx Path` will give
xcode-select: error: tool 'xcodebuild' requires Xcode, but active developer directory '/Library/Developer/CommandLineTools' is a command line tools instance
In this case the variable OSX_SYSROOT will be empty and
OSX_SYSROOT_FLAG is set to "-isysroot" (without a path).
This then causes the CompilerRTUnitTestCheckCxx target failed to for me
because "${COMPILER_RT_TEST_COMPILER} ${OSX_SYSROOT_FLAG} -E" expanded to
"clang -isysroot -E". This results in a warning "sysroot -E does not exist"
and the target fails to run because the C++ headers cannot be found.
Reviewers: beanz, kubamracek
Reviewed By: beanz
Subscribers: dberris, mgorny, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D65323
llvm-svn: 367170
We're getting reports of massive compile time increases because SimplifyMultipleUseDemandedBits was losing track of the depth and not earlying-out. No repro yet, but consider this a pre-emptive commit.
llvm-svn: 367169
This is like r305666 (which added support for `if constexpr`) except
that it allows a macro name after the if.
This is slightly tricky for two reasons:
1. r305666 didn't add test coverage for all cases where it added a
kw_constexpr, so I had to figure out what all the added cases were
for. I now added tests for all `if constexpr` bits that didn't have
tests. (This took a while, see e.g. https://reviews.llvm.org/D65223)
2. Parsing `if <ident> (` as an if means that `#if defined(` and
`#if __has_include(` parse as ifs too. Add some special-case code
to prevent this from happening where it's incorrect.
Fixes PR39248.
Differential Revision: https://reviews.llvm.org/D65227
llvm-svn: 367167
This morally relands r365703 (and r365714), originally reviewed at
https://reviews.llvm.org/D64527, but with a different implementation.
Relanding the same approach with a fix for the revert reason got a bit
involved (see https://reviews.llvm.org/D65108) so use a simpler approach
with a more localized implementation (that in return duplicates code
a bit more).
This approach also doesn't validate flags for the integrated assembler
if the assembler step doesn't run.
Fixes PR42066.
Differential Revision: https://reviews.llvm.org/D65233
llvm-svn: 367165
Add partial instruction selection for intrinsics like this:
```
declare i32 @llvm.aarch64.stlxr(i64, i32*)
```
(This only handles the case where a G_ZEXT is feeding the intrinsic.)
Also make sure that the added store instruction actually has the memory op from
the original G_STORE.
Update select-stlxr-intrin.mir and arm64-ldxr-stxr.ll.
Differential Revision: https://reviews.llvm.org/D65355
llvm-svn: 367163
This patch changes the following tests to run under the new pass manager only:
```
Clang :: CodeGen/avx512-reduceMinMaxIntrin.c (1 of 4)
Clang :: CodeGen/avx512vl-builtins.c (2 of 4)
Clang :: CodeGen/avx512vlbw-builtins.c (3 of 4)
Clang :: CodeGen/avx512f-builtins.c (4 of 4)
```
The new PM added extra bitcasts that weren't checked before. For
reduceMinMaxIntrin.c, the issue was mostly the alloca's being in a different
order. Other changes involved extra bitcasts, and differently ordered loads and
stores, but the logic should still be the same.
Differential revision: https://reviews.llvm.org/D65110
llvm-svn: 367157
This adds support to the yaml remark parser to be able to parse remarks
directly from the metadata.
This supports parsing separate metadata and following the external file
with the associated metadata, and also a standalone file containing
metadata + remarks all together.
Original llvm-svn: 367148
Revert llvm-svn: 367151
This has a fix for gcc builds.
llvm-svn: 367155
unreachable loop.
updatePredecessorProfileMetadata in jumpthreading tries to find the
first dominating predecessor block for a PHI value by searching upwards
the predecessor block chain.
But jumpthreading may see some temporary IR state which contains
unreachable bb not being cleaned up. If an unreachable loop happens to
be on the predecessor block chain, keeping chasing the predecessor
block will run into an infinite loop.
The patch fixes it.
Differential Revision: https://reviews.llvm.org/D65310
llvm-svn: 367154
This didn't get updated after we decided to set PYTHON_MAJOR_VERSION and
PYTHON_MINOR_VERSION in find_python_libs_windows, instead of parsing the
variables ourselves.
llvm-svn: 367153
This adds support to the yaml remark parser to be able to parse remarks
directly from the metadata.
This supports parsing separate metadata and following the external file
with the associated metadata, and also a standalone file containing
metadata + remarks all together.
llvm-svn: 367148
This restores the use of `rm` instead of the non-portable `ln -n`. Such
use being the status quo for the 12-month period between rC334972 and
rC365414.
llvm-svn: 367147