This patch decodes target and faux shuffles with getTargetShuffleInputs - a reduced version of resolveTargetShuffleInputs that doesn't resolve SM_SentinelZero cases, so we can correctly remove zero vectors if they aren't demanded.
llvm-svn: 371353
If the two zero vectors have undefs in different places they
won't get combined by simplifySelect.
This fixes a regression from an earlier commit.
llvm-svn: 371351
The change to avx512-vec-cmp.ll is a regression, but should be
easy to fix. It occurs because the getZeroVector call was
canonicalizing both sides to the same node, then SimplifySelect
was able to simplify it. But since only called getZeroVector
on some VTs this isn't a robust way to combine this.
The change to vector-shuffle-combining-ssse3.ll is more
instructions, but removes a constant pool load so its unclear
if its a regression or not.
llvm-svn: 371350
Summary:
This is motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.
In this particular case, given
```
char* test(char& base, unsigned long offset) {
return &base + offset;
}
```
it will end up producing something like
https://godbolt.org/z/LK5-iH
which after optimizations reduces down to roughly
```
define i1 @t0(i8* nonnull %base, i64 %offset) {
%base_int = ptrtoint i8* %base to i64
%adjusted = add i64 %base_int, %offset
%non_null_after_adjustment = icmp ne i64 %adjusted, 0
%no_overflow_during_adjustment = icmp uge i64 %adjusted, %base_int
%res = and i1 %non_null_after_adjustment, %no_overflow_during_adjustment
ret i1 %res
}
```
Without D67122 there was no `%non_null_after_adjustment`,
and in this particular case we can get rid of the overhead:
Here we add some offset to a non-null pointer,
and check that the result does not overflow and is not a null pointer.
But since the base pointer is already non-null, and we check for overflow,
that overflow check will already catch the null pointer,
so the separate null check is redundant and can be dropped.
Alive proofs:
https://rise4fun.com/Alive/WRzq
There are more patterns of "unsigned-add-with-overflow", they are not handled here,
but this is the main pattern, that we currently consider canonical,
so it makes sense to handle it.
https://bugs.llvm.org/show_bug.cgi?id=43246
Reviewers: spatel, nikic, vsk
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits, reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67332
llvm-svn: 371349
As reported in post-commit review of r370327,
there is some case where the code crashes.
As discussed with Craig Topper, the problem is that getConstant()
internally calls getSplatBuildVector(), so we don't insert
the constant itself.
If we do that manually we're good.
llvm-svn: 371346
This is similar to the existing fold for splats added with:
rL365379
If we can adjust the shuffle mask to include another element
in an identity mask (if it changes vector length, that's an
extract/insert subvector operation in the backend), then that
can eliminate extractelement/insertelement pairs in IR.
All targets are expected to lower shuffles with identity masks
efficiently.
llvm-svn: 371340
This is what the original bug (http://llvm.org/PR39641) and the fix
in https://reviews.llvm.org/D63877 have been about.
With the dynamic runtime the test only passes when the asan library
is linked against libstdc++: In contrast to libc++abi, it does not
implement __cxa_rethrow_primary_exception so the regex matches the
line saying that asan cannot intercept this function. Indeed, there
is no message that the runtime failed to intercept __cxa_throw.
Differential Revision: https://reviews.llvm.org/D67298
llvm-svn: 371336
Summary:
Add zero-materializing XORs to X86's describeLoadedValue() hook in order
to produce call site values.
I have had to change the defs logic in collectCallSiteParameters() a bit
to be able to describe the XORs. The XORs implicitly define $eflags,
which would cause them to never be considered, due to a guard condition
that I->getNumDefs() is one. I have changed that condition so that we
now only consider instructions where a forwarded register overlaps with
the instruction's single explicit define. We still need to collect the implicit
defines of other forwarded registers to remove them from the work list.
I'm not sure how to move towards supporting instructions with multiple
explicit defines, cases where forwarded register are implicitly defined,
and/or cases where an instruction produces values for multiple forwarded
registers. Perhaps the describeLoadedValue() hook should take a register
argument, and we then leave it up to the hook to describe the loaded
value in that register? I have not yet encountered a situation where
that would be necessary though.
Reviewers: aprantl, vsk, djtodoro, NikolaPrica
Reviewed By: vsk
Subscribers: ychen, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67225
llvm-svn: 371333
Summary:
This changes the ParamLoadedValue pair which the describeLoadedValue()
hook returns so that MachineOperand objects are returned instead of
pointers.
When describing call site values we may need to describe operands which
are not part of the instruction. One such example is zero-materializing
XORs on x86, which I have implemented support for in a child revision.
Instead of having to return a pointer to an operand stored somewhere
outside the instruction, start returning objects directly instead, as
that simplifies the code.
The MachineOperand class only holds POD members, and on x86-64 it is 32
bytes large. That combined with copy elision means that the overhead of
returning a machine operand object from the hook does not become very
large.
I benchmarked this on a 8-thread i7-8650U machine with 32 GB RAM. The
benchmark consisted of building a clang 8.0 binary configured with:
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DLLVM_TARGETS_TO_BUILD=X86 \
-DLLVM_USE_SANITIZER=Address \
-DCMAKE_CXX_FLAGS="-Xclang -femit-debug-entry-values -stdlib=libc++"
The average wall clock time increased by 4 seconds, from 62:05 to
62:09, which is an 0.1% increase.
Reviewers: aprantl, vsk, djtodoro, NikolaPrica
Reviewed By: vsk
Subscribers: hiraditya, ychen, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67261
llvm-svn: 371332
This generalizes the existing <32 x i1> pre-AVX2 split code to support reductions from <64 x i1> as well, we can probably generalize to any larger pow2 case in the future if the (unlikely) need ever arises.
We still need to tweak combineBitcastvxi1 to improve AVX512F codegen as its assumes vXi1 types should be handled on the mask registers even when they aren't legal.
Differential Revision: https://reviews.llvm.org/D67070
llvm-svn: 371328
We're running into linker errors from missing sancov sections:
```
ld.lld: error: relocation refers to a discarded section: __sancov_guards
>>> defined in user-arm64-ubsan-sancov-full.shlib/obj/third_party/ulib/scudo/scudo.wrappers_c.cc.o
>>> referenced by common.h:26 (../../zircon/third_party/ulib/scudo/common.h:26)
... many other references
```
I believe this is due to a pass in the default pipeline that somehow discards
these sections. The ModuleSanitizerCoveragePass was initially added at the
start of the pipeline. This now adds it to the end of the pipeline for
optimized and unoptimized builds.
Differential Revision: https://reviews.llvm.org/D67323
llvm-svn: 371326
isel used to require zero vectors to be canonicalized to a single
type to minimize the number of patterns needed to match. This is
no longer required.
I plan to do this to integers too, but floating point was simpler
to start with. Integer has a complication where v32i16/v64i8 aren't
legal when the other 512-bit integer types are.
llvm-svn: 371325
Summary:
In https://svnweb.freebsd.org/changeset/base/351659 @emaste removed gets() from
FreeBSD 13's libc, and our copies of libc++ and libstdc++. In that change, the
declarations were simply deleted, but I would like to propose this conditional
test instead.
Reviewers: EricWF, mclow.lists, emaste
Reviewed By: mclow.lists
Subscribers: krytarowski, christof, ldionne, emaste, libcxx-commits
Differential Revision: https://reviews.llvm.org/D67316
llvm-svn: 371324
This patch enables generation of fused multiply add/sub for instructions operating on fp16.
Tested on aarch64-linux.
Differential Revision: https://reviews.llvm.org/D67297
llvm-svn: 371321
Fixes PR35682. When a template in instantiated with an incomplete typo corrected type an assertion can trigger if the -ferror-limit is used to reduce the number of errors.
Patch by Mark de Wever.
llvm-svn: 371320
Summary:
Similar to the previous prefer-256-bit flag. We might want to
enable this by default some CPUs. This just starts the initial
work to implement and prove that it effects TTI's vector width.
Reviewers: RKSimon, echristo, spatel, atdt
Reviewed By: RKSimon
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67311
llvm-svn: 371319
Use getAPIntValue() directly - this is mainly a best practice style issue to help prevent fuzz tests blowing up when a i12345 (or whatever) is generated.
Use getConstantOperandVal/getConstantOperandAPInt wrappers where possible.
llvm-svn: 371315
```
.type foo,@gnu_indirect_function
.set foo,foo_resolver
.set foo2,foo
.set foo3,foo2
```
The types of foo2 and foo3 should be STT_GNU_IFUNC, but we currently
resolve them to the type of foo_resolver. This patch fixes it.
Differential Revision: https://reviews.llvm.org/D67206
Patch by Senran Zhang
llvm-svn: 371312
Summary:
Normally TargetLowering::expandFixedPointMul would handle
SMULFIXSAT with scale zero by using an SMULO to compute the
product and determine if saturation is needed (if overflow
happened). But if SMULO isn't custom/legal it falls through
and uses the same technique, using MULHS/SMUL_LOHI, as used
for non-zero scales.
Problem was that when checking for overflow (handling saturation)
when not using MULO we did not expect to find a zero scale. So
we ended up in an assertion when doing
APInt::getLowBitsSet(VTSize, Scale - 1)
This patch fixes the problem by adding a new special case for
how saturation is computed when scale is zero.
Reviewers: RKSimon, bevinh, leonardchan, spatel
Reviewed By: RKSimon
Subscribers: wuzish, nemanjai, hiraditya, MaskRay, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67071
llvm-svn: 371309
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.
This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.
Patch by: leonardchan, bjope
Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel
Reviewed By: leonardchan
Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57836
llvm-svn: 371308
Fix for https://bugs.llvm.org/show_bug.cgi?id=43230.
When creating PSHUFLW from a repeated shuffle mask, we have to apply
the checks to the repeated mask, not the original one. For the test
case from PR43230 the inspected part of the original mask is all undef.
Differential Revision: https://reviews.llvm.org/D67314
llvm-svn: 371307