gcov is an "Edge Profiling with Edge Counters" application according to
Optimally Profiling and Tracing Programs (1994).
The minimum number of counters necessary is |E|-(|V|-1). The unmeasured edges
form a spanning tree. Both GCC --coverage and clang -fprofile-generate leverage
this optimization. This patch implements the optimization for clang --coverage.
The produced .gcda files are much smaller now.
Adds link/code sample to avoid rendering two dashes as non-ASCII "en dash".
Also make wording a complete sentence.
Reviewed By: nickdesaulniers, tmfink
Differential Revision: https://reviews.llvm.org/D85596
CTTZ, CTLZ, CTPOP, and FCANONICALIZE all have the same input and
output types so the operand should have already been legalized when the
result type was legalized.
Also add the +mutable-globals features in clang when
building with `-fPIC` since the linker will generate mutable
globals imports and exports in that case.
Differential Revision: https://reviews.llvm.org/D87537
i.e. change the work flow from
* .gcno for function A
* .gcno for function B
* .gcno for function C
* .gcda for function A
* .gcda for function B
* .gcda for function C
to
* .gcno for function A
* .gcda for function A
* .gcno for function B
* .gcda for function B
* .gcno for function C
* .gcda for function C
Currently there is duplicate logic in .gcno & .gcda processing: how functions
are filtered, which edges are instrumented, etc. This refactor enables simplification.
Since we always process .gcno, in -fprofile-arcs -fno-test-coverage mode,
__llvm_internal_gcov_emit_function_args.0 will have non-zero checksums.
In MinGW world, UNIX like lib prefix is preferred for the libraries.
This patch adjusts CMake files to do that.
Differential Revision: https://reviews.llvm.org/D87517
Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop
isn't natively supported by the target, this leads to poor codegen
due to the expansion of ctpop being more complex than what is needed
for parity.
This adds a DAG combine to convert the pattern to ISD::PARITY
before operation legalization. Type legalization is updated
to handled Expanding and Promoting this operation. If after type
legalization, CTPOP is supported for this type, LegalizeDAG will
turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a
series of shifts and xors followed by an AND with 1.
I've avoided vectors in this patch to avoid more legalization
complexity for this patch.
X86 previously had a custom DAG combiner for this. This is now
moved to Custom lowering for the new opcode. There is a minor
regression in vector-reduce-xor-bool.ll, but a follow up patch
can easily fix that.
Fixes PR47433
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87209
When deleting stores at the end of a function, we have to do PHI
translation, otherwise we might miss reads in different iterations of a
loop. See multiblock-loop-carried-dependence.ll for details.
This fixes a mis-compile and surprisingly also increases the number of
eliminated stores from 26047 to 26572 for MultiSource/SPEC2000/SPEC2006
on X86 with -O3 -flto. This is most likely because we save budget by not
exploring through MemoryPhis, which are less likely to result in valid
candidates for elimination.
The issue was reported post-commit for fb109c42d9.
This allows the backend to tell the vectorizer to produce inloop
reductions through a TTI hook.
For the moment on ARM under MVE this means allowing integer add
reductions of the correct size. In the future this can include integer
min/max too, under -Os.
Differential Revision: https://reviews.llvm.org/D75512
As detailed on PR11210, if the mask is known to come from a (sign extended) bool vector (e.g. comparisons) then we can represent with a generic masked load/store without losing anything.
We already do something similar for BLENDV -> SELECT conversion.
After the recent discussion on cfe-dev 'Can indirect class parameters be
noalias?' [1], it seems like using using noalias is problematic for
current C++, but should be allowed for C-only code.
This patch introduces a new option to let the user indicate that it is
safe to mark indirect class parameters as noalias. Note that this also
applies to external callers, e.g. it might not be safe to use this flag
for C functions that are called by C++ functions.
In targets that allocate indirect arguments in the called function, this
enables more agressive optimizations with respect to memory operations
and brings a ~1% - 2% codesize reduction for some programs.
[1] : http://lists.llvm.org/pipermail/cfe-dev/2020-July/066353.html
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D85473
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
As detailed on PR11210, if the mask is known to come from a (sign extended) bool vector (e.g. comparisons) then we can represent with a generic masked load/store without losing anything.
This fixes a complication on top of D87276. If we are sign extending
around a mul with the two operands that are the same, instcombine will
helpfully convert one of the sext to a zext. Reverse that so that we
again generate a reduction.
Differnetial Revision: https://reviews.llvm.org/D87287
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html
This is hopefully the final remaining showstopper before we can remove
the 'experimental' from the reduction intrinsics.
No behavior was specified for the FP min/max reductions, so we have a
mess of different interpretations.
There are a few potential options for the semantics of these max/min ops.
I think this is the simplest based on current behavior/implementation:
make the reductions inherit from the existing llvm.maxnum/minnum intrinsics.
These correspond to libm fmax/fmin, and those are similar to the (now
deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing
data). So the default expansion creates calls to libm functions.
Another option would be to inherit from llvm.maximum/minimum (NaNs propagate),
but most targets just crash in codegen when given those nodes because no
default expansion was ever implemented AFAICT.
We could also just assume 'nnan' semantics by default (we are already
assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets
(AArch64, PowerPC) support the more defined behavior, so it doesn't make much
sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to
loosen the semantics.
(Note that D67507 was proposed to update the LangRef to acknowledge the more
recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do
update based on the new standard, the reduction instructions can seamlessly
inherit from whatever updates are made to the max/min intrinsics.)
x86 sees a regression here on 'nnan' tests because we have underlying,
longstanding bugs in FMF creation/propagation. Those need to be fixed apart
from this change (for example: https://llvm.org/PR35538). The expansion
sequence before this patch may not have been correct.
Differential Revision: https://reviews.llvm.org/D87391
We can sometimes get code that does:
xe = zext i16 x to i32
ye = zext i16 y to i32
m = mul i32 xe, ye
me = zext i32 m to i64
r = vecreduce.add(me)
This "double extend" can trip up the reduction identification, but
should give identical results.
This extends the pattern matching to handle them.
Differential Revision: https://reviews.llvm.org/D87276
This is a followup to D86834, which partially fixed this issue in
InstSimplify. However, InstCombine repeats the same transform while
dropping poison flags -- which does not cover cases where poison is
introduced in some other way.
The fix here is a bit more comprehensive, because things are quite
entangled, and it's hard to only partially address it without
regressing optimization. There are really two changes here:
* Export the SimplifyWithOpReplaced API from InstSimplify, with an
added AllowRefinement flag. For replacements inside the TrueVal
we don't actually care whether refinement occurs or not, the
replacement is always legal. This part of the transform is now
done in InstSimplify only. (It should be noted that the current
AllowRefinement check is not sufficient -- that's an issue we
need to address separately.)
* Change the InstCombine fold to work by temporarily dropping
poison generating flags, running the fold and then restoring the
flags if it didn't work out. This will ensure that the InstCombine
fold is correct as long as the InstSimplify fold is correct.
Differential Revision: https://reviews.llvm.org/D87445
Follow up to D86429 to handle the remaining regressions.
This patch generalizes lowerShuffleAsDecomposedShuffleBlend to lowerShuffleAsDecomposedShuffleMerge, and attempts to use an UNPCKL shuffle mask instead of a blend for the cases where the inputs are coming from alternating vXi8/vXi16 sources. Technically they don't have to be alternating (just as long as they can fit into a lower lane half for the unpack) but I didn't find as many general cases and it needed a lot more of the function to be altered.
For vXi32/vXi64 cases this could still be beneficial but in most cases the existing permute+blend approach was better.
Differential Revision: https://reviews.llvm.org/D87405
This change allow a CastExpr to have optional FPOptionsOverride object,
stored in trailing storage. Of all cast nodes only ImplicitCastExpr,
CStyleCastExpr, CXXFunctionalCastExpr and CXXStaticCastExpr are allowed
to have FPOptions.
Differential Revision: https://reviews.llvm.org/D85960
with P9 Model
Enable the pre-ra and post-ra scheduler strategy for Power10 as we want
to customize the heuristic later. And switch the scheduler model with P9
model before P10 Model is available. The NoSchedModel is modelled as
in-order cpu and the pre-ra scheduler is not bi-directional which will
have big impact on the scheduler.
Reviewed By: jji
Differential Revision: https://reviews.llvm.org/D86865
From ISA, fcmpu will raise the Floating-Point Invalid Operation
Exception (SNaN) if either of the operands is a Signaling NaN by setting
the bit VXSNAN. But the instruction description didn't set the
mayRaiseFPException which might have impact on the scheduling or some
backend optimization.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D83937