365975 Commits

Author SHA1 Message Date
Fangrui Song 5f4e9bf641 [gcov] Fix memory leak due to BranchProbabilityInfoWrapperPass
This is weird.
2020-09-13 00:44:32 -07:00
Fangrui Song 63182c2ac0 [gcov] Add spanning tree optimization
gcov is an "Edge Profiling with Edge Counters" application according to
Optimally Profiling and Tracing Programs (1994).

The minimum number of counters necessary is |E|-(|V|-1). The unmeasured edges
form a spanning tree. Both GCC --coverage and clang -fprofile-generate leverage
this optimization. This patch implements the optimization for clang --coverage.
The produced .gcda files are much smaller now.
2020-09-13 00:07:31 -07:00
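As a rough illustration of the counter count quoted in 63182c2ac0, the sketch below picks a spanning tree with union-find and treats every remaining edge as one that needs a counter. The function name `choose_counter_edges`, the diamond-shaped CFG, and the extra exit-to-entry edge are assumptions made for illustration; this is not the code from the patch.

```python
def choose_counter_edges(num_vertices, edges):
    """Split edges into spanning-tree edges (unmeasured) and counter edges."""
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, measured = [], []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            # Adding this edge would close a cycle, so it gets a counter.
            measured.append((u, v))
        else:
            parent[ru] = rv
            tree.append((u, v))
    return tree, measured


# Hypothetical diamond CFG: 0 = entry, 1 and 2 = branches, 3 = exit.
# The (3, 0) edge is an assumed virtual exit-to-entry edge.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 0)]
tree, measured = choose_counter_edges(4, edges)
print(len(measured))  # |E| - (|V| - 1) counters for a connected graph
```

For a connected graph, the number of edges left outside the tree is always |E|-(|V|-1), whichever spanning tree is picked. The counts on the tree edges can then, in principle, be derived from the measured ones by flow conservation at each block.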
Fangrui Song f086e85eea [gcov] Assign names to some types and loaded values used in @__llvm_internal*
This makes the generated IR much more readable.
2020-09-12 22:42:37 -07:00
Fangrui Song 8cf1ac97ce [llvm-cov gcov] Improve accuracy when some edges are not measured
Also guard against infinite recursion if GCOV_ARC_ON_TREE edges contain a cycle.
2020-09-12 22:33:41 -07:00
Travis Finkenauer 0fb2203cd6 [Docs] Fix --print-supported-cpus option rendering
Adds link/code sample to avoid rendering two dashes as non-ASCII "en dash".
Also make wording a complete sentence.

Reviewed By: nickdesaulniers, tmfink

Differential Revision: https://reviews.llvm.org/D85596
2020-09-13 05:26:18 +00:00
Craig Topper 61d29e0dff [LegalizeTypes] Remove a few cases from SplitVectorOperand that should never happen. NFC
CTTZ, CTLZ, CTPOP, and FCANONICALIZE all have the same input and
output types so the operand should have already been legalized when the
result type was legalized.
2020-09-12 20:59:14 -07:00
Craig Topper 758732a34e [X86] Use ISD::PARITY directly instead of emitting CTPOP and AND from combineHorizontalPredicateResult.
We have a PARITY ISD node now so might as well use it. It will
get re-expanded later.
2020-09-12 20:01:17 -07:00
Krzysztof Parzyszek 9d300bc8d2 [Hexagon] Avoid widening vectors with non-HVX element types 2020-09-12 20:26:54 -05:00
LLVM GN Syncbot 70daa353e2 [gn build] Port cc2da5554b 2020-09-12 23:13:20 +00:00
Sam Clegg cc2da5554b [lld][WebAssembly] Add initial support for -Map/--print-map
Differential Revision: https://reviews.llvm.org/D77187
2020-09-12 16:10:51 -07:00
Nikita Popov c2f8bc986f [ARM] Add tests for fmin/max + inf folds (NFC) 2020-09-13 00:22:03 +02:00
Sam Clegg 04febd30a8 [lld][WebAssembly] Error on import/export of mutable global without `mutable-globals` feature
Also add the +mutable-globals features in clang when
building with `-fPIC` since the linker will generate mutable
globals imports and exports in that case.

Differential Revision: https://reviews.llvm.org/D87537
2020-09-12 14:28:14 -07:00
Fangrui Song d6fadc49e3 [gcov] Process .gcda immediately after the accompanying .gcno instead of doing all .gcda after all .gcno
i.e. change the work flow from

* .gcno for function A
* .gcno for function B
* .gcno for function C
* .gcda for function A
* .gcda for function B
* .gcda for function C

to

* .gcno for function A
* .gcda for function A
* .gcno for function B
* .gcda for function B
* .gcno for function C
* .gcda for function C

Currently there is duplicate logic in .gcno & .gcda processing: how functions
are filtered, which edges are instrumented, etc. This refactor enables simplification.

Since we always process .gcno, in -fprofile-arcs -fno-test-coverage mode,
__llvm_internal_gcov_emit_function_args.0 will have non-zero checksums.
2020-09-12 13:53:03 -07:00
Nikita Popov bdd1eba37b [ARM] Add additional vecreduce float legalization test (NFC) 2020-09-12 22:40:39 +02:00
Paul C. Anagnostopoulos 93b4f85382 Update TableGen test files to use the new '...' range punctuation. 2020-09-12 16:26:32 -04:00
Paul C. Anagnostopoulos e8e3693cea Change range operator from deprecated '-' to '...' 2020-09-12 16:26:32 -04:00
Fangrui Song 7d3825ed95 Revert "[gcov] emitProfileArcs: iterate over GCOVFunction's instead of Function's to avoid duplicated filtering"
This reverts commit 412c9c0bf2.
2020-09-12 12:34:43 -07:00
Fangrui Song 412c9c0bf2 [gcov] emitProfileArcs: iterate over GCOVFunction's instead of Function's to avoid duplicated filtering 2020-09-12 12:21:32 -07:00
Fangrui Song c55c14837e [gcov] Clean up by getting llvm.dbg.cu earlier 2020-09-12 12:21:32 -07:00
Nikita Popov c34a99fe58 [InstCombine] Add extra use tests for abs canonicalization (NFC) 2020-09-12 21:13:46 +02:00
Mateusz Mikuła 7da9419399 [MinGW][libclang] Allow simultaneous shared and static lib
It builds fine for MinGW on Windows.

Differential Revision: https://reviews.llvm.org/D87539
2020-09-12 22:03:43 +03:00
Mateusz Mikuła bb613044b6 [MinGW][clang-shlib] Build by default on MinGW
It builds without errors and makes it possible to use
CLANG_LINK_CLANG_DYLIB=1.

Differential Revision: https://reviews.llvm.org/D87547
2020-09-12 22:02:31 +03:00
Mateusz Mikuła cc76965b19 [MinGW] Use lib prefix for libraries
In MinGW world, UNIX like lib prefix is preferred for the libraries.
This patch adjusts CMake files to do that.

Differential Revision: https://reviews.llvm.org/D87517
2020-09-12 22:01:29 +03:00
Craig Topper ad3d6f993d [SelectionDAG][X86][ARM][AArch64] Add ISD opcode for __builtin_parity. Expand it to shifts and xors.
Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop
isn't natively supported by the target, this leads to poor codegen
due to the expansion of ctpop being more complex than what is needed
for parity.

This adds a DAG combine to convert the pattern to ISD::PARITY
before operation legalization. Type legalization is updated
to handle Expanding and Promoting this operation. If after type
legalization, CTPOP is supported for this type, LegalizeDAG will
turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a
series of shifts and xors followed by an AND with 1.

I've avoided vectors in this patch to limit legalization
complexity.

X86 previously had a custom DAG combiner for this. This is now
moved to Custom lowering for the new opcode. There is a minor
regression in vector-reduce-xor-bool.ll, but a follow up patch
can easily fix that.

Fixes PR47433

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87209
2020-09-12 11:42:18 -07:00
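The "series of shifts and xors followed by an AND with 1" mentioned in ad3d6f993d can be sketched as below for a 32-bit value, next to the popcount-based form `(and (ctpop X), 1)`. The function names and the fixed 32-bit width are assumptions for illustration; the sequence LegalizeDAG actually emits may differ.

```python
def parity_shift_xor(x):
    """Parity of a 32-bit value by folding it in half with shifts and xors."""
    x &= 0xFFFFFFFF
    x ^= x >> 16
    x ^= x >> 8
    x ^= x >> 4
    x ^= x >> 2
    x ^= x >> 1
    return x & 1  # the lowest bit now holds the xor of all 32 bits


def parity_ctpop(x):
    """Reference form: population count followed by AND with 1."""
    return bin(x & 0xFFFFFFFF).count("1") & 1


for value in (0, 1, 3, 0x80000000, 0xFFFFFFFF, 0x12345678):
    assert parity_shift_xor(value) == parity_ctpop(value)
```

Each fold xors the upper half of the remaining bits into the lower half, so five steps suffice for 32 bits, whereas a generic popcount expansion needs more arithmetic than parity requires.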
Florian Hahn d85ac6d577 [DSE] Adjust coroutines test after e082dee2b5. 2020-09-12 19:23:13 +01:00
Florian Hahn e082dee2b5 [DSE] Bail out on MemoryPhis when deleting stores at end of function.
When deleting stores at the end of a function, we have to do PHI
translation, otherwise we might miss reads in different iterations of a
loop. See multiblock-loop-carried-dependence.ll for details.

This fixes a mis-compile and surprisingly also increases the number of
eliminated stores from 26047 to 26572 for MultiSource/SPEC2000/SPEC2006
on X86 with -O3 -flto. This is most likely because we save budget by not
exploring through MemoryPhis, which are less likely to result in valid
candidates for elimination.

The issue was reported post-commit for fb109c42d9.
2020-09-12 19:05:59 +01:00
Florian Hahn 3de9e3e493 [DSE] Precommit test case with loop carried dependence. 2020-09-12 18:51:08 +01:00
David Green 74760bb00f [LV][ARM] Add preferInloopReduction target hook.
This allows the backend to tell the vectorizer to produce inloop
reductions through a TTI hook.

For the moment on ARM under MVE this means allowing integer add
reductions of the correct size. In the future this can include integer
min/max too, under -Os.

Differential Revision: https://reviews.llvm.org/D75512
2020-09-12 17:47:04 +01:00
Paul C. Anagnostopoulos 8ce75e2778 TableGen: change a couple of member names to clarify their use. 2020-09-12 12:21:36 -04:00
Simon Pilgrim 3170d54842 [InstCombine][X86] Convert masked load/stores with (sign extended) bool vector masks to generic intrinsics.
As detailed on PR11210, if the mask is known to come from a (sign extended) bool vector (e.g. comparisons) then we can represent it with a generic masked load/store without losing anything.

We already do something similar for BLENDV -> SELECT conversion.
2020-09-12 15:09:28 +01:00
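A small model of why nothing is lost in 3170d54842: assuming the x86 masked load inspects only the sign bit of each mask lane and zeroes masked-off lanes, a mask that is a sign-extended bool vector carries exactly one bit of information per lane, which a generic bool-masked load can express directly. All names below are hypothetical, and the lists merely stand in for vectors.

```python
def x86_style_maskload(memory, mask_lanes):
    """Assumed model: load a lane if the top bit of its 32-bit mask is set."""
    return [v if (m >> 31) & 1 else 0 for v, m in zip(memory, mask_lanes)]


def generic_masked_load(memory, bool_mask, passthru):
    """Generic form: one bool per lane, with a pass-through for inactive lanes."""
    return [v if b else p for v, b, p in zip(memory, bool_mask, passthru)]


def sext_bool_to_i32(b):
    """Sign-extend an i1 to i32: all ones for true, all zeros for false."""
    return 0xFFFFFFFF if b else 0


memory = [10, 20, 30, 40]
compare = [x > 15 for x in memory]             # bool vector from a comparison
mask = [sext_bool_to_i32(b) for b in compare]  # what the x86 intrinsic sees

x86_result = x86_style_maskload(memory, mask)
generic_result = generic_masked_load(memory, compare, [0, 0, 0, 0])
assert x86_result == generic_result
```

The equivalence relies on the mask being known to come from a bool vector; for an arbitrary integer mask the bool vector would first have to be recovered from the sign bits.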
Florian Hahn a874d63344 [Clang] Add option to allow marking pass-by-value args as noalias.
After the recent discussion on cfe-dev 'Can indirect class parameters be
noalias?' [1], it seems like using noalias is problematic for
current C++, but should be allowed for C-only code.

This patch introduces a new option to let the user indicate that it is
safe to mark indirect class parameters as noalias. Note that this also
applies to external callers, e.g. it might not be safe to use this flag
for C functions that are called by C++ functions.

In targets that allocate indirect arguments in the called function, this
enables more aggressive optimizations with respect to memory operations
and brings a ~1% - 2% codesize reduction for some programs.

[1] : http://lists.llvm.org/pipermail/cfe-dev/2020-July/066353.html

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D85473
2020-09-12 14:56:13 +01:00
Evgeny Leviant 2e61cd1295 [MachineScheduler] Fix operand scheduling for pre/post-increment loads
Differential revision: https://reviews.llvm.org/D87557
2020-09-12 16:53:12 +03:00
Tyker 78de7297ab Reland [AssumeBundles] Use operand bundles to encode alignment assumptions
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as an "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
2020-09-12 15:36:06 +02:00
Simon Pilgrim d030aad789 [InstCombine][X86] Add tests for masked load/stores with comparisons.
As detailed on PR11210, if the mask is known to come from a (sign extended) bool vector (e.g. comparisons) then we can represent it with a generic masked load/store without losing anything.
2020-09-12 14:32:27 +01:00
David Green 6cfd38d03d [ARM] Fixup single source mla reductions.
This fixes a complication on top of D87276. If we are sign extending
around a mul with the two operands that are the same, instcombine will
helpfully convert one of the sext to a zext. Reverse that so that we
again generate a reduction.

Differential Revision: https://reviews.llvm.org/D87287
2020-09-12 14:31:26 +01:00
Sanjay Patel 3a8ea8609b [Intrinsics] define semantics for experimental fmax/fmin vector reductions
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

This is hopefully the final remaining showstopper before we can remove
the 'experimental' from the reduction intrinsics.

No behavior was specified for the FP min/max reductions, so we have a
mess of different interpretations.

There are a few potential options for the semantics of these max/min ops.
I think this is the simplest based on current behavior/implementation:
make the reductions inherit from the existing llvm.maxnum/minnum intrinsics.
These correspond to libm fmax/fmin, and those are similar to the (now
deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing
data). So the default expansion creates calls to libm functions.

Another option would be to inherit from llvm.maximum/minimum (NaNs propagate),
but most targets just crash in codegen when given those nodes because no
default expansion was ever implemented AFAICT.

We could also just assume 'nnan' semantics by default (we are already
assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets
(AArch64, PowerPC) support the more defined behavior, so it doesn't make much
sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to
loosen the semantics.

(Note that D67507 was proposed to update the LangRef to acknowledge the more
recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do
update based on the new standard, the reduction instructions can seamlessly
inherit from whatever updates are made to the max/min intrinsics.)

x86 sees a regression here on 'nnan' tests because we have underlying,
longstanding bugs in FMF creation/propagation. Those need to be fixed apart
from this change (for example: https://llvm.org/PR35538). The expansion
sequence before this patch may not have been correct.

Differential Revision: https://reviews.llvm.org/D87391
2020-09-12 09:10:28 -04:00
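The two candidate semantics discussed in 3a8ea8609b can be contrasted with a small sketch: a reduction built on maxnum-style behavior (NaN treated as missing data) versus one built on maximum-style behavior (NaN propagates). The helper names are hypothetical, and signed-zero ordering is deliberately ignored here.

```python
import math
from functools import reduce


def maxnum(a, b):
    """maxnum-style: a NaN operand is treated as missing data."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def maximum(a, b):
    """maximum-style: any NaN operand propagates to the result."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)


values = [1.0, math.nan, 3.0, 2.0]
reduced_maxnum = reduce(maxnum, values)    # NaN is skipped
reduced_maximum = reduce(maximum, values)  # NaN poisons the result
print(reduced_maxnum, reduced_maximum)
```

Under the semantics chosen in the commit, the reduction behaves like the first form; with the 'nnan' fast-math flag the two are indistinguishable, because NaN inputs are assumed not to occur.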
Simon Pilgrim 50ee0b99ec [InstCombine][X86] getNegativeIsTrueBoolVec - use ConstantExpr evaluators. NFCI.
Don't do this manually, we can just use the ConstantExpr evaluators to do it more tidily for us.
2020-09-12 13:58:58 +01:00
David Green c437446d90 [ARM] Recognize "double extend" reduction patterns
We can sometimes get code that does:
  xe = zext i16 x to i32
  ye = zext i16 y to i32
  m = mul i32 xe, ye
  me = zext i32 m to i64
  r = vecreduce.add(me)
This "double extend" can trip up the reduction identification, but
should give identical results.

This extends the pattern matching to handle them.

Differential Revision: https://reviews.llvm.org/D87276
2020-09-12 13:51:42 +01:00
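A quick way to see why the "double extend" in c437446d90 should give identical results: the product of two zero-extended i16 values is at most (2^16-1)^2, which fits in 32 bits, so the i32 multiply never wraps and the later zext to i64 yields the same value as multiplying in 64 bits directly. The sketch below models this with Python integers; the function names are hypothetical.

```python
MASK16 = (1 << 16) - 1
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def double_extend_reduce(xs, ys):
    """zext i16 -> i32, mul in i32, zext i32 -> i64, then add-reduce in i64."""
    total = 0
    for x, y in zip(xs, ys):
        xe = x & MASK16          # zext i16 x to i32
        ye = y & MASK16          # zext i16 y to i32
        m = (xe * ye) & MASK32   # mul i32 (wrapping arithmetic)
        me = m                   # zext i32 m to i64 keeps the value
        total = (total + me) & MASK64
    return total


def single_extend_reduce(xs, ys):
    """Extend straight to i64, multiply and add-reduce in i64."""
    total = 0
    for x, y in zip(xs, ys):
        total = (total + (x & MASK16) * (y & MASK16)) & MASK64
    return total


xs = [0xFFFF, 1, 1234, 0x8000]
ys = [0xFFFF, 2, 4321, 0x8000]
assert double_extend_reduce(xs, ys) == single_extend_reduce(xs, ys)
```

The worst case, 0xFFFF * 0xFFFF = 0xFFFE0001, is still below 2^32, which is what makes the intermediate i32 step harmless.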
Nikita Popov 36e2e2e12e [InstCombine] Fix incorrect SimplifyWithOpReplaced transform (PR47322)
This is a followup to D86834, which partially fixed this issue in
InstSimplify. However, InstCombine repeats the same transform while
dropping poison flags -- which does not cover cases where poison is
introduced in some other way.

The fix here is a bit more comprehensive, because things are quite
entangled, and it's hard to only partially address it without
regressing optimization. There are really two changes here:

 * Export the SimplifyWithOpReplaced API from InstSimplify, with an
   added AllowRefinement flag. For replacements inside the TrueVal
   we don't actually care whether refinement occurs or not, the
   replacement is always legal. This part of the transform is now
   done in InstSimplify only. (It should be noted that the current
   AllowRefinement check is not sufficient -- that's an issue we
   need to address separately.)
 * Change the InstCombine fold to work by temporarily dropping
   poison generating flags, running the fold and then restoring the
   flags if it didn't work out. This will ensure that the InstCombine
   fold is correct as long as the InstSimplify fold is correct.

Differential Revision: https://reviews.llvm.org/D87445
2020-09-12 14:45:06 +02:00
Simon Pilgrim 35dc91aee2 [X86][SSE] lowerShuffleAsDecomposedShuffleBlend - support decomposed unpacks for some vXi8/vXi16 cases
Follow up to D86429 to handle the remaining regressions.

This patch generalizes lowerShuffleAsDecomposedShuffleBlend to lowerShuffleAsDecomposedShuffleMerge, and attempts to use an UNPCKL shuffle mask instead of a blend for the cases where the inputs are coming from alternating vXi8/vXi16 sources. Technically they don't have to be alternating (just as long as they can fit into a lower lane half for the unpack) but I didn't find as many general cases and it needed a lot more of the function to be altered.

For vXi32/vXi64 cases this could still be beneficial but in most cases the existing permute+blend approach was better.

Differential Revision: https://reviews.llvm.org/D87405
2020-09-12 13:39:33 +01:00
LLVM GN Syncbot 4ede83c068 [gn build] Port 19531a81f1 2020-09-12 10:08:18 +00:00
Serge Pavlov de044f7562 Revert "[AST][FPEnv] Keep FP options in trailing storage of CastExpr"
This reverts commit 6c8041aa0f.
It caused some failures on buildbots.
2020-09-12 17:06:42 +07:00
Jianzhou Zhao b3f364e856 Add a header file to support ssize_t for windows
Fixing 0ece51c60c
2020-09-12 08:50:22 +00:00
Serge Pavlov 9c651c231f Missing change from previous commit 2020-09-12 15:11:09 +07:00
Jianzhou Zhao 19531a81f1 Add raw_fd_stream_test.cpp into CMakeLists.txt
Fixing 0ece51c60c
2020-09-12 07:48:12 +00:00
Jianzhou Zhao 0ece51c60c Add raw_fd_stream that supports reading/seeking/writing
This is used by https://reviews.llvm.org/D86905 to support bitcode
writer's incremental flush.
2020-09-12 07:34:19 +00:00
Serge Pavlov 6c8041aa0f [AST][FPEnv] Keep FP options in trailing storage of CastExpr
This change allows a CastExpr to have an optional FPOptionsOverride object,
stored in trailing storage. Of all cast nodes only ImplicitCastExpr,
CStyleCastExpr, CXXFunctionalCastExpr and CXXStaticCastExpr are allowed
to have FPOptions.

Differential Revision: https://reviews.llvm.org/D85960
2020-09-12 14:30:44 +07:00
QingShan Zhang 0680a3d56d [Power10] Enable the heuristic for Power10 and switch the sched model
with P9 Model

Enable the pre-RA and post-RA scheduler strategies for Power10, as we want
to customize the heuristic later. Also switch the scheduler model to the P9
model until a P10 model is available. NoSchedModel is modelled as an
in-order CPU and the pre-RA scheduler is not bi-directional, which will
have a big impact on the scheduler.

Reviewed By: jji

Differential Revision: https://reviews.llvm.org/D86865
2020-09-12 02:49:47 +00:00
QingShan Zhang 528554c39b [PowerPC] Set the mayRaiseFPException for FCMPUS/FCMPUD
According to the ISA, fcmpu raises the Floating-Point Invalid Operation
Exception (SNaN) if either operand is a Signaling NaN, by setting
the VXSNAN bit. However, the instruction description did not set
mayRaiseFPException, which might affect scheduling or some
backend optimizations.

Reviewed By: qiucf

Differential Revision: https://reviews.llvm.org/D83937
2020-09-12 02:42:22 +00:00
LLVM GN Syncbot 0e0d93e2f0 [gn build] Port ad99e34c59 2020-09-12 01:54:23 +00:00