This creates a way to configure MSAN to for eager checks that will be leveraged
by the introduction of a clang flag (-fsanitize-memory-param-retval).
This is redundant with the existing flag: -mllvm -msan-eager-checks.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D116855
This builds on the code from D114963, and extends it to handle calls both direct and indirect. With the revised code structure (from series of previously landed NFCs), this is pretty straight forward.
One thing to note is that we can not infer writeonly for arguments which might be captured. If the pointer can be read back by the caller, and then read through, we have no way to track that. This is the same restriction we have for readonly, except that we get no mileage out of the "callee can be readonly" exception since a writeonly param on a readonly function is either a) readnone or b) UB. This means we can't actually infer much unless nocapture has already been inferred.
Differential Revision: https://reviews.llvm.org/D115003
Summary:
A new option exec-on-ir-changed is defined that allows one to specify an
exe that is called after each pass in the opt pipeline that changes the IR.
The exec-on-ir-change=exe option saves the IR in a temporary file and calls exe
with the name of the file and the name of the pass that just changed it after
each pass alters the IR. exe is also called with the initial IR. This
can be used, for example, to determine which pass corrupts the IR by having
exe as a script that calls llc and runs a test to see after which pass the
results change. The print-changed filtering options are respected. Note that
this is only supported with the new pass manager.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D110776
This patch uses a similar trick as in D113947 to only run the extra
passes after vectorization on functions where loops have been
vectorized.
The reason for running the 'extra vector passes' is
simplification/unswitching of the runtime checks created by LV, there
should be no need to run them if nothing got vectorized
To do that, a new dummy analysis ShouldRunExtraVectorPasses has been
added. If loops have been vectorized for a function, LV will cache the
analysis. At the moment it uses MadeCFGChanges as proxy for loop
vectorized, which isn't perfect (it could be too aggressive, e.g.
because no runtime checks have been added), but should be good enough
for now.
The extra passes are now managed by a new FunctionPassManager that
runs its passes only if ShouldRunExtraVectorPasses has been cached.
Without this patch, `-extra-vectorizer-passes` has the following
compile-time impact:
NewPM-O3: +4.86%
NewPM-ReleaseThinLTO: +3.56%
NewPM-ReleaseLTO-g: +7.17%
http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=c292da649e2c6e88a31e702fdc474727d09c72bc&stat=instructions
With this patch, that gets reduced to
NewPM-O3: +1.43%
NewPM-ReleaseThinLTO: +1.00%
NewPM-ReleaseLTO-g: +1.58%
http://llvm-compile-time-tracker.com/compare.php?from=ead3979a92fc33add4710c4510d6906260dcb4ad&to=e67d86b57810011cf285eb9aa1944781be6096f0&stat=instructions
It is probably still too high to enable by default, but much better.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D115052
Summary:
A new option test-changed is defined that allows one to specify an
exe that is called after each pass in the opt pipeline that changes the IR.
The test-changed=exe option saves the IR in a temporary file and calls exe
with the name of the file and the name of the pass that just changed it after
each pass alters the IR. exe is also called with the initial IR. This
can be used, for example, to determine which pass corrupts the IR by having
exe as a script that calls llc and runs a test to see after which pass the
results change. The print-changed filtering options are respected.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D110776
Swap AIC and IC neighbouring in pipeline. This looks more natural and even
almost has no effect for now (three slightly touched tests of test-suite). Also
this could be the first step towards merging AIC (or its part) to -O2 pipeline.
After several changes in AIC (like D108091, D108201, D107766, D109515, D109236)
there've been observed several regressions (like PR52078, PR52253, PR52289)
that were fixed in different passes (see D111330, D112721) by extending their
functionality, but these regressions were exposed since changed AIC prevents IC
from making some of early optimizations.
This is common problem and it should be fixed by just moving AIC after IC
which looks more logically by itself: make aggressive instruction combining
only after failed ordinary one.
Fixes PR52289
Reviewed By: spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D113179
Add -NOT lines to ensure that no extra passes are run if
-extra-vectorizer-passes is not specified.
Also add a loop that actually gets vectorized in preparation for
D115052.
Summary:
Expand the testing for whether the lit tests for print-changed=dot-cfg
are supported to include checking whether dot supports pdf output.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: hvdijk (Harald van Dijk)
Differential Revision: https://reviews.llvm.org/D113187
Fix printing of LoopNestPasses when using the opt pipeline printer
option -print-pipeline-passes.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D114771
The legacy PM is deprecated, so update a bunch of lit tests running
opt to use the new PM syntax when specifying the pipeline.
In this patch focus has been put on test cases for ConstantMerge,
ConstraintElimination, CorrelatedValuePropagation, GlobalDCE,
GlobalOpt, SCCP, TailCallElim and PredicateInfo.
Differential Revision: https://reviews.llvm.org/D114516
In a CGSCC pass manager, we may visit the same function multiple times
due to SCC mutations. In the inliner pipeline, this results in running
the function simplification pipeline on a function multiple times even
if it hasn't been changed since the last function simplification
pipeline run.
We use a newly introduced analysis to keep track of whether or not a
function has changed since the last time the function simplification
pipeline has run on it. If we see this analysis available for a function
in a CGSCCToFunctionPassAdaptor, we skip running the function passes on
the function. The analysis is queried at the end of the function passes
so that it's available after the first time the function simplification
pipeline runs on a function. This is a per-adaptor option so it doesn't
apply to every adaptor.
The goal of this is to improve compile times. However, currently we
can't turn this on by default at least for the higher optimization
levels since the function simplification pipeline is not robust enough
to be idempotent in many cases, resulting in performance regressions if
we stop running the function simplification pipeline on a function
multiple times. We may be able to turn this on for -O1 in the near
future, but turning this on for higher optimization levels would require
more investment in the function simplification pipeline.
Heavily inspired by D98103.
Example compile time improvements with flag turned on:
https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions
Reviewed By: asbirlea, nikic
Differential Revision: https://reviews.llvm.org/D113947
Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved since we've already handled function analysis invalidation.
So far this only touches the inliner, argpromotion, function-attrs, and
updateCGAndAnalysisManager(), since they are the most used.
This is part of an effort to investigate running the function
simplification pipeline less on functions we visit multiple times in the
inliner pipeline.
However, this causes major memory regressions especially on larger IR.
To counteract this, turn on the option to eagerly invalidate function
analyses. This invalidates analyses on functions immediately after
they're processed in a module or scc to function adaptor for specific
parts of the pipeline.
Within an SCC, if a pass only modifies one function, other functions in
the SCC do not have their analyses invalidated, so in later function
passes in the SCC pass manager the analyses may still be cached. It is
only after the function passes that the eager invalidation takes effect.
For the default pipelines this makes sense because the inliner pipeline
runs the function simplification pipeline after all other SCC passes
(except CoroSplit which doesn't request any analyses).
Overall this has mostly positive effects on compile time and positive effects on memory usage.
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructionshttps://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss
D113196 shows that we slightly regressed compile times in exchange for
some memory improvements when turning on eager invalidation. D100917
shows that we slightly improved compile times in exchange for major
memory regressions in some cases when invalidating less in SCC passes.
Turning these on at the same time keeps the memory improvements while
keeping compile times neutral/slightly positive.
Reviewed By: asbirlea, nikic
Differential Revision: https://reviews.llvm.org/D113304
To be more consistent with other pass struct names.
There are still more passes that don't end with "Pass", but these are the important ones.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D112935
- CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, it breaks the portability between Clang and NVCC.
- This change proposes to assume the address space from a pointer from the assumption built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g.,
```
foo(float *p) {
__builtin_assume(__isGlobal(p));
// From there, we could assume p is a global pointer instead of a
// generic one.
}
```
This makes the code portable without introducing the implementation-specific features.
Note that NVCC starts to support __builtin_assume from version 11.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D112041
Having a NoOpLoopNestPass can ensure that only outermost loop is invoked
for a LoopNestPass with a lit test.
There are some existing passes that are implemented as LoopNestPass, but
they are still using LOOP_PASS macro.
It would be easier to identify LoopNestPasses with a LOOPNEST_PASS
macro.
Differential Revision: https://reviews.llvm.org/D113185
Summary:
Add new options -print-changed=[dot-cfg | dot-cfg-quiet] which create
a website of DOT files showing colourized changes as the IR is changed
by passes in the new pass manager pipeline.
A new change reporter is introduced that creates a website of changes made
by passes in the opt pipeline that change the IR. The hidden option
-dot-cfg-dir=<dir> specifies a directory (defaulting to "./") into which the
website will be created.
A file passes.html is created that contains a list of all the passes that
act on the IR. Those that do not change the IR are listed as omitted
because of no change, ignored or filtered out (using -filter-print-func
and -filter-passes) or not listed in quiet mode. Those that
do change the IR are listed as a link to a DOT file which contains a
CFG depiction of the IR (ala -dot-cfg) except that the instructions,
basic blocks and links that are only in the IR before the pass (ie, removed)
and those that are only in the IR after the pass (ie, added) are shown in
red and green, respectively, while the aspects of the CFG that do not change
are shown in black. Additional hidden options
-dot-cfg-before-color=<dot named color>,
-dot-cfg-after-color=<dot named color> and
-dot-cfg-common-color=<dot named color> are defined that allow the
customization of the colors used in colorizing the CFG.
-change-printer-dot-path=<path to dot exe> is also added.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D87202
Right now when we see -O# we add the corresponding 'default<O#>' into
the list of passes to run when translating legacy -pass-name. This has
the side effect of not using the default AA pipeline.
Instead, treat -O# as -passes='default<O#>', but don't allow any other
-passes or -pass-name. I think we can keep `opt -O#` as shorthand for
`opt -passes='default<O#>` but disallow anything more than just -O#.
Tests need to be updated to not use `opt -O# -pass-name`.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D112036
If another inlining session came after a ModuleInlinerWrapperPass, the
advisor alanysis would still be cached, but its Result would be cleared.
We need to clear both.
This addresses PR52118
Differential Revision: https://reviews.llvm.org/D111586
This adds the `--dump-blockinfo` flag to `llvm-bcanalyzer`, allowing a sufficiently motivated user to dump (parts of) the `BLOCKINFO_BLOCK` block. The default behavior is unchanged, and `--dump-blockinfo` only takes effect in the same context as other flags that control dump behavior (i.e., requires that `--dump` is also passed).
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D107536
Summary:
The IR is saved in its print form before each pass is started and a
signal handler is registered. If the compilation crashes, the signal
handler will print the saved IR to dbgs(). This option
can be modified using -print-module-scope to get the IR for the complete
module. Filtering options can be used to improve performance by limiting
which passes (or functions) save the IR. Note that this option only works
with the new pass manager.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks) yrouban (Yevgeny Rouban)
Differential Revision: https://reviews.llvm.org/D86657
Bisecting and reducing opt pipelines that includes the
ModuleInlinerWrapperPass has turned out to be a bit problematic.
This is far from perfect (it still lacks information about inline
advisor params etc.), but it should give some kind of hint to what
the wrapped pipeline looks like when using -print-pipeline-passes.
Reviewed By: aeubanks, mtrofin
Differential Revision: https://reviews.llvm.org/D109878
IR with matrix intrinsics is likely to also contain large vector
operations, which can benefit from early simplifications.
This is the last step in a series of changes to improve code-gen for
code using matrix subscript operators with the C/C++ matrix extension in
CLang, like
using matrix_t = double __attribute__((matrix_type(15, 15)));
void foo(unsigned i, matrix_t &A, matrix_t &B) {
for (unsigned j = 0; j < 4; ++j)
for (unsigned k = 0; k < i; k++)
B[k][j] -= A[k][j] * B[i][j];
}
https://clang.godbolt.org/z/6dKxK1Ed7
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D102496
Split MemorySanitizerPass into MemorySanitizerPass (as a function
pass) and ModuleMemorySanitizerPass (as a module pass).
Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.
This is a follow-up to D105006 and D105007.
Added '-print-pipeline-passes' printing of parameters for those passes
declared with *_WITH_PARAMS macro in PassRegistry.def.
Note that it only prints the parameters declared inside *_WITH_PARAMS as
in a few cases there appear to be additional parameters not parsable.
The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def).
LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch
Differential Revision: https://reviews.llvm.org/D109310
Currently, opaque pointers are supported in two forms: The
-force-opaque-pointers mode, where all pointers are opaque and
typed pointers do not exist. And as a simple ptr type that can
coexist with typed pointers.
This patch removes support for the mixed mode. You either get
typed pointers, or you get opaque pointers, but not both. In the
(current) default mode, using ptr is forbidden. In -opaque-pointers
mode, all pointers are opaque.
The motivation here is that the mixed mode introduces additional
issues that don't exist in fully opaque mode. D105155 is an example
of a design problem. Looking at D109259, it would probably need
additional work to support mixed mode (e.g. to generate GEPs for
typed base but opaque result). Mixed mode will also end up
inserting many casts between i8* and ptr, which would require
significant additional work to consistently avoid.
I don't think the mixed mode is particularly valuable, as it
doesn't align with our end goal. The only thing I've found it to
be moderately useful for is adding some opaque pointer tests in
between typed pointer tests, but I think we can live without that.
Differential Revision: https://reviews.llvm.org/D109290
Added opt option -print-pipeline-passes to print a -passes compatible
string describing the built pass pipeline.
As an example:
$ opt -enable-new-pm=1 -adce -licm -simplifycfg -o /dev/null /dev/null -print-pipeline-passes
verify,function(adce),function(loop-mssa(licm)),function(simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),verify,BitcodeWriterPass
At the moment this is best-effort only and there are some known
limitations:
- Not all passes accepting parameters will print their parameters
(currently only implemented for simplifycfg).
- Some ClassName to pass-name mappings are not unique.
- Some ClassName to pass-name mappings are missing (e.g.
BitcodeWriterPass).
Differential Revision: https://reviews.llvm.org/D108298
Added opt option -print-pipeline-passes to print a -passes compatible
string describing the built pass pipeline.
As an example:
$ opt -enable-new-pm=1 -adce -licm -simplifycfg -o /dev/null /dev/null -print-pipeline-passes
verify,function(adce),function(loop-mssa(licm)),function(simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),verify,BitcodeWriterPass
At the moment this is best-effort only and there are some known
limitations:
- Not all passes accepting parameters will print their parameters
(currently only implemented for simplifycfg).
- Some ClassName to pass-name mappings are not unique.
- Some ClassName to pass-name mappings are missing (e.g.
BitcodeWriterPass).
Only enforce that ptr* is illegal if the base type is a simple type,
not when it is something like %ty, where %ty may resolve to an
opaque pointer in force-opaque-pointers mode.
Differential Revision: https://reviews.llvm.org/D108876
Currently it's possible to silently use a loop pass that does not
preserve MemorySSA in a loop-mssa pass manager, as we don't
statically know which loop passes preserve MemorySSA (as was the
case with the legacy pass manager).
However, we can at least add a check after the fact that if
MemorySSA is used, then it should also have been preserved.
Hopefully this will reduce confusion as seen in
https://bugs.llvm.org/show_bug.cgi?id=51020.
Differential Revision: https://reviews.llvm.org/D108399
MSSA-based LICM has been enabled by default for a few years now.
This drops the old AST-based implementation. Using loop(licm) will
result in a fatal error, the use of loop-mssa(licm) is required
(or just licm, which defaults to loop-mssa).
Note that the core canSinkOrHoistInst() logic has to retain AST
support for now, because it is shared with LoopSink.
Differential Revision: https://reviews.llvm.org/D108244
`LLVM :: Other/lit-quoting.txt` currently `FAIL`s on Solaris:
llvm/test/Other/lit-quoting.txt:8:9: error: CHECK2: expected string not found in input
CHECK2: {{^a\[b\\c$}}
^
<stdin>:1:1: note: scanning from here
a[b
^
This happens because echo with backslashes or special characters is
unportable, as extensively documented in the Autoconf manual. In the case
at hand, `echo 'a[b\c'` yields `a[b\c` on Linux, but `a[b` (no newline) on
Solaris.
This patch fixes this by using the portable alternative suggested in the
Autoconf manual.
Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.
Differential Revision: https://reviews.llvm.org/D108031
Summary:
The assertion that both functions were not missing was incorrect and would
fail when one of the functions was missing. Fixed it and moved the
assertion earlier to check the input parameters to better capture
first-failure. Added lit test.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D107989