forked from OSchip/llvm-project
69716394f3
Summary: Given an instruction `Opcode`, we can make benchmarks (measurements) of the instruction characteristics/performance. Then, to facilitate further analysis we group the benchmarks with *similar* characteristics into clusters. Now, this is all not entirely deterministic. Some instructions have variable characteristics, depending on their arguments. And thus, if we do several benchmarks of the same instruction `Opcode`, we may end up with *different* performance characteristics measurements. And when we then do clustering, these several benchmarks of the same instruction `Opcode` may end up being clustered into *different* clusters. This is not great for further analysis. We shall find every `Opcode` with benchmarks not in just one cluster, and move *all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`. I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit, and introducing `-analysis-display-unstable-clusters` switch to toggle between displaying stable-only clusters and unstable-only clusters. The reclusterization is deterministically stable, produces identical reports between runs. (Or at least that is what i'm seeing, maybe it isn't) Timings/comparisons: old (current trunk/head) {F8303582} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% ) 172 context-switches # 25.965 M/sec ( +- 29.89% ) 0 cpu-migrations # 0.042 M/sec ( +- 56.54% ) 31073 page-faults # 4690.754 M/sec ( +- 0.08% ) 26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%) 2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%) 13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%) 19770706799 instructions # 0.74 insn per cycle # 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%) 4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%) 121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%) 6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% ) ``` patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs): 6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% ) 213 context-switches # 32.952 M/sec ( +- 23.81% ) 1 cpu-migrations # 0.130 M/sec ( +- 43.84% ) 31287 page-faults # 4832.057 M/sec ( +- 0.08% ) 25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%) 1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%) 13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%) 19752995402 instructions # 0.76 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%) 4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%) 121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%) 6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% ) ``` Funnily, *this* measurement shows that said reclustering actually improved performance. patch, with reclustering, only the stable clusters {F8303594} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs): 6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% ) 133 context-switches # 20.792 M/sec ( +- 23.39% ) 0 cpu-migrations # 0.063 M/sec ( +- 61.24% ) 31318 page-faults # 4903.256 M/sec ( +- 0.08% ) 25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%) 1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%) 13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%) 19767554347 instructions # 0.77 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%) 4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%) 118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%) 6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% ) ``` Performance improved even further?! Makes sense i guess, less clusters to print. patch, with reclustering, only the unstable clusters {F8303601} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs): 6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% ) 194 context-switches # 31.709 M/sec ( +- 20.46% ) 0 cpu-migrations # 0.039 M/sec ( +- 49.77% ) 31413 page-faults # 5129.261 M/sec ( +- 0.06% ) 24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%) 1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%) 13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%) 18260877653 instructions # 0.74 insn per cycle # 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%) 4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%) 114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%) 6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% ) ``` This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps) (Also, wow this is fast, it used to take several minutes originally) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon Tags: #llvm Differential Revision: https://reviews.llvm.org/D58355 llvm-svn: 354441 |
||
---|---|---|
.. | ||
AMDGPU | ||
CommandGuide | ||
Frontend | ||
HistoricalNotes | ||
PDB | ||
Proposals | ||
TableGen | ||
_ocamldoc | ||
_static | ||
_templates | ||
_themes/llvm-theme | ||
tutorial | ||
AMDGPUInstructionNotation.rst | ||
AMDGPUInstructionSyntax.rst | ||
AMDGPUModifierSyntax.rst | ||
AMDGPUOperandSyntax.rst | ||
AMDGPUUsage.rst | ||
ARM-BE-bitcastfail.png | ||
ARM-BE-bitcastsuccess.png | ||
ARM-BE-ld1.png | ||
ARM-BE-ldr.png | ||
AdvancedBuilds.rst | ||
AliasAnalysis.rst | ||
Atomics.rst | ||
Benchmarking.rst | ||
BigEndianNEON.rst | ||
BitCodeFormat.rst | ||
BlockFrequencyTerminology.rst | ||
BranchWeightMetadata.rst | ||
BugLifeCycle.rst | ||
Bugpoint.rst | ||
CFIVerify.rst | ||
CMake.rst | ||
CMakeLists.txt | ||
CMakePrimer.rst | ||
CodeGenerator.rst | ||
CodeOfConduct.rst | ||
CodingStandards.rst | ||
CommandLine.rst | ||
CompileCudaWithLLVM.rst | ||
CompilerWriterInfo.rst | ||
Contributing.rst | ||
Coroutines.rst | ||
CoverageMappingFormat.rst | ||
DebuggingJITedCode.rst | ||
DeveloperPolicy.rst | ||
Docker.rst | ||
ExceptionHandling.rst | ||
ExtendedIntegerResults.txt | ||
ExtendingLLVM.rst | ||
Extensions.rst | ||
FAQ.rst | ||
FaultMaps.rst | ||
FuzzingLLVM.rst | ||
GarbageCollection.rst | ||
GetElementPtr.rst | ||
GettingStarted.rst | ||
GettingStartedVS.rst | ||
GlobalISel.rst | ||
GoldPlugin.rst | ||
HowToAddABuilder.rst | ||
HowToBuildOnARM.rst | ||
HowToBuildWithPGO.rst | ||
HowToCrossCompileBuiltinsOnArm.rst | ||
HowToCrossCompileLLVM.rst | ||
HowToReleaseLLVM.rst | ||
HowToSetUpLLVMStyleRTTI.rst | ||
HowToSubmitABug.rst | ||
HowToUseAttributes.rst | ||
HowToUseInstrMappings.rst | ||
InAlloca.rst | ||
LLVMBuild.rst | ||
LLVMBuild.txt | ||
LangRef.rst | ||
Lexicon.rst | ||
LibFuzzer.rst | ||
LinkTimeOptimization.rst | ||
MCJIT-creation.png | ||
MCJIT-dyld-load.png | ||
MCJIT-engine-builder.png | ||
MCJIT-load-object.png | ||
MCJIT-load.png | ||
MCJIT-resolve-relocations.png | ||
MCJITDesignAndImplementation.rst | ||
MIRLangRef.rst | ||
Makefile.sphinx | ||
MarkdownQuickstartTemplate.md | ||
MarkedUpDisassembly.rst | ||
MemorySSA.rst | ||
MergeFunctions.rst | ||
NVPTXUsage.rst | ||
OptBisect.rst | ||
Packaging.rst | ||
Passes.rst | ||
Phabricator.rst | ||
ProgrammersManual.rst | ||
Projects.rst | ||
README.txt | ||
ReleaseNotes.rst | ||
ReleaseProcess.rst | ||
ReportingGuide.rst | ||
ScudoHardenedAllocator.rst | ||
SegmentedStacks.rst | ||
SourceLevelDebugging.rst | ||
SpeculativeLoadHardening.md | ||
SphinxQuickstartTemplate.rst | ||
StackMaps.rst | ||
StackSafetyAnalysis.rst | ||
Statepoints.rst | ||
SupportLibrary.rst | ||
SystemLibrary.rst | ||
TableGenFundamentals.rst | ||
TestSuiteGuide.md | ||
TestSuiteMakefileGuide.rst | ||
TestingGuide.rst | ||
TransformMetadata.rst | ||
TypeMetadata.rst | ||
Vectorizers.rst | ||
WritingAnLLVMBackend.rst | ||
WritingAnLLVMPass.rst | ||
XRay.rst | ||
XRayExample.rst | ||
XRayFDRFormat.rst | ||
YamlIO.rst | ||
conf.py | ||
doxygen-mainpage.dox | ||
doxygen.cfg.in | ||
gcc-loops.png | ||
index.rst | ||
linpack-pc.png | ||
llvm-objdump.1 | ||
make.bat | ||
re_format.7 | ||
speculative_load_hardening_microbenchmarks.png | ||
yaml2obj.rst |
README.txt
LLVM Documentation ================== LLVM's documentation is written in reStructuredText, a lightweight plaintext markup language (file extension `.rst`). While the reStructuredText documentation should be quite readable in source form, it is mostly meant to be processed by the Sphinx documentation generation system to create HTML pages which are hosted on <http://llvm.org/docs/> and updated after every commit. Manpage output is also supported, see below. If you instead would like to generate and view the HTML locally, install Sphinx <http://sphinx-doc.org/> and then do: cd <build-dir> cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_HTML=true <src-dir> make -j3 docs-llvm-html $BROWSER <build-dir>/docs//html/index.html The mapping between reStructuredText files and generated documentation is `docs/Foo.rst` <-> `<build-dir>/docs//html/Foo.html` <-> `http://llvm.org/docs/Foo.html`. If you are interested in writing new documentation, you will want to read `SphinxQuickstartTemplate.rst` which will get you writing documentation very fast and includes examples of the most important reStructuredText markup syntax. Manpage Output =============== Building the manpages is similar to building the HTML documentation. The primary difference is to use the `man` makefile target, instead of the default (which is `html`). Sphinx then produces the man pages in the directory `<build-dir>/docs/man/`. cd <build-dir> cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_MAN=true <src-dir> make -j3 docs-llvm-man man -l >build-dir>/docs/man/FileCheck.1 The correspondence between .rst files and man pages is `docs/CommandGuide/Foo.rst` <-> `<build-dir>/docs//man/Foo.1`. These .rst files are also included during HTML generation so they are also viewable online (as noted above) at e.g. `http://llvm.org/docs/CommandGuide/Foo.html`. Checking links ============== The reachability of external links in the documentation can be checked by running: cd docs/ make -f Makefile.sphinx linkcheck Doxygen page Output ============== Install doxygen <http://www.stack.nl/~dimitri/doxygen/download.html> and dot2tex <https://dot2tex.readthedocs.io/en/latest>. cd <build-dir> cmake -DLLVM_ENABLE_DOXYGEN=On <llvm-top-src-dir> make doxygen-llvm # for LLVM docs make doxygen-clang # for clang docs It will generate html in <build-dir>/docs/doxygen/html # for LLVM docs <build-dir>/tools/clang/docs/doxygen/html # for clang docs