llvm-project

Commit Graph

Author	SHA1	Message	Date
Andrea Di Biagio	edbf06a767	[AsmPrinter] Remove hidden flag -print-schedule. This patch removes hidden codegen flag -print-schedule effectively reverting the logic originally committed as r300311 (https://llvm.org/viewvc/llvm-project?view=revision&revision=300311). Flag -print-schedule was originally introduced by r300311 to address PR32216 (https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better testing of schedule model instruction latencies/throughputs". These days, we can use llvm-mca to test scheduling models. So there is no longer a need for flag -print-schedule in LLVM. The main use case for PR32216 is now addressed by llvm-mca. Flag -print-schedule is mainly used for debugging purposes, and it is only actually used by x86 specific tests. We already have extensive (latency and throughput) tests under "test/tools/llvm-mca" for X86 processor models. That means, most (if not all) existing -print-schedule tests for X86 are redundant. When flag -print-schedule was first added to LLVM, several files had to be modified; a few APIs gained new arguments (see for example method MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained a couple of getSchedInfoStr() methods. Method getSchedInfoStr() had to originally work for both MCInst and MachineInstr. The original implementation of getSchedInfoStr() introduced a subtle layering violation (reported as PR37160 and then fixed/worked-around by r330615). In retrospect, that new API could have been designed more optimally. We can always query MCSchedModel to get the latency and throughput. More importantly, the "sched-info" string should not have been generated by the subtarget. Note, r317782 fixed an issue where "print-schedule" didn't work very well in the presence of inline assembly. That commit is also reverted by this change. Differential Revision: https://reviews.llvm.org/D57244 llvm-svn: 353043	2019-02-04 12:51:26 +00:00
Roman Lebedev	bd84b139b0	[llvm-exegesis] Cut run time of analysis mode by another -35% (sic) (YamlContext::getRegNo()) Summary: Together with the previous patch, it's an -90% improvement, or roughly -96% improvement if you look starting with rL347204 ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-bew.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-bew.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-bew.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-bew.html' (9 runs): 1483.18 msec task-clock # 0.999 CPUs utilized ( +- 0.10% ) 68 context-switches # 46.085 M/sec ( +- 22.62% ) 0 cpu-migrations # 0.000 K/sec 11641 page-faults # 7850.880 M/sec ( +- 0.62% ) 5943246799 cycles # 4008184.428 GHz ( +- 0.10% ) (83.28%) 442869514 stalled-cycles-frontend # 7.45% frontend cycles idle ( +- 0.41% ) (83.29%) 1443375663 stalled-cycles-backend # 24.29% backend cycles idle ( +- 0.47% ) (33.43%) 7714006752 instructions # 1.30 insn per cycle # 0.19 stalled cycles per insn ( +- 0.07% ) (50.17%) 1977242936 branches # 1333472193.855 M/sec ( +- 0.07% ) (66.79%) 32624220 branch-misses # 1.65% of all branches ( +- 0.18% ) (83.34%) 1.48438 +- 0.00143 seconds time elapsed ( +- 0.10% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" 
-analysis-inconsistencies-output-file=/tmp/clusters-newer.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-newer.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-newer.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-newer.html' (9 runs): 963.28 msec task-clock # 0.999 CPUs utilized ( +- 0.37% ) 12 context-switches # 12.695 M/sec ( +- 52.79% ) 0 cpu-migrations # 0.000 K/sec 11599 page-faults # 12046.971 M/sec ( +- 0.59% ) 3860122322 cycles # 4009359.596 GHz ( +- 0.37% ) (83.19%) 380300669 stalled-cycles-frontend # 9.85% frontend cycles idle ( +- 0.34% ) (83.30%) 1071910340 stalled-cycles-backend # 27.77% backend cycles idle ( +- 1.30% ) (33.51%) 4773418224 instructions # 1.24 insn per cycle # 0.22 stalled cycles per insn ( +- 0.15% ) (50.17%) 1106990316 branches # 1149787979.919 M/sec ( +- 0.11% ) (66.80%) 23632231 branch-misses # 2.13% of all branches ( +- 0.18% ) (83.33%) 0.96389 +- 0.00356 seconds time elapsed ( +- 0.37% ) ``` ``` $ sha512sum /tmp/clusters-* db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-bew.html db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-newer.html db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-old.html ``` Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, llvm-commits Tags: #llvm Differential 
Revision: https://reviews.llvm.org/D57658 llvm-svn: 353025	2019-02-04 09:12:25 +00:00
Roman Lebedev	5b94fe9623	[llvm-exegesis] Cut run time of analysis mode by -84% (sic) (YamlContext::getInstrOpcode()) Summary: ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs): 9465.46 msec task-clock # 1.000 CPUs utilized ( +- 0.05% ) 60 context-switches # 6.363 M/sec ( +- 79.45% ) 0 cpu-migrations # 0.000 K/sec 11364 page-faults # 1200.697 M/sec ( +- 0.60% ) 37935623543 cycles # 4008083.912 GHz ( +- 0.05% ) (83.32%) 2371625356 stalled-cycles-frontend # 6.25% frontend cycles idle ( +- 0.37% ) (83.32%) 8476077875 stalled-cycles-backend # 22.34% backend cycles idle ( +- 0.18% ) (33.36%) 41822439158 instructions # 1.10 insn per cycle # 0.20 stalled cycles per insn ( +- 0.02% ) (50.03%) 11607658944 branches # 1226405861.486 M/sec ( +- 0.01% ) (66.69%) 210864633 branch-misses # 1.82% of all branches ( +- 0.06% ) (83.34%) 9.46636 +- 0.00441 seconds time elapsed ( +- 0.05% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-bew.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 
benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-bew.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 14656 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-bew.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-bew.html' (9 runs): 1480.66 msec task-clock # 1.000 CPUs utilized ( +- 0.19% ) 13 context-switches # 8.483 M/sec ( +- 83.10% ) 0 cpu-migrations # 0.075 M/sec ( +-100.00% ) 11596 page-faults # 7834.247 M/sec ( +- 0.59% ) 5933732194 cycles # 4008977.535 GHz ( +- 0.19% ) (83.22%) 438111928 stalled-cycles-frontend # 7.38% frontend cycles idle ( +- 0.37% ) (83.25%) 1454969705 stalled-cycles-backend # 24.52% backend cycles idle ( +- 0.94% ) (33.53%) 7724218604 instructions # 1.30 insn per cycle # 0.19 stalled cycles per insn ( +- 0.07% ) (50.14%) 1979796413 branches # 1337599858.945 M/sec ( +- 0.06% ) (66.74%) 32641638 branch-misses # 1.65% of all branches ( +- 0.18% ) (83.31%) 1.48128 +- 0.00284 seconds time elapsed ( +- 0.19% ) $ sha512sum /tmp/clusters-* db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-bew.html db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf /tmp/clusters-old.html ``` Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, llvm-commits, RKSimon Tags: #llvm Differential Revision: https://reviews.llvm.org/D57657 llvm-svn: 353024	2019-02-04 09:12:21 +00:00
Roman Lebedev	1a0d595f15	[llvm-exegesis] Throughput support in analysis mode Summary: D57000 / [[ https://bugs.llvm.org/show_bug.cgi?id=37698 \| PR37698 ]] added support for measuring of the inverse throughput. But the support for the analysis was not added. This attempts to fix that. (analysis done o bdver2 / piledriver) First, small-scale experiment: ``` $ ./bin/llvm-exegesis -num-repetitions=10000 -mode=inverse_throughput -opcode-name=BSF64rr Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-d0acdd.o --- mode: inverse_throughput key: instructions: - 'BSF64rr RAX RDX' config: '' register_initial_values: - 'RDX=0x0' cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: inverse_throughput, value: 3.0278, per_snippet_value: 3.0278 } error: '' info: instruction has no tied variables picking Uses different from defs assembled_snippet: 48BA0000000000000000480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2C3 ... ``` If we plug `bsfq %r12, %r10` into llvm-mca: https://godbolt.org/z/ZtOyhJ ``` Dispatch Width: 4 uOps Per Cycle: 3.00 IPC: 0.50 Block RThroughput: 2.0 ``` So RThroughput mismatch exists. Now, let's upscale and analyse: {F8207148} `$ ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html`: {F8207172} {F8207197} And if we now look at https://www.agner.org/optimize/instruction_tables.pdf, `Reciprocal throughput` for `BSF r,r` is listed as `3`. Yay? Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57647 llvm-svn: 353023	2019-02-04 09:12:17 +00:00
Roman Lebedev	dc78bc277d	[llvm-exegesis] deserializeMCInst(): bump SmallVector small size up to 16 Summary: ... from 8. `VALIGNDZ128rmbik XMM0 XMM0 K1 XMM3 RDI i_0x1 i_0x0 i_0x1` instruction already has 9 components. It does not matter much in terms of performance, but avoiding allocation seems to come with low cost here. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57654 llvm-svn: 353022	2019-02-04 09:12:13 +00:00
Roman Lebedev	21193f4b7e	[llvm-exegesis] Don't default to running&dumping all analyses to '-' Summary: Up until the point i looked in the source, i didn't even understand that i can disable 'cluster' output. I have always silenced it via ` &> /dev/null`. (And hoped it wasn't contributing much to the run time.) While i expect that it has its use-cases, i never once needed it so far. If i forget to silence it, the console is completely flooded with that output. How about not expecting users to opt out of analyses, but to explicitly specify the analyses that should be performed? Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57648 llvm-svn: 353021	2019-02-04 09:12:08 +00:00
Clement Courbet	362653f7af	[llvm-exegesis] Add throughput mode. Summary: This just uses the latency benchmark runner on the parallel uops snippet generator. Fixes PR37698. Reviewers: gchatelet Subscribers: tschuett, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D57000 llvm-svn: 352632	2019-01-30 16:02:20 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Clement Courbet	176388c973	Revert rL350035 "[llvm-exegesis] Clustering: don't enqueue a point multiple times" Let's discuss this on the review thread before submitting. llvm-svn: 350207	2019-01-02 09:21:00 +00:00
Fangrui Song	cd93d7ef43	[llvm-exegesis] Clustering: don't enqueue a point multiple times Summary: SetVector uses both DenseSet and vector, which is time/memory inefficient. The points are represented as natural numbers so we can replace the DenseSet part by indexing into a vector<char> instead. Don't cargo cult the pseudocode on the wikipedia DBSCAN page. This is a standard BFS style algorithm (the similar loops have been used several times in other LLVM components): every point is processed at most once, thus the queue has at most NumPoints elements. We represent it with a vector and allocate it outside of the loop to avoid allocation in the loop body. We check `Processed[P]` to avoid enqueueing a point more than once, which also nicely saves us a `ClusterIdForPoint_[Q].isUndef()` check. Many people hate the oneshot abstraction but some favor it, therefore we make a compromise, use a lambda to abstract away the neighbor adding process. Delete the comment `assert(Neighbors.capacity() == (Points_.size() - 1));` as it is wrong. llvm-svn: 350035	2018-12-23 20:48:52 +00:00
Simon Pilgrim	96408bb04a	Revert rL349136: [llvm-exegesis] Optimize ToProcess in dbScan Summary: Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess` We also check `Added[P]` to avoid enqueueing a point more than once, which also saves us a `ClusterIdForPoint_[Q].isUndef()` check. Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54442 ........ Patch wasn't approved and breaks buildbots llvm-svn: 349139	2018-12-14 09:25:08 +00:00
Fangrui Song	92537ccc7e	[llvm-exegesis] Optimize ToProcess in dbScan Summary: Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess` We also check `Added[P]` to avoid enqueueing a point more than once, which also saves us a `ClusterIdForPoint_[Q].isUndef()` check. Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54442 llvm-svn: 349136	2018-12-14 08:27:35 +00:00
Jinsong Ji	56c74cff70	[llvm-exegesis][NFC] Some code style cleanup Apply review comments of https://reviews.llvm.org/D54185 to other targets as well, specifically: 1. make anonymous namespaces as small as possible, avoid using static inside anonymous namespaces 2. Add missing header to some files 3. GetLoadImmediateOpcode -> getLoadImmediateOpcode 4. Fix typo Differential Revision: https://reviews.llvm.org/D54343 llvm-svn: 347309	2018-11-20 14:41:59 +00:00
Clement Courbet	bbab546a71	[llvm-exegesis][NFC] More tests for ExegesisTarget::fillMemoryOperands(). Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54304 llvm-svn: 347209	2018-11-19 14:31:43 +00:00
Roman Lebedev	71fdb57640	[llvm-exegesis] (+final perf overview) InstructionBenchmarkClustering::rangeQuery(): reserve for the upper bound of Neighbors Summary: As was pointed out in D54388+D54390, the maximal size of `Neighbors` is known: it will contain at most Points_.size() minus one (the center of the cluster). While that is the upper bound, meaning that in most cases the actual count will be much smaller, D54390 made the allocation persistent, so we no longer have to worry about overly-optimistically `reserve()`ing. Old: (D54393) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 6553.167456 task-clock (msec) # 1.000 CPUs utilized ( +- 0.21% ) ... 6.5547 +- 0.0134 seconds time elapsed ( +- 0.20% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 6315.057872 task-clock (msec) # 0.999 CPUs utilized ( +- 0.24% ) ... 6.3187 +- 0.0160 seconds time elapsed ( +- 0.25% ) ``` And that is another -~4%. Since this is the last (as of this moment) patch in this patch series, it is a good time to summarize: Old: (svn trunk, as stated in D54381) ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m24.884s user 0m24.099s sys 0m0.785s ``` So these patches, on the given benchmark, have decreased llvm-exegesis analysis time by 74.62%. There surely is more room for further improvements. D54514 may improve things by -11.5% more (relative to this patch). Parallelization may improve things further significantly, too.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54415 llvm-svn: 347204	2018-11-19 13:28:41 +00:00
Roman Lebedev	8e315b66c2	[llvm-exegesis] Move InstructionBenchmarkClustering::isNeighbour() into header Summary: Old: (D54390) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7432.421721 task-clock (msec) # 1.000 CPUs utilized ( +- 0.15% ) ... 7.4336 +- 0.0115 seconds time elapsed ( +- 0.15% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 6569.936144 task-clock (msec) # 1.000 CPUs utilized ( +- 0.22% ) ... 6.5711 +- 0.0143 seconds time elapsed ( +- 0.22% ) ``` And another -12%. You'd think it would be `inline`d anyway, but no! :) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54393 llvm-svn: 347203	2018-11-19 13:28:36 +00:00
Roman Lebedev	666d855fbb	[llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): write into llvm::SmallVectorImpl& output parameter Summary: I do believe this is the correct fix. We call `rangeQuery()` very often. And many times its output vector is large (tens of thousands of entries), so small-size-opt won't help. Old: (D54389) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7934.528363 task-clock (msec) # 1.000 CPUs utilized ( +- 0.19% ) ... 7.9354 +- 0.0148 seconds time elapsed ( +- 0.19% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7383.793440 task-clock (msec) # 1.000 CPUs utilized ( +- 0.47% ) ... 7.3868 +- 0.0340 seconds time elapsed ( +- 0.46% ) ``` And another -7%. And that isn't even the good bit yet. Old: * calls to allocation functions: 2081419 * temporary allocations: 219658 (10.55%) * bytes allocated in total (ignoring deallocations): 4.31 GB New: * calls to allocation functions: 1880295 (-10%) * temporary allocations: 18758 (1%) (-91% sic) * bytes allocated in total (ignoring deallocations): 545.15 MB (-88% sic) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54390 llvm-svn: 347202	2018-11-19 13:28:31 +00:00
Roman Lebedev	5c5b1ea725	[llvm-exegesis] InstructionBenchmarkClustering::dbScan(): replace std::vector<> with std::deque<> in llvm::SetVector<> Summary: Old: (D54388) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 8606.323981 task-clock (msec) # 1.000 CPUs utilized ( +- 0.11% ) ... 8.60773 +- 0.00978 seconds time elapsed ( +- 0.11% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 7971.403653 task-clock (msec) # 1.000 CPUs utilized ( +- 0.14% ) ... 7.9728 +- 0.0113 seconds time elapsed ( +- 0.14% ) ``` Another -~7%. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, RKSimon Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54389 llvm-svn: 347201	2018-11-19 13:28:26 +00:00
Roman Lebedev	8aecb0c489	[llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): use llvm::SmallVector<size_t, 0> for storage. Summary: Old: (D54383) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 9098.781978 task-clock (msec) # 1.000 CPUs utilized ( +- 0.16% ) ... 9.1015 +- 0.0148 seconds time elapsed ( +- 0.16% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs): 8553.352480 task-clock (msec) # 1.000 CPUs utilized ( +- 0.12% ) ... 8.5539 +- 0.0105 seconds time elapsed ( +- 0.12% ) ``` So another -6%. That is because the `SmallVector` doubles its size when reallocating, which is great here, since we can't `reserve()` because we can't know how many `Neighbors` we will have. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54388 llvm-svn: 347200	2018-11-19 13:28:22 +00:00
Roman Lebedev	b311c1d6b8	[llvm-exegesis] Analysis: writeMeasurementValue(): don't alloc string for double each time. Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54382) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 9024.354355 task-clock (msec) # 1.000 CPUs utilized ( +- 0.18% ) ... 9.0262 +- 0.0161 seconds time elapsed ( +- 0.18% ) ``` New time: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 8996.541057 task-clock (msec) # 0.999 CPUs utilized ( +- 0.19% ) ... 9.0045 +- 0.0172 seconds time elapsed ( +- 0.19% ) ``` -~0.3%, not that much. But this isn't the important part. Old: * calls to allocation functions: 2109712 * temporary allocations: 33112 * bytes allocated in total (ignoring deallocations): 4.43 GB New: * calls to allocation functions: 2095345 (-0.68%) * temporary allocations: 18745 (-43.39% !!!) * bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54383 llvm-svn: 347199	2018-11-19 13:28:17 +00:00
Roman Lebedev	f8b28e9bf4	[llvm-exegesis] Analysis::writeSnippet(): be smarter about memory allocations. Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54381) ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m10.487s user 0m9.745s sys 0m0.740s ``` New time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m9.599s user 0m8.824s sys 0m0.772s ``` Not that much, around -9%. But that is not the good part yet, again. Old: * calls to allocation functions: 3347676 * temporary allocations: 277818 * bytes allocated in total (ignoring deallocations): 10.52 GB New: * calls to allocation functions: 2109712 (-36%) * temporary allocations: 33112 (-88%) * bytes allocated in total (ignoring deallocations): 4.43 GB (-58% sic) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54382 llvm-svn: 347198	2018-11-19 13:28:14 +00:00
Roman Lebedev	0b4b512826	[llvm-exegesis] InstructionBenchmarkClustering::dbScan(): use llvm::SetVector<> instead of ILLEGAL std::unordered_set<> Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m24.884s user 0m24.099s sys 0m0.785s ``` New time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m10.469s user 0m9.797s sys 0m0.672s ``` So -60%. And that isn't the good bit yet. Old: * calls to allocation functions: 106560180 (yes, 107 million allocations.) * bytes allocated in total (ignoring deallocations): 12.17 GB New: * calls to allocation functions: 3347676 (-96.86%) (just 3 mil) * bytes allocated in total (ignoring deallocations): 10.52 GB (~2GB less) --- Two points i want to raise: * `std::unordered_set<>` should not have been used there in the first place. It is banned by the https://llvm.org/docs/ProgrammersManual.html#other-set-like-container-options * There are no tests, so i'm not fully sure this is correct. Since it was an unordered set, i guess there are zero restrictions on the order, and anything will be ok? * I tried other containers suggested in https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc, this `llvm::SetVector<>` seems to be best here. Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet Subscribers: kristina, bobsayshilol, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54381 llvm-svn: 347197	2018-11-19 13:28:09 +00:00
Clement Courbet	eee2e06e2a	[llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target. Summary: This simplifies the code and moves everything to tablegen for consistency. This also prepares the ground for adding issue counters. Reviewers: gchatelet, john.brawn, jsji Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54297 llvm-svn: 346489	2018-11-09 13:15:32 +00:00
Jinsong Ji	5fd3e75478	[PowerPC][llvm-exegesis] Add a PowerPC target This is a patch to add a PowerPC target to llvm-exegesis. The target does just enough to be able to run llvm-exegesis in latency mode for at least some opcodes. Differential Revision: https://reviews.llvm.org/D54185 llvm-svn: 346411	2018-11-08 16:51:42 +00:00
Clement Courbet	54c2fa1202	[llvm-exegesis][NFC] Add missing header guard + cosmetics. Reviewers: gchatelet Reviewed By: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54252 llvm-svn: 346400	2018-11-08 12:37:56 +00:00
Clement Courbet	0d79aaf1a7	Revert "[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes." This reverts accidental commit rL346394. llvm-svn: 346398	2018-11-08 12:09:45 +00:00
Clement Courbet	c0950ae990	[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes. llvm-svn: 346394	2018-11-08 11:45:14 +00:00
Clement Courbet	5b0d783078	[llvm-exegesis] Remove superfluous move. /Users/buildslave/as-bldslv9_new/lld-x86_64-darwin13/llvm.src/tools/llvm-exegesis/lib/X86/Target.cpp:155:12: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move] return std::move(Error); ^ /Users/buildslave/as-bldslv9_new/lld-x86_64-darwin13/llvm.src/tools/llvm-exegesis/lib/X86/Target.cpp:155:12: note: remove std::move call here return std::move(Error); ^~~~~~~~~~ ~ llvm-svn: 346333	2018-11-07 16:52:50 +00:00
Clement Courbet	c544838f87	[llvm-exegesis] Correctly handle all X86 memory encoding formats. Summary: Add unit tests to check the support for each supported format to avoid regressions such as the one in PR36906. Reviewers: gchatelet Subscribers: tschuett, lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D54144 llvm-svn: 346330	2018-11-07 16:14:55 +00:00
Clement Courbet	7066769223	[llvm-exegesis] Increasing wrapping limit. Summary: Fixes PR39097. Reviewers: gchatelet Subscribers: llvm-commits, tschuett Differential Revision: https://reviews.llvm.org/D54151 llvm-svn: 346328	2018-11-07 15:46:45 +00:00
Clement Courbet	003e08ff28	[llvm-exegesis] Ignore X86 pseudo instructions. Summary: They do not lower to actual MCInsts and have no scheduling info. Reviewers: gchatelet Subscribers: llvm-commits, tschuett Differential Revision: https://reviews.llvm.org/D54147 llvm-svn: 346227	2018-11-06 14:11:58 +00:00
Matthias Braun	3d849f67cb	MachineModuleInfo: Store more specific reference to LLVMTargetMachine; NFC MachineModuleInfo can only be used in code using lib/CodeGen, hence we can keep a more specific reference to LLVMTargetMachine rather than just TargetMachine around. llvm-svn: 346182	2018-11-05 23:49:13 +00:00
Clement Courbet	4d837fce88	[llvm-exegesis] Fix SNB counter definition and handling. Summary: SNB is the only one that has P23 as a single proc res. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53766 llvm-svn: 345480	2018-10-28 19:09:14 +00:00
Simon Pilgrim	2a9c728088	Fix MSVC llvm-exegesis build. NFCI. MSVC is a bit funny about is_pod..... llvm-svn: 345252	2018-10-25 10:45:38 +00:00
Clement Courbet	b4b6ec01c6	[llvm-exegesis] Add missing initializer. This is a better fix than rL345245. llvm-svn: 345246	2018-10-25 08:11:35 +00:00
Clement Courbet	fa99b36e4d	[llvm-exegesis] Fix VC build of r345243. "const members cannot be default initialized unless their type has a user defined default constructor" Make members non-const. llvm-svn: 345245	2018-10-25 08:08:58 +00:00
Clement Courbet	8902c885d6	[llvm-exegesis] Fix warning in r345243. warning C4099: 'llvm::exegesis::PfmCountersInfo': type name first seen using 'class' now seen using 'struct' llvm-svn: 345244	2018-10-25 08:06:35 +00:00
Clement Courbet	41c8af3924	[MCSched] Bind PFM Counters to the CPUs instead of the SchedModel. Summary: The pfm counters are now in the ExegesisTarget rather than the MCSchedModel (PR39165). This also compresses the pfm counter tables (PR37068). Reviewers: RKSimon, gchatelet Subscribers: mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D52932 llvm-svn: 345243	2018-10-25 07:44:01 +00:00
Guillaume Chatelet	da11b85606	[llvm-exegesis] Implements a cache of Instruction objects. llvm-svn: 345130	2018-10-24 11:55:06 +00:00
Fangrui Song	a342834b24	[llvm-exegesis] Fix name lookup ambiguity in MSVC after 344922 llvm-svn: 344927	2018-10-22 17:52:31 +00:00
Fangrui Song	32401afd8c	[llvm-exegesis] Move namespace exegesis inside llvm:: Summary: This allows simplifying references of llvm::foo with foo when the needs come in the future. Reviewers: courbet, gchatelet Reviewed By: gchatelet Subscribers: javed.absar, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53455 llvm-svn: 344922	2018-10-22 17:10:47 +00:00
Guillaume Chatelet	18ef4a4a0d	[llvm-exegesis] Crash when assembling invalid Operand llvm-svn: 344907	2018-10-22 15:06:10 +00:00
Guillaume Chatelet	02f70a3fde	[llvm-exegesis] Mark x86 segment register instructions as unsupported. Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53499 llvm-svn: 344906	2018-10-22 14:55:43 +00:00
Guillaume Chatelet	3c639f33b4	[llvm-exegesis] Reject x86 instructions that use non uniform memory accesses Reviewers: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53438 llvm-svn: 344905	2018-10-22 14:46:08 +00:00
Clement Courbet	8d0dd0ba0e	[llvm-exegesis] Mark second-form X87 instructions as unsupported. Summary: We only support the first form because we rely on information that is only available there. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53430 llvm-svn: 344782	2018-10-19 12:24:49 +00:00
Clement Courbet	22bad0497e	[llvm-exegesis] Re-enable liveliness tracker. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53429 llvm-svn: 344780	2018-10-19 12:08:05 +00:00
Clement Courbet	c51f45239d	[llvm-exegesis] X87 RFP setup code. Summary: This was lost during refactoring in rL342644. Fix and simplify value size handling: always go through an 80 bit value, because the value can be 1 byte. Add unit tests. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53423 llvm-svn: 344779	2018-10-19 09:56:54 +00:00
Fangrui Song	2e83b2e9ee	Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC llvm-svn: 344774	2018-10-19 06:12:02 +00:00
Guillaume Chatelet	6a208e8c5f	[llvm-exegesis] Fix off by one error llvm-svn: 344731	2018-10-18 08:20:50 +00:00
Krasimir Georgiev	11bc3a18e2	[llvm-exegesis] Mark destructor virtual after r344695 This was causing a -Wnon-virtual-dtor warning. llvm-svn: 344721	2018-10-18 02:06:16 +00:00

208 Commits