llvm-project/bolt/test/X86/pre-aggregated-perf.test

[BOLT] Add parser for pre-aggregated perf data

Summary:
The regular perf2bolt aggregation job is to read perf output directly.
However, if the data is coming from a database instead of perf, one
could write a query to produce a pre-aggregated file. This function
deals with this case.

The pre-aggregated file contains aggregated LBR data, but without binary
knowledge. BOLT will parse it and, using information from the
disassembled binary, augment it with fall-through edge frequency
information. After this step is finished, this data can be either
written to disk to be consumed by BOLT later, or can be used by BOLT
immediately if kept in memory.

File format syntax:
{B|F|f} [<start_id>:]<start_offset> [<end_id>:]<end_offset> <count> [<mispred_count>]

B - indicates an aggregated branch
F - an aggregated fall-through (trace)
f - an aggregated fall-through with external origin - used to disambiguate
    between a return hitting a basic block head and a regular internal
    jump to the block

<start_id> - build id of the object containing the start address. We can
skip it for the main binary and use "X" for an unknown object. This will
save some space and facilitate human parsing.

<start_offset> - hex offset from the object base load address (0 for the
main executable unless it's PIE) to the start address.

<end_id>, <end_offset> - same for the end address.

<count> - total aggregated count of the branch or a fall-through.

<mispred_count> - the number of times the branch was mispredicted.
Omitted for fall-throughs.

Example
F 41be50 41be50 3
F 41be90 41be90 4
f 41be90 41be90 7
B 4b1942 39b57f0 3 0
B 4b196f 4b19e0 2 0

(cherry picked from FBD8887182)
2018-07-18 09:31:46 +08:00
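The file format described in the commit message can be illustrated with a small reader. This is a minimal Python sketch, not BOLT's actual parser (which is written in C++). The names `Location`, `Record`, `parse_location`, and `parse_line` are hypothetical. It assumes that fields are whitespace-separated, that offsets are hex without a `0x` prefix, that counts are decimal, and that a fifth field is only valid on `B` records.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    # None means the id was omitted (main binary); "X" marks an unknown object.
    build_id: Optional[str]
    offset: int


class Record(NamedTuple):
    kind: str                 # "B", "F", or "f"
    start: Location
    end: Location
    count: int
    mispreds: Optional[int]   # None when the field is absent


def parse_location(field: str) -> Location:
    """Parse '[<id>:]<offset>' where the offset is hexadecimal."""
    build_id, sep, offset = field.rpartition(":")
    return Location(build_id if sep else None, int(offset, 16))


def parse_line(line: str) -> Record:
    """Parse one '{B|F|f} start end count [mispred_count]' line."""
    fields = line.split()
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 fields: {line!r}")
    kind = fields[0]
    if kind not in ("B", "F", "f"):
        raise ValueError(f"unknown record type {kind!r}")
    if kind != "B" and len(fields) == 5:
        # Assumption: mispredictions are only recorded for branches.
        raise ValueError(f"fall-through with mispred_count: {line!r}")
    # Assumption: counts are decimal, as in the commit's example lines.
    mispreds = int(fields[4]) if len(fields) == 5 else None
    return Record(
        kind,
        parse_location(fields[1]),
        parse_location(fields[2]),
        int(fields[3]),
        mispreds,
    )


if __name__ == "__main__":
    # Example lines taken from the commit message above.
    for text in ("F 41be50 41be50 3", "B 4b1942 39b57f0 3 0"):
        print(parse_line(text))
```

Keeping the build id as `None` when it is omitted, rather than substituting a default string, lets a caller tell "main binary" apart from the explicit "X" marker for an unknown object.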
# This script checks that perf2bolt is reading pre-aggregated perf information
# correctly for a simple example. The perf.data of this example was generated
# with the following command:
#
# $ perf record -j any,u -e branch -o perf.data -- ./blarge
#
# blarge is the binary for "basicmath large inputs" taken from Mibench.
Currently failing on macOS (a different hash is generated for usqrt).
REQUIRES: system-linux
RUN: yaml2obj %p/Inputs/blarge.yaml &> %t.exe
RUN: perf2bolt %t.exe -o %t -pa -p %p/Inputs/pre-aggregated.txt -w %t.new
RUN: cat %t | sort | FileCheck %s -check-prefix=PERF2BOLT
RUN: cat %t.new | FileCheck %s -check-prefix=NEWFORMAT
PERF2BOLT: 0 [unknown] 7f36d18d60c0 1 main 53c 0 2
PERF2BOLT: 1 main 451 1 SolveCubic 0 0 2
PERF2BOLT: 1 main 490 0 [unknown] 4005f0 0 1
PERF2BOLT: 1 main 537 0 [unknown] 400610 0 1
PERF2BOLT: 1 usqrt 30 1 usqrt 32 0 22
PERF2BOLT: 1 usqrt 30 1 usqrt 39 4 33
PERF2BOLT: 1 usqrt 35 1 usqrt 39 0 22
PERF2BOLT: 1 usqrt 3d 1 usqrt 10 0 58
PERF2BOLT: 1 usqrt 3d 1 usqrt 3f 0 22
PERF2BOLT: 1 usqrt a 1 usqrt 10 0 22
NEWFORMAT: - name: usqrt
NEWFORMAT: fid: 7
NEWFORMAT: exec: 0
NEWFORMAT: nblocks: 5
NEWFORMAT: blocks:
NEWFORMAT: - bid: 0
NEWFORMAT: insns: 4
NEWFORMAT: succ: [ { bid: 1, cnt: 22 } ]
NEWFORMAT: - bid: 1
NEWFORMAT: insns: 9
NEWFORMAT: succ: [ { bid: 3, cnt: 33, mis: 4 }, { bid: 2, cnt: 22 } ]
NEWFORMAT: - bid: 2
NEWFORMAT: insns: 2
NEWFORMAT: succ: [ { bid: 3, cnt: 22 } ]
NEWFORMAT: - bid: 3
NEWFORMAT: insns: 2
NEWFORMAT: succ: [ { bid: 1, cnt: 58 }, { bid: 4, cnt: 22 } ]