# Libc mem* benchmarks

This framework has been designed to evaluate and compare the relative
performance of memory function implementations on a particular host.

It will also be used to track the performance of implementations over time.
## Quick start

### Setup

Since **Python 2** is [being deprecated](https://www.python.org/doc/sunset-python-2/),
it is advised to use **Python 3**.

Then make sure to have `matplotlib`, `scipy` and `numpy` set up correctly:
```shell
apt-get install python3-pip
pip3 install matplotlib scipy numpy
```
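
To quickly check that the dependencies are importable, you can run a one-liner
like the following (a simple sanity check, not part of the framework itself):

```shell
python3 -c "import matplotlib, scipy, numpy; print('dependencies OK')"
```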
You may need `python3-gtk` or a similar package to display benchmark results.

To get good reproducibility, it is important to make sure that the system runs
in `performance` mode. This is achieved by running:
```shell
cpupower frequency-set --governor performance
```
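
To verify that the governor took effect, `cpupower` can report the current
policy (a quick check; the exact output format varies by system):

```shell
cpupower frequency-info --policy
```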
### Run and display `memcpy` benchmark

The following commands will run the benchmark and display a 95th percentile
confidence interval curve of **time per copied byte**. The graph also features
**host information** and the **benchmarking configuration**.
```shell
cd llvm-project
cmake -B/tmp/build -Sllvm -DLLVM_ENABLE_PROJECTS='clang;clang-tools-extra;libc' -DCMAKE_BUILD_TYPE=Release -G Ninja
ninja -C /tmp/build display-libc-memcpy-benchmark-small
```
The display target will attempt to open a window on the machine where you're
running the benchmark. If this does not work for you, you may want the `render`
or `run` targets instead, as detailed below.
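
For instance, following the target naming scheme described in the next section,
the `render` action produces a `png` file on disk instead of opening a window:

```shell
ninja -C /tmp/build render-libc-memcpy-benchmark-small
```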
## Benchmarking targets

The benchmarking process occurs in two steps:

1. Benchmark the functions and produce a `json` file
2. Display (or render) the `json` file

Targets are of the form `<action>-libc-<function>-benchmark-<configuration>`,
where:
- `action` is one of:
  - `run`, runs the benchmark and writes the `json` file
  - `display`, displays the graph on screen
  - `render`, renders the graph on disk as a `png` file
- `function` is one of: `memcpy`, `memcmp`, `memset`
- `configuration` is one of: `small`, `big`
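
Combining these parts yields valid targets; for example:

```shell
ninja -C /tmp/build run-libc-memcmp-benchmark-small
ninja -C /tmp/build render-libc-memset-benchmark-big
```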
## Benchmarking regimes

Using a profiler to observe size distributions for calls into libc functions,
it was found that most operations act on a small number of bytes.

Function           | % of calls with size ≤ 128 | % of calls with size ≤ 1024
------------------ | --------------------------: | ---------------------------:
memcpy             | 96%                         | 99%
memset             | 91%                         | 99.9%
memcmp<sup>1</sup> | 99.5%                       | ~100%

Benchmarking configurations come in two flavors:

- [small](libc/utils/benchmarks/configuration_small.json)
  - Exercises sizes up to `1KiB`, representative of normal usage
  - The data is kept in the `L1` cache to prevent measuring the memory
    subsystem
- [big](libc/utils/benchmarks/configuration_big.json)
  - Exercises sizes up to `32MiB` to test large operations
  - Caching effects can show up here, which prevents comparing different hosts

_<sup>1</sup> - The size refers to the size of the buffers to compare and not
the number of bytes until the first difference._
## Superposing curves

It is possible to **merge** several `json` files into a single graph. This is
useful to **compare** implementations.

In the following example we superpose the curves for `memcpy`, `memset` and
`memcmp`:
```shell
ninja -C /tmp/build run-libc-memcpy-benchmark-small run-libc-memcmp-benchmark-small run-libc-memset-benchmark-small
python3 libc/utils/benchmarks/render.py3 /tmp/last-libc-memcpy-benchmark-small.json /tmp/last-libc-memcmp-benchmark-small.json /tmp/last-libc-memset-benchmark-small.json
```
## Useful `render.py3` flags

- To save the produced graph to disk, use `--output=/tmp/benchmark_curve.png`.
- To prevent the graph from appearing on the screen, use `--headless`. Both
  flags can be combined, as shown below.
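
For example, to render a previously produced `json` file straight to a `png`
file without opening a window:

```shell
python3 libc/utils/benchmarks/render.py3 /tmp/last-libc-memcpy-benchmark-small.json --headless --output=/tmp/benchmark_curve.png
```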
## Under the hood

To learn more about the design decisions behind the benchmarking framework,
have a look at the [RATIONALE.md](RATIONALE.md) file.