# Benchmarks

Summarized throughput numbers from `wuffs bench -mimic`, for various codecs,
are below. Higher is better.

"Mimic" tests check that Wuffs' output mimics (i.e. exactly matches) other
libraries' output. "Mimic" benchmarks give the numbers for those other
libraries, as shipped with Debian. These were measured on a Debian Testing
system as of October 2019, which meant these compiler versions:

- clang/llvm 8.0.1
- gcc 9.2.1

and these "mimic" library versions, all written in C:

- libgif 5.1.4
- zlib 1.2.11

Unless otherwise stated, the numbers below were measured on an Intel x86\_64
Broadwell, and were taken as of Wuffs git commit ffdce5ef "Have bench-rust-gif
process animated / RGBA images".


## Reproducing

The benchmark programs aim to be runnable "out of the box" without any
configuration or installation. For example, to run the `std/zlib` benchmarks:

    git clone https://github.com/google/wuffs.git
    cd wuffs
    gcc -O3 test/c/std/zlib.c
    ./a.out -bench
    rm a.out

A comment near the top of that `.c` file says how to run the mimic benchmarks.

The output of those benchmark programs is compatible with the
[benchstat](https://godoc.org/golang.org/x/perf/cmd/benchstat) tool. For
example, that tool can calculate confidence intervals based on multiple
benchmark runs, or calculate p-values when comparing numbers before and after a
code change. To install it, first install Go, then run `go get
golang.org/x/perf/cmd/benchstat`.
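
As a rough illustration of the kind of input benchstat consumes, the sketch below parses lines in the Go benchmark text format (`Benchmark<name> <iterations> <value> <unit> ...`) and averages the MB/s figure per benchmark. The sample lines and the `mean_throughput` helper are made up for illustration and are not real Wuffs measurements. benchstat itself does considerably more, such as the confidence intervals and p-values mentioned above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample output, invented for illustration (not real measurements).
SAMPLE = """\
Benchmarkexample_decode_10k/gcc     10000    41000 ns/op   243.9 MB/s
Benchmarkexample_decode_10k/gcc     10000    40000 ns/op   250.0 MB/s
Benchmarkexample_decode_100k/gcc     1000   340000 ns/op   294.1 MB/s
"""


def mean_throughput(text):
    """Return {benchmark name: mean MB/s} for Go-benchmark-format lines."""
    samples = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("Benchmark"):
            continue  # Skip headers, blank lines and other non-result output.
        # After the name and iteration count come (value, unit) pairs.
        for value, unit in zip(fields[2::2], fields[3::2]):
            if unit == "MB/s":
                samples[fields[0]].append(float(value))
    return {name: mean(values) for name, values in samples.items()}


for name, speed in mean_throughput(SAMPLE).items():
    print(f"{name}: {speed:.1f} MB/s")
```

Repeated runs of the same benchmark show up as repeated lines with the same name, which is why the helper groups by name before averaging.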


## wuffs bench

As mentioned above, individual benchmark programs can be run manually. However,
the canonical way to run the benchmarks (across multiple compilers and multiple
packages like GIF and PNG) for Wuffs' standard library is to use the `wuffs`
command line tool, as it will also re-generate (transpile) the C code whenever
you edit the `std/*/*.wuffs` code. Running `go install -v
github.com/google/wuffs/cmd/...` will install the Wuffs tools. After that, you
can say

    wuffs bench

or

    wuffs bench -mimic std/deflate

or

    wuffs bench -ccompilers=gcc -reps=3 -focus=wuffs_gif_decode_20k std/gif


## Clang versus GCC

On some of the benchmarks below, clang performs noticeably worse (e.g. 1.3x
slower) than gcc, on the same C code. A relatively simple reproduction was
filed as [LLVM bug 35567](https://bugs.llvm.org/show_bug.cgi?id=35567).
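
For a concrete sense of that gap, the arithmetic below uses the Adler-32 `10k` figures listed on this page (gcc9 at 3.24GB/s, clang8 at 2.41GB/s). It is only a worked example, and the variable names are illustrative.

```python
# Adler-32 10k throughput in GB/s, as listed in the Adler-32 table on this page.
gcc9 = 3.24
clang8 = 2.41

# Ratio of gcc throughput to clang throughput on the same C code.
slowdown = gcc9 / clang8
print(f"clang8 is about {slowdown:.2f}x slower than gcc9 on this benchmark")
```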


## CPU Scaling

CPU power management can inject noise in benchmark times. On a Linux system,
power management can be controlled with:

    # Query.
    cpupower --cpu all frequency-info --policy
    # Turn on.
    sudo cpupower frequency-set --governor powersave
    # Turn off.
    sudo cpupower frequency-set --governor performance
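
Before a benchmark run, it can be handy to confirm which governor each CPU is actually using. The sketch below assumes the Linux cpufreq sysfs layout (`/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor`), which may be absent on some systems, for example inside some virtual machines. The `read_governors` name is illustrative and is not part of Wuffs or cpupower.

```python
import glob
import os


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu name: governor} for every CPU exposing cpufreq in sysfs."""
    governors = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        # The path looks like <root>/cpu3/cpufreq/scaling_governor.
        cpu = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("no cpufreq information found")
    for cpu, governor in found.items():
        note = "" if governor == "performance" else "  (may add noise)"
        print(f"{cpu}: {governor}{note}")
```

The root directory is a parameter so that the function can be pointed at a test fixture instead of the live `/sys` tree.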


---

# Adler-32

The `1k`, `10k`, etc. numbers are approximately how many bytes are hashed.

    name                                             speed     vs_mimic

    wuffs_adler32_10k/clang8                         2.41GB/s  0.84x
    wuffs_adler32_100k/clang8                        2.42GB/s  0.84x

    wuffs_adler32_10k/gcc9                           3.24GB/s  1.13x
    wuffs_adler32_100k/gcc9                          3.24GB/s  1.12x

    mimic_adler32_10k                                2.87GB/s  1.00x
    mimic_adler32_100k                               2.90GB/s  1.00x


# CRC-32

The `1k`, `10k`, etc. numbers are approximately how many bytes are hashed.

    name                                             speed     vs_mimic

    wuffs_crc32_ieee_10k/clang8                      2.85GB/s  2.11x
    wuffs_crc32_ieee_100k/clang8                     2.87GB/s  2.13x

    wuffs_crc32_ieee_10k/gcc9                        3.38GB/s  2.50x
    wuffs_crc32_ieee_100k/gcc9                       3.40GB/s  2.52x

    mimic_crc32_ieee_10k                             1.35GB/s  1.00x
    mimic_crc32_ieee_100k                            1.35GB/s  1.00x
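
The `vs_mimic` column in these tables is consistent with dividing each row's speed by the speed of the matching mimic row, so values above 1.00x mean faster than the mimic library. A small worked check using the CRC-32 figures above follows; the dictionary layout and the `vs_mimic` helper are just for illustration. In other tables the last digit can differ slightly, since the published speeds are themselves rounded.

```python
# Speeds in GB/s, copied from the CRC-32 table above.
mimic = {"10k": 1.35, "100k": 1.35}
wuffs = {
    ("10k", "clang8"): 2.85,
    ("100k", "clang8"): 2.87,
    ("10k", "gcc9"): 3.38,
    ("100k", "gcc9"): 3.40,
}


def vs_mimic(speed, mimic_speed):
    """Relative throughput: above 1.0 means faster than the mimic library."""
    return speed / mimic_speed


for (size, compiler), speed in wuffs.items():
    ratio = vs_mimic(speed, mimic[size])
    print(f"wuffs_crc32_ieee_{size}/{compiler}: {ratio:.2f}x")
```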


# Deflate

The `1k`, `10k`, etc. numbers are approximately how many bytes there are in the
decoded output.

The `full_init` vs `part_init` suffixes indicate whether
[`WUFFS_INITIALIZE__LEAVE_INTERNAL_BUFFERS_UNINITIALIZED`](/doc/note/initialization.md#partial-zero-initialization)
is unset or set, respectively.

    name                                             speed     vs_mimic

    wuffs_deflate_decode_1k_full_init/clang8          160MB/s  0.74x
    wuffs_deflate_decode_1k_part_init/clang8          199MB/s  0.92x
    wuffs_deflate_decode_10k_full_init/clang8         255MB/s  0.94x
    wuffs_deflate_decode_10k_part_init/clang8         263MB/s  0.97x
    wuffs_deflate_decode_100k_just_one_read/clang8    306MB/s  0.93x
    wuffs_deflate_decode_100k_many_big_reads/clang8   250MB/s  0.98x

    wuffs_deflate_decode_1k_full_init/gcc9            164MB/s  0.76x
    wuffs_deflate_decode_1k_part_init/gcc9            207MB/s  0.95x
    wuffs_deflate_decode_10k_full_init/gcc9           247MB/s  0.91x
    wuffs_deflate_decode_10k_part_init/gcc9           254MB/s  0.94x
    wuffs_deflate_decode_100k_just_one_read/gcc9      333MB/s  1.01x
    wuffs_deflate_decode_100k_many_big_reads/gcc9     261MB/s  1.02x

    mimic_deflate_decode_1k                           217MB/s  1.00x
    mimic_deflate_decode_10k                          270MB/s  1.00x
    mimic_deflate_decode_100k_just_one_read           329MB/s  1.00x
    mimic_deflate_decode_100k_many_big_reads          256MB/s  1.00x

32-bit ARMv7 (2012 era Samsung Exynos 5 Chromebook), Debian Stretch (2017):

    name                                             speed     vs_mimic

    wuffs_deflate_decode_1k_full_init/clang5         30.4MB/s  0.60x
    wuffs_deflate_decode_1k_part_init/clang5         37.9MB/s  0.74x
    wuffs_deflate_decode_10k_full_init/clang5        72.8MB/s  0.81x
    wuffs_deflate_decode_10k_part_init/clang5        76.2MB/s  0.85x
    wuffs_deflate_decode_100k_just_one_read/clang5   96.5MB/s  0.82x
    wuffs_deflate_decode_100k_many_big_reads/clang5  81.1MB/s  0.90x

    wuffs_deflate_decode_1k_full_init/gcc6           31.6MB/s  0.62x
    wuffs_deflate_decode_1k_part_init/gcc6           39.9MB/s  0.78x
    wuffs_deflate_decode_10k_full_init/gcc6          69.6MB/s  0.78x
    wuffs_deflate_decode_10k_part_init/gcc6          72.4MB/s  0.81x
    wuffs_deflate_decode_100k_just_one_read/gcc6     87.3MB/s  0.74x
    wuffs_deflate_decode_100k_many_big_reads/gcc6    73.8MB/s  0.82x

    mimic_deflate_decode_1k                          51.0MB/s  1.00x
    mimic_deflate_decode_10k                         89.7MB/s  1.00x
    mimic_deflate_decode_100k_just_one_read           118MB/s  1.00x
    mimic_deflate_decode_100k_many_big_reads         90.0MB/s  1.00x


## Deflate (C, miniz)

For comparison, here are [miniz](https://github.com/richgel999/miniz) 2.1.0's
numbers.

    name                                             speed     vs_mimic

    miniz_deflate_decode_1k/clang8                    174MB/s  0.80x
    miniz_deflate_decode_10k/clang8                   245MB/s  0.91x
    miniz_deflate_decode_100k_just_one_read/clang8    309MB/s  0.94x

    miniz_deflate_decode_1k/gcc9                      158MB/s  0.73x
    miniz_deflate_decode_10k/gcc9                     221MB/s  0.82x
    miniz_deflate_decode_100k_just_one_read/gcc9      250MB/s  0.76x

To reproduce these numbers, look in `test/c/mimiclib/deflate-gzip-zlib.c`.


## Deflate (Go)

For comparison, here are Go 1.12.10's numbers, using Go's standard library's
`compress/flate` package.

    name                                             speed     vs_mimic

    go_deflate_decode_1k                             45.4MB/s  0.21x
    go_deflate_decode_10k                            82.5MB/s  0.31x
    go_deflate_decode_100k                           94.0MB/s  0.29x

To reproduce these numbers:

    git clone https://github.com/google/wuffs.git
    cd wuffs/script/bench-go-deflate/
    go run main.go


## Deflate (Rust)

For comparison, here are Rust 1.37.0's numbers, using the
[alexcrichton/flate2-rs](https://github.com/alexcrichton/flate2-rs) and
[Frommi/miniz_oxide](https://github.com/Frommi/miniz_oxide) crates, a
combination that [this
file](https://github.com/sile/libflate/blob/77a1004edf6518a0badab7ce8837bc5338ff9bc3/README.md#an-informal-benchmark)
suggests is the fastest pure-Rust Deflate decoder.

    name                                             speed     vs_mimic

    rust_deflate_decode_1k                            104MB/s  0.48x
    rust_deflate_decode_10k                           202MB/s  0.75x
    rust_deflate_decode_100k                          218MB/s  0.66x

To reproduce these numbers:

    git clone https://github.com/google/wuffs.git
    cd wuffs/script/bench-rust-deflate/
    cargo run --release


# GIF

The `1k`, `10k`, etc. numbers are approximately how many pixels there are in
the decoded image. For example, the `test/data/harvesters.*` images are 1165 ×
859, approximately 1000k pixels.

The `bgra` vs `indexed` suffixes are whether to decode to 4 bytes (BGRA or
RGBA) or 1 byte (a palette index) per pixel, even if the underlying file format
gives 1 byte per pixel.
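
To make the pixel and byte counts concrete, here is the arithmetic for the `harvesters` dimensions quoted above. The variable names are illustrative, and the buffer sizes simply assume a tightly packed frame with no row padding.

```python
# Dimensions of the test/data/harvesters.* images, as stated above.
width, height = 1165, 859

pixels = width * height
indexed_bytes = pixels * 1  # One palette index per pixel.
bgra_bytes = pixels * 4     # Four bytes (B, G, R, A) per pixel.

print(f"pixels:  {pixels}")  # Roughly 1000k, hence the "1000k" label.
print(f"indexed: {indexed_bytes} bytes")
print(f"bgra:    {bgra_bytes} bytes")
```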

The `full_init` vs `part_init` suffixes indicate whether
[`WUFFS_INITIALIZE__LEAVE_INTERNAL_BUFFERS_UNINITIALIZED`](/doc/note/initialization.md#partial-zero-initialization)
is unset or set, respectively.

The libgif library doesn't export any API for decode-to-BGRA or decode-to-RGBA,
so there are no mimic numbers to compare to for the `bgra` suffix.

    name                                             speed     vs_mimic

    wuffs_gif_decode_1k_bw/clang8                     461MB/s  3.18x
    wuffs_gif_decode_1k_color_full_init/clang8        141MB/s  1.85x
    wuffs_gif_decode_1k_color_part_init/clang8        189MB/s  2.48x
    wuffs_gif_decode_10k_bgra/clang8                  743MB/s  n/a
    wuffs_gif_decode_10k_indexed/clang8               200MB/s  2.11x
    wuffs_gif_decode_20k/clang8                       245MB/s  2.50x
    wuffs_gif_decode_100k_artificial/clang8           531MB/s  3.43x
    wuffs_gif_decode_100k_realistic/clang8            218MB/s  2.27x
    wuffs_gif_decode_1000k_full_init/clang8           221MB/s  2.25x
    wuffs_gif_decode_1000k_part_init/clang8           221MB/s  2.25x
    wuffs_gif_decode_anim_screencap/clang8           1.07GB/s  6.01x

    wuffs_gif_decode_1k_bw/gcc9                       478MB/s  3.30x
    wuffs_gif_decode_1k_color_full_init/gcc9          148MB/s  1.94x
    wuffs_gif_decode_1k_color_part_init/gcc9          194MB/s  2.54x
    wuffs_gif_decode_10k_bgra/gcc9                    645MB/s  n/a
    wuffs_gif_decode_10k_indexed/gcc9                 203MB/s  2.14x
    wuffs_gif_decode_20k/gcc9                         244MB/s  2.49x
    wuffs_gif_decode_100k_artificial/gcc9             532MB/s  3.43x
    wuffs_gif_decode_100k_realistic/gcc9              214MB/s  2.23x
    wuffs_gif_decode_1000k_full_init/gcc9             217MB/s  2.21x
    wuffs_gif_decode_1000k_part_init/gcc9             218MB/s  2.22x
    wuffs_gif_decode_anim_screencap/gcc9             1.11GB/s  6.24x

    mimic_gif_decode_1k_bw                            145MB/s  1.00x
    mimic_gif_decode_1k_color                        76.3MB/s  1.00x
    mimic_gif_decode_10k_indexed                     94.9MB/s  1.00x
    mimic_gif_decode_20k                             98.1MB/s  1.00x
    mimic_gif_decode_100k_artificial                  155MB/s  1.00x
    mimic_gif_decode_100k_realistic                  96.1MB/s  1.00x
    mimic_gif_decode_1000k                           98.4MB/s  1.00x
    mimic_gif_decode_anim_screencap                   178MB/s  1.00x

32-bit ARMv7 (2012 era Samsung Exynos 5 Chromebook), Debian Stretch (2017):

    name                                             speed     vs_mimic

    wuffs_gif_decode_1k_bw/clang5                    49.1MB/s  1.76x
    wuffs_gif_decode_1k_color_full_init/clang5       22.3MB/s  1.35x
    wuffs_gif_decode_1k_color_part_init/clang5       27.4MB/s  1.66x
    wuffs_gif_decode_10k_bgra/clang5                  157MB/s  n/a
    wuffs_gif_decode_10k_indexed/clang5              42.0MB/s  1.79x
    wuffs_gif_decode_20k/clang5                      49.3MB/s  1.68x
    wuffs_gif_decode_100k_artificial/clang5           132MB/s  2.62x
    wuffs_gif_decode_100k_realistic/clang5           47.8MB/s  1.62x
    wuffs_gif_decode_1000k_full_init/clang5          46.4MB/s  1.62x
    wuffs_gif_decode_1000k_part_init/clang5          46.4MB/s  1.62x
    wuffs_gif_decode_anim_screencap/clang5            243MB/s  4.03x

    wuffs_gif_decode_1k_bw/gcc6                      46.6MB/s  1.67x
    wuffs_gif_decode_1k_color_full_init/gcc6         20.1MB/s  1.22x
    wuffs_gif_decode_1k_color_part_init/gcc6         24.2MB/s  1.47x
    wuffs_gif_decode_10k_bgra/gcc6                    124MB/s  n/a
    wuffs_gif_decode_10k_indexed/gcc6                34.8MB/s  1.49x
    wuffs_gif_decode_20k/gcc6                        43.8MB/s  1.49x
    wuffs_gif_decode_100k_artificial/gcc6             123MB/s  2.44x
    wuffs_gif_decode_100k_realistic/gcc6             42.7MB/s  1.44x
    wuffs_gif_decode_1000k_full_init/gcc6            41.6MB/s  1.45x
    wuffs_gif_decode_1000k_part_init/gcc6            41.7MB/s  1.45x
    wuffs_gif_decode_anim_screencap/gcc6              227MB/s  3.76x

    mimic_gif_decode_1k_bw                           27.9MB/s  1.00x
    mimic_gif_decode_1k_color                        16.5MB/s  1.00x
    mimic_gif_decode_10k_indexed                     23.4MB/s  1.00x
    mimic_gif_decode_20k                             29.4MB/s  1.00x
    mimic_gif_decode_100k_artificial                 50.4MB/s  1.00x
    mimic_gif_decode_100k_realistic                  29.5MB/s  1.00x
    mimic_gif_decode_1000k                           28.7MB/s  1.00x
    mimic_gif_decode_anim_screencap                  60.3MB/s  1.00x


## GIF (Go)

For comparison, here are Go 1.12.10's numbers, using Go's standard library's
`image/gif` package.

    name                                             speed     vs_mimic

    go_gif_decode_1k_bw                               107MB/s  0.74x
    go_gif_decode_1k_color                           39.2MB/s  0.51x
    go_gif_decode_10k_bgra                            117MB/s  n/a
    go_gif_decode_10k_indexed                        57.8MB/s  0.61x
    go_gif_decode_20k                                67.2MB/s  0.69x
    go_gif_decode_100k_artificial                     151MB/s  0.97x
    go_gif_decode_100k_realistic                     67.2MB/s  0.70x
    go_gif_decode_1000k                              68.1MB/s  0.69x
    go_gif_decode_anim_screencap                      206MB/s  1.16x

To reproduce these numbers:

    git clone https://github.com/google/wuffs.git
    cd wuffs/script/bench-go-gif/
    go run main.go


## GIF (Rust)

For comparison, here are Rust 1.37.0's numbers, using the
[image-rs/image-gif](https://github.com/image-rs/image-gif) crate, easily the
top `crates.io` result for ["gif"](https://crates.io/search?q=gif).

    name                                             speed     vs_mimic

    rust_gif_decode_1k_bw                            89.2MB/s  0.62x
    rust_gif_decode_1k_color                         20.7MB/s  0.27x
    rust_gif_decode_10k_bgra                         74.5MB/s  n/a
    rust_gif_decode_10k_indexed                      20.4MB/s  0.21x
    rust_gif_decode_20k                              28.9MB/s  0.29x
    rust_gif_decode_100k_artificial                  79.1MB/s  0.51x
    rust_gif_decode_100k_realistic                   27.9MB/s  0.29x
    rust_gif_decode_1000k                            27.9MB/s  0.28x
    rust_gif_decode_anim_screencap                    144MB/s  0.81x

To reproduce these numbers:

    git clone https://github.com/google/wuffs.git
    cd wuffs/script/bench-rust-gif/
    cargo run --release


# Gzip (Deflate + CRC-32)

The `1k`, `10k`, etc. numbers are approximately how many bytes there are in the
decoded output.

    name                                             speed     vs_mimic

    wuffs_gzip_decode_10k/clang8                      238MB/s  1.05x
    wuffs_gzip_decode_100k/clang8                     273MB/s  1.03x

    wuffs_gzip_decode_10k/gcc9                        239MB/s  1.06x
    wuffs_gzip_decode_100k/gcc9                       297MB/s  1.12x

    mimic_gzip_decode_10k                             226MB/s  1.00x
    mimic_gzip_decode_100k                            265MB/s  1.00x


# LZW

The `1k`, `10k`, etc. numbers are approximately how many bytes there are in the
decoded output.

The libgif library doesn't export its LZW decoder in its API, so there are no
mimic numbers to compare to.

    name                                             speed     vs_mimic

    wuffs_lzw_decode_20k/clang8                       263MB/s  n/a
    wuffs_lzw_decode_100k/clang8                      438MB/s  n/a

    wuffs_lzw_decode_20k/gcc9                         266MB/s  n/a
    wuffs_lzw_decode_100k/gcc9                        450MB/s  n/a


# Zlib (Deflate + Adler-32)

The `1k`, `10k`, etc. numbers are approximately how many bytes there are in the
decoded output.

    name                                             speed     vs_mimic

    wuffs_zlib_decode_10k/clang8                      237MB/s  0.96x
    wuffs_zlib_decode_100k/clang8                     272MB/s  0.92x

    wuffs_zlib_decode_10k/gcc9                        242MB/s  0.98x
    wuffs_zlib_decode_100k/gcc9                       294MB/s  0.99x

    mimic_zlib_decode_10k                             247MB/s  1.00x
    mimic_zlib_decode_100k                            296MB/s  1.00x


---

Updated in December 2019.
