| # Benchmarks |
| |
| Summarized throughput numbers from `wuffs bench -mimic`, for various codecs, |
| are below. Higher is better. |
| |
| "Mimic" tests check that Wuffs' output mimics (i.e. exactly matches) other |
| libraries' output. "Mimic" benchmarks give the numbers for those other |
| libraries, as shipped with Debian. These were measured on a Debian Testing |
| system as of October 2019, which meant these compiler versions: |
| |
| - clang/llvm 8.0.1 |
| - gcc 9.2.1 |
| |
| and these "mimic" library versions, all written in C: |
| |
| - libgif 5.1.4 |
| - zlib 1.2.11 |
| |
| Unless otherwise stated, the numbers below were measured on an Intel x86\_64 |
| Broadwell, and were taken as of Wuffs git commit ffdce5ef "Have bench-rust-gif |
| process animated / RGBA images". |
| |
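The `vs_mimic` column in the tables below appears to be each row's speed divided by the speed of the matching `mimic_*` row, which is why every mimic row reads `1.00x`. Here is a minimal sketch of that arithmetic, assuming both speeds are expressed in the same unit. The `vs_mimic` shell function is a hypothetical helper for illustration, not part of the Wuffs tools.

```shell
# Hypothetical helper: print speed / mimic_speed in the tables' "N.NNx" style.
# Both arguments must be in the same unit (e.g. both GB/s).
vs_mimic() {
    LC_ALL=C awk -v s="$1" -v m="$2" 'BEGIN { printf "%.2fx\n", s / m }'
}

# wuffs_adler32_10k/gcc9 (3.24GB/s) against mimic_adler32_10k (2.87GB/s).
vs_mimic 3.24 2.87  # prints 1.13x
```

Ratios recomputed from the rounded speeds shown here can differ from the published column in the last digit, presumably because the published values were computed from unrounded measurements.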
| |
| ## Reproducing |
| |
| The benchmark programs aim to be runnable "out of the box" without any |
| configuration or installation. For example, to run the `std/zlib` benchmarks: |
| |
| git clone https://github.com/google/wuffs.git |
| cd wuffs |
| gcc -O3 test/c/std/zlib.c |
| ./a.out -bench |
| rm a.out |
| |
| A comment near the top of that `.c` file says how to run the mimic benchmarks. |
| |
| The output of those benchmark programs is compatible with the |
| [benchstat](https://godoc.org/golang.org/x/perf/cmd/benchstat) tool. For |
| example, that tool can calculate confidence intervals based on multiple |
| benchmark runs, or calculate p-values when comparing numbers before and after a |
| code change. To install it, first install Go, then run `go install |
| golang.org/x/perf/cmd/benchstat`. |
| |
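As a sketch of that before-and-after workflow, the hypothetical `collect_runs` helper below runs a benchmark command several times and gathers its output into one file for `benchstat` to compare. The helper name, the file names and the run count are assumptions for illustration. The benchmark programs may already report several samples per invocation, in which case a single run per file can be enough.

```shell
# Hypothetical helper: run a command N times, concatenating its stdout
# into OUTFILE. Usage: collect_runs N OUTFILE COMMAND [ARGS...]
collect_runs() {
    n="$1"
    out="$2"
    shift 2
    : > "$out"  # Start from an empty file.
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >> "$out" || return 1
        i=$((i + 1))
    done
}

# Example session (commented out, as it needs a built ./a.out and benchstat):
#   collect_runs 5 old.txt ./a.out -bench
#   ... edit the code, rebuild ./a.out ...
#   collect_runs 5 new.txt ./a.out -bench
#   benchstat old.txt new.txt
```

Keeping the "old" and "new" samples in separate files is what lets `benchstat` report a delta and a p-value for each benchmark name.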
| |
| ## wuffs bench |
| |
| As mentioned above, individual benchmark programs can be run manually. However, |
| the canonical way to run the benchmarks (across multiple compilers and multiple |
| packages like GIF and PNG) for Wuffs' standard library is to use the `wuffs` |
| command line tool, as it will also re-generate (transpile) the C code whenever |
| you edit the `std/*/*.wuffs` code. Running `go install -v |
| github.com/google/wuffs/cmd/...` will install the Wuffs tools. After that, you |
| can say |
| |
| wuffs bench |
| |
| or |
| |
| wuffs bench -mimic std/deflate |
| |
| or |
| |
| wuffs bench -ccompilers=gcc -reps=3 -focus=wuffs_gif_decode_20k std/gif |
| |
| |
| ## Clang versus GCC |
| |
| On some of the benchmarks below, clang performs noticeably worse (e.g. 1.3x |
| slower) than gcc, on the same C code. A relatively simple reproduction was |
| filed as [LLVM bug 35567](https://bugs.llvm.org/show_bug.cgi?id=35567). |
| |
| |
| ## CPU Scaling |
| |
| CPU power management can inject noise in benchmark times. On a Linux system, |
| power management can be controlled with: |
| |
| # Query. |
| cpupower --cpu all frequency-info --policy |
| # Turn on. |
| sudo cpupower frequency-set --governor powersave |
| # Turn off. |
| sudo cpupower frequency-set --governor performance |
| |
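Before a benchmark run, it can help to confirm which governor each CPU is actually using. The sketch below reads the Linux `cpufreq` sysfs files directly. The `list_governors` and `check_governor` function names are hypothetical, and the sysfs layout is assumed to be the usual `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`, which may be absent on systems or virtual machines without frequency scaling.

```shell
# Hypothetical helper: print the distinct cpufreq governors in use, one per
# line. The optional argument overrides the sysfs root (handy for testing).
list_governors() {
    root="${1:-/sys/devices/system/cpu}"
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null | sort -u
}

# Warn (on stderr) unless every CPU reports the "performance" governor.
check_governor() {
    if [ "$(list_governors "$@")" != "performance" ]; then
        echo "warning: CPU frequency scaling may add benchmark noise" >&2
        return 1
    fi
}
```

Calling `check_governor` with no arguments inspects the live system. It also warns when no governor files are found at all, since in that case the scaling state is unknown.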
| |
| --- |
| |
| # Adler-32 |
| |
| The `1k`, `10k`, etc. numbers are approximately how many bytes are hashed. |
| |
| name speed vs_mimic |
| |
| wuffs_adler32_10k/clang8 2.41GB/s 0.84x |
| wuffs_adler32_100k/clang8 2.42GB/s 0.84x |
| |
| wuffs_adler32_10k/gcc9 3.24GB/s 1.13x |
| wuffs_adler32_100k/gcc9 3.24GB/s 1.12x |
| |
| mimic_adler32_10k 2.87GB/s 1.00x |
| mimic_adler32_100k 2.90GB/s 1.00x |
| |
| |
| # CRC-32 |
| |
| The `1k`, `10k`, etc. numbers are approximately how many bytes are hashed. |
| |
| name speed vs_mimic |
| |
| wuffs_crc32_ieee_10k/clang8 2.85GB/s 2.11x |
| wuffs_crc32_ieee_100k/clang8 2.87GB/s 2.13x |
| |
| wuffs_crc32_ieee_10k/gcc9 3.38GB/s 2.50x |
| wuffs_crc32_ieee_100k/gcc9 3.40GB/s 2.52x |
| |
| mimic_crc32_ieee_10k 1.35GB/s 1.00x |
| mimic_crc32_ieee_100k 1.35GB/s 1.00x |
| |
| |
| # Deflate |
| |
| The `1k`, `10k`, etc. numbers are approximately how many bytes there are in the |
| decoded output. |
| |
| The `full_init` vs `part_init` suffixes indicate whether |
| [`WUFFS_INITIALIZE__LEAVE_INTERNAL_BUFFERS_UNINITIALIZED`](/doc/note/initialization.md#partial-zero-initialization) |
| is unset or set, respectively. |
| |
| name speed vs_mimic |
| |
| wuffs_deflate_decode_1k_full_init/clang8 160MB/s 0.74x |
| wuffs_deflate_decode_1k_part_init/clang8 199MB/s 0.92x |
| wuffs_deflate_decode_10k_full_init/clang8 255MB/s 0.94x |
| wuffs_deflate_decode_10k_part_init/clang8 263MB/s 0.97x |
| wuffs_deflate_decode_100k_just_one_read/clang8 306MB/s 0.93x |
| wuffs_deflate_decode_100k_many_big_reads/clang8 250MB/s 0.98x |
| |
| wuffs_deflate_decode_1k_full_init/gcc9 164MB/s 0.76x |
| wuffs_deflate_decode_1k_part_init/gcc9 207MB/s 0.95x |
| wuffs_deflate_decode_10k_full_init/gcc9 247MB/s 0.91x |
| wuffs_deflate_decode_10k_part_init/gcc9 254MB/s 0.94x |
| wuffs_deflate_decode_100k_just_one_read/gcc9 333MB/s 1.01x |
| wuffs_deflate_decode_100k_many_big_reads/gcc9 261MB/s 1.02x |
| |
| mimic_deflate_decode_1k 217MB/s 1.00x |
| mimic_deflate_decode_10k 270MB/s 1.00x |
| mimic_deflate_decode_100k_just_one_read 329MB/s 1.00x |
| mimic_deflate_decode_100k_many_big_reads 256MB/s 1.00x |
| |
| 32-bit ARMv7 (2012 era Samsung Exynos 5 Chromebook), Debian Stretch (2017): |
| |
| name speed vs_mimic |
| |
| wuffs_deflate_decode_1k_full_init/clang5 30.4MB/s 0.60x |
| wuffs_deflate_decode_1k_part_init/clang5 37.9MB/s 0.74x |
| wuffs_deflate_decode_10k_full_init/clang5 72.8MB/s 0.81x |
| wuffs_deflate_decode_10k_part_init/clang5 76.2MB/s 0.85x |
| wuffs_deflate_decode_100k_just_one_read/clang5 96.5MB/s 0.82x |
| wuffs_deflate_decode_100k_many_big_reads/clang5 81.1MB/s 0.90x |
| |
| wuffs_deflate_decode_1k_full_init/gcc6 31.6MB/s 0.62x |
| wuffs_deflate_decode_1k_part_init/gcc6 39.9MB/s 0.78x |
| wuffs_deflate_decode_10k_full_init/gcc6 69.6MB/s 0.78x |
| wuffs_deflate_decode_10k_part_init/gcc6 72.4MB/s 0.81x |
| wuffs_deflate_decode_100k_just_one_read/gcc6 87.3MB/s 0.74x |
| wuffs_deflate_decode_100k_many_big_reads/gcc6 73.8MB/s 0.82x |
| |
| mimic_deflate_decode_1k 51.0MB/s 1.00x |
| mimic_deflate_decode_10k 89.7MB/s 1.00x |
| mimic_deflate_decode_100k_just_one_read 118MB/s 1.00x |
| mimic_deflate_decode_100k_many_big_reads 90.0MB/s 1.00x |
| |
| |
| ## Deflate (C, miniz) |
| |
| For comparison, here are [miniz](https://github.com/richgel999/miniz) 2.1.0's |
| numbers. |
| |
| name speed vs_mimic |
| |
| miniz_deflate_decode_1k/clang8 174MB/s 0.80x |
| miniz_deflate_decode_10k/clang8 245MB/s 0.91x |
| miniz_deflate_decode_100k_just_one_read/clang8 309MB/s 0.94x |
| |
| miniz_deflate_decode_1k/gcc9 158MB/s 0.73x |
| miniz_deflate_decode_10k/gcc9 221MB/s 0.82x |
| miniz_deflate_decode_100k_just_one_read/gcc9 250MB/s 0.76x |
| |
| To reproduce these numbers, look in `test/c/mimiclib/deflate-gzip-zlib.c`. |
| |
| |
| ## Deflate (Go) |
| |
| For comparison, here are Go 1.12.10's numbers, using Go's standard library's |
| `compress/flate` package. |
| |
| name speed vs_mimic |
| |
| go_deflate_decode_1k 45.4MB/s 0.21x |
| go_deflate_decode_10k 82.5MB/s 0.31x |
| go_deflate_decode_100k 94.0MB/s 0.29x |
| |
| To reproduce these numbers: |
| |
| git clone https://github.com/google/wuffs.git |
| cd wuffs/script/bench-go-deflate/ |
| go run main.go |
| |
| |
| ## Deflate (Rust) |
| |
| For comparison, here are Rust 1.37.0's numbers, using the |
| [alexcrichton/flate2-rs](https://github.com/alexcrichton/flate2-rs) and |
| [Frommi/miniz_oxide](https://github.com/Frommi/miniz_oxide) crates, which [this |
| file](https://github.com/sile/libflate/blob/77a1004edf6518a0badab7ce8837bc5338ff9bc3/README.md#an-informal-benchmark) |
| suggests is the fastest pure-Rust Deflate decoder. |
| |
| name speed vs_mimic |
| |
| rust_deflate_decode_1k 104MB/s 0.48x |
| rust_deflate_decode_10k 202MB/s 0.75x |
| rust_deflate_decode_100k 218MB/s 0.66x |
| |
| To reproduce these numbers: |
| |
| git clone https://github.com/google/wuffs.git |
| cd wuffs/script/bench-rust-deflate/ |
| cargo run --release |
| |
| |
| # GIF |
| |
| The `1k`, `10k`, etc. numbers are approximately how many pixels there are in |
| the decoded image. For example, the `test/data/harvesters.*` images are 1165 × |
| 859, approximately 1000k pixels. |
| |
| The `bgra` vs `indexed` suffixes indicate whether the benchmark decodes to 4 |
| bytes (BGRA or RGBA) or 1 byte (a palette index) per pixel, even if the |
| underlying file format gives 1 byte per pixel. |
| |
| The `full_init` vs `part_init` suffixes indicate whether |
| [`WUFFS_INITIALIZE__LEAVE_INTERNAL_BUFFERS_UNINITIALIZED`](/doc/note/initialization.md#partial-zero-initialization) |
| is unset or set, respectively. |
| |
| The libgif library doesn't export any API for decode-to-BGRA or decode-to-RGBA, |
| so there are no mimic numbers to compare to for the `bgra` suffix. |
| |
| name speed vs_mimic |
| |
| wuffs_gif_decode_1k_bw/clang8 461MB/s 3.18x |
| wuffs_gif_decode_1k_color_full_init/clang8 141MB/s 1.85x |
| wuffs_gif_decode_1k_color_part_init/clang8 189MB/s 2.48x |
| wuffs_gif_decode_10k_bgra/clang8 743MB/s n/a |
| wuffs_gif_decode_10k_indexed/clang8 200MB/s 2.11x |
| wuffs_gif_decode_20k/clang8 245MB/s 2.50x |
| wuffs_gif_decode_100k_artificial/clang8 531MB/s 3.43x |
| wuffs_gif_decode_100k_realistic/clang8 218MB/s 2.27x |
| wuffs_gif_decode_1000k_full_init/clang8 221MB/s 2.25x |
| wuffs_gif_decode_1000k_part_init/clang8 221MB/s 2.25x |
| wuffs_gif_decode_anim_screencap/clang8 1.07GB/s 6.01x |
| |
| wuffs_gif_decode_1k_bw/gcc9 478MB/s 3.30x |
| wuffs_gif_decode_1k_color_full_init/gcc9 148MB/s 1.94x |
| wuffs_gif_decode_1k_color_part_init/gcc9 194MB/s 2.54x |
| wuffs_gif_decode_10k_bgra/gcc9 645MB/s n/a |
| wuffs_gif_decode_10k_indexed/gcc9 203MB/s 2.14x |
| wuffs_gif_decode_20k/gcc9 244MB/s 2.49x |
| wuffs_gif_decode_100k_artificial/gcc9 532MB/s 3.43x |
| wuffs_gif_decode_100k_realistic/gcc9 214MB/s 2.23x |
| wuffs_gif_decode_1000k_full_init/gcc9 217MB/s 2.21x |
| wuffs_gif_decode_1000k_part_init/gcc9 218MB/s 2.22x |
| wuffs_gif_decode_anim_screencap/gcc9 1.11GB/s 6.24x |
| |
| mimic_gif_decode_1k_bw 145MB/s 1.00x |
| mimic_gif_decode_1k_color 76.3MB/s 1.00x |
| mimic_gif_decode_10k_indexed 94.9MB/s 1.00x |
| mimic_gif_decode_20k 98.1MB/s 1.00x |
| mimic_gif_decode_100k_artificial 155MB/s 1.00x |
| mimic_gif_decode_100k_realistic 96.1MB/s 1.00x |
| mimic_gif_decode_1000k 98.4MB/s 1.00x |
| mimic_gif_decode_anim_screencap 178MB/s 1.00x |
| |
| 32-bit ARMv7 (2012 era Samsung Exynos 5 Chromebook), Debian Stretch (2017): |
| |
| name speed vs_mimic |
| |
| wuffs_gif_decode_1k_bw/clang5 49.1MB/s 1.76x |
| wuffs_gif_decode_1k_color_full_init/clang5 22.3MB/s 1.35x |
| wuffs_gif_decode_1k_color_part_init/clang5 27.4MB/s 1.66x |
| wuffs_gif_decode_10k_bgra/clang5 157MB/s n/a |
| wuffs_gif_decode_10k_indexed/clang5 42.0MB/s 1.79x |
| wuffs_gif_decode_20k/clang5 49.3MB/s 1.68x |
| wuffs_gif_decode_100k_artificial/clang5 132MB/s 2.62x |
| wuffs_gif_decode_100k_realistic/clang5 47.8MB/s 1.62x |
| wuffs_gif_decode_1000k_full_init/clang5 46.4MB/s 1.62x |
| wuffs_gif_decode_1000k_part_init/clang5 46.4MB/s 1.62x |
| wuffs_gif_decode_anim_screencap/clang5 243MB/s 4.03x |
| |
| wuffs_gif_decode_1k_bw/gcc6 46.6MB/s 1.67x |
| wuffs_gif_decode_1k_color_full_init/gcc6 20.1MB/s 1.22x |
| wuffs_gif_decode_1k_color_part_init/gcc6 24.2MB/s 1.47x |
| wuffs_gif_decode_10k_bgra/gcc6 124MB/s n/a |
| wuffs_gif_decode_10k_indexed/gcc6 34.8MB/s 1.49x |
| wuffs_gif_decode_20k/gcc6 43.8MB/s 1.49x |
| wuffs_gif_decode_100k_artificial/gcc6 123MB/s 2.44x |
| wuffs_gif_decode_100k_realistic/gcc6 42.7MB/s 1.44x |
| wuffs_gif_decode_1000k_full_init/gcc6 41.6MB/s 1.45x |
| wuffs_gif_decode_1000k_part_init/gcc6 41.7MB/s 1.45x |
| wuffs_gif_decode_anim_screencap/gcc6 227MB/s 3.76x |
| |
| mimic_gif_decode_1k_bw 27.9MB/s 1.00x |
| mimic_gif_decode_1k_color 16.5MB/s 1.00x |
| mimic_gif_decode_10k_indexed 23.4MB/s 1.00x |
| mimic_gif_decode_20k 29.4MB/s 1.00x |
| mimic_gif_decode_100k_artificial 50.4MB/s 1.00x |
| mimic_gif_decode_100k_realistic 29.5MB/s 1.00x |
| mimic_gif_decode_1000k 28.7MB/s 1.00x |
| mimic_gif_decode_anim_screencap 60.3MB/s 1.00x |
| |
| |
| ## GIF (Go) |
| |
| For comparison, here are Go 1.12.10's numbers, using Go's standard library's |
| `image/gif` package. |
| |
| name speed vs_mimic |
| |
| go_gif_decode_1k_bw 107MB/s 0.74x |
| go_gif_decode_1k_color 39.2MB/s 0.51x |
| go_gif_decode_10k_bgra 117MB/s n/a |
| go_gif_decode_10k_indexed 57.8MB/s 0.61x |
| go_gif_decode_20k 67.2MB/s 0.69x |
| go_gif_decode_100k_artificial 151MB/s 0.97x |
| go_gif_decode_100k_realistic 67.2MB/s 0.70x |
| go_gif_decode_1000k 68.1MB/s 0.69x |
| go_gif_decode_anim_screencap 206MB/s 1.16x |
| |
| To reproduce these numbers: |
| |
| git clone https://github.com/google/wuffs.git |
| cd wuffs/script/bench-go-gif/ |
| go run main.go |
| |
| |
| ## GIF (Rust) |
| |
| For comparison, here are Rust 1.37.0's numbers, using the |
| [image-rs/image-gif](https://github.com/image-rs/image-gif) crate, easily the |
| top `crates.io` result for ["gif"](https://crates.io/search?q=gif). |
| |
| name speed vs_mimic |
| |
| rust_gif_decode_1k_bw 89.2MB/s 0.62x |
| rust_gif_decode_1k_color 20.7MB/s 0.27x |
| rust_gif_decode_10k_bgra 74.5MB/s n/a |
| rust_gif_decode_10k_indexed 20.4MB/s 0.21x |
| rust_gif_decode_20k 28.9MB/s 0.29x |
| rust_gif_decode_100k_artificial 79.1MB/s 0.51x |
| rust_gif_decode_100k_realistic 27.9MB/s 0.29x |
| rust_gif_decode_1000k 27.9MB/s 0.28x |
| rust_gif_decode_anim_screencap 144MB/s 0.81x |
| |
| To reproduce these numbers: |
| |
| git clone https://github.com/google/wuffs.git |
| cd wuffs/script/bench-rust-gif/ |
| cargo run --release |
| |
| |
| # Gzip (Deflate + CRC-32) |
| |
| The `1k`, `10k`, etc. numbers are approximately how many bytes there are in the |
| decoded output. |
| |
| name speed vs_mimic |
| |
| wuffs_gzip_decode_10k/clang8 238MB/s 1.05x |
| wuffs_gzip_decode_100k/clang8 273MB/s 1.03x |
| |
| wuffs_gzip_decode_10k/gcc9 239MB/s 1.06x |
| wuffs_gzip_decode_100k/gcc9 297MB/s 1.12x |
| |
| mimic_gzip_decode_10k 226MB/s 1.00x |
| mimic_gzip_decode_100k 265MB/s 1.00x |
| |
| |
| # LZW |
| |
| The `1k`, `10k`, etc. numbers are approximately how many bytes there are in the |
| decoded output. |
| |
| The libgif library doesn't export its LZW decoder in its API, so there are no |
| mimic numbers to compare to. |
| |
| name speed vs_mimic |
| |
| wuffs_lzw_decode_20k/clang8 263MB/s n/a |
| wuffs_lzw_decode_100k/clang8 438MB/s n/a |
| |
| wuffs_lzw_decode_20k/gcc9 266MB/s n/a |
| wuffs_lzw_decode_100k/gcc9 450MB/s n/a |
| |
| |
| # Zlib (Deflate + Adler-32) |
| |
| The `1k`, `10k`, etc. numbers are approximately how many bytes there are in the |
| decoded output. |
| |
| name speed vs_mimic |
| |
| wuffs_zlib_decode_10k/clang8 237MB/s 0.96x |
| wuffs_zlib_decode_100k/clang8 272MB/s 0.92x |
| |
| wuffs_zlib_decode_10k/gcc9 242MB/s 0.98x |
| wuffs_zlib_decode_100k/gcc9 294MB/s 0.99x |
| |
| mimic_zlib_decode_10k 247MB/s 1.00x |
| mimic_zlib_decode_100k 296MB/s 1.00x |
| |
| |
| --- |
| |
| Updated in December 2019. |