wuffs bench -mimic summarized throughput numbers for various codecs are below. Higher is better.
“Mimic” tests check that Wuffs' output mimics (i.e. exactly matches) other libraries' output. “Mimic” benchmarks give the numbers for those other libraries, as shipped with Debian. These were measured on a Debian Bullseye system as of September 2022, which meant these compiler versions:
and these popular “mimic” library versions, all written in C:
and these alternative “mimic” library versions, again all written in C:
Unless otherwise stated, the numbers below were taken as of Wuffs git commit 315b2e52 “wuffs gen -version=0.3.0-rc.1”, the first Wuffs v0.3 release candidate. As for the CPU model:

    $ cat /proc/cpuinfo | grep model.name | uniq
    model name: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
The benchmark programs aim to be runnable “out of the box” without any configuration or installation. For example, to run the std/zlib benchmarks:

    git clone https://github.com/google/wuffs.git
    cd wuffs
    gcc -O3 test/c/std/zlib.c
    ./a.out -bench
    rm a.out
A comment near the top of that .c file says how to run the mimic benchmarks.
The output of those benchmark programs is compatible with the benchstat tool. For example, that tool can calculate confidence intervals based on multiple benchmark runs, or calculate p-values when comparing numbers before and after a code change. To install it, first install Go, then run go install golang.org/x/perf/cmd/benchstat (with recent, module-aware Go versions, append a version suffix such as @latest to the package path).
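Because benchstat works from several samples per benchmark, one way to collect them is to run the same benchmark binary repeatedly and save the combined output. The following is a minimal sketch for a POSIX shell; the repeat_bench helper and the old.txt and new.txt file names are hypothetical and not part of Wuffs.

```shell
# repeat_bench N CMD...: run CMD N times in a row, so that the combined
# output holds N samples per benchmark.
# (Hypothetical helper for illustration; not part of Wuffs.)
repeat_bench() {
  n="$1"
  shift
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" || return 1
    i=$((i + 1))
  done
}

# Intended usage, before and after a code change:
#   repeat_bench 5 ./a.out -bench > old.txt
#   repeat_bench 5 ./a.out -bench > new.txt
#   benchstat old.txt new.txt
```

Five repetitions is an arbitrary choice here; more samples generally give benchstat tighter confidence intervals.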
As mentioned above, individual benchmark programs can be run manually. However, the canonical way to run the benchmarks (across multiple compilers and multiple packages like GIF and PNG) for Wuffs' standard library is to use the wuffs command line tool, as it will also re-generate (transpile) the C code whenever you edit the std/*/*.wuffs code. Running go install -v github.com/google/wuffs/cmd/... will install the Wuffs tools. After that, you can say
wuffs bench
or
wuffs bench -mimic std/deflate
or
wuffs bench -ccompilers=gcc -reps=3 -focus=wuffs_gif_decode_20k std/gif
On some of the benchmarks below, clang performs noticeably worse (e.g. 1.3x slower) than gcc, on the same C code. A relatively simple reproduction was filed as LLVM bug 35567.
CPU power management can inject noise in benchmark times. On a Linux system, power management can be controlled with:

    # Query.
    cpupower --cpu all frequency-info --policy
    # Turn on.
    sudo cpupower frequency-set --governor powersave
    # Turn off.
    sudo cpupower frequency-set --governor performance
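When cpupower is not installed, the active governor can usually be read straight from sysfs before a benchmark run. This is a small sketch that assumes the kernel exposes the cpufreq interface at the usual path for cpu0; the show_governor helper name is made up for illustration.

```shell
# show_governor [FILE]: print the scaling governor recorded in FILE, or
# "unknown" if the file is not readable (for example, on a system
# without cpufreq support). FILE defaults to the usual sysfs path for cpu0.
show_governor() {
  file="${1:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor}"
  if [ -r "$file" ]; then
    cat "$file"
  else
    echo "unknown"
  fi
}

show_governor
```

Recording this value next to the benchmark output makes it easier to tell later whether two runs were taken under the same power settings.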
The 1k, 10k, etc. numbers are approximately how many bytes are hashed.

    name                          speed      vs_mimic
    wuffs_adler32_10k/clang11     22.3 GB/s  6.4x
    wuffs_adler32_100k/clang11    26.4 GB/s  7.5x
    wuffs_adler32_10k/gcc10       21.9 GB/s  6.3x
    wuffs_adler32_100k/gcc10      22.4 GB/s  6.4x
    mimicztl_adler32_10k          3.50 GB/s  1.0x
    mimicztl_adler32_100k         3.52 GB/s  1.0x
    mimiclibdeflate_adler32_10k   50.4 GB/s  14.4x
    mimiclibdeflate_adler32_100k  49.4 GB/s  14.0x
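The vs_mimic column appears to be each row's speed divided by the speed of the 1.0x baseline row for the same input size. As a quick check of that reading against the Adler-32 numbers, here is a sketch using awk; the ratio helper is hypothetical and not part of the Wuffs tooling.

```shell
# ratio SPEED BASELINE_SPEED: print SPEED / BASELINE_SPEED with one
# decimal place, in the same "N.Nx" style as the vs_mimic column.
# (Hypothetical helper for illustration.)
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1fx\n", a / b }'
}

ratio 22.3 3.50   # wuffs_adler32_10k/clang11 vs mimicztl_adler32_10k
ratio 50.4 3.50   # mimiclibdeflate_adler32_10k vs mimicztl_adler32_10k
```

These print 6.4x and 14.4x, matching the corresponding table rows.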
The 1k, 10k, etc. numbers are approximately how many bytes there are in the decoded output.

    name                             speed      vs_mimic
    wuffs_bzip2_decode_10k/clang11   63.0 MB/s  1.86x
    wuffs_bzip2_decode_100k/clang11  48.9 MB/s  1.62x
    wuffs_bzip2_decode_10k/gcc10     61.1 MB/s  1.81x
    wuffs_bzip2_decode_100k/gcc10    49.1 MB/s  1.62x
    mimic_bzip2_decode_10k           33.8 MB/s  1.00x
    mimic_bzip2_decode_100k          30.2 MB/s  1.00x
The 1k, 10k, etc. numbers are approximately how many bytes are hashed.

    name                             speed      vs_mimic
    wuffs_crc32_ieee_10k/clang11     14.7 GB/s  9.1x
    wuffs_crc32_ieee_100k/clang11    21.6 GB/s  13.3x
    wuffs_crc32_ieee_10k/gcc10       14.9 GB/s  9.2x
    wuffs_crc32_ieee_100k/gcc10      23.8 GB/s  14.7x
    mimicztl_crc32_ieee_10k          1.62 GB/s  1.0x
    mimicztl_crc32_ieee_100k         1.62 GB/s  1.0x
    mimiclibdeflate_crc32_ieee_10k   24.6 GB/s  15.2x
    mimiclibdeflate_crc32_ieee_100k  25.4 GB/s  15.7x
The 1k, 10k, etc. numbers are approximately how many bytes there are in the decoded output.
The full_init and part_init suffixes indicate whether WUFFS_INITIALIZE__LEAVE_INTERNAL_BUFFERS_UNINITIALIZED is unset or set, respectively.

    name                                               speed     vs_mimic
    wuffs_deflate_decode_1k_full_init/clang11          195 MB/s  0.81x
    wuffs_deflate_decode_1k_part_init/clang11          226 MB/s  0.93x
    wuffs_deflate_decode_10k_full_init/clang11         409 MB/s  1.41x
    wuffs_deflate_decode_10k_part_init/clang11         418 MB/s  1.47x
    wuffs_deflate_decode_100k_just_one_read/clang11    521 MB/s  1.48x
    wuffs_deflate_decode_100k_many_big_reads/clang11   330 MB/s  1.19x
    wuffs_deflate_decode_1k_full_init/gcc10            183 MB/s  0.76x
    wuffs_deflate_decode_1k_part_init/gcc10            217 MB/s  0.90x
    wuffs_deflate_decode_10k_full_init/gcc10           402 MB/s  1.39x
    wuffs_deflate_decode_10k_part_init/gcc10           414 MB/s  1.43x
    wuffs_deflate_decode_100k_just_one_read/gcc10      522 MB/s  1.48x
    wuffs_deflate_decode_100k_many_big_reads/gcc10     330 MB/s  1.19x
    mimicztl_deflate_decode_1k_full_init               241 MB/s  1.00x
    mimicztl_deflate_decode_10k_full_init              289 MB/s  1.00x
    mimicztl_deflate_decode_100k_just_one_read         352 MB/s  1.00x
    mimicztl_deflate_decode_100k_many_big_reads        277 MB/s  1.00x
    mimiclibdeflate_deflate_decode_1k_full_init        326 MB/s  1.35x
    mimiclibdeflate_deflate_decode_10k_full_init       503 MB/s  1.74x
    mimiclibdeflate_deflate_decode_100k_just_one_read  507 MB/s  1.44x
    mimicminiz_deflate_decode_1k_full_init             204 MB/s  0.85x
    mimicminiz_deflate_decode_10k_full_init            252 MB/s  0.87x
    mimicminiz_deflate_decode_100k_just_one_read       287 MB/s  0.82x
    go_deflate_decode_1k_full_init                     71 MB/s   0.29x
    go_deflate_decode_10k_full_init                    120 MB/s  0.42x
    go_deflate_decode_100k_just_one_read               135 MB/s  0.38x
    rust_deflate_decode_1k_full_init                   165 MB/s  0.68x
    rust_deflate_decode_10k_full_init                  259 MB/s  0.90x
    rust_deflate_decode_100k_just_one_read             272 MB/s  0.77x
To reproduce the libdeflate or miniz numbers, look in test/c/mimiclib/deflate-gzip-zlib.c. For Go 1.19, run go run main.go in script/bench-go-deflate. For Rust 1.48 / flate2 1.0.24 / miniz_oxide 0.5.3, run cargo run --release in script/bench-rust-deflate.
Historical (Wuffs v0.2; 2019) numbers for 32-bit ARMv7 (2012 era Samsung Exynos 5 Chromebook), Debian Stretch (2017):

    name                                             speed     vs_mimic
    wuffs_deflate_decode_1k_full_init/clang5         30.4MB/s  0.60x
    wuffs_deflate_decode_1k_part_init/clang5         37.9MB/s  0.74x
    wuffs_deflate_decode_10k_full_init/clang5        72.8MB/s  0.81x
    wuffs_deflate_decode_10k_part_init/clang5        76.2MB/s  0.85x
    wuffs_deflate_decode_100k_just_one_read/clang5   96.5MB/s  0.82x
    wuffs_deflate_decode_100k_many_big_reads/clang5  81.1MB/s  0.90x
    wuffs_deflate_decode_1k_full_init/gcc6           31.6MB/s  0.62x
    wuffs_deflate_decode_1k_part_init/gcc6           39.9MB/s  0.78x
    wuffs_deflate_decode_10k_full_init/gcc6          69.6MB/s  0.78x
    wuffs_deflate_decode_10k_part_init/gcc6          72.4MB/s  0.81x
    wuffs_deflate_decode_100k_just_one_read/gcc6     87.3MB/s  0.74x
    wuffs_deflate_decode_100k_many_big_reads/gcc6    73.8MB/s  0.82x
    mimic_deflate_decode_1k                          51.0MB/s  1.00x
    mimic_deflate_decode_10k                         89.7MB/s  1.00x
    mimic_deflate_decode_100k_just_one_read          118MB/s   1.00x
    mimic_deflate_decode_100k_many_big_reads         90.0MB/s  1.00x
The 1k, 10k, etc. numbers are approximately how many pixels there are in the decoded image. For example, the test/data/harvesters.* images are 1165 × 859, approximately 1000k pixels.
The bgra and indexed suffixes indicate whether to decode to 4 bytes (BGRA or RGBA) or 1 byte (a palette index) per pixel, respectively, even if the underlying file format gives 1 byte per pixel.
The full_init and part_init suffixes indicate whether WUFFS_INITIALIZE__LEAVE_INTERNAL_BUFFERS_UNINITIALIZED is unset or set, respectively.
The libgif library doesn't export any API for decode-to-BGRA or decode-to-RGBA, so there are no mimic numbers to compare to for the bgra suffix.

    name                                         speed      vs_mimic
    wuffs_gif_decode_1k_bw/clang11               758 MB/s   4.38x
    wuffs_gif_decode_1k_color_full_init/clang11  176 MB/s   1.96x
    wuffs_gif_decode_1k_color_part_init/clang11  213 MB/s   2.37x
    wuffs_gif_decode_10k_bgra/clang11            786 MB/s   n/a
    wuffs_gif_decode_10k_indexed/clang11         208 MB/s   1.98x
    wuffs_gif_decode_20k/clang11                 262 MB/s   2.47x
    wuffs_gif_decode_100k_artificial/clang11     580 MB/s   3.52x
    wuffs_gif_decode_100k_realistic/clang11      225 MB/s   2.18x
    wuffs_gif_decode_1000k_full_init/clang11     229 MB/s   2.16x
    wuffs_gif_decode_1000k_part_init/clang11     229 MB/s   2.16x
    wuffs_gif_decode_anim_screencap/clang11      1.31 GB/s  6.58x
    wuffs_gif_decode_1k_bw/gcc10                 659 MB/s   3.80x
    wuffs_gif_decode_1k_color_full_init/gcc10    175 MB/s   1.94x
    wuffs_gif_decode_1k_color_part_init/gcc10    202 MB/s   2.24x
    wuffs_gif_decode_10k_bgra/gcc10              768 MB/s   n/a
    wuffs_gif_decode_10k_indexed/gcc10           202 MB/s   1.92x
    wuffs_gif_decode_20k/gcc10                   250 MB/s   2.36x
    wuffs_gif_decode_100k_artificial/gcc10       564 MB/s   3.42x
    wuffs_gif_decode_100k_realistic/gcc10        217 MB/s   2.11x
    wuffs_gif_decode_1000k_full_init/gcc10       221 MB/s   2.08x
    wuffs_gif_decode_1000k_part_init/gcc10       221 MB/s   2.08x
    wuffs_gif_decode_anim_screencap/gcc10        1.31 GB/s  6.58x
    mimic_gif_decode_1k_bw                       173 MB/s   1.00x
    mimic_gif_decode_1k_color_full_init          90 MB/s    1.00x
    mimic_gif_decode_10k_indexed                 105 MB/s   1.00x
    mimic_gif_decode_20k                         106 MB/s   1.00x
    mimic_gif_decode_100k_artificial             165 MB/s   1.00x
    mimic_gif_decode_100k_realistic              103 MB/s   1.00x
    mimic_gif_decode_1000k_full_init             106 MB/s   1.00x
    mimic_gif_decode_anim_screencap              199 MB/s   1.00x
    go_gif_decode_1k_bw                          167 MB/s   0.97x
    go_gif_decode_1k_color_full_init             63 MB/s    0.70x
    go_gif_decode_10k_bgra                       216 MB/s   n/a
    go_gif_decode_10k_indexed                    92 MB/s    0.88x
    go_gif_decode_20k                            102 MB/s   0.96x
    go_gif_decode_100k_artificial                202 MB/s   1.22x
    go_gif_decode_100k_realistic                 102 MB/s   0.99x
    go_gif_decode_1000k_full_init                103 MB/s   0.97x
    go_gif_decode_anim_screencap                 237 MB/s   1.19x
    rust_gif_decode_1k_bw                        362 MB/s   2.09x
    rust_gif_decode_1k_color_full_init           113 MB/s   1.26x
    rust_gif_decode_10k_bgra                     333 MB/s   n/a
    rust_gif_decode_10k_indexed                  101 MB/s   0.96x
    rust_gif_decode_20k                          119 MB/s   1.12x
    rust_gif_decode_100k_artificial              248 MB/s   1.50x
    rust_gif_decode_100k_realistic               113 MB/s   1.10x
    rust_gif_decode_1000k_full_init              115 MB/s   1.08x
    rust_gif_decode_anim_screencap               513 MB/s   2.58x
To reproduce the Go 1.19 numbers, run go run main.go in script/bench-go-gif. For Rust 1.48 / gif 0.11.4, run cargo run --release in script/bench-rust-gif.
Historical (Wuffs v0.2; 2019) numbers for 32-bit ARMv7 (2012 era Samsung Exynos 5 Chromebook), Debian Stretch (2017):

    name                                        speed     vs_mimic
    wuffs_gif_decode_1k_bw/clang5               49.1MB/s  1.76x
    wuffs_gif_decode_1k_color_full_init/clang5  22.3MB/s  1.35x
    wuffs_gif_decode_1k_color_part_init/clang5  27.4MB/s  1.66x
    wuffs_gif_decode_10k_bgra/clang5            157MB/s   n/a
    wuffs_gif_decode_10k_indexed/clang5         42.0MB/s  1.79x
    wuffs_gif_decode_20k/clang5                 49.3MB/s  1.68x
    wuffs_gif_decode_100k_artificial/clang5     132MB/s   2.62x
    wuffs_gif_decode_100k_realistic/clang5      47.8MB/s  1.62x
    wuffs_gif_decode_1000k_full_init/clang5     46.4MB/s  1.62x
    wuffs_gif_decode_1000k_part_init/clang5     46.4MB/s  1.62x
    wuffs_gif_decode_anim_screencap/clang5      243MB/s   4.03x
    wuffs_gif_decode_1k_bw/gcc6                 46.6MB/s  1.67x
    wuffs_gif_decode_1k_color_full_init/gcc6    20.1MB/s  1.22x
    wuffs_gif_decode_1k_color_part_init/gcc6    24.2MB/s  1.47x
    wuffs_gif_decode_10k_bgra/gcc6              124MB/s   n/a
    wuffs_gif_decode_10k_indexed/gcc6           34.8MB/s  1.49x
    wuffs_gif_decode_20k/gcc6                   43.8MB/s  1.49x
    wuffs_gif_decode_100k_artificial/gcc6       123MB/s   2.44x
    wuffs_gif_decode_100k_realistic/gcc6        42.7MB/s  1.44x
    wuffs_gif_decode_1000k_full_init/gcc6       41.6MB/s  1.45x
    wuffs_gif_decode_1000k_part_init/gcc6       41.7MB/s  1.45x
    wuffs_gif_decode_anim_screencap/gcc6        227MB/s   3.76x
    mimic_gif_decode_1k_bw                      27.9MB/s  1.00x
    mimic_gif_decode_1k_color                   16.5MB/s  1.00x
    mimic_gif_decode_10k_indexed                23.4MB/s  1.00x
    mimic_gif_decode_20k                        29.4MB/s  1.00x
    mimic_gif_decode_100k_artificial            50.4MB/s  1.00x
    mimic_gif_decode_100k_realistic             29.5MB/s  1.00x
    mimic_gif_decode_1000k                      28.7MB/s  1.00x
    mimic_gif_decode_anim_screencap             60.3MB/s  1.00x
The 1k, 10k, etc. numbers are approximately how many bytes there are in the decoded output.

    name                              speed     vs_mimic
    wuffs_gzip_decode_10k/clang11     420 MB/s  1.71x
    wuffs_gzip_decode_100k/clang11    527 MB/s  1.81x
    wuffs_gzip_decode_10k/gcc10       427 MB/s  1.74x
    wuffs_gzip_decode_100k/gcc10      550 MB/s  1.89x
    mimicztl_gzip_decode_10k          246 MB/s  1.00x
    mimicztl_gzip_decode_100k         291 MB/s  1.00x
    mimiclibdeflate_gzip_decode_10k   494 MB/s  2.01x
    mimiclibdeflate_gzip_decode_100k  496 MB/s  1.70x
The 1k, 10k, etc. numbers are approximately how many bytes there are in the decoded output.
The libgif library doesn't export its LZW decoder in its API, so there are no mimic numbers to compare to.

    name                           speed     vs_mimic
    wuffs_lzw_decode_20k/clang11   293 MB/s  n/a
    wuffs_lzw_decode_100k/clang11  489 MB/s  n/a
    wuffs_lzw_decode_20k/gcc10     259 MB/s  n/a
    wuffs_lzw_decode_100k/gcc10    516 MB/s  n/a
The 1k, 10k, etc. numbers are approximately how many bytes there are in the decoded image. For example, the test/data/harvesters.* images are 1165 × 859, approximately 1000k pixels and hence 4000k bytes at 4 bytes per pixel.
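The size arithmetic for the harvesters image can be spelled out in a few lines of shell. This is only a worked example of the numbers quoted above; the variable names are arbitrary.

```shell
# Dimensions of the test/data/harvesters.* images, as stated in the text.
width=1165
height=859

pixels=$((width * height))   # 1000735, roughly 1000k pixels
bytes=$((pixels * 4))        # 4002940, roughly 4002k bytes at 4 bytes per pixel

echo "$pixels pixels, $bytes bytes"
```

The exact byte count, 4002940, is consistent with the 4002k that appears in the png_decode_image_4002k_24bpp benchmark names.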
The full_init and part_init suffixes indicate whether WUFFS_INITIALIZE__LEAVE_INTERNAL_BUFFERS_UNINITIALIZED is unset or set, respectively.
libpng's “simplified API” doesn't provide a way to ignore the checksum. We copy the verify_checksum numbers for a 1.00x baseline.

    name                                                       speed      vs_mimic
    wuffs_png_decode_image_19k_8bpp/clang11                    279 MB/s   2.29x
    wuffs_png_decode_image_40k_24bpp/clang11                   323 MB/s   2.20x
    wuffs_png_decode_image_77k_8bpp/clang11                    984 MB/s   2.66x
    wuffs_png_decode_image_552k_32bpp_ignore_checksum/clang11  871 MB/s   3.01x
    wuffs_png_decode_image_552k_32bpp_verify_checksum/clang11  837 MB/s   2.90x
    wuffs_png_decode_image_4002k_24bpp/clang11                 331 MB/s   1.55x
    wuffs_png_decode_image_19k_8bpp/gcc10                      277 MB/s   2.27x
    wuffs_png_decode_image_40k_24bpp/gcc10                     343 MB/s   2.33x
    wuffs_png_decode_image_77k_8bpp/gcc10                      1.00 GB/s  2.70x
    wuffs_png_decode_image_552k_32bpp_ignore_checksum/gcc10    914 MB/s   3.16x
    wuffs_png_decode_image_552k_32bpp_verify_checksum/gcc10    870 MB/s   3.01x
    wuffs_png_decode_image_4002k_24bpp/gcc10                   353 MB/s   1.65x
    mimiclibpng_png_decode_image_19k_8bpp                      122 MB/s   1.00x
    mimiclibpng_png_decode_image_40k_24bpp                     147 MB/s   1.00x
    mimiclibpng_png_decode_image_77k_8bpp                      370 MB/s   1.00x
    mimiclibpng_png_decode_image_552k_32bpp_verify_checksum    289 MB/s   1.00x
    mimiclibpng_png_decode_image_4002k_24bpp                   214 MB/s   1.00x
    mimiclibspng_png_decode_image_19k_8bpp                     125 MB/s   1.02x
    mimiclibspng_png_decode_image_40k_24bpp                    155 MB/s   1.05x
    mimiclibspng_png_decode_image_77k_8bpp                     384 MB/s   1.04x
    mimiclibspng_png_decode_image_552k_32bpp_ignore_checksum   461 MB/s   1.60x
    mimiclibspng_png_decode_image_552k_32bpp_verify_checksum   392 MB/s   1.36x
    mimiclibspng_png_decode_image_4002k_24bpp                  225 MB/s   1.05x
    mimiclodepng_png_decode_image_19k_8bpp                     138 MB/s   1.13x
    mimiclodepng_png_decode_image_40k_24bpp                    166 MB/s   1.13x
    mimiclodepng_png_decode_image_77k_8bpp                     404 MB/s   1.09x
    mimiclodepng_png_decode_image_552k_32bpp_verify_checksum   258 MB/s   0.89x
    mimiclodepng_png_decode_image_4002k_24bpp                  165 MB/s   0.77x
    mimicstb_png_decode_image_19k_8bpp                         150 MB/s   1.23x
    mimicstb_png_decode_image_40k_24bpp                        166 MB/s   1.13x
    mimicstb_png_decode_image_77k_8bpp                         443 MB/s   1.20x
    mimicstb_png_decode_image_552k_32bpp_ignore_checksum       288 MB/s   1.00x
    mimicstb_png_decode_image_4002k_24bpp                      163 MB/s   0.76x
    go_png_decode_image_19k_8bpp                               92 MB/s    0.75x
    go_png_decode_image_40k_24bpp                              107 MB/s   0.73x
    go_png_decode_image_77k_8bpp                               207 MB/s   0.56x
    go_png_decode_image_552k_32bpp_verify_checksum             246 MB/s   0.85x
    go_png_decode_image_4002k_24bpp                            116 MB/s   0.54x
    rust_png_decode_image_19k_8bpp                             187 MB/s   1.53x
    rust_png_decode_image_40k_24bpp                            260 MB/s   1.77x
    rust_png_decode_image_77k_8bpp                             330 MB/s   0.89x
    rust_png_decode_image_552k_32bpp_verify_checksum           299 MB/s   1.03x
    rust_png_decode_image_4002k_24bpp                          264 MB/s   1.23x
To reproduce the Go 1.19 numbers, run go run main.go in script/bench-go-png. For Rust 1.48 / png 0.17.5 / deflate 1.0.0 / miniz_oxide 0.5.3, run cargo run --release in script/bench-rust-png.
The 1k, 10k, etc. numbers are approximately how many bytes there are in the decoded output.

    name                              speed     vs_mimic
    wuffs_zlib_decode_10k/clang11     410 MB/s  1.53x
    wuffs_zlib_decode_100k/clang11    505 MB/s  1.56x
    wuffs_zlib_decode_10k/gcc10       431 MB/s  1.61x
    wuffs_zlib_decode_100k/gcc10      548 MB/s  1.70x
    mimicztl_zlib_decode_10k          268 MB/s  1.00x
    mimicztl_zlib_decode_100k         323 MB/s  1.00x
    mimiclibdeflate_zlib_decode_10k   497 MB/s  1.85x
    mimiclibdeflate_zlib_decode_100k  499 MB/s  1.54x
    mimicminiz_zlib_decode_10k        237 MB/s  0.88x
    mimicminiz_zlib_decode_100k       272 MB/s  0.84x
Updated in September 2022.