commit | b0e9efff3f44dc150edb22a27297bfef668035d2 | |
---|---|---|
author | Nigel Tao <nigeltao@golang.org> | Thu Oct 20 10:53:58 2022 +1100 |
committer | Nigel Tao <nigeltao@golang.org> | Thu Oct 20 11:18:09 2022 +1100 |
tree | 218647b445aa249a5cd512213b02b6a0f7fa8a12 | |
parent | 8b7f82142c90503e1b9ac6c9ae2cdcce63a8dd01 | |
Avoid (NULL + 0) in derived io_buffer variables

This addresses a "runtime error: applying zero offset to null pointer" UBSAN (Undefined Behavior Sanitizer) warning using clang 15.0.1:
https://logs.chromium.org/logs/skia/5e005cc1b1981011/+/steps/dm/0/stdout

The offending line of code was "io1_a_dst = io0_a_dst + a_dst->meta.wi" in wuffs_lzw__decoder__write_to and "io1_a_dst = NULL + 0" was Undefined Behavior even though io1_a_dst was never dereferenced.

This commit avoids initializing io1_a_dst to (NULL + 0) when io0_a_dst is NULL, falling back to initializing it with just NULL.

As discussed in https://reviews.llvm.org/D67122 while (nullptr + 0) is Defined Behavior according to the C++ spec, (NULL + 0) is Undefined Behavior (by omission) in C11 6.5.6/8: "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."

The benchmarks (an excerpt of the full suite is below) seem quite sensitive to this simple change that's outside of hot loops, sometimes much better and sometimes much worse. I don't know why.
name | old speed | new speed | delta |
---|---|---|---|
wuffs_bzip2_decode_10k/clang11 | 63.1MB/s ± 0% | 58.4MB/s ± 0% | -7.43% (p=0.008 n=5+5) |
wuffs_bzip2_decode_100k/clang11 | 49.4MB/s ± 0% | 46.2MB/s ± 0% | -6.35% (p=0.008 n=5+5) |
wuffs_bzip2_decode_10k/gcc10 | 60.9MB/s ± 0% | 56.3MB/s ± 0% | -7.59% (p=0.016 n=5+4) |
wuffs_bzip2_decode_100k/gcc10 | 49.6MB/s ± 0% | 47.0MB/s ± 0% | -5.15% (p=0.008 n=5+5) |
wuffs_deflate_decode_1k_full_init/clang11 | 196MB/s ± 0% | 195MB/s ± 1% | -0.63% (p=0.008 n=5+5) |
wuffs_deflate_decode_1k_part_init/clang11 | 226MB/s ± 0% | 224MB/s ± 0% | -0.95% (p=0.008 n=5+5) |
wuffs_deflate_decode_10k_full_init/clang11 | 409MB/s ± 0% | 420MB/s ± 0% | +2.84% (p=0.008 n=5+5) |
wuffs_deflate_decode_10k_part_init/clang11 | 418MB/s ± 0% | 431MB/s ± 0% | +2.97% (p=0.008 n=5+5) |
wuffs_deflate_decode_100k_just_one_read/clang11 | 517MB/s ± 1% | 542MB/s ± 0% | +4.78% (p=0.008 n=5+5) |
wuffs_deflate_decode_100k_many_big_reads/clang11 | 330MB/s ± 0% | 338MB/s ± 0% | +2.45% (p=0.008 n=5+5) |
wuffs_deflate_decode_1k_full_init/gcc10 | 188MB/s ± 0% | 177MB/s ± 0% | -5.38% (p=0.008 n=5+5) |
wuffs_deflate_decode_1k_part_init/gcc10 | 218MB/s ± 0% | 209MB/s ± 0% | -4.11% (p=0.016 n=4+5) |
wuffs_deflate_decode_10k_full_init/gcc10 | 402MB/s ± 0% | 407MB/s ± 1% | +1.19% (p=0.008 n=5+5) |
wuffs_deflate_decode_10k_part_init/gcc10 | 413MB/s ± 0% | 419MB/s ± 1% | +1.64% (p=0.008 n=5+5) |
wuffs_deflate_decode_100k_just_one_read/gcc10 | 520MB/s ± 0% | 532MB/s ± 0% | +2.25% (p=0.008 n=5+5) |
wuffs_deflate_decode_100k_many_big_reads/gcc10 | 330MB/s ± 0% | 335MB/s ± 0% | +1.56% (p=0.008 n=5+5) |
wuffs_gif_decode_1k_bw/clang11 | 781MB/s ± 0% | 645MB/s ± 0% | -17.46% (p=0.008 n=5+5) |
wuffs_gif_decode_1k_color_full_init/clang11 | 176MB/s ± 0% | 181MB/s ± 0% | +3.11% (p=0.008 n=5+5) |
wuffs_gif_decode_1k_color_part_init/clang11 | 215MB/s ± 0% | 223MB/s ± 0% | +3.46% (p=0.008 n=5+5) |
wuffs_gif_decode_10k_bgra/clang11 | 786MB/s ± 0% | 801MB/s ± 0% | +1.94% (p=0.008 n=5+5) |
wuffs_gif_decode_10k_indexed/clang11 | 208MB/s ± 0% | 211MB/s ± 0% | +1.44% (p=0.008 n=5+5) |
wuffs_gif_decode_20k/clang11 | 261MB/s ± 0% | 252MB/s ± 0% | -3.43% (p=0.008 n=5+5) |
wuffs_gif_decode_100k_artificial/clang11 | 578MB/s ± 0% | 583MB/s ± 0% | +0.86% (p=0.008 n=5+5) |
wuffs_gif_decode_100k_realistic/clang11 | 224MB/s ± 0% | 223MB/s ± 0% | -0.79% (p=0.008 n=5+5) |
wuffs_gif_decode_1000k_full_init/clang11 | 229MB/s ± 0% | 225MB/s ± 0% | -1.73% (p=0.008 n=5+5) |
wuffs_gif_decode_1000k_part_init/clang11 | 229MB/s ± 0% | 225MB/s ± 0% | -1.69% (p=0.008 n=5+5) |
wuffs_gif_decode_anim_screencap/clang11 | 1.31GB/s ± 0% | 1.29GB/s ± 0% | -1.06% (p=0.008 n=5+5) |
wuffs_gif_decode_1k_bw/gcc10 | 650MB/s ± 0% | 631MB/s ± 0% | -3.03% (p=0.008 n=5+5) |
wuffs_gif_decode_1k_color_full_init/gcc10 | 169MB/s ± 0% | 168MB/s ± 0% | -0.51% (p=0.008 n=5+5) |
wuffs_gif_decode_1k_color_part_init/gcc10 | 202MB/s ± 0% | 202MB/s ± 0% | -0.18% (p=0.008 n=5+5) |
wuffs_gif_decode_10k_bgra/gcc10 | 766MB/s ± 0% | 790MB/s ± 0% | +3.08% (p=0.008 n=5+5) |
wuffs_gif_decode_10k_indexed/gcc10 | 202MB/s ± 0% | 208MB/s ± 0% | +3.16% (p=0.008 n=5+5) |
wuffs_gif_decode_20k/gcc10 | 250MB/s ± 0% | 258MB/s ± 0% | +3.20% (p=0.008 n=5+5) |
wuffs_gif_decode_100k_artificial/gcc10 | 563MB/s ± 0% | 566MB/s ± 0% | +0.57% (p=0.008 n=5+5) |
wuffs_gif_decode_100k_realistic/gcc10 | 217MB/s ± 0% | 220MB/s ± 0% | +1.49% (p=0.008 n=5+5) |
wuffs_gif_decode_1000k_full_init/gcc10 | 220MB/s ± 0% | 224MB/s ± 0% | +1.49% (p=0.008 n=5+5) |
wuffs_gif_decode_1000k_part_init/gcc10 | 220MB/s ± 0% | 224MB/s ± 0% | +1.43% (p=0.008 n=5+5) |
wuffs_gif_decode_anim_screencap/gcc10 | 1.30GB/s ± 0% | 1.30GB/s ± 0% | +0.34% (p=0.008 n=5+5) |
wuffs_lzw_decode_20k/clang11 | 293MB/s ± 0% | 292MB/s ± 0% | -0.41% (p=0.008 n=5+5) |
wuffs_lzw_decode_100k/clang11 | 486MB/s ± 0% | 537MB/s ± 0% | +10.49% (p=0.008 n=5+5) |
wuffs_lzw_decode_20k/gcc10 | 258MB/s ± 0% | 276MB/s ± 0% | +7.19% (p=0.016 n=4+5) |
wuffs_lzw_decode_100k/gcc10 | 512MB/s ± 0% | 527MB/s ± 0% | +3.08% (p=0.016 n=4+5) |
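The commit message describes falling back to plain NULL instead of evaluating (NULL + 0). A minimal C sketch of that guard follows. The `sketch_buffer` type and `sketch_write_cursor` function are hypothetical names invented for illustration; they are not Wuffs' generated code, which this page does not show.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for an io_buffer: a data pointer (possibly NULL for
// an empty buffer) and a write index. Not the real Wuffs struct layout.
typedef struct {
  uint8_t* ptr;
  size_t wi;
} sketch_buffer;

// Derives the write cursor (the "io1" pointer) from the base ("io0") pointer.
// When the base is NULL, return NULL directly instead of computing
// (NULL + wi), which is Undefined Behavior in C11 even when wi is zero.
static uint8_t* sketch_write_cursor(const sketch_buffer* b) {
  uint8_t* io0 = b->ptr;
  uint8_t* io1 = io0 ? (io0 + b->wi) : NULL;
  return io1;
}
```

The conditional costs one branch per derived pointer, outside of any hot loop, which is consistent with the commit message's surprise at the benchmark swings.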
Wuffs (Wrangling Untrusted File Formats Safely) was formerly known as Puffs (Parsing Untrusted File Formats Safely).
Wuffs is a memory-safe programming language (and a standard library written in that language) for wrangling untrusted file formats safely. Wrangling includes parsing, decoding and encoding. Example file formats include images, audio, video, fonts and compressed archives.
It is also fast. On many of its GIF decoding benchmarks, Wuffs measures 2x faster than “giflib” (C), 3x faster than “image/gif” (Go) and 7x faster than “gif” (Rust).
Wuffs' goal is to produce software libraries that are as safe as Go or Rust, roughly speaking, but as fast as C, and that can be used anywhere C libraries are used. This includes very large C/C++ projects, such as popular web browsers and operating systems (using that term to include desktop and mobile user interfaces, not just the kernel).
Wuffs the Library is available as transpiled C code. Other C/C++ projects can use that library without requiring the Wuffs the Language toolchain. Those projects can use Wuffs the Library like using any other third party C library. It's just not hand-written C.
However, unlike hand-written C, Wuffs the Language is safe with respect to buffer overflows, integer arithmetic overflows and null pointer dereferences. A key difference between Wuffs and other memory-safe languages is that all such checks are done at compile time, not at run time. If it compiles, it is safe, with respect to those three bug classes.
The trade-off in aiming for both safety and speed is that Wuffs programs take longer for a programmer to write, as they have to explicitly annotate their programs with proofs of safety. A statement like `x += 1` unsurprisingly means to increment the variable `x` by `1`. However, in Wuffs, such a statement is a compile time error unless the compiler can also prove that `x` is not the maximal value of `x`'s type (e.g. `x` is not `255` if `x` is a `base.u8`), as the increment would otherwise overflow. Similarly, an integer arithmetic expression like `x / y` is a compile time error unless the compiler can also prove that `y` is not zero.
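For contrast, here is a rough C sketch of the same two conditions written as run-time checks. Wuffs proves these conditions at compile time and emits no such checks; the function names `checked_inc_u8` and `checked_div_u32` are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

// Run-time analogue of the proof obligation for `x += 1` on a base.u8:
// the increment is only allowed when x is not the type's maximal value.
static bool checked_inc_u8(uint8_t x, uint8_t* out) {
  if (x == UINT8_MAX) {
    return false;  // would overflow past 255
  }
  *out = (uint8_t)(x + 1);
  return true;
}

// Run-time analogue of the proof obligation for `x / y`: y must be non-zero.
static bool checked_div_u32(uint32_t x, uint32_t y, uint32_t* out) {
  if (y == 0) {
    return false;  // division by zero
  }
  *out = x / y;
  return true;
}
```

In Wuffs the programmer discharges these conditions once, in the source, so the generated C can perform the bare `x + 1` or `x / y` with no branch.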
Wuffs is not a general purpose programming language. It is for writing libraries, not programs. The idea isn't to write your whole program in Wuffs, only the parts that are both performance-conscious and security-conscious. For example, while technically possible, it is unlikely that a Wuffs compiler would be worth writing entirely in Wuffs.
The `/std/lzw/decode_lzw.wuffs` file is a good example. The Wuffs the Language document has more information on how it differs from other languages in the C family.
For example, making this one-line edit to the LZW codec leads to a compile time error. `wuffs gen` fails to generate the C code, i.e. fails to compile (transpile) the Wuffs code to C code:
    diff --git a/std/lzw/decode_lzw.wuffs b/std/lzw/decode_lzw.wuffs
    index f878c5e..f10dcee 100644
    --- a/std/lzw/decode_lzw.wuffs
    +++ b/std/lzw/decode_lzw.wuffs
    @@ -98,7 +98,7 @@ pub func lzw_decoder.decode?(dst ptr buf1, src ptr buf1, src_final bool)() {
     in.dst.write?(x:s)

     if use_save_code {
    -    this.suffixes[save_code] = c as u8
    +    this.suffixes[save_code] = (c + 1) as u8
         this.prefixes[save_code] = prev_code as u16
     }
    $ wuffs gen std/gif
    check: expression "(c + 1) as u8" bounds [1 ..= 256] is not within bounds [0 ..= 255] at
    /home/n/go/src/github.com/google/wuffs/std/lzw/decode_lzw.wuffs:101. Facts:
        n_bits < 8
        c < 256
        this.stack[s] == (c as u8)
        use_save_code
In comparison, this two-line edit will compile (but the “does it decode GIF correctly” tests then fail):
    diff --git a/std/lzw/decode_lzw.wuffs b/std/lzw/decode_lzw.wuffs
    index f878c5e..b43443d 100644
    --- a/std/lzw/decode_lzw.wuffs
    +++ b/std/lzw/decode_lzw.wuffs
    @@ -97,8 +97,8 @@ pub func lzw_decoder.decode?(dst ptr buf1, src ptr buf1, src_final bool)() {
     // type checking, bounds checking and code generation for it).
     in.dst.write?(x:s)

    -if use_save_code {
    -    this.suffixes[save_code] = c as u8
    +if use_save_code and (c < 200) {
    +    this.suffixes[save_code] = (c + 1) as u8
         this.prefixes[save_code] = prev_code as u16
     }
    $ wuffs gen std/gif
    gen wrote: /home/n/go/src/github.com/google/wuffs/gen/c/gif.c
    gen unchanged: /home/n/go/src/github.com/google/wuffs/gen/h/gif.h
    $ wuffs test std/gif
    gen unchanged: /home/n/go/src/github.com/google/wuffs/gen/c/gif.c
    gen unchanged: /home/n/go/src/github.com/google/wuffs/gen/h/gif.h
    test: /home/n/go/src/github.com/google/wuffs/test/c/gif
    gif/basic.c clang PASS (8 tests run)
    gif/basic.c gcc PASS (8 tests run)
    gif/gif.c clang FAIL test_lzw_decode: bufs1_equal: wi: got 19311, want 19200.
    contents differ at byte 3 (in hex: 0x000003):
      000000: dcdc dc00 00d9 f5f9 f6df dc5f 393a 3a3a ..........._9:::
      000010: 3a3b 618e c8e4 e4e4 e5e4 e600 00e4 bbbb :;a.............
      000020: eded 8f91 9191 9090 9090 9190 9192 9192 ................
      000030: 9191 9292 9191 9293 93f0 f0f0 f1f1 f2f2 ................
    excerpts of got (above) versus want (below):
      000000: dcdc dcdc dcd9 f5f9 f6df dc5f 393a 3a3a ..........._9:::
      000010: 3a3a 618e c8e4 e4e4 e5e4 e6e4 e4e4 bbbb ::a.............
      000020: eded 8f91 9191 9090 9090 9090 9191 9191 ................
      000030: 9191 9191 9191 9193 93f0 f0f0 f1f1 f2f2 ................
    gif/gif.c gcc FAIL test_lzw_decode: bufs1_equal: wi: got 19311, want 19200.
    contents differ at byte 3 (in hex: 0x000003):
      000000: dcdc dc00 00d9 f5f9 f6df dc5f 393a 3a3a ..........._9:::
      000010: 3a3b 618e c8e4 e4e4 e5e4 e600 00e4 bbbb :;a.............
      000020: eded 8f91 9191 9090 9090 9190 9192 9192 ................
      000030: 9191 9292 9191 9293 93f0 f0f0 f1f1 f2f2 ................
    excerpts of got (above) versus want (below):
      000000: dcdc dcdc dcd9 f5f9 f6df dc5f 393a 3a3a ..........._9:::
      000010: 3a3a 618e c8e4 e4e4 e5e4 e6e4 e4e4 bbbb ::a.............
      000020: eded 8f91 9191 9090 9090 9090 9191 9191 ................
      000030: 9191 9191 9191 9193 93f0 f0f0 f1f1 f2f2 ................
    wuffs-test-c: some tests failed
    wuffs test: some tests failed
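The difference between the two edits comes down to interval arithmetic on the known bounds of `c`. The C sketch below models that reasoning only; it is an assumption-laden illustration, not how the Wuffs checker is implemented, and `sketch_bounds`, `sketch_add_const` and `sketch_fits_u8` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

// An inclusive integer interval [lo ..= hi], in the notation used by the
// checker's error message. Hypothetical type, for illustration only.
typedef struct {
  int64_t lo;
  int64_t hi;
} sketch_bounds;

// Bounds of (e + k), given the bounds of e and a constant k.
static sketch_bounds sketch_add_const(sketch_bounds b, int64_t k) {
  sketch_bounds r = {b.lo + k, b.hi + k};
  return r;
}

// Whether every value in the interval is representable as a u8,
// i.e. lies within [0 ..= 255].
static bool sketch_fits_u8(sketch_bounds b) {
  return (b.lo >= 0) && (b.hi <= 255);
}
```

With the fact `c < 256`, `c` lies in [0 ..= 255], so `c + 1` lies in [1 ..= 256] and the conversion to `u8` is rejected. With the extra condition `c < 200`, `c + 1` lies in [1 ..= 200], which fits, so the second edit compiles even though it decodes GIFs incorrectly.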
- `lang` holds the Go libraries that implement Wuffs the Language: tokenizer, AST, parser, renderer, etc. The Wuffs tools are written in Go, but as mentioned above, Wuffs transpiles to C code, and Go is not necessarily involved if all you want is to use the C edition of Wuffs.
- `lib` holds other Go libraries, not specific to Wuffs the Language per se.
- `internal` holds internal implementation details, as per Go's internal packages convention.
- `cmd` holds Wuffs the Language's command line tools, also written in Go.
- `std` holds Wuffs the Library's code.
- `release` holds the releases (e.g. in their C form) of Wuffs the Library.
- `test` holds the regular tests for Wuffs the Library.
- `fuzz` holds the fuzz tests for Wuffs the Library.
- `script` holds miscellaneous utility programs.
- `doc` holds documentation.
- `example` holds example programs for Wuffs the Library.
- `hello-wuffs-c` holds an example program for Wuffs the Language.

The Note directory also contains various short articles.
Version 0.2. The API and ABI aren't stabilized yet. The compiler undoubtedly has bugs. Assertion checking needs more rigor, especially around side effects and aliasing, and being sufficiently well specified to allow alternative implementations. Lots of detail needs work, but the broad brushstrokes are there.
The mailing list is at https://groups.google.com/forum/#!forum/wuffs.
The CONTRIBUTING.md file contains instructions on how to file the Contributor License Agreement before sending any pull requests (PRs). Of course, if you're new to the project, it's usually best to discuss any proposals and reach consensus before sending your first PR.
Source code is auto-formatted.
Apache 2. See the LICENSE file for details.
This is not an official Google product; it is just code that happens to be owned by Google.
Updated in December 2019.