 # Background

 Decoding untrusted data, such as images downloaded from across the web, has a
 long history of security vulnerabilities. As of 2019, libpng is over 20 years
 old, and the [PNG specification is dated 2003](https://www.w3.org/TR/PNG/), but
 that well-examined C library is still getting [CVEs published in
 2019](https://www.cvedetails.com/vulnerability-list/vendor_id-7294/year-2019/Libpng.html).

 Sandboxing and fuzzing can mitigate the danger, but they are reactions to C's
 fundamental unsafety. Newer programming languages remove entire classes of
 potential security bugs. Buffer overflows and null pointer dereferences are
 amongst the most well known.

 Less well known are integer overflow bugs. Offset-length pairs, defining a
 sub-section of a file, are seen in many file formats, such as OpenType fonts
 and PDF documents. A conscientious C programmer might think to check that a
 section of a file or a buffer is within bounds by writing `if (offset + length
 < end)` before processing that section, but that addition can silently
 overflow, and a maliciously crafted file might bypass the check.
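
As a minimal sketch of the problem, assuming 32-bit unsigned offsets and lengths (the function names `in_bounds_naive` and `in_bounds_checked` are hypothetical, chosen for illustration), the naive check can be compared with a rearranged check that never computes `offset + length`:

```c
#include <stdbool.h>
#include <stdint.h>

// Naive check, as in the prose above. The 32-bit sum wraps around modulo
// 2^32, so a very large length can make the sum look small.
static bool in_bounds_naive(uint32_t offset, uint32_t length, uint32_t end) {
    return (uint32_t)(offset + length) < end;
}

// Rearranged check. The subtraction (end - offset) cannot wrap because
// offset <= end is tested first, so no intermediate value overflows.
static bool in_bounds_checked(uint32_t offset, uint32_t length, uint32_t end) {
    return (offset <= end) && (length < (end - offset));
}
```

With `offset = 16`, `length = 0xFFFFFFF8` and `end = 1024`, the naive sum wraps to 8, so `in_bounds_naive` accepts the section while `in_bounds_checked` rejects it. Both agree on ordinary inputs such as `offset = 16`, `length = 100`.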

 A variation on this theme is where `offset` is a pointer, exemplified by
 [capnproto's
 CVE-2017-7892](https://github.com/sandstorm-io/capnproto/blob/master/security-advisories/2017-04-17-0-apple-clang-elides-bounds-check.md)
 and [another
 example](https://www.blackhat.com/docs/us-14/materials/us-14-Rosenberg-Reflections-on-Trusting-TrustZone.pdf).
 For a pointer-typed offset, witnessing such a vulnerability can depend on both
 the malicious input itself and the addresses of the memory the software used to
 process that input. Those addresses can vary from run to run and from system to
 system, e.g. 32-bit versus 64-bit systems and whether dynamically allocated
 memory can have sufficiently high address values, and that variability makes it
 harder to reproduce such subtle bugs and to catch them by fuzzing.
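
In C, pointer arithmetic that lands outside the underlying array is itself undefined behavior, so a check written as `ptr + length < end` cannot be relied on. One common way to sidestep this, sketched here with a hypothetical helper `span_fits`, is to compare the length against the remaining distance instead of forming the out-of-range pointer. The sketch assumes `ptr` and `end` point into (or one past the end of) the same array, with `ptr <= end`:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Returns true if the `length` bytes starting at `ptr` fit before `end`.
// Assumption: ptr and end refer to the same array and ptr <= end, so
// (end - ptr) is a well-defined, non-negative distance.
static bool span_fits(const uint8_t *ptr, const uint8_t *end, size_t length) {
    return length <= (size_t)(end - ptr);
}
```

Because no pointer is ever advanced by the untrusted `length`, the result does not depend on where the buffer happens to sit in the address space.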

 In C, signed integer overflow is *undefined behavior* (unsigned arithmetic
 wraps around instead), as per [the C11 draft N1570, section
 3.4.3](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf). In
 Go, integer overflow is [silently
 ignored](https://golang.org/ref/spec#Integer_overflow). In Rust, integer
 overflow is [checked at run time in debug mode and silently ignored in release
 mode](http://huonw.github.io/blog/2016/04/myths-and-legends-about-integer-overflow-in-rust/)
 by default, as the run time performance penalty was deemed too great. In Swift,
 it's a [run time
 error](https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/AdvancedOperators.html#//apple_ref/doc/uid/TP40014097-CH27-ID37).
 In D, it's [configurable](http://dconf.org/2017/talks/alexandrescu.pdf). Other
 languages like Python and Haskell can automatically spill into 'big integers'
 larger than 64 bits, but this can have a performance impact when such integers
 are used in inner loops.
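
To make the run time flavor of overflow checking concrete, here is a rough C sketch of the two behaviors a language might choose: a checked add that reports failure, and a wrapping add that silently discards the carry. The names `checked_add_i32` and `wrapping_add_u32` are hypothetical and not taken from any of the languages above:

```c
#include <stdbool.h>
#include <stdint.h>

// Checked signed add: returns false instead of overflowing. The tests
// are arranged so that no intermediate expression can itself overflow.
static bool checked_add_i32(int32_t a, int32_t b, int32_t *out) {
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}

// Wrapping unsigned add: well defined in C, the result is taken
// modulo 2^32.
static uint32_t wrapping_add_u32(uint32_t a, uint32_t b) {
    return (uint32_t)(a + b);
}
```

The checked version costs a couple of comparisons and a branch on every addition, which is the kind of per-operation overhead the surrounding discussion is concerned with.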

 Even if overflow is checked, it is usually checked at run time. Similarly,
 modern languages do their bounds checking at run time. An expression like
 `a[i]` is really `if ((0 <= i) && (i < a.length)) { use a[i] } else { throw }`,
 in mangled pseudo-code. Compilers for these languages can often eliminate many
 of these bounds checks, e.g. if `i` is an iterator index, but not always all of
 them.
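
Spelled out in C, the implicit check might look like the sketch below. The `slice_u8` type and the functions `slice_get` and `slice_sum` are hypothetical, introduced only for illustration. Since the index is an unsigned `size_t`, the `0 <= i` half of the check is always true and only the upper bound needs testing:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// A pointer plus length, standing in for a bounds-checked array.
typedef struct {
    const uint8_t *ptr;
    size_t len;
} slice_u8;

// Explicit form of a checked a[i]: fails instead of reading out of bounds.
static bool slice_get(slice_u8 s, size_t i, uint8_t *out) {
    if (i >= s.len) {
        return false;  // the "throw" branch of the pseudo-code
    }
    *out = s.ptr[i];
    return true;
}

// Here the loop condition already proves i < s.len, so a per-element
// check would be redundant. This is the case a compiler can often elide.
static uint64_t slice_sum(slice_u8 s) {
    uint64_t total = 0;
    for (size_t i = 0; i < s.len; i++) {
        total += s.ptr[i];
    }
    return total;
}
```

When the index instead comes from the input data, as it often does in a decoder, the loop condition no longer proves anything about it, and the check in `slice_get` has to be paid on every access.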

 The run time cost is small, measured in nanoseconds. But if an image decoding
 library has to eat this cost per pixel, and you have a megapixel image, then
 nanoseconds become milliseconds, and milliseconds can matter.

 In comparison, in Wuffs, all bounds checks and arithmetic overflow checks
 happen at compile time, with zero run time overhead.