 # I/O (Input / Output)

 Wuffs per se doesn't have the ability to read or write to files or network
 connections. Recall that Wuffs is a programming language for writing libraries,
 not applications, and that having fewer capabilities means that it's trivial to
 prove that you can't misuse a capability, even when given malicious input.

 Instead, the code that calls into Wuffs libraries is responsible for
 interfacing with e.g. the file system or the network system. An `io_buffer` is
 the mechanism for transferring data into and out of Wuffs libraries. For
 example, when decompressing gzip, there are two `io_buffer`s: the caller fills
 a source buffer with e.g. the compressed file's contents and the callee (the
 Wuffs library) reads compressed bytes from that source buffer and writes
 decompressed bytes to a destination buffer.


 ## I/O Buffers

 An `io_buffer` is a [slice](/doc/note/slices-arrays-and-tables.md) of bytes
 (the data, a `ptr` and `len`) with additional fields (the metadata): a read
 index (`ri`), a write index (`wi`), a position (`pos`) and whether or not it is
 `closed`.


 ## Read Index and Write Index

 Writing to an `io_buffer`, e.g. copying from a file to a buffer, increments
 `wi`. The buffer is full for writing (no more can be written) when `wi` equals
 `len`. Writing does not have to fill a buffer before further processing.

 Reading from an `io_buffer`, e.g. copying from a buffer to a file, increments
 `ri`. The buffer is empty for reading (no more can be read) when `ri` equals
 `wi`. Reading does not have to empty a buffer before further processing.

 An invariant condition is that `((0 <= ri) and (ri <= wi) and (wi <= len))`.
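To make the indexes concrete, here is a minimal C sketch. The `io_buf` struct and the `io_buf_valid`, `io_buf_write` and `io_buf_read` functions are hypothetical. They are a simplified stand-in for the fields described above, not Wuffs's own C API.

Writing advances `wi` and reading advances `ri`. Both functions clamp the transfer so that the invariant keeps holding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an io_buffer (not Wuffs's actual C type). */
typedef struct {
  uint8_t* ptr;  /* data slice: ptr and len */
  size_t len;
  size_t ri;     /* read index */
  size_t wi;     /* write index */
  uint64_t pos;  /* number of stream bytes before ptr[0] */
  bool closed;   /* no further writes expected */
} io_buf;

/* The invariant 0 <= ri <= wi <= len (0 <= ri is implied by size_t). */
static bool io_buf_valid(const io_buf* b) {
  return (b->ri <= b->wi) && (b->wi <= b->len);
}

/* Copy up to n bytes into ptr[wi .. len], advancing wi. Returns the number
 * copied, which is less than n if the buffer becomes full for writing. */
static size_t io_buf_write(io_buf* b, const uint8_t* src, size_t n) {
  size_t avail = b->len - b->wi;
  if (n > avail) {
    n = avail;
  }
  if (n > 0) {
    memcpy(b->ptr + b->wi, src, n);
    b->wi += n;
  }
  assert(io_buf_valid(b));
  return n;
}

/* Copy up to n bytes out of ptr[ri .. wi], advancing ri. Returns the number
 * copied, which is less than n if the buffer becomes empty for reading. */
static size_t io_buf_read(io_buf* b, uint8_t* dst, size_t n) {
  size_t avail = b->wi - b->ri;
  if (n > avail) {
    n = avail;
  }
  if (n > 0) {
    memcpy(dst, b->ptr + b->ri, n);
    b->ri += n;
  }
  assert(io_buf_valid(b));
  return n;
}
```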

 Having separate read and write indexes simplifies connecting a sequence of
 filters or processors with `io_buffer`s, similar to connecting Unix processes
 with pipes. Each filter reads from the previous buffer and writes to the next
 buffer. Each buffer is written to by the previous filter and is read from by
 the next filter. There's no need to flip a buffer between reading and writing
 modes. Nonetheless, `io_buffer`s are generally not thread-safe.

 Continuing the "decompressing gzip" example, the application would write to
 the source buffer by copying from e.g. `stdin`. The Wuffs library would read
 from the source buffer and write to the destination buffer. The application
 would read from the destination buffer by copying to e.g. `stdout`. Buffer
 space can be re-used, via compaction (see below), so that neither the source
 nor the destination data needs to be entirely in memory at any point.
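The following C sketch illustrates one stage of such a pipeline. `io_buf` and `upcase_filter` are hypothetical; an upper-casing stage stands in for a real decoder.

The stage advances only the source's `ri` and the destination's `wi`. It stops when the source is empty for reading or the destination is full for writing.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an io_buffer (not Wuffs's actual C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} io_buf;

/* One hypothetical filter stage. It only advances src->ri (reading) and
 * dst->wi (writing), stopping when src is empty for reading or dst is full
 * for writing. Returns the number of bytes transferred. */
static size_t upcase_filter(io_buf* dst, io_buf* src) {
  size_t n = 0;
  while ((src->ri < src->wi) && (dst->wi < dst->len)) {
    dst->ptr[dst->wi++] = (uint8_t)toupper(src->ptr[src->ri++]);
    n++;
  }
  return n;
}
```

An application would loop. It refills the source (advancing its `wi`), runs the stage, drains the destination (advancing its `ri`), and repeats. At no point does it flip either buffer between modes.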

 For example, an `io_buffer` of length 8 could have 4 bytes available to read
 and 1 byte available to write. If 1 byte was written, there would then be 5
 bytes available to read. Visually:

 ```
 [.. .. .. .. .. .. .. ..]
  |<- ri ->|           |  |
  |<------- wi ------->|  |
  |<-------- len -------->|
 ```
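Reading the indexes off the diagram above gives `ri = 3`, `wi = 7` and `len = 8`. A tiny C sketch checks the arithmetic; its helper names are hypothetical. It computes `(wi - ri)` bytes available to read and `(len - wi)` bytes available to write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers. With the diagram's ri = 3, wi = 7, len = 8 they give
 * 4 to read and 1 to write. After writing 1 byte (wi = 8) they give 5 to
 * read and 0 to write. */
static size_t available_to_read(size_t ri, size_t wi) { return wi - ri; }
static size_t available_to_write(size_t wi, size_t len) { return len - wi; }
```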


 ## Position

 An `io_buffer` is a sliding window into a stream of bytes. Its position (`pos`)
 is the number of bytes in the stream prior to the first element of the slice.
 The total number of bytes read from and written to the stream are therefore
 `(pos + ri)` and `(pos + wi)`.

 While every slice element is in memory, the stream's prior bytes do not
 necessarily have to be in memory now, or have been in memory in the past. It is
 valid to open a file, seek to byte offset 1000 and start copying from there to
 an `io_buffer`, provided that `pos` is also initialized to 1000.
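A minimal C sketch of that scenario uses the standard `fseek`, `fread` and `feof` from `<stdio.h>`. The `io_buf` struct and `fill_from_offset` are hypothetical.

The point is that `pos` records the stream offset of `ptr[0]`, even though earlier bytes were never loaded. Setting `closed` when `fread` reaches EOF is this sketch's assumption about how an application might fill the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an io_buffer (not Wuffs's actual C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} io_buf;

/* Fill b starting at byte offset `offset` of f. Earlier stream bytes are
 * never loaded; pos records how many stream bytes precede ptr[0], so that
 * (pos + ri) and (pos + wi) remain stream offsets. Returns false if f
 * cannot seek there. */
static bool fill_from_offset(io_buf* b, FILE* f, long offset) {
  if ((offset < 0) || (fseek(f, offset, SEEK_SET) != 0)) {
    return false;
  }
  b->pos = (uint64_t)offset;
  b->ri = 0;
  b->wi = fread(b->ptr, 1, b->len, f);
  b->closed = (feof(f) != 0); /* EOF reached: no further writes expected */
  return true;
}
```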


 ## Closed-ness

 The `closed` field indicates that no further writes are expected to the
 `io_buffer`. When copying from a file to a buffer, `closed` means that we have
 reached EOF (End Of File).

 For example, decoding a particular file format might, at some point, expect at
 least another 4 bytes of data, but only 3 are available to read. If `closed` is
 false, this isn't necessarily an error, since an `io_buffer` holds only a
 partial view of the underlying data stream, and more data might be forthcoming
 but not yet buffered. If `closed` is true, it is definitely an error.
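Here is a C sketch of that decision for a 4-byte read. The `io_buf` struct, `read_u32le` and the status names are hypothetical, not Wuffs identifiers.

A shortfall returns a retryable status while `closed` is false, and an error once `closed` is true. A failed attempt leaves `ri` unchanged, so the caller can supply more bytes and retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an io_buffer (not Wuffs's actual C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} io_buf;

/* Hypothetical status codes for this sketch. */
typedef enum { STATUS_OK, STATUS_SHORT_READ, STATUS_UNEXPECTED_EOF } status;

/* Read a 4-byte little-endian value. Too few bytes is only an error if the
 * buffer is closed; otherwise more data may still arrive. */
static status read_u32le(io_buf* b, uint32_t* out) {
  if ((b->wi - b->ri) < 4) {
    return b->closed ? STATUS_UNEXPECTED_EOF : STATUS_SHORT_READ;
  }
  const uint8_t* p = b->ptr + b->ri;
  *out = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) |
         ((uint32_t)p[3] << 24);
  b->ri += 4;
  return STATUS_OK;
}
```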


 ## Undoing Reads and Writes

 It is possible to decrement `ri` or `wi`, undoing previous reads or writes,
 provided that the invariant `((0 <= ri) and (ri <= wi) and (wi <= len))` holds.
 For example, it can be faster on 64-bit (8-byte) systems, if buffer space is
 available, to write 8 bytes and then undo 1 byte than to write exactly 7 bytes.
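A C sketch of that trick writes a 7-byte little-endian value. `io_buf` and `write_u56le` are hypothetical names.

When at least 8 bytes of space are free, it stores 8 bytes and then undoes the last one by decrementing `wi`. The stray byte lands at or past the new `wi`, where contents are undefined anyway. Whether this is actually faster depends on the compiler and target. Treat it as an illustration of the index arithmetic, not a measured optimization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an io_buffer (not Wuffs's actual C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} io_buf;

/* Write the low 7 bytes of v, little-endian. Returns false if fewer than
 * 7 bytes of space are available. */
static bool write_u56le(io_buf* b, uint64_t v) {
  if ((b->len - b->wi) >= 8) {
    uint8_t tmp[8];
    for (int i = 0; i < 8; i++) {
      tmp[i] = (uint8_t)(v >> (8 * i));
    }
    /* An optimizing compiler may lower this to a single 8-byte store. */
    memcpy(b->ptr + b->wi, tmp, 8);
    b->wi += 8;
    b->wi -= 1; /* undo 1 byte; wi <= len still holds */
    return true;
  }
  if ((b->len - b->wi) >= 7) {
    /* Not enough room for the 8-byte store: write exactly 7 bytes. */
    for (int i = 0; i < 7; i++) {
      b->ptr[b->wi++] = (uint8_t)(v >> (8 * i));
    }
    return true;
  }
  return false;
}
```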

 The Wuffs compiler enforces that, during a Wuffs function, `ri` and `wi` will
 never be decremented (by an undo operation) to be less than the initial values
 at the time of the call. When considering a function as a 'black box', the two
 indexes can only travel forward, and it is up to the application code (not the
 Wuffs library code) to rewind the indexes (e.g. by compaction).

 Even though `ri` cannot drop below its initial value, Wuffs code can still
 read the contents of the slice before `ri` (in sub-slice notation,
 `data[0 .. ri]`) and it should still contain the `(pos + 0)`th, `(pos + 1)`th,
 etc. byte of the stream.

 The contents of the slice after `wi` (in sub-slice notation, `data[wi .. len]`)
 are undefined, and code should not rely on their values. When passing an
 `io_buffer` into a function, that function is free to modify anything in
 `data[wi .. len]`, whether `wi` is taken as its value before the call or after
 the function returns.


 ## Compaction

 Compacting an `io_buffer` moves any written but unread bytes (those in
 `data[ri .. wi]`) to the start of the buffer, and updates the metadata fields
 `ri`, `wi` and `pos`. Equivalently, it moves the sliding window that is the
 `io_buffer` as far forward as possible along the stream.

 This generally increases `(len - wi)`, the number of bytes available for
 writing, allowing for re-using the allocated buffer memory (the data slice).

 Suppose that the underlying data stream's `i`th byte has value `i`, and that
 `ri`, `wi` and `pos` start at `3`, `7` and `20`. Compaction will subtract 3
 from the first two and add 3 to the last, so that the new `ri`, `wi` and `pos`
 are `0`, `4` and `23`. Note that `len`, `(pos + ri)` and `(pos + wi)` are all
 unchanged.
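A C sketch of compaction uses `memmove`, because the source and destination ranges can overlap. `io_buf` and `io_buf_compact` are hypothetical stand-ins, not Wuffs's C API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an io_buffer (not Wuffs's actual C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} io_buf;

/* Move the unread bytes ptr[ri .. wi] to the front of the slice and slide
 * the window forward by ri. len, (pos + ri) and (pos + wi) are unchanged. */
static void io_buf_compact(io_buf* b) {
  size_t n = b->ri;
  if (n == 0) {
    return; /* already as far forward as possible */
  }
  memmove(b->ptr, b->ptr + n, b->wi - n); /* regions may overlap */
  b->ri = 0;
  b->wi -= n;
  b->pos += n;
}
```

Applied to the example above (`ri`, `wi`, `pos` of 3, 7 and 20), it yields 0, 4 and 23. The first four bytes become 23, 24, 25 and 26.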

 Here are two equivalent visualizations of before and after compaction. The `xx`
 means a byte whose value is undefined (as it is at or past `wi`).

 The first visualization is where the slice is fixed and its contents (its view
 of the stream) move relative to the slice:

 ```
 Before:
 [20 21 22 23 24 25 26 xx]
  |<- ri ->|           |  |
  |<------- wi ------->|  |
  |<-------- len -------->|

 After:
 [23 24 25 26 xx xx xx xx]
  ||          |           |
  |<-- wi --->|           |
  |<-------- len -------->|
 ```

 The second visualization is where the stream (and its contents) is fixed and
 the slice (the sliding window) moves relative to the stream:

 ```
                            pos+ri      pos+wi
                            |           |
 Before:          [20 21 22 23 24 25 26 xx]
 Stream: ... 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ...
 After:                    [23 24 25 26 xx xx xx xx]
                            |           |
                            pos+ri      pos+wi
 ```


 ## Seeking and I/O Positions

 Recall that Wuffs code has limited capabilities, and cannot seek in the
 underlying I/O data streams per se. When it needs to seek (e.g. when jumping
 between video frames), it will typically provide an "I/O position", a
 `uint64_t` value, via some package-specific API. The application (the caller of
 the Wuffs code) is then responsible for configuring an `io_buffer` whose
 `(pos + ri)` or `(pos + wi)` value, depending on whether we're reading or
 writing, is at that "I/O position".

 If the underlying file (or equivalent) isn't seekable, e.g. it's `/dev/stdin`
 instead of a regular file, then the request cannot be satisfied. The
 application should then decide whether that error is recoverable or fatal. This
 is the application's responsibility, not the library's, as the application
 usually has more context to make that decision.

 If that "I/O position" is already within the sliding window, it might not be
 necessary to seek in the underlying file, as it may be possible to e.g. simply
 decrement `ri` to reach a target `(pos + ri)`, for the reading case. Otherwise,
 the typical process is:

 1. Set `ri`, `wi` and `pos` to `0`, `0` and that "I/O position". This discards
    any buffered data (but does not free the buffer's memory).
 2. Seek in the underlying file to that same "I/O position".
 3. Copy from the underlying file to the `io_buffer`, incrementing `wi`.
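A C sketch of the reading case uses `fseek`, `fread` and `feof` from `<stdio.h>`. `io_buf` and `seek_for_reading` are hypothetical.

When the target is already inside the buffered window, the shortcut just sets `ri` (moving it in either direction) and skips file I/O. Otherwise it follows the numbered steps, which are marked in comments. Clearing `closed` in step 1 is this sketch's assumption; step 3 recomputes it.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an io_buffer (not Wuffs's actual C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} io_buf;

/* Arrange for (b->pos + b->ri) == target, for reading from f. Returns false
 * if f cannot seek there; the caller decides if that is recoverable. */
static bool seek_for_reading(io_buf* b, FILE* f, uint64_t target) {
  /* Shortcut: target is already within the buffered window [pos, pos + wi). */
  if ((target >= b->pos) && ((target - b->pos) < b->wi)) {
    b->ri = (size_t)(target - b->pos);
    return true;
  }
  if (target > (uint64_t)LONG_MAX) {
    return false; /* fseek takes a long offset */
  }
  /* Step 1: discard buffered data (the memory is kept, not freed). */
  b->ri = 0;
  b->wi = 0;
  b->pos = target;
  b->closed = false;
  /* Step 2: seek in the underlying file. */
  if (fseek(f, (long)target, SEEK_SET) != 0) {
    return false;
  }
  /* Step 3: copy from the file into the buffer, advancing wi. */
  b->wi = fread(b->ptr, 1, b->len, f);
  b->closed = (feof(f) != 0);
  return true;
}
```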

 Whether or not it was necessary to seek and copy from the underlying file, when
 the application calls back into the Wuffs library, the library typically checks
 that the `io_buffer`'s `(pos + ri)` is now at the expected "I/O position".


 ## I/O Reader and I/O Writer

 An `io_buffer` is the mechanism for transferring data between the application
 and the Wuffs library. Application code can manipulate an `io_buffer`'s fields
 as it wishes (but is responsible for maintaining the invariant condition).
 Wuffs library code places a further restriction that `io_buffer`s are used
 exclusively either for reading or for writing, as optimizing incremental access
 to an `io_buffer`'s data, while enforcing invariants, is simpler when only one
 of `ri` and `wi` can vary.

 Wuffs code therefore refers to either a `base.io_reader` or `base.io_writer`,
 both of which are essentially the same type (an `io_buffer`) with different
 methods. Wuffs code does not reference an `io_buffer` directly.
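In C terms, the split can be sketched as two thin wrapper types over the same buffer, each exposing operations that advance only one index. `io_buf`, `io_reader`, `io_writer` and their functions here are hypothetical illustrations. They are not the Wuffs `base.io_reader` and `base.io_writer` types or their methods.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an io_buffer (not Wuffs's actual C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} io_buf;

/* Two views of the same underlying type with different operations. */
typedef struct { io_buf* b; } io_reader;
typedef struct { io_buf* b; } io_writer;

/* A reader only ever advances ri. */
static bool reader_read_u8(io_reader r, uint8_t* out) {
  if (r.b->ri >= r.b->wi) {
    return false; /* empty for reading */
  }
  *out = r.b->ptr[r.b->ri++];
  return true;
}

/* A writer only ever advances wi. */
static bool writer_write_u8(io_writer w, uint8_t v) {
  if (w.b->wi >= w.b->len) {
    return false; /* full for writing */
  }
  w.b->ptr[w.b->wi++] = v;
  return true;
}
```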


 ## Binding

 An `io_bind` block temporarily adapts a slice of bytes as an `io_reader` or
 `io_writer`. This is typically done to call other functions that take an
 `io_reader` or `io_writer` as an argument.

 ```
 var r : base.io_reader
 var s : slice base.u8

 etc

 // Just before the io_bind, r's state is saved.

 io_bind (io: r, data: s) {
     // At the top of the block, r's data slice is set to s, and r's metadata is
     // set so that ri = 0, pos = 0 and closed = false.
     //
     // Because r is an io_reader, not an io_writer, the wi metadata field is
     // set to the slice length, not 0.
     //
     // r must be a local variable, but s can be an expression.
     etc
 }

 // Just after the io_bind, r's state is restored.
 ```
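The save, rebind and restore steps described in those comments can be approximated in C as follows. `io_buf`, `bind_as_reader` and `sum_body` are hypothetical, and the block body is modeled as a callback. This illustrates the semantics, not the code that the Wuffs compiler generates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an io_buffer (not Wuffs's actual C type). */
typedef struct {
  uint8_t* ptr;
  size_t len;
  size_t ri;
  size_t wi;
  uint64_t pos;
  bool closed;
} io_buf;

/* Approximate io_bind for a reader r and a slice s, running body in between. */
static void bind_as_reader(io_buf* r, uint8_t* s_ptr, size_t s_len,
                           void (*body)(io_buf* r, void* ctx), void* ctx) {
  io_buf saved = *r; /* just before the io_bind, r's state is saved */
  r->ptr = s_ptr;
  r->len = s_len;
  r->ri = 0;
  r->wi = s_len; /* a reader: all of s is available to read */
  r->pos = 0;
  r->closed = false;
  body(r, ctx);
  *r = saved; /* just after the io_bind, r's state is restored */
}

/* Example body: consume every readable byte, summing into *(unsigned*)ctx. */
static void sum_body(io_buf* r, void* ctx) {
  unsigned* sum = (unsigned*)ctx;
  while (r->ri < r->wi) {
    *sum += r->ptr[r->ri++];
  }
}
```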