Wuffs the Language

Wuffs is an imperative, C-like language and much of it should look familiar to somebody versed in any one of C, C++, D, Go, Java, JavaScript, Objective-C, Rust, Swift, etc. Accordingly, this section is not an exhaustive specification, merely a rough guide of where Wuffs differs.

Like Go, semi-colons can be omitted, and wuffsfmt will remove them. Similarly, the body of an if or while must be enclosed by curly {}s. There is no ‘dangling else’ ambiguity.

Keywords

7 keywords introduce top-level concepts:

  • const
  • error
  • func
  • packageid
  • struct
  • suspension
  • use

2 keywords distinguish between public and private API:

  • pri
  • pub

8 keywords deal with control flow within a function:

  • break
  • continue
  • else
  • if
  • iterate
  • return
  • while
  • yield

5 keywords deal with assertions:

  • assert
  • inv
  • post
  • pre
  • via

2 keywords deal with types:

  • nptr
  • ptr

1 keyword deals with local variables:

  • var

TODO: categorize try, io_bind. Also: and, or, not, as, ref, deref, false, true, in, out, this, u8, u16, etc.

Operators

There is no operator precedence. a * b + c is an invalid expression. You must explicitly write either (a * b) + c or a * (b + c).

Some binary operators (+, *, &, |, ^, and, or) are also associative: (a + b) + c and a + (b + c) are equivalent, and can be written as a + b + c.

The logical operators, && and || and ! in C, are written as and and or and not in Wuffs.

TODO: ignore-overflow ops, equivalent to Swift's &+.

Converting an expression x to the type T is written as x as T.

Types

Types read from left to right: ptr array [100] u32 is a pointer to a 100-element array of unsigned 32-bit integers. ptr here means a non-null pointer. Use nptr for a nullable pointer type.

Integer types can also be refined: var x u32[10..20] defines a variable x that is stored as 4 bytes (32 bits) and can be combined arithmetically (e.g. added, compared) with other u32s, but whose value must be between 10 and 20 inclusive. The syntax is reminiscent of Pascal's subranges, but in Wuffs, a subsequent assignment like x = 21 or even x = y + z is a compile time error unless the right hand side can be proven to be within range.

Types can also provide a default value, such as u32[10..20] = 16, especially if zero is out of range. If not specified, the implicit default is zero.

Refinement bounds may be omitted, where the base integer type provides the implicit bound. var x u8[..5] means that x is between 0 and 5. var y i8[-7..] means that y is between -7 and +127. var z u32[..] is equivalent to var z u32.

Refinement bounds must be constant expressions. var x u32[..2+3] is valid, but var x u32; var y u32[..x] is not. Wuffs does not have dependent types. Relationships such as y <= x are expressible as assertions (see below), but not by the type system.

Structs

Structs are a list of fields, enclosed in parentheses: struct point(x i32, y i32). The struct name may be followed by a question mark ?, which means that its methods may be coroutines. (See below).

Functions

Function signatures read from left to right: func max(x i32, y i32)(z i32) is a function that takes two i32s and returns one i32. Two pairs of parentheses are required: a function that in other languages would return void in Wuffs returns the empty struct (), also known as the unit type.

When calling a function, each argument must be named. It is m = max(x:10, y:20) and not m = max(10, 20).

The function name, such as max, may be followed by either an exclamation mark ! or a question mark ? but not both. An exclamation mark means that the function is impure, and may assign to things other than its local variables. A question mark means that the function is impure and furthermore a coroutine: it can return a (recoverable) suspension code, such as needing more input data, or return a (fatal) error code, such as being invalid with respect to a file format. If suspended, calling that function again will resume at the suspension point, not necessarily at the top of the function body. If an error was returned, calling that function again will return the same error.

Some functions are methods, with syntax func foo.bar(etc)(etc), where foo names a struct type and bar is the method name. Within the function body, an implicit this argument will point to the receiving struct. Methods can also be marked as impure or coroutines.

Variables

Variable declarations are hoisted like JavaScript, so that all variables have function scope. Proofs are easier to work with, for both humans and computers, when one variable can't shadow another variable with the same name.

Assertions

Assertions state a boolean typed, pure expression, such as assert x >= y, that the compiler must prove. As a consequence, the compiler can then prove that, if x and y have type u32, the expression x - y will not underflow.

An assertion applies for a particular point in the program. Subsequent assignments or impure function calls, such as x = z or f!(), can invalidate previous assertions. See the “Facts” section below for more details.

Arithmetic inside assertions is performed in ideal integer math, working in the integer ring ℤ. An expression like x + y in an assertion never overflows, even if x and y have a realized (non-ideal) integer type like u32.

The assert keyword makes an assertion as a statement, and can occur inside a block of code like any other statement, such as an assignment x = y or an if statement. The pre, post and inv keywords are similar to assert, but apply only to func bodies and while loops.

pre states a pre-condition: an assertion that must be proven on every function call or on every entry to the loop. Loop entry means the initial execution of the loop, plus every explicit continue statement that targets that while loop, plus the implicit continue at the end of the loop. For while loops, the pre-condition must be proven before the condition (the x < y in while x < y { etc }) executes.

post states a post-condition: an assertion that must be proven on every function return (whether by an explicit return statement or by an implicit return at the end of the function body) or on every loop exit (whether by an explicit break statement or by the implicit break when the condition evaluates false).

For example, a while loop that contains no explicit break statements can always claim the inverse of its condition as a post-condition: while x < y, post x >= y { etc }.

inv states an invariant, which is simply an assertion that is both a pre-condition and post-condition.

When a func or while has multiple assertions, they must be listed in pre, inv and then post order.

Facts

Static constraints are expressed in the type system and apply for the lifetime of the variable, struct field or function argument. For example, var x u32 and var p ptr u32 constrain x to be non-negative and p to be non-null. Furthermore, var y u32[..20] constrains y to be less than or equal to 20.

Dynamic constraints, also known as facts, are previously seen assertions, explicit or implicit, that are known to hold at a particular point in the program. The set of known facts can grow and shrink over the analysis of a program's statements. For example, the assignment x = 7 can both add the implicit assertion x == 7 to the set of known facts, as well as remove any other facts that mention the variable x, as x's value might have changed. TODO: define exactly when facts are dropped or updated.

Similarly, the sequence assert x < y; z = 3 would result in set of known facts that include both the explicit assertion x < y and the implicit assertion z == 3, if none of x, y and z alias another (e.g. they are all local variables).

Wuffs has two forms of non-sequential control flow: if branches (including if, else if, else if chains) and while loops.

For an if statement, such as if b { etc0 } else { etc1 }, the condition b is a known fact inside the if-true branch etc0 and its inverse not b is a known fact inside the if-false branch etc1. After that if statement, the overall set of known facts is the intersection of the set of known facts after each non-terminating branch. A terminating branch is a non-empty block of code whose final statement is a return, break, continue or an if, else if chain where the final else is present and all branches terminate. TODO: also allow ending in while true?

For a while statement, such as while b { etc }, the set of known facts at the start of the body etc is precisely the condition b plus all pre and inv assertions. No other prior facts carry into the loop body, as the loop can re-start coming from other points in the program (i.e. an explicit or implicit continue) If the while loop makes no such pre or inv assertions, no facts are known other than b.

Similarly, the set of known facts after the while loop exits is precisely its inv and post assertions, and no other. In other words, a while loop must explicitly list its state of the world just before and just after it executes. This includes facts (i.e. invariants) about variables that are not mentioned at all by the while condition or its body but are proven before the while loop and assumed after it.

Proofs

Wuffs' assertion and bounds checking system is a proof checker, not a full SMT solver or automated theorem prover. It verifies explicit annotations instead of the more open-ended task of searching for implicit proof steps. This involves more explicit work by the programmer, but compile times matter, so Wuffs is fast (and dumb) instead of smart (and slow).

Nonetheless, the Wuffs syntax is regular (and unlike C++, does not require a symbol table to parse), so it should be straightforward to transform Wuffs code to and from file formats used by more sophisticated proof engines.

Some rules are applied automatically by the proof checker. For example, if x <= 10 and y <= 5 are both known true, whether by a static constraint (the type system) or dynamic constraint (an asserted fact), then the checker knows that the expression x + y is bounded above by 10 + 5 and therefore will not overflow a u8 (but would overflow a u8[..12]).

TODO: rigorously specify these automatic rules, when we have written more Wuffs code and thus have more experience on what rules are needed to implement multiple, real world image codecs.

Other rules are built in to the proof checker but are not applied automatically (see “fast... instead of smart” above). Such rules have double-quote enclosed names that look a little like mathematical statements. They are axiomatic, in that these rules are assumed, not proved, by the Wuffs toolchain. They are typically at a higher level than e.g. Peano axioms, as Wuffs emphasizes practicality over theoretical minimalism. As they are axiomatic, they endeavour to only encode ‘obvious’ mathematical rules. For example, the rule named "a < b: a < c; c <= b" is one expression of transitivity: the assertion a < b is proven if both a < c and c <= b are provable, for some expression c. Terms like a, b and c here are all integers in ℤ, they do not encompass floating point concepts like negative zero, NaNs or rounding.

Such rules are invoked by the via keyword. For example, assert n_bits < 12 via "a < b: a < c; c <= b"(c:width) makes the assertion n_bits < 12 by applying that transitivity rule, where a is n_bits, b is 12 and c is width. The trailing (c:width) syntax is deliberately similar to a function call (recall that when calling a function, each argument must be named), but the "a < b: a < c; c <= b" named rule is not a function-typed expression.

TODO: specify these built-in via rules, again after more experience.

Miscellaneous Language Notes

Labeled break and continue statements enable jumping out of loops that aren‘t the most deeply nested. The syntax for labels is while:label instead of Java’s label:while, as the former is slightly easier to parse, and Wuffs does not otherwise use labels for switch cases or goto targets.

TODO: describe the built in buf1 and buf2 types: 1- and 2-dimensional buffers of bytes, such as an I/O stream or a table of pixel data.


Updated on April 2018.