
Hello wuffs-c

This directory contains a simple example of a C program that uses a Wuffs library. It uses the wuffs-c command line tool, which transpiles Wuffs code to C code.

Traditionally, the first program anyone writes in a given programming language is something that prints “Hello world”. This doesn't work for Wuffs, for two reasons. First, Wuffs doesn't have a string type per se. Second, Wuffs code doesn't even have the capability to write to files directly, such as to stdout. Wuffs is a language for writing libraries, not complete programs, and the less Wuffs can do, the less Wuffs can do that is surprising (such as uploading your files to the internet), even when processing untrusted input.

Instead, we're going to run some Wuffs code that parses a string (like "123") and returns an integer (like 123). It'll be similar to the C atoi function, although our function will return an unsigned integer instead of a signed one, as that's a simpler problem. Our function will also take a pointer-length pair, not just a C-style pointer. Here's what the output should look like:

$ ./run.sh
0
12
56789
4294967295
0
3197704724
--------
0
12
56789
4294967295
parse: demo: too large
0
parse: demo: too large
0

The run.sh script compiles and runs the main.c program twice, first without and then with Wuffs. Each run parses 6 inputs. The first 4 are within the uint32_t range and the last 2 are not. Here's an excerpt of main.c:

int main(int argc, char** argv) {
  run("0");
  run("12");
  run("56789");
  run("4294967295");  // (1<<32) - 1, aka UINT32_MAX.
  run("4294967296");  // (1<<32).
  run("123456789012");
  return 0;
}

The first run (without Wuffs) uses a simple C implementation:

uint32_t parse(char* p, size_t n) {
  uint32_t ret = 0;
  for (size_t i = 0; (i < n) && p[i]; i++) {
    ret = (10 * ret) + (p[i] - '0');
  }
  return ret;
}

This works for the first 4 inputs, but silently overflows for the last 2: unsigned arithmetic in C wraps around modulo 2^32, so 4294967296 becomes 0 and 123456789012 becomes 3197704724. A subtle point is that, by default, the C compiler accepts this code without any indication that integer overflow could occur, yet integer overflow can lead to serious bugs.
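To make the wraparound concrete, here is a small sketch that pairs the naive parse with a 64-bit version of the same loop. The parse64 helper is hypothetical, added only for comparison; for inputs of up to 19 digits it computes the exact value, and truncating that to 32 bits gives exactly what the naive parse returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// The naive parser from above, with only const and an explicit cast added.
// There is no digit validation and no overflow check: uint32_t arithmetic
// in C is defined to wrap around modulo 2^32, so nothing is reported.
static uint32_t parse(const char* p, size_t n) {
  uint32_t ret = 0;
  for (size_t i = 0; (i < n) && p[i]; i++) {
    ret = (10 * ret) + (uint32_t)(p[i] - '0');
  }
  return ret;
}

// Hypothetical helper for illustration: the same loop in 64 bits, which is
// exact for inputs of up to 19 digits.
static uint64_t parse64(const char* p, size_t n) {
  uint64_t ret = 0;
  for (size_t i = 0; (i < n) && p[i]; i++) {
    ret = (10 * ret) + (uint64_t)(p[i] - '0');
  }
  return ret;
}
```

For example, parse("123456789012", 13) returns 3197704724, which is parse64("123456789012", 13) reduced modulo 2^32.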

Some C compilers, and some compilers for other languages, can optionally insert run-time checks for integer overflow, but these are typically disabled by default because of the performance impact. Enabling these checks for developer builds is better than nothing, but it still isn't a complete solution. We don't ship developer builds to our users, and while computer programmers are better than the average person at e.g. spotting phishing attempts, that also means they can be less likely than the average person to ‘test’ their developer builds on the malicious input that users encounter.

Anyway, in Wuffs, integer overflow is a mandatory concern. Addressing that concern takes a little more code, this time in Wuffs:

pub func parser.parse?(src: base.io_reader) {
    var c : base.u8
    while true {
        c = args.src.read_u8?()
        if c == 0 {
            return ok
        }
        if (c < 0x30) or (0x39 < c) {  // '0' and '9' are ASCII 0x30 and 0x39.
            return "#not a digit"
        }
        // Rebase from ASCII (0x30 ..= 0x39) to the value (0 ..= 9).
        c -= 0x30

        if this.val < 429_496729 {
            this.val = (10 * this.val) + (c as base.u32)
            continue
        } else if (this.val > 429_496729) or (c > 5) {
            return "#too large"
        }
        // Uncomment this assertion to see what facts are known here.
        // assert false
        this.val = (10 * this.val) + (c as base.u32)
    } endwhile
}
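The two constants in the loop follow from dividing UINT32_MAX by 10. Writing v for this.val and c for the rebased digit (0 ..= 9), the three cases that the code distinguishes work out as below. Informally, this is the reasoning that the Wuffs bounds checker has to be able to confirm from the facts it knows at each assignment.

```latex
\begin{aligned}
2^{32} - 1 &= 4294967295 = 10 \cdot 429496729 + 5 \\[4pt]
v \le 429496728,\ 0 \le c \le 9 &\implies 10v + c \le 4294967280 + 9 = 4294967289 \le 2^{32} - 1 \\
v = 429496729 &\implies 10v + c = 4294967290 + c \le 2^{32} - 1 \iff c \le 5 \\
v \ge 429496730 &\implies 10v \ge 4294967300 > 2^{32} - 1
\end{aligned}
```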

Obviously, we could have written the same careful algorithm in C. The point is that, unlike C, Wuffs doesn't let you forget to consider integer (or buffer) overflows. This can certainly be annoying, but it is reassuring whenever the code has to run in security-conscious contexts.
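As a sketch of that claim, here is one way to port the careful algorithm to C. The parse_checked name, the convention of returning NULL on success, and treating the end of the n bytes as the end of input are assumptions made for this sketch; it is not code from main.c, and nothing forces a C programmer to write the overflow branches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical C port of the algorithm in parse.wuffs. Returns NULL on
// success (storing the result in *out) or a static error message.
// Like the Wuffs code it stops at the first NUL byte. Unlike the Wuffs
// code, it also stops at the end of the n bytes (an assumption made here
// to keep the sketch short).
static const char* parse_checked(const char* p, size_t n, uint32_t* out) {
  assert(out != NULL);
  uint32_t val = 0;
  for (size_t i = 0; (i < n) && p[i]; i++) {
    unsigned char c = (unsigned char)p[i];
    if ((c < '0') || ('9' < c)) {
      return "not a digit";
    }
    c -= '0';  // Rebase from ASCII to the value 0 ..= 9.
    if (val < 429496729u) {
      // At most 10 * 429496728 + 9 = 4294967289, so no overflow.
      val = (10 * val) + c;
    } else if ((val > 429496729u) || (c > 5)) {
      return "too large";
    } else {
      // val == 429496729 and c <= 5: at most 4294967295 (UINT32_MAX).
      val = (10 * val) + c;
    }
  }
  *out = val;
  return NULL;
}
```

For example, parse_checked("4294967295", 11, &v) succeeds with v == 4294967295, while "4294967296" fails with "too large" at its last digit.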

Play with the parse.wuffs code and see what sort of compiler errors you get. For example, dropping the (c > 5) check makes the compiler reject the final assignment:

diff --git a/hello-wuffs-c/parse.wuffs b/hello-wuffs-c/parse.wuffs
index cb207eec..c38dcf88 100644
--- a/hello-wuffs-c/parse.wuffs
+++ b/hello-wuffs-c/parse.wuffs
@@ -35,7 +35,7 @@ pub func parser.parse?(src: base.io_reader) {
                if this.val < 429_496729 {
                        this.val = (10 * this.val) + (c as base.u32)
                        continue
-               } else if (this.val > 429_496729) or (c > 5) {
+               } else if (this.val > 429_496729) {
                        return "#too large"
                }
                // Uncomment this assertion to see what facts are known here.

$ ./run.sh
check: expression "(10 * this.val) + (c as base.u32)" bounds [4294967290 ..= 4294967299] is not within bounds [0 ..= 4294967295] at stdin:43. Facts:
    c <> -48
    c >= 0
    c <= 9
    this.val >= 429496729
    this.val <= 429496729

Note that the parser.parse method is a coroutine and therefore doesn't return the value directly. Instead, the pattern is that the parse method updates the parser's state (this.val) and the value method returns that state. For more realistic problem domains, the output often isn't just a single uint32_t, and the pattern for e.g. image decoding in Wuffs is that the decode method takes the destination buffer as an additional argument.
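To illustrate why a coroutine-style method keeps its result in the receiver's state rather than returning it, here is a hypothetical C analogue. The demo_parser names and the SUSPENDED status are made up for this sketch and are not what wuffs-c generates; the sketch only loosely mimics how a coroutine can pause when it runs out of input and resume on the next call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical parser state: progress lives here, not on the call stack.
typedef struct {
  uint32_t val;
  int done;  // Set once the terminating NUL has been seen.
} demo_parser;

// Status convention for this sketch: NULL means finished OK, SUSPENDED
// means "call again with more input", anything else is an error message.
static const char SUSPENDED[] = "suspended: need more input";

static const char* demo_parser_parse(demo_parser* self, const char* p,
                                     size_t n) {
  if (self->done) {
    return NULL;
  }
  for (size_t i = 0; i < n; i++) {
    unsigned char c = (unsigned char)p[i];
    if (c == 0) {
      self->done = 1;
      return NULL;
    }
    if ((c < '0') || ('9' < c)) {
      return "not a digit";
    }
    c -= '0';
    // Same bound as parse.wuffs: 10 * v + c must not exceed UINT32_MAX.
    if ((self->val > 429496729u) ||
        ((self->val == 429496729u) && (c > 5))) {
      return "too large";
    }
    self->val = (10 * self->val) + c;
  }
  return SUSPENDED;  // Ran out of bytes before the NUL terminator.
}

// The result is read separately, after parsing has finished.
static uint32_t demo_parser_value(const demo_parser* self) {
  return self->val;
}
```

With this shape, feeding "567" and then "89" (with its NUL) in two calls leaves 56789 for demo_parser_value, the same as a single call with the whole string.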

That's the end of this “Hello world”-ish introduction. From here, you can read more documentation about both Wuffs the Language and Wuffs the Library (which links to more example programs).

Finally, there are admittedly a couple of TODOs in this directory, concerning some rough edges in what should be a simple example. Version 0.2 (December 2019) has prioritized users of Wuffs the Library over users of Wuffs the Language. A future version should fix that imbalance.