Document features in Sparse Strips more correctly (#1356)

Also improve consistency with the Xilem/Masonry readmes.

I also audited the features of Vello Common, and found two feature
flags which were unused; i.e. removing them didn't require any other
changes. These are the `bytemuck` and `simd` features. As such, I
suggest removing them, and have done so in this PR.
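
The new Vello CPU feature docs in this PR say that, when both pipelines are enabled, the `u8` pipeline is used for `OptimizeSpeed` and the `f32` pipeline for `OptimizeQuality`. A minimal sketch of that rule follows, for illustration only. The `Pipeline` enum, the `select_pipeline` function, and the local `RenderMode` stand-in are all hypothetical names, not Vello CPU's API, and the fallback to whichever single pipeline is enabled is an assumption the docs do not state.

```rust
/// Hypothetical stand-in for the render mode named in the feature docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RenderMode {
    OptimizeSpeed,
    OptimizeQuality,
}

/// Hypothetical label for the two pipelines behind the feature flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pipeline {
    U8,
    F32,
}

/// Picks a pipeline from the requested mode and the enabled features.
/// Returns `None` when neither pipeline is enabled, which the docs forbid.
pub fn select_pipeline(
    mode: RenderMode,
    u8_enabled: bool,
    f32_enabled: bool,
) -> Option<Pipeline> {
    match (u8_enabled, f32_enabled) {
        // Documented case: both pipelines enabled, so the mode decides.
        (true, true) => Some(match mode {
            RenderMode::OptimizeSpeed => Pipeline::U8,
            RenderMode::OptimizeQuality => Pipeline::F32,
        }),
        // Assumed case: only one pipeline is compiled in, so use it.
        (true, false) => Some(Pipeline::U8),
        (false, true) => Some(Pipeline::F32),
        // Invalid configuration: at least one pipeline must be enabled.
        (false, false) => None,
    }
}
```

In a real crate the two booleans would more likely come from `cfg!(feature = "u8_pipeline")` and `cfg!(feature = "f32_pipeline")`; they are plain parameters here so the sketch can be checked without any feature flags.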
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 70dfc91..0f1ba8f 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -107,10 +107,10 @@
           tool: cargo-rdme
 
       - name: Run cargo rdme (vello_cpu)
-        run: cargo rdme --check --heading-base-level=0 --workspace-project=vello_cpu
+        run: cargo rdme --check --workspace-project=vello_cpu
 
       - name: Run cargo rdme (vello_common)
-        run: cargo rdme --check --heading-base-level=0 --workspace-project=vello_common
+        run: cargo rdme --check --workspace-project=vello_common
     
       - name: Run cargo rdme (vello_api)
         run: cargo rdme --check --heading-base-level=0 --workspace-project=vello_api
diff --git a/sparse_strips/vello_common/Cargo.toml b/sparse_strips/vello_common/Cargo.toml
index 4bbc957..2ec22f4 100644
--- a/sparse_strips/vello_common/Cargo.toml
+++ b/sparse_strips/vello_common/Cargo.toml
@@ -17,7 +17,7 @@
 targets = []
 
 [dependencies]
-bytemuck = { workspace = true, features = [] }
+bytemuck = { workspace = true, features = ["derive"] }
 peniko = { workspace = true, features = ["bytemuck"] }
 fearless_simd = { workspace = true }
 hashbrown = { workspace = true, features = ["raw-entry"] }
@@ -29,12 +29,10 @@
 log = { workspace = true }
 
 [features]
+# If adding new features, also document in `src/lib.rs`
 default = ["std", "png", "text"]
-# Enable using SIMD instructions for rendering
-simd = []
 # Get floating point functions from the standard library (likely using your target's libc).
 std = ["peniko/std", "skrifa?/std", "fearless_simd/std"]
-bytemuck = ["bytemuck/bytemuck_derive"]
 # Use floating point implementations from libm.
 libm = ["peniko/libm", "skrifa?/libm", "dep:libm", "fearless_simd/libm"]
 # Allow loading Pixmap from PNG, and drawing png glyphs.
diff --git a/sparse_strips/vello_common/README.md b/sparse_strips/vello_common/README.md
index 7340beb..683b972 100644
--- a/sparse_strips/vello_common/README.md
+++ b/sparse_strips/vello_common/README.md
@@ -24,6 +24,7 @@
 
 [libm]: https://crates.io/crates/libm
 [crate::pixmap::Pixmap]: https://docs.rs/vello_common/latest/vello_common/pixmap/struct.Pixmap.html
+[`glyph`]: https://docs.rs/vello_common/latest/vello_common/glyph/index.html
 
 <!-- cargo-rdme start -->
 
@@ -42,12 +43,11 @@
 
 - `std` (enabled by default): Get floating point functions from the standard library
   (likely using your target's libc).
-- `libm`: Use floating point implementations from [libm].
+- `libm`: Use floating point implementations from [libm][].
 - `png` (enabled by default): Allow loading [`Pixmap`][crate::pixmap::Pixmap]s from PNG images.
   Also required for rendering glyphs with an embedded PNG.
   Implies `std`.
-- `simd`: Allows requesting SIMD execution modes.
-  Note that SIMD is not yet implemented.
+- `text` (enabled by default): Enables glyph rendering (see the [`glyph`][] module).
 
 At least one of `std` and `libm` is required; `std` overrides `libm`.
 
diff --git a/sparse_strips/vello_common/src/lib.rs b/sparse_strips/vello_common/src/lib.rs
index def21b8..93d84c4 100644
--- a/sparse_strips/vello_common/src/lib.rs
+++ b/sparse_strips/vello_common/src/lib.rs
@@ -1,9 +1,12 @@
 // Copyright 2025 the Vello Authors
 // SPDX-License-Identifier: Apache-2.0 OR MIT
 
+// After you edit the crate's doc comment, run this command, then check README.md for any missing links
+// cargo rdme --workspace-project=vello_common
+
 //! This crate includes common geometry representations, tiling logic, and other fundamental components used by both [Vello CPU][vello_cpu] and Vello Hybrid.
 //!
-//! ## Usage
+//! # Usage
 //!
 //! This crate should not be used on its own, and you should instead use one of the renderers which use it.
 //! At the moment, only [Vello CPU][vello_cpu] is published, and you probably want to use that.
@@ -12,20 +15,19 @@
 //! Vello CPU is being developed as part of work to address shortcomings in Vello.
 //! Vello does not use this crate.
 //!
-//! ## Features
+//! # Features
 //!
 //! - `std` (enabled by default): Get floating point functions from the standard library
 //!   (likely using your target's libc).
-//! - `libm`: Use floating point implementations from [libm].
+//! - `libm`: Use floating point implementations from [libm][].
 //! - `png` (enabled by default): Allow loading [`Pixmap`][crate::pixmap::Pixmap]s from PNG images.
 //!   Also required for rendering glyphs with an embedded PNG.
 //!   Implies `std`.
-//! - `simd`: Allows requesting SIMD execution modes.
-//!   Note that SIMD is not yet implemented.
+//! - `text` (enabled by default): Enables glyph rendering (see the [`glyph`][] module).
 //!
 //! At least one of `std` and `libm` is required; `std` overrides `libm`.
 //!
-//! ## Contents
+//! # Contents
 //!
 //! - Shared data structures for paths, tiles, and strips
 //! - Geometry processing utilities
@@ -34,7 +36,8 @@
 //! This crate acts as a foundation for `vello_cpu` and `vello_hybrid`, providing essential components to minimize duplication.
 //!
 //! [vello_cpu]: https://crates.io/crates/vello_cpu
-
+#![cfg_attr(feature = "libm", doc = "[libm]: libm")]
+#![cfg_attr(not(feature = "libm"), doc = "[libm]: https://crates.io/crates/libm")]
 // LINEBENDER LINT SET - lib.rs - v3
 // See https://linebender.org/wiki/canonical-lints/
 // These lints shouldn't apply to examples or tests.
diff --git a/sparse_strips/vello_cpu/Cargo.toml b/sparse_strips/vello_cpu/Cargo.toml
index a42952c..2b7b32a 100644
--- a/sparse_strips/vello_cpu/Cargo.toml
+++ b/sparse_strips/vello_cpu/Cargo.toml
@@ -26,6 +26,7 @@
 thread_local = { workspace = true, optional = true }
 
 [features]
+# If adding new features, also add to `src/lib.rs`
 default = ["std", "png", "text", "u8_pipeline"]
 # Get floating point functions from the standard library (likely using your target’s libc).
 std = ["vello_common/std"]
diff --git a/sparse_strips/vello_cpu/README.md b/sparse_strips/vello_cpu/README.md
index 6aee092..89be0d2 100644
--- a/sparse_strips/vello_cpu/README.md
+++ b/sparse_strips/vello_cpu/README.md
@@ -16,7 +16,7 @@
 
 <!-- We use cargo-rdme to update the README with the contents of lib.rs.
 To edit the following section, update it in lib.rs, then run:
-cargo rdme --workspace-project=vello_cpu --heading-base-level=0
+cargo rdme --workspace-project=vello_cpu
 Full documentation at https://github.com/orium/cargo-rdme -->
 
 <!-- Intra-doc links used in lib.rs should be evaluated here.
@@ -27,9 +27,10 @@
 [RenderContext::fill_path]: https://docs.rs/vello_cpu/latest/vello_cpu/struct.RenderContext.html#method.fill_path
 [RenderContext::stroke_path]: https://docs.rs/vello_cpu/latest/vello_cpu/struct.RenderContext.html#method.stroke_path
 [RenderContext::glyph_run]: https://docs.rs/vello_cpu/latest/vello_cpu/struct.RenderContext.html#method.glyph_run
+[RenderMode::OptimizeSpeed]: https://docs.rs/vello_cpu/latest/vello_cpu/enum.RenderMode.html#variant.OptimizeSpeed
+[RenderMode::OptimizeQuality]: https://docs.rs/vello_cpu/latest/vello_cpu/enum.RenderMode.html#variant.OptimizeQuality
 [`RenderContext::render_to_pixmap`]: https://docs.rs/vello_cpu/latest/vello_cpu/struct.RenderContext.html#method.render_to_pixmap
 [`Pixmap`]: https://docs.rs/vello_cpu/latest/vello_cpu/struct.Pixmap.html
-[libm]: https://crates.io/crates/libm
 
 <!-- cargo-rdme start -->
 
@@ -93,12 +94,22 @@
 
 - `std` (enabled by default): Get floating point functions from the standard library
   (likely using your target's libc).
-- `libm`: Use floating point implementations from `libm`.
+- `libm`: Use floating point implementations from [libm][].
 - `png`(enabled by default): Allow loading [`Pixmap`]s from PNG images.
-  Also required for rendering glyphs with an embedded PNG.
-- `multithreading`: Enable multi-threaded rendering.
+  Also required for rendering glyphs with an embedded PNG. Implies `std`.
+- `multithreading`: Enable multi-threaded rendering. Implies `std`.
+- `text` (enabled by default): Enables glyph rendering ([`glyph_run`][RenderContext::glyph_run]).
+- `u8_pipeline` (enabled by default): Enable the `u8` pipeline, for speed-focused rendering using u8 math.
+  The `u8` pipeline will be used for [`OptimizeSpeed`][RenderMode::OptimizeSpeed], if both pipelines are enabled.
+  If you're using Vello CPU for application rendering, you should prefer this pipeline.
+- `f32_pipeline`: Enable the `f32` pipeline, which is slower but has more accurate
+  results. This is especially useful for rendering test snapshots.
+  The `f32` pipeline will be used for [`OptimizeQuality`][RenderMode::OptimizeQuality], if both pipelines are enabled.
 
 At least one of `std` and `libm` is required; `std` overrides `libm`.
+At least one of `u8_pipeline` and `f32_pipeline` must be enabled.
+You might choose to disable one of these pipelines if your application
+won't use it, so as to reduce binary size.
 
 ## Caveats
 
@@ -131,6 +142,9 @@
 become outdated as the implementation changes, but it should give a good
 overview nevertheless.
 
+<!-- We can't directly link to the libm crate built locally, because our feature is only a pass-through  -->
+[libm]: https://crates.io/crates/libm
+
 <!-- cargo-rdme end -->
 
 ## Minimum supported Rust Version (MSRV)
diff --git a/sparse_strips/vello_cpu/src/lib.rs b/sparse_strips/vello_cpu/src/lib.rs
index 497cb1f..0c253f1 100644
--- a/sparse_strips/vello_cpu/src/lib.rs
+++ b/sparse_strips/vello_cpu/src/lib.rs
@@ -1,12 +1,15 @@
 // Copyright 2025 the Vello Authors
 // SPDX-License-Identifier: Apache-2.0 OR MIT
 
+// After you edit the crate's doc comment, run this command, then check README.md for any missing links
+// cargo rdme --workspace-project=vello_cpu
+
 //! Vello CPU is a 2D graphics rendering engine written in Rust, for devices with no or underpowered GPUs.
 //!
 //! We also develop [Vello](https://crates.io/crates/vello), which makes use of the GPU for 2D rendering and has higher performance than Vello CPU.
 //! Vello CPU is being developed as part of work to address shortcomings in Vello.
 //!
-//! ## Usage
+//! # Usage
 //!
 //! To use Vello CPU, you need to:
 //!
@@ -57,18 +60,28 @@
 //! [examples](https://github.com/linebender/vello/tree/main/sparse_strips/vello_cpu/examples)
 //! to better understand how to interact with Vello CPU's API,
 //!
-//! ## Features
+//! # Features
 //!
 //! - `std` (enabled by default): Get floating point functions from the standard library
 //!   (likely using your target's libc).
-//! - `libm`: Use floating point implementations from `libm`.
+//! - `libm`: Use floating point implementations from [libm][].
 //! - `png`(enabled by default): Allow loading [`Pixmap`]s from PNG images.
-//!   Also required for rendering glyphs with an embedded PNG.
-//! - `multithreading`: Enable multi-threaded rendering.
+//!   Also required for rendering glyphs with an embedded PNG. Implies `std`.
+//! - `multithreading`: Enable multi-threaded rendering. Implies `std`.
+//! - `text` (enabled by default): Enables glyph rendering ([`glyph_run`][RenderContext::glyph_run]).
+//! - `u8_pipeline` (enabled by default): Enable the `u8` pipeline, for speed-focused rendering using u8 math.
+//!   The `u8` pipeline will be used for [`OptimizeSpeed`][RenderMode::OptimizeSpeed], if both pipelines are enabled.
+//!   If you're using Vello CPU for application rendering, you should prefer this pipeline.
+//! - `f32_pipeline`: Enable the `f32` pipeline, which is slower but has more accurate
+//!   results. This is especially useful for rendering test snapshots.
+//!   The `f32` pipeline will be used for [`OptimizeQuality`][RenderMode::OptimizeQuality], if both pipelines are enabled.
 //!
 //! At least one of `std` and `libm` is required; `std` overrides `libm`.
+//! At least one of `u8_pipeline` and `f32_pipeline` must be enabled.
+//! You might choose to disable one of these pipelines if your application
+//! won't use it, so as to reduce binary size.
 //!
-//! ## Caveats
+//! # Caveats
 //!
 //! Overall, Vello CPU is already very feature-rich and should be ready for
 //! production use cases. The main caveat at the moment is that the API is
@@ -83,14 +96,14 @@
 //! (more than 4) might give diminishing returns, especially when
 //! making heavy use of layers and clip paths.
 //!
-//! ## Performance
+//! # Performance
 //!
 //! Performance benchmarks can be found [here](https://laurenzv.github.io/vello_chart/),
 //! As can be seen, Vello CPU achieves compelling performance on both,
 //! aarch64 and x86 platforms. We also have SIMD optimizations for WASM SIMD,
 //! meaning that you can expect good performance there as well.
 //!
-//! ## Implementation
+//! # Implementation
 //!
 //! If you want to gain a better understanding of Vello CPU and the
 //! sparse strips paradigm, you can take a look at the [accompanying
@@ -98,6 +111,9 @@
 //! that was written on the topic. Note that parts of the descriptions might
 //! become outdated as the implementation changes, but it should give a good
 //! overview nevertheless.
+//!
+//! <!-- We can't directly link to the libm crate built locally, because our feature is only a pass-through  -->
+//! [libm]: https://crates.io/crates/libm
 // LINEBENDER LINT SET - lib.rs - v3
 // See https://linebender.org/wiki/canonical-lints/
 // These lints shouldn't apply to examples or tests.