The transcoder unpacks .basis files to various GPU texture formats, almost always without needing to decompress entire images at the pixel level (i.e. it only deals with arrays of blocks). The one exception is PVRTC1, where the transcoder needs to recompute the per-pixel selector (“modulation”) values, but it does so using simple scalar operations.
To use .basis files in an application, you only need the files in the “transcoder” directory. The entire transcoder lives in a single .cpp file: transcoder/basisu_transcoder.cpp. If compiling with gcc/clang, be sure strict aliasing is disabled when compiling this file, as I have not tested either the encoder or transcoder with strict aliasing enabled: -fno-strict-aliasing (The Linux kernel is also compiled with this option.) The transcoder can also be cross compiled using emscripten (emcc), for web use.
To use the transcoder, #include “transcoder/basisu_transcoder.h”. Call
basist::basisu_transcoder_init() a single time (probably at startup). Also, ideally once at startup, you need to create a single instance of the
basist::etc1_global_selector_codebook class, like this:
basist::etc1_global_selector_codebook sel_codebook(basist::g_global_selector_cb_size, basist::g_global_selector_cb);
Now you can use the transcoder, which is implemented in the “basisu_transcoder” class in transcoder/basisu_transcoder.h. The key methods are
I will be simplifying the transcoder so the caller doesn‘t need to deal with etc1_global_selector_codebook’s next. To get an idea how to use the API, you can check out WebGL/basis_wrappers.cpp.
transcode_slice() are thread safe, i.e. you can decompress multiple images/slices from multiple threads.
To get development error messages printed to stdout when something goes wrong inside the transcoder, set the BASISU_DEVEL_MESSAGES macro to 1 in basisu_transcoder.h and recompile.
When transcoding video files, the
transcode_slice() methods must be called in order from first to last frame, with no frames skipped. Currently, the first image is an I-Frames, and all subsequent frames are P-Frames. We will be adding more support for periodic I-Frames and seeking eventually. (The encoder and transcoder support I-Frames anywhere, we just need more work to expose this option.)
Internally, Basis files are composed of a non-uniform texture array of one or more 2D ETC1S texture “slices”. ETC1S is a simple subset of the ETC1 texture format popular on Android. ETC1S has no block flips, no 4x2 or 2x4 subblocks, and each block only uses 555 base colors. ETC1S is still 100% standard ETC1, so transcoding to ETC1 or the color block of ETC2 is a no-op. We chose ETC1S because it has the very valuable property that it can be quickly transcoded to almost any other GPU texture format at very high quality using only simple per-block operations with small 1D lookup tables. Transcoding ETC1S to BC1 usually only introduces around .3 dB Y PSNR quality loss, with less loss for ETC1S->BC7. Transcoding to PVRTC1 involves only simple block level operations to compute the endpoints, and simple per-pixel scalar operations to compute the modulation values.
Basis files have a single set of compressed global endpoint/selector codebooks in ETC1S format, which all slices utilize. The ETC1S texture data is compressed using vector quantization (VQ) seperately on the endpoints and selectors, followed by DPCM/RLE/psuedo-MTF/canonical Huffman coding. Each ETC1S texture slice may be a different resolution. Mipmaps (if any) are always stored in order from largest to smallest level. The file format supports either storing the selector codebook directly (using DPCM+Huffman), or storing the selector codebook using a hierarchical virtual codebook scheme.
Once the codebook and Huffman tables are decompressed, the slices are randomly accessible in any order. Opaque files always have one slice per image mipmap level, and files with alpha channels always have two slices per image mipmap level (even if some images in the file don't have alpha channels, i.e. alpha is all or nothing at the file level). The transcoder abstracts these details away into a simple “image” API, which is what most callers will use. An image is either one or more RGB slices (one per mipmap level), or one or more pairs of RGB/A slices (two per mipmap level). Internally, alpha slices are also stored in ETC1S format, like the color data, so selector correlations across color/alpha can be exploited. This also allows both RGB and alpha slices to be transcoded to opaque-only texture formats like ETC1, BC1, or PVRTC1 with no transparency.
We currently only support CPU transcoding, but GPU assisted transcoding/format conversion is also possible by uploading the decompressed codebooks as textures and using compute shaders to convert the ETC1S codebook block indices to the desired output texture or pixel format.
I‘m going to provide a simple C-style API to call the encoder directly. For now, you can call the C++ interface in basisu_comp.cpp/.h. See struct basis_compressor_params and class basis_compressor. Almost the entire command line tool’s functionality is in basis_compressor. This class supports 100% in-memory compression with no file I/O.
Internally, all ETC1S slices can be converted to any format, and the system is very flexible. The transcoder's image API supports converting alpha slices to color texture formats, which allows the user to transcode textures with alpha to two ETC1 images, etc.
ETC1 - The system's internal texture format is ETC1S, so outputting ETC1 texture data is a no-op. We only use differential encodings, each subblock uses the same base color (the differential color is always [0,0,0]), and flips are always enabled.
ETC2 - The color block will be ETC1S, and the alpha block is EAC. Conversion from ETC1S->EAC is nearly lossless.
BC1/DXT1 - ETC1S->BC1 conversion loses approx. .3-.5 dB Y PSNR relative to the source ETC1S data. We don't currently use 3 color (punchthrough) blocks, but we could easily add them.
BC3/DXT5 - The color block is BC1, the alpha block is BC4. ETC1S->BC4 is nearly lossless.
BC4/DXT5A - ETC1S->BC4 conversion is nearly lossless.
BC5/3DC/DXN - Two BC4 blocks. As the conversion from ETC1S->BC4 blocks is nearly lossless, we think this format (with large codebooks) will work well with high quality tangent space normal maps. Each channel gets its own ETC1S texture.
BC7 - Currently we only output mode 6, for opaque only textures. The conversion from ETC1S->BC7 is nearly lossless. We'll soon be adding mode 4 or 5 support for alpha textures.
PVRTC1 4bpp - Currently we only support opaque textures. The conversion from ETC1S->PVRTC1 is a two step process. The first step finds the RGB bounding boxes of each ETC1S block, which is fast (we don‘t need to process the entire block’s pixels, just the 1-4 used block colors). The first pass occurs during ETC1S transcoding. The second pass computes the per-pixel 2-bpp modulation values, which is fast because we can do this in a luma-like colorspace using simple scalar (not full RGB) operations. The second pass is highly optimized, and threading it would be easy. Quality is roughly the same as PVRTexTool‘s “Normal (Good Quality)” setting. ETC1S->PVRTC1 loses the most quality - several Y dB PSNR. (I’ll be adding better statistics here soon.)
Interestingly, the low pass filtering-like artifacts due to PVRTC1's unique block endpoint interpolation help obscure ETC1S chroma artifacts.
We will be adding PVRTC1 4bpp transparent support soon, and possibly PVRTC1 2bpp.
Currently, the PVRTC1 transcoder requires that the ETC1S texture‘s dimensions both be a power of two (but non-square is OK, although I believe iOS doesn’t support that). We will be adding the ability to transcode non-pow2 ETC1S textures to larger pow2 PVRTC1 textures soon.
ASTC1: 4x4 is definitely coming and will be comparable to BC7's quality. We may also support fast conversion to ASTC 6x6 pixel blocks.
First, become familiar with the exact compressed texture formats your device hardware and rendering API support. Just because your device supports a particular format (like PVRTC2) doesn‘t mean your OS or API does (iOS doesn’t support PVRTC2, even though their hardware did). On Android, ETC1/2 are popular. iOS supports PVRTC1 (pretty much always) and possibly ETC1/2 (but absolutely don't bet on it), and on desktop BC1-5/BC7 are king.
Also, become familiar with any texture size restrictions. For example, on iOS, you can only use square power of 2 texture dimensions for PVRTC1, and there's nothing Basis can do for you today that works around this limitation. (We will be supporting the ability to trancode smaller non-pow2 textures into larger power of 2 PVRTC1 textures soon.)
Here are the major texturing scenarios we support today:
For color-only textures, you can transcode to whatever format your target device supports. Remember that PVRTC1 requires square power of 2 size textures, and there‘s nothing Basis can currently do to help you work around this limitation. (Basic supports non-square PVRTC1 textures, but iOS doesn’t.)
For alpha textures, you can create .basis files with alpha channels. To do this with the basisu compressor, either create 32-bit PNG files with alpha, or use two PNG files with the “-alpha_file” command line option to specify where the alpha data should come from. (For texture arrays, you can use multiple -file and -alpha_file command line options. Mipmap generation automatically supports alpha channels.)
Now deploy like this:
ETC1 only devices/API‘s: Transcode to two ETC1 textures and sample them in a shader. You can either use one ETC1 texture that’s twice as high, or two separate ETC1 textures.
ETC2 devices/API's: Just transcode to ETC2 EAC RGBA, which the transcoder supports.
PVRTC1 devices/API‘s: Transcode to two PVRTC1 4bpp textures, and sample twice. We’re working on adding PVRTC1 transparency support to the transcoder ASAP (but quality will definitely suffer a bit)
Devices/API's supporting only BC1-5: Use BC3, which the transcoder supports.
Newer devices supporting BC6H/BC7: You still need to transcode to BC3. We will support BC7 with transparency very soon, which will give you slightly higher quality.
Compress with the -normal_map flag, which disables a lot of stuff that has interfered with normal maps in the past. Also compress with -slower, which creates the highest quality codebooks. Use larger codebooks (use the -max_endpoints and -max_selectors options directly, with larger values).
Start with 2 component normalized XY tangent space normal maps (where XY range from [-1,1]) and encode them into two 8-bit channels (where XY is packed into [0,255]). Now put X in color, and Y in alpha, and compress that 32-bit PNG using basisu. The command line tool and encoder class support the option “-seperate_rg_to_color_alpha” that swizzles 2 component RG normal maps to RRRG before compression, aiding this process.
ETC1 only devices/API‘s: Transcode to two ETC1 textures and sample them in a shader. You can either use one ETC1 texture that’s twice as high, or two separate ETC1 textures. The transcoder supports transcoding alpha slices to any color output format using a special flag:
basist::basisu_transcoder::cDecodeFlagsTranscodeAlphaDataToOpaqueFormats. This will look great because each channel gets its own endpoints and selectors.
ETC2 devices/API's: Transcode to a single ETC2 EAC RGBA texture, sample once in shader, deswizzle to RG (XY). This should look great.
PVRTC1 devices/API's: Transcode to two PVRTC1 opaque textures, sample each in the shader. This should look fairly good.
Devices/API‘s supporting BC1-5, BC6H, BC7: Transcode to a single BC5 textures, which used to be called “ATI 3DC”. It has two high quality BC4 blocks in there, so it’ll look great. Once BC7 alpha support comes online that will be the better option.
BC3/DXT5 or ETC2 EAC: The color data gets transcoded to output color, as you would expect. You‘ll get all-255 blocks in the output alpha blocks, because the transcoder doesn’t have any alpha slice data to convert to the output format. (Alternately, we could convert a single channel of the color data (like G) to output alpha, and assume the user will swizzle in the shader, which could provide a tiny gain in ETC1S conversion quality. But now output alpha would require special interpretation and we would need to invoke the block transcoders twice.)
BC4/DXT5A: This format is usually interpreted as holding single channel red-only data. We invoke the ETC1S->BC4 transcoder, which takes the red channel of the color slice (which we assume is grayscale, but doesn't have to be) and converts that to BC4/DXT5A blocks. (We could allow the user to select the source channel, if that is useful.)
BC5/3DC: This format has two BC4 blocks, and is usually used for XY (red/green) tangent space normal maps. The first block (output red/or X) will have the R channel of the color slice (which we assume is actually grayscale, but doesn‘t have to be), and the output green channel (or Y) will have all-255 blocks. We could support converting the first two color components of the color ETC1S texture slice to BC5, but doing so doesn’t seem to have any practical benefits (just use BC1 or BC7). Alternately we could support allowing the user to select a source channel other than red.
Note that you can directly control exactly how transcoding works at the block level by calling a lower level API, basisu_transcoder::transcode_slice(). The higher level API (transcode_image_level) uses this low-level API internally. find_slice() and get_file_info() return all the slice information you would need to call this lower level API. I would study transcode_image_level()'s implementation before using the slice API to get familar with it. The slice API was written first.
Within the next couple months or so, we'll be adding ASTC 4x4 opaque and transparent (and maybe 6x6), PVRTC1 4bpp transparent, and BC7 transparent. Of these, PVRTC1 4bpp transparent will be the most challenging from a quality perspective, and ASTC will be the most challenging from a texture format perspective. The resulting quality will still be baseline ETC1S.
We‘ll be upgrading the system’s quality to something halfway in between BC1 and BC7 (but more towards BC7 than BC1). We‘re going to enlarge the codebooks with optional extended data, add 2 partitions so blocks can use multiple color endpoints, and add higher precision selectors and endpoints. We may allow codebook entries to be split up (with extra per-block data indicating which split codebook entry to use), creating larger codebooks that only the extended texture data references. We’re going to leverage what we learned building our state of the art vectorized BC7 encoder (Basis BC7), and our open source bc7enc16 encoder, while creating this. We currently think just BC7 modes 1 and 6 (and the ASTC equivalents) will be enough.
We need a C-style API for the compressor class, and a bunch of compression/transcoding examples (native and WebGL). We also need to release a decent regression test.
Basis supports up to 16K codebooks for both endpoints and selectors for significantly higher quality textures, uses much higher quality codebook generators, the format uses a new prediction scheme for block endpoints (replicate one of 3 neighbors or use DPCM+Huffman from left neighbor), the format uses a selector history buffer, and RLE codes are implemented for all symbol types for high efficiency on simpler textures. The encoder also implements several new rate distortion optimization stages on both endpoint and selectors, and in Basis the encoder backend can call back into the frontend to reoptimize endpoints or selectors after the RDO stages modify the block codebook indices for better rate distortion performance.
The file format also supports very large non-uniform texture arrays, making the system usable as an RDO backend in specialized block-based video encoders. Internally, the encoder only handles blocks and all later RDO stages which assume a fixed 2D raster order of the blocks can be optionally disabled.
A huge thanks to Google for partnering with us and enabling this system to be open sourced.
Thanks to a number of companies or groups who have supported or helped out Binomial over the years: Intel, SpaceX, Netflix, Forgotten Empires, Microsoft, Polystream, Hothead Games, BioDigital, Magic Leap, Blizzard Entertainment, Insomniac Games, Rockstar Games, Facebook, Activision, the Khronos Group, and the organizers at CppCon.
Thanks to Matt Pritchard, formerly of Valve Software and Microsoft, for helping me with the computer hardware I used while building this system and its predecessor.
Thanks to John Brooks at Blue Shift, Inc. for inspiring this work by showing me his Dreamcast texture compression system around 2002, and for releasing etc2comp. I first saw the subblock flip estimation approach (used in basisu_etc.cpp) in etc2comp.
Thanks to Colt McAnlis, for advertising one of my earlier open source texture compression libraries at GDC, and Won Chun, who originally suggested making a universal system.
I first saw using precomputed tables for quickly computing optimal encodings of solid color blocks in ryg_dxt. The method that limits the canonical Huffman codelengths to a maximum codesize was used in Yoshizaki‘s lharc. The canonical Huffman codelength compression system is similar to Katz’s Deflate method.
The codebook generation process is basically a high quality, but slow and brute force reference. It's possible to massively speed up codebook gen in several ways. One way is to not throw away the tree structures constructed during the creation of the initial codebooks.
The way the -q (quality) option is converted to codebook sizes is very simple (fixed formulas), and could be improved. It has a tendancy to plateue on some files.
The various Huffman codes could be divided up into groups (like Zstd), for much faster Huffman decoding in the transcoder. Also, larger slices could be divided up into multiple segments, and each segment transcoded using a different thread. Both of these changes would modify the format.
PVRTC1 modulation values could be determined using multiple threads and/or SIMD code.
PVRTC1 2bpp and ATITC support wouldn't be hard to add.
The transcoder's BC7 tables are a bit large, and can be reduced, which would allow the transcoder to be downloaded more quickly.
3-bit selectors for alpha would greatly improve the quality of the alpha, but would break the file format and require extensive additions to the compressor/transcoder.
Fast 6x6 ASTC support may be possible.