This document describes how to develop a fuzzer target for an ICU API and how to integrate it into the ICU build process.
Fuzzer targets reside exclusively in the directory source/test/fuzzer/ and have file names ending in _fuzzer.cpp. Only files with this ending are recognized and executed as fuzzer targets by the OSS-Fuzz system.
As a minimum, a fuzzer target contains the function
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { ... }
This function is expected and invoked by the fuzzer system. The data parameter points to the fuzzer-controlled data, which is size bytes long. Part or all of this data is then passed into the ICU API under test.
The fuzzer target collator_rulebased_fuzzer.cpp illustrates the basic elements.
// © 2019 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html

#include <cstring>

#include "fuzzer_utils.h"
#include "unicode/coll.h"
#include "unicode/localpointer.h"
#include "unicode/locid.h"
#include "unicode/tblcoll.h"

IcuEnvironment* env = new IcuEnvironment();

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  UErrorCode status = U_ZERO_ERROR;

  size_t unistr_size = size/2;
  std::unique_ptr<char16_t[]> fuzzbuff(new char16_t[unistr_size]);
  std::memcpy(fuzzbuff.get(), data, unistr_size * 2);
  icu::UnicodeString fuzzstr(false, fuzzbuff.get(), unistr_size);

  icu::LocalPointer<icu::RuleBasedCollator> col1(
      new icu::RuleBasedCollator(fuzzstr, status));

  return 0;
}
The ICU API under test is the RuleBasedCollator(const UnicodeString &rules, UErrorCode &status) constructor. The code interprets the fuzzer data as a UnicodeString and passes it to the constructor. And that is all. Specific error handling or return value verification is not required, because the fuzzer detects memory issues through the findings of the address and memory sanitizers.
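The step that turns raw fuzzer bytes into UTF-16 code units can be looked at in isolation. The following is a minimal sketch without any ICU dependency; BytesToCodeUnits is a hypothetical helper name introduced here for illustration and does not exist in ICU. It assumes, as the target above does, that two bytes form one code unit in the machine's native byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of ICU): copies the fuzzer bytes into
// char16_t code units, one unit per sizeof(char16_t) bytes, in native
// byte order. A trailing incomplete unit is dropped because the integer
// division rounds down.
std::vector<char16_t> BytesToCodeUnits(const std::uint8_t* data,
                                       std::size_t size) {
  assert(data != nullptr || size == 0);
  std::vector<char16_t> units(size / sizeof(char16_t));
  if (!units.empty()) {
    // Copy only whole code units; never read past the input buffer.
    std::memcpy(units.data(), data, units.size() * sizeof(char16_t));
  }
  return units;
}
```

With a 5-byte input, for example, the result holds two code units and the last byte is ignored, which mirrors the size/2 computation in the fuzzer target.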
ICU fuzzer targets are built and executed by the OSS-Fuzz project. On the ICU side, they are compiled to ensure that the code builds and, as a sanity check, executed in the most basic manner, i.e. with minimal test data and without ASAN or MSAN analysis.
Add the new fuzzer target to the list of targets in the FUZZER_TARGETS variable in Makefile.in. The new fuzzer target will then be built and executed as part of a normal ICU4C unit test run. Note that each fuzzer target becomes an executable of its own. As such, it is linked with the code in fuzzer_driver.cpp, which contains the main() function.
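To illustrate what a standalone driver has to do, here is a minimal sketch. It is not the content of fuzzer_driver.cpp, which this document does not show; RunTargetOnFile and g_last_size are hypothetical names, and the LLVMFuzzerTestOneInput defined here is only a stand-in so that the sketch links on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Records the size of the most recent input (illustration only).
static std::size_t g_last_size = 0;

// Stand-in target. A real target would pass the bytes to the ICU API
// under test instead of just recording the size.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data,
                                      std::size_t size) {
  assert(data != nullptr || size == 0);
  g_last_size = size;
  return 0;
}

// Hypothetical driver helper: reads one file in binary mode and hands
// its bytes to the target. Returns false if the file cannot be opened.
bool RunTargetOnFile(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) {
    return false;
  }
  std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
  LLVMFuzzerTestOneInput(
      reinterpret_cast<const std::uint8_t*>(bytes.data()), bytes.size());
  return true;
}
```

A main() function built on this sketch would presumably call RunTargetOnFile once for each file path given on the command line.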
Any fuzzer seed data for a fuzzer target goes into a file named <fuzzer_target>_seed_corpus.txt. In many cases the input parameter of the ICU API under test is of type UnicodeString, in which case the seed data should be in UTF-16 format. As an example, see collator_rulebased_fuzzer_seed_corpus.txt.
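One possible way to produce such a seed file is sketched below. WriteUtf16Seed is a hypothetical helper, not an ICU or OSS-Fuzz tool. The sketch assumes that the seed should contain raw UTF-16 code units in native byte order without a byte order mark, since the example target copies the input bytes directly into a char16_t buffer; the byte order used by the existing ICU seed files is not stated in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: writes the code units of `text` as raw bytes in
// native byte order, without a byte order mark (an assumption, see above).
// Returns true if all bytes were written successfully.
bool WriteUtf16Seed(const std::u16string& text, const std::string& path) {
  assert(!path.empty());
  std::ofstream out(path, std::ios::binary);
  if (!out) {
    return false;
  }
  out.write(reinterpret_cast<const char*>(text.data()),
            static_cast<std::streamsize>(text.size() * sizeof(char16_t)));
  return static_cast<bool>(out);
}
```

For a collator target, a call such as WriteUtf16Seed(u"&a < b", "collator_rulebased_fuzzer_seed_corpus.txt") would write a short tailoring rule as seed data; the rule string is only an illustrative choice.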
At this time, reproducing fuzzer findings requires Docker to be installed on the local machine and the OSS-Fuzz project to be cloned into a local Git workspace.
Install Docker (Ubuntu):
sudo apt install docker.io
Download OSS-Fuzz, switch into directory oss-fuzz/
In a local Git workspace directory, clone the fuzzer system.
git clone https://github.com/google/oss-fuzz.git
cd oss-fuzz/
Build the Docker image for ICU. In some setups, root permissions may be required to connect to the Docker daemon.
[sudo] python infra/helper.py build_image icu
A prompt will appear: Pull latest base images (compiler/runtime)? (y/N)
Respond: ‘N’. If you are curious then respond with ‘y’ (won't hurt).
Build the ICU fuzzers:
[sudo] python infra/helper.py build_fuzzers --sanitizer [address | memory | undefined] icu
Check that the fuzzer targets were built successfully:
ls -l build/out/icu
Reproduce the fuzzer finding. First, get the test data the fuzzer used when it found the issue. In the fuzzer bug report, look for ‘Reproducer Testcase’; clicking the link downloads the test data. Then execute
[sudo] python infra/helper.py reproduce icu <icu_fuzzer> <testdata>
Concrete example:
sudo python infra/helper.py reproduce icu uregex_open_fuzzer ~/Downloads/clusterfuzz-testcase-minimized-uregex_open_fuzzer-5732067058384896
Limitations: When reproducing a fuzzer finding as outlined above, the fuzzer environment uses the current ICU trunk from https://github.com/unicode-org/icu.git. It is therefore not possible to modify the code locally to try out a possible fix. What can be done instead is to redirect Docker to download ICU from a forked ICU repository. Open the file oss-fuzz/projects/icu/Dockerfile and adjust the line
git clone --depth 1 https://github.com/unicode-org/icu.git icu
accordingly. Then modify the code in the forked repository and follow the steps above, beginning with step 3 (build the Docker image).
This is, of course, still a tedious way of reproducing and working on a fuzzer finding. Ticket ICU-20734 aims to introduce a fuzzer driver that can reproduce certain fuzzer findings in a local ICU workspace.