layout: default title: Resource and Data Tracing nav_order: 2 parent: ICU Data

Resource and Data Tracing

{: .no_toc }

Contents

{: .no_toc .text-delta }

  1. TOC {:toc}

Overview

When building an ICU data filter specification, it is useful to see what resources are being used by your application so that you can select those resources and discard the others. This guide describes how to use utrace.h to inspect resource access in real time in ICU4C.

Note: This feature is only available in ICU4C at this time. If you are interested in ICU4J, please see ICU-20656.

Quick Start

First, you must have a copy of ICU4C configured with tracing enabled.

$ ./runConfigureICU Linux --enable-tracing

The following program prints resource and data usages to standard out:

#include "unicode/brkiter.h"
#include "unicode/errorcode.h"
#include "unicode/localpointer.h"
#include "unicode/utrace.h"

#include <iostream>

static void U_CALLCONV traceData(
        const void *context,
        int32_t fnNumber,
        int32_t level,
        const char *fmt,
        va_list args) {
    char        buf[1000];
    const char *fnName;

    fnName = utrace_functionName(fnNumber);
    utrace_vformat(buf, sizeof(buf), 0, fmt, args);
    std::cout << fnName << " " << buf << std::endl;
}

int main() {
    icu::ErrorCode status;

    const void* context = nullptr;
    utrace_setFunctions(context, nullptr, nullptr, traceData);
    utrace_setLevel(UTRACE_VERBOSE);

    // Create a new BreakIterator
    icu::LocalPointer<icu::BreakIterator> brkitr(
        icu::BreakIterator::createWordInstance("zh-CN", status));
}

The following output is produced from this program:

res-open icudt64l-brkitr/zh_CN.res
res-open icudt64l-brkitr/zh.res
res-open icudt64l-brkitr/root.res
bundle-open icudt64l-brkitr/zh.res
resc       (get) icudt64l-brkitr/zh.res @ /boundaries
resc       (get) icudt64l-brkitr/root.res @ /boundaries/word
resc    (string) icudt64l-brkitr/root.res @ /boundaries/word
file-open icudt64l-brkitr/word.brk

What this means:

  1. The BreakIterator constructor opened three resource files in the locale fallback chain for zh_CN. The actual bundle was opened for zh.
  2. One string was read from that resource bundle: the one at the resource path “/boundaries/word” in brkitr/root.res.
  3. In addition, the binary data file brkitr/word.brk was opened.

Based on that information, you can make a more informed decision when writing resource filter rules for this simple program.

Data Tracing API

The traceData function shown above takes five arguments. The following two are most important for data tracing:

  • fnNumber indicates what type of data access this is.
  • args contains the details on which resources were accessed.

Important: When reading from args, the strings are valid only within the scope of your traceData function. You should make copies of the strings if you intend to save them for further processing.

UTRACE_UDATA_RESOURCE

UTRACE_UDATA_RESOURCE is used to indicate that a value inside of a resource bundle was read by ICU code.

When fnNumber is UTRACE_UDATA_RESOURCE, there are three C-style strings in args:

  1. Data type; not usually relevant for the purpose of resource filtering.
  2. The internal path of the resource file from which the value was read.
  3. The path to the value within that resource file.

To read each of these into different variables, you can write the code,

const char* dataType = va_arg(args, const char*);
const char* filePath = va_arg(args, const char*);
const char* resPath = va_arg(args, const char*);

As stated above, you should copy the strings if you intend to save them. The pointers will not be valid after the tracing function returns.

UTRACE_UDATA_BUNDLE

UTRACE_UDATA_BUNDLE is used to indicate that a resource bundle was opened by ICU code.

For the purposes of making your ICU data filter, the specific resource paths provided by UTRACE_UDATA_RESOURCE are more precise and useful.

UTRACE_UDATA_DATA_FILE

UTRACE_UDATA_DATA_FILE is used to indicate that a non-resource-bundle binary data file was opened by ICU code. Such files are used for break iteration, conversion, confusables, and a handful of other ICU services.

UTRACE_UDATA_RES_FILE

UTRACE_UDATA_RES_FILE is used to indicate that a binary resource bundle file was opened by ICU code. This can be helpful to debug locale fallbacks. This differs from UTRACE_UDATA_BUNDLE because the resource file is typically opened only once per application runtime.

For the purposes of making your ICU data filter, the specific resource paths provided by UTRACE_UDATA_RESOURCE are more precise and useful.